Thursday, September 28, 2006


- Information retrieval based on historical data
- Google Patent Application
- SEO by the Sea
- IPLists

Papers on Spam in search engine results (PDFs)

- Paper on detecting spam
- Paper on finding spam

What is the size of English language content on the World Wide Web?

Search for a common English word that is likely to be found on any Web page. Examples: the, of, to, and, a, in, is, it, you, or that.
Averaging the result counts reported for these searches gives ~12.5bn Web pages.
Add Web pages in other languages, images, news, groups, blogs, RSS feeds, Atom feeds, maps, local listings, directories, and other multimedia content, and a rough estimate of the total searchable content on the Internet comes to ~18bn-19bn documents. That's ~20 times the population of India!
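
The arithmetic behind this estimate can be sketched in a few lines of Python. This is only an illustration: the per-word counts below are made-up placeholders (the post does not list the individual search results), and the names `estimate_english_pages` and `placeholder_hits_bn` are hypothetical. Only the ~12.5bn average and the ~18bn-19bn total come from the text above.

```python
from statistics import mean


def estimate_english_pages(hit_counts):
    """Average the reported result counts across the probe words."""
    if not hit_counts:
        raise ValueError("need at least one hit count")
    return mean(hit_counts.values())


# Placeholder counts in billions of pages. These are NOT measured values;
# substitute the numbers a search engine actually reports for each word.
placeholder_hits_bn = {"the": 13.0, "of": 12.0, "and": 12.5}

english_bn = estimate_english_pages(placeholder_hits_bn)

# Figures quoted in the post: ~12.5bn English pages, ~18bn-19bn documents overall.
# The difference is what the post attributes to everything else
# (other languages, images, news, groups, blogs, feeds, and so on).
extra_low_bn = 18.0 - 12.5
extra_high_bn = 19.0 - 12.5

print(f"English pages: ~{english_bn}bn")
print(f"Other content implied by the post: ~{extra_low_bn}bn to ~{extra_high_bn}bn")
```

Read this way, the post's totals imply roughly 5.5bn-6.5bn documents beyond the English-language pages. Result counts reported by search engines are themselves estimates, so the output is only as good as the numbers fed in.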