In the wake of Yahoo!'s announcement of a 20-billion-item index, nearly double Google's claimed index size, some debate has ensued. Google co-founder Sergey Brin asserted that Yahoo! was counting duplicate results to arrive at its figure. A study at the National Center for Supercomputing Applications determined that Google delivered more results in a series of 10,000 randomized searches. But that study eliminated all searches that returned over 1,000 results in either engine. I was talking with Susan Kuchinskas of InternetNews.com last night, and she wondered whether that methodology made sense. I didn't have a good answer, but this morning I tried my own (much smaller) series of random two-keyword searches (bilious blouse, frequent gestation, arcane pigeon, etc.), all of which returned over 1,000 results. I used the Gahoo!Yoogle engine to quickly compare Yahoo! and Google. In every case Yahoo! returned nearly twice as many results as Google. Not deeply scientific, but it does make you wonder whether the study missed the boat.
The Size Debate
Reader Comments
2. The size isn't really what's important, is it? After all, it's that first page of results that counts, and in that regard, Google is still superior. Actually, having a larger catalog becomes a detriment when the relevancy of the results isn't up to snuff. I love Yahoo, and I use it for everything... except search.
3. Try searching for "bilious blouse one two": Yahoo says >4000 results and Google <1000. But if you actually page through the results, Google turns out to have more for you than Yahoo.
Posted at 4:42AM on Dec 19th 2005 by aleckg
4. E-Rock is right. Very few people go through all 1000 search query results. I've never gone past 5 pages, except for a few times when I've clicked on a random page and picked a random result. The real question is: which search engine helps you find what you want, and faster?
Posted at 4:42AM on Dec 19th 2005 by Andrew Kaufmann
5. Speed-wise, I think both Yahoo and Google are returning the results fast enough that normal users won't even notice the difference.
I like Yahoo because their directory list has more meaningful websites (after all, most people need to pay in order to get into the list).
6. I have to agree with Tom on this one: the number of results a search engine reports is known to be just an estimate. Given the sizes of the databases involved, the engines have to make an "educated guess" as to how many results match, because otherwise the page would take forever to load. Perhaps Yahoo! moved a decimal point in their estimator code!
Brad, going back to the issue you mentioned about cutting all queries with more than 1000 results: the study did this so it could actually *verify* (by clicking through all the result pages) the true number of results returned, since we can't trust the estimator code. It is important to remember that the NCSA study is simply measuring relative size.
Posted at 4:42AM on Dec 19th 2005 by Brandon
7. E-Rock --
I think that both size and relevancy are important, but it is hard to decide which is more important, isn't it?
For example, if a search engine doesn't have what you're looking for, then the results are obviously not relevant. Therefore, the engine needs to focus on growing its index first.
Yours,
Joe S.
Posted at 4:42AM on Dec 19th 2005 by Joe S.
8. It might also be worth noting that some "private" sites that Google respects as private are not treated the same way by Yahoo's engines. I discovered this through a friend's complaints about his blog being made more public by Yahoo. The "privacy" issue isn't the point here, though, just that those are extra hits for Yahoo's search engines.
Posted at 4:42AM on Dec 19th 2005 by amanda
9. Hey Brad,
I'm sure you've seen this:
http://jeremy.zawodny.com/blog/archives/005016.html
and maybe this:
http://aixtal.blogspot.com/2005/08/yahoo-pages-manquantes-2.html
by now. I have an English translation of that second one if you're not thrilled w/what Babelfish does.
Jeremy
Posted at 4:42AM on Dec 19th 2005 by Jeremy Zawodny
1. The reason the study only used queries with under 1000 results is that for anything larger, the number of results reported is an estimate. To quote the study:
"Unfortunately, both the Yahoo! and Google search engines truncate results returned to the user after 1,000 results. Thus, for the purposes of this study, we were forced to restrict our searches to those queries that returned less than 1,000 results on both Yahoo! and Google. Any search result found to have more than 1,000 returned results on either search engine was disregarded from our sample."
http://vburton.ncsa.uiuc.edu/indexsize.html
What you observed is that Yahoo happens to be roughly twice as 'optimistic' as Google in its estimated number of results.
Posted at 4:42AM on Dec 19th 2005 by Tom
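For anyone who wants to see why the two approaches in this thread can point in opposite directions, here is a minimal Python sketch of the cutoff quoted from the study above. It is not the NCSA code. The `QuerySample` class, the function names, and every number in `samples` are made up for illustration; only the 1,000-result cap and the "disregard if either engine exceeds it" rule come from the study's own description.

```python
from dataclasses import dataclass

# Both engines truncate listings after 1,000 results, per the study quote.
RESULT_CAP = 1000


@dataclass
class QuerySample:
    query: str
    yahoo_count: int   # count reported by Yahoo! (hypothetical value)
    google_count: int  # count reported by Google (hypothetical value)


def verifiable(sample: QuerySample) -> bool:
    """Keep a query only if both engines report fewer than RESULT_CAP results."""
    return sample.yahoo_count < RESULT_CAP and sample.google_count < RESULT_CAP


def yahoo_to_google_ratio(samples: list[QuerySample]) -> float:
    """Total Yahoo! results divided by total Google results over the samples."""
    yahoo_total = sum(s.yahoo_count for s in samples)
    google_total = sum(s.google_count for s in samples)
    return yahoo_total / google_total


# Made-up numbers, for illustration only.
samples = [
    QuerySample("rare phrase a", 40, 100),
    QuerySample("rare phrase b", 60, 150),
    QuerySample("common phrase c", 4000, 2000),
    QuerySample("common phrase d", 3000, 500),
]

# The study's subset: counts small enough to confirm by paging through results.
kept = [s for s in samples if verifiable(s)]
# The kind of query Brad tried: counts here are only the engines' estimates.
dropped = [s for s in samples if not verifiable(s)]

verified_ratio = yahoo_to_google_ratio(kept)
estimated_ratio = yahoo_to_google_ratio(dropped)

print(f"kept {len(kept)} of {len(samples)} queries")
print(f"Yahoo!/Google ratio on verifiable queries: {verified_ratio:.2f}")
print(f"Yahoo!/Google ratio on estimate-only queries: {estimated_ratio:.2f}")
```

With these invented figures the verifiable subset favors Google while the estimate-only subset favors Yahoo!, which is the shape of the disagreement between the study and Brad's quick test. Real numbers would have to come from running the searches.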