Extending & Rebutting Edelman & Lockwood on Search Bias
In my last post, I discussed Edelman & Lockwood’s (E&L’s) attempt to catch search engines in the act of biasing their results—as well as their failure to actually do so. In this post, I present my own results from replicating their study. Unlike E&L, I find that Bing is consistently more biased than Google, for reasons discussed further below, although neither engine references its own content as frequently as E&L suggest.
I ran searches for E&L’s original 32 non-random queries on three search engines (Google, Bing, and Blekko) between June 23 and July 5 of this year. This replication is useful because search technology has changed dramatically since E&L recorded their results in August 2010. Bing now powers Yahoo, and Blekko has had more time to mature and enhance its results. Blekko serves as a helpful “control” engine in my study: it is totally independent of Google and Microsoft, and so has no incentive to refer to Google or Microsoft content unless that content is actually relevant to users. In addition, because Blekko’s model differs significantly from Google’s and Microsoft’s, agreement among all three engines that specific content is highly relevant to a query lends significant credibility to the notion that the content places well on the merits rather than because of bias or other factors.
How Do Search Engines Rank Their Own Content?
Looking solely at the first position, I find that Google refers to its own products or services when no other search engine does in 21.9% of queries; in another 21.9% of queries, both Google and at least one rival search engine (i.e., Bing or Blekko) refer to the same Google content with their first links.
But focusing only on the first position is too narrow. It would be a mistake to treat every instance in which Google or Bing ranks its own content first, and rivals do not, as bias; such a restrictive definition would count cases in which all three search engines rank the same content prominently, agreeing that it is highly relevant, just not all in the first position.
The entire first page of results provides a more informative comparison. I find that Google and at least one other engine return Google content on the first page of results in 7% of the queries. Google refers to its own content on the first page of results without agreement from either rival search engine in only 7.9% of the queries. Meanwhile, Bing and at least one other engine refer to Microsoft content in 3.2% of the queries. Bing references Microsoft content without agreement from either Google or Blekko in 13.2% of the queries.
This evidence indicates that Google’s ranking of its own content differs significantly from its rivals’ rankings in only 7.9% of queries, and that when Google ranks its own content prominently, rival engines generally treat that content as relevant as well. Further, these results suggest that Bing’s organic search results are significantly more biased in favor of Microsoft content than Google’s search results are in favor of Google’s content.
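For readers who want to try the first-page comparison themselves, here is a minimal sketch of how each query might be sorted into "no own content," "own content with rival agreement," and "own content alone." It is an illustration, not the procedure I or E&L actually used: the function names, the domain lists passed in, and the choice to count agreement only when a rival returns the exact same URL are all assumptions.

```python
from urllib.parse import urlparse


def is_owned(url, domains):
    """True if the URL's host is one of `domains` or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in domains)


def classify_query(results, engine, domains):
    """Classify one query.

    `results` maps an engine name to its list of first-page URLs.
    Agreement is assumed to mean a rival returns the same URL.
    """
    own = {u for u in results[engine] if is_owned(u, domains)}
    if not own:
        return "none"
    rival_urls = set()
    for name, urls in results.items():
        if name != engine:
            rival_urls.update(urls)
    return "agreed" if own & rival_urls else "alone"


def shares(all_results, engine, domains):
    """Fraction of queries falling into each category for `engine`."""
    counts = {"none": 0, "agreed": 0, "alone": 0}
    for results in all_results:
        counts[classify_query(results, engine, domains)] += 1
    total = len(all_results)
    return {label: count / total for label, count in counts.items()}
```

Calling `shares(all_results, "google", ("google.com", "youtube.com"))` on a list of per-query result dictionaries would return the three fractions. The "alone" share is the analogue of the without-agreement figures reported above, though a different matching rule (say, matching by domain rather than URL) would produce different numbers.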
Examining Search Engine “Bias” on Google
The following table presents the percentages of queries for which Google’s ranking of its own content differs significantly from its rivals’ ranking of that same content.
Note that percentages below 50 in this table indicate that rival search engines generally see the referenced Google content as relevant and independently believe that it should be ranked similarly.
So when Google ranks its own content highly, at least one rival engine typically agrees with this ranking; for example, when Google places its own content in its Top 3 results, at least one rival agrees with this ranking in over 70% of queries. Bing especially agrees with Google’s rankings of Google content within its Top 3 and 5 results, failing to include Google content that Google ranks similarly in only a little more than a third of queries.
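The Top 1, Top 3, and Top 5 percentages can be thought of as a disagreement rate at a given cutoff. The sketch below shows one way such a rate might be computed. It assumes that "differs significantly" means the rival shows none of the engine's own content within the same top-k cutoff; that reading, along with the function name and the `is_own` predicate, is my assumption for illustration.

```python
def top_k_disagreement(queries, engine, rival, is_own, k):
    """Disagreement rate between `engine` and `rival` at cutoff k.

    Among queries where `engine` places own content in its top k,
    return the share for which `rival` shows none of that content
    in its own top k. Returns None if no query qualifies.

    `queries` is a list of dicts mapping engine name -> ranked URL list.
    `is_own` is a caller-supplied predicate marking own-content URLs.
    """
    relevant = 0
    disagreements = 0
    for results in queries:
        own_top = {u for u in results[engine][:k] if is_own(u)}
        if not own_top:
            continue  # engine shows no own content in its top k
        relevant += 1
        if not own_top & set(results[rival][:k]):
            disagreements += 1
    return disagreements / relevant if relevant else None
```

On this definition, a value below 0.5 for a given rival and cutoff corresponds to a table entry below 50, meaning the rival usually ranks the same content within the same cutoff.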
Examining Search Engine “Bias” on Bing
Bing refers to Microsoft content in its search results far more frequently than its rivals reference the same Microsoft content. For example, Bing’s top result references Microsoft content for 5 queries, while neither Google nor Blekko ever ranks Microsoft content in the first position.
This table illustrates the significant discrepancies between Bing’s treatment of Microsoft content and that of Google and Blekko. Neither rival engine refers to the Microsoft content that Bing ranks within its Top 3 results; Google and Blekko do not include any of the Microsoft content that Bing refers to on its first page of results in nearly 80% of queries.
Moreover, Bing frequently ranks Microsoft content highly even when rival engines do not refer to the same content at all in the first page of results. For example, of the 5 queries for which Bing ranks Microsoft content in its top result, Google refers to only one of these 5 within its first page of results, while Blekko refers to none. Even when comparing results across each engine’s full page of results, Google and Blekko only agree with Bing’s referral of Microsoft content in 20.4% of queries.
Although there are not enough Bing data to test results in the first position in E&L’s sample, Microsoft content appears as results on the first page of a Bing search about 7 times more often than Microsoft content appears on the first page of rival engines. Also, Google is much more likely to refer to Microsoft content than Blekko, though both refer to significantly less Microsoft content than Bing.
A Closer Look at Google v. Bing
On E&L’s own terms, Bing results are more biased than Google results; rivals are more likely to agree with Google’s algorithmic assessment (than with Bing’s) that its own content is relevant to user queries. Bing refers to Microsoft content that other engines do not rank at all more often than Google refers to its own content without any agreement from rivals. Figures 1 and 2 display the same data presented above in order to facilitate direct comparisons between Google and Bing.
As Figures 1 and 2 illustrate, Bing search results for these 32 queries are more frequently “biased” in favor of its own content than are Google’s. The bias is greatest for the Top 1 and Top 3 search results.
My study finds that Bing exhibits far more “bias” than E&L identify in their earlier analysis. For example, in E&L’s study, Bing does not refer to Microsoft content at all in its Top 1 or Top 3 results; moreover, Bing refers to Microsoft content within its entire first page 11 times, while Google and Yahoo refer to Microsoft content 8 and 9 times, respectively. Most likely, the significant increase in Bing’s “bias” differential is a function of Bing’s introduction of localized and personalized search results and represents serious competitive efforts on Bing’s part.
Again, it’s important to stress E&L’s limited and non-random sample, and to emphasize the danger of making strong inferences about the general nature or magnitude of search bias based upon these data alone. However, the data indicate that Google’s own-content bias is relatively small even in a sample collected precisely to focus upon the queries most likely to generate it. In fact—as I’ll discuss in my next post—own-content bias occurs even less often in a more representative sample of queries, strongly suggesting that such bias does not raise the competitive concerns attributed to it.