Search quality highlights: new monthly series on algorithm changes

12/1/11 | 9:46:00 AM

Update 16 May, 2012, 7:00pm: We made a minor update to the description of the "parked domain" classifier to ensure we aren't implying ads can't be useful.

Today we’re publishing another list of search improvements, beginning a monthly series where we’ll be sharing even more details about the algorithm and feature enhancements we make on a near-daily basis. We piloted a post like this earlier in November, and we were glad to hear you liked it.

We know people care about how search works, so we always want to push the envelope when it comes to transparency. We added it up, and to date we’ve published almost 1,000 blog posts about search, more than 400 webmaster videos and thousands of forum posts. For years now we’ve been blogging about significant algorithmic updates like Panda and our recent freshness update. So, why do we need yet another blog series?

We’ve been racking our brains trying to think about how to make search even more transparent. The good news is that we make roughly 500 improvements in a given year, so there’s always more to share. With this blog series, we’ll be highlighting many of the subtler algorithmic and visible feature changes we make. These are changes that aren’t necessarily big enough to warrant entire blog posts on their own.

Here’s a list since our post on November 14th:

  • Related query results refinements: Sometimes we fetch results for queries that are similar to the actual search you type. This change makes it less likely that these results will rank highly if the original query had a rare word that was dropped in the alternate query. For example, if you are searching for [rare red widgets], you might not be as interested in a page that only mentions “red widgets.”
  • More comprehensive indexing: This change makes more long-tail documents available in our index, so they are more likely to rank for relevant queries.
  • New “parked domain” classifier: This is a new algorithm for automatically detecting parked domains. Parked domains are placeholder sites with little unique content for our users and are often filled only with ads. In most cases, we prefer not to show them.
  • More autocomplete predictions: With autocomplete, we try to strike a balance between coming up with flexible predictions and remaining true to your intentions. This change makes our prediction algorithm a little more flexible for certain queries, without losing your original intention.
  • Fresher and more complete blog search results: We made a change to our blog search index to get coverage that is both fresher and more comprehensive.
  • Original content: We added new signals to help us make better predictions about which of two similar web pages is the original one.
  • Live results for Major League Soccer and the Canadian Football League: This change displays the latest scores and schedules from these leagues along with quick access to game recaps and box scores.
  • Image result freshness: We made a change to how we determine image freshness for news queries. This will help us find the freshest images more often.
  • Layout on tablets: We made some minor color and layout changes to improve usability on tablet devices.
  • Top result selection code rewrite: This code handles extra processing on the top set of results. For example, it ensures that we don’t show too many results from one site (“host crowding”). We rewrote the code to make it easier to understand, simpler to maintain and more flexible for future extensions.
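
To make the idea of “host crowding” a little more concrete, here is a minimal sketch in Python. It is not the code described in the last item above; the function name, the cap of two results per host and the sample URLs are all assumptions chosen for illustration. It simply walks a ranked list and skips any result whose host has already appeared the maximum number of times.

```python
from urllib.parse import urlparse


def limit_results_per_host(urls, max_per_host=2):
    """Keep ranked order, but allow at most max_per_host results per host.

    Hypothetical helper for illustration only; the cap of 2 is an assumption.
    """
    seen = {}   # host -> number of results already kept
    kept = []
    for url in urls:
        host = urlparse(url).netloc.lower()
        count = seen.get(host, 0)
        if count < max_per_host:
            kept.append(url)
            seen[host] = count + 1
    return kept


# Made-up ranked results: three of the five come from the same host.
ranked = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.org/x",
    "https://example.com/c",
    "https://example.net/y",
]

print(limit_results_per_host(ranked))
```

With the sample list, the third example.com result is dropped and the remaining results keep their original order. A real system would likely demote or group such results rather than discard them outright, so treat this as a toy version of the idea.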

And here’s a recap of improvements we’ve already blogged about since last time:


We’ll report back in early January with our next batch and plan to continue monthly after that. Subscribe to the blog and soon you’ll be real search geeks like us!