Search quality highlights: 50 changes for March

4/3/12 | 12:37:00 PM

Here’s our latest installment of search quality highlights, with another 50 changes to report for March. We’re starting to get into a groove with these posts, so we’re getting more and more comprehensive as the months go by. New for this month, we’ve published uncut video from our search quality meeting, which gives a great flavor for how these decisions get made.

Here’s the list for March:

  • Autocomplete with math symbols. [launch codename "Blackboard", project codename "Suggest"] When we process queries to return predictions in autocomplete, we generally normalize them to match more relevant predictions in our database. This change incorporates several characters that were previously normalized: “+”, “-”, “*”, “/”, “^”, “(”, “)”, and “=”. This should make it easier to search for popular equations, for example [e = mc2] or [y = mx+b].
  • Improvements to handling of symbols for indexing. [launch codename "Deep Maroon"] We generally ignore punctuation symbols in queries. Based on analysis of our query stream, we’ve now started to index the following heavily used symbols: “%”, “$”, “\”, “.”, “@”, “#”, and “+”. We’ll continue to index more symbols as usage warrants.
  • Better scoring of news groupings. [launch codename "avenger_2"] News results on Google are organized into groups that are about the same story. We have scoring systems to determine the ordering of these groups for a given query. This subtle change slightly improves our scoring system, leading to better ranking of news clusters.
  • Sitelinks data refresh. [launch codename "Saralee-76"] Sitelinks (the links that appear beneath some search results and link deeper into the respective site) are generated in part by an offline process that analyzes site structure and other data to determine the most relevant links to show users. We’ve recently updated the data through our offline process. These updates happen frequently (on the order of weeks).
  • Improvements to autocomplete backends, coverage. [launch codename "sovereign", project codename "Suggest"] We’ve consolidated systems and reduced the number of backend calls required to prepare autocomplete predictions for your query. The result is more efficient CPU usage and more comprehensive predictions.
  • Better handling of password changes. Our general approach is that when you change passwords, you’ll be signed out from your account on all machines. This change ensures that changing your password more consistently signs your account out of Search, everywhere.
  • Better indexing of profile pages. [launch codename "Prof-2"] This change improves the comprehensiveness of public profile pages in our index from more than two hundred social sites.
  • UI refresh for News Universal. [launch codename "Cosmos Newsy", project codename "Cosmos"] We’ve refreshed the design of News Universal results by providing more results from the top cluster, unifying the UI treatment of clusters of different sizes, adding a larger font for the top article, adding larger images (from licensed sources), and adding author information.
  • Improvements to results for navigational queries. [launch codename "IceMan5"] A “navigational query” is a search where it looks like the user is looking to navigate to a particular website, such as [New York Times] or [wikipedia.org]. While these searches may seem straightforward, there are still challenges to serving the best results. For example, what if the user doesn’t actually know the right URL? What if the URL they’re searching for seems to be a parked domain (with no content)? This change improves results for this kind of search.
  • High-quality sites algorithm data update and freshness improvements. [launch codename “mm”, project codename "Panda"] Like many of the changes we make, aspects of our high-quality sites algorithm depend on processing that’s done offline and pushed on a periodic cycle. In the past month, we’ve pushed updated data for “Panda,” as we mentioned in a recent tweet. We’ve also made improvements to keep our database fresher overall.
  • Live results for UEFA Champions League and KHL. We’ve added live-updating snippets in our search results for the KHL (Russian Hockey League) and UEFA Champions League, including scores and schedules. Now you can find live results from a variety of sports leagues, including the NFL, NBA, NHL and others.
  • Tennis search feature. [launch codename "DoubleFault"] We’ve introduced a new search feature to provide realtime tennis scores at the top of the search results page. Try [maria sharapova] or [sony ericsson open].
  • More relevant image search results. [launch codename "Lice"] This change tunes signals we use related to landing page quality for images. This makes it more likely that you’ll find highly relevant images, even if those images are on pages that are lower quality.
  • Fresher image predictions in all languages. [launch codename "imagine2", project codename "Suggest"] We recently rolled out a change to surface more relevant image search predictions in autocomplete in English. This improvement extends the update to all languages.
  • SafeSearch algorithm tuning. [launch codenames "Fiorentini", "SuperDyn"; project codename "SafeSearch"] This month we rolled out a couple of changes to our SafeSearch algorithm. We’ve updated our classifier to make it smarter and more precise, and we’ve found new ways to make adult content less likely to appear when a user isn't looking for it.
  • Tweaks to handling of anchor text. [launch codename "PC"] This month we turned off a classifier related to anchor text (the visible text appearing in links). Our experimental data suggested that other methods of anchor processing had greater success, so turning off this component made our scoring cleaner and more robust.
  • Simplification to Images Universal codebase. [launch codename "Galactic Center"] We’ve made some improvements to simplify our codebase for Images Universal and to better utilize improvements in our general web ranking to also provide better image results.
  • Better application ranking and UI on mobile. When you search for apps on your phone, you’ll now see richer results with app icons, star ratings, prices, and download buttons arranged to fit well on smaller screens. You’ll also see more relevant ranking of mobile applications based on your device platform, for example Android or iOS.
  • Improvements to freshness in Video Universal. [launch codename "graphite", project codename "Freshness"] We’ve improved the freshness of video results to better detect stale videos and return fresh content.
  • Fewer undesired synonyms. [project codename "Synonyms"] When you search on Google, we often identify other search terms that might have the same meaning as what you entered in the box (synonyms) and surface results for those terms as well when it might be helpful. This month we tweaked a classifier to prevent unhelpful synonyms from being introduced as content in the results set.
  • Better handling of queries with both navigational and local intent. [launch codename "ShieldsUp"] Some queries are both strongly navigational (directed towards a particular website) and local in intent. This change improves the balance of results we show, and helps ensure you’ll find highly relevant navigational results or local results towards the top of the page as appropriate for your query.
  • Improvements to freshness. [launch codename "Abacus", project codename "Freshness"] We launched an improvement to freshness late last year that was very helpful, but it cost significant machine resources. At the time we decided to roll out the change only for news-related traffic. This month we rolled it out for all queries.
  • Improvements to processing for detection of site quality. [launch codename "Curlup"] We’ve made some improvements to a longstanding system we have to detect site quality. This improvement allows us to get greater confidence in our classifications.
  • Better interpretation and use of anchor text. We’ve improved systems we use to interpret and use anchor text, and determine how relevant a given anchor might be for a given query and website.
  • Better local results and sources in Google News. [launch codename "barefoot", project codename "news search"] We’re deprecating a signal we had to help people find content from their local country, and we’re building similar logic into other signals we use. The result is more locally relevant Google News results and higher quality sources.
  • Deprecating signal related to ranking in a news cluster. [launch codename "decaffeination", project codename "news search"] We’re deprecating a signal that’s no longer improving relevance in Google News. The signal was originally developed to help people find higher quality articles on Google News. (Note: Despite the launch codename, this project has nothing to do with Caffeine, our update to indexing in 2010.)
  • Fewer “sibling” synonyms. [launch codename "Gemini", project codename "Synonyms"] One of the main signals we look at to identify synonyms is context. For example, if the word “cat” often appears next to the terms “pet” and “furry,” and so does the word “kitten”, our algorithms may guess that “cat” and “kitten” have similar meanings. The problem is that sometimes this method will introduce “synonyms” that actually are different entities in the same category. To continue the example, dogs are also “furry pets” -- so sometimes “dog” may be incorrectly introduced as a synonym for “cat”. We’ve been working for some time to appropriately ferret out these “sibling” synonyms, and our latest system is more maintainable, updatable, debuggable, and extensible to other systems.
  • Better synonym accuracy and performance. [project codename "Synonyms"] We’ve made further improvements to our synonyms system by eliminating duplicate logic. We’ve also found ways to more accurately identify appropriate synonyms in cases where there are multiple synonym candidates with different contexts.
  • Retrieval system tuning. [launch codename "emonga", project codename "Optionalization"] We’ve improved systems that identify terms in a query which are not necessarily required to retrieve relevant documents. This will make results more faithful to the original query.
  • Less aggressive synonyms. [launch codename "zilong", project codename "Synonyms"] We’ve heard feedback from users that sometimes our algorithms are too aggressive at incorporating search results for other terms. The underlying cause is often our synonym system, which will include results for other terms in many cases. This change makes our synonym system less aggressive in the way it incorporates results for other query terms, putting greater weight on the original user query.
  • Update to systems relying on geographic data. [launch codenames "Maestro", "Maitre"] We have a number of signals that rely on geographic data (similar to the data we surface in Google Earth and Maps). This change updates some of the geographic data we’re using.
  • Improvements to name detection. [launch codename "edge", project codename "NameDetector"] We’ve improved a system for detecting names, particularly celebrity names.
  • Updates to personalization signals. [project codename "PSearch"] This change updates signals used to personalize search results.
  • Improvements to Image Search relevance. [launch codename "sib"] We’ve updated signals to better promote reasonably sized images on high-quality landing pages.
  • Remove deprecated signal from site relevance signals. [launch codename "Freedom"] We’ve removed a deprecated product-focused signal from a site-understanding algorithm.
  • More precise detection of old pages. [launch codename "oldn23", project codename "Freshness"] This change improves detection of stale pages in our index by relying on more relevant signals. As a result, fewer stale pages are shown to users.
  • Tweaks to language detection in autocomplete. [launch codename “Dejavu”, project codename "Suggest"] In general, autocomplete relies on the display language to determine what language predictions to show. For most languages, we also try to detect the user query language by analyzing the script, and this change extends that behavior to Chinese (Simplified and Traditional), Japanese and Korean. The net effect is that when users forget to turn off their IMEs, they’ll still get English predictions if they start typing English terms.
  • Improvements in date detection for blog/forum pages. [launch codename "fibyen", project codename "Dates"] This change improves the algorithm that determines dates for blog and forum pages.
  • More predictions in autocomplete by live rewriting of query prefixes. [launch codename "Lombart", project codename "Suggest"] In this change we’re rewriting partial queries on the fly to retrieve more potential matching predictions for the user query. We use synonyms and other features to get the best overall match. Rewritten prefixes can include term re-orderings, term additions, term removals and more.
  • Expanded sitelinks on mobile. We’ve launched our expanded sitelinks feature for mobile browsers, providing better organization and presentation of sitelinks in search results.
  • More accurate short answers. [project codename “Porky Pig”] We’ve updated the sources behind our short answers feature to rely on data from Freebase. This improves accuracy and makes it easier to fix bugs.
  • Migration of video advanced search backends. We’ve migrated some backends used in video advanced search to our main search infrastructure.
  • +1 button in search for more countries and domains. This month we’ve internationalized the +1 button on the search results page to additional languages and domains. The +1 button in search makes it easy to share recommendations with the world right from your search results. As we said in our initial blog post, the beauty of +1’s is their relevance—you get the right recommendations (because they come from people who matter to you), at the right time (when you are actually looking for information about that topic) and in the right format (your search results).
  • Local result UI refresh on tablet. We’ve updated the user interface of local results on tablets to make them more compact and easier to scan.
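
To make the “sibling” synonyms item in the list above a little more concrete, here is a minimal Python sketch of how context overlap alone can suggest synonyms, and why siblings slip through. It is purely illustrative: the toy context counts, the function names (cosine, synonym_candidates), the cosine measure and the 0.5 threshold are all assumptions made for this example, not a description of any production system.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: how often each word appears near a context term.
CONTEXTS = {
    "cat": Counter({"pet": 8, "furry": 6, "meow": 5, "litter": 3}),
    "kitten": Counter({"pet": 6, "furry": 7, "meow": 4, "litter": 2}),
    "dog": Counter({"pet": 9, "furry": 5, "bark": 6, "leash": 4}),
    "car": Counter({"engine": 7, "road": 5}),
}


def cosine(a, b):
    """Cosine similarity between two context-count vectors."""
    shared = set(a) & set(b)
    dot = sum(a[term] * b[term] for term in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def synonym_candidates(word, threshold=0.5):
    """Return words whose contexts overlap with `word` above an assumed threshold."""
    target = CONTEXTS[word]
    return sorted(
        other
        for other, contexts in CONTEXTS.items()
        if other != word and cosine(target, contexts) >= threshold
    )


if __name__ == "__main__":
    print(synonym_candidates("cat"))
```

With this toy data, both “kitten” and “dog” clear the threshold for “cat”, while “car” does not. That is the sibling problem in miniature: shared context by itself cannot tell a true synonym from a different entity in the same category, so some additional signal is needed to filter siblings out.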

And here are a few other changes we’ve blogged about since last time: