Now that Google has acknowledged it is moving from a search engine to a knowledge engine, the question remains for those of us interested in search engine marketing and optimization: how exactly will this transition affect a site’s ability to move up the SERPs?
The answer is not simple, because it depends on the search query. Knowledge bases (Wikipedia, Freebase, etc.) tend to reflect structured data about “named entities.” For those who do not know what a named entity is, the simple understanding is a specific subject (person, place, company, event, etc.) like the Yankees, the Falkland Islands, or Bank of America. Since the beginning, Google and other search engines have been after named entity extraction. This can be seen in the white paper from Applied Semantics (one of Google’s early acquisitions).
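To make “named entity extraction” concrete, here is a deliberately minimal sketch: a dictionary (gazetteer) lookup that maps surface mentions in text to canonical entities and types. Real extraction systems are statistical and far more sophisticated; the gazetteer and entity types below are invented for illustration.

```python
# Toy gazetteer mapping lowercase surface phrases to a canonical
# entity name and a coarse type. Entirely illustrative data.
GAZETTEER = {
    "yankees": ("New York Yankees", "sports team"),
    "falkland islands": ("Falkland Islands", "place"),
    "bank of america": ("Bank of America", "company"),
}

def extract_entities(text):
    """Return (canonical name, type) for each known entity mentioned in text."""
    lowered = text.lower()
    return [entity for phrase, entity in GAZETTEER.items() if phrase in lowered]

hits = extract_entities("The Yankees opened a Bank of America account.")
```

The point is the mapping itself: a mention like “Yankees” resolves to a specific, disambiguated thing, not just a word, and that thing can carry structured attributes.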
So What’s Changed?
Google has been building an entire concept-based system at least since it acquired Metaweb (owners of Freebase). By using Metaweb’s schemas and categorization models, along with the information on Wikipedia, Google no longer needs to rely on syntactic relationships (see my last post) to understand what a named entity means. It can use the information already stored in “reliable” databases to build a series of concepts around a named entity.
Perhaps one of the most interesting research papers I came across recently is entitled Topical Clustering of Search Results (by Googlers). Below is the section I found most important:
The traditional approach to IR tasks is to represent a text as a bag of words in which purely statistical and syntactic measures of similarity are applied. In this work we propose to move away from the classic bag-of-words paradigm towards a more ambitious graph-of-topics paradigm derived by using the Tagme annotator.

The idea is to deploy Tagme to process on-the-fly and with high accuracy the snippets returned by search engines. Every snippet is thus annotated with a few topics, which are represented by means of Wikipedia pages. We then build a graph consisting of two types of nodes: the topics annotated by Tagme, and the snippets returned by the queried search engines. Edges in this graph are weighted to denote either topic-to-topic similarities, computed via the Wikipedia linked-structure, or topic-to-snippet memberships, weighted by using proper statistics derived by Tagme.
Essentially, Google is moving towards using Wikipedia information to help craft their “concept graph” when putting together search results.
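The two-node-type graph described in the excerpt can be sketched in a few lines. For the topic-to-topic edges I use the well-known Milne–Witten relatedness measure over Wikipedia in-links, which is one common way to compute similarity “via the Wikipedia linked-structure” — the paper’s exact weighting may differ, and all the in-link sets and annotations below are invented for illustration.

```python
import math

W = 4_000_000  # rough order-of-magnitude count of Wikipedia pages (assumption)

def relatedness(in_a, in_b, total=W):
    """Milne-Witten relatedness between two topics, computed from the
    sets of Wikipedia pages that link to each topic."""
    overlap = len(in_a & in_b)
    if overlap == 0:
        return 0.0
    num = math.log(max(len(in_a), len(in_b))) - math.log(overlap)
    den = math.log(total) - math.log(min(len(in_a), len(in_b)))
    return max(0.0, 1.0 - num / den)

# Hypothetical in-link sets (page ids) for three topics.
inlinks = {
    "Diabetes":      {1, 2, 3, 4, 5},
    "Heart disease": {3, 4, 5, 6, 7},
    "Baseball":      {8, 9},
}

# Hypothetical snippet -> topics annotations (the kind of output Tagme produces).
annotations = {
    "snippet-1": ["Diabetes", "Heart disease"],
    "snippet-2": ["Diabetes"],
}

# Topic-to-topic edges, weighted by link-structure relatedness.
topic_edges = {
    (a, b): relatedness(inlinks[a], inlinks[b])
    for a in inlinks for b in inlinks if a < b
}
# Topic-to-snippet membership edges (unweighted here for simplicity).
membership_edges = {(s, t) for s, topics in annotations.items() for t in topics}
```

Notice that the two disease topics, which share several in-linking pages, get a strong edge, while the unrelated topic gets none — exactly the structure that lets snippets be clustered by concept rather than by shared words.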
How Does this Affect Search?
To demonstrate how important a change this is, I have taken three diseases and compared their top ten results. Between Lung Cancer, Heart Disease, and Diabetes, the SERPs are almost identical. Each shares the following sites:
Diabetes and Heart Disease also share a
Lung Cancer and Heart Disease share a
Lung Cancer and Diabetes share a
Besides the shared root domains, each keyword has a keyword.org site (lung.org, diabetes.org, and heart.org), along with more .govs and .orgs. Google’s search query hierarchy has become far less malleable. Sites that represent a given concept for a keyword will be in the top ten. Google prefers sites that are themselves trusted “knowledge bases” with inherently useful taxonomies.
Ultimately, Google, and for that matter Bing, will become harder and harder to game when it comes to “named entities.” These named entities and their SERP hierarchy are powered by a concept graph. This graph is the basis for determining the concepts that sites need to match in order to belong.
It’s now about knowledge, and you are either in or out!