• Contact

    info @ searchengineisrael.com

What’s Next for Search?

Search marketing has been in upheaval for some time. Many companies have approached me to help them get a boost on search. Unfortunately for them, their paradigm has yet to switch from vertical search marketing (like ranking higher) to something we have coined cross-ecosystem lead generation. When I founded SERPIntelligence, I sensed that the shift search and social were undergoing was far more than a few technical changes to Google’s search algorithm; rather, an entire thought model was being buried.

What is this thought model you ask?

SEO. No, I am not a believer in the “SEO is dead” mantra, but SEO as a valuable vertical unto itself is quite dead. Just think: most people who search for shopping keywords do so on Amazon, not Google. Google itself acknowledged this shift over time as it began to focus on knowledge-based searches. With this shift, SEO as a valuable ROI service is over.

What does that leave people in the industry? Interestingly, it leaves those of us who want to think outside the box in a very good position. For example, a large bank recently approached us to run their failing SEO campaign. We explained to them that SEO will not help them reach wealthy people interested in private banking or wealth management.

We obviously pushed them to our boutique lead generation service, which focuses on select audiences and focused results.

The answer about using search is simple: these high-end leads still use Google to cement industry proof, and therein lies the power of thinking in a cross-ecosystem strategy. We make sure to rank our clients high for the words that industry leaders will search for, in order to cement the brand in the minds of those who matter: key influencers. Essentially, we are going with Google and not fighting a dying wave.

I suggest you do the same thing.

How Google is Generating Query Refinements the Orion Way

(The post below is important for many reasons. Bill is one of the best in the industry at describing how patent research leads to a refined and focused approach to SEO. The post also centers on an Israeli, Ori Allon, who has been a huge part of many revolutions within search. Read the original.)

By Bill Slawski,

In 2006, Google battled Yahoo! and Microsoft for an algorithm developed by an Israeli Ph.D. student in Australia. The algorithm had a semantic element to it, and advanced Google in an algorithm arms race between the search giants (one of which doesn’t even have a search engine of its own now). We’ve seen the technology described in terms of how it is displayed in search results, but not how it does what it does. Until now.

Google was awarded a patent this week that looks at search results for specific queries and the entities that appear within them, to produce query refinements. This invention is from Google, but the lead inventor behind it was the subject of a bidding war between Google, Yahoo!, and Microsoft. In 2009, the breakthrough was made public on Google in the form of Orion technology.

The Orion approach involved both extended snippets for queries (three or more lines of descriptive snippet instead of two for some longer queries), and “more and better query refinements.” How this technology is displayed is described in a Google Official Blog post from March 24, 2009 titled Two new improvements to Google results pages.

 

One of the co-authors of that post is Ori Allon, who developed the Orion Technology as a student in Australia. (Ori has been busy since then, with stints at Google and Twitter, and a new project on his own.)

If you do some of the searches at Google described in that blog post, you’ll see both extended snippets and a good number of suggested query refinements. Try a search for [earth's rotation axis tilt and distance from sun] (without the brackets), for an example. Three of the top 10 results from my search have three lines of snippets, and another has 4 lines. Here’s an example of one of those extended snippets:

A three-line search result for a query about Earth's rotational axis and distance from the Sun.

The patent provides us with a better look at how the Orion technology actually works:

Refining search queries
Invented by Ori Allon, Ugo Di Girolamo, Tomer Shmiel, Alexandre Petcherski, and Tzvika Hartman
Assigned to Google
US Patent 8,392,443
Granted March 5, 2013
Filed: March 17, 2010

Abstract

Methods, systems, and apparatus, including computer program products, for refining search queries.

A method includes:

  • Obtaining a submitted search query, and in response to obtaining the search query:
  • Obtaining search results responsive to the search query;
  • Selecting a document from a group of documents identified by the search results;
  • Generating from a subset of one or more entities associated with the document one or more candidates for refined search queries, including:
    • Identifying one or more terms in the search query, where the one or more terms occur in the search query in a particular order relative to each other, and
    • Combining the one or more terms with the entity to generate a candidate, where the one or more terms occur in the particular order relative to each other; and identifying one or more of the candidates as being refined search queries for providing with the search results.
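The combining step in the abstract can be illustrated with a minimal sketch. This is not the patent's implementation: the function name `generate_candidates`, the choice to append the entity after the query terms, and the rule that skips entities already contained in the query are all assumptions made for illustration. The only property taken from the abstract is that the query terms keep their original relative order in each candidate.

```python
def generate_candidates(query, entities):
    """Combine the query terms, in their original order, with each entity.

    Hypothetical helper: appending the entity at the end and skipping
    entities already present in the query are illustrative assumptions.
    """
    terms = query.split()
    candidates = []
    for entity in entities:
        # Assumption: an entity already contained in the query adds nothing.
        if entity.lower() in query.lower():
            continue
        # The query terms stay in their original relative order.
        candidates.append(" ".join(terms + [entity]))
    return candidates


print(generate_candidates("mona lisa", ["Louvre", "Leonardo da Vinci", "Mona Lisa"]))
```

A later step, described further down in the post, would score these candidates and keep only a few of them.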

Generating Query Refinements

The patent itself focuses upon query refinements rather than upon extended snippets, and my guess is that there’s probably another unpublished patent out there focusing upon those extended snippets. But the query refinement approach is interesting in a few ways.

It refers to entities found in documents that rank for specific queries (a co-occurrence of entities), and those entities might be used in combination with words from the original query (or synonyms of those words) to provide query refinements.

The “entities” described in this patent sound similar to the kinds of entities that we see in Google’s knowledge base approach, though there might be some differences.

Entities

Pages returned in a search for a query could be associated with specific entities, which are included in the documents returned for that search.

Entities make up a “meaningful, self-contained concept.”

An entity could be a single word, a phrase, or other character strings. An entity might be a sequence of one or more characters that show up in previously-submitted search queries at a frequency that is greater than a certain threshold of searches over a certain period of time. A document could be associated with more than one entity.
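One way to picture the frequency-based definition is a simple count over a query log. The sketch below is an assumption-laden illustration, not the patent's method: the function name `frequent_query_entities`, the lowercase normalization, and the idea that the log has already been limited to the time period of interest are all hypothetical.

```python
from collections import Counter


def frequent_query_entities(query_log, min_count):
    """Return query strings seen more than min_count times in the log.

    Assumes query_log already covers only the time period of interest,
    and that case and surrounding whitespace do not matter.
    """
    counts = Counter(q.strip().lower() for q in query_log)
    return {q for q, n in counts.items() if n > min_count}


log = ["mona lisa", "Mona Lisa", "louvre", "mona lisa "]
print(frequent_query_entities(log, min_count=2))
```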

Example

Someone searches for [Mona Lisa] and Google returns search results pages in response. A number of other entities might appear in those search results, such as “Leonardo da Vinci”, “Louvre”, “renaissance”, and others.

These entities might be passed along to a query refinement server as parts of candidate query refinements.

Scoring Entities Associated with Documents

Refined search queries can be created in real time, because the entities that are used to generate those queries are associated with documents identified by search results in response to the query.

Entities associated with a document can also be previously-submitted search queries for which search results that identify the document have been returned more than a certain number of times.

Inverse Document Frequency (IDF) Score

Part of the process described in this patent involves filtering entities as possible query refinements that the search engine might list, to get just a small number of refinements.

An entity might have an IDF score, which is based upon counting the number of documents being searched “which contain (or are indexed by) the term in question. The intuition was that a query term which occurs in many documents is not a good discriminator, and should be given less weight than one which occurs in few documents.” See: Understanding Inverse Document Frequency: On theoretical arguments for IDF (pdf).

The score for the entity may be based on a sum of the IDFs of each word in an entity. The score of the entity “Mona Lisa” might be calculated by taking the sum of the IDF of “Mona” and the IDF of “Lisa”. (The British Rock band from the 80s, “The The” might not have the greatest IDF score in the world under that approach.)

As the IDF score of an entity increases, the likelihood that the entity is important or relevant to a document responsive to the search query also increases. Therefore, entities of a document with a higher score are also ranked higher than entities with a lower score.
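To make the summing idea concrete, here is a small sketch. The patent summary above does not give an exact IDF formula, so this uses one common textbook variant, log(N / df), as an assumption; the function names and the toy document collection are hypothetical as well.

```python
import math


def idf(term, documents):
    """Textbook IDF: log(N / df). Returns 0.0 for terms found in no document."""
    df = sum(1 for doc in documents if term in doc)
    if df == 0:
        return 0.0
    return math.log(len(documents) / df)


def entity_idf_score(entity, documents):
    """Score an entity as the sum of the IDFs of its words."""
    return sum(idf(word, documents) for word in entity.lower().split())


# Toy collection: each document is a set of lowercase words.
docs = [
    {"the", "mona", "lisa"},
    {"the", "louvre"},
    {"the", "band"},
    {"the", "lisa"},
]

print(entity_idf_score("Mona Lisa", docs))  # log(4/1) + log(4/2)
print(entity_idf_score("The The", docs))    # "the" is in every document
```

In this toy collection “the” appears in every document, so “The The” scores zero, while “Mona Lisa” scores higher because “mona” is rare.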

Co-occurrence Score

If you look through the document, you may see an entity appear more than once. A score for an entity can be created from “determining a co-occurrence relationship between the entity and a search query.”

If an entity appears in a document more than once, its importance to the document is probably higher, and that score might be used with, or incorporated into, the IDF score.

Query Click and Dwell Time Score

The score for an entity could also be increased as the number of times the entity is found in a previously-submitted query increases.

Every selection of the document for the previously-submitted query within search results is counted as a click. The amount of time someone views or “dwells” on the document may also be tracked. The more time they spend there (i.e., a long click), the more relevant the document might be seen to be for that previous query.

If they don’t spend much time there, that might be perceived as a lack of relevance of the document.

The score for an entity can increase based upon long clicks, and/or based on an increase of the ratio of long clicks to total clicks for queries which use that entity.
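The ratio of long clicks to total clicks is easy to sketch. The 30-second cutoff below is purely a hypothetical value chosen for illustration (the post gives no threshold), and `long_click_ratio` is an assumed helper name.

```python
def long_click_ratio(dwell_times, long_click_seconds=30.0):
    """Fraction of clicks whose dwell time meets the long-click cutoff.

    dwell_times: seconds spent on the document after each click.
    long_click_seconds: hypothetical cutoff, not taken from the patent.
    """
    if not dwell_times:
        return 0.0
    long_clicks = sum(1 for t in dwell_times if t >= long_click_seconds)
    return long_clicks / len(dwell_times)


print(long_click_ratio([5, 45, 120, 10]))  # two of four clicks are long
```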

Candidate Query Refinements used in Titles

The score of an entity can be increased if the entity is found in the title of a document.

Previously Submitted Queries

The score for an entity can increase with the number of times it is found in previously-submitted queries, the number of documents it is currently found in, the number of times the entity is included in the titles of documents, and the number of terms (or tokens) in the entity.

Other Collected Information

It’s possible that some other information might be used to score an entity that might be used as part of a candidate query. Some of the information collected may include the:

  • Search query
  • Frequency of submission over a period of time
  • Dates and times of submission
  • Language of the search query, and/or
  • Other information associated with the search query

Evaluating Candidate Refinements

The patent describes how these entities might be merged with the original query that refinements are being selected for, and how they might be selected from among all of the candidates. Some of this evaluation might involve looking at:

  • A number of words in the candidate
  • An amount of overlap between the candidate and an entity
  • An amount of overlap between the candidate and the search query
  • A number of times the candidate appears in the search logs
  • A sum of the IDF of all the terms in the candidate
  • An IDF of the most unique term in the candidate

The top 8 or so refinements might be selected to be shown in search engine results.
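A rough sketch of that final selection step follows. The list above names the signals but not how they are weighted, so the unweighted sum used here is a placeholder assumption, as are the `Candidate` fields and the function names. Only three of the listed signals are used, to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    log_count: int   # times the candidate appears in the search logs
    idf_sum: float   # sum of the IDF of all terms in the candidate


def overlap(a, b):
    """Number of distinct lowercase words shared by two strings."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def top_refinements(candidates, query, limit=8):
    """Rank candidates by a placeholder score and keep the best few."""
    def score(c):
        # Hypothetical, unweighted combination of three listed signals.
        return c.log_count + c.idf_sum + overlap(c.text, query)

    ranked = sorted(candidates, key=score, reverse=True)
    return [c.text for c in ranked[:limit]]


cands = [
    Candidate("mona lisa louvre", 50, 3.0),
    Candidate("mona lisa smile", 10, 2.0),
    Candidate("mona lisa leonardo da vinci", 80, 5.0),
]
print(top_refinements(cands, "mona lisa", limit=2))
```

A real system would presumably tune the weights for each signal; the point here is only that the candidates are scored, sorted, and cut off at a small number.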

Conclusion

It’s hard to say how much of what Ori Allon developed is still in use in generating the query refinements that Google shows for queries. Ori Allon is no longer with Google, but it’s possible that others at Google have worked to improve this query refinement approach.

The “entities” described in this patent feel just a little different than the named entities described in Google’s knowledge base approach to search. I’ve described some of the changes we’ve seen in search from keyword mapping to phrase-based indexing to concept matching in SEO is Undead Again (Profiles, Phrases, Entities, and Language Models). If we think of named entities as specific “people, places, and things,” then maybe entities found in documents to create query refinements aren’t so different, though.

If you were to take a set of query refinements suggested for a particular query, and start looking through the pages returned for that query, you might start seeing some of the entities that were used to generate those refinements.

Graph Search Goes Bust

A search engine that needs instructions is doomed to fail. This has been my feeling since turning on Facebook’s new Graph Search. With all of their money and know-how, Facebook couldn’t even make a dent in Google’s search empire.

I really felt that applying a real search option to Facebook’s social data would generate the type of results and platform that would usher in a new form of search, one that could bring real answers back for what the world is thinking.  Facebook’s Graph Search is nothing but a fancy organizational tool for a user’s friends and their circles of friends.

Yet I can deal with that. What bothers me more is the stiffness of the search. Queries need to be structured in a certain way, and the data they use is essentially the data that we mark down ourselves when building a Facebook profile.

I tested out a simple query: “Which Friends of Mine are Jewish?” I was surprised that hardly any came back, even though when I go through my friend list I can easily count 90% who are of the Jewish faith.

So what went wrong? It seems that Facebook is not using action-based data, such as the stories, posts, and images that a person likes. They are in a sense filtering a user’s pre-picked interests and profile, data which is surface-level at best.

I am hoping that the guys (and girls) out in Palo Alto begin to dig down and make a truly sophisticated search engine based on more than just vertical matches. After all, we expect a whole lot more.

Search Rankings and Understanding the Deeper Meaning

All complex search engines, whether Google or Bing, and even shopping platforms like Amazon and eBay, use search hierarchy as a way to convey meaning. Ordered search hierarchy is the best way to differentiate between useful results. We have grown used to seeing YouTube videos at a certain place for related keywords, or Wikipedia dominating search queries that encompass named entities, so much so that when they are not there our brains tell us that something is off. Structures, at their most basic level, create meaning and definition within a set of data. Shopping platforms like Amazon and eBay rely on ordered search results as well to give rich meaning to a particular search.

The greater the definition and the clearer the parameters, the more meaning there is behind any given search query. This is ultimately the idea behind Google’s Knowledge Graph as well as Amazon’s “Buy Box.” Usefulness in search creates higher CTR and a more valuable and informative search experience.

Ultimately, Google and others know that the best searches are those that are useful, and essentially our role as search experts is to fit our information into a search engine’s set of parameters.



  • David Mark
    David Mark is the Founder and CEO of SERPIntelligence a boutique intelligence company focusing on data enhanced Lead Generation.

