• Contact

    info @ searchengineisrael.com

Combining Static Data with Fluid Data

In the hunt for accurate micro-segments, good firms use all the data available out there.  Of course, there is data and there is data.  One of the most important things to understand about much of the data we use to evaluate certain premises is that it is what I would call static: it comes from governments or other open data sources.  These data sets help us evaluate population segments and many of their important parameters.  And although this raw data is important, much of the marketing we do, whether search related or social ad placement, relies on data gleaned in real time.  Many marketing firms use population segmentation to understand potential marketing groups better. This is an excellent approach, and it is something I have used countless times, but something important that is often overlooked is fluid, or reactionary, data, which is usually only collected through targeted campaigns.

Last week I was speaking to a very senior executive at a large financial corporation here in Israel.  He was very excited to tell me that they had begun to analyze and organize all of their clients’ data in their internal system.  Of course I wasted no time in explaining that mining already-collected static data is essential, but it is only the beginning.

We can only understand how people behave online through what we actually test and learn in real time.  This is a process that combines the assumptions gleaned from analyzing collected static data with real time reactions and data mined from actual micro-segments. By combining both, outreach is much more effective.

Data is the Workaround for Corporate Ad Platforms

One of the methodologies we pride ourselves on at SERPIntelligence is exploring other ways to use data to help us hyper-focus on ideal segments. This serves two purposes.  The first, of course, is a higher conversion rate along with an increased degree of crowd influence.  The second is usually an afterthought in the online marketing industry: ad spend.

Large platforms like Facebook and Google do not share all of their data with their customers, in an obvious play to increase bidding during the ad auction.  There is, of course, a balance between the targeted data they offer in order to maximize ROI and extracting higher bids from interested parties.

Below is part of an abstract to a paper that deals with this very issue:

The holy grail of online advertising is to target users with ads matched to their needs with such precision that the users respond to the ads, thereby increasing both advertisers’ and users’ value. The current approach to this challenge utilizes information about the users: their gender, their location, the websites they have visited before, and so on.  Incorporating this data in ad auctions poses an economic challenge: can this be done in a way that the auctioneer’s revenue does not decrease (at least on average)?

– Ad Auctions with Data, by Hu Fu, Patrick Jordan, Mohammad Mahdian, Uri Nadav, Inbal Talgam-Cohen, and Sergei Vassilvitskii

The best way around many of the issues with large ad platforms is drilling down on alternative data sources.  Researching geo-located parameters, such as comparing localities in a given country against wealth trends in those same areas, is just one of many ways to narrow and learn about selling opportunities for products.

The key to high conversion rates while saving money is data, and of course the research that gets you the right kind of data.

What’s Next for Search?

Search marketing has been in quite an upheaval for some time.  Many companies have approached me to help them get a boost on search.  Unfortunately for them, their paradigm has yet to shift from vertical search marketing (like ranking higher) to something we have coined cross-ecosystem lead generation.  When I founded SERPIntelligence, I sensed that the shift search and social were undergoing was far more than a few technical changes to Google’s search algorithm; rather, an entire thought model was being buried.

What is this thought model you ask?

SEO. No, I am not a believer in the “SEO is dead” mantra, but SEO as a valuable vertical unto itself is quite dead.  Just think: most people who search for shopping keywords do so on Amazon, not Google.  Google itself acknowledged this shift over time as it began to focus on knowledge-based searches. With this shift, SEO as a valuable ROI service is over.

What does that leave people in the industry? Quite interestingly, it leaves those of us who want to think outside the box in a very good position.  For example, a large bank recently approached us to run their failing SEO campaign.  We explained to them that SEO would not help them reach wealthy people interested in private banking or wealth management.

We obviously pushed them to our boutique lead generation service, which targets select audiences and delivers focused results.

The answer about using search is simple: these high-end leads still use Google to cement industry proof, and therein lies the power of thinking in a cross-ecosystem strategy.  We make sure to drive our clients high for the words that industry leaders search for, in order to cement the brand into the minds of those who matter: key influencers.  Essentially, we are going with Google, not fighting a dying wave.

I suggest you do the same thing.

How Google is Generating Query Refinements the Orion Way

(The post below is important for many reasons. Bill is one of the best in the industry at describing how patent research leads to a refined and focused approach to SEO. The post also centers on an Israeli, Ori Allon, who has been a huge part of many revolutions within search. Read the original.)

By Bill Slawski

In 2006, Google battled Yahoo! and Microsoft for an algorithm developed by an Israeli Ph.D. student in Australia. The algorithm had a semantic element to it, and advanced Google in an algorithm arms race between the search giants (one of which doesn’t even have a search engine of its own now). We’ve seen the technology described in terms of how it is displayed in search results, but not how it does what it does. Until now.

Google was awarded a patent this week that looks at search results for specific queries and the entities that appear within them, to produce query refinements. This invention is from Google, but the lead inventor behind it was part of a bidding war between Google, Yahoo!, and Microsoft. In 2009, the breakthrough was made public by Google in the form of Orion technology.

The Orion approach involved both extended snippets for queries (three or more lines of descriptive snippet instead of two for some longer queries), and “more and better query refinements.” How this technology is displayed is described in a Google Official Blog post from March 24, 2009 titled Two new improvements to Google results pages.


One of the co-authors of that post is Ori Allon, who developed the Orion Technology as a student in Australia. (Ori has been busy since then, with stints at Google and Twitter, and a new project on his own.)

If you do some of the searches at Google described in that blog post, you’ll see both extended snippets and a good number of suggested query refinements. Try a search for [earth's rotation axis tilt and distance from sun] (without the brackets), as an example. Three of the top 10 results from my search have three lines of snippets, and another has four lines. Here’s an example of one of those extended snippets:

[Screenshot: a three-line search result for a query about Earth's rotational axis and distance from the Sun.]

The patent provides us with a better look at how the Orion technology actually works:

Refining search queries
Invented by Ori Allon, Ugo Di Girolamo, Tomer Shmiel, Alexandre Petcherski, and Tzvika Hartman
Assigned to Google
US Patent 8,392,443
Granted March 5, 2013
Filed: March 17, 2010


Methods, systems, and apparatus, including computer program products, for refining search queries.

A method includes:

  • Obtaining a submitted search query, and in response to obtaining the search query:
  • Obtaining search results responsive to the search query;
  • Selecting a document from a group of documents identified by the search results;
  • Generating from a subset of one or more entities associated with the document one or more candidates for refined search queries, including:
    • Identifying one or more terms in the search query, where the one or more terms occur in the search query in a particular order relative to each other, and
    • Combining the one or more terms with the entity to generate a candidate, where the one or more terms occur in the particular order relative to each other; and
  • Identifying one or more of the candidates as being refined search queries for providing with the search results.
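The combining step above can be sketched in a few lines. This is a toy illustration, not the patent’s actual implementation; the function name and data shapes are mine:

```python
def generate_candidates(query_terms, entities):
    """Combine the query's terms (kept in their original order) with each
    entity found in a top-ranked document, producing candidate refined
    queries. Purely illustrative: the patent describes the idea, not code."""
    candidates = []
    for entity in entities:
        # Append the entity after the ordered query terms to form a candidate.
        candidate = " ".join(query_terms + [entity])
        candidates.append(candidate)
    return candidates

# Terms from the query [mona lisa] combined with entities found
# in the documents that rank for it.
print(generate_candidates(["mona", "lisa"], ["louvre", "leonardo da vinci"]))
# → ['mona lisa louvre', 'mona lisa leonardo da vinci']
```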

Generating Query Refinements

The patent itself focuses upon query refinements rather than upon extended snippets, and my guess is that there’s probably another unpublished patent out there focusing upon those extended snippets. But the query refinement approach is interesting in a few ways.

It refers to entities found in documents that rank for specific queries (a co-occurrence of entities), and those entities might be used in combination with words from the original query (or synonyms of those words) to provide query refinements.

The “entities” described in this patent sound similar to the kinds of entities that we see in Google’s knowledge base approach, though there might be some differences.


Pages returned in a search for a query could be associated with specific entities, which are included in the documents returned for that search.

Entities make up a “meaningful, self-contained concept.”

An entity could be a single word, a phrase, or other character strings. An entity might be a sequence of one or more characters that show up in previously-submitted search queries at a frequency that is greater than a certain threshold of searches over a certain period of time. A document could be associated with more than one entity.
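The frequency criterion can be read as a simple filter over a query log. The sketch below treats whole past queries as the candidate character sequences and uses an arbitrary threshold; a real system would consider n-grams and a time window:

```python
from collections import Counter

def extract_entities(query_log, min_count=2):
    """Treat any string that appears in the query log at least min_count
    times as an entity. The threshold and the use of whole queries (rather
    than arbitrary character sequences) are simplifying assumptions."""
    counts = Counter(query_log)
    return {q for q, c in counts.items() if c >= min_count}

log = ["louvre", "mona lisa", "louvre", "renaissance", "louvre", "mona lisa"]
print(extract_entities(log))  # a set containing 'louvre' and 'mona lisa'
```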


Someone searches for [Mona Lisa] and Google returns search results pages in response. A number of other entities might appear in those search results, such as “Leonardo da Vinci”, “Louvre”, “renaissance”, and others.

These entities might be passed along to a query refinement server as parts of candidate query refinements.

Scoring Entities Associated with Documents

Refined search queries can be created in real time, because the entities that are used to generate those queries are associated with documents identified by search results in response to the query.

Entities associated with a document can also be previously-submitted search queries for which search results that identify the document have been returned more than a certain number of times.

Inverse Document Frequency (IDF) Score

Part of the process described in this patent involves filtering entities as possible query refinements that the search engine might list, to get just a small number of refinements.

An entity might have an IDF score, which is based upon counting the number of documents being searched “which contain (or are indexed by) the term in question. The intuition was that a query term which occurs in many documents is not a good discriminator, and should be given less weight than one which occurs in few documents.” See: Understanding Inverse Document Frequency: On theoretical arguments for IDF (pdf).

The score for the entity may be based on the sum of the IDFs of each word in the entity. The score of the entity “Mona Lisa” might be calculated by taking the sum of the IDF of “Mona” and the IDF of “Lisa”. (The British rock band from the ’80s, “The The”, might not have the greatest IDF score in the world under that approach.)
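A minimal sketch of that sum-of-IDFs score, assuming documents are represented as sets of words (the corpus and function names are mine):

```python
import math

def idf(term, documents):
    """Standard inverse document frequency: log(N / document frequency)."""
    df = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / df) if df else 0.0

def entity_score(entity, documents):
    """Score an entity as the sum of the IDFs of its words, e.g. the
    score of "Mona Lisa" is IDF("Mona") + IDF("Lisa")."""
    return sum(idf(word, documents) for word in entity.split())

docs = [
    {"mona", "lisa", "louvre"},
    {"louvre", "paris"},
    {"mona", "lisa", "painting"},
    {"renaissance", "art"},
]
print(entity_score("mona lisa", docs))  # → ~1.386 (2 × ln 2)
```

In a realistic corpus a stopword like “the” appears in nearly every document, so its IDF approaches zero, which is the point of the “The The” joke above.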

As the IDF score of an entity increases, the likelihood that the entity is important or relevant to a document responsive to the search query also increases. Therefore, entities of a document with a higher score are also ranked higher than entities with a lower score.

Co-occurrence Score

If you look through the document, you may see an entity appear more than once. A score for an entity can be created from “determining a co-occurrence relationship between the entity and a search query.”

If an entity appears in a document more than once, its importance to the document is probably higher, and that score might be used with, or incorporated into, the IDF score.
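One simple way to fold repeated appearances into a score, as a sketch only (the patent gives no weights, so the 10%-per-extra-occurrence figure here is an arbitrary assumption):

```python
def cooccurrence_boost(entity, document_text, base_score, per_repeat=0.1):
    """Raise an entity's score when it appears in the document more than
    once. The per_repeat weighting is illustrative, not from the patent."""
    occurrences = document_text.lower().count(entity.lower())
    extra = max(occurrences - 1, 0)
    return base_score * (1 + per_repeat * extra)

text = "The Mona Lisa hangs in the Louvre. The Mona Lisa was painted by da Vinci."
print(cooccurrence_boost("mona lisa", text, base_score=1.0))  # → 1.1
```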

Query Click and Dwell Time Score

The score for an entity could also be increased as the number of times the entity is found in a previously-submitted query increases.

Every selection of the document within the search results for the previously-submitted query is counted as a click.  The amount of time someone views or “dwells” on the document may also be tracked. The more time they spend there (i.e., a long click), the more relevant the document might be considered for that previous query.

If they don’t spend much time there, that might be perceived as a lack of relevance of the document.

The score for an entity can increase based upon long clicks, and/or based on an increase of the ratio of long clicks to total clicks for queries which use that entity.
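That ratio of long clicks to total clicks can be sketched as follows. The 30-second cutoff is an assumption for illustration; the patent does not specify what counts as a long click:

```python
def click_quality_score(dwell_times, long_click_threshold=30.0):
    """Return the ratio of long clicks (dwell time at or above the
    threshold, in seconds) to total clicks. Threshold is assumed."""
    if not dwell_times:
        return 0.0
    long_clicks = sum(1 for t in dwell_times if t >= long_click_threshold)
    return long_clicks / len(dwell_times)

# Three of these four clicks dwelled past the threshold.
print(click_quality_score([45.0, 120.0, 5.0, 60.0]))  # → 0.75
```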

Candidate Query Refinements used in Titles

The score of an entity can be increased if the entity is found in the title of a document.

Previously Submitted Queries

This score for an entity can increase with the number of times it is found in previously-submitted queries, with the number of documents it is currently found in, with the number of times the entity is included in the titles of documents, and also as the number of terms (or tokens) in the entity increases.

Other Collected Information

It’s possible that some other information might be used to score an entity that might be used as part of a candidate query. Some of the information collected may include the:

  • Search query
  • Frequency of submission over a period of time
  • Dates and times of submission
  • Language of the search query, and/or
  • Other information associated with the search query

Evaluating Candidate Refinements

The patent describes how these entities might be merged with the original query that refinements are being selected for, and how they might be selected from among all of the candidates. Some of this evaluation might involve looking at:

  • A number of words in the candidate
  • An amount of overlap between the candidate and an entity
  • An amount of overlap between the candidate and the search query
  • A number of times the candidate appears in the search logs
  • A sum of the IDF of all the terms in the candidate
  • An IDF of the most unique term in the candidate

The top 8 or so refinements might be selected to be shown in search engine results.
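A toy version of this evaluation step, combining a few of the listed signals into a single score and keeping the top eight. The weights, the score formula, and all the names here are illustrative guesses, not the patent’s actual scoring function:

```python
import math

def evaluate_candidates(candidates, query, search_log_counts, idf_table, top_n=8):
    """Rank candidate refinements using stand-ins for the patent's signals:
    overlap with the query, search-log frequency, summed IDF, and word
    count. Weights are arbitrary illustrative choices."""
    query_words = set(query.split())

    def score(candidate):
        words = candidate.split()
        overlap = len(set(words) & query_words)          # overlap with the query
        log_count = search_log_counts.get(candidate, 0)  # times seen in the logs
        idf_sum = sum(idf_table.get(w, 0.0) for w in words)
        # Favor candidates that extend the query, appear in the logs,
        # and contain discriminative (high-IDF) terms; lightly penalize length.
        return overlap + math.log1p(log_count) + idf_sum - 0.1 * len(words)

    return sorted(candidates, key=score, reverse=True)[:top_n]

refinements = evaluate_candidates(
    ["mona lisa louvre", "mona lisa painting"],
    "mona lisa",
    search_log_counts={"mona lisa louvre": 10},
    idf_table={"louvre": 2.0, "painting": 1.0},
)
print(refinements[0])  # → mona lisa louvre
```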


It’s hard to say how much of what Ori Allon developed is still in use in generating the query refinements that Google shows for queries. Ori Allon is no longer with Google, but it’s possible that others at Google have worked to improve this query refinement approach.

The “entities” described in this patent feel just a little different from the named entities described in Google’s knowledge base approach to search. I’ve described some of the changes we’ve seen in search from keyword mapping to phrase-based indexing to concept matching in SEO is Undead Again (Profiles, Phrases, Entities, and Language Models). If we think of named entities as specific “people, places, and things,” then maybe entities found in documents to create query refinements aren’t so different, though.

If you were to take a set of query refinements suggested for a particular query, and start looking through the pages returned for that query, you might start seeing some of the entities that were used to generate those refinements.

David Mark is the Founder and CEO of SERPIntelligence, a boutique intelligence company focusing on data-enhanced lead generation.
