Friday, October 3, 2008

Synonyms for your Custom Search Engine

Posted by: Victor Wang and Bartlomiej Niechwiej, Software Engineers

With our launch of Google Site Search, we added a new feature to the Custom Search platform: custom synonyms. Here's how this feature can be used to improve the quality of your Custom Search Engine (CSE).

How can custom synonyms help?
Synonyms help by finding documents that contain relevant related terms and ranking them higher. They help alleviate the mismatch that often occurs between the queries users type and the actual words, phrases and concepts used in the documents being searched. Custom Search, of course, automatically takes advantage of the synonyms used in Google Web search. In addition, Custom Search goes one step further: we allow you to explicitly define custom synonyms that are specific to your web site, community or topic of interest.

To illustrate situations where CSE custom synonyms can help, we created two CSEs that both search content from the Palo Alto Medical Foundation (PAMF). The first CSE does not have custom synonyms enabled, while the second CSE has a few custom synonyms enabled:


Terminology: The queries people use sometimes don't match up with the words and phrases used in the content being searched. During allergy season, for example, many people look for information on "hayfever", but the results without synonyms aren't that great since the web pages we're searching across don't necessarily use this specific term. However, if the technical phrase "allergic rhinitis" is added as a synonym for "hayfever", the results with the synonym are far better.

Acronyms: Acronyms often stand for different terms in different contexts. This is especially true within organizations, where acronyms are used frequently. In such cases, synonyms may improve retrieval. For example, "PAMF" stands for "Palo Alto Medical Foundation", and adding this synonym improves search results: searching for "PAMF" without synonyms gives only a few relevant results, while the CSE with the synonym returns more relevant results.

Community: Within specific target user communities, words or phrases have different usage and significance. When a patient is looking for "hearing doctor", adding the synonym "audiologist" to the CSE provides much better results while the results in the CSE without synonyms are not optimal. Conversely, if a doctor is searching for "somnambulism", she finds no results at all in the CSE without synonyms, but much better results via addition of the synonym "sleepwalking" to the CSE. Synonyms can therefore be used to improve the experience of specific classes of users of a web site.

How can I add synonyms to my Custom Search Engine?
The CSE administrator can control the set of synonyms used by uploading a synonym dictionary that is specific to the domain and website. The synonym dictionary can include alternate words or phrases for common search queries. The following steps show how to add the synonyms for the Palo Alto Medical Foundation CSE.


  1. Download the existing CSE context file through "Control panel"->Advanced->"Download context"

  2. Add custom synonyms to your search engine. The synonym dictionary is uploaded as part of the context XML file. Here are the synonyms we added for the above examples:

    <CustomSearchEngine>
      <Title>...</Title>
      <Description>...</Description>
      <Context>
        <BackgroundLabels>...</BackgroundLabels>

        <Synonyms>
          <SynonymEntry word="hearing doctor">
            <Synonym>audiologist</Synonym>
          </SynonymEntry>
          <SynonymEntry word="hayfever">
            <Synonym>allergic rhinitis</Synonym>
          </SynonymEntry>
          <SynonymEntry word="somnambulism">
            <Synonym>sleepwalking</Synonym>
          </SynonymEntry>
          <SynonymEntry word="pamf">
            <Synonym>Palo Alto Medical Foundation</Synonym>
          </SynonymEntry>
        </Synonyms>

      </Context>
    </CustomSearchEngine>

  3. Upload the context file through "Control panel"->Advanced->"Upload context"
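
The edit in step 2 can also be scripted rather than done by hand. The following is a minimal sketch, not an official tool: it assumes the downloaded context file has the shape shown in the listing above, and the helper names `add_synonyms` and `demo` are hypothetical. The capitalized element names follow the "SynonymEntry" spelling used in the notes in this post; XML is case-sensitive, so adjust the constants to match whatever your downloaded file actually contains.

```python
import xml.etree.ElementTree as ET

# Assumed element names; change them to match your downloaded context file.
CONTEXT, SYNONYMS, ENTRY, SYNONYM = "Context", "Synonyms", "SynonymEntry", "Synonym"


def add_synonyms(root, new_entries):
    """Merge {word: [synonym, ...]} into the synonyms block under the context element."""
    context = root.find(CONTEXT)
    if context is None:
        raise ValueError("no <%s> element found" % CONTEXT)
    block = context.find(SYNONYMS)
    if block is None:
        # No synonym dictionary yet: create an empty one inside the context.
        block = ET.SubElement(context, SYNONYMS)
    # Index existing entries by word so we extend them instead of duplicating.
    by_word = {entry.get("word"): entry for entry in block.findall(ENTRY)}
    for word, synonyms in new_entries.items():
        entry = by_word.get(word)
        if entry is None:
            entry = ET.SubElement(block, ENTRY, {"word": word})
            by_word[word] = entry
        existing = {s.text for s in entry.findall(SYNONYM)}
        for syn in synonyms:
            if syn not in existing:
                ET.SubElement(entry, SYNONYM).text = syn
                existing.add(syn)
    return root


def demo():
    # Stand-in for a downloaded context file; a real one has more elements.
    root = ET.fromstring(
        "<CustomSearchEngine><Context><BackgroundLabels/></Context></CustomSearchEngine>"
    )
    add_synonyms(root, {
        "hayfever": ["allergic rhinitis"],
        "pamf": ["Palo Alto Medical Foundation"],
    })
    return ET.tostring(root, encoding="unicode")
```

For a real file, you would load it with `ET.parse(...)`, pass `tree.getroot()` to `add_synonyms`, write the tree back out, and then upload the result as in step 3. Review the output before uploading, since this sketch does not preserve comments or the original formatting.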


A few notes:


  • CSE synonyms are unidirectional, not bidirectional. Thus, a context file with
    <SynonymEntry word="migraine"><Synonym>headache</Synonym></SynonymEntry>
    defines "headache" to be a synonym for "migraine". However, if you also want "migraine" to be considered a synonym for "headache", you need to add a separate SynonymEntry to the XML, as follows:

    <SynonymEntry word="migraine"><Synonym>headache</Synonym></SynonymEntry>
    <SynonymEntry word="headache"><Synonym>migraine</Synonym></SynonymEntry>

  • In the current version, the synonym dictionary can only be uploaded/downloaded as a part of the context XML file. We hope to make this much easier in the future.

  • We allow up to 500 individual synonyms for a given CSE.

  • Each word can have no more than 10 synonyms. If there are multiple synonyms, the query will be expanded to include all synonyms uniformly.
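
The notes above can be checked before you upload. The sketch below is illustrative only. It keeps the dictionary as a plain Python mapping from each word to its list of synonyms, mirrors one-way entries when you want both directions, and checks the two stated limits. The function names are hypothetical, and the 500 limit is assumed to count individual synonym values across all entries. `expand` is a toy model of one-way expansion, not a description of how the search engine matches or ranks results.

```python
MAX_TOTAL_SYNONYMS = 500  # per-CSE limit stated in the notes
MAX_PER_WORD = 10         # per-word limit stated in the notes


def make_bidirectional(entries):
    """Return a copy of {word: [synonyms]} with every pair mirrored."""
    result = {word: list(syns) for word, syns in entries.items()}
    for word, syns in entries.items():
        for syn in syns:
            reverse = result.setdefault(syn, [])
            if word not in reverse:
                reverse.append(word)
    return result


def check_limits(entries):
    """Return a list of problems; an empty list means within both limits."""
    problems = []
    total = sum(len(syns) for syns in entries.values())
    if total > MAX_TOTAL_SYNONYMS:
        problems.append("%d synonyms in total (limit %d)" % (total, MAX_TOTAL_SYNONYMS))
    for word, syns in entries.items():
        if len(syns) > MAX_PER_WORD:
            problems.append("%r has %d synonyms (limit %d)" % (word, len(syns), MAX_PER_WORD))
    return problems


def expand(query, entries):
    """Toy one-way expansion: the query itself plus all of its listed synonyms."""
    return [query] + entries.get(query, [])


one_way = {"migraine": ["headache"]}
both_ways = make_bidirectional(one_way)
```

With `one_way`, `expand("headache", one_way)` yields only the query itself, which is the unidirectional behavior described in the first note. With `both_ways`, the same query also picks up "migraine". Note that mirroring adds synonyms, so it is worth running `check_limits` on the mirrored dictionary rather than the original.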


If you are using custom synonyms in your CSE, we hope to get feedback from you about what improvements we can make.


