This article was originally published in the Fall 2012 PIUG (Patent Information User Group) Newsletter.
In my role as a partner at AcclaimIP.com, I talk every day to patent searchers whose skills run the gamut from novices to salty-dog, gray-beard veterans with 20+ years of patent research behind them.
As a result, I dish out lots of advice on how to solve specific search problems. A few techniques consistently resonate with our clients and almost always generate the proverbial “Ah ha!” moment. Sometimes what seems like a basic concept can trip up even the most experienced search specialist, who may not realize how the search engine is really working under the hood.
My favorite topic is keyword searching with the Boolean OR operator. Now, I know it may sound like an unexciting topic if you are an experienced patent searcher, but in the following context, it may just blow you away, and possibly change the way you approach your job.
A Quick Review of Boolean Operators
You already know Boolean logic, right? You probably first learned it in 6th grade. It was developed in the 1850s, so what could possibly be new?
First, a little review. As you probably know, there are three common Boolean operators you use when patent searching: AND, OR and NOT. Here are some simple examples:
- touch AND screen –> This search finds patents containing both terms. AND always narrows your search.
- touch OR screen –> This search finds patents that contain either term. OR always expands your search.
- touch NOT screen –> This search finds patents that contain the term touch but not screen. NOT always excludes documents with the specified term.
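If it helps to see the three operators as plain set operations, here is a minimal sketch. The patent IDs and text are made up for illustration, and a real search engine works from an index rather than scanning text like this.

```python
# Toy corpus: made-up patent IDs mapped to made-up text.
docs = {
    "P1": "touch screen with capacitive sensor",
    "P2": "touch sensitive keypad",
    "P3": "projection screen mount",
    "P4": "antenna for mobile handset",
}

def matching(term):
    """Return the set of document IDs whose text contains the term."""
    return {doc_id for doc_id, text in docs.items() if term in text.split()}

touch = matching("touch")
screen = matching("screen")

and_hits = touch & screen   # both terms present
or_hits = touch | screen    # either term present
not_hits = touch - screen   # touch present, screen absent

print(sorted(and_hits))  # ['P1']
print(sorted(or_hits))   # ['P1', 'P2', 'P3']
print(sorted(not_hits))  # ['P2']
```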
The natural tendency for most patent searchers is to overuse, abuse and misuse the AND operator, because OR expands the search and returns far too many results when you have a large set of terms that may occur in the documents you want to find.
The problem: AND excludes patents that may be spot-on for the technology, but don’t happen to contain one or more of the terms you AND’ed in your query.
Realizing this, you might spend hours iterating through hundreds of combinations of terms in your queries to make sure you haven’t missed anything. This method is undeniably the wrong approach. Let me prove it to you…
Say you identify 20 terms that are commonly found in documents that cover the technology you are researching. Did you know that there are more than a million ways you can combine 20 terms? Counting every non-empty subset of the terms as one AND query gives 2^20 - 1 = 1,048,575 combinations. For complete coverage of the subject matter, you would spend about 10 years executing these searches around the clock if each took only 5 minutes, which is optimistic. This is, of course, a ridiculous extreme, but it illustrates my main point:
There is no possible way you can give complete coverage to a search on a topic using the Boolean AND operator.
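The scale is easy to check with a few lines of arithmetic. This sketch assumes the simplest way of counting, where every non-empty subset of the terms is one possible AND query, and it assumes searching nonstop with no breaks. The exact figure depends on what you count as a distinct combination.

```python
n_terms = 20
minutes_per_search = 5

# Assumption: every non-empty subset of the terms is one possible AND query.
combinations = 2 ** n_terms - 1

total_minutes = combinations * minutes_per_search
years_nonstop = total_minutes / (60 * 24 * 365)

print(combinations)             # 1048575
print(round(years_nonstop, 1))  # 10.0
```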
You could have used the OR operator and found the same documents with only a handful of searches! Here is how:
Relevancy Sorting Algorithms
All modern patent search engines have sophisticated relevancy algorithms under the hood. Construct your query containing 20 or more terms, and OR them together. Of course, you’ll likely get well over a million documents that contain at least one of the terms in your big OR’ed query, BUT the search engine’s relevancy algorithm will return the best matches first. Just make sure your default sort is by the RELEVANCY column.
The best matches are those documents that contain the most term hits with the highest frequency. It is likely that only the first hundred or so documents will be truly relevant. Grab these documents, save them to a folder and ignore the other million or so documents returned.
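To make "most term hits with the highest frequency" concrete, here is a toy ranking sketch. The scoring rule (distinct terms hit first, total occurrences second) is my own simplification for illustration, not the formula any particular search engine uses, and the documents are invented.

```python
from collections import Counter

# Made-up documents for illustration.
docs = {
    "P1": "touch screen display with capacitance sensing touch layer",
    "P2": "display mount for a monitor",
    "P3": "antenna housing",
    "P4": "touch panel with touch controller",
}
query_terms = ["touch", "screen", "display", "capacitance"]

def score(text, terms):
    """Toy relevancy: (number of distinct terms hit, total occurrences)."""
    counts = Counter(text.split())
    distinct = sum(1 for t in terms if counts[t] > 0)
    total = sum(counts[t] for t in terms)
    return (distinct, total)

scored = {doc_id: score(text, query_terms) for doc_id, text in docs.items()}

# OR semantics: keep any document with at least one hit, best matches first.
ranked = sorted(
    (d for d, s in scored.items() if s[0] > 0),
    key=lambda d: scored[d],
    reverse=True,
)
print(ranked)  # ['P1', 'P4', 'P2']
```

P3 contains none of the terms, so it is not returned at all. Everything else is returned, but the document that hits the most terms floats to the top.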
Certainly, you’ll iterate and refine your OR’ed searches too. The art of the search is to decide which terms to require by using the Boolean AND, and which terms to make optional by using the Boolean OR so that your massive OR’ed query returns the best results at the top of your search results.
Of Course, AND Has Its Place
One of the most effective uses of the AND operator is to require at least one of several synonyms. For example:
Touch AND (screen OR display OR monitor) OR capacitance OR polarize OR etc…
…where the terms screen, display, monitor are commonly used synonyms. This query says, “Require touch, and require at least one of the 3 synonyms, and favor the rest of the terms in the sorting algorithm, but don’t require them.”
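Here is a small sketch of that reading of the query: one required term, one required synonym group, and optional terms that only influence the ordering. It models the behavior described above with invented documents. How a given engine parses a mix of AND and OR without extra parentheses varies, so check your platform's documentation.

```python
# Made-up documents for illustration.
docs = {
    "P1": "touch display using capacitance sensing",
    "P2": "touch screen",
    "P3": "touch pad for laptop",
    "P4": "capacitance monitor circuit",
}
required = ["touch"]
synonyms = ["screen", "display", "monitor"]  # at least one must appear
optional = ["capacitance", "polarize"]       # only affect ranking

def evaluate(text):
    """Return None if the document is excluded, else its optional-term score."""
    words = set(text.split())
    if not all(t in words for t in required):
        return None
    if not any(t in words for t in synonyms):
        return None
    return sum(1 for t in optional if t in words)

results = {doc_id: evaluate(text) for doc_id, text in docs.items()}
hits = sorted(
    (d for d, s in results.items() if s is not None),
    key=lambda d: results[d],
    reverse=True,
)
print(hits)  # ['P1', 'P2']
```

P3 is dropped because it has no synonym, and P4 is dropped because it lacks touch, even though it contains an optional term.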
Improve Your OR’ed Queries With Term Weighting
Another technique you can use to iterate your search is to give each of your terms a weighting factor. I think every commercial search engine supports term weighting. Keep in mind, weighting a term will not affect the search engine’s recall. In other words, if you weight “touch” higher than “polarize” you’ll get the exact same number of results, but it will change the sort order and favor patents with the terms you’ve weighted higher.
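The sketch below illustrates the point with invented documents and a deliberately simple score (weight times term frequency, with positive weights). Real engines use their own scoring and their own weighting syntax. Lucene-style query syntax, for example, writes a boost as touch^3.

```python
# Made-up documents for illustration.
docs = {
    "P1": "polarize polarize polarize filter",
    "P2": "touch touch sensor",
    "P3": "antenna",
}

def rank(weights):
    """Rank documents by the weighted count of query terms (toy scoring)."""
    scores = {}
    for doc_id, text in docs.items():
        words = text.split()
        s = sum(w * words.count(term) for term, w in weights.items())
        if s > 0:  # OR semantics: at least one term must appear
            scores[doc_id] = s
    return sorted(scores, key=scores.get, reverse=True)

equal = rank({"touch": 1.0, "polarize": 1.0})
boosted = rank({"touch": 3.0, "polarize": 1.0})

print(equal)    # ['P1', 'P2']
print(boosted)  # ['P2', 'P1']
```

Both runs return the same two documents. Only the order changes.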
A Repeatable Process
I recommend a simple step-by-step process that you can use on any technology even if you are not a subject matter expert in the field.
- Find a dozen or more exemplary patents in the technological field you are researching. This is typically not too difficult.
- Study the keywords in those documents, looking only at the full text and claims. Your search engine likely has keyword analyzer tools.
- Pick the top N terms that you discover occurring in the documents with a reasonable frequency, where N is usually between 20 and 30 terms or strings.
- Decide if any terms must exist in the documents, or if any are optional, or can be grouped as required synonyms.
- Make your best intelligent guess at a well-constructed query and run it.
- Test the query against your original exemplary patents. Are they in the top 100 documents?
- Rinse, repeat and refine.
- Save the search so you can apply it again in the future.
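Two of these steps lend themselves to a quick sketch: picking the most frequent terms from your exemplary patents, and checking whether those patents land in the top 100 results. The function names, stopword list and sample data below are all hypothetical. In practice your search platform's keyword analyzer and result export would supply the real inputs.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "the", "of", "for", "with", "and", "in"}

def top_terms(exemplar_texts, n):
    """Return the n most frequent non-stopword terms across the exemplars."""
    counts = Counter()
    for text in exemplar_texts:
        counts.update(w for w in text.lower().split() if w not in STOPWORDS)
    return [term for term, _ in counts.most_common(n)]

def exemplar_coverage(ranked_ids, exemplar_ids, top_n=100):
    """Split the exemplars into those found in the top_n results and those missed."""
    top = set(ranked_ids[:top_n])
    found = [e for e in exemplar_ids if e in top]
    missed = [e for e in exemplar_ids if e not in top]
    return found, missed

# Made-up exemplar text and a made-up ranked result list.
texts = ["antenna for a handset antenna", "handset antenna with ground plane"]
terms = top_terms(texts, 2)
print(terms)  # ['antenna', 'handset']

ranked_ids = [f"P{i}" for i in range(1, 501)]
found, missed = exemplar_coverage(ranked_ids, ["P3", "P42", "P250"])
print(found)   # ['P3', 'P42']
print(missed)  # ['P250']
```

Any exemplar that shows up in the missed list is a hint that a required term is too strict or that a useful optional term is missing from the query.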
This process, which leverages the Boolean OR and the relevancy-sorting algorithm, is repeatable for any technology. Conceivably, you could even do this type of search in a language that you don’t speak or a technology with which you have no familiarity and still get highly accurate results.
Keep in mind that while it is effective and repeatable, it is not something you can do in 5 minutes. Some veteran patent searchers I know may spend a day or more refining and testing a query, but once it is done, they can save it and reapply it over and over. Just remember, that’s why we patent searchers get paid the big bucks.
For example, say you created a search on cell phone antennas that consistently returns highly accurate results. You can then apply the search to all the competitors in the market and create powerful visualizations of the competitive landscape depending on what your search platform supports. Don’t forget, as technology evolves, and the lexicon changes, you’ll have to update your search to include the newer terms.
Use the Boolean OR, particularly with a large set of terms that are likely to appear in relevant documents, but not necessarily in every one of them. Rely on a good relevancy algorithm to bubble the best matches to the top of the list.
After a little practice, you’ll learn to love the Boolean OR and the millions of results it returns!