I found an interesting bit of information that most of the SEO community has missed. As quietly as Google dropped the Google Search API at the end of last year, it has decided to bring it back, but only for the research community.
It’s now called the University Research Program for Search and brings with it the following limitations:
- The research program is open to faculty members and their research teams at colleges and universities, by registration only.
- The program may be used exclusively for academic research, and research results must be made available to the public.
- The program must not be used to display or retrieve interactive search results for end users.
- The program may be used only by registered researchers and their teams, and access may not be shared with others.
Getting the information you need
As an advanced SEO, you are no doubt aware that testing many ideas, theories and routines requires custom tools and scripts that automate most of the work for you. The most crucial information resides with the search engines, and gaining access to it is critical.
There are two ways to do it. You can scrape their results or you can access them from their sanctioned APIs. Scraping has many disadvantages, including:
- Your code breaks every time the search engine makes a change to the results page, and your regular expressions need to be adapted.
- You are breaching the search engine’s terms of service, something they are not happy about.
- You are exposed to getting your IP(s) banned. (Alternatively, you could lease proxy servers to access multiple IPs.)
On the other hand, accessing an API carries none of those drawbacks. At the moment, the only downsides I see are the usage limits and the fact that the results are sometimes not totally accurate.
Yahoo and Microsoft still support their search APIs, and they enforce their usage limits per requesting IP. I was one of the lucky ones to get a Google SOAP API token before they closed it, but it is only useful for my own internal purposes, as nobody can get new developer tokens. For example, I can give out a single Yahoo or Microsoft API token to anyone using my software, and each user will be limited by their own IP. But if I give out my Google token to a lot of folks, the token limit will be shared among everybody who uses it.
What about Google AJAX API?
After Google was heavily criticized for taking away such a useful tool from the developer community, they responded with a replacement: the Google AJAX API. Unfortunately, that API is far more limited than the previous one was. While I agree that SOAP, which supposedly stands for Simple Object Access Protocol, is definitely not simple, it did provide far more flexibility.
If your only concern is to display some search results from your website, the Google Custom Search Engine (CSE) does an excellent job. Why release an AJAX API that developers were not asking for?
Is the new Search API any good?
However, the limits on how the API can be used may be a problem. From the documentation:
Requests to the service MUST be throttled … A time period of at least one second must be allowed between requests.
This makes some interesting types of research impossible using this API: anything that would need to fire off multiple queries quickly.
For example, let's say I am working on a technique for query expansion, so I want results not only for the search given, but also for tens of other related searches, which I will then combine. With a one-second delay between queries, my research prototype will take tens of seconds to respond, making it no longer interactive.
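To make the cost of that throttling rule concrete, here is a minimal Python sketch of a client-side throttle that keeps at least one second between requests. It is only an illustration under stated assumptions: the `Throttle` class, `run_queries`, `minimum_latency` and the `search` callable are hypothetical names of mine, not part of any Google API.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive requests."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None        # time of the previous request; None before the first

    def wait(self):
        """Block until min_interval has elapsed since the previous request."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def run_queries(queries, search, throttle):
    """Send each query through `search` (a hypothetical callable), throttled."""
    results = {}
    for query in queries:
        throttle.wait()
        results[query] = search(query)
    return results


def minimum_latency(num_queries, min_interval=1.0):
    """Lower bound on total waiting time; the first request goes out immediately."""
    return max(num_queries - 1, 0) * min_interval
```

Under these assumptions, a query expansion prototype that needs 30 related searches cannot answer in less than `minimum_latency(30)`, that is 29 seconds of waiting, before any network or processing time is counted.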
Nor can I try out some natural language analysis for question answering where I first get the results for the search given, then look at the results, then fire off dozens of additional queries to learn more about what I found in those results.
I cannot even do something that attempts to use conditional probabilities of finding two words together versus finding them apart on the Web as part of the analysis, since each of those requires two queries to the search engine and many of them might be required.
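The conditional probability idea can be sketched in a few lines of Python. This is my own rough illustration, not the quoted author's method: it assumes a hypothetical `hit_count` callable that returns the number of results for a query, and it assumes that quoting both terms in one query approximates "found together". Each estimate costs two queries, which is exactly where the one-second rule hurts.

```python
def conditional_probability(count_both, count_given):
    """Estimate P(a | b) as hits(a and b) / hits(b)."""
    if count_given <= 0:
        raise ValueError("count_given must be positive")
    return count_both / count_given


def estimate_conditional(a, b, hit_count):
    """Estimate P(a | b) with two search queries.

    `hit_count` is a hypothetical callable mapping a query string to a
    result count; the query syntax below is an assumption as well.
    """
    both = hit_count(f'"{a}" "{b}"')   # query 1: both terms together
    given = hit_count(f'"{b}"')        # query 2: the conditioning term alone
    return conditional_probability(both, given)


def queries_needed(num_pairs):
    """Two queries per word pair, as described above."""
    return 2 * num_pairs
```

With a one-second gap between requests, analysing 500 word pairs means `queries_needed(500)`, or 1,000 queries, which is more than a quarter of an hour of waiting alone.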
Other researchers point out alternative approaches in the comments. The whole post is definitely an interesting read and I highly recommend it.
At the moment, the new API is not available to SEOs, unless you are currently studying Information Retrieval at university. It would be very interesting and beneficial to see Google release a commercial API, similar to the Google AdWords API or Amazon's Search API.
What do you think? Would you be willing to pay for such a service?