Avoiding rate limits when polling feeds

One of the cool things about the web is the ability to integrate content from a variety of sources. For example, you might want to pull in news data from an RSS feed and have the news items displayed on your web site.

This is easy to do with a script (written in Perl, PHP, Python or similar) run as a cron job. The script fetches the feed and stores a fragment of content that you can then use within your pages.
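As a rough illustration of such a script, here is a minimal Python sketch. It assumes the feed is plain RSS 2.0; the feed address, the output file name and the function names are all made up for the example, so swap in your own.

```python
import html
import xml.etree.ElementTree as ET
from urllib.request import urlopen

FEED_URL = "https://example.com/news/rss"  # hypothetical feed address
FRAGMENT_PATH = "news_fragment.html"       # hypothetical output file


def render_fragment(rss_xml, limit=5):
    """Turn RSS 2.0 XML into a small HTML list of linked headlines."""
    root = ET.fromstring(rss_xml)
    rows = []
    for item in root.findall("./channel/item")[:limit]:
        # Escape feed text so it cannot inject markup into the page.
        title = html.escape(item.findtext("title", default="(untitled)"))
        link = html.escape(item.findtext("link", default="#"), quote=True)
        rows.append(f'<li><a href="{link}">{title}</a></li>')
    return "<ul>\n" + "\n".join(rows) + "\n</ul>\n"


def main():
    """Fetch the feed once and store the rendered fragment on disk."""
    with urlopen(FEED_URL, timeout=10) as response:
        rss_xml = response.read()
    with open(FRAGMENT_PATH, "w", encoding="utf-8") as out:
        out.write(render_fragment(rss_xml))
```

To use it, add a call to main() as the last line of the script and schedule it from cron, for instance with an entry such as `0 */2 * * * python3 /path/to/fetch_feed.py` (the path is a placeholder). Your pages then include the stored fragment instead of contacting the feed on every page view.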

However, many services limit the number of content requests you can make in a specific time frame. For example, Twitter limits the number of requests a client can make to its service each hour. This prevents the service from being swamped. Depending on the service, exceeding the limit might result in an error, or in the IP address of your server being temporarily blocked from accessing that service.

Hitting the buffers

Under normal circumstances you probably won’t need to request a feed so often that you hit the limits yourself. However, imagine that your web site is hosted on a shared server. You may only be requesting the feed once every two hours but, if other users on the same server are requesting data from the same service at a much higher rate, the combined number of requests coming from that server might push you over the limit.

In this situation your requests may fail even though you are making them infrequently.
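One way to soften the effect of these failures is to keep the last good copy of the feed and fall back to it whenever a request is refused. The sketch below is only one possible approach: the function name, the cache file and the opener parameter (there so the function can be tried without touching the network) are assumptions made for this example.

```python
import os
from urllib.request import urlopen


def fetch_or_keep_cached(url, cache_path, opener=urlopen):
    """Return (feed_bytes, fresh); fall back to the cached copy on failure."""
    try:
        with opener(url, timeout=10) as response:
            data = response.read()
    except OSError:
        # urllib's URLError and HTTPError, and socket timeouts, all derive
        # from OSError, so a rate-limit error or a blocked IP lands here.
        if os.path.exists(cache_path):
            with open(cache_path, "rb") as cached:
                return cached.read(), False
        raise  # nothing cached yet, so let the caller see the error

    # Write to a temporary file first so a crash never leaves a half-written cache.
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "wb") as out:
        out.write(data)
    os.replace(tmp_path, cache_path)
    return data, True
```

The second value in the result tells you whether the data is fresh, so the script can log a warning when it has had to reuse the old copy.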

Creating another source of the feed

If your access to a service is being limited by factors outside of your control then it’s often helpful to find an alternative source of the feed. This is rarely provided by the host service itself but can be easily set up using Yahoo! Pipes. Yahoo! Pipes is a great way to aggregate, filter and republish data from online sources.

At the most basic level, though, you can set up a simple pipe that requests the RSS feed from the external service and then republishes the data as RSS at an http://pipes.yahoo.com address (see the ‘Get as RSS’ link for your Pipe).

Screenshot of a simple pipe created in Yahoo! Pipes

Once you have created the Pipe you can then request the Pipes version of the RSS feed rather than the original (remembering that you should still keep the rate of requests at a low level).
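To keep the request rate low even when cron or other triggers run the script more often than intended, the script can check the age of its cached copy before fetching. This is a small sketch under assumptions: the two-hour interval, the function name and the placeholder constant are illustrative, and the real address is whatever your Pipe's ‘Get as RSS’ link gives you.

```python
import os
import time

# Placeholder only: paste the 'Get as RSS' link for your own Pipe here.
PIPE_FEED_URL = "PASTE-YOUR-GET-AS-RSS-LINK-HERE"

# Assumed polling interval of two hours; adjust to suit the service.
MIN_INTERVAL_SECONDS = 2 * 60 * 60


def is_due(cache_path, min_interval=MIN_INTERVAL_SECONDS, now=None):
    """True when no cached copy exists or it is at least min_interval old."""
    if now is None:
        now = time.time()
    try:
        age = now - os.path.getmtime(cache_path)
    except OSError:
        return True  # no cached copy yet, so a fetch is due
    return age >= min_interval
```

A script would call is_due("feed_cache.xml") first and only request PIPE_FEED_URL when it returns True, rewriting the cache file after each successful fetch so that its modification time records the last request.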

The result is that the request for the feed comes from the Yahoo! Pipes service rather than directly from your script, so you can circumvent the issues caused by other users on the same hosting. This works because services like Yahoo! Pipes are intended to be polled and are often whitelisted.