Samuel Tardieu @ rfc1149.net

Getting rid of RSS slammers

A few weeks ago, I noticed that some people were fetching my RSS feed once every minute. The load on the web server was already high, so I settled on a much cheaper option on my side: redirecting them to the RSScache service with an Apache redirect.

This morning, I read that Daniel Glazman had the same problem, and I suggested to him (in a private email, as he does not allow comments on his blog) that he do the same. After discussing it for a while, we thought it would be a good idea to automate the process.

I wrote a small Python script called rssabuse.py which parses your web server access log, tries to detect the abusers for the previous day, and rewrites part of your .htaccess so that abusers are transparently redirected to RSScache. OK, they may get extra advertisements in the feed, so what? This is their problem, not yours. An HTTP redirection is much less costly than serving the full feed, and they can still follow your blog activity. This should work with many blog engines (WordPress or DotClear, for example), provided that you can use Apache’s mod_rewrite in your .htaccess.

The idea is to put something like this in your .htaccess:

RewriteEngine on
RewriteBase /blog
# rssabuse section
RewriteCond %{REMOTE_ADDR} 0.0.0.0  [replaced later by this script]
RewriteRule ^(feed.*)$ https://my.rsscache.com/www.rfc1149.net/blog/$1 [R,L]

and then, every night, shortly after midnight, you launch (through a crontab for example):

rssabuse.py /home/log/apache/access.log '^/blog/feed' 100 /home/sam/blog/.htaccess

(100 is the daily hit threshold: 96 hits a day is one fetch every 15 minutes, plus a few hits to be on the safe side)

The script counts the requests whose path matches the regular expression ^/blog/feed and redirects the hosts (by name or address) that abuse your feeds to RSScache by rewriting your .htaccess file. You should see your server load decrease as the abusers are kept away.
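To give an idea of what the counting step involves, here is a minimal sketch. It is not the code of rssabuse.py: the function name find_abusers, the LOG_LINE pattern, and the optional day filter are hypothetical. It also assumes the Apache common/combined log format and treats a host as an abuser when it goes strictly over the threshold, which may differ from what the real script does.

```python
import re
from collections import Counter

# Assumption: Apache common/combined log format, where the first field is the
# client host, the timestamp is in square brackets, and the request line is
# quoted as "METHOD PATH PROTOCOL".
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(?:\S+) (\S+)[^"]*"')


def find_abusers(lines, pattern, threshold, day=None):
    """Return {host: hits} for hosts requesting `pattern` more than `threshold` times.

    `day` is an optional date prefix such as "12/Mar/2006"; when given, only
    log entries whose timestamp starts with it are counted.
    """
    wanted = re.compile(pattern)
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not look like access-log entries
        host, timestamp, path = match.groups()
        if day is not None and not timestamp.startswith(day):
            continue
        if wanted.search(path):
            hits[host] += 1
    # Keep only the hosts that went strictly over the threshold
    return {host: count for host, count in hits.items() if count > threshold}
```

With an open log file, a call such as find_abusers(log_file, r'^/blog/feed', 100, day='12/Mar/2006') would return the hosts to redirect together with their hit counts.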

A note for the technical junkies: the script will try very hard to make the file update atomic so that no hit to your web server can see a partial or missing .htaccess.
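For the curious, one common way to get this kind of atomic update on a POSIX system is to write the new content to a temporary file in the same directory and then rename it over the original. The sketch below illustrates that technique only; it is not taken from rssabuse.py, and the helper name atomic_write is hypothetical.

```python
import os
import tempfile


def atomic_write(path, content):
    """Replace `path` with `content` so readers see the old or the new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live in the same directory (hence the same
    # filesystem), otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".htaccess.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before the rename
        # mkstemp creates the file with mode 0600; copy the permissions of the
        # existing file so that the web server can still read the result.
        if os.path.exists(path):
            os.chmod(tmp_path, os.stat(path).st_mode & 0o7777)
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Clean up the temporary file if anything went wrong before the rename
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Because the rename either happens or does not, a request arriving in the middle of the update reads either the complete old .htaccess or the complete new one.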

rssabuse.py is made available under the GNU General Public License version 2.

  • Version 1.0: initial release
  • Version 1.1: the list of abusers is available on standard output so that you can see that it is working
  • Version 1.2: fix a bug in date computation and output more helpful statistics with the number of accesses that caused a host to be blocked