I just found some referrers in my logs that I absolutely couldn't find anything on that would point back to me. Nothing unusual so far - referrer spam would be the first suspicion. But the sites mentioned in the referrers are perfectly normal weblogs and other sites - no one who would have reason to spam their site (for example, a blog with about 1 post per month, or an Irish site and a few other strange referrers). The numbers are also different than with normal referrer spam: that usually comes either only 1-2 times or if so with many addresses and each one then about 100x or similar. This one comes about 15 times.

So I dug around in the logs a bit to see if I could find something. And sure enough, the referrers have unusual characteristics: they don't end with a /. Normally an address that doesn't end with / is automatically redirected to the /-variant. Referrers are thus normally /-terminated or direct HTML pages or something comparable. Pure site specifications without a / at the end are rather rare.

Something else also stands out: the pages were actually accessed - or at least downloaded. And the pages belonging to one referrer are quite randomly mixed - with normal users you'd actually expect some form of consistency in what comes through as a referrer. Above all, it's rare for 15 links to come to one page all at once...

And the essential criterion: the IP of the accessing computer is always the same across the different ones. An analysis then produced the following picture:


 15 betas.intercom.net
 15 blogs.salon.com
 15 hfilesreviewer.f2o.org
 30 irish.typepad.com
 5 spleenville.com
 5 vowe.net
 15 www208.pair.com

All clearly fake referrers. Additionally, 34 accesses to my RSS feeds without a referrer. Accesses were only to direct posts and RSS feeds - not to overview pages or archive pages. It looks very much like the bot is proceeding as follows: search for RSS feeds, grab them, then search for permalinks to articles in them and download them to access comment forms, for example. The whole thing nicely disguised as supposed visitors, including forged referrers that seem unsuspicious. Also not too many accesses from one referrer, rather switch it up more often.

Actually nothing new - with email spam, forged real senders are quite common and usual to be harder to filter. But with scraper bots, I'm seeing this kind of mimicry live for the first time - I've only been observing these symptoms for about 1-2 weeks now.

For admins, this whole thing is quite annoying, since you can use referrer logs even less than you could before. Previous referrer spam was certainly a nuisance, but due to the pretty dumb names of the referrers it was easy to recognize. This form of log phenomenon also falsifies the referrers - but is much less noticeable. Could be interesting for weblogs that display their referrers directly in the post.

And of course the problem remains that I still don't know what the bot wants to do with the collected information. Although I'm strongly suspecting spam, but that's just a guess - could also be a bot searching for typical security holes. In any case it's a bot and in any case it has no good intentions - because otherwise it wouldn't need to hide.