Sunday, 06 February, 2005

Referral Log Madness

At the end of last month, I mentioned that my Web traffic had increased by over 30% since I submitted my site to some search engines.  It looks like traffic has leveled off at the higher rate, but I'm not sure it's what I would call valid traffic.  I noticed a lot of odd domains in my referrer logs for October (see November 6, 2004), and wondered where those referrals were really coming from.  That trend has continued, and over the past few weeks I've seen a huge number of referrals from URLs that contain the words "poker" and "freakycheats".  So many, in fact, that those sites take up nine of the top 10 spots in the referrals report.  If those were real referrals I'd expect to see a change in the distribution of served pages, but I don't.  All of my pages are getting more hits, but the distribution has changed little since August: I'm getting twice as much traffic, yet the top 20 pages still account for the same percentage of it.  "Very strange," I thought.  So I went digging.

It turns out that I have discovered referrer log marketing.  No, this is nothing new.  It's been going on for at least four years.  Remember, I've only been looking at my traffic reports for the last six months or so.  This is kind of an indirect form of spam that takes advantage of the "blogroll" or "links" pages that many sites maintain.  Some site operators wrote scripts that automatically add any page that links to their Web site to a list on a links page.  So, for example, if Jeff Duntemann's diary linked to my site, such a script running on my server would add a link back to it on my links page.  The way the script finds out who's linking to me is by examining the referrer logs.
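To make that concrete, here is a minimal sketch of how such a script might tally referring hosts.  It assumes the server writes Apache-style "combined" log lines; the sample lines, the host names (`poker.example`, `blog.example`, `www.mysite.example`), and the function name `referrer_hosts` are all made up for illustration and don't come from my actual logs.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed layout (Apache "combined" format):
# ip ident user [time] "request" status bytes "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referrer_hosts(lines, own_host):
    """Count hits per referring host, skipping blank and internal referrers."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # not a combined-format line; ignore it
        # A referrer of "-" (none sent) has no hostname and is skipped.
        host = urlsplit(match.group("referrer")).hostname
        if host and host != own_host:
            counts[host] += 1
    return counts


# Hypothetical sample data, not real log entries.
SAMPLE_LOG = [
    '192.0.2.10 - - [06/Feb/2005:10:00:00 -0600] "GET /index.html HTTP/1.1" 200 5120 "http://poker.example/links.html" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    '192.0.2.11 - - [06/Feb/2005:10:00:05 -0600] "GET /diary.html HTTP/1.1" 200 7301 "http://poker.example/" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    '192.0.2.12 - - [06/Feb/2005:10:01:00 -0600] "GET /diary.html HTTP/1.1" 200 7301 "http://blog.example/2005/02/" "Mozilla/5.0 Firefox/1.0"',
    '192.0.2.13 - - [06/Feb/2005:10:02:00 -0600] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 Firefox/1.0"',
    '192.0.2.14 - - [06/Feb/2005:10:03:00 -0600] "GET /about.html HTTP/1.1" 200 2048 "http://www.mysite.example/index.html" "Mozilla/5.0 Firefox/1.0"',
    'this line is not in the expected format',
]

for host, hits in referrer_hosts(SAMPLE_LOG, "www.mysite.example").most_common():
    print(f"{hits:5d}  {host}")
```

A links-page script that trusted this tally blindly would happily add whatever host shows up most often, which is exactly the weakness being exploited.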

Some genius figured out that if he cobbled together a program that would ping a site with a Web request that had his own site in the HTTP_REFERER field, that site might add a link back to the bogus referring page.  Unsurprisingly, those bogus pages are predominantly porn, gambling, or fly-by-night pharmacies--the same sites that send so much email spam.  This kind of thing apparently worked very well for a while, filling links pages with all manner of links to things I'm sure the site operators didn't want linked from their sites.  Not only did the spammers' sites get more traffic from unsuspecting people clicking on "recommended" links, but they also got a boost in the search engine rankings, because many search engines rank a site at least in part by how many other sites link to it.

What I don't know is if there's some machine spamming the heck out of my site to put all those URLs in my referrer logs, or if there is a small army of bot-infested client machines out there doing it.  Or, perhaps these are real hits from people whose browsers have been compromised to change the referrer header field.  The browser stats don't help, as they show pretty much the same distribution of browsers (MSIE, Firefox, everything else), but it's as easy to fake the browser report as it is to fake the referrer.
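The reason neither field can be trusted is that both are plain request headers that the client fills in however it likes.  The sketch below only builds a request object with Python's standard `urllib.request.Request` and inspects it; nothing is sent over the network.  The URLs and the `build_request` helper are hypothetical, chosen just to show that the server has no way to verify either value.

```python
from urllib.request import Request


def build_request(url, referrer, user_agent):
    """Build (but do not send) a request with client-chosen header values."""
    # The HTTP header name is spelled "Referer", with a single "r".
    return Request(url, headers={"Referer": referrer, "User-Agent": user_agent})


# Placeholder values for illustration only.
request = build_request(
    "http://www.example.com/",
    "http://links.example/page.html",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
)

# The server would log whatever strings arrive in these headers.
for name, value in request.header_items():
    print(f"{name}: {value}")
```

Since both strings are supplied by whoever makes the request, a matching browser distribution in the stats says nothing about whether the hits came from real browsers.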

The Web reports also tell me what IP addresses are used to access my site, and there are several sites where I can enter an IP address to determine who it's assigned to.  (Search for "ip address lookup".)  I might be able to use that and NSLOOKUP to figure out what's going on.  This is going to take a little research.  I'll let you know what I find.
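As a starting point for that research, here is a rough sketch of the kind of script I have in mind: count hits per client IP and do a reverse DNS lookup on the busiest ones, which is roughly what NSLOOKUP does when handed an address.  It uses Python's standard `socket.gethostbyaddr`; the function names and the `resolver` parameter (there so the lookup can be swapped out) are my own inventions, and not every address will have a reverse DNS entry.

```python
import socket
from collections import Counter


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the host name registered for an IP address, or None if unknown."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None
    return hostname


def summarize_clients(ips, top=10, resolver=socket.gethostbyaddr):
    """Return (ip, hits, hostname) tuples for the most frequent client IPs."""
    counts = Counter(ips)
    return [
        (ip, hits, reverse_lookup(ip, resolver))
        for ip, hits in counts.most_common(top)
    ]
```

Feeding `summarize_clients` the list of client addresses pulled from the log would show whether the hits come from one machine, a handful of hosts at a single provider, or addresses scattered all over the place.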