Joho the Blog » How comment spam doesn’t work

How comment spam doesn’t work

Would you please allow me to be dumb in public? Again? Thank you.

I use Movable Type and have yet to upgrade from 2.66, but I ask the following not only for practical reasons. I want to know how comment spam works.

A few months ago, I tried installing some cgi stuff that was supposed to generate a graphic of random numbers embedded in swirly shapes; if you didn’t enter those numbers into a box on the comment form, your comment would be rejected. But I couldn’t get the graphics library installed correctly, so it didn’t work and I uninstalled it.

No, I’m not looking for help installing the graphics library. I want to understand why it wouldn’t work simply to ask commenters to type any particular string – the same for everyone – into a box on the comment form and then reject any submissions that leave that box blank. In fact, why would I even have to make people type in a number? Why not just set a hidden value in the form? Do comment spammers actually note the parameters embedded in the <form> or do they simply find the address of an MT blog entry and assume that all MT comment pages use the same parameter names and values? Or am I way off in my understanding?

I tried experimenting with this yesterday, adding code to comments.pm to have it look for an extra parameter, which would explain why if you tried posting a comment on my site yesterday afternoon you got back intense mounds of gibberish.

(Note to self: Possible new Joho tagline: Generating intense mounds of gibberish since 1999.)
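
For readers who want to see the idea in code, here is a minimal sketch of the check described above: reject any submission whose extra field is missing or wrong. It is written in Python rather than as a patch to MT's Perl, and the field name `human_check` and the value `1` are hypothetical, not anything Movable Type defines.

```python
# Hypothetical field name and fixed value; neither comes from Movable Type.
REQUIRED_FIELD = "human_check"
REQUIRED_VALUE = "1"


def accept_comment(form: dict) -> bool:
    """Return True only if the extra field carries the expected fixed value."""
    return form.get(REQUIRED_FIELD, "").strip() == REQUIRED_VALUE
```

A submission posted straight to the comment script without the field would be rejected, but a tool that reads the form and copies the value would still get through.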


18 Responses to “How comment spam doesn’t work”

  1. A hidden value in the form on the submission page? The idea here being that spammers don’t submit via the normal submission page, but actually via a batch script pointed directly at the comment.cgi file and thus wouldn’t have the required value? Do I have this reasoning correct? Interesting idea.

    I also tried once to install that random number/image thing, failed and abandoned the idea. My spam has reduced quite a bit by switching to WordPress. That which I do still get is usually caught in the “waiting for moderation” queue for easy deletion. But if things get worse I’ll probably give it another try.

  2. Spammers use tools (you can buy them for $40) that let them autosubmit large amounts of spam. These tools can analyze a submission form and send on any values in the page or on the form, or go straight to the cgi that does the submitting.

    It is supereasy for the script to take the ‘hidden’ form value and POST it on. The idea behind the image is that it can’t be automated – a tool can’t read it, so the cost for the spammer to spam you becomes higher, they have to manually enter the number every time.

    A captcha (image with deformed number) is a high barrier to entry for a spammer. If you set a cookie after someone enters the code and then don’t ask it again, the amount of annoyance for your users is limited.
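
The "set a cookie after someone enters the code" step in the comment above might look roughly like the sketch below. It assumes the server signs the cookie with a secret so that a client cannot simply invent one; `SECRET_KEY`, `make_pass_cookie`, and `has_passed_captcha` are hypothetical names, not part of any blogging package.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real site would keep this out of source.
SECRET_KEY = b"change-me"


def _sign(visitor_id: str) -> str:
    """HMAC-SHA256 signature of the visitor id, as a hex string."""
    return hmac.new(SECRET_KEY, visitor_id.encode("utf-8"), hashlib.sha256).hexdigest()


def make_pass_cookie(visitor_id: str) -> str:
    """Cookie value to set once the visitor has solved the captcha."""
    return f"{visitor_id}:{_sign(visitor_id)}"


def has_passed_captcha(cookie_value: str) -> bool:
    """True if the cookie carries a signature this server issued."""
    visitor_id, sep, signature = cookie_value.rpartition(":")
    if not sep:
        return False
    return hmac.compare_digest(signature.encode("utf-8"), _sign(visitor_id).encode("utf-8"))
```

A valid cookie can still be replayed by whoever solved the captcha once, so a real deployment would probably want an expiry time as well.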

  3. In other words, it’s a balance between annoying your users and increasing the barrier of entry for spammers. Having a hidden field doesn’t do much to stop a spammer, unless they use the crudest of scripts that don’t even analyze forms.

  4. Damn. Thanks, Peter.

    But, doesn’t that mean that I don’t even need a captcha? Suppose the form were to say, “Please type ‘1’ into this field: …” The comment spammers’ sw wouldn’t figure out to enter a 1, right? The captcha itself seems like overkill.

  5. Jeremy Zawodny has a field on his comment form that says “What’s Jeremy’s first name?” It’s intended to force spammers to fill out the form manually, or at least customize their software for his site. No mention on his blog about how effective it’s been.

    Having a hidden field does stop some spammers, at least if it is different for every entry. Many spam runs start sending a comment to entry 1 and then submit to 2, 3, 4, etc. I have a field that consists of an md5 hash of the entry number and a secret word. When the comment is submitted, the hash is checked and if it’s wrong or missing, the comment is rejected. This hasn’t prevented single comment spams, but it has prevented spam runs of thousands of sequential comments.

    You could turn this concept into a non-visual CAPTCHA. Have a hidden field that contains a simple math problem. Something like 2+3. Ask the user to also solve that math problem prior to commenting. Check the math and reject the comment if it’s wrong. If you dynamically generate your comment form, you could automatically create simple math problems and insert them into the form.

    Changing your comment script filename is helpful as well. Some spam bots discover your comment form by searching for mt-comments.cgi. If your script is named something else, it will keep these bots away. Mine’s named fbda07e9fd3bb656bbf62c5b0ed6480e.cgi

    But the real scary part of this is that none of these solutions address TrackBack spam. Almost everything we discuss about fighting comment spam is focused on ensuring that a human is leaving the comment. TrackBack is *designed* to allow computers to send comments to your site without the intervention of a human.
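
The per-entry hash field described in comment 5 could be sketched as follows. This is only an illustration in Python: the secret word, the separator, and the function names are assumptions, and the commenter's actual implementation is not shown in the thread.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical secret word known only to the server.
SECRET_WORD = "swordfish"


def entry_token(entry_id: int) -> str:
    """MD5 hash of the entry number plus the secret word, as described above."""
    return hashlib.md5(f"{entry_id}:{SECRET_WORD}".encode("utf-8")).hexdigest()


def token_is_valid(entry_id: int, submitted: Optional[str]) -> bool:
    """Reject the comment if the hidden field is missing or does not match."""
    if not submitted:
        return False
    expected = entry_token(entry_id)
    return hmac.compare_digest(submitted.encode("utf-8"), expected.encode("utf-8"))
```

Because the token depends on the entry number, a value scraped from entry 1 fails on entries 2, 3, 4, and so on, which is why this stops sequential runs but not a single targeted comment. The standard library's `hmac` module would be a sturdier choice than a bare MD5 of the concatenated string.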

  6. There are a few widely used weblog systems out there, and you can bet the spammers will make use of this fact. Let’s say one of the well known blogging systems used a pattern like “Please type ‘1’ into this field: …” — it doesn’t matter if it is a ‘1’ or anything else, even if it is a random value — the combination of a known pattern with a machine readable value means that the spammer will know what to do. The graphic of random numbers can make this value really hard to read with a machine (if it is distorted somehow).

    On the other hand, if you code something unique to your blog, so the pattern is only used by you, then the spammer has to specifically target your blog. This defence is based on raising the cost to the spammer. You might be able to come up with a way to require a person to be involved (e.g. use a single unchanging distorted graphic, even a non-obvious-to-a-computer machine readable thing, maybe: “you can post if you can answer a skill testing question: what is ‘1 + 1’?”). Then they’d have to have an actual human being involved. That would raise the cost significantly, even if they out-sourced to some third world sweat shop.
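
A dynamically generated "skill testing question" of the kind comments 5 and 6 discuss might be sketched like this. It assumes the expected answer is kept on the server (for example in a session) rather than sent in the page; the function names are hypothetical.

```python
import random


def make_challenge(rng=random):
    """Return a question for the form and the answer to keep server-side."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"What is {a} + {b}?", a + b


def check_answer(expected: int, submitted: str) -> bool:
    """Accept the comment only if the submitted text is the expected integer."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        return False
```

As comment 6 points out, a question in a predictable, machine-readable pattern is easy to solve automatically, so this mainly helps when the pattern is unique to one site.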

  7. In googling around, I just found Burningbird’s instructions for adding a static but random value to your comment forms: http://weblog.burningbird.net/fires/000638.htm

    Thank you, Shelley. Unfortunately, I can’t get it to work. I’m 100% sure it’s my fault. (Yes, I did replace the fancy quotes my editor wanted to put in with plain ol’ quotes.)

  8. Flemming Funch at ming.tv has exactly that installed & operating on his blog software, and has had for some time now … and the blogging application is free and completely available to anyone who joins the New Civilization Network. I joined it once, and “it” never bothered me with any kind of pushed information … I wasn’t active, and so there sits the membership.

    I believe I’ve also seen this feature on AOL’s LiveJournal ?

    Ming’s seems to work quite well … don’t know that i’ve ever observed any machine-generated comment spam on his blog.

    The software also has a very nice way of displaying related articles in a given category below the comments.

  9. Yeah, I get blasted with comment spam on both of my blogs (movie reviews & FreeBSD). I use MT-Blacklist, which is a wonderful way to get around comment spam. For MT 2.x, you can get the older one, which at least lets you delete things in a hurry.

    For MT 3.x, which I upgraded to just to get the latest MT-Blacklist, it adds a couple of other checks and generally comment spam never actually shows up. It gets put into a “moderation queue” and then you can again, do a whole batch every now and then.

    http://www.jayallen.org/projects/mt-blacklist

  10. I use Jay Allen’s mt-blacklist also. In fact, I rely on it. But it’s better at letting me clean up the spam than at preventing it.

  11. My new blog (BigMediaBlog.com) uses MT 3.121 and requires a TypeKey ID to post comments. Because of that, I don’t expect any comment spam. However, I can see that as a disincentive for even the few people who’ve so far visited it to leave comments.

    command-post.org uses TypeKey too, and the same small group of people leave occasionally content-free comments. I’d imagine if comments were open there would be more of them and from more people. So, TypeKey might hurt more than it helps.

    At Lonewacko.com I use MT 2.21. I’ve received thousands of comment spams because I don’t use TypeKey or Mt-blacklist, neither of which works with that version.

    I tried a super-secret hidden field, but that didn’t work for the reason described above.

    I wrote a Java program to delete comments I specified based on URL, but that became too cumbersome.

    Now I log into MySQL and run two scripts. One shows me the latest comments and the distinct URLs. The other deletes comments based on their URLs or comment text. I put commands into the second script based on the list of distinct URLs. In effect, my own form of an after-the-spam MT-blacklist.

    I also modified [installdir]/lib/MT/App/Comments.pm to reject comments based on their URL. I only did that for a small number of specific spam terms, not for every URL. This requires more work than a completely automated system, but it’s not that difficult. I still get a lot of spam but it’s a fairly quick ritual to remove it. The only major problem is occasional spam floods that corrupt MySQL’s tables or make it crash. The latest version of MT has something named throttle, which I assume only allows a certain number of comments in a certain time, but I haven’t investigated that.
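
The "reject comments based on their URL" approach in comment 11 can be illustrated with a small filter. This is a sketch in Python, not the commenter's Comments.pm change or SQL scripts; the spam terms and the comment layout (a dict with a `url` key) are made up for the example.

```python
# Hypothetical spam terms; a real list would come from the spam actually seen.
SPAM_TERMS = ("casino", "pills")


def is_spam_url(url: str, terms=SPAM_TERMS) -> bool:
    """True if the commenter's URL contains any blocked term (case-insensitive)."""
    lowered = url.lower()
    return any(term in lowered for term in terms)


def filter_comments(comments: list) -> list:
    """Keep only comments whose URL does not match the blocklist."""
    return [c for c in comments if not is_spam_url(c.get("url", ""))]
```

The same predicate could be used at submission time to reject a comment, or after the fact to pick comments for deletion.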

  12. One thing that is similar to the “Enter the number 1” approach is simply to have an “I am Human” checkbox. Since it is not implemented on an MT wide basis, if you could do it yourself, chances are they would have to target your blog specifically, as another commenter mentioned. Most likely, they won’t do that, and you could save yourself a lot of spam.

  13. You have your answer in the previous posts, but I have a deeper question to raise. I use WordPress and I had to implement the CAPTCHA because of spam as well. After thinking about it for a while: can’t we do something to keep them from doing it? I realize our sites are available worldwide, but there should be something we can do at least in our country to keep their crap off our servers. If someone misuses my site, especially if I warn them that the comments are for users only, I should be able to take some kind of action against them or be part of a class action. In the physical space, someone cannot walk into your business and post a sign for their business in your window. Same goes for spam using other people’s domains. I own shaff.com but spammers use fake emails @shaff.com when they spam the public. Even though I didn’t send the spam I can get on blacklists. Spammers shouldn’t be able to spoof identities. Some large sites have sued and won, but small domain owners don’t have any real options. Sorry to go off topic.

  14. Just another field report on simple and efficient spam blocking: I use the “please enter this fixed value in the next box” solution with some success, but not with total success.
    Within 5 days of implementation the first spammer took the time to enter spam manually, and I actually had a spammer who bothered to automate specifically against my weblog after only 3 weeks of operation of the new ‘I am human’ field.
    To remove the incentive to do that I also audit all comments, so that the only thing a spammer is able to do is send me personally 1000 spammy comment notifications, not generate false comments. I read all comments before they’re allowed to post.
    I don’t get more legitimate comments than I can manage to audit, but it’s obviously not feasible for everyone. And again, my experience indicates that people WILL automate against too simple captchas.

    As an aside: Your “commenters must preview” idea likely has no effect – spammers attack the http://www.hyperorg.com/movabletype/mt-comments.cgi script directly, which seems to have the post option enabled.

  15. I use MTCloseComments to auto close comments after 7 days. That seems to eliminate 99% of comment spam.
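
The rule behind closing comments after 7 days is simple enough to sketch. The Python below only illustrates the date check; it is not how MTCloseComments itself is implemented, and the names are hypothetical.

```python
from datetime import datetime, timedelta

# Window taken from the comment above: entries accept comments for 7 days.
COMMENT_WINDOW = timedelta(days=7)


def comments_open(posted_at: datetime, now: datetime) -> bool:
    """True while the entry is still inside its comment window."""
    return now - posted_at <= COMMENT_WINDOW
```

This works because spam runs tend to hit old entries, while most legitimate comments arrive soon after a post goes up.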

  16. Comment spam

    Wow, I got hit pretty hard today with comment spam. Hmm, I might have to implement some protection. Perhaps using Jeff Atwood’s CAPTCHA control.

    Apparently CAPTCHA is not foolproof — Casey Chestnut did a proof-of-concept for defeating CAPTCHA…

  17. Hi, I had also SPAM problem and I installed mt-blacklist! It’s a great comment-spam-filter! I use it for weeks and am perfectly happy with it.
