After months of absence, Typosphere has returned from the dead!

We migrated off Planet Argon and onto DreamHost, where we should have more control. We also upgraded to Trac 0.10.3 and turned off anonymous editing (users now have to register to file a ticket). This should (hopefully) prevent the issue that led to Typosphere dying in the first place.

One important thing to note is that as part of this process, we also moved the subversion repository. Unfortunately, the old repository was hosted as an svn:// URI using the typosphere.org domain, which meant there was no way to preserve this URI (since we can’t run long-lived background daemons on DreamHost). The new URI uses http and a new subdomain, so if necessary we can move the repository without moving the website.

The new repository URL is http://svn.typosphere.org/typo/trunk.

The issue, as near as I can tell, is that Typosphere started getting spammed massively. At the time, none of the developers (and that includes me) was really paying attention to Typo, as we were busy with other things. So over about a month the Typosphere Trac got so full of spam that, well, it was more spam in one location than I’ve ever seen in the rest of my life. This managed to trip a bug in Trac that caused it to start sucking CPU and RAM, and so Planet Argon turned off Trac for our account.

Some time later, the other developers and I started trying to resurrect Typosphere. Unfortunately, at about this time the systems administrator for Planet Argon was preparing to leave the company, so any attempts at contacting him to resolve the issue went unanswered. I eventually called Planet Argon (which is how I learned that the systems administrator had, in fact, left that very day) and spoke to the new systems administrator. He agreed to try and fix Trac for us, but after hearing nothing for a few days, I decided it would be better to seek hosting elsewhere.

Luckily, I had access to a DreamHost account with plenty of spare bandwidth and disk space, so we decided to move there. For the most part the migration went smoothly, until I started up Trac and discovered exactly how much spam was in there.

This problem stumped me for about two weeks. One day I spent several hours trying to clean it up by hand, and afterwards I couldn’t tell the difference. So, yesterday, I finally sat down to try and solve the problem.

With the help of the fine folks on the #trac IRC channel, especially coderanger, I wrote a script which deleted every single ticket change after a certain timestamp (corresponding to the first spam comment). Unfortunately, there were probably a handful of legitimate changes lost, but there really was no other alternative. In any case, this script worked flawlessly, and Trac was de-spammed. To prevent this from happening in the future, I turned off anonymous editing and installed a plugin which allows users to register for an account. Hopefully the requirement of registration will block most spam.
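The script itself isn't reproduced here, but the idea can be sketched in a few lines of Python. This is only an illustration, and it rests on some assumptions: that Trac is backed by SQLite, that the changes live in a ticket_change table whose time column holds Unix seconds, and that the cutoff below (a made-up value) is the timestamp of the first spam comment. The function name purge_ticket_changes and the demo data are hypothetical.

```python
import sqlite3


def purge_ticket_changes(conn, cutoff):
    """Delete every ticket change made at or after `cutoff` (Unix seconds).

    Assumes a Trac-style ticket_change table with an integer `time` column.
    Legitimate changes made after the cutoff are lost as well.
    """
    cur = conn.cursor()
    cur.execute("DELETE FROM ticket_change WHERE time >= ?", (cutoff,))
    deleted = cur.rowcount
    conn.commit()
    return deleted


def demo():
    """Run the purge against a tiny in-memory stand-in for the Trac database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ticket_change ("
        "ticket INTEGER, time INTEGER, author TEXT, "
        "field TEXT, oldvalue TEXT, newvalue TEXT)"
    )
    rows = [
        (1, 1150000000, "dev", "comment", "", "Fixed in trunk"),
        (1, 1160000000, "anonymous", "comment", "", "first spam comment"),
        (2, 1160000500, "anonymous", "comment", "", "more spam"),
    ]
    conn.executemany("INSERT INTO ticket_change VALUES (?, ?, ?, ?, ?, ?)", rows)

    # Hypothetical cutoff: the timestamp of the first spam comment.
    deleted = purge_ticket_changes(conn, 1160000000)
    remaining = conn.execute("SELECT COUNT(*) FROM ticket_change").fetchone()[0]
    conn.close()
    return deleted, remaining


if __name__ == "__main__":
    print(demo())
```

Anyone trying something like this on a real installation would want to copy the database file first, since the delete can't be undone.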

There was one interesting aspect to this that puzzled me until yesterday. The vast majority of the spam I saw contained words that I had placed into the blacklist ages ago. I couldn’t figure out why the spam protection wasn’t working. And then yesterday I discovered the reason. The blacklist is kept on a page called BadContent. The first code block on that page consists of regular expressions, one per line, that each match a blacklisted expression. Unfortunately, I forgot to mark this page read-only. So what happened was one of the random spam attempts happened to target this page. The spammer replaced the content with his own code block containing a vast number of <a href> tags linking to spammy websites. This had the effect of replacing the entire blacklist with a bunch of regular expressions matching <a href> tags. This meant that all of the stuff that was previously blacklisted was no longer being blocked, opening the floodgates for all sorts of spam.
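To make the failure mode concrete, here is a simplified Python model of a page-driven blacklist. It is not the spam filter's actual code: the function names, the sample pages, and the sample patterns are all made up for illustration, and it assumes only what is described above (the first {{{ }}} code block on the page holds one regular expression per line).

```python
import re


def load_blacklist(page_text):
    """Compile one regex per non-empty line of the first {{{ ... }}} block."""
    match = re.search(r"\{\{\{(.*?)\}\}\}", page_text, re.DOTALL)
    if match is None:
        return []
    patterns = []
    for line in match.group(1).splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            patterns.append(re.compile(line))
        except re.error:
            # Skip lines that are not valid regular expressions.
            continue
    return patterns


def is_spam(content, patterns):
    """Return True if any blacklist pattern matches the submitted content."""
    return any(p.search(content) for p in patterns)


# Hypothetical page contents before and after the spammer's edit.
ORIGINAL_PAGE = """Blacklisted expressions:
{{{
cheap-pills
casino
}}}
"""

VANDALIZED_PAGE = """{{{
<a href="http://spam.example/1">x</a>
<a href="http://spam.example/2">y</a>
}}}
"""

if __name__ == "__main__":
    submission = "Visit our casino today"
    print(is_spam(submission, load_blacklist(ORIGINAL_PAGE)))
    print(is_spam(submission, load_blacklist(VANDALIZED_PAGE)))
```

With the original page the sample submission is caught. With the vandalized page the only patterns left are the spammer's own link tags, so the same submission sails through.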

3 Responses to “The Resurrection of Typosphere”
  1. I had been wondering what ever happened to Typo. Thanks for letting us know.

    Also, please update us with how well Typo is running on DH. That’s where I’m probably going to be hosting BundleForge (if I ever get around to it).

  2. Disk full? When I try to register:

    Traceback (most recent call last):
      File "/home/typosphere/packages/lib/python2.3/site-packages/trac/web/main.py", line 387, in dispatch_request
        dispatcher.dispatch(req)
      File "/home/typosphere/packages/lib/python2.3/site-packages/trac/web/main.py", line 237, in dispatch
        resp = chosen_handler.process_request(req)
      File "build/bdist.linux-i686/egg/acct_mgr/web_ui.py", line 302, in process_request
      File "build/bdist.linux-i686/egg/acct_mgr/web_ui.py", line 53, in _create_user
      File "build/bdist.linux-i686/egg/acct_mgr/api.py", line 98, in set_password
      File "build/bdist.linux-i686/egg/acct_mgr/htfile.py", line 70, in set_password
      File "build/bdist.linux-i686/egg/acct_mgr/htfile.py", line 104, in _update_file
      File "/usr/lib/python2.3/fileinput.py", line 231, in next
        line = self.readline()
      File "/usr/lib/python2.3/fileinput.py", line 309, in readline
        perm)
    OSError: [Errno 122] Disk quota exceeded: '/home/typosphere/trac_sites/trac.htpasswd'

  3. Kevin Ballard says:

    Oh wow, thanks. That’s fixed now. Turns out DreamHost has core dumps enabled, so every time Trac crashed it left a big core file lying around, and those piled up over time.