After months of absence, Typosphere has returned from the dead!
We migrated off Planet Argon and onto DreamHost, where we should have more control. We also upgraded to Trac 0.10.3 and turned off anonymous editing (users now have to register to file a ticket). This should (hopefully) prevent the issue that led to Typosphere dying in the first place.
One important thing to note is that as part of this process, we also moved the Subversion repository. Unfortunately, the old repository was hosted at an svn:// URI on the typosphere.org domain, which meant there was no way to preserve this URI (since we can’t run long-lived background daemons on DreamHost). The new URI uses http and a new subdomain, so if necessary we can move the repository without moving the website.
The new repository URL is http://svn.typosphere.org/typo/trunk.
The issue, as near as I can tell, is that Typosphere started getting spammed massively. At the time, none of the developers (and that includes me) was really paying attention to Typo, as we were busy with other things. Over about a month, the Typosphere Trac got so full of spam that, well, it was more spam in one location than I’ve ever seen in the rest of my life. This managed to trip a bug in Trac that caused it to start sucking CPU and RAM, and so Planet Argon turned off Trac for our account.
Some time later, the other developers and I started trying to resurrect Typosphere. Unfortunately, at about this time the systems administrator for Planet Argon was preparing to leave the company, so any attempts at contacting him to resolve the issue went unanswered. I eventually called Planet Argon (which is how I learned that the systems administrator had, in fact, left that very day) and spoke to the new systems administrator. He agreed to try and fix Trac for us, but after hearing nothing for a few days, I decided it would be better to seek hosting elsewhere.
Luckily, I had access to a DreamHost account with plenty of spare bandwidth and disk space, so we decided to move there. For the most part the migration went smoothly, until I started up Trac and discovered exactly how much spam was in there.
This problem stumped me for about two weeks. One day I spent several hours trying to clean it up by hand, and afterwards I couldn’t tell the difference. So, yesterday, I finally sat down to try and solve the problem.
With the help of the fine folks on the #trac IRC channel, especially coderanger, I wrote a script which deleted every single ticket change after a certain timestamp (corresponding to the first spam comment). Unfortunately, there were probably a handful of legitimate changes lost, but there really was no other alternative. In any case, this script worked flawlessly, and Trac was de-spammed. To prevent this from happening in the future, I turned off anonymous editing and installed a plugin which allows users to register for an account. Hopefully the requirement of registration will block most spam.
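For the curious, the core of that cleanup can be sketched in a few lines of Python. This is not the actual script, just a minimal illustration under some assumptions: it assumes Trac is backed by SQLite, that ticket changes live in a `ticket_change` table whose `time` column holds integer Unix timestamps, and that the cutoff is the timestamp of the first spam comment. The `delete_changes_since` and `demo` names and the sample rows are made up for the example.

```python
import sqlite3


def delete_changes_since(conn, cutoff):
    """Delete every ticket_change row with time >= cutoff.

    Returns the number of rows removed. The table and column names are
    assumptions about the Trac schema, not taken from the original script.
    """
    cur = conn.execute("DELETE FROM ticket_change WHERE time >= ?", (cutoff,))
    conn.commit()
    return cur.rowcount


def demo():
    """Run the cleanup against a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ticket_change ("
        "ticket INTEGER, time INTEGER, author TEXT, "
        "field TEXT, oldvalue TEXT, newvalue TEXT)"
    )
    rows = [
        (1, 1000, "dev", "comment", "", "legitimate comment"),
        (1, 2000, "anonymous", "comment", "", "first spam comment"),
        (2, 3000, "anonymous", "comment", "", "more spam"),
    ]
    conn.executemany("INSERT INTO ticket_change VALUES (?, ?, ?, ?, ?, ?)", rows)

    # Hypothetical cutoff: the timestamp of the first spam comment.
    removed = delete_changes_since(conn, 2000)
    remaining = conn.execute("SELECT COUNT(*) FROM ticket_change").fetchone()[0]
    conn.close()
    return removed, remaining
```

Note that a blunt cutoff like this is exactly why a handful of legitimate changes were lost: anything at or after the timestamp goes, spam or not. Back up the database before trying anything similar.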
There was one interesting aspect to this that puzzled me until yesterday. The vast majority of the spam I saw contained words that I had placed into the blacklist ages ago. I couldn’t figure out why the spam protection wasn’t working.
And then yesterday I discovered the reason. The blacklist is kept on a page called BadContent. The first code block on that page consists of regular expressions, one per line, that each match a blacklisted expression. Unfortunately, I forgot to mark this page read-only, and one of the random spam attempts happened to target it. The spammer replaced the content with his own code block containing a vast number of <a href> tags linking to spammy websites. This had the effect of replacing the entire blacklist with a bunch of regular expressions matching <a href> tags. This meant that all of the stuff that was previously blacklisted was no longer being blocked, opening the floodgates for all sorts of spam.
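To make the failure mode concrete, here is a rough Python sketch of how a blacklist like this behaves. It is not Trac’s actual spam-filter code. It assumes the filter takes the first `{{{ ... }}}` code block on the page, compiles each non-empty line as a regular expression, and rejects any content that one of them matches. The function names, the sample blacklist entries, and the spam.example URLs are all invented for illustration.

```python
import re


def load_blacklist(page_text):
    """Compile one regex per non-empty line of the first {{{ ... }}} block."""
    match = re.search(r"\{\{\{(.*?)\}\}\}", page_text, re.DOTALL)
    if match is None:
        return []
    patterns = []
    for line in match.group(1).splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            patterns.append(re.compile(line))
        except re.error:
            # Skip lines that are not valid regular expressions.
            continue
    return patterns


def is_spam(content, blacklist):
    """Return True if any blacklist pattern matches the content."""
    return any(pattern.search(content) for pattern in blacklist)


# Hypothetical page contents before and after the spammer's edit.
ORIGINAL_PAGE = """The blacklist:
{{{
cheap-pills
casino
}}}
"""

VANDALIZED_PAGE = """{{{
<a href="http://spam.example/one">one</a>
<a href="http://spam.example/two">two</a>
}}}
"""

before = load_blacklist(ORIGINAL_PAGE)
after = load_blacklist(VANDALIZED_PAGE)
```

With the original page, `is_spam("visit our casino today", before)` is true. With the vandalized page, the same text sails through, because the only patterns left are the spammer’s own links. Each of those links is still a valid regular expression, so nothing errors out and nothing looks broken.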