Code documentation systems

There is an automated documentation system for the MediaWiki code that produces online documentation from the comments and tags in the code.

This system uses an open-source documentation generator called Doxygen. There are other open-source systems too, such as phpDocumentor, which I think MediaWiki might have used previously. However, the two systems have a significant amount in common in what they understand, if you use the fairly neutral documentation style that most MediaWiki code does.

So anyway, I was looking at phpDocumentor out of curiosity, and ran it over MediaWiki, and it gave a list of errors and warnings about MediaWiki’s documentation. I fixed some of those, whilst checking the Doxygen documentation to try and ensure that I wasn’t stuffing things up for everyone else. What happened next when people saw this can (with much poetic license) be described by the following diagram (an arrow between two blue things = something I was trying to change; an arrow between a person and something = their position, or what’s on their mind):

[Diagram: Documentation fun]

Oh, and just in case anyone takes this too seriously, creating this diagram was actually just a glorified excuse for me to experiment with Inkscape, something I had been meaning to do for a few weeks. If you’re looking for a vector-based open-source drawing package, it seems pretty good to me, and works fine for me on Windows (as well as being available for Mac OS X and Linux), so give it a whirl if you haven’t already.

Five new language incubator wiki planets

I’ve added 5 new planet incubators for various languages that people have shown an interest in. They are:

Polish has been omitted as there is already a Polish Wikipedia planet.

To be clear:

  • These planets are run independently, and are not official. However, there is no competition here: I view these new planets as incubators, where feeds can be added and the planets can make a start. I will HTTP redirect these 5 new planets to the Planet Wikimedia equivalents as and when they come into existence. I’m happy to host them in the short or medium term, but ultimately I want them to find a home with Wikimedia. Therefore, please add feeds according to Wikimedia’s directions (e.g. the blog author must request or explicitly permit feed inclusion), so that there can be a seamless transition using the same list of feeds later on.
  • It could be really super if there was an easier way for non-technical folks to manage their own feeds, without involving the developers. Editing the config.ini through subversion seems like a potential barrier to me (the turnaround time is a bit slower, I’m dubious about whether it scales, it requires too much technical knowledge, and it’s unclear what happens if the maintainer goes AWOL). In particular, it’s a problem for French, Portuguese, and Russian (where nobody has stepped forward offering to maintain the config files), and the incubator exists primarily for these 3 languages, so that they can start now.
  • I don’t speak any of these new languages (other than French at the level of a 2-year-old, which doesn’t count) – so please manage the feeds amongst yourselves, and please play nice together.

Good luck, and have fun!

So this guy broke into our home, whilst we were there…

My girlfriend and I were making toasted sandwiches for lunch, in the kitchen of our apartment. And there was this strange clicking sound: “click, click, click, click”. I thought it was nearby, but thought it was coming from the outside, as the block we’re in gets a gardener to come in once a month to mow the communal lawn, trim plants, and so forth, and I just assumed it was something to do with that. Then we heard it again, and it sounded much closer: “click, click, click, click, click, click, click” … and as we looked at each other, I said: “What the hell IS that?” It seemed to be coming from towards the front door, so we both walked towards it … and we found this 40-year-old 6-foot-tall complete stranger standing inside our apartment, having just picked open the deadlock on our front door.

  • Us: “Are you right?” (If that’s an Australianism, it’s roughly the equivalent of saying “Can I help you?”, in a sharp and sarcastic tone).
  • Him: “Oh, you weren’t supposed to be here. I’m here to change the locks.” (At this point we notice he’s wearing a shirt with the name of some locksmith company on it)
  • Us: “Umm …. why? We haven’t contacted any locksmiths.”
  • Him: “Err… Is this <such-and-such-address>?”
  • Us: “No. It’s <such-and-such-address>” (which is similar to the address he gave, but definitely different)
  • Him: “Oh! That explains why the key they gave me didn’t work. Right.” This was promptly followed by a very long and profuse apology.

After this, as he left, I watched where he went (ready to call the police in case he was a well-prepared thief, with a plausible-sounding back-story ready to go), and he did go to the correct address, and I saw him changing the locks there later in the day.

Of course, the real worry would have been if we weren’t there, and he had changed all the locks, and left. Can you imagine, standing at your front door, trying to get in: “I’m sure this is my home. And I’m sure this is my key. Why doesn’t it work? What’s going on? Help!”

So, points to remember:

  • A clicking sound coming from your front door is a bad sign.
  • No matter how expensive your locks are (and this lock was from a reputable brand, and was not cheap), someone who knows what they’re doing can pick it in about 30 seconds.
  • If you’re a locksmith, you really need to check that you’ve got the right address before breaking in.

Citizendium read-restrictions & quick technical notes

Is it just me, or can anyone else not read the new Citizendium site without logging in, including the following pages:

Having to login to edit, that I can understand (given the vandalism problems, and that it’s a pilot), but surely the above pages at the very least should be open for public viewing? Surely this stuff is “need-to-know” information for potential editors trying to determine if the project is for them?

I find hiding that stuff rather curious, and it made me curious about what else they’re doing. A couple of tech notes after a quick bit of poking around the new site, but not logging in:

As a personal opinion, at the very least I think they should allow anon access to Special:Export for pages in the main namespace, as well as the list of recent changes, because that way information can be shared in a two-way street between the Wikipedia and Citizendium. That would certainly be an interesting bot project for someone (to keep the two sites in sync, and flag those edits that cannot be automatically synced for human review), and if it results in better quality articles for either or both sites, then I’m all for it.

Also, there is a concept of interwiki links in MediaWiki (which makes linking to a special list of external sites easier). It could be nice to have Citizendium included in this list by default, if their articles are publicly accessible. Heck, if they come to the party, and share what they’re doing in an open way, I’ll even add it myself! (Of course, it may get reverted by someone else, but that’s up to them, not me).

Email bug wastes 4 days

Jeez, I hate software sometimes. At the start of Wednesday morning last week, Outlook 2000 ate my email, and it took me until the end of Saturday afternoon (4 frustrating days) to recover most of it.

The bug:

  • Specifically, there is a data corruption bug: when you reach around 2 gigabytes of data, Outlook will corrupt its PST mail store file, so that it can no longer read it (which is what happened last Wednesday morning). There is a tool though for repairing corruption in that file, which is shipped with Outlook. However, it dies with a cryptic 8-digit error code when you try to repair one of these 2-gigabyte files. So, not only does Outlook reproducibly corrupt its data (thus making all email, notes, calendar items, to-do items, etc in that file inaccessible), there is simply no way to recover the data using the tools that the product ships with.
  • Microsoft are aware of the problem, and they have a “solution” of sorts: chop off the end of the file using a tool (thus losing any data contained in the removed portion), run the recovery tool again, and try to recover your data. They claim that you can just cut off 25 to 50 megabytes, but this is incorrect – and the reason it is incorrect is that the repair process significantly increases the file size, which can easily cause the recovered file to exceed 2 gigabytes, thus causing the recovery tool to fail again. By a process of trial and error (and each attempt took around 3 hours of non-stop hard-disk thrashing to succeed or fail), I was able to find that trimming 355 megabytes of mail (thus deleting around 17% of my data) would make the tool run successfully, whilst just avoiding the 2-gigabyte limit.

So, what do we learn about software in general from this? For me, these were the most important points:

  • Test your software. First and foremost, this bug represents a failure to test (because I doubt anyone actually intended for data corruption to happen). The minimal test case for this would have been: create a new email, attach a 100 MB file, save the draft email, and close the email item; repeat 25 times; close the program, open the program, and verify that the program opens without errors. This test (verifying that you can read your own data) would have demonstrated data corruption, and it does not require sending any email, or talking to the network at all – so would have been a comparatively simple test to perform.
  • Fix severe problems quickly. This problem existed in Outlook 97 through to Outlook 2002, which (from the list of Office versions) meant it was in the latest-and-greatest editions of Office from Dec 1996 through to Nov 2003 (i.e. 7 years). Seven years is way too long to have a severe data corruption bug like this.
  • Test your fixes. There is an update that is claimed to “prevent Outlook from allowing the .PST file to exceed the 2 GB maximum size”. Since I had this update installed at the time, all I can say is that the fix seems broken to me.
  • Fix severe problems in multiple ways. Bugs happen: I’ve certainly made plenty of mistakes, including ones that lost data. For the most severe ones though, I try to make it a point to fix them in multiple ways, at every point where I have made a bad assumption, or where I could be checking that the passed data obeys certain constraints. My personal record is fixing a logic bug in 4 different ways, although 3-way fixes are slightly more common, and 2-way fixes are fairly standard – and when you fix bugs thoroughly like this, you never see them again. It’s the same in aviation, where they use the Swiss Cheese model – which basically says that for any accident, there are usually many cumulative failures, and you need to fix all of them to stop the same mistake happening again. Now, this particular email bug was not one bug: rather, it was two (at the very least). The first bug is that corruption occurs. The second bug is that when corruption occurs, you cannot recover from it. Microsoft only attempted to fix the first bug (and failed). If they had attempted to fix the second bug (making data recovery work), and succeeded, it would have been much less of a problem. I even tried to recover the data in a virtual machine, running the free 60-day trial edition of Outlook 2007 (the latest version), which you can download from the Microsoft site. It didn’t work, and the recovery tool still failed to recover data. As a result, this second bug (data recovery not working for oversized PST files) is still unfixed in the latest edition of Office (and has been present for 10.5 years now, and counting).
  • Automate the backup of your data. My last backup of email data was from May 2005. Backing up my data was on my TODO list, however that TODO list was stored in the very file that got corrupted (yay, irony!). I’ve come to the realisation that if it’s not automated, I probably won’t back up my data – and I suspect most people are the same. I’m currently planning to buy one of those small network-drive devices that runs Linux, install samba, and script it to once a day delete the oldest backups until there is 10 gig of free space, then make a copy of yesterday’s data, and then reach out across the network and rsync yesterday’s data remotely against the current data, and then share out all my data as read-only and password protected. This way, I should be able to get back to a previous state from any day in the last 30 days; and even if I get a severe virus or accidentally try to delete my backup, it can’t be deleted or altered because it’s read-only.
  • Proprietary data formats suck. If my email had been stored in mbox format, I would have been able to open it in another app, even when Outlook could not. If my notes were in text files, I would have been able to open them in a text editor. If my calendar items were in iCalendar format, I would have been able to import them into other calendaring software. As it was, the data was in a proprietary format, so none of these things were possible. However, because of the point below, it’s still not clear what to do about that:
  • There still appears to be no email plus Personal Information Manager open-source killer-app. Despite everything, there still seems to be no free email app that’s doing what Firefox did for web browsers, or what OpenOffice is doing now for word processing and spreadsheets: provide a fully-featured, cross-platform, kick-arse implementation, that’s easy to switch to. There are many email clients, but none that seem suitable replacements. Mozilla Thunderbird for example is clear that “It is not a personal information manager” – which is fine, but not what I’m looking for (although having builds available for Windows, Mac and Linux is a definite plus, because I want the option to change platforms at any point, and bring all my data and favourite apps with me). However, I’m looking for email + calendaring + TODO task lists + notes, well integrated in one app, rather than 4 separate apps. The closest candidate seems to be Novell Evolution, which feature-wise seems closest, but which is limited in two regards: 1) It’s part of the GNOME desktop, thus philosophically tied somewhat to a single platform, and not officially available for Windows (there are people working on a Windows port, but builds happen on an ad-hoc basis, rather than being first-class citizens with automated nightly builds like Firefox), and 2) there seems to be no mechanism for importing data from Outlook (originally requested twice, 5 years ago), which is a missed opportunity because Outlook is a pretty popular email client, and I’m sure a lot of people and corporations would switch to an open-source app if there was a compelling pathway for them to do so.
  • If you run Outlook, check now that your PST file is significantly smaller than 2 gigabytes. If it’s getting close, take action now.
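The "delete the oldest backups until there is enough free space" step of the backup plan in the list above can be sketched in Python. This is only a rough sketch under assumptions that are not part of the original plan: each day's backup is assumed to live in its own directory named YYYY-MM-DD, and the function name `prune_oldest_backups` and its parameters are made up for illustration. The copy and rsync steps are left out.

```python
import shutil
from pathlib import Path


def prune_oldest_backups(backup_root, min_free_bytes, free_bytes=None):
    """Delete the oldest backup directories until enough space is free.

    Assumption: each backup is a subdirectory of backup_root named
    YYYY-MM-DD, so sorting by name is the same as sorting by age.
    Returns the names of the directories that were removed.
    """
    backup_root = Path(backup_root)

    if free_bytes is None:
        # Default probe: ask the OS how much space is free on that volume.
        def free_bytes():
            return shutil.disk_usage(backup_root).free

    backups = sorted(p for p in backup_root.iterdir() if p.is_dir())
    removed = []
    # Always keep the newest backup, even if space is still short.
    while len(backups) > 1 and free_bytes() < min_free_bytes:
        oldest = backups.pop(0)
        shutil.rmtree(oldest)
        removed.append(oldest.name)
    return removed
```

For example, `prune_oldest_backups("/mnt/backups", 10 * 1024**3)` would aim for the 10 gig of free space mentioned above (the path is hypothetical). The `free_bytes` parameter exists only so the free-space probe can be swapped out when trying the function on a scratch directory.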

Simplification: Two planets rather than three

Walter asked the excellent question:

The difference between “open.wikiblogplanet.com” and “wikiblogplanet.com” is not clear to me. Yes, the open version can be edited online on the wiki and the other not. But the two seem nearly identical. What is benefit of using the non-open version? Why does the non-open version exist?

The answer is that the open one was originally an experiment, with the non-open one as a fallback in case the open one flopped. It’s not perfect, but the open one seems to be working well and expanding quickly (e.g. currently up to around 53 feeds versus 36 for the non-open one), and having two planets that looked so similar was just confusing for readers.

So, as of 5 minutes ago, http://wikiblogplanet.com was set to redirect to http://open.wikiblogplanet.com. This means that the non-open planet is gone, and there is now only the open planet, which should hopefully eliminate any confusion, whilst keeping all the good stuff. Any old bookmarks you have will still work okay (you’ll get redirected), so you don’t have to do anything (unless you’re grabbing an ATOM feed from the planet, in which case you just need to add “open.” to the start of the URL).

The result is that we’re now back to two planets, each with different takes: Open Wiki Blog Planet (more flavourful, more feeds, you can add and edit the feeds directly, more prone to temporarily blowing up due to a bad config file or shared webhost downtime), and Planet Wikimedia (more focussed, slightly fewer feeds, you request to be added by a dev, very unlikely to blow up).

Also, you can now add a hackergotchi, avatar, logo, or icon to your feed of blog entries on Open Wiki Blog Planet, if you want to. You just need the URL to the image on the web (example of adding avatar).

For the Planet Wikimedia folks, I suggest editing the Planet’s “index.html.tmpl” file, and changing this line:

<img class="face" src="images/<TMPL_VAR channel_face ESCAPE="HTML">" width ….

to:

<img class="face" src="<TMPL_VAR channel_face ESCAPE="HTML">" width ….

That way we can both use the same images on the web, rather than each downloading images to our separate “images/” directories, and we can just copy-and-paste face / facewidth / faceheight lines between our config files.

First Post, Wiki-related Planets, and a new very experimental Planet

I suppose I should try this whole blog thing out – heck, all the cool kids have one! :-)

So anyway, within the space of a few days we’ve gone from no Planets, to two: WikiBlogPlanet and Planet Wikimedia (Yay! Welcome!)

Whilst this was going on, I exchanged a few private emails with Erik Moeller – in short, he wanted a more focussed planet (which is where Planet Wikimedia comes in), and I wanted a broader planet that included the focussed stuff, but which had more voices and which was more able to go wherever the flow took it … but the whole exchange gave me this nagging thought: “I’m sure there are other people who want other stuff too, and some of that could be really good stuff, which I just don’t know about”. And at the same time I was getting some emails about WikiBlogPlanet asking that I use a different feed for some blogs, that I don’t include some blogs because they were semi-private or not wiki-related, that I use a certain avatar as a Hackergotchi, that maybe I should include this group feed (and even though it seemed relevant, I wasn’t sure if I should or not, because it seemed to include some stuff from one of the blogs that had asked not to be included), can I please add this blog, and so on.

So all of the above was in my head, and it just made me think: “WHY am I doing this?”. I don’t mean: “why am I doing WikiBlogPlanet” – I get that (entertainment value, of course!) – but “Why am I doing this administrative crap? Why should I be the one to decide which feeds are and are not included? Who am I to say what is and is not relevant? What if I’m on holiday, or hit by a bus, or my net connection goes down for a week – why should people have to wait for me to get back up to speed, in order for something to change? What if (even though I try not to be) I’m biased in my views of what feeds to include – in short, Who watches the watchers?” The current approach of a single administrator editing a text file on a server just seems so centralised, so top-down command-and-control, and so un-“We the Media”, and fundamentally un-wiki-like.

So although you’re probably just getting used to two planets, this is a good time to mix-it-up a little, and throw a wacky third planet out there, just to see what happens. This planet however will have a special twist: it’s totally open – it’ll be a planet that contains whatever feeds the community wants it to contain, without involving me or requiring me to do anything. (Yay laziness!)

Brion queried whether there was a web administration interface to PlanetPlanet, and it’s a good question, but it raises other questions: “Who does the administrating? Who gets to add / edit / delete feeds? How do they decide what to include? And if anyone can edit feeds, what about vandalism? If only a handful of people can edit feeds, what about the good ideas that get excluded?”

The upshot is I’m going to try another approach, one that may very well be described as crazy, insane, and absolutely nuts. What I’m going to do is set up a new planet (at http://open.wikiblogplanet.com), which pulls its configuration file directly off of the Wikipedia (located here), that tells it what feeds to include, and then generates the planet. The configuration file will be open for people to edit as they see fit, so you can add feeds, edit feeds, remove feeds, and you don’t need to involve me. Don’t ask, just do. In fact it’s a design goal: I don’t particularly want to be involved in deciding what gets included, I more want to read the result. The usual Wikipedia editing rules about assuming good faith, reverting blatant vandalism, 3RR, and so on, will also apply to the configuration file. The configuration file itself is very simple (and there are a few comments in there to explain stuff), so if it sounds complicated, don’t worry, it honestly isn’t. And if people could try and keep the configuration file valid so that the site still works, well, then that would be really super. And to try and make it a little harder for jerks to put bad stuff in the configuration file, I’m going to request that it be marked as semi-protected (so that only established accounts can edit it).
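Since the planet is generated from a file that anyone can edit, one cheap safeguard would be to sanity-check the downloaded text before using it. The Python sketch below is an illustration of that idea, not how the site actually works. It assumes the usual Planet layout, where each feed is an INI section headed by the feed URL, with a `name` option and optional `face`, `facewidth`, and `faceheight` options. The function name `validate_planet_config` is made up for the example.

```python
import configparser


def validate_planet_config(text):
    """Return a list of problems in a Planet-style config; empty means OK.

    Assumption: feed sections are the ones whose header is an http(s) URL.
    """
    # Interpolation is switched off so a stray "%" in a value is harmless.
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read_string(text)
    except configparser.Error as exc:
        return [f"cannot parse config: {exc}"]

    problems = []
    feeds = [s for s in parser.sections()
             if s.startswith(("http://", "https://"))]
    if not feeds:
        problems.append("no feed sections found")
    for url in feeds:
        # Every feed needs a display name.
        if not parser.get(url, "name", fallback="").strip():
            problems.append(f"{url}: missing name")
        # Image dimensions are optional, but must be whole numbers if given.
        for key in ("facewidth", "faceheight"):
            value = parser.get(url, key, fallback=None)
            if value is not None and not value.strip().isdigit():
                problems.append(f"{url}: {key} is not a number")
    return problems
```

If the returned list is not empty, the generator could simply keep using the last copy of the file that passed the check, so that one bad edit doesn’t take the whole site down.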

And of course security is a big concern if people can edit the config file – and for the record, I haven’t even attempted to make this secure – so for the time being, it’s operating on the honor system, although on the server side it’ll be using a separate account, with no confidential data, with no shell or a very locked-down shell, and no access to any other sites, and with a hard disk quota limit … so if it blows up, it should only take out this one site.

Maybe it will be vandalised, maybe it won’t. Maybe it’ll be such a complete shambles that it’ll have to be shut down, or maybe it won’t. Maybe it’ll fly, or maybe it’ll crash and burn! It’s basically an experiment, and a leap-of-faith … or possibly more accurately, a swan-dive into the abyss.

Anyway, it’s already up and running now at open.WikiBlogPlanet.com, and to kick it off, it’s using the same configuration file as the standard WikiBlogPlanet.com site, but you can now edit and update the feeds, to move it in whatever direction you want it to go. So have fun! And like they say in the movies, “where we go from here, is up to you!”