Migrating email from Outlook to Evolution: Linux’s final frontier

At various times in Linux’s history, various things about Linux have really sucked:

  • Getting hardware to work used to really suck, and you used to have to patch the kernel and recompile your own kernel… and then the kernel got a lot better, and the hardware support got a lot better, and I haven’t had to recompile a kernel in years, and I’m happier because it largely “just works”.
  • Setting up printers used to really suck, with stuffing around with printcap files, and printer configurations, and desperately trying to get it to work … and then cups and printer detection improved, and now it’s generally all painless and point-and-click to install a printer, and I’m happier because it largely “just works”.

  • Setting up X-windows used to really suck, with editing X config files and mucking around with modelines… and then monitor detection got better, and I haven’t had to do anything with an X config file in years, and I’m happier because it largely “just works”.

  • Getting your Microsoft Office documents migrated from Windows to Linux used to really suck … and then Open Office came along, and it does a fairly good job of importing Office’s documents, and I’m happier because it largely “just works”.

This week I came to realize that there’s one last frontier remaining, where Linux still really sucks. And that frontier is migrating from a Windows graphical email client (Outlook in this case) to a Linux graphical email client (Evolution in this case). It does not “just work” … not at all. This blog entry will now explain why I say this.

Earlier this week, my main hard disk on my main desktop machine (a Windows machine) died, with a horrible repetitive clicking and grinding sound. By sheer random good luck, I had backed up all my data onto an external hard disk about one hour before this happened, plus I had a brand new machine ready to go on which I was considering trying Linux anyway. It looked like the stars were in alignment: Linux on the desktop of my main machine, here we come!

Installing and configuring Ubuntu was totally painless. (If you care, the exact steps followed are here: http://nickj.org/Ubuntu_8.04.1_desktop_setup_steps ). And there are a lot of things that I’m really liking with this new operating system, including the following:

  • Complete hardware support: My old Windows install could not detect & use all the cores of my CPU (Windows 2000 Pro does not detect / use a Quad core CPU, whereas Ubuntu 8.04.1 does). In Windows, I needed special drivers for my sound card and my mouse and my keyboard and video card, all of which had to be manually installed or downloaded. In Ubuntu 8.04.1, all my hardware “just works”, or it offers to install restricted drivers (for the video card), and then it “just works”. Nice.
  • The ease of installing and uninstalling software (synaptic + aptitude). Having a well-integrated package manager for installing and uninstalling everything is most pleasant.
  • The way when I play a video, and don’t have the right codec installed, it will offer to download and install the needed codecs for me, and once it’s finished (which typically takes all of 20 seconds), it will play the video. That’s really nice, and sure beats having to manually work out the right video codec to install. Nice one.

But the migration of data from Outlook 2000 to Evolution was shockingly, appallingly bad. There was a lot of misinformation on the web about approaches that should work, but which actually had data loss, and in the end, it took me four full frustrating days to get most of my personal data moved across. The whole migration was so painful that it has left me quite annoyed. When I hear people bandy around phrases like “the year of the Linux desktop”, it simply makes me think: Dream on! This is not the year of the Linux desktop. This is not even the fucking decade of the Linux desktop. Come back in 2011, at the earliest. And when I hear Evolution described as “an Outlook killer”, I can only laugh. To be an “XYZ killer”, you have to do everything that XYZ did, but better, AND you have to be able to import XYZ’s data. Microsoft Word was a WordPerfect killer, because it did what WordPerfect did, but it did it better (in a WYSIWYG way), and it imported WordPerfect’s data. Same for Excel versus Lotus 1-2-3. Same for Firefox versus Internet Explorer. But Evolution does NOT import data from Outlook (for any meaningful definition of the word “import”), so BY DEFINITION it simply cannot be an Outlook killer. And that’s before even getting to the fact that Evolution is not as feature-rich, nor as mature, nor as user-friendly, nor as bug-free as Outlook.

Understand that I’m no Microsoft apologist – I will happily use FOSS, if it’s as good or better.

However, if you are a masochist, if you enjoy pushing hot needles under your fingernails, then it is possible to make the transition. I evidently fall into this category, because I was too stupid or too stubborn to just give up. So here’s how you too can do the same, but I’m warning you straight up, it is not pretty, and it is not easy, and it is not quick.

And for anyone who says it is easy, allow me to enumerate some of the relevant facts:

  • I have/had 4 PST files, rather than the single file that most people have (i.e. the data is spread across 4 files, because Outlook barfs when a PST’s size approaches 2 GB, so I needed to spread it out across multiple files to prevent this).
  • I have/had around 10 years worth of data. All of that data, every single bit of it, needs to come with me. This point is non-negotiable.
  • I use/used all the features of Outlook, apart from Journaling. That’s Email, Calendar, Contacts, Notes, and Tasks. Five categories of data, every single one of which I need. More details about each:
  1. Email. A total of between half a million and 600,000 emails, spread across 692 nested email folders. These folders are categorized in a hierarchy to keep like mail grouped together. That’s 10 years of work/personal/hobby emails, sent and received and drafts, and this includes a collection of 40,000 spam emails received, kept to help with Bayesian training of spam versus ham.
  2. Contacts (roughly 550 contacts, some of which are just an email address, and some of which have complete details, spread across 9 nested folders).
  3. Notes (being used to store check-lists, passwords, etc., with 400 notes spread across 4 nested folders).
  4. To-do lists (being used to store information, check-lists, and lists of bugs or wish-list items in various bits of software that I maintain, with 410 tasks spread across 26 nested folders).
  5. Calendar items (10 years of past events, and additional future events, recording both what happened, and predicted dates and deadlines for things that are going to happen).

So that’s the background. Here’s what does not work for migrating this data from Outlook to Evolution:

  1. Does not work: Export from Outlook, or getting Outlook to export its own data into some industry-standard file format. Ha! Have you ever looked at the File -> Import and Export section of Outlook? Remember, Microsoft are arrogant monopolist pricks, who have no vested interest in helping you move to anything else. So, we get just two export options, both of which are basically useless. Next option please.
  2. Does not work: Getting Evolution to read the PST files and import the data directly. It just doesn’t. Various feature requests for this have been open for the past 6 or 7 years, without any visible sign of progress. Move along, nothing to see here.
  3. Does not work: Readpst, which is an Ubuntu package, and which is apparently derived from libpst: On the very first PST I tried this with, it gave a series of warnings about NULL pointers, gave a series of messages indicating that it wasn’t going to transfer everything anyway, and then proceeded to segfault. Clearly that’s not going to work.
  4. Does not work: Moving data from Outlook to Outlook Express, and then moving from Outlook Express into something else. Outlook Express only imports data from the main PST (ignoring the other files), and it loses data (converts all Contacts to mail items, converts all Tasks to mail items, losing many or most emails in the process). That’s right folks, even two Microsoft teams, who presumably work in the same building, can’t get their own email products to import data correctly from each other. Forget this.
  5. Does not work: Import into Thunderbird, which uses MBOX format, and then move the MBOX files onto the Linux box, and point Evolution at those. To start this, you install Thunderbird, and when you run it for the first time, choose “import from Outlook”, which will import the address book and your mail. I had high hopes for this option, it was very easy to use, and it seemed to work great … at first. However, it has a major problem: severe data-loss. Here’s an example: I have a folder that contains every bit of spam email I was ever sent. It’s useful for training spam detectors, and I found out, it’s also incredibly useful for testing migration tools for data integrity (because spammers send all kinds of weird formats, weird attachments, they ignore standards with impunity, etc.). In short, spam makes the perfect test case. This spam folder has 40,877 pieces of spam mail. How many bits of email do you think Thunderbird imported? The answer is 494 mail messages. That is a 99% data loss rate. Amazing. And there was not a single warning, not a single error – just completely silent 99% data loss. Now, I don’t care about losing my spam, but I do care very much about losing real data, and I had zero faith in Thunderbird at this point to migrate my data without data loss. So, ditch Thunderbird.
  6. Does not work: Migrate using IMAP. Connect Outlook to an IMAP server on your LAN, copy everything there, connect Evolution to the same server, and copy or move everything from the IMAP server into Evolution. In theory, this should work great, and with a few test folders, it does. But the issue here is one of scalability: it seems to work fine on the simple stuff, but falls apart on the bigger stuff. Moving a single email would work fine. A single folder would usually work fine. But moving a hierarchy of folders with half a gigabyte of email would cause Outlook to start copying data, and then after about 10 minutes it would usually just get stuck, and then about 20 minutes later it would give a dialog box saying that the copy operation had failed. As a result, this option is unusable if you have substantial data, due (I suspect) to an Outlook IMAP bug. Other versions may work fine, but Outlook 2000 kept hanging and could not completely transfer all of my data, so it was, unfortunately, unsuitable for migrating data.
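
If you want to run the same kind of integrity check on your own migration, the spam-folder test above can be sketched in a few lines of Python. This is only a rough sketch, not what I actually ran at the time: it assumes the imported folder ends up as a single MBOX file, the function names are made up for illustration, and the file path is whatever your own setup uses. The counts in the example are the ones from my spam folder.

```python
import mailbox


def count_mbox_messages(path):
    """Return the number of messages stored in the MBOX file at `path`."""
    # create=False: fail loudly if the file is missing, rather than
    # silently creating an empty mailbox and reporting zero messages.
    box = mailbox.mbox(path, create=False)
    try:
        return len(box)
    finally:
        box.close()


def loss_rate(expected, imported):
    """Fraction of messages that went missing (0.0 means nothing was lost)."""
    if expected <= 0:
        raise ValueError("expected must be a positive message count")
    return (expected - imported) / expected


if __name__ == "__main__":
    # Figures from the spam-folder test: 40,877 in Outlook, 494 imported.
    print("loss rate: {:.1%}".format(loss_rate(40877, 494)))
```

In practice you would call count_mbox_messages() on each imported folder's MBOX file and compare the result against the item count that Outlook shows for the same folder. Any mismatch means silent data loss, and you should stop and try another tool.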

So what does this leave? At this stage, I thought I was out of options, and was tempted to just give up on Linux, and stick with Windows. Outport would move some of my data, but it would not move email, which is the largest and most complicated single component that I needed to move. Eventually I found the answer: O2M (which is a US$10 commercial product) for moving email + calendar + contact data, and Outport for tasks + notes, and 2 custom scripts I had to write to massage the O2M and Outport data into the correct format. Disclaimer: I don’t have any financial interest in O2M, I don’t know the people involved, and so forth – it is simply the best tool that I could find for the job, and O2M does have problems too, but its problems are far less severe than the problems with the other methods.

Here is the link for the step-by-step details of how to migrate from Outlook 2000 to Evolution.