Venture Capitalists have a valuation for the Wikipedia

So in the course of a conversation last week, I heard second-hand that VCs apparently have a valuation for the Wikipedia: Your favourite encyclopedia, condensed down into a single dollar figure. I suspect it was a valuation for just the English Wikipedia, but I’m not sure. And the dollar amount? Drum-roll please: Four billion US dollars.

Now, when I heard this, the following thoughts went through my mind:

  • Amusement: Are you serious? VCs have assigned a dollar value to it? Why would they do that? What kind of warped world-view needs to reduce every single thing in existence down to a dollar value?
  • Bafflement: How did you arrive at that specific figure? Did someone indicate that they were willing to pay 3.5 billion for it, but you reckoned you could push them a little higher? Or did you estimate the total hours that people have put into it, and then estimate the cost of paying people to reproduce it? Or some other method? Enquiring minds want to know!
  • Economic rationalist: Something is only worth what someone else will pay for it. If nobody will pay 4 billion, then as a simple statement of fact, it is not worth 4 billion. So who would pay four billion for it?
  • Entrepreneurial: 4 billion? Tell you what, if there are any buyers out there desperately wanting to purchase the Wikipedia, I’ll sell you a Wikipedia clone for only 2 billion, with 10% payable up-front as a deposit. You save 2 billion: it’s a bargain, at half the price! And to set up the clone, I’d simply approach the techs who set up the original Wikipedia, set up a partnership company to divide all profits equally between all the participants, and set up a well-resourced wiki farm, on nice servers, in some nice data centres on a number of continents. Then I’d load up the most recent dump of the Wikipedia, purchase a live Wikipedia feed from the WMF to make sure it was completely up-to-date, and call it something like “encyclopedia 3.0”. I’m sure most of the techs would be happy with this (who doesn’t have a student loan to repay, or credit card bills due, or want to take a holiday, or buy a bigger home, or a newer car, or want to buy some new gadget: whatever it is, everyone has something, and millions of dollars would buy a lot of it), and if there are purchasers, they should be happy too at the great price: so everybody wins!

Google Developer Day 2007

Went to the Google Developer Day 2007 yesterday. It was held at 9 locations worldwide. The Sydney one was the second-largest one, with 700 developers attending.

To summarize the event down to a sound bite, it was about Google APIs and mashups. (And I’m sort of hoping I won’t hear the word “mashup” again for at least a week…)

Here are my notes, and I have indented the bits that seemed to me to potentially be relevant or useful to MediaWiki or the Wikipedia:

  • All their APIs are at http://code.google.com/apis/
  • GData / Google Data APIs – provides simple read/write access via HTTP, and authentication with Google. See also S3 for a similar idea (docs here).
  • Google Web Toolkit. Also known as GWT, and pronounced “gwit”. Converts Java code to JavaScript, in a cross-browser compatible way. AJAX development can be painful because of browser compatibility problems. GWT is one solution to this problem. Licensed under Apache 2.0. Develop in any Java IDE, though they recommend Eclipse. Launched a year ago (almost exactly). Uses Java as the source language because of its strong typing.
  • Google Gears, was presented by the creator of Greasemonkey. Gears is a browser plugin / extension, for IE + Firefox + Safari, that allows web apps to run offline. I.e. it can extend AJAX applications to run offline, with access to persistent local SQL data storage. Means users don’t always need to be online, as there is access to persistent offline storage for caching data from the web, and for caching writes back to the web. Released under the BSD license. Uses an API that they want to become a standard. The idea is to increase reliability, increase performance, be more convenient, and cater for all the times people are offline (which is most of the time for most people). It’s an early release, with rough edges. Use local storage as a buffer, and there is a seamless online-offline transition. For the demo he disconnected from the net. What talks to what: UI <–> Local DB <–> Sync <–> XmlHttpRequest. Gears has 3 modules – LocalServer (starts apps), Database (SQLite local storage), and WorkerPool (provides non-blocking background JavaScript execution). WorkerPool was quite interesting to me – non-blocking execution, that overcomes a limitation of JS – different threads that don’t hog the CPU … I really want the whole Firefox UI to use something like this, so that one CPU-hogging tab doesn’t cause the whole browser to choke.
    • Thoughts on how Gears could potentially be applied to MediaWiki: An offline browser and editor. Will sync your edits back when go online, or when the wiki recovers from a temporary failure. Could also cache some pages from the wiki (or all of them, on a small enough wiki) for future viewing. Basically, take your wiki with you, and have it shared – the best of both worlds.
  • Google Mapplets. Mapplets allow mashups inside of Google Maps, instead of being a Google Map inserted into a 3rd-party web page. “Location is the great integrator for all information.” URL for preview. Can use KML or GeoRSS for exporting geographic information.
    • Thoughts on how to use this for the Wikipedia: Geotagging in more data could be good. E.g. Geotagging all free images.
  • Google Maps API overview. A lot of the maps.google.com development is done in Sydney. Talk involved showing lots of JS to centre google maps, pan maps, add routes, add markers, change marker zones, add custom controls, show / hide controls. A traffic overlay for showing road congestion is not available for Sydney yet, but will be available soon. Some applications of the Maps API: Walk Jog Run – see or plan walking or running routes – example; Set reminders for yourself ; Store and share bike routes.
    • Thoughts on applications for the Wikipedia: Perhaps a bot that tries to geolocate all the articles about locations in the world? Will take a freeform article name string, and convert to longitude + latitude, plus the certainty of the match (see page 16 of the talk for example of how to do this). Could get false matches – but could potentially be quite useful.
  • Google Gadgets. Gadgets are XML content.
  • KML + Google Earth overview. KML = object model for the geographic representation of data. “80% of data has some locality, or connection to a specific point on the earth”. Googlebot searches and indexes KML. KML is a de facto standard, and they are working towards making it a real one.
    • Already have a Wikipedia layer. It seems to be 3 months out of date, and based off of the data dumps though.
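To make the geotagging idea above concrete, here is a minimal sketch of emitting KML placemarks for geotagged articles. The article data is made up for illustration, and the namespace / structure is just the basic Placemark shape of KML; it is not anything the existing Wikipedia layer actually uses:

```python
# Sketch: emit minimal KML placemarks for a list of geotagged wiki articles.
from xml.sax.saxutils import escape

def articles_to_kml(articles):
    """articles: list of (title, latitude, longitude) tuples."""
    placemarks = []
    for title, lat, lon in articles:
        placemarks.append(
            "  <Placemark>\n"
            "    <name>%s</name>\n"
            # Note: KML puts longitude first, then latitude.
            "    <Point><coordinates>%f,%f</coordinates></Point>\n"
            "  </Placemark>" % (escape(title), lon, lat)
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://earth.google.com/kml/2.1">\n'
        "<Document>\n%s\n</Document>\n</kml>" % "\n".join(placemarks)
    )

# Illustrative data only:
kml = articles_to_kml([("Sydney Opera House", -33.8568, 151.2153)])
print(kml)
```

A bot that geotags free images or location articles could emit a file like this for the Google Earth layer.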

Misc stuff:

  • Google runs a Linux distro called Goobuntu (Google’s version of Ubuntu).
  • Summer of code – had “6338 applications from 3044 applicants for 102 open source projects and 1260 mentors selected 630 students from 456 schools in 90 countries”.
  • My friend Richard, one of the organisers of Sydney BarCamp, spoke with some of the Google guys, & they were quite enthusiastic about maybe hosting the second Sydney BarCamp at a new floor they’re adding to Google’s offices in late July or early August. If that works out, it could be good … although if it could not clash with Wikimania, that would be better!
  • Frustration expressed by many people about the way the Australian govt tries to sell us our own data (that our tax dollars paid for in the first place), restricting application of that data. Example: census data. Much prefer the public domain approach taken in the US.

Wikipedia FS, plus quick steps for setting up

Wikipedia FS allows you to treat MediaWiki articles like files on a Linux system. You can view the wiki text, and edit the wiki text, all using your standard editors and command-line tools.

Steps for setting up

If you want to have a quick play with Wikipedia FS, here is the series of steps I used today on an Ubuntu 6.06 system to get it installed from scratch; most of these you can probably cut-and-paste directly onto a test machine, and should hopefully work on any modern Ubuntu or Debian system:

# Load the FUSE module:
modprobe fuse

# Install the FUSE libs:
apt-get install libfuse-dev libfuse2

# Get, build, and install the python-fuse bindings:
cd /tmp
mkdir python-fuse
cd python-fuse/
wget http://wikipediafs.sourceforge.net/python-fuse-2.5.tar.bz2
bunzip2 python-fuse-2.5.tar.bz2
tar xf python-fuse-2.5.tar
cd python-fuse-2.5
apt-get install python-dev
python setup.py build
python setup.py install

# Get, build, and install WikipediaFS
cd /tmp
mkdir wikipediafs
cd wikipediafs/
wget http://optusnet.dl.sourceforge.net/sourceforge/wikipediafs/wikipediafs-0.2.tar.gz
tar xfz wikipediafs-0.2.tar.gz
cd wikipediafs-0.2
python setup.py install

# Install the “fusermount” program (if not installed):
# First “nano /etc/apt/sources.list”, and uncomment the line to enable installing from universe, e.g. uncomment the line like this: “deb http://au.archive.ubuntu.com/ubuntu/ dapper universe”
apt-get update
apt-get install fuse-utils

# Mount something:
mkdir ~/wfs/
mount.wikipediafs ~/wfs/

# The above line will create a default configuration file. Edit this now:
nano ~/.wikipediafs/config.xml
# You may want to create a throwaway login on the Wikipedia for testing wikipediaFS now. This
# will allow you to edit articles. Or you can use your main account instead of a test account,
# it’s up to you – but I preferred to set up a separate test account. Either way, keep note of your
# account details & password for the <site> block below.
#
# Now, uncomment the example FR Wikipedia site, and change to the English site. For example, the <site> block should look something like this:
# — START —
<site>
<dirname>wikipedia-en</dirname>
<host>en.wikipedia.org</host>
<basename>/w/index.php</basename>
<username>Your-test-account-username</username>
<password>whatever-your-password-is</password>
</site>
# —– END —–

# Then unmount and remount the filesystem to ensure the above gets applied:
fusermount -u ~/wfs/
mount.wikipediafs ~/wfs/

Using it

# Can now use the filesystem. For example:
cat ~/wfs/wikipedia-en/Japan | less
# This should show the text of the Japan article.

# See what’s in the sandbox:
cat ~/wfs/wikipedia-en/Wikipedia:Sandbox

# Now edit the sandbox:
nano ~/wfs/wikipedia-en/Wikipedia:Sandbox

# Change some stuff, save, wait about 10 seconds for the save to complete, and (fingers crossed) see if the changes worked.

Probably not production ready

I have to emphasise here that WikipediaFS does seem a little flaky. Sometimes it works, and sometimes it doesn’t. In particular, I wanted to reset the Wikipedia’s sandbox after playing with it from this wiki text:

{{test}}
<!-- Hello! Feel free to try your formatting and editing skills below this line. As this page is for editing experiments, this page will automatically be cleaned every 12 hours. -->

To this wiki text:

{{Please leave this line alone (sandbox talk heading)}}
<!-- Hello! Feel free to try your formatting and editing skills below this line. As this page is for editing experiments, this page will automatically be cleaned every 12 hours. -->

I.e. a one-line change. Simple, you say? … well, sorry, but you’d be wrong! In particular there seems to be something about “}}” and the character before it that caused my WikipediaFS installation to behave very strangely. It would look like an edit had saved successfully, but when you went back into editing the file, all the changes you made would be lost:

A save which vi thinks has succeeded, but which has actually failed.

No indication that the above save has failed, but it actually has.

So, I tried breaking the above change down into smaller and smaller pieces, until the save worked. Here are the pieces that I could get to work:

  1. Adding “{{Please leave this line alone (sandbox talk heading)” (note no closing “}}”, because it wouldn’t save successfully with that included)
  2. Deleting “{{“
  3. Deleting “t”
  4. Deleting “es”
  5. Deleting the newline
  6. At that stage, I simply could not successfully delete the final “t”, no matter what I did. Eventually I gave up doing this with Wikipedia FS, and deleted the “t” using my normal account and web browser.

So, although WikipediaFS is fun to play with, and is certainly an interesting idea, I do have to caution you that WikipediaFS may sometimes exhibit data-loss behaviour, and so in its current form you might not want to use it as your main editing method.

MediaWiki 1.10 out, next release thought to be start of July

MediaWiki 1.10 has been released – yay! From a brief email exchange with Brion, the release manager, the next release is currently planned for the start of July, so as to make up ground for the longer than normal 1.10 release cycle. This means that instead of the usual 3 months between releases, this time you may have around 1 month and 20 days (nearly half the usual time). So, if you wanted to integrate anything largish into the next release (MS SQL support could fall into this category, as could the rev_deleted work), starting sooner rather than later could be a good idea.

Third Sydney wiki meetup

Yesterday was the ANZAC day public holiday in Australia, and it was cold and rainy. I went into the city in the late afternoon for the third Sydney wiki Meetup. People took some group photos on the Town Hall steps, and then Chris Watkins had booked a table at the Grace Hotel, so we went there. Some quick notes:

  • General aggregate info about attendees: They were from a range of occupations (e.g. Psychiatrists, students, IT, unemployed, etc). The vast majority of people were adults, but of a wide variety of ages. Almost everyone there used his or her real name, or some minor variation on it, as his or her Wikipedia username (which I thought was interesting). Everyone was interested in or working on completely different stuff (law articles, bots / tools, theological articles, etc). Politics was discussed a little later in the day, and of the people who volunteered a political preference, most seemed to be centre-left and/or pro-green. Around 25 people turned up, around 10 more than last time, and around 6 of the people attending were at the last meetup. Of the 25 attendees, around 10 turned up a few minutes late, so waiting for 10 minutes beyond the advertised time at the meeting place for people who are running late was a very good idea.
  • Jimbo was there, in town for a rather exhausting whirlwind speaker tour of Australia’s capital cities. Following him around was the camera guy making the “Truth in Numbers” wiki-documentary. I suggested to Jimbo he should blog more (response: “but I update my blog so rarely that when I do people read so much into it”; counter-response: “so just write some really banal stuff for a while, and they’ll learn not to read too much into it”), and also suggested that he might want to add his blog to Planet Wikimedia (as opposed to the current paradoxical situation, where the unofficial WikiBlogPlanet carries his blog, yet the official Wikimedia one does not). He was an extremely nice, friendly, easygoing guy. So of course, we pressured him into trying a Tim Tam Slam (a.k.a. a “Tim Tam Straw”) … this is the “before” picture, the “after” picture was suitably messy:

  • Heard from One Salient Oversight, who sometimes creates userboxes, that just 7 days after being created, this is one of his most popular userboxes:

(Image: country-music.png, the userbox in question.)

  • A geography Uni student is writing a thesis researching the Wikipedia, including why people participate (i.e. more focused from the user’s perspective). Geography is generally about spaces, and the Wikipedia is now covered under geography using a very post-modern interpretation of “spaces”, since it is “an online space”. Sounds interesting, and hopefully she will share the results.
  • Someone referred to Tim Starling as “our Tim”, on account of his Australian background, and despite currently being in the UK (I think). Of course, I should point out that Mel Gibson was always generally referred to here as “our Mel”, on account of his half-Australian background, despite living in the US for 20 years, right up until his drunken anti-Semitic tirade … at which point, he overnight became “their Mel”. So Tim, consider yourself forewarned: You + alcohol + a public place + rambling racist tirade + leaked police report = “their Tim” :-)
  • Heard about appropedia.org, which is a wiki for environmental issues and sustainable development.
  • Someone attending had previously got into a dispute over an article with another user, who (using the whois records) crossed the line, and posted the first person’s real name, home address, email address, and home phone number to the Wikipedia as a form of vengeance. Don’t do that – it’s really not nice, and it required a dev to go and permanently erase that edit from the history.
  • Enoch Lau told me about a big diagram of the MediaWiki database tables, which was a funny coincidence, because I drew it. Also, he wanted some additions to the MediaWiki API, particularly for a bot for deleting images that were duplicated on both EN and Commons, and found the current API was missing some functionality he wanted. How could or should he add the functionality he wanted? Best advice I could give him was to start small, add something simple yet that was useful to him, and attach the diff to a bug in bugzilla. If he didn’t get any feedback within a week, then join #mediawiki on IRC, and ask for some feedback or comment there.
  • The Wikimedia foundation is up to (very approximately) 350 servers, with around 8 database servers, with around 20% spare load capacity even at maximum loads (of course, that’s not much slack if you’re growing rapidly). Apparently the foundation sometimes gets offers to host the whole cluster, but they have turned this down so far, so as to prevent being completely beholden unto any single organisation (an entirely sensible position in my view) – however, offers to host say 20% of the cluster are far more favourably received. Oh and there have sometimes been discussions with Google in the past about ways they can help us, but they’re a chaotic organisation, and the foundation is a chaotic organisation, and so thus far nothing much has come of it.
  • The second Sydney BarCamp is coming up sometime in June. Nobody there had been to the first one, but a number of people (including me) had heard of it and were curious about it, so there may be some Wikipedian attendees at the second one.
  • If you have MediaWiki SVN access, and you go to some gathering of wiki people, you really want a good short non-technical user-visible answer to the question: “What have you been working on lately?” I didn’t have one (something vague about testing and documenting stuff wasn’t a very satisfying answer to people) – so you want to be able to point to some new feature that ordinary users will have noticed, and say “I did that”.
  • A Sydney person who could have given a satisfying answer to the above question, Werdna (who brought you the “undo” link for reverting revisions), was not there. And now I see he had his user page deleted, and people are leaving goodbye messages on his talk page. Have I missed my window of opportunity to meet another MediaWiki dev who lives in the same city as me?
  • The Melbourne Wikipedians continue to be far more organised than the unruly Sydney mob. Having 25 Sydney people in a room though, does give the potential to playfully suggest some naughtiness with voting in a block… so the idea of listing Melbourne as an article for deletion (reason: “Not notable”), and all voting “delete” was joked about, and dismissed. Something for next April Fools’ Day perhaps?
  • People involved in the Wikipedia can sometimes have a very different perspective from people who aren’t. Case in point: The waiter serving our table saw the video camera, and there was a conversation with him that went something like this:
    • Waiter: (Pointing) “Is that guy someone famous?”
    • Wikipedian: “That’s Jimbo.”
    • Waiter: (give a blank look.)
    • Wikipedian: “Jimbo Wales”
    • Waiter: (still gives a blank look.)
    • Wikipedian: “Have you heard of the Wikipedia?”
    • Waiter: (gives an even blanker look.)
    • Wikipedian: “Okay, have you heard of the Internet?”
    • Waiter: (With a faint flicker of recognition) – “Oh yeah, I’ve heard of that … Can’t say I have ever really used it though.”
  • Ta Bu Shi Da Yu (a.k.a. Chris) is still working on the various titles of the USA Patriot Act, on and off. Spoke well of David Gerard after meeting him at the last London Meetup. Next Sydney Meetup will probably be a laid-back BBQ thing at Chris’ house.

Code documentation systems

There is an automated documentation system for the MediaWiki code, that produces online documentation from the comments and tags in the code.

This system uses an open-source documentation system called Doxygen; there are other open-source systems too, such as phpDocumentor, which I think MediaWiki might have used previously. However, there is a significant amount in common between what the two systems understand, if you use the fairly neutral documentation style that most MediaWiki code does.

So anyway, I was looking at phpDocumentor out of curiosity, and ran it over MediaWiki. It gave a list of errors and warnings about MediaWiki’s documentation, so I fixed some of those, whilst checking the Doxygen documentation to try to ensure that I wasn’t stuffing things up for everyone else. What happened next when people saw this can (with much poetic license) be described by the following diagram (an arrow between two blue things = something I was trying to change; an arrow between a person and something = their position, or what’s on their mind):

Documentation fun

Oh, and just in case anyone takes this too seriously, creating this diagram was actually just a glorified excuse for me to experiment with Inkscape, something I had been meaning to do for a few weeks. If you’re looking for a vector-based open-source drawing package, it seems pretty good to me, and works fine for me on Windows (as well as being available for Mac OS X and Linux), so give it a whirl if you haven’t already.

Five new language incubator wiki planets

I’ve added 5 new planet incubators for various languages that people have shown an interest in. They are:

Polish has been omitted, as there is already a Polish Wikipedia planet.

To be clear:

  • These planets are run independently, and are not official. However, there is no competition here: I view these new planets as incubators, where feeds can be added and the planets can make a start. I will HTTP redirect these 5 new planets to the Planet Wikimedia equivalents as and when they come into existence. I’m happy to host them in the short or medium term, but ultimately I want for them to find a home with Wikimedia. Therefore, please add feeds according to Wikimedia’s directions (e.g. the blog author must request or explicitly permit feed inclusion), so that there can be a seamless transition using the same list of feeds later on.
  • It could be really super if there was an easier way for non-technical folks to manage their own feeds, without involving the developers. Editing the config.ini through subversion seems like a potential barrier to me (the turnaround time is a bit slower, I’m dubious about whether it scales, it requires too much technical knowledge, and it’s unclear what happens if the maintainer goes AWOL). In particular, it’s a problem for French, Portuguese, and Russian (where nobody has stepped forward offering to maintain the config files), and the incubator exists primarily for these 3 languages, so that they can start now.
  • I don’t speak any of these new languages (other than French at the level of a 2-year-old, which doesn’t count) – so please manage the feeds amongst yourselves, and please play nice together.
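For anyone who does end up maintaining one of these, a feed entry in a Planet config.ini looks roughly like this (the URL and name are invented for illustration; the face / facewidth / faceheight options are the same optional avatar settings used by the blog planets mentioned elsewhere on this blog):

```ini
# Each feed is a section whose name is the feed URL.
[http://example.com/blog/atom.xml]
name = Example Blogger
# Optional hackergotchi / avatar image:
face = example.png
facewidth = 65
faceheight = 85
```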

Good luck, and have fun!

Citizendium read-restrictions & quick technical notes

Is it just me, or can anyone else not read the new Citizendium site without logging in, including the following pages:

Having to log in to edit, that I can understand (given the vandalism problems, and that it’s a pilot), but surely the above pages at the very least should be open for public viewing? Surely this stuff is “need-to-know” information for potential editors trying to determine if the project is for them?

I find hiding that stuff rather curious, and it made me curious about what else they’re doing. A couple of tech notes after a quick bit of poking around the new site, without logging in:

As a personal opinion, at the very least I think they should allow anon access to Special:Export for pages in the main namespace, as well as the list of recent changes, because that way information can be shared in a two-way street between the Wikipedia and Citizendium. That would certainly be an interesting bot project for someone (to keep the two sites in sync, and flag those edits that cannot be automatically synced for human review), and if it results in better quality articles for either or both sites, then I’m all for it.
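As a sketch of what such a sync bot would consume: Special:Export returns XML containing each page’s title and wikitext, which is straightforward to parse. Something like the following could pull those out (the sample XML here is a trimmed, hand-written approximation of the export format, not real output from either site):

```python
# Sketch: pull page titles and wikitext out of MediaWiki Special:Export XML.
import xml.etree.ElementTree as ET

def local(tag):
    """Strip the version-specific export namespace from a tag name."""
    return tag.rsplit("}", 1)[-1]

def parse_export(xml_text):
    """Return a list of (title, wikitext) pairs from export XML."""
    pages = []
    for elem in ET.fromstring(xml_text).iter():
        if local(elem.tag) != "page":
            continue
        title = text = None
        for child in elem.iter():
            if local(child.tag) == "title":
                title = child.text
            elif local(child.tag) == "text":
                text = child.text
        pages.append((title, text))
    return pages

# A trimmed, hand-written approximation of the export format:
sample = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Japan</title>
    <revision><text>Japan is an island nation in East Asia.</text></revision>
  </page>
</mediawiki>"""

pages = parse_export(sample)
```

A real bot would fetch that XML from Special:Export over HTTP, diff the wikitext between the two sites, and flag anything it couldn’t merge automatically for human review.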

Also, there is a concept of interwiki links in MediaWiki (which makes linking to a special list of external sites easier). It could be nice to have the citizendium included in this list by default, if their articles are publicly accessible. Heck, if they come to the party, and share what they’re doing in an open way, I’ll even add it myself! (Of course, it may get reverted by someone else, but that’s up to them, not me).

Simplification: Two planets rather than three

Walter asked the excellent question:

The difference between “open.wikiblogplanet.com” and “wikiblogplanet.com” is not clear to me. Yes, the open version can be edited online on the wiki and the other not. But the two seem nearly identical. What is benefit of using the non-open version? Why does the non-open version exist?

The answer is that the open one was originally an experiment, with the non-open one as a fallback in case the open one flopped. It’s not perfect, but the open one seems to be working well and expanding quickly (e.g. currently up to around 53 feeds versus 36 for the non-open one), and having two planets that looked so similar was just confusing for readers.

So, as of 5 minutes ago, http://wikiblogplanet.com was set to redirect to http://open.wikiblogplanet.com ; This means that the non-open planet is gone, and there is now only the open planet, which should hopefully eliminate any confusion, whilst keeping all the good stuff. Any old bookmarks you have will still work okay (you’ll get redirected), so you don’t have to do anything (unless you’re grabbing an ATOM feed from the planet, in which case you just need to add “open.” to the start of the URL).
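For what it’s worth, a host-wide redirect like that is a one-liner if you’re running Apache with mod_alias (a sketch only; my actual hosting setup may differ):

```apache
# In the virtual host for wikiblogplanet.com: send every path to the open
# planet, so old bookmarks and feed URLs keep working via the redirect.
Redirect permanent / http://open.wikiblogplanet.com/
```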

The result is that we’re now back to two planets, each with different takes: Open Wiki Blog Planet (more flavourful, more feeds, you can add and edit the feeds directly, more prone to temporarily blowing up due to a bad config file or shared webhost downtime), and Planet Wikimedia (more focussed, slightly fewer feeds, you request to be added by a dev, very unlikely to blow up).

Also, you can now add a hackergotchi, avatar, logo, or icon to your feed of blog entries on Open Wiki Blog Planet, if you want to. You just need the URL to the image on the web (example of adding an avatar).

For the Planet Wikimedia folks, I suggest editing the Planet’s “index.html.tmpl” file, and changing this line:

<img class="face" src="images/<TMPL_VAR channel_face ESCAPE="HTML">" width ….

to:

<img class="face" src="<TMPL_VAR channel_face ESCAPE="HTML">" width ….

That way we can both use the same images on the web, rather than each downloading images to our separate “images/” directories, and we can just copy-and-paste face / facewidth / faceheight lines between our config files.