Does this wiki business model work?

Evan Prodromou has previously talked and presented about the commercialisation of wikis (good round-up here, slides here). He identifies four different wiki business models:

  • Service provider (Wikispaces, Wetpaint, PBWiki) [business model: either supported by ads, e.g. Google ads, or charge for hosting the wiki]
  • Content hosting (wikiHow, Wikitravel, Wikia) [business model: mostly advertising, e.g. Google ads]
  • Consulting (Socialtext) [business model: supported by income from wiki consulting for enterprises]
  • Content development (WikiBiz) [business model: create pages for a fee]

The reason I bring this up is that a couple of weeks ago I started doing some work for a wiki-based start-up that is trying to work out a business model different from the four above, but still based on wikis.

This model would financially reward people who contributed to the wiki with a share of profits from the business. A proportion of revenue generated would be set aside to reward people who usefully participated in the wiki and added some value to it (whether as users, administrators, or programmers). The revenue would come from selling things directly through a website associated with the wiki (as opposed to hosting other people’s ads). The wiki would drive traffic, and the direct sales would generate revenue. They’re essentially trying to work out a structure that shares the rewards based on contributions to a wiki. In addition, the content would probably be free content (most probably licensed under either a Creative Commons licence or the GFDL).

So for example, suppose hypothetically that the Star Wars Wookieepedia had a shop associated with it that sold Star Wars merchandise and doo-dads. People would go to the wiki because it had good content, and whilst there they might buy a trinket. Part of the reason they bought that trinket was that the wiki had good content to bring them there and let them find the associated shop in the first place. So essentially the two things support each other – the things being sold are relevant to the wiki, and part of the profit from the things being sold goes to support the people who made a kick-ass wiki.

The question is: How do you structure this so that it works: so that the content remains free, the contributors remain part of a community of wiki users, and the people who contribute good content are also financially rewarded if the site is successful? After all, it’s the users who generate the content, so if it’s a for-profit endeavour, why shouldn’t they also share in the reward if it does well?

Working out how to reward, and whom to reward, though, is quite tricky. Any system should be fair and transparent. One possibility would be to require that users be logged in (so that from the article history you know who made which edits), that they have an email address (so that you can contact them), that they have at least 100 edits (there is probably not much point bothering below a certain number of edits), and that they not be blocked (since if they were blocked, they probably did something destructive). Then you could use a blame map, or something like IBM’s history flow tool, to work out which user was directly responsible for what percentage of the current content in any given article. You could multiply this by some metric of how valuable a page is (e.g. number of page views, or number of sales in the shop that originated from that page). Repeat for every page on the wiki, and you can work out what percentage each user contributed to each part of the wiki; any share of profits could then be proportional to that number. At least, that is one possible model; a rough sketch of the calculation follows below.
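
To make the arithmetic concrete, here is a minimal sketch in Python. It is purely illustrative: the page names, weights, and pool size are invented, and it assumes the eligibility filtering and blame-map analysis described above have already been done.

    # Hypothetical sketch only: assumes the eligibility filtering (logged in,
    # 100+ edits, not blocked) and the blame-map analysis have already produced
    # a per-page breakdown of who is responsible for the current content.

    # Fraction of each page's current content attributable to each user.
    authorship = {
        "Lightsaber": {"Alice": 0.70, "Bob": 0.30},
        "Wookiee":    {"Alice": 0.10, "Carol": 0.90},
    }

    # Value weight per page, e.g. page views or shop sales originating there.
    page_value = {"Lightsaber": 50000, "Wookiee": 10000}

    profit_pool = 1000.00  # share of profits set aside for contributors

    def reward_shares(authorship, page_value, profit_pool):
        """Split the pool in proportion to value-weighted authorship."""
        scores = {}
        for page, authors in authorship.items():
            weight = page_value.get(page, 0)
            for user, fraction in authors.items():
                scores[user] = scores.get(user, 0.0) + fraction * weight
        total = sum(scores.values())
        return {user: profit_pool * s / total for user, s in scores.items()}

    print(reward_shares(authorship, page_value, profit_pool))
    # {'Alice': 600.0, 'Bob': 250.0, 'Carol': 150.0}

With these invented numbers, Alice receives the largest payout because most of her surviving text is on the heavily-viewed page, which is exactly the behaviour you would want from a value-weighted scheme.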

So my question is this: Do you think this model could work? If yes, why? If not, why not? Could it be improved? If so, how? Could it be simplified? Or is there a better model – preferably something other than Google ads?

Venture Capitalists have a valuation for the Wikipedia

So in the course of a conversation last week, I heard second-hand that VCs apparently have a valuation for the Wikipedia: Your favourite encyclopedia, condensed down into a single dollar figure. I suspect it was a valuation for just the English Wikipedia, but I’m not sure. And the dollar amount? Drum-roll please: Four billion US dollars.

Now, when I heard this, the following thoughts went through my mind:

  • Amusement: Are you serious? VCs have assigned a dollar value to it? Why would they do that? What kind of warped world-view needs to reduce every single thing in existence down to a dollar value?
  • Bafflement: How did you arrive at that specific figure? Did someone indicate that they were willing to pay 3.5 billion for it, but you reckoned you could push them a little bit higher? Or did you estimate the total hours that people have put into it, and then estimate the cost of paying people to reproduce it? Or some other method? Enquiring minds want to know!
  • Economic rationalist: Something is only worth what someone else will pay for it. If nobody will pay 4 billion, then as a simple statement of fact, it is not worth 4 billion. So who would pay four billion for it?
  • Entrepreneurial: 4 billion? Tell you what, if there are any buyers out there desperately wanting to purchase the Wikipedia, I’ll sell you a Wikipedia clone for only 2 billion, with 10% payable up-front as a deposit. You save 2 billion: it’s a bargain, at half the price! And to set up the clone, I’d simply approach the techs who set up the original Wikipedia, set up a partnership company to divide all profits equally between all the participants, and set up a well-resourced wiki farm, on nice servers, in some nice data centres on a number of continents, load up the most recent dump of the Wikipedia, and purchase a live Wikipedia feed from the WMF to make sure it was completely up-to-date, and call it something like “encyclopedia 3.0”. I’m sure most of the techs would be happy with this (who doesn’t have a student loan to repay, or credit card bills due, or want to take a holiday, or buy a bigger home, or a newer car, or want to buy some new gadget: whatever it is, everyone has something, and millions of dollars would buy a lot of it), and if there are purchasers, they should be happy too at the great price: so everybody wins!

Comparing compression options for text input

If you’re compressing data for backups, you probably only care about 4 things:

  1. Integrity: The data that you get out must be the same as the data that you put in.
  2. Disk space used by the compressed file: The compressed file should be as small as possible.
  3. CPU or system time taken to compress the file: The compression should be as quick as possible.
  4. Memory usage whilst compressing the file.

Integrity is paramount (anything which fails this should be rejected outright). Memory usage is the least important, because any compression method that uses too much RAM will automatically be penalised for being slower (because swapping to disk is thousands of times slower than RAM).

So essentially it comes down to a trade-off between disk space and time taken to compress. I looked at how a variety of compression tools available on a Debian Linux system compared: Bzip2, 7-zip, PPMd, RAR, LZMA, rzip, zip, and Dact. My test data was the data I was actually interested in storing: SQL database dump text files, kept for archival purposes. In this case I used a 1 gigabyte SQL dump file, which is typical of the input. A harness along the lines of the sketch below can run this kind of comparison.
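
The sketch below is illustrative rather than the exact script I used: the input file name is a placeholder, only three of the tools are shown, and the integrity check (decompressing and comparing against the original) is left out. The flags are each tool’s standard maximum-compression options.

    # Benchmark sketch: run each compressor at maximum compression over the
    # same input, recording child-process CPU time and compressed size.
    import os
    import subprocess

    INPUT = "dump.sql"  # placeholder for the ~1 GB SQL dump

    # (label, command, output file): standard maximum-compression invocations
    CANDIDATES = [
        ("bzip2", ["bzip2", "-9", "-k", INPUT], INPUT + ".bz2"),
        ("7-zip", ["7z", "a", "-mx=9", INPUT + ".7z", INPUT], INPUT + ".7z"),
        ("rar",   ["rar", "a", "-m5", INPUT + ".rar", INPUT], INPUT + ".rar"),
    ]

    for label, command, output in CANDIDATES:
        before = os.times()
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        after = os.times()
        # CPU time consumed by the child compressor process
        cpu = (after.children_user - before.children_user) \
            + (after.children_system - before.children_system)
        size_mb = os.path.getsize(output) / (1024 * 1024)
        print(f"{label:8s} {cpu:8.1f} s CPU {size_mb:8.1f} MB")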

The graph of compression results, comparing CPU time taken versus disk space used, is below:
[Compression comparison graph]
Note: Dact performed very badly, taking 3.5 hours, and using as much disk space as Bzip2, so it has been omitted from the results.
Note: Zip performed badly – it was quick at 5 minutes, but at 160 MB it used too much disk space, so it has been omitted from the results.

What the results tell me is that, rather than using bzip2, both RAR’s maximum compression (a quick compression that’s pretty space-efficient) and 7-zip’s maximum compression (a slow compression that’s very space-efficient) are good options for large text inputs like SQL dumps.