Talk: A content driven reputation system for Wikipedia

Wikimania 2007 talk notes: “A content driven reputation system for Wikipedia.” by Luca De Alfaro.

Trusting content. How can we determine automatically whether an edit was probably good or probably bad?

Authors of long-lived content gain reputation.

Authors who get reverted lose reputation.

Longevity of edits.

Labels each bit of text by the revision in which it was added.

Text is considered short-lived when 20% or less of it survives to the next revision.

Have to keep track of live / current text and the deleted text.

Did you bring the article closer to the future? Did you make the page better? If so reward. If you went against the future, then you should be punished.

Ratio that was kept versus ratio that was reverted.

+1 = everything you changed was kept

-1 = everything you did was reverted.
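To make the scoring idea concrete, here is a rough sketch of how I imagine the kept-versus-reverted ratio could be computed (the word-level edit distance and the exact formula are my guesses, not necessarily what De Alfaro's system actually uses):

# Sketch only: score an edit by how much of it survives into a later "future"
# revision. +1 = everything you changed was kept, -1 = it was all reverted.

def edit_distance(a, b):
    # word-level Levenshtein distance between two texts
    a, b = a.split(), b.split()
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, 1):
        cur = [i]
        for j, word_b in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                        # deletion
                           cur[j - 1] + 1,                     # insertion
                           prev[j - 1] + (word_a != word_b)))  # substitution
        prev = cur
    return prev[-1]

def edit_quality(before, after, future):
    # positive if the edit moved the page toward the future revision,
    # negative if the future moved the page back toward "before"
    work_done = edit_distance(before, after)
    if work_done == 0:
        return 0.0
    gain = edit_distance(before, future) - edit_distance(after, future)
    return max(-1.0, min(1.0, float(gain) / work_done))  # keep within [-1, +1]

print(edit_quality("the cat sat", "the big cat sat", "the big cat sat"))  # +1.0, edit kept
print(edit_quality("the cat sat", "teh cat sat xyz", "the cat sat"))      # -1.0, edit reverted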

Run it on a single CPU, no cluster :-)

Reputation – data shown for only registered users.

89% of the edits by low reputation users were against the future direction of the page.

Versus only 5% of the edits by high reputation users were against the future direction of the page.

I.e. there is a reasonable correlation between the reputation of the user and whether their edit was with the future direction of the page.

Can catch around 80% of the things that get reverted.

See demo: http://trust.cse.ucsc.edu/

Author lends 50% of their reputation to the text they create.

Want an option or special page to show the trust level of a page. Time to process an edit is less than 1 second. Storage required is proportional to the size of the last edit.

Instead of trusting or not trusting a whole article, you can have partial trust in parts of an article. This could provide a way of addressing concerns about whether Wikipedia can be trusted – vandalism would likely be flagged as untrusted content.
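To make the trust-colouring idea concrete, a very rough sketch (the 50% factor is from the talk; the threshold and everything else are my own guesses, not the actual algorithm):

# Sketch only: freshly added words start at half their author's reputation,
# and anything below some threshold could be highlighted as untrusted.

UNTRUSTED_BELOW = 1.0  # arbitrary threshold for flagging text

def initial_trust(author_reputation):
    # the author lends 50% of their reputation to the text they create
    return 0.5 * author_reputation

# (word, trust) pairs for a page; existing text keeps whatever trust it had
page = [("Tokyo", 4.0), ("is", 4.0), ("the", 4.0), ("capital", 4.0)]
page += [(word, initial_trust(0.2)) for word in ["of", "Mars"]]  # low-reputation author

flagged = [word for word, trust in page if trust < UNTRUSTED_BELOW]
print(flagged)  # ['of', 'Mars']: candidates for "untrusted" highlighting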

Speaker would like a live feed of Wikipedia Recent Changes to continue and improve this work. Or perhaps it could run on the toolserver? Erik later emphasised that a good and friendly UI would make it easier to get weight behind this tool.

Link to talk’s page.

Talk: Using Natural Language Processing to determine the quality of Wikipedia articles

Okay, I know that Wikimania is over, but I still have a backlog of talk notes that I simply didn’t have time to post whilst at the conference, PLUS WikiBlogPlanet stalled for 3 days whilst I was at the conference. Yes, I know, I suck. I’ll try to make up for that by posting my talk notes over the next few days. So I know they’re not super-fresh, but they are only a few days old. Oh, and they’re in note form, rather than complete sentences.

Talk notes: “Using Natural Language Processing to determine the quality of Wikipedia articles” by Brian Mingus. Speaker’s background is in studying brains.

Quality should not be imposed. It should come from the community.

Brains transform information like this: letters -> words -> sentences -> paragraphs -> sections -> discourse -> argument structure.

Wikipedia has many more readers than editors.

Learning is critical. See [[Maximum Entropy]]. Similar to neural networks. Neurons that fire together wire together. The brain is an insanely complicated machine that finds meta-associations. More examples of a phenomenon lead to more smartness. Things associated with relevant features get strengthened; things that are irrelevant get weakened. Each brain is unique, so everyone’s idea of quality is different. Each individual brain is invaluable, based on a lifetime of experience.

Some qualitative measures of article quality:

  • Clarity

  • flow

  • precision

  • unity (outline)

  • authority (footnotes)

  • and many more things besides….

Quantitative versus qualitative.

Picking a good apple for beginners:

  1. Apple should be round

  2. Apple should have even colouring

  3. Apple should be firm

Takes about 10 years to become an expert in something. E.g. an apple farmer for 10 years would not go through a check-list – they “just know” if it’s a good apple.

Some example quantitative indicators of article quality:

  • Number of editors

  • images

  • length

  • number of internal and external links

  • featured article status.

Has a function for testing each of these and many more – each function gets passed the wikitext, the HTML text, and the plain text (as 3 arguments).
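To illustrate the shape of those feature functions (names and implementations are my own guesses for illustration, not the actual code from the talk):

# Sketch: each feature function takes (wikitext, htmltext, plaintext)
# and returns a number; the model then learns a weight for each feature.
import re

def num_internal_links(wikitext, htmltext, plaintext):
    return len(re.findall(r"\[\[[^\]]+\]\]", wikitext))

def num_images(wikitext, htmltext, plaintext):
    return len(re.findall(r"\[\[(?:Image|File):", wikitext))

def article_length(wikitext, htmltext, plaintext):
    return len(plaintext)

FEATURES = [num_internal_links, num_images, article_length]

def feature_vector(wikitext, htmltext, plaintext):
    # run every feature function over one article
    return [f(wikitext, htmltext, plaintext) for f in FEATURES]

print(feature_vector("[[Tokyo]] is the capital of [[Japan]].",
                     "<p>Tokyo is the capital of Japan.</p>",
                     "Tokyo is the capital of Japan."))  # [2, 0, 30]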

On the English Wikipedia, the “editorial team” hand-rates the quality of articles. They also rate the importance of each article. E.g. there are around 1428 Featured Articles on the English Wikipedia.

Wanted some software that would try to predict the ratings of each article.

Assigns weights to each feature.

The model learned that Wikipedia was mostly low-quality material, and therefore in the first iteration it learned to classify everything as bad.

To prevent this, learning had to be restricted to a subset of articles, because the number of stubs vastly outweighs the number of complete articles in the Wikipedia.

Used 3 categories / ratings for quality: 1) Good, 2) Okay, 3) Bad.

Can test how well the model works using a “confusion matrix”, which shows how confused the model is by showing how much it gets wrong (e.g. classifying a Featured Article as “bad”, or a stub article as “good”).
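To illustrate what a confusion matrix looks like (made-up data using scikit-learn, not the talk's actual results):

# Rows are the true rating, columns are the model's prediction, so anything
# off the diagonal is a mistake such as calling a featured article "bad".
from sklearn.metrics import confusion_matrix

true_ratings = ["good", "good", "okay", "bad", "bad", "bad"]
predictions  = ["good", "okay", "okay", "bad", "bad", "good"]

print(confusion_matrix(true_ratings, predictions, labels=["good", "okay", "bad"]))
# [[1 1 0]
#  [0 1 0]
#  [1 0 2]]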

Result: The model was learning.

PLoS ONE has an alternate rating system. They use 3 categories:

  • Insight

  • Style

  • ?? (sorry, I missed this item)

Need to avoid group-think – raters should see ratings from others only after giving their own rating. You can’t allow a rater to know everyone else’s ratings before they rate, or else you can get weird effects (like groupthink) – avoid this by keeping the ratings independent.

Performance: Takes 24 hours to run on 120,000 articles, on a cluster with 26 nodes.

Link to talk’s page.

Wikimania Talk notes: “Where have all the editors gone?”

I’ll copy and paste my notes for some of the talks at Wikimania 2007 here, in case it’s helpful so that everyone can follow what’s going on. As such they will be in point / summary form, rather than well-formed prose:

 

Talk: ”Where have all the editors gone?” by Seth Anthony. Background in chemistry education. A user since 2003, he has seen people come and go. This raised some questions for him:

 

Who adds real content to the Wikipedia? Not just correcting typos and wikification.

 

Not all edits are created equal. Some are negative (e.g. vandalism). Some are positive (e.g. tweaks, spelling, formatting, wikifying), and some are really positive (adding content). Some have possible value (e.g. admin / bureaucracy / discussion).

 

Made a study using a sample of edits to the Wikipedia. (Size of sample not clear.)

Facts & figures on findings:

  • 28% of edits: outside article namespace.
  • 10% article talk pages.
  • 62% article namespace.

(i.e. 1/3 of the edits are not about the articles)

 

Breakdown of edits:

  • 5% vandalism
  • 45% of edits are tweaking / minor changes / adding categories.
  • 12% content creation. Of that, 10% is adding content to already existing articles, and 2% is creating new articles.
  • Rest? (Probably discussion?)

 

So only 12% of edits create fresh content.

 

Of these 12%, was most interested in this, so broke this down:

  • 0% were made by admins
  • 69% were registered users.
  • 31% were made by anon (non-logged-in) users.

 

… and only 52% were by people who had a user page. I.e. only half of the people had a name-based online identity.

 

Editors are not homogeneous.

 

Content creators versus admins (for the English Wikipedia, in 2007):

                        Admins     Content creators
Number of edits         12,900     1,700               (admins edit more)
% in main namespace     51%        81%                 (admins spend a smaller proportion of their time on content)
Average edits per page  2.27       2.2                 (about the same)
Edits per day           16         5                   (admins more active)

 

Breakdown of each group:

 

Breakdown of Content creators –

  • Anons – 24% of high content edits are anon users. Drive-by editors. Who are they? One time editors. Are they normal editors who are accidentally logged out?
  • 28% are “dabblers”: fewer than 150 edits, editing for more than 1 month, not very likely to have a user page.
  • 48% are “wikipedians”. More than 1 edit per day. Almost all have edited within the last week. They create articles. They tend to have a focus area e.g. “Swedish royalty” – great work in a specific area. They are subject area editors. Generally have > 500 edits.

 

Admins breakdown – have 2 groups – “A” and “B”

  • Admins A: 70% of admins make 20%-60% of their edits in the article space.
  • Admins B: 30% of admins have 60-90% edits in the article space. Only 1/3 of admins are in this class.

 

Were group “B” admins once “anon wikipedians”? The short answer seems to be “yes”.

  • Admins A: Early edits were less on articles.
  • Admins B: Early edits were more on articles.

 

So 4 distinct groups of editors:

  • Anons / drive-by editors
  • Occasional dabblers
  • Subject area editors – Anons + Admins B
  • Professional bureaucrats – Admins A.

 

A possible early indicator: Admins A create their user page sooner than Admins B :-)

Does this wiki business model work?

Evan Prodromou has talked and presented previously about the Commercialisation of wikis (good round-up here, slides here). He identifies four different wiki business models:

  • Service provider (Wikispaces, Wetpaint, PBWiki) [business model: either supported by ads, e.g. Google ads, or charge for hosting the wiki]
  • Content hosting (wikiHow, Wikitravel, Wikia) [business model: mostly advertising, e.g. Google ads]
  • Consulting (Socialtext) [business model: supported by income from wiki consulting for enterprises]
  • Content development (WikiBiz) [business model: create pages for a fee]

The reason I bring this up is that a couple of weeks ago I started doing some work for a wiki-based start-up that is trying to work out a different business model from the above, but one also based on wikis.

This model would financially reward people who contributed to the wiki, with a share of profits from the business. A proportion of revenue generated would be set aside, with the purpose of rewarding people who usefully participated in the wiki, and added some value to the wiki (either as users, or administrators, or as programmers). The revenue would come from selling things directly through a website associated with the wiki (as opposed to hosting other people’s ads). The wiki would drive traffic, and the direct sales would generate revenue. They’re essentially trying to work out a structure that would share the rewards based on contributions to a wiki. In addition to this, the content would probably be free content (most probably licensed under either Creative Commons or GFDL).

So for example, suppose hypothetically the Star Wars Wookieepedia had a shop associated with it that sold Star Wars merchandise and doo-dads. People would go to the wiki because it had good content, and whilst there they might buy a trinket. Part of the reason they bought that trinket was that the wiki had good content to make them come there and find the associated shop in the first place. So essentially the two things support each other – the things being sold are relevant to the wiki, and part of the profit from the things being sold goes to support the people who made a kick-ass wiki.

The question is: how do you structure this so that it works, so that people who contribute good content, as well as getting free content and being part of a community of wiki users, are also financially rewarded somehow if the site is successful? After all, it’s the users who generate the content, so if it’s a for-profit endeavour, why shouldn’t they also share in the reward if it does well?

Working out how to reward, and whom to reward, though, is quite tricky. Any system should be fair and transparent. One possibility would be to require that users be logged in (so from the article history you know who made which edits), that they have an email address (so that you can contact them), that they have at least 100 edits (probably not much point bothering below a certain number of edits), and that they not be blocked (since if they were blocked, they probably did something destructive). Then you could use a blamemap, or something like IBM’s history flow tool, to work out which user was directly responsible for what percentage of the current content in any given article. And you could multiply this by some metric of how valuable a page is (e.g. number of page views, or number of sales in the shop that originated from that page). Repeat for every page on the wiki, and then you can work out what percentage each user contributed to each part of the wiki, and any share of profits could be proportional to that number. At least, that is one possible model.
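To make this concrete, here is a rough sketch (with made-up numbers and names) of how such a payout calculation might work:

# Sketch: fraction of each page's current content attributable to each
# eligible user, multiplied by a per-page value metric (page views here,
# but it could be shop sales driven by that page), normalised into shares
# of the profit pool set aside for contributors.

blame = {
    "Article_A": {"alice": 0.6, "bob": 0.4},
    "Article_B": {"alice": 0.1, "carol": 0.9},
}
page_value = {"Article_A": 10000, "Article_B": 2500}  # e.g. monthly page views
profit_pool = 5000.0  # amount set aside for contributors this period

raw_scores = {}
for page, shares in blame.items():
    for user, fraction in shares.items():
        raw_scores[user] = raw_scores.get(user, 0.0) + fraction * page_value[page]

total = sum(raw_scores.values())
payouts = {user: profit_pool * score / total for user, score in raw_scores.items()}
print(payouts)  # {'alice': 2500.0, 'bob': 1600.0, 'carol': 900.0}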

So my question is this: Do you think this model could work? If yes, why? If not, why not? Could it be improved? If so, how? Could it be simplified? Or is there a better model – preferably something other than Google ads?

Venture Capitalists have a valuation for the Wikipedia

So in the course of a conversation last week, I heard second-hand that VCs apparently have a valuation for the Wikipedia: Your favourite encyclopedia, condensed down into a single dollar figure. I suspect it was a valuation for just the English Wikipedia, but I’m not sure. And the dollar amount? Drum-roll please: Four billion US dollars.

Now, when I heard this, the following thoughts went through my mind:

  • Amusement: Are you serious? VCs have assigned a dollar value to it? Why would they do that? What kind of warped world-view needs to reduce every single thing in existence down to a dollar value?
  • Bafflement: How did you arrive at that specific figure? Did someone indicate that they were willing to pay 3.5 billion for it, but you reckoned you could push them a little bit higher? Or did you estimate the total hours that people have put into it, and then estimate the cost of paying people to reproduce it? Or some other method? Enquiring minds want to know!
  • Economic rationalist: Something is only worth what someone else will pay for it. If nobody will pay 4 billion, then as simple statement of fact, it is not worth 4 billion. So who would pay four billion for it?
  • Entrepreneurial: 4 billion? Tell you what, if there are any buyers out there desperately wanting to purchase the Wikipedia, I’ll sell you a Wikipedia clone for only 2 billion, with 10% payable up-front as a deposit. You save 2 billion: it’s a bargain, at half the price! And to set up the clone, I’d simply approach the techs who set up the original Wikipedia, set up a partnership company to divide all profits equally between all the participants, and set up a well-resourced wiki farm, on nice servers, in some nice data centres on a number of continents, load up the most recent dump of the Wikipedia, and purchase a live Wikipedia feed from the WMF to make sure it was completely up-to-date, and call it something like “encyclopedia 3.0”. I’m sure most of the techs would be happy with this (who doesn’t have a student loan to repay, or credit cards bills due, or want to take a holiday, or buy a bigger home, or a newer car, or want to buy some new gadget: whatever it is, everyone has something, and millions of dollars would buy a lot of it), and if there are purchasers, they should be happy too at the great price: so everybody wins!

Comparing compression options for text input

If you’re compressing data for backups, you probably only care about 4 things:

  1. Integrity: The data that you get out must be the same as the data that you put in.
  2. Disk space used by the compressed file: The compressed file should be as small as possible.
  3. CPU or system time taken to compress the file: The compression should be as quick as possible.
  4. Memory usage whilst compressing the file.

Integrity is paramount (anything which fails this should be rejected outright). Memory usage is the least important, because any compression method that uses too much RAM will automatically be penalised for being slower (because swapping to disk is thousands of times slower than RAM).

So essentially it comes down to a trade-off of disk space versus time taken to compress. I looked at how a variety of compression tools available on a Debian Linux system compared: Bzip2, 7-zip, PPMd, RAR, LZMA, rzip, zip, and Dact. My test data was the data I was interested in storing: SQL database dump text files, being stored for archival purposes, and in this case I used a 1 Gigabyte SQL dump file, which would be typical of the input.
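For reference, here is roughly the kind of script that can drive such a comparison (a sketch only: the tool flags are from memory and may need adjusting, and it measures wall-clock rather than CPU time):

# Sketch: run a few compressors over the same input and report elapsed time
# and compressed size. Adjust the commands for whatever is installed.
import os
import subprocess
import time

INPUT = "dump.sql"  # e.g. a 1 GB SQL dump

candidates = [
    # (label, command, output file the command produces)
    ("bzip2 -9", ["bzip2", "-9", "-k", INPUT], INPUT + ".bz2"),
    ("7z -mx=9", ["7z", "a", "-mx=9", INPUT + ".7z", INPUT], INPUT + ".7z"),
    ("rar -m5",  ["rar", "a", "-m5", INPUT + ".rar", INPUT], INPUT + ".rar"),
]

for label, command, outfile in candidates:
    start = time.time()
    subprocess.check_call(command)
    elapsed = time.time() - start
    size_mb = os.path.getsize(outfile) / (1024.0 * 1024.0)
    print("%-10s %8.1f seconds %10.1f MB" % (label, elapsed, size_mb))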

The graph of compression results, comparing CPU time taken versus disk space used, is below:

[Graph: compression comparison (CPU time versus compressed size)]

Note: Dact performed very badly, taking 3.5 hours and using as much disk space as Bzip2, so it has been omitted from the results.
Note: Zip performed badly – it was quick at 5 minutes, but at 160 MB it used too much disk space, so it has been omitted from the results.

What the results tell me is that rather than using bzip2, either RAR’s maximum compression (for a quick compression that’s pretty space-efficient), or 7-zip’s maximum compression (for a slow compression that’s very space-efficient), are both good options for large text inputs like SQL dumps.

Google Developer Day 2007

Went to the Google Developer Day 2007 yesterday. It was held at 9 locations worldwide. The Sydney one was the second-largest one, with 700 developers attending.

To summarize the event down to a sound bite, it was about Google APIs and mashups. (And I’m sort of hoping I won’t hear the word “mashup” again for at least a week…)

Here are my notes, and I have indented the bits that seemed to me to potentially be relevant or useful to MediaWiki or the Wikipedia:

  • All their APIs are at http://code.google.com/apis/
  • GData / Google Data APIs – provides simple read/write access via HTTP, and authentication with Google. See also S3 for a similar idea (docs here).
  • Google Web Toolkit. Also known as GWT, and pronounced “gwit”. Converts Java code to JavaScript, in a cross-browser compatible way. AJAX development can be painful because of browser compatibility problems. GWT is one solution to this problem. Licensed under Apache 2.0. Develop in any Java IDE, but recommend Eclipse. Launched a year ago (almost exactly). Using Java as source language because of its strong typing.
  • Google Gears was presented by the creator of Greasemonkey. Gears is a browser plugin / extension, for IE + Firefox + Safari, that allows web apps to run offline. I.e. it can extend AJAX applications to run offline, with access to persistent local SQL data storage. Means users don’t always need to be online, as there is access to persistent offline storage for caching data from the web, and for caching writes back to the web. Released under the BSD license. Uses an API that they want to become a standard. The idea is to increase reliability, increase performance, be more convenient, and cover all the times people are offline (which is most of the time for most people). It’s an early release, with rough edges. Local storage is used as a buffer, and there is a seamless online-offline transition. For the demo he disconnected from the net. What talks to what: UI <-> local DB <-> sync <-> XMLHttpRequest. Gears has 3 modules – LocalServer (starts apps), Database (SQLite local storage), and WorkerPool (provides non-blocking background JavaScript execution). WorkerPool was quite interesting to me – non-blocking execution that overcomes a limitation of JS: different threads that don’t hog the CPU … really want the whole Firefox UI to use something like this, so that one CPU-hogging tab doesn’t cause the whole browser to choke.
    • Thoughts on how Gears could potentially be applied to MediaWiki: An offline browser and editor. Will sync your edits back when go online, or when the wiki recovers from a temporary failure. Could also cache some pages from the wiki (or all of them, on a small enough wiki) for future viewing. Basically, take your wiki with you, and have it shared – the best of both worlds.
  • Google Mapplets. Mapplets allows mashups inside of google maps, instead of being a google map inserted into a 3rd party web page. “Location is the great integrator for all information.” URL for preview. Can use KML or GeoRSS for exporting geographic information.
    • Thoughts on how to use this for the Wikipedia: Geotagging in more data could be good. E.g. Geotagging all free images.
  • Google Maps API overview. A lot of the maps.google.com development is done in Sydney. Talk involved showing lots of JS to centre google maps, pan maps, add routes, add markers, change marker zones, add custom controls, show / hide controls. A traffic overlay for showing road congestion is not available for Sydney yet, but will be available soon. Some applications of the Maps API: Walk Jog Run – see or plan walking or running routes – example; Set reminders for yourself ; Store and share bike routes.
    • Thoughts on applications for the Wikipedia: Perhaps a bot that tries to geolocate all the articles about locations in the world? Will take a freeform article name string, and convert to longitude + latitude, plus the certainty of the match (see page 16 of the talk for example of how to do this). Could get false matches – but could potentially be quite useful.
  • Google gadgets. Gadgets are XML content.
  • KML + Google Earth overview. KML = object model for the geographic representation of data. “80% of data has some locality, or connection to a specific point on the earth”. Googlebot searches and indexes KML. KML is a de facto standard, and they are working towards making it a real standard.
    • Already have a Wikipedia layer. It seems to be 3 months out of date, and based on the data dumps, though.

Misc stuff:

  • Google runs a Linux distro called Goobuntu (Google’s version of Ubuntu).
  • Summer of code – had “6338 applications from 3044 applicants for 102 open source projects and 1260 mentors selected 630 students from 456 schools in 90 countries”.
  • My friend Richard, one of the organisers of Sydney BarCamp, spoke with some of the Google guys, & they were quite enthusiastic about maybe hosting the second Sydney BarCamp on a new floor they’re adding to Google’s offices in late July or early August. If that works out, it could be good … although if it didn’t clash with Wikimania, that would be better!
  • Frustration expressed by many people about the way the Australian govt tries to sell us our own data (that our tax dollars paid for in the first place), restricting application of that data. Example: census data. Much prefer the public domain approach taken in the US.

Mac ads, Spider-man 3, APEC annoyances

  • Most of the Mac ads are a bit so-so, with the counsellor and the confirm-or-deny one probably being the best. But they are just begging for a comic response like this.
  • Saw Spiderman 3 on Monday. Primary plot themes: forgiveness; wrestling with your internal demons; everyone has a choice; being self-absorbed; revenge; father-child relationship. I enjoyed it, but thought Spiderman 2 was a better film, with its central plot theme of balancing personal life versus the greater good. Oh, and Monday night is my new cinema night – we were 2 of only 8 people in a 420 seat theatre – love it!
  • As part of the APEC summit in September:
    • large parts of Sydney’s CBD will be sealed off (causing traffic gridlock)
    • the mobile phone coverage in the city may be partially jammed (apparently to stop people bombing George Bush via text message)
    • three inner-city train stations will be closed
    • the police have just been given extraordinary stop-and-search powers and the power to incarcerate “suspicious” people without charge until the conference is over (yet another strike against due-process)
    • and no doubt people protesting globalisation will be tempted to run amuck like lunatics smashing windows and burning things

    … Gee, I can hardly wait! Some local politicians have been heard to question why APEC even needs to be held in Sydney at all – couldn’t it be in Canberra instead? I’m inclined to agree; the whole reason Canberra even exists is because 100 years ago, Sydney and Melbourne squabbled like little children over which of them should be the nation’s capital – and the compromise solution was that nobody should be happy, and that a new artificial capital city should be built, at significant taxpayer expense, in the middle of nowhere, half-way between the two cities, thus pissing off everyone equally (seriously – you can’t make stuff like this up). Now, if there are any benefits to my tax dollars subsidising an artificial capital in the middle of nowhere, surely hosting conferences that nobody normal wants should be one of those benefits? If not, then I just have to ask: what the hell is the point of Canberra? … Thus far, the one and only redeeming factor to APEC is that Friday the 7th September has been declared a public holiday – Yay!


Wikipedia FS, plus quick steps for setting up

Wikipedia FS allows you to treat MediaWiki articles like files on a Linux system. You can view the wiki text, and edit the wiki text, all using your standard editors and command-line tools.

Steps for setting up

If you want to have a quick play with Wikipedia FS, here is the series of steps I used today on an Ubuntu 6.06 (Dapper) system to get it installed from scratch. Most of these you can probably cut-and-paste directly onto a test machine, and they should hopefully work on any modern Ubuntu or Debian system:

# Load the FUSE module:
modprobe fuse

# Install the FUSE libs:
apt-get install libfuse-dev libfuse2

# Get, build, and install the python-fuse bindings:
cd /tmp
mkdir python-fuse
cd python-fuse/
wget http://wikipediafs.sourceforge.net/python-fuse-2.5.tar.bz2
bunzip2 python-fuse-2.5.tar.bz2
tar xf python-fuse-2.5.tar
cd python-fuse-2.5
apt-get install python-dev
python setup.py build
python setup.py install

# Get, build, and install WikipediaFS
cd /tmp
mkdir wikipediafs
cd wikipediafs/
wget http://optusnet.dl.sourceforge.net/sourceforge/wikipediafs/wikipediafs-0.2.tar.gz
tar xfz wikipediafs-0.2.tar.gz
cd wikipediafs-0.2
python setup.py install

# Install the “fusermount” program (if not installed):
# First “nano /etc/apt/sources.list”, and uncomment the line to enable installing from universe, e.g. uncomment the line like this: “deb http://au.archive.ubuntu.com/ubuntu/ dapper universe”
apt-get update
apt-get install fuse-utils

# Mount something:
mkdir ~/wfs/
mount.wikipediafs ~/wfs/

# The above line will create a default configuration file. Edit this now:
nano ~/.wikipediafs/config.xml
# You may want to create a throwaway login on the Wikipedia for testing wikipediaFS now. This
# will allow you to edit articles. Or you can use your main account instead of a test account,
# it’s up to you – but I preferred to set up a separate test account. Either way, keep note of your
# account details & password for the <site> block below.
#
# Now, uncomment the example FR Wikipedia site, and change to the English site. For example, the <site> block should look something like this:
# --- START ---
<site>
<dirname>wikipedia-en</dirname>
<host>en.wikipedia.org</host>
<basename>/w/index.php</basename>
<username>Your-test-account-username</username>
<password>whatever-your-password-is</password>
</site>
# --- END ---

# Then unmount and remount the filesystem to ensure the above gets applied:
fusermount -u ~/wfs/
mount.wikipediafs ~/wfs/

Using it

# Can now use the filesystem. For example:
cat ~/wfs/wikipedia-en/Japan | less
# This should show the text of the Japan article.

# See what’s in the sandbox:
cat ~/wfs/wikipedia-en/Wikipedia:Sandbox

# Now edit the sandbox:
nano ~/wfs/wikipedia-en/Wikipedia:Sandbox

# Change some stuff, save, wait about 10 seconds for the save to complete, and (fingers crossed) see if the changes worked.

Probably not production ready

I have to emphasise here that WikipediaFS does seem a little flaky. Sometimes it works, and sometimes it doesn’t. In particular, I wanted to reset the Wikipedia’s sandbox after playing with it from this wiki text:

{{test}}
<!-- Hello! Feel free to try your formatting and editing skills below this line. As this page is for editing experiments, this page will automatically be cleaned every 12 hours. -->

To this wiki text:

{{Please leave this line alone (sandbox talk heading)}}
<!-- Hello! Feel free to try your formatting and editing skills below this line. As this page is for editing experiments, this page will automatically be cleaned every 12 hours. -->

I.e. a one-line change. Simple, you say? … well, sorry, but you’d be wrong! In particular there seems to be something about “}}” and the character before it that caused my WikipediaFS installation to behave very strangely. It would look like an edit had saved successfully, but when you went back into editing the file, all the changes you made would be lost:

[Screenshot: a save which vi thinks has succeeded, but which has actually failed.]

[Screenshot: no indication that the above save has failed, but actually it has.]

So, I tried breaking the above change down into smaller and smaller pieces, until the save worked. Here are the pieces that I could get to work:

  1. Adding “{{Please leave this line alone (sandbox talk heading)” (note no closing “}}”, because it wouldn’t save successfully with that included)
  2. Deleting “{{“
  3. Deleting “t”
  4. Deleting “es”
  5. Deleting the newline
  6. At that stage, I simply could not successfully delete the final “t”, no matter what I did. Eventually I gave up doing this with Wikipedia FS, and deleted the “t” using my normal account and web browser.

So, although WikipediaFS is fun to play with, and is certainly an interesting idea, I do have to caution you that it may sometimes exhibit data-loss behaviour, and so in its current form you might not want to use it as your main editing method.