Wikimania 2007 talk notes: "Embeddable Wiki Engine; Proof of Concept", by Ping Yeh, Google Taiwan R&D Centre.
Want to make a Wikipedia off-line client.
Problem: Currently only one piece of software is guaranteed to view the database dumps correctly, i.e. MediaWiki.
Wants a reusable MediaWiki parser. This would:
- Ensure the correctness of the parser
- Reduce the manpower required
Showed a diagram of the typical software architecture of a wiki system.
MediaWiki is tightly coupled to an SQL engine. But an embeddable engine can only really assume a flat file for data storage.
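To make the flat-file point concrete, here is a minimal Python sketch of what page storage without an SQL engine could look like. This is an assumption for illustration only, not something shown in the talk: the `FlatFileStore` class, its `put`/`get` methods and the `.idx` index file are all hypothetical names.

```python
import json
import os


class FlatFileStore:
    """Hypothetical page store: one data file plus a JSON index of byte offsets."""

    def __init__(self, path: str):
        self.path = path
        self.index_path = path + ".idx"
        self.index = {}
        # Reload the title -> [offset, length] index if one already exists.
        if os.path.exists(self.index_path):
            with open(self.index_path, encoding="utf-8") as f:
                self.index = json.load(f)

    def put(self, title: str, wikitext: str) -> None:
        data = wikitext.encode("utf-8")
        # Append the raw wikitext and remember where it starts.
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(data)
        self.index[title] = [offset, len(data)]
        with open(self.index_path, "w", encoding="utf-8") as f:
            json.dump(self.index, f)

    def get(self, title: str) -> str:
        offset, length = self.index[title]
        # A single seek and read replaces the SQL query.
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length).decode("utf-8")
```

Rewriting a page here just appends a new copy and repoints the index, which is probably acceptable for a read-mostly off-line client but would need compaction in anything more serious.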
Project EWE is the code name of the project to test some of these ideas. It is an attempt to make the components and the wiki engine reusable from many programming languages. A preliminary version is ready.
Split the parser into a parser (transforms wiki mark-up into a document tree), and a formatter.
The document tree uses DocNodes: page -> section -> header / paragraph / list. Each piece is separate, so each can be replaced independently.
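A minimal Python sketch of such a tree, to show the page -> section -> header / paragraph / list nesting. Only the name DocNode comes from the talk; the fields (`kind`, `text`, `children`), the extra "item" node kind and the helper functions are assumptions made for this example, not EWE's actual layout.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class DocNode:
    """One node of the document tree (hypothetical layout)."""
    kind: str                 # "page", "section", "header", "paragraph", "list", "item"
    text: str = ""            # leaf content; empty for pure containers
    children: List["DocNode"] = field(default_factory=list)

    def add(self, child: "DocNode") -> "DocNode":
        """Attach a child and return it, so nested nodes can be built in place."""
        self.children.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "DocNode"]]:
        """Yield (depth, node) pairs in document order."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


def build_example() -> DocNode:
    page = DocNode("page", "Example")
    section = page.add(DocNode("section"))
    section.add(DocNode("header", "Introduction"))
    section.add(DocNode("paragraph", "Some text."))
    items = section.add(DocNode("list"))
    items.add(DocNode("item", "first"))
    items.add(DocNode("item", "second"))
    return page


def outline(root: DocNode) -> str:
    """Render the node kinds as an indented outline, two spaces per level."""
    return "\n".join("  " * depth + node.kind for depth, node in root.walk())
```

Because every node is a plain, separate object, swapping one out is just an assignment into its parent's `children` list, which is the kind of replaceability the talk was after.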
MediaWiki parser: Uses GNU flex to specify the syntax, based on the help pages. No template support yet. A hand-written parser turns the tokens into a document tree.
HTMLformatter: trivial conversion from DocTree to HTML tags.
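As a rough illustration of the pipeline described here (tokenise, hand-parse into a tree, format to HTML), a minimal Python sketch follows. It is not EWE code: regular expressions stand in for the flex specification, only level-2 headers, bullet items and plain text lines are recognised, and the names `TOKEN_RULES`, `tokenize`, `parse`, `to_html` and the `DocNode` fields are all made up for the example.

```python
import re
from dataclasses import dataclass, field
from html import escape
from typing import List, Tuple

# Token rules, tried in order on each line (a stand-in for a flex specification).
TOKEN_RULES = [
    ("HEADER", re.compile(r"^==\s*(.+?)\s*==\s*$")),
    ("ITEM", re.compile(r"^\*\s*(.+)$")),
    ("BLANK", re.compile(r"^\s*$")),
    ("TEXT", re.compile(r"^(.*)$")),
]


@dataclass
class DocNode:
    kind: str
    text: str = ""
    children: List["DocNode"] = field(default_factory=list)


def tokenize(source: str) -> List[Tuple[str, str]]:
    """Turn wiki mark-up into (token kind, captured text) pairs, one per line."""
    tokens = []
    for line in source.splitlines():
        for kind, pattern in TOKEN_RULES:
            match = pattern.match(line)
            if match:
                tokens.append((kind, match.group(1) if match.lastindex else ""))
                break
    return tokens


def parse(tokens: List[Tuple[str, str]]) -> DocNode:
    """Hand-written parser: build page -> section -> header / paragraph / list."""
    page = DocNode("page")
    section = DocNode("section")          # holds any text before the first header
    page.children.append(section)
    current_list = None
    for kind, value in tokens:
        if kind != "ITEM":
            current_list = None           # any non-item token ends the open list
        if kind == "HEADER":
            section = DocNode("section", children=[DocNode("header", value)])
            page.children.append(section)
        elif kind == "ITEM":
            if current_list is None:
                current_list = DocNode("list")
                section.children.append(current_list)
            current_list.children.append(DocNode("item", value))
        elif kind == "TEXT":
            # Simplification: every text line becomes its own paragraph.
            section.children.append(DocNode("paragraph", value))
    page.children = [s for s in page.children if s.children]  # drop empty sections
    return page


# The formatter is little more than a lookup from node kind to HTML tag.
TAGS = {"page": "div", "section": "div", "header": "h2",
        "paragraph": "p", "list": "ul", "item": "li"}


def to_html(node: DocNode) -> str:
    tag = TAGS[node.kind]
    inner = escape(node.text) + "".join(to_html(child) for child in node.children)
    return f"<{tag}>{inner}</{tag}>"
```

Calling `to_html(parse(tokenize(text)))` runs the whole chain. Since the formatter only ever sees the tree, an XML or other formatter could be swapped in without touching the tokeniser or parser.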
Intending to make this a wiki library that can be called from C++ or PHP.
Things they want to add:
- Language bindings for Python.
- Moin-Moin compatibility.
- XML formatter.
- MediaWiki formatter (need to add templates).
- A search engine.
Problems:
- Compatibility with extensions, e.g. the <math> extension changes output for MediaWiki.
- Wikipedia has too many extensions. Re-implementing each extension would be a lot of work, and is essentially a duplication of effort.
Note: I'll post some more notes for more talks in a few days' time.