Our offer to the Wikimedia Foundation and the Wikipedia (technical) community is this: come up with a new and better Wikitext, and use the Sweble Wikitext parser to convert old Wikipedia content to that new format. Naturally, the new Wikitext format should work well with visual editors and similar tools. We have spent more than a year of full-time work on a parser that can handle the complexities of current Wikitext, and it makes no sense to us to create another one. You only need one bridge away from the place you no longer want to be (the current “old” Wikitext) to get to a new and happier place.
CrystalBall is our parser demo: it lets you check out the parser without having to touch any code. It is a simple and easy way to see how we interpret Wikitext.
The general Sweble parser documentation is on the wiki, naturally. Here are a few examples, though, for the hurried among you. Please note that we have not invested in style sheets to make the HTML output look nice or resemble Wikipedia.org’s output (that was not our project goal).
Parsing the generic article (page) ASDF:
- Show original Wikitext
- Show AST after expansion and postprocessing
- Render article after expansion and postprocessing
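To give a feel for what the expansion step does before postprocessing, here is a tiny, self-contained Java sketch. This is not the Sweble API; the template store and the regex-based substitution are our own illustration of the idea that `{{Name}}` invocations are replaced by template bodies before further processing:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ToyExpansion {
    // Hypothetical in-memory template store; a real engine resolves
    // templates from the wiki database.
    static final Map<String, String> TEMPLATES = Map.of(
        "Hello", "''Hello, [[world]]!''"
    );

    // Replace each {{Name}} invocation with the template's body;
    // unknown templates are left as-is.
    static String expand(String wikitext) {
        Matcher m = Pattern.compile("\\{\\{([^{}|]+)\\}\\}").matcher(wikitext);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = TEMPLATES.getOrDefault(m.group(1).trim(), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(body));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("Intro: {{Hello}} End."));
        // Intro: ''Hello, [[world]]!'' End.
    }
}
```

Real expansion is far more involved (parameters, nesting, parser functions), which is exactly why skipping it can change how the remaining markup nests.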
Some other articles:
The ultimate parser deathmatch Wikipedia article page (courtesy of Luca de Alfaro of WikiTrust fame):
- Saxby Chambliss – Render article after postprocessing but without expansion (skipping expansion leads to wrong nesting of tables)
- Saxby Chambliss – Render article after expansion and postprocessing
And finally some XPath queries:
- Extract the section called “Tourism” from the article “France”
- Extract the name of the capital from the “Infobox Country” template invocation of the article “France”
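Such queries run against an XML serialization of the parsed article. As a rough, self-contained illustration, here is what that kind of extraction looks like with the standard `javax.xml.xpath` API; the element names in the toy document below are invented for this sketch and are not Sweble’s actual schema:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathDemo {
    // A toy XML rendering of a parsed article; the real serialization differs.
    static final String AST =
        "<article title='France'>" +
        "  <section><heading>Tourism</heading><body>Visitors...</body></section>" +
        "  <template name='Infobox Country'>" +
        "    <arg name='capital'>Paris</arg>" +
        "  </template>" +
        "</article>";

    // Evaluate an XPath expression against the toy document and
    // return the string value of the first matching node.
    static String query(String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(AST)));
        return (String) XPathFactory.newInstance().newXPath()
            .evaluate(expr, doc, XPathConstants.STRING);
    }

    public static void main(String[] args) throws Exception {
        // Extract the body of the "Tourism" section.
        System.out.println(query("//section[heading='Tourism']/body"));
        // Extract the capital from the "Infobox Country" template invocation.
        System.out.println(query("//template[@name='Infobox Country']/arg[@name='capital']"));
    }
}
```

The point is that once Wikitext is a proper tree, template invocations and sections become addressable data instead of opaque markup.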
Have fun! And please let us know if your favorite article doesn’t do what you think it should do!
We are happy to announce the general availability of the first public release of the Sweble Wikitext parser, available from http://sweble.org.
The Sweble Wikitext parser
- can parse all complex Wikitext, including tables and templates
- produces a real abstract syntax tree (AST); a DOM will follow soon
- is open source, made available under the Apache License 2.0
- is written in Java, using only permissively licensed libraries
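To illustrate what “a real abstract syntax tree” means in contrast to a flat token stream, here is a minimal, hypothetical sketch of AST nodes for a bold-text fragment. These are our own types for illustration, not Sweble’s node classes:

```java
import java.util.List;

public class MiniAst {
    // Minimal node hierarchy; the real parser's node classes are far richer.
    interface Node {}
    record Text(String value) implements Node {}
    record Bold(List<Node> content) implements Node {}
    record Paragraph(List<Node> content) implements Node {}

    // Render a node tree to HTML-like output by walking it recursively.
    static String render(Node n) {
        if (n instanceof Text t) return t.value();
        if (n instanceof Bold b) return "<b>" + renderAll(b.content()) + "</b>";
        if (n instanceof Paragraph p) return "<p>" + renderAll(p.content()) + "</p>";
        throw new IllegalArgumentException("unknown node");
    }

    static String renderAll(List<Node> nodes) {
        StringBuilder sb = new StringBuilder();
        for (Node n : nodes) sb.append(render(n));
        return sb.toString();
    }

    public static void main(String[] args) {
        // The Wikitext "'''bold''' text" as a tree:
        Node ast = new Paragraph(List.of(
            new Bold(List.of(new Text("bold"))),
            new Text(" text")));
        System.out.println(render(ast));
    }
}
```

Because the structure is explicit, tools can transform or query the content instead of re-parsing markup with regular expressions.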
You can find all relevant information and code at http://sweble.org – this also includes demos, in particular the CrystalBall demo, which lets you query a Wikipedia snapshot using XQuery. (The underlying storage mechanism does not perform particularly well, so you may have to wait a little when load is high.)
The Sweble Wikitext parser intends to be a complete parser for Wikitext. That said, plenty of work remains to be done. Wikitext, as implemented through the MediaWiki engine, has ties to many components that aren’t strictly part of the language, most notably the parser functions, of which we have implemented only a subset.
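For a sense of what a parser function is, here is a toy Java evaluation of MediaWiki’s `{{#if:}}`. The selection rule shown (a condition that is non-empty after trimming picks the “then” branch) matches MediaWiki’s documented behavior; the surrounding code is our own sketch, not the parser’s implementation:

```java
public class ToyParserFunction {
    // Evaluate {{#if: cond | then | else }}: a condition that is
    // non-empty after trimming selects the "then" branch, otherwise
    // the "else" branch is used. Branch whitespace is trimmed as well.
    static String ifFunction(String cond, String thenBranch, String elseBranch) {
        return cond.trim().isEmpty() ? elseBranch.trim() : thenBranch.trim();
    }

    public static void main(String[] args) {
        System.out.println(ifFunction("x", "yes", "no"));   // yes
        System.out.println(ifFunction("   ", "yes", "no")); // no
    }
}
```

Each of the dozens of parser functions needs this kind of faithful reimplementation, which is why only a subset exists so far.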
At this stage, we are hoping for your help. You can help us by
- playing with the CrystalBall demo and pointing out to us wiki pages that look particularly bad or faulty
- simply using the parser in your projects and telling us what works and what doesn’t (bug reports!)
- getting involved in the open source project by contributing code, documentation, and good humor
If you have questions, please don’t hesitate to use the sweble.org facilities or send email to the main implementor, Hannes Dohrn.
Brought to you by the Open Source Research Group at the University of Erlangen, http://osr.cs.fau.de
Finally, our Sweble project site has launched! And with it, one of our first projects is going open source: the Wikitext parser, developed at the Open Source Research Group at the University of Erlangen-Nürnberg.
The Sweble project develops and provides libraries and components for MediaWiki-compatible wiki software. An important focus is wiki content analysis.