WikiEducator roadmap/Improve print support and output formats

WikiEducator currently supports a simple export of wiki pages to PDF using HTMLDOC. HTMLDOC supports only older versions of HTML and cannot make meaningful use of a document's semantics (such as chapters, instructional devices, and so on). As a consequence, the quality of exported PDF documents is very poor and not suitable for professional use. Furthermore, PDF is not an editable output format; it would be desirable to also support formats such as DocBook and OpenDocument, which produce professional print quality but can also be edited and processed in standard software.

Erik Möller, Board Member of the Wikimedia Foundation, is presently coordinating the efforts of the organization in the area of PDF export. Based upon the current state of discussion with multiple stakeholders, his recommendation is that the following tasks need to be resourced independently of WMF in order to meet WikiEducator's needs:


 * Improve existing wiki-to-(w)XML parser in order to map all existing wiki syntax elements, and allow customization of layout and content before export.
 * Implement (w)XML to DocBook to PDF conversion.
 * Implement (w)XML to OpenDocument conversion.

By (w)XML, we mean an intermediate XML format that represents the full semantics of wiki syntax in a well-formed manner, so that standard processing libraries can be used to convert it to the desired output format.
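
To illustrate why a well-formed intermediate format matters, here is a minimal sketch of the second task (the (w)XML to DocBook step) using only Python's standard library. The (w)XML element names (article, section, paragraph, bold, list, item), the mapping table, and the function names are assumptions made for this example; the actual (w)XML vocabulary is not specified here and may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical (w)XML input; the element names are illustrative assumptions.
WXML = """<article title="Sample page">
  <section title="Introduction">
    <paragraph>Wiki text with <bold>emphasis</bold> inside.</paragraph>
    <list>
      <item>First point</item>
      <item>Second point</item>
    </list>
  </section>
</article>"""

# Assumed mapping from (w)XML element names to DocBook element names.
TAG_MAP = {
    "article": "article",
    "section": "section",
    "paragraph": "para",
    "bold": "emphasis",
    "list": "itemizedlist",
}


def convert(node):
    """Recursively map one (w)XML element to a DocBook element."""
    if node.tag == "item":
        # DocBook list items wrap their text in a <para>.
        out = ET.Element("listitem")
        target = ET.SubElement(out, "para")
    else:
        # Unknown elements fall back to the generic inline <phrase>.
        out = ET.Element(TAG_MAP.get(node.tag, "phrase"))
        target = out
    title = node.get("title")
    if title is not None:
        # A title attribute becomes a DocBook <title> child element.
        ET.SubElement(out, "title").text = title
    if node.tag == "bold":
        out.set("role", "bold")
    target.text = node.text
    for child in node:
        converted = convert(child)
        converted.tail = child.tail  # keep the text that follows inline elements
        target.append(converted)
    return out


def wxml_to_docbook(source):
    """Parse a (w)XML string and return the DocBook result as a string."""
    return ET.tostring(convert(ET.fromstring(source)), encoding="unicode")


if __name__ == "__main__":
    print(wxml_to_docbook(WXML))
```

Because the input is well-formed XML, the whole conversion is a generic tree walk plus a lookup table. An OpenDocument converter could reuse the same parsed tree with a different mapping, and the resulting DocBook could be handed to an existing DocBook toolchain to produce PDF.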

Risk assessment

 * Risk: non-operational code
 * Given that the project builds on pre-existing technology which is partially functional, it should become visible early in the process if development enters a dead end. The expected document transformation is not an unusual challenge; such transformations are very common, given the large number of formats between which conversion is necessary.


 * Risk: incomplete transformation
 * It is expected that not all document scenarios can be dealt with adequately in the first implementation. Even an incomplete implementation can spur further open source development, and is an improvement on the status quo. In addition, implementing in stages allows one of the target formats to be dropped, should it become clear that otherwise neither would be supported to any reasonable extent.