Funding proposals/Print web service/Wikibooks proposal

From WikiEducator

Research and Development Proposal: Use of Wiki Technology for Collaborative Authorship of Printed Educational Materials


We propose a study of the existing and potential uses of wiki technology for the collaborative authorship of learning materials (primary, secondary and tertiary education) which can be used in the developing world. Based on the results of this research, we seek to realize open source technological (software development) contributions which would facilitate the use of wiki technology particularly for the creation of printed documents, for quality control, and for structuring complex materials. We propose to examine closely the Wikibooks project, a sister project of Wikipedia with the explicit aim to develop free content textbooks.


Growth of the largest Wikipedia editions (English, German, French, Polish, Japanese, Dutch, Italian, Swedish). Note that the English Wikipedia has grown so large that its database growth is no longer examined on a regular basis

Wiki technology, which was first developed in 1995 in the United States, has since become a global phenomenon, popularized by the enormously successful Wikipedia encyclopedia. A wiki, in its simplest form, is a collection of pages which can be modified with a high degree of openness (typically, the highest barrier to entry is a registration requirement). Wikis balance this openness with a set of strict tools for versioning and controlling changes.

Wikipedia, an encyclopedia built using wiki technology, currently (September 2006) contains about 4.5 million articles in over 100 languages. Wikipedia and its sister projects receive an average of about 9,000 page requests per second, with a peak load of up to 17,000 page requests per second [1]. This massive traffic volume requires an infrastructure of over 170 servers [2], provided by a non-profit organization, the Wikimedia Foundation [3].

English Wikibooks frontpage
While Wikipedia has frequently received media attention due to the fluctuating results of the wiki process, the generally high quality of the end result is not disputed. For instance, Nature, in a comparison of science articles from Wikipedia with those from Encyclopaedia Britannica, concluded that "Wikipedia comes close to Britannica in terms of the accuracy of its science entries" [4].

One of the sister projects of Wikipedia under the umbrella of the Wikimedia Foundation is Wikibooks. Started in July 2003, the project describes itself as "a collection of free, open-content textbooks that you can edit" (emphasis original) [5]. Over 1,000 such book projects have already been started in English, though most have a low degree of completion. About 20 books can be downloaded as PDF copies. A separate Wikijunior project is aimed at children's education.

Research questions for initial research

We would like to examine the state of these books, and to describe in some detail the process of their creation. Furthermore, we would like to ask and explore answers to the following questions:

  • In a wiki-based process, how can long-term quality of the content be guaranteed?
  • How can broader participation in the project be facilitated?
  • What strategies can be used to adapt wiki-books to a local educational context?
  • What technology is currently available to create print-ready documents, and what is lacking?
  • How can large documents such as books be structured in a wiki? How do different wiki software packages solve this problem?
  • What licensing options are there for Wikibooks content?

From the answers to these questions, we seek to clearly define development tasks to improve the MediaWiki software used by Wikipedia and Wikibooks in order to better manage the process of creating, reviewing and exporting large documents. These changes to the software would then be implemented as open source (GPL-licensed) code and, ideally, integrated with the existing codebase or made available as extensions (plug-ins).

Identified projects

The initial research is to be conducted over a period of two months, while development time depends on which particular functionality is identified as being of interest to the funding partners. The proposal is structured in modules, which (with the exception of initial research) can be implemented in parallel.

Participation of education experts in the research module would be desirable, though a study with a narrow, primarily technological and descriptive focus can be conducted without such participation. Development projects that can be identified are introduced below.

Structuring wiki content for export

PediaPress allows users to compile print-ready PDF files from Wikipedia articles, and even to order print-on-demand books from these files. The software is proprietary and the user interface does not allow for the hierarchical modelling of content; however, a similar interface might be useful as an extension to the MediaWiki software.

Content in a wiki is typically structured as a loose web of connected pages. While MediaWiki offers some additional functionality (subpages, categories), it does not provide an effective and intuitive interface to visually structure a set of documents as a hierarchy (or a set of alternative hierarchies), which can then be exported together, or to generate a book-level table of contents.
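As an illustration of what such a hierarchy could look like as data, the following is a minimal sketch, not a description of any existing MediaWiki feature. The names `BookNode`, `table_of_contents` and `export_order`, as well as the example page titles, are hypothetical and chosen only for this example. It shows how a tree of wiki pages could yield both a numbered book-level table of contents and the order in which pages would be exported together.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BookNode:
    """One entry in a (hypothetical) book hierarchy."""
    title: str
    page: Optional[str] = None  # wiki page title; None for pure grouping nodes
    children: List["BookNode"] = field(default_factory=list)


def table_of_contents(node: BookNode, prefix: str = "") -> List[str]:
    """Return numbered TOC lines for all descendants of `node`, depth first."""
    lines: List[str] = []
    for index, child in enumerate(node.children, start=1):
        number = f"{prefix}{index}"
        lines.append(f"{number} {child.title}")
        lines.extend(table_of_contents(child, prefix=number + "."))
    return lines


def export_order(node: BookNode) -> List[str]:
    """Return the wiki pages in the order they would appear in the export."""
    pages = [node.page] if node.page else []
    for child in node.children:
        pages.extend(export_order(child))
    return pages


# Hypothetical example book; the page titles follow the subpage convention.
book = BookNode("Algebra", children=[
    BookNode("Numbers", "Algebra/Numbers", [
        BookNode("Integers", "Algebra/Numbers/Integers"),
    ]),
    BookNode("Equations", "Algebra/Equations"),
])

for line in table_of_contents(book):
    print(line)
```

Because the hierarchy is kept separate from the pages themselves, the same page could appear in several alternative hierarchies without being duplicated.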

Exporting content

Content should be exportable in various formats, particularly PDF and other formats which are suitable for printing (OpenDocument, DocBook, etc.). MediaWiki does not natively support any export format other than its own wiki syntax. External tools exist to convert this wiki syntax to an intermediate XML format, which can then be converted to other target formats, but these tools are not yet mature.
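The two-stage approach (wiki syntax to intermediate XML, then XML to a target format) can be sketched as follows. This is a deliberately tiny illustration and not one of the external tools mentioned above: it handles only `== Heading ==` lines and plain paragraphs, the element names `page`, `heading` and `para` are assumptions made for this example, and HTML stands in for a real print format.

```python
import re
import xml.etree.ElementTree as ET
from html import escape

# Matches "== Title ==" through "====== Title ======" with balanced markers.
HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")


def wikitext_to_xml(title: str, wikitext: str) -> ET.Element:
    """Stage 1: convert a small subset of wiki syntax to an XML tree."""
    root = ET.Element("page", {"title": title})
    paragraph = []

    def flush() -> None:
        # Emit the lines collected so far as a single paragraph element.
        if paragraph:
            ET.SubElement(root, "para").text = " ".join(paragraph)
            paragraph.clear()

    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = str(len(match.group(1)))
            ET.SubElement(root, "heading", {"level": level}).text = match.group(2)
        elif line.strip():
            paragraph.append(line.strip())
        else:
            flush()  # a blank line ends the current paragraph
    flush()
    return root


def xml_to_html(root: ET.Element) -> str:
    """Stage 2: render the intermediate XML to one target format (HTML)."""
    parts = [f"<h1>{escape(root.get('title'))}</h1>"]
    for element in root:
        if element.tag == "heading":
            level = element.get("level")
            parts.append(f"<h{level}>{escape(element.text)}</h{level}>")
        else:
            parts.append(f"<p>{escape(element.text)}</p>")
    return "\n".join(parts)


source = "== Intro ==\nWikis are open.\nAnyone can edit.\n\n=== Details ===\nA & B"
tree = wikitext_to_xml("Demo", source)
print(xml_to_html(tree))
```

The value of the intermediate tree is that further back-ends (for example a DocBook or PDF writer) would only need to replace the second function.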

Advanced document versioning

Advanced document versioning is needed to contextualize content. Every wiki engine supports versioning a document as a sequence of revisions, but none currently supports the creation of multiple branches of content which can be developed independently (with the possibility of merging changes across sufficiently related branches). This functionality is well developed in computer code versioning tools like Subversion, CVS and BitKeeper; however, making the notion of a "branch" of a document intuitively understandable poses particular user interface challenges. Nevertheless, it is essential for contextualizing content for different audiences while keeping it up to date with a centrally maintained source. This is arguably the most challenging and resource-intensive development project within this proposal.
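To make the idea of branches concrete, here is a minimal sketch of a page whose revisions form a tree rather than a single sequence. It is an assumption-laden illustration, not a design for MediaWiki: the class `BranchingPage`, its methods, and the branch names `main` and `localized` are all hypothetical. It shows forking a branch, finding the last revision two branches have in common, and listing the central changes a localized branch has not yet merged; the merging of text itself is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Revision:
    rev_id: int
    text: str
    parent: Optional[int]  # None only for the very first revision


class BranchingPage:
    """A page whose revision history may fork into named branches."""

    def __init__(self, text: str) -> None:
        self.revisions: Dict[int, Revision] = {0: Revision(0, text, None)}
        self.branches: Dict[str, int] = {"main": 0}  # branch name -> newest rev

    def commit(self, branch: str, text: str) -> int:
        """Append a new revision to `branch` and return its id."""
        rev_id = len(self.revisions)
        self.revisions[rev_id] = Revision(rev_id, text, self.branches[branch])
        self.branches[branch] = rev_id
        return rev_id

    def fork(self, source: str, new_branch: str) -> None:
        """Start `new_branch` at the current newest revision of `source`."""
        self.branches[new_branch] = self.branches[source]

    def history(self, branch: str) -> List[int]:
        """Revision ids of `branch`, newest first, back to the first revision."""
        ids: List[int] = []
        current: Optional[int] = self.branches[branch]
        while current is not None:
            ids.append(current)
            current = self.revisions[current].parent
        return ids

    def merge_base(self, a: str, b: str) -> int:
        """Newest revision that both branches have in common."""
        seen = set(self.history(a))
        for rev_id in self.history(b):
            if rev_id in seen:
                return rev_id
        raise ValueError("branches share no history")

    def pending(self, local: str, upstream: str) -> List[int]:
        """Upstream revisions made after the fork point, oldest first."""
        base = self.merge_base(local, upstream)
        newer = [r for r in self.history(upstream) if r != base and r > base]
        return sorted(newer)


page = BranchingPage("original text")
page.commit("main", "central edit 1")
page.fork("main", "localized")
page.commit("localized", "adapted for a local curriculum")
page.commit("main", "central edit 2")
print(page.pending("localized", "main"))
```

A user interface built on such a model would mainly need to present the result of `pending` in an understandable way, which is where the challenge described above lies.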


Visual editing

While a multitude of open source graphical editors exist (e.g. FCKeditor, TinyMCE), integration with MediaWiki is not trivial because its native wiki syntax includes programmatic features which cannot easily be mapped to intuitive visual tools (e.g. parametrized templates, categories). A visual editing environment should also interface with existing content, such as multimedia, to provide easy access (currently users have to reference filenames manually).

Quality assurance

While the open process of wikis is valuable in order to allow a large number of individuals to participate with minimal barriers in the development of content, it creates a need to systematically review changes to the content at regular intervals. Software-supported quality assurance would allow qualified individuals to make assertions about aspects of the content quality of particular page revisions (e.g. accuracy, compliance with educational standards, etc.). Only content which has been fully reviewed would then be flagged for export.
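The review workflow described above can be sketched as a small data model. This is only an illustration under stated assumptions: the aspect names in `REQUIRED_ASPECTS`, the `approve` and `exportable_revision` functions, and the reviewer names are hypothetical and not part of MediaWiki. The sketch records per-revision assertions by qualified reviewers and selects the newest revision that has been reviewed for every required aspect.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Set

# Hypothetical set of aspects a revision must pass before it may be exported.
REQUIRED_ASPECTS: FrozenSet[str] = frozenset({"accuracy", "educational_standards"})


@dataclass
class PageRevision:
    rev_id: int
    text: str
    approvals: Dict[str, str] = field(default_factory=dict)  # aspect -> reviewer


def approve(revision: PageRevision, aspect: str, reviewer: str,
            qualified: Dict[str, Set[str]]) -> None:
    """Record that `reviewer` asserts `aspect` for this revision.

    `qualified` maps each aspect to the reviewers allowed to assert it.
    """
    if reviewer not in qualified.get(aspect, set()):
        raise PermissionError(f"{reviewer} may not review {aspect}")
    revision.approvals[aspect] = reviewer


def exportable_revision(history: List[PageRevision],
                        required: FrozenSet[str] = REQUIRED_ASPECTS
                        ) -> Optional[PageRevision]:
    """Return the newest revision approved for every required aspect, if any.

    `history` is ordered oldest first, as in a wiki page history.
    """
    for revision in reversed(history):
        if required <= set(revision.approvals):
            return revision
    return None


qualified = {"accuracy": {"alice"}, "educational_standards": {"bob"}}
history = [PageRevision(1, "draft"), PageRevision(2, "reviewed"),
           PageRevision(3, "later edit")]
approve(history[1], "accuracy", "alice", qualified)
approve(history[1], "educational_standards", "bob", qualified)
approve(history[2], "accuracy", "alice", qualified)  # only partly reviewed

chosen = exportable_revision(history)
print(chosen.rev_id if chosen else "nothing reviewed yet")
```

Tying approvals to a specific revision means that later, unreviewed edits do not reach the printed output until they have been reviewed in turn.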

Better group collaboration

Wikibooks is a large community working on a very diverse range of topics. Collaboration within a group dedicated to a particular topic is not facilitated by the wiki. For instance, there are no group discussion forums or group notification mechanisms. Authentication also plays a role here; federating with authentication mechanisms of schools and universities using mechanisms such as OpenID or A-Select would allow for quicker participation of members of these institutions.

Clean separation of content and presentation

MediaWiki currently does not allow per-page or per-book stylesheets, which would make it possible to apply stylistic changes efficiently and have them reflected in the printed output. A proposal has been made to standardize MediaWiki's syntax to this effect (footnote 1), but it has not been implemented. It may also be desirable to provide a simplified abstraction or front-end for Cascading Style Sheets, to ease usage.

Additional problem areas will become visible after the research into Wikibooks has been completed. Some of the functionality can be developed separately and in parallel; however, central planning and oversight are needed.

Risk assessment

  • Building upon existing software components: Low risk. The software components that will be built upon (MediaWiki, Linux, Apache, MySQL and PHP) are well-tested open source components that operate in a high traffic environment (up to 15,000 page requests per second as of September 2006 on the Wikimedia service network). They have been successfully extended by numerous individual developers and corporations, including, in the particular case of MediaWiki (the core component), Novell and Intel.
  • Availability of skilled professionals: Low risk. The project can draw on the extensive network of professionals working with and for the Wikimedia Foundation and OpenProgress as volunteers or contracted developers.
  • Completion of development goals on time: Medium risk. The budget will include the provision of project management to manage this risk effectively.
  • Integration of developed functionality with Wikimedia services: Medium risk. It will be necessary to get a strong commitment from the Wikimedia Foundation to integrate functionality and services before development begins.