Wikipedia: The Everlasting Project

The Internet has established a network of complex relationships among its users, one that makes it possible to cooperate and participate in the production of knowledge. Ever since personal computers and internet access became widely available, collaborative knowledge production has found expression in various ideas and philosophies, such as free software, open source, and free knowledge, movements that began to take shape in the 1980s.

The emergence and evolution of the World Wide Web opened up new opportunities for collaborative online production. These were not limited to software and its documentation; they extended to public encyclopedias edited by tens of thousands of people, and eventually to today’s use of bots.

The Internet as a Way to Gather Human Knowledge

The idea of creating a public encyclopedia with many contributors predates the information and communication technology of the last century. As early as the 18th century, French philosopher and writer Denis Diderot collaborated with others to create the Encyclopédie, in an attempt to collect the human knowledge of his time. It is perhaps the first encyclopedia in history with multiple contributors. Centuries later, Wikipedia adopted the philosophy of collecting human knowledge and sharing it openly and freely. This transformed the Web as we know it, creating the promise of a technology with a social, knowledge-based philosophy at its core. Ideally, such a technology would champion equality and justice, yet capitalism’s interference turned most web applications into centralized systems.

Wikipedia, as a project rooted in the free software and open-source movements, was influenced by earlier efforts, both in its philosophy of collaborative knowledge production and in the types of licenses that govern the publication of its content and its use as a free encyclopedia.

For instance, in the world of software and operating systems, free and open-source software has existed since the 1980s. The GNU Project, launched by Richard Stallman in 1984 and later backed by the Free Software Foundation, set out to create a free operating system similar to Unix. The Free Software Foundation also developed licenses, such as the GNU General Public License, that legally guarantee users the basic freedoms advocated by the free software movement: the freedom to use the software, to access its source code, and to modify and redistribute it. The GNU Project’s later combination with the Linux kernel, launched by Linus Torvalds in the early 1990s, is a clear sign of the collaborative model’s success in project development. The result is now known as “GNU/Linux distributions,” or simply Linux.

Collaborative Knowledge Requires Free Licenses of Use

Wikipedia has become the largest and most famous encyclopedia in the world. However, it was not the first attempt to create a collaborative encyclopedia. Before the launch of Wikipedia, there was a similar project known as GNUpedia, a Free Software Foundation project conceived in the late 1990s. It was closer to a collective blog than to Wikipedia’s current encyclopedia format.

Wikipedia’s founders, Jimmy Wales and Larry Sanger, had experimented with an online encyclopedia before launching Wikipedia. Called Nupedia, it was introduced around 2000, at about the same time work on GNUpedia began. Nupedia, with Larry Sanger as editor-in-chief, was an encyclopedia written by experts and reviewed by specialists before its articles were published as free content. In the early years of the new millennium, both Nupedia and GNUpedia were shut down. The Free Software Foundation threw its support behind Wikipedia, which went on to become the largest online encyclopedia, with hundreds of thousands of volunteers editing its content.

In the early 2000s, Creative Commons was established, inspired by many of the ideas advocated by the free software movement, notably the “share alike” concept. Creative Commons developed licenses for many kinds of works, giving people more freedom to use licensed works than the default restrictions of intellectual property and copyright law allow. These licenses later spread on a large scale.

Wikipedia licenses its content under two licenses: the GNU Free Documentation License and the Creative Commons Attribution-ShareAlike (CC BY-SA) license. Users are allowed to share, copy, distribute, and transmit any content on Wikipedia. They are also entitled to edit, combine, transform, and build on the content, for commercial or non-commercial purposes, provided that the work is properly attributed to its authors, a link to the license is included, any changes are indicated, and derivative works are released under the same conditions and license as the original.

The objective is to enable everyone to collaborate on production and provide access to various works without restrictions.

An Open Platform for Bots Too

In his essay The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary, Eric Raymond advances a central thesis: the more widely a program’s source code is available for public scrutiny and experimentation, the faster its errors are detected and fixed. By contrast, Raymond argues that under the Cathedral model (a closed development environment), finding errors takes more time and energy, because the source code is available only to a limited number of developers. Although the essay focuses on software development, its ideas reach further: the collaborative Wikipedia platform for knowledge production can be seen as proof of the success of the Bazaar model that Raymond favored. The decentralized management and development of Wikipedia content was, and remains, the main driver behind the production of extensive free and open information.

To manage content, Wikipedia and similar websites use wiki software, a type of online content management system. The term “wiki” was coined in the 1990s to describe sites with a specific set of characteristics: they let users collectively edit content in a simple markup language through a web browser, create new content, and update or change old information. MediaWiki, the software Wikipedia currently uses to manage its content, was developed in 2002 and is free software released under the GNU General Public License.

Because information on Wikipedia is produced collectively, the credibility and reliability of its articles have drawn criticism, which is more pronounced for certain language editions. Since Wikipedia relies on a crowd-editing model, its content may include misleading, inaccurate, or incorrect information. In some cases, information is deliberately distorted for religious or political motives. Even so, Wikipedia’s working model can still improve the credibility of its articles, especially with the help of Artificial Intelligence.

The development of Artificial Intelligence (AI) has opened another limitless space for collaborative knowledge production. Knowledge is now being created by machines and humans alike. Machines help draft and develop articles published on Wikipedia, blurring the line between contributions made by AI and bots and those made by humans. The share of Wikipedia edits made by machines is strikingly high.

Bots carry out many simple, routine tasks on Wikipedia, such as correcting grammar mistakes, faster and more systematically than humans. They sometimes create articles by scraping data from reliable sources and rephrasing it as Wikipedia articles.

ClueBot is one such bot and is responsible for numerous edits to Wikipedia articles. It can counter attempts to distort Wikipedia articles within seconds, for example by removing profanity or inappropriate images.
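To make the idea of an anti-vandalism bot more concrete, here is a deliberately simplified, rule-based sketch in Python. It is not how ClueBot works: its current version, ClueBot NG, relies on trained machine-learning classifiers rather than fixed rules. The blocklists, the blanking threshold, and the function names `added_words` and `review_edit` are hypothetical, chosen only for illustration, and the sketch simply compares two versions of an article’s wikitext without connecting to Wikipedia.

```python
import difflib
import re

# Hypothetical blocklists, for illustration only; not taken from any real bot.
BLOCKED_WORDS = {"badword", "gibberish"}
BLOCKED_IMAGES = {"shock_image.jpg"}

# Captures the file name in wikitext image embeds such as
# [[File:Example.jpg|thumb|Caption]] or [[Image:Example.jpg]].
IMAGE_PATTERN = re.compile(r"\[\[(?:File|Image):([^|\]]+)", re.IGNORECASE)


def added_words(old: str, new: str) -> list[str]:
    """Return the words in `new` that were inserted relative to `old`."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            added.extend(new_words[j1:j2])
    return added


def review_edit(old: str, new: str) -> list[str]:
    """Return reasons to revert an edit; an empty list means 'leave it alone'."""
    reasons = []

    # 1. Blocked words introduced by this edit (text already on the page is ignored).
    words = {w.strip(".,;:!?\"'()").lower() for w in added_words(old, new)}
    hits = sorted(words & BLOCKED_WORDS)
    if hits:
        reasons.append("added blocked word(s): " + ", ".join(hits))

    # 2. Blocklisted images that were not on the page before the edit.
    old_images = {m.strip().lower() for m in IMAGE_PATTERN.findall(old)}
    new_images = {m.strip().lower() for m in IMAGE_PATTERN.findall(new)}
    bad = sorted((new_images - old_images) & BLOCKED_IMAGES)
    if bad:
        reasons.append("added blocked image(s): " + ", ".join(bad))

    # 3. Page blanking: most of a non-trivial article removed in a single edit
    #    (the 200-character minimum and 10% threshold are arbitrary assumptions).
    if len(old) >= 200 and len(new) < 0.1 * len(old):
        reasons.append("blanked most of the page")

    return reasons


if __name__ == "__main__":
    before = "The cat is a small domesticated carnivorous mammal."
    print(review_edit(before, before + " badword!"))
    # ['added blocked word(s): badword']
    print(review_edit(before, before.replace("small", "very small")))
    # []
```

Rules like these are easy to evade and prone to false alarms, and a real bot would also have to fetch revisions, follow Wikipedia’s bot policy, and let humans review its reverts.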

Some projects are currently using AI to create and develop Wikipedia articles with little to no human intervention. A group of MIT researchers developed a system that updates outdated articles. When an editor supplies updated, unstructured information about a topic, the system searches Wikipedia for the relevant article and updates its information. It also identifies contradictory passages that need to be rephrased in a more “human” style.

Google has also thrown its hat into the ring. Its researchers developed a system capable of writing Wikipedia-style articles by scraping the top 10 Google search results on a topic and then drafting a long text about it. Humans must then edit the generated text to fix weak syntax. Perhaps, with advances in machine learning and AI, humans will no longer need to edit such articles.

Sverker Johansson created an internet bot called Lsjbot that is capable of writing 10,000 Wikipedia articles in one day. It works by scraping information from various reliable sources, compiling the material, and producing a short article on a specific topic. The bot mainly contributes to the Swedish Wikipedia, producing articles on living organisms and geography. Johansson’s bot has reportedly written 8.5% of all articles on Wikipedia.

For Arabic content, Arab Wikipedia user Osama Khaled developed OKBot, which proofreads Arabic articles on Wikipedia. It still has to overcome technical challenges in developing algorithms capable of understanding Arabic texts and their contexts.

Wikipedia is not a comprehensive, finished encyclopedia. It will remain a work in progress as long as humanity and human knowledge continue to evolve. It is an ongoing project in which humans and machines come together to collaborate and create advanced, free, and open knowledge. Wikipedia is an everlasting undertaking and serves as a model, and a promise, of what free, accessible knowledge may look like.

This page is also available in Arabic (العربية).

Mohamed Taher

Mohammad El-Taher is a researcher, blogger, and technologist interested in technology and law from the perspective of human rights principles and values, and a cofounder of the Technology and Law Community "Masaar". Follow him on Twitter at @moeltaher and on his blog, moeltaher.net.