Wiki Workshop 2019

A forum bringing together researchers exploring all aspects of Wikipedia, Wikidata, and other Wikimedia projects. Held at The Web Conference 2019 in San Francisco, Calif., May 14, 2019.

  • May 14, 2019: Workshop happened -- amazing!
  • Apr. 29, 2019: PDFs of accepted papers available.
  • Apr. 17, 2019: List of accepted papers announced.
  • Apr. 4, 2019: Workshop schedule announced.
  • Feb. 6, 2019: Denny Vrandečić confirmed as invited speaker.
  • Feb. 6, 2019: Erica Kochi confirmed as invited speaker.
  • Jan. 23, 2019: Jure Leskovec confirmed as invited speaker.
  • Jan. 23, 2019: Neil Thompson confirmed as invited speaker.
  • Jan. 23, 2019: Timnit Gebru confirmed as invited speaker.
  • Jan. 20, 2019: Workshop date announced: Tuesday, May 14, 2019.
  • Dec. 6, 2018: Wiki Workshop 2019 webpage online.
9:00 - 9:20  Welcome and icebreaking
9:20 - 10:05  Invited talk: Denny Vrandečić
10:05 - 10:17  Paper presentation: Mohsen Sayyadiharikandeh, Jonathan Gordon, Jose-Luis Ambite and Kristina Lerman: Finding Prerequisite Relations Using the Wikipedia Clickstream
10:17 - 10:30  Paper presentation: Xiaoxi Chelsy Xie, Isaac Johnson and Anne Gomez: Detecting and Gauging Impact on Wikipedia Page Views
10:30 - 11:00  Coffee break
11:00 - 11:45  Invited talk: Timnit Gebru
11:45 - 12:25  Poster spotlight presentations
12:25 - 12:30  Poster setup
12:30 - 14:00  Lunch and poster session
14:00 - 14:12  Paper presentation: Swati Goel, Ashton Anderson and Leila Zia: Thanks for Stopping By: A Study of “Thanks” Usage on Wikimedia
14:12 - 14:25  Paper presentation: Ali Javanmardi and Lu Xiao: What’s in the Content of Wikipedia’s Article for Deletion Discussions? Towards a Visual Analytic Approach
14:25 - 15:10  Invited talk: Erica Kochi
15:10 - 15:30  Open discussion
15:30 - 16:00  Coffee break
16:00 - 16:40  Invited talk: Jure Leskovec
16:40 - 17:25  Invited talk: Neil Thompson
17:25 - 17:30  Closing remarks
Denny Vrandečić (Google)

Beyond Wikidata

Wikidata has quickly become a major wiki project. In doing so, it has stretched what wikis can be successfully used for. We will take a look at the state of Wikidata and how it can help the Wikipedias (and other projects), and we will discuss whether the wiki approach can be taken further to even more complex projects, such as an Abstract Wikipedia.


Denny works at the Google Knowledge Graph. He previously worked at the Karlsruhe Institute of Technology (2004-2012), the University of Southern California (2010), and as the project director of Wikidata at Wikimedia Deutschland (2012/13). His research interests are massive collaborative systems, knowledge bases, and the Semantic Web.

Erica Kochi (UNICEF)


Erica co-founded and co-leads UNICEF’s Innovation Unit, a group tasked with identifying, prototyping and scaling technologies and practices that improve UNICEF’s work on the ground. Erica also serves as Innovation Advisor to UNICEF’s Executive Director. Erica co-taught ‘Design for UNICEF’ at NYU’s ITP and has lectured at the Yale School of Management, Harvard University, The Art Center, Stanford University School of Engineering, and Columbia School of International and Public Affairs on technology, innovation, design, and international development.

Jure Leskovec (Stanford University)

Making Wikipedia Safer

Jure is an associate professor of Computer Science at Stanford University. His research focuses on mining and modeling large social and information networks, their evolution, and the diffusion of information and influence over them. The problems he investigates are motivated by large-scale data, the Web, and online media.

Timnit Gebru (Google)

Understanding the Limitations of AI: When Algorithms Fail

Automated decision-making tools are currently used in high-stakes scenarios. From natural language processing tools used to automatically determine one’s suitability for a job, to health diagnostic systems trained to determine a patient’s outcome, machine learning models are used to make decisions that can have serious consequences for people’s lives. In spite of the consequential nature of these use cases, vendors of such models are not required to perform specific tests showing the suitability of their models for a given task. Nor are they required to provide documentation describing the characteristics of their models, or disclose the results of algorithmic audits to ensure that certain groups are not unfairly treated. I will show some examples to examine the dire consequences of basing decisions entirely on machine-learning-based systems, and discuss recent work on auditing and exposing the gender and skin tone bias found in commercial gender classification systems. I will end with the concept of an AI datasheet to standardize information for datasets and pre-trained models, in order to push the field as a whole towards transparency and accountability.


Timnit is a research scientist in the Ethical AI team at Google. Prior to that, she was a postdoc at Microsoft Research, New York, and a PhD student in the Stanford Artificial Intelligence Laboratory. She is currently studying the ethical considerations underlying any data mining project, and methods of auditing and mitigating bias in sociotechnical systems. The New York Times, MIT Tech Review and others have recently covered her work. As a cofounder of the group Black in AI, she works to both increase diversity in the field and reduce the negative impacts of racial bias in training data used for human-centric machine learning models.

Neil Thompson (MIT)

Science is Shaped by Wikipedia: Evidence from a Randomized Control Trial

“I sometimes think that general and popular treatises are almost as important for the progress of science as original work.” — Charles Darwin
As the largest encyclopedia in the world, it is not surprising that Wikipedia reflects the state of scientific knowledge. However, Wikipedia is also one of the most accessed websites in the world, including by scientists, which suggests that it also has the potential to shape science. This paper shows that it does. Incorporating ideas into Wikipedia leads to those ideas being used more in the scientific literature. We provide correlational evidence of this across thousands of Wikipedia articles and causal evidence of it through a randomized control trial where we add new scientific content to Wikipedia. In the months after uploading it, an average new Wikipedia article on Chemistry is read tens of thousands of times and causes changes to hundreds of related scientific journal articles. Adding references to Wikipedia also has an effect, causing important scientific articles to get more citations. Our findings speak not only to the influence of Wikipedia, but more broadly to the influence of repositories of knowledge and the role that they play in Science.


Neil is a Research Scientist at MIT’s Computer Science and Artificial Intelligence Lab and a Visiting Professor at the Lab for Innovation Science at Harvard. He is also an Associate Member of the Broad Institute, and was previously an Assistant Professor of Innovation and Strategy at the MIT Sloan School of Management, where he co-directed the Experimental Innovation Lab (X-Lab). Neil did his PhD in Business and Public Policy at Berkeley. Prior to academia, he worked at organizations such as Lawrence Livermore National Laboratory, Bain and Company, the United Nations, the World Bank, and the Canadian Parliament.

Preeti Bhargava, Nemanja Spasojevic, Sarah Ellinger, Adithya Rao, Abhinand Menon, Saul Fuhrmann and Guoning Hu
Learning to Map Wikidata Entities to Predefined Topics [PDF]
Ali Javanmardi and Lu Xiao
What’s in the Content of Wikipedia’s Article for Deletion Discussions? Towards a Visual Analytic Approach [PDF]
Xiaoxi Chelsy Xie, Isaac Johnson and Anne Gomez
Detecting and Gauging Impact on Wikipedia Page Views [PDF]
Shaunak Mishra, Aasish Pappu and Narayan Bhamidipati
Inferring Advertiser Sentiment in Online Articles using Wikipedia Footnotes [PDF]
Mohsen Sayyadiharikandeh, Jonathan Gordon, Jose-Luis Ambite and Kristina Lerman
Finding Prerequisite Relations Using the Wikipedia Clickstream [PDF]
James Ashford, Liam Turner, Roger Whitaker, Alun Preece, Diane Felmlee and Don Towsley
Understanding the Signature of Controversial Wikipedia Articles through Motifs in Editor Revision Networks [PDF]
Chuankai An and Daniel Rockmore
Open Personalized Navigation on the Sandbox of Wiki Pages [PDF]
Swati Goel, Ashton Anderson and Leila Zia
Thanks for Stopping By: A Study of “Thanks” Usage on Wikimedia [PDF]
Nicolas Aspert, Volodymyr Miz, Benjamin Ricaud and Pierre Vandergheynst
A Graph-Structured Dataset for Wikipedia Research [PDF]
Gil Domingues and Carla Teixeira Lopes
Characterizing and Comparing Portuguese and English Wikipedia Medicine-Related Articles
Chander Iyer and Srinath Ravindran
Understanding Travel from Web Queries Using Domain Knowledge from Wikipedia [PDF]
Khonzodakhon Umarova and Eni Mustafaraj
How Partisanship and Perceived Political Bias Affect Wikipedia Entries of News Sources [PDF]
Charlotte Rudnik, Thibault Ehrhart, Olivier Ferret, Denis Teyssou, Raphaël Troncy and Xavier Tannier
Searching News Articles Using an Event Knowledge Graph Leveraged by Wikidata [PDF]
Iris Qu, Nithum Thain and Yiqing Hua
WikiDetox Visualization [PDF]
Olga Slivko
Online “Brain Gain”: Do Immigrants Return Knowledge Home? [PDF]
Lei Zheng, Christopher M. Albano and Jeffrey V. Nickerson
Steps toward Understanding the Design and Evaluation Spaces of Bot and Human Knowledge Production Systems [PDF]
Cristian Consonni, David Laniado and Alberto Montresor
Discovering Topical Contexts from Links in Wikipedia

Workshop date: Tuesday, May 14, 2019

If authors want their paper to appear in the proceedings:

  • Submission deadline: January 31, 2019
  • Author feedback: February 21, 2019
  • Camera-ready version due: March 3, 2019

If authors do not want their paper to appear in the proceedings:

  • Submission deadline: March 14, 2019
  • Author feedback: March 28, 2019

Note: If you need a visa to travel to the U.S. and your visa application depends on your workshop paper being accepted, we advise you to submit your workshop paper by the January 31 deadline.

Wikipedia is one of the most popular sites on the Web, a main source of knowledge for a large fraction of Internet users, and one of the very few projects that make not only their content but also many activity logs available to the public. Furthermore, other Wikimedia projects, such as Wikidata and Wikimedia Commons, have been created to share other types of knowledge with the world for free. For a variety of reasons (quality and quantity of content, reach in many languages, process of content production, availability of data, etc.) such projects have become important objects of study for researchers across many subfields of the computational and social sciences, such as social network analysis, artificial intelligence, linguistics, natural language processing, social psychology, education, anthropology, political science, human–computer interaction, and cognitive science.

The goal of this workshop is to bring together researchers exploring all aspects of Wikimedia projects such as Wikipedia, Wikidata, and Commons. With members of the Wikimedia Foundation's Research team on the organizing committee and with the experience of successful workshops in 2015, 2016, 2017, and 2018, we aim to continue facilitating a direct pathway for exchanging ideas between the organization that coordinates Wikimedia projects and the researchers interested in studying them.

Topics of interest include, but are not limited to:

  • new technologies and initiatives to grow content, quality, diversity, and participation across Wikimedia projects
  • use of bots, algorithms, and crowdsourcing strategies to curate, source, or verify content and structured data
  • bias in content and gaps of knowledge
  • diversity of Wikimedia editors and users
  • detection of low-quality, promotional, or fake content, as well as fake accounts (e.g., sock puppets)
  • questions related to community health (e.g., sentiment analysis, harassment detection)
  • understanding editor motivations, engagement models, and incentives
  • Wikimedia consumer motivations and their needs: readers, researchers, tool/API developers
  • innovative uses of Wikipedia and other Wikimedia projects for AI and NLP applications
  • consensus-finding and conflict resolution on editorial issues
  • participation in discussions and their dynamics
  • dynamics of content reuse across projects and the impact of policies and community norms on reuse
  • privacy
  • collaborative content creation (unstructured, semi-structured, or structured)
  • innovative uses of Wikimedia projects' content and consumption patterns as sensors for real-world events, culture, etc.
  • open-source research code, datasets, and tools to support research on Wikimedia contents and communities

Papers should be 1 to 8 pages long and will be published on the workshop webpage and optionally (depending on the authors' choice) in the workshop proceedings. The review process will be single-blind (as opposed to double-blind), i.e., authors should include their names and affiliations in their submissions. Authors whose papers are accepted to the workshop will have the opportunity to participate in a poster session.

We explicitly encourage the submission of preliminary work in the form of extended abstracts (1 or 2 pages).

For submission dates, see above.

  • Michele Catasta, Stanford University
  • Lucas Dixon, Jigsaw
  • Besnik Fetahu, L3S Hannover
  • Andrea Forte, Drexel University
  • Gary Hsieh, University of Washington
  • Yiqing Hua, Cornell University
  • Isaac Johnson, Wikimedia Foundation
  • Os Keyes, University of Washington
  • Markus Kroetzsch, University of Dresden
  • Florian Lemmerich, RWTH Aachen University
  • Lauren Maggio, Uniformed Services University
  • David McDonald, University of Washington
  • Jonathan Morgan, Wikimedia Foundation
  • André Panisson, ISI Foundation
  • Daniela Paolotti, ISI Foundation
  • Tiziano Piccardi, EPFL
  • Dario Rossi, Huawei
  • Diego Saez-Trumper, Wikimedia Foundation
  • Markus Strohmaier, RWTH Aachen University
  • Nithum Thain, Jigsaw
  • Michele Tizzoni, ISI Foundation
  • Morten Warncke-Wang, Wikimedia Foundation
  • Joe Wass, Crossref
  • Ramtin Yazdanian, EPFL
  • Amy Zhang, MIT

Robert West

Bob is an assistant professor of Computer Science at EPFL, where he heads the Data Science Lab. His research aims to understand, predict, and enhance human behavior in social and information networks by developing techniques in data science, data mining, network analysis, machine learning, and natural language processing. He holds a PhD in computer science from Stanford University.

Miriam Redi

Miriam is a Research Scientist at the Wikimedia Foundation and Visiting Research Fellow at King's College London. Formerly, she worked as a Research Scientist at Yahoo! Labs in Barcelona and Nokia Bell Labs in Cambridge. She received her PhD from EURECOM, Sophia Antipolis. She conducts research in social multimedia computing, working on fair, interpretable, multimodal machine learning solutions to improve knowledge equity.

Dario Taraborelli

Dario is a social computing researcher and the Wikimedia Foundation's Head of Research. His current interests focus on online collaboration, open science, and the measurement and discoverability of scientific knowledge. He holds a PhD in cognitive science from the École des Hautes Études en Sciences Sociales.

Please direct your questions to wikiworkshop@googlegroups.com.