Wiki Workshop 2020

A forum bringing together researchers exploring all aspects of Wikipedia, Wikidata, and other Wikimedia projects. Held at The Web Conference 2020 in Taipei, Taiwan, 21 April 2020.

  • Feb. 11, 2020: Benjamin Mako Hill (University of Washington) confirmed as invited speaker.
  • Jan. 30, 2020: Misha Teplitskiy (University of Michigan) confirmed as invited speaker.
  • Jan. 30, 2020: Kristina Lerman (USC ISI) confirmed as invited speaker.
  • Jan. 30, 2020: Mark Graham (Internet Archive) confirmed as invited speaker.
  • Dec. 4, 2019: Workshop date announced: Tuesday, 21 April 2020.
  • Nov. 15, 2019: Wiki Workshop 2020 webpage online.

We will have a series of invited talks by academia and industry experts, as well as a combination of lightning talks and a poster session for the accepted papers.

Details to be determined.

All times are UTC! That is, the start time is 5:00am PST, 15:00 CEST, etc.

13:20-14:05 Keynote (TBD)
14:05-14:15 Creative time!
14:15-15:00 A conversation with Mark Graham
15:00-15:05 More creative time!
15:05-15:50 A conversation with Misha Teplitskiy and Kristina Lerman
16:00-17:15 Featured talks and lightning talks
17:15-18:00 Virtual posters: thematic meetings in breakout rooms
18:00-18:05 Concluding remarks
18:05-19:30 Appendix: one-on-one meetings

More speakers to be announced soon—stay tuned!

Misha Teplitskiy (University of Michigan)

Wisdom of Polarized Crowds
As political polarization in the United States continues to rise, the question of whether polarized individuals can fruitfully cooperate becomes pressing. Although diverse perspectives typically lead to superior team performance on complex tasks, strong political perspectives have been associated with conflict, misinformation and a reluctance to engage with people and ideas beyond one’s echo chamber. Here, we explore the effect of ideological composition on team performance by analysing millions of edits to Wikipedia’s political, social issues and science articles. We measure editors’ online ideological preferences by how much they contribute to conservative versus liberal articles. Editor surveys suggest that online contributions associate with offline political party affiliation and ideological self-identity. Our analysis reveals that polarized teams consisting of a balanced set of ideologically diverse editors produce articles of a higher quality than homogeneous teams. The effect is most clearly seen in Wikipedia’s political articles, but also in social issues and even science articles. Analysis of article ‘talk pages’ reveals that ideologically polarized teams engage in longer, more constructive, competitive and substantively focused but linguistically diverse debates than teams of ideological moderates. More intense use of Wikipedia policies by ideologically diverse teams suggests institutional design principles to help unleash the power of polarization.

Misha Teplitskiy is an Assistant Professor at the School of Information, University of Michigan. His research is at the intersection of Science of Science + Sociology of Organizations + Computational Social Science. He studies how social and organizational factors affect scientific discovery. He is especially interested in evaluation practices in science, and whether they promote or stifle innovation. His approach relies primarily on field experiments (interventions in scientific competitions and other settings) and on applying computational tools to large-scale observational data. Previously, he was a postdoc at the Laboratory for Innovation Science at Harvard (LISH). He received his PhD in Sociology from the University of Chicago, where he was a member of KnowledgeLab.

Mark Graham (Internet Archive)

How the Internet Archive is helping to make the Web more useful and reliable with the Wayback Machine and projects like Weaving Books into the Web, starting with Wikipedia
The Internet Archive is a San Francisco-based non-profit dedicated to the mission of Universal Access to All Knowledge. It is best known for the Wayback Machine, which has been archiving much of the public Web for the past 24 years. To date, more than 900 billion URLs have been archived, and nearly a billion are added every day. But the Internet Archive is engaged in many more projects to digitize analog material and to preserve and make available digital material. In this session we will give updates on several of those efforts, with a focus on projects aimed at helping to strengthen the global information ecosystem by adding persistent Web URLs to services, by adding links to digital books and academic papers as cited in Wikipedia articles, and by supporting research projects and services working to address the issues of mis- and disinformation.

Mark Graham has created and managed innovative online products and services since 1984. As Director of the Wayback Machine, he is responsible for capturing, preserving, and helping people discover and use more than 1 billion new web captures each week. Previously, Mark was Senior Vice President with NBC News, Senior Vice President of Technology with iVillage, and a co-founder of Rojo Networks. In the early days of the net he managed technology and business development at The WELL and also helped bring the pre-web Internet to millions of people by running AOL's Gopher project as part of their Internet Center. He managed technology for the pioneering US-Soviet Sovam Teleport email service and co-founded and managed PeaceNet. Mark's early training and experience with computer-mediated communications was acquired while he served in the US Air Force, spending more than 3 years working at the Air Force Data Services Center at the Pentagon.

Kristina Lerman (USC ISI)

Title and abstract TBA

Kristina Lerman is a Project Leader at the Information Sciences Institute and holds a joint appointment as a Research Associate Professor in the USC Viterbi School of Engineering's Computer Science Department. Her research focuses on applying network- and machine learning-based methods to problems in social computing.

Benjamin Mako Hill (University of Washington)

The Growth and Decline of Digital Knowledge Commons
After increasing rapidly over seven years, the number of active contributors to English Wikipedia peaked in 2007 and has been in decline since. Of course, Wikipedia is only one example of "peer production"—a model of collaborative production that also lies behind millions of wikis, free/open source software projects, websites like OpenStreetMap, and more. Unfortunately, there is evidence that English Wikipedia's pattern of growth and decline occurs in these other efforts as well. A body of emerging scholarship suggests that decline in projects' contributor bases tends to coincide with a shift from the lightweight governance and porous boundaries closely associated with peer production to less open forms of organization.
Why would successful peer production communities become less open in ways that cause a decline in their contributor bases? Drawing from research into collective action and public goods as well as Ostrom's work on common-pool resources, I will present a theoretical model that suggests an answer. I will argue that peer production projects' success at building valuable knowledge commons drives both a virtuous cycle of good contributions and an influx of bad-faith actors in a dynamic set of relationships that leads communities to become increasingly closed in order to protect the knowledge bases they have built. I will end by presenting a range of empirical evidence in support of the model and by discussing some of its implications.

Benjamin Mako Hill is an Assistant Professor in the University of Washington Department of Communication, an Adjunct Assistant Professor in the departments of Human-Centered Design and Engineering and Computer Science and Engineering, and Affiliate Faculty in the Center for Statistics and the Social Sciences, the eScience Institute, and the "Design Use Build" (DUB) group that supports research on human-computer interaction. He is also a Faculty Associate at the Berkman Klein Center for Internet and Society at Harvard University and an affiliate of the Institute of Quantitative Social Science at Harvard. Mako studies collective action in online communities and seeks to understand why some attempts at collaborative production, like Wikipedia and Linux, build large volunteer communities while the vast majority never attract even a second contributor. His research is deeply interdisciplinary, consists primarily of "big data" quantitative analyses, and lies at the intersection of communication, human-computer interaction, and sociology. Mako received his PhD from MIT.

Workshop date: Tuesday, 21 April 2020

If authors want their paper to appear in the proceedings:

  • Submission deadline: 17 January 2020
  • Author feedback: 3 February 2020
  • Camera-ready version due: 17 February 2020

If authors do not want their paper to appear in the proceedings:

  • Submission deadline: 21 February 2020
  • Author feedback: 6 March 2020
Note: If you need a visa to travel to Taiwan and your application for the visa depends on your workshop paper being accepted, we would advise you to submit your workshop paper for the 17 January deadline. (You could still opt for not having your paper included in the proceedings.)

Wikipedia is one of the most popular sites on the Web, a main source of knowledge for a large fraction of Internet users, and one of the very few projects that make not only their content but also many activity logs available to the public. Furthermore, other Wikimedia projects, such as Wikidata and Wikimedia Commons, have been created to share other types of knowledge with the world for free. For a variety of reasons (quality and quantity of content, reach in many languages, process of content production, availability of data, etc.) such projects have become important objects of study for researchers across many subfields of the computational and social sciences, such as social network analysis, artificial intelligence, linguistics, natural language processing, social psychology, education, anthropology, political science, human–computer interaction, and cognitive science.

The goal of this workshop is to bring together researchers exploring all aspects of Wikimedia projects such as Wikipedia, Wikidata, and Commons. With members of the Wikimedia Foundation's Research team on the organizing committee and with the experience of successful workshops in 2015, 2016, 2017, 2018, and 2019, we aim to continue facilitating a direct pathway for exchanging ideas between the organization that coordinates Wikimedia projects and the researchers interested in studying them.

Topics of interest include, but are not limited to:

  • new technologies and initiatives to grow content, quality, diversity, and participation across Wikimedia projects
  • use of bots, algorithms, and crowdsourcing strategies to curate, source, or verify content and structured data
  • bias in content and gaps of knowledge
  • diversity of Wikimedia editors and users
  • detection of low-quality, promotional, or fake content, as well as fake accounts (e.g., sock puppets)
  • questions related to community health (e.g., sentiment analysis, harassment detection)
  • understanding editor motivations, engagement models, and incentives
  • Wikimedia consumer motivations and their needs: readers, researchers, tool/API developers
  • innovative uses of Wikipedia and other Wikimedia projects for AI and NLP applications
  • consensus-finding and conflict resolution on editorial issues
  • participation in discussions and their dynamics
  • dynamics of content reuse across projects and the impact of policies and community norms on reuse
  • privacy
  • collaborative content creation (unstructured, semi-structured, or structured)
  • innovative uses of Wikimedia projects' content and consumption patterns as sensors for real-world events, culture, etc.
  • open-source research code, datasets, and tools to support research on Wikimedia contents and communities

Papers should be 1 to 8 pages long and will be published on the workshop webpage and optionally (depending on the authors' choice) in the workshop proceedings. The review process will be single-blind (as opposed to double-blind), i.e., authors should include their names and affiliations in their submissions. Authors whose papers are accepted to the workshop will have the opportunity to participate in a poster session.

We explicitly encourage the submission of preliminary work in the form of extended abstracts (1 or 2 pages).

Papers should be 1 to 8 pages long. We explicitly encourage the submission of preliminary work in the form of extended abstracts (1 or 2 pages). No need to anonymize your submissions.

For submission dates, see above.

  • Pushkal Agarwal, King's College London
  • Giovanni Colavizza, University of Amsterdam
  • Martin Gerlach, Wikimedia Foundation
  • Kristina Gligorić, EPFL
  • Isaac Johnson, Wikimedia Foundation
  • Markus Krötzsch, University of Dresden
  • Florian Lemmerich, RWTH Aachen University
  • Jonathan Morgan, Wikimedia Foundation
  • Maxime Peyrard, EPFL
  • Tiziano Piccardi, EPFL
  • Diego Saez-Trumper, Wikimedia Foundation
  • Morten Warncke-Wang, Wikimedia Foundation
  • Ramtin Yazdanian, EPFL
  • Amy Zhang, MIT

Miriam Redi

Miriam is a Research Scientist at the Wikimedia Foundation and Visiting Research Fellow at King's College London. Formerly, she worked as a Research Scientist at Yahoo! Labs in Barcelona and Nokia Bell Labs in Cambridge. She received her PhD from EURECOM, Sophia Antipolis. She conducts research in social multimedia computing, working on fair, interpretable, multimodal machine learning solutions to improve knowledge equity.

Leila Zia

Leila is a senior research scientist at the Wikimedia Foundation. Her current research interests include understanding Wikipedia's readers, quantifying and addressing the gaps of knowledge in Wikipedia and Wikidata, and understanding and improving diversity in Wikipedia. She holds a PhD in management science and engineering from Stanford University.

Robert West

Bob is an assistant professor of Computer Science at EPFL, where he heads the Data Science Lab. His research aims to understand, predict, and enhance human behavior in social and information networks by developing techniques in data science, data mining, network analysis, machine learning, and natural language processing. He holds a PhD in computer science from Stanford University.

Please direct your questions to wikiworkshop@googlegroups.com.