A forum bringing together researchers exploring all aspects of Wikipedia, Wikidata, and other Wikimedia projects. Held at The Web Conference 2019 in San Francisco, Calif., May 14, 2019.
9:00 - 9:20 | Welcome and icebreaker |
9:20 - 10:05 | Invited talk: Denny Vrandečić |
10:05 - 10:17 | Paper presentation: Mohsen Sayyadiharikandeh, Jonathan Gordon, Jose-Luis Ambite and Kristina Lerman: Finding Prerequisite Relations Using the Wikipedia Clickstream |
10:17 - 10:30 | Paper presentation: Xiaoxi Chelsy Xie, Isaac Johnson and Anne Gomez: Detecting and Gauging Impact on Wikipedia Page Views |
10:30 - 11:00 | Coffee break |
11:00 - 11:45 | Invited talk: Timnit Gebru |
11:45 - 12:25 | Poster spotlight presentations |
12:25 - 12:30 | Poster setup |
12:30 - 14:00 | Lunch and poster session |
14:00 - 14:12 | Paper presentation: Swati Goel, Ashton Anderson and Leila Zia: Thanks for Stopping By: A Study of “Thanks” Usage on Wikimedia |
14:12 - 14:25 | Paper presentation: Ali Javanmardi and Lu Xiao: What’s in the Content of Wikipedia’s Article for Deletion Discussions? Towards a Visual Analytic Approach |
14:25 - 15:10 | Invited talk: Erica Kochi |
15:10 - 15:30 | Open discussion |
15:30 - 16:00 | Coffee break |
16:00 - 16:40 | Invited talk: Jure Leskovec |
16:40 - 17:25 | Invited talk: Neil Thompson |
17:25 - 17:30 | Closing remarks |
Wikidata has quickly become a major wiki project, and in doing so it has stretched what wikis can successfully be used for. We will take a look at the state of Wikidata, how it can help the Wikipedias (and other projects), and we will discuss whether the wiki approach can be taken even further, to more complex projects such as an Abstract Wikipedia.
Denny works on the Google Knowledge Graph. He previously worked at the Karlsruhe Institute of Technology (2004-2012) and the University of Southern California (2010), and served as the project director of Wikidata at Wikimedia Deutschland (2012-13). His research interests are massive collaborative systems, knowledge bases, and the Semantic Web.
Erica co-founded and co-leads UNICEF’s Innovation Unit, a group tasked with identifying, prototyping and scaling technologies and practices that improve UNICEF’s work on the ground. Erica also serves as Innovation Advisor to UNICEF’s Executive Director. Erica co-taught ‘Design for UNICEF’ at NYU’s ITP and has lectured at the Yale School of Management, Harvard University, The Art Center, Stanford University School of Engineering, and Columbia School of International and Public Affairs on technology, innovation, design, and international development.
Jure is an associate professor of Computer Science at Stanford University. His research focuses on mining and modeling large social and information networks, their evolution, and the diffusion of information and influence over them. The problems he investigates are motivated by large-scale data, the Web, and online media.
Automated decision making tools are currently used in high stakes scenarios. From natural language processing tools used to automatically determine one’s suitability for a job, to health diagnostic systems trained to determine a patient’s outcome, machine learning models are used to make decisions that can have serious consequences on people’s lives. In spite of the consequential nature of these use cases, vendors of such models are not required to perform specific tests showing the suitability of their models for a given task. Nor are they required to provide documentation describing the characteristics of their models, or disclose the results of algorithmic audits to ensure that certain groups are not unfairly treated. I will show some examples to examine the dire consequences of basing decisions entirely on machine learning based systems, and discuss recent work on auditing and exposing the gender and skin tone bias found in commercial gender classification systems. I will end with the concept of an AI datasheet to standardize information for datasets and pre-trained models, in order to push the field as a whole towards transparency and accountability.
Timnit is a research scientist on the Ethical AI team at Google. Prior to that, she was a postdoc at Microsoft Research, New York, and a PhD student in the Stanford Artificial Intelligence Laboratory. She is currently studying the ethical considerations underlying any data mining project, and methods of auditing and mitigating bias in sociotechnical systems. The New York Times, MIT Tech Review, and others have recently covered her work. As a cofounder of the group Black in AI, she works both to increase diversity in the field and to reduce the negative impacts of racial bias in training data used for human-centric machine learning models.
“I sometimes think that general and popular treatises are almost as important for the progress of science as original work.” — Charles Darwin
As the largest encyclopedia in the world, it is not surprising that Wikipedia reflects the state of scientific knowledge. However, Wikipedia is also one of the most accessed websites in the world, including by scientists, which suggests that it also has the potential to shape science. This paper shows that it does.
Incorporating ideas into Wikipedia leads to those ideas being used more in the scientific literature. We provide correlational evidence of this across thousands of Wikipedia articles and causal evidence through a randomized controlled trial in which we add new scientific content to Wikipedia. In the months after upload, the average new Wikipedia article on chemistry is read tens of thousands of times and causes changes to hundreds of related scientific journal articles. Adding references to Wikipedia also has an effect, causing important scientific articles to receive more citations. Our findings speak not only to the influence of Wikipedia, but more broadly to the influence of repositories of knowledge and the role they play in science.
Neil is a Research Scientist at MIT's Computer Science and Artificial Intelligence Lab and a Visiting Professor at the Lab for Innovation Science at Harvard. He is also an Associate Member of the Broad Institute, and was previously an Assistant Professor of Innovation and Strategy at the MIT Sloan School of Management, where he co-directed the Experimental Innovation Lab (X-Lab). Neil did his PhD in Business and Public Policy at Berkeley. Prior to academia, he worked at organizations such as Lawrence Livermore National Laboratory, Bain & Company, the United Nations, the World Bank, and the Canadian Parliament.
Workshop date: Tuesday, May 14, 2019
If authors want their paper to appear in the proceedings:
If authors do not want their paper to appear in the proceedings:
Wikipedia is one of the most popular sites on the Web, a main source of knowledge for a large fraction of Internet users, and one of the very few projects that make not only their content but also many activity logs available to the public. Furthermore, other Wikimedia projects, such as Wikidata and Wikimedia Commons, have been created to share other types of knowledge with the world for free. For a variety of reasons (quality and quantity of content, reach in many languages, process of content production, availability of data, etc.) such projects have become important objects of study for researchers across many subfields of the computational and social sciences, such as social network analysis, artificial intelligence, linguistics, natural language processing, social psychology, education, anthropology, political science, human–computer interaction, and cognitive science.
The goal of this workshop is to bring together researchers exploring all aspects of Wikimedia projects such as Wikipedia, Wikidata, and Commons. With members of the Wikimedia Foundation's Research team on the organizing committee and with the experience of successful workshops in 2015, 2016, 2017, and 2018, we aim to continue facilitating a direct pathway for exchanging ideas between the organization that coordinates Wikimedia projects and the researchers interested in studying them.
Topics of interest include, but are not limited to:
Papers should be 1 to 8 pages long and will be published on the workshop webpage and optionally (depending on the authors' choice) in the workshop proceedings. The review process will be single-blind (as opposed to double-blind), i.e., authors should include their names and affiliations in their submissions. Authors whose papers are accepted to the workshop will have the opportunity to participate in a poster session.
We explicitly encourage the submission of preliminary work in the form of extended abstracts (1 or 2 pages).
For submission dates, see above.
Bob is an assistant professor of Computer Science at EPFL, where he heads the Data Science Lab. His research aims to understand, predict, and enhance human behavior in social and information networks by developing techniques in data science, data mining, network analysis, machine learning, and natural language processing. He holds a PhD in computer science from Stanford University.
Miriam is a Research Scientist at the Wikimedia Foundation and Visiting Research Fellow at King's College London. Formerly, she worked as a Research Scientist at Yahoo! Labs in Barcelona and Nokia Bell Labs in Cambridge. She received her PhD from EURECOM, Sophia Antipolis. She conducts research in social multimedia computing, working on fair, interpretable, multimodal machine learning solutions to improve knowledge equity.
Dario is a social computing researcher and the Wikimedia Foundation's Head of Research. His current interests focus on online collaboration, open science, and the measurement and discoverability of scientific knowledge. He holds a PhD in cognitive science from the École des Hautes Études en Sciences Sociales.
Please direct your questions to wikiworkshop@googlegroups.com.