Vanessa Bates Ramirez


Meta Is Building an AI to Fact-Check Wikipedia

That’s right—all 6.5 million articles

Published: Thursday, September 29, 2022 - 11:02

Most people older than 30 probably remember doing research with good old-fashioned encyclopedias. You’d pull a heavy volume from the shelf, check the index for your topic of interest, then flip to the appropriate page and start reading. It wasn’t as easy as typing a few words into the Google search bar, but on the plus side, you knew that the information you found in the pages of the Britannica or the World Book was accurate and true.

Not so with internet research today. The overwhelming multitude of sources is confusing enough, but with the proliferation of misinformation it’s a wonder any of us believe a word we read online.

Wikipedia is a case in point. As of early 2020, the site’s English version was averaging about 255 million page views per day, making it the eighth-most-visited website on the internet. As of last month, it had moved up to spot No. 7, and the English version currently has more than 6.5 million articles.

But as high-traffic as this go-to information source may be, its accuracy leaves something to be desired. The site’s page on its own reliability states, “The online encyclopedia does not consider itself to be reliable as a source and discourages readers from using it in academic or research settings.”

Meta—formerly Facebook—wants to change this. In a blog post published last month, the company’s employees describe how AI could help make Wikipedia more accurate.

Though tens of thousands of people participate in editing the site, the facts they add aren’t necessarily correct; even when citations are present, they’re not always accurate or even relevant.

Meta is developing a machine learning model that scans these citations and cross-references their content against Wikipedia articles to verify not only that the topics line up, but also that the specific figures cited are accurate.

This isn’t just a matter of picking out numbers and making sure they match. Meta’s AI will need to “understand” the content of cited sources. As complexity theory researcher Melanie Mitchell would tell you, though, “understand” is a misnomer: AI is still in its “narrow” phase, meaning it’s a tool for highly sophisticated pattern recognition, while “understanding” describes human cognition, which remains a very different thing.

Meta’s model will “understand” content not by comparing text strings and making sure they contain the same words, but by comparing mathematical representations of blocks of text, which it arrives at using natural language understanding (NLU) techniques.

“What we have done is to build an index of all these web pages by chunking them into passages and providing an accurate representation for each passage,” Fabio Petroni, Meta’s Fundamental AI Research tech lead manager, tells Digital Trends. “That is not representing word-by-word the passage, but the meaning of the passage. That means that two chunks of text with similar meanings will be represented in a very close position in the resulting n-dimensional space where all these passages are stored.”
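The idea Petroni describes can be sketched in a few lines. The toy vectors and passage texts below are invented for illustration (real systems produce embeddings with hundreds of dimensions from a trained NLU model), but the comparison itself is the standard one: passages with similar meanings map to nearby points, and nearness is typically measured with cosine similarity.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: near 1.0 for same-direction vectors, near 0 for unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings" standing in for model output.
passage_a = np.array([0.90, 0.10, 0.00, 0.20])  # "The film premiered in 2019."
passage_b = np.array([0.85, 0.15, 0.05, 0.25])  # "The movie was first shown in 2019."
passage_c = np.array([0.00, 0.10, 0.95, 0.30])  # "Volcanic soil is rich in minerals."

sim_ab = cosine_similarity(passage_a, passage_b)
sim_ac = cosine_similarity(passage_a, passage_c)

# Paraphrases land close together in the embedding space; unrelated text does not.
assert sim_ab > sim_ac
```

In a citation-verification setting, the article passage and each candidate source passage would be embedded this way, and a cited source whose best passage scores low against the claim it supposedly supports would be flagged.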

The AI is being trained on a set of four million Wikipedia citations, and besides picking out faulty citations on the site, its creators would like it to eventually be able to suggest accurate sources to take their place, pulling from a massive, continuously updated index of data.

One big issue left to work out is a grading system for sources’ reliability. A paper from a scientific journal, for example, would receive a higher grade than a blog post. The amount of content online is so vast and varied that you can find “sources” to support just about any claim. But parsing the misinformation from the disinformation (the former means incorrect, while the latter means deliberately deceiving), the peer-reviewed from the nonpeer-reviewed, and the fact-checked from the hastily slapped-together is no small task. But it’s a very important one when it comes to trust.

Meta has open-sourced its model, and those who are curious can see a demo of the verification tool. Meta’s blog post noted that the company isn’t partnering with Wikimedia on this project, and that it’s still in the research phase and not currently being used to update content on Wikipedia.

If you imagine a not-too-distant future where everything you read on Wikipedia is accurate and reliable, wouldn’t that make doing any sort of research a bit too easy? There’s something valuable about checking and comparing various sources ourselves, isn’t there? It was a big leap to go from paging through heavy books to typing a few words into a search engine and hitting the “Enter” key. Do we really want Wikipedia to move from a research jumping-off point to a gets-the-last-word source?

In any case, Meta’s AI research team will continue working toward a tool to improve the online encyclopedia. “I think we were driven by curiosity at the end of the day,” Petroni says. “We wanted to see what was the limit of this technology. We were absolutely not sure if [this AI] could do anything meaningful in this context. No one had ever tried to do something similar.”

First published August 26, 2022, on Singularity Hub.


About The Author

Vanessa Bates Ramirez

Vanessa Bates Ramirez is senior editor of Singularity Hub. She’s interested in biotechnology and genetic engineering, the nitty-gritty of the renewable energy transition, the roles technology and science play in geopolitics and international development, and countless other topics.