Adam Conner-Simons
Published: Wednesday, February 10, 2021 - 13:03

This story was originally published by MIT Computer Science & Artificial Intelligence Lab (CSAIL).

Scatterplots. You may not know them by name, but if you spend more than 10 minutes online, you’ll find them everywhere. They’re popular in news articles, in the data science community, and perhaps most crucially, for internet memes about the digestive quality of pancakes.

By depicting data as a mass of points across two axes, scatterplots are effective in visualizing trends, correlations, and anomalies. But using them for large datasets often leads to overlapping dots that make them more or less unreadable.

Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) say they’ve solved this with a new open-source system that makes it possible to create interactive scatterplots based on large-scale datasets that have upward of billions of distinct data points.

Called “Kyrix-S,” the system has an interface that allows users to pan, zoom, and jump around a scatterplot as if they were looking at directions on Google Maps. Whereas other systems developed for large datasets often focus on very specific applications, Kyrix-S is generalizable enough to work for a wide range of visualization styles, including heatmaps, pie charts, and radar-style graphics. (The team showed that the system allows users to create visualizations with 800 times less code compared to a similar state-of-the-art authoring system.)

Users can produce a scatterplot by just writing a few dozen lines of JSON, a human-readable text format. For example, the system turns this: ... into this:

Lead developer Wenbo Tao, a Ph.D. student at MIT CSAIL, gives the following example of a static scatterplot from The New York Times (below) that he says would improve by being made interactive via a system like Kyrix-S.
“In these scatterplots, you are able to see overall trends and outliers, but the overplotting and the static nature of the plot limit the user’s ability to interact with the chart,” says Tao.

In contrast, Kyrix-S can produce a version (below) that puts data in several zoom levels, enabling interaction with each county. To avoid overplotting, Kyrix-S’ scatterplot also shows only the most important examples, like the most populous counties.

“As a visualization researcher, I am constantly at the edge of data sizes that are possible to visualize, which forces me to summarize or partition my data to get any insights,” says Kristi Potter, a data visualization scientist at the National Renewable Energy Laboratory who was not involved in the research. “With Kyrix-S, it’s possible to use all of the data, providing much more confidence in visualizations of large-scale data.”

Kyrix-S is currently being used by Data Civilizer 2.0, a data integration platform developed at MIT. An earlier version was also employed to help Massachusetts General Hospital analyze a massive brain activity dataset (EEG) that clocks in at 30 terabytes, the equivalent of more than 50,000 hours of digital music. (The goal of that study was to train a model that predicts seizures, given a series of two-second EEG segments.)

Moving forward, the researchers will be adapting Kyrix-S to work as part of a graphical user interface. They also plan to add functionality so that the system can handle data that are being continuously updated.

Reprinted with permission of MIT CSAIL News.
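The zoom-level idea described in the article (showing only the most important points, such as the most populous counties, and revealing more as the user zooms in) can be illustrated with a small sketch. This is not Kyrix-S's actual algorithm or API; it is a generic grid-based selection written under stated assumptions. The `Point` type, the `select_for_zoom` function, the `cells_at_zoom0` parameter, and the sample county data are all hypothetical.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    x: float           # data-space coordinates, assumed to lie in [0, 1)
    y: float
    importance: float  # hypothetical ranking value, e.g. county population


def select_for_zoom(points: list[Point], zoom: int,
                    cells_at_zoom0: int = 2) -> list[Point]:
    """Keep at most one point per grid cell, preferring higher importance.

    The grid has cells_at_zoom0 * 2**zoom cells per axis, so each zoom
    step doubles the resolution and lets more points through.
    """
    cells = cells_at_zoom0 * 2 ** zoom
    occupied: set[tuple[int, int]] = set()
    kept: list[Point] = []
    # Visit points from most to least important; the first point to
    # claim a cell wins, and later points in that cell stay hidden.
    for p in sorted(points, key=lambda p: p.importance, reverse=True):
        cell = (int(p.x * cells), int(p.y * cells))
        if cell not in occupied:
            occupied.add(cell)
            kept.append(p)
    return kept


# Made-up sample data, for illustration only.
counties = [
    Point("A", 0.10, 0.10, 900_000),
    Point("B", 0.12, 0.11, 50_000),
    Point("C", 0.30, 0.20, 20_000),
    Point("D", 0.80, 0.75, 400_000),
    Point("E", 0.82, 0.90, 10_000),
]

levels = {zoom: [p.name for p in select_for_zoom(counties, zoom)]
          for zoom in range(4)}
```

With this sample data, the coarsest level keeps only A and D, and finer levels add C and then E. Because each finer grid subdivides the coarser one, a point visible at one level stays visible at every deeper level, which avoids points vanishing as the user zooms in.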
Adam Conner-Simons is a communications professional, consultant, and content creator who has 15+ years of experience in journalism and public relations. He oversees communications and public relations for MIT’s largest research lab, the Computer Science and Artificial Intelligence Lab (CSAIL), leading all efforts related to media outreach, digital strategy, social media, web content, speechwriting, and translating difficult concepts for general audiences. He regularly speaks about communications and media relations at conferences. As a freelance writer, he contributes regularly to outlets such as The New York Times, Slate Magazine, and The Boston Globe.

Less Scatterbrained Scatterplots
An open-source system makes it possible to create interactive scatterplots of large datasets
© 2023 Quality Digest. Copyright on content held by Quality Digest or by individual authors. Contact Quality Digest for reprint information.
“Quality Digest” is a trademark owned by Quality Circle Institute, Inc.
Comments
Great article (did I find a typo?)
Hi!
Very interesting and useful article.
In the following line, I think the "800" could be a typo because 100% is the largest reduction possible from an original quantity.
"The team showed that the system allows users to create visualizations with 800-percent less code compared to a similar state-of-the-art authoring system."
Thank you.
Nice catch