Knowledge at Wharton

The Democratization of Machine Learning

No longer exclusive to big companies, machine learning is accelerating and taking business with it

Published: Thursday, May 11, 2017 - 12:01

The world of high-tech innovation can change the destiny of industries seemingly overnight. Now we are on the cusp of a new grand leap, thanks to the democratization of machine learning, a form of artificial intelligence that enables computers to learn without being explicitly programmed. This process is already underway, according to Kartik Hosanagar, a Wharton School of Business professor of operations, information and decisions, and a cofounder of Yodle Inc.; and Apoorv Saxena, a product manager at Google and co-chair of the recent AI Frontiers conference.

Last month, at the CloudNext conference in San Francisco, Google announced its acquisition of Kaggle, an online community for data scientists and machine-learning competitions. Although the move may seem far removed from Google’s core businesses, it speaks to the skyrocketing industry interest in machine learning (ML). Kaggle gives Google access not only to a talented community of data scientists but also to one of the largest repositories of datasets, which will help train the next generation of machine-learning algorithms.

As ML algorithms solve bigger and more complex problems, such as language translation and image understanding, training them can require massive amounts of pre-labeled data. To increase access to such data, Google had previously released a labeled dataset created from more than 7 million YouTube videos as part of its YouTube-8M challenge on Kaggle. The acquisition of Kaggle is an interesting next step.
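To make the idea of learning from pre-labeled data concrete, here is a minimal sketch in plain Python. It is not drawn from Google's or Kaggle's tooling; the nearest-centroid technique, the function names `train` and `predict`, and the four data points are all assumptions chosen for illustration. Nothing in the code states what "small" or "large" means; the program infers the distinction from the labeled examples.

```python
# Minimal sketch: a nearest-centroid classifier learned from labeled examples.
# The data points, labels, and function names are hypothetical illustrations.

def train(examples):
    """Compute one centroid (mean point) per label from (features, label) pairs."""
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: squared_distance(centroids[label], features))

# Pre-labeled training data (made up for this example).
labeled_data = [
    ([1.0, 1.2], "small"),
    ([0.8, 1.0], "small"),
    ([4.0, 4.2], "large"),
    ([4.4, 3.8], "large"),
]

model = train(labeled_data)
print(predict(model, [1.1, 0.9]))  # prints "small"
print(predict(model, [3.9, 4.1]))  # prints "large"
```

Real systems for translation or image understanding use far larger models, but the dependence is the same: the quality of the predictions rests on the quantity and quality of the labeled examples, which is why large labeled datasets are so valuable.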

Market-based access to data and algorithms will lower entry barriers and lead to an explosion in new applications of AI. As recently as 2015, only large companies like Google, Amazon, and Apple had access to the massive data and computing resources needed to train and launch sophisticated AI algorithms. Small startups and individuals simply didn’t have access and were effectively locked out of the market. That changes now. The democratization of ML gives individuals and startups a chance to get their ideas off the ground and prove their concepts before raising the funds needed to scale.


But access to data is only one way in which ML is being democratized. There is an effort underway to standardize and improve access across all layers of the machine learning stack, including specialized chipsets, scalable computing platforms, software frameworks, tools, and ML algorithms.

Specialized chipsets
Complex machine-learning algorithms require an enormous amount of computing power, both to train models and to run them in real time. Rather than relying on general-purpose processors that can handle all kinds of tasks, companies have shifted their focus toward custom-building specialized hardware for ML tasks. With Google’s Tensor Processing Unit (TPU) and NVIDIA’s DGX-1, we now have powerful hardware built specifically for machine learning.

Highly scalable computing platforms
Even where specialized processors are available, not every company has the capital and skills to manage the large-scale computing platform needed to run advanced machine learning on a routine basis. This is where public cloud services such as Amazon Web Services (AWS), Google Cloud Platform, Microsoft Azure, and others come in. These services let developers rent scalable infrastructure optimized for ML at a fraction of the cost of setting it up on their own.

Open-source, deep-learning software frameworks
A major obstacle to the wide-scale adoption of machine learning is the sheer number of different software frameworks in use. Big companies are open-sourcing their core ML frameworks and trying to push for some standardization. Just as the cost of developing mobile apps fell dramatically as iOS and Android emerged as the two dominant ecosystems, so too will machine learning become more accessible as tools and platforms standardize around a few frameworks. Notable open-source frameworks include Google’s TensorFlow, MXNet (backed by Amazon), and Torch (used and supported by Facebook).

Developer-friendly tools
The final step toward the democratization of machine learning will be the development of simple drag-and-drop frameworks accessible to those without doctoral degrees or deep data science training. Microsoft Azure ML Studio offers access to many sophisticated ML models through a simple graphical user interface. Amazon and Google have rolled out similar software on their cloud platforms as well.

Marketplaces for ML algorithms and datasets
Not only do we have the on-demand infrastructure needed to build and run ML algorithms, we even have marketplaces for the algorithms themselves. Need an algorithm for face recognition in images or to add color to black-and-white photographs? Marketplaces like Algorithmia let you download the algorithm of choice. In addition, websites like Kaggle provide the massive datasets needed to further train these algorithms.

All of these changes mean that the world of machine learning is no longer restricted to university labs and corporate research centers that have access to massive training data and computing infrastructure.

What are the implications?

In the mid- and late 1990s, web development was done by specialists and was accessible only to firms with ample resources. Now, with simple tools like WordPress, Medium, and Shopify, any layperson can have a presence on the web. The democratization of machine learning will have a similar effect, lowering entry barriers for individuals and startups.

Further, the emerging ecosystem of marketplaces for data, algorithms, and computing infrastructure will make it easier for developers to pick up ML skills. The net result will be lower costs to train and hire talent. These two factors, lower entry barriers and cheaper talent, will be particularly powerful in vertical (i.e., industry-specific) use cases such as weather forecasting, healthcare and disease diagnostics, drug discovery, and financial risk assessment, which have traditionally been cost prohibitive.

Just as cloud computing ushered in the current explosion in startups, the ongoing build-out of machine learning platforms will likely power the next generation of consumer and business tools. The PC platform gave us access to productivity applications like Word and Excel, and eventually to web applications like search and social networking. The mobile platform gave us messaging applications and location-based services. The ongoing democratization of ML will likely give us an amazing array of intelligent software and devices powering our world.

First published April 13, 2017, on the Knowledge@Wharton website.

About The Author

Knowledge at Wharton

Knowledge@Wharton is the web-based research and business analysis journal of the Wharton School of the University of Pennsylvania. Launched in May 1999, its goal is to disseminate business knowledge and insights to readers around the world. The Knowledge@Wharton Network offers free access to analysis of current business trends; interviews with industry leaders and Wharton faculty; articles based on the most recent business research; conference overviews, book reviews, and links to relevant content; and a searchable database of more than 1,500 articles and research abstracts.