Democratization of AI
A double-edged sword
When company leaders talk about democratizing artificial intelligence (AI), it’s not difficult to imagine what they have in mind. The more people with access to the raw materials of knowledge, tools, and data required to build an AI system, the more innovations that are bound to emerge. Efficiency improves and engagement increases. Faced with a shortage of technical talent? Microsoft, Amazon, and Google have all released premade, drag-and-drop or no-code AI tools that allow people to integrate AI into applications without needing to know how to build machine learning models.
But as companies move toward democratization, a cautionary tale is emerging. Even the most sophisticated AI systems, designed by highly qualified engineers, can fall victim to bias, explainability issues, and other flaws. An AI system built by someone without proper training, or operating without appropriate controls, could create something outright dangerous — introducing discrimination or serious errors. Worse, the problems may not become evident until after a system has been implemented, leaving companies scrambling to reassure stakeholders, undo the damage, and fix the tech.
This is not to say that the democratization of AI is without value. Making these new technologies more accessible and affordable expands the possibilities of what businesses and governments can accomplish and fuels competition. For example, datasets, models, papers, and research related to the fight against COVID-19 have been open-sourced, enabling a large global community to collaborate. The key for company leaders is to avoid getting carried away by the hype and instead focus on identifying what exactly they are going to democratize (will it be something simple, such as data visualization, or something complex, like model development?), who the users will be (novices or experts), and how their organization can maximize the benefit while managing or mitigating the risks with proper training and governance.
THE TECHNOLOGY SPECTRUM — WHAT TO DEMOCRATIZE?
The technology vendors releasing AI and machine learning (ML) products need to start by determining which part or parts of the value chain their tool or platform will be democratizing. Here it is helpful to think of a spectrum (see Figure 1 below), across which the tools and models become increasingly sophisticated and result in greater value generation.
At one end of the spectrum is data, and the ingestion of data into data warehouses and data lakes. AI systems, and in particular ML, run on large volumes of structured and unstructured data — it is the material from which you can then generate insights, decisions, and then outcomes. In its raw form, it is easy to democratize, enabling people to perform basic analyses. Already, a number of technology providers have created data explorers to help users search and visualize the openly available datasets.
Next along the spectrum come the algorithms that data is fed into. Here the value and complexity go up, as the data is put to work. At this point, democratization is still relatively easy to achieve, and in fact algorithms are widely accessible. Open source code hosting platforms like GitHub have grown significantly over the past decade. In November 2018, GitHub hosted more than 100 million code repositories, a significant number of them related to AI. Making sense of what the algorithms do requires a basic understanding of computer science and some background in mathematics or statistics.
As we continue to move along the spectrum to storage and computing platforms, the complexity increases. During the past five years, the technology platform for AI has moved to the cloud, with three major AI/ML providers in Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform. This has made central processing units and graphics processing units (essential for training reasonably large deep learning models) accessible to end users on a pay-as-you-go basis, substantially reducing the barrier to entry. However, whereas algorithms are hardware agnostic (that is, they typically can run on any hardware or cloud platform), the cloud storage and computing platforms require specific training and certification by the technology vendors (Amazon, Microsoft, or Google).
Now we come to model development. Models solve specific problems: Some become recommendation engines, some become facial recognition, and so on. Here we are seeing democratization with AutoML platforms and tools. For example, automating the ability to ingest a variety of data formats — structured, semi-structured, and unstructured — and to run a number of algorithms on the same dataset and select the best ensemble of algorithms is making the model development process more accessible (and also faster). However, if the users are not appropriately trained, the potential for building bias into the model, being unable to explain the results of the model, and even making wrong decisions is high.
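A rough sketch of the selection step described above, in plain Python: several candidate algorithms are trained on the same data and compared on a held-out validation set. The synthetic data, the candidate list, and all function names are assumptions made for illustration. A real AutoML tool does far more, and this sketch picks a single best model rather than an ensemble.

```python
import random
from collections import Counter


def make_data(n, seed):
    """Synthetic binary data (an assumption): label follows x0 + x1 > 1, with 20% of labels flipped."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        y = int(x[0] + x[1] > 1.0)
        if rng.random() < 0.2:
            y = 1 - y
        rows.append((x, y))
    return rows


def fit_majority(train):
    """Baseline candidate: always predict the most common training label."""
    majority = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: majority


def fit_knn(k):
    """Candidate factory: k-nearest-neighbor classifier with majority vote."""
    def fit(train):
        def predict(x):
            nearest = sorted(
                train,
                key=lambda r: (r[0][0] - x[0]) ** 2 + (r[0][1] - x[1]) ** 2,
            )[:k]
            return Counter(y for _, y in nearest).most_common(1)[0][0]
        return predict
    return fit


def accuracy(predict, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


# Hypothetical candidate set; an AutoML platform would search a much larger space.
CANDIDATES = {"majority": fit_majority, "1-nn": fit_knn(1), "15-nn": fit_knn(15)}

data = make_data(600, seed=1)
train, validation = data[:400], data[400:]

# Train every candidate on the same data and score it on rows it has not seen.
scores = {name: accuracy(fit(train), validation) for name, fit in CANDIDATES.items()}
best_name = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: validation accuracy {score:.2f}")
print("selected:", best_name)
```

The comparison is only meaningful because every candidate is scored on validation rows that were kept out of training, which is exactly the kind of underlying concept an untrained user of a drag-and-drop tool may not know to check.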
Finally, at the far end of the spectrum, we are in the early stages of creating a marketplace for data, algorithms, and models. There is also an emerging marketplace for problems and talent that can solve these problems. Kaggle, created in 2010 and acquired by Google in 2017, is one of the best-known examples of a data science or AI marketplace. Data science challenges with significant prize money, like the Netflix Prize for movie recommendations, allow anyone anywhere in the world to compete and demonstrate their skills. As we curate data, algorithms, and models in these marketplaces, the risks of misinterpreting them and applying them in the wrong context increase significantly, and with them the danger of systemic misuse of models.
KNOW YOUR USERS — FOR WHOM TO DEMOCRATIZE?
Designing an AI system requires extensive technical know-how and a firm grasp of data science. Just as you’d want to be sure a surgeon is qualified, trained, and has experience in the operating room, AI systems should be designed, tested, and maintained by people with solid tech chops, an understanding of the key components of an AI system, and a commitment to responsible AI.
Vendors often make sweeping statements, saying they have democratized data ingest, data cleansing, and data mining by creating drag-and-drop tools. Or that they have democratized complex statistical and computational model development by automating the entire machine learning or data science process. But who is accessing these tools and models? Have they been appropriately trained — not just in the tool, but also in the underlying concepts?
In one example, a business user at an organization that had made drag-and-drop tools widely available built a machine learning model without setting aside a sample of the data for validation and testing. Because the model was overfitted to the training data, the reported accuracy of the model was flawed. If the model had been deployed, it could have led to significant financial losses.
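The failure mode in this example can be reproduced in a few lines. The sketch below is an illustration under stated assumptions, not a reconstruction of the business user's model: the data is synthetic with deliberately noisy labels, and the "model" is a 1-nearest-neighbor classifier, chosen because it memorizes its training data. All names are hypothetical.

```python
import random


def make_noisy_data(n, seed):
    """Synthetic binary data (an assumption): label follows x0 + x1 > 1, with 25% of labels flipped."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        label = int(x[0] + x[1] > 1.0)
        if rng.random() < 0.25:
            label = 1 - label
        rows.append((x, label))
    return rows


def predict_1nn(train, x):
    """Return the label of the closest training point (a model that memorizes)."""
    nearest = min(
        train,
        key=lambda row: (row[0][0] - x[0]) ** 2 + (row[0][1] - x[1]) ** 2,
    )
    return nearest[1]


def accuracy(train, rows):
    """Fraction of rows classified correctly by the 1-NN model built on `train`."""
    hits = sum(predict_1nn(train, x) == y for x, y in rows)
    return hits / len(rows)


data = make_noisy_data(400, seed=0)
train, held_out = data[:300], data[300:]  # set aside 100 rows the model never sees

train_acc = accuracy(train, train)        # what gets reported with no holdout
held_out_acc = accuracy(train, held_out)  # a fairer estimate of real performance

print(f"accuracy on training data: {train_acc:.2f}")
print(f"accuracy on held-out data: {held_out_acc:.2f}")
```

Scored against its own training rows, the model looks perfect because each row is its own nearest neighbor. Only the held-out rows reveal how much of that accuracy is memorized noise.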
The beneficiaries of AI democratization fall into three broad categories: casual users, power users, and specialist developers. Casual users and specialist developers are at opposite extremes, with power users somewhere in between; power users are more knowledgeable and better trained than casual users, but not working at an expert level. Business users are typically casual users; they have not received extensive training in the statistical and mathematical concepts underlying the models or in the specific processes required to build models. Specialist developers or data scientists generally have strong qualifications or the appropriate certifications.
Companies need to determine which of these three categories they are targeting with various initiatives. For example, when we democratize data visualization, we are enabling all three types of users to quickly create a variety of visualizations with little or no programming, which is a low-risk proposition. However, when we say we are democratizing model development, are we doing it just for specialist data scientists, to enable them to run different algorithms, evaluate them, and choose the right ensemble of models? Or are we attempting to democratize it for casual and power users as well? If the latter, extreme caution is required.
BENEFITS OF DEMOCRATIZATION — WHY DEMOCRATIZE?
Democratization of AI offers three main benefits. First and foremost, it reduces entry barriers for individuals as well as organizations to start experimenting with AI. They can leverage publicly available data and algorithms to start building AI models on cloud infrastructure. Individuals anywhere in the world can enter the exciting world of AI with little or no financial investment (perhaps with just access to the internet). Not only can they learn about AI, but they can also solve important problems in marketplaces such as Kaggle to obtain significant rewards.
Second, democratization reduces the overall cost of building AI solutions as communities of programmers and users start using and extending the data, algorithms, and tools to build more powerful solutions. The openness of democratization, where data or algorithms are made freely available to others, also helps in building the necessary talent. The curation of ImageNet and its public release with defined performance metrics helped a number of researchers build faster and more accurate models. The availability of open source deep learning frameworks like Caffe, TensorFlow, and PyTorch has significantly contributed to a growing number of talented deep learning experts. Hence, reduced time to talent development is also a significant benefit of democratization.
Third, all of these aspects are increasing the speed of adoption of AI in the academic and business world. Natural language processing applications, such as analyzing and extracting structured information from text documents, analyzing customer sentiment from social media or call centers, and using conversational interfaces or chatbots, are becoming common in business. Similarly, the use of machine learning and deep learning to draw insights, identify or classify data, automate tasks, or augment human decision making is becoming commonplace.
VALUE LEVERS FOR DEMOCRATIZATION — HOW TO DEMOCRATIZE?
Most of the current efforts around democratization have focused on access to data, algorithms, storage, compute, model development, and marketplaces. However, we need to move beyond just democratizing access to AI toward a number of value drivers of democratization that can ensure that AI is not misused and that also enhance overall value (see Figure 2).
Access to the different components of AI needs to be affordable and in some cases, such as open data initiatives or the sharing of algorithms through GitHub, even free. Beyond affordable access, each of the above layers needs to be easy to use. If everyone needed to write SQL queries to access the data, or understand linear algebra and differential equations to use some of the algorithms, one could hardly say that we have democratized AI.
The next step in the democratization process is the ability to control the different elements of the stack. For example, if a technology vendor offering services in one of the layers, say the compute platform, were to insist on running your workloads only when it has spare cycles and to restrict what you can take outside of its platform, democratization becomes less useful. For AI to be truly democratized, users need to have control over what they run, when they run it, and how they use the results.
Going beyond control, the notion of ownership needs to be addressed. Is the data owned by the organization or people who generate the data, the organization that processes the data, or the organization or people who draw insights from it? Can ownership really be split among multiple parties? Can a blockchain mediate or distribute ownership? These are all open questions yet to be resolved in the quest for democratization.
Access, use, control, and ownership are levers of increasing value that vendors who democratize AI, as well as the users and enterprises who seek to benefit from AI, need to consider. The flip side of these value levers is the risk associated with democratization. Providing free access initially to acquire market share or data, and then charging for it or using the data to strengthen the models and monetize the data, is a well-developed business model. Making this explicit and providing safeguards on the use of data can go a long way in building trust with consumers.
Next, if one provides free access to data, algorithms, models, and so on, but does not train casual or power users on the context in which the data was obtained, the assumptions under which the model was developed, or the mathematics necessary to interpret the model’s results, the results could be disastrous. Over the past few years there have been a number of documented instances of models being misused or failing, even when built by specialists (for example, data bias in machine learning models and adversarial attacks). Without training on the use and interpretation of algorithms and models, this phenomenon will only increase.
BRINGING IT ALL TOGETHER — DEMOCRATIZATION FRAMEWORK
There are four distinct actions (see Figure 3) that leaders need to take to get the full benefit of democratization across the different value levers (access, use, control, and ownership) while avoiding misuse, abuse, bias, and other problems:
1. Training. Lack of adequate training in AI development and implementation could be calamitous — especially when it comes to systems that deal with people’s health or financial well-being. For example, if casual or untrained users don’t understand the importance of splitting data into buckets for training, validation, and testing, they could easily end up with AI that produces inaccurate or unintended results. If we want to move from just providing access to stimulating the use of these tools, training the casual or power users with the appropriate foundations of data science is critical for the safe use of AI.
2. Governance. Company leaders need to establish clarity on the ownership and control of data that is fed into AI-powered platforms, and how rights relate to the insights generated. When data is collected for a specific AI/ML program, then used for different applications — which can be the case with open-source data lakes — it’s easy to lose visibility into the data’s origins, the purpose it was collected for, how it has been modified, and how (or if) it’s being appropriately safeguarded. “Shadow AI,” that is, AI created using data not governed by teams within an organization whose job is to ensure data integrity, is also a concern. To minimize risk, AI/ML models should be built using data that is monitored, secured, and understood.
Organizations often have data governance as part of corporate compliance activity, but few are monitoring the other elements involved in an AI system as closely as they should be. Controls need to be in place to ensure that the models are being developed with the appropriate success or validation metrics (a balance of accuracy, fairness, explainability), to avoid the development and deployment of AI models whose results are biased or can’t be easily explained or understood.
3. Intellectual property (IP) rights. The perceived benefits of democratization may not be achieved without decisions about who owns the IP rights. A number of companies have refused to use cloud platforms for image processing or audio processing out of fear that confidential information will be processed outside their four walls, and that the cloud solution provider is using and benefiting from the significant insights generated from their data. As more and more companies realize that the full power of democratization comes from their data (and that of their competitors), and not from the tools and platforms themselves, they will likely demand some level of IP rights.
4. Open sourcing. Closing the loop on the enablers of democratization requires all parties to open source what they have done, to the extent that it does not infringe on privacy, confidentiality, and competitive dynamics. Failure to close the loop from ownership to access will essentially create a one-way flow wherein some players — typically larger companies or governments with ample funding — will benefit from democratization in the short-term, but those with fewer resources get left behind.
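The training action above mentions splitting data into buckets for training, validation, and testing. A minimal sketch of such a split follows. The 70/15/15 proportions, the fixed seed, and the function name `split_dataset` are assumptions chosen for illustration; appropriate proportions depend on the dataset and the problem.

```python
import random


def split_dataset(rows, train_frac=0.70, validation_frac=0.15, seed=42):
    """Shuffle rows and split them into training, validation, and test buckets.

    The test bucket receives whatever remains after the first two buckets.
    """
    if not 0 < train_frac + validation_frac < 1:
        raise ValueError("fractions must leave a non-empty test bucket")
    shuffled = list(rows)                   # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # shuffle to avoid ordering effects
    n_train = round(len(shuffled) * train_frac)
    n_validation = round(len(shuffled) * validation_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]
    return train, validation, test


records = list(range(1000))  # stand-in for real records
train, validation, test = split_dataset(records)
print(len(train), len(validation), len(test))
```

The training bucket is used to fit the model, the validation bucket to compare and tune candidate models, and the test bucket is touched only once at the end, so the final reported accuracy comes from data that influenced none of the earlier choices.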
To be sure, there are benefits to making AI more accessible and affordable. Doing so expands the possibilities of what businesses and governments can accomplish by making experimentation easier. And in general, the open-source movement has fueled competition, making quality and customer experience the ultimate winners.
Democratizing AI seems like a good idea in a world where open-sourcing is increasingly common and technologists and early adopters alike are eager to lower the barriers to entry in pursuit of progress. Indeed, if we can’t learn from one another’s AI projects, we risk creating a one-way flow in which some players (typically larger companies or governments with ample funding) benefit, while those with fewer resources get left behind. As the World Economic Forum’s head of AI and machine learning, Kay Firth-Butterfield, recently wrote, AI’s long-term success depends on “the agility of the collaboration, the diversity and integrity of the data, and the accuracy of the risk assessments.”
But technologies that enable drag-and-drop or novice creation of AI models, especially when taken to an extreme, do an injustice to the seriousness of the issues. When even trained data scientists can stumble in creating ethical and responsible AI, placing the burden of robustness on the untrained is unfair and potentially dangerous.
By acknowledging the possible downsides of the democratization of AI, industry can explore the standards and guidelines needed to ensure that innovation goes hand in hand with safe implementation and that the door to any number of possible risks isn’t opened. And by making AI transparent and establishing governance, industry can remove it from its “black box” and engender trust. Because if people don’t trust AI, it can’t truly progress.