Bringing AI to the device: Edge AI chips come into their own
If you love your smartphone’s AI-enhanced camera, wait until you find out what edge AI chips could do for enterprise.
Many people may be familiar with the frustration of calling up their smartphone’s speech-to-text function to dictate an email, only to find that it won’t work because the phone isn’t connected to the internet. Now, a new generation of edge artificial intelligence (AI) chips is set to reduce those frustrations by bringing the AI to the device.1
We predict that in 2020, more than 750 million edge AI chips—chips or parts of chips that perform or accelerate machine learning tasks on-device, rather than in a remote data center—will be sold. This number, representing a cool US$2.6 billion in revenue, is more than twice the 300 million edge AI chips Deloitte predicted would sell in 20172—a three-year compound annual growth rate (CAGR) of 36 percent. Further, we predict that the edge AI chip market will continue to grow much more quickly than the overall chip market. By 2024, we expect sales of edge AI chips to exceed 1.5 billion, possibly by a great deal.3 This represents annual unit sales growth of at least 20 percent, more than double the longer-term forecast of 9 percent CAGR for the overall semiconductor industry.4
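The growth rates quoted above follow from the standard compound-annual-growth-rate formula. As a rough arithmetic check on the unit forecasts (a minimal sketch; the figures are the forecasts cited above):

```python
def cagr(start, end, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end / start) ** (1 / years) - 1

# 300 million units (2017) -> 750 million units (2020): three years
print(f"2017-2020 unit CAGR: {cagr(300e6, 750e6, 3):.0%}")   # ~36%

# 750 million units (2020) -> 1.5 billion units (2024): four years
print(f"2020-2024 unit CAGR: {cagr(750e6, 1.5e9, 4):.0%}")   # ~19%, i.e. roughly 20% annual growth
```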
These edge AI chips will likely find their way into an increasing number of consumer devices, such as high-end smartphones, tablets, smart speakers, and wearables. They will also be used in multiple enterprise markets: robots, cameras, sensors, and other IoT (internet of things) devices in general. Both markets are important. The consumer edge AI chip market is much larger than the enterprise market, but it is likely to grow more slowly, with a CAGR of 18 percent expected between 2020 and 2024. The enterprise edge AI chip market, while much newer—the first commercially available enterprise edge AI chip only launched in 20175—is growing much faster, with a predicted CAGR of 50 percent over the same time frame.
Here, there, and everywhere: The many locations of AI computing
Until recently, AI computations have almost all been performed remotely in data centers, on enterprise core appliances, or on telecom edge processors—not locally on devices. This is because AI computations are extremely processor-intensive, requiring hundreds of (traditional) chips of varying types to execute. The hardware’s size, cost, and power drain made it essentially impossible to house AI computing arrays in anything smaller than a footlocker.
Now, edge AI chips are changing all that. They are physically smaller, relatively inexpensive, use much less power, and generate much less heat, making it possible to integrate them into handheld devices such as smartphones as well as nonconsumer devices such as robots. By enabling these devices to perform processor-intensive AI computations locally, edge AI chips reduce or eliminate the need to send large amounts of data to a remote location—thereby delivering benefits in usability, speed, and data security and privacy.
Of course, not all AI computations have to take place locally. For some applications, sending data to be processed by a remote AI array may be adequate or even preferred—for instance, when there is simply too much data for a device’s edge AI chip to handle. In fact, most of the time, AI will be done in a hybrid fashion: some portion on-device, and some in the cloud. The preferred mix in any given situation will vary depending on exactly what kind of AI processing needs to be done.
Figure 1 shows the various locations where AI computing can occur, all of which are likely to coexist for the foreseeable future.
The term “telecom edge” deserves some explanation here. Telecom edge compute (also known as telco edge compute)—the “far edge network” depicted in figure 26—refers to computing performed by what are basically mini data centers located as close to the customer as possible, but owned and operated by a telco, and on telco-owned property. They currently use data center–style AI chips (big, expensive, and power-hungry), but they may, over time, start incorporating some of the same kinds of edge AI chips (consumer or enterprise) that we discuss in this chapter. Unlike edge device computing, however, the chips used in telecom edge compute are located at the edge of the telco’s network, not on the actual end device. Further, not all telecom edge computing is AI computing. According to industry analysts, revenues for the telecom edge compute market (all kinds of computing, not just AI) will reach US$21 billion in 2020. This is up more than 100 percent from 2019, and the market is poised to grow more than 50 percent in 2021 as well.7 A precise breakdown of this market by category is not publicly available, but analysts believe that the AI portion will likely be still relatively nascent in 2020, with revenues of no more than US$1 billion, or 5 percent of total telecom edge compute spending.8
Edge AI for consumers: It doesn’t have to be expensive
In 2020, the consumer device market will likely represent more than 90 percent of the edge AI chip market, both in terms of the numbers sold and their dollar value. The vast majority of these edge AI chips will go into high-end smartphones, which account for more than 70 percent of all consumer edge AI chips currently in use.9 This means that, in 2020 as well as for the next few years, AI chip growth will be driven principally by smartphones: both how many smartphones are sold and what percentage of them contain edge AI chips. In terms of numbers, the news appears to be good. After a weak 2019, which saw smartphone sales decrease by 2.5 percent year over year, smartphones are expected to sell 1.56 billion units in 2020—a 2.8 percent increase over 2019, and roughly the same number as in 2018.10 We believe that more than a third of this market may have edge AI chips in 2020.
Smartphones aren’t the only devices that use edge AI chips; other device categories—tablets, wearables, smart speakers—contain them as well (figure 3). In the short term, these nonsmartphone devices will likely have much less of an impact on edge AI chip sales than smartphones, either because the market is not growing (as for tablets11) or because it is too small to make a material difference (for instance, smart speakers and wearables combined are expected to sell a mere 125 million units in 202012). However, many wearables and smart speakers depend on edge AI chips, so penetration is already high.
The economics of edge AI chips for smartphones
Currently, only the most expensive smartphones—those in the top third of the price distribution—are likely to use edge AI chips. That said, some phones under the US$1,000 price point do contain AI as well. Several AI-equipped phones from Chinese manufacturers, such as Xiaomi’s Mi 9,13 sell for under US$500 in Western countries. Further, as we’ll see below, putting an AI chip in a smartphone doesn’t have to be price-prohibitive for the consumer.
Calculating the cost of a smartphone’s edge AI chip is a roundabout process, but it’s possible to arrive at a fairly sound estimate. The reason one must estimate instead of simply looking up the cost outright is that a smartphone’s “AI chip” is not literally a separate chip unto itself. Inside a modern smartphone, only 7 to 8 millimeters thick, there is no room for multiple discrete chips. Instead, many of the various necessary functions (processing, graphics, memory, connectivity, and now AI) are all contained on the same silicon die, called a system on a chip (SoC) applications processor (AP). The term “AI chip,” if a phone has one, refers to the portion of the overall silicon die that is dedicated to performing or accelerating machine learning calculations. It is made from exactly the same materials as the rest of the chip, using the same processes and tools. It consists of hundreds of millions of standard transistors—but they are arranged in a different way (that is, they have a different architecture) than in the chip’s general processing or graphics portions. The AI portion is commonly, though not always, known as an NPU, or neural processing unit.
To date, three companies—Samsung, Apple, and Huawei—have had images taken of their phone processors that show the naked silicon die with all its features visible, which allows analysts to identify which portions of the chips are used for which functions. A die shot of the chip for Samsung’s Exynos 9820 shows that about 5 percent of the total chip area is dedicated to AI processors.14 Samsung’s cost for the entire SoC AP is estimated at US$70.50, which is the phone’s second-most expensive component (after the display), representing about 17 percent of the device’s total bill of materials.15 Assuming that the AI portion costs the same as the rest of the components on a die-area basis, the Exynos’s edge AI NPU represents roughly 5 percent of the chip’s total cost. This translates into about US$3.50 each.
Similarly, Apple’s A12 Bionic chip dedicates about 7 percent of the die area to machine learning.16 At an estimated US$72 for the whole processor,17 this suggests a cost of US$5.10 for the edge AI portion. The Huawei Kirin 970 chip, estimated to cost the manufacturer US$52.50,18 dedicates 2.1 percent of the die to the NPU,19 suggesting a cost of US$1.10. (Die area is not the only way to measure what percent of a chip’s total cost goes toward AI, however. According to Huawei, the Kirin 970’s NPU has 150 million transistors, representing 2.7 percent of the chip’s total of 5.5 billion transistors. This would suggest a slightly higher NPU cost of US$1.42.)20
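The per-chip estimates above are simple proportional arithmetic: the NPU's share of the die (or of the transistor count) multiplied by the analyst-estimated cost of the whole SoC. A minimal sketch reproducing them (the costs and fractions are the cited estimates, and the assumption that cost scales with die area is the same one made in the text):

```python
def npu_cost(soc_cost_usd, ai_fraction):
    """Estimate the edge AI (NPU) portion's cost, assuming cost scales with die area."""
    return soc_cost_usd * ai_fraction

print(f"Exynos 9820: ${npu_cost(70.50, 0.05):.2f}")    # ~US$3.50
print(f"A12 Bionic:  ${npu_cost(72.00, 0.07):.2f}")    # ~US$5
print(f"Kirin 970:   ${npu_cost(52.50, 0.021):.2f}")   # ~US$1.10

# Alternative measure: share of transistors rather than of die area (Kirin 970)
ai_fraction_by_transistors = 150e6 / 5.5e9             # ~2.7% of 5.5 billion transistors
print(f"Kirin 970 (by transistors): ${npu_cost(52.50, ai_fraction_by_transistors):.2f}")
```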
Although this cost range is wide, it may be reasonable to assume that NPUs cost an average of US$3.50 per chip. Multiplied by half a billion smartphones (not to mention tablets, speakers, and wearables), that makes for a large market, despite the low price per chip. More importantly, at an average cost of US$3.50 to the manufacturer, and a probable minimum of US$1, adding a dedicated edge AI NPU to smartphone processing chips starts looking like a no-brainer. Assuming normal markup, adding US$1 to the manufacturing cost translates into only US$2 more for the end customer. This means that NPUs and their attendant benefits—a better camera, offline voice assistance, and so on—can be put into even a US$250 smartphone for less than a 1 percent price increase.
Companies that manufacture smartphones (and other device types) can take different approaches to obtaining edge AI chips, with the decision driven by factors including phone model and (sometimes) geography. Some buy AP/modem chips from third-party companies that specialize in making and selling them to phone makers, but do not make their own phones. Qualcomm and MediaTek are two prominent examples; combined, these two companies captured roughly 60 percent of the smartphone SoC chip market in 2018.21 Both Qualcomm and MediaTek offer a range of SoCs at various prices; while not all of them include an edge AI chip, the higher-end offerings (including Qualcomm’s Snapdragon 845 and 855 and MediaTek’s Helio P60) usually do. At the other end of the scale, Apple does not use external AP chips at all: It designs and uses its own SoC processors such as the A11, A12, and A13 Bionic chips, all of which have edge AI.22 Still other device makers, such as Samsung and Huawei, use a hybrid strategy, buying some SoCs from merchant market silicon suppliers and using their own chips (such as Samsung’s Exynos 9820 and Huawei’s Kirin 970/980) for the rest.
What do edge AI chips do?
Perhaps the better question is, what don’t they do? Machine learning today underlies all sorts of capabilities, including, but not limited to, biometrics, facial detection and recognition, anything to do with augmented and virtual reality, fun image filters, voice recognition, language translation, voice assistance … and photos, photos, photos. From hiding our wrinkles to applying 3D effects to enabling incredibly low-light photography, edge AI hardware and software—not the lens or the sensor’s number of megapixels—are now what differentiates the best smartphone cameras from the rest.
Although all these tasks can be done on processors without an edge AI chip, or even in the cloud, they work much better, run much faster, and use less power (thereby improving battery life) when performed by an edge AI chip. Keeping the processing on the device is also better in terms of privacy and security; personal information that never leaves a phone cannot be intercepted or misused. And when the edge AI chip is on the phone, it can do all these things even when not connected to a network.
Edge AI for enterprise: A fertile field for opportunity
If the edge AI processors used in smartphones and other devices are so great, why not use them for enterprise applications too? This has, in fact, already happened for some use cases, such as for some autonomous drones. Equipped with a smartphone SoC AP, a drone is capable of performing navigation and obstacle avoidance in real time and completely on-device, with no network connection at all.23
However, a chip that is optimized for a smartphone or tablet is not the right choice for many enterprise or industrial applications. The situation is analogous to what chip manufacturers faced in the 1980s with central processing units (CPUs). In the 1980s, personal computers (PCs) had excellent CPUs; their high computational power and flexibility made them ideal for such a general-purpose tool. But it made no sense to use those same CPUs to put just a bit of intelligence into (say) a thermostat. Back then, CPUs were too big to fit inside a thermostat housing; they used far too much power, and at roughly US$200 per CPU, they cost too much for a device whose total cost needed to be less than US$20. To address these shortcomings, an entire industry developed to manufacture chips that had some of the functions of a computer CPU, but were smaller, cheaper, and less power-hungry.
But wait. As discussed earlier, the edge AI portion of a smartphone SoC is only about 5 percent of the total area, about US$3.50 of the total cost, and would use about 95 percent less power than the whole SoC does. What if someone built a chip that had only the edge AI portion (along with a few other required functions such as memory) that cost less, used less electricity, and was smaller?
Some already have—and more are coming. Intel and Google, for instance, are currently selling internally developed standalone edge AI chips to developers. Nvidia, the leading manufacturer of graphics processing units (GPUs) commonly used in accelerating data center AI—which are very large, use hundreds of watts of electricity, and can cost thousands of dollars—now sells a customized AI-specific chip (that is not a GPU) suitable for edge devices that is smaller, cheaper, and less power-hungry.24 Qualcomm, the leading maker of merchant market SoCs with embedded edge AI processing cores for smartphones and other consumer devices, has released two standalone edge AI chips that are less powerful than its SoCs, but that are cheaper, smaller, and use less electricity.25 Huawei is doing the same.26
In all, as many as 50 different companies are said to be working on AI accelerators of various kinds.27 In addition to those working on application-specific integrated circuit (ASIC) chips, field-programmable gate array (FPGA) manufacturers now offer edge AI chip versions for use outside data centers.28
The standalone edge AI chips available in 2019 were targeted at developers, who would buy them one at a time for around US$80 each. In volumes of thousands or millions, these chips will likely cost device manufacturers much less to buy: some as little as US$1 (or possibly even less), some in the tens of dollars. We are, for now, assuming an average cost of around US$3.50, using the smartphone edge AI chip as a proxy.
Besides being relatively inexpensive, standalone edge AI processors have the advantage of being small. Some are small enough to fit on a USB stick; the largest is on a board about the size of a credit card. They are also relatively low power, drawing between 1 and 10 watts. For comparison, a data center cluster (albeit a very powerful one) of 16 GPUs and two CPUs costs US$400,000, weighs 350 pounds, and consumes 10,000 watts of power.29
With chips such as these in the works, edge AI can open many new possibilities for enterprises, particularly with regard to IoT applications. Using edge AI chips, companies can greatly increase their ability to analyze—not just collect—data from connected devices and convert this analysis into action, while avoiding the cost, complexity, and security challenges of sending huge amounts of data into the cloud. Issues that AI chips can help address include:
Data security and privacy. Collecting, storing, and moving data to the cloud inevitably exposes an organization to cybersecurity and privacy threats, even when companies are vigilant about data protection. This risk is becoming ever more critical to address: Regulations about personally identifiable information are emerging across jurisdictions, and consumers are becoming more cognizant of the data enterprises collect, with 80 percent of them saying that they don’t feel that companies are doing all they can to protect consumer privacy.30 Some devices, such as smart speakers, are starting to be used in settings such as hospitals,31 where patient privacy is regulated even more stringently.
By allowing large amounts of data to be processed locally, edge AI chips can reduce the risk of personal or enterprise data being intercepted or misused. Security cameras with machine learning processing, for instance, can reduce privacy risks by analyzing the video to determine which segments of the video are relevant, and sending only those to the cloud. Machine learning chips can also recognize a broader range of voice commands, so that less audio needs to be analyzed in the cloud. More accurate speech recognition can deliver the additional bonus of helping smart speakers detect the “wake word” more accurately, preventing them from listening to unrelated conversation.
Low connectivity. A device must be connected for data to be processed in the cloud. In some cases, however, connecting the device is impractical. Take drones as an example. Maintaining connectivity with a drone can be difficult depending on where it operates, and both the connection itself and uploading data to the cloud can reduce battery life. In New South Wales, Australia, drones with embedded machine learning patrol beaches to keep swimmers safe. They can identify swimmers caught in riptides, or warn swimmers of sharks and crocodiles before an attack, all without an internet connection.32
(Too) big data. IoT devices can generate huge amounts of data. For example, an Airbus A-350 jet has over 6,000 sensors and generates 2.5 terabytes of data each day it flies.33 Globally, security cameras create about 2,500 petabytes of data per day.34 Sending all this data to the cloud for storage and analysis is costly and complex. Putting machine learning processors on the endpoints, whether sensors or cameras, can solve this problem. Cameras, for example, could be equipped with vision processing units (VPUs), low-power SoC processors specialized for analyzing or preprocessing digital images. With edge AI chips embedded, a device can analyze data in real time, transmit only what is relevant for further analysis in the cloud, and “forget” the rest, reducing the cost of storage and bandwidth.
Power constraints. Low-power machine learning chips can allow even devices with small batteries to perform AI computations without undue power drain. For instance, ARM chips are being embedded in respiratory inhalers to analyze data, such as inhalation lung capacity and the flow of medicine into the lungs. The AI analysis is performed on the inhaler, and the results are then sent to a smartphone app, helping health care professionals to develop personalized care for asthma patients.35 In addition to the low-power edge AI NPUs currently available, tech companies are working to develop “tiny machine learning”: Deep learning on devices as small as microcontroller units (which are similar to the SoCs mentioned earlier, but smaller, less sophisticated, and much lower power, drawing only milliwatts or even microwatts). Google, for instance, is developing a version of TensorFlow Lite that can enable microcontrollers to analyze data, condensing what needs to be sent off-chip into a few bytes.36
Low latency requirements. Whether over a wired or wireless network, performing AI computations at a remote data center means a round-trip latency of at least 1–2 milliseconds in the best case, and tens or even hundreds of milliseconds in the worst case. Performing AI on-device using an edge AI chip would reduce that to nanoseconds—critical for uses where the device must collect, process, and act upon data virtually instantaneously. Autonomous vehicles, for instance, must collect and process huge amounts of data from computer vision systems to identify objects, as well as from the sensors that control the vehicle’s functions. They must then convert this data into decisions immediately—when to turn, brake, or accelerate—in order to operate safely. To do this, autonomous vehicles must process much of the data they collect in the vehicle itself. (Today’s autonomous vehicles use a variety of chips for this purpose, including standard GPUs as well as edge AI chips.) Low latency is also important for robots, and it will become more so as robots emerge from factory settings to work alongside people.37
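The bandwidth saving described in the “(Too) big data” item above comes from filtering at the endpoint: the device scores its own data and uploads only what crosses a relevance threshold. A minimal sketch of the idea (the segment sizes, relevance scores, and threshold are hypothetical stand-ins for an on-device model’s output):

```python
# Each entry: (segment_id, size in MB, relevance score from the on-device model)
segments = [
    ("cam1-000", 48.0, 0.02),  # empty corridor
    ("cam1-001", 52.0, 0.91),  # person detected
    ("cam1-002", 47.5, 0.05),
    ("cam1-003", 50.5, 0.88),  # person detected
]

THRESHOLD = 0.5  # hypothetical cutoff: upload only segments the model flags as relevant

to_cloud = [s for s in segments if s[2] >= THRESHOLD]
sent_mb = sum(s[1] for s in to_cloud)
total_mb = sum(s[1] for s in segments)
print(f"Uploaded {sent_mb:.1f} of {total_mb:.1f} MB ({sent_mb / total_mb:.0%})")
```

The same pattern applies to audio (send only speech after the wake word) and sensor streams (send only anomalous readings).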
The difference between training and inference, and what it could mean for data center–based AI
The AI enabled by an edge AI chip is more properly known as deep machine learning, which has two components. The first component is training. Training involves repeatedly analyzing a large amount of historical data, detecting patterns in that data, and generating an algorithm for that kind of pattern detection. The second component is inference. In inference, the algorithm generated by training—often updated or modified over time through further training—is used to analyze new data and produce useful results.
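The split can be illustrated with a toy model: the training loop below stands in for the data-center side, while inference reduces to a learned multiply-add, the kind of arithmetic an edge NPU accelerates in hardware (the data and model here are invented purely for illustration):

```python
# "Data center" side: train a tiny linear model with gradient descent.
data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # toy samples of y = 2x + 1
w, b = 0.0, 0.0
for _ in range(2000):
    for x, y in data:
        err = (w * x + b) - y   # prediction error on this sample
        w -= 0.01 * err * x     # gradient step on the weight
        b -= 0.01 * err         # gradient step on the bias

# "Edge device" side: inference uses the trained parameters and needs
# only a multiply and an add per input -- no training data, no loop.
def infer(x):
    return w * x + b

print(f"infer(10) ~ {infer(10):.1f}")  # close to 21.0
```

In practice the trained parameters (here just `w` and `b`, in real systems millions of weights) are what gets shipped to the device, and they can be refreshed over the network as training continues in the data center.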
Until recently, machine learning software used the same standard chips—a mix of CPUs, GPUs, FPGAs, and ASICs—for both training and inference. These chips are all large, expensive, and power-hungry, and they produce a lot of heat; consequently, AI hardware built on these chips is always housed in a data center. In contrast, the edge AI chips discussed in this chapter perform mainly (or only) inferencing, using algorithms that were developed by training back in a data center. Although some edge AI chips do training as well, most training still occurs in data centers.
Interestingly, although data center chips have historically been used for both training and inference, we are now seeing the development of different flavors of data center chips, with some optimized for training and some for inference.38 The implications of this relatively new development are not yet clear. But it is possible that, due to the emergence of edge AI chips, data centers will see their current mix of training and inference processing shift toward more training and less inferencing over time. If this happens, these more specialized data center chips could prove especially useful, allowing a data center whose ratio of training to inferencing shifts to adjust its hardware mix accordingly.
The bottom line
Who will benefit from the edge AI chip market’s growth? Obviously, it’s good for the companies that make edge AI chips. From essentially zero a few years ago, they will earn more than US$2.5 billion in “new” revenue in 2020, with a 20 percent growth rate for the next few years, and likely with industry-comparable margins. But that number should be placed in context. With 2020 global semiconductor industry revenue projected at US$425 billion,39 edge AI chips make up too small a fraction of that to move the needle for the industry as a whole, or even for its larger individual companies.
In truth, the bigger beneficiaries are likely those who need AI on the device. Edge AI chips can not only enormously improve the capabilities of existing devices, but also allow for entirely new kinds of devices with new abilities and markets. Over the longer term, edge AI chips’ more transformative impact will most probably come from the latter.
Will companies that make AI chips for data centers be harmed as some of the processing (mainly inferencing at first) moves from the core to the edge? The answer is uncertain. All of the companies that make data center AI chips are also making edge versions of these chips, so the shift in processing from the core to the edge may have little or no net effect. Also, demand for AI processing is growing so quickly that its rising tide may lift all boats: The AI chip industry (edge and data center combined) is expected to grow from about US$6 billion in 2018 to more than US$90 billion in 2025, a 45 percent CAGR.40 A more likely potential negative is that the emergence of cheaper, smaller, lower-power edge AI chips may exert downward pressure on data center AI chip pricing, if not units. This has happened before: In the semiconductor industry’s history, the spread of edge processing chips frequently caused prices for mainframe/core processing hardware to fall faster than would have been expected based only on improvements according to Moore’s Law.
Some might also think that moving AI processing from the core to the edge will hurt cloud AI companies. This is unlikely: Recent forecasts for the cloud AI or AI-as-a-Service market predict that its revenues will grow from US$2 billion in 2018 to nearly US$12 billion by 2024, a 34 percent CAGR.41 Perhaps that growth would be even larger if edge AI chips did not exist, but it still means that cloud AI is growing twice as quickly as the overall cloud market, with a predicted CAGR of 18 percent to 2023.42
Equally, some might fear that if edge devices can perform AI inference locally, then the need to connect them will go away. Again, this likely will not happen. Those edge devices will still need to communicate with the network core—to send data for AI training, to receive updated AI algorithms for inference, and more. For these reasons, we expect that all or almost all edge AI devices will be connected.
The nature of that connection, however, may be different than what was expected only two to three years ago. At that time, AI inference was restricted to large data centers, meaning that smart IoT devices had to be connected to access those AI inference capabilities—and not just to any old network, but one with ultra-high speeds, guaranteed quality of service, high connection densities, and the lowest possible latency. These attributes were (and still are) only to be found on 5G wireless networks. The natural assumption, therefore, was that all IoT devices that used AI would also need to use 5G, and only 5G.
That assumption no longer holds. If a device can handle a significant amount of AI processing locally, it doesn’t eliminate the need for a connection of some sort, but the connection may not always need to be through 5G. 5G will still be necessary some of the time, of course. And the 5G market is poised to grow enormously, at a 55 percent CAGR—more than US$6 billion annually—through 2025.43 But thanks to edge AI chips, the market opportunity in 5G IoT may be slightly smaller than was expected a few years ago.
The spread of edge AI chips will likely drive significant changes for consumers and enterprises alike. For consumers, edge AI chips can make possible a plethora of features—from unlocking their phone, to having a conversation with its voice assistant, to taking mind-blowing photos under extremely difficult conditions—that previously only worked with an internet connection, if at all. But in the long term, edge AI chips’ greater impact may come from their use in enterprise, where they can enable companies to take their IoT applications to a whole new level. Smart machines powered by AI chips could help expand existing markets, threaten incumbents, and shift how profits are divided in industries such as manufacturing, construction, logistics, agriculture, and energy.44 The ability to collect, interpret, and immediately act on vast amounts of data is critical for many of the data-heavy applications that futurists see as becoming widespread: video monitoring, virtual reality, autonomous drones and vehicles, and more. That future, in large part, depends on what edge AI chips make possible: Bringing the intelligence to the device.