Unlocking the Next Generation of AI Hardware

By Marc Bouchet Published on Dec. 3, 2020

Why AI?

As with most things, it will serve us well to understand what is driving this conversation. To see why computers need new hardware, it helps to start with a quick primer on the new kind of software we are asking them to run these days. Generally, this software can be grouped under the heading of artificial intelligence (AI), which encompasses machine learning, deep learning, neural networks, general intelligence, and a bunch of other techniques.

In a basic sense, we can think of artificial intelligence as a framework for designing computer programs that behave more like humans. Many of the ideas behind how these programs are built come from efforts to replicate how neurons interact in the brain, but today's state of the art is largely abstracted away from that biology. One of the better ways to capture what makes artificial intelligence special is the diagram below, from a great piece on Towards Data Science.

Typically, when humans learn something, we know what we have, we know what we want to do, and we figure out a repeatable way to achieve our goal that holds up when circumstances change. What we learn is flexible and can be combined with other lessons to build knowledge. In the same way, we can tell an AI framework what we have (data) and what we want to do (desired outputs), and the software figures out a repeatable program it can use to evaluate new data in the future (learning).

Fundamentally, AI programs differ from traditional programs in one crucial way. Traditionally, a computer program’s response to input is fixed: it changes only when a developer edits the explicit rules that map inputs to outputs. This produces programs that are very reliable but also complex and inflexible, because every case must be hard-coded. The beauty of AI is that instead of writing enormous programs to cover every possible input, we can give an AI framework examples of inputs paired with the outputs we care about, and it builds a flexible program that recognizes those patterns in new inputs it has never seen. This turns out to be great for a lot of things, like categorizing objects in images, predicting machine failures, and even creating new images based on old ones.
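A toy sketch can make this contrast concrete. The Python below (the article itself contains no code, so the language is our choice) compares a rule whose threshold a developer typed in by hand with one whose threshold is picked from labeled examples. The vibration-sensor scenario, the data, and every name here are hypothetical, and real AI frameworks fit millions of parameters rather than a single number, but the shape of the idea is the same: data plus desired outputs in, a reusable program out.

```python
# Toy contrast between a hard-coded rule and a rule "learned" from labeled examples.
# The sensor readings and labels are made-up illustration data.

def hard_coded_is_faulty(vibration: float) -> bool:
    # A developer picked this threshold by hand; changing it means editing the code.
    return vibration > 5.0

def learn_threshold(examples: list[tuple[float, bool]]) -> float:
    """Pick the threshold that misclassifies the fewest labeled examples."""
    candidates = sorted(v for v, _ in examples)
    best_t, best_errors = candidates[0], len(examples) + 1
    for t in candidates:
        # Predict "faulty" when the reading is strictly above t, then count mistakes.
        errors = sum((v > t) != label for v, label in examples)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

# Hypothetical labeled data: (vibration reading, was the machine actually faulty?)
data = [(1.2, False), (2.8, False), (3.1, False), (3.9, True), (4.4, True), (6.0, True)]
threshold = learn_threshold(data)

def learned_is_faulty(vibration: float) -> bool:
    # The "program" is now whatever threshold the data produced.
    return vibration > threshold

print(threshold)                  # 3.1
print(learned_is_faulty(3.5))     # True
print(hard_coded_is_faulty(3.5))  # False
```

Feeding in different labeled data changes the learned rule without anyone touching the code, which is the flexibility described above.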

Now that we’re primed for it, we can see the specter of AI looming large across the tech world. From our Google searches to our Instagram feeds, very few aspects of software in the last few years have escaped the power of recognition, prediction, and optimization offered by AI. Keep in mind, though, that this is all old news.

Why AI Hardware?

Some more old news: computers have been our steadfast companions since the 1940s. Over the more than 70 years since ENIAC, we’ve watched computers mature from simple adding machines into systems that give us the tools we depend on to live and work more effectively than ever before. But even though computers are constantly by our side, it’s probably safe to say that most of us don’t give much thought to the chips and wires deep inside our MacBooks.

One may wonder: the science behind machine learning and artificial intelligence has been around for decades, and so have computers, so why the hype around AI? There are a few reasons, but we’ll also see that the hype train doesn’t run unencumbered.

At a high level, AI has finally hit its stride thanks to the ungodly amounts of data, Moore’s law, and cheap storage of recent decades. It helps that the troves of information we are collecting (images, machine data, sound) contain insights we crave, and that cheap computers plus online communication have fostered a vibrant research community. This has led to AI-based solutions that can create faux-Van Goghs, predict vehicle failures, and understand your every command. We also need to tip our hats to easily accessible frameworks (TensorFlow, PyTorch, Keras, Caffe, and others) and programming languages (Python!) that have put the building of AI models within reach of everyday developers.

In our fervor to build AI models that answer ever more complex questions, though, we have started to hit bumps in the road. To develop some of the most cutting-edge models today, the best computers still need to work for hours (if not days) on a process called “training,” in which the AI model builds a program from the data we give it to crunch. In addition, running a trained AI program to do its job (recognizing pictures of dogs, for example), a process called “inferencing,” can be tricky to handle efficiently and in real time. Also, when we compare the pace of Moore’s law (transistor counts, and roughly computing power, doubling about every two years) with the pace of AI’s appetite for compute (the compute used to train the largest models doubling roughly every 3.4 months), we see a gap that will inevitably widen. Top that with growing skepticism that Moore’s law can keep going at all, and it looks like we’re back in the ’40s waiting for vacuum tubes to warm up.
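It is worth running the numbers on that gap. The short Python sketch below compounds both doubling periods over a six-year window. The window is an arbitrary choice, and treating either trend as perfectly steady is a simplifying assumption, since both are rough historical fits rather than physical laws.

```python
# Back-of-the-envelope comparison of two exponential trends, using the doubling
# periods quoted in the text: 24 months (Moore's law) vs 3.4 months (AI compute).

def growth_factor(months: float, doubling_period_months: float) -> float:
    """How many times larger a quantity gets after `months` of steady doubling."""
    return 2 ** (months / doubling_period_months)

horizon = 72  # six years, an arbitrary illustrative window

moore = growth_factor(horizon, 24)       # 2**3, i.e. 8x
ai_demand = growth_factor(horizon, 3.4)  # roughly 2.4 million x

print(f"Moore's law over {horizon} months: {moore:,.0f}x")
print(f"AI compute demand over {horizon} months: {ai_demand:,.0f}x")
print(f"Gap between the two: {ai_demand / moore:,.0f}x")
```

Over those six years, transistor budgets grow about 8x while the compute demanded by the largest training runs grows by a factor in the millions.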

Moreover, not all AI models are created equal when it comes to the computing resources they need to run quickly. This becomes confounding when we try to deploy these models in constrained conditions (low-power applications, for example). You’ll be hard pressed to find an off-the-shelf chip that runs inference for predictive maintenance on a remote sensor as well as it handles voice commands on your laptop.

If AI is to run seamlessly on everything from servers to cellphones, the old way of designing computer chips doesn’t look like it can keep up on many fronts. This is why we need specialized AI hardware, and I want to explore some of the new ways that computer chips are evolving to make our AI-fueled dreams a reality.

How do we do it?

At its most basic level, computing on modern hardware consists of manipulating ones and zeroes (binary code) with transistors (fancy switches) to execute computations (addition, subtraction, and their colleagues). Through successive layers of abstraction, these computations are translated into outputs we can interact with, such as user interfaces, communications, calculations, and documents. Traditionally, computer processing hardware has been developed to execute very diverse tasks rapidly, and a whole ecosystem has grown up to enable the design, production, and programming of the chip that does it all: the Central Processing Unit (CPU). As our interactions with computers came to be dominated by visual interfaces, a similar ecosystem grew around the chip that can display it all: the Graphics Processing Unit (GPU).
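To make “manipulating ones and zeroes” a little more concrete, the Python sketch below models how addition can be composed from simple logic gates (AND, OR, XOR), each of which is built from a handful of transistors in a real chip. This is a software model, not a circuit; the 8-bit width and the function names are our own illustrative choices, and real processors typically use faster adder designs than this simple ripple-carry scheme.

```python
# Software model of binary addition built only from logic gates.

def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add three single bits; return (sum_bit, carry_out)."""
    partial = a ^ b                              # XOR gate
    sum_bit = partial ^ carry_in                 # XOR gate
    carry_out = (a & b) | (partial & carry_in)   # two AND gates feeding an OR gate
    return sum_bit, carry_out

def ripple_carry_add(x: int, y: int, width: int = 8) -> int:
    """Add two unsigned integers one bit at a time, passing the carry along."""
    result, carry = 0, 0
    for i in range(width):
        bit_x = (x >> i) & 1
        bit_y = (y >> i) & 1
        s, carry = full_adder(bit_x, bit_y, carry)
        result |= s << i
    return result  # a carry out of the top bit is dropped, so results wrap around

print(ripple_carry_add(5, 3))      # 8
print(ripple_carry_add(200, 100))  # 44, because 300 does not fit in 8 bits
```

Every layer of software above this, from spreadsheets to neural networks, ultimately bottoms out in operations like these.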

To understand where the rift begins between the demands of our AI programs and the silicon running the show, it’s worth diving a bit deeper. Although computing power has become fast and cheap, it has traditionally been optimized for two use cases: general computing and graphics processing. General computing happens on the CPU and handles everyday productivity, creative, and connectivity software, while graphics processing is dedicated to preparing data for visual output. Graphics data is handled by the GPU and is traditionally represented as matrices and tensors (fancy higher-dimensional matrices), which are manipulated through matrix arithmetic (fancy addition, subtraction, multiplication, and so on). Think of the difference between the CPU and GPU as a Swiss Army knife compared to a scalpel: one tool is built for many different tasks, while the other is optimized for a very specific process involving lots of the same kind of computation on the same kind of data. As they are wont to do, the Mythbusters did a great job of illustrating this difference (CPU above, GPU below):

Clearly, when it comes to pure throughput, the GPU has the edge as a specialized tool. This is useful for the heavy lifting of training AI programs, where the program is fed streams of data to analyze. Luckily, the computations involved in training rely on much of the same math as image processing, so GPUs were at the core of the initial democratization of AI. But despite a lot of early success with GPUs, continued performance gains are hard to come by when the chips are still fundamentally designed to process images rather than neural networks.
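The overlap between graphics math and AI math is easiest to see side by side. This Python sketch uses one matrix-vector routine for two jobs: rotating a point, as a graphics pipeline might, and computing one layer of a neural network. The weights, inputs, and the choice of a ReLU activation are illustrative assumptions; the point is only that both jobs reduce to the same multiply-and-add pattern, which GPUs are built to run over huge arrays in parallel.

```python
import math

def matvec(matrix: list[list[float]], vector: list[float]) -> list[float]:
    """Multiply a matrix by a vector: each output is one row dotted with the vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# Graphics: rotate the 2D point (1, 0) by 90 degrees counterclockwise.
theta = math.pi / 2
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]
rotated = matvec(rotation, [1.0, 0.0])  # approximately [0.0, 1.0]

# Neural network: one dense layer (weights times inputs, plus bias) followed by ReLU.
# The weights, bias, and inputs are made-up numbers for illustration.
weights = [[0.5, -1.0,  2.0],
           [1.5,  0.0, -0.5]]
bias = [0.1, -0.2]
inputs = [1.0, 2.0, 3.0]
pre_activation = [z + b for z, b in zip(matvec(weights, inputs), bias)]
outputs = [max(0.0, z) for z in pre_activation]  # approximately [4.6, 0.0]

print(rotated, outputs)
```

A real network stacks many such layers with thousands of rows each, which is why hardware that excels at this one operation matters so much.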

The picture is even less clear when it comes to inferencing with AI models, where researchers have had to concoct ways of combining the best parts of CPUs (flexibility, speed) with those of GPUs (matrix math, throughput). Because CPU and GPU development has not kept pace with advances in AI, we need specialized computer chips designed to handle AI jobs from the transistor level up.

What Next?

In the past two years or so, all of this has come to a head. Ironically, Silicon Valley has had to return to its roots to figure out how to make computer chips for a new generation of AI demands. Deep-pocketed newcomers like Google and Amazon have designed and built their own chips, while incumbents like Nvidia and Intel have steered their designs in a new direction to tackle these needs. Even more exciting, dozens of bold startups have thrown their hats in the ring to compete for a spot in the AI-optimized silicon stack of the future. Ultimately, the flexibility of machine learning means we can expect demand for high-performance AI computing across many niches, from servers to cell phones to tiny edge computers that need to run on an AA battery. To get a better grasp of where this space is headed, it seems right to build a bit of a taxonomy for AI hardware beyond CPUs and GPUs:

Useful Terms:

  • Von Neumann Architecture: this refers to a particular way of building computer chips that determines how the chip’s different elements (memory, control, arithmetic) are connected. Because instructions and data travel to and from memory over a shared pathway, chips that use the Von Neumann architecture are typically limited to one thing at a time on that pathway: either fetching an instruction or reading or writing data. Although most personal computers use this architecture, faster components and clever designs have minimized this limitation up to a point. With AI workloads, though, the so-called “Von Neumann bottleneck” has become an issue again because of the sheer volume of instructions and data that must move between memory and the processor. This has driven some of the innovations noted in this taxonomy.
  • ASIC: Not the shoes. We’re talking about Application-Specific Integrated Circuits. This is a fancy way to say that the elements of the chip (again: memory, control, arithmetic, etc.) are laid out in a specialized way for a purpose that general processors (CPUs) are not suited to. ASICs are typically developed when there is a specialized computing need that off-the-shelf chips don’t meet and a tailored chip can do the job more efficiently. Many novel chip designs for AI workloads can be considered ASICs.
  • FPGA: Field Programmable Gate Arrays are essentially reprogrammable computer chips. Using special hardware description languages, designers can dictate how the important bits on the chip are connected and update those connections in the field, long after the chip is manufactured. This can be enormously helpful when workloads change (e.g., a new AI model needs to be used), at the expense of energy efficiency compared with an ASIC.
  • In-memory Computing: This term covers the emerging area of designing memory units with rudimentary computational power. As part of a strategy for overcoming the Von Neumann bottleneck, chip designers are looking to move processing as close to the computer’s memory as possible. This comes in handy when you need to process lots of stored data in relatively simple ways, as one does with AI workloads.
  • Edge Computing: As opposed to processing code and data in the cloud, edge computing refers to processing data as close to its source as possible. As data streams proliferate along with our ability to make sense of data with AI, there is a natural push to extract meaningful information from data at its source (usually sensors). These applications are often power-constrained and pose an interesting challenge for AI chips, pushing designs to the limits of efficiency.
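A rough way to see the Von Neumann bottleneck in numbers is to count how many arithmetic operations a chip gets to perform for every byte it pulls from memory, a ratio often called arithmetic intensity. The Python sketch below compares multiplying a matrix by a single vector with multiplying two matrices. It assumes 32-bit values, assumes each value crosses the memory pathway exactly once (ideal caching), and ignores instruction fetches entirely, so treat the outputs as back-of-the-envelope figures rather than measurements.

```python
# Rough arithmetic-intensity estimate: floating-point operations per byte moved
# between memory and the processor. Low values mean the chip mostly waits on memory.

BYTES_PER_VALUE = 4  # assumption: 32-bit floats

def matvec_intensity(n: int) -> float:
    """n x n matrix times an n-vector: every weight is fetched once and used once."""
    flops = 2 * n * n  # one multiply and one add per weight
    bytes_moved = BYTES_PER_VALUE * (n * n + 2 * n)  # matrix + input vector + output vector
    return flops / bytes_moved

def matmul_intensity(n: int) -> float:
    """n x n matrix times n x n matrix: each value is fetched once but reused n times."""
    flops = 2 * n ** 3
    bytes_moved = BYTES_PER_VALUE * 3 * n * n  # two input matrices + one output matrix
    return flops / bytes_moved

for n in (256, 1024, 4096):
    print(f"n={n}: matvec {matvec_intensity(n):.3f} flops/byte, "
          f"matmul {matmul_intensity(n):.1f} flops/byte")
```

The matrix-vector case, which resembles running a neural network layer on one input at a time, stays below half an operation per byte no matter how large it gets, so the processor spends most of its time waiting for data. That is part of the appeal of in-memory computing: if data is used only once, it makes sense to bring the arithmetic to the data.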

AI Chip Types:

  • Accelerators: Chips designed to improve processing performance for a certain task are considered accelerators. In a broad sense, we can consider GPUs to be graphics “accelerators” for CPUs and general computing, although this is debatable now that new programming frameworks let developers build more general software that runs on GPUs. Many startups have worked to develop accelerators for certain AI workloads like inferencing, and even for specific machine learning models. On the other hand, incumbents such as Intel are working to bake accelerators into new chips, which lets those chips run AI programs without a wholesale change to the chip architecture. Accelerators are often developed as ASICs and might even leverage non-Von Neumann approaches like in-memory computing. A few examples of startups working in this realm are below:

Neuro AI: Neuro AI is working on scalable, FPGA-based AI accelerators that run on edge devices as well as servers.

Graphcore: By focusing on the cloud, one realm where most AI models are handled today, Graphcore has built accelerators tailored to commercial workloads.

Gyrfalcon: Gyrfalcon is designing chips that take over the heavy lifting for AI workloads at the edge, specialized for use cases like natural language processing and classification.

  • New Chips: Although the line between accelerators and wholesale new chip designs can become blurred, it is worth focusing on companies that are designing hardware that processes AI in ways that fall outside of traditional memory/arithmetic/control frameworks. The folks working on these solutions often start new companies to do so, and much innovation has been seen in the startup world. Some have noted that a second wave of hardware funding has emerged to support these startups and the outsized performance returns they hope to achieve.

Rain Neuromorphics: Rain Neuromorphics is designing analog chips that more closely mimic the structure of the human brain, neurons, synapses, and all.

Cerebras: Cerebras is taking a different perspective on the scale of chip needed for next-generation AI workloads, betting that huge wafer-scale chips will be up to the challenge.

Memcomputing: As the name suggests, Memcomputing is working to move the number crunching closer to where information is stored, on chips dedicated to AI workloads.

Recogni: Recogni is building chips to better understand visual data in real time in an energy footprint that can easily be integrated at the edge and into vehicles for autonomous driving.

  • Frontier Designs: At the bleeding edge of this discussion, we can peer into a future where even the core element of our classical computer, the transistor, is disrupted. To escape the diminishing returns of Moore’s law for AI computing, many large companies and startups are also exploring quantum computing (using quantum-mechanical effects to store and manipulate information) and light-based computing (using photons instead of electrons to transmit and manipulate information). By and large, startups working on the core hardware for this space are ahead of the VC funding curve, but there are interesting opportunities for startups building tools that help developers write code for these future systems.

QC Ware: QC Ware is building hardware-agnostic enterprise software for quantum computers. The company’s quantum computing software targets problems in combinatorial optimization and machine learning, aiming for efficiency beyond what traditional high-performance computing (HPC) solutions can achieve.

IonQ: IonQ is building quantum computers that store and compute information using trapped ions (electrically charged atoms) controlled with lasers.

Lightmatter: Lightmatter builds hardware that combines classical computing with certain elements accelerated using integrated photonics for AI workloads.

Luminous Computing: Luminous uses photonics to transmit and compute information for AI workloads faster and with less energy than classical computers.

Conclusion

Across the board, we are seeing some core truths about making computer chips play out, namely that it is an expensive, expertise-driven, and unforgiving business. The Graphcores and SambaNovas of the world got the start they needed from VCs with prior experience in chipmaking, and their multi-hundred-million-dollar follow-on rounds indicate the kind of resources needed to make AI-specific hardware a reality. This gives pause when considering the legions of startups developing AI chips that, despite tackling more niche applications, may not have the same level of resources.

Ultimately, if we lean on the framework of our interaction with classical computers today, there is surely huge value to be unlocked in the development of software tools that will enable developers to build valuable products on top of this new generation of AI hardware. It is certainly exciting to consider the possibilities of a world where every device is enabled to process AI workloads — devices that are not only connected but able to make contextual decisions based on the data they collect.


At Plug and Play’s New Materials & Packaging Accelerator, we match large corporations with top-tier startups that are changing the world as we know it.