Artificial intelligence. So much has been said about it, and yet the conversation has barely begun. Almost everything you hear about the progress of artificial intelligence rests on a breakthrough that is thirty years old. Keeping up the current pace of progress will mean working around serious limitations. What follows is a first-person account by James Somers.
I’m standing where the center of the world will be – or just in a big room on the seventh floor of a shiny tower in downtown Toronto, depending on how you look at it. With me is Jordan Jacobs, co-founder of this place: the Vector Institute, which opens its doors this fall and promises to become a global epicenter of artificial intelligence.
We’re in Toronto because Geoffrey Hinton is in Toronto, and Geoffrey Hinton is the father of “deep learning,” the technique behind the current frenzy over AI. “In 30 years we will look back and say that Geoff was the Einstein of AI, of deep learning, of everything we call artificial intelligence,” says Jacobs. Of all AI researchers, Hinton is cited more often than the next three combined. His students and postdocs have gone on to work in the AI labs at Apple, Facebook and OpenAI; Hinton himself is a lead scientist on the Google Brain AI team. Virtually every achievement in AI over the past ten years – in translation, speech recognition, image recognition and game playing – traces back in some way to Hinton’s work.
The Vector Institute, this monument to the ascent of Hinton’s ideas, is a research center where companies from across the US and Canada – Google, Uber and NVIDIA among them – sponsor efforts to commercialize AI technologies. Money is pouring in faster than Jacobs can ask for it; two of his co-founders surveyed companies in the Toronto area and found the demand for AI experts to be ten times what Canada produces each year. The Vector Institute is, in a sense, untilled soil for an attempt to mobilize the world around deep learning: to invest in the technique, to teach it, to refine and apply it. Data centers are being built, skyscrapers are filling with startups, and a whole generation of students is pouring into the field.
Standing on the floor of the Vector Institute, you get the feeling that you are at the beginning of something. But deep learning, at its core, is very old. Hinton’s breakthrough paper, written with David Rumelhart and Ronald Williams, was published in 1986. The paper described in detail the method of backpropagation of error – “backprop” for short. Backprop, in the words of Jon Cohen, is “everything deep learning is based on – literally everything.”
At root, today’s AI is deep learning, and deep learning is backprop – which is astonishing, considering that backprop is more than 30 years old. It is worth understanding how that happened: how could a technique lie in wait for so long and then cause such an explosion? Because once you know the history of backprop, you will understand what is happening with AI right now – and, in particular, that we may not be standing at the beginning of a revolution. Perhaps we are at its end.
The walk from the Vector Institute to Hinton’s office at Google, where he spends most of his time (he is now a professor emeritus at the University of Toronto), is a kind of living advertisement for the city, at least in summer. It becomes clear why Hinton, who came from the UK, moved here in the 1980s after working at Carnegie Mellon University in Pittsburgh.
Maybe we are not at the beginning of a revolution
Toronto is the fourth-largest city in North America (after Mexico City, New York and Los Angeles), and certainly the most diverse: more than half its population was born outside Canada. You can see it as you walk around the city – the crowd is multinational. There is free health care and there are good schools, people are friendly, politics is relatively left-leaning and stable; all this attracts people like Hinton, who says he left the US because of “Irangate” (the Iran-Contra affair, a major US political scandal of the second half of the 1980s, when it emerged that members of the US administration had secretly organized arms sales to Iran, violating the arms embargo against that country). This is where our conversation begins, before lunch.
“Many believed that the US might well invade Nicaragua,” he says. “They somehow believed that Nicaragua belonged to the United States.” He says he recently had a major breakthrough on a project: “A very good junior engineer, a woman named Sara Sabour, started working with me. Sabour is Iranian, and she was denied a visa to work in the US. Google’s Toronto office pulled her out.”
Hinton is 69 years old. He has a sharp, thin English face with a thin mouth, large ears and a proud nose. He was born in Wimbledon and in conversation sounds like the narrator of a children’s science book: curious, engaging, eager to explain everything. He is funny and plays a little to the audience. Back problems make sitting painful for him, so he cannot fly, and at the dentist’s office he lies on a device resembling a surfboard.
In the 1980s Hinton was, as he is now, an expert on neural networks – a greatly simplified model of the network of neurons and synapses in our brain. At that time, however, it had been firmly decided that neural networks were a dead end in AI research. Although the very first neural network, the Perceptron, was developed in the 1960s and hailed as a first step toward human-level machine intelligence, in 1969 Marvin Minsky and Seymour Papert proved mathematically that such networks could perform only the simplest functions. Those networks had just two layers of neurons: an input layer and an output layer. Networks with more layers between the input and output neurons could in theory solve a great variety of problems, but nobody knew how to train them, so in practice they were useless. After Perceptrons, almost everyone gave up on neural networks, with a few exceptions – Hinton among them.
Hinton’s breakthrough in 1986 was to show that backpropagation could train a deep neural network – one with more than two or three layers. But it took another 26 years, and a huge increase in computing power, before that mattered. In a 2012 paper, Hinton and two of his Toronto students showed that deep neural networks trained with backprop beat the best image recognition systems. “Deep learning” took off. Overnight, the world decided that AI was about to take over. For Hinton, it was a long-awaited victory.
The reality distortion field
A neural network is usually drawn as a sandwich, with layers stacked on top of one another. The layers contain artificial neurons – essentially small computational units that get excited, the way a real neuron fires, and pass that excitement on to the other neurons they are connected to. A neuron’s excitement is represented by a number, say 0.13 or 32.39, which gives its degree of excitation. And there is another important number on each of the connections between two neurons, which determines how much excitation passes from one to the other. That number models the strength of a synapse between neurons in the brain. The higher the number, the stronger the connection, and the more excitation flows from one to the next.
One of the most successful applications of deep neural networks has been image recognition. Today there are programs that can recognize whether a picture contains a hot dog. Ten years ago they were impossible. To make them work, you first need a picture. For simplicity, say it is a black-and-white image of 100 by 100 pixels. You feed it to the neural network by setting the excitation of each simulated neuron in the input layer to the brightness of the corresponding pixel. This is the bottom layer of the sandwich: 10,000 neurons (100 × 100) representing the brightness of every pixel in the image.
Then you connect this big layer of neurons to another big layer above it, of, say, a few thousand neurons, and that one in turn to another layer of a few thousand, somewhat smaller, and so on. Finally, the top layer of the sandwich – the output layer – consists of two neurons: one representing “hot dog” and the other “not hot dog.” The idea is to train the neural network to excite only the first of these neurons when the picture contains a hot dog, and only the second when it does not. Backprop, the error-backpropagation technique on which Hinton built his career, is how you do it.
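The sandwich just described can be sketched in a few lines of code. This is a scaled-down toy, not anything from the article: the layer sizes, the ReLU nonlinearity, and the random “image” are all illustrative assumptions.

```python
import numpy as np

# A scaled-down sketch of the layered sandwich described above. The
# layer sizes, the ReLU nonlinearity, and the random weights/image are
# illustrative choices, not anything from the article.
rng = np.random.default_rng(0)

def connections(n_in, n_out):
    # One number per connection: how much excitation flows across it.
    return rng.normal(scale=0.1, size=(n_in, n_out))

# Input layer of 100 "pixels" (10 x 10 instead of 100 x 100), two hidden
# layers, and two output neurons: "hot dog" and "not hot dog".
weights = [connections(100, 50), connections(50, 20), connections(20, 2)]

def forward(pixels):
    excitation = pixels  # input-layer excitation = pixel brightness
    for w in weights:
        # Each neuron sums the weighted excitation arriving from the
        # layer below and fires onward only if the sum is positive.
        excitation = np.maximum(0.0, excitation @ w)
    return excitation  # two numbers: "hot dog" vs. "not hot dog"

image = rng.random(100)  # a fake 10 x 10 grayscale image, flattened
scores = forward(image)
print(scores)
```

With random, untrained weights the two output numbers mean nothing yet; the whole point of backprop, described next, is to tune the connection weights until they do.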
Backprop is remarkably simple, though it works best with huge amounts of data. That is why big data matters so much to AI – why Facebook and Google hunger for it, and why the Vector Institute decided to set up a data-sharing partnership with four of Canada’s largest hospitals.
Here the data takes the form of millions of images, some with hot dogs and some without; the trick is that the images are labeled as to which ones have hot dogs. When you first create a neural network, the connections between neurons have random weights – random numbers that say how much excitation each connection transmits. It is as if the brain’s synapses have not yet been tuned. The goal of backprop is to change those weights so that the network works: so that when you feed a hot dog image into the bottom layer, the “hot dog” neuron in the top layer ends up excited.
Suppose you take your first training picture, an image of a piano. You convert the pixel intensities of the 100 × 100 image into 10,000 numbers, one for each neuron in the network’s bottom layer. As the excitation spreads up through the network according to the connection strengths between neurons in adjacent layers, it eventually reaches the last layer – the two neurons that decide whether the picture shows a hot dog. Since this is a picture of a piano, the “hot dog” neuron should show a zero and the “not hot dog” neuron should show a high number. Suppose it doesn’t work out that way; suppose the network gets the image wrong. Backprop is a procedure for adjusting the strength of every connection in the network so as to fix the error for that training example.
How does it work? You start with the last two neurons and figure out how wrong they are: what is the difference between their excitation numbers and what those numbers should have been? Then you look at each connection leading into those neurons – working down through the layers – and determine its contribution to the error. You keep doing this until you reach the first set of connections at the very bottom of the network. At that point you know how much each individual connection contributed to the overall error, and in a final step you change every weight in the direction that reduces the error. The technique is called “backpropagation of error” because you are, in a sense, running the errors backward through the network, starting from the output end.
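Those steps can be written out directly. Below is a from-scratch toy run of the procedure; the tiny 2-4-1 network, the sigmoid units, the learning rate and the XOR data are all illustrative stand-ins for the hot dog problem, chosen only so the whole loop fits in a few lines.

```python
import numpy as np

# A from-scratch toy run of the procedure above: measure how wrong the
# output is, assign blame backward through the layers, and nudge every
# weight in the error-reducing direction. The 2-4-1 network and XOR
# data are stand-ins for the hot dog problem, chosen for illustration.
rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

w1 = rng.normal(size=(2, 4))  # random initial weights: untuned synapses
w2 = rng.normal(size=(4, 1))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def total_error():
    return float(((sigmoid(sigmoid(X @ w1) @ w2) - y) ** 2).sum())

error_before = total_error()
for _ in range(5000):
    # Forward pass: excitation flows from the bottom layer to the top.
    h = sigmoid(X @ w1)
    out = sigmoid(h @ w2)
    # Backward pass: start with how wrong the output neurons are...
    delta_out = (out - y) * out * (1 - out)
    # ...then descend a layer, crediting each connection with its share.
    delta_h = (delta_out @ w2.T) * h * (1 - h)
    # Finally, shift every weight a little to shrink the error.
    w2 -= h.T @ delta_out
    w1 -= X.T @ delta_h
error_after = total_error()

print(error_before, "->", error_after)
```

Run repeatedly over labeled examples, this same loop is what tunes the millions of weights in a real image recognizer.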
Something incredible happens when you do this with millions or billions of images: the network gets very good at telling whether a picture contains a hot dog. What is even more remarkable is that the individual layers of these image recognition networks begin to “see” images much the way our own visual system does. That is, the first layer detects contours – its neurons fire when contours are present and stay quiet when they are not; the next layer detects sets of contours, such as corners; the layer after that begins to distinguish shapes; and a still higher layer finds things like “open bun” or “closed bun,” because the appropriate neurons light up. The network organizes itself into hierarchical layers without ever having been programmed to do so.
Real intelligence isn’t fazed when the problem changes a little.
This is what amazed everyone. It is not just that neural networks are good at classifying pictures of hot dogs: they build representations of ideas. With text this becomes even more obvious. You can feed the text of Wikipedia, many billions of words, to a simple neural network, training it to assign each word a set of numbers corresponding to the excitations of the neurons in a layer. If you think of those numbers as coordinates in a complex space, you get a point – known in this context as a vector – for each word in that space. Then you train the network so that words appearing near each other on Wikipedia pages end up with similar coordinates, and voilà, something strange happens: words with similar meanings sit next to each other in the space. “Mad” and “upset” end up neighbors; so do “three” and “seven.” What’s more, vector arithmetic lets you subtract the vector for “France” from “Paris,” add it to “Italy,” and find “Rome” nearby. Nobody told the network that Rome is to Italy what Paris is to France.
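The arithmetic is easy to see with toy vectors. The coordinates below are invented by hand purely for illustration (real embeddings have hundreds of learned dimensions), but the Paris − France + Italy ≈ Rome trick works the same way.

```python
import numpy as np

# Toy, hand-invented "word vectors" (real ones have hundreds of learned
# dimensions). The coordinates are made up so the arithmetic in the
# text can be seen working: roughly, axis 0 tells the country pairs
# apart and axis 1 separates cities (1) from countries (0).
vectors = {
    "paris":   np.array([1.0, 1.0]),
    "france":  np.array([1.0, 0.0]),
    "rome":    np.array([2.0, 1.0]),
    "italy":   np.array([2.0, 0.0]),
    "berlin":  np.array([3.0, 1.0]),
    "germany": np.array([3.0, 0.0]),
}

def nearest(point, exclude=()):
    # The vocabulary word closest to `point`, skipping the query words.
    words = [w for w in vectors if w not in exclude]
    return min(words, key=lambda w: np.linalg.norm(vectors[w] - point))

query = vectors["paris"] - vectors["france"] + vectors["italy"]
print(nearest(query, exclude={"paris", "france", "italy"}))  # rome
```

Subtracting “france” from “paris” isolates a “capital-city” offset; adding it to “italy” lands on “rome” because the toy space, like a learned one, encodes that relationship as a consistent direction.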
“It’s amazing,” says Hinton. “It’s shocking.” Neural networks can be seen as an attempt to take things – images, words, recordings of conversations, medical data – and embed them in what mathematicians call a multidimensional vector space, in which the closeness or distance of things reflects the most important features of the real world. Hinton believes this is what the brain does. “If you want to know what a thought is,” he says, “I can express it for you in a string of words. I can say, ‘John thought: oops.’ But if you ask: what is the thought? What does it mean for John to have that thought? It is not as if inside his head there is an opening quote, ‘oops,’ and a closing quote – it is nothing like that at all. Inside his head is some pattern of neural activity.” Big patterns of neural activity, if you are a mathematician, can be captured in a vector space, with each neuron’s activity corresponding to a number and each number to a coordinate of a very large vector. In Hinton’s view, thought is a dance of vectors.
Now you see why the Vector Institute got its name.
Hinton creates a kind of reality distortion field, a sense of certainty and enthusiasm, instilling the faith that nothing is impossible for vectors. After all, they have already given us self-driving cars, computers that detect cancer, machines that instantly translate spoken language.
It is only when you leave the room that you remember: these “deep learning” systems are still pretty dumb, despite their demonstrations of apparent thought. A computer that sees a pile of donuts on a table and automatically captions it “a pile of donuts on a table” seems to understand the world; but when the same program sees a girl brushing her teeth and says “a boy holding a baseball bat,” you realize how thin that understanding is, if it exists at all.
Neural networks are just mindless, fuzzy pattern recognizers, and as useful as such pattern recognizers can be – there is a rush to build them into every kind of software – they represent, at best, a limited breed of intelligence, one that is easy to fool. A deep neural network that recognizes images can be completely derailed if you change a single pixel or add visual noise imperceptible to a human. Almost as often as we find new uses for deep learning, we run into its limits. Self-driving cars fail in conditions they have not seen before. Machines cannot parse sentences that require common sense and an understanding of how the world works.
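The fragility is easy to reproduce in miniature. The sketch below uses a hand-built linear scorer rather than a real deep network (an illustrative simplification): a gradient-sign nudge of 0.01 per pixel, far smaller than anything a person would notice, flips the classification.

```python
import numpy as np

# A miniature version of the fragility described above, using a
# hand-built linear "hot dog" scorer instead of a deep network; the
# weights, the input, and the 0.01-per-pixel nudge are all invented
# for illustration.
rng = np.random.default_rng(0)

w = rng.normal(size=100)           # the scorer's learned weights
x = 0.05 * w / np.linalg.norm(w)   # an image scored weakly "hot dog"

def predict(image):
    return "hot dog" if image @ w > 0 else "not hot dog"

# Nudge every pixel by a tiny 0.01 against the gradient of the score.
x_adv = x - 0.01 * np.sign(w)

print(predict(x), "->", predict(x_adv))
```

Because the perturbation lines up with the model’s own gradient, its tiny per-pixel pushes all add up, which is exactly how imperceptible noise can flip a real image recognizer.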
Deep learning in some sense mimics what happens in the human brain, but only superficially – which may explain why its intelligence can seem so superficial at times. Backprop was not discovered by plumbing the brain, by trying to decipher thought itself; it grew out of models of animal learning by trial and error in old-fashioned experiments. And most of the important steps taken since its invention involved nothing new from neuroscience; they were technical improvements, years of work by mathematicians and engineers. What we know about intelligence is nothing compared to what we don’t yet know about it.
David Duvenaud, an assistant professor in the same department as Hinton at the University of Toronto, says deep learning resembles engineering before the advent of physics. “Someone writes a paper and says, ‘I made this bridge, and it stands!’ Another writes, ‘I made this bridge and it fell down, but I added supports and now it stands.’ And everyone goes crazy for supports. Someone adds an arch, and everyone says: arches are great! With physics, you can actually understand what will work and why. Only recently have we begun moving toward that kind of understanding of artificial intelligence.”
And Hinton himself says: “Most conferences are about making minor variations, instead of thinking hard and asking: what is it about what we are doing now that is deeply deficient? What does it struggle with? Let’s focus on that.”
An outside view is hard to form when all you see is one advance after another. But the latest progress in AI has been less science and more engineering. Though we now understand better what kinds of changes will improve deep learning systems, we still have only a vague sense of how those systems work and whether they could ever add up to something as powerful as the human mind.
It matters whether we have already extracted everything we can from backprop. If we have, progress in artificial intelligence is about to plateau.
If you want to see the next breakthrough, something that could become the foundation for machines with far more flexible intelligence, you should, in theory, look to research that resembles what backprop was in the 80s: ideas that smart people have given up on because they do not work yet.
A few months ago I visited the Center for Brains, Minds and Machines, a multi-institution facility based at MIT, to watch my friend Eyal Dechter defend his dissertation in cognitive science. Before the talk began, his wife Amy, his dog Ruby and his daughter Suzanne cheered him on and wished him luck.
Eyal began his presentation with a fascinating question: how is it that Suzanne, who is only two years old, has learned to talk, to play, to follow stories? What is it about the human brain that lets it learn so well? Will a computer ever learn to learn so quickly and so smoothly?
We make sense of new phenomena in terms of things we already understand. We break a domain into pieces and study it piece by piece. Eyal is a mathematician and a programmer, and he thinks of tasks – making a soufflé, for example – as complex computer programs. But you don’t learn to make a soufflé by memorizing hundreds of tiny program instructions like “rotate your elbow 30 degrees, then look down at the countertop, then extend your finger, then…”. If you had to do that for every new task, learning would be unbearable and you would stop developing. Instead, we see the program in terms of higher-level steps like “whip the egg whites,” which are themselves made of subroutines like “crack the eggs” and “separate the whites from the yolks.”
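The soufflé passage can be sketched as code: a top-level program made of named, reusable subprograms rather than one flat list of tiny motor instructions. All the step names below are invented for illustration.

```python
# The souffle idea above, as code: the top-level "recipe" names only
# big steps, and each big step hides its own reusable subprogram.
# Every step name here is invented for illustration.
def crack_eggs(n):
    return [f"egg {i + 1}" for i in range(n)]

def separate(eggs):
    whites = [f"white of {e}" for e in eggs]
    yolks = [f"yolk of {e}" for e in eggs]
    return whites, yolks

def whip(whites):
    return f"whipped {len(whites)} whites"

def make_souffle():
    # The high-level program reads like the recipe a person learns;
    # crack_eggs and separate could be reused in any other recipe.
    whites, yolks = separate(crack_eggs(4))
    return [whip(whites), f"fold in {len(yolks)} yolks", "bake"]

print(make_souffle())
```

The point is not the cooking: once “separate the whites” exists as a named unit, learning a new dish means composing a few high-level steps, not relearning hundreds of micro-motions.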
Computers don’t do this, and that is part of why they seem stupid. To get a deep learning system to recognize a hot dog, you might have to feed it 40 million pictures of hot dogs. To get Suzanne to recognize a hot dog, you show her a hot dog. And well before that, she will have an understanding of language that goes far deeper than noticing which words tend to occur together. Unlike a computer, she has in her head a model of how the world works. “It surprises me that people are afraid computers will take their jobs,” says Eyal. “It’s not that computers can’t replace lawyers because lawyers do something terribly complicated. It’s that lawyers read and talk to people. In that sense, we are nowhere close to all this.”
Real intelligence isn’t thrown off when you slightly change the requirements of the problem it is solving. And the key part of Eyal’s thesis was a demonstration, in principle, of how to get a computer to work that way: to fluidly apply what it already knows to new problems, to pick things up quickly on the fly, to become an expert in an entirely new field.
In essence, it is a procedure he calls the “exploration-compression” algorithm. It makes the computer function like a programmer building up a library of reusable, modular components so as to construct ever more complex programs. Knowing nothing about a new domain, the computer tries to structure its knowledge of it simply by exploring it, consolidating what it finds, and exploring again, the way a child does.
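The compression half of that loop can be sketched very roughly: scan the programs solved so far, pull the most common fragment out into a named library routine, and rewrite the programs to call it. The fixed fragment length and the toy “programs” below are invented examples, not Dechter’s actual algorithm.

```python
from collections import Counter

# A rough sketch of the compression step in "exploration-compression":
# find the most common instruction fragment across the solved programs,
# give it a name in the library, and rewrite the programs to call it.
# The fragment length and the toy programs are invented for illustration.
def compress(programs, size=2):
    fragments = Counter(
        tuple(p[i:i + size])
        for p in programs
        for i in range(len(p) - size + 1)
    )
    best, count = fragments.most_common(1)[0]
    if count < 2:
        return programs, None          # nothing repeats: nothing to name
    name = "sub_" + "_".join(best)
    rewritten = []
    for p in programs:
        out, i = [], 0
        while i < len(p):
            if tuple(p[i:i + size]) == best:
                out.append(name)       # replace the fragment with a call
                i += size
            else:
                out.append(p[i])
                i += 1
        rewritten.append(out)
    return rewritten, (name, list(best))

programs = [
    ["crack", "separate", "whip", "bake"],
    ["crack", "separate", "stir", "fry"],
]
rewritten, routine = compress(programs)
print(routine)
print(rewritten)
```

After compression, every future program can call the new routine, so each round of exploration starts from a richer vocabulary, which is the child-like consolidation the text describes.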
His adviser, Joshua Tenenbaum, is one of the most cited researchers in AI. Tenenbaum’s name came up in half of the conversations I had with other scientists. Some of the key people at DeepMind – the team behind AlphaGo, which famously defeated a world champion at the game of Go in 2016 – had worked under him. He is involved with a startup trying to give self-driving cars an intuitive grasp of basic physics and of other drivers’ intentions, so that they can better anticipate what will happen in situations they have never encountered.
Eyal’s thesis has not yet been put into practice; it has not even been built into a program. “The problems Eyal is working on are very, very hard,” says Tenenbaum. “It will take many generations to get through them.”
When we sat down over coffee, Tenenbaum told me he studies the history of backprop for inspiration. For decades, backprop was a piece of cool math that, for the most part, accomplished nothing. As computers got faster and the engineering got more sophisticated, everything changed. He hopes the same will happen to his own work and that of his students, “but it may take another couple of decades.”
As for Hinton, he is convinced that overcoming AI’s limitations will require “building a bridge between computer science and biology.” Backprop, from this point of view, was a triumph of biologically inspired computation; the idea originally came not from engineering but from psychology. So now Hinton is trying to repeat the trick.
Today’s neural networks are made of big flat layers, but in the human neocortex real neurons are arranged not just horizontally but also vertically, into columns. Hinton has a guess about what the columns are for – in vision, for instance, they may let you recognize objects even as your viewpoint changes. So he is building an artificial version – he calls them “capsules” – to test the theory. So far it hasn’t panned out: the capsules haven’t markedly improved his networks’ performance. But then, backprop was in the same position 30 years ago.
“It ought to work,” he says of the capsule theory, laughing at his own bravado. “And the fact that it doesn’t yet is just a temporary annoyance.”