Google DeepMind’s AlphaGo Zero recently attained an important milestone: the Artificial Intelligence (AI) taught itself how to play the strategy game Go without any human input and was able to beat the world’s best Go players. The ability to reach this level of performance without human guidance is a significant step forward in the maturation of AI.
Over the past several years, AI has made significant progress in a wide variety of areas, such as image and speech recognition, drug discovery, and algorithmic trading. In most of these cases, the AI relies on vast existing data sets and some degree of human engagement. A long-standing ambition of AI researchers has been to create algorithms that rely on neither existing data sets nor human input.
AlphaGo Zero learned in a manner similar to how we humans learn: through experience. The differentiating factor in this instance is that the AI learned how to play entirely on its own, with no human interaction. Earlier Go-playing algorithms were trained on more than 100,000 games played by human experts; AlphaGo Zero had no such input. It started from scratch and learned by playing against itself. It took only 40 days for AlphaGo Zero to play 30 million games, and it learned from all of them. AlphaGo Zero was then able to defeat the world’s best Go player, which, by the way, happens to be the AI technology that originally beat the best human players. Let’s think about this for a minute: the AI that originally beat the best human players has now been soundly beaten by the next generation of AI technology.
The DeepMind team used a type of Machine Learning known as Reinforcement Learning. Reinforcement Learning is different from the more commonly used Machine Learning approaches of Supervised and Unsupervised Learning (future blog posts will go into more detail on the different models and their applicability). Reinforcement Learning is a technique in which an agent interacts with its environment. Solving the task requires a sequence of actions, and the algorithm receives feedback in the form of a reward. The reward encodes the aim of the task and is what teaches the learning system. The algorithm decides on an action and then compares the outcome against the reward it receives. The agent seeks to maximize its cumulative reward as it learns the best sequence of actions required to solve the task. The AI performs these actions by itself, with no direct instruction or interaction from humans. This is more powerful than the earlier versions of the AI because it is not constrained by the limits of human knowledge.
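To make the agent–environment–reward loop concrete, here is a minimal sketch using tabular Q-learning, one of the simplest Reinforcement Learning algorithms. The five-cell corridor environment is a toy I invented for illustration; it is emphatically not DeepMind’s method, which combines deep neural networks with Monte Carlo tree search and self-play. But the core loop is the same: act, observe the reward, and update an estimate of which actions are best.

```python
import random

# Toy environment: the agent starts at cell 0 on a 5-cell corridor and
# must reach cell 4. Each step costs -1; reaching the goal earns +10.
N_STATES = 5
GOAL = 4
ACTIONS = [-1, +1]  # move left or move right

def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == GOAL:
        return next_state, 10.0, True
    return next_state, -1.0, False

# Q-table: the agent's running estimate of the future reward
# for taking each action in each state.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1  # learning rate, discount, exploration

random.seed(0)
for episode in range(500):
    state, done = 0, False
    while not done:
        # Explore occasionally; otherwise exploit the best known action.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # Compare the outcome against the reward and update the estimate.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next
                                       - Q[(state, action)])
        state = next_state

# After training, the greedy policy moves right (+1) toward the goal.
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)]
print(policy[:4])
```

No one tells the agent that “move right” is correct; it discovers this purely from the reward signal, which is the essence of learning without direct human instruction.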
Why is this important?
The ability of AI to self-train without any human interaction is an important step forward in applying AI to increasingly complex problems in areas such as -omic (genome, proteome, microbiome, metabolome) research; drug discovery and development; energy consumption; climate change; autonomous driving; and the development of new materials. We are in the very early phase of AI being used to augment human capabilities. The next decade is going to be very cool.