The history of machine learning

Today, machine learning powers tools such as self-driving cars, voice-activated assistants and social media feeds.

However, the ideas behind machine learning have a long history. They rely on maths from hundreds of years ago and on the enormous developments in computing over the last 70 years.



Bayes Theorem Stylised Equation (Credit: Stellario Cama)

Bayes’ theorem defines the probability of an event based on prior knowledge of conditions that might be related to it

Many of the mathematical underpinnings of modern machine learning predate computers and come from statistics.

Major breakthroughs include the work of Thomas Bayes in the 18th century, which led Pierre-Simon Laplace to define Bayes’ Theorem (1812). Adrien-Marie Legendre also developed the Least Squares method for data fitting (1805), and Andrey Markov described analysis techniques later called Markov Chains (1913). These techniques are all fundamental to modern machine learning.
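
As a rough illustration of how Bayes’ theorem updates a probability using prior knowledge, here is a minimal Python sketch. The function name `posterior` and the medical-test numbers are made up for this example and do not come from the original article.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Probability of a condition given a positive test, via Bayes' theorem."""
    # Overall chance of a positive test:
    # P(pos) = P(pos | cond) * P(cond) + P(pos | no cond) * P(no cond)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    # Bayes' theorem: P(cond | pos) = P(pos | cond) * P(cond) / P(pos)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% of people have the condition, the test detects it
# 90% of the time, and it wrongly flags 5% of people who do not have it.
print(round(posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05), 3))
# prints 0.154
```

Even with a fairly accurate test, the low prior means a positive result implies only about a 15% chance of having the condition. This is the kind of reasoning from prior knowledge that the theorem formalises.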

BBC: The most beautiful equation is... Bayes' Theorem

Introduction to Markov Chain Monte Carlo Methods


Stored program computer

Computer Laboratory, University of Cambridge

EDSAC I, R.Hill operating

EDSAC I operator at work in the lab

In the late 1940s, work began on stored-program computers, which hold their instructions (programs) in the same memory used for data.

The first computers of this type began the modern computing revolution. They were the Manchester Small-Scale Experimental Machine (nicknamed 'Baby') in 1948, Cambridge’s EDSAC and the Manchester Mark 1 in 1949, and the University of Pennsylvania’s EDVAC in 1951.

BBC: Programming in the early days of the computer age

The 'Manchester Baby'


Computing machinery and intelligence

Alan Turing statue at Bletchley Park

In 1950 Alan Turing published Computing Machinery and Intelligence, in which he asked: “Can machines think?” – a question that we still wrestle with.

Based on the growing understanding of the power of computers, the paper was one of the first attempts to describe how ‘artificial’ intelligence could be developed. It famously proposed the 'imitation game', a test to determine whether a computer was intelligent by asking a person to distinguish between a human and a computer when communicating with them both through typed messages. The BBC invited Turing to give a talk on its new radio show Third Programme in 1951 and again in January 1952.

BBC: Alan Turing, creator of modern computing

BBC: Alan Turing and the experiment that shaped AI

"He has a lively mind but I am very doubtful about him."

BBC radio producer on Turing


First neural network


Marvin Minsky at MIT in 1968

In 1951 Marvin Minsky and Dean Edmonds built the first artificial neural network – a computer-based simulation of the way organic brains work.

The Stochastic Neural Analog Reinforcement Calculator (SNARC) learned from experience and was used to search a maze, like a rat in a psychology experiment. It was built along connectionist principles, representing a mind as a network of simple units within which intelligence may emerge. Minsky went on to work at the MIT Artificial Intelligence Laboratory and made many other significant interventions in the AI debate. He was an advisor on the film 2001: A Space Odyssey.

BBC: AI pioneer Marvin Minsky dies aged 88


The first AI 'winter'

Sir James Lighthill, author of 'Artificial Intelligence: A General Survey'

During the 1950s and 1960s there was enormous enthusiasm for AI research, but people became disillusioned when breakthroughs didn't happen.

The failure of machine translation and the overselling of AI's capabilities led to reduced funding. The 1973 Lighthill Report to Parliament looked at the state of research and noted the failure to deliver ‘grandiose objectives’. Things changed in the late 1980s with new approaches, the emergence of expert systems and the rediscovery of old ideas that could be applied in new settings. New names for AI also came into use, such as informatics, machine learning and computational intelligence.

Lighthill Report

BBC 1973 TV debate on the Lighthill controversy


Deep Blue beats Garry Kasparov

World chess champion Garry Kasparov explains the significance of his defeat by a computer

Garry Kasparov lost his second match to a computer

Public awareness of AI increased greatly when an IBM computer named Deep Blue beat world chess champion Garry Kasparov in the first game of a match.

Kasparov won the 1996 match, but in 1997 an upgraded Deep Blue won a second match by 3½ games to 2½. Although Deep Blue played an impressive game of chess, it relied largely on brute computing power, including 480 special-purpose ‘chess chips’. It worked by searching between six and 20 moves ahead at each position, having learned by evaluating thousands of old chess games to determine the path to checkmate.
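
The idea of looking a fixed number of moves ahead can be sketched as a depth-limited game-tree search. The Python example below is only an illustration, not Deep Blue's actual method. It assumes a toy game in which players take turns removing one to three stones and whoever takes the last stone wins, and the names `negamax` and `best_move` are hypothetical.

```python
def negamax(stones, depth, alpha=-1, beta=1):
    """Score a position for the player about to move: +1 win, -1 loss, 0 unknown."""
    if stones == 0:
        return -1  # the opponent took the last stone, so the player to move has lost
    if depth == 0:
        return 0   # search horizon reached; a chess engine would estimate the position here
    best = -1
    for take in (1, 2, 3):
        if take > stones:
            break
        # A position that is good for the opponent is bad for us, hence the sign flip.
        score = -negamax(stones - take, depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # alpha-beta pruning: the remaining moves cannot change the result
    return best


def best_move(stones, depth=6):
    """Pick the number of stones to take by searching up to `depth` moves ahead."""
    scored = [(-negamax(stones - take, depth - 1), take)
              for take in (1, 2, 3) if take <= stones]
    return max(scored)[1]


print(best_move(7))  # prints 3, leaving the opponent a losing pile of 4
```

In this toy game the search can reach the end of the game, so the scores are exact. In chess the tree is far too large for that, which is why the depth limit, the evaluation at the horizon and raw computing speed matter so much.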

BBC: How a computer beat the world's best chess player

BBC: Garry Kasparov on why the world should embrace AI



neural network image recognition - example of cat

Neural network image recognition via backpropagation

One of the core techniques used in machine learning systems is backpropagation, used to train deep neural networks.

First described in the 1960s as part of control theory and adopted for neural networks, backpropagation fell out of favour until work by Geoff Hinton and others using fast modern processors demonstrated its effectiveness. Deep learning nets are now a mainstay of current machine learning. In 2017 Hinton, who now works for Google, expressed concerns that backpropagation has reached its limits in building machine learning systems and that new insights from biology are needed.
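
To give a feel for what backpropagation does, the Python sketch below trains a deliberately tiny network with one hidden unit and one output. It computes how much each weight contributed to the error by working backwards from the output. The network shape, the function names and the training data are all assumptions made for illustration; real deep learning systems use many layers and specialised libraries.

```python
import math


def forward(params, x):
    """Run the network: one tanh hidden unit feeding one linear output."""
    w1, b1, w2, b2 = params
    h = math.tanh(w1 * x + b1)
    y = w2 * h + b2
    return h, y


def loss(params, x, target):
    """Half the squared difference between the output and the target."""
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2


def backprop(params, x, target):
    """Gradient of the loss for each parameter, computed from the output backwards."""
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    dy = y - target            # how the loss changes with the output
    dw2, db2 = dy * h, dy      # output-layer gradients
    dh = dy * w2               # error passed back to the hidden unit
    dz = dh * (1.0 - h * h)    # through tanh, whose derivative is 1 - tanh^2
    dw1, db1 = dz * x, dz      # hidden-layer gradients
    return [dw1, db1, dw2, db2]


def train(params, data, rate=0.1, epochs=200):
    """Repeatedly nudge every parameter against its gradient."""
    params = list(params)
    for _ in range(epochs):
        for x, target in data:
            grads = backprop(params, x, target)
            params = [p - rate * g for p, g in zip(params, grads)]
    return params


data = [(-1.0, -0.5), (0.0, 0.0), (1.0, 0.5)]  # made-up training examples
start = [0.5, -0.3, 0.8, 0.1]                  # arbitrary starting weights
trained = train(start, data)
print(sum(loss(start, x, t) for x, t in data),
      sum(loss(trained, x, t) for x, t in data))
```

The second number printed should be smaller than the first, showing that the weight adjustments reduce the error on the training examples.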

BBC: Toronto's Vector Institute for AI and Geoff Hinton




DeepMind artificial intelligence moving an animated figure

DeepMind Technologies is a British company founded in 2010 and acquired by Google in 2014.

DeepMind gained prominence when it developed a neural network that could learn to play video games simply by analysing the behaviour of pixels on a screen. It also built a neural network that can access external memory – a Neural Turing Machine. In 2016 DeepMind was involved in a controversy over the use of NHS patient data from the Royal Free Hospital to train a medical system, and the UK Information Commissioner’s Office later ruled that the hospital trust had breached the Data Protection Act.

BBC: DeepMind explores inner workings of AI

BBC: Google DeepMind AI becomes more alien


AlphaGo beats Lee Sedol

Human hand playing Go game with virtual hand overlay

Go was considered a more difficult game than chess for AI to master

If Deep Blue’s chess expertise was the big AI story of the last millennium, then AlphaGo’s success at Go has replaced it in popular culture.

Developed by DeepMind researchers, AlphaGo won its first match against a professional in 2015, beat the world’s number two player Lee Sedol in March 2016 and beat the number one player Ke Jie in 2017. AlphaGo’s neural network is trained by playing both humans and computers, and the program uses a Monte Carlo tree search algorithm to find moves. Its success is significant because AI researchers considered Go a hard problem and had not anticipated that humans would start losing to computers so soon.

BBC: Google AI defeats human Go champion

BBC: What is the game Go?


The singularity

What does the future hold for humanoid robots?

BBC Click's Spencer Kelly and a humanoid robot

Some computer scientists believe that once we develop a generalised AI at or above human level, it will develop more advanced versions of itself.

This process is called the singularity, a term first used in this way by science fiction author Vernor Vinge. If it happens, AI capability will grow exponentially and rapidly transcend human intelligence, and we may find ourselves subservient to the machines. The likelihood of the singularity is studied by organisations such as the Cambridge Centre for the Study of Existential Risk because, even if it is considered very unlikely, it would pose a serious threat to human survival.

BBC Radio 4: The Singularity