DeepMind takes baby steps, but this is significant

When Google bought DeepMind, a small and previously little-known British artificial intelligence startup, a year ago for a reported $400M, the acquisition was widely reported, with the general press essentially saying: "Gosh golly, these tech-type geniuses are, like .. wow! And they get rich too!" (1,2).

Technical news sites concentrated more on the reasons Google would be interested in such a company, connecting the purchase with Google's buying spree of robotics firms and pointing out that machine learning could be useful in its core business of selling (more or less) accurately targeted advertising.

But last week's publication in Nature of a paper by the DeepMind team, claiming that their software has learned to play 49 early video games successfully, points toward something much more profound than this – because the software they produced learned to play the games without any prior information about what the games were, or what success consisted of. The only inputs were the screen itself (exactly what a human player would see) and the score, and the only outputs were valid actions that could be produced by the game controller. From nothing more than repeated 'plays', the software 'discovered' strategies for success, and in quite a number of cases (29!) learned to play the game better than humans can.

'But these are trivial games', you might say; computers have been able to beat world-champion chess players for years, and are now making progress even at the game of Go. And you'd be right, but the difference is in the way the software learned to play.

[XKCD comic – gotta read them all!]

The amazing results in highly complex traditional games (it is estimated that the number of possible games of Go on a 19×19 board is so large that no duplicate game has ever unintentionally been played) have all been achieved by first encoding the rules – including the winning conditions – in software, adding methods of ranking possible future positions, then using the speed of computing operations to examine huge numbers of possible moves, and the possible moves that could follow them, to discover an optimal move. Onto these rule-based analytical techniques is layered encyclopaedic information on well-documented aspects of the game (typically opening and end-game situations). The computer has not learned to play the game – it has been taught many things about the game, and then made super-humanly strong by brute-force techniques built on game-specific algorithms developed by human programmers.
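To make that concrete, here is a minimal sketch of the classic rule-based approach – minimax search with alpha-beta pruning – of the kind that underpinned chess engines like Deep Blue. The helpers `legal_moves`, `apply_move`, `is_terminal` and `evaluate` are hypothetical placeholders; each one stands for a piece of game-specific knowledge that a human programmer must supply:

```python
# A sketch of classical game-tree search: minimax with alpha-beta pruning.
# Every helper below encodes hand-crafted, game-specific human knowledge.

def alphabeta(state, depth, alpha, beta, maximising):
    if depth == 0 or is_terminal(state):
        return evaluate(state)            # human-designed position-ranking heuristic
    if maximising:
        best = float("-inf")
        for move in legal_moves(state):   # the rules, encoded by the programmer
            best = max(best, alphabeta(apply_move(state, move),
                                       depth - 1, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:
                break                     # prune: the opponent will avoid this line
        return best
    else:
        best = float("inf")
        for move in legal_moves(state):
            best = min(best, alphabeta(apply_move(state, move),
                                       depth - 1, alpha, beta, True))
            beta = min(beta, best)
            if alpha >= beta:
                break
        return best
```

The point to notice is that the intelligence lives in `evaluate` and `legal_moves` – the search itself is just fast, exhaustive arithmetic.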

It is often pointed out that nay-sayers about Artificial Intelligence continually move the goal-posts whenever a computer achieves a previously impossible-seeming target – when Deep Blue beat Garry Kasparov at chess, it was claimed that the computer wasn't really playing chess the way a human does, that the contest therefore wasn't fair, and that, anyway, Go programs couldn't even beat low-level amateur players.

In an article in MIT Technology Review in 2007 [link – note pay barrier after first page], the philosopher Daniel Dennett suggested that this is a poor argument – that a human brain is not really operating in a fundamentally different way, that it is:

still, so far as we know, a massively parallel search engine that has an outstanding array of heuristic pruning techniques that keep it from wasting time on unlikely branches.

Which brought me up short. In a somewhat weak-minded way (and as a rank amateur player of Go), I had been comforting myself that humans were still, somehow, doing something special when playing these games, and that the brute-force techniques used by computers, while perhaps not 'unfair' (such a concept is hard to apply to a contest between two such different sets of apparatus), nevertheless belonged to a different category – one that must ultimately be limited to particular kinds of operation.

Reading about recent progress with Go software reinforced this human-centric view (even as computers seem about to clear yet another hurdle), because researchers have adopted an approach even less recognisable than that of the AI chess programmers: selecting candidate moves at random, attempting no analysis of position at all, but playing 'trees' of resulting possibilities all the way through to the conclusion of the game, and using the margin of eventual victory to score each move – the so-called Monte-Carlo Tree Search. Brute force indeed! (A sketch of the core idea follows.)
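The heart of the idea can be sketched in a few lines. This is only the flat random-playout core – real Monte-Carlo Tree Search adds a selection tree (the UCT formula) on top – and `legal_moves`, `apply_move`, `is_terminal` and `result` are again hypothetical stand-ins for the game rules; `result` is assumed to return the final outcome (or margin of victory) from the mover's point of view:

```python
import random

def random_playout(state):
    """Play uniformly random moves to the end of the game; return the outcome."""
    while not is_terminal(state):
        state = apply_move(state, random.choice(legal_moves(state)))
    return result(state)   # e.g. +1 / -1, or a margin of victory

def monte_carlo_move(state, playouts_per_move=1000):
    """Score each candidate move by the average outcome of random playouts."""
    best_move, best_score = None, float("-inf")
    for move in legal_moves(state):
        after = apply_move(state, move)
        score = sum(random_playout(after)
                    for _ in range(playouts_per_move)) / playouts_per_move
        if score > best_score:
            best_move, best_score = move, score
    return best_move
```

No positional understanding at all – just statistics over thousands of random games.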

However, Dennett's ridicule of the idea that human protein-based machines have some sort of privileged status when compared to electronic digital computing machines is hard to resist, particularly when the techniques used in the protein-based machine are so poorly understood (and, it seems, take place at a layer removed from consciousness).

So, do we move the goal-posts again? Point out that everything the computers know about the games they play has been taught to them by humans (though in point of fact, many chess programs embody learning abilities of various kinds)?

Well, even if we did, the DeepMind result is something else again – the software taught itself to play these (admittedly simple) games, and taught itself to play well. NO GAME-SPECIFIC PROGRAMMING OR INFORMATION WAS USED. This is what is called general intelligence – a system with no special pre-installed intention except to 'do well' persistently interacts with its environment, repeatedly adjusting itself to 'fit' the environment better so that the 'do well' score improves [sounds rather like an abstract description of evolution, no? Well, you'd have a point: one of the mathematical techniques used, gradient descent, looks a lot like the description of the evolutionary process as a traverse across a landscape of troughs and ridges of relative 'fitness'].
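The shape of that learning loop is worth seeing. Below is a minimal tabular Q-learning sketch – the simple ancestor of what DeepMind actually did, which replaces the lookup table with a deep convolutional network trained by gradient descent on the raw screen pixels. Everything here is an illustrative assumption: `env` is a hypothetical game interface exposing `reset()` and `step(action)`, observations are assumed hashable, and the reward is simply the change in score:

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1   # learning rate, discount, exploration rate

def q_learn(env, actions, episodes=10_000):
    """Learn action values from nothing but observations and score changes."""
    Q = defaultdict(float)                # Q[(observation, action)] -> estimated value
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            # Explore occasionally; otherwise act greedily on current estimates.
            if random.random() < EPSILON:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(obs, a)])
            next_obs, reward, done = env.step(action)   # reward = score change
            # Nudge the estimate toward observed reward plus discounted future value;
            # in the deep version this nudge becomes a gradient-descent step.
            target = reward + GAMMA * max(Q[(next_obs, a)] for a in actions)
            Q[(obs, action)] += ALPHA * (target - Q[(obs, action)])
            obs = next_obs
    return Q
```

Nothing in that loop knows what the game is: the same code, pointed at a different `env`, learns a different game – which is exactly the property that makes the DeepMind result remarkable.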

Of course, as the DeepMind people say themselves, this result is in a strictly limited field:

"The ultimate goal is to build smart, general-purpose [learning] machines. We're many decades off from doing that," said artificial intelligence researcher Demis Hassabis, co-author of the study published online Wednesday in the journal Nature. "But I do think this is the first significant rung of the ladder that we're on."

Thinking about all this, I was powerfully reminded of the bravura first chapter of Greg Egan's hard-SF novel Diaspora, in which he imagines the emergence into consciousness of an AI with human-like intelligence: from its inception as a 'mind seed, a string of instruction codes like a digital genome', through a process modelled on the way an embryo develops and gradually differentiates under attenuating fields of influence, over thousands of iterations, until it becomes a conscious being called Yatima – 'And the model of Yatima's beliefs about Yatima's mind became the whole model of Yatima's mind'.

The point for me being that human-like intelligence is not a simple thing, and not simply developed – it consists of many systems, layered one over another in ways we are only beginning to comprehend [just one example: neural nets take a single insight about the mechanics of the human brain – that the likelihood of a neuron firing across its synapses is the feature that underlies the network – but ignore other mechanisms, like the simultaneous existence of chemical fields, which operate over different timescales, attenuate over distance, and increase the likelihood of firing in nearby but not directly connected neurons. Link – note pay barrier].
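To see how thin that single insight is, here is a toy artificial neuron of the kind the aside describes – the entire model is a weighted sum pushed through a squashing function, a single number standing in for 'likelihood of firing'. The chemical fields, timing effects and neighbourhood influences mentioned above simply have no counterpart in it:

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum -> sigmoid 'firing probability'."""
    activation = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-activation))

print(neuron([0.5, 0.9], [1.2, -0.4], 0.1))   # ≈ 0.58
```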

By contrast, AI researchers have enough to do to make simplistic, single-approach models produce useful results.

In my view, it is going to take the integration of refined versions of many of these single-approach systems before we can begin to approach the true dreams (and fears) of strong AI.

EDIT: An edition of 'The Forum' on the BBC World Service, broadcast yesterday, has some fascinating discussion of current work in AI, wider and deeper than the DeepMind coverage. Look here for the episode called 'Deep Learning' – there is also a short but useful essay suggesting that the sort of learning DeepMind's software does is at best a partial model of the human mode, which involves creating on the basis of stimuli, rather than creating a model that fits the stimuli.
