Alan Turing is most famous to the public for his role in cracking the German Enigma code at Bletchley Park in World War 2. He’s also quite famous for the Turing Test, to compare human with computer intelligence. But computer nerds should also know about his seminal contributions to computers, programming, and computer science, especially Turing Machines, described in his 1936 paper “On Computable Numbers, with an Application to the Entscheidungsproblem” (PDF). It was intended to prove that a general solution to a certain esoteric problem in metamathematics is impossible, and had the side effect of founding the subject of computer science. Turing needed some way to formalise what mathematicians do when they calculate, so he imagined converting human computers into mechanical computers. In 1936, a computer was a person who computed, not the electronic digital devices we use today.
What is notable about this paper? Here is a list of some of the things Turing either invented in this paper, or gave an important new twist to:
- a definition of computability;
- finite state machines (FSMs);
- Turing Machines (TM, the simplest computing model);
- the Universal Turing Machine (an interpreter that can simulate any TM);
- stored-program computers;
- computer programs (after Babbage/Lovelace);
- skeleton tables (= macros/functions/subroutines);
- computer science, esp. the halting problem;
Turing’s paper On Computable Numbers established some of the basic principles of modern digital computers, and showed that there are some things that computers can never compute. Unfortunately for Turing, a proof about the esoteric problem (the Entscheidungsproblem) was published by Alonzo Church just before he finished his paper. So he was scooped. However, his supervisor felt that Turing’s approach to the problem was so original that he should publish it anyway. This seems very reasonable, given that Turing invented computers just to solve his math problem 🙂
This paper is obviously very much concerned with numbers, but you don’t need a Phd in mathematics to appreciate it. Probably the minimum requirement would be to at least understand that there are different kinds of number, such as natural numbers (0, 1, 2, 3, …), integers (natural numbers plus negative numbers), rationals (numbers expressed as ratios of integers), and reals (numbers represented with decimal points). There are many more kinds, and Turing claims that there are also computable and non-computable numbers. He defines computable numbers as the real numbers which can be calculated by finite methods. A number is computable if it can be written down by a machine, and he’s going to demonstrate some automatic machines (we now call them Turing Machines or TMs) that can do this. Each machine is built to solve a specific problem (or algorithm as we now would say), and one of Turing’s amazing insights is that you can design a single machine to do any algorithm, by writing it on a tape to be processed by the machine. We now call this the Universal Turing Machine – or computer…
The “computable” numbers may be described briefly as the real numbers whose expressions as a decimal are calculable by finite means… a number is computable if its decimal can be written down by a machine.
1. Computing machines.
He introduces his automatic machine, which can be in one of a finite number of configurations (or states), and which can scan a tape divided into squares in each of which which there may be a symbol. The machine’s next operation is determined uniquely by the combination of current state, and symbol. It might, for example, write a new symbol on the square, and move along to the next square (left or right), and change to a new state. He claims that this very simple machine can compute any number which can be computed by any other finite means.
Turing defined computable numbers as the real numbers which can be computed to any arbitrary precision by a finite terminating algorithm; i.e. as those whose decimals can be computed in a finite time. This will be done by an imaginary machine that comprises a finite state machine (FSM, not the flying spaghetti monster) connected to an infinite tape divided into squares, via a read-write head that can scan left or right one square at a time, reading and writing as it goes. This was motivated by thinking of a mathematician (or maybe an accountant) doing their math by writing neatly on a sheet of paper. The 2-dimensional nature of the paper can be substituted by a 1-dimensional tape, considering that the computer (who at that time, was a human calculator) would write say, left to right, top to bottom, and might backtrack sometimes, and alter some previous symbols.
We may compare a man in the process of computing a real number to a machine which is only capable of a finite number of conditions q1, q2, …, qR which will be called “m-configurations”. The machine is supplied with a “tape”, (the analogue of paper) running through it, and divided into sections (called “squares”) each capable of bearing a “symbol”. At any moment there is just one square, say the r-th, bearing the symbol S(r) which is “in the machine”. We may call this square the “scanned square”. The symbol on the scanned square may be called the “scanned symbol”. The “scanned symbol” is the only one of which the machine is, so to speak, “directly aware”. However, by altering its m-configuration the machine can effectively remember some of the symbols which it has “seen” (scanned) previously. The possible behaviour of the machine at any moment is determined by the m-configuration qn and the scanned symbol S(r). This pair qn, S(r) will be called the “configuration”: thus the configuration determines the possible behaviour of the machine. In some of the configurations in which the scanned square is blank (i.e. bears no symbol) the machine writes down a new symbol on the scanned square: in other configurations it erases the scanned symbol. The machine may also change the square which is being scanned, but only by shifting it one place to right or left. In addition to any of these operations the m-configuration may be changed. Some of the symbols written down will form the sequence of figures which is the decimal of the real number which is being computed. The others are just rough notes to “assist the memory”. It will only be these rough notes which will be liable to erasure.
The FSM can be specified by a table, each row being indexed by the ‘state’, and each column being indexed by the current symbol. The cell thus indexed contains instructions on which symbol to write back to the current tape position, which way to move the head, and which state to go to next. You may notice that this table sounds very much like a computer program!
The Turing Machine consists of:
- an infinite tape divided into cells along its length. Each cell contains a symbol, such as a letter or a number, taken from a specified set such as ‘0’ and ‘1’.
- a head which can read or write symbols on the tape, and move 1 cell one way or the other.
- a register that stores the current state of the machine. This machine can be in one of a finite number of states, each of which determine what happens next when a given symbol is read.
- a table of instructions that specify what the machine does, such as replace the current symbol with a new one, move along the tape, and what will be the next state. This table is a kind of program indexed by the state and current symbol.
Here’s a simple program to add 2 unary numbers (numbers in base 1, e.g. 5 = 11111).
|0||1, +1, 1||1, +1, 0|
|1||0, -1, 2||1, +1, 1|
|2||0, +1, 2||0, +1, 9|
The machine starts in state 0. If it reads a 1 from the tape, it gets the ‘instruction’ 1, +1, 0. The first element is the symbol to write back to the tape (replacing the current symbol). The 2nd is the direction to move the head: -1 = left, +1 = right. The 3rd is the new state. This is effectively a primitive programming language. You wouldn’t want to use this instead of Python! However, it is, in principle, as powerful as Python, in that it can compute anything that Python can! Here is a modern simulation in Python of this Turing Machine:
If at each stage the motion of a machine (in the sense of § 1) is completely determined by the configuration, we shall call the machine an “automatic machine” (or a-machine).
He distinguishes between automatic machines and choice machines (which we now call interactive).
He explains that his computing machine prints 2 kinds of symbols: figures, which are 0 and 1 (i.e. binary digits), and ‘symbols of the second kind’. The result of the machine’s computation will be sequence of figures (i.e. 0 & 1) printed by the machine, preceded by a decimal point (wrong, Turing – a binary point!).
If an a-machine prints two kinds of symbols, of which the first kind (called figures) consists entirely of 0 and 1 (the others being called symbols of the second kind), then the machine will be called a computing machine. If the machine is supplied with a blank tape and set in motion, starting from the correct initial m-configuration, the subsequence of the symbols printed by it which are of the first kind will be called the sequence computed by the machine. The real number whose expression as a binary decimal is obtained by prefacing this sequence by a decimal point is called the number computed by the machine.
Circular and circle-free machines.
He explains that the machine could get stuck, and not print any more figures (i.e. bits). This is a circular machine. We want circle-free machines.
If a computing machine never writes down more than a finite number of symbols of the first kind, it will be called circular. Otherwise it is said to be circle-free.
Computable sequences and numbers.
A sequence is said to be computable if it can be computed by a circle-free machine. A number is computable if it differs by an integer from the number computed by a circle-free machine.
3. Examples of computing machines.
He shows examples of machines to compute these sequences: 010101…, 001011011101111011111…
4. Abbreviated tables.
There are some processes that turn out to be often useful in constructing these machines, and so he introduces “skeleton tables.”
There are certain types of process used by nearly all machines, and these, in some machines, are used in many connections. These processes include copying down sequences of symbols, comparing sequences, erasing all symbols of a given form, etc. Where such processes are concerned we can abbreviate the tables for the m-configurations considerably by the use of “skeleton tables”.
Some people point out (e.g. Petzold) that this is like a subroutine. I prefer to think of it as a macro, whose text is to be substituted in-place. I don’t see any jump/return mechanism? However, Turing does call them m-functions, but states that they are to be regarded as abbreviations; they aren’t essential. More examples follow. These examples will be useful for the later construction of the UTM.
5. Enumeration of computable sequences.
He introduces a ‘standard description’ (SD) – which I think we can recognise now as a programming language, defined by an agreed standard. This SD can be converted to a ‘description number (DN), and this is reminiscent of Goedel’s trick of encoding logical statements as numbers! So every unique TM corresponds to a unique DN. He now states that:
To each computable sequence there corresponds at least one description number, while to no description number does there correspond more than one computable sequence. The computable sequences and numbers are therefore enumerable.
Turing isn’t interested at this time, in using the machine to do everyday calculations. He wants it as a representation or model of mathematical procedure, and he wants to be able to treat this model as a mathematical item in its own right. This he does by constructing a ‘description number’ from the table of instructions, as follows: take the first row, and append the second row, then the next, and so on, so that the table has become one long row. This row is a unique identifier for the algorithm represented by the original table.
6. The universal computing machine.
This was a huge surprise in 1936 – one machine to rule them all! We are now so familiar with computers that we might find it hard to appreciate just how much of a big deal it was then! Here, Turing outlines how he’s going to construct this machine.
Note that each Turing machine so far can only do the one algorithm it was designed for, but next, Turing constructs the Universal machine. It processes the description number when it’s written on the tape to be read by the Universal machine! This UTM is by no means a simple thing, and in modern terms we might call it a compiler; it compiles description numbers into the ‘assembly language‘ of Turing Machines, namely instructions concerning what to write on the tape, which way to move the head, etc.
It is possible to invent a single machine which can be used to compute any computable sequence. If this machine M is supplied with a tape on the beginning of which is written the S.D of some computing machine M, then U will compute the same sequence as M. In this section I explain in outline the behaviour of the machine. The next section is devoted to giving the complete table for U.
We believe that the UTM can compute anything that can be computed by man or machine. This is the Church-Turing thesis; not formally proven, but strongly believed to be true. No supercomputer can, in principle, compute anything that cannot be computed by the UTM. But keep in mind that this isn’t intended to be a practical device that you would build for serious operation! It’s very much like Einstein’s ‘thought experiments’, a mental model that allows one to derive consequences, much like a set of axioms and rules of inference can be used to derive mathematical theorems.
7. Detailed description of the universal machine.
Here he gives the complete table for the universal machine.
8. Application of the diagonal process.
This section is the heart of the paper. The diagonal process was made famous by Cantor, as a way to show that the real numbers aren’t enumerable. Turing tries to apply this process to the computable numbers and sequences, and fails – thanks to the famous halting problem. So the computable numbers are enumerable. This is critical – you could list them, unlike the reals. There are infinitely more reals than computable numbers, therefore some of them cannot be computed.Turing needs a couple of techniques to get to his main result. One is due to Kurt Gödel, an Austrian-American logician. Gödel published his two incompleteness theorems in 1931. The first incompleteness theorem states that for any self-consistent mathematical system powerful enough to describe the arithmetic of the natural numbers, there are true propositions about the naturals that cannot be proved from the axioms. To prove this theorem, Gödel developed a technique now known as Gödel numbering, which codes formal expressions as natural numbers. Similarly, Turing is going to number his machines (but not the way Gödel did).
He claims that these machines can be enumerated (or listed), and he tries to apply the next technique, due to Georg Cantor to test the claim. Cantor originally used it to to show that the real numbers (i.e. numbers with possibly non-terminating decimals) cannot be enumerated. This is the famous diagonalisation procedure, which works by taking a claimed listing of numbers, and changing the nth decimal of the nth number to something else. This number cannot be in the list, since it differs from every number on it. Add the number to the list, and repeat…
This is a powerful and general technique that has since been used in a wide range of proofs, also known as diagonal arguments by analogy with the argument used in this proof. The most famous examples are perhaps Russell’s paradox, the first of Gödel’s incompleteness theorems, and Turing’s answer to the Entscheidungsproblem.
The heart of Turing’s paper is 8. Application of the diagonal process. He has shown that his machines embody the notion of mathematical computation, and that they can be represented by a table of instructions, which can be ‘Gödelised’ into a ‘description number’ DN. This process can be reversed to recreate the original Turing machine. Now imagine that you have a program that given a DN will tell you if this machine will always halt on all possible data inputs. If this program is used to analyse itself, at some point it must consider itself analysing itself… and so on ad infinitum, and it will never halt.
Alternatively, consider the program H that contains a halt-detector D as a subroutine. After calling D on the supplied input, it goes into an infinite loop if D says the supplied input program always halts, and halts if it doesn’t. If you supply H as the input to H, then if D says that H halts, H goes into the infinite loop – and so cannot halt! But if D says that H doesn’t halt, it does…
The motivation for this is to show that TMs are enumerable, i.e. can be listed, like the integers. In which case, they are infinitely outnumbered by the real numbers and so there are real numbers that cannot be computed. The TMs wouldn’t be enumerable if you could apply Cantor’s diagonalisation process to any purported list of TMs, to generate one not on the list. This cannot be done because the diagonalisation process can’t be guaranteed to terminate.
Here is the actual paragraph from Turing’s paper, explaining the problem. He wants to compute the diagonal digits ala Cantor from a list of TMs. He is considering the machine H which must test the integers i = 1,2,3, …. N to see if they are the D.Ns of circle-free (i.e. halting machines, which are ‘satisfactory’). We may imagine that H is built up from three sub-machines, P, D and U. P parses the supplied integer i to see if it’s a valid TM (most aren’t). D checks for circle-free machines. Valid TM integers that are circle free will be passed to U (a UTM) to be run up to the point where i = N, so that we can extract the nth digit of the nth TM, to add to the sequence for the Cantor diagonal. But the machine H must itself be somewhere in this list; suppose its number is K. R is a tally of the currently known valid circle-free TMs, we need it to keep track of which digit (‘figure’) we’re computing in the diagonal.
Now let K be the D.N of H. What does H do in the K-th. section of its motion ? It must test whether K is satisfactory, giving a verdict “s” or “u”. Since K is the D.N of H – and since H is circle-free, the verdict cannot be “u”. On the other hand the verdict cannot be “s”. For if it were, then in the K-th. section of its motion H would be bound to compute the first R(K—1) + 1 = R(K) figures of the sequence computed by the machine with K as its D.N and to write down the R(K)-th as a figure of the sequence computed by H. The computation of the first R(K) — 1 figures would be carried out all right, but the instructions for calculating the R(K)-th. would amount to “calculate the first R(K) figures computed by H and write down the R(K)-th”. This R(K)-th figure would never be found. I.e., H is circular, contrary both to what we have found in the last paragraph and to the verdict “s” . Thus both verdicts are impossible and we conclude that there can be no machine H.
In computability theory, the halting problem is the problem of determining, from a description of an arbitrary computer program and an input, whether the program will finish running or continue to run forever. Turing proved that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist. Turing never actually called it the halting problem; Jack Copeland (2004) attributes the term to Martin Davis. A consequence of the halting problem’s undecidability is that there cannot be a general algorithm that decides whether a given statement about natural numbers is true or not. The proposition that a certain program will halt given a certain input can be converted into an equivalent statement about natural numbers. If there was an algorithm that could determine the truth of every statement about natural numbers, it could certainly find the truth of this one; but that would determine whether the original program halts, which is impossible, since the halting problem is undecidable.
9. The extent of the computable numbers.
He uses 3 different methods to try to convince us that his computable numbers include all numbers which would naturally be regarded as computable:
- a direct appeal to intuition;
- a proof of the equivalence of two definitions;
- examples of large classes of numbers which are computable.
10. Examples of large classes of numbers which are computable.
11. Application to the Entscheidungsproblem
The Entscheidungsproblem was concerned with whether it is possible, in principle, to find a definite method, or mechanical procedure – now known as an algorithm – to determine if a given mathematical proposition is provable from a given set of axioms and rules. Entscheidungsproblem is German for “decision problem”, that is, the problem of deciding if the proposition can be proven. It originated from David Hilbert‘s desire to show that mathematics is built on solid foundations; for example, there cannot ever be contradictory theorems (one says that P is true, another says that P is false); and all true theorems can be proven (in principle). This desire was shattered in part by Goedel’s Incompleteness Theorems in 1931, and Turing’s 1936 paper (and others).
In 1928 David Hilbert posed the following problem (and several others) to the mathematics community: create an algorithm that scans a statement in first-order logic and answers yes or no to the question of whether it is a true statement. Then, there would not be any such thing as an unprovable theorem.
While the failure of Hilbert’s program was a shattering blow at the time, the context is very arcane and most mathematicians carried on doing math as usual, shaking their heads and muttering “Whodathunkit?”. Math still works, but now we know that there are some theorems that might never be proven. But what turned out to be the real gold nugget of Turing’s paper, was that he effectively founded the field now known as computer science. We might also generously give him the honour of having accidentally invented computers, but as with most claims of priority and credit, the context is a tangled web of other actors and events, such as John von Neumann, Konrad Zuse, World War 2, etc. But most people will agree that Turing’s 1936 paper was seminal to computer science, and contained many of the basic concepts of modern computing, such as stored data, programs and subroutines.
Alonzo Church showed in 1935-6, by means of his lambda calculus, that a general solution to the Entscheidungsproblem was impossible. Slightly later Turing also proved this, by constructing what he called ‘automatic machines’, and we now call Turing Machines. Normally, there wouldn’t be much motivation for publishing the same result twice, but Turing’s approach was so original, intuitive and compelling – Goedel himself favoured it over Church’s method – that it was deemed worthy of publication in the London Mathematical Society‘s proceedings. Turing also demonstrated the equivalence of his and Church’s methods, now known as the Church-Turing thesis.
Who Invented the Computer?
Questions such as who invented the Internet, cars, calculus, television, flying machines often, perhaps even usually, really don’t have a simple answer in terms of one lone superhero. The Internet wasn’t invented by Tim Berners-Lee (he merely placed the world-wide web upon it), cars were originally horse-drawn carriages, Newton & Leibniz both independently invented calculus, and so on. Each invention and discovery usually takes place in a complex web in space and time of several actors and events, and if the commonly acknowledged inventor/discoverer had not put the final piece in place – such as Einstein did for Special Relativity – someone else would have done so later (e.g. Poincare or Lorentz).
In the case of computers, some of the candidates are
- Charles Babbage & Ada Lovelace (1842)
- Alan Turing (1936, 1946)
- Konrad Zuse (1941)
- John von Neumann (1945)
- John Mauchly and J. Presper Eckert (1946)
We could debate endlessly about who “invented computers”, but I think it’s more productive to first ask, what is it about modern computers that makes them so powerful and ubiquitous? Their speed, small size, and low cost are definitely very important, but it’s their flexibility that sets computers apart from all other inventions. They can perform an incredibly wide variety of tasks. They are extremely universal. And it was the Universal Turing Machine that taught us this! The fundamental properties of the digital computer are these:
- an algorithm can be expressed as a program of instructions;
- the instructions can be obeyed in non-sequential order (loops, conditionals);
- programs are stored in the computer as data that can be read by other programs.
The UTM was in effect the first compiler: it scans other TMs and translates them into Turing Machine code which can then be executed. This is the fundamental principle of modern programming, without which your PCs, laptops, smartphones, supercomputers, and embedded computers would be totally dumb. For this reason, we now call programming languages that can truly compute anything, Turing complete.
[welcomewikilite wikiurl=”https://en.wikipedia.org/wiki/Universal_Turing_machine” sections=”Short description” settings=””]
- The Alan Turing Internet Scrapbook: Computable Numbers and the Turing Machine, 1936
- The Alan Turing Internet Scrapbook: Who Invented the Computer?
- A geek with a hat Week 1: Turing’s On computable numbers
- The Annotated Turing
- Computable number, From Wikipedia, the free encyclopedia
- Turing’s Landmark Paper of 1936
- The original paper, PDF
- Turing’s Proof (Wikipedia)
- Proof That Computers Can’t Do Everything (The Halting Problem) (YouTube)
- Alan Turing: Is he really the father of computing?
- What is a Turing Machine? By Jack Copeland