Dehumanising the Turing test

Kybernetes

ISSN: 0368-492X

Article publication date: 4 May 2010

Citation: Pleming, R. (2010), "Dehumanising the Turing test", Kybernetes, Vol. 39 No. 3. https://doi.org/10.1108/k.2010.06739caa.003

Publisher: Emerald Group Publishing Limited

Copyright © 2010, Emerald Group Publishing Limited


Dehumanising the Turing test

Article Type: Letter to the Editor From: Kybernetes, Volume 39, Issue 3

Keywords Cybernetics, Turing test, Imitation game

Dear Mark

I have been asked to expand on the ideas outlined in a letter written to the New Scientist in September 2004, proposing an improvement to the Turing test that would elevate human involvement to that of an objective observer.

In the test bearing his name, devised by Alan Turing in his seminal 1950 paper “Computing machinery and intelligence” to determine whether a machine is able to demonstrate intelligence – addressing, in Turing’s words, the question “Can machines think?” – a human examiner is asked to determine whether one or other of two subjects with whom he is communicating via a teletype is human. The examiner does this by conducting a question-and-answer session, putting a series of questions, possibly identical, to the two subjects and collecting the responses from both. He then assesses these responses against each other and against his expectations of an intelligent human response.

It is the contention of this letter that the direct involvement of a human questioner and a human “standard of intelligence” subject in determining the outcome of the Turing test creates a flawed and questionable set of results, because of the level of subjectivity involved.

Further, this letter asserts that the promotion of the human examiner to the position of observer, rather than a participant, will give rise to:

  • a far more objective set of conclusions, with the possibility of categorising the performance of systems that are built with a goal of showing artificial intelligence;

  • a new test configuration involving systems under assessment communicating directly between themselves; and

  • a worthwhile extension of the range and categorisation of intelligent behaviour recorded.

Problems arising from human participation in the Turing test

The Turing test as originally devised has a human “interrogator” eliciting responses to identical questions from:

  • a target intelligent machine; and

  • another human as a “standard candle of intelligence”.

The fact that human intelligence varies so widely between individuals in its breadth, depth, content and communication capabilities must raise significant concerns about the objectivity of such a test: both as to the questioner’s objectivity in the choice of questions and in the assessment of answers, and as to the employment of a single “standard human intelligence” as the yardstick.

It is a proposal of this letter that the human involvement in the Turing test needs to be elevated to as near to that of an objective observer as possible.

A proposal for the configuration of a new test of intelligence

To eliminate the participation of a human in influencing the content of the information flows in an intelligence test, it is proposed that two identical examples of the (machine) intelligence under scrutiny are configured so as to communicate directly with each other by means of well-established protocols, such as the teletype protocol employed in Turing’s test.

This is a development of the “fly-on-the-wall” variation of the Turing test, in which the interrogator becomes an observer, with the two target systems – one human, one machine – communicating directly with each other. The “fly-on-the-wall” variation still has the human involvement, generating similar concerns about subjectivity as in the classic Turing test.

In reality, the (machine) intelligence under assessment will probably be a computer program constructed with the intention of demonstrating human-like intelligence, or even with the aim of being intelligent.

The nature of this construction of a test for intelligence therefore becomes an observation of the behaviour of a pair of communicating systems, and the reaction of one system to the responses of a second identical system.

Of course, the information flowing between the systems being observed need not be in natural language; it might just as well take the form of, say, images, pictograms, music, the sounds created by whales or dolphins, or any other sophisticated protocol.
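As an illustration only, the coupled configuration described above might be harnessed as follows. This is a minimal sketch in Python: the `EchoWithTopics` class, the `observe_conversation` function and all other names are hypothetical stand-ins, since in reality the subjects would be programs claiming intelligence, and the human’s role reduces to reading the recorded transcript.

```python
import itertools

class EchoWithTopics:
    """Toy stand-in for a system under test. Purely illustrative:
    a real subject would be a program built to demonstrate intelligence."""
    def __init__(self, name):
        self.name = name
        self._topics = itertools.cycle(["weather", "music", "chess"])

    def respond(self, message):
        # A deterministic reply; a genuinely intelligent system
        # would produce something far less predictable.
        return f"{self.name} on {next(self._topics)}: re '{message}'"

def observe_conversation(system_a, system_b, opening, turns=6):
    """Couple two identical systems and record their exchange.
    The human examiner becomes a passive observer of the transcript."""
    transcript = [opening]
    speakers = itertools.cycle([system_a, system_b])
    message = opening
    for speaker in itertools.islice(speakers, turns):
        message = speaker.respond(message)
        transcript.append(message)
    return transcript

transcript = observe_conversation(EchoWithTopics("A"), EchoWithTopics("B"), "hello")
for line in transcript:
    print(line)
```

The observer never injects content; the opening message and all subsequent flows belong to the coupled systems themselves.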

The evolution and categorisation of responses in the symmetric case

The balanced, symmetric nature of the proposed configuration for an improved intelligence test could allow an assessment of the attributes of the system under test against additional intelligence factors.

In particular, the important attribute of spontaneity – the unsolicited commencement of new areas of content by the system under investigation – is one key factor that the proposed configuration of coupled identical systems enables to be observed; the Turing test actually acts to inhibit observation of the spontaneity attribute.

It is also important to assess the evolution of responses as time progresses. This is a commonly overlooked characteristic of intelligence tests that can lead to important conclusions about the nature of the system under test.

Taking a time-evolving viewpoint of this proposed coupled, symmetric configuration of purportedly intelligent systems, it appears possible to categorise the coupled systems’ behaviour. This categorisation is based on the observed nature of the interaction between the two coupled systems over time, which for the purposes of this letter is defined as a conversation (in contrast to the question-and-answer style of the classic Turing test). Analysing the characteristics of possible conversations leads to a proposed categorisation which splits the nature of conversations, with regard to their intelligence, into four basic levels:

Level 0. A conversation between the two coupled systems will commence and then stop, or may never start at all. Systems displaying Level 0 characteristics lack the spontaneity attribute. Whilst they may mimic higher-level intelligence by generating appropriate responses under Turing test conditions, they do not have the capability to initiate or restart conversations.

Level 1. The conversation between the two identical coupled systems will evolve into a loop, with the same topics and conversational elements recurring in a predictable, never-ending sequence, as recorded by the observer. This is the signature of a tightly deterministic system, in which, for example, the cessation of a conversation is always followed by a predictably characterised attempt by one or other system to restart it.

Level 2. The observer will note that the conversation between the two systems meanders around a random and ever-changing set of topics, with no emergence of novel ideas and concepts. Achievement of a Level 2 rating by a system is arguably the equivalent of passing the Turing test, in that the conversations appear equivalent to normal human-human interaction, demonstrating depth and breadth of knowledge, the ability to change topic or level of detail, as well as spontaneity and a degree of unpredictability.

Level 3. The observer will conclude that the recorded conversation includes the creation of demonstrably new and original ideas and concepts. Systems showing Level 3 characteristics will genuinely surprise their human observer with the novelty of the content of their communications, thus disproving Lady Lovelace’s Objection. The attribute of creativity is a fundamental component of what is commonly understood as intelligence.
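The four levels could, in principle, be assigned mechanically from a recorded transcript. The sketch below is hypothetical: `classify_conversation` and the `is_novel` predicate are assumed names, and judging genuine novelty (Level 3) is deliberately delegated to an observer-supplied predicate, since no simple mechanical test for creativity exists.

```python
def classify_conversation(transcript, is_novel=None):
    """Assign one of the four proposed levels to a recorded conversation.

    transcript -- the list of messages observed between the coupled systems
    is_novel   -- hypothetical observer-supplied predicate judging whether
                  a message introduces a demonstrably new idea (Level 3)
    """
    if len(transcript) < 2:
        # Never started, or stopped immediately: no spontaneity (Level 0).
        return 0
    if is_novel and any(is_novel(m) for m in transcript):
        # Demonstrably new and original content: creativity (Level 3).
        return 3
    # Look for a repeating cycle of conversational elements (Level 1).
    n = len(transcript)
    for period in range(1, n // 2 + 1):
        if all(transcript[i] == transcript[i + period] for i in range(n - period)):
            return 1  # deterministic, never-ending loop
    return 2  # meanders without looping: arguably Turing-test-equivalent
```

For example, a transcript that alternates between two fixed messages would be classified at Level 1, while one that wanders without repetition or novelty would rate Level 2.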

It is a proposal of this letter that a human-created system exhibiting behaviour independently classified at Level 3 could be recorded as demonstrating the creation of an artificial intelligence.

In the case of sound being the protocol for communication between systems, a system demonstrating the creation of new and original themes and variations could gain a Level 3 classification.

Further development

This letter has proposed the symmetric configuration of two identical communicating systems that claim intelligence as a fruitful area for investigation. These “homosystemic” configurations could lead to a greater understanding of what is needed to build truly artificially intelligent systems.

It is clear that there may well be valuable conclusions to be drawn from objectively observing the behaviour of two dissimilar systems (“heterosystemic”) that are in communication. Whether or not the different influences at work between the dissimilar systems have a bearing on the objective observation of their intelligence would be an interesting matter for investigation. Indeed, the analysis of independently observed communications between two members of mankind has in the past led to some truly important conclusions about intelligence.

Conclusions

It is hoped that the ideas presented in this letter, developing those devised by Turing, will encourage the adoption of a more objective set of tests for (machine) intelligence.

Indeed, if mankind ever sees any signs of communications of whatever nature occurring between non-human entities, for example, as the result of the search for extra-terrestrial intelligence, then these conversations may be capable of being categorised using the scale referred to earlier. But in these circumstances, the observer of these communications must be aware that a more advanced intelligence than mankind’s may reach a different subjective level of classification against the scale defined above.

Yours sincerely

Robert Pleming
43 Ashburton Road, Alresford, Hampshire, SO24 9HJ
10 May 2009
