Wired for Speech: How Voice Activates the Human‐Computer Relationship

Philip Barker (University of Teesside, Middlesbrough, United Kingdom)

The Electronic Library

ISSN: 0264-0473

Article publication date: 1 March 2006


Citation

Barker, P. (2006), "Wired for Speech: How Voice Activates the Human‐Computer Relationship", The Electronic Library, Vol. 24 No. 2. https://doi.org/10.1108/02640470610660459

Publisher: Emerald Group Publishing Limited

Copyright © 2006, Emerald Group Publishing Limited


Effective voice interaction with computers depends critically upon two broad underlying technological requirements. First, the technology must be available to analyse authentic human utterances (as input) and, when necessary, to create synthetic utterances (for output). Second, it must be possible to embed (within a machine) the knowledge needed to understand (and act upon) the information contained in spoken utterances. In this book, the two authors describe a body of research into the use of voice interfaces that has been undertaken over the last ten years at the CHIMe laboratory at Stanford University in the USA. (CHIMe is an acronym for Communication between Humans and Interactive Media.) Naturally, the implications of voice input and output for use within library systems and information retrieval tools are many and varied.

Essentially, this book “describes and synthesises” the authors' research into voice interfaces in terms of how the human brain is activated by voice and how computers can best relate to human beings. There are fifteen chapters in the book, together with an extensive collection of endnotes and references. The (relatively short) opening chapter provides some of the essential background and underlying rationale for the book. The immediately following chapters then deal with gender (male and female voices, stereotyping and social identity) and its importance in designing voice interfaces to toys, cars and computers. Some of the important issues relating to the “personality of voices” are presented in Chapters 4 and 5. The topics covered in this section of the book include extroversion/introversion, similarity attraction, consistency attraction, personality assignment as a result of voice analysis and the use of multiple personalities in voice interfaces.

Obviously, accents, race and ethnicity are likely to have a significant influence within voice interfaces – especially when they are augmented by pictures. These effects are considered in Chapter 6. Here the authors describe an online e‐commerce experiment that was designed to explore some of the behavioural issues associated with these variables. Naturally, human behaviour is strongly related to emotion. The nature of emotion (from a human perspective) is discussed in Chapter 7 – the important issue being how a user's emotional state influences interaction with voice interfaces. Chapter 8 goes on to consider emotion from a voice‐interface perspective – that is, how the manifestation of emotion in voice interfaces affects a user's behaviour. Increasingly, in many situations there is a need to use multiple voices within an aural interface. The rationale for this approach, some of its implications and the results of laboratory experiments to study its effects are discussed in Chapter 9.

Many of the experiments described in this book use various forms of synthetic speech. Chapters 10 through 12 therefore explore various issues relating to the use of synthetic and recorded voices within interfaces. Some of the topics discussed in these chapters include: when to use “I” and “we” in voice interfaces, anthropomorphism (based on the use of facial images to augment speech), the analysis of facial emotion and the use of multiple faces. The problems of mixing synthetic and recorded voices are also explored (via experiments) and discussed in this section. Chapter 13 deals with the topic of “Communication Contexts” – that is, the evolutionary, physical, cultural and social environment that determines what can and cannot be said. One interesting context considered in this chapter is “the feeling of being recorded” (during a voice transaction) and its influence on creativity and disclosure. Unlike textual and graphical user interfaces, voice interfaces do not always understand what a user wants to do, because of misrecognition problems. Some of the issues associated with misrecognition in voice interfaces are described and discussed in Chapter 14. In the final (very short) chapter, the authors briefly summarise their findings and, in so doing, emphasise the social nature of speech communication. Indeed, their vision of the future foresees speech interaction between people and computers leading to the development of highly cooperative problem‐solving environments.

The final part of the book (some 75 pages) is devoted to a very extensive collection of endnotes and references. These are intended to provide readers with extra, more in‐depth material – should they require it. Of course, it is possible to read the book without consulting any of this material. However, I think anyone who is interested in speech communication with computers will find the content of this book and its rich collection of literature references a very valuable asset.
