Voice Input
Voice input of commands or text has been an integral part of vehicle control concepts since long before the introduction of Siri, Alexa and their counterparts. The early days of voice input were marked not only by the limited capabilities of voice recognizers but also by the idea that everything in a car should be voice-controlled. However, several fundamental studies from that period showed that it simply does not make sense to control everything by voice. As recognition quality keeps improving and the next generation of users behaves differently, the range of voice control applications may well expand. Nevertheless, several findings from the early studies still apply. Voice control is particularly suitable for one-dimensional, unambiguous information that does not require contextual knowledge of a system's status. Examples include phone numbers, names of contacts to be called, or an address to be entered into the navigation system. Voice control is entirely unsuitable for actions that can be performed with a single touch of a finger or that are time-critical, such as operating the turn signal or braking.
Quick actions that require simple input are less suitable, if suitable at all, particularly when they depend on contextual knowledge or on the system's ability to learn. One example of a command that requires contextual knowledge is "Open the window." How far should the window open? If the driver needs to reach for a ticket at the entry gate of a parking garage, "open" means that the window should open all the way. If asked to "Open the window," a passenger in a car travelling at high speed in the rain would probably open it just a crack, because their contextual knowledge tells them that opening it all the way would be inappropriate in that situation.
Voice control is likely to become increasingly important as a way of keeping drivers' eyes on the road and their hands on the steering wheel. The extent to which this happens depends on the quality of voice recognizers and on future user needs.
Voice Output
In the early days, the use of voice in vehicles focused not on voice control but on voice output. Voice output had two aims: to declutter the driver's visual field and to keep their visual attention from being drawn away from traffic. Voice input played little role because, in the 1980s, voice recognizers were not yet advanced enough for use in cars, which also had higher interior noise levels at the time.
Audi first trialed voice output in the Audi quattro. Because the system was completely new, human factors design principles were not yet sufficiently considered. One message, "Attention! Oil pressure!", warned of low oil pressure. During a test drive with a journalist, what was probably a sensor fault caused the system to repeat "Attention! Oil pressure! Attention! Oil pressure!..." over and over for a long distance. Systematic experiments on voice output provided fundamental insights into how such systems should be used and designed. In one experiment, the field test manager pulled the hand brake slightly without the test participant noticing, which triggered the voice output "Please release the hand brake." Acceptance of this objectively correct and important information was very low. In retrospect, the reason is clear: drivers felt embarrassed in front of their passengers. The message was therefore changed to "Please check the hand brake," which solved the acceptance problem. Polls regularly show that at least 80% of drivers are convinced their driving skills are above average. Voice output that points out a driver's mistakes in the presence of others has no place in such a system.
Navigation systems are the most common application of voice output. Experience has shown, however, that acoustic output alone is not enough and that additional visual display support is helpful in complex situations (such as traffic circles or complex intersections).
The Meaning of “Please” and Repetitions of Voice Output
When a navigation system begins every output with the word "please," this has little to do with politeness. Starting with "please" (e.g., "Please turn right") catches the driver's attention and focuses their hearing on the message that follows, which makes it easier to understand. The word "attention" would be inappropriate for this level of urgency and should therefore be avoided. It must be reserved for particularly urgent alerts, for example warnings of wrong-way drivers.
To avoid annoying the driver, voice output must not be repeated over and over (as in the oil pressure example above). On the other hand, a driver who did not hear a voice output in full or at all must be able to hear it again (e.g., a navigation instruction). An early prototype designed at our institute in the 1980s made it possible to replay certain categories of voice output at the push of a button.
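The two requirements above (no endless repetition, but replay on demand) can be sketched as a small message manager. This is a hypothetical illustration, not a description of the 1980s prototype: the class name VoiceOutputManager, the speak hook, the 60-second cooldown, and the set of replayable categories are all assumptions chosen for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

# Assumed replayable categories; the source only says "certain categories" could be replayed.
REPLAYABLE = {"navigation", "traffic"}


@dataclass
class VoiceOutputManager:
    speak: Callable[[str], None]            # hypothetical text-to-speech hook
    cooldown_s: float = 60.0                # assumed minimum gap before an identical message is spoken again
    clock: Callable[[], float] = time.monotonic
    _last_spoken: dict = field(default_factory=dict, init=False, repr=False)       # text -> time last spoken
    _last_by_category: dict = field(default_factory=dict, init=False, repr=False)  # category -> latest text

    def announce(self, category: str, text: str) -> bool:
        """Speak text unless the identical message was spoken within cooldown_s. Returns True if spoken."""
        now = self.clock()
        # Remember the latest message per category even if it is suppressed, so it can be replayed.
        self._last_by_category[category] = text
        last = self._last_spoken.get(text)
        if last is not None and now - last < self.cooldown_s:
            return False  # suppress the repetition to avoid annoying the driver
        self._last_spoken[text] = now
        self.speak(text)
        return True

    def replay(self, category: str) -> bool:
        """Driver pressed the replay button: repeat the latest message of a replayable category."""
        if category not in REPLAYABLE:
            return False
        text = self._last_by_category.get(category)
        if text is None:
            return False
        self._last_spoken[text] = self.clock()  # the driver has just heard it again
        self.speak(text)
        return True


# Usage with a simulated clock instead of real time.
spoken = []
t = [0.0]
mgr = VoiceOutputManager(speak=spoken.append, cooldown_s=60.0, clock=lambda: t[0])
mgr.announce("warning", "Attention! Oil pressure!")   # spoken
t[0] = 5.0
mgr.announce("warning", "Attention! Oil pressure!")   # suppressed: within the cooldown
mgr.announce("navigation", "Please turn right.")      # spoken
mgr.replay("navigation")                              # driver presses the replay button
print(spoken)  # ['Attention! Oil pressure!', 'Please turn right.', 'Please turn right.']
```

Suppression here is keyed on the exact message text, so a persistent fault produces one announcement per cooldown period rather than a continuous loop. Replay is driver-initiated and restricted to selected categories, reflecting the idea that only some outputs, such as navigation instructions, need to be repeatable.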
Literature