Let's make connections

FT Labs is preparing to launch a new game on Google Home. Here is a glimpse of what is to come, and some learnings acquired in the process of developing for voice devices.

VUI, CUI,... OUI?

Voice interactions have been on FT Labs' radar for some time, on the expectation that devices straight out of science fiction films would soon spread into real life. Spreading they are indeed, although defining the new audio space we are meant to be developing for isn't easy. What to call it? Conversational User Interface, Oral User Interface? The currently preferred and most widely used term is VUI: Voice (some would rather say "Vocal") User Interface.

Conversational user journey mapped with Twine

Whatever the terminology, it implies one core factor: users speaking. Developers need to design new journeys and plan conversation flows within the limitations imposed by the current APIs and tools. We have been thinking about how and when our readers might want to use voice interactions: at home, at work, in the morning, on the go... Voice devices create new modes of accessibility. For example, visually impaired users could switch from screen readers to a voice device for a more fluid interaction (e.g. not saying 'Click this button', but 'Read me the article'). But they also bring new constraints: a page can no longer be refreshed at the click of a button, or a file downloaded with a right click. When designing, we have to be mindful of breaking people's screen habits.

"Alexa, tell me about <CompanyName>"

We have explored the possibilities of developing for two major players: Amazon and Google. When we tried out the Echo Dot, development was still limited and involved the very manual task of filling in data (a big problem when we want to cover a million companies). We were also limited by our own source of company data, which was not in a form suitable to be read out. We decided to devote some time to Google Home, which was slightly more flexible with dynamic data.

"OK Google"

FT Labs has a longer-term goal of allowing rich conversations to take place between our audience and voice devices. This game is a step along the path to that capability.

To develop Make Connections, we used API.ai and its Actions on Google integration. The game uses FT Labs' correlations service, based on our previous exploration of the 6 degrees of Angela Merkel.

In the same spirit, players must choose, from three options, which person has been mentioned in the same FT article as the person named in the question. The game only looks at recent articles (published within the previous week), so the questions and answers will change over time.

"Theresa May was mentioned in an article with which one of the following people?"

  1. Martin Gruenberg
  2. Richard Cousins
  3. Mark Carney

Correct answer on 02/10/2017: Mark Carney
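
As an illustration only, a question of this shape could be assembled from a list of articles, each reduced to the people it mentions. This is a minimal Python sketch and not the correlations service itself: the function names, the input format and the way the two wrong options are picked (people who share no article with the person in the question) are all assumptions.

```python
import random
from collections import defaultdict


def co_mentions(articles):
    """Map each person to the set of people who share at least one article with them.

    `articles` is a hypothetical input format: one list of names per article.
    """
    links = defaultdict(set)
    for people in articles:
        for person in people:
            links[person].update(p for p in people if p != person)
    return links


def build_question(focus, links, rng=random):
    """Return (options, correct) for `focus`, or None if no question can be built."""
    connected = links.get(focus, set())
    # Assumption: wrong options are people never mentioned alongside `focus`.
    unconnected = [p for p in links if p != focus and p not in connected]
    if not connected or len(unconnected) < 2:
        return None
    correct = rng.choice(sorted(connected))
    options = rng.sample(sorted(unconnected), 2) + [correct]
    rng.shuffle(options)  # so the correct answer is not always option 3
    return options, correct
```

Calling `build_question("Theresa May", co_mentions(articles))` on a week of articles would return three shuffled names and the one correct answer among them, ready to be read out as numbered options.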

An answer can be given by saying the option number or the person's name. Upon answering, the player is offered more context on the connection: the title of the article where the correlation occurred. If the answer was correct, another question comes up and the score increases until a wrong answer is given, or no more connections are found. Every person chosen, either as the focus for a question or as the correct answer to a question, is removed from the pool of candidates for constructing subsequent questions. The first questions are based on the people who appear in the most articles, so the game gets progressively harder.
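
The progression described above can be sketched roughly as follows. This is a simplified Python illustration under stated assumptions, not the game's actual code: `rank_by_mentions`, `pick_connection`, `play` and the `player_is_right` callback are hypothetical names, and articles are again assumed to be plain lists of names.

```python
from collections import Counter


def rank_by_mentions(articles):
    """People ordered from most to least mentioned (ties broken alphabetically)."""
    counts = Counter(p for people in articles for p in set(people))
    return sorted(counts, key=lambda p: (-counts[p], p))


def pick_connection(focus, articles, used):
    """A person co-mentioned with `focus` who is still in the pool, or None."""
    for people in articles:
        if focus in people:
            for other in people:
                if other != focus and other not in used:
                    return other
    return None


def play(articles, player_is_right):
    """Run one game and return the score.

    `player_is_right(focus, correct)` stands in for asking the question
    and checking the player's spoken answer.
    """
    used, score = set(), 0
    for focus in rank_by_mentions(articles):  # most-mentioned people first
        if focus in used:
            continue
        correct = pick_connection(focus, articles, used)
        used.add(focus)  # the focus leaves the pool
        if correct is None:
            continue  # nobody left to connect this person to
        used.add(correct)  # so does the correct answer
        if not player_is_right(focus, correct):
            break  # a wrong answer ends the game
        score += 1
    return score
```

Because both the focus and the correct answer leave the pool after each question, the well-covered names are used up first and later questions have to draw on people who appear in fewer articles.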

The game has also been developed to work on Google Assistant on mobile devices. On this surface, users will be able to click on a name to give their answer, and access a link to the relevant FT article.

API.ai is a powerful interface where one can easily create intents, upload entities (possible user inputs) and have a basic Action ready within minutes. From a developer's point of view, though, it is a strange mix of GUI and externally hosted service. It would be preferable to be able to generate everything in code, and have an endpoint to tap into the Machine Learning side of the Assistant.

Limitations

From a technological standpoint, it is still early days for these devices. We have already seen improvements in their capabilities since the beginning of our explorations. Time will tell if users feel comfortable using voice commands, or if the devices become scarily accurate, and how this will impact human interactions. At the moment though, it feels like our ambitions exceed the capabilities, or at least meet restrictions. We would have liked the possibility to interrupt the device when it's talking, as you would interject in a regular human conversation. Being able to interject, resume a conversation or digress is an integral part of the illusion.

Google Home sometimes struggles to understand simple words, as well as more complex names, and gives no feedback on what it might have heard. This creates frustrations which, we hope, will disappear as the machine learns. In the meantime, we added numbers alongside the options, so the user can say either the name or the number when giving their answer. Some names simply crashed the app for no apparent reason (looking at you, Michel Barnier), so they had to be removed from the game entirely.
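
To illustrate the "name or number" fallback, here is a small Python sketch of how a transcribed reply might be matched against the three options. In the game this matching would be handled through API.ai intents and entities, so the `match_answer` function and the `NUMBER_WORDS` table below are a hypothetical stand-in, not the real implementation.

```python
# Hypothetical lookup from spoken or typed numbers to an option index.
NUMBER_WORDS = {
    "1": 0, "one": 0, "first": 0,
    "2": 1, "two": 1, "second": 1,
    "3": 2, "three": 2, "third": 2,
}


def match_answer(utterance, options):
    """Return the option the player meant, or None if nothing was recognised."""
    text = " ".join(utterance.lower().split())
    # Prefer a full name if one was heard correctly.
    for option in options:
        if option.lower() in text:
            return option
    # Otherwise fall back to the first number word in the reply.
    for word in text.split():
        index = NUMBER_WORDS.get(word.strip(".,!?"))
        if index is not None and index < len(options):
            return options[index]
    return None
```

With the options from the earlier example, both "Mark Carney" and "number three" would resolve to the same answer, so a player whose pronunciation of a name is not recognised still has a reliable way to reply.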

For developers, the current setup can cause a few frustrations. For example, it is impossible to test with a local URL, as it often hits the timeout limit. It also feels unnatural in the development workflow to have to mix inputting data through the GUI and in the code base. A few things also changed during development: the interface for the simulator changed, the option to choose an invocation was removed, and the way images are displayed in the rich responses on screen changed without warning.

The future

We are keeping an eye out for new development capabilities. Currently, users are under no illusion that they might be conversing with a human, and they learn to wait for the machine to finish talking. As the speech and interaction possibilities grow, however, they will become less forgiving when errors occur. We also keep in mind that we will have to deal with the uncanny valley, as machines learn to predict interactions.

Beyond games, there are other aspects we can explore, such as authentication through voice recognition, where voice devices might become a biometric portal to unlock services. New devices to be used concurrently with voice interfaces are also starting to appear, extending our field of exploration.

In the meantime, get ready for Make Connections, coming soon to your Google Home device!