Rohit Prasad has 17 smart speakers in his home powered by Amazon’s smart assistant.
“I test my own technology – with all of them being called Alexa, I see which one is waking up and whether it is the right device,” says the chief scientist of the AI division responsible for the tech.
That’s a lot of Alexa. But, it seems, still not enough.
In a one-on-one interview with the BBC, Mr Prasad discussed plans for Alexa to both become smarter and to follow users wherever they go. This is known in the trade as ubiquitous ambient computing, and Amazon hopes to corner the market.
In the US, it already sells an Echo system that plays Alexa through a car’s speakers. And Mr Prasad says he also wants the virtual assistant to accompany users as they walk about too.
To achieve this, he explains, the tech needs to get better at contextual reasoning.
“If you are in a store and you say, ‘Where are the tomatoes?’ it will need to have the context,” he says.
“You are actually looking for the aisle so it may need a store map.
“Or if you are in a hotel room and you ask for the swimming pool hours, it should give you the hours for the hotel and not the community pool.”
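The kind of contextual reasoning Mr Prasad describes can be sketched as a simple dispatch on location context: the same spoken query resolves to different answer sources depending on where the user is. All function names and context fields here are hypothetical illustrations, not Amazon's actual API.

```python
# Illustrative sketch: the same query answered differently by context.
# "location_type" and "venue" are made-up context fields for this demo.

def resolve(query: str, context: dict) -> str:
    """Pick an answer source based on the user's current location context."""
    q = query.lower()
    if "swimming pool hours" in q:
        if context.get("location_type") == "hotel":
            return f"pool hours from {context['venue']}'s schedule"
        return "pool hours from the nearest community pool"
    if "where are the tomatoes" in q:
        if context.get("location_type") == "store":
            return f"aisle lookup on {context['venue']}'s store map"
        return "general product search"
    return "fallback web answer"

print(resolve("Where are the tomatoes?",
              {"location_type": "store", "venue": "GreenMart"}))
```

In a real assistant the context would come from sensors, paired devices or the user's itinerary rather than a hand-built dictionary, but the branching logic is the same idea.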
To pursue this goal, the firm recently launched its own Alexa-enabled earbuds and is testing other wearables including glasses and even a ring with select customers.
The more users Alexa attracts and the more time they spend chatting to it, the more data Mr Prasad’s team and the training algorithms involved can draw on to make improvements.
There are, he says, “hundreds of millions” of devices worldwide already taking billions of requests each week from customers. He adds that Alexa now offers 100,000-plus skills – its version of apps – and can communicate with other smart products from more than 9,500 brands.
And it seems Amazon is getting the upper hand.
According to market research firm Canalys, Amazon’s smart speakers outsold Google’s by nearly three-to-one worldwide in the last quarter. Moreover, it said Amazon’s sales were still accelerating while Google’s had slumped.
Google, of course, benefits from the fact that its assistant is baked into Android, meaning handset owners are already using it across their daily lives.
But in the contest to offer the smartest smart assistant, Mr Prasad says Alexa is evolving at a rapid pace.
“All of these components have become smarter by four times in terms of what our error rate was in 2014 to now,” he says.
The figure, he says, is based on accurate handling of four common tasks:
- waking up to the chosen trigger word – Alexa, Amazon, Echo or Computer
- speech recognition, which converts spoken commands into text before they are processed
- language understanding, which involves making sense of context and the various ways words spoken by users can be arranged in sentences
- text-to-speech synthesis, which is used to provide Alexa’s responses in its distinctive female accent
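The four stages above can be sketched as a pipeline, with each stage reduced to a stub standing in for a real model. The function names and toy intent are illustrative only, not how Alexa is actually implemented.

```python
# Hedged sketch of a wake-word -> speech recognition -> understanding ->
# text-to-speech pipeline. Audio is represented as plain strings for brevity.
from typing import Optional

WAKE_WORDS = {"alexa", "amazon", "echo", "computer"}

def wake_word_detected(audio_frame: str) -> bool:
    # Stage 1: wake up only when a chosen trigger word is heard.
    return audio_frame.lower() in WAKE_WORDS

def speech_to_text(audio: str) -> str:
    # Stage 2: speech recognition (stub: real systems run acoustic models).
    return audio

def understand(text: str) -> dict:
    # Stage 3: language understanding: map free-form text to an intent.
    if "weather" in text.lower():
        return {"intent": "GetWeather"}
    return {"intent": "Unknown"}

def synthesise(intent: dict) -> str:
    # Stage 4: text-to-speech, stubbed here as returning the response text.
    if intent["intent"] == "GetWeather":
        return "Here is today's forecast."
    return "Sorry, I didn't catch that."

def handle(trigger: str, utterance: str) -> Optional[str]:
    if not wake_word_detected(trigger):
        return None  # without the wake word, nothing is processed
    return synthesise(understand(speech_to_text(utterance)))

print(handle("Alexa", "what's the weather like"))  # Here is today's forecast.
```

The quoted error-rate improvement would be measured separately at each of these stages, which is why Mr Prasad frames the gains in terms of "all of these components".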
Amazon is about to let users buy alternative celebrity voices for Alexa, starting with that of the actor Samuel L Jackson.
But will it ever give users the choice of having a default voice that isn’t female? In other words, letting owners chat to Alex rather than Alexa?
Gender-neutral AI is, Mr Prasad says, a “very hot topic”.
“This is a debate we have every few months. It is not just about gender, but about Alexa’s voice and choice of words,” he adds.
“We wanted a personality that was very relatable to our customers.”
“If we felt that there has to be another gender for Alexa, well we’d also have to think what would the wake word be, because its personality and gender all sort of go together and you have to think about overall personality, not just gender.”
Despite all the work that’s been done on Alexa, tales of “fails” persist.
Amazon had to act last year to curb a spontaneous, creepy laugh that users complained about, while accounts persist of the assistant struggling with some accents.
According to Martin Garner, an analyst at research firm CCS Insight, Alexa’s capabilities may have “moved on a lot over the last two years… but smart assistant voice services have the effect of raising people’s expectations very rapidly too”.
“All providers are racing as fast as possible to extend the range of questions they can answer,” he added.
Users also need to feel they can trust Amazon if they are to surround themselves with its microphones. And that confidence was recently challenged after revelations that the firm was using third-party contractors working from home to listen back and label recordings.
Amazon has since made it easier to opt out of the process, but Mr Prasad says it has no plans to drop human-based checks.
“Supervised learning is a key aspect, where humans label a very small fraction – actually less than 1% – of the data that goes through Alexa,” he said.
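The sampling step that figure implies can be sketched as drawing well under 1% of incoming utterances for human annotation, which then feeds supervised training. The rate, seed and record shape below are illustrative assumptions, not Amazon's actual process.

```python
# Illustrative sketch: randomly select a sub-1% sample of utterances
# for human labelling. REVIEW_RATE is a made-up figure for this demo.
import random

REVIEW_RATE = 0.001  # 0.1%, comfortably "less than 1%"

def select_for_review(utterances, rate=REVIEW_RATE, seed=42):
    rng = random.Random(seed)  # seeded so this demo is reproducible
    return [u for u in utterances if rng.random() < rate]

stream = [f"utterance-{i}" for i in range(100_000)]
sample = select_for_review(stream)
print(len(sample), "of", len(stream), "sent for human labelling")
```

In practice such sampling is usually stratified towards low-confidence or novel requests rather than purely uniform, since those are the examples most useful to label.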
“We are making it much easier to delete, but also bringing the convenience of voice. You can say, ‘Alexa, tell me what you heard,’ and if you’re uncomfortable with what it was, you can delete it.”
But some have questioned this “easy option”, pointing out that in order to opt out, users have to locate a setting buried several menus deep in the Alexa app and on Amazon’s website.
Even if many consumers make an active choice to have Alexa in their lives, others may be unhappy at having their voices picked up.
Last month, Google’s devices chief Rick Osterloh told the BBC that he informs guests to his home that smart speakers are in use before they enter.
But Mr Prasad says he does not believe there’s a need for such etiquette to become widespread.
“It’s very important to realise that the devices don’t listen for anything but Alexa, the wake word,” he says.
“We have to be clear about that in terms of what detection means.
“You can press the mute button on the device. And Alexa is quite transparent when it is streaming to the cloud because the blue lights come on.”
Even so, as Alexa is added to more devices and becomes ever more pervasive, there’s a question about how obvious this will always be.
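The on-device behaviour Mr Prasad describes can be sketched as a wake-word gate: audio stays in a short local buffer, and streaming to the cloud, with the indicator light on, only begins once the wake word is spotted. The class and field names are hypothetical, and real devices do this in firmware on raw audio rather than on strings.

```python
# Illustrative sketch of a local wake-word gate with a streaming indicator.
from collections import deque

class WakeWordGate:
    def __init__(self, wake_word: str = "alexa", buffer_frames: int = 16):
        self.wake_word = wake_word
        self.buffer = deque(maxlen=buffer_frames)  # audio never leaves here
        self.streaming = False   # True only while sending to the cloud
        self.light_on = False    # mirrors the blue indicator light

    def on_frame(self, frame: str) -> None:
        self.buffer.append(frame)  # pre-wake audio is overwritten locally
        if not self.streaming and self.wake_word in frame.lower():
            self.streaming = True
            self.light_on = True  # signal clearly that streaming has begun

    def end_of_request(self) -> None:
        self.streaming = False
        self.light_on = False

gate = WakeWordGate()
gate.on_frame("background chatter")      # stays local, light stays off
gate.on_frame("Alexa, play some jazz")   # wake word: streaming begins
print(gate.streaming, gate.light_on)     # True True
```

The question the article closes on is precisely whether that indicator stays noticeable as Alexa spreads into earbuds, glasses and other form factors without an obvious light.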