Smart speakers: Why your voice is a major battle in music

Smart speakers, those voice-activated devices that allow you to check the weather, play podcasts or order food delivery, are becoming more and more common in our homes. In the space of less than five years, brands like Amazon Echo, Google Home, and Sonos One have become household names, while voice assistants like Alexa and Siri are already part of family life.

Since their launch, voice-activated devices have offered the potential to interact with music in ways that satisfy even the most ardent fan, no matter what their mood or taste. When it comes to discovering and listening to music, smart speakers are improving by the month. It’s becoming easier to find the songs we want to hear on smart speakers, and less risky to ask for a playlist suggestion. More and more, requests like “Play me happy music” or “Play the latest from Ed Sheeran” are giving us what we’re looking for.

Using voice to sift through and access music may be a relatively new idea, but it’s brought with it an immense technological challenge that streaming companies, record labels and machine-learning start-ups are all reckoning with. Proper use of this new interface and underlying metadata can mean the difference between sinking and staying afloat on the smart speaker medium. The time has come to lay new groundwork for the future of music consumption.

Playing music is the most popular request for smart speakers (Credit: Nielsen Research)

This year, the entire smart speaker market is predicted to reach a value of $7bn (£5.37bn) according to Deloitte Global, making it the fastest-growing connected device category in the world. In 2017, only 7% of Americans owned smart speakers, a figure that Nielsen says has risen to 24%. By 2022, Juniper Research expects that 55% of US households will own a smart speaker, which equates to around 175 million devices in 70 million homes.

Despite flashier skills and features that can control your thermostat, order a ride or deliver a pizza, the Deloitte research showed that playing music is still the most-used application in most markets, followed by weather forecast searches. Rumours are growing that Amazon may capitalise on this behaviour by launching a free, ad-supported streaming music service in the United States, which it would market through its voice-activated Echo speakers.

As these smart devices have jumped straight to the mainstream, becoming a hit in a few short years, it has highlighted an important challenge for the music industry. And it’s one that wades into the murky, waist-deep waters of algorithms. For record labels and artists, success on smart speakers comes down to how well they can optimise their songs’ metadata.

Happy talk

Before the days of Alexa, Siri and the other voice-activated software, record labels and streaming services were working with a much simpler algorithmic toolbox. Artist names, track titles, genre tags, even beats-per-minute and release dates are among these most basic pieces of music metadata. “These tags have been applied over time in order to make searching easier,” says Lydia Gregory, cofounder of independent machine-learning company FeedForward. “Typically, they’re put in a taxonomy, a hierarchy or a structure.”
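The kind of basic, structured tagging Gregory describes can be pictured as a simple lookup over a catalogue. This is only an illustrative sketch; the track, tags and field names are all invented for the example:

```python
# Hypothetical sketch of basic music metadata tags in a flat catalogue.
# Every track, tag and value here is an invented example, not real data.
catalogue = {
    "track_001": {
        "title": "Sunrise",
        "artist": "Example Band",
        "genre": ["pop", "indie pop"],  # taxonomy: parent genre, then child
        "mood": ["happy"],
        "bpm": 120,
        "release_date": "2018-06-01",
    },
}

def find_by_tag(catalogue, field, value):
    """Return IDs of tracks whose metadata field contains the given tag."""
    return [tid for tid, meta in catalogue.items()
            if value in meta.get(field, [])]

print(find_by_tag(catalogue, "mood", "happy"))  # ['track_001']
```

Richer real-world systems hang these tags off a hierarchy (pop → indie pop) so a search for the parent genre also surfaces its children.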

Now, in a streaming and voice-activated world, the descriptors need to factor in the way music is requested and the platforms on which it is distributed. The more accurate and specific the tagging, the more likely a song is to be circulated in the appropriate playlists and served up to the right listeners. “There are 30,000 songs uploaded to the internet daily,” says Hazel Savage, cofounder of AI-based start-up Musiio. “It is humanly impossible to listen to everything.” Metadata saves us from slogging through hours of music before finding something we enjoy.

Voice-activated search terms often include mood, for instance "Play me happy music" (Credit: Getty Images)

For record labels, cracking the code is essential, as properly optimised metadata can win or lose a spot for their artists’ material in relevant playlists and search queues.

Two of the most important wells to tap into are preconstructed playlists and genres on streaming services like Spotify. “Happy” is a recognised Spotify genre, with many relevant playlists under its umbrella. When a user says “Play happy music on Spotify”, the device will most likely search for the most popular playlist within the “happy” genre: Happy Hits!. If record labels manage to optimise their tags to closely align with songs already on the playlist – or better yet, make it onto the playlist themselves – there is a higher chance of their own song being picked up by the algorithm in future “happy music” searches.
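One way to think about "aligning tags with songs already on the playlist" is as a similarity score between tag sets. The sketch below ranks a label's candidate songs by Jaccard overlap with a playlist's tags; the songs, tags and scoring choice are assumptions for illustration, not how any real service actually ranks:

```python
# Hypothetical sketch: rank songs by how much their tags overlap with
# the tags of an existing "happy" playlist. All data here is invented.
def jaccard(a, b):
    """Similarity between two tag sets: shared tags over total tags."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

playlist_tags = {"happy", "upbeat", "pop", "summer"}

candidates = {
    "Song A": {"happy", "pop", "summer"},
    "Song B": {"sad", "acoustic"},
    "Song C": {"upbeat", "pop", "dance"},
}

ranked = sorted(candidates,
                key=lambda s: jaccard(candidates[s], playlist_tags),
                reverse=True)
print(ranked)  # ['Song A', 'Song C', 'Song B'] - Song A overlaps most
```

The intuition matches the article: the closer a song's tags sit to a playlist's existing tag profile, the better its odds of being picked up.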

Even then, tags may not be enough to surface songs. Listeners using smart speakers have also begun searching for songs by singing a snippet of lyrics, according to Paul Firth, director of Amazon Music UK. As Amazon builds out Alexa’s ability to pick up on lyric-based requests, labels must treat lyrics as another piece of metadata. If they don’t, they might miss out on tip-of-your-tongue requests that quote a lyric rather than a title, like “Play the song that goes ‘Tell me what you want, what you really, really want’”.
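Treating lyrics as searchable metadata can be as simple as normalising both the stored text and the spoken request before matching. A minimal sketch, assuming a tiny invented lyrics index (real systems index far more text and tolerate mishearings):

```python
import re

# Hypothetical sketch: match a spoken lyric snippet against indexed lyrics.
# The index below is invented example data, not a real catalogue.
lyrics_index = {
    "Wannabe": "Tell me what you want, what you really, really want",
    "Shape of You": "I'm in love with the shape of you",
}

def normalise(text):
    """Lowercase and strip punctuation so speech matches stored text."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower())

def find_song_by_lyric(snippet):
    """Return titles whose lyrics contain the normalised snippet."""
    snippet = normalise(snippet)
    return [title for title, lyric in lyrics_index.items()
            if snippet in normalise(lyric)]

print(find_song_by_lyric("tell me what you want, what you really really want"))
```

Normalising both sides is what lets a casually spoken request land on the written lyric despite differences in punctuation and capitalisation.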

Slave to the algorithm

With virtually no guidance from the big tech companies on how to do so, labels say they are sailing with a broken compass. “We’ve not been given any of that information,” says Kara Mukerjee, former head of digital at RCA Records. “It’s been completely an arms race.”

Finding ways to get ahead of the pack is the name of the game for those with the resources to invest in new ideas.

With virtually no guidance from the big tech companies, labels say they are sailing with a broken compass

“They’re all hiring people to kind of understand smart speakers and voice, and issues around it. Sony Music hired a team to reverse engineer algorithms around Alexa,” says Stuart Dredge, author of a 2017 smart speaker report for Music Ally. “Everyone’s leaning in and everyone’s trying to figure it out.”

Some record labels and streaming services have even chosen to outsource the process to independent machine-learning companies. Using highly sophisticated algorithms, companies like Musiio and FeedForward can take some of the guesswork out of the hunt for rich metadata. Their systems automatically predict which tags will be important, giving labels the chance to apply the most relevant and useful data to their material.

But combine these data networks with voice, where requests can be vague and more conversational than a written search, and the true challenge starts to emerge. Phrases as simple as “Alexa, play music” – a startlingly common request, according to Amazon Music UK’s Firth – leave algorithms with wild interpretive freedom. Difficult-to-pronounce or similar-sounding song and artist names risk not being found, so labels need to think about how distinct, pronounceable and memorable their artist and track names are.
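The pronunciation problem is, at bottom, a fuzzy-matching problem: a speech transcription rarely matches a catalogue name character for character. The standard library's `difflib` gives a rough feel for how a system might recover from a mangled artist name; the artist list and similarity cutoff here are illustrative assumptions:

```python
import difflib

# Hypothetical sketch: map an imperfect transcription of a spoken artist
# name onto a known catalogue using simple fuzzy string matching.
artists = ["Ed Sheeran", "Echosmith", "The Chainsmokers"]

def resolve_artist(heard):
    """Return the closest catalogue artist, or None if nothing is close."""
    matches = difflib.get_close_matches(heard, artists, n=1, cutoff=0.6)
    return matches[0] if matches else None

print(resolve_artist("Ed Sheran"))  # → 'Ed Sheeran'
```

Production voice assistants use far richer phonetic models, but the principle is the same: a name that sounds like nothing else in the catalogue is easier to resolve than one with near-homophones.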

Metadata has to include tip-of-your-tongue requests like “Play me the song that goes ‘Tell me what you want, what you really, really want’” (Credit: Getty Images)

Out of the big three smart speaker producers, Google seems to be the leader on everything voice and speech, says FeedForward’s Gregory. It pays to be the smartest kid on the block. Harnessing the full power of metadata means more accurately curated content for smart speaker companies and, ultimately, a better experience for the customer.

For labels and A&R executives, it can also mean discovering new talent ahead of competitors. While not yet a reality for most, the thinking is this: labels that understand how to navigate popular smart speaker requests and metadata tags can begin to measure the success of specific combinations. If “relaxing” music featuring “acoustic guitars” and lyrics about “the beach” seems to be performing well, they can in theory search for up-and-coming artists whose material matches the tags.
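Measuring "the success of specific combinations" amounts to aggregating streams by tag combination and reading off the leaders. A minimal sketch, using an invented play log (the tags and stream counts are examples, not real figures):

```python
from collections import Counter

# Hypothetical sketch: tally streams per tag combination to spot which
# combinations are performing well. The play log is invented data.
play_log = [
    (("relaxing", "acoustic guitar", "beach"), 900),
    (("relaxing", "acoustic guitar", "beach"), 450),
    (("happy", "dance"), 300),
]

streams_by_combo = Counter()
for tags, streams in play_log:
    streams_by_combo[tags] += streams

best_combo, total = streams_by_combo.most_common(1)[0]
print(best_combo, total)  # ('relaxing', 'acoustic guitar', 'beach') 1350
```

Once a label knows which combinations win, the A&R step described above is a reverse lookup: find unsigned artists whose material carries the same tags.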

Two tribes

Advanced algorithms, while certainly a source of power, nonetheless fail to solve the tension simmering between smart speaker companies and record labels. In a world in which major tech companies hold the keys to music discovery, when does the music industry get a seat at the table? Should the artist or Alexa be the one to decide which creative categories a song should be placed into?

“Both sides would argue it’s something they can do really well,” Dredge says. “In an ideal world they’d come together to come up with rich, brilliant metadata around songs.”

Should the artist or Alexa be the one to decide which creative categories a song should be placed into?

Critics of smart speakers have also focused on the listeners themselves. While the industry is excited by the prospect of a new streaming audience, critics say the devices might further widen the gap between big and small musicians. A list of Alexa’s most-requested albums last year showed that people tend to ask for what they already know rather than new artists. The typical Alexa user, Firth says, leans a little more mainstream in their music taste: rather than digging deep for new material, listeners have proved more interested in charting music and what is already very popular.

If, by and large, what people search for are Top 40 artists and made-for-radio singles, indie artists might be left out of the equation. This is cause for concern among the smaller labels, Mukerjee says, as most do not have the resources or bandwidth to invest in metadata research. All they have time to focus on are their artists’ releases.

Ed Sheeran's ÷ was the most-requested album on Amazon's voice-activated speakers last year (Credit: Getty Images)

Without much communication with labels, it is Amazon’s and other tech companies’ philosophies that will ultimately dictate how the relationship develops. Firth believes, however, that Alexa’s role in the music industry shouldn’t be one of too much control or influence; a shared desire to maintain the integrity of music is where Amazon and labels find common ground.

“We should allow great music to find its fanbase, and we should allow fans of music to find the music they love,” he says. “And that’s our role. We should be making that as easy as possible.”

Nonetheless, labels and artists concerned about the future must keep up with new interfaces like smart speakers if they want to stay ahead. It isn’t too great a leap to believe the way to do so is to think about tracks with an eye toward metadata first.

If data becomes a part of the songwriting process — say, an artist creating “happy” songs with “acoustic guitars” about “the beach” in response to a hot metadata trend — their airtime might see a jump. But this is where Firth draws the line.

“I don’t think powerful technology should ever get in the way of the creative process,” he says. “It doesn’t matter how smart our voice technology is, or how beautiful our apps are, or how good our marketing is. We need music.”
