Hackers steal 86 million songs from Spotify

Experts warn that the stolen music will likely be used to train AI systems

An online activist group has stolen tens of millions of songs from Spotify. Image credit: Reet Talreja

An online activist group says it has quietly pulled tens of millions of songs from Spotify.

The group, called Anna’s Archive, claims it scraped 86 million audio files along with 256 million rows of metadata, including artist names and album information. Spotify hosts more than 100 million tracks in total, and the company says the scrape did not include everything on the platform.

Spotify confirmed the activity and said it has already shut down the accounts involved.

“An investigation into unauthorised access identified that a third party scraped public metadata and used illicit tactics to circumvent DRM [digital rights management] to access some of the platform’s audio files,” Spotify said. The company added that it does not believe the music has been released yet.

Anna’s Archive is best known for linking to pirated books and academic papers. In a blog post, the group said it collected the Spotify files to create what it called a “preservation archive” for music. It claimed the files account for 99.6% of the music Spotify users actually listen to, and said it plans to distribute them via torrents.

“Of course Spotify doesn’t have all the music in the world, but it’s a great start,” the group wrote.

Spotify said it has more than 700 million users worldwide and stressed that the scrape involved unlawful behavior. The company said it has added new safeguards and is monitoring for similar activity.

The claim immediately caught the attention of people working in artificial intelligence.

Ed Newton-Rex, a composer and copyright campaigner, said the files would almost certainly be used to train AI systems if they are released.

“Training on pirated material is sadly common in the AI industry, so this stolen music is almost certain to end up training AI models,” he said. “This is why governments must insist AI companies reveal the training data they use.”

Anna’s Archive has previously pointed to LibGen, a massive pirated book library, as inspiration. LibGen was mentioned in US court filings tied to Meta’s AI training practices, including internal warnings that the dataset was known to be pirated. Meta ultimately defeated a copyright lawsuit, though parts of the case are still being challenged.

Some in tech circles are already talking about what a dataset like this could enable. Yoav Zimmerman, a co-founder of the AI startup Third Chair, wrote on LinkedIn that it could allow people to “create their own personal free version of Spotify,” or let companies “train on modern music at scale.”

“The only thing stopping them is copyright law and the deterrent of enforcement,” he wrote.

Spotify said it has not seen evidence that the files are circulating publicly. For now, the music remains in limbo: scraped, claimed, and disputed. Meanwhile, the argument over who controls cultural data keeps expanding beyond books, images, and text, and deeper into sound.

Source: The Guardian