
Analyzing Music Using Neural Network: 4 Essential Steps


As described in an earlier blog article, we at Cyanite focus on analyzing music using artificial intelligence (AI) in the form of neural networks. Neural networks can be used for many music tasks, such as automatically detecting the genre or the mood of a song, but it can be tricky to understand how exactly they work.

With this article, we want to shed light on how neural networks can be deployed for analyzing music. Therefore, we’ll be guiding you through the four essential steps you need to know when it comes to neural networks and AI audio analysis. To see a music neural network in action, check out one of our data stories, for example, an Analysis of German Club Sounds with Cyanite. 

The 4 steps for analyzing music with neural networks include:

1. Collecting data

2. Preprocessing audio data

3. Training the neural network

4. Testing and evaluating the network

Step 1: Collecting data

Let’s say that we want to automatically detect the genre of a song. That is, the computer should correctly predict whether a certain song is, for example, a Pop, Rock, or Metal song. This seems like a simple task for a human being, but it can be a tough one for a computer. This is where deep learning in the form of neural networks comes in handy.

In general, a neural network is an attempt to mimic how the human brain functions. But before the neural network is able to predict the genre of a song, it first needs to learn what a genre is.

Simply put: what makes a Pop song a Pop song? What is the difference between a Pop song and a Metal song? And so on. To accomplish this, the network needs to “see” loads of examples of Pop, Rock, Metal, and other songs, which is why we need a lot of correctly labeled data.

Labeled data means that the actual audio file is annotated with additional information like genre, tempo, mood, etc. In our case, we would be interested in the genre label only.

Although there are many open sources for this additional information like Spotify and LastFM, collecting the right data can sometimes be challenging, especially when it comes to labels like the mood of a song. In these cases, it can be a good but also perhaps costly approach to conduct surveys where people are asked “how they feel” when they are listening to a specific song.

Overall, it is crucial to obtain meaningful data, since the predictions of our neural network can only be as good as the initial data it learned from (this is also why data is so valuable these days). To see all the different types of metadata used in the music industry, see the article An Overview of Data in the Music Industry.

Moreover, it is also important that the collected data is equally distributed, which means that we want approximately the same amount of, for example, Pop, Rock, and Metal songs in our music dataset.
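To make this concrete, here is a minimal Python sketch (the file names and labels are purely hypothetical) showing one way to check whether a labeled dataset is roughly balanced before training:

```python
from collections import Counter

# Hypothetical labeled dataset: pairs of (audio file, genre label).
labeled_songs = [
    ("song_001.wav", "Pop"),
    ("song_002.wav", "Rock"),
    ("song_003.wav", "Metal"),
    ("song_004.wav", "Pop"),
    # ... many more entries in a real dataset
]

# Count how many examples each genre contributes.
genre_counts = Counter(genre for _, genre in labeled_songs)
print(genre_counts)  # e.g. Counter({'Pop': 2, 'Rock': 1, 'Metal': 1})

# A strongly skewed count is a warning sign: the network would see far more
# examples of one genre and learn to favor it over the others.
```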

After collecting a well-labeled and evenly distributed dataset, we can proceed with step 2: pre-processing the audio data.

A screenshot from a data collection music database

Step 2: Pre-processing audio data

There are many ways to deal with audio data in the scope of music neural networks, but one of the most commonly used approaches is to turn the audio data into “images”, so-called spectrograms. This might sound strange and counterintuitive at first, but it will make sense in a bit.

First of all, a spectrogram is a visual representation of audio data; more precisely, it shows how the spectrum of frequencies contained in the audio varies over time. Obtaining the spectrogram of a song is usually the most computationally intensive part of the pre-processing, but it is worth the effort. Spectrograms are essentially data visualizations – you can read about different types of music data visualizations here.

Since great successes have been achieved in the field of computer vision over the last decade using AI and machine learning (face recognition is just one of many notable examples), it seems natural to take advantage of these accomplishments and apply them to our case of AI audio analysis.

That’s why we want to turn our audio data into images. By utilizing computer vision methods, our neural network can “look” at the spectrograms and try to identify patterns there.
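To give an idea of what this conversion can look like in code, here is a short sketch using the open-source librosa library (the file name song.wav and the parameter values are example assumptions, not Cyanite’s actual pipeline):

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Load the audio file (hypothetical path) at a fixed sample rate.
audio, sr = librosa.load("song.wav", sr=22050)

# Compute a mel-scaled spectrogram: how much energy each frequency band
# carries at each moment in time.
mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=128)

# Convert power values to decibels so the "image" has a usable dynamic range.
mel_db = librosa.power_to_db(mel, ref=np.max)

# Display the spectrogram that the neural network will later "look" at.
librosa.display.specshow(mel_db, sr=sr, x_axis="time", y_axis="mel")
plt.colorbar(format="%+2.0f dB")
plt.title("Mel spectrogram")
plt.show()
```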

Spectrograms from left to right: Christina Aguilera, Fleetwood Mac, Pantera

Step 3: Training the neural network

Now that we have converted the songs in our database into spectrograms, it is time for our neural network to actually learn how to tell different genres apart.

Speaking of learning: the process of learning is also called training. In our example, the neural network in music will be trained to perform the specific task of predicting the genre of a song.

To do so, we need to split our dataset into two subsets: a training dataset and a test dataset. The network will only be trained on the training dataset. This separation is crucial for evaluating the network’s performance later on, but more on that in step 4.
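A minimal sketch of such a split, assuming scikit-learn and placeholder arrays standing in for the spectrograms and genre labels from the previous steps:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 1,000 spectrogram "images" (128 mel bands x 256 time frames)
# and a matching genre label for each song (0 = Pop, 1 = Rock, 2 = Metal).
spectrograms = np.random.rand(1000, 128, 256)
labels = np.random.randint(0, 3, size=1000)

# Hold back 20% of the songs as a test set the network never trains on.
# stratify keeps the genre distribution the same in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    spectrograms, labels, test_size=0.2, stratify=labels, random_state=42
)
print(X_train.shape, X_test.shape)  # (800, 128, 256) (200, 128, 256)
```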

So far, we haven’t talked about what our music neural network will actually look like. There are many different neural network architectures available, but for a computer vision task like identifying patterns in spectrograms, so-called convolutional neural networks (CNNs) are most commonly applied.

Now, we will feed a song in the form of a labeled spectrogram into the network, and the network will return a prediction for the genre of this particular song.

At first, our network will be rather bad at predicting the correct genre of a song. For instance, when we feed a Pop song into the network, it might predict Metal. But since we know the correct genre from the label, we can tell the network how it needs to improve.

We will repeat this process over and over again (this is why we needed so much data in the first place) until the network performs well on the given task. This process is called supervised learning because the correct labels give the network a clear target to learn from.

During the training process, the network will learn which parts of the spectrograms are characteristic of each genre we want to predict.

Example of what a CNN architecture can look like
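To make the idea more tangible, here is a deliberately small PyTorch sketch of a CNN and its training loop. It is not Cyanite’s actual model; the input shape, layer sizes, three-genre setup, and random placeholder data are assumptions made purely for illustration:

```python
import torch
import torch.nn as nn

# A small convolutional network that maps a spectrogram "image"
# (1 channel, 128 mel bands, 256 time frames) to one of 3 genres.
class GenreCNN(nn.Module):
    def __init__(self, num_genres=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, num_genres)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(start_dim=1))

model = GenreCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Random placeholder batch: 8 labeled spectrograms (real training would
# iterate over the full training dataset in many batches).
X_batch = torch.rand(8, 1, 128, 256)
y_batch = torch.randint(0, 3, (8,))

for epoch in range(5):
    optimizer.zero_grad()
    predictions = model(X_batch)          # the network's genre guesses
    loss = loss_fn(predictions, y_batch)  # how wrong those guesses are
    loss.backward()                       # compute how to improve
    optimizer.step()                      # nudge the weights accordingly
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```

Repeating this predict-compare-adjust cycle over the whole training dataset is exactly the “over and over again” described above.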

Step 4: Testing and evaluating the network

In the last step, we need to evaluate how well the network performs on real-world data. This is why we split our dataset into a training dataset and a test dataset before training the network.

To get a reasonable evaluation, the network needs to perform the genre classification task on data it has never seen before, which in this case is our test dataset. This is truly an exciting moment, because now we get an idea of how well (or how badly) our network actually performs.
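Continuing the hypothetical PyTorch sketch from step 3, evaluating the network on the held-out test set boils down to comparing its predictions with the true labels:

```python
import torch

# Placeholder test data, shaped like the training data above.
X_test = torch.rand(4, 1, 128, 256)
y_test = torch.randint(0, 3, (4,))

# Switch the trained model to evaluation mode and disable gradient tracking.
model.eval()
with torch.no_grad():
    predicted_genres = model(X_test).argmax(dim=1)

# Accuracy = fraction of test songs whose genre was predicted correctly.
accuracy = (predicted_genres == y_test).float().mean().item()
print(f"Test accuracy: {accuracy:.2%}")
```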

Regarding our example of genre classification, recent research has shown that the accuracy of a CNN architecture (82%) can surpass human accuracy (70%), which is quite impressive. Depending on the specific task, accuracy can be even higher.

But you need to keep in mind: the more subjective the audio analysis scope is (like genre or mood detection), the lower the accuracy will be.

On the plus side: everything we can differentiate with our human ears in music, a machine might distinguish as well. It’s just a matter of the quality of the initial data.

Conclusion

Artificial intelligence, deep learning, and especially neural network architectures can be a great tool to analyze music in any form. Since there are tens of thousands of new songs released every month and music libraries are growing bigger and bigger, music neural networks can be used for automatically labeling songs in your personal music library and finding similar sounding songs. You can see how the library integration is done in detail in the case study on the BPM Supreme music library and this engaging interview video with MySphera. 

Cyanite is designed for these tasks, and you can try it for free by clicking the link below.

I want to integrate AI in my service as well – how can I get started?

Please contact us with any questions about our Cyanite AI via mail@cyanite.ai. You can also directly book a web session with Cyanite co-founder Markus here.

If you want to get the first grip on Cyanite’s technology, you can also register for our free web app to analyze music and try similarity searches without any coding needed.

Case Study: How Mediengruppe RTL / i2i Music decreases searching time for music with Cyanite’s AI


About RTL / i2i Music

 

The Mediengruppe RTL GmbH is one of the largest German media companies. It includes the TV channels RTL, RTL II, VOX, and n-tv, as well as the music publisher i2i Music. The RTL-owned i2i Music is an interface and service provider between producers, editors, and marketing experts on the one hand and composers on the other. They publish commissioned compositions for film, television, and radio and have music produced for the advertising sector. The production music offering of i2i Music is called FAR MUSIC and is aimed at filmmakers, editors, and producers of trailers, advertising, and online content. The platform offers a wide variety of musical styles and provides tracks of all genres for download. The FAR MUSIC catalog includes international labels from Germany, Great Britain, and the USA.

 

Catalogue size of FAR MUSIC: 8,200+ songs

Alarm for Cobra 11 is just one of many series supported by music from i2i Music.

Challenge

In the content production process, RTL’s editors and journalists have access to the company’s own music catalog FAR MUSIC, where the rights are pre-cleared for all internal and external uses. Due to usability issues with the music catalog interface and ineffective search tools, RTL employees often find it easier to use external music sources. This costs the company unnecessary licensing fees and exposes it to copyright infringement risks.

FAR MUSIC being RTL’s own music library

Solution

Cyanite’s automatic tagging and Similarity Search drastically increase the usability of RTL’s music library FAR MUSIC. Cyanite delivers the expected music through intuitive search options, using a vast range of tags as well as input tracks from YouTube, Spotify, and their proprietary music databases. The solution is delivered via Cyanite’s own API.

Cyanite’s API docs

Benefits

+ Projected 86% decrease in searching time.

+ Projected 40% increase of usage of pre-licensed copyrights and 26% decrease in licensing fees.

Lutz Fassbender


Managing Director of i2i Music

Lutz Fassbender is the managing director of i2i Music and responsible for all copyright affairs. He has been part of Mediengruppe RTL for more than 15 years.

“We have so much unused potential in our catalogue that we can now exploit much better with the searching algorithms by Cyanite.”

I want to integrate AI in my service as well – how can I get started?

Please contact us with any questions about our Cyanite AI via mail@cyanite.ai. You can also directly book a web session with Cyanite co-founder Markus here.

If you want to get the first grip on Cyanite’s technology, you can also register for our free web app to analyze music and try similarity searches without any coding needed.

Cyanite Update 2020 🥁 The new Library and updated Similarity Search


Introducing Cyanite’s new features

We have implemented feedback from Cyanite users around the world into our latest version and are more than excited to finally launch it. This version includes AI tagging for your own songs, sonic similarity searches for your own databases, and a brand-new, refined detail view for better communicating and comparing your music.

Library: Manage & tag your music

Drag and drop your music into our new library view and have it tagged in minutes. Automatically analyze your music on various features like mood, genre, bpm, key, voice, energy, and mood dynamics.

Similarity Search: Find similar songs

Find similar songs in your own library in seconds. Our improved Similarity Search lets you search your own database with any reference track from Spotify, and lets you filter the results by mood, genre, voice, and timbre.

Detail view: Deep dive into a song

Understand your music at a glance. Use the data-driven interface to find the best song parts in seconds and communicate your music better in any pitch from Spotify to synch.

I want to try out Cyanite’s AI platform – how can I get started?

If you want to get a first grip on how Cyanite works, you can also register for our free web app to analyze music and try out similarity searches without any coding needed.

Contact us with any questions about our frontend and API services via mail@cyanite.ai. You can also directly book a web session with Cyanite co-founder Markus here.

Case Study: How Filmmusic.io optimizes its search with Cyanite’s AI


About Filmmusic.io

Filmmusic.io is a marketplace from Hannover exclusively for Creative Commons music. It is primarily aimed at amateur musicians and serves media professionals, photographers, producers of independent films, game developers, educational institutions, aid organizations, and other institutions with little or no budget. Amateur filmmakers and YouTubers will also find a wide selection of free music without having to forgo the monetization of their videos. Filmmusic.io pays 60%-70% to the artists.

Catalogue size: 3,500+ songs

 

Usage: 100,000 plays / day

 

Registered users: 100,000+


Kevin MacLeod not only has 232,000 YouTube followers, but is also the biggest music contributor on Filmmusic.io.

Challenge

Clean tagging and the constant improvement of search filters are key to delivering music and making it easy to find on the platform. The steady growth of the catalog makes tagging more and more difficult, while the active Filmmusic.io community keeps requesting new features like bpm or key filters.

A screenshot of Filmmusic.io – a Creative Commons music heaven for content creators.

Solution: Automatic metadata via API integration

Filmmusic.io implemented Cyanite’s music intelligence to automate song tagging especially in the fields of bpm, moods, and key. Next, a Similarity Search will be implemented in a new major update, allowing Filmmusic.io users to search the platform by reference tracks. The technology is seamlessly integrated via the Cyanite API, which means that every new song on Filmmusic.io is automatically tagged and added to the Similarity Search.

 

The new bpm search filter on Filmmusic.io is based on Cyanite’s algorithm.

Results

+ 15% increase in session time

 

+ 35% increase in filter options

 

+ 70% time-saving in the tagging process

 

Sascha Ende


Founder and Developer of Filmmusic.io

Sascha Ende is the creative and technical brain behind Filmmusic.io. He produced music for many years before launching his own platform.
 

“The team and technology from Cyanite help me handle the constant growth of Filmmusic.io and improve the user experience with modern algorithms.”

 

I want to apply AI to my app as well – how can I get started?

Contact us with any questions about our frontend and API services via mail@cyanite.ai. You can also directly book a web session with Cyanite co-founder Markus here.

If you want to get a first grip on how Cyanite works, you can also register for our free web app to analyze music and try out similarity searches without any coding needed.