
How Do AI Music Recommendation Systems Work

Music recommendation systems can significantly improve the listening and search experience of a music library or music application. Algorithmic recommender systems have become indispensable as access to digital content has grown: with catalogs of tens of millions of songs, there is simply too much music for users to navigate on their own. Because the need for satisfying music recommendations is so high, the field of music recommendation systems (MRS) is developing at lightning speed. On top of that, the popularity of streaming services such as Spotify and Pandora shows that people like to be guided in their music choices and to discover new tracks with the help of algorithms.

Hitting the musical spot for their users is the goal of every music service. However, there are many approaches and philosophies to music recommendation, each with very different implications.

In this article, we unravel the specifics of music recommendation systems. We look into the different approaches to music recommendation and explain how they work. We also discuss approaches to Music Information Retrieval (MIR), the field concerned with automatically extracting data from music.

If you want to upgrade a music library or build a music application, keep reading to find out which recommendation system works best for your needs.

Approaches to Music Recommendation

We focus on three approaches to music recommender systems: Collaborative Filtering, Content-based Filtering, and Contextual Approach.

1. Collaborative Filtering

The collaborative filtering approach predicts what users might like based on their similarity to other users. To determine similar users, the algorithm collects historical user activity such as a user’s rating of a track, likes, or how long the user listened to a track.

This approach reproduces the word-of-mouth recommendations of the days when music was passed around within a tight circle of friends with similar interests. Because only user information is relevant, collaborative filtering doesn’t take into account any information about the music or sound itself. Instead, it analyzes user preferences and behavior and, by matching one user to another, predicts how likely a user is to like a song. For example, if User A and User B liked the same song in the past, it is likely that their preferences match. In the future, User A might get recommendations for songs User B is listening to, based on the similarity established earlier.
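
To make the idea concrete, here is a minimal, hypothetical sketch of user-based collaborative filtering in Python. The ratings, users, and songs are invented for illustration; production systems work on much larger, sparser matrices and use more sophisticated models.

```python
import numpy as np

# Toy user-item matrix: rows are users, columns are songs,
# values are explicit ratings (0 = not rated). Purely illustrative data.
ratings = np.array([
    [5, 4, 0, 1],   # User A
    [4, 5, 3, 2],   # User B
    [1, 0, 5, 4],   # User C
], dtype=float)

def cosine_similarity(a, b):
    """Cosine similarity between two rating vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# How similar is User A to the other users?
target = 0
sims = [cosine_similarity(ratings[target], ratings[u]) for u in range(len(ratings))]

# Predict User A's interest in song 2 (which they haven't rated) as a
# similarity-weighted average of the other users' ratings for that song.
song = 2
neighbors = [(u, s) for u, s in enumerate(sims) if u != target and ratings[u, song] > 0]
prediction = sum(s * ratings[u, song] for u, s in neighbors) / sum(s for _, s in neighbors)
print(f"Predicted rating of User A for song {song}: {prediction:.2f}")
```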

The most prominent problem of the collaborative filtering approach is the cold start: when the system doesn’t have enough information at the beginning, it won’t provide accurate recommendations. This applies to new users, whose listening behavior has not been tracked yet, and to new songs and artists, which the system cannot recommend reliably until users have interacted with them.

In collaborative filtering, several approaches are used such as user-based and item-based filtering, and explicit and implicit ratings. 

 


Collaborative Filtering Approaches

It is common to divide collaborative filtering into two types – user-based and item-based filtering: 

  • User-based filtering establishes the similarity between users. User A is similar to User B so they might like the same music. 
  • Item-based filtering establishes the similarity between items based on how users interacted with the items. Item A can be considered similar to Item B because they were both rated 5 out of 10 by users. 

Another differentiation that is used in collaborative filtering is explicit vs implicit ratings: 

  • Explicit ratings are direct feedback users give on items, such as likes or shares. However, not all items get a rating, and sometimes users interact with an item without rating it. In that case, implicit ratings can be used. 
  • Implicit ratings are inferred from user activity. When a user didn’t rate an item but listened to it 20 times, it is assumed that the user likes the song (a minimal sketch of this idea follows below).
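
As a hedged illustration of the item-based and implicit-rating variants, the sketch below turns hypothetical play counts into a preference signal and compares songs by how similarly users listen to them. All numbers are made up.

```python
import numpy as np

# Implicit feedback: play counts per user and song (hypothetical numbers).
# There are no explicit ratings here; repeated listens count as a positive signal.
play_counts = np.array([
    [20,  0,  3],   # User A
    [15,  1,  0],   # User B
    [ 0, 12, 18],   # User C
], dtype=float)

# Dampen the counts: 20 plays should not count as "20 times better" than one play.
preferences = np.log1p(play_counts)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Item-based similarity: compare the columns (songs) instead of the rows (users).
n_items = preferences.shape[1]
item_sim = np.array([[cosine(preferences[:, i], preferences[:, j])
                      for j in range(n_items)] for i in range(n_items)])
print(np.round(item_sim, 2))
# Songs 0 and 1 end up with low similarity: the users who play one rarely play
# the other, so the system would not recommend them together.
```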

2. Content-based Filtering

Content-based filtering uses metadata attached to the items, such as descriptions or keywords (tags), as the basis of the recommendation. Metadata characterizes and describes the item. When a user likes an item, the system infers that they are likely to like other items whose metadata is similar to the item they already liked.
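
A minimal, hypothetical sketch of this idea: each track carries a set of descriptive tags, and recommendations are ranked by how much the tag sets overlap. The catalog, tags, and Jaccard measure are illustrative choices, not a description of any particular product.

```python
# Each track is described by a set of descriptive metadata tags (mood, genre, energy).
# Track names and tags are invented for the example.
catalog = {
    "track_a": {"pop", "uplifting", "high-energy"},
    "track_b": {"pop", "uplifting", "medium-energy"},
    "track_c": {"metal", "aggressive", "high-energy"},
}

def jaccard(tags_1, tags_2):
    """Share of tags two tracks have in common (0.0 to 1.0)."""
    return len(tags_1 & tags_2) / len(tags_1 | tags_2)

liked = "track_a"
scores = {track: jaccard(catalog[liked], tags)
          for track, tags in catalog.items() if track != liked}

# Recommend the tracks whose metadata overlaps most with the liked track.
for track, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(track, round(score, 2))
```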

Three common ways to assign metadata to content items are the qualitative, the quantitative (crowdsourced), and the automated approach.

Firstly, in the qualitative approach, library editors professionally characterize the content.

Secondly, in the quantitative or crowdsourced approach, a community of people assigns metadata to content manually. The more people participate, the more accurate and less subjectively biased the metadata gets.

And thirdly, in the automated approach, algorithmic systems characterize the content automatically.

Metadata

Musical metadata is adjacent information to the audio file. It can be objectively factual or descriptive (based on subjective perception). In the music industry, the latter is also often referred to as creative metadata. 

For example, artist, album, and year of publication are factual metadata. Descriptive metadata describes the actual content of a musical piece, e.g. the mood, energy, and genre. Understanding the types of metadata and organizing the taxonomy of the library in a consistent way is very important, as the content-based recommender uses this metadata to pick the music. If the metadata is wrong, the recommender might pull the wrong track. You can read more about how to properly structure a music catalog in our free taxonomy paper. For professional musicians distributing their music, this guide on editing music metadata can be helpful.

 


Content-based recommender systems can use both factual and descriptive metadata or focus on one type of data only. Much attention is paid to content-based recommendation systems, as they allow for an objective evaluation of music and can increase access to “long-tail” music. They can enhance the search experience and inspire many new ways of discovering and interacting with music.

The field concerned with extracting descriptive metadata from music is called Music Information Retrieval (MIR). More on that later in the article.

3. Context-aware Recommendation Approach

Context-aware recommendation has become popular recently and is still a relatively new and developing field. Context includes the user’s situation, activity, and circumstances that content-based recommendation and collaborative filtering systems don’t take into account but that might influence music choice. Recent research by the Technical University of Berlin shows that 86% of music choices are influenced by the listener’s context.

Context can be divided into environment-related and user-related context.

  • Environment-related Context

In the past, recommender systems have been developed that establish a link between the user’s geographical location and music. For example, when visiting Venice you could listen to a Vivaldi concert. When walking the streets of New York, you could blast Billy Joel’s New York State Of Mind. Emotion-indicating tags and knowledge about musicians were used to recommend music fitting a geographical place.

  • User-related Context

You could be walking or running depending on the time of the day and your own plans, you could also be sad or happy depending on what happened in your personal life – these circumstances represent the user-related context. Being alone vs being in a social company may also significantly influence music choice. For example, when working out you might want to listen to more energetic music than your usual listening habits and musical preferences would suggest. Good music recommendation systems would take this into account.
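
As a simplified, purely illustrative sketch of how such context signals might be applied, the snippet below re-ranks candidate tracks (coming, for example, from collaborative or content-based filtering) by boosting tracks whose energy matches the listener's current activity. The context keys and boost values are invented assumptions, not any real system's logic.

```python
# Candidate tracks with a base relevance score from an upstream recommender.
# All attributes and numbers are hypothetical.
candidates = [
    {"title": "track_a", "energy": 0.9, "base_score": 0.60},
    {"title": "track_b", "energy": 0.3, "base_score": 0.75},
]

def contextual_score(track, context):
    score = track["base_score"]
    if context.get("activity") == "workout":
        # Favor energetic tracks during a workout, even if the user's
        # overall taste profile (base_score) points elsewhere.
        score += 0.3 * track["energy"]
    if context.get("time_of_day") == "late_night":
        score -= 0.2 * track["energy"]   # wind down late at night
    return score

context = {"activity": "workout"}
ranked = sorted(candidates, key=lambda t: contextual_score(t, context), reverse=True)
print([t["title"] for t in ranked])   # track_a overtakes track_b for a workout
```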

Music Information Retrieval

The research field concerned with the automated extraction of creative metadata from audio is called Music Information Retrieval (MIR). MIR is an interdisciplinary field combining digital signal processing, machine learning, and artificial intelligence with musicology. Within music analysis, its scope is broad, ranging from BPM or key detection from audio and higher-level analysis such as automatic genre or mood classification to state-of-the-art approaches like automatic full-text song captioning. It also covers research on the similarity of musical audio pieces and, in line with that, search algorithms for music and automatic music generation.
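
For a feel of what low-level MIR tasks look like in practice, here is a small sketch using the open-source librosa library (chosen for illustration; it is not necessarily what any particular product uses). It estimates the tempo of a local audio file and prints a very rough key hint; "song.wav" is a placeholder path.

```python
import librosa
import numpy as np

# Load a local audio file ("song.wav" is a placeholder path).
y, sr = librosa.load("song.wav")

# Tempo estimation from the beat structure of the signal.
tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
print("Estimated tempo (BPM):", np.round(tempo, 1))

# Very rough key hint: the pitch class with the most chroma energy.
# Proper key detection compares the chroma profile against major/minor templates.
chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
pitch_classes = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
print("Dominant pitch class:", pitch_classes[int(np.argmax(chroma.mean(axis=1)))])
```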

At Cyanite, we are using a combination of Music Information Retrieval methods. For example, various artificial neural network architectures are used to predict the genre, mood, and other features of the song based on the existing dataset and subsequent network training. More on that in this article on how to analyze music with neural networks. Our Similarity Search takes a reference track and gives you a list of songs that match by pulling audio, metadata, and other relevant information from audio files. The overall character of the library can be determined and managed using Similarity Search. 

Our Cyanite AI has found its best applications in music libraries targeted at music professionals, DJs, artists, and brands, which represent the business segment of the industry.

Cyanite Similarity Search

Conclusion

The choice of a music recommendation approach is highly dependent on your personal needs and the data you have available. An overarching trend is a hybrid approach that combines features of collaborative filtering, content-based filtering, and context-aware recommendations. However, all fields are in a state of constant development and innovations make each approach unique. What works for one music library might not be applicable to another.

The common challenges of the field are access to large enough data sets and understanding how different musical factors influence people’s perception of music. More on that soon on the Cyanite blog! 

I want to integrate AI in my service as well – how can I get started?

Please contact us with any questions about our Cyanite AI via sales@cyanite.ai. You can also directly book a web session with Cyanite co-founder Markus here.

If you want to get the first grip on Cyanite’s technology, you can also register for our free web app to analyze music and try similarity searches without any coding needed.

If you are interested in going deeper into Music Recommender Systems we highly recommend the following reads: 

Current challenges and visions in music recommender systems research

Music Recommender Systems

Deep Learning in Music Recommendation Systems

How to Create Custom Audiences for Pre-Release Music Campaigns in Facebook, Instagram, and Google

As a music label, you know how hard it is to promote a new artist and cut through the noise. Most music advertising agencies and labels choose to do Facebook and Instagram marketing as an easy way to start. There are some important things you should know about setting up such campaigns but there are already great guides and tips on that here and here. What we want to cover in this article are the steps you can take to identify the audience for a completely new track.

An interest-based audience is a tool that allows you to select customers based on their interests. You can target people who are fans of other artists or people who browse websites similar to your artist’s webpage. The interest-based audiences feature significantly narrows your audience to the most relevant group, thus increasing your chances of reaching the right people.

If you are a big music label you can also use custom audiences on Facebook. You might have thought that custom audiences are only applicable to music that already gained its following, but there is a workaround. All you need to do is find similar artists who fit the roster of your label and then launch a campaign based on the same audience but for the new artist. Thus you make the most out of your advertising efforts. 

With Spotify, it’s easy to find similar artists and their respective fan communities, but when the new song is not yet released or you are breaking a new artist, Spotify algorithms won’t work. So how do you identify similar artists for a track that is not released yet? You can use Cyanite’s Similarity Search to solve that problem. The Cyanite Similarity Search compares the sound of the song you want to promote with hundreds of thousands of other tracks and finds the ones that sound similar.

At this point, kudos to Maximilian Pooschke of Virgin Music Label & Artist Service  for bringing this use case of Cyanite’s similarity search to our team’s attention.

“Cyanite’s Similarity Search is an intuitive tool when I’m creating custom audiences for new artists on social media. Especially for music that doesn’t easily fit into a box, the similarity search is a great entry point for campaign planning.”

Maximilian Pooschke

Virgin Music Label & Artist Service

Now, here is a step-by-step guide on how to create custom audiences using similar artists identified by Cyanite.

Step 1. Upload music to the library view and let Cyanite analyze it

Before you start, you need to register for free here: https://app.cyanite.ai/register. Then drag and drop your music into the library view.

Picture 1. Cyanite library view

The library view will show some data about the song such as mood, genre, energy level, emotional profile, and more. You can explore this data or move to the next step.

Step 2. Find similar songs using Similarity Search

Click on Similarity next to the analyzed song in the library to start finding similar songs from our showcase database of around 600k popular songs. Our similarity algorithms work to offer you the most relevant and precise results and focus purely on the actual sound and feel of a song. Additionally, you will see all the same analysis data available for all the songs including Moods, Energy Level, and Emotional Profile. 

Step 3. Play around with the different filters for more granular insights

Often the magic occurs when you apply different filters. Use the custom interval, play around with tempo, genre, and key, and dive deeper into different results. Then pick the artists and tracks you find most relevant from the Cyanite suggestions.

Step 4. Enrich your findings with additional data from sources like Chartmetric

To get more details on the similar songs and artists you discovered, use other data sources to narrow down your selection and be as precise as possible. You can check out festivals, radio stations, and/or magazines to enrich your search and select additional source audiences for your campaign.

Picture 2. Cyanite Similarity Search based on custom interval

Step 5. Go and select your audiences

Off to Facebook or Instagram to create your audiences with the popular artists you have found and selected with the Similarity Search. Use interest-based targeting and enter a similar artist’s name as a keyword. Play around with keywords for maximum results. You can use artists’ names, song names, genres, or other keywords. A good comprehensive resource on how to use and manage Facebook, Instagram, and Google ads is AdEspresso. 

Picture 3. Facebook Ads Detailed Targeting

This is just one of many ways to use Cyanite for your purposes. You can check out this article to find out more on how to use Cyanite for playlist pitching or this one to find out how to use Cyanite to find music for your videos.

Analyzing Music Using Neural Network: 4 Essential Steps

As written in an earlier blog article, we at Cyanite focus on the analysis of music using artificial intelligence (AI) in the form of neural networks. Neural networks can be used for many music tasks, like automatically detecting the genre or the mood of a song, but it can also be tricky to understand exactly how they work.

With this article, we want to shed light on how neural networks can be deployed for analyzing music. Therefore, we’ll be guiding you through the four essential steps you need to know when it comes to neural networks and AI audio analysis. To see a music neural network in action, check out one of our data stories, for example, an Analysis of German Club Sounds with Cyanite. 

The 4 steps for analyzing music with neural networks include:

1. Collecting data

2. Preprocessing audio data

3. Training the neural network

4. Testing and evaluating the network

Step 1: Collecting data

Let’s say that we want to automatically detect the genre of a song. That is, the computer should correctly predict whether a certain song is, for example, a Pop, Rock, or Metal song. This seems like a simple task for a human being, but it can be a tough one for a computer. This is where deep learning in the form of neural networks comes in handy.

In general, a neural network is an attempt to mimic how the human brain functions. But before the neural network is able to predict the genre of a song, it first needs to learn what a genre is.

Simply put: what makes a Pop song a Pop song? What is the difference between a Pop song and a Metal song? And so on. To accomplish this, the network needs to “see” loads of examples of Pop, Rock, Metal, and other songs, which is why we need a lot of correctly labeled data.

Labeled data means that the actual audio file is annotated with additional information like genre, tempo, mood, etc. In our case, we would be interested in the genre label only.

Although there are many open sources for this additional information like Spotify and LastFM, collecting the right data can sometimes be challenging, especially when it comes to labels like the mood of a song. In these cases, it can be a good but also perhaps costly approach to conduct surveys where people are asked “how they feel” when they are listening to a specific song.

Overall, it is crucial to obtain meaningful data, since the predictions of our neural network can only be as good as the initial data it learned from (this is also why data is so valuable these days). To see all the different types of metadata used in the music industry, see the article an Overview of Data in the Music Industry.

Moreover, it is also important that the collected data is equally distributed, which means that we want approximately the same amount of, for example, Pop, Rock, and Metal songs in our music dataset.
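
As a quick, hypothetical illustration, a simple count over the genre labels reveals whether the dataset is balanced before any training begins (the dataset entries below are made up):

```python
from collections import Counter

# A tiny, invented metadata list standing in for a real labeled dataset.
dataset = [
    {"title": "song_1", "genre": "Pop"},
    {"title": "song_2", "genre": "Rock"},
    {"title": "song_3", "genre": "Metal"},
    {"title": "song_4", "genre": "Pop"},
]

genre_counts = Counter(entry["genre"] for entry in dataset)
print(genre_counts)   # a heavily skewed count signals that the dataset needs rebalancing
```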

After collecting a well-labeled and equally distributed dataset, we can proceed with step 2: pre-processing the audio data.

A screenshot from a data collection music database

Step 2: Pre-processing audio data

There are many ways to deal with audio data in the scope of music neural networks, but one of the most common approaches is to turn the audio data into “images”, so-called spectrograms. This might sound strange and counterintuitive at first, but it will make sense in a bit.

First of all, a spectrogram is a visual representation of audio data; more precisely, it shows how the frequency spectrum of the audio varies over time. Obtaining the spectrogram of a song is usually the most computationally intensive step, but it is worth the effort. Spectrograms are essentially data visualizations – you can read about different types of music data visualizations here.

Since great successes were achieved in the fields of computer vision over the last decade using AI and machine learning (face recognition is just one of the many notable examples), it seems natural to take advantage of the accomplishments in computer vision and apply them to our case of AI audio analysis.

That’s why we want to turn our audio data into images. By utilizing computer vision methods, our neural network can “look” at the spectrograms and try to identify patterns there.
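
A minimal sketch of this pre-processing step, assuming the librosa library and a placeholder file path: the audio is converted into a log-scaled mel spectrogram, the kind of 2D array a CNN can treat like a grayscale image.

```python
import librosa
import numpy as np

# Load the first 30 seconds of a song ("song.wav" is a placeholder path).
y, sr = librosa.load("song.wav", duration=30.0)

# Mel spectrogram: frequency content over time on a perceptually motivated scale.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
log_mel = librosa.power_to_db(mel, ref=np.max)   # compress the dynamic range

# Shape: (mel bands, time frames), effectively a single-channel image.
print(log_mel.shape)
```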

Spectrograms from left to right: Christina Aguilera, Fleetwood Mac, Pantera

Step 3: Training the neural network

Now that we have converted the songs in our database into spectrograms, it is time for our neural network to actually learn how to tell different genres apart.

Speaking of learning: the process of learning is also called training. In our example, the neural network in music will be trained to perform the specific task of predicting the genre of a song.

To do so, we need to split our dataset into two subsets: a training dataset and a test dataset. This means that the network will be only trained on the training dataset. This separation is crucial for the evaluation of the network’s performance later on, but more on that in step 4.

So far, we haven’t talked about what our music neural network will actually look like. There are many different neural network architectures available, but for a computer vision task like identifying patterns in spectrograms, so-called convolutional neural networks (CNNs) are most commonly applied.

Now, we will feed a song in the form of a labeled spectrogram into the network, and the network will return a prediction for the genre of this particular song.

At first, our network will be rather bad at predicting the correct genre of a song. For instance, when we feed a Pop song into the network, the network might predict Metal. But since we know the correct genre from the label, we can tell the network how it needs to improve.

We will repeat this process over and over again (this is why we needed so much data in the first place) until the network performs well on the given task. This process is called supervised learning, because there is a clear goal that the network needs to learn.

During the training process, the network will learn which parts of the spectrograms are characteristic of each genre we want to predict.
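
As a simplified sketch (not Cyanite's actual model), here is what a small CNN for three-genre classification of log-mel spectrograms could look like in PyTorch, together with a single illustrative training step on random stand-in data. All shapes and hyperparameters are assumptions for the example.

```python
import torch
import torch.nn as nn

class GenreCNN(nn.Module):
    """A tiny CNN that maps a log-mel spectrogram to one of three genres."""
    def __init__(self, n_genres=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_genres)
        )

    def forward(self, x):              # x: (batch, 1, mel_bands, time_frames)
        return self.classifier(self.features(x))

model = GenreCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on a random "spectrogram" batch.
spectrograms = torch.randn(8, 1, 128, 1292)   # 8 songs, 128 mel bands
labels = torch.randint(0, 3, (8,))            # 0 = Pop, 1 = Rock, 2 = Metal
loss = loss_fn(model(spectrograms), labels)
loss.backward()
optimizer.step()
optimizer.zero_grad()
print("training loss:", loss.item())
```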

Example of what a CNN architecture can look like

Step 4: Testing and evaluating the network

In the last step, we need to evaluate how well the network performs on real-world data. This is why we split our dataset into a training dataset and a test dataset before training the network.

To get a reasonable evaluation, the network needs to perform the genre classification task on data it has never seen before, which in this case is our test dataset. This is truly an exciting moment, because now we get an idea of how well (or how badly) our network actually performs.
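
Continuing the hypothetical PyTorch sketch from step 3, evaluation boils down to counting how often the network's top prediction matches the label on the held-out test split (test_loader stands for a DataLoader over that split):

```python
import torch

@torch.no_grad()
def test_accuracy(model, test_loader):
    """Fraction of correctly classified spectrograms in the test split."""
    model.eval()
    correct, total = 0, 0
    for spectrograms, labels in test_loader:
        predictions = model(spectrograms).argmax(dim=1)
        correct += (predictions == labels).sum().item()
        total += labels.size(0)
    return correct / total

# accuracy = test_accuracy(model, test_loader)
# print(f"test accuracy: {accuracy:.1%}")
```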

Regarding our example of genre classification, recent research has shown that the accuracy of a CNN architecture (82%) can surpass human accuracy (70%), which is quite impressive. Depending on the specific task, accuracy can be even higher.

But you need to keep in mind: the more subjective the audio analysis scope is (like genre or mood detection), the lower the accuracy will be.

On the plus side: everything we can differentiate with our human ears in music, a machine might distinguish as well. It’s just a matter of the quality of the initial data.

Conclusion

Artificial intelligence, deep learning, and especially neural network architectures can be a great tool to analyze music in any form. Since there are tens of thousands of new songs released every month and music libraries are growing bigger and bigger, music neural networks can be used for automatically labeling songs in your personal music library and finding similar sounding songs. You can see how the library integration is done in detail in the case study on the BPM Supreme music library and this engaging interview video with MySphera. 

Cyanite is designed for these tasks, and you can try it for free by clicking the link below.

I want to integrate AI in my service as well – how can I get started?

Please contact us with any questions about our Cyanite AI via mail@cyanite.ai. You can also directly book a web session with Cyanite co-founder Markus here.

If you want to get the first grip on Cyanite’s technology, you can also register for our free web app to analyze music and try similarity searches without any coding needed.

Free Mood Taxonomy: Translate Emotions Into Words And Vice Versa

Describing a certain feeling or emotion is one of the hardest things in the world, let alone trying to describe a feeling that you get from listening to a song. You have probably faced this challenge before when working in the music industry, whether you’re trying to decipher a synch briefing or when writing a new promo text.

That’s why we think it’s very important to work with a clear, concise set of moods. In our free mood taxonomy overview, we give you a clear overview of moods and their synonyms to help you put emotions into words.

Click here to request our Mood Taxonomy Overview

4 Ways How We Use Music To Regulate Our Emotions In Everyday Life

Music listening is an integral and oftentimes purposeful activity in our daily lives. We listen to particular tracks in order to change our current emotional state or in order to maintain it. How we react to a particular song not only depends on the musical attributes of that song but on various situational and personal factors.

Originally posted on our Groovecat blog, written by Sami Behbehani, 15 March 2019

Possibilities for musical self-regulation are limitless in today’s modern society. Technical advancements such as smartphones, high-quality earphones, and music streaming have enabled listeners to access massive song-libraries from anywhere, at any time.

Consequently, individuals can immediately react to new circumstances by adapting their listening strategy accordingly.

However, the process of self-regulation through music is highly subjective and dependent on various factors.

Perceiving an emotion doesn’t mean you feel it the same way as other people

 

A song is a construct whose individual elements merge and ultimately communicate a particular feeling or atmosphere. Most likely, a listener will perceive this feeling accurately. Yet it is not guaranteed that the feeling will have any effect on the listener’s emotional state.

“Whether a certain song evokes an emotion or not depends firstly on the listener’s musical preference, secondly the previous listening experience and thirdly, empathy with the recording artist” – Sami Behbehani

As in movies, a certain degree of identification with the protagonist is preconditioned for the story to touch the audience. In a musical context, empathy is the precondition for a song’s story to strike interest and cause emotional contagion. Studies have shown that with an increasing degree of empathy towards a song/artist, a higher correspondence between perceived and felt emotion during music listening can be experienced.

Your listening environment influences your music selection more than personal attributes

Some recent scientific studies have shown situational circumstances to have a stronger influence on the process of music selection than personal attributes of the listener. However, capturing the essence of a situation is a complex and scientifically still relatively unexplored issue. Situations do not only include physical elements such as location, persons, weather, and time of day; there is also the aspect of how a person reacts to these respective elements. This aspect even includes potentially highly complex interactions between person and situation.

In our daily lives, we experience various situations that affect us in different ways and to which we react accordingly. While some of these situations occur spontaneously, others allow us to plug in our earphones or switch on our speakers. For instance: on our way to work we might get bored and need something to lift us up; while getting ready in the morning we might want to start the day on a positive or energetic note; when we socialize with others we like to create a comforting atmosphere; and in order to prepare for a stressful situation we want to reach a higher state of excitement so we can handle the situation better.

Common strategies of emotional regulation

  1. Aesthetic enjoyment

Studies have shown that personal well-being is a key motive for music listening. Listening to preferred songs lets the listener draw enjoyment from the overall listening experience. Liked music has been shown to trigger the release of neurological messengers such as dopamine and serotonin, signaling pleasure and reward to the system and resulting in increased comfort. This can be interpreted as a mood-improvement process through aesthetic stimulation, which, however, does not modify the listener’s emotion in a specific fashion.

 

  2. Sustaining cheerfulness

Further in line with the principle of emotional regulation is a deliberate choice of songs that communicate emotions parallel with those felt by the listener. Persons experiencing cheerfulness tend to listen to happy music more frequently because they like to maintain the emotional state they are in. This is a common strategy in situations where social interaction between persons is desirable, as at parties or relaxed evenings with friends.

 

  3. Emotional Self-therapy

Another strategy that directly influences a music listener’s emotional state is utilized when experiencing negative emotion. Sad music, for instance, is highly popular amongst listeners of different genres on the one hand; and on the other hand, it can exert a strong effect on the listener. As compared to happy music which rather maintains or enforces an existing emotional state, sad or depressing songs are more commonly used for musical self-therapy. If previously mentioned mechanisms such as empathy with the song/artist, preference for the style etc. are given, sad music can mirror the listener’s feelings and therefore help to process experienced sadness, ultimately resulting in uplift.

 

  4. Stimulation

Aggressive music is a special case in itself because it can be positively stimulating on the one hand yet also expresses a negative emotional connotation on the other hand. Listening to aggressive music while experiencing feelings of aggression can have a channeling effect. Beyond that, intense music, aggressive music, in particular, enables the listener to achieve a higher degree of stimulation. This effect is consciously or subconsciously utilized by music listeners in order to: get pumped up for physical activities such as sports or dancing; motivate themselves to pull through monotonous tasks such as housework and cooking; or prepare themselves mentally for events known to include conflict and negative stress.


Implications for the future

It can be suggested that any form of maintaining or improving one’s emotional state through music falls under the category of musical self-therapy.

There is, however, no auditive all-around solution for daily needs, since individuals vary in their personal attributes and situations exert different effects on different people. Since music recommendation algorithms rarely, if ever, focus on the aspects mentioned above, they are unlikely to serve as an adequate daily regulation tool for listeners.

Research is still at a point where new discoveries can potentially shake up the field and though there are several studies with valid findings, most likely no study will ever be able to include all parameters that fully explain human music listening behavior.

From the consumer’s perspective, the last few years of technological development have facilitated a free and goal-driven use of music. This positive development could continue in the future with tech-companies and start-ups working on new ways for music to fulfill the listeners’ potential needs.