An Overview of Data in The Music Industry

This article is a continuation of the series “How to Turn Music Data into Actionable Insights”. We’re diving deeper into the first layer of the Data Pyramid and exploring the different kinds of metadata used in the industry.

Metadata represents a form of knowledge in the music industry. Any information you have about a song is considered metadata: information about its performance, sound, ownership, cultural context, and so on. Metadata is already used at every step of the value chain, from music recommendation algorithms and release planning to advance payments, marketing budgeting, royalty payouts, and artist collaborations.

Nonetheless, the music industry has an ambivalent relationship with musical metadata. On the one hand, the data is necessary because millions of songs circulate in the industry every day. On the other hand, music is varied and individual (pop music is very different from ambient sounds, for example), so the metadata that describes music can take different forms and meanings, which makes it a complex field.

This article explores the kinds of metadata used in the industry, from basic descriptions of acoustic properties to company-proprietary data.

The article was created with helpful input from Music Tomorrow:

Music data is a multi-faceted thing. The real challenge for any music business looking to turn data into powerful insights is connecting the dots across all the various types of music data and aggregating it at the right level. This process starts with figuring out what the ideal dataset would look like — and a well-rounded understanding of all the various data sources available on the market is key.

Dmitry Pastukhov

Analyst at Music Tomorrow

Types of Metadata

There are various classifications of music metadata. The most basic one is Public vs Private metadata:

  • Public metadata is easily available and visible to anyone.
  • Private metadata is kept behind closed doors for legal, security, or economic reasons; maintaining a competitive edge is one common motivation. Typically, performance-related metadata such as sales numbers is private.

Another very basic classification of metadata is Manual vs Automatic annotations: 

  • Manual metadata is entered into the system by humans, for example annotations from editors or users.
  • Automatic metadata is obtained through automated systems, for example AI.

1. Factual (Objective)

Factual metadata is objective: artist, album, year of publication, duration of the song, and so on. It is usually assigned by an editor or administrator and describes information that is historically true and cannot be contested.

Usually, factual metadata doesn’t describe the acoustic features of the song. 

Besides the big streaming services, platforms like Discogs are great sources to find and double-check objective metadata. 

Discogs provides a full account of factual metadata

2. Descriptive (Subjective)

Descriptive metadata (often also referred to as creative or subjective metadata) provides information about the acoustic qualities and artistic nature of a song, such as mood, energy, genre, and voice.

Descriptive metadata is usually assigned subjectively, by a human or a machine, based on previous experience or a training dataset. BPM, key, and time signature are the exception: they are objective measures, but because they describe the nature of the song itself, we still count them as descriptive metadata.
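To make the distinction tangible, here is a minimal Python sketch of how factual and descriptive metadata for a single track could sit side by side in one record. The field names and values are purely illustrative, not a standard industry schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TrackMetadata:
    # Factual (objective) metadata: historically true, not open to interpretation
    artist: str
    title: str
    release_year: int
    duration_seconds: int
    # Descriptive (subjective/creative) metadata: describes the sound itself
    genres: List[str] = field(default_factory=list)
    moods: List[str] = field(default_factory=list)
    energy: float = 0.0   # e.g. 0.0 (calm) to 1.0 (energetic)
    bpm: float = 0.0      # objective in nature, but conventionally grouped with descriptive data
    key: str = ""         # e.g. "C major"

# Hypothetical example record
track = TrackMetadata(
    artist="Example Artist",
    title="Example Song",
    release_year=2020,
    duration_seconds=215,
    genres=["pop"],
    moods=["uplifting", "warm"],
    energy=0.72,
    bpm=120.0,
    key="C major",
)
print(track)
```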

Major platforms like Spotify and Apple Music have strict requirements for submitted files. Having incomplete metadata can result in a file being rejected for further distribution. For music libraries, the main concern is user search experience as categorization and organization of songs in the library rely almost entirely on metadata.

Companies such as Cyanite, Musiio, Musimap, or FeedForward are able to extract descriptive metadata from the audio file.

3. Ownership/Performing Rights Metadata

Ownership metadata defines the people or entities who own the rights to a track: artists, songwriters, labels, producers, and others. All of these parties have a stake in the royalty split, so ownership metadata ensures everyone involved gets paid. Allocating royalties can become quite complicated when multiple songwriters are involved or a song samples other songs, which is why accurate ownership metadata is so important.
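As a simple illustration of why this matters, here is a small Python sketch that allocates a royalty payment according to ownership shares. The names and percentages are invented, and real-world allocation is far more involved (samples, publishing vs. master rights, territories, and so on).

```python
def allocate_royalties(total_amount: float, splits: dict) -> dict:
    """Distribute a payment according to ownership shares (shares must sum to 1.0)."""
    if abs(sum(splits.values()) - 1.0) > 1e-9:
        raise ValueError("ownership shares must sum to 100%")
    return {holder: round(total_amount * share, 2) for holder, share in splits.items()}

# Hypothetical ownership metadata for one track
splits = {"Songwriter A": 0.40, "Songwriter B": 0.25, "Label": 0.35}
print(allocate_royalties(1000.00, splits))
# -> {'Songwriter A': 400.0, 'Songwriter B': 250.0, 'Label': 350.0}
```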

Companies such as Blòkur, Jaxta, Exectuals, Trqk, and Verifi Media provide access to ownership metadata with the goal of managing and tracking changes to the ownership rights of a song over time – and ensuring correct payouts for rights holders.

4. Performance – Cultural Metadata

Performance or cultural metadata is produced by the environment or culture around a song, usually through users interacting with it. These interactions are registered in a system and analyzed for patterns, categories, and associations. Such metadata includes likes, ratings, social media plays, streaming performance, chart positions, and so on.

The performance category can be divided into two parts: 

  • Consumption or Sales data deals with the consumption and use of a track and usually needs to be acquired from partners. For example, Spotify shares data with distributors, distributors pass it down to labels, and so forth.
  • Social or Audience data indicates how well music or an artist performs on a particular platform and who the audience is. It can be accessed through either first-party or third-party tools.

First-party tools are powerful but disconnected: they require you to harmonize data from different platforms to get the full picture, and they are limited in scope, covering only proprietary data. Third-party tools are often more useful: they provide access to data across the market, including performance data for artists you are not directly connected to. Here the data is already harmonized, but the level of detail is lower.
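To show what that harmonization step can look like in practice, here is a small pandas sketch that merges two hypothetical platform exports (the column names are invented) into one shared schema:

```python
import pandas as pd

# Hypothetical per-platform exports with inconsistent column names
spotify = pd.DataFrame({"isrc": ["US-XYZ-21-00001"], "streams": [15000], "date": ["2021-05-01"]})
apple = pd.DataFrame({"ISRC": ["US-XYZ-21-00001"], "plays": [4200], "report_date": ["2021-05-01"]})

def harmonize(df: pd.DataFrame, mapping: dict, platform: str) -> pd.DataFrame:
    # Map each platform's columns onto one shared schema
    out = df.rename(columns=mapping)[["isrc", "streams", "date"]].copy()
    out["platform"] = platform
    return out

combined = pd.concat([
    harmonize(spotify, {}, "spotify"),
    harmonize(apple, {"ISRC": "isrc", "plays": "streams", "report_date": "date"}, "apple_music"),
])

# One harmonized table: total consumption per track across platforms
print(combined.groupby("isrc")["streams"].sum())
```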

Another way to acquire this data is through tracking solutions (movie syncs, radio, etc.) that produce somewhat original data. These can either be integrated into third-party solutions (radio tracking on Soundcharts or Chartmetric, for example) or operate as standalone tools (radio monitor, WARM, BDS tracker). It is still consumption data, but it is accessed while bypassing the entire data chain.

5. Proprietary Metadata

Proprietary data remains in the hands of a company and rarely gets disclosed. Some data used by recommender systems is proprietary; for example, a song’s similarity score is proprietary data that can be based on performance data. This category also includes insights from ad campaigns, sales, merch, ticketing, and so on.

Some proprietary data belongs to more than one company: sales, merch, and ticketing usually involve a number of parties.

Outlook

Today, processes in the music industry are rather one-dimensional when it comes to data. For instance, marketing budgets are often planned merely on the basis of the past performance of an artist’s recordings – and so are the multiples paid in acquisitions of their rights.

Let’s look at the financial sector: In order to estimate the value of a company, one has to look at company-internal factors such as inventory, assets, or human resources as well as outside factors such as past performances, political situation, or market trends. Here we look at proprietary data (company assets), semi-proprietary data (performance), and public data (market trends). The art of connecting those to make accurate predictions will be the topic of future research on the Cyanite blog.

Cyanite library provides a range of descriptive metadata

Conclusion

Musical metadata is needed to manage large music libraries. We have tried to review all the metadata types in the industry, though these types can intersect and produce new kinds of metadata. This metadata can then be used to derive information, build knowledge, and deliver business insights, which constitute the layers of the Data Pyramid – a framework we presented earlier that helps make data-based decisions.

In 2021, every company should see itself as a data company. Future success is inherently dependent on how well you can connect your various data sources.

I want to integrate AI in my service as well – how can I get started?

Please contact us with any questions about our Cyanite AI via sales@cyanite.ai. You can also directly book a web session with Cyanite co-founder Markus here.

If you want to get the first grip on Cyanite’s technology, you can also register for our free web app to analyze music and try similarity searches without any coding needed.

How to Turn Music Data into Actionable Insights

Decisions in the music industry are increasingly made based on data. Services like Chartmetric, Spotify for Artists, or Facebook Business Manager, among others, are powerful tools that enable music industry teams to back up their work with data. This process of collective data generation is called datafication, pointing to the fact that almost everything nowadays can be collected, measured, and captured either manually or digitally. The resulting data powers recommendation algorithms, and it also gives stakeholders an opportunity to make business decisions, reach new audiences in the market, and find innovative ways to do business.

However, decision-making from data also requires a new level of analytical skills. Simple statistical analysis is not enough for data-based decision-making. The amount of data, its complexity, and the challenges of storing and synchronizing large datasets require smart technologies to help manage the data and make use of it.

This article explains the general framework for data-based decision-making and explores how Cyanite capabilities can lead to actionable business insights. It is specifically tailored to music publishers, music labels, and all types of music businesses that own musical assets. This framework can be applied in a variety of ways to areas such as catalog acquisitions, playlist pitching, and release plans.

Data Pyramid

So how are data-based decisions made? The framework for decision-making is based on the Berklee Data Pyramid model by Liv Buli. In the next sections we outline the four steps of the model and add a Cyanite layer to it:

  1. Data Layer
  2. Information Layer
  3. Knowledge Layer 
  4. Intelligence Layer

1. Data Layer

At the very first layer, the raw data is generated and collected. This data includes user activity, information about the artist such as social media accounts and usernames, album release dates or concert dates, and creative metadata. Among others, a technology like Cyanite generates creative metadata such as genre, mood, instruments, bpm, key, vocals, and more.  

At this stage the data is collected in a raw form, so limited insights can be made out of it. This step is, however, crucial for insight generation at the next layers of the pyramid. 

It’s also important to remember a general rule of thumb: the better the input, the better the output. That is, the higher the quality of this initial data generation, the better the insights you can derive from it at the top of the pyramid. Any mistakes made in this crucial data layer will be carried into the other layers, biasing insights and limiting actionability.

Our philosophy at Cyanite is to keep the data output in the first layer deliberately reduced to support accuracy and reliability. The data is presented in readable form in the Cyanite interface. You can see what kind of data Cyanite generates in this analysis of Spotify’s New Music Friday Playlist.

Cyanite Interface

2. Information Layer

At the information level, the data is structured and visualized, often in a graphical form. A first glance at data in a visual form can already bring value and enable the company to answer questions such as: “What happened with the artist, playlist, or a chart?” 

At each stage of the pyramid, an appropriate analytical method is used. The visualization and reporting methods among others at the information stage are called descriptive analytics. The main goal at this stage is to identify useful patterns and correlations that can be later used for business decisions. 

Analytical methods used at each stage of the data pyramid

(based on Sivarajah et al. classification of data analytical models, 2016)

At this level, there are two main techniques: data aggregation and data mining. Data aggregation deals with collecting and organizing large data sets. Data mining discovers patterns and trends and presents the data in a visual or another understandable format. 

This is the simplest level of data analysis where only cursory conclusions can be made. Inferences and predictions are not made on this level. 
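For illustration, here is a minimal example of data aggregation at the information layer, using pandas on a small invented dataset:

```python
import pandas as pd

# Hypothetical raw data from the data layer: one row per track
tracks = pd.DataFrame({
    "artist": ["A", "A", "B", "B", "B"],
    "genre": ["pop", "pop", "ambient", "ambient", "pop"],
    "streams": [12000, 8000, 3000, 4500, 7000],
    "energy": [0.8, 0.7, 0.2, 0.3, 0.6],
})

# Data aggregation: organize the raw data into a descriptive summary
summary = tracks.groupby("genre").agg(
    total_streams=("streams", "sum"),
    avg_energy=("energy", "mean"),
    n_tracks=("streams", "size"),
)
print(summary)  # answers "what happened?", and nothing more
```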

Instruments in Detail View

One of many ways to visualize music via Cyanite’s detail view

3. Knowledge Layer

The knowledge layer is the stage where information is converted into knowledge. Benchmarking and setting milestones are used at this stage to derive insights from data. For example, you can set expectations for an artist’s performance based on how they performed in the past. Various activities such as events and show appearances can influence an artist’s performance, and this information too can be converted into knowledge, indicating how different factors affect the artist’s success.

This stage employs so-called predictive analytics, which answer the question: “What is likely to happen in the future?” This kind of analytics captures patterns and relationships in data, connects historical patterns to future outcomes, and captures interdependencies between variables.

At the knowledge level, techniques such as data mining, statistical modeling, and machine learning algorithms are used. For example, machine learning methods can try to fill in missing data with the best possible guesses based on the existing data. More information on the techniques and the advantages and disadvantages of each analytical method can be found here.
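As a toy example of predictive analytics, the sketch below fits a simple trend to invented monthly stream counts and projects it three months ahead as a benchmark. Real predictive models are of course far richer than a straight line.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical monthly stream counts for one artist (historical performance)
months = np.arange(1, 13).reshape(-1, 1)   # months 1..12
streams = np.array([10, 12, 15, 14, 18, 21, 24, 23, 27, 30, 33, 35]) * 1000

# Predictive analytics: learn the historical pattern and project it forward
model = LinearRegression().fit(months, streams)
forecast = model.predict(np.array([[13], [14], [15]]))
print(forecast.round())   # expectations to benchmark the coming quarter against
```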

Here are some contexts in which the data generated by Cyanite could create knowledge:

  • Analyzing a “to-be-acquired” catalog of music rights and benchmarking it against the existing one;
  • Analyzing popular playlists to predict matches;
  • Analyzing trending music in advertising to find the most syncable tracks in your own catalog.

4. Intelligence Layer

The intelligence layer is the stage where questions such as “So what?” and “Now what?” are asked and possibly answered. This level enables stakeholders to predict outcomes and recommend actions with a high level of confidence. However, decision-making at this level is risky, as a wrong prediction can be expensive. While this level is still very much human-operated, machines, and especially AI, are moving up the pyramid to take over insight generation and decision-making.

Prescriptive analytics deal with cause-effect relationships among knowledge points. They allow businesses to determine actions and assess their impact based on feedback produced by predictive analytics at the knowledge level. This is the level where the truly actionable insights that can affect business development are born. 

Intelligence anticipates what and when something might happen. It can even attempt to understand why something might happen. At the intelligence level, each possible decision option is evaluated so that stakeholders can take advantage of future opportunities or avoid risks. Essentially, this level deals with multiple futures and evaluates the advantages of each option in terms of future opportunities and risks. For further reading, we recommend this article from Google revealing the mindset you need to develop data into business insights.  
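A deliberately simplified sketch of how prescriptive analytics might rank decision options by weighing predicted outcomes against their probability and cost. Every figure below is invented purely to show the mechanics.

```python
# Hypothetical decision options, each scored by predictive analytics upstream
options = {
    "release_single_now":      {"expected_streams": 250_000, "success_prob": 0.55, "cost": 5_000},
    "delay_for_sync_campaign": {"expected_streams": 400_000, "success_prob": 0.35, "cost": 12_000},
    "playlist_pitch_only":     {"expected_streams": 120_000, "success_prob": 0.80, "cost": 1_500},
}

def risk_adjusted_return(opt: dict) -> float:
    # Expected streams, discounted by the probability of success, per unit of spend
    return opt["expected_streams"] * opt["success_prob"] / opt["cost"]

for name, opt in sorted(options.items(), key=lambda kv: risk_adjusted_return(kv[1]), reverse=True):
    print(f"{name}: {risk_adjusted_return(opt):.1f} risk-adjusted streams per unit spent")
```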

Cyanite and Data-Based Decision

At the very base of the Data Pyramid lies raw data. The quality and accuracy of this raw data are decisive for decision-making.

Cyanite generates data about each music track such as bpm, dominant key, predominant voice gender, voice presence profile, genre, mood, energy level, emotional profile, energy dynamics, emotional dynamics, instruments, musical era, and other characteristics of the song. This data is then presented in a visual format, such as graphs of the song’s dynamics with the ability to select a custom segment.

Based on different audio parameters, the system determines the similarity between items and lists similar songs based on a reference track. From Cyanite analytics it can be derived, for example, what the overall mood of a library is, and different songs can be added to make the library more comprehensive. Branding decisions can also be made using Cyanite, to ensure all music employed by a brand adheres to one mood or theme.

I want to integrate AI in my service as well – how can I get started?

Please contact us with any questions about our Cyanite AI via sales@cyanite.ai. You can also directly book a web session with Cyanite co-founder Markus here.

If you want to get the first grip on Cyanite’s technology, you can also register for our free web app to analyze music and try similarity searches without any coding needed.

How Do AI Music Recommendation Systems Work

Music recommendation systems can significantly improve the listening and search experience of a music library or music application. Algorithmic recommender systems have become indispensable due to the sheer amount of digital content: in the music industry, there is simply too much music for a user to navigate tens of millions of songs effectively. Since the need for satisfactory music recommendations is so high, the field of music recommendation systems (MRS) is developing at lightning speed. On top of that, the popularity of streaming services such as Spotify and Pandora shows that people like to be guided in their music choice and to discover new tracks with the help of algorithms.

Hitting the musical spot for their users is the goal of every music service. However, there are many approaches and philosophies to music recommendation, with very different implications.

In this article, we unravel all the specifics of music recommendation systems. We look into the different approaches to music recommendation and explain how they work. We also discuss approaches to Music Information Retrieval which is a field concerned with automatically extracting data from music.

If you want to upgrade a music library or build a music application, keep reading to find out which recommendation system works best for your needs.

Approaches to Music Recommendation

We focus on three approaches to music recommender systems: Collaborative Filtering, Content-based Filtering, and Contextual Approach.

1. Collaborative Filtering

The collaborative filtering approach predicts what users might like based on their similarity to other users. To determine similar users, the algorithm collects historical user activity such as ratings of a music track, likes, or how long the user listened to the track.

This approach reproduces the friend-recommendation dynamic from the days when music was passed around in a tight circle of friends with similar tastes. Because only user information is relevant, collaborative filtering doesn’t take into account any information about the music or sound itself. Instead, it analyzes user preferences and behavior and, by matching one user to another, predicts the likelihood of a user liking a song. For example, if User A and User B liked the same song in the past, their preferences likely match, so User A might later get recommendations for songs that User B is listening to.

The most prominent problem of the collaborative filtering approach is the cold start: when the system doesn’t have enough information at the beginning, it won’t provide accurate recommendations. This applies to new users, whose listening behavior is not tracked yet, and to new songs and artists, where the system needs to wait until users interact with them.

In collaborative filtering, several approaches are used such as user-based and item-based filtering, and explicit and implicit ratings. 

 

Alina Grubnyak @ Unsplash

Collaborative Filtering Approaches

It is common to divide collaborative filtering into two types – user-based and item-based filtering: 

  • User-based filtering establishes the similarity between users. User A is similar to User B so they might like the same music. 
  • Item-based filtering establishes the similarity between items based on how users interacted with the items. Item A can be considered similar to Item B because they were both rated 5 out of 10 by users. 

Another differentiation that is used in collaborative filtering is explicit vs implicit ratings: 

  • Explicit ratings are when users provide direct feedback on items, such as likes or shares. However, not all items get a rating, and sometimes users interact with an item without rating it. In that case, implicit ratings can be used.
  • Implicit ratings are inferred from user activity. When a user didn’t rate an item but listened to it 20 times, it is assumed that the user likes the song.
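Putting these pieces together, here is a minimal sketch of user-based collaborative filtering on implicit ratings (play counts). The matrix is invented and tiny; real systems work with millions of users and very sparse data.

```python
import numpy as np

# Implicit ratings: play counts per user (rows) and song (columns), invented numbers
play_counts = np.array([
    [20, 5,  0,  0],   # User A
    [18, 7,  1,  0],   # User B (listens a lot like User A)
    [ 0, 0, 15, 12],   # User C (very different taste)
], dtype=float)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

target = 0  # recommend for User A
others = [u for u in range(len(play_counts)) if u != target]

# User-based filtering: find the most similar user ...
most_similar = max(others, key=lambda u: cosine_similarity(play_counts[target], play_counts[u]))

# ... and recommend songs that user plays but the target user hasn't heard yet
recommendations = np.where((play_counts[target] == 0) & (play_counts[most_similar] > 0))[0]
print(f"Most similar user: {most_similar}, recommended song indices: {recommendations.tolist()}")
```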

2. Content-based Filtering

Content-based filtering uses metadata attached to items, such as descriptions or keywords (tags), as the basis of the recommendation. Metadata characterizes and describes the item. When a user likes an item, the system infers that this user is likely to like other items whose metadata is similar to the one they already liked.

Three common ways to assign metadata to content items are through a qualitative, quantitative, and automated approach.

Firstly, in the qualitative approach, library editors professionally characterize the content.

Secondly, in the quantitative or crowdsourced approach, a community of people assigns metadata to content manually. The more people participate, the more accurate and less subjectively biased the metadata gets. 

And thirdly, in the automated approach, algorithmic systems characterize the content automatically.

Metadata

Musical metadata is information attached to the audio file. It can be objectively factual or descriptive (based on subjective perception). In the music industry, the latter is also often referred to as creative metadata.

For example, artist, album, and year of publication are factual metadata. Descriptive metadata describes the actual content of a musical piece, e.g. the mood, energy, and genre. Understanding the types of metadata and organizing the taxonomy of a library consistently is very important, as the content-based recommender uses this metadata to pick the music: if the metadata is wrong, the recommender might pull out the wrong track. You can read more about how to properly structure a music catalog in our free taxonomy paper. For professional musicians sending out music, this guide on editing music metadata can be helpful.

 

David Pupaza @ Unsplash

Content-based recommender systems can use both factual and descriptive metadata or focus on one type of data only. Much attention is put into content-based recommendation systems as they allow for objective evaluation of music and can increase access to “long-tail” music. They can enhance the search experience and inspire many new ways of discovering and interacting with music. 
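As a small sketch of the content-based idea, the snippet below recommends the tracks whose descriptive tags overlap most with a track the user liked. The catalog and tags are made up for illustration.

```python
# Hypothetical descriptive metadata (tags) per track
catalog = {
    "track_1": {"pop", "uplifting", "high-energy"},
    "track_2": {"pop", "warm", "uplifting"},
    "track_3": {"ambient", "calm", "instrumental"},
}

def jaccard(a: set, b: set) -> float:
    # Overlap of two tag sets: 1.0 = identical tags, 0.0 = nothing in common
    return len(a & b) / len(a | b)

def recommend(liked: str, k: int = 2):
    scores = {t: jaccard(catalog[liked], tags)
              for t, tags in catalog.items() if t != liked}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("track_1"))   # tracks whose metadata resembles what the user already liked
```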

The field concerned with extracting descriptive metadata from music is called Music Information Retrieval (MIR). More on that later in the article.

3. Context-aware Recommendation Approach

Context has recently become popular in recommender systems, and it is still a relatively new and developing field. Context covers the user’s situation, activity, and circumstances, which content-based and collaborative filtering systems don’t take into account but which may influence music choice. Recent research by the Technical University of Berlin shows that 86% of music choices are influenced by the listener’s context.

This context can be environment-related or user-related.

  • Environment-related Context

In the past, recommender systems were developed that established a link between the user’s geographical location and music. For example, when visiting Venice you could listen to a Vivaldi concert; when walking the streets of New York, you could blast Billy Joel’s New York State Of Mind. Emotion-indicating tags and knowledge about musicians were used to recommend music fitting a geographical place.

  • User-related Context

You could be walking or running depending on the time of day and your plans, and you could be sad or happy depending on what happened in your personal life – these circumstances represent the user-related context. Being alone versus being in a social setting may also significantly influence music choice. For example, when working out you might want to listen to more energetic music than your usual listening habits and musical preferences would suggest. Good music recommendation systems take this into account.
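One simple way to use such context is to re-rank an existing recommendation list. The sketch below boosts high-energy tracks when the (hypothetical) user context is a workout; the tracks and scores are invented.

```python
# Hypothetical candidate tracks with a base score from another recommender
candidates = [
    {"title": "Calm Piano",   "base_score": 0.90, "energy": 0.2},
    {"title": "Gym Anthem",   "base_score": 0.75, "energy": 0.9},
    {"title": "Indie Ballad", "base_score": 0.80, "energy": 0.4},
]

def rerank(tracks, context):
    # User-related context: while working out, boost high-energy tracks
    boost = 0.3 if context == "workout" else 0.0
    return sorted(tracks, key=lambda t: t["base_score"] + boost * t["energy"], reverse=True)

for track in rerank(candidates, context="workout"):
    print(track["title"])   # "Gym Anthem" now ranks first despite its lower base score
```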

Music Information Retrieval

The research field concerned with the automated extraction of creative metadata from audio is called Music Information Retrieval (MIR). MIR is an interdisciplinary field combining digital signal processing, machine learning, and artificial intelligence with musicology. In music analysis its scope is wide, ranging from BPM or key detection and higher-level tasks like automatic genre or mood classification to state-of-the-art approaches like automatic full-text song captioning. It also covers research on the similarity of musical audio pieces and, in line with that, search algorithms for music as well as automatic music generation.

At Cyanite, we are using a combination of Music Information Retrieval methods. For example, various artificial neural network architectures are used to predict the genre, mood, and other features of the song based on the existing dataset and subsequent network training. More on that in this article on how to analyze music with neural networks. Our Similarity Search takes a reference track and gives you a list of songs that match by pulling audio, metadata, and other relevant information from audio files. The overall character of the library can be determined and managed using Similarity Search. 

Our Cyanite AI has found its best applications in music libraries targeted at music professionals, DJs, artists, and brands. These represent the business segment of the industry.

Custom interval

Cyanite Similarity Search

Conclusion

The choice of a music recommendation approach is highly dependent on your personal needs and the data you have available. An overarching trend is a hybrid approach that combines features of collaborative filtering, content-based filtering, and context-aware recommendations. However, all fields are in a state of constant development and innovations make each approach unique. What works for one music library might not be applicable to another.

The common challenges of the field are access to large enough data sets and understanding how different musical factors influence people’s perception of music. More on that soon on the Cyanite blog! 

I want to integrate AI in my service as well – how can I get started?

Please contact us with any questions about our Cyanite AI via sales@cyanite.ai. You can also directly book a web session with Cyanite co-founder Markus here.

If you want to get the first grip on Cyanite’s technology, you can also register for our free web app to analyze music and try similarity searches without any coding needed.

If you are interested in going deeper into Music Recommender Systems we highly recommend the following reads: 

Current challenges and visions in music recommender systems research

Music Recommender Systems

Deep Learning in Music Recommendation Systems

How to Create Custom Audiences for Pre-Release Music Campaigns in Facebook, Instagram, and Google

As a music label, you know how hard it is to promote a new artist and cut through the noise. Most music advertising agencies and labels choose to do Facebook and Instagram marketing as an easy way to start. There are some important things you should know about setting up such campaigns but there are already great guides and tips on that here and here. What we want to cover in this article are the steps you can take to identify the audience for a completely new track.

An interest-based audience is a tool that allows you to select customers based on their interests. You can target people who are fans of other artists or people who browse websites similar to your artist’s webpage. The interest-based audience feature significantly narrows your audience down to the most relevant group, increasing your chances of reaching the right people.

If you are a big music label, you can also use custom audiences on Facebook. You might think that custom audiences are only applicable to music that has already gained a following, but there is a workaround: find similar artists who fit the roster of your label and then launch a campaign for the new artist based on the same audience. That way, you make the most of your advertising efforts.

With Spotify, it’s easy to find similar artists and their respective fan communities, but when the new song is not yet released or you are breaking a new artist, Spotify algorithms won’t work. So how do you identify similar artists for a track that is not released yet? You can use Cyanite’s Similarity Search to solve that problem. The Cyanite Similarity Search compares the sound of the song you want to promote with hundreds of thousands of other tracks and finds the ones that sound similar.

At this point, kudos to Maximilian Pooschke of Virgin Music Label & Artist Service  for bringing this use case of Cyanite’s similarity search to our team’s attention.

“Cyanite’s Similarity Search is an intuitive tool when I’m creating custom audiences for new artists on social media. Especially for music that doesn’t easily fit into a box, the similarity search is a great entry point for campaign planning.”

Maximilian Pooschke

Virgin Music Label & Artist Service

Now, here is a step-by-step guide on how to create custom audiences using similar artists identified by Cyanite.

Step 1. Upload music to the library view and let Cyanite analyze it

Drag and drop your music into the library view. Before you do that, you need to register for free at https://app.cyanite.ai/register.

Library view
Picture 1. Cyanite library view

The library view will show some data about the song such as mood, genre, energy level, emotional profile, and more. You can explore this data or move to the next step.

Step 2. Find similar songs using Similarity Search

Click on Similarity next to the analyzed song in the library to start finding similar songs from our showcase database of around 600k popular songs. Our similarity algorithms focus purely on the actual sound and feel of a song to offer you the most relevant and precise results. Additionally, you will see the same analysis data for all the songs, including Moods, Energy Level, and Emotional Profile.

Step 3. Play around with the different filters for more granular insights

Often the magic occurs when you apply different filters. Use the custom interval, play around with tempo, genre, and key, and dive deeper into different results. Then pick the artists and tracks you find most relevant from the Cyanite suggestions.

Step 4. Enrich your findings with additional data from sources like Chartmetric

To get more details on the similar songs and artists you discovered, use other data sources to further narrow down your selection and be as precise as possible. You can check out festivals, radio stations, and/or magazines to enrich your search and select more source audiences for your campaign.

Custom interval
Picture 2. Cyanite Similarity Search based on custom interval

Step 5. Go and select your audiences

Off to Facebook or Instagram to create your audiences with the popular artists you have found and selected with the Similarity Search. Use interest-based targeting and enter a similar artist’s name as a keyword. Play around with keywords for maximum results: you can use artists’ names, song names, genres, or other keywords. A good, comprehensive resource on how to use and manage Facebook, Instagram, and Google ads is AdEspresso.

Facebook Ad Settings
Picture 3. Facebook Ads Detailed Targeting

This is just one of many ways to use Cyanite for your purposes. You can check out this article to find out more on how to use Cyanite for playlist pitching or this one to find out how to use Cyanite to find music for your videos.

Analyzing Music Using Neural Network: 4 Essential Steps

As written in an earlier blog article, we at Cyanite focus on the analysis of music using artificial intelligence (AI) in the form of neural networks. Neural networks can be used for many music tasks, like automatically detecting the genre or the mood of a song, but it can sometimes be tricky to understand how exactly they work.

With this article, we want to shed light on how neural networks can be deployed for analyzing music. Therefore, we’ll be guiding you through the four essential steps you need to know when it comes to neural networks and AI audio analysis. To see a music neural network in action, check out one of our data stories, for example, an Analysis of German Club Sounds with Cyanite. 

The 4 steps for analyzing music with neural networks include:

1. Collecting data

2. Preprocessing audio data

3. Training the neural network

4. Testing and evaluating the network

Step 1: Collecting data

Let’s say that we want to automatically detect the genre of a song. That is, the computer should correctly predict whether a certain song is, for example, a Pop, Rock, or Metal song. This seems like a simple task for a human being, but it can be a tough one for a computer. This is where deep learning in the form of neural networks comes in handy.

In general, a neural network is an attempt to mimic how the human brain functions. But before the neural network is able to predict the genre of a song, it first needs to learn what a genre is.

Simply put: what makes a Pop song a Pop song? What is the difference between a Pop song and a Metal song? And so on. To accomplish this, the network needs to “see” loads of examples of Pop, Rock, Metal, and other songs, which is why we need a lot of correctly labeled data.

Labeled data means that the actual audio file is annotated with additional information like genre, tempo, mood, etc. In our case, we would be interested in the genre label only.

Although there are many open sources for this additional information like Spotify and LastFM, collecting the right data can sometimes be challenging, especially when it comes to labels like the mood of a song. In these cases, it can be a good but also perhaps costly approach to conduct surveys where people are asked “how they feel” when they are listening to a specific song.

Overall, it is crucial to obtain meaningful data, since the prediction of our neural network can only be as good as the initial data it learned from (which is also why data is so valuable these days). To see all the different types of metadata used in the music industry, see the article An Overview of Data in the Music Industry.

Moreover, it is also important that the collected data is equally distributed, which means that we want approximately the same number of, for example, Pop, Rock, and Metal songs in our music dataset.
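A quick way to check that balance before training is to simply count the labels, for example:

```python
from collections import Counter

# Hypothetical labeled dataset: (audio file, genre) pairs
dataset = [
    ("songs/001.wav", "pop"), ("songs/002.wav", "rock"), ("songs/003.wav", "metal"),
    ("songs/004.wav", "pop"), ("songs/005.wav", "rock"), ("songs/006.wav", "pop"),
]

counts = Counter(genre for _, genre in dataset)
total = sum(counts.values())
for genre, n in counts.most_common():
    print(f"{genre}: {n} songs ({n / total:.0%})")
# A heavily skewed distribution here is a warning sign before training even starts.
```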

After collecting a well-labeled and equally distributed dataset, we can proceed with step 2: pre-processing the audio data.

A screenshot from a data collection music database

Step 2: Pre-processing audio data

There are many ways to deal with audio data in the scope of music neural networks, but one of the most commonly used approaches is to turn the audio data into “images”, so-called spectrograms. This might sound strange and counterintuitive at first, but it will make sense in a bit.

First of all, a spectrogram is a visual representation of audio data; more precisely, it shows how the spectrum of frequencies contained in the audio varies with time. Obtaining the spectrogram of a song is usually the most computationally intensive step, but it is worth the effort. Spectrograms are essentially data visualizations – you can read about different types of music data visualizations here.

Since great successes were achieved in the fields of computer vision over the last decade using AI and machine learning (face recognition is just one of the many notable examples), it seems natural to take advantage of the accomplishments in computer vision and apply them to our case of AI audio analysis.

That’s why we want to turn our audio data into images. By utilizing computer vision methods, our neural network can “look” at the spectrograms and try to identify patterns there.

Spectrograms from left to right: Christina Aguilera, Fleetwood Mac, Pantera
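In Python, a mel spectrogram can be computed in a few lines with the librosa library. This is a generic sketch with a placeholder file path, not Cyanite's actual preprocessing pipeline.

```python
import numpy as np
import librosa

# Load roughly 30 seconds of audio (the path is a placeholder)
y, sr = librosa.load("example_song.wav", sr=22050, duration=30.0)

# Mel spectrogram: the frequency content over time, on a perceptual (mel) scale
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
S_db = librosa.power_to_db(S, ref=np.max)   # convert power values to decibels

print(S_db.shape)   # (n_mels, time_frames): effectively an "image" a CNN can look at
```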

Step 3: Training the neural network

Now that we have converted the songs in our database into spectrograms, it is time for our neural network to actually learn how to tell different genres apart.

Speaking of learning: the process of learning is also called training. In our example, the neural network in music will be trained to perform the specific task of predicting the genre of a song.

To do so, we need to split our dataset into two subsets: a training dataset and a test dataset. This means that the network will be only trained on the training dataset. This separation is crucial for the evaluation of the network’s performance later on, but more on that in step 4.

So far, we haven’t talked about what our music neural network will actually look like. There are many different neural network architectures available, but for a computer vision task like trying to identify patterns in spectrograms, so-called convolutional neural networks (CNNs) are most commonly applied.

Now, we feed a song in the form of a labeled spectrogram into the network, and the network returns a prediction for the genre of this particular song.

At first, our network will be rather bad at predicting the correct genre of a song. For instance, when we feed a Pop song into the network, the network’s prediction might be Metal. But since we know the correct genre thanks to the label, we can tell the network how it needs to improve.

We repeat this process over and over again (this is why we needed so much data in the first place) until the network performs well on the given task. This process is called supervised learning because there is a clear goal that the network needs to learn.

During the training process, the network will learn which parts of the spectrograms are characteristic of each genre we want to predict.

Example of what a CNN architecture can look like
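For orientation, here is a minimal genre classifier sketched with Keras. The layer sizes, input shape, and number of genres are illustrative only and do not reflect Cyanite's production architecture.

```python
import tensorflow as tf

NUM_GENRES = 3                  # e.g. Pop, Rock, Metal
INPUT_SHAPE = (128, 1292, 1)    # mel bands x time frames x 1 channel (illustrative)

# A small convolutional network that "looks" at spectrograms
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, (3, 3), activation="relu", input_shape=INPUT_SHAPE),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_GENRES, activation="softmax"),   # one probability per genre
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",   # integer genre labels
              metrics=["accuracy"])

# X_train holds the spectrograms, y_train the integer genre labels from step 1:
# model.fit(X_train, y_train, epochs=20, validation_split=0.1)
```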

Step 4: Testing and evaluating the network

In the last step, we need to evaluate how well the network performs on real-world data. This is why we split our dataset into a training dataset and a test dataset before training the network.

To get a reasonable evaluation, the network needs to perform the genre classification task on data it has never seen before, which in this case is our test dataset. This is a truly exciting moment because now we get an idea of how well (or how poorly) our network actually performs.
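In code, this step boils down to comparing the network's predictions on the test set with the true labels. The numbers below are invented purely to show the mechanics.

```python
import numpy as np

# Hypothetical results on the held-out test set (0 = Pop, 1 = Rock, 2 = Metal)
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])   # actual genres
y_pred = np.array([0, 1, 1, 1, 2, 2, 0, 1])   # the network's predictions

accuracy = np.mean(y_true == y_pred)
print(f"Overall test accuracy: {accuracy:.0%}")

# A per-genre breakdown shows where the network still struggles
for genre, name in enumerate(["Pop", "Rock", "Metal"]):
    mask = y_true == genre
    print(f"{name}: {np.mean(y_pred[mask] == y_true[mask]):.0%}")
```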

Regarding our example of genre classification, recent research has shown that the accuracy of a CNN architecture (82%) can surpass human accuracy (70%), which is quite impressive. Depending on the specific task, accuracy can be even higher.

But you need to keep in mind: the more subjective the audio analysis scope is (like genre or mood detection), the lower the accuracy will be.

On the plus side: everything we can differentiate with our human ears in music, a machine might distinguish as well. It’s just a matter of the quality of the initial data.

Conclusion

Artificial intelligence, deep learning, and especially neural network architectures can be a great tool to analyze music in any form. Since there are tens of thousands of new songs released every month and music libraries are growing bigger and bigger, music neural networks can be used for automatically labeling songs in your personal music library and finding similar sounding songs. You can see how the library integration is done in detail in the case study on the BPM Supreme music library and this engaging interview video with MySphera. 

Cyanite is designed for these tasks, and you can try it for free by clicking the link below.

I want to integrate AI in my service as well – how can I get started?

Please contact us with any questions about our Cyanite AI via mail@cyanite.ai. You can also directly book a web session with Cyanite co-founder Markus here.

If you want to get the first grip on Cyanite’s technology, you can also register for our free web app to analyze music and try similarity searches without any coding needed.

Free Mood Taxonomy: Translate Emotions Into Words And Vice Versa

Describing a certain feeling or emotion is one of the hardest things in the world, let alone trying to describe the feeling you get from listening to a song. You have probably faced this challenge before when working in the music industry, whether you’re trying to decipher a sync briefing or write a new promo text.

That’s why we think it’s very important to work with a clear, concise set of moods. In our free mood taxonomy overview, we give you a clear overview of moods and their synonyms to help you put emotions into words.

Click here to request our Mood Taxonomy Overview