In this article, we present a common challenge: inconsistent keyword tagging in music databases. We discuss what causes these problems and how Cyanite developed a Keyword Cleaning system to solve them automatically. We then present five use cases for our Keyword Cleaning system and the potential impact it may have on music businesses.

Introduction to the problem

The way we perceive music is highly individual. So is the way we describe music. What makes for lively dinner conversation becomes a real concern when handling large numbers of musical pieces professionally.

To leverage diverse monetization opportunities with musical assets, many music companies sort music catalogs by assigning keyword tags to all the audio files in their music database. These tags may describe the mood and genre of a song or categorize its instruments or tempo. This way music companies ensure accessibility and searchability of any musical asset even in very large music catalogs.

These tags follow each company’s individual understanding of music, its catalog language. The specific nature of a catalog language can be understood in terms of two aspects:

1. Objective catalog language (tagging): the entirety of keywords and tags, often described as a taxonomy or tag anthology (quantity, classes and wording). “Which tags do I use?”

2. Subjective catalog language (understanding of tagging): the understanding of tags and their connection to certain sound qualities. “When do I assign a certain tag?”

Objective catalog language is inherent to the music catalog or the company that owns it. Subjective catalog language, however, is specific to each individual who tags the music.

A consistent catalog language leads to an excellent search experience and is the ideal condition for thorough exploitation of your assets. A lot of work can go into building and maintaining your own catalog language. However, three main events can quickly erode it, and with it the quality and meaningfulness of your tagging:

Event 1: Catalog acquisitions or integrations.

Event 2: Day-to-day variation in how tagging staff feel and perform.

Event 3: The hiring of new tagging staff.

Being unaware of these events can undo decades of work. Songs can no longer be found and revenue streams can no longer be realized as before, seriously harming a company’s ability to execute its business model.

More importantly, staff who search for music stop trusting the music search. They then build highly individual workarounds for finding suitable music, or fall back on a very limited “go-to catalog” of songs that they use again and again rather than drawing on the entire music catalog.


Our solution

Addressing these issues, Cyanite developed a way to bring together (translate) two catalog languages – objective or subjective – with minimum information loss and maximum speed, using AI.

We base our approach on a measure we call keyword similarity, which describes the degree of semantic similarity of a pair of tags. For example, the keywords “enthusiastic” and “euphoric” should have a rather similar meaning when used to describe a musical mood. We would therefore expect a high degree of keyword similarity. In contrast, “enthusiastic” and “gloomy” are an opposing pair of descriptive attributes, which should point towards a low degree of keyword similarity.

Most music catalogs use a multi-label tagging scheme, meaning that a single piece of music can be assigned multiple tags. We make use of this fact and focus on the track-wise co-occurrence of tags, hypothesizing that a tag pair that is frequently assigned together indicates a high degree of interrelation and, thus, keyword similarity.
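To make the co-occurrence idea concrete, here is a minimal illustrative sketch. It is not Cyanite’s production system: it assumes a plain Jaccard score over track-wise tag co-occurrence as a stand-in for keyword similarity, and the function name `keyword_similarity` and the sample tracks are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_similarity(tracks):
    """Return {frozenset({a, b}): score} from track-wise tag co-occurrence.

    Assumption: the score is the Jaccard index, i.e. the number of tracks
    carrying both tags divided by the number of tracks carrying either tag.
    """
    tag_counts = Counter()   # tracks per tag
    pair_counts = Counter()  # tracks per tag pair
    for tags in tracks:
        unique = sorted(set(tags))  # ignore duplicate tags on one track
        tag_counts.update(unique)
        pair_counts.update(frozenset(p) for p in combinations(unique, 2))

    similarities = {}
    for pair, both in pair_counts.items():
        a, b = tuple(pair)
        similarities[pair] = both / (tag_counts[a] + tag_counts[b] - both)
    return similarities


# Hypothetical toy catalog: one list of tags per track.
tracks = [
    ["enthusiastic", "euphoric", "upbeat"],
    ["enthusiastic", "euphoric"],
    ["euphoric", "upbeat"],
    ["gloomy", "dark"],
    ["gloomy", "dark", "slow"],
]
sims = keyword_similarity(tracks)
print(round(sims[frozenset(("enthusiastic", "euphoric"))], 2))  # 0.67
print(sims.get(frozenset(("enthusiastic", "gloomy")), 0.0))     # 0.0
```

Pairs that never share a track are simply absent from the result, so callers can treat a missing pair as a similarity of zero.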

We developed a natural language processing (NLP) AI system capable of learning the semantic interrelation of keywords in any library. With this, we are able to derive a quantitative measure for any combination of keywords contained in one or several music catalogs. This analysis is the basis for a variety of groundbreaking use cases to overcome challenges many music companies are struggling with.


Use Case 1: Catalog language translation

This challenge arises when two (or more) differently tagged music catalogs are to be merged (for example after a catalog acquisition or when choosing a different distribution outlet). Manually translating tags is tedious and may lead to significant information loss, because the same tag is not always used in the same way (see “subjective catalog language” above).

Our system works in three steps. First, it analyzes both taxonomies and maps every tag in relation to every other tag, learning the respective catalog language. Second, it maps both catalog languages onto each other, drawing direct relations between tags and how they are understood. Third, it translates each song’s tagging from one catalog language into the language of the catalog it is to be integrated into. In this way, the system automatically re-tags every song in the new catalog language.
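The third step, re-tagging a single song, can be sketched as follows. This is only an illustration under stated assumptions: the cross-catalog similarity table is taken as given (how it is learned is not shown), and `translate_tags`, the threshold of 0.5 and all scores are hypothetical.

```python
def translate_tags(track_tags, cross_similarity, threshold=0.5):
    """Translate one track's tags from a source into a target catalog language.

    cross_similarity maps each source tag to {target_tag: score}.
    Returns (translated, unmatched): the best target tag for every source
    tag that clears the threshold, plus the source tags with no good match.
    """
    translated, unmatched = [], []
    for tag in track_tags:
        candidates = cross_similarity.get(tag, {})
        if candidates:
            best, score = max(candidates.items(), key=lambda item: item[1])
            if score >= threshold:
                if best not in translated:  # avoid duplicate target tags
                    translated.append(best)
                continue
        unmatched.append(tag)  # keep for manual review instead of guessing
    return translated, unmatched


# Hypothetical scores between source-catalog and target-catalog tags.
cross_similarity = {
    "euphoric": {"enthusiastic": 0.9, "happy": 0.7},
    "melancholic": {"sad": 0.8, "gloomy": 0.6},
    "quirky": {"playful": 0.4},
}
translated, unmatched = translate_tags(
    ["euphoric", "melancholic", "quirky"], cross_similarity
)
print(translated)  # ['enthusiastic', 'sad']
print(unmatched)   # ['quirky']
```

Returning the unmatched tags separately makes potential information loss visible rather than silently dropping tags.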

Use Case 2: Keyword Cleaning of inconsistent keyword tagging

This challenge, inconsistent keyword tagging, affects companies with high turnover in tagging staff as well as companies with particularly large catalogs (>100,000 songs) that have accumulated legacy tagging over the years. It is one of the biggest problems a catalog can face, as it seriously diminishes searchability and the search experience, leading to mistrust of the system, individual workarounds and eventually the loss of the customer for good. Alternatively, customers contact the library’s sales team and search staff directly, which harms your business’s ability to scale.

Once it has learned the catalog language of your catalog, our Cyanite Keyword Cleaning system can detect tags with low keyword similarity that may contradict the other tags on a song, and flag the respective songs. To assess whether a flagged tag was wrongly assigned (or whether a tag may be missing), we offer an audio-based tagging solution that checks whether the tag suits the audio. If it does not, the tag is deleted.
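The flagging step can be illustrated with a small sketch. It assumes a precomputed table of pairwise keyword similarities and flags a tag when its mean similarity to the other tags on the same track falls below a threshold. The function name, the threshold of 0.2 and the scores are hypothetical, and the audio-based check is not shown.

```python
def flag_suspicious_tags(track_tags, similarity, threshold=0.2):
    """Return the tags whose mean similarity to the track's other tags is low.

    similarity maps frozenset({a, b}) to a score in [0, 1]; pairs missing
    from the table are treated as a similarity of 0.
    """
    flagged = []
    for tag in track_tags:
        others = [t for t in track_tags if t != tag]
        if not others:
            continue  # a single tag cannot contradict anything
        mean_sim = sum(
            similarity.get(frozenset((tag, other)), 0.0) for other in others
        ) / len(others)
        if mean_sim < threshold:
            flagged.append(tag)
    return flagged


# Hypothetical pairwise keyword similarities.
similarity = {
    frozenset(("enthusiastic", "euphoric")): 0.8,
    frozenset(("enthusiastic", "upbeat")): 0.7,
    frozenset(("euphoric", "upbeat")): 0.6,
    frozenset(("gloomy", "upbeat")): 0.05,
}
flagged = flag_suspicious_tags(
    ["enthusiastic", "euphoric", "upbeat", "gloomy"], similarity
)
print(flagged)  # ['gloomy']
```

A flagged tag is only a candidate for removal. In the workflow described above, the decision to delete it rests on the audio-based check.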

Use Case 3: Taxonomy Cleaning. Detection of redundancies and blind spots.

Languages change over time, and catalog languages change with them. Some catalogs have 15,000+ different keywords in their taxonomy. It should come as no surprise that songs with older keyword tags are found less often. Choosing a slimmer taxonomy can improve the searchability and overall search experience of a catalog.

This raises the question of whether all tags are necessary and meaningful. To test this, our Cyanite system scans your keyword tagging and detects tags that are equal in meaning. It then consolidates these redundancies, condensing the taxonomy to meaningful, disjoint keyword classes.
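One simple way to picture this consolidation is to group tags whose pairwise similarity exceeds a high threshold and keep the most frequently used tag of each group. The sketch below does this with a small union-find structure. It is an assumption-laden illustration, not a description of the Cyanite system: `consolidate_taxonomy`, the threshold of 0.85 and all counts and scores are hypothetical.

```python
def consolidate_taxonomy(tag_counts, similarity, threshold=0.85):
    """Map every tag to a canonical tag for its group of near-synonyms.

    tag_counts maps tag -> number of tracks using it.
    similarity maps frozenset({a, b}) of two distinct tags -> score.
    """
    parent = {tag: tag for tag in tag_counts}

    def find(tag):
        # Follow parent links to the group root, shortening the path.
        while parent[tag] != tag:
            parent[tag] = parent[parent[tag]]
            tag = parent[tag]
        return tag

    for pair, score in similarity.items():
        a, b = tuple(pair)
        if score >= threshold and a in parent and b in parent:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for tag in tag_counts:
        groups.setdefault(find(tag), []).append(tag)

    mapping = {}
    for members in groups.values():
        # Keep the most used tag; ties are broken alphabetically.
        canonical = max(members, key=lambda t: (tag_counts[t], t))
        for tag in members:
            mapping[tag] = canonical
    return mapping


# Hypothetical usage counts and similarities.
tag_counts = {"happy": 120, "joyful": 15, "cheerful": 8, "gloomy": 40}
similarity = {
    frozenset(("happy", "joyful")): 0.9,
    frozenset(("joyful", "cheerful")): 0.88,
    frozenset(("happy", "gloomy")): 0.02,
}
mapping = consolidate_taxonomy(tag_counts, similarity)
print(mapping["cheerful"])  # happy
print(mapping["gloomy"])    # gloomy
```

Note that merging is transitive here: “cheerful” joins “happy” via “joyful”, so a conservative threshold is advisable to avoid chaining loosely related tags into one class.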

Use Case 4: Open search

If you rely on customers handing in sync briefings and then search your catalog yourself, your business will lack scalability. So you might want to open up your catalog search to every potential client. To do so, you want to make sure that you deliver the right music for every search and every individual understanding of music. In other words, you need to speak the language of each of your customers.

To achieve this, our Cyanite Keyword system can translate a vast number of keywords into semantically related tags. This means that if you only use the tag “euphoric” for very upbeat, outgoing and happy songs, but the client searches for “enthusiastic”, our Cyanite Keyword system understands the request and presents the suitable songs from your catalog. This is especially important for keywords that were tagged significantly less often in your catalog, so that searches still return a good variety of music.
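The “enthusiastic” versus “euphoric” example can be sketched as a simple query expansion: the search term is expanded to related catalog tags, and tracks are ranked by the best-matching tag. This is a minimal sketch assuming a precomputed similarity table. The names `open_search` and `catalog`, the threshold and the scores are hypothetical.

```python
def open_search(query, catalog, similarity, threshold=0.5):
    """Return [(track_id, score)] for tracks matching the query or related tags.

    catalog maps track_id -> set of tags.
    similarity maps frozenset({a, b}) of two distinct tags -> score.
    """
    # The query itself counts as a perfect match.
    related = {query: 1.0}
    for pair, score in similarity.items():
        if query in pair and score >= threshold:
            (other,) = pair - {query}
            related[other] = score

    results = []
    for track_id, tags in catalog.items():
        score = max((related[t] for t in tags if t in related), default=0.0)
        if score > 0:
            results.append((track_id, score))
    # Best matches first; track id as a stable tie-breaker.
    return sorted(results, key=lambda item: (-item[1], item[0]))


# Hypothetical catalog and similarities.
catalog = {
    "track_a": {"euphoric", "upbeat"},
    "track_b": {"gloomy"},
    "track_c": {"enthusiastic"},
}
similarity = {
    frozenset(("enthusiastic", "euphoric")): 0.8,
    frozenset(("enthusiastic", "gloomy")): 0.05,
}
results = open_search("enthusiastic", catalog, similarity)
print(results)  # [('track_c', 1.0), ('track_a', 0.8)]
```

Because related tags contribute a lower score than exact matches, exact hits stay on top while rarely used keywords still surface related music.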

Use Case 5: Automatic tagging in your own catalog language.

Let’s say your clients and customers have become used to your specific keyword tagging, your catalog language. This means that your catalog language is an integral part of the stickiness of your platform and helps you retain customers. If you introduce automatic tagging through deep learning systems such as the Cyanite Tagging system, you want to keep the automatic tags in your catalog language so that your customers keep finding the right music.

To achieve this, our Cyanite Keyword system and the Cyanite Tagging system work together on translating our auto-tags into your catalog language. Your customers won’t even notice that you switched to AI-tagging.

How to get started!

If the approach of Cyanite’s Keyword Cleaning resonates with you, the first step is to take a look at your metadata. For that, please reach out to us. Together, we will dive into your tagging scheme and assess the possibility of a Keyword Cleaning project.