We recently introduced Cyanite’s new Keyword Cleaning system, which detects tagging inconsistencies in any music catalog. Its purpose is to make music libraries cleaner, more consistent, and, most importantly, easier to search.
Today we are pleased to show Keyword Cleaning in action with Universal Production Music Germany (UPM). In this article, you’ll learn how to clean up a music library, using UPM as the example. You’ll get an understanding of UPM’s specific challenges with music tagging, how we identified over 16,000 tagging mistakes in their catalog, and how we gave UPM a deeper, visual understanding of their catalog’s tagging language.
Universal Production Music – Current State
The UPM example shows how keyword cleaning can significantly improve a music catalog. In the preparation phase, we analyzed UPM’s catalog language in its current state, which covered two aspects: the taxonomy of the catalog (the keywords used) and the use of keywords (UPM’s understanding of those keywords).
Three characteristics of the catalog shaped this analysis. First, the music is tagged following a multi-label scheme, meaning that several tags per class are assigned to one song. Second, UPM integrates a variety of different music libraries, so it is under constant threat of incompatible tagging schemes. Third, the catalog is mainly searched by clients rather than by Universal staff, which means that the keyword tagging must be as consistent and reliable as possible.
Moreover, Universal sorts its keywords into main categories such as ‘happy’ and ‘energetic’, which branch out into sub-categories such as ‘happy – playful’ or ‘happy – optimistic’.
First Phase – Taxonomy Cleaning
To start, we analyzed the entire keyword tagging system. We calculated the degree of co-occurrence of tags within single songs. Based on this analysis, the Cyanite AI system calculated keyword similarity (a quantitative measure of semantic relatedness of keywords) within the UPM taxonomy.
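To make the idea of a co-occurrence-based similarity concrete, here is a minimal sketch. It is not Cyanite's actual method, which the article does not specify. It assumes that similarity between two tags is the cosine similarity of their song-incidence vectors, i.e. the number of songs carrying both tags divided by the square root of the product of the individual tag counts. The function name `keyword_similarity` and the sample tags are made up for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def keyword_similarity(songs):
    """Return {(tag_a, tag_b): similarity} for every tag pair that co-occurs.

    Assumption: similarity = co-occurrences / sqrt(count_a * count_b),
    which is the cosine similarity of the tags' binary song vectors.
    Pair keys are sorted alphabetically.
    """
    tag_counts = Counter()
    pair_counts = Counter()
    for tags in songs:
        unique = sorted(set(tags))  # ignore duplicate tags on one song
        tag_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return {
        (a, b): n / sqrt(tag_counts[a] * tag_counts[b])
        for (a, b), n in pair_counts.items()
    }


# Hypothetical toy catalog: one list of tags per song.
songs = [
    ["happy", "cheerful", "playful"],
    ["happy", "cheerful"],
    ["dark", "tense"],
    ["happy", "dark"],
]
sim = keyword_similarity(songs)
for pair, score in sorted(sim.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

Pairs that never share a song are simply absent from the result and can be treated as having a similarity of zero.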
Our system clustered the existing keywords within the UPM taxonomy based on their semantic relations. The result can be presented in a two-dimensional visualization, displaying the semantic structure of the UPM catalog’s language. You can see an excerpt of the cluster visualization in figure 1.
Figure 1. An excerpt of the 2D visualization of the catalog language
In the visualization, the main categories are divided by color. For example, ‘happy’ is marked black, ‘beauty’ is green and ‘motion’ is blue.
As expected, the results show that tags from the same main category tend to cluster together, meaning that they are semantically connected.
Some categories, however, spread out over the entire plot. For example, the main category ‘happy’ shows a wide distribution (figure 2). Its sub-categories are sometimes more closely affiliated with other main categories, such as ‘beauty’, than with their own main cluster, which is located in the lower-left section of the visualization in figure 1. This revealed unclear assignments of sub-categories to main categories. Furthermore, the system detected several keywords that are very closely related in meaning, which is a sign of redundancy between tags.
Based on this first analysis step, Cyanite proposed re-assigning keywords to more suitable main categories and merging or deleting keywords with the same meaning, judging by the location of the keywords and their proximity to one another.
For example, the two keywords ‘happy positive – warm’ and ‘happy positive – heartwarming’ can be merged into one keyword and then re-assigned to the main category ‘beauty’. Those two tags nearly overlap and cluster more closely with tags in the ‘beauty’ family than with their original ‘happy positive’ family (figure 2).
Figure 2. A close-up of the ‘beauty’ cluster
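The two proposals described above, merging near-duplicate keywords and re-assigning keywords to a closer main category, can be sketched as follows. This is only an illustration under stated assumptions: the 2D coordinates are invented, every keyword except the two ‘warm’ and ‘heartwarming’ tags is made up, and the rules used (a fixed distance threshold for merging, nearest category centroid for re-assignment) are simple stand-ins rather than Cyanite's actual logic.

```python
from itertools import combinations
from math import dist

# Hypothetical data: keyword -> (main category, (x, y) position in the plot).
keywords = {
    "happy positive – warm": ("happy positive", (4.0, 4.1)),
    "happy positive – heartwarming": ("happy positive", (4.1, 4.0)),
    "happy positive – playful": ("happy positive", (0.5, 0.4)),
    "happy positive – cheerful": ("happy positive", (0.2, 0.6)),
    "beauty – tender": ("beauty", (4.3, 4.4)),
    "beauty – graceful": ("beauty", (3.8, 4.5)),
}


def merge_candidates(keywords, max_distance):
    """Keyword pairs that sit closer together than max_distance."""
    return [
        (a, b)
        for a, b in combinations(sorted(keywords), 2)
        if dist(keywords[a][1], keywords[b][1]) <= max_distance
    ]


def category_centroids(keywords):
    """Mean position of all keywords in each main category."""
    sums = {}
    for category, (x, y) in keywords.values():
        sx, sy, n = sums.get(category, (0.0, 0.0, 0))
        sums[category] = (sx + x, sy + y, n + 1)
    return {c: (sx / n, sy / n) for c, (sx, sy, n) in sums.items()}


def reassignment_candidates(keywords):
    """Keywords whose nearest category centroid is not their own category."""
    centroids = category_centroids(keywords)
    proposals = {}
    for name, (category, point) in keywords.items():
        nearest = min(centroids, key=lambda c: dist(point, centroids[c]))
        if nearest != category:
            proposals[name] = nearest
    return proposals


print(merge_candidates(keywords, max_distance=0.2))
print(reassignment_candidates(keywords))
```

With this toy data, the only merge candidate is the ‘warm’ and ‘heartwarming’ pair, and both keywords are proposed for the ‘beauty’ category. In practice, such proposals would still need a human decision before the taxonomy is changed.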
Furthermore, this visualization can help when onboarding new tagging staff who have to learn the catalog language. It can also guide music searchers to their desired songs more quickly, or help staff who are unsure how to tag a song.
This groundwork is also needed to integrate third-party catalogs that use an entirely different catalog language. The “new” music can be automatically re-tagged with UPM keywords with minimal information loss. Conversely, the UPM catalog can be integrated into other music libraries with their own tagging systems.
Second Phase – Keyword Cleaning
To find inconsistencies and conflicting keywords, we applied the keyword similarity approach to the tagging of individual songs in the UPM catalog. If a song is assigned several tags with high keyword similarity, such as ‘happy’, ‘heartwarming’, and ‘cheerful’, but also carries a tag with low keyword similarity to the rest, for example ‘dark’, our system flags the song for review. The system finds those odd keywords at scale. In Universal’s catalog, 16,000 tracks, including main versions and sub-versions, were identified as oddly tagged.
In figure 3, you can see the complete mood tagging of one song. Six moods were assigned, five of which show high keyword similarity. The tag ‘motion – driving’ showed very low keyword similarity and was thus flagged by the system. A first listening check confirmed that ‘motion – driving’ was indeed a false tag.
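The flagging rule can be illustrated with a short sketch. It assumes that a tag counts as "odd" when its average similarity to the other tags on the same song falls below a threshold. The function name `flag_odd_tags`, the threshold of 0.3, and the similarity values are all hypothetical; the article does not state how Cyanite's system scores a tag.

```python
def flag_odd_tags(tags, similarity, threshold=0.3):
    """Return the tags whose mean similarity to the song's other tags is low.

    `similarity` maps frozenset({tag_a, tag_b}) to a score in [0, 1];
    missing pairs are treated as similarity 0.
    """
    flagged = []
    for tag in tags:
        others = [t for t in tags if t != tag]
        if not others:
            continue  # a single tag has nothing to be compared with
        mean = sum(
            similarity.get(frozenset((tag, other)), 0.0) for other in others
        ) / len(others)
        if mean < threshold:
            flagged.append(tag)
    return flagged


# Hypothetical similarity scores for the tags used in the example above.
similarity = {
    frozenset(("happy", "heartwarming")): 0.8,
    frozenset(("happy", "cheerful")): 0.9,
    frozenset(("heartwarming", "cheerful")): 0.7,
    frozenset(("dark", "happy")): 0.05,
    frozenset(("dark", "heartwarming")): 0.1,
    frozenset(("dark", "cheerful")): 0.05,
}

song_tags = ["happy", "heartwarming", "cheerful", "dark"]
print(flag_odd_tags(song_tags, similarity))  # ['dark']
```

One limitation of this simple rule is that a song with only two dissimilar tags would have both of them flagged, so a flag is best read as a request for review rather than a verdict.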
An odd keyword combination can be due to specific dynamics in the song (e.g. a melancholic beginning and an uplifting finish), or it can involve falsely assigned tags. To test this thoroughly at scale, the next step would be to apply our deep learning systems, which can automatically detect moods, genres, and other features in music. This will be the subject of future research projects by Cyanite for Universal Production Music.
The discovery of 16,000 oddly tagged songs makes a strong case for applying our deep learning algorithms. If the AI system finds the previously tagged moods in any segment of the song, that file gets ‘de-flagged’. In all other cases, the odd keyword tag gets deleted.
Figure 3. An example of a wrongly tagged song
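The de-flagging rule described above could look roughly like the sketch below. It assumes that a separate model, not shown here, returns a set of detected moods for each segment of a song, and it reads the rule as follows: a flagged tag is kept if it is detected in at least one segment and deleted otherwise. The function name `resolve_flags` and all sample data are hypothetical.

```python
def resolve_flags(tags, flagged, segment_moods):
    """Apply the de-flag / delete rule to one song.

    tags:          all mood tags currently on the song
    flagged:       the tags that were flagged as odd
    segment_moods: one set of detected moods per song segment
                   (assumed output of a mood detection model)
    Returns (kept_tags, removed_tags).
    """
    detected = set().union(*segment_moods)
    kept = [t for t in tags if t not in flagged or t in detected]
    removed = [t for t in flagged if t not in detected]
    return kept, removed


# A song with a melancholic beginning and an uplifting finish:
# the flagged tag is found in the last segment, so it is kept.
dynamic_song = resolve_flags(
    tags=["melancholic", "warm", "uplifting"],
    flagged=["uplifting"],
    segment_moods=[{"melancholic"}, {"melancholic", "warm"}, {"uplifting"}],
)

# A song where the flagged tag is never detected: the tag is deleted.
mistagged_song = resolve_flags(
    tags=["happy", "cheerful", "dark"],
    flagged=["dark"],
    segment_moods=[{"happy"}, {"happy", "cheerful"}],
)

print(dynamic_song)
print(mistagged_song)
```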
Summary and Benefits
To sum it all up, the Cyanite Keyword Cleaning system allows UPM to realize the following benefits:
- Find and eliminate redundancies in the existing keyword taxonomy
- Find tagging mistakes, i.e. falsely assigned tags, allowing for a cleaner and more searchable library
- Translate tagging from third-party catalogs or AI tagging into UPM’s catalog (or vice versa) without losing the native catalog’s language
- Sustainably automate the tagging process
- Set up a “Google-like” open text search in the future
As you can see, there are many ways the Cyanite Keyword Cleaning system can be integrated into a music catalog to deeply understand semantic relations, improve tagging accuracy, and enhance music search experiences.
I want to clean up my catalog as well – how can I get started?
If you want to get a first feel for Cyanite’s technology, you can register for our free web app to analyze music and try similarity searches without any coding.