Can Meta’s audio aesthetic model actually rate the quality of music?

Last year, Meta released Audiobox Aesthetics (AES), a research model that proposes scoring audio based on how people would rate it. The model outputs four scores: Production Quality (PQ), Production Complexity (PC), Content Enjoyment (CE), and Content Usefulness (CU). 

The study suggests that audio aesthetics can be broken into these axes, and that a reference-free model can predict these scores directly from audio. If that holds, the scores could start informing decisions and become signals people lean on when judging music at scale.

I took a closer look to understand how the model frames aesthetic judgment and what this means in practice. I ran Audiobox Aesthetics myself and examined how its scores behave with real music.

What Meta’s Audiobox Aesthetics paper claims

Before jumping into my evaluation, let’s take a closer look at what Meta’s Audiobox Aesthetics paper set out to do.

The paper introduces a research model intended to automate how audio is evaluated when no reference version exists. The authors present this as a way to automate listening judgments. They describe human evaluations as costly and inconsistent, leading them to seek an automated alternative.

To address this need, the authors propose breaking audio evaluation into four separate axes and predicting a separate score for each:

  • Production Quality (PQ) looks at technical execution, focusing on clarity and fidelity, dynamics, frequency balance, and spatialization.
  • Production Complexity (PC) reflects how many sound elements are present in the audio.
  • Content Enjoyment (CE) reflects how much listeners enjoy the audio, including their perception of artistic skill and overall listening experience.
  • Content Usefulness (CU) considers whether the audio feels usable for creating content.

The model is trained using ratings from human listeners who follow the same guidelines across speech, music, and sound effects. It analyzes audio in short segments of around 10 seconds. For longer tracks, the model scores each segment independently and provides an average. 
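
To make that procedure concrete, here is a minimal sketch of the segment-and-average logic, assuming the audio is loaded with librosa. The `predict_segment` function is a hypothetical placeholder for the model's per-segment inference, not Meta's actual API.

```python
import numpy as np
import librosa

SEGMENT_SECONDS = 10  # the model scores audio in roughly 10-second windows


def predict_segment(segment: np.ndarray, sr: int) -> dict:
    """Hypothetical placeholder for the AES per-segment predictor.

    In the real model this would return the four axis scores for one
    segment; here it only returns dummy values so the sketch runs.
    """
    return {"PQ": 7.0, "PC": 5.0, "CE": 7.0, "CU": 7.0}


def score_track(path: str) -> dict:
    """Score a full track by averaging independent per-segment predictions."""
    audio, sr = librosa.load(path, sr=None, mono=True)
    hop = int(SEGMENT_SECONDS * sr)
    segments = [audio[i:i + hop] for i in range(0, len(audio), hop)]
    per_segment = [predict_segment(seg, sr) for seg in segments if len(seg) > 0]
    return {axis: float(np.mean([s[axis] for s in per_segment]))
            for axis in ("PQ", "PC", "CE", "CU")}
```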

Beyond the audio itself, the model has no additional context. It does not know how a track is meant to be used or how it relates to other music. According to the paper, the scores tend to align with human ratings and could help sort audio when it’s not possible to listen to it all. In that way, the model is presented as a proxy for listener judgment.

Why I decided to evaluate the model

I wasn’t the only one who was curious to look into this model. Jeffrey Anthony’s “Can AI Measure Beauty? A Deep Dive into Meta’s Audio Aesthetics Model,” for instance, offers a deep, philosophical examination of what it means to quantify aesthetic judgment, including questions of ontology. I decided to take a more hands-on approach, testing the model on real-world examples to see whether any interesting patterns emerge in its predictions.

What caught my attention most was how these scores are meant to be used. Once aesthetic judgments are turned into numbers, they start to feel reliable. They look like something you can sort by, filter on, or use to decide what gets heard and what gets ignored.

This matters in music workflows. Scores like these could influence how catalogs are cleaned up, how tracks are ranked for sync, and how large libraries of music are evaluated without listening. With a skeptical but open mindset, I set out to discover how these scores behave with real-world data.

 

What I found when testing the model

A) Individual-track sanity checks

I began with a qualitative sanity check using individual songs whose perceptual differences are unambiguous to human listeners. The tracks I selected represent distinct production conditions, stylistic intentions, and levels of artistic ambition.

I included four songs: Daft Punk’s “Giorgio by Moroder,” Aphex Twin’s “Blue Calx,” DJ Visage’s “The Schumacher Song,” and a low-quality MP3 of “Funky Town.”

The motivation for this test was straightforward. A model claiming to predict Production Quality should assign a lower PQ to “Funky Town” (low-quality MP3) than to “Giorgio by Moroder.” A model claiming to estimate production or musical complexity should recognize “Blue Calx” by Aphex Twin as more complex than formulaic late-90s pop-trance such as DJ Visage’s “Schumacher Song.” Likewise, enjoyment and usefulness scores should not collapse across experimental electronic music, audiophile-grade disco-funk, old-school pop-trance, and degraded consumer audio.
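
As a rough illustration of how such a comparison can be run, the sketch below reuses the hypothetical `score_track` helper from the earlier segment-averaging sketch; the file names are placeholders, not the actual files I used.

```python
# Hypothetical file paths; score_track is the segment-averaging sketch above.
tracks = {
    "Giorgio by Moroder": "giorgio_by_moroder.flac",
    "Funky Town (low-quality MP3)": "funky_town_lowbitrate.mp3",
    "Blue Calx": "blue_calx.flac",
    "The Schumacher Song": "schumacher_song.mp3",
}

scores = {name: score_track(path) for name, path in tracks.items()}

# Print a small comparison table of the four AES axes per track.
print(f"{'track':<32}{'PQ':>6}{'PC':>6}{'CE':>6}{'CU':>6}")
for name, s in scores.items():
    print(f"{name:<32}{s['PQ']:>6.2f}{s['PC']:>6.2f}{s['CE']:>6.2f}{s['CU']:>6.2f}")
```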

You can see that the resulting scores, shown in the individual-track comparison plot above, contradict these expectations. “Funky Town” receives a PQ score only slightly lower than “Giorgio by Moroder,” indicating near insensitivity to codec degradation and mastering fidelity. Even more strikingly, “Blue Calx” is assigned the lowest Production Complexity among the four tracks, while “The Schumacher Song” and “Funky Town” receive higher PC scores. This directly inverts what most listeners would consider to be structural or compositional complexity.

Content Enjoyment is highest for “Funky Town” and lowest for “Blue Calx,” suggesting that the CE dimension aligns more closely with catchiness or familiarity than with artistic merit or aesthetic depth.

Taken together, these results indicate that AES is largely insensitive to audio fidelity. It fails to reflect musical or structural complexity, and instead appears to reward constant spectral activity and conventional pop characteristics. Even at the individual track level, the semantics of Production Quality and Production Complexity don’t match their labels.

B) Artist-level distribution analysis

Next, I tested whether AES produces distinct aesthetic profiles for artists with musical identities, production aesthetics, and historical contexts that are clearly different. I analyzed distributions of Production Quality, Production Complexity, Content Enjoyment, and Content Usefulness for Johann Sebastian Bach, Skrillex, Dream Theater, The Clash, and Hans Zimmer.

If AES captures musically meaningful aesthetics, we would expect to see systematic separation between these artists. For example, Hans Zimmer and Dream Theater might have a higher complexity score than The Clash. Skrillex’s modern electronic productions might have a higher quality score than early punk recordings. Bach’s works might show high complexity but variable enjoyment or usefulness depending on the recording and interpretation.
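
To visualize this, per-track scores can be grouped by artist and plotted as distributions. A minimal sketch, assuming the scores have already been collected into a CSV with one row per track (the file name and column layout here are hypothetical):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical file: one row per track with columns artist, PQ, PC, CE, CU.
df = pd.read_csv("aes_scores_by_artist.csv")

long = df.melt(id_vars="artist",
               value_vars=["PQ", "PC", "CE", "CU"],
               var_name="axis", value_name="score")

# One violin per axis and artist makes overlap (or separation) visible.
sns.violinplot(data=long, x="axis", y="score", hue="artist", cut=0)
plt.title("AES score distributions per artist")
plt.tight_layout()
plt.show()
```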

Instead, the plotted distributions show strong overlap across artists for CE, CU, and PQ, with only minor shifts in means. Most scores cluster tightly within a narrow band between approximately 7 and 8, regardless of artist. PC exhibits slightly more variation, but still fails to form clear stylistic groupings. Bach, Skrillex, Dream Theater, and Hans Zimmer largely occupy overlapping regions, while The Clash is not consistently separate.

This suggests that AES doesn’t meaningfully encode artist-level aesthetic or production differences. Despite extreme stylistic diversity, the model assigns broadly similar aesthetic profiles, reinforcing the interpretation that AES functions as a coarse estimator of acceptability or pleasantness rather than a representation of musical aesthetics.

C) Bias analysis using a balanced gender-controlled dataset

Scoring models are designed to rank, filter, and curate songs in large music catalogs. If these models encode demographic-correlated priors, they can silently amplify existing biases at scale. To test this risk, I analyzed whether AES exhibits systematic differences between tracks with female lead vocals and tracks without female lead vocals.

In our 2025 ISMIR paper, we showed that common music embedding models pick up non-musical singer traits, such as gender and language, and exhibit significant bias as a result. Because AES is intended to judge quality, aesthetics, and usefulness, it would be particularly problematic if it had similar biases. They could directly influence which music is considered “better” or more desirable.

I constructed a balanced dataset using the same methodology used in our 2025 paper, equalizing genre distribution and singer language across groups.

For each group, I computed score distributions for Content Enjoyment, Content Usefulness, Production Complexity, and Production Quality, visualized them, and performed statistical testing using Welch’s t-test alongside Cohen’s d effect sizes. For context, Welch’s t-test is a statistical test that compares whether the average scores between two groups are significantly different. Cohen’s d is a measure of effect size that quantifies how large that difference is in standardized units.
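
For reference, both statistics are straightforward to compute with standard tooling. A minimal sketch, using made-up score arrays in place of the real group data:

```python
import numpy as np
from scipy.stats import ttest_ind


def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Cohen's d using the pooled standard deviation of the two groups."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))


# Made-up per-track Content Enjoyment scores for the two balanced groups.
female_led = np.array([7.6, 7.8, 7.5, 7.9, 7.7, 7.4])
other = np.array([7.3, 7.4, 7.2, 7.6, 7.5, 7.1])

t_stat, p_value = ttest_ind(female_led, other, equal_var=False)  # Welch's t-test
print(f"Welch t = {t_stat:.2f}, p = {p_value:.4f}, d = {cohens_d(female_led, other):.2f}")
```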

The results show consistent upward shifts for female-led tracks in CE, CU, and PQ. All three differences are statistically significant with small-to-moderate effect sizes. In contrast, there is virtually no difference in Production Complexity score between groups.

This pattern indicates that the model systematically assigns higher enjoyment, usefulness, and quality scores to material with female vocals, even under controlled conditions. Because complexity remains unaffected, the effect doesn’t appear to stem from structural musical differences. Instead, it likely reflects correlations in training data and human annotations, or the model treating certain vocal timbres and production styles associated with female vocals as implicit quality indicators.

These findings suggest that AES encodes demographic-correlated aesthetic priors, which is problematic for a model intended to judge musical quality, aesthetics, and usefulness.


Why this matters for the industry

Economist Charles Goodhart famously observed that “when a measure becomes a target, it ceases to be a good measure.” He was describing what happens when a metric starts to drive decisions rather than just being an indicator. Once a number is relied on, it begins to shape how people think and choose.

That idea applies directly to aesthetic scoring. A score, once it exists, carries weight. It gets used as a shortcut in decisions, even when its meaning is incomplete. This matters in music workflows because aesthetic judgment depends on context and purpose. 

When a simplified score is treated as reliable, systems can start favoring what scores well rather than what actually sounds better or serves a creative goal. Over time, that can quietly steer decisions away from how audio is perceived and used in practice.

How we approach audio intelligence at Cyanite

At Cyanite, music isn’t judged in a vacuum, and neither are the decisions built on top of it. That’s why we don’t rely on single aesthetic scores. Instead, we focus on making audio describable and searchable in ways that stay transparent and grounded in context.

Aesthetic scoring can give the illusion of precision, but it often lumps together different technical qualities, genres, and styles. In music search and discovery, a single score doesn’t explain why a track is surfaced or excluded. That reasoning matters to us. Not to decide what’s “good,” but to give teams tools they can understand and trust.

We see audio intelligence as a way to expose structure, not replace judgment. Our systems surface identifiable musical attributes and relationships, knowing that the same track can be the right or wrong fit depending on how it’s used. The goal is to support human decision-making, not substitute it with scores.

Experimentation has a place, but in music, automation works best when it’s explainable and limit-aware.

What responsible progress in music AI should look like

Progress in music and AI is underpinned by transparency. Teams should be able to understand how a model was trained and how its outputs relate to the audio. When results are interpretable, people can see why a track surfaces and judge for themselves whether the signal makes sense in their own context.

That transparency depends on data choices. Music spans styles, cultures, eras, and uses, and models reflect whatever they are fed. Developers need to work with broad, representative data and be clear about where coverage is thin. Being open about what a model sees, and what it does not, makes its behavior more predictable and its limits easier to manage.

Clear communication matters just as much once tools are in use. For scores and labels to be applied responsibly, teams need a shared understanding of what those signals reflect and where their limits are. Otherwise, even well-intentioned metrics can be stretched beyond what they are able to support.

This kind of openness helps the industry build tools people can understand and trust in real workflows. 

We explored how these expectations show up in practice in “The state of AI transparency in music 2025,” a report developed with MediaTracks and Marmoset on how music licensing professionals make decisions around AI, creator background, and context. You can read the full report here.

So… does Meta’s model provide meaningful ratings for music?

Based on these tests, the answer is no. The model produces stable scores, but they don’t map cleanly to how musical quality or complexity are assessed in real catalog work. Instead, the model appears to align more with easily detectable production traits than with the distinctions people consistently make when judging music in context.

That doesn’t make Audiobox Aesthetics insignificant. It can support research by defining a clear scoring framework, showing how reference-free predictors can be trained across speech, music, and sound, and making its models and data available for inspection and comparison. It also illustrates where AES scores can be useful, particularly when large volumes of audio need to be filtered or monitored but full listening is impractical.

Problems emerge when scores like these begin shaping decisions. When a score is presented as a measure of quality, people need to know what it’s actually measuring so they can judge whether it applies to their use case. Without that clarity, it becomes easy to trust the number even when it’s not a good fit.

At Cyanite, we see this as a reminder of the importance of responsibility in music and AI. Progress is driven by systems that stay grounded in real listening behavior and make their assumptions visible.

How Cyanite protects your sensitive audio: privacy-first workflows for every catalog

Looking for secure AI music analysis? Discover Cyanite’s integration options. 

For many music teams, a significant hesitation about AI analysis is not about its capability or quality. It’s about trust. When teams explore AI-driven tagging or search, the conversation almost always leads to the same question: What happens to our audio once it leaves our system?

At Cyanite, we’ve built our technology around that concern from the very beginning. Rather than offering a single security promise, we provide multiple privacy-first workflows designed to meet different levels of sensitivity and compliance. This gives teams the flexibility to choose how their audio is handled, without compromising on tagging quality or metadata depth.

This article outlines the three privacy models Cyanite offers, explains how each one works in practice, and helps you decide which setup best fits your catalog and internal requirements.

Why audio privacy matters in modern music workflows

For those who manage it, audio represents creative identity, contractual responsibility, and, often, years of human effort. It’s not just another data type. Sending that material outside an organization can feel risky, even when the technical safeguards are strong and the operational benefits are clear.

Teams that evaluate our services often raise concerns about protecting unreleased material, complying with licensing agreements, and maintaining long-term control over how their catalogs are used. They look for assurances around:

  • Safeguarding confidential or unreleased content
  • Complying with NDAs and contractual obligations
  • Meeting internal legal or security standards
  • Maintaining full ownership and control

These are not edge cases. They reflect everyday realities for publishers, film studios, broadcasters, and music-tech platforms alike. That’s why Cyanite treats privacy as a core design principle.

Security option 1: GDPR-compliant processing on secure EU servers

For many organizations, strong data protection combined with minimal operational complexity is the right balance. In Cyanite’s standard setup, all audio is processed on secure servers located in the EU and handled in full compliance with GDPR.

In practical terms, this means:

  • Audio files are never shared with third parties.
  • Songs can be deleted anytime.
  • Ownership and control of the music always remain with the customer.

This model works well for publishers, production libraries, sync platforms, and music-tech companies that want to scale tagging and search workflows without maintaining their own infrastructure. For most catalogs, this level of protection is both robust and sufficient.

That said, not every organization is able to send audio outside its own environment, even under GDPR. For those cases, Cyanite offers additional options.

Learn more: See how AI music tagging works in Cyanite and how it supports large catalogs.

Security option 2: zero-audio pipeline—tagging without transferring audio

Some teams manage catalogs that cannot be transferred externally at all. These include confidential film productions, enterprise music departments, and archives operating under strict internal compliance rules. For these situations, Cyanite provides a spectrogram-based workflow that enables full tagging without the audio files ever being sent.

Spectrograms from left to right: Christina Aguilera, Fleetwood Mac, Pantera

Instead of uploading MP3s, audio is converted locally on the client side into spectrograms using a small Docker container provided by Cyanite. A spectrogram is a visual representation of frequency patterns over time. It contains no playable audio, cannot be converted back into a waveform without significant quality loss, and does not expose the original performance in any usable form.
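
For illustration only (this is not the code inside Cyanite's Docker container), a mel spectrogram of a local file can be computed with standard open-source tools, so only that frequency-over-time representation would ever leave the machine:

```python
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

# Hypothetical local file; the audio itself is never uploaded in this workflow.
audio, sr = librosa.load("track.wav", sr=22050, mono=True)

mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)  # log-scaled for readability

librosa.display.specshow(mel_db, sr=sr, x_axis="time", y_axis="mel")
plt.colorbar(format="%+2.0f dB")
plt.title("Mel spectrogram: frequency content over time")
plt.tight_layout()
plt.savefig("track_spectrogram.png")
```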

From a metadata perspective, the results are identical to audio-based processing. From a privacy perspective, the original audio never leaves the customer’s environment. This makes the zero-audio pipeline a strong middle ground for teams that want AI-powered tagging while maintaining strict control over their content.

From a product perspective, all Cyanite features can be fully leveraged.

“For us at Synchtank, the spectrogram-based upload was key. Many of our clients are cautious about where their audio goes, and this approach lets us use high-quality AI tagging and search without transferring any copyrighted audio. That balance, confidence for our customers without compromising on quality, is what made the difference for us.”

Amy Hegarty, CEO at Synchtank

Learn more: What are spectrograms, and how can they be applied to music?

Security option 3: pseudo-on-premise deployment via the Cyanite Audio Analyzer on the AWS Marketplace

For organizations with the highest security and compliance requirements, Cyanite also offers a pseudo-on-premises deployment option via the AWS Marketplace. In this setup, Cyanite’s tagging engine runs entirely inside the customer’s own AWS cloud infrastructure via the Cyanite Audio Analyzer.

This approach provides:

  • Complete pseudo-on-premise processing
  • Zero data transfer outside your AWS cloud environment
  • Full control over storage, access, and compliance
  • Tagging accuracy identical to cloud-based workflows

This option is typically chosen by film studios, broadcasters, public institutions, and organizations working with unreleased or highly sensitive material that must pass strict internal or external audits.

Because the pseudo-on-premise container operates in complete isolation (no internet connection), search-based features—including Similarity Search, Free Text Search, and Advanced Search—are not available in this setup. In pseudo-on-premise environments, Cyanite therefore focuses exclusively on audio tagging and metadata generation.

Important note: The rates on the AWS Marketplace are intentionally high to deter fraudulent activity. Please contact us for our enterprise rates and find the best plan for your needs.

Choosing the right privacy model for your catalog

Selecting the right setup depends less on catalog size and more on how tightly you need to control where your audio lives. A useful way to frame the decision is to consider how much data movement your internal policies allow.

In practice, teams tend to choose based on the following considerations:

  • GDPR cloud processing works well when secure external processing is acceptable.
  • Zero-audio pipelines suit teams that cannot transfer audio but can share abstract representations.
  • Pseudo-on-premise deployment is best for environments requiring complete isolation.

All three options deliver the same tagging depth, consistency, and accuracy. The difference lies entirely in how data moves, or doesn’t move, between systems.

Final thoughts

Using AI with music requires trust—trust that audio is handled responsibly, that ownership is respected, and that workflows adapt to real-world constraints rather than forcing compromises. Cyanite’s privacy-first architecture is designed to uphold that trust, whether you prefer cloud-based processing, a zero-audio pipeline, or a fully isolated pseudo-on-premise deployment.

If you’d like to explore which setup best fits your catalog, workflow, and compliance needs, you can review the available integration options.

FAQs

Q: Where is my audio processed when using Cyanite’s cloud setup?

A: In the standard setup, audio is processed on secure servers located in the EU and handled in full compliance with GDPR. Audio is not shared with third parties and remains your property at all times.

Q: Can I use Cyanite without sending audio files at all?

A: Yes. With the zero-audio pipeline, you convert audio locally into spectrograms and send only those abstract frequency representations to Cyanite. The original audio never leaves your environment, while full tagging results are still generated.

Q: What is the difference between the zero-audio pipeline and pseudo-on-premise deployment?

A: The zero-audio pipeline sends spectrograms to Cyanite’s cloud for analysis. The pseudo-on-premise deployment runs the Cyanite Audio Analyzer entirely inside your own AWS cloud infrastructure, which is cut off from the internet and only connected to your system. Pseudo-on-premises offers maximum isolation but only supports tagging, without search features.

Q: Are Similarity Search and Free Text Search available in all privacy setups?

A: Similarity Search, Free Text Search, and Advanced Search are available in cloud-based and zero-audio pipeline workflows. In fully pseudo-on-premise deployments, Cyanite focuses exclusively on tagging and metadata generation due to the isolated environment.

Q: Which privacy option is right for my catalog?

A: That depends on your internal security, legal, and compliance requirements. Teams with standard protection needs often use GDPR cloud processing. Those with higher sensitivity choose the zero-audio pipeline. Organizations requiring full isolation opt for pseudo-on-premise deployment. Cyanite supports all three.

Why AI labels and metadata now matter in licensing

A new industry report from Cyanite, MediaTracks, and Marmoset reveals how professionals are navigating the rise of AI-generated music. Read here.

AI’s move to the mainstream has changed what people expect from music catalogs. Licensing teams now look for clearer data about the music they review. They want to know whether it’s human-made or AI-generated, and they also look for details that help place the music in the right creative or cultural setting. Many check these cues first, then move on to mood or tone.

At Cyanite, we partnered with MediaTracks and Marmoset to understand the level of transparency and cultural context music licensing professionals expect when reviewing AI-generated music. MediaTracks and Marmoset surveyed 144 people across their professional communities—including music supervisors, filmmakers, advertisers, and producers—and we worked with them to interpret the findings and publish this report.

The responses revealed that most people want clear labeling when AI is involved. Yet despite this shared desire for transparency, only about half of the respondents said they would work exclusively with human-made music.

The full study goes deeper into these findings and shows how they play out in real licensing work.

Why we ran this study

We wanted a clear view of how people make decisions when AI enters the picture. The conversation around AI in music moves fast, and many teams now ask for context that helps them explain their selections to clients. This study aimed to find out which parts of the metadata give them that confidence.

It also looked at how origin details and creator context guide searches and reviews. We wanted to see where metadata supports the day-to-day licensing process and where there are gaps.

Transparency is now a baseline expectation

97% of respondents said they want AI-generated music to be clearly labeled, and 37% used the word “transparency” in their written responses. They want a straightforward read on what they’re listening to. Some tied this to copyright worries. One person put it simply: 

“I’m concerned that if it were AI-generated, where did the AI take the themes or phrases from? Possible copyright infringement issues.”

Transparency doesn’t just apply to the AI label. We found that respondents also see context as part of that clarity—knowing who made the music and where it comes from. This information helps them assess whether the music is a good fit for the project. They use it during searches to filter for cultural background or anything else that’s relevant to the brief.

What these findings mean for the industry

These findings show how much clarity now shapes day-to-day work in music catalogs. People expect AI music to be labeled accordingly, and they lean on context to move through searches and briefs without second-guessing their choices. Human-made music is still highly valued. The real change has been in how teams use origin details to feel sure about their selection.

This sets a new bar for how catalogs present their music. Teams want dependable information, including context that helps them avoid missteps in projects that depend on cultural accuracy or narrative alignment.

This finding ties into how Cyanite supports catalogs today. Our audio-first analysis gives people a clear read of the music itself, which sits alongside the cultural or creative context they already rely on. It helps teams search with more clarity and meet the expectations that are now shaping the industry.

How Cyanite’s advanced search fits in

The study showed how important cultural background and creator context are when people review music. Teams often keep their own notes and metadata for this reason. Cyanite’s Advanced Search supports that need by letting catalogs add and use their own custom information in the search.

Custom Metadata Upload, one of many features of our new Advanced Search, lets you upload your own tags (such as cultural or contextual details that don’t come from the audio analysis) and use them as filters. You can set your own metadata criteria first, and the system will search only within the tracks that match those inputs.

When you then run a Similarity Search or Free Text Search, the model evaluates musical similarity inside that filtered subset. As a result, search and discovery reflect both the sound of a track and the context around it.

You can search your catalog for “upbeat indie rock,” but you can also search for “upbeat indie rock, human-produced, female-led, one-stop cleared, independent.”
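
Conceptually, the flow is filter first, then rank: the custom metadata narrows the candidate set, and musical similarity orders what remains. A minimal, library-agnostic sketch of that idea (not Cyanite's actual API):

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    title: str
    tags: set = field(default_factory=set)  # your own metadata, e.g. {"human-produced", "one-stop cleared"}
    similarity: float = 0.0                 # assumed to come from an audio similarity model


def advanced_search(catalog: list, required_tags: set, top_k: int = 10) -> list:
    """Keep only tracks carrying all required tags, then rank them by similarity."""
    subset = [t for t in catalog if required_tags <= t.tags]
    return sorted(subset, key=lambda t: t.similarity, reverse=True)[:top_k]


# Hypothetical usage mirroring the query above.
results = advanced_search(
    catalog=[],  # would be your full library of Track objects
    required_tags={"human-produced", "female-led", "one-stop cleared", "independent"},
)
```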

Read the full report

The survey drew responses from people who license music often as part of their work and feel the impact of unclear metadata. Their answers show how they think about AI involvement, creator background, and the context they need when they search.

The full report brings these findings together with information about the study—who took part, how often they search, the questions they answered, and how responses differed by role. It also includes partner insights from MediaTracks and Marmoset, along with charts and quotes that show how transparency and context shape real choices in licensing.

You can read the full study here.

Guest post for Hypebot: how AI can generate new revenue for existing music catalogs

Our CEO Markus Schwarzer has published a guest post on the music industry medium Hypebot.

In this guest post, our CEO Markus elaborates on how AI can be used to resurface, reuse, and monetize long-forgotten music, addressing concerns about its impact on the music industry. By leveraging AI-driven curation and tagging capabilities, music catalog owners can extract greater value from their collections, enabling faster search, diverse curation, and the discovery of hidden music, while still protecting artists and intellectual property rights.

You can read the full guest post below or head over to Hypebot via this link.


by Markus Schwarzer, CEO of Cyanite

AI-induced anxiety is ever-growing.

Whether it’s the fear that machines will evolve capabilities beyond their coders’ control, or the more surreal case of a chatbot urging a journalist to leave his wife, paranoia that artificial intelligence is getting too big for its boots is building. One oft-cited concern, voiced in an open letter calling for a pause in AI development from a group of AI experts and researchers known as the Future of Life Institute, is whether, alongside mundane donkeywork, we risk automating more creative human endeavors.

It’s a question being raised in recording studios and music label boardrooms. Will AI begin replacing flesh and blood artists, generating music at the touch of a button?

While some may discount these anxieties as irrational and accuse AI skeptics of being dinosaurs who are failing to embrace the modern world, the current developments must be taken seriously.

AI poses a potential threat to the livelihood of artists and in the absence of new copyright laws that specifically deal with the new technology, the music industry will need to find ways to protect its artists.

We all remember when AI versions of songs by The Weeknd and Drake hit streaming services and went viral. Their presence on streaming services was short-lived, but it’s a very real example of how AI can potentially destabilise the livelihood of artists. Universal Music Group quickly put out a statement asking the music industry “which side of history all stakeholders in the music ecosystem want to be on: the side of artists, fans and human creative expression, or on the side of deep fakes, fraud and denying artists their due compensation.”

However, there are ways that AI can deliver real value to the industry – and specifically to the owners of large music catalogues. Catalogue owners often struggle with how to extract the maximum value out of the human-created music they’ve already got.

But we can learn from generative AI approaches. Prompt-based search experiences, recently introduced by AI systems like Midjourney, ChatGPT, or Riffusion, are quickly creeping into everyone’s user behavior. But instead of having to fall back on bleak replicas of human-created images, texts, or music, AI engines can give music catalogue owners the power to build comparable search experiences, with the advantage of surfacing well-crafted, great-sounding songs with a real human and a real story behind them.

There are vast archives of music of all genres lying dormant, and thousands of forgotten tracks within existing collections, that could be generating revenue via licensing deals for film, TV, advertising, trailers, social media clips and video games; from licences for sampling; or even as a USP for investors looking to purchase unique collections. It’s not a coincidence that litigation over plagiarism is skyrocketing. With hundreds of millions of songs around, there is a growing likelihood that the perfect song for any use case already exists and just needs to be found rather than mass-generated by AI.

With this in mind, the real value of AI to music custodians lies in its search and curation capabilities, which enable them to find new and diverse ways for the music in their catalogues to work harder for them.

How AI music curation and AI tagging work

To realize the power of artificial intelligence to extract value from music catalogues, you need to understand how AI-driven curation works.

Simply put, AI can do most things a human archivist can do, but much, much faster: processing vast volumes of content, and tagging, retagging, searching, cross-referencing and generating recommendations in near real-time. It can surface the perfect track – the one you’d forgotten, didn’t know you had, or would never have considered for the task in hand – in seconds.

This is because AI is really good at auto-tagging, a job few humans relish. It can categorise entire music libraries by likely search terms, tagging each recording by artist and title, and also by genre, mood, tempo and language. As well as taking on a time-consuming task, AI removes the subjectivity of a human tagger, while still being able to identify the sentiment in the music and make complex links between similar tracks. AI tagging is not only consistent and objective (it has no preference for indie over industrial house), it also offers the flexibility to retag as often as needed.

The result is that, no matter how dusty and impenetrable a back catalogue, all its content becomes accessible for search and discovery. AI has massively improved both identification and recommendation for music catalogues. It can surface a single song using semantic search, which identifies the meaning of the lyrics. Or it can pick out particular elements in the complexities of music in your library which make it sound similar to another composition (one that you don’t own the rights to, for example). This allows AI to use reference songs to search through catalogues for comparable tracks.
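
Under the hood, reference-based similarity search of this kind typically compares fixed-length audio embeddings. A minimal sketch of the idea, using random vectors in place of real embeddings (illustrative only, not any particular vendor's implementation):

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def most_similar(reference: np.ndarray, catalogue: dict, top_k: int = 5) -> list:
    """Rank catalogue tracks by how close their embeddings are to the reference track."""
    ranked = sorted(catalogue.items(),
                    key=lambda item: cosine_similarity(reference, item[1]),
                    reverse=True)
    return ranked[:top_k]


# Random vectors stand in for embeddings produced by an audio model.
catalogue = {f"track_{i}": np.random.rand(128) for i in range(1000)}
reference = np.random.rand(128)
print([name for name, _ in most_similar(reference, catalogue)])
```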

The power of AI music catalog search

The value of AI to slice and dice back catalogs in these ways is considerable for companies that produce and licence audio for TV, film, radio and multimedia projects. The ability to intelligently search their archives at high speed means they can deliver exactly the right recording to any given movie scene or gaming sequence.

Highly customisable playlists culled from a much larger catalogue are another benefit of AI-assisted search. While its primary function is to allow streaming services such as Spotify to deliver ‘you’ll like this’ playlists to users, for catalogue owners it means extracting infinitely refinable sub-sets of music which can demonstrate the archive’s range and offer a sonic smorgasbord to potential clients.

Another major value-add is the extraction of ‘hidden’ music. The ability of AI to make connections based on sentiment and even lyrical hooks and musical licks, as well as tempo, instruments and era, allows it to match the right music to any project with speed and precision only the most dedicated catalogue curator could match. With its capacity to search vast volumes of content, AI opens the entirety of a given library to every search, and surfaces obscure recordings. Rather than just making money from their most popular tracks, therefore, the owners of music archives can make all of their collection work for them.

The tools to do all of this already exist. Our own solution is a powerful AI engine that tags and searches an entire catalogue in minutes with depth and accuracy. Meanwhile, AudioRanger is an audio recognition AI which identifies the ownership metadata of commercially released songs in music libraries. And PlusMusic is an AI that makes musical pieces adaptive for in-game experiences. As the gaming situation changes, the same song will then adapt to it.

Generative AI – time for careful reflection

The debate on the role of generative AI in the music industry won’t be settled anytime soon, and it shouldn’t be. We should reflect carefully on the incorporation of any technology that might potentially reshape our industry. We should ask questions such as: how do we protect artists? How do we use the promise of generative AI to enhance human art? What are the legal and ethical challenges that this technology poses? All of these issues must be addressed in order for the industry to reap the benefits of generative AI.

Adam Taylor, President and CEO of the American production music company APM Music, shared with me that he believes it is vital to safeguard intellectual property rights, including copyright, as generative AI technologies grow across the world. As he puts it: “While we are great believers in the power of technology and use it throughout our enterprise, we believe that all technology should be used in responsible ways that are human-centric. Just as it has been throughout human history, we believe that our collective futures are intrinsically tied to and dependent on retaining the centrality of human-centered art and creativity.”

The debate around the role of generative AI models will continue to play out as we look for ways to embrace new technologies and protect artists, and naturally there are those like Adam who will wish to adopt a cautious approach. But while many are reluctant to wholeheartedly embrace generative AI models, many more are willing to embrace analysis and search AI to protect their catalogues and make them more efficient and searchable.

Ultimately, it’s down to the industry to take control of this issue, find a workable level of comfort with AI capabilities, and build AI-enhanced music environments that will vastly improve the searchability – and therefore usefulness – of existing, human-generated music.

If you want to get more updates from Markus’ view on the music industry, you can connect with him on LinkedIn here.

 

More Cyanite content on AI and music

Debating the upsides of Universal Music Group’s recent AI attack (guest post on Music Ally)

Our CEO Markus Schwarzer has published a guest post on UK-based music industry medium Music Ally. In the post, Markus addresses the concern that major labels and other large music companies have shown recently about the use of Artificial Intelligence in music and business – and the importance of stepping back and thinking carefully about as-yet unknown repercussions, before moving into a future where AI benefits us all.

You can read the full guest post below or head over to Music Ally via this link.

In recent months, Universal Music Group has become the ringleader of a front that has formed against generative music AI companies – and latterly all AI companies.

After news made the rounds of UMG’s recent actions, people everywhere (including myself) spoke out about the positives of AI. AI has the potential to improve art, create a better environment for DIY artists, and foster new musical ecosystems. However, whilst the industry was debating the prosperous future of music fuelled by AI, with leveled playing fields, democratised access, and transparency, we forgot one thing. All of these positive outcomes might be true in the future, but the current reality of generative AI is different.

Currently, it is an uncontrolled wild west where new models have shown that they’re not just some game for the tech-interested individuals among us, but an actual threat to the livelihoods of artists.

Reading through and experimenting with recent generative music AI advancements, I can’t help but feel reminded of Pause Giant AI Experiments: An Open Letter, which was directed at developers of large language models (LLMs) like OpenAI’s GPT-4 or Meta’s LLaMA. It urged them to halt their developments and think about the implications of their projects for at least six months.

The open letter made some requests which are equally applicable to the music industry. Just like LLMs, some generative music startups see themselves “locked in an out-of-control race to develop and deploy ever more powerful digital minds”. Just like LLMs we may run into the risk that “no one – not even their creators – can understand, predict, or reliably control” them. Just like LLMs, we need to ask ourselves “Should we automate away all the jobs, including the fulfilling ones?”

The latter is a question that we at Cyanite and other AI companies also have to ask ourselves frequently. Do we automate meaningful jobs, or just tedious unloved chores to free up time for creative work?

But unlike LLMs, the music industry has copyright law to enforce the temporary halt of new training models (at least in those areas where it is enforceable). So what if the UMG-attempted halt of new generative AI training allows us to take a step back and try to get an objective perspective on recent developments? This is something that is not possible with LLMs, because training data is so much more accessible and less controllable, which is the reason people have to write open letters in the first place – a strategy with somewhat questionable prospects of success.

Many in the industry have criticised UMG’s approach as a general barrage of fire launched at any company working with AI, in the hope of hitting some of their targets; one that will ultimately also harm companies working on products beneficial for the industry, while also eventually forcing advancements in the generative space into the uncontrollable underground.

Even if this is undoubtedly true, we can’t deny that it has sparked a very important debate on whether we need to slow down the acceleration of AI. I would argue that if UMG’s actions let us pause AI for a second, take a deep breath, imagine the future of music AI, and then start developing towards exactly that goal, they would have a hugely positive effect.

If you want to get more updates from Markus’ view on the music industry, you can connect with him on LinkedIn here.

AI panel: using AI music search in a co-creative approach between human and machine

In September 2022, Cyanite co-founder Markus Schwarzer took part in a panel discussion at the Production Music Conference 2022 in Los Angeles.

The panel topic was the role of AI in a co-creative approach between humans and machines. The participants included Bruce Anderson (APM Music), Markus Schwarzer (Cyanite), Nick Venti (PlusMusic), Philippe Guillaud (MatchTune), and Einar M. Helde (AIMS API).

The panel raised pressing discussion points on the future of AI so we decided to publish our takeaways here. To watch the full video of the panel, scroll down to the middle of the article. Enjoy the read! 

Human-Machine Co-creativity

AI performs many tasks that are usually difficult for people, such as analyzing song data, extracting information, searching music, and creating completely new tracks. As AI usage increases, questions have been raised about AI’s potential and its ability to create with humans or on its own. The possibility of AI replacing humans is, perhaps, one of the most contentious topics.

The PMC 2022 panel focused on the topic of co-creativity. Some AI systems can create on their own, but co-creativity means creativity shared between the human and the machine.

So it is not the sum of individual creativity; rather, it is the emergence of new forms of interaction between humans and machines. To find out all the different ways AI music search can be co-creative, let’s dive into the main takeaways from the panel:

Music industry challenges

The main music industry challenge that all participants agreed on was the overwhelming amount of music produced these days. Another challenge is reaching a shared understanding of music.

The way someone searches for music depends on their understanding of music, which can differ widely, and on their role in the music industry. Music supervisors, for example, use a different language to search for music than film producers.

We discussed this in detail on the Synchtank blog back in May 2022. AI can help solve these issues, especially with the new developments in the field.

Audience Question from Adam Taylor, APM Music: Where do we see AI going in the next 5 years?

So what’s in store for music AI in the next 5 years? We’re entering a post-tagging era, shaped by several converging developments in music search. Keyword search will no longer be the main way to search for or index music. Instead, the following developments will take place:

 

  • Similarity Search has shown that we can use complex inputs to find music. Similarity search pulls a list of songs that match a reference track. It is projected to be the primary way of searching for music in the future. 

 

  • Free Text Search – full-text search that lets you describe music in your own words, based on natural language processing technologies. With a Free Text Search, you enter what comes to mind into a search bar and the AI suggests a song. This is similar to technologies like DALL-E or Midjourney that return an image based on text input.

 

  • Music services that already know what to do – looking further ahead, music services will emerge that recommend music depending on where you are in your role or personal development. These services will cater to all levels of search: from an amateur level that simply returns a requested song, to expert searches following an elaborate sync brief, including images and videos that accompany the brief, or even a stream of consciousness.

Audience Question from Alan Lazar, Luminary Scores: Can I decode which songs have the potential to be a hit?

While some AI companies have attempted to decode the hit potential of music, it is still unclear whether there is any way to determine if a song will become a hit.

The nature of pop culture and the many factors that make up a hit, from songwriting and production to elusive factors such as what the song is connected to, make it impossible to predict whether or not a song will become a hit.

The vision for AI from Cyanite – where would we like to see it in the future?

AI curation in music is developing at lightning speed. We’re hoping that it will make the music space more exciting and diverse, which includes in particular:

 

  • Democratization and diversity of the field – more opportunities will become available for musicians and creators, including democratized access to sync opportunities and other ways to make a livelihood from music. 

 

  • Creativity and surprising experiences – right now, AI is designed to do the same tasks at rapid speed. We’re hoping AI will be able to perform tasks co-creatively and produce surprising experiences based on music, but also on other factors. As music has the ability to touch people’s emotions directly, it has the potential to be part of a greater narrative.


Video from the PMC 2022 panel: Using AI Music Search In A Co-Creative Approach Between Human and Machine

Bonus takeaway: Co-creativity between users and tech – supplying music data to technology

It seems that we should be able to pull all sorts of music data from environments such as video games and user-generated content. However, the diversity of music projects is quite astonishing.

So when it comes to co-creativity in the form of enhancing machine tagging with human tagging, personalization can be harmful in B2B settings. In B2B, AI mainly works with audio features, without the involvement of user-generated data.

Conclusion

To sum up, AI can co-create with humans and solve the challenges facing the music industry today. There is a lot in store for AI’s future development and there is a lot of potential.

Still, AI is far from replacing humans and should not replace them completely. Instead, it will improve in ways that make music searches more intuitive and co-creative, responding to human input in the form of a text search, image, or video.

As usual with AI, some people overestimate what it can do. Some tasks, such as identifying a song’s hit potential, remain out of reach for AI.

On the other hand, it’s not hard to envision the future where AI can help democratize access to opportunities for musicians and produce surprising projects where music will be a part of a shared emotional experience.

We hope you enjoyed this read and learned more about AI co-creativity and the future of AI music search. If you’re interested in learning more, you can also check out the article “The 4 Applications of AI in the Music Industry”. If you have any feedback, questions, or contributions, please reach out to markus@cyanite.ai.

I want to integrate AI search into my library – how can I get started?

Please contact us with any questions about our Cyanite AI via mail@cyanite.ai. You can also directly book a web session with Cyanite co-founder Markus here.

If you want to get the first grip on Cyanite’s technology, you can also register for our free web app to analyze music and try similarity searches without any coding needed.