Why context matters in a crowded music catalog

Add structured context to your discovery workflows with Cyanite’s Advanced Search.

Every week, thousands of new tracks enter music libraries. There’s no real limit to how many can be uploaded. As catalogs expand, it becomes harder to tell why one piece deserves attention over another.

At the same time, generative AI tools make it possible to produce a lot of music quickly and cheaply. This means catalogs can get flooded with “AI slop,” a term used to describe mass-produced generative content created for volume rather than quality.

Context is key to making the distinction between music and AI slop. Knowing who created a track, what shaped it, and why it exists roots it in human experience and creative intent. Without that layer of insight, music becomes interchangeable audio, reduced to tags and search terms.

What makes music human?

The intention behind music and the social connection it creates are what make it human.

That humanity is visible in the decisions that shape a track. No matter how minimal or elaborate a composition is, every musical choice reflects human knowledge and experience. The key, rhythm, production, and instruments used are all guided by cultural exposure, emotional memory, and learned musical language.

And then there’s the risk. When someone releases music, they also release control over how it will be heard and judged. That exposure is vulnerable, and recognizing the risk and context behind a piece makes the connection to it stronger.

The AI limitation

When intention and personal stakes are missing, the difference is noticeable.

AI-generated music can sound like human-made tracks. It can replicate style, structure, and production detail with striking accuracy. In many contexts, it even meets professional standards.

However, it doesn’t come from lived experience and instead reconstructs patterns it has learned from existing music. There’s no vulnerability behind the track. There’s no social stake. And there’s no personal history shaping the decision to produce it. The output is coherent because the sequence fits statistically, not because something needed to be expressed.

Why context matters more than ever

With the sheer volume of modern catalogs, several tracks can sound interchangeable. You can work on a brief and find 10 pieces that would technically meet the requirements. 

What actually helps you choose is knowing where the music comes from and who made it. That extra layer of information changes how you hear it. In a space this crowded, context is what keeps everything from blending into the same background noise.

The potential of contextual metadata

If context gives music meaning, it needs to be structured as metadata so it can be searched and filtered at scale.

Custom tagging makes that possible. Catalogs can include fields for artist origin, geography, creative background, cultural context, and editorial positioning. When that information can be filtered, it starts shaping decisions. Context moves from description to action.

The same principle applies to one of the clearest distinctions in modern catalogs: whether a track is human-created or AI-generated. When that difference is structured as metadata, it becomes searchable inside existing discovery systems.

Melodie Music puts this into practice to spotlight original Australian artists. They combine Cyanite’s sound-based AI search with their own editorial and contextual metadata.

  • Cyanite analyzes the sound of a reference track and generates a shortlist based on emotional profile and sonic character.
  • Melodie layers contextual filters, such as artist origin, on top of those results.
  • Users refine further using editorial tags aligned with cultural or strategic priorities.
  • The final selection satisfies both the creative brief and the mandate to support specific artist communities.

What this means for music discovery

Algorithmic recommendations alone aren’t enough. Teams want clarity about origin, authorship, and AI involvement before committing to a track. 

In our joint study with MediaTracks and Marmoset, we found that contextual metadata plays a central role in how professionals work through briefs. Respondents described relying on origin details and creator background to avoid misalignment and explain their choices to clients. 

Clearly labeling AI involvement was part of that same expectation. Professionals are open to working with AI-generated music, but they want to know explicitly whether a track is AI-generated or human-made. Context, including transparency around authorship, informs decisions.

Read more: Why AI labels and metadata now matter in licensing

Cyanite’s Advanced Search, available via API integration, allows teams to upload their own custom metadata fields and use them for filtering. Fields can include artist origin, cultural background, clearance information, and editorial categories.

Search queries then run within that defined subset, so sound analysis operates inside contextual boundaries set by the catalog owner, as implemented by Melodie Music.
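As a rough illustration of the filter-then-search order described above, the following Python sketch narrows a toy catalog by custom metadata and only then ranks the remaining tracks by sound similarity. It does not call Cyanite's API. The `Track` class, the `artist_origin` and `authorship` field names, the two-dimensional embeddings, and the cosine ranking are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Track:
    title: str
    embedding: list[float]  # hypothetical stand-in for a sound representation
    metadata: dict[str, str] = field(default_factory=dict)  # custom fields


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_similarity_search(catalog, reference, filters, top_k=3):
    # Step 1: keep only tracks whose metadata matches every filter.
    subset = [
        t for t in catalog
        if all(t.metadata.get(key) == value for key, value in filters.items())
    ]
    # Step 2: rank the remaining tracks by similarity to the reference sound.
    ranked = sorted(subset, key=lambda t: cosine(t.embedding, reference), reverse=True)
    return ranked[:top_k]


# Made-up catalog entries, for illustration only.
catalog = [
    Track("Track A", [0.9, 0.1], {"artist_origin": "AU", "authorship": "human"}),
    Track("Track B", [0.8, 0.2], {"artist_origin": "US", "authorship": "human"}),
    Track("Track C", [0.1, 0.9], {"artist_origin": "AU", "authorship": "human"}),
    Track("Track D", [0.95, 0.05], {"artist_origin": "AU", "authorship": "ai"}),
]

results = filtered_similarity_search(
    catalog,
    reference=[1.0, 0.0],
    filters={"artist_origin": "AU", "authorship": "human"},
)
print([t.title for t in results])
```

In this toy run, "Track D" is the closest sonic match to the reference but never reaches the ranking step, because the authorship filter removes it first. That is the practical meaning of sound analysis operating inside boundaries set by the catalog owner.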

For platforms embedding Cyanite’s search algorithms into their own systems, this enables structured transparency at scale. Context becomes part of the discovery logic itself.

Choosing meaning over noise

We have always connected to music because it carries intention, experience, and emotion – not just sound. A song means something because it was created in a specific moment, for a reason, by someone responding to their world. Today, we are surrounded by more music than ever, inevitably making it harder to feel that connection. Delivering context to a song gives a glimpse into what went into it, and with it a chance to understand the people and feelings behind the music.

Even though AI-generated music can sound pleasant, it is fundamentally an imitation – a reconstruction of patterns it has seen before. It lacks intention, situation, risk, and personal stake.

That’s why context matters more than ever. Knowing why a piece of music exists, where it comes from, and what went into it is what turns sound into something meaningful.

Simon Timm, Music Producer and Marketing Expert, Cyanite

Context can double as infrastructure in catalogs. As AI-generated music becomes easier to produce and distribute, what will separate human-made tracks from AI slop is whether a track’s origin is visible and understood. Catalogs that structure and surface contextual metadata can ensure music is selected based on where it comes from and why it exists, not just how it sounds.

Ready to add context to your discovery workflows?

FAQs

Q: How can contextual metadata help distinguish human-created music from AI-generated tracks?

A: Contextual metadata adds information beyond sound analysis, such as artist background, origin, editorial positioning, and authorship labeling. It allows teams to filter catalogs based on transparency and creative intent, helping distinguish human-created music from generative content produced at scale.

Q: Does Cyanite detect whether music is AI-generated or human-made?

A: Cyanite is developing AI music detection capabilities designed to support transparent catalog workflows. Early implementations allow teams to label and filter tracks based on AI involvement, helping licensing professionals and curators make informed decisions during discovery.

Q: Can Cyanite’s Advanced Search filter music using custom metadata fields?

A: Yes. Advanced Search allows catalog owners to include their own metadata fields as filters within search queries. These filters narrow the searchable catalog before sound similarity or text-based matching is applied, helping teams surface results that fit their creative and business requirements.

Q: How can music platforms integrate contextual discovery into existing workflows?

A: Music catalogs can integrate Cyanite’s Advanced Search through the API, making it possible to combine sound analysis with custom metadata filters inside their existing workflows.

How to prompt: the guide to using Cyanite’s Free Text Search

Ready to search your catalog in natural language? Try Free Text Search.

Do you have trouble translating your vision for music into precise keywords? If so, this guide on how to prompt using Cyanite’s Free Text Search is for you.

It’s a more natural way to search your music catalog and discover tracks. You can use complete sentences to describe soundscapes, film scenes, daily situations, activities, or environments. Prompts can be written in different languages and can include cultural references, so you’re not forced to reduce your idea to a fixed set of tags.

Before you explore what Free Text Search can do, keep in mind that prompt-based search works best when your input is specific. The clearer you are, the easier it is to find what you’re looking for. 

Read more: What is music prompt search?

Why music catalogs struggle with discovery

Most large catalogs contain inconsistent metadata. Many were built before modern tagging standards, then expanded over time through different workflows. New music arrives faster than metadata teams can standardize it, especially with the volume from UGC and AI-generated releases, while older tracks remain described in ways that don’t always support how music is searched for today.

Traditional search relies on tags and keyword logic. This approach can be effective for many searches, but it has limits when ideas are already highly specific, like with a detailed creative brief or a particular scene description. Translating concrete, nuanced needs into tags often loses critical details and context.

That’s where natural language search makes a difference. Instead of defining a specific vision in terms of available tags, you can describe what you need directly or even paste a brief into the search bar. The system interprets intent, mood, and context in ways that complement tag-based discovery.

This helps sync and licensing teams work faster with detailed requests, and gives catalog teams another tool to surface relevant music, especially from underused parts of the catalog.

Read more: How to use AI music search for your music catalog

How Free Text Search amplifies music discovery

Free Text Search lets you look for music in the way you would naturally describe it. Write detailed prompts in full sentences, and Cyanite’s AI interprets the meaning behind your words to match intent with how tracks actually sound in your catalog.

This type of search is designed for situations where intent doesn’t translate cleanly into keywords. Tag-based searches work well when attributes are fixed and clearly defined, and Similarity Search is useful when you already have a reference track and want to find music that sounds close to it. Teams often get good results when they search in their own words first, then move into other search modes to refine the selection.

How to use Free Text Search effectively

In real-life workflows, searches rarely begin from the same place. Sometimes you’ll start with sound, sometimes with a scene, and sometimes with context. 

Not every idea can be reduced to tags or tied to a specific track. Choosing music is a creative process, so the way people search is often creative too. Free Text Search meets users where they are, allowing them to describe intent in natural language and shape discovery around how they think. 

1. Describing sound

With Free Text Search, you can add context and even cultural references to your search, making it possible to find the perfect soundtrack for your project and get the most out of your music catalog. 

This approach is commonly used when responding to sync briefs that describe musical detail and tone.

Sound-focused prompts should name what musical elements are present, then add how those elements are played or arranged. An extra cue about character or attitude can be included when it helps clarify intent.

[Instruments or sound sources] + [how they are played or arranged] + [optional: character or stylistic cue]

  • “Trailer with sparse repetitive piano and dramatic drum hits with Star-Wars-style orchestra themes”
  • “Laid-back future bass with defiant female vocal”
  • “Staccato strings with a piano playing only single notes”
  • “Solo double bass played dramatically with a bow”

These prompts work because they are specific, but not rigid. That level of detail helps surface relevant tracks faster and reduces reliance on perfectly maintained tags, which is especially valuable in large or uneven catalogs.
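For teams that generate prompts programmatically, for example from fields in a brief, the template above can be sketched as a small helper. This is only an illustration of the structure: `build_sound_prompt` and its parameter names are hypothetical and not part of any Cyanite tooling.

```python
def build_sound_prompt(elements: str, arrangement: str, cue: str = "") -> str:
    """Join the template parts into one descriptive sentence fragment.

    elements    -- instruments or sound sources
    arrangement -- how they are played or arranged
    cue         -- optional character or stylistic cue
    """
    parts = [elements.strip(), arrangement.strip()]
    if cue.strip():  # the cue is optional, so skip it when empty
        parts.append(cue.strip())
    return " ".join(parts)


prompt = build_sound_prompt("Solo double bass", "played dramatically with a bow")
print(prompt)
```

Keeping the cue optional mirrors the template: the first two parts carry the description, and the third is added only when it clarifies intent.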

Common mistakes to avoid

  • Staying too abstract: Words like “cinematic” or “emotional” on their own don’t give enough information to form a clear sound.
  • Listing elements without context: Naming instruments or genres without describing how they are played or arranged often leads to broad results.
  • Overloading the prompt: Packing too many ideas into one sentence can blur intent and pull results in different directions.
  • Writing like a tag list: Free Text Search works best when the prompt reads like a description, not a stack of keywords.

Read more: AI search tool for music publishing: best 3 ways

2. Describing film scenes

Film scenes can evoke a wide range of emotions and visuals. When using Free Text Search for this purpose, consider whether your prompt captures objective elements of the scene or your own interpretation of it.

Publishers often use scene-based prompts to explore deeper parts of their catalog and surface music suited to narrative use cases beyond obvious genre labels.

You can reference popular movies or shows like Pirates of the Caribbean or Stranger Things in your search prompts.

It helps to think like a director. Focus on the action or moment in the scene and what the viewer is experiencing. The clearer the image you describe, the easier it is for the search to interpret what kind of music belongs there, without needing a list of musical traits.

[Action or moment] + [optional: setting or situation] + [optional: stylistic cue]

  • “Riding a bike through Paris”
  • “Thriller score with Stranger-Things-style synths”
  • “Tailing the suspect through a Middle Eastern bazaar”
  • “The football team is getting ready for the game”

An example result for the prompt: “Riding a bike through Paris”

These prompts work because they describe a cinematic moment rather than a list of musical characteristics. A scene like “riding a bike through Paris” suggests a certain musical style and progression, which helps frame how the music should unfold. That context gives Free Text Search a clearer sense of what the track needs to communicate.

To fine-tune your search, add different keywords, like “orchestral,” “industrial rock,” or “hip-hop,” to steer it in the direction you want.

Common mistakes to avoid

  • Writing scenes that only make sense to you personally: Prompts should be interpretable without extra explanation.
  • Dropping the visual context: Turning a scene into a genre description removes what makes this approach effective.
  • Using obscure references: If the reference is not widely known, it may not clarify the scene.

3. Describing activities, situations, and moods

Free Text Search empowers you to be as specific as your project demands. You can describe when and where music will be heard, and what it should communicate. Combining activity, situation, and mood helps direct discovery toward abstract or niche ideas that don’t translate cleanly into tags, making it easier to surface music that fits its intended use.

When writing the prompts, focus on how the music will be used and what it needs to communicate in that situation. Providing clear usage context helps the search narrow results without requiring detailed musical instruction.

[Style or sound] + [intended use or context] + [optional: tone or functional role]

  • “Latin trap for fitness streaming catalog”
  • “Mellow California rock for sports highlight content”
  • “Colorful pop music for lifestyle brand campaign”
  • “Subtle ambient textures for background use”

Example result for the prompt: “Mellow California rock for a road trip”

Common mistakes to avoid

  • Leaving out the use case: Mood alone often leads to broad results without direction.
  • Mixing conflicting contexts: Background use and high-impact language can work against each other.
  • Lack of clarity: When the prompt doesn’t include enough context, results stay generic.

Free Text Search is available in the Cyanite web app. You can test prompts, explore results, and refine searches in minutes.

Using prompts to improve discovery

With Free Text Search, you can explore your music catalog using detailed descriptions. This lets you search based on how music is described in real projects, making it easier to find tracks that fit a specific brief, scene, or use case.

Whether you’re pitching music for sync, artists, or labels, looking to underscore a film scene, or setting the mood for an activity, Free Text Search empowers you to explore music in a whole new way.

As you craft your prompts, try to be specific and objective, as this will return better results. Use concrete details like instruments, playing styles, and specific scenes or activities. 

You already have the resources in your catalog. Free Text Search helps you access them more effectively.

Everything you’ve ever wanted to know about Cyanite (answering your FAQs)

Ready to explore your catalog? Sign up for Cyanite.

As music catalogs grow, finding the right track gets harder. Metadata doesn’t always keep up, but teams are still expected to deliver fast, reliable results.

Libraries, publishers, sync teams, and the technical leads supporting them need systems that make large catalogs easier to understand and search. Cyanite is designed to support that work.

This guide provides a clear, high-level introduction to how Cyanite works and how it’s used in practice, giving teams a simple starting point before diving deeper into specific topics.

Learn more: Explore our FAQs to dig deeper into how Cyanite works.

The problem of scaling modern music catalogs

Once a catalog reaches a certain size, searching it becomes an inconsistent process. Music is described through tags and metadata that were added by different people, at different times, often for different needs. As the catalog grows, those descriptions stop lining up, which makes tracks harder to compare and surface reliably.

Over time, the same song can become discoverable in one context and invisible in another. Familiar tracks tend to show up first, while large parts of the catalog stay beneath the surface simply because their sound isn’t clearly represented in the data.

Scaling a modern music catalog means creating a shared, consistent way to describe sound, so music can be worked with confidently across teams and workflows, no matter how large the catalog becomes.

What Cyanite is (and what it is not)

Cyanite is an intelligent music system that works directly with sound. It analyzes each track and translates what can be heard into structured information that stays consistent across the catalog. That information is used both to tag music automatically and support sound-based search.

Teams can use Cyanite through the web app, integrate it into their own systems via an API, or access it directly within supported music CMS environments.

Cyanite is not a replacement for listening or creative judgment. It doesn’t decide what should be used, pitched, or licensed. It provides a consistent, sound-based foundation that helps teams work with music at scale while keeping human decision-making at the center.

How Cyanite analyzes music

Cyanite analyzes music through sound, not user behavior. Instead of relying on plays, clicks, or listening history, it focuses on the audio itself and produces a consistent, reliable sound description. This means each piece of music enters the system under the same logic, regardless of when it was added or who uploaded it.

Read more: How do music recommendation systems work?

Core capabilities

At its core, Cyanite helps teams organize and work with large music catalogs through music tagging and search. The same audio-based logic applied to every track creates consistent descriptions and keeps music easy to find, compare, and explore, even as catalogs grow.

A table showing Cyanite's AI-Tagging Taxonomy

To make large catalogs easier to work with, Cyanite applies consistent labeling based on each track’s full audio.

  • Auto-Tagging analyzes the audio to generate metadata like genre, mood, and tempo.
  • Auto-Descriptions generate concise, neutral descriptions that highlight how a track sounds and give teams quick context without having to listen first.

Sound-based search: Similarity, Free Text, and Advanced Search

To help teams find music, Cyanite offers multiple ways to search a catalog. 

  • Similarity Search finds tracks with a similar sound to a reference song, whether it’s from your catalog, an uploaded file, or a YouTube preview. It’s often a good fit when a brief starts with a musical reference rather than a written description.
  • Free Text Search allows teams to describe music in natural language, including full sentences and prompts in different languages. It then matches that intent to sound in the catalog.
  • Advanced Search, available through the API as an add-on for Similarity and Free Text Search, adds more control as searches become more specific. It enables filters and visibility into why tracks appear in the results, making it easier to refine and compare matches.

Privacy-first, IP-safe audio analysis

Cyanite is built for professional music catalogs, with all data processed and stored on servers in the EU in line with GDPR. Audio files are stored securely, can be deleted at any time on request, and are not shared with third parties. All analysis and search algorithms are developed in-house. For additional protection, Cyanite also supports spectrogram-based uploads, allowing audio to be analyzed without being reconstructable into playable sound.

How teams combine AI and human expertise

Cyanite is used for organizing, pitching, searching, and curating a catalog. Automation applies a consistent, sound-based foundation across every track, while teams add context, intent, and custom metadata where it matters. 

Because there are clear limits to what can be inferred from audio alone, most teams adopt a hybrid approach to their work. They use Cyanite to keep catalogs structured and searchable at scale, while human input shapes how the music is ultimately used.

How Cyanite fits into existing catalog systems

Cyanite is used at the point where teams need to explore a catalog for a pitch, brief, or curation task. It applies a consistent, sound-based foundation across all tracks, so decisions can be informed by reliable discovery results. With technology supporting the process, teams can confidently listen, compare, and narrow options, applying human judgment to make the selection.

Where to go deeper

Now that we’ve covered the basics, you can explore specific parts of Cyanite in more detail in the following articles:

Getting started with Cyanite

To evaluate Cyanite, the simplest starting point is a track sample analysis. Many teams begin with a small set of tracks to review tagging results and search behavior before deciding whether to scale further. This makes it easy to validate fit without committing a full catalog upfront.

For teams building products or integrating search into their own tools, integrating our API is a hands-on way to explore analysis, tagging, and similarity search in a live environment. You can create an API integration for free after registering via the web app.

When preparing for a larger evaluation, a bit of structure helps. Audio should be provided in MP3 and grouped into clear folders or batches that reflect how the catalog is organized. Most teams start with a representative subset and expand in phases once results and timelines are clear. If you are not able to deliver your music as MP3 files, reach out to support@cyanite.ai.

Can Meta’s audio aesthetic model actually rate the quality of music?

Last year, Meta released Audiobox Aesthetics (AES), a research model that proposes scoring audio based on how people would rate it. The model outputs four scores: Production Quality (PQ), Production Complexity (PC), Content Enjoyment (CE), and Content Usefulness (CU). 

The study suggests that audio aesthetics can be broken into these axes, and that a reference-free model can predict these scores directly from audio. If that holds, the scores could start informing decisions and become signals people lean on when judging music at scale.

I took a closer look to understand how the model frames aesthetic judgment and what this means in practice. I ran Audiobox Aesthetics myself and examined how its scores behave with real music.

What Meta’s Audiobox Aesthetics paper claims

Before jumping into my evaluation, let’s take a closer look at what Meta’s Audiobox Aesthetics paper set out to do.

The paper introduces a research model intended to automate how audio is evaluated when no reference version exists. The authors present this as a way to automate listening judgments. They describe human evaluations as costly and inconsistent, leading them to seek an automated alternative.

To address this need, the authors propose breaking audio evaluation into four separate axes and predicting a separate score for each:

  • Production Quality (PQ) looks at technical execution, focusing on clarity and fidelity, dynamics, frequency balance, and spatialization.
  • Production Complexity (PC) reflects how many sound elements are present in the audio.
  • Content Enjoyment (CE) reflects how much listeners enjoy the audio, including their perception of artistic skill and overall listening experience.
  • Content Usefulness (CU) considers whether the audio feels usable for creating content.

The model is trained using ratings from human listeners who follow the same guidelines across speech, music, and sound effects. It analyzes audio in short segments of around 10 seconds. For longer tracks, the model scores each segment independently and provides an average. 
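The segment-and-average behavior described above can be sketched in a few lines of Python. This is not Meta's implementation: `placeholder_scorer` is a made-up stand-in for the real model, and the handling of the final, shorter segment and the unweighted mean are assumptions, since those details are not covered here.

```python
from statistics import mean

AXES = ("PQ", "PC", "CE", "CU")


def split_into_segments(samples, sample_rate, segment_seconds=10.0):
    """Cut a list of samples into consecutive windows of about 10 seconds."""
    size = int(sample_rate * segment_seconds)
    # Assumption: a shorter final segment is kept and scored like the others.
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def score_track(samples, sample_rate, score_segment):
    """Score each segment independently, then average per axis."""
    segments = split_into_segments(samples, sample_rate)
    per_segment = [score_segment(seg) for seg in segments]
    return {axis: mean([s[axis] for s in per_segment]) for axis in AXES}


def placeholder_scorer(segment):
    """Hypothetical scorer: uses mean absolute level for all four axes."""
    level = mean([abs(x) for x in segment])
    return {axis: level for axis in AXES}


# Toy signal: 30 "seconds" at 1 sample per second, silent then loud.
samples = [0.0] * 20 + [1.0] * 10
track_scores = score_track(samples, sample_rate=1, score_segment=placeholder_scorer)
print(track_scores)
```

One consequence of this design is visible even in the toy example: a track-level score is an average over windows, so a track that changes character halfway through is summarized by a single number per axis.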

Beyond the audio itself, the model has no additional context. It does not know how a track is meant to be used or how it relates to other music. According to the paper, the scores tend to align with human ratings and could help sort audio when it’s not possible to listen to it all. In that way, the model is presented as a proxy for listener judgment.

Why I decided to evaluate the model

I wasn’t the only one who was curious to look into this model. Jeffrey Anthony’s “Can AI Measure Beauty? A Deep Dive into Meta’s Audio Aesthetics Model,” for instance, offers a deep, philosophical examination of what it means to quantify aesthetic judgment, including questions of ontology and judgment. I decided to take a more hands-on approach, testing the model on real-world examples to see whether any interesting patterns emerge in its predictions.

What caught my attention most was how these scores are meant to be used. Once aesthetic judgments are turned into numbers, they start to feel reliable. They look like something you can sort by, filter on, or use to decide what gets heard and what gets ignored.

This matters in music workflows. Scores like these could influence how catalogs are cleaned up, how tracks are ranked for sync, and how large libraries of music are evaluated without listening. With a skeptical but open mindset, I set out to discover how these scores behave with real-world data.

What I found when testing the model

A) Individual-track sanity checks

I began with a qualitative sanity check using individual songs whose perceptual differences are unambiguous to human listeners. The tracks I selected represent distinct production conditions, stylistic intentions, and levels of artistic ambition.

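I included four songs:

  • “Funky Town” (low-quality MP3)
  • “Giorgio by Moroder”
  • “Blue Calx” by Aphex Twin
  • “Schumacher Song” by DJ Visage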

The motivation for this test was straightforward. A model claiming to predict Production Quality should assign a lower PQ to “Funky Town” (low-quality MP3) than to “Giorgio by Moroder.” A model claiming to estimate production or musical complexity should recognize “Blue Calx” by Aphex Twin as more complex than formulaic late-90s pop-trance such as DJ Visage’s “Schumacher Song.” Likewise, enjoyment and usefulness scores should not collapse across experimental electronic music, audiophile-grade disco-funk, old-school pop-trance, and degraded consumer audio.

You can see that the resulting scores, shown in the individual-track comparison plot above, contradict these expectations. “Funky Town” receives a PQ score only slightly lower than “Giorgio by Moroder,” indicating near insensitivity to codec degradation and mastering fidelity. Even more strikingly, “Blue Calx” is assigned the lowest Production Complexity among the four tracks, while “The Schumacher Song” and “Funky Town” receive higher PC scores. This directly inverts what most listeners would consider to be structural or compositional complexity.

Content Enjoyment is highest for “Funky Town” and lowest for “Blue Calx,” suggesting that the CE dimension aligns more closely with catchiness or familiarity than with artistic merit or aesthetic depth.

Taken together, these results indicate that AES is largely insensitive to audio fidelity. It fails to reflect musical or structural complexity, and instead appears to reward constant spectral activity and conventional pop characteristics. Even at the individual track level, the semantics of Production Quality and Production Complexity don’t match their labels.

B) Artist-level distribution analysis

Next, I tested whether AES produces distinct aesthetic profiles for artists with musical identities, production aesthetics, and historical contexts that are clearly different. I analyzed distributions of Production Quality, Production Complexity, Content Enjoyment, and Content Usefulness for Johann Sebastian Bach, Skrillex, Dream Theater, The Clash, and Hans Zimmer.

If AES captures musically meaningful aesthetics, we would expect to see systematic separation between these artists. For example, Hans Zimmer and Dream Theater might have a higher complexity score than The Clash. Skrillex’s modern electronic productions might have a higher quality score than early punk recordings. Bach’s works might show high complexity but variable enjoyment or usefulness depending on the recording and interpretation.

Instead, the plotted distributions show strong overlap across artists for CE, CU, and PQ, with only minor shifts in means. Most scores cluster tightly within a narrow band between approximately 7 and 8, regardless of artist. PC exhibits slightly more variation, but still fails to form clear stylistic groupings. Bach, Skrillex, Dream Theater, and Hans Zimmer largely occupy overlapping regions, and The Clash is not consistently separated from them either.

This suggests that AES doesn’t meaningfully encode artist-level aesthetic or production differences. Despite extreme stylistic diversity, the model assigns broadly similar aesthetic profiles, reinforcing the interpretation that AES functions as a coarse estimator of acceptability or pleasantness rather than a representation of musical aesthetics.

C) Bias analysis using a balanced gender-controlled dataset

Scoring models are designed to rank, filter, and curate songs in large music catalogs. If these models encode demographic-correlated priors, they can silently amplify existing biases at scale. To test this risk, I analyzed whether AES exhibits systematic differences between tracks with female lead vocals and tracks without female lead vocals.

In our 2025 ISMIR paper, we showed that common music embedding models pick up non-musical singer traits, such as gender and language, and exhibit significant bias as a result. Because AES is intended to judge quality, aesthetics, and usefulness, it would be particularly problematic if it had similar biases. Such biases could directly influence which music is considered “better” or more desirable.

I constructed a balanced dataset using the same methodology used in our 2025 paper, equalizing genre distribution and singer language across groups.
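
The paper's exact balancing procedure is not reproduced here. As a minimal sketch of one common way to equalize attributes across two groups, the snippet below downsamples within each (genre, language) stratum so that both groups contribute the same number of tracks. The function name `balance_groups`, the field names, and the toy tracks are all hypothetical.

```python
import random
from collections import defaultdict

def balance_groups(tracks, group_key, strata_keys, seed=0):
    """Downsample so each stratum holds the same number of tracks per group."""
    rng = random.Random(seed)
    # Bucket tracks by stratum (e.g. genre + language), then by group.
    buckets = defaultdict(lambda: defaultdict(list))
    for track in tracks:
        stratum = tuple(track[k] for k in strata_keys)
        buckets[stratum][track[group_key]].append(track)

    groups = {track[group_key] for track in tracks}
    balanced = []
    for by_group in buckets.values():
        # Drop strata where one of the groups is missing entirely.
        if set(by_group) != groups:
            continue
        n = min(len(items) for items in by_group.values())
        for items in by_group.values():
            balanced.extend(rng.sample(items, n))
    return balanced

# Toy, made-up catalog (assumption: one dict per track).
tracks = (
    [{"genre": "pop", "language": "en", "female_lead": True}] * 5
    + [{"genre": "pop", "language": "en", "female_lead": False}] * 3
    + [{"genre": "rock", "language": "de", "female_lead": False}] * 4
)
balanced = balance_groups(tracks, "female_lead", ["genre", "language"])
print(len(balanced))
```

Downsampling to the smaller group in each stratum trades dataset size for comparability: any remaining score difference is harder to attribute to genre or language mix.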

For each group, I computed score distributions for Content Enjoyment, Content Usefulness, Production Complexity, and Production Quality, visualized them, and performed statistical testing using Welch’s t-test alongside Cohen’s d effect sizes. For context, Welch’s t-test is a statistical test that compares whether the average scores between two groups are significantly different. Cohen’s d is a measure of effect size that quantifies how large that difference is in standardized units.
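
To make the two statistics concrete, here is a minimal standard-library sketch of Welch's t statistic (with Welch-Satterthwaite degrees of freedom) and Cohen's d using a pooled standard deviation. The score lists are made-up toy values, not results from this analysis, and the function names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def cohens_d(a, b):
    """Return Cohen's d based on the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)

# Toy scores for two hypothetical groups (not data from the article).
group_a = [7.6, 7.8, 7.4, 7.9, 7.7]
group_b = [7.3, 7.5, 7.2, 7.6, 7.4]

t_stat, dof = welch_t(group_a, group_b)
effect = cohens_d(group_a, group_b)
print(f"t={t_stat:.3f}, df={dof:.2f}, d={effect:.3f}")
# A p-value additionally needs the t distribution, for example via
# scipy.stats.ttest_ind(group_a, group_b, equal_var=False).
```

Welch's variant does not assume equal variances in the two groups, which is why the degrees of freedom are estimated from the data rather than fixed at the combined sample size minus two.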

The results show consistent upward shifts for female-led tracks in CE, CU, and PQ. All three differences are statistically significant with small-to-moderate effect sizes. In contrast, there is virtually no difference in Production Complexity score between groups.

This pattern indicates that the model systematically assigns higher enjoyment, usefulness, and quality scores to material with female vocals, even under controlled conditions. Because complexity remains unaffected, the effect doesn’t appear to stem from structural musical differences. Instead, it likely reflects correlations in training data and human annotations, or the model treating certain vocal timbres and production styles associated with female vocals as implicit quality indicators.

These findings suggest that AES encodes demographic-correlated aesthetic priors, which is problematic for a model intended to judge musical quality, aesthetics, and usefulness.

“When a measure becomes a target, it ceases to be a good measure.” Charles Goodhart, Economist

Why this matters for the industry

Economist Charles Goodhart famously observed that “when a measure becomes a target, it ceases to be a good measure.” He was describing what happens when a metric starts to drive decisions rather than just being an indicator. Once a number is relied on, it begins to shape how people think and choose.

That idea applies directly to aesthetic scoring. A score, once it exists, carries weight. It gets used as a shortcut in decisions, even when its meaning is incomplete. This matters in music workflows because aesthetic judgment depends on context and purpose. 

When a simplified score is treated as reliable, systems can start favoring what scores well rather than what actually sounds better or serves a creative goal. Over time, that can quietly steer decisions away from how audio is perceived and used in practice.

How we approach audio intelligence at Cyanite

At Cyanite, music isn’t judged in a vacuum, and neither are the decisions built on top of it. That’s why we don’t rely on single aesthetic scores. Instead, we focus on making audio describable and searchable in ways that stay transparent and grounded in context.

Aesthetic scoring can give the illusion of precision, but it often lumps together different technical qualities, genres, and styles. In music search and discovery, a single score doesn’t explain why a track is surfaced or excluded. That reasoning matters to us. Not to decide what’s “good,” but to give teams tools they can understand and trust.

We see audio intelligence as a way to expose structure, not replace judgment. Our systems surface identifiable musical attributes and relationships, knowing that the same track can be the right or wrong fit depending on how it’s used. The goal is to support human decision-making, not substitute it with scores.

Experimentation has a place, but in music, automation works best when it’s explainable and limit-aware.

What responsible progress in music AI should look like

Progress in music and AI is underpinned by transparency. Teams should be able to understand how a model was trained and how its outputs relate to the audio. When results are interpretable, people can see why a track surfaces and judge for themselves whether the signal makes sense in their own context.

That transparency depends on data choices. Music spans styles, cultures, eras, and uses, and models reflect whatever they are fed. Developers need to work with broad, representative data and be clear about where coverage is thin. Being open about what a model sees, and what it does not, makes its behavior more predictable and its limits easier to manage.

Clear communication matters just as much once tools are in use. For scores and labels to be applied responsibly, teams need a shared understanding of what those signals reflect and where their limits are. Otherwise, even well-intentioned metrics can be stretched beyond what they are able to support.

This kind of openness helps the industry build tools people can understand and trust in real workflows. 

We explored how these expectations show up in practice in “The state of AI transparency in music 2025,” a report developed with MediaTracks and Marmoset on how music licensing professionals make decisions around AI, creator background, and context. You can read the full report here.

So… does Meta’s model provide meaningful ratings for music?

Based on these tests, the answer is no. The model produces stable scores, but they don’t map cleanly to how musical quality or complexity is assessed in real catalog work. Instead, the model appears to align more with easily detectable production traits than with the distinctions people consistently make when judging music in context.

That doesn’t make Audiobox Aesthetics insignificant. It can support research by defining a clear scoring framework, showing how reference-free predictors can be trained across speech, music, and sound, and making its models and data available for inspection and comparison. It also illustrates where AES scores can be useful, particularly when large volumes of audio need to be filtered or monitored but full listening is impractical.

Problems emerge when scores like these begin shaping decisions. When a score is presented as a measure of quality, people need to know what it’s actually measuring so they can judge whether it applies to their use case. Without that clarity, it becomes easy to trust the number even when it’s not a good fit.

At Cyanite, we see this as a reminder of the importance of responsibility in music and AI. Progress is driven by systems that stay grounded in real listening behavior and make their assumptions visible.

How to use AI music search for your music catalog

Ready to level up your search workflows? Try AI-powered music search in Cyanite.

Even the most carefully organized catalog reaches a point where text metadata can no longer support effective search on its own. Genres blur, moods can overlap, and large libraries hold thousands of tracks that look similar on paper but sound different when you listen. When you’re working on a brief, your search method needs to reflect the sound itself—not just the words attached to it.

AI music search enables your catalog to reveal more. By working with audio alongside the metadata, it returns search results that match the intent behind a brief rather than the exact words used in a query. You get a shortlist faster and surface strong tracks that would otherwise stay buried.

We see this need showing up across the catalogs we serve, so we put together this guide to outline how AI music search works in Cyanite and how it supports faster, more intuitive discovery in real-world workflows.

Learn more: See how AI music tagging works in Cyanite and how it supports large catalogs.

What is AI music search?

Traditional catalog search depends heavily on how consistently tracks are described. It works well when metadata is uniform and when everyone searches in the same way. But this is rarely the case in practice. Different people use different language, and many musical qualities are easier to hear than to articulate precisely.

AI music search approaches the problem by analyzing the sound itself. This allows the system to understand rhythm, harmony, instrumentation, intensity, and voice presence. These sonic attributes are then used alongside existing metadata to guide search results.

Instead of matching exact keywords, the system focuses on musical similarity and intent. That means you can start a search from a reference track or a descriptive sentence without losing nuance along the way.

AI music search does not replace structured tagging. Instead, it builds on it as an additional way to explore a catalog when sound, context, or creative intent are easier to hear than describe.

At the same time, well-structured tagging remains the baseline to navigate a catalog in many day-to-day scenarios. AI-driven search becomes most valuable when teams need to move beyond fixed labels or explore music from a different angle.

How different types of AI music search work together

In practice, AI music search is most effective when it supports multiple ways of thinking about music. These are three ways we enable catalog music search in Cyanite:

  1. Audio-based search
  2. Prompt-based search
  3. Customizable advanced search features

These tools are designed to work together. Audio gives a clear view of how a track moves, text helps describe what you’re looking for, and advanced filters narrow the field to traits that matter for the request. Using them together keeps the catalog flexible and reduces the chance of great tracks being missed.

Exploring your catalog through Similarity Search

Similarity Search starts from sound. Cyanite analyzes a reference track’s audio and compares it with the rest of your catalog, returning tracks with a similar shape or mood. 

The reference can come from within your library or from an external source, such as Spotify, YouTube, or an uploaded audio file. You can also choose which part of the reference track to use, such as the chorus, the intro, or a specific section that best represents the desired direction.

This approach is especially useful when a brief comes with a musical example rather than a written description. Instead of translating sound into words and back again, you can search directly from what you hear. If you work with multiple reference tracks or an entire playlist, the Advanced Search features below are here to help.
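
As a generic illustration of searching by sound, the sketch below ranks catalog tracks by cosine similarity to a reference. It assumes each track has already been reduced to a numeric feature vector. The vectors, track IDs, and function names are hypothetical, and this is not a description of Cyanite's internal models.

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def rank_by_similarity(reference, catalog, top_k=3):
    """Return the top_k (track_id, score) pairs, most similar first."""
    scored = [
        (track_id, cosine_similarity(reference, vector))
        for track_id, vector in catalog.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical 3-dimensional audio feature vectors.
catalog = {
    "track_a": [0.9, 0.1, 0.3],
    "track_b": [0.1, 0.8, 0.5],
    "track_c": [0.8, 0.2, 0.4],
}
reference = [1.0, 0.1, 0.3]

results = rank_by_similarity(reference, catalog, top_k=2)
for track_id, score in results:
    print(track_id, round(score, 3))
```

Using only a chosen section of the reference, such as the chorus, would correspond to computing the reference vector from that segment alone.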

Read more: Similar song finder AI for catalogs: Use Cyanite to search your library by sound

Searching with language using Free Text Search

Not every search starts with a reference track. Free Text Search allows users to describe music in natural language, using full sentences rather than rigid keywords.  

Prompts can reference mood, pacing, instrumentation, scene context, or use case. They can also include cultural references and be written in different languages. The system interprets the prompt’s meaning and matches it against the audio-based understanding of the catalog, without relying on external language models.

This makes search accessible to a wider range of users, including those who may not be familiar with a catalog’s internal tagging conventions.

Read more: How to prompt: the guide to using Cyanite’s Free Text Search

Advanced Search

For more specific searches, you often need additional control. Advanced Search builds on Similarity and Free Text Search by adding structured filters and deeper insight into why tracks appear in the results.

This mode allows teams to:

  • View similarity scores that show how closely results align with a reference or prompt
  • Run similarity searches using up to 50 reference tracks at once
  • Upload custom metadata and use it as additional filters
  • Identify the most similar segments within each track
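
The first three capabilities in the list above can be pictured with a small generic sketch: several reference vectors are averaged into one query, custom metadata acts as a filter, and each result carries a similarity score. Everything here is an assumption for illustration (the `advanced_search` name, the dict layout, the averaging strategy) and does not describe Cyanite's API or implementation.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v)))

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def advanced_search(references, catalog, filters=None, max_refs=50):
    """Rank tracks that match the metadata filters against averaged references."""
    if not 1 <= len(references) <= max_refs:
        raise ValueError(f"expected between 1 and {max_refs} reference vectors")
    query = centroid(references)
    filters = filters or {}
    hits = []
    for track in catalog:
        # Keep only tracks whose custom metadata matches every filter.
        if all(track["metadata"].get(k) == v for k, v in filters.items()):
            hits.append((track["id"], round(cosine(query, track["vector"]), 3)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Hypothetical catalog with a custom "cleared" metadata field.
catalog = [
    {"id": "t1", "vector": [0.9, 0.1], "metadata": {"cleared": True}},
    {"id": "t2", "vector": [0.8, 0.3], "metadata": {"cleared": False}},
    {"id": "t3", "vector": [0.2, 0.9], "metadata": {"cleared": True}},
]
references = [[1.0, 0.0], [0.8, 0.2]]

hits = advanced_search(references, catalog, filters={"cleared": True})
print(hits)
```

Averaging is only one way to combine references. It works when the references point in a similar direction and blurs them when they do not, which is one reason visible similarity scores are useful.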

“Testing Advanced Search free for a month gave us the confidence we needed to update our search and tagging systems. The integration was smooth, and we were able to ship several exciting features right away, but we’ve only scratched the surface of its full capabilities!” Jack Whitis, CEO at Wavmaker

Read more: How to level up your AI search with Advanced Search features

AI music search: build vs buy

Organizations considering AI search face a choice between building internally and integrating an existing solution. The right path typically depends on the time, cost, and ongoing work you can take on.

Building an in-house system can make sense for teams with significant machine-learning expertise and long-term resources. It typically requires a dedicated engineering team, a large and well-structured training dataset, and ongoing investment to maintain and improve model quality as catalogs and user needs evolve.

However, for most catalogs, integrating a tested system is the more practical path. Cyanite offers AI music search through a web app, an API, and integrations with major catalog management systems. Teams can adopt advanced search capabilities without taking on the long-term cost and complexity of maintaining their own models.

Smaller teams can start with the web app and scale usage over time. Larger organizations can integrate search directly into their own platforms, with pricing that aligns more predictably with catalog size.

Cyanite’s approach to AI music search

Cyanite is built to help teams understand their catalog through sound. We bring audio, language, and filters into one place so you can move through briefs without switching tools.

Audio-first analysis

Cyanite listens to the full track from beginning to end and captures how it develops in instrumentation, energy, and mood. This audio-first approach drives Similarity Search, Free Text Search, and Advanced Search. Because the focus stays on the audio rather than popularity and text-only metadata, you reach tracks that often get overlooked.

Data security and model ownership

Your audio remains within Cyanite’s environment.

  • Audio analysis and search models are built and maintained in-house.
  • No files are sent to external AI providers.
  • All processing meets GDPR requirements.

Teams with specific copyright needs can use upload workflows designed for internal and client-facing work.

Built for catalog scale

Full tracks are analyzed in depth, with thousands of sonic details compared. Even so, large libraries can be processed quickly, and search performance remains steady as the catalog grows. That makes it easier to bring new material into the library without disrupting ongoing work.

Search that adapts to the workflow

Similarity Search, Free Text Search, and Advanced Search all draw from the same audio analysis, which makes it easy to move between a reference track, a written prompt, or a set of filters in a single workflow. Advanced Search adds scoring and segment highlights when you need more context, while the other modes help you move quickly through creative requests. Together, these tools support different working styles and keep results consistent across teams and briefs.

Try AI music recognition with your own tracks

AI music search helps catalogs stay workable as they grow. By reading the audio and supporting both reference-based and prompt-based queries, it reduces search time and brings more of the catalog into play.

Want to see how this works with your own tracks? You can test Similarity Search and Free Text Search in the web app, or explore Advanced Search through the API.

FAQs – API Integration

Q: How does AI music recognition work in a catalog?

A: AI music recognition interprets patterns in the audio and compares them across the catalog. This reduces reliance on metadata wording and supports searches that begin with a reference track or a natural-language prompt.

Q: Is Cyanite the same as an AI music finder or consumer music search engine?

A: No. Consumer-facing music search and recommendation systems are typically driven by listening behavior and user interaction data. Cyanite focuses on sound-based analysis and metadata, making it suitable for professional catalog search, editorial workflows, and internal systems.

Streaming platforms use Cyanite to complement behavioral data with objective audio understanding, especially for catalog organization, discovery, and editorial use cases.

Q: Can Cyanite be used in my CMS for music?

A: Cyanite is fully integrated with SourceAudio, Cadenzabox, Harvest Media, Music Master, Reprtoir, Synchtank, and TuneBud. DISCO users can also import Cyanite’s Auto-Tagging and Auto-Descriptions into their libraries. These integrations support a wide range of Cyanite use cases across catalog management systems.

Q: Who uses Cyanite?

A: Music publishers, production libraries, sync teams, audio branding agencies, and music-tech platforms use Cyanite for tagging, search, playlist building, onboarding, and catalog analysis. Artists and producers use the web app for fast tagging and discovery.

Q: Can I integrate AI search into my own platform?

A: Yes. The API supports Similarity Search, Free Text Search, Advanced Search, and audio analysis, making it possible to add AI-powered discovery directly into your product.