PR: Anghami partners with Cyanite | Music discovery with AI-powered metadata across 2.5 million songs

PRESS RELEASE

Berlin, 24.03.2026 – Anghami, the leading music and entertainment streaming platform in the MENA region with over 120 million registered users, has partnered with Cyanite to enrich 2.5 million songs using AI-generated music metadata.

By integrating Cyanite’s auto-tagging API, Anghami has enhanced its catalog with detailed audio-based metadata across mood, genre, energy, instrumentation, and more. This structured data layer feeds directly into Anghami’s internal recommendation systems, enabling more precise and scalable music discovery.

At a catalog scale of millions of tracks, metadata quality becomes a strategic driver of personalization. Structured, consistent tagging enables streaming platforms to better match songs with listeners, surface long-tail content, and improve recommendations across diverse repertoires.

For Anghami, the partnership also underscores its commitment to accurately representing the richness of Arabic music. A significant share of its catalog consists of regional content that is often underrepresented in Western-centric AI systems.

Because Cyanite analyzes audio directly, rather than relying on behavioral signals or language-based metadata, its models operate consistently across musical cultures and languages.

Anghami operates one of the most culturally diverse music catalogs in the world. Ensuring that Arabic repertoire is tagged with the same precision as Western music is not trivial. We’re proud that our audio-based AI can support music discovery at this scale and across such a rich regional landscape.

Markus Schwarzer

CEO & Founder, Cyanite

Arabic music carries immense depth, emotion and cultural nuance. Through our partnership with Cyanite, we’re ensuring that this richness is understood at a data level, allowing us to power more accurate personalisation and elevate discovery for millions of listeners.

Elias El Khoury

VP Information & Content Systems, Anghami

About Anghami Inc. (NASDAQ: ANGH):

Anghami is the leading multi-media technology streaming platform in the Middle East and North Africa (“MENA”) region, offering a comprehensive ecosystem of exclusive premium video, music, podcasts, live entertainment, audio services and more. Since its launch in 2012, Anghami has led the way as the first music streaming platform to digitize MENA’s music catalog, reshaping the region’s entertainment landscape.

In a strategic move in April 2024, Anghami joined forces with OSN+, a leading video streaming platform, forming a digital entertainment powerhouse. This pivotal transaction strengthened Anghami’s position as a go-to destination, boasting an extensive library of over 18,000 hours of premium video, including exclusive HBO content, alongside 100+ million Arabic and International songs and podcasts.

With a user base exceeding 120 million registered users and 2.5 million paid subscribers, Anghami has partnered with 47 telcos across MENA, facilitating customer acquisition and subscription payment, in addition to establishing relationships with major film studios, entertainment giants, and music labels, both regional and international.

Headquartered in Abu Dhabi, UAE, Anghami operates in 16 countries across MENA, with offices in Beirut, Dubai, Cairo, and Riyadh.

To learn more about Anghami, please visit: https://anghami.com

For media inquiries, please contact:
Umar Gulamnabi – Associate, Integrated Media, Current Global
osncg@currentglobal.com
+971 56 827 1966

About Cyanite

Cyanite is an AI music intelligence platform that helps streaming services, publishers, and music platforms enrich and organize their catalogs. Its auto-tagging API analyzes audio directly to generate structured metadata across genre, mood, energy, instrumentation, and more. Cyanite has tagged over 40 million songs and is trusted by more than 200 companies worldwide, including Warner Chappell, BMG, Epidemic Sound, and APM Music.

Media contact
Jakob Höflich
CMO at Cyanite
jakob@cyanite.ai

For interview requests or additional data, please contact: jakob@cyanite.ai

Context is what separates music from AI slop

Add structured context to your discovery workflows with Cyanite’s Advanced Search.

Every week, thousands of new tracks enter music libraries. There’s no real limit to how many can be uploaded. As catalogs expand, it becomes harder to tell why one piece deserves attention over another.

At the same time, generative AI tools make it possible to produce a lot of music quickly and cheaply. This means catalogs can get flooded with “AI slop,” a term used to describe mass-produced generative content created for volume rather than quality.

Context is key to making the distinction between music and AI slop. Knowing who created a track, what shaped it, and why it exists roots it in human experience and creative intent. Without that layer of insight, music becomes interchangeable audio, reduced to tags and search terms.

What makes music human?

The intention behind music and the social connection it creates are what make it human.

That humanity is visible in the decisions that shape a track. No matter how minimal or elaborate a composition is, every musical choice reflects human knowledge and experience. The key, rhythm, production, and instruments used are all guided by cultural exposure, emotional memory, and learned musical language.

And then there’s the risk. When someone releases music, they also release control over how it will be heard and judged. That exposure is vulnerable, and recognizing the risk and context behind a piece makes the connection to it stronger.

The AI limitation

When intention and personal stakes are missing, the difference is noticeable.

AI-generated music can sound like human-made tracks. It can replicate style, structure, and production detail with striking accuracy. In many contexts, it even meets professional standards.

However, it doesn’t come from lived experience and instead reconstructs patterns it has learned from existing music. There’s no vulnerability behind the track. There’s no social stake. And there’s no personal history shaping the decision to produce it. The output is coherent because the sequence fits statistically, not because something needed to be expressed.

Why context matters more than ever

With the sheer volume of modern catalogs, many tracks sound interchangeable. You can work on a brief and find 10 pieces that would technically meet the requirements.

What actually helps you choose is knowing where the music comes from and who made it. That extra layer of information changes how you hear it. In a space this crowded, context is what keeps everything from blending into the same background noise.

The potential of contextual metadata

If context gives music meaning, it needs to be structured as metadata so it can be searched and filtered at scale.

Custom tagging makes that possible. Catalogs can include fields for artist origin, geography, creative background, cultural context, and editorial positioning. When that information can be filtered, it starts shaping decisions. Context moves from description to action.

The same principle applies to one of the clearest distinctions in modern catalogs: whether a track is human-created or AI-generated. When that difference is structured as metadata, it becomes searchable inside existing discovery systems.
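
To make that concrete, here is a minimal sketch of what such a record and filter could look like. The field names are our own illustration, not a prescribed schema:

```python
from dataclasses import dataclass, field

@dataclass
class TrackMetadata:
    """Illustrative contextual-metadata record; all field names are hypothetical."""
    track_id: str
    title: str
    artist_origin: str                      # e.g. "Australia"
    cultural_context: str = ""              # free-form editorial note
    editorial_tags: list[str] = field(default_factory=list)
    ai_generated: bool = False              # authorship label: human vs. generative

def filter_human_made(tracks: list[TrackMetadata], origin: str | None = None) -> list[TrackMetadata]:
    """Keep only human-created tracks, optionally restricted to one artist origin."""
    return [
        t for t in tracks
        if not t.ai_generated and (origin is None or t.artist_origin == origin)
    ]

catalog = [
    TrackMetadata("t1", "Coastline", "Australia", "indie surf scene", ["spotlight"]),
    TrackMetadata("t2", "Loop #4812", "unknown", ai_generated=True),
]
print([t.title for t in filter_human_made(catalog, origin="Australia")])  # ['Coastline']
```

Once the authorship label and origin exist as fields like these, they stop being footnotes and become filters any discovery system can act on.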

Melodie Music puts this into practice to spotlight original Australian artists. They combine Cyanite’s sound-based AI search with their own editorial and contextual metadata.

  • Cyanite analyzes the sound of a reference track and generates a shortlist based on emotional profile and sonic character.
  • Melodie layers contextual filters, such as artist origin, on top of those results.
  • Users refine further using editorial tags aligned with cultural or strategic priorities.
  • The final selection satisfies both the creative brief and the mandate to support specific artist communities.

What this means for music discovery

Algorithmic recommendations alone aren’t enough. Teams want clarity about origin, authorship, and AI involvement before committing to a track. 

In our joint study with MediaTracks and Marmoset, we found that contextual metadata plays a central role in how professionals work through briefs. Respondents described relying on origin details and creator background to avoid misalignment and explain their choices to clients. 

Clearly labeling AI involvement was part of that same expectation. Professionals are open to working with AI-generated music, but they want to know explicitly whether a track is AI-generated or human-made. Context, including transparency around authorship, informs decisions.

Read more: Why AI labels and metadata now matter in licensing

Cyanite’s Advanced Search, available via API integration, allows teams to upload their own custom metadata fields and use them for filtering. Fields can include artist origin, cultural background, clearance information, and editorial categories.

Search queries then run within that defined subset, so sound analysis operates inside contextual boundaries set by the catalog owner, as implemented by Melodie Music.
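
As a rough illustration of that two-step logic, the hypothetical request below filters on custom metadata first, then lets sound-based matching rank within the remaining subset. The endpoint, payload shape, and field names are placeholders for illustration, not Cyanite's documented API:

```python
import requests

# Hypothetical endpoint and payload, illustrative only.
API_URL = "https://example.com/advanced-search"

payload = {
    "text": "warm acoustic folk with gentle female vocal",  # sound-based query
    "filters": {                                            # custom metadata fields
        "artist_origin": ["Australia"],                     # contextual boundary
        "ai_generated": False,                              # authorship label
        "editorial_tags": ["spotlight"],                    # editorial positioning
    },
    "limit": 20,
}

# The filters define the searchable subset; sound matching ranks within it.
response = requests.post(API_URL, json=payload, timeout=30)
for hit in response.json().get("results", []):
    print(hit["track_id"], hit["score"])
```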

For platforms embedding Cyanite’s search algorithms into their own systems, this enables structured transparency at scale. Context becomes part of the discovery logic itself.

Choosing meaning over noise

“We have always connected to music because it carries intention, experience, and emotion – not just sound. A song means something because it was created in a specific moment, for a reason, by someone responding to their world. Today, we are surrounded by more music than ever, inevitably making it harder to feel that connection. Delivering context to a song gives a glimpse into what went into it, and with it a chance to understand the people and feelings behind the music.

Even though AI-generated music can sound pleasant, it is fundamentally an imitation – a reconstruction of patterns it has seen before. It lacks intention, situation, risk, and personal stake.

That’s why context matters more than ever. Knowing why a piece of music exists, where it comes from, and what went into it is what turns sound into something meaningful.”

Simon Timm

Music Producer & Marketing Expert, Cyanite

Context can double as infrastructure in catalogs. As AI-generated music becomes easier to produce and distribute, what will separate human-made tracks from AI slop is whether a track’s origin is visible and understood. Catalogs that structure and surface contextual metadata can ensure music is selected based on where it comes from and why it exists, not just how it sounds.

Ready to add context to your discovery workflows?

FAQs

Q: How can contextual metadata help distinguish human-created music from AI-generated tracks?

A: Contextual metadata adds information beyond sound analysis, such as artist background, origin, editorial positioning, and authorship labeling. It can allow teams to filter catalogs based on transparency and creative intent, helping distinguish human-created music from generative content produced at scale.

Q: Does Cyanite detect whether music is AI-generated or human-made?

A: Cyanite is developing AI music detection capabilities designed to support transparent catalog workflows. Early implementations allow teams to label and filter tracks based on AI involvement, helping licensing professionals and curators make informed decisions during discovery.

Q: Can Cyanite’s Advanced Search filter music using custom metadata fields?

A: Yes. Advanced Search allows catalog owners to include their own metadata fields as filters within search queries. These filters narrow the searchable catalog before sound similarity or text-based matching is applied, helping teams surface results that fit their creative and business requirements.

Q: How can music platforms integrate contextual discovery into existing workflows?

A: Music catalogs can integrate Cyanite’s Advanced Search through the API, making it possible to combine sound analysis with custom metadata filters inside their existing workflows.

How To Prompt: The Guide to Using Cyanite’s Free Text Search

Ready to search your catalog in natural language? Try Free Text Search.

Do you have trouble translating your vision for music into precise keywords? If so, this guide on how to prompt using Cyanite’s Free Text Search is for you.

Free Text Search is a more natural way to explore your music catalog and discover tracks. You can use complete sentences to describe soundscapes, film scenes, daily situations, activities, or environments. Prompts can be written in different languages and can include cultural references, so you’re not forced to reduce your idea to a fixed set of tags.

Before you explore what Free Text Search can do, keep in mind that prompt-based search works best when your input is specific. The clearer you are, the easier it is to find what you’re looking for. 

Read more: What is music prompt search?

Why music catalogs struggle with discovery

Most large catalogs contain inconsistent metadata. Many were built before modern tagging standards, then expanded over time through different workflows. New music arrives faster than metadata teams can standardize it, especially with the volume from UGC and AI-generated releases, while older tracks remain described in ways that don’t always support how music is searched for today.

Traditional search relies on tags and keyword logic. This approach can be effective for many searches, but it has limits when ideas are already highly specific, like with a detailed creative brief or a particular scene description. Translating concrete, nuanced needs into tags often loses critical details and context.

That’s where natural language search makes a difference. Instead of defining a specific vision in terms of available tags, you can describe what you need directly or even paste a brief into the search bar. The system interprets intent, mood, and context in ways that complement tag-based discovery.

This helps sync and licensing teams work faster with detailed requests, and gives catalog teams another tool to surface relevant music, especially from underused parts of the catalog.

Read more: How to use AI music search for your music catalog

How Free Text Search amplifies music discovery

Free Text Search lets you look for music in the way you would naturally describe it. Write detailed prompts in full sentences, and Cyanite’s AI interprets the meaning behind your words to match intent with how tracks actually sound in your catalog.

This type of search is designed for situations where intent doesn’t translate cleanly into keywords. Tag-based searches work well when attributes are fixed and clearly defined, and Similarity Search is useful when you already have a reference track and want to find music that sounds close to it. Teams often get good results when they search in their own words first, then move into other search modes to refine the selection.

How to use Free Text Search effectively

In real-life workflows, searches rarely begin from the same place. Sometimes you’ll start with sound, sometimes with a scene, and sometimes with context. 

Not every idea can be reduced to tags or tied to a specific track. Choosing music is a creative process, so the way people search is often creative too. Free Text Search meets users where they are, allowing them to describe intent in natural language and shape discovery around how they think. 

1. Describing sound

With Free Text Search, you can add context and even cultural references to your search, making it possible to find the perfect soundtrack for your project and get the most out of your music catalog. 

This approach is commonly used when responding to sync briefs that describe musical detail and tone.

Sound-focused prompts should name what musical elements are present, then add how those elements are played or arranged. An extra cue about character or attitude can be included when it helps clarify intent.

[Instruments or sound sources] + [how they are played or arranged] + [optional: character or stylistic cue]

  • “Trailer with sparse repetitive piano and dramatic drum hits with Star-Wars-style orchestra themes”
  • “Laid-back future bass with defiant female vocal”
  • “Staccato strings with a piano playing only single notes”
  • “Solo double bass played dramatically with a bow”

These prompts work because they are specific, but not rigid. That level of detail helps surface relevant tracks faster and reduces reliance on perfectly maintained tags, which is especially valuable in large or uneven catalogs.
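
If you build prompts programmatically, for example from the fields of a brief, the template can be treated as a simple composition rule. The helper below is our own illustration of that rule, not a Cyanite feature:

```python
def sound_prompt(elements: str, treatment: str = "", cue: str = "") -> str:
    """Compose a sound-focused prompt from the template:
    [instruments or sound sources] + [how they are played or arranged]
    + [optional: character or stylistic cue]."""
    parts = [elements, treatment]
    if cue:
        parts.append(f"with {cue}")
    return " ".join(p for p in parts if p)

print(sound_prompt("staccato strings", "with a piano playing only single notes"))
print(sound_prompt("solo double bass", "played dramatically with a bow"))
print(sound_prompt("laid-back future bass", cue="defiant female vocal"))
```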

Common mistakes to avoid

  • Staying too abstract: Words like “cinematic” or “emotional” on their own don’t give enough information to form a clear sound.
  • Listing elements without context: Naming instruments or genres without describing how they are played or arranged often leads to broad results.
  • Overloading the prompt: Packing too many ideas into one sentence can blur intent and pull results in different directions.
  • Writing like a tag list: Free Text Search works best when the prompt reads like a description, not a stack of keywords.

Read more: AI search tool for music publishing: best 3 ways

2. Describing film scenes

Film scenes can evoke a wide range of emotions and visuals. When using Free Text Search for this purpose, consider whether your prompt captures objective elements of the scene or your own interpretation of it.

Publishers often use scene-based prompts to explore deeper parts of their catalog and surface music suited to narrative use cases beyond obvious genre labels.

You can reference popular movies or shows like Pirates of the Caribbean or Stranger Things in your search prompts.

It helps to think like a director. Focus on the action or moment in the scene and what the viewer is experiencing. The clearer the image you describe, the easier it is for the search to interpret what kind of music belongs there, without needing a list of musical traits.

[Action or moment] + [optional: setting or situation] + [optional: stylistic cue]

  • “Riding a bike through Paris”
  • “Thriller score with Stranger-Things-style synths “
  • “Tailing the suspect through a Middle Eastern bazaar”
  • “The football team is getting ready for the game”

An example result for the prompt: “Riding a bike through Paris”

These prompts work because they describe a cinematic moment rather than a list of musical characteristics. A scene like “riding a bike through Paris” suggests a certain musical style and progression, which helps frame how the music should unfold. That context gives Free Text Search a clearer sense of what the track needs to communicate.

To fine-tune your search, add different keywords, like “orchestral,” “industrial rock,” or “hip-hop,” to steer it in the direction you want.

Common mistakes to avoid

  • Writing scenes that only make sense to you personally: Prompts should be interpretable without extra explanation.
  • Dropping the visual context: Turning a scene into a genre description removes what makes this approach effective.
  • Using obscure references: If the reference is not widely known, it may not clarify the scene.

3. Describing activities, situations, and moods

Free Text Search empowers you to be as specific as your project demands. You can describe when and where music will be heard, and what it should communicate. Combining activity, situation, and mood helps direct discovery toward abstract or niche ideas that don’t translate cleanly into tags, making it easier to surface music that fits its intended use.

When writing the prompts, focus on how the music will be used and what it needs to communicate in that situation. Providing clear usage context helps the search narrow results without requiring detailed musical instruction.

[Style or sound] + [intended use or context] + [optional: tone or functional role]

  • “Latin trap for fitness streaming catalog”
  • “Mellow California rock for sports highlight content”
  • “Colorful pop music for lifestyle brand campaign”
  • “Subtle ambient textures for background use”

Example result for the prompt: “Mellow California rock for a road trip”

Common mistakes to avoid

  • Leaving out the use case: Mood alone often leads to broad results without direction.
  • Mixing conflicting contexts: Background use and high-impact language can work against each other.
  • Lack of clarity: When the prompt doesn’t include enough context, results stay generic.

Free Text Search is available in the Cyanite web app. You can test prompts, explore results, and refine searches in minutes.

Using prompts to improve discovery

With Free Text Search, you can explore your music catalog using detailed descriptions. This lets you search based on how music is described in real projects, making it easier to find tracks that fit a specific brief, scene, or use case.

Whether you’re pitching music for sync, artists, or labels, looking to underscore a film scene, or setting the mood for an activity, Free Text Search empowers you to explore music in a whole new way.

As you craft your prompts, try to be specific and objective, as this will return better results. Use concrete details like instruments, playing styles, and specific scenes or activities. 

You already have the resources in your catalog. Free Text Search helps you access them more effectively.

Everything you’ve ever wanted to know about Cyanite (answering your FAQs)

Ready to explore your catalog? Sign up for Cyanite.

As music catalogs grow, finding the right track gets harder. Metadata doesn’t always keep up, but teams are still expected to deliver fast, reliable results.

Libraries, publishers, sync teams, and the technical leads supporting them need systems that make large catalogs easier to understand and search. Cyanite is designed to support that work.

This guide provides a clear, high-level introduction to how Cyanite works and how it’s used in practice, giving teams a simple starting point before diving deeper into specific topics.

Learn more: Explore our FAQs to dig deeper into how Cyanite works.

The problem of scaling modern music catalogs

Once a catalog reaches a certain size, searching it becomes an inconsistent process. Music is described through tags and metadata that were added by different people, at different times, often for different needs. As the catalog grows, those descriptions stop lining up, which makes tracks harder to compare and surface reliably.

Over time, the same song can become discoverable in one context and invisible in another. Familiar tracks tend to show up first, while large parts of the catalog stay beneath the surface simply because their sound isn’t clearly represented in the data.

Scaling a modern music catalog means creating a shared, consistent way to describe sound, so music can be worked with confidently across teams and workflows, no matter how large the catalog becomes.

What Cyanite is (and what it is not)

Cyanite is an intelligent music system that works directly with sound. It analyzes each track and translates what can be heard into structured information that stays consistent across the catalog. That information is used both to tag music automatically and to support sound-based search.

Teams can use Cyanite through the web app, integrate it into their own systems via an API, or access it directly within supported music CMS environments.

Cyanite is not a replacement for listening or creative judgment. It doesn’t decide what should be used, pitched, or licensed. It provides a consistent, sound-based foundation that helps teams work with music at scale while keeping human decision-making at the center.

How Cyanite analyzes music

Cyanite analyzes music through sound, not user behavior. Instead of relying on plays, clicks, or listening history, it focuses on the audio itself and produces a consistent, reliable sound description. This means each piece of music enters the system under the same logic, regardless of when it was added or who uploaded it.

Read more: How do music recommendation systems work?

Core capabilities

At its core, Cyanite helps teams organize and work with large music catalogs through music tagging and search. The same audio-based logic applied to every track creates consistent descriptions and keeps music easy to find, compare, and explore, even as catalogs grow.

A table showing Cyanite's AI-Tagging Taxonomy

To make large catalogs easier to work with, Cyanite applies consistent labeling based on each track’s full audio.

  • Auto-Tagging analyzes the audio to generate metadata like genre, mood, and tempo (a sample record is sketched after this list).
  • Auto-Descriptions generate concise, neutral descriptions that highlight how a track sounds and give teams quick context without having to listen first.
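
As a rough picture of what that structured layer looks like downstream, here is an illustrative record for a single analyzed track. Field names and values are examples only, not Cyanite's exact schema:

```python
# Illustrative shape of one auto-tagged track record.
# Field names and values are examples, not Cyanite's exact schema.
track_analysis = {
    "track_id": "t-001",
    "genres": ["pop", "electronic"],
    "moods": ["uplifting", "energetic"],
    "bpm": 122,
    "key": "F major",
    "energy": "high",
    "instruments": ["synth", "electric bass", "drum kit"],
    "auto_description": (
        "An upbeat electronic pop track driven by bright synths "
        "and a steady four-on-the-floor groove."
    ),
}
```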

Sound-based search: Similarity, Free Text, and Advanced Search

To help teams find music, Cyanite offers multiple ways to search a catalog. 

  • Similarity Search finds tracks with a similar sound to a reference song, whether it’s from your catalog, an uploaded file, or a YouTube preview. It’s often a good fit when a brief starts with a musical reference rather than a written description.
  • Free Text Search allows teams to describe music in natural language, including full sentences and prompts in different languages. It then matches that intent to sound in the catalog.
  • Advanced Search, available through the API as an add-on for Similarity and Free Text Search, adds more control as searches become more specific. It enables filters and visibility into why tracks appear in the results, making it easier to refine and compare matches.

Privacy-first, IP-safe audio analysis

Cyanite is built for professional music catalogs, with all data processed and stored on servers in the EU in line with GDPR. Audio files are stored securely, can be deleted at any time on request, and are not shared with third parties. All analysis and search algorithms are developed in-house. For additional protection, Cyanite also supports spectrogram-based uploads, allowing audio to be analyzed without being reconstructable into playable sound.
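
To illustrate why a spectrogram upload protects the recording: a magnitude mel spectrogram discards phase information, so the playable waveform cannot be exactly reconstructed from it. The sketch below, a minimal example using librosa with common default parameters, shows the idea; the exact format Cyanite accepts may differ:

```python
import librosa
import numpy as np

# Compute a magnitude mel spectrogram. Phase is discarded, so the playable
# waveform cannot be exactly recovered from the result, which is the property
# spectrogram-based uploads rely on. (Parameters here are common defaults;
# the exact format Cyanite expects may differ.)
y, sr = librosa.load("track.mp3", sr=44100, mono=True)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048, hop_length=512, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)  # log scale, typical for analysis
np.save("track_mel.npy", mel_db)               # share this instead of the audio
```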

How teams combine AI and human expertise

Cyanite is used for organizing, pitching, searching, and curating a catalog. Automation applies a consistent, sound-based foundation across every track, while teams add context, intent, and custom metadata where it matters. 

Because there are clear limits to what can be inferred from audio alone, most teams adopt a hybrid approach to their work. They use Cyanite to keep catalogs structured and searchable at scale, while human input shapes how the music is ultimately used.

How Cyanite fits into existing catalog systems

Cyanite is used at the point where teams need to explore a catalog for a pitch, brief, or curation task. It applies a consistent, sound-based foundation across all tracks, so decisions can be informed by reliable discovery results. With technology supporting the process, teams can confidently listen, compare, and narrow options, applying human judgment to make the selection.

Where to go deeper

Now that we’ve covered the basics, our related articles explore specific parts of Cyanite in more detail.

Getting started with Cyanite

To evaluate Cyanite, the simplest starting point is a track sample analysis. Many teams begin with a small set of tracks to review tagging results and search behavior before deciding whether to scale further. This makes it easy to validate fit without committing a full catalog upfront.

For teams building products or integrating search into their own tools, integrating our API is a hands-on way to explore analysis, tagging, and similarity search in a live environment. You can create an API integration for free after registering via the web app.

When preparing for a larger evaluation, a bit of structure helps. Audio should be provided as MP3 files and grouped into clear folders or batches that reflect how the catalog is organized. Most teams start with a representative subset and expand in phases once results and timelines are clear. If you are not able to deliver your music as MP3 files, reach out to support@cyanite.ai.
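
If it helps, here is a minimal sketch of that preparation step, assuming a flat folder of MP3 files you want split into fixed-size batches (folder names and batch size are arbitrary):

```python
from pathlib import Path

def batch_mp3s(source: str, dest: str, batch_size: int = 500) -> None:
    """Move MP3 files from `source` into numbered batch folders under `dest`."""
    files = sorted(Path(source).glob("*.mp3"))
    for i in range(0, len(files), batch_size):
        batch_dir = Path(dest) / f"batch_{i // batch_size + 1:03d}"
        batch_dir.mkdir(parents=True, exist_ok=True)
        for f in files[i : i + batch_size]:
            f.rename(batch_dir / f.name)  # move the file into its batch

batch_mp3s("catalog/", "upload_batches/")
```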

Can Meta’s audio aesthetic model actually rate the quality of music?

Last year, Meta released Audiobox Aesthetics (AES), a research model that proposes scoring audio based on how people would rate it. The model outputs four scores: Production Quality (PQ), Production Complexity (PC), Content Enjoyment (CE), and Content Usefulness (CU). 

The study suggests that audio aesthetics can be broken into these axes, and that a reference-free model can predict these scores directly from audio. If that holds, the scores could start informing decisions and become signals people lean on when judging music at scale.

I took a closer look to understand how the model frames aesthetic judgment and what this means in practice. I ran Audiobox Aesthetics myself and examined how its scores behave with real music.

What Meta’s Audiobox Aesthetics paper claims

Before jumping into my evaluation, let’s take a closer look at what Meta’s Audiobox Aesthetics paper set out to do.

The paper introduces a research model intended to automate how audio is evaluated when no reference version exists. The authors describe human evaluations as costly and inconsistent, which led them to seek an automated alternative.

To address this need, the authors propose breaking audio evaluation into four separate axes and predicting a separate score for each:

  • Production Quality (PQ) looks at technical execution, focusing on clarity and fidelity, dynamics, frequency balance, and spatialization.
  • Production Complexity (PC) reflects how many sound elements are present in the audio.
  • Content Enjoyment (CE) reflects how much listeners enjoy the audio, including their perception of artistic skill and overall listening experience.
  • Content Usefulness (CU) considers whether the audio feels usable for creating content.

The model is trained using ratings from human listeners who follow the same guidelines across speech, music, and sound effects. It analyzes audio in short segments of around 10 seconds. For longer tracks, the model scores each segment independently and provides an average. 
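
A minimal sketch of that segment-and-average behavior is shown below, assuming a predict_scores helper that wraps the released checkpoint and returns the four axes for one segment; the package's actual entry point may differ:

```python
import numpy as np
import librosa

def predict_scores(segment: np.ndarray, sr: int) -> dict[str, float]:
    """Placeholder for the AES model call on one ~10 s segment.
    The released package's actual API may differ."""
    raise NotImplementedError

def score_track(path: str, segment_s: float = 10.0) -> dict[str, float]:
    """Score each ~10 s segment independently, then average per axis,
    mirroring the behavior described in the paper."""
    y, sr = librosa.load(path, sr=None, mono=True)
    hop = int(segment_s * sr)
    segments = [y[i : i + hop] for i in range(0, len(y), hop)]
    per_axis: dict[str, list[float]] = {k: [] for k in ("PQ", "PC", "CE", "CU")}
    for seg in segments:
        scores = predict_scores(seg, sr)
        for axis in per_axis:
            per_axis[axis].append(scores[axis])
    return {axis: float(np.mean(vals)) for axis, vals in per_axis.items()}
```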

Beyond the audio itself, the model has no additional context. It does not know how a track is meant to be used or how it relates to other music. According to the paper, the scores tend to align with human ratings and could help sort audio when it’s not possible to listen to it all. In that way, the model is presented as a proxy for listener judgment.

Why I decided to evaluate the model

I wasn’t the only one curious about this model. Jeffrey Anthony’s “Can AI Measure Beauty? A Deep Dive into Meta’s Audio Aesthetics Model,” for instance, offers a deep, philosophical examination of what it means to quantify aesthetic judgment. I decided to take a more hands-on approach, testing the model on real-world examples to see whether interesting patterns emerge in its predictions.

What caught my attention most was how these scores are meant to be used. Once aesthetic judgments are turned into numbers, they start to feel reliable. They look like something you can sort by, filter on, or use to decide what gets heard and what gets ignored.

This matters in music workflows. Scores like these could influence how catalogs are cleaned up, how tracks are ranked for sync, and how large libraries of music are evaluated without listening. With a skeptical but open mindset, I set out to discover how these scores behave with real-world data.

What I found when testing the model

A) Individual-track sanity checks

I began with a qualitative sanity check using individual songs whose perceptual differences are unambiguous to human listeners. The tracks I selected represent distinct production conditions, stylistic intentions, and levels of artistic ambition.

I included four songs:

  • “Funky Town” by Lipps Inc. (a degraded, low-quality MP3 encode)
  • “Giorgio by Moroder” by Daft Punk (audiophile-grade disco-funk)
  • “Blue Calx” by Aphex Twin (experimental electronic)
  • “The Schumacher Song” by DJ Visage (formulaic late-90s pop-trance)

The motivation for this test was straightforward. A model claiming to predict Production Quality should assign a lower PQ to “Funky Town” (low-quality MP3) than to “Giorgio by Moroder.” A model claiming to estimate production or musical complexity should recognize “Blue Calx” by Aphex Twin as more complex than formulaic late-90s pop-trance such as DJ Visage’s “Schumacher Song.” Likewise, enjoyment and usefulness scores should not collapse across experimental electronic music, audiophile-grade disco-funk, old-school pop-trance, and degraded consumer audio.

You can see that the resulting scores, shown in the individual-track comparison plot above, contradict these expectations. “Funky Town” receives a PQ score only slightly lower than “Giorgio by Moroder,” indicating near insensitivity to codec degradation and mastering fidelity. Even more strikingly, “Blue Calx” is assigned the lowest Production Complexity among the four tracks, while “The Schumacher Song” and “Funky Town” receive higher PC scores. This directly inverts what most listeners would consider to be structural or compositional complexity.

Content Enjoyment is highest for “Funky Town” and lowest for “Blue Calx,” suggesting that the CE dimension aligns more closely with catchiness or familiarity than with artistic merit or aesthetic depth.

Taken together, these results indicate that AES is largely insensitive to audio fidelity. It fails to reflect musical or structural complexity, and instead appears to reward constant spectral activity and conventional pop characteristics. Even at the individual track level, the semantics of Production Quality and Production Complexity don’t match their labels.

B) Artist-level distribution analysis

Next, I tested whether AES produces distinct aesthetic profiles for artists with musical identities, production aesthetics, and historical contexts that are clearly different. I analyzed distributions of Production Quality, Production Complexity, Content Enjoyment, and Content Usefulness for Johann Sebastian Bach, Skrillex, Dream Theater, The Clash, and Hans Zimmer.
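
The comparison itself is straightforward to reproduce. The sketch below uses randomly generated dummy data in place of the real per-track AES outputs, just to show the shape of the analysis:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
artists = ["Bach", "Dream Theater", "Hans Zimmer", "Skrillex", "The Clash"]

# Dummy per-track scores standing in for the real AES outputs.
df = pd.DataFrame({
    "artist": rng.choice(artists, 500),
    **{axis: rng.normal(7.5, 0.4, 500) for axis in ("PQ", "PC", "CE", "CU")},
})

fig, axs = plt.subplots(1, 4, figsize=(16, 4), sharey=True)
for ax, axis in zip(axs, ("PQ", "PC", "CE", "CU")):
    groups = [g[axis].to_numpy() for _, g in df.groupby("artist")]  # sorted by artist
    ax.violinplot(groups, showmeans=True)
    ax.set_xticks(range(1, len(artists) + 1))
    ax.set_xticklabels(sorted(artists), rotation=45, ha="right")
    ax.set_title(axis)
plt.tight_layout()
plt.show()
```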

If AES captures musically meaningful aesthetics, we would expect to see systematic separation between these artists. For example, Hans Zimmer and Dream Theater might have a higher complexity score than The Clash. Skrillex’s modern electronic productions might have a higher quality score than early punk recordings. Bach’s works might show high complexity but variable enjoyment or usefulness depending on the recording and interpretation.

Instead, the plotted distributions show strong overlap across artists for CE, CU, and PQ, with only minor shifts in means. Most scores cluster tightly within a narrow band between approximately 7 and 8, regardless of artist. PC exhibits slightly more variation, but still fails to form clear stylistic groupings. Bach, Skrillex, Dream Theater, and Hans Zimmer largely occupy overlapping regions, while The Clash is not consistently separate.

This suggests that AES doesn’t meaningfully encode artist-level aesthetic or production differences. Despite extreme stylistic diversity, the model assigns broadly similar aesthetic profiles, reinforcing the interpretation that AES functions as a coarse estimator of acceptability or pleasantness rather than a representation of musical aesthetics.

C) Bias analysis using a balanced gender-controlled dataset

Scoring models are designed to rank, filter, and curate songs in large music catalogs. If these models encode demographic-correlated priors, they can silently amplify existing biases at scale. To test this risk, I analyzed whether AES exhibits systematic differences between tracks with female lead vocals and tracks without female lead vocals.

In our 2025 ISMIR paper, we showed that common music embedding models pick up non-musical singer traits, such as gender and language, and exhibit significant bias as a result. Because AES is intended to judge quality, aesthetics, and usefulness, it would be particularly problematic if it had similar biases. They could directly influence which music is considered “better” or more desirable.

I constructed a balanced dataset following the same methodology as our 2025 paper, equalizing genre distribution and singer language across groups.

For each group, I computed score distributions for Content Enjoyment, Content Usefulness, Production Complexity, and Production Quality, visualized them, and performed statistical testing using Welch’s t-test alongside Cohen’s d effect sizes. For context, Welch’s t-test is a statistical test that compares whether the average scores between two groups are significantly different. Cohen’s d is a measure of effect size that quantifies how large that difference is in standardized units.
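
In code, that testing step looks roughly like this, using scipy's Welch variant and a standard pooled-standard-deviation Cohen's d; the generated arrays are placeholders for the real per-group score distributions:

```python
import numpy as np
from scipy.stats import ttest_ind

def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Cohen's d using a pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_sd = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled_sd

# Placeholder arrays standing in for per-track CE scores of the two groups.
rng = np.random.default_rng(0)
female_led = rng.normal(7.6, 0.4, 400)   # tracks with female lead vocals
other = rng.normal(7.5, 0.4, 400)        # tracks without female lead vocals

t_stat, p_value = ttest_ind(female_led, other, equal_var=False)  # Welch's t-test
print(f"t = {t_stat:.2f}, p = {p_value:.4f}, d = {cohens_d(female_led, other):.2f}")
```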

The results show consistent upward shifts for female-led tracks in CE, CU, and PQ. All three differences are statistically significant with small-to-moderate effect sizes. In contrast, there is virtually no difference in Production Complexity score between groups.

This pattern indicates that the model systematically assigns higher enjoyment, usefulness, and quality scores to material with female vocals, even under controlled conditions. Because complexity remains unaffected, the effect doesn’t appear to stem from structural musical differences. Instead, it likely reflects correlations in training data and human annotations, or the model treating certain vocal timbres and production styles associated with female vocals as implicit quality indicators.

These findings suggest that AES encodes demographic-correlated aesthetic priors, which is problematic for a model intended to judge musical quality, aesthetics, and usefulness.

When a measure becomes a target, it ceases to be a good measure.

Charles Goodhart

Economist

Why this matters for the industry

Economist Charles Goodhart famously observed that “when a measure becomes a target, it ceases to be a good measure.” He was describing what happens when a metric starts to drive decisions rather than just being an indicator. Once a number is relied on, it begins to shape how people think and choose.

That idea applies directly to aesthetic scoring. A score, once it exists, carries weight. It gets used as a shortcut in decisions, even when its meaning is incomplete. This matters in music workflows because aesthetic judgment depends on context and purpose. 

When a simplified score is treated as reliable, systems can start favoring what scores well rather than what actually sounds better or serves a creative goal. Over time, that can quietly steer decisions away from how audio is perceived and used in practice.

How we approach audio intelligence at Cyanite

At Cyanite, music isn’t judged in a vacuum, and neither are the decisions built on top of it. That’s why we don’t rely on single aesthetic scores. Instead, we focus on making audio describable and searchable in ways that stay transparent and grounded in context.

Aesthetic scoring can give the illusion of precision, but it often lumps together different technical qualities, genres, and styles. In music search and discovery, a single score doesn’t explain why a track is surfaced or excluded. That reasoning matters to us. Not to decide what’s “good,” but to give teams tools they can understand and trust.

We see audio intelligence as a way to expose structure, not replace judgment. Our systems surface identifiable musical attributes and relationships, knowing that the same track can be the right or wrong fit depending on how it’s used. The goal is to support human decision-making, not substitute it with scores.

Experimentation has a place, but in music, automation works best when it’s explainable and limit-aware.

What responsible progress in music AI should look like

Progress in music and AI is underpinned by transparency. Teams should be able to understand how a model was trained and how its outputs relate to the audio. When results are interpretable, people can see why a track surfaces and judge for themselves whether the signal makes sense in their own context.

That transparency depends on data choices. Music spans styles, cultures, eras, and uses, and models reflect whatever they are fed. Developers need to work with broad, representative data and be clear about where coverage is thin. Being open about what a model sees, and what it does not, makes its behavior more predictable and its limits easier to manage.

Clear communication matters just as much once tools are in use. For scores and labels to be applied responsibly, teams need a shared understanding of what those signals reflect and where their limits are. Otherwise, even well-intentioned metrics can be stretched beyond what they are able to support.

This kind of openness helps the industry build tools people can understand and trust in real workflows. 

We explored how these expectations show up in practice in “The state of AI transparency in music 2025,” a report developed with MediaTracks and Marmoset on how music licensing professionals make decisions around AI, creator background, and context. You can read the full report here.

So… does Meta’s model provide meaningful ratings for music?

Based on these tests, the answer is no. The model produces stable scores, but they don’t map cleanly to how musical quality or complexity are assessed in real catalog work. Instead, the model appears to align more with easily detectable production traits than with the distinctions people consistently make when judging music in context.

That doesn’t make Audiobox Aesthetics insignificant. It can support research by defining a clear scoring framework, showing how reference-free predictors can be trained across speech, music, and sound, and making its models and data available for inspection and comparison. It also illustrates where AES scores can be useful, particularly when large volumes of audio need to be filtered or monitored but full listening is impractical.

Problems emerge when scores like these begin shaping decisions. When a score is presented as a measure of quality, people need to know what it’s actually measuring so they can judge whether it applies to their use case. Without that clarity, it becomes easy to trust the number even when it’s not a good fit.

At Cyanite, we see this as a reminder of the importance of responsibility in music and AI. Progress is driven by systems that stay grounded in real listening behavior and make their assumptions visible.