Can Meta’s audio aesthetic model actually rate the quality of music?

Last year, Meta released Audiobox Aesthetics (AES), a research model that proposes scoring audio based on how people would rate it. The model outputs four scores: Production Quality (PQ), Production Complexity (PC), Content Enjoyment (CE), and Content Usefulness (CU). 

The study suggests that audio aesthetics can be broken into these axes, and that a reference-free model can predict these scores directly from audio. If that holds, the scores could start informing decisions and become signals people lean on when judging music at scale.

I took a closer look to understand how the model frames aesthetic judgment and what this means in practice. I ran Audiobox Aesthetics myself and examined how its scores behave with real music.

What Meta’s Audiobox Aesthetics paper claims

Before jumping into my evaluation, let’s take a closer look at what Meta’s Audiobox Aesthetics paper set out to do.

The paper introduces a research model intended to evaluate audio when no reference version exists. The authors present this as a way to automate listening judgments: they describe human evaluations as costly and inconsistent, which led them to seek an automated alternative.

To address this need, the authors propose breaking audio evaluation into four separate axes and predicting a separate score for each:

  • Production Quality (PQ) looks at technical execution, focusing on clarity and fidelity, dynamics, frequency balance, and spatialization.
  • Production Complexity (PC) reflects how many sound elements are present in the audio.
  • Content Enjoyment (CE) reflects how much listeners enjoy the audio, including their perception of artistic skill and overall listening experience.
  • Content Usefulness (CU) considers whether the audio feels usable for creating content.

The model is trained using ratings from human listeners who follow the same guidelines across speech, music, and sound effects. It analyzes audio in short segments of around 10 seconds. For longer tracks, the model scores each segment independently and provides an average. 
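
As a rough sketch of that segment-and-average behavior, the snippet below splits a waveform into 10-second chunks and averages the four axis scores. Note that `score_segment` is a hypothetical stand-in for the model’s actual inference call, which ships with Meta’s released code.

```python
# Minimal sketch of the segment-and-average behavior described above.
# `score_segment` is a hypothetical placeholder, not Meta's actual API.
import numpy as np

AXES = ("PQ", "PC", "CE", "CU")

def score_segment(segment: np.ndarray, sample_rate: int) -> dict:
    """Placeholder for the model's per-segment prediction of the four axes."""
    raise NotImplementedError("Replace with a call to the AES inference code.")

def score_track(waveform: np.ndarray, sample_rate: int, segment_seconds: float = 10.0) -> dict:
    """Score a long track by averaging per-segment predictions, as the paper describes."""
    hop = int(segment_seconds * sample_rate)
    segments = [waveform[i:i + hop] for i in range(0, len(waveform), hop)]
    per_segment = [score_segment(seg, sample_rate) for seg in segments]
    return {axis: float(np.mean([s[axis] for s in per_segment])) for axis in AXES}
```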

Beyond the audio itself, the model has no additional context. It does not know how a track is meant to be used or how it relates to other music. According to the paper, the scores tend to align with human ratings and could help sort audio when it’s not possible to listen to it all. In that way, the model is presented as a proxy for listener judgment.

Why I decided to evaluate the model

I wasn’t the only one who was curious to look into this model. Jeffrey Anthony’s “Can AI Measure Beauty? A Deep Dive into Meta’s Audio Aesthetics Model,” for instance, offers a deep philosophical examination of what it means to quantify aesthetic judgment, including questions of ontology. I decided to take a more hands-on approach, testing the model on real-world examples to see what patterns emerge in its predictions.

What caught my attention most was how these scores are meant to be used. Once aesthetic judgments are turned into numbers, they start to feel reliable. They look like something you can sort by, filter on, or use to decide what gets heard and what gets ignored.

This matters in music workflows. Scores like these could influence how catalogs are cleaned up, how tracks are ranked for sync, and how large libraries of music are evaluated without listening. With a skeptical but open mindset, I set out to discover how these scores behave with real-world data.

What I found when testing the model

A) Individual-track sanity checks

I began with a qualitative sanity check using individual songs whose perceptual differences are unambiguous to human listeners. The tracks I selected represent distinct production conditions, stylistic intentions, and levels of artistic ambition.

I included four songs:

  • “Giorgio by Moroder” by Daft Punk (audiophile-grade disco-funk)
  • “Blue Calx” by Aphex Twin (experimental electronic music)
  • “The Schumacher Song” by DJ Visage (formulaic late-90s pop-trance)
  • “Funky Town” as a degraded, low-quality MP3

The motivation for this test was straightforward. A model claiming to predict Production Quality should assign a lower PQ to “Funky Town” (low-quality MP3) than to “Giorgio by Moroder.” A model claiming to estimate production or musical complexity should recognize “Blue Calx” by Aphex Twin as more complex than formulaic late-90s pop-trance such as DJ Visage’s “Schumacher Song.” Likewise, enjoyment and usefulness scores should not collapse across experimental electronic music, audiophile-grade disco-funk, old-school pop-trance, and degraded consumer audio.
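
These expectations can also be written down as explicit ordering checks. The sketch below assumes a `scores` dictionary mapping each track to its four AES values (however you obtain them); the track keys are illustrative.

```python
# The sanity-check expectations expressed as explicit ordering checks.
# `scores` is assumed to map track name -> {"PQ", "PC", "CE", "CU"} values.
def check_expectations(scores: dict) -> list[str]:
    failures = []
    if not scores["Funky Town (low-quality MP3)"]["PQ"] < scores["Giorgio by Moroder"]["PQ"]:
        failures.append("PQ does not penalize the degraded MP3 relative to the audiophile master")
    if not scores["Blue Calx"]["PC"] > scores["The Schumacher Song"]["PC"]:
        failures.append("PC does not rank the experimental track above formulaic pop-trance")
    return failures
```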

You can see that the resulting scores, shown in the individual-track comparison plot above, contradict these expectations. “Funky Town” receives a PQ score only slightly lower than “Giorgio by Moroder,” indicating near insensitivity to codec degradation and mastering fidelity. Even more strikingly, “Blue Calx” is assigned the lowest Production Complexity among the four tracks, while “The Schumacher Song” and “Funky Town” receive higher PC scores. This directly inverts what most listeners would consider to be structural or compositional complexity.

Content Enjoyment is highest for “Funky Town” and lowest for “Blue Calx,” suggesting that the CE dimension aligns more closely with catchiness or familiarity than with artistic merit or aesthetic depth.

Taken together, these results indicate that AES is largely insensitive to audio fidelity. It fails to reflect musical or structural complexity, and instead appears to reward constant spectral activity and conventional pop characteristics. Even at the individual track level, the semantics of Production Quality and Production Complexity don’t match their labels.

B) Artist-level distribution analysis

Next, I tested whether AES produces distinct aesthetic profiles for artists with clearly different musical identities, production aesthetics, and historical contexts. I analyzed distributions of Production Quality, Production Complexity, Content Enjoyment, and Content Usefulness for Johann Sebastian Bach, Skrillex, Dream Theater, The Clash, and Hans Zimmer.

If AES captures musically meaningful aesthetics, we would expect to see systematic separation between these artists. For example, Hans Zimmer and Dream Theater might have a higher complexity score than The Clash. Skrillex’s modern electronic productions might have a higher quality score than early punk recordings. Bach’s works might show high complexity but variable enjoyment or usefulness depending on the recording and interpretation.

Instead, the plotted distributions show strong overlap across artists for CE, CU, and PQ, with only minor shifts in means. Most scores cluster tightly within a narrow band between approximately 7 and 8, regardless of artist. PC exhibits slightly more variation, but still fails to form clear stylistic groupings. Bach, Skrillex, Dream Theater, and Hans Zimmer largely occupy overlapping regions, while The Clash is not consistently separate.

This suggests that AES doesn’t meaningfully encode artist-level aesthetic or production differences. Despite extreme stylistic diversity, the model assigns broadly similar aesthetic profiles, reinforcing the interpretation that AES functions as a coarse estimator of acceptability or pleasantness rather than a representation of musical aesthetics.

C) Bias analysis using a balanced gender-controlled dataset

Scoring models are designed to rank, filter, and curate songs in large music catalogs. If these models encode demographic-correlated priors, they can silently amplify existing biases at scale. To test this risk, I analyzed whether AES exhibits systematic differences between tracks with female lead vocals and tracks without female lead vocals.

In our 2025 ISMIR paper, we showed that common music embedding models pick up non-musical singer traits, such as gender and language, and exhibit significant bias as a result. Because AES is intended to judge quality, aesthetics, and usefulness, it would be particularly problematic if it had similar biases. They could directly influence which music is considered “better” or more desirable.

I constructed a balanced dataset using the same methodology as our 2025 paper, equalizing genre distribution and singer language across groups.

For each group, I computed score distributions for Content Enjoyment, Content Usefulness, Production Complexity, and Production Quality, visualized them, and performed statistical testing using Welch’s t-test alongside Cohen’s d effect sizes. For context, Welch’s t-test is a statistical test that compares whether the average scores between two groups are significantly different. Cohen’s d is a measure of effect size that quantifies how large that difference is in standardized units.
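
For readers who want the mechanics, here is a minimal sketch of that comparison, assuming two arrays of per-track scores (one per group) for a single dimension such as CE; the Cohen’s d variant shown uses a pooled standard deviation.

```python
# Welch's t-test plus Cohen's d for one score dimension across two groups.
import numpy as np
from scipy import stats

def compare_groups(group_a: np.ndarray, group_b: np.ndarray) -> dict:
    """Compare per-track scores (e.g. CE) between two groups of tracks."""
    t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)  # Welch's t-test
    pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)  # pooled standard deviation
    cohens_d = (group_a.mean() - group_b.mean()) / pooled_sd              # standardized effect size
    return {"t": float(t_stat), "p": float(p_value), "d": float(cohens_d)}
```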

The results show consistent upward shifts for female-led tracks in CE, CU, and PQ. All three differences are statistically significant with small-to-moderate effect sizes. In contrast, there is virtually no difference in Production Complexity score between groups.

This pattern indicates that the model systematically assigns higher enjoyment, usefulness, and quality scores to material with female vocals, even under controlled conditions. Because complexity remains unaffected, the effect doesn’t appear to stem from structural musical differences. Instead, it likely reflects correlations in training data and human annotations, or the model treating certain vocal timbres and production styles associated with female vocals as implicit quality indicators.

These findings suggest that AES encodes demographic-correlated aesthetic priors, which is problematic for a model intended to judge musical quality, aesthetics, and usefulness.

Why this matters for the industry

Economist Charles Goodhart famously observed that “when a measure becomes a target, it ceases to be a good measure.” He was describing what happens when a metric starts to drive decisions rather than just being an indicator. Once a number is relied on, it begins to shape how people think and choose.

That idea applies directly to aesthetic scoring. A score, once it exists, carries weight. It gets used as a shortcut in decisions, even when its meaning is incomplete. This matters in music workflows because aesthetic judgment depends on context and purpose. 

When a simplified score is treated as reliable, systems can start favoring what scores well rather than what actually sounds better or serves a creative goal. Over time, that can quietly steer decisions away from how audio is perceived and used in practice.

How we approach audio intelligence at Cyanite

At Cyanite, music isn’t judged in a vacuum, and neither are the decisions built on top of it. That’s why we don’t rely on single aesthetic scores. Instead, we focus on making audio describable and searchable in ways that stay transparent and grounded in context.

Aesthetic scoring can give the illusion of precision, but it often lumps together different technical qualities, genres, and styles. In music search and discovery, a single score doesn’t explain why a track is surfaced or excluded. That reasoning matters to us. Not to decide what’s “good,” but to give teams tools they can understand and trust.

We see audio intelligence as a way to expose structure, not replace judgment. Our systems surface identifiable musical attributes and relationships, knowing that the same track can be the right or wrong fit depending on how it’s used. The goal is to support human decision-making, not substitute it with scores.

Experimentation has a place, but in music, automation works best when it’s explainable and limit-aware.

What responsible progress in music AI should look like

Progress in music and AI is underpinned by transparency. Teams should be able to understand how a model was trained and how its outputs relate to the audio. When results are interpretable, people can see why a track surfaces and judge for themselves whether the signal makes sense in their own context.

That transparency depends on data choices. Music spans styles, cultures, eras, and uses, and models reflect whatever they are fed. Developers need to work with broad, representative data and be clear about where coverage is thin. Being open about what a model sees, and what it does not, makes its behavior more predictable and its limits easier to manage.

Clear communication matters just as much once tools are in use. For scores and labels to be applied responsibly, teams need a shared understanding of what those signals reflect and where their limits are. Otherwise, even well-intentioned metrics can be stretched beyond what they are able to support.

This kind of openness helps the industry build tools people can understand and trust in real workflows. 

We explored how these expectations show up in practice in “The state of AI transparency in music 2025,” a report developed with MediaTracks and Marmoset on how music licensing professionals make decisions around AI, creator background, and context. You can read the full report here.

So… does Meta’s model provide meaningful ratings for music?

Based on these tests, the answer is no. The model produces stable scores, but they don’t map cleanly to how musical quality or complexity are assessed in real catalog work. Instead, the model appears to align more with easily detectable production traits than with the distinctions people consistently make when judging music in context.

That doesn’t make Audiobox Aesthetics insignificant. It can support research by defining a clear scoring framework, showing how reference-free predictors can be trained across speech, music, and sound, and making its models and data available for inspection and comparison. It also illustrates where AES scores can be useful, particularly when large volumes of audio need to be filtered or monitored but full listening is impractical.

Problems emerge when scores like these begin shaping decisions. When a score is presented as a measure of quality, people need to know what it’s actually measuring so they can judge whether it applies to their use case. Without that clarity, it becomes easy to trust the number even when it’s not a good fit.

At Cyanite, we see this as a reminder of the importance of responsibility in music and AI. Progress is driven by systems that stay grounded in real listening behavior and make their assumptions visible.

How Melodie Music combines sound-based AI search and contextual metadata to spotlight original Australian artists

Ready to improve your music discovery workflows? Try Similarity Search in Cyanite.

Cyanite aligns with our philosophy because it doesn’t use AI to generate content; it uses AI to uncover it. It solves a genuine pain point for our users: the time-consuming nature of music search. We immediately saw that Cyanite could amplify our existing search system rather than overwrite it. It wasn’t a case of ‘AI versus humans’; it was AI empowering humans to find better music, faster.

Evan Buist

Managing Director, Melodie Music

Melodie is a music licensing platform that provides pre-cleared music for film, TV, advertising, and content creation. All artists and tracks on the platform are carefully curated and hand-selected for quality, originality, and emotional resonance. Ethics are at the core of Melodie’s company philosophy. It operates under a 50/50 revenue and royalty split, meaning Melodie doesn’t earn money on downloads until the artist does.

To make it easier to discover artists at scale, Melodie continues to refine how users navigate its catalog. AI helps users explore more quickly—but it doesn’t replace the human element behind editorial curation.

The rising tension between depth and speed

As Melodie’s catalog grew, a familiar tradeoff emerged: depth versus speed.

Despite thoughtful editorial tagging, the reality was that users often struggled to translate nuanced creative briefs into static keywords. “Describing music is inherently subjective; what sounds ‘uplifting’ to one person might sound ‘intense’ to another. As the saying goes, talking about music is like dancing about architecture,” explains Evan.

By relying solely on tags, users often found themselves in an experimental searching-listening-refining-repeating loop—a time-consuming effort that most editors and producers simply don’t have the bandwidth for.

Melodie recognized this problem early on and set out to improve the user experience in their library. As Evan puts it, “bridging the gap between ‘hearing it in your head’ and ‘finding it on the screen’ is the holy grail of music licensing.”

AI as an enabler, not a generator

Human curation is central to how Melodie operates. Tracks are not scraped or auto-generated. Over time, it became clear that tags on their own couldn’t support the kind of discovery users needed, so AI was added to help surface music intuitively and improve navigation.

Cyanite aligned naturally with that philosophy.

Rather than positioning AI as a substitute for curation, Cyanite’s AI search treats sound as data that can be understood, compared, and explored. What clicked for Melodie in their search for AI music analysis software was Cyanite’s approach: “The technology felt musical rather than just mathematical. The analysis is intuitive and forgiving, respecting the nuances of the tracks,” says Evan.

Thanks to this shared understanding, Cyanite became part of Melodie’s day-to-day music discovery process.

How Cyanite fits into Melodie’s workflow

Today, Melodie users move fluidly between different music discovery pathways depending on their working process.

Sound-based Similarity Search

With Cyanite’s Similarity Search, users can analyze a reference song and instantly explore tracks with a comparable emotional arc, energy, and sonic character. The reference can come from Spotify, YouTube, or a temporary edit.

This closes the gap between intuition and results in seconds.

A GIF showing the Similarity Search interface of Melodie Music

Prompt-based Free Text Search

Some users prefer to express what they are looking for in their own words. Prompt-based search allows them to describe mood, pacing, or instrumentation, even with spelling errors or mixed languages. Evan believes natural language search has done for music libraries what Google did for information in the late 90s: democratized access.

Regardless of how a user describes music, AI provides a laser-accurate shortlist in seconds. It turns discovery into exploration, allowing users to combine the speed of AI with Melodie’s human-tagged editorial filters to find the perfect track.

Evan Buist

Managing Director, Melodie Music

A screen recording showing a music similarity search and highlighting music tags

Cyanite has become a vital part of our ecosystem, helping us prove that technology can support culture, not replace it.

Evan Buist

Managing Director, Melodie Music

From upload to output: how Cyanite turns audio into reliable metadata at scale

Explore how Cyanite turns sound into structured metadata: Just upload a couple of songs to our web app.

Managing a music catalog involves more than just storing files. As catalogs grow, teams start running into a different kind of challenge: music becomes harder to find, metadata becomes inconsistent, and strong tracks remain invisible simply because they are described differently than newer material.

Many teams still rely on manual tagging or have inherited metadata systems that were never designed for scale. Over time, this leads to uneven descriptions, slower search, and workflows that depend more on individual knowledge than on shared systems. Creative teams spend valuable time navigating the catalog instead of working with the music itself.

Cyanite’s end-to-end tagging workflow was built to address this challenge. It gives teams a stable, shared foundation they can build on, supporting human judgement—not replacing it. It complements subjective, manual labeling with a consistent, audio-based process that works the same way for every track, whether you’re onboarding new releases or making a legacy catalog more organized.

This article walks through how that workflow functions in practice—from the moment audio enters the system to the point where structured metadata becomes usable across teams and tools.

Why tagging workflows tend to break down as catalogs grow

Most tagging workflows start with care and intention. A small team listens closely, applies descriptive terms, and builds a shared understanding of the catalog. But as volume increases and more people get involved, the system begins to stretch.

As catalogs scale, the same patterns tend to appear across organizations:

  • Different editors describe the same sound in different ways.
  • Older metadata no longer aligns with newer releases.
  • Genre and mood definitions shift over time.
  • Search results reflect wording more than sound.

When this happens, teams increasingly rely on memory instead of the systems in place. This leads to strong tracks getting overlooked, response times increasing, and trust in the metadata eroding.

Cyanite’s workflow addresses this fragility by grounding metadata in the audio itself and applying the same logic across the entire catalog.

Preparing your catalog for audio-based tagging

Teams can adopt Cyanite quickly, as little preparation is involved. The system doesn’t require existing metadata, spreadsheets, or reference information. It listens to the audio file and derives all tags from the sound alone.

Getting started requires very little setup:

  • MP3 files up to 15 minutes in length
  • No pre-existing metadata
  • No manual pre-labeling
  • No changes to your current file structure

Even 128 kbit/s MP3s are usually sufficient, which means older archive files can be analyzed as they are—no need for additional audio preparation. Teams can then choose how they want to bring audio into Cyanite based on volume and workflow. Once that’s decided, tagging can begin immediately.

If you’re unsure about uploading copyrighted audio to Cyanite, you can explore our security standards and privacy-first workflows, including options to process audio in a copyright-safe way using encrypted or abstracted data.

Bringing audio into Cyanite in a way that fits your workflow

Different organizations manage music in different ways, so Cyanite supports several ingestion paths that all lead to the same analysis results.

Teams working with smaller batches often start in the web app. This is common for sync teams reviewing submissions, catalog managers auditing older libraries, or teams testing Cyanite before deeper integration. Audio can be uploaded directly, selected from disk, or referenced via a YouTube link, with analysis starting automatically once the file is added.

Platforms and larger catalogs usually integrate via the API. In this setup, tagging runs inside the organization’s own systems. Audio is uploaded programmatically, and results are delivered automatically via webhook as structured JSON as soon as processing is complete. This approach supports continuous ingestion without manual steps and fits naturally into existing pipelines.
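
To illustrate what that integration can look like, here is a small, hypothetical webhook receiver; the route and payload field names are assumptions made for this sketch, not Cyanite’s actual schema.

```python
# Hypothetical webhook receiver for analysis results (Flask).
# The route and field names are illustrative, not Cyanite's actual payload schema.
from flask import Flask, request, jsonify

app = Flask(__name__)

def store_metadata(track_id: str, payload: dict) -> None:
    """Placeholder: persist the delivered metadata in your own catalog database."""
    print(f"Received analysis for {track_id}")

@app.route("/analysis-webhook", methods=["POST"])
def receive_analysis():
    payload = request.get_json(force=True)  # structured JSON delivered once processing completes
    store_metadata(payload.get("trackId", "unknown"), payload)
    return jsonify({"status": "received"}), 200

if __name__ == "__main__":
    app.run(port=8080)
```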

For very large catalogs, Cyanite can also provide a dedicated S3 bucket with CLI credentials. This allows high-throughput ingestion without relying on browser-based uploads. It’s often used during initial onboarding of catalogs containing thousands of tracks.

Some teams prefer not to upload files themselves at all. In those cases, audio can be shared via common transfer tools before the material is processed and delivered in the agreed format.

What happens once the analysis is complete?

Cyanite produces a structured, consistent description of how each track sounds, independent of who uploaded it or when it entered the catalog.

Metadata becomes available either in the web app library or directly inside your system via the API. We can also deliver an additional CSV and Google Spreadsheet export on request.

Each track receives a stable set of static tags and values, including:

  • Genres and free-genre descriptors
  • Moods and emotional dynamics
  • Energy and movement
  • Instrumentation and instrument presence
  • Valence–arousal values
  • The most representative part of the track
  • An Auto-Description summarizing key characteristics

All tags are generated through audio-only analysis, which ensures that legacy tracks and new releases follow the same logic. Over time, this consistency becomes the foundation for faster search, clearer filtering, and more reliable collaboration across teams.

The full tagging taxonomy is available for teams that want deeper insight into how attributes are defined and structured. Explore Cyanite’s tagging taxonomy here.

Curious how the Google Spreadsheet export looks? Check out this sample.

How long does tagging take at different catalog sizes?

Cyanite processes audio quickly. A typical analysis time is around 10 seconds per track. Because processing runs in parallel, turnaround time depends more on workflow setup than on catalog size.

In practice, teams can expect:

  • Small batches to be ready almost instantly
  • Medium-sized libraries to complete within hours
  • Enterprise-scale catalogs to be onboarded within 5–10 business days, regardless of size

For day-to-day use via the API, results arrive in near real time via webhook as soon as processing finishes. This makes the workflow suitable both for large one-time onboarding projects and continuous ingestion as new music arrives.

Understanding scores, tags, and why both matter

Cyanite’s models produce two complementary layers of information.

Numerical scores describe how strongly an attribute is present, both across the full track and within time-based segments. These values range from zero to one, with 0.5 representing a meaningful threshold.

Cyanite creates final tags by using an additional decision layer that considers how different attributes relate to one another. It doesn’t just apply a simple cutoff. This approach helps resolve ambiguities, stabilize hybrid sounds, and produce tags that make musical sense in context.

This means you get metadata that remains robust even for tracks that blend genres, moods, or production styles—a common challenge in modern catalogs.
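
To make that difference concrete, here is a toy sketch contrasting a flat 0.5 cutoff with a context-aware step. The tie-breaking rule in the second function is invented purely for illustration; Cyanite’s actual decision layer is not public.

```python
# Toy contrast between a flat threshold and a context-aware tagging step.
# The tie-breaking rule below is invented for illustration only.
def naive_tags(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Flat cutoff: every attribute scoring above the threshold becomes a tag."""
    return [attr for attr, value in scores.items() if value >= threshold]

def contextual_tags(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Example of resolving ambiguity between two closely related attributes."""
    tags = naive_tags(scores, threshold)
    if "rock" in tags and "metal" in tags:   # both clear the cutoff on a hybrid track
        weaker = "rock" if scores["metal"] > scores["rock"] else "metal"
        tags.remove(weaker)                  # keep only the stronger of the pair
    return tags
```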

Exporting metadata into your existing systems

Once tags are available, your team can export them in the format that best fits your workflow.

API users typically work with structured JSON, delivered automatically via webhook and accessible through authenticated requests. Cyanite’s Query Builder allows teams to explore available fields and preview real outputs before integration.

For one-time projects or larger deliveries, metadata can also be provided as CSV files. Web app users can request CSV export through Cyanite’s internal tools, which is especially useful during catalog cleanups or migrations.

Because the structure remains consistent across formats, metadata can be reused across systems without rework.

Learn how to quickly build your queries for the Cyanite API with our Query Builder.

How teams use tagged metadata in practice

Once audio-based tagging is in place, teams tend to notice changes quickly. Search becomes faster and more predictable. Creative teams can filter by sound instead of guessing keywords. Catalog managers spend less time fixing metadata and more time shaping the catalog strategically.

In practice, tagged metadata supports workflows such as:

  • Catalog management and cleanup
  • Creative search and curation
  • Ingestion pipelines
  • Licensing and rights
  • Sync briefs and pitching
  • Internal discovery tools
  • Audits and reporting

Over time, consistent metadata reduces friction between departments and makes catalog operations more resilient as libraries continue to grow.

Best practices from real-world usage

Teams see the smoothest results when they work with clean audio sources, batch large uploads, manage API credentials carefully, and switch to S3-based ingestion as catalogs become larger. Thinking about export formats early also helps avoid rework during onboarding projects.

None of this changes the outcome of the analysis itself, but it does make the overall process more predictable and easier to manage at scale.

With Cyanite, we have a partner whose technology truly matches the scale and diversity of our catalog. Their tagging is fast and reliable, and Similarity Search unlocks a whole new way to discover music, not just through filters, but through feeling. It’s a huge step forward in how we help creators connect with the right tracks.

Stan McLeod

Head of Product, Lickd

Final thoughts

Cyanite’s tagging workflow is designed to scale with your catalog without making your day-to-day work more complex. Whether you upload a handful of tracks through the web app or process tens of thousands via the API, the result will be the same: structured, consistent metadata that reflects how your music actually sounds.

If you’re ready to move away from manual tagging and toward a more stable foundation for search and discovery, explore the different ways to work with Cyanite and choose the setup that fits your workflow.

Want to work with Cyanite? Explore your options, and get in touch with our business team, who can provide guidance if you’re unsure how to start.

FAQs

Q: Do I need to send existing metadata to use Cyanite’s tagging workflow?

A: No. Cyanite analyzes the audio itself. It doesn’t rely on existing tags or descriptions.

Q: Can Cyanite handle both legacy catalogs and new releases?

A: Yes, it can. The same analysis logic applies to all tracks, which helps unify older and newer material under a single metadata structure.

Q: How are results delivered when using the API?

A: Results are sent automatically via webhook as structured JSON as soon as processing is complete.

Q: Is the tagging output consistent across export formats?

A: Yes. JSON and CSV exports use the same underlying structure and values.

Q: Who typically uses this workflow?

A: Music publishers, production libraries, sync teams, music-tech platforms, and catalog managers use Cyanite’s tagging workflow to support search, licensing, onboarding, and catalog maintenance.

Q: How long will it take to tag my music?

A: Small batches are tagged almost immediately. For larger catalogs, we usually need 5–10 business days for the complete setup.

How Cyanite protects your sensitive audio: privacy-first workflows for every catalog

Looking for secure AI music analysis? Discover Cyanite’s integration options. 

For many music teams, a significant hesitation about AI analysis is not about its capability or quality. It’s about trust. When teams explore AI-driven tagging or search, the conversation almost always leads to the same question: What happens to our audio once it leaves our system?

At Cyanite, we’ve built our technology around that concern from the very beginning. Rather than offering a single security promise, we provide multiple privacy-first workflows designed to meet different levels of sensitivity and compliance. This gives teams the flexibility to choose how their audio is handled, without compromising on tagging quality or metadata depth.

This article outlines the three privacy models Cyanite offers, explains how each one works in practice, and helps you decide which setup best fits your catalog and internal requirements.

Why audio privacy matters in modern music workflows

For those who manage it, audio represents creative identity, contractual responsibility, and, often, years of human effort. It’s not just another data type. Sending that material outside an organization can feel risky, even when the technical safeguards are strong and the operational benefits are clear.

Teams that evaluate our services often raise concerns about protecting unreleased material, complying with licensing agreements, and maintaining long-term control over how their catalogs are used. They look for assurances around:

  • Safeguarding confidential or unreleased content
  • Complying with NDAs and contractual obligations
  • Meeting internal legal or security standards
  • Maintaining full ownership and control

These are not edge cases. They reflect everyday realities for publishers, film studios, broadcasters, and music-tech platforms alike. That’s why Cyanite treats privacy as a core design principle.

Security option 1: GDPR-compliant processing on secure EU servers

For many organizations, strong data protection combined with minimal operational complexity is the right balance. In Cyanite’s standard setup, all audio is processed on secure servers located in the EU and handled in full compliance with GDPR.

In practical terms, this means:

  • Audio files are never shared with third parties.
  • Songs can be deleted anytime.
  • Ownership and control of the music always remain with the customer.

This model works well for publishers, production libraries, sync platforms, and music-tech companies that want to scale tagging and search workflows without maintaining their own infrastructure. For most catalogs, this level of protection is both robust and sufficient.

That said, not every organization is able to send audio outside its own environment, even under GDPR. For those cases, Cyanite offers additional options.

Learn more: See how AI music tagging works in Cyanite and how it supports large catalogs.

Security option 2: zero-audio pipeline—tagging without transferring audio

Some teams manage catalogs that cannot be transferred externally at all. These include confidential film productions, enterprise music departments, and archives operating under strict internal compliance rules. For these situations, Cyanite provides a spectrogram-based workflow that enables full tagging without the audio files ever being sent.

Spectrograms from left to right: Christina Aguilera, Fleetwood Mac, Pantera

Instead of uploading MP3s, audio is converted locally on the client side into spectrograms using a small Docker container provided by Cyanite. A spectrogram is a visual representation of frequency patterns over time. It contains no playable audio, cannot be converted back into a waveform without significant quality loss, and does not expose the original performance in any usable form.
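
For intuition about the representation itself, the short sketch below computes a mel spectrogram locally with librosa. It only shows what the abstraction looks like; in this workflow, the actual conversion is handled by Cyanite’s Docker container.

```python
# Illustration only: computing a mel spectrogram locally with librosa.
# This is not Cyanite's converter; it just shows the kind of representation involved.
import librosa
import numpy as np

y, sr = librosa.load("track.mp3", sr=None, mono=True)         # decode the audio locally
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)   # frequency content over time
mel_db = librosa.power_to_db(mel, ref=np.max)                  # log scale for readability
print(mel_db.shape)  # (mel bands, time frames): a numeric matrix, not playable audio
```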

From a metadata perspective, the results are identical to audio-based processing. From a privacy perspective, the original audio never leaves the customer’s environment. This makes the zero-audio pipeline a strong middle ground for teams that want AI-powered tagging while maintaining strict control over their content.

From a product perspective, all Cyanite features can be fully leveraged.

For us at Synchtank, the spectrogram-based upload was key. Many of our clients are cautious about where their audio goes, and this approach lets us use high-quality AI tagging and search without transferring any copyrighted audio. That balance, confidence for our customers without compromising on quality, is what made the difference for us.

Amy Hegarty

CEO at Synchtank

Learn more: What are spectrograms, and how can they be applied to music?

Security option 3: pseudo-on-premise deployment via the Cyanite Audio Analyzer on the AWS Marketplace

For organizations with the highest security and compliance requirements, Cyanite also offers a pseudo-on-premises deployment option via the AWS Marketplace. In this setup, Cyanite’s tagging engine runs entirely inside the customer’s own AWS cloud infrastructure via the Cyanite Audio Analyzer.

This approach provides:

  • Complete pseudo-on-premise processing
  • Zero data transfer outside your AWS cloud environment
  • Full control over storage, access, and compliance
  • Tagging accuracy identical to cloud-based workflows

This option is typically chosen by film studios, broadcasters, public institutions, and organizations working with unreleased or highly sensitive material that must pass strict internal or external audits.

Because the pseudo-on-premise container operates in complete isolation (no internet connection), search-based features—including Similarity Search, Free Text Search, and Advanced Search—are not available in this setup. In pseudo-on-premise environments, Cyanite therefore focuses exclusively on audio tagging and metadata generation.

Important note: The rates on the AWS Marketplace are intentionally high to deter fraudulent activity. Please contact us for our enterprise rates and find the best plan for your needs.

Choosing the right privacy model for your catalog

Selecting the right setup depends less on catalog size and more on how tightly you need to control where your audio lives. A useful way to frame the decision is to consider how much data movement your internal policies allow.

In practice, teams tend to choose based on the following considerations:

  • GDPR cloud processing works well when secure external processing is acceptable.
  • Zero-audio pipelines suit teams that cannot transfer audio but can share abstract representations.
  • Pseudo-on-premise deployment is best for environments requiring complete isolation.

All three options deliver the same tagging depth, consistency, and accuracy. The difference lies entirely in how data moves, or doesn’t move, between systems.

Final thoughts

Using AI with music requires trust—trust that audio is handled responsibly, that ownership is respected, and that workflows adapt to real-world constraints rather than forcing compromises. Cyanite’s privacy-first architecture is designed to uphold that trust, whether you prefer cloud-based processing, a zero-audio pipeline, or a fully isolated pseudo-on-premise deployment.

If you’d like to explore which setup best fits your catalog, workflow, and compliance needs, you can review the available integration options.

FAQs

Q: Where is my audio processed when using Cyanite’s cloud setup?

A: In the standard setup, audio is processed on secure servers located in the EU and handled in full compliance with GDPR. Audio is not shared with third parties and remains your property at all times.

Q: Can I use Cyanite without sending audio files at all?

A: Yes. With the zero-audio pipeline, you convert audio locally into spectrograms and send only those abstract frequency representations to Cyanite. The original audio never leaves your environment, while full tagging results are still generated.

Q: What is the difference between the zero-audio pipeline and pseudo-on-premise deployment?

A: The zero-audio pipeline sends spectrograms to Cyanite’s cloud for analysis. The pseudo-on-premise deployment runs the Cyanite Audio Analyzer entirely inside your own AWS cloud infrastructure, which is cut off from the internet and only connected to your system. Pseudo-on-premise offers maximum isolation but only supports tagging, without search features.

Q: Are Similarity Search and Free Text Search available in all privacy setups?

A: Similarity Search, Free Text Search, and Advanced Search are available in cloud-based and zero-audio pipeline workflows. In fully pseudo-on-premise deployments, Cyanite focuses exclusively on tagging and metadata generation due to the isolated environment.

Q: Which privacy option is right for my catalog?

A: That depends on your internal security, legal, and compliance requirements. Teams with standard protection needs often use GDPR cloud processing. Those with higher sensitivity choose the zero-audio pipeline. Organizations requiring full isolation opt for the pseudo-on-premise deployment. Cyanite supports all three.

Why AI labels and metadata now matter in licensing

A new industry report from Cyanite, MediaTracks, and Marmoset reveals how professionals are navigating the rise of AI-generated music. Read here.

AI’s move to the mainstream has changed what people expect from music catalogs. Licensing teams now look for clearer data about the music they review. They want to know whether it’s human-made or AI-generated, and they also look for details that help place the music in the right creative or cultural setting. Many check these cues first, then move on to mood or tone.

At Cyanite, we partnered with MediaTracks and Marmoset to understand the level of transparency and cultural context music licensing professionals expect when reviewing AI-generated music. MediaTracks and Marmoset surveyed 144 people across their professional communities—including music supervisors, filmmakers, advertisers, and producers—and we worked with them to interpret the findings and publish this report.

The responses revealed that most people want clear labeling when AI is involved. Yet despite this shared desire for transparency, only about half of the respondents said they would work exclusively with human-made music.

The full study goes deeper into these findings and shows how they play out in real licensing work.

Why we ran this study

We wanted a clear view of how people make decisions when AI enters the picture. The conversation around AI in music moves fast, and many teams now ask for context that helps them explain their selections to clients. This study aimed to find out which parts of the metadata give them that confidence.

It also looked at how origin details and creator context guide searches and reviews. We wanted to see where metadata supports the day-to-day licensing process and where there are gaps.

Transparency is now a baseline expectation

97% of respondents said they want AI-generated music to be clearly labeled, and 37% used the word “transparency” in their written responses. They want a straightforward read on what they’re listening to. Some tied this to copyright worries. One person put it simply: 

“I’m concerned that if it were AI-generated, where did the AI take the themes or phrases from? Possible copyright infringement issues.”

Transparency doesn’t just apply to the AI label. We found that respondents also see context as part of that clarity—knowing who made the music and where it comes from. This information helps them assess whether the music is a good fit for the project. They use it during searches to filter for cultural background or anything else that’s relevant to the brief.

What these findings mean for the industry

These findings show how much clarity now shapes day-to-day work in music catalogs. People expect AI music to be labeled accordingly, and they lean on context to move through searches and briefs without second-guessing their choices. Human-made music is still highly valued. The real change has been in how teams use origin details to feel sure about their selection.

This sets a new bar for how catalogs present their music. Teams want dependable information, including context that helps them avoid missteps in projects that depend on cultural accuracy or narrative alignment.

This finding ties into how Cyanite supports catalogs today. Our audio-first analysis gives people a clear read of the music itself, which sits alongside the cultural or creative context they already rely on. It helps teams search with more clarity and meet the expectations that are now shaping the industry.

How Cyanite’s advanced search fits in

The study showed how important cultural background and creator context are when people review music. Teams often keep their own notes and metadata for this reason. Cyanite’s Advanced Search supports that need by letting catalogs add and use their own custom information in the search.

Custom Metadata Upload – one of many features of our new Advanced Search – lets you upload your own tags, such as cultural or contextual details that don’t come from the audio analysis, and use them as filters. You can set your own metadata criteria first, and the system will search only within the tracks that match those inputs.

When you then run a Similarity or Free Text Search, the model evaluates musical similarity inside that filtered subset. As a result, search and discovery reflect both the sound of a track and the context around it.

You can search your catalog for “upbeat indie rock,” but you can also search for “upbeat indie rock, human-produced, female-led, one-stop cleared, independent.”
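
Expressed as data, such a combined request might look roughly like the sketch below; the field names are hypothetical, do not reflect Cyanite’s actual API, and only illustrate how custom metadata filters sit alongside the free-text part of the query.

```python
# Hypothetical shape of a combined query: free text plus custom metadata filters.
# Field names are invented for illustration and are not Cyanite's actual API.
search_request = {
    "freeText": "upbeat indie rock",
    "customMetadataFilters": {
        "origin": "human-produced",
        "leadVocals": "female-led",
        "clearance": "one-stop cleared",
        "label": "independent",
    },
}
```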

Read the full report

The survey drew responses from people who license music often as part of their work and feel the impact of unclear metadata. Their answers show how they think about AI involvement, creator background, and the context they need when they search.

The full report brings these findings together with information about the study—who took part, how often they search, the questions they answered, and how responses differed by role. It also includes partner insights from MediaTracks and Marmoset, along with charts and quotes that show how transparency and context shape real choices in licensing.

You can read the full study here.