I Built a Music Intelligence App in Five Days with Two AI Coding Agents
For the past few years I've been getting more serious about music production. Not just listening — actually producing. And one of the things producers do constantly is work with reference tracks: commercial songs that capture the energy, tonality, or vibe you're chasing in a mix. You pull them into your DAW, you A/B against them, you learn from them.
I had a problem. My music library had grown into thousands of MP3s, WAVs, and AIFFs spread across a few hard drives with no real organization. When I wanted a reference — something around 120 BPM, minor key, dark electronic — I was scrolling through Finder like it was 2004. There had to be a better way.
So I decided to build one.
The Idea
The concept was straightforward in principle: a desktop app that scans your audio library, runs machine-learning analysis on every track, and stores the results in a local database. From there, you can filter by BPM, key, genre, mood, and energy. You can search semantically — "find me something that sounds like this reference" — using audio embeddings. You can drag tracks directly from the app into your DAW.
No cloud. No subscription. Everything local, everything fast.
The technical scope was... not small. I wanted:
A recursive audio scanner with metadata extraction
A multi-pass ML pipeline: loudness (EBU R128), BPM, key detection, genre classification across hundreds of categories, mood/energy profiling, CLAP audio embeddings for semantic search, and optionally Qwen2-Audio for rich natural-language descriptions
A vector similarity search so I could ask "what sounds like this track?" or weird questions like "show me metal rock songs with luscious synths"
A 2D Music Map that projects my library into a visual space using UMAP
A References panel with a local player, spectrogram visualization, and native drag-to-DAW support
All of this running inside a native macOS desktop app with a responsive UI
I'm an experienced developer, but I was realistic: this would normally take weeks to build from scratch. I gave myself the weekend and then some. I ended up shipping the MVP in five days.
The tools that made this possible were Claude Code and Antigravity.
The Architecture
The app is built on Tauri — a Rust + WebView framework for native desktop apps. The frontend is Svelte 5 (TypeScript), and the backend is a single Rust binary that owns everything: the SQLite database, the ML inference, the audio pipeline, the IPC layer.
There's one external process: llama-server from llama.cpp, which runs the Qwen2-Audio model for deep audio descriptions. It gets spawned on demand when you run the Qwen pass, and killed automatically when the window closes.
I deliberately kept everything in-process on the Rust side. Early in the project, I had a FastAPI Python backend handling the ML work. I killed it. Moving CLAP embeddings, ONNX classifiers, resampling, log-mel spectrograms, and UMAP projections into native Rust wasn't trivial — it took several dedicated commits — but the payoff was a self-contained binary with no Python runtime dependency and significantly better performance. The ONNX models run through ort, the DSP through rustfft and rubato, the vector search through sqlite-vec.
The analysis pipeline is a priority queue of "passes" — each pass enriches tracks with one category of metadata. Passes are idempotent and resumable. If you quit mid-analysis and restart, it picks up from where it left off. The UI shows real-time progress with per-pass ETAs.
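To make the "idempotent and resumable" part concrete, here is a minimal sketch of such a queue using only the Rust standard library. The names (Pass, Job, AnalysisQueue) and the priority order are assumptions for illustration, not the app's actual types, and the in-memory HashSet stands in for completion state that would really live in SQLite.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashSet};

/// Hypothetical pass kinds. Declaration order doubles as priority:
/// earlier variants are scheduled first.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum Pass {
    Loudness,
    Bpm,
    Key,
    Embedding,
}

/// One unit of work: run `pass` on the track with id `track_id`.
/// Field order matters: jobs sort by pass first, then by track id.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct Job {
    pub pass: Pass,
    pub track_id: u32,
}

pub struct AnalysisQueue {
    /// `Reverse` turns the max-heap into a min-heap, so the lowest
    /// (highest-priority) job pops first.
    heap: BinaryHeap<Reverse<Job>>,
    /// Stand-in for a persisted "completed" table (assumption).
    done: HashSet<Job>,
}

impl AnalysisQueue {
    /// Rebuild the queue from persisted state, skipping finished jobs.
    /// This is what makes a restart pick up where it left off.
    pub fn resume(track_ids: &[u32], passes: &[Pass], done: HashSet<Job>) -> Self {
        let mut heap = BinaryHeap::new();
        for &pass in passes {
            for &track_id in track_ids {
                let job = Job { pass, track_id };
                if !done.contains(&job) {
                    heap.push(Reverse(job));
                }
            }
        }
        Self { heap, done }
    }

    /// Pop the highest-priority pending job, if any.
    pub fn next_job(&mut self) -> Option<Job> {
        self.heap.pop().map(|Reverse(job)| job)
    }

    /// Record completion. Marking the same job twice is harmless,
    /// which keeps the bookkeeping idempotent.
    pub fn mark_done(&mut self, job: Job) {
        self.done.insert(job);
    }

    /// The set of finished jobs, i.e. what would be persisted.
    pub fn completed(&self) -> &HashSet<Job> {
        &self.done
    }
}
```

A worker loop would call next_job, run the pass, then mark_done. After a restart, calling resume with the saved completion set yields only the work that is still pending.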
Working with Claude Code
Claude Code handled a slim majority of the implementation: roughly 56 of the 109 commits carry a Co-Authored-By: Claude Sonnet 4.6 tag.
My role was architecture and direction. I made every structural decision: which stack, which ML models, how the database schema evolves, when to kill the Python backend. Claude Code translated those decisions into working Rust, TypeScript, and SQL.
What surprised me was how well it handled depth. Porting the log-mel spectrogram pipeline to Rust — including correct resampling with rubato, the FFT windowing, the mel filterbank math — is not a casual task. Claude got it right, including subtleties I would have taken hours to debug manually, like ensuring the ONNX input shapes matched what the exported CLAP model expected.
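For a sense of what "mel filterbank math" involves, here is a small sketch of the frequency mapping at the heart of it. It assumes the common HTK-style mel formula and hypothetical function names. The exact mel variant, band count, and frequency range that the exported CLAP model expects are not covered in this post, so treat the parameters as placeholders.

```rust
/// Convert a frequency in Hz to the mel scale (HTK-style formula).
pub fn hz_to_mel(hz: f64) -> f64 {
    2595.0 * (1.0 + hz / 700.0).log10()
}

/// Inverse of `hz_to_mel`.
pub fn mel_to_hz(mel: f64) -> f64 {
    700.0 * (10f64.powf(mel / 2595.0) - 1.0)
}

/// Edge frequencies (in Hz) for `n_mels` triangular filters.
/// Returns `n_mels + 2` points spaced evenly on the mel scale:
/// filter `i` rises from edge `i` to edge `i + 1`, then falls to edge `i + 2`.
pub fn mel_band_edges(n_mels: usize, f_min: f64, f_max: f64) -> Vec<f64> {
    let lo = hz_to_mel(f_min);
    let hi = hz_to_mel(f_max);
    let steps = (n_mels + 1) as f64;
    (0..n_mels + 2)
        .map(|i| mel_to_hz(lo + (hi - lo) * i as f64 / steps))
        .collect()
}
```

The edges are dense at low frequencies and sparse at high ones, which is the point of the mel scale. A full log-mel pipeline would build triangular weights between these edges, apply them to each FFT power frame, and take the logarithm.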
The agentic workflow also changed how I structured the project itself. I maintain a CLAUDE.md and AGENTS.md with custom project skills — documented patterns for how to add IPC commands, how to write database migrations, how to add ML passes. Rather than re-explaining context every session, these documents prime the agent with the project's conventions. It's a small thing, but it compresses the collaboration overhead considerably.
Claude Code shines when the scope is already understood: "implement this feature within this architecture." It needs more steering when the design itself is uncertain, when you're deciding what to build, not just how. Those decisions stayed with me.
Working with Antigravity
The other half of the commits came from sessions with Antigravity, Google's agent-first coding tool. Honestly, the split between the two wasn't strategic — I alternated based on token limits. When one context window was exhausted, I'd switch to the other and keep going.
That said, after five days of going back and forth, I did develop a feel for where each one has an edge. Claude Code tends to be more conservative and realistic: it pushes back when something is unclear, it's honest when a design has a problem, it won't confidently charge into a solution that might be wrong. Antigravity leans more optimistic — it's willing to commit harder to a path and tends to feel faster. On deeply technical problems, like complex Rust type puzzles or intricate ONNX inference plumbing, I found Antigravity often produced sharper output.
In practice they're largely interchangeable. Both understood the codebase, both respected the project conventions, and neither produced code that felt stylistically foreign to the other's. The codebase is coherent: nobody, myself included, could tell from looking at any given file which agent wrote it.
What the MVP Can Do Today
Five days later, the app works on my actual library — thousands of tracks. Here's what I can do with it:
Browse and filter. The tracks browser supports filtering by BPM range (dual-handle slider), key (with enharmonic equivalence and Camelot notation), genre (free-text autocomplete over 400 categories), mood/energy range, vocal/instrumental, and collection. All filters compose.
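The key filter's enharmonic equivalence and Camelot notation boil down to a little modular arithmetic. The sketch below is one way to do it, with hypothetical function names. It is not the app's actual code, and it only handles simple note spellings like "C#" or "Gb".

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Major,
    Minor,
}

/// Parse a note name such as "C", "F#", or "Gb" into a pitch class 0..=11 (C = 0).
/// Enharmonic spellings (C# and Db) collapse to the same pitch class,
/// so filtering on the pitch class treats them as equal.
pub fn pitch_class(name: &str) -> Option<u8> {
    let mut chars = name.trim().chars();
    let base: i32 = match chars.next()?.to_ascii_uppercase() {
        'C' => 0,
        'D' => 2,
        'E' => 4,
        'F' => 5,
        'G' => 7,
        'A' => 9,
        'B' => 11,
        _ => return None,
    };
    let mut offset = 0i32;
    for c in chars {
        match c {
            '#' | '♯' => offset += 1,
            'b' | '♭' => offset -= 1,
            _ => return None,
        }
    }
    Some((base + offset).rem_euclid(12) as u8)
}

/// Camelot code for a key, e.g. C major -> "8B", A minor -> "8A".
/// Each step up the wheel is a perfect fifth (7 semitones), and a minor
/// key shares its number with its relative major (3 semitones up).
pub fn camelot(pc: u8, mode: Mode) -> String {
    let major_pc = match mode {
        Mode::Major => pc % 12,
        Mode::Minor => (pc % 12 + 3) % 12,
    };
    let number = (major_pc as u32 * 7 + 7) % 12 + 1;
    let letter = match mode {
        Mode::Major => 'B',
        Mode::Minor => 'A',
    };
    format!("{}{}", number, letter)
}
```

With this mapping, harmonically compatible keys sit next to each other: same number with the other letter, or one number up or down with the same letter.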
ML analysis, fully local. Every track gets processed through the analysis pipeline: EBU R128 loudness, BPM via spectral-flux autocorrelation, key detection via chromagram + Krumhansl-Schmuckler, multi-label genre classification across 400 genres using a distilled ONNX model, mood/danceability/energy profiling, CLAP audio embeddings, sentence embeddings for text search, and UMAP 2D coordinates for the Music Map. No cloud API calls. No data leaves the machine.
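As an illustration of the key-detection step, here is a minimal sketch of the Krumhansl-Schmuckler idea: correlate a 12-bin chroma vector against the 24 rotated major and minor key profiles and keep the best match. The profile weights are the published Krumhansl-Kessler values. The function names are hypothetical, and the chromagram extraction that would produce the input is assumed rather than shown.

```rust
/// Krumhansl-Kessler major-key profile, index 0 = tonic.
const MAJOR: [f64; 12] = [
    6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88,
];
/// Krumhansl-Kessler minor-key profile, index 0 = tonic.
const MINOR: [f64; 12] = [
    6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17,
];

/// Pearson correlation between two 12-element vectors.
fn pearson(a: &[f64; 12], b: &[f64; 12]) -> f64 {
    let mean_a = a.iter().sum::<f64>() / 12.0;
    let mean_b = b.iter().sum::<f64>() / 12.0;
    let (mut cov, mut var_a, mut var_b) = (0.0, 0.0, 0.0);
    for i in 0..12 {
        let (da, db) = (a[i] - mean_a, b[i] - mean_b);
        cov += da * db;
        var_a += da * da;
        var_b += db * db;
    }
    let denom = (var_a * var_b).sqrt();
    if denom == 0.0 { 0.0 } else { cov / denom }
}

/// Estimate the key from a chroma vector (index 0 = C).
/// Returns (tonic pitch class, is_minor, correlation of the best match).
pub fn estimate_key(chroma: &[f64; 12]) -> (usize, bool, f64) {
    let mut best = (0usize, false, f64::NEG_INFINITY);
    for tonic in 0..12 {
        for &(is_minor, profile) in [(false, &MAJOR), (true, &MINOR)].iter() {
            // Rotate the profile so its tonic weight lands on `tonic`.
            let mut rotated = [0.0; 12];
            for i in 0..12 {
                rotated[(i + tonic) % 12] = profile[i];
            }
            let r = pearson(chroma, &rotated);
            if r > best.2 {
                best = (tonic, is_minor, r);
            }
        }
    }
    best
}
```

In practice the chroma vector would be averaged over the whole track (or over segments), and the correlation score doubles as a rough confidence value.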
Semantic search. The "Sounds Like" feature uses the CLAP embeddings stored in a sqlite-vec vector table to run K-nearest-neighbor search. Type a track name, pick a reference, and get back the most sonically similar tracks in your library. This is the feature I use most.
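Conceptually, the nearest-neighbor lookup looks like the brute-force sketch below. It is a stand-in to show the idea, not sqlite-vec's API: in the app the vector table does this work inside SQLite. The function names are hypothetical, and cosine similarity is an assumption, since the distance metric in use is not specified here.

```rust
/// Cosine similarity between two embeddings of equal length.
/// Returns 0.0 if either vector has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Return the `k` library tracks most similar to `query`,
/// as (track id, similarity) pairs sorted from best to worst.
pub fn nearest(query: &[f32], library: &[(u32, Vec<f32>)], k: usize) -> Vec<(u32, f32)> {
    let mut scored: Vec<(u32, f32)> = library
        .iter()
        .map(|(id, emb)| (*id, cosine_similarity(query, emb)))
        .collect();
    // Highest similarity first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

A linear scan like this is perfectly workable for a few thousand tracks. Keeping the search inside the database mainly buys convenience, since results can be joined directly against the other track metadata.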
Music Map. A 2D canvas where every track is a dot, positioned by its UMAP-projected embedding coordinates. Clicking a dot plays the track. Hovering shows metadata. It's a genuinely different way to navigate a large library.
References panel. A dedicated panel for curating reference tracks — separate from the main library — with a local audio player. Once you've assembled a reference collection, you can drag any track directly into your DAW. The drag uses the native macOS file drag protocol via tauri-plugin-drag, so it works with Ableton, Logic, Reaper — anything that accepts audio file drops from Finder.
The one honest limitation right now is Qwen2-Audio. Running the full Qwen2-Audio-7B model for deep natural-language descriptions takes several hours over a large library. The output is genuinely useful when it runs, with rich descriptions like "uptempo electronic track with heavy sidechain compression, minor key, melancholic vocal", but it's not something you kick off casually. The upside is that you only run it once; the results are cached after that.
Going Open Source
This started as a personal tool. But the more I use it, the more I think there's something here that other producers and music nerds would find useful.
The plan is to clean it up and release it on GitHub as an open-source project. Packaging a Tauri app with bundled ONNX models and an optional llama.cpp dependency isn't trivial, but it's solvable — I've already started drafting the distribution plan. The codebase is in good shape: documented architecture, a proper testing strategy, project conventions that a contributor could follow.
If you're a producer with a large library, or just someone who's frustrated that audio software doesn't treat metadata as a first-class citizen, watch this space.
What I Learned
Building this in five days with AI coding tools wasn't faster because the AI wrote code faster than I would have. It was faster because I could operate at a higher level of abstraction for longer. I spent most of those five days thinking about what the system should do, not wrestling with Rust lifetimes or TypeScript type gymnastics or the exact signature of an ONNX session's input tensors.
That's a meaningful shift. The cognitive bottleneck moved from implementation to design — which is where, honestly, it should always have been.
The two-agent setup (Claude Code + Antigravity) worked better than I expected. Having documented project conventions was the key. An agent without context is an agent you have to babysit. An agent with clear conventions and a well-structured codebase can work autonomously for long stretches, and you spend your energy reviewing and directing rather than hand-holding.
I'm still the architect. That hasn't changed. But the distance between "idea" and "working software" got a lot shorter.