Email deliverability: how Batch built the monitoring tool the market was missing

Data & Tech
9 Jun 2026 · Written by Mickael Bentz

Baptiste Guerre is a Deliverability Expert at Batch. In a matter of weeks, he built the Deliverability Cockpit from scratch: the tool that CSMs, Delivery Managers, and Solution Engineers needed to deliver best-in-class client monitoring. We asked him how.

Why this tool? What was missing?

In email, every deliverability point gained translates directly into revenue. Millions of euros on the line. Our clients trust us on this, and we wanted the best possible tooling.

The data existed. SparkPost, Microsoft SNDS, Postmaster Tools, Signal Spam: all these sources expose APIs. But each one lives in its own interface, built for specialists. A CSM managing dozens of clients in parallel can't spend their morning manually cross-referencing sources to build a diagnosis.

As for market tools, Validity Everest is the reference. It is powerful and well-designed, but built for companies managing their own deliverability. Their context is an internal team, a handful of sending domains, focused entirely on their own campaigns. At Batch, the situation is fundamentally different: we support hundreds of clients, each with their own domains, subdomains, IP pools, practices, and challenges. We also need to analyze things internally and surface some of that analysis directly to clients. No market tool is built for this specific reality of CEP (customer engagement platform) operators.

The idea was simple: build what fits our needs. Not adapt something that already exists.


How did you build it, concretely?

The first thing I did before opening Claude Code was open Figma. I sketched out what I wanted to see. Not an elaborate mockup, more like a grade school collage. A view per domain, a health score, alerts. The goal was to give the AI a rough wireframe so it wouldn't go off in every direction from the start.

Then I wrote a three-to-four-page Markdown file. Not a formal spec, more of a needs statement:

  • The name of the tool, what it does, who it's for

  • The primary audience: CSMs without deep deliverability training who need the important information presented simply

  • The data mapping: this metric comes from SparkPost, this one from Postmaster Tools, and so on

I shared this file with Claude in conversational mode first, before switching to Claude Code, to help me sharpen the prompt. Claude asked questions. "When you mention history, over how long a period?" Clarifications I wouldn't necessarily have anticipated on my own. By the end, I had a genuinely solid spec document.

And the first run?

I built a layered final prompt: the full specs, a frontend-design skill (instruction files that constrain Claude Code on style, colors, and interface rules) customized with Batch's design principles, and security constraints. I asked Claude to read everything before starting, and to flag anything unclear.

One important note on skills: I tested a lot of them early on, and I quickly realized that poorly configured skills don't just fail to help. They slow you down. In the end, I work with three well-configured ones:

  • Frontend-design to frame the interface

  • Impeccable for quality audits

  • Superpower to break large requests into agents and validate mockups before coding

Three mastered skills beat ten under-used ones.

The prompt ran for about half an hour. What came out: a functional Deliverability Cockpit on desktop and mobile, with connected data flows, ready to diagnose urgent issues. In a single session.

There must have been problems.

Several. The first: a lack of nuance in KPI interpretation. Take the minimum volume needed to qualify a bounce rate. If you've sent 2 emails and 1 bounces, displaying "50% bounce rate" is noise, not information. Some data also arrives several days late. Google Postmaster Tools data, for instance, needs to be correctly attributed in graphs to avoid triggering alerts for issues that happened in the past. Early on, all of this was hardcoded. Every change went through Claude, consumed credits, and after multiple iterations I had six different thresholds scattered across the project.

The fix: a centralized settings page with all thresholds. Everything adjustable in one place, without touching the code.
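
As an illustration only (this is not the Cockpit's actual code), the minimum-volume rule, the late-data rule, and the single settings object could be sketched in Python as follows. Every name and default value here (`Thresholds`, `min_volume_for_bounce_rate`, the 5% alert level, the 3-day age limit) is an assumption made for the example, not a figure from Batch.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    """All tunable values in one place (hypothetical names and defaults)."""
    min_volume_for_bounce_rate: int = 1000  # below this volume, no rate is shown
    bounce_rate_alert: float = 0.05         # alert at or above 5% bounces
    max_data_age_days: int = 3              # older data points never trigger alerts


def bounce_rate(sent: int, bounced: int, t: Thresholds) -> Optional[float]:
    """Return the bounce rate, or None when the volume is too low to qualify."""
    if sent <= 0 or sent < t.min_volume_for_bounce_rate:
        return None
    return bounced / sent


def should_alert(sent: int, bounced: int, data_date: date, today: date,
                 t: Thresholds) -> bool:
    """Alert only on a qualified rate measured on a recent enough date."""
    rate = bounce_rate(sent, bounced, t)
    if rate is None:
        return False  # 1 bounce out of 2 emails is noise, not information
    if (today - data_date).days > t.max_data_age_days:
        return False  # late-arriving data describes a past issue
    return rate >= t.bounce_rate_alert
```

With this shape, changing a threshold means editing one value in `Thresholds` (or the settings page that feeds it) rather than hunting for six copies across the project.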

The second problem was naming. I hadn't defined naming conventions upfront. Midway through the project, I had variables called qualified on one page and isQualified on another. Completely chaotic code. I had to stop everything for a major refactor. Now it goes in the CLAUDE.md from day one: where things live, how they're named.

A third issue: when I integrated Stitch (a mockup tool that also exports code) for mobile interfaces, the AI sometimes prioritized values hardcoded in the design over the specs. I spent a morning debugging indicators displaying static data. After three hours, Claude told me it had pulled values from the Stitch design rather than the specs. The AI hadn't understood the difference between form and content.

How did you manage code quality without being a developer?

I read what it does. Not line by line, but when it removes 40 lines and adds 2 for a minor change, I know something's off. And when I'm not sure, I ask it to explain what changed.
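
The "40 lines removed, 2 added" reflex can also be automated. The sketch below is an assumption about how one might do it, not something described in the interview: it reads the text produced by `git diff --numstat` (tab-separated added count, deleted count, and path) and flags files where deletions dwarf additions. The function names and the `min_deleted` and `ratio` defaults are hypothetical.

```python
from typing import List, Tuple


def parse_numstat(numstat: str) -> List[Tuple[int, int, str]]:
    """Parse `git diff --numstat` text into (added, deleted, path) tuples."""
    rows = []
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        added, deleted, path = parts
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of line counts
        rows.append((int(added), int(deleted), path))
    return rows


def suspicious_changes(numstat: str, min_deleted: int = 20,
                       ratio: float = 5.0) -> List[str]:
    """Return paths where far more lines were deleted than added."""
    flagged = []
    for added, deleted, path in parse_numstat(numstat):
        if deleted >= min_deleted and deleted >= ratio * max(added, 1):
            flagged.append(path)
    return flagged
```

A flagged file is not necessarily broken. It is a prompt to ask the AI to explain what changed before accepting the edit.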

Two reflexes I apply consistently: I ask it three times in a row whether it finds any bugs. First time: "All good, no bugs." Second time: "Are you sure?" Still nothing. Third time: it finds three or four. You have to push.

And before every major feature, I run /security-audit and /code-review in Claude Code. In practice, the risk is low on an internal tool that uses no personal data, sits behind SSO, and is built on strict technical and security constraints. Even so, we hold everything to a high standard. No shortcuts.

What's the CLAUDE.md you keep mentioning?

It's Claude's reference point for the project between sessions. Architecture, design rules, security principles, where things live in the codebase, naming conventions. Without it, every new session starts from scratch and the AI reinvents things it may have already built. With it, it knows which direction to head from the first prompt.

A structured CLAUDE.md helps, but it only works alongside solid context hygiene. The AI has limited working memory. The longer a conversation runs, the more it loses track of what was said early on, without flagging this. It keeps responding with confidence, reconstructing what it's forgotten from context. That's "context rot": not silence, but silent invention. The result: incomplete work, rework, and wasted tokens.

To reduce it:

  • Don't let the context window get saturated. In Claude Code, /compact compresses session memory to keep only what's relevant going forward (e.g. /compact keep only details related to bounce rates)

  • Work in short sessions, each focused on one functional block. In Claude Code, /resume picks up an existing session, and /rename gives it a name so you can find it later

What does the tool actually enable for teams today?

A CSM monitoring a client in a deliverability support engagement opens the Deliverability Cockpit and immediately sees the health status of all their subdomains: a score out of 100, active issues confirmed in the last 48 hours, and AI-generated recommendations when the score alone doesn't tell the full story. They can see performance by ISP (Gmail, Outlook, Yahoo, Orange, etc.) and receive real-time Slack alerts if anything goes wrong across more than 15 control points: abnormal volumes, bounces, broken authentication, spam spikes, and more.
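
To make the "score out of 100" and "confirmed in the last 48 hours" ideas concrete, here is a minimal sketch. The scoring rule (start at 100 and subtract a penalty per active issue) is purely an assumption for illustration, since the interview does not describe how the Cockpit computes its score. `Issue`, `penalty`, and the control point labels are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Issue:
    control_point: str      # illustrative label, e.g. "bounces"
    penalty: int            # hypothetical points removed from the score
    confirmed_at: datetime  # when the issue was last confirmed


def active_issues(issues: List[Issue], now: datetime,
                  window_hours: int = 48) -> List[Issue]:
    """Keep only issues confirmed within the last `window_hours`."""
    cutoff = now - timedelta(hours=window_hours)
    return [i for i in issues if cutoff <= i.confirmed_at <= now]


def health_score(issues: List[Issue], now: datetime) -> int:
    """Assumed rule: 100 minus the penalties of active issues, floored at 0."""
    total = sum(i.penalty for i in active_issues(issues, now))
    return max(0, 100 - total)
```

In this sketch an issue last confirmed three days ago no longer weighs on the score, which keeps the view focused on what is wrong now rather than on history.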

There's also a template audit module: before sending a critical campaign, you can score an email out of 100. HTML analysis, coherence between content and design, mobile rendering, accessibility, compliance, plus an AI layer that prioritizes the fixes to make.

What used to take an hour of manual collection and analysis now takes 5 minutes. And it doesn't require deep deliverability expertise to interpret. That's what lets the entire CSM team operate at the same level, not just the most technical profiles.

At Batch, deliverability goes beyond the internal tool, right?

From the start, we wanted to differentiate on this. To be better than other CEPs on the market: not just on features, but on the level of monitoring, the speed of response, the precision of diagnosis. This isn't positioning talk. The results are measurable. Clients gain performance when they move to Batch.

The Deliverability Cockpit is one of the foundations of that support. It encodes the team's expertise and makes it accessible to every profile.

The question of integrating it directly into the Batch CEP is an active discussion. Parts of the Deliverability Cockpit could eventually be exposed directly in the platform, on the client side.

Any advice for someone who wants to do the same?

Start by drawing the result before touching any code. The AI produces what you show it. Without a wireframe, even a rough one, it invents.

Write the specs before the prompt. Not for the AI, for yourself. It forces you to clarify what you actually want to build.

Work in short sessions, block by block. After 3 hours of vibe coding, you stop understanding what's going on, and it's nothing but bugs. The AI degrades. So does the developer.

Avoid role-playing (e.g. "You are a deliverability expert, analyze the results..."). According to Zheng et al. (2024), role-playing prompts are a poor fit for objective tasks with LLMs: they don't improve performance, can introduce bias, and their effect is unpredictable. A neutral, direct approach works better.

Accept that there's a real learning curve. Over time, you start to know your Claude: to anticipate what it will do well and where it will stumble. It's not a tool you master on the first try. But once you have your reflexes, you move much faster.

Mickael Bentz

Head of Product Management @ Batch
