Why I ran this test

Email subject lines are among the most A/B-tested elements in marketing. Everyone has an opinion. Very few people share actual data. I wanted to find out whether AI-generated subject lines would outperform the ones I was writing myself, not in theory but across a real audience over a real month.

The question I was testing: if I hand off subject line writing entirely to AI, do open rates go up, down, or sideways?

📋 Experiment Parameters

Duration: 30 days
Campaigns: 8 broadcast sends, each split 50/50 between AI-written and human-written subject lines
Total sends: ~240 (audience of ~30 per variant per send)
Tool: Claude
List: BuzzRiding newsletter subscribers
Platform: Beehiiv

The setup

Each week I wrote four email subject lines myself. Then I gave Claude the same email content and asked it to write four alternatives — same message, different phrasing. I kept the email body identical across both variants to isolate the variable.

I didn't tell Claude to "be creative" or "write 10 options". I gave it a tight brief: the email topic, the audience (marketing professionals, 27–42, growth-oriented), and the goal (drive opens). One prompt, one subject line, no cherry-picking from a list.

No emoji were used in either variant. No clickbait. Both variants followed the same brand voice: direct, data-informed, practitioner-level.

The results

A/B tests run: 8
Rounds AI won: 5/8
Avg open rate lift (AI wins): +6.4pp
Biggest human win margin: +11.2pp

AI won 5 out of 8 rounds. That surprised me. What surprised me more was how it won and how it lost.

When AI won, it won by moderate margins — typically 4 to 8 percentage points on open rate. When the human subject line won, it won by larger margins. The biggest human win was 11.2 percentage points. The AI never produced a win that large.

That pattern matters. AI is more consistent, but less capable of the high-variance swing that a genuinely insightful subject line can produce.

What AI did well

Specificity under pressure. When I was tired or rushed, my subject lines got generic. "This week in AI marketing" is a subject line I'm not proud of. Claude never got lazy. It reliably produced tight, specific phrasing even when the brief was brief.

Avoiding hedging language. Humans write subject lines with unconscious softeners: "some thoughts on", "a quick note about", "have you considered". Claude didn't hedge. Its subject lines led with the point.

Format variety. Across 8 rounds, Claude used different structural approaches — question format, numbered format, plain declarative statement, contrast structure ("X is dead. Here's what works instead."). My subject lines were more stylistically consistent, which is another word for repetitive.

What AI did badly

Insider references fell flat. The two rounds where AI lost badly involved emails with a practitioner-specific hook — a specific tool update, a community conversation I'd seen. Claude wrote technically accurate subject lines but missed the cultural shorthand. "Marketers are ditching Jasper" is a different subject line from "Why the Jasper conversation changed this week". The first is news. The second is inside knowledge.

It doesn't know your list. The best-performing human subject line referenced a phrase from the previous week's email — a callback that only worked because I knew what I'd sent before. Claude has no memory of your relationship with your audience. That's a real ceiling.

Tone can drift slightly corporate. Claude's default register is slightly more formal than the BuzzRiding voice. Small adjustments to the prompt fixed this, but it required active management.

The templates that won

Three structural patterns outperformed everything else — across both AI and human-written lines:

Interestingly, question-format subject lines underperformed across the board — AI and human both. Questions require the reader to care about the answer. Statements assume they already should.

What I'd do differently

Run the experiment for 90 days, not 30. Email open rate data is noisy with small audience sizes. Eight tests isn't enough to draw strong conclusions — it's enough to identify patterns worth investigating further.
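
One way to see why eight rounds is thin is a simple sign test: if AI and human subject lines were equally good, each round would be a coin flip. The sketch below computes how often a result at least as lopsided as 5 of 8 would turn up by chance. The function name is hypothetical, and treating rounds as independent fair coin flips is a simplifying assumption, not something measured in the experiment.

```python
from math import comb


def prob_at_least(wins: int, rounds: int, p: float = 0.5) -> float:
    """Chance of `wins` or more successes in `rounds` independent trials,
    each with success probability `p` (0.5 = a fair coin flip)."""
    return sum(
        comb(rounds, k) * p**k * (1 - p) ** (rounds - k)
        for k in range(wins, rounds + 1)
    )


# Assumption: with no real difference, AI wins any given round half the time.
p_value = prob_at_least(5, 8)
print(f"P(AI wins >= 5 of 8 by chance) = {p_value:.3f}")  # 0.363
```

A result that shows up by luck more than a third of the time is a pattern worth chasing, not a conclusion. For comparison, the same function puts 15 wins out of 20 at about 0.021.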

I'd also build a prompt that includes the previous three subject lines. Context about what you've already sent helps Claude avoid accidental repetition and opens the door to callback structures.
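
A minimal sketch of what that prompt could look like, in Python. Everything here is illustrative: `SubjectLineBrief`, `build_prompt`, the prompt wording, and the sample topic are hypothetical, and the snippet only assembles text. It does not call any AI API.

```python
from dataclasses import dataclass, field


@dataclass
class SubjectLineBrief:
    """Hypothetical container for the brief: topic, audience, goal, history."""
    topic: str
    audience: str
    goal: str
    previous_subject_lines: list[str] = field(default_factory=list)


def build_prompt(brief: SubjectLineBrief, history_size: int = 3) -> str:
    """Assemble a single-subject-line prompt that includes recent sends."""
    # Keep only the most recent `history_size` subject lines, oldest first.
    recent = brief.previous_subject_lines[-history_size:]
    history = "\n".join(f"- {line}" for line in recent) or "- (none yet)"
    return (
        "Write one email subject line.\n"
        f"Topic: {brief.topic}\n"
        f"Audience: {brief.audience}\n"
        f"Goal: {brief.goal}\n"
        "Recently sent subject lines (do not repeat them; a callback is welcome):\n"
        f"{history}"
    )


brief = SubjectLineBrief(
    topic="Placeholder topic for this week's email",  # hypothetical
    audience="marketing professionals, 27-42, growth-oriented",
    goal="drive opens",
    previous_subject_lines=[
        "Placeholder subject line A",  # hypothetical history
        "Placeholder subject line B",
        "Placeholder subject line C",
        "This week in AI marketing",
    ],
)
print(build_prompt(brief))
```

Passing the history explicitly is the point of the design: the model has no memory of earlier sends, so whatever it should know about them has to travel in the prompt.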

The third thing: test on a warm list of 500+, not 30. With an audience that small, random variance distorts the data significantly. The experiment was worth running, but the results would be more defensible at scale.
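
To put a rough number on that noise, the sketch below estimates the 95% margin of error on the open-rate gap between two variants that are, in truth, equally good. It rests on assumptions that are not from the experiment: a 40% baseline open rate, 30 and 500 treated as per-variant audience sizes, and a normal approximation that is itself shaky at n=30. The function name is hypothetical.

```python
from math import sqrt

Z_95 = 1.96  # z-score for a two-sided 95% interval


def lift_margin_pp(open_rate: float, n_per_variant: int) -> float:
    """Approximate 95% margin of error, in percentage points, on the
    difference in open rate between two variants of equal size that
    share the same underlying open rate."""
    standard_error = sqrt(2 * open_rate * (1 - open_rate) / n_per_variant)
    return 100 * Z_95 * standard_error


ASSUMED_OPEN_RATE = 0.40  # hypothetical baseline, not measured in this test

for n in (30, 500):
    margin = lift_margin_pp(ASSUMED_OPEN_RATE, n)
    print(f"n={n} per variant: +/-{margin:.1f}pp")
# n=30 per variant: +/-24.8pp
# n=500 per variant: +/-6.1pp
```

Under those assumptions, a single round at 30 per variant cannot tell a 6.4pp lift from luck, while at 500 per variant a gap of that size starts to clear the noise.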

Should you use AI for subject lines?

Yes, as a first draft tool and a hedge against your own lazy days. No, as a total replacement for someone who knows their audience and can write an insider hook.

The honest framing: AI subject lines are reliably above average. Human subject lines have a higher ceiling and a lower floor. Which matters more depends on your list size and your tolerance for variance.

For a solo marketer running a small list, AI is probably net positive. For a team managing a 100k-subscriber list, where a 2-point open rate swing means thousands of opens, the human ceiling matters more.

Frequently Asked Questions

What AI tool is best for email subject lines?
Claude performs best for subject lines that need to match a specific brand voice, because it responds well to detailed prompts about tone and audience. ChatGPT produces more variety but requires more editing. For pure volume testing, either works; the key is giving the AI your audience profile and brand voice in the prompt, not just the email topic.

Does AI-generated email content hurt deliverability?
No. Email deliverability is determined by sender reputation, list hygiene, authentication records (SPF/DKIM/DMARC), and engagement history, not the method used to write the subject line. An AI-written subject line is indistinguishable from a human-written one to any deliverability filter.

How do I prompt AI to write better subject lines?
Include four things in your prompt: the email topic in one sentence, your audience profile (role, experience level, primary concern), the goal of the email (open to read, click a link, etc.), and 2–3 examples of subject lines that have worked well for your list in the past. That context is what separates a generic output from something that actually sounds like you.

What open rate improvement can I expect from AI subject lines?
Based on this experiment: modest and inconsistent. AI won 5 of 8 rounds by an average of 6.4 percentage points, but the sample size is too small to treat as definitive. Industry data suggests AI-assisted subject lines improve open rates by 5–15% on average, but results vary significantly by industry, list quality, and how well the AI is prompted.

Is it worth A/B testing subject lines at a small list size?
It depends. With fewer than 200 subscribers, variance is high enough that individual results are unreliable. The value of small-list A/B testing is pattern recognition over time, not single-round conclusions. Run 20+ tests before drawing firm conclusions, and look for structural patterns (question vs. statement, specific vs. vague) rather than obsessing over individual winners.