Why I ran this test
Email subject lines are among the most A/B-tested elements in email marketing. Everyone has an opinion. Very few people share actual data. I wanted to find out whether AI-generated subject lines would outperform the ones I was writing myself, not in theory but across a real audience over a real month.
The question I was testing: if I hand off subject line writing entirely to AI, do open rates go up, down, or sideways?
📋 Experiment Parameters
- Duration: 30 days
- Campaigns: 8 broadcast sends, each split 50/50 between AI-written and human-written subject lines
- Total sends: ~240 (audience of ~30 per send, so roughly 15 per variant)
- Tool: Claude
- List: BuzzRiding newsletter subscribers
- Platform: Beehiiv
The setup
Each week I wrote four email subject lines myself. Then I gave Claude the same email content and asked it to write four alternatives — same message, different phrasing. I kept the email body identical across both variants to isolate the variable.
I didn't tell Claude to "be creative" or "write 10 options". I gave it a tight brief: the email topic, the audience (marketing professionals, 27–42, growth-oriented), and the goal (drive opens). One prompt, one subject line, no cherry-picking from a list.
No emoji were used in either variant. No clickbait. Both variants followed the same brand voice: direct, data-informed, practitioner-level.
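The brief and constraints above can be pinned down as a small prompt builder. This is a minimal sketch, not the exact prompt used in the experiment: the class name, function name, and wording of the instructions are assumptions for illustration, and the topic is a placeholder. It only assembles the text; sending it to Claude is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubjectLineBrief:
    """The three inputs of the tight brief: topic, audience, goal."""
    topic: str
    audience: str
    goal: str


def build_prompt(brief: SubjectLineBrief) -> str:
    """Render a single-shot prompt that asks for exactly one subject line."""
    return (
        "Write one email subject line.\n"
        f"Topic: {brief.topic}\n"
        f"Audience: {brief.audience}\n"
        f"Goal: {brief.goal}\n"
        # Constraints mirror the setup: one line, no list to pick from, no emoji.
        "Return only the subject line, with no alternatives and no emoji."
    )


brief = SubjectLineBrief(
    topic="<this week's email topic>",  # placeholder, not a real campaign
    audience="marketing professionals, 27-42, growth-oriented",
    goal="drive opens",
)
prompt = build_prompt(brief)
print(prompt)
```

Asking for exactly one line is what rules out cherry-picking: there is no list of options to choose the best from after the fact.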
The results
AI won 5 out of 8 rounds. That surprised me. What surprised me more was how it won and how it lost.
When AI won, it won by moderate margins — typically 4 to 8 percentage points on open rate. When the human subject line won, it won by larger margins. The biggest human win was 11.2 percentage points. The AI never produced a win that large.
That pattern matters. AI is more consistent, but less capable of the high-variance swing that a genuinely insightful subject line can produce.
What AI did well
Specificity under pressure. When I was tired or rushed, my subject lines got generic. "This week in AI marketing" is a subject line I'm not proud of. Claude never got lazy. It reliably produced tight, specific phrasing even when the brief was brief.
Avoiding hedging language. Humans write subject lines with unconscious softeners: "some thoughts on", "a quick note about", "have you considered". Claude didn't hedge. Its subject lines led with the point.
Format variety. Across 8 rounds, Claude used different structural approaches — question format, numbered format, plain declarative statement, contrast structure ("X is dead. Here's what works instead."). My subject lines were more stylistically consistent, which is another word for repetitive.
What AI did badly
Insider references fell flat. The two rounds where AI lost badly involved emails with a practitioner-specific hook — a specific tool update, a community conversation I'd seen. Claude wrote technically accurate subject lines but missed the cultural shorthand. "Marketers are ditching Jasper" is a different subject line from "Why the Jasper conversation changed this week". The first is news. The second is inside knowledge.
It doesn't know your list. The best-performing human subject line referenced a phrase from the previous week's email — a callback that only worked because I knew what I'd sent before. Claude has no memory of your relationship with your audience. That's a real ceiling.
Tone can drift corporate. Claude's default register is slightly more formal than the BuzzRiding voice. Small adjustments to the prompt fixed this, but it needed active management.
The templates that won
Three structural patterns outperformed everything else — across both AI and human-written lines:
- The specific number: "3 AI tools that outperformed Jasper this month" consistently beat vague alternatives
- The honest caveat: "AI email campaigns work — but not the way you'd expect" beat pure positive framing
- The practitioner contrast: "What agencies found vs. what the platform claims", or any structure that implies insider data
Interestingly, question-format subject lines underperformed across the board — AI and human both. Questions require the reader to care about the answer. Statements assume they already should.
What I'd do differently
Run the experiment for 90 days, not 30. Open rate data is noisy at small audience sizes. Eight tests aren't enough to draw strong conclusions, only to identify patterns worth investigating further.
I'd also build a prompt that includes the previous three subject lines. Context about what you've already sent helps Claude avoid accidental repetition and opens the door to callback structures.
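One way to carry that context forward is a rolling window of the last three subject lines, folded into the prompt. This is a rough sketch under assumptions: the function names, the instruction wording, and the "subject A" to "subject D" entries are hypothetical stand-ins, not real sends.

```python
from collections import deque

HISTORY_SIZE = 3  # "the previous three subject lines"


def make_history() -> deque:
    """A window that silently drops the oldest subject line once full."""
    return deque(maxlen=HISTORY_SIZE)


def prompt_with_history(topic: str, history) -> str:
    """Build a one-line request that also lists recently sent subject lines."""
    lines = ["Write one email subject line.", f"Topic: {topic}"]
    if history:
        lines.append("Recently sent subject lines (oldest first):")
        lines.extend(f"- {sent}" for sent in history)
        lines.append(
            "Avoid repeating their phrasing. "
            "A callback to one of them is welcome if it fits."
        )
    return "\n".join(lines)


history = make_history()
for sent in ["subject A", "subject B", "subject C", "subject D"]:
    history.append(sent)  # after the fourth append, "subject A" is gone

prompt = prompt_with_history("<this week's email topic>", history)
print(prompt)
```

The fixed-size window keeps the prompt short while still giving the model something to call back to.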
The third thing: test on a warm list of 500+, not 30. On a list this small, a single extra open moves a variant's open rate by several percentage points, which distorts the comparison. The experiment was worth running, but the results would be more defensible at scale.
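To put a rough number on that noise, here is a back-of-the-envelope sketch of the margin of error on the gap between two variants. It assumes a 40% baseline open rate (an illustrative figure, not one measured in this experiment), a 50/50 split, and the normal approximation for a difference of two proportions, which is itself crude at very small sample sizes.

```python
import math


def open_rate_margin(open_rate: float, per_variant: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error, in percentage points, for the
    difference in open rate between two equally sized variants."""
    # Standard error of a difference between two independent proportions,
    # assuming both variants share the same underlying open rate.
    se = math.sqrt(2 * open_rate * (1 - open_rate) / per_variant)
    return 100 * z * se


ASSUMED_OPEN_RATE = 0.40  # illustrative assumption, not measured

# A list of 30 split 50/50 gives 15 per variant; a list of 500 gives 250.
for per_variant in (15, 250):
    margin = open_rate_margin(ASSUMED_OPEN_RATE, per_variant)
    print(f"{per_variant} per variant: +/- {margin:.1f} points")
# prints:
# 15 per variant: +/- 35.1 points
# 250 per variant: +/- 8.6 points
```

Under those assumptions, a single send at 15 per variant needs a gap of roughly 35 points to clear the noise, and even at 250 per variant the band is still close to 9 points, which is why pooling many sends matters as much as list size.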
Should you use AI for subject lines?
Yes, as a first draft tool and a hedge against your own lazy days. No, as a total replacement for someone who knows their audience and can write an insider hook.
The honest framing: AI subject lines are reliably above average. Human subject lines have a higher ceiling and a lower floor. Which matters more depends on your list size and your tolerance for variance.
For a solo marketer running a small list: AI is probably net positive. For a team managing a 100k-subscriber list, where a 2-point open rate swing means roughly 2,000 opens per send, the human ceiling matters more.