April 4, 2026

Email Marketing Optimization: A/B Testing Every Stage of Your Email Funnel

email · a/b testing · optimization · marketing

Introduction: Email’s ROI Problem Is Not What You Think

Email marketing consistently returns $36–$42 for every dollar spent — a figure that other channels rarely match. Yet the average marketing team invests far more energy in optimising its paid acquisition funnel than it does in the email channel that quietly generates a third of its revenue. The reason is not neglect. It is a mistaken belief that email is already optimised because it has been running for years.

Running the same welcome sequence and monthly newsletter for three years is not optimisation — it is inertia. The teams that extract compounding value from email are the ones that treat every element of the channel as a variable to be tested: subject lines, preview text, send time, content length, CTA copy, sequence timing, and re-engagement strategy. Each test yields a small lift, and each lift applies to every email you send from that point forward.

This guide covers the full breadth of email experimentation: what to test, how to design controlled experiments, which metrics to trust, and how to connect email click events to downstream product conversions using ExperimentFlow’s tracking API.

What to Test in Email

Email is a multi-variable system. Every message you send has dozens of elements that could be different. The discipline of email optimisation starts with understanding which variables are worth testing and in what order.

High-Impact Variables

  • Subject line — the primary determinant of open rate. Small changes here have immediate, measurable effects.
  • Preview text — the secondary headline displayed after the subject in most inbox clients. Often ignored, reliably impactful.
  • From name — whether to use a personal name, a brand name, or a hybrid (“Alex from Experiment Flow”).
  • Send time and day of week — when the email lands in the inbox relative to the recipient’s working day.
  • Content length and structure — long-form versus short-form, image-heavy versus text-only.
  • Call-to-action copy and placement — the words on the button and how many CTAs appear in a single email.
  • Template design — branded HTML versus plain-text style, single-column versus multi-column layout.

Lower-Impact Variables (Test Second)

  • Personalisation beyond the first name token
  • Sender email address domain
  • Footer content and unsubscribe language
  • Image alt text

Start with subject lines and send time because they affect the top of the email funnel — whether the email is opened at all. Every other optimisation is moot if your open rate is low.

Subject Line Experiments

The subject line is the single highest-leverage element in email marketing. A 5-percentage-point improvement in open rate on a list of 50,000 subscribers means 2,500 additional opens per send — for zero incremental cost. Over a year of weekly sends, that adds up to 130,000 additional opens from a single one-time test.

Dimensions to Test

  • Personalisation tokens. “Your Q1 report is ready” vs “{first_name}, your Q1 report is ready.” First-name personalisation often lifts open rate by 2–10%, but the effect varies by audience familiarity and brand voice.
  • Curiosity gaps. “We made a mistake” or “Don’t open this email” exploit pattern interruption. They can generate strong lifts but deplete audience trust quickly if overused. Test frequency, not just the format.
  • Length. Most inbox clients render 40–60 characters of a subject line before truncating on mobile. Test short subjects (under 30 characters) against long ones (50–70 characters) to find the sweet spot for your audience’s device mix.
  • Emoji in subject lines. A single relevant emoji at the beginning or end of a subject line can improve visibility in a crowded inbox — but only for audiences that expect this from your brand. Test emoji variants against text-only subjects; measure click-through rate, not just open rate, since emoji can attract opens from users who then disengage immediately.
  • Urgency and scarcity. “Offer ends Friday” vs no deadline framing. Urgency reliably lifts open rate but may depress long-term engagement if every email feels high pressure. Test urgency in the context of your send cadence.
  • Question vs statement. “Are you making these testing mistakes?” vs “Five common testing mistakes.” Questions can outperform statements by 10–20% for educational content.

Do not test subject lines against each other in a single send unless your list is large enough to reach statistical significance within that send. For lists under 10,000 subscribers, run sequential A/B tests across separate sends rather than a within-send split, accepting that time-of-send confounds the result slightly.

Structuring a Subject Line Test

Choose one dimension per test. If you change the length and add a personalisation token simultaneously, you cannot know which change drove the result. Send control and variant to equally sized random segments at the same time. Wait for the open-rate signal to stabilise — typically 24–48 hours after send — before declaring a winner.
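
To make the random split reproducible, assign each subscriber deterministically instead of flipping a coin at send time. A minimal sketch in Node.js, assuming subscriber IDs are stable strings (the test name and IDs below are illustrative):

// Deterministic variant assignment: hashing the subscriber ID together with
// the test name means the same subscriber always lands in the same bucket,
// and different tests split the list independently of each other.
const crypto = require('crypto');

function assignVariant(subscriberId, testName, variants = ['control', 'variant']) {
  const hash = crypto.createHash('sha256').update(`${testName}:${subscriberId}`).digest();
  // Interpret the first four bytes as an unsigned integer, then bucket
  return variants[hash.readUInt32BE(0) % variants.length];
}

// Example: choose the subject line for one subscriber before the send
const subject = assignVariant('sub_8842', 'q1-report-personalisation') === 'control'
  ? 'Your Q1 report is ready'
  : '{first_name}, your Q1 report is ready';

The same helper covers multi-arm assignments, such as splitting inactive subscribers across 90-, 120-, and 180-day sunset windows, by passing a longer variants array.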

Preview Text Experiments

Preview text is the second line of copy visible in most inbox clients before the email is opened. It appears immediately after the subject line in Gmail, Apple Mail, and Outlook. Many teams leave it empty, which causes the inbox client to pull the first text from the email body — often a navigation link or legal disclaimer.

What to Test

  • Continuity vs contrast. Does preview text that completes the subject line thought (“Subject: Your results are in — Preview: Here’s what changed this week”) outperform preview text that adds a new hook (“Preview: Plus the one metric most teams get wrong”)?
  • Call to action in preview text. “Click to see your personalised report” in the preview text can prime the open and improve the click rate of recipients who do open.
  • Social proof. “Join 12,000 marketers already using this approach” can increase open rate for cold or semi-engaged segments.
  • Length. Preview text is rendered differently across clients. Test short (under 50 characters) versus full-length (90–140 characters) to see which performs better for your audience’s dominant email client.

Preview text is easy to neglect precisely because it requires a small extra step to configure. That neglect is your competitive advantage: optimising preview text while competitors leave it blank gives you a free lift on every send.

Send Time and Day-of-Week Experiments

The conventional wisdom about email send times — “Tuesday and Thursday mornings” — is derived from industry averages. Industry averages are not your audience. Your subscribers have their own routines, device preferences, and inbox habits.

Controlling for Timezone

A B2B audience spread across North America will behave very differently from a consumer audience concentrated in Western Europe. Send time experiments must account for recipient timezone. Most ESPs allow you to send at a local time per recipient (e.g., 9 a.m. in the recipient’s timezone). Test this feature against a fixed UTC send time to see whether timezone-aware delivery improves your metrics.
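
If your ESP does not support per-recipient local send times, you can approximate them in your own scheduler. A rough sketch, assuming each subscriber record stores an IANA timezone string; it ignores DST transitions near the send time, so production schedulers should use a tested library such as Luxon or date-fns-tz:

// Compute the UTC timestamp at which a given hour occurs on the recipient's
// wall clock, relative to the server's current date.
function localHourToUtc(timeZone, hour, now = new Date()) {
  // Render "now" in the recipient's zone, parse it back in server-local
  // time, and treat the difference as the zone offset
  const asLocal = new Date(now.toLocaleString('en-US', { timeZone }));
  const offsetMs = now.getTime() - asLocal.getTime();
  // Build the target wall-clock time in server-local terms, then shift it
  const target = new Date(now);
  target.setHours(hour, 0, 0, 0);
  return new Date(target.getTime() + offsetMs);
}

// 9 a.m. local send times for subscribers in two different zones
console.log(localHourToUtc('Europe/Berlin', 9).toISOString());
console.log(localHourToUtc('America/Los_Angeles', 9).toISOString());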

What to Test

  • Day of week. Run a series of sends over six weeks, rotating through weekdays. Measure open rate and click rate per day. Hold content constant across the test period to isolate the time variable.
  • Time of day. Test morning (6–9 a.m.), mid-morning (9–11 a.m.), lunch (12–1 p.m.), and afternoon (3–5 p.m.) slots. Note that “best time” may differ for opens versus clicks versus downstream conversions.
  • Weekend sends. Consumer lists sometimes perform better on weekends, particularly for leisure, retail, and hobby content. Test explicitly rather than assuming weekends are dead.

Send time is a confound in every other email test you run. If you are testing subject lines, always send both variants at exactly the same time. If you are running a send-time test, hold all other variables constant.

Segmentation as an Experiment

Segmentation is usually presented as a deliverability and relevance strategy. It is also an experiment design strategy. The question “does a more targeted list outperform a broad send?” has a measurable answer, and that answer determines how much effort to invest in building and maintaining segments.

How to Test Segmentation

Choose a metric that matters — click rate, downstream conversion, revenue per send — and compare it across a segmented send and a broad send of equivalent content. Hold the content and send time constant. The broad send is the control; the segmented send is the variant. If the segmented send generates a higher per-recipient revenue figure, the investment in segmentation is validated.
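
As a worked illustration, suppose the broad send reaches 50,000 subscribers and the segmented send reaches 12,000; every number below is invented for the example:

// Revenue per delivered email for a broad send vs a segmented send
const broad = { revenue: 4120, delivered: 50000 };     // → $0.082 per recipient
const segmented = { revenue: 1870, delivered: 12000 }; // → $0.156 per recipient

const lift = (segmented.revenue / segmented.delivered) /
             (broad.revenue / broad.delivered) - 1;
console.log(`Relative lift: ${(lift * 100).toFixed(0)}%`); // ≈ 89%

Per-recipient revenue tells you whether segmentation earns its keep per email sent; because the segmented send reaches fewer people, the full economics also depend on what you send to the rest of the list.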

Useful Segmentation Dimensions to Test

  • Engagement recency. Does a send to subscribers who opened at least once in the last 60 days outperform a send to the full list? This tests whether suppressing inactive subscribers improves overall campaign economics.
  • Behavioural segments. Users who visited a pricing page but did not convert. Users who completed onboarding but have not used a specific feature. Each is a testable segment with a specific message hypothesis.
  • Purchase history. Repeat buyers vs first-time buyers. Test whether different messaging for each segment improves revenue per send.

Content Structure Experiments

Once your email is opened, content structure determines whether the reader clicks. This is where click-through rate becomes the primary metric.

Long-Form vs Short-Form

Short emails (one to three paragraphs, one CTA) typically win for transactional and nurture sequences where the goal is a single action. Long-form emails (800+ words, multiple sections) can outperform for newsletter formats where the email itself is the product. Test the format hypothesis for each email type in your programme separately — do not generalise a long vs short test from your newsletter to your onboarding sequence.

Image-Heavy vs Text-Only

Image-heavy HTML templates can improve brand perception and open rates in visual categories (retail, design, food). They often underperform plain-text or lightly styled emails in B2B contexts, where heavy formatting reads as marketing collateral rather than communication. Test both templates against the same list and measure click rate, not just open rate.

Single CTA vs Multiple CTAs

The conventional optimisation wisdom says one CTA per email to avoid choice paralysis. This holds for transactional and promotional emails. Newsletter formats often benefit from multiple CTAs because readers have different interests. Test the number and placement of CTAs explicitly rather than applying a blanket rule.

CTA Copy

Button copy is one of the easiest tests to implement and one of the most reliably impactful. “Get started” vs “Start your free trial” vs “See your results” can produce 10–30% differences in click-through rate. Test verb choice (get, start, see, try, explore), specificity (generic vs personalised to the email content), and first-person framing (“Start my free trial”) against second-person (“Start your free trial”).

Onboarding Sequence Experiments

The onboarding email sequence — the series of emails triggered after signup — is the highest-value programme in most SaaS email stacks. Improving onboarding email performance directly improves activation rate, which directly improves retention and revenue. Yet most teams set up an onboarding sequence once and never touch it again.

Welcome Email Timing

The welcome email is typically sent immediately after signup. Test a deliberate delay: does sending the welcome email 15 minutes after signup (allowing the user to explore the product first) produce better click rates than an immediate send? Does sending a second welcome email 24 hours after signup to users who have not logged back in improve day-2 retention?

Step Count

A three-email onboarding sequence versus a six-email sequence is a clean experiment. Measure completion rate of each step, day-7 retention, and trial-to-paid conversion. More emails is not always better; for some audiences, a shorter, more direct sequence outperforms a comprehensive one.

Content Type at Each Step

For each step in the sequence, test the content type. Step 2 options might include:

  • Feature spotlight (what the product can do)
  • Social proof (customer story or testimonial)
  • Tutorial (how to complete the next action)
  • Plain-text personal note from a founder or success manager

The winning content type at step 2 may differ from the winning type at step 4. Test each step independently rather than redesigning the entire sequence at once.

Re-engagement Campaign Experiments

Inactive subscribers reduce deliverability metrics and distort engagement data. Re-engagement campaigns serve two purposes: winning back genuinely disengaged subscribers, and cleanly removing those who will never re-engage. Both outcomes improve the economics of your email programme.

Win-Back Subject Lines

Re-engagement emails benefit from high-contrast subject lines because they are, by definition, being sent to people who have stopped responding to normal subject lines. Test curiosity-gap approaches (“We miss you” vs “Is this goodbye?”) against direct value-led approaches (“Here’s what you missed in the last 90 days”). Measure both open rate and downstream re-engagement (did the subscriber open subsequent normal sends?).

Incentive vs No Incentive

Including a discount, free extension, or exclusive content in a re-engagement email often improves immediate response rate but may attract subscribers who only re-engage for the incentive and disengage again afterwards. Test incentive vs no-incentive against long-term re-engagement rate (opens over the following 60 days), not just the immediate click on the re-engagement email.

Sunset Policies

A sunset policy defines when to stop sending to a subscriber who does not re-engage. Common thresholds are 90, 120, and 180 days of inactivity. Test different sunset thresholds by randomly assigning inactive subscribers to different windows. Measure list health metrics (deliverability rate, spam complaint rate) and revenue per subscriber over six months. A tighter sunset policy often improves deliverability enough to more than offset the loss of marginally active subscribers.

Measuring Email Experiments Correctly

Email metrics are noisy and, since 2021, increasingly unreliable at the top of the funnel. Measuring your experiments correctly is as important as designing them correctly.

The Apple MPP Problem

Apple’s Mail Privacy Protection (MPP), launched in iOS 15, pre-fetches email content including tracking pixels regardless of whether the recipient actually opens the email. This inflates open rates for lists with a significant proportion of Apple Mail users — sometimes by 30–50 percentage points. If your list has substantial Apple Mail usage, open rate is no longer a reliable primary metric for A/B tests.

Use click rate as your primary A/B test metric for any experiment where open rate may be corrupted by MPP. Click rate measures a genuine human action that MPP cannot simulate.

The Right Metrics for Each Test

  • Subject line tests: Use click-to-open rate (CTOR) rather than raw open rate to control for MPP inflation. CTOR measures clicks as a percentage of opens, capturing the quality of the open rather than the quantity.
  • Content and CTA tests: Click rate (clicks divided by delivered emails) is the cleanest metric, as it is unaffected by MPP and directly measures the action you care about.
  • Send time tests: Use click rate within a defined window (e.g., within 48 hours of send) to control for the long-tail engagement that some sends accumulate over days or weeks.
  • Onboarding and re-engagement tests: Use downstream conversion (trial started, feature activated, subscription renewed) as the primary metric. Email engagement is a proxy; the action you actually care about is the downstream event.
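
A minimal sketch of these definitions computed from one send’s raw counts (the field names and figures are illustrative; map them to whatever your ESP exports):

// Compute the metrics above from raw counts for a single send
function emailMetrics({ delivered, opens, clicks, clicksWithin48h }) {
  return {
    openRate: opens / delivered,                    // inflated by MPP pre-fetches
    clickRate: clicks / delivered,                  // unaffected by MPP
    ctor: clicks / opens,                           // click-to-open rate
    windowedClickRate: clicksWithin48h / delivered  // for send-time tests
  };
}

console.log(emailMetrics({ delivered: 48200, opens: 21400, clicks: 1930, clicksWithin48h: 1650 }));
// → openRate ≈ 0.444, clickRate ≈ 0.040, ctor ≈ 0.090, windowedClickRate ≈ 0.034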

Statistical Significance in Email Testing

Email A/B tests are subject to the same statistical requirements as any other experiment: you need a sufficient sample size before declaring a winner, and you should define your success metric and minimum detectable effect before the send, not after. Peeking at results mid-send and stopping early when the variant appears to be winning inflates false positive rates significantly. Commit to a predetermined sample size — typically calculated as the number of subscribers needed to detect a 2–5% relative improvement in click rate at 95% confidence — and do not evaluate results until that threshold is reached.
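
The per-arm sample size falls out of the standard two-proportion formula. A sketch at 95% confidence and 80% power, with an illustrative baseline click rate and target lift:

// Subscribers needed per arm to detect a relative lift in click rate,
// using the standard two-proportion sample size formula
function sampleSizePerArm(baselineRate, relativeLift) {
  const zAlpha = 1.96; // two-sided, 95% confidence
  const zBeta = 0.84;  // 80% power
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + relativeLift);
  const pBar = (p1 + p2) / 2;
  const numerator = Math.pow(
    zAlpha * Math.sqrt(2 * pBar * (1 - pBar)) +
    zBeta * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)),
    2
  );
  return Math.ceil(numerator / Math.pow(p2 - p1, 2));
}

// Detecting a 5% relative lift on a 4% click rate needs ~154,000 subscribers
// per arm, which is why small lists should test larger changes or pool
// results across sequential sends
console.log(sampleSizePerArm(0.04, 0.05));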

ExperimentFlow Integration: Connecting Email to Downstream Conversions

The most important metric for email experiments is often one that email platforms cannot measure natively: what happens after the click. A subscriber who clicks your email and then signs up for a trial, activates a feature, or makes a purchase is worth far more than one who clicks and bounces. Measuring this downstream conversion requires connecting your email click event to your product analytics, and ExperimentFlow’s tracking API makes this straightforward.

The Pattern: Tag, Track, Convert

The approach has three steps. First, tag your email links with UTM parameters and a visitor identifier. Second, capture those parameters on landing and call the ExperimentFlow track API to record the email click event. Third, when the downstream conversion event occurs (signup, purchase, feature activation), call the convert API with the same visitor identifier.

This creates a complete causal chain: email variant → click → downstream conversion, all linked by a consistent visitor ID. You can then compare conversion rates by email variant in ExperimentFlow’s dashboard, attributing revenue to the specific subject line or content structure that generated it.

Implementation Example

Add a query parameter to your email links that carries the experiment variant and a visitor ID:

// Email link template (set in your ESP)
// https://yourdomain.com/landing?ef_visitor={subscriber_id}&ef_variant={variant}&utm_campaign=onboarding-seq

// On your landing page, capture the parameters and initialise ExperimentFlow
const params = new URLSearchParams(window.location.search);
const visitorId = params.get('ef_visitor');
const emailVariant = params.get('ef_variant');

if (visitorId && emailVariant) {
  // Persist the visitor ID so the conversion call on a later page can
  // reference the same person
  localStorage.setItem('ef_visitor', visitorId);

  // Track the email click as a named event (top-level await assumes an
  // ES module context; otherwise wrap this in an async function)
  await fetch('https://experimentflow.com/api/track', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-API-Key': 'your-api-key'
    },
    body: JSON.stringify({
      visitor_id: visitorId,
      event: 'email_click',
      properties: {
        campaign: 'onboarding-sequence',
        step: 2,
        variant: emailVariant,
        subject_line: 'your-subject-line-slug'
      }
    })
  });
}

Then, when the visitor completes the conversion action (e.g., activates a feature or starts a trial), record the conversion against the same visitor ID:

// Call this when the downstream conversion event occurs. Recover the
// visitor ID that was persisted when the email click landed.
const visitorId = localStorage.getItem('ef_visitor');

await fetch('https://experimentflow.com/api/convert', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': 'your-api-key'
  },
  body: JSON.stringify({
    visitor_id: visitorId,
    experiment_id: 'onboarding-email-subject-line-test'
  })
});

With this instrumentation in place, you can evaluate subject line experiments not by their open rate or even their click rate, but by the trial activation rate or revenue they generate downstream — the metric that actually determines whether the test produced business value.

Batch Tracking for Sequence Experiments

For onboarding sequence experiments that span multiple emails and multiple downstream events, use ExperimentFlow’s batch decide API to retrieve variant assignments for all active email experiments at once, and attach the variant identifiers to every subsequent event in the user’s session:

// Fetch assignments for all active email experiments. Here userId is your
// application's own identifier for the signed-in user.
const response = await fetch('https://experimentflow.com/api/decide/batch', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': 'your-api-key'
  },
  body: JSON.stringify({
    visitor_id: userId,
    experiment_ids: [
      'onboarding-email-subject-lines',
      'onboarding-email-step-count',
      'onboarding-email-content-type'
    ]
  })
});

const { variants } = await response.json();

// Attach variants to all subsequent analytics events so downstream metrics
// can be segmented by email treatment. A Segment-style identify call is
// shown; adapt to your analytics stack.
analytics.identify(userId, {
  email_subject_variant: variants['onboarding-email-subject-lines'],
  email_step_count_variant: variants['onboarding-email-step-count'],
  email_content_variant: variants['onboarding-email-content-type']
});

This pattern lets you analyse the long-term impact of email experiments across your entire analytics stack, not just within your email platform.

Building a Compounding Email Optimisation Programme

The teams that extract the most value from email experimentation treat it as a programme, not a project. A programme has a backlog, a cadence, a measurement standard, and a way to propagate learnings back into future experiments.

A practical cadence for a mature email programme: one subject line test per month, one content structure test per quarter, one send time test per quarter, and one sequence redesign test per half-year. That adds up to just over twenty experiments per year, each producing a documented lift or a clear null result. After two years, the cumulative lift from these experiments typically accounts for a 30–60% improvement in email revenue per subscriber compared to the unoptimised baseline.

The key discipline is documentation. Every experiment should produce a one-paragraph summary: what was tested, what the result was, how confident you are, and what the next test should be. This institutional memory prevents teams from re-running tests they already ran, and gives new team members a starting point rather than a blank slate.
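
As a concrete shape for that summary, here is an illustrative log entry; every field and value is invented, and a shared document or spreadsheet with the same columns works just as well:

// Illustrative experiment log entry
const logEntry = {
  name: 'welcome-email-subject-personalisation',
  hypothesis: 'A first-name token in the welcome subject lifts click rate',
  metric: 'click rate (clicks / delivered)',
  result: '+6.2% relative lift over control',
  confidence: 'significant at alpha = 0.05, n = 24,000 per arm',
  nextTest: 'move the personalisation token to the preview text'
};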

Get started free with ExperimentFlow and run your first email conversion experiment in under ten minutes. Connect email clicks to downstream conversions, attribute revenue to specific variants, and build the compounding optimisation programme your email channel has been waiting for.

For related reading, see The Complete Guide to Conversion Rate Optimization for a broader framework on testing across channels, and A/B Testing Your Full Funnel for strategies that connect email experiments to on-site and in-product experiments.
