March 31, 2026 · 13 min read

The Growth Experiment Funnel: Optimizing Every Stage from Awareness to Retention

funnel · growth · optimization · conversion

Introduction: The Funnel as an Experimentation Map

Every growth team eventually arrives at the same frustration: experiments succeed in isolation but fail to move the business. A landing page test lifts clicks by 20%, yet monthly revenue stays flat. An onboarding redesign improves activation, but churn accelerates three months later. The culprit is usually the same — experiments that optimise one stage of the funnel without accounting for how that stage connects to every other.

The solution is not fewer experiments. It is a structured map that tells you which stage to focus on, what to measure at each stage, and how changes propagate downstream. That map is the growth funnel.

This post presents a complete framework for running experiments at every stage of the growth funnel. It uses the AARRR pirate metrics — Acquisition, Activation, Retention, Revenue, Referral — as the scaffold, because these five categories cover every meaningful interaction between a user and a product. By the end, you will have a systematic approach to designing funnel experiments, prioritising where to invest, and avoiding the cross-stage interference that undermines most growth programmes.

The funnel is not a waterfall. It is a system of interconnected stages. Optimising one stage in isolation is like tuning a single cylinder in an eight-cylinder engine — you feel the change, but you do not unlock the full output.

The AARRR Framework as an Experimentation Scaffold

Dave McClure introduced the AARRR framework in 2007. Despite its age, it remains the most actionable model for growth experimentation because it maps directly to measurable user behaviours. Each letter represents a stage with a corresponding metric, a set of levers, and a set of experiments you can run.

  • Acquisition — How do users find you? Measured by traffic volume, CAC, and channel conversion rate.
  • Activation — Do users have a great first experience? Measured by activation rate and time-to-value.
  • Retention — Do users come back? Measured by day-7, day-30, and day-90 retention curves.
  • Revenue — Do users pay? Measured by ARPU, LTV, and conversion-to-paid rate.
  • Referral — Do users tell others? Measured by K-factor, referral rate, and NPS.

The AARRR model works as an experimentation scaffold because it forces you to define a primary metric for each stage before you design any test. Without that definition, experiments drift toward vanity metrics — pageviews, clicks, impressions — that feel like progress but do not move the business.
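
One practical way to enforce that discipline is to encode the stage-to-metric mapping in code next to your experiment definitions. A minimal sketch; the metric names are illustrative, not prescriptive:

// One primary metric and one guardrail per stage; names are illustrative.
const FUNNEL_METRICS = {
  acquisition: { primary: 'cac_by_channel', guardrail: 'activation_rate' },
  activation: { primary: 'activation_rate_7d', guardrail: 'time_to_first_value' },
  retention: { primary: 'day_30_retention', guardrail: 'day_90_retention' },
  revenue: { primary: 'trial_to_paid_rate', guardrail: 'ltv_to_cac_ratio' },
  referral: { primary: 'k_factor', guardrail: 'referred_user_ltv' }
};

// An experiment brief that cannot name its stage and inherit these metrics is
// probably chasing a vanity metric.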

In the sections that follow, each stage gets its own treatment: the key levers, the most productive experiment types, and the guardrail metrics that prevent local optimisation from creating downstream damage.

Acquisition Experiments

Acquisition is where most teams spend the majority of their experimentation budget, and for good reason: it is the top of the funnel. A 10% improvement in acquisition efficiency compounds through every subsequent stage.

Landing Page Headlines

The headline is the highest-leverage element on any landing page. It is the first thing a visitor reads, and it determines whether they continue or leave. The most productive headline experiments compare three distinct value proposition framings:

  • Feature-led: “Run A/B tests in under ten minutes.”
  • Outcome-led: “Grow revenue with experiments that actually ship.”
  • Pain-led: “Stop guessing which version converts better.”

Pain-led headlines consistently outperform feature-led headlines for products solving a clearly defined problem. Outcome-led headlines win when the target audience is already aware of the category but has not yet chosen a solution. Run three-way tests to determine which framing resonates with your specific audience segment before applying the winner sitewide.
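
Wiring the three-way test takes a single assignment call. A minimal sketch using the batch decide API shown later in this post; the experiment id and variant names here are hypothetical:

// Assign one of three headline framings per visitor, consistently across sessions.
const HEADLINES = {
  control: 'Run A/B tests in under ten minutes.', // feature-led
  outcome: 'Grow revenue with experiments that actually ship.',
  pain: 'Stop guessing which version converts better.'
};

async function renderHeadline(visitorId) {
  const response = await fetch('https://experimentflow.com/api/decide/batch', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'X-API-Key': 'your-api-key' },
    body: JSON.stringify({
      visitor_id: visitorId,
      experiment_ids: ['landing-headline-framing']
    })
  });
  const { variants } = await response.json();
  const framing = variants['landing-headline-framing'] || 'control';
  document.querySelector('h1').textContent = HEADLINES[framing] || HEADLINES.control;
}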

Ad Creative Experiments

Paid acquisition channels — search, social, display — require continuous creative rotation because winning creatives decay as the audience reaches saturation. A productive ad creative experiment programme cycles through three variables independently:

  • Hook format: static image vs. video vs. carousel
  • Message angle: social proof vs. feature demonstration vs. problem agitation
  • Call to action: “Start free trial” vs. “See how it works” vs. “Get your first result today”

Measure click-through rate and cost-per-click at the ad level, but always track the full funnel to signup. An ad with a high CTR but a low signup rate is not winning — it is attracting the wrong audience.

SEO and Organic Content

SEO experiments operate on longer time horizons than product experiments. The most reliable approach is a clustered page test: group topically related pages, make a structural change to half the cluster, wait 28 days, and compare organic traffic and ranking position against the control group. Common variables to test include title tag format, meta description length, H1 phrasing, and internal linking density.
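
Once the 28-day windows close, the comparison is simple arithmetic. A minimal sketch, assuming you can export per-page organic session counts for the periods before and after the change:

// Compare the test half of the cluster against the control half, normalising
// for seasonality by dividing out the control group's own before/after drift.
function clusterLift(pages) {
  // pages: [{ group: 'test' | 'control', before: 1240, after: 1417 }, ...]
  const ratio = group => {
    const subset = pages.filter(p => p.group === group);
    const total = key => subset.reduce((sum, p) => sum + p[key], 0);
    return total('after') / total('before');
  };
  return ratio('test') / ratio('control') - 1; // relative lift vs. control drift
}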

Measuring Acquisition

The primary acquisition metric is Customer Acquisition Cost (CAC) by channel. Secondary metrics include channel conversion rate (visitor to signup) and traffic quality score (the percentage of acquired users who reach activation). Always track CAC alongside activation rate — a channel that looks cheap on a cost-per-click basis may be expensive on a cost-per-activated-user basis.
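
The arithmetic is worth making explicit, because channel rankings often flip once activation enters the picture. A small sketch with hypothetical numbers:

// Cost per signup vs. cost per activated user; all figures are hypothetical.
function channelEconomics({ spend, signups, activated }) {
  return {
    cac: spend / signups,                 // cost per signup
    costPerActivated: spend / activated,  // cost per user reaching activation
    trafficQuality: activated / signups   // share of signups that activate
  };
}

console.log(channelEconomics({ spend: 5000, signups: 500, activated: 100 }));
// → { cac: 10, costPerActivated: 50, trafficQuality: 0.2 }
console.log(channelEconomics({ spend: 5000, signups: 250, activated: 150 }));
// → { cac: 20, costPerActivated: ~33.3, trafficQuality: 0.6 }
// The second channel looks twice as expensive per signup but is cheaper per
// activated user, the metric that actually predicts revenue.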

Activation Experiments

Activation is the most underleveraged stage in most growth funnels. Teams spend heavily on acquisition, then deliver a generic onboarding experience that fails to show new users why the product is worth their time. The result is a leaky bucket: you pour users in at the top, and they drain out before they ever experience value.

Onboarding Flow Experiments

The onboarding flow is the first and most important activation lever. The core question is: what is the minimum set of steps required for a user to reach their first moment of value? Experiment by removing steps, not adding them. Every additional step is a door that some percentage of users will not walk through.

Common onboarding experiments include:

  • Replacing a multi-screen setup wizard with a single-screen quick-start template
  • Deferring optional configuration (profile photo, notification preferences) until after the first success moment
  • Adding a “see it work first” demo mode that shows value before requiring account creation
  • Personalising the onboarding path based on the user’s stated role or use case

Time-to-Value and the Aha Moment

Every successful product has an “aha moment” — a specific action or outcome that, once experienced, dramatically increases the probability of retention. For ExperimentFlow, the aha moment is running the first experiment and seeing variant assignment data appear in real time. For a project management tool, it might be the first task completed by a team member other than the creator.

To find your aha moment, run a cohort analysis: identify the top 20% of retained users and trace backward to find which early actions they share that less-retained users do not. Then design activation experiments that move more new users toward those actions faster.
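
A minimal sketch of that backward trace, assuming you can export each user's first-week event names and a long-term retention flag from your analytics store (the field names are hypothetical):

// Rank early actions by how over-represented they are among retained users.
function ahaCandidates(users) {
  // users: [{ retainedAtDay90: true, firstWeekActions: ['created_experiment'] }, ...]
  const retained = users.filter(u => u.retainedAtDay90);
  const churned = users.filter(u => !u.retainedAtDay90);
  const shareDoing = (group, action) =>
    group.filter(u => u.firstWeekActions.includes(action)).length / group.length;

  const allActions = [...new Set(users.flatMap(u => u.firstWeekActions))];
  return allActions
    .map(action => ({
      action,
      gap: shareDoing(retained, action) - shareDoing(churned, action)
    }))
    .sort((a, b) => b.gap - a.gap); // top entries are aha-moment candidates
}

The top-ranked actions are candidates, not proof of causation; the activation experiments that follow are what test the causal claim.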

Measuring Activation

The primary activation metric is activation rate — the percentage of new users who reach a defined success event within a defined time window (typically 7 days). Secondary metrics include time-to-first-value and step-completion rate at each point in the onboarding flow.
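
The metric itself is a one-line calculation once the success event is defined. A small sketch, assuming millisecond timestamps for signup and the success event:

// Share of a signup cohort reaching the success event within the window.
function activationRate(cohort, windowDays = 7) {
  const windowMs = windowDays * 24 * 60 * 60 * 1000;
  return (
    cohort.filter(u => u.activatedAt && u.activatedAt - u.signedUpAt <= windowMs)
      .length / cohort.length
  );
}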

Engagement Experiments

Engagement sits between activation and retention. It is the ongoing behaviour that turns a newly activated user into a habitual one. Engagement experiments focus on three levers: feature discoverability, email and notification cadence, and in-app prompts.

Feature Discoverability

Most products have a set of high-value features that a significant portion of users never discover. Surfacing these features earlier — through contextual tooltips, empty-state suggestions, or progressive disclosure — is one of the highest-return engagement experiments you can run. The experiment design is straightforward: take users who have not yet used Feature X, show half of them a contextual prompt to try it at a relevant moment, and measure Feature X adoption rate and subsequent retention in each group.
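
A minimal sketch of that design using the batch decide and track endpoints; the experiment id, event name, and showTooltip helper are hypothetical stand-ins for your own:

// Show the contextual prompt to half of the users who have not yet used Feature X.
async function maybePromptFeatureX(visitorId) {
  const response = await fetch('https://experimentflow.com/api/decide/batch', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'X-API-Key': 'your-api-key' },
    body: JSON.stringify({
      visitor_id: visitorId,
      experiment_ids: ['feature-x-discovery']
    })
  });
  const { variants } = await response.json();
  if (variants['feature-x-discovery'] === 'prompt') {
    showTooltip('Try Feature X from this panel'); // your UI layer
  }
}

// When a user first uses Feature X, log adoption so adoption rate and
// downstream retention can be compared per group.
async function trackFeatureXAdoption(visitorId) {
  await fetch('https://experimentflow.com/api/track', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'X-API-Key': 'your-api-key' },
    body: JSON.stringify({
      visitor_id: visitorId,
      event: 'feature_x_adopted',
      properties: { source: 'discovery_prompt_experiment' }
    })
  });
}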

Email and Notification Cadence

Email cadence experiments test frequency, timing, and content. The most common mistake is sending too many emails too early. Experiment by reducing the frequency for new users and measuring the effect on open rate, click-through rate, and unsubscribe rate simultaneously. A lower send volume often produces higher engagement per email and lower churn.

Test email timing by sending the same message to randomised cohorts at different hours of the day and days of the week. The optimal time varies significantly by audience segment — a B2B product sending to managers in North America will find a very different optimal time than a consumer product sending to college students.
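
Keeping each user at a stable send time matters: if users hop between slots, timing effects wash out. A minimal sketch of deterministic send-hour cohorts (the candidate hours are hypothetical):

// Hash the user id into a fixed send-hour cohort; same user, same hour, always.
const SEND_HOURS_UTC = [8, 11, 14, 17, 20];

function sendHourFor(userId) {
  let hash = 0;
  for (const ch of userId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // simple 32-bit string hash
  }
  return SEND_HOURS_UTC[hash % SEND_HOURS_UTC.length];
}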

In-App Prompts

In-app prompts — tooltips, banners, modals, and nudges — are the highest-frequency engagement lever. They are also the easiest to overuse. An experiment programme for in-app prompts should track prompt-to-action rate (did the user do what the prompt suggested?) and prompt-to-dismiss rate (did the user close it without acting?). High dismiss rates are a signal that the prompt is appearing at the wrong moment, targeting the wrong users, or offering the wrong action.
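
Both rates fall out of consistent instrumentation. A minimal sketch using the custom event API; the event and property names are hypothetical:

// Log every prompt outcome so prompt-to-action and prompt-to-dismiss rates
// can be computed per prompt and per audience segment.
async function trackPromptOutcome(visitorId, promptId, outcome) {
  // outcome: 'shown' | 'acted' | 'dismissed'
  await fetch('https://experimentflow.com/api/track', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'X-API-Key': 'your-api-key' },
    body: JSON.stringify({
      visitor_id: visitorId,
      event: 'prompt_' + outcome,
      properties: { prompt_id: promptId }
    })
  });
}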

Measuring Engagement

The primary engagement metric is the DAU/MAU ratio — the proportion of monthly active users who are also daily active users. A high DAU/MAU ratio indicates that the product has become a habit. Secondary metrics include session length, feature breadth (the number of distinct features used per user per month), and notification open rate.
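
A simplified sketch of the calculation from last-seen timestamps; a production version would average daily actives across the whole month rather than sampling a single day:

// Stickiness = daily actives / monthly actives.
function dauMauRatio(lastSeenByUser, now = Date.now()) {
  const DAY_MS = 24 * 60 * 60 * 1000;
  const timestamps = Object.values(lastSeenByUser);
  const dau = timestamps.filter(t => now - t <= DAY_MS).length;
  const mau = timestamps.filter(t => now - t <= 30 * DAY_MS).length;
  return mau === 0 ? 0 : dau / mau;
}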

Retention Experiments

Retention is the foundation of sustainable growth. A product with strong retention compounds: each cohort retains a high fraction of its users, so the total active user base grows even with modest acquisition. A product with weak retention is a treadmill: you must acquire new users continuously just to replace the ones you are losing.

Churn Prediction and Intervention

The most effective retention experiments are preventive. Build a churn prediction model that assigns a risk score to each user based on recent behaviour patterns — declining login frequency, decreased feature usage, support ticket history. Then run experiments that intervene with high-risk users before they churn: a personalised check-in email, a proactive success call, or an in-app prompt offering a relevant use case they have not tried.

The experiment design is a simple holdout: randomise high-risk users into an intervention group and a control group, then measure 30-day retention in each. Even a modest lift in this segment produces significant revenue impact because you are retaining users who have already demonstrated willingness to pay.
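
A minimal sketch of that holdout, assuming a risk-scoring job already exists; the experiment id, threshold, and sendCheckInEmail helper are hypothetical:

// Randomise high-risk users into intervention vs. control via the platform,
// then act only on the intervention group.
async function runChurnIntervention(user) {
  if (user.churnRiskScore < 0.7) return; // hypothetical risk threshold

  const response = await fetch('https://experimentflow.com/api/decide/batch', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'X-API-Key': 'your-api-key' },
    body: JSON.stringify({
      visitor_id: user.id,
      experiment_ids: ['churn-intervention-holdout']
    })
  });
  const { variants } = await response.json();

  if (variants['churn-intervention-holdout'] === 'intervention') {
    await sendCheckInEmail(user); // your email service
  }
  // The control group receives nothing; compare 30-day retention between groups.
}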

Win-Back Campaigns

Win-back campaigns target users who have already churned or lapsed. They are typically lower-ROI than preventive retention but worth testing because the marginal cost of reaching a lapsed user is low. Experiment with the win-back message framing (new feature announcement vs. personalised re-engagement vs. direct discount offer), the timing (30 days post-churn vs. 60 days vs. 90 days), and the channel (email vs. retargeting ad vs. SMS).

Feature Stickiness

Not all features contribute equally to retention. A stickiness analysis identifies which features are most correlated with long-term retention. If users who adopt Feature X retain at a 40% higher rate than those who do not, that finding is an experiment brief for your next onboarding test. Run feature stickiness experiments by surfacing high-retention features earlier in the user journey and measuring the effect on day-30 and day-90 retention curves.

Measuring Retention

The primary retention metric is the retention curve: the percentage of a cohort still active at day 1, day 7, day 14, day 30, and day 90. Look for the point at which the curve flattens — this is the “retained” baseline for that cohort. A rising baseline over successive cohorts indicates that the product is improving. A falling baseline indicates structural problems that acquisition growth cannot solve.
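
A minimal sketch of the curve, assuming you can list the day offsets on which each cohort member was active (this uses unbounded retention, counting a user as retained at day N if they were active on or after day N):

// Share of the cohort still active at or beyond each checkpoint day.
function retentionCurve(cohort, checkpoints = [1, 7, 14, 30, 90]) {
  // cohort: [{ userId: 'u1', activeDayOffsets: [0, 1, 6, 29, 88] }, ...]
  return checkpoints.map(day => ({
    day,
    retained:
      cohort.filter(u => u.activeDayOffsets.some(d => d >= day)).length /
      cohort.length
  }));
}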

Revenue Experiments

Revenue experiments are the highest-stakes experiments in the funnel because they directly touch the product’s pricing model. They are also, for that reason, frequently avoided. Teams that run revenue experiments systematically gain a compounding advantage over those that set pricing once and revisit it annually.

Pricing Experiments

Pricing experiments test willingness to pay at a structural level. The most productive pricing experiments are not “should we charge $29 or $39?” — they are “should we charge per seat, per feature set, or per outcome?” Pricing model experiments are harder to run than copy or UI experiments because they require showing different pricing pages to different users and tracking downstream conversion and LTV. But the payoff is correspondingly larger: a pricing model change that better aligns price with value can produce 2–3x revenue lift from the same user base.
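
Operationally, a pricing model test is still a variant assignment, just with higher stakes. A minimal sketch routing visitors to one of two pricing pages; the experiment id, variant name, and page paths are hypothetical:

// Route each visitor to a per-seat or feature-tier pricing page.
async function pricingPageFor(visitorId) {
  const response = await fetch('https://experimentflow.com/api/decide/batch', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'X-API-Key': 'your-api-key' },
    body: JSON.stringify({
      visitor_id: visitorId,
      experiment_ids: ['pricing-model-structure']
    })
  });
  const { variants } = await response.json();
  return variants['pricing-model-structure'] === 'per_seat'
    ? '/pricing/per-seat'
    : '/pricing/feature-tiers'; // control
}

Because the downstream metrics take months to mature, keep the assignment stable for the entire measurement window.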

Upsell Timing

Upsell timing experiments test when in the user journey to present upgrade prompts. The common mistake is to trigger upsell prompts immediately when a user hits a paywall — a moment of frustration rather than success. More effective upsell timing is anchored to moments of demonstrated value: after a user’s first successful experiment, after they invite a team member, after they see a result that required a paid feature. Experiment by moving upsell triggers earlier or later in the user journey and measuring conversion-to-paid rate and subsequent LTV in each cohort.

Packaging and Feature Bundling

Feature bundling experiments test how grouping features into plans affects conversion and LTV. A common finding is that a mid-tier plan positioned as “most popular” lifts both the percentage of users choosing that plan and the percentage of users choosing the premium plan (because the mid-tier anchors the premium price as reasonable). Run a two-way test: the existing plan structure vs. a restructured plan with adjusted bundling and anchoring.

Measuring Revenue

The primary revenue metrics are ARPU (Average Revenue Per User) and LTV (Lifetime Value). Secondary metrics include conversion-to-paid rate, trial-to-paid conversion rate, and expansion revenue rate (the percentage of existing customers who upgrade). Always measure LTV alongside CAC — the ratio of LTV to CAC is the core unit economics health indicator for any subscription business.
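
One common simplification ties these together, assuming roughly constant ARPU and churn; the inputs below are illustrative:

// LTV ≈ monthly gross profit per user / monthly churn rate.
function unitEconomics({ arpuMonthly, grossMargin, monthlyChurn, cac }) {
  const ltv = (arpuMonthly * grossMargin) / monthlyChurn;
  return { ltv, ltvToCac: ltv / cac };
}

console.log(unitEconomics({ arpuMonthly: 49, grossMargin: 0.8, monthlyChurn: 0.05, cac: 240 }));
// → { ltv: 784, ltvToCac: ~3.27 }; a ratio around 3 or above is the
// conventional health bar for a subscription business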

Referral Experiments

Referral is the most capital-efficient acquisition channel because the marginal cost of each referred user is near zero. A product with a K-factor greater than 1 — where each user on average brings in more than one additional user — grows without any paid acquisition. In practice, most products have a K-factor between 0.1 and 0.5, which means referral amplifies but does not replace other acquisition channels.
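
K-factor decomposes into two measurable inputs, which is what makes it experimentable:

// K = invites sent per user × conversion rate of those invites.
function kFactor(invitesPerUser, inviteConversionRate) {
  return invitesPerUser * inviteConversionRate;
}

console.log(kFactor(2.0, 0.15)); // → 0.3
// Referral experiments raise K by moving either input: more invites sent
// (virality mechanics) or better invite conversion (incentives, messaging).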

Referral Program Design

Referral program experiments test the incentive structure, the sharing mechanism, and the referral message. The most common failure mode is a referral program that rewards the referrer but not the referred user. Bilateral incentive structures — where both parties receive value — consistently outperform unilateral ones. Experiment by comparing a one-sided reward (referrer gets a month free) against a two-sided reward (referrer and referee each get a month free), and measure both referral rate and referred-user activation rate.

Virality Mechanics

Virality mechanics are product features that naturally expose the product to new users as a side effect of normal usage. Examples include shareable reports, collaborative workspaces that require inviting team members, and public-by-default outputs (like published experiment results with a “powered by ExperimentFlow” footer). Experiment with making these mechanics more prominent and measuring the effect on organic invite rate and new user signups attributed to existing users.

Word-of-Mouth Triggers

Word-of-mouth is harder to instrument than a referral program because it happens outside the product. The most effective proxy is NPS (Net Promoter Score) and the qualitative follow-up to that score. High NPS does not automatically produce referrals — you must give promoters a reason and a mechanism to act on their positive sentiment. Experiment by asking high-NPS users to share a specific result or achievement, rather than asking them to refer a friend generically. A concrete, shareable artefact converts promoter intent into referral action far more reliably than a generic “tell a friend” prompt.

Measuring Referral

The primary referral metric is the K-factor — the number of new users generated per existing user over a defined time period. Secondary metrics include referral rate (the percentage of active users who make at least one referral per month), referral conversion rate (the percentage of referred users who sign up), and referred-user LTV relative to non-referred-user LTV. Referred users frequently have higher LTV because they arrive with a positive recommendation already in hand.

Prioritising Which Funnel Stage to Focus On

With five stages and dozens of possible experiments, the most common paralysis point is deciding where to start. The answer is always the same: find the biggest leak.

A funnel audit is the first step. Map the current conversion rate at each stage transition:

  • Visitor to signup (Acquisition)
  • Signup to activated user (Activation)
  • Activated user to retained user at day 30 (Retention)
  • Retained user to paying customer (Revenue)
  • Paying customer who makes a referral (Referral)

Calculate the cumulative conversion from visitor to paying customer. Then ask where to apply a realistic 20% lift. Note that a 20% relative improvement at any single stage multiplies the end-to-end rate by the same factor, so the deciding question is headroom, not arithmetic leverage: at which stage does your current rate sit furthest below a reasonable benchmark, making a 20% lift cheapest and most plausible to achieve? That is where you begin.
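
A minimal sketch of that headroom audit; the current rates and benchmarks are hypothetical placeholders for your own analytics numbers:

// Rank stages by headroom: the stage furthest below benchmark is usually
// where a 20% lift is cheapest to find.
const funnel = [
  { stage: 'visitor_to_signup', current: 0.04, benchmark: 0.05 },
  { stage: 'signup_to_activated', current: 0.34, benchmark: 0.50 },
  { stage: 'activated_to_retained_d30', current: 0.41, benchmark: 0.45 },
  { stage: 'retained_to_paying', current: 0.18, benchmark: 0.20 }
];

const endToEnd = funnel.reduce((p, s) => p * s.current, 1);

const ranked = [...funnel].sort(
  (a, b) => a.current / a.benchmark - b.current / b.benchmark
);

console.log({ endToEnd, biggestLeak: ranked[0].stage });
// → { endToEnd: ~0.001, biggestLeak: 'signup_to_activated' }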

In practice, early-stage products almost always have the biggest leak at Activation. They have not yet found the aha moment, the onboarding flow is not guiding users to value, and new users are churning before they ever understand why the product is worth their attention. Mid-stage products with good activation often find the biggest leak at Retention — users activate, experience value once, but do not form a habit. Mature products with strong activation and retention typically find the biggest lever at Revenue, where pricing and packaging experiments can produce outsized LTV improvements from an already-retained user base.

Running Multiple Funnel Experiments Simultaneously Without Interference

One of the most common objections to a full-funnel experimentation programme is that running multiple experiments simultaneously makes it impossible to attribute results correctly. In practice, this concern is overstated if you apply two principles: user-level assignment and experiment isolation.

User-Level Assignment

Every experiment should assign variants at the user level, not the session level. A user who sees Variant A of the onboarding flow should see Variant A consistently across every session. Session-level assignment creates noise because the same user can experience multiple variants, contaminating both groups.
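
ExperimentFlow handles assignment server-side, but the underlying idea is worth seeing: the variant is a deterministic function of the user id and the experiment id, so it can never change between sessions. An illustrative sketch (a production system would use a stronger hash and honour allocation percentages):

// Deterministic user-level assignment: same user and experiment, same variant.
function assignVariant(userId, experimentId, variants = ['control', 'B']) {
  const key = experimentId + ':' + userId;
  let hash = 0;
  for (const ch of key) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // simple 32-bit string hash
  }
  return variants[hash % variants.length];
}

// Every session, on every device, the same user gets the same answer:
console.log(assignVariant('user-123', 'onboarding-flow-v3'));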

Experiment Isolation

Two experiments interfere when they share a primary metric and affect overlapping user populations simultaneously. The solution is to use mutual exclusion for experiments that share a metric, and layering for experiments that use different metrics. ExperimentFlow’s batch decide API makes this straightforward: you can fetch assignments for multiple experiments in a single call, and the platform handles the exclusion logic server-side.

For example, an onboarding flow experiment and a pricing page experiment can run simultaneously because they affect different user actions and measure different primary metrics. But two onboarding flow experiments — both measuring activation rate on the same user population — must be mutually exclusive.

The goal is to maximise the number of experiments running in parallel while minimising the number of experiments that contaminate each other’s results. Layered architecture achieves this by assigning each experiment to a non-overlapping slice of the user population before random assignment to variants occurs within that slice.
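
An illustrative sketch of that layer logic, not how ExperimentFlow implements it internally: each mutually exclusive experiment owns a slice of the population, and variant randomisation happens only within that slice. The slice boundaries and experiment ids are hypothetical.

// Two mutually exclusive onboarding experiments split the population 50/50.
const LAYER = [
  { experimentId: 'onboarding-flow-v3', slice: [0, 49] },
  { experimentId: 'onboarding-flow-v4', slice: [50, 99] }
];

function sliceFor(userId) {
  let hash = 0;
  for (const ch of 'onboarding-layer:' + userId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return hash % 100;
}

function eligibleExperiment(userId) {
  const s = sliceFor(userId);
  const entry = LAYER.find(e => s >= e.slice[0] && s <= e.slice[1]);
  // Variant assignment (control vs. treatment) then happens within the slice.
  return entry ? entry.experimentId : null;
}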

Building a Funnel Experiment Roadmap with Quarterly Targets

A funnel experiment roadmap translates the prioritisation analysis into a concrete execution plan. Structure it in three layers:

Layer 1: Quarterly Funnel Targets

Set one measurable target per funnel stage per quarter. For example:

  • Acquisition: Reduce CAC from $120 to $95 by end of Q2.
  • Activation: Improve 7-day activation rate from 34% to 42% by end of Q2.
  • Retention: Improve day-30 retention from 41% to 48% by end of Q2.
  • Revenue: Improve trial-to-paid conversion from 18% to 23% by end of Q2.
  • Referral: Increase K-factor from 0.12 to 0.18 by end of Q2.

These targets are hypotheses, not commitments. They give the team a shared definition of success and prevent the roadmap from drifting toward low-effort, low-impact experiments.

Layer 2: Monthly Experiment Sprints

Break the quarter into three monthly sprint cycles. Each sprint targets one or two stages and runs three to five experiments. At the end of each sprint, review results, promote winners, and update the prioritisation analysis before planning the next sprint. The prioritisation may shift as experiments land — a successful activation sprint may move Retention to the top of the priority list for the next sprint.

Layer 3: Weekly Shipping Cadence

Within each sprint, maintain a weekly shipping cadence: design on Monday, implement and QA by Wednesday, launch Thursday, and review preliminary data the following Monday. This cadence keeps the experimentation programme moving and prevents individual experiments from stalling in design or review.

ExperimentFlow for Full-Funnel Optimization

Running a full-funnel experimentation programme requires infrastructure that can handle multiple simultaneous experiments, cross-stage event tracking, and batch variant assignment without adding latency to critical user flows. ExperimentFlow is built for this use case.

Batch Decide API

The batch decide API fetches variant assignments for multiple experiments in a single HTTP request. This eliminates the latency problem of chaining multiple individual decide calls — a problem that becomes acute when a single page load requires checking assignments for three or four concurrent experiments.

// Fetch variants for all active funnel experiments in one call
const response = await fetch('https://experimentflow.com/api/decide/batch', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': 'your-api-key'
  },
  body: JSON.stringify({
    visitor_id: userId,
    experiment_ids: [
      'onboarding-flow-v3',
      'pricing-page-headline',
      'referral-cta-placement',
      'upsell-trigger-timing'
    ]
  })
});

const { variants } = await response.json();
// variants: { 'onboarding-flow-v3': 'B', 'pricing-page-headline': 'control', ... }

// Apply variants
if (variants['onboarding-flow-v3'] === 'B') {
  showShortOnboarding();
} else {
  showStandardOnboarding();
}

Custom Event Tracking

Full-funnel experimentation requires tracking events across every stage. ExperimentFlow’s custom event API accepts any named event with a freeform properties object, making it straightforward to instrument acquisition events (ad click, landing page view), activation events (first experiment created, first variant assigned), retention events (weekly login, feature adoption), and revenue events (trial started, subscription activated).

// Track a funnel event with experiment context
await fetch('https://experimentflow.com/api/track', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': 'your-api-key'
  },
  body: JSON.stringify({
    visitor_id: userId,
    event: 'activation_completed',
    properties: {
      time_to_activation_seconds: 847,
      first_feature_used: 'decide_api',
      onboarding_variant: variants['onboarding-flow-v3']
    }
  })
});

By including the experiment variant in the event properties, you can segment every downstream metric by experiment group — even for events that occur weeks after the initial variant assignment. This cross-stage attribution is essential for measuring whether an onboarding experiment that improves activation also improves day-30 retention and trial-to-paid conversion.

Getting Started

A full-funnel experiment programme does not require building all five stages at once. Start with a funnel audit, identify the biggest leak, and run your first two or three experiments against that stage. Once you have a baseline for each metric and a working experiment infrastructure, expanding to adjacent stages is straightforward.

Get started free with ExperimentFlow and run your first funnel experiment in under ten minutes. The batch decide API, custom event tracking, and statistical significance calculations are included in every plan. The first experiment is the hardest one to ship — everything after that is iteration.

For related reading, see When Not to A/B Test for guidance on which experiments are worth running, and Statistical Significance in A/B Testing for a practical guide to knowing when your results are trustworthy.
