The North Star Metric: How to Find It, Measure It, and Align Your Experiments to It
Why Most Companies Optimize the Wrong Things
Most product and growth teams measure too many things. They have dashboards with 40 metrics, weekly reports tracking signups, DAU, sessions, revenue, support tickets, and NPS. But more metrics don't produce more focus; they produce more confusion about what actually matters.
The North Star Metric (NSM) is a corrective to this. It's the single metric that best captures the value your product delivers to customers. It's not your revenue metric (that's an outcome, not a driver). It's the metric that, if it grows, reliably predicts that your business is growing sustainably.
Classic North Star Metric Examples
Understanding the NSM concept is easier with examples from well-known products:
- Slack: Messages sent per organization per month. When teams are actively messaging through Slack, they're getting value, becoming dependent on it, and unlikely to churn.
- Airbnb: Nights booked. This captures value for both hosts (revenue) and guests (experiences) simultaneously.
- Spotify: Time spent listening. Users who listen longer are more engaged, more likely to convert to premium, and more likely to retain.
- LinkedIn: Monthly active professionals with connections formed. Activity from connected professionals drives the network effects that make LinkedIn valuable.
- HubSpot: Weekly active teams (teams with multiple users active in a given week). Collaborative usage within HubSpot predicts retention and expansion.
What Makes a Good North Star Metric
A good NSM has four properties:
- It measures value delivered, not activity. Pageviews measure activity. "Problems solved" measures value. Your NSM should be something that only goes up when users are genuinely getting what they came for.
- It predicts revenue, but isn't revenue itself. Revenue is the lagging result of value delivery. The NSM is the leading indicator. If your NSM is growing, revenue should follow. If revenue is growing but your NSM isn't, you have a short-term business, not a long-term one.
- It's understandable across the team. An NSM that requires a PhD to explain will never create alignment. "Experiments run per team per month" is better than a complex composite score.
- It's measurable in near-real-time. Metrics that take 90 days to observe can't guide weekly decisions. Your NSM should be measurable with a lag of days to weeks, not quarters.
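To make the "near-real-time" property concrete, here is a minimal sketch of computing the Slack-style NSM from the examples above ("messages sent per organization per month") directly from a raw event log. The event schema, field names, and data are hypothetical, and a real pipeline would read from a warehouse rather than an in-memory list:

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw event log: (event_date, org_id, event_type)
events = [
    (date(2024, 3, 1), "org_a", "message_sent"),
    (date(2024, 3, 2), "org_a", "message_sent"),
    (date(2024, 3, 2), "org_b", "message_sent"),
    (date(2024, 3, 3), "org_b", "login"),        # activity, not value: excluded
    (date(2024, 3, 5), "org_a", "message_sent"),
]

def messages_per_org(events, year, month):
    """NSM sketch: messages sent per organization for one month.

    Only counts the value-delivering event ("message_sent"), not raw
    activity like logins -- per the "value, not activity" property.
    """
    counts = defaultdict(int)
    for day, org, kind in events:
        if kind == "message_sent" and (day.year, day.month) == (year, month):
            counts[org] += 1
    return dict(counts)

print(messages_per_org(events, 2024, 3))  # {'org_a': 3, 'org_b': 1}
```

Because the metric is computed from events as they land, it can be refreshed daily, which is what lets it guide weekly decisions.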
Input Metrics: The Levers That Move Your NSM
The NSM is the destination; input metrics are the levers your team actually pulls. For each input metric, you can design experiments that test whether changing it moves the NSM.
For an A/B testing platform, the NSM might be "experiments run per team per month." The input metrics might be:
- New teams onboarded (acquisition input)
- Time to first experiment (activation input)
- Experiments created per active team (engagement input)
- Experiment completion rate (quality input)
- Teams with 3+ members running experiments (collaboration input)
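The input metrics above can be rolled up from per-team records. The sketch below is illustrative only: the record shape, field names, and numbers are hypothetical, and it covers four of the five levers (acquisition would come from a separate signup funnel):

```python
from datetime import date

# Hypothetical per-team records for an A/B testing platform
teams = [
    {"team": "t1", "signed_up": date(2024, 1, 1),
     "first_experiment": date(2024, 1, 3),
     "experiments": 8, "completed": 6, "active_members": 4},
    {"team": "t2", "signed_up": date(2024, 1, 5),
     "first_experiment": date(2024, 1, 20),
     "experiments": 2, "completed": 1, "active_members": 1},
]

def input_metrics(teams):
    """Roll up the input metrics that feed the NSM (experiments/team/month)."""
    n = len(teams)
    return {
        # Activation input: days from signup to first experiment
        "avg_days_to_first_experiment":
            sum((t["first_experiment"] - t["signed_up"]).days for t in teams) / n,
        # Engagement input: experiments created per active team
        "experiments_per_active_team":
            sum(t["experiments"] for t in teams) / n,
        # Quality input: share of started experiments that finish
        "completion_rate":
            sum(t["completed"] for t in teams) / sum(t["experiments"] for t in teams),
        # Collaboration input: teams with 3+ active members
        "collaborative_teams":
            sum(1 for t in teams if t["active_members"] >= 3),
    }

metrics = input_metrics(teams)
```

Tracking the levers side by side like this makes it obvious which one an experiment is supposed to move.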
A great experiment is one that tests a hypothesis about one of these input metrics. "We think showing experiment templates during onboarding will reduce time-to-first-experiment and increase overall experiments per team per month" is a well-formed hypothesis tied directly to the NSM.
The Danger of Vanity Metrics
Vanity metrics look good in board decks but don't actually tell you whether your business is healthy. The most common culprits:
- Registered users: the count of accounts created includes everyone who signed up once and never came back. Active users is almost always the more meaningful number.
- Total pageviews: easy to inflate with thin content. More pageviews don't mean more value.
- App downloads: Downloads without activation are just storage consumers. "Monthly active users" is the more honest metric.
- Social media followers: Followers without engagement are irrelevant to your business. Conversion from social to signup is what matters.
The test for a vanity metric: can it go up while your business gets worse? If yes, it's a vanity metric.
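The "can it go up while the business gets worse" test can be checked mechanically against historical data. A minimal sketch, with made-up monthly numbers, where cumulative registrations climb while monthly actives fall:

```python
# Hypothetical monthly series: registrations keep climbing while
# active users fall -- the signature of a vanity metric.
registered_total = [10_000, 12_000, 14_500, 17_000]   # cumulative signups
monthly_active   = [4_000, 3_800, 3_500, 3_100]       # users who came back

def diverging(vanity_series, health_series):
    """True if the candidate metric rose over the window while a
    health metric fell -- i.e., it went up as the business got worse."""
    return vanity_series[-1] > vanity_series[0] and health_series[-1] < health_series[0]

print(diverging(registered_total, monthly_active))  # True -> vanity metric
```

If the check fires, the candidate metric failed the test and shouldn't be your NSM.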
Aligning Experiments to Your NSM
Once you have a clear NSM and its input metrics, experiment prioritization becomes straightforward: run experiments that test hypotheses about input metrics that most directly drive the NSM.
Use this framework for each experiment idea:
1. Which input metric does this experiment target?
2. How does that input metric connect to the NSM?
3. If this experiment succeeds, how much should the NSM move?
4. Is that NSM movement worth the engineering cost of building and running this experiment?
Experiments that can't clearly answer questions 1 and 2 probably shouldn't be run. You should be able to draw a direct line from experiment → input metric → NSM → business outcome.
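The framework above can be reduced to a simple scoring function: estimated NSM lift divided by engineering cost, with ideas that can't name an input metric scoring zero. The class, the field names, and the numbers below are hypothetical; real estimates would come from past experiments and team sizing:

```python
from dataclasses import dataclass

@dataclass
class ExperimentIdea:
    name: str
    input_metric: str          # question 1: which lever does it target?
    expected_nsm_lift: float   # question 3: estimated relative NSM movement
    eng_weeks: float           # question 4: rough engineering cost

def priority(idea: ExperimentIdea) -> float:
    """Lift-per-cost score; an idea with no input metric scores 0."""
    if not idea.input_metric:
        return 0.0             # can't answer question 1: don't run it
    return idea.expected_nsm_lift / idea.eng_weeks

ideas = [
    ExperimentIdea("onboarding templates", "time_to_first_experiment", 0.06, 2),
    ExperimentIdea("redesign logo", "", 0.01, 3),   # no lever named
]
ranked = sorted(ideas, key=priority, reverse=True)
```

This is deliberately crude; the point is that an idea with no line to an input metric drops straight to the bottom of the backlog.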
Counter-Metrics: Avoiding Goodhart's Law
Goodhart's Law: when a measure becomes a target, it ceases to be a good measure. Optimizing purely for your NSM without counter-metrics can produce perverse incentives.
For every NSM, define at least one counter-metric that you're explicitly protecting:
- If your NSM is "experiments run per team," your counter-metric might be "experiments run to completion" (to prevent teams from running underpowered experiments and declaring winners too early)
- If your NSM is "messages sent," your counter-metric might be "conversation quality rating" (to prevent spam-like behavior that inflates message counts)
- If your NSM is "nights booked," your counter-metric might be "guest satisfaction score" (to prevent bookings at properties that produce bad experiences)
An experiment that improves your NSM while hurting your counter-metric is usually not a real win.
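That win condition can be encoded as a guardrail check on experiment results. The function and thresholds below are a hedged sketch; a production readout would also account for statistical significance on both metrics rather than comparing point estimates:

```python
def is_real_win(nsm_lift, counter_delta, guardrail=-0.01):
    """A result counts as a win only if the NSM improved AND the
    counter-metric did not degrade past the guardrail threshold
    (here, a hypothetical 1% tolerated drop)."""
    return nsm_lift > 0 and counter_delta >= guardrail

# +4% messages sent, but conversation quality down 5%: not a real win.
print(is_real_win(0.04, -0.05))   # False
# +4% messages sent, quality down only 0.5%: within the guardrail.
print(is_real_win(0.04, -0.005))  # True
```

Baking the counter-metric into the ship/no-ship decision is what keeps Goodhart's Law from hollowing out the NSM.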
Reviewing and Updating Your NSM
NSMs aren't permanent. As products evolve, the metric that best captures value delivery changes. A startup in early growth might use "activated new users" as its NSM; the same company at Series B might shift to "expansion revenue per account." Review your NSM annually and after major product pivots.
Ready to optimize your site?
Start running experiments in minutes with Experiment Flow. Plans from $29/month.
Get Started