Thompson Sampling Explained: The Bayesian Approach to Optimization
Why Thompson Sampling?
Among bandit algorithms, Thompson Sampling has emerged as the gold standard for online optimization. First described by William Thompson in 1933, it languished in obscurity for decades before being rediscovered by the tech industry. Today, it powers recommendation systems at Netflix, ad serving at Google, and A/B testing platforms worldwide.
The appeal is simple: Thompson Sampling is asymptotically optimal (its regret matches the Lai-Robbins lower bound for Bernoulli bandits), easy to implement, and naturally handles uncertainty.
The Bayesian Foundation
Thompson Sampling treats each variant's conversion rate as a random variable rather than a fixed number. We start with a prior belief about what the conversion rate might be, then update that belief as we observe data.
For conversion rate optimization, we use the Beta distribution Beta(α, β), where:
- α = number of successes (conversions) + 1
- β = number of failures (non-conversions) + 1
Starting with Beta(1, 1)—a uniform distribution—means we initially believe all conversion rates are equally likely. As data accumulates, the distribution narrows around the true rate.
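As a concrete (hypothetical) example of that update, suppose a variant has seen 12 conversions out of 200 visitors. The posterior parameters and mean follow directly from the counts; this small sketch just prints them:
// Sketch of the Beta posterior update for one variant (the counts are hypothetical).
import "fmt"

func main() {
    conversions, nonConversions := 12.0, 188.0

    alpha := conversions + 1   // successes + 1
    beta := nonConversions + 1 // failures + 1
    mean := alpha / (alpha + beta)

    fmt.Printf("posterior: Beta(%.0f, %.0f), mean %.3f\n", alpha, beta, mean)
}
With more data the distribution keeps the same mean behavior but becomes narrower, which is exactly what the sampling step exploits.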
The Algorithm
Thompson Sampling is remarkably simple:
- For each variant, sample a value from its Beta distribution
- Select the variant with the highest sampled value
- Observe the outcome (conversion or not)
- Update that variant's Beta distribution
- Repeat
// Thompson Sampling implementation
func (ts *ThompsonSampling) Select() int {
    bestArm := 0
    bestSample := 0.0
    for i, arm := range ts.Arms {
        // Sample from Beta(successes+1, failures+1)
        sample := betaSample(arm.Successes+1, arm.Failures+1)
        if sample > bestSample {
            bestSample = sample
            bestArm = i
        }
    }
    return bestArm
}
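The snippet above leans on a few definitions it doesn't show: the ThompsonSampling and Arm types, a betaSample helper, and the update step. Here is a minimal sketch of those pieces; the struct fields are inferred from the code above, and the Beta draw uses Gonum's distuv package, though any Beta sampler works:
// Supporting pieces for the Select method above (a sketch, not the full product code).
import "gonum.org/v1/gonum/stat/distuv"

// Arm tracks observed outcomes for one variant.
type Arm struct {
    Successes float64
    Failures  float64
}

// ThompsonSampling holds one arm per variant.
type ThompsonSampling struct {
    Arms []Arm
}

// betaSample draws a single value from Beta(alpha, beta).
func betaSample(alpha, beta float64) float64 {
    return distuv.Beta{Alpha: alpha, Beta: beta}.Rand()
}

// Update records one observed outcome for the arm that was shown.
func (ts *ThompsonSampling) Update(arm int, converted bool) {
    if converted {
        ts.Arms[arm].Successes++
    } else {
        ts.Arms[arm].Failures++
    }
}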
Why This Works
The beauty of Thompson Sampling is how it naturally balances exploration and exploitation (a short simulation after this list makes the effect concrete):
- Uncertain variants get explored: If a variant has little data, its Beta distribution is wide. It will occasionally produce high samples, sending traffic its way.
- Good variants get exploited: If a variant has a high conversion rate, its distribution is concentrated at a high value. It will frequently produce the highest sample.
- Bad variants get abandoned: If a variant has a low conversion rate with lots of data, its distribution is concentrated at a low value. It rarely wins the sampling competition.
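To see this in action, here is a small simulation sketch. The conversion rates are made up for illustration, and it assumes the Select, Update, and betaSample definitions sketched earlier:
// Simulate 10,000 visitors across three variants whose true rates
// are unknown to the algorithm (the rates below are purely illustrative).
import (
    "fmt"
    "math/rand"
)

func main() {
    trueRates := []float64{0.04, 0.05, 0.08}
    ts := &ThompsonSampling{Arms: make([]Arm, len(trueRates))}
    pulls := make([]int, len(trueRates))

    for i := 0; i < 10000; i++ {
        arm := ts.Select()                           // explore/exploit via sampling
        converted := rand.Float64() < trueRates[arm] // simulate the visitor's outcome
        ts.Update(arm, converted)
        pulls[arm]++
    }

    // Most traffic ends up on the 8% variant; the weaker variants taper off
    // as their posteriors concentrate at lower values.
    fmt.Println("visitors per variant:", pulls)
}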
Thompson Sampling vs UCB1
Upper Confidence Bound (UCB1) is the other popular bandit algorithm. While UCB1 is deterministic (always selecting the variant with the highest upper bound), Thompson Sampling is stochastic (randomized through sampling).
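For contrast, a minimal UCB1 selection sketch over the same Arm struct might look like this, using the standard mean + sqrt(2 ln N / n) bound (the method name is just for this illustration):
// UCB1: deterministically pick the arm with the highest upper confidence bound.
import "math"

func (ts *ThompsonSampling) SelectUCB1() int {
    total := 0.0
    for _, arm := range ts.Arms {
        total += arm.Successes + arm.Failures
    }

    bestArm, bestBound := 0, math.Inf(-1)
    for i, arm := range ts.Arms {
        n := arm.Successes + arm.Failures
        if n == 0 {
            return i // play every untried arm once before applying the bound
        }
        bound := arm.Successes/n + math.Sqrt(2*math.Log(total)/n)
        if bound > bestBound {
            bestBound = bound
            bestArm = i
        }
    }
    return bestArm
}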
In practice, Thompson Sampling tends to converge faster because:
- It explores more efficiently—it doesn't waste time on variants that are clearly worse just because their confidence interval happens to be wide
- It naturally adapts its exploration rate as data accumulates
- Its randomized choices keep a small amount of exploration going even late in an experiment, which helps when conversion rates drift over time
Real-World Performance
In our internal benchmarks using real experiment data:
- Thompson Sampling finds the winning variant in 40% fewer samples than equal-split A/B testing
- Total conversions during the test period are 15-25% higher compared to equal-split
- With 5+ variants, the advantage grows to 30-50% fewer samples needed
The more variants you're testing, the bigger the advantage of Thompson Sampling over traditional A/B testing.
Implementation Tips
Batched Updates
In high-traffic scenarios, you don't need to update the model after every single visitor. Batching updates (e.g., every 100 visitors) is fine and reduces computational overhead without meaningfully affecting performance.
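One way to do this is to buffer outcomes locally and fold them into the arms every N visitors. A rough sketch, where the BatchedUpdater name and the batch size of 100 are just illustrative:
// Buffer outcomes in memory and apply them to the model in batches.
type BatchedUpdater struct {
    TS        *ThompsonSampling
    BatchSize int // e.g. 100
    pending   []Arm
    buffered  int
}

func (b *BatchedUpdater) Record(arm int, converted bool) {
    if b.pending == nil {
        b.pending = make([]Arm, len(b.TS.Arms))
    }
    if converted {
        b.pending[arm].Successes++
    } else {
        b.pending[arm].Failures++
    }
    b.buffered++
    if b.buffered >= b.BatchSize {
        b.Flush()
    }
}

// Flush folds the buffered counts into the live arms and resets the buffer.
func (b *BatchedUpdater) Flush() {
    for i := range b.pending {
        b.TS.Arms[i].Successes += b.pending[i].Successes
        b.TS.Arms[i].Failures += b.pending[i].Failures
        b.pending[i] = Arm{}
    }
    b.buffered = 0
}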
Prior Selection
Starting with Beta(1, 1) works well in most cases. If you have strong prior knowledge (e.g., from historical experiments), you can encode it: a variant that previously converted at 5% with 1,000 visitors could start at Beta(50, 950).
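In code, that just means seeding the arm's counts before the experiment starts. Because the sampler adds +1 to each count, seeding 49 successes and 949 failures reproduces the Beta(50, 950) prior from the example:
// Seed one variant with historical pseudo-counts; leave a new variant at Beta(1, 1).
var seeded = &ThompsonSampling{
    Arms: []Arm{
        {Successes: 49, Failures: 949}, // prior Beta(50, 950) after the +1 is applied
        {},                             // no history: uniform Beta(1, 1)
    },
}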
Minimum Exploration
For important experiments, consider a hybrid approach: allocate a minimum of 5-10% to each variant regardless of performance. This prevents the algorithm from abandoning a variant too quickly due to random early results.
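A simple way to enforce that floor is to serve a uniformly random variant for a small fixed share of traffic and let Thompson Sampling allocate the rest. A minimal sketch, assuming math/rand is imported and using the Select method above:
// Hybrid selection: guarantee each variant at least minShare of the traffic.
func (ts *ThompsonSampling) SelectWithFloor(minShare float64) int {
    // With probability minShare * numVariants, serve a uniformly random variant,
    // which gives every variant roughly minShare of impressions as a floor.
    if rand.Float64() < minShare*float64(len(ts.Arms)) {
        return rand.Intn(len(ts.Arms))
    }
    return ts.Select()
}
Calling SelectWithFloor(0.05) keeps at least roughly 5% of traffic on every variant while the remainder follows the bandit.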
Getting Started
Experiment Flow uses Thompson Sampling as its default bandit algorithm. Enable it when creating any experiment:
// Create an experiment with Thompson Sampling
const response = await fetch('/api/experiments', {
  method: 'POST',
  headers: {
    'X-API-Key': apiKey,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    name: 'Homepage Hero Test',
    variants: ['control', 'social-proof', 'urgency', 'benefit-focused'],
    bandit_mode: true
  })
})
const experiment = await response.json()
The system handles all the Bayesian math, real-time updates, and traffic allocation. You just track conversions as usual and watch the algorithm converge on your winner.
Ready to optimize your site?
Start running experiments in minutes with Experiment Flow. Free plan available.