Thompson Sampling Explained: The Bayesian Approach to Optimization
Why Thompson Sampling?
Among bandit algorithms, Thompson Sampling has emerged as the gold standard for online optimization. First described by William Thompson in 1933, it languished in obscurity for decades before being rediscovered by the tech industry. Today, it powers recommendation systems at Netflix, ad serving at Google, and A/B testing platforms worldwide.
The appeal is simple: Thompson Sampling is asymptotically optimal (its regret matches the Lai-Robbins lower bound for Bernoulli bandits), easy to implement, and naturally handles uncertainty.
The Bayesian Foundation
Thompson Sampling treats each variant's conversion rate as a random variable rather than a fixed number. We start with a prior belief about what the conversion rate might be, then update that belief as we observe data.
For conversion rate optimization, we use the Beta distribution Beta(α, β), where:
- α = number of successes (conversions) + 1
- β = number of failures (non-conversions) + 1
Starting with Beta(1, 1)—a uniform distribution—means we initially believe all conversion rates are equally likely. As data accumulates, the distribution narrows around the true rate.
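As a concrete (hypothetical) example of that update, suppose a variant has seen 12 conversions out of 200 visitors. The posterior parameters and mean follow directly from the counts; this small sketch just prints them:
// Sketch of the Beta posterior update for one variant (the counts are hypothetical).
import "fmt"

func main() {
    conversions, nonConversions := 12.0, 188.0

    alpha := conversions + 1   // successes + 1
    beta := nonConversions + 1 // failures + 1
    mean := alpha / (alpha + beta)

    fmt.Printf("posterior: Beta(%.0f, %.0f), mean %.3f\n", alpha, beta, mean)
}
With more data the distribution keeps the same mean behavior but becomes narrower, which is exactly what the sampling step exploits.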
The Algorithm
Thompson Sampling is remarkably simple:
- For each variant, sample a value from its Beta distribution
- Select the variant with the highest sampled value
- Observe the outcome (conversion or not)
- Update that variant's Beta distribution
- Repeat
// Thompson Sampling implementation
func (ts *ThompsonSampling) Select() int {
    bestArm := 0
    bestSample := 0.0
    for i, arm := range ts.Arms {
        // Sample from Beta(successes+1, failures+1)
        sample := betaSample(arm.Successes+1, arm.Failures+1)
        if sample > bestSample {
            bestSample = sample
            bestArm = i
        }
    }
    return bestArm
}
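The snippet above leans on a few definitions it doesn't show: the ThompsonSampling and Arm types, a betaSample helper, and the update step. Here is a minimal sketch of those pieces; the struct fields are inferred from the code above, and the Beta draw uses Gonum's distuv package, though any Beta sampler works:
// Supporting pieces for the Select method above (a sketch, not the full product code).
import "gonum.org/v1/gonum/stat/distuv"

// Arm tracks observed outcomes for one variant.
type Arm struct {
    Successes float64
    Failures  float64
}

// ThompsonSampling holds one arm per variant.
type ThompsonSampling struct {
    Arms []Arm
}

// betaSample draws a single value from Beta(alpha, beta).
func betaSample(alpha, beta float64) float64 {
    return distuv.Beta{Alpha: alpha, Beta: beta}.Rand()
}

// Update records one observed outcome for the arm that was shown.
func (ts *ThompsonSampling) Update(arm int, converted bool) {
    if converted {
        ts.Arms[arm].Successes++
    } else {
        ts.Arms[arm].Failures++
    }
}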
Why This Works
The beauty of Thompson Sampling is how it naturally balances exploration and exploitation (a short simulation after this list makes the effect concrete):
- Uncertain variants get explored: If a variant has little data, its Beta distribution is wide. It will occasionally produce high samples, sending traffic its way.
- Good variants get exploited: If a variant has a high conversion rate, its distribution is concentrated at a high value. It will frequently produce the highest sample.
- Bad variants get abandoned: If a variant has a low conversion rate with lots of data, its distribution is concentrated at a low value. It rarely wins the sampling competition.
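To see this in action, here is a small simulation sketch. The conversion rates are made up for illustration, and it assumes the Select, Update, and betaSample definitions sketched earlier:
// Simulate 10,000 visitors across three variants whose true rates
// are unknown to the algorithm (the rates below are purely illustrative).
import (
    "fmt"
    "math/rand"
)

func main() {
    trueRates := []float64{0.04, 0.05, 0.08}
    ts := &ThompsonSampling{Arms: make([]Arm, len(trueRates))}
    pulls := make([]int, len(trueRates))

    for i := 0; i < 10000; i++ {
        arm := ts.Select()                           // explore/exploit via sampling
        converted := rand.Float64() < trueRates[arm] // simulate the visitor's outcome
        ts.Update(arm, converted)
        pulls[arm]++
    }

    // Most traffic ends up on the 8% variant; the weaker variants taper off
    // as their posteriors concentrate at lower values.
    fmt.Println("visitors per variant:", pulls)
}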
Thompson Sampling vs UCB1
Upper Confidence Bound (UCB1) is the other popular bandit algorithm. While UCB1 is deterministic (always selecting the variant with the highest upper bound), Thompson Sampling is stochastic (randomized through sampling).
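For contrast, a minimal UCB1 selection sketch over the same Arm struct might look like this, using the standard mean + sqrt(2 ln N / n) bound (the method name is just for this illustration):
// UCB1: deterministically pick the arm with the highest upper confidence bound.
import "math"

func (ts *ThompsonSampling) SelectUCB1() int {
    total := 0.0
    for _, arm := range ts.Arms {
        total += arm.Successes + arm.Failures
    }

    bestArm, bestBound := 0, math.Inf(-1)
    for i, arm := range ts.Arms {
        n := arm.Successes + arm.Failures
        if n == 0 {
            return i // play every untried arm once before applying the bound
        }
        bound := arm.Successes/n + math.Sqrt(2*math.Log(total)/n)
        if bound > bestBound {
            bestBound = bound
            bestArm = i
        }
    }
    return bestArm
}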
In practice, Thompson Sampling tends to converge faster because:
- It explores more efficiently—it doesn't waste time on variants that are clearly worse just because their confidence interval happens to be wide
- It naturally adapts its exploration rate as data accumulates
- Its randomized choices keep a small amount of exploration going even late in an experiment, which helps when conversion rates drift over time
Real-World Performance
In our internal benchmarks using real experiment data:
- Thompson Sampling finds the winning variant in 40% fewer samples than equal-split A/B testing
- Total conversions during the test period are 15-25% higher compared to equal-split
- With 5+ variants, the advantage grows to 30-50% fewer samples needed
The more variants you're testing, the bigger the advantage of Thompson Sampling over traditional A/B testing.
Implementation Tips
Batched Updates
In high-traffic scenarios, you don't need to update the model after every single visitor. Batching updates (e.g., every 100 visitors) is fine and reduces computational overhead without meaningfully affecting performance.
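One way to do this is to buffer outcomes locally and fold them into the arms every N visitors. A rough sketch, where the BatchedUpdater name and the batch size of 100 are just illustrative:
// Buffer outcomes in memory and apply them to the model in batches.
type BatchedUpdater struct {
    TS        *ThompsonSampling
    BatchSize int // e.g. 100
    pending   []Arm
    buffered  int
}

func (b *BatchedUpdater) Record(arm int, converted bool) {
    if b.pending == nil {
        b.pending = make([]Arm, len(b.TS.Arms))
    }
    if converted {
        b.pending[arm].Successes++
    } else {
        b.pending[arm].Failures++
    }
    b.buffered++
    if b.buffered >= b.BatchSize {
        b.Flush()
    }
}

// Flush folds the buffered counts into the live arms and resets the buffer.
func (b *BatchedUpdater) Flush() {
    for i := range b.pending {
        b.TS.Arms[i].Successes += b.pending[i].Successes
        b.TS.Arms[i].Failures += b.pending[i].Failures
        b.pending[i] = Arm{}
    }
    b.buffered = 0
}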
Prior Selection
Starting with Beta(1, 1) works well in most cases. If you have strong prior knowledge (e.g., from historical experiments), you can encode it: a variant that previously converted at 5% with 1,000 visitors could start at Beta(50, 950).
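In code, that just means seeding the arm's counts before the experiment starts. Because the sampler adds +1 to each count, seeding 49 successes and 949 failures reproduces the Beta(50, 950) prior from the example:
// Seed one variant with historical pseudo-counts; leave a new variant at Beta(1, 1).
var seeded = &ThompsonSampling{
    Arms: []Arm{
        {Successes: 49, Failures: 949}, // prior Beta(50, 950) after the +1 is applied
        {},                             // no history: uniform Beta(1, 1)
    },
}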
Minimum Exploration
For important experiments, consider a hybrid approach: allocate a minimum of 5-10% to each variant regardless of performance. This prevents the algorithm from abandoning a variant too quickly due to random early results.
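A simple way to enforce that floor is to serve a uniformly random variant for a small fixed share of traffic and let Thompson Sampling allocate the rest. A minimal sketch, assuming math/rand is imported and using the Select method above:
// Hybrid selection: guarantee each variant at least minShare of the traffic.
func (ts *ThompsonSampling) SelectWithFloor(minShare float64) int {
    // With probability minShare * numVariants, serve a uniformly random variant,
    // which gives every variant roughly minShare of impressions as a floor.
    if rand.Float64() < minShare*float64(len(ts.Arms)) {
        return rand.Intn(len(ts.Arms))
    }
    return ts.Select()
}
Calling SelectWithFloor(0.05) keeps at least roughly 5% of traffic on every variant while the remainder follows the bandit.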
Getting Started
Experiment Flow uses Thompson Sampling as its default bandit algorithm. Enable it when creating any experiment:
// Create an experiment with Thompson Sampling
const response = await fetch('/api/experiments', {
  method: 'POST',
  headers: {
    'X-API-Key': apiKey,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    name: 'Homepage Hero Test',
    variants: ['control', 'social-proof', 'urgency', 'benefit-focused'],
    bandit_mode: true
  })
})
const experiment = await response.json()
The system handles all the Bayesian math, real-time updates, and traffic allocation. You just track conversions as usual and watch the algorithm converge on your winner.
Ready to optimize your site?
Start running experiments in minutes with Experiment Flow. Free plan available.