Why Your Experiments Aren't Working
Jacob Dutton
20 Feb 2025

Get buy-in before you begin
The quality of your experiment matters less than your stakeholders' belief in the results.
As humans, we're hard-wired to reject evidence that challenges our existing beliefs. For corporate innovators, this means your stakeholders will always find ways to dismiss test results they don't like - unless you've involved them from the start.
Here's how to get better at bringing them on-side:
Map decision-makers and influencers for your project
Involve critics early and get them to define success criteria with you
Document your agreed-upon thresholds before testing
Create a pre-mortem to flush out potential objections
A retail client of ours learned this the hard way. Their experiment showed clear evidence a new concept wouldn't work. But because they hadn't agreed with their sponsor on what "wouldn't work" meant before starting, the results kicked off months of debate instead of decisive action.
When we ran the next test, we started differently:
Got the key sponsor to define success metrics with us
Documented their biggest concerns up front
Agreed on the exact go/no-go thresholds
Designed the test so the results would answer those questions directly
When the data came in below the threshold (again!), the project was killed in one meeting. No debates, no politics.
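In practice, "agreed thresholds" can be as concrete as a few lines written down before any data arrives. Here's a minimal Python sketch of what we mean; the metric names and numbers are illustrative, not the client's actual figures:

from dataclasses import dataclass

@dataclass
class GoNoGoCriterion:
    """One success threshold agreed with the sponsor before the test starts."""
    metric: str
    threshold: float
    direction: str  # "at_least" or "at_most"

# Illustrative criteria - agree and write these down before any data comes in.
criteria = [
    GoNoGoCriterion("signup_conversion_rate", 0.04, "at_least"),
    GoNoGoCriterion("cost_per_acquisition_gbp", 60.0, "at_most"),
]

def decide(results: dict) -> str:
    """Return 'go' only if every pre-agreed threshold is met."""
    for c in criteria:
        value = results[c.metric]
        met = value >= c.threshold if c.direction == "at_least" else value <= c.threshold
        if not met:
            return "no-go"
    return "go"

# Example: observed conversion falls below the agreed threshold, so it's a no-go.
print(decide({"signup_conversion_rate": 0.021, "cost_per_acquisition_gbp": 48.0}))

Once the criteria are written down like this - in code or just in a shared document - the go/no-go call becomes mechanical rather than political.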
Build the minimum viable test
The biggest mistake we see teams make when executing experiments is building more than they need to answer their question. And it's not just wasteful - it actively damages your results.
More complexity means more variables, more variables mean more noise, and noise makes it harder to spot real signals. Yet teams consistently over-build their tests.
Here's how to strip things right back and build the minimum viable test:
Start with your hypothesis
List every element needed to test it
Ruthlessly eliminate everything else
Build the simplest version that could give you an answer
A fintech client of ours wanted to test whether users would trust AI for investment advice. Their initial plan was to build a full robo-advisor platform at a cost of £400,000 over six months.
We stripped it right back:
Spun up a simple landing page describing the AI advisor
Added a "Deploy £5,000" button
Tracked click-through rates and captured email addresses
Interviewed users who clicked
It took us two weeks. And we learned, before building anything, that users wouldn't trust AI with their money.
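If you're running a similar fake-door test, the instrumentation can be as lean as the page itself. A rough Python sketch, assuming you log one JSON line per event; the event names and file path are our own placeholders:

import json
import time

# One JSON line per event; "page_view", "deploy_click" and "email_captured"
# are placeholder event names for the fake-door test.
def log_event(event: str, session_id: str, **details) -> None:
    record = {"ts": time.time(), "event": event, "session": session_id, **details}
    with open("fake_door_events.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")

# Click-through rate is then just: sessions that clicked / sessions that viewed.
def click_through_rate(path: str = "fake_door_events.jsonl") -> float:
    views, clicks = set(), set()
    with open(path) as f:
        for line in f:
            r = json.loads(line)
            if r["event"] == "page_view":
                views.add(r["session"])
            elif r["event"] == "deploy_click":
                clicks.add(r["session"])
    return len(clicks) / len(views) if views else 0.0

Click-through on the "Deploy £5,000" button becomes a single function call, and the captured email addresses give you your interview list.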
Key questions for your test setup:
What's the simplest way to test your hypothesis?
What could you remove from it without affecting the core test?
Could you manually simulate parts instead of building them?
Make it real
You need to make your tests real enough to get genuine responses but controlled enough to get clean data.
Most teams get this backwards. They either make their test so "experimental" that users don't behave naturally or so polished that they can't isolate what's working.
Here's how you need to think about this:
Create conditions as close to the real world as possible
Control for outside variables
Measure actual behaviour (not intentions)
Document everything that could affect your results
We helped a healthcare company test a new patient monitoring service. But instead of building the tech, we:
Had nurses manually update patient data
Made it look automated to users
Tracked real behaviour and outcomes
Documented every manual intervention
Three weeks later, we knew exactly how patients would use the service - before writing a line of code.
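Under the hood, this kind of "wizard of Oz" setup can be very small. A sketch of the pattern, assuming a simple in-memory store (the names here are ours, not the client's system): the patient-facing read path can't tell whether a reading came from a nurse or a sensor, and every manual entry is logged so you know exactly what automation would eventually have to replace.

from datetime import datetime, timezone

readings = {}        # patient_id -> latest reading shown to the patient
interventions = []   # audit trail of every manual update

def nurse_update(patient_id: str, reading: dict, nurse: str) -> None:
    """Manual data entry, logged so the real operational cost is visible."""
    readings[patient_id] = reading
    interventions.append({
        "patient_id": patient_id,
        "entered_by": nurse,
        "at": datetime.now(timezone.utc).isoformat(),
    })

def patient_view(patient_id: str) -> dict:
    """What the patient sees - indistinguishable from an automated feed."""
    return readings.get(patient_id, {})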
Deployment checklist:
Are users experiencing this as they would in reality?
Have you controlled for external factors?
Are you measuring real behaviour?
Can you trace every interaction?
Watch and learn
The most valuable insights come from watching experiments unfold in real time. But you need to know what to watch for and how to adjust without invalidating your results.
The principle is simple: monitor enough to spot problems and opportunities, but not so much that you're tempted to interfere unnecessarily and disturb the test.
Here's what you need to do to monitor your experiments more effectively (a rough sketch of this setup follows the list):
Set up early warning metrics
Define acceptable ranges for key indicators
Create clear intervention triggers
Document every adjustment
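In code, that can be as light as a daily check of key indicators against the agreed ranges, plus a running log of every adjustment. A minimal Python sketch - the metric names and ranges are assumptions for illustration, not a client's real figures:

# Acceptable ranges for early-warning metrics, plus an intervention log.
acceptable_ranges = {
    "daily_conversion_rate": (0.01, 0.10),
    "payment_error_rate": (0.00, 0.02),
}

adjustment_log = []

def check_metrics(today: dict) -> list:
    """Return any metrics that have drifted outside their agreed range."""
    alerts = []
    for metric, (low, high) in acceptable_ranges.items():
        value = today.get(metric)
        if value is None or not (low <= value <= high):
            alerts.append(f"{metric}={value} outside [{low}, {high}]")
    return alerts

def record_adjustment(description: str) -> None:
    """Every change to a live test gets written down, however small."""
    adjustment_log.append(description)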
A media client was testing a new subscription model with us, and during monitoring we spotted something odd: massive variance in conversion rates across different user segments.
Here's what we saw:
US users converting at 3x the rate of other users
Mobile users dropping off at payment
Price sensitivity varying massively by region
We adjusted the test to explore these patterns without compromising the core experiment.
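Spotting patterns like these rarely needs anything sophisticated - it's the same conversion metric cut by segment. A quick Python sketch, with the event fields assumed for illustration:

from collections import defaultdict

def conversion_by_segment(events, key):
    """Conversion rate per segment, e.g. key="region" or key="device"."""
    visits, conversions = defaultdict(int), defaultdict(int)
    for e in events:
        visits[e[key]] += 1
        if e.get("converted"):
            conversions[e[key]] += 1
    return {segment: conversions[segment] / visits[segment] for segment in visits}

Cutting the data by "region" and "device" is exactly how variance like the above shows up.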
Here's what you need to look out for:
What indicates the test is working as designed?
What would signal something's wrong?
What patterns might reveal new opportunities?
What changes would invalidate your results?
Capture what matters
The key thing to remember about data collection is that you can't go back and get what you didn't capture. But if you collect too much, you'll drown in noise. The goal isn't to collect everything. It's to collect the specific data that could prove you wrong.
Data collection framework:
Start with your hypothesis
Identify evidence that could disprove it
Add context needed to understand results
Create backup collection methods
An e-commerce client we worked with last year was testing a new checkout flow for a subset of users. But instead of just tracking conversion rates, we captured:
Every user action in the flow
Drop-off points and recovery attempts
User frustration signals
Technical performance data
When conversions were lower than expected, we had everything needed to understand why.
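One way to make sure that context exists when you need it is to capture a single, rich record per user action rather than a bare conversion flag. A sketch of the kind of record we mean - the field names are illustrative, not the client's actual schema:

from dataclasses import dataclass
from typing import Optional

@dataclass
class CheckoutEvent:
    """One user action in the flow, with the context needed to explain it later."""
    session_id: str
    step: str                       # e.g. "address", "payment", "confirm"
    action: str                     # e.g. "submit", "back", "abandon"
    duration_ms: int                # how long the user spent on the step
    error: Optional[str] = None     # validation or payment errors, if any
    rage_clicks: int = 0            # crude frustration signal
    page_load_ms: Optional[int] = None  # technical performance context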
This is what you need to do to capture what matters in your own tests:
Be clear on what could disprove your hypothesis
Think about all the context you'd need to help you understand your results
Document all the things you think your stakeholders will ask to see
What this means for you
Good execution is about intentionality. Every choice in your experiment setup, deployment, and monitoring should tie back to answering your core question.
Before your next experiment:
Get stakeholder agreement on paper
Strip your test to its absolute minimum
Plan your monitoring triggers
Define your must-have data points