How my views on AB testing and experimentation have changed over the past 7 years

by Gerda Vogt-Thomas

April 24, 2024


As of this moment in 2024, I’ve been working in Conversion Rate Optimization for about 7 years. I recently started thinking about how much my views on how to do this job in a meaningful way have changed during that period.

I think we all do our best with what we have at the time, and it’s important to keep learning so you can challenge your own thinking and how you do things. When someone declares themselves an “expert”, people almost seem to assume that person has done all the work there is to do and has now reached some ultimate state of truth they can bestow upon others.

It should really be the opposite. Circumstances are always changing and especially in tech you need to keep learning and growing with your business.

So, here are 11 things that I’ve changed my mind about since I started working in CRO and how I think about them now.

Statistical Significance

7 years ago: We shouldn’t call or act on tests that don’t reach 95-99% significance.

Now: We’re not testing pharmaceuticals that determine whether people live or die. The threshold is always up for negotiation, depending on how much risk the business owner is comfortable with. Sometimes 90% or even 80% is enough to implement the change, if the trend stays consistent over the course of the experiment and you’re not overly concerned with false positives.
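To make that concrete, here’s a minimal sketch of what calling a test at a looser threshold might look like, assuming a simple two-proportion z-test via statsmodels; the visitor and conversion counts are made up for illustration.

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical numbers: 10,000 visitors per arm, variant converting slightly better.
conversions = [530, 480]          # variant, control
visitors = [10_000, 10_000]

z_stat, p_value = proportions_ztest(conversions, visitors, alternative="larger")

# 95% is the textbook default, but the cutoff is a business decision about risk.
for alpha, label in [(0.05, "95%"), (0.10, "90%"), (0.20, "80%")]:
    verdict = "ship it" if p_value < alpha else "keep waiting"
    print(f"{label} confidence (alpha={alpha}): p={p_value:.3f} -> {verdict}")
```

With these particular numbers the test misses the 95% bar but clears 90%, which is exactly the kind of result where risk tolerance, not a fixed rule, should decide what happens next.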

I recently had a great conversation with Juliana Jackson, who is a heavy hitter in the mobile app optimization and product experimentation space. As we discussed the differences between that industry and web experimentation, it became clear that this obsession with statistical significance is very much a web issue. Mobile teams have so many other tech considerations for their launches that they simply can’t afford to worry about reaching significance on every test. If you want to learn more about what web can learn from mobile, check out this convo here.

Analytics setups

7 years ago: We have to have a perfect analytics setup before we start a single test and can’t trust anything in the testing tool.

Now: Oh, to be young and full of hope... Sometimes the analytics implementation can take months, and obviously it’s necessary for the sustainability and long-term success of the experimentation program. But it really shouldn’t hold you back from starting to test, getting familiar with the process, and working out the kinks you’ll inevitably run into.

Best practices

7 years ago: Best practices don’t exist and nobody should rely on them.

Now: When you’re new or don’t have a lot of data to work with, best practices (or perhaps we should call them “other practitioners’ advice and experiences”) can be a helpful source of inspiration to get you started and moving in the right direction.

Sliders, carousels, and all other blinking offers on the homepage hero

7 years ago: Carousels, especially on the home page, always hurt performance and nobody should ever have them on the site.

Now: While I still believe that carousels mostly result from an organization’s inability to agree on what the most important offer is, I have now seen plenty of evidence that removing one can sometimes result in a losing test. Testing the removal of elements is perhaps just as important as testing the addition of new ones.

I think this also speaks to a more general point: we can’t dismiss a single experience in all contexts, nor assume it’s amazing in all contexts. An element or user journey that works well for one business can be detrimental to another, even if they’re in the same category. This is the essence of why people run experiments instead of just copying what their competitors are doing.

Your CRO fate is not sealed by your traffic numbers

7 years ago: Low-traffic sites shouldn’t spend time on AB testing and should fully focus on qualitative research instead.

Now: Again, this highly depends on the business case. For example, if you work for a company where one lead is worth tens or even hundreds of thousands of dollars, and you can swing big enough changes in tests to increase conversions from 2 to 20, then it can all still be worth it.
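A quick back-of-the-envelope calculation shows why; the lead value and conversion counts below are hypothetical, not figures from any real program.

```python
# All numbers here are hypothetical, just to illustrate the math.
lead_value = 50_000            # one closed lead worth $50k
baseline_conversions = 2       # leads per period before the change
variant_conversions = 20       # leads per period after a big winning change

incremental_revenue = (variant_conversions - baseline_conversions) * lead_value
print(f"Incremental revenue per period: ${incremental_revenue:,}")
# -> Incremental revenue per period: $900,000
```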

Spending hours on perfect reports

7 years ago: AB test reports need to be as detailed as possible.

Now: Most execs absolutely do not care about p-values, statistical significance, MDEs, and the other numbers CROs love spending time arguing over. They care about what changed on the website, whether it made them more money (or at least moved things in the right direction for the company), and what we’re going to do based on these results.

Overlapping AB tests

7 years ago: We shouldn’t run tests that interfere or overlap with each other in any way.

Now: This doesn’t seem to be as much of an issue as I thought. In fact, most knowledgeable people in the industry now seem to agree that interaction effects are rare, and it’s better to spot them early, since those changes will live together on the website once they’re implemented anyway.
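If you do want to check for an interaction, here’s a minimal sketch, assuming users were bucketed into both tests independently and you can export per-user assignment and conversion data; the data below is simulated.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data standing in for a per-user export from your testing tool.
rng = np.random.default_rng(42)
n = 20_000
df = pd.DataFrame({
    "test_a": rng.integers(0, 2, n),   # 0 = control, 1 = variant of test A
    "test_b": rng.integers(0, 2, n),   # 0 = control, 1 = variant of test B
})
# Two small independent lifts and no true interaction baked in.
p = 0.05 + 0.01 * df["test_a"] + 0.008 * df["test_b"]
df["converted"] = (rng.random(n) < p).astype(int)

# Logistic regression with an interaction term; a significant test_a:test_b
# coefficient would suggest the two changes don't combine cleanly.
model = smf.logit("converted ~ test_a * test_b", data=df).fit(disp=0)
print(model.summary().tables[1])
```

In most runs the interaction coefficient comes out insignificant, which matches the industry consensus that true interaction effects are rare.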

The highest-paid person's opinion actually matters (sometimes??)

7 years ago: HiPPOs are our worst enemy.

Now: You need to make HiPPOs your allies in the experimentation program, because if they’re not, you’re not going to get much meaningful work done.

Not everything is worth the resources it takes

7 years ago: If you have the traffic then you should probably test every change you want to make on the website to make sure it’s not hurting things.

Now: Proper testing takes a tremendous amount of resources: from research to wireframing to tech to design to dev to QA to analysis to implementation. Sometimes it just doesn’t make sense to spend all those resources on a no-brainer change, like fixing the broken links on your About page.

There is no universal perfect time frame for an AB test to be live

7 years ago: All tests should run somewhere between 2 and 4 weeks. We’ll call it how we see it.

Now: Ideally, you would do MDE calculations before launching any test and determine the exact number of weeks it should run to reach the proper sample size. Read more here about how to do pre-test calculations.
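Here’s a minimal pre-test calculation sketch using statsmodels’ power analysis; the baseline rate, MDE, and weekly traffic are hypothetical inputs, so plug in your own numbers.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.05                 # current conversion rate: 5%
relative_mde = 0.10             # smallest lift worth detecting: 10% relative
target = baseline * (1 + relative_mde)

effect_size = proportion_effectsize(target, baseline)
n_per_variant = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,                 # significance level
    power=0.80,                 # chance of detecting the lift if it's real
    ratio=1.0,                  # 50/50 split between control and variant
)

weekly_visitors_per_variant = 5_000   # hypothetical traffic
weeks = n_per_variant / weekly_visitors_per_variant
print(f"~{n_per_variant:,.0f} visitors per variant, ~{weeks:.1f} weeks to run")
```

Divide the required sample size by your actual weekly traffic per variant and you get a runtime grounded in math rather than a one-size-fits-all “2 to 4 weeks”.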

The CRO being a SwissarmyPerson is a facade

7 years ago: A very good CRO person should be able to do it all: qualitative and quantitative research, analytics implementation, QA, wireframing, understanding stats to a T, analyzing the tests, everyday project management, and selling the project.

Now: This is simply an insane expectation to put on one human being, and I haven’t met anyone during these 7 years who is doing all of it on their own.

Conclusion

At Koalatative there are three of us right now, and while our roles heavily overlap at times, we generally each focus on our own areas and try to play to our individual strengths. We also work with other freelancers and agencies through our network to bring in extra expertise, such as development and design resources, ads, and SEO, whenever it’s needed.

This is just a small list of things I’ve changed my mind about, and I’m guessing my views on how to do this job properly will change again and again over the next 7 years. Each of the topics mentioned here could obviously be explored in much more depth in dedicated articles, and perhaps that’s what I’ll do in the future.

But my biggest takeaway is that things are always far more contextual than they might seem at first glance. When you adopt a “do no harm, but also mess around and find out” mentality, you’re going to make mistakes, but that’s also the only way to learn.

In experimentation programs, it’s common for over 50% of tests to be “losers” or at least insignificant. Ideally, the wins will more than make up for it, but this is all part of the process.