Choosing an AB Testing Prioritization Framework
You may hear the word "testing" and feel that pang of anxiety reminiscent of your childhood. But AB testing isn't supposed to be panic-inducing. Quite the opposite, actually: it's enlightening, and it will take your digital marketing strategy to the next level.
AB testing and conversion rate optimization are all about efficiency – benefiting site users by creating meaningful experiences that encourage goal conversions. Choosing the right AB testing prioritization framework for your digital experimentation program can increase conversions from your existing traffic, regardless of the channel or acquisition method.
Successful testing programs use methodical AB testing frameworks to manage the administrative side of an overarching conversion rate optimization strategy. There are a lot of moving pieces. And just as the end goal of testing and conversion optimization is efficient conversions, the efficiency of program operations is a major factor in the long-term success and viability of a digital experimentation program.
Prioritization Process: Which Tests Should You Run?
So, you've ideated endlessly, reached out to your colleagues for input, and now you have a very long list of potential tests. How do you decide which test to run first? Which one will reach the statistical significance necessary to declare a winner?
Well, that's easy: always run the most important test first! This is a big step, and getting it right will impact the success of your testing program. Here's all you need to know.
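How do you know whether a test can even reach significance in a reasonable time? A quick back-of-the-envelope check is the standard two-proportion sample size calculation. The sketch below is a minimal, illustrative version – the 3 percent baseline rate and 15 percent relative lift are made-up inputs, not benchmarks from any real program:

```python
from math import ceil, sqrt
from statistics import NormalDist

def visitors_per_variant(baseline_rate: float, relative_lift: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed per variant for a two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)  # conversion rate if the lift holds
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    n = ((z_alpha * sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
         / (p2 - p1) ** 2)
    return ceil(n)

# A 3% baseline conversion rate and a hoped-for 15% relative lift:
print(visitors_per_variant(0.03, 0.15))  # ~24,200 visitors per variant
```

If your traffic can't deliver that volume within a few weeks, the test may deserve a lower priority no matter how exciting the hypothesis is.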
Why Develop a Prioritization Process?
A formal prioritization process helps team members remain objective and consistent, and identify the tests most closely aligned with organizational goals. It helps us allocate resources efficiently while avoiding wasted time and effort. Adopting a prioritization model also lessens the HiPPO (highest paid person's opinion) effect – it's easier to tell your boss their idea sucks when a numerical ranking system backs you up – and reduces any other biases that might otherwise sway your test prioritization.
Running a strategic program utilizing a structured AB testing framework takes time, and there is a significant amount of effort required to manage even a small number of tests. But the impact can outweigh the effort. Here are a few ways to evaluate your need for a prioritization model:
- If you’re running more than three tests per week, you need a prioritization model.
- If you have a HiPPO problem or strong personalities relying heavily on subjective instincts, you need a prioritization model.
- If running a strategic testing framework and prioritization model takes up less than 15 percent of your total available program time, the benefits outweigh the cost and you should use a prioritization model.
Pros and Cons of Popular AB Testing Frameworks
There are several well-known models for prioritizing projects, including the PIE, ICE, and PXL models. Noble Studios has used them all and has modified and adapted them to our own needs. As with any tool, each popular prioritization framework has its pros and cons.
PIE Framework
The PIE framework was made popular by the team at Widerfunnel and is very simple to use. This framework consists of three ranking factors – potential, importance, and ease – each rated on a 1-10 scale, with the three ratings typically averaged to produce the final PIE score.
Pros: Simple and quick. Good for teams who are just starting with AB testing and digital experimentation with a low test velocity.
Cons: The variables are too subjective. If we truly knew the potential of each split test, we wouldn't need a prioritization process at all. Because the ratings are subjective, each individual will likely value each variable differently. In a data-informed testing program, we strive for objective assessments.
ICE Framework
The ICE framework was created by the team at GrowthHackers. Like PIE, it uses three ranking factors – impact, confidence, and ease – each rated on a 1-10 scale.
Pros: Like PIE, it is simple and quick. The ease of use makes it good for teams that are new to testing or teams that have a small testing program.
Cons: ICE also relies on an extremely subjective set of variables. How could we ever know in advance how a test is going to perform? The truth is, we don't. It's just a guess, and assigning numbers to guesses disguises subjective assessments as objective data points.
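Both PIE and ICE boil down to the same arithmetic: three subjective 1-10 ratings rolled into a single number. Here's a minimal sketch, assuming the common convention of averaging the three ratings (factor names follow PIE; swap in impact and confidence for ICE):

```python
def pie_score(potential: int, importance: int, ease: int) -> float:
    """Average three subjective 1-10 ratings into a PIE score.
    For ICE, substitute impact, confidence, and ease."""
    for rating in (potential, importance, ease):
        if not 1 <= rating <= 10:
            raise ValueError("each rating must be on a 1-10 scale")
    return (potential + importance + ease) / 3

# The subjectivity problem in action: two reviewers, one test idea,
# two different priorities.
print(pie_score(7, 8, 6))  # 7.0
print(pie_score(5, 8, 6))  # 6.33...
```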
PXL Framework
The PXL framework was developed by CXL with "the problems of other prioritization frameworks in mind." At Noble Studios, we're fans of the work CXL did on this model. It assesses 10 aspects of a test using a weighted model and an objective true/false scoring system, and the 10 ranking factors are summed to arrive at the final test score.
Pros: The core benefit of this model is the objective true/false nature of the variables. It eliminates most of the issues caused by the subjective assessments (aka "guessing") that the PIE and ICE models rely on. It's very straightforward – the element either is or is not above the fold. No guessing involved.
Cons: For teams that don't analyze user behavior or run deep analytics assessments, many of the inputs will be irrelevant. Teams may also have to "sell" the model to decision makers. The model is more reliable but less intuitive than PIE/ICE for those who don't live and breathe CRO, so you may need to explain and validate it.
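For contrast, here's a simplified sketch of PXL-style scoring: binary answers, a few questions weighted more heavily, summed into a final score. The factor names and weights below are illustrative stand-ins, not CXL's actual worksheet:

```python
# Illustrative factors and weights -- not CXL's actual PXL worksheet.
WEIGHTS = {
    "above_the_fold": 1,
    "noticeable_within_5_seconds": 2,  # bigger, more visible changes weigh more
    "adds_or_removes_an_element": 2,
    "supported_by_user_testing": 1,
    "supported_by_analytics_data": 1,
    "easy_to_implement": 1,
}

def pxl_style_score(answers: dict[str, bool]) -> int:
    """Sum the weights of every question answered True."""
    return sum(w for factor, w in WEIGHTS.items() if answers.get(factor, False))

print(pxl_style_score({"above_the_fold": True,
                       "noticeable_within_5_seconds": True}))  # 3
```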
The Noble AB Test Prioritization Framework
After using all the prioritization frameworks on the market, we developed our own. The Noble Studios prioritization framework is integrated into our Digital Experimentation Program Management tool, which handles everything from test idea submission to completed test archives. It ties directly into our conversion rate optimization strategy and broader AB test framework.
Our prioritization model was inspired by the PXL model, focusing on objective assessments to help minimize the randomness and inconsistencies in subjective models like PIE and ICE, but is much easier to use and understand.
Key Benefit of the Noble Prioritization Framework
The key benefit of the Noble model is flexibility. Every testing program is different, and changes can occur within the same program as organizational goals change or new leadership comes on board. The Noble model can be adjusted easily for changes in values/objectives, and the entire test backlog will re-prioritize accordingly in real time.
It is founded on binary logic: true/false, yes/no, is/is not. We love that. It's very straightforward. To complement the binary foundation, we allow for an overall inflation or deflation of any input variable. For example, if you think x is far more important than y, you can easily adjust for that while keeping the foundational logic the same – x is either true or false, and y is either true or false. In that way, it's both restrictive (true or false – that's it) and flexible. We've found it very effective at achieving the goal of any prioritization model – identifying the most important tests to run next.
Noble Framework Ranking Factors
- Campaign Alignment: Is the test on a designated campaign page/element identified and prioritized in the strategic site audit? At Noble, we begin every optimization program with an in-depth audit where we identify areas of focus. This variable identifies tests that are aligned with our strategic focus and prioritizes them higher.
- High Potential Test: Does this test present exceptionally high potential to impact the business? We specifically use the term “exceptionally” and fill in a “true” value only in exceptional situations. If you’re saying, “this is sort of subjective,” we recognize that, but at least with a binary system there aren’t 10 degrees of subjectivity.
- Origin of Evidence: Is evidence of an optimization opportunity founded on data? Whether website analytics, user surveys, or heatmaps, we want to know how the “problem” was identified. Tests founded on data rather than intuition are prioritized higher.
- Motivational Alignment: Does the lever directly affect a motivational variable? As part of every program we run, we utilize motivational variables, like “reducing distractions.” If a test is tied to a motivational variable, we prioritize it a little bit higher.
- Above the Fold: Is the test on an element visible above the fold? Just as in the PXL model, we reward tests that are on elements above the fold. Why? There is more potential for impact because more users will see it.
- Direct link to Primary Program Goal: Is the test intended to directly improve a primary goal? Goals are foundational for any strategic testing program and there shouldn't be too many (unless your testing velocity supports it). We inflate the prioritization score for tests that are DIRECTLY linked to a primary goal.
- In-Platform Test with Visual Editor: Can the test be set up and run using only the visual editor, meaning no technical web development support is needed? Tests that can be easily implemented using the visual editor within the testing platform (VWO, Optimizely, etc.) are easier to set up, so we inflate the prioritization score for these tests.
- Test Requirements Estimated: Were the test effort levels merely estimated, or properly scoped by the department/team doing the work? This variable is related to effort. Here, we simply want to know whether the person or team building the test was given time to scope it out, or whether the effort level is just a rough estimate. Usually we have estimates, but when a test has actually been scoped, we increase its prioritization score.
- FastTrack Executive Endorsement: Is there an executive sponsor willing to bypass the standard prioritization model? Sometimes the boss just wants to run a certain test. We could ask why or push back, but does it really matter? Nope. So we built in this variable. If “true,” we inflate the prioritization score for the test.
- Channel Targeting: Is the test channel specific or will it run for all website visitors regardless of channel? More channels mean more exposure, which means more potential. Tests running on all channels are scored higher.
- Device Targeting: Will the test apply to all devices, or will it be limited to desktop or mobile only? Just like channel targeting, device targeting affects traffic and exposure, which impacts potential. Tests running on all devices are prioritized higher.
- Organizational Synergy: Will this test help or inform the efforts of another part of the organization? Sometimes a test can inform decisions for teams beyond the one responsible for site conversions. If that's the case, the test is prioritized a little higher.
- Quick Results Expected: Will this test conclude with statistically significant results quickly (less than three weeks)? Tests that run quickly are our favorites. Get ‘em done and let’s move on to the next test! If a test is expected to be done in less than three weeks, we give it a little boost in the priority score.
- High Effort Test: Is this test considered "high effort," as defined by the team and their budget/capabilities? This is different for everyone. You might think 20 hours of work is a lot, but someone else might think that's nothing. Define "high effort" in a way that makes sense for your program. This variable is a deflator of value. If "true," we lower the prioritization score.
With 14 variables in total, it takes only about 90 seconds per test idea to fill in the appropriate values. Each is either true or false – pretty straightforward. In the past, we spent more time deciding whether "potential" in the PIE model was a "6" or a "7" than we now spend completing all 14 variables in the Noble Studios model – true story.
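To make that concrete, here's a rough sketch of how the 14 true/false inputs might be captured in code. The field names paraphrase the factors above and are our shorthand, not an official schema; everything defaults to false, so those 90 seconds are spent flipping only the factors that apply:

```python
from dataclasses import dataclass

@dataclass
class TestIdea:
    """The 14 true/false inputs of a Noble-style model (names paraphrased)."""
    campaign_alignment: bool = False
    high_potential: bool = False
    evidence_from_data: bool = False
    motivational_alignment: bool = False
    above_the_fold: bool = False
    direct_primary_goal_link: bool = False
    visual_editor_only: bool = False
    requirements_scoped: bool = False
    fasttrack_endorsement: bool = False
    all_channels: bool = False
    all_devices: bool = False
    organizational_synergy: bool = False
    quick_results_expected: bool = False
    high_effort: bool = False  # the lone deflator: True lowers the score

# Ninety seconds of work: flip the factors that apply, leave the rest False.
idea = TestIdea(above_the_fold=True, quick_results_expected=True,
                evidence_from_data=True)
```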
Customizability – Potential Impact and Power
Additionally, to meet the needs of each optimization project we work on at Noble Studios, we integrated a variable weighting scale into the binary system so that important factors have more impact on the overall prioritization score. For example, if a team has very limited support from the web dev team, we can add extra weight to the "high effort test" variable so that tests requiring high effort are deprioritized. We have three tiers within the binary value system:
- Binary: true=1, false=0
- Binary Plus: true=2, false=0
- Super Binary: true=4, false=0
We also define each variable as a Base Builder or a Base Reducer – builders increase the prioritization score and reducers decrease it.
Combined, this variable scale system allows us to easily customize the ranking algorithm and meet the specific needs of each optimization project.
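As a minimal sketch of how the pieces fit together, the snippet below scores a tiny backlog using hypothetical tier assignments – the factor subset and weights are illustrative, chosen only to show how builders, reducers, and the three tiers interact:

```python
# Hypothetical tier assignments: positive weights are Base Builders,
# negative weights are Base Reducers.
WEIGHTS = {
    "high_potential": 4,          # Super Binary
    "fasttrack_endorsement": 4,   # Super Binary
    "campaign_alignment": 2,      # Binary Plus
    "above_the_fold": 2,          # Binary Plus
    "quick_results_expected": 1,  # Binary
    "high_effort": -2,            # Base Reducer at the Binary Plus tier
}

def priority_score(answers: dict[str, bool]) -> int:
    """Sum the weights of every True answer; reducers carry negative weights."""
    return sum(w for factor, w in WEIGHTS.items() if answers.get(factor, False))

backlog = {
    "Shorten the lead form": {"above_the_fold": True,
                              "quick_results_expected": True},
    "Homepage hero rebuild": {"high_potential": True, "high_effort": True},
}
# Change any weight and the whole backlog re-prioritizes on the next sort:
for name in sorted(backlog, key=lambda n: priority_score(backlog[n]),
                   reverse=True):
    print(f"{priority_score(backlog[name]):>3}  {name}")
```

Because the weights live in one place, changing a single tier assignment re-prioritizes every test in the backlog the next time it's sorted.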
As an additional level of control to make it super-easy to modify the algorithm and reprioritize the backlog, we integrated a control panel that allows optimizers to choose the current level of impact for each variable. Changing the variable weight for any specific factor will automatically redistribute its potential impact within the prioritization model.
Visual Representation of Variable Influence
To help understand how the 14 unique ranking variables are working together within the model, we’ve integrated a visual representation of the proportional impact of each factor. It shows impact potential for each variable as a percentage and indicates directional impact as well (score builder or score reducer). This feature is super useful and helps us understand how the prioritization model is working.
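One way to derive that proportional view is to divide each variable's weight by the total possible swing across all variables. Reusing the hypothetical weights from the previous sketch:

```python
# Each variable's maximum share of the total possible score swing.
WEIGHTS = {
    "high_potential": 4, "fasttrack_endorsement": 4, "campaign_alignment": 2,
    "above_the_fold": 2, "quick_results_expected": 1, "high_effort": -2,
}

total_swing = sum(abs(w) for w in WEIGHTS.values())  # 15 points in play
for factor, w in sorted(WEIGHTS.items(), key=lambda kv: -abs(kv[1])):
    direction = "score builder" if w > 0 else "score reducer"
    print(f"{factor:>24}  {abs(w) / total_swing:5.1%}  ({direction})")
```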
This example shows a team that values "FastTrack Executive Endorsement," which typically means a boss wants her voice heard. Beyond that, "above the fold" and "high potential test" are the two factors with the most potential influence on the overall prioritization. "High effort test" can deflate prioritization scores by up to 18 percent, which indicates this team doesn't want tests that require much dev work, even when a test shows a lot of potential. "Device targeting" is weighted low too, so the team doesn't place much value on tests reaching all devices – likely because their website traffic is heavily skewed toward either desktop or mobile.
Here, an example of the Noble Studios prioritization model shows a team that is less concerned with high-effort tests, giving that variable a maximum potential of only 7 percent (as a deflator, or score reducer). This team highly values campaign alignment, which suggests a focus on specific goals likely uncovered through a full site audit. Notice, too, that they give no power to the "FastTrack Executive Endorsement" variable – this team is not concerned with the HiPPO at all.
There is a tremendous amount of flexibility within the Noble model and we’ve found the visual representation is really useful in fine-tuning the prioritization framework to align with the goals of the project’s AB testing strategy.
How to Access the Noble Studios Prioritization Framework
Interested in trying our prioritization product? You’re in luck! Contact us to gain access to the Noble Studios Prioritization Framework model and try it on your CRO testing strategy. You’ll be astounded by the impact it has on your marketing objectives.