MaxDiff Tells You What People Say They Want. Not What Drives the Sale.

MaxDiff is the best ranking tool in survey research, and it still measures the wrong thing for high-stakes decisions. Here's how it works, and where it quietly misleads.

Dr. Frank Buckler, Founder, SUPRA · 7 min read

What MaxDiff actually does

MaxDiff — maximum difference scaling, also known as best-worst scaling — is a method for ranking a list of items by importance or preference. It is the cleanest ranking instrument survey research has produced, and on its own terms it is genuinely excellent.

Here is the mechanic. Instead of asking respondents to rate every item on a scale, you show them small subsets — usually four or five items at a time — and ask two questions: which item in this set is best (most important, most appealing), and which is worst. Each respondent sees many such sets, with the items rotated so every item is shown a balanced number of times. From those repeated forced choices you reconstruct a full importance ranking for the entire list.
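The rotation described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular survey platform builds its designs: the function name `build_sets`, the placeholder feature names, and the choice of 4 exposures per item are all assumptions. It only balances how often each item is shown. Production designs usually also balance which items appear together and in which position.

```python
import random
from collections import Counter

def build_sets(items, set_size, exposures, seed=0):
    """Return a list of sets in which every item appears exactly `exposures` times.

    Hypothetical helper: each pass shuffles the full list and cuts it into
    consecutive chunks, so an item can never appear twice in the same set.
    """
    if len(items) % set_size != 0:
        raise ValueError("this simple sketch needs len(items) divisible by set_size")
    rng = random.Random(seed)
    sets = []
    for _ in range(exposures):
        order = list(items)
        rng.shuffle(order)
        for start in range(0, len(order), set_size):
            sets.append(order[start:start + set_size])
    return sets

# Placeholder names for a 12-item list, shown 4 at a time.
features = [f"Feature {i}" for i in range(1, 13)]
design = build_sets(features, set_size=4, exposures=4)

# Count how often each item is shown across the whole design.
shown = Counter(item for s in design for item in s)
print(len(design))          # number of screens one respondent sees
print(set(shown.values()))  # a single value means exposure is balanced
```

With 12 items, sets of 4, and 4 exposures each, the sketch yields 12 screens, which matches the "roughly a dozen sets" scale of the example that follows.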

A worked example: ranking 12 product features

Say you want to rank 12 features for a SaaS product. A respondent might see this set of four:

Single sign-on  ·  Mobile app  ·  Advanced reporting  ·  24/7 support
Best: Advanced reporting    Worst: Mobile app

They repeat this across roughly a dozen sets, each a different combination. Across the whole sample, the math counts how often each feature is chosen best versus worst and produces a single utility score per feature — a clean rank order from "must-have" to "nobody cares." No respondent ever rated all 12 on a scale; they only made small, decisive comparisons.
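The counting step can be made concrete with the simplest scoring rule: (times chosen best minus times chosen worst) divided by times shown. The sketch below uses three made-up responses to the set above. The data and the function name `best_worst_scores` are illustrative assumptions, and commercial tools often estimate utilities with a choice model such as hierarchical Bayes rather than raw counts.

```python
from collections import Counter

SET_SHOWN = ["Single sign-on", "Mobile app", "Advanced reporting", "24/7 support"]

# Hypothetical responses: (items shown, best pick, worst pick)
responses = [
    (SET_SHOWN, "Advanced reporting", "Mobile app"),
    (SET_SHOWN, "Single sign-on", "Mobile app"),
    (SET_SHOWN, "Advanced reporting", "24/7 support"),
]

def best_worst_scores(responses):
    """Count-based score per item: (best - worst) / times shown, in [-1, 1]."""
    shown, best, worst = Counter(), Counter(), Counter()
    for items, best_pick, worst_pick in responses:
        shown.update(items)
        best[best_pick] += 1
        worst[worst_pick] += 1
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}

scores = best_worst_scores(responses)
ranking = sorted(scores, key=scores.get, reverse=True)
for item in ranking:
    print(f"{item}: {scores[item]:+.2f}")
```

A score near +1 means the item is picked as best almost every time it appears, and a score near -1 means it is picked as worst almost every time.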

Why researchers love it — and they're right to

MaxDiff earns its reputation. Compared with the rating scales it replaces, it fixes three real problems:

If your job is to take a long list and produce a defensible, bias-resistant ranking, MaxDiff beats Likert ratings every time. None of what follows takes that away.

The turn: it measures what people say, not what they do

Here is the part survey vendors rarely put on the slide, because they sell survey software and MaxDiff is part of the package.

MaxDiff measures stated importance inside a survey grid. It does not measure causal impact on real choice or revenue. Those are different things, and the gap between them is exactly the say-do gap — the most expensive blind spot in consumer research.

A respondent can rank "sustainability" as the #1 feature in your MaxDiff and then buy the cheaper, less sustainable competitor an hour later without a flicker of guilt. The ranking is sincere. It is also irrelevant to the purchase. Stated importance is not the same as a revealed driver of behavior. MaxDiff is very good at the first and silent about the second.

The three gaps

When a MaxDiff result misleads a real decision, it is almost always through one of these:

What to do instead — or alongside

The fix is not to abandon MaxDiff. It is to stop treating a stated ranking as if it were a causal one.

The decision-grade question is never "what do customers say matters?" It is "what actually moves choice, conversion, and willingness to pay?" That answer comes from derived, causal importance — modeling real behavioral data to see which drivers genuinely shift the outcome, rather than asking people to nominate their own reasons.

This is the core of SUPRA's approach: pairing Deep Implicit measurement of gut-level response with Causal AI that isolates true drivers from the noise of correlation. When the stakes are high, use MaxDiff to prioritize the list, then validate the top of that list against what behavior actually shows. When the two agree, you proceed with confidence. When they disagree — and on the attributes people most love to over-claim, they often do — the behavioral signal wins.
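One simple way to check whether the two rankings agree is a rank correlation plus a list of the items whose positions differ most. The sketch below is an illustration under stated assumptions, not SUPRA's method: both rankings are invented, the behavioral ranking is assumed to come from a separate analysis of real choice data, and the threshold of two rank positions is arbitrary.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two rankings of the same items (no ties)."""
    n = len(rank_a)
    d_squared = sum((rank_a[item] - rank_b[item]) ** 2 for item in rank_a)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical stated ranking from a MaxDiff study (1 = most important).
stated = {
    "Sustainability": 1,
    "Advanced reporting": 2,
    "Single sign-on": 3,
    "24/7 support": 4,
    "Low price": 5,
}

# Hypothetical ranking of the same items by their effect on real purchases.
behavioral = {
    "Low price": 1,
    "Advanced reporting": 2,
    "Single sign-on": 3,
    "24/7 support": 4,
    "Sustainability": 5,
}

rho = spearman_rho(stated, behavioral)

# Items whose stated and behavioral ranks differ by two or more positions.
disputed = [item for item in stated if abs(stated[item] - behavioral[item]) >= 2]

print(round(rho, 2))
print(disputed)
```

A correlation close to 1 means the stated ranking survives contact with behavior. A low or negative value, as in this invented case, points to the specific items that deserve a closer look before budget is committed.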

From the book
"A ranking of what people say they want is not a strategy. It's a list of hypotheses dressed up as findings. The job is to test which ones actually cause the sale."
Dr. Frank Buckler, Think Causal not Casual

When MaxDiff is fine — and when it isn't

Your Situation | What to Use
Trim a long feature list to a shortlist for further work | MaxDiff: it's built for this
Prioritize messages or claims for creative testing | MaxDiff, then test winners behaviorally
Multi-country importance ranking with messy scale habits | MaxDiff (scale-free is the point)
Decide which feature to build next, with real budget on the line | Causal importance from behavioral data
Identify what truly drives conversion or willingness to pay | Causal AI, not stated importance
Attributes people love to over-claim (sustainability, ethics, health) | Behavioral validation, never MaxDiff alone
Cost of being wrong is high | MaxDiff to scope, Deep Implicit + Causal to decide

The bottom line

MaxDiff is the best ranking tool in the survey toolbox. Use it when you need a fast, robust, bias-resistant ordering of a list — that is real value, and the vendors are right to sell it.

Just don't mistake the ranking for the answer. A list of what people say they want is the starting point of a decision, not the end of one. When the cost of being wrong is real, find out what actually drives the sale.

Frequently asked questions

What is MaxDiff analysis?

MaxDiff (maximum difference scaling, also called best-worst scaling) is a survey method for ranking a list of items by importance or preference. Respondents see small subsets of items repeatedly and pick the best and worst in each set. Aggregating those forced trade-offs produces a clean, discriminating importance ranking with no scale-use bias — far better than rating every item on a 1-to-5 scale.

What is the difference between MaxDiff and conjoint analysis?

MaxDiff ranks standalone items (features, messages, benefits) against each other to find relative importance. Conjoint analysis bundles attributes into realistic product configurations — including price — and measures how each attribute drives the choice of one bundle over another. Use MaxDiff to prioritize a list; use conjoint to model trade-offs and willingness to pay. Both are stated-preference methods and share the same say-do limitation.

How many items can MaxDiff handle?

Comfortably 10 to 30 items, and up to about 40 to 50 with an efficient design and enough respondents. Each respondent only sees a fraction of all possible sets, so the list scales further than a rating grid would — but more items means more screens, more fatigue, and a larger sample to estimate every item reliably.
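The link between list length and screen count is simple arithmetic: screens per respondent is roughly items times exposures per item, divided by items per screen. The sketch below makes that concrete. The function name `screens_needed` and the figure of 3 exposures per item are assumptions chosen for illustration, not a recommendation from this article.

```python
import math

def screens_needed(n_items, set_size, exposures):
    """Screens one respondent must complete so each item is shown `exposures` times."""
    return math.ceil(n_items * exposures / set_size)

# Hypothetical planning scenarios: 5 items per screen, 3 exposures per item.
for n_items in (10, 30, 50):
    print(n_items, "items ->", screens_needed(n_items, set_size=5, exposures=3), "screens")
```

Under these assumptions a 50-item list needs five times as many screens per respondent as a 10-item list, which is why longer lists are usually handled by showing each respondent fewer exposures and recruiting a larger sample.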

Does MaxDiff predict real behavior?

Not directly. MaxDiff measures stated importance inside a survey grid — context-free, with no price, no competitor, and no real stakes. People routinely rank an attribute first and then buy on something else entirely. For directional prioritization it is reliable; for high-stakes decisions, validate the ranking against behavioral data and causal importance, which measures what actually moves choice, conversion, or willingness to pay.

Find out what actually drives the sale

If a survey ranking is steering a decision that really matters, it's worth knowing whether the top of that list survives contact with real behavior. That's what we do.
