Three measurement methods. Different jobs. Pick the right one.
Media Mix Modeling (MMM), Multi Touch Attribution (MTA), and Incrementality Testing answer different questions at different altitudes. Most teams pick what is easiest to access. That choice quietly shapes the budget. Start with the decision you need to defend.
The Problem
Most measurement stacks were built for reporting, not decision making. Reporting answers what happened. Decisions ask what caused it, and what happens if we change the mix.
That gap creates predictable distortions. Branded search looks efficient because it captures demand. Retargeting looks profitable because it closes the last mile. Upper funnel looks weak because it rarely gets the last touch.
MMM, MTA, and incrementality are not competing. They solve different problems. Strategy, optimization, validation. If you use a tactical tool for a strategic decision, you get a bias that repeats every week.
One practical test: If your measurement cannot tell you where diminishing returns start, it is not an allocation tool. It might still be useful. It just has a different job.
The Methods
Media Mix Modeling
Models the relationship between spend and outcomes over time, with controls for seasonality, pricing, promotions, and demand shocks. It explicitly handles carryover effects and diminishing returns, so you get response curves instead of one efficiency number. It is built for allocation decisions.
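A minimal sketch of those two mechanics, assuming geometric adstock for carryover and a Hill curve for saturation. The decay rate, half-saturation point, and spend values below are placeholders, not estimates from any model.

```python
import numpy as np

def geometric_adstock(spend, decay=0.5):
    """Carryover: each day's effective spend includes a decaying share of prior days.
    The decay rate here is a placeholder; real models estimate it per channel."""
    carried = np.zeros_like(spend, dtype=float)
    for t, x in enumerate(spend):
        carried[t] = x + (decay * carried[t - 1] if t > 0 else 0.0)
    return carried

def hill_saturation(effective_spend, half_sat=5000.0, shape=1.2):
    """Diminishing returns: response bends toward a ceiling as spend scales.
    half_sat is the effective spend level that yields half of the maximum response."""
    return effective_spend**shape / (effective_spend**shape + half_sat**shape)

# Toy daily spend series. The shape of the resulting curve, not a single
# efficiency number, is what drives allocation decisions.
spend = np.array([0, 2000, 4000, 8000, 8000, 2000, 0], dtype=float)
response = hill_saturation(geometric_adstock(spend))
print(response.round(3))
```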
Multi Touch Attribution
Assigns conversion credit across trackable touchpoints inside digital journeys. Its output depends on identity, attribution windows, deduping rules, and what events are observable. Use it to compare options inside a platform. Do not use it as cross channel truth.
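To make the dependence on credit rules concrete, here is a toy comparison of last-touch versus linear credit on one hypothetical journey. Channel names are illustrative, and real MTA layers windows, deduping, and identity stitching on top of this.

```python
from collections import Counter

def credit(journey, rule="last_touch"):
    """Split one conversion's credit across an ordered list of touches."""
    if rule == "last_touch":
        return Counter({journey[-1]: 1.0})
    if rule == "linear":
        return Counter({ch: journey.count(ch) / len(journey) for ch in set(journey)})
    raise ValueError(f"unknown rule: {rule}")

# Hypothetical journey: upper funnel first, retargeting last.
journey = ["youtube", "meta_prospecting", "branded_search", "meta_retargeting"]
print(credit(journey, "last_touch"))  # retargeting gets all the credit
print(credit(journey, "linear"))      # every touch gets an equal share
```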
Incrementality Testing
Uses holdouts or geo splits to estimate causal lift. Done well, it gives a lift estimate with uncertainty, not a story. It is narrow by design, which is why it is credible.
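A minimal sketch of what a lift readout can look like, assuming a simple two-group split and a normal-approximation confidence interval. The conversion counts are made up, and a real geo test would also handle spillover, duration, and power up front.

```python
import math

def lift_with_ci(conv_t, n_t, conv_c, n_c, z=1.96):
    """Absolute lift in conversion rate, exposed vs holdout, with an approximate
    95% confidence interval (normal approximation on the difference of proportions)."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    lift = p_t - p_c
    return lift, (lift - z * se, lift + z * se)

# Made-up counts for an exposed group and a held-out group.
lift, ci = lift_with_ci(conv_t=1210, n_t=50_000, conv_c=1020, n_c=50_000)
print(f"lift: {lift:.4%}, 95% CI: ({ci[0]:.4%}, {ci[1]:.4%})")
```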
Useful pairing: MMM estimates the shape of returns across the mix. Tests calibrate the parts you cannot afford to be wrong about.
Head To Head
This is the part most teams skip: what each method outputs, and where it tends to go wrong.
| | MMM | MTA | Incrementality |
|---|---|---|---|
| Primary question | How should we allocate budget across channels? | Which touches are associated with conversions? | Did this change cause lift? |
| Typical output | Contribution ranges, response curves, marginal returns, scenarios. | Credit shares, path reports, platform level performance. | Lift estimate with confidence bounds for a specific intervention. |
| Data required | Spend and outcomes for 18 to 24 months, consistent naming, major promos flagged, key business events noted. | User event tracking, stable conversion definitions, identity stitching, stable settings. | Holdout design, enough volume to detect lift, clean separation to reduce spillover. |
| Coverage | All channels, including halo effects and non marketing controls. | Digital only, and only what is observable. | One lever at a time, by design. |
| Where it lies | When promos are not flagged, when naming is inconsistent, when data cadence is too coarse. | When windows and identity inflate last touch, when privacy limits remove parts of the journey. | When geos bleed, when duration is too short, when holdouts are not truly held out. |
| Causal rigor | Moderate. Observational, strengthened by controls, carryover and saturation structure, and holdout validation. | Low. Assigns credit to observed touches; association, not causation. | High. Experiments designed to estimate causal lift. |
| Best for | Allocation, diminishing returns, planning, and prioritizing what to test next. | Fast iteration inside a platform when tracking is reliable. | Validating high stakes claims before reallocating meaningful budget. |
What We Pull
MMM is not magic. It is data hygiene plus a defensible model plus validation. The fastest way to break it is to treat inputs as a spreadsheet problem.
We pull four categories: spend, outcomes, controls, context. If one is missing, we can still build, but uncertainty increases and the conclusions narrow.
| Category | Source | Typical endpoint or export | Notes |
|---|---|---|---|
| Spend | Meta Ads | GET /act_{ad_account_id}/insights | Daily spend, impressions, clicks, campaigns, ad sets. We align billing totals and timezone. |
| Spend | Google Ads | GAQL via googleAds:searchStream | Spend by campaign and network. Search, YouTube, Performance Max. We normalize naming to channels. |
| Spend | TikTok Ads | /open_api/v1.3/report/integrated/get | Spend and delivery metrics. Useful when Meta and TikTok share credit but not incrementality. |
| Spend | Klaviyo | /campaigns or exports | Send volume, revenue attribution, and timing. We treat it as a channel with lag and saturation. |
| Outcomes | Shopify | /admin/api/{version}/orders.json or GraphQL | Orders, revenue, discounts, refunds. We choose the outcome definition up front. |
| Outcomes | GA4 | runReport via GA4 Data API | Sessions, conversions, revenue. Helpful for split outcomes like new users vs purchases. |
| Outcomes | Stripe | /v1/charges or exports | Subscription revenue and refunds. Useful when the business outcome is not Shopify revenue. |
| Controls | Promos and pricing | Promo calendar, price lists, product feeds | Promo flags are not optional. If you do not mark promotions, marketing gets blamed for them. |
| Controls | Inventory and ops | ERP exports or Shopify inventory endpoints | Stockouts and shipping delays create false negatives. We include them when they matter. |
| Context | Site health | Incidents log or simple outage notes | Downtime, checkout bugs, tracking breaks. We model around known failure weeks. |
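To show what one of these pulls looks like in practice, here is a minimal sketch against the Meta Insights endpoint from the table. It assumes Graph API v19.0 and a long-lived token in the environment; verify field and parameter names against your API version. The other platforms follow the same pattern of daily spend by campaign.

```python
import os
import requests

ACCOUNT_ID = "act_<ad_account_id>"       # placeholder for a real ad account id
TOKEN = os.environ["META_ACCESS_TOKEN"]  # assumed long-lived token

def pull_daily_spend(since, until, api_version="v19.0"):
    """Daily spend, impressions, and clicks per campaign from the Insights endpoint.
    time_increment=1 requests one row per day; check params against your API version."""
    url = f"https://graph.facebook.com/{api_version}/{ACCOUNT_ID}/insights"
    params = {
        "level": "campaign",
        "fields": "campaign_name,spend,impressions,clicks",
        "time_increment": 1,
        "time_range": f'{{"since":"{since}","until":"{until}"}}',
        "access_token": TOKEN,
    }
    rows = []
    while url:
        payload = requests.get(url, params=params, timeout=30).json()
        rows.extend(payload.get("data", []))
        url = payload.get("paging", {}).get("next")  # next page URL already carries the query
        params = {}                                  # so do not re-send the original params
    return rows

# rows = pull_daily_spend("2024-01-01", "2024-12-31")
```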
Format we standardize to: one row per day per channel, plus an outcomes series, plus control flags. That sounds simple. The work is making it true.
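As a sketch of that target shape, with illustrative column and channel names rather than a fixed schema:

```python
import pandas as pd

# One row per day per channel for spend, joined to a daily outcomes series and
# daily control flags. Values here are made up.
spend = pd.DataFrame({
    "date": pd.to_datetime(["2024-06-01", "2024-06-01", "2024-06-02", "2024-06-02"]),
    "channel": ["meta", "google_search", "meta", "google_search"],
    "spend": [4200.0, 3100.0, 3900.0, 3300.0],
})
outcomes = pd.DataFrame({
    "date": pd.to_datetime(["2024-06-01", "2024-06-02"]),
    "revenue": [21_500.0, 19_800.0],
})
controls = pd.DataFrame({
    "date": pd.to_datetime(["2024-06-01", "2024-06-02"]),
    "promo_flag": [1, 0],
    "stockout_flag": [0, 0],
})

# Wide spend matrix plus outcomes and control flags: the model-ready table.
model_table = (
    spend.pivot_table(index="date", columns="channel", values="spend", fill_value=0.0)
    .join(outcomes.set_index("date"))
    .join(controls.set_index("date"))
)
print(model_table)
```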
How We Make It Defensible
Most MMM failures are not modeling failures. They are definition failures. Wrong outcome. Wrong channel mapping. Promos missing. Reporting periods misaligned.
We use a small set of repeatable checks before we trust any output. It is boring. It is also why the model stays usable.
- Spend reconciliation: platform totals match invoices within a tolerance, with timezone aligned (a minimal check is sketched after this list).
- Channel mapping: naming rules are written, versioned, and applied consistently across time.
- Promo hygiene: major discounts, launches, and price changes are flagged as controls.
- Carryover handling: channels with delayed effect are modeled with lag and decay, not assumed instant.
- Saturation handling: returns bend. We model diminishing returns so scale decisions do not rely on linear myths.
- Holdout windows: we test on periods the model did not see. We do not grade on the training set.
- Sensitivity checks: results should not flip from minor parameter nudges. If they do, we say so.
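As an example of the first check, a minimal spend-reconciliation sketch. The 2 percent tolerance and the column names are assumptions; invoices here means a table of invoiced spend by channel and month.

```python
import pandas as pd

def reconcile_spend(platform_daily: pd.DataFrame, invoices: pd.DataFrame, tol: float = 0.02) -> pd.DataFrame:
    """Flag channel-months where platform-reported spend drifts from invoiced spend.
    platform_daily: columns date (datetime), channel, spend.
    invoices: columns channel, month (pandas Period), spend."""
    monthly = (
        platform_daily.assign(month=platform_daily["date"].dt.to_period("M"))
        .groupby(["channel", "month"], as_index=False)["spend"].sum()
        .merge(invoices, on=["channel", "month"], suffixes=("_platform", "_invoice"))
    )
    monthly["gap_pct"] = (
        (monthly["spend_platform"] - monthly["spend_invoice"]).abs() / monthly["spend_invoice"]
    )
    return monthly[monthly["gap_pct"] > tol]
```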
Incrementality is the calibration layer. When a decision is expensive, we propose a test that isolates one lever. MMM then gets updated with a reality check.
Deliverable standard: the output is not a deck. It is assets you own. Data dictionary, mapping rules, model artifacts, and a decision memo tied to scenarios.
A Simple Decision Template
We do not hand you channel ROAS tables and call it strategy. We hand you tradeoffs, with a numeric sketch after the list below.
- If we cut Channel A by 20 percent, expected impact is a range, with assumptions listed.
- If we move that budget to Channel B, expected impact is another range, with saturation risk called out.
- If uncertainty is too wide, we propose one test that reduces it.
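What the first two bullets look like in numbers: a minimal sketch assuming Hill-style response curves have already been fitted per channel. The parameters and spend levels are placeholders, and a real memo reports ranges across parameter draws rather than point estimates.

```python
def hill_response(spend, max_rev, half_sat, shape=1.0):
    """Expected revenue at a given spend level, per a fitted Hill-style response curve.
    max_rev, half_sat, and shape stand in for fitted parameters."""
    return max_rev * spend**shape / (spend**shape + half_sat**shape)

def total_revenue(spend_a, spend_b, params_a, params_b):
    return hill_response(spend_a, **params_a) + hill_response(spend_b, **params_b)

# Placeholder parameters and current monthly spend levels.
a = {"max_rev": 400_000, "half_sat": 60_000}   # Channel A: closer to saturation
b = {"max_rev": 900_000, "half_sat": 250_000}  # Channel B: more headroom
base = total_revenue(80_000, 40_000, a, b)
cut_a = total_revenue(64_000, 40_000, a, b)    # cut Channel A by 20 percent
shift = total_revenue(64_000, 56_000, a, b)    # move that budget to Channel B

print(f"cut A by 20%:      {cut_a - base:+,.0f}")
print(f"shift budget to B: {shift - base:+,.0f}")
# A real memo reports these as ranges across parameter draws, not point estimates.
```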
When To Use What
Match the method to the decision.
Ask two questions first. Is this decision about allocation, optimization, or proof? And which mistake is more expensive: being slow or being wrong?
Triangulate. Do not average.
MMM gives you an allocation view with uncertainty. Attribution gives you a fast directional view inside what is trackable. Tests give you proof for specific claims. When they disagree, treat the gap as a hypothesis, then validate. Decisions, not reports.
Our Take
Start with MMM when you need an allocation system. It forces the work that most teams avoid. Clean channel definitions, clear outcomes, and explicit assumptions about carryover and saturation.
Do not oversell precision. MMM should output ranges and scenarios, not certainty. The useful part is the shape of returns and the tradeoffs, not the third decimal place.
Use tests to buy confidence where it matters. If a single decision moves a meaningful share of spend, validate it. A good test is cheaper than a quarter of confident misallocation.
Keep it operational. The goal is a loop the business can run. Audit the data, update the model, sanity check the outputs, test the riskiest assumptions, then reallocate.