What a customer health score should actually measure (and what most get wrong)

Ask ten Customer Success teams how they calculate health and you will get ten different spreadsheets, each one confidently weighting logins, ticket volume, and a CSM gut-feel field that nobody has updated since the last QBR. The scores look rigorous. They are color-coded. They are also, in most cases, measuring the wrong thing.

A health score has exactly one job: to predict whether an account will renew and expand, early enough that you can still change the outcome. Everything else is decoration. If your score goes red the same week the customer sends the cancellation email, it is not a health score; it is a postmortem.

The three things most scores get wrong

Before we talk about what to measure, it helps to name the failure modes. Almost every broken health score we have seen suffers from at least one of these.

It measures activity, not outcomes. Logins and feature clicks tell you someone showed up. They do not tell you whether the customer got the result they bought the product for. A power user who is frustrated is still churning.
It is a lagging indicator dressed as a leading one. Usage decline shows up after the relationship has already soured. By the time the line bends, the budget conversation is over.
It ignores the qualitative signal entirely. The clearest churn warning is usually a sentence in a Slack thread or a sigh on a call, not a number in a dashboard. Scores built only on product telemetry are deaf to the half of the relationship that lives in language.

The four inputs that actually predict renewal

A score worth trusting blends four categories. None of them is sufficient alone, and the weighting shifts by segment, but together they cover the surface area of a real B2B relationship.

1. Outcome attainment, not usage

Did the customer reach the result they signed up for? For a deployment tool that is shipping to production; for an analytics product it is decisions made, not dashboards built. Outcome attainment is harder to instrument than a login count, which is exactly why it is undervalued. It is also the single best predictor of renewal we track. Tie it to the goals stated during onboarding and revisit them every quarter.

2. Breadth and depth of adoption

One champion using one feature is a single point of failure. Accounts that survive a champion leaving are the ones where usage has spread across teams and into daily workflow. Measure how many distinct users, roles, and departments touch the product, and whether that number is growing or quietly contracting.
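
As a rough illustration of the breadth measurement described above, here is a minimal Python sketch. The event rows, field names, and month-over-month comparison are assumptions made for the example, not a prescribed schema; a real pipeline would read these from your product analytics or warehouse.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event rows: (user_id, role, department, day of activity).
events = [
    ("u1", "engineer", "platform", date(2024, 4, 2)),
    ("u2", "engineer", "platform", date(2024, 4, 9)),
    ("u3", "analyst", "finance", date(2024, 4, 15)),
    ("u1", "engineer", "platform", date(2024, 5, 6)),
    ("u2", "engineer", "platform", date(2024, 5, 20)),
]


def breadth_by_month(rows):
    """Count distinct users, roles, and departments active in each month."""
    seen = defaultdict(
        lambda: {"users": set(), "roles": set(), "departments": set()}
    )
    for user_id, role, department, day in rows:
        month = (day.year, day.month)
        seen[month]["users"].add(user_id)
        seen[month]["roles"].add(role)
        seen[month]["departments"].add(department)
    return {
        month: {name: len(values) for name, values in sets.items()}
        for month, sets in sorted(seen.items())
    }


def breadth_trend(monthly):
    """Change in each breadth dimension between the last two months."""
    months = sorted(monthly)
    if len(months) < 2:
        return {}
    previous, latest = monthly[months[-2]], monthly[months[-1]]
    return {name: latest[name] - previous[name] for name in latest}


monthly = breadth_by_month(events)
trend = breadth_trend(monthly)
```

In this sample the finance analyst stops showing up in May, so every dimension in `trend` is negative: the "quietly contracting" case, even though the two remaining engineers are still active.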

3. Relationship and sentiment

This is the input teams skip because it is hard to quantify, and it is the one that moves first. Tone in support tickets, enthusiasm on calls, response latency to your emails, whether the economic buyer still takes the meeting: these shift weeks before usage does. Read the leading indicators of churn for where to find them.

4. Commercial and operational fit

Late invoices, seat utilization well below the contract, a renewal date with no internal owner, an org change at the customer that wiped out your champion: these are structural risks that have nothing to do with how much the product is loved. They belong in the score because they predict friction at renewal regardless of sentiment.
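
The structural risks listed above lend themselves to simple flags rather than a continuous metric. The sketch below is one hypothetical way to express them in Python; the `CommercialSnapshot` fields and the 60% utilization threshold are assumptions chosen for illustration, and you would tune both to your own contracts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CommercialSnapshot:
    """Hypothetical per-account commercial facts, e.g. from billing and CRM."""
    days_invoice_overdue: int
    seats_contracted: int
    seats_active: int
    renewal_owner: Optional[str]  # internal owner of the renewal, if any
    champion_departed: bool


def structural_risks(s: CommercialSnapshot, min_utilization: float = 0.6) -> List[str]:
    """Return the structural risk flags that apply to this account."""
    flags = []
    if s.days_invoice_overdue > 0:
        flags.append("late_invoice")
    # Guard against division by zero for accounts with no contracted seats.
    if s.seats_contracted > 0 and s.seats_active / s.seats_contracted < min_utilization:
        flags.append("low_seat_utilization")
    if s.renewal_owner is None:
        flags.append("renewal_without_owner")
    if s.champion_departed:
        flags.append("champion_departed")
    return flags


snapshot = CommercialSnapshot(
    days_invoice_overdue=12,
    seats_contracted=100,
    seats_active=41,
    renewal_owner=None,
    champion_departed=False,
)
flags = structural_risks(snapshot)
```

Returning named flags instead of a single number keeps the reason for the risk visible to whoever has to act on it.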

A good health score is not a number you report. It is a number that tells you who to call on Monday, and what to say when you do.

Weight by segment, and let the weights move

A self-serve account and a seven-figure strategic account do not churn for the same reasons, so they should not share a scoring model. Self-serve health leans on adoption depth and outcome attainment. Strategic health leans harder on relationship signal and commercial fit, because a single executive sponsor leaving can erase a year of healthy usage overnight. Static, one-size-fits-all weights are how a score ends up technically accurate and practically useless.
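
To make the segment point concrete, here is a small Python sketch of a weighted blend of the four inputs. The weight values are invented for illustration and are not recommended numbers; the only thing taken from the text is the direction of the lean (self-serve toward adoption and outcomes, strategic toward relationship and commercial fit). Each input is assumed to be pre-normalized to a 0 to 1 scale.

```python
# Hypothetical weights per segment; each row is intended to sum to 1.
SEGMENT_WEIGHTS = {
    "self_serve": {"outcome": 0.35, "adoption": 0.35, "relationship": 0.15, "commercial": 0.15},
    "strategic": {"outcome": 0.20, "adoption": 0.15, "relationship": 0.35, "commercial": 0.30},
}


def health_score(inputs, segment):
    """Blend the four inputs (each 0..1) using the segment's weights.

    Returns the overall score and each input's contribution to it.
    """
    weights = SEGMENT_WEIGHTS[segment]
    total = sum(weights.values())  # normalize in case weights drift from 1
    contributions = {
        name: inputs[name] * weight / total for name, weight in weights.items()
    }
    return sum(contributions.values()), contributions


# Same signals, scored under both models: strong usage, weak relationship.
account = {"outcome": 0.8, "adoption": 0.9, "relationship": 0.3, "commercial": 0.4}
self_serve_score, _ = health_score(account, "self_serve")
strategic_score, strategic_parts = health_score(account, "strategic")
```

With these made-up weights the same account scores about 0.70 as self-serve and about 0.52 as strategic, which is the behavior the paragraph above argues for: healthy usage does not offset a weak relationship in a strategic account.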

How Merrily scores health

Merrily reads all four inputs from the tools you already run: product events from PostHog and your warehouse; sentiment and relationship signal from Slack, Gmail, and meeting notes; and commercial signal from Stripe and your CRM. It assembles them into one score per account and, critically, shows the evidence behind every move, so a CSM can see the two support escalations and the 22% usage drop that pushed an account into the red. The score is only useful if you trust it, and you only trust it when you can see why it changed.

If you are rebuilding your health model, start by asking a blunt question of every input: would knowing this, a month earlier, have let me save an account I lost last year? If the answer is no, it is decoration. Cut it.
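
That blunt question can be asked of historical data. The sketch below is one hypothetical way to do it in Python: for each account lost last year, check whether a given input raised its first warning at least 30 days before the churn date. The record layout, the input names, and the sample dates are all assumptions for the example.

```python
from datetime import date


def lead_time_days(first_warning, churn_date):
    """Days between an input's first warning and the churn; None if it never fired."""
    if first_warning is None:
        return None
    return (churn_date - first_warning).days


def early_warning_rate(churned_accounts, input_name, min_lead_days=30):
    """Fraction of lost accounts where this input warned at least min_lead_days ahead."""
    if not churned_accounts:
        return 0.0
    early = 0
    for account in churned_accounts:
        warned_on = account["first_warning"].get(input_name)
        lead = lead_time_days(warned_on, account["churn_date"])
        if lead is not None and lead >= min_lead_days:
            early += 1
    return early / len(churned_accounts)


# Hypothetical accounts lost last year, with the date each input first went red.
lost = [
    {"churn_date": date(2024, 6, 30),
     "first_warning": {"sentiment": date(2024, 4, 15), "logins": date(2024, 6, 20)}},
    {"churn_date": date(2024, 9, 30),
     "first_warning": {"sentiment": date(2024, 8, 1), "logins": date(2024, 9, 25)}},
    {"churn_date": date(2024, 11, 30),
     "first_warning": {"logins": date(2024, 11, 28)}},
]

sentiment_rate = early_warning_rate(lost, "sentiment")
logins_rate = early_warning_rate(lost, "logins")
```

In this invented sample, sentiment warned early on two of the three lost accounts while logins never warned more than ten days ahead. Note that this check only looks at accounts that churned; an input that also fires constantly on healthy accounts would need a separate false-alarm check before it earns its place in the score.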