Reconciling two vendors when there's no ground truth

A common situation: a team buys the same data from two vendors, the two disagree on a meaningful share of records, and there is no third source to call the tie. The instinct is to pick the better vendor and move on. That instinct is wrong.

No vendor is better everywhere. One is sharper on liquid names and stale on the long tail. The other is faster on updates but fat-fingers the occasional identifier. Choose one globally and you import its specific weaknesses across the whole book. The disagreements do not go away; you have just stopped looking at them.

Classify before you resolve

Start by describing each disagreement instead of resolving it. Three dimensions do most of the work.

Type. A small gap on a stable record is noise; the same gap on a volatile one might be real. Tolerances belong per type, not as one global number.

Magnitude against tolerance. Express the difference as a fraction of an expected band, so a tiny move and a large move do not sit in the same bucket.

Staleness. Half of all disagreements are one vendor not having updated yet. Compare timestamps before values, and a large share resolve themselves.

Route, don't decide

With disagreements classified, most fall into bands you can handle automatically. What is left is the residue: large, fresh, unexplained gaps that matter. That set is usually small enough for a person to clear in minutes, which is the whole point. You have spent automation where a rule is defensible and human attention only where judgment is required.

The deliverable is not a single reconciled number. It is a number plus the reason it was chosen, and a short list of the cases nobody could choose for you.

Classify before you resolve

Route, don't decide

Notes on data done right.