Today, I want to revisit a foundational flaw in how federal audit contractors calculate overpayments – a flaw so severe that in any other federal domain, it would trigger congressional hearings.
We all know the government loves extrapolation: pull a sample of claims, determine an error rate, then multiply it across the universe. Simple math, massive impact. But this entire process depends on one critical assumption: that the underlying claim-level determinations are accurate. And the evidence shows they are not.
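To make that leverage concrete, here is a minimal sketch of the basic arithmetic. Every number in it (the claim universe, the sample size, the dollar amounts) is hypothetical and invented purely for illustration; real contractor methodology layers on stratification and confidence bounds, but the core mechanic is the same multiplication.

```python
# A minimal, hypothetical sketch of extrapolation arithmetic in an
# overpayment audit. All figures are invented for illustration; actual
# audits use stratified designs and confidence bounds, but the core
# step is the same multiplication.

universe_size = 10_000        # total claims paid during the audit period

# Auditor-determined overpayment, in dollars, for each of 30 sampled claims
sample_overpayments = [
    0, 85, 0, 120, 0, 0, 45, 0, 200, 0,
    0, 0, 95, 0, 0, 150, 0, 0, 60, 0,
    0, 110, 0, 0, 0, 75, 0, 0, 0, 40,
]

sample_size = len(sample_overpayments)
mean_overpayment = sum(sample_overpayments) / sample_size

# The sample mean is projected across the entire universe of claims.
extrapolated_demand = mean_overpayment * universe_size

print(f"Dollars actually reviewed and flagged: ${sum(sample_overpayments):,.2f}")
print(f"Mean overpayment per sampled claim:    ${mean_overpayment:,.2f}")
print(f"Extrapolated overpayment demand:       ${extrapolated_demand:,.2f}")
# About $980 in sampled findings becomes a demand of roughly $326,667.
```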
Let’s start with Unified Program Integrity Contractors (UPICs). When providers appeal audit determinations one claim at a time, UPICs lose about 60 percent of the time. In other words, the auditors are wrong more often than they are right. Yet the Centers for Medicare & Medicaid Services (CMS) still allows those error-prone determinations to be multiplied across thousands of claims. In statistics, that’s not measurement; that’s error amplification.
But the problem runs even deeper. Evaluation and management (E&M) coding, arguably the most common audit target, has been shown to have disturbingly low inter-rater reliability. A study published in the Archives of Internal Medicine examined certified coding specialists scoring the same E&M visits. They reached consensus only 59 percent of the time. And only 7 percent agreed on all the test cases.
A companion study found that the problem worsens when physicians do the coding. For established patient visits, doctors agreed with expert coders about half the time. But for new patient encounters? Agreement dropped to just 17 percent. That means that for new patients, physicians and expert auditors agreed on fewer than one in five claims.
So, now we have two layers of uncertainty. Auditors are wrong in roughly 60 percent of appealed determinations. And coding experts can’t agree on the “correct” answer, even within one level, about 40 percent of the time (83 percent of the time for new patients).
Yet somehow, these subjective opinions get treated as objective fact – and then multiplied into multi-million-dollar overpayment demands.
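Using the same hypothetical numbers as the sketch above, here is a back-of-the-envelope illustration of what that multiplication does to claim-level error. The 60 percent figure is the appeal overturn rate cited earlier; everything else is invented for illustration.

```python
# A back-of-the-envelope look at how claim-level error rides along with the
# extrapolation. The 60 percent overturn rate is the appeal statistic cited
# above; the claim counts and dollar amounts are hypothetical.

universe_size = 10_000
sample_size = 30
sampled_overpayment_total = 980.0   # dollars flagged across the 30-claim sample

overturn_rate = 0.60                # share of claim-level findings reversed on appeal

# Naive extrapolation treats every sampled determination as correct.
naive_demand = (sampled_overpayment_total / sample_size) * universe_size

# If only the determinations that would survive appeal were projected,
# the figure shrinks by the same 60 percent.
sustained_demand = naive_demand * (1 - overturn_rate)

print(f"Demand assuming every finding is correct: ${naive_demand:,.2f}")
print(f"Demand if 60 percent are reversed:        ${sustained_demand:,.2f}")
# The error in a single determination is not contained; it is multiplied
# across all 10,000 claims in the universe.
```

And that is only the first layer; the coding disagreement documented above sits beneath every one of those determinations.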
Now, imagine this level of error in any other federal system. If the Internal Revenue Service (IRS) issued audit findings that were reversed 60 percent of the time on appeal, Congress would shut the process down overnight. There’d be hearings, investigations, and probably a moratorium on enforcement. Taxpayers wouldn’t tolerate it.
Or consider the Department of Defense. If a missile system worked only 40 percent of the time – if clearance decisions were wrong more often than right – that would be classified as a national security threat. The Pentagon treats single-digit error rates as unacceptable. Not sixty.
And then there’s aviation. Commercial aviation operates at what engineers call “nine sigma” – far beyond Six Sigma manufacturing standards. If maintenance documentation were wrong 60 percent of the time, planes would be grounded worldwide. If flight systems failed at that rate, we’d have thousands of crashes every single day.
No one in aviation, defense, or tax enforcement would ever accept these levels of inconsistency as the basis for large-scale penalties.
But somehow, in healthcare – the most complex, heterogeneous, and documentation-dependent sector of them all – we not only accept these error rates. We multiply them. This is why extrapolation, as currently practiced, is fundamentally unsound. You cannot stack layers of human disagreement, auditor error, and unstable statistical assumptions, and then claim the output is a reliable measure of improper payment. Before CMS multiplies anything, it needs to demonstrate that it can reliably get one claim right.
The current evidence says they can’t. To quote Winston Churchill, “However beautiful the strategy, you should occasionally look at the results.”
And that’s The World According to Frank.