The Centers for Medicare & Medicaid Services (CMS) operates a surveillance platform called the Fraud Prevention System (FPS) that applies predictive analytics, specifically designed to detect aberrant billing patterns, to every Medicare fee-for-service (FFS) claim nationwide before payment is made. It does not wait for a complaint. It runs continuously, at population scale.1
Keep that in mind as you think about what autonomous coding actually does to your claims data.
Autonomous coding — artificial intelligence (AI) and natural language processing (NLP) systems that analyze clinical documentation and assign procedure codes, modifiers, and diagnoses automatically — is no longer a pilot project at a handful of academic medical centers. It is a production technology moving into mainstream revenue cycle operations. And it is doing so at a moment when the U.S. Department of Justice (DOJ), U.S. Department of Health and Human Services (HHS) Office of Inspector General (OIG), and CMS have each independently begun signaling that algorithmic decision-making in healthcare billing is one of their next enforcement frontiers.
That timing is not coincidental. And it is not comfortable.
What Autonomous Coding Does to Your Risk Profile
To understand why regulators are paying attention, you have to understand what autonomous coding does to the structure of compliance risk — because it is genuinely different from anything traditional compliance programs were designed to handle.
Traditional coding errors are, by nature, distributed. One coder misapplies a modifier. Another selects the wrong level of service. A third misses a complication or comorbidity (CC) or major CC (MCC) pairing. These errors show up as noise in the data — a slightly elevated rate here, an unusual outlier there. They are real problems, but they can be caught through the same retrospective sampling and targeted education that compliance programs have used for decades.
Autonomous coding changes the error architecture entirely. When a machine assigns codes across thousands of encounters using the same underlying logic, a flaw in that logic does not produce noise. It produces a pattern — consistent, systematic, and statistically coherent. It’s the kind of pattern that predictive analytics platforms are specifically built to detect.
When a human coder makes a mistake, it shows up as noise. When an algorithm makes a mistake, it shows up as a pattern. And patterns are exactly what external audit programs are built to find.
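To make the distinction concrete, consider a minimal simulation, with every volume and error rate invented purely for illustration. Scattered human errors largely cancel out in aggregate; one flawed rule, applied uniformly, produces a directional shift that any population-level analytics platform will see:

```python
import random

random.seed(7)
N = 20_000  # encounters in one billing period (illustrative volume)

# Human-style errors: each chart has a ~2% chance of an error, and the error
# direction is random, so mistakes scatter and largely cancel in aggregate.
human_shift = sum(random.choice([-1, 1]) for _ in range(N) if random.random() < 0.02)

# Algorithm-style errors: one flawed documentation-weighting rule fires on
# every chart matching its trigger (assume 8% of charts) and always shifts
# the code in the same direction.
algo_shift = sum(1 for _ in range(N) if random.random() < 0.08)

print(f"Net shift from scattered human errors: {human_shift:+d} of {N} claims")
print(f"Net shift from one systematic rule:    {algo_shift:+d} of {N} claims")
```

The human-style errors net out to a handful of claims in either direction. The systematic rule shifts roughly 1,600 claims the same way, every period, until someone notices.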
The financial and regulatory exposure scales with claim volume. Misapplied modifier logic or incorrect documentation weighting that might represent a manageable overpayment in a manual coding environment can translate into millions of dollars across thousands of claims in an automated one. The False Claims Act (FCA) does not require specific intent to defraud. It imposes liability where an entity acts with actual knowledge, deliberate ignorance, or reckless disregard of the truth or falsity of its claims, and an organization running autonomous coding without independent monitoring of its outputs will have a very difficult time arguing that it falls outside all three of those standards.
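The order of magnitude is easy to sketch. The figures below are illustrative assumptions only (FCA per-claim civil penalties are inflation-adjusted annually, so check current values), but the arithmetic shows why the per-claim penalty, not the overpayment itself, dominates the exposure:

```python
# Back-of-envelope FCA exposure: treble damages plus a per-claim civil penalty.
# All inputs are illustrative assumptions, not current statutory figures.
claims_affected = 12_000         # claims touched by one flawed coding rule
avg_overpayment = 85.0           # average overpayment per claim, in dollars
per_claim_penalty = 14_000.0     # assumed low-end per-claim civil penalty

single_damages = claims_affected * avg_overpayment
exposure = 3 * single_damages + claims_affected * per_claim_penalty

print(f"Single damages:         ${single_damages:,.0f}")   # $1,020,000
print(f"Potential FCA exposure: ${exposure:,.0f}")          # $171,060,000
```

A roughly $1 million billing error becomes nine-figure theoretical exposure once per-claim penalties attach. That is the scaling problem in a single calculation.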
The Regulatory Infrastructure Is Already in Place
Here is the part that I think many healthcare organizations are not yet fully reckoning with: regulators do not need to wait for new rules to be written. The enforcement tools they would use to pursue liability related to autonomous coding already exist and are already operational.
Start with the DOJ. Through its AI and civil rights enforcement posture, including the Civil Rights Division's ongoing coordination across federal agencies on algorithmic accountability, DOJ has made clear that organizations cannot point to vendors as a shield for algorithmic outcomes. The compliance principle is well established: you are responsible for how you use an automated system, regardless of who built it. That is not a new doctrine. Applying it to autonomous coding systems is.2
The emerging enforcement record makes the practical implication clear. Investigations and litigation involving algorithmic utilization management tools — systems used by insurers that allegedly denied medically necessary services, at scale — show that when an algorithm is wrong at volume, the potential exposure scales with the claim count. The same logic applies directly to coding systems that produce systematic billing errors across large claim populations.3
The vendor contract is not a defense. Organizations remain responsible for the outcomes their automated systems produce — regardless of who built the system.
The OIG has been equally direct. Work Plan items and policy statements indicate that entities using automated tools — including algorithms in billing and claims processing — are expected to maintain adequate oversight and governance to prevent improper billing. That expectation is not yet codified as a formal minimum standard, but Work Plan items have a well-established history of becoming the basis for targeted audit activity.4
And then there is CMS, which, as noted at the outset, is already running the Fraud Prevention System against your claims every day.1 FPS uses the same class of pattern-recognition technology that makes autonomous coding attractive in the first place, which means that if an autonomous coding engine introduces unusual coding distributions, those patterns will surface in CMS analytics quickly, consistently, and at scale.
Why Traditional Auditing Doesn’t Solve This
The instinct for many compliance programs is to respond to autonomous coding risk the same way they respond to every other coding risk: increase the sample size, add a few more targeted reviews, beef up coder education. That instinct is understandable, and largely wrong.
Random sampling and retrospective chart review were designed for a world where coding decisions are made by individual humans whose behavior can be observed, corrected, and tracked over time. They are genuinely useful tools in that context. In a context where coding decisions are made algorithmically, across thousands of encounters per day, they have two critical limitations.
First, statistical power: a random sample of 30 or 50 charts cannot detect a subtle pattern operating across tens of thousands of encounters. The Physician Fee Schedule contains thousands of procedure codes and hundreds of modifier combinations. The sample sizes required to reliably detect a 2- or 3-percent anomaly in modifier utilization across a high-volume service line are simply beyond what traditional auditing can deliver.
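To put numbers on that claim, a standard two-proportion power calculation, sketched below with statsmodels (the 12 percent baseline modifier rate is an assumed figure, chosen only for illustration), shows how far a 30- or 50-chart sample falls short:

```python
# How many charts are needed to reliably detect a small shift in modifier
# utilization? Baseline and shifted rates are assumptions for illustration.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

baseline_rate = 0.12   # assumed historical modifier-utilization rate
shifted_rate = 0.15    # the 3-point anomaly we want to be able to detect

effect = proportion_effectsize(shifted_rate, baseline_rate)  # Cohen's h
n = NormalIndPower().solve_power(effect_size=effect, alpha=0.05, power=0.80,
                                 alternative="two-sided")
print(f"Charts needed per comparison group: {n:.0f}")  # on the order of 1,000
```

Detecting a three-point shift with conventional confidence takes roughly a thousand charts per comparison group, twenty to thirty times what most audit plans draw.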
Second, and more fundamentally, traditional auditing evaluates individual coding decisions. Autonomous coding risk is not located in individual decisions — it is in the aggregate behavior of the system that produces those decisions. Catching a wrong code on a specific claim does not tell you whether the algorithm that assigned it is systematically biased toward higher-level code selection across an entire specialty group. Only population-level analysis can answer that question.
External auditors don’t look for flawed charts. They look for flawed patterns. Internal compliance programs that aren’t doing the same thing will consistently be the last to know.
What Effective Governance Actually Looks Like
Given all of this, what should organizations deploying autonomous coding actually be doing, from a compliance governance standpoint? Based on the current regulatory environment and the structural characteristics of algorithmic risk, I think four things will become the baseline expectations:
- Human oversight that is real, not nominal. Algorithms cannot be the final authority on billing decisions — not because the technology is necessarily unreliable, but because the regulatory framework does not recognize algorithmic authority as a defense. There needs to be a documented human review process that is meaningful, not a rubber stamp after the system has already processed the claim.
- Active behavioral monitoring of the coding system itself. This means tracking the statistical distribution of coding outputs over time — evaluation and management (E&M)-level concentration, modifier utilization patterns, Diagnosis-Related Group (DRG) weight trends, and case mix index (CMI) movement — and establishing thresholds that trigger review when the system’s behavior deviates from expected parameters. The monitoring cannot be limited to individual claim accuracy. It has to operate at the same population level as the risk it is designed to detect. (A minimal sketch of what such a monitor might look like appears after this list.)
- Explainability at the claim level. When a payor or government auditor challenges a coding decision, the organization needs to be able to trace that decision back to the clinical documentation that supported it. If the coding system cannot produce that explanation in a form that holds up to scrutiny, the audit defensibility posture is materially weakened, regardless of whether the code was actually correct.
- Independent compliance validation that is structurally separate from the coding system. This is the piece most often missing when I see organizations evaluating their autonomous coding governance posture. The entity responsible for monitoring the algorithm’s behavior cannot be the same entity that operates the algorithm. That independence is not just good practice; it is increasingly what regulators will look for when they ask whether an organization maintained adequate oversight.
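As promised above, here is a minimal sketch of a population-level behavioral monitor. It assumes weekly snapshots of E&M-level claim counts; the counts, the chi-square approach, and the alert threshold are all illustrative choices, not a regulatory standard:

```python
# Population-level drift check: compare this week's E&M level distribution
# against a trailing baseline and flag statistically coherent shifts.
# Claim counts and the alert threshold are illustrative assumptions.
from scipy.stats import chi2_contingency

# Claim counts by E&M level (99211..99215): trailing baseline vs. current week.
baseline = [180, 2210, 7540, 4120, 950]
current  = [140, 1905, 7310, 4630, 1215]  # note the drift toward 99214/99215

chi2, p_value, dof, _ = chi2_contingency([baseline, current])

ALERT_P = 0.001  # conservative threshold; tune to claim volume and risk tolerance
if p_value < ALERT_P:
    print(f"ALERT: E&M distribution shift (chi2={chi2:.1f}, p={p_value:.2e}); "
          "route to independent compliance review before the pattern compounds.")
```

The same structure extends to modifier rates, DRG weights, and CMI movement. The trigger operates on distributions, not on individual claims, which is exactly where the risk lives.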
The Algorithm Cutting the Other Way
There is a conversation that does not get nearly enough airtime in discussions about autonomous coding governance, and I want to name it directly before closing.
While providers are being told — correctly — that they need robust governance and independent monitoring over their autonomous coding systems to avoid regulatory scrutiny, private payors have been running their own autonomous algorithms for years. And those algorithms are not assigning codes; they are denying claims.
The most prominent example is UnitedHealth Group’s nH Predict algorithm, which is the subject of an ongoing federal class-action suit alleging that it denied post-acute care claims at rates far exceeding those that human review would have produced — and that the company knew the algorithm’s error rate, alleged in the complaint to approach 90 percent, was unacceptably high when it deployed the system. The case is active, and the discovery record is instructive.5
Cigna faced parallel scrutiny over a system that allegedly allowed physicians to deny claims in bulk, spending an average of roughly 1.2 seconds on each claim.6 The U.S. Senate Permanent Subcommittee on Investigations launched a formal investigation into algorithmic claim denials by Medicare Advantage (MA) plans, finding — based on more than 280,000 pages of documents obtained directly from the insurers — that denial rates bore no reasonable relationship to the underlying clinical documentation, and that algorithms were being systematically used to override physician judgment on medically necessary care.7
A provider’s autonomous coding system that produces an elevated E&M distribution draws immediate audit attention. Payor-side AI denial cases, by contrast, have largely produced litigation and, where resolved, targeted settlements — rather than the structural reform that would change how the algorithms behave.
The asymmetry here is worth taking note of. The regulatory apparatus being brought to bear on provider-side algorithmic coding — FCA exposure, OIG governance expectations, CMS surveillance — is real and appropriate. But the same logic that makes autonomous coding risky on the provider side applies with equal force on the payor side. An algorithm that systematically denies medically necessary claims is producing improper outcomes at scale, with the same explainability problems and the same potential for error propagation that make autonomous coding a compliance concern.
The difference is not that the risk is structurally different. The difference is that the regulatory pressure has been applied asymmetrically. Providers face audit exposure for over-coding. Payors have faced litigation and investigation for algorithmic underpayment — but the resulting remedies have rarely required the kind of systemic change that would alter the underlying algorithm’s behavior.
I raise this not to let providers off the hook for their own governance obligations — they are real, and this article has laid out why. I raise it because the policy conversation about algorithmic accountability in healthcare billing needs to be honest about where the algorithms are actually operating, and whose interests they are serving. The answer, right now, is not symmetrical.
If regulators are serious about algorithmic accountability in healthcare — and the DOJ, OIG, and CMS signals suggest they are — then the scrutiny needs to run in both directions. A framework that demands governance and transparency from provider-side coding systems while giving payor-side denial systems effective immunity is not an accountability framework. It is a market structure.
The Question Worth Asking Now
Autonomous coding is not going away. The operational pressures driving its adoption — workforce shortages, claim volume growth, reimbursement complexity — are real, and the technology is genuinely capable of addressing them. The goal here is not to argue against automation. The argument is that automation without governance is a liability waiting to be discovered.
CMS is already running population-level predictive analytics against your claims every day. DOJ’s AI enforcement posture makes clear that algorithmic scale amplifies exposure. OIG has put AI governance on its radar. The organizations that will navigate this environment successfully are the ones that ask the right question now, not after the first contractor letter arrives.
The question is not “can we trust the algorithm?” The question is “do we have independent visibility into what the algorithm is actually doing — at scale, over time, across our entire claim population — before someone else does?”
If the honest answer is “no,” that is the compliance gap worth addressing first.
And if you happen to be a payor reading this, the same question applies to your denial engine as well. You just haven’t been asked it yet.
Sources
1. Centers for Medicare & Medicaid Services. Report to Congress: Fraud Prevention System, Second Implementation Year. June 2014.
2. U.S. Department of Justice, Office of Public Affairs. “Five New Federal Agencies Join Justice Department in Pledge to Enforce Civil Rights Laws in Artificial Intelligence.” April 4, 2024. [Archived.] See also: DOJ Civil Rights Division, “Artificial Intelligence and Civil Rights,” archived at justice.gov/archives/crt/ai; and DOJ, “Artificial Intelligence and Criminal Justice: Final Report,” December 3, 2024.
3. Estate of Gene B. Lokken et al. v. UnitedHealth Group, Inc., et al., Case No. 0:23-cv-03514-JRT-SGE, U.S. District Court, District of Minnesota (filed November 14, 2023; ongoing). February 13, 2025, order denying motion to dismiss (breach of contract and good faith claims proceeding). See also: U.S. Department of Labor v. UMR Inc. (UnitedHealth subsidiary), filed June 2023.
4. HHS Office of Inspector General. General Compliance Program Guidance (GCPG). November 2023.
5. Estate of Gene B. Lokken et al. v. UnitedHealth Group, Inc., et al., Case No. 0:23-cv-03514-JRT-SGE (D. Minn. 2023), complaint at ¶¶ 47–63 (alleging a 90% appeal reversal rate for nH Predict-driven denials). See also: Casey Ross and Bob Herman, “UnitedHealth Faces Class Action Lawsuit over Algorithmic Care Denials in Medicare Advantage Plans,” STAT News, November 14, 2023.
6. Patrick Rucker, Maya Miller, and David Armstrong. “How Cigna Saves Millions by Having Its Doctors Reject Claims Without Reading Them.” ProPublica, March 25, 2023.
7. U.S. Senate Permanent Subcommittee on Investigations. “Refusal of Recovery: How Medicare Advantage Insurers Have Denied Patients Access to Post-Acute Care.” Majority Staff Report, October 2024.