A Study of Contractor Consistency in Reviewing Extrapolated Overpayments

CMS levies billions of dollars in overpayments a year against healthcare providers, based on the use of extrapolation audits.

The use of extrapolation in Medicare and private payer audits has been around for quite some time now. And lest you be of the opinion that extrapolation is not appropriate for claims-based audits, there are many, many court cases that have supported its use, both specifically and in general. Arguing that extrapolation should not have been used in a given audit, unless that argument is supported by specific statistical challenges, is mostly a waste of time. 

For background purposes, extrapolation, as it is used in statistics, is a “statistical technique aimed at inferring the unknown from the known. It attempts to predict future data by relying on historical data, such as estimating the size of a population a few years in the future on the basis of the current population size and its rate of growth,” according to a definition created by Eurostat, the statistical office of the European Union. For our purposes, extrapolation is used to estimate what the actual overpayment amount is likely to be for a population of claims, based on auditing a smaller sample of that population. For example, say a Unified Program Integrity Contractor (UPIC) pulls 30 claims from a medical practice from a population of 10,000 claims. The audit finds that 10 of those claims had some type of coding error, resulting in a total overpayment of $500. To extrapolate this to the entire population of claims, one might take the average overpayment, which is the $500 divided by the 30 claims ($16.67 per claim), and multiply this by the total number of claims in the population. In this case, we would multiply the (unrounded) $16.67 per claim by 10,000 for an extrapolated overpayment estimate of $166,667.
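For readers who like to see the arithmetic spelled out, here is a minimal sketch of that calculation in Python. The function name and the $50-per-error split are my own assumptions for illustration; only the 30-claim sample, the $500 total, and the 10,000-claim population come from the example above.

```python
def extrapolate_overpayment(sample_overpayments, population_size):
    """Mean-per-unit point estimate: average sample overpayment times population size."""
    if not sample_overpayments:
        raise ValueError("sample must contain at least one claim")
    mean_overpayment = sum(sample_overpayments) / len(sample_overpayments)
    return mean_overpayment, mean_overpayment * population_size

# Hypothetical audit: 30 sampled claims, 10 of them in error.
# The $50-per-error split is an assumption; only the $500 total is from the text.
sample = [50.0] * 10 + [0.0] * 20
mean_per_claim, estimate = extrapolate_overpayment(sample, population_size=10_000)

print(f"Average overpayment per sampled claim: ${mean_per_claim:,.2f}")  # $16.67
print(f"Extrapolated overpayment estimate:     ${estimate:,.0f}")        # $166,667
```

Note that the 20 claims with no error still count in the denominator, which is why the average is $16.67 and not $50.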

The big question that normally crops up around extrapolation is this: how accurate are the estimates? And the answer is (wait for it …), it depends. It depends on just how well the sample was created, meaning: was the sample size appropriate, were the units pulled properly from the population, was the sample truly random, and was it representative of the population? The last point is particularly important, because if the sample is not representative of the population (in other words, if the sample data does not look like the population data), then it is likely that the extrapolated estimate will be anything but accurate.

To account for this issue, referred to as “sampling error,” statisticians will calculate something called a confidence interval (CI), which is a range around the estimate that is expected to capture the true value at a stated level of confidence. The higher the confidence level, the wider the range. For example, in the hypothetical audit outlined above, maybe the real average for a 90-percent confidence interval is somewhere between $15 and $18, while, for a 95-percent confidence interval, the true average is somewhere between $14 and $19. And if we were to calculate for a 99-percent confidence interval, the range might be somewhere between $12 and $21. So, the wider the range, the more confident I can be that it contains the true average. Some express the confidence interval as a sense of true confidence, like “I am 90 percent confident the real average is somewhere between $15 and $18,” and while this is not necessarily wrong, per se, it does not communicate the real value of the CI. I have found that the best way to define it would be more like “if I were to pull 100 random samples of 30 claims, audit all of them, and calculate a 90-percent interval from each, about 90 of those intervals would contain the true average,” meaning that for some 1 out of 10, the true average would fall outside of the range, either below the lower boundary or above the upper boundary. The main reason that auditors use this technique is to avoid challenges based on sampling error.
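To make the widening concrete, the sketch below computes two-sided intervals at the three confidence levels for a hypothetical 30-claim sample. It assumes a simple normal approximation (a z-value from Python's standard library) and the same made-up $50-per-error sample as before, so the dollar figures it prints will not match the illustrative $15 to $18 range above; the point is only that the interval gets wider as the confidence level goes up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, confidence):
    """Two-sided normal-approximation interval for the mean of `values`."""
    n = len(values)
    center = mean(values)
    standard_error = stdev(values) / sqrt(n)
    # z-value that leaves (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return center - z * standard_error, center + z * standard_error

# Hypothetical sample: 10 claims overpaid by $50 each, 20 claims with no error.
sample = [50.0] * 10 + [0.0] * 20
intervals = {level: mean_confidence_interval(sample, level) for level in (0.90, 0.95, 0.99)}

for level, (low, high) in intervals.items():
    print(f"{level:.0%} interval: ${low:,.2f} to ${high:,.2f}")
```

A small sample like this one would more properly call for a t-value instead of a z-value; I use the normal approximation here only to keep the example free of outside libraries.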

To the crux of the issue, the Centers for Medicare & Medicaid Services (CMS) levies billions of dollars in overpayments a year against healthcare providers, based on the use of extrapolation audits. And while the use of extrapolation is well-established and well-accepted, its use in an audit is not automatic, and depends upon the creation of a statistically valid and representative sample. Thousands of extrapolation audits are completed each year, and for many of these, the targeted provider or organization will appeal the use of extrapolation. In most cases, the appeal is focused on one or more flaws in the methodology used to create the sample and calculate the extrapolated overpayment estimate. For government audits, such as with UPICs, there is a specific appeal process, as outlined in the CMS Medicare Learning Network booklet, titled “Medicare Parts A & B Appeals Process.”

On Aug. 20, 2020, the U.S. Department of Health and Human Services Office of Inspector General (HHS OIG) released a report titled “Medicare Contractors Were Not Consistent in How They Reviewed Extrapolated Overpayments in the Provider Appeals Process.” This report opens with the following statement: “although MACs (Medicare Administrative Contractors) and QICs (Qualified Independent Contractors) generally reviewed appealed extrapolated overpayments in a manner that conforms with existing CMS requirements, CMS did not always provide sufficient guidance and oversight to ensure that these reviews were performed in a consistent manner.” These inconsistencies were associated with $42 million in extrapolated payments from fiscal years 2017 and 2018 that were overturned in favor of the provider. It’s important to note that at this point, we are only talking about appeal determinations at the first and second level, known as redetermination and reconsideration, respectively.

Redetermination is the first level of appeal, and is adjudicated by the MAC. And while the staff who review the appeals at this level are not supposed to have been involved in the initial claim determination, I believe that most would agree that this step is mostly a rubber stamp of approval for the extrapolation results. In fact, of the hundreds of post-audit extrapolation mitigation cases in which I have been the statistical expert, not a single one was ever overturned at redetermination.

The second level of appeal, reconsideration, is handled by a QIC. In theory, the QIC is supposed to independently review the administrative records, including the appeal results of redetermination. Continuing with the prior paragraph, I have to date had only five extrapolation appeals reversed at reconsideration; however, all were due to the fact that the auditor failed to provide the practice with the requisite data, and not due to any specific issues with the statistical methodology. In two of those cases, the QIC notified the auditor that if the auditor were to supply the required information, the QIC would reconsider its decision. And in two other cases, the auditor appealed the decision, and it was reversed again. Only the fifth case held without objection and was adjudicated in favor of the provider.

Maybe this is a good place to note that the entire process for conducting extrapolations in government audits is covered under Chapter 8 of the Medicare Program Integrity Manual (PIM). Altogether, there are only 12 pages within the entire Manual that actually deal with the statistical methodology behind sampling and extrapolation; this is certainly not enough to provide the degree of guidance required to ensure consistency among the different government contractors that perform such audits. And this is what the OIG report is talking about.

Back to the $42 million that was overturned at either redetermination or reconsideration: the OIG report found that this was due to a “type of simulation testing that was performed only by a subset of contractors.” The report goes on to say that “CMS did not intend that the contractors use this procedure, (so) these extrapolations should not have been overturned. Conversely, if CMS intended that contractors use this procedure, it is possible that other extrapolations should have been overturned but were not.” This was quite confusing for me at first, because this “simulation” testing was not well-defined, and also because it seemed to say that if this procedure was appropriate to use, then more contractors should have used it, which would have resulted in more reversals in favor of the provider.   

Interestingly, CMS seems to have written itself an out in Chapter 8, section 8.4.1.1 of the PIM, which states that “[f]ailure by a contractor to follow one or more of the requirements contained herein does not necessarily affect the validity of the statistical sampling that was conducted or the projection of the overpayment.” The use of the term “does not necessarily” leaves wide open the possibility that the failure by a contractor to follow one or more of the requirements may affect the validity of the statistical sample, which in turn would affect the validity of the extrapolated overpayment estimate.

Regarding the simulation testing, the report stated that “one MAC performed this type of simulation testing for all extrapolation reviews, and two MACs recently changed their policies to include simulation testing for sample designs that are not well-supported by the program integrity contractor. In contrast, both QICs and three MACs did not perform simulation testing and had no plans to start using it in the future.” And even though it was referenced some 20 times, with the exception of an example given as Figure 2 on page 10, the report never did describe in any detail the type of simulation testing that went on. From the example, it was evident to me that the MACs and QICs involved were using what is known as a Monte Carlo simulation. In statistics, simulation is used to assess the performance of a method, typically when there is a lack of theoretical background. With simulations, the statistician knows and controls the truth. Simulation is used advantageously in a number of situations, including providing the empirical estimation of sampling distributions. Footnote 10 in the report stated that “reviewers used the specific simulation test referenced here to provide information about whether the lower limit for a given sampling design was likely to achieve the target confidence level.” If you are really interested in learning more about it, there is a great paper called “The design of simulation studies in medical statistics” by Burton et al. (2006).

Its application in these types of audits is to “simulate” the audit many thousands of times to see if the mean audit results fall within the expected confidence interval range, thereby checking the audit results against what is known as the Central Limit Theorem (CLT), which holds that the means of sufficiently large random samples are approximately normally distributed.

Often, the sample sizes used in recoupment-type audits are too small, and this is usually due to a conflict between the sample size calculations and the distributions of the data. For example, in RAT-STATS, the statistical program maintained by the OIG, and a favorite of government auditors, sample size estimates are based on an assumption that the data are normally (or near normally) distributed. A normal distribution is defined by the mean and the standard deviation, and includes a bunch of characteristics that make sample size calculations relatively straightforward. But the truth is, because most auditors use the paid amount as the variable of interest, population data are rarely, if ever, normally distributed. Unfortunately, there is simply not enough room or time to get into the details of distributions, but suffice it to say that, because paid data are bounded on the left with zero (meaning that payments are never less than zero), paid data sets are almost always right-skewed. This means that the distribution tail continues on to the right for a very long distance.  
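A quick way to see this skew is to generate some paid amounts and look at them. The sketch below is purely illustrative: the paid amounts are simulated lognormal draws (my assumption, chosen only because they can never be negative and have a long right tail), not real claims data, and the `skewness` helper is my own.

```python
import random
from statistics import mean, median, pstdev

def skewness(values):
    """Population skewness: the average cubed z-score (zero for a symmetric distribution)."""
    mu, sigma = mean(values), pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)

rng = random.Random(2020)  # fixed seed so the run is repeatable
# Hypothetical paid amounts: never below zero, with a long right tail.
paid_amounts = [rng.lognormvariate(4.0, 1.0) for _ in range(10_000)]

print(f"mean   = ${mean(paid_amounts):,.2f}")
print(f"median = ${median(paid_amounts):,.2f}")
print(f"skew   = {skewness(paid_amounts):.2f}")
```

In a right-skewed data set like this, the mean sits well above the median and the skewness is clearly positive, which is the opposite of what the normal-distribution assumption expects.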

In these types of skewed situations, sample size normally has to be much larger in order to meet the CLT requirements. So, what one can do is simulate the random sample over and over again to see whether the sampling results ever end up reporting a normal distribution – and if not, it means that the results of that sample should not be used for extrapolation. And this seems to be what the OIG was talking about in this report. Basically, they said that some but not all of the appeals entities (MACs and QICs) did this type of simulation testing, and others did not. But for those that did perform the tests, the report stated that $41.5 million of the $42 million involved in the reversals of the extrapolations were due to the use of this simulation testing. The OIG seems to be saying this: if this was an unintended consequence, meaning that there wasn’t any guidance in place authorizing this type of testing, then it should not have been done, and those extrapolations should not have been overturned. But if it should have been done, meaning that there should have been some written guidance to authorize that type of testing, then it means that there are likely many other extrapolations that should have been reversed in favor of the provider. A sticky wicket, at best.
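The report never spells out the procedure, so what follows is only my reading of the test described in footnote 10, sketched under stated assumptions: a simulated lognormal population standing in for per-claim dollar amounts, a one-sided lower limit built with a normal approximation, and a 90-percent target. None of these details is confirmed by the report, and the function name is hypothetical. The idea is to draw the sample over and over from a population whose true mean is known, and count how often the lower limit lands at or below that true mean.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def lower_limit_coverage(population, sample_size, confidence=0.90, trials=1_000, seed=1):
    """Share of simulated audits whose one-sided lower limit is at or below the true mean."""
    rng = random.Random(seed)
    true_mean = mean(population)
    z = NormalDist().inv_cdf(confidence)  # one-sided z-value (assumed method)
    covered = 0
    for _ in range(trials):
        sample = rng.sample(population, sample_size)  # simple random sample, no replacement
        lower = mean(sample) - z * stdev(sample) / sqrt(sample_size)
        if lower <= true_mean:
            covered += 1
    return covered / trials

rng = random.Random(2020)
# Hypothetical right-skewed population of per-claim dollar amounts.
population = [rng.lognormvariate(4.0, 1.0) for _ in range(5_000)]

for n in (30, 300):
    rate = lower_limit_coverage(population, n)
    print(f"n = {n:>3}: lower limit at or below the true mean in {rate:.1%} of simulated audits")
```

If the achieved rate sits well away from the 90-percent target for a given sample size, that is the kind of result a reviewer running this sort of test might flag. Which way it misses, and by how much, depends on the shape of the population, which is exactly why the test has to be run against data that resemble the audit at hand.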

Under the heading “Opportunity To Improve Contractor Understanding of Policy Updates,” the report also stated that “the MACs and QICs have interpreted these requirements differently. The MAC that previously used simulation testing to identify the coverage of the lower limit stated that it planned to continue to use that approach. Two MACs that previously did not perform simulation testing indicated that they would start using such testing if they had concerns about a program integrity contractor’s sample design. Two other MACs, which did not use simulation testing, did not plan to change their review procedures.” One QIC indicated that it would defer to the administrative QIC (AdQIC, the central manager for all Medicare fee-for-service claim case files appealed to the QIC) regarding any changes. But the report ended this paragraph by stating that “AdQIC did not plan to change the QIC Manual in response to the updated PIM.”

With respect to this issue and this issue alone, the OIG submitted two specific recommendations, as follows:

  • Provide additional guidance to MACs and QICs to ensure reasonable consistency in procedures used to review extrapolated overpayments during the first two levels of the Medicare Parts A and B appeals process; and
  • Take steps to identify and resolve discrepancies in the procedures that MACs and QICs use to review extrapolations during the appeals process.

In the end, I am not encouraged that we will see any degree of consistency between and within the QIC and MAC appeals in the near future.

Basically, it would appear that the OIG, while having some oversight in the area of recommendations, doesn’t really have any teeth when it comes to enforcing change. I expect that while some reviewers may respond appropriately to the use of simulation testing, most will not, if it means a reversal of the extrapolated findings. In these cases, it is incumbent upon the provider to ensure that these issues are brought up during the Administrative Law Judge (ALJ) appeal.

Programming Note: Listen to Frank Cohen report this story live during the next edition of Monitor Mondays, 10 a.m. Eastern.

Frank Cohen, MPA

Frank D. Cohen is Senior Director of Analytics and Business Intelligence at VMG Health, LLC, and is Chief Statistician for Advanced Healthcare Analytics. He has served as a testifying expert witness in more than 300 healthcare compliance litigation matters spanning nearly five decades in computational statistics and predictive analytics.
