The Myth About Pass/Fail Rates for Providers

Projecting results of an audit to a larger population of providers can be a serious step.

Many of the organizations with which I engage feel a pressing need to use the results of individual provider audits to infer an overall pass/fail rate, either for everything that provider does or for the organization as a whole.

I get the reason for this; leadership wants a general idea of what their compliance risk and exposure looks like, and they often want to know which providers are contributing most to that risk. The problem with inferring those audit results to a larger population has to do with both the sampling methods and the sample sizes for those audits. And much of that is driven by the model used to construct the audit.

For example, some organizations are still caught in the random probe audit cycle, something that should have died years ago. The problem with a probe audit is that, unless it is targeted to a specific code, it misses the overwhelming majority of risk events. Take an internal medicine provider. If you look at just the Medicare submission database, IM docs billed 140 unique procedures/services (based on the code), of which 92 make up the top 80 percent.

So let’s take a look at how this might break down. Suppose an organization using a legacy model, such as a probe audit, has a mandate to audit 10 encounters per provider, per year. If it’s not a focused audit (in essence, a general random probe audit), the maximum number of codes you would be able to review would be 10, or about 7 percent of all the unique codes the provider reported. That means you just ignored 93 percent of the compliance risk events. And by the way, the chance of getting 10 unique codes in a random pull is something like one in 1.2 x 10^20 (1.2 followed by 20 zeros). You probably have a better chance of getting eaten by a shark while being struck by lightning than getting 10 unique codes. And even if you did, you would end up with only one encounter per code to audit, and who believes that it is OK to extrapolate from one encounter?
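To see how little of the code universe a 10-encounter pull actually touches, here is a quick simulation sketch in Python. The 140 unique codes come from the example above; the Zipf-like billing mix, the random seed, and the trial count are hypothetical stand-ins for a real provider’s claims data, so treat the output as illustrative only.

```python
# A minimal simulation sketch: how much of a provider's code universe does a
# random 10-encounter probe actually touch? The 140 unique codes come from the
# article; the Zipf-like billing mix below is a hypothetical assumption.
import random

random.seed(42)

n_codes = 140          # unique codes billed by the IM provider (per the article)
pull_size = 10         # encounters pulled in the probe audit
weights = [1 / (rank + 1) for rank in range(n_codes)]   # hypothetical frequency mix

trials = 10_000
distinct_per_pull = []
for _ in range(trials):
    pull = random.choices(range(n_codes), weights=weights, k=pull_size)
    distinct_per_pull.append(len(set(pull)))

avg_distinct = sum(distinct_per_pull) / trials
print(f"Average distinct codes touched per 10-encounter pull: {avg_distinct:.1f}")
print(f"Average share of the provider's {n_codes} codes reviewed: {avg_distinct / n_codes:.1%}")
```

However you set the mix, the pull can never cover more than 10 of the 140 codes, so more than 90 percent of the code universe goes unexamined.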

How about “nobody”?

Maybe it’s a focused audit, so you audit 10 encounters for that provider for a specific code, say, 99233. The results of the audit reveal that 3 of those 10 encounters failed the audit for one reason or another. It’s easy to say that this provider’s error rate is 30 percent, and if you were talking only about the number of sampled encounters that failed, you would be correct, because we are not inferring the results (yet), but rather describing them. Where we go wrong is when we infer that 30 percent of all of the provider’s 99233 codes are coded wrong. Why can’t we do that? It comes down to sampling error, which occurs with every single sample, as defined by the standards of statistical practice. While three out of 10 is 30 percent from a descriptive perspective, three out of 10 is actually somewhere between 6.7 percent and 65.2 percent when inferring the results to a larger population. Many would know this process as extrapolation, and this is an inherent problem with how extrapolations are conducted.
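If you want to reproduce that range yourself, here is a minimal sketch using an exact (Clopper-Pearson) binomial confidence interval; the statsmodels library is just one convenient option, and any decent statistics package will give you the same numbers.

```python
# A minimal sketch of the interval behind "3 failures out of 10": an exact
# (Clopper-Pearson) 95 percent confidence interval for a sample proportion.
from statsmodels.stats.proportion import proportion_confint

failed, audited = 3, 10
low, high = proportion_confint(failed, audited, alpha=0.05, method="beta")
print(f"Point estimate: {failed / audited:.1%}")            # 30.0%
print(f"95% confidence interval: {low:.1%} to {high:.1%}")  # roughly 6.7% to 65.2%
```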

So, how many encounters would I have to audit in order to be able to extrapolate the results to the population of codes, or claims, or beneficiaries, or whatever unit is being audited? That depends on your tolerance for error. I am a fan of the 95 percent confidence interval. This is a way of measuring the relative error in our point estimate (30 percent is the point estimate, and the range around it reflects the margin of error), and it is a commonly accepted metric. Here’s what it means: if I were to pull 100 samples of 10 of the 99233 codes and build an interval like this from each one, in about 95 of those 100 (or 95 percent), the interval would contain the actual error rate. For our single sample, that interval runs from 6.7 to 65.2 percent. That’s factual, but is it useful? That’s for you to decide, but in my world, that large of a range (or sampling error) is pretty much useless. So, back to the question of how many encounters are needed. Well, let’s say we pulled 100 encounters, and 30 of those were in error. In describing the error rate for that sample, we can still say that the point estimate is 30 percent, and we would be right on the mark. Inferentially, however, the 95 percent confidence interval is now 21.2 percent to 39.9 percent, and if you are OK with plus or minus roughly 10 percentage points, then you have your number. If not, then you will have to audit more. How many more depends on a few assumptions, and you can use a sample size calculator (many can be found online) to determine the sample size you need for your purposes.
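For a rough sense of the arithmetic behind those calculators, here is a sketch based on the standard normal-approximation formula n = z² · p(1 − p) / E². Real calculators may add refinements, such as a finite-population correction, and the expected error rate and margin below are assumptions you would supply yourself.

```python
# A rough sample-size sketch using the normal-approximation formula
# n = z^2 * p * (1 - p) / E^2, where p is the expected error rate and E is the
# desired margin of error. The inputs here are illustrative assumptions.
import math
from statsmodels.stats.proportion import proportion_confint

def sample_size(expected_rate: float, margin: float, z: float = 1.96) -> int:
    """Encounters needed to estimate a proportion within +/- margin at ~95 percent confidence."""
    return math.ceil(z**2 * expected_rate * (1 - expected_rate) / margin**2)

print(sample_size(expected_rate=0.30, margin=0.10))   # ~81 encounters for +/- 10 points
print(sample_size(expected_rate=0.30, margin=0.05))   # ~323 encounters for +/- 5 points

# The 100-encounter example from above: 30 errors out of 100
low, high = proportion_confint(30, 100, alpha=0.05, method="beta")
print(f"95% confidence interval for 30/100: {low:.1%} to {high:.1%}")   # about 21% to 40%
```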

Let’s say that you audited 10 of the 99233 codes for each of 100 providers. Now you have a sample size of 1,000. Let’s say again that 300 of them were found to have been coded in error. Now your range is from 27.1 percent to 32.9 percent, and I wouldn’t have any problem saying that there is a general error rate of around 30 percent among all providers for the 99233 code, but I would not be comfortable predicting that for any individual provider, since each provider’s sample size is simply too small. Heck, you could also analyze the averages of the results from those 100 provider-level samples, and having satisfied the Central Limit Theorem, a foundational principle behind inferential statistics, you could be quite accurate in your projection. You would also likely have a normal distribution, which can be used for a lot more fun calculations. But more on that in another article.
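Here is a small simulation sketch of why the pooled estimate is trustworthy while the per-provider estimates are not; the 30 percent true error rate and the random seed are assumptions for illustration only.

```python
# A minimal simulation sketch of the pooled-audit point: 100 providers with
# 10 audited encounters each, assuming a true error rate of 30 percent.
import random
import statistics
from statsmodels.stats.proportion import proportion_confint

random.seed(7)
true_rate, n_providers, per_provider = 0.30, 100, 10

provider_rates = []
total_errors = 0
for _ in range(n_providers):
    errors = sum(random.random() < true_rate for _ in range(per_provider))
    total_errors += errors
    provider_rates.append(errors / per_provider)

low, high = proportion_confint(total_errors, n_providers * per_provider,
                               alpha=0.05, method="beta")
print(f"Pooled estimate: {total_errors / (n_providers * per_provider):.1%} "
      f"(95% CI {low:.1%} to {high:.1%})")                 # a tight range near 30%
print(f"Individual provider rates span {min(provider_rates):.0%} "
      f"to {max(provider_rates):.0%}")                     # wildly variable at n = 10
print(f"Mean of the 100 provider averages: {statistics.mean(provider_rates):.1%}")
```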

I know that this can get a bit tedious, but projecting the results of an audit to a larger population of codes or providers can be a pretty serious step. For one, you may infer that a given provider has a bigger problem than they really do. Or you may push an organization into a more expensive (yet statistically valid) review of some codes or providers when it wasn’t necessary in the first place.

I am all for using statistics to estimate error rates, but I am not for extrapolating those error rates when doing so is not justified. The risks almost always outweigh the benefits. One of my favorite quotes is from George Box, a famous statistician, who said that “all models are wrong, but some are useful.”

And that’s the world according to Frank.

Program Note:
Listen to Frank Cohen report this story live during Monitor Monday, May 20, 10-10:30 a.m. ET.


Frank Cohen, MPA

Frank Cohen is Senior Director of Analytics and Business Intelligence for VMG Health, LLC. He is a computational statistician with a focus on building risk-based audit models using predictive analytics and machine learning algorithms. He has participated in numerous studies and authored several books, including his latest, titled “Don’t Do Something, Just Stand There: A Primer for Evidence-based Practice.”
