Why AI Tools Used in the Revenue Cycle Require Human Oversight

I had planned a different topic for this week (sequencing sepsis), based on feedback from my last article, but an article about OpenEvidence rolling out a medical coding feature caught my eye. OpenEvidence is “an AI (artificial intelligence)-powered medical search engine” used daily by “more than 40 percent of physicians in the U.S.” Its robust adoption may be fueled by the fact that its core product, the “AI-powered medical search engine,” is free for verified clinicians.

Those using free AI tools at healthcare facilities need to be careful about exposing protected health information (PHI), which is regulated under the Health Insurance Portability and Accountability Act (HIPAA). More healthcare organizations are incorporating AI tools into their operations, but 71 percent of healthcare workers still use personal AI accounts for work purposes. This behavior can expose PHI, even if the PHI is only visible in a separate open tab and not used within the AI tool itself. A recent HIPAA Journal article stated that “if genAI (generative AI) tools are not HIPAA-compliant and the developers will not sign business associate agreements, using those tools with PHI violates HIPAA and puts organizations at risk of regulatory penalties.”

The OpenEvidence coding feature, called “Coding Intelligence,” is available in OpenEvidence Visits, a clinical documentation tool that automatically generates medical notes from patient conversations, i.e., an ambient scribe. According to a Fierce Healthcare article, Coding Intelligence provides Current Procedural Terminology (CPT) code suggestions, including evaluation and management (E&M)-level recommendations, and can add supporting medical decision-making rationale directly into the provider’s notes, as well as corresponding ICD-10 diagnoses at the end of every doctor visit.

OpenEvidence positions its Coding Intelligence tool as a modern solution in an industry with “old-school” coding solutions, whereby most coding is still done by physicians or their staff. The company also contrasts itself with other AI-powered solutions that “take a simplistic view of the process and enable simple mappings.” Most clinical documentation integrity (CDI) and coding professionals have experience with these types of AI tools, which include natural language processing (NLP) capabilities that search health records for key terms and suggest corresponding ICD-10-CM and PCS codes. In fact, there is a plethora of articles reinforcing the growth of AI tools within the hospital revenue cycle, as health systems embrace AI tools to “address both workforce constraints and financial pressures.” Another recent Fierce Healthcare article found a 36-percent adoption rate for AI coding solutions, with an expected 29-percent year-over-year (YOY) growth.

Coding Intelligence “reasons over the entire transcript of the visit and the final clinical note generated to comprehensively understand exactly what was done, what diagnoses were reached and what treatments and labs were ordered.” The company says it uses the same technology to find the “complete set of codes appropriate for the given visit.”

It is a little unclear to me if the “complete set of codes” includes the corresponding ICD-10-CM diagnosis codes, which this tool is purported to be able to identify, or if it only provides the appropriate CPT codes. Risk-adjustment coding is a newer coding discipline created in response to the growing importance of diagnosis codes reported in professional claims. Medicare Advantage (MA) plans can collect diagnoses from both Medicare Part A and Medicare Part B claims to contribute to the enrollee’s annual risk profile, as determined by the Centers for Medicare & Medicaid Services (CMS) Hierarchical Condition Categories (HCCs). Payments to the MA plan from Medicare are based on these risk profiles.
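
To make the risk-adjustment mechanics concrete, here is a minimal sketch of how diagnoses roll up into an enrollee's risk score under a CMS-HCC-style model. The coefficient values and the demographic factor below are hypothetical placeholders, not actual CMS figures, and real models add interactions and hierarchies this sketch omits.

```python
# Minimal sketch of CMS-HCC style risk scoring (illustrative only).
# All coefficient values below are hypothetical, NOT actual CMS figures.
HCC_COEFFICIENTS = {
    "HCC18": 0.302,  # e.g., diabetes with chronic complications (hypothetical weight)
    "HCC85": 0.331,  # e.g., congestive heart failure (hypothetical weight)
}
DEMOGRAPHIC_FACTOR = 0.4  # hypothetical age/sex base factor

def risk_score(reported_hccs):
    """Sum the demographic factor plus one coefficient per distinct HCC.

    Diagnoses from both Part A and Part B claims map into HCCs, but a
    condition contributes only once per year, no matter how often it is
    coded -- hence the set() below.
    """
    return DEMOGRAPHIC_FACTOR + sum(HCC_COEFFICIENTS[h] for h in set(reported_hccs))

# Duplicate HCC18 counts once: 0.4 + 0.302 + 0.331
score = risk_score(["HCC18", "HCC85", "HCC18"])
```

The point of the sketch is simply that each captured (or missed) diagnosis shifts the enrollee's risk profile, which in turn shifts the payment from Medicare to the MA plan.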

Even assuming accurate ICD-10-CM suggestions, the implications differ dramatically between professional and inpatient billing, because diagnoses function very differently across those payment systems. This may be why the healthcare industry is seeing success using AI tools in the outpatient setting, where coding requirements are more straightforward than in the inpatient setting. Outpatient billing is not dependent upon the reported ICD-10-CM diagnosis: diagnoses do not directly determine professional reimbursement, even though they increasingly affect risk-adjusted plan revenue.

In contrast, diagnoses are the basis of inpatient hospital reimbursement under MS-DRG reimbursement methodology. MS-DRG accuracy is dependent upon what diagnoses (and procedures) are assigned and how they are sequenced. Determining whether a condition is reportable, if there is conflicting data that must be clarified, what ICD-10-CM code best reflects the condition, and how to sequence all reportable diagnoses is a complicated task. It requires human oversight and human validation.

Often, humans do not agree on all of the components necessary to assign an MS-DRG to a patient’s stay. Consider the diagnosis of sepsis:

  • There is no single universally accepted definition, leading to clinical validation disputes;
  • Present-on-admission (POA) determination is required to sequence sepsis as the principal diagnosis;
  • POA confirmation may occur “after study,” when associated symptoms are present on admission; and
  • Sequencing is impacted by complication codes (e.g., a CAUTI, or catheter-associated urinary tract infection).
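
The decision points above can be sketched as branching logic. This is an illustration of why the task resists simple automation, not an implementation of the ICD-10-CM Official Guidelines; the inputs and return strings are simplified assumptions.

```python
# Illustrative sketch of sepsis sequencing decision points (NOT a complete
# implementation of ICD-10-CM Official Guidelines; inputs are simplified).
def sepsis_sequencing(poa_status, meets_clinical_definition, due_to_device_complication):
    """Return a coarse sequencing disposition for a sepsis diagnosis.

    poa_status: "Y" (present on admission), "after study" (confirmed after
    study because associated symptoms were POA), or "N".
    """
    if not meets_clinical_definition:
        # No single universally accepted definition -> clinical validation dispute
        return "query: clinical validation needed"
    if due_to_device_complication:
        # Complication codes (e.g., CAUTI) alter sequencing
        return "sequence complication code first"
    if poa_status in ("Y", "after study"):
        return "sepsis may be principal diagnosis"
    return "sepsis is a secondary diagnosis (not POA)"
```

Even this toy version shows the problem: every branch depends on a judgment (Does the documentation clinically support sepsis? Was it truly POA?) that the record rarely answers unambiguously, which is exactly where human review enters.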

Accurately reporting sepsis requires critical thinking. It necessitates more than knowledge of coding rules and clinical definitions; it requires the ability to synthesize information and prioritize coding guidance (capabilities current GenAI tools do not possess). There is an art to what we do as CDI and coding professionals. It is not as simple as following algorithms.

Most AI tools require training data sets so large that it is not feasible to validate the accuracy of the data. Did the vendor use claims data that was submitted for payment, claims that were adjudicated, or a combination of both? The integrity of claims varies between billed claims and paid claims. If the training set is based upon paid claims, did it only include claims that were paid in full, without appeal? If the data set is flawed, the outcomes could be flawed as well.

I think this is an interesting finding from Fierce Healthcare: denial prediction tools have an adoption rate of 25 percent, but only 4-percent YOY growth. My guess is that the complexity of this type of work demonstrates the need for human intervention, which may consist of documentation expertise (CDI professionals), medical expertise (physician advisors), and coding expertise. Yet denial management is where there is the most growth opportunity for hospitals, as well as the most risk to earned revenue.

According to that same Fierce Healthcare article, one of the biggest growth areas is AI-prepopulated appeals: technical and clinical appeals have adoption rates of 21 and 19 percent, respectively, but expected YOY growth of 50 and 27 percent, respectively. Autonomous AI tools are compelling within the healthcare industry. Many AI tools are purchased with the hope of reducing staffing costs, but I am not sure that has been realized at many (if any) organizations. Why? The science behind current AI capabilities, including those of generative AI.

That leads me to an illuminating article recently published in Rutgers Business Review titled “Unstable Intelligence: GenAI Struggles with Accuracy and Consistency.” The research is based on generative AI tools that involve deep-learning models. The research is prefaced with a discussion of the flaws associated with genAI, the newest iteration of AI tools. These flaws include “hallucinations, source confusion, lack of transparency, (and) potentially limited up-to-date knowledge.” This last point is significant in terms of accurate billing. It means that as new codes are introduced, the tool will not initially have sufficient data to accurately apply them, which can be problematic for healthcare businesses. The authors make the immensely powerful statement that while AI tools can significantly boost productivity, they also appear to make us “less critical, more confident, and arguably, less discerning.” We often fail to verify the information offered by AI, which is likely to contain misinformation. The authors of this research “recommend caution and due diligence before placing confidence in, or taking action based on, its outputs.”

The research discussed in this article was an examination of AI accuracy and consistency. The study extracted 719 hypothesis statements from 127 open-access research articles published in nine “premier” healthcare journals. They compared genAI outputs between 2024 and 2025, using the same sample to evaluate how much genAI improved over time, as newer versions became available. Their conclusion?

“The findings reveal a clear paradox: while generative AI’s linguistic data processing capabilities have advanced rapidly, its reasoning abilities have improved only marginally. The modest three-percentage-point gain in accuracy between 2024 and 2025 signals quantifiable progress but not conceptual growth. AI has become more articulate, but not necessarily more intelligent. Thus, its fluency should not be mistaken for understanding.”

Think of this like someone learning English as a second language: they may know a particular word or phrase, but not use it in the correct context. This is particularly true for slang terms or abbreviations, which are often used in healthcare. For example, “ARF” could be acute respiratory failure or acute renal failure. Hospitals are expected to use a reasonable approach to standardized terminology, definitions, abbreviations, acronyms, and symbols. They are no longer required to have a list of approved abbreviations, but “must be able to provide evidence of their approach to standardized entries.”
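
The “ARF” example can be made concrete with a toy disambiguator. The context cues below are simplified assumptions for illustration, not a validated NLP model; the point is that a word-pattern approach must lean on surrounding clinical context it may not have, and should fall back to a provider query when that context is absent.

```python
# Toy illustration of the "ARF" ambiguity: the correct expansion depends on
# clinical context, not the abbreviation itself. Cue lists are simplified
# assumptions for demonstration, NOT a validated clinical NLP model.
RENAL_CUES = ("creatinine", "dialysis", "oliguria", "bun")
RESPIRATORY_CUES = ("intubated", "spo2", "ventilator", "hypoxemia")

def expand_arf(note_text):
    """Guess what "ARF" means from nearby documentation, else ask."""
    text = note_text.lower()
    if any(cue in text for cue in RENAL_CUES):
        return "acute renal failure"
    if any(cue in text for cue in RESPIRATORY_CUES):
        return "acute respiratory failure"
    return "ambiguous: query the provider"
```

A note reading “ARF, creatinine trending up, nephrology consulted” resolves one way; “ARF, intubated overnight” resolves the other; “ARF noted” alone cannot be safely coded by pattern matching at all.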

Hospital records are only loosely standardized within a single organization, let alone across several hundred hospitals. There are common components like the history and physical (H&P), progress notes, consultations, etc., but these note types may vary by clinical specialty and are not limited to independent licensed practitioners. Pattern recognition is a key element for successful use of technology, and there is a lot of variation across health records, even when the same electronic medical record (EMR) is used. The researchers concluded that current AI tools (as of fall of 2025) “struggle with fundamental elements of research logic, especially moderation and interaction effects, which demand an understanding of context and conditionality.”

Consequently, genAI reasoning “remains syntactic rather than semantic. It can recognize word patterns and reproduce logical forms but lacks a mental model of cause-and-effect relationships.” Such a model is necessary, because documentation within the health record often fails to follow a strict timeline, and overuse of copying and pasting can create inconsistencies: notes continue to refer to planned services, like a CT, MRI, or consultation, even after those tasks are completed. Disambiguation requires clinical context, timeline, and diagnostic evidence, not just word proximity.

It is also important to remember that AI tools are often trained on billed claims, as mentioned earlier, which may have already been reviewed by CDI and clarified by query. Or worse, the claims may lack clinical validation because they were not reviewed. For example, acute kidney injury may be coded without a 1.5-fold increase in creatinine from baseline, which is often the clinical criterion used to query for the diagnosis if it is missing. The perspective of reviewing a complete record for patterns related to billing is different from reviewing the record when the decision is made to query. Additionally, the query response may or may not be included in the training data set (if only documented on the query, it is not part of the medical record), which can also lead to discrepancies.

GenAI tools show great promise in many areas, but for now, they do not appear capable of performing the type of complex critical thinking required for accurate inpatient coding and billing. When implementing a new AI tool, be sure to ask about the training data set. Also, consider having a plan in place to address annual code updates, as there will be a lag before the AI tool learns to recognize new codes or coding guideline changes. Until AI systems can reason over time, uncertainty, and competing clinical narratives, inpatient coding and billing will remain a human discipline that is supported by technology, but not governed by it.


Cheryl Ericson, RN, MS, CCDS, CDIP

Cheryl is the Senior Director of Clinical Policy and Education, Brundage Group. She is an experienced revenue cycle expert and is known internationally for her work as a CDI professional. Cheryl has helped establish industry guidance through contributions to ACDIS white papers and several AHIMA Practice Briefs in the areas of CDI, Denials, Quality, Querying and HIM Technology.
