Now That You Have “Big Data,” What Do You Do With It?

There is a widespread belief that if you have enough data, you should be able to derive meaningful knowledge from it. The healthcare industry has struggled for years to obtain cross-enterprise information that provides consistent, shareable, and actionable knowledge to guide improvement. This has always been difficult, and no amount of “big data” or level of technology will address the challenge adequately without first addressing two basic matters: the quality of the data and the aggregation of that data once it has been captured.

Data quality and aggregation are two areas that have historically received surprisingly little focus in healthcare. If these areas are not addressed, then the cliché of “garbage in = garbage out” will not only be realized, it will be magnified as we acquire more data. Two prior papers published in the HIMSS newsletter touched on some of the requirements and considerations related to assessing and improving data quality. This article will focus primarily on data aggregation, making the (rather presumptive) assumption that data quality has been addressed.

Data Aggregation

Data aggregation refers to the process of grouping data based on some categorical parameters for the purpose of analysis. Aggregation can be performed for a variety of purposes, and each may result in different rules that drive the selection of the data to be grouped. Aggregation should always be based on a consensus that results in:

  • The definition of the category. This definition must define:
    • Discrete concepts to be included in the category
    • Discrete concepts to be excluded from the category
  • The data values or codes that meet the included concepts in order to provide the basis or rules for aggregating the data.
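
The elements listed above can be captured in a small data structure. The following is a minimal sketch in Python, not a reference implementation: the class name `CategoryDefinition`, its fields, and the sample concept names and code values (`X01` and so on) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CategoryDefinition:
    """Hypothetical container for an agreed-upon aggregation category."""
    name: str
    definition: str
    included_concepts: frozenset
    excluded_concepts: frozenset
    # Maps each code value to the single concept it represents.
    code_to_concept: dict = field(default_factory=dict)

    def problems(self):
        """Return a list of consistency problems; an empty list means none were found."""
        issues = []
        overlap = self.included_concepts & self.excluded_concepts
        if overlap:
            issues.append(f"concepts both included and excluded: {sorted(overlap)}")
        known = self.included_concepts | self.excluded_concepts
        for code, concept in sorted(self.code_to_concept.items()):
            if concept not in known:
                issues.append(f"code {code} maps to undefined concept {concept!r}")
        for concept in sorted(self.included_concepts):
            if concept not in self.code_to_concept.values():
                issues.append(f"included concept {concept!r} has no codes")
        return issues

    def codes(self):
        """Code values whose data should be aggregated into this category."""
        return {code for code, concept in self.code_to_concept.items()
                if concept in self.included_concepts}


# Placeholder category; every name and code below is made up.
example = CategoryDefinition(
    name="example category",
    definition="Placeholder wording agreed on by the stakeholders.",
    included_concepts=frozenset({"concept A", "concept B"}),
    excluded_concepts=frozenset({"concept C"}),
    code_to_concept={"X01": "concept A", "X02": "concept B", "X03": "concept C"},
)

print(example.problems())       # []
print(sorted(example.codes()))  # ['X01', 'X02']
```

Recording the excluded concepts alongside the included ones is a deliberate choice in this sketch: it lets a reader of any resulting report see what the category was never meant to cover.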

A major challenge with current categorical schemes is that there is rarely a full definition of what the category represents or what it is intended to include or exclude. There is rarely a clearly defined and documented set of data values or codes that would constitute the basis for aggregating the data. When most users see analytics about a healthcare condition such as diabetes, there is an assumption that the category of diabetes is accurately reported, but how the data is categorized is rarely questioned. Are all types of diabetes, including secondary diabetes and gestational diabetes, included in the results? Since most information is reported at some level of categorization, without clear definitions of how data is aggregated, it will be impossible to understand the meaning of the results, let alone compare categories across different entities.

For example, it would be impossible to compare the volume or cost of services related to “radiologic services” if one entity includes ultrasounds, MRIs, and nuclear scans in its aggregation while another entity excludes these services and only includes X-ray-based imaging services.
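
A short sketch makes the size of this discrepancy concrete. The service types come from the example above; the claim lines, dollar amounts, and the names `entity_a_includes`, `entity_b_includes`, and `summarize` are hypothetical.

```python
# Hypothetical claim lines: (service type, cost in dollars).
claims = [
    ("x-ray", 100),
    ("x-ray", 150),
    ("ultrasound", 300),
    ("mri", 1200),
    ("nuclear scan", 900),
]

# Two entities use the same label, "radiologic services," with different rules.
entity_a_includes = {"x-ray", "ultrasound", "mri", "nuclear scan"}
entity_b_includes = {"x-ray"}


def summarize(claim_lines, included_services):
    """Return (volume, total cost) for the claims that fall inside the category."""
    selected = [cost for service, cost in claim_lines if service in included_services]
    return len(selected), sum(selected)


print(summarize(claims, entity_a_includes))  # (5, 2650)
print(summarize(claims, entity_b_includes))  # (2, 250)
```

Both results would be reported under the same category name, yet they describe different populations of services and cannot be compared meaningfully.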

Aggregation can occur using any type of parameter: provider, patient demographic, payer, service type, or virtually any other healthcare data metric. Historically, aggregation of claims data focused on services rather than diseases or health conditions. As we move to an era of value-based purchasing, however, concepts such as population health, provider risk, outcomes measures, and episodes of care are becoming a bigger factor. To address this focus, healthcare information needs to center not only on what was done, but also on why it was done and on the patient’s condition before and after care delivery (outcomes). This means that disease-based aggregation is becoming a much more critical component of health policy and healthcare payment. Disease aggregation is especially challenging because it is clinical at its core. Without clinical knowledge, it is impossible to accurately and consistently define aggregation categories or determine which coded values (diagnostic codes) are needed to consistently include and/or exclude data that meets the categorical definition. The following examples illustrate the definition of several different categories of patient conditions:

Burns

  • Proposed definition:

Burns refer to the clinical disorder of tissue damage caused by some form of thermal energy.

  • Includes:
    • Heat-related burns of all degrees
    • Electrical burns
  • Excludes:
    • Friction burns
    • Chemical burns
    • Cold-related burns or tissue damage
    • Sunburns
    • Radiation burns

The above definition is simply an example of how the category “burns” could be defined. Other entities might wish to include some of the excluded concepts or exclude some of the included concepts in their definition. It is critical that any comparisons or analyses of burns, in terms of costs, outcomes, or any other analytic focus, are done with a clear understanding of what is or is not included in this category, based on the category definition. The coded data and the resulting analysis may be markedly different, depending on the intent of this definition.

Down Syndrome

  • Proposed definition:

Down syndrome is a condition associated with a chromosomal abnormality that results in an extra chromosome 21 (trisomy).

  • Includes:
    • All conditions described as Down syndrome
    • Trisomy 21
  • Excludes:
    • All other trisomies or other genetic conditions

Figure 1 illustrates that a search for the term “Down syndrome” alone would leave out codes for “trisomy 21,” which by definition is Down syndrome.

Dr. Nichols 080117

Fig. 1

The person identifying the codes for this code set would need to know that trisomy 21 is Down syndrome, or results from analysis could leave out a substantial amount of data that should be included, based on the intent of the category. Payment policies might inappropriately pay or deny claims, depending on the intent of the policy, if codes are excluded or included inappropriately.
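
The effect of a missing synonym can be sketched with a simple keyword search. The code values and abbreviated descriptions below are a small illustrative subset believed to correspond to ICD-10-CM entries; they should be verified against the official code set before any real use. The function name `search` is hypothetical.

```python
# Illustrative subset of diagnosis codes with abbreviated descriptions.
# Assumption: these mirror ICD-10-CM entries; verify against the official release.
code_descriptions = {
    "Q90.0": "Trisomy 21, nonmosaicism",
    "Q90.1": "Trisomy 21, mosaicism",
    "Q90.2": "Trisomy 21, translocation",
    "Q90.9": "Down syndrome, unspecified",
    "Q91.3": "Trisomy 18, unspecified",
}


def search(terms, descriptions):
    """Return the sorted codes whose description contains any of the search terms."""
    lowered = [term.lower() for term in terms]
    return sorted(
        code
        for code, text in descriptions.items()
        if any(term in text.lower() for term in lowered)
    )


# A search on the category name alone misses the trisomy 21 codes.
print(search(["down syndrome"], code_descriptions))
# ['Q90.9']

# Adding the clinically equivalent term recovers them, without pulling in trisomy 18.
print(search(["down syndrome", "trisomy 21"], code_descriptions))
# ['Q90.0', 'Q90.1', 'Q90.2', 'Q90.9']
```

The search itself is trivial; the clinical knowledge lies in knowing which terms to supply.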

Renal Failure

  • Proposed definition:

A condition in which kidney function meets the clinical criteria for “kidney failure.”

  • Includes:
    • All conditions described as “kidney failure” or “renal failure”
    • All conditions described as “chronic kidney disease stage 4 or stage 5”
    • All conditions described as “end-stage renal disease (ESRD)”
  • Excludes:
    • All other descriptions of renal dysfunction not included above

Figure 2 illustrates some of the terminology used in the code set that would represent “renal failure”:

Fig. 2

As shown, the term “kidney failure” is sometimes used instead of “renal failure.” Chronic kidney disease at stage 4 or stage 5 is generally considered renal failure, whereas stages 1, 2, and 3 are not. End-stage renal disease is also renal failure. All 20 codes associated with these types of descriptions would need to be included in order to accurately report on “renal failure.”
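
The inclusion rules for this category can be expressed as a first-pass text filter. This is a rough sketch only: the patterns, the function name `is_renal_failure`, and the sample descriptions are assumptions made for illustration, and any list produced this way would still need clinical review.

```python
import re

# One pattern per inclusion rule in the proposed definition.
RENAL_FAILURE_PATTERNS = [
    re.compile(r"\b(kidney|renal) failure\b"),                 # either synonym
    re.compile(r"\bend[- ]stage renal disease\b"),             # ESRD
    re.compile(r"\bchronic kidney disease\b.*\bstage (4|5)\b"),  # CKD stage 4 or 5 only
]


def is_renal_failure(description):
    """True if the description matches any inclusion rule for renal failure."""
    text = description.lower()
    return any(pattern.search(text) for pattern in RENAL_FAILURE_PATTERNS)


# Hypothetical descriptions, worded to resemble typical code-set terminology.
samples = [
    "Acute kidney failure, unspecified",
    "Chronic kidney disease, stage 3 (moderate)",
    "Chronic kidney disease, stage 4 (severe)",
    "Chronic kidney disease, stage 5",
    "End stage renal disease",
    "Renal insufficiency",
]

for description in samples:
    print(is_renal_failure(description), "-", description)
# True - Acute kidney failure, unspecified
# False - Chronic kidney disease, stage 3 (moderate)
# True - Chronic kidney disease, stage 4 (severe)
# True - Chronic kidney disease, stage 5
# True - End stage renal disease
# False - Renal insufficiency
```

Descriptions that cover a range of stages, or that imply kidney failure without using any of these phrases, would not be handled correctly by text rules like these.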

Drug-Induced Condition

  • Proposed definition:

Those conditions that are the result of use of some pharmacologic agent.

  • Includes:
    • Conditions described as “drug-induced”
    • Conditions described as “caused by” drugs
    • Conditions described as “secondary to” drug use
    • Conditions that are described as a “response” to drugs
    • Conditions that are described as a “poisoning” or “toxicity” to drugs
    • Conditions that are described as “withdrawal” or other manifestation of drug use
  • Excludes:
    • Underdosing of drugs
    • Intentional self-harm by drugs

Identifying the codes that should be categorized to the concept of “drug-induced” is especially complex. Based on this definition, there are 3,104 codes that could be included.

Fig. 4

As shown in Figure 4, if the intent were to identify all data related to drug-induced conditions, these codes would be needed as part of the data set. There is no easy way to identify all of the codes without recognizing how they are described in the data.
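
One way to approach a category with this many descriptive variants is to check exclusions first and then test for any of the inclusion phrases. The sketch below assumes simplified phrase lists derived from the proposed definition; the phrase wording, the function name `classify`, and the sample descriptions are hypothetical, and real descriptions often name a specific drug rather than the word “drug.”

```python
# Simplified phrases based on the proposed definition (assumed wording).
INCLUDE_PHRASES = (
    "drug-induced",
    "caused by drug",
    "secondary to drug",
    "response to drug",
    "poisoning by drug",
    "drug toxicity",
    "drug withdrawal",
)
EXCLUDE_PHRASES = ("underdosing", "intentional self-harm")


def classify(description):
    """Return 'excluded', 'included', or 'unmatched' for one code description."""
    text = description.lower()
    # Exclusions take precedence so that an excluded concept is never counted.
    if any(phrase in text for phrase in EXCLUDE_PHRASES):
        return "excluded"
    if any(phrase in text for phrase in INCLUDE_PHRASES):
        return "included"
    return "unmatched"


# Hypothetical descriptions for illustration.
samples = [
    "Drug-induced gout",
    "Poisoning by drug, accidental",
    "Poisoning by drug, intentional self-harm",
    "Underdosing of drug",
    "Fracture of wrist",
]

for description in samples:
    print(classify(description), "-", description)
# included - Drug-induced gout
# included - Poisoning by drug, accidental
# excluded - Poisoning by drug, intentional self-harm
# excluded - Underdosing of drug
# unmatched - Fracture of wrist
```

Returning three outcomes rather than a yes/no answer keeps the excluded codes visible for audit and separates them from codes that simply have nothing to do with the category.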

Getting to Comparable and Reproducible Disease-Based Aggregation

Consistent and reliable reporting of disease-based categories requires that each category be clearly defined and that the medical concepts to be included or excluded be part of that definition. The codes that represent those included and excluded concepts must then be enumerated so that the grouping or categorization logic can reference them.

Considerable research is required to identify all of the codes that represent each medical concept to be included in or excluded from a category. If, however, every code is mapped to the granular medical concepts it represents within a defined ontology, the codes that satisfy any combination of included and excluded concepts can be identified rapidly and consistently, based on the intent of the category definition. A separate Health Data Consulting white paper speaks to the use of ontologies to facilitate data aggregation in greater detail.
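
The idea of resolving a category from concept mappings can be sketched with plain set operations. This is an assumption-laden illustration rather than a description of any particular ontology product: the concept names follow the burns example earlier in this article, while the code values (`B01` and so on) and the function name `resolve` are made up.

```python
# Hypothetical ontology fragment: granular concept -> codes that represent it.
concept_codes = {
    "heat burn": {"B01", "B02"},
    "electrical burn": {"B03"},
    "chemical burn": {"B04"},
    "sunburn": {"B05"},
}


def resolve(include, exclude, ontology):
    """Return the codes for the included concepts, minus those of excluded concepts.

    An unknown concept name raises KeyError, so a misspelled definition
    fails loudly instead of silently dropping data.
    """
    included = set().union(*(ontology[concept] for concept in include))
    excluded = set().union(*(ontology[concept] for concept in exclude))
    return included - excluded


# The proposed definition of burns from earlier in the article.
proposed = resolve(
    include=["heat burn", "electrical burn"],
    exclude=["chemical burn", "sunburn"],
    ontology=concept_codes,
)
print(sorted(proposed))  # ['B01', 'B02', 'B03']

# Another entity that chooses to include chemical burns reuses the same mapping.
alternate = resolve(
    include=["heat burn", "electrical burn", "chemical burn"],
    exclude=["sunburn"],
    ontology=concept_codes,
)
print(sorted(alternate))  # ['B01', 'B02', 'B03', 'B04']
```

Once the concept-to-code mapping exists, changing a category definition becomes a matter of editing two short lists of concepts rather than repeating the code research.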
