The reliability of the current state of health information is extremely limited.
EDITOR’S NOTE: Dr. Joseph Nichols is producing a five-part series on healthcare data for ICD10monitor. This is the fourth installment in his exclusive series.
This is the fourth in a series of five articles on the search for reliable health information. This article focuses on the data analysis component of the health information domain.
As mentioned in the first article, the following are high-level requirements for data analysis:
Consistent, clearly defined methods of aggregation or classification
Assuming that the requirements of data acquisition and data management have been met, the next step in the search for reliable health information requires aggregation of data. Data elements need to be grouped and organized in order to represent the key concepts that questions focus on. For example, “diabetes” may be the focus of a wide range of analytic questions.
The first thing that’s needed is a clear definition of diabetes that delineates how data is aggregated into the “diabetes bucket,” in a way that is understood by all seeking the information. What should be included? What should be excluded? Should Type I and Type II diabetes be included? Should secondary diabetes and gestational diabetes be included? What codes are needed to represent the definition to ensure that we are capturing all of the intended data from coded data transactions?
Today, concepts like diabetes, heart failure, kidney failure, or any number of other conditions are included in analytic reports without any clear definition of the concept or definition of the criteria by which the source data was aggregated. Without a clear definition and method of aggregation, there is no reassurance that we are making consistent comparisons, or that the analysis we rely on represents what we think it represents. Too often, the job of data aggregation is left to the technical team, with very little guidance.
Another good example of this issue is the grouping of “diabetes” in the hierarchical condition categories (HCCs) published by the Centers for Medicare & Medicaid Services (CMS). These are widely used categories in the industry for a variety of purposes, primarily risk assignment. However, there is no clear definition of “diabetes” within the HCCs. Very few users of this data know that the HCCs for diabetes do not include diabetes in pregnancy, or even Type I and Type II diabetes in pregnant women. Few would know whether secondary diabetes is included in this HCC category. The values resulting from any analysis could vary dramatically, depending on the inclusion or exclusion of key data elements.
- The definitions of all analyzed concepts should be complete, clear, consistently applied, published, maintained, and available to all potential users of data analytic reports.
- The selection of coded data should include input from clinicians, coders, and IT professionals.
- A repository of all data aggregation logic should be created, maintained, and made available for consistent analysis across all analytic efforts, or any applications that might use or reference the data aggregation methodologies.
- Clearly defined taxonomies and ontologies should be developed, defined, and maintained as a service for all systems for any data aggregation or reporting purpose.
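The repository of aggregation logic described above can be sketched in code. In this minimal Python sketch, the ICD-10-CM category prefixes (E08–E13 for diabetes mellitus, O24 for diabetes in pregnancy) are real, but the two definitions, their names, and the inclusion/exclusion choices are illustrative assumptions, not an endorsed standard:

```python
# A minimal sketch of a published, maintainable concept-definition
# repository. The ICD-10-CM category prefixes are real; the specific
# definition choices below are illustrative assumptions only.

DIABETES_ALL = {
    "name": "diabetes, all forms",
    # Type 1 (E10), Type 2 (E11), secondary (E08, E09),
    # other specified (E13), and diabetes in pregnancy (O24)
    "include": ["E08", "E09", "E10", "E11", "E13", "O24"],
    "exclude": [],
}

DIABETES_TYPE1_TYPE2_ONLY = {
    "name": "diabetes, Type 1 and Type 2 only",
    "include": ["E10", "E11"],
    "exclude": ["O24"],  # explicitly documents the pregnancy exclusion
}

def in_bucket(icd10_code: str, definition: dict) -> bool:
    """Return True if a coded transaction falls into the defined bucket."""
    code = icd10_code.upper().replace(".", "")
    if any(code.startswith(prefix) for prefix in definition["exclude"]):
        return False
    return any(code.startswith(prefix) for prefix in definition["include"])
```

The point of the sketch is not the code itself but the discipline it enforces: every inclusion and exclusion is written down, named, and reviewable, so that `in_bucket("O24.419", ...)` yields a documented answer rather than a silent assumption buried in a report query.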
We’ve all heard the observation, popularized by Mark Twain, that there are “lies, damned lies, and statistics.” This might send a message that statistics are suspect, or of questionable value, but it simply means that statistics are often used inappropriately in an attempt to prove some desired answer, independent of the real answer. Statistics never provide an answer. Statistics simply test whether the data passes a first mathematical step on the way to an answer. “Statistical significance” does not mean analytic significance. Statistics are an important part of analysis, but only as a numeric test of the data to determine if there is some viable use of that data in a meaningful analytic process.
- Some level of statistical analysis should be done as an important first step in the analytic process. The validity of any reporting of trending, comparisons, or other analytic activities needs to meet a basic numeric validation as a floor.
- Policies should be established for statistical requirements that should be a part of any analytic discipline.
- Data governance should ensure that statistical manipulation is never used to bias the data analysis. As tempting as it may be to present two or three numbers and declare a trend that supports an argument, the definitions of “trends” need to be supported by statistical requirements.
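The “two or three numbers declared a trend” problem above can be made concrete with a simple statistical floor. The following Python sketch (function names and the 0.05 threshold are illustrative assumptions) runs a one-sided permutation test on pairwise concordance: with only three data points, even a perfectly rising series can occur by chance one time in six, so it can never clear a 0.05 floor:

```python
# A minimal sketch of a "statistical floor" check: a permutation test
# on pairwise concordance that asks whether an apparent upward trend
# in a short series could easily have arisen by chance.
from itertools import permutations

def concordance(values):
    """Count ordered pairs (i < j) where the later value is higher."""
    return sum(
        1
        for i in range(len(values))
        for j in range(i + 1, len(values))
        if values[j] > values[i]
    )

def trend_p_value(values):
    """One-sided permutation p-value for an upward trend."""
    observed = concordance(values)
    perms = list(permutations(values))
    hits = sum(1 for p in perms if concordance(p) >= observed)
    return hits / len(perms)

# Three rising quarterly rates look like a clear "trend" to the eye,
# but only 1 of the 6 possible orderings is this extreme, so the
# p-value is 1/6 -- well above any conventional 0.05 floor.
p = trend_p_value([4.1, 4.6, 5.2])
```

This is the sense in which statistics serve as a floor rather than an answer: the test does not tell us the trend is meaningful, only whether the numbers are even eligible for meaningful interpretation.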
Multifactorial testing of reporting results
Reporting is the platform for disseminating information that is used by decision-makers to make informed choices. Assuming that aggregation and statistical validity have been achieved, the next concern is ensuring that those disciplines are carried through into the implementation of the report.
- The reporting team must be a key part of the data aggregation and statistical effort to ensure the inclusion of all requirements into report writing.
- Testing should include reporting that relies upon multiple parameters, and should confirm that all requirements are met consistently.
- User testing should include user input about the consistent meaning of the report from the perspectives of multiple users. If key users cannot arrive at consistent conclusions from a report, the report content and design should be re-evaluated.
Transparency to analytic methods
The goal of any reporting is to ensure that reported values represent reliable information derived from complete and reliable data. All users should be able to arrive at as consistent an interpretation as possible, to ensure consensus on the best decisions for action, based on information that accurately reflects reality.
- All documentation of report design, methodologies, logic, and statistical testing used to create the reported information should be available for all users.
- Definitions should be clear and complete, and documentation should be understandable at the user level.
- All users need to be trained in how to use available documentation as part of the decision-making process, and encouraged to present any clarifying questions to a reliable designated source.
Safeguards against bias-driven analysis
One of the biggest challenges facing health information – and for that matter, all information – is bias. There is a strong human tendency to seek information that confirms our own beliefs. Consciously or subconsciously, we tend to drive data and analysis toward the outcome we want, rather than determining the reality of what we should be measuring. It’s hard to objectively assess that which is inconvenient to our expectations. We will never learn anything if we just confirm what we think we know. The next, and last, article in the series will examine the challenging issue of bias.
Assuming that data acquisition and data management meet all of the requirements described in the preceding articles in this series, analysis must be done in such a way that data is aggregated and reported appropriately. Unless analytic reports support and lead to meaningful actions based on meaningful information, the wrong decisions may result, along with a risk of significant harm to the entities, including patients, who rely on the accuracy of that information.
Programming Note: Listen to Dr. Nichols report this story live today during Talk Ten Tuesday, 10-10:30 a.m. EST.