AI Requires More Than Good Data to Produce Meaningful Insights

January 26, 2024

How Preverity’s data distinctives drive superior AI — to reduce patient risk.

I was recently asked whether Preverity competed with a technology company with AI in its name. While I was happy to be recognized for our leadership in AI, this company uses AI tools for policy administration—wholly unrelated to our mission to provide healthcare data and advanced analytics that drive results.

Confusion about AI got me thinking about Preverity’s distinctives and how our unique data drives superior analytics. In insurance, AI is poorly understood; in healthcare, it’s often perceived narrowly as replacing human oversight and control for diagnosis and testing.

Misconceptions surrounding AI are ubiquitous, even among professionals. Back when AI was mostly found in science fiction, “garbage in, garbage out” (GIGO) was used to describe how bad inputs lead to bad outputs. Well, GIGO is back with a vengeance. AI solutions typically depend upon massive amounts of data for their training stages and again in production. An inability to satisfy this data hunger due to bottlenecks in the data infrastructure can severely hamper the functionality of AI technology.

Even more critical than the volume of data fed to AI systems, however, is the quality and relevance of that data. AI algorithms and models can be world-class but fail if the data they consume is suspect.

Rigorous logic doesn’t overcome bad data. When we speak of human intelligence, it’s obvious that relying on flawed or incomplete information typically leads to bad decisions and poor outcomes. But for artificial intelligence, data inputs are even more important. The definition of good data for AI is much broader than for human intelligence. Developing AI that delivers meaningful insights requires vast, accurate, relevant and complete data. In other words, AI requires not only the four V’s (velocity, veracity, volume and variety) but also relevance to the AI model in development.

Miss any factor, and the AI may deliver inaccurate results. For example, the data may be good but irrelevant to the AI problem. Or it may be directly relevant but incomplete. If you feed inaccurate, irrelevant or incomplete data to an AI model, you’ll get outputs that may appear sound but are wrong.

How poor data fails: AI algorithms identify everything but COVID-19

In the early days of the pandemic, hundreds of AI tools were built to catch COVID, but none of them helped. MIT Technology Review chronicled several failures, most arising from errors in how the tools were trained or tested, where mislabeled data or data from unknown sources was a common culprit. Information about COVID patients, including medical scans, was collected and shared during a global pandemic, often by the doctors struggling to treat those patients.

Many issues arose from the poor quality of the data researchers used to develop their tools. Front-line health professionals struggled to gather patient information, researchers wanted to help quickly, and the public data sets assembled under those conditions were the only ones available.

High-quality data with clear sourcing is essential to successful AI tools for healthcare. Such data must be gathered in a way that preserves its integrity and reliability. The data must also undergo proper testing, with rigorous quality control measures, to verify that it meets the highest standards. By paying attention to the quality and reliability of the data, researchers can help ensure the accuracy and effectiveness of AI tools.

At Preverity, we are committed to producing high-quality AI-driven solutions that improve healthcare outcomes and minimize patient risk. Through our vast and meticulously curated database of malpractice event history, we build AI models that help detect and prevent clinical risk before it leads to an adverse event, thus improving patient safety and reducing malpractice exposure.

Preverity has the nation’s largest database of malpractice event history, dating back to 2012. Using data to develop actionable insights about healthcare risk and malpractice exposure is in our DNA. Here’s how we address the critical data factors for AI modeling:

  • Velocity—Weekly updates for clinical billing data on all US providers
  • Veracity—More than 80 billion patient interactions representing 80% of the US market
  • Volume—Over 100 billion lines of clinical billing data on all US providers
  • Variety—More than 150,000 providers representing over 1 million “physician years” of activity
  • Relevance—Approximately 60% of all commercial medical billing claims (35% of the nation) and 50% of all retail prescriptions from medical professionals, health systems and pharmacies.

A huge volume of data is produced daily in the healthcare environment. A recent study by the National Institutes of Health (NIH) found that collecting, storing, sharing, analyzing, and reporting health data face numerous challenges that lead to incomplete, inaccurate, and untimely data. As a result, data quality issues now receive more attention than before.

Preverity is challenging conventional wisdom, and we have the data to do it. Our unique data distinctives set us apart from other companies in the field, and we use them to develop AI models that deliver meaningful insights for our clients. Contact me to learn how we can help you solve real-life problems that will result in better outcomes for you and your patients.

Kind Regards,
Gene Boerger, President and Chief Operating Officer

615-982-7076