9. Deeper Insight Into Healthcare Data and Data Sovereignty

9.1 Importance of Healthcare Data

Over the past decade, big data, machine learning, deep learning, large language models (LLMs), and many other data-related innovations have been widely discussed, so most people already recognize that data matters.

In healthcare in particular, scientifically and systematically measured data has been in use for decades. A prime example is clinical trials, which use data to evaluate the safety and effectiveness of new treatments. Typical data generated during clinical trials include lab values measured by hospital testing equipment, gene sequences, and more. These data can be organized into well-structured spreadsheets and used to validate the efficacy and safety of medications, uncover new statistical findings, and more.

In recent years, healthcare data has expanded beyond data generated in hospitals to include data collected from wearable devices, smartphone sensors, patient self-reporting, and other sources that are part of a patient's daily life outside the hospital. Together, these different types of data can build a holistic understanding of a patient's health that considers factors beyond prescribed medications. Such data is called real-world data, and it is gaining traction through recent research into COVID-19 vaccines (see) and digital therapies (see).

As mentioned above, data is already playing an important role in healthcare. In the next chapter, we'll explore the specific types of healthcare data and the challenges of acquiring and utilizing it.

9.2 Types of Healthcare Data

There are many definitions and categorizations of healthcare data. This chapter introduces the concepts and categorization methods for healthcare data that are used in this paper.

Classification by identifiability

Most major countries, including the U.S., treat personal identifiability as an important criterion for classifying data. While it is important to avoid the risk of privacy breaches, health data in particular can be combined across various valuable sources to drive innovations that improve patient and individual health. This requires a delicate approach that balances privacy protection against data utilization.

The most prominent laws that follow this approach are HIPAA and HITECH in the U.S. These two laws set out fundamental principles for the protection and use of health information and divide health information into three categories. Health information that does not fall into any of these categories is still, in general, subject to general privacy laws.
