Sources of Big Data in Health Care

A simple definition of big data in health care is “the totality of data related to patient healthcare and well-being” (Raghupathi 2014). But what exactly are these types of data, and where do they come from? The following is a broad overview of the types and sources of big data of interest to health care providers, researchers, payers, policymakers, and industry. These categories are not mutually exclusive, because the same data can originate from a variety of sources.

Nor is this list exhaustive; the practical applications of big data analytics will surely continue to expand.

Clinical information systems

These are traditional sources of clinical data that health care providers are accustomed to viewing.

  • Electronic health records (EHRs) collect, store, and display information such as demographics, past medical history, active medical problems, immunizations, allergies, medications, vital signs, results from laboratory and radiology tests, pathology reports, progress notes created by health care providers, and administrative and financial documents.

  • Health information exchanges serve as hubs between disparate clinical information systems.

  • Patient registries are maintained by health care organizations on their own patients and are often linked to the EHR. Other registries track immunizations, cancer, trauma, and other public health issues on a wider geographic scale.

  • Patient portals allow patients to access personal health information stored in a health care organization’s EHR. Some patient portals also allow users to request prescription refills and exchange secure electronic messages with the health care team.

  • Clinical data warehouses aggregate patient-level data from multiple clinical information systems, such as EHRs and the other sources listed above.
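As a rough illustration of what aggregating patient-level data can mean, the Python sketch below groups records from several source systems into one view per patient. The source systems, field names, and the assumption that every record carries a clean shared identifier (`patient_id`) are all hypothetical; in practice, matching the same patient across systems is often the harder problem.

```python
from collections import defaultdict

# Hypothetical extracts from three source systems. Each record is assumed
# to carry a shared patient identifier ("patient_id").
ehr_records = [
    {"patient_id": "P001", "source": "EHR", "allergies": ["penicillin"]},
    {"patient_id": "P002", "source": "EHR", "allergies": []},
]
registry_records = [
    {"patient_id": "P001", "source": "cancer_registry", "diagnosis": "breast cancer"},
]
portal_records = [
    {"patient_id": "P002", "source": "portal", "refill_requests": 2},
]


def aggregate_by_patient(*sources):
    """Group records from several systems into one list per patient ID."""
    warehouse = defaultdict(list)
    for records in sources:
        for record in records:
            warehouse[record["patient_id"]].append(record)
    return dict(warehouse)


warehouse = aggregate_by_patient(ehr_records, registry_records, portal_records)
for patient_id, records in sorted(warehouse.items()):
    # Show which systems contributed data for each patient.
    print(patient_id, [r["source"] for r in records])
```

Each patient ends up with a single combined record list drawn from every system that holds data on them, which is the basic idea behind a warehouse view.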

Claims data from payers

Public payers (e.g. Medicare) and private payers have large repositories of claims data on their beneficiaries.

Research studies

Research databases contain information about study participants, experimental treatments, and clinical outcomes. Large studies are usually sponsored by pharmaceutical companies or government agencies. One application of personalized medicine is matching individual patients with effective treatments based on patterns in clinical trial data.

This approach goes beyond conventional evidence-based medicine, in which a health care provider judges whether a patient shares broad characteristics (e.g. age, gender, race, clinical status) with trial participants. With big data analytics, a treatment can be selected based on much more granular information, such as the genetic profile of a patient’s cancer (see below).

Genetic databases

The repository of human genetic information continues to grow rapidly. Since the Human Genome Project was completed in 2003, the cost of human DNA sequencing has been reduced by a million-fold.

The Personal Genome Project (PGP), launched in 2005 by Harvard Medical School, seeks to sequence and publicize the complete genomes of 100,000 volunteers from around the world. The PGP is a prime example of a big data project because of the sheer volume and variety of its data. A single personal genome contains about 100 gigabytes of data, and in addition to sequencing genomes, the PGP also collects data from EHRs, surveys, and microbiome profiles.
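To make "sheer volume" concrete, the back-of-the-envelope calculation below multiplies the two figures given above: about 100 GB per genome and a target of 100,000 volunteers. It assumes decimal units (1 PB = 1,000,000 GB) and counts raw genome data only, ignoring the EHR, survey, and microbiome data as well as any compression or backup copies.

```python
GENOME_SIZE_GB = 100          # approximate size of one personal genome (from the text)
TARGET_VOLUNTEERS = 100_000   # PGP enrollment goal (from the text)
GB_PER_PB = 1_000_000         # assumption: decimal units, 1 PB = 10**6 GB

# Raw genome storage if every target volunteer were sequenced.
total_gb = GENOME_SIZE_GB * TARGET_VOLUNTEERS
total_pb = total_gb / GB_PER_PB
print(f"{total_gb:,} GB = {total_pb:.0f} PB")
```

Under these assumptions the genomes alone come to roughly 10 petabytes.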

A number of companies offer direct-to-consumer genetic sequencing for health, personal traits, and pharmacogenetics on a commercial basis. This personal information could also be subjected to big data analytics. For example, 23andMe stopped offering health-related genetic reports to new customers as of November 22, 2013, to comply with the U.S. Food and Drug Administration. However, as of August 29, 2014, the company provides access to de-identified genetic information from volunteers for research purposes.

Public records

The government keeps detailed records of health-related events such as immigration, marriage, birth, and death. The U.S. Census has collected vast amounts of information every 10 years since 1790. As of 2013, the Census’ statistics website held 370 billion cells, with approximately 11 billion more added each year.

Web searches

Web search information gathered by Google and other web search providers could provide real-time insights into a population’s health. However, search patterns on their own can be misleading, and their value may be improved by combining them with traditional sources of health data (Lazer 2014).

Social media

Facebook, Twitter, and other social media platforms generate a rich variety of data around the clock, giving a view into the locations, health behaviors, emotions, and social interactions of users.

Devices

Massive troves of health-related information are also collected and stored on mobile and home devices.

  • Smartphones: Thousands of mHealth apps capture information on the user’s physical activity, nutritional intake, sleep patterns, emotions, and other parameters. Native cell phone apps (e.g. GPS, email, texting) can also give clues about an individual’s health status.

  • Wearable monitors and devices: Pedometers, accelerometers, glasses, watches, and chips embedded under the skin also gather health-related information.

  • Telemedicine devices: These allow health care providers to monitor patients’ parameters such as blood pressure, heart rate, respiratory rate, oxygenation, temperature, ECG tracings, and weight.

Financial transactions

Carolinas HealthCare System includes patients’ credit card transactions in the predictive models it uses to identify patients at high risk of hospital readmission.

Sources:

Lazer D et al. The Parable of Google Flu: Traps in Big Data Analysis. Science 2014;343(6176):1203-1205. doi:10.1126/science.1248506.

Carolinas HealthCare System. How Carolinas HealthCare System is Turning Big Data Into Better Care. Accessed on August 30, 2013.

Fernandes L et al. Big Data, Bigger Outcomes. Journal of AHIMA 2012;83(10):38-43. Accessed on August 29, 2014.

Raghupathi W and Raghupathi V. Big data analytics in healthcare: promise and potential. Health Information Science and Systems 2014;2:3. doi:10.1186/2047-2501-2-3. Accessed on August 29, 2014.
