Big Health Care Data Research and Consent

Guest Contributor: Fengyan Deng
PhD student at Texas Woman’s University,
CRNA at Texas Medical Center Houston Methodist Hospital

The origin of the term “big data” is vague. In 2012, The New York Times published several articles that helped bring the concept of “Big Data” into the mainstream. The most influential was Steve Lohr’s “How Big Data Became So Big,” published on August 11, 2012, which documented the term’s rapid rise and its spread beyond technology circles. Today, the vocabulary of big data is well established in business, government, and science.

What is Big Data? There is no universally agreed-upon definition of the term. According to the U.S. Census Bureau, big data refers to data sources that are large, change rapidly, and encompass a wide range of information. These sources include retail and payroll transactions, satellite images, “smart” devices, and surveys. Big data also encompasses administrative data from federal, state, and local governments, as well as data from third-party providers. These diverse data sources and techniques can provide unique insights for the U.S. Census Bureau’s research and product development (https://www.census.gov/topics/research/big-data.html).

To distinguish big data from other data, experts describe it using a framework known as the 5 Vs. Volume, the defining feature of big data, refers to datasets measured in terabytes, petabytes, or even zettabytes. Velocity refers to the constant flow of data, often in real time, from multiple sources. Variety refers to the range of data types, which fall into three categories: structured, semi-structured, and unstructured. Veracity refers to the accuracy and integrity of data, as well as the trustworthiness of the information it conveys. Value refers to the use of big data to improve decision-making, solve problems, or provide valuable insights. Variability, a more recently proposed additional V, describes the inconsistency and unpredictability of the data.

Health care big data is distinctive compared with other big data: it varies widely in both type and amount. Khan et al. (2022) characterize it with 10 Vs: volume, variety, venue, varifocal, varmint, vocabulary, validity, volatility, veracity, and velocity (https://www.nature.com/articles/s41598-022-26090-5).

Big data has transformed the world by contributing a plethora of data types useful in diverse research domains. In health care, big data has changed research methodology. Clinical research powered by large-scale datasets, managed and analyzed with cheaper computing technology, and supported by greater flexibility in study design can reveal health patterns, improve patient outcomes, and promises to provide solutions that were previously out of society’s reach (https://pmc.ncbi.nlm.nih.gov/articles/PMC7323266/).

Despite these benefits, big data research raises a pressing ethical concern: how to respect individuals’ autonomy, which research ethics operationalizes by obtaining informed consent. The Belmont Report established the ethical principles for research involving human subjects. One of these is respect for persons, which emphasizes individual autonomy, and the principal way research honors this principle is by obtaining informed consent. The primary goal of the consent process is to ensure that patients understand the purpose, risks, and methods of the research being conducted. By allowing patients to make informed decisions, the consent process upholds the ethical principles of autonomy and freedom of choice. A central question is whether extensive research on health care data, which strains traditional consent practices, still adequately protects patients’ autonomy (https://pmc.ncbi.nlm.nih.gov/articles/PMC7819582/).

Previously, patient data were collected with consent and used primarily for direct clinical care. The broad scope and large scale of research that reuses existing health data are unprecedented. Examples include the secondary use of health data to train AI systems that make predictions, and secondary analysis of existing research data to address questions different from those the original study addressed. In this context, subjects are often unaware that their data are being collected and analyzed, and they lack meaningful control over their data, including the ability to withdraw from a study, which autonomous participation requires. Although large-scale health data research does not require the direct involvement of human subjects and poses no direct risk to the health or well-being of those who contribute data, it does carry other kinds of risks. For example, subjects may be re-identified if the data are not properly anonymized or cannot be made fully anonymous.

Furthermore, the growing amount of available data can make it possible to identify a person indirectly by linking health data to other types of data available elsewhere. Large-scale health data research, although conducted without traditional intervention, carries specific informational risks for patients. Therefore, obtaining patient consent for future, unforeseeable research is paramount to upholding ethical standards and protecting patients’ autonomous authorization of research activities.
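For readers curious how such linkage works, the following is a minimal Python sketch, assuming a hypothetical research extract with names removed but ZIP code, birth date, and sex retained, and a hypothetical public list (such as a voter roll) that contains names and the same three fields. All records and the function name `link` are invented for illustration; real re-identification risk assessment is considerably more involved.

```python
from dataclasses import dataclass

# All records below are invented for illustration; they describe no real people.


@dataclass(frozen=True)
class ResearchRecord:
    """A 'de-identified' research row: no name, but quasi-identifiers remain."""
    zip_code: str
    birth_date: str
    sex: str
    diagnosis: str  # the sensitive attribute


@dataclass(frozen=True)
class PublicRecord:
    """A row from a hypothetical public list (e.g., a voter roll) that includes names."""
    name: str
    zip_code: str
    birth_date: str
    sex: str


def link(research: list[ResearchRecord], public: list[PublicRecord]) -> list[tuple[str, str]]:
    """Re-identify research rows whose (ZIP, birth date, sex) matches exactly one public row."""
    index: dict[tuple[str, str, str], list[str]] = {}
    for p in public:
        index.setdefault((p.zip_code, p.birth_date, p.sex), []).append(p.name)

    matches = []
    for r in research:
        names = index.get((r.zip_code, r.birth_date, r.sex), [])
        if len(names) == 1:  # a unique match links a name to a diagnosis
            matches.append((names[0], r.diagnosis))
    return matches


research = [
    ResearchRecord("12345", "1970-03-14", "F", "type 2 diabetes"),
    ResearchRecord("12345", "1985-07-02", "M", "hypertension"),
]
public = [
    PublicRecord("Jane Doe", "12345", "1970-03-14", "F"),
    PublicRecord("John Roe", "12345", "1985-07-02", "M"),
    PublicRecord("Sam Poe", "12345", "1985-07-02", "M"),
]

print(link(research, public))  # [('Jane Doe', 'type 2 diabetes')]
```

The first research record is re-identified because exactly one person in the public list shares its ZIP code, birth date, and sex; the second is not, because two people share those values. Generalizing or suppressing such quasi-identifiers before release (the idea behind k-anonymity) reduces this risk, but some data cannot be made fully anonymous.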

The concept of autonomy in current research ethics can be traced back to the philosopher John Stuart Mill (1806-1873). Although he did not explicitly develop a theory of autonomy, many of his ideas about the respect owed to persons and the importance of freedom are central ingredients of current conceptions of patient autonomy, and they underlie its most important embodiment: informed consent. From a Millian perspective, the patient is the authentic and original locus of control. In big data research, this respect for autonomy underpins the practice of informed consent.

Consent is a complex practice in research and cannot be fully captured by any single theory. The literature has proposed several theories of consent, including positivism (real consent), social constructionism, functionalist consent, critical theory, and postmodern choice. The positivist framework promotes respect for informed consent, which has significant benefits. It encourages health professionals and researchers to be accountable and to explain clearly what they plan to do and why. Patients are free to consent to or refuse participation in research, which protects them from unwanted interventions and from deception or coercion during treatment and research.

Critical theory views consent as a necessary protection for patients against unnecessary, harmful, and unwanted interventions in research. Real and critical consent remind practitioners and researchers of the standards that protect them and their patients. Social constructionism shows that consent is a dynamic process: it is perceived, experienced, and shaped through interactions between individuals and their social contexts. Constructionist theories therefore frame consent as a process rather than an event.

Obtaining informed consent from patients to reuse their health data for unpredictable future research is challenging and complex, and several consent models have been proposed. Blanket consent means agreeing to the reuse of health data without any restrictions, including future research uses. Meta consent asks people to decide how and when they would like to be asked for consent in the future. Broad consent asks patients to consent once to a range of future research uses, rather than to each use specifically.

All of these consent models have been criticized for failing to secure sufficiently informed consent, because they do not give patients the level of control that authentic self-determination, or autonomy, requires. This criticism prompted the emergence of the dynamic consent model. In this model, patients first agree that their health data may be included in a research platform and then receive regular updates about new studies, accompanied by requests to accept or decline each research use. In essence, dynamic consent enables study-specific consent for health data research. The term “dynamic consent” also refers to digital interfaces that enable continuous communication between researchers and participants and help manage the disclosure of information and the provision of consent. Beyond consent itself, a dynamic consent interface supports two-way communication, providing participants with information and returning relevant results. Dynamic consent is therefore a process rather than a one-time event, as social constructionist theory describes.
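The dynamic consent workflow described above can be sketched as a small Python model. This is a hypothetical illustration, not the design of any actual dynamic consent platform: the class `DynamicConsentPlatform` and its methods are invented names. The sketch assumes that enrollment is the initial platform-level agreement, that each new study generates a study-specific request, that an unanswered request does not count as consent, and that withdrawal revokes every earlier study approval.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    PENDING = "pending"    # request sent, no answer yet
    ACCEPTED = "accepted"
    DECLINED = "declined"


@dataclass
class Participant:
    participant_id: str
    enrolled: bool = True  # the initial agreement to join the research platform
    decisions: dict[str, Decision] = field(default_factory=dict)  # study_id -> decision
    inbox: list[str] = field(default_factory=list)  # messages from researchers


class DynamicConsentPlatform:
    """Hypothetical dynamic consent registry; not modeled on any real product."""

    def __init__(self) -> None:
        self.participants: dict[str, Participant] = {}

    def enroll(self, participant_id: str) -> None:
        self.participants[participant_id] = Participant(participant_id)

    def propose_study(self, study_id: str, description: str) -> None:
        # Each new study triggers a study-specific request to every enrolled participant.
        for p in self.participants.values():
            if p.enrolled:
                p.decisions[study_id] = Decision.PENDING
                p.inbox.append(f"New study {study_id}: {description}")

    def respond(self, participant_id: str, study_id: str, accept: bool) -> None:
        p = self.participants[participant_id]
        if p.enrolled and study_id in p.decisions:
            p.decisions[study_id] = Decision.ACCEPTED if accept else Decision.DECLINED

    def withdraw(self, participant_id: str) -> None:
        # Revoking platform consent also revokes every study-level approval.
        p = self.participants[participant_id]
        p.enrolled = False
        for study_id in p.decisions:
            p.decisions[study_id] = Decision.DECLINED

    def may_use(self, participant_id: str, study_id: str) -> bool:
        # Only an explicit "accept" counts; a pending request is not consent.
        p = self.participants[participant_id]
        return p.enrolled and p.decisions.get(study_id) == Decision.ACCEPTED

    def cohort(self, study_id: str) -> list[str]:
        return sorted(pid for pid in self.participants if self.may_use(pid, study_id))

    def return_result(self, study_id: str, message: str) -> None:
        # Two-way communication: results go back to those who took part.
        for pid in self.cohort(study_id):
            self.participants[pid].inbox.append(f"Result from {study_id}: {message}")


platform = DynamicConsentPlatform()
for pid in ("P1", "P2", "P3"):
    platform.enroll(pid)
platform.propose_study("S1", "sepsis risk prediction from EHR data")
platform.respond("P1", "S1", accept=True)
platform.respond("P2", "S1", accept=False)
# P3 has not answered yet, so P3's data stay out of S1.
print(platform.cohort("S1"))  # ['P1']
platform.withdraw("P1")
print(platform.cohort("S1"))  # []
```

Treating a pending request as a refusal reflects the opt-in, study-specific character of dynamic consent. A real platform would also need secure authentication, audit trails, and rules for participants who can no longer be contacted.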

Big data research has transformed research methodology and introduced ethical challenges in protecting the autonomy of research subjects, particularly through informed consent. Nursing scholars will be challenged by the big data research paradigm and must be aware of the ethical challenges it raises for nursing. Big nursing care data research has great potential to predict patient outcomes, enhance patient safety, and reduce health care costs. Nurse scientists are uniquely positioned to leverage big data for research, and nursing science has an excellent opportunity to evolve and embrace its potential. Nurse scientists should be collaborators in, and drivers of, efforts to realize that potential.

Meanwhile, it is imperative that nurse scholars and scientists apply the American Nurses Association (ANA) Code of Ethics as a guide to keep their practice consistent with the ethical obligations of the profession. Although the ANA Code of Ethics does not explicitly address big data, its principles can guide the ethical conduct of big data research. This post focuses on informed consent. Provision 1 of the Code states that “nurses practice with compassion and respect for the inherent dignity, worth, and unique attributes of every person.” Provision 2 states that “a nurse’s primary commitment is to the recipient(s) of nursing care, whether an individual, family, group, community, or population.” Provision 3 states that “a nurse establishes a trusting relationship and advocates for the rights, health, and safety of the recipient(s) of nursing care.” Applying these three principles in big nursing data research is crucial for protecting patients’ autonomy through informed consent.

Informed consent in big data research is a subject of significant ethical debate, particularly regarding participant autonomy. One example, mentioned earlier, is the reuse of patients’ health information and trial data without consent for that reuse. Another, from oncology nursing research, is a participant who signs a broad consent form when providing a specimen for a candidate gene association study of an inflammatory marker; the specimen may later be used for other research, such as a genome-wide study of pathogenic variants. Such practices disregard patients’ autonomy and rights. Provisions 1 to 3 of the Code of Ethics take a strongly patient-centered approach, emphasizing patients’ dignity, nurses’ commitment to and protection of patients, and advocacy for patient autonomy. These ethical considerations should be upheld in the informed consent process for big data research.

A patient-centered approach to informed consent in big data research may require nurses to commit to advocacy: informing patients about the dynamic and unpredictable nature of big data research, ensuring that patients truly understand and consent to potential long-term, secondary uses of their data, and explaining how they can withdraw consent or have their data removed from large datasets. Nurse scientists should also stay up to date with the diverse informed consent models that align with the Code of Ethics in the context of big data research.

References

Alderson, P., & Goodey, C. (1998). Theories of consent. BMJ, 317(7168), 1313–1315. https://doi.org/10.1136/bmj.317.7168.1313

American Nurses Association. (n.d.). Code of ethics for nurses. Retrieved September 19, from https://codeofethics.ana.org/provision-1-3

Bruns, A., & Winkler, E. C. (2024). Dynamic consent: a royal road to research consent? Journal of Medical Ethics. https://doi.org/10.1136/jme-2024-110153

Duah, H. O., Boch, S., Arter, S., Nidey, N., & Lambert, J. (2024). A guide to understanding big data for the nurse scientist: A discursive paper. Nursing Inquiry, 31(3), e12648. https://doi.org/10.1111/nin.12648

Favaretto, M., De Clercq, E., Gaab, J., & Elger, B. S. (2020). First do no harm: An exploration of researchers’ ethics of conduct in Big Data behavioral studies. PLoS One, 15(11), e0241865. https://doi.org/10.1371/journal.pone.0241865

Harris, C. S., Pozzar, R. A., Conley, Y., Eicher, M., Hammer, M. J., Kober, K. M., Miaskowski, C., & Colomer-Lahiguera, S. (2023). Big Data in Oncology Nursing Research: State of the Science. Seminars in Oncology Nursing, 39(3), 151428. https://doi.org/10.1016/j.soncn.2023.151428

Howe, E. G., III, & Elenberg, F. (2020). Ethical Challenges Posed by Big Data. Innovations in Clinical Neuroscience, 17(10–12), 24–30.

Khan, S., Khan, H. U., & Nazir, S. (2022). Systematic analysis of healthcare big data analytics for efficient care and disease diagnosing. Scientific Reports, 12(1), 22377. https://doi.org/10.1038/s41598-022-26090-5

Mallappallil, M., Sabu, J., Gruessner, A., & Salifu, M. (2020). A review of big data and medical research. SAGE Open Medicine, 8, 2050312120934839. https://doi.org/10.1177/2050312120934839

Muller, S. H., van Thiel, G. J., Mostert, M., & van Delden, J. J. (2023). Dynamic consent, communication and return of results in large-scale health data reuse: Survey of public preferences. Digital Health, 9, 20552076231190997. https://doi.org/10.1177/20552076231190997

Syracuse University Information Studies. (2025, June 8). What is Big Data? Definition, How it works, and Use Cases. Retrieved September 7, 2025, from https://ischool.syracuse.edu/what-is-big-data/.                                                                           

United States Census Bureau. (2022, July 7). Big Data. Retrieved September 7, 2025, from https://www.census.gov/topics/research/big-data.html

Vedder, A., & Spajić, D. (2023). Moral autonomy of patients and legal barriers to a possible duty of health related data sharing. Ethics and Information Technology, 25, 23. https://doi.org/10.1007/s10676-023-09697-8

About Fengyan Deng

Fengyan Deng, DNP, is a PhD student and Certified Registered Nurse Anesthetist. Big health care data research will become a global phenomenon and pose significant methodological and ethical challenges to researchers. Nurse scientists will likely embrace big health care data research and encounter its ethical challenges as well. Big nursing care data research has great potential to predict patient outcomes, enhance patient safety, and reduce health care costs.

4 thoughts on “Big Health Care Data Research and Consent”

  1. Thank you very much for this important blog. I have to wonder how the dynamic model of informed consent works. Does the person wanting to use the data have to contact the original researchers and ask them to contact all the participants in the big data set for their individual informed consent for each new study? What happens if the participants are no longer contactable (died, new email address, etc.) or the researchers are no longer contactable (new email address, etc.)?

    • Thank you very much for reading my blog and your inquiry. Here are my answers:
      1) You do not always have to contact the original researchers and re-consent every participant for a new study, because an institutional review board (IRB) can grant a waiver of informed consent. This is especially common for studies using de-identified or aggregated data, where re-contacting participants is often impractical or impossible. If the new research falls outside the scope of the original consent, you may need to re-consent participants or obtain IRB approval for a waiver, but the specific requirements depend on the data’s identifiability and the nature of the new study.
      2) Communication between researchers and participants takes place through a personalized, secure online platform (such as a website or mobile app) that serves as the interface between them. The platform facilitates ongoing dialogue, informing participants about research projects and providing updates as they evolve. Participants can make specific, detailed choices about their data and change these preferences at any time, including revoking consent entirely. The model provides real-time insight into how data are being used, a key difference from traditional consent, where future uses may not be known. By the way, I could not find an answer to what happens if neither the participants nor the researchers can be contacted.
      Sincerely,
      Fengyan

  2. I too want to thank the author and encourage her to explore further. My questions are pragmatic like Dr. Fawcett’s, and they center on harm. Are there instances of individual harm, not some sense of an unknowable loss of autonomy but actual concrete or even perceived harm to individuals, because of big data practices? In the case of HIPAA violations, I can see actual harm being reported if family members or employers had unlimited and/or illegal access to medical records. But what harm can come to me as a patient if my inflammatory biomarker information and my lipid results are studied by epidemiologists using a database consisting of 40K members enrolled in a health care system using the same EHR program covering 15 states? I do not personally experience a violation of my autonomy whether or not I sign a form allowing the epidemiologist access to my personal ESR, CRP, neutrophil to lymphocyte ratios, and lipid results. In fact, I am more affronted to think I am limiting epi research if my consent is so specific that it would limit the scientist’s ability to follow up on any other big data cues unless she returned and obtained additional consent from me and the other 39,999 members in the big data database.

    Of course, the author could reply that patient autonomy may be preserved if patients were asked to sign a general consent. True, but what is the value in doing so? What actual harm is being prevented among those who don’t sign? What perceived harm? Now this is a question the author may want to pursue, i.e., what is the perceived harm that is prevented by those who refuse to give big data scientists access to their personal data? This may actually yield an evidence-based ethical rationale for the author, who currently lacks such. I, for one, would be interested in knowing the answer. It may lead to diagnosing or labeling types of anti-scientific biases, information of benefit to health educators.

    • Thank you very much for reading my blog and your comments and insights. There will undoubtedly be no direct physical harm to the patients when researchers use their health records data without consent. However, here are some considerations based on the current literature: Unauthorized or undisclosed use of health data can severely damage the public’s trust in healthcare providers and researchers, making future participation in studies more difficult. Without consent, there is a greater risk that data intended for research will be used for other purposes, such as marketing, which is a concern for individuals regarding their data privacy. Using data without consent violates fundamental principles of autonomy and non-maleficence (do no harm) that are crucial for ethical research. Large EHR databases can be targets for cybercriminals, leading to breaches that can be used for identity theft or extortion.
      Sincerely,
      Fengyan
