Informatics Research Seminar: Statistical Approaches to Electronic Health Data: Identifying and Quantitating Disease

November 12 @ 4:00 – 5:00 pm


Speaker: Joseph E. Lucas, PhD
Presented from Duke University

Broadcast Link: Seminar



The systematic collection of electronic medical records is creating new opportunities throughout healthcare. Applications based on EHR data have the potential to disrupt the way physicians and health care systems interact with patients, the way clinical research is conducted, and the ability of regulators to encourage medical practice focused on patient health. However, medical records are complicated and messy, incorporating many different types of data that often contain mistakes or holes and are sometimes subject to abrupt changes in structure and content.

A statistical approach to jointly analyzing text, categorical, and continuous data to model a patient’s disease state is presented. This approach allows for the collection of multiple sources of evidence into a single coherent picture of disease. A parametric family of curves is used to describe the progression of health and disease through time (as recorded in the medical record). This allows different health outcomes to be tied together as a single picture of health, such as the onset of Alzheimer’s disease and future admission into skilled nursing facilities. Aspects of this approach applied to several EHR data sets will be demonstrated.


Joseph E. Lucas, PhD conducts research focused on analytics at the intersection of biology/medicine and large, complex sets of data. His statistical expertise includes variable selection, factor modeling, experimental design, and topic models. His domain expertise includes proteomics, metabolomics, genomics, personalized medicine, biosignature discovery, and electronic health records. He has worked with collaborators on discoveries relating to cancer, infectious disease, radiation toxicity, and environmental exposure.

Before joining the Information Initiative at Duke, Dr. Lucas spent six years as a professor in the Institute for Genome Sciences and Policy at Duke University and worked at Quintiles as a Senior Director in the Predictive Analytics group. He is an Adjunct Research Associate Professor in the Department of Statistical Science at Duke University. He obtained a PhD in statistics from Duke University and an MA in both Mathematics and Computer Science from the University of Pennsylvania. He earned an undergraduate degree in Applied Math/Biology from Brown University.