Informatics Research Seminar: Text-based Disease Classification of Medical Literature

February 11 @ 4:00 – 5:00 pm

Speaker: Angela Zoss
Presented from Duke University

Broadcast Link: Seminar



This project explored ways to use natural language terms and phrases to detect broad disease categories in the titles of articles from the PubMed database. As an early attempt, the project classified four million papers, written in five different languages over the last 50 years, into nine broad disease categories and visualized the results as flows and streams to explore the changing focus of medical research over time. The visualization received an award as a top student submission to the data visualization challenge at the ACM Web Science 2014 Conference. Future work will include refining disease detection with more sophisticated text and data mining, as well as developing new visual interfaces to the results.


Angela Zoss began work as Duke University’s first Data Visualization Coordinator in the summer of 2012. While helping to develop this new position at Duke, she has created new library workshops on visualization; hosted an annual student data visualization contest; consulted with students, researchers, and faculty members on research projects; and helped to introduce visualization concepts and tools into several undergraduate and graduate courses. She co-organizes a weekly talk series on visualization topics and is collaborating within and outside the Duke community to improve instructional and technical support for visualization projects. She holds a Master of Science in Communication from Cornell University and is pursuing a doctorate in Information Science from Indiana University, where she taught a semester-long graduate course in Information Visualization. Her research interests include scientometrics, network analysis, computational linguistics, and literacy issues surrounding information visualization.