Notes from module 10 of the Interprofessional Health Informatics course I’m working on (plus side reading that I did to fill in some blanks/learn more about some things mentioned in the course).
- Big Data = data sets that are too large to manipulate with traditional methods
- there are terabytes and terabytes of patient data being collected
- data is also collected from instruments, devices, sensors, social media, mobile technologies, etc.
- see notes re: eScience
- methods: data mining, data visualization – help you generate hypotheses from the data
- neural networks – e.g., can help with pattern recognition
- 3 big projects:
- Exploring and Understanding Adverse Drug Reactions project
- computer system to detect ADEs (adverse drug events)
- EHRs in 4 EU countries
- analyzing the EHRs for signals, i.e., combos of drugs & adverse events (AEs)
- Exploring the Frontier of EHR Surveillance: The Case of Postop Complications
- data mining
- replicated a Norwegian study of heart disease risk by data mining EHRs
- 3 months (vs. 13 years) and gave more precise results
- eScience Challenges
- how do we codify and represent our knowledge?
- ontologies provide a common scheme for how to organize, reorganize, and share data
- digital infrastructure for data capture, standardized data elements and transfer, data analysis, and dissemination, as well as research funding, are all needed
- ontology-based search – enabled by structured data (toy example below)
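- side note (mine, not from the course): a toy Python sketch of what ontology-based search over structured data buys you – a query for a parent concept also matches records coded with any of its descendant concepts (the mini disease hierarchy is hypothetical):

```python
# Toy sketch, not from the course: a hypothetical mini disease hierarchy,
# mapping each concept to its parent concept.
ontology = {
    "myocardial infarction": "ischemic heart disease",
    "angina": "ischemic heart disease",
    "ischemic heart disease": "heart disease",
    "arrhythmia": "heart disease",
}

def is_a(concept, ancestor):
    """Walk up the hierarchy to see if `concept` falls under `ancestor`."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ontology.get(concept)
    return False

# Structured, ontology-coded records: a search for "heart disease"
# also finds patients coded with the more specific child concepts.
records = [("pt1", "angina"), ("pt2", "arrhythmia"), ("pt3", "asthma")]
print([pt for pt, dx in records if is_a(dx, "heart disease")])  # ['pt1', 'pt2']
```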
- these 4 sets of data (which I Googled and found are known as “Anscombe’s quartet”) have identical summary statistics (i.e., the mean and variance of x are the same for all 4 sets, and the mean and variance of y are the same for all 4 sets)
- for all 4 sets:
- mean of x = 9
- mean of y = 7.50
- variance of x = 11
- variance of y = 4.12
- correlation between x and y = 0.816
- linear regression line: y = 3 + 0.5x
- so they are “statistically identical” but when you graph them, you see they are quite different! (quick check below)
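- since I was curious, a quick check of those numbers in Python (numpy; the quartet values are from Anscombe’s 1973 paper):

```python
import numpy as np

# Anscombe's quartet: sets I-III share the same x values
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

for name, (x, y) in quartet.items():
    x, y = np.array(x), np.array(y)
    slope, intercept = np.polyfit(x, y, 1)  # least-squares regression line
    print(f"set {name}: mean x = {x.mean():.2f}, mean y = {y.mean():.2f}, "
          f"var x = {x.var(ddof=1):.2f}, var y = {y.var(ddof=1):.2f}, "
          f"r = {np.corrcoef(x, y)[0, 1]:.3f}, y = {intercept:.2f} + {slope:.2f}x")
```

- all 4 sets print (nearly) the same summary statistics, but scatterplot them and they look completely different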
- data visualization isn’t new:
- e.g., John Snow’s cholera map; Florence Nightingale did lots of graphs
- reading visualizations:
- Perception: the low-level activity of seeing the visual aspects of a display
- Cognition: the higher-level process of interpreting the display and translating it into meaning
- The challenge: using what we know about perception and cognition to make visualizations better
- research has been done on which things we can perceive more quickly (e.g., comparing positions along a 2D line is quicker than comparing areas of shapes, which is quicker than comparing volumes of 3D objects)
- cognitive burden – how hard is it for us to interpret the data (e.g., extract values, compare values, detect trends)
- when creating data visualizations, we should make the images as easy as possible to perceive and we should match the method of visualization to its purpose
- e.g., if you need to extract an exact value, use a table; but if you need to detect a trend in the data, use a line graph (if you want to get an exact value, it’s harder to do from a graph)
- key point: there isn’t one best way to display data – it depends on the purpose
- some tasks may require combinations
- many published guidelines with different aims
- persuasive graphs
- statistical graphs
- there is no general theory of data visualization
- suggested practices (general):
- for value extraction: table
- for proportions: pie charts, stacked bar charts
- for value comparison: bar charts, line graphs, scatterplots
- trend detection: line graphs
- use the design that minimizes the cognitive burden for the task at hand (small sketch below)
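- to make that concrete, a minimal matplotlib sketch (my own, with made-up monthly visit counts) matching the chart type to the task:

```python
import matplotlib.pyplot as plt

# Made-up monthly clinic visit counts, purely for illustration
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
visits = [112, 118, 125, 121, 134, 141]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3.5))

# Task: compare values month to month -> bar chart
ax1.bar(months, visits)
ax1.set_title("Value comparison: bar chart")

# Task: detect the overall trend -> line graph
ax2.plot(months, visits, marker="o")
ax2.set_title("Trend detection: line graph")

plt.tight_layout()
plt.show()
```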
- 3 questions to ask when designing a visualization:
- who is the intended audience?
- what is the goal? (e.g., exploration, education, persuasion)
- what are the data composed of, statistically? (e.g., continuous, categorical, time series)
- in addition to the graphs we commonly use (line graph, bar chart), there are some other types of graphs:
- a streamgraph: “a type of stacked area graph which is displaced around a central axis, resulting in a flowing, organic shape.” (Wikipedia) – rough sketch below
“LastGraph example” by Psychonaut – Own work. Licensed under CC0 via Commons.
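- side note: you can get the streamgraph look in Python – matplotlib’s stackplot has a baseline="wiggle" option that displaces the stack around a central axis (the series below are made up):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.arange(50)
# Three made-up, smoothed series (think: listens per artist over time)
ys = [np.convolve(rng.random(50), np.ones(8) / 8, mode="same") for _ in range(3)]

# baseline="wiggle" displaces the stacked areas around a central axis
plt.stackplot(x, ys, baseline="wiggle")
plt.title("Streamgraph-style stacked area chart")
plt.show()
```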
- a sunburst graph: “used to visualize hierarchical data, depicted by concentric circles” (Wikipedia) – rough sketch below
“Disk usage (Baobab)” by w:Baobab (software). Licensed under CC0 via Commons.
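- side note: plotly express has a ready-made sunburst chart; a sketch with a hypothetical mini file tree (in the spirit of Baobab’s disk usage view):

```python
import plotly.express as px

# Hypothetical file-system-style hierarchy: each node names its parent;
# leaf values are sizes, inner rings get sized from their children.
names   = ["root", "home", "var", "docs", "music", "logs"]
parents = ["",     "root", "root", "home", "home",  "var"]
values  = [0,      0,      0,      40,     25,      35]

fig = px.sunburst(names=names, parents=parents, values=values)
fig.show()
```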
- you can do a lot with simple tools like Excel, but there is also more advanced software to do even cooler things:
- e.g., Tableau is commercial software that makes it easy to do data visualization; it uses knowledge of data visualization best practices to suggest what to do; it’s quite expensive and requires some training
- e.g., ggplot2 for R (free) – builds on basic graphing in R and allows you to do things more easily; very sophisticated graphs (taste of the style below)
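- ggplot2 itself is R-only, but plotnine is a Python port of the same “grammar of graphics”, so here’s a taste of the layered style (data made up):

```python
import pandas as pd
from plotnine import ggplot, aes, geom_point, geom_smooth

# Made-up data, just to show the layered grammar: data + aesthetics + geoms
df = pd.DataFrame({"x": range(10), "y": [1, 3, 2, 5, 4, 6, 7, 6, 9, 10]})

plot = (ggplot(df, aes(x="x", y="y"))
        + geom_point()                 # scatter layer
        + geom_smooth(method="lm"))    # add a fitted regression line layer
plot.save("grammar_of_graphics_demo.png")
```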
And with that, I’ve completed my first ever Coursera course!