Informatics, Big Data, and Data Visualization

Notes from module 10 of the Interprofessional Health Informatics course I’m working on (plus side reading that I did to fill in some blanks/learn more about some things mentioned in the course).

Big Data = data sets that are too large to manipulate with traditional methods
there are terabytes and terabytes of patient data being collected
data is also collected from instruments, devices, sensors, social media, mobile technologies, etc.
see notes re: eScience
methods: data mining, data visualization – helps you generate hypotheses from the data
neural networks – e.g., can help with pattern recognition
3 big projects:

Exploring and Understanding Adverse Drug Reactions project
- computer system to detect ADE
- EHRs in 4 EU countries
- analyzing the EHRs for signals, combos of drugs & AEs
Exploring the Frontier of EHR Surveillance: The Case of Postop Complications
- data mining
MetroHealth
- replicated Norwegian study of heart disease risk by data mining EHRs
- took 3 months (vs. 13 years) and gave more precise results

eScience Challenges
- how do we codify and represent our knowledge?
- ontologies provide common scheme on how to organize, reorganize, and share data
digital infrastructure for data capture, standardized data elements and transfer, data analysis, dissemination, and research funding are all needed
ontology-based search – allowed by structured data

Data Visualization

these 4 sets of data (which I Googled and found are known as “Anscombe’s quartet”) have identical summary statistics (i.e., the mean and standard deviation of x are the same for all 4 sets, and the mean and standard deviation of y are the same for all 4 sets)


I		II		III		IV
x	y	x	y	x	y	x	y
10.0	8.04	10.0	9.14	10.0	7.46	8.0	6.58
8.0	6.95	8.0	8.14	8.0	6.77	8.0	5.76
13.0	7.58	13.0	8.74	13.0	12.74	8.0	7.71
9.0	8.81	9.0	8.77	9.0	7.11	8.0	8.84
11.0	8.33	11.0	9.26	11.0	7.81	8.0	8.47
14.0	9.96	14.0	8.10	14.0	8.84	8.0	7.04
6.0	7.24	6.0	6.13	6.0	6.08	8.0	5.25
4.0	4.26	4.0	3.10	4.0	5.39	19.0	12.50
12.0	10.84	12.0	9.13	12.0	8.15	8.0	5.56
7.0	4.82	7.0	7.26	7.0	6.42	8.0	7.91
5.0	5.68	5.0	4.74	5.0	5.73	8.0	6.89

for all 4 sets:
- mean of x = 9
- mean of y = 7.50
- variance of x = 11
- variance of y = 4.12
- correlation between x and y = 0.816
- linear regression equation: y = 3 + 0.5x
so they are “statistically identical” but when you graph them, you see they are quite different!
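To check this for myself, here is a minimal Python sketch that computes the summary statistics for each set from the table above. It uses only the standard library; the names `QUARTET` and `summarize` are my own, and I'm assuming sample variance (dividing by n - 1) and an ordinary least-squares fit.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet, copied from the table above.
X_SHARED = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
QUARTET = {
    "I": (X_SHARED,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 19.0, 8.0, 8.0, 8.0],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return summary statistics and the least-squares line for one data set."""
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    var_x, var_y = variance(xs), variance(ys)  # sample variance (n - 1)
    # Sample covariance between x and y.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    slope = cov_xy / var_x
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": var_x,
        "var_y": var_y,
        "corr": cov_xy / sqrt(var_x * var_y),  # Pearson correlation
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


for name, (xs, ys) in QUARTET.items():
    stats = summarize(xs, ys)
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

The four printed rows agree to about two decimal places, which is exactly why the summary numbers alone can't tell the sets apart.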

“Anscombe’s quartet 3” by Anscombe.svg: Schutz
derivative work (label using subscripts): Avenue (talk) – Anscombe.svg. Licensed under CC BY-SA 3.0 via Commons.

data visualization isn’t new:
- e.g., John Snow’s cholera map; Florence Nightingale did lots of graphs
reading visualizations:
- Perception: the low-level activity of sensing the visual aspects of a display
- Cognition: the higher-level process of interpreting the display and translating it into meaning
- The challenge: using what we know about perception and cognition to make visualizations better
research has been done on which things we can perceive more quickly (e.g., comparing lengths along a line is quicker than comparing areas of shapes, which is quicker than comparing volumes of 3D objects)
cognitive burden – how hard is it for us to interpret the data (e.g., extract values, compare values, detect trends)
when creating data visualizations, we should make the images as easy to perceive as possible and we should match the method of visualization with its purpose
- e.g., if you need to extract an exact value, use a table; but if you need to detect a trend in the data, use a line graph (if you want to get an exact value, it’s harder to do from a graph)
key point: there isn’t one best way to display data – it depends on the purpose
some tasks may require combinations
many published guidelines with different aims
- persuasive graphs
- statistical graphs
there is no general theory of data visualization
suggested practices (general):
- for value extraction: table
- for proportions: pie charts, stacked bar charts
- for value comparison: bar charts, line graphs, scatterplots
- for trend detection: line graphs
- use the design which minimizes the cognitive burden for the task at hand
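The suggested practices above amount to a lookup from task to chart type, which can be sketched in a few lines of Python. This is just my own illustration, not something from the course: `SUGGESTED_CHARTS` and `suggest_charts` are made-up names, and the mapping only encodes the general suggestions listed here.

```python
# Task -> suggested chart types, taken from the list of general practices.
SUGGESTED_CHARTS = {
    "value extraction": ["table"],
    "proportions": ["pie chart", "stacked bar chart"],
    "value comparison": ["bar chart", "line graph", "scatterplot"],
    "trend detection": ["line graph"],
}


def suggest_charts(task):
    """Return the suggested chart types for a task (case-insensitive)."""
    key = task.strip().lower()
    if key not in SUGGESTED_CHARTS:
        raise ValueError(f"no suggestion recorded for task: {task!r}")
    return SUGGESTED_CHARTS[key]


print(suggest_charts("Trend detection"))
```

Some tasks may need a combination of displays, so treat the result as a starting point rather than a rule.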
3 questions to ask when designing a visualization:
- who is the intended audience?
- what is the goal? (e.g., exploration, education, persuasion)
- what are the data composed of, statistically? (e.g., continuous, categorical, time series)
in addition to the graphs we commonly use (line graph, bar chart), there are some other types of graphs:
a streamgraph: “a type of stacked area graph which is displaced around a central axis, resulting in a flowing, organic shape.” (Wikipedia)

“LastGraph example” by Psychonaut – Own work. Licensed under CC0 via Commons.

a sunburst graph: “used to visualize hierarchical data, depicted by concentric circles” (Wikipedia)

“Disk usage (Boabab)” by w:Baobab (software). Licensed under CC0 via Commons.

you can do a lot with simple tools like Excel, but there is also more advanced software to do even cooler things:
- e.g., Tableau is software (costs money) that makes it easy to do data visualization; it uses knowledge of best practices of data visualization to suggest what to do; it’s quite expensive and requires some training
- e.g., ggplot2 for R (free) – builds on basic graphing in R and allows you to do stuff more easily; very sophisticated graphs
- d3js.org – JavaScript library for manipulating documents based on data (allows you to make your graphics interactive and plug it right into your website) (free)

And with that, I’ve completed my first ever Coursera course!
