1. Read the case, fill out the rest of the table, and follow the instructions.
2. Fill out the table about diabetic devices.
Introduction to Data Analysis
and Basic Statistical Concepts
Inna Miroshnyk, PhD
New Curriculum/Spring 2022
Descriptive Statistics
Summarizing, organizing, and presenting data
Learning Objectives (Part II)
❑ Describe and calculate the measures of central tendency of data sets
❑ mean
❑ median
❑ mode
❑ Define and calculate the measures of dispersion
❑ range and IQR
❑ Variance/CV
❑ SD
❑ Describe the measures of distribution shape
❑ Compare and contrast normal and skewed distributions
❑ Differentiate between and calculate the measures of risk
❑ Incidence, prevalence, mortality rate
❑ Define and calculate performance measures for diagnostic tests
❑ Sensitivity, specificity, accuracy, positive/negative predictive values
❑ Organize and present data in a scientifically meaningful way
Spring 2021-2022
Measures of Central Tendencies / Central
Location
❑ Mean (average)
(appropriate for interval and ratio levels of data
measurement)
❑ Median
(the value in the middle of the ordered list)
• 20 20 20 38 42 42 51 68
Most appropriate for data sets with outliers
❑ Mode
(most frequently occurring value)*
*Rarely used in medical research
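As a quick check, the three measures can be computed for the data set used in the median example (20 20 20 38 42 42 51 68). This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative only.

```python
from statistics import mean, median, mode

# Data set from the median example on this slide
values = [20, 20, 20, 38, 42, 42, 51, 68]

central = {
    "mean": mean(values),      # arithmetic average: sum of values / number of values
    "median": median(values),  # average of the two middle values (38 and 42)
    "mode": mode(values),      # most frequently occurring value
}

for name, value in central.items():
    print(f"{name}: {value}")
```

Here the mean and median sit close together while the mode (20) falls at the low end, which illustrates why the three measures are usually reported with some thought about the shape of the data.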
How to Measure the Variability between
Two Groups?
Mean 1 = Mean 2
?
Measures of Variability/Dispersion
Describe how data are spread
Range
• Used for any numerical data
Interquartile Range (IQR)
• Used for any numerical data
Variance
• Used for continuous & some discrete data
Standard deviation (SD)
• Used for continuous & some discrete data
Measures of Variability: Range
Range = MAX value – MIN value
Range 1 = (202 -170) = 32
Range 2 = (235 -140) = 95
Interquartile Range (IQR)
The range restricted to values within the middle 50% of the distribution.
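IQR = Q3 – Q1
Each quartile holds 25% of the values; the IQR spans the middle 50%.
Ordered data: 20 20 20 38 42 42 51 68
Lower half: 20 20 20 38     Upper half: 42 42 51 68
Min = 20, Max = 68
Median = Q2 = (38 + 42) / 2 = 40
Q1 = (20 + 20) / 2 = 20
Q3 = (42 + 51) / 2 = 46.5
• Range = 68 – 20 = 48
• IQR = 46.5 – 20 = 26.5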
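The range and IQR for this data set can be reproduced with a short sketch. It assumes the "median of each half" method that the slide appears to use; the helper `quartiles_by_halves` is a hypothetical name, and its handling of odd-sized data sets (dropping the middle value from both halves) is an assumption, since the slide only shows an even-sized example.

```python
from statistics import median

def quartiles_by_halves(data):
    """Return (Q1, Q2, Q3) using the median of the lower and upper halves.

    Assumption: for an odd number of values, the middle value is excluded
    from both halves.
    """
    if len(data) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]    # lower half of the ordered list
    upper = ordered[-half:]   # upper half of the ordered list
    return median(lower), median(ordered), median(upper)

values = [20, 20, 20, 38, 42, 42, 51, 68]
q1, q2, q3 = quartiles_by_halves(values)
data_range = max(values) - min(values)
iqr = q3 - q1

print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}")
print(f"Range = {data_range}, IQR = {iqr}")
```

Statistical packages often compute quartiles by interpolation instead, so their Q1 and Q3 may differ slightly from the hand calculation.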
Standard Deviation & Variance
Variance (σ²)
Quantifies the spread around the mean
σ² = Σ(Data Value − Mean)² / Total # of observations
σ² = SD²
Standard Deviation (SD)
The “average” deviation of all values from the sample mean
SD = √σ² = √[ Σ(Data Value − Mean)² / Total # of observations ]
Same units as the original data
❑ Coefficient of variation (CV) shows the extent of variability relative to the mean
CV = SD / mean
Mean = 37.6
Standard Deviation (SD) = 17.2
Variance = 296
CV = 0.46
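The figures above can be checked against the data set from the median and IQR examples (20 20 20 38 42 42 51 68). The sketch below uses Python's standard `statistics` module and computes both the N-denominator version (as in the formula on this slide) and the n − 1 (sample) version. For this data set, the reported SD of 17.2 and variance of 296 match the n − 1 version.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

values = [20, 20, 20, 38, 42, 42, 51, 68]
m = mean(values)

# Denominator N: matches "Total # of observations" in the slide formula
pop_var = pvariance(values)
pop_sd = pstdev(values)

# Denominator n - 1: the usual sample variance and sample SD
samp_var = variance(values)
samp_sd = stdev(values)

# Coefficient of variation, here based on the sample SD
cv = samp_sd / m

print(f"mean = {m:.1f}")
print(f"N denominator:     variance = {pop_var:.0f}, SD = {pop_sd:.1f}")
print(f"n - 1 denominator: variance = {samp_var:.0f}, SD = {samp_sd:.1f}")
print(f"CV = {cv:.2f}")
```

The two versions converge as the number of observations grows; with only eight values the difference is noticeable, so it helps to state which denominator was used.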
Calculation of the Standard Deviation
x        x̄ (the mean)   x − x̄    (x − x̄)²
101.8    103            -1.2     1.44
103.2    103             0.2     0.04
104.0    103             1.0     1.00
102.5    103            -0.5     0.25
103.5    103             0.5     0.25
Σ x = 515                        Σ (x − x̄)² = 2.98
SD = √( Σ (x − x̄)² / N ) = √( 2.98 / 5 ) = 0.77
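The same table can be generated step by step. This is a minimal sketch that follows the slide's formula with N in the denominator; the variable names are illustrative.

```python
from math import sqrt

readings = [101.8, 103.2, 104.0, 102.5, 103.5]  # the x column of the table
n = len(readings)
x_bar = sum(readings) / n                       # the mean

squared_devs = []
print(f"{'x':>6} {'mean':>6} {'x-mean':>7} {'(x-mean)^2':>11}")
for x in readings:
    dev = x - x_bar               # deviation from the mean
    squared_devs.append(dev ** 2)
    print(f"{x:6.1f} {x_bar:6.1f} {dev:7.1f} {dev ** 2:11.2f}")

sum_sq = sum(squared_devs)        # sum of squared deviations
sd = sqrt(sum_sq / n)             # slide formula: divide by N, then take the root

print(f"sum of squared deviations = {sum_sq:.2f}")
print(f"SD = {sd:.2f}")
```

Dividing by n − 1 instead of N would give a slightly larger value (about 0.86 for these five readings).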
Give it Some Thought!
Does this study properly describe the dose given?
Measures of Distribution Shape
Does Distribution Shape Matter?
Skewness (asymmetry)
= uneven distribution of the data around
the mean
Normal (Gaussian) Distribution
Symmetrical, bell-shaped curve
50% of values are on the left side of the mean and 50% are on the right side
Mean = Median = Mode
The skewness and (excess) kurtosis are zero
Area under the curve (AUC) = 1
Horizontal axis: # of SDs from the mean
Skewed Distributions
Mean, median, and mode are NOT equal
Reasons: small sample size or outliers (extreme values)
Skew refers to the direction of the tail
Negative (left) skewness due to outliers
More high values (mean < median < mode)
True positive (right) skewness
More low values (mean > median > mode)
Skewed distributions may need to be transformed to approximately normal before further analysis that assumes normality
Median is the preferable measure of central tendency for skewed distributions
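The relationship between tail direction, mean, and median can be illustrated with two small data sets. Both data sets are hypothetical (not from the slides), and `skewness` is an illustrative helper that computes the moment-based coefficient g1 = m3 / m2^1.5; other software may apply a small-sample correction and report slightly different values.

```python
from statistics import mean, median

def skewness(data):
    """Moment-based skewness: third central moment / (second central moment)^1.5."""
    m = mean(data)
    n = len(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# Hypothetical data: one extreme high value pulls the tail to the right
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 20]
# Mirror image of the same data: one extreme low value pulls the tail to the left
left_skewed = [25 - x for x in right_skewed]

for label, data in [("right (positive)", right_skewed),
                    ("left (negative)", left_skewed)]:
    print(f"{label}: skewness = {skewness(data):.2f}, "
          f"mean = {mean(data):.2f}, median = {median(data)}")
```

In both cases the mean is dragged toward the outlier while the median barely moves, which is the reason the median is preferred for skewed data.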
Measures of Risk in Clinical
Research
Epidemiology of Diabetes in the US
Incidence and Prevalence as Morbidity
Measures
❑ Morbidity is defined as any departure, subjective or objective, from a
state of physiological or psychological well-being.
❑ disease
❑ injury
❑ disability
❑ Measures of morbidity frequency
❑ Incidence
❑ Prevalence
Prevalence as Morbidity Measure
Prevalence
Measures # of existing cases (new and preexisting) at a
particular point in time
Point Prevalence (Prevalence Rate) = # of existing cases at a specified point in time / population at the same specified point in time
Incidence as Morbidity Measure
Incidence
Measures the number of new cases of a disease during a
given period
Incidence proportion is a measure of the risk of disease or the
probability of developing the disease during the specified period.
Incidence Proportion (Risk) = # of NEW cases during a specified period of time / population at start of the time interval
Global
Epidemiology
of COVID-19:
MORTALITY
Mortality Rate = # of individuals who died during a specified period of time / population at start of the time interval
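The prevalence, incidence, and mortality formulas all have the same "cases divided by population" shape, which the sketch below makes explicit. All counts are hypothetical and chosen only to make the arithmetic easy to follow; `proportion` is an illustrative helper, and the `per` multiplier is an assumption added to show how rates are often reported per 1,000 people.

```python
def proportion(numerator, denominator, per=1):
    """Return numerator / denominator, optionally scaled (e.g. per 1,000 people)."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return per * numerator / denominator

# Hypothetical counts for one community over one year
population_at_start = 10_000   # population at the start of the interval
population_today = 10_000      # population at the chosen point in time
existing_cases_today = 1_100   # new and preexisting cases at that point in time
new_cases_this_year = 150      # cases first diagnosed during the year
deaths_this_year = 80          # deaths during the year

point_prevalence = proportion(existing_cases_today, population_today)
incidence_proportion = proportion(new_cases_this_year, population_at_start)
mortality_per_1000 = proportion(deaths_this_year, population_at_start, per=1_000)

print(f"Point prevalence:     {point_prevalence:.1%}")
print(f"Incidence proportion: {incidence_proportion:.1%}")
print(f"Mortality rate:       {mortality_per_1000:.1f} per 1,000")
```

The only things that change between the three measures are which events are counted in the numerator and which population, at which time, is used as the denominator.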
Diagnostic Tests and Their
Performance Measures
Real-World Performance of COVID-19 Rapid
Antigen Tests
https://asm.org/Articles/2021/December/Real-World-Performance-of-COVID-19-Rapid-Antigen-T
Test Performance Measures
GOAL: ↑True positives / ↓ False-positives
1. Sensitivity or true positive rate
Test ability to correctly identify individuals WITH disease
calculated as
proportion of individuals with the disease who are correctly identified
by the test
Sensitivity = # of true positive (+) tests / # of ALL patients with the disease
Typical sensitivity ~ 80%
Test Performance Measures (cont’d)
2. Specificity or true negative rate
Test ability to correctly identify individuals WITHOUT disease
calculated as
proportion of individuals without the disease who are correctly identified by the test
Specificity = # of true negative (−) tests / # of ALL patients without the disease
Typical specificity ~ 90%
Predictive values depend on the prevalence of the disease; sensitivity and specificity are generally treated as properties of the test itself.
Test Performance Measures (cont’d)
3. Accuracy
proportion of all tests that are correctly classified
Accuracy = (# of true positive (+) tests + # of true negative (−) tests) / # of ALL patients tested
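Sensitivity, specificity, and accuracy can all be read off the same 2×2 table of test results. The sketch below uses hypothetical counts chosen to match the "typical" 80% sensitivity and 90% specificity quoted on the slides; the class name `ConfusionCounts` and its fields are illustrative.

```python
from dataclasses import dataclass

@dataclass
class ConfusionCounts:
    tp: int  # true positives: diseased, test positive
    fp: int  # false positives: not diseased, test positive
    fn: int  # false negatives: diseased, test negative
    tn: int  # true negatives: not diseased, test negative

    def sensitivity(self):
        # true positives / all patients with the disease
        return self.tp / (self.tp + self.fn)

    def specificity(self):
        # true negatives / all patients without the disease
        return self.tn / (self.tn + self.fp)

    def accuracy(self):
        # correct results / all patients tested
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

# Hypothetical study: 1,000 patients tested, 100 of whom have the disease
counts = ConfusionCounts(tp=80, fp=90, fn=20, tn=810)

print(f"Sensitivity: {counts.sensitivity():.0%}")
print(f"Specificity: {counts.specificity():.0%}")
print(f"Accuracy:    {counts.accuracy():.0%}")
```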
Test Performance Measures (cont’d)
4. Predictive Value
Shows how likely it is that the tested individual does/does not have
the disease
Positive predictive value
Probability that a positively tested patient has the disease
PPV = # of true positive (+) tests / (# of true positive tests + # of false positive tests)
Negative predictive value
Probability that a negatively tested patient does not have the disease
NPV = # of true negative (−) tests / (# of true negative tests + # of false negative tests)
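Because PPV and NPV are computed from the people who test positive or negative, they change with how common the disease is in the tested group. The sketch below rewrites the two formulas in terms of proportions and evaluates them at three hypothetical prevalence values, holding sensitivity at 80% and specificity at 90% (the "typical" figures from the earlier slides). The function name and prevalence values are assumptions for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a group with the given prevalence."""
    tp = sensitivity * prevalence                # diseased and test positive
    fn = (1 - sensitivity) * prevalence          # diseased but test negative
    tn = specificity * (1 - prevalence)          # healthy and test negative
    fp = (1 - specificity) * (1 - prevalence)    # healthy but test positive
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv

results = {}
for prevalence in (0.01, 0.10, 0.50):            # hypothetical prevalence values
    results[prevalence] = predictive_values(0.80, 0.90, prevalence)
    ppv, npv = results[prevalence]
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

With the same test, a positive result is far more convincing when the disease is common than when it is rare, while the opposite holds for a negative result.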
Visual Representation of Data
Tables, Plots, Graphs, and Charts
Examples of Frequency Table
• Lack of labels and
description of
units
• Clear organization
+ appropriate
description
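As an illustration of the second point, a frequency table with labeled columns and units can be produced from raw responses. The responses below are hypothetical, and the category order is an assumption; the sketch uses only the standard library.

```python
from collections import Counter

# Hypothetical survey responses (N = 20)
responses = (["Always"] * 4 + ["Often"] * 8 +
             ["Sometimes"] * 5 + ["Never"] * 3)
category_order = ["Always", "Often", "Sometimes", "Never"]

counts = Counter(responses)
total = sum(counts.values())

# One row per category: label, count, and percent of all responses
rows = [(cat, counts[cat], 100 * counts[cat] / total) for cat in category_order]

print(f"{'Response':<12}{'Count (n)':>10}{'Percent (%)':>13}")
for category, n, percent in rows:
    print(f"{category:<12}{n:>10}{percent:>13.1f}")
print(f"{'Total':<12}{total:>10}{100.0:>13.1f}")
```

Stating the units in the column headers and including a total row addresses the "lack of labels and description of units" problem noted above.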
Example Box and Whisker Plot
Used to present the
range/spread of data
❑ “Box” part = IQR
❑ Line inside the box
represents the median
❑ The “whiskers” mark the
min and max values
Example Bar
Chart
Visualizes the ordinal
data (categorical &
discrete) for a
question that used a
Likert-type scale
How often does your pharmacist offer to provide information about the
prescriptions you fill? (N = 100)
Example Histogram
Mammography screening data
Visualizes the
continuous data by
venue over time
Commonly used to
display the frequency
distribution of a single
interval or ratio variable.
Example Pie
Chart
Visualizes the
proportions or relative
quantities of values
Criteria to consider:
–
Useful for a small # of
categories
–
Must be clearly
labeled and colored
(or legend must be
used)
–
Logical dividing of
data
Recap
Data Variables
Four Data Measurement Levels
Descriptive Statistics
• Measures of Central Tendency (Mean, Median, Mode)
• Measures of Dispersion (Range/IQR, SD and variance)
• Measures of Distribution shape (Normal vs Skewed)
Measures of Risk
• Morbidity & Mortality
Performance Measures of Diagnostic tests
• Sensitivity
• Specificity
• Accuracy
• Predictive values
Visual Data Representation
Students Names: ___________________________
_______________________
____________________________
_________________________
____________________________
Date: _________________
Total Points (total: 84 points): ___________________
Complete the SOAP note PRIOR to the laboratory session and be able to present every category
in the exact order listed in the rubric below. Every group member must present information to get
credit. Your group has a maximum of 15 minutes to present the information. Review the rubric
and the entire group must practice together in advance of the session. You can write your
answers in the sections listed below. Your grade will be based on your group’s presentation and
not the written material (the assignment does not need to be submitted).
Criteria: Full Credit (2 points), Half Credit (1 point), No Credit (0 points)
Subjective Section
· Chief complaint
– Find out results of blood work
· HPI
– 65-year-old female
– Caucasian
· PMH
– Type 2 DM
– Stroke
· Family history
– Mother and father 85 YO
– Both still alive
– Both have DM, HTN, and hyperlipidemia
· Surgical/social history
– Walks 150 min/week (with resistance exercises 2 times/week) and eats fried foods a couple of times a month
Objective Section
· Allergies
– NKDA
· Immunization history
– Needs pneumonia vaccine
– All others up to date
· Medication list
– Metformin 1000 mg twice a day
– ASA 81 mg once a day
– Lisinopril 10 mg once a day
· ROS/physical exam
· Vital signs
– BP: 134/96
– Pulse: 80 bpm
– Height: 5’3’’
– Weight: 160 lbs
– Temp: 98.6 degrees F
– BMI 28.3
· Labs/diagnostic test results
– Fasting blood glucose: 150 mg/dL
– HDL: 43 mg/dL
– LDL: 150 mg/dL
– TG: 200 mg/dL
– Total cholesterol: 243 mg/dL
– ALT: 16
– AST: 18
– Urine albumin excretion: 35 mcg/mg creatinine (same result in 2 out of 3 samples in a 3-6 month period)
– Na: 140
– K: 4.0
– Mg: 2.0
– Ca: 9.5
– Albumin: 4.0
– Scr: 1.1 (normal range for female)
– BUN: 12
– HCO3: 20
– HbA1c: 7.9%
– TSH: WNL
– GFR: 65 mL/min/1.73 m²
Assessment and Problem List_Nellie A.
· Correctly identifies primary problem
Type 2 diabetes mellitus
· Characterizes (controlled, uncontrolled, etc.) and
provides appropriate action (requiring therapy
initiation, therapy continuation, etc.) for primary
problem
AM has uncontrolled type 2 DM; both of her parents also have DM.
Because the patient is already on metformin, I would suggest
adding a GLP-1 receptor agonist. Since the patient would prefer
a once-weekly medication, I would suggest Ozempic (semaglutide).
· Correctly identifies secondary problems, if
applicable
AM also has HTN and a history of stroke
· Characterizes (controlled, uncontrolled, etc.)
and provides appropriate action (requiring therapy
initiation, therapy continuation, etc.) for secondary
problems
AM has uncontrolled HTN, which requires additional
therapy (she is currently only on lisinopril)
· Correctly identifies a second secondary
problem, if applicable
AM has hyperlipidemia (she could exercise more,
add more fiber to her diet, and eat
healthier)
· Characterizes (controlled, uncontrolled, etc.) and
provides appropriate action (requiring therapy
initiation, therapy continuation, etc.) for the
secondary problem
Because she has hyperlipidemia and a history of stroke and is not on a statin,
her doctor should initiate statin therapy
immediately.
Plan for Primary Problem_Nellie A.
· Includes three SMART goals of therapy (need to
include time frame for every goal)
Since her A1C is uncontrolled, it needs to be
lowered to under 7%; measure every 3 months until
therapy changes or she is at goal.
Fasting blood glucose needs to be lowered to under
126 mg/dL
Frequent eye examination instead
Monitor blood pressure