Effect Sizes

University of San Francisco, MSMI-603

Matt Meister

2022-12-31

Thus far

So far, we have learned about:

  • Means
  • Variance
  • Statistical significance

(among other things)

Thus far

We’ve learned to say things like:

  • The difference in clicking between group A and B is 2%
    • And this is significant because p < .001
  • With every $10,000 increase in income, customers spend $25 more in our stores
    • And this slope is significantly different from 0 because p = .02
  • Customers who are 25-34 are more interested in our product than those who are 35-44
    • \(M_{25-34}\) = 4.85/6
    • \(M_{35-44}\) = 4.32/6
    • This might be due to chance, as p = .09

Thus far

Have we learned to say things like:

  • The difference in clicking between group A and B is 2%
    • This is a big difference?
  • With every $10,000 increase in income, customers spend $25 more in our stores
    • This is a big difference?
  • Customers who are 25-34 are more interested in our product than those who are 35-44
    • \(M_{25-34}\) = 4.85/6
    • \(M_{35-44}\) = 4.32/6
    • This is a big difference?

Thus far

No!

For the clearest example, let’s focus on the third:

  • The one that uses a 0-6 scale
  • What is a difference of .53 on a 0-6 scale?
    • Is that big?
    • Does it matter in this context?
    • To answer this, we are going to learn about effect sizes

Effect sizes

Effect sizes put our results into a standard format.

  • They do not tell us if our result is statistically significant or not.
    • We use them after that
  • They tell us about how big our results are
    • Again, in a standardized format

Effect sizes

Effect sizes put our results into a standard format.

There are two kinds of effect sizes, broadly:

  • Standardized differences
    • These give us a standardized way to say whether the difference between groups is big
  • Variance explained
    • These tell us whether some variable explains a lot or a little of our DV

Effect sizes

Effect sizes put our results into a standard format.

We will learn two today

  • Standardized differences
    • Cohen’s d
      • \(\frac{(M_A - M_B)}{SD_{AB}}\)
  • Variance explained
    • \(R^2\)
      • \(1 - \frac{SSR}{n - p - 1} \div \frac{SST}{n - 1}\)
    • These tell us whether some variable explains a lot or a little of our DV

Cohen’s d

\(\frac{(M_A - M_B)}{SD_{AB}}\)

  • \(M_A\): Mean of group A
  • \(M_B\): Mean of group B
  • \(SD_{AB}\): Pooled standard deviation
    • Simply averaging the two groups’ standard deviations is fine

This tells us how large the difference between the groups is, measured in units of the data’s standard deviation.
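
To make the formula concrete, here is a minimal R sketch that computes d by hand. groupA and groupB are hypothetical vectors of scores, not data from this course:

# Two hypothetical groups of scores (made up for illustration)
groupA <- c(5, 6, 4, 5, 6, 5)
groupB <- c(4, 4, 5, 3, 4, 4)

# Pooled SD: simply average the two groups' standard deviations
sd_pooled <- mean(c(sd(groupA), sd(groupB)))

# Cohen's d: the mean difference in units of the pooled SD
(mean(groupA) - mean(groupB)) / sd_pooled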

Cohen’s d - Examples

Heights of men and women in the US:

Are men and women different heights on average?

  • \(M_{Male}\) = 69 inches
  • \(M_{Female}\) = 64 inches
  • \(SD_{Height}\) = 2.75 inches
  • Cohen’s d?
    • 1.81

Cohen’s d - Examples

Heights of men and women in the US:

  • Cohen’s d = 1.82

Cohen’s d - Examples

Are people more aggressive toward individuals who have provoked them?

  • \(M_{Provoked}\) = 8.232/10
  • \(M_{Unprovoked}\) = 4.4/10
  • \(SD_{Aggression}\) = 3.22
  • Cohen’s d?
    • 1.19

Cohen’s d - Examples

Are people more aggressive toward individuals who have provoked them?

  • Cohen’s d = 1.19

Cohen’s d - Examples

Are people who are seen as more credible also more persuasive?

  • \(M_{Credible}\) = 5.42/10
  • \(M_{Not}\) = 4.76/10
  • \(SD_{Persuasion}\) = 3.29
  • Cohen’s d?
    • .20

Cohen’s d - Examples

Are people who are seen as more credible also more persuasive?

  • Cohen’s d = .20

Contextualize your effect sizes

Sometimes you can look to other research

  • Or benchmarks like the examples above

Sometimes you cannot

  • One good comparison is covariates: effects whose size you already have some intuition about

Contextualizing with covariates

Hypothesis: Mode of ordering (smartphone vs. desktop) will influence people’s portion choices

\(Portion\ Size = \beta_{Device} Device + \beta_{Hunger} Hunger + \beta_{Dieting} Dieting\)

…And use common sense…
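
As a minimal sketch of that comparison in R (the orders data frame and every column in it are hypothetical, simulated here only so the code runs):

# Hypothetical, simulated ordering data (not course data)
set.seed(1)
orders <- data.frame(
  device  = sample(c("smartphone", "desktop"), 200, replace = TRUE),
  hunger  = runif(200, 0, 10),
  dieting = runif(200, 0, 10)
)
orders$portion.size <- 2 + 0.4 * (orders$device == "smartphone") +
  0.3 * orders$hunger - 0.2 * orders$dieting + rnorm(200)

# Fit the model from the slide
m_portion <- lm(portion.size ~ device + hunger + dieting, data = orders)

# Contextualize: compare the size of the device coefficient to the hunger
# and dieting coefficients, effects we already have some feel for
coef(m_portion)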

R-Squared

\(1 - \frac{SSR}{n - p - 1} \div \frac{SST}{n - 1}\)

This tells you:

  • For an entire model, how much of all of the variance you are explaining
    • We get this result from lm()
  • For each individual effect, how much of all of the variance it explains
    • We can get this result from anova()

R-Squared

From anova()

customerData <- read.csv('customerData.csv')

m_1 <- lm( data = customerData, sat.service ~ 1) # Just the mean
m_2 <- lm( data = customerData, sat.service ~ email) # Effect of email
m_3 <- lm( data = customerData, sat.service ~ email + income) # Effect of email and income

anova(m_1, m_2, m_3)
Analysis of Variance Table

Model 1: sat.service ~ 1
Model 2: sat.service ~ email
Model 3: sat.service ~ email + income
  Res.Df     RSS Df Sum of Sq        F  Pr(>F)    
1    590 1187.70                                  
2    589 1179.40  1      8.30   5.9544 0.01497 *  
3    588  819.51  1    359.89 258.2261 < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-Squared

From anova()

Analysis of Variance Table

Model 1: sat.service ~ 1
Model 2: sat.service ~ email
Model 3: sat.service ~ email + income
  Res.Df     RSS Df Sum of Sq        F  Pr(>F)    
1    590 1187.70                                  
2    589 1179.40  1      8.30   5.9544 0.01497 *  
3    588  819.51  1    359.89 258.2261 < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • \(R^2_{email}\)?
    • \(1 - \frac{1179.40}{591 - 1 - 1} \div \frac{1187.70}{591 - 1}\)
    • .005
  • \(R^2_{income}\)?
    • \(1 - \frac{819.51}{591 - 2 - 1} \div \frac{1187.70}{591 - 1}\)
    • .308

R-Squared

From lm()

summary(m_2)

Call:
lm(formula = sat.service ~ email, data = customerData)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.9813 -0.7347  0.0187  1.0187  4.0187 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.98131    0.09673  41.159   <2e-16 ***
emailyes    -0.24656    0.12111  -2.036   0.0422 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.415 on 589 degrees of freedom
  (409 observations deleted due to missingness)
Multiple R-squared:  0.006987,  Adjusted R-squared:  0.005301 
F-statistic: 4.144 on 1 and 589 DF,  p-value: 0.04222

Effect Size Conclusion

  • There are lots of effect size measures out there
  • They are useful because they let us put our effects in context
  • They come in two forms:
    • Standardized differences
      • These give us a standardized way to say whether the difference between groups is big
    • Variance explained
      • These tell us whether some variable explains a lot or a little of our DV