Introduction

The Stroop Effect, named after the original researcher John Ridley Stroop, is a psychological phenomenon that demonstrates the effect of interference on reaction time. The basis of the experiment consists of participants reading aloud the color of each word from a list of colored words as fast and correctly as possible, with their final time being recorded. There are two trials, each with a unique condition: the first where the words are congruent and the second where the words are incongruent. Congruent words are words where the content of the word matches the color of the word: for example, RED, GREEN, BLUE. Incongruent words are words where the content of the word does not match the color of the word: for example RED, PURPLE, PINK. The amount of time it takes to complete each trial is measured and recorded for each participant.

In this project, the Stroop Effect will be explored again, but with a focus on the statistical context. The experiment will be run using a provided dataset and my own recorded times. The independent and dependent variables of this study will be identified and a hypothesis testing will also be conducted. Ultimately, the sample dataset will undergo a statistical test, and a conclusion will be reached based on the results.

Independent and Dependent variables

The independent variable for this experiment is the conditions of the words. They change from congruent to incongruent between the two trials.

The dependent variable is the amount of time it takes a participant to complete each trial, measured in seconds.

Null and Alternative Hypothesis

The null hypothesis for this experiment is that there will not be a statistical difference between the congruent and incongruent trial times.

The alternative hypothesis is that there will be a statistical difference between the congruent and incongruent trial times.

These hypotheses can be pictured mathematically below, with \(H_0\) being the null hypothesis, \(H_a\) being the alternative hypothesis, \(\mu_c\), being the congruent times, and \(\mu_i\) being the incongruent times:

\[ H_0 : \mu_c = \mu_i \\ H_a : \mu_c ≠ \mu_i \]

Statistical Test: Paired t-test

The statistical test being used for this experiment is a two-tailed paired sample t-test. The confidence level is set to 0.05. This type of t-test is chosen because the population parameters are unknown and the sample data are related to each other.. It is two-tailed because we are unsure whether the times will rise or fall between word conditions.

In order to perform a paired sample t-test, there are a couple requirements that must be met:

  1. Dependent variable must be continuous.
  2. Samples/groups must be related (Ex: Person 1 takes both congruent and incongruent test)
  3. Random sample from population
  4. Approximately normal distribution of the difference between paired values
  5. No outliers in difference between paired groups

To use the paired t-test on this dataset, it is assumed that the distribution of the difference between paired values is normal and there are no outliers. It is also assumed that the sample was taken randomly.

Loading the Dataset into R and Running the Experiment

The starting dataset consists of 24 paired observations to start with. This sample of records was retrieved from this site, provided by Udacity.

congruent <- c(12.079, 16.791, 9.564, 8.63, 14.669, 12.238, 14.692, 8.987,
               9.401, 14.48, 22.328, 15.298, 15.073, 16.929, 18.2, 12.13,
               18.495, 10.639, 11.344, 12.369, 12.944, 14.233, 19.71, 16.004)

incongruent <- c(19.278, 18.741, 21.214, 15.687, 22.803, 20.878, 24.572, 17.394,
                 20.762, 26.282, 24.524, 18.644, 17.51, 20.33, 35.255, 22.158,
                 25.139, 20.429, 17.425, 34.288, 23.894, 17.96, 22.058, 21.157)

stroop <- data.frame(congruent, incongruent)

With the bulk of the sample loaded into R, I now need to run the experiment myself and add my times to the dataset.

If you are interested, I encourage that you run the experiment yourself as well! It is hosted by the University of Washington, free to use, and only takes a couple of minutes. It definitely helped me better understand the design of the experiment (plus, I found it quite fun).

After running the test, I scored 10.278 secs on my first (congruent) run and 18.169 secs on my second (incongruent) run. I can now add my own results to the dataset:

stroop <- rbind(stroop, c(10.278, 18.169))

After the additions of my own data into the dataframe, the total sample size is 25 paired samples.

Descriptive Statistics

With all of the observations saved into a dataframe, descriptive statistics can be computed from the dataset using R. Each treatment group will be looked at seperately.

Congruent Trial

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.63   11.34   14.23   13.90   16.00   22.33
## [1] 3.565194

The mean and median for the congruent trial data are 14.23 and 13.90 seconds respectively. The median and mean are relatively close to each other, which may suggest that the data is possibly normal. The standard devation is ~3.57.

Incongruent Trial

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   15.69   18.64   20.88   21.86   23.89   35.26
## [1] 4.758664

Compared to the congruent trial data, the incongruent trial data has a much higher mean (21.86) and median (20.88). The standard deviation (~4.76) is also higher, meaning that there is more variation in the incongruent trial data compared to the congruent trial data. The incongruent trial data also has a very high maximum value, especially compared to the congruent trial’s maximum value.

Visualizing the Dataset

Although descriptive statistics provide a good summary of the data, they do not provide insight into the data’s shape or distribution. In order to learn more about the two trials, they need to be visualized.

Let’s look at the congruent trial distribution first, visualized as a histogram:

library(ggplot2)
library(gridExtra)

ggplot(data = stroop, 
       aes(x = congruent)) +
  geom_histogram(binwidth = 2, 
                 fill = "#538fef",
                 color = "black") +
  scale_x_continuous(breaks = c(seq(8, 36, 2))) +
  labs(x = "Time (s)", 
       y = "Frequency", 
       title = "Congruent Trial")

This distribution has a semi-normal shape with a majority of the values clustered near the median.

How does this plot compare to the incongruent trial?

ggplot(data = stroop, 
       aes(x = incongruent)) +
  geom_histogram(binwidth = 2, 
                 fill = "#538fef",
                 color = "black") +
  scale_x_continuous(breaks = c(seq(8, 36, 2))) +
  labs(x = "Time (s)", 
       y = "Frequency", 
       title = "Incongruent Trial")

The incongruent trial data is right skewed, with a large gap between 26 and 34 seconds. There are two values greater than 34. ased on their distance form the rest of the data, they could potentially be outliers.


Because of the different x-axis limits between the two plots, it is difficult to compare the two trials side by side. By plotting both visuals together using the same x limits, it should be much easier to compare them. The addition of a vertical line for the mean and a density curve should also improve comparison between the two trials.

c_vis <- ggplot(data = stroop, 
                aes(x = congruent)) +
           geom_histogram(binwidth = 2, 
                          fill = "#538fef",
                          color = "black") + 
           geom_density(aes(y = ..count.. * 2)) + ## scaled to better shape plot
           geom_vline(color = "#ed3838", 
                      xintercept = mean(congruent),
                      size = 1.25,
                      linetype = 2) +
           scale_x_continuous(limits = c(4, 40), 
                              breaks = c(seq(4, 40, 2))) +
           labs(x = "Time (s)", 
                y = "Frequency", 
                title = "Congruent Trial")

i_vis <- ggplot(data=stroop, 
                 aes(x = incongruent)) +
           geom_histogram(binwidth = 2, 
                           fill = "#538fef",
                           color = "black") +
           geom_density(aes(y = ..count.. * 2)) + ##scaled here as well
           geom_vline(color = "#ed3838", 
                       xintercept = mean(incongruent),
                       size = 1.25,
                       linetype = 2) +
           scale_x_continuous(limits = c(4, 40), 
                               breaks = c(seq(4, 40, 2))) +
            labs(x = "Time (s)", 
                 y = "Frequency", 
                 title = "Incongruent Trial")

grid.arrange(c_vis, i_vis)

It is clear that the incongruent trial had on average longer completion times than the congruent trial. But is this difference significant, or is the difference due to chance? Only a statistical test can answer that.

t-test and Conclusion

The t-test will be able to determine whether the differences between the two trial groups are statistically significant.

For this project, the t-test will be performed both manually and through the use of the R programming language.

Calculating the t-score Manually

The formula for the dependent t-test is as follows:

\[ t = \frac{\overline{x}_{diff} - 0}{s_\overline{x}} \]

where:

\[ s_\overline{x} = \frac{s_{diff}}{\sqrt{n}} \]

where:

      \(\overline{x}_{diff}\) = the difference between \(\overline{x}_c\) and \(\overline{x}_i\).
      \(s_\overline{x}\) = the standard error of the mean.
      \(s_{diff}\) = the standard deviation of the differences.
      \(n\) = the sample size.

Lets start by solving for \(\overline{x}_{diff}\). In order to solve \(\overline{x}_{diff}\), a seperate column for the difference between the data points in the congruent and incongruent trials is created.

stroop$difference <- stroop$congruent - stroop$incongruent

The mean difference is then calculated and stored in xdiff:

xdiff <- mean(stroop$difference)
print(xdiff)
## [1] -7.96184

Now to find \(s_\overline{x}\). To do that, \(s_{diff}\) is calculated. n is also stored for later calculations:

sdiff <- sd(stroop$difference)
n <- 25

print(sdiff)
## [1] 4.762421

With 4.76 as \(s_{diff}\) and 25 as \(n\), \(s_\overline{x}\) can now be calculated.

se <- sdiff / sqrt(n)
print(se)
## [1] 0.9524842

All the pieces are in place. It is time to calculate the t-value:

t_score <- xdiff / se
print(t_score)
## [1] -8.359026

Using the t-table, the t-critical value is ±2.064, based upon a two-tailed test, 24 degrees of freedom, and a 0.05 confidence level.

Because the t-value is beyond the range of the t-critical value, the differences between the two groups are significant.

In order to find the probability of gettting this t-value due to chance, the p-value can be calculated.

Mathematically, the p-value is not very easy to calculate by hand. Instead, the p-value can be calculated programmatically using the built-in R function pt:

format(2*pt(t_score, 24), digits = 1, scientific = FALSE)
## [1] "0.00000001"

Calculating t-score using R

In the previous section, t was mathematically calculated by hand. This was done intentionally to show the math behind the t-score.

With R, a t-test can be run very easily using t.test:

t.test(stroop$congruent, y = stroop$incongruent, paired = TRUE)
## 
##  Paired t-test
## 
## data:  stroop$congruent and stroop$incongruent
## t = -8.359, df = 24, p-value = 1.438e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -9.927671 -5.996009
## sample estimates:
## mean of the differences 
##                -7.96184

The results from the R t-test are identical to the results obtained manually.

Conclusion

Because the t-value (-8.36) is well below the t-critical value (-2.064), we reject the null hypothesis that the two groups are the same. The congruent and incongruent trials are significantly different at the 0.05 confidence level, with a p-value < .05.

The incongruent words took much longer to read on average than the congruent words. One theory is that the incongruent words slow the processing time of the brain as to tries to say the correct color. Because the incongruent words give conflicting input, it hampers the brains ability to choose the color quickly. The congruent words are encoded correctly with their corresponding color, allowing to brain to quickly identify the correct color.

Reflection

The Stroop Effect is an interesting phenomenon and makes since intuitively. But how could these finding be used practically?

The results of this study and the effect of encoding on the brain can be applied to good visualization techniques. Proper encoding of variables in visualizations allows viewers to quickly digest and understand the data. Poorly encoded visualizations have the opposite effect, and sometimes go beyond impeding understanding to causing complete misinterpretations of the data.

Taking this further, you could run a similiar experiment to the Stroop Effect, but use visualizations instead of colored words. Each participant is shown a series of visualizations with consistent encoding and asked a question regarding each plot. The time required to answer all the questions correctly is recorded. The trial is run again using poorly encoded visualizations and the time is recorded again. You could then compare to the two groups to see if the questions for the properly encoded visualizations were answered more quickly than those for the poorly encoded visualizations.