The Stroop Effect, named after the researcher John Ridley Stroop, is a psychological phenomenon that demonstrates the effect of interference on reaction time. In the experiment, participants say aloud the color in which each word in a list is printed, as quickly and accurately as possible, and their total time is recorded. There are two trials, each with its own condition: the first uses congruent words and the second uses incongruent words. Congruent words are words whose content matches the color of the word: for example, RED, GREEN, BLUE (each shown in its own color). Incongruent words are words whose content does not match the color of the word: for example, RED, PURPLE, PINK (each shown in a different color). The time taken to complete each trial is recorded for each participant.
In this project, the Stroop Effect is explored again, with a focus on its statistical side. The analysis uses a provided dataset together with my own recorded times. The independent and dependent variables of the study are identified, hypotheses are stated, and a statistical test is run on the sample so that a conclusion can be drawn from the results.
The independent variable for this experiment is the word condition, which changes from congruent to incongruent between the two trials.
The dependent variable is the amount of time it takes a participant to complete each trial, measured in seconds.
The null hypothesis for this experiment is that there is no difference between the population mean completion times of the congruent and incongruent trials.
The alternative hypothesis is that the population mean completion times of the congruent and incongruent trials differ.
These hypotheses can be written mathematically as below, with \(H_0\) being the null hypothesis, \(H_a\) being the alternative hypothesis, \(\mu_c\) being the population mean of the congruent times, and \(\mu_i\) being the population mean of the incongruent times:
\[ H_0 : \mu_c = \mu_i \\ H_a : \mu_c \neq \mu_i \]
The statistical test used for this experiment is a two-tailed paired-sample t-test. The significance level (\(\alpha\)) is set to 0.05, which corresponds to a 95% confidence level. This type of t-test is chosen because the population parameters are unknown and the two samples are related: each participant contributes one time under each condition. It is two-tailed because we are not assuming in advance whether the times will rise or fall between word conditions.
A paired-sample t-test rests on a few assumptions: the differences between paired values are approximately normally distributed, there are no extreme outliers among those differences, and the sample was taken randomly. These assumptions are taken to hold for this dataset.
The starting dataset consists of 24 paired observations. This sample of records was retrieved from this site, provided by Udacity.
congruent <- c(12.079, 16.791, 9.564, 8.63, 14.669, 12.238, 14.692, 8.987,
9.401, 14.48, 22.328, 15.298, 15.073, 16.929, 18.2, 12.13,
18.495, 10.639, 11.344, 12.369, 12.944, 14.233, 19.71, 16.004)
incongruent <- c(19.278, 18.741, 21.214, 15.687, 22.803, 20.878, 24.572, 17.394,
20.762, 26.282, 24.524, 18.644, 17.51, 20.33, 35.255, 22.158,
25.139, 20.429, 17.425, 34.288, 23.894, 17.96, 22.058, 21.157)
stroop <- data.frame(congruent, incongruent)
With the bulk of the sample loaded into R, I now need to run the experiment myself and add my times to the dataset.
If you are interested, I encourage you to run the experiment yourself as well! It is hosted by the University of Washington, free to use, and only takes a couple of minutes. It definitely helped me better understand the design of the experiment (plus, I found it quite fun).
After running the test, I scored 10.278 secs on my first (congruent) run and 18.169 secs on my second (incongruent) run. I can now add my own results to the dataset:
stroop <- rbind(stroop, c(10.278, 18.169))
After adding my own data to the dataframe, the total sample size is 25 paired observations.
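With the full sample in place, the assumptions stated earlier for the paired t-test can be given a rough check. The sketch below is not part of the original analysis: it assumes that a Shapiro-Wilk test and the boxplot 1.5 × IQR rule are acceptable screening tools, and the names `differences`, `sw`, and `diff_outliers` are chosen here for illustration only.

```r
# Paired times in seconds: the 24 provided records plus my own run (last value)
congruent <- c(12.079, 16.791, 9.564, 8.63, 14.669, 12.238, 14.692, 8.987,
               9.401, 14.48, 22.328, 15.298, 15.073, 16.929, 18.2, 12.13,
               18.495, 10.639, 11.344, 12.369, 12.944, 14.233, 19.71, 16.004,
               10.278)
incongruent <- c(19.278, 18.741, 21.214, 15.687, 22.803, 20.878, 24.572, 17.394,
                 20.762, 26.282, 24.524, 18.644, 17.51, 20.33, 35.255, 22.158,
                 25.139, 20.429, 17.425, 34.288, 23.894, 17.96, 22.058, 21.157,
                 18.169)

# The paired t-test works on the per-participant differences
differences <- congruent - incongruent

# Shapiro-Wilk test: a small p-value suggests the differences are not normal
sw <- shapiro.test(differences)
print(sw)

# Differences lying more than 1.5 * IQR beyond the hinges (the boxplot rule)
diff_outliers <- boxplot.stats(differences)$out
print(diff_outliers)
```

Neither check is conclusive with only 25 pairs, so the results are best read as a prompt to look at the plots rather than as a verdict.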
With all of the observations saved into a dataframe, descriptive statistics can be computed from the dataset using R. Each treatment group will be looked at separately.
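The calls that produce the summaries are not shown in the text, so the following is a guess at what was run rather than the original code: it assumes base R's `summary` and `sd` applied to each column of the 25-row data frame.

```r
# Rebuild the 25-row data frame (24 provided records plus my own times)
stroop <- data.frame(
  congruent = c(12.079, 16.791, 9.564, 8.63, 14.669, 12.238, 14.692, 8.987,
                9.401, 14.48, 22.328, 15.298, 15.073, 16.929, 18.2, 12.13,
                18.495, 10.639, 11.344, 12.369, 12.944, 14.233, 19.71, 16.004,
                10.278),
  incongruent = c(19.278, 18.741, 21.214, 15.687, 22.803, 20.878, 24.572, 17.394,
                  20.762, 26.282, 24.524, 18.644, 17.51, 20.33, 35.255, 22.158,
                  25.139, 20.429, 17.425, 34.288, 23.894, 17.96, 22.058, 21.157,
                  18.169)
)

# Five-number summary plus mean, then standard deviation, for each trial
print(summary(stroop$congruent))
print(sd(stroop$congruent))

print(summary(stroop$incongruent))
print(sd(stroop$incongruent))
```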
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    8.63   11.34   14.23   13.90   16.00   22.33
## [1] 3.565194
The mean and median for the congruent trial data are 13.90 and 14.23 seconds respectively. The two are relatively close to each other, which suggests that the distribution is roughly symmetric and may be approximately normal. The standard deviation is ~3.57.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   15.69   18.64   20.88   21.86   23.89   35.26
## [1] 4.758664
Compared to the congruent trial data, the incongruent trial data has a much higher mean (21.86) and median (20.88). The standard deviation (~4.76) is also higher, meaning that there is more variation in the incongruent trial data compared to the congruent trial data. The incongruent trial data also has a very high maximum value, especially compared to the congruent trial’s maximum value.
Although descriptive statistics provide a good summary of the data, they do not provide insight into the data’s shape or distribution. In order to learn more about the two trials, they need to be visualized.
Let’s look at the congruent trial distribution first, visualized as a histogram:
library(ggplot2)
library(gridExtra)
ggplot(data = stroop,
       aes(x = congruent)) +
  geom_histogram(binwidth = 2,
                 fill = "#538fef",
                 color = "black") +
  scale_x_continuous(breaks = seq(8, 36, 2)) +
  labs(x = "Time (s)",
       y = "Frequency",
       title = "Congruent Trial")
This distribution is roughly bell-shaped, with a majority of the values clustered near the median.
How does this plot compare to the incongruent trial?
ggplot(data = stroop,
       aes(x = incongruent)) +
  geom_histogram(binwidth = 2,
                 fill = "#538fef",
                 color = "black") +
  scale_x_continuous(breaks = seq(8, 36, 2)) +
  labs(x = "Time (s)",
       y = "Frequency",
       title = "Incongruent Trial")
The incongruent trial data is right-skewed, with a large gap between 26 and 34 seconds. There are two values greater than 34. Based on their distance from the rest of the data, they could potentially be outliers.
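One common way to make "potentially outliers" concrete is the 1.5 × IQR rule. The sketch below applies it to the incongruent times; the rule itself and the names `lower_fence`, `upper_fence`, and `flagged` are assumptions introduced here, not something the original analysis used.

```r
# Incongruent times in seconds (24 provided records plus my own run)
incongruent <- c(19.278, 18.741, 21.214, 15.687, 22.803, 20.878, 24.572, 17.394,
                 20.762, 26.282, 24.524, 18.644, 17.51, 20.33, 35.255, 22.158,
                 25.139, 20.429, 17.425, 34.288, 23.894, 17.96, 22.058, 21.157,
                 18.169)

# Quartiles and interquartile range
q1 <- quantile(incongruent, 0.25, names = FALSE)
q3 <- quantile(incongruent, 0.75, names = FALSE)
iqr <- q3 - q1

# Fences: values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR] are flagged
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
flagged <- incongruent[incongruent < lower_fence | incongruent > upper_fence]

print(c(lower = lower_fence, upper = upper_fence))
print(flagged)
```

With these data the upper fence sits just under 32 seconds, so the rule flags the two values above 34 and nothing else. Being flagged does not by itself justify dropping an observation; slow times are plausible in this task.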
Because of the different x-axis limits between the two plots, it is difficult to compare the two trials side by side. By plotting both visuals together using the same x limits, it should be much easier to compare them. The addition of a vertical line for the mean and a density curve should also improve comparison between the two trials.
# Congruent trial: histogram, density curve, and a dashed line at the mean
c_vis <- ggplot(data = stroop,
                aes(x = congruent)) +
  geom_histogram(binwidth = 2,
                 fill = "#538fef",
                 color = "black") +
  # count * binwidth puts the density curve on the histogram's scale
  geom_density(aes(y = after_stat(count) * 2)) +
  geom_vline(color = "#ed3838",
             # use the data frame column so all 25 observations are included
             xintercept = mean(stroop$congruent),
             size = 1.25,
             linetype = 2) +
  scale_x_continuous(limits = c(4, 40),
                     breaks = seq(4, 40, 2)) +
  labs(x = "Time (s)",
       y = "Frequency",
       title = "Congruent Trial")
# Incongruent trial: same layers and the same x limits for comparison
i_vis <- ggplot(data = stroop,
                aes(x = incongruent)) +
  geom_histogram(binwidth = 2,
                 fill = "#538fef",
                 color = "black") +
  geom_density(aes(y = after_stat(count) * 2)) +
  geom_vline(color = "#ed3838",
             xintercept = mean(stroop$incongruent),
             size = 1.25,
             linetype = 2) +
  scale_x_continuous(limits = c(4, 40),
                     breaks = seq(4, 40, 2)) +
  labs(x = "Time (s)",
       y = "Frequency",
       title = "Incongruent Trial")
# Stack the two plots vertically
grid.arrange(c_vis, i_vis)
It is clear that the incongruent trial had on average longer completion times than the congruent trial. But is this difference significant, or is the difference due to chance? Only a statistical test can answer that.
The t-test will be able to determine whether the differences between the two trial groups are statistically significant.
For this project, the t-test will be performed both manually and through the use of the R programming language.
The formula for the dependent t-test is as follows:
\[ t = \frac{\overline{x}_{diff} - 0}{s_{\overline{x}}} \]
where the 0 is the mean difference expected under the null hypothesis, and:
\[ s_{\overline{x}} = \frac{s_{diff}}{\sqrt{n}} \]
with:
\(\overline{x}_{diff}\) = the mean of the paired differences, which equals \(\overline{x}_c - \overline{x}_i\).
\(s_{\overline{x}}\) = the standard error of the mean difference.
\(s_{diff}\) = the standard deviation of the differences.
\(n\) = the sample size (the number of pairs).
Let's start by solving for \(\overline{x}_{diff}\). To do so, a separate column is created for the difference between the congruent and incongruent times of each participant.
stroop$difference <- stroop$congruent - stroop$incongruent
The mean difference is then calculated and stored in xdiff:
xdiff <- mean(stroop$difference)
print(xdiff)
## [1] -7.96184
Next, \(s_{\overline{x}}\) needs to be found. To do that, \(s_{diff}\) is calculated first. n is also stored for later calculations:
sdiff <- sd(stroop$difference)
n <- nrow(stroop)  # 25 paired observations
print(sdiff)
## [1] 4.762421
With 4.76 as \(s_{diff}\) and 25 as \(n\), \(s_{\overline{x}}\) can now be calculated.
se <- sdiff / sqrt(n)
print(se)
## [1] 0.9524842
All the pieces are in place. It is time to calculate the t-value:
t_score <- xdiff / se
print(t_score)
## [1] -8.359026
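As a check on the arithmetic, the same calculation can be written out by substituting the values printed above into the formula (figures rounded as shown in the R output):

\[ s_{\overline{x}} = \frac{s_{diff}}{\sqrt{n}} = \frac{4.762421}{\sqrt{25}} = 0.9524842 \]
\[ t = \frac{\overline{x}_{diff} - 0}{s_{\overline{x}}} = \frac{-7.96184 - 0}{0.9524842} \approx -8.359 \]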
Using the t-table, the t-critical value is ±2.064, based upon a two-tailed test, 24 degrees of freedom (\(n - 1\)), and a 0.05 significance level.
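The table lookup can also be reproduced in R. This is a small sketch using the built-in quantile function `qt`; the variable names `alpha`, `df`, and `t_crit` are chosen here for illustration.

```r
alpha <- 0.05   # significance level
df <- 25 - 1    # degrees of freedom: number of pairs minus one

# Two-tailed test: put alpha / 2 in each tail
t_crit <- qt(1 - alpha / 2, df = df)
print(round(t_crit, 3))
## [1] 2.064
```

The lower critical value is simply the negative of this number, since the t-distribution is symmetric around zero.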
Because the t-value (-8.36) falls well outside the range of the t-critical values (±2.064), the difference between the two groups is statistically significant.
To find the probability of getting a t-value at least this extreme by chance alone, assuming the null hypothesis is true, the p-value can be calculated.
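The p-value is not easy to calculate by hand. Instead, it can be calculated programmatically using the built-in R function pt, which gives the cumulative probability of the t-distribution. Doubling the tail probability beyond the observed t-value gives the two-tailed p-value:
format(2 * pt(-abs(t_score), df = 24), digits = 1, scientific = FALSE)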
## [1] "0.00000001"
In the previous section, t was mathematically calculated by hand. This was done intentionally to show the math behind the t-score.
With R, a t-test can be run very easily using t.test:
t.test(stroop$congruent, y = stroop$incongruent, paired = TRUE)
##
## Paired t-test
##
## data: stroop$congruent and stroop$incongruent
## t = -8.359, df = 24, p-value = 1.438e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -9.927671 -5.996009
## sample estimates:
## mean of the differences
## -7.96184
The results from the R t-test match the results obtained manually.
Because the t-value (-8.36) is well below the lower t-critical value (-2.064), we reject the null hypothesis that the mean completion times of the two conditions are equal. The congruent and incongruent trials are significantly different at the 0.05 significance level, with a p-value of about 1.4e-08, far below 0.05.
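The 95 percent confidence interval in the output can be reproduced by hand in the same way as the t-value. The sketch below rebuilds it as the mean difference plus or minus the critical value times the standard error, then compares it with what `t.test` returns; the names `ci` and `res` are illustrative.

```r
# Paired times in seconds (24 provided records plus my own run)
congruent <- c(12.079, 16.791, 9.564, 8.63, 14.669, 12.238, 14.692, 8.987,
               9.401, 14.48, 22.328, 15.298, 15.073, 16.929, 18.2, 12.13,
               18.495, 10.639, 11.344, 12.369, 12.944, 14.233, 19.71, 16.004,
               10.278)
incongruent <- c(19.278, 18.741, 21.214, 15.687, 22.803, 20.878, 24.572, 17.394,
                 20.762, 26.282, 24.524, 18.644, 17.51, 20.33, 35.255, 22.158,
                 25.139, 20.429, 17.425, 34.288, 23.894, 17.96, 22.058, 21.157,
                 18.169)

difference <- congruent - incongruent
n <- length(difference)
xdiff <- mean(difference)           # mean of the paired differences
se <- sd(difference) / sqrt(n)      # standard error of the mean difference
t_crit <- qt(0.975, df = n - 1)     # two-tailed critical value at alpha = 0.05

# Manual interval: mean difference +/- critical value * standard error
ci <- xdiff + c(-1, 1) * t_crit * se
print(ci)

# Interval reported by t.test, stripped of its attributes for comparison
res <- t.test(congruent, incongruent, paired = TRUE)
print(all.equal(ci, as.numeric(res$conf.int)))
```

The interval lies entirely below zero, which is another way of seeing that the null hypothesis of no difference is rejected at this level.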
The incongruent words took much longer to get through on average than the congruent words. One theory is that incongruent words slow the brain's processing as it tries to say the correct color. Because the incongruent words give conflicting input, they hamper the brain's ability to choose the color quickly. The congruent words are encoded consistently with their corresponding color, allowing the brain to quickly identify the correct color.
The Stroop Effect is an interesting phenomenon and makes sense intuitively. But how could these findings be used practically?
The results of this study and the effect of encoding on the brain can be applied to good visualization techniques. Proper encoding of variables in visualizations allows viewers to quickly digest and understand the data. Poorly encoded visualizations have the opposite effect, and sometimes go beyond impeding understanding to causing complete misinterpretations of the data.
Taking this further, you could run an experiment similar to the Stroop Effect, but use visualizations instead of colored words. Each participant is shown a series of visualizations with consistent encoding and asked a question regarding each plot. The time required to answer all the questions correctly is recorded. The trial is run again using poorly encoded visualizations and the time is recorded again. You could then compare the two groups to see if the questions for the properly encoded visualizations were answered more quickly than those for the poorly encoded visualizations.