Saturday, December 28, 2013

Are women better than men at multitasking? A myth that is hard to dispel....

The movie Kramer vs. Kramer offers an interesting perspective on gender differences in multitasking.
Almost every time I give a talk about multitasking, people ask me about gender differences. Even in my ERC grant interview, in which it was clear that gender differences were not the focus of study, I was asked the question.
Interestingly enough, this is not a question that is easy to answer. There are many references in popular culture and the popular press, but hardly any in scientific papers. This is somewhat mysterious: why haven't scientists ever investigated this?
The most likely answer is that they have, but did not find anything. However, a null result is not easy to interpret (it certainly isn't proof), and even harder to publish. Occasionally a null result does get published, for example by Thomas Buser from the University of Amsterdam (http://link.springer.com/article/10.1007/s10683-012-9318-8), but such papers are hardly ever picked up.

Final evidence for gender differences?

Sometimes, however, it is claimed that evidence for gender differences has finally been found. Keith Laws from the University of Hertfordshire was already mentioned in the popular press in 2010 (BBC News - Is multi-tasking a myth?), but only recently was a paper published with the actual evidence: "Are women better than men at multi-tasking?" (http://www.biomedcentral.com/2050-7283/1/18). This again caused quite a stir in the popular press (BBC News - Women 'better at multitasking' than men, study finds; Women Really Are Better At Multitasking... In Some Cases - Big Think; Women better at multitasking, study says | National News - KSBW ...; etc.). But what is the actual evidence?
Close scrutiny of the article reveals several flaws in both methodology and statistics. Let me go over these in a bit more detail. Experiment 1 uses a so-called task-switching paradigm. In task switching, subjects have to perform different tasks on the same stimuli. Here, the stimuli consisted of a shape (diamond or square) with a filling (two or three dots). Subjects did either a single-task block, in which they always responded to either the shape or the filling (so AAAAAAA or BBBBBBB), or a switching block, in which they had to do one of the tasks twice, followed by the other twice, and so on (AABBAABBAABB). There is a huge body of literature on this phenomenon, consistently showing that switching is harder than single-tasking, and that a switch trial (going from A to B) is slower than a repeat trial (going from A to A or from B to B within a switching block). What the paper claims to have found is a significant gender difference in the so-called mixing costs, that is, the difference in average reaction times between single-task blocks and switching blocks.
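To make the two cost measures concrete, here is a minimal sketch in Python with made-up reaction times (the numbers are purely illustrative, not taken from the paper): the switch cost compares switch and repeat trials within a switching block, while the mixing cost compares switching blocks as a whole against single-task blocks.

```python
# Toy illustration of the task-switching cost measures (all RTs in ms
# are invented for the example).
single_block = [520, 540, 510, 530]   # pure AAAA... or BBBB... trials
mixed_repeat = [560, 580, 570]        # A->A or B->B trials in AABB...
mixed_switch = [650, 670, 660]        # A->B or B->A trials in AABB...

def mean(xs):
    return sum(xs) / len(xs)

# Switch cost: switch trials vs. repeat trials, within switching blocks.
switch_cost = mean(mixed_switch) - mean(mixed_repeat)

# Mixing cost: all switching-block trials vs. single-task trials.
mixing_cost = mean(mixed_repeat + mixed_switch) - mean(single_block)

print(switch_cost)  # 90.0
print(mixing_cost)  # 90.0
```

The paper's claimed gender difference concerns the second quantity, the mixing cost, not the switch cost itself.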

Problems in methodology and statistics

There are two problems with this experiment. The first is whether it tests actual multitasking ability at all. Task switching is widely studied in experimental psychology, and although it taps a skill in the same general family of executive-control abilities as multitasking, it is not generally considered to require multitasking as such. Why not pick a "typical" multitasking task?
The second problem is statistical. The experiment explores many different contrasts: repeat vs. switch trials, congruent vs. incongruent trials, reaction times and error rates. No gender differences appear in any of these except one: the mixing costs. And even that comparison is dubious, because the actual outcome of the interaction test (the ANOVA) is not reported. Instead, the difference is recalculated and fed into a t-test, yielding a significance of 3%. This means that among all the possible gender differences, only one was found to be significant, after a dubious transformation, at a measly 3%, even though the experiment used 240 (!) subjects. Strictly, the 3% should have been corrected for multiple comparisons by multiplying it by the number of comparisons, rendering it insignificant.

A better follow-up experiment?

Sometimes a somewhat sketchy first experiment can be saved by a follow-up experiment that fixes mistakes in the first. However, in this article the second experiment is far worse than the first. It uses a fairly loose paradigm to study multitasking: subjects are given three tasks to perform within a limited timeframe, during which they are interrupted by a phone call that they can choose to answer or ignore. If they answer the call, they can choose whether or not to do this concurrently with whatever task they were performing. On all possible measures of multitasking that this experiment allows (differences in the three tasks, whether they answered the phone at all, whether they multitasked during the phone call, whether this affected their performance), the only gender difference that showed up was on one of the three tasks.
This experiment has the same problem as the first: many contrasts can be tested for gender differences, but only one of them shows a significant result (2% this time). Moreover, even if this were "properly" significant, it would still only show a difference between men and women on that single task. The authors cite a personal communication with the designer of that test stating that there are no known gender differences in a single-task setting. This is not enough: instead, a gender × single-vs-multitasking interaction should be demonstrated, which is a tougher test.
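The interaction test I am asking for can be sketched quite simply. For a 2×2 design it amounts to comparing per-subject multitasking costs (multitask score minus single-task score) between the two groups, which is equivalent to the interaction term of the corresponding ANOVA. Below is a minimal sketch on simulated data (the reaction times, group sizes, and effect sizes are all invented; no real interaction is built in):

```python
import numpy as np
from scipy import stats

# Simulated illustration of a gender x single-vs-multitask interaction
# test. All numbers are made up; both groups get the same ~150 ms
# multitasking cost, so no true interaction exists in these data.
rng = np.random.default_rng(0)
n = 120  # per group (illustrative)

women_single = rng.normal(700, 80, n)  # RTs in ms
women_multi  = rng.normal(850, 80, n)
men_single   = rng.normal(700, 80, n)
men_multi    = rng.normal(850, 80, n)

# Per-subject multitasking cost; the t-test on these difference
# scores is equivalent to the 2x2 interaction term.
cost_women = women_multi - women_single
cost_men   = men_multi - men_single

t, p = stats.ttest_ind(cost_women, cost_men)
print(f"t = {t:.2f}, p = {p:.3f}")
```

Only a significant result on this kind of test, not on a single task in isolation, would support the claim that women multitask better than men.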

Sloppy science?

The discussion above may appear awfully technical, but the quality of psychological research has come under scrutiny lately precisely over details like these (e.g., Nobel laureate challenges psychologists to clean up their act - Nature). The authors include various disclaimers in their discussion. However, the suggestion picked up by the media is clear: there are gender differences, and we only need someone to do a replication to prove it. Instead, my conclusion would be that the experiments demonstrate nothing at all. I have written a letter to the journal outlining my concerns, but have not received an answer yet. I will post an update if I do.

So, are there gender differences or what?

As far as I can tell, there is no evidence either way. But the fact that no reliable study has been published yet makes it much more probable, given the way scientists work, that there are no differences. From the more general perspective of philosophy of science, it is also better to assume there are no differences until they are clearly supported by evidence.
The only evidence for gender differences is anecdotal. But anecdotal evidence can easily confuse being skilled in the tasks themselves with the ability to multitask: people are simply better at multitasking when they have fully mastered the tasks involved, and those effects are extremely strong.

2 comments:

  1. The authors do state that they obtained a significant interaction in the Exp. 1 ANOVA analysis (although it is quite strange that the F and p are not reported): "This effect interacted significantly with the gender of participants".

    They also report the effect size of the "gender difference" (i.e., the interaction, I presume) as d=.23. Given the number of participants, this should indeed be significant.

    As for the recalculation, this is a different analysis which also reveals an interaction (as expressed by the t-test, which should be equivalent to a 2x2 ANOVA). I think the authors' point is that the interaction shows up both in the absolute cost and the relative cost analyses.

    So all in all, I'd say the paper presents some evidence for gender differences in mixing costs.

    Although, yes, I wouldn't bet too much on its replicability... as you note, the inconsistencies in the reporting of the statistics make it quite "dubious".

    Replies
    1. oops, just noted that I resurrected a 2 yr. old post... :)
