Tuesday, 28 May 2013

Approx 1 SD decline in general intelligence since 1889 confirmed in a near-perfectly-matched sample from 1989


Michael A. Woodley, Jan te Nijenhuis & Raegan Murphy have followed up their long-term analysis of reaction times as evidence of a rapid and significant decline in average intelligence since Victorian times


by publishing a rapid and robust refutation of the criticism that this result could be explained-away by differential sampling.


They present a modern population sample from 1989 which is (as-near-as-dammit in this imperfect world) perfectly-matched with Francis Galton's sample from the late 19th century.

The 1989 sample had an average reaction time of 245 milliseconds, much slower than Galton's 1889 average reaction time of 194 ms - confirming Woodley et al's identification of a decline of general intelligence of approximately one standard deviation, or 15 IQ points.


This unusual compression of the time-scale of scientific debate presents a litmus test of the honesty and competence among the commentators who rejected the original paper on micro-methodological grounds of having serious concerns about sampling issues; grounds which I argued were inappropriate, incompetent and - in their effect - anti-scientific: 


Thus we are now in a position to observe whether such critics understand and acknowledge that they have in fact been refuted; or else whether they reveal the existence of some hidden agenda by maintaining their rubbishing and rejection of the Woodley et al paper by ignoring this refutation, or shifting the grounds for criticism. 


ORIGINAL PAPER: A high-quality replication of Galton’s study one century later: Wilkinson & Allison (1989)

Michael A. Woodley, Jan te Nijenhuis & Raegan Murphy

In Woodley, te Nijenhuis, and Murphy (2013, in press) we argue that intelligence has declined substantially since Victorian times, based on a meta-analysis of simple reaction time. An exchange of ideas started at several blogs. We hereby reply to the blogposts of Scott Alexander and HBD Chick, reacting to an earlier post made by us.

A paper has come to our attention that provides strong evidence against the supposed representativeness problem across cohorts (e.g. Alexander, 2013). The study in question is that of Wilkinson and Allison (1989) using a sample of 5,324 visitors to the London Science Museum, which is situated at the exact site of Galton’s 19th century Anthropometric Laboratory in South Kensington.  All visitors undertook psychophysical testing on a simple reaction time-measuring apparatus, just as the people in Galton’s study did. Of these mixed-sex participants 1,189 were aged between 20 and 29, and are thus highly similar to the age range employed in our own study. Their simple RT mean was substantially slower than the weighted 1889 RT mean (245 ms vs. 194.06 ms), and furthermore the mean of this sample falls very close to the meta-regression-estimated mean across studies for the late 1980s (approximately 250 ms, see: Figure 1 in Woodley, te Nijenhuis & Murphy, 2013). The remarkable features of this study are the ways in which it replicates virtually every significant demographic aspect of Galton’s study.

There is the issue of a participation fee. Galton is known to have requested a participation fee of 3 pennies (approximately £5 in modern UK currency). The London Science Museum required the payment of an admissions fee right up until December 2001. Furthermore it still requires the payment of fees of £6 to £10 for access to some special exhibitions (London Science Museum, 2013a). The Wilkinson and Allison (1989) study was in fact conducted as part of a special exhibition entitled Medicines for Man, which was hosted by the Museum from the early 1980s (Medicines for Man Organizing Committee, 1980). Therefore participation fees were employed in the case of both studies.

There is strong evidence for the demographic convergence between the two studies. Johnson et al. (1985) indicate that whilst Galton’s sample included persons from all occupational and socioeconomic groups in Victorian London, it was nonetheless skewed towards students and professionals, and both groups could fairly be described as solidly White and middle class. In the last decades of the 20th century, museum attendance in the UK exhibited precisely the same skew in terms of sociodemography. Eckstein and Feist (1992) for example noted that most UK museum visitors are drawn from White and upper-middle-class populations. Furthermore Hooper-Greenhill (1994) observed that the largest minority ethnic groups in the UK (i.e. Asians and Afro-Caribbeans) are underrepresented amongst museum visitors. In acknowledging this issue, a House of Commons report in 2002 stated that free admission to museums would unlikely ‘… be effective in attracting significant numbers of new visitors from the widest range of socio-economic and ethnic groups’ (House of Commons report, 2002, p. 23).

The presence of this self-selection amongst visitors strongly harmonizes the studies of Galton and Wilkinson and Allison. Add to this the fact that participation fees were employed in both cases, the fact that the geographical locations were exactly the same and finally the fact that the age demographic of interest (i.e. twenty-somethings) was intensively sampled in both cases (i.e. 3,410 in the case of Silverman’s subset of Galton’s sample and 1,189 in the case of Wilkinson and Allison). The net of this is that the studies become even more strongly convergent in terms of comparing like with like. Thus the argument of more heterogeneous samples visiting museums in the 1980s compared to more restricted samples visiting museums in the 1880s is critically weakened. The principal objections that can be leveled against this are as follows.

Firstly there is the issue of tourism. Most tourists to the UK are from the US and Europe (Tourism 3B), meaning that they are likely to be both ethnically and socioeconomically matched to the majority of the participants in this study (i.e. UK citizens). In fact, international arrivals in the United Kingdom in 1990 show that of the 439 million inbound tourists, 60% were European in origin and 21% emanated from the Americas. Hence, 81% of the tourist population came from groups which are highly ethnically similar to the British. Only 12% came from Asia and the Pacific with a meager 3% coming from the Middle East and 2% from Africa (Tourism 3B). In sum, it is unlikely that tourists being tested in the 1989 study were substantially ethnically different from the typical UK museum visitor. Based on current statistics from the Science Museum, the preponderance of visitors hail from the UK (69%) and the preponderance of those are from Greater London (44%; London Science Museum, 2013b). Historically, especially prior to the 1990s this figure would have been much higher, owing to far lower levels of tourism to the UK (in 1990 international tourism levels were less than half the current levels,  >940 million per year, BBC, 2013). This means that in all likelihood well over 70% of the participants in Wilkinson and Allison’s study would have been British, and the overwhelming majority of these would have been White, upper middle-class and from London. The overwhelming majority of the international visitors would have been ethnically and broadly socioeconomically matched to the British visitors.

Secondly there is the issue of instrumentation. Galton utilized a pendulum chronoscope with a temporal resolution of around a centi-second (i.e. 1/100th of a second, or 0.01 seconds). The electronic apparatus employed by Wilkinson and Allison in all likelihood had a higher resolution (post-1908 chronoscopy at least had the potential to be accurate to a single milli-second; Haupt, 2001); however, a resolution of only a centi-second in Galton’s apparatus cannot account for the substantial discrepancy between these two studies (about 51 ms, roughly five times that resolution).

Thirdly, Galton’s sample was single person-single trial, whereas Wilkinson and Allison’s study employed two practice trials followed by 10 trials per person for the purposes of averaging. This protocol would almost certainly have enhanced the reliability of Wilkinson and Allison’s data relative to Galton’s (Jensen, 1980); however in both cases we are dealing with aggregates. Strong biases (i.e. jumping the gun vs. slow to start) have the potential to cancel each other out when employing these sorts of very large datasets, as these sources of error are distributed in a Gaussian fashion. This means that aggregate-level comparisons of means are appropriate between datasets exhibiting different coefficients of reliability, provided the Ns are very large.
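The cancelling-out argument can be illustrated with a minimal simulation sketch. Everything in it beyond the two means and the two sample sizes quoted above is an assumption made for illustration: the 40 ms trial-level noise is a hypothetical figure, only Gaussian trial error is modelled (no between-person differences), and the function name is invented here. It is not the procedure used in either study.

```python
import random
import statistics


def simulate_sample_mean(true_mean_ms, n_people, trials_per_person, noise_sd_ms, rng):
    """Return the mean of per-person scores, each score averaging that person's trials."""
    person_scores = []
    for _ in range(n_people):
        # Each trial is the true mean plus zero-centred Gaussian error
        # (e.g. jumping the gun vs. being slow to start).
        trials = [rng.gauss(true_mean_ms, noise_sd_ms) for _ in range(trials_per_person)]
        person_scores.append(statistics.fmean(trials))
    return statistics.fmean(person_scores)


rng = random.Random(0)   # fixed seed so the sketch is repeatable
NOISE_SD_MS = 40.0       # hypothetical trial-level noise, not taken from either study

# Galton-style protocol: one trial per person (N = 3,410, mean 194 ms).
single_trial_mean = simulate_sample_mean(194.0, 3410, 1, NOISE_SD_MS, rng)
# Wilkinson-and-Allison-style protocol: ten trials per person (N = 1,189, mean 245 ms).
ten_trial_mean = simulate_sample_mean(245.0, 1189, 10, NOISE_SD_MS, rng)

gap_ms = ten_trial_mean - single_trial_mean
print(f"single-trial sample mean: {single_trial_mean:.1f} ms")
print(f"ten-trial sample mean:    {ten_trial_mean:.1f} ms")
print(f"difference in means:      {gap_ms:.1f} ms")
```

Under these assumptions the less reliable single-trial sample still recovers its underlying mean closely, because with thousands of participants the standard error of the mean is a small fraction of the trial-level noise.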

On this basis Wilkinson and Allison’s (1989) study must be considered an excellent replication of Galton’s study. Its mean reaction time for the relevant age cohort is almost precisely where our meta-regression predicts it should be. This is clearly strong supporting evidence for the robustness of the increase in simple RT latency produced to date and so puts even more nails in the coffin of those who argue that the trend can be accounted for by lack of representativeness across cohorts.

Alexander, S. S. (2013). The wisdom of the ancients. Slate Star Codex. URL: http://slatestarcodex.com/2013/05/22/the-wisdom-of-the-ancients/ [retrieved on 24/05/13]
BBC. (2013). GCSE Bitesize. Geography tourism trends. http://www.bbc.co.uk/schools/gcsebitesize/geography/tourism/tourism_trends_rev1.shtml
Eckstein, J. & Feist, A. (1992). Cultural Trends, 1991. London, Policy Studies Institute.
Haupt, E. J. (2001). Laboratories for experimental psychology: Göttingen’s ascendancy over Leipzig in the 1890s. In: Rieber, R. W., & Robinson, D. K. (Eds.), Wilhelm Wundt in History. The Making of a Scientific Psychology. (pp. 205-250). New York: Kluwer Academic.
Hooper-Greenhill, E. (1994). Museums and their Visitors. London, Routledge.  
House of Commons, Culture, Media and Sport Committee (2002). National Museums and Galleries: Funding and free admission. House of Commons, United Kingdom.
Jensen, A. R. (1980). Bias in Mental Testing. New York: Free Press.
Johnson, R. C., McClearn, G., Yuen, S., Nagoshi, C. T., Ahern, F. M., & Cole, R. E. (1985). Galton's data a century later. American Psychologist, 40, 875–892.
Medicines for Man Organizing Committee. (1980). Medicines for Man: A Booklet Based on an Exhibition at the Science Museum about Medicines - how They are Discovered and how They Work, how They are Made and Tested, how They are Prescribed and Dispensed, and how Laws Control Their Use. London, Science Museum.
No author (no date). Tourism 3 SB. Oxford University Press
London Science Museum. (2013a). http://www.sciencemuseum.org.uk/visitmuseum/prices.aspx [retrieved on 27/05/2013]
London Science Museum. (2013b). http://www.sciencemuseum.org.uk/about_us/history/facts_and_figures.aspx [retrieved on 27/05/2013]
Wilkinson, R. T., & Allison, S. (1989). Age and simple reaction time: Decade differences for 5,324 subjects. Journal of Gerontology, 44, 29–35.
Woodley, M. A., te Nijenhuis, J., & Murphy, R. (2013). Were the Victorians cleverer than us? The decline in general intelligence estimated from a meta-analysis of the slowing of simple reaction time. Intelligence. doi:10.1016/j.intell.2013.04.006



Saturday, 25 May 2013

Is the claim of one standard deviation decline in intelligence since Victorian times an extraordinary claim, implying the need for extraordinary evidence?


The phrase and practice of "extraordinary claims require extraordinary evidence" is one of those superficially-plausible statements which are untrue, and indeed damagingly false.


Of course, this does not mean we should believe anything anybody might say no matter how absurd - but in practice the whole thing hinges on the meaning of 'extraordinary'.

In practice:

1. The definition of an extraordinary claim is 'something I don't already believe'...


2. The definition of extraordinary evidence is that no amount of evidence will ever be enough to convince me of that.


People find all sorts of things extraordinary which are not.

Modern atheist intellectuals, for example, find the idea of God or gods to be extraordinary, and all kinds of other things such as souls, angels and demons, the Virgin Birth and the resurrection of Christ - plus all sorts of things like telepathy and prophetic dreams - despite the fact that these are not extraordinary to the vast majority of people alive, and have been extraordinary to almost nobody in the sweep of human history.

Claims of the reality of things may or may not be factually correct - in general, or specifically - but these are not extraordinary claims - and the 'evidence' for them is of exactly the same nature as for anything else.


In science, a major area which asks for extraordinary evidence is claims of group differences in hereditary personality and intelligence. Around about 1965, it was rather suddenly decided that the claim of heritable group differences was extraordinary, and extraordinary evidence was demanded - and fifty years of research later exactly the same demand is still being made.

Each piece of evidence in support of the supposedly-extraordinary claim of heritable group differences is put under a microscope to check for flaws - and of course flaws can be found - especially when people don't use the same microscope to study the beliefs that they themselves hold.

So, those who reject heritable group differences believe what they believe, and which (because they believe it) they do not regard as extraordinary - and accept any further evidence in support of this belief with just a cursory glance; but anything they don't want to believe is checked, inch by inch, under a microscope and - guess what? - is found to have flaws! Therefore they feel justified in rejecting it.


This pretends to be 'rigorous' ('Look at me - I'm a real scientist! I'm using a microscope!') but it is, of course, phony science, politicized science.

Indeed, the practice of accepting supporting evidence at a glance while putting opposing evidence under a microscope is not just bad science - it is not science at all.


Yet even outside of politics this technique of differential rigour is an easy trap to fall-into - indeed there are very few who are exempt.

This is something to be guarded against.


When I was editor of Medical Hypotheses it was the last major non-peer reviewed journal in science - and its special value was that it had the potential to overcome the intrinsic bias of 'peers' to defend the existing paradigm by demanding 'extraordinary' evidence for any new work which disagreed with the prevailing paradigm.

But what if the evidence for the prevailing paradigm is weak?

In modern professional science there are many dominant but evidentially (and logically) weak research paradigms - but these are defended as tenaciously as - or even more tenaciously than - coherent and strongly supported paradigms.


For example, in psychiatry there is a decades-established prevailing explanation of 'depression' in terms of neurotransmitter abnormalities and of treatment by 'antidepressants' which correct these presumed abnormalities.

This paradigm does not make any sense and there is no evidence to support it - yet 'everyone' believes it - and any alternative perspectives are (therefore?) treated as requiring 'extraordinary' levels of evidence, such as clear and unambiguous support from multiple multi-million-dollar, mega-randomized controlled trials. Hence the prevailing nonsense is impregnable.


Declaration of interest: all this stuff is happening, now, to me and Michael A Woodley et al with respect to the use of longitudinal reaction time data to estimate long term trends in general intelligence.



The problems are that the result contradicts current notions of:

1. Direction 

2. Size

3. Rapidity

of change in intelligence. Therefore the result challenges the current paradigm in several respects.


To put it another way, Woodley and myself are regarded as making an extraordinary claim - and this is used to justify putting this particular study under an extra-powerful methodological microscope (and if that doesn't suffice - an electron microscope) to search for, and inevitably find, microscopic cracks (gaps) and flaws (distortions) in the evidence.


But consider the counter-factuals.

Suppose that the data had shown a reduction in reaction time since the Victorian era (implying increasing intelligence) - which would be consistent with the Flynn effect - what would people have done?

In other words, what if the study - and I mean exactly the same study in terms of exactly the same evidence and methodology - had found exactly what people expected it to find?

Well, of course this would not have been an 'extraordinary' claim, so people would have accepted the study as a matter of course and without further discussion.


But suppose (again counter-factually) that the study (exactly the same data set and methodology) had instead found a small increase in reaction time? Implying a small reduction in average intelligence?

Well, that too would have been accepted without much discussion - since it would not signify much either way.

Nobody would have put the study under an electron microscope. 


But in real life the data showed a big increase in reaction times implying a big reduction of intelligence since Victorian times.

This is indeed a paradigm-shifting claim; and it is 'therefore' regarded as an extraordinary claim - and therefore this is used to justify (as described above) extraordinary examination of the claim - on the basis of the slogan in the title.

But is this an extraordinary claim?


If it really was extraordinary to claim that intelligence had declined by about one standard deviation since Victorian times, it would be easy to present many evidences and examples which very obviously refuted that claim.

Yet none have been presented.

Obviously, the best way to reject a claim is to refute it with clear and unambiguous contrary evidence - and if you are reduced to quibbling over methodology, then the claim is revealed as being not extraordinary - and therefore not requiring extraordinary levels of micro-critique.


So, although people feel that the claim of a rapid and significant decline in intelligence is so extraordinary as to invite extreme skepticism; in practice - or so it seems - it is not easy to refute this claim without stepping outside of real science and becoming the kind of phony, fake, pseudo-scientist who polices the field of heritable IQ group differences.

And this is what we find.


Under pretence of rigour, there is not just bad science, but non-science - anti-science.


So, if micro-analysis of potentially paradigm changing claims is revealed as anti-science, then what should be done?

Well, paradigm-changing claims should be evaluated in the same way and to the same standards of research competence and honesty as paradigm-supporting claims.

When a piece of research reaches the normal standards of competence and honesty, yet has extraordinary implications, then it is distorting and inappropriate and in fact anti-science to dwell upon the micro-details of this research.


The correct thing to do is to recognize that in real science multiple evidences converge upon the truth.

That is exactly why science is important - because true scientific claims have multiple consequences - or, to put it another way, the implications of important true theories ramify through reality, and therefore their consequences can be observed in many places.

So when the claim of reduced intelligence is based upon longitudinal change in reaction times, then the matter is not to be settled 'methodologically' by ever more, ever more voluminous and complex and precise measurements of reaction times - but instead by seeking convergent evidence from different fields - by tracing out the implications of the claim through reality until these implications go into places where they can be observed and checked.

Therefore, what is needed is to discover what would-be the implications of a significant and rapid decline in average intelligence, and then make observations to see whether these implications really have happened.


If the claim of an approximately one standard deviation decline in intelligence since Victorian times really is false; then, because this is a very significant claim, it should be easy to discover some strong evidence which contradicts the claim.

But such a claim need not, and certainly should not be required to, meet 'extraordinary' standards of evidence!


In conclusion:

If a claim is both 'extraordinary' and also wrong, it must be trivially easy to refute.

But if a claim which seems 'extraordinary' cannot in practice easily be refuted; then it is not really an extraordinary claim, and should not be treated as such. 

Wednesday, 15 May 2013

More than one third decline in general intelligence since Victorian times?


Having reflected on the title of an earlier post which described the decline in intelligence since Victorian times in terms of approximately 15 IQ points (or one standard deviation):


And then thinking about the idea of using reaction times as a scale of general intelligence:


I realized that it would make more sense to most people if the decline in intelligence was expressed in terms of percentages.


Of course this is only approximate, but if the data from Silverman's paper are used, the (median) average Victorian men's reaction time was 183 ms (milliseconds), while the (mean) average Modern men's reaction time was 250 ms.


So the slowing from Victorian to Modern reaction times is 67 ms

which is approx 37 percent of the Victorian reaction time of 183 ms; and 37 percent is more than one third.
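As a quick check of that arithmetic, here is a minimal sketch using only the two reaction times quoted above (the variable names are mine, chosen for illustration):

```python
# Reaction times quoted in the post, taken from Silverman's paper.
victorian_rt_ms = 183   # (median) average for Victorian men
modern_rt_ms = 250      # (mean) average for Modern men

# Absolute slowing, and slowing as a percentage of the Victorian baseline.
slowing_ms = modern_rt_ms - victorian_rt_ms
slowing_pct = 100 * slowing_ms / victorian_rt_ms

print(f"slowing: {slowing_ms} ms")
print(f"slowing relative to the Victorian baseline: {slowing_pct:.1f}%")
print(f"more than one third: {slowing_pct > 100 / 3}")
```

The percentage depends on taking the Victorian figure as the baseline, and on treating a median and a mean as comparable, which is why the result is only approximate.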


So, in a phrase and using Victorian intelligence as the baseline; it would be reasonable to say that average intelligence had declined by more than one third since Victorian times.


And what does this 'one third' refer to?

MA Woodley talks of reaction times as a measure of core efficiency of the central nervous system - and this slowing of reaction times therefore suggests that our average thinking is biologically less efficient than that of the Victorians, such that Victorian brains could perform thirty-seven percent more intelligent processing per unit time.


Sunday, 12 May 2013

The e-mails from 2008 when I had the idea of using historical reaction time data to measure trends in intelligence

Two e-mails to Richard Lynn (Emeritus Professor, University of Ulster) 

5th July 2008 14:43 


I've just this morning got Dysgenesis from the ILL service - I've so far just looked through the contents and headings.

I think I have an idea for measuring dysgenic change which doesn't seem to be in the book - by using reaction times.

Since reaction time correlates with IQ, and since I believe it is an old physiological measurement, it is possible that there are representative data on national population reaction times over the past 100 or so years.

Since high IQ people have a fertility considerably below replacement level, my prediction would be that average reaction time would have become longer, and that the standard deviation would have become smaller (due to selective loss of shorter reaction times).

Do you think this makes sense as an hypothesis?

Maybe somebody has already done it, or if not - do you know of any databases of reaction times (or somebody whom I might contact about this?)

Best wishes, Yours, Bruce


9th July 2008 09:28h 

Dear Richard,

Thanks for this. Yes - I was assuming/ hoping that RT might be a way to look at genotypic IQ, but as you say this is not known.

I haven't been able to find any old population measures of RT so far (I have only spent a couple of hours on the job, admittedly) - but I think that _if_ reaction times had lengthened throughout the twentieth century, then that would be an interesting observation. On the other hand, if average RTs had stayed the same (like inspection times in the paper you mentioned) or if they had shortened throughout the 20th century, then neither of _those_ results would be interesting.


Two e-mails to Ian Deary (Professor, University of Edinburgh): 

Monday 7th July 2008 09:51h

Dear Ian,

I'm sorry to badger you when you are still catching up but I have had an idea for measuring dysgenic change by using reaction times which I am keen to follow-up.

Since reaction time correlates with IQ, and since reaction time is an old physiological measurement, it is possible that there are representative data on national population reaction times over the past 100 or so years.

Because high IQ people have a fertility considerably below replacement level, my prediction would be that average reaction time in developed countries should have become longer, and that the standard deviation would have become smaller (due to selective loss of shorter reaction times).

But probably somebody has already done it (perhaps your group?)?

Or maybe reaction time correlates with 'phenotypic' (or measured) IQ (and therefore gets enhanced by the Flynn effect) rather than correlating with underlying 'genotypic' IQ? - I don't know.

If it hasn't already been done - do you know of any databases of reaction times (or somebody whom I might contact about this?)

Best wishes, Yours, Bruce


7th July 2008 20:11h

Dear Ian

That is *extremely* helpful of you (as ever).

So you have a modern average simple reaction time of 358 ms from Deary et al 2001. And the inspection time paper suggests that RT may be 'genotypic' intelligence - and not contaminated by the Flynn effect.

One would imagine it should be simple, in principle, to compare this with older estimates of RT - but so far I have failed to find any - either due to the papers not quoting any actual numbers, or using a different measure, or my not being able to access the paper.

Hmm - I shall keep digging.  

With best wishes, Yours, Bruce


NOTE: It was not until 28 February 2012 that - as a result of an e-mail interchange we had earlier that day - Michael A Woodley found the data which answered this question: Silverman IW. Simple reaction time: it is not what it used to be. American Journal of Psychology. 2010; 123: 39-50. As can be seen, Silverman's review and analysis had not been published in July 2008, when I made my failed attempt to discover some relevant evidence.



Friday, 10 May 2013

The 15 point decline in IQ since Victorian times - what next?


Okay - so simple reaction time data indicates that average general intelligence has declined by about one standard deviation (15 IQ points) since the late 1800s.





What next?

Should we believe it?


Well, here are a few considerations:

1. Simple reaction times are the most objective evidence we have concerning trends in general intelligence.

Therefore we should:

a. believe them, or

b. come-up with something better, or

c. show how these studies of reaction times are incompetent, or

d. assume that the studies of reaction times are dishonest, or

e. assume that this data is wrong on the basis that most data is wrong (and because we are under no obligation to believe any particular bit of data - quite the contrary, there must be good reasons for believing any specific proposition)


2. Any other data which can be brought to bear on the matter of declining intelligence is (I think) relatively 'soft', subjective and imprecise compared with reaction times - (things like rates of major innovation, rates of geniuses etc) so - although I see much of what seems to be consistent with a significant and relatively rapid decline in general intelligence - it is unlikely to convince someone who does not want to believe in the first place.

On the other hand, even if it does not support, does any of this long-term evidence of general intelligence contradict the thesis that it has declined?

I mean, is there any decent evidence that general intelligence has actually risen? (aside from the Flynn effect of increasing pen and paper IQ test scores - which is not relevant here since it is what is being explained, and therefore inadmissible as evidence).

Has, for instance, per capita performance improved since the late 1800s in any 'g' related and quantifiable human endeavour? I know this is tricky to answer, since there are shifts in the specialized activities of intellectuals (e.g. away from study of The Classics and towards some of the sciences, or medicine) - but is there reasonably compelling evidence of this sort of improvement?


3. If it is decided that we should, after all, accept the best evidence as showing a significant decline in average intelligence since the late 1800s, then there is quite a lot of further work to be done on the mechanism - because, on the whole, present understanding of 'dysgenic' mechanisms in relation to intelligence is not adequate to explain the rapidity of this decline.

(I mean, that calculations based on differential reproduction and heredity of intelligence predict a much slower rate of decline than could lead to one SD reduction in 'g' in the space of just four or five generations.)
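To make the required pace concrete, here is a minimal arithmetic sketch. It assumes only the conventional scaling of one SD = 15 IQ points and the 'four or five generations' mentioned above; it is not a model of differential reproduction or heredity, just the average rate that any mechanism would have to deliver.

```python
SD_IQ_POINTS = 15   # one standard deviation on the conventional IQ scale

# Average decline per generation needed to lose one SD in four or five generations.
required_rate = {generations: SD_IQ_POINTS / generations for generations in (4, 5)}

for generations, points in required_rate.items():
    print(f"{generations} generations: {points:.2f} IQ points per generation")
```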

The accumulation of deleterious mutations, following the relaxation of the selection formerly exerted by high rates of child mortality, is my first best guess at the likely 'extra' mechanism (and Michael Woodley agrees) - but really this is just a best guess, and the detailed mechanisms of how this might work are unclear.

However, it could make interesting science in trying to find-out!



Thursday, 9 May 2013

The decline in general intelligence estimated from a meta-analysis of the slowing of simple reaction time


[Here is a rather rough version - sorry! best I can do - of a forthcoming paper to which I contributed; although I am not one of the authors.] 

Were the Victorians cleverer than us? The decline in general intelligence estimated from a meta-analysis of the slowing of simple reaction time
Intelligence – 2013 Available online 7 May 2013
a Department of Psychology, Umeå University, Sweden
b Center Leo Apostel for Interdisciplinary Studies, Vrije Universiteit Brussel, Belgium
c Work and Organizational Psychology, University of Amsterdam, The Netherlands
d School of Applied Psychology, University College Cork, Ireland
Received 17 February 2013
Revised 15 April 2013
Accepted 15 April 2013
Available online 7 May 2013

The Victorian era was marked by an explosion of innovation and genius, per capita rates of which appear to have declined subsequently. The presence of dysgenic fertility for IQ amongst Western nations, starting in the 19th century, suggests that these trends might be related to declining IQ. This is because high-IQ people are more productive and more creative. We tested the hypothesis that the Victorians were cleverer than modern populations, using high-quality instruments, namely measures of simple visual reaction time in a meta-analytic study. Simple reaction time measures correlate substantially with measures of general intelligence (g) and are considered elementary measures of cognition. In this study we used the data on the secular slowing of simple reaction time described in a meta-analysis of 14 age-matched studies from Western countries conducted between 1884 and 2004 to estimate the decline in g that may have resulted from the presence of dysgenic fertility. Using psychometric meta-analysis we computed the true correlation between simple reaction time and g, yielding a decline of −1.23 IQ points per decade or fourteen IQ points since Victorian times. These findings strongly indicate that with respect to g the Victorians were substantially cleverer than modern Western populations.

1. Introduction
1.1. The Victorians
Queen Victoria of the United Kingdom reigned from 1837 to 1901. The Victorian era was a period of immense industrial, cultural, political, scientific, and military change in Western Europe marked by an explosion of creative genius that strongly influenced all other countries in the world. In international relations there was a long period of peace, known as the Pax Britannica. Breakthroughs in science led to an escape from the Malthusian trap: increasing populations did not starve and longevity increased. The growth in economic efficiency before the Victorian era was a minuscule 1% per century (Clark, 2008), but started increasing spectacularly in the Victorian era. The height of the per capita numbers of significant innovations in science and technology and also the per capita numbers of scientific geniuses was clearly situated in the Victorian era; after which there was a decline (Huebner, 2005; Murray, 2003; Woodley, 2012; Woodley and Figueredo, 2013).
IQ scores are excellent predictors of job performance (Schmidt & Hunter, 1999) and high-IQ people are more productive and more creative (Jensen, 1998). A population with a higher intelligence will in general be more productive and creative than a population with lower intelligence (Lynn and Vanhanen, 2012 and Rindermann et al., 2009). Were the Victorians therefore cleverer than us? Here we test this hypothesis using measures of reaction time (RT), which give a good indication of general intelligence (e.g. Johnson & Deary, 2011) in a meta-analytic study.
1.2. Measured IQ scores increase: The Flynn effect
At first sight, the case for a decrease in intelligence since Victorian times seems highly implausible. After all, there is now consensus that at least since World War II, IQ scores have been going up, the so-called Flynn effect. Flynn (1987, 2009) showed a worldwide increase in measured IQ scores of approximately 3 points a decade. Recent studies show similar gains in South Africa (te Nijenhuis, Murphy, & van Eeden, 2011) and much larger effects in South Korea (te Nijenhuis, Cho, Murphy, & Lee, 2012). These gains are thought to be due almost entirely to environmental improvements stemming from factors such as improved education, nutrition, hygiene, and exposure to cognitive complexity (Neisser, 1997). The Flynn effect has therefore been described as an increase in phenotypic intelligence, i.e. the intelligence that results from a combination of genes and environmental factors (Lynn, 2011).
1.3. The dysgenics paradox
Dysgenic trends result from socially valued and heritable traits, such as intelligence, declining within populations over time due to the effects of selection operating against those traits (Galton, 1869 and Lynn, 2011). Before 1825 Western countries were in eugenic fertility, in that those with the highest levels of education and/or social status had the largest numbers of surviving offspring (Lynn, 2011 and Skirbekk, 2008). The majority of these countries completed the transition into dysgenic fertility for these IQ proxies by around the middle of the 19th century (Lynn, 2011 and Skirbekk, 2008).
The presence of a dysgenic effect on intelligence has proven difficult to detect via direct measurement, i.e. by comparing IQ scores of different age-matched generations on the same IQ battery. The earliest cross-sectional studies (1930s–1950s) attempting to quantify the decline actually found the opposite effect, i.e. rising IQ scores (e.g. Cattell, 1950). This presented a paradox, as studies from the same time period consistently found negative correlations between IQ and variables such as fertility and family size (Lynn, 2011 and van Court and Bean, 1985). Given the observation that IQ is substantially heritable, this finding should have entailed declining rather than increasing IQ (Lynn, 2011 and van Court and Bean, 1985). The failure to directly measure a dysgenic effect on IQ is now attributed to the Flynn effect: the strong secular rise in IQ simply masks the likely much weaker dysgenic decline in IQ (Lynn, 2011).
Nonetheless attempts have been made to estimate the theoretical rate of dysgenic change in IQ based on the magnitude of the negative correlation between fertility and IQ (see: Lynn, 2011 for an overview of these studies). These estimates, which range from a low of − .12 (Retherford & Sewell, 1988) to a high of approximately − 1.3 points per decade (Lentz, 1927), are however inferred rather than observed declines. So, dysgenic effects appear to be unmeasurable directly using standard IQ tests.
1.4. Genotypic IQ decreases
Other research has examined whether dysgenic effects have a genetic component by testing for so-called Jensen effects (Rushton, 1998). The subtests of an IQ battery range from high complexity (high loadings on the g factor of intelligence) to low complexity (low loadings on the g factor). Jensen effects refer to the tendency for the subtests' g loadings to positively correlate with the size of the effect of other variables on the same subtests. So, subtests with high g loadings go with strong effects and subtests with low g loadings go with weak effects. Jensen effects exist on genetic variables, such as heritability, inbreeding depression, and its opposite, hybrid vigour (Jensen, 1998 and Rushton and Jensen, 2010). Clear Jensen effects have also been found for dysgenic fertility (Woodley & Meisenberg, in press). This indicates that dysgenic fertility is predominantly a genetic effect: i.e. genotypic IQ or more accurately ‘genetic g’ (Rushton & Jensen, 2010) decreases. However, the Flynn effect is clearly not a Jensen effect, as it exhibits a modest, negative correlation with subtest g loadings (te Nijenhuis & van der Flier, this issue). In summary, therefore, the pattern of genetic effects such as heritabilities on the subtests of an IQ battery is highly similar to the pattern in dysgenic effects; however, both show no resemblance to the pattern in the Flynn effect.
1.5. Reaction time as a high-quality measure of general intelligence
Galton (1883) was the first to suggest that RT might be an elementary cognitive measure as it appeared to be an indicator of speed of mental processing. Subsequent research has confirmed many key predictions of the speed-of-processing theory of intelligence via the demonstration of robust correlations between measures of RT and IQ (see: Jensen, 2006 for an overview). Moreover, there is a Jensen effect on RT, as more g-loaded subtests of an IQ battery correlate more strongly with RT measures than do less g-loaded ones (Jensen, 1998, pp. 234–238). This has led Jensen (1998, 2006, 2011) to suggest that RT is in fact a biological marker of mechanisms fundamental to the operation of general intelligence, such as neurophysiological efficiency. Furthermore, RT is a 'ratio-scale' measure of intelligence, meaning that it has a true zero (analogously to the Kelvin scale in temperature measurement). This means that RT can be used to meaningfully compare historical and contemporary populations in terms of levels of general intelligence (Jensen, 2011).
Even the simplest measure of RT (i.e. the time that it takes for an individual to respond to a sensory stimulus) appears to be robustly associated with IQ. Rijsdijk, Vernon, and Boomsma (1998), for example, investigated the relationship between simple RT and IQ in a genetic analysis using twins. Simple RT and IQ as measured using the Raven's Advanced Progressive Matrices were found to exhibit identical levels of heritability (.58 and .58, respectively) and furthermore the phenotypic correlation between the two of − .21 (higher IQ goes with shorter, i.e. faster, RT, hence the correlation is negative) was completely mediated by common genetic factors. Another relevant study is that of Deary, Der, and Ford (2001), who set out to generate benchmark estimates for the correlation between IQ and various RT measures (including simple) in a population-representative sample, yielding a correlation between the two of − .31, indicating a substantive relationship.
1.6. A secular slowing of reaction time
Silverman (2010) reviews simple RT studies conducted between the 1880s and the present day. In Silverman's (2010) study, Galton's estimates collected between 1884 and 1893 (as reported in Johnson et al., 1985) were compared with twelve studies from the modern era (post 1941). Galton's measures indicated a simple visual RT mean of 183 milliseconds (ms) for a large sample of 2522 young adult males (aged between 18 and 30), along with a mean of 187 ms for a sample of 888 equivalently aged females. These means seem to be representative of the period as a 1911 review of various studies conducted in the late 19th and early 20th centuries (Ladd & Woodworth, 1911), which did not include Galton's measures, found an RT range of 151–200 ms (mean 192 ms), using different instrumentation to that employed by Galton (1889). Moreover, Silverman was also able to comprehensively rule out lack of socioeconomic diversity, as Galton's samples were diverse enough to be stratified into seven male and six female occupational groups (Johnson et al., 1985).
Twelve modern (post 1941) simple RT studies by contrast revealed considerably slower RTs for both males (mean 250 ms) and females (mean 277 ms) in a combined sample of 3836. In comparing the 19th-century measures with the modern ones, Silverman found that in 11 of the 12 studies and in 19 out of 20 comparisons, the differences were statistically significant. Furthermore age was not a confounding factor as Silverman matched studies across time based on age range.
1.7. Estimating the dysgenic effect for g
The studies of Deary et al. (2001) and Rijsdijk et al. (1998) combine to indicate that the simple RT/IQ correlation is substantial at the population level, and that furthermore the association between the two is completely mediated by common genetic factors. Hence, given the strong Jensen effects on both dysgenic effects (Woodley & Meisenberg, in press) and simple RT (Jensen, 1998) a secular increase in simple RT latency is in fact an expected outcome of a dysgenic decline in ‘genetic g’. Based on this it should be possible to estimate the degree to which ‘genetic g’ has declined in Western populations due to dysgenic pressures since the 1880s, using Silverman's (2010) data.
1.8. Research questions
This leads to the following two research questions. 1) How strong is the secular slowing of simple RT? 2) How strong is the decadal g decline based on simple RT measures?
2. Methods
The data on simple RT used here, with the exception of one study (Thompson, 1903), come from Silverman (2010) and sources contained therein. Silverman carried out various analyses on simple visual reaction time measures and is an excellent source. He describes a thorough meta-analytical search yielding the means for 13 different studies, involving samples of equivalent age and ranging in time from 1884 to 2004. He also mentions the review by Ladd and Woodworth (1911) of eight early studies of reaction time, many of them from the late 19th century, which indicate the representativeness of Galton's simple RT estimates. Like Silverman we do not include the results of this review in our final analysis, as too few details are provided to determine its suitability under the inclusion rules. This leaves all 13 age-matched studies used in Silverman's (2010) analysis, along with one other 19th-century simple RT study from the US (Thompson, 1903). We were directed to the Thompson study by Silverman (pers. comm.), on the basis that even though he missed it in his 2010 study, it nonetheless satisfies his inclusion rules and should be included on that basis.
2.1. General inclusion rules
We take our general inclusion rules from the meta-analysis by Silverman (2010). First, the samples consisted of people recruited from the general population and whose ages ranged from about 18 to 30 years. Second, the study sample had to be in good health, as poor health is a known inhibitor of RT performance. Third, given that Galton's sample was British the studies had to have been conducted in a Western country. Fourth, the study samples had to be 20 or larger in size for each sex. Fifth, the delivery of the stimulus was not predictable, which ruled out studies in which the interval between stimuli was fixed or increased or decreased according to a regular pattern. Sixth, the response to the stimulus had to be manual in nature, such as pressing or releasing a button or key. Seventh, to generate the response, the arm did not have to be moved (this restriction was based on the consideration that if the arm must be moved, RT is necessarily lengthened, and the g-loadedness of the estimate potentially reduced due to the addition of a non-cognitive ‘movement time’ component to the measures; Jensen, 2006). Eighth, the RT measure had to be representative of the total set of RTs. This restriction eliminated studies in which RT was measured in terms of the best RTs or the longest or shortest RT.
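The eight rules can be read as a simple filter over candidate samples. The sketch below is only an illustration: the `StudySample` record, its field names, and the two-year tolerance around the "about 18 to 30" age range are assumptions made for the example, not part of Silverman's (2010) procedure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StudySample:
    """Hypothetical description of one candidate simple-RT sample."""
    general_population: bool      # rule 1: recruited from the general population
    min_age: float                # rule 1: youngest participant (years)
    max_age: float                # rule 1: oldest participant (years)
    healthy: bool                 # rule 2
    western_country: bool         # rule 3
    n_per_sex: List[int]          # rule 4: N for each sex that was tested
    stimulus_unpredictable: bool  # rule 5
    manual_response: bool         # rule 6
    requires_arm_movement: bool   # rule 7
    rt_is_representative: bool    # rule 8: e.g. mean of all trials, not best RT


def meets_inclusion_rules(s: StudySample, age_tolerance: float = 2.0) -> bool:
    """Return True if the sample passes all eight rules.

    age_tolerance is an assumed allowance for the word "about" in rule 1.
    """
    age_ok = s.min_age >= 18 - age_tolerance and s.max_age <= 30 + age_tolerance
    size_ok = len(s.n_per_sex) > 0 and all(n >= 20 for n in s.n_per_sex)
    return (
        s.general_population and age_ok
        and s.healthy
        and s.western_country
        and size_ok
        and s.stimulus_unpredictable
        and s.manual_response
        and not s.requires_arm_movement
        and s.rt_is_representative
    )


example = StudySample(True, 18, 30, True, True, [24, 25], True, True, False, True)
print(meets_inclusion_rules(example))
```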
As sex-differences data were not available for each study, we generate weighted averages for studies reporting sex differences, thus we produce a single RT mean for each study. Finally it must be noted that reaction time measures tend to show strongly skewed distributions (see: Jensen, 2006). For skewed distributions the median would be the better measure, but because not all reaction time studies reported the median, we choose the mean instead. Table 1 reports all data used in this study.

Table 1. 14 simple RT studies used in Silverman (2010) and Thompson (1903) along with 16 simple RT means (in ms), sample sizes, collection/publication year and references.

| Testing year and country | Males: mean RT (N) | Females: mean RT (N) | Sample-size-weighted mean RT (total N) | Reference |
|---|---|---|---|---|
| 1889 [a] (1884–1893) (UK) | 183 (2522) | 187.9 (888) | 184.3 (3410) | Galton's data in Johnson et al. (1985) |
| 1894.5 [a] (1889–1900) (USA) | 199 (24) | 217 (25) | 208 (49) | Thompson (1903) |
| 1941 (USA) | 197 (47) | n.a. | 197 (47) | Seashore, Starmann, Kendall, and Helmick (1941) |
| 1941 (USA) | 203 (47) | n.a. | 203 (47) | Seashore et al. (1941) |
| 1945 (UK) | 286 (76) | n.a. | 286 (76) | Forbes (1945) |
| 1970 (Canada) | 236 (40) | 263 (40) | 249.5 (80) | Lefcourt and Siegel (1970) |
| 1990 (Finland) | 199 (20) | n.a. | 199 (20) | Taimela (1991) |
| 1987 (Finland) | 183 (20) | n.a. | 183 (20) | Taimela, Kujala, and Osterman (1991) |
| 1993 (USA) | 260 (80) | 285 (140) | 275.9 (220) | Anger et al. (1993) |
| 1993 (USA) | 250 (73) | 280 (163) | 270.7 (236) | Anger et al. (1993) |
| 1999 (UK) | 306 (64) | n.a. | 306 (64) | Smith et al. (1999) |
| 2002 (UK) | 324 (24) | n.a. | 324 (24) | Brice and Smith (2002) |
| 1999.5 [a] (1999–2000) (Australia) | 214 (1163) | 224 (1241) | 219.5 (2404) | Jorm, Anstey, Christensen, and Rodgers (2004) |
| 2004 (Canada) | 253 (171) | 268 (198) | 261 (369) | Reed, Vernon, and Johnson (2004) |
| 1987.5 (1987–1988) (UK) | 295 (254.5) [b] | 306 (288.5) [b] | 300.3 (543) | Deary and Der (2005a) |
| 1984.5 (1984–1985) (UK) | 300 (834) | 318 (1023) | 309.6 (1857) | Der and Deary (2006) |

Additional note. We went back to Johnson et al. (1985) and cross-referenced it with Silverman (2010). The total N for females should be 888 rather than 302. We changed the above N to reflect the correct female sample size.
[a] When a range of years is given the average is taken.
[b] In these studies between 254–255 males and 288–289 females were used; hence the Ns are averaged.
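The sample-size-weighted means in Table 1 follow from the sex-specific means and Ns. A minimal sketch, using Galton's row as the example (the function name `weighted_mean_rt` is ours, introduced only for illustration):

```python
def weighted_mean_rt(groups):
    """groups: list of (mean_rt_ms, n) pairs.

    Returns the sample-size-weighted mean RT and the total N.
    """
    total_n = sum(n for _, n in groups)
    mean = sum(m * n for m, n in groups) / total_n
    return mean, total_n


# Galton's data: males 183 ms (N = 2522), females 187.9 ms (N = 888)
galton_mean, galton_n = weighted_mean_rt([(183, 2522), (187.9, 888)])
print(round(galton_mean, 1), galton_n)  # 184.3 3410
```

For studies that tested males only, the weighted mean is simply the male mean.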
2.2. Psychometric meta-analysis
Regression with year is used to generate trend-weighted estimates of 19th-century (1889 — median year of Galton's study) and modern (2004 — the year of the most recent study in the collection) RT means. The population-representative study of Deary et al. (2001) is used for obtaining benchmark estimates of the simple RT/IQ correlation, along with estimates of standard deviations. Psychometric meta-analysis (Hunter and Schmidt, 1990 and Hunter and Schmidt, 2004) can be used to correct for statistical artefacts that typically alter the value of outcome measures. There are five such artefacts that need controlling. These include sampling error, reliability of the first variable, reliability of the second variable, restriction of range, and deviation from perfect construct validity. These corrections are used to determine the true correlation between g and simple RT, and hence the rate of g decline between 1889 and 2004.
2.3. Meta-regression
Meta-regression is a method for examining the influence of one or more covariates on the outcome effects. We carried out a meta-regression in which we regressed the effect size (the mean RT of a study) on the covariate (the year of the study), using the software available on www.stattools.net. We carried out a random-effects meta-regression, because it is generally considered to be the more appropriate technique in most studies (Borenstein, Hedges, Higgins, & Rothstein, 2009), and computed τ² using the Empirical Bayes Estimate (see: Thompson & Sharp, 1999).
Each study needs to be given a specific weight, and in this case we took the Standard Error of the Mean (SEM) score of the study. SEM is the standard deviation of the sample-mean's estimate of a population mean. SEM is usually estimated by the sample estimate of the population standard deviation (sample standard deviation) divided by the square root of the sample size: SEM = s/√N, where s is the sample standard deviation and N is the size of the sample.
The study of Deary et al. (2001) reports a good approximation of the population SD of a simple RT measure. Building on the study of Deary et al. (see Section 3.2) we estimate that the population SD = 160.4. The population SD is of course much better than the sample SDs, which are merely estimates of the population SD, and which vary substantially among themselves, introducing additional error. So, we decided to use the value of the population SD in the computation of the standard errors of all the individual studies in our meta-analysis, using the formula SEM = 160.4/√N.
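As a worked illustration of this formula, the sketch below computes the SEM for three of the studies in Table 1. The function name is ours; the constant 160.4 and the Ns are taken from the text.

```python
import math

POPULATION_SD_MS = 160.4  # population SD of simple RT estimated in the text


def sem_from_population_sd(n, sd=POPULATION_SD_MS):
    """Standard error of a study mean, using the common population SD."""
    return sd / math.sqrt(n)


for label, n in [("Galton (1889)", 3410),
                 ("Thompson (1903)", 49),
                 ("Taimela (1991)", 20)]:
    print(label, round(sem_from_population_sd(n), 2))
```

Because the same SD is used for every study, the weights differ between studies only through N.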

3. Results
3.1. Estimation of decline in average reaction times
In Fig. 1 the simple RT means for all 16 effects were regressed against year so as to determine the overall temporal trend. The trend coefficient b, computed using a random-effects meta-regression, is approximately .71 ms per year (the midpoint of the confidence interval reported in Section 3.8; SE = .265) and is significant at p = .003.
Fig. 1. Simple RT mean vs. year for 16 effects. The size of the bubbles is categorically determined by sample size with small bubbles representing studies with N values < 40 and large bubbles representing N values > 40. The scatter is fitted to a linear function, so as to illustrate the secular trend, and is weighted based on a random-effects meta-regression model.
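To make the idea of a weighted trend concrete, the sketch below fits a straight line to the 16 effects in Table 1, weighting each by 1/SEM². This is a deliberately simplified fixed-effect weighted least squares fit, not the random-effects model with an Empirical Bayes τ² estimate used in the study, so its slope should not be expected to reproduce the reported coefficient. The function and variable names are ours.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b


# (year, weighted mean RT in ms, total N) for the 16 effects in Table 1
EFFECTS = [
    (1889, 184.3, 3410), (1894.5, 208, 49), (1941, 197, 47), (1941, 203, 47),
    (1945, 286, 76), (1970, 249.5, 80), (1990, 199, 20), (1987, 183, 20),
    (1993, 275.9, 220), (1993, 270.7, 236), (1999, 306, 64), (2002, 324, 24),
    (1999.5, 219.5, 2404), (2004, 261, 369), (1987.5, 300.3, 543),
    (1984.5, 309.6, 1857),
]

POPULATION_SD_MS = 160.4
years = [year for year, _, _ in EFFECTS]
means = [rt for _, rt, _ in EFFECTS]
# Inverse-variance weights: 1 / SEM^2 with SEM = 160.4 / sqrt(N)
weights = [n / POPULATION_SD_MS ** 2 for _, _, n in EFFECTS]

intercept, slope = weighted_linear_fit(years, means, weights)
print("fixed-effect slope (ms per year):", round(slope, 3))
```

A random-effects model adds the between-study variance τ² to each study's sampling variance, which evens out the weights and reduces the dominance of the largest samples.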

3.2. Estimation of the population SD of reaction time and IQ
The study by Deary et al. (2001) is an attempt to generate a benchmark estimate of various parameters relating to a variety of RT measures and IQ based on a sample broadly representative of, in this case, the Scottish population (N = 900). The sample was drawn from the West of Scotland Twenty-07 Study, which is a population-based cohort study and was obtained using a two-stage random sampling strategy. The mean age of the participants was 56 so the sample is representative of cohorts that are older than those used by Silverman. RT (both simple and choice) was measured using a ‘Hick’-style device and IQ was measured using the 65 items of deductive reasoning constituting the numeric and verbal sections of the Alice Heim Group Ability Test Part I (AH4 Part I).
The study reports a simple RT SD of 119.7 ms. This value is much higher than the SDs in the individual samples (which range from 15 [Lefcourt & Siegel, 1970] to 90 [Deary & Der, 2005a]) indicating a strong restriction of range in virtually all samples in Silverman's study. When comparing the SD values of this Scottish sample on the AH4 total score (11.3) with the SD values of the samples in the AH4 manual it is clear that the value from the Scottish sample is much lower, indicating that it underestimates the true population value. The samples in the AH4 manual are not nationally representative (Alexopoulos, 1998, p. 645). However, quite a few samples of children and young adults were tested and some of the samples are quite large.
We compare the SD values for simple RT with the SD values of the AH4 from the manual, so as to correct the former for range restriction. This is achieved by comparing the SD of the Deary et al. (2001) sample to the SDs of young adults (seventeen-plus-year-olds) and of young children from the manual. It is apparent that all samples of seventeen-plus-year-olds have SDs that are substantially larger. The sample-size-weighted SD is 14.3, indicating that the range in the Scottish sample is at least 21% too small. However, these samples are still not truly nationally representative, hence the population SDs will most likely be even larger. The best approximations of nationally representative samples in the manual are the two large samples of, respectively, eleven-year-olds and twelve-year-olds from comprehensive schools. This is because after primary school, children are allocated to the type of secondary education best suited to their IQ level, leading to IQs that are more homogeneous in secondary than in primary education. We therefore computed the sample-size-weighted mean SD of the two large samples of eleven-year-olds and twelve-year-olds from comprehensive schools, which yielded an SD with a value of 17.2. This suggests that the simple RT SD in the Scottish sample is similarly underestimated by no less than 34%, meaning that the SD is not 119.7, but 160.4. We take this value of 160.4 as the best estimate of the population SD of simple RT.
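The arithmetic in this paragraph can be retraced as follows. The sketch assumes that the 34% shortfall is applied as a multiplicative factor of 1.34 to the observed RT SD, which is the reading that reproduces the value of 160.4 given in the text; the variable names are ours.

```python
sd_ah4_scottish = 11.3        # AH4 total-score SD in the Scottish sample
sd_ah4_young_adults = 14.3    # weighted SD, seventeen-plus-year-olds (manual)
sd_ah4_comprehensive = 17.2   # weighted SD, 11- and 12-year-olds (manual)
sd_rt_scottish = 119.7        # simple RT SD (ms) in the Scottish sample

# Proportion by which the Scottish AH4 SD falls short of each reference SD
shortfall_adults = 1 - sd_ah4_scottish / sd_ah4_young_adults
shortfall_comprehensive = 1 - sd_ah4_scottish / sd_ah4_comprehensive

# Assumption: the rounded 34% shortfall is applied as a factor of 1.34
sd_rt_population = sd_rt_scottish * (1 + round(shortfall_comprehensive, 2))

print(round(shortfall_adults * 100))         # 21
print(round(shortfall_comprehensive * 100))  # 34
print(round(sd_rt_population, 1))            # 160.4
```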
3.3. Estimate of the true correlation between reaction time and IQ/g
Deary et al. (2001) estimate the correlation between simple RT and IQ in their population-representative sample at − .31. However, in contrast with the meta-analysis by Jensen (1987), Deary et al. do not correct for measurement artefacts. On this basis there is likely to be measurement error in the correlation, and the value of − .31 is an underestimate of the true correlation. We therefore correct for unreliability in the simple RT and IQ measures, restriction of range in the IQ measure and imperfect measurement of the construct of g (Hunter and Schmidt, 2004 and Jensen, 1998).
3.4. Reliability in the simple RT and IQ measures
Deary et al. (2001, p. 397) suggest a test–retest reliability of both the simple RT and IQ measures of .85. We use this value for our corrections. Correcting for unreliability means dividing the observed value by the square root of the reliability, which yields a correction factor in both cases of 1.09.
3.5. Restriction of range
The value of the correlation between the IQ measure and the simple RT measure is attenuated by range restriction in the sample. The solution to variation in range is to define a reference population and express all correlations in terms of it (Hunter & Schmidt, 1990, pp. 47–49). The next step is to compute what the correlation in a given population would be if the SD were the same as in the reference population. The SDs can be compared by dividing the SD of the study population by the SD of the reference group, that is u = SDstudy / SDref. Previously we showed that the Scottish sample is likely strongly affected by range restriction, yielding a value of u = 119.7/160.4 = .75. Correcting for restriction of range means dividing the observed correlation by the value of u, yielding a correction factor of 1.33.
3.6. Imperfectly measuring the construct of g
The deviation from perfect construct validity in g attenuates the values of the correlation between the IQ test and the simple RT measure. In making up any collection of cognitive tests, we do not have a perfectly representative sample of all possible cognitive tests. Therefore any one limited sample of tests will not yield exactly the same g as another such sample. The sample values of g and therefore also correlations involving measures of g are attenuated by psychometric sampling error, but the fact that g is very substantially correlated across different test batteries implies that the differing obtained values of g can all be interpreted as estimates of a “true” g (e.g. Johnson, Bouchard, Krueger, McGue, & Gottesman, 2004).
The more tests and the higher their g loadings, the higher the g saturation of the composite score. The Wechsler tests have a large number of subtests with quite high g loadings, yielding a highly g-saturated composite score. The g score of the Wechsler tests correlates more than .95 with the tests' IQ score (Jensen, 1998, pp. 90–91). However, shorter batteries with a substantial number of tests with lower g loadings will lead to a composite with somewhat lower g saturation. The average g loading of an IQ score as measured by various standard IQ tests lies in the + .80s (Jensen, 1998, ch. 10). When this value is taken as an indication of the degree to which an IQ score is a reflection of “true” g, it can be estimated that a test's g score correlates about .85 with “true” g. As g loadings represent the correlations of tests with the g score, it is likely that most empirical g loadings will underestimate “true” g loadings. To limit the risk of overcorrection a conservative value of .90 can be used as a basis for a classical test battery like the Wechsler.
As the AH4 Part I consists solely of items that measure fluid intelligence it is expected that the g loadedness of the sum score is high. The manual of the AH4 (p. 10) shows correlations ranging from .60 to .76 between the total score on the AH4 and the total score on other IQ tests, including the Raven’s Progressive Matrices, with higher values for the larger samples. This is actually higher than the mean correlation of .67 between the total scores of various standard intelligence tests reported elsewhere in the literature (Jensen, 1980, pp. 314–315). The manual reports a factor analysis of the intercorrelations between the various sum scores of similar items showing a strong general factor running through the whole test, each sum score correlating highly with the general factor, with values of r lying between .80 and .86 (Heim, 1970, p. 9). So, it appears the g loadedness of the AH4 Part I is similar to the g loadedness of a classical battery such as the Wechsler. Therefore, the correction for imperfectly measuring the construct g should be modest: 10%, hence a correction factor of 1.10.
In sum, the observed correlation in the Scottish data between the IQ test and the simple RT test is − .31. The correction factor for unreliability in both the simple RT and IQ measures is 1.09, the correction factor for restriction of range is 1.33, and the correction factor for imperfectly measuring the construct of g is 1.10. Applying the four corrections to the value of the correlation yields a true correlation ρ = − .54. This demonstrates that the g loadedness of simple RT is quite a bit larger than the observed correlations suggest.
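The chain of corrections can be checked numerically. The sketch below follows the simple multiplicative form described in the text (it is not a general implementation of Hunter and Schmidt's procedures) and computes the result twice: once with the rounded correction factors quoted above, and once from the unrounded inputs. The variable names are ours.

```python
import math

r_observed = 0.31          # absolute observed simple RT / IQ correlation
reliability = 0.85         # test-retest reliability of both measures
u = 119.7 / 160.4          # SD ratio used for range restriction (about .75)
construct_factor = 1.10    # correction for imperfect measurement of g

# Factors recomputed from the inputs (the text rounds these to 1.09 and 1.33)
reliability_factor = 1 / math.sqrt(reliability)
range_factor = 1 / u

# Using the rounded factors exactly as quoted in the text
rho_quoted = r_observed * 1.09 * 1.09 * 1.33 * 1.10

# Using unrounded factors
rho_unrounded = (r_observed * reliability_factor ** 2
                 * range_factor * construct_factor)

print(round(rho_quoted, 2), round(rho_unrounded, 2))  # 0.54 0.54
```

Both routes give an absolute corrected correlation of about .54, so the rounding of the individual factors does not affect the two-decimal result.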
3.7. Using effect sizes for reaction time to compute effect sizes for IQ/g
Our measure is an imperfect reflection of g: its true, absolute correlation with g is .54, hence our measure's true g-loadedness is .54, similar to that of certain subtests of an IQ battery, whereas the true g-loadedness of g is by definition 1.00. In other words, simple RT measures 54% of the g factor. As our interest is in the decline in g we need to extrapolate our findings from a measure with a g loading of .54 to a measure with a g loading of 1.00. We therefore divide the effect size (d) for simple RT by the value of .54. The d for the simple RT measure between 1889 and 2004 is .51 (81.4/160.4). With a g loading of .54 this yields an equivalent d = .51/.54 = .94 for the total score on a broad IQ battery, which corresponds to 14.1 IQ points on a scale with an SD of 15. This means that ‘genetic g’ (recall that the most heritable IQ measures are also the most g loaded; Rushton & Jensen, 2010) has decreased by 14.1 IQ points since Galton carried out his studies; this is a decline of 1.23 IQ points per decade between 1889 and 2004.
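The conversion from the RT slowing to IQ points can be retraced step by step. The sketch assumes the conventional IQ standard deviation of 15 points; all other numbers come from the text, and the variable names are ours.

```python
rt_change_ms = 81.4        # trend-estimated slowing between 1889 and 2004
sd_rt_population = 160.4   # population SD of simple RT (ms)
g_loading = 0.54           # corrected correlation of simple RT with g
IQ_SD = 15                 # assumed SD of the IQ scale

d_rt = rt_change_ms / sd_rt_population   # effect size on simple RT
d_g = d_rt / g_loading                   # equivalent effect size on g
iq_decline = d_g * IQ_SD                 # decline in IQ points
decades = (2004 - 1889) / 10             # 11.5 decades
per_decade = iq_decline / decades

print(round(d_rt, 2), round(d_g, 2))                # 0.51 0.94
print(round(iq_decline, 1), round(per_decade, 2))   # 14.1 1.23
```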
3.8. Regression line
We carried out a meta-regression, where we test the hypothesis that the year of the study (year) predicts the mean simple RT of a study, so the regression formula is simple RT = a + b (year). This resulted in a regression line according to the formula:

With a standard error (SE) of the regression coefficient b of .265 and a 95% confidence interval for b from .1884 to 1.2271, it is clear that the interval does not include 0, implying that the coefficient is significant. A precise estimate of the significance is found by calculating the absolute value of the ratio of the b coefficient to the SE of b, which results in a z value of 2.67 with an associated probability value of .003. The mean simple RT becomes significantly slower as time goes by.
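The z value can be recovered from the reported interval. The sketch assumes a symmetric normal-theory (Wald) 95% interval, so that b is the midpoint of the interval and the half-width equals 1.96 × SE; the variable names are ours.

```python
ci_low, ci_high = 0.1884, 1.2271   # reported 95% confidence interval for b
se_b = 0.265                       # reported standard error of b

# Assumption: symmetric Wald interval, so b is the interval midpoint
b = (ci_low + ci_high) / 2
z = abs(b) / se_b

# Cross-check: the SE implied by the interval half-width
se_from_ci = (ci_high - ci_low) / (2 * 1.96)

print(round(b, 3), round(z, 2), round(se_from_ci, 3))  # 0.708 2.67 0.265
```

The midpoint of about .71 ms per year is also what the 81.4 ms slowing over the 115 years from 1889 to 2004 implies.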
Fig. 1 shows the meta-regression weighted scatter of the means of the individual studies in the meta-analysis over time. It shows that there are a couple of data points that are at quite a distance from the regression line, but they all concern samples with small N values. Most of the larger studies are close or relatively close to the regression line. Our analyses confirm that the regression formula explains the variance between the data points quite well. The residual error of the Sum of Squares (Qe) is 13.655 (df = 14; p = .47), which is non-significant. This means that after taking the year of study into account not a lot of heterogeneity is left over, so, there is only little room left for additional moderators. We conclude that the year of the study has a clear influence on the mean simple RT of a study, in that the mean simple RT becomes slower over time.
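The probability attached to the residual heterogeneity statistic can be checked against the chi-square distribution. The sketch below uses the closed-form upper-tail probability that holds for even degrees of freedom, so it needs only the standard library; the function name is ours.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2m this equals exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0          # k = 0 term
    for k in range(1, df // 2):     # k = 1 .. m - 1
        term *= half / k
        total += term
    return math.exp(-half) * total


p_residual = chi2_sf_even_df(13.655, 14)
print(round(p_residual, 3))
```

The result lies between .47 and .48, in line with the reported p = .47, which supports the reading that little heterogeneity remains once year is taken into account.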
4. Discussion
The Victorian era was characterized by great accomplishments. As great accomplishment is generally a product of high intelligence, we tested the hypothesis that the Victorians were actually cleverer than modern populations. We used a robust elementary cognitive indicator of general intelligence, namely measures of simple RT.
In the present study we used the data on the secular increase in simple RT described in a meta-analysis of 14 age-matched studies from Western countries conducted between 1884 and 2004 to generate estimates of the rate of IQ decline. The decline estimate of − 1.23 IQ points per decade from the present study falls within the range of those produced in previous studies employing the magnitude of the dysgenic effect on IQ as the basis for estimating declines (i.e. − .12 to approximately − 1.3 points per decade). Our estimate is the first to be based on the use of real data rather than inference, however.
Whilst the dysgenic model is a plausible cause of the decline in RT performance Silverman (2010) does not address this potential cause, and instead offers other suggestions.
Silverman's first suggestion is that an ambient, population-wide increase in neurotoxic load stemming from persistent exposure to substances such as lead may be responsible for the slowdown in simple RTs. Studies indicate, however, that the depressant effects of neurotoxins on IQ are typically least pronounced on the strongest measures of g (Lezak, 1983). As we are only considering the decline in simple RT that is due to the decline in ‘genetic g’, this is grounds for ruling out contributions from neurotoxins, as whatever is diminishing ‘genetic g’ should be a Jensen effect. This makes dysgenic fertility the prime candidate (Woodley & Meisenberg, in press).
Silverman’s other suggested cause of the decline is that the trend has resulted from those with poorer health and slower simple RTs surviving into adulthood more often in the modern era than in the past, and that it is the increasing numbers of such individuals that have diminished simple RT performance over time. We argue that this observation is fully compatible with the dysgenic model. One of the papers that Silverman cites in support of the association between health and RT is Deary and Der (2005b), who found that g mediates this relationship. One source of correlations amongst these diverse traits is pleiotropic mutation-load (Miller, 2000). Pleiotropy describes the tendency for mutations to have general rather than isolated effects on different traits. For example, a mutation which reduces myelination of neurons might simultaneously diminish both IQ and RT performance, as less myelinated neurons are less able to carry signals efficiently and will therefore be less efficient at processing information in the brain (Holm et al., 2011 and Miller, 1994). Owing to the fact that they have general physiological effects, such mutations can diminish health also (Arden, Gottfredson, & Miller, 2009). As a consequence of this, if dysgenic fertility is favouring the carriers of mutant alleles that reduce ‘genetic g’ and RT performance, the frequency of certain diseases and disorders should increase also. Indeed there is evidence that this may well be occurring (Lynn, 2011).
There are some limitations to this study. Although Silverman used stringent selection criteria, the trend may nonetheless be influenced by methodological artefacts and sample peculiarities. This is a potentially important issue, as there appears to be a substantial discrepancy between the test-retest coefficients in Galton's data reported by Johnson et al. (1985), i.e., .21 for people retested within a year (N = 421) and .17 for people retested over any time interval (N = 1069), and the equivalent suggested coefficient of the ‘Hick’-style device employed in our reference study (.85; Deary et al., 2001). Given the large N used by Silverman (2010) in establishing the Galton simple RT means, it is unlikely that even relatively low reliability at the individual level would seriously compromise the accuracy of the group mean of Galton's data. This is especially likely to be the case given the apparent representativeness of Galton's mean relative to other contemporaneous studies of simple RT, some of which employed instrumentation that was likely of much better quality than that used by Galton (1889), such as the electro-mechanical Hipp chronoscope (Ladd and Woodworth, 1911; Thompson, 1903). It should also be emphasized that whilst our value of a −14.1 IQ point decline is an estimate based on the best meta-analytical data available, a simple inspection of our figure shows that there is a non-negligible amount of scatter around the regression line. The real magnitude of the effect might therefore be several IQ points lower or even higher.
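The point about low individual-level reliability and large N can be illustrated with a minimal sketch, assuming classical test theory (observed score = true score + independent random error). The reliability coefficients are the ones quoted above; the observed SD and the sample sizes are hypothetical values chosen only for illustration and are not taken from Galton's data or from any study, and the function names are our own.

```python
import math

# Test-retest coefficients quoted in the text: Galton's data
# (Johnson et al., 1985) and the 'Hick'-style device (Deary et al., 2001).
RELIABILITY_GALTON = 0.17
RELIABILITY_HICK = 0.85

# HYPOTHETICAL observed SD in milliseconds, for illustration only.
OBSERVED_SD_MS = 50.0


def standard_error_of_mean(observed_sd: float, n: int) -> float:
    """Standard error of a group mean: observed SD divided by sqrt(N)."""
    return observed_sd / math.sqrt(n)


def error_sd(observed_sd: float, reliability: float) -> float:
    """SD of the random-error component under classical test theory,
    where error variance = (1 - reliability) * observed variance."""
    return observed_sd * math.sqrt(1.0 - reliability)


if __name__ == "__main__":
    for n in (1, 100, 1000, 2500):  # hypothetical sample sizes
        se = standard_error_of_mean(OBSERVED_SD_MS, n)
        # Contribution of random measurement error alone to the SE of the mean.
        noise = error_sd(OBSERVED_SD_MS, RELIABILITY_GALTON) / math.sqrt(n)
        print(f"N={n:>5}: SE of mean = {se:6.2f} ms; "
              f"random-error part = {noise:6.2f} ms")
```

Under these assumptions the random-error contribution to the group mean shrinks with the square root of N, whatever the individual-level reliability. The sketch says nothing about systematic bias in an instrument, which a large N does not remove.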
In conclusion, however, these findings do indicate that, with respect to ‘genetic g’, the Victorians were indeed substantially cleverer than modern populations.
We would like first and foremost to thank Bruce Charlton for inspiring this study and for constructively critical comments on earlier drafts of this manuscript. He was not only the first person to propose a relationship between slowing reaction times and dysgenic fertility, but also the first to attempt an estimate of the decline using Silverman's data, on his blog Charlton’s Miscellany (see Charlton, 2012 in the references for a URL to the original). We would also like to thank Irwin Silverman for sharing his expertise on reaction time testing with us and for supplying an additional data point for our meta-analysis. Finally, we would like to thank Richard Lynn, Gerhard Meisenberg, and Guy Madison for comments, which in all cases enhanced this manuscript.



    • D.S. Alexopoulos. Factor structure of Heim's AH4. Perceptual and Motor Skills, 86 (1998), pp. 643–646
    • W.K. Anger, M.G. Cassitto, Y.-X. Liang, R. Amador, J. Hooisma, D.W. Chrislip et al. Comparison of performance from three continents on the WHO-recommended Neurobehavioral Core Test Battery (NCTB). Environmental Research, 62 (1993), pp. 125–147
    • R. Arden, L.S. Gottfredson, G. Miller. Does a fitness factor contribute to the association between intelligence and health outcomes? Evidence from medical abnormality counts among 3654 US Veterans. Intelligence, 37 (2009), pp. 581–591
    • M. Borenstein, L.V. Hedges, J.P.T. Higgins, H.R. Rothstein. Introduction to meta-analysis. Wiley, Chichester, UK (2009)
    • C.F. Brice, A.P. Smith. Effects of caffeine on mood and performance: A study of realistic consumption. Psychopharmacology, 164 (2002), pp. 188–192
    • R.B. Cattell. The fate of national intelligence: Test of a thirteen-year prediction. The Eugenics Review, 42 (1950), pp. 136–148
    • G. Clark. A farewell to alms: A brief economic history of the world. Princeton University Press, Princeton, NJ (2008)
    • I.J. Deary, G. Der. Reaction time, age, and cognitive ability: Longitudinal findings from age 16 to 63 years in representative population samples. Aging, Neuropsychology and Cognition, 12 (2005), pp. 187–213
    • I.J. Deary, G. Der. Reaction time explains IQ's association with death. Psychological Science, 16 (2005), pp. 64–69
    • I.J. Deary, G. Der, G. Ford. Reaction times and intelligence differences: A population-based cohort study. Intelligence, 29 (2001), pp. 389–399
    • G. Der, I.J. Deary. Age and sex differences in reaction time in adulthood: Results from the United Kingdom Health and Lifestyle Survey. Psychology and Aging, 21 (2006), pp. 62–73
    • J.R. Flynn. Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101 (1987), pp. 171–191
    • J.R. Flynn. What is intelligence? Beyond the Flynn effect (expanded ed.). Cambridge University Press, Cambridge, UK (2009)
    • G. Forbes. The effect of certain variables on visual and auditory reaction times. Journal of Experimental Psychology, 35 (1945), pp. 153–162
    • F. Galton. Hereditary genius. Macmillan Everyman's Library, London, UK (1869)
    • F. Galton. Inquiries into human faculty and its development. Macmillan Everyman's Library, London, UK (1883)
    • F. Galton. An instrument for measuring reaction time. Report of the British Association for the Advancement of Science, 59 (1889), pp. 784–785
    • A.W. Heim. AH4 group test of general intelligence manual. NFER, Windsor (1970)
    • L. Holm, F. Ullén, G. Madison. Intelligence and temporal accuracy of behaviour: Unique and shared associations with reaction time and motor timing. Experimental Brain Research, 214 (2011), pp. 175–183
    • J. Huebner. A possible declining trend for worldwide innovation. Technological Forecasting and Social Change, 72 (2005), pp. 980–986
    • J.E. Hunter, F.L. Schmidt. Methods of meta-analysis: Correcting error and bias in research findings. Sage, Newbury Park, CA (1990)
    • J.E. Hunter, F.L. Schmidt. Methods of meta-analysis: Correcting error and bias in research findings (2nd ed.). Sage, Thousand Oaks, CA (2004)
    • A.R. Jensen. Bias in mental testing. The Free Press, New York, NY (1980)
    • A.R. Jensen. Process differences and individual differences in some cognitive tasks. Intelligence, 11 (1987), pp. 107–136
    • A.R. Jensen. The g factor: The science of mental ability. Praeger, Westport, CT (1998)
    • A.R. Jensen. Clocking the mind: Mental chronometry and individual differences. Elsevier, Oxford, UK (2006)
    • A.R. Jensen. The theory of intelligence and its measurement. Intelligence, 39 (2011), pp. 171–177
    • W. Johnson, T.J. Bouchard Jr., R.F. Krueger, M. McGue, I.I. Gottesman. Just one g: Consistent results from three test batteries. Intelligence, 32 (2004), pp. 95–107
    • W. Johnson, I. Deary. Placing inspection time, reaction time, and perceptual speed in the broader context of cognitive ability: The VPR model in the Lothian Birth Cohort 1936. Intelligence, 39 (2011), pp. 405–417
    • R.C. Johnson, G. McClearn, S. Yuen, C.T. Nagoshi, F.M. Ahern, R.E. Cole. Galton's data a century later. American Psychologist, 40 (1985), pp. 875–892
    • A.F. Jorm, K.J. Anstey, H. Christensen, B. Rodgers. Gender differences in cognitive abilities: The mediating role of health state and health habits. Intelligence, 32 (2004), pp. 7–23
    • G.T. Ladd, R.S. Woodworth. Physiological psychology. Scribner, New York, NY (1911)
    • H.M. Lefcourt, J.M. Siegel. Reaction time behaviour as a function of internal–external control of reinforcement and control of test administration. Canadian Journal of Behavioural Science, 2 (1970), pp. 253–266
    • T. Lentz. Relation of IQ to size of family. Journal of Educational Psychology, 18 (1927), pp. 486–496
    • M.D. Lezak. Neuropsychological assessment. Oxford University Press, New York, NY (1983)
    • R. Lynn. Dysgenics: Genetic deterioration in modern populations (revised ed.). Ulster Institute for Social Research, London, UK (2011)
    • R. Lynn, T. Vanhanen. Intelligence: A unifying construct for the social sciences. Ulster Institute for Social Research, London, UK (2012)
    • E.M. Miller. Intelligence and brain myelination: A hypothesis. Personality and Individual Differences, 17 (1994), pp. 803–832
    • C. Murray. Human accomplishment: The pursuit of excellence in the arts and sciences, 800 BC to 1950. Harper Collins, New York, NY (2003)
    • U. Neisser (Ed.). The rising curve: Long-term gains in IQ and related measures. American Psychological Association, Washington, DC (1997)
    • T.E. Reed, P.A. Vernon, A.M. Johnson. Sex difference in brain nerve conduction velocity in normal humans. Neuropsychologia, 42 (2004), pp. 1709–1714
    • R.D. Retherford, W.H. Sewell. Intelligence and family size reconsidered. Social Biology, 35 (1988), pp. 1–40
    • F.V. Rijsdijk, P.A. Vernon, D.I. Boomsma. The genetic basis of the relation between speed-of-information-processing and IQ. Behavioural Brain Research, 95 (1998), pp. 77–84
    • H. Rindermann, M. Sailer, J. Thompson. The impact of smart fractions, cognitive ability of politicians and average competence of peoples on social development. Talent Development & Excellence, 1 (2009), pp. 3–25
    • J.P. Rushton. The “Jensen effect” and the “Spearman–Jensen hypothesis” of Black–White IQ differences. Intelligence, 26 (1998), pp. 217–225
    • J.P. Rushton, A.R. Jensen. The rise and fall of the Flynn effect as a reason to expect the narrowing of the Black–White gap. Intelligence, 38 (2010), pp. 213–219
    • F.L. Schmidt, J.E. Hunter. Theory testing and measurement error. Intelligence, 27 (1999), pp. 183–198
    • R.H. Seashore, R. Starmann, W.E. Kendall, J.S. Helmick. Group factors in simple and discrimination reaction times. Journal of Experimental Psychology, 29 (1941), pp. 346–394
    • I.W. Silverman. Simple reaction time: It is not what it used to be. The American Journal of Psychology, 123 (2010), pp. 39–50
    • V. Skirbekk. Fertility trends by social status. Demographic Research, 18 (2008), pp. 145–180
    • A. Smith, W. Sturgess, N. Rich, C. Brice, C. Collison, J. Bailey et al. The effects of idazoxan on reaction times, eye movements and the mood of healthy volunteers and patients with upper respiratory tract illness. Journal of Psychopharmacology, 13 (1999), pp. 148–151
    • S. Taimela. Factors affecting reaction-time testing and the interpretation of results. Perceptual and Motor Skills, 73 (1991), pp. 1195–1202
    • S. Taimela, U.M. Kujala, K. Osterman. The relation of low grade mental ability to fractures in young men. International Orthopaedics, 15 (1991), pp. 75–77
    • J. te Nijenhuis, S.H. Cho, R. Murphy, K.H. Lee. The Flynn effect in Korea: Large gains. Personality and Individual Differences, 53 (2012), pp. 147–151
    • J. te Nijenhuis, R. Murphy, R. van Eeden. The Flynn effect in South Africa. Intelligence, 39 (2011), pp. 465–467
    • H.B. Thompson. The mental traits of sex. An experimental investigation of the normal mind in men and women. The University of Chicago Press, Chicago, IL (1903)
    • S.G. Thompson, S.J. Sharp. Explaining heterogeneity in meta-analysis: A comparison of methods. Statistics in Medicine, 18 (1999), pp. 2693–2708
    • M. van Court, F.D. Bean. Intelligence and fertility in the United States: 1912–1982. Intelligence, 9 (1985), pp. 23–32
    • M.A. Woodley. The social and scientific temporal correlates of genotypic intelligence and the Flynn effect. Intelligence, 40 (2012), pp. 189–204
    • M.A. Woodley, A.J. Figueredo. Historical variability in heritable general intelligence: Its evolutionary origins and socio-cultural consequences. The University of Buckingham Press, Buckingham, UK (2013)
    • M.A. Woodley, G. Meisenberg. A Jensen effect on dysgenic fertility: An analysis involving the National Longitudinal Survey of Youth. Personality and Individual Differences (2013, in press). http://dx.doi.org/10.1016/j.paid.2012.05.024

Corresponding author at: Department of Psychology, Umeå University, Sweden.
MAW conceived the analysis and drafted the manuscript.
The two first authors contributed equally to this study; the order of the names is random.
JTN conducted the analyses and contributed to subsequent drafts of the manuscript.
RM collected and validated the data used in the analysis.