It would be a serious mistake to think corporations like Big Tobacco have a monopoly on socially irresponsible denial games. In the past few decades, there has been widespread concern over the decline in American educational standards. One of the major indicators of educational decline has been the steady drop in scores on college entrance examinations such as the SAT, or Scholastic Aptitude Test. For many years, though, the response of many educators to the falling scores was flat denial that the decline had any significance. The denial games continue to the present day.
National score data for the SAT are first available for 1952. Between then and 1963, SAT scores held constant or even increased, despite the fact that the proportion of high-school students taking the SAT rose from 7% in 1952 to 30% in 1963, which meant that many less-qualified students were taking the test. It seems reasonable to conclude that the quality of American education held steady or even improved a bit during those years. In 1964, scores began to decline, and by 1970, national average scores on the verbal aptitude portion of the SAT had fallen from 478 out of a possible 800 to 460; mathematical aptitude scores fell from 502 to 488. When millions of people are taking the test, even a small variation in the average can be significant. By 1977, verbal scores were down to 429, math scores to 470. By 1981, scores had declined for 19 consecutive years; verbal scores had fallen a total of 54 points to 424, and math scores had fallen 36 points to 466. In 1982, for the first time in two decades, scores rose: math scores by one point, verbal scores by two.
If the drop in test scores had lasted only a few years or amounted to only a few points, we might be justified in writing the shift off as a statistical fluke. Denying a 19-year decline is the educational equivalent of denying that smoking causes lung cancer. After critics of the SAT charged that the decline in scores up to 1970 had no significance but merely reflected drift in the grading standards, the Educational Testing Service (ETS), author of the SAT, conducted a survey. They found there had indeed been a drift, but in the direction of leniency: the actual decline had been ten points greater than the scores indicated! Another explanation that had been advanced was that the influx of disadvantaged minorities beginning in the mid-1960's tended to lower test scores, because less-skilled students who would not otherwise have taken the SAT were taking it. Most analysts of the decline have concluded that this effect was strongest in the late 1960's and early 1970's and may account for up to half of the decline. In recent years, minority test scores, though still lower than white scores, have increased. Black math scores rose 12 points and verbal scores 9 points between 1976 and 1982, while white scores dropped ten points in math and seven points in verbal skills in the same period! After we allow for all the demographic effects, we are still left with almost a full decade during which many educational theorists either denied there was a problem or, worse yet, attacked the tests.
One of the commonest attacks on testing methods might be called the "deep thinker" fallacy. Banesh Hoffmann, a major critic of testing, presented the following example in his book The Tyranny of Testing of a true-false question that is true on one level but false on another: George Washington was born on February 22, 1732 --True/False. A mediocre thinker might immediately answer true, but a deeper thinker, knowing that Britain adopted the Gregorian Calendar after Washington was born, and that Washington's original birthdate was February 11, might well suffer some confusion.
If only it were true that this was a common problem! There would be no need for me to have pages on pseudoscience! A deep thinker might well lose an occasional point because of such a question, but that loss is more than made up by the extra points gained by being able to answer more difficult questions that others miss. The student is not taking the test in a vacuum, either; he or she knows why the test is being given. If the questions are generally simple, the answer is likely to be "true"; in a course on astronomy or the history of science the expected answer might well be "false", but the student who is a really deep thinker (and paying attention in class) should have little trouble telling the two situations apart.
One example of an SAT question that was successfully challenged involved a circle rolling around the circumference of a stationary circle whose diameter is three times as large. The question: how many revolutions does the smaller circle make in one complete trip? Before going on, answer the question. Then answer these: what happens if the stationary coin is twice the diameter of the rolling coin? The same diameter? Half the diameter? Now explain why.
The expected answer on the SAT was three, but an outside observer would actually see four: three from the rolling itself and one more from travelling around the circle. The question is exactly like asking whether the Moon rotates: as seen from the Earth, no, but as seen from anyplace else, yes -- once every month as it travels around the Earth. Thousands of students picked up a few points when ETS allowed four as a possible answer, but there is not the slightest reason to think that more than a tiny fraction of them could reason out correctly why four might be a correct answer. To get credit, they should have been asked to explain why four was a correct answer. (Incidentally, the answers to the other questions above are: relative to the stationary coin, 2, 1, and 1/2; relative to an outside observer, 3, 2, and 1-1/2.)
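The general rule is easy to check numerically. The sketch below is mine, not part of the test or any source: relative to the stationary coin, the rolling coin makes D/d revolutions, where D and d are the two diameters, and an outside observer counts one more.

```python
def revolutions(stationary_d, rolling_d):
    """Revolutions of a coin rolling without slipping once around the
    outside of a stationary coin: first relative to the stationary coin,
    then as counted by an outside observer."""
    relative = stationary_d / rolling_d   # rotations due to rolling alone
    observed = relative + 1               # one extra for the trip around
    return relative, observed

print(revolutions(3, 1))    # the SAT question: (3.0, 4.0)
print(revolutions(2, 1))    # (2.0, 3.0)
print(revolutions(1, 1))    # (1.0, 2.0)
print(revolutions(0.5, 1))  # (0.5, 1.5)
```

The four printed pairs reproduce the answers given in the parenthetical remark above.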
The "deep thinker" fallacy is simply a variation on the "Galileo fallacy" so widespread among cranks: they persecuted Galileo, and he was right; they persecute me, therefore I am also right. In this context: a brilliant student got the rolling-circle question wrong; I also got the question wrong; therefore, I must be brilliant. A very widespread version involving Einstein goes that Einstein didn't like school and was considered dull by his teachers; therefore, any student who doesn't like school and is considered dull by his teachers is a potential Einstein.
Few people have been more critical of corporate denial games than Ralph Nader and his followers, but a Nader Group study, The Reign of ETS, by Allan Nairn and Associates, is one of the classic attacks on testing. The study attacks the SAT and similar tests for not testing relevant skills, for being useless as a predictor of future performance, and for being primarily a measure of social class.
The charge that test scores do not predict future performance is based on the fact that test scores have only a weak correlation with first-year college grades and an even lower correlation with later grades and lifetime earnings. Would the group accept college grades and lifetime earnings as measures of competence in any other context? Of course not! College students major in a wide variety of subjects, so a student with poor math or verbal skills can all too easily avoid subjects that tax those skills and end up with a high, if meaningless, grade-point average. It is also a sad fact that a semi-literate athlete or entertainer earns more than a college professor. Criticizing tests because they fail to agree with measures that measure nothing is bizarre methodology. The real questions should be: how do students with poor math scores on the SAT do in situations that require mathematical skills? How do students with poor verbal scores do when confronted with the need to read complex literature or write something of their own? And to deal squarely with the issue of unfairness: how often do students with poor SAT scores perform at a high level in college without requiring remedial education?
The best way to consider whether the tests measure relevant skills is to look at two actual SAT questions.
All of the complaints about testing become utterly irrelevant when weighed against the simple fact that the tests are so trivial that no literate person should have problems with them. The math required is nothing more than simple algebra and grade-school geometry, and the factual knowledge required is nil. Students are not expected to know the name of a single historical figure, state or country, star, planet, plant, or animal.
The sociological reasoning of the study is interesting. Consider the following three quotes from The Reign of ETS.
First of all, if income is mostly correlated with one's social class, why criticize the SAT for failing to predict income? It makes about as much sense as criticizing the test because it fails to predict height or hair color. More interestingly, the three statements are mutually incompatible; if income and test scores both increase in proportion to social class, then how can there be no apparent correlation between income and test scores? People with high incomes, who come from affluent families, should tend to have higher test scores even if the scores do reflect nothing more than social class. And how could we be certain which factor causes the high income? Is it at all possible that affluent people are affluent because they have more discipline and better attitudes toward learning? Something is seriously wrong with the statistical methodology.
The usual statistical measure of correlation is called the "correlation coefficient". It has a value of 1 for perfect correlation, zero for no correlation at all, and -1 for perfect negative correlation. The correlation coefficient between SAT scores and first-year college grades is 0.35. This is a rather moderate correlation; it would be more useful to examine the correlation between math SAT scores and mathematics grades, and so on. The Nader Group prefers a different measure, the "percentage of perfect prediction", which is the square of the correlation coefficient; in this case 0.119, which is impressively smaller than 0.35. On this basis, Nairn and Associates claim that the SAT scores account for only 11.9% of perfect prediction of first-year college grades and are essentially worthless.
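For readers who want to check the arithmetic, here is a sketch of the standard Pearson formula; the score-grade pairs are invented for illustration and are not data from the study. Note, incidentally, that 0.35 squared is actually 0.1225; the study's 11.9% figure would correspond to a coefficient of about 0.345.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient: +1 perfect, 0 none, -1 perfect negative."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical (SAT score, first-year GPA) pairs, for illustration only
scores = [450, 500, 550, 600, 650, 700]
grades = [2.1, 2.9, 2.4, 3.3, 2.8, 3.5]
r = pearson_r(scores, grades)
print(round(r, 2), round(r ** 2, 2))   # 0.77 0.59 for this made-up data

# Squaring a moderate coefficient always yields a much smaller-looking
# "percentage of perfect prediction":
print(round(0.35 ** 2, 4))             # 0.1225, i.e. about 12%
```

The rhetorical trick is that r and its square live on different scales; any coefficient below 1 shrinks when squared, so quoting the square makes a moderate correlation sound negligible.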
Now, what's the correlation coefficient between social class and SAT score? Here Nairn and Associates pursue a very different course. Unlike the correlation between test scores and grades, for the link between family income and test scores they provide lengthy tables comparing test scores and average family income. There is indeed a correlation, but without the correlation coefficient, we have no way of knowing how significant it is -- and there is no mention of the correlation coefficient anywhere in the discussion! There is only a passing reference in a footnote at the back of the book; the value turns out to be 0.35! The very level of correlation that supposedly makes tests worthless for predicting grades suddenly becomes ironclad proof that social class determines test scores!
There is, for the record, no doubt that test scores among disadvantaged students are lower than among the affluent. Using this correlation to deny the reality that literacy is lower among the poor than among the rich is about the most socially irresponsible stance imaginable. It in fact amounts to intellectual apartheid. No surer means of keeping the poor isolated (and dependent on social service programs?) could possibly be imagined. Reform college admission procedures, provide more remedial services -- anything but use statistical mumbo-jumbo to deny reality!
Created 8 July 1998, Last Update 02 June, 2010
Not an official UW Green Bay site