Snake Oil in a Computer -- The Pseudo-Science of Transportation Modeling

Michael J. Vandeman

August 16, 1991

This is the Decade of the Environment. Planners, politicians, and other decision-makers want to know what effect their projects will have on the environment. In many cases they don't really want to know; they want to convince their constituents that the results will be beneficial, or at least neutral. Either way, computer modeling is being used to "answer" the questions. But does computer modeling answer the questions?

If you buy a ballpoint pen, you can quickly determine if it does what it is advertised to do. The pen either writes well enough, or it doesn't. If you buy a medicine, it becomes more difficult. Perhaps the improvement in your illness is a placebo effect. Maybe there is an effective ingredient, but you are also paying for several "fillers". Maybe you get worse before you get better. Maybe the body cures itself and the drug is irrelevant. "Snake oil" is any medicine that has not been proven effective. Computer modeling of transportation projects (e.g. air quality impacts) is, precisely speaking, snake oil.

First a word about my qualifications: I have a B.A. magna cum laude in Mathematics from the University of California at Berkeley, with Special Distinction in Mathematics. As a junior at U.C. Berkeley, I ranked 37 1/2th out of 1300 college math students in the nation in the annual mathematics contest sponsored by the Mathematical Association of America. I have an M.A. in Mathematics (including study in Statistics) from Harvard University. And my Ph.D. from the University of California at Los Angeles is in Psychology, concentrating in Psychometrics. Psychometrics is the science of the measurement of human behavior and traits, and forms the scientific basis upon which transportation modeling and all other forms of human measurement rest. I taught measurement theory (specifically, Reliability and Validity) at California State University, San Francisco. I have been a computer programmer for 29 years. I taught computer science for U.C. Berkeley Extension. In other words, I am an expert in mathematics, statistics, scientific method, measurement science (including modeling), and computer science.

Modeling is really a very simple process, when the modeler is not trying to make it mysterious. A scientific principle is expressed in the form of a mathematical formula. Then data are substituted for the variables in the formula, allowing a result to be computed (e.g. emissions of carbon monoxide (CO) computed from vehicle type, speed, temperature, etc.). When the formula is in dispute, statistics must be used to determine whether it does what its users want it to do. The relevant factors are reliability (giving repeatable results) and validity (measuring what it is supposed to be measuring). Both are measured using correlations, and even high values qualify the model for use only in situations similar to those in which it was validated, if at all. For example, an intelligence test that was validated only on white, middle class Americans could be expected to give meaningful results only when used with such subjects.
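A minimal sketch of how reliability and validity are expressed as correlations. The numbers below are hypothetical, invented purely for illustration, and the `pearson_r` helper is written out here rather than taken from any particular statistics package; real validation studies use far larger samples.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: five subjects measured twice with the same instrument,
# plus a "reference" value from a more accurate method ("the real thing").
first_run = [101.0, 152.0, 130.0, 175.0, 120.0]
second_run = [102.0, 150.0, 131.0, 176.0, 119.0]
reference = [100.0, 150.0, 128.0, 174.0, 121.0]

reliability = pearson_r(first_run, second_run)  # repeatability of the instrument
validity = pearson_r(first_run, reference)      # agreement with the reference
print(f"reliability r = {reliability:.3f}, validity r = {validity:.3f}")
```

Note that a correlation only shows that two sets of numbers rise and fall together: an instrument whose readings were all 10% too high would still correlate perfectly with the reference, so agreement in absolute terms has to be checked separately.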

If every measure must be validated by comparing it with "the real thing", one might ask why measures and models are used at all -- why not just use the "real thing"? The answer is simply practicality: the test is relatively quick and easy to administer, whereas rigorous scientific research is very slow and expensive. A yardstick is available in any hardware store, but a highly accurate scientific instrument is unwieldy and extremely expensive. However, one must never forget that the reliability and validity of the measure or model are strictly limited; a judgment of reliability and validity doesn't confer any magical ability to predict accurately in all situations, nor any special consonance of the formula used with the forces that guide the universe!

Even in physics, the "hardest" of the sciences, reliability and validity are limited. Newtonian physics may be adequate to predict events on Earth, but it fails for objects moving near the speed of light, in very strong gravitational fields, and inside the nucleus of the atom. There, the more accurate formulas of Einstein's relativity and of quantum mechanics must be used. When we come to predicting human behavior, both reliability and validity tend to be so low that accurate predictions are impossible. For a yes-or-no question (will this project improve air quality or worsen it?), the probability that your conclusion is correct would be very close to 0.5, no better than flipping a coin. It could not be relied upon. And where the stakes are as high as they are with highway construction (air pollution and massive environmental destruction on one hand, loss of millions of dollars of federal and state subsidies on the other), transportation models are far too unreliable a tool.
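The point that low validity leaves you near a coin flip can be made concrete. Assume, purely for illustration, that the predicted change in emissions and the actual change are jointly normal with zero means and correlated at level r (the validity coefficient). The probability that the prediction gets the direction of the change right is then 1/2 + arcsin(r)/π (Sheppard's formula). The sketch below computes that value and checks it by simulation; the function names are hypothetical.

```python
import math
import random

def p_sign_correct(r):
    """P(sign of prediction == sign of outcome) for a bivariate normal pair
    with zero means and correlation r (Sheppard's formula)."""
    return 0.5 + math.asin(r) / math.pi

def simulate(r, n=100_000, seed=1):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        actual = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        # Unit-variance prediction whose correlation with `actual` is exactly r.
        predicted = r * actual + math.sqrt(1.0 - r * r) * noise
        if (predicted > 0) == (actual > 0):
            hits += 1
    return hits / n

for r in (0.0, 0.1, 0.3, 0.5, 0.9):
    print(f"validity r = {r:.1f}: exact {p_sign_correct(r):.3f}, "
          f"simulated {simulate(r):.3f}")
```

Under these assumptions a validity of 0.3 gets the direction right only about 60% of the time, and even r = 0.5 gives only about 67%.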

Take, as an example, a bathroom scale. It is "calibrated" by turning a thumbscrew until it reads zero when no weight is on it. This only guarantees an accurate reading at one point: zero. It does not guarantee that any other reading of the scale is correct. Bathroom scales are generally fairly "reliable". In other words, if you weigh yourself several times, and if others weigh you using that scale, the readings will all be very close, if not identical (assuming, of course, that you don't snack or go to the bathroom in between weighings!). The scale is also fairly "valid": the readings are close to those that our most accurate scientific scales would give. This is because the internal mechanism of the scale responds in a fairly linear fashion to weight, and because the dial has been designed ("calibrated") in such a way that a 100 pound weight causes it to read "100". However, it is not perfect. Jumping up and down on it might affect its future accuracy, as might metal fatigue. It is also important to use the tool properly. It is not designed to accurately weigh either bacteria or sumo wrestlers; it is reliable and valid only within a certain range of uses.
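A small sketch of why zeroing calibrates only one point, assuming a hypothetical scale whose dial responds linearly, reading = gain × weight + offset. The thumbscrew, modeled here by the made-up `zero_adjust` function, changes only the offset; a gain error (here an assumed 3%) survives untouched.

```python
def reading(true_weight, gain, offset):
    """Hypothetical linear bathroom scale: dial = gain * weight + offset."""
    return gain * true_weight + offset

def zero_adjust(gain, offset):
    """Model the thumbscrew: change only the offset so that an empty scale
    reads exactly 0. The gain (sensitivity) is left as it was."""
    return gain, offset - reading(0.0, gain, offset)

# Assumed, purely illustrative factory defects: 3% too sensitive, 2 lb offset.
gain, offset = zero_adjust(1.03, 2.0)

for w in (0.0, 100.0, 200.0):
    print(f"true {w:5.1f} lb -> dial {reading(w, gain, offset):6.1f} lb")
```

The empty scale now reads zero exactly, yet a 100 pound weight reads 103 and a 200 pound weight reads 206. Only a check against a second known weight would reveal the error.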

It is extremely important that the reliability and validity of the instrument actually be measured, within its intended domain of use (e.g., in this case, with humans between, say, 50 and 250 pounds), and that the results be publicly available: without these two numbers (usually given in terms of a "correlation coefficient"), the manufacturer has no right to advertise and sell the scale as a measuring instrument! (Some manufacturers circumvent such requirements by advertising their device as a toy, conversation piece, decoration, etc., instead of a measuring instrument.) In commercial use, such instruments must pass strict regulations. For example, a grocer can be jailed for deliberately using an inaccurate scale for weighing his goods.

In the case of instruments that purport to predict the future, similar criteria obtain: their results must be reliable (achieve very similar results when repeated by the same or different measurers) and valid (correlate highly with "reality" -- that is, with actual values of the variables that are being predicted). For example, independent parties should be able to obtain highly correlated (close) values when using a model to predict future pollutant emissions from a highway project, and these predictions must correlate highly with actual emissions in test (validation) situations. In addition, a model developed and validated in Los Angeles might not be valid for use in another part of the country, such as Berkeley (due, perhaps, to different attitudes toward the automobile?).

MTC (the Metropolitan Transportation Commission, of the San Francisco Bay Area) and its consultants violate all of the rules of measurement science in attempting to predict emissions changes due to highway expansion. The documentation for both MTCFCAST and STEP (MTC's computer models for predicting the impacts of transportation projects) reveals no evidence that reliability was ever measured. It makes vague, non-quantified claims of validity, but shows no evidence that the concept of validity was even understood. After running the model, the results were compared with one set of data. Then, in a process they called "calibration", they modified the coefficients of the formula to make the model conform to that set of data. They imply that this process makes the model valid. Actually, all that it does is make the formula "predict" one set of data. If it were applied to another set of data, or used to "predict" a different factor (even from a similar set of data), there is absolutely no guarantee that it would continue to predict accurately. In other words, this process does not result in a valid model. It merely conforms the model to the data. As an analogy, it is as if MTC had a ruler made of putty, and stretched or shrank it in different situations to make it register in a way convenient to the situation. If the formula is wrong, changing its coefficients won't help. An entirely different formula may be required!
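The putty-ruler problem can be illustrated with a toy example. Everything here is hypothetical: the "true" relationship `truth(x)` is an invented curve, the model formula is a straight line, and ordinary least squares stands in for "calibration" (the actual MTC procedure is not reproduced). The line is fitted to one set of conditions and then applied to another.

```python
def fit_line(xs, ys):
    """Ordinary least-squares 'calibration' of the formula y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

def max_abs_error(a, b, xs, ys):
    """Largest gap between the calibrated line and the observed values."""
    return max(abs((a + b * x) - y) for x, y in zip(xs, ys))

def truth(x):
    # Hypothetical "real world", unknown to the modeler: curved, not straight.
    return 0.05 * x * x

set_a = [1.0, 2.0, 3.0, 4.0, 5.0]       # conditions used for calibration
set_b = [15.0, 16.0, 17.0, 18.0, 19.0]  # a new situation
ya = [truth(x) for x in set_a]
yb = [truth(x) for x in set_b]

a, b = fit_line(set_a, ya)
print("calibrated on A, error on A:", round(max_abs_error(a, b, set_a, ya), 2))
print("calibrated on A, error on B:", round(max_abs_error(a, b, set_b, yb), 2))

# "Stretch the putty": recalibrate on B, and the error moves back to A.
a2, b2 = fit_line(set_b, yb)
print("calibrated on B, error on B:", round(max_abs_error(a2, b2, set_b, yb), 2))
print("calibrated on B, error on A:", round(max_abs_error(a2, b2, set_a, ya), 2))
```

Calibrated on the first set, the line misses those points by at most 0.1 but misses the new situation by more than 12. Recalibrating on the new set simply moves the large error back to the old one; no choice of coefficients fixes a formula of the wrong shape.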

All of the models use a standard formula that they call a "logit model equation". An example is

$$P(q, i\text{-}j) = \frac{\exp(U_j)}{\sum_{k=1}^{J} \exp(U_k)}$$

where U_j is the "utility" the model assigns to alternative j, and the sum in the denominator runs over all J alternatives, so that the probabilities add up to 1. Here exp(x) means e raised to the power x, where e is the base of the natural logarithms. Out of the billions of possible formulas that could be used in the model, there is absolutely nothing special about this one that qualifies it to be used in transportation modeling! The probability that it is the best formula to use is practically zero. The fact that it has been used by others has nothing whatever to do with whether it is valid.
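For readers who want to see what the logit formula actually computes, here is a minimal Python version. The utilities are hypothetical numbers for three made-up alternatives; in a real model each U would itself be computed from other variables using calibrated coefficients. Subtracting the largest utility before exponentiating is a standard numerical safeguard and does not change the result, because the common factor cancels between numerator and denominator.

```python
import math

def logit_shares(utilities):
    """Logit formula: P_j = exp(U_j) / sum over all k of exp(U_k)."""
    m = max(utilities)  # shift by the maximum to avoid overflow in exp()
    weights = [math.exp(u - m) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical utilities for three made-up alternatives.
shares = logit_shares([-1.0, -2.0, -3.0])
print([round(p, 3) for p in shares], "sum =", round(sum(shares), 6))
```

The code merely evaluates the formula. It says nothing about whether the formula describes how people actually choose, which is the question of validity.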

In short, it is extremely unlikely that MTC's models have sufficient validity even to predict in situations similar to ones used in the past. And even if they had some reliability and validity in such situations, the probability that the models would continue to work in new situations (e.g. the collapse of a segment of a freeway, or the expansion of a freeway not studied before) is vanishingly small. Stated more simply, "Garbage In -- Garbage Out" (GIGO). Saying that they are "state of the art" with regard to transportation and air quality modeling merely compares garbage with garbage. Transportation modelers are not known for their impartiality, nor for their sophistication with regard to statistics or measurement science.

So how, then, are we to predict the effects of freeway expansion, if all current models are worthless? We have to fall back on basic scientific research, which is not easy or cheap, but which is the only way we have of reliably answering such questions. Agencies or researchers that receive funding for supporting the building of freeways have little motivation to develop accurate models, when they have models that give them the conclusions that they wish (basically, an extension of the status quo). They have even less interest in funding or conducting honest, unbiased research on this question. It can only be accomplished by scientists who have not been "bought" by highway interests.

On the other hand, why do any research? Isn't it obvious that expanding highways can only encourage people to drive more, and hence worsen air quality?! And if, to counter the greenhouse effect, we must cut traffic to half of current levels, isn't it obvious that we won't need all that extra pavement?!

Common sense leads one to ignore lies told by computers, just as we learned in the past to ignore the lies told by statistics, politicians, and snake oil salesmen.

Unfortunately, some decision makers still take these models seriously, and some of their constituents are taken in by them. The Bay Area Air Quality Management District, in the Draft Environmental Impact Report for its proposed State Clean Air Plan, called the MTC modeling process "beyond state of the art"! Judge Thelton Henderson of the U.S. District Court for the Northern District of California rebuked MTC for its earlier procedure for assessing whether highway projects conform to the federal Clean Air Act, in which they simply rated the projects as "Beneficial", "Neutral", or "Potentially Detrimental", and he put all highway expansion in the Bay Area on hold with an injunction. However, he accepted their new procedures (apparently because they used a computer and some formulas in common use among planners) in spite of the fact that the models have never been validated! In other words, Judge Henderson has nullified the conformity assessment requirements of the Clean Air Act: if you want to build a new freeway, all you have to do is run some numbers through a computer model that "proves" it will improve air quality. If the computer says highway expansion is good for the air, then, by golly, it must be true!

Because this was a federal suit (Sierra Club v. MTC), Henderson's decision, though not formally binding on other courts, can serve as a model for the whole United States. MTC's consultant, Greig Harvey, is now in great demand around the country, where other pro-highway planners and agencies will have to use similar procedures to prove that their highway projects will help clean up the air. Those of us who don't want to see our neighborhoods and the rest of the world turned into another Los Angeles would do well to arm ourselves with some knowledge of mathematics, computer programming, and measurement science. Learn to recognize snake oil! And let's use our freedom of speech to demand honesty (and clean air) from our government.