Friday, December 03, 2010

My comments on Alcock's comments on Bem's precognition article

The Skeptical Inquirer (online version) has published a commentary on Daryl Bem's precognition research, written by dedicated skeptic James Alcock. The same story has also bubbled up to the attention of the mainstream media, including the New York Times and NPR. Some of the op-eds border on the hysterical. Others are more rational.

Alcock's article is an example of the hysterical type. I will not address his critique of Bem's procedure (Bem does that calmly and effectively, hurting Alcock's feelings in the process), but I will comment on his preamble, which I reproduce here in indented blue text.
Parapsychology has long struggled unsuccessfully for acceptance in the halls of science. Could this article be the breakthrough? After all, it apparently provides evidence compelling enough to persuade the editors of that APA journal of its worthiness. However, this is hardly the first time that there has been media excitement about new "scientific" evidence of the paranormal. Over the past 80-odd years, this drama has played out a number of times, and each time, parapsychologists ultimately failed to persuade the scientific world that their phenomena actually exist. Recalling George Santayana’s now-clichéd dictum, "Those who cannot remember the past are condemned to repeat it,” we should approach Bem’s work with a historical framework to guide us. Consider the following:
DR: One reason for this failure is that myths about psi research are incessantly repeated by skeptics whose Bayesian priors are so close to zero that it is virtually impossible for any new evidence to sway their original beliefs. The strategy of repeating the same talking points ad nauseam is effective because most listeners eventually absorb those words as though they are true. A genuine skeptic would wonder if the critique offered by Alcock is backed up by solid facts. After doing some homework, he or she would eventually discover that most of it isn't.
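To make the point about priors concrete, here is a minimal sketch of a Bayesian update in odds form. The Bayes factor of 1000 and the three priors are hypothetical values chosen only for illustration; they are not estimates taken from any psi study or from any particular skeptic.

```python
def posterior(prior: float, bayes_factor: float) -> float:
    """Posterior probability of a hypothesis after one Bayesian update.

    prior        -- probability assigned to the hypothesis before the evidence
    bayes_factor -- how much more likely the evidence is if the hypothesis
                    is true than if it is false
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical strength of evidence, identical for every reader.
BAYES_FACTOR = 1000.0

# Hypothetical priors: open-minded, doubtful, and "virtually zero".
for prior in (0.5, 1e-3, 1e-12):
    print(f"prior = {prior:g}  ->  posterior = {posterior(prior, BAYES_FACTOR):.3g}")
```

The same evidence nearly convinces the first reader, leaves the second undecided, and barely moves the third, which is the sense in which a prior close to zero makes new data almost irrelevant.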
  1. In 1934, Joseph Banks Rhine published Extra-sensory perception (Rhine & McDougall, 1934/2003), summarizing his careful efforts to bring parapsychology into the laboratory through application of modern psychological methodology and statistical analysis. Based on a long series of card-guessing experiments, he wrote: "It is independently established on the basis of this work alone that Extra-Sensory Perception is an actual and demonstrable occurrence.” (p. 210).

Elsewhere, he wrote: "We have then, for physical science, a challenging need for the discovery of the energy mode involved. Some type of energy is inferable and none is known to be acceptable….” (p.166)

Despite Rhine’s confidence that he had established the reality of extrasensory perception, he had not done so. Methodological problems eventually came to light, and as a result, parapsychologists no longer run card-guessing studies, and rarely even refer to Rhine’s work.

Is this conclusion sound? No. In Rhine's 1940 book, Extra-Sensory Perception After 60 Years, which refers to the period 1880 to 1940 (the authors were Pratt, Rhine, Smith and Stuart), Rhine et al. discussed in great detail every critique their work had received and how potential design and analytical loopholes were addressed in subsequent experiments. They also listed all known replications of their card-guessing method. This volume makes it clear that Rhine and his colleagues were as methodologically sophisticated and as hard-nosed as their harshest critics, and that their data, when viewed under the most critical light available at the time, withstood those critiques.

One key reason why Rhine's work failed to sustain the initial excitement it had generated in the 1930s was the rise of behaviorism in academic psychology. Within that paradigm, not only was ESP considered to be flatly impossible, but any form of subjective experience, including conscious awareness, became a forbidden topic. The reason few researchers today use ESP cards is not that the method was flawed, but that better methods were developed. As in any other area of research, methods and ideas naturally evolve and build upon the work of previous generations.

  2. Physicist Helmut Schmidt conducted numerous studies throughout the 1970s and 1980s that putatively demonstrated that humans (and animals) could paranormally influence and/or predict the output of random event generators. Some of his claims were truly extraordinary – for example, that a cat in a cold garden shed, heated only by a lamp controlled by a random event generator, was able through psychokinetic manipulation of the random event generator to turn the lamp on more often than would be expected by chance. His claim to have put psi on a solid scientific footing garnered considerable attention, and his published research reported very impressive p-values. In my own extensive review of his work (Alcock, 1988), I concluded that Schmidt had indeed accumulated impressive evidence that something other than chance was involved. However, I found serious methodological errors throughout his work that rendered his conclusions untenable, and the “something other than chance” was attributable to methodological flaws.

As with Rhine, excitement about Schmidt's research gradually dwindled to the point that his work became virtually irrelevant, even within parapsychology itself.

Accusations of "serious methodological errors" provide an easy justification for dismissing this remarkable body of work. But is it really true that such errors made Schmidt's work irrelevant, or that other researchers did not follow up his work? No. Hundreds of studies involving random number generators were published after Schmidt's studies, and meta-analyses of those experiments have been published and debated in mainstream physics and psychology journals. Schmidt's work inspired dozens of researchers to replicate and extend his work, and it continues to do so today in research programs like the Global Consciousness Project. His work has also spawned several psi-related patents.

  3. The 1970s gave rise to "remote viewing,” a procedure through which an individual seated in a laboratory could supposedly receive psychic impressions of a remote location being visited by someone else. Physicists Russell Targ and Harold Puthoff claimed that their series of remote viewing studies demonstrated the reality of psi. This attracted huge media attention, and their dramatic findings (Targ & Puthoff, 1974) were published in Nature, one of the world's top scientific journals. At first, their methodology seemed unassailable, but years later, when more detailed information became available, it became obvious that there were fundamental flaws in their procedure that could readily account for their sensational findings. When other researchers repeated their procedure with the flaws intact, significant results were obtained; with flaws removed, outcomes were not significant (Marks & Kamman, 1978; 1980).

Add Targ and Puthoff to the list of “breakthrough” researchers whose work is now all but forgotten.

Did supposed flaws adequately account for the results of remote viewing studies? No. Were those study designs abandoned? No. Did skeptics like Ray Hyman, who reviewed a small subset of the SRI/SAIC remote viewing studies for the CIA, conclude that the studies were flawed? No. Did this research paradigm, which was an updated version of picture-drawing techniques developed a half-century earlier, disappear? No.

Targ and Puthoff, and later Ed May and colleagues, not only continued to conduct substantial research on remote viewing; the technique also proved so useful for gathering information in a unique way that it was ultimately used for thousands of operational missions by the DoD. Some portions of the history of the formerly secret Stargate program (and other projects with different code names) are in the public domain now, so it is not necessary to go into that here. Suffice it to say that those research programs were very carefully monitored by skeptical scientific oversight committees, which continued to recommend funding for over two decades (as long as the program remained secret).

  4. In 1979, Robert Jahn, then Dean of Engineering and Applied Science at Princeton University, established the Princeton Engineering Anomalies Research unit to study putative paranormal phenomena such as psychokinesis. Like Schmidt, he was particularly interested in the possibility that people can predict and/or influence purely random subatomic processes. Given his superb academic and scientific credentials, his claims of success drew particular attention within the scientific community. When his laboratory closed in 1970, Jahn concluded that: “Over the laboratory's 28-year history, thousands of such experiments, involving many millions of trials, were performed by several hundred operators. The observed effects were usually quite small, of the order of a few parts in ten thousand on average, but they compounded to highly significant statistical deviations from chance expectations.”

However, parapsychologists themselves were amongst the most severe critics of his work, and their criticisms were in line with my own (Alcock, 1988). More important, several replication attempts have been unsuccessful (e.g., Jeffers, 2003), including a large-scale international effort led by Jahn himself (Jahn et al, 2000).

One more name for the failed-breakthrough list.

Other than the flub about a lab closing in 1970 that opened in 1979, again we see a cavalier dismissal of decades of research, implying that the work was systematically sloppy or methodologically naive or both. Nothing could be further from the truth. I was at Princeton for three years and spent enough time in the PEAR Lab to know that the research conducted there was as rigorously vetted and executed as any scientific project you will find anywhere. There weren't just "several replications." There were hundreds. The PEAR Lab's RNG research was a replication and extension of Helmut Schmidt's studies, and their remote perception research was a replication and extension of the SRI/SAIC remote viewing research. PEAR successfully and independently replicated both of those study designs. Even Jeffers, whom Alcock cites to suggest that the PEAR RNG work could not be replicated, was later involved in a successful RNG experiment.
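For readers wondering how an effect of "a few parts in ten thousand" can compound into a highly significant deviation, here is a minimal sketch of the arithmetic. It assumes independent binary trials with a chance hit rate of 0.5 and a hypothetical observed hit rate of 0.5002; these numbers are illustrative only and are not PEAR's data.

```python
import math


def z_score(hit_rate: float, n_trials: int, chance: float = 0.5) -> float:
    """Standard z-score for an observed hit rate over n independent binary trials."""
    standard_error = math.sqrt(chance * (1.0 - chance) / n_trials)
    return (hit_rate - chance) / standard_error


# Hypothetical effect: two extra hits per ten thousand trials.
OBSERVED_RATE = 0.5002

# The same tiny effect, measured over ever larger numbers of trials.
for n_trials in (10**4, 10**6, 10**8):
    print(f"{n_trials:>11,d} trials  ->  z = {z_score(OBSERVED_RATE, n_trials):.2f}")
```

Because the standard error shrinks with the square root of the number of trials, the z-score grows by a factor of ten for every hundredfold increase in trials, even though the effect itself stays the same size.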

Do parapsychologists criticize each other's work? Of course they do. As in any scientific discipline, those who know the most are also the most qualified to provide critiques. This is healthy for advancing any field and for refining methods and interpretations, and such debates can be found in all areas of science and scholarship. It does not mean that colleagues are suggesting a wholesale dismissal of the evidence, as devout Skeptics are wont to do.

  5. In the 1970s, the Ganzfeld, a concept borrowed from contemporaneous psychological research into the effects of sensory deprivation, was brought into parapsychological research. Parapsychologists reasoned that psi influences may be so subtle that they are normally drowned out by information carried through normal sensory channels. Perhaps if a participant were in a situation relatively free of normal stimulation, then extrasensory information would have a better opportunity to be recognized. The late Charles Honorton carried out a large number of Ganzfeld studies, and claimed that his meta-analysis of such work substantiated the reality of psi. Hyman (1985) carried out a parallel meta-analysis which contradicted that conclusion. Hyman and Honorton (1986) subsequently published a "Joint Communiqué" in which they agreed that the Ganzfeld results were not likely to be due to chance, but that replication involving more rigorous standards was essential before final conclusions could be drawn.

Daryl Bem subsequently published an overview of Ganzfeld research in the prestigious Psychological Bulletin (Bem & Honorton, 1994), claiming that the accumulated data were clear evidence of the reality of paranormal phenomena. That effort failed to convince, in part because a number of meta-analyses have been carried out since, with contradictory results (e.g., Bem, Palmer & Broughton, 2001, Milton & Wiseman, 1999). Recently, the issue was raised again in the pages of Psychological Bulletin, with papers from Storm et al (2010) and Hyman (2010). While the former argued that their meta-analyses demonstrate paranormal influences, Hyman pointed to serious shortcomings in their analysis, and reminded us that the Ganzfeld procedure has failed to yield data that are capable of being replicated by neutral scientists.

Because of the lack of clear and replicable evidence, the Ganzfeld procedure has not lived up to the promise of providing the long-sought breakthrough that would lead to acceptance by mainstream science.

Add Honorton (and Bem first-time-around) to the list.

Yes, Honorton and Hyman agreed that the results available in 1986 were not due to chance, and that confirmations with new data would be required to be persuasive. Honorton subsequently provided this successful prospective replication, and Bem and Honorton published it in 1994. That should be the end of the skeptical story.

But now we learn that the successful replication was in fact not persuasive because of publications that appeared years later and presented "contradictory" results. Besides the retrocausal reason for dismissing a successful replication, is it really true that the meta-analyses were contradictory, or that avowed skeptics cannot successfully replicate the effect? No. Neither claim is true. The Milton & Wiseman (1999) analysis was flawed because it used unweighted statistics. When proper methods, based on a simple hit/miss count, are employed, that meta-analysis produces a statistically significant positive outcome. In fact, of the half-dozen meta-analyses of the ganzfeld database published to date, every single one is significantly positive (this is discussed in the online journal NeuroQuantology in an article by Tressoldi, Storm and Radin [Dec 2010, Vol 8 (4)]). So rather than being contradictory, the existing ganzfeld database is actually completely consistent. In addition, skeptics have successfully repeated the ganzfeld experiment (it's not obvious from that article's abstract, but it is described in the paper itself).
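The difference between an unweighted combination and a simple hit/miss count can be sketched as follows. This is an illustration under stated assumptions, not a reanalysis: the (hits, trials) pairs are invented, the chance hit rate of 0.25 assumes one target among four choices, and "unweighted" is taken to mean a Stouffer-style combination in which each study's z-score counts equally regardless of how many trials the study ran.

```python
import math

CHANCE = 0.25  # assumed chance hit rate (one target among four choices)

# Hypothetical (hits, trials) pairs, invented for illustration only.
studies = [(2, 10), (2, 10), (3, 10), (2, 10), (60, 160)]


def study_z(hits: int, trials: int, p: float = CHANCE) -> float:
    """Normal-approximation z-score for a single study."""
    return (hits - trials * p) / math.sqrt(trials * p * (1.0 - p))


def binomial_tail(hits: int, trials: int, p: float = CHANCE) -> float:
    """Exact one-tailed probability of at least `hits` successes by chance."""
    return sum(
        math.comb(trials, k) * p**k * (1.0 - p) ** (trials - k)
        for k in range(hits, trials + 1)
    )


# Unweighted combination: every study counts equally, whatever its size.
unweighted_z = sum(study_z(h, n) for h, n in studies) / math.sqrt(len(studies))
unweighted_p = 0.5 * math.erfc(unweighted_z / math.sqrt(2.0))

# Hit/miss count: pool all trials, so larger studies carry more weight.
total_hits = sum(h for h, _ in studies)
total_trials = sum(n for _, n in studies)
pooled_p = binomial_tail(total_hits, total_trials)

print(f"unweighted: z = {unweighted_z:.2f}, one-tailed p = {unweighted_p:.3f}")
print(f"pooled:     {total_hits}/{total_trials} hits, one-tailed p = {pooled_p:.4f}")
```

With these made-up numbers, the unweighted combination falls short of the conventional one-tailed 0.05 threshold while the pooled hit/miss count clears it comfortably, because the four ten-trial studies dilute the one large study when all five are given equal weight.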

What is the lesson from this history? It is that one should give pause when presented with new claims of impressive evidence for psi. Early excitement is often misleading, and as Ray Hyman has pointed out, it often takes up to 10 years before the shortcomings of a new approach in parapsychological research become evident.

In other words, Alcock suggests that we don't need to pay attention to new experimental data because it will probably, eventually, be shown to be flawed in some way. If promissory dismissals were regularly applied to any other area of science, everything would come to a grinding halt. No new findings would ever appear in any domain, because research methods are evolving and today's data and analyses are never going to be as good as tomorrow's.

One must also keep in mind that even the best statistical evidence cannot speak to the causes of observed statistical departures. Statistical deviations do not favour arbitrary pet hypotheses, and statistical evidence cited in support of psi could as easily support other hypotheses as well. For example, if one conducted a parapsychological experiment while praying for above-chance scoring, statistically significant outcomes could be taken as evidence for the power of prayer just as readily as for the existence of psi.

Yes, there can be many interpretations of experimental results. It is the investigators' job to devise methods that distinguish as clearly as possible between possible explanations. In any case, it is not necessary to have an explanation for observed results. If it were necessary, science would never have advanced. The moment theory is allowed to trump observations, science will collapse into a dogmatic religion.

Another key consideration: Parapsychology’s putative phenomena are all negatively defined – to claim that psi has been detected, all possible normal influences must be ruled out. However, one can never be certain that all normal influences have been eliminated; the reader of a research report has only the experimenter’s word for it.

This brings us to a related concern. Research reports involve an implicit social contract between experimenter and audience. The reader can only evaluate what has been put into print, and must presume that the researcher has followed the best practices of good research. We assume that the participants did actually participate and that they were not allowed to use their cellular telephones during the experiment, or to chat with other participants. We assume that they were effectively shielded from cues that might have inappropriately influenced their responses. We assume that the data were as reported, that none were thrown out because they did not suit the experimenter, and that they were analyzed appropriately and in the manner indicated. We assume that equipment functioned as described, and that precautions reported in the experimental procedure were carefully followed. We take for granted that the researcher set out to test particular hypotheses, and did not choose the hypotheses after looking at the data. We must take all this on faith, for otherwise, any research publication might simply be approached as a blend of fact, fantasy, skill, and error, possibly reflecting little other than the predilections of the researcher. Obvious methodological or analytical sloppiness indicates that the implicit social contract has been violated and that we can no longer have confidence that the researcher followed best practices and minimized personal bias. As Gardner (1977) wrote, when one finds that the chemist began with dirty test tubes, one can have no confidence in the chemist's findings, and one must wonder about other, as yet undetected, contamination. So, when considering the present research, we need not only to look at the data, but, following the metaphor, we need to assess whether Bem used clean test tubes.

This implies that it is possible to devise a perfect experiment. The clean test tube metaphor is a nice ideal, but in the real world there is no such thing. Nevertheless, the ideal always gives the critic a convenient reason to dismiss any result that he or she prefers to disbelieve. If no actual flaw can be found, the critic can always raise suspicions, or propose implausible scenarios that never actually occurred, or, if all that fails, default to the catch-all criticism, "let's wait 10 years before we accept this result because by then someone will surely find something amiss."

In sum, Alcock has been an enthusiastic and effective defender of the skeptical faith for decades. His critiques are predictable and at first blush they may even seem reasonable, especially to those who aren't familiar with the research literature in question. But when you do know that literature, the arguments fall apart.

For a recent book that goes into the proponent/skeptic debate in some detail, including more of Alcock's critiques and my responses, I recommend this book: