This is a valid critique. Blind judging is preferred to avoid the possibility of such bias, and readers of the journal article would not have any way of judging whether the proposed biases actually occurred.
Bem provided this response to Wiseman (slightly edited by me for clarity):
We see here one of the strengths of science in action. Sometimes a potential flaw turns out to be a show-stopper. Sometimes it doesn't. In this case, it doesn't.
This is a response to Richard Wiseman's concern that the experimenter, while correcting misspelled words, can observe which corrections will help the psi hypothesis (because the misspelled word is a practice word) and which will work against it. This is a legitimate concern, and I will modify the database so that the category information is not available to the experimenter when he or she makes spelling corrections.
The program that runs the experiment automatically calculates the results of the session, ignoring all words it doesn’t recognize as literal copies of the test words. This analysis is also transferred to the database, which is set up so that the experimenter cannot change it or any of the original words as typed by the participant. Any changes made by the experimenter in the database are explicitly shown as changes, and a security check flags records in which the experimenter has corrected any of the original words. In other words, there is a complete record of the original data that cannot be altered. As an additional check, the critical data appear in the output file in both unencrypted and encrypted form, and only I know the encryption formula. If anything is changed in the output, the security flag in the database will read “False.”
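The "unencrypted plus encrypted" safeguard described above can be sketched as a tamper-evidence check. This is only an illustration, not Bem's actual scheme (his encryption formula is undisclosed): here an HMAC tag over the session record stands in for the encrypted copy, and the field names are hypothetical.

```python
# Tamper-evidence sketch: an HMAC tag stands in for the undisclosed
# "encryption formula"; any edit to the record flips the flag to False.
import hmac
import hashlib
import json

SECRET_KEY = b"known-only-to-the-author"  # hypothetical stand-in

def seal(record: dict) -> str:
    """Return an authentication tag computed over the original session data."""
    payload = json.dumps(record, sort_keys=True).encode()
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()

def security_flag(record: dict, tag: str) -> bool:
    """True if the record still matches its tag; False if anything was altered."""
    return hmac.compare_digest(seal(record), tag)

session = {"participant": 17, "words_as_typed": ["apple", "potatoe", "rabbi"]}
tag = seal(session)
assert security_flag(session, tag) is True   # untouched record passes
session["words_as_typed"][1] = "potato"      # someone edits a typed word
assert security_flag(session, tag) is False  # the flag now reads "False"
```

Because the tag depends on every byte of the record, even a one-letter spelling correction made directly in the stored data is detectable.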
Any experimenter who wishes can simply ignore the option to correct misspellings. It will make little difference to the results, as the following shows.
My two experiments included 150 participants, who recalled a total of 2,920 words, of which 45 (1.5%) were misspelled. Of those 45, 23 were practice words and 22 were non-practice control words, for a net "gain" of one word for the psi hypothesis. Here are the results reported in my article (in which I corrected misspelled words) compared with the original program-calculated results (which ignore all unrecognized words). The score is a Differential Recall% (DR%) score, which can range from -100% to +100%, with scores > 0 being in the "psi-predicted" direction.
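The program-side scoring described above can be sketched as follows. Only exact copies of the list words are counted; misspellings and intrusions simply fall through. The DR% formula used here is the simple unweighted form (P − C)/(P + C) × 100, which is an assumption on my part — the published score may be computed differently — but it has the stated -100% to +100% range with positive scores in the psi-predicted direction.

```python
# Scoring sketch: count only literal copies of the test words, ignore the
# rest, and compute an (assumed) unweighted DR% = (P - C)/(P + C) * 100.
def dr_percent(typed, practice_words, control_words):
    practice = set(practice_words)
    control = set(control_words)
    p = sum(1 for w in typed if w in practice)  # recalled practice words
    c = sum(1 for w in typed if w in control)   # recalled control words
    # Misspellings ("potatoe") and intrusions ("zebra") match neither set
    # and are ignored, exactly as the program-calculated results do.
    if p + c == 0:
        return 0.0
    return 100.0 * (p - c) / (p + c)

typed = ["apple", "gorilla", "potatoe", "zebra"]  # "potatoe" is not counted
score = dr_percent(typed, ["apple", "gorilla"], ["potato", "rabbi"])
```

With these hypothetical lists, only "apple" and "gorilla" register as practice hits, so the score is 100.0; correcting "potatoe" to "potato" would add one control hit and pull the score down, which is the kind of small shift the comparison below quantifies.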
Experiment 8 (all participants):
Corrected DR% = 2.27%, t(99) = 1.91, p = .029, d = .19
Uncorrected DR% = 2.29%, t(99) = 1.95, p = .027, d = .20

Experiment 8 (stimulus seekers):
Corrected DR% = 6.46%, t(42) = 3.76, p = .0003, d = .57
Uncorrected DR% = 6.50%, t(42) = 3.91, p = .0002, d = .60

Experiment 9 (all participants):
Corrected DR% = 4.21%, t(49) = 2.96, p = .002, d = .42
Uncorrected DR% = 4.05%, t(49) = 2.86, p = .003, d = .40
As can be seen, Experiment 8 is trivially hurt by the corrections; Experiment 9 is trivially helped.
Additional observations: Half of the words used in this experiment are common words, as determined by the "Frequency Analysis of English Usage" by Francis and Kucera (e.g., apple, doctor), and half are uncommon (e.g., gorilla, rabbi). Although Richard uses "CTT" and "CAT" as examples to illustrate the ambiguity of correcting misspellings, in fact only a few different words were misspelled by anyone, and they are among the uncommon or commonly misspelled words in the list (e.g., potatoe for potato). So, Richard's hypothetical example notwithstanding, in practice the correction of misspelled words is straightforward and unambiguous. "Intrusions," i.e., words that aren't on the original list, are also very easy to spot. (I can furnish the list to whoever wants to try a blind correction exercise, but I don't want to publish it here lest it ruin it for future participants.)
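A blind correction pass of the kind just described can be sketched mechanically. Here `difflib` nearest-matching against a fixed word list stands in for the manual procedure (the abbreviated word list is hypothetical); anything with no close match on the list is flagged as an intrusion.

```python
# Blind-correction sketch: map each typed word to its nearest list word,
# or to None (an intrusion) when nothing on the list is close enough.
import difflib

WORD_LIST = ["apple", "doctor", "gorilla", "rabbi", "potato"]  # abbreviated, hypothetical

def correct(word, cutoff=0.8):
    """Return the list word closest to `word`, or None for an intrusion."""
    matches = difflib.get_close_matches(word.lower(), WORD_LIST, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(correct("potatoe"))  # -> potato (the one plausible correction)
print(correct("zebra"))    # -> None (an intrusion, easy to spot)
```

Note that the corrector never sees whether a word is a practice or a control word, which is precisely the blinding the modified database is meant to enforce.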