This material has been published in the Journal of Memory and Language, 47, 145-171, the only definitive repository of the content that has been certified and accepted after peer review. Copyright and all rights therein are retained by Academic Press. This material may not be copied or reposted without explicit permission.
Brett Kessler and Rebecca Treiman
Wayne State University
John Mullennix
University of Pittsburgh at Johnstown
Voice response time (RT) measurements from 4 large-scale studies of oral reading of English monosyllables were analyzed for evidence that voice key measurements are biased by the leading phonemes of the response. Words with different initial phonemes did have significantly different RTs. This effect persisted after contributions of 9 covariables, such as frequency, length, and spelling consistency, were factored out, as well as when variance associated with error rate was factored out. A breakdown by phoneme showed that voiceless, posterior, and obstruent consonants were detected later than others. The second phonemes of the words also had an effect on RT: Words with high or front vowels were detected later. Phoneme-based biases due to voice keys were large (range about 100 ms) and pervasive enough to cause concern in interpreting voice RT measurements. Techniques are discussed for minimizing the impact of these biases.
Much research in psychology uses measures of vocal response latencies to make inferences about underlying processes. In some experiments, the time that it takes to initiate pronunciation of a printed word is used to study the processes involved in reading and word recognition (e.g., Forster & Chambers, 1973). Other studies use the time to start repeating a spoken word to shed light on auditory word recognition (e.g., Connine, Mullennix, Shernoff, & Yelen, 1990). In still other experiments, spoken responses are used to examine higher levels of language comprehension (e.g., Stanovich & West, 1983). By having participants name pictures instead of printed words, researchers can address issues about how speakers retrieve and produce words (e.g., Griffin & Bock, 1998). Vocal response times are used in studies of memory as well (e.g., Scarborough, Cortese & Scarborough, 1977). The naturalness of spoken responses makes such tasks well suited to a variety of populations and a variety of issues.
By far the predominant technology used for determining voice response times, and the one used in the studies cited above, is the voice key. A voice key is a device that determines voice onset in real time. The voice key is connected to a microphone, which converts sound pressure (the physical correlate of the amplitude of the sound) into voltage. When a stimulus has been presented, a computer arms the voice key, which begins monitoring the microphone. When the sound pressure reaches a predefined target, the voice key is triggered. It notifies the computer, which stores the number of milliseconds that elapsed between arming and triggering the voice key. The typical voice key is triggered as soon as the sound pressure reaches a certain level, which the experimenter can predefine (e.g., Cedrus, 2000; Psychology Software Tools, 2000).
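The triggering rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `simple_voice_key_ms`, the digitized `samples` list, the 1 kHz sampling rate, and the threshold value are all hypothetical, and a real voice key operates on an analog voltage rather than on a stored list of samples.

```python
from typing import Optional, Sequence


def simple_voice_key_ms(samples: Sequence[float],
                        sample_rate_hz: int,
                        threshold: float) -> Optional[float]:
    """Return the time (ms after arming) of the first sample whose
    magnitude reaches the threshold, or None if it is never reached."""
    for index, sample in enumerate(samples):
        if abs(sample) >= threshold:
            return 1000.0 * index / sample_rate_hz
    return None


# Hypothetical signal at 1 kHz: 50 ms of silence, then a linear 20 ms rise to 1.0.
signal = [0.0] * 50 + [k / 20 for k in range(1, 21)]

# The response begins at 50 ms, but the key reports a later time because
# the amplitude must first climb to the threshold.
print(simple_voice_key_ms(signal, 1000, 0.5))
```

In this toy signal the measured latency depends on both the threshold and the rise time, which is the dependency at issue in the rest of the paper.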
Traditional voice keys are not the only ways of measuring voice response times. More sophisticated voice keys can be configured to be triggered by a lower level of sound pressure, provided it is maintained for a certain length of time (e.g., Hutzler, 1999; Rastle & Davis, 2000). Digital signal processing techniques can compute voice response time algorithmically, or produce waveforms for visual inspection (e.g., Bachoud-Lévi, Dupoux, Cohen, & Mehler, 1998; Fushimi, Ijuin, Patterson, & Tatsumi, 1999; Kawamoto, Kello, Jones, & Bame, 1998; Morrison & Ellis, 1995). Although some of our discussion will be relevant to these more sophisticated methods, this paper concentrates on voice keys, which are the source of timing data for the large majority of published studies that use measurements of vocal response times.
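The duration criterion used by the more sophisticated voice keys can be sketched as follows. The names and parameter values are hypothetical, and reporting the start of the sustained run (rather than the moment the criterion is satisfied) is an assumption of this sketch, not a description of any particular device.

```python
from typing import Optional, Sequence


def sustained_voice_key_ms(samples: Sequence[float],
                           sample_rate_hz: int,
                           threshold: float,
                           min_duration_ms: float) -> Optional[float]:
    """Return the start time (ms) of the first run of samples that stays
    at or above the threshold for at least min_duration_ms, else None."""
    needed = max(1, round(min_duration_ms * sample_rate_hz / 1000))
    run_start = None
    for index, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if run_start is None:
                run_start = index
            if index - run_start + 1 >= needed:
                return 1000.0 * run_start / sample_rate_hz
        else:
            run_start = None  # the run was interrupted; start over
    return None


# Hypothetical 1 kHz signal: a 3 ms click at 20 ms, then sustained speech from 50 ms.
noisy = [0.0] * 20 + [0.3] * 3 + [0.0] * 27 + [0.2] * 100

# With a 10 ms duration criterion the brief click is ignored.
print(sustained_voice_key_ms(noisy, 1000, 0.15, 10))
```

Because brief noises no longer trigger the key, the threshold can be set lower than a simple key would tolerate, so less of the response onset is missed.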
The general goal of this paper is to increase our understanding of the potential inaccuracy of voice keys. In particular, we focus on phonetic bias. Psychologists often wish to compare reaction times across stimuli that provoke different word or word-like responses. They are rarely directly interested in response time differences that are due entirely to phonetic differences between responses. Unfortunately, there are strong a priori reasons for believing that systematic phonetic biases exist. A response beginning with /s/, for example, might systematically be detected later, or sooner, than a word beginning with /m/, yielding different measured response times for no other reason than the phonetics of the words involved.
One obvious reason that phonemes might be detected at different times is articulatory: Some sounds take more time for the human vocal apparatus to initiate. Sakuma, Fushimi, and Tatsumi (1997) reported, for example, that /s/ can be initiated especially quickly. The second major factor is acoustic. Even after phonemes are initiated, it may take the voice key different amounts of time to detect them. The culprit in this case is the sound pressure cutoff that voice keys use. Setting the voice key too low (making it too sensitive) results in an unacceptable number of suspiciously fast response times as the voice key is triggered by nonspeech noises such as the participant's breathing. But any setting higher than zero means that part of the beginning of the vocal response will be missed, because all utterances have a nonzero rise time to target amplitude. This acoustic factor can result in phonetic bias, because different phonemes have different rise times. For example, fricatives such as /s/ have less sound pressure than vowels (Fry, 1979; Sacia & Beck, 1926). Because voice keys are triggered only after a certain sound pressure level is detected, we might expect that under some circumstances, at least, voice keys may take longer to register a fricative than to register a vowel, or, indeed, may not register the fricative at all (Sakuma et al., 1997; Rastle & Davis, 2000).
Although we focus here on phonetic biases, it is useful to note that this acoustic factor can result in other types of problems with voice keys. If for any reason one utterance is spoken at a faster tempo than another, then the faster utterance will more quickly reach the point in the speech stream where the target amplitude occurs. For example, if initial /s/ is routinely missed, then the faster utterance will more quickly get to the following phoneme, and therefore have the quicker measured response time, even if the veridical times to initiate the responses are identical. That could introduce much variation into the data. It could even result in experimental bias if factors that influence response time also influence tempo, as we have good reason to expect (Kawamoto et al., 1998). Such effects would, of course, interact with phonetic biases. If a voice key has special trouble detecting initial /s/, then artifactual effects of slow speech will affect /s/-initial words disproportionately. In addition, differences in overall amplitude of the utterance can have effects similar to those caused by differences in tempo. In a word like grow, the /ɹ/ is louder than the /g/, and the /o/ is louder still. If the utterance as a whole is very loud, a voice key might trigger at the /g/. If the utterance is less loud, so that all of the phonemes are proportionately softer, perhaps only the /ɹ/ or even the /o/ will have enough acoustic energy to trigger the voice key. And of course the tempo of the utterance will largely determine how long it takes to get to those later phonemes. Thus voice key measures can conflate veridical response time, tempo, and amplitude in a complex way that interacts with the differential acoustic natures of the phonemes in the response.
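The interaction of amplitude and tempo in the grow example can be made concrete with a deliberately crude model. Every number below is invented for illustration: each phoneme is treated as a segment of constant relative amplitude, `loudness` scales all amplitudes, and `tempo` divides all durations.

```python
from typing import List, Optional, Tuple

Segment = Tuple[str, float, float]  # (label, relative amplitude, duration in ms)

# Hypothetical values for "grow": /g/ is softest, /o/ is loudest.
GROW: List[Segment] = [("g", 0.3, 60.0), ("ɹ", 0.6, 80.0), ("o", 1.0, 200.0)]


def trigger_point(segments: List[Segment], threshold: float,
                  loudness: float = 1.0,
                  tempo: float = 1.0) -> Tuple[Optional[str], Optional[float]]:
    """Return (label, ms after true onset) of the first segment loud enough
    to trigger the key, or (None, None) if no segment reaches the threshold."""
    elapsed_ms = 0.0
    for label, amplitude, duration_ms in segments:
        if amplitude * loudness >= threshold:
            return label, elapsed_ms
        elapsed_ms += duration_ms / tempo
    return None, None


for loudness, tempo in [(1.0, 1.0), (0.5, 1.0), (0.3, 1.0), (0.3, 2.0)]:
    print(loudness, tempo, trigger_point(GROW, 0.25, loudness, tempo))
```

In this toy model the true onset is identical in every case, yet the measured delay changes with both loudness and tempo, which is the conflation described above.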
Our specific objectives here are threefold. First, we wish to determine what the magnitude of phonetic bias actually is in voice key studies. Is it a purely theoretical consideration of trifling import, perhaps amounting to errors of a few milliseconds, or is it a much larger problem that could seriously impact typical studies? Second, we seek to discover how pervasive the problem is. Could it be solved, for example, by treating /s/-initial words separately from other words, or does it pervade the entire phonemic inventory? Is the problem limited to the first phoneme, or might it extend more deeply into the word? Does phonetic bias affect different experiments equally strongly? Third, we try to separate true phonetic bias from associations that might be mediated by psycholinguistic factors. For example, if words beginning with /z/ consistently had slower measured response times than other words, we would want to rule out the possibility that that is due, perhaps, to the fact that /z/-initial words tend to be of lower frequency and familiarity. One potential objective we do not adopt is that of separating out the articulatory from the acoustic source of phonetic bias. That would be of little practical benefit to those who wish to use voice keys or evaluate earlier studies that used them. For those people, the two sources of phonetic bias form a bundled entity whose magnitude and variability need to be dealt with or evaluated as a unit.
Awareness of possible bias in voice keys varies greatly. To gather information about researchers' concerns and beliefs, we conducted an informal Internet poll during the summer of 2000, to which 60 people responded. Of them, 55 indicated that they understood that different initial phonemes could differentially affect the triggering of voice keys. That understanding was unanimous among the 39 who had published research that used voice keys. In an open-ended question about what factors may be involved in such bias, several respondents mentioned that the intensity or, more precisely, the intensity rise time of the initial phoneme may affect how quickly the phoneme is detected, if at all. Many respondents stated some observations in articulatory terms. Few individual respondents offered more than a small part of the story, however. The most widely cited belief, that voiced phonemes are detected faster than voiceless ones, was mentioned by only about one third of the respondents.
While 92% of the respondents believed the first phoneme has an effect, only 45% believed that subsequent phonemes have an effect; that proportion crept up to 53% for researchers who had published voice key experiments. About one third of the respondents (19) suggested that the second phoneme may be important if the voice key is not triggered by the first phoneme at all. Almost as many (16) opined that the second phoneme could affect voice key responses by modifying the pronunciation of the first phoneme. However, few specifics were offered, and many volunteered that they considered this a theoretical possibility that they had never seen documented.
What about researchers' actual practices? To gather information about how experiments are conducted, we surveyed the 220 articles published from 1997 through 2000 in the Journal of Memory and Language. (All of these articles were accepted for publication before the second author of the present article, who edited many of the surveyed articles, was aware of most of the issues discussed here.) Our survey focused on experiments that compared vocal response time across different sets of stimuli. Some of the experiments used the same stimuli; we counted these groups of experiments only once in our summary statistics, giving 48 cases in total. The authors of a number of the articles were aware of some potential problems with voice key measurement and tried to deal with these problems in some way. These solutions will be discussed more fully later in this paper, but we introduce them here briefly. In 5 cases (all in one paper), the acoustic biases of voice keys were avoided by using a digital signal detection algorithm to find the onset of the response. In 12 cases, a delayed naming task or similar control task was used in addition to a standard naming task. Delayed naming is an attempt to isolate acoustic influences on voice key measurements from psycholinguistic factors. Another tack, used in many of the 36 cases that did not use delayed naming, was to balance the stimuli by phonemic criteria. For example, if experimenters believed that the voice key is biased by the first phoneme of the word, they might ensure that each group of stimuli had an equal number of words beginning with each phoneme. We found a clear difference in how experimenters treated initial phonemes and following phonemes. Experimenters attempted to balance the sets of stimuli for the initial phoneme in 20 of the 36 cases. As a reason for this practice, the authors sometimes stated that different word-initial phonemes trigger the voice key at different times. 
Those authors who attempted to balance the stimuli for initial phonemes sometimes indicated that they were not able to do so in all cases. In such cases, they often attempted to match a phoneme by one that was similar, but they usually gave no details about what standard of similarity was used. In another 15 of the 36 cases, the authors did not report any attempt to equate the stimuli for initial phonemes. One additional case revealed an intermediate course of action: The authors equated the stimuli for the manner class of the initial phoneme. Two other authors also mentioned manner class as being important in determining when the voice key triggers, one of the two offering the specific hypothesis that plosives trigger the voice key more quickly than fricatives.
The situation was different for the second phoneme. There was an attempt to match the stimuli for the second phoneme in 8 of the 36 stimulus sets for which delayed naming was not used. In the remaining 28 cases, no such attempt was made.
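Whether two stimulus sets are balanced at a given phoneme position, in the sense described above, is simple to check mechanically. The sketch below assumes each word is available as a tuple of phoneme symbols. The function names and the example sets are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple

Word = Tuple[str, ...]  # a word represented as a sequence of phoneme symbols


def phoneme_counts(words: Sequence[Word], position: int) -> Counter:
    """Count how many words have each phoneme at the given position."""
    return Counter(w[position] for w in words if len(w) > position)


def is_balanced(set_a: Sequence[Word], set_b: Sequence[Word],
                position: int = 0) -> bool:
    """True if both sets contain the same number of words with each
    phoneme at the given position (0 = initial, 1 = second)."""
    return phoneme_counts(set_a, position) == phoneme_counts(set_b, position)


# Hypothetical stimulus sets, matched on the initial phoneme only.
set_a = [("b", "æ", "d"), ("s", "ɪ", "t")]
set_b = [("b", "ɪ", "d"), ("s", "ɪ", "p")]

print(is_balanced(set_a, set_b, position=0))
print(is_balanced(set_a, set_b, position=1))
```

The same check applied at position 1 shows that sets matched on the initial phoneme can still be unmatched on the second phoneme.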
Few authors reported the specific type of voice key used or the conditions of its use. The Journal's guidelines stated that details of equipment need not be reported unless essential for replication, and most authors seemed to feel that the make of the voice key had no important effects.
Combining the results of our survey of experimenters' beliefs and our survey of publications, it appears that experimenters who publish work using voice keys are aware that different initial phonemes can differentially affect the triggering of voice keys. In practice, however, experimenters do not always equate the stimuli that they are comparing across conditions for initial phoneme. The gap between belief and practice may reflect the fact that such matching is often difficult to achieve given the other constraints on stimulus selection. In addition, some researchers appear to believe that the effects are relatively small and will wash out if there are a reasonable number of stimuli. Far fewer published studies equate stimuli for their second phonemes than for their initial phonemes. In addition, fewer researchers endorse the idea that the second phoneme is important. Researchers also seem to expect that different voice key setups will yield similar results.
Although none of the authors in the survey of articles from the Journal of Memory and Language cited published research to support their views about voice keys, there is some helpful research that contrasts the timings obtained by voice keys with more accurate timings inferred by visual inspection of the voice waveform. The impact of this research has been limited by the fact that several studies were written in languages other than English and investigated languages other than English. The first such study we know of was Pechmann, Reetz and Zerbst (1989). They asked their participants to read the same word five times and measured the voice key error by comparing the response time as measured by the voice key to much more accurate measures determined by analyzing waveforms. They reported not only high errors, but also large differences between trials. For example, the word Freude was, on average, detected by the voice key 104 ms after it was visible in the waveform, and the average range of voice key errors for a given participant was 98 ms. Most words had smaller errors, but the wide variation between words was perhaps more alarming than high but uniform error rates would have been. While a reasonable interpretation is that phonetic bias underlay the between-word variation, Pechmann et al. did not measure more than one word with the same initial phoneme, so we do not know conclusively to what extent the initial phoneme itself was the cause of the errors. Sakuma et al. (1997) ran experiments contrasting most of the possible word-initial phonemes in Japanese. They found that the voice key was always slower than the waveform measurement, and that the difference was biased by manner of articulation: Vowels and nasals had a small difference from the waveform value, the liquid had more, plosives more yet, and voiceless fricatives had a very great difference. The range of the difference between the phoneme groups was about 95 ms.
Possible contributions of the second phoneme of the word to phonetic bias have been studied less. Rastle and Davis (2000) were the first to directly investigate the possible effects of voice key artifacts caused by the second phoneme of the word. Their experiments with speakers of British English contrasted words with simple (/s/ plus vowel) and complex (/s/ plus /t/ or /p/) onsets. When the data were studied by waveform, words with complex onsets were named faster than those with simple onsets. But when a voice key of typical design was used, the reverse pattern was obtained. These results document how voice key artifacts may lead researchers to draw false conclusions when investigating questions such as the effect of onset complexity on the reading of printed words.
Voice waveforms are much more cumbersome to use than voice key timings, so previous researchers who directly compared the two measurement types have understandably restricted their experiments to a few stimulus types. Consequently many questions remain. For example, does the second phoneme affect the measured naming latencies in words that do not begin with the /s/ plus plosive clusters studied by Rastle and Davis (2000), or are such (relatively infrequent) word-initial sequences of two voiceless phonemes an isolated special case? We also noted that languages differ in the way nominally equivalent phonemes are pronounced, and asked whether the results obtained for German, Japanese, and British English hold for North American English. Because of these outstanding issues, we perceived the need for a larger-scale study, using North American English. Because we investigated phonetic bias as a whole and did not seek to factor the articulatory from the acoustic components, we did not need to use voice waveforms. This freedom allowed us to economically run thousands of trials, and also to compare our results with those of previous studies.
In the experiment reported here, we presented a large number of words (virtually all the familiar one-syllable monomorphemic words of English) in standard orthography to 20 native speakers. We recorded their response times using a typical voice key. Known or suspected covariables such as word frequency were accounted for statistically. Some (but not all) of the lexical covariables could have been reduced by asking participants to repeat nonsense syllables instead of reading words. However, we placed priority on replicating the conditions of speeded naming tasks, which are a more typical application for voice keys; we did not want to accidentally pass over any stage of the task which might introduce phonetic bias, whether it be articulatory or acoustic in origin. The task also allowed us to investigate variability between experiments by affording comparison with three other naming megastudies. These are the studies of Seidenberg and Waters (1989; henceforth SW) at McGill University; Treiman, Mullennix, Bijeljac-Babic, and Richmond-Welty (1995; henceforth TMBR), which studied consonant-vowel-consonant words at Wayne State University; and Spieler and Balota (1997; henceforth SB) at Washington University. We will refer to our own study as KTM. See Table 1 for the number of participants in each study, and the number of words that were tested. The KTM and TMBR studies used the voice key included in the response box supplied with the MEL software package (Schneider, 1988). In both of these studies, the voice key was left at the same fixed setting for all trials. SB used a Gerbrands model G1341T voice-operated relay, and SW used a custom-made voice key.
| Measure | KTM | TMBR | SB | SW |
|---|---|---|---|---|
| Participants | 20 | 27 | 31 | 30 |
| Words, Tested | 3,690 | 1,327 | 2,870 | 2,897 |
| Words, Analyzed | 2,982 | 1,234 | 2,525 | 2,536 |
| RT (ms) | 629 | 611 | 467 | 568 |
| Error Rate<sup>a</sup> | 4.6 | 1.3 | — | 6.0 |
| Bigram Frequency<sup>b</sup> | 1,493 | 1,816 | 1,529 | 1,531 |
| Consistency of Onset<sup>c</sup> | .976 | .966 | .975 | .976 |
| Consistency of Rime<sup>c</sup> | .913 | .906 | .908 | .907 |
| Familiarity<sup>d</sup> | 6.6 | 6.6 | 6.7 | 6.7 |
| Frequency of Spelling<sup>e</sup> | 3,172 | 2,526 | 2,559 | 2,584 |
| Homophones<sup>f</sup> | 1.1 | 1.2 | 1.1 | 1.1 |
| Length of Pronunciation<sup>g</sup> | 3.5 | 3.0 | 3.5 | 3.5 |
| Length of Spelling<sup>h</sup> | 4.4 | 4.1 | 4.4 | 4.4 |
| Neighborhood Size<sup>i</sup> | 6.8 | 9.4 | 7.1 | 7.1 |
Note. Averages across words. For RT and Error Rate, the per-word measures are themselves averages across trials.
<sup>a</sup>Percentage of mispronounced responses, excluding those with outlying response times or failures to respond. Data unavailable for SB.
<sup>b</sup>Average text frequency of the two-letter sequences in the spelling (Solso & Juel, 1980).
<sup>c</sup>Proportion (by type counts) of words in the list that have the same spelling that also share the pronunciation (Treiman et al., 1995).
<sup>d</sup>Nusbaum, Pisoni, and Davis (1984).
<sup>e</sup>Corpus frequency in Zeno, Ivenz, Millard, and Duvvuri (1995).
<sup>f</sup>Number of words in list with this pronunciation (minimum = 1).
<sup>g</sup>Number of phonemes.
<sup>h</sup>Number of letters.
<sup>i</sup>Number of words that differ by substituting one letter (Coltheart, Davelaar, Jonasson, & Besner, 1977).
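As an illustration of how one of these covariables can be computed, the following sketch counts orthographic neighbors as defined in the note on Neighborhood Size: words of the same length that differ from the target by exactly one substituted letter. The function name and the toy lexicon are hypothetical, and the counts in Table 1 of course depend on the actual word list used.

```python
from typing import Iterable


def neighborhood_size(word: str, lexicon: Iterable[str]) -> int:
    """Count lexicon entries that differ from `word` by substituting
    exactly one letter (same length, one mismatching position)."""
    count = 0
    for other in lexicon:
        if len(other) != len(word):
            continue  # insertions and deletions are not substitutions
        mismatches = sum(a != b for a, b in zip(word, other))
        if mismatches == 1:
            count += 1
    return count


# Toy lexicon for illustration only.
toy_lexicon = {"cat", "bat", "cot", "cab", "dog", "cats", "at"}
print(neighborhood_size("cat", toy_lexicon))  # neighbors: bat, cot, cab
```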
The participants were 20 undergraduate students from Wayne State University. Three participants dropped out from an original group of 23, leaving 6 men and 14 women. The students received course credit in exchange for participation. All were native speakers of English. None had a history of speech or hearing disorder, and none had uncorrected visual problems.
A set of 3,690 words was taken from a computerized version of the Merriam-Webster Pocket Dictionary and Webster's Seventh Collegiate Dictionary that was developed for such studies as Nusbaum, Pisoni and Davis (1984). Homographic heterophones (e.g., wind) and many unfamiliar words were excluded, but the list included many words that differ only in inflectional ending (e.g., both blow and blown were included) as well as many words unknown to the great majority of college undergraduates (e.g., gneiss). In order to avoid the statistical noise that would be introduced by redundancy and by the inclusion of unknown words, we performed our analyses only on the 2,982 words that also occur in the word list used for Kessler and Treiman (2001). The latter list was carefully controlled to have only morphologically nonredundant words that are generally familiar to college undergraduates. The selection of this subset was made before we looked at any of the experimental data.
We also analyzed, separately, the data from the three previous megastudies. Table 1 shows the number of words remaining in each of the four studies after intersecting its stimulus list with that of Kessler and Treiman (2001).
Participants were instructed to read aloud words that appeared on a computer screen, as quickly and accurately as possible. On each trial, the prompt GET READY appeared for 2 s in the center of the screen. The screen then went blank for 1 s. The stimulus word was then presented in upper case letters. It remained on the screen until the voice key picked up the spoken response. Then the screen went blank for 1 s until the next warning message. After every 100 trials, the participants received a one-minute break.
All of the 3,690 monosyllabic English words were presented to each subject, in eight sessions that each tested from 461 to 463 words. Words were presented in a different, randomly selected, order for each subject. Participants were advised not to make any extraneous sounds that could trigger the voice key. They were asked to keep their lips about four inches from the microphone throughout the experiment. An Electro-Voice RE16 cardioid microphone was used together with the aforementioned voice key (Schneider, 1988). Trials with response times quicker than 100 ms or slower than 2,000 ms were rejected from the analysis. The remaining trials were hand-coded to indicate whether the pronunciation was in error. A subset of the responses (one randomly selected 461-word list from each of five subjects) was checked by two different judges to determine accuracy of this hand-coding; 96.7% of the judgments agreed on the correctness or incorrectness of a response. For the purpose of error analyses, mispronunciations were counted as errors even when the participants subsequently corrected themselves.
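The trial-rejection rule and the agreement figure reported above amount to two small computations, sketched here. The function names and the example values are hypothetical. Only the 100 ms and 2,000 ms cutoffs come from the text.

```python
from typing import List, Sequence

MIN_RT_MS = 100    # trials quicker than this are rejected
MAX_RT_MS = 2000   # trials slower than this are rejected


def filter_trials(rts_ms: Sequence[float]) -> List[float]:
    """Keep only response times within the accepted window (inclusive)."""
    return [rt for rt in rts_ms if MIN_RT_MS <= rt <= MAX_RT_MS]


def percent_agreement(judge_a: Sequence[bool], judge_b: Sequence[bool]) -> float:
    """Percentage of responses on which two judges gave the same
    correct/incorrect coding."""
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("judgment lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(judge_a, judge_b))
    return 100.0 * matches / len(judge_a)


print(filter_trials([85, 100, 629, 2000, 2400]))
print(percent_agreement([True, True, False, True], [True, False, False, True]))
```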
We first present analyses dealing with voice key effects caused by the word-initial phoneme. We then discuss effects of the second phoneme.
First we grouped words by their initial phoneme and asked whether those groups differed significantly in their average response times. We chose a Monte Carlo test of significance (Good, 1995), as illustrated with a small subset of the data in Table 2. We directly determined the likelihood that our F-ratio could be due to chance by randomly rearranging the data between phoneme groups 10,000 times, subject to the constraint that each group contains a constant number of words across rearrangements, and seeing what proportion of those rearrangements had an F-ratio greater than or equal to ours. This method makes fewer assumptions than are required for using the standard F distribution, and also provides a straightforward way of factoring out the contribution of the second phoneme: Each time we randomly rearranged words across phoneme groups, we only swapped pairs of words that have the same second phoneme (i.e., words were not swapped across the blocks shown in the table). In effect, the data were cross-classified in blocks based on the second phoneme, and in each rearrangement the number of words in each of those blocks did not change. This technique of blocking on variables to factor out the significance of their effect was used throughout our analyses. We also blocked simultaneously on the participant in each trial. Blocking allowed us to isolate the different sources of variance, yet still run all the data in one large test of significance.
| Vowel (blocking) | Speaker (blocking) | Original /b/ | RT (ms) | Original /d/ | RT (ms) | Rearranged /b/ | RT (ms) | Rearranged /d/ | RT (ms) |
|---|---|---|---|---|---|---|---|---|---|
| /æ/ | 1 | bad | 660 | dab | 748 | dab | 748 | bad | 660 |
| /æ/ | 1 | badge | 806 | dad | 738 | badge | 806 | dad | 738 |
| /æ/ | 1 | | | dam | 797 | | | dam | 797 |
| /æ/ | 2 | bad | 840 | dab | 550 | dam | 490 | bad | 840 |
| /æ/ | 2 | badge | 629 | dad | 521 | dab | 550 | badge | 629 |
| /æ/ | 2 | | | dam | 490 | | | dad | 521 |
| /ɪ/ | 1 | bib | 815 | did | 826 | bib | 815 | bid | 857 |
| /ɪ/ | 1 | bid | 857 | dig | 518 | did | 826 | big | 612 |
| /ɪ/ | 1 | big | 612 | dill | 473 | dill | 473 | dig | 518 |
| /ɪ/ | 2 | bib | 508 | did | 565 | bid | 522 | bib | 508 |
| /ɪ/ | 2 | bid | 522 | dig | 569 | big | 522 | dig | 569 |
| /ɪ/ | 2 | big | 522 | dill | 737 | did | 565 | dill | 737 |
| Sum | | | 6,771 | | 7,532 | | 6,317 | | 7,986 |
| Metric<sup>1</sup> | | | 9,312,229 | | | | 9,305,132 | | |

Note. Each rearrangement randomly shifts words and their response times across columns, subject to: (1) Words are not shifted across blocks (each combination of vowel and speaker); (2) Each column of each block keeps the same number of words.
<sup>1</sup>Sum of the mean squares of the column sums; this is the portion of the F-ratio that varies between rearrangements. The fraction of rearrangements for which the metric is greater than or equal to that of the original is p.
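The blocked rearrangement test can be sketched with the Table 2 subset. The metric below divides the square of each column sum by the number of words in that column and adds the results, which reproduces the tabled value for the original arrangement. The data layout, function names, and random seed are choices made for this sketch and are not taken from the authors' software.

```python
import random
from collections import defaultdict

# (vowel, speaker) block -> initial phoneme -> response times in ms (Table 2).
TABLE2 = {
    ("æ", 1): {"b": [660, 806], "d": [748, 738, 797]},
    ("æ", 2): {"b": [840, 629], "d": [550, 521, 490]},
    ("ɪ", 1): {"b": [815, 857, 612], "d": [826, 518, 473]},
    ("ɪ", 2): {"b": [508, 522, 522], "d": [565, 569, 737]},
}


def metric(blocks):
    """Sum over phoneme groups of (sum of RTs) squared divided by group size."""
    sums, counts = defaultdict(float), defaultdict(int)
    for groups in blocks.values():
        for phoneme, rts in groups.items():
            sums[phoneme] += sum(rts)
            counts[phoneme] += len(rts)
    return sum(sums[p] ** 2 / counts[p] for p in sums)


def rearrange(blocks, rng):
    """Shuffle RTs among phoneme groups within each block, keeping the
    number of words per group per block unchanged."""
    result = {}
    for key, groups in blocks.items():
        pooled = [rt for rts in groups.values() for rt in rts]
        rng.shuffle(pooled)
        new_groups, start = {}, 0
        for phoneme, rts in groups.items():
            new_groups[phoneme] = pooled[start:start + len(rts)]
            start += len(rts)
        result[key] = new_groups
    return result


def monte_carlo_p(blocks, n_rearrangements=10_000, seed=0):
    """Fraction of rearrangements whose metric is >= the observed metric."""
    rng = random.Random(seed)
    observed = metric(blocks)
    hits = sum(metric(rearrange(blocks, rng)) >= observed
               for _ in range(n_rearrangements))
    return hits / n_rearrangements


print(round(metric(TABLE2)))  # 9312229, matching the Original column
print(monte_carlo_p(TABLE2))
```

Adding a further blocking variable only requires extending the block key, for example to a (vowel, speaker, covariable level) triple.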
The first row of Table 3, “Raw RT”, shows the results of this analysis. Words with different phonemes have different measured response times, at a significance of .000, i.e., p < .001. We ran the same analysis for the three other megastudies and obtained the same result. The only difference in those tests is that trial-level data were not available, and so we used as our basic level of analysis the average response time for each word. These results suggest that there is a voice key bias: Words with different initial phonemes have significantly different response time measures.
| Response | KTM | TMBR | SB | SW |
|---|---|---|---|---|
| Raw RT | .000 | .000 | .000 | .000 |
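| Bigram Frequency | .000<sup>b</sup> | .000<sup>a</sup> | .000<sup>a</sup> | .000<sup>a</sup> |
| Consistency of Onset | .000<sup>a</sup> | .000<sup>a</sup> | .000<sup>a</sup> | .000<sup>a</sup> |
| Consistency of Rime | .000<sup>a</sup> | .007<sup>a</sup> | .001<sup>a</sup> | .000<sup>a</sup> |
| Familiarity | .090<sup>a</sup> | .043<sup>a</sup> | .017<sup>a</sup> | .020<sup>a</sup> |
| Frequency of Spelling | .000<sup>a</sup> | .000<sup>a</sup> | .000<sup>a</sup> | .000<sup>a</sup> |
| Homophones | .000<sup>a</sup> | .000<sup>a</sup> | .000<sup>a</sup> | .000<sup>a</sup> |
| Length of Pronunciation | .000<sup>a</sup> | 1.000<sup>a</sup> | .000<sup>a</sup> | .000<sup>a</sup> |
| Length of Spelling | .000<sup>a</sup> | .000<sup>a</sup> | .000<sup>a</sup> | .000<sup>a</sup> |
| Neighborhood Size | .023<sup>a</sup> | .000<sup>a</sup> | .000<sup>a</sup> | .000<sup>a</sup> |
| Error Rate | .000<sup>a</sup> | .012<sup>a</sup> | — | .426<sup>a</sup> |
| Residual RT | .000 | .000 | .000 | .000 |
<sup>a</sup>Raw RT remains significant (p ≤ .05) when blocked by this covariable.
<sup>b</sup>Raw RT not significant when blocked by this covariable.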
One might suspect, however, that the relationship between initial phoneme and response time is not direct, but is mediated through some third covariable such as spelling length. We therefore tagged each of our words to identify the levels of potential covariables. The mean values for these covariables are listed in Table 1. Table 3 shows what happened when we tested whether words with different initial phonemes differ significantly for the various covariables. The statistical techniques used for these tests were exactly like those used for response time, except that these tests were conducted for each word, instead of for each trial. In almost all cases, words with different initial phonemes have significantly different levels of each of the covariables. These tests leave open the possibility that there are in fact no phonetic voice key biases: The different phonemes could have different response time measurements because words that have different phonemes also happen to vary with respect to the true causal factors.
However, an additional round of statistical blocking can factor out the effect of individual covariables. For each covariable, we tested whether there was still a significant (p < .05) difference between the response times for words with different initial phonemes after we blocked on the different levels of the covariable. That is, the setup was similar to that illustrated in Table 2, except that the second blocking variable was the covariable instead of the speaker, and each word was measured only once. In almost all cases, the association remained significant. Footnote a in Table 3 marks each cell for which, when we block on the covariable in that row, words with different initial phonemes still have significantly different response times. (It should be noted that in each cell of the table, the number and the footnote refer to two different tests. An entry like “Familiarity .090a” means that when words are grouped by initial phoneme, the groups do not differ significantly among themselves in familiarity, at p = .090; and they do differ significantly among themselves in raw response time, when blocked by familiarity.) These analyses with blocking show that the association between phoneme and response time is not ultimately due to spelling consistency, familiarity, frequency of spelling, number of homophones, word length as determined by pronunciation or spelling, or neighborhood size. Tests of bigram frequency were less readily interpretable because virtually every word has a unique bigram frequency, resulting in block sizes that were too small to allow significance to be determined in any event.
Although those blocking tests show that individual covariables are not responsible for the association between initial phoneme and measured response time, one might suspect that the association is mediated by a combination of covariables. To test that possibility, our final statistical test on initial phoneme groups was a regression analysis, using response time as the response variable and all the covariables, as well as error rate, as predictor variables. For each covariable, we used the transformation that accounted for the most variance in the response time as averaged across all subjects, as computed by a single-factor linear regression. The predictor variables in the multiple regression accounted for 35% of the variance in the response times in the KTM study, 27% in TMBR, 28% in SB, and 23% in SW. For each word, we determined the average residual response time left after subtracting the predicted values accounted for by the linear regression. Then we tested whether there were significant differences in residual response time for words with different initial phonemes.
The results of this analysis are presented in the last row of Table 3. For all four studies, words with different initial phonemes differed significantly (p < .001) in residual response time. That is, even after factoring out the effect of all of the psycholinguistic covariables by linear regression, there was still a significant difference in response times between words with different initial phonemes. This difference must be due to factors other than the psycholinguistic factors, namely, phonetic voice key measurement biases.
The converging evidence over four different studies leaves little doubt that the initial phoneme itself has a direct influence on the response time. This corroborates previous suggestions from analyses of the SB and TMBR data that were reported in Spieler and Balota (1997) and Treiman et al. (1995). However, the current analysis is more direct and convincing. Treiman et al., for example, included the phonetic features of the initial phoneme in a large regression test to predict the average response time of words from many predictors, and found that some of those phonetic features were significant. But there is no guarantee that a multiple regression will assign variation to the right predictor variable: The variance assigned to the phonetic features could actually belong to psycholinguistic covariables that are highly correlated with those features. Our current regression analysis is more conclusive because we use it to factor out all effects that can be attributed to nonphonetic factors, then do association tests only on the residuals. Also, we factor out any contributions of the second phoneme, and we reinforce the results by blocking tests. Although the latter are limited to single variables, they reliably remove 100% of the effect of the covariable in consideration without making the statistical assumptions required for linear regression.
Even after being convinced that there is some voice key bias by initial phoneme, one could still imagine that the effect is quite limited and therefore, perhaps, easily controlled. One possibility would be that only particular phonemes are of concern. For example, Bates, Devescovi, Pizzamiglio, D'Amico, and Hernandez (1995), responding to reports that fricatives and affricates cause problems in voice key studies, added a flag in their regression analyses to indicate whether the word in question began with a fricative or affricate. To examine whether there is any possibility of addressing the voice key question in such a straightforward way, we ran separate tests contrasting each pair of initial phonemes. This may at first alarm readers who are alert to the fallacy of attempting to prove that variables are associated by running separate tests on each of their levels. However, we have already demonstrated that the initial phoneme is associated with the response time. Our purpose now is to explore what factors may account for the overall association.
The procedure for testing particular pairs was essentially the same as for the residual response time analysis across all phonemes, except that the presence of exactly two groups at a time allowed for a simpler metric abstracted from the standard t-test. Tables 4 through 7 present the results. These tables are arranged by descending order of differential residual response times for the phoneme. These differentials show how much more slowly words with the indicated phoneme (in the first column) were named than words with different first phonemes but the same second phonemes. That is, for each phoneme, we computed first the average residual response time for words with that phoneme in second position (R₂). Similarly, for each two-phoneme sequence, we computed the average residual response time for words that begin with that sequence (R₁,₂). Then, to obtain the differential residual response times for a phoneme, we averaged the difference R₁,₂ − R₂ for all the two-phoneme sequences that begin with that phoneme. This computation factors out any effect of the second phoneme. In each row in Table 4, we present first the initial phoneme in consideration, its residual response time offset, and the number of words that begin with that phoneme. Then we list the phonemes that have significantly faster response times. For example, the 512 words beginning with /s/ are measured as 38.5 ms slower than can be accounted for by the levels of the covariables in those words or by the effect of the following phoneme. Furthermore, /k/, /t/, /n/, and so forth have significantly smaller (faster) residual response time offsets. The word counts are important, because lack of significance between two phonemes can be due to the fact that there are only a few words that begin with those phonemes, diminishing the power of statistical tests.
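The differential residual response time described above can be sketched in a few lines. The input format (one tuple of first phoneme, second phoneme, and residual per word) and the function name are assumptions made for illustration; the sketch also assumes that the average over two-phoneme sequences is unweighted, which the text does not state explicitly.

```python
from collections import defaultdict
from statistics import mean


def differential_residuals(words):
    """words: iterable of (first_phoneme, second_phoneme, residual_rt)."""
    by_second = defaultdict(list)
    by_pair = defaultdict(list)
    for first, second, residual in words:
        by_second[second].append(residual)
        by_pair[(first, second)].append(residual)

    # Average residual for each phoneme in second position.
    r2 = {second: mean(v) for second, v in by_second.items()}
    # Average residual for each word-initial two-phoneme sequence.
    r12 = {pair: mean(v) for pair, v in by_pair.items()}

    # For each initial phoneme, average the sequence-level differences
    # (assumed here to be an unweighted average over sequences).
    differences = defaultdict(list)
    for (first, second), value in r12.items():
        differences[first].append(value - r2[second])
    return {first: mean(v) for first, v in differences.items()}


# Invented residuals (ms): /s/ is 20 ms slow and /l/ 20 ms fast before either vowel.
toy_words = [("s", "i", 60), ("l", "i", 20), ("s", "ɑ", 10), ("l", "ɑ", -30)]
offsets = differential_residuals(toy_words)  # {"s": 20, "l": -20}
```

In the toy data the vowel /i/ raises residuals by 50 ms relative to /ɑ/, yet the offsets for /s/ and /l/ come out the same before either vowel, which is the sense in which the computation factors out the second phoneme.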
| Phone | RT1 | N | |||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| z | 40.6 | 9 | f | p | dʒ | d | h | j | b | m | ɹ | w | l | θ | |||||||||||||||||||
| i | 39.9 | 6 | b | ||||||||||||||||||||||||||||||
| ʃ | 38.6 | 76 | k | t | n | ð | f | g | p | dʒ | d | aɪ | h | j | v | b | m | ɹ | w | l | ɔ | θ | ɑ | ||||||||||
| s | 38.5 | 512 | k | t | n | ð | f | g | p | dʒ | d | ɚ | h | ɪ | j | v | b | ɛ | m | ɹ | aʊ | w | l | ɔ | e | ʌ | θ | o | ɑ | ɔɪ | |||
| tʃ | 37.1 | 55 | t | n | ð | f | g | p | dʒ | d | h | j | v | b | m | ɹ | w | l | θ | ||||||||||||||
| k | 21.8 | 239 | t | n | ð | f | g | p | dʒ | d | h | j | v | b | m | ɹ | w | l | ɔ | θ | ɑ | ɔɪ | |||||||||||
| t | 12.3 | 163 | f | g | p | dʒ | d | aɪ | h | j | v | b | m | ɹ | w | l | θ | ɑ | |||||||||||||||
| n | 9.8 | 78 | h | j | v | b | m | ɹ | w | l | θ | ||||||||||||||||||||||
| ð | 7.8 | 13 | w | l | θ | ||||||||||||||||||||||||||||
| f | 4.6 | 189 | d | h | ɪ | j | b | m | ɹ | w | l | θ | ɑ | ||||||||||||||||||||
| g | 3.9 | 150 | d | h | b | m | ɹ | w | l | ɑ | |||||||||||||||||||||||
| p | 2.9 | 201 | d | h | ɪ | v | b | m | ɹ | w | l | θ | ɑ | ||||||||||||||||||||
| æ | 0.2 | 16 | ɪ | aʊ | o | ||||||||||||||||||||||||||||
| dʒ | -3.2 | 47 | h | j | b | m | ɹ | w | l | θ | |||||||||||||||||||||||
| d | -6.0 | 138 | h | ɪ | j | b | m | ɹ | w | l | |||||||||||||||||||||||
| ɚ | -6.3 | 7 | b | ||||||||||||||||||||||||||||||
| aɪ | -7.1 | 6 | b | ||||||||||||||||||||||||||||||
| h | -9.1 | 134 | w | l | θ | ||||||||||||||||||||||||||||
| ɪ | -9.9 | 11 | b | ɔ | o | ||||||||||||||||||||||||||||
| j | -13.2 | 27 | w | l | θ | ||||||||||||||||||||||||||||
| v | -13.3 | 36 | w | l | |||||||||||||||||||||||||||||
| b | -14.5 | 247 | ɛ | aʊ | w | l | |||||||||||||||||||||||||||
| ɛ | -15.6 | 11 | |||||||||||||||||||||||||||||||
| m | -17.4 | 124 | w | l | |||||||||||||||||||||||||||||
| ɹ | -17.6 | 131 | w | l | |||||||||||||||||||||||||||||
| aʊ | -21.7 | 6 | |||||||||||||||||||||||||||||||
| w | -22.6 | 127 | |||||||||||||||||||||||||||||||
| l | -26.7 | 136 | |||||||||||||||||||||||||||||||
| ɔ | -27.8 | 11 | |||||||||||||||||||||||||||||||
| e | -28.0 | 12 | |||||||||||||||||||||||||||||||
| ʌ | -29.6 | 3 | |||||||||||||||||||||||||||||||
| θ | -33.6 | 40 | ɑ | ||||||||||||||||||||||||||||||
| o | -35.1 | 8 | |||||||||||||||||||||||||||||||
| ɑ | -37.9 | 11 | |||||||||||||||||||||||||||||||
| ɔɪ | -50.2 | 1 | |||||||||||||||||||||||||||||||
| u | -79.0 | 1 |
Note. Entries tell which phonemes have voice key response times significantly faster than the phoneme in the first column, p ≤ .05. Effect of second phoneme is blocked. See International Phonetic Association (1996, 1999) for explanation of phoneme symbols.
RT1: Response time residuals (milliseconds) after effect of covariables is removed by regression.
| Phone | RT1 | N | ||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tʃ | 29.1 | 37 | t | f | p | g | dʒ | n | h | ɹ | b | v | m | d | l | w | θ | j | ð | |||||
| ʃ | 27.7 | 46 | g | dʒ | n | h | ɹ | b | v | m | d | l | w | θ | j | ð | ||||||||
| s | 27.4 | 83 | t | k | f | p | g | dʒ | n | h | ɹ | b | v | m | d | l | w | θ | j | ð | ||||
| t | 18.1 | 68 | dʒ | ɹ | b | m | d | l | w | θ | j | ð | ||||||||||||
| k | 17.1 | 75 | g | dʒ | n | ɹ | b | v | d | l | w | θ | j | ð | ||||||||||
| f | 15.2 | 62 | dʒ | ɹ | b | d | l | w | θ | j | ð | |||||||||||||
| z | 14.1 | 6 | j | |||||||||||||||||||||
| p | 9.4 | 87 | b | d | l | w | θ | j | ð | |||||||||||||||
| g | 8.2 | 51 | d | l | w | θ | j | ð | ||||||||||||||||
| dʒ | 1.0 | 30 | w | θ | ð | |||||||||||||||||||
| ɚ | 0.0 | 1 | ||||||||||||||||||||||
| n | -1.7 | 59 | d | l | w | θ | j | ð | ||||||||||||||||
| h | -3.2 | 80 | d | l | w | θ | j | ð | ||||||||||||||||
| ɹ | -3.7 | 97 | d | w | θ | j | ð | |||||||||||||||||
| b | -9.8 | 93 | w | θ | j | ð | ||||||||||||||||||
| v | -13.8 | 23 | ð | |||||||||||||||||||||
| m | -14.8 | 79 | j | ð | ||||||||||||||||||||
| d | -20.2 | 71 | θ | j | ð | |||||||||||||||||||
| l | -25.2 | 88 | w | j | ð | |||||||||||||||||||
| w | -25.9 | 59 | j | ð | ||||||||||||||||||||
| θ | -48.7 | 13 | ||||||||||||||||||||||
| j | -56.7 | 17 | ð | |||||||||||||||||||||
| ð | -56.7 | 9 |
Note. See notes for Table 4.
| Phone | RT1 | N | ||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| z | 33.5 | 8 | ð | s | g | k | θ | p | j | b | d | h | t | dʒ | l | n | ɹ | w | m | tʃ | ʃ | |||||||||||||||
| f | 18.1 | 161 | s | g | k | v | θ | p | j | b | d | h | t | dʒ | l | n | ɹ | w | e | m | tʃ | ʃ | ||||||||||||||
| ð | 12.3 | 13 | θ | p | j | b | d | h | t | dʒ | l | n | ɹ | w | m | tʃ | ʃ | |||||||||||||||||||
| s | 10.9 | 415 | g | p | j | b | d | h | t | dʒ | ɛ | l | n | ɹ | w | e | ɔ | m | æ | tʃ | ʃ | |||||||||||||||
| g | 10.5 | 136 | k | j | b | d | h | t | dʒ | l | n | ɹ | w | e | m | tʃ | ʃ | |||||||||||||||||||
| k | 10.4 | 203 | p | ɑ | j | b | d | h | t | dʒ | ɛ | l | n | ɹ | w | e | ɔ | m | tʃ | ʃ | ||||||||||||||||
| v | 10.3 | 30 | j | b | d | h | t | dʒ | l | n | ɹ | w | m | tʃ | ʃ | |||||||||||||||||||||
| θ | 6.9 | 36 | j | b | d | h | t | l | n | ɹ | w | m | tʃ | ʃ | ||||||||||||||||||||||
| p | 6.9 | 176 | i | j | b | d | h | t | dʒ | l | n | ɹ | w | m | tʃ | ʃ | ||||||||||||||||||||
| ɑ | 4.5 | 6 | ||||||||||||||||||||||||||||||||||
| i | 4.3 | 6 | ||||||||||||||||||||||||||||||||||
| aʊ | 3.0 | 6 | ||||||||||||||||||||||||||||||||||
| ɪ | 2.2 | 6 | b | |||||||||||||||||||||||||||||||||
| u | 0.8 | 1 | ||||||||||||||||||||||||||||||||||
| aɪ | 0.4 | 4 | b | |||||||||||||||||||||||||||||||||
| j | 0.2 | 24 | l | n | ɹ | w | m | tʃ | ʃ | |||||||||||||||||||||||||||
| b | -1.4 | 219 | h | ɛ | l | n | ɹ | w | m | æ | tʃ | ʃ | ||||||||||||||||||||||||
| d | -2.3 | 112 | ɛ | l | ɹ | w | m | tʃ | ʃ | |||||||||||||||||||||||||||
| h | -2.4 | 115 | t | ɹ | w | m | tʃ | ʃ | ||||||||||||||||||||||||||||
| t | -2.5 | 131 | ɛ | l | n | ɹ | w | m | tʃ | ʃ | ||||||||||||||||||||||||||
| ɚ | -3.0 | 6 | ||||||||||||||||||||||||||||||||||
| dʒ | -3.0 | 39 | l | ɹ | w | m | tʃ | ʃ | ||||||||||||||||||||||||||||
| ɛ | -3.2 | 8 | ||||||||||||||||||||||||||||||||||
| l | -5.7 | 119 | w | m | tʃ | ʃ | ||||||||||||||||||||||||||||||
| n | -5.9 | 68 | m | tʃ | ʃ | |||||||||||||||||||||||||||||||
| ɹ | -6.0 | 125 | w | m | tʃ | ʃ | ||||||||||||||||||||||||||||||
| w | -7.8 | 112 | m | tʃ | ʃ | |||||||||||||||||||||||||||||||
| o | -8.2 | 4 | ||||||||||||||||||||||||||||||||||
| e | -10.3 | 10 | ||||||||||||||||||||||||||||||||||
| ɔ | -10.3 | 6 | ||||||||||||||||||||||||||||||||||
| m | -11.6 | 101 | ʃ | |||||||||||||||||||||||||||||||||
| æ | -12.3 | 9 | ||||||||||||||||||||||||||||||||||
| tʃ | -13.5 | 45 | ʃ | |||||||||||||||||||||||||||||||||
| ɔɪ | -14.2 | 1 | ||||||||||||||||||||||||||||||||||
| ʃ | -22.8 | 64 |
Note. See notes for Table 4.
| Phone | RT1 | N | ||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| z | 55.6 | 8 | tʃ | dʒ | k | f | t | j | d | b | h | ð | θ | n | p | l | w | m | ɹ | |||||||||||||||||
| aɪ | 38.2 | 4 | g | k | f | d | b | p | ||||||||||||||||||||||||||||
| s | 37.9 | 418 | tʃ | g | dʒ | k | f | v | t | ɪ | j | ɚ | d | ɛ | aʊ | b | h | e | ɔɪ | ð | θ | n | p | l | w | ɔ | m | o | ɹ | æ | ||||||
| ʃ | 34.7 | 64 | g | dʒ | k | f | v | t | j | ɑ | d | b | h | ð | θ | n | p | l | w | ɔ | m | ɹ | ||||||||||||||
| tʃ | 25.0 | 46 | f | v | t | j | d | b | h | ð | θ | n | p | l | w | m | ɹ | |||||||||||||||||||
| g | 24.1 | 137 | v | i | j | d | b | h | ð | θ | n | p | l | w | m | ɹ | ||||||||||||||||||||
| dʒ | 18.4 | 39 | j | d | b | h | ð | θ | n | p | l | w | m | ɹ | ||||||||||||||||||||||
| k | 17.2 | 205 | i | j | d | b | h | ð | θ | n | p | l | w | m | ɹ | |||||||||||||||||||||
| u | 15.8 | 1 | ||||||||||||||||||||||||||||||||||
| f | 14.3 | 163 | i | d | b | h | ð | θ | n | p | l | w | m | ɹ | ||||||||||||||||||||||
| v | 11.8 | 30 | ð | θ | n | p | l | w | m | ɹ | ||||||||||||||||||||||||||
| i | 10.3 | 6 | b | p | ||||||||||||||||||||||||||||||||
| t | 8.6 | 133 | d | b | h | ð | θ | n | p | l | w | ɔ | m | ɹ | ||||||||||||||||||||||
| ɪ | 4.4 | 6 | b | p | ||||||||||||||||||||||||||||||||
| j | 3.9 | 24 | n | p | l | w | m | ɹ | ||||||||||||||||||||||||||||
| ɚ | 0.6 | 6 | p | |||||||||||||||||||||||||||||||||
| ɑ | -0.9 | 6 | b | |||||||||||||||||||||||||||||||||
| d | -1.9 | 111 | ɛ | aʊ | b | h | ð | n | p | l | w | m | ɹ | |||||||||||||||||||||||
| ɛ | -4.7 | 8 | b | p | ||||||||||||||||||||||||||||||||
| aʊ | -4.7 | 6 | b | θ | p | |||||||||||||||||||||||||||||||
| b | -7.2 | 219 | n | p | l | w | m | ɹ | ||||||||||||||||||||||||||||
| h | -7.6 | 115 | n | p | l | w | m | ɹ | ||||||||||||||||||||||||||||
| e | -8.3 | 11 | p | æ | ||||||||||||||||||||||||||||||||
| ɔɪ | -12.4 | 1 | ||||||||||||||||||||||||||||||||||
| ð | -14.0 | 13 | ||||||||||||||||||||||||||||||||||
| θ | -17.0 | 36 | m | |||||||||||||||||||||||||||||||||
| n | -17.7 | 68 | ||||||||||||||||||||||||||||||||||
| p | -20.5 | 175 | ɹ | |||||||||||||||||||||||||||||||||
| l | -20.9 | 119 | ||||||||||||||||||||||||||||||||||
| w | -20.9 | 112 | ||||||||||||||||||||||||||||||||||
| ɔ | -21.0 | 6 | ||||||||||||||||||||||||||||||||||
| m | -24.0 | 101 | ||||||||||||||||||||||||||||||||||
| o | -25.4 | 4 | ||||||||||||||||||||||||||||||||||
| ɹ | -28.8 | 125 | ||||||||||||||||||||||||||||||||||
| æ | -40.0 | 10 |
Note. See notes for Table 4.
The tables leave no doubt that significant differences in response time between words with different initial phonemes are pervasive. At our significance level of .05, chance alone would be expected to produce only one or two significant cells per row, but almost all rows have many more. The few exceptions are vowels, about which firm conclusions cannot be drawn because very few monosyllabic words in English begin with vowels. They are included in the tables primarily in order to show their differential residual response times.
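The figure of one or two cells per row can be checked with simple arithmetic. The sketch below treats every other phoneme in a table as one comparison at the .05 level, which is a simplifying assumption and gives a rough ceiling rather than an exact expectation; the row counts are those of Tables 4 through 7 as printed here.

```python
def expected_chance_hits(n_phonemes, alpha=0.05):
    """Rough ceiling on spuriously significant cells in one table row."""
    return (n_phonemes - 1) * alpha


# Number of phoneme rows in each table as printed in this article.
rows_per_table = {"Table 4": 36, "Table 5": 23, "Table 6": 35, "Table 7": 35}
expected = {
    name: round(expected_chance_hits(n), 2) for name, n in rows_per_table.items()
}
# {"Table 4": 1.75, "Table 5": 1.1, "Table 6": 1.7, "Table 7": 1.7}
```

Rows such as /s/ in Table 4, with more than two dozen significant cells, exceed this ceiling many times over.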
Tables 4 through 7 contain a wealth of information about differences between individual pairs of phonemes. Our next step is to ask whether any generalizations emerge from the data. We took several passes over the data, breaking them down by each of the three main classes of consonant articulatory features: voicing, place of articulation, and manner of articulation. Because significance testing has already been done, we apply only informal summary statistics in this phase of the analysis.
The patterning most often mentioned by respondents to our Internet poll (35%) was that voicing would affect the measured response time. All of those who mentioned the direction of the effect said that voiced phonemes are detected faster than voiceless phonemes. A quick glance at Table 4 shows that this cannot be an absolute rule: There are many voiced phonemes that are detected more slowly than voiceless ones, and the slowest of all phonemes are the voiced /z/ and /i/. But if we look at tendencies rather than absolutes, the voiceless phonemes do seem to cluster in the top (slower) half of the table. One way to quantify that impression is to compare the number of voiced phonemes that are faster than voiceless ones, and vice versa. We counted 97 times where voiceless phonemes are significantly slower than voiced phonemes, but only 12 times where voiced phonemes are significantly slower than voiceless ones; the voiceless phoneme is the slower one in 89% of all the untied pairs. Applying the same counts to TMBR (Table 5), we get a ratio of 75:6 pairs where voiceless phonemes are slower than voiced phonemes (93%). Thus, the rule for voiced phonemes having faster response times holds for both studies carried out using the same equipment in the same lab at Wayne State University. The picture is somewhat different for the other two megastudies, however. In SB (Table 6) the ratio is only 63:40 (61%); in SW (Table 7) the ratio is 80:29 (73%). But it could be the case that voicing is only correlated with some other feature that is the true cause of the response time difference. To get around such problems, we next compared phonemes that differ only by the voice feature, but are otherwise identical. Comparing response times of such pairs in KTM, we found four pairs such as /tʃ/ and /dʒ/, where the voiceless phoneme is significantly slower than its voiced counterpart, but only one, /θ/ and /ð/, where the voiced phoneme is slower (4:1). These ratios are 4:0 for TMBR, 2:4 for SB, and 1:1 for SW.
What can account for these patterns? The general tendency to detect voiced phonemes more quickly is doubtless due to the fact that voiced phonemes tend to be louder than voiceless ones (Fry, 1979). If a voice key is set at a fairly high threshold, then it may skip over the softer, voiceless segments to a greater degree than it skips over voiced ones, resulting in shorter measured response times for the latter. The differences between studies could be due to differences in thresholds. The more sensitive the apparatus (e.g., the lower the voice key threshold is set), the less there should be a bias due to the difference in acoustic intensity between voiced and voiceless sounds. A less obvious source of bias could arise if the voice key trigger is so high that it tends to miss even voiced obstruents, which are less loud than vowels. In such an event, the voice key will trip earlier for words beginning with voiced phonemes, not because it detects those consonants, but because voiced phonemes are about 40 ms shorter than voiceless phonemes (Klatt, 1976).
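The interaction between threshold and segment loudness can be made concrete with a toy model of a voice key. Everything numeric here is invented for illustration: each word is reduced to a consonant and a vowel with an arbitrary relative amplitude, the voiced consonant is made 40 ms shorter than the voiceless one in line with the figure cited above, and the hypothetical `trigger_time` function simply reports when the first sufficiently loud segment begins.

```python
def trigger_time(segments, threshold):
    """Return the onset (ms) of the first segment at or above threshold.

    segments: list of (duration_ms, relative_amplitude) in temporal order.
    Returns None if no segment is loud enough to trip the key.
    """
    elapsed_ms = 0
    for duration_ms, amplitude in segments:
        if amplitude >= threshold:
            return elapsed_ms
        elapsed_ms += duration_ms
    return None


# Invented words: consonant followed by vowel. Amplitudes are arbitrary units.
voiceless_word = [(100, 0.2), (200, 1.0)]  # soft, long consonant
voiced_word = [(60, 0.4), (200, 1.0)]      # louder consonant, 40 ms shorter

gaps = {}
for threshold in (0.1, 0.3, 0.5):
    gaps[threshold] = (
        trigger_time(voiceless_word, threshold) - trigger_time(voiced_word, threshold)
    )
# gaps == {0.1: 0, 0.3: 100, 0.5: 40}
```

With a sensitive key (0.1) neither consonant is skipped and there is no bias. At an intermediate threshold (0.3) only the voiceless consonant is skipped, so the bias equals its full duration. At a high threshold (0.5) both consonants are skipped, and the remaining 40 ms bias reflects only the difference in their durations.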
In our poll, 5 people (8%) mentioned place of articulation as a dimension along which initial phonemes could differentially affect voice key response time. Of those five, only two respondents offered a specific hypothesis. One stated that labials have shorter response times than other consonants. The other stated more generally that anterior consonants are detected faster than posterior ones. The data conform to a slight variation of the anteriority hypothesis: velar > coronal (alveolar, postalveolar, or palatal) > labiodental > dental > labial (bilabial or labiodental), where “>” means that the symbol to the left has the greater measured response time. (A strict interpretation of the anteriority hypothesis would place labiodental sounds after dental sounds, because the former involve the lips.) In the KTM study, if we look at consonant pairs that differ only by place of articulation, 10 pairs like /k/ and /t/ agree with the hypothesis; none contradict it (10:0). The ratios for the other studies are TMBR 5:0, SB 7:2, and SW 12:0. Thus there are only two contradictions to the hypothesis in all four megastudies. This consistency suggests a real and pervasive bias that is not very sensitive to differences in experimental setups. For the plosives, this patterning correlates with the length of burst and aspiration noise: Velar plosives have the greatest burst and aspiration components, labial plosives have the least (Klatt, 1975). If voice keys tend to skip either or both of these components, that could explain why they tend to register velars most slowly and labials most quickly. For the fricatives, the relative ordering corresponds to the average duration of the phoneme; Kent and Read (1992) suggested that /ʃ/ and /s/ are longer than /f/, which is longer than /θ/. One may hypothesize that some voice keys have a threshold setting so high that they often skip over initial fricatives entirely. This is corroborated by the findings of Sakuma et al. (1997) and Rastle and Davis (2000), who found that typical voice keys, on average, were tripped more than 100 ms after the true beginning of an /s/-initial word; that length of time corresponds to the entire duration of the phoneme. A prominent exception in our megastudies is in SB, where /ʃ/ is detected faster than all other phonemes. That is explainable by the fact that /ʃ/ is the loudest of all obstruents, and that fricatives, as mentioned before, take the shortest time for speakers to initiate (Sakuma et al., 1997). The differences in the treatment of /ʃ/ between the studies could be due to the voice key being set to trip a little lower (or the microphone being a little more sensitive, or the participants sitting a little closer to the microphone or talking a little louder) in the SB study than in the other three megastudies.
Manner of articulation was suggested as a source of voice key bias by about half of the respondents in our poll, and it was also considered by some researchers in our analysis of published experiments. Twelve of the poll respondents (20%) suggested that plosives are detected faster and more reliably than other sounds. However, 4 people offered the opposite idea. A second hypothesis, offered by 16 respondents (27%), was that fricatives are detected later, and less reliably, than other sounds. The strictest way to investigate whether manner is a causal factor is to contrast pairs of phonemes that differ only in manner of articulation. In all four megastudies, /s/ is detected more slowly than /t/, and in three studies (KTM, SB, SW) we see a significant effect for /z/ being slower than /d/. That is, fricatives are detected more slowly than plosives, at least for the coronal place of articulation. These results are fairly meager because there are relatively few minimally contrasting pairs of phonemes one can check in English. A broader pattern can be discerned if we relax our criterion a bit and simply ask whether phonemes with different manners of articulation, as groups, differ in response times. For example, for affricates, we counted all pairs where an affricate phoneme was significantly slower than a nonaffricate phoneme, and divided that by the number of all pairs where an affricate is significantly slower or faster than a nonaffricate phoneme. Table 8 summarizes that computation across seven manners of articulation, and for all four studies. A fairly strong pattern here is that obstruents (plosives, affricates, and fricatives) are detected more slowly than sonorants (nasals, approximants, and vowels). The only exception is that in SB, affricates are detected quickly, like sonorants. These ratios corroborate the hypothesis that fricatives are detected very slowly, but contradict the idea that plosives are detected especially quickly.
| Manner | KTM | TMBR | SB | SW |
|---|---|---|---|---|
| Affricate | .82 | .79 | .19 | .84 |
| Fricative | .73 | .56 | .82 | .78 |
| Plosive | .65 | .64 | .67 | .56 |
| Nasal | .40 | .50 | .09 | .00 |
| Vowel | .13 | — | .12 | .32 |
| Glide | .07 | .03 | .26 | .20 |
| Liquid | .03 | .33 | .22 | .00 |
Note. Number of phoneme pairs where the phoneme with the indicated feature is significantly slower than the phoneme lacking the feature, divided by the number of phoneme pairs where exactly one of the phonemes has the indicated feature and there is a significant difference between their response times.
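The ratio defined in the note to Table 8 can be sketched as a count over the significant pairs. The function name and input format are assumptions for illustration, and the example uses only a handful of (slower, faster) entries read off the SB table (Table 6), not the full table, so the result is not expected to reproduce the published value exactly.

```python
def slower_ratio(significant_pairs, has_feature):
    """Share of feature-contrasting significant pairs in which the phoneme
    carrying the feature is the slower one.

    significant_pairs: iterable of (slower_phoneme, faster_phoneme).
    has_feature: function mapping a phoneme to True or False.
    """
    slower = faster = 0
    for slow, fast in significant_pairs:
        if has_feature(slow) and not has_feature(fast):
            slower += 1
        elif has_feature(fast) and not has_feature(slow):
            faster += 1
        # Pairs where both or neither phoneme has the feature are ignored.
    total = slower + faster
    return slower / total if total else None


AFFRICATES = {"tʃ", "dʒ"}
# A few (slower, faster) entries from the SB table; illustrative subset only.
sb_subset = [
    ("z", "tʃ"), ("f", "tʃ"), ("w", "tʃ"), ("z", "dʒ"),
    ("tʃ", "ʃ"),  # affricate slower than a nonaffricate
    ("z", "s"),   # neither phoneme is an affricate, so the pair is skipped
]
ratio = slower_ratio(sb_subset, lambda phoneme: phoneme in AFFRICATES)  # 0.2
```

The same counting scheme underlies the voicing comparison reported earlier, where 97 pairs against 12 yields 89%.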
Physical causes for these patterns have, to a large extent, already been mentioned. Some types of sounds begin with a period of silence corresponding to complete air blockage in the vocal tract (plosives, affricates, and often vowels), which naturally slows their detection relative to other sounds; at the other extreme, fricatives can be produced exceptionally quickly. Another consideration is that sound types differ in their loudness. Sonorants (especially vowels) are the loudest of all sounds; obstruents are less loud and so may be skipped over entirely.
The inconsistent status of affricates can be attributed to the intermediate status of postalveolar consonants in general (/ʃ/ as well as /tʃ/ and /dʒ/), which tend to be the loudest of the obstruents (Fry, 1979). It would appear that in SB alone the voice key was sensitive enough to detect them readily, and in the case of /ʃ/ this combined with the articulatory advantage to make it the most quickly detected of all phonemes. In the other studies, the apparatus was apparently not sensitive enough to detect even postalveolar obstruents, and so words beginning with them were detected slowly.
The results clearly show a response time measurement bias. The statistical tests showing that response time varies across phonemes even when covariables are factored out (Table 3) are buttressed by finer-grained evidence at the level of phoneme pairs (Tables 4–7). The differences between phonemes align along the well-known dimensions of voicing, place, and manner of articulation; it is not the case that every phoneme behaves in an idiosyncratic fashion. The patterns are explicable in terms of our prior knowledge of the articulation and acoustics of the phonemes. However, the details can vary widely across different phonemes and different studies. Thus, while the fricatives /s/ and /ʃ/ are slow to detect in three of the megastudies, /ʃ/ but not /s/ is extremely quickly detected in one of the four studies (SB). It also appears that there may be substantial variation beyond that found in our four megastudies, in that we found no evidence of a response time advantage for plosives, yet that was mentioned by several poll respondents and reported by Connine et al. (1990). There is also the issue that several different factors influence response time measurements simultaneously. One cannot assume that by treating, say, fricatives, differently, one can readily correct the bias problem. Rather, all three dimensions of articulatory features have an effect, so that, in this example, fricatives will differ among themselves depending on their voicing and place of articulation.
For the most part, it has been assumed that effects of phonemes on voice key response times are limited to the initial phoneme of the word. As we have mentioned, less than a quarter of the experiments we surveyed included any effort to control for the second phoneme. However, some recent research shows that second-position phonemes may need to be taken into consideration. Rastle and Davis (2000) reported that typical voice keys trigger about 10 ms later on words that begin with /s/ plus obstruent than on words that begin with /s/ plus vowel. The analysis presented here asks whether this is an isolated problem or whether it is the tip of an iceberg.
We performed the same analyses as with initial phonemes, but this time used the second phoneme of the word as the grouping variable and blocked by the first phoneme. That is, we asked whether words with different second phonemes had significantly different response times or covariable levels, after factoring out the effect of the first phoneme.
The results in Table 9 are similar to those for initial phonemes (Table 3), though a few details vary. A test of whether words with different second phonemes have different raw (unadjusted) response times is significant in all four studies at p < .001. The covariables, by and large, are less plausible as indirect causes than they were for initial phonemes. As the superscripts indicate, even those covariables that varied significantly with the second phoneme cannot individually be responsible for the differences in response time. When we conducted a multiple regression to factor out the contribution of all these covariables from the response time, we found that words with different second phonemes differed in their residual response times. This difference just reaches significance in SW, but is highly significant (p < .001) in the other three studies. Overall, we have strong evidence of a voice key bias caused by the second phoneme of the word.
| Response | KTM | TMBR | SB | SW |
|---|---|---|---|---|
| Raw RT | .000 | .000 | .000 | .000 |
| Bigram Frequency | .029b | .001b | .115a | .105b |
| Consistency of Onset | .000a | .800a | .000a | .000a |
| Consistency of Rime | .000a | .000a | .000a | .000a |
| Familiarity | .413a | .714a | .652a | .551a |
| Frequency of Spelling | .236b | .184b | .709a | .701a |
| Homophones | .000a | .000a | .000a | .000a |
| Length of Pronunciation | .000a | 1.000a | .000a | .000a |
| Length of Spelling | .000a | .000a | .000a | .000a |
| Neighborhood Size | .000a | .000a | .000a | .000a |
| Error Rate | .000a | .003a | — | .193a |
| Residual RT | .000 | .000 | .000 | .050 |
a Raw RT remains significant (p ≤ .05) when blocked by this covariable.
b Raw RT not significant when blocked by this covariable.
Tables 10 through 13 show figures for the individual phonemes and pairs of phonemes. Recall that the response time measurements are the average amount by which the residual response time exceeds the average residual response time of other second-position phonemes when they follow the same initial consonant. We ran these tests only for words that begin with a consonant. We omitted vowel-initial words from these analyses because relatively few monosyllabic words begin with vowels, leaving very little data with which to conduct significance tests. On phonetic principles, we would expect voice keys to be triggered near the beginning of vowel-initial words in any case, and therefore to be little affected by following phonemes.
| Phone | RT1 | N | ||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| j | 41.4 | 21 | i | u | ɪ | aʊ | aɪ | e | o | ɑ | ɔɪ | æ | ɔ | ɛ | ɚ | ʌ | ɹ | ʊ | l | |||||||
| i | 16.9 | 142 | ɪ | aʊ | aɪ | e | o | ɑ | ɔɪ | æ | ɔ | ɛ | ɚ | ʌ | ɹ | ʊ | l | |||||||||
| u | 16.9 | 94 | aɪ | e | o | ɑ | æ | ɔ | ɛ | ɚ | ʌ | ɹ | l | |||||||||||||
| p | 14.1 | 70 | aʊ | aɪ | o | æ | ɔ | ɛ | t | l | m | |||||||||||||||
| f | 12.7 | 2 | ||||||||||||||||||||||||
| ɪ | 10.8 | 225 | o | ɑ | æ | ɔ | ɛ | ɚ | ʌ | ɹ | ʊ | l | ||||||||||||||
| aʊ | 7.3 | 42 | k | n | ɔ | ʌ | ɹ | |||||||||||||||||||
| k | 6.4 | 74 | aɪ | o | ɔ | ɛ | t | l | m | |||||||||||||||||
| aɪ | 4.0 | 127 | n | ɑ | ɔ | t | ʌ | ɹ | ʊ | l | ||||||||||||||||
| e | 3.6 | 160 | ɑ | ʌ | ɹ | l | ||||||||||||||||||||
| n | 1.0 | 25 | ||||||||||||||||||||||||
| o | 0.1 | 132 | ɹ | l | ||||||||||||||||||||||
| ɑ | 0.1 | 140 | æ | ɹ | ||||||||||||||||||||||
| ɔɪ | -0.6 | 21 | l | |||||||||||||||||||||||
| æ | -2.2 | 186 | ʌ | l | ||||||||||||||||||||||
| ɔ | -2.4 | 149 | w | |||||||||||||||||||||||
| ɛ | -3.2 | 182 | ɹ | |||||||||||||||||||||||
| t | -3.9 | 107 | ||||||||||||||||||||||||
| ɚ | -4.5 | 77 | ||||||||||||||||||||||||
| ʌ | -5.3 | 180 | ɹ | |||||||||||||||||||||||
| ɹ | -6.8 | 336 | ||||||||||||||||||||||||
| ʊ | -7.2 | 32 | ||||||||||||||||||||||||
| l | -12.1 | 253 | ||||||||||||||||||||||||
| m | -14.8 | 18 | ||||||||||||||||||||||||
| w | -23.0 | 77 |
Note. Entries list the phonemes whose voice key response times are significantly faster (p ≤ .05) than that of the phoneme at the start of the row. The effect of the first phoneme is blocked.
¹ Response time residuals (in milliseconds) after the effect of covariables is removed by regression.
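As a rough illustration of what a regression residual is, the sketch below removes the linear effect of a single covariable from raw response times. This is a simplification under stated assumptions: the studies factored out nine covariables, whereas this example uses one; the function name `regression_residuals`, the choice of log frequency as the covariable, and all numeric values are hypothetical.

```python
def regression_residuals(x, y):
    """Residuals of y after a least-squares fit of y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed value minus the value predicted from the covariable.
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Invented values: one covariable (log frequency) and raw RTs in ms.
log_freq = [1.0, 2.0, 3.0, 4.0]
raw_rt = [560.0, 540.0, 530.0, 500.0]
residual_rt = regression_residuals(log_freq, raw_rt)
```

The residuals average zero by construction, so a positive residual marks a word that was detected later than the covariable alone would predict.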
| Phone | RT¹ | N | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| æ | 33.0 | 128 | i | aɪ | ɛ | ɑ | e | ɪ | u | ʌ | ɚ | o | ɔ | ɔɪ | ʊ | ||
| aʊ | 30.6 | 23 | ɑ | e | ɪ | u | ʌ | ɚ | o | ɔ | ɔɪ | ʊ | |||||
| i | 20.0 | 110 | ɪ | u | ʌ | ɚ | o | ɔ | ɔɪ | ʊ | |||||||
| aɪ | 10.7 | 91 | ʌ | o | ɔ | ɔɪ | |||||||||||
| ɛ | 5.3 | 95 | ɪ | ʌ | ɚ | o | ɔ | ɔɪ | ʊ | ||||||||
| ɑ | 4.5 | 76 | ɚ | o | ɔ | ɔɪ | ʊ | ||||||||||
| e | 1.1 | 123 | ɪ | ʌ | ɚ | o | ɔ | ɔɪ | ʊ | ||||||||
| b | 0.0 | 1 | |||||||||||||||
| ɪ | -6.5 | 140 | ɔɪ | ʊ | |||||||||||||
| u | -7.5 | 66 | ʊ | ||||||||||||||
| ʌ | -13.0 | 117 | |||||||||||||||
| ɚ | -15.7 | 61 | |||||||||||||||
| o | -19.0 | 88 | |||||||||||||||
| ɔ | -19.6 | 75 | |||||||||||||||
| ɔɪ | -31.9 | 11 | |||||||||||||||
| ʊ | -34.4 | 29 |
Note. See notes for Table 10.
| Phone | RT¹ | N | ||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| j | 17.6 | 18 | i | ʊ | u | aɪ | ɪ | ɔ | e | l | ɹ | aʊ | ɛ | ɑ | o | æ | ɚ | ʌ | ||||||||
| f | 12.2 | 2 | o | |||||||||||||||||||||||
| p | 8.3 | 53 | w | k | ɪ | ɔ | l | m | n | ɛ | t | o | ɚ | ʌ | ||||||||||||
| i | 7.5 | 123 | u | ɪ | ɔ | e | ɹ | aʊ | ɛ | ɑ | o | æ | ɚ | ʌ | ||||||||||||
| w | 6.9 | 58 | k | ɔ | l | ɹ | o | ʌ | ||||||||||||||||||
| ʊ | 6.0 | 29 | ɔ | e | ɹ | aʊ | ɛ | ɑ | o | æ | ɚ | ʌ | ||||||||||||||
| u | 4.2 | 76 | ɛ | ɑ | o | æ | ɚ | ʌ | ||||||||||||||||||
| aɪ | 4.0 | 114 | ɛ | ɑ | o | æ | ɚ | ʌ | ||||||||||||||||||
| k | 2.5 | 56 | ɔ | t | o | |||||||||||||||||||||
| ɪ | 2.3 | 193 | aʊ | ɛ | ɑ | o | æ | ɚ | ʌ | |||||||||||||||||
| ɔ | 1.2 | 130 | ɑ | æ | ʌ | |||||||||||||||||||||
| e | 1.2 | 137 | ɛ | ɑ | æ | ʌ | ||||||||||||||||||||
| l | 1.1 | 199 | ɹ | aʊ | ɑ | o | æ | ʌ | ||||||||||||||||||
| ɹ | 0.9 | 285 | ɑ | æ | ʌ | |||||||||||||||||||||
| m | 0.8 | 13 | o | |||||||||||||||||||||||
| n | -0.3 | 16 | o | |||||||||||||||||||||||
| aʊ | -0.3 | 39 | ||||||||||||||||||||||||
| ɔɪ | -0.8 | 19 | ||||||||||||||||||||||||
| ɛ | -2.1 | 164 | ||||||||||||||||||||||||
| ɑ | -2.9 | 117 | ||||||||||||||||||||||||
| t | -2.9 | 96 | ||||||||||||||||||||||||
| o | -3.3 | 118 | ʌ | |||||||||||||||||||||||
| æ | -3.7 | 167 | ||||||||||||||||||||||||
| ɚ | -3.8 | 68 | ||||||||||||||||||||||||
| ʌ | -4.4 | 162 |
Note. See notes for Table 10.
| Phone | RT¹ | N | ||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| j | 27.5 | 17 | i | ɪ | aʊ | e | ɑ | u | o | ɛ | æ | ʌ | ɔ | l | ɚ | ɔɪ | ɹ | w | ||||||||
| f | 24.8 | 2 | w | |||||||||||||||||||||||
| i | 14.8 | 123 | e | ɑ | u | o | ɛ | æ | ʌ | ɔ | l | ɚ | ɔɪ | ɹ | w | |||||||||||
| k | 14.2 | 56 | aɪ | e | ɑ | n | ɛ | æ | ɔ | l | p | w | m | |||||||||||||
| aɪ | 9.6 | 114 | t | ɑ | u | n | o | ɛ | æ | ʌ | ɔ | l | ɚ | ɹ | p | |||||||||||
| ɪ | 9.5 | 194 | o | ɛ | æ | ʌ | ɔ | ɚ | ɔɪ | ɹ | ||||||||||||||||
| ʊ | 8.8 | 29 | ʌ | ɔ | l | ɹ | w | |||||||||||||||||||
| t | 7.3 | 96 | ɑ | ɛ | æ | ɔ | p | w | m | |||||||||||||||||
| aʊ | 5.8 | 39 | ɔ | ɹ | w | |||||||||||||||||||||
| e | 5.4 | 138 | ɛ | ʌ | ɔ | ɔɪ | ɹ | |||||||||||||||||||
| ɑ | 4.6 | 116 | ɹ | |||||||||||||||||||||||
| u | 2.0 | 76 | ɹ | |||||||||||||||||||||||
| n | -0.4 | 16 | w | |||||||||||||||||||||||
| o | -0.5 | 121 | ɹ | |||||||||||||||||||||||
| ɛ | -0.6 | 163 | ɹ | |||||||||||||||||||||||
| æ | -2.7 | 168 | ɔ | ɹ | ||||||||||||||||||||||
| ʌ | -3.8 | 162 | ɹ | |||||||||||||||||||||||
| ɔ | -3.9 | 131 | ||||||||||||||||||||||||
| l | -4.3 | 199 | ɹ | |||||||||||||||||||||||
| ɚ | -4.8 | 69 | ||||||||||||||||||||||||
| ɔɪ | -6.5 | 19 |