1. Bilingual Speech Perception
The development of speech production and perception occurs within the constraints of the phonetic inventory and vernacular of the language being learned (
Anthony et al. 2009;
Birdsong 2018;
Fennell et al. 2016;
Ferjan Ramírez et al. 2017;
Flege and Eefting 1986;
Gonzales et al. 2019;
Grosjean 1996;
Luk and Bialystok 2013;
Smith et al. 2021;
Soto et al. 2019). Speech intelligibility occurs within the context of spoken language. Learning a second language (L2) involves learning processes that differ from those of monolingual acquisition. Aspects of language that do not immediately or easily transfer from the vernacular first language (L1) to the second language (L2) are referred to as language interference (
C.A. Brown 1998;
Cummins 2009;
Markov et al. 2022). Knowledge of common L2 word sequences (lexical and collocational knowledge) likely develops across the lifespan, whereas knowledge of grammar structures (morphosyntax) and individual speech sounds (phonetics) seems to have sensitive periods that close between six years of age and mid-adolescence (
Granena and Long 2012). These developmental learning differences (i.e., in lexical word knowledge, morphosyntax, and phonetics) between L1 and L2 can result in perception and production differences in L2 (
Anthony et al. 2009;
Birdsong 2018;
Fennell et al. 2016;
Ferjan Ramírez et al. 2017;
Flege and Eefting 1986;
Granena and Long 2012;
Grosjean and Miller 1994;
Li 1996;
Li and Yip 1998;
Luk and Bialystok 2013;
Smith et al. 2021). Native listeners perceive L2 production differences as a foreign accent. For example, the Spanish tap /ɾ/ is not present in English and is often difficult for English speakers to perceive as a distinct sound and to acquire. Likewise, English “th” sounds, /ð/ and /θ/, do not occur in Western hemisphere Spanish and are difficult for Spanish speakers to perceive as distinct sounds and to acquire (
Yavaş and Goldstein 2018). In such cases,
Goldstein and Bunta (
2011) explain that a speaker will likely replace these L2 sounds with the closest available approximation from L1 (e.g., for a Spanish speaker, /ðɪs/ → /dɪs/). Both positive and negative cross-linguistic influences may be observed in both simultaneous and sequential bilinguals (
Bialystok et al. 2009;
Fricke et al. 2019;
Navarro-Torres et al. 2021).
Birdsong (
2018) stated, “It is important to emphasize that, despite bilingualism effects, there are late L2 learners who resemble native monolinguals with respect to targeted aspects of the L2 (as opposed to bilinguals being indistinguishable from monolinguals in every measurable respect)” (p. 6). Bilingual speakers, fluent in their native language, can achieve native-like abilities in speech voice onset times (VOT), global pronunciation, morphology, and syntax in the second language (
Birdsong 2018).
McCarthy et al. (
2014) found that although bilingual children showed less consistent use of English words than their monolingual counterparts between 46 and 57 months of age, this difference had disappeared by the time the children began primary school, and both groups had comparable vocabulary sizes. Consistent with this finding, research has shown that sequential bilingual learners are similar to monolingual learners in terms of overall language skill acquisition (
Castilla et al. 2009) and that neither simultaneous nor sequential bilingualism causes speech-language disorders (
Korkman et al. 2012;
Marinis and Chondrogianni 2011).
Other aspects of bilingualism include when L2 was learned (i.e., early vs. late learners) and how L2 was learned (i.e., simultaneous vs. sequential bilingual speakers). Research indicates that early and late bilinguals, as well as simultaneous and sequential bilinguals, are all capable of native-like proficiency, receptively and expressively (
Bak et al. 2014;
Birdsong 2018;
MacLeod and Stoel-Gammon 2005;
McCarthy et al. 2014). This indicates that second language attainment is, generally speaking, not necessarily inferior to first language ability. However, bilinguals and monolinguals are not identical in how they process and produce language.
Kupisch et al. (
2014) also assert that native-like proficiency is most common in bilinguals who use L2 on a daily basis over a sustained period of time.
Herschensohn (
2022) stated that both simultaneous and sequential learners are primarily influenced by internal and external factors affecting L2 acquisition.
2. Second Language Proficiency
Two estimates of length of exposure to L2 are age of acquisition (AoA) and length of residence (LoR). AoA refers to the age at which L2 is first encountered. For example, an individual may arrive in the United States at three years of age yet not be fully exposed to English until entering school at five years of age. In this situation, age of arrival would not serve as an adequate indicator, and AoA would give a more accurate estimate of the length of exposure. LoR refers to the amount of time spent in contact with L2. AoA and LoR are used to make distinctions regarding the amount of L2 exposure (i.e., early, middle, and late bilingualism). While estimates vary somewhat within the literature,
Brice et al. (
2013) defined early bilingualism as L2 onset between birth and 8 years of age, middle bilingualism as onset between 9 and 15 years of age, and late bilingualism as onset at 16 years of age or later. The study of speech perception and production through the lens of AoA and LoR can offer observable data on how individuals from the three bilingual age groups compare to monolinguals.
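The age bands above can be written as a simple lookup. The following is a minimal sketch, not a procedure taken from Brice et al. (2013): the function name is hypothetical, and it assumes that AoA is counted in completed years, so that an onset at 8 years and 11 months still falls in the early group.

```python
def classify_bilingual_group(aoa_years: float) -> str:
    """Map age of acquisition (AoA) to the early/middle/late bands described above.

    Assumption: AoA is truncated to completed years before banding.
    """
    if aoa_years < 0:
        raise ValueError("age of acquisition cannot be negative")
    completed_years = int(aoa_years)  # e.g., 8.9 -> 8 (assumption, see lead-in)
    if completed_years <= 8:
        return "early"
    if completed_years <= 15:
        return "middle"
    return "late"


# A child who arrives at age 3 but is first fully exposed to English at age 5
# is classified by the age of acquisition (5), not the age of arrival (3).
group = classify_bilingual_group(5)
```

Classifying by AoA rather than age of arrival follows the distinction drawn in the text; the banding itself would need to be adjusted for studies that use different cut-offs.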
Granena and Long (
2012) offer a detailed picture of how both AoA and LoR affect different aspects of language learning. For example, a battery of language mastery tests completed by those who acquired L2 during the upper limit of the sensitive period for phonetic skills (12 years of age) revealed a pattern of much greater proficiency with morphosyntax than with producing and identifying individual speech sounds (
Granena and Long 2012). Thus, length of time speaking L2 (LoR) may have more of an impact on the ease with which an individual can converse in L2, whereas early exposure (AoA) may have more of an impact on how individual speech sounds are processed (
Granena and Long 2012).
The findings of
De Carli et al. (
2015) corroborate those of
Granena and Long (
2012), who found that consistent and sustained practice of L2 was more important than AoA for achievement of high proficiency in conversational L2. Together, these findings indicate a possibility that, although bilinguals can become highly proficient in the recognition and use of L2 word sequences and grammar structures, there may be fundamental differences from monolingual/native speakers in terms of how phonotactic elements of speech are processed during speech perception. For example,
Sebastián-Gallés et al. (
2005) employed a gating task to examine how well Spanish/Catalan bilinguals recognized non-word phonemic contrasts. All stimuli consisted of disyllabic non-words with phonemic contrasts that were common to Catalan but nonexistent in Spanish. They found that participants with Spanish as their L1 required larger portions of the stimuli to make accurate identifications when compared to participants with Catalan as their L1.
Sebastián-Gallés et al. (
2005) also found that the Spanish L1 speakers failed to differentiate /e/-/ɛ/ vowel contrasts.
These findings suggest that L1 impacts the perception of non-native phonemic contrasts, even when exposure to the second language is early and extensive. For example, a monolingual English speaker may be unable to perceive a Spanish tap or flap /ɾ/ as a distinct sound and may hear it as /d/ instead (
Brice et al. 2009). In a gating study with 30 monolingual and 30 Spanish-English speakers (early bilinguals, 0–8 years of age; middle bilinguals, 9–15 years of age, and late bilinguals, 16 plus years of age),
Brice and Brice (
2008) found that all the adult bilingual speakers identified words with initial voiced consonants faster than words with initial voiceless consonants. In addition, English CV words that contained tense vowels were identified faster than words with lax vowels. Spanish does not have lax vowels; therefore, the perception of English lax vowels will be affected (
Kondaurova and Francis 2010;
Smith and Hayes-Harb 2016). Consequently, sounds that are more common in L1 (e.g., tense vowels) can have lasting effects on L2 perception (e.g., lax vowels and slower speech perception) even into adulthood.
3. Code-Switching and Code-Mixing
It is common for bilingual speakers to use more than one language in their daily interactions. This use of both languages may manifest as code-switching or code-mixing. Other terms are used to describe this bilingual phenomenon (e.g., language alternations, use of translanguages, inter-sentential alterations, and intra-sentential alterations); however, we will use the terms code-switching and code-mixing. Code-switching refers to the use of both languages within a single discourse (
Li 1996;
Li and Yip 1998), usually across sentence boundaries. For example, a German-English bilingual speaker might say, “Hello there!
Wie geht’s? (How are you?)”. Code-mixing is the use of two languages within a single sentence (
Martin et al. 2003). For example, “I like that
kleid (dress)”, or
“Andale pues (okay, swell), and do come again, mm?” (
Gumperz 1982).
Gumperz (
1982) offers another example in Spanish, taken from conversation: “She doesn’t speak English, so,
dice que la regañan: ‘Si se les va olvidar el idioma a las criaturas’ (she says that they would scold her: ‘the children are surely going to forget their language’)” (p. 76). While code-mixing is more syntactically complex than code-switching, both are common behaviors among bilingual speakers (
Brice and Brice 2008;
Brice et al. 2021;
Grosjean 1996,
2001;
Grosjean and Miller 1994;
Heredia and Altarriba 2001;
Li 1996;
Li and Yip 1998). Most bilinguals voluntarily engage in purposeful code-mixing in natural conversation (
Gollan and Ferreira 2009), and researchers have found that bilinguals readily use the speech sound inventories of both L1 and L2 when code-mixing (
Genesee 2015;
Grosjean 1996,
2001;
Grosjean and Miller 1994). Code-mixing is not observed to be more taxing than choosing one language over the other and may in fact be easier than restricting speech to a single language (
Gollan and Ferreira 2009), even though bilinguals may utilize more cognitive resources to choose which language to use when code-mixing as opposed to when speaking in a single language (
Flege 1995;
Li 1996). Most research examining code-mixing has focused on the ability of bilingual individuals to produce speech sounds in L1 and L2 (
Genesee 2015;
Gollan and Ferreira 2009;
Grosjean 1996,
2001;
Grosjean and Miller 1994;
Heredia and Altarriba 2001;
Piccinini and Arvaniti 2015;
Thornburgh and Ryalls 1998). The limited literature on bilinguals has typically addressed speech production and focused significantly less on speech perception (
López 2012;
Sebastián-Gallés et al. 2005;
Edwards and Zampini 2008).
García-Sierra et al. (
2012) stated, “However, little is known about the way speech perception in bilinguals is affected by the language they are using at the moment” (p. 194).
The phonetic frequency of such sounds may have influenced the results of
Brice et al. (
2013). To give an example, high-frequency sounds are those that occur often in a language and are thus more easily recognizable (
Metsala 1997). Thus, it is possible that the voiceless consonants had a high frequency of occurrence and were more easily recognized by participants (
Brice et al. 2013;
Geiss et al. 2022;
Keating 1980). Additionally, the Spanish speakers in the
Brice et al. (
2013) study were all early bilinguals and may have had more exposure to English than to Spanish, whereas the participants in the
Brice and Brice (
2008) study included early, middle, and late bilinguals. Hence, bilinguals' perception of speech sounds may change as a function of their age at first exposure to English and their length of English exposure. Consequently, it is important to delineate AoA carefully.
In contrast, monolinguals need only deal with the functional load (i.e., the importance of certain features that assist in making language distinctions), phonetic frequencies, and word frequencies in one language. Functional load is not to be confused with cognitive load. Whereas cognitive load refers to mental effort (
Sweller 1988), functional load refers to the speech sound’s distinctiveness in the utterance context (
A. Brown 1988). A phonetic feature with a high functional load is one that contributes greatly to making a word understandable. One method of investigating the way in which high-frequency sounds, functional loads, and cognitive loads contribute to an individual’s perception of speech is through the use of gating.
4. Gating
Developed by
Grosjean (
1988), gating is a procedure in which an auditory stimulus is presented to a listener in segments of increasing duration, with each segment longer than the last by an equal increment of time. The sound segments that a listener hears are referred to as “gates.” The stimuli may range from single words to longer phrases and/or sentences. This study is concerned with gating single words, as in previous gating studies (
Brice et al. 2021;
Li 1996;
Sebastián-Gallés et al. 2005). Gating is useful because it quantifies the extent of phonetic information that a listener needs in order to recognize and identify words (
Li 1996).
Gating involves parsing an auditory stimulus into segments before it is presented to a listener. Gates are typically 50–70 ms in duration (
Boudelaa 2018;
Brice et al. 2013;
Grosjean 1996;
Li 1996;
Mainela-Arnold et al. 2008). With 70 ms gates, for example, the listener first hears 70 ms of a word, then 140 ms of the word, and so on until either the listener correctly identifies the word or the end of the trial is reached. The outcome measures of interest are labeled the “isolation point” and the “recognition point” (
Grosjean 1996). The isolation point is defined as the portion of the stimulus (e.g., 4 of 9 gates) needed for participants to make a correct identification of the target word. The recognition point in this study also refers to the portion of the stimulus needed for participants to recognize the word; however, the participants must give two consecutive correct identifications with 100% certainty. This determines the accuracy of their identification.
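The gating schedule and the two outcome measures described above can be illustrated with a short sketch. It is not the scoring procedure of any cited study: the function names and the (response, confidence) data layout are hypothetical, the isolation point is simplified to the first gate with a correct response, and the recognition point is reported as the first of the two consecutive correct, fully confident gates (reporting the second gate would be an equally defensible convention).

```python
from typing import List, Optional, Tuple

Response = Tuple[str, int]  # (word reported by the listener, confidence in percent)


def gate_durations(word_ms: int, gate_ms: int = 70) -> List[int]:
    """Cumulative durations heard at each gate: 70, 140, ... up to the full word."""
    if word_ms <= 0 or gate_ms <= 0:
        raise ValueError("durations must be positive")
    durations = list(range(gate_ms, word_ms, gate_ms))
    durations.append(word_ms)  # the final gate always presents the whole word
    return durations


def isolation_point(responses: List[Response], target: str) -> Optional[int]:
    """1-based index of the first gate with a correct identification, else None."""
    for gate, (word, _confidence) in enumerate(responses, start=1):
        if word == target:
            return gate
    return None


def recognition_point(responses: List[Response], target: str) -> Optional[int]:
    """1-based index of the first of two consecutive correct gates at 100% certainty."""
    for i in range(len(responses) - 1):
        pair = responses[i:i + 2]
        if all(word == target and confidence == 100 for word, confidence in pair):
            return i + 1
    return None


# Hypothetical trial: the listener first guesses "cap", then "cat" with rising certainty.
trial = [("cap", 20), ("cat", 40), ("cat", 100), ("cat", 100)]
gates = gate_durations(630)              # nine gates for a 630 ms word
isolated = isolation_point(trial, "cat")
recognized = recognition_point(trial, "cat")
```

Expressing either point as a proportion of the stimulus (e.g., 4 of 9 gates) only requires dividing the returned index by the number of gates.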
5. Neuroimaging Investigations of Bilingual Language Perception and Production
Monolingual and bilingual speech and language processing seems to require both comparable and dissimilar brain areas.
Abutalebi et al. (
2001) presented evidence that, in some instances, second language (L2) processing occurs in the same dedicated language areas as first language (L1). Questions have also been raised regarding possible bilingual-monolingual differences in the recruitment of other brain areas, such as the prefrontal cortex, for language processing, particularly when bilingual individuals switch between multiple languages as with code-mixing or code-switching (
Abutalebi et al. 2001). In a later review of fMRI and PET studies,
Abutalebi (
2007) explored evidence for such differences, finding that many studies supported the notion of increased brain activity in areas also associated with L1 language processing when bilinguals engaged in both code-switching and code-mixing. There is also a wealth of evidence for the recruitment of additional areas by bilingual language users when engaging in code-mixing and code-switching; specifically, the left prefrontal cortex (Brodmann areas 8, 9, 10, 11, 12, 13, 44, 45, 46, and 47), anterior cingulate cortex (ACC; Brodmann areas 24, 32, 33), and basal ganglia (
Abutalebi 2007;
Hernandez et al. 2000;
Jasinska and Petitto 2013;
Kovelman et al. 2008). Some researchers have begun investigating bilingualism and brain localization of function utilizing imaging technology, particularly fNIRS technology (
Jasinska and Petitto 2013;
Kovelman et al. 2008,
2014;
Zinszer et al. 2015). Many researchers now do not believe Broca’s area to be restricted in function to only speech production, nor Wernicke’s area only to speech comprehension; it is more likely that all of the aforementioned language centers of the brain are involved both in comprehension and production of speech (
Fadiga et al. 2009;
Hagoort 2014).
Hagoort (
2014) provides an overview of an emerging, dynamic view of speech perception and production, in which speech production and comprehension act as shared networks among frontal, temporal, and parietal regions. It is also likely that more areas than just these dedicated speech centers are used for language. For example,
Hagoort (
2014) states that memory processes have been implicated in language processing, as individuals access a mental lexicon of speech sounds and whole words both when listening and when speaking.
Exploring differences not only between monolinguals and bilinguals but among bilinguals themselves,
Hernandez (
2009) investigated differences in levels of neural activation between low-proficiency and high-proficiency bilinguals. When given a picture-naming task, bilinguals showed increased dorsolateral prefrontal cortex (DLPFC; Brodmann areas 9 and 46) activation when switching between languages as opposed to naming pictures in a single language, regardless of proficiency level (i.e., low or high). Activation was also noted in the hippocampus (i.e., memory) and amygdala (i.e., somatosensory processing) in both code-mixed and single-language conditions (
Hernandez 2009). Hernandez speculates that somatosensory processing is involved due to emotions and cognition salient to the words in each language for the participants. The superior parietal lobe is also involved in somatosensory processing, attention, and visual-spatial perception (
Wilkinson 1992). It was unclear, however, if the somatosensory processing activation is unique to bilinguals or if it is shared with monolinguals.
Although
Hernandez’s (
2009) work addressed only early and late bilinguals, evidence from
Archila-Suerte et al. (
2015) supports the idea that neural processing both overlaps and differs between bilinguals and monolinguals and between early and late bilinguals. In an fMRI study of brain activity during both speech production and speech perception tasks, monolinguals, early bilinguals (those with an AoA of less than 9 years), and middle bilinguals (those with an AoA of more than 10 years) performed similarly on speech production and perception in L2 (L1 for monolinguals).
However, the neural processing of speech sounds differed across the groups. Early bilinguals showed greater engagement of prefrontal regions involved in working memory compared to monolinguals, while middle bilinguals showed greater activation in the inferior parietal lobule compared to both early bilinguals and monolinguals (
Archila-Suerte et al. 2015).
Similarly,
Perani et al. (
1998) found evidence for differences in activation in both left and right temporal and hippocampal regions between high and low proficiency groups in a PET investigation of performance on a task involving comprehension of an entire story. Low-proficiency bilinguals showed lower activation than high-proficiency bilinguals, though both bilingual groups displayed greater activity (more blood flow) located in areas also associated with L1 (
Perani et al. 1998). Together, this evidence suggests that although the general localization of neural speech processing may be common to monolingual and bilingual groups, the levels of activation and the neural regions recruited may differ. This supports the notion that perception and underlying language comprehension can be comparable yet also vary between bilingual and monolingual individuals.
6. Functional Near-Infrared Spectroscopy (fNIRS)
Functional near-infrared spectroscopy (fNIRS), a technique that uses near-infrared light to examine the hemodynamic response at a shallow brain depth (approximately 3 cm), is an emerging method of investigating levels of neural activation. The regions of the brain that can be examined via fNIRS include the lateral prefrontal cortex (LPFC) and the medial prefrontal cortex (MPFC). Since a delay occurs between response to a stimulus and peak oxygenation readout on an fNIRS device (
Tak and Ye 2014), most fNIRS researchers (
Kovelman et al. 2008;
Minagawa-Kawai et al. 2007;
Zinszer et al. 2015) have employed block designs to capture data. These designs measure peak oxygenation levels during blocks of time starting approximately 5 s after the initial presentation of target stimuli and compare them against a baseline of oxygenation data.
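As a rough illustration of the block-design measure described above, the sketch below takes the peak oxygenation value within a block that starts 5 s after stimulus onset and subtracts a baseline. It does not reproduce the analysis pipeline of any cited study: the function name, the use of a pre-stimulus mean as the baseline, and the default window lengths are all assumptions made for illustration.

```python
from typing import Sequence


def block_peak_change(
    samples: Sequence[float],
    sample_rate_hz: float,
    onset_s: float,
    block_s: float,
    delay_s: float = 5.0,      # hemodynamic delay before the block begins
    baseline_s: float = 5.0,   # assumed length of the pre-stimulus baseline window
) -> float:
    """Peak oxygenation in the delayed block minus the mean pre-stimulus baseline."""
    onset = int(round(onset_s * sample_rate_hz))
    baseline_start = max(0, onset - int(round(baseline_s * sample_rate_hz)))
    baseline = samples[baseline_start:onset]

    block_start = onset + int(round(delay_s * sample_rate_hz))
    block_stop = block_start + int(round(block_s * sample_rate_hz))
    block = samples[block_start:block_stop]

    if len(baseline) == 0 or len(block) == 0:
        raise ValueError("recording too short for the requested windows")
    return max(block) - sum(baseline) / len(baseline)


# Hypothetical 1 Hz recording: flat signal, stimulus at 10 s, response peaking at 17 s.
signal = [1.0] * 15 + [1.5, 2.0, 3.0, 2.0, 1.5]
change = block_peak_change(signal, sample_rate_hz=1.0, onset_s=10.0, block_s=5.0)
```

Real fNIRS recordings would normally be filtered and converted to oxygenated and deoxygenated hemoglobin concentrations before a comparison of this kind; those steps are omitted here.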
This entry is adapted from the peer-reviewed paper 10.3390/languages8030216