Monogenic causes of language disorders remain comparatively rare and do not fully account for the >7% prevalence of developmental language disorders (DLDs). It is widely accepted that common risk variants confer a genetic susceptibility for DLDs. Under this ‘complex genetic model’, each variant contributes incrementally to an overall level of risk of developing a language disorder. Studies to identify these risk variants fall into two main approaches: linkage studies and genome-wide association studies (GWASs).
1. Linkage Analyses
Genetic linkage studies, which identify regions of the genome shared between affected individuals, were the linchpin of neurodevelopmental genetics in the 2000s. As the principal method to detect common variants, they generated many important findings that helped to elucidate the genetics of complex disease. Linkage studies were particularly well suited to detecting common variants with a moderate or large effect size that are present in more than 10% of the population. The Specific Language Impairment Consortium (SLIC) found regions strongly associated with SLI on 16q24 (SLI1) and 19q13 (SLI2). Fine-mapping of these regions implicated two specific genes: c-Maf inducing protein (CMIP) and calcium-transporting ATPase type 2C member 2 (ATP2C2). Both CMIP and ATP2C2 were found to contain common risk variants with a moderate effect size, and both have additional support from the identification of monogenic cases of language disorder (see Table 1). Newbury et al. (2009) discovered that CMIP was associated with language, reading and spelling in both the SLI cohort and the general population. This may suggest that CMIP contributes to phonological skills as part of language ability more generally. In contrast, ATP2C2 was associated with phonological memory in the SLI cohort, but in the general population showed association only within the language-impaired group, suggesting a possible role specifically in individuals with language disorders. Recently, Martinelli et al. (2021) characterised the functional effects of a rare variant in ATP2C2 and its role in language disorders.
Table 1. Summary of findings from genome-wide association studies of language disorders.
|Study|Sample|Cohort type|Associated chromosomes|
|---|---|---|---|
|Luciano et al. (2013)||||
|Eicher et al. (2013)||Selected reading and language impaired|3, 4, 13|
|Nudel et al. (2014)||Selected (parent of origin)||
|St Pourcain et al. (2014)||||
|Gialluisi et al. (2014)||Selected reading and language impaired||
|Harlaar et al. (2014)||||
|Kornilov et al. (2016)||||
|Eising et al. (2021)||Selected and population; meta-analysis using 19 cohorts||
|Doust et al. (2021)|51,800 dyslexia cases, 1,087,070 controls|Selected and population; meta-analysis using binary case/control self-reported measure of dyslexia|1, 2, 3, 6, 7, 11, 17, X|
Bartlett et al. (2002) showed a strong association to chromosomal region 13q21 (SLI3) on reading-specific measures, and more modest associations to regions 2p22 and 17q23 using more general measures of delayed language. They utilised a family-based linkage approach in five large Canadian families in which several family members had a diagnosis of SLI. Evans et al. (2015) studied 147 pairs of siblings, where at least one sibling had an SLI diagnosis, and reported two regions associated with phonological memory: 10q23.33 and 13q33.3.
In many of these examples, the exact genes and contributory variants remain unelucidated. This is a particularly difficult issue with linkage studies, which tend to identify very large regions containing many hundreds of genes, making fine mapping difficult.
As cohort sizes have increased and population-level genomic data have become more available, the linkage study has been replaced by the genome-wide association study, except in large families or highly related populations. For example, Andres et al. (2019) used linkage to identify a region on chromosome 2q associated with SLI in fourteen consanguineous Pakistani families, totalling 156 individuals. In other settings, GWASs provide a much higher resolution of variants and allow more efficient fine mapping of contributory variants.
Linkage studies are difficult to replicate in other populations, and there is rarely overlap between studies.
2. Genome-Wide Association Studies
The current methodology for detecting genomic variants associated with a complex condition is the genome-wide association study (GWAS). A GWAS uses advances in genetic marker technology to simultaneously assess more than 4 million sites of known common variation across the entire genome, providing higher resolution than linkage. In addition to covering more variants, studies now include tens or hundreds of thousands of individuals. A pivotal study in the broader field of psychiatric genetics identified more than 100 regions associated with schizophrenia using 37,000 cases and 113,000 unaffected controls. Studies focused on DLD and language-related phenotypes are only very recently beginning to achieve sample sizes on this scale (Table 1). A number of GWASs have been performed on SLI/DLD and related phenotypes, and the genetic regions identified are summarised in Table 1. As with the linkage studies described in the previous section, there is little consistency between studies in the genomic regions found to be associated. First, this can be partially explained by differences in phenotyping between studies, exacerbated by the lack of robust consensus criteria for diagnosing DLDs. Second, the genetic aetiology of DLDs likely involves many contributing variants across many different genes (and environmental factors), each of small effect size.
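To make the per-variant logic of a GWAS concrete, the sketch below runs a basic allelic chi-square test on a single variant and compares the result with the conventional genome-wide significance threshold of 5 × 10⁻⁸. This is an illustrative assumption rather than the method of any study cited here: published GWASs typically use regression models with covariates, and the function name `allelic_test` and the allele counts are hypothetical.

```python
import math

# Conventional genome-wide significance threshold (an assumption here;
# it is not stated in the text above).
GENOME_WIDE_ALPHA = 5e-8


def allelic_test(case_risk, case_other, control_risk, control_other):
    """Chi-square test (1 df) on a 2x2 table of allele counts.

    Rows are cases/controls, columns are risk/other allele counts.
    Returns (chi2, p_value).
    """
    n = case_risk + case_other + control_risk + control_other
    row_case = case_risk + case_other
    row_control = control_risk + control_other
    col_risk = case_risk + control_risk
    col_other = case_other + control_other
    denom = row_case * row_control * col_risk * col_other
    if denom == 0:
        raise ValueError("every row and column must contain at least one allele")
    chi2 = n * (case_risk * control_other - case_other * control_risk) ** 2 / denom
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical allele counts for one variant.
chi2, p = allelic_test(60, 40, 40, 60)
print(p < GENOME_WIDE_ALPHA)  # prints False: nominally significant only
```

With millions of variants tested, a result like this one (p of roughly 0.005) falls far short of the genome-wide threshold, which is one reason sample sizes in the tens of thousands are needed to detect variants of small effect.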
Very recently, two research studies have presented meta-analyses in which multiple GWAS cohorts are combined into one large study. Meta-GWAS is a method whereby the GWAS summary statistics from more modestly sized cohorts are pooled to increase power, and it is a cost-efficient means of gene identification. Eising et al. (2021) utilised 22 different cohorts and five measures: word reading, non-word reading, spelling, phoneme awareness and non-word spelling. They found that word reading was associated with a variant (rs11208009) in a subset of 19 cohorts comprising 33,959 individuals. The variant lies outside of a genic region but is located near to (and in linkage disequilibrium with) three potential candidate genes, including DOCK7 and USP1. They went on to show that both reading and language traits have a genetic basis that is largely separate from that of Performance IQ.
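The pooling of summary statistics described above can be sketched as an inverse-variance-weighted fixed-effect meta-analysis for a single variant. This is one common approach and is offered only as an illustration; the text does not state which meta-analysis model the cited studies used. The function name `fixed_effect_meta` and the effect sizes and standard errors below are hypothetical.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance-weighted fixed-effect meta-analysis for one variant.

    betas: per-cohort effect estimates for the same allele.
    standard_errors: per-cohort standard errors of those estimates.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if not betas or len(betas) != len(standard_errors):
        raise ValueError("need one standard error per effect estimate")
    # Each cohort is weighted by the inverse of its sampling variance,
    # so larger (more precise) cohorts contribute more.
    weights = [1.0 / (se * se) for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


# Two hypothetical cohorts reporting the same effect with the same precision.
beta, se, z, p = fixed_effect_meta([0.10, 0.10], [0.05, 0.05])
print(beta, se, z, p)
```

In this example the pooled standard error is smaller than either cohort's own, so the same effect size yields a smaller p-value. That gain in precision is the sense in which pooling modest cohorts increases power.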
In an even larger study, Doust et al. (2021) utilised population and genetic data from 23andMe, totalling 51,800 adults who self-reported that they had dyslexia and over a million controls without dyslexia. The authors identified 42 individual genomic regions associated with a diagnosis of dyslexia. Of these 42, 17 had previously been reported as associated with either educational attainment or cognitive ability, and 25 were novel. A total of 12 of the 25 novel regions went on to be independently replicated in separate cohorts. It is important to note the trade-off in these studies between sample size and phenotyping information, which here consisted of a self-reported answer to a yes/no question.
Both Eising et al. (2021) and Doust et al. (2021) represent substantial leaps forward in understanding the genetic contribution to DLDs, and sample sizes are now large enough to detect some of the missing heritability of language disorders. As the list of candidate genes grows, so does our knowledge of the biology of DLD risk. Polygenic profiles, in which an overall risk score is generated for each individual by combining the risk alleles associated with a phenotype, can capture the genetic differences and similarities between neurodevelopmental disorders. Early studies suggest that this is a promising area of research. Shared genetic effects have been shown to exist between cognitive ability, educational attainment, language development and psychosocial outcomes; however, this pilot study was based on very small sample sizes. As polygenic profiles are updated using summary statistics from increasingly large GWASs, they become more sensitive and specific, improving inference accuracy. For example, the first profiling of educational attainment explained 2% of variance, while more recent scores explain 13%. Polygenic methods are being developed in language disorders and dyslexia. Polygenic risk scores of clinical conditions indicate that polygenic profiles can be informative for the extremes of the population (who carry a high burden of risk or protective variants). So even if they are not useful for capturing individual variation in the middle of the distribution, at the extremes they can be clinically meaningful.
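As an illustration of how a polygenic profile can be computed, the sketch below sums an individual's risk-allele counts, each weighted by a per-variant effect size taken from GWAS summary statistics. It is a minimal sketch under stated assumptions: the variant identifiers, weights and genotypes are hypothetical, the function name `polygenic_score` is not from any cited study, and real pipelines add steps (such as handling linkage disequilibrium and missing genotypes) that are omitted here.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele counts for one individual.

    dosages: dict mapping variant ID -> number of risk alleles (0, 1 or 2).
    weights: dict mapping variant ID -> per-allele effect size from a GWAS.
    Variants without a genotype are skipped (a simplifying assumption).
    """
    score = 0.0
    for variant, weight in weights.items():
        if variant in dosages:
            score += weight * dosages[variant]
    return score


# Hypothetical effect sizes and genotypes; not taken from any cited study.
example_weights = {"variant_a": 0.02, "variant_b": -0.01, "variant_c": 0.03}
example_person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}
print(polygenic_score(example_person, example_weights))
```

Scores computed this way are usually interpreted relative to the distribution in a reference population, which is why they are most informative for individuals in the tails.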