The MADS-box gene family encodes transcription factors that play key roles in plant growth and development, from responses to environmental cues to cell differentiation and organ identity, most notably floral organogenesis, as described by the prominent ABCDE model of flower development. The genome of American beautyberry (Callicarpa americana) was recently sequenced. It is a shrub native to the southern United States that bears edible purple berries, and it belongs to the Lamiaceae, a family of medicinal and agricultural importance. Seventy-eight MADS-box genes were identified on 17 chromosomes of the assembled C. americana genome. Peptide sequence BLAST searches and phylogenetic analyses were performed against the MADS-box genes of Sesamum indicum, Solanum lycopersicum, Arabidopsis thaliana, and Amborella trichopoda. The genes were divided into 32 type I and 46 type II MADS-box genes and clustered into four groups: MIKCC, MIKC*, Mα-type, and Mγ-type; the Mβ-type group was absent. Gene structure analysis revealed that C. americana MADS-box genes contain from 1 to 15 exons, with type II genes containing more exons (5–15) than type I genes (1–9). Motif distribution analysis showed that type II genes also contained more motifs than type I genes. These results suggest that type II C. americana MADS-box genes have more complex structures and might have more diverse functions. Expression profiling across the transcriptomes of different organs highlighted the role of MIKC-type MADS-box genes in flower and fruit development. This study is the first genome-wide analysis of the C. americana MADS-box gene family; its results will support functional and evolutionary studies of C. americana MADS-box genes and serve as a reference for related studies of other plants in the medicinally important Lamiaceae family.
1. Introduction
Mints (Lamiaceae) are the sixth largest family of flowering plants and include many ornamental, medicinal, and edible species, such as basil, rosemary, thyme, peppermint, and spearmint. Full genome and transcriptome sequencing data, available from the Mints Genome Project database (http://mints.plantbiology.msu.edu/index.html; accessed on 1 July 2021) and from other independent projects, are enhancing our understanding of this important medicinal plant family. American beautyberry (Callicarpa americana) is known for its prominent purple fruit, and Native Americans have reportedly used it as an insect repellent and medicinal plant [1]. Several terpenoids isolated from the plant, such as spathulenol, intermedeol, and callicarpenal, have proved effective as mosquito repellents in laboratory experiments [2,3]. Callicarpa represents an early-diverging mint lineage and thus occupies an important phylogenetic position for studying the evolution of key gene families, such as the MADS-box genes. The full genome sequence of C. americana was recently published [4], providing the opportunity to conduct a comprehensive analysis of the C. americana MADS-box gene family, whose identity and function have not yet been reported in detail.
The MADS-box transcription factor family is found in almost all eukaryotes, from protists to animals. In plants, it has been studied extensively because of its major role in organ identity and cell differentiation, from root and flower development to fruit ripening [5,6,7]. Understanding the genes that regulate flower, root, and fruit development is important both for fundamental science and economically. The MADS-box gene family also contributes to plant developmental plasticity and to responses to abiotic stresses such as drought, salinity, extreme temperatures, and nutrient deficiency [8,9]. The acronym MADS is derived from the first letters of its founding members: mini chromosome maintenance 1 (MCM1) of yeast (Saccharomyces cerevisiae), agamous (AG) of Arabidopsis thaliana, deficiens (DEF) of snapdragon (Antirrhinum majus L.), and serum response factor (SRF) of humans [10]. All MADS-box proteins are characterized by a DNA-binding domain about 60 amino acids long, known as the MADS-box domain (M-domain), located in the N-terminal region of the protein. Floral organ development is controlled by major groups of MADS-box genes, as described by the ABCDE model of flower development. In this model, tetramers of proteins from different subgroups determine organ identity: sepal development is directed by A genes, petal development requires A and B genes, stamen development is determined by B and C genes, and carpel development is determined by C genes. The D-function genes are needed for ovule development [11,12,13,14], and the E-function genes, which act as the glue that binds the different members of the tetramer, are required for the development of all floral organs [15,16].
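The combinatorial logic of the ABCDE model summarized above can be written as a small lookup table. The following is only a toy sketch: the names `ORGAN_REQUIREMENTS` and `organs_specified` are illustrative, the pairing of the E function with every organ follows the statement that E genes are required for all floral organs, and the sketch ignores whorl position and any antagonism between gene classes.

```python
# Toy lookup table for the ABCDE model as summarized in the text.
# Assumption: E-class activity is listed for every organ because the text
# states that E-function genes are required for all floral organs.
ORGAN_REQUIREMENTS = {
    "sepal": {"A", "E"},
    "petal": {"A", "B", "E"},
    "stamen": {"B", "C", "E"},
    "carpel": {"C", "E"},
    "ovule": {"D", "E"},
}


def organs_specified(active_classes):
    """Return the organs whose required gene classes are all active."""
    active = set(active_classes)
    return [
        organ
        for organ, needed in ORGAN_REQUIREMENTS.items()
        if needed <= active  # subset test: every required class is present
    ]


if __name__ == "__main__":
    print(organs_specified("ABCDE"))  # all five classes active
    print(organs_specified("ACDE"))   # B function missing
```

In this simplified view, removing the B function leaves only the organs that do not depend on B genes; it does not model the organ conversions seen in real floral mutants.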
According to most studies, MADS-box genes form two evolutionary lineages: M-type (type I) and MIKC-type (type II) [17,18]. Both types contain the DNA-binding M-domain. The MIKC-type contains three further conserved domains in addition to the M-domain: an intervening (I) domain, a keratin-like (K) domain, and a C-terminal (C) domain [19,20]. Each of these domains has a role in protein–protein interactions, both with other MADS-box proteins, to form dimers and tetramers, and with non-MADS proteins [21]. The C-domain is the most variable and usually contains a transcriptional activation domain [13].
The MIKC-type (type II) genes can be further classified as MIKCC (C for “classic”) and MIKC*. The MIKCC type is more diverse, containing thirteen subgroups based on structural differences: SQUAMOSA [SQUA (A)], DEFICIENS/GLOBOSA [DEF/GLO (B)], AGAMOUS [AG (C/D)], SEPALLATA [SEP (E)], the AGAMOUS-like subgroups AGL6, AGL12, AGL15, and AGL17 (ANR1), B sister (Bsis), SUPPRESSOR OF OVEREXPRESSION OF CO 1 [TM3/SOC1], STMADS11 (SVP), FLOWERING LOCUS C [FLC], and TOMATO MADS 8 [TM8]. The MIKC* type is less diverse and has only two subgroups, MIKC*-S and MIKC*-P. Studies have shown that the functions of the MIKC* type have been more conserved during plant evolution than those of the M-type and MIKC-type [15,18,22]. MIKC*-type genes play an essential role in the development of the male gametophyte in A. thaliana, and they have a high degree of functional redundancy. The M-type group usually does not contain the K-domain and overall lacks the domain complexity found in MIKC-type proteins. In most plants, the M-type (type I) genes are divided into three subgroups: Mα, Mβ, and Mγ [23].
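The domain-based distinction between the two lineages can be illustrated with a short rule. This is a simplified sketch rather than the classification procedure used in the study, which relied on phylogenetic analysis; the function name `classify_mads` and the single-letter domain labels are assumptions made for illustration, and the rule relies on the observation that M-type proteins usually lack the K-domain.

```python
def classify_mads(domains):
    """Assign a coarse MADS-box type from a collection of domain labels.

    `domains` holds single-letter labels such as "M", "I", "K", "C".
    Simplifying assumption: presence of the K-domain marks a MIKC-type
    protein, because M-type proteins usually lack it.
    """
    found = set(domains)
    if "M" not in found:
        return "not a MADS-box protein"
    if "K" in found:
        return "type II (MIKC-type)"
    return "type I (M-type)"


if __name__ == "__main__":
    print(classify_mads(["M", "I", "K", "C"]))
    print(classify_mads(["M"]))
```

A rule of this kind is only a first-pass filter; subgroup assignment (for example Mα versus Mγ, or MIKCC versus MIKC*) still requires comparison with reference sequences.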
In this study, the MADS-box gene family of C. americana (American beautyberry) was systematically analyzed. A total of 78 MADS-box genes were identified on 17 chromosomes. These genes were renamed CamMADS1 to CamMADS78 based on their chromosomal locations, and a phylogenetic tree of all CamMADS genes was constructed. In addition, the type I and type II MADS-box genes of C. americana, Arabidopsis thaliana, Sesamum indicum, Solanum lycopersicum, and Amborella trichopoda were used to construct two phylogenetic trees, one for type I and one for type II genes. The gene structures and conserved domains of these genes were identified, and the expression patterns of C. americana MADS-box genes in various tissues were analyzed. In addition, cis-regulatory elements were identified in the 2 kb upstream promoter regions. The results indicate that these genes have a broad range of functions in several C. americana tissues, with major roles in flower and fruit development and in abiotic stress response. This study will help improve our understanding of the evolution and function of this essential transcription factor family in the medicinally important Lamiaceae family [24].
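Naming genes by chromosomal location, as described for CamMADS1 to CamMADS78, amounts to sorting by chromosome and then by position. The sketch below shows one way this could be done; the gene identifiers, coordinates, and the function name `rename_by_position` are hypothetical, and the exact ordering rule used in the study is not stated in the text.

```python
def rename_by_position(genes, prefix="CamMADS"):
    """Map original gene IDs to sequential names ordered by location.

    Each gene is a dict with hypothetical keys: "id", "chrom" (integer
    chromosome number) and "start" (start coordinate on that chromosome).
    """
    ordered = sorted(genes, key=lambda g: (g["chrom"], g["start"]))
    return {g["id"]: f"{prefix}{i}" for i, g in enumerate(ordered, start=1)}


if __name__ == "__main__":
    # Made-up example records, not real C. americana annotations.
    example = [
        {"id": "geneB", "chrom": 2, "start": 500},
        {"id": "geneA", "chrom": 1, "start": 9000},
        {"id": "geneC", "chrom": 1, "start": 1200},
    ]
    print(rename_by_position(example))
```

Storing the chromosome as an integer keeps chromosome 10 after chromosome 2; with string labels a natural-sort key would be needed instead.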
2. Development and Findings
MADS-box genes have been identified in many species, and both the numbers and the types of MADS-box genes differ greatly among them. Some species have very few type I (M-type) genes or lack them entirely, including Saccharum officinarum (grass), Marchantia polymorpha (Marchantiophyta), and the algae Klebsormidium flaccidum, Dunaliella salina, and Chlorella variabilis, whereas the angiosperm species Amaranthus hypochondriacus and Jatropha curcas have ten genes. Several algal species have very few type II (MIKC) genes or lack them, including Bathycoccus prasinos, Chlamydomonas reinhardtii, and Volvox carteri. Marchantia polymorpha (Marchantiophyta) has two type II (MIKC) genes, and Picea abies (Pinophyta) has three, whereas the angiosperm species Daucus carota has five genes. Angiosperms also have the largest number of type I genes (Camelina sativa: 271 genes) and the largest number of type II genes (Glycine max, soybean: 209 genes) [25,26].
The number of type I MADS-box genes in C. americana (32) was similar to that in S. indicum (31) but lower than that in Ocimum tenuiflorum (42), all members of the Lamiaceae family. The number of type II genes in C. americana (46) was higher than that in O. tenuiflorum (43) but lower than that in S. indicum (62). The genome size of C. americana is 506.1 Mb [4], compared with 337 Mb for S. indicum [27] and an estimated 612 Mb for O. tenuiflorum [28]. By comparison, the large soybean genome (1115 Mb) [27,29] contains 269 MADS-box genes, and the Camelina sativa genome, with an estimated size of 785 Mb [30], contains 384 MADS-box genes. The lower number of genes in some Lamiaceae members might be explained by a smaller genome size and/or by more active genome size reduction after duplication events, since whole-genome duplication is a main contributor to increases in gene number and to species diversification [31,32,33,34]. The clustering of genes is observed in other transcription factor families, such as Hox genes [35]. These clusters might have arisen through tandem gene duplication events [18,36]. The high exon number in type II (MIKC) genes (5–15) compared with type I genes (1–9) is consistent with studies in other species, such as sesame, Arabidopsis, rice, and soybean [27,29,37]. This also matches the more complex and versatile functions found in type II (MIKC) genes compared with type I (M-type) genes [7,12,18,23].
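The comparison of gene counts and genome sizes above can be made explicit by computing the number of MADS-box genes per megabase. The following sketch uses only the figures quoted in the text; the totals for the three Lamiaceae-listed species are assumed here to be the sum of the reported type I and type II counts, and the names `GENOMES` and `genes_per_mb` are illustrative.

```python
# (total MADS-box genes, genome size in Mb), taken from the figures in the text.
# Assumption: totals for the first three species are type I + type II counts.
GENOMES = {
    "C. americana": (32 + 46, 506.1),
    "S. indicum": (31 + 62, 337.0),
    "O. tenuiflorum": (42 + 43, 612.0),
    "Glycine max": (269, 1115.0),
    "Camelina sativa": (384, 785.0),
}


def genes_per_mb(gene_count, genome_size_mb):
    """Return the number of MADS-box genes per megabase of genome."""
    return gene_count / genome_size_mb


if __name__ == "__main__":
    for species, (count, size_mb) in GENOMES.items():
        density = genes_per_mb(count, size_mb)
        print(f"{species}: {count} genes, {size_mb} Mb, {density:.3f} genes/Mb")
```

Such a density is a crude normalization; it does not account for differences in ploidy, annotation quality, or the timing of duplication events, so it should be read alongside the duplication history discussed in the text rather than as a conclusion in itself.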
The Mβ-type of type I MADS-box genes was absent in C. americana, as it was in S. indicum and U. gibba [27]. The absence of Mβ-type genes in these species, which are all members of the order Lamiales, suggests a close relationship between the Lamiaceae family (C. americana and S. indicum) and the Lentibulariaceae family (U. gibba) within the Lamiales. The function of most Mβ-type genes in Arabidopsis is not fully understood, but some play important roles in the differentiation of the female gametophyte [37,38]. Either a different mechanism operates in C. americana because of the lack of Mβ-type genes, or their function is redundant and other CamMADS proteins can fill their role in the protein network. Mβ genes were also reported to be absent in rice and other monocots [37], and the subgroup might have evolved as a lineage-specific clade.
CamMADS75 is an ortholog of the TM8 gene, which is present in S. lycopersicum, S. indicum, and A. trichopoda but absent in A. thaliana. TM8-like genes have been identified in gymnosperms and angiosperms. Their expression in several different tissues and the lack of a clear phenotype associated with TM8 deletion or overexpression make it difficult to pinpoint an exact function and could indicate that TM8-like genes form a clade of fast-evolving genes [39,40]. The CamMADS75 promoter region contains elements involved in stress and drought response, jasmonate and gibberellin response elements, and the GCN4_motif, which is involved in endosperm expression. Further molecular and systematic analysis of CamMADS75, the C. americana TM8 ortholog, could provide useful information on the function of this elusive gene.
In general, at least one CamMADS gene was expressed in each tissue studied, which points to the importance and functional diversity of this gene family in C. americana. Type II CamMADS genes had overall higher expression across all tissues than type I CamMADS genes. This is expected, as the MIKC-type genes are more complex and diverse than the M-type genes [7,12,18,23].
CamMADS51, an ortholog of the Arabidopsis PISTILLATA (PI) gene, had the highest expression level in the closed flower sample, along with CamMADS4, an ortholog of the Arabidopsis APETALA3 (AP3) gene. This is consistent with the key roles that PI and AP3 play during florogenesis [19]. CamMADS64, an ortholog of the Arabidopsis AG gene, was highly expressed in the whole fruit sample [41,42,43]. CamMADS68, an ortholog of Arabidopsis FLC, a suppressor of flowering, was suppressed during flower development, implying that its function is conserved in C. americana [14,21,23]. CamMADS47 and CamMADS60, members of the MIKC*-S subgroup, were expressed in flower tissues, hinting at a possibly conserved function during male gametophyte development [15,18,22].
Some of the MIKC-type genes were expressed in root, stem, and leaf tissues, in addition to their key role in florogenesis. This is consistent with the patterns of MADS-box gene expression in A. thaliana, where several genes are involved in biological processes other than florogenesis. A. thaliana FLM and FLC are involved in vernalization. FLC, SVP, and SOC1 are involved in drought response; the presence of drought-responsive cis-acting regulatory elements in the promoter regions of the orthologous CamMADS genes implies a possible conservation of function. ANR1 and AGL21 are involved in lateral root formation; their respective orthologs, CamMADS38 and CamMADS44, are expressed in the C. americana root. SOC1, AGL21, and FLC are involved in abscisic acid (ABA) and gibberellin (GA) metabolism [8]; their orthologs in C. americana carry cis-acting regulatory elements involved in ABA and GA metabolism. These functions might therefore be conserved in C. americana as well, since the expression profiles of the orthologs are consistent with the roles of these subgroups in the respective tissues.
All promoters had at least one GAGA (C-box) element, which is required for the normal expression of a wide range of genes and can facilitate activation by a remote enhancer. Cytokinin response elements were shown to interact with the C-box in A. thaliana [44,45]; a similar mechanism could be at play in C. americana.
In addition to the upstream promoter region, the first intron of each CamMADS gene, when available, was scanned for cis-regulatory elements. All introns contained TATA-box and/or CAAT-box elements, in addition to a few other elements found in the upstream promoter region. This might point to a role of the intronic region in the regulation of CamMADS genes [46].
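The 2 kb upstream promoter regions scanned for cis-regulatory elements depend on gene orientation: for a gene on the forward strand the window lies before the start coordinate, and for a gene on the reverse strand it lies after the end coordinate. The sketch below shows this coordinate arithmetic under stated assumptions: coordinates are 1-based and inclusive, the function name `upstream_window` and its parameters are hypothetical, and the way the study itself extracted promoter sequences is not described in the text.

```python
def upstream_window(start, end, strand, chrom_length, size=2000):
    """Return (low, high) 1-based inclusive coordinates of the upstream window.

    `start` and `end` are the gene's coordinates with start <= end.
    The window is clipped at the chromosome ends; None is returned when
    no upstream sequence exists (gene at the very edge of the chromosome).
    """
    if strand == "+":
        low, high = max(1, start - size), start - 1
    elif strand == "-":
        low, high = end + 1, min(chrom_length, end + size)
    else:
        raise ValueError("strand must be '+' or '-'")
    if low > high:
        return None
    return low, high


if __name__ == "__main__":
    # Hypothetical gene at positions 5001-8000 on a 100 kb chromosome.
    print(upstream_window(5001, 8000, "+", 100_000))
    print(upstream_window(5001, 8000, "-", 100_000))
```

For reverse-strand genes, the extracted sequence would also need to be reverse-complemented before motif scanning so that elements are read in the gene's own orientation.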
In A. thaliana, most type I MADS-box genes are weakly expressed, and their function is not as clear as that of type II MADS-box genes. The expression of CamMADS17 and CamMADS20 in flower bud tissues suggests that they might have a role in flower development. This is in line with studies suggesting that type I genes are involved in A. thaliana reproduction and development [37,38]. It is worth noting that some genes appear to have no expression in any C. americana tissue. This might be because some MADS-box genes are activated only in response to certain environmental cues and abiotic stresses, such as temperature, salinity, drought, and wounding [8,9]. Another possibility is that these genes are pseudogenes transcribed at a very low level with no function, or redundant genes undergoing neofunctionalization. The presence of two or more orthologs of an A. thaliana MADS-box gene might reflect functional redundancy, the acquisition of new functions by some of these genes, or differences in their responses to environmental cues that fine-tune gene expression levels in C. americana. Analyses of the C. americana genome have revealed three putative whole-genome duplication events [2]. Gene duplication events were also recently reported in mints [34]. Whole-genome duplication events might have contributed to the expansion of the MADS-box gene family.
3. Conclusions
Based on the latest C. americana genome sequence and RNA-Seq data, 78 CamMADS genes were identified using bioinformatics tools and classified as M-type (Mα and Mγ) or MIKC-type (MIKC* and MIKCC) according to their evolutionary relationships and protein structure characteristics. The Mβ-type of type I MADS-box genes was absent in C. americana, as it was in S. indicum and U. gibba. The absence of Mβ-type genes in these species, all members of the order Lamiales, might hint at a close relationship between the Lamiaceae and Lentibulariaceae families within the Lamiales. Gene structure analysis revealed that type II genes contained more exons than type I genes. The expression patterns of CamMADS genes in eight tissues and the cis-regulatory element analysis of their promoter regions suggest that some abiotic stress responses and the ABCDE model of flower development are, to some extent, conserved in C. americana. The absence of certain elements and the changes in expression patterns could point to functional diversification of some MADS-box genes, or simply to functional redundancy. This study will help guide future protein–protein interaction studies to confirm the interactions and functions of each of the CamMADS genes presented.