2. Omics Studies for nsLTPs
2.1. Understanding nsLTPs from Previous Studies
Previous studies (Table 1) involving nsLTP gene searches and plant transcriptional expression analyses were scrutinized. This survey revealed differing views on nsLTP abundance, types (Table 1), and functions. The seminal work by Edstam et al. [1] proposes a greater nsLTP abundance in terrestrial plants and their absence in green algae (chlorophytes and charophytes) (Table 1), suggesting that nsLTP genes evolved soon after the conquest of the terrestrial environment. In support of this proposition, fewer representatives and types of nsLTPs are observed in lower plants (such as bryophytes and lichens) than in spermatophytes, which indicates the emergence of new types of nsLTPs in higher plants [1][19][20].
Table 1. Some previous studies involving nsLTP mining in plant omics data, including data sources, identification strategy, amount per species, and classes retrieved.
In the barley (Hordeum vulgare) genome, 70 HvnsLTPs (Hordeum vulgare nsLTPs) were identified (Table 1), which were classified into five groups (‘1’, ‘2’, ‘C’, ‘D’, and ‘G’) [27]. Each of these genes shared common structures. Considering their expansion mechanisms, the 70 HvnsLTP genes presented 15 tandem duplication repeats (encompassing 36 genes). The HvnsLTPs’ baseline expression profiles in different tissues across developmental stages indicated that this group of genes might perform a variety of functions [27]. In addition, the differential expression profile indicated that HvnsLTP genes might have diverged in terms of the cis-regulatory elements of their promoters [27].
For the Solanaceae family, there are data on nsLTPs for potato (StnsLTPs, Solanum tuberosum nsLTPs). Li et al. [29] found 83 StnsLTP genes in potato genomes, categorized into eight types, namely, ‘1’, ‘2’, ‘4’, ‘5’, ‘7’, ‘8’, ‘12*’, and ‘13*’ (Table 1). Chromosome distribution and collinearity analyses suggested that the expansion of the StnsLTP gene family was enhanced by tandem duplications. In turn, Ka/Ks analysis showed that 47 pairs of duplicated genes underwent purifying selection during evolution. StnsLTP genes were expressed mainly in younger tissues. Furthermore, StnsLTPs contained a large number of stress-responsive cis-acting elements in their promoter regions. These results indicated that StnsLTPs might play significant and functionally varied roles in potato plants.
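The Ka/Ks reasoning mentioned above can be illustrated with a minimal sketch. It assumes the conventional reading of the ratio (below 1 suggests purifying selection, above 1 positive selection, about 1 neutral evolution). The function name `classify_selection` and the tolerance parameter are hypothetical and are not taken from Li et al. [29], and Ka and Ks are assumed to have been estimated beforehand with a dedicated tool.

```python
def classify_selection(ka: float, ks: float, tol: float = 0.0) -> str:
    """Label a duplicated gene pair from its Ka and Ks estimates.

    ka  : non-synonymous substitutions per non-synonymous site
    ks  : synonymous substitutions per synonymous site
    tol : half-width of the band around 1 treated as neutral (assumption)
    """
    if ks <= 0:
        # The ratio is undefined without synonymous divergence.
        raise ValueError("Ks must be positive to compute Ka/Ks")
    ratio = ka / ks
    if ratio < 1 - tol:
        return "purifying"
    if ratio > 1 + tol:
        return "positive"
    return "neutral"


# Illustrative values only, not data from any cited study.
print(classify_selection(0.12, 0.60))
```

In practice, pairs with very small or saturated Ks values are usually filtered out before interpretation, since the ratio becomes unreliable at those extremes.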
In Arachis duranensis, Song et al. [31] discovered 64 AdnsLTP (Arachis duranensis nsLTP) genes, which were divided into six groups (‘1’, ‘2’, ‘C’, ‘D’, ‘E’, and ‘G’; Table 1) and anchored over nine chromosomes. Considering the AdnsLTPs’ expansion mechanisms, the study revealed some gene clustering by tandem duplication, while other family members showed segmental duplication across several chromosomes. The AdnsLTPs’ expression levels were altered following treatments with high salt (NaCl, 250 mM), PEG, low temperature (4 °C), and abscisic acid. Three AdnsLTPs were linked to nematode infection resistance. The DOF and WRI1 transcription factors were suggested as potential controllers of the AdnsLTP response to nematode infection.
Fang et al. [30] found 330 TansLTP (Triticum aestivum nsLTP) genes in wheat (T. aestivum) (Table 1). This figure can be considered an update of the 461 nsLTP loci found by Kouidri et al. [26] for the same species. To date, T. aestivum is the plant with the highest number of nsLTPs. The TansLTPs clustered into five groups (‘1’, ‘2’, ‘C’, ‘D’, and ‘G’) by phenetic analysis (Table 1). Gene structure and MEME pattern analyses showed that different groups of nsLTPs had similar structural compositions. Chromosome anchoring revealed that all five groups were distributed across the 21 chromosomes. Furthermore, 31 gene clusters were identified as tandem duplications, and 208 gene pairs were identified as segmental duplications. Data mining of RNA-seq libraries covering multiple stress conditions showed that the transcript levels of some of the nsLTP genes could be strongly up-regulated by drought and high salt (NaCl, 250 mM) stresses.
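Several of the studies above report tandem duplication clusters. As a rough illustration of how such clusters can be called from chromosome anchoring alone, the sketch below groups family members that lie on the same chromosome with at most a given number of intervening genes between neighbors. The criterion, the `max_intervening` default, and the input layout are assumptions made for this example; the cited works may apply different thresholds and additional sequence similarity filters.

```python
from collections import defaultdict


def tandem_clusters(genes, max_intervening=1):
    """Group family members into putative tandem clusters.

    genes: iterable of (gene_id, chromosome, rank), where rank is the
           gene's ordinal position among ALL annotated genes on that
           chromosome (hypothetical input layout).
    Returns a list of clusters, each a list of gene ids (size >= 2).
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, rank in genes:
        by_chrom[chrom].append((rank, gene_id))

    clusters = []
    for chrom in sorted(by_chrom):
        members = sorted(by_chrom[chrom])
        current = [members[0][1]]
        for (prev_rank, _), (rank, gene_id) in zip(members, members[1:]):
            # Number of non-family genes lying between the two members.
            if rank - prev_rank - 1 <= max_intervening:
                current.append(gene_id)
            else:
                if len(current) > 1:
                    clusters.append(current)
                current = [gene_id]
        if len(current) > 1:
            clusters.append(current)
    return clusters


# Toy coordinates, for illustration only.
toy = [("g1", "chr1", 10), ("g2", "chr1", 11), ("g3", "chr1", 13),
       ("g4", "chr1", 40), ("g5", "chr2", 5), ("g6", "chr2", 6)]
print(tandem_clusters(toy))
```

Segmental and whole-genome duplications cannot be detected this way; they require collinearity analysis between chromosomal blocks.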
In another context, Liang et al. [35] scrutinized the Brassica napus pangenome for BnnsLTPs (B. napus nsLTPs). These authors identified 246 BnnsLTP genes, divided into five groups (‘1’, ‘2’, ‘C’, ‘D’, and ‘G’; Table 1). Different BnnsLTP genes were identified among the eight studied B. napus varieties (ZS11, Gangan, Zheyou7, Shengli, Tapidor, Quinta, Westar, and No2127). BnnsLTPs showed different duplication patterns in different varieties. Cis-regulatory elements that respond to biotic and abiotic stresses were identified in all BnnsLTP genes. Finally, RNA-Seq analysis showed that the BnnsLTP genes were involved in responses to infection by the fungus Sclerotinia sclerotiorum.
Vangelisti et al. [36], studying sunflower (Helianthus annuus) HansLTPs (Helianthus annuus nsLTPs), observed the existence of four groups (‘1’, ‘2’, ‘3’, and ‘4’) (Table 1). The authors did not explicitly classify the observed groups according to the available classification systems. The HansLTPs (101 in total) were further examined for potential gene duplication sources, which revealed a high prevalence of tandem as well as whole-genome duplication (WGD) events. This finding is consistent with polyploidization events that occurred during the evolution of the sunflower genome. Three (‘1’, ‘3’, and ‘4’) of the four HansLTP groups responded uniquely to environmental cues, including auxin, abscisic acid, and the saline environment. Interestingly, HansLTP group ‘2’ genes were expressed only in sunflower seeds.
Taken together, the reports mentioned above and the other works in Table 1 show that nsLTP genes act multifunctionally and exhibit genetic variability even among accessions of the same species. nsLTPs are present in a wide range of plants and are expressed in different tissues, at different developmental stages, and under different stress conditions.
2.2. Filling the Gap: Discovering and Classifying nsLTPs in New Plant Genomes
To provide genomic information for nsLTPs in plants not covered by the studies above, and to update nsLTP data for some species for which improved genome versions have become available, the following plant genomes were scrutinized (Table 2): (1) Marchantia polymorpha; (2) Ceratopteris richardii; (3) Selaginella moellendorffii; (4) Thuja plicata; (5) Gossypium hirsutum; (6) Lactuca sativa; (7) Manihot esculenta; (8) Mimulus guttatus; (9) Populus trichocarpa; (10) Sinapis alba; (11) Solanum tuberosum; and (12) Spinacia oleracea. These species were chosen to broaden the range of clades analyzed.
Table 2. Studied species and number of recovered nsLTPs from the three applied mining approaches.
Three distinct and complementary strategies were used to retrieve nsLTPs from the selected genomes. Exhaustive nsLTP mining of the 12 evaluated genomes (Table 2) returned 258 candidate sequences by BLASTp search, 344 by the cysteine pattern-based strategy (RegEx mining), and 1191 by the machine learning approach (HMMER tool).
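To make the cysteine pattern-based strategy more concrete, the following is a minimal sketch of a regular expression built on the generic eight-cysteine motif (8CM) skeleton, C-Xn-C-Xn-CC-Xn-CXC-Xn-C-Xn-C. The spacer length bounds, the exclusion of extra cysteines inside spacers, and the names `EIGHT_CM` and `scan_proteome` are assumptions made for illustration; they are not the expression used in the study.

```python
import re

# 8CM skeleton: C-Xn-C-Xn-CC-Xn-CXC-Xn-C-Xn-C.
# All spacer ranges below are illustrative assumptions.
EIGHT_CM = re.compile(
    r"C[^C]{6,15}"       # Cys1 + spacer
    r"C[^C]{6,80}"       # Cys2 + spacer
    r"CC[^C]{8,29}"      # Cys3-Cys4 + spacer
    r"C[^C]C[^C]{8,37}"  # Cys5-X-Cys6 + spacer
    r"C[^C]{4,25}"       # Cys7 + spacer
    r"C"                 # Cys8
)


def find_8cm(sequence):
    """Return the (start, end) span of the first 8CM-like match, or None."""
    match = EIGHT_CM.search(sequence.upper())
    return match.span() if match else None


def scan_proteome(records):
    """Return ids of proteins with an 8CM-like pattern.

    records: dict mapping protein id -> amino acid sequence.
    """
    return [pid for pid, seq in records.items() if find_8cm(seq) is not None]


# Synthetic sequence constructed to satisfy the pattern (not a real protein).
synthetic = ("MK" + "C" + "A" * 8 + "C" + "A" * 12 + "CC" + "A" * 10
             + "CAC" + "A" * 20 + "C" + "A" * 6 + "C" + "LL")
print(find_8cm(synthetic))
```

A pattern of this kind is simple to apply but sensitive to the chosen bounds: tight ranges miss divergent members, while loose ranges admit unrelated cysteine-rich proteins, which is one reason candidates are usually checked for the conserved domain afterwards.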
The machine learning approach (HMMER tool) recovered the largest number of sequences carrying nsLTP domains. Strategies based on machine learning are emerging as the future of DNA/RNA/protein sequence identification and of bioinformatics in general. The RegEx approach, on the other hand, was more restrictive and less accurate than the local alignment (BLASTp) method with respect to the presence of the conserved nsLTP domain. However, it is worth mentioning that the BLASTp strategy did not recover sequences from the hypothetical proteomes of C. richardii, S. moellendorffii, and M. polymorpha (Table 2). When working with sequences as diverse as nsLTPs, it is advisable to combine several mining methods, followed by data curation, to increase the chances of finding the maximum number of sequences. Herein, however, almost all of the sequences retrieved by the BLASTp and RegEx approaches were also retrieved by the machine-learning-based strategy.
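The recommendation to combine several mining methods amounts to simple set operations on the candidate identifiers returned by each approach. The sketch below is one possible way to do this and assumes that each method yields a set of protein identifiers; the function name `combine_hits` and the identifiers are hypothetical.

```python
def combine_hits(blastp_ids, regex_ids, hmmer_ids):
    """Merge candidate ids from three mining strategies.

    Returns a dict with:
      'union'      : every candidate found by at least one method
      'consensus'  : candidates found by all three methods
      'hmmer_only' : candidates found only by the HMMER search
      'missed_by_hmmer' : BLASTp/RegEx candidates absent from HMMER hits
    """
    blastp_ids, regex_ids, hmmer_ids = map(set, (blastp_ids, regex_ids, hmmer_ids))
    return {
        "union": blastp_ids | regex_ids | hmmer_ids,
        "consensus": blastp_ids & regex_ids & hmmer_ids,
        "hmmer_only": hmmer_ids - blastp_ids - regex_ids,
        "missed_by_hmmer": (blastp_ids | regex_ids) - hmmer_ids,
    }


# Toy identifiers, for illustration only.
result = combine_hits({"p1", "p2"}, {"p2", "p3"}, {"p1", "p2", "p3", "p4"})
print(sorted(result["union"]), sorted(result["missed_by_hmmer"]))
```

The `missed_by_hmmer` set is the part worth curating by hand, since it holds candidates that only the more restrictive methods support.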
From the analyzed species pool (Table 2), S. moellendorffii (21) and G. hirsutum (218) presented, respectively, the lowest and highest number of nsLTPs. The search confirmed the tendency of these peptides to be encoded by large gene families: nine (75%) of the 12 analyzed species had more than 50 nsLTP loci in their respective genomes (Table 2). There was no correlation (r = 0.10) between genome size and the number of nsLTPs in the analyzed species pool (Table 2). However, the angiosperms and the analyzed gymnosperm have a larger number of nsLTPs than the pteridophytes and the analyzed bryophyte (Table 2). This may be associated with the evolution of these basal groups, as shown by Edstam et al. [1]. Pteridophytes and bryophytes are phylogenetically closer to green algae than the other clades analyzed. This is possibly responsible for the reduced number of nsLTPs in these clades, since no nsLTPs were identified in green algae [1].
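The correlation coefficient reported above can be computed from two paired lists, genome size and nsLTP count per species. The sketch below assumes that r denotes Pearson's product-moment coefficient, which the text does not state explicitly; the function name `pearson_r` is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)
```

Calling `pearson_r(genome_sizes, nsltp_counts)` with the per-species values from Table 2 would give the coefficient; values near 0 indicate that genome size explains little of the variation in family size. With only 12 species, a rank-based coefficient could be a reasonable alternative if a few very large genomes dominate the result.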
The work also updated the data for the nsLTP content in cotton (G. hirsutum), potato (S. tuberosum), common liverwort (M. polymorpha), and spikemoss (S. moellendorffii) from the respective updated genome versions (Ghirsutum_527_v2.1, Stuberosum_686_v6.1, Mpolymorpha_320_v3.1, and Smoellendorffii_91_v1.0 | Phytozome database). While Li et al. [6] and Li et al. [29] identified, respectively, 91 and 83 nsLTP loci for G. hirsutum and S. tuberosum (Table 1), 218 and 105 nsLTP loci were identified in the present study (Table 2). For M. polymorpha and S. moellendorffii, 21 and 36 nsLTPs were found, respectively (Table 2), in the present study, compared to 13 and 23 nsLTPs reported by Fonseca-García et al. [34] (Table 1). Genome assemblies are never perfect since they are models of the actual genome. It is hard to completely rule out all potential technological or algorithmic flaws, and no single assembly can accurately capture all the variety within populations of a species. Thus, published genomes that have an active research community are continuously improved. Such modified and updated versions are potential sources for changing minor paradigms.
Categorization of the 1191 nsLTPs yielded nine large groups, of which five could be identified (‘nsLTL1’, ‘nsLTL2’, ‘nsLTLG’, ‘nsLTLD’, and ‘nsLTLC’; Figure 1), in addition to four distinct groups designated ‘Unknown 1-4’ (Table 2).
Figure 1. Neighbor-joining tree (based on 8CM nsLTP domains) of 1191 non-specific lipid transfer proteins (nsLTPs) predicted in Marchantia polymorpha, Ceratopteris richardii, Selaginella moellendorffii, Thuja plicata, Glycine max, Gossypium hirsutum, Lactuca sativa, Manihot esculenta, Mimulus guttatus, Populus trichocarpa, Sinapis alba, Solanum tuberosum, and Spinacia oleracea genomes. All amino acid sequences were aligned using ClustalX2. The obtained result was visualized with the iTOL program. Legend: the numbers (1–4) inside the circles on the edge of the tree indicate different nsLTP subgroups of a given nsLTP group; the rectangles give the nsLTP group classification.
Group separation, obtained from the implemented neighbor-joining (NJ) approach, reproduced the nine groups in the Edstam et al. [1] classification. The strategy of performing NJ analysis from 8CM domain sequences (as performed by Edstam et al. [1] and Xue et al. [38]) improved group discrimination. The NJ tree derived from the complete nsLTP sequences (8CM + upstream and downstream regions) did not present such a level of resolution (only six groups were formed).
As will be seen in the ‘How structural nsLTP proteomics correlates with current nsLTP classification systems?’ section, nsLTPs show high amino acid sequence variability. This variability is accentuated outside the 8CM region, which introduces more noise into the distance analysis and reduces the efficiency with which ‘true’ groups are formed. Although lower than in other nsLTP regions, the variability of the 8CM region was also a significant factor in the analysis using the NJ method. This is evidenced by the low bootstrap values of the first branches formed, both here and in nsLTP NJ analyses for a variety of species (see works by Fang et al. [30] and Vangelisti et al. [36], among others, in Table 1). Bootstrap values reflect the proportion of trees/replicates in which a recovered grouping is present (in other words, a measure of support for that group). Despite reduced bootstrap values, the obtained tree topology was in accordance with the composition of the characterized seed sequences used to perform the nsLTP classification.
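The definition of bootstrap support given above can be expressed directly in code. The sketch below assumes that each bootstrap replicate tree has already been reduced to the set of groupings (clades) it contains, with each clade represented as a frozenset of leaf names; this representation and the name `bootstrap_support` are choices made for illustration, and tree inference itself is left to dedicated phylogenetics software.

```python
def bootstrap_support(clade, replicate_trees):
    """Fraction of replicate trees that contain the given clade.

    clade           : iterable of leaf names forming the grouping
    replicate_trees : list of sets of frozensets, one set per replicate
    """
    if not replicate_trees:
        raise ValueError("at least one replicate tree is required")
    target = frozenset(clade)
    hits = sum(1 for tree in replicate_trees if target in tree)
    return hits / len(replicate_trees)


# Four toy replicates over leaves a, b, c, d (illustration only).
replicates = [
    {frozenset("ab"), frozenset("abc")},
    {frozenset("ab"), frozenset("abd")},
    {frozenset("ac"), frozenset("abc")},
    {frozenset("ab"), frozenset("abc")},
]
print(bootstrap_support("ab", replicates))  # 0.75
```

A grouping recovered in only a small fraction of replicates, as reported for the deepest nsLTP branches, is one that the resampled alignment columns do not consistently support.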
Regarding the nsLTPs’ composition in the scrutinized species, L. sativa, despite not having the largest number of nsLTPs in its genome (Table 2 and Table 3), was the species that presented the greatest variety of these peptides (Table 3), with at least one member of each of the nine groups found. M. polymorpha, in turn, had the lowest variety of nsLTPs (presenting members only for the ‘LTP1’, ‘LTP2’, ‘LTPD’, ‘LTPG’, ‘Unknown 1’, and ‘Unknown 4’ groups; Table 3), a fact that is possibly associated with the small number of nsLTPs in its genome. Results for this species, however, indicated that it presented nsLTP groups (e.g., ‘LTP1’, ‘LTP2’, ‘Unknown 1’, and ‘Unknown 4’) not yet identified in previous studies (i.e., in Edstam et al. [1] and Fonseca-García et al. [34]).
Table 3. nsLTP quantification by categories in the 12 analyzed plant genomes.