The growth of complexity in evolution is a most intriguing phenomenon. Using gene phylostratigraphy, we showed this growth (as reflected in regulatory mechanisms) in the human genome, tracing the path from prokaryotes to hominids. Generally, the different regulatory gene families expanded at different times, yet only up to the Euteleostomi (bony vertebrates). The only exception was the expansion of transcription factors (TF) in placentals; however, we argue that this was not related to an increase in general complexity. Surprisingly, although TF originated in the Prokaryota while chromatin appeared only in the Eukaryota, the expansion of epigenetic factors predated the expansion of TF. Signaling receptors, tumor suppressors, oncogenes, and aging- and disease-associated genes (which indicate vulnerabilities of complex organization and are strongly enriched in regulatory genes) also expanded only up to the Euteleostomi. The complexity-related gene properties (protein size, number of alternatively spliced mRNAs, length of untranslated mRNA, number of biological processes per gene, number of disordered regions in a protein, and density of TF–TF interactions) rose in multicellular organisms and declined after the Euteleostomi, and possibly earlier. At the same time, the speed of protein sequence evolution sharply increased in the genes that originated after the Euteleostomi. Thus, several lines of evidence indicate that the molecular mechanisms of complexity growth were changing with time, and in the phyletic lineage leading to humans, the most salient shift occurred after the basic vertebrate body plan was fixed with a bony skeleton. The results may be useful for evolutionary medicine.
Summary of the Main Points
(1) Methodologically, we introduced the distinction between ‘deep’ and ‘shallow’ phylostratigraphy, compared the two approaches, and discussed them in the context of the classic ‘lumping vs. splitting’ problem (thereby extending it to the molecular field). This helps explain the conflicting gene datings that were published previously.
(2) Surprisingly, although transcription factors (TF) originated in the Prokaryota while chromatin appeared only in the Eukaryota, the expansion of epigenetic factors predated the expansion of TF. Protein modifiers and epigenetic factors expanded at the unicellular evolutionary stage, whereas TF and signaling receptors expanded in the multicellular organisms. Because cancer is an atavistic shift to unicellularity, these observations suggest that protein modifiers and epigenetic factors can be (at least) as important for oncogenesis as TF and signaling receptors.
(3) The expansion of nervous system-related genes could create a misleading notion that the growth of complexity in the latest phylostrata was owing to these genes. However, this later expansion was due to olfactory receptors, which reflected only the elaboration of olfaction. Without olfactory receptors, the nervous system genes expanded only up to the Euteleostomi (bony vertebrates).
(4) Several lines of evidence suggest a salient shift in the evolution of complexity after the Euteleostomi. The expansions of regulatory gene families, as well as of disease genes (which indicate vulnerabilities of a complex organization and are strongly enriched in regulatory genes), sharply declined after the Euteleostomi, with one paradoxical exception (explained below). The complexity-related gene properties (protein size, number of alternatively spliced mRNAs, length of untranslated mRNA, number of biological processes per gene, number of disordered regions in a protein, and density of TF–TF interactions) rose in multicellular organisms and declined after the Euteleostomi or earlier. At the same time, the speed of protein sequence evolution sharply increased.
(5) The above-mentioned exception is a unique expansion, in the placentals, of TF families of deeply unicellular origin, whereas the earlier expansion of TF was of purely multicellular origin. This inversion of TF origins and expansions creates a paradox, which can be explained as follows. It is likely that expansions of deeply unicellular TF also occurred in the past, beginning from early times, but now we can see only their traces. They were situational expansions that served to cope with invading transposons (‘nuclear immunity’) and decayed once the activity of the cognate transposons diminished. Only a minor part of these TF were adopted for other needs of the organism and retained (we showed the remnants of earlier expansions). Therefore, the recent expansion of deeply unicellular TF was related to clades rather than to grades of complexity. Thus, the expansions of olfactory and ‘nuclear immunity’ genes seem to be two possible pitfalls in the analysis of the impact of the gene repertoire on the growth of complexity.
We concluded that the molecular mechanisms of complexity growth were changing with time, and after the basic vertebrate body plan was fixed with a bony skeleton (in the Euteleostomi), there was a salient shift in these mechanisms within the phyletic lineage leading to humans. The first growth wave involved the expansion of protein modifiers (including epigenetic factors) that represented regulation at the level of gene products. They showed a plateau at the Eukaryota-Metazoa (from unicellular eukaryotes to sponges) and a depression after the Chordata. The second wave involved signaling receptors (representing intercellular communication), closely followed by a wave of transcription factors (probably representing diversification of cell types). These genes showed significant enrichment at the Bilateria and a decline after the Euteleostomi. The wave of the nervous system genes generally coincided with the growth of other regulatory genes, probably because the nervous system has to integrate all the other complexity waves at the organismal level. After the Euteleostomi, we found no significant expansions in gene groups that were related to general complexity (although there was an elaboration of olfaction and ‘nuclear immunity’). The overall picture of the main changes in the regulatory gene repertoire is shown in Figure 3.
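As an illustration of what 'significant enrichment' of a gene family in a phylostratum can mean, the following minimal sketch applies a one-sided hypergeometric test. It is not necessarily the test used in this work, and the function names and all counts are hypothetical toy values, not data from this study.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k): chance of seeing at least k family genes in a stratum of
    n genes, when K of the N genes in the genome belong to the family."""
    total = comb(N, n)
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def fold_enrichment(k, N, K, n):
    """Observed share of the family in the stratum over its genome-wide share."""
    return (k * N) / (n * K)


# Hypothetical toy counts (NOT data from this study):
# genome of 1000 genes, family of 100 genes, stratum of 200 genes,
# 40 of which belong to the family (20 would be expected by chance).
N_GENES, FAMILY, STRATUM, OBSERVED = 1000, 100, 200, 40

p_value = hypergeom_sf(OBSERVED, N_GENES, FAMILY, STRATUM)
fold = fold_enrichment(OBSERVED, N_GENES, FAMILY, STRATUM)
print(f"fold enrichment = {fold:.1f}, one-sided p = {p_value:.2e}")
```

In such a scheme, an 'expansion' in a given phylostratum would correspond to a fold enrichment above 1 with a small p-value, ideally after correction for the number of phylostrata tested.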
The post-Euteleostomi complexity growth probably proceeded mostly via changes in protein sequences (the speed of which sharply increased) and in gene expression, with only small-scale changes in the gene repertoire. The changes in gene expression could have been brought about by a multitude of means, including changes in cis-regulatory elements, non-coding RNA, the structure of chromatin, and complex interactions of TF among themselves and with TF cofactors and epigenetic factors. The changes in TF complexes could have affected both transcription initiation and promoter-proximal pausing [34]. The invasion and propagation of transposable elements could also have changed the gene expression patterns [35][36]. Furthermore, intricate combinations of variation in non-coding DNA, chromatin structure, TF, and epigenetic factors could have arisen (with a growing space of splicing variants), followed by elaborate (post-)translational regulation.
Beginning from a certain threshold number of the basic genetic elements (protein-coding genes), the combinatorial space of the protein products of these elements may have become large enough that the further growth of organismal complexity could proceed without a significant expansion of the base of genetic elements. Natural selection explores this combinatorial space via changes in gene expression and protein interactions. Finally, the extension of regulation to a higher level of organization (from regulatory genes to regulatory cells) seems to have been involved, which was manifested in the increase in the number of neurons (approximated by the encephalization quotient).
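The scale of this combinatorial space can be illustrated with a back-of-the-envelope sketch. It assumes a round figure of 20,000 protein-coding genes and treats each gene as simply 'on' or 'off', which is a deliberate oversimplification; the function names are hypothetical and the calculation is not part of the original analysis.

```python
from math import comb, log10


def pairwise_interactions(n_genes):
    """Number of distinct unordered gene pairs, n * (n - 1) / 2."""
    return comb(n_genes, 2)


def log10_expression_states(n_genes):
    """log10 of the number of on/off expression patterns, 2 ** n_genes."""
    return n_genes * log10(2)


# Assumption: a round, illustrative gene count, not a measured value.
N_GENES = 20_000

pairs = pairwise_interactions(N_GENES)
log_states = log10_expression_states(N_GENES)
print(f"possible gene pairs: {pairs:,}")
print(f"binary expression patterns: about 10^{log_states:.0f}")
```

Even under these crude assumptions, a fixed gene set allows about 2 x 10^8 distinct gene pairs and roughly 10^6020 binary expression patterns, so changes in regulation alone leave a very large space to explore without adding new genes.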
None of these possibilities can so far be traced along the evolutionary path leading to humans as consistently and quantitatively as protein-coding sequences can. Therefore, we should add the caveat that our approach covers only a part of the possible mechanisms of complexity growth. In the future, other aspects can be added to this picture.