Clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated (Cas) proteins constitute the adaptive immune system of prokaryotes, which provides resistance against bacteriophages and invasive genetic elements. Current applications in bacteria and eukaryotes rely on a few Cas effector proteins that have been characterized in detail. However, there is a lack of comprehensive studies on naturally occurring CRISPR-Cas systems in beneficial bacteria, such as human gut commensal Bifidobacterium species. In this study, we mined 954 publicly available Bifidobacterium genomes and identified CRISPR-Cas systems in 57% of these strains. Five CRISPR-Cas subtypes were identified: Type I-E, I-C, I-G, II-A, and II-C. Among these subtypes, Type I-C was the most abundant (23%). We further characterized the CRISPR RNA (crRNA), tracrRNA, and PAM sequences to provide a molecular basis for the development of new genome editing tools for a variety of applications. Moreover, we investigated the evolutionary history of certain Bifidobacterium strains through visualization of acquired spacer sequences and demonstrated how these hypervariable CRISPR regions can be used as genotyping markers. This extensive characterization will enable the repurposing of endogenous CRISPR-Cas systems in bifidobacteria for genome engineering, transcriptional regulation, genotyping, and screening of rare variants.
Clustered regularly interspaced short palindromic repeats (CRISPR) and accompanying CRISPR-associated (cas) genes constitute the adaptive immune system in bacteria, which provides resistance against bacteriophage predation [1]. This immunity is orchestrated in three stages. During the first stage, adaptation, snippets of foreign DNA are copied and incorporated into bacterial genomic CRISPR arrays. Next, during the expression stage, the CRISPR array is transcribed and processed to generate mature CRISPR RNA (crRNA) [2,3]. During the last stage, interference, the crRNA guides Cas nuclease(s) to selectively recognize complementary invasive nucleic acids and subsequently cleave them [4]. Due to the rapid increase in sequencing data and the subsequent rise in known CRISPR-Cas diversity, the classification of CRISPR-Cas systems is constantly evolving [5]. To date, two classes, six types, and 33 subtypes of CRISPR-Cas systems have been reported. Although thousands of CRISPR-Cas systems occur in nature across genera and species, only a handful have been characterized in detail and repurposed for applications, notably genetic engineering and transcriptional regulation. Compared to the exponential expansion of CRISPR-Cas applications in eukaryotes, the tremendous application potential in prokaryotes has yet to be fully exploited, particularly in key species related to human health and in food microorganisms. Notably, many human commensal bacteria, probiotic strains, and other industrial workhorses harbor CRISPR-Cas systems in their genomes, allowing the repurposing of these systems for diverse applications without the need for heterologous expression [6]. However, the scientific community's limited fundamental understanding of CRISPR-Cas biology in general, and of the repurposing of endogenous systems in particular, remains a bottleneck that limits broad implementation.
Bifidobacteria are among the most abundant natural inhabitants of the human gastrointestinal tract, particularly in the infant gut [7,8]. The composition of the infant gut microbiome differs significantly depending on the delivery and feeding methods, and consists of Enterobacteriaceae (around 30%), Bifidobacterium (around 10%), some Lactobacillus (around 3%), and other diverse bacteria [9]. The presence of bifidobacteria is strongly associated with multiple health-promoting effects, although the exact modes of action are yet to be fully revealed. It has been demonstrated that bifidobacteria can modulate the host immune response [10,11], reduce ulcerative colitis and irritable bowel syndrome [12], and ferment non-digestible complex carbohydrates to produce beneficial short-chain fatty acids such as butyrate [13]. Due to these potential health benefits, some strains of selected Bifidobacterium species have been commercialized as probiotic products [12], which are defined as “live microorganisms that, when administered in adequate amounts, confer health benefits on the host” [14]. Extensive research efforts are underway to study the genomics of bifidobacteria, aiming to discover the underlying mechanisms of their potential health benefits, as well as the genetic relatedness among strains isolated from different hosts and environments [15]. Recent advances in high-throughput sequencing technologies have greatly expanded the availability of bifidobacterial genomes, along with other functional omics data such as transcriptomes and proteomes. These studies have provided insights into the abundance of carbohydrate metabolism systems, adaptations to the glycan-rich gut environment [16], and the diversity of restriction/modification systems [17].
The growth of metagenomic data, together with a new generation of bioinformatic tools to identify and characterize CRISPR-Cas systems [18], has recently allowed for a better understanding of these systems and their identification across a wider range of datasets.
CRISPR-Cas-based technologies have been gradually implemented for genome engineering in Gram-positive bacteria that are recalcitrant to traditional genetic modification, including Clostridium species [19,20], Lactococcus lactis [21], and several species of Lactobacillus [6,22,23]. Despite the abundance of CRISPR in bifidobacteria, few reports have investigated or developed CRISPR applications in bifidobacteria [24,25], and there are currently no reports on CRISPR-Cas-based genome engineering in bifidobacteria.
In this study, we present a comprehensive screening of CRISPR-Cas systems in all publicly available Bifidobacterium genomes in the NCBI RefSeq database. We observed diverse CRISPR-Cas systems spanning five different subtypes, with large and distinct CRISPR loci containing a myriad of spacers that provided insights into bifidobacterial strain evolution and predator-prey dynamics. We further characterized essential elements such as crRNA, tracrRNA, and PAM sequences for all five CRISPR subtypes in different species. This work lays the foundation for repurposing CRISPR-Cas systems in bifidobacteria for a variety of applications ranging from genome editing and transcriptional control to rare variant screening and genotyping. Altogether, we envision the wide utilization of CRISPR-Cas systems to expedite the development and formulation of next-generation Bifidobacterium probiotics.
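To illustrate how hypervariable spacer content can serve as a genotyping marker, the following minimal Python sketch compares the spacer arrays of hypothetical strains. It is not the pipeline used in this study. It assumes that each array is given as a list of spacer sequences ordered from the leader end to the trailer end, and that new spacers are acquired at the leader end, so that spacers shared at the trailer end suggest common ancestry. The function names and the toy spacer sequences are invented for illustration only.

```python
def shared_ancestral_spacers(array_a, array_b):
    """Count consecutive identical spacers starting from the trailer end.

    Each array is a list of spacer sequences ordered leader -> trailer
    (an assumption of this sketch). The trailer end is taken to hold the
    oldest spacers, so a long shared run suggests a common ancestor.
    """
    shared = 0
    for spacer_a, spacer_b in zip(reversed(array_a), reversed(array_b)):
        if spacer_a != spacer_b:
            break
        shared += 1
    return shared


def unique_spacers(array_a, array_b):
    """Return spacers of array_a that are absent from array_b, in order."""
    present_in_b = set(array_b)
    return [spacer for spacer in array_a if spacer not in present_in_b]


# Hypothetical toy spacers (real spacers are typically around 30 nt or longer).
strain_1 = ["ACGTTGCA", "GGATCCAA", "TTGACCGT", "CATGCATG"]
strain_2 = ["TTTTAAAA", "GGATCCAA", "TTGACCGT", "CATGCATG"]
strain_3 = ["AAGGCCTT", "CCGGAATT"]

print(shared_ancestral_spacers(strain_1, strain_2))  # 3
print(shared_ancestral_spacers(strain_1, strain_3))  # 0
print(unique_spacers(strain_1, strain_2))            # ['ACGTTGCA']
```

In this toy case, strains 1 and 2 share their three oldest spacers and differ only in the most recently acquired one, which would be consistent with recent divergence, whereas strain 3 shares no spacers with strain 1.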