Prior studies examining Western cohorts have identified five bacterial community state types (CSTs), of which types I–III and V are
Lactobacillus-dominant, whereas CST IV comprises polymicrobial communities
[8]. However, studies examining the FRT bacteriome in African cohorts have revealed a different pattern in hierarchical clustering analysis, with more community groupings of high diversity bacteriomes, consistent with the increased prevalence of high diversity FRT bacteriomes in this population
[4][8][18]. To further assess the bacterial communities within the FRT bacteriome seen in African women, 253 vaginal swabs were processed and underwent 16S rRNA gene amplicon sequencing of the V3–V4 region
[19]. A total of 11 samples did not amplify, and 4 failed to achieve sufficient reads for downstream analysis, leaving 238 samples for bacteriome analysis and sequence identification using QIIME2 (
Figure 1A).
Figure 1. Bacteriome profiling by community group of self-collected vaginal swabs from South African women. (A) Relative abundance of the 16 most frequently identified bacterial taxa (y-axis) by sample (x-axis), grouped by community group (CG), bacterial vaginosis (BV) status, visit number, highly active antiretroviral therapy (HAART) status, and human immunodeficiency virus (HIV) status (color key shown). Percent abundance is indicated by gradient key. Using Ward’s linkage hierarchical clustering, samples clustered into five distinct bacterial community profiles termed CG. (B) Bacterial composition (color key shown) for each of the five CGs (x-axis) expressed as relative abundance (y-axis). (C) Bar plot showing the relative abundance of 16S rRNA copies per 10 ng total DNA (y-axis) of L. iners (blue), L. crispatus (purple), L. gasseri (pink), and L. jensenii (green) bacterial species as determined by qPCR of FRT samples (x-axis) that clustered into CG1.
The 16S samples were analyzed using VALENCIA
[20], a program developed to assign 16S vaginal samples to the commonly used CST communities
[8]; it uses similarity scores ranging from 0 (no shared taxa) to 1 (all taxa shared and at the same relative abundance) to assess assignment confidence. VALENCIA successfully assigned
L. iners-dominant samples to CST III (
L. iners-dominant CST) with high confidence (similarity score 91%). However, the remaining CST assignments had low confidence, with an overall mean similarity score of 29.7%. This likely reflects the bias of VALENCIA toward
Lactobacillus-dominant samples. Because samples within this cohort were predominantly high diversity, Ward’s linkage hierarchical clustering was used to classify samples into distinct bacterial communities by composition and relative abundance.
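The clustering step described above can be illustrated with a minimal sketch. It assumes Euclidean distance on a samples-by-taxa matrix of relative abundances and uses SciPy; the distance measure and software used for clustering in this study are not stated here, and the function name `ward_community_groups` and the toy data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def ward_community_groups(rel_abund, n_groups=5):
    """Assign each sample (row) to a cluster by Ward's linkage.

    rel_abund: 2-D array, samples x taxa, of relative abundances.
    n_groups:  maximum number of flat clusters to cut the tree into.
    """
    # Ward's method on a raw observation matrix uses Euclidean distance
    # (an assumption of this sketch, not a detail taken from the study).
    tree = linkage(rel_abund, method="ward")
    # Cut the dendrogram so that no more than n_groups clusters are formed.
    return fcluster(tree, t=n_groups, criterion="maxclust")


# Toy data (hypothetical, not from the study): 6 samples x 3 taxa.
toy = np.array([
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.05, 0.90, 0.05],
    [0.10, 0.85, 0.05],
    [0.05, 0.05, 0.90],
    [0.05, 0.10, 0.85],
])
labels = ward_community_groups(toy, n_groups=3)
print(labels)
```

In this toy example, samples with similar profiles receive the same label; the numeric label values themselves are arbitrary.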
Hierarchical clustering analysis of all visits by community abundance and composition identified five unique bacterial community clusters, referred to herein as bacterial community groups (CGs), which were distinguished by
Lactobacillus-dominance (CG1 and 2) or higher diversity bacteriomes (CG3–5;
Figure 1A). Similar to other African cohorts
[4][8][18], the majority (
n = 173, 72.7%) of subjects had high diversity FRT bacteriomes with a low prevalence of
Lactobacillus-dominant bacteriomes. Samples dominated by a single species, defined as one species accounting for >50% of community composition, made up 42.0% of all samples and mostly clustered in CG1 and 2, while the remaining samples showed no individual dominant species and mainly clustered in CG3–5. CG1 (
n = 15, 6.3%;
Figure 1B), a low diversity FRT bacteriome, was composed almost exclusively of
Lactobacillus species (78%) that could not be further delineated by 16S rRNA gene amplicon sequencing. To further define the predominant
Lactobacillus constituents of CG1, qPCR of 16S rRNA gene sequences from the key FRT
Lactobacillus species
L. iners, L. crispatus, L. gasseri, and
L. jensenii [9][18] was performed, revealing that approximately 50% of samples in CG1 were
L. crispatus-dominant, similar to what has been previously described as CST I
[8] (
Figure 1C). One vaginal swab in CG1 contained only enough volume to perform qPCR for
L. iners and
L. crispatus, and
L. crispatus was most abundant (not shown).
L. jensenii tended to predominate when present (
Figure 1C). CG2 (
n = 54, 22.7%) was
L. iners-dominant, with a few samples showing notable amounts of
Gardnerella and
Prevotella as well (
Figure 1B). The identified high diversity CGs 3–5 were compositionally different from the conventional CST IV subtypes
[8]. The second-largest and most diverse group, CG3 (
n = 64, 26.9%), consisted mainly of
Gardnerella, Prevotella, and
L. iners (
Figure 1B). The smallest high diversity CG was CG4 (
n = 30, 12.6%), in which
Shuttleworthia and
Gardnerella were predominant. CG5 contained the largest number of samples (
n = 75, 31.5%) and was dominated by
Sneathia and
Prevotella (
Figure 1B). CG3, CG4, and CG5 each exhibited significantly higher alpha diversity than CG1 and 2 (
Figure 2A;
p = 0.0001, 0.0065, and <0.0001, respectively). Beta diversity significantly differed between these five bacterial CGs (
Figure 2B;
p = 0.0001).
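The >50% dominance criterion and the notion of alpha diversity used above can be made concrete with a short sketch. The Shannon index is used here only as an assumed example of an alpha diversity metric, since the metric applied in this study is not named in this section; the function names and the two toy profiles are hypothetical and are not study data.

```python
import math


def dominant_taxon(profile, threshold=0.5):
    """Return the taxon exceeding `threshold` of the community, else None.

    profile: dict mapping taxon name -> read count or relative abundance.
    """
    total = sum(profile.values())
    if total <= 0:
        return None
    taxon, value = max(profile.items(), key=lambda kv: kv[1])
    # Strictly greater than the threshold, matching a ">50%" definition.
    return taxon if value / total > threshold else None


def shannon_index(profile):
    """Shannon diversity (natural log); an assumed alpha diversity metric."""
    total = sum(profile.values())
    props = [v / total for v in profile.values() if v > 0]
    return -sum(p * math.log(p) for p in props)


# Hypothetical profiles for illustration only.
low_diversity = {"Lactobacillus": 78, "Gardnerella": 12, "Prevotella": 10}
high_diversity = {"Sneathia": 40, "Prevotella": 35, "Gardnerella": 25}

print(dominant_taxon(low_diversity), shannon_index(low_diversity))
print(dominant_taxon(high_diversity), shannon_index(high_diversity))
```

A profile with no taxon above the threshold returns `None`, corresponding to samples with no individual dominant species, and such a profile yields a higher Shannon index than the single-taxon-dominated one.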