Forensic microbiomics is a promising tool for crime investigation. Geolocation links an individual to a particular place through their microbiota.
The microbiome is not a novel concept: the term was coined during the late 1980s by Whipps et al. to refer to a group of microorganisms living in a defined area. The common factor uniting the various fungi and bacteria in a particular location is the location itself. Today, the term has evolved into two different concepts: ‘microbiota’ is used for a group of microorganisms or viruses that are centred and interact in a certain area, and ‘microbiome’ for the genomic study of a community of microorganisms. However, the microbiome has also been defined as a microbial community that lives in a defined area with certain physical and chemical properties (it includes the microorganisms and their environment), whereas the microbiota comprises an assembly of microorganisms that belong to different kingdoms, including their microbial structures, metabolic reagents or products and mobile or relic DNA/RNA elements. Thus, the original definition as stated by Whipps appears to be the most accurate.
Various methodologies and strategies have been developed to describe and classify microorganisms. Prior to 1960, the methodology was mostly based on morphology, metabolic requirements or pathogenicity. In 1960, numerical taxonomy was introduced into bacterial systematics, with the mol% guanine–cytosine content of DNA as a quantitative measurement: no more than 2–3% variation in guanine–cytosine content was expected within the same species of microorganisms. Chemotaxonomy, the description of new species based on the study of the composition of cell walls or bacterial cytochromes, became common from 1960 to 1980; however, it was supplanted by the arrival of 16S ribosomal DNA (rDNA) gene sequencing during the mid-1990s (see Figure 1). Under this approach, strains with less than 98.7% sequence similarity were considered a new species. Because 16S rDNA is easily isolated, ubiquitous and constrained (constraints are mechanisms that limit or restrict adaptive evolution), it is the most commonly studied marker in the literature. Most recently, the introduction of high-throughput technologies, commonly known as next-generation sequencing (NGS), has allowed whole-genome sequencing, in which new species are defined by the comparison between two chromosomes.
Figure 1. Bacterial ribosome and 16S rRNA.
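The two numerical criteria mentioned above (guanine–cytosine content and 16S sequence similarity) can be illustrated with a minimal Python sketch. The function names are hypothetical, the sequences are assumed to be already aligned, and the cut-offs of 3% and 98.7% are simply the values quoted in the text; real workflows rely on dedicated alignment and taxonomy tools.

```python
def gc_percent(seq: str) -> float:
    """Return the mol% G+C of a DNA sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / total


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns with a gap ('-') in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not pairs:
        raise ValueError("no comparable columns")
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def within_gc_species_range(gc_a: float, gc_b: float, max_diff: float = 3.0) -> bool:
    """True if two GC values differ by no more than the quoted 2-3% range."""
    return abs(gc_a - gc_b) <= max_diff


def below_16s_species_threshold(identity: float, threshold: float = 98.7) -> bool:
    """True if the 16S identity falls below the quoted 98.7% cut-off."""
    return identity < threshold


# Toy sequences (illustrative only, far shorter than a real 16S gene).
identity = percent_identity("ACGTACGTAC", "ACGTACGTAT")
print(gc_percent("ATGC"), identity, below_16s_species_threshold(identity))
```

A result below the threshold only suggests that two strains may belong to different species; as the next paragraph notes, classification in practice combines several kinds of data.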
Currently, there is no official or recognised system for the classification of bacteria; however, the most commonly used system is the polyphasic approach, which includes phenotypic, chemotaxonomic, genotypic and phylogenetic data. Microbiologists use Linnaeus’s binomial naming system to designate microorganisms, with Proteobacteria divided into seven orders, including Chromatiales. Each order includes several genera, and each genus a variety of species; for example, the family Enterobacteriaceae (from the order Enterobacteriales) includes the genera Enterobacter and Yersinia.
Given the importance of the microbiota, the Human Genome Project led to the Human Microbiome Project, whose main objectives are creating a draft database of the human-associated microbiome by 16S rRNA sequencing, studying individuals who represent specific clusters and analysing global human microbiome diversity. As an example of the large variety of microbiota that live in the human body, the most common genera in the stomach, small and large intestine, oral cavity, male and female urogenital systems and skin are shown in Figure 2.
Figure 2. Most common bacterial genera in the different parts of the human body.
Forensic microbiology is a fairly new field in the forensic sciences; it has been developing since the terrorist attacks in the United States in 2001, driven by the fear of a possible biological attack. Forensic microbiologists were concerned with developing tools to identify bioweapons and those who use them. Since then, thanks to major developments in sequencing technologies, its applications have been growing rapidly. There are currently three main areas of interest in forensic science:
Identification. The microbiome has the potential to identify an individual in the population based on their characteristic microbes. It also appears to be possible to identify the items a person has touched, and therefore to define biogeographical patterns in those items.
Post-mortem interval estimation. Research shows that there are distinctive microorganisms that can be sequenced at various time points and body locations during decomposition.
Geolocation. Microbiota differ in composition across geographical locations due to climate, rainfall, altitude, soil and energy sources in the environment; thus, knowledge of the specific bacteria that characterise a certain area could link a person or item to that place.
2. Forensic Microbiome as a Tool for Geolocation
“Every contact leaves a trace”, a principle first established by Locard during the early 20th century, is probably the most important axiom in forensic science. It has been applied by forensic scientists in all forensic fields since then, and it can also be applied to microbiome studies. If a certain place contains a characteristic microbiota that is different from other locations, we can analyse a person’s microbiome and possibly establish where they have been, which is precisely the main principle of microbiome geolocation.
Several studies have been performed to characterise the urban and transit microbiome, demonstrating that certain areas of a city contain unique microbiome profiles. Along these lines, the Earth Microbiome Project (EMP; http://www.earthmicrobiome.org) must be mentioned. It was created in 2010 to sample the whole planet’s microbial communities with the aim of understanding the biogeographic variations and principles that govern microbial communities by using standardised protocols and environmental descriptors in an open science model. The various samples and their connection by similarity (containing similar types of microbial communities) are shown in Figure 3.
Figure 3. Soil microbiome samples collected by the Earth Microbiome Project, connected by similarity.
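The idea of linking samples by community similarity, and of matching a trace sample to the most similar reference location, can be sketched in a few lines of Python. This is an illustration under stated assumptions: Bray–Curtis dissimilarity is used here as one common ecological measure (the text does not say which measure the projects use), the function names are hypothetical, and the genus counts are invented for the example.

```python
def bray_curtis(a: dict, b: dict) -> float:
    """Bray-Curtis dissimilarity between two taxon-count profiles.

    0.0 means identical composition, 1.0 means no shared taxa.
    """
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    if denominator == 0:
        raise ValueError("both profiles are empty")
    return numerator / denominator


def closest_site(query: dict, references: dict) -> tuple:
    """Return (site_name, dissimilarity) of the most similar reference profile."""
    return min(
        ((name, bray_curtis(query, profile)) for name, profile in references.items()),
        key=lambda item: item[1],
    )


# Hypothetical read counts per genus for two reference locations.
REFERENCES = {
    "park": {"Bacillus": 60, "Streptomyces": 30, "Pseudomonas": 10},
    "subway": {"Cutibacterium": 50, "Staphylococcus": 40, "Pseudomonas": 10},
}

# Hypothetical trace sample, for example a swab from a shoe sole.
QUERY = {"Bacillus": 50, "Streptomyces": 35, "Pseudomonas": 15}

site, distance = closest_site(QUERY, REFERENCES)
print(site, round(distance, 3))
```

A nearest-profile match of this kind is only a starting point; it says nothing yet about how strongly the evidence supports one location over another.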
2.1. Soil and Surface Microbiome
The literature has demonstrated that a whole city’s microbiome can be analysed by swab sampling of subway stations, public parks and waterways. Certain species have been found to be linked to certain areas of the city, with a degree of fluctuation observed in some genera during the day. However, an important issue was also discovered: many samples did not match any known organism, which calls attention to the importance of projects such as the Earth Microbiome Project. A combined effort to study the urban metagenome can be found in the Metagenomics and Metadesign of the Subways and Urban Biomes (MetaSUB) International Consortium, which was created with the aim of helping with city planning, public health and architectural design matters. Moreover, a three-year longitudinal study across 60 cities established that there is geographic variation among microbial communities in type and density; thus, it is possible to create a map of the various microbiota that can be found in specific cities. Interestingly, it has been observed that a relationship can be established between a geographic metagenome and organisms’ diversity, acting as a type of ‘molecular echo’. This molecular echo could be useful information for future correlations between the microbiome and forensic entomology. Recent advances in city microbiome studies suggest that certain species are especially useful for geolocation: although some are invariably present in every studied city, some genera were particular to each location.
2.2. In Vivo Microbiome
There are genera of microorganisms that allow researchers to assess a person’s geographical origin. For example, Helicobacter pylori extracted from gastric mucosa has been used to determine the geographical origin of unidentified Asian cadavers, resulting in three different clusters: East Asian, Western and Southeast Asian. Furthermore, studies focusing on the relationship between microbiota and diseases such as obesity have found differences between Colombians, Americans, Europeans, Japanese and South Koreans and their relative disposition to increased body mass index. These differences have also been found in studies conducted to evaluate the relationship between the microbiome and infectious diseases such as Plasmodium falciparum infection, which again found geographical differences among people in their stool microbiota. Other studies performed with human hair microbiota have found differences between samples from California and Maryland; interestingly, scalp hair gave better prediction of geolocation than pubic hair.
Firmicutes and Bacteroidetes appear to follow a certain pattern depending on latitude. In a study of the gut microbiota of healthy individuals, the proportion of Firmicutes and Bacteroidetes was found to differ with latitude: the proportion of Firmicutes is much higher in the Northern Hemisphere than in the Southern Hemisphere. The explanation for these differences in microbiota remains unclear, although there are three proposed models: host genes, the environment itself or host plasticity.
2.3. Machine Learning and Geolocation
Machine learning enables computers to make predictions based on data. It has been used in biomedical research, in cancer diagnosis and with the human microbiome to predict categorical or numerical values by classification and regression, respectively. The program learns from the examples it is given, so each new classification draws on the previous ones. There are numerous machine learning techniques available for the classification of the human microbiota, and random forest is the most commonly used technique in microbiome forensics. A random forest is a combination of tree predictors (a tree is a type of flowchart in which every internal node tests an attribute, every branch is a decision rule and every leaf a result). Each tree depends on the values of a random vector that is sampled independently and with the same distribution for all trees. Roughly, a random forest works as follows (see Figure 4): a data set is introduced into the algorithm, which generates the statistically best decision trees for the given variables, and the algorithm is trained so it can learn from its successes and mistakes (as with any other machine learning algorithm). Then, a query sample is given so that the algorithm makes decisions with the various trees generated, ultimately giving a category result (for example, a country) based on a majority vote of the tree results.
Figure 4. Random forest prediction in microbiome forensics.
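The workflow described above (bootstrap the training data, grow one tree per bootstrap sample, then take a majority vote) can be sketched in plain Python. This is a deliberately simplified toy, not a full random forest: each "tree" is a single-split decision stump on one randomly chosen feature, whereas real implementations grow deeper trees. All function names, the city labels and the abundance values are hypothetical.

```python
import random
from collections import Counter


def bootstrap(rows, labels, rng):
    """Sample rows with replacement; redraw if only one class was drawn."""
    while True:
        idx = [rng.randrange(len(rows)) for _ in rows]
        if len({labels[i] for i in idx}) > 1:
            return [rows[i] for i in idx], [labels[i] for i in idx]


def train_stump(rows, labels, rng):
    """Fit a one-split tree on a randomly chosen feature."""
    feature = rng.randrange(len(rows[0]))
    values = sorted({r[feature] for r in rows})
    best = None
    # Candidate thresholds are midpoints between consecutive observed values.
    for low, high in zip(values, values[1:]):
        thr = (low + high) / 2
        left = [l for r, l in zip(rows, labels) if r[feature] <= thr]
        right = [l for r, l in zip(rows, labels) if r[feature] > thr]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
        if best is None or errors < best[0]:
            best = (errors, thr, left_label, right_label)
    if best is None:  # the feature is constant in this bootstrap sample
        majority = Counter(labels).most_common(1)[0][0]
        return feature, 0.0, majority, majority
    return feature, best[1], best[2], best[3]


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [train_stump(*bootstrap(rows, labels, rng), rng) for _ in range(n_trees)]


def predict(forest, sample):
    """Return the majority-vote label and the per-label vote counts."""
    votes = Counter(
        left if sample[feature] <= thr else right
        for feature, thr, left, right in forest
    )
    return votes.most_common(1)[0][0], votes


# Hypothetical relative abundances of three genera in samples from two cities.
ROWS = [
    (0.70, 0.10, 0.20), (0.65, 0.12, 0.23), (0.72, 0.09, 0.19), (0.68, 0.11, 0.21),
    (0.20, 0.50, 0.30), (0.25, 0.45, 0.30), (0.18, 0.52, 0.30), (0.22, 0.47, 0.31),
]
LABELS = ["city_A"] * 4 + ["city_B"] * 4

forest = train_forest(ROWS, LABELS)
label, votes = predict(forest, (0.80, 0.05, 0.15))
print(label, dict(votes))
```

The toy data are perfectly separable, so every stump agrees; with real microbiome profiles the trees disagree, and the share of votes for each category gives a rough indication of how confident the ensemble is.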
Although random forest is an accurate and unbiased predictor that needs no rigid statistical assumptions about the target variable, it has some disadvantages: greater computational intensity as the calibration data increase, high sensitivity of predictions to the quality of the input data and variations in the interpretation of the obtained model. Several algorithms have been developed to make machine learning more accessible to forensic scientists. An example applied to microbiome geolocation is DeepSpace, which is based on deep neural network classifiers (machine learning algorithms that learn data representations, as when they recognise, for example, a human face in a pixel image) and could correctly classify dust from different countries with 90% accuracy using only fungal data.
Several protocols for sampling, DNA extraction and amplification are available. Swabs are a reliable sampling technique, the DNA extraction methodology is crucial, and a reduction in host DNA is recommended.
The Earth Microbiome Project has designed a protocol for collaborators who want to contribute samples. The protocol depends on the specific sample type, and is summarised in Table 1.
Table 1. Sampling protocol from the Earth Microbiome Project.
|Samples should be collected fresh and then frozen without using any buffer or solution.
|Split the fresh sample into ten 2 mL tubes, each with at least 200 mg of biomass, and store at −80 or −20 °C.
|Take 10 replicate swabs with no buffers or solutions and store at −80 or −20 °C.
|Samples should be shipped on dry ice in an extruded polystyrene foam container or similar.
2.4.2. DNA Extraction
For good-quality environmental samples, there are several commercial platforms for microbiome studies, all of which are magnetic bead-based: KingFisher Flex Purification System (ThermoFisher Scientific, Waltham, MA, USA); epMotion 5075 TMX (Eppendorf, Hamburg, Germany); and Tecan Freedom EVO Nucleic Acid Purification (Tecan, Morrisville, NC, USA). The platforms have been tested with a variety of samples, including faeces, oral, skin, soil and water. The various commercial DNA extraction kits available are shown in Table 2. A special strategy called KatharoSeq has been developed for low-template microbiome samples with as few as 50–500 cells. It is based on Mo Bio PowerSoil and the QIAGEN Ultra Clean kit. Other kits not designed for the microbiome have been validated for forensic microbiome workflows; however, they present the challenge of not eliminating the non-bacterial DNA present in samples.
Table 2. Commercial kits for microbiome DNA extraction.
|MagMAX Microbiome Ultra Nucleic Acid Isolation Kit (ThermoFisher Scientific) 
|KingFisher™ Duo Prime, Flex and Presto
|Invitrogen PureLink Microbiome DNA Purification Kit (ThermoFisher Scientific) 
|QIAamp DNA Microbiome Kit (QIAGEN) 
|MO BIO’s PowerMag® Soil DNA Isolation Kit (QIAGEN) 
|4 × 96 or 32 × 12
The 16S amplification protocol was designed for prokaryotes (bacteria and archaea), given that the 16S rRNA gene is an excellent phylogenetic marker that provides insight into both communities and individual microbial taxa. The protocol’s ability to relate trends of species to hosts or environments has been proven. The polymerase chain reaction primers were developed for the V4 region of 16S rRNA.
3. Challenges and Limitations
Microbiome forensics appears to be a highly promising field in forensic science, but some hurdles remain before it can be accepted as evidence in court, which is a primary goal of forensic scientists, who seek to prove that a person is or is not involved in a criminal event. Daubert v. Merrell Dow Pharmaceuticals (1993) laid the foundations in North American law and international laws regarding how science should be presented in court. More precisely, there are criteria for any science to be presented in court as evidence, including whether the technique has been tested in field conditions, whether it has been subjected to peer review, whether the rate of error is known, whether it is standardised and whether it has been generally accepted in the scientific community. In addition, the calculation of the likelihood ratio, recommended as a best practice by the European Network of Forensic Science Institutes, is not currently available for microbiome forensics.
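For readers unfamiliar with the term, the likelihood ratio in its general form compares the probability of the evidence under two competing hypotheses. The symbols below (E for the evidence, H_p for the prosecution hypothesis, H_d for the defence hypothesis) are conventional notation rather than notation taken from this text, and the numbers are purely illustrative; the block assumes the amsmath package for \text.

```latex
\[
\mathrm{LR} = \frac{P(E \mid H_p)}{P(E \mid H_d)},
\qquad
\text{for example } \frac{0.80}{0.02} = 40 .
\]
```

A value above 1 supports H_p over H_d and a value below 1 supports H_d. For microbiome evidence, the difficulty lies in estimating the two probabilities, which would require reference data on how often a given microbial profile occurs at the location of interest and elsewhere.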