Each component of a GSL is involved in mediating various biological processes and physiological functions. While the ceramide functions as an anchor for the glycan headgroup and modulates its antigenic and assembling properties [17,18], the carbohydrate moiety acts as a receptor (for bacteria and viruses) and as an antigen (in several autoimmune diseases), and is involved in protein interactions, receptor regulation, cell recognition, differentiation and adhesion, cell signaling, apoptosis, and formation of the myelin sheath [19,20,21,22]. Hence, GSLs participate in embryogenesis [19,21], brain development, synaptogenesis [23,24,25,26,27,28,29], antigenicity and immune response [30,31,32,33,34], kidney function [25,29], and hemostasis/thrombosis [29].
Over time, the physiological roles of GSLs have been studied using various biophysical, genetic, cell biology, and biochemical methods. However, the involvement of GSLs in the etiopathogenesis of these diseases is not yet well understood, and it therefore remains a field with much promise for the future.
2. Structural Characteristics, Classification, and Nomenclature of GSLs
Because both the ceramide and the carbohydrate parts vary in structure, GSLs are characterized by great complexity and diversity. The glycan core defines the GSL species, while GSLs with the same glycan but different ceramide structures are considered different lipoforms (lipid forms) of the same GSL species [69,70]. A single lipoform, homogeneous with regard to the fatty acid, sphingoid base, and glycan, is equivalent to what is sometimes called a “molecular species” in the GSL literature.
Every GSL species can present a multitude of lipoforms that differ in ceramide structure: the fatty acid chain length (C14 to C30 or greater, although the most common fatty acids in mammalian GSL ceramides are C18:0, C16:0, and C20:0), the degree of unsaturation, the branching pattern, and the possible hydroxylation of both the fatty acid and the sphingoid base.
Removal of the fatty acid residue from the ceramide moiety, which under physiological conditions is most commonly carried out by acid ceramidases, results in the formation of lyso-GSLs, which can be connected to various human diseases [71].
The basic sphingoid base is sphinganine, or dihydrosphingosine, denoted d18:0 (d stands for ‘di’, i.e., two hydroxyl groups, at positions 1 and 3; 18 is the number of carbon atoms; and 0 is the number of C-C double bonds). In mammals, however, the most commonly encountered sphingoid base is sphingosine (d18:1), which differs from sphinganine by a double bond between C-4 and C-5. Phytosphingosine (t18:0) possesses an additional hydroxyl on C-4 while lacking the double bond.
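The shorthand described above can be read mechanically. The following is a minimal Python sketch, not part of any established library: the function name `parse_sphingoid_base` and the `SphingoidBase` tuple are hypothetical, and the mapping of `t` to three hydroxyl groups is inferred from the description of phytosphingosine (t18:0) as carrying one additional hydroxyl.

```python
import re
from typing import NamedTuple

# Hydroxyl-count prefixes. 'd' (two hydroxyls) is defined in the text;
# 't' (three hydroxyls) is inferred from phytosphingosine, t18:0.
HYDROXYL_PREFIX = {"d": 2, "t": 3}


class SphingoidBase(NamedTuple):
    hydroxyls: int      # number of hydroxyl groups
    carbons: int        # number of carbon atoms
    double_bonds: int   # number of C-C double bonds


def parse_sphingoid_base(code: str) -> SphingoidBase:
    """Parse a shorthand such as 'd18:1' into its three components."""
    match = re.fullmatch(r"([dt])(\d+):(\d+)", code.strip())
    if match is None:
        raise ValueError(f"unrecognized sphingoid base shorthand: {code!r}")
    prefix, carbons, double_bonds = match.groups()
    return SphingoidBase(HYDROXYL_PREFIX[prefix], int(carbons), int(double_bonds))


if __name__ == "__main__":
    for name in ("d18:0", "d18:1", "t18:0"):
        print(name, parse_sphingoid_base(name))
```

The sketch only covers the `d` and `t` prefixes mentioned here; the position of a double bond (for example, C-4/C-5 in sphingosine) is not encoded in this shorthand and is therefore not recovered.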
In animals, ceramides containing sphinganine or phytosphingosine are less abundant, whereas those containing phytosphingosine are very widespread in the GSLs of plants and fungi [1,72,73,74,75,76,77].
Other sphingoid bases with a different number of carbon atoms (d20:0, d20:1, d16:0, d16:1) are also present in eukaryotes [65,78,79], and additional carbon-carbon double bonds occur regularly at different positions in the hydrocarbon chain, generating a variety of sphingadienes and sphingatrienes. Several modifications of sphingosine, such as hydroxyl or oxo groups introduced at different positions of the carbon backbone by oxidation, or methylene or methyl groups added to form rings or branches, are usually found in lower organisms in the tree of life [70].
Even though ceramide variations result in a substantial variety of GSL structures, most of the important structural and even functional classifications are based on the structure of the carbohydrate core. The carbohydrate part can contain different types and numbers of monosaccharide residues and different linkages connecting them, and it can be modified with various functional groups.
In vertebrates, the first monosaccharide attached to ceramide can be glucose (Glc-Cer) or galactose (Gal-Cer), but relatively few Gal-Cer-derived GSLs are found because extension of the Gal-Cer glycan is usually constrained. As a result, the majority of mammalian GSLs are generated from Glc-Cer, while Gal-Cer derivatives are found in invertebrates.
GSLs are divided into two classes on the basis of the physicochemical properties of their glycans: acidic GSLs and neutral (nonionic) GSLs. Acidic GSLs mainly comprise two groups: the sialosyl-GSLs, or gangliosides (which contain one or more sialic acid residues), and the sulfo-GSLs, or sulfatides (which contain sulfate monoesters) [7]. In humans, only two forms of sialic acid usually exist, namely N-glycolylneuraminic acid (Neu5Gc) and N-acetylneuraminic acid (Neu5Ac); the former is present only in trace amounts, either originating from the diet [80,81] or produced by some malignant cells [82,83,84,85,86,87,88,89,90,91,92,93,94,95,96].
An unusual and rare form of sialic acid, deaminoneuraminic acid (KDN), and its glycoconjugates are abundant only in pathogenic bacteria and lower vertebrates; nevertheless, KDN has been identified in some human tumors and in different animal organs, its presence being enhanced under hypoxic conditions [97,98,99].
GSLs are further classified according to the number, sequence, configuration, and linkages of their constituent monosaccharides as ganglio-, isoganglio-, globo-, isoglobo-, lacto-, neolacto-, lactoganglio-, muco-, gala-, neogala-, mollu-, arthro-, schisto-, and spirometo-series [100]. In vertebrates, GSLs are divided into three key groups, the lacto-/neolacto-series, the globo-/isoglobo-series, and the ganglio-/isoganglio-series, which are expressed in tissue-specific patterns (Table 1). This diversity probably reflects important differences in the functions of GSLs. Conventionally, all sialylated GSLs are called “gangliosides”, whether they are derived from the ganglio-series or from the lacto- and globo-series of neutral GSLs [101,102]. In invertebrates, the GSLs found belong to the mollu- and arthro-series (Table 1).
Table 1. GSL classification according to the core carbohydrate structure.
In addition to the high structural variety of GSLs arising from variations in their glycan and lipid portions, chemical modifications such as O-acetylation, fucosylation, lactonization [64,103,104], and the uncommon O-ketalation [105] may also occur in the glycan structure.
In 1997, the IUPAC-IUB Joint Commission on Biochemical Nomenclature proposed a GSL nomenclature in which a set of pre-defined core structures (each specifying the composition of the glycan and the linkages between its monosaccharide components) serves as the base name [106], and which covers all modifications within the glycan headgroup (Table 2).
Table 2. The Svennerholm system of GSL nomenclature and the IUPAC-IUB Joint Commission nomenclature.
An example is the name III²-α-Fuc-Gb3Cer (d18:1/18:0) for the GSL having the composition Fucα1-2Galα1-4Galβ1-4Glcβ1–1′Cer. In this example, the carbohydrate core structure Galα1-4Galβ1-4Glcβ1–1′ is denoted Gb3 (the base name). Since the GSL contains a monosaccharide extending beyond this core structure, the extension is written as a prefix to the base name. The prefix contains (1) a Roman numeral indicating the monosaccharide that is modified, counting the monosaccharide closest to the ceramide moiety as “I” (in this case Fuc is attached to Gal at position III); (2) a superscript on the Roman numeral indicating which hydroxyl on that monosaccharide is modified (2 in the above example); (3) the configuration of the linkage (α); and (4) the abbreviated name of the attached monosaccharide unit (Fuc). While this terminology resolves ambiguity in the chemical literature, particularly when dealing with numerous GSL isomers, it is too intricate for everyday use and provides no framework for naming the lipid component.
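The four parts of such a prefix can be separated programmatically. The sketch below is illustrative only: `parse_iupac_prefix` is a hypothetical helper, the superscript is assumed to be typed as a plain digit directly after the Roman numeral (as in "III2"), and only a single substituent on a base name ending in "Cer" is handled.

```python
import re

# Roman numerals for residue positions I to X (enough for this sketch).
ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5,
         "VI": 6, "VII": 7, "VIII": 8, "IX": 9, "X": 10}

# Assumed input shape: <Roman><hydroxyl>-<anomer>-<sugar>-<core>Cer
_NAME = re.compile(
    r"(?P<roman>[IVX]+)(?P<hydroxyl>\d+)-(?P<anomer>[αβ])-"
    r"(?P<sugar>[A-Za-z0-9]+)-(?P<core>[A-Za-z0-9]+)Cer"
)


def parse_iupac_prefix(name: str) -> dict:
    """Split a name such as 'III2-α-Fuc-Gb3Cer' into its components."""
    match = _NAME.fullmatch(name.strip())
    if match is None or match["roman"] not in ROMAN:
        raise ValueError(f"unsupported GSL name: {name!r}")
    return {
        "residue": ROMAN[match["roman"]],    # modified monosaccharide, counted from ceramide
        "hydroxyl": int(match["hydroxyl"]),  # hydroxyl position carrying the substituent
        "anomer": match["anomer"],           # configuration of the linkage
        "sugar": match["sugar"],             # attached monosaccharide
        "core": match["core"],               # base name of the core structure
    }


if __name__ == "__main__":
    print(parse_iupac_prefix("III2-α-Fuc-Gb3Cer"))
```

Names with several substituents, or with the ceramide composition appended in parentheses, would need a more complete grammar than this single regular expression.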
The established approach to GSL naming, initially introduced by Svennerholm and still prevalent today [107,108], relies on the series designation along with the type, number, linkage, and position of the sugar units within the glycan structure (Table 2). For example, gangliosides are commonly abbreviated using two letters and a numeral, for instance, GM1, GD1, GT1, GQ. Here, G signifies the ganglio-series; the second letter (M, D, T, Q, P) indicates the number of sialic acid units (single, double, triple, quadruple, pentuple) in the glycan; and the numeral (1, 2, 3, 4) corresponds to the migration order in thin-layer chromatography (TLC) relative to the starting point (Rf values: GM4 > GM3 > GM2 > GM1). Migration depends on the composition and length of the glycan, and the neutral core oligosaccharides were shown to move farther in TLC as the number increased. Indexes such as a, b, and c denote the biosynthetic pathway, with the attachment position of the sialic acid differing among the three pathways: e.g., in GM1b, the sialic acid residue is linked to the Gal residue at the end of the oligosaccharide chain, while in GM1a, it is bound to the inner Gal residue.
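The Svennerholm abbreviation is regular enough to decode with a few lines of code. The following Python sketch assumes the pattern described above (G, one of M/D/T/Q/P, a numeral, and an optional pathway index a, b, or c); the names `parse_ganglioside` and `Ganglioside` are hypothetical and exist only for illustration.

```python
import re
from typing import NamedTuple, Optional

# Second letter of the abbreviation mapped to the number of sialic acid units.
SIALIC_ACIDS = {"M": 1, "D": 2, "T": 3, "Q": 4, "P": 5}


class Ganglioside(NamedTuple):
    sialic_acids: int       # count encoded by the second letter
    tlc_index: int          # numeral reflecting TLC migration order
    pathway: Optional[str]  # 'a', 'b', or 'c' when given, otherwise None


def parse_ganglioside(abbreviation: str) -> Ganglioside:
    """Decode a Svennerholm abbreviation such as 'GM3' or 'GD1a'."""
    match = re.fullmatch(r"G([MDTQP])(\d)([abc]?)", abbreviation.strip())
    if match is None:
        raise ValueError(f"not a recognized ganglioside abbreviation: {abbreviation!r}")
    letter, numeral, pathway = match.groups()
    return Ganglioside(SIALIC_ACIDS[letter], int(numeral), pathway or None)


if __name__ == "__main__":
    for name in ("GM3", "GM1a", "GD1", "GT1"):
        print(name, parse_ganglioside(name))
```

The decoder recovers only what the abbreviation itself encodes; it does not reconstruct the glycan sequence, which would require the series tables.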
Because different MS techniques provide varying degrees of molecular structural insight, the Lipid MAPS consortium proposed an extensive classification system in 2005 and created a comprehensive structural database of biologically significant lipids, GSLs included. The database contains an entry for each GSL species, giving both the systematic nomenclature and the commonly used name, together with information about the identifiable sphingoid base and N-linked fatty acids. A shorthand notation based on Liebisch et al. [109] is used for annotating MS-data-derived lipid structures at three levels: species level, molecular species level, and full structure level. For example, the ganglioside GM3 (d18:1/24:0) is named according to this shorthand notation as follows:
(1) NeuAcHex2Cer, 42:1;O2, at the species level (when the number of hydroxyl groups within the sphingoid base is uncertain, the N-linked fatty acyl and the sphingoid base are reported together as the total carbon number, the total number of double bonds, and the number of oxygen atoms in the ceramide moiety, and the monosaccharides are left unidentified); (2) NeuAcHex2Cer, 18:1;O2/24:0, at the molecular species level (if the structures of both the fatty acid and the long-chain base are known, except for the position and stereochemistry of the double bond); (3) NeuAc-Gal-Glc-Cer, 18:1(4E);3OH/24:0 (GM3), at the full structure level (if the position and stereochemistry of the double bond are also identified). This widely acknowledged representation applies exclusively to oligosaccharide headgroups comprising a maximum of two monosaccharide units; it does not encompass the more intricate GSLs.
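The relationship between the molecular species level and the species level is simple arithmetic: carbons, double bonds, and oxygen atoms of the two chains are summed. The Python sketch below illustrates this for the ceramide part only. The function name `species_level` is hypothetical, and the handling of a bare ";O" suffix as one oxygen atom is an assumption about the shorthand rather than something stated in this text.

```python
import re

# One chain in the shorthand: carbons:double_bonds with an optional ;O<n> suffix.
_CHAIN = re.compile(r"(\d+):(\d+)(?:;O(\d*))?")


def _parse_chain(chain: str) -> tuple:
    """Return (carbons, double_bonds, oxygens) for one chain, e.g. '18:1;O2'."""
    match = _CHAIN.fullmatch(chain.strip())
    if match is None:
        raise ValueError(f"unrecognized chain notation: {chain!r}")
    carbons, double_bonds, oxygens = match.groups()
    if oxygens is None:
        n_oxygen = 0             # no ;O suffix
    elif oxygens == "":
        n_oxygen = 1             # assumption: bare ';O' means one oxygen atom
    else:
        n_oxygen = int(oxygens)
    return int(carbons), int(double_bonds), n_oxygen


def species_level(molecular_species: str) -> str:
    """Collapse 'base/fatty acid' (e.g. '18:1;O2/24:0') to the species level."""
    base, fatty_acid = molecular_species.split("/")
    chains = [_parse_chain(base), _parse_chain(fatty_acid)]
    carbons = sum(c[0] for c in chains)
    double_bonds = sum(c[1] for c in chains)
    oxygens = sum(c[2] for c in chains)
    suffix = "" if oxygens == 0 else (";O" if oxygens == 1 else f";O{oxygens}")
    return f"{carbons}:{double_bonds}{suffix}"


if __name__ == "__main__":
    # GM3 (d18:1/24:0) from the text: 18 + 24 carbons, one double bond, two oxygens.
    print(species_level("18:1;O2/24:0"))
```

For the GM3 example, the summed ceramide descriptor matches the species-level annotation 42:1;O2 given above. Going the other way is not possible, which is why the species level is used when the individual chains are not resolved.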