Glycans—a broad term describing carbohydrates, including oligosaccharides and polysaccharides—are the third class of important biological macromolecules following nucleic acids and proteins. Glycans are found in all domains of life and in viruses. They can exist as free sugars, but are more commonly found as glycoconjugates, including proteoglycans, glycoproteins, and glycolipids. Glycans are involved in a wide variety of physiological functions and have implications in numerous infectious and non-infectious diseases, making them diagnostic and therapeutic targets. Additionally, glycans are targeted in various biotechnological and industrial applications. The broad applications of glycans have spurred interest in the development of glycan binding proteins (GBPs).
GBPs include lectins, antibodies, pseudoenzymes, and carbohydrate-binding modules (CBMs). Lectins are non-immunoglobulin proteins containing at least one non-catalytic domain that exhibits reversible carbohydrate binding. CBMs are similar to lectins, but are small binding domains typically found in lectins or carbohydrate-active enzymes (CAZymes). CAZymes can be further classified into glycoside hydrolases, glycosyltransferases, polysaccharide lyases, and carbohydrate esterases—detailed information on these enzymes is available through the Carbohydrate Active Enzymes (CAZy) database.
The definition of lectins has changed over the years[1][2], but lectins can generally be defined as proteins that bind carbohydrates. Hence, most GBPs could be categorized as lectins; however, for the purpose of this review, we have categorized certain GBPs separately from lectins because of their distinct characteristic folds and properties. A summary of the scaffolds discussed in this section, along with example scaffolds for which structural and binding data are available, is given in Figure 1.
Lectins are carbohydrate binding proteins that are placed into sub-categories based on their folds and functions: P-type, I-type, L-type, R-type, C-type, and galectins. Lectins display a wide variety of physiological functions and have biotechnological and biomedical applications; they have already been used in the detection and targeted treatment of human diseases such as cancer[3][4]. Here we provide a brief overview of lectins and some examples of their use in GBP engineering. Detailed information on the various sub-categories of lectins can be found in the comprehensive text Essentials of Glycobiology (specifically chapters 28 to 38)[5].
Generally, lectins have relatively low affinities for their glycan targets, with dissociation constants (KD) in the micromolar range[6][7]. This may be explained by the shallow binding interface observed in most lectins, which leaves the bound glycan more exposed to competing interactions with solvent. The shallow interface may also explain the promiscuity of lectins: glycans with similar structures often bind the same lectins. In nature, the low-affinity problem is overcome by oligomerization and multivalency; in biological settings, lectins tend to assemble into oligomeric structures containing multiple binding sites, which allows a higher overall binding strength (avidity) to be reached. The relatively low affinities and promiscuity of monomeric lectins must be considered when selecting scaffolds for GBP engineering; however, lectins with improved binding specificity and affinity have been developed[8]. One advantage of using lectins over other protein scaffolds is that databases such as UniLectin3D can be searched for lectin scaffolds by glycan target[9].
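To make the affinity argument concrete, the sketch below uses the standard single-site binding isotherm, θ = [L] / ([L] + KD), assuming a monovalent 1:1 interaction with the glycan in excess (no ligand depletion). The function name and the example concentrations are illustrative assumptions, and the model deliberately ignores the multivalency that lectins exploit in nature.

```python
def fraction_bound(ligand_conc_m: float, kd_m: float) -> float:
    """Fractional occupancy of one binding site (1:1 model, ligand in excess).

    Both arguments are in mol/L. Multivalency/avidity is NOT modelled.
    """
    if ligand_conc_m < 0 or kd_m <= 0:
        raise ValueError("concentration must be >= 0 and KD must be > 0")
    return ligand_conc_m / (ligand_conc_m + kd_m)


if __name__ == "__main__":
    glycan = 1e-6  # 1 µM free glycan (illustrative value)
    examples = [
        ("weak lectin-like site, KD = 100 µM", 100e-6),
        ("KD = 1 µM", 1e-6),
        ("tight binder, KD = 10 nM", 10e-9),
    ]
    for label, kd in examples:
        print(f"{label}: occupancy {fraction_bound(glycan, kd):.3f}")
```

At 1 µM glycan, a site with KD = 100 µM is only about 1% occupied, whereas a 10 nM binder is about 99% occupied. Oligomerization raises the apparent binding strength beyond what this single-site model captures.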
Carbohydrate binding modules (CBMs)[10], also known as carbohydrate binding domains (CBDs), are non-catalytic protein domains generally found in carbohydrate-active enzymes (CAZymes). CBMs share low sequence identity[10], but they have conserved tertiary folds that are categorized by binding site topology as types A, B, or C [11]. Type A CBMs are characterized by a planar hydrophobic surface, type B by an extended binding cavity, and type C by a short binding pocket; for more information on the structures of the CBM types, please see the extensive review by Armenta et al. [10]. For the purposes of GBP engineering, type A CBMs are suited to binding insoluble, crystalline carbohydrates because of their exposed planar binding interface [12]. In contrast, type B CBMs bind oligosaccharides[13], and type C CBMs bind mono- and disaccharides[14]. One attractive aspect of CBMs as GBP scaffolds is their modularity: because of their small size, CBMs can be arranged in tandem to increase specificity or to allow for multiple binding targets. Additionally, a variety of well-characterized CBMs are available as scaffolds, and, not surprisingly, CBMs have been used to engineer a variety of GBPs with altered binding characteristics[13][15][16][17].
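The type A/B/C distinction can be read as a first-pass selection rule that maps the physical form of the glycan target to a binding-site topology. The sketch below encodes that rule as a lookup table. The `GlycanForm` categories and the `suggest_cbm_type` helper are hypothetical names introduced for illustration; an actual scaffold choice would still depend on structural and binding data for the specific glycan.

```python
from enum import Enum


class GlycanForm(Enum):
    """Hypothetical coarse categories for a glycan target."""
    INSOLUBLE_CRYSTALLINE = "insoluble, crystalline polysaccharide"
    OLIGOSACCHARIDE = "oligosaccharide / internal glycan chain"
    MONO_OR_DISACCHARIDE = "mono- or disaccharide / chain end"


# Rule taken from the CBM type descriptions in the text:
# type A = planar hydrophobic surface, type B = extended cavity,
# type C = short pocket.
CBM_TYPE_FOR_TARGET = {
    GlycanForm.INSOLUBLE_CRYSTALLINE: ("A", "planar hydrophobic surface"),
    GlycanForm.OLIGOSACCHARIDE: ("B", "extended binding cavity"),
    GlycanForm.MONO_OR_DISACCHARIDE: ("C", "short binding pocket"),
}


def suggest_cbm_type(target: GlycanForm) -> str:
    """Return the CBM type whose binding-site topology matches the target form."""
    cbm_type, topology = CBM_TYPE_FOR_TARGET[target]
    return f"type {cbm_type} ({topology})"


if __name__ == "__main__":
    for form in GlycanForm:
        print(f"{form.value}: {suggest_cbm_type(form)}")
```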
In nature, a number of GBPs have evolved from enzymes through the loss of catalytic activity while retaining binding function. These can be defined as pseudoenzymes, which are catalytically inactive proteins related to ancestral enzymes[18]. Pseudoglycosidases are a type of pseudoenzyme that evolved from glycosidases (glycoside hydrolases). These proteins, which bind glycans but cannot hydrolyze glycosidic linkages, can also be characterized as lectins, since glycan binding is their primary function. A few notable examples of pseudoglycosidases that act as GBPs have been observed in nature. In animals, chitinase-like proteins such as human YKL-39 are pseudoglycosidases (GH18 homologues) with enigmatic biological functions; they have been shown to bind chitooligosaccharides as part of their apparent role in modulating the innate immune response[19][20]. Another animal example is found in the α- and β-klotho proteins, each of which forms part of a receptor complex responsive to fibroblast growth factors (FGFs); here, catalytically inactive GH1-like tandem repeats of the klotho proteins bind to “sugar-mimicking motifs” of FGF19 and FGF21[21]. In protozoans, the CyRPA protein of Plasmodium falciparum, part of the invasion complex that allows the malaria-causing parasite to bind and enter red blood cells, appears to be a catalytically inactive pseudoglycosidase related to GH33 sialidases[22][23][24]. Pseudoenzymes evolved from other types of enzymes can also bind glycans. For example, PgaB in E. coli […] N-acetylglucosamine [25].
Although pseudoenzymes can, in theory, be used as glycan binding scaffolds, there are no published works on engineering pseudoenzyme scaffolds into novel GBPs as of 2020. This may be due to a lack of known pseudoenzyme scaffolds, but it may also reflect the widespread availability of mutagenesis techniques that allow active enzymes to be inactivated directly. The use of enzymes as GBP scaffolds is discussed in greater detail in the following section.
[…] E. coli […] dissociation constant (KD) of 191 nM [26][27]. This engineered GBP has been applied as a highly sensitive tool for detecting polysialic acid[28][29]. In another example, mutation of a CE2 carbohydrate esterase from Clostridium thermocellum has also been shown to produce a catalytically inactive GBP with micromolar affinity[30]. A single amino acid replacement of the Ct[…] KD […] “Lectenz®” (lectins engineered from enzymes) [31]. The company has produced several Lectenz® through site-directed mutagenesis and computationally guided directed evolution. One advantage of using CAZyme scaffolds is that carbohydrate-processing enzymes tend to be more specific for their ligands than lectins, although this varies between proteins.
Antibody-based scaffolds consist of immunoglobulin or immunoglobulin-like protein folds. A variety of antibody-based scaffolds are found in animals, but those most commonly used for developing antigen binding proteins are immunoglobulin G (IgG) and, more recently, camelid antibodies[32]. Producing naturally occurring antibodies is time consuming and costly because it requires immunizing an animal; however, antibody-based scaffolds have been engineered that circumvent the use of animals. These include, but are not limited to, antigen binding fragments (Fab)[33], single-chain variable fragments (scFvs)[34], diabodies[35], monobodies, and nanobodies [36]. There has been a concerted effort to produce antibodies against tumor-associated carbohydrate antigens (TACAs); in total, antibodies have been developed against about 250 distinct glycan targets [37]. Antibody scaffolds offer certain advantages over lectins, including a larger binding interface that can accommodate longer glycan epitopes and generally more selective binding conferred by the complementarity-determining regions (CDRs). However, glycans are poorly immunogenic, and producing an anti-glycan antibody can be costly, labour-intensive, and time consuming. Additionally, anti-glycan antibodies generally have lower affinities (KD […] KD in the nanomolar range). Within the last decade, phage display has provided methods for overcoming some of these limitations, yielding antibodies with higher affinity for their glycan targets[38]. Even so, this approach still relies on an initial scaffold, obtained by immunization, as the base for improving affinity and selectivity.
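Affinity differences between weak (micromolar) and tight (nanomolar) binders can be expressed as standard binding free energies using ΔG° = RT ln(KD / c°), with c° = 1 M. The sketch below assumes 298.15 K, and the KD values in it are illustrative round numbers, not measurements from the cited studies; `binding_free_energy` is a hypothetical helper name.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Uses dG = RT ln(KD / c0) with a 1 M standard state, so KD is in mol/L.
    More negative values mean tighter binding.
    """
    if kd_molar <= 0:
        raise ValueError("KD must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


if __name__ == "__main__":
    weak = binding_free_energy(1e-6)   # 1 µM, illustrative
    tight = binding_free_energy(1e-9)  # 1 nM, illustrative
    print(f"KD = 1 µM: {weak:.2f} kcal/mol")
    print(f"KD = 1 nM: {tight:.2f} kcal/mol")
    print(f"difference: {weak - tight:.2f} kcal/mol")
```

At room temperature, each 1000-fold improvement in KD corresponds to roughly 4.1 kcal/mol of additional binding free energy, which gives a sense of how much interface optimization separates a micromolar binder from a nanomolar one.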
Table 1 is by no means a complete list of scaffolds; it gives examples of available scaffolds and the characteristics that need to be taken into consideration. Finding scaffolds suited to a glycan of choice can be challenging, and we recommend using UniLectin3D or equivalent GBP databases as a starting point for finding potential scaffolds [9].
| Scaffold Category | Scaffold Sub-Category | Description | Origin | Example Protein (PE) | PE Length | PE Ligand | PE Oligomeric State | PE Multivalency |
|---|---|---|---|---|---|---|---|---|
| Lectins | P-type | Lectins that bind mannose 6-phosphate | Animal | Bovine CD-MPR binding domain [56] | 154 aa | Mannose 6-phosphate | Dimer | Monovalent |
| | I-type | Proteins homologous to members of the immunoglobulin superfamily (IgSF) | Vertebrata | hCD22 domains 1-3 [57] | 324 aa | Sialoglycans | Monomer | Monovalent |
| | L-type | Proteins structurally similar to lectins found in the seeds of leguminous plants | All domains of life and viruses | | 237 aa | Trimannoside-containing oligosaccharides [59] | Oligomer | Divalent |
| | R-type | Proteins structurally similar to the carbohydrate recognition domain (CRD) of ricin | All domains of life and viruses | Ricin [60] | 267 aa | β1,4 galactose, N-acetylgalactosamine | Dimer | Divalent |
| | C-type | Ca²⁺-dependent proteins that share primary and secondary structural homology in their CRDs | Animal | C-type domain of murine DCIR2 [61] | 129 aa | N-glycans | Monomer | Monovalent |
| | Galectin | Globular proteins that share primary structural homology in their CRDs | Animal | hGalectin-3 [62] | 146 aa | N-acetyllactosamine | Monomer | Monovalent |
| Carbohydrate Binding Modules (CBMs) | Type A | Protein domains that bind crystalline surfaces of cellulose and chitin | All domains of life and viruses | CBM from Cel7A [63] | 36 aa | Cellulose | Monomer | Monovalent |
| | Type B | Protein domains that bind endo-type glycan chains | All domains of life and viruses | CBM4-2 from xylanase [33] | 150 aa | Xylans, β-glucans | Monomer | Monovalent |
| | Type C | Protein domains that bind exo-type glycan chains | All domains of life and viruses | Cp-CBM 32 of hexosaminidase [64] | 150 aa | N-acetyllactosamine | Monomer | Monovalent |
| Pseudoenzymes | Pseudoglycosidase | Carbohydrate binding proteins that evolved from glycosidases but are no longer catalytically active | Possibly all domains of life* | hYKL-39 [36] | 365 aa | Chitooligosaccharides | Monomer | Monovalent |
| | Pseudoesterase | Carbohydrate binding proteins that evolved from carbohydrate esterases but are no longer catalytically active | Possibly all domains of life* | C-terminal domain of PgaB [42] | 367 aa | Poly-1,6-N-acetylglucosamine | Monomer | Monovalent |
| Carbohydrate-Active Enzymes (CAZymes) | Glycoside hydrolase | Enzymes that cleave glycosidic linkages | All domains of life and viruses | Endo-NF (GH58) [44] | 811 aa | Polysialic acid | Trimer | Multivalent |
| | Carbohydrate esterase | Enzymes that hydrolyze ester linkages of acyl groups attached to carbohydrates | All domains of life and viruses | CtCE2 [47] | 333 aa | Cellooligosaccharides | Monomer | Monovalent |
| | Other CAZymes (glycosyltransferase, polysaccharide lyase, auxiliary activities) | Enzymes involved in the assembly, breakdown, and modification of carbohydrates | All domains of life and viruses | − | − | − | − | − |
| Antibodies | N/A | Naturally or synthetically produced proteins with an immunoglobulin or immunoglobulin-derived structure | Vertebrata | hu3S193 [65] | LC: 219 aa; HC: 222 aa | Lewis Y | Dimer | Divalent |
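Because the table is meant as a starting point for scaffold selection, it can help to treat its rows as structured records that can be filtered by ligand and size. The sketch below transcribes a subset of the example proteins by hand and filters them with a case-insensitive substring match; the `Scaffold` record and the `find_scaffolds` helper are hypothetical, and this is not an interface to UniLectin3D or any other database.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Scaffold:
    """One example-protein row of Table 1 (subset of columns)."""
    category: str
    subcategory: str
    example: str
    length_aa: Optional[int]  # None where no single chain length applies
    ligand: str
    oligomeric_state: str
    valency: str


# Illustrative subset of Table 1, transcribed by hand.
TABLE_1: Sequence[Scaffold] = (
    Scaffold("Lectin", "P-type", "Bovine CD-MPR binding domain", 154,
             "Mannose 6-phosphate", "Dimer", "Monovalent"),
    Scaffold("Lectin", "I-type", "hCD22 domains 1-3", 324,
             "Sialoglycans", "Monomer", "Monovalent"),
    Scaffold("Lectin", "R-type", "Ricin", 267,
             "β1,4 galactose, N-acetylgalactosamine", "Dimer", "Divalent"),
    Scaffold("Lectin", "Galectin", "hGalectin-3", 146,
             "N-acetyllactosamine", "Monomer", "Monovalent"),
    Scaffold("CBM", "Type A", "CBM from Cel7A", 36,
             "Cellulose", "Monomer", "Monovalent"),
    Scaffold("CBM", "Type C", "Cp-CBM 32 of hexosaminidase", 150,
             "N-acetyllactosamine", "Monomer", "Monovalent"),
    Scaffold("CAZyme", "Glycoside hydrolase", "Endo-NF (GH58)", 811,
             "Polysialic acid", "Trimer", "Multivalent"),
    Scaffold("Antibody", "N/A", "hu3S193", None,  # LC 219 aa + HC 222 aa
             "Lewis Y", "Dimer", "Divalent"),
)


def find_scaffolds(ligand_query: str,
                   scaffolds: Sequence[Scaffold] = TABLE_1,
                   max_length_aa: Optional[int] = None) -> List[Scaffold]:
    """Return scaffolds whose ligand mentions the query, optionally capped by length."""
    query = ligand_query.casefold()
    hits = []
    for s in scaffolds:
        if query not in s.ligand.casefold():
            continue  # ligand does not match
        if max_length_aa is not None and (s.length_aa is None
                                          or s.length_aa > max_length_aa):
            continue  # too long, or no single chain length to compare
        hits.append(s)
    return hits


if __name__ == "__main__":
    for s in find_scaffolds("N-acetyllactosamine"):
        print(f"{s.example} ({s.category}, {s.subcategory}): "
              f"{s.length_aa} aa, {s.valency}")
```

Adding a length cap (for example `max_length_aa=148`) narrows the N-acetyllactosamine hits to the smaller galectin scaffold, mirroring the kind of trade-off between size, valency, and ligand that the table is meant to expose.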