4. Gene Cloning and Design
Gene cloning typically begins with choosing a purification method. Affinity chromatography can exploit the inherent characteristics of the protein, for example through immobilized ligand or substrate-mimic chromatography with compounds such as Cibacron Blue F3GA [28] or cyclic peptide-based ligands [29]. Alternatively, a purification tag can be added to facilitate purification, such as a maltose-binding protein (MBP) tag, a glutathione-S-transferase (GST) tag, or, most commonly, a hexahistidine tag (his-tag), which is purified by immobilized metal affinity chromatography (IMAC) [30]. If the structure of a protein with similar characteristics is known, it can be used to assess whether a tag can be added to the N- or C-terminus. Alternatively, structure prediction software such as Phyre2 [31] can be used. Although N-terminal histidine tags are highly valuable and extensively used, they can introduce heterogeneity into the final product through variable (phospho)gluconoylation of the N-terminus [32].
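As a small illustration of tag placement, the Python sketch below adds a hexahistidine (His6) tag to either terminus of an amino-acid sequence. The function name `add_his_tag`, the optional `linker` argument, and the choice to keep the initiator methionine as residue 1 are assumptions made for this example, not a published protocol; whether a given terminus tolerates a tag still has to be judged from structural information.

```python
# Hypothetical helper (not from the cited work): attach a His6 tag to a
# protein sequence written in one-letter amino-acid code.
HIS6 = "HHHHHH"


def add_his_tag(protein: str, terminus: str = "C", linker: str = "") -> str:
    """Return `protein` with a His6 tag (and optional linker) at the N- or C-terminus."""
    protein = protein.upper()
    if terminus == "N":
        # Keep the initiator methionine as residue 1; the tag follows it.
        body = protein[1:] if protein.startswith("M") else protein
        return "M" + HIS6 + linker.upper() + body
    if terminus == "C":
        return protein + linker.upper() + HIS6
    raise ValueError("terminus must be 'N' or 'C'")


print(add_his_tag("MKTAYIAK", terminus="N", linker="GS"))  # MHHHHHHGSKTAYIAK
print(add_his_tag("MKTAYIAK"))                            # MKTAYIAKHHHHHH
```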
After the protein construct has been designed, the gene itself is designed for maximum expression, which depends strongly on cellular homeostasis, that is, on maintaining a delicate balance within the cell. Combining a high-copy-number plasmid with a strong promoter often leads to a reduced protein yield [33]. This is attributed to the excessive allocation of cellular resources to synthesizing plasmid DNA and mRNA: the abundance of mRNA exceeds the capacity of the translation machinery, resulting in suboptimal protein production. Anticipating the toxic effects of overexpressed recombinant proteins on E. coli cells helps avoid these problems [34].
Transcriptome analysis can identify the genes responsible for the cellular stress response so that they can be removed. Blocking the cell surface receptor (CSR) reduces the number of growth-essential genes whose expression is down-regulated [35].
An increasingly popular way to avoid plasmid loss during long fermentation processes is to integrate genes into the bacterial chromosome. Despite their drawbacks, however, plasmids are still widely used as they are, because they are faster and cheaper to work with. The choice of plasmid for protein production depends on its copy number, which is set mainly by the origin of replication, as well as on its promoter and selection marker. To make the most of the cellular resources devoted to protein production, plasmid copy number must be balanced against promoter strength, taking the medium conditions into account.
Synthetic biology has made notable progress in growth-decoupled recombinant protein production. One approach co-expresses Gp2, a bacteriophage-derived peptide that inhibits Escherichia coli RNA polymerase. Because host transcription is inhibited while the target gene is still transcribed by T7 RNA polymerase, which Gp2 does not inhibit, metabolic resources are redirected toward synthesizing the intended protein.
Besides the plasmid, the origin of the gene is a crucial factor. Traditionally, the gene was obtained directly from the source organism, usually from a cDNA library generated from messenger RNA (mRNA) by reverse transcription polymerase chain reaction (RT-PCR), which avoids including introns. Although this approach can be fast, inexpensive, and efficient, it can cause problems arising from differences in translation initiation and codon usage between prokaryotes and eukaryotes.
Because prices have dropped substantially, synthesizing a gene is now cheaper than the combined labor and material costs of cloning it from a complementary DNA (cDNA) library. Synthetic genes can also mitigate the potentially harmful consequences of another difference between eukaryotes and prokaryotes, namely their protein translation rates [36]. In prokaryotes such as Escherichia coli (E. coli), transcription and translation rates are coupled [37]: transcription proceeds at about 50 nucleotides per second, whereas translation proceeds at about 16 amino acids per second.
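The coupling follows from a simple unit conversion: each amino acid is encoded by one three-nucleotide codon, so a ribosome adding 16 amino acids per second advances about 48 nucleotides per second along the mRNA, nearly the speed of RNA polymerase. A minimal Python sketch of that arithmetic, using only the rates quoted in the text (the function and variable names are illustrative):

```python
# Unit conversion behind transcription-translation coupling in E. coli,
# using the rates quoted in the text.
NT_PER_CODON = 3  # one amino acid is encoded by one 3-nt codon


def ribosome_speed_nt_per_s(aa_per_s: float) -> float:
    """Convert a translation rate (amino acids/s) to ribosome movement along mRNA (nt/s)."""
    return aa_per_s * NT_PER_CODON


transcription_nt_per_s = 50.0  # RNA polymerase elongation rate from the text
translation_aa_per_s = 16.0    # ribosome elongation rate from the text

ribosome_nt_per_s = ribosome_speed_nt_per_s(translation_aa_per_s)
print(f"RNA polymerase: {transcription_nt_per_s:.0f} nt/s, ribosome: {ribosome_nt_per_s:.0f} nt/s")
print(f"ribosome/RNAP speed ratio: {ribosome_nt_per_s / transcription_nt_per_s:.2f}")  # 0.96
```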
4.1. Ribosomes
In 1987 [38], a modified ribosome system was developed for protein production in E. coli by altering the Shine–Dalgarno (SD) sequence of the mRNA and the corresponding anti-SD sequence of the 16S ribosomal RNA (rRNA). Other alternative ribosome systems include the orthogonal riboswitch system [39], the RiboTite system, and the Ribo-T system [40][41]. The riboswitch system enables tunable, dose-dependent co-expression of several genes in response to small synthetic molecules. The RiboTite system, an extension of riboswitch technology, has been shown to synchronize protein translation rates with protein release. The Ribo-T system uses a hybrid rRNA in which the small- and large-subunit rRNA sequences are covalently joined by short RNA linkers into a single translating unit. This tethered orthogonal ribosome-mRNA system has been shown to sustain bacterial growth in the absence of wild-type ribosomes, and a recent study has described an improved tethered version [42].
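The principle behind these modified ribosomes is base pairing between the SD sequence and the anti-SD sequence at the 3′ end of 16S rRNA: an altered SD is recognized only by ribosomes carrying a complementary altered anti-SD. The Python sketch below checks this with exact Watson-Crick complementarity. The wild-type 16S tail is the commonly cited E. coli 3′-terminal sequence; the mutant tail and the "orthogonal" SD are hypothetical sequences invented for the example, not those of the systems cited above, and real pairing (G-U wobble, partial matches) is more permissive than this model.

```python
# Simplified SD/anti-SD pairing check: exact Watson-Crick complementarity only
# (G-U wobble pairs and partial pairing are ignored). Sequences are RNA, 5'->3'.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

WT_16S_TAIL = "GAUCACCUCCUUA"   # 3' end of E. coli 16S rRNA (anti-SD region)
MUT_16S_TAIL = "GAUCAGUGUGUUA"  # hypothetical altered anti-SD region
WT_SD = "AGGAGGU"               # E. coli consensus SD
ORTHO_SD = "ACACACU"            # hypothetical SD matched to MUT_16S_TAIL


def reverse_complement(rna: str) -> str:
    return "".join(PAIR[base] for base in reversed(rna.upper()))


def pairs_with(sd: str, rrna_tail: str) -> bool:
    """True if the SD can pair over its full length with some stretch of the rRNA tail."""
    return reverse_complement(sd) in rrna_tail


for sd_name, sd in (("wild-type SD", WT_SD), ("orthogonal SD", ORTHO_SD)):
    for tail_name, tail in (("wild-type 16S", WT_16S_TAIL), ("mutant 16S", MUT_16S_TAIL)):
        print(f"{sd_name} + {tail_name}: {pairs_with(sd, tail)}")
```

Only the matched pairs (wild-type with wild-type, orthogonal with mutant) report `True`, which is the separation a specialized ribosome system relies on.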
- The sequence and location of the ribosome binding site (RBS), and the differences in translation rates between prokaryotes and eukaryotes [43]. The RBS plays a crucial role in translation initiation: its sequence and its position relative to the initiation codon influence translation efficiency. Tailoring the RBS to the host organism can enhance translation of the desired protein [44];
- Appropriate choice of strain and medium to optimize production [45], a common strategy in biotechnology that nevertheless has several limitations;
- Optimization in E. coli varies widely with the protein or other product being made. Selecting the right E. coli strain, the optimal temperature, and the appropriate culture medium are crucial considerations for recombinant protein expression.
Secondary structure in the mRNA can obstruct ribosome binding, hindering translation and imposing various limits on the translational process [46]. Eukaryotic ribosomes bind the cap at the 5′ end of the mRNA and then scan along it until they initiate translation at the first AUG codon within a Kozak consensus sequence. In contrast, prokaryotic ribosomes bind a specific region of the mRNA called the Shine–Dalgarno sequence or ribosome binding site (RBS). The RBS typically lies 5–13 nucleotides [47] upstream of the AUG start codon, with an ideal spacing of 5–6 nucleotides [48], and is complementary to the 3′ end of the 16S ribosomal RNA. In Escherichia coli, the consensus sequence is AGGAGGU [49]. When eukaryotic proteins are expressed in E. coli, this different RBS requirement has two consequences. First, an RBS must be present upstream of the AUG start codon; it can be provided by the plasmid region outside the multiple cloning site. However, care must be taken that the spacing is appropriate and that the cloning process does not inadvertently introduce additional AUG trinucleotides.
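The spacing rule can be checked mechanically. This Python sketch finds the SD core closest to the intended start codon, reports the spacer length against the 5–6 nucleotide ideal quoted in the text, and flags any AUG in the spacer. Using the six-nucleotide core AGGAGG as the SD signature and counting spacing from its 3′ end are simplifying assumptions; the function name and test sequences are made up for illustration.

```python
# Sketch: locate an SD core upstream of the intended start codon, measure the
# spacer length, and flag stray AUGs in the spacer. Accepts DNA or RNA.
SD_CORE = "AGGAGG"           # simplified core of the consensus AGGAGGU (assumption)
IDEAL_SPACING = range(5, 7)  # 5-6 nt, the ideal spacing quoted in the text


def rbs_report(seq: str, start_index: int) -> dict:
    """`start_index` is the 0-based position of the A of the intended AUG.

    Spacing is counted here as the nucleotides between the 3' end of the SD core
    and the start codon; published definitions of spacing vary.
    """
    seq = seq.upper().replace("T", "U")
    if seq[start_index:start_index + 3] != "AUG":
        raise ValueError("no AUG at start_index")
    upstream = seq[:start_index]
    sd_pos = upstream.rfind(SD_CORE)  # SD core closest to the start codon
    if sd_pos == -1:
        return {"sd_found": False}
    spacer = upstream[sd_pos + len(SD_CORE):]
    return {
        "sd_found": True,
        "spacing": len(spacer),
        "spacing_ok": len(spacer) in IDEAL_SPACING,
        # An AUG between the SD and the real start could be used for initiation instead.
        "stray_aug": "AUG" in spacer,
    }


print(rbs_report("GCAGGAGGACAGCUAUGGCU", 14))  # spacing 6, no stray AUG
print(rbs_report("GCAGGAGGAUGCCUAUG", 14))     # spacing 6, stray AUG in spacer
```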
Furthermore, the RBS sequence must not occur inside the gene of interest. An internal RBS can have two effects: ribosomes binding there can block elongating ribosomes and stop translation, or, if an AUG codon lies close enough downstream, translation can initiate internally and produce a second protein. Special care is therefore taken when choosing codons for Gly-Gly pairs (avoiding GGA-GGU), Arg-Arg pairs (avoiding AGG-AGG), and sequences around Glu (GAG), including Glu-Glu pairs (GAG-GAG). Because E. coli rarely uses the AGG and GGA codons, codon optimization for E. coli tends to remove them anyway; the greater risk during codon optimization is therefore internal RBSs formed around glutamate codons (Q/K/E-E or E-V).
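Internal RBSs can be screened for before a gene is ordered. The heuristic below, a sketch rather than an established tool, looks for a few SD-like cores (AGGAG, GGAGG, GAGGU, chosen because they arise from the Arg-Arg, Gly-Gly, and Glu-context codon pairs discussed above) and reports any AUG 5–13 nucleotides downstream, noting whether it is in the reading frame. The motif list, the window, and the function name are assumptions; the example shows that the synonymous change CAG to CAA for Gln removes the hits.

```python
# Heuristic scan for internal SD-like motifs followed by an AUG within the
# 5-13 nt window quoted in the text. The motif list is an assumption.
SD_LIKE = ("AGGAG", "GGAGG", "GAGGU")


def internal_rbs_hits(cds: str, min_gap: int = 5, max_gap: int = 13) -> list:
    """Return (motif_pos, motif, aug_pos, in_frame) tuples, 0-based positions.

    in_frame=True means the downstream AUG is in the reading frame of the CDS,
    so internal initiation would give an N-terminally truncated protein.
    """
    seq = cds.upper().replace("T", "U")
    hits = []
    for motif in SD_LIKE:
        pos = seq.find(motif, 3)  # skip the genuine start codon
        while pos != -1:
            motif_end = pos + len(motif)
            for gap in range(min_gap, max_gap + 1):
                aug = motif_end + gap
                if seq[aug:aug + 3] == "AUG":
                    hits.append((pos, motif, aug, aug % 3 == 0))
            pos = seq.find(motif, pos + 1)
    return sorted(hits)


# Gln-Glu written as CAG-GAG creates AGGAG/GGAGG; the synonymous CAA-GAG does not.
print(internal_rbs_hits("AUGGCUCAGGAGGCUAAAAUGGCUUAA"))
print(internal_rbs_hits("AUGGCUCAAGAGGCUAAAAUGGCUUAA"))
```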
4.2. Promoter
PT7, the core region of the pET plasmid, is surrounded by functional elements that control the basal expression level before induction and the transcription rate after induction. Important elements close to PT7 include the −35/−10 region, the translation initiation region (TIR), the operator sequence, and the replicon of the pET plasmid.
Expression can be tuned by adjusting the transcription or translation level of T7 RNA polymerase (T7 RNAP). T7 RNAP transcription is controlled by the lacUV5 promoter (PlacUV5), a strong promoter induced by isopropyl β-D-1-thiogalactopyranoside (IPTG) [50]. PlacUV5 is largely independent of catabolite (cAMP-CRP) control, which makes it leakier than Plac [51]. Three inducible promoters, ParaBAD [52], PrhaBAD, and Ptet, are suitable for long fermentations producing toxic proteins. PrhaBAD and Ptet, however, control T7 RNAP transcription more strictly, offering additional expression options for various recombinant products, especially toxic proteins [53]. Mutating the lac repressor gene (lacI) to strengthen repression also reduces leaky expression [54].
- The promoter variant lac1G was created by recombining PlacUV5 and Plac, with G substituted for A at position +1 [55];
- A mutant Lac repressor (LacI V192F) tightly controls T7 RNA polymerase (RNAP) expression and prevents leakage. Because this variant cannot bind isopropyl β-D-1-thiogalactopyranoside (IPTG), it is not inactivated by the inducer; consequently, the mutant LacI dynamically governs the levels of transcripts produced by T7 RNAP [56];
- A T7 RNAP RBS library can be built rapidly with base editing and CRISPR/Cas9 to screen potential expression hosts [57];
- The A102D substitution reduced the binding of T7 RNAP to the PT7 promoter, changing the rate of RNA synthesis. In another design, T7 RNAP was split into two fragments co-expressed with a light-responsive dimerization domain, yielding an enzyme that becomes functional upon exposure to blue light [58].
4.3. Codons
The expression level of the ColE1 plasmid replication-associated gene can be regulated using CRISPRi together with the inducible promoter Ptet [59].
Codon usage is not uniform across synonymous codons, and the degree of codon usage bias varies considerably among organisms. Codon usage correlates with the abundance of the corresponding transfer RNAs (tRNAs) [60].
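Codon usage bias is often quantified as relative synonymous codon usage (RSCU), a standard metric not named in the text: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. A value above 1 marks a preferred codon, below 1 an avoided one. A minimal Python sketch using the standard genetic code (the function name `rscu` is illustrative):

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in UCAG order (first base varies slowest).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group synonymous codons by amino acid (stop codons grouped under "*").
FAMILIES = {}
for codon, aa in CODON_TABLE.items():
    FAMILIES.setdefault(aa, []).append(codon)


def rscu(cds: str) -> dict:
    """Relative synonymous codon usage: observed count / count expected if all
    synonyms of that amino acid were used equally. 1.0 means no bias."""
    seq = cds.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    values = {}
    for aa, codons in FAMILIES.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


usage = rscu("CUGCUGCUGCUA")  # four Leu codons: CUG x3, CUA x1
print(usage["CUG"], usage["CUA"], usage["UUA"])  # 4.5 1.5 0.0
```

Computed over a host's highly expressed genes, such values give the codon preference tables that codon-optimization tools draw on.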
mRNAs containing multiple rare codons can undergo translational stalling and degradation [61]. Codon usage can be analyzed with bioinformatic tools such as the Graphical Codon Usage Analyzer [62]. One way to prevent this problem is to overexpress the rare tRNAs [63], for example from the pLysSRARE plasmid [64][65]. The more common approach is to use synthetic genes that are codon-optimized for the expression host while avoiding internal RBSs, internal restriction sites, and features that affect mRNA structure and stability [66][67].
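These constraints can be combined in a toy back-translation routine. This Python sketch picks codons from per-amino-acid preference lists, skips a set of rare codons, and backtracks whenever a choice would create a forbidden motif. The preference order is illustrative, not a measured E. coli table; the rare-codon set is assumed to match the six codons usually listed for pLysSRARE-type tRNA plasmids (AGG, AGA, ATA, CTA, CCC, GGA in DNA form); the forbidden motifs (EcoRI GAATTC, NdeI CATATG, and two SD-like cores) are example choices. Real codon-optimization tools also weigh GC content and mRNA secondary structure, which this sketch ignores.

```python
# Toy codon optimizer: back-translate a protein, preferring codons in the order
# given, skipping "rare" codons and rejecting choices that create a forbidden
# motif. All tables below are illustrative assumptions, not measured data.
PREFERRED = {
    "A": ["GCG", "GCC", "GCA"], "R": ["CGT", "CGC", "AGA"], "N": ["AAC", "AAT"],
    "D": ["GAT", "GAC"], "C": ["TGC", "TGT"], "Q": ["CAG", "CAA"],
    "E": ["GAA", "GAG"], "G": ["GGC", "GGT"], "H": ["CAT", "CAC"],
    "I": ["ATT", "ATC"], "L": ["CTG", "CTT", "TTG"], "K": ["AAA", "AAG"],
    "M": ["ATG"], "F": ["TTC", "TTT"], "P": ["CCG", "CCA"],
    "S": ["AGC", "TCT", "TCC"], "T": ["ACC", "ACG"], "W": ["TGG"],
    "Y": ["TAT", "TAC"], "V": ["GTG", "GTT"], "*": ["TAA", "TGA"],
}
# Codons assumed rare in E. coli (those supplemented by pLysSRARE-type plasmids).
RARE = {"AGG", "AGA", "ATA", "CTA", "CCC", "GGA"}
# EcoRI and NdeI sites plus SD-like cores (internal RBS risk).
FORBIDDEN = ("GAATTC", "CATATG", "AGGAG", "GGAGG")
WINDOW = max(len(m) for m in FORBIDDEN) + 2  # a new motif must overlap the last codon


def optimize(protein: str) -> str:
    protein = protein.upper()

    def extend(dna: str, i: int):
        if i == len(protein):
            return dna
        for codon in PREFERRED[protein[i]]:
            if codon in RARE:
                continue
            candidate = dna + codon
            if any(m in candidate[-WINDOW:] for m in FORBIDDEN):
                continue
            result = extend(candidate, i + 1)  # backtrack if later residues fail
            if result is not None:
                return result
        return None

    dna = extend("", 0)
    if dna is None:
        raise ValueError("no sequence satisfies the constraints")
    return dna


print(optimize("MEF*"))  # ATGGAATTTTAA: GAA-TTC would create an EcoRI site
print(optimize("MHM"))   # ATGCACATG: CAT-ATG would create an NdeI site
```

Because the search is recursive, its depth grows with protein length, so Python's default recursion limit restricts this sketch to proteins of up to several hundred residues.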
4.4. Protein Folding
Translation in eukaryotes is comparatively slow, typically about three amino acids per second. Because protein folding has co-evolved with translation rates, the translation rate [68] of a eukaryotic protein expressed in E. coli may exceed its folding rate. This is a particular challenge for multi-domain proteins. It can be addressed through various strategies, such as adjusting the translation rate, harmonizing codon usage [69], or deliberately inducing ribosome stalling by placing rarer codons at domain boundaries.
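Codon harmonization can be sketched as frequency matching: each codon of the native gene is replaced by the synonymous E. coli codon whose usage fraction is closest to that codon's fraction in the native organism. On the assumption that rare codons are translated more slowly, this keeps slowly translated positions of the native gene slow in E. coli. The usage tables below cover only Lys and Gly and contain hypothetical numbers; a real application would use full codon usage tables for both organisms, and published harmonization methods differ in detail.

```python
# Toy codon harmonization by frequency matching. Usage fractions are
# hypothetical and cover only Lys (K) and Gly (G).
NATIVE_USAGE = {
    "K": {"AAA": 0.42, "AAG": 0.58},
    "G": {"GGC": 0.34, "GGA": 0.25, "GGG": 0.25, "GGU": 0.16},
}
HOST_USAGE = {
    "K": {"AAA": 0.76, "AAG": 0.24},
    "G": {"GGC": 0.40, "GGU": 0.34, "GGG": 0.15, "GGA": 0.11},
}
CODON_TO_AA = {c: aa for aa, family in NATIVE_USAGE.items() for c in family}


def harmonize(cds: str) -> str:
    seq = cds.upper().replace("T", "U")
    out = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        aa = CODON_TO_AA[codon]  # KeyError for amino acids missing from the toy tables
        target = NATIVE_USAGE[aa][codon]
        host = HOST_USAGE[aa]
        # Common native codons map to common host codons and rare ones to rare ones.
        out.append(min(host, key=lambda c: abs(host[c] - target)))
    return "".join(out)


print(harmonize("AAGAAAGGU"))  # AAAAAGGGG
```

Note that harmonization is not the same as optimization: the native-rare GGU is deliberately mapped to the host-rare GGG rather than to a preferred codon.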
When the host cell cannot cope with the rate or volume of recombinant protein expression, many proteins misfold and aggregate, eventually forming inclusion bodies (IBs) and impairing expression. The main causes of IB formation are the host's limited capacity for post-translational modifications (PTMs) and its limited folding efficiency; both are of utmost importance for obtaining functionally active recombinant products [70].
For antibodies with disulfide bonds to fold and function correctly, the individual antibody chains must be exposed to the oxidizing conditions of the bacterial periplasm. The periplasmic space also contains chaperones and disulfide isomerases, which play a crucial role in the correct folding of newly synthesized proteins [71]. For periplasmic expression, a leader sequence (PelB, OmpA, or PhoA) directs the antibody to the oxidizing periplasm [72]. After expression, the antibody is extracted from the periplasm by osmotic shock. Reported yields range from 0.1 mg/L to 100 mg/L in shake-flask cultures, while fermenters can reach up to 2 g/L [73]. Another option is to use E. coli strains with an oxidizing cytoplasm; these typically carry mutations in the genes for glutathione oxidoreductase and thioredoxin reductase [74].
Selecting suitable molecular chaperones, such as GroES/GroEL or DnaK-DnaJ-GrpE, for co-expression can further increase folding efficiency [75].