Ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) is an enzyme employed by plants, algae, cyanobacteria and other autotrophic organisms to incorporate CO2 into organic compounds; it is thus one of the key photosynthetic enzymes. Rubisco catalyses a carboxylation reaction, during which it assimilates CO2, and an oxygenation reaction, in which it oxidizes the substrate. In both reactions, the substrate is ribulose-1,5-bisphosphate (RuBP). Because Rubisco’s carboxylation efficiency is low, and because it also catalyses the unfavourable oxygenation reaction that initiates photorespiration, it is considered a limiting factor of photosynthesis. Consequently, Rubisco is an obvious target for increasing agricultural production efficiency, and it is one of the best studied enzymes for this application. Rubisco consists of at least two catalytic large subunits (RbcL) and, in some cases, additional regulatory small subunits. To reach catalytic competence, a lysine in the active site of Rubisco must first be carboxylated by a non-substrate CO2 molecule, followed by the binding of a Mg2+ ion. This process is called carbamylation and serves to position the substrate RuBP for an efficient electrophilic attack by the second CO2 molecule, which will be fixed in the Calvin-Benson-Bassham (CBB) cycle upon RuBP binding. The active site is closed via two conformational changes in RbcL: loop 6 in the C-terminal domain of RbcL extends over the bound RuBP, trapping it underneath; the C-terminal tail of RbcL then stretches across the subunit and pins down loop 6, which results in the closed conformation of Rubisco. Besides RuBP, Rubisco can also bind other molecules such as carboxyarabinitol-1,5-bisphosphate (CABP), a tight-binding inhibitor that makes the active site of carbamylated or decarbamylated Rubisco adopt a closed conformation and thereby downregulates Rubisco’s activity.
Rubisco varies greatly because of the huge diversity of organisms in which it is found. Differences in quaternary structure distinguish four Rubisco forms. Of these four forms, the dinoflagellate form II is the least studied; most papers about it date from the period 1972–2003. To date, the other Rubiscos have been studied extensively, while many questions pertaining to the dinoflagellate enzyme remain unanswered. Rubisco from these organisms shows a set of surprising features. Little is known about its catalytic properties, apart from the fact that it is highly unstable yet possesses a much greater specificity factor (SF, defined as the ratio between the CO2 and O2 activities, i.e., carboxylation relative to oxygenation) than other form II Rubiscos. It is important to understand the origin of such a high SF, as it may help in improving the catalytic properties of other Rubiscos. The dinoflagellate Rubisco has been shown to be a form II type enzyme, a homodimer of RbcL (L2), most likely similar to the one from Rhodospirillum rubrum, and is encoded by nuclear genes, unlike other known eukaryotic Rubisco large subunits, which are encoded by the plastid genome. What is more, it is encoded as a triple polyprotein by a diverse gene family that contains introns. Symbiodinium sp. Rubisco expression is photoperiod regulated, but also dependent on its anemone host. Another striking fact is that dinoflagellates, although aerobic photoautotrophs, have a form II Rubisco. This form of Rubisco originates from anaerobic proteobacteria and has a high affinity for O2, which under normal circumstances should lead to inefficient CO2 assimilation in an aerobic organism. Since this is not the case, we may suppose that dinoflagellate cells possess a mechanism to cope with the O2 dilemma, e.g., a local CO2 concentrating mechanism (CCM).
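For orientation, the specificity factor (SF) introduced in the paragraph above is commonly written in terms of the kinetic constants of the two competing reactions. The symbols used here (V_c, V_o, K_c, K_o, v_c, v_o) follow the conventional textbook notation and are assumptions, not the notation of this paper; the expression is a sketch of the standard definition.

```latex
\[
  \mathrm{SF} = \frac{V_c / K_c}{V_o / K_o},
  \qquad
  \frac{v_c}{v_o} = \mathrm{SF}\,\frac{[\mathrm{CO_2}]}{[\mathrm{O_2}]}
\]
```

Here V_c and V_o denote the maximal velocities of carboxylation and oxygenation, K_c and K_o the Michaelis constants for CO2 and O2, and v_c/v_o the ratio of the two reaction rates at given gas concentrations. Under this definition, raising the local CO2 concentration, as a CCM would, increases v_c/v_o even when SF itself is unchanged.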
This unusual set of features of the dinoflagellate Rubisco also suggests an unusual evolutionary origin, corresponding to the enigmatic evolution of dinoflagellates, which involved multiple endosymbiosis events. To understand it further, more data are needed about the enzyme itself.
The main obstacle to obtaining sufficient data is that the dinoflagellate Rubisco is highly unstable. It has been shown that Rubisco from Symbiodinium sp. and A. carterae lost its activity within 30 min of cell lysis, while higher plant or R. rubrum Rubisco is stable for several hours and may be easily isolated. The reason for this instability is not fully understood. It was shown that the loss of Symbiodinium sp. Rubisco activity was not due to proteolysis or precipitation. The explanation may be the instability of the L2 dimer or of a higher quaternary structure complex. Specific chaperone proteins might be involved in stabilising the final oligomer, as suggested by the Rubisco assembly pathways present in other organisms. The existence of such chaperones might be deduced from a genome homology study. However, this is not feasible for dinoflagellates, since their genomes are enormously large (from 1 to 270 Gb, a size that is one-third to 90-fold the size of the human genome) and have not been fully sequenced so far. Some chaperones have nevertheless been identified in the Symbiodinium sp. transcriptome, although this surely does not give the whole picture.
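The genome-size comparison quoted above can be checked with a few lines of arithmetic. The sketch below assumes a haploid human genome of roughly 3.1 Gb; that figure is an assumption introduced here, as the text does not state the reference value it used.

```python
# Assumption: haploid human genome size of about 3.1 Gb (not stated in the text).
HUMAN_GENOME_GB = 3.1


def fold_of_human(genome_gb: float, human_gb: float = HUMAN_GENOME_GB) -> float:
    """Return a genome size expressed as a multiple of the human genome."""
    return genome_gb / human_gb


smallest = fold_of_human(1.0)    # lower bound of dinoflagellate genome size quoted in the text
largest = fold_of_human(270.0)   # upper bound quoted in the text

print(f"{smallest:.2f}-fold to {largest:.0f}-fold")  # prints "0.32-fold to 87-fold"
```

With this reference value the range comes out at about one-third to just under 90-fold, in line with the rounded figures given in the text.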
A crystal structure of the enzyme would be helpful in understanding the dinoflagellate Rubisco. No attempt to solve it has yet succeeded, mainly because the native form cannot be purified owing to the aforementioned instability. However, tools are available to search for answers not only in vivo, but also in silico. Such an approach has been used successfully for several proteins that proved hard to crystallize. The present paper is an attempt to create a structural model of the dinoflagellate Rubisco from Symbiodinium sp. by homology modelling. We use known solved structures of form II Rubisco as templates. We then describe the similarities and differences, which we use to build an explanation for the unusual features of the dinoflagellate Rubisco. In a basic experiment, we also show that one of the identified elements (an insert forming a loop, exclusive to dinoflagellates) may influence Rubisco solubility.
2. Homologues of Form II Rubisco from Rhodospirillum rubrum among Dinoflagellates
To find the best sequence for further modelling, we used the blastP tool to find homologues of the template R. rubrum Rubisco among dinoflagellates. As mentioned already, this protein is broadly accepted as a model form II Rubisco. The highest scoring entries are listed in Table 1.
Table 1. Highest scoring homologues of Rhodospirillum rubrum Rubisco among dinoflagellates.
||Query Cover [%]
||Percent Identity [%]
Homologues were searched using the blastP tool with the organism parameter set to dinoflagellates (taxid: 2864). Due to the high sequence similarity among dinoflagellates, only the top 4 hits are listed in the table. Symbiodinium microadriaticum is listed here because it is the name of the entry; however, in this text we simply use Symbiodinium sp., a convention accepted in most papers pertaining to dinoflagellates.
The first hit in Table 1 showed the highest amino acid sequence similarity to the R. rubrum sequence, as described by query cover (97%, the percentage of the query sequence covered by the target sequence), E value (0.0; the expected value, i.e., the number of matches expected by chance in a database of that size; the lower the E value, the more significant the match) and percent identity (67.67%, the percentage of identical amino acids at the same positions in the alignment). The best studied dinoflagellate Rubisco is the one from Symbiodinium sp., which has the second highest score. It differs from the first hit by less than 2 percentage points in percent identity. We therefore chose Symbiodinium sp. as the case for further investigation in this paper.
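To make the two alignment metrics above concrete, the following minimal sketch computes percent identity and query cover for a single gapped pairwise alignment. The function names and the toy sequences are hypothetical and are not the Rubisco sequences from Table 1; the definitions follow the usual BLAST conventions (identity over alignment length, cover over full query length), simplified here to one aligned segment.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


def query_cover(aligned_query: str, query_length: int) -> float:
    """Percent of the full query that takes part in the aligned segment."""
    aligned_residues = sum(1 for q in aligned_query if q != "-")
    return 100.0 * aligned_residues / query_length


# Toy alignment (hypothetical sequences, '-' marks a gap).
toy_query = "MDQSSRYVNLAL"
toy_subject = "MDQ-SRYADLSL"

identity = percent_identity(toy_query, toy_subject)
cover = query_cover(toy_query, query_length=15)  # assume a 15-residue query

print(f"identity {identity:.1f}%, query cover {cover:.1f}%")
# prints "identity 66.7%, query cover 80.0%"
```

In the toy case 8 of 12 alignment columns are identical and 12 of the assumed 15 query residues are aligned, which gives the two printed percentages.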
3. Analysis of the Amino Acid Sequence of Dinoflagellate Rubiscos
To compare the primary structures of dinoflagellate Rubiscos, we aligned the sequences of the Rubiscos listed in Table 1 to the R. rubrum template using Clustal Omega. This comparison revealed differences that might be crucial for further investigation of the eukaryotic form II Rubisco (Figure 1).
Figure 1. Protein sequence alignment in Clustal Omega (A) and a phylogenetic tree of form II Rubiscos from dinoflagellates constructed on the basis of this alignment (B). Red frames indicate the positions of two unique inserts. “*” indicates amino acids that are identical in all sequences; “:” indicates amino acids that are not identical but have similar properties.
First of all, in our alignment the dinoflagellate Rubiscos do not start with a methionine residue (as in R. rubrum), but with a leucine. The lack of an initiation codon suggests that a transit peptide might be encoded at the beginning of the rbcA locus, which encodes rbcL. Rubiscos from dinoflagellates are encoded in the nucleus and therefore need to be transported into the chloroplasts. It was previously shown that the rbcA mRNA contains an upstream sequence with a pattern of conserved residues analogous to that of the Euglena Rubisco small subunit precursor polyprotein. Aranda and co-workers sequenced and analysed parts of the dinoflagellate genomes and transcriptomes, and identified this upstream sequence of the rbcA locus. The second reason for the lack of methionine is that the protein is encoded as a precursor polyprotein. This means that the first product of translation is a longer peptide, bearing a transit peptide and two or more proteins separated by spacers. Such pre-polyproteins also occur in the Euglena proteome, where, for example, light-harvesting complex proteins are synthesised in this way and are separated by a decapeptide spacer.
As mentioned previously, more than 67% of the amino acid sequence is identical in the aligned proteins. Most of the differences are evenly distributed along the compared sequences. The charge distribution is similar; the isoelectric point of the Symbiodinium sp. Rubisco is slightly higher than that of the R. rubrum enzyme (5.72 vs. 5.60). This reflects a difference of plus one negatively charged and minus one positively charged amino acid in the Symbiodinium sp. sequence. More notable might be the higher number of cysteine residues in the dinoflagellate Rubisco. The Symbiodinium sp. sequence contains 9 such residues, almost twice as many as the R. rubrum sequence (5). Notably, only two cysteine residues are conserved between the R. rubrum and dinoflagellate Rubiscos (Cys59 and Cys180). Cysteine residues, although not directly involved in Rubisco activity, are known to be responsible for its redox regulation and conformational changes. The importance of cysteine residues was also demonstrated for Arabidopsis thaliana Rubisco: after oxidative inactivation, the enzyme was reactivated by redox treatment. On this basis, we may hypothesise that the higher content of Cys residues is responsible for a possible oxygen-dependent inactivation of Symbiodinium sp. Rubisco upon isolation.
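The residue counts discussed above (cysteines, charged residues, and cysteines conserved between two aligned sequences) are simple to tally. The sketch below is illustrative only: the function names and toy sequences are hypothetical, and counting only Lys and Arg as positive (ignoring His) is a simplifying assumption, not the method used by the authors.

```python
def residue_summary(sequence: str) -> dict:
    """Count cysteines and charged residues in an ungapped protein sequence."""
    seq = sequence.upper()
    return {
        "cysteines": seq.count("C"),
        "negative": seq.count("D") + seq.count("E"),  # Asp, Glu
        "positive": seq.count("K") + seq.count("R"),  # Lys, Arg (His ignored)
    }


def conserved_cysteines(aligned_a: str, aligned_b: str) -> list:
    """1-based positions (sequence A numbering) where both aligned rows have Cys."""
    positions = []
    pos_a = 0  # residues of sequence A seen so far
    for a, b in zip(aligned_a, aligned_b):
        if a != "-":
            pos_a += 1
        if a == "C" and b == "C":
            positions.append(pos_a)
    return positions


# Toy aligned pair (hypothetical sequences, '-' marks a gap).
toy_a = "MCK-DECR"
toy_b = "MCKCDESR"

print(residue_summary(toy_a.replace("-", "")))
print(conserved_cysteines(toy_a, toy_b))  # prints [2]
```

Applied to the real aligned sequences, the same tally would be expected to reproduce the cysteine counts and the conserved positions reported in the text, provided the numbering follows the R. rubrum row.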
The most significant differences between the dinoflagellate and R. rubrum Rubiscos are the two insertions present in the dinoflagellate RbcL amino acid sequence (Figure 1A, red rectangles). The first insertion consists of three negatively charged amino acids at position 413, and the second of eight amino acids at position 425. Both inserts may be treated as one longer, dinoflagellate-specific motif. The possible role of these inserts is discussed further on the basis of the constructed models.
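Insertions of this kind appear in an alignment as runs of gap characters in the template row opposite residues in the other row. A minimal sketch of how such runs could be located is given below; the function name and the toy rows are hypothetical, and the reported position is simply the number of template residues preceding the insert, which may differ from the numbering convention used in Figure 1.

```python
def find_insertions(template_row: str, target_row: str) -> list:
    """Return (template_position, inserted_residues) for each run of template gaps.

    template_position is the count of template residues before the insert.
    """
    insertions = []
    template_pos = 0
    run = []
    for t, q in zip(template_row, target_row):
        if t == "-":
            if q != "-":
                run.append(q)  # residue present only in the target
            continue
        if run:  # a template residue ends the current insert
            insertions.append((template_pos, "".join(run)))
            run = []
        template_pos += 1
    if run:  # insert at the very end of the alignment
        insertions.append((template_pos, "".join(run)))
    return insertions


# Toy alignment rows (hypothetical; '-' marks a gap in the template).
print(find_insertions("ACDE---FGH", "ACDEKKKFGH"))  # prints [(4, 'KKK')]
print(find_insertions("AC--DEF-G", "ACKKDEFRG"))    # prints [(2, 'KK'), (5, 'R')]
```

Two inserts separated by only a few template residues, as in the second toy case, are reported separately; treating them as one longer motif, as the text does, is an interpretive step taken afterwards.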
To conclude, we built a structural model of the dinoflagellate Rubisco based on known form II homologues of this enzyme (Figure 2). Dinoflagellates, as mentioned, belong to the Eukaryota, but their Rubisco, classified as form II, is nuclear-encoded in three repeats, unlike other known eukaryotic Rubiscos, which are of form I. This feature may reflect the evolutionary history of the Rubisco enzyme, as the dinoflagellate Rubisco shows characteristics of both eukaryotic and prokaryotic enzymes. It should be kept in mind that this is an in silico study without crystallographic confirmation; nevertheless, it provides several indications that may help in further studies. First, we confirmed that the catalytic site of the enzyme is conserved and therefore does not explain the differences noted between dinoflagellate Rubiscos and their homologues from other organisms. Consequently, the experimentally observed loss of activity of the isolated dinoflagellate enzyme must be linked to other structural features of the protein.
Figure 2. Large subunit monomers from R. palustris (A, green ribbon structure), the modelled Symbiodinium sp. Rubisco structure (B, violet ribbon structure), and a superimposition of both structures (C). Red colour indicates a novel loop (insert 425) in the Symbiodinium sp. Rubisco structure.
We found that Rubisco from Symbiodinium sp. has almost twice as many cysteine residues as the Rubisco from R. rubrum (9 vs. 5). We postulate that the higher number of cysteines, which are known to be responsible for redox regulation, might be the cause of the high instability of the dinoflagellate Rubisco. This observation suggests that isolating an active enzyme from a natural source may require additional optimization of redox conditions; expression of the active enzyme in a heterologous system may also require overcoming folding limitations.
Our analysis showed that the dinoflagellate Rubisco is a hexamer (a trimer of dimers) rather than, as previously suggested, an L2 type enzyme. The hexamer has a more complex structure than a simple dimer. This knowledge might help to obtain a stable purified enzyme, chiefly by including in the process chaperone proteins that aid the formation of the higher oligomer. We may hypothesize that these are, at least in part, chaperones similar to those of higher plants; however, this needs further experimental confirmation.
We also show that dinoflagellate Rubiscos contain a novel motif, consisting of a helix extension and a loop. The location of this motif excludes its direct involvement in the catalytic reaction, suggesting instead a role in interaction with an unknown protein partner with a possible regulatory function. As a proof of concept, we expressed the Symbiodinium sp. RbcL without the loop and found the protein solubility to be significantly lower. This loop may therefore be important for interactions with other proteins, such as a possible unknown regulatory protein as well as chaperones. Again, this makes the dinoflagellate enzyme more similar to the eukaryotic Rubisco, owing to the similar need for a series of chaperone proteins in order to assemble into an active enzyme. All these findings bring us closer to explaining the surprising features of dinoflagellate Rubisco. A full understanding of Rubisco characteristics will make it possible to re-engineer the enzyme for a higher yield of CO2 assimilation, which may result in higher crop yields and an overall improvement in biosphere CO2 levels.