Type I collagen, the predominant protein of vertebrates, assembles into fibrils that orchestrate the form and function of bone, tendon, skin, and other tissues. Collagen plays roles in hemostasis, wound healing, angiogenesis, and biomineralization, and its dysfunction contributes to fibrosis, atherosclerosis, cancer metastasis, and brittle bone disease.
Collagens are among the most ubiquitous and complex macromolecules of the vertebrate extracellular matrix (ECM). About thirty genetically distinct collagens are expressed in human connective tissues. For most of them, the majority of the sequence folds into a triple helix, a feature that makes collagens unique among proteins. Triple helices are rigid, rope-like protein conformations which, depending on the collagen type, may be interrupted by short flexible non-triple-helical regions or by larger globular non-collagenous regions. The triple-helical regions are built from contiguous Gly-X-Y tripeptide repeats. Gly is required at every third position because it is the only residue small enough to fit in the crowded center of the triple helix, where the three peptide chains come together. The extent of the triple-helical region, together with the presence of non-triple-helical regions, determines the type of aggregate that a collagen forms and how it contributes to the intricate ECM scaffold that makes up the internal architecture of the vertebrate body.
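The Gly-X-Y rule can be made concrete with a small sequence check. The Python sketch below is illustrative only. The function names `gly_xy_violations` and `is_gly_xy_repeat` are hypothetical. The model peptide is an idealized Gly-Pro-Hyp repeat rather than a natural collagen sequence, and hydroxyproline is written as `O`, a convention common in the collagen-peptide literature. Real triple-helical domains can contain interruptions that such a strict check would flag.

```python
def gly_xy_violations(seq: str, frame: int = 0) -> list[int]:
    """Return 0-based indices of complete triplets whose first residue is not Gly.

    seq   -- one-letter amino acid sequence (case-insensitive)
    frame -- offset (0, 1 or 2) at which the first Gly-X-Y triplet starts
    """
    seq = seq.upper()
    violations = []
    # Step through the sequence one triplet at a time; incomplete trailing triplets are skipped.
    for start in range(frame, len(seq) - 2, 3):
        if seq[start] != "G":
            violations.append(start)
    return violations


def is_gly_xy_repeat(seq: str, frame: int = 0) -> bool:
    """True if every complete triplet in the chosen frame starts with Gly."""
    return not gly_xy_violations(seq, frame)


if __name__ == "__main__":
    # Hypothetical model peptide: ten Gly-Pro-Hyp triplets (Hyp written as 'O').
    model = "GPO" * 10
    print(is_gly_xy_repeat(model))     # True
    # Replacing the Gly at index 9 with Ala breaks the repeat at that triplet.
    mutant = model[:9] + "A" + model[10:]
    print(gly_xy_violations(mutant))   # [9]
```

Reporting the positions of violations, rather than only a yes/no answer, makes it easy to see where a substitution disrupts the repeat.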
Type I collagen is the prototypical collagen that aggregates into fibrils. It is the most abundant protein in the human body, accounting for about 7 kg of an adult's dry weight. There are approximately 1 × 10²³ collagen molecules in the human body. Remarkably, if the collagen molecules of an adult human were laid end to end, the resulting rope would be long enough to lasso the Moon from the Earth many times over, or easily span the distance between the Earth and the Sun. That so many collagen molecules are packed tightly within one organism is a testament to the exquisite and efficient way vertebrate cells assemble and twist collagen molecules into rope-like fibrils, the most abundant molecular aggregate formed by collagens in vivo.
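The end-to-end comparison can be checked with simple arithmetic. The sketch below assumes the round figures given in the text (about 1 × 10²³ molecules, each roughly 300 nm long) together with standard mean Earth-Moon and Earth-Sun distances. It ignores variation between individuals and is meant only as an order-of-magnitude check.

```python
# Rough check of the "collagen laid end to end" comparison.
# Assumed inputs: molecule count and length are the round values from the text;
# distances are standard mean values (Earth-Moon ~384,400 km, 1 au ~149.6 million km).

N_MOLECULES = 1e23          # approximate number of collagen molecules in an adult
MOLECULE_LENGTH_M = 300e-9  # ~300 nm per molecule, expressed in meters
EARTH_MOON_M = 3.844e8      # mean Earth-Moon distance, meters
EARTH_SUN_M = 1.496e11      # one astronomical unit, meters


def end_to_end_length_m(n: float = N_MOLECULES, length_m: float = MOLECULE_LENGTH_M) -> float:
    """Total length if n molecules of length_m were placed end to end."""
    return n * length_m


total = end_to_end_length_m()
print(f"total length:         {total:.1e} m")               # 3.0e+16 m
print(f"Earth-Moon distances: {total / EARTH_MOON_M:.1e}")  # 7.8e+07
print(f"Earth-Sun distances:  {total / EARTH_SUN_M:.1e}")   # 2.0e+05
```

With these inputs the combined length is about 3 × 10¹⁶ m, roughly 2 × 10⁵ times the Earth-Sun distance, so "easily span" is a conservative description.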
Type I collagen makes up much of the substance of connective tissues, including tendon, ligament, and skin, forms most of the organic phase of bone, and supports and gives form to many other tissues of the vertebrate body through the connective tissue proper. In bone, the type I collagen fibril also serves as the site of mineralization, either directly or through its association with mineralization nucleation proteins. It is therefore no surprise that type I collagen plays crucial roles in vital physiologic processes, including hemostasis, angiogenesis, and biomineralization, and in human pathologies, including cancer, fibrosis, and atherosclerosis. Type I collagen from animal sources is also the most widely used biomaterial for fabricating bone regeneration scaffolds, hemostats, bandages, and tendon repair patches. From both the basic and applied standpoints, it is therefore of paramount interest to understand collagen biology and to define the structure-function relationships of collagen.
Type I collagen is synthesized by cells as pro-α1 and pro-α2 procollagen chains, which are encoded by separate genes and each contain a triple-helical domain of about 1000 amino acid residues. The C-terminal propeptides direct the association of two pro-α1 chains and one pro-α2 chain into the triple-helical procollagen molecule. Extracellularly, the globular termini of procollagen are removed by proteolysis, yielding trimeric collagen monomers a little over 300 nm long. Five monomers assemble in a quarter-staggered fashion to form the microfibril, the basic subunit of the collagen fibril (Figure 1). Along the fibril's long axis, this molecular organization repeats with a period D of about 67 nm. Within each D-period, groups of five neighboring collagen molecules wind around each other into microfibrils that interdigitate with neighboring microfibrils, forming practically inseparable connections within the fibril. Taken together, the five molecular segments that make up each D-period contain the complete amino acid sequence of a single collagen molecule. Because the ratio of the triple-helix length (~300 nm) to the D-period length (67 nm) is not an integer (about 4.5), each D-period contains a region of incomplete overlap, called the gap zone. This space within the microfibril contributes to the biomechanical properties of the fibril. The gap region also enables bone biomineralization by accommodating the formation and growth of hydroxyapatite crystals. Because of the gap zone, the four full-length segments present there twist around one another, forming a twist that loosely mirrors the superhelix of the collagen molecule, albeit on a larger scale. Each microfibril (Figure 1B,C) may be connected to its neighbors by N- and C-terminal intermolecular cross-links; however, the degree of cross-linking in collagen fibrils varies with tissue location, age, and other circumstances.
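The non-integer ratio of molecule length to D-period, and the resulting gap and overlap zones, can be illustrated with an idealized one-dimensional model of the quarter-staggered array. The Python sketch below is a simplification under stated assumptions. It uses the round values from the text (L ≈ 300 nm, D ≈ 67 nm) and treats molecules as straight rods, with adjacent columns offset by exactly one D and molecules in the same column repeating every 5D (one molecule length plus one gap). It ignores twist, telopeptides, and cross-links, and the function names are hypothetical.

```python
import math

L_NM = 300.0  # approximate collagen molecule (triple helix) length, nm (from the text)
D_NM = 67.0   # approximate axial D-period, nm (from the text)
COLUMNS = 5   # five molecules per microfibril, adjacent columns offset by one D


def d_period_partition(length: float = L_NM, d: float = D_NM) -> tuple[int, float, float]:
    """Return (full D segments, overlap length, gap length) for one molecule.

    For L = 300 nm and D = 67 nm this gives 4 full segments, and the gap
    equals 5*D - L, the spacing between consecutive molecules in a column.
    """
    full = math.floor(length / d)   # 300 / 67 ~ 4.48, so 4 complete segments
    overlap = length - full * d     # the short fifth segment (M5)
    gap = d - overlap               # part of the period that M5 does not reach
    return full, overlap, gap


def molecules_crossing(z: float, length: float = L_NM, d: float = D_NM,
                       columns: int = COLUMNS) -> int:
    """Count molecules crossing axial position z in the idealized staggered array."""
    repeat = columns * d            # axial repeat within one column
    count = 0
    for j in range(columns):
        # Distance from the start of the nearest column-j molecule at or below z.
        offset = (z - j * d) % repeat
        if offset < length:         # z lies on that molecule, not in its gap
            count += 1
    return count


full, overlap, gap = d_period_partition()
print(full, round(overlap), round(gap))   # 4 32 35
print(molecules_crossing(10.0))           # 5: overlap zone
print(molecules_crossing(50.0))           # 4: gap zone
```

With these round numbers the overlap zone is about 32 nm (~0.48 D) and the gap zone about 35 nm (~0.52 D). The model reproduces the five-versus-four packing: five molecular segments cross a given axial position in the overlap zone, but only four cross it in the gap zone. These are round-number estimates, not measured values.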
Other collagen types, proteoglycans (PGs), and matrix macromolecules typically assemble onto the fibril to impart tissue-specific properties to the polymer.
Figure 1. Type I collagen assembly and structure. (A) Segment of a type I collagen fibril visualized by transmission electron microscopy (TEM). One molecular repeat, or D-period (D), is indicated. Positive-stain bands a–e are labeled below the image; the arrow indicates the left border of the overlap zone. This fibril preparation was used to localize heparin-binding sites; thus, heparin-gold particles appear as dark circles bound to the fibril. Originally published in San Antonio et al., 1994, J. Cell Biol., 125, 1179–1188. (B) Fibril schematic depicted as a negatively stained TEM preparation, in which gap regions are dark and overlap regions light. The microfibril schematic shows the Hodge-Petruska packing scheme, in which collagen molecules (numbered 1–5 for molecular (M) segments, as in M1, M2, etc.) are staggered so that every fifth M segment does not traverse the entire D-period. Selected collagen functional domains (right) are marked along the length of the collagen molecule. (C) A single D-period of a single microfibril is shown beneath the microfibril schematic. The C-terminal telopeptide (marked in green at the top of the microfibril) and the rest of monomer 5 are oriented toward the outside of the fibril. The side view is from the perspective of an observer on a neighboring microfibril. Note that the molecular segments are relatively straight in the overlap zone but reorganize toward the end of the gap zone, especially in the region of the supertwist near the gap zone's discoidin domain receptor 2 (DDR2) binding site. Figure segments reprinted with permission from .