The establishment of a new cell line is a complex process that is still not well understood, and its success rate is low and unpredictable regardless of the specimen of origin. This may seem paradoxical, since the stabilization of a cell line starts from a tumor sample that is able to grow vigorously in vivo, escaping the cellular mechanisms that control the cell cycle and apoptotic cell death. However, many of the causes of this difficulty, and of the role that chance plays in establishing a new cell line, can be understood by considering the marked differences between the in vivo and in vitro microenvironments (such as growth factor dependence, oxygen tension, and interactions with stromal and immune cells). This issue is illustrated, for example, by the impossibility of establishing a continuous cell line from chronic myeloid leukemia in the chronic phase: this hematological disorder is characterized by a very high proliferation rate of leukemic cells in vivo, yet the same leukemic cells die within a few weeks in vitro. Furthermore, even when continuous growth in vitro is achieved, the procedure for establishing a new cell line is difficult and time consuming, sometimes requiring more than one or two years. Nevertheless, because each cell line is derived from the patient's own disease, it offers the opportunity to disclose pathological features that would otherwise go unidentified in conventional clinical diagnostic settings and to perform experiments that are not possible in vivo. The processes of stabilizing and characterizing a new cell line should follow published guidelines. In particular, in 1999, Drexler and Matsuo published the “Guidelines for the characterization and publication of human malignant hematopoietic cell lines” and stressed the importance of confirming the immortality, authenticity, and tissue or cell type of origin of each newly established cell line. These guidelines are still valid and are included in the updated United Kingdom Coordinating Committee on Cancer Research (UKCCCR) guidelines for the use of cancer cell lines in biomedical research, published by Geraghty et al. in 2014. Indeed, a detailed characterization, the immortality of the culture, proof of neoplasticity, authentication of the true origin of the cells, scientific significance, and availability of the cell line to other investigators are of paramount importance when publishing a new cell line. In this way, under the right conditions and with appropriate controls, properly authenticated cancer cell lines retain most of the properties of the cancer of origin and become helpful model systems for the progress of medical research.
2. Cell Lines in Modern Cancer Research: Toward the “Encyclopedia” of Cell Lines
Human cancer cell lines continue to play a critical role in modern cancer research, where they are widely used as preclinical model systems for gaining mechanistic and therapeutic insight. Notably, with the advent of -omics technologies, recent studies have provided comprehensive databases dedicated to the characterization of most existing cell lines. Furthermore, the online availability of the information derived from these studies has created an important resource for the study of cancer cell lines and has helped researchers select the most appropriate in vitro model system for their research projects. In this context, it is important to consider a series of significant papers published within less than 10 years.
In 2012, two independent research groups (Barretina et al. and Garnett et al.) were the first to provide a large-scale genetic and pharmacological characterization of human cancer cell lines. Both groups performed a comprehensive characterization of several hundred cell lines using different high-throughput platforms and analytical methods. Their complementary results confirmed that many human cell lines capture the genomic diversity of their respective cancers and can therefore be used as in vitro model systems of the diseases from which they were derived. In particular, Barretina et al. established a large-scale genomic dataset of 947 human cancer cell lines, together with the pharmacological profiling of 24 compounds across 500 of these cell lines. The resulting collection, which encompassed 36 tumor types, was termed the Cancer Cell Line Encyclopedia (CCLE) and was made public at http://www.broadinstitute.org/ccle. Using this comprehensive approach, Barretina et al. obtained an important preliminary result: a possible association between Schlafen family member 11 (SLFN11) gene expression and sensitivity to topoisomerase inhibitors. Garnett et al., by performing a similar integration of genomic and pharmacological data, disclosed an association between the EWS-FLI1 gene translocation, which is frequently found in Ewing’s sarcoma, and sensitivity to poly (ADP-ribose) polymerase (PARP) inhibitors, a class of drugs currently being evaluated in clinical trials for other cancer types. The resources provided by Barretina et al. and Garnett et al. are extremely useful when a novel defect at the DNA level or a difference in gene or protein expression is detected in a specific cancer type. Indeed, by exploring these resources, it is possible to determine whether any of the listed cell lines can serve as preclinical models to gain mechanistic and therapeutic insight that would not be practicable to obtain in humans. The molecular profiles presented by Barretina et al. and Garnett et al. paved the way for additional resources dedicated to testing experimental hypotheses in the preclinical development of personalized cancer medicine protocols. In the following years, Iorio et al. reported how cancer-driven alterations (including somatic mutations, copy number alterations, DNA methylation, and gene expression) identified in 11,289 tumors from 29 tissues can be effectively mapped to 1001 molecularly annotated human cancer cell lines. The same authors showed that most of the oncogenic alterations identified in tumor tissues are present in cancer cell lines, confirming that cell lines can be considered effective model systems for studying drug sensitivity and resistance. The genetic map defined by Iorio et al. is available as an online database at http://www.cancerRxgene.org.
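In their simplest form, the genomic-pharmacological associations described above (for example, SLFN11 expression versus sensitivity to topoisomerase inhibitors) ask whether a molecular feature covaries with a drug response across a panel of cell lines. The Python sketch below illustrates only that basic idea, using a Spearman rank correlation on hypothetical toy values; it is not the statistical pipeline of Barretina et al., Garnett et al., or Iorio et al., who used their own, more elaborate models, and the cell line names, expression values, and log IC50 values are invented for illustration.

```python
# Minimal sketch: does expression of one gene track drug sensitivity across
# a panel of cell lines? Uses Spearman rank correlation in pure Python.
# All cell line names and numbers below are HYPOTHETICAL toy data.


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical panel: expression of a gene of interest (arbitrary units) and
# log IC50 of a drug (lower value = more sensitive) in the same cell lines.
expression = {"line_A": 8.1, "line_B": 2.3, "line_C": 6.7, "line_D": 1.2, "line_E": 5.0}
log_ic50 = {"line_A": -1.9, "line_B": 0.8, "line_C": -1.1, "line_D": 1.5, "line_E": -0.2}

cell_lines = sorted(expression)  # align both measurements on the same cell lines
rho = spearman([expression[c] for c in cell_lines], [log_ic50[c] for c in cell_lines])
print(f"Spearman rho = {rho:.2f}")  # prints: Spearman rho = -1.00
```

A negative rho means that cell lines with higher expression tend to have a lower IC50, that is, to be more sensitive. With real datasets covering hundreds of cell lines and thousands of features, any such association would also need correction for multiple testing and experimental validation before being interpreted.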
Despite the enthusiasm generated by these important works, cell lines have important limitations, especially because their gene expression differs from that of in vivo tumor tissues. Specifically, when cultured in vitro, cell lines do not interact with other cell types, their growth is not under the influence of cytokines and other signaling molecules, and the native tissue architecture is lost. Moreover, the effects of in vivo drug distribution and metabolism are not easily reproduced in vitro. All of these considerations indicate that sensitivity and resistance in culture might not reflect the factors that influence a drug’s action in vivo. In this context, it is also important to consider the findings of Sandberg and Ernberg, who compared the NCI60 cell lines with their corresponding tumors and normal tissues and demonstrated that only 34 of the 60 cell lines maintained the tissue-specific upregulation of genes. The authors proposed several explanations: the cell lines could have been derived from a tumor subtype not represented in the tumor biopsies; the cell lines could have lost the differentiated phenotype of their tumor of origin; or the tumor from which a cell line was derived could have arisen from a progenitor cell lacking the gene expression associated with differentiated cells of that tissue. Furthermore, it cannot be excluded that the original classification was incorrect owing to metastasis or cultivation problems. More recently, in 2017, Jin et al. applied RNAseq technology to compare matched tumor and cell line pairs derived from synovial sarcoma (SS), namely three tumor/cell line pairs from a genetically engineered mouse model of SS and two pairs from human SS tumors. This comparison highlighted considerable variation in gene expression profiles and an enrichment of microenvironment modification-related genes among the genes differentially expressed in all tumor-to-cell line comparisons. The findings of Sandberg and Ernberg and of Jin et al. highlight the difficulty of defining the most appropriate preclinical model system for cancer study and drug discovery.
Klijn and colleagues improved our knowledge of gene expression in cancer cell lines by generating a comprehensive transcriptional portrait using RNAseq. The authors cataloged coding and noncoding RNA expression, mutations, the expression of viral sequences, and DNA copy number changes in 675 cell lines. Notably, 1435 of the 2200 fusion genes they detected had not been reported before and can be further investigated using already available cell lines. In addition, by combining gene copy number data, expression data, mutation status, and gene fusion information, the authors predicted the response to clinical compounds, including MAPK/ERK kinase (MEK), phosphoinositide 3-kinase (PI3K), and fibroblast growth factor receptor (FGFR) inhibitors, in many cell lines. In this way, the authors confirmed that data derived from applying genomic and transcriptomic technologies to human cell lines are critical for expediting the development of effective personalized medicine protocols.
In parallel with advances in the genetics and transcriptomics of cancer cell lines, The Cancer Genome Atlas (TCGA) database revealed great molecular diversity among tumors, both across and within cancer types. Understanding the functional consequences of this diversity for treatment response has therefore become a central task for many research laboratories worldwide. Consequently, it is essential to comprehensively characterize the molecular profiles of a large number of human cancer cell lines, both to capture the diversity observed in patient tumors and to elucidate the complex relationships between molecular aberrations, cancer phenotypes, and therapeutic response. In this context, using the same reverse-phase protein array (RPPA) platform employed for TCGA, Jun Li et al. generated a comprehensive protein expression dataset for 651 independent cell lines. This study added information on protein expression, including total and post-translationally modified proteins, which are arguably the most crucial molecules in the cell and, importantly, the targets of most drugs. Together with the aforementioned works that systematically characterized cancer cell lines at the DNA and RNA levels, as well as their drug responses, the study by Jun Li et al. provided an additional rich resource (https://tcpaportal.org/mclp/#/) for the research community to investigate tumor behaviors quantitatively and efficiently, and to compare protein expression across cancer cell lines and with in vivo tumor tissues.
The generation of these extensive datasets highlighted the need for functional assays to identify novel targetable genes. In this sense, it is important to consider the work by Tsherniak et al., which reported the results of genome-scale RNA interference (RNAi)-based loss-of-function screens (Project Achilles, https://depmap.org/portal/achilles/) designed to identify critical gene functions in 501 cancer cell lines. The authors identified genes whose expression is required for the proliferation or survival of subsets of these cell lines and developed an approach to identify the features that predict these gene dependencies. This cancer dependency map provides an innovative way to define and predict the genes essential for cell viability, thereby facilitating the identification of cancer targets.
More recently, in 2019, two research papers added to our knowledge of the biology of the cancer cell lines included in the CCLE. The first study, by Li et al., focused on the metabolic diversity of 928 cancer cell lines derived from 20 cancer types. The authors profiled 225 metabolites by liquid chromatography-mass spectrometry (LC-MS) and generated a resource (available at the CCLE portal, https://portals.broadinstitute.org/ccle) in which unbiased association analyses can be performed, linking the cancer metabolome to genetic alterations, epigenetic features, and gene dependencies. Overall, the authors showed that distinct metabolic phenotypes exist in cancer cell lines and that such phenotypes have direct implications for therapeutics targeting metabolism. The second study, by Ghandi et al., expanded the characterization of the CCLE cell lines by adding data on gene mutations, RNA splicing, DNA methylation, histone H3 modifications, microRNA expression, and RPPA protein profiles. Moreover, these data were integrated with functional characterizations, such as drug sensitivity, short hairpin RNA knockdown, and clustered regularly interspaced short palindromic repeats (CRISPR)–Cas9 knockout data. This comprehensive approach will be extremely useful in revealing potential targets for cancer drugs and associated biomarkers.
Finally, given the complexity of the newly derived datasets, wet laboratory researchers need a user-friendly interface to explore the data on each cell line, such as the Cell Model Passports (https://cellmodelpassports.sanger.ac.uk/) developed at the Sanger Institute (UK). This valuable tool provides access to genomic and phenotypic datasets derived from cancer cell models, empowering diverse research applications. Table 1 displays an updated list of some of the existing online resources that provide a comprehensive genetic, transcriptomic, and proteomic map for exploring most of the currently available cell lines. These resources will help researchers determine cancer-sustaining molecular mechanisms with unprecedented depth, rigor, and speed. In this way, cancer cell lines will continue to be essential for current research strategies; however, their proper use following published guidelines is mandatory.
Table 1. List of online resources with comprehensive genomic, transcriptomic, and proteomic datasets derived from cancer cell lines.
|Resource|Description|
|---|---|
|Cancer Cell Line Encyclopedia|The Cancer Cell Line Encyclopedia (CCLE) database was conceived to conduct a detailed genetic and pharmacologic characterization of a large panel of human cancer models (approximately 110 models). Gene expression, mutation, methylation, RNAseq, and metabolomics data are downloadable.|
|Genomics of Drug Sensitivity in Cancer|This project aims to screen >1000 genetically characterized human cancer cell lines with a wide range of anticancer therapeutics. The sensitivity patterns of the cell lines are correlated with extensive genomic data to identify genetic features that are predictive of sensitivity.|
|MD Anderson Cell Lines Project|The MD Anderson Cell Lines Project depicts the expression levels of approximately 230 key cancer-related proteins in 650 independent cell lines. It is a comprehensive bioinformatic resource for accessing, visualizing, and analyzing the functional proteomics of cancer cell lines.|
|Project Achilles|Project Achilles systematically identifies and catalogs gene essentiality across hundreds of genomically characterized cancer cell lines. For each cell line, it reports the genes whose RNAi silencing and/or CRISPR-Cas9 knockout alters cell survival. These results are linked to the genetic or molecular features of the tumors to provide a “cancer dependency map”.|
|Cell Model Passports|This resource provides large-scale genomic datasets for approximately 1200 cataloged cancer cell line and organoid models. For each model system, it is possible to display the associated somatic nucleotide variants, gene expression, copy number variations, or methylation data. Its accessible format also makes it useful for noncomputational, wet laboratory scientists.|