CTC genetic heterogeneity, a window into the metastatic process

x-posted to Erica Pratt’s Blog


BCa: Breast Cancer
CTC: Circulating tumor cell.
Epithelial cell surface marker: A protein or receptor sticking out of the cell membrane of epithelial cells.
EMT: Epithelial-to-mesenchymal transition, where cells lose biomarkers associated with their organ of origin and become more stem cell-like.
SNR: Signal-to-noise ratio.

Tumor genetic heterogeneity has emerged as an effective biomarker of malignant processes [1-4]. However, limited access to tissue in solid tumors makes repeated sampling and tracking of tumor mutations infeasible. CTCs can serve as a “liquid biopsy”, allowing researchers to study genetic progression in real time. The paper I’m reviewing today, “Single Cell Profiling of Circulating Tumor Cells: Transcriptional Heterogeneity and Diversity from Breast Cancer Cell Lines” by Powell et al., demonstrates the utility of single-CTC genetic profiling [5]. The article is Open Access and available on PLoS ONE.

Immunocapture-Based CTC isolation

The technology used in this study is called the MagSweeper, developed by the Jeffrey Lab at Stanford. Magnetic beads were coated with an antibody targeting epithelial cell surface markers. These antibody-coated (i.e. immunomagnetic) beads were mixed into blood samples, resulting in cancer cells covered in beads, as shown in figure D. The blood samples were diluted with saline solution, and cancer cells were extracted using a magnetic source. Captured cells were washed while attached to the magnet, and then released when the applied field was removed, as shown in figure B. Cell gene expression and viability were shown to be unaffected by this capture process.


Cancer cell gene expression analysis

Once isolated, cancer cells were lysed, and their RNA extracted, to generate cDNA through a process called reverse transcription. The cDNA was then used in microarrays to search for a panel of cancer-associated genes. Microarray analysis requires many more copies of DNA than are generated through reverse transcription; therefore, researchers used a cyclical process called polymerase chain reaction, or PCR, to exponentially increase (aka amplify) the number of DNA copies. PCR was used again to look for specific genes within the DNA strand, increasing the SNR, and enabling accurate detection.
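To get a feel for what “exponentially” means here, this is a minimal sketch that assumes every template is copied once per cycle. The function name and the efficiency parameter are my own illustration, not something from the paper; real reactions are less than 100% efficient and eventually plateau, which the sketch ignores.

```python
def copies_after_pcr(initial_copies, cycles, efficiency=1.0):
    """Idealized copy number after a given number of PCR cycles.

    efficiency=1.0 means every template is duplicated in each cycle
    (perfect doubling). Plateau effects are deliberately ignored.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# Ten starting cDNA copies after 20 cycles, at perfect and at 90% efficiency.
print(copies_after_pcr(10, 20))
print(copies_after_pcr(10, 20, efficiency=0.9))
```

Even a small drop in per-cycle efficiency compounds over many cycles, which is one reason the threshold cycle described next is read relative to reference genes rather than as an absolute copy count.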

This process is monitored using a fluorescent reporter, whose signal increases in proportion to the number of DNA copies made. An example of reporter accumulation (not from this paper) is shown below: on the y-axis is the log of the reporter signal (Rn) and on the x-axis is the number of PCR cycles that have taken place. The cycle at which the reporter signal crosses a set threshold is called the threshold cycle, or CT. Threshold cycle is inversely related to SNR, i.e. at high CT, the target gene expression level is indistinguishable from background. Powell et al. set their cutoff at 35 cycles: a gene with CT ≥ 35 was treated as not expressed.
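In code, reading a threshold cycle off an amplification curve might look like the sketch below. The reporter readings are invented, the function names are hypothetical, and I am assuming a gene counts as expressed only when its CT falls below the 35-cycle cutoff (a lower CT means more starting template).

```python
def threshold_cycle(signal_by_cycle, threshold):
    """Return the first PCR cycle (1-based) whose reporter signal reaches the threshold.

    Returns None if the signal never reaches the threshold.
    """
    for cycle, signal in enumerate(signal_by_cycle, start=1):
        if signal >= threshold:
            return cycle
    return None


def is_expressed(ct, cutoff=35):
    """Assumption: a gene counts as expressed only if its CT is below the cutoff."""
    return ct is not None and ct < cutoff


# Made-up reporter readings for 40 cycles: flat background, then doubling per cycle.
abundant = [0.01 * 2 ** max(0, c - 15) for c in range(1, 41)]
rare = [0.01 * 2 ** max(0, c - 33) for c in range(1, 41)]

for name, curve in [("abundant", abundant), ("rare", rare)]:
    ct = threshold_cycle(curve, threshold=0.1)
    print(name, ct, is_expressed(ct))
```

The abundant transcript crosses the threshold early and is called expressed; the rare one crosses only after the cutoff and is treated as background.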


Single cell assay benchmarking

To check its robustness, the single cell profiling assay was tested by taking genetically distinct cell lines derived from BCa, mixing them up, and attempting to re-group them based on similarities between single-cell expression profiles. Seven different breast cancer cell lines, whose genomes were well characterized, were selected; four were derived from BCa metastases, and three from BCa primary tumors. Seven cells were randomly chosen from each cell line (for a total of 49 cells), and were assayed for 87 cancer-associated and reference genes. The selected reference genes were known to be pervasively expressed in the cell types of interest. If they could not be measured (i.e. CT was too high), the RNA sample was presumed to be degraded, and was not analyzed. The following reference genes were used in this study:

Reference gene(s) for all cells

UBB encodes regulatory protein Ubiquitin B. Ubiquitin, as the name implies, is ubiquitously expressed in almost all eukaryotic tissues. Ubiquitin binds to proteins, flagging them for eventual degradation within the cell.

ACTB encodes beta-actin, a filament protein that maintains cell structural integrity and supports cell motility.

GAPDH is considered a “housekeeping gene”. It encodes a glycolytic enzyme that is also involved in other key cellular processes, for example DNA repair and apoptosis.

CTC-specific reference gene(s)

KRT encodes keratin, an epithelial cell biomarker.

Leukocyte-specific reference gene(s)

CD45 is a leukocyte cell surface marker. Cells positive for CD45 were excluded from the sample analysis.
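Put together, the reference genes act as a quality-control filter on each cell. The sketch below is my own illustration of that logic, not the authors' pipeline: the CT values are invented, “KRT” stands in for a single keratin readout, and the rule that all three general reference genes must be detected is an assumption.

```python
REFERENCE_GENES = ("UBB", "ACTB", "GAPDH")
CT_CUTOFF = 35  # cycles; at or above this a gene is treated as not detected


def detected(cell, gene):
    """True if the gene has a CT value below the cutoff in this cell."""
    ct = cell.get(gene)
    return ct is not None and ct < CT_CUTOFF


def classify_cell(cell):
    """Label one cell from its CT values (dict of gene name -> CT, or None)."""
    # Assumption: every reference gene must be detected,
    # otherwise the RNA is presumed degraded and the cell is dropped.
    if not all(detected(cell, gene) for gene in REFERENCE_GENES):
        return "degraded"
    # CD45-positive cells are leukocytes and are excluded from the analysis.
    if detected(cell, "CD45"):
        return "leukocyte (excluded)"
    if detected(cell, "KRT"):
        return "CTC"
    return "unclassified"


# Invented CT values for three hypothetical cells.
cells = {
    "cell_1": {"UBB": 18, "ACTB": 20, "GAPDH": 21, "KRT": 24, "CD45": None},
    "cell_2": {"UBB": 19, "ACTB": 22, "GAPDH": 20, "KRT": None, "CD45": 23},
    "cell_3": {"UBB": 38, "ACTB": None, "GAPDH": 36, "KRT": 30, "CD45": None},
}
labels = {name: classify_cell(cell) for name, cell in cells.items()}
print(labels)
```

Checking for degradation first matters: a cell with failed reference genes is dropped before its KRT or CD45 readings are interpreted at all.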

Individual cells were then grouped using a method called hierarchical clustering. Its rubric is explained nicely by Stephen Borgatti’s “How to Explain Hierarchical Clustering”:

Given a set of N items to be clustered, and an NxN distance (or similarity) matrix, the basic process of Johnson’s (1967) hierarchical clustering is this:

1. Start by assigning each item to its own cluster, so that if you have N items, you now have N clusters, each containing just one item. Let the distances (similarities) between the clusters equal the distances (similarities) between the items they contain.

2. Find the closest (most similar) pair of clusters and merge them into a single cluster, so that now you have one less cluster.

3. Compute distances (similarities) between the new cluster and each of the old clusters.

4. Repeat steps 2 and 3 until all items are clustered into a single cluster of size N.

The strategy used by Powell et al. is technically divisive clustering, which, in simple terms, means the above rubric is run in reverse. The data were then median-centered and displayed as a dendrogram, a tree diagram commonly used for hierarchical clustering data.
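For readers who think better in code, the four quoted steps (the bottom-up version, not the divisive variant) can be sketched in plain Python. The distance measure (Euclidean) and the linkage rule (single linkage, i.e. the distance between the closest members of two clusters) are my assumptions for illustration, and the expression values are invented.

```python
from itertools import combinations


def euclidean(a, b):
    """Straight-line distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def agglomerate(profiles, n_clusters=1):
    """Merge the closest pair of clusters until n_clusters remain.

    profiles: dict of item name -> list of expression values.
    Returns (clusters, merges), where merges records each merge in order.
    """
    # Step 1: every item starts in its own cluster.
    clusters = [frozenset([name]) for name in profiles]
    merges = []

    def linkage(c1, c2):
        # Single linkage: distance between the closest members of two clusters.
        return min(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        # Step 2: find the closest pair of clusters and merge them.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        # Step 3 is implicit: linkage() is recomputed against the new cluster.
        merges.append((sorted(c1), sorted(c2)))
    # Step 4: the loop repeats until the requested number of clusters is left.
    return clusters, merges


# Four hypothetical cells measured on two genes; A* and B* are from different lines.
profiles = {
    "A1": [1.0, 1.0],
    "A2": [1.2, 0.9],
    "B1": [5.0, 5.2],
    "B2": [5.1, 4.8],
}
clusters, merges = agglomerate(profiles, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
print(merges)
```

The order of the recorded merges is what a dendrogram draws: early merges sit near the leaves, late merges near the root.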

The dendrogram from the cell line analysis is shown below. The colored bars at the top (red, plum, pink, gold, yellow, dark green, bright green) are the 7 clusters that cells were assigned to, and high and low expression are defined as ±3 SDs from the median (bell curve in the upper left). Of the 49 cells, 48 were clustered appropriately by cell line, indicating that single cell analyses have sufficient resolution to accurately categorize cells by their expression profiles.


Interestingly, cell line clusters were not grouped as metastatic versus non-metastatic, but by the presence or absence of estrogen receptor (ER), an important breast cancer classification tool.
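As an aside on how the heat map colors are scaled: median-centering followed by a ±3 SD color range can be sketched as below. This is only an illustration with invented values. Whether the SD is computed per gene or over the whole data set is not something I can confirm from the figure, so the per-gene choice here is an assumption.

```python
import statistics


def median_center(values):
    """Subtract the median so the gene's typical expression sits at zero."""
    med = statistics.median(values)
    return [v - med for v in values]


def clip_to_sd(values, n_sd=3.0):
    """Clip centered values to +/- n_sd standard deviations (the ends of the color scale)."""
    limit = n_sd * statistics.pstdev(values)
    return [max(-limit, min(limit, v)) for v in values]


# Invented expression values for one gene across 20 cells, with one extreme cell.
raw = [10.0] * 19 + [30.0]
centered = median_center(raw)
scaled = clip_to_sd(centered)
print(max(centered), max(scaled))
```

Clipping keeps a single extreme cell from stretching the color scale so far that the differences between the remaining cells wash out.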

Breast Cancer Patient results

CTCs from 65 blood samples, drawn from 50 BCa patients, were captured and analyzed using the same procedure I described above. In this case, no more than five randomly chosen CTCs were analyzed from each patient. After eliminating samples with evidence of RNA degradation, 40 samples from 35 patients (14 primary tumor, 21 metastatic) underwent hierarchical clustering.

Instead of seven clusters, there were only two, named Cluster I (21 cells, 13 patients) and Cluster II (84 cells, 30 patients). Interestingly, CTCs from the same patient were not clustered together, demonstrating the extreme heterogeneity both across patients and within a single patient's CTC population. Reference gene expression was similar for Clusters I and II; however, most cells in Cluster I expressed genes related to metastasis, stem cell phenotypes, EMT, and cell proliferation. All of these are either known, or hypothesized, to contribute to cancer cell invasiveness.


Cancer cell line data and CTC data were then pooled and analyzed using unsupervised hierarchical clustering. Strikingly, only 1 CTC out of 105 was clustered with a BCa cell line. All cells from the cell lines were correctly clustered by cell line, and the remaining 104 CTCs were correctly re-classified into Clusters I or II. This corroborates growing evidence that while cancer cell lines have gross similarities with CTCs, they are morphologically and genetically distinct.

Powell et al. also observed a population of cells that expressed both cancer- and leukocyte-specific genes. These unclassified cell populations have been observed by others via immunofluorescent staining and other methods, and almost nothing is known about them. However, increasing evidence indicates that they are not an artifact of CTC capture and identification, and that they merit further study.

Implications for cancer research

Robust single-cell genetic profiling opens the door to a wide range of studies that were previously infeasible. Bottlenecks in increasing CTC isolation device purity are circumvented, as each cell’s expression profile can be checked for indicators of cancer origin. Drug therapies could be designed based on an individual patient’s CTC profile. Single cell profiling could even add missing pieces to the phylogenetic tree of cancer evolution. All of these are critical steps in the creation of personalized patient therapies.


1. Tomlins et al. Urine TMPRSS2:ERG fusion transcript stratifies prostate cancer risk in men with elevated serum PSA. Science Translational Medicine 2011; 3(94):94ra72

2. Russnes et al. Genomic architecture characterizes tumor progression paths and fate in breast cancer patients. Science Translational Medicine 2010; 2(38):38ra47

3. Russnes et al. Insight into the heterogeneity of breast cancer through next-generation sequencing. Journal of Clinical Investigation 2011; 121(10):3810-3818

4. Navin et al. Tumour evolution inferred by single-cell sequencing. Nature 2011; 472(7341):90-94

5. Powell A.A., Talasaz A.H., Zhang H., Coram M.A., Reddy A., Deng G., Telli M.L., Advani R.H., Carlson R.W., Mollick J.A. et al. Single cell profiling of circulating tumor cells: transcriptional heterogeneity and diversity from breast cancer cell lines. PLoS ONE 2012; PMID: