
Researcher’s Code Could Be Key To Crunching Cancer’s Big Data
August 24, 2016

At the most basic level, cancer can be defined as the DNA of a normal cell going haywire.

One way that DNA can go haywire is through something called aneuploidy, where one or more of a cell's 23 pairs of chromosomes is duplicated. It can also commonly happen through structural variations in the sequences within the DNA: think of the double helix model from middle school science, then move the pieces around.
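To make the distinction concrete, here is a toy sketch in Python (invented for illustration, not the researchers' code). Aneuploidy changes how many copies of a chromosome a cell carries, while a structural variation rearranges sequence within a chromosome:

```python
# A normal cell carries two copies of each autosome (chr1..chr22).
normal_copy_number = {f"chr{i}": 2 for i in range(1, 23)}

# Aneuploidy: the tumor gains a third copy of chromosome 8.
tumor_copy_number = dict(normal_copy_number)
tumor_copy_number["chr8"] = 3

# Structural variation: a short segment is inverted in place.
# An inversion flips the segment and complements each base.
complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
segment = "AACGT"
inverted = "".join(complement[b] for b in reversed(segment))

print(tumor_copy_number["chr8"], inverted)  # 3 ACGTT
```

The chromosome and sequence values here are made up; real tumors show far messier combinations of both kinds of change.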

Those changes can occur in any of the 3.2 billion base pairs of DNA found in each person. That means the data sets holding genome sequences for cancer cells can be huge.
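A back-of-envelope calculation shows why. Assuming one byte per base call and roughly 30x sequencing coverage (a common depth for whole-genome studies, not figures from the article):

```python
base_pairs = 3_200_000_000   # ~3.2 billion base pairs per human genome
coverage = 30                # each position is read about 30 times
bytes_per_base = 1           # one ASCII character per base call

raw_bytes = base_pairs * coverage * bytes_per_base
print(f"~{raw_bytes / 1e9:.0f} GB of raw base calls per genome")  # ~96 GB
```

Multiply that by hundreds of tumor samples and the need for serious computing power becomes clear.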

Usually those changes are looked at separately, but Carnegie Mellon University Associate Professor of Computational Biology Jian Ma is using servers at CMU to crunch data to see how aneuploidy mutations and structural variations relate to each other.

“The advantage, of course, is if you look at them together, you can identify these events more precisely in an unbiased way,” Ma said. “You will be able to tell the timing of these events, so you will be able to know if this structural rearrangement happened before chromosome duplication or after chromosome duplication.”
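The timing logic Ma describes can be sketched roughly (a simplified illustration, not Weaver's actual method): a rearrangement that happened before a chromosome was duplicated should appear on every resulting copy, while one that happened after should appear on only some of them.

```python
def order_of_events(total_copies: int, copies_with_variant: int) -> str:
    """Infer event order from how many chromosome copies carry a variant."""
    if copies_with_variant == total_copies:
        return "rearrangement likely before duplication"
    return "rearrangement likely after duplication"

print(order_of_events(3, 3))  # variant on all copies -> before duplication
print(order_of_events(3, 1))  # variant on one copy -> after duplication
```

The copy counts here are hypothetical; in practice such inferences come from sequencing read depth and are statistical rather than clear-cut.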

Ma is doing this by looking at breast and cervical cancer genomes that have already been sequenced as part of the Cancer Genome Atlas. The Atlas is a federally funded project to catalog the genetic mutations responsible for cancer.

“As computational biologists, our interest is to develop new algorithms, new analytic tools, that can look at these data from different perspectives, and then you can potentially identify new things that are relevant to the biology of cancer,” said Ma, who is pushing that huge data set through code that he calls “Weaver.”

Adrian Lee, a professor of pharmacology and chemical biology at the University of Pittsburgh, said medicine keeps producing bigger and bigger data sets.

“And so now we are virtually dependent on computational biology to try to decipher what these changes mean,” Lee said. “Say you find a million changes, but only 20 are important. How do you find those 20? It’s kind of a needle in a haystack.”
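The filtering problem Lee describes can be sketched in a few lines (a hedged illustration; the random scores stand in for the biological evidence a real pipeline would weigh):

```python
import random

# A million observed DNA changes, each with a stand-in relevance score.
random.seed(0)
changes = [{"id": i, "score": random.random()} for i in range(1_000_000)]

# Keep only the 20 highest-scoring changes: the needles in the haystack.
top20 = sorted(changes, key=lambda c: c["score"], reverse=True)[:20]
print(len(top20))  # 20
```

The hard part in practice is not the sorting but the scoring: deciding which of a million changes actually matter biologically.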

Cancer evolves over time. Two biopsies taken of the same cancer, from the same patient, a year apart would look different on the DNA level, Lee said.

“We can measure the DNA and then try to build models to predict how it changed over time and then what it would have done in the future,” Lee said. “In that way we can predict how it’s going to change and then what we should target.”
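Comparing two biopsies at the DNA level amounts to asking which changes were gained and which were lost over time. A minimal sketch, with invented variant labels:

```python
# Variant calls from two biopsies of the same tumor, a year apart.
biopsy_year1 = {"chr8:gain", "chr17:del", "TP53:p.R175H"}
biopsy_year2 = {"chr8:gain", "TP53:p.R175H", "chr3:inv", "PIK3CA:p.H1047R"}

gained = biopsy_year2 - biopsy_year1  # changes acquired over the year
lost = biopsy_year1 - biopsy_year2    # changes no longer detected

print(sorted(gained), sorted(lost))
```

Models of the kind Lee describes would go further, using such differences to forecast where the tumor is headed and which mutations to target.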

Ma’s code was published in the journal Cell Systems, and he said he hopes scientists will use it on other data sets. Weaver has already found some duplicated chromosomal regions that are caused by specific structural variations.

Ma said he hopes to improve his algorithms to better understand those evolutions and then apply them to more samples from the Cancer Genome Atlas project. From there the approach could move into the clinical setting.

“Weaver at the moment is just an algorithm; there is no direct clinical impact yet,” Ma said. “But I think I will be interested in exploring potential opportunities, like how to apply Weaver in the broader context.”

Researchers said someday this type of big data crunching could help doctors tailor a specific treatment to a patient’s specific cancer mutation.
