Bridging experiments and data to decode the genome
This article is from the Bioengineering: Unlocking mysteries, enabling impact issue of EQuad News magazine.
By John Sullivan
What differences cause a cell to function as a tough muscle strand in the heart as opposed to a gossamer neuron in the brain or an immune cell that protects against infections and cancers?
For Yuri Pritykin *14, the answer lies somewhere in the genome, an enormous, winding code whose combination of simple instructions spells out the infinite complexity of life.
Pritykin, an assistant professor of computer science and genomics, is working to better understand the code and to examine how alterations in its expression lead to starkly different results. Ultimately, this knowledge could be key to improving health and treating disease.
“The cells in complex organisms are very different,” Pritykin said. “The same genes are encoded in each cell, but the functions are different based on which genes are expressed.”
Currently, his research group’s focus is on immunology, particularly a class of immune cells called T cells that play a critical role in cancer and autoimmune diseases.
“But most of the computational approaches that we apply can be used at least in principle to analyze data in other types of cells as well,” he said.
Pritykin’s team has developed a number of tools widely used by researchers. One, called GuideScan, helps researchers precisely aim the CRISPR enzyme that has revolutionized genetic engineering. The CRISPR enzyme neatly snips DNA strands, allowing researchers to analyze and combine elements of the genome. Although simple in principle, aiming CRISPR is challenging because the human genome is about 3.2 billion nucleotides long.
Scientists use another molecule called a guide RNA to direct the CRISPR enzyme to the target section of the DNA. The scientists aim the guide by adjusting a 20-nucleotide sequence in the guide RNA. But this task is complicated by the tendency of guide RNA to bind to close matches elsewhere in the genome as well as the desired target.
“The design of the guide RNA is an interesting computational problem,” Pritykin said. “We want to hit what we target and not to hit anywhere else.”
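To make the off-target problem concrete, here is a minimal Python sketch that slides a guide sequence along a DNA string and reports every site that matches with at most a few mismatches. It is an illustration only, not GuideScan's method: the function names `hamming` and `find_sites` and the `max_mismatches` parameter are hypothetical, and the sketch ignores real-world details such as the second DNA strand and the short flanking motif that CRISPR enzymes require.

```python
def hamming(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def find_sites(genome, guide, max_mismatches=2):
    """Return (position, mismatches) for every window of the genome
    that matches the guide with at most max_mismatches differences.

    Hypothetical helper for illustration; a naive scan like this is
    far too slow for a genome of billions of nucleotides.
    """
    k = len(guide)
    hits = []
    for i in range(len(genome) - k + 1):
        d = hamming(genome[i:i + k], guide)
        if d <= max_mismatches:
            hits.append((i, d))
    return hits


# Toy example: one exact target and one near match with a single change.
guide = "GATTACAGGCTTAACCGGTA"        # 20 nucleotides
near_match = "GATTACAGGCATAACCGGTA"   # differs from the guide at one position
genome = "CCC" + guide + "TTTTTT" + near_match + "CCC"

for position, mismatches in find_sites(genome, guide):
    print(position, mismatches)
```

A good guide, in this simplified picture, is one for which the scan reports the intended target and nothing else.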
By carefully structuring data for fast analysis, GuideScan transformed a laborious, time-consuming process into a simple operation. This is particularly important for experiments involving hundreds or thousands of CRISPR edits.
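The general idea of structuring data once so that later queries are fast can be sketched in a few lines of Python. The example below builds a lookup table from every fixed-length stretch of a DNA string to the positions where it occurs, so that checking a candidate guide becomes a single dictionary lookup instead of a scan. This is an assumption made for illustration, not a description of GuideScan's actual data structures; `build_index` and `is_unique` are hypothetical names, and the table handles only exact matches.

```python
from collections import defaultdict


def build_index(genome, k=20):
    """Map every length-k substring of the genome to its start positions.

    Built once up front; hypothetical helper for illustration only.
    """
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def is_unique(index, guide):
    """True if the guide occurs exactly once in the indexed genome."""
    return len(index.get(guide, [])) == 1


# Toy example with short sequences (k=4) so the table is easy to inspect.
toy_genome = "ACGTTTACGT"
toy_index = build_index(toy_genome, k=4)
print(toy_index["ACGT"])               # positions where ACGT occurs
print(is_unique(toy_index, "GTTT"))    # does GTTT occur exactly once?
```

The cost of building the table is paid once, which matters most when an experiment calls for checking hundreds or thousands of guides.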
As a computational biologist, Pritykin straddles the divide between the terminals of data scientists and the labs of traditional biology. His team designs experiments to study gene expression and also creates algorithms and methods to interpret the massive data sets generated by the experimental work.
“The distinction between computation and wet lab is becoming not as sharp as it once was,” Pritykin said. “Knowing the biology is essential, but knowing the math and computer science is also essential.”