A Map of World Sequencing Data [Photo by Serratus Project via UBC]

The Serratus Project team, a collaboration of multi-disciplinary scientists based at the University of British Columbia, recently discovered nine never-before-seen coronaviruses and more than 130,000 RNA viruses. 

The team used ground-breaking computer technology to make these open-source findings, in collaboration with Amazon.

SARS-CoV-2, the novel coronavirus that causes COVID-19, is classified as an RNA virus. Every virus in the coronavirus family falls under the RNA virus umbrella.

Artem Babaian, who led the Serratus Project, received his PhD in medical genetics at UBC. He said the most fascinating aspect of this research is that it is not exclusive to coronaviruses. The Serratus Project tracked RNA viruses as a whole. 

“This isn’t just coronaviruses, this was all RNA viruses, so that includes things [such as] measles, influenza and hepatitis viruses,” Babaian said.

The project’s findings were recently published in the scientific journal Nature. At UBC, researchers in the department of biology and engineering worked with scientists beyond UBC and Canada to identify these new RNA viruses using Amazon’s central processing units (CPUs). A CPU is the part of a computer that carries out the instructions of a program.

Former University of British Columbia (UBC) post-doctoral research fellow Dr. Artem Babaian led an international research team in re-analyzing all public RNA sequencing data to uncover almost ten times more RNA viruses than were previously known, including several new species of coronaviruses in some unexpected places. [Photo provided by UBC]

The viruses that were identified are not limited to those that infect humans; they also affect animals and livestock. The researchers were interested in viruses found in animals because those viruses can evolve to infect humans, a process similar to how COVID-19 variants mutate.

“If we understand what will affect the animals around us, then that provides better protection for humans as well,” Babaian said. “SARS-CoV-2 is the best example of this in the recent past as it jumped to humans somehow that started in a bat.”

These discoveries were made possible using an Amazon supercomputer that operates on a planetary scale. Planetary-scale data collection means the samples that were studied were not from one static geographical location. The RNA sequencing data came from all around the world, Babaian said. RNA sequencing is the process scientists use to identify which genes are turned on and off in a sample, providing data that is unique to that specific virus or cell. 

The UBC press release states the research team was able to uncover almost ten times more RNA viruses than were previously known.

To put the technology’s capabilities into perspective, UBC’s press release notes that scientists can sequence blood samples from patients with symptoms of unknown origin.

Then, scientists can compare the sequence of an unknown virus to the large database of newly discovered RNA viruses. 

“If a patient, for example, presents with a viral infection of unknown origin in St. Louis, you can now search through the database in about two minutes and connect that virus to, say, a camel in sub-Saharan Africa sampled in 2012,” Babaian said in the press release.

Babaian said the goal for the research project was to collect all the sequencing data on RNA viruses from around the world. According to the project’s database description, the Serratus Project supercomputer re-analyzed all the publicly available information about the gene sequencing of RNA viruses. This allows the research to be open access, meaning anyone can view the study and scientists can continue adding to it based on their discoveries. 

One of Babaian’s colleagues in this project is Jeff Taylor, a cloud architect who designed the cloud infrastructure. In this context, cloud infrastructure is the hardware and software components that support virtual databases. 

“We were able to take advantage of a lot of the work that Amazon has put into this—given they were able to create a system that is capable of storing enormous amounts of data and allowing people to access it very quickly,” Taylor said.  

According to Taylor, a supercomputer of this magnitude is “nothing but a drop in the bucket” for Amazon. The company has millions of CPUs and runs a huge portion of the internet, he said. 

According to Jeffery Joy, a scientist who researches the evolutionary biology of diverse viruses at UBC and was not involved in the Serratus Project, this project is unique. 

“There are all these large-scale databases of genetic sequencing that are publicly available in the world, but the Serratus Project is unlike the others,” Joy said. “Serratus has every bit of information [on the RNA viruses].”


Featured image provided by Serratus Project via the University of British Columbia.