Centre for Cellular and Molecular Biology Selects AWS India to Foster Genomics Research in India
Amazon Web Services (AWS) India Private Limited announced today that the Centre for Cellular and Molecular Biology (CCMB), a premier research organisation focusing on modern molecular biology and population-scale genomics, has selected AWS as its preferred cloud provider to accelerate genomics research projects. One of CCMB’s focus areas, under the guidance of the Council of Scientific and Industrial Research (CSIR), is the study of genetic material, how it varies across populations, and how this variation leads to disparities in human health and disease.
Life sciences and genomics research organisations must access, store, and analyse the large volumes of data created by next-generation high-throughput sequencers. Previously, many organisations relied on on-premises servers to meet their storage and computation requirements. Because genomics research is data-intensive, CCMB regularly had to acquire extra on-premises storage to manage petabyte-scale datasets and to store raw data as well as the output files generated by secondary and tertiary analysis.
In addition, CCMB depended on on-premises high-performance computing (HPC) clusters for this analysis. These clusters were prone to outages, which affected research timelines and output. Because on-premises servers posed scalability and performance difficulties, CCMB turned to cloud computing to scale its data storage and analysis capacity as needed.
“At a time when genetics research is becoming critical for life sciences advancement, disease diagnosis, and drug development, we must innovate using technologies like cloud computing to achieve outcomes faster and better,” said Dr. Divya Tej Sowpati, genomics scientist at CSIR-CCMB. “Leveraging AWS, we have been able to speed up sample analysis and achieve more consistent results on genomics research. We are also able to tap the high GPU instances on-demand on AWS to analyse large-scale data sets now, widening our scope of investigation, enhancing our ability to collaborate, and enabling us to focus on the hard research problems at hand such as studying genetic variations and their impact on diseases,” he added.
CCMB used AWS Snowball to transfer 83 terabytes of genomics data from on-premises servers to AWS. AWS Snowball is an offline data transport service that uses secure devices to move large volumes of data into and out of the AWS Cloud without using the internet. The organisation subsequently moved its genomic analysis toolkit and bioinformatics data pipelines for secondary analysis to Amazon Genomics CLI, an open-source solution that allows genomics organisations to analyse raw genomics and biological data. CCMB also accessed various genomics databases via the Registry of Open Data on AWS (RODA) without having to download them locally for processing, saving months of data download time and gaining access to established sources of truth.
Running on AWS, CCMB genotyped 3,200 samples from the 1000 Genomes Project, an international research endeavour to create a detailed database of human genetic variation. CCMB was able to reduce the time required for research analysis by up to 98% by using services such as Amazon Aurora, Amazon Elastic Compute Cloud (Amazon EC2), EC2 Auto Scaling, Amazon Simple Storage Service (Amazon S3), and AWS Batch.
CCMB has also begun examining breast cancer samples in order to find molecular fingerprints of triple-negative breast cancers in the Indian population. CCMB reduced the time required to analyse each sample by 50 to 70% by using CPU- and GPU-accelerated computing on the AWS Cloud.
CCMB also used AWS GPU instances to train and test machine learning (ML) neural network models on long-read data sequenced with Oxford Nanopore sequencers to detect DNA modifications associated with diseases such as cancer, neurodegenerative disorders, and cardiovascular disease. It achieved greater than 91% accuracy and decreased the time required to train these models from several days on its on-premises systems to three to four hours per dataset on AWS.
“Understanding the genomic variation in India’s population is a government priority towards developing precision healthcare and diagnostics, and delivering them at affordable costs. However, genomics research is data intensive, and the increasing volume and velocity of genomics data is a challenge for research institutions in managing both infrastructure and costs,” said Pankaj Gupta, Leader – Public Sector (Government, Education, Healthcare), AWS India Private Limited. “Finding greater computing efficiency at scale is well addressed by cloud computing, but more crucially, it can accelerate genomics research, enabling researchers to translate insights faster to enable drug development and drive better health care treatments. AWS is excited to work with CCMB to accelerate the translation of raw sequencing data into actionable insights through our scalable, powerful, and secure services,” he added.