Understanding the genetic make-up of cells through RNA-sequencing is incredibly important: it allows researchers to better understand how cells work, grow and interact with other cells. When analysing cancer cells, this research can help to identify where cancer starts, understand which treatments may be most effective for particular cells, or determine the potential for cancer recurrence.

Currently, a step known as ‘read trimming’ is undertaken when analysing and mapping gene data through RNA-sequencing. In effect, trimming removes adapter sequences and low-sequencing-quality bases (or in layman’s terms, some of the outliers of a data set) so that the analysis focuses on a more concentrated collection of gene data. While effective, trimming adds significant data analysis time to each study, and it has been unclear whether removing the trimmed data affects the overall analysis or the results obtained from the remaining data.
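For readers who want to see what trimming does to a single read, the idea can be sketched in a few lines of Python. This is a simplified illustration, not the method of any particular trimming tool: the function name `trim_read`, the quality threshold of 20 and the example read are all assumptions made for this sketch, and real tools typically tolerate mismatches in the adapter and use more elaborate quality rules.

```python
def trim_read(seq, quals, adapter, min_quality=20):
    """Remove a 3' adapter and a low-quality tail from one read.

    seq     -- the read's bases as a string
    quals   -- one Phred quality score per base
    adapter -- adapter sequence to look for (exact match only in this sketch)
    """
    # Cut the read at the first exact occurrence of the adapter, if present.
    pos = seq.find(adapter)
    if pos != -1:
        seq, quals = seq[:pos], quals[:pos]

    # Drop trailing bases whose quality score falls below the threshold.
    end = len(seq)
    while end > 0 and quals[end - 1] < min_quality:
        end -= 1
    return seq[:end], quals[:end]


# Hypothetical read: 8 genuine bases followed by a 13-base adapter.
read = "ACGTACGT" + "AGATCGGAAGAGC"
quals = [38, 37, 36, 35, 30, 12, 9, 8] + [30] * 13

trimmed_seq, trimmed_quals = trim_read(read, quals, adapter="AGATCGGAAGAGC")
print(trimmed_seq, trimmed_quals)
```

In this example the adapter is cut off first, and then the last three genuine bases are discarded as well because their scores are below the threshold. Those discarded bases are the kind of data the study asks about.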

A study led by Prof Wei Shi, Head of our Bioinformatics and Cancer Genomics Laboratory, and Bioinformatician Dr Yang Liao, recently published in NAR Genomics and Bioinformatics (Oxford Academic), has found that read trimming is not required for effective genome analysis. In fact, Wei and Yang found that by not undertaking read trimming (or by relying only on ‘soft-clipping’), researchers can analyse complete data sets faster and with fewer computational resources, while importantly achieving equivalent or better accuracy.

Wei said, “We found that adapter sequences can be effectively removed by a read aligner we developed via ‘soft-clipping’ and that many low-sequencing-quality bases, which would be removed by read trimming tools, were rescued by the aligner”.
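To illustrate the difference from trimming: in the widely used SAM alignment format, soft-clipped bases stay in the read and are simply flagged as unaligned by an `S` operation in the alignment's CIGAR string. The sketch below shows how such a record can be read back. It is not the aligner described in the study; the function name `split_soft_clips` and the example read are hypothetical, and the sketch assumes the CIGAR string contains no hard clips (`H`).

```python
import re


def split_soft_clips(seq, cigar):
    """Split a read into (leading clip, aligned part, trailing clip).

    Assumes a SAM-style CIGAR string without hard clips, so that any
    soft clip ('S') is the first or last operation.
    """
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    lead = int(ops[0][0]) if ops and ops[0][1] == "S" else 0
    trail = int(ops[-1][0]) if len(ops) > 1 and ops[-1][1] == "S" else 0
    cut = len(seq) - trail
    return seq[:lead], seq[lead:cut], seq[cut:]


# Hypothetical read: 8 bases align to the genome, 13 adapter bases are soft-clipped.
read = "ACGTACGT" + "AGATCGGAAGAGC"
leading, aligned, trailing = split_soft_clips(read, "8M13S")
print(aligned, trailing)
```

Because the decision about which bases to ignore is made during alignment, no separate pass over the data is needed beforehand, and every base that can be aligned is kept.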

“Being able to conduct this analysis faster means that we also have access to results faster, which is incredibly important when looking at scenarios for personalised medicine, because doctors and their patients need to be able to access results as quickly as possible”, said Wei.

So, what does this mean for the future of RNA-sequencing?

“By sharing these findings, we hope that this will generate significant change to the way RNA-sequencing is performed by us and other researchers”, said Wei.

“We also hope that this will improve our ability to understand the properties and behaviours of different cells which can lead to more in-depth research and analysis of many cells, including cancer cells”.



This study was made possible by the support of the Australian National Health and Medical Research Council, a Walter and Eliza Hall Institute Centenary Fellowship sponsored by CSL, and Victorian State Government Operational Infrastructure Support.

Wei and Yang also sincerely thank Prof Gordon K Smyth for suggesting this study.


Publication details

NAR Genomics and Bioinformatics – Oxford Academic: https://doi.org/10.1093/nargab/lqaa068


Image credit: Flavio Takemoto from FreeImages