INFLUENCE OF THE DEGREE OF SEQUENCING DATA FILTERING ON THE QUALITY AND COMPLETENESS OF THE DE NOVO TRANSCRIPTOME ASSEMBLY
Abstract and keywords
Abstract (English):
Many assemblers, each based on a different algorithm, are available for de novo transcriptome assembly. Read filtering, one of the key stages of the workflow, can likewise be performed with several approaches and algorithms. To date, however, very little work has addressed how the degree of filtering affects de novo transcriptome assembly. In this paper, we analyzed transcripts assembled with two of the most widely used programs, rnaSPAdes and Trinity, after applying various approaches to the read-filtering stage. We show key differences between the two assemblies and identify the parameters that are sensitive to the degree of filtering and to the length of the input reads. We also propose an effective two-stage filtering algorithm that retains as much of the input data as possible while ensuring that all reads meet the required quality after filtering and cropping.
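
The abstract does not specify the two stages, so the following Python sketch is only one plausible reading, not the authors' algorithm. It assumes that stage 1 crops low-quality bases from the 3' end of each read and that stage 2 discards reads that are still too short or too low in mean quality. The function names, the thresholds (`min_q`, `min_len`, `min_mean_q`), and the Phred+33 encoding are all illustrative assumptions.

```python
from typing import List, Tuple

Read = Tuple[str, str]  # (sequence, quality string); hypothetical in-memory form


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 encoding assumed)."""
    return [ord(ch) - offset for ch in quality]


def crop_3prime(seq: str, qual: str, min_q: int = 20) -> Read:
    """Assumed stage 1: drop trailing bases whose quality is below min_q."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def passes_filter(qual: str, min_len: int = 50, min_mean_q: float = 25.0) -> bool:
    """Assumed stage 2: keep a read only if it is long enough and good on average."""
    scores = phred_scores(qual)
    if not scores:
        return False  # a read cropped to nothing is always discarded
    return len(scores) >= min_len and sum(scores) / len(scores) >= min_mean_q


def two_stage_filter(
    reads: List[Read],
    min_q: int = 20,
    min_len: int = 50,
    min_mean_q: float = 25.0,
) -> List[Read]:
    """Crop every read first, then filter, so salvageable reads are not lost."""
    kept = []
    for seq, qual in reads:
        seq, qual = crop_3prime(seq, qual, min_q)
        if passes_filter(qual, min_len, min_mean_q):
            kept.append((seq, qual))
    return kept
```

Cropping before filtering is the design choice that saves input data in this sketch: a read with a poor 3' tail but a good body is shortened and kept rather than rejected outright.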

Keywords:
RNA-seq, rnaSPAdes, Trinity, de novo transcriptome assembly, read filtering