TOUCAN: a framework for fungal biosynthetic gene cluster discovery.

Authors: Almeida H, Palys S, Tsang A, Diallo AB


Affiliations

1 Departement d'Informatique, UQAM, Montréal, QC, H2X 3Y7, Canada.
2 Centre for Structural and Functional Genomics, Concordia University, Montréal, QC, H4B 1R6, Canada.

Description

NAR Genom Bioinform. 2020 Dec; 2(4):lqaa098

Authors: Almeida H, Palys S, Tsang A, Diallo AB

Abstract

Fungal secondary metabolites (SMs) are an important source of bioactive compounds widely used in the pharmaceutical industry, for example in the production of antibiotics and anticancer medications. The discovery of novel fungal SMs can potentially benefit human health. Identifying the biosynthetic gene clusters (BGCs) involved in SM biosynthesis can be costly and complex, especially because of the genomic diversity of fungal BGCs. Previous approaches to fungal BGC discovery are limited in scope, which can restrict the discovery of new BGCs. In this work, we introduce TOUCAN, a supervised learning framework for fungal BGC discovery. Unlike previous methods, TOUCAN can predict BGCs from amino acid sequences, which facilitates its use on newly sequenced data that have not yet been curated. It relies on three main pillars: rigorous selection of datasets by BGC experts; a combination of functional, evolutionary and compositional features coupled with high-performing classifiers; and robust post-processing methods. TOUCAN's best-performing model achieves an F-measure of 0.982 on BGC regions in the Aspergillus niger genome. Overall, the results show that TOUCAN outperforms previous approaches. Although TOUCAN focuses on fungal BGCs, it can be easily adapted to process other species or to include new features.
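
For readers unfamiliar with the F-measure reported in the abstract, the sketch below shows how the score is conventionally derived from precision and recall. It is a generic illustration, not the authors' evaluation code: the function name `f_measure` and the counts passed to it are hypothetical and are not taken from the paper.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-beta score from true-positive, false-positive
    and false-negative counts (beta=1 gives the usual F1 score)."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)  # fraction of predicted positives that are correct
    recall = tp / (tp + fn)     # fraction of actual positives that are recovered
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts, chosen only for illustration (not results from the paper).
score = f_measure(tp=90, fp=5, fn=10)
print(round(score, 3))  # 0.923
```

An F-measure close to 1, such as the 0.982 reported above, requires both precision and recall to be high, because the harmonic mean is pulled toward the lower of the two values.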

PMID: 33575642 [PubMed]


Links

PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33575642

DOI: 10.1093/nargab/lqaa098