Towards a better understanding of deep convolutional neural network processes for recognizing organic chemicals of environmental concern

Authors: Sun X, Zhang X, Wang L, Li Y, Muir DCG, Zeng EY


Affiliations

1 Guangdong Key Laboratory of Environmental Pollution and Health, School of Environment, Jinan University, Guangzhou 511443, China.
2 Department of Chemistry and Biochemistry, Concordia University, Montreal, Quebec H4B 1R6, Canada.
3 Guangdong Key Laboratory of Environmental Pollution and Health, School of Environment, Jinan University, Guangzhou 511443, China; Environment and Climate Change Canada, Aquatic Contaminants Research Division, 867 Lakeshore Road, Burlington, Ontario L7S 1A1, Canada.
4 Guangdong Key Laboratory of Environmental Pollution and Health, School of Environment, Jinan University, Guangzhou 511443, China. Electronic address: eddyzeng@jnu.edu.cn.

Description

The deep convolutional neural network (DCNN) has proved to be a promising tool for identifying organic chemicals of environmental concern. However, the uncertainty associated with DCNN predictions remains to be quantified. The training process involves many random configurations, including dataset segmentation, input sequences, and initial weights. Moreover, the DCNN working mechanism is non-linear and opaque. To increase confidence in using this novel approach, persistent, bioaccumulative, and toxic substances (PBTs) were used as representative chemicals of environmental concern to estimate the prediction uncertainty under five distinct datasets and ten different molecular descriptor (MD) arrangements, with 111,852 chemicals and 2,424 available MDs. An internal correlation coefficient test indicated that the prediction confidence reached 0.98 when the mean of 50 DCNNs' predictions was used instead of a single DCNN prediction. A threshold for PBT categorization was determined by weighing the costs of false-negative against false-positive predictions. As revealed by guided backpropagation-class activation mapping (GBP-CAM) saliency images, only 12% of all selected MDs were activated by the DCNN and influenced the decision-making process. However, the activated MDs not only varied among chemical classes but also shifted between different DCNNs. Principal component analysis indicated that the 2,424 MDs could be transformed into 370 orthogonal variables. Both results suggest that redundancy exists among the selected MDs. Yet the DCNN was found to adapt to redundant data by focusing on the most important information for better prediction performance.
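
As an illustration of two steps mentioned in the description, averaging the predictions of several DCNNs and choosing a categorization threshold from misclassification costs, here is a minimal sketch. It is not the authors' implementation: the function names, the array layout (one row of scores per model), and the cost values `cost_fn` and `cost_fp` are assumptions made for this example only.

```python
import numpy as np


def ensemble_mean(predictions):
    """Average scores over models.

    predictions: array-like of shape (n_models, n_chemicals), an assumed layout.
    Returns one mean score per chemical.
    """
    return np.asarray(predictions, dtype=float).mean(axis=0)


def cost_based_threshold(scores, labels, cost_fn=5.0, cost_fp=1.0):
    """Pick the score threshold with the lowest total misclassification cost.

    cost_fn and cost_fp are hypothetical per-error costs, not values from the paper.
    A chemical is classified as positive when its score is >= the threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_t, best_cost = None, np.inf
    # Every observed score is a candidate threshold (np.unique sorts them).
    for t in np.unique(scores):
        predicted = scores >= t
        false_neg = np.sum(labels & ~predicted)
        false_pos = np.sum(~labels & predicted)
        cost = cost_fn * false_neg + cost_fp * false_pos
        if cost < best_cost:  # strict "<" keeps the lowest threshold on ties
            best_t, best_cost = t, cost
    return float(best_t), float(best_cost)
```

In use, `ensemble_mean` would be applied to the stacked outputs of the individual models, and the resulting mean scores passed to `cost_based_threshold` together with known labels. Setting `cost_fn` above `cost_fp` reflects a choice to penalize missed PBTs more heavily than false alarms; the actual weighting used in the study is not stated here.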


Keywords: Gradient-weighted class activation mapping; Guided backpropagation; Organic contaminants; Prediction uncertainty; Redundancy


Links

PubMed: https://pubmed.ncbi.nlm.nih.gov/34388923/

DOI: 10.1016/j.jhazmat.2021.126746