Jakub Bartoszewicz [Picture: Kay Herschelmann]

DeePaC-Live: Predicting pathogenic potentials of short DNA reads with reverse-complement deep neural networks.
22 March 2021 | 4–5 pm CET
Jakub Bartoszewicz, Hasso-Plattner-Institut, Germany

Viruses evolve quickly and may emerge rapidly. Next-generation sequencing is the state of the art in open-view pathogen detection, and one of the few methods available at the earliest stages of an epidemic, even when the biological threat is unknown. We show that deep neural architectures can accurately predict whether raw, unassembled sequencing reads originate from novel, human-infecting agents, cutting the error rates in half compared to alternative approaches and generalizing to taxonomic units distant from those presented during training. To gain insight into the inner workings of the trained models, we visualize the learned features and the contributions of individual nucleotides to the final classification decision. Nucleotide-resolution maps of the learned associations between pathogen genomes and the infectious phenotype can be used to detect virulence-related genes in novel agents. Because analyzing samples while the sequencer is still running can greatly reduce the turnaround time, we extend the approach to classify incomplete Illumina and Nanopore reads in real time. The resulting models show strongly improved performance compared to existing real-time mapping approaches for both sequencing technologies.
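
The "reverse-complement" in the title refers to the fact that a sequencing read may come from either DNA strand, so a prediction should not depend on the strand orientation. The sketch below illustrates that idea in its simplest form, by averaging a score over a read and its reverse complement. It is an illustration under stated assumptions, not the DeePaC implementation: `toy_score` is a hypothetical stand-in for a trained model, and all function names are made up for this example.

```python
# Illustrative sketch only; not the DeePaC code base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(read):
    """Return the reverse complement of a DNA read."""
    return "".join(COMPLEMENT[base] for base in reversed(read.upper()))


def toy_score(read):
    """Hypothetical stand-in for a trained model.

    Returns the fraction of 'A' bases, which changes when the strand
    is flipped, so it is NOT reverse-complement invariant by itself.
    """
    read = read.upper()
    if not read:
        return 0.0
    return read.count("A") / len(read)


def rc_invariant_score(read, model=toy_score):
    """Average the model output over both strand orientations."""
    return 0.5 * (model(read) + model(reverse_complement(read)))


read = "AACG"
flipped = reverse_complement(read)  # "CGTT"
# The raw score depends on the strand; the averaged score does not.
print(toy_score(read), toy_score(flipped))
print(rc_invariant_score(read), rc_invariant_score(flipped))
```

Averaging over both orientations is only one way to obtain a strand-independent prediction; building the symmetry into the network itself is another, and the talk's title suggests the latter.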