Wednesday, 23rd March: Satellite Meeting on SARS-CoV-2
Morning Session
Here, we present covSonar, an efficient, database-driven system for the automatic acquisition of genomic profiles that makes them easily accessible, searchable and linked to selected metadata. As a proof of concept, we show that the system can rapidly track any genomic mutation pattern of SARS-CoV-2, in particular mutations that confer escape from neutralizing antibodies according to in vitro data (Greaney et al., PMID: 33259788). We integrate several publicly available data resources that are essential for effective genomic surveillance, such as the German electronic sequence data hub (DESH), which includes both random and targeted sequences, and deep mutational scanning data. Integrating this information, we observe a positive trend in genomic sequences from randomly drawn German samples carrying Spike protein mutations, mainly those that confer escape from class 2 monoclonal antibodies in vitro. In contrast, we did not detect alarming trends for mutations facilitating escape from polyclonal antibodies in patient sera obtained after Moderna vaccination. However, as the human population becomes increasingly immunized, the selection pressure on the virus also changes, which could favor further advantageous mutations. Thus, an efficient and sensitive surveillance system for critical mutations of SARS-CoV-2, such as covSonar, is crucial to enable targeted risk assessment.
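To illustrate the kind of query a database-driven profile store makes cheap, the following is a minimal sketch using Python's built-in `sqlite3` module: mutation profiles are stored per sample, linked to a sampling date, and queried for a mutation pattern and its monthly share. The schema, the `S:E484K`-style labels and all function names are assumptions for illustration, not covSonar's actual schema or interface.

```python
import sqlite3

def build_db(profiles):
    """Store per-sample mutation profiles in an in-memory SQLite database.

    profiles: dict mapping sample ID -> (sampling date 'YYYY-MM-DD', list of mutation labels)
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sample (id TEXT PRIMARY KEY, date TEXT)")
    con.execute("CREATE TABLE mutation (sample_id TEXT, label TEXT)")
    con.execute("CREATE INDEX idx_label ON mutation (label)")  # fast lookup by mutation
    for sid, (date, muts) in profiles.items():
        con.execute("INSERT INTO sample VALUES (?, ?)", (sid, date))
        con.executemany("INSERT INTO mutation VALUES (?, ?)", [(sid, m) for m in muts])
    return con

def samples_with_all(con, pattern):
    """IDs of samples carrying every mutation in `pattern` (a list of labels)."""
    marks = ",".join("?" for _ in pattern)
    rows = con.execute(
        f"SELECT sample_id FROM mutation WHERE label IN ({marks}) "
        "GROUP BY sample_id HAVING COUNT(DISTINCT label) = ? ORDER BY sample_id",
        (*pattern, len(set(pattern))),
    )
    return [row[0] for row in rows]

def monthly_share(con, pattern):
    """Fraction of samples per month (from the linked metadata) matching `pattern`."""
    hits = set(samples_with_all(con, pattern))
    totals, matches = {}, {}
    for sid, date in con.execute("SELECT id, date FROM sample"):
        month = date[:7]
        totals[month] = totals.get(month, 0) + 1
        matches[month] = matches.get(month, 0) + (sid in hits)
    return {m: matches[m] / totals[m] for m in sorted(totals)}

con = build_db({
    "s1": ("2021-01-05", ["S:E484K", "S:N501Y"]),
    "s2": ("2021-01-20", ["S:N501Y"]),
    "s3": ("2021-02-03", ["S:K417N", "S:E484K", "S:N501Y"]),
})
print(samples_with_all(con, ["S:E484K", "S:N501Y"]))  # ['s1', 's3']
print(monthly_share(con, ["S:E484K", "S:N501Y"]))     # {'2021-01': 0.5, '2021-02': 1.0}
```

Indexing the mutation table by label keeps pattern queries fast as the number of stored genomes grows; a production system would add further metadata columns in the same way.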
A striking aspect of variants of concern (VOCs) is that many of them carry an unusually large number of defining mutations. Current phylogenetic estimates of the substitution rate of SARS-CoV-2 suggest that its genome accrues around 2 mutations per month. However, VOCs can have 15 or more defining mutations, and they are hypothesised to have emerged over the course of a few months, implying that they must have evolved faster for a period of time.
In this talk I will present detailed molecular clock analyses of genome sequence data from the GISAID database to assess whether the emergence of VOCs can be attributed to changes in the substitution rate of the virus.
Our results indicate that the emergence of VOCs is driven by an episodic increase in the substitution rate of around 4-fold relative to the background phylogenetic rate estimate, which may have lasted several weeks or months. This outcome stands in contrast with the notion that the virus has overall increased its mutation rate. In sum, this study underscores the importance of monitoring the molecular evolution of the virus as a means of understanding the circumstances under which VOCs may emerge.
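The arithmetic behind the argument can be checked with a back-of-the-envelope sketch. It assumes that the quoted background rate of about 2 mutations per month applies to a single lineage and that mutation counts in a time window follow a Poisson distribution; both are simplifying assumptions for illustration and not the phylogenetic molecular clock analyses described in the talk.

```python
import math

def months_needed(n_mutations, rate_per_month, fold=1.0):
    """Expected months to accrue n_mutations at a constant (optionally elevated) rate."""
    return n_mutations / (rate_per_month * fold)

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term            # add P(X = i)
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return 1.0 - cdf

RATE = 2.0  # mutations per month (background estimate quoted above)
print(months_needed(15, RATE))          # 7.5
print(months_needed(15, RATE, fold=4))  # 1.875
# Chance of >= 15 mutations within 3 months at the background vs. a 4-fold rate:
p_background = poisson_tail(15, RATE * 3)
p_episodic = poisson_tail(15, RATE * 4 * 3)
```

Under these assumptions, 15 mutations take about 7.5 months at the background rate but under 2 months at a 4-fold elevated rate, and accruing them within 3 months is very unlikely at the background rate but likely at the elevated one.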
Given the extent of convergent evolution observed in SARS-CoV-2, a system can be devised that generalizes from previous examples to identify and rank potentially concerning samples based on their amino acid (AA) profiles. VOCAL, the Variant Of Concern ALert system, starts from full genome sequences and categorizes the AA changes in the spike protein according to the type of mutation and its overlap with known antibody binding sites, epitope regions and sites under positive selection. VOCAL then detects potentially concerning samples and ranks them into three alert tiers.
We evaluated VOCAL retrospectively on all German sequences in two scenarios of emerging VOCs in 2021: the Delta variant (April) and the more recent Omicron variant (December). All Delta samples and nearly all Omicron samples were correctly detected as high concern (Delta: 30/30 (100%); Omicron: 3372/3446 (97.9%)). For Delta, an additional set of 21 samples was detected, mainly assigned to lineages B.1.617.1 and B.1.617.3 (which have also been reported as concerning). In summary, VOCAL is a specialized tool for the early detection of potentially concerning variants in large collections of SARS-CoV-2 genomes.
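The categorize-then-rank idea can be sketched as follows. The residue sets, the scoring rule (one point per overlapping annotation) and the tier thresholds are all hypothetical placeholders chosen for illustration; they are not VOCAL's curated annotations or its actual decision rules.

```python
import re

# Illustrative annotation sets (Spike residue positions). These are
# placeholders, not VOCAL's curated antibody-binding, epitope or selection data.
ANTIBODY_SITES = {417, 484, 501}
EPITOPE_REGIONS = [(438, 506)]
POSITIVE_SELECTION = {452, 478, 681}

def annotate(mutation):
    """Return the annotation categories overlapped by a Spike substitution like 'E484K'."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z*])", mutation)
    if match is None:
        raise ValueError(f"unsupported mutation format: {mutation}")
    pos = int(match.group(2))
    tags = set()
    if pos in ANTIBODY_SITES:
        tags.add("antibody_site")
    if any(lo <= pos <= hi for lo, hi in EPITOPE_REGIONS):
        tags.add("epitope")
    if pos in POSITIVE_SELECTION:
        tags.add("positive_selection")
    return tags

def score(mutations):
    """Total number of annotation hits across a sample's Spike mutations."""
    return sum(len(annotate(m)) for m in mutations)

def alert_level(mutations, high=4, medium=2):
    """Map a sample's score to one of three alert tiers (thresholds are hypothetical)."""
    s = score(mutations)
    if s >= high:
        return "high"
    if s >= medium:
        return "medium"
    return "low"

def rank(samples):
    """Sort sample IDs by descending score; samples maps ID -> list of mutations."""
    return sorted(samples, key=lambda sid: score(samples[sid]), reverse=True)

samples = {"a": ["D614G"], "b": ["L452R", "D614G"], "c": ["K417N", "E484K", "N501Y"]}
print(rank(samples))                                    # ['c', 'b', 'a']
print([alert_level(samples[s]) for s in rank(samples)])  # ['high', 'medium', 'low']
```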
Afternoon Session
Here, we present impuSARS, our recently designed tool for imputing SARS-CoV-2 genomes using a reference panel of a very large number of SARS-CoV-2 sequences (over 230k). impuSARS has been developed to be freely distributed, packaged either as a Docker container or as a conda environment in Python. Consequently, the application can be integrated into any primary data-processing pipeline for SARS-CoV-2 whole-genome sequencing. The tool can also be adapted to impute any other viral genome by building a custom reference panel.
impuSARS has been validated under several simulated scenarios of missing regions (continuous fragments, whole amplicons or sparse single positions) as well as on real sequencing samples with low-coverage genomes. Results showed high accuracy in predicting the original sequences, recovering lineages with 100% precision for almost all lineages, even for poorly covered genomes (less than 20% coverage). Hence, impuSARS has been shown to accurately recover many incomplete or low-quality sequences that would otherwise be discarded.
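The validation idea (mask positions, impute them from a reference panel, score the recovered bases) can be illustrated with a toy example. The sketch below uses a simple nearest-reference fill on aligned sequences of equal length; this is a swapped-in technique for illustration only and is not impuSARS's imputation algorithm. All names and sequences are hypothetical.

```python
def mismatches(query, ref):
    """Differences between two aligned sequences at positions observed in the query."""
    return sum(q != r for q, r in zip(query, ref) if q != "N")

def impute(query, panel):
    """Fill 'N' positions of an aligned query from its closest panel sequence."""
    best = min(panel, key=lambda ref: mismatches(query, ref))
    return "".join(r if q == "N" else q for q, r in zip(query, best))

def masked_accuracy(truth, imputed, query):
    """Fraction of masked ('N') positions whose imputed base matches the truth."""
    masked = [i for i, q in enumerate(query) if q == "N"]
    return sum(truth[i] == imputed[i] for i in masked) / len(masked)

panel = ["ACGTACGTAC", "ACGTTCGTAA", "TCGTACGAAC"]  # toy aligned reference panel
truth = "ACGTTCGTAA"
query = "ACNNTCNNAA"                                  # truth with masked positions
filled = impute(query, panel)
print(filled)                                         # ACGTTCGTAA
print(masked_accuracy(truth, filled, query))          # 1.0
```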
Here, we present the results of a national-scale SARS-CoV-2 wastewater-based epidemiology (WBE) program in Austria, which aggregated 2,093 samples collected from December 2020 to September 2021 at 95 sewage treatment plants covering over 57% of the total population. Using our Variant Quantification in Sewage pipeline designed for Robustness (vaquero), we reproducibly deduced SARS-CoV-2 variant abundances from complex wastewater samples. This was validated by the epidemiological integration of over 130,000 individual variant-genotyped cases from the respective catchment areas. Compared with conventional epidemiological surveillance data, our WBE analyses accurately recapitulated the emergence of the dominant Alpha and Delta variants across the country and delineated large regional clusters of other variants of concern. Finally, we provide a framework to infer variant-specific reproduction numbers from wastewater and to predict emerging variants de novo.
Our study demonstrates the power of national-scale WBE. Such non-invasive surveillance programs are likely to play increasingly important roles in tracking the dynamics of new SARS-CoV-2 variants and may prove particularly useful for pandemic management in countries without dense individual surveillance programs.
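A common way to frame variant quantification from wastewater is as a non-negative least-squares problem: observed allele frequencies of signature mutations are modelled as a mixture of variant signatures. The sketch below solves this with plain cyclic coordinate descent; it is a generic illustration under that assumption, not necessarily the method implemented in vaquero, and the signature matrix and frequencies are made-up toy values.

```python
def deconvolve(signatures, freqs, n_iter=500):
    """Estimate non-negative variant abundances x minimising ||A x - b||^2
    by cyclic coordinate descent (a simple NNLS solver).

    signatures: one row per mutation, one 0/1 entry per variant
                (1 = the variant carries the mutation)
    freqs:      observed allele frequency of each mutation in the sample
    """
    n_var = len(signatures[0])
    x = [0.0] * n_var
    col_sq = [sum(row[j] ** 2 for row in signatures) for j in range(n_var)]
    for _ in range(n_iter):
        for j in range(n_var):
            if col_sq[j] == 0:
                continue  # variant has no signature mutation in the panel
            # Gradient step along coordinate j using the current residual b - A x.
            grad = sum(row[j] * (b - sum(a * xi for a, xi in zip(row, x)))
                       for row, b in zip(signatures, freqs))
            x[j] = max(0.0, x[j] + grad / col_sq[j])  # project onto x_j >= 0
    return x

# Columns: [variant A, variant B]; rows: four hypothetical signature mutations.
SIGNATURES = [[1, 0], [0, 1], [1, 1], [0, 1]]
FREQS = [0.3, 0.6, 0.9, 0.6]
print([round(v, 3) for v in deconvolve(SIGNATURES, FREQS)])  # [0.3, 0.6]
```

Coordinate descent is used here only to keep the example dependency-free; a library NNLS solver would be the usual choice for real signature panels.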
To quickly identify putative outbreaks and transmission clusters, we developed BREAKFAST, a tool for rapid sequence clustering in the specific context of SARS-CoV-2, and applied it to German and international sequences. Our approach, which derives transmission clusters from SNP occurrences, is motivated by the low mutation rate of SARS-CoV-2. Pairwise genetic distances between sequences are computed from a sparse matrix of alignment-based genomic profiles.
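The core idea of clustering from SNP occurrences can be sketched with mutation profiles stored as sets: the distance between two sequences is the number of mutations present in only one of them, and sequences within a distance threshold are linked (single linkage) using union-find. This naive all-pairs version is an illustrative assumption; BREAKFAST itself uses a sparse-matrix formulation to scale to large datasets, and the function names here are hypothetical.

```python
from itertools import combinations

def snp_distance(a, b):
    """Number of mutations present in exactly one of two profiles."""
    return len(a ^ b)

def cluster(profiles, max_dist=1):
    """Single-linkage clustering: link two sequences if their SNP distance
    is <= max_dist, then return the connected components (union-find)."""
    ids = list(profiles)
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in combinations(ids, 2):
        if snp_distance(profiles[a], profiles[b]) <= max_dist:
            parent[find(a)] = find(b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(c) for c in clusters.values())

profiles = {
    "a": {"C241T", "A23403G"},
    "b": {"C241T", "A23403G", "G28881A"},
    "c": {"C241T", "A23403G", "G28881A", "C3037T"},
    "d": {"T1000C", "G2000A", "C3000T"},
}
print(cluster(profiles, max_dist=1))  # [['a', 'b', 'c'], ['d']]
```

Note that single linkage chains sequences: "a" and "c" are two SNPs apart but end up in one cluster through "b".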
Using pre-computed mutation profiles, we clustered 114,042 sequences in 1.5 minutes using 80 cores and a peak of 1.45 GB of RAM. Its efficiency and intuitive parameters make BREAKFAST suitable for monitoring fast-growing clusters and analyzing potential outbreaks on a daily basis. Computationally intensive phylogenetic tools can then be applied to a smaller set of sequences of interest selected from the clustering results. To verify the performance of our method, we compared BREAKFAST with recently developed tools for clustering DNA sequences on different viral datasets.
These results demonstrate that targeted methods, which leverage a pathogen’s specific properties, can be used in conjunction with large datasets to provide key insights into the ongoing COVID-19 pandemic. Our approach was applied to assign individuals to already known outbreaks and to trigger follow-up epidemiological investigations of transmission clusters.
BREAKFAST is freely available on GitHub.
Thursday, 24th March
Session 1: Viral emergence and surveillance
As the pandemic progresses, heterogeneity in immune history, through infections, vaccinations, and boosters, also means increasing heterogeneity in how ‘concerning’ a VoC may be: the impact of Omicron varied widely across countries. In turn, future variants on the ‘road to endemicity’ may pose different risks to different populations.
Though it’s impossible to predict what future variants may mean for how much SARS-CoV-2 continues to impact society, the return of pathogens that were suppressed during the restrictions of 2020 and early 2021 is a reminder of the common disparity in data and understanding between SARS-CoV-2 and the world of viruses we live in. How do we pivot our real-time test of the role that sequencing, modelling, and immunity panels can play in public health to a sustainable, real-life integration of research and healthcare for a better understanding of human viruses overall?
Session 2: Virus-host interactions
A key virulence factor of phleboviruses is the non-structural protein NSs, an inhibitor of the antiviral type I interferon (IFN) system. Our group has identified the mechanisms by which the NSs proteins of both Rift Valley fever virus (RVFV) and sandfly fever Sicilian virus (SFSV) (i) inhibit the transactivation of the IFN genes and (ii) abrogate the antiviral protein kinase R (PKR) pathway. For RVFV, NSs was found to recruit several F-box-type E3 ubiquitin ligases in order to destroy both the general host cell transcription factor TFIIH and PKR, an antiviral inhibitor of mRNA translation. For SFSV, by contrast, NSs occludes the DNA-binding domain of the IFN transcription factor IRF-3 to inhibit IFN induction, and it also binds and reprograms the translation initiation factor eIF2B to immunize the ribosomal machinery against PKR signaling.
Thus, our investigations have shown two surprisingly different IFN escape strategies by these related phleboviruses. While the highly virulent RVFV destroys key host factors of innate immunity, the more benign SFSV only sequesters them.
Session 3: Viral sequence analysis
Following an introduction to the origins of the theoretical and experimental quasispecies concepts, the presentation will describe results with hepatitis C virus (HCV) on how viral fitness can influence resistance to antiviral agents, and on new procedures to visualize HCV diversification based on deep sequencing data. Also, current views on differences between mutant spectra of SARS-CoV-2 and those of other RNA viruses will be presented. Finally, prospects for future developments in the quasispecies field will be outlined.
NCBI and many other general databases do not reliably check whether uploaded data are correct. Most new entries are compared by sequence similarity to existing ones, so mistakes in the databases can cascade. As a result, large-scale, downstream, and evolutionary analyses are hardly possible, and even with considerable effort and time, true and false entries cannot always be distinguished. Sound scientific research using these public virus genome databases is further complicated when metadata or sequences are only partially correct, a problem that becomes more pressing if one extrapolates the growth of viral data [Paez-Espino et al., 2016; Roux et al., 2018].
To prevent false-positive sequences from entering the databases, we propose a guideline for uploading sequences. Here, we present the most common mistakes made in NCBI and other databases and outline the main steps that should be followed when uploading virus sequences. We further provide examples of how database entries that do not follow these steps can lead to false conclusions and even jeopardize entire studies.
We propose the use of alignments and quality checks (ideally performed by the database) to predict whether the entire sequence is correct. Such alignments should be built with other known viruses of the same taxon. Additionally, we address legal issues related to virus databases. We envision a future database with an easy-to-use interface, quality checks, a private workspace, and tools for assembly, alignment, and phylogenetic analysis that follow established SOPs in the field.
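A pre-upload quality check of the kind proposed above could flag a sequence before it enters a database. The sketch below is a lightweight stand-in that assumes two hypothetical checks: the fraction of ambiguous bases, and the best k-mer containment against reference genomes of the claimed taxon. It deliberately replaces full alignment with k-mer overlap for brevity; thresholds, names and sequences are illustrative assumptions, not part of the proposed guideline.

```python
def kmers(seq, k=8):
    """Set of all k-mers in an (upper-cased) sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def check_submission(seq, references, k=8, min_containment=0.5, max_ambiguous=0.05):
    """Return warnings for a submitted sequence (an empty list means it passed).

    references: at least one reference genome of the taxon the submitter claims.
    """
    warnings = []
    seq = seq.upper()
    ambiguous = sum(ch not in "ACGT" for ch in seq) / len(seq)
    if ambiguous > max_ambiguous:
        warnings.append(f"{ambiguous:.1%} ambiguous bases")
    query = kmers(seq, k)
    if query:
        # Share of the query's k-mers found in the most similar reference.
        best = max(len(query & kmers(ref, k)) / len(query) for ref in references)
    else:
        best = 0.0  # sequence shorter than k
    if best < min_containment:
        warnings.append(f"best k-mer containment {best:.2f} vs. claimed taxon")
    return warnings

ref = "ATGGCGTACGTTAGCCGATAGGCTTACGATCG"   # toy reference genome
print(check_submission(ref[2:30], [ref]))  # []
print(check_submission("T" * 24, [ref]))   # ['best k-mer containment 0.00 vs. claimed taxon']
```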
Here, we propose a fast and scalable method to cluster all influenza A virus (IAV) sequences using high-dimensional k-mer vectors and a novel clustering approach that combines hierarchical and density-based methods. Overall, we clustered over 400,000 different IAV sequences without computationally expensive all-versus-all pairwise comparisons. For segments 4 and 6 (encoding HA and NA), our genotype-based classification agrees closely with the serological classification established by the community, while offering even higher resolution. We further extend our classification to all other IAV segments.
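To show why k-mer vectors help avoid all-versus-all alignments, the sketch below turns each sequence into a fixed-length, normalised k-mer frequency vector and then assigns sequences greedily to clusters by cosine similarity to cluster representatives (leader clustering). Leader clustering is a simple swapped-in technique for illustration; it is not the hierarchical plus density-based method described in the abstract, and the threshold and names are assumptions.

```python
import math
from itertools import product

def kmer_vector(seq, k=3):
    """Normalised k-mer frequency vector with a fixed 4**k dimension."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    vec = [0.0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing ambiguous bases
            vec[j] += 1
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def leader_cluster(seqs, min_cos=0.9, k=3):
    """Assign each sequence to the first cluster whose leader has cosine
    similarity >= min_cos; otherwise the sequence founds a new cluster.
    Each sequence is compared with cluster leaders only, not with all others."""
    leaders, labels = [], []
    for s in seqs:
        v = kmer_vector(s, k)
        for c, lv in enumerate(leaders):
            if sum(a * b for a, b in zip(v, lv)) >= min_cos:
                labels.append(c)
                break
        else:
            leaders.append(v)
            labels.append(len(leaders) - 1)
    return labels

seqs = ["ACGTACGTACGTACGT", "ACGTACGTACGTACGA", "TTTTTTTTGGGGGGGG"]
print(leader_cluster(seqs))  # [0, 0, 1]
```

Unlike the density-based approach in the abstract, this greedy scheme depends on input order, which is one reason more robust clustering methods are used in practice.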
For each IAV strain in the data set, we connect the clusters into which its segments have been grouped. Highly connected clusters therefore describe sequence combinations that are frequently observed in viable viruses and are potential reassortment candidates. By considering only these reassortment candidates, we considerably reduce the complexity of possible segment-to-segment interactions. Extensive analysis of the resulting interaction patterns might then lead to novel insights into the reassortment process.
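The connection step can be sketched as a co-occurrence count: every strain links the clusters of its segments pairwise, and frequently observed links are kept as candidates. The data layout, the cluster labels and the `min_count` cut-off below are hypothetical choices for illustration, not the authors' implementation.

```python
from collections import Counter
from itertools import combinations

def cluster_links(strains):
    """Count how often pairs of segment clusters co-occur in the same strain.

    strains: dict strain ID -> dict segment number -> cluster label
    Returns a Counter keyed by ((segment_a, cluster_a), (segment_b, cluster_b)).
    """
    links = Counter()
    for segments in strains.values():
        nodes = sorted(segments.items())  # deterministic (segment, cluster) order
        for a, b in combinations(nodes, 2):
            links[(a, b)] += 1
    return links

def frequent_links(links, min_count):
    """Keep only cluster pairs observed in at least min_count strains."""
    return {pair: n for pair, n in links.items() if n >= min_count}

strains = {
    "s1": {4: "c1", 6: "c7"},
    "s2": {4: "c1", 6: "c7"},
    "s3": {4: "c2", 6: "c7"},
}
links = cluster_links(strains)
print(frequent_links(links, min_count=2))  # {((4, 'c1'), (6, 'c7')): 2}
```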
Friday, 25th March
Session 4: Virus identification and annotation
Here, we introduce VirSorter2, a DNA and RNA virus identification tool that leverages genome-informed database advances across a collection of customized automatic classifiers to improve the accuracy and range of virus sequence detection. When benchmarked against genomes from both isolated and uncultivated viruses, VirSorter2 was the only tool to perform consistently with high accuracy (F1-score above 0.8) across viral diversity, while all other tools under-detected viruses outside the group most represented in reference databases (i.e., the order Caudovirales). Among the tools evaluated, VirSorter2 was also the only one able to minimize errors associated with atypical cellular sequences, including eukaryotic genomes and plasmids. Finally, as exploration of the virosphere uncovers novel viral sequences, VirSorter2’s modular design allows it to expand to new types of viruses through the design of new classifiers, maintaining maximal sensitivity and specificity.
With multi-classifier and modular design, VirSorter2 demonstrates higher overall accuracy across major viral groups and will advance our knowledge of virus evolution, diversity, and virus-microbe interaction in various ecosystems. Source code of VirSorter2 is freely available, and VirSorter2 is also available both on bioconda and as an iVirus app on CyVerse. To best serve the research community, we maintain a “live protocol” (dx.doi.org/10.17504/protocols.io.bwm5pc86) for using VirSorter2 for virus sequence identification, including curating less well-studied viruses and mobile genetic elements, and establishing bona fide virus-encoded auxiliary metabolic genes.
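The benefit of a modular, multi-classifier design can be illustrated with a toy registry in which each viral group has its own classifier and a new group is supported simply by registering another one. Everything below (class name, feature keys, scores, threshold, and taking the highest group score as the call) is a hypothetical sketch of the design idea, not VirSorter2's code or scoring.

```python
from typing import Callable, Dict, Optional, Tuple

Features = Dict[str, float]
Classifier = Callable[[Features], float]  # returns a score in [0, 1]

class ModularDetector:
    """Hypothetical multi-classifier detector with one classifier per viral group."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold
        self.classifiers: Dict[str, Classifier] = {}

    def register(self, group: str, classifier: Classifier) -> None:
        """Add a classifier for a new viral group; existing ones are untouched."""
        self.classifiers[group] = classifier

    def classify(self, features: Features) -> Tuple[Optional[str], Dict[str, float]]:
        """Score with every classifier; call the best group if it passes the threshold."""
        scores = {group: clf(features) for group, clf in self.classifiers.items()}
        best = max(scores, key=scores.get)
        return (best if scores[best] >= self.threshold else None), scores

# Toy scorers keyed on hallmark-gene presence (illustrative only).
detector = ModularDetector()
detector.register("dsDNAphage", lambda f: 0.9 if f["terminase"] else 0.1)
contig = {"terminase": 0, "rdrp": 1}
print(detector.classify(contig)[0])  # None: no registered group fits
detector.register("RNA", lambda f: 0.9 if f["rdrp"] else 0.1)
print(detector.classify(contig)[0])  # RNA
```

Adding the RNA classifier changed the call for the RNA-like contig without touching the existing phage classifier, which is the property the modular design aims for.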
Session 5: Phages
Here, I will discuss the recent advances in taxonomy and their implications for microbiome studies and show that ultimately, taxonomy is the language that allows us to understand each other’s research.