This is a collection of useful tools in Virus Bioinformatics. Please note that EVBC does not maintain these tools.

Tools by EVBC members are marked ★.

Don’t hesitate to contact us if you want a tool to be added. We are also happy to receive feedback on the tools!

★ eggNOG 5.0
eggNOG is a public database of orthology relationships, gene evolutionary histories and functional annotations, including 2502 viral proteomes. eggNOG offers online services for fast functional annotation and orthology prediction of custom genomics or metagenomics datasets.

EpiFlu™ is the world’s most complete collection of genetic sequence data of influenza viruses and related clinical and epidemiological data. EpiFlu™ is tailored to the needs of influenza researchers from both the human and the veterinary fields. The data is publicly accessible but not Public Domain (GISAID does not remove nor waive any preexisting rights).

Hepatitis C Virus (HCV) Database Project
The HCV database group strives to present HCV-associated, hand-annotated genetic data in a user-friendly way by providing access to the central database via web-accessible search interfaces and supplying a number of analysis tools.

ICTV Taxonomy
The International Committee on Taxonomy of Viruses (ICTV) authorises and organises the taxonomic classification of and the nomenclature for viruses. The ICTV has developed a universal taxonomic scheme for viruses.

★ ViPR | Virus Pathogen Resource
The ViPR database integrates various types of data for multiple virus families. You can search the comprehensive database for sequences and strains, immune epitopes, 3D protein structures, host factor data, antiviral drugs, and plasmid data. You can also analyze the data online using sequence alignment, phylogenetic tree reconstruction, sequence variation (SNP) analysis, metadata-driven comparative analysis, and BLAST.

★ ViPR Hepatitis C Virus Database
The hepatitis C virus (HCV) portal of ViPR facilitates basic research and development of diagnostics and therapeutics for HCV, by providing a comprehensive collection of HCV-related data integrated from various sources, a growing suite of analysis and visualization tools for data mining and hypothesis generation, and personal Workbench spaces for data storage and sharing.

★ ViralZone
ViralZone is a web resource from the Swiss Institute of Bioinformatics covering all viral genera and families, providing general molecular and epidemiological information along with virion and genome figures. Each virus or family page gives easy access to UniProtKB/Swiss-Prot viral protein entries. The ViralZone project is handled by the virus program of the Swiss-Prot group.

★ Viruses.STRING | Virus-Host Protein-Protein Interaction Database
Viruses.STRING is a protein–protein interaction database specifically catering to virus–virus and virus–host interactions. This database combines evidence from experimental and text-mining channels to provide combined probabilities for interactions between viral and host proteins. The database contains 177,425 interactions between 239 viruses and 319 hosts. The database is publicly available and the interaction data can also be accessed through the latest version of the Cytoscape STRING app.

Virus Variation Resource (NCBI)
Virus Variation Resource (VVR) provides web retrieval interfaces and analysis and visualization tools for virus sequence datasets.

De novo assembly
AV454 | Assemble Viral 454
AV454 is a de novo consensus assembler designed for small and non-repetitive genomes sequenced at high depth. It was specifically designed to assemble read data generated from a mixed population of viral genomes. Reads need not be paired, and it is assumed that no sequence repeat in the genome would be large enough to fully contain an average read.

★ Genome Detective
Genome Detective is an easy-to-use web-based software application that assembles the genomes of viruses quickly and accurately. The application uses a novel alignment method that constructs genomes by reference-based linking of de novo contigs, combining amino acid and nucleotide scores.

SPAdes is a tool for assembling genomes and mini-metagenomes from highly chimeric reads.

V-FAT | Viral Finishing and Annotation Toolkit
V-FAT is a tool to perform automated computational finishing and annotation of de novo viral assemblies. V-FAT uses reference and read data to order and merge contigs, correct frameshifts, and produce NCBI-ready annotation files. It also performs a set of quality assurance measurements including coverage computation by gene or amplicon and identification of potential consensus errors.

VICUNA is a de novo assembly tool targeting highly diverse viral populations. It creates a single linear representation of the mixed population on which intra-host variants can be mapped. After initial assembly, it can also use existing genomes to perform guided merging of contigs. VICUNA efficiently handles ultra-deep sequence data with coverage in the tens of thousands.

★ VrAP | Viral Assembly Pipeline
VrAP is based on the genome assembler SPAdes combined with an additional read correction and several filter steps. The pipeline classifies the contigs to distinguish host from viral sequences. VrAP can identify viruses without any sequence homology to known references.

Sequencing and annotation
★ AUGUSTUS | Multi-Genome Annotation

The comparative gene prediction algorithm of AUGUSTUS performs a multi-genome annotation to increase the accuracy and consistency of the predicted exon-intron structures of the protein-coding genes by simultaneously predicting the genes in all input genomes.

★ Base-By-Base | Comparative Tools for Large Virus Genomes

Base-By-Base is a comprehensive tool for the creation and editing of multiple sequence alignments. It can be used with gene and protein sequences as well as with large viral genomes, which themselves can contain gene annotations. New features: (1) “consensus-degenerate hybrid oligonucleotide primers” (CODEHOP), a popular tool for the design of degenerate primers from a multiple sequence alignment of proteins; and (2) the ability to perform fuzzy searches within the columns of sequence data in multiple sequence alignments to determine the distribution of sequence variants among the sequences.

★ DIGS | Database-Integrated Genome Screening
Exploring genomes heuristically using sequence similarity search tools and a relational database.

Systematic screening of genomic ‘dark matter’ to recover useful biological information using sequence similarity search tools and a relational database. DIGS can be used to systematically search for sequences of interest, and to support investigations of their distribution, diversity and evolution. One example is the screening for endogenous viral elements (EVEs) in mammalian genomes.

GLUE is a data-centric bioinformatics environment for virus sequence data, with a focus on variation, evolution and sequence interpretation. It is a protocol for working with multiple sequence alignments (MSAs) and for generating standardized reports using MSAs and data. GLUE also provides tools for managing MSAs and data, and can be used in combination with the MySQL relational database management system (RDBMS) to create boutique sequence databases.

★ PoSeiDon | Positive Selection Detection and recombination analysis of protein-coding genes
Attention: Currently the website hosting PoSeiDon is under maintenance. The web service will be back soon with full functionality.
PoSeiDon is a pipeline to detect significant positively selected sites and possible recombination events in an alignment of multiple coding sequences. Sites that undergo positive selection can give you insights into the evolutionary history of your sequences, for example by revealing important mutation hot spots that accumulated as a result of virus-host arms races during evolution.

PriSM is a set of algorithms designed specifically to create degenerate primers for the amplification and sequencing of short viral genomes while maintaining sample population diversity. PriSM allows for rapid in silico optimization of primers for downstream applications such as sequencing.
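
To illustrate what a degenerate primer is, the following minimal Python sketch collapses a small gap-free alignment into IUPAC ambiguity codes and counts how many plain sequences the result stands for. It is not PriSM's algorithm; the function names `degenerate_consensus` and `degeneracy` are hypothetical, and the sketch assumes equal-length sequences containing only A, C, G and T.

```python
# Illustrative only: NOT PriSM's method. Names are hypothetical.
# Assumes a gap-free alignment containing only A, C, G and T.

IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def degenerate_consensus(aligned):
    """Return one IUPAC code per column covering every base seen there."""
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    columns = zip(*(seq.upper() for seq in aligned))
    return "".join(IUPAC[frozenset(col)] for col in columns)


def degeneracy(primer):
    """Number of distinct plain sequences a degenerate primer represents."""
    sizes = {code: len(bases) for bases, code in IUPAC.items()}
    total = 1
    for code in primer:
        total *= sizes[code]
    return total


if __name__ == "__main__":
    primer = degenerate_consensus(["ACGT", "ATGT", "ACGA"])
    print(primer, degeneracy(primer))
```

A lower degeneracy means fewer primer variants in the synthesized pool, which is one reason primer design tools weigh coverage of the population against degeneracy.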

Tanoti is a BLAST-guided, reference-based short-read aligner. It was developed to maximise alignment in highly variable next-generation sequencing data sets (Illumina).

★ VIpower | Estimating power of Viral Integration
VIpower is a simulation-based tool for estimating the power of viral integration detection via high-throughput sequencing. It was designed to model the process of detecting viral integrations in the human genome.

ViraMiner | Identifying viral genomes in human samples
ViraMiner is a deep learning-based method to identify viruses in various human biospecimens. ViraMiner contains two branches of Convolutional Neural Networks designed to detect both patterns and pattern-frequencies on raw metagenomics contigs. ViraMiner is the first machine learning methodology that can detect the presence of viral sequences among raw metagenomic contigs from diverse human samples.

★ VIRULIGN | Fast codon-correct alignment and annotation of viral genomes
VIRULIGN is built for fast codon-correct alignments of large datasets, with standardized and formalized genome annotation and various alignment export formats.

Secondary structure prediction
★ LRIscan | Long Range Interaction scan
LRIscan is a tool for the prediction of long-range interactions in full viral genomes based on a multiple genome alignment. LRIscan is able to find interactions spanning thousands of nucleotides.

★ RNAalifold
RNAalifold is a tool for calculating secondary structures for a set of aligned RNAs. It is part of the Vienna RNA Package.

★ SilentMutations (SIM) | Analyzing long-range RNA–RNA interactions in viral genomes
SilentMutations (SIM) is an easy-to-use tool to analyze the effect of multiple point mutations on the secondary structures of two interacting viral RNAs. The tool can simulate disruptive and compensatory mutants of two interacting single-stranded RNAs. This allows a fast and accurate assessment of key regions potentially involved in functional long-range RNA–RNA interactions and can help virologists and RNA experts to design appropriate experiments.

Virus genotyping and diagnosis

Find a list of virus genotyping tools here.

★ ArboTyping | Identification of Dengue, Zika and Chikungunya virus species and genotypes
ArboTyping is an easy-to-use software to classify virus sequences with respect to their species and sub-species (i.e. serotype and/or genotype). The method was validated on a large dataset assessing the classification performance with respect to whole-genome sequences and partial-genome sequences. It allows the high-throughput classification of these virus species and genotypes in seconds.

ATHLATES | Accurate Typing of Human Leukocyte Antigen Through Exome Sequencing
ATHLATES is software for determining the human leukocyte antigen genotypes of individuals from Illumina exome sequencing data.

★ DisCVR | Rapid viral diagnosis from HTS data
DisCVR is a diagnostic tool for detecting known human viruses in clinical samples from high-throughput sequencing (HTS) data and for validating the results interactively on computers with limited resources.

★ ERVcaller | Identifying polymorphic endogenous retrovirus and other transposable element insertions
ERVcaller is a tool to detect and genotype transposable element insertions, including ERVs, in the human genome.

★ geno2pheno | Genotypic interpretation system for identifying viral drug resistance using NGS data

geno2pheno[ngs-freq] is a web service for rapidly identifying drug resistance in HIV-1 and HCV samples by relying on frequency files that provide the read counts of nucleotides or codons along a viral genome. geno2pheno[ngs-freq] can assist clinical decision making by enabling users to explore resistance in viral populations with different abundances.
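
As a rough illustration of what per-position read counts look like, the following Python sketch tallies nucleotides along a reference from already-aligned reads and derives the frequency of one variant. It does not reproduce the geno2pheno[ngs-freq] input format; the function names and the `(start, sequence)` read representation are assumptions made for this example, and insertions and deletions are ignored.

```python
# Illustrative only: not the geno2pheno[ngs-freq] file format.
# Reads are assumed to be (0-based start, sequence) pairs without indels.
from collections import Counter


def nucleotide_counts(reads, genome_length):
    """Return one Counter of base counts per reference position."""
    counts = [Counter() for _ in range(genome_length)]
    for start, seq in reads:
        for offset, base in enumerate(seq.upper()):
            pos = start + offset
            if 0 <= pos < genome_length:  # ignore bases hanging off the ends
                counts[pos][base] += 1
    return counts


def variant_frequency(counts, pos, base):
    """Fraction of reads covering `pos` that carry `base` (0.0 if uncovered)."""
    depth = sum(counts[pos].values())
    return counts[pos][base] / depth if depth else 0.0


if __name__ == "__main__":
    reads = [(0, "ACG"), (1, "CGT"), (1, "TGT")]
    counts = nucleotide_counts(reads, genome_length=4)
    for pos, column in enumerate(counts):
        print(pos, dict(column))
    print(variant_frequency(counts, 1, "T"))
```

Working from counts rather than a single consensus sequence is what makes it possible to ask whether a resistance-associated variant is present in, say, 2% or 20% of the viral population.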

★ Purple | Computational Workflow for Strategic Selection of Peptides for Viral Diagnostics Using MS-Based Targeted Proteomics

To detect viral pathogens in time-critical scenarios, accurate and fast diagnostic assays are required. Such assays can now be established using mass spectrometry-based targeted proteomics, by which viral proteins can be rapidly detected from complex samples down to the strain-level with high sensitivity and reproducibility. Purple is a software tool for selecting target-specific peptide candidates directly from given proteome sequence data. It comes with an intuitive graphical user interface, various parameter options and a threshold-based filtering strategy for homologous sequences. Purple enables peptide candidate selection across various taxonomic levels and filtering against backgrounds of varying complexity.

★ RotaC2.0
RotaC2.0 is a web-based tool that can be used for fast rotavirus genotype differentiation of all 11 group A rotavirus gene segments according to the new guidelines proposed by the Rotavirus Classification Working Group (RCWG).

★ ViPR | Rotavirus A Genotype Determination
Rotavirus A Genotype Determination from ViPR is an annotation pipeline for genotyping Rotavirus A viruses. It is an optimized reimplementation of RotaC2.0.

★ V-Pipe | Mining viral genomes and improving clinical diagnostics
V-Pipe is an end-to-end pipeline tool to mine viral genomes and improve clinical diagnostics. V-Pipe integrates various computational tools for the analysis of viral high-throughput sequencing data. It supports the reproducible analysis of genomic diversity in intra-host virus populations, which is often involved in viral pathogenesis and virulence. V-pipe takes as input read data obtained from a viral sequencing experiment and produces, in a single execution of the pipeline, various output files covering quality control, read alignment, and inference of viral genomic diversity on the level of both single-nucleotide variants and viral haplotypes.

Phylogenetic and phylodynamic inference
★ AdaPatch
AdaPatch searches for dense and spatially distinct clusters of sites under positive selection on the surface of proteins. It is intended for protein structures of viruses whose adaptive behavior is not yet known, where it could identify further candidate regions that are important for host-virus interaction. The tool is based on a graph-cut algorithm and combines sites with large dN/dS values into patches.

★ AntiPatch
AntiPatch is software for inferring antigenicity-altering patches of sites on a protein structure.

★ Antigenic Tree Inference
The AntigenicTree program infers a phylogenetic tree based on virus sequences and assigns antigenic distances to reconstructed amino acid changes on internal branches. For sufficiently resolved branches, this allows the antigenic impact of single amino acid changes to be quantified. The software is also applicable to other problems where a phylogenetic tree and pair-wise phenotypic distances are available.

★ BEAST 1.10 | Bayesian phylogenetic and phylodynamic data integration

Bayesian Evolutionary Analysis by Sampling Trees (BEAST) is a primary tool for Bayesian phylogenetic and phylodynamic inference from genetic sequence data. BEAST unifies molecular phylogenetic reconstruction with complex discrete and continuous trait evolution, divergence-time dating, and coalescent demographic models in an efficient statistical inference engine using Markov chain Monte Carlo integration. BEAST 1.10 focusses on delivering accurate and informative insights for infectious disease research through the integration of diverse data sources, including phenotypic and epidemiological information, with molecular evolutionary models.

Some evolutionary questions can only be adequately answered by combining evidence from multiple independent sources of data, including genome sequences, sampling dates, phenotypic data, radiocarbon dates, fossil occurrences, and biogeographic range information, among others. Including all relevant data in a single joint model is very challenging both conceptually and computationally. BEAST 2 is a computational software platform that allows robust development of compatible (sub-)models which can be composed into a full model hierarchy.

★ EPA-ng | Evolutionary Placement Algorithm for Next Generation Sequencing
Phylogenetic placement method to determine how sequences obtained from diverse microbial environments fit into an evolutionary context.

★ Fréchet tree distance measure

A method for comparing phylogeographies across different trees inferred from the same taxa. Phylogeographic methods reconstruct the origin and spread of taxa by inferring locations for internal nodes of the phylogenetic tree from sampling locations of genetic sequences. This is commonly applied to study pathogen outbreaks and spread.

★ SANTA-SIM | Simulating viral sequence evolution dynamics under selection and recombination.

SANTA-SIM is a software package to simulate the evolution of a population of gene sequences forwards through time. It models the underlying biological processes as discrete components: replication, recombination, point mutations, insertion-deletions, and selection under various fitness models and population size dynamics. The software is designed to be intuitive to work with for a wide range of users and executable in a cross-platform manner.

★ Sweep Dynamics (SD) plots | Computational identification of selective sweeps

Sweep Dynamics (SD) plots is a computational method combining phylogenetic algorithms with statistical techniques to characterize the molecular adaptation of rapidly evolving viruses from longitudinal sequence data. SD plots facilitate the identification of selective sweeps, the time periods in which these occurred and associated changes providing a selective advantage to the virus.

VICTOR | Virus Classification and Tree Building Online Resource
VICTOR is a web service that compares bacterial and archaeal viruses using their genome or proteome sequences. The outputs include phylogenomic trees inferred using the Genome-BLAST Distance Phylogeny method (GBDP), with branch support, as well as suggestions for the classification at the species, genus, subfamily and family level.

★ ViCTree
ViCTree is a bioinformatic framework that automatically selects new candidate virus sequences from GenBank, generates multiple sequence alignments, calculates a maximum likelihood phylogeny, integrates the sequences into the existing phylogenetic trees and is capable of automatically building new phylogenies when new data is available on GenBank.

★ CAMISIM | Simulating metagenomes and microbial communities
CAMISIM can simulate a wide variety of microbial communities and metagenome data sets together with standards of truth for method evaluation. The software can model different microbial abundance profiles, multi-sample time series, and differential abundance studies, includes real and simulated strain-level diversity, and generates second- and third-generation sequencing data from taxonomic profiles or de novo.

All data sets and the software are freely available.

★ EPA-ng | Evolutionary Placement Algorithm for Next Generation Sequencing
Phylogenetic placement method to determine how sequences obtained from diverse microbial environments fit into an evolutionary context.

★ LiveKraken | Real-time metagenomic classification of Illumina data
LiveKraken is a real-time read classification tool based on the core algorithm of Kraken. Kraken is one of the most widely used tools in metagenomics due to its robustness and speed. LiveKraken uses streams of raw data from Illumina sequencers to classify reads taxonomically, producing results identical to those of Kraken the moment the sequencer finishes. LiveKraken also provides comparable results in early stages of a sequencing run, which can save up to a week of sequencing time on an Illumina HiSeq in High Throughput Mode.

★ RIEMS | Reliable Information Extraction from Metagenomic Sequence datasets
RIEMS is a software pipeline for sensitive and comprehensive taxonomic classification of reads from metagenomics datasets. RIEMS assigns every individual read sequence within a dataset taxonomically by cascading different sequence analyses with decreasing stringency of the assignments using various software applications. After completion of the analyses, the results are summarised in a clearly structured result protocol organised taxonomically.
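
The idea of cascading analyses with decreasing stringency can be sketched generically in Python: each read is offered to the strictest classifier first, and only reads that remain unassigned are passed on to the next, more permissive stage. This is a toy illustration, not the RIEMS implementation; `cascade_classify` and the two stand-in classifiers are hypothetical, and real pipelines would call external alignment or search tools at each stage.

```python
# Illustrative only: a generic cascade, NOT the RIEMS pipeline.
# All names and the toy reference below are hypothetical.

def cascade_classify(reads, stages):
    """Assign each read at the first (most stringent) stage that accepts it.

    stages: list of (stage_name, classify) ordered from most to least
    stringent; classify(read) returns a taxon label or None.
    Returns (assignments, unassigned_reads).
    """
    assignments = {}
    remaining = list(reads)
    for stage_name, classify in stages:
        still_unassigned = []
        for read in remaining:
            taxon = classify(read)
            if taxon is None:
                still_unassigned.append(read)
            else:
                assignments[read] = (taxon, stage_name)
        remaining = still_unassigned
    return assignments, remaining


# Toy reference and stand-in classifiers for demonstration.
REFERENCE = {"ACGTACGT": "virus_A", "TTGGCCAA": "virus_B"}


def exact_match(read):
    """Strict stage: the read must equal a reference sequence."""
    return REFERENCE.get(read)


def shared_prefix(read, k=4):
    """Looser stage: the first k bases must match a reference sequence."""
    for seq, taxon in REFERENCE.items():
        if seq[:k] == read[:k]:
            return taxon
    return None


if __name__ == "__main__":
    result, leftover = cascade_classify(
        ["ACGTACGT", "TTGGAAAA", "GGGGGGGG"],
        [("exact", exact_match), ("prefix", shared_prefix)],
    )
    print(result)
    print(leftover)
```

Recording the stage alongside the taxon keeps track of how confident each assignment is, since later stages accept weaker evidence.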

★ vConTACT v.2.0 | Taxonomic assignment of uncultivated prokaryotic virus genomes

vConTACT is a network-based application utilizing whole genome gene-sharing profiles for virus taxonomy that integrates distance-based hierarchical clustering and confidence scores for all taxonomic predictions.

VirSorter | Mining viral signal from microbial genomic data

VirSorter is a web-based tool designed to detect viral signal in different types of microbial sequence data in both a reference-dependent and reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses. VirSorter outperforms existing tools in predicting viral sequences outside of a host genome (i.e., from extrachromosomal prophages, lytic infections, or partially assembled prophages) and for fragmented genomic and metagenomic datasets. Because VirSorter scales to large datasets, it can also be used in “reverse” to more confidently identify viral sequence in viral metagenomes by sorting away cellular DNA, whether derived from gene transfer agents, generalized transduction or contamination.

Please also have a look at the following publication for assessing metagenomic assemblers: