This is a collection of useful tools in Virus Bioinformatics. Please note that EVBC does not maintain these tools.

Tools by EVBC members are marked ★.

Don’t hesitate to contact us if you want a tool to be added. We are also happy to receive feedback on the tools!

Databases
★ eggNOG 5.0
eggNOG is a public database of orthology relationships, gene evolutionary histories and functional annotations, including 2502 viral proteomes. eggNOG offers online services for fast functional annotation and orthology prediction of custom genomic or metagenomic datasets.

  • [DOI] J. Huerta-Cepas, D. Szklarczyk, D. Heller, A. Hernández-Plaza, S. K. Forslund, H. Cook, D. R. Mende, I. Letunic, T. Rattei, L. J. Jensen, C. von Mering, and P. Bork, “eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses,” Nucleic Acids Res, vol. 47, iss. D1, p. D309–D314, 2018.
    [Bibtex]
    @Article{Huerta-Cepas:18,
    author = {Jaime Huerta-Cepas and Damian Szklarczyk and Davide Heller and Ana Hern{\'{a}}ndez-Plaza and Sofia K Forslund and Helen Cook and Daniel R Mende and Ivica Letunic and Thomas Rattei and Lars Juhl Jensen and Christian von~Mering and Peer Bork},
    title = {{eggNOG} 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses},
    journal = {{Nucleic Acids Res}},
    year = {2018},
    volume = {47},
    number = {D1},
    pages = {D309--D314},
    doi = {10.1093/nar/gky1085},
    publisher = {Oxford University Press ({OUP})},
    }
GISAID EpiFlu™
EpiFlu™ is the world’s most complete collection of genetic sequence data of influenza viruses and related clinical and epidemiological data. EpiFlu™ is tailored to the needs of influenza researchers in both the human and the veterinary fields. The data are publicly accessible but not in the public domain (GISAID neither removes nor waives any pre-existing rights).

  • [DOI] Y. Shu and J. McCauley, “GISAID: global initiative on sharing all influenza data – from vision to reality,” Eurosurveillance, vol. 22, iss. 13, 2017.
    [Bibtex]
    @Article{Shu:17,
    author = {Yuelong Shu and John McCauley},
    title = {{GISAID}: Global initiative on sharing all influenza data {\textendash} from vision to reality},
    journal = {{Eurosurveillance}},
    year = {2017},
    volume = {22},
    number = {13},
    doi = {10.2807/1560-7917.es.2017.22.13.30494},
    publisher = {European Centre for Disease Prevention and Control ({ECDC})},
    }
Hepatitis C Virus (HCV) Database Project
The HCV database group strives to present HCV-associated, hand-annotated genetic data in a user-friendly way, providing access to the central database via web-accessible search interfaces and supplying a number of analysis tools.

  • [DOI] C. Kuiken, K. Yusim, L. Boykin, and R. Richardson, “The Los Alamos hepatitis C sequence database,” Bioinformatics, vol. 21, iss. 3, p. 379–384, 2005.
    [Bibtex]
    @Article{Kuiken:05,
    author = {Kuiken, Carla and Yusim, Karina and Boykin, Laura and Richardson, Russell},
    title = {The {L}os {A}lamos hepatitis {C} sequence database.},
    journal = {{Bioinformatics}},
    year = {2005},
    volume = {21},
    pages = {379--384},
    abstract = {The hepatitis C virus (HCV) is a significant threat to public health worldwide. The virus is highly variable and evolves rapidly, making it an elusive target for the immune system and for vaccine and drug design. At present, some 30 000 HCV sequences have been published. A central website that provides annotated sequences and analysis tools will be helpful to HCV scientists worldwide. The HCV sequence database collects and annotates sequence data and provides them to the public via a website that contains a user-friendly search interface and a large number of sequence analysis tools, based on the model of the highly regarded Los Alamos HIV database. The HCV sequence database was officially launched in September 2003. Since then, its usage has steadily increased and is now at an average of approximately 280 visits per day from distinct IP addresses. The HCV website can be accessed via http://hcv.lanl.gov and http://hcv-db.org.},
    doi = {10.1093/bioinformatics/bth485},
    number = {3},
    keywords = {Amino Acid Sequence; Base Sequence; DNA, Viral, chemistry, genetics; Database Management Systems; Databases, Genetic; Genome, Viral; Hepacivirus, genetics, metabolism; Information Storage and Retrieval, methods; Molecular Sequence Data; New Mexico; Sequence Alignment, methods; Sequence Analysis, methods; User-Computer Interface; Viral Proteins, chemistry, genetics, metabolism},
    pmid = {15377502},
    }
ICTV Taxonomy
The International Committee on Taxonomy of Viruses (ICTV) authorises and organises the taxonomic classification and nomenclature of viruses. The ICTV has developed a universal taxonomic scheme for viruses.
★ ViPR | Virus Pathogen Resource
The ViPR database integrates various types of data for multiple virus families. You can search the database for sequences and strains, immune epitopes, 3D protein structures, host factor data, antiviral drugs and plasmid data. You can also analyze the data online using sequence alignment, phylogenetic tree reconstruction, sequence variation (SNP) analysis, metadata-driven comparative analysis and BLAST.

  • [DOI] B. E. Pickett, E. L. Sadat, Y. Zhang, J. M. Noronha, B. R. Squires, V. Hunt, M. Liu, S. Kumar, S. Zaremba, Z. Gu, L. Zhou, C. N. Larson, J. Dietrich, E. B. Klem, and R. H. Scheuermann, “ViPR: an open bioinformatics database and analysis resource for virology research,” Nucleic Acids Res, vol. 40, iss. D1, p. D593–D598, 2011.
    [Bibtex]
    @Article{Pickett:11,
    author = {Brett E. Pickett and Eva L. Sadat and Yun Zhang and Jyothi M. Noronha and R. Burke Squires and Victoria Hunt and Mengya Liu and Sanjeev Kumar and Sam Zaremba and Zhiping Gu and Liwei Zhou and Christopher N. Larson and Jonathan Dietrich and Edward B. Klem and Richard H. Scheuermann},
    title = {{ViPR}: an open bioinformatics database and analysis resource for virology research},
    journal = {{Nucleic Acids Res}},
    year = {2011},
    volume = {40},
    number = {D1},
    pages = {D593--D598},
    doi = {10.1093/nar/gkr859},
    publisher = {Oxford University Press ({OUP})},
    }
★ ViPR Hepatitis C Virus Database
The hepatitis C virus (HCV) portal of ViPR facilitates basic research and development of diagnostics and therapeutics for HCV, by providing a comprehensive collection of HCV-related data integrated from various sources, a growing suite of analysis and visualization tools for data mining and hypothesis generation, and personal Workbench spaces for data storage and sharing.

  • [DOI] Y. Zhang, C. Zmasek, G. Sun, C. N. Larsen, and R. H. Scheuermann, “Hepatitis C virus database and bioinformatics analysis tools in the virus pathogen resource (ViPR),” in Methods Mol Biol, Springer New York, 2018, p. 47–69.
    [Bibtex]
    @InCollection{Zhang:18,
    author = {Yun Zhang and Christian Zmasek and Guangyu Sun and Christopher N. Larsen and Richard H. Scheuermann},
    title = {Hepatitis {C} Virus Database and Bioinformatics Analysis Tools in the Virus Pathogen Resource ({ViPR})},
    booktitle = {{Methods Mol Biol}},
    publisher = {Springer New York},
    year = {2018},
    pages = {47--69},
    doi = {10.1007/978-1-4939-8976-8_3},
    }
★ ViralZone
ViralZone is a web resource from the Swiss Institute of Bioinformatics covering all viral genera and families, providing general molecular and epidemiological information along with virion and genome figures. Each virus or family page gives easy access to the UniProtKB/Swiss-Prot viral protein entries. The ViralZone project is handled by the virus program of the Swiss-Prot group.

  • [DOI] C. Hulo, E. de Castro, P. Masson, L. Bougueleret, A. Bairoch, I. Xenarios, and P. Le Mercier, “ViralZone: a knowledge resource to understand virus diversity,” Nucleic Acids Res, vol. 39, p. D576–D582, 2011.
    [Bibtex]
    @Article{Hulo:11,
    author = {Hulo, Chantal and de Castro, Edouard and Masson, Patrick and Bougueleret, Lydie and Bairoch, Amos and Xenarios, Ioannis and Le Mercier, Philippe},
    title = {{ViralZone}: a knowledge resource to understand virus diversity.},
    journal = {{Nucleic Acids Res}},
    year = {2011},
    volume = {39},
    pages = {D576--D582},
    abstract = {The molecular diversity of viruses complicates the interpretation of viral genomic and proteomic data. To make sense of viral gene functions, investigators must be familiar with the virus host range, replication cycle and virion structure. Our aim is to provide a comprehensive resource bridging together textbook knowledge with genomic and proteomic sequences. ViralZone web resource (www.expasy.org/viralzone/) provides fact sheets on all known virus families/genera with easy access to sequence data. A selection of reference strains (RefStrain) provides annotated standards to circumvent the exponential increase of virus sequences. Moreover ViralZone offers a complete set of detailed and accurate virion pictures.},
    doi = {10.1093/nar/gkq901},
    issue = {Database issue},
    keywords = {Databases, Genetic; Genome, Viral; Genomics; Proteomics; Viral Proteins, genetics; Virion, chemistry; Virus Physiological Phenomena; Virus Replication; Viruses, classification, genetics},
    pmid = {20947564},
    }
★ Viruses.STRING | Virus-Host Protein-Protein Interaction Database
Viruses.STRING is a protein–protein interaction database specifically catering to virus–virus and virus–host interactions. This database combines evidence from experimental and text-mining channels to provide combined probabilities for interactions between viral and host proteins. The database contains 177,425 interactions between 239 viruses and 319 hosts. The database is publicly available and the interaction data can also be accessed through the latest version of the Cytoscape STRING app.

  • [DOI] H. Cook, N. T. Doncheva, D. Szklarczyk, C. von Mering, and L. J. Jensen, “Viruses.STRING: a virus-host protein-protein interaction database,” Viruses, vol. 10, iss. 10, p. 519, 2018.
    [Bibtex]
    @Article{Cook:18,
    author = {Helen Cook and Nadezhda Tsankova Doncheva and Damian Szklarczyk and Christian {von Mering} and Lars Juhl Jensen},
    title = {Viruses.{STRING}: A Virus-Host Protein-Protein Interaction Database},
    journal = {Viruses},
    year = {2018},
    volume = {10},
    number = {10},
    pages = {519},
    doi = {10.3390/v10100519},
    publisher = {{MDPI} {AG}},
    }
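To illustrate what a "combined probability" across evidence channels means, the minimal sketch below assumes that each channel (for example experiments and text mining) yields an independent probability and merges them as 1 - prod(1 - p_i): the interaction is missed only if every channel misses it. STRING's actual scoring additionally corrects each channel for a prior probability of interaction, which is left out here; the function name, channel names and scores are hypothetical.

```python
from math import prod


def combine_channels(scores: dict[str, float]) -> float:
    """Combine per-channel interaction probabilities, assuming independence.

    The interaction is missed only if every channel misses it, so the
    combined probability is 1 minus the product of (1 - p_i).
    """
    for name, p in scores.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"score for {name!r} must be in [0, 1], got {p}")
    return 1.0 - prod(1.0 - p for p in scores.values())


# Hypothetical channel scores for one virus-host protein pair
evidence = {"experiments": 0.6, "textmining": 0.5}
print(round(combine_channels(evidence), 3))  # 0.8
```

Two moderately confident channels thus yield a combined score higher than either one alone, which is the intuition behind reporting a single combined value per interaction.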
Virus Variation Resource (NCBI)
The Virus Variation Resource (VVR) provides web retrieval interfaces as well as analysis and visualization tools for virus sequence datasets.
De novo assembly
AV454 | Assemble Viral 454
AV454 is a de novo consensus assembler designed for small, non-repetitive genomes sequenced at high depth. It was specifically designed to assemble read data generated from a mixed population of viral genomes. Reads need not be paired; the assembler assumes that no repeat in the genome is long enough to fully contain an average read.

  • [DOI] M. R. Henn, C. L. Boutwell, P. Charlebois, N. J. Lennon, K. A. Power, A. R. Macalalad, A. M. Berlin, C. M. Malboeuf, E. M. Ryan, S. Gnerre, M. C. Zody, R. L. Erlich, L. M. Green, A. Berical, Y. Wang, M. Casali, H. Streeck, A. K. Bloom, T. Dudek, D. Tully, R. Newman, K. L. Axten, A. D. Gladden, L. Battis, M. Kemper, Q. Zeng, T. P. Shea, S. Gujja, C. Zedlack, O. Gasser, C. Brander, C. Hess, H. F. Günthard, Z. L. Brumme, C. J. Brumme, S. Bazner, J. Rychert, J. P. Tinsley, K. H. Mayer, E. Rosenberg, F. Pereyra, J. Z. Levin, S. K. Young, H. Jessen, M. Altfeld, B. W. Birren, B. D. Walker, and T. M. Allen, “Whole genome deep sequencing of HIV-1 reveals the impact of early minor variants upon immune recognition during acute infection,” PLoS Pathog, vol. 8, iss. 3, p. e1002529, 2012.
    [Bibtex]
    @Article{Henn:12,
    author = {Henn, Matthew R and Boutwell, Christian L and Charlebois, Patrick and Lennon, Niall J and Power, Karen A and Macalalad, Alexander R and Berlin, Aaron M and Malboeuf, Christine M and Ryan, Elizabeth M and Gnerre, Sante and Zody, Michael C and Erlich, Rachel L and Green, Lisa M and Berical, Andrew and Wang, Yaoyu and Casali, Monica and Streeck, Hendrik and Bloom, Allyson K and Dudek, Tim and Tully, Damien and Newman, Ruchi and Axten, Karen L and Gladden, Adrianne D and Battis, Laura and Kemper, Michael and Zeng, Qiandong and Shea, Terrance P and Gujja, Sharvari and Zedlack, Carmen and Gasser, Olivier and Brander, Christian and Hess, Christoph and Günthard, Huldrych F and Brumme, Zabrina L and Brumme, Chanson J and Bazner, Suzane and Rychert, Jenna and Tinsley, Jake P and Mayer, Ken H and Rosenberg, Eric and Pereyra, Florencia and Levin, Joshua Z and Young, Sarah K and Jessen, Heiko and Altfeld, Marcus and Birren, Bruce W and Walker, Bruce D and Allen, Todd M},
    title = {Whole genome deep sequencing of {HIV-1} reveals the impact of early minor variants upon immune recognition during acute infection.},
    journal = {{PLoS Pathog}},
    year = {2012},
    volume = {8},
    pages = {e1002529},
    abstract = {Deep sequencing technologies have the potential to transform the study of highly variable viral pathogens by providing a rapid and cost-effective approach to sensitively characterize rapidly evolving viral quasispecies. Here, we report on a high-throughput whole HIV-1 genome deep sequencing platform that combines 454 pyrosequencing with novel assembly and variant detection algorithms. In one subject we combined these genetic data with detailed immunological analyses to comprehensively evaluate viral evolution and immune escape during the acute phase of HIV-1 infection. The majority of early, low frequency mutations represented viral adaptation to host CD8+ T cell responses, evidence of strong immune selection pressure occurring during the early decline from peak viremia. CD8+ T cell responses capable of recognizing these low frequency escape variants coincided with the selection and evolution of more effective secondary HLA-anchor escape mutations. Frequent, and in some cases rapid, reversion of transmitted mutations was also observed across the viral genome. When located within restricted CD8 epitopes these low frequency reverting mutations were sufficient to prime de novo responses to these epitopes, again illustrating the capacity of the immune response to recognize and respond to low frequency variants. More importantly, rapid viral escape from the most immunodominant CD8+ T cell responses coincided with plateauing of the initial viral load decline in this subject, suggestive of a potential link between maintenance of effective, dominant CD8 responses and the degree of early viremia reduction. We conclude that the early control of HIV-1 replication by immunodominant CD8+ T cell responses may be substantially influenced by rapid, low frequency viral adaptations not detected by conventional sequencing approaches, which warrants further investigation. 
These data support the critical need for vaccine-induced CD8+ T cell responses to target more highly constrained regions of the virus in order to ensure the maintenance of immunodominant CD8 responses and the sustained decline of early viremia.},
    doi = {10.1371/journal.ppat.1002529},
    number = {3},
    keywords = {CD8-Positive T-Lymphocytes, immunology; Genetic Variation; Genome, Viral, genetics; Genome-Wide Association Study; Genomic Structural Variation; HIV Infections, immunology, prevention & control, virology; HIV-1, genetics, immunology, pathogenicity; Humans; Immune Evasion, genetics, immunology; Oligonucleotide Array Sequence Analysis; RNA, Viral, analysis; Sequence Analysis, RNA; Viral Vaccines, immunology},
    pmid = {22412369},
    }
★ Genome Detective
Genome Detective is an easy-to-use web-based application that assembles viral genomes quickly and accurately. It uses a novel alignment method that constructs genomes by reference-based linking of de novo contigs, combining amino-acid and nucleotide scores.

  • [DOI] M. Vilsker, Y. Moosa, S. Nooij, V. Fonseca, Y. Ghysens, K. Dumon, R. Pauwels, L. C. Alcantara, E. Vanden Eynden, A.-M. Vandamme, K. Deforche, and T. de Oliveira, “Genome Detective: an automated system for virus identification from high-throughput sequencing data,” Bioinformatics, vol. 35, iss. 5, p. 871–873, 2018.
    [Bibtex]
    @Article{Vilsker:18,
    author = {Michael Vilsker and Yumna Moosa and Sam Nooij and Vagner Fonseca and Yoika Ghysens and Korneel Dumon and Raf Pauwels and Luiz Carlos Alcantara and Ewout {Vanden Eynden} and Anne-Mieke Vandamme and Koen Deforche and Tulio de Oliveira},
    title = {{Genome Detective}: an automated system for virus identification from high-throughput sequencing data},
    journal = {Bioinformatics},
    year = {2018},
    volume = {35},
    number = {5},
    pages = {871--873},
    doi = {10.1093/bioinformatics/bty695},
    editor = {Inanc Birol},
    publisher = {Oxford University Press ({OUP})},
    }
SPAdes
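SPAdes is an assembler for genomes and mini-metagenomes from both single-cell and standard (multicell) sequencing data. It is designed to cope with highly non-uniform read coverage, elevated sequencing error rates and chimeric reads.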

  • [DOI] A. Bankevich, S. Nurk, D. Antipov, A. A. Gurevich, M. Dvorkin, A. S. Kulikov, V. M. Lesin, S. I. Nikolenko, S. Pham, A. D. Prjibelski, A. V. Pyshkin, A. V. Sirotkin, N. Vyahhi, G. Tesler, M. A. Alekseyev, and P. A. Pevzner, “SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing,” J Comput Biol, vol. 19, iss. 5, p. 455–477, 2012.
    [Bibtex]
    @Article{Bankevich:12,
    author = {Bankevich, Anton and Nurk, Sergey and Antipov, Dmitry and Gurevich, Alexey A and Dvorkin, Mikhail and Kulikov, Alexander S and Lesin, Valery M and Nikolenko, Sergey I and Pham, Son and Prjibelski, Andrey D and Pyshkin, Alexey V and Sirotkin, Alexander V and Vyahhi, Nikolay and Tesler, Glenn and Alekseyev, Max A and Pevzner, Pavel A},
    title = {{SPAdes}: a new genome assembly algorithm and its applications to single-cell sequencing.},
    journal = {{J Comput Biol}},
    year = {2012},
    volume = {19},
    pages = {455--477},
    abstract = {The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online ( http://bioinf.spbau.ru/spades ). It is distributed as open source software.},
    doi = {10.1089/cmb.2012.0021},
    number = {5},
    keywords = {Algorithms; Bacteria, genetics; Genome, Bacterial; Metagenomics, methods; Sequence Analysis, DNA, methods; Single-Cell Analysis, methods},
    pmid = {22506599},
    }
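SPAdes belongs to the family of de Bruijn graph assemblers. The toy sketch below shows only that core idea: every k-mer in the reads becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and unambiguous paths through the graph spell out a contig. It is not the SPAdes algorithm, which additionally uses multiple k-mer sizes, read error correction and graph simplification; the function names, reads and value of k are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph mapping each (k-1)-mer to its successor (k-1)-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop instead of looping forever on a cycle
            break
        contig += nxt[-1]  # each step adds one new base
        seen.add(nxt)
        node = nxt
    return contig


# Hypothetical overlapping reads from one short sequence
reads = ["ATGGCGT", "GCGTGCA", "GTGCAAT"]
graph = de_bruijn(reads, k=4)
print(walk_unambiguous(graph, "ATG"))  # ATGGCGTGCAAT
```

Real viral data contain sequencing errors and within-host variants, which create branches in this graph; handling those branches is where dedicated assemblers differ.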
V-FAT | Viral Finishing and Annotation Toolkit
V-FAT is a tool to perform automated computational finishing and annotation of de novo viral assemblies. V-FAT uses reference and read data to order and merge contigs, correct frameshifts, and produce NCBI-ready annotation files. It also performs a set of quality assurance measurements including coverage computation by gene or amplicon and identification of potential consensus errors.
VICUNA
VICUNA is a de novo assembly tool targeting highly diverse viral populations. It creates a single linear representation of the mixed population onto which intra-host variants can be mapped. After initial assembly, it can also use existing genomes to guide the merging of contigs. VICUNA efficiently handles ultra-deep sequence data with tens-of-thousands-fold coverage.

  • [DOI] X. Yang, P. Charlebois, S. Gnerre, M. G. Coole, N. J. Lennon, J. Z. Levin, J. Qu, E. M. Ryan, M. C. Zody, and M. R. Henn, “De novo assembly of highly diverse viral populations,” BMC Genomics, vol. 13, p. 475, 2012.
    [Bibtex]
    @Article{Yang:12,
    author = {Yang, Xiao and Charlebois, Patrick and Gnerre, Sante and Coole, Matthew G and Lennon, Niall J and Levin, Joshua Z and Qu, James and Ryan, Elizabeth M and Zody, Michael C and Henn, Matthew R},
    title = {{De novo} assembly of highly diverse viral populations.},
    journal = {{BMC Genomics}},
    year = {2012},
    volume = {13},
    pages = {475},
    abstract = {Extensive genetic diversity in viral populations within infected hosts and the divergence of variants from existing reference genomes impede the analysis of deep viral sequencing data. A de novo population consensus assembly is valuable both as a single linear representation of the population and as a backbone on which intra-host variants can be accurately mapped. The availability of consensus assemblies and robustly mapped variants are crucial to the genetic study of viral disease progression, transmission dynamics, and viral evolution. Existing de novo assembly techniques fail to robustly assemble ultra-deep sequence data from genetically heterogeneous populations such as viruses into full-length genomes due to the presence of extensive genetic variability, contaminants, and variable sequence coverage. We present VICUNA, a de novo assembly algorithm suitable for generating consensus assemblies from genetically heterogeneous populations. We demonstrate its effectiveness on Dengue, Human Immunodeficiency and West Nile viral populations, representing a range of intra-host diversity. Compared to state-of-the-art assemblers designed for haploid or diploid systems, VICUNA recovers full-length consensus and captures insertion/deletion polymorphisms in diverse samples. Final assemblies maintain a high base calling accuracy. VICUNA program is publicly available at: http://www.broadinstitute.org/scientific-community/science/projects/viral-genomics/ viral-genomics-analysis-software. We developed VICUNA, a publicly available software tool, that enables consensus assembly of ultra-deep sequence derived from diverse viral populations. While VICUNA was developed for the analysis of viral populations, its application to other heterogeneous sequence data sets such as metagenomic or tumor cell population samples may prove beneficial in these fields of research.},
    doi = {10.1186/1471-2164-13-475},
    keywords = {Algorithms; Computational Biology; Genome, Viral, genetics; Software},
    pmid = {22974120},
    }
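A population consensus is useful because intra-host variants can be reported relative to it. The sketch below illustrates that idea under a strong simplifying assumption: the reads are already placed in a common coordinate frame as equal-length, gap-padded strings ('.' = not covered), so it only computes a majority-rule consensus and lists minor variants above a frequency threshold. It is not VICUNA's de novo assembly algorithm; the function name, input format and threshold are hypothetical.

```python
from collections import Counter


def consensus_and_variants(aligned_reads, min_freq=0.05):
    """Majority-rule consensus plus minor variants from gap-padded aligned reads.

    `aligned_reads` are equal-length strings where '.' marks positions a read
    does not cover. Returns (consensus, variants), where `variants` maps a
    0-based position to {base: frequency} for non-consensus bases whose
    frequency is at least `min_freq`.
    """
    length = len(aligned_reads[0])
    consensus, variants = [], {}
    for pos in range(length):
        counts = Counter(r[pos] for r in aligned_reads if r[pos] != ".")
        if not counts:
            consensus.append("N")  # no read covers this position
            continue
        depth = sum(counts.values())
        top, _ = counts.most_common(1)[0]
        consensus.append(top)
        minor = {b: n / depth for b, n in counts.items()
                 if b != top and n / depth >= min_freq}
        if minor:
            variants[pos] = minor
    return "".join(consensus), variants


reads = ["ACGT..", "ACGTA.", ".CCTAG", "..GTAG"]
print(consensus_and_variants(reads, min_freq=0.2))
# ('ACGTAG', {2: {'C': 0.25}})
```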
★ VrAP | Viral Assembly Pipeline
VrAP combines the genome assembler SPAdes with an additional read-correction step and several filtering steps. The pipeline classifies contigs to distinguish host from viral sequences, and VrAP can identify viruses without any sequence homology to known references.
Sequencing and annotation
★ AUGUSTUS | Multi-Genome Annotation

The comparative gene prediction algorithm of AUGUSTUS performs a multi-genome annotation to increase the accuracy and consistency of the predicted exon-intron structures of the protein-coding genes by simultaneously predicting the genes in all input genomes.

  • [DOI] S. Nachtweide and M. Stanke, “Multi-genome annotation with AUGUSTUS,” in Methods Mol Biol, Springer New York, 2019, p. 139–160.
    [Bibtex]
    @InCollection{Nachtweide:19,
    author = {Stefanie Nachtweide and Mario Stanke},
    title = {Multi-Genome Annotation with {AUGUSTUS}},
    booktitle = {{Methods Mol Biol}},
    publisher = {Springer New York},
    year = {2019},
    pages = {139--160},
    doi = {10.1007/978-1-4939-9173-0_8},
    }
★ Base-By-Base | Comparative Tools for Large Virus Genomes

Base-By-Base is a comprehensive tool for creating and editing multiple sequence alignments. It can be used with gene and protein sequences as well as with large viral genomes, which may themselves contain gene annotations. Version 3 adds two new features: (1) “consensus-degenerate hybrid oligonucleotide primers” (CODEHOP), a popular tool for designing degenerate primers from a multiple sequence alignment of proteins; and (2) fuzzy searches within the columns of a multiple sequence alignment, to determine the distribution of sequence variants among the sequences.

  • [DOI] S. Tu, J. Staheli, C. McClay, K. McLeod, T. Rose, and C. Upton, “Base-By-Base version 3: new comparative tools for large virus genomes,” Viruses, vol. 10, iss. 11, p. 637, 2018.
    [Bibtex]
    @Article{Tu:18,
    author = {Shin-Lin Tu and Jeannette Staheli and Colum McClay and Kathleen McLeod and Timothy Rose and Chris Upton},
    title = {{Base-By-Base} Version 3: New Comparative Tools for Large Virus Genomes},
    journal = {Viruses},
    year = {2018},
    volume = {10},
    number = {11},
    pages = {637},
    doi = {10.3390/v10110637},
    publisher = {{MDPI} {AG}},
    }
★ DIGS | Database-Integrated Genome Screening
Exploring genomes heuristically using sequence similarity search tools and a relational database.

DIGS systematically screens genomic ‘dark matter’ to recover useful biological information, using sequence similarity search tools and a relational database. It can be used to search for sequences of interest and to support investigations of their distribution, diversity and evolution. One example is screening mammalian genomes for endogenous viral elements (EVEs).

  • [DOI] H. Zhu, T. Dennis, J. Hughes, and R. J. Gifford, “Database-integrated genome screening (DIGS): exploring genomes heuristically using sequence similarity search tools and a relational database,” bioRxiv, 2018.
    [Bibtex]
    @Article{Zhu:18,
    author = {Henan Zhu and Tristan Dennis and Joseph Hughes and Robert J Gifford},
    title = {Database-integrated genome screening ({DIGS}): exploring genomes heuristically using sequence similarity search tools and a relational database.},
    journal = {{bioRxiv}},
    year = {2018},
    doi = {10.1101/246835},
    publisher = {Cold Spring Harbor Laboratory},
    }
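The benefit of pairing similarity searches with a relational database is that screening results can be queried systematically, for example to see how putative EVEs are distributed across genomes. The sketch below uses Python's built-in `sqlite3` purely as a stand-in database; the table name, columns, score cutoff and hit records are all hypothetical and do not reflect the actual DIGS schema.

```python
import sqlite3

# Hypothetical table of similarity-search hits (not the real DIGS schema)
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hits (
        genome TEXT, scaffold TEXT, start_pos INTEGER, end_pos INTEGER,
        assigned_family TEXT, bitscore REAL
    )
""")
hits = [
    ("genome_A", "scaf1", 1000, 1800, "Retroviridae", 250.0),
    ("genome_A", "scaf7", 5200, 5900, "Bornaviridae", 120.5),
    ("genome_B", "scaf2", 300, 1100, "Retroviridae", 310.2),
    ("genome_B", "scaf2", 9000, 9400, "Retroviridae", 45.0),
]
conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?, ?, ?)", hits)

# Count hits per genome and virus family, keeping only hits above a score cutoff
rows = conn.execute("""
    SELECT genome, assigned_family, COUNT(*) AS n
    FROM hits
    WHERE bitscore >= ?
    GROUP BY genome, assigned_family
    ORDER BY genome, assigned_family
""", (100.0,)).fetchall()

for row in rows:
    print(row)
```

Because results live in tables, re-running a screen with a new probe or a stricter cutoff becomes a matter of changing a query rather than re-parsing search output files.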
★ GLUE
GLUE is a data-centric bioinformatics environment for virus sequence data, with a focus on variation, evolution and sequence interpretation. It supports working with multiple sequence alignments (MSAs) and generating standardized reports from MSAs and associated data. GLUE also provides tools for managing MSAs and data, and can be combined with the MySQL relational database management system (RDBMS) to create boutique sequence databases.

  • [DOI] J. B. Singer, E. C. Thomson, J. McLauchlan, J. Hughes, and R. J. Gifford, “GLUE: a flexible software system for virus sequence data,” BMC Bioinformatics, vol. 19, iss. 1, 2018.
    [Bibtex]
    @Article{Singer:18,
    author = {Joshua B. Singer and Emma C. Thomson and John McLauchlan and Joseph Hughes and Robert J. Gifford},
    title = {{GLUE}: a flexible software system for virus sequence data},
    journal = {{BMC Bioinformatics}},
    year = {2018},
    volume = {19},
    number = {1},
    doi = {10.1186/s12859-018-2459-9},
    publisher = {Springer Nature},
    }
★ PoSeiDon | Positive Selection Detection and recombination analysis of protein-coding genes
Attention: Currently the website hosting PoSeiDon is under maintenance. The web service will be back soon with full functionality.
PoSeiDon is a pipeline to detect significantly positively selected sites and possible recombination events in an alignment of multiple coding sequences. Positively selected sites can give insights into the evolutionary history of your sequences, for example by revealing important mutation hot spots that accumulated as a result of virus–host arms races.
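Tests for positive selection on codon alignments are commonly framed as likelihood ratio tests between nested codon models: a null model in which ω = dN/dS never exceeds 1 against an alternative that allows a class of sites with ω > 1. The sketch below assumes the two models differ by two free parameters (as in the widely used M7 versus M8 comparison), so twice the log-likelihood difference is compared with a χ² distribution with 2 degrees of freedom, whose survival function is exp(-x/2). Which models PoSeiDon actually runs is not stated here; the function name and log-likelihood values are hypothetical.

```python
import math


def lrt_pvalue_df2(lnl_null: float, lnl_alt: float) -> float:
    """Likelihood ratio test for nested models differing by 2 parameters.

    The statistic 2 * (lnL_alt - lnL_null) is compared against a chi-square
    distribution with 2 degrees of freedom, whose survival function is
    exp(-x / 2).
    """
    # Clamp at 0: a slightly negative statistic can arise from optimiser noise
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return math.exp(-stat / 2.0)


# Hypothetical log-likelihoods: null (no positive selection) vs alternative
p = lrt_pvalue_df2(lnl_null=-2406.0, lnl_alt=-2401.0)
print(f"p = {p:.4g}")  # p = 0.006738
```

A significant test only indicates that some sites are under positive selection; identifying which sites requires a separate per-site posterior analysis.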
PriSM
PriSM is a set of algorithms designed specifically to create degenerate primers for the amplification and sequencing of short viral genomes while maintaining sample population diversity. PriSM allows for rapid in silico optimization of primers for downstream applications such as sequencing.

  • [DOI] Q. Yu, E. M. Ryan, T. M. Allen, B. W. Birren, M. R. Henn, and N. J. Lennon, “PriSM: a primer selection and matching tool for amplification and sequencing of viral genomes,” Bioinformatics, vol. 27, iss. 2, p. 266–267, 2011.
    [Bibtex]
    @Article{Yu:11,
    author = {Yu, Qing and Ryan, Elizabeth M and Allen, Todd M and Birren, Bruce W and Henn, Matthew R and Lennon, Niall J},
    title = {{PriSM}: a primer selection and matching tool for amplification and sequencing of viral genomes.},
    journal = {Bioinformatics},
    year = {2011},
    volume = {27},
    pages = {266--267},
    abstract = {PriSM is a set of algorithms designed to select and match degenerate primer pairs for the amplification of viral genomes. The design of panels of hundreds of primer pairs takes just hours using this program, compared with days using a manual approach. PriSM allows for rapid in silico optimization of primers for downstream applications such as sequencing. As a validation, PriSM was used to create an amplification primer panel for human immunodeficiency virus (HIV) Clade B. The program is freely available for use at: www.broadinstitute.org/perl/seq/specialprojects/primerDesign.cgi.},
    doi = {10.1093/bioinformatics/btq624},
    number = {2},
    keywords = {Algorithms; DNA Primers, chemistry; Genome, Viral; HIV, genetics; Humans; Polymerase Chain Reaction; Sequence Alignment; Sequence Analysis, RNA; Software},
    pmid = {21068001},
    }
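Degenerate primers preserve population diversity by encoding the variation seen at each position of the binding site with an IUPAC ambiguity code. The sketch below shows only that encoding step: it collapses aligned binding-site sequences into one degenerate primer and reports its degeneracy (the number of distinct plain sequences in the primer mixture). It is not PriSM's primer selection or matching algorithm; the function name and input sites are hypothetical, and the input is assumed to contain only A, C, G and T.

```python
# IUPAC nucleotide ambiguity codes keyed by the set of bases they represent
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def degenerate_primer(site_variants):
    """Collapse aligned primer-binding sites into one degenerate primer.

    Returns the IUPAC primer string and its degeneracy, i.e. the product of
    the number of distinct bases at each position.
    """
    if len({len(s) for s in site_variants}) != 1:
        raise ValueError("all sites must have the same length")
    primer, degeneracy = [], 1
    for column in zip(*site_variants):
        bases = frozenset(b.upper() for b in column)
        primer.append(IUPAC[bases])  # KeyError for non-ACGT characters
        degeneracy *= len(bases)
    return "".join(primer), degeneracy


sites = ["ATGGCA", "ATGACA", "ACGGCA"]
print(degenerate_primer(sites))  # ('AYGRCA', 4)
```

Degeneracy grows multiplicatively with each variable position, which is why primer design tools try to place primers in conserved regions and keep the mixture small.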
Tanoti
Tanoti is a BLAST-guided, reference-based short-read aligner. It was developed to maximise alignment in highly variable next-generation sequencing (Illumina) data sets.
★ VIpower | Estimating power of Viral Integration
VIpower is a simulation-based tool for estimating the power of viral integration detection via high-throughput sequencing. It was designed to model the process of detecting viral integrations in the human genome.

  • [DOI] A. Sulovari and D. Li, “VIpower: simulation-based tool for estimating power of viral integration detection via high-throughput sequencing,” Genomics, 2019.
    [Bibtex]
    @Article{Sulovari:19,
    author = {Arvis Sulovari and Dawei Li},
    title = {{VIpower}: Simulation-based tool for estimating power of viral integration detection via high-throughput sequencing},
    journal = {Genomics},
    year = {2019},
    doi = {10.1016/j.ygeno.2019.01.015},
    publisher = {Elsevier {BV}},
    }
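To make "simulation-based power estimation" concrete, the toy Monte Carlo below repeatedly simulates read placement and counts how often an integration junction is supported by enough reads. The model is a deliberate simplification and an assumption of this sketch, not VIpower's model: single-end reads start uniformly at random on one sequence, a read supports the junction if it spans it with at least `min_overhang` bases on each side, and detection requires `min_support` supporting reads. All parameter names and values are hypothetical.

```python
import random


def detection_power(genome_len: int, read_len: int, coverage: float, junction: int,
                    min_overhang: int = 20, min_support: int = 2,
                    trials: int = 200, seed: int = 1) -> float:
    """Monte Carlo estimate of the power to detect one integration junction.

    A read covering [start, start + read_len) supports the junction at
    `junction` if it has >= min_overhang bases on each side of it.
    """
    rng = random.Random(seed)
    n_reads = int(coverage * genome_len / read_len)
    n_starts = genome_len - read_len + 1        # possible read start positions
    lo = junction - read_len + min_overhang     # earliest start that still supports
    hi = junction - min_overhang                # latest start that still supports
    detected = 0
    for _ in range(trials):
        support = sum(1 for _ in range(n_reads) if lo <= rng.randrange(n_starts) <= hi)
        if support >= min_support:
            detected += 1
    return detected / trials


for cov in (1, 5, 30):
    power = detection_power(10_000, 100, cov, 5_000)
    print(f"{cov:>2}x coverage: estimated power {power:.2f}")
```

The same loop structure extends naturally to richer models (read pairs, mapping errors, integration in repeats), which is where a dedicated tool adds value over this sketch.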
ViraMiner | Identifying viral genomes in human samples
ViraMiner is a deep learning-based method to identify viruses in various human biospecimens. It contains two convolutional neural network (CNN) branches designed to detect both patterns and pattern frequencies in raw metagenomic contigs. According to its authors, ViraMiner is the first machine learning method that can detect viral sequences among raw metagenomic contigs from diverse human samples.

  • [DOI] A. Tampuu, Z. Bzhalava, J. Dillner, and R. Vicente, “ViraMiner: deep learning on raw DNA sequences for identifying viral genomes in human samples,” bioRxiv, p. 602656, 2019.
    [Bibtex]
    @Article{Tampuu:19,
    author = {Ardi Tampuu and Zurab Bzhalava and Joakim Dillner and Raul Vicente},
    title = {{ViraMiner}: deep learning on raw {DNA} sequences for identifying viral genomes in human samples},
    journal = {{bioRxiv}},
    year = {2019},
    pages = {602656},
    doi = {10.1101/602656},
    publisher = {Cold Spring Harbor Laboratory},
    }
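The difference between a "pattern" and a "pattern-frequency" branch can be seen with a single convolutional filter. The pure-Python sketch below one-hot encodes a contig, slides one filter along it and then pools the activations in two ways: global max pooling (is the motif present anywhere?) and global average pooling (how much of it is there overall?). We assume this max-versus-average split reflects ViraMiner's two branches; the real model learns many filters during training and feeds the pooled values into dense layers, none of which is reproduced here. The hand-set "TATA" filter and all function names are hypothetical.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as 4-element vectors; ambiguous bases become all zeros."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def conv1d(x, kernel):
    """Valid 1D convolution (cross-correlation) of one filter over encoded DNA."""
    width = len(kernel)
    return [
        sum(x[i + j][c] * kernel[j][c] for j in range(width) for c in range(4))
        for i in range(len(x) - width + 1)
    ]


def branch_features(seq, kernel):
    """Return (pattern, frequency) features from one filter.

    pattern   = global max pooling of the ReLU activations
    frequency = global average pooling of the ReLU activations
    """
    activations = [max(0.0, a) for a in conv1d(one_hot(seq), kernel)]
    if not activations:
        raise ValueError("sequence is shorter than the filter")
    return max(activations), sum(activations) / len(activations)


# Hypothetical hand-set filter that scores matches to the motif "TATA"
tata = one_hot("TATA")
pattern, frequency = branch_features("GGTATACCTATAGG", tata)
print(pattern, round(frequency, 3))  # 4.0 1.455
```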
★ VIRULIGN | Fast codon-correct alignment and annotation of viral genomes
VIRULIGN is built for fast codon-correct alignments of large datasets, with standardized and formalized genome annotation and various alignment export formats.

  • [DOI] P. J. K. Libin, K. Deforche, A. B. Abecasis, and K. Theys, “VIRULIGN: fast codon-correct alignment and annotation of viral genomes,” Bioinformatics, 2018.
    [Bibtex]
    @Article{Libin:18,
    author = {Pieter J K Libin and Koen Deforche and Ana B Abecasis and Kristof Theys},
    title = {{VIRULIGN}: fast codon-correct alignment and annotation of viral genomes},
    journal = {Bioinformatics},
    year = {2018},
    doi = {10.1093/bioinformatics/bty851},
    editor = {John Hancock},
    publisher = {Oxford University Press ({OUP})},
    }
Secondary structure prediction
★ LRIscan | Long Range Interaction scan
LRIscan predicts long-range RNA–RNA interactions in full viral genomes from a multiple genome alignment. It can find interactions spanning thousands of nucleotides.

  • [DOI] M. Fricke and M. Marz, “Prediction of conserved long-range RNA-RNA interactions in full viral genomes,” Bioinformatics, vol. 32, iss. 19, p. 2928–2935, 2016.
    [Bibtex]
    @Article{Fricke:16,
    author = {Markus Fricke and Manja Marz},
    title = {Prediction of conserved long-range {RNA}-{RNA} interactions in full viral genomes},
    journal = {Bioinformatics},
    year = {2016},
    volume = {32},
    number = {19},
    pages = {2928--2935},
    doi = {10.1093/bioinformatics/btw323},
    publisher = {Oxford University Press ({OUP})},
    }
★ RNAalifold
RNAalifold calculates the consensus secondary structure of a set of aligned RNA sequences. It is part of the ViennaRNA Package.

  • [DOI] R. Lorenz, S. H. Bernhart, C. Höner zu Siederdissen, H. Tafer, C. Flamm, P. F. Stadler, and I. L. Hofacker, “ViennaRNA package 2.0,” Algorithms Mol Biol, vol. 6, iss. 1, 2011.
    [Bibtex]
    @Article{Lorenz:11,
    author = {Ronny Lorenz and Stephan H Bernhart and Christian H\"{o}ner zu Siederdissen and Hakim Tafer and Christoph Flamm and Peter F Stadler and Ivo L Hofacker},
    title = {{ViennaRNA} Package 2.0},
    journal = {{Algorithms Mol Biol}},
    year = {2011},
    volume = {6},
    number = {1},
    doi = {10.1186/1748-7188-6-26},
    publisher = {Springer Nature},
    }
★ SilentMutations (SIM) | Analyzing long-range RNA–RNA interactions in viral genomes
SilentMutations (SIM) is an easy-to-use tool for analyzing the effect of multiple point mutations on the secondary structures of two interacting viral RNAs. It can simulate disruptive and compensatory mutants of two interacting single-stranded RNAs. This allows a fast and accurate assessment of key regions potentially involved in functional long-range RNA–RNA interactions and can help virologists and RNA experts design appropriate experiments.

  • D. Desirò, M. Hölzer, B. Ibrahim, and M. Marz, “SilentMutations (SIM): a tool for analyzing long-range RNA–RNA interactions in viral genomes and structured RNAs,” Virus Res, vol. 260, p. 135–141, 2019.
    [Bibtex]
    @Article{Desiro:19,
    author = {Desir{\`o}, Daniel and H{\"o}lzer, Martin and Ibrahim, Bashar and Marz, Manja},
    title = {{S}ilent{M}utations ({SIM}): A tool for analyzing long-range {RNA}--{RNA} interactions in viral genomes and structured {RNA}s},
    journal = {{Virus Res}},
    year = {2019},
    volume = {260},
    pages = {135--141},
    publisher = {Elsevier},
    }
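The idea of disruptive versus compensatory mutants can be sketched for a single base pair between two RNAs. A disruptive mutant changes one side so that the pair can no longer form; a compensatory mutant changes both sides so that a different pair forms again. The sketch below enumerates both, treating Watson-Crick and G-U wobble pairs as allowed. It is a hypothetical illustration only: SIM itself evaluates how mutations affect the predicted secondary structures of the interacting RNAs, which this toy does not do.

```python
# Canonical RNA base pairs: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
NUCLEOTIDES = "ACGU"


def can_pair(x: str, y: str) -> bool:
    return (x, y) in PAIRS


def disruptive_mutants(a: str, b: str) -> list[str]:
    """Single substitutions of `a` (first RNA) that can no longer pair with `b` (partner RNA)."""
    return [n for n in NUCLEOTIDES if n != a and not can_pair(n, b)]


def compensatory_mutants(a: str, b: str) -> list[tuple[str, str]]:
    """Double substitutions changing both partners that restore a canonical pair."""
    return [
        (x, y)
        for x in NUCLEOTIDES
        for y in NUCLEOTIDES
        if x != a and y != b and can_pair(x, y)
    ]


if __name__ == "__main__":
    print(disruptive_mutants("G", "C"))    # ['A', 'C', 'U']
    print(compensatory_mutants("G", "C"))  # [('A', 'U'), ('C', 'G'), ('U', 'A'), ('U', 'G')]
```

In practice, candidate mutants would then be refolded together with the partner RNA (for example with a cofolding program) to check that the disruptive mutant really breaks the interaction and the compensatory one restores it.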
Virus genotyping and diagnosis

★ ArboTyping | Identification of Dengue, Zika and Chikungunya virus species and genotypes
ArboTyping is easy-to-use software that classifies virus sequences by species and sub-species (i.e. serotype and/or genotype). The method was validated on a large dataset, assessing classification performance on both whole-genome and partial-genome sequences. It allows high-throughput classification of Dengue, Zika and Chikungunya virus species and genotypes in seconds.

  • [DOI] V. Fonseca, P. J. K. Libin, K. Theys, N. R. Faria, M. R. T. Nunes, M. I. Restovic, M. Freire, M. Giovanetti, L. Cuypers, A. Nowé, A. Abecasis, K. Deforche, G. A. Santiago, I. C. de Siqueira, E. J. San, K. C. B. Machado, V. Azevedo, A. M. Bispo-de Filippis, R. V. da Cunha, O. G. Pybus, A.-M. Vandamme, L. C. J. Alcantara, and T. de Oliveira, “A computational method for the identification of Dengue, Zika and Chikungunya virus species and genotypes,” PLoS Negl Trop Dis, vol. 13, iss. 5, p. e0007231, 2019.
    [Bibtex]
    @Article{Fonseca:19,
    author = {Vagner Fonseca and Pieter J. K. Libin and Kristof Theys and Nuno R. Faria and Marcio R. T. Nunes and Maria I. Restovic and Murilo Freire and Marta Giovanetti and Lize Cuypers and Ann Now{\'{e}} and Ana Abecasis and Koen Deforche and Gilberto A. Santiago and Isadora C. de Siqueira and Emmanuel J. San and Kaliane C. B. Machado and Vasco Azevedo and Ana Maria {Bispo-de Filippis} and Rivaldo Ven{\^{a}}ncio da Cunha and Oliver G. Pybus and Anne-Mieke Vandamme and Luiz C. J. Alcantara and Tulio de Oliveira},
    title = {A computational method for the identification of {D}engue, {Z}ika and {C}hikungunya virus species and genotypes},
    journal = {{PLoS Negl Trop Dis}},
    year = {2019},
    volume = {13},
    number = {5},
    pages = {e0007231},
    doi = {10.1371/journal.pntd.0007231},
    editor = {Isabel Rodriguez-Barraquer},
    publisher = {Public Library of Science ({PLoS})},
    }
ATHLATES | Accurate Typing of Human Leukocyte Antigen Through Exome Sequencing
ATHLATES determines the human leukocyte antigen (HLA) genotypes of individuals from Illumina exome sequencing data.

  • [DOI] C. Liu, X. Yang, B. Duffy, T. Mohanakumar, R. D. Mitra, M. C. Zody, and J. D. Pfeifer, “ATHLATES: accurate typing of human leukocyte antigen through exome sequencing,” Nucleic Acids Res, vol. 41, iss. 14, p. e142, 2013.
    [Bibtex]
    @Article{Liu:13,
    author = {Liu, Chang and Yang, Xiao and Duffy, Brian and Mohanakumar, Thalachallour and Mitra, Robi D and Zody, Michael C and Pfeifer, John D},
    title = {{ATHLATES}: accurate typing of human leukocyte antigen through exome sequencing},
    journal = {{Nucleic Acids Res}},
    year = {2013},
    volume = {41},
    pages = {e142},
    abstract = {Human leukocyte antigen (HLA) typing at the allelic level can in theory be achieved using whole exome sequencing (exome-seq) data with no added cost but has been hindered by its computational challenge. We developed ATHLATES, a program that applies assembly, allele identification and allelic pair inference to short read sequences, and applied it to data from Illumina platforms. In 15 data sets with adequate coverage for HLA-A, -B, -C, -DRB1 and -DQB1 genes, ATHLATES correctly reported 74 out of 75 allelic pairs with an overall concordance rate of 99% compared with conventional typing. This novel approach should be broadly applicable to research and clinical laboratories. },
    doi = {10.1093/nar/gkt481},
    number = {14},
    keywords = {Alleles; Exome; HLA Antigens, classification, genetics; Histocompatibility Testing, methods; Humans; Sequence Analysis, DNA, methods; Software},
    pmid = {23748956},
    }
★ DisCVR | Rapid viral diagnosis from HTS data
DisCVR is a diagnostic tool that detects known human viruses in clinical samples from high-throughput sequencing (HTS) data and lets users validate the results interactively on computers with limited resources.

  • [DOI] M. Maabar, A. J. Davison, M. Vučak, F. Thorburn, P. R. Murcia, R. Gunson, M. Palmarini, and J. Hughes, “DisCVR: rapid viral diagnosis from high-throughput sequencing data,” Virus Evol, vol. 5, iss. 2, p. vez033, 2019.
    [Bibtex]
    @Article{Maabar:19,
    author = {Maabar, Maha and Davison, Andrew J and Vučak, Matej and Thorburn, Fiona and Murcia, Pablo R and Gunson, Rory and Palmarini, Massimo and Hughes, Joseph},
    title = {{DisCVR}: Rapid viral diagnosis from high-throughput sequencing data},
    journal = {{Virus Evol}},
    year = {2019},
    volume = {5},
    pages = {vez033},
    abstract = {High-throughput sequencing (HTS) enables most pathogens in a clinical sample to be detected from a single analysis, thereby providing novel opportunities for diagnosis, surveillance, and epidemiology. However, this powerful technology is difficult to apply in diagnostic laboratories because of its computational and bioinformatic demands. We have developed DisCVR, which detects known human viruses in clinical samples by matching sample k-mers (twenty-two nucleotide sequences) to k-mers from taxonomically labeled viral genomes. DisCVR was validated using published HTS data for eighty-nine clinical samples from adults with upper respiratory tract infections. These samples had been tested for viruses metagenomically and also by real-time polymerase chain reaction assay, which is the standard diagnostic method. DisCVR detected human viruses with high sensitivity (79%) and specificity (100%), and was able to detect mixed infections. Moreover, it produced results comparable to those in a published metagenomic analysis of 177 blood samples from patients in Nigeria. DisCVR has been designed as a user-friendly tool for detecting human viruses from HTS data using computers with limited RAM and processing power, and includes a graphical user interface to help users interpret and validate the output. It is written in Java and is publicly available from http://bioinformatics.cvr.ac.uk/discvr.php.},
    doi = {10.1093/ve/vez033},
    number = {2},
    keywords = {diagnosis; high-throughput sequencing; k-mer; virus},
    pmid = {31528358},
    }
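The abstract describes DisCVR's core step as matching sample k-mers (k = 22) against k-mers from taxonomically labeled viral genomes. A minimal sketch of that idea follows. Counting only k-mers that occur in a single reference (taxon-specific k-mers) is an assumption of this sketch, and the toy example uses k = 4 and made-up sequences so that the result is easy to check by hand.

```python
from collections import Counter
from typing import Iterator

VALID = set("ACGT")


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield all k-mers of `seq` that contain only A, C, G or T."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i : i + k]
        if set(kmer) <= VALID:
            yield kmer


def build_index(references: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each reference k-mer to the set of taxa whose genome contains it."""
    index: dict[str, set[str]] = {}
    for taxon, genome in references.items():
        for kmer in kmers(genome, k):
            index.setdefault(kmer, set()).add(taxon)
    return index


def count_taxon_hits(reads: list[str], index: dict[str, set[str]], k: int) -> Counter:
    """Count sample k-mers that match exactly one taxon (taxon-specific k-mers)."""
    hits: Counter = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            taxa = index.get(kmer)
            if taxa is not None and len(taxa) == 1:
                hits[next(iter(taxa))] += 1
    return hits


if __name__ == "__main__":
    refs = {"virusA": "ACGTACGGT", "virusB": "TTGCATTGCA"}  # hypothetical genomes
    index = build_index(refs, k=4)
    print(count_taxon_hits(["ACGTACG", "GCATT"], index, k=4))
    # Counter({'virusA': 4, 'virusB': 2})
```

A longer k (such as 22) makes chance matches between unrelated genomes far less likely, at the cost of a larger index.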
★ ERVcaller | Identifying polymorphic endogenous retrovirus and other transposable element insertions
ERVcaller detects and genotypes transposable element insertions, including endogenous retroviruses (ERVs), in the human genome from whole-genome sequencing data.

  • [DOI] X. Chen and D. Li, “ERVcaller: identifying polymorphic endogenous retrovirus and other transposable element insertions using whole-genome sequencing data,” Bioinformatics, 2019.
    [Bibtex]
    @Article{Chen:19,
    author = {Xun Chen and Dawei Li},
    title = {{ERVcaller}: Identifying polymorphic endogenous retrovirus and other transposable element insertions using whole-genome sequencing data},
    journal = {Bioinformatics},
    year = {2019},
    doi = {10.1093/bioinformatics/btz205},
    editor = {Inanc Birol},
    publisher = {Oxford University Press ({OUP})},
    }
★ geno2pheno | Genotypic interpretation system for identifying viral drug resistance using NGS data

geno2pheno[ngs-freq] is a web service for rapidly identifying drug resistance in HIV-1 and HCV samples. It works on frequency files that provide the read counts of nucleotides or codons along a viral genome, which eliminates the time-intensive step of processing raw NGS data. geno2pheno[ngs-freq] can assist clinical decision-making by letting users explore resistance in viral subpopulations of different abundances.

  • [DOI] M. Döring, J. Büch, G. Friedrich, A. Pironti, P. Kalaghatgi, E. Knops, E. Heger, M. Obermeier, M. Däumer, A. Thielen, R. Kaiser, T. Lengauer, and N. Pfeifer, “geno2pheno[ngs-freq]: a genotypic interpretation system for identifying viral drug resistance using next-generation sequencing data,” Nucleic Acids Res, vol. 46, iss. W1, p. W271–W277, 2018.
    [Bibtex]
    @Article{Doering:18,
    author = {Döring, Matthias and Büch, Joachim and Friedrich, Georg and Pironti, Alejandro and Kalaghatgi, Prabhav and Knops, Elena and Heger, Eva and Obermeier, Martin and Däumer, Martin and Thielen, Alexander and Kaiser, Rolf and Lengauer, Thomas and Pfeifer, Nico},
    title = {{geno2pheno[ngs-freq]}: a genotypic interpretation system for identifying viral drug resistance using next-generation sequencing data},
    journal = {{Nucleic Acids Res}},
    year = {2018},
    volume = {46},
    pages = {W271--W277},
    abstract = {Identifying resistance to antiretroviral drugs is crucial for ensuring the successful treatment of patients infected with viruses such as human immunodeficiency virus (HIV) or hepatitis C virus (HCV). In contrast to Sanger sequencing, next-generation sequencing (NGS) can detect resistance mutations in minority populations. Thus, genotypic resistance testing based on NGS data can offer novel, treatment-relevant insights. Since existing web services for analyzing resistance in NGS samples are subject to long processing times and follow strictly rules-based approaches, we developed geno2pheno[ngs-freq], a web service for rapidly identifying drug resistance in HIV-1 and HCV samples. By relying on frequency files that provide the read counts of nucleotides or codons along a viral genome, the time-intensive step of processing raw NGS data is eliminated. Once a frequency file has been uploaded, consensus sequences are generated for a set of user-defined prevalence cutoffs, such that the constructed sequences contain only those nucleotides whose codon prevalence exceeds a given cutoff. After locally aligning the sequences to a set of references, resistance is predicted using the well-established approaches of geno2pheno[resistance] and geno2pheno[hcv]. geno2pheno[ngs-freq] can assist clinical decision making by enabling users to explore resistance in viral populations with different abundances and is freely available at http://ngs.geno2pheno.org.},
    doi = {10.1093/nar/gky349},
    number = {W1},
    pmid = {29718426},
    }
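According to the abstract, geno2pheno[ngs-freq] builds consensus sequences from a frequency file at several prevalence cutoffs, keeping only variants whose prevalence exceeds the cutoff. The sketch below shows the idea at the nucleotide level, encoding positions with more than one retained nucleotide as IUPAC ambiguity codes. Note the simplification: the service applies the cutoff to codon prevalence, and its actual input format is not reproduced here; the per-position count dictionaries are a hypothetical stand-in.

```python
# IUPAC nucleotide ambiguity codes keyed by the set of nucleotides they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def consensus(counts_per_position: list[dict[str, int]], cutoff: float) -> str:
    """Keep every nucleotide whose relative frequency at a position exceeds `cutoff`.
    Uncovered positions, or sets without an IUPAC code, become N."""
    out = []
    for counts in counts_per_position:
        total = sum(counts.values())
        kept = frozenset(
            nt for nt, n in counts.items() if total > 0 and n / total > cutoff
        )
        out.append(IUPAC.get(kept, "N"))
    return "".join(out)


if __name__ == "__main__":
    freqs = [{"A": 90, "G": 10}, {"C": 50, "T": 50}, {"G": 100}]
    for cutoff in (0.05, 0.2):
        print(cutoff, consensus(freqs, cutoff))
    # 0.05 RYG
    # 0.2 AYG
```

Lowering the cutoff lets minority variants into the consensus, which is how resistance mutations present in only part of the viral population can become visible to a downstream interpretation step.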
★ Purple | Computational Workflow for Strategic Selection of Peptides for Viral Diagnostics Using MS-Based Targeted Proteomics

To detect viral pathogens in time-critical scenarios, accurate and fast diagnostic assays are required. Such assays can now be established using mass spectrometry-based targeted proteomics, by which viral proteins can be rapidly detected in complex samples down to the strain level with high sensitivity and reproducibility. Purple (Picking unique relevant peptides for viral experiments) is a software tool for selecting target-specific peptide candidates directly from given proteome sequence data. It comes with an intuitive graphical user interface, various parameter options and a threshold-based filtering strategy for homologous sequences. Purple enables peptide candidate selection across various taxonomic levels and filtering against backgrounds of varying complexity.

  • [DOI] J. Lechner, F. Hartkopf, P. Hiort, A. Nitsche, M. Grossegesse, J. Doellinger, B. Y. Renard, and T. Muth, “Purple: a computational workflow for strategic selection of peptides for viral diagnostics using MS-based targeted proteomics,” Viruses, vol. 11, iss. 6, 2019.
    [Bibtex]
    @Article{Lechner:19,
    author = {Lechner, Johanna and Hartkopf, Felix and Hiort, Pauline and Nitsche, Andreas and Grossegesse, Marica and Doellinger, Joerg and Renard, Bernhard Y and Muth, Thilo},
    title = {{Purple}: A Computational Workflow for Strategic Selection of Peptides for Viral Diagnostics Using {MS}-Based Targeted Proteomics},
    journal = {Viruses},
    year = {2019},
    volume = {11},
    abstract = {Emerging virus diseases present a global threat to public health. To detect viral pathogens in time-critical scenarios, accurate and fast diagnostic assays are required. Such assays can now be established using mass spectrometry-based targeted proteomics, by which viral proteins can be rapidly detected from complex samples down to the strain-level with high sensitivity and reproducibility. Developing such targeted assays involves tedious steps of peptide candidate selection, peptide synthesis, and assay optimization. Peptide selection requires extensive preprocessing by comparing candidate peptides against a large search space of background proteins. Here we present Purple (Picking unique relevant peptides for viral experiments), a software tool for selecting target-specific peptide candidates directly from given proteome sequence data. It comes with an intuitive graphical user interface, various parameter options and a threshold-based filtering strategy for homologous sequences. Purple enables peptide candidate selection across various taxonomic levels and filtering against backgrounds of varying complexity. Its functionality is demonstrated using data from different virus species and strains. Our software enables to build taxon-specific targeted assays and paves the way to time-efficient and robust viral diagnostics using targeted proteomics.},
    doi = {10.3390/v11060536},
    number = {6},
    keywords = {data analysis; mass spectrometry; parallel reaction monitoring; peptide selection; targeted proteomics; virus diagnostics; virus proteomics},
    pmid = {31181768},
    }
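Selecting target-specific peptides starts with an in silico digest of the target and background proteomes. The sketch below uses the common trypsin rule (cleave after K or R unless the next residue is P) and keeps target peptides that never occur in the background digest. This exact-match filter is a deliberate simplification of Purple's threshold-based filtering of homologous sequences; the length limits and function names are hypothetical.

```python
def tryptic_digest(protein: str, min_len: int = 6, max_len: int = 25) -> list[str]:
    """Cleave C-terminal to K/R except before P; keep peptides within the length limits."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start : i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return [p for p in peptides if min_len <= len(p) <= max_len]


def unique_peptides(
    targets: dict[str, str], background: list[str], min_len: int = 6, max_len: int = 25
) -> dict[str, list[str]]:
    """For each target protein, keep peptides absent from the background digest."""
    background_peptides: set[str] = set()
    for protein in background:
        background_peptides.update(tryptic_digest(protein, min_len, max_len))
    return {
        name: [p for p in tryptic_digest(seq, min_len, max_len) if p not in background_peptides]
        for name, seq in targets.items()
    }


if __name__ == "__main__":
    # Hypothetical toy sequences; short length limit so the result is easy to inspect.
    print(unique_peptides({"target": "AAAAKPEEEERGGGGKLLLLR"}, ["TTTTRGGGGK"], min_len=4))
    # {'target': ['AAAAKPEEEER', 'LLLLR']}
```

Because the background set only needs membership tests, a Python `set` keeps each lookup cheap even for large background proteomes.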
★ RotaC2.0
RotaC2.0 is a web-based tool for fast genotyping of all 11 gene segments of group A rotaviruses according to the classification guidelines proposed by the Rotavirus Classification Working Group (RCWG).

  • [DOI] P. Maes, J. Matthijnssens, M. Rahman, and M. Van Ranst, “RotaC: a web-based tool for the complete genome classification of group A rotaviruses,” BMC Microbiology, vol. 9, iss. 1, p. 238, 2009.
    [Bibtex]
    @Article{Maes:09,
    author = {Piet Maes and Jelle Matthijnssens and Mustafizur Rahman and Marc {Van Ranst}},
    title = {{RotaC}: A web-based tool for the complete genome classification of group {A} rotaviruses},
    journal = {{BMC Microbiology}},
    year = {2009},
    volume = {9},
    number = {1},
    pages = {238},
    doi = {10.1186/1471-2180-9-238},
    publisher = {Springer Nature},
    }
★ TaxIt | Automated computational pipeline for untargeted strain-level identification using MS/MS spectra
TaxIt is an iterative workflow for untargeted, accurate strain-level classification of a priori unidentified organisms using tandem mass spectrometry (MS/MS). It addresses the growing search space required for MS/MS-based strain-level classification of samples of unknown taxonomic origin. TaxIt first uses reference sequence data to identify species candidates and then automatically retrieves the relevant strain sequences for finer-grained classification. Proteome similarities that lead to ambiguous taxonomic assignments are handled with an abundance-weighting strategy to improve candidate confidence. TaxIt makes extensive use of public, unrestricted and continuously growing sequence resources such as the NCBI databases.

  • [DOI] M. Kuhring, J. Doellinger, A. Nitsche, T. Muth, and B. Y. Renard, “An iterative and automated computational pipeline for untargeted strain-level identification using MS/MS spectra from pathogenic samples,” bioRxiv, 2019.
    [Bibtex]
    @Article{Kuhring:19,
    author = {Mathias Kuhring and Joerg Doellinger and Andreas Nitsche and Thilo Muth and Bernhard Y. Renard},
    title = {An iterative and automated computational pipeline for untargeted strain-level identification using {MS}/{MS} spectra from pathogenic samples},
    journal = {{bioRxiv}},
    year = {2019},
    doi = {10.1101/812313},
    publisher = {Cold Spring Harbor Laboratory},
    }
★ ViPR | Rotavirus A Genotype Determination
Rotavirus A Genotype Determination from ViPR is an annotation pipeline for genotyping rotavirus A. It is an optimized reimplementation of RotaC2.0.
★ V-Pipe | Mining viral genomes to improve clinical diagnostics
V-Pipe is an end-to-end pipeline for mining viral genomes to improve clinical diagnostics. It integrates various computational tools for the analysis of viral high-throughput sequencing data and supports the reproducible analysis of genomic diversity in intra-host virus populations, which is often involved in viral pathogenesis and virulence. V-Pipe takes as input read data from a viral sequencing experiment and produces, in a single run of the pipeline, output files covering quality control, read alignment, and inference of viral genomic diversity at the level of both single-nucleotide variants and viral haplotypes.

  • [DOI] L. A. Carlisle, T. Turk, K. Kusejko, K. J. Metzner, C. Leemann, C. Schenkel, N. Bachmann, S. Posada, N. Beerenwinkel, J. Böni, S. Yerly, T. Klimkait, M. Perreau, D. L. Braun, A. Rauch, A. Calmy, M. Cavassini, M. Battegay, P. Vernazza, E. Bernasconi, H. F. Günthard, R. D. Kouyos, and Swiss HIV Cohort Study, “Viral diversity from next-generation sequencing of HIV-1 samples provides precise estimates of infection recency and time since infection,” J Infect Dis, 2019.
    [Bibtex]
    @Article{Carlisle:19,
    author = {Carlisle, Louisa A and Turk, Teja and Kusejko, Katharina and Metzner, Karin J and Leemann, Christine and Schenkel, Corinne and Bachmann, Nadine and Posada, Susana and Beerenwinkel, Niko and Böni, Jürg and Yerly, Sabine and Klimkait, Thomas and Perreau, Matthieu and Braun, Dominique L and Rauch, Andri and Calmy, Alexandra and Cavassini, Matthias and Battegay, Manuel and Vernazza, Pietro and Bernasconi, Enos and Günthard, Huldrych F and Kouyos, Roger D and {Swiss HIV Cohort Study}},
    title = {Viral diversity from next-generation sequencing of {HIV}-1 samples provides precise estimates of infection recency and time since infection},
    journal = {{J Infect Dis}},
    year = {2019},
    abstract = {HIV-1 genetic diversity increases over the course of infection, and can be used to infer time since infection (TSI) and consequently also infection recency, crucial quantities for HIV-1 surveillance and the understanding of viral pathogenesis. We considered 313 HIV-infected individuals for whom reliable estimates of infection dates and next-generation sequencing (NGS)-derived nucleotide frequency data were available. Fraction of ambiguous nucleotides (FAN) obtained by population sequencing were available for 207 samples. We assessed whether average pairwise diversity (APD) calculated using NGS sequences provided a more exact prediction of TSI and classification of infection recency (<1 year post-infection) compared to FAN. NGS-derived APD classifies an infection as recent with a sensitivity of 88% and specificity of 85%. When considering only the 207 samples for which FAN were available, NGS-derived APD exhibited a higher sensitivity (90% vs 78%) and specificity (95% vs 67%) than FAN. Additionally, APD can estimate TSI with a mean absolute error of 0.84 years, compared to 1.03 years for FAN.},
    doi = {10.1093/infdis/jiz094},
    keywords = {HIV-1; diversity; infection recency; next-generation sequencing; time since infection},
    pmid = {30835266},
    }
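The cited V-Pipe application estimates time since infection from average pairwise diversity (APD) computed on NGS nucleotide frequencies. One common way to define per-position diversity is the probability that two randomly drawn reads carry different nucleotides, 1 - Σ p_i², with APD as its mean over positions. The sketch below implements that definition; the study's exact procedure (for example, which codon positions or frequency thresholds are used) may differ, so treat it as an illustration only.

```python
def position_diversity(counts: dict[str, int]) -> float:
    """Probability that two reads drawn with replacement differ at this position."""
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def average_pairwise_diversity(counts_per_position: list[dict[str, int]]) -> float:
    """Mean per-position diversity over positions that have read coverage."""
    covered = [c for c in counts_per_position if sum(c.values()) > 0]
    if not covered:
        return 0.0
    return sum(position_diversity(c) for c in covered) / len(covered)


if __name__ == "__main__":
    freqs = [{"A": 50, "G": 50}, {"C": 100}, {}]  # hypothetical nucleotide counts
    print(average_pairwise_diversity(freqs))  # 0.25
```

Since HIV-1 diversity accumulates over the course of infection, a higher APD points to an older infection; turning APD into a time estimate requires the calibration described in the study.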
Phylogenetic and phylodynamic inference
★ AdaPatch
AdaPatch searches for dense, spatially distinct clusters (patches) of sites under positive selection on protein surfaces. It is intended for protein structures of viruses whose adaptive behavior is not yet known, where it could identify further candidate regions that are important for host-virus interaction. The tool uses a graph-cut algorithm to combine sites with large dN/dS values into patches.

  • [DOI] C. Tusche, L. Steinbrück, and A. C. McHardy, “Detecting patches of protein sites of influenza A viruses under positive selection,” Mol Biol Evol, vol. 29, iss. 8, p. 2063–2071, 2012.
    [Bibtex]
    @Article{Tusche:12,
    author = {Tusche, Christina and Steinbrück, Lars and McHardy, Alice C},
    title = {Detecting patches of protein sites of influenza {A} viruses under positive selection},
    journal = {{Mol Biol Evol}},
    year = {2012},
    volume = {29},
    pages = {2063--2071},
    abstract = {Influenza A viruses are single-stranded RNA viruses capable of evolving rapidly to adapt to environmental conditions. Examples include the establishment of a virus in a novel host or an adaptation to increasing immunity within the host population due to prior infection or vaccination against a circulating strain. Knowledge of the viral protein regions under positive selection is therefore crucial for surveillance. We have developed a method for detecting positively selected patches of sites on the surface of viral proteins, which we assume to be relevant for adaptive evolution. We measure positive selection based on dN/dS ratios of genetic changes inferred by considering the phylogenetic structure of the data and suggest a graph-cut algorithm to identify such regions. Our algorithm searches for dense and spatially distinct clusters of sites under positive selection on the protein surface. For the hemagglutinin protein of human influenza A viruses of the subtypes H3N2 and H1N1, our predicted sites significantly overlap with known antigenic and receptor-binding sites. From the structure and sequence data of the 2009 swine-origin influenza A/H1N1 hemagglutinin and PB2 protein, we identified regions that provide evidence of evolution under positive selection since introduction of the virus into the human population. The changes in PB2 overlap with sites reported to be associated with mammalian adaptation of the influenza A virus. Application of our technique to the protein structures of viruses of yet unknown adaptive behavior could identify further candidate regions that are important for host-virus interaction.},
    doi = {10.1093/molbev/mss095},
    number = {8},
    keywords = {Animals; Antibody Affinity, immunology; Databases, Protein; Epitopes, immunology; Hemagglutinin Glycoproteins, Influenza Virus, genetics; Humans; Influenza A virus, genetics; Models, Molecular; Selection, Genetic; Swine; Templates, Genetic; Viral Proteins, chemistry},
    pmid = {22427709},
    }
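AdaPatch groups positively selected surface sites into spatial patches with a graph-cut algorithm. As a much simpler stand-in that shows the underlying graph idea, the sketch below links sites with dN/dS > 1 whose 3D coordinates lie within a distance cutoff and reports connected components of at least a minimum size as patches. This is not AdaPatch's graph-cut formulation; the coordinates, the 8 Å default cutoff and the function names are hypothetical.

```python
import math
from collections import deque

Site = tuple[tuple[float, float, float], float]  # (3D coordinate in Å, dN/dS)


def find_patches(
    sites: dict[int, Site], max_dist: float = 8.0, min_dnds: float = 1.0, min_size: int = 2
) -> list[list[int]]:
    """Connected components of positively selected sites within `max_dist` of each other."""
    selected = [s for s, (_, dnds) in sites.items() if dnds > min_dnds]
    unvisited = set(selected)
    patches = []
    for seed in selected:
        if seed not in unvisited:
            continue
        unvisited.discard(seed)
        queue, patch = deque([seed]), []
        while queue:
            current = queue.popleft()
            patch.append(current)
            # Add every still-unvisited selected site close enough to `current`.
            for other in list(unvisited):
                if math.dist(sites[current][0], sites[other][0]) <= max_dist:
                    unvisited.discard(other)
                    queue.append(other)
        if len(patch) >= min_size:
            patches.append(sorted(patch))
    return sorted(patches)


if __name__ == "__main__":
    toy_sites = {  # hypothetical residue numbers, coordinates and dN/dS values
        1: ((0.0, 0.0, 0.0), 2.0),
        2: ((3.0, 0.0, 0.0), 1.5),
        3: ((30.0, 0.0, 0.0), 3.0),
        4: ((1.0, 1.0, 0.0), 0.5),
        5: ((31.0, 0.0, 0.0), 1.2),
    }
    print(find_patches(toy_sites, max_dist=5.0))  # [[1, 2], [3, 5]]
```

Site 4 lies next to sites 1 and 2 but is excluded because its dN/dS is below 1; a graph-cut formulation can instead trade off selection strength against spatial compactness rather than using a hard threshold.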
★ AntiPatch
AntiPatch is software for inferring antigenicity-altering patches of sites on a protein structure.

  • [DOI] C. Kratsch, T. R. Klingen, L. Mümken, L. Steinbrück, and A. C. McHardy, “Determination of antigenicity-altering patches on the major surface protein of human influenza A/H3N2 viruses,” Virus Evol, vol. 2, iss. 1, p. vev025, 2016.
    [Bibtex]
    @Article{Kratsch:16,
    author = {Christina Kratsch and Thorsten R. Klingen and Linda Mümken and Lars Steinbrück and Alice C. McHardy},
    title = {Determination of antigenicity-altering patches on the major surface protein of human influenza {A/H3N2} viruses},
    journal = {{Virus Evol}},
    year = {2016},
    volume = {2},
    number = {1},
    pages = {vev025},
    doi = {10.1093/ve/vev025},
    publisher = {Oxford University Press ({OUP})},
    }
★ Antigenic Tree Inference
The AntigenicTree program infers a phylogenetic tree from virus sequences and assigns antigenic distances to the reconstructed amino acid changes on internal branches. For sufficiently resolved branches, this allows the antigenic impact of single amino acid changes to be quantified. The software is also applicable to other problems for which a phylogenetic tree and pairwise phenotypic distances are available.

  • [DOI] L. Steinbrück and A. C. McHardy, “Inference of genotype–phenotype relationships in the antigenic evolution of human influenza A (H3N2) viruses,” PLoS Comput Biol, vol. 8, iss. 4, p. e1002492, 2012.
    [Bibtex]
    @Article{Steinbrueck:12,
    author = {Lars Steinbrück and Alice Carolyn McHardy},
    title = {Inference of Genotype{\textendash}Phenotype Relationships in the Antigenic Evolution of Human Influenza {A} ({H3N2}) Viruses},
    journal = {{PLoS Comput Biol}},
    year = {2012},
    volume = {8},
    number = {4},
    pages = {e1002492},
    doi = {10.1371/journal.pcbi.1002492},
    editor = {Neil Ferguson},
    publisher = {Public Library of Science ({PLoS})},
    }
★ BEAST 1.10 | Bayesian phylogenetic and phylodynamic data integration

Bayesian Evolutionary Analysis by Sampling Trees (BEAST) is a primary tool for Bayesian phylogenetic and phylodynamic inference from genetic sequence data. BEAST unifies molecular phylogenetic reconstruction with complex discrete and continuous trait evolution, divergence-time dating, and coalescent demographic models in an efficient statistical inference engine using Markov chain Monte Carlo integration. BEAST 1.10 focuses on delivering accurate and informative insights for infectious disease research through the integration of diverse data sources, including phenotypic and epidemiological information, with molecular evolutionary models.

  • M. A. Suchard, P. Lemey, G. Baele, D. L. Ayres, A. J. Drummond, and A. Rambaut, “Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10,” Virus Evol, vol. 4, iss. 1, p. vey016, 2018.
    [Bibtex]
    @Article{Suchard:18,
    author = {Suchard, Marc A and Lemey, Philippe and Baele, Guy and Ayres, Daniel L and Drummond, Alexei J and Rambaut, Andrew},
    title = {Bayesian phylogenetic and phylodynamic data integration using {BEAST} 1.10},
    journal = {{Virus Evol}},
    year = {2018},
    volume = {4},
    number = {1},
    pages = {vey016},
    publisher = {Oxford University Press},
    }
BEAST 2.5 | Advanced software platform for Bayesian evolutionary analysis

Some evolutionary questions can only be adequately answered by combining evidence from multiple independent sources of data, including genome sequences, sampling dates, phenotypic data, radiocarbon dates, fossil occurrences and biogeographic range information, among others. Including all relevant data in a single joint model is very challenging, both conceptually and computationally. BEAST 2 is a software platform that allows robust development of compatible (sub-)models, which can be composed into a full model hierarchy.

  • [DOI] R. Bouckaert, T. G. Vaughan, J. Barido-Sottani, S. Duchêne, M. Fourment, A. Gavryushkina, J. Heled, G. Jones, D. Kühnert, N. De Maio, M. Matschiner, F. K. Mendes, N. F. Müller, H. A. Ogilvie, L. du Plessis, A. Popinga, A. Rambaut, D. Rasmussen, I. Siveroni, M. A. Suchard, C. Wu, D. Xie, C. Zhang, T. Stadler, and A. J. Drummond, “BEAST 2.5: an advanced software platform for Bayesian evolutionary analysis,” PLOS Comput Biol, vol. 15, iss. 4, p. e1006650, 2019.
    [Bibtex]
    @Article{Bouckaert:19,
    author = {Remco Bouckaert and Timothy G. Vaughan and Joëlle Barido-Sottani and Sebasti{\'{a}}n Duch{\^{e}}ne and Mathieu Fourment and Alexandra Gavryushkina and Joseph Heled and Graham Jones and Denise Kühnert and Nicola {De Maio} and Michael Matschiner and F{\'{a}}bio K. Mendes and Nicola F. Müller and Huw A. Ogilvie and Louis du Plessis and Alex Popinga and Andrew Rambaut and David Rasmussen and Igor Siveroni and Marc A. Suchard and Chieh-Hsi Wu and Dong Xie and Chi Zhang and Tanja Stadler and Alexei J. Drummond},
    title = {{BEAST} 2.5: An advanced software platform for {B}ayesian evolutionary analysis},
    journal = {{PLOS Comput Biol}},
    year = {2019},
    volume = {15},
    number = {4},
    pages = {e1006650},
    doi = {10.1371/journal.pcbi.1006650},
    editor = {Mihaela Pertea},
    publisher = {Public Library of Science ({PLoS})},
    }
★ EPA-ng | Evolutionary Placement Algorithm for Next Generation Sequencing
EPA-ng is a phylogenetic placement method that determines how sequences obtained from diverse microbial environments fit into an evolutionary context. It is designed for massively parallel placement of genetic sequences.

  • [DOI] P. Barbera, A. M. Kozlov, L. Czech, B. Morel, D. Darriba, T. Flouri, and A. Stamatakis, “EPA-ng: massively parallel evolutionary placement of genetic sequences,” Syst Biol, vol. 68, iss. 2, p. 365–369, 2018.
    [Bibtex]
    @Article{Barbera:18,
    author = {Pierre Barbera and Alexey M Kozlov and Lucas Czech and Benoit Morel and Diego Darriba and Tom{\'{a}}{\v{s}} Flouri and Alexandros Stamatakis},
    title = {{EPA}-ng: Massively Parallel Evolutionary Placement of Genetic Sequences},
    journal = {{Syst Biol}},
    year = {2018},
    volume = {68},
    number = {2},
    pages = {365--369},
    doi = {10.1093/sysbio/syy054},
    editor = {David Posada},
    publisher = {Oxford University Press ({OUP})},
    }
★ Fréchet tree distance measure

The Fréchet tree distance measure compares phylogeographies across different trees inferred from the same taxa. Phylogeographic methods reconstruct the origin and spread of taxa by inferring locations for the internal nodes of a phylogenetic tree from the sampling locations of genetic sequences. This is commonly applied to study pathogen outbreaks and their spread.

  • [DOI] S. Reimering, S. Muñoz, and A. C. McHardy, “A Fréchet tree distance measure to compare phylogeographic spread paths across trees,” Sci Rep, vol. 8, iss. 1, 2018.
    [Bibtex]
    @Article{Reimering:18,
    author = {Susanne Reimering and Sebastian Mu{\~{n}}oz and Alice C. McHardy},
    title = {A {F}r{\'{e}}chet tree distance measure to compare phylogeographic spread paths across trees},
    journal = {{Sci Rep}},
    year = {2018},
    volume = {8},
    number = {1},
    doi = {10.1038/s41598-018-35421-4},
    publisher = {Springer Nature},
    }
★ SANTA-SIM | Simulating viral sequence evolution dynamics under selection and recombination

SANTA-SIM is a software package to simulate the evolution of a population of gene sequences forwards through time. It models the underlying biological processes as discrete components: replication, recombination, point mutations, insertion-deletions, and selection under various fitness models and population size dynamics. The software is designed to be intuitive to work with for a wide range of users and executable in a cross-platform manner.
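The core loop of a forward-in-time simulation like this can be sketched in a few lines of Python. The example below is a minimal Wright-Fisher-style toy, not SANTA-SIM's interface or model, and it rests on explicit assumptions:

- the population size is constant;
- point mutations occur independently at each site with rate `mu`;
- fitness follows a hypothetical purifying-selection function in which every difference from the starting sequence multiplies fitness by `(1 - s)`.

Recombination, insertion-deletions and changing population sizes, all of which SANTA-SIM models, are left out. All names are illustrative.

```python
import random
from typing import List

ALPHABET = "ACGT"


def mutate(seq: str, mu: float, rng: random.Random) -> str:
    """Apply independent per-site point mutations with probability mu."""
    out = []
    for base in seq:
        if rng.random() < mu:
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)
    return "".join(out)


def fitness(seq: str, reference: str, s: float) -> float:
    """Hypothetical model: each difference from the reference costs a factor (1 - s)."""
    diffs = sum(a != b for a, b in zip(seq, reference))
    return (1.0 - s) ** diffs


def simulate(reference: str, pop_size: int, generations: int,
             mu: float, s: float, seed: int = 0) -> List[str]:
    if not 0.0 <= s < 1.0:
        raise ValueError("s must be in [0, 1) so that every fitness stays positive")
    rng = random.Random(seed)
    population = [reference] * pop_size
    for _ in range(generations):
        weights = [fitness(x, reference, s) for x in population]
        # Replication: sample parents in proportion to fitness.
        parents = rng.choices(population, weights=weights, k=pop_size)
        # Point mutations arise in the offspring.
        population = [mutate(p, mu, rng) for p in parents]
    return population


final = simulate("ACGTACGTAC", pop_size=50, generations=20, mu=0.01, s=0.1, seed=1)
print(len(final))  # 50
```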

  • [DOI] A. Jariani, C. Warth, K. Deforche, P. Libin, A. J. Drummond, A. Rambaut, F. A. Matsen IV, and K. Theys, “SANTA-SIM: simulating viral sequence evolution dynamics under selection and recombination,” Virus Evol, vol. 5, iss. 1, 2019.
    [Bibtex]
    @Article{Jariani:19,
    author = {Abbas Jariani and Christopher Warth and Koen Deforche and Pieter Libin and Alexei J Drummond and Andrew Rambaut and Frederick A {Matsen IV} and Kristof Theys},
    title = {{SANTA}-{SIM}: simulating viral sequence evolution dynamics under selection and recombination},
    journal = {{Virus Evol}},
    year = {2019},
    volume = {5},
    number = {1},
    doi = {10.1093/ve/vez003},
    publisher = {Oxford University Press ({OUP})},
    }
★ Sweep Dynamics (SD) plots | Computational identification of selective sweeps

Sweep Dynamics (SD) plots are a computational method that combines phylogenetic algorithms with statistical techniques to characterize the molecular adaptation of rapidly evolving viruses from longitudinal sequence data. SD plots facilitate the identification of selective sweeps, the time periods in which they occurred, and the associated changes that gave the virus a selective advantage.

  • T. R. Klingen, S. Reimering, J. Loers, K. Mooren, F. Klawonn, T. Krey, G. Gabriel, and A. C. McHardy, “Sweep dynamics (SD) plots: computational identification of selective sweeps to monitor the adaptation of influenza A viruses,” Sci Rep, vol. 8, iss. 1, p. 373, 2018.
    [Bibtex]
    @Article{Klingen:18a,
    author = {Klingen, Thorsten R and Reimering, Susanne and Loers, Jens and Mooren, Kyra and Klawonn, Frank and Krey, Thomas and Gabriel, G{\"u}lsah and McHardy, Alice C},
    title = {Sweep Dynamics ({SD}) plots: Computational identification of selective sweeps to monitor the adaptation of influenza {A} viruses},
    journal = {{Sci Rep}},
    year = {2018},
    volume = {8},
    number = {1},
    pages = {373},
    publisher = {Nature Publishing Group},
    }
★ TaxIt | Automated computational pipeline for untargeted strain-level identification using MS/MS spectra
TaxIt is an iterative workflow for untargeted, accurate strain-level classification of a priori unidentified organisms using tandem mass spectrometry (MS/MS). It addresses the growing search space required for MS/MS-based strain-level classification of samples of unknown taxonomic origin. TaxIt first uses reference sequence data to identify species candidates, then automatically acquires the relevant strain sequences for lower-level classification. Proteome similarities that lead to ambiguous taxonomic assignments are addressed with an abundance weighting strategy to improve candidate confidence. TaxIt makes extensive use of public, unrestricted and continuously growing sequence resources such as the NCBI databases.

  • [DOI] M. Kuhring, J. Doellinger, A. Nitsche, T. Muth, and B. Y. Renard, “An iterative and automated computational pipeline for untargeted strain-level identification using MS/MS spectra from pathogenic samples,” bioRxiv, 2019.
    [Bibtex]
    @Article{Kuhring:19,
    author = {Mathias Kuhring and Joerg Doellinger and Andreas Nitsche and Thilo Muth and Bernhard Y. Renard},
    title = {An iterative and automated computational pipeline for untargeted strain-level identification using {MS}/{MS} spectra from pathogenic samples},
    journal = {{bioRxiv}},
    year = {2019},
    doi = {10.1101/812313},
    publisher = {Cold Spring Harbor Laboratory},
    }
VICTOR | Virus Classification and Tree Building Online Resource
VICTOR is a web service that compares bacterial and archaeal viruses using their genome or proteome sequences. Its outputs include phylogenomic trees inferred with the Genome-BLAST Distance Phylogeny (GBDP) method, with branch support, as well as suggested classifications at the species, genus, subfamily and family level.

  • [DOI] J. P. Meier-Kolthoff and M. Göker, “VICTOR: genome-based phylogeny and classification of prokaryotic viruses,” Bioinformatics, vol. 33, iss. 21, p. 3396–3404, 2017.
    [Bibtex]
    @Article{Meier-Kolthoff:17,
    author = {Meier-Kolthoff, Jan P and G\"{o}ker, Markus},
    title = {{VICTOR}: genome-based phylogeny and classification of prokaryotic viruses},
    journal = {{Bioinformatics}},
    year = {2017},
    volume = {33},
    pages = {3396--3404},
    abstract = {Bacterial and archaeal viruses are crucial for global biogeochemical cycles and might well be game-changing therapeutic agents in the fight against multi-resistant pathogens. Nevertheless, it is still unclear how to best use genome sequence data for a fast, universal and accurate taxonomic classification of such viruses. We here present a novel in silico framework for phylogeny and classification of prokaryotic viruses, in line with the principles of phylogenetic systematics, and using a large reference dataset of officially classified viruses. The resulting trees revealed a high agreement with the classification. Except for low resolution at the family level, the majority of taxa was well supported as monophyletic. Clusters obtained with distance thresholds chosen for maximizing taxonomic agreement appeared phylogenetically reasonable, too. Analysis of an expanded dataset, containing >4000 genomes from public databases, revealed a large number of novel species, genera, subfamilies and families. The selected methods are available as the easy-to-use web service 'VICTOR' at https://victor.dsmz.de. Supplementary data are available at Bioinformatics online.},
    doi = {10.1093/bioinformatics/btx440},
    number = {21},
    keywords = {Archaea, virology; Bacteria, virology; Computer Simulation; Genomics, methods; Phylogeny; Sequence Analysis, DNA; Software; Viruses, classification, genetics},
    pmid = {29036289},
    }
★ ViCTree
ViCTree is a bioinformatic framework that automatically selects new candidate virus sequences from GenBank, generates multiple sequence alignments, calculates a maximum likelihood phylogeny, integrates the sequences into the existing phylogenetic trees and is capable of automatically building new phylogenies when new data is available on GenBank.

  • [DOI] S. Modha, A. S. Thanki, S. F. Cotmore, A. J. Davison, and J. Hughes, “ViCTree: an automated framework for taxonomic classification from protein sequences,” Bioinformatics, vol. 34, iss. 13, p. 2195–2200, 2018.
    [Bibtex]
    @Article{Modha:18,
    author = {Sejal Modha and Anil S Thanki and Susan F Cotmore and Andrew J Davison and Joseph Hughes},
    title = {{ViCTree}: an automated framework for taxonomic classification from protein sequences},
    journal = {Bioinformatics},
    year = {2018},
    volume = {34},
    number = {13},
    pages = {2195--2200},
    doi = {10.1093/bioinformatics/bty099},
    editor = {Janet Kelso},
    publisher = {Oxford University Press ({OUP})},
    }
Metagenomics
★ CAMISIM | Simulating metagenomes and microbial communities
CAMISIM can simulate a wide variety of microbial communities and metagenome data sets together with standards of truth for method evaluation. The software can model different microbial abundance profiles, multi-sample time series, and differential abundance studies, includes real and simulated strain-level diversity, and generates second- and third-generation sequencing data from taxonomic profiles or de novo.

All data sets and the software are freely available.

  • [DOI] A. Fritz, P. Hofmann, S. Majda, E. Dahms, J. Dröge, J. Fiedler, T. R. Lesker, P. Belmann, M. Z. DeMaere, A. E. Darling, A. Sczyrba, A. Bremges, and A. C. McHardy, “CAMISIM: simulating metagenomes and microbial communities,” Microbiome, vol. 7, iss. 1, 2019.
    [Bibtex]
    @Article{Fritz:19,
    author = {Adrian Fritz and Peter Hofmann and Stephan Majda and Eik Dahms and Johannes Dr\"{o}ge and Jessika Fiedler and Till R. Lesker and Peter Belmann and Matthew Z. DeMaere and Aaron E. Darling and Alexander Sczyrba and Andreas Bremges and Alice C. McHardy},
    title = {{CAMISIM}: simulating metagenomes and microbial communities},
    journal = {Microbiome},
    year = {2019},
    volume = {7},
    number = {1},
    doi = {10.1186/s40168-019-0633-6},
    publisher = {Springer Nature},
    }
★ EPA-ng | Evolutionary Placement Algorithm for Next Generation Sequencing
Phylogenetic placement method to determine how sequences obtained from diverse microbial environments fit into an evolutionary context.

  • [DOI] P. Barbera, A. M. Kozlov, L. Czech, B. Morel, D. Darriba, T. Flouri, and A. Stamatakis, “EPA-ng: massively parallel evolutionary placement of genetic sequences,” Syst Biol, vol. 68, iss. 2, p. 365–369, 2018.
    [Bibtex]
    @Article{Barbera:18,
    author = {Pierre Barbera and Alexey M Kozlov and Lucas Czech and Benoit Morel and Diego Darriba and Tom{\'{a}}{\v{s}} Flouri and Alexandros Stamatakis},
    title = {{EPA}-ng: Massively Parallel Evolutionary Placement of Genetic Sequences},
    journal = {{Syst Biol}},
    year = {2018},
    volume = {68},
    number = {2},
    pages = {365--369},
    doi = {10.1093/sysbio/syy054},
    editor = {David Posada},
    publisher = {Oxford University Press ({OUP})},
    }
★ LiveKraken | Real-time metagenomic classification of Illumina data
LiveKraken is a real-time read classification tool based on the core algorithm of Kraken, one of the most widely used tools in metagenomics thanks to its robustness and speed. LiveKraken uses streams of raw data from Illumina sequencers to classify reads taxonomically, producing results identical to those of Kraken the moment the sequencer finishes. It also provides comparable results at early stages of a sequencing run, which can save up to a week of sequencing time on an Illumina HiSeq in High Throughput Mode.
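LiveKraken inherits Kraken's classification core. As described in the original Kraken publication, each k-mer of a read is looked up in a database that maps it to the lowest common ancestor (LCA) of all genomes containing it. The read is then assigned to the leaf of the root-to-leaf path with the most k-mer hits, or to the LCA of tied leaves. The Python sketch below illustrates only that idea, on a hypothetical five-taxon taxonomy with 3-mers. It is not LiveKraken's code and ignores the streaming of Illumina base-call data. The database, taxon IDs and function names are all made up for illustration, and Kraken itself uses much longer k-mers.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical toy taxonomy, child -> parent; taxon 1 is the root.
PARENT = {1: 1, 2: 1, 3: 2, 4: 2, 5: 1}


def lineage(taxon: int) -> List[int]:
    """Taxa on the path from `taxon` up to the root, inclusive."""
    path = [taxon]
    while PARENT[taxon] != taxon:
        taxon = PARENT[taxon]
        path.append(taxon)
    return path


def lca(taxa: List[int]) -> int:
    """Lowest common ancestor: the deepest taxon shared by all lineages."""
    common = set(lineage(taxa[0]))
    for t in taxa[1:]:
        common &= set(lineage(t))
    return max(common, key=lambda t: len(lineage(t)))


def classify(read: str, kmer_db: Dict[str, int], k: int) -> Optional[int]:
    """Kraken-style read assignment; returns None if no k-mer hits the database."""
    hits = Counter()
    for i in range(len(read) - k + 1):
        taxon = kmer_db.get(read[i:i + k])  # LCA taxon stored for this k-mer
        if taxon is not None:
            hits[taxon] += 1
    if not hits:
        return None
    # Score each hit taxon by all hits on its path to the root.
    scores = {t: sum(hits[a] for a in lineage(t)) for t in hits}
    best = max(scores.values())
    # Ties between equally good paths are resolved by their LCA.
    return lca([t for t, s in scores.items() if s == best])


db = {"ACG": 3, "CGT": 3, "GTA": 4, "TAC": 2, "CCC": 4, "AAA": 3}
print(classify("ACGTAC", db, k=3))  # 3
```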

  • [DOI] S. H. Tausch, B. Strauch, A. Andrusch, T. P. Loka, M. S. Lindner, A. Nitsche, and B. Y. Renard, “LiveKraken––real-time metagenomic classification of illumina data,” Bioinformatics, vol. 34, iss. 21, p. 3750–3752, 2018.
    [Bibtex]
    @Article{Tausch:18,
    author = {Simon H Tausch and Benjamin Strauch and Andreas Andrusch and Tobias P Loka and Martin S Lindner and Andreas Nitsche and Bernhard Y Renard},
    title = {{LiveKraken}{\textendash}{\textendash}real-time metagenomic classification of illumina data},
    journal = {Bioinformatics},
    year = {2018},
    volume = {34},
    number = {21},
    pages = {3750--3752},
    doi = {10.1093/bioinformatics/bty433},
    editor = {Bonnie Berger},
    publisher = {Oxford University Press ({OUP})},
    }
★ RIEMS | Reliable Information Extraction from Metagenomic Sequence datasets
RIEMS is a software pipeline for sensitive and comprehensive taxonomic classification of reads from metagenomics datasets. RIEMS assigns every individual read sequence within a dataset taxonomically by cascading different sequence analyses with decreasing stringency of the assignments using various software applications. After completion of the analyses, the results are summarised in a clearly structured result protocol organised taxonomically.
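The cascading principle can be illustrated with a small Python sketch. Every read goes through a list of classification stages ordered from most to least stringent, and only reads left unassigned by one stage are passed on to the next. The two stages used here are deliberately trivial stand-ins: an exact lookup and a 4-base prefix match against a hypothetical two-sequence reference. RIEMS chains real sequence analysis tools, and none of the names below belong to it.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A stage maps a read to a taxon label, or None if it cannot assign it.
Stage = Callable[[str], Optional[str]]


def cascade(reads: List[str], stages: List[Tuple[str, Stage]]):
    """Assign each read with the first (most stringent) stage that accepts it."""
    assigned: List[Tuple[str, str, str]] = []  # (read, label, stage name)
    remaining = list(reads)
    for name, stage in stages:
        unassigned = []
        for read in remaining:
            label = stage(read)
            if label is None:
                unassigned.append(read)
            else:
                assigned.append((read, label, name))
        remaining = unassigned  # only leftovers reach the next, looser stage
    return assigned, remaining


# Hypothetical reference sequences and stand-in stages.
REFERENCE: Dict[str, str] = {"ACGTACGT": "virus A", "TTGGCCAA": "virus B"}


def exact_match(read: str) -> Optional[str]:
    return REFERENCE.get(read)


def prefix_match(read: str) -> Optional[str]:
    for ref, label in REFERENCE.items():
        if ref[:4] == read[:4]:
            return label
    return None


assigned, unassigned = cascade(
    ["ACGTACGT", "TTGGAAAA", "GGGGGGGG"],
    [("exact", exact_match), ("prefix", prefix_match)],
)
print(assigned)    # [('ACGTACGT', 'virus A', 'exact'), ('TTGGAAAA', 'virus B', 'prefix')]
print(unassigned)  # ['GGGGGGGG']
```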

  • [DOI] M. Scheuch, D. Höper, and M. Beer, “RIEMS: a software pipeline for sensitive and comprehensive taxonomic classification of reads from metagenomics datasets,” BMC Bioinformatics, vol. 16, iss. 1, 2015.
    [Bibtex]
    @Article{Scheuch:15,
    author = {Matthias Scheuch and Dirk H\"{o}per and Martin Beer},
    title = {{RIEMS}: a software pipeline for sensitive and comprehensive taxonomic classification of reads from metagenomics datasets},
    journal = {{BMC Bioinformatics}},
    year = {2015},
    volume = {16},
    number = {1},
    doi = {10.1186/s12859-015-0503-6},
    publisher = {Springer Nature},
    }
★ vConTACT v.2.0 | Taxonomic assignment of uncultivated prokaryotic virus genomes

vConTACT is a network-based application that uses whole-genome gene-sharing profiles for virus taxonomy and integrates distance-based hierarchical clustering and confidence scores for all taxonomic predictions.
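A gene-sharing network needs a weight for each pair of genomes. One common way to score such an edge is an assumption here, not a description of vConTACT's exact code. It computes the probability that two genomes share at least the observed number of protein clusters (PCs) by chance under a hypergeometric model, corrects it for the number of comparisons, and reports the result as a negative log10. The Python sketch below shows only that scoring step. vConTACT's actual thresholds, clustering and confidence scores are not reproduced, and the function names and PC identifiers are hypothetical.

```python
import math
from typing import Set


def shared_pc_pvalue(a: int, b: int, c: int, n: int) -> float:
    """P(X >= c) for hypergeometric X: genomes with a and b PCs, drawn from
    n PCs in total, share at least c PCs by chance."""
    upper = min(a, b)
    tail = sum(math.comb(a, i) * math.comb(n - a, b - i) for i in range(c, upper + 1))
    return tail / math.comb(n, b)


def edge_weight(pcs_x: Set[str], pcs_y: Set[str], n: int, comparisons: int) -> float:
    """-log10 of the comparison-corrected chance probability of the overlap."""
    if len(pcs_x | pcs_y) > n:
        raise ValueError("n must be at least the number of distinct PCs involved")
    c = len(pcs_x & pcs_y)
    p = shared_pc_pvalue(len(pcs_x), len(pcs_y), c, n)
    # The corrected value is capped at 1, so its log10 is <= 0; abs() gives -log10.
    return abs(math.log10(min(1.0, p * comparisons)))


# Hypothetical genomes described by the PCs their genes fall into.
phage_1 = {"PC1", "PC2", "PC3"}
phage_2 = {"PC1", "PC2", "PC3"}
phage_3 = {"PC7", "PC8"}
print(round(edge_weight(phage_1, phage_2, n=10, comparisons=1), 3))  # 2.079
print(edge_weight(phage_1, phage_3, n=10, comparisons=1))            # 0.0
```

Genomes that share nothing receive a weight of 0. Strong overlaps that are unlikely by chance receive large weights, which a clustering step can then use to group related viruses.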

  • [DOI] H. B. Jang, B. Bolduc, O. Zablocki, J. H. Kuhn, S. Roux, E. M. Adriaenssens, J. R. Brister, A. M. Kropinski, M. Krupovic, R. Lavigne, D. Turner, and M. B. Sullivan, “Taxonomic assignment of uncultivated prokaryotic virus genomes is enabled by gene-sharing networks,” Nat Biotechnol, vol. 37, iss. 6, p. 632–639, 2019.
    [Bibtex]
    @Article{Jang:19,
    author = {Ho Bin Jang and Benjamin Bolduc and Olivier Zablocki and Jens H. Kuhn and Simon Roux and Evelien M. Adriaenssens and J. Rodney Brister and Andrew M Kropinski and Mart Krupovic and Rob Lavigne and Dann Turner and Matthew B. Sullivan},
    title = {Taxonomic assignment of uncultivated prokaryotic virus genomes is enabled by gene-sharing networks},
    journal = {{Nat Biotechnol}},
    year = {2019},
    volume = {37},
    number = {6},
    pages = {632--639},
    doi = {10.1038/s41587-019-0100-8},
    publisher = {Springer Science and Business Media {LLC}},
    }

VirSorter | Mining viral signal from microbial genomic data

VirSorter is a web-based tool designed to detect viral signal in different types of microbial sequence data, in both a reference-dependent and a reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses. VirSorter outperforms existing tools in predicting viral sequences outside of a host genome (i.e., from extrachromosomal prophages, lytic infections, or partially assembled prophages) and for fragmented genomic and metagenomic datasets. Because VirSorter scales to large datasets, it can also be used in “reverse” to more confidently identify viral sequences in viral metagenomes by sorting away cellular DNA, whether derived from gene transfer agents, generalized transduction or contamination.

  • [DOI] S. Roux, F. Enault, B. L. Hurwitz, and M. B. Sullivan, “VirSorter: mining viral signal from microbial genomic data,” PeerJ, vol. 3, p. e985, 2015.
    [Bibtex]
    @Article{Roux:15,
    author = {Simon Roux and Francois Enault and Bonnie L. Hurwitz and Matthew B. Sullivan},
    title = {{VirSorter}: mining viral signal from microbial genomic data},
    journal = {{PeerJ}},
    year = {2015},
    volume = {3},
    pages = {e985},
    doi = {10.7717/peerj.985},
    publisher = {{PeerJ}},
    }

Please also have a look at the following publication, which benchmarks metagenome assembly, genome binning and taxonomic profiling software:

  • [DOI] A. Sczyrba, P. Hofmann, P. Belmann, D. Koslicki, S. Janssen, J. Dröge, I. Gregor, S. Majda, J. Fiedler, E. Dahms, A. Bremges, A. Fritz, R. Garrido-Oter, T. S. Jørgensen, N. Shapiro, P. D. Blood, A. Gurevich, Y. Bai, D. Turaev, M. Z. DeMaere, R. Chikhi, N. Nagarajan, C. Quince, F. Meyer, M. Balvočiūtė, L. H. Hansen, S. J. Sørensen, B. K. H. Chia, B. Denis, J. L. Froula, Z. Wang, R. Egan, D. Don Kang, J. J. Cook, C. Deltel, M. Beckstette, C. Lemaitre, P. Peterlongo, G. Rizk, D. Lavenier, Y. Wu, S. W. Singer, C. Jain, M. Strous, H. Klingenberg, P. Meinicke, M. D. Barton, T. Lingner, H. Lin, Y. Liao, G. G. Z. Silva, D. A. Cuevas, R. A. Edwards, S. Saha, V. C. Piro, B. Y. Renard, M. Pop, H. Klenk, M. Göker, N. C. Kyrpides, T. Woyke, J. A. Vorholt, P. Schulze-Lefert, E. M. Rubin, A. E. Darling, T. Rattei, and A. C. McHardy, “Critical Assessment of Metagenome Interpretation-a benchmark of metagenomics software,” Nat Methods, vol. 14, iss. 11, p. 1063–1071, 2017.
    [Bibtex]
    @Article{Sczyrba:17,
    author = {Sczyrba, Alexander and Hofmann, Peter and Belmann, Peter and Koslicki, David and Janssen, Stefan and Dröge, Johannes and Gregor, Ivan and Majda, Stephan and Fiedler, Jessika and Dahms, Eik and Bremges, Andreas and Fritz, Adrian and Garrido-Oter, Ruben and Jørgensen, Tue Sparholt and Shapiro, Nicole and Blood, Philip D and Gurevich, Alexey and Bai, Yang and Turaev, Dmitrij and DeMaere, Matthew Z and Chikhi, Rayan and Nagarajan, Niranjan and Quince, Christopher and Meyer, Fernando and Balvočiūtė, Monika and Hansen, Lars Hestbjerg and Sørensen, Søren J and Chia, Burton K H and Denis, Bertrand and Froula, Jeff L and Wang, Zhong and Egan, Robert and Don Kang, Dongwan and Cook, Jeffrey J and Deltel, Charles and Beckstette, Michael and Lemaitre, Claire and Peterlongo, Pierre and Rizk, Guillaume and Lavenier, Dominique and Wu, Yu-Wei and Singer, Steven W and Jain, Chirag and Strous, Marc and Klingenberg, Heiner and Meinicke, Peter and Barton, Michael D and Lingner, Thomas and Lin, Hsin-Hung and Liao, Yu-Chieh and Silva, Genivaldo Gueiros Z and Cuevas, Daniel A and Edwards, Robert A and Saha, Surya and Piro, Vitor C and Renard, Bernhard Y and Pop, Mihai and Klenk, Hans-Peter and Göker, Markus and Kyrpides, Nikos C and Woyke, Tanja and Vorholt, Julia A and Schulze-Lefert, Paul and Rubin, Edward M and Darling, Aaron E and Rattei, Thomas and McHardy, Alice C},
    title = {{C}ritical {A}ssessment of {M}etagenome {I}nterpretation-a benchmark of metagenomics software},
    journal = {{Nat Methods}},
    year = {2017},
    volume = {14},
    pages = {1063--1071},
    abstract = {Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.},
    doi = {10.1038/nmeth.4458},
    number = {11},
    keywords = {Algorithms; Benchmarking; Metagenomics; Sequence Analysis, DNA; Software},
    pmid = {28967888},
    }