FluGenome: Genotyping Influenza A Viruses with Full Genome Sequences



    The influenza A virus genome consists of 8 separate RNA segments (Figure 1). These segments are frequently exchanged between different viruses by a process known as genetic reassortment, which can give rise to viruses with novel antigenic and replicative properties. With progress in sequencing technology and instrumentation, complete sequencing of influenza genomes is becoming part of the routine analysis and characterization of these viruses, and genetic reassortment can be identified by analyzing the sequences of whole virus genomes. However, bioinformatics tools for analyzing segmented virus genomes are not currently available. Specialized tools for analyzing genetic reassortment are needed to help understand its role in the host range, virulence, and transmissibility of influenza viruses. This web server was developed to meet the growing need to genotype influenza A viruses.

Figure 1. Co-infection with two different strains of influenza usually gives rise to progeny viruses with mixtures of genes derived from the parental strains.

    Two nomenclature conventions are used routinely in influenza research: 1) the 8 segments in the influenza A genome are numbered from 1 to 8 for PB2, PB1, PA, HA, NP, NA, M, and NS, respectively; 2) there are currently 16 subtypes for hemagglutinin (HA), 9 subtypes for neuraminidase (NA), and 2 alleles for nonstructural (NS) proteins. Since influenza A viruses have a complicated genomic structure, we approached genotyping by first studying each gene segment separately. Following these conventions, and given that the evolution rate varies from segment to segment, we define a genotype as a sequential combination of the lineages for each of the eight segments in the genome. A letter is assigned to each lineage of PB2, PB1, PA, NP, and M, and a number with a letter is assigned to each lineage of HA, NA, and NS, with the number representing the subtype or allele. For example, [A, B, C, 5A, A, 1A, A, 2C] is the genotype of an H5N1 virus with lineage A for the PB2 segment, B for PB1, C for PA, 5A for HA, and so on (the first gene listed being gene 1, the second being gene 2, and so on). A nomenclature for influenza A virus genotypes is important because it allows researchers to describe genotypes in an unequivocal way and avoids the confusion that arises when the same genotype is labeled differently by different researchers.
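
    The genotype notation above can be read mechanically. The following Python sketch is illustrative only and is not FluGenome's own code: the function names (parse_genotype, subtype_number, serotype) are hypothetical, and it assumes the bracketed, comma-separated form used in the example.

```python
import re

# Segment order follows the numbering convention in the text (segments 1 to 8).
SEGMENT_ORDER = ["PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS"]


def parse_genotype(text):
    """Split a genotype such as '[A, B, C, 5A, A, 1A, A, 2C]' into a dict keyed by segment."""
    labels = [label.strip() for label in text.strip().strip("[]").split(",")]
    if len(labels) != len(SEGMENT_ORDER):
        raise ValueError(f"expected 8 lineage labels, got {len(labels)}")
    return dict(zip(SEGMENT_ORDER, labels))


def subtype_number(label):
    """Return the leading subtype or allele number of an HA, NA, or NS label ('5A' -> 5)."""
    match = re.fullmatch(r"(\d+)([A-Za-z]+)", label)
    if match is None:
        raise ValueError(f"not a numbered lineage label: {label!r}")
    return int(match.group(1))


def serotype(genotype):
    """Build the HxNy name from the HA and NA lineage labels."""
    return f"H{subtype_number(genotype['HA'])}N{subtype_number(genotype['NA'])}"


genotype = parse_genotype("[A, B, C, 5A, A, 1A, A, 2C]")
print(genotype["PB2"], genotype["HA"], serotype(genotype))  # A 5A H5N1
```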

RUN OPTIONS

    FluGenome has two running options.

  • Segment Lineage - When you want to determine the lineage of a single segment for one or many viruses.
  • Genotyping - When you want to determine the genotype of a full genome for one or many viruses. You do not need all segments to run the genotyping option, but you will not get a complete genotype without them.

INPUT FORMATS

    For each option, sequences can be pasted into the window in FASTA format, or FASTA files can be uploaded from your computer. For the segment option, the FASTA file can contain sequences from as many viruses as you would like, but all of them must be from the same gene segment. For the genotyping option, a FASTA file with the gene segments for a single virus can be uploaded. Please see Sample Data for examples of file formats. The sequences are then compared against the database of segment lineages using our BLAST algorithm. The top BLAST results for a user-submitted query sequence are sorted by identity and coverage, and the best result is used to assign a lineage to the query sequence. If the BLAST hit values are below the thresholds (95% for identity and coverage), the lineage is assigned with an asterisk (*), indicating that the query sequence does not meet the criteria and may be from a new lineage. If no data is entered, "missing data" will appear in the results.
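
    The assignment rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the server's implementation: the Hit type and assign_lineage function are hypothetical, hits are ranked by identity first and coverage second, a hit is flagged when either value falls below 95%, and an empty hit list stands in for the case where no data was entered.

```python
from typing import NamedTuple, Sequence


class Hit(NamedTuple):
    """One BLAST hit against the segment lineage database (hypothetical structure)."""
    lineage: str
    identity: float  # percent identity
    coverage: float  # percent coverage


THRESHOLD = 95.0  # threshold stated in the text for both identity and coverage


def assign_lineage(hits: Sequence[Hit]) -> str:
    """Pick the best hit and flag it with '*' if it falls below the thresholds."""
    if not hits:
        # Assumption: no hits is treated like no sequence being entered.
        return "missing data"
    # Assumption: rank by identity first, then by coverage.
    best = max(hits, key=lambda hit: (hit.identity, hit.coverage))
    # Assumption: either value below the threshold triggers the asterisk.
    if best.identity < THRESHOLD or best.coverage < THRESHOLD:
        return best.lineage + "*"
    return best.lineage


print(assign_lineage([Hit("A", 99.0, 98.0), Hit("B", 97.0, 100.0)]))  # A
print(assign_lineage([Hit("C", 93.5, 99.0)]))                         # C*
print(assign_lineage([]))                                             # missing data
```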

DATABASES

    The FluGenome database contains three tables: Segment, Genome, and Genotype. The Segment table includes around 30,000 records of gene segments, each with the strain name, segment, serotype, host, country, year, GenBank accession number, nucleotide sequence, and sequence length. The Genome table includes 7,889 records of virus genomes, where each entry holds the genotype, accession numbers, and other information associated with the virus. When two or more sequences were available for a gene segment, the accession number of the longest sequence was used. The Genotype table has 156 unique genotypes with lineage information for all eight gene segments.
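
    The rule for choosing one accession number per segment can be illustrated as follows. This Python sketch is an assumption-laden example rather than the database's actual code: SegmentRecord and longest_accessions are hypothetical names, the accession strings are made-up placeholders, and ties are resolved by keeping the first record seen.

```python
from typing import Dict, Iterable, NamedTuple


class SegmentRecord(NamedTuple):
    """Minimal stand-in for a row of the Segment table (hypothetical fields)."""
    segment: str
    accession: str
    sequence: str


def longest_accessions(records: Iterable[SegmentRecord]) -> Dict[str, str]:
    """For each segment of one virus, keep the accession of its longest sequence."""
    best: Dict[str, SegmentRecord] = {}
    for record in records:
        current = best.get(record.segment)
        # Strictly longer wins, so the first record is kept on a tie.
        if current is None or len(record.sequence) > len(current.sequence):
            best[record.segment] = record
    return {segment: record.accession for segment, record in best.items()}


# Placeholder accessions and toy sequences, for illustration only.
records = [
    SegmentRecord("HA", "ACC_1", "ATGC"),
    SegmentRecord("HA", "ACC_2", "ATGCATGC"),
    SegmentRecord("NA", "ACC_3", "ATG"),
]
print(longest_accessions(records))  # {'HA': 'ACC_2', 'NA': 'ACC_3'}
```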
