Method To Assign Lineages


LINEAGE DETERMINATION

    Genomic sequences of all influenza A viruses covering >75% of the whole segment length were downloaded from NCBI. Each gene segment was aligned using the ClustalW program [1]. Phylogenetic trees were constructed with the MEGA software [2] using the neighbor-joining algorithm with the HKY-85 model. Lineages were then assigned as follows:
  1. All sequences with >75% of the whole segment length were analyzed for significant clusters.
  2. A cut-off of 10% nucleotide difference by p-distance was chosen for the lineages.
  3. A smaller set of sequences was used for the bootstrap analysis.
  4. Initial lineages were evaluated for nucleotide differences within each lineage and between lineages, and for strength of bootstrap support.
  5. Approximately 10 sequences in each lineage were randomly selected for Maximum Likelihood (ML) analysis for each gene segment, serotype (for HA, NA), or allele (for NS) on the multiphyl server (http://www.cs.nuim.ie/distributed/multiphyl.php).
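
    As an illustration of the cut-off in step 2, the p-distance between two aligned sequences is the proportion of compared sites at which they differ. The sketch below is not the code used for this analysis (the distances were computed in MEGA); the function names, the skipping of gap positions, and the treatment of exactly 10% as within the cut-off are assumptions made for the example.

```python
LINEAGE_CUTOFF = 0.10  # 10% nucleotide difference, as in step 2


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: positions with a gap in either sequence are skipped.
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def exceeds_lineage_cutoff(seq_a: str, seq_b: str,
                           cutoff: float = LINEAGE_CUTOFF) -> bool:
    """True when two sequences differ by more than the cut-off."""
    # Assumption: a distance exactly equal to the cut-off does not exceed it.
    return p_distance(seq_a, seq_b) > cutoff


if __name__ == "__main__":
    # Toy aligned sequences, not real influenza data.
    print(p_distance("ACGTACGTAC", "ACGTACGTAA"))
    print(exceeds_lineage_cutoff("ACGTACGTAC", "AAGTACGTAA"))
```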

PROCESSING METHOD

    The BLAST algorithm is used for sequence comparison because it is fast and accurately detects local regions of high sequence similarity. Because BLAST is not a global alignment algorithm, we developed a new parameter called "coverage" to detect gene-wide sequence similarity (http://www.biomedcentral.com/1471-2105/7/S4/S18). The default thresholds for identifying groupings are 95% identity and 95% coverage [3]; these thresholds can be changed. The top BLAST results for a user-submitted query sequence are sorted by identity and coverage, and the best result is used to assign a lineage to the query sequence. If the BLAST hit values are below the thresholds (95% for identity and coverage), the lineage is assigned with an asterisk (*), indicating that the query sequence does not meet the criteria and may belong to a new lineage.
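
    The assignment rule described above can be sketched as follows. This is a minimal illustration, not the server's actual code: the BlastHit record, the assign_lineage function, and the example lineage labels are hypothetical, identity and coverage are taken as already-computed percentages, and the sketch assumes that a hit must reach both thresholds to be reported without an asterisk.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BlastHit:
    """One BLAST hit against a reference sequence of known lineage."""
    lineage: str     # lineage label of the reference sequence (hypothetical)
    identity: float  # percent identity reported for the hit
    coverage: float  # percent coverage, assumed to be precomputed


def assign_lineage(hits: List[BlastHit],
                   min_identity: float = 95.0,
                   min_coverage: float = 95.0) -> Optional[str]:
    """Return the lineage of the best hit, with '*' if it misses a threshold."""
    if not hits:
        return None  # nothing to assign without any hit
    # Rank by identity first, then by coverage, and keep the top hit.
    best = max(hits, key=lambda h: (h.identity, h.coverage))
    # Assumption: both thresholds must be reached for an unflagged result.
    meets = best.identity >= min_identity and best.coverage >= min_coverage
    return best.lineage if meets else best.lineage + "*"


if __name__ == "__main__":
    hits = [BlastHit("A", 97.0, 99.0), BlastHit("B", 99.0, 96.0)]
    print(assign_lineage(hits))
    print(assign_lineage([BlastHit("C", 93.0, 98.0)]))
```

    The thresholds are keyword arguments because the text notes that they are changeable.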

    To determine a genotype, the lineage of each segment sequence is determined first. The genotype is then formed by combining the lineages of the eight segments sequentially in gene order. If all segments belong to known lineages, the genotype of the query genomic sequence can be assigned. The resulting genotype can be compared to previously identified genotypes in the genotype database. From this analysis, reassortment events and host switching can be identified. Through reassortment, it is possible that new combinations of genotypes not in our database will be found. Additionally, if one or more segment sequences do not belong to any known lineage, the query virus may be a novel genotype.
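
    A minimal sketch of the genotype step is given below. It assumes the conventional ordering of the eight influenza A segments (PB2, PB1, PA, HA, NP, NA, M, NS) as the gene order, represents a genotype as a tuple of lineage labels, and treats a trailing asterisk as the marker of a segment that did not meet the lineage criteria. The function names, the lineage labels, and the set of known genotypes are hypothetical and do not reflect the actual database.

```python
from typing import Dict, Set, Tuple

# Assumption: conventional order of the eight influenza A gene segments.
SEGMENT_ORDER = ("PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS")

Genotype = Tuple[str, ...]


def build_genotype(lineages: Dict[str, str]) -> Genotype:
    """Combine per-segment lineages sequentially in gene order."""
    missing = [seg for seg in SEGMENT_ORDER if seg not in lineages]
    if missing:
        raise ValueError("missing segments: " + ", ".join(missing))
    return tuple(lineages[seg] for seg in SEGMENT_ORDER)


def classify_genotype(genotype: Genotype, known: Set[Genotype]) -> str:
    """Compare a genotype with a set of previously identified genotypes."""
    # A '*' marks a segment that may belong to a new lineage.
    if any(lineage.endswith("*") for lineage in genotype):
        return "possible novel genotype (unassigned lineage)"
    if genotype in known:
        return "known genotype"
    return "new combination of known lineages"


if __name__ == "__main__":
    # Hypothetical lineage labels for illustration only.
    query = {"PB2": "A", "PB1": "D", "PA": "B", "HA": "1A",
             "NP": "A", "NA": "1A", "M": "B", "NS": "1A"}
    genotype = build_genotype(query)
    known = {genotype}
    print(genotype, classify_genotype(genotype, known))
```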

IMPLEMENTATION

    The Web interface and database were implemented with the LAMP strategy. LAMP stands for Linux, a popular operating system; Apache, the most commonly used Web server; MySQL, a relational database management system (RDBMS); and PHP, Perl, or Python, popular object-oriented scripting languages for Web development. Because of its stability and reliability, LAMP currently predominates among existing Web sites.

REFERENCES
  1. Thompson, J.D., T.J. Gibson, F. Plewniak, F. Jeanmougin, and D.G. Higgins, The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res, 1997. 25(24): p. 4876-82.
  2. Kumar, S., K. Tamura, and M. Nei, MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Briefings in Bioinformatics, 2004. 5: p. 150-163.
  3. Lu, G., L. Jiang, et al., GenomeBlast: a web tool for small genome comparison. BMC Bioinformatics, 2006. 7(Suppl 4): p. S18.