Method To Assign Lineages
LINEAGE DETERMINATION
Genomic sequences of all influenza A viruses covering >75% of the whole segment
length were downloaded from NCBI. Each gene segment was aligned separately using the ClustalW
program [1]. Phylogenetic trees were constructed with the MEGA software [2], using the
Neighbor-joining algorithm with the HKY-85 model. Lineages were assigned as follows:
- All sequences with >75% of the whole segment length were analyzed for significant clusters.
- A cut-off of 10% nucleotide difference by p-distance was chosen for the lineages.
- A smaller set of sequences was used for the bootstrap analysis.
- Initial lineages were evaluated for nucleotide differences within each lineage and between lineages, and for strength of bootstrap support.
- Approximately 10 sequences in each lineage were randomly selected for Maximum Likelihood (ML) analysis for each gene segment, serotype
(for HA, NA), or allele (for NS) on the multiphyl server (http://www.cs.nuim.ie/distributed/multiphyl.php).
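The 10% p-distance cut-off described above can be illustrated with a minimal sketch. It assumes the sequences are already aligned (equal length, gaps as "-"), that gapped or ambiguous sites are skipped, and that two candidate groups are compared by their mean between-group distance. These choices, and the function names, are assumptions made for illustration and are not taken from the original analysis.

```python
from itertools import combinations

CUTOFF = 0.10  # 10% nucleotide difference, as stated in the list above


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') or an 'N' are skipped.
    This pairwise-deletion handling is an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def mean_within(group):
    """Mean p-distance over all pairs inside one candidate lineage."""
    dists = [p_distance(a, b) for a, b in combinations(group, 2)]
    return sum(dists) / len(dists) if dists else 0.0


def mean_between(group_a, group_b):
    """Mean p-distance over all pairs drawn from two candidate lineages."""
    dists = [p_distance(a, b) for a in group_a for b in group_b]
    return sum(dists) / len(dists)


def are_distinct_lineages(group_a, group_b, cutoff=CUTOFF):
    """True if the mean between-group distance exceeds the cut-off."""
    return mean_between(group_a, group_b) > cutoff
```

Comparing mean_within for each group against mean_between for each pair of groups corresponds to the "within and between" evaluation in the list; bootstrap support and the ML analysis are not covered by this sketch.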
PROCESSING METHOD
The BLAST algorithm is used for sequence comparison because it is fast and accurately detects
local regions of high sequence similarity. Because BLAST is a local rather than a global
alignment algorithm, we developed a new parameter called "coverage" to detect
gene-wide sequence similarity (http://www.biomedcentral.com/1471-2105/7/S4/S18). The default thresholds for
identifying groupings are 95% identity and 95% coverage [3]; both thresholds can be changed. The top
BLAST results for a user-submitted query sequence are sorted by identity and coverage, and the best result is used to
assign a lineage to the query sequence. If the identity or coverage of the best hit falls below its threshold (95% by default),
the lineage is assigned with an asterisk (*), indicating that the query sequence does not meet the criteria and may be from a new lineage.
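The assignment rule above can be sketched as follows. The BlastHit record, the function names, and the definition of coverage as the percentage of the query length spanned by the alignment are assumptions made for illustration; the exact definition is given in [3]. Treating a value exactly equal to a threshold as passing, and ranking by identity first and coverage second, are also assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BlastHit:
    """Hypothetical record for one BLAST hit against the lineage database."""
    lineage: str         # lineage label of the matched reference sequence
    identity: float      # percent identity reported for the hit
    aligned_length: int  # number of query bases covered by the alignment


def coverage(aligned_length: int, query_length: int) -> float:
    """Percent of the query covered by the alignment (assumed definition)."""
    return min(100.0, 100.0 * aligned_length / query_length)


def assign_lineage(
    hits: List[BlastHit],
    query_length: int,
    min_identity: float = 95.0,
    min_coverage: float = 95.0,
) -> Optional[str]:
    """Return the lineage of the best hit, with '*' if it misses a threshold."""
    if not hits:
        return None  # nothing to assign
    # Rank by identity, then by coverage (tie-break order is an assumption).
    best = max(
        hits,
        key=lambda h: (h.identity, coverage(h.aligned_length, query_length)),
    )
    best_cov = coverage(best.aligned_length, query_length)
    if best.identity >= min_identity and best_cov >= min_coverage:
        return best.lineage
    # Below threshold: flag the assignment as a possible new lineage.
    return best.lineage + "*"
```

Because the thresholds are plain keyword arguments, a call such as assign_lineage(hits, query_length, min_coverage=90.0) reflects the adjustable thresholds mentioned in the text.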
To determine a genotype, the lineage of each segment sequence is determined first. The
genotype is then formed by combining the lineages of the eight segments in gene order. If all
segments belong to known lineages, the genotype of the query genomic sequence is determined and
can be compared with previously identified genotypes in the genotype database. From this comparison, reassortment events and host
switching can be identified. Reassortment may produce new combinations of lineages, that is, genotypes not yet in our database.
Additionally, if one or more segment sequences do not belong to any known lineage, the query virus
may have a novel genotype.
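A minimal sketch of the genotype step follows. It assumes the standard influenza A segment order (PB2, PB1, PA, HA, NP, NA, M, NS), lineage labels that carry a trailing asterisk when a segment failed the thresholds, and a bracketed, comma-separated genotype string. The label values, the string format, and the function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set, Tuple

# Standard influenza A segment order (segments 1 to 8).
SEGMENT_ORDER = ("PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS")


def build_genotype(segment_lineages: Dict[str, str]) -> Tuple[str, bool]:
    """Combine per-segment lineages in gene order.

    Returns the genotype string and a flag that is True only when every
    segment was assigned to a known lineage (no trailing '*').
    """
    missing = [s for s in SEGMENT_ORDER if s not in segment_lineages]
    if missing:
        raise ValueError("missing segments: " + ", ".join(missing))
    ordered = [segment_lineages[s] for s in SEGMENT_ORDER]
    all_known = not any(label.endswith("*") for label in ordered)
    return "[" + ",".join(ordered) + "]", all_known


def classify_genotype(
    genotype: str, all_known: bool, known_genotypes: Set[str]
) -> str:
    """Compare a genotype with the set of previously identified genotypes."""
    if not all_known:
        return "possible novel genotype (segment outside known lineages)"
    if genotype in known_genotypes:
        return "known genotype"
    return "new combination of known lineages"
```

Returning the all_known flag separately keeps the two cases in the text apart: a new combination of known lineages (possible reassortment) versus a genome with at least one segment outside every known lineage.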
IMPLEMENTATION
The Web interface and database were implemented with the LAMP stack. LAMP stands for Linux, a
popular operating system; Apache, the most commonly used Web server; MySQL, a relational database management system (RDBMS);
and PHP, Perl, or Python, popular object-oriented scripting languages for web development. Because of its stability
and reliability, LAMP currently predominates among existing web sites.
REFERENCES
[1] Thompson, J.D., T.J. Gibson, F. Plewniak, F. Jeanmougin, and D.G. Higgins. The CLUSTAL_X windows interface: flexible
strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res, 1997. 25(24): 4876-82.
[2] Kumar, S., K. Tamura, and M. Nei. MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence
alignment. Briefings in Bioinformatics, 2004. 5: 150-163.
[3] Lu, G., L. Jiang, et al. GenomeBlast: a web tool for small genome comparison. BMC Bioinformatics, 2006. 7(Suppl 4): S18.