BII FluSurver - Frequently Asked Questions

What is the FluSurver?

The FluSurver is a research tool developed to help the influenza research community with the identification, analysis and interpretation of mutations in influenza sequences.

What can it do?

The FluSurver allows researchers, clinician scientists and surveillance labs to rapidly screen their influenza sequences for potentially interesting mutations to identify candidates for phenotypic changes or special epidemiological relevance. For the latter, we provide geographic and temporal frequency of occurrence as well as co-occurrence of mutations. For phenotypic changes we utilize our in-house database of curated literature annotations for mutation effects such as drug resistance, host receptor specificity, virulence, antigenic drift and antibody escape mutants. We also show the position of the mutation(s) in structural models and highlight if mutations are close to common drug, host receptor or antibody binding sites or if a glycosylation motif is lost or created through a mutation. The FluSurver has already been instrumental in the discovery of new influenza strain variants with altered antiviral susceptibility, host specificity, glycosylation and antigenic properties.
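
As a rough illustration of the glycosylation check mentioned above, the sketch below looks for the N-linked glycosylation sequon N-X-S/T (where X is any residue except proline) before and after a single substitution. The function names and the toy sequence are hypothetical and are not part of FluSurver; the actual pipeline may apply different or additional criteria.

```python
import re

# N-linked glycosylation sequon: N, then any residue except P, then S or T.
# A lookahead is used so that overlapping motifs are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(protein):
    """Return the 1-based positions of the N in every N-X-S/T motif."""
    return {m.start() + 1 for m in SEQUON.finditer(protein)}


def glycosylation_change(reference, position, new_residue):
    """Compare sequons before and after a single substitution.

    `position` is 1-based. Returns (gained, lost) as sorted lists of
    sequon start positions.
    """
    mutant = reference[:position - 1] + new_residue + reference[position:]
    before = sequon_positions(reference)
    after = sequon_positions(mutant)
    return sorted(after - before), sorted(before - after)
```

For the toy sequence "AKNGSLD", `glycosylation_change("AKNGSLD", 5, "A")` returns `([], [3])`: replacing the serine removes the only motif, which starts at position 3.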

Please also read the next two sections to help avoid misinterpreting analysis results.

Best usage scenarios and common misconceptions.

Our curated reference sequences, which are used to transfer annotations between equivalent mutations, consist mainly of strains that recently infected humans. FluSurver therefore gives the most fruitful and reliable results for current surveillance sequences that are closely related to the vaccine strains in use, including some candidates for avian flu and novel reassortant swine flu. While we may add more animal-host influenza viruses in the future, the current bias is clearly towards strains that are known to infect humans. We are open to adding further reference strains suggested by the research community, and we regularly update the respective vaccine strains.

Relatedly, in the first step we do not run a BLAST search against all flu strains available in databases, but only against the limited set of selected reference strains. This limitation is necessary because we annotate each reference strain with human quality control steps that check the alignments of the strains with each other (to allow identification of equivalent positions), structural models, sites of small-ligand or antibody binding, mutation occurrence statistics (including geo-mapping), etc. However, for each sequence the FluSurver output provides a link that lets you identify the best database hit in GenBank using our Tachyon search tool.

FluSurver is not suited to detecting reassortments. For several of the reference sequences used, only HA and NA sequences were available, which limits the ability to look at the other genes/segments in all possible contexts. Given that we only compare query sequences with the small set of annotated reference sequences, in most cases it makes no sense to interpret hits to different reference strains as reassortments. Other tools are available for this purpose, e.g. GiRaF or FluReF.

Special notes for using results in publications.

The FluSurver research tool is mainly intended to highlight phenotypically or epidemiologically interesting candidate mutations for further research, and its results should ideally be combined with experimental testing and verification of any predicted phenotypes. Importantly, any direct diagnostic use, assumption of severity or recommendation on patient treatment should not be based solely on these computational predictions. The FluSurver mutation effect annotations are based on knowledge transfer by similarity to mutations studied in specific sequence contexts, which in most cases will not be identical to that of the user's input sequences. A simple rule applies here: the closer your sequence is to the one for which the phenotype has been reported (e.g. <20 mutations for long and <10 mutations for short sequences), the more likely it is that your mutation has a similar effect.
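
The rule of thumb above can be sketched as follows. This is only a minimal illustration under stated assumptions: the query and the reference are already aligned to the same length, columns containing a gap or an unknown residue are skipped, and the caller decides whether a sequence counts as "long" or "short", because the text does not define these terms. All function and parameter names are hypothetical.

```python
def count_differences(query, reference):
    """Count mismatched columns between two pre-aligned, equal-length sequences.

    Columns where either sequence has a gap ('-') or an unknown residue ('X')
    are skipped, which is an assumption of this sketch.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"-", "X"}
    return sum(
        1
        for q, r in zip(query.upper(), reference.upper())
        if q not in skip and r not in skip and q != r
    )


def is_close_context(query, reference, is_long_sequence):
    """Apply the rule of thumb: <20 differences (long) or <10 (short).

    The text does not define "long" and "short", so the caller decides.
    """
    limit = 20 if is_long_sequence else 10
    return count_differences(query, reference) < limit
```

A result of `True` only indicates that the sequence context is similar; it does not confirm that the annotated phenotype applies.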

Before results on potential phenotypic changes highlighted by FluSurver are included in a publication, they need to be substantiated by careful analysis of the evidence behind the assumed effect. This means reading and understanding the associated literature (links are provided in the mutation summary report) and considering any accompanying experimental, clinical and/or epidemiological data.

Naturally, given that the FluSurver results are purely computationally derived and require careful expert judgement, the unfiltered results are not suitable for direct communication to the general public or any kind of publication without proper peer review by the influenza research community.

If you are in doubt about how to interpret or communicate the FluSurver results, please feel free to contact us (sebastianms@bii.a-star.edu.sg) for advice.

What kind of information is being curated in the FluSurver project?
Although the user only sees the aggregated, cross-linked results in the FluSurver output, under the hood we essentially use and curate five different databases. The first is a selection of reference sequences consisting mainly of current or recent vaccine strains as well as strains that are of particular interest for research and/or cause human infections. This database includes a curated MAFFT L-INS-i alignment of the reference strains and a residue position mapping that links the equivalent mutation positions among strains. Importantly, it also disambiguates the different numbering schemes in use (e.g. H3, H1, H1pdm literal...).
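
To illustrate the idea of a residue position mapping, the sketch below derives equivalent positions from two rows of the same alignment. It is a simplified, hypothetical example and not the FluSurver implementation; the function name and the toy alignment rows are assumptions, and real rows would come from the curated alignment.

```python
def equivalent_positions(aligned_a, aligned_b):
    """Map 1-based residue positions in A to the aligned positions in B.

    Both inputs are rows of the same alignment, with '-' for gaps.
    Residues of A aligned to a gap in B are mapped to None.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("alignment rows must have equal length")
    mapping = {}
    pos_a = pos_b = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a != "-":
            pos_a += 1
        if res_b != "-":
            pos_b += 1
        if res_a != "-":
            mapping[pos_a] = pos_b if res_b != "-" else None
    return mapping
```

For the toy rows "MK-TAIL" and "MKQT-IL", position 3 of the first sequence maps to position 4 of the second, and position 4 maps to `None` because it is aligned to a gap.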

The second database stores information on mutations that are known to affect drug resistance, alter virulence, cause antigenic drift or shift host specificity, as curated by our group from the literature. This includes over 200 mutations with information extracted from several hundred publications. Accompanying information such as the subtype, host, protein, strain and PubMed references for the mutation effect is also provided.

The third database includes structural models for all reference sequences (whenever a suitable template is available). For this we developed a homology modelling pipeline (using BLAST, MAFFT, MODELLER and our own scripts to combine them) that creates structural models for all proteins of the included reference strains. As such an automated procedure could still produce errors in some models, we systematically check all models before uploading them to FluSurver. These models are used to highlight all mutated positions together in their structural context.

Related to the above, the fourth database is derived through another pipeline that annotates the structural positions of mutations. It processes all known influenza crystal structures in the PDB and identifies positions that are close to bound small molecules (such as drugs, host receptor sialic acids or carbohydrates) or proteins (such as antibodies or other host proteins), as well as positions involved in viral oligomerization. This structural interaction context of mutations is incorporated as links from the mutation summary.
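
One simple way to decide whether a position is "close to" a bound molecule is a distance cutoff between atoms. The sketch below assumes that atom coordinates have already been extracted from a structure file; the 5.0 Å default, the function name and the data layout are illustrative assumptions, and the text does not state which criterion the FluSurver pipeline uses.

```python
import math


def residues_near_ligand(residue_atoms, ligand_atoms, cutoff=5.0):
    """Return residue numbers with any atom within `cutoff` Å of the ligand.

    residue_atoms: dict mapping residue number -> list of (x, y, z) tuples
    ligand_atoms:  list of (x, y, z) tuples
    The 5.0 Å default is an illustrative choice, not a FluSurver setting.
    """
    close = []
    for number, atoms in residue_atoms.items():
        if any(
            math.dist(atom, lig) <= cutoff
            for atom in atoms
            for lig in ligand_atoms
        ):
            close.append(number)
    return sorted(close)
```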

Finally, the fifth database stores all mutation occurrence information. It is currently derived from viral sequences that are downloaded from the NCBI Virus Resource on a weekly basis. These sequences are aligned and compared with various reference sequences to count individual mutation occurrences as well as co-occurrences. Since flu sequences most often include the date of collection and geographical location, we provide this information in associated tables as well as a global occurrence map using the Google Maps API.
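
The counting step described above can be sketched as follows, assuming that the mutations of each downloaded sequence have already been called against a reference. The record layout (keys 'country', 'year' and 'mutations') and the function name are hypothetical and chosen only for this example.

```python
from collections import Counter
from itertools import combinations


def count_occurrences(records):
    """Count single mutations and co-occurring pairs across sequences.

    records: iterable of dicts with hypothetical keys 'country', 'year'
    and 'mutations' (a collection of mutation labels for one sequence).
    Returns (single_counts, pair_counts, by_country_year).
    """
    single = Counter()
    pairs = Counter()
    by_place_time = Counter()
    for rec in records:
        muts = sorted(set(rec["mutations"]))
        single.update(muts)
        # Each unordered pair is counted once per sequence.
        pairs.update(combinations(muts, 2))
        for mut in muts:
            by_place_time[(mut, rec["country"], rec["year"])] += 1
    return single, pairs, by_place_time
```

Sorting the labels before pairing ensures that the same pair is always stored under the same key, regardless of the order in which the mutations were listed.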

Will I be able to add information of the effects of a mutation not yet reflected by FluSurver?
Searching PubMed with the keywords "influenza" and "mutation" returns, on average, a new paper every 2 days. Since manual inspection of this flood of new papers is a tedious and difficult task, we are very happy to receive suggestions for new mutation effect reports. You may send an email to sebastianms@bii.a-star.edu.sg or leetc@bii.a-star.edu.sg and we will try to include the report in the FluSurver.

What are interestlevels?
We use "interestlevels" as a simplified classification of the estimated significance of a mutation based on expected or known effects. In the downloadable tabular output, the interestlevels are given as numbers ranging from 0 to 3, with 0 being the least significant and 3 the most significant. In the graphical output, mutations are color-coded according to their interestlevels (see below).

What do the colors of the mutations mean?
The mutations are color-coded according to the significance of their known or predicted biological effect. When no effects are known for the mutation, it appears in black font and is assigned interestlevel 0 (least significant). When the mutation is a common subtype marker, it appears in green font and is also assigned interestlevel 0. Mutations occurring at a site of interaction appear in blue font and are assigned interestlevel 1 (moderately significant). If the mutation occurs at a site known to be involved in drug binding or alters host-cell specificity, it appears in orange and is assigned interestlevel 2 (significant). Mutations also appear in orange with interestlevel 2 when a mutation at the equivalent site is known to result in antigenic shifts or to cause mild drug resistance. Mutations that create or remove a potential glycosylation site are colored magenta and assigned interestlevel 2. Only mutations that are known to alter the virulence of the virus, cause strong drug resistance or reverse the effects of the premature STOP codon in the PB1-F2 gene of pandemic H1N1 appear in red and are assigned interestlevel 3 (most significant).
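
The color scheme above can be summarized as a small lookup table, for example when post-processing the tabular output. The category keys are hypothetical labels chosen for this sketch, and the rule for mutations with several effects (take the highest interestlevel, keep the first listed on ties) is an assumption; the text does not state how FluSurver resolves such cases.

```python
# Effect categories paraphrased from the text above; the key names are
# hypothetical labels chosen for this sketch.
EFFECT_STYLE = {
    "no_known_effect": ("black", 0),
    "subtype_marker": ("green", 0),
    "interaction_site": ("blue", 1),
    "drug_binding_or_host_specificity": ("orange", 2),
    "antigenic_or_mild_resistance": ("orange", 2),
    "glycosylation_change": ("magenta", 2),
    "virulence_or_strong_resistance": ("red", 3),
    "pb1_f2_stop_reversal": ("red", 3),
}


def classify(effects):
    """Return (color, interestlevel) for the most significant effect.

    With no effects given, the mutation falls back to black / level 0.
    Ties keep the first effect listed, which is an assumption here.
    """
    styles = [EFFECT_STYLE[e] for e in effects] or [EFFECT_STYLE["no_known_effect"]]
    return max(styles, key=lambda style: style[1])
```

For example, `classify(["interaction_site", "glycosylation_change"])` returns `("magenta", 2)`.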

How are the global mutation data obtained?
Viral sequences are downloaded from the NCBI Virus Resource on a weekly basis. These sequences are aligned and compared with various reference sequences. Using associated information such as the date of collection and geographical location, the FluSurver generates global occurrence statistics for the relevant mutations.

I am uncertain about the information available for my mutation of interest. How can I find out more about the mutation?
A mutation summary can be accessed from the first output page by clicking on the respective mutation of interest. Further hyperlinks are provided within each report for additional details behind each annotation statement, including literature links where available. More information on how to use and interpret the mutation report can also be found in the tutorial.

I think my mutation of interest is causing some effects. However, there is very limited meaningful information from the literature. What else can be done?
You may write to us about your problem. The hosting research institute also offers more manual computational follow-up analyses to examine mutations, such as molecular dynamics simulations and other structure calculations (stability, drug binding, host receptor binding, glycosylation modelling) and a variety of bioinformatics approaches (whole genome phylogenetic analysis, monophyletic clade analysis, etc.), if there is mutual interest in a collaboration.

Do you have a tutorial on how to use the FluSurver?
Yes. The FluSurver tutorial can be found here.

How can I cite the FluSurver?
The manuscript for the FluSurver is currently in preparation. For now, if you find FluSurver useful, please drop us an email to let us know and cite the website URL (http://flusurver.bii.a-star.edu.sg). Please come back here for updates on how you may cite the FluSurver in future.

Who is behind the FluSurver?
The FluSurver was conceived by Sebastian Maurer-Stroh and has been developed together with his group at the A*STAR Bioinformatics Institute (BII) in Singapore since 2009. Many current and former (*) colleagues from BII contributed critically to its development and maintenance, including:

Sebastian Maurer-Stroh
Raphael Tze Chuen Lee
Vachiranee Limviphuvadh
Jianmin Ma
Fernanda L Sirota
Vithiagaran Gunalan
Swe Swe Thet Paing*
Narumol Doungpan*
Joy Xiang*
and of course our director Frank Eisenhaber, who continues to support the project with enthusiasm.

Further acknowledgements
The idea for FluSurver arose out of the need to make sense of the rapidly increasing number of influenza sequences resulting from the swine flu pandemic as well as from more generally available and cheaper sequencing methods. We are very grateful to our collaborators who provided sequences for analysis and helped shape FluSurver into a tool useful for a whole scientific community. These include, in chronological order, the Genome Institute of Singapore (GIS), INMEGEN Mexico City, National Public Health Laboratory (NPHL) of the Ministry of Health Singapore, IAL Sao Paulo, the WHO Collaborating Centre for Reference and Research on Influenza and the Global Initiative for Sharing All Influenza Data (GISAID).