- What is the FluSurver?
- What can it do?
- Best usage scenarios and common misconceptions.
- Special notes for using results in publications.
- What kind of information is being curated in the FluSurver project?
- Will I be able to add information on the effects of a mutation not yet reflected by FluSurver?
- What are interestlevels?
- What do the colors of the mutations mean?
- How are the global mutation data obtained?
- I am uncertain of the information available for my mutation of interest. How can I find out more about the mutation?
- I think my mutation of interest is causing some effects. However, there is very limited meaningful information in the literature. What else can be done?
- Do you have a tutorial on how to use the FluSurver?
- How can I cite the FluSurver?
- Who is behind the FluSurver?
- Further acknowledgements.
- What is the FluSurver?
The FluSurver is a research tool developed to help the influenza research community with the identification, analysis and interpretation of mutations in influenza sequences.
- What can it do?
The FluSurver allows researchers, clinician scientists and surveillance labs to rapidly screen their influenza sequences for potentially interesting mutations, in order to identify candidates for phenotypic changes or for special epidemiological relevance. For the latter, we provide the geographic and temporal frequency of occurrence as well as the co-occurrence of mutations. For phenotypic changes, we use our in-house database of curated literature annotations of mutation effects such as drug resistance, host receptor specificity, virulence, antigenic drift and antibody escape. We also show the position of the mutation(s) in structural models and highlight whether mutations are close to common drug, host receptor or antibody binding sites, or whether a mutation creates or removes a glycosylation motif. The FluSurver has already been instrumental in the discovery of new influenza strain variants with altered antiviral susceptibility, host specificity, glycosylation and antigenic properties.
Please also read the next two sections to avoid misinterpreting analysis results.
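To illustrate what "a glycosylation motif is lost or created" means, here is a minimal sketch. It assumes the commonly used N-linked glycosylation sequon N-X-S/T (X being any residue except proline) and two gap-free protein sequences of equal length. The function names are hypothetical, and this is not FluSurver's own implementation, which is not described here.

```python
import re

# N-X-S/T sequon where X is any residue except proline; the lookahead
# lets overlapping motifs (e.g. in "NNST") all be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def sequon_starts(protein: str) -> set[int]:
    """Return the 1-based positions of the N in every N-X-S/T motif (X != P)."""
    return {match.start() + 1 for match in SEQUON.finditer(protein)}


def glycosylation_changes(reference: str, mutant: str) -> tuple[set[int], set[int]]:
    """Compare two gap-free sequences of equal length.

    Returns (gained, lost): motif positions present only in the mutant,
    and motif positions present only in the reference.
    """
    if len(reference) != len(mutant):
        raise ValueError("sequences must have the same length")
    ref_sites = sequon_starts(reference)
    mut_sites = sequon_starts(mutant)
    return mut_sites - ref_sites, ref_sites - mut_sites


gained, lost = glycosylation_changes("MKNGSAKT", "MKAGSNKT")
print(gained, lost)  # {6} {3}
```

Working with set differences keeps the comparison symmetric: a single call reports both created and removed motifs.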
- Best usage scenarios and common misconceptions.
Our curated reference sequences, which are used to transfer annotations to equivalent mutations, consist mainly of strains that have recently infected humans. Therefore, the most fruitful and reliable results are obtained for current surveillance sequences that are closely related to the vaccine strains in use, including some candidate strains for avian flu and novel reassortant swine flu. While we may add more influenza viruses from animal hosts in the future, the reference set is currently clearly biased towards strains known to infect humans. We are open to adding further reference strains suggested by serious sources, and we regularly update the vaccine strains.
Relatedly, in the first step we do not run a BLAST search against all flu strains available in public databases, but only against the limited set of selected reference strains. This limitation is necessary because we annotate each reference strain, including manual quality control steps, with alignments to the other reference strains (to allow identification of equivalent positions), structural models, sites of small-molecule ligand or antibody binding, mutation occurrence statistics (including geo-mapping), etc. However, the FluSurver output provides a link for each sequence that lets you identify the best database hit in GenBank using our Tachyon search tool.
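The idea of comparing a query only against a small reference set can be sketched as follows. This is a simplified stand-in: it scores pre-aligned sequences by percent identity instead of running BLAST, and the reference names and sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


def closest_reference(query: str, references: dict[str, str]) -> tuple[str, float]:
    """Return (name, identity) of the reference sequence most identical to the query."""
    if not references:
        raise ValueError("reference set is empty")
    scored = {name: percent_identity(query, seq) for name, seq in references.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical, pre-aligned reference set
refs = {"ref_A": "MKAILVLLYT", "ref_B": "MKTIIALSYI"}
print(closest_reference("MKAILVLLYA", refs))  # ('ref_A', 90.0)
```

Keeping the reference set small is what makes it feasible to curate every reference by hand; the trade-off is that the closest hit is only the closest *annotated* strain, not the closest strain in GenBank.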
FluSurver is not suited to detecting reassortment. For several of the reference strains, only HA and NA sequences were available, which limits the ability to look at the other genes/segments in all possible contexts. Given that we only compare query sequences with a small set of annotated reference sequences, in most cases it makes no sense to interpret hits to different reference strains as evidence of reassortment. Other tools are available for this purpose, e.g. GiRaF or FluReF.
- Special notes for using results in publications.
The FluSurver research tool is mainly intended to highlight phenotypically or epidemiologically interesting candidate mutations for further research; ideally, its results should be combined with experimental testing and verification of any predicted phenotypes. Importantly, no direct diagnosis, assumed severity or recommendation on patient treatment should be based solely on these computational predictions. The FluSurver mutation effect annotations are based on knowledge transfer by similarity to mutations studied in specific sequence contexts, which in most cases will not be identical to the context of the user's input sequences. A simple rule applies: the closer your sequence is to the one in which the phenotype has been reported (e.g. <20 mutations for long and <10 mutations for short sequences), the more likely it is that a similar effect can be expected for your mutation.
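The rule of thumb above can be written as a small check. This sketch assumes the query and the sequence in which the phenotype was reported are already aligned to equal length. The text does not define where "long" ends and "short" begins, so the caller decides via a flag; the function name is hypothetical.

```python
def similar_enough(query: str, reported: str, long_sequence: bool) -> bool:
    """Rule of thumb: fewer than 20 differences for long, fewer than 10 for short sequences."""
    if len(query) != len(reported):
        raise ValueError("sequences must be aligned to equal length")
    # Count aligned positions where the two sequences differ;
    # a gap opposite a residue also counts as a difference.
    differences = sum(1 for q, r in zip(query, reported) if q != r)
    limit = 20 if long_sequence else 10
    return differences < limit


reported = "A" * 100
query = "C" * 12 + "A" * 88  # 12 differences
print(similar_enough(query, reported, long_sequence=False))  # False
print(similar_enough(query, reported, long_sequence=True))   # True
```

Passing, failing this check does not rule an effect in or out. It only signals how much weight the annotation transfer can reasonably carry.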
Before any potential phenotypic changes highlighted by FluSurver are included in a publication, they need to be substantiated by careful analysis and consideration of the evidence behind the assumed effect, by reading and understanding the associated literature (links provided in the mutation summary report) as well as any accompanying experimental, clinical and/or epidemiological data.
Naturally, given that the FluSurver results are purely
computationally derived and require careful expert
judgement, the unfiltered results are not suitable for
direct communication to the general public or any kind
of publication without proper peer review by the
influenza research community.
If you are in doubt about how to interpret or communicate the FluSurver results, please feel free to contact us (sebastianms@bii.a-star.edu.sg) for advice.
- What kind of information is being curated in the FluSurver project?
Although the user only sees the aggregated, cross-linked results in the FluSurver output, under the hood we use and curate five different databases. The first is a selection of reference sequences, consisting mainly of current or recent vaccine strains as well as strains of particular research interest and/or strains causing human infections. This database includes a curated MAFFT L-INS-i alignment of the reference strains as well as a residue position mapping that links equivalent mutation positions among strains. Importantly, it also disambiguates between the different numbering schemes in use (e.g. H3, H1, H1pdm literal...).
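A residue position mapping of this kind can be sketched by going through shared alignment columns. Given two sequences from the same multiple alignment (gaps written as "-"), a residue number in one strain maps to the residue number in the other that occupies the same column. This is an illustrative simplification with hypothetical function names. Real numbering schemes such as H3 numbering may also involve offsets (for example, for signal peptides) that are not modelled here.

```python
from typing import Optional


def column_to_residue(aligned: str) -> dict[int, int]:
    """Map 0-based alignment column -> 1-based residue number, skipping gap columns."""
    mapping = {}
    residue = 0
    for column, char in enumerate(aligned):
        if char != "-":
            residue += 1
            mapping[column] = residue
    return mapping


def map_position(aligned_from: str, aligned_to: str, position: int) -> Optional[int]:
    """Return the residue number in aligned_to equivalent to `position` in aligned_from.

    Returns None if the target sequence has a gap in that alignment column.
    """
    if len(aligned_from) != len(aligned_to):
        raise ValueError("both sequences must come from the same alignment")
    # Invert the mapping so a residue number gives its alignment column
    from_columns = {res: col for col, res in column_to_residue(aligned_from).items()}
    if position not in from_columns:
        raise ValueError(f"position {position} is outside the source sequence")
    return column_to_residue(aligned_to).get(from_columns[position])


print(map_position("MK-TLS", "MKATL-", 3))  # 4
print(map_position("MK-TLS", "MKATL-", 5))  # None
```

Returning None for gap columns matters in practice. An insertion in one strain has no equivalent position in the other, so no annotation should be transferred there.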
The second database stores information on mutations
that are known to affect drug resistance, alter
virulence, cause antigenic drift or host specificity
shifts as curated by our group from the literature.
This includes over 200 mutations with information
extracted from several hundred publications.
Accompanying information such as the subtype, host,
protein, strain and PubMed references for the mutation
effect are also provided.
The third database includes structural models for all
reference sequences (whenever a suitable template is
available). For this we developed a homology modelling
pipeline (using BLAST, MAFFT, MODELLER and our own
scripts to combine them) that creates structural
models for all proteins of the included reference
strains. As such an automated procedure can still produce errors in some models, we systematically check all models before uploading them to FluSurver. These models are used to highlight all mutated positions together in their structural context.
Related to the above, the fourth database is derived from another pipeline that annotates the structural positions of mutations. It processes all known influenza crystal structures in the PDB and identifies positions that are close to bound small molecules (such as drugs, host receptor sialic acids or carbohydrates) or proteins (such as antibodies or other host proteins), as well as positions involved in viral oligomerization. This structural interaction context of mutations is linked from the mutation summary.
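A common way to decide that a position is "close to" a bound molecule is a distance cutoff, and the sketch below uses that approach. A residue is flagged if any of its atoms lies within the cutoff of any ligand atom. The 4.0 Å default, the input format (plain coordinate tuples instead of parsed PDB files) and the function name are assumptions for illustration. The text does not state which criterion FluSurver's pipeline uses.

```python
import math

# A 3D atom position in Angstrom (hypothetical input format for this sketch)
Coordinate = tuple[float, float, float]


def residues_near_ligand(
    residue_atoms: dict[int, list[Coordinate]],
    ligand_atoms: list[Coordinate],
    cutoff: float = 4.0,
) -> set[int]:
    """Return residue numbers having at least one atom within `cutoff` of any ligand atom."""
    near = set()
    for residue, atoms in residue_atoms.items():
        # math.dist gives the Euclidean distance between two points
        if any(math.dist(atom, lig) <= cutoff for atom in atoms for lig in ligand_atoms):
            near.add(residue)
    return near


residues = {
    190: [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    226: [(10.0, 0.0, 0.0)],
    228: [(0.0, 3.5, 0.0)],
}
ligand = [(0.0, 0.0, 3.0)]
print(residues_near_ligand(residues, ligand))              # {190}
print(residues_near_ligand(residues, ligand, cutoff=5.0))  # {190, 228}
```

The brute-force double loop is fine for a single binding site. A large-scale pass over all PDB structures would more likely use a spatial index.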
Finally, the fifth database stores all mutation occurrence information. It is currently derived from viral sequences that are downloaded weekly from the NCBI Virus Resource. These sequences are aligned and compared with various reference sequences to count individual mutation occurrences as well as co-occurrences. Since flu sequence records usually include the date of collection and geographical location, we provide this information in associated tables as well as in a global occurrence map using the Google Maps API.
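Counting individual mutations and their co-occurrences can be sketched as follows. The sketch assumes each sequence is already aligned to a single reference of the same length. Mutations are written as reference residue, 1-based position and new residue (e.g. "H275Y"). This is a simplified illustration with hypothetical function names, not FluSurver's pipeline.

```python
from collections import Counter
from itertools import combinations


def call_mutations(reference: str, query: str) -> list[str]:
    """List substitutions of query vs reference, e.g. 'Y5H'.

    Positions with a gap in either sequence, or an ambiguous 'X' in the query, are skipped.
    """
    return [
        f"{r}{pos}{q}"
        for pos, (r, q) in enumerate(zip(reference, query), start=1)
        if r != q and "-" not in (r, q) and q != "X"
    ]


def count_occurrences(reference: str, queries: list[str]) -> tuple[Counter, Counter]:
    """Count single mutations and unordered mutation pairs across all query sequences."""
    singles: Counter = Counter()
    pairs: Counter = Counter()
    for query in queries:
        # Sorting gives each pair one canonical order, so (a, b) and (b, a) are not counted separately
        found = sorted(set(call_mutations(reference, query)))
        singles.update(found)
        pairs.update(combinations(found, 2))
    return singles, pairs


singles, pairs = count_occurrences("MKTAY", ["MKTAH", "MRTAH", "MRTAY", "MK-AY"])
print(singles)  # Counter({'Y5H': 2, 'K2R': 2})
print(pairs)    # Counter({('K2R', 'Y5H'): 1})
```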
- Will I be able to add information on the effects of a mutation not yet reflected by FluSurver?
A PubMed search with the keywords "influenza" and "mutation" shows a new paper appearing on average every 2 days. Since manually screening this flood of new papers is a tedious and difficult task, we are very happy to receive suggestions of new mutation effect reports. You may send an email to sebastianms@bii.a-star.edu.sg or leetc@bii.a-star.edu.sg and we will try to include them in the FluSurver.
- What are interestlevels?
We use "interestlevels" as a simplified classification of the estimated significance of a mutation based on its expected or known effects. In the downloadable tabular output, interestlevels are given as numbers from 0 to 3, with 0 being the least significant and 3 the most significant. In the graphical output, mutations are colored according to their interestlevels (see below).
- What do the colors of the mutations mean?
The mutations are color-coded according to the significance of their known or predicted biological effects. When there are no known effects for a mutation, it appears in black font and is assigned interestlevel 0 (least significant). When the mutation is a common subtype marker, it appears in green font and is assigned interestlevel 0 (least significant). Mutations occurring at a site of interaction appear in blue font and are assigned interestlevel 1 (moderately significant). If the mutation occurs at a site known to be involved in drug binding, or is known to alter host cell specificity, it appears in orange and is assigned interestlevel 2 (significant). Mutations also appear in orange with interestlevel 2 when a mutation at the equivalent site is known to result in antigenic drift or to cause mild drug resistance. Mutations that create or remove a potential glycosylation site are colored magenta and assigned interestlevel 2. Only mutations that are known to alter the virulence of the virus, cause strong drug resistance or reverse the effect of the premature STOP codon in the PB1-F2 gene of pandemic H1N1 appear in red and are assigned interestlevel 3 (most significant).
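The color and interestlevel rules above can be summarised as a lookup table. The category labels below are hypothetical names chosen for this sketch, not identifiers used by FluSurver. When several categories apply to one mutation, the sketch assumes the most significant one wins.

```python
# Hypothetical category labels mapped to (display color, interestlevel), following the rules above
CATEGORY_STYLE = {
    "no_known_effect": ("black", 0),
    "subtype_marker": ("green", 0),
    "interaction_site": ("blue", 1),
    "drug_binding_or_host_specificity": ("orange", 2),
    "antigenic_change_or_mild_drug_resistance": ("orange", 2),
    "glycosylation_site_gained_or_lost": ("magenta", 2),
    "virulence_strong_resistance_or_pb1f2_stop_reversal": ("red", 3),
}


def classify(categories: list[str]) -> tuple[str, int]:
    """Return (color, interestlevel) of the most significant category.

    Ties keep whichever category is listed first; no categories means 'no known effect'.
    """
    if not categories:
        return CATEGORY_STYLE["no_known_effect"]
    return max((CATEGORY_STYLE[c] for c in categories), key=lambda style: style[1])


print(classify(["interaction_site", "glycosylation_site_gained_or_lost"]))  # ('magenta', 2)
```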
- How are the global mutation data obtained?
Viral sequences are downloaded from the NCBI Virus Resource on a weekly basis. These sequences are aligned and compared with various reference sequences. Using associated information such as date of collection and geographical location, the FluSurver generates global occurrence statistics for the relevant mutations.
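Turning per-sequence mutation calls plus collection metadata into occurrence statistics is essentially a grouping step. The record layout (a mutation list, a collection date string and a country) is an assumption for this example. Because sequence metadata often carries only partial dates, the sketch groups by collection year.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SequenceRecord:
    # Hypothetical record layout: mutations already called against a reference
    mutations: list[str]
    collection_date: str  # e.g. "2009-04-28", "2009-04" or "2009"
    country: str


def occurrence_table(records: list[SequenceRecord], mutation: str) -> Counter:
    """Count sequences carrying `mutation`, grouped by (country, collection year)."""
    table: Counter = Counter()
    for record in records:
        if mutation in record.mutations:
            year = record.collection_date[:4]
            table[(record.country, year)] += 1
    return table


records = [
    SequenceRecord(["H275Y"], "2009-04-28", "Mexico"),
    SequenceRecord(["H275Y", "D222G"], "2009", "Singapore"),
    SequenceRecord(["D222G"], "2010-01", "Singapore"),
    SequenceRecord(["H275Y"], "2009-11", "Mexico"),
]
print(occurrence_table(records, "H275Y"))
```

A table like this, keyed by location and time, is the kind of input a map view or a temporal frequency plot needs.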
- I am uncertain of the information available for my mutation of interest. How can I find out more about the mutation?
A mutation summary can be accessed from the first output page by clicking on the respective mutation of interest. Each report provides further hyperlinks to the details behind each annotation statement, including literature links where available. More information on how to use and interpret the mutation report can be found in the tutorial.
- I think my mutation of interest is causing some effects. However, there is very limited meaningful information in the literature. What else can be done?
You may write to us about your problem. Where there is mutual interest in a collaboration, the hosting research institute also offers more manual computational follow-up analyses to examine mutations, such as molecular dynamics simulations and other structure calculations (stability, drug binding, host receptor binding, glycosylation modelling) and a variety of bioinformatics approaches (whole genome phylogenetic analysis, monophyletic clade analysis, etc.).
- Do you have a tutorial on how to use the FluSurver?
Yes. The FluSurver tutorial can be found here.
- How can I cite the FluSurver?
The manuscript for the FluSurver is currently in preparation. For now, if you find FluSurver useful, please drop us an email to let us know and cite the website URL (http://flusurver.bii.a-star.edu.sg). Please check back here for updates on how to cite the FluSurver in future.
- Who is behind the FluSurver?
The FluSurver was conceived by Sebastian Maurer-Stroh and has been developed together with his group at the A*STAR Bioinformatics Institute (BII) in Singapore since 2009. Many current and former (*) colleagues from BII contributed critically to its development and maintenance, including:
Sebastian Maurer-Stroh
Raphael Tze Chuen Lee
Vachiranee Limviphuvadh
Jianmin Ma
Fernanda L Sirota
Vithiagaran Gunalan
Swe Swe Thet Paing*
Narumol Doungpan*
Joy Xiang*
and of course our director Frank Eisenhaber who
continues supporting the project with enthusiasm.
- Further acknowledgements
The idea for FluSurver arose from the need to make sense of the rapidly increasing number of influenza sequences produced in the wake of the swine flu pandemic and by more widely available, cheaper sequencing methods. We are very grateful to our collaborators, who provided sequences for analysis and helped shape FluSurver into a tool useful for a whole scientific community. These include, in chronological order, the Genome Institute of Singapore (GIS), INMEGEN Mexico City, the National Public Health Laboratory (NPHL) of the Ministry of Health Singapore, IAL Sao Paulo, the WHO Collaborating Centre for Reference and Research on Influenza and the Global Initiative on Sharing All Influenza Data (GISAID).