Filter Analysis¶

With this form we offer different options for filtering your variants based on the fields provided by different databases and tools such as Annovar, Snpeff, VEP, dbSNP, 1000Genomes, Exome Sequencing Project and some others.

One Click¶

This method was implemented with the idea in mind that the doctors/researchers would always have an idea about the most possible inheritance type of the disease from the family in study. So after selecting beetween Recessive Homozygous, Recessive Heretozygous, Dominant or X-linked the use would get quickly a small list of variants that fits the selected inheritance mode.

We do this by defining the attributes according to the table describe above:

Family Analysis¶

This method was created to allow the analysis of exomes from families.

Formats¶

In order to fully understand this filtering process it’s usually required that you have some knowledge about some different formats, tools and scores of predictions and conservation.

Please before you start filtering your variants using Mendel,MD it’s recommended that you read the following links:

VCF

http://1000genomes.org/wiki/Analysis/Variant Call Format/vcf-variant-call-format-version-41

SnpEff

http://snpeff.sourceforge.net/SnpEff_manual.html

VEP

http://www.ensembl.org/info/genome/variation/predicted_data.html#consequences http://www.ensembl.org/info/docs/tools/vep/vep_formats.html#output

Quick Start¶

The method of filtering variants is based on different criterias that are grouped by tabs.

We have 5 main tabs that offer you filtering options: Individuals, VCF, Annovar, SnpEff and VEP.

Each one of this tabs have different options that enables you to filter variants from your individual.

Individuals Options¶

On this section you have two columns “Select Variants From” and “Exclude Variants From”. The first column enables you to include individuals that you uploaded, snps from dbsnp138 and genes symbols from refseq that you want to see in your results. The second column is where you can choose also individuals, snps and genes that you want to exclude from your results.

VCF Options¶

This Desction let you select fields from your VCF File(s).

Mutation Types

With this field you can choose beetween mohozygous or heterzygous variants.

Ex. Homozygous (Ex. 1/1, 1/2, 2/3 and etc) => for recessive models on inheritance: Heterozygous (Ex. 0/1, 0/2, 0/2 and etc) => for dominant models on inheritance

Usage: This is usefull if you want you want to search for recessive and dominant models of inheritance.

Chr

With this field you can choose to see only variants from a single chromossome. Ex. X, Y, 1, 2 ... 22

-Pos This option let you choose variants in certain regions of a chromossome based in a string.

Ex. Chr 1: Pos:12345-12360

(this will return all variants beetween the positions 12345-12360 of Chromossome 1)

Filter Type

With this option you can select variants based on the FILTER column of your VCF.

Ex. PASS, LowQual, repeat, SnpCluster Usage: You can select variants that are only PASS in you VCF file(s)

dbSNP Build

With this option you can choose only variants after a certain build of last version of dbSNP Ex. >= 130 (will get only variants after dbsnp 130)

NOTE: dbSNP 129 is generally regarded as the last “clean” dbSNP without “contamination” from 1000 Genomes Project and other large-scale next-generation sequencing projects. Many published papers utilize dbSNP129 only. Source: http://www.openbioinformatics.org/annovar/annovar_filter.html#dbsnp

Read Depth

This option should be choosen based on the mean coverage of the individuals (exomes, genomes) that you selected under the previous section “Individuals” Ex. >= 30 (This will select only variants with more than 29 reads of coverage and can be used for filtering an exome with 80X of coverage)

Qual

QUAL phred-scaled quality score for the assertion made in ALT. i.e. -10log_10 prob(call in ALT is wrong). If ALT is ”.” (no variant) then this is -10log_10 p(variant), and if ALT is not ”.” this is -10log_10 p(no variant). High QUAL scores indicate high confidence calls. Although traditionally people use integer phred scores, this field is permitted to be a floating point to enable higher resolution for low confidence calls if desired. If unknown, the missing value should be specified. (Numeric) This option let you choose beetwen the quality score of the variant based on the value provided by VCF. Ex. >= 50

Source: www.1000genomes.org/wiki/Analysis/Variant Call Format/vcf-variant-call-format-version-41

Records per gene

This option let you choose how many different variants per gene you want to see in your results Ex. <= 2 (will show only variants in genes that have less than or equal 2 variants per gene for each individual)

Show only records in genes common to all the individuals selected

With this option you can search for variants in genes that are common to all individuals selected.

Annovar Options¶

List of fields that can be filtered in Mendel,MD: all from annovar, snpeff and VEP.

Annovar Fields

Source: http://www.openbioinformatics.org/annovar/annovar_db.html

Func.refGene
Gene.refGene
ExonicFunc.refGene
AAChange.refGene
phastConsElements46way
genomicSuperDups
esp6500si_all
1000g2012apr_all
LJB2_SIFT
LJB2_PolyPhen2_HDIV
LJB2_PP2_HDIV_Pred
LJB2_PolyPhen2_HVAR
LJB2_PolyPhen2_HVAR_Pred
LJB2_LRT
LJB2_LRT_Pred
LJB2_MutationTaster
LJB2_MutationTaster_Pred
LJB_MutationAssessor
LJB_MutationAssessor_Pred
LJB2_FATHMM
LJB2_GERP++
LJB2_PhyloP
LJB2_SiPhy
cosmic65
avsift

Description of the Fields¶

Func refGene¶

With this option you can choose the region of your variant: Ex. [‘splicing’, ‘UTR5’, ‘ncRNA_exonic’, ‘intergenic’, ‘intronic’, ‘UTR3’, ‘exonic’, ‘upstream’, ‘ncRNA_intronic’]

Gene.refGene¶

Description: Known human protein-coding and non-protein-coding genes taken from the NCBI RNA reference sequences collection (RefSeq)

Usage: With this option you can choose the gene Simbol of a variant. Ex. SUCLA2

Source: http://www.ncbi.nlm.nih.gov/refseq/

ExonicFunc.refGene¶

Description: Exonic variant function (non-synonymous, synonymous, etc)

Usage:[‘frameshift_insertion’, ‘frameshift_deletion’, ‘frameshift_block_substitution’, ‘stopgain’, ‘stoploss’, ‘nonframeshift_insertion’, ‘nonframeshift_deletion’, ‘nonframeshift_block_substitution’, ‘nonsynonymous_SNV’, ‘synonymous_SNV’, ‘unknown’]

AAChange.refGene¶

Description: Amino acid changes

phastConsElements46way¶

Description: Conserved elements produced by the phastCons program based on a whole-genome alignment of vertebrates. There is no specific recommended cutoff for highly conserved elements.

PhastCons (which has been used in previous Conservation tracks) is a hidden Markov model-based method that estimates the probability that each nucleotide belongs to a conserved element, based on the multiple alignment. It considers not just each individual alignment column, but also its flanking columns. By contrast, phyloP separately measures conservation at individual columns, ignoring the effects of their neighbors. As a consequence, the phyloP plots have a less smooth appearance than the phastCons plots, with more “texture” at individual sites. The two methods have different strengths and weaknesses. PhastCons is sensitive to “runs” of conserved sites, and is therefore effective for picking out conserved elements. PhyloP, on the other hand, is more appropriate for evaluating signatures of selection at particular nucleotides or classes of nucleotides (e.g., third codon positions, or first positions of miRNA target sites).

Usage: With this option you can select only varinats that are conserved among 46 vertebrate species

Sources: http://compgen.bscb.cornell.edu/phast/phastCons-HOWTO.html http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=cons46way

genomicSuperDups¶

Description: Segmental duplications in genome. Duplications of >1000 Bases of Non-RepeatMasked Sequence (>90 percent similar)

Regions detected as putative genomic duplications. For overlap with each of those regions another chromosome and location are reported.

Usage: With this option you can exclude variants in regions of Segmental duplication

esp6500si_all¶

Description: alternative allele frequency in all subjects in the NHLBI-ESP project with 6500 exomes. Usage: With this option you can select only variants with a certain frequency in this databse. There is also an option to exclude all variants seen in this database. Source: http://evs.gs.washington.edu

1000g2012apr_all¶

Description: alternative allele frequency data in 1000 Genomes Project (1000g2012apr) Usage: With this option you can select only variants with a certain frequency in this databse. There is also an option to exclude all variants seen in this database. Source: http://1000genomes.org

LJB2_SIFT¶

Description: SIFT predicts whether an amino acid substitution affects protein function. SIFT prediction is based on the degree of conservation of amino acid residues in sequence alignments derived from closely related sequences, collected through PSI-BLAST. SIFT can be applied to naturally occurring nonsynonymous polymorphisms or laboratory-induced missense mutations.

Usage: Ex. <= 0.05

With this option you can select only variants predicted as “damaging” by LJB2_SIFT. You can also exclude all variants that does not have a sift score from your results. obs: remember that indels and frameshifts doesn’t have sift scores.

Source: http://sift.jcvi.org/

Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc. 2009;4(7):1073-81. PubMed PDF

LJB2_PolyPhen2_HDIV¶

Description: Whole-exome PolyPhen scores built on HumanDiv database (for complex phenotypes)

PolyPhen-2 (Polymorphism Phenotyping v2) is a tool which predicts possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations. Please, use the form below to submit your query.

Usage:

Source: http://genetics.bwh.harvard.edu/pph2/bgi.shtml

Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR. 2010. A method and server for predicting damaging missense mutations. Nature Methods 7: 248–249.