- Establish the correct gene(s)-variant pairs using a genome browser. Variants may also be located outside a gene; in that case, the nearest gene(s) to the variant will typically be included in the list. Obtain a reference SNP (rs) ID for all variants.
- Use database resources to explore what is known about the function of each gene and the protein it encodes (i.e. a few sentences summarising what the gene does), as well as its tissue expression profile. Prepare a table with each gene's potential implication in disease.
- Establish the functional impact of each sequence variant. For example, is the variant located in a coding region of the genome and, if so, does it alter an amino acid? Or does it overlap a regulatory element? If so, report the element (e.g. promoter).
- Are there any good proxy SNPs (r² ≥ 0.8) for the variants rs7636 and rs13107325?
- Report, where possible, the minor allele frequency (MAF) of each variant in the population, and reference the data source you used (e.g. URL of the database or repository). Use a database to report MAF in different population groups, i.e. HapMap / 1000 Genomes panels. Discuss whether you observe any frequency differences between populations.
- Investigate, using both the variant identifier and the gene name, whether there is a known association with one or more human traits (e.g. blood pressure), including disease.
- For those genes for which you have established an association with one or more human traits, report the number of known rare variants and how many of these rare variants are loss-of-function.
- What is the relationship to other nearby common variants in European-descent populations? In which common diseases might this variant play a role?
- Based on all the information assembled, assess whether the genes found in question 1 could be divided into subgroups underlying a specific trait or combination of traits.
- Are there any epigenetic effects known to be associated with the trait(s) of the subgroup(s) you defined (i.e. question 9)?
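
Two quantities in the questions above, the minor allele frequency and the r² threshold for a proxy SNP, can be illustrated with a minimal Python sketch. It assumes biallelic variants coded 0/1 and phased haplotypes; the function names and the toy data are hypothetical and purely illustrative. In practice these values would be looked up in a population database rather than computed by hand.

```python
def minor_allele_frequency(alleles):
    """Frequency of the less common allele at a biallelic site.

    `alleles` holds one 0/1 allele code per sampled chromosome.
    """
    if not alleles:
        raise ValueError("no alleles supplied")
    freq_alt = sum(alleles) / len(alleles)
    return min(freq_alt, 1.0 - freq_alt)


def ld_r_squared(haplotypes):
    """r^2 between two biallelic variants from phased haplotypes.

    `haplotypes` holds one (a, b) pair of 0/1 allele codes per chromosome.
    r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)), with D = pAB - pA * pB.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a variant is monomorphic")
    return d * d / denom


def is_good_proxy(haplotypes, threshold=0.8):
    """Apply the r^2 >= 0.8 proxy criterion used in the question list."""
    return ld_r_squared(haplotypes) >= threshold


# Toy data (made up): 10 chromosomes, two variants.
perfect_ld = [(1, 1)] * 4 + [(0, 0)] * 6
partial_ld = [(1, 1)] * 3 + [(0, 0)] * 6 + [(1, 0)] * 1

if __name__ == "__main__":
    print("MAF of first variant:", minor_allele_frequency([a for a, _ in partial_ld]))
    print("r^2 (perfect LD):", ld_r_squared(perfect_ld))
    print("r^2 (partial LD):", ld_r_squared(partial_ld))
    print("good proxy?", is_good_proxy(partial_ld))
```

Running the same functions separately on haplotypes from different population panels is one way to see why both MAF and proxy status can differ between populations.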