General workflows for the most common analyses
In this section, we describe example workflows for the most common analyses that researchers conduct with FinnGen data in the FinnGen Sandbox: endpoint analysis, survival time analysis, and genotype variant analysis.
See also the presentation of new FinnGen tools and their application to example diseases in the recording of the Users' meeting of 28 March 2023 (starting at 25 min 23 s).
Step 1. Create cohorts
Create a cohort based on medical codes: Make case and control cohorts for endpoints in Atlas. Set cohort entry events, inclusion criteria, exclusion criteria, filtering, and registries carefully, following the instructions. Check in particular that the exclusion criteria are set correctly.
Create a cohort based on genotype: Use the Genotype Browser to extract carriers and non-carriers. Design the genotype cohorts you will use in further analyses. For example, you may want to export minor homozygotes (1|1), WT homozygotes (0|0), and heterozygotes (0|1 and 1|0). To combine, for example, the two heterozygote cohorts (0|1 and 1|0) into one cohort, use the Operate Cohorts feature in the Cohort Operations tool. If the variant you are looking for is not in the Genotype Browser, see Genotypes from VCF files.
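The genotype grouping described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the Genotype Browser's export format or the Operate Cohorts implementation: the input dictionary, the sample IDs, and the cohort names are all hypothetical, and the sketch assumes, as the text does, that allele 1 is the minor allele.

```python
from collections import defaultdict

# Hypothetical mapping from phased genotype strings to cohort names.
# Both heterozygote phases (0|1 and 1|0) map to the same cohort.
GENOTYPE_TO_COHORT = {
    "0|0": "wt_homozygote",
    "0|1": "heterozygote",
    "1|0": "heterozygote",
    "1|1": "minor_homozygote",
}


def group_by_genotype(genotypes):
    """Group sample IDs into cohorts by genotype, skipping unrecognised calls."""
    cohorts = defaultdict(set)
    for sample_id, genotype in genotypes.items():
        cohort = GENOTYPE_TO_COHORT.get(genotype)
        if cohort is not None:
            cohorts[cohort].add(sample_id)
    return dict(cohorts)


# Made-up IDs and genotypes, for illustration only.
example = {"ID1": "0|0", "ID2": "0|1", "ID3": "1|0", "ID4": "1|1", "ID5": "./."}
cohorts = group_by_genotype(example)
```

Samples with a missing or unrecognised call (here `ID5`) are left out of every cohort rather than assigned to one by default.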
Step 2. Explore your cohorts with tools designed for the purpose.
For fast inspection of cohorts built in Atlas, use the Cohort Characterizations tool in Atlas. If needed, make the first improvements to the cohorts based on the Cohort Characterizations results.
Inspect your cohorts in detail using the Trajectory Visualization tool (TVT). Pay attention to the entry and exit events of the patients. Do the patients enter and exit the cohort as intended? Do the conditions appear in the right temporal intervals according to the inclusion and exclusion rules set in Atlas? Atlas is a powerful tool that can create very complex cohorts, so it is easy to select a wrong setting by accident.
Output files from the Genotype Browser can be uploaded and visualized using TVT.
Tip! If an individual appears interesting or looks like an outlier, you may use the LifeTrack tool to explore that person closely by viewing all of their medical codes in a single view.
Explore the cohorts with the Cohort Operations tool (CO). Compare the cohorts to the FinnGen endpoints. Do similar endpoints already exist? Explore which conditions and medicines are enriched in the cases compared to the controls by running CodeWAS analyses in the Cohort Operations tool. For genotype data, CodeWAS can be run, for example, for the rarer homozygotes compared to heterozygotes and WT homozygotes using the same instructions. See also the instructions on how to conduct PheWAS for rare variants in R. Consider whether the results make sense. Are the right conditions and medicines enriched in the cases compared to the controls? Are there conditions or medicines that should be included in or excluded from the cohorts? A clinician's help may be needed to interpret the CodeWAS results and to build the cohorts.
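To make the idea of "enrichment in cases compared to controls" concrete, the following sketch computes an odds ratio and a one-sided Fisher's exact p-value for a single code from a 2x2 table. It is not the statistical model used by CodeWAS in the Cohort Operations tool, which may differ; the function names and the counts are hypothetical and only illustrate what enrichment of one code means.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a = cases with the code,    b = cases without the code,
    c = controls with the code, d = controls without the code.
    Returns None when the ratio is undefined (zero denominator).
    """
    if b == 0 or c == 0:
        return None
    return (a * d) / (b * c)


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact p-value for enrichment of the code in cases.

    With all margins fixed, the number of cases carrying the code follows a
    hypergeometric distribution; the p-value is P(X >= a).
    """
    n_cases = a + b
    n_with_code = a + c
    total = a + b + c + d
    upper = min(n_cases, n_with_code)
    favourable = sum(
        comb(n_with_code, k) * comb(total - n_with_code, n_cases - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(total, n_cases)


# Made-up counts: 3 of 4 cases and 1 of 4 controls have the code.
example_or = odds_ratio(3, 1, 1, 3)
example_p = fisher_enrichment_p(3, 1, 1, 3)
```

A screen over many codes repeats such a test once per code, so the resulting p-values need correction for multiple testing before they are interpreted.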
For genotype variant analysis: Consider the results from CodeWAS. Are the cohorts of carriers and non-carriers sufficient for your study, or should the cohorts be modified using phenotypic information? Do carriers and non-carriers differ in diagnoses or medicine use in ways you did not expect? If so, you may use Atlas to build phenotypic cohorts for the diseases and medicines that arise from the CO results. You can then filter these phenotypes in or out of the genotype cohorts using the Operate Cohorts feature in the Cohort Operations tool.
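Filtering a phenotype in or out of a genotype cohort amounts to a set intersection or a set difference on the sample IDs. The sketch below shows that logic only; it is not how the Operate Cohorts feature is implemented, and the function name, cohort names, and IDs are hypothetical.

```python
def filter_cohort(genotype_ids, phenotype_ids, keep=True):
    """Filter a genotype cohort by membership in a phenotype cohort.

    keep=True  -> only members who are also in the phenotype cohort
    keep=False -> only members who are not in the phenotype cohort
    """
    genotype_ids = set(genotype_ids)
    phenotype_ids = set(phenotype_ids)
    if keep:
        return genotype_ids & phenotype_ids
    return genotype_ids - phenotype_ids


# Made-up cohorts, for illustration only.
carriers = {"ID1", "ID2", "ID3"}
on_medication = {"ID2", "ID9"}

carriers_on_medication = filter_cohort(carriers, on_medication, keep=True)
carriers_without_medication = filter_cohort(carriers, on_medication, keep=False)
```

The two results are disjoint and together give back the original carrier cohort, which is a quick sanity check after any such filtering.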
Step 3. Improve your cohorts until they are detailed.
Consider the results from TVT and CO. If needed, go back to Atlas and improve the cohorts based on these results. Then inspect the cohorts again using TVT and CO. Repeat steps 1 and 2 until you are satisfied with the cohorts. Help from a clinician may be needed to interpret the CodeWAS results and to build clinically meaningful cohorts.
Tip! If you need more complex filtering than Atlas allows, consider joining two or more cohorts with CO. You may create the cohorts in Atlas, import them into CO, and combine them in CO with the rules you select.
Step 4. Proceed to the downstream analysis
When the cohorts are ready and checked with TVT and CO, you may proceed to the downstream analyses. To select suitable software and a suitable model for your study, see How to run genome-wide association studies (GWAS). The easiest way to conduct a GWAS is to use the Custom GWAS tools. For these and other analyses not in the Custom GWAS tools, ready pipelines are available in the Sandbox. Using the pipelines requires some coding skills: users need to prepare part of the input files and run the pipeline themselves.
For Binary Phenotype analyses (yes/no for cases and controls): The easiest way to conduct a custom GWAS is to use the Custom GWAS tools or to launch Custom GWAS directly from the Cohort Operations tool. Pipelines to run GWAS in binary mode with REGENIE or SAIGE are also available.
For Quantitative Phenotype analyses (continuous variables for cases and controls): The easiest way to conduct a quantitative GWAS is to use the Custom GWAS CLI tool in quantitative mode. An input ID list can easily be prepared as a text file by using the cohort export function in the Cohort Operations tool. Pipelines to conduct the same GWAS analyses in quantitative mode with REGENIE are also available.
For Survival analyses: To run survival analyses, you need to prepare the input files yourself. You can run a survival analysis using a Cox model or by running a GWAS with survival models (GATE). See the instructions for preparing the files and running survival models with GATE. Tip! The ID list needed to build a phenotype-covariate file for GATE can be exported as a text file using the Cohort Operations tool.
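Survival models need, for each individual, an event indicator and a follow-up time. The sketch below shows one common way to derive these two values from dates. It is an assumption-laden illustration, not the GATE input specification: the function name is hypothetical, the choice of start date (for example birth or cohort entry) is up to the analyst, and the columns that GATE actually requires are described in its own instructions.

```python
from datetime import date


def time_to_event(start, end_of_followup, event_date=None):
    """Return (event, years) for one individual.

    event = 1 if the event occurred between start and end_of_followup,
    otherwise 0 (censored at end_of_followup). Events before the start of
    follow-up are rejected so that prevalent cases are handled explicitly.
    """
    if event_date is not None and event_date < start:
        raise ValueError("event before start of follow-up (prevalent case)")
    if event_date is not None and event_date <= end_of_followup:
        event, stop = 1, event_date
    else:
        event, stop = 0, end_of_followup
    years = (stop - start).days / 365.25
    return event, years


# Made-up dates, for illustration only.
with_event = time_to_event(date(2000, 1, 1), date(2010, 1, 1), date(2005, 1, 1))
censored = time_to_event(date(2000, 1, 1), date(2010, 1, 1))
```

Raising an error for events that precede the start of follow-up forces a deliberate decision on prevalent cases, which are often excluded from time-to-event analyses.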
For more instructions about other analyses and pipelines, see Running analyses in Sandbox.