Conditional analysis of Custom GWAS analyses
This page explains the following:
How is conditional analysis performed for Custom GWAS analyses?
How to get your endpoint analysed?
How to access the data?
What data is available and how is it structured?
Conditional analysis
In short, the pipeline selects regions that contain variants with p-values smaller than 5e-8 and runs conditional analysis on them using regenie. The results are collated and copied to the green library.
Conditional analysis is done by running association tests on a region with the conditioning variants included as covariates in the model. If any variant in the region still has a p-value smaller than 1e-6, the conditional analysis is run again with the variant with the lowest p-value added to the conditioning variants. This is repeated until either the maximum number of conditioning variants (10) is reached, or no variant in the region has a p-value smaller than 1e-6.
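The iteration described above can be sketched in Python as follows. This is only an illustration of the loop logic, not the pipeline's actual code: `run_association` is a hypothetical callback standing in for a regenie run, and starting from a single lead variant is an assumption.

```python
from typing import Callable, Dict, List

# Threshold and limit taken from the description above.
P_THRESHOLD = 1e-6
MAX_CONDITIONING_VARIANTS = 10


def stepwise_conditioning(
    lead_variant: str,
    run_association: Callable[[List[str]], Dict[str, float]],
) -> List[str]:
    """Return the conditioning variants in the order they were added.

    `run_association` is hypothetical: given the current conditioning
    variants, it returns {variant_id: conditional p-value} for the region.
    Starting from one lead variant is an assumption of this sketch.
    """
    conditioned = [lead_variant]
    while len(conditioned) < MAX_CONDITIONING_VARIANTS:
        pvals = run_association(conditioned)
        # Variants already used as covariates cannot be selected again.
        candidates = {v: p for v, p in pvals.items() if v not in conditioned}
        if not candidates:
            break
        best = min(candidates, key=candidates.get)
        if candidates[best] >= P_THRESHOLD:
            break  # no remaining variant passes the threshold
        conditioned.append(best)
    return conditioned
```

The loop stops for one of the two reasons given in the text: the list of conditioning variants reaches 10, or the best remaining p-value is no longer below 1e-6.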
The conditional analysis pipeline is available here: https://github.com/FINNGEN/regenie-pipelines/blob/master/wdl/conditional-analysis/custom_gwas_pipeline/regenie_conditional_custom_gwas.wdl
How to get your endpoint analysed?
You can request conditional analysis for your endpoint in the same way you can request finemapping. To have your custom GWAS conditionally analysed, send an email to our service desk (finngen-servicedesk(at)helsinki.fi) with the following information:
Request the analysis for your endpoint
endpoint name
FinnGen release
URL to the endpoint in user results Pheweb
For example:
For now, only release 11 endpoints are available for conditional analysis.
Data access
The data will be available in two places: the green library folder containing your imported GWAS endpoint summary statistics, and the userresults pheweb browser. Conditional analysis results will be in the bucket /green_library/finngen_R11/sandbox_custom_gwas/PHENOTYPE/conditional/.
Conditional analysis results are automatically uploaded to the userresults pheweb browser.
The conditional analysis results are shown in the region view. To get to a region view, go to your endpoint, and either click on a genome-wide significant variant in the Manhattan plot, or on the table below.
Available files
Conditional analysis results will be in the bucket /green_library/finngen_R11/sandbox_custom_gwas/PHENOTYPE/conditional/
Here is a table describing those files or directories:
Filename | Description |
---|---|
had_results | Contains "True" if the conditional analysis produced results, and "False" if it did not (i.e. no region outside the MHC had any variant with p-value < 5e-8) |
CUSTOM_GWAS_sql.merged.txt | Data file used to import conditional regions to a database used by pheweb |
PHENOTYPE.independent_snps.txt | File containing the independent signals in the endpoint |
data/PHENOTYPE_LOCUS_STEP.conditional | Files in the data folder that contain the conditional analysis output for the individual regions |
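As a minimal sketch of how the folder layout above might be used from Python, the helpers below build the results path for an endpoint and interpret the had_results flag. The function names and the `root` and `release` parameters are hypothetical, and how the green library is mounted in your environment is an assumption.

```python
from pathlib import Path


def conditional_dir(phenotype: str, release: str = "R11",
                    root: str = "/green_library") -> Path:
    """Build the results folder path following the pattern given above."""
    return (Path(root) / f"finngen_{release}" / "sandbox_custom_gwas"
            / phenotype / "conditional")


def parse_had_results(text: str) -> bool:
    """Interpret the contents of the had_results file ("True" or "False")."""
    return text.strip() == "True"


def has_results(directory: Path) -> bool:
    """Read had_results from a results folder and return it as a bool."""
    return parse_had_results((directory / "had_results").read_text())
```

Checking `has_results(conditional_dir("PHENOTYPE"))` before opening the other files avoids looking for outputs that were never produced.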
CUSTOM_GWAS_sql.merged.txt
This file is used when importing conditional analysis results to a pheweb database. It is a comma-separated file where every value is quoted. It does not have a header. The columns in the file are:
Column name | Column number | Column description |
---|---|---|
rel | 1 | FinnGen release, number |
type | 2 | type of finemapping data: one of finemap, conditional, susie |
phenocode | 3 | endpoint name |
chr | 4 | Chromosome, number |
start | 5 | Start of finemapped region, position in basepairs |
end | 6 | End of finemapped region, position in basepairs |
n_signals | 7 | Number of signals in region |
n_signals_prob | 8 | Probability of this number of signals: Not applicable to conditional analysis |
variants | 9 | variants that the region was conditioned with |
path | 10 | path to data file in pheweb file storage |
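Because the file has no header, the column names have to be supplied when reading it. The sketch below uses Python's standard `csv` module; the function name is hypothetical and the sample row is made up purely to show the shape of the data. All values are kept as strings, since the exact formatting of each field is not specified here.

```python
import csv
import io
from typing import Dict, Iterable, List

# Column order as listed in the table above.
SQL_COLUMNS = ["rel", "type", "phenocode", "chr", "start", "end",
               "n_signals", "n_signals_prob", "variants", "path"]


def read_merged_sql(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse the quoted, comma-separated, headerless file into dicts."""
    reader = csv.reader(lines, delimiter=",", quotechar='"')
    return [dict(zip(SQL_COLUMNS, row)) for row in reader if row]


# Made-up example row, for illustration only.
sample = ('"11","conditional","MY_PHENO","1","100","200","2","",'
          '"1_150_A_G","example/path.conditional"\n')
rows = read_merged_sql(io.StringIO(sample))
print(rows[0]["phenocode"], rows[0]["variants"])
```

For a real file, pass an open file handle instead of the `io.StringIO` object.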
PHENOTYPE.independent_snps.txt
This file is a tab-separated file without a header. It contains the lead SNPs of the independent signals in the endpoint. The columns in the file are:
Column name | Column number | Column description |
---|---|---|
variant id | 1 | Variant id in format "chrom_pos_ref_alt" |
beta | 2 | Effect size of variant |
sebeta | 3 | Standard error of effect size |
mlog10p | 4 | P-value of variant, transformed by -log10(p-value) |
beta_conditioned | 5 | Conditioned effect size |
sebeta_conditioned | 6 | Conditioned standard error |
mlog10p_conditioned | 7 | Conditioned transformed p-value |
conditioning_variants | 8 | Variants the model is conditioned on |
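A minimal sketch of reading this file in Python, assuming the column order in the table above. The function names are hypothetical, the key `variant_id` replaces the space in "variant id", and the sample line is made up for illustration. The helpers also show how to split a "chrom_pos_ref_alt" identifier and how to convert a -log10 p-value back to a p-value.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

# Column order as listed in the table above.
SNP_COLUMNS = ["variant_id", "beta", "sebeta", "mlog10p",
               "beta_conditioned", "sebeta_conditioned",
               "mlog10p_conditioned", "conditioning_variants"]


def read_independent_snps(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse the headerless, tab-separated file into dicts of strings."""
    reader = csv.reader(lines, delimiter="\t")
    return [dict(zip(SNP_COLUMNS, row)) for row in reader if row]


def split_variant_id(variant_id: str) -> Tuple[str, int, str, str]:
    """Split "chrom_pos_ref_alt" into its four parts."""
    chrom, pos, ref, alt = variant_id.split("_")
    return chrom, int(pos), ref, alt


def mlog10p_to_p(mlog10p: float) -> float:
    """Invert the -log10 transform to recover the p-value."""
    return 10.0 ** (-mlog10p)


# Made-up example line, for illustration only.
sample = "1_100_A_G\t0.2\t0.03\t12.5\t0.19\t0.03\t11.0\t1_50_C_T\n"
snps = read_independent_snps(io.StringIO(sample))
print(split_variant_id(snps[0]["variant_id"]))
```

Very large -log10 values underflow to 0.0 when converted back, which is one reason the file stores the transformed value.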
data/PHENOTYPE_LOCUS_STEP.conditional
These files contain the regenie output for each of the steps in which the data was conditioned. Each file is tab-separated and has a header. The columns in the file are the following:
Column name | Column description |
---|---|
SNPID | variant identifier |
CHR | chromosome |
rsid | variant identifier |
POS | variant position, in basepairs |
Allele1 | reference allele |
Allele2 | alternate allele |
AF_Allele2 | alternate allele frequency |
p.value_cond | Conditioned p-value |
BETA_cond | Conditioned effect size |
SE_cond | Conditioned standard error |
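Since these files have a header, they can be read by column name. The sketch below finds the variant with the smallest conditioned p-value in one file. The function name is hypothetical and the sample rows are made up. Skipping rows whose p-value cannot be parsed is an assumption about how missing values might appear.

```python
import csv
import io
from typing import Dict, Iterable, Optional


def top_conditional_variant(lines: Iterable[str]) -> Optional[Dict[str, str]]:
    """Return the row with the smallest p.value_cond, or None if no row parses."""
    reader = csv.DictReader(lines, delimiter="\t")
    best = None
    best_p = float("inf")
    for row in reader:
        try:
            p = float(row["p.value_cond"])
        except (KeyError, TypeError, ValueError):
            continue  # assumption: unparseable p-values are skipped
        if p < best_p:
            best, best_p = row, p
    return best


# Made-up example content, for illustration only.
sample = (
    "SNPID\tCHR\trsid\tPOS\tAllele1\tAllele2\tAF_Allele2\t"
    "p.value_cond\tBETA_cond\tSE_cond\n"
    "1_100_A_G\t1\trs0\t100\tA\tG\t0.1\t0.5\t0.01\t0.02\n"
    "1_200_C_T\t1\trs1\t200\tC\tT\t0.2\t1e-7\t0.30\t0.05\n"
)
top = top_conditional_variant(io.StringIO(sample))
print(top["SNPID"], top["p.value_cond"])
```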