Conditional analysis of Custom GWAS analyses
This page explains the following:
How is conditional analysis performed for Custom GWAS analyses?
How to get your endpoint analysed?
How to access the data?
What data is available and how is it structured?
Conditional analysis
In short, the pipeline selects regions containing variants with p-values smaller than 5e-8 and runs conditional analysis on them using regenie. The results are collated and copied to the green library.
Conditional analysis is done by running association tests on a region with the conditioning variants included as covariates in the model. If any variant in the region still has a p-value smaller than 1e-6, the analysis is run again with the variant with the lowest p-value added to the conditioning variants. This is repeated until either the maximum number of conditioning variants (10) is reached or no variant in the region has a p-value below that threshold.
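The stepwise loop described above can be sketched in Python as follows. This is an illustration only, not the pipeline's actual code: `run_association` is a hypothetical callback standing in for a regenie run, and starting from the region's lead variant is an assumption.

```python
from typing import Callable, Dict, List

# Thresholds taken from the description above.
P_THRESHOLD = 1e-6
MAX_CONDITIONING_VARIANTS = 10


def stepwise_conditioning(
    run_association: Callable[[List[str]], Dict[str, float]],
    lead_variant: str,
) -> List[str]:
    """Return the conditioning variants chosen for one region.

    run_association is a hypothetical callback: given the current list of
    conditioning variants, it returns {variant_id: conditional p-value}
    for the other variants in the region.
    Assumption: conditioning starts from the region's lead variant.
    """
    conditioned = [lead_variant]
    while len(conditioned) < MAX_CONDITIONING_VARIANTS:
        pvalues = run_association(conditioned)
        if not pvalues:
            break  # no variants left to test
        best_variant = min(pvalues, key=pvalues.get)
        if pvalues[best_variant] >= P_THRESHOLD:
            break  # no remaining variant passes the threshold
        conditioned.append(best_variant)
    return conditioned
```

The loop stops on whichever comes first: ten conditioning variants, or a step in which no remaining variant has a conditional p-value below 1e-6.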
The conditional analysis pipeline is available here: https://github.com/FINNGEN/regenie-pipelines/blob/master/wdl/conditional-analysis/custom_gwas_pipeline/regenie_conditional_custom_gwas.wdl
How to get your endpoint analysed?
You can request conditional analysis for your endpoint in the same way as you can request it to be finemapped. To have your GWAS analysed, send an email to our service desk (finngen-servicedesk(at)helsinki.fi) with the following information:
Request the analysis for your endpoint
endpoint name
finngen release
URL to the endpoint in user results Pheweb
For now, only release 11 endpoints are available for conditional analysis.
Data access
The data will be available in two places: the green library folder containing your imported GWAS endpoint summary statistics, and the userresults pheweb browser. In the green library, conditional analysis results will be in the bucket /green_library/finngen_R11/sandbox_custom_gwas/PHENOTYPE/conditional/.
Conditional analysis results are automatically uploaded to the userresults pheweb browser.
The conditional analysis results are shown in the region view. To get to a region view, go to your endpoint and click on a genome-wide significant variant, either in the Manhattan plot or in the table below it.
Available files
Conditional analysis results will be in the bucket /green_library/finngen_R11/sandbox_custom_gwas/PHENOTYPE/conditional/
Here is a table describing those files or directories:
| File or directory | Description |
| --- | --- |
| had_results | Contains "True" if conditional analysis produced results. Contains "False" if there were no results (i.e. no region outside MHC had any variants with p-value < 5e-8) |
| CUSTOM_GWAS_sql.merged.txt | Data file used to import conditional regions to a database used by pheweb |
| PHENOTYPE.independent_snps.txt | File containing the independent signals in the endpoint |
| data/PHENOTYPE_LOCUS_STEP.conditional | Folder that contains the independent signals in individual regions |
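Before reading any result files, it can be useful to check had_results first. The sketch below assumes the green library is reachable as an ordinary directory (for example through a mount) and that had_results is a small text file holding only "True" or "False". The function names and the `green_library_root` parameter are hypothetical.

```python
from pathlib import Path


def conditional_dir(green_library_root: Path, phenotype: str) -> Path:
    """Build the results directory following the layout given above.

    green_library_root is an assumed local path to /green_library.
    """
    return (
        green_library_root
        / "finngen_R11"
        / "sandbox_custom_gwas"
        / phenotype
        / "conditional"
    )


def has_conditional_results(cond_dir: Path) -> bool:
    """Return True if the had_results file contains "True"."""
    flag = (cond_dir / "had_results").read_text().strip()
    return flag == "True"
```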
CUSTOM_GWAS_sql.merged.txt
This file is used when importing conditional analysis results to a pheweb database. It is a comma-separated file where every value is quoted. It does not have a header. The columns in the file are:
| Column | Number | Description |
| --- | --- | --- |
| rel | 1 | FinnGen release, number |
| type | 2 | Type of finemapping data: one of finemap, conditional, susie |
| phenocode | 3 | Endpoint name |
| chr | 4 | Chromosome, number |
| start | 5 | Start of finemapped region, position in basepairs |
| end | 6 | End of finemapped region, position in basepairs |
| n_signals | 7 | Number of signals in region |
| n_signals_prob | 8 | Probability of this number of signals. Not applicable to conditional analysis |
| variants | 9 | Variants that the region was conditioned on |
| path | 10 | Path to data file in pheweb file storage |
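Because the file has no header, the column names have to be supplied when reading it. A minimal sketch using Python's standard csv module is shown below. The function name is hypothetical, the column names come from the list above, and all values are left as strings.

```python
import csv
from typing import Dict, List

# Column names in file order, as listed above.
SQL_COLUMNS = [
    "rel", "type", "phenocode", "chr", "start", "end",
    "n_signals", "n_signals_prob", "variants", "path",
]


def read_conditional_regions(path: str) -> List[Dict[str, str]]:
    """Read the headerless, fully quoted, comma-separated file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=",", quotechar='"')
        # Skip blank lines; pair each value with its column name.
        return [dict(zip(SQL_COLUMNS, row)) for row in reader if row]
```

Quoting matters here: a quoted value may itself contain commas, which the csv reader handles but a plain split on "," would not.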
PHENOTYPE.independent_snps.txt
This is a tab-separated file without a header. It contains the lead SNPs of the independent signals in the endpoint. The columns in the file are:
| Column | Number | Description |
| --- | --- | --- |
| variant id | 1 | Variant id in format "chrom_pos_ref_alt" |
| beta | 2 | Effect size of variant |
| sebeta | 3 | Standard error of effect size |
| mlog10p | 4 | P-value of variant, transformed by -log10(p-value) |
| beta_conditioned | 5 | Conditioned effect size |
| sebeta_conditioned | 6 | Conditioned standard error |
| mlog10p_conditioned | 7 | Conditioned transformed p-value |
| conditioning_variants | 8 | Variants the model is conditioned on |
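The sketch below shows one way to read this file and convert the -log10 values back to p-values. The function names and the dictionary keys `variant_id`, `pval` and `pval_conditioned` are made up for illustration. Treating unparseable numeric fields as missing is an assumption, since the page does not say how missing values are encoded.

```python
import csv
from typing import Dict, List, Optional

# Column names in file order, as listed above.
SNP_COLUMNS = [
    "variant_id", "beta", "sebeta", "mlog10p",
    "beta_conditioned", "sebeta_conditioned",
    "mlog10p_conditioned", "conditioning_variants",
]
NUMERIC_COLUMNS = SNP_COLUMNS[1:7]


def to_float(value: str) -> Optional[float]:
    """Parse a number; return None for anything that is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


def read_independent_snps(path: str) -> List[Dict[str, object]]:
    """Read the headerless tab-separated file of lead SNPs."""
    rows = []
    with open(path, newline="") as handle:
        for fields in csv.reader(handle, delimiter="\t"):
            if not fields:
                continue
            row: Dict[str, object] = dict(zip(SNP_COLUMNS, fields))
            for key in NUMERIC_COLUMNS:
                row[key] = to_float(fields[SNP_COLUMNS.index(key)])
            # mlog10p = -log10(p), so p = 10 ** (-mlog10p).
            for src, dst in (("mlog10p", "pval"),
                             ("mlog10p_conditioned", "pval_conditioned")):
                value = row[src]
                row[dst] = None if value is None else 10.0 ** (-value)
            rows.append(row)
    return rows
```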
data/PHENOTYPE_LOCUS_STEP.conditional
Each of these files contains the regenie output for one conditioning step in one region. The file is tab-separated and has a header. The columns in the file are the following:
| Column | Description |
| --- | --- |
| SNPID | Variant identifier |
| CHR | Chromosome |
| rsid | Variant identifier |
| POS | Variant position, in basepairs |
| Allele1 | Reference allele |
| Allele2 | Alternate allele |
| AF_Allele2 | Alternate allele frequency |
| p.value_cond | Conditioned p-value |
| BETA_cond | Conditioned effect size |
| SE_cond | Conditioned standard error |
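As an illustration of working with one of these files, the sketch below lists the variants that still pass a p-value threshold after conditioning, most significant first. The function name is hypothetical. The default of 1e-6 mirrors the threshold described for the pipeline, and rows whose p-value cannot be parsed are skipped.

```python
import csv
from typing import Dict, List


def significant_conditional_variants(
    path: str, threshold: float = 1e-6
) -> List[Dict[str, str]]:
    """Return rows with p.value_cond below threshold, sorted by p-value."""
    hits = []
    with open(path, newline="") as handle:
        # The file has a header, so DictReader picks up the column names.
        for row in csv.DictReader(handle, delimiter="\t"):
            try:
                pval = float(row["p.value_cond"])
            except (ValueError, TypeError):
                continue  # missing or non-numeric p-value
            if pval < threshold:
                hits.append(row)
    hits.sort(key=lambda r: float(r["p.value_cond"]))
    return hits
```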