How to run the LDSC pipeline
The LDSC pipeline calculates heritabilities and genetic correlations for disease endpoints using ldsc. The complete documentation for the pipeline can be found on GitHub.
Example files
You can find example files (ldsc_sandbox.wdl and ldsc_sandbox.json) for running the pipeline in: /finngen/library-green/scripts/ldsc/
In the example .json file, you will first need to define a list of endpoints of interest (given in ldsc_rg.meta_fg) and their summary statistics (files and total sample sizes), tab-separated in the meta-table format with 3 columns (phenocode, path_to_phenocode, N_total). For example:
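A minimal sketch of such a meta-table, built with Python's standard library. The phenocodes, file paths and sample sizes are hypothetical placeholders, and whether the pipeline expects a header row is an assumption (pass header=False to write data rows only).

```python
import csv
import io

# Hypothetical endpoints: (phenocode, path_to_phenocode, N_total).
# Replace these placeholders with your own endpoints, paths and sample sizes.
ROWS = [
    ("ENDPOINT_A", "/path/to/ENDPOINT_A.munged.gz", 100000),
    ("ENDPOINT_B", "/path/to/ENDPOINT_B.munged.gz", 95000),
]


def build_meta_table(rows, header=True):
    """Return the meta-table as tab-separated text with 3 columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    if header:  # assumption: the pipeline accepts a header row
        writer.writerow(["phenocode", "path_to_phenocode", "N_total"])
    for phenocode, path, n_total in rows:
        writer.writerow([phenocode, path, int(n_total)])
    return buf.getvalue()


meta_text = build_meta_table(ROWS)
print(meta_text, end="")
```

The returned text can be written to a file, and the path of that file is then what ldsc_rg.meta_fg points to.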
NOTE! The number of columns in the summary statistics file is hardcoded. The variant column should be the column containing the SNP identifier, either chrom/pos or rsid; the script can handle multiple formats at the same time if needed.
With this pipeline, you can:
1) calculate heritability estimates and all pairwise genetic correlations for a list of endpoints by giving just one meta-table list in the .json file in ldsc_rg.meta_fg (make sure to comment out the ldsc_rg.comparison_fg line in this case), or
2) calculate heritability estimates for a list of endpoints (given in ldsc_rg.meta_fg) and their genetic correlations with endpoints given in another list (given in ldsc_rg.comparison_fg), such as the full list of endpoints in a given DF. However, the example .json file is only for the first scenario, so you will need to generate this file yourself.
3) calculate ONLY heritability estimates for a list of endpoints, by setting the parameter 'only_het' as True in the .json file.
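The three scenarios differ only in which inputs are set. A minimal sketch of how the corresponding entries of the .json file could be generated is shown here. It covers only the keys named above; the example .json may contain other keys that are not reproduced. The ldsc_rg. prefix on only_het is an assumption, and all file paths are hypothetical.

```python
import json


def build_inputs(meta_fg, comparison_fg=None, only_het=False):
    """Build the scenario-specific part of the inputs for the ldsc_rg workflow."""
    inputs = {"ldsc_rg.meta_fg": meta_fg}
    if comparison_fg is not None:
        # Scenario 2: correlate the meta_fg endpoints against a second list.
        inputs["ldsc_rg.comparison_fg"] = comparison_fg
    if only_het:
        # Scenario 3: heritability only. The "ldsc_rg." prefix is an assumption.
        inputs["ldsc_rg.only_het"] = True
    return inputs


# Scenario 1: a single meta-table; ldsc_rg.comparison_fg is left out entirely.
print(json.dumps(build_inputs("/path/to/meta_fg.tsv"), indent=2))
```

Leaving the ldsc_rg.comparison_fg key out of the generated file plays the same role here as commenting the line out by hand.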
Pre-munge your summary statistics file(s):
Before running the pipeline, make sure that the input summary statistics meet the requirements of ldsc's own munging step.
The required input format is as follows:
To get summary statistics (in REGENIE output format) into the right format, you can use the following example:
where $SUM_STATS is the path to your input summary statistics file, and $OUT_FILE is the name of your munged summary statistics file.
Note: If your summary statistics file is not in the same format as the FG summary statistics, please change the column names in the munging script to correspond to your columns.
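If editing the munging script is inconvenient, another option is to rename the columns of your own file so that they match what the script expects. The sketch below shows that alternative, not the munging script itself. The column names on both sides of COLUMN_MAP are hypothetical placeholders and need to be replaced with your actual columns and the names the script uses.

```python
import csv
import io

# Hypothetical mapping: your column name -> column name the munging script expects.
# Both sides are placeholders; edit them to match your data and the script.
COLUMN_MAP = {"MY_SNP_ID": "ID", "MY_EFFECT": "BETA", "MY_SE": "SE"}


def rename_columns(text, column_map, delimiter="\t"):
    """Rename header columns of delimited text; data rows are copied unchanged."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    header = next(reader)
    # Columns missing from the mapping keep their original name.
    writer.writerow([column_map.get(name, name) for name in header])
    writer.writerows(reader)
    return out.getvalue()


example = "MY_SNP_ID\tMY_EFFECT\tMY_SE\nrs123\t0.05\t0.01\n"
print(rename_columns(example, COLUMN_MAP), end="")
```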
Submit your job
You can submit your ldsc_rg pipeline job via the command line using the following command:
Output:
You'll find the heritability estimates for your endpoint(s) as one .tsv file in: /finngen/pipeline/cromwell/workflows/ldsc_rg/[WORKFLOW_ID]/call-gather_h2/[ldsc_rg.name]_[ldsc_rg_population].ldsc.heritability.tsv
and the pairwise genetic correlations, also as one .tsv file, in: /finngen/pipeline/cromwell/workflows/ldsc_rg/[WORKFLOW_ID]/call-gather_summaries/[ldsc_rg.name]_[ldsc_rg_population].ldsc.summary.tsv (genetic correlations are in column rg).
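Once the summary file exists, the rg column can be inspected with a few lines of Python. This is a minimal sketch that ranks endpoint pairs by absolute genetic correlation. Only the rg column name comes from the description above; the p1 and p2 column names and the values in the example are assumptions for illustration.

```python
import csv
import io
import math


def top_correlations(tsv_text, n=5):
    """Return up to n rows with the largest absolute rg, skipping unusable rg values."""
    rows = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        try:
            rg = float(row["rg"])
        except (KeyError, TypeError, ValueError):
            continue  # missing column, short row, or a non-numeric value such as "NA"
        if math.isnan(rg):
            continue
        row["rg"] = rg
        rows.append(row)
    return sorted(rows, key=lambda r: abs(r["rg"]), reverse=True)[:n]


# Made-up example content; p1 and p2 are assumed names for the endpoint columns.
example = "p1\tp2\trg\nA\tB\t0.25\nA\tC\t-0.80\nB\tC\tNA\n"
for row in top_correlations(example):
    print(row["p1"], row["p2"], row["rg"])
```

To use it on a real run, read the contents of the .ldsc.summary.tsv file into a string and pass that to top_correlations.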