How to run GWAS using regenie unmodifiable pipeline

Unmodifiable pipelines are predefined workflows that cannot be modified by the user. The advantage of running an unmodifiable pipeline over a standard modifiable pipeline is that results are delivered directly to the green library and the User results PheWeb browser. No download requests are needed, because the results of unmodifiable pipelines have been verified not to contain any individual-level data.

Running the regenie unmodifiable pipeline is very similar to running regenie in the Custom GWAS tool, with just a few small changes. The unmodifiable regenie pipeline is accessed in the sandbox from the Pipelines tool in the Applications menu, as described in the fourth step below.

The workflow for running the pipeline is:

First, prepare your phenotype file. The phenotype file should contain sample ID columns named FID and IID, and one column for each phenotype included. The FID and IID columns should both contain the FINNGENID, with one line per individual. If you are using any covariates you created, include them in this file as separate columns. These covariates cannot have the same names as the covariates available in the analysis covariate file.
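The layout described above can be sketched in Python as follows. This is only an illustration: the function name, the sample IDs and the values are made up, and the tab delimiter is an assumption, since the expected separator is not stated here.

```python
import csv
import io


def write_phenotype_file(handle, rows, phenotypes, delimiter="\t"):
    """Write one line per individual: FID, IID, then one column per phenotype.

    rows maps each FINNGENID to a dict of {phenotype name: value}.
    The tab delimiter is an assumption, not something stated in this guide.
    """
    writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["FID", "IID", *phenotypes])
    for finngen_id, values in rows.items():
        # FID and IID both carry the FINNGENID.
        writer.writerow([finngen_id, finngen_id, *(values[p] for p in phenotypes)])


# Made-up IDs and values, for illustration only.
example_rows = {
    "FG00000001": {"endpoint1": 1, "endpoint2": 0},
    "FG00000002": {"endpoint1": 0, "endpoint2": 1},
}

buffer = io.StringIO()
write_phenotype_file(buffer, example_rows, ["endpoint1", "endpoint2"])
phenotype_text = buffer.getvalue()
print(phenotype_text)
```

To write to disk instead, pass a file opened with open("phenotypefile.txt", "w", newline="") as the handle. Custom covariates would be added as further columns in the same way.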

Second, create a phenotype description file. Each line is a text description of one phenotype in your phenotype file. The descriptions are used when the endpoint is uploaded to the User results PheWeb browser. The file should have one line per phenotype column in the phenotype file (described above), with the descriptions in the same order as the phenotype columns.
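Because the descriptions are matched to phenotypes by position, it can help to generate the description file from the same ordered list of phenotype names that you use for the phenotype columns. A minimal sketch, with a hypothetical helper name and made-up descriptions:

```python
def build_description_lines(phenotypes, descriptions):
    """Return one description per phenotype, in the order of the phenotype columns.

    phenotypes is the ordered list of phenotype column names;
    descriptions maps each phenotype name to its text description.
    """
    missing = [p for p in phenotypes if p not in descriptions]
    if missing:
        raise ValueError("No description for: " + ", ".join(missing))
    lines = [descriptions[p] for p in phenotypes]
    for line in lines:
        # Each description has to fit on a single row of the file.
        if "\n" in line or "\r" in line:
            raise ValueError("Descriptions must not contain line breaks")
    return lines


# Made-up descriptions, for illustration only.
example_descriptions = {
    "endpoint2": "Description of endpoint2",
    "endpoint1": "Description of endpoint1",
}
description_lines = build_description_lines(
    ["endpoint1", "endpoint2"], example_descriptions
)
description_text = "\n".join(description_lines) + "\n"
print(description_text)
```

The resulting text can then be written to a file such as phenotypedescriptionfile.txt.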

Third, copy your phenotype file and phenotype description file to a bucket that is accessible in Google Cloud (locations in /home/ivm/ are not accessible in the cloud). We recommend copying these files to your sandbox's "IVM bucket", which is mapped to the internal location /finngen/red/ but is actually the Google Cloud bucket gs://fg-production-sandbox-X-red/, where X is your organisation's sandbox number. You can find this bucket location in the file buckets.txt on your sandbox desktop by looking for the line starting "Sandbox ivm bucket".

To copy the files, open a terminal and navigate to the folder where the files are located. Then run the command gsutil cp phenotypefile.txt phenotypedescriptionfile.txt gs://fg-production-sandbox-X-red/myuser/myfolder/ where

phenotypefile.txt and phenotypedescriptionfile.txt are the names of your phenotype and phenotype description files, which can also be gzipped
myuser is typically your username, but can also be another name of your choosing
myfolder is the name of the subfolder that you want to copy the files to. If it doesn't already exist, the gsutil command will create it when copying
X is the number of your sandbox

Once copied, make a note of the full bucket path of these files. You can get the full path by running the command gsutil ls gs://fg-production-sandbox-X-red/myuser/myfolder/, which will list the full bucket paths of all files that you copied.
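If you prefer to assemble the copy command and the resulting bucket paths in a script, the sketch below shows one way to do it. The helper names are hypothetical, and "X", "myuser" and "myfolder" are the same placeholders used in the text above. The script only builds and prints the command; it does not run it.

```python
import posixpath
import shlex


def bucket_folder(sandbox_number, user, folder):
    """Build the destination folder in the sandbox's IVM (red) bucket."""
    return f"gs://fg-production-sandbox-{sandbox_number}-red/{user}/{folder}/"


def gsutil_cp_command(files, destination):
    """Argument list for copying local files to the destination folder."""
    return ["gsutil", "cp", *files, destination]


def expected_bucket_paths(files, destination):
    """Full bucket paths the files should have after the copy."""
    return [destination + posixpath.basename(f) for f in files]


files = ["phenotypefile.txt", "phenotypedescriptionfile.txt"]
destination = bucket_folder("X", "myuser", "myfolder")  # placeholders from the text
command = gsutil_cp_command(files, destination)
paths = expected_bucket_paths(files, destination)

print(shlex.join(command))
for path in paths:
    print(path)
```

Inside the sandbox, the printed command can be pasted into a terminal, or the argument list can be passed to subprocess.run(command, check=True). Running gsutil ls on the destination remains the way to confirm the paths.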

Fourth, open the Pipelines tool from the Applications menu in your sandbox VM and select "Unmodifiable workflow" from the "Create a new job" menu. Find "UnmodifiableRegenieDF12" and select "Create", then scroll down to the "Input JSON" and edit the options (see below). See Fig. 1 for a visual demonstration of the process.

Fill in the following options in the Input JSON for the workflow (ensuring that the double quotes are kept):

regenie_unmod.pheno_file: This field should have the full bucket path of your phenotype file, created in the first step and copied in the third. E.g. gs://fg-production-sandbox-X-red/myuser/myfolder/phenotypefile.txt
regenie_unmod.phenolist: A comma-separated list of your phenotypes - these are the same as your phenotype column names in your phenotype file. Example value: "endpoint1,endpoint2"
regenie_unmod.phenodescriptionlist: This should be the full bucket path of the phenotype description file you created in the second step and copied in the third step. E.g. gs://fg-production-sandbox-X-red/myuser/myfolder/phenotypedescriptionfile.txt

In addition, there are some other options you may wish to edit, depending on your phenotype or the GWAS model you plan to use:

regenie_unmod.covariates: These are the covariates included in the analysis, as a comma-separated list. By default, this field contains the covariates used in the core analysis.
regenie_unmod.test: This should be either "additive", "recessive" or "dominant", depending on the type of test you want to use.
regenie_unmod.is_binary: This input should be set to "true" if your endpoint is a binary case-control endpoint, and "false" if it is quantitative.
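To reduce the chance of typos when filling in these fields, the values can be assembled and checked in Python first and then pasted into the Input JSON. The sketch below is an illustration only: the helper name is hypothetical, the bucket paths are the placeholder paths used above, and writing is_binary as a quoted string follows the wording of the option description, so keep whichever form the prefilled Input JSON already uses. The covariates field is left out because its default is meant to be kept unless you need to change it, and any other fields in the prefilled Input JSON should stay untouched.

```python
import json

ALLOWED_TESTS = ("additive", "recessive", "dominant")


def regenie_unmod_options(pheno_file, phenotypes, description_file, test, is_binary):
    """Collect the values to paste into the workflow's Input JSON."""
    if test not in ALLOWED_TESTS:
        raise ValueError("test must be one of: " + ", ".join(ALLOWED_TESTS))
    for path in (pheno_file, description_file):
        # Files under /home/ivm/ are not visible to the pipeline.
        if not path.startswith("gs://"):
            raise ValueError("Expected a full bucket path, got: " + path)
    return {
        "regenie_unmod.pheno_file": pheno_file,
        "regenie_unmod.phenolist": ",".join(phenotypes),
        "regenie_unmod.phenodescriptionlist": description_file,
        "regenie_unmod.test": test,
        # Quoted string form is an assumption based on the description above.
        "regenie_unmod.is_binary": "true" if is_binary else "false",
    }


options = regenie_unmod_options(
    pheno_file="gs://fg-production-sandbox-X-red/myuser/myfolder/phenotypefile.txt",
    phenotypes=["endpoint1", "endpoint2"],
    description_file="gs://fg-production-sandbox-X-red/myuser/myfolder/phenotypedescriptionfile.txt",
    test="additive",
    is_binary=True,
)
options_json = json.dumps(options, indent=2)
print(options_json)
```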

Fifth, once the JSON options have been edited, click "Submit" and navigate to the submitted job list by clicking "Show pipeline jobs" under the "Submitted job" header on the front page of the Pipelines tool. Your regenie GWAS job should appear at the top of the list with the name "regenie_unmod", ideally in the "Running" state. Make a note of the job ID, as you will need it to find your results.

On completion (job state "Succeeded"), the results are automatically copied to the User results PheWeb browser and to the green library folder specific to the data release: /finngen/library-green/finngen_R[RELEASE]/unmodifiable_pipelines/UnmodifiableRegenieDF[RELEASE]/workflow_id, where [RELEASE] is the data freeze number (e.g. 12) and workflow_id is the Pipelines tool ID of your job. For R12, for example, the results are copied to /finngen/library-green/finngen_R12/unmodifiable_pipelines/UnmodifiableRegenieDF12/workflow_id.

Alternatively, you can access the results outside the sandbox by navigating (using a web browser) to https://console.cloud.google.com/storage/browser/finngen-production-library-green/finngen_RX/unmodifiable_pipelines/UnmodifiableRegenieDFX where X is the FinnGen data freeze you ran the GWAS for (e.g. 12). You will need to log in with the same Google account that you use to access the FinnGen sandbox. On this page, look for the folder corresponding to your GWAS job ID and open it to access the GWAS output files.
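Both result locations follow the same pattern, so they can be derived from the data freeze number and the job ID. A small sketch with hypothetical helper names, where "workflow_id" stands in for your own job ID:

```python
def library_green_path(release, workflow_id):
    """Results folder as seen from inside the sandbox."""
    return (
        f"/finngen/library-green/finngen_R{release}"
        f"/unmodifiable_pipelines/UnmodifiableRegenieDF{release}/{workflow_id}"
    )


def console_browser_url(release):
    """Google Cloud console page listing the result folders for a data freeze."""
    return (
        "https://console.cloud.google.com/storage/browser/"
        f"finngen-production-library-green/finngen_R{release}"
        f"/unmodifiable_pipelines/UnmodifiableRegenieDF{release}"
    )


sandbox_path = library_green_path(12, "workflow_id")  # replace with your job ID
console_url = console_browser_url(12)
print(sandbox_path)
print(console_url)
```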
