Adding new covariates in GWAS using REGENIE and SAIGE

An example of how to make a covariate + phenotype file in R for a GWAS run with REGENIE or SAIGE.

# Read the covariate file from library-red into R
library(data.table)
cov_pheno = fread("path/finngen_R8_cov_1.0.txt.gz")
# `cases` is assumed to be a table of your cases with a FINNGENID column.
# Mark rows whose FinnGen ID is in the list of cases as 1 and all other rows as 0
cov_pheno$CASES = is.element(cov_pheno$FID, cases$FINNGENID)*1

If you are using all samples in the covariate file, then this is enough and your cov_pheno file is ready.
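
As a minimal sketch of the marking step above, the toy example below uses made-up IDs. The `FG...` identifiers and both toy tables are hypothetical, and a plain data.frame stands in for the table returned by `fread`. It also shows that cases without a row in the covariate file are dropped without warning.

```r
# Toy stand-in for the covariate file (hypothetical IDs, not real FinnGen data)
cov_pheno <- data.frame(FID = c("FG001", "FG002", "FG003", "FG004"),
                        SEX = c(0, 1, 1, 0))

# Toy list of cases; FG999 has no row in the covariate file
cases <- data.frame(FINNGENID = c("FG002", "FG004", "FG999"))

# The TRUE/FALSE membership test multiplied by 1 gives the 1/0 coding
cov_pheno$CASES <- is.element(cov_pheno$FID, cases$FINNGENID) * 1
print(cov_pheno)

# Case IDs that were not found in the covariate file
missing_cases <- setdiff(cases$FINNGENID, cov_pheno$FID)
print(missing_cases)
```

Comparing `missing_cases` with your expectations is one way to explain a case count that is lower than the length of your case list.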

If you are not using all samples in the covariate file, use the following code to mark controls and to set the phenotype to NA for the remaining samples.

# Mark rows whose FinnGen ID is in the list of controls as 1 and all other rows as 0
cov_pheno$CONTROLS = is.element(cov_pheno$FID, controls$FINNGENID)*1

# set 1 for cases, 0 for controls and NA for the rest
cov_pheno$ASTHMA = ifelse(cov_pheno$CASES == 1, 1, ifelse(cov_pheno$CONTROLS == 1, 0, NA))
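
The nested `ifelse` can be checked on a small toy table. This is only a sketch with hypothetical IDs, and a plain data.frame stands in for the real covariate table. Because the case test is evaluated first, a sample that appears in both lists would be coded as a case.

```r
# Toy data (hypothetical IDs): two cases, two controls, one sample in neither list
cov_pheno <- data.frame(FID = c("FG001", "FG002", "FG003", "FG004", "FG005"))
cases     <- data.frame(FINNGENID = c("FG001", "FG002"))
controls  <- data.frame(FINNGENID = c("FG003", "FG004"))

cov_pheno$CASES    <- is.element(cov_pheno$FID, cases$FINNGENID) * 1
cov_pheno$CONTROLS <- is.element(cov_pheno$FID, controls$FINNGENID) * 1

# 1 for cases, 0 for controls, NA for samples in neither list
cov_pheno$ASTHMA <- ifelse(cov_pheno$CASES == 1, 1,
                           ifelse(cov_pheno$CONTROLS == 1, 0, NA))
print(cov_pheno$ASTHMA)
```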

# Check that things have gone as expected. For instance, you may have a slightly smaller number
# of cases/controls if some samples have phenotype data but genotype data has not passed QC
sum(cov_pheno$CASES)
sum(cov_pheno$CONTROLS)
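
Two further checks may be useful here: a count of all three phenotype values, and a count of samples flagged as both case and control. The sketch below uses a hypothetical toy table, not real data.

```r
# Toy table with helper columns already filled in (hypothetical data)
cov_pheno <- data.frame(FID      = paste0("FG00", 1:6),
                        CASES    = c(1, 1, 0, 0, 0, 0),
                        CONTROLS = c(0, 0, 1, 1, 1, 0))
cov_pheno$ASTHMA <- ifelse(cov_pheno$CASES == 1, 1,
                           ifelse(cov_pheno$CONTROLS == 1, 0, NA))

# Count cases, controls and excluded (NA) samples in one call
counts <- table(cov_pheno$ASTHMA, useNA = "ifany")
print(counts)

# Number of samples flagged as both case and control; usually expected to be 0
overlap <- sum(cov_pheno$CASES == 1 & cov_pheno$CONTROLS == 1)
print(overlap)
```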

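# Remove the helper columns CASES and CONTROLS. fread returns a data.table, so the
# columns are deleted by reference; negative indexing with which() would not drop them here
cov_pheno[, c("CASES", "CONTROLS") := NULL]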

Once your file is ready, save the covariate + phenotype file to your folder under /home/ivm:

write.table(cov_pheno, file=gzfile("/home/ivm/folder_name/cov_pheno_forASTHMA.txt.gz"),
            sep="\t", quote=FALSE, row.names=FALSE, col.names=TRUE, na="NA")
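
Before copying the file anywhere, it can be worth reading it back to confirm that the header, the tab separator and the NA coding survived. The sketch below writes a toy table to a temporary file. The file name and the data are hypothetical, and base R `read.table` is used for the read-back so that no extra packages are needed.

```r
# Toy covariate + phenotype table (hypothetical data)
cov_pheno <- data.frame(FID = c("FG001", "FG002", "FG003"),
                        ASTHMA = c(1, 0, NA))

# Write to a gzipped, tab-separated file in a temporary directory
out_file <- file.path(tempdir(), "cov_pheno_example.txt.gz")
write.table(cov_pheno, file = gzfile(out_file),
            sep = "\t", quote = FALSE, row.names = FALSE, col.names = TRUE, na = "NA")

# Read the file back and inspect column names, types and missing values
check <- read.table(gzfile(out_file), header = TRUE, sep = "\t",
                    stringsAsFactors = FALSE)
str(check)
```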

Pipelines read files from the "red" bucket. To make the covariate + phenotype file available to a REGENIE or SAIGE pipeline, copy it to /finngen/red/ following the instructions in Sharing with your organization.

See an example script available in the green library. Path to example file in the Sandbox:

/finngen/library-green/scripts/code_snippets/Add_CustomPheno_to_COV.R
