Adding new covariates in GWAS using REGENIE and SAIGE
An example of how to make a covariate + phenotype file in R for a GWAS run using REGENIE and SAIGE
# read covariate file from library-red to R
library(data.table)
cov_pheno = fread("path/finngen_R8_cov_1.0.txt.gz")
# `cases` is assumed to be a table you have already loaded, with a FINNGENID column
# mark rows whose FinnGen ID is in your list of cases as 1 and all other rows as 0
cov_pheno$CASES = is.element(cov_pheno$FID, cases$FINNGENID)*1
If you are using all samples in the covariate file, then this is enough and your cov_pheno file is ready.
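As an illustration of the flagging step, here is a minimal sketch on toy data. The IDs and the contents of `cases` are made up; the only assumption carried over from the code above is that `cases` is a table with a FINNGENID column. The `setdiff` check is an optional extra that lists case IDs with no row in the covariate file.

```r
library(data.table)

# Toy stand-in for the covariate file (hypothetical IDs)
cov_pheno = data.table(FID = c("FG001", "FG002", "FG003", "FG004"))

# Toy stand-in for your case list; FG999 has no row in the covariate file
cases = data.table(FINNGENID = c("FG002", "FG004", "FG999"))

# 1 if the sample is in the case list, 0 otherwise
cov_pheno$CASES = is.element(cov_pheno$FID, cases$FINNGENID) * 1

# Case IDs that are absent from the covariate file
missing_cases = setdiff(cases$FINNGENID, cov_pheno$FID)

print(cov_pheno)
print(missing_cases)
```

In this toy data two of the three listed cases are flagged, and the third shows up in `missing_cases` because it has no row in the covariate table.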
If you are not using all samples in the covariate file, you can use the following code to mark controls and set the remaining samples to NA.
# mark rows whose FinnGen ID is in your list of controls as 1 and all other rows as 0
cov_pheno$CONTROLS = is.element(cov_pheno$FID, controls$FINNGENID)*1
# set 1 for cases, 0 for controls and NA for the rest
cov_pheno$ASTHMA = ifelse(cov_pheno$CASES == 1, 1, ifelse(cov_pheno$CONTROLS == 1, 0, NA))
# Check that things have gone as expected. For instance, you may have a slightly smaller number
# of cases/controls if some samples have phenotype data but genotype data has not passed QC
sum(cov_pheno$CASES)
sum(cov_pheno$CONTROLS)
# Remove the helper columns CASES and CONTROLS
# (fread returns a data.table, so the columns are dropped by reference with := NULL)
cov_pheno[, c("CASES", "CONTROLS") := NULL]
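The case/control/NA coding can also be checked on a small toy table before running it on the real file. The sketch below is a variant of the steps above, not the handbook's own script: the IDs, `cases` and `controls` are hypothetical, and it builds the phenotype directly from logical vectors instead of helper columns. It also guards against a sample appearing in both lists, since the nested `ifelse` would otherwise code such a sample as a case.

```r
library(data.table)

# Toy data with hypothetical IDs
cov_pheno = data.table(FID = c("FG001", "FG002", "FG003", "FG004", "FG005"))
cases = data.table(FINNGENID = c("FG001", "FG002"))
controls = data.table(FINNGENID = c("FG003", "FG004"))

is_case = cov_pheno$FID %in% cases$FINNGENID
is_control = cov_pheno$FID %in% controls$FINNGENID

# A sample listed as both case and control would silently be coded as a case
stopifnot(!any(is_case & is_control))

# 1 = case, 0 = control, NA = in neither list
cov_pheno[, ASTHMA := ifelse(is_case, 1, ifelse(is_control, 0, NA))]

# Tabulate the phenotype, showing NA as its own category
pheno_counts = table(cov_pheno$ASTHMA, useNA = "ifany")
print(pheno_counts)
```

Tabulating with `useNA = "ifany"` makes the excluded samples visible, which `sum()` on the helper columns alone does not show.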
Once your file is ready, save your covariate + phenotype file to your folder in home/ivm.
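A minimal sketch of the saving step with `fwrite` from data.table follows. The output path and file name are hypothetical placeholders, and the choice of a tab-separated file with missing values written as `NA` is an assumption about what the pipeline expects; check it against the example script before relying on it.

```r
library(data.table)

# Toy table standing in for the finished cov_pheno (hypothetical values)
cov_pheno = data.table(FID = c("FG001", "FG002", "FG003"),
                       ASTHMA = c(1, 0, NA))

# Hypothetical output path; replace with a file in your own folder
out_file = file.path(tempdir(), "my_cov_pheno.txt")

# Tab-separated, with missing phenotypes written as the string NA
# (fwrite writes an empty field for NA by default)
fwrite(cov_pheno, out_file, sep = "\t", na = "NA", quote = FALSE)

# Read the file back to confirm it round-trips
check = fread(out_file)
print(check)
```

Reading the file back with `fread` is a quick way to confirm that the column names and missing values survived the write.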
Pipelines read files from the "red" bucket. To make the covariate + phenotype file available to a REGENIE or SAIGE pipeline, copy it to /finngen/red/ following the instructions in Sharing with your organization.
See an example script available in the green library. Path to example file in the Sandbox: