Meta-analyses

Overview

GWAS summary statistics for matching phenotypes (assessed by ICD-10 code overlap) from FinnGen and other large-scale genotyping studies (e.g. UKBB) are meta-analyzed with a custom workflow using the inverse-variance weighted method.

Heterogeneity between studies is assessed with Cochran's Q test, and the resulting p-value is reported for each variant.
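The two statistics above can be sketched for a single variant as follows. This is a minimal illustration assuming a fixed-effect inverse-variance weighted model; it is not the code of the custom workflow, and the names `ivw_meta`, `chi2_sf` and `MetaResult` are hypothetical.

```python
import math
from typing import NamedTuple, Sequence


class MetaResult(NamedTuple):
    beta: float   # pooled effect size
    se: float     # standard error of the pooled effect
    p: float      # two-sided p-value of the pooled effect
    q: float      # Cochran's Q statistic
    p_het: float  # heterogeneity p-value (chi-square with k - 1 df)


def chi2_sf(x: float, df: int) -> float:
    """Upper tail probability of a chi-square distribution, integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # closed form for df = 1
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def ivw_meta(betas: Sequence[float], ses: Sequence[float]) -> MetaResult:
    """Combine per-study effect sizes for one variant (hypothetical helper)."""
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need matching betas and SEs from at least 2 studies")
    weights = [1.0 / se ** 2 for se in ses]    # inverse-variance weights
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    # Cochran's Q: weighted squared deviations from the pooled effect
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return MetaResult(beta, se, p, q, chi2_sf(q, len(betas) - 1))
```

For example, two studies with betas 0.1 and 0.3 and a standard error of 0.1 each give a pooled beta of 0.2 and Q = 2 with one degree of freedom.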

The meta-analysis summary statistics and the LD-clumping-based autoreporting outputs (see the topic Autoreporting - information of overlaps) are in the green library, in the analysis data for a given release. The files are in .tsv (tab-separated values) format; the summary statistics are compressed with gzip and indexed with tabix.
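As a rough sketch of how such a file could be read, the snippet below streams a gzip-compressed TSV row by row using only the Python standard library. The function name `iter_sumstats` is hypothetical, and no column names are assumed: each row is returned as a dict keyed by whatever header the file has.

```python
import csv
import gzip
from typing import Dict, Iterator


def iter_sumstats(path: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per variant row from a gzip-compressed TSV file."""
    with gzip.open(path, mode="rt", newline="") as handle:
        # The first line is treated as the header; values stay as strings.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield row
```

Streaming avoids loading a whole genome-wide file into memory. To fetch a single genomic region, the tabix index can be queried instead of scanning the full file.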

Fine-mapping is not performed because the methods cannot handle heterogeneity between studies (e.g. different LD structure or different imputation quality of variants).

Summary stat preprocessing

Prior to meta-analysis, sumstats are first filtered to remove any bad variants (e.g. missing or unrealistic beta/standard error/allele frequency).
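A filter of this kind might look like the sketch below. The text does not give the exact criteria, so the bounds `MAX_ABS_BETA` and `MAX_SE` and the function name `is_usable` are illustrative assumptions, not the values used in the workflow.

```python
import math
from typing import Optional

# Hypothetical bounds for "unrealistic" values; the real thresholds are not
# stated in the text.
MAX_ABS_BETA = 10.0
MAX_SE = 10.0


def is_usable(beta: Optional[float], se: Optional[float],
              af: Optional[float]) -> bool:
    """Return True if a variant has plausible beta, SE and allele frequency."""
    # Reject missing (None) and non-finite (NaN, inf) values.
    if any(v is None or not math.isfinite(v) for v in (beta, se, af)):
        return False
    if abs(beta) > MAX_ABS_BETA:
        return False
    if se <= 0.0 or se > MAX_SE:   # a standard error must be positive
        return False
    if not 0.0 < af < 1.0:         # frequency must lie strictly in (0, 1)
        return False
    return True
```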

Second, if needed, variants are lifted over to genome build GRCh38 to match the FinnGen variants. Liftover is performed with Picard, which, in addition to the standard genome position liftover, checks that variants match the reference genome and flips alleles/strand if necessary. Variants that fail to lift over for any reason (e.g. the variant doesn’t match the reference, or there is no target for the variant in the new build) are discarded.

Third, variants are aligned with the gnomAD reference, and allele frequencies are compared with those from the matching population. Variants with an allele frequency difference > 0.20 are discarded.
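The alignment and frequency check can be sketched as follows. Only the 0.20 threshold comes from the text. Handling swapped alleles by flipping the sign of beta and taking 1 - AF is an assumption about how alignment works, and the names `Variant` and `align_and_filter` are hypothetical.

```python
from typing import NamedTuple, Optional

MAX_AF_DIFF = 0.20  # threshold stated in the text


class Variant(NamedTuple):
    ref: str
    alt: str
    beta: float  # effect of the alt allele
    af: float    # frequency of the alt allele


def align_and_filter(v: Variant, ref_ref: str, ref_alt: str,
                     ref_af: float) -> Optional[Variant]:
    """Align a variant to the reference alleles, or return None to discard."""
    if (v.ref, v.alt) == (ref_ref, ref_alt):
        aligned = v
    elif (v.ref, v.alt) == (ref_alt, ref_ref):
        # Assumed handling: alleles are swapped, so flip effect and frequency.
        aligned = Variant(ref_ref, ref_alt, -v.beta, 1.0 - v.af)
    else:
        return None  # alleles do not match the reference variant
    if abs(aligned.af - ref_af) > MAX_AF_DIFF:
        return None  # frequency deviates too much from the reference
    return aligned
```

Comparing frequencies after alignment matters: a variant whose alleles are swapped would otherwise show a large spurious frequency difference and be discarded.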

FinnGen-UKBB meta-analysis

Endpoints from the pan-UKBB study are matched to FinnGen endpoints based on ICD-10 code overlap and meta-analysed together. Only perfectly matching endpoints are analysed. The European subset of the pan-UKBB study is used. For a subset of endpoints, we have redefined the pan-UKBB endpoints to perfectly match the corresponding FinnGen endpoints. See the readme and mapping files in the green library meta-analysis folder to find out which UKBB endpoints are custom defined. Most analysed endpoints match as originally defined, but we are continuously creating and analysing custom endpoints from the pan-UKBB study to match our FinnGen endpoints.

Note: In the FinnGen R6 - UKBB meta-analysis, a simpler liftover method (UCSC LiftOver) was used, with no reference alignment.

FinnGen R12 - UKBB meta-analysis results:

  • in the Sandbox:

/finngen/library-green/finngen_R12/finngen_R12_analysis_data/meta_analysis/ukbb/

  • Endpoint mapping:

/finngen/library-green/finngen_R12/finngen_R12_analysis_data/meta_analysis/FinnGen_R12_meta_analysis_mapping_with_definitions.tsv

FinnGen-MVP-UKBB meta-analysis

Endpoints from the VA Million Veteran Program (MVP) were matched to FinnGen endpoints based on ICD-10 code overlap and meta-analysed together. Unlike in the UKBB phenotype matching, we also considered imperfectly matching endpoints if the sample overlap (according to FinnGen) between the best-matching endpoints was over 90%. Additionally, we evaluated the concordance of the effect sizes of the top hits between the studies to check that the endpoints are a good match. For the matching FinnGen-MVP endpoint pairs, we also included the corresponding UKBB endpoint in the meta-analysis if available. From MVP, we included all summary statistics from the African, Admixed American, and European populations, if available.

Additionally, for each individual study's summary statistics, we filtered out variants with imputation r2 < 0.6.

FinnGen R12 - MVP - UKBB meta-analysis results:

  • in the Sandbox:

/finngen/library-green/finngen_R12/finngen_R12_analysis_data/meta_analysis/mvp_ukbb/

  • Endpoint mapping:

/finngen/library-green/finngen_R12/finngen_R12_analysis_data/meta_analysis/FinnGen_R12_MVP_UKBB_mapping.tsv
