Meta-analysis

Overview

GWAS summary statistics for phenotypes that match between FinnGen and other large-scale genotyping studies (e.g. UKBB), as assessed by ICD-10 code overlap, are meta-analyzed with a custom workflow using the inverse-variance weighted method.
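As a minimal sketch of what inverse-variance weighting does for a single variant (not the custom workflow itself), the fixed-effect estimate can be computed as below. The function name `ivw_meta` and the example effect sizes are illustrative assumptions, and the two-sided p-value assumes a normal approximation.

```python
import math

def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance weighted estimate for one variant.

    betas, ses: per-study effect sizes and standard errors (same order).
    Hypothetical helper, not part of the FinnGen workflow.
    """
    weights = [1.0 / se ** 2 for se in ses]           # w_i = 1 / se_i^2
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))            # two-sided normal p-value
    return beta, se, p

# Made-up example values for two studies
beta, se, p = ivw_meta([0.10, 0.14], [0.02, 0.04])
print(beta, se, p)
```

With these made-up inputs the more precise study dominates, so the combined beta (about 0.108) lies closer to 0.10 than to 0.14.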

Heterogeneity between studies is assessed with Cochran's Q test, and the resulting p-value is reported for each variant.
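Cochran's Q for one variant can be sketched as follows, assuming the Q statistic is compared against a chi-square distribution with (number of studies - 1) degrees of freedom. The helper names `chi2_sf` and `cochran_q` are hypothetical, and the chi-square tail is computed with the closed-form series for integer degrees of freedom so that only the standard library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2-1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2.0) / r
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: erfc term plus a finite series in sqrt(x)
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    tail = math.erfc(chi / math.sqrt(2.0))
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * total

def cochran_q(betas, ses):
    """Return (Q, p-value) for per-study effects; needs at least 2 studies."""
    if len(betas) < 2:
        raise ValueError("Cochran's Q needs at least two studies")
    weights = [1.0 / se ** 2 for se in ses]
    beta_ivw = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - beta_ivw) ** 2 for w, b in zip(weights, betas))
    return q, chi2_sf(q, len(betas) - 1)

# Made-up example values for two studies
q, p_het = cochran_q([0.10, 0.14], [0.02, 0.04])
print(q, p_het)
```

A small heterogeneity p-value suggests the per-study effect sizes differ more than their standard errors would explain.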

In addition to the meta-analysis statistics and heterogeneity p-value, the meta-analysis summary statistics files contain leave-one-out meta-analysis statistics. The files are in .tsv (tab-separated values) format, compressed with bgzip and indexed with tabix.
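Leave-one-out statistics can be illustrated as repeating the inverse-variance weighted estimate with each study dropped in turn. This is a sketch under that assumption; the function names, study labels, and numbers are made up for illustration and do not reflect the actual file columns.

```python
import math

def ivw(betas, ses):
    """Inverse-variance weighted beta and standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    return beta, math.sqrt(1.0 / total_w)

def leave_one_out(study_stats):
    """study_stats maps study name -> (beta, se) for one variant.

    Returns a dict keyed by the omitted study, holding the IVW
    (beta, se) computed from the remaining studies.
    """
    results = {}
    for left_out in study_stats:
        kept = [v for name, v in study_stats.items() if name != left_out]
        if not kept:          # nothing left to combine
            continue
        betas, ses = zip(*kept)
        results[left_out] = ivw(betas, ses)
    return results

# Made-up per-study statistics for a single variant
stats = {"FinnGen": (0.10, 0.02), "UKBB": (0.14, 0.04), "MVP": (0.12, 0.03)}
loo = leave_one_out(stats)
for study, (beta, se) in loo.items():
    print(f"without {study}: beta={beta:.4f} se={se:.4f}")
```

Comparing each leave-one-out estimate with the full meta-analysis shows whether a single study drives the association.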

Fine-mapping is not performed because the methods cannot handle heterogeneity between studies (e.g. different LD structure or different imputation quality of variants).

Summary stat preprocessing

Prior to meta-analysis, the summary statistics are first filtered to remove bad variants (e.g. variants with a missing or unrealistic beta, standard error, or allele frequency).
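The kind of sanity filter described above might look like the sketch below. The column names (`beta`, `sebeta`, `af_alt`), the `max_abs_beta` bound, and the exact validity rules are assumptions for illustration; the actual criteria used in the workflow are not specified here.

```python
import math

def is_valid_variant(row, max_abs_beta=1e6):
    """Return True if a summary-statistics row has usable values.

    row: dict of column name -> string, as read from a TSV file.
    Column names and thresholds are hypothetical.
    """
    try:
        beta = float(row["beta"])
        se = float(row["sebeta"])
        af = float(row["af_alt"])
    except (KeyError, TypeError, ValueError):
        return False                      # missing or non-numeric field
    if not all(math.isfinite(v) for v in (beta, se, af)):
        return False                      # NaN or infinite values
    # Assumed rules: bounded beta, positive SE, polymorphic allele frequency
    return abs(beta) < max_abs_beta and se > 0.0 and 0.0 < af < 1.0

rows = [
    {"beta": "0.12", "sebeta": "0.03", "af_alt": "0.25"},   # fine
    {"beta": "NA", "sebeta": "0.03", "af_alt": "0.25"},     # missing beta
    {"beta": "0.12", "sebeta": "0", "af_alt": "0.25"},      # zero SE
    {"beta": "0.12", "sebeta": "0.03", "af_alt": "1.2"},    # impossible AF
]
kept = [r for r in rows if is_valid_variant(r)]
print(len(kept))
```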

Second, if needed, variants are lifted over to genome build GRCh38 to match the FinnGen variants. Liftover is performed with Picard, which, in addition to the standard genome position liftover, checks that variants match the reference genome and flips alleles/strand if necessary. Variants that fail to lift over for any reason (e.g. the variant does not match the reference, or there is no target for the variant in the new build) are discarded.

Third, variants are aligned with the gnomAD reference, and allele frequencies are compared against those of the matching gnomAD population. Variants with an allele frequency difference > 0.20 are discarded.
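The allele frequency comparison can be sketched as a simple absolute-difference check. This assumes the study and reference frequencies already refer to the same allele after alignment; the function name and the example variants are hypothetical, while the 0.20 threshold comes from the text above.

```python
def passes_af_check(study_af, reference_af, max_diff=0.20):
    """Keep a variant unless its allele frequency differs from the
    reference population frequency by more than max_diff.

    Assumes both frequencies are for the same (aligned) allele.
    """
    return abs(study_af - reference_af) <= max_diff

# Made-up (variant id, study AF, reference AF) triples
variants = [
    ("var1", 0.30, 0.35),   # small difference, kept
    ("var2", 0.30, 0.55),   # difference of 0.25, discarded
    ("var3", 0.90, 0.10),   # likely allele mix-up, discarded
]
kept_ids = [vid for vid, af, ref_af in variants if passes_af_check(af, ref_af)]
print(kept_ids)
```

A large discrepancy often indicates an allele or strand mismatch rather than a true population difference, which is why such variants are dropped rather than meta-analyzed.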

FinnGen-UKBB meta-analysis

Endpoints from the pan-UKBB study were matched to FinnGen endpoints based on ICD-10 code overlap and meta-analysed together. Only perfectly matching endpoints were analysed. The European subset of the pan-UKBB study is used. For a subset of endpoints, we have redefined the pan-UKBB endpoints to perfectly match the corresponding FinnGen endpoints. See the readme and mapping files in the public data release folder to find out which UKBB endpoints are custom defined. Mainly, endpoints that already match are analysed, but we are continuously creating and analysing custom UKBB endpoints to match our FinnGen endpoints.

Release 12 contains in total 867 meta-analyzed phenotypes:

  • 854 binary endpoints

  • 3 continuous endpoints (HEIGHT, WEIGHT, BMI)

  • 10 progression/survival endpoints

The PheWeb browser is available at https://metaresults-ukbb.finngen.fi/

FinnGen-MVP-UKBB meta-analysis

Endpoints from the VA Million Veteran Program (MVP) were matched to FinnGen endpoints based on ICD-10 code overlap and meta-analysed together. Unlike in the UKBB phenotype matching, we also considered imperfectly matching endpoints if the sample overlap between the best matching endpoints, as measured in FinnGen, was over 90%. Additionally, we evaluated the concordance of the effect sizes of the top hits between the studies to check that the endpoints are a good match. For the matching FinnGen-MVP endpoint pairs, we also included the corresponding UKBB endpoint in the meta-analysis if available. From MVP, we included all available summary statistics from the African, Admixed American, and European populations.

Additionally, from each individual study's summary statistics, we filtered out variants with imputation r2 < 0.6.

Release 12 contains in total 330 meta-analysed binary phenotypes.

The PheWeb browser is available at https://mvp-ukbb.finngen.fi/
