Whole exome sequencing (WES) data

Data Location

Sandbox:

/finngen/library-red/wes_gnomad_v4_min_QC

Google Cloud bucket:

gs://finngen-production-library-red/wes_gnomad_v4_min_QC
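
To see which files the dataset directory actually holds, a minimal Python sketch such as the following could be run inside the Sandbox. It assumes the Sandbox path above is mounted as an ordinary directory; the helper name `list_dataset_files` is made up for illustration, and no file names are assumed.

```python
from pathlib import Path
from typing import List

# Sandbox location taken from this page.
SANDBOX_DIR = Path("/finngen/library-red/wes_gnomad_v4_min_QC")


def list_dataset_files(root: Path = SANDBOX_DIR) -> List[str]:
    """Return the sorted relative paths of all regular files under root.

    Returns an empty list if root does not exist or is not a directory.
    """
    if not root.is_dir():
        return []
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    for name in list_dataset_files():
        print(name)
```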

Content

This dataset contains whole exome sequence (WES) variants (not raw data) for 25,201 individuals from the following cohorts:

  • FINRISK: FR92, FR97, FR02, FR07

  • H2000

  • SUPER

The FINRISK cohorts contain the following subprojects:

  • FINRISK AD (N=324) sequenced at Broad

  • FINRISK IBD (N=945) sequenced at Broad

  • FINRISK controls for SUPER (FR02, N=849) sequenced at Broad

  • FINRISK FinMetSeq (N=10,085) sequenced at WashU

Key Details and Data Processing

  • The dataset includes several duplicate samples, labeled with the suffixes _dup1, _dup2, _dup3, and _dup4.

  • Genotypes were called at Broad for gnomAD v4, using the GRCh38 reference genome.

  • Variants were filtered at GQ > 20 and DP > 10, and variants with AC = 0 were removed.

  • Sample identity was verified by matching genetic duplicates to imputed data to ensure the same FINNGENID was used across sources.

  • Samples without matching FinnGen individuals were removed.
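
The duplicate labels and the filter thresholds above can be illustrated with a short Python sketch. This is not the pipeline that produced the data. It assumes that the GQ and DP thresholds apply per genotype and that AC is counted over the genotypes that pass, which this page does not state explicitly. The helper names, the suffix pattern, and the sample IDs in the usage note are hypothetical.

```python
import re
from typing import Iterable, Tuple

# Assumed suffix pattern: optional underscore, "dup", optional underscore,
# then digits at the end of the ID. The exact spelling used in the files
# is not confirmed here, so the pattern is deliberately tolerant.
_DUP_SUFFIX = re.compile(r"_?dup_?\d+$")

# One genotype: (number of alternate alleles, GQ, DP).
Genotype = Tuple[int, int, int]


def base_sample_id(sample_id: str) -> str:
    """Strip a trailing duplicate label, if present, from a sample ID."""
    return _DUP_SUFFIX.sub("", sample_id)


def passes_genotype_filter(gq: int, dp: int) -> bool:
    """Apply the thresholds stated on this page: GQ > 20 and DP > 10."""
    return gq > 20 and dp > 10


def allele_count(genotypes: Iterable[Genotype]) -> int:
    """Count alternate alleles over genotypes that pass the filter.

    Assumption: failing genotypes do not contribute to AC.
    """
    return sum(alt for alt, gq, dp in genotypes if passes_genotype_filter(gq, dp))


def keep_variant(genotypes: Iterable[Genotype]) -> bool:
    """A variant is kept only if its allele count is above zero."""
    return allele_count(genotypes) > 0
```

For example, `base_sample_id("FG0000001_dup1")` returns `"FG0000001"` (a made-up ID), and `keep_variant([(1, 20, 11)])` returns `False` because GQ = 20 does not exceed the threshold.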

Last updated