Genotype data

Chip genotype data processing and QC

Samples were genotyped with Illumina (Illumina Inc., San Diego, CA, USA) and Affymetrix arrays (Thermo Fisher Scientific, Santa Clara, CA, USA).

Genotype calls were made with the GenCall and zCall algorithms for Illumina data and the AxiomGT1 algorithm for Affymetrix data.

Chip genotyping data produced with previous chip platforms and reference genome builds were lifted over to build version 38 (GRCh38/hg38) following the protocol described here: dx.doi.org/10.17504/protocols.io.xbhfij6.

Quality control

In sample-wise quality control steps, individuals with ambiguous gender, high genotype missingness (>5%), excess heterozygosity (±4 SD) or non-Finnish ancestry were excluded. In variant-wise quality control steps, variants with high missingness (>2%), a low Hardy-Weinberg equilibrium (HWE) P-value (<1e-6) or a low minor allele count (MAC<3) were excluded.
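The thresholds above can be illustrated with a minimal Python sketch. It is not the pipeline's actual code. It assumes genotypes are coded as alternate-allele counts (0, 1, 2) with None for a missing call, and it uses a chi-square approximation for the HWE test, which may differ from the test used in production. The function names are hypothetical, and the gender and ancestry checks are not covered.

```python
import math
from statistics import mean, pstdev

# Thresholds quoted in the text above.
MAX_SAMPLE_MISSINGNESS = 0.05
HET_SD_LIMIT = 4.0
MAX_VARIANT_MISSINGNESS = 0.02
MIN_HWE_P = 1e-6
MIN_MAC = 3


def sample_missingness(genotypes):
    """Fraction of missing calls for one sample."""
    return sum(g is None for g in genotypes) / len(genotypes)


def heterozygosity(genotypes):
    """Fraction of heterozygous calls among the non-missing calls."""
    called = [g for g in genotypes if g is not None]
    return sum(g == 1 for g in called) / len(called) if called else 0.0


def failing_samples(matrix):
    """Indices of samples (rows) failing the missingness or heterozygosity check."""
    het = [heterozygosity(row) for row in matrix]
    mu, sd = mean(het), pstdev(het)
    failed = set()
    for i, row in enumerate(matrix):
        if sample_missingness(row) > MAX_SAMPLE_MISSINGNESS:
            failed.add(i)
        elif sd > 0 and abs(het[i] - mu) > HET_SD_LIMIT * sd:
            failed.add(i)
    return failed


def hwe_p_value(n_hom_ref, n_het, n_hom_alt):
    """Chi-square (1 df) HWE P-value; an approximation, not an exact test."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)
    q = 1 - p
    if p == 0 or q == 0:
        return 1.0  # monomorphic: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def variant_passes(column):
    """True if one variant (a column of genotypes) passes all three filters."""
    called = [g for g in column if g is not None]
    if not called:
        return False
    missingness = 1 - len(called) / len(column)
    alt_count = sum(called)
    mac = min(alt_count, 2 * len(called) - alt_count)
    counts = (called.count(0), called.count(1), called.count(2))
    return (
        missingness <= MAX_VARIANT_MISSINGNESS
        and mac >= MIN_MAC
        and hwe_p_value(*counts) >= MIN_HWE_P
    )
```

In practice these filters would normally be applied with tools such as Plink or Hail, both listed under "Software used"; the sketch only makes the arithmetic behind each threshold explicit.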

Pre-phasing

Before imputation, chip-genotyped samples were pre-phased with Eagle 2.3.5 using the default parameters, except for the number of conditioning haplotypes, which was set to 20,000.

Genotype imputation

Genotype imputation was done with the population-specific SISu v4.0 reference panel.

The reference panel variant call set was produced with the GATK HaplotypeCaller algorithm by following GATK best practices for variant calling.

Genotype-, sample- and variant-wise QC was carried out iteratively by using the Hail framework v0.2 and the resulting high-quality WGS data for 8,554 individuals were phased with Eagle 2.3.5 as described in the previous section.

Genotype imputation was carried out by using the population-specific SISu v4.0 imputation reference panel with Beagle 4.1 (version 08Jun17.d8b) as described in the following protocol: dx.doi.org/10.17504/protocols.io.xbgfijw.

Post-imputation quality control involved checking that the imputation INFO-value distribution conformed to expectations, comparing MAF between the target dataset and the imputation reference panel, and checking the chromosomal continuity of the imputed genotype calls.
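As an illustration of the MAF comparison, the sketch below flags variants whose MAF differs between the target dataset and the reference panel by more than a cutoff. The cutoff of 0.05, the function name and the variant identifiers are hypothetical; the text does not state which criterion was actually used.

```python
def flag_maf_discrepancies(target_maf, reference_maf, max_abs_diff=0.05):
    """Return IDs of shared variants whose MAF differs by more than max_abs_diff.

    target_maf and reference_maf map a variant ID to its minor allele frequency.
    max_abs_diff is an illustrative cutoff, not a value taken from the pipeline.
    """
    shared = sorted(target_maf.keys() & reference_maf.keys())
    return [
        variant_id
        for variant_id in shared
        if abs(target_maf[variant_id] - reference_maf[variant_id]) > max_abs_diff
    ]


# Toy input with made-up variant IDs and frequencies.
target = {"chr1_1000_A_G": 0.10, "chr1_2000_C_T": 0.30, "chr1_3000_G_A": 0.02}
reference = {"chr1_1000_A_G": 0.11, "chr1_2000_C_T": 0.18, "chr2_1_A_T": 0.40}
flagged = flag_maf_discrepancies(target, reference)
```

Only variants present in both inputs are compared, so variants missing from the reference panel would need to be handled separately.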

Genotypes

FinnGen individuals were genotyped with Illumina and Affymetrix chip arrays (Illumina Inc., San Diego, and Thermo Fisher Scientific, Santa Clara, CA, USA).

Chip genotype data were imputed using the population-specific SISu v4.0 imputation reference panel of 8,554 whole genomes.

The merged imputed genotype data comprise 96 data sets that include samples from multiple cohorts.

  • Total number of individuals: 356,213

  • Total number of variants (merged set): 20,175,454

  • Reference assembly: GRCh38/hg38

SISu reference panel

SISu v4.0 consists of whole-genome sequences (WGS) of 8,554 Finnish individuals from 5 research cohorts:

  1. METSIM (PIs Markku Laakso and Mike Boehnke)

  2. FINRISK (PI Pekka Jousilahti)

  3. Corogene (PI Juha Sinisalo)

  4. Biobank of Eastern Finland (PI Arto Mannermaa)

  5. Finnish EUFAM Dyslipidemia Study (PIs Marja-Riitta Taskinen and Samuli Ripatti)

High-coverage (25x) WGS data used to develop the SISu v4.0 reference panel were generated at the McDonnell Genome Institute at Washington University (PIs Ira Hall and Nathan Stitziel).

Software used

  • Hail v0.2

  • Cromwell-42

  • Wdltool-0.14

  • Plink 1.9 and 2.0

  • BCFtools 1.7 and 1.9

  • Eagle 2.3.5

  • Beagle 4.1 (version 08Jun17.d8b)

  • R 3.4.1 (packages: data.table 1.10.4, sm 2.2-5.4)

LD estimation

The BCOR files were created with LDstore from the Finnish SISu v4.0 reference panel.

The files are split by chromosome. For example, the LD information for chromosome 1 is in FG_LD_chr1.bcor.

Settings used

  • number of samples: 3775

  • window size: 1500 kb

  • accuracy: low

  • number of threads: 96

  • LD threshold to include correlations: 0.05
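To make the window size and LD threshold settings concrete, the following sketch computes pairwise genotype correlations for variants that lie within 1500 kb of each other and keeps only pairs at or above the threshold. It is a conceptual illustration, not LDstore's algorithm. It assumes that the threshold applies to the absolute value of the Pearson correlation and that the window is measured as the distance between variant positions; both are assumptions rather than documented behaviour.

```python
import math

WINDOW_BP = 1_500_000   # window size: 1500 kb
LD_THRESHOLD = 0.05     # LD threshold to include correlations


def pearson(x, y):
    """Pearson correlation of two equal-length genotype vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # monomorphic variant: correlation undefined, treated as 0
    return cov / math.sqrt(var_x * var_y)


def ld_pairs(variants):
    """variants: list of (position_bp, genotype_vector), sorted by position."""
    kept = []
    for i, (pos_i, geno_i) in enumerate(variants):
        for pos_j, geno_j in variants[i + 1:]:
            if pos_j - pos_i > WINDOW_BP:
                break  # later variants are even further away
            r = pearson(geno_i, geno_j)
            if abs(r) >= LD_THRESHOLD:
                kept.append((pos_i, pos_j, r))
    return kept


# Toy data: four variants on one chromosome, six samples each.
toy_variants = [
    (100_000, [0, 1, 2, 1, 0, 2]),
    (600_000, [0, 1, 2, 1, 0, 2]),
    (900_000, [1, 0, 1, 2, 1, 1]),
    (1_700_000, [0, 1, 2, 1, 0, 2]),
]
toy_pairs = ld_pairs(toy_variants)
```

In the toy data the first and last variants are perfectly correlated but 1600 kb apart, so that pair is not stored; this is the kind of pair a window setting excludes.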

Example usage

LDstore v1.1 can be downloaded via:

wget http://www.christianbenner.com/ldstore_v1.1_x86_64.tgz

An example that extracts the variant range 20 Mb - 50 Mb from chromosome 7:

ldstore --bcor FG_LD_chr7.bcor --incl-range 20000000-50000000 --table output_file_name.table

Note

These LD estimate files are not recommended for e.g. fine-mapping, since many fine-mapping methods (e.g. SuSiE) require in-sample LD information for good results.
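When the same extraction is needed for several regions, the ldstore command from the example usage can be assembled programmatically. The sketch below only builds the command string, using the per-chromosome file naming and the three flags shown in the example (--bcor, --incl-range, --table). The helper names are hypothetical, and the range is assumed to be given in megabases.

```python
import shlex


def bcor_file_for(chromosome):
    """Per-chromosome BCOR file name, following the FG_LD_chr<N>.bcor pattern."""
    return f"FG_LD_chr{chromosome}.bcor"


def ldstore_extract_command(chromosome, start_mb, end_mb, output_table):
    """Build the ldstore command that extracts a variant range into a table."""
    start_bp = round(start_mb * 1_000_000)
    end_bp = round(end_mb * 1_000_000)
    if start_bp >= end_bp:
        raise ValueError("start of range must be smaller than end of range")
    args = [
        "ldstore",
        "--bcor", bcor_file_for(chromosome),
        "--incl-range", f"{start_bp}-{end_bp}",
        "--table", output_table,
    ]
    # shlex.join quotes arguments only where the shell would need it.
    return shlex.join(args)


command = ldstore_extract_command(7, 20, 50, "output_file_name.table")
```

The resulting string can be pasted into a shell or passed to a job scheduler; running it still requires LDstore v1.1 and the BCOR file to be available locally.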