
Genotypes

FinnGen individuals were genotyped with Illumina and Affymetrix chip arrays (Illumina Inc., San Diego, CA, USA, and Thermo Fisher Scientific, Santa Clara, CA, USA).

Chip genotype data were imputed using the population-specific SISu v3 imputation reference panel of 3,775 whole genomes.

The merged imputed genotype data consist of 75 data sets that include samples from multiple cohorts.

  • Total number of individuals: 321,464

  • Total number of variants (merged set): 16,962,023

  • Reference assembly: GRCh38/hg38

LD estimation

The BCOR files were created with LDstore from the Finnish SISu v3 reference panel.

The LD files are divided per chromosome. For example, the LD information for chromosome 1 is in FG_LD_chr1.bcor.

Settings used

  • number of samples: 3775

  • window size: 1500 kb

  • accuracy: low

  • number of threads: 96

  • LD threshold to include correlations: 0.05
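As a rough illustration of how the window size and the LD threshold interact, the Python sketch below keeps a correlation for a variant pair only if the two variants lie within 1,500 kb of each other and the correlation reaches 0.05 in magnitude. It assumes the threshold is compared against |r| (not r²) and that the window bounds the distance between the two variants of a pair; LDstore's exact internal semantics may differ, and `stored_pairs` is a hypothetical helper, not part of LDstore.

```python
from itertools import combinations
from typing import Dict, List, Tuple

WINDOW_BP = 1_500_000  # "window size: 1500 kb" from the settings above
LD_THRESHOLD = 0.05    # assumption: compared against |r|, not r^2


def stored_pairs(
    positions: Dict[str, int],
    corr: Dict[Tuple[str, str], float],
    window_bp: int = WINDOW_BP,
    threshold: float = LD_THRESHOLD,
) -> List[Tuple[str, str, float]]:
    """Return (variant_a, variant_b, r) for pairs whose correlation would be kept.

    positions: variant ID -> base-pair position (one chromosome).
    corr: (left ID, right ID) -> correlation r, left variant first by position.
    Pairs missing from `corr` are treated as r = 0.
    """
    kept: List[Tuple[str, str, float]] = []
    # Sorting by position means combinations() yields each pair once, left variant first.
    ordered = sorted(positions, key=positions.get)
    for a, b in combinations(ordered, 2):
        if positions[b] - positions[a] > window_bp:
            continue  # farther apart than the window: no correlation stored
        r = corr.get((a, b), 0.0)
        if abs(r) >= threshold:
            kept.append((a, b, r))
    return kept
```

For example, with variants at 1.0, 1.2 and 2.6 Mb, the 1.0/2.6 Mb pair is dropped by the window and a 1.2/2.6 Mb pair with r = 0.01 is dropped by the threshold, so only the 1.0/1.2 Mb pair is kept when its r is 0.30.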

Example usage

LDstore v1.1 can be downloaded via:

wget http://www.christianbenner.com/ldstore_v1.1_x86_64.tgz

An example of extracting the variant range 20 Mb to 50 Mb from chromosome 7 is as follows:

ldstore --bcor FG_LD_chr7.bcor --incl-range 20000000-50000000 --table output_file_name.table

Note

These LD estimate files are not recommended for analyses such as fine-mapping, since many fine-mapping methods (e.g. SuSiE) require in-sample LD information to give good results.
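To repeat the ldstore extraction for other chromosomes or ranges, the per-chromosome file name (FG_LD_chr<N>.bcor) and the base-pair range passed to `--incl-range` can be generated programmatically. The minimal Python sketch below builds the argument list for that command, converting megabase bounds to base pairs. `ldstore_extract_cmd` is a hypothetical helper; it uses only the flags that appear in the example and assumes an `ldstore` executable on the PATH.

```python
import shlex
from typing import List, Union


def ldstore_extract_cmd(
    chrom: Union[int, str],
    start_mb: float,
    end_mb: float,
    table_out: str = "output_file_name.table",
) -> List[str]:
    """Build the ldstore argv for one chromosome and a megabase range.

    Uses only the flags from the documented example (--bcor, --incl-range, --table).
    """
    if start_mb < 0 or end_mb <= start_mb:
        raise ValueError("expected 0 <= start_mb < end_mb")
    # --incl-range takes base-pair coordinates: 20 Mb -> 20000000.
    start_bp = int(round(start_mb * 1_000_000))
    end_bp = int(round(end_mb * 1_000_000))
    return [
        "ldstore",
        "--bcor", f"FG_LD_chr{chrom}.bcor",  # one BCOR file per chromosome
        "--incl-range", f"{start_bp}-{end_bp}",
        "--table", table_out,
    ]


if __name__ == "__main__":
    # Reproduces the chromosome 7, 20-50 Mb example; pass the list to
    # subprocess.run(..., check=True) to actually execute it.
    print(shlex.join(ldstore_extract_cmd(7, 20, 50)))
```

Returning an argument list rather than a single string avoids shell-quoting issues when the command is later passed to `subprocess.run`.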

SISu reference panel

SISu v3 consists of whole-genome sequencing (WGS) data for 3,775 Finnish individuals from six research cohorts:

  1. METSIM (PIs Markku Laakso and Mike Boehnke)

  2. FINRISK (PI Pekka Jousilahti)

  3. Health2000 (PI Seppo Koskinen)

  4. Finnish Migraine Family Study (PI Aarno Palotie)

  5. Merck/Tienari samples (PI Pentti Tienari)

  6. MESTA samples (PI Jaana Suvisaari)

High-coverage (25-30x) WGS data used to develop the SISu v3 reference panel were generated at the Broad Institute of MIT and Harvard and at the McDonnell Genome Institute at Washington University, and were jointly processed at the Broad Institute.

Genotype imputation

Genotype imputation was done with the population-specific SISu v3 reference panel.

The variant call set was produced with the GATK HaplotypeCaller algorithm, following GATK best practices for variant calling.

Genotype-, sample- and variant-wise QC was applied iteratively using the Hail framework v0.1, and the resulting high-quality WGS data for 3,775 individuals were phased with Eagle 2.3.5 as described in the previous section.

Genotype imputation was carried out using the population-specific SISu v3 imputation reference panel with Beagle 4.1 (version 08Jun17.d8b), as described in the following protocol: dx.doi.org/10.17504/protocols.io.nmndc5e.

Post-imputation quality control involved checking the expected conformity of the imputation INFO-value distribution, the MAF differences between the target dataset and the imputation reference panel, and the chromosomal continuity of the imputed genotype calls.
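One of the post-imputation checks, comparing minor allele frequencies between the imputed target dataset and the reference panel, can be sketched as follows. The source does not state the cutoff that was used, so the 0.1 absolute MAF difference below is a hypothetical value chosen only for illustration, and `flag_maf_outliers` is a hypothetical helper rather than part of any FinnGen pipeline.

```python
from typing import Dict, List

MAX_MAF_DIFF = 0.1  # hypothetical cutoff for illustration; not stated in the source


def maf(alt_freq: float) -> float:
    """Minor allele frequency from an alternate-allele frequency."""
    return min(alt_freq, 1.0 - alt_freq)


def flag_maf_outliers(
    target_af: Dict[str, float],
    ref_af: Dict[str, float],
    max_diff: float = MAX_MAF_DIFF,
) -> List[str]:
    """Return IDs of shared variants whose MAF differs by more than max_diff.

    target_af / ref_af map variant IDs to alternate-allele frequencies in the
    imputed target dataset and in the reference panel, respectively.
    """
    # Only variants present in both datasets can be compared.
    shared = sorted(target_af.keys() & ref_af.keys())
    return [
        vid for vid in shared
        if abs(maf(target_af[vid]) - maf(ref_af[vid])) > max_diff
    ]
```

Because MAF folds frequencies at 0.5, this check will not catch allele flips (e.g. 0.1 in one dataset and 0.9 in the other); comparing raw alternate-allele frequencies would.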

Software used

  • Cromwell-42

  • Wdltool-0.14

  • Plink 1.9 and 2.0

  • BCFtools 1.7 and 1.9

  • Eagle 2.3.5

  • Beagle 4.1 (version 08Jun17.d8b)

  • R 3.4.1 (packages: data.table 1.10.4, sm 2.2-5.4)

Genotype data

Chip genotype data processing and QC

Samples were genotyped with Illumina (Illumina Inc., San Diego, CA, USA) and Affymetrix arrays (Thermo Fisher Scientific, Santa Clara, CA, USA).

Genotype calls were made with the GenCall and zCall algorithms for Illumina data and with the AxiomGT1 algorithm for Affymetrix data.

Chip genotyping data produced with earlier chip platforms and reference genome builds were lifted over to build 38 (GRCh38/hg38) following the protocol described here: dx.doi.org/10.17504/protocols.io.nqtddwn.

Quality control

In sample-wise quality control, individuals with ambiguous sex, high genotype missingness (>5%), excess heterozygosity (±4 SD) and non-Finnish ancestry were excluded. In variant-wise quality control, variants with high missingness (>2%), a low Hardy-Weinberg equilibrium (HWE) P-value (<1e-6) or a low minor allele count (MAC < 3) were excluded.
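The sample- and variant-wise filters can be expressed as simple predicates over per-sample and per-variant summary statistics. The Python sketch below applies the stated thresholds to statistics that are assumed to be precomputed (e.g. with a tool such as PLINK); the record types and function names are hypothetical, and "excess heterozygosity" is interpreted here as a heterozygosity rate more than 4 SD from the mean in either direction.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import List


@dataclass
class SampleStats:
    sample_id: str
    missing_rate: float     # fraction of missing genotype calls
    het_rate: float         # observed heterozygosity rate
    sex_ambiguous: bool     # e.g. failed a sex check
    finnish_ancestry: bool  # e.g. from an ancestry assignment step


@dataclass
class VariantStats:
    variant_id: str
    missing_rate: float  # fraction of samples with a missing call
    hwe_p: float         # Hardy-Weinberg equilibrium test P-value
    mac: int             # minor allele count


def passing_samples(samples: List[SampleStats], max_missing: float = 0.05,
                    het_sd: float = 4.0) -> List[str]:
    """IDs of samples passing the sample-wise filters (needs >= 2 samples)."""
    # Heterozygosity mean and SD are computed over all input samples.
    rates = [s.het_rate for s in samples]
    mu, sd = mean(rates), stdev(rates)
    kept = []
    for s in samples:
        if s.sex_ambiguous or not s.finnish_ancestry:
            continue
        if s.missing_rate > max_missing:        # missingness > 5%
            continue
        if abs(s.het_rate - mu) > het_sd * sd:  # outside mean +/- 4 SD
            continue
        kept.append(s.sample_id)
    return kept


def passing_variants(variants: List[VariantStats], max_missing: float = 0.02,
                     min_hwe_p: float = 1e-6, min_mac: int = 3) -> List[str]:
    """IDs of variants passing the variant-wise filters."""
    return [
        v.variant_id for v in variants
        if v.missing_rate <= max_missing  # drop missingness > 2%
        and v.hwe_p >= min_hwe_p          # drop HWE P < 1e-6
        and v.mac >= min_mac              # drop MAC < 3
    ]
```

With a sample-based SD, a single outlier can only exceed 4 SD when there are enough samples (roughly 18 or more), which is not a concern at biobank scale but matters when testing on toy data.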

Pre-phasing

Prior to imputation, chip-genotyped samples were pre-phased with Eagle 2.3.5 using the default parameters, except that the number of conditioning haplotypes was set to 20,000.
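A minimal sketch of assembling an Eagle pre-phasing command with the one non-default setting, written in Python for consistency with the other examples. It assumes Eagle 2's `--bfile`, `--geneticMapFile`, `--outPrefix` and `--Kpbwt` options, with `--Kpbwt` taken to be the number of conditioning haplotypes (verify with `eagle --help` for your installed version). PLINK binary input without a reference panel is also an assumption, and all file names are hypothetical placeholders, not the FinnGen pipeline's actual invocation.

```python
import shlex
from typing import List

K_CONDITIONING_HAPLOTYPES = 20_000  # the one non-default setting stated above


def eagle_prephase_cmd(bfile_prefix: str, genetic_map: str, out_prefix: str,
                       k_pbwt: int = K_CONDITIONING_HAPLOTYPES,
                       eagle_bin: str = "eagle") -> List[str]:
    """Build an Eagle 2 argv; every other option is left at its default."""
    return [
        eagle_bin,
        f"--bfile={bfile_prefix}",          # PLINK .bed/.bim/.fam prefix (assumed input)
        f"--geneticMapFile={genetic_map}",  # hg38 genetic map (placeholder path)
        f"--outPrefix={out_prefix}",
        f"--Kpbwt={k_pbwt}",                # assumed: number of conditioning haplotypes
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(shlex.join(eagle_prephase_cmd(
        "chip_qc_chr7", "genetic_map_hg38.txt.gz", "chip_qc_chr7.phased")))
```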