Upload cohorts to CO

Tutorial

A tutorial video about CO is available in the recordings of the FinnGen data users' meetings of 25th Jan 2022 (starting at 16 min 35 s) and 28th June 2022 (starting at 28 min 55 s).

Open the Cohort Operations (CO) tool from the FinnGen Sandbox dropdown menu.

The Status page of the Cohort Operations Shiny App tool opens showing version information and the connection status to the data.

A tick mark shows that the connection has been successfully formed.

Upload cohorts built with Atlas

Step 1:

To upload cohorts built in Atlas, click on from Atlas (1). Select the FinnGen data freeze you would like to use (2). Use the search option to find your Atlas cohorts (3). Tap the check box to select the cohorts you'd like to import (4) and click Import Selected to finish moving them in (5).

Step 2:

After uploading, the selected cohorts should appear in the Cohorts workbench view. The cohort workbench view gives the cohort source and name, the number of case entries n_entries, and the number of patients n_patients. The sex ratio is shown with percentages given for males (in blue), gender unknown (in grey), and females (in red), respectively.

The bar plot visualizes the years persons started and ended their participation in the cohort(s).

Here we can see that the excessive earwax cases are included in the cohort later than the controls because the entry date for cases is the date when excessive earwax was first diagnosed, while for the control group the entry date is the first time a person has any record in the health registry data (this same design is discussed here and here).

However, the year at which people exit the cohort(s) should be similar between cases and controls.

The control cohort can be modified to use the latest date, so that its start dates are more similar to those of the case cohort (see Modifying Atlas cohort with CO).

Upload cohorts from the Genotype Browser output file

The Cohort Operations tool recognizes Genotype Browser output files and reads them in without any further input from the user. To export genotype information from Genotype Browser, select your variant of interest in the browser, then click Download data.

Example: In the following example we use variant rs3091552, a C to G mutation in chromosome 20 (position 46811367), which was found to be the most significant variant detected for excessive earwax in the previous example. We ran the GWAS using the Custom GWAS GUI tool, and viewed it with the Cohort Characterizations tool.

Now back in CO, from the Import Cohorts page select from File (1), click Browse... and search for your Genotype Browser output file (2). The Cohort Operations tool will read in the cohort(s). Select the cohorts you would like to import (3), then click Import Selected (4).

Imported Genotype Browser files appear on the Cohorts workbench in addition to any Atlas cohorts. The cohort workbench view gives the cohort source and name, the number of case entries n_entries, the number of patients n_patients, and the cohort's sex ratio.

There are no bar plots, because no cohort start or end dates are available for Genotype Browser output files.

Upload cohorts from a text file

To upload a cohort in CO from a tab-separated text file, the columns of the text file should be formatted as follows:

COHORT_SOURCE = as.character(NA),
COHORT_NAME = as.character(NA),
FINNGENID = as.character(NA),
COHORT_START_DATE = lubridate::as_date(NA),
COHORT_END_DATE = lubridate::as_date(NA),
SEX = as.character(NA),
BIRTH_DATE = lubridate::as_date(NA),
DEATH_DATE = lubridate::as_date(NA)

The column headings must be labelled exactly as given. The first three columns, COHORT_SOURCE, COHORT_NAME, and FINNGENID, are mandatory. The first two fields are shown in the Cohorts workbench view after the cohort(s) are uploaded to CO. In the COHORT_SOURCE column, define a source label; the same value is repeated on every row. The mandatory fields are:

COHORT_SOURCE = "text_file"
COHORT_NAME = c("my_cohort1", "my_cohort2", "my_cohort3")
FINNGENID = c("FG0000001", "FG0000002", "FG0000003")
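Before uploading, it can help to confirm that a file's header row contains the three mandatory columns. The following is a minimal sketch, not part of CO itself: the function name check_cohort_header is hypothetical, and it assumes a tab-separated file with Unix line endings whose first line is the header.

```shell
# Hypothetical helper (not part of CO): checks that the header row of a
# tab-separated file contains the three mandatory column names.
check_cohort_header() {
  # $1 = path to the cohort text file
  head -n 1 "$1" | awk -F '\t' '
    # Remember every column name found on the header line.
    { for (i = 1; i <= NF; i++) seen[$i] = 1 }
    END {
      n = split("COHORT_SOURCE COHORT_NAME FINNGENID", need, " ")
      for (j = 1; j <= n; j++)
        if (!(need[j] in seen)) { print "missing column: " need[j]; bad = 1 }
      # Exit status is 0 when all mandatory columns are present, 1 otherwise.
      exit bad
    }'
}
```

Calling check_cohort_header /path/to/my_cohorts.tsv prints one line per missing column and returns a non-zero status if any is missing. It checks names only, not column order or cell contents.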

An example of input table format with mandatory fields (FINNGENID, COHORT_SOURCE, and COHORT_NAME).

FINNGENID     COHORT_SOURCE   COHORT_NAME
FG00000001    text_file       my_cohort1
FG00000002    text_file       my_cohort1
FG00000003    text_file       my_cohort1
FG00000004    text_file       my_cohort2
FG00000005    text_file       my_cohort2
FG00000006    text_file       my_cohort3
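A file with this layout can be written directly in Terminal Emulator. The sketch below is only an illustration: the output file name my_cohorts.tsv is an assumption, and the IDs are the dummy values from the example table above.

```shell
# Hypothetical output file name; adjust to your own path.
out="my_cohorts.tsv"

# Header row with the three mandatory columns, separated by tabs.
printf 'FINNGENID\tCOHORT_SOURCE\tCOHORT_NAME\n' > "$out"

# One row per person. printf reuses the format for each pair of arguments
# (ID, cohort name); the IDs are dummy values from the example table.
printf '%s\ttext_file\t%s\n' \
  FG00000001 my_cohort1 \
  FG00000002 my_cohort1 \
  FG00000004 my_cohort2 >> "$out"
```

The resulting file can then be selected with Browse... on the Import Cohorts page.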

Upload cohorts from TVT

A TSV file exported from TVT contains a single FINNGENID column listing FinnGen IDs. To read a TVT output file into the Cohort Operations tool, the two other mandatory columns, COHORT_SOURCE and COHORT_NAME, must be added as described above. This can be done, for example, in Terminal Emulator with the following two commands:

awk 'BEGIN{ FS = OFS = "\t" } { print $0, (NR==1? "COHORT_SOURCE" : "text_file") }' /path/to/cohort_from_TVT.tsv > tmp && mv tmp /path/to/cohort_from_TVT.tsv
awk 'BEGIN{ FS = OFS = "\t" } { print $0, (NR==1? "COHORT_NAME" : "my_TVT_cohort") }' /path/to/cohort_from_TVT.tsv > tmp && mv tmp /path/to/cohort_from_TVT.tsv

Replace /path/to/cohort_from_TVT.tsv with the path to the file exported from TVT.
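As an alternative, both columns can be added in a single pass that writes to a new file and leaves the TVT export untouched. This is a sketch under assumptions: the function name add_cohort_columns and its argument order are hypothetical, and the input is expected to be a tab-separated file whose first line is the FINNGENID header.

```shell
# Hypothetical helper: appends COHORT_SOURCE and COHORT_NAME to a TVT export.
add_cohort_columns() {
  # $1 = input TSV exported from TVT
  # $2 = cohort name to write on every row
  # $3 = output file (the input file is not modified)
  awk -v name="$2" 'BEGIN { FS = OFS = "\t" }
    # Header line: append the two column names.
    NR == 1 { print $0, "COHORT_SOURCE", "COHORT_NAME"; next }
    # Data lines: append the source label and the cohort name.
    { print $0, "text_file", name }' "$1" > "$3"
}
```

For example, add_cohort_columns /path/to/cohort_from_TVT.tsv my_TVT_cohort /path/to/cohort_for_CO.tsv, where the second path is a placeholder for wherever you want the new file.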

Upload FinnGen Endpoint

To upload a FinnGen Endpoint cohort, select the Import Cohorts page in the left panel and from Endpoint on the right, then use the search option to find the endpoints of interest. Tick the check box of each endpoint you would like to import and click Import Selected.

Case, control, and excluded cohorts of the selected FinnGen endpoints will load on the Cohorts workbench.
