Explore code and endpoint enrichments with CO (CodeWAS)

Cohort Operations (CO) provides a PheWAS tool, CodeWAS, for exploring medical code and FinnGen endpoint enrichments between cohorts, with visualizations and p-values. CodeWAS can also be used to check whether your cases and controls overlap or whether your cohorts contain unwanted data.

Tutorial

A tutorial video about CO is available in the recordings of the FinnGen data users' meetings of 25th Jan 2022 (at 16min35sec) and 28th June 2022 (at 28min55sec).

Tip: Preliminary exploration of cohorts prior to CO can be done within Atlas using the Atlas Cohort Characterizations tool. See an Example workflow for survival time/endpoint analysis.

Perform CodeWAS in CO

Step 1:

Step 2:

In the CodeWAS settings window, select case and control groups from the drop-down menus in the Cases-cohort and Controls-cohort boxes, respectively.

Cohort Operations will print the number of patients in the case and control cohorts and the number of patients with phenotype information, and will indicate whether the two cohorts overlap.
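The overlap check described above can be pictured as a set intersection on person identifiers. The following is only a minimal sketch, not CO's own implementation; the function name `cohort_overlap` and the example IDs are hypothetical.

```python
def cohort_overlap(cases, controls):
    """Return the IDs present in both cohorts, sorted for stable output."""
    return sorted(set(cases) & set(controls))


# Hypothetical pseudonymised IDs, for illustration only
cases = ["FG0001", "FG0002", "FG0003"]
controls = ["FG0003", "FG0004", "FG0005"]

overlap = cohort_overlap(cases, controls)
print(f"cases: {len(set(cases))}, controls: {len(set(controls))}, overlap: {len(overlap)}")
```

A non-empty overlap means that the same person would be counted on both sides of the comparison, which is usually a sign that the cohort definitions need revisiting.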

Step 3:

To use controls matched by sex and birth year, select the number of controls to match and tick the Used matched controls box.

Cohort Operations will warn if fewer controls are found than requested and will report the percentage of matching controls found. Based on the warning and the percentage, you may design a new setup with a lower number of controls if needed.
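To illustrate why the tool may find fewer controls than requested, here is a rough sketch of greedy matching without replacement. It assumes matching on the exact pair (sex, birth year); the function `match_controls`, the tuple layout, and the example data are hypothetical and do not describe how CO performs the matching internally.

```python
from collections import defaultdict


def match_controls(cases, controls, n_per_case):
    """Greedy matching on (sex, birth_year), without replacement.

    cases, controls: lists of (person_id, sex, birth_year) tuples.
    Returns (matches, pct_found); matches maps case id -> list of control ids.
    """
    # Group the available controls by their matching key
    pool = defaultdict(list)
    for pid, sex, year in controls:
        pool[(sex, year)].append(pid)

    matches = {}
    found = 0
    for pid, sex, year in cases:
        bucket = pool[(sex, year)]
        picked = bucket[:n_per_case]
        del bucket[:n_per_case]  # a control can be used only once
        matches[pid] = picked
        found += len(picked)

    requested = n_per_case * len(cases)
    pct_found = 100.0 * found / requested if requested else 0.0
    return matches, pct_found


# Hypothetical example: two controls requested per case
cases = [("C1", "F", 1960), ("C2", "M", 1955)]
controls = [("K1", "F", 1960), ("K2", "F", 1960), ("K3", "M", 1955), ("K4", "M", 1970)]
matches, pct = match_controls(cases, controls, n_per_case=2)
print(matches, f"{pct:.0f}% of requested controls found")
```

In this toy data only one control shares the sex and birth year of case C2, so 3 of the 4 requested controls are found. Lowering the requested number to one per case would bring the percentage to 100.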

Step 4:

Select the registers to include from the detailed longitudinal data. Here, we'll include the hospital INPAT (inpatient) and OUTPAT (outpatient) registers. To select or deselect all registers in the list, click the Select All or the Deselect All buttons at the top.

Step 5:

Select the medical code set(s) to include in the comparisons. Here, we've opted to include only the Finnish ICD10 codes.

Step 6:

After the registers and codes are selected, there are additional options to include FinnGen endpoints and to set a lower limit to exclude rare phenotype codes. By default, the exclusion value is set to n<=0 entries.
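As a rough illustration of the rare-code limit, the sketch below drops every code whose number of entries is at or below a threshold, so the default of 0 removes only codes with no entries at all. The function `drop_rare_codes` and the counts are hypothetical; how CO counts entries internally is not specified here.

```python
def drop_rare_codes(code_counts, limit=0):
    """Keep only codes with strictly more than `limit` entries (exclude n <= limit)."""
    return {code: n for code, n in code_counts.items() if n > limit}


# Hypothetical entry counts per code
code_counts = {"J45": 120, "I10": 4, "E11": 0}

default_kept = drop_rare_codes(code_counts)      # excludes n <= 0
stricter_kept = drop_rare_codes(code_counts, 5)  # excludes n <= 5
print(default_kept)
print(stricter_kept)
```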

From the Advanced options menu, you can select the number of characters to include in the comparison. This option allows you to take advantage of the medical code hierarchy.

Example: To compare ICD10 codes at the highest classification level, we've set the number of digits to 3. Limiting to 3 digits will cause all subtypes of a diagnosis to be listed under the main classification (for example all types of asthma (J450, J451,..., J459) under J45, the main classification for asthma). In addition, we will include only the main diagnoses by selecting FG_CODE1 in the top right after setting the ICD10 digit settings.
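The effect of limiting the number of characters can be sketched as simple prefix truncation followed by counting. This is an illustration only, with a hypothetical helper `truncate_codes` and made-up codes, not the tool's actual code.

```python
from collections import Counter


def truncate_codes(codes, n_chars=3):
    """Collapse codes to their first n_chars characters (e.g. J450 -> J45)."""
    return [code[:n_chars] for code in codes]


# Hypothetical diagnosis entries
codes = ["J450", "J451", "J459", "I10", "E119"]

counts = Counter(truncate_codes(codes))
print(counts["J45"])  # the three asthma subtypes are counted together under J45
```

Truncating to three characters trades detail for statistical power: subtypes that are individually rare are pooled under their parent classification.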

The rest of the settings are irrelevant as we selected only ICD10 codes to be included in the previous step. However, the advanced options are available for other codes as well if you wish to include them in the comparison. To gain an understanding of the codes and registers we use, see International and Finnish Health Code Sets and Detailed longitudinal data.

Step 7:

Finally, click Run CodeWAS analysis & download results.

Once CodeWAS is completed, a pop-up window offers the choice to open or save the output file. If you choose to save, the CodeWAS output file is saved in the /home/ivm/Downloads folder.

The CodeWAS output file presents the results as three summary plots and a table. Hovering the mouse over a dot in a plot shows its title. Finnish ICD10 codes are shown in blue and FinnGen endpoints in green. The Manhattan plot gives an overview of the Fisher test p-values for each medical code or endpoint between cases and controls. The volcano plot shows both significance and effect size. The CasesVsControls plot shows the prevalence of medical codes or FinnGen endpoints: codes above the dashed line are more prevalent in cases, and codes below it are more prevalent in controls.
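To make the p-values concrete, here is a minimal standard-library sketch of a two-sided Fisher exact test on a 2x2 table of code carriers versus non-carriers in cases and controls. It is not the CodeWAS implementation; the function name is hypothetical, and using the sample odds ratio as the effect size is an assumption made for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    a: cases with the code,    b: cases without,
    c: controls with the code, d: controls without.
    Returns (odds_ratio, p_value).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that x cases carry the code, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one;
    # the small relative tolerance guards against floating-point ties
    p_value = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    # Sample odds ratio (assumed effect size); infinite when b * c is zero
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


# Hypothetical counts: 8 of 10 cases and 1 of 6 controls carry the code
odds_ratio, p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(odds_ratio, round(p_value, 4))
```

For these counts the odds ratio is 20 and the p-value is about 0.035. When many codes are tested at once, as in a Manhattan plot, individual p-values should be read with multiple testing in mind.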

To get results for ICD10 codes only, rerun CodeWAS without FinnGen endpoints.
