Examples on cohort building with Atlas

1. How to build a cohort based on a diagnosis using local (non-standard) codes

The cohort is a group of persons starting at the first diagnosis of a local code X until the end of follow-up. In Atlas, local codes, such as ICD codes, are referred to as non-standard codes and are displayed in red, while OHDSI OMOP Common Data Model (CDM) codes are called standard codes and are displayed in blue. The important thing is not to mix standard and non-standard codes in a concept set. In this example we will create a cohort of people with a diagnosis of type 2 diabetes based on the local (non-standard) ICD-10 code E11.

Quick guide:

‘Concept sets’: Create a concept set, t2d_icd10. Search by e11, limit the ‘Vocabulary’ to ICD10 and select the main code. Remember to select ‘Descendants’ to take all the sub codes of E11.
‘Cohort Definitions’: 1) Create a new cohort and add the concept set t2d_icd10 into ‘Cohort Entry Events’ using ‘Add Initial Event’ and select ‘Add Condition Occurrence’. For non-standard codes, remember to upload the concept set via ‘Add attribute’ and choose ‘Add Condition Source Concept’. 2) To limit the cohort to individuals who have at least three diagnosis codes, add the concept set to the ‘Inclusion criteria’ as well and change the number to 3 occurrences. Remember to use the attributes and source concepts for concept sets with non-standard codes. Save changes and generate the cohort.

Detailed instructions:

We start by creating the necessary concept set. Go to ‘Concept Sets’ and click ‘New Concept Set’. A new window will open where we give a name to our concept set. We click the ‘Add Concepts’ button at the bottom of the page. This will take us to a search where we can enter a string or a code. By typing e11 we will get a list of concepts that include e11. We can limit our search to ICD-10 by clicking the term ICD10 in the left hand panel under Vocabulary. After doing this we’ll see that only non-standard codes in red font are displayed. Note that there is also a Vocabulary term ICD10fi which includes different combination codes marked by an asterisk. Its person counts are much smaller than when selecting ICD10, so in this example we will choose ICD10 to demonstrate the use of main codes and their sub codes. Let’s choose the main code E11. Then we scroll to the bottom of the page and tick the box for ‘Descendants’. This will include all the sub codes, e.g. E11.1 and so on, in our concept set without having to select them manually. Finally, we press the button ‘Add to Concept Set’.

Now we are back in the ‘Concept Sets’. We can see from the ‘Included Concepts’ tab that there are 34 concepts included. If we go and explore these, we’ll see the different sub codes of the main ICD-10 code. We can now save the concept set by clicking the save button and close it with the X icon next to it. Any further concept sets, e.g. for drug purchases, are created in the same way.

Once the concept set is done, we can move to ‘Cohort Definitions’. We’ll see a list of already defined cohorts. Click the ‘New Cohort’ and a new window will open. Give the cohort a name and enter a more detailed description. Give a clear name to later find your cohort. It is good practice to add your initials at the end of the name. Remember to save changes.
We start by defining the ‘Cohort Entry Events’. Click the ‘Add Initial Event’ and select ‘Add Condition Occurrence’. Because in this example our concept sets are based on non-standard codes, we need to upload them using source concepts. In practice this means one additional step compared to using concept sets based on standard codes, i.e. we need to click ‘Add attribute’ and choose ‘Add Condition Source Concept’. Another dropdown menu will appear, and here we import the concept set we made for the diagnosis. By doing this, diagnoses in the concept set will be taken from any of the registers included in Atlas. If you’d like to filter by a certain register, please see the example How to build a cohort filtering events by FinnGen register (e.g. only those with inpatient records). If we had no other criteria than having a diagnosis, we could already generate our cohort. In this example, however, we want to require at least three occurrences of the diagnosis.

Next, we will define all the inclusion criteria. Click ‘New Inclusion Criteria’ and give a descriptive name. Click ‘Add criteria to group’ and select ‘Add Condition Occurrence’. We click ‘Add attribute’ and select ‘Add Condition Source Concept’. Here we import again our concept set for diagnosis. Note that unless you have a more specific inclusion criteria, it is not necessary to add the diagnosis again in the ‘Inclusion Criteria’ if you added it into the ‘Cohort Entry Events’. Here we want to include it because we want to limit our criteria to a number of occurrences. In this example we’ll require at least three occurrences of the diagnosis. This can be adjusted at the top of the box.

The final step is to check the ‘Cohort Exit’ criterion. Usually, it is fine to leave it as its default settings, i.e. the Event will persist until the end of continuous observation.
Now we can save the changes and go to the ‘Generation’ tab to generate our cohort in our preferred release of the FinnGen data. Press the ‘Generate’ button and see the number of individuals included in the cohort. In our example there are 74,827 cases in our cohort. To view the effects of the inclusion criteria, click the ‘View Report’ button that appears after you have generated the cohort. This will show as percentages how many of the cases passed the different criteria. You can switch between intersect and attrition views, as well as Person and Event views, by clicking on the appropriate links and tabs.
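
The selection logic of this example can also be sketched outside Atlas. The Python snippet below is only an illustration under stated assumptions, not what Atlas runs internally: the record layout (person, source code, date) and the function name are hypothetical, and ‘Descendants’ is approximated by a simple code prefix match.

```python
from collections import defaultdict
from datetime import date

def build_case_cohort(records, main_code="E11", min_occurrences=3):
    """Return {person_id: entry_date} for persons with enough matching diagnoses.

    records: iterable of (person_id, source_code, diagnosis_date) tuples
    (a hypothetical layout, not the OMOP table structure).
    """
    dates_by_person = defaultdict(list)
    for person_id, source_code, diagnosis_date in records:
        # 'Descendants' is approximated by a prefix match: E11, E11.1, ...
        if source_code.upper().startswith(main_code):
            dates_by_person[person_id].append(diagnosis_date)
    # Entry is the first diagnosis; inclusion requires at least N occurrences.
    return {
        person_id: min(dates)
        for person_id, dates in dates_by_person.items()
        if len(dates) >= min_occurrences
    }

records = [
    ("A", "E11", date(2010, 1, 5)),
    ("A", "E11.1", date(2011, 3, 2)),
    ("A", "E11.9", date(2012, 7, 9)),
    ("B", "E11", date(2015, 4, 1)),    # only one occurrence: excluded
    ("C", "E10.9", date(2009, 2, 2)),  # not a sub code of E11
]
print(build_case_cohort(records))
```

Only person A is kept, with the date of the first E11 diagnosis as the cohort entry date.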

2. How to build a control cohort based on a diagnosis using local (non-standard) codes

The control cohort is a group of persons who do not have the condition(s) used to define the cases. The easiest way to build it is to copy the case cohort and adjust the definitions accordingly. Entry to the control cohort is any entry to any of the registers at any time, unlike for cases, where we usually define entry as the first occurrence of the diagnosis. This has implications for the control cohort adjustments, as explained below.

Quick guide:

‘Cohort Definitions’: 1) Open and copy the case cohort and give it a new name. In ‘Cohort Entry Events’ click the ‘Delete Criteria’. Click the ‘Add Initial Event’ and select ‘Add Visit occurrence’. Leave it like this. 2) In the ‘Inclusion Criteria’, delete any inclusion criteria made for the cases or adjust to exactly 0 occurrences. Add any additional criteria relevant for the controls, e.g. they may need to be free of other diseases as well. Finally, on top of the criteria, change having ‘any’ of the following criteria to having ’all’ of the following criteria. Generate the cohort.

Detailed instructions:

Start by opening the case cohort in ‘Cohort Definitions’ and press the ‘Create a copy of this cohort definition’ button at the top right, next to the cohort definition name. The copy will be named COPY OF case-cohort-name. Adjust the name and the definition accordingly. You can for example open the cohort from example 1 and copy it.
Next, adjust the ‘Cohort Entry Events’ by clicking the ‘Delete Criteria’. This will delete the entry we had defined for the cases. For controls, we want to define the entry as any entry to the registers. To do this, click the ‘Add Initial Event’ and select ‘Add Visit occurrence’. We do not have to add any attributes but can leave the definition like this.

In the ‘Inclusion Criteria’, click the blue box with the name of the inclusion criteria that was created for the cases and modify the name accordingly. By clicking the name, all the criteria we added for cases will become visible. For any criteria on diagnoses and/or drug purchases, change the number of occurrences from ‘at least’ 3 to ‘exactly’ 0 occurrences.
You may want to add some additional inclusion criteria. In our example of type 2 diabetes, we also want our controls to be free of type 1 diabetes. For this we have created a concept set using local non-standard codes. We click the ‘Add criteria to group’ and select ‘Add Condition Occurrence’. We click the ‘Add attribute’ and select ‘Add Condition Source Concept’. Here we import our concept set for type 1 diabetes and select exactly 0 occurrences at the top of the box.
Once we have modified/added all the inclusion criteria, we select at the top of the boxes ‘having all of the following criteria’.

We can now save the changes, generate the cohort and inspect the number of individuals in the cohort. In this example we generate our cohort in FinnGen CDM R12, i.e. the latest data release, and see that our control cohort includes 424,741 individuals.
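
As a rough illustration of why the criteria are combined with ‘all’, the control logic can be sketched in Python. The data layout and function name are hypothetical, and prefix matching on E11 and E10 merely stands in for the type 2 and type 1 diabetes concept sets.

```python
def build_control_cohort(visit_persons, diagnoses, excluded_prefixes=("E11", "E10")):
    """Return the set of person_ids eligible as controls.

    visit_persons: iterable of person_ids with at least one register visit.
    diagnoses: iterable of (person_id, source_code) tuples.
    Both layouts are hypothetical simplifications.
    """
    # str.startswith accepts a tuple, so one call checks every excluded prefix.
    persons_with_excluded_dx = {
        person_id
        for person_id, source_code in diagnoses
        if source_code.upper().startswith(tuple(excluded_prefixes))
    }
    # 'Exactly 0 occurrences' of each excluded diagnosis, combined with 'all'.
    return set(visit_persons) - persons_with_excluded_dx

visit_persons = ["A", "B", "C", "D"]
diagnoses = [("A", "E11.9"), ("C", "E10.1"), ("D", "I10")]
print(sorted(build_control_cohort(visit_persons, diagnoses)))
```

Person C has type 1 but not type 2 diabetes. With ‘any’ instead of ‘all’, C would satisfy the ‘exactly 0 occurrences of type 2 diabetes’ criterion and slip into the controls.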

3. How to build a cohort based on a diagnosis using international (standard) codes

NB! This example shows also how to exclude specific sub diagnoses from the cohort definition

The cohort is a group of persons starting at the first diagnosis of a national (non-standard) code that maps to multiple international (standard) codes until the end of follow-up. In Atlas, local codes are referred to as non-standard codes and are displayed in red, while OHDSI OMOP Common Data Model (CDM) codes are called standard codes and are displayed in blue. The important thing is not to mix standard and non-standard codes in a concept set. In the example below we will create a cohort for cases of type 2 diabetes using international (standard) codes.

Quick guide:

‘Concept Sets’: Search for the term ‘type 2 diabetes’ and limit the ‘Vocabulary’ to SNOMED. Standard codes in blue will remain and their hierarchy can be inspected in more detail by clicking the selected term, here ‘Type 2 diabetes mellitus’. Once you have added the term with its ‘Descendants’ into your concept set, you can use the ‘Included Concepts’ tab to inspect all the sub codes and if you want to exclude any, tick the box next to it and select ‘Exclude’ at the bottom of the page. Finally, click ‘Add to Concept Set’.
‘Cohort Definitions’: Create the cohort as usual. As the concept set is based on standard codes, there is no need to ‘Add Attribute’ and use source concept criteria but you can upload the concept set directly. Generate the cohort.

Detailed instructions:

We start by creating the concept set. In the ‘Concept Sets’, click on ‘New concept’, give it a name, save it and click ‘Add concepts’. This will open a new window with a Search. Let’s write ‘type 2 diabetes’, click enter or the magnifying glass icon to search. On the ‘Vocabulary’, we can limit to SNOMED by clicking on that (see image below). Now we see that all the entries that remain are in blue and thus standard codes.

Let’s click on the ‘Type 2 diabetes mellitus’. This will open a new window. For standard codes we can inspect the ‘Hierarchy’ tab to see the parents and children of this code. In our example, we see that there is a parent code ‘Diabetes mellitus’ and 12 children codes.

Next we click the ‘Current concept’ box under the ‘Hierarchy’ tab and as we are happy with our initial selection, we can keep this concept by ticking the box next to it, selecting also the ‘Descendants’ and clicking the ‘Add To Concept Set’.

Now when we go back to ‘Concept Sets’, we see that our concept set includes 18 concepts. We can see them in more detail by going to the ‘Included Concepts’ tab. We can order the concepts by record count by clicking the RC column. If we want to exclude some of them, e.g. ‘Pre-existing type 2 diabetes mellitus’, we can tick the box next to it, choose ‘Exclude’ at the bottom of the page and click ‘Add To Concept Set’.

Once we have done this we see that the number of included concepts has been updated to 17. This exclusion is also reflected on the ‘Concept Set Expression’ tab. Save the changes and go to ‘Cohort Definitions’.
In ‘Cohort Definitions’, create a ‘New Cohort’. Give it a name, click the save icon and start from the ‘Cohort Entry Events’. Click the ‘Add Initial Event’ and select ‘Add Condition Occurrence’. Now we can upload our concept set directly to the condition occurrence without having to add an attribute, unlike with local (non-standard) codes.

If we don’t have any other specific inclusion criteria, we can proceed to generating the cohort in the ‘Generation’ tab in our preferred release of the FinnGen data.
Now with the definitions above, our cohort generated on FinnGen R12 includes 85,917 individuals.
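
How ‘Descendants’ and ‘Exclude’ interact can be illustrated with a small Python sketch. The hierarchy below is a toy example: apart from the two concept names taken from the text above, the child concepts are invented placeholders, and the function is an assumption about the general idea rather than Atlas’s own resolution code.

```python
def resolve_concept_set(expression, children):
    """Resolve a concept set expression into the set of included concepts.

    expression: list of dicts with keys 'concept', 'descendants', 'exclude'.
    children: dict mapping a concept to its direct children (toy hierarchy).
    """
    def expand(concept):
        # Collect the concept and everything below it in the hierarchy.
        found, stack = set(), [concept]
        while stack:
            current = stack.pop()
            if current not in found:
                found.add(current)
                stack.extend(children.get(current, []))
        return found

    included, excluded = set(), set()
    for item in expression:
        concepts = expand(item["concept"]) if item["descendants"] else {item["concept"]}
        (excluded if item["exclude"] else included).update(concepts)
    return included - excluded

# Toy hierarchy; 'Hypothetical ...' entries are placeholders, not real concepts.
children = {
    "Type 2 diabetes mellitus": [
        "Pre-existing type 2 diabetes mellitus",
        "Hypothetical child B",
    ],
    "Hypothetical child B": ["Hypothetical grandchild C"],
}
expression = [
    {"concept": "Type 2 diabetes mellitus", "descendants": True, "exclude": False},
    {"concept": "Pre-existing type 2 diabetes mellitus", "descendants": False, "exclude": True},
]
print(sorted(resolve_concept_set(expression, children)))
```

The excluded concept is removed after the descendants have been expanded, which mirrors how the count of included concepts drops by one in the example above.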

4. How to build a sex-specific cohort

A sex-specific cohort is created by adding an additional inclusion criterion for sex. That is, you first build your cohort as usual by creating the necessary concept sets and then, in the ‘Cohort Definitions’, add the criterion for sex. In the example below, we use the existing cohort for type 2 diabetes we created in Example 1. We modify this cohort to create a cohort for females only. A cohort for males can be built in the same way.

Quick guide:

‘Concept Sets’: Create as usual.
‘Cohort Definitions’: Add new ‘Inclusion Criteria’ and select ‘Add Demographic’. Click the ‘Add attribute’ button and select ‘Add Gender Criteria’. Search for ‘female’. A list of terms including ‘FEMALE’ will appear. Choose this standard code by clicking on it and next, click the ‘Add and close’ button. Generate the cohort.

Detailed instructions:

Create your cohort from scratch or copy an existing cohort that you would like to stratify by sex. We will now copy a cohort by opening an existing cohort in ‘Cohort Definitions’ and clicking the clone button in the top right corner and edit the name of the cohort. Save the cohort.
In the ‘Inclusion Criteria’, click the green ‘New inclusion criteria’ button on the left and give your criterion a name, e.g. “Females only”. Click the ‘Add Criteria to group’ and select ‘Add Demographic’.

Click the ‘Add attribute’ button and select ‘Add Gender Criteria’.

A new window will open. Search for ‘female’. A list of terms including ‘FEMALE’ will appear. Choose this standard code by clicking on it and next, click the ‘Add and close’ button.

Your cohort now includes a criterion for females (see the figure below).

Save your cohort and proceed to generating the cohort in the ‘Generate’ tab. Create the cohort for males similarly.
Use ‘Characterizations’ to check that the cohorts are as intended, i.e. include only males or only females. Start by creating a ‘New Characterization’ and give it a name. Stay on the ‘Design’ tab, import your cohorts in the ‘Cohort definition’ section and select features of interest in the ‘Feature analyses’ section. When you click the ‘Import’ button in the ‘Feature analyses’ section, a list of possible features will appear in a new window. You can filter, e.g. on demographics only, by using ‘Domain’ in the left hand panel. Once you have selected your features of interest, scroll down and click the ‘Import’ button.

Save your characterization and go to the ‘Executions’ tab. Choose the FinnGen release in which you would like to run the analyses. In the figure below we have chosen the latest release, R12. After the analysis has been generated, you can click the ‘View latest results’ to see the characterizations.

A new window will open. By default, only one cohort is chosen, so to inspect your cohorts at the same time, in the ‘Filter panel’ you’ll need to tick the box for both cohorts. In the figure below, the results show that we have generated our cohorts correctly, i.e. the female cohort has 0 males and the male cohort has 0 females in it, respectively. The figure also shows no overlap in our cohorts.

Now that we are confident in our sex-specific case cohorts, we can proceed to making the sex-specific control cohorts by cloning the sex-specific case cohorts and adjusting the definitions accordingly.
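
The two checks made in the characterization (no persons of the other sex, no overlap between the cohorts) amount to simple set operations. A minimal Python sketch, with a hypothetical person-to-sex lookup and made-up cohort members:

```python
def unexpected_sex(cohort, sex_by_person, expected_sex):
    """Return the person_ids in the cohort whose recorded sex differs from expected_sex."""
    return {p for p in cohort if sex_by_person.get(p) != expected_sex}

# Hypothetical data for illustration only.
sex_by_person = {"A": "FEMALE", "B": "MALE", "C": "FEMALE", "D": "MALE"}
female_cohort = {"A", "C"}
male_cohort = {"B", "D"}

print(unexpected_sex(female_cohort, sex_by_person, "FEMALE"))
print(unexpected_sex(male_cohort, sex_by_person, "MALE"))
print(female_cohort & male_cohort)  # overlap between the two cohorts
```

All three results should be empty if the cohorts were defined correctly.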

5. How to build a cohort based on OHDSI PhenotypeLibrary

This is probably the easiest way of making a cohort in Atlas. The OHDSI PhenotypeLibrary includes cohorts created by the OHDSI community that anyone can make use of.

Quick guide:

Go to data.ohdsi.org/PhenotypeLibrary/. Use the search function to find your desired cohort and open the JSON tab. Copy the code and use the Clipboard in Sandbox to bring the code into Sandbox.
In Atlas ‘Cohort Definitions’: Create a new cohort and go to the ‘Export’ tab and select the ‘JSON’ button. Here, paste the JSON code from Clipboard – if needed, in small chunks. Finally, click the ‘Reload’ button at the bottom of the screen. Go to the ‘Definition’ tab and check that all the cohort entry, inclusion and exit criteria have appeared there. Generate the cohort.

Detailed instructions:

Go to data.ohdsi.org/PhenotypeLibrary/. Here, one can see the list of all available cohorts in the library. By searching and then selecting the desired cohort, tabs appear under the table. The tabs include details about the cohort definition as well as text for JSON and SQL that can be copied and used accordingly, e.g. in Atlas.

Copy the JSON text into Clipboard in the Sandbox. Note that due to the small size of the Clipboard, you may need to copy and paste to the Clipboard and Atlas in small chunks.
In Atlas, you will not need to go to ‘Concept Sets’ but can go directly to ‘Cohort Definitions’. Click the ‘New Cohort’ button and give a name to the cohort. Then go to the ‘Export’ tab and select the ‘JSON’ button. Here, paste the JSON code from Clipboard – if needed, in small chunks. Finally, click the ‘Reload’ button at the bottom of the screen.

Go back to the ‘Definition’ tab and check that all the cohort entry, inclusion and exit criteria have appeared there.

If everything looks ok, the cohort is ready to be generated in the ‘Generation’ tab.
Finally, generate the control cohort as usual, i.e. by cloning the case cohort and adjusting the definitions accordingly.
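
Because the JSON may have to be pasted in several chunks, it is easy to lose or duplicate a piece. The Python sketch below shows one way to check, before or after transfer, that the chunks reassemble into valid JSON. The chunk size and the tiny JSON object are arbitrary placeholders, not a real PhenotypeLibrary definition or the actual Clipboard limit.

```python
import json

def split_into_chunks(text, chunk_size):
    """Split text into consecutive pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def reassemble_and_validate(chunks):
    """Join pasted chunks and parse them; raises json.JSONDecodeError if broken."""
    return json.loads("".join(chunks))

# Placeholder JSON standing in for a copied cohort definition.
definition_text = json.dumps({"name": "example cohort", "criteria": [1, 2, 3]})

chunks = split_into_chunks(definition_text, chunk_size=10)  # arbitrary size
print(len(chunks), "chunks")
print(reassemble_and_validate(chunks))

# A missing last chunk is detected as invalid JSON.
try:
    reassemble_and_validate(chunks[:-1])
except json.JSONDecodeError:
    print("incomplete paste detected")
```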

6. How to build a cohort filtering events by FinnGen register, e.g. only those with inpatient records

We can use appearance in specific registers to filter cases. In Atlas, there are already readily made concept sets for the following registers: INPAT, OUTPAT, OPER_IN, OPER_OUT, PRIM_OUT, CANC, PURCH, and REIMB, named as ‘REGISTER [FinnGen support concept set]’, where REGISTER is one of the above-mentioned registers. Some of the registers have also been combined into one concept set, e.g. INPAT+OUTPAT [FinnGen support concept set]. These concept sets can be used in the ‘Cohort Definitions’ for filtering as will be described below.

Quick guide:

‘Concept Sets’: Create as usual.
‘Cohort Definitions’: In the ‘Cohort Entry Events’, add an attribute to the ‘Condition Occurrence’ and choose ‘Add Nested Criteria’. Next click ‘Add criteria to group’ and choose ‘Add Visit Occurrence’. There, click ‘Add attribute’ and select ‘Add Visit Source Concept Criteria’. Now, import one of the ready-made register concept sets, here ‘INPAT [FinnGen support concept set]’, and tick the box ‘restrict to the same visit occurrence’. Generate the cohort.

Detailed instructions:

Create a new cohort from scratch or clone an existing cohort you want to modify. In this example we will copy an existing cohort for type 2 diabetes patients by first opening the cohort in ‘Cohort Definitions’, here T2D_cases[MK], and then clicking the copy button in the top right corner, after which we edit the name of the cohort, e.g. to T2D_cases_inpatients. In the ‘Cohort Entry Events’, we add an attribute to the ‘Condition Occurrence’ and choose ‘Add Nested Criteria’.

Next click ‘Add criteria to group’ and choose ‘Add Visit Occurrence’. There, click ‘Add attribute’ and select ‘Add Visit Source Concept Criteria’. Now, import one of the ready-made register concept sets, here ‘INPAT [FinnGen support concept set]’, and tick the box ‘restrict to the same visit occurrence’. Note that it doesn’t matter whether you first add the diagnosis and then the register as nested criteria, or the other way round. However, if you have several entry events, e.g. two different diagnoses that you want to limit to a specific register, it may be better to add the register first and then the diagnoses as nested criteria under the register.

Save the changes and go to the ‘Generation’ tab. Generate the cohort in the data release you wish, e.g. the latest one. Now we see that restricting our cases to those with inpatient records only gives us a sample size of 38,319. Our cohort without this restriction had 82,164 individuals.
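
The effect of ‘restrict to the same visit occurrence’ can be illustrated with a small Python sketch: a diagnosis only counts if the visit it was recorded at comes from the chosen register. The tuple layout, the visit identifiers and the function name are hypothetical.

```python
def restrict_to_register(conditions, register_by_visit, register="INPAT"):
    """Keep condition records whose own visit comes from the given register.

    conditions: iterable of (person_id, visit_id, source_code) tuples.
    register_by_visit: dict mapping visit_id to a register name.
    """
    return [
        (person_id, visit_id, code)
        for person_id, visit_id, code in conditions
        if register_by_visit.get(visit_id) == register
    ]

register_by_visit = {"v1": "INPAT", "v2": "OUTPAT", "v3": "INPAT"}
conditions = [
    ("A", "v1", "E11.9"),  # diagnosis recorded at an inpatient visit: kept
    ("B", "v2", "E11.9"),  # diagnosis recorded at an outpatient visit: dropped
]
# Person B also has an inpatient visit (v3), but the diagnosis was not
# recorded at that visit, so B does not qualify.
print(restrict_to_register(conditions, register_by_visit))
```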

7. How to build a cohort using only diagnoses from specialty clinics, i.e. filtering for visit type

Similarly to filtering by registers, we can filter by visits to specialty clinics. A list of the clinics is given here. For privacy reasons, only clinics with visits from more than 50 persons have been brought to Atlas. In this example we will continue with our cohort of type 2 diabetes patients but will include only patients who have visited an endocrinology clinic.

Quick guide:

‘Concept Sets’: Create as usual.
‘Cohort Definitions’: In the ‘Cohort Entry Events’, add an attribute to the ‘Condition Occurrence’ and choose the ‘Add Provider Specialty’. Click the ‘Add’ button and type in the search box your desired specialty clinic, in this example ‘endocrinology’. A list of terms including ‘endocrinology’ will appear, including both non-standard (N) and standard (S) codes. Choose the standard code with the ‘Vocabulary’ term ‘Medicare Specialty’. Click ‘Add And Close’. Generate the cohort.

Detailed instructions:

1. Create a new cohort from scratch or clone an existing cohort you want to modify. In this example we will clone an existing cohort for type 2 diabetes patients by first opening the cohort in ‘Cohort Definitions’ and then clicking the clone button in the top right corner after which we edit the name of the cohort, e.g. to T2D_cases_endocrinology. In the ‘Cohort Entry Events’, we add an attribute to the ‘Condition Occurrence’ and choose the ‘Add Provider Specialty’.

2. Next, click the ‘Add’ button and type in the search box your desired specialty clinic, in this example ‘endocrinology’. A list of terms including ‘endocrinology’ will appear, including both non-standard (N) and standard (S) codes. Choose the standard code with the ‘Vocabulary’ term ‘Medicare Specialty’. Click ‘Add And Close’.

3. Now, this new restriction has appeared in our cohort definition.

4. Save the changes and generate the cohort as usual by going to the ‘Generation’ tab and selecting the desired release of the data in which to generate the cohort. Inspect the number of individuals included in the cohort. By generating the cohort in R12 we notice that there are now 8,037 type 2 diabetes patients who had a visit to the endocrinology clinic as compared to our original cohort of 82,164 type 2 diabetes patients from all the registers.

8. How to build a cohort filtering by medication use

e.g. new users of blood glucose lowering medication with a prior diagnosis of type 2 diabetes

In this example we will define two concept sets, one for the diagnosis and one for medication, using SNOMED codes for the diagnosis and ATC codes for the medication.

Quick guide:

‘Concept Sets’: Create as usual.
‘Cohort Definitions’: 1) In ‘Cohort Entry Events’, click the ‘Add Initial Event’ and select ‘Add Drug Exposure’. Import the concept set and add first time of use by clicking ‘Add attribute’ and select ‘Add First Exposure Criteria’. 2) In the ‘Inclusion Criteria’, create ‘New inclusion criteria’ for the prior diagnosis of type 2 diabetes as usual. Modify the boxes below to include ‘where the event starts all days before and 0 days before index start date’. By doing this, we ensure that the diagnosis was given between 0 and any day before the drug use start date which is the index start date. 3) In the ‘Cohort Exit’, define that ‘Event will persist until end of a continuous drug exposure’ and import the concept set for the drugs. Specify a persistence window, e.g. a maximum of 30 days’ gap between prescriptions. Save the changes and generate the cohort.

Detailed instructions:

We start by creating the concept set. In the ‘Concept Sets’, click on ‘New concept’, give it a name, save it and click ‘Add concepts’. This will open a new window with a Search. Let’s write ‘type 2 diabetes’, click enter or the magnifying glass icon to search. On the ‘Vocabulary’, we can limit to SNOMED by clicking on that. Now we see that all the entries that remain are in blue and are thus standard codes. We can select the ‘Type 2 diabetes mellitus’ by clicking the checkbox on the left hand side of the name, and at the bottom of the page tick the box for ‘Descendants’ and ‘Add To New Concept Set’. Now we can go back to the ‘Concept Sets’, save the changes and close the window.

We can create a concept set similarly for the medication. In the ‘Concept Sets’, click on ‘New concept’, give it a name, save it and click ‘Add concepts’. This will open a new window with a Search. Let’s write ‘A10B’, click enter or the magnifying glass icon to search. On the ‘Vocabulary’, we can limit to ATC codes by clicking on that. Now we see that all the entries that remain are in purple color. We can select the appropriate term by clicking the checkbox on the left hand side of its name, and at the bottom of the page tick the box for ‘Descendants’ and ‘Add To New Concept Set’. Now we can go back to the ‘Concept Sets’, save the changes and close the window.

Next we create a new cohort by going to ‘Cohort Definitions’, clicking the ‘New Cohort’ button and by giving a name to our cohort.
In ‘Cohort Entry Events’, click the ‘Add Initial Event’ and select ‘Add Drug Exposure’. Since we used ATC codes for our concept set definition, we can treat them like standard codes and import the concept set directly. We can add any other specific criteria, e.g. first time of use: click ‘Add attribute’ and select ‘Add First Exposure Criteria’.

Next, we need to include only those with a prior diagnosis of type 2 diabetes. In the ‘Inclusion Criteria’, click the ‘New inclusion criteria’, give it a name and click ‘Add criteria to group’. From the dropdown menu select ‘Add Condition Occurrence’. Since our concept set for type 2 diabetes was made using SNOMED (standard) codes, we can import it directly. Since we require a prior diagnosis, we’ll need to modify the boxes below to read ‘where the event starts all days before and 0 days before index start date’. By doing this, we ensure that the diagnosis was given at any time up to and including the index start date, which is the start date of drug use.

In the ‘Cohort Exit’, we define that ‘Event will persist until end of a continuous drug exposure’, since we are interested in the cohort of people who use the blood glucose lowering drugs. We need to import our concept set for drugs to define our drugs of interest. We can allow for a persistence window, e.g. a maximum of 30 days’ gap between prescriptions. A gap larger than this ends the continuous exposure, so the individual is no longer counted as a continuous drug user and leaves the cohort at that point.
Now we are ready to generate our cohort in the ‘Generation’ tab. In this example, we can see that there are 22,364 cases with this definition in FinnGen R12 data.
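
The three parts of this definition (first exposure as index, prior diagnosis, exit at the end of continuous exposure) can be sketched in Python. This is a simplified illustration under several assumptions: the input dictionaries and the function name are hypothetical, exposure is represented by purchase dates only, and a gap longer than the persistence window is treated as ending the person’s time in the cohort.

```python
from datetime import date

def new_user_cohort(purchases, first_diagnosis, max_gap_days=30):
    """Return {person_id: (index_date, exit_date)} for new users with a prior diagnosis.

    purchases: dict person_id -> list of purchase dates for the drug concept set.
    first_diagnosis: dict person_id -> date of the first diagnosis.
    """
    cohort = {}
    for person_id, dates in purchases.items():
        dates = sorted(dates)
        index_date = dates[0]  # first exposure = index start date
        diagnosis_date = first_diagnosis.get(person_id)
        # Prior diagnosis: on or before the index date (0 days before is allowed).
        if diagnosis_date is None or diagnosis_date > index_date:
            continue
        exit_date = index_date
        for purchase_date in dates[1:]:
            if (purchase_date - exit_date).days > max_gap_days:
                break  # gap too long: continuous exposure ends here
            exit_date = purchase_date
        cohort[person_id] = (index_date, exit_date)
    return cohort

purchases = {
    "A": [date(2020, 1, 1), date(2020, 1, 25), date(2020, 4, 1)],
    "B": [date(2019, 5, 1)],
}
first_diagnosis = {"A": date(2018, 6, 1), "B": date(2019, 6, 1)}
print(new_user_cohort(purchases, first_diagnosis))
```

Person A enters at the first purchase and exits at the second one, because the third purchase comes after a gap of more than 30 days. Person B is left out because the diagnosis came after the first purchase.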

9. How to build a cohort filtering for the number of medications an individual has received

In Example 1 (How to build a cohort based on a diagnosis using local (non-standard) codes) we already used the option to limit the cohort to only those individuals who have at least three occurrences of the diagnosis. In the same way, we can filter for the number of medication purchases: follow that example for building the cohort but use concept sets for medications instead of diagnoses. The number is set in the ‘Inclusion Criteria’ section of ‘Cohort Definitions’, at the top of each criterion box, by selecting the appropriate option from ‘at most’, ‘exactly’, and ‘at least’ together with the number of occurrences.
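
The three options correspond to ordinary comparisons on the number of matching events per person. A minimal Python sketch (the function and dictionary names are made up for illustration):

```python
import operator

# Mapping from the option text to the comparison it stands for.
COUNT_OPTIONS = {
    "at most": operator.le,
    "exactly": operator.eq,
    "at least": operator.ge,
}

def passes_count_filter(n_events, option, threshold):
    """Return True if n_events satisfies e.g. 'at least' 3 occurrences."""
    return COUNT_OPTIONS[option](n_events, threshold)

print(passes_count_filter(4, "at least", 3))
print(passes_count_filter(0, "exactly", 0))
print(passes_count_filter(4, "at most", 3))
```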

10. How to build a cohort based on Drug Era

e.g. Parkinson’s Disease patients who have used levodopa medication continuously for at least five years after the diagnosis

In Atlas, you can build cohorts for the following ‘Eras’: ‘Condition’, ‘Dose’ and ‘Drug’. In this example we will focus on ‘Drug Eras’ which refer to continuous periods of drug use based on the active ingredient of a drug. For this, we will need to use the international RxNorm codes of drugs. The eras are calculated from the start date of the first purchase to the end date of the last purchase in a defined period. Note that in FinnGen data the end date is currently defined as start date + 1. Therefore, single drug purchases will be counted to have a length of 2 days. Also, in FinnGen data gaps larger than 120 days between drug purchases result in the calculation of distinct eras.
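
To make the era rules above concrete, here is a minimal Python sketch that merges purchase dates into eras. It follows the rules as described in the text (end date = start date + 1, a new era after a gap of more than 120 days) but is not the actual FinnGen or OHDSI era-building code. Measuring the gap from the previous end date to the next start date, and counting era length inclusively so that a single purchase has a length of 2 days, are assumptions made to match the description.

```python
from datetime import date, timedelta

def build_drug_eras(purchase_dates, max_gap_days=120):
    """Merge purchase dates into a list of (era_start, era_end) tuples."""
    eras = []
    for start in sorted(purchase_dates):
        end = start + timedelta(days=1)  # end date defined as start date + 1
        if eras and (start - eras[-1][1]).days <= max_gap_days:
            # Gap is small enough: extend the current era.
            eras[-1] = (eras[-1][0], max(eras[-1][1], end))
        else:
            # First purchase, or gap too large: start a new era.
            eras.append((start, end))
    return eras

def era_length_days(era):
    """Length counting both the start and the end day (assumption, see lead-in)."""
    era_start, era_end = era
    return (era_end - era_start).days + 1

purchase_dates = [date(2020, 1, 1), date(2020, 3, 1), date(2020, 9, 1)]
eras = build_drug_eras(purchase_dates)
for era in eras:
    print(era, era_length_days(era))
```

The first two purchases fall into one era, while the third one starts a new era because it comes more than 120 days after the previous end date.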

In this example we will use Parkinson’s Disease based on international SNOMED coding and levodopa medication based on international RxNorm coding.

Quick guide:

‘Concept Sets’: Create as usual, one for the condition and one for the drugs.
‘Cohort Definitions’: 1) In the ‘Cohort Entry Events’, click ‘Add Initial Event’, select ‘Add Condition Occurrence’ and import the concept set for the condition. 2) In the ‘Inclusion Criteria’, create a new inclusion criteria. Click ‘Add criteria to group’ and select ‘Add Drug Era’. Import the concept set for the drug. To have the drug use after the diagnosis and to be continuous for at least five years, click the ‘Add attribute’ and select ‘Add Era Length Criteria’. Select the Era length to be Greater than Equal to 1825 days, i.e. five years. Modify the section ‘where event starts between 0 days Before and All days After index start date’ to consider only drug purchases after the diagnosis. Generate the cohort.

Detailed instructions:

We start by creating two ‘Concept sets’, one for the disease and one for the drug. Go to ‘Concept Sets’ and click ‘New Concept Set’. Give a name to your concept sets and click ‘Add Concepts’ at the bottom of the page. A Search window will appear. Let’s write ‘Parkinson’s disease’. A list of standard (blue), and non-standard (red) codes will appear. We can limit our search to SNOMED codes by selecting it from the left hand panel ‘Vocabulary’. Now only standard codes will remain. We can click the SNOMED code ‘Parkinson’s Disease’ and inspect its hierarchy from the ‘Hierarchy’ tab. We can see all its Parents and Children. We can add it with its Children (‘Descendants’ tick box clicked) to the concept set. Now when we go back to the ‘Concept Sets’, we see that it includes 10 Included Concepts. We can save the changes and close the concept set by clicking on the appropriate icons next to its name.
Next, we create a concept set for the drug. Similarly, we create a new concept set and in the search window, write ‘levodopa’. We can again limit our search to RxNorm codes only. By clicking the ‘levodopa’ and inspecting it, we can conclude that it covers the information we are looking for. We can add this concept with its Children (‘Descendants’) to the concept set. Now when we go back to the ‘Concept Sets’, we see that it includes 4160 Included Concepts. We can save the changes and close the concept set by clicking on the appropriate icons next to its name.
Next we go to ‘Cohort Definitions’. Click ‘New Cohort’, give it a name and description. In the ‘Cohort Entry Events’, click ‘Add Initial Event’. Select ‘Add Condition Occurrence’. Click the arrow next to the ‘Any Condition’ and import the concept set of Parkinson’s Disease you created in the first step. Since this was based on standard codes, we don’t need to click the ‘Add attribute’ but we can import the concept set directly.
Next, go to ‘Inclusion Criteria’. Click the ‘New inclusion criteria’ and give it a name. Click ‘Add criteria to group’ and select ‘Add Drug Era’. Import the concept set for levodopa. Now we want to have the drug use after the diagnosis and to be continuous for at least five years. For this, we’ll click the ‘Add attribute’ and select ‘Add Era Length Criteria’. Next we will select the Era length to be Greater than Equal to 1825 days, i.e. five years. We also need to modify the section ‘where event starts between 0 days Before and All days After index start date’ to consider only drug purchases after the diagnosis.

Now we are ready to generate our cohort. Save the changes and go to the ‘Generation’ tab. Select the data release in which you’d like to generate the cohort and click ‘Generate’. We can see that in R12 there are 758 individuals in our cohort. When we click ‘View Report’, we see that there are 6,065 individuals with Parkinson’s Disease in total, but only 758 of them (12.5%) have used levodopa medication continuously for at least five years after the diagnosis.
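The percentage shown in the report can be checked by hand. The short Python sketch below only repeats the arithmetic with the counts quoted above; the function name is our own and not part of Atlas.

```python
def retention_percent(n_final, n_initial):
    """Share of the initial events or persons that remain, in percent."""
    return n_final / n_initial * 100


parkinson_total = 6065       # individuals with the diagnosis (from the report)
levodopa_five_years = 758    # individuals left after the inclusion criterion

share = retention_percent(levodopa_five_years, parkinson_total)
print(f"{share:.1f}%")
```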

11. How to build a cohort using KELA reimbursement codes

The KELA reimbursement codes can be found in Atlas using the ‘Vocabulary’ REIMB. More detailed information on the mapping is provided here. In this example we will use the KELA reimbursement code 112 for ‘Severe psychotic and other severe mental disorders’ to define our cohort.

Quick guide:

‘Concept Sets’: The best way to search for the KELA reimbursement codes is by string search, e.g. here ‘psych’. To limit the search to KELA reimbursement codes, select REIMB from the ‘Vocabulary’ panel. Select the appropriate code and add it to the concept set. Note that the KELA reimbursement codes do not have ‘Descendants’ so no need to select that tick box.
‘Cohort Definitions’: In the ‘Cohort Entry Events’, click ‘Add Initial Event’. Reimbursements are in the domain ‘Condition’ so select ‘Add Condition Occurrence’. Since reimbursement codes are based on non-standard codes, click the ‘Add attribute’ and select ‘Add Condition Source Concept’. Import the concept set, save changes and generate the cohort as usual.

Detailed instructions:

1. Go to ‘Concept Sets’ and click ‘New Concept Set’. Give the concept set a name and click ‘Add Concepts’ at the bottom of the page. A search window will appear. Type ‘psych’. It is better to search by part of the name rather than by the code itself (here 112), because searching by the number can cause Atlas to hang. After you press Enter, a list of codes will appear. We can limit the search to REIMB codes by selecting REIMB in the left-hand ‘Vocabulary’ panel. Now only REIMB codes remain. We can select the appropriate code and add it to the concept set. Note that the KELA reimbursement codes do not have ‘Descendants’, so there is no need to tick that box. When we go back to ‘Concept Sets’, we see that the concept set has 1 Included Concept. We can save the changes and close the concept set by clicking the appropriate icons next to its name.

2. We can start building the cohort in the ‘Cohort Definitions’. Click ‘New Cohort’, give it a name and description. In the ‘Cohort Entry Events’, click ‘Add Initial Event’. Note that reimbursements are in the domain ‘Condition’ so you’ll need to select ‘Add Condition Occurrence’. Since reimbursement codes are based on non-standard codes, we need to click the ‘Add attribute’ and select ‘Add Condition Source Concept’. By clicking the arrow next to the ‘Condition Source Concept is Any Condition’, we can import the concept set we created in step 1.

3. In this example we don’t need to define any additional ‘Inclusion criteria’. We can save the changes and go to the ‘Generation’ tab. Select the data release in which you’d like to generate the cohort and click ‘Generate’. We can see that in R12 there are 19,243 individuals in our cohort which is the same number as for our concept set since we didn’t apply any other criteria.
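The difference between importing a concept set directly and importing it through ‘Add Condition Source Concept’ is which column the concept set is matched against. The Python sketch below illustrates this on toy rows. The column names follow the OMOP condition_occurrence table, but the concept ids and persons are invented for illustration and this is not the SQL that Atlas generates.

```python
# Hypothetical concept id standing in for the REIMB code 112 concept.
REIMB_CONCEPT_SET = {2000000112}

# Toy condition_occurrence rows (all ids are made up).
rows = [
    {"person_id": 1, "condition_concept_id": 500, "condition_source_concept_id": 2000000112,
     "condition_start_date": "2004-09-15"},
    {"person_id": 1, "condition_concept_id": 500, "condition_source_concept_id": 2000000112,
     "condition_start_date": "2001-05-01"},
    {"person_id": 2, "condition_concept_id": 501, "condition_source_concept_id": 2000000205,
     "condition_start_date": "2010-02-20"},
]


def cohort_entry_by_source_concept(rows, concept_set):
    """Earliest matching event per person, matching on the source concept column."""
    first = {}
    for r in rows:
        if r["condition_source_concept_id"] in concept_set:
            pid, start = r["person_id"], r["condition_start_date"]
            # ISO dates compare correctly as strings
            if pid not in first or start < first[pid]:
                first[pid] = start
    return first


entries = cohort_entry_by_source_concept(rows, REIMB_CONCEPT_SET)
print(entries)
```

With no inclusion criteria, every person who has at least one matching row ends up in the cohort, which is why the cohort size equals the person count of the concept set.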

12. How to build a cohort using birth/delivery as a variable

e.g. a cohort of women who develop autoimmune disease within a year of giving birth

This example requires two concept sets: one for autoimmune disease and one for giving birth. Selecting only females will be done at the ‘Cohort Definitions’, although the concept of delivery should already filter for females.

Quick guide:

‘Concept Sets’: Create as usual, one for the disease and one for giving birth (delivery).
‘Cohort Definitions’: 1) Create a new cohort. In the ‘Cohort Entry Events’, click ‘Add Initial Event’. Delivery is in the domain ‘Procedure’ so select ‘Add Procedure Occurrence’ and import the concept set of delivery. Delivery should be recorded for females only but to make sure that our cohort covers only females, click ‘Add attribute’, select ‘Add Gender Criteria’ and click ‘Add’. A new window will open where you can write ‘female’. Select the correct one from the list by clicking the checkmark on the left, and click ‘Add And Close’. 2) In the ‘Inclusion Criteria’, create new inclusion criteria, click ‘Add criteria to group’ and select ‘Add Condition Occurrence’. Import the concept set for autoimmune disease. To restrict the diagnosis to within one year of giving birth, modify the section ‘where event starts between 0 days before and 365 days after index start date’. Generate the cohort.

Detailed instructions:

1. Go to ‘Concept Sets’ and click ‘New Concept Set’. Give the concept set a name and click ‘Add Concepts’ at the bottom of the page. A search window will appear. Type ‘autoimmune’. A list of standard (blue) and non-standard (red) codes will appear. We can limit the search to SNOMED codes by selecting SNOMED in the left-hand ‘Vocabulary’ panel. Now only standard codes remain. We can click the SNOMED code ‘autoimmune disease’ and inspect its hierarchy in the ‘Hierarchy’ tab, which shows all its Parents and Children. This code seems reasonable for our purpose, so we add it together with its Children (tick the ‘Descendants’ box) to the concept set. When we go back to ‘Concept Sets’, we see that the concept set has 593 Included Concepts. We can save the changes and close the concept set by clicking the appropriate icons next to its name.

2. Next, we create a concept set for giving birth. Similarly, we create a new concept set and in the search window, write ‘delivery’. We can again limit our search to SNOMED codes only. By clicking the ‘Delivery procedure’ and inspecting it, we can conclude that it covers the information we are looking for, including vaginal delivery of fetus and cesarean section. We can add this concept with its Children (‘Descendants’) to the concept set.

3. We can start building the cohort in the ‘Cohort Definitions’. Click ‘New Cohort’, give it a name and description. In the ‘Cohort Entry Events’, click ‘Add Initial Event’. Note that delivery is in the domain ‘Procedure’ so you’ll need to select ‘Add Procedure Occurrence’. Click the arrow next to the ‘Any Procedure’ and import the concept set of delivery. Since this was based on standard codes, we don’t need to click the ‘Add attribute’ but we can import the concept set directly. Delivery should be recorded for females only but to make sure that our cohort covers only females, click ‘Add attribute’, select ‘Add Gender Criteria’ and click ‘Add’. A new window will open where you can write ‘female’. Select the correct one from the list by clicking the checkmark on the left, and click ‘Add And Close’.

4. Next, go to ‘Inclusion Criteria’. Click the ‘New inclusion criteria’ and give it a name. Click ‘Add criteria to group’ and select ‘Add Condition Occurrence’. Import the concept set for autoimmune disease. Now we want to restrict the diagnosis to within one year of giving birth. For this, we’ll need to modify the section ‘where event starts between 0 days before and 365 days after index start date’.

5. Now we are ready to generate our cohort. Save the changes and go to the ‘Generation’ tab. Select the data release in which you’d like to generate the cohort and click ‘Generate’. We can see that in R12 there are 661 women in our cohort. When we click ‘View Report’, we see that there are 121,078 events (deliveries) in total, but only 661 women (0.55%) develop an autoimmune disease within one year of giving birth.
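The time window in step 4 can be read as a simple date comparison: a delivery qualifies if a diagnosis starts on the delivery date or at most 365 days after it. The Python sketch below shows that check on invented dates. It is an illustration of the window logic under that reading, not the query Atlas runs.

```python
from datetime import date, timedelta


def deliveries_followed_by_diagnosis(deliveries, diagnoses, window_days=365):
    """Return the deliveries that have a diagnosis 0 to window_days days afterwards."""
    kept = []
    for delivery in deliveries:
        window_end = delivery + timedelta(days=window_days)
        if any(delivery <= dx <= window_end for dx in diagnoses):
            kept.append(delivery)
    return kept


# Toy persons: (delivery dates, autoimmune diagnosis dates), all made up.
persons = {
    "A": ([date(2015, 4, 10)], [date(2015, 11, 2)]),  # diagnosis about 7 months after delivery
    "B": ([date(2012, 1, 15)], [date(2014, 3, 1)]),   # diagnosis more than 2 years later
    "C": ([date(2016, 6, 1)], [date(2016, 1, 1)]),    # diagnosis before the delivery
}

in_cohort = [
    pid for pid, (deliveries, diagnoses) in persons.items()
    if deliveries_followed_by_diagnosis(deliveries, diagnoses)
]
print(in_cohort)
```

Changing ‘0 days before’ to a larger number would also admit diagnoses made shortly before the delivery, as for person C.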

13. How to build a cohort with multiple events per person

e.g. individuals with repeated fractures

In this example we create a cohort of persons who have at least two fractures. Fractures recorded within 120 days (about 4 months) of each other are treated as a single event.

Quick guide:

‘Concept Sets’: Create as usual.
‘Cohort Definitions’: Create a new cohort. 1) In the ‘Cohort Entry Events’, click ‘Add Initial Event’, select ‘Add Condition Occurrence’ and import the concept set. Now, the important part is to change ‘Limit initial events to: earliest event per person’ to ‘Limit initial events to: all events per person’. 2) In the ‘Inclusion Criteria’, make the same change as above even if there are no other criteria. In this example we add one criterion of having at least 2 fractures. Click ‘New inclusion criteria’, give it a name, click ‘Add criteria to group’ and select ‘Add Condition Occurrence’. Import the concept set and, at the top of the box, change the count to ‘at least 2’ occurrences. Finally, change ‘Limit qualifying events to earliest event per person’ to ‘Limit qualifying events to all events per person’. 3) In the ‘Cohort Exit’, change the Event Persistence to ‘fixed duration relative to initial event’. Also change the Number of days offset from 0 to an appropriate period, e.g. 120 days, meaning that fractures more than 120 days (about 4 months) apart are considered separate events. 4) Generate the cohort and inspect the numbers of people and records. There should now be more records than people.

Detailed instructions:

We start by creating a ‘Concept Set’ for fractures. Go to ‘Search’ and type ‘fracture of bone’. We can limit our search to SNOMED terms in the ‘Vocabulary’ panel and Clinical Finding in the ‘Class’ panel. It is often useful to sort the results by record count (RC). After sorting, we see that ‘Fracture of bone’ is the one with the highest record count.

By clicking the name we can inspect it further. If we go to the ‘Hierarchy’ tab we see that this concept has one Parent and 28 Children concepts. We are happy with our selection, so we can tick the checkbox on the left of the name, tick the ‘Descendants’ and finally click the ‘Add To New Concept Set’.

Next we can go to ‘Concept Sets’, and give a name to our new concept set. We see that there are 2920 concepts included. We can save the changes and close the window by clicking the appropriate buttons next to the name of the concept set.

We can start building the cohort in the ‘Cohort Definitions’. Click ‘New Cohort’, give it a name and description. In the ‘Cohort Entry Events’, click ‘Add Initial Event’, and select ‘Add Condition Occurrence’. Since the concept set was based on standard codes, we don’t need to click the ‘Add attribute’ but we can import the concept set directly. Now, change the ‘Limit initial events to: earliest event per person’ to ‘Limit initial events to: all events per person’.

In the ‘Inclusion Criteria’, make the same change as above even if there are no other criteria. We do, however, want to add a criterion requiring a minimum of two fractures per person. To do this, we click ‘New inclusion criteria’, give it a name, click ‘Add criteria to group’ and select ‘Add Condition Occurrence’. We import our concept set and, at the top of the box, change the count to ‘at least 2’ occurrences. Finally, we change ‘Limit qualifying events to earliest event per person’ to ‘Limit qualifying events to all events per person’.

In the ‘Cohort Exit’, change the Event Persistence to ‘fixed duration relative to initial event’. Also change the Number of days offset from 0 to an appropriate period. In this example we change it to 120 days, meaning that fractures more than 120 days (about 4 months) apart are considered separate events.

Save the changes and generate the cohort in the ‘Generation’ tab in your selected release of data. With R12, there are 112,762 individuals in our cohort and 218,578 records, so the event count is greater than the number of individuals, reflecting multiple fractures per person. Note that because we set the number of days offset to 120 days, multiple fractures within this time frame are counted as one episode for some individuals, which is why the number of records is less than twice the number of individuals. If you vary the days offset, the number of records will change accordingly.
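The effect of the days offset on the record count can be illustrated with a small Python sketch. It assumes, in line with the note above, that each fracture opens a period of fixed length and that overlapping or touching periods of the same person are merged into one record. The dates are invented, and the exact boundary handling in Atlas may differ.

```python
from datetime import date, timedelta


def collapse_events(event_dates, persistence_days=120):
    """Merge events whose fixed-length periods overlap into single episodes."""
    episodes = []
    for start in sorted(event_dates):
        end = start + timedelta(days=persistence_days)
        if episodes and start <= episodes[-1][1]:
            # Starts inside the previous episode: extend that episode.
            previous_start, previous_end = episodes[-1]
            episodes[-1] = (previous_start, max(previous_end, end))
        else:
            episodes.append((start, end))
    return episodes


# Toy person with three recorded fractures.
fractures = [date(2015, 1, 10), date(2015, 3, 1), date(2016, 7, 20)]

episodes = collapse_events(fractures, persistence_days=120)
print(len(episodes))                      # first two fractures merge into one episode
print(len(collapse_events(fractures, 30)))  # with a 30-day offset all three stay separate
```

This is why a person with three fracture records can contribute only two cohort records at a 120-day offset.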

14. How to build a cohort by filtering by main/side diagnosis

Sometimes we want to consider only main or only side diagnoses. In this example we reuse the cohort from the previous example (How to build a cohort with multiple events per person, e.g. individuals with repeated fractures) and restrict it to main diagnoses.

In ‘Cohort Definitions’, open and copy the previously generated cohort and give it a new name. In the ‘Cohort Entry Events’, click the ‘Add attribute’ and select ‘Add Condition Status’. Press the ‘Add’ button and a new window will open. Type ‘primary diagnosis’ in the search bar and click the ‘Search’ button. This will give you ‘Primary diagnosis’. Select this by ticking the checkbox on the left of the name and press ‘Add And Close’. Note that if you wanted to select a side diagnosis, you would type ‘Secondary diagnosis’ in the search bar.

Save the changes and generate the cohort in the ‘Generation’ tab in your selected release of data. Now we see that there are 107,141 individuals with 190,738 records.
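Conceptually, the ‘Condition Status’ attribute filters the condition records on their status before the cohort is built. The Python sketch below shows that filter on toy rows. The column name condition_status_concept_id comes from the OMOP condition_occurrence table, while the two status ids and all rows are placeholders invented for illustration.

```python
# Placeholder ids, not the real concept ids of these statuses.
PRIMARY_DIAGNOSIS = 9001
SECONDARY_DIAGNOSIS = 9002

# Toy fracture records (made up).
fracture_records = [
    {"person_id": 1, "condition_start_date": "2015-01-10", "condition_status_concept_id": PRIMARY_DIAGNOSIS},
    {"person_id": 1, "condition_start_date": "2016-07-20", "condition_status_concept_id": SECONDARY_DIAGNOSIS},
    {"person_id": 2, "condition_start_date": "2014-02-03", "condition_status_concept_id": PRIMARY_DIAGNOSIS},
    {"person_id": 3, "condition_start_date": "2017-11-30", "condition_status_concept_id": SECONDARY_DIAGNOSIS},
]


def keep_status(records, status_concept_id):
    """Keep only the records with the given condition status."""
    return [r for r in records if r["condition_status_concept_id"] == status_concept_id]


main_only = keep_status(fracture_records, PRIMARY_DIAGNOSIS)
n_records = len(main_only)
n_persons = len({r["person_id"] for r in main_only})
print(n_persons, n_records)
```

Both the number of persons and the number of records can only stay the same or drop after such a filter, which matches the smaller counts reported above.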

15. How to build a cohort using Kanta lab values

Kanta lab values can be found in Atlas as concepts under the ‘Vocabulary’ ‘LOINC’ in their harmonized form and under ‘LABfi_ALL’ in their original form. In this example we will build a cohort of individuals who have had a high fasting triglyceride value (> 2.0 mmol/l) at least once.

Quick guide:

‘Concept Sets’: Go to ‘Search’ and type your search term, here ‘trigly’. Select the harmonized value which can be identified with the ‘Vocabulary’ as ‘LOINC’. You can check from Risteys that the id of your selected measurement matches with the OMOP id of the correct measurement. Select the correct term and create the concept set as usual.
‘Cohort Definitions’: Click ‘New Cohort’ and give it a name and a description. 1) In the ‘Cohort Entry Events’, click ‘Add Initial Event’, select ‘Add Measurement’ and import the concept set for the triglycerides. 2) In the ‘Inclusion Criteria’, add a new criterion, click ‘Add criteria to group’, select ‘Add Measurement’ and import the concept set for the triglycerides. Click ‘Add attribute’ and select ‘Add Value as Number Criteria’. Edit the number to be ‘Greater than 2’. 3) Save and generate the cohort as usual.

Detailed instructions:

We start by creating a ‘Concept Set’ for triglycerides. Go to ‘Search’ and type ‘trigly’. After sorting by record count (RC), we see that the harmonized value with ‘Vocabulary’ ‘LOINC’ has the most records. We go to Risteys to check that this is the one we are after. In Risteys we also type ‘trigly’ and see that the OMOP id for fasting triglycerides matches the one in Atlas. We can tick the checkbox next to the id number and click ‘Add To Concept Set’. Lab values have no ‘Descendants’, so there is no need to tick that box. We can go to ‘Concept Sets’ from the left side panel, give the concept set a name and save it.

We can start building the cohort in the ‘Cohort Definitions’. Click ‘New Cohort’, give it a name and description. In the ‘Cohort Entry Events’, click ‘Add Initial Event’, and select ‘Add Measurement’. Since the concept set was based on standard codes, we don’t need to click the ‘Add attribute’ but we can import the concept set directly.

We will add an inclusion criterion for triglyceride values higher than 2 mmol/l. We click ‘New inclusion criteria’ and give it a name. Next, click ‘Add criteria to group’, select ‘Add Measurement’ and import the concept set for the triglycerides. Now click ‘Add attribute’ and select ‘Add Value as Number Criteria’.

Now a new criterion has appeared and we can edit the number to be ‘Greater than 2’.

Save the changes and generate the cohort in the ‘Generation’ tab in your selected release of data. With R12, we see that there are 85,737 individuals in our cohort.
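The ‘Value as Number’ criterion amounts to a numeric comparison on the measurement value. The Python sketch below shows this on toy rows. The column names follow the OMOP measurement table, but the concept id, the persons and the values are invented, and the values are assumed to be in mmol/l already.

```python
THRESHOLD_MMOL_L = 2.0

# Hypothetical concept id for the harmonized fasting triglycerides measurement.
TRIGLYCERIDE_CONCEPT_SET = {3000001}

# Toy measurement rows (made up).
measurements = [
    {"person_id": 1, "measurement_concept_id": 3000001, "value_as_number": 1.4},
    {"person_id": 1, "measurement_concept_id": 3000001, "value_as_number": 2.6},
    {"person_id": 2, "measurement_concept_id": 3000001, "value_as_number": 2.0},
    {"person_id": 3, "measurement_concept_id": 3000999, "value_as_number": 5.1},
    {"person_id": 4, "measurement_concept_id": 3000001, "value_as_number": None},
]


def persons_above(measurements, concept_set, threshold):
    """Persons with at least one measurement in concept_set strictly above threshold."""
    return {
        m["person_id"]
        for m in measurements
        if m["measurement_concept_id"] in concept_set
        and m["value_as_number"] is not None
        and m["value_as_number"] > threshold
    }


high_triglycerides = persons_above(measurements, TRIGLYCERIDE_CONCEPT_SET, THRESHOLD_MMOL_L)
print(sorted(high_triglycerides))
```

Because the comparison is ‘Greater than’, a value of exactly 2.0 does not qualify; choosing ‘Greater or Equal To’ instead would include it.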

16. How to export a cohort built in Atlas into R

Sometimes it is useful to work further with the cohort in other tools, such as R. There is no quick way to do the export in Atlas but we can use other tools in Sandbox to do this. The Cohort Operations tool can read in all the cohorts made in Atlas and it can also be used to export cohorts.

Quick guide:

In Sandbox, go to Applications > Sandbox > CohortOperations2.
On the left side panel, click the ‘Import Cohorts’. From the tabs, select ‘Atlas’. Type in the search bar the name of the cohort and select the correct one by ticking the checkbox on the left next to the cohort name and click ‘Import Selected’.
On the left side panel, click the ‘Export’. Under ‘Select cohort:’, choose the one you want to export and click ‘Export’.
A new window will open. Modify the name and the location accordingly and click the ‘Save’ button.

Detailed instructions:

In Sandbox, go to Applications > Sandbox > CohortOperations2.

On the left side panel, click the ‘Import Cohorts’. You can now choose from the tabs where the cohort is imported from. Select ‘Atlas’. A list of cohorts created in Atlas will appear. We use the search bar to type the name of the cohort created in Atlas. In this example we look for the cohort repeated_fractures[MK], so we’ll write ‘repeated fractures’ in the search bar. Cohorts including this in their name will appear. Select the one you are interested in by ticking the checkbox on the left next to the cohort name and click ‘Import Selected’.

Once the cohort has been imported into Cohort Operations, we can select ‘Export’ from the left side panel. Under ‘Select cohort:’, we can click the arrow to show a dropdown menu of imported cohorts. In this example we have imported only one cohort, so only that one will show and we will select that. Next we are ready to click ‘Export’ at the bottom of the page.

A new window will open. There we can modify the name of the cohort as well as the location where we want to save it. Here we keep the name as it is and change the location to ‘Downloads’. Once ready, we click the ‘Save’ button. The cohort is now saved in the ‘Downloads’ folder as a tab-separated file, cohortname.tsv, and can be read into R.
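Because the export is a plain tab-separated file, it can be checked with any tool before further analysis. The sketch below does this in Python on a small stand-in file written to a temporary folder. The column names are placeholders, since the actual header written by Cohort Operations is not shown here, so adjust them to match your own export.

```python
import csv
import os
import tempfile


def read_cohort_tsv(path):
    """Read a tab-separated cohort export into a list of dictionaries."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))


# Write a toy stand-in for the exported file (placeholder columns and values).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "cohortname.tsv")
with open(path, "w", newline="", encoding="utf-8") as fh:
    writer = csv.writer(fh, delimiter="\t")
    writer.writerow(["person_id", "cohort_start_date", "cohort_end_date"])
    writer.writerow(["P0001", "2015-01-10", "2015-05-10"])
    writer.writerow(["P0001", "2016-07-20", "2016-11-17"])
    writer.writerow(["P0002", "2014-02-03", "2014-06-03"])

rows = read_cohort_tsv(path)
n_records = len(rows)
n_persons = len({r["person_id"] for r in rows})
print(n_persons, n_records)
```

Comparing the person and record counts of the file with those shown in the ‘Generation’ tab is a quick way to confirm that the right cohort was exported.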
