Using Atlas in Sandbox
The figure below shows the full workflow of cohort building and analyses within Sandbox. In this tutorial we will focus on the part containing Atlas.
Atlas offers many functions, from cohort definitions to calculating incidence rates. However, for cohort building, ‘Concept Sets’ and ‘Cohort Definitions’ are usually sufficient, with the addition of ‘Characterizations’ to check that the built cohort is as intended.
If you use a ready-made phenotype definition from the OHDSI PhenotypeLibrary, you can proceed directly to ‘Cohort Definitions’.
To use Atlas in the Sandbox, start by opening your IVM, which may be of any size, including the smallest one, because Atlas does all the data fetching using BigQuery.
In Sandbox, select Applications > Sandbox > Atlas.
Concept sets can be based on, for instance, medical codes (ICD, SNOMED) or drug purchases (ATC, VNRfi, RxNorm).
Search for the concept (medical code, drug purchase etc.) using the ‘Search’ function: either as strings, as ICD codes, as SNOMED codes, etc.
Use the ‘Descendants’ tick box to include sub codes of a diagnosis/medication main code.
Use the ‘Exclude’ tick box to exclude specific sub codes from the concept set you are creating.
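The effect of the ‘Descendants’ and ‘Exclude’ tick boxes can be pictured with a minimal Python sketch. It is only an illustration of the logic, not how Atlas is implemented: the hierarchy, the helper names `descendants` and `resolve_concept_set`, and the choice of which sub code to exclude are all hypothetical.

```python
# Hypothetical parent -> sub code hierarchy, for illustration only.
CHILDREN = {
    "A10": ["A10.0", "A10.1", "A10.5"],
}


def descendants(code, children=CHILDREN):
    """Return all sub codes below `code`, at any depth."""
    found = []
    for child in children.get(code, []):
        found.append(child)
        found.extend(descendants(child, children))
    return found


def resolve_concept_set(items, children=CHILDREN):
    """Resolve items of the form (code, include_descendants, exclude)."""
    included, excluded = set(), set()
    for code, include_descendants, exclude in items:
        codes = {code}
        if include_descendants:
            codes.update(descendants(code, children))
        # Excluded rows are subtracted from the included codes at the end.
        (excluded if exclude else included).update(codes)
    return sorted(included - excluded)


concept_set = [
    ("A10", True, False),    # main code with 'Descendants' ticked
    ("A10.5", False, True),  # sub code with 'Exclude' ticked
]
resolved = resolve_concept_set(concept_set)
print(resolved)  # ['A10', 'A10.0', 'A10.1']
```

In this sketch the main code brings in all of its sub codes, and the excluded sub code is then removed from the final set.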
International standard codes, such as SNOMED codes, are displayed in blue, and local non-standard codes, such as ICD codes, are displayed in red.
You should not mix standard and non-standard codes in a single concept set. This is because concept sets including standard codes can be uploaded in the ‘Cohort Definitions’ directly while concept sets including non-standard codes need to be added as an attribute as ‘source’ concepts. If you mix standard and non-standard codes into one concept set, you will have to choose whether to upload the concept set in the cohort definition directly or via attribute as a source concept, and then Atlas will search for individuals from only one type of code, standard or non-standard, not both, depending on how you uploaded the concept set. Therefore, if you need both standard and non-standard codes, put them into separate concept sets and upload separately in ‘Cohort Definitions’.
Note that in the ‘Standard concept’ column, ATC codes are neither standard nor non-standard but belong to their own category, classification, and are shown in purple. When inputting them in the ‘Cohort Definitions’, they can be treated similarly to standard codes.
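The advice to keep standard and non-standard codes in separate concept sets can be sketched as a simple grouping step. This is an illustrative Python sketch, not an Atlas feature: the codes are placeholders, the function name is hypothetical, and the flag values ("S" for standard, "C" for classification, empty for non-standard) are assumed to follow the OMOP vocabulary convention.

```python
from collections import defaultdict

# Placeholder search results: (code, vocabulary, standard-concept flag).
# Assumed flag values: "S" = standard, "C" = classification, None = non-standard.
search_results = [
    ("snomed_code_1", "SNOMED", "S"),
    ("icd10fi_code_1", "ICD10fi", None),
    ("icd10fi_code_2", "ICD10fi", None),
    ("atc_code_1", "ATC", "C"),
]


def split_by_standard_flag(concepts):
    """Group codes so that each concept set holds only one type of code."""
    groups = defaultdict(list)
    for code, _vocabulary, flag in concepts:
        if flag == "S":
            key = "standard"
        elif flag == "C":
            key = "classification"
        else:
            key = "non-standard"
        groups[key].append(code)
    return dict(groups)


concept_sets = split_by_standard_flag(search_results)
for kind, codes in concept_sets.items():
    print(kind, codes)
```

Each resulting group would become its own concept set; the non-standard group is the one that has to be added as a source concept in ‘Cohort Definitions’.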
For standard codes, explore the ‘Hierarchy’ to see the ‘Parents’ and ‘Children’ (i.e. ‘Descendants’) of the code and to help you decide which code to select. For non-standard codes this is not available.
Columns RC, DRC, PC, DPC refer to record count, descendant record count, person count, and descendant person count, respectively. A single concept with hierarchy, e.g. an ICD-10 code A10, will have descendant records and persons for the sub codes A10.0, A10.1 and A10.5, for instance. Record and person counts of all these subcodes are included in DRC and DPC. It is possible to have 0 records/persons for the main code, but descendant records/persons for the subcodes. It is good practice to sort by RC to see in which codes there are records in FinnGen data.
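The four count columns can be illustrated with a small Python sketch on made-up records. Two assumptions are made here and are not confirmed by this tutorial: the descendant counts cover the main code together with its sub codes, and a person with several records is counted once in the person counts. The person identifiers and the `counts` helper are hypothetical.

```python
# Made-up records: (person_id, code). Not real FinnGen data.
records = [
    (1, "A10.0"),
    (1, "A10.1"),
    (2, "A10.1"),
    (3, "A10.5"),
]

# Hypothetical hierarchy: main code -> sub codes.
SUBCODES = {"A10": ["A10.0", "A10.1", "A10.5"]}


def counts(code):
    """Return RC, PC, DRC and DPC for one code."""
    own = [(person, c) for person, c in records if c == code]
    # Assumption: descendant counts cover the code itself plus its sub codes.
    family = {code, *SUBCODES.get(code, [])}
    with_descendants = [(person, c) for person, c in records if c in family]
    return {
        "RC": len(own),
        "PC": len({person for person, _ in own}),
        "DRC": len(with_descendants),
        # Persons are de-duplicated, so DPC is not the sum of the sub code PCs.
        "DPC": len({person for person, _ in with_descendants}),
    }


print(counts("A10"))    # {'RC': 0, 'PC': 0, 'DRC': 4, 'DPC': 3}
print(counts("A10.1"))  # {'RC': 2, 'PC': 2, 'DRC': 2, 'DPC': 2}
```

The main code has no records of its own in this example, yet its descendant counts are non-zero, which is the situation described above.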
Use the ‘Concept Sets’ in ‘Cohort Definitions’ to define the ‘Cohort Entry Events’, ‘Inclusion Criteria’ and ‘Cohort Exit’. You will need to build separate cohorts for cases and controls.
In the ‘Define’ tab give a name to your cohort and add definitions for the ‘Cohort Entry Events’, ‘Inclusion Criteria’ and ‘Cohort Exit’.
‘Cohort Entry Events’ defines the starting point for the cohort
For case cohort: can be anything from the Atlas dropdown menu ‘Add initial event’, e.g. first diagnosis (‘Add Condition Occurrence’), drug purchase (‘Add Drug Exposure’), etc. The entry to the cohort should be clearly defined, avoiding entries such as ‘Any Visit Occurrence’ without a specification.
For control cohort: usually ‘Any Visit Occurrence’, meaning entry to any of the registers, since the controls are a group of people without the condition.
Concept sets based on non-standard codes need to be imported as source concepts: click the ‘Add attribute’ and use the relevant ‘Source Concept Criteria’.
By default, the codes are searched from all the available FinnGen registers in Atlas. If you want to filter by a specific register, you can use the readily made concept sets for different registers (search for ‘FinnGen support concept set’) and filter for them in the ‘Cohort Entry Events’ (see Examples).
‘Inclusion Criteria’ defines the inclusion to the cohort more specifically, e.g. by number of drug purchases, etc.
To create a cohort based on multiple concepts, e.g. conditions and drugs, use the dropdown menu in the ‘Inclusion Criteria’ box, above all the criteria you have added, to choose whether inclusion is based on all, any, at least or at most of the criteria.
Concept sets based on non-standard codes need to be imported as source concepts: click the ‘Add attribute’ and use the relevant ‘Source Concept Criteria’.
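The all / any / at least / at most choice for combining inclusion criteria can be read as a simple rule over per-criterion results. The sketch below is an illustration in Python under that reading; the function name `group_matches` and the mode labels are hypothetical and do not correspond to Atlas internals.

```python
def group_matches(results, mode, n=None):
    """Combine per-criterion results (True/False) for one person.

    mode is one of "all", "any", "at_least", "at_most";
    n is the threshold used by "at_least" and "at_most".
    """
    hits = sum(1 for met in results if met)
    if mode == "all":
        return hits == len(results)
    if mode == "any":
        return hits >= 1
    if mode in ("at_least", "at_most"):
        if n is None:
            raise ValueError("n is required for at_least / at_most")
        return hits >= n if mode == "at_least" else hits <= n
    raise ValueError(f"unknown mode: {mode}")


# One person meets the condition criterion but not the drug criterion.
person_results = [True, False]
print(group_matches(person_results, "all"))          # False
print(group_matches(person_results, "any"))          # True
print(group_matches(person_results, "at_least", 2))  # False
print(group_matches(person_results, "at_most", 1))   # True
```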
‘Cohort Exit’ defines when a person exits a cohort
Usually the default given by Atlas is sufficient
Modify this if you want to create a cohort where a person can enter more than once, e.g. with multiple fractures.
Creating a control cohort:
Copy the case cohort
Adjust the ‘Cohort Entry Events’ to ‘Any Visit Occurrence’ if appropriate
In the ‘Inclusion Criteria’, adjust any condition/drug purchase to exactly 0 occurrences or delete completely
Add any new inclusion criteria, e.g. the controls may need to be free of some other conditions.
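The control cohort steps above can be summarised as a transformation of the case definition. The Python sketch below uses a simplified, hypothetical data model (`Cohort`, `Criterion`, `make_control`); it is not the Atlas cohort format, and the concept set names are placeholders.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class Criterion:
    concept_set: str
    occurrence_type: str  # e.g. "at least", "exactly", "at most"
    count: int


@dataclass
class Cohort:
    name: str
    entry_event: str
    inclusion: list = field(default_factory=list)


def make_control(case, extra_exclusions=()):
    """Derive a control cohort from a copy of the case cohort."""
    control = deepcopy(case)  # step 1: copy the case cohort
    control.name = f"{case.name} controls"
    control.entry_event = "Any Visit Occurrence"  # step 2: adjust the entry event
    for criterion in control.inclusion:
        # step 3: require exactly 0 occurrences of each case criterion
        criterion.occurrence_type, criterion.count = "exactly", 0
    for concept_set in extra_exclusions:
        # step 4: controls may need to be free of other conditions too
        control.inclusion.append(Criterion(concept_set, "exactly", 0))
    return control


case = Cohort(
    name="Disease X",
    entry_event="Condition Occurrence: disease X concept set",
    inclusion=[Criterion("drug Y concept set", "at least", 2)],
)
control = make_control(case, extra_exclusions=["disease Z concept set"])
print(control)
```

Because the control is built from a deep copy, the case definition is left unchanged.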
Creating a cohort by importing JSON code, e.g. from the OHDSI PhenotypeLibrary:
Use the ‘Export’ tab and select the ‘JSON’ button
Paste the JSON code from Sandbox Clipboard – if needed, in small chunks
Click the ‘Reload’ button at the bottom of the screen. The cohort definitions should have appeared in the ‘Define’ tab
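If the JSON has to be pasted in small chunks, it can help to split it beforehand and confirm that the pieces join back into valid JSON. The Python sketch below shows one way to do this outside the Sandbox. The chunk size of 40 characters is an arbitrary assumption (the actual clipboard limit is not stated here), the helper name is hypothetical, and the placeholder object is not a real cohort definition.

```python
import json


def chunk_text(text, size):
    """Split text into consecutive pieces of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


# Placeholder JSON, standing in for a cohort definition copied from elsewhere.
definition = json.dumps({"name": "placeholder", "items": list(range(20))})

chunks = chunk_text(definition, 40)  # 40 is an arbitrary example size
reassembled = "".join(chunks)

# Parsing raises json.JSONDecodeError if a piece was lost or reordered.
parsed = json.loads(reassembled)
print(len(chunks), "chunks,", "valid JSON with name:", parsed["name"])
```

Pasting the chunks in order, without adding spaces or line breaks between them, should reproduce the original text.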
Final step: go to the ‘Generate’ tab and choose the FinnGen data release in which you would like to generate the cohort.
View the report for the number of individuals included in the cohort. This is an essential step because without successful cohort generation the cohort cannot be found and applied to further analyses.
Using existing FinnGen endpoints: use the Cohort Operations tool in Sandbox, where endpoint cohorts can be imported directly from the ‘Endpoint’ tab
Inspect the cohorts by using the Atlas function ‘Characterizations’ and/or the separate Cohort Operations tool and/or other tools in Sandbox.
In the Atlas ‘Characterizations’, import the case and control cohorts using the ‘Design’ tab, then choose the features that you want to characterize in each cohort, for example age and gender.
In the ‘Executions’ tab, generate the report in your preferred FinnGen data release and view the report directly there.
Standard concept: standard (international)
SNOMED, LOINC, RxNorm
Standard concept: non-standard (local)
ICD8fi, ICD9fi, ICD10fi, ICD10, ICPC, NCSPfi, VNRfi
Standard concept: classification
ATC
Concept Set
A set of codes based on a diagnosis, drug purchase, drug reimbursement, etc. Each set is based either on standard or non-standard codes but not on both. E.g. a concept set 1 on disease X based on ICD codes or a concept set 2 on disease X based on SNOMED codes. The concept sets will be used in the ‘Cohort Definitions’ to define ‘Cohort Entry Events’, ‘Inclusion Criteria’ and ‘Cohort Exit’.
Concept Set: Descendants
Descendants are the sub codes of ICD or ATC codes, e.g. A10.1.
Concept Set: RC, DRC, PC, DPC
Record count (RC) and person count (PC) refer to the counts for main codes, e.g. for ICD-10 code A10, whereas descendant record count (DRC) and descendant person count (DPC) refer to the counts for the sub codes, e.g. ICD-10 code A10.1.