How to check case counts from the data
Last updated
There are several different ways to search for the number of individuals with a certain endpoint, or a certain FinnGen health register code. Below are a few examples of how to do such searches using FinnGen data.
If you would like to know the number of individuals with a certain endpoint, the easiest way is to use the Risteys browser. Please take a look at the topic How to use Risteys as an endpoint browser for instructions on using Risteys to find the number of individuals with a certain endpoint, as well as more in-depth endpoint statistics.
You can also check the number of individuals with a certain endpoint using R in the Sandbox. The example below counts individuals with the endpoint U22_COVID19_CONFIRMED:
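library(data.table); library(dplyr)
# Read the endpoint file as a plain data frame (one column per endpoint)
end<-fread("finngen_R8_endpoint.txt.gz",
data.table=FALSE)
# Keep only the ID column and the endpoint of interest
end1<-end[,c("FINNGENID", "U22_COVID19_CONFIRMED")]
# Cases are coded 1; dim() shows how many rows matched
covid_conf<-filter(end1, U22_COVID19_CONFIRMED==1);dim(covid_conf)
# Number of distinct individuals with the endpoint
length(unique(covid_conf$FINNGENID))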
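If you need counts for several endpoints at once, the same logic can be wrapped in a small function. The sketch below uses base R only and a made-up toy data frame in place of the real endpoint file; the column EXAMPLE_ENDPOINT and the helper count_cases() are hypothetical names chosen for illustration. It assumes, as in the example above, that cases are coded 1 in each endpoint column.

```r
# Toy stand-in for the endpoint file (made-up IDs and values):
# 1 = case, 0 = control, NA = excluded. EXAMPLE_ENDPOINT is hypothetical.
end1 <- data.frame(
  FINNGENID = c("FG1", "FG2", "FG3", "FG4"),
  U22_COVID19_CONFIRMED = c(1, 0, 1, NA),
  EXAMPLE_ENDPOINT = c(0, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of distinct individuals coded 1 in each
# endpoint column. `%in% 1` treats NA as "not a case" instead of returning NA.
count_cases <- function(df, endpoints) {
  sapply(endpoints, function(ep) length(unique(df$FINNGENID[df[[ep]] %in% 1])))
}

count_cases(end1, c("U22_COVID19_CONFIRMED", "EXAMPLE_ENDPOINT"))
```

In the Sandbox, the same function could be applied to the data frame read with fread(..., data.table=FALSE), for example count_cases(end, c("U22_COVID19_CONFIRMED")).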
Remember that the locations of the endpoint and control files, including the endpoint definitions and the endpoint short and long names, are listed in the Location of FinnGen Endpoint and Control Description Files section. The Interpretation of endpoint definition file section explains how to read the endpoint definition file.
If you would like to know how many individuals in the FinnGen data have a certain health register code in the detailed longitudinal data, you can also use the Atlas tool in the Sandbox.
Before you start any search, remember to check that you are searching for the correct codes. For example, Finnish ICD10 codes differ somewhat from the international ICD10CM codes. See the topic location of translation file for Finnish register codes to find out where the translations of the Finnish register codes are stored. In these files you can search for the condition you are interested in and the codes related to it. (NB: selecting codes should always be done with the help of medical professionals who understand how the codes are used and how the underlying register data affects their usage.)
As an example, we use ICD10 code L20, atopic dermatitis, and ask how many individuals in the FinnGen data have been diagnosed with an L20 ICD10 code.
First, launch Atlas by going to the Applications menu on the Sandbox and selecting Finngen>Atlas.
Once Atlas opens, type L20 in the Search menu on the left to see all codes containing L20. The "RC" (record count) column shows how many times each code has been recorded. Because the same individual can have the code more than once, this is not the same as the number of individuals. Typing "atopic" in the Search menu shows other atopic-related codes, such as the ICPC2 code S87. (ICPC2 codes are assigned in primary care; you can read more about them in Finnish Health Registries and Medical Coding.)
See the topic How to define a cohort in Atlas for more detailed instructions. By default, the search uses all registers included in the detailed longitudinal data. If only certain registers are of interest, you can filter the results by register as described in the topic filtering by clinical registries in Atlas.
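The difference between a record count and a count of individuals can be illustrated with a few made-up records. This is a minimal base R sketch on toy data, not Atlas itself; the record count here is only analogous to the Atlas "RC" column.

```r
# Toy longitudinal records (made-up IDs and codes): FG1 has two L20 records.
records <- data.frame(
  FINNGENID = c("FG1", "FG1", "FG2", "FG3"),
  CODE1     = c("L209", "L200", "L209", "J459"),
  stringsAsFactors = FALSE
)

# TRUE for every record whose code begins with L20
is_l20 <- startsWith(records$CODE1, "L20")

# Every matching row counts: analogous to a record count
record_count <- sum(is_l20)
# Each person counts once, however many matching records they have
individual_count <- length(unique(records$FINNGENID[is_l20]))

cat("records:", record_count, "individuals:", individual_count, "\n")
```

Here three records match, but they belong to only two individuals.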
Currently, Atlas contains only the detailed longitudinal data file. Some specialty registers, such as the kidney, vaccination, and birth registers, are not yet included in the detailed longitudinal data file. If you are interested in counts from these other FinnGen registers (e.g. how many individuals have a certain health register code in the vaccination register), or you would like to search codes in the detailed longitudinal data using R, take a look at the instructions below.
If you would like to know how many individuals have certain health register code(s) in other FinnGen registers, or in the detailed longitudinal data, and you are comfortable using R, you can also do the search in the Sandbox using R.
Let's use the same example as above: how many individuals have atopic dermatitis, as determined by ICD10 code L20? In this example, we search only the inpatient Hilmo registry (INPAT) and the specialist outpatient Hilmo registry (OUTPAT), both of which are included in the detailed longitudinal data.
The first step is to double-check that you are searching for the correct codes (as above); see the location of the translation file for register codes.
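Once you have located a translation file, a quick search in R can help you list candidate codes before reviewing them with a medical expert. The sketch below uses a made-up toy table in base R; the column names CODE and DESCRIPTION are hypothetical, and the real file's header and code format may differ.

```r
# Toy stand-in for a register-code translation table. The column names
# CODE and DESCRIPTION are hypothetical; check the header of the real file.
codes <- data.frame(
  CODE        = c("L20", "L200", "L208", "L209", "J45"),
  DESCRIPTION = c("Atopic dermatitis", "Besnier's prurigo",
                  "Other atopic dermatitis", "Atopic dermatitis, unspecified",
                  "Asthma"),
  stringsAsFactors = FALSE
)

# Free-text search of the descriptions, ignoring case
by_text <- codes[grepl("atopic", codes$DESCRIPTION, ignore.case = TRUE), ]

# All codes that begin with L20 ("^" anchors the match at the start of the code)
by_prefix <- codes$CODE[grepl("^L20", codes$CODE)]

print(by_text)
print(by_prefix)
```

In this toy table the text search misses L200, whose description does not contain the word "atopic", while the prefix search picks it up; comparing both results is a simple way to spot codes worth discussing.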
When you are sure about the codes you are interested in, you can do the search as below.
Here we search for ICD10 code L20 in the inpatient Hilmo registry (INPAT) and the specialist outpatient Hilmo registry (OUTPAT). We search for all ICD10 codes that begin with L20 (the ^ in front of the code means the code has to begin there), because all of them are related to atopic dermatitis, and we use both the symptom (CODE1) and cause (CODE2) codes in the search.
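library(data.table); library(dplyr); library(stringr)
# Read the detailed longitudinal data as a plain data frame
foo<-fread("finngen_R8_detailed_longitudinal_data.txt",
data.table=FALSE)
# Keep only inpatient and specialist outpatient Hilmo records
foo1<-filter(foo, SOURCE=="INPAT" | SOURCE=="OUTPAT")
# ICD-10 records whose CODE1 or CODE2 begins with L20
foo2<-filter(foo1, ICDVER=="10" & (str_detect(CODE1,"^L20") | str_detect(CODE2,"^L20")))
# Number of distinct individuals, not number of records
length(unique(foo2$FINNGENID))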
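If you run this kind of search for several code prefixes or register combinations, it can be convenient to wrap it in a function. The sketch below is a base R version of the same filter, applied to a small made-up table with the column names used above (FINNGENID, SOURCE, ICDVER, CODE1, CODE2); the helper count_icd10_individuals() is a hypothetical name, not part of any FinnGen package.

```r
# Toy detailed longitudinal data (made-up rows). CODE2 is often empty (NA).
dld <- data.frame(
  FINNGENID = c("FG1", "FG1", "FG2", "FG3", "FG4", "FG5"),
  SOURCE    = c("INPAT", "OUTPAT", "OUTPAT", "PRIM_OUT", "INPAT", "INPAT"),
  ICDVER    = c("10", "10", "10", "10", "9", "10"),
  CODE1     = c("L209", "L200", "J459", "L209", "6918", "K359"),
  CODE2     = c(NA, NA, "L209", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: distinct individuals with an ICD-10 code beginning
# with `prefix` in CODE1 or CODE2, restricted to the given SOURCE values.
count_icd10_individuals <- function(df, sources, prefix) {
  pattern <- paste0("^", prefix)
  hit <- df$SOURCE %in% sources &
    df$ICDVER %in% "10" &                                  # NA-safe version check
    (grepl(pattern, df$CODE1) | grepl(pattern, df$CODE2))  # grepl gives FALSE for NA
  length(unique(df$FINNGENID[hit]))
}

count_icd10_individuals(dld, c("INPAT", "OUTPAT"), "L20")
```

Using %in% and grepl() keeps missing values from turning into NA matches, which would otherwise inflate the count of unique IDs.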
In the same way, we can search the vaccination register, for example for the ATC codes of bacterial vaccines (codes beginning with J07A):
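library(data.table); library(dplyr); library(stringr)
# Read the vaccination register as a plain data frame
foo<-fread("finngen_R8_vaccination_register.txt",
data.table=FALSE)
# Records whose ATC code (DRUG column) begins with J07A
foo2<-filter(foo, str_detect(DRUG,"^J07A"))
# Number of distinct individuals with at least one such record
length(unique(foo2$FINNGENID))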
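If you also want to see how the total breaks down by individual ATC code, you can count distinct individuals per code. The sketch below uses base R and a made-up toy table with the FINNGENID and DRUG columns used above; the rows are illustrative only.

```r
# Toy vaccination records (made-up rows)
vacc <- data.frame(
  FINNGENID = c("FG1", "FG1", "FG2", "FG3", "FG3"),
  DRUG      = c("J07AL02", "J07AL02", "J07AJ52", "J07BB02", "J07AL02"),
  stringsAsFactors = FALSE
)

# Keep records whose ATC code begins with J07A
hits <- vacc[grepl("^J07A", vacc$DRUG), ]

# Distinct individuals per ATC code
per_code <- tapply(hits$FINNGENID, hits$DRUG, function(ids) length(unique(ids)))

# Distinct individuals with any J07A code
total <- length(unique(hits$FINNGENID))

print(per_code)
print(total)
```

Note that the per-code counts do not have to add up to the total, since one individual can have records with several different codes.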
Finally, if you don't have Sandbox access and would like to do a lookup like the ones above, you can send a lookup request to finngen-lookups@helsinki.fi, and we will take care of it!