How to make your summary stats viewable in a PheWeb-style browser?
Following these instructions, you can set up a PheWeb browser for summary statistics at https://userresults.finngen.fi/.
Two files are needed for each result: "metadata.json" and a summary statistics file. You can either prepare the files outside the sandbox and upload them to the green bucket, or prepare them within the sandbox and request a file download.
Data is imported using an automatic pipeline, so the data must follow the input standards exactly. Preparing the files is the responsibility of the requester.
1. Prepare summary statistics file
Summary statistics should be in a single tab-delimited file with 11 columns:
column number | column name | allowed values |
1 | #chrom | 1-24, X, Y, M, MT |
2 | pos | integer |
3 | ref | must match reference genome |
4 | alt | anything |
5 | pval | number in [0,1] |
6 | mlogp | number (Inf not allowed) |
7 | beta | number |
8 | sebeta | number |
9 | af_alt | number in (0,1) |
10 | af_alt_cases | number in (0,1) |
11 | af_alt_controls | number in (0,1) |
If there are missing values in columns 7-11, set them to a numeric value, e.g. 0.5.
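The column rules above can be sketched as a small row checker. This is a minimal illustration, not the pipeline's own validator: the function names are hypothetical, the requirement that every numeric column be finite is an assumption (the table only rules out Inf for mlogp), and the "ref must match reference genome" rule is not checked because it needs the reference sequence.

```python
import math

EXPECTED_HEADER = ["#chrom", "pos", "ref", "alt", "pval", "mlogp", "beta",
                   "sebeta", "af_alt", "af_alt_cases", "af_alt_controls"]
ALLOWED_CHROMS = {str(n) for n in range(1, 25)} | {"X", "Y", "M", "MT"}


def check_header(line):
    """True if the header line has exactly the 11 expected column names."""
    return line.rstrip("\n").split("\t") == EXPECTED_HEADER


def check_row(fields):
    """Return a list of problems for one data row (empty list means OK)."""
    if len(fields) != len(EXPECTED_HEADER):
        return [f"expected {len(EXPECTED_HEADER)} columns, got {len(fields)}"]
    problems = []
    row = dict(zip(EXPECTED_HEADER, fields))
    if any(v == "" or v != v.strip() for v in fields):
        problems.append("empty field or extra whitespace")
    if row["#chrom"] not in ALLOWED_CHROMS:
        problems.append("#chrom not allowed")
    if not (row["pos"].isascii() and row["pos"].isdigit()):
        problems.append("pos is not an integer")
    for name in EXPECTED_HEADER[4:]:  # pval .. af_alt_controls
        try:
            value = float(row[name])
        except ValueError:
            problems.append(f"{name} is not a number")
            continue
        if not math.isfinite(value):  # assumption: NaN/Inf rejected everywhere
            problems.append(f"{name} is not finite")
        elif name == "pval" and not 0 <= value <= 1:
            problems.append("pval outside [0,1]")
        elif name.startswith("af_alt") and not 0 < value < 1:
            problems.append(f"{name} outside (0,1)")
    return problems
```

A row is passed as the list obtained by splitting a line on tabs, for example `check_row(line.rstrip("\n").split("\t"))`.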
Order summary statistics by #chrom and pos.
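One way to check or produce this ordering is sketched below. The exact chromosome order the pipeline expects is not stated here, so the ranking used (numeric order, with X, Y and M/MT mapped to 23, 24 and 25) is an assumption, and the helper names are hypothetical.

```python
# Assumed ranking: 1-24 numerically; X and Y treated as 23 and 24; M/MT last.
CHROM_RANK = {str(n): n for n in range(1, 25)}
CHROM_RANK.update({"X": 23, "Y": 24, "M": 25, "MT": 25})


def sort_key(fields):
    """Sort key for a row given as a list of column strings."""
    return (CHROM_RANK[fields[0]], int(fields[1]))


def is_sorted(rows):
    """True if rows are already ordered by #chrom and then pos."""
    keys = [sort_key(row) for row in rows]
    return all(a <= b for a, b in zip(keys, keys[1:]))
```

Calling `sorted(rows, key=sort_key)` returns the rows in that order; the header line should be kept aside and written first.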
Name the summary statistics file "[phenotype_name].gz", e.g. C3_COLORECTAL.gz, without any extra suffixes (such as C3_COLORECTAL.txt.gz). The name of the file should be identical to the attribute "name" in the metadata.json (except for the suffix .gz).
The metadata file must be named metadata.json; any other name will cause an error in the run. Example summary statistics file: gs://finngen-production-library-green/finngen_R6/sandbox_custom_gwas/C3_COLORECTAL/C3_COLORECTAL.gz
Additional format requirements:
No extra whitespace is allowed in columns.
Variable (column) names must be exactly as given in the instructions, with no typos.
The file must use UNIX line endings (LF); no other line break types are allowed. You can convert your file to UNIX format in a terminal with:
zcat "$infile" | dos2unix | gzip > "$outfile"
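If dos2unix is not available, the same conversion can be approximated in Python. The sketch below is an assumption-laden alternative, not part of the official pipeline: the function names are hypothetical, and unlike plain dos2unix it also converts bare CR (old Mac) line breaks.

```python
import gzip


def normalize_line(line: bytes) -> bytes:
    """Strip any trailing CR/LF and terminate the line with a single LF."""
    return line.rstrip(b"\r\n").replace(b"\r", b"\n") + b"\n"


def convert_gz(infile, outfile):
    """Rewrite a gzipped text file with UNIX (LF) line endings."""
    with gzip.open(infile, "rb") as src, gzip.open(outfile, "wb") as dst:
        for line in src:  # binary iteration splits on LF
            dst.write(normalize_line(line))
```

Write to a new file name and rename afterwards, since reading and writing the same path at once would corrupt the input.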
2. Prepare metadata.json file
The metadata is used to build the entry in PheWeb. See examples in PheWeb: https://userresults.finngen.fi/
The JSON should contain the following metadata:
{
  "admin_email": ["finngen-servicedesk@helsinki.fi"],
  "analysis_type": "additive",
  "category": "custom",
  "description": "R6 Colorectal cancer",
  "freeze": 6,
  "name": "C3_COLORECTAL",
  "num_cases": 4957,
  "num_controls": 304197,
  "output_bucket": "gs://finngen-production-library-green/finngen_R6/sandbox_custom_gwas/C3_COLORECTAL",
  "pheno_coding": "binary",
  "submitter": ["firstname.lastname@finngen.fi"],
  "submitter_email": ["first.last@domain.com"],
  "title": "C3_COLORECTAL"
}
Note: "name" must match the summary statistics file name: C3_COLORECTAL.gz -> "name": "C3_COLORECTAL"
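The naming rule can be checked mechanically before upload. The following is a minimal sketch; `name_matches` is a hypothetical helper, and it only compares the file name with the "name" attribute without validating the other metadata fields.

```python
import json
from pathlib import PurePosixPath


def name_matches(sumstats_path, metadata_text):
    """True if the file is called <name>.gz for the "name" in the metadata."""
    metadata = json.loads(metadata_text)
    return PurePosixPath(sumstats_path).name == metadata["name"] + ".gz"
```

For example, `name_matches("C3_COLORECTAL.gz", '{"name": "C3_COLORECTAL"}')` is true, while a file called C3_COLORECTAL.txt.gz fails the check.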
Example: gs://finngen-production-library-green/finngen_R6/sandbox_custom_gwas/C3_COLORECTAL/metadata.json
Please change only the fields: “description”, “freeze”, “name”, “num_cases”, “num_controls”, “submitter”, “submitter_email” and “title”. The output bucket path is dynamic and will change with new FinnGen releases. Data managers will know where to copy your files so that the results appear in https://userresults.finngen.fi/.
Note that you can also submit data from any earlier freeze; it should work fine as long as the file formats are correct.
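One way to avoid touching the fixed fields by accident is to start from the example metadata and allow changes only to the editable ones. This is an illustrative sketch: `make_metadata` and the phenotype name MY_PHENO are hypothetical, and the template values are copied from the example above.

```python
import json

TEMPLATE = {
    "admin_email": ["finngen-servicedesk@helsinki.fi"],
    "analysis_type": "additive",
    "category": "custom",
    "description": "R6 Colorectal cancer",
    "freeze": 6,
    "name": "C3_COLORECTAL",
    "num_cases": 4957,
    "num_controls": 304197,
    "output_bucket": "gs://finngen-production-library-green/finngen_R6/"
                     "sandbox_custom_gwas/C3_COLORECTAL",
    "pheno_coding": "binary",
    "submitter": ["firstname.lastname@finngen.fi"],
    "submitter_email": ["first.last@domain.com"],
    "title": "C3_COLORECTAL",
}
EDITABLE = {"description", "freeze", "name", "num_cases", "num_controls",
            "submitter", "submitter_email", "title"}


def make_metadata(**changes):
    """Copy the template, applying changes to the editable fields only."""
    not_allowed = set(changes) - EDITABLE
    if not_allowed:
        raise ValueError(f"fields must not be changed: {sorted(not_allowed)}")
    metadata = dict(TEMPLATE)
    metadata.update(changes)
    return metadata


if __name__ == "__main__":
    # MY_PHENO and the counts are placeholder values for illustration.
    meta = make_metadata(name="MY_PHENO", title="MY_PHENO",
                         description="R6 my phenotype",
                         num_cases=100, num_controls=1000)
    with open("metadata.json", "w") as handle:
        json.dump(meta, handle, indent=2)
```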
Copy the files to a folder in the green bucket
Note: The green bucket is not writable from within the sandbox. You can either prepare the files outside the sandbox and upload them to the green bucket, or prepare them within the sandbox and request a file download (which requires a security check by data managers). The file pair should be copied to the same folder, and this folder should preferably have the same name as your summary stats file.
For instance, if you are working in FIMM Sandbox (number 6):
gs://fg-production-sandbox-6_greenuploads/C3_COLORECTAL/C3_COLORECTAL.gz
gs://fg-production-sandbox-6_greenuploads/C3_COLORECTAL/metadata.json
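The folder layout shown above can be expressed as a small path builder, which helps avoid typos when several phenotypes are uploaded. The helper name is hypothetical and the sketch only builds strings; it does not perform the upload, and the bucket URL for your own sandbox must be supplied by you.

```python
def upload_targets(bucket, phenotype):
    """Return the two destination paths for one phenotype's file pair."""
    folder = f"{bucket.rstrip('/')}/{phenotype}"
    return [f"{folder}/{phenotype}.gz", f"{folder}/metadata.json"]


if __name__ == "__main__":
    for path in upload_targets("gs://fg-production-sandbox-6_greenuploads",
                               "C3_COLORECTAL"):
        print(path)
```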
3. Validate input files
Use the PheWeb users input validator tool to validate your PheWeb input files.
4. Send an email to FinnGen servicedesk
Once the data is prepared as described above, please send a message to finngen-servicedesk@helsinki.fi specifying the location of the files. A data manager will copy the data to the library green bucket that is linked to the PheWeb browser, e.g. for C3_COLORECTAL: gs://finngen-production-library-green/finngen_R6/sandbox_custom_gwas/C3_COLORECTAL/
The results will be available the next day at https://userresults.finngen.fi/. Loading to PheWeb occurs once every 24 hours.