PheWeb Users Input Validator tool

The PheWeb Users Input Validator tool checks that user-prepared input files are correctly formatted, so that custom GWAS summary statistics can be viewed in PheWeb style. You need to provide two files: (1) a metadata file in JSON format and (2) a statistics file. Prepare both files according to the instructions.

The validator can scan the statistics file in one of two modes, deep or shallow, selected with the parameter "--deep true/false" (see the user manual below). In deep mode the whole stats file is scanned; in shallow mode about 80k lines are subsampled from the stats file and scanned. Both modes check for the same issues.
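To make the shallow mode more concrete, here is a minimal sketch of one common way to subsample a fixed number of lines from a file of unknown length in a single pass (reservoir sampling). This is an illustration only: the function name, the seed parameter, and the sampling technique are assumptions, and the validator may subsample differently.

```python
import random


def subsample_lines(lines, k=80_000, seed=0):
    """Return up to k lines chosen uniformly at random, in their original order.

    Illustrative reservoir sampling; not taken from the validator's source.
    """
    rng = random.Random(seed)
    sample = []  # holds (original_index, line) pairs
    for index, line in enumerate(lines):
        if index < k:
            # Fill the reservoir with the first k lines.
            sample.append((index, line))
        else:
            # Replace a random slot with probability k / (index + 1).
            slot = rng.randint(0, index)
            if slot < k:
                sample[slot] = (index, line)
    # Restore file order so order-dependent checks still make sense.
    sample.sort()
    return [line for _, line in sample]
```

Keeping the sampled lines in file order matters for checks such as the sorted-positions scan, which compares each line with the one before it.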

The PheWeb Users Input Validator tool performs the following scans.

Metadata file:

  1. Check for special characters in metadata.

  2. Check that metadata contains all required fields.

  3. Check that metadata fields have the correct format.

  4. Check that metadata field "name" matches the stats filename.
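The metadata checks above could be sketched roughly as follows. This is not the validator's code: the function name is hypothetical, the list of required fields is a placeholder (the real list is defined in the instructions), and comparing "name" with the stats filename minus its extension is an assumption.

```python
import json
from pathlib import Path

# Placeholder: the real required fields are listed in the instructions.
REQUIRED_FIELDS = ("name",)


def check_metadata(metadata_text, stats_path):
    """Return a list of problems found in the metadata (empty if none)."""
    try:
        metadata = json.loads(metadata_text)
    except json.JSONDecodeError as err:
        return [f"metadata is not valid JSON: {err}"]
    if not isinstance(metadata, dict):
        return ["metadata must be a JSON object"]

    problems = []
    for field in REQUIRED_FIELDS:
        if field not in metadata:
            problems.append(f"missing required field: {field}")

    # Assumption: "name" must equal the stats filename without extension(s).
    stem = Path(stats_path).name.split(".")[0]
    if "name" in metadata and metadata["name"] != stem:
        problems.append(
            f'field "name" ({metadata["name"]}) does not match stats file ({stem})'
        )
    return problems
```

For instance, metadata containing "name": "C3_COLORECTAL" would pass this check for a stats file called C3_COLORECTAL.gz.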

Statistics file:

  1. Check that file is compressed.

  2. Check that file is tab-delimited.

  3. Check that the column order is correct.

  4. Check that file doesn't contain special characters.

  5. Check that the chromosome column has correct formatting, i.e. only contains values 1-24, X, Y, M, MT.

  6. Check that columns 7-11 (beta, sebeta, af_alt, af_alt_cases, af_alt_controls) don't contain missing values.

  7. Check that columns 2-11 have correct formatting, i.e. according to the instructions.

  8. Check that the stats file doesn't contain unsorted positions.
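Several of the statistics-file scans can be illustrated with a short sketch. The code below is a simplified approximation under stated assumptions, not the validator's implementation: it assumes column 1 holds the chromosome and column 2 the position, treats an empty field or "NA" as missing, expects the header line to have been removed by the caller, and only checks that positions do not decrease within a run of the same chromosome. All function names are hypothetical.

```python
ALLOWED_CHROMS = {str(n) for n in range(1, 25)} | {"X", "Y", "M", "MT"}
MISSING = {"", "NA"}  # assumption: how a missing value is written


def is_gzip(path):
    """Check for the two gzip magic bytes at the start of the file."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"\x1f\x8b"


def scan_rows(lines, n_columns=11):
    """Return (line_number, message) pairs for the data rows in `lines`."""
    issues = []
    last_chrom, last_pos = None, None
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != n_columns:
            issues.append((number, "expected 11 tab-delimited columns"))
            continue
        chrom, pos = fields[0], fields[1]  # assumed column layout
        if chrom not in ALLOWED_CHROMS:
            issues.append((number, f"bad chromosome: {chrom}"))
        if any(value.strip() in MISSING for value in fields[6:11]):
            issues.append((number, "missing value in columns 7-11"))
        if not pos.isdigit():
            issues.append((number, f"bad position: {pos}"))
            continue
        if chrom == last_chrom and int(pos) < last_pos:
            issues.append((number, "unsorted position"))
        last_chrom, last_pos = chrom, int(pos)
    return issues
```

Returning line numbers together with messages mirrors the idea of the report and the file of lines with errors described later in this document.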

Automated fixing

In addition, you can enable automated fixing of issues detected in your files by setting the parameter "--fix true" (see the user manual below). When possible, the validator applies the following fixes:

  • Remove special characters from the metadata file.

  • Remove special characters from the stats file.

  • If the stats file is space/comma delimited, it will be fixed to be tab-delimited.

  • If the stats file contains missing values in columns 7-11, they will be substituted with value 0.5.

  • Remove the chromosome prefix if the chromosome column contains one, e.g. change "chr1" to "1".

  • Sort the stats file if unsorted positions are found.

  • Fix column order/number to contain 11 columns as described in the instructions.
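As a rough illustration of three of the fixes listed above (delimiter conversion, chromosome prefix removal, and filling missing values in columns 7-11 with 0.5), consider the sketch below. It is a guess at the general approach rather than the validator's code: the function name is hypothetical, the chromosome is assumed to be in column 1, and an empty field or "NA" is assumed to mark a missing value.

```python
MISSING = {"", "NA"}  # assumption: how a missing value is written


def fix_row(line):
    """Return the row as tab-delimited text with simple fixes applied."""
    text = line.rstrip("\n")
    # Pick the delimiter actually used in this row.
    if "\t" in text:
        fields = text.split("\t")
    elif "," in text:
        fields = text.split(",")
    else:
        fields = text.split()  # runs of spaces

    # Strip a leading "chr" from the chromosome (assumed to be column 1).
    if fields and fields[0].lower().startswith("chr"):
        fields[0] = fields[0][3:]

    # Substitute 0.5 for missing values in columns 7-11.
    for index in range(6, min(11, len(fields))):
        if fields[index].strip() in MISSING:
            fields[index] = "0.5"
    return "\t".join(fields)
```

Note that an empty field cannot be detected in a space-delimited row, because consecutive spaces collapse into one separator; in that case only an explicit marker such as "NA" can be replaced.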

Examples of what cannot be fixed by the PheWeb Users Input Validator tool:

  • Incorrect values in the columns, for instance, negative p-values.

  • Name of the stats file specified in the "name" field of the metadata.json file.

  • Some special characters cannot be recognized by the validator.

How to run PheWeb Users Input Validator tool

Step 1. Open the PheWeb user input validator tool from the Applications menu in the Sandbox.

The PheWeb Users Input Validator tool opens in a Terminal root window.

Step 2. Read the PheWeb validator help page, either by typing validator.py -h in Terminal root or in the FinnGen GitHub repository.

Step 3. Copy the STATS and METADATA files to your home directory /home/ivm/ if they are not already there. You can copy the files by copy-pasting in File Manager, which opens from the Sandbox Applications menu, or in Terminal ivm by typing

cp /path/to/source_folder/stats_file /home/ivm/

(Note that the cp command does not work within Terminal root / the PheWeb Users Input Validator tool.)

Step 4. Start the validation script in Terminal root by typing

validator.py -s /home/STATS_file -m /home/metadata_file -o /home/ --fix False --deep False

For example

validator.py -s /home/C3_COLORECTAL.gz -m /home/metadata.json -o /home/ --fix False --deep False

The above example produces a report stating that the formats of the input files are correct. The path to the report, which is saved to the home directory, is given at the end of the report.

The `report.txt` file created by the validator contains the following:

Output files generated by the PheWeb Users Input Validator tool:

  1. Full report on the results of validator scanning will be saved in file <DIR_OUT>/scan<SCAN_TIMESTAMP>/report.txt.

  2. (Optional) If the fixing mode is activated and some issues were fixed by the validator, a new stats file is written to <DIR_OUT>/scan<SCAN_TIMESTAMP>/<STATS_FILENAME>.

  3. (Optional) Lines from the stats file in which the validator detected issues are saved in the file <DIR_OUT>/scan<SCAN_TIMESTAMP>/<STATS_FILENAME>_lines_with_errors.
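Because each run writes into its own scan<SCAN_TIMESTAMP> directory, it can be handy to locate the most recent report programmatically. The helper below is a small sketch with a hypothetical function name; since the exact timestamp format is not described here, it picks the newest report by file modification time instead of parsing the directory name.

```python
from pathlib import Path


def latest_report(dir_out):
    """Return the path of the newest scan*/report.txt under dir_out, or None."""
    reports = list(Path(dir_out).glob("scan*/report.txt"))
    if not reports:
        return None
    # Newest by modification time; the directory name is not parsed.
    return max(reports, key=lambda path: path.stat().st_mtime)
```

Calling latest_report("/home/") after the example run above would return the report.txt path inside the newest scan directory, if one exists.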
