DOWNLOAD
Files
- ldpac.tar.gz : ldpac software package (64 bit UNIX)
- hapmap3_r2_b36.tar.gz : required HapMap data (600Mb, compressed).
- drawing_scripts.tar.gz : python scripts for drawing plots
Version
(2011-08-29) v1.0.0 released
USAGE
Installation
Unpack the downloaded file into a directory, using the command
tar xzf ldpac.tar.gz
The contents include
- ldpac.pyc : compiled python file (under python version 2.4.3)
- lib : directory including 3 necessary files (namely ldpac, glexport, and glfilter)
Unpack the HapMap data in the same way, using the command
tar xzf hapmap3_r2_b36.tar.gz
A directory named hapmap3_r2_b36 will be created.
File Preparation
Put the rsID and p-value of each SNP in the input file, as in the example file. Include the neighboring SNPs in each region, not only the significant SNPs, because our method uses information from SNPs in LD. The simplest and safest approach is to put every SNP in the genome in the file. SNPs with p-values "NA" or "N/A" will be excluded automatically.
Run
Go to the directory where ldpac.pyc is located. An example running command is
python ldpac.pyc -p ceu -f inputfile -h hapmap_directory -o output_directory
Python (version 2.4.3 or above) must be installed. The required and optional parameters are as follows.
Required options: | |
-p,--population [pop] | (Required) Population name. [pop] must be one of ( asw ceu chd gih jpt+chb mex mkk lwk tsi yri ) |
-f,--input_file [file] | (Required) Input file |
-h,--hapmap_data_dir [dir] | (Required) Directory where HapMap data is stored. |
-o,--output_dir [dir] | (Required) Output directory where results will be stored. |
Advanced options: | |
-t,--pvalue_threshold [number] | P-value threshold. Only associations with p-values below this threshold (i.e. more significant than it) will be tested for spurious associations. (Default: 5E-7) |
-w,--window_size [number] | Window size in basepairs within which neighboring markers will be chosen. (Default: 500000) |
-r,--r2_threshold [number] | r-square threshold above which neighboring markers will be chosen. (Default: 0.1) |
-m,--max_num_proxy [number] | Maximum number of neighboring markers. The computation time grows exponentially with this number, so a large value is not recommended. (Default: 20) |
-n,--min_num_proxy [number] | Minimum number of neighboring markers. The test is not performed if this criterion is not met. (Default: 3) |
-d,--decision_threshold [number] | Log-likelihood ratio threshold used in prediction. (Default: 3.0) |
-l,--lib_dir [dir] | Directory where actual executable files are located. (Default: lib) |
-v,--verbose | Increment verbosity (Default: False) |
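The options above can be assembled programmatically when LDPAC has to be run on several input files. The following is only a sketch in Python 3, not part of the LDPAC package: the helper names `build_ldpac_command` and `run_ldpac` and the `python_exe` parameter are hypothetical, and only flags listed in the table are used. It assumes that `python_exe` points to an interpreter able to load ldpac.pyc.

```python
import subprocess

# Population names accepted by -p, copied from the option table.
POPULATIONS = ("asw", "ceu", "chd", "gih", "jpt+chb",
               "mex", "mkk", "lwk", "tsi", "yri")


def build_ldpac_command(population, input_file, hapmap_dir, output_dir,
                        pvalue_threshold=None, max_num_proxy=None,
                        python_exe="python"):
    """Return the argument list for one ldpac.pyc run (hypothetical helper)."""
    if population not in POPULATIONS:
        raise ValueError("unknown population: %r" % (population,))
    command = [python_exe, "ldpac.pyc",
               "-p", population,
               "-f", input_file,
               "-h", hapmap_dir,
               "-o", output_dir]
    # Advanced options are added only when given, so that the
    # documented defaults apply otherwise.
    if pvalue_threshold is not None:
        command += ["-t", str(pvalue_threshold)]
    if max_num_proxy is not None:
        command += ["-m", str(max_num_proxy)]
    return command


def run_ldpac(command, ldpac_dir):
    """Run the command from the directory where ldpac.pyc is located."""
    subprocess.check_call(command, cwd=ldpac_dir)
```

For example, `build_ldpac_command("ceu", "inputfile", "hapmap_directory", "output_directory")` reproduces the example command shown earlier as a list of arguments.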
Output
The output directory specified by the "-o" option will contain many output files. The most important one is association_prediction.txt, which looks like
rs10496113 2 64519474 4.68000e-07 T -5.090
rs1387005 3 48626895 2.49000e-07 U
rs11939325 4 82636167 2.51000e-07 T -3.488
rs11746274 5 54086366 7.38000e-09 U
rs13355280 5 94858271 3.79000e-08 T -5.681
rs17084927 5 94964949 9.30000e-08 T -5.955
rs6864080 5 136801905 4.01000e-08 U
1st column is rsID.
2nd column is chromosome.
3rd column is position.
4th column is input p-value.
5th column is prediction.
6th column is LDPAC likelihood ratio statistic (LR).
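The column layout above can be read with a few lines of Python 3. This is a sketch only: it assumes the columns are separated by whitespace, as they appear in the sample output, and the names `Prediction`, `parse_prediction_line`, and `read_predictions` are hypothetical.

```python
from collections import namedtuple

Prediction = namedtuple("Prediction",
                        "rsid chromosome position pvalue call lr")


def parse_prediction_line(line):
    """Parse one line of association_prediction.txt.

    ASSUMPTION: columns are whitespace-separated, as in the sample output.
    Lines with prediction "U" carry no sixth column, so lr is None for them.
    """
    fields = line.split()
    lr = float(fields[5]) if len(fields) > 5 else None
    # The chromosome is kept as text so that non-numeric names also work.
    return Prediction(fields[0], fields[1], int(fields[2]),
                      float(fields[3]), fields[4], lr)


def read_predictions(path):
    """Read every non-empty line of the file into a list of records."""
    with open(path) as handle:
        return [parse_prediction_line(line) for line in handle if line.strip()]
```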
Prediction can be T, S, A, or U.
- S : SPURIOUS: LR > +3.0
- T : TRUE: LR < -3.0
- A : AMBIGUOUS: -3.0 < LR < +3.0
- U : UNTESTED: Untested because the SNP has too few (< 3) neighboring markers (r2>0.1) in LD
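The decision rule in this list can be written as a small function. The sketch below (Python 3) is an illustration, not LDPAC's own code: the function name `classify` is hypothetical, the default of 3.0 mirrors the values in the list, and a statistic exactly equal to the threshold is treated as ambiguous because the list does not say how that boundary is handled.

```python
def classify(lr, threshold=3.0):
    """Map a likelihood ratio statistic (LR) to a prediction code.

    lr is None when the SNP had too few neighboring markers to be tested.
    ASSUMPTION: LR exactly equal to +threshold or -threshold counts as "A".
    """
    if lr is None:
        return "U"  # untested
    if lr > threshold:
        return "S"  # spurious
    if lr < -threshold:
        return "T"  # true
    return "A"      # ambiguous
```

With the sample output above, `classify(-5.090)` gives "T", which matches the prediction column for rs10496113.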
Other output files are as follows.
association_not_in_hapmap.txt includes SNPs that are not found in the HapMap3 data we have and therefore are not processed.
inputsnps_in_hapmap.txt is an intermediate file that is used by our program and contains the SNPs found in our HapMap data.
plotdata.txt (1) contains all the SNP data and predictions, sorted by position. This file is created to facilitate plot drawing.
rsXXXXXXX is a directory created for each SNP that is tested. Each directory contains the following.
- neighbors.rsid contains the rsIDs of the SNPs neighboring the tested SNP.
- logfile contains the correlation structure information between SNPs.
- plotdata.txt (2) is created to facilitate plot drawing. The last column is the r-square of each neighboring SNP to the tested SNP.
Plot Drawing
We provide simple python scripts that draw plots. These scripts are the ones used to draw plots in our web server. Although the plots are not aesthetically perfect, we provide the scripts so that one can freely modify and use them to create better plots. gnuplot must be installed (the scripts are based on gnuplot v4.3, so they may not work with other versions).
The usage is
python draw_manhattan.py plotdata.txt 1E-7 tmp_commands.txt plot.png
1st argument is the plotdata.txt (1)
2nd argument is the p-value threshold at which to draw the horizontal line
3rd argument is any temporary file that will contain the gnuplot commands
4th argument is the output PNG file.
To draw a plot for each SNP,
python draw_snpplot.py plotdata.txt tmp_commands.txt snp.png
1st argument is the plotdata.txt (2) created for each SNP
2nd argument is any temporary file that will contain the gnuplot commands
3rd argument is the output PNG file.
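When many SNPs were tested, the per-SNP command can be generated for every rsXXXXXXX directory at once. The following Python 3 sketch is an illustration under stated assumptions: the helper name `snpplot_commands` is hypothetical, and it assumes that each tested SNP has a directory starting with "rs" directly under the output directory and that this directory holds its own plotdata.txt.

```python
import glob
import os


def snpplot_commands(output_dir, script="draw_snpplot.py", python_exe="python"):
    """Build one draw_snpplot.py command per tested-SNP directory.

    ASSUMPTION: per-SNP directories are named rs... and sit directly under
    output_dir, each holding a plotdata.txt.
    """
    commands = []
    for snp_dir in sorted(glob.glob(os.path.join(output_dir, "rs*"))):
        plotdata = os.path.join(snp_dir, "plotdata.txt")
        if not os.path.isfile(plotdata):
            continue  # skip anything that is not a per-SNP directory
        commands.append([python_exe, script, plotdata,
                         os.path.join(snp_dir, "tmp_commands.txt"),
                         os.path.join(snp_dir, "snp.png")])
    return commands
```

Each returned list can be passed to `subprocess.check_call` from the directory that contains draw_snpplot.py; the temporary gnuplot command file and the PNG are written next to the SNP's own plotdata.txt.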