DOWNLOAD

Files

Version

(2011-08-29) v1.0.0 released


USAGE

Installation

Unpack the downloaded file into a directory, using the command

tar xzf ldpac.tar.gz

The contents include
  • ldpac.pyc : compiled Python file (compiled under Python version 2.4.3)
  • lib : directory including 3 necessary files (namely ldpac, glexport, and glfilter)
Unpack the downloaded HapMap data into a directory (e.g. the same directory), using the command

tar xzf hapmap3_r2_b36.tar.gz

The directory named hapmap3_r2_b36 will be created.
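
On a system without the tar command, the same unpacking step can be sketched with Python's standard tarfile module. This is only an illustrative equivalent of "tar xzf", written for a current Python 3 interpreter; the function name unpack_archive is hypothetical and is not part of the LDPAC distribution.

```python
import tarfile


def unpack_archive(archive_path, dest_dir="."):
    """Extract a gzip-compressed tar archive, like `tar xzf archive_path`.

    Hypothetical helper for illustration; not shipped with LDPAC.
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest_dir)
```

For example, unpack_archive("hapmap3_r2_b36.tar.gz") would extract the HapMap archive into the current directory.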

File Preparation

Put the rsIDs and p-values of the SNPs in an input file, following the format of the example file. Include the neighboring SNPs in each region, not only the significant SNPs, because our method uses information from SNPs in LD. The simplest and safest approach is to put every SNP in the genome in the file. SNPs with p-values of "NA" or "N/A" are excluded automatically.

Run

Go to the directory where ldpac.pyc is located. An example running command is

python ldpac.pyc -p ceu -f inputfile -h hapmap_directory -o output_directory

Python (version 2.4.3 or above) must be installed. The required and optional parameters are as follows.
Required options:
-p,--population [pop] (Required) Population name. [pop] must be one of ( asw ceu chd gih jpt+chb mex mkk lwk tsi yri )
-f,--input_file [file] (Required) Input file
-h,--hapmap_data_dir [dir] (Required) Directory where HapMap data is stored.
-o,--output_dir [dir] (Required) Output directory where results will be stored.
Advanced options:
-t,--pvalue_threshold [number] P-value threshold. Only associations more significant than this threshold (that is, with p-values below it) will be tested for spurious associations. (Default: 5E-7)
-w,--window_size [number] Window size in basepairs within which neighboring markers will be chosen. (Default: 500000)
-r,--r2_threshold [number] r-square threshold above which neighboring markers will be chosen. (Default: 0.1)
-m,--max_num_proxy [number] Maximum number of neighboring markers. The computation time is exponential in this number, so a large value is not recommended. (Default: 20)
-n,--min_num_proxy [number] Minimum number of neighboring markers. The test is not performed if this criterion is not met. (Default: 3)
-d,--decision_threshold [number] Log-likelihood ratio threshold used in prediction. (Default: 3.0)
-l,--lib_dir [dir] Directory where actual executable files are located. (Default: lib)
-v,--verbose Increment verbosity (Default: False)
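
When LDPAC is run repeatedly (for example, once per population), it can help to assemble the command line in a script. The following is a minimal sketch that uses only the flags listed above; the function build_ldpac_command and its keyword names are hypothetical and not part of LDPAC, and the sketch is written for a current Python 3 interpreter.

```python
# Population codes accepted by the -p option, as listed above.
POPULATIONS = ("asw", "ceu", "chd", "gih", "jpt+chb",
               "mex", "mkk", "lwk", "tsi", "yri")


def build_ldpac_command(population, input_file, hapmap_dir, output_dir,
                        pvalue_threshold=None, r2_threshold=None,
                        min_num_proxy=None, script="ldpac.pyc"):
    """Return the LDPAC command line as a list of arguments.

    Hypothetical helper for illustration. Optional values left as None
    are omitted, so LDPAC falls back to its own defaults.
    """
    if population not in POPULATIONS:
        raise ValueError("unknown population: %r" % (population,))
    command = ["python", script,
               "-p", population,
               "-f", input_file,
               "-h", hapmap_dir,
               "-o", output_dir]
    optional = (("-t", pvalue_threshold),
                ("-r", r2_threshold),
                ("-n", min_num_proxy))
    for flag, value in optional:
        if value is not None:
            command.extend([flag, str(value)])
    return command
```

The returned list can be passed to subprocess.call from the directory where ldpac.pyc is located; the "python" entry must resolve to an interpreter that can load the compiled file.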

Output

The output directory specified by the "-o" option will contain many output files. The most important one is association_prediction.txt, which looks like

rs10496113 2 64519474 4.68000e-07 T -5.090
rs1387005 3 48626895 2.49000e-07 U
rs11939325 4 82636167 2.51000e-07 T -3.488
rs11746274 5 54086366 7.38000e-09 U
rs13355280 5 94858271 3.79000e-08 T -5.681
rs17084927 5 94964949 9.30000e-08 T -5.955
rs6864080 5 136801905 4.01000e-08 U

1st column is rsID.
2nd column is chromosome.
3rd column is Position.
4th column is input p-value.
5th column is prediction.
6th column is LDPAC likelihood ratio statistic (LR).
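
The column layout above can be read with a few lines of Python. This is a sketch based only on the sample lines shown, assuming whitespace-separated columns and a missing 6th column when no LR is reported; the function names are hypothetical and not part of LDPAC.

```python
def parse_prediction_line(line):
    """Parse one line of association_prediction.txt into a dict.

    Assumes whitespace-separated columns as in the sample output;
    the LR column is absent when no statistic is reported.
    """
    fields = line.split()
    if len(fields) not in (5, 6):
        raise ValueError("unexpected number of columns: %r" % (line,))
    record = {
        "rsid": fields[0],
        "chromosome": fields[1],   # kept as text, not converted
        "position": int(fields[2]),
        "pvalue": float(fields[3]),
        "prediction": fields[4],
        "lr": None,
    }
    if len(fields) == 6:
        record["lr"] = float(fields[5])
    return record


def read_predictions(path):
    """Read every non-blank line of a prediction file."""
    with open(path) as handle:
        return [parse_prediction_line(line)
                for line in handle if line.strip()]
```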
The prediction is one of T, S, A, or U.
  • S : SPURIOUS: LR > +3.0
  • T : TRUE: LR < -3.0
  • A : AMBIGUOUS: -3.0 < LR < +3.0
  • U : UNTESTED: Untested because the SNP has too few (< 3) neighboring markers (r2>0.1) in LD
The default LR threshold (3.0) can be changed using the "-d" option. The default criterion for not testing a SNP can be changed using the "-r" and "-n" options.
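
The decision rule above can be written out as a short function. This is a sketch of the rule as described, not the LDPAC source; the function name and parameter names are hypothetical, and assigning an LR exactly equal to the threshold to A is an assumption, since the text does not specify the boundary case.

```python
def classify(lr, num_neighbors, decision_threshold=3.0, min_num_proxy=3):
    """Return T, S, A, or U following the rule described above.

    lr            : likelihood ratio statistic (ignored when untested)
    num_neighbors : number of neighboring markers in LD that passed
                    the r2 threshold
    """
    if num_neighbors < min_num_proxy:
        return "U"   # too few neighboring markers, not tested
    if lr > decision_threshold:
        return "S"   # spurious
    if lr < -decision_threshold:
        return "T"   # true
    return "A"       # ambiguous (boundary values land here by assumption)
```

With the defaults, an LR of -5.090 and at least 3 neighbors gives T, matching the first line of the sample output.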

Other output files are as follows.
association_not_in_hapmap.txt includes SNPs that are not found in the HapMap3 data we have and therefore are not processed.
inputsnps_in_hapmap.txt is an intermediate file that is used by our program and contains the SNPs found in our HapMap data.
plotdata.txt (1) contains all SNP data and predictions, sorted by position. This file is created to facilitate plot drawing.
rsXXXXXXX is a directory created for each SNP that is tested. Each directory contains the following.
	neighbors.rsid contains the rsIDs of the SNPs neighboring the tested SNP.
	logfile contains the correlation structure information between SNPs.
	plotdata.txt (2) is created to facilitate plot drawing. 
	    The last column is the r-square between each neighboring SNP and the tested SNP.
	

Plot Drawing

We provide simple Python scripts that draw plots. These are the scripts used to draw the plots on our web server. Although the plots are not aesthetically perfect, we provide the scripts so that one can freely modify and use them to create better plots. gnuplot must be installed (the scripts are based on gnuplot v4.3, so they may not work with other versions).
The usage is

python draw_manhattan.py plotdata.txt 1E-7 tmp_commands.txt plot.png

1st argument is the plotdata.txt (1)
2nd argument is p-value threshold to draw the horizontal line
3rd argument is any temporary file that will contain the gnuplot commands
4th argument is the output PNG file.
To draw a plot for each SNP,

python draw_snpplot.py plotdata.txt tmp_commands.txt snp.png

1st argument is the plotdata.txt (2) created for each SNP
2nd argument is any temporary file that will contain the gnuplot commands
3rd argument is the output PNG file.
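
Because each tested SNP has its own rsXXXXXXX directory, the per-SNP plotting command can be generated for the whole output directory at once. The sketch below only builds the command lines, following the argument order given above; the function name, and the choice to write tmp_commands.txt and snp.png inside each SNP directory, are assumptions made for illustration.

```python
import os


def snp_plot_commands(output_dir, script="draw_snpplot.py"):
    """Build one draw_snpplot.py command per tested-SNP directory.

    Hypothetical helper: scans output_dir for directories whose names
    start with "rs" and that contain a plotdata.txt file.
    """
    commands = []
    for name in sorted(os.listdir(output_dir)):
        snp_dir = os.path.join(output_dir, name)
        plotdata = os.path.join(snp_dir, "plotdata.txt")
        if name.startswith("rs") and os.path.isfile(plotdata):
            commands.append([
                "python", script,
                plotdata,                                   # 1st argument
                os.path.join(snp_dir, "tmp_commands.txt"),  # 2nd argument
                os.path.join(snp_dir, "snp.png"),           # 3rd argument
            ])
    return commands
```

Each returned list can then be run with subprocess.call, provided gnuplot is installed and the script path is correct.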