This R package implements the cross-population empirical Bayes method (XPEB) described in Coram et al., AJHG 2015. XPEB takes as input P-value summary statistics from two GWAS: a target-GWAS, for example from an ethnic minority population of primary interest, and an auxiliary base-GWAS, such as a larger GWAS in Europeans. It then reprioritizes SNPs in the target population and computes their local false discovery rates. XPEB also estimates the degree of overlap in the genetic architecture underlying the trait between the two populations.
R 3.1 or later is assumed to be installed on your machine. R is free software and can be downloaded from the Comprehensive R Archive Network (CRAN) for Linux, Mac OS X, and Windows machines.
There are currently two ways to install the XPEB package:
1. At the command prompt, using the local tar.gz
cd to the directory that contains the tar.gz source file
type at the command prompt:
> R CMD INSTALL XPEB_0.1.1.tar.gz
2. From the R console, using the local tar.gz
set the working directory to that of the tar.gz source file
type at the prompt in the R console
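The command for this second option is not shown above. One standard way to install a local source tarball from the R console is base R's install.packages() with repos = NULL. The sketch below assumes the tarball is named XPEB_0.1.1.tar.gz, as in option 1, and that it sits in the working directory; the helper name install_local_tarball is hypothetical.

```r
# Hypothetical helper: install a local source tarball with base R.
# Assumes the tarball name from option 1 above; adjust if yours differs.
install_local_tarball <- function(tarball = "XPEB_0.1.1.tar.gz") {
  if (!file.exists(tarball)) {
    stop("Tarball not found in ", getwd(), ": ", tarball)
  }
  # repos = NULL tells install.packages() to install from a local file
  install.packages(tarball, repos = NULL, type = "source")
}
```

After setting the working directory with setwd(), calling install_local_tarball() would run the installation.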
Input file requirements
Two whitespace-delimited text files with a header line are required: one with the P-values of the target-GWAS and one with the P-values of the base-GWAS. The target-GWAS file has four columns named SNP, CHR, BP, and P. The base-GWAS file has two columns named SNP and P. Each file lists the GWAS results one marker per line. CHR, BP, and P are numeric. Missing values are allowed in the P column and are coded as NA.
Example of a target-GWAS file:
SNP CHR BP P
rs1110052 1 863421 0.4288
rs3748595 1 877423 0.7718
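One way to check that a file matches this layout before running XPEB is to read it with base R's read.table() and inspect the columns. The sketch below is only an illustration: the function name check_target_file is hypothetical, and the toy file uses made-up markers rather than real GWAS results.

```r
# Hypothetical checker for a target-GWAS file (columns SNP, CHR, BP, P).
check_target_file <- function(path) {
  # header = TRUE reads the column names; the default sep = "" splits on
  # any whitespace; the string "NA" is read as a missing value by default.
  gwas <- read.table(path, header = TRUE, stringsAsFactors = FALSE)
  required <- c("SNP", "CHR", "BP", "P")
  missing.cols <- setdiff(required, names(gwas))
  if (length(missing.cols) > 0) {
    stop("Missing column(s): ", paste(missing.cols, collapse = ", "))
  }
  for (col in c("CHR", "BP", "P")) {
    if (!is.numeric(gwas[[col]])) stop("Column ", col, " is not numeric")
  }
  gwas
}

# Toy file with made-up markers, one of them with a missing P-value
toy.path <- tempfile(fileext = ".txt")
writeLines(c("SNP CHR BP P",
             "snpA 1 1000 0.42",
             "snpB 1 2000 NA",
             "snpC 2 1500 0.03"), toy.path)
toy <- check_target_file(toy.path)
```

The same idea applies to the base-GWAS file, with only SNP and P as required columns.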
#In the R console, load the XPEB package:
> library(XPEB)
#Unzip the example files from the package and retrieve their paths
> path.target <- system.file("extdata", "target.gwas.txt.zip", package="XPEB")
> unzip(path.target, exdir="TMPinput")
> path.target <- "TMPinput/target.gwas.txt"
> path.base <- system.file("extdata", "base.gwas.txt.zip", package="XPEB")
> unzip(path.base, exdir="TMPinput")
> path.base <- "TMPinput/base.gwas.txt"
#Run XPEB on the example files
> res <- run.xpeb(path.target=path.target, path.base=path.base, n.target=1e4, n.base=1e5)
#Save the locfdr calculation results to a text file
> write.table(res$locfdr, file="locfdrResults.txt", sep="\t", quote=FALSE, row.names=FALSE)
#For help on the run.xpeb() function:
> ?run.xpeb
#To run the example included in the package type:
> example(run.xpeb)
Options for run.xpeb()
path.target and path.base:
strings indicating the paths to the GWAS result files in the target and base populations.
gc.target and gc.base:
TRUE or FALSE, indicating whether a genomic control correction should be applied. The default is to apply the genomic control correction.
n.target and n.base:
median sample size in the target and base GWAS.
number of iterations for the MCMC. Default is the recommended 1e6.
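XPEB's internal genomic control code is not reproduced here. As a point of reference for the gc.target and gc.base options, the standard genomic control procedure converts P-values to 1-df chi-square statistics, estimates the inflation factor lambda as the median statistic divided by the median of the chi-square(1) distribution, and rescales the statistics by lambda. In the sketch below, the function name gc_correct is hypothetical, and flooring lambda at 1 is a common convention assumed here, not something stated for XPEB.

```r
# Hypothetical sketch of a standard genomic control correction.
gc_correct <- function(p) {
  # Convert P-values to 1-df chi-square statistics
  chisq <- qchisq(p, df = 1, lower.tail = FALSE)
  # Inflation factor: observed median over the theoretical median (~0.455)
  lambda <- median(chisq, na.rm = TRUE) / qchisq(0.5, df = 1)
  lambda <- max(lambda, 1)  # assumed convention: never deflate statistics
  # Rescale and convert back to P-values; NA entries stay NA
  p.adj <- pchisq(chisq / lambda, df = 1, lower.tail = FALSE)
  list(lambda = lambda, p = p.adj)
}
```

With lambda above 1, every corrected P-value is larger than the original one, which is the intended deflation of inflated test statistics.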
Output of run.xpeb()
res$kappa0 is the estimated kappa0.
res$kappa1 is the estimated kappa1, the degree of overlap between the two populations.
res$locfdr is a data frame with the XPEB locfdr calculation for the target population markers, limited to the markers with non-missing P-values in the target, and sorted by CHR and BP.
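The row selection and ordering described for res$locfdr can be reproduced on any data frame that has the target columns. The sketch below uses a toy data frame with made-up markers. It illustrates the filtering and sorting only and does not compute local false discovery rates.

```r
# Toy target data with made-up markers, deliberately out of order
toy <- data.frame(SNP = c("snpA", "snpB", "snpC", "snpD"),
                  CHR = c(2, 1, 1, 1),
                  BP  = c(1500, 2000, 1000, 3000),
                  P   = c(0.03, NA, 0.42, 0.10),
                  stringsAsFactors = FALSE)

# Keep markers with a non-missing P-value
kept <- toy[!is.na(toy$P), ]

# Sort by chromosome, then by base-pair position
kept <- kept[order(kept$CHR, kept$BP), ]
```

Here snpB is dropped because its P-value is missing, and the remaining markers end up ordered by chromosome and position.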
XPEB takes a single sample size for the target-GWAS and a single sample size for the base-GWAS. If, as is often the case, the sample size differs from marker to marker, we recommend using the median sample size across all markers. Ideally, the markers within each GWAS should have similar sample sizes, but variation within 10% of the median sample size is acceptable. Markers with very small or very large sample sizes should be removed before running XPEB. We plan to accommodate variable sample sizes in future XPEB versions.
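The recommendation above can be sketched in a few lines of base R. This assumes you have a per-marker sample size vector N from the original GWAS results; XPEB input files do not contain such a column, and the helper name trim_by_sample_size and the toy values are hypothetical.

```r
# Hypothetical helper: N is a per-marker sample size vector taken from
# the original GWAS results (not part of the XPEB input files).
trim_by_sample_size <- function(gwas, N, tol = 0.10) {
  n.med <- median(N, na.rm = TRUE)
  # Keep markers whose sample size is within tol (10%) of the median
  keep <- !is.na(N) & abs(N - n.med) <= tol * n.med
  list(gwas = gwas[keep, , drop = FALSE], n.median = n.med)
}

# Toy example with made-up markers and sample sizes
toy <- data.frame(SNP = c("snpA", "snpB", "snpC", "snpD", "snpE"),
                  P   = c(0.40, 0.20, 0.90, 0.01, 0.50),
                  stringsAsFactors = FALSE)
N <- c(10000, 9500, 10400, 4000, 10900)
trimmed <- trim_by_sample_size(toy, N)
```

The trimmed data frame could then be written out with write.table() as an input file, and trimmed$n.median passed as n.target or n.base.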
Coram MA, Candille SI, Duan Q, Chan KHK, Li Y, Kooperberg C, Reiner AP, Tang H. (2015) Leveraging Multi-ethnic Evidence for Mapping Complex Traits in Minority Populations: An Empirical Bayes Approach. Am J Hum Genet 96:740-52.
Fixed an issue with the handling of missing values and with the overlap output.
Thank you for using XPEB! Please email us if you have any problems running the program, or if you have ideas to make the program more user-friendly. We'd love to hear from you!