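GWAS analysis
- PLINK
- FaST-LMM
- TASSEL (a Windows version is available)
- FaST-LMM-Select
- GAPIT (R-based)
Comparison of GAPIT, TASSEL, and other tools.
Running TASSEL5 in command-line mode
1. A typical MLM (mixed linear model) analysis pipeline command: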
perl run_pipeline.pl
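-fork1 -h genotype.hmp -filterAlign -filterAlignMinFreq 0.05 Note: import the genotype data and filter sites at a minimum allele frequency of 0.05
-fork2 -r trait.txt Note: import the phenotype data
-fork3 -r pop_structure.txt -excludeLastTrait Note: import the population-structure data
-fork4 -k kinship.txt -combine5 -input1 -input2 -input3 -intersect -combine6 -input5 -input4 -mlm -mlmVarCompEst P3D -mlmCompressionLevel None -export result Note: import the kinship matrix, join the genotype, phenotype, and population-structure data, and set the MLM parameters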
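Because the annotated command above mixes flags and notes, it cannot be pasted into a terminal as written. The following is a minimal dry-run sketch in POSIX shell that only prints the same pipeline as a single line. The function name build_mlm_cmd and its argument order are hypothetical; the flags are copied from the command above.

```shell
#!/bin/sh
# Hypothetical helper: print the MLM pipeline command on one line (dry run only).
# Arguments: genotype file, trait file, population-structure file,
#            kinship file, output prefix.
build_mlm_cmd() {
    geno=$1; trait=$2; struct=$3; kin=$4; out=$5
    # printf repeats the '%s ' format for every argument, so the pieces
    # come out separated by single spaces.
    printf '%s ' \
        perl run_pipeline.pl \
        -fork1 -h "$geno" -filterAlign -filterAlignMinFreq 0.05 \
        -fork2 -r "$trait" \
        -fork3 -r "$struct" -excludeLastTrait \
        -fork4 -k "$kin" \
        -combine5 -input1 -input2 -input3 -intersect \
        -combine6 -input5 -input4 \
        -mlm -mlmVarCompEst P3D -mlmCompressionLevel None \
        -export "$out"
    printf '\n'
}

# Print the command for the file names used in the notes above.
build_mlm_cmd genotype.hmp trait.txt pop_structure.txt kinship.txt result
```

The printed line can be reviewed and then pasted into a terminal. File names containing spaces would need extra quoting, which this sketch does not handle.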
2. Example: building on the command above, run the MLM analysis on genotype data from a batch script:
perl run_pipeline.pl
-fork1 -h ./hmp/mouse.hmp -filterAlign -filterAlignMinFreq 0.05
-fork2 -r trait.txt
-fork3 -r pop_structure -excludeLastTrait
-fork4 -k kinship -combine5 -input1 -input2 -input3 -intersect -combine6 -input5 -input4 -mlm -mlmVarCompEst P3D -mlmCompressionLevel None -export result
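Save the script above as a .bat file and place it in the TASSEL5 installation directory; put the other data files in the installation directory as well.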
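On Linux the same batch idea can be written as a shell loop instead of a .bat file. The sketch below is a dry run that prints one MLM command per genotype file. The function name print_batch, the assumption that all .hmp files sit in one directory, and the result_<name> output prefix are illustrative and not part of the original notes.

```shell
#!/bin/sh
# Dry-run sketch: print one MLM command per genotype file in a directory.
# print_batch and the "result_<name>" export prefix are hypothetical.
print_batch() {
    dir=$1
    for geno in "$dir"/*.hmp; do
        [ -e "$geno" ] || continue          # glob matched nothing: skip
        name=$(basename "$geno" .hmp)       # e.g. ./hmp/mouse.hmp -> mouse
        echo "perl run_pipeline.pl -fork1 -h $geno -filterAlign -filterAlignMinFreq 0.05 -fork2 -r trait.txt -fork3 -r pop_structure -excludeLastTrait -fork4 -k kinship -combine5 -input1 -input2 -input3 -intersect -combine6 -input5 -input4 -mlm -mlmVarCompEst P3D -mlmCompressionLevel None -export result_$name"
    done
}

# Prints nothing if ./hmp does not exist or holds no .hmp files.
print_batch ./hmp
```

Giving each run its own export prefix keeps the results of different genotype files from overwriting each other.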
3. TASSEL5 path
nohup /home/guo/tool/tasseladmin-tassel-5-standalone-5100767ae9e7/run_pipeline.pl -fork1 -h All_Merged_1.25M_MAF0.05.hmp -filterAlign -filterAlignMinFreq 0.05 \
-fork2 -r 2011.txt \
-fork3 -r 513lines_27229snps_pop_structure_110608.txt -excludeLastTrait \
-fork4 -k 513lines_27229snps_kinship_110608.txt -combine5 -input1 -input2 -input3 -intersect -combine6 -input5 -input4 -mlm -mlmVarCompEst P3D -mlmCompressionLevel None -export result &
4. Working paths:
/home/fzy/
/disks/workin/fzy/
Storage path:
/disks/backup/fzy/
5. Submitting jobs:
nohup <full command line> &
/home/guo/tool/tasseladmin-tassel-5-standalone-5100767ae9e7/run_pipeline.pl -fork1 -h All_Merged_1.25M_MAF0.05.hmp -filterAlign -filterAlignMinFreq 0.05 -fork2 -r 2011.txt -fork3 -r 513lines_27229snps_pop_structure_110608.txt -excludeLastTrait -fork4 -k 513lines_27229snps_kinship_110608.txt -combine5 -input1 -input2 -input3 -intersect -combine6 -input5 -input4 -mlm -mlmVarCompEst P3D -mlmCompressionLevel None -export result
/home/guo/tool/tasseladmin-tassel-5-standalone-5100767ae9e7/run_pipeline.pl -Xms10G -Xmx10G -fork1 -h All_Merged_1.25M_MAF0.05.hmp -filterAlign -filterAlignMinFreq 0.05 -fork2 -r 2011.txt -fork3 -r 513lines_27229snps_pop_structure_110608.txt -excludeLastTrait -fork4 -k 513lines_27229snps_kinship_110608.txt -combine5 -input1 -input2 -input3 -intersect -combine6 -input5 -input4 -mlm -mlmVarCompEst P3D -mlmCompressionLevel None -export result
/home/fzy/tassel3.0_standalone_110430/tassel3.0_standalone/run_pipeline.pl -Xms10G -Xmx10G -fork1 -h All_Merged_1.25M_MAF0.05.hmp -filterAlign -filterAlignMinFreq 0.05 -fork2 -r 2011.txt -fork3 -r 513lines_27229snps_pop_structure_110608.txt -excludeLastTrait -fork4 -k 513lines_27229snps_kinship_110608.txt -combine5 -input1 -input2 -input3 -intersect -combine6 -input5 -input4 -mlm -mlmVarCompEst P3D -mlmCompressionLevel None -export result
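When a job is submitted with nohup, it helps to send its output to a named log and record the process ID so the run can be checked or stopped later. A minimal sketch, assuming a POSIX shell; the function name submit_job and the log and PID file names are hypothetical.

```shell
#!/bin/sh
# Sketch: start a long-running command in the background with nohup,
# writing stdout and stderr to a log file and the PID to "<log>.pid".
# submit_job and the file naming are hypothetical conventions.
submit_job() {
    log=$1
    shift                               # remaining arguments are the command
    nohup "$@" > "$log" 2>&1 &
    echo $! > "$log.pid"                # PID of the background job
}

# Example (not executed here):
#   submit_job tassel_mlm.log /path/to/run_pipeline.pl -Xms10G -Xmx10G -fork1 ...
#   tail -f tassel_mlm.log              # follow progress
#   kill "$(cat tassel_mlm.log.pid)"    # stop the job if needed
```

Redirecting to an explicit log avoids having several runs append to the same default nohup.out file.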
GAPIT usage
The following is R code. Save it as test.R
and run it in a terminal with: Rscript test.R
#!/path/to/Rscript
#Author:Frank Chai
#Email:chaimol@163.com
#GWAS analysis script using the GAPIT package
#Reference: http://blog.sina.com.cn/s/blog_83f77c940102wg16.html
#Install packages
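#biocLite only works on older R; on R >= 3.5 use install.packages("BiocManager") and BiocManager::install("multtest") instead
source("http://www.bioconductor.org/biocLite.R")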
#options(BioC_mirror="http://mirrors.ustc.edu.cn/bioc/")
biocLite("multtest",destdir = "~/disk/GWAS/R",lib="~/disk/GWAS/GAPIT/")
#biocLite("ggbio")
install.packages("gplots",destdir = "~/disk/GWAS/R",lib="~/disk/GWAS/GAPIT/")
install.packages("LDheatmap",destdir = "~/disk/GWAS/R",lib="~/disk/GWAS/GAPIT/")
install.packages("genetics",destdir = "~/disk/GWAS/R",lib="~/disk/GWAS/GAPIT/")
install.packages("ape",destdir = "~/disk/GWAS/R",lib="~/disk/GWAS/GAPIT/")
install.packages("EMMREML",destdir = "~/disk/GWAS/R",lib="~/disk/GWAS/GAPIT/")
install.packages("scatterplot3d",destdir = "~/disk/GWAS/R",lib="~/disk/GWAS/GAPIT/")
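#Load packages (first add the custom install directory to the library search path)
.libPaths(c("~/disk/GWAS/GAPIT/", .libPaths()))
library(multtest)
library(gplots)
library(LDheatmap)
library(genetics)
library(ape)
library(EMMREML)
library(compiler)
library(scatterplot3d)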
source("http://zzlab.net/GAPIT/gapit_functions.txt")
source("http://zzlab.net/GAPIT/emma.txt")
setwd("~/disk/GWAS/test/")
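#Phenotype (Y): taxa names in the first column, traits in the remaining columns
myY <- read.table("mdp_traits_validation.txt",head=TRUE)
#Genotype (G) in HapMap format; GAPIT expects the header row kept as data, hence head=FALSE
myG <- read.table("mdp_genotype_test.hmp.txt",head=FALSE)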
myGAPIT <- GAPIT(Y=myY,G=myG,kinship.cluster=c("average","complete","ward"),kinship.group=c("Mean","Max"),SNP.MAF=0,SNP.FDR=1,PCA.total=3,Model.selection=TRUE)