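Data processing
Converting VCF to the rrBLUP {-1,0,1} format
rrBLUP reads genotypes coded as {-1,0,1}, with the markers in the header row (one column per marker) and one sample per row, so the raw data has to be converted first.
When genotypes are encoded for computing the G matrix, two coding schemes are common:
- 0, 1, 2: 0 is AA (homozygous for the major allele), 1 is the heterozygote, and 2 is aa (homozygous for the minor allele).
- -1, 0, 1: -1 is AA (homozygous for the major allele), 0 is the heterozygote, and 1 is aa (homozygous for the minor allele).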
## Generating a {0,1,2} matrix with vcftools
vcftools --vcf test.genotypes_no_missing_IDs.vcf --012 --out snp_matrix
- --012
This option outputs the genotypes as a large matrix. Three files are produced. The first, with suffix “.012”, contains the genotypes of each individual on a separate line. Genotypes are represented as 0, 1 and 2, where the number represent that number of non-reference alleles. Missing genotypes are represented by -1. The second file, with suffix “.012.indv” details the individuals included in the main file. The third file, with suffix “.012.pos” details the site locations included in the main file.
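The three output files can then be combined in R into a {-1,0,1} matrix. The sketch below is one possible way to do this and is not part of vcftools or rrBLUP: `read_012_for_rrblup` is a hypothetical helper, the tiny input files are made up for illustration, and the code assumes that each line of the `.012` file starts with an individual index column. Because vcftools counts non-reference alleles, a 1 after recoding means homozygous for the alternate allele, which is not necessarily the minor allele.

```r
# Hypothetical helper: read vcftools --012 output and recode it to {-1,0,1}.
read_012_for_rrblup <- function(prefix) {
  geno <- as.matrix(read.table(paste0(prefix, ".012"), header = FALSE))
  geno <- geno[, -1, drop = FALSE]   # assumed: first column is the individual index
  indv <- readLines(paste0(prefix, ".012.indv"))
  pos  <- read.table(paste0(prefix, ".012.pos"), header = FALSE,
                     stringsAsFactors = FALSE)
  geno[geno == -1] <- NA             # vcftools writes missing genotypes as -1
  geno <- geno - 1                   # 0/1/2 -> -1/0/1
  dimnames(geno) <- list(indv, paste(pos[[1]], pos[[2]], sep = "_"))
  geno
}

# Made-up example files: 2 samples, 3 sites.
prefix <- file.path(tempdir(), "snp_matrix")
writeLines(c("0\t0\t1\t2", "1\t2\t-1\t0"), paste0(prefix, ".012"))
writeLines(c("sampleA", "sampleB"), paste0(prefix, ".012.indv"))
writeLines(c("chr1\t100", "chr1\t250", "chr2\t42"), paste0(prefix, ".012.pos"))

Markers <- read_012_for_rrblup(prefix)
print(Markers)
```

Missing genotypes are turned into NA before subtracting 1; otherwise a missing call (-1) would silently become -2.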
## R
data snp.txt
Input files
Example files:
traits.txt: https://pbgworks.org/sites/pbgworks.org/files/traits.txt
snp.txt: https://pbgworks.org/sites/pbgworks.org/files/snp.txt
Pheno
Filtering and imputation
impute = A.mat(Markers, max.missing=0.5, impute.method="mean", return.imputed=TRUE)  # drop markers with more than 50% missing values, then impute the rest with the marker mean
Markers_impute2 = impute$imputed
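To make the effect of `max.missing` and `impute.method="mean"` concrete, here is a hand-rolled base-R sketch of the same idea on a toy matrix. It is an illustration only, not rrBLUP's implementation: `filter_and_mean_impute` is a hypothetical helper and the toy data are made up.

```r
# Hypothetical helper (not rrBLUP code): drop markers whose missing fraction
# exceeds max_missing, then replace each remaining NA with that marker's mean.
filter_and_mean_impute <- function(M, max_missing = 0.5) {
  frac_missing <- colMeans(is.na(M))
  M <- M[, frac_missing <= max_missing, drop = FALSE]
  col_means <- colMeans(M, na.rm = TRUE)
  na_idx <- which(is.na(M), arr.ind = TRUE)
  M[na_idx] <- col_means[na_idx[, "col"]]
  M
}

# Toy genotypes: 4 lines (rows) x 3 markers (columns), coded {-1,0,1}.
toy <- matrix(c(-1,  0,  1, NA,    # m1: 25% missing
                NA, NA, NA,  1,    # m2: 75% missing, so it is dropped
                 1,  1, NA, -1),   # m3: 25% missing
              nrow = 4,
              dimnames = list(paste0("line", 1:4), c("m1", "m2", "m3")))

imputed <- filter_and_mean_impute(toy)
print(imputed)
```

Mean imputation yields non-integer values such as 1/3, so the imputed matrix is no longer strictly {-1,0,1}.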
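Simple cross-validation
traits = 1
cycles = 300
accuracy = matrix(nrow = cycles, ncol = traits)
for (r in 1:cycles) {
  train = sample(1:207, 180)     # 180 of the 207 lines are used for training
  test = setdiff(1:207, train)   # the remaining 27 lines are used for validation
  # Assumed completion (Pheno taken to be a matrix with the trait in column 1): fit on the training set, predict the test set
  fit = mixed.solve(Pheno[train, 1], Z = Markers_impute2[train, ])
  pred = Markers_impute2[test, ] %*% fit$u + as.vector(fit$beta)
  accuracy[r, 1] = cor(as.vector(pred), Pheno[test, 1], use = "complete.obs")
}
mean(accuracy)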

Automating the calculation for multiple traits
Resources:
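One way to automate this is to wrap the random train/test split in a function and loop over every trait column. The following is a minimal sketch under stated assumptions rather than a definitive workflow: `cv_all_traits` is a hypothetical helper, the genotypes and the two traits (`yield`, `height`) are simulated, and only `mixed.solve` comes from rrBLUP.

```r
library(rrBLUP)  # provides mixed.solve()

# Hypothetical helper: repeated random train/test splits, one model per trait.
# Pheno: lines x traits matrix; Markers: lines x markers matrix without NA.
cv_all_traits <- function(Pheno, Markers, n_train, cycles = 10) {
  n <- nrow(Pheno)
  accuracy <- matrix(NA_real_, nrow = cycles, ncol = ncol(Pheno),
                     dimnames = list(NULL, colnames(Pheno)))
  for (r in seq_len(cycles)) {
    train <- sample(n, n_train)
    test <- setdiff(seq_len(n), train)
    for (t in seq_len(ncol(Pheno))) {
      fit <- mixed.solve(Pheno[train, t], Z = Markers[train, , drop = FALSE])
      pred <- Markers[test, , drop = FALSE] %*% fit$u + as.vector(fit$beta)
      accuracy[r, t] <- cor(as.vector(pred), Pheno[test, t],
                            use = "complete.obs")
    }
  }
  colMeans(accuracy)  # mean prediction accuracy per trait
}

# Simulated data, for illustration only.
set.seed(1)
n <- 60
m <- 200
Markers <- matrix(sample(c(-1, 0, 1), n * m, replace = TRUE), n, m)
effects <- rnorm(m, sd = 0.2)
Pheno <- cbind(yield  = as.vector(Markers %*% effects) + rnorm(n),
               height = as.vector(Markers %*% rev(effects)) + rnorm(n))

mean_accuracy <- cv_all_traits(Pheno, Markers, n_train = 45, cycles = 5)
print(mean_accuracy)
```

With the real data, `Pheno` and `Markers_impute2` would take the place of the simulated matrices, and `n_train` and `cycles` would be set to the values used in the single-trait run.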
Introduction to Genomic Selection in R using the rrBLUP Package
【GS专栏】8-全基因组选择实战之RRBLUP (GS column, part 8: genomic selection in practice with RRBLUP)