Training Set Determination for Genomic Selection
Jen-Hsiang Ou and Chen-Tuo Liao
A new optimality criterion, derived from Pearson's correlation between the genomic estimated breeding values (GEBVs) and the phenotypic values of a test set, is proposed to determine a training set for genomic selection. R functions are provided to generate the optimal training set.
- Version: 1.0
- Date: 2019-03-06
- Author: Jen-Hsiang Ou and Chen-Tuo Liao
- License: GPL (>=3)
- Description: Determining a training set for genomic selection using a genetic algorithm (Holland J.H. (1975) <10.1145>) or a simple exchange algorithm (one individual is changed in every iteration). Three criteria are available in both algorithms: r-score (J.H. Ou, C.T. Liao (2018)), PEV-score (D. Akdemir et al. (2015)) and CD-score (D. Laloe (1993)). None of these methods requires phenotypic data for the candidate set. With this package, one can readily determine a training set that is expected to perform better than one obtained by random sampling.
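To illustrate the simple exchange idea from the description, here is a minimal sketch in Python (not the package's R implementation). The names `exchange_search` and `toy_score` are hypothetical, and the toy criterion is only a stand-in: it is not r-score, PEV-score, or CD-score. The sketch assumes a larger criterion value is better and uses a fixed iteration budget.

```python
import random


def exchange_search(n_cand, n_train, score, n_iter=200, seed=0):
    """Swap one individual per iteration; keep the swap only if the score improves.

    `score` is any function mapping a set of candidate indices to a number
    (larger is better). It is a placeholder for a real criterion.
    """
    if not 0 < n_train < n_cand:
        raise ValueError("n_train must be between 1 and n_cand - 1")
    rng = random.Random(seed)
    train = set(rng.sample(range(n_cand), n_train))
    best = score(train)
    history = [best]  # best criterion value after each iteration
    for _ in range(n_iter):
        out_id = rng.choice(sorted(train))                         # individual to drop
        in_id = rng.choice(sorted(set(range(n_cand)) - train))     # individual to add
        trial = (train - {out_id}) | {in_id}
        trial_score = score(trial)
        if trial_score > best:
            train, best = trial, trial_score
        history.append(best)
    return sorted(train), best, history


# Toy stand-in criterion (an assumption for illustration only):
# the smallest gap between the selected individuals on a made-up 1-D value.
values = [0.1 * i for i in range(20)]


def toy_score(train):
    ids = sorted(train)
    return min(abs(values[a] - values[b])
               for i, a in enumerate(ids) for b in ids[i + 1:])


train, best, history = exchange_search(len(values), 4, toy_score)
print(train, best)
```

Because a swap is accepted only when the criterion improves, the recorded best value never decreases. A real criterion would be computed from the marker matrix rather than from the made-up `values` list.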
** Rtools should be installed for Windows users.
Example Data: 44K rice genome data
The 44K rice genome data were first published by Zhao et al. (2011) and provide 44,100 SNP variants across 413 diverse accessions of Oryza sativa. The original data set can be downloaded from the "Rice Diversity" website.
- Changed the stopping rule. (The algorithm stops when the index computed by the chosen criterion has not improved in the last half of the iterations.)
- First stable version
- r-score, PEV, and CD criteria are all included
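
The stopping rule mentioned in the change notes above could be read as follows; this is one possible interpretation sketched in Python, not the package's code. The function name `should_stop` and the `min_iter` guard are hypothetical, and the sketch assumes that a larger criterion value is better and that `history` holds the best value found so far after each iteration.

```python
def should_stop(history, min_iter=10):
    """Return True if the best value has not improved in the last half of the run.

    history[0] is the starting value and history[t] the best value after
    iteration t, so the sequence is assumed to be non-decreasing.
    """
    t = len(history) - 1          # iterations completed so far
    if t < min_iter:              # assumed guard: never stop too early
        return False
    midpoint = t - t // 2         # index where the last half begins
    return history[-1] <= history[midpoint]


flat_tail = [1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3]      # no gain in the last 5 iterations
late_gain = [1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4]      # improved at the final iteration
print(should_stop(flat_tail), should_stop(late_gain))
```

Under this reading the required run of non-improving iterations grows with the total number of iterations, so a search that keeps finding late improvements is allowed to continue.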