R/percentdiff.evaluate.core.R
percentdiff.evaluate.core.Rd
Compute the following differences between the entire collection (EC) and core set (CS).
Percentage of significant differences of mean (\(MD\%_{Hu}\)) (Hu et al. 2000)
Percentage of significant differences of variance (\(VD\%_{Hu}\)) (Hu et al. 2000)
Average of absolute differences between means (\(MD\%_{Kim}\)) (Kim et al. 2007)
Average of absolute differences between variances (\(VD\%_{Kim}\)) (Kim et al. 2007)
Percentage difference between the mean squared Euclidean distance among accessions (\(\overline{d}D\%\)) (Studnicki et al. 2013)
Percentage of range ratios smaller than 0.70 (\(S_{RR_{0.7}}\)) (Diwan et al. 1995)
percentdiff.evaluate.core(
data,
names,
quantitative,
selected,
alpha = 0.05,
rr.crit = 0.7
)
The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.
Name of column with the individual names as a character string
Name of columns with the quantitative traits as a character vector.
Character vector with the names of individuals selected in
core collection and present in the names
column.
Type I error probability (Significance level) of difference.
The critical value of range ratio considered to be acceptable for a representative CS. The default value is 0.7.
A data frame with the values of
\(MD\%_{Hu}\),
\(VD\%_{Hu}\),
\(MD\%_{Kim}\),
\(VD\%_{Kim}\) and
\(\overline{d}D\%\).
The differences are computed as follows.
\[MD\%_{Hu} = \left ( \frac{S_{t}}{n} \right ) \times 100\]
Where, \(S_{t}\) is the number of traits with a significant difference between the means of the EC and the CS and \(n\) is the total number of traits. A representative core should have \(MD\%_{Hu}\) < 20 % and \(CR\) > 80 % (Hu et al. 2000) .
\[VD\%_{Hu} = \left ( \frac{S_{F}}{n} \right ) \times 100\]
Where, \(S_{F}\) is the number of traits with a significant difference between the variances of the EC and the CS and \(n\) is the total number of traits. Larger \(VD\%_{Hu}\) value indicates a more diverse core set.
\[MD\%_{Kim} = \left ( \frac{1}{n}\sum_{i=1}^{n} \frac{\left | M_{EC_{i}}-M_{CS_{i}} \right |}{M_{CS_{i}}} \right ) \times 100\]
Where, \(M_{EC_{i}}\) is the mean of the EC for the \(i\)th trait, \(M_{CS_{i}}\) is the mean of the CS for the \(i\)th trait and \(n\) is the total number of traits.
\[VD\%_{Kim} = \left ( \frac{1}{n}\sum_{i=1}^{n} \frac{\left | V_{EC_{i}}-V_{CS_{i}} \right |}{V_{CS_{i}}} \right ) \times 100\]
Where, \(V_{EC_{i}}\) is the variance of the EC for the \(i\)th trait, \(V_{CS_{i}}\) is the variance of the CS for the \(i\)th trait and \(n\) is the total number of traits.
\[\overline{d}D\% = \frac{\overline{d}_{CS}-\overline{d}_{EC}}{\overline{d}_{EC}} \times 100\]
Where, \(\overline{d}_{CS}\) is the mean squared Euclidean distance among accessions in the CS and \(\overline{d}_{EC}\) is the mean squared Euclidean distance among accessions in the EC.
Percentage of range ratios smaller than 0.70 (Diwan et al. 1995) is computed as follows.
\[RR\%_{0.7} = \left ( \frac{S_{RR_{0.7}}}{n} \right ) \times 100\]
Where, \(S_{RR_{0.7}}\) is the number of traits with a range ratio smaller than 0.7 (\(\frac{R_{CS_{i}}}{R_{EC_{i}}} < 0.7\)) \(R_{CS_{i}}\) is the range of the \(i\)th trait in the CS, \(R_{EC_{i}}\) is the range of the \(i\)th trait in the EC and \(n\) is the total number of traits.
Diwan N, McIntosh MS, Bauchan GR (1995).
“Methods of developing a core collection of annual Medicago species.”
Theoretical and Applied Genetics, 90(6), 755--761.
Hu J, Zhu J, Xu HM (2000).
“Methods of constructing core collections by stepwise clustering with three sampling strategies based on the genotypic values of crops.”
Theoretical and Applied Genetics, 101(1), 264--268.
Kim K, Chung H, Cho G, Ma K, Chandrabalan D, Gwag J, Kim T, Cho E, Park Y (2007).
“PowerCore: A program applying the advanced M strategy with a heuristic search for establishing core sets.”
Bioinformatics, 23(16), 2155--2162.
Studnicki M, Madry W, Schmidt J (2013).
“Comparing the efficiency of sampling strategies to establish a representative in the phenotypic-based genetic diversity core collection of orchardgrass (Dactylis glomerata L.).”
Czech Journal of Genetics and Plant Breeding, 49(1), 36--47.
data("cassava_CC")
data("cassava_EC")
ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL
core <- rownames(cassava_CC)
quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
"ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
"ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
"PSTR")
ec[, qual] <- lapply(ec[, qual],
function(x) factor(as.factor(x)))
percentdiff.evaluate.core(data = ec, names = "genotypes",
quantitative = quant, selected = core)
#> MDPercent_Hu VDPercent_Hu MDPercent_Kim VDPercent_Kim DDPercent RR
#> 1 50 80 13.02737 41.64331 18.2052 20