The function inflen.quant
computes the length of information code that
can indicate the possession of a specific value by a quantitative trait
(Wallace and Boulton 1968; Balakrishnan and Suresh 2001; Balakrishnan and Suresh 2001; Balakrishnan and Nair 2003)
.
Details
For each quantitative trait \(d\), it is assumed that it is normally distributed within subset \(t\) with mean \(\mu_{d,t}\) and the standard deviation \(\sigma_{d,t}\) estimated as below.
\[\mu_{d,t} = \frac{\sum x_{d,s}}{n_{d,t}}\]
\[\sigma_{d,t} = \sqrt{\frac{\sum(x_{d,s} - \mu_{d,t})^2}{n_{d,t} - 1}}\]
From this, a distribution normalizing constant \(g_{d,t}\) can be estimated as
\[g_{d,t} = \ln \left ( \frac{\sigma_{d,t}}{K \cdot \varepsilon_{d}} \right )\]
Where \(K = \frac{1}{\sqrt{2\Pi}}\), \(\varepsilon_{d}\) is the least count of measurement of the descriptor \(d\). i.e. \(x\) is measured to an accuracy of \(\pm\varepsilon_{d}\).
The probability of getting a measurement \(x\) from a distribution of mean \(\mu\) and variance \(\sigma\) is approximately as follows.
\[K \cdot \frac{\varepsilon_{d}}{\sigma_{d,t}} \cdot e^{\frac{-(x_{d,s} - \mu_{d,t})^2}{2 \sigma_{d,t}^{2}}}\]
Now the length of the information code that can optimally indicate the possession of a value \(x\) by the trait \(d\) is computed as follows:
\[c_{x,d,t} = g_{d,t} + \frac{(x_{d,s} - \mu_{d,t})^2}{2 \sigma_{d,t}^{2}}\]
References
Balakrishnan R, Nair NV (2003).
“Strategies for developing core collections of sugarcane (Saccharum officinarum L.) germplasm-comparison of sampling from diversity groups constituted by three different methods.”
Plant Genetic Resources Newsletter, 33–41.
Balakrishnan R, Suresh KK (2001).
“Strategies for developing core collections of safflower (Carthamus tinctorius L.) germplasm-part II. Using an information measure for obtaining a core sample with pre-determined diversity levels for several descriptors simultaneously.”
Indian Journal of Plant Genetic Resources, 14(1), 32–42.
Balakrishnan R, Suresh KK (2001).
“Strategies for developing core collections of safflower (Carthamus tinctorius L.) germplasm-part III. Obtaining diversity groups based on an information measure.”
Indian Journal of Plant Genetic Resources, 14(3), 342–349.
Wallace CS, Boulton DM (1968).
“An information measure for classification.”
The Computer Journal, 11(2), 185–194.
Examples
suppressPackageStartupMessages(library(EvaluateCore))
library(EvaluateCore)
# Get data from EvaluateCore
data("cassava_EC", package = "EvaluateCore")
# Data of 'Average plant weight' quantitative trait
AVPW <- cassava_EC$AVPW
# Compute information length
AVPWinflen <- inflen.quant(x = AVPW, mean = 4, sd = 3.25, e = 1)
head(AVPWinflen)
#> x inflen
#> 1 3.0000000 2.144931
#> 2 2.5000000 2.204102
#> 3 0.4000000 2.711085
#> 4 0.3333333 2.734017
#> 5 7.5000000 2.677475
#> 6 5.6000000 2.218777