For accessions in a collection, compute the Length of Encoded Attribute Values (LEAV) information measure from qualitative and quantitative trait data (Wallace and Boulton 1968; Balakrishnan and Suresh 2001a, 2001b; Balakrishnan and Nair 2003).
Arguments
- data
The data as a data frame object. The data frame should have one row per individual, with one column of individual names and multiple columns of trait/character data.
- names
Name of the column with the individual names, as a character string.
- quantitative
Names of the columns with the quantitative traits, as a character vector.
- qualitative
Names of the columns with the qualitative traits, as a character vector.
- freq
A named list with the target absolute frequencies of the descriptor states for each qualitative trait specified in `qualitative`. The list names should be the same as `qualitative`.
- adj
logical. If `TRUE`, the proportion estimates are slightly biased to include zero-frequency descriptor states in the computation (see Details). Default is `TRUE`.
- mean
A named vector of target means for each quantitative trait specified in `quantitative`. The vector names should be the same as `quantitative`.
- sd
A named vector of target standard deviations for each quantitative trait specified in `quantitative`. The vector names should be the same as `quantitative`.
- e
A named vector of the least count of measurement for each quantitative trait specified in `quantitative`. The vector names should be the same as `quantitative`.
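The naming rules above (elements of `freq` keyed by the `qualitative` traits; `mean`, `sd` and `e` keyed by the `quantitative` traits) can be checked before LEAV is computed. The Python sketch below mirrors the arguments with plain dictionaries. The function `check_leav_args`, the representation of `freq` as a mapping from each trait to a state-to-count mapping, and the requirement that `sd` and `e` be positive are assumptions for illustration, not part of the package interface.

```python
from typing import List, Mapping, Sequence


def check_leav_args(
    columns: Sequence[str],
    names: str,
    quantitative: Sequence[str],
    qualitative: Sequence[str],
    freq: Mapping[str, Mapping[str, int]],
    mean: Mapping[str, float],
    sd: Mapping[str, float],
    e: Mapping[str, float],
) -> List[str]:
    """Hypothetical helper: collect consistency problems; [] means the inputs agree."""
    problems = []
    # The identifier column and every trait column must exist in the data.
    for col in [names, *quantitative, *qualitative]:
        if col not in columns:
            problems.append(f"column {col!r} not found in data")
    # freq must be keyed by exactly the qualitative trait names.
    if set(freq) != set(qualitative):
        problems.append("names of freq differ from qualitative")
    # mean, sd and e must be keyed by exactly the quantitative trait names.
    for label, vec in (("mean", mean), ("sd", sd), ("e", e)):
        if set(vec) != set(quantitative):
            problems.append(f"names of {label} differ from quantitative")
    # ASSUMPTION: sd and e must be positive for the coding lengths to be finite.
    for label, vec in (("sd", sd), ("e", e)):
        for trait, value in vec.items():
            if value <= 0:
                problems.append(f"{label}[{trait!r}] must be positive")
    return problems
```

An empty list means the argument names line up; each string otherwise describes one mismatch, so all problems are reported at once rather than stopping at the first.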
Details
For each accession \(s\) in the collection, the message length \(F_{s}\) required to optimally encode all of its \(p + q\) traits/descriptors (\(p\) qualitative and \(q\) quantitative) is computed as follows, using the joint density distribution of the whole collection.
\[F_{s} = l_{t} + \sum_{i=1}^{p} c_{m_{s},d_{i},t} + \sum_{j=1}^{q} c_{x_{s},d_{j},t}\]
Here, the first term \(l_{t}\) is the message length for the subset \(t\) to which an accession belongs, when there are \(N\) accessions in the whole collection and \(n_{t}\) accessions in subset \(t\).
\[l_{t} = - \ln{\left ( \frac{n_{t}}{N} \right )} = \ln{\left ( \frac{N}{n_{t}} \right )}\]
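As a sketch of this term, assuming that \(l_{t}\) is the code length \(-\ln(n_{t}/N) = \ln(N/n_{t})\) of an event with probability \(n_{t}/N\), measured in nats, the hypothetical helper `subset_message_length` below computes it. When the whole collection is treated as a single subset (\(n_{t} = N\)), the term is zero.

```python
import math


def subset_message_length(N: int, n_t: int) -> float:
    """Hypothetical helper: nats needed to state that an accession is in subset t."""
    if not 0 < n_t <= N:
        raise ValueError("need 0 < n_t <= N")
    # Code length of an event with probability n_t / N.
    return -math.log(n_t / N)
```

For example, a subset holding a quarter of the collection costs \(\ln 4 \approx 1.386\) nats, and a subset that is the whole collection costs nothing.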
Similarly, \(\sum_{i=1}^{p} c_{m_{s},d_{i},t}\) is the sum of the optimum message lengths for the \(p\) qualitative traits, and \(\sum_{j=1}^{q} c_{x_{s},d_{j},t}\) is the sum of the optimum message lengths for the \(q\) quantitative traits. See `inflen.qual` and `inflen.quant` for more details.
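The sum defining \(F_{s}\) can be sketched in Python. The per-trait terms are defined in `inflen.qual` and `inflen.quant`, which are not reproduced here, so this sketch swaps in two assumed forms. For a qualitative trait, the length is \(-\ln \hat{p}\), where \(\hat{p}\) is the proportion of the accession's state in the target frequencies; with `adj=True`, a Laplace-style add-one correction keeps zero-frequency states finite. For a quantitative value \(x\) recorded to least count \(e\), the length is the Gaussian coding length \(\ln\left(\sigma\sqrt{2\pi}/e\right) + (x-\mu)^{2}/(2\sigma^{2})\). Both forms, and the helpers `qual_length`, `quant_length` and `accession_length`, are hypothetical and may differ from what the package computes.

```python
import math
from typing import Mapping


def qual_length(state: str, freqs: Mapping[str, int], adj: bool = True) -> float:
    """ASSUMED qualitative term: -ln of the estimated proportion of `state`.

    `freqs` should list every descriptor state, including zero-count ones.
    With adj=True an add-one (Laplace) correction is used; this is an
    assumption, not necessarily the adjustment inflen.qual applies.
    """
    n_m = freqs.get(state, 0)
    total = sum(freqs.values())
    if adj:
        p = (n_m + 1) / (total + len(freqs))
    else:
        if n_m == 0:
            return math.inf  # an unseen state cannot be encoded without adjustment
        p = n_m / total
    return -math.log(p)


def quant_length(x: float, mu: float, sigma: float, eps: float) -> float:
    """ASSUMED quantitative term: -ln(eps * normal_pdf(x; mu, sigma))."""
    return math.log(sigma * math.sqrt(2 * math.pi) / eps) + (x - mu) ** 2 / (2 * sigma ** 2)


def accession_length(
    l_t: float,
    qual_values: Mapping[str, str],
    quant_values: Mapping[str, float],
    freq: Mapping[str, Mapping[str, int]],
    mean: Mapping[str, float],
    sd: Mapping[str, float],
    e: Mapping[str, float],
    adj: bool = True,
) -> float:
    """F_s = l_t + sum of qualitative terms + sum of quantitative terms."""
    total = l_t
    for trait, state in qual_values.items():
        total += qual_length(state, freq[trait], adj)
    for trait, x in quant_values.items():
        total += quant_length(x, mean[trait], sd[trait], e[trait])
    return total
```

For example, with `freq = {"Colour": {"red": 3, "white": 1}}`, a target mean of 100, a standard deviation of 10 and a least count of 1 for `"Height"`, an accession that is red with height 100 and \(l_{t} = 0\) gets \(F_{s} = \ln 1.5 + \ln(10\sqrt{2\pi}) \approx 3.627\) nats. Values far from the target mean, or rare states, lengthen the message.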
References
Balakrishnan R, Nair NV (2003).
“Strategies for developing core collections of sugarcane (Saccharum officinarum L.) germplasm-comparison of sampling from diversity groups constituted by three different methods.”
Plant Genetic Resources Newsletter, 33–41.
Balakrishnan R, Suresh KK (2001a).
“Strategies for developing core collections of safflower (Carthamus tinctorius L.) germplasm-part II. Using an information measure for obtaining a core sample with pre-determined diversity levels for several descriptors simultaneously.”
Indian Journal of Plant Genetic Resources, 14(1), 32–42.
Balakrishnan R, Suresh KK (2001b).
“Strategies for developing core collections of safflower (Carthamus tinctorius L.) germplasm-part III. Obtaining diversity groups based on an information measure.”
Indian Journal of Plant Genetic Resources, 14(3), 342–349.
Wallace CS, Boulton DM (1968).
“An information measure for classification.”
The Computer Journal, 11(2), 185–194.