Compute the Length of Encoded Attribute Values

For each accession in a collection, compute the Length of Encoded Attribute Values (LEAV) information measure from qualitative and quantitative trait data (Wallace and Boulton 1968; Balakrishnan and Suresh 2001; Balakrishnan and Suresh 2001; Balakrishnan and Nair 2003).

Usage

LEAV(
  data,
  names,
  quantitative = NULL,
  qualitative = NULL,
  freq,
  adj = TRUE,
  mean,
  sd,
  e
)

Arguments

data: The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.
names: Name of column with the individual names as a character string.
quantitative: Names of the columns with the quantitative traits as a character vector.
qualitative: Names of the columns with the qualitative traits as a character vector.
freq: A named list with the target absolute frequencies of the descriptor states for each qualitative trait specified in qualitative. The list names should be the same as qualitative.
adj: logical. If TRUE, the proportion estimates are slightly biased to include zero frequency descriptor states in the computation (See Details). Default is TRUE.
mean: A named vector of target means for each quantitative trait specified in quantitative. The vector names should be the same as quantitative.
sd: A named vector of target standard deviations for each quantitative trait specified in quantitative. The vector names should be the same as quantitative.
e: A named vector of the least count of measurement for each quantitative trait specified in quantitative. The vector names should be the same as quantitative.
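
The naming requirements above can be illustrated with a minimal base R sketch. The toy data frame, its column names, and the helper object names (target_mean, target_sd, least_count) are hypothetical, and the exact element format that LEAV expects inside freq is an assumption here (a table of counts per descriptor state). The targets are derived from the toy data itself purely for illustration.

```r
# Hypothetical toy collection: one row per accession.
toy <- data.frame(
  genotype = paste0("G", 1:6),
  colour   = factor(c("red", "red", "white", "white", "white", "pink"),
                    levels = c("red", "white", "pink")),
  height   = c(101.5, 98.0, 110.2, 105.4, 99.8, 107.1),
  stringsAsFactors = FALSE
)

qual  <- "colour"   # qualitative trait columns
quant <- "height"   # quantitative trait columns

# Named list of absolute frequencies; names match `qual`.
# Assumption: one table of counts per qualitative trait.
freq <- lapply(toy[qual], table)

# Named numeric vectors; names match `quant`.
target_mean <- vapply(toy[quant], mean, numeric(1))
target_sd   <- vapply(toy[quant], sd, numeric(1))
least_count <- c(height = 0.1)  # assumed measurement resolution

# Following the Usage signature, these would be passed as
# freq = freq, mean = target_mean, sd = target_sd, e = least_count.
```

Building the arguments from the column subsets toy[qual] and toy[quant] keeps the names aligned with qualitative and quantitative automatically.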

Value

A data frame with

Details

For each accession \(s\) in the collection, the message length \(F_{s}\) to optimally encode all the \(d\) traits/descriptors is computed as follows using the joint density distribution of the whole collection.

\[F_{s} = l_{t} + \sum_{i=1}^{p} c_{m_{s},d_{i},t} + \sum_{j=1}^{q} c_{x_{s},d_{j},t}\]

Here, the first term \(l_{t}\) is the message length for the subset \(t\) to which an accession belongs when there are \(N\) accessions in the whole collection and \(n_{t}\) accessions in the subset \(t\).

\[l_{t} = - \ln{\left ( \frac{n_{t}}{N} \right )}\]
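
As a rough illustration of the subset term alone, the following base R sketch takes \(l_{t}\) to be the negative natural log of the proportion of accessions falling in subset \(t\), that is \(-\ln(n_{t}/N)\), so that the value is non-negative. The subset labels are hypothetical and this is not the package's internal code.

```r
# Hypothetical subset labels for N = 8 accessions.
subset_id <- c("A", "A", "A", "A", "B", "B", "C", "C")

N   <- length(subset_id)   # accessions in the whole collection
n_t <- table(subset_id)    # accessions in each subset t

# Message length per subset: -ln(n_t / N).
l_t <- -log(as.vector(n_t) / N)
names(l_t) <- names(n_t)

# Each accession inherits the value of the subset it belongs to.
l_t_per_accession <- l_t[subset_id]
```

Larger subsets receive shorter message lengths; here subset A, holding half of the accessions, gets \(\ln 2\).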

Similarly, \(\sum_{i=1}^{p} c_{m_{s},d_{i},t}\) is the sum of the optimum message lengths for the \(p\) qualitative traits, and \(\sum_{j=1}^{q} c_{x_{s},d_{j},t}\) is the sum of the optimum message lengths for the \(q\) quantitative traits. See inflen.qual and inflen.quant for more details.

References

Balakrishnan R, Nair NV (2003). “Strategies for developing core collections of sugarcane (Saccharum officinarum L.) germplasm-comparison of sampling from diversity groups constituted by three different methods.” Plant Genetic Resources Newsletter, 33–41.

Balakrishnan R, Suresh KK (2001). “Strategies for developing core collections of safflower (Carthamus tinctorius L.) germplasm-part II. Using an information measure for obtaining a core sample with pre-determined diversity levels for several descriptors simultaneously.” Indian Journal of Plant Genetic Resources, 14(1), 32–42.

Balakrishnan R, Suresh KK (2001). “Strategies for developing core collections of safflower (Carthamus tinctorius L.) germplasm-part III. Obtaining diversity groups based on an information measure.” Indian Journal of Plant Genetic Resources, 14(3), 342–349.

Wallace CS, Boulton DM (1968). “An information measure for classification.” The Computer Journal, 11(2), 185–194.