The function inflen.qual computes the length of the information code that can indicate the possession of a descriptor state of a qualitative trait (Wallace and Boulton 1968; Balakrishnan and Suresh 2001; Balakrishnan and Suresh 2001; Balakrishnan and Nair 2003).
Arguments
- x
Data of a qualitative trait for accessions in a collection as a vector of type factor.
- freq
The target absolute frequencies of the descriptor states of the qualitative trait x in the subset of all accessions.
- adj
Logical. If TRUE, the proportion estimates are slightly biased to include zero frequency descriptor states in the computation (see Details). Default is TRUE.
Details
For each qualitative trait/descriptor \(d\), the probability of occurrence of a descriptor state \(m\) in a subset \(t\) is estimated as
\[p_{m,d,t} = \frac{n_{m,d,t} + 1}{n_{d,t} + M_{d}}\]
where \(n_{m,d,t}\) is the number of accessions with state \(m\) of trait \(d\) in subset \(t\), \(n_{d,t}\) is the number of accessions with any known state of trait \(d\) in subset \(t\) (i.e. the number of accessions in subset \(t\)), and \(M_{d}\) is the number of descriptor states of trait \(d\).
Adding 1 to the numerator and \(M_{d}\) to the denominator biases the estimate slightly, so that descriptor states with zero frequency can still be included in the computation. The unadjusted estimate is
\[p_{m,d,t} = \frac{n_{m,d,t}}{n_{d,t}}\]
Now the length of the information code that can optimally indicate the possession of descriptor state \(m\) of trait \(d\) in the subset \(t\) is computed as
\[c_{m,d,t} = -\ln p_{m,d,t}\]
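The formulas above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the internal code of inflen.qual: the frequency vector freq and its state names A to D are hypothetical values chosen to include one zero-frequency state.

```r
# Hypothetical target frequencies for a trait with four descriptor states
# (illustrative numbers only, not taken from cassava_EC)
freq <- c(A = 12, B = 6, C = 2, D = 0)

n_dt <- sum(freq)     # n_{d,t}: number of accessions in the subset
M_d  <- length(freq)  # M_d: number of descriptor states of the trait

# Adjusted (slightly biased) estimate: (n_{m,d,t} + 1) / (n_{d,t} + M_d)
p_adj <- (freq + 1) / (n_dt + M_d)

# Unadjusted estimate: n_{m,d,t} / n_{d,t}
p_raw <- freq / n_dt

# Length of the information code: c_{m,d,t} = -ln(p_{m,d,t})
c_adj <- -log(p_adj)
c_raw <- -log(p_raw)  # infinite for the zero-frequency state D

print(round(c_adj, 4))
print(c_raw)
```

With the unadjusted estimate, the zero-frequency state has probability 0 and therefore an infinite code length, whereas the adjusted estimate gives every state a finite code length and the adjusted proportions still sum to 1.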
References
Balakrishnan R, Nair NV (2003).
“Strategies for developing core collections of sugarcane (Saccharum officinarum L.) germplasm-comparison of sampling from diversity groups constituted by three different methods.”
Plant Genetic Resources Newsletter, 33–41.
Balakrishnan R, Suresh KK (2001).
“Strategies for developing core collections of safflower (Carthamus tinctorius L.) germplasm-part II. Using an information measure for obtaining a core sample with pre-determined diversity levels for several descriptors simultaneously.”
Indian Journal of Plant Genetic Resources, 14(1), 32–42.
Balakrishnan R, Suresh KK (2001).
“Strategies for developing core collections of safflower (Carthamus tinctorius L.) germplasm-part III. Obtaining diversity groups based on an information measure.”
Indian Journal of Plant Genetic Resources, 14(3), 342–349.
Wallace CS, Boulton DM (1968).
“An information measure for classification.”
The Computer Journal, 11(2), 185–194.
Examples
suppressPackageStartupMessages(library(EvaluateCore))
# Get data from EvaluateCore
data("cassava_EC", package = "EvaluateCore")
# Data of 'Colour of unexpanded apical leaves' qualitative trait
CUAL <- as.factor(cassava_EC$CUAL)
# Get frequencies based on sample size
prop <- prop.adj(CUAL, method = "sqrt")
size.prop <- 0.2
size.count <- ceiling(size.prop * length(CUAL))
CUALfreq <- round(prop * size.count)
# Compute information length
CUALinflen <- inflen.qual(x = CUAL, freq = CUALfreq, adj = TRUE)
head(CUALinflen)
#> x inflen
#> 1 Dark green 1.452784
#> 2 Light green 2.400824
#> 3 Dark green 1.452784
#> 4 Dark green 1.452784
#> 5 Dark green 1.452784
#> 6 Dark green 1.452784