Allocation of Entries to be Selected from Clusters/Groups based on Size for Core Collection Development
Estimate the number of entries to be allocated from each cluster/group in the entire collection to construct a core collection on the basis of cluster/group size. The following strategies are implemented.
Constant
Proportional
Logarithmic
Square root
allocate.basic determines the number of entries to be selected from each group/cluster with any one of these methods, chosen through the method argument.
Arguments
- data
The data as a data frame object. The data frame should have one row per individual, with columns for the individual names, the group/cluster membership and the trait/character data.
- names
Name of the column with the accession names, as a character string.
- group
Name of the column with the accession group/cluster names, as a character string.
- method
The allocation method. Either "const" for constant, "prop" for proportional, "log" for logarithmic or "sqrt" for square root allocation.
- log.base
The logarithm base to be used for the logarithmic method of sampling. Default is exp(1).
- size
The desired core set size, as a proportion of the entire collection (for example, 0.2 for 20%).
Value
A named numeric vector specifying the number of entries to be
selected from each cluster/group. The vector names correspond to the levels
of the column named by group, and the values indicate the number of entries
to be selected from each level.
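allocate.basic only returns how many entries to take from each group; it does not pick the entries themselves. As one possible follow-up, assumed here rather than taken from the package, the hypothetical helper sample_by_allocation below draws that many entries at random within each group. Random selection within groups is only one option, and the helper name, its arguments and the toy data are all illustrative.

```r
# Hypothetical helper (not part of allocate.basic): draw a random subset of
# entries from each group, using a named vector of per-group counts such as
# the one allocate.basic() returns.
sample_by_allocation <- function(data, name_col, group_col, alloc) {
  picked <- lapply(names(alloc), function(g) {
    ids <- data[[name_col]][data[[group_col]] == g]  # entries in group g
    k <- min(alloc[[g]], length(ids))  # never request more than the group holds
    ids[sample.int(length(ids), k)]    # simple random draw within the group
  })
  unlist(picked, use.names = FALSE)
}

# Toy data: 10 entries in two clusters
toy <- data.frame(genotypes = paste0("G", 1:10),
                  Cluster = rep(c("I", "II"), times = c(6, 4)),
                  stringsAsFactors = FALSE)
set.seed(1)
core <- sample_by_allocation(toy, "genotypes", "Cluster", c(I = 2, II = 1))
core
```

With the counts c(I = 2, II = 1), the result always holds two entries from cluster I and one from cluster II; which entries are drawn depends on the random seed.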
Details
All of these methods estimate the number of entries to be selected from each cluster/group solely on the basis of the total number of entries in that cluster/group.
Brown (1989) proposed the constant (C), proportional (P) and logarithmic (L) methods, and a similar square root method was later proposed by Huaman et al. (1999).
Constant method
To construct a core set of size \(n\) from an entire collection of size \(N\) divided into \(g\) groups, the same number of entries \(n_{i}\) is selected from every group \(i = 1, \ldots, g\):
\[n_{i} = \frac{n}{g}\]
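With \(n = \text{size} \times N\), the constant rule is simple arithmetic: in the Examples, \(0.2 \times 500 = 100\) entries spread over 6 clusters gives \(100/6 \approx 16.67\), which rounds to the 17 per cluster printed for const_out. The sketch below assumes exactly that (\(n\) = size × \(N\), rounding with round()); allocate.basic's own rounding rule is not documented here, and the function name alloc_const and the group sizes are hypothetical.

```r
# Sketch of the constant rule n_i = n / g (assumptions: n = size * N and
# each count is rounded to the nearest integer with round()).
alloc_const <- function(N_i, size) {
  n <- size * sum(N_i)   # target core size
  g <- length(N_i)       # number of groups
  setNames(rep(round(n / g), g), names(N_i))
}

# Hypothetical group sizes summing to N = 500
res <- alloc_const(c(I = 85, II = 55, III = 55, IV = 135, V = 115, VI = 55),
                   size = 0.2)
res   # 17 for every group
```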
Proportional method
Here the number of entries to be selected is proportional to the size \(N_{i}\) of the \(i\)th cluster/group:
\[n_{i} = n \times \frac{N_{i}}{\sum_{j=1}^{g}N_{j}} = n \times \frac{N_{i}}{N}\]
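The Details above give formulas only for the constant and proportional methods. Assuming the logarithmic (Brown 1989) and square root (Huaman et al. 1999) methods follow their usual definitions, they keep the proportional form but replace \(N_{i}\) by \(\log N_{i}\) or \(\sqrt{N_{i}}\), for example \(n_{i} = n \times \log N_{i} / \sum_{j=1}^{g}\log N_{j}\). Under that assumed form the choice of log.base cancels out of the ratio, and a group with a single entry gets weight \(\log 1 = 0\). The sketch below implements all three weightings; the name alloc_weighted, the use of round() and the group sizes are hypothetical.

```r
# Sketch of size-based allocation n_i = n * w_i / sum(w_j) with weights
#   "prop": w_i = N_i,  "log": w_i = log(N_i),  "sqrt": w_i = sqrt(N_i).
# Assumptions: n = size * N, counts rounded with round(), and the "log" and
# "sqrt" weights follow the textbook definitions, not the package source.
alloc_weighted <- function(N_i, size, method = c("prop", "log", "sqrt"),
                           log.base = exp(1)) {
  method <- match.arg(method)
  n <- size * sum(N_i)                     # target core size
  w <- switch(method,
              prop = N_i,
              log  = log(N_i, base = log.base),
              sqrt = sqrt(N_i))
  round(n * w / sum(w))                    # names of N_i are kept
}

N_i <- c(A = 400, B = 100)                 # hypothetical group sizes, N = 500
alloc_weighted(N_i, 0.2, "prop")   # A = 80, B = 20
alloc_weighted(N_i, 0.2, "sqrt")   # A = 67, B = 33
alloc_weighted(N_i, 0.2, "log")    # A = 57, B = 43
```

Moving from "prop" to "sqrt" to "log" shifts entries from the large group towards the small one, which is the intent of damping the effect of group size.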
References
Brown AHD (1989).
“Core collections: A practical approach to genetic resources management.”
Genome, 31(2), 818–824.
Huaman Z, Aguilar C, Ortiz R (1999).
“Selecting a Peruvian sweetpotato core collection on the basis of morphological, eco-geographical, and disease and pest reaction data.”
Theoretical and Applied Genetics, 98(5), 840–844.
Examples
# Get data
data("cassava_EC_gp")
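set.seed(123)
# Subsample 500 accessions to keep the example small
cassava_EC_gp <- cassava_EC_gp[sample(seq_len(nrow(cassava_EC_gp)), 500), ]
# Move the accession IDs from the row names into a "genotypes" column
data <- cbind(genotypes = rownames(cassava_EC_gp), cassava_EC_gp)
row.names(data) <- NULL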
# Constant allocation
const_out <-
allocate.basic(data = data, names = "genotypes",
group = "Cluster", method = "const",
size = 0.2)
const_out
#> I II III IV V VI
#> 17 17 17 17 17 17
# Proportional allocation
prop_out <-
allocate.basic(data = data, names = "genotypes",
group = "Cluster", method = "prop",
size = 0.2)
prop_out
#> I II III IV V VI
#> 17 11 11 27 23 9
# Logarithmic allocation
log_out <-
allocate.basic(data = data, names = "genotypes",
group = "Cluster", method = "log",
size = 0.2)
log_out
#> I II III IV V VI
#> 17 16 15 19 18 15
# Square root allocation
sqrt_out <-
allocate.basic(data = data, names = "genotypes",
group = "Cluster", method = "sqrt",
size = 0.2)
sqrt_out
#> I II III IV V VI
#> 17 14 14 22 20 13
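Because each \(n_{i}\) is rounded separately, the counts need not add up to the target core size \(n = 0.2 \times 500 = 100\). Summing the printed outputs above shows this: the constant allocation selects 102 entries and the proportional one 98, while the logarithmic and square root allocations happen to total 100. The vectors in this short check are copied from those outputs.

```r
# Per-cluster counts copied from the printed example outputs above
const_out <- c(I = 17, II = 17, III = 17, IV = 17, V = 17, VI = 17)
prop_out  <- c(I = 17, II = 11, III = 11, IV = 27, V = 23, VI = 9)
log_out   <- c(I = 17, II = 16, III = 15, IV = 19, V = 18, VI = 15)
sqrt_out  <- c(I = 17, II = 14, III = 14, IV = 22, V = 20, VI = 13)

# Total core size implied by each method, to compare with the target of 100
totals <- sapply(list(const = const_out, prop = prop_out,
                      log = log_out, sqrt = sqrt_out), sum)
totals   # const = 102, prop = 98, log = 100, sqrt = 100
```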