Data pre-processing

DataClean()

Clean PGR passport data

MergeKW(), MergePrefix(), MergeSuffix()

Merge keyword strings

Generation of KWIC Index

KWIC()

Create a KWIC index

Probable duplicate set retrieval

ProbDup()

Identify probable duplicates of accessions

Set review, modification & validation

DisProbDup()

Get disjoint probable duplicate sets

ReviewProbDup()

Retrieve probable duplicate set information from the PGR passport database for review

ReconstructProbDup()

Reconstruct an object of class ProbDup

Adjuncts/Helper functions

ValidatePrimKey()

Validate whether a data frame column conforms to primary key/ID constraints

DoubleMetaphone()

'Double Metaphone' phonetic algorithm

read.genesys()

Convert a 'Darwin Core - Germplasm' zip archive to a flat file

KWCounts()

Generate keyword counts

ParseProbDup()

Parse an object of class ProbDup into a data frame

SplitProbDup()

Split an object of class ProbDup

MergeProbDup()

Merge two objects of class ProbDup

AddProbDup()

Add probable duplicate set fields to the PGR passport database

ViewProbDup()

Visualize the probable duplicate sets retrieved in a ProbDup object

print(<KWIC>)

Print a summary of a KWIC object

print(<ProbDup>)

Print a summary of a ProbDup object

Dataset

GN1000

Sample groundnut PGR passport data