r-cran-tokenizers binary package in Ubuntu Jammy ppc64el
Convert natural language text into tokens. Includes tokenizers for
shingled n-grams, skip n-grams, words, word stems, sentences,
paragraphs, characters, shingled characters, lines, tweets, Penn
Treebank, regular expressions, as well as functions for counting
characters, words, and sentences, and a function for splitting longer
texts into separate documents, each with the same number of words.
The tokenizers have a consistent interface, and the package is built
on the 'stringi' and 'Rcpp' packages for fast yet correct
tokenization in 'UTF-8'.
Publishing history
| Date | Status | Target | Pocket | Component | Section | Priority | Phased updates | Version |
|---|---|---|---|---|---|---|---|---|
| 2021-11-10 13:13:56 UTC | Published | Ubuntu Jammy ppc64el | release | universe | gnu-r | Optional | | 0.2.1-3 |
| | Deleted | Ubuntu Jammy ppc64el | proposed | universe | gnu-r | Optional | | 0.2.1-3 |