Binary package “r-cran-tokenizers” in Ubuntu Lunar
GNU R fast, consistent tokenization of natural language text
Convert natural language text into tokens. Includes tokenizers for
shingled n-grams, skip n-grams, words, word stems, sentences,
paragraphs, characters, shingled characters, lines, tweets, Penn
Treebank tokens, and regular expressions, as well as functions for counting
characters, words, and sentences, and a function for splitting longer
texts into separate documents, each with the same number of words.
The tokenizers have a consistent interface, and the package is built
on the 'stringi' and 'Rcpp' packages for fast yet correct
tokenization in 'UTF-8'.
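
The consistent interface described above can be sketched as follows (a minimal example; the function names come from the 'tokenizers' CRAN package, and the sample text is illustrative):

```r
library(tokenizers)

text <- "Tokenization splits text into tokens. It is a common first step."

tokenize_words(text)              # word tokens (lowercased by default)
tokenize_ngrams(text, n = 2)      # shingled bigrams
tokenize_sentences(text)          # sentence tokens
count_words(text)                 # counting helper
chunk_text(text, chunk_size = 5)  # split a longer text into equal-sized documents
```

Each `tokenize_*` function takes a character vector and returns a list of character vectors, one element per input document, which is what makes the tokenizers interchangeable.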
Source package
Published versions
- r-cran-tokenizers 0.2.1-3 in amd64 (Release)
- r-cran-tokenizers 0.3.0-1 in amd64 (Proposed)
- r-cran-tokenizers 0.3.0-1 in amd64 (Release)
- r-cran-tokenizers 0.2.1-3 in arm64 (Release)
- r-cran-tokenizers 0.3.0-1 in arm64 (Proposed)
- r-cran-tokenizers 0.3.0-1 in arm64 (Release)
- r-cran-tokenizers 0.2.1-3 in armhf (Release)
- r-cran-tokenizers 0.3.0-1 in armhf (Proposed)
- r-cran-tokenizers 0.3.0-1 in armhf (Release)
- r-cran-tokenizers 0.2.1-3 in ppc64el (Release)
- r-cran-tokenizers 0.3.0-1 in ppc64el (Proposed)
- r-cran-tokenizers 0.3.0-1 in ppc64el (Release)
- r-cran-tokenizers 0.2.1-3 in riscv64 (Release)
- r-cran-tokenizers 0.3.0-1 in riscv64 (Proposed)
- r-cran-tokenizers 0.3.0-1 in riscv64 (Release)
- r-cran-tokenizers 0.2.1-3 in s390x (Release)
- r-cran-tokenizers 0.3.0-1 in s390x (Proposed)
- r-cran-tokenizers 0.3.0-1 in s390x (Release)