libTLK
1.3.1
Data Structures | |
struct | tLLM |
Language model. | |
Functions | |
tL_lm_new_from_arpa_file (gzFile from, tLDict *vocab, const tLBool emptyvocab, const char *begin_sym, const char *end_sym, const char *ninf, char **err) | |
Reads a language model from an ARPA file. | |
tL_arpa2tlkformat (gzFile from, FILE *to, const char *begin_sym, const char *end_sym, const char *ninf, const tLBool binary, char **err) | |
Reads a language model from an ARPA file and writes it in TLK format. | |
tL_lm_free (tLLM *lm) | |
Frees memory. | |
tL_lm_print (const tLLM *lm, FILE *to, const tLDict *words, const tLBool binary) | |
Prints the language model. | |
tL_lm_set_gsf (tLLM *lm, const tLFloat gsf) | |
Sets the grammar scale factor. | |
tL_lm_set_wip (tLLM *lm, const tLProb wip) | |
Sets the word insertion penalty. | |
tL_lm_new_from_file (gzFile from, tLLexicon *lexicon, const tLBool emptylex, const tLDict *syms, char **err) | |
Reads a language model from a file. | |
tL_lm_new_from_wordnet_file (gzFile from, tLDict *vocab, const tLBool emptyvocab, const char *begin_sym, const char *end_sym, char **err) | |
Reads a language model from a Wordnet file. |
tL_arpa2tlkformat (gzFile from, FILE *to, const char *begin_sym, const char *end_sym, const char *ninf, const tLBool binary, char **err)
Reads a language model from an ARPA file and writes it in TLK format.
This function has a similar effect to calling 'tL_lm_new_from_arpa_file' followed by 'tL_lm_print', but it uses less memory.
from | File where the text description is stored. |
to | File to which the model will be written. |
begin_sym | The token for the special initial word. Can be NULL. |
end_sym | The token for the special final word. |
ninf | The token representing -INF. |
binary | Flag to indicate if binary output is desired. |
err | Pointer to string variable. If not NULL, an error message is allocated in the variable in case of error. |
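The err parameter used throughout this API follows a common C convention: the caller passes the address of a string pointer, or NULL to ignore messages. The sketch below illustrates that convention with two hypothetical helpers, set_error and check_symbol, which are not part of libTLK. It assumes the message is allocated with malloc and released by the caller with free; this page does not state how libTLK allocates its messages.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of libTLK: stores a newly allocated
 * copy of msg in *err, unless the caller passed err == NULL. */
static void set_error(char **err, const char *msg)
{
    assert(msg != NULL);
    if (err == NULL)
        return;                      /* caller does not want a message */
    *err = malloc(strlen(msg) + 1);
    if (*err != NULL)
        strcpy(*err, msg);
}

/* Hypothetical function using the convention: returns 0 on success,
 * or -1 on failure after reporting the reason through err. */
static int check_symbol(const char *sym, char **err)
{
    if (sym == NULL || sym[0] == '\0') {
        set_error(err, "empty symbol");
        return -1;
    }
    return 0;
}
```

A caller would typically initialize `char *err = NULL;`, pass `&err`, and on failure print the message and release it. Whether free is the right release function for libTLK messages is an assumption here.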
tL_lm_free (tLLM *lm)
Frees memory.
Frees the memory allocated for the language model.
lm | The language model. |
tL_lm_new_from_arpa_file (gzFile from, tLDict *vocab, const tLBool emptyvocab, const char *begin_sym, const char *end_sym, const char *ninf, char **err)
Reads a language model from an ARPA file.
This function creates a new language model from a text description in ARPA format stored in the file provided. A tLDict is provided to register the words. If the vocabulary is empty, new words are registered; else, an out-of-vocabulary word is treated as an error. The tokens used to represent the special final word and -INF are required. The token for the special initial word is optional. The token used for the final word can also be used for the initial word.
from | File where the text description is stored. |
vocab | The dictionary where the words are registered. |
emptyvocab | Specifies whether the provided dictionary is empty (new words must be registered) or not (new words are treated as errors). |
begin_sym | The token for the special initial word. Can be NULL. |
end_sym | The token for the special final word. |
ninf | The token representing -INF. |
err | Pointer to string variable. If not NULL, an error message is allocated in the variable in case of error. |
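To illustrate the role of the ninf token, the sketch below converts the probability field of an ARPA n-gram line to a log-probability. A field equal to the configured token maps to negative infinity; any other field is parsed as a number. ARPA files often use "-99" for this purpose, but the token is whatever the caller passes. The helper parse_logprob is hypothetical and not part of libTLK; how the library handles the token internally is not documented on this page.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of libTLK: converts the probability
 * field of an ARPA n-gram line to a log-probability. A field equal to
 * the ninf token maps to -INFINITY. Returns 0 on success and -1 if the
 * field is not a valid number. */
static int parse_logprob(const char *field, const char *ninf, double *out)
{
    char *end;

    assert(field != NULL && ninf != NULL && out != NULL);

    if (strcmp(field, ninf) == 0) {
        *out = -INFINITY;
        return 0;
    }
    *out = strtod(field, &end);
    if (end == field || *end != '\0')
        return -1;                   /* empty field or trailing garbage */
    return 0;
}
```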
tL_lm_new_from_file (gzFile from, tLLexicon *lexicon, const tLBool emptylex, const tLDict *syms, char **err)
Reads a language model from a file.
This function creates a new language model from a text description stored in the file provided. The language model is assumed to be in the same format as the one generated by tL_lm_print. A tLLexicon is provided to register the words. If the lexicon is empty, new words are registered in the lexicon; otherwise, an out-of-vocabulary word is treated as an error. If a dictionary is provided, newly registered words are assumed to be encoded in UTF-8 and are split into characters; an out-of-dictionary character is an error. Otherwise, new words are registered with an empty sequence of symbols. In that case the resulting tLLexicon is badly formed, but it can still be used to print the language model.
from | File where the text description is stored. |
lexicon | The lexicon where words are registered. |
emptylex | Specifies if the provided lexicon is empty (new words must be registered) or not (new words are treated as errors). |
syms | A character dictionary used to register new words. If not NULL, out-of-vocabulary characters are treated as errors; if NULL, characters are ignored. |
err | Pointer to string variable. If not NULL, an error message is allocated in the variable in case of error. |
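Splitting a UTF-8 word into characters, as described above for the case where syms is not NULL, can be sketched as follows. Both helpers are hypothetical and not part of libTLK. They only find character boundaries; looking each character up in a dictionary is left out because the tLDict interface is not documented on this page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libTLK: returns the byte length of
 * the UTF-8 character starting at s, or 0 if the bytes do not form a
 * valid lead byte followed by the right number of continuation bytes.
 * Overlong forms and surrogates are not rejected in this sketch. */
static size_t utf8_char_len(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t len, i;

    if (p[0] < 0x80)                len = 1;
    else if ((p[0] & 0xE0) == 0xC0) len = 2;
    else if ((p[0] & 0xF0) == 0xE0) len = 3;
    else if ((p[0] & 0xF8) == 0xF0) len = 4;
    else return 0;                  /* stray continuation or invalid byte */

    /* A terminating NUL fails this check, so we never read past the end. */
    for (i = 1; i < len; ++i)
        if ((p[i] & 0xC0) != 0x80)
            return 0;
    return len;
}

/* Counts the characters of a NUL-terminated UTF-8 word.
 * Returns -1 if the word is malformed. */
static long utf8_count_chars(const char *word)
{
    long n = 0;

    assert(word != NULL);
    while (*word != '\0') {
        size_t len = utf8_char_len(word);
        if (len == 0)
            return -1;
        word += len;                /* advance to the next character */
        ++n;
    }
    return n;
}
```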
tL_lm_new_from_wordnet_file (gzFile from, tLDict *vocab, const tLBool emptyvocab, const char *begin_sym, const char *end_sym, char **err)
Reads a language model from a Wordnet file.
This function creates a new language model from a text description in Wordnet format stored in the file provided. A tLDict is provided to register the words. If the vocabulary is empty, new words are registered; else, an out-of-vocabulary word is treated as an error. The tokens used to represent the special initial and final words are required. The token used for the final word can also be used for the initial word.
from | File where the text description is stored. |
vocab | The dictionary where the words are registered. |
emptyvocab | Specifies if the provided dictionary is empty (new words must be registered) or not (new words are treated as errors). |
begin_sym | The token for the special initial word. |
end_sym | The token for the special final word. |
err | Pointer to string variable. If not NULL, an error message is allocated in the variable in case of error. |
tL_lm_print (const tLLM *lm, FILE *to, const tLDict *words, const tLBool binary)
Prints the language model.
This function writes the content of the language model to the given file, using the provided word dictionary. The dictionary is assumed to contain all the words needed; if a word is missing, an unexpected error may occur. The content is not written in ARPA or Wordnet format; instead, a private text representation is used.
lm | The language model. |
to | File to which the model will be written. |
words | Dictionary containing the words. |
binary | Flag to indicate if binary output is desired. |
tL_lm_set_gsf (tLLM *lm, const tLFloat gsf)
Sets the grammar scale factor.
This function modifies the given language model by applying the given grammar scale factor. Note that if a previous grammar scale factor (gsf0) was applied, the resulting model will have a grammar scale factor of gsf0*gsf.
lm | The language model to modify. |
gsf | Grammar scale factor. |
tL_lm_set_wip (tLLM *lm, const tLProb wip)
Sets the word insertion penalty.
This function modifies the given language model by applying the word insertion penalty. Note that if a previous word insertion penalty (wip0) was applied, the resulting model will have a word insertion penalty of wip0+wip.
lm | The language model to modify. |
wip | Word insertion penalty. |
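The cumulative behaviour of tL_lm_set_gsf and tL_lm_set_wip (gsf0*gsf and wip0+wip) is what one would expect if scores are kept in the log domain, with the scale factor multiplying each score and the penalty added to it. The toy model below illustrates that arithmetic. It is an assumption about the mechanism, not the library's implementation: toy_lm, toy_set_gsf and toy_set_wip are hypothetical names, and the layout of tLLM is not documented on this page.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a language model: a flat array of log-scores.
 * This is NOT the tLLM structure. */
typedef struct {
    double *scores;
    size_t  n;
} toy_lm;

/* Multiplies every log-score by gsf. Two successive calls with gsf0
 * and gsf therefore scale the original scores by gsf0*gsf. */
static void toy_set_gsf(toy_lm *lm, double gsf)
{
    size_t i;

    assert(lm != NULL);
    for (i = 0; i < lm->n; ++i)
        lm->scores[i] *= gsf;
}

/* Adds wip to every log-score. Two successive calls with wip0 and wip
 * therefore shift the scores by wip0+wip. */
static void toy_set_wip(toy_lm *lm, double wip)
{
    size_t i;

    assert(lm != NULL);
    for (i = 0; i < lm->n; ++i)
        lm->scores[i] += wip;
}
```

In this toy model the order of the calls matters: a scale factor applied after a penalty also scales the penalty. This page does not say how the library treats that case, so setting each value once is the safer usage.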