libTLK  1.3.1
Language model

Data Structures

struct  tLLM
 Language model.

Functions

 tL_lm_new_from_arpa_file (gzFile from, tLDict *vocab, const tLBool emptyvocab, const char *begin_sym, const char *end_sym, const char *ninf, char **err)
 Reads a language model from an ARPA file.
 tL_arpa2tlkformat (gzFile from, FILE *to, const char *begin_sym, const char *end_sym, const char *ninf, const tLBool binary, char **err)
 Reads a language model from an ARPA file and writes it in TLK format.
 tL_lm_free (tLLM *lm)
 Frees memory.
 tL_lm_print (const tLLM *lm, FILE *to, const tLDict *words, const tLBool binary)
 Prints the language model.
 tL_lm_set_gsf (tLLM *lm, const tLFloat gsf)
 Sets the grammar scale factor.
 tL_lm_set_wip (tLLM *lm, const tLProb wip)
 Sets the word insertion penalty.
 tL_lm_new_from_file (gzFile from, tLLexicon *lexicon, const tLBool emptylex, const tLDict *syms, char **err)
 Reads a language model from a file.
 tL_lm_new_from_wordnet_file (gzFile from, tLDict *vocab, const tLBool emptyvocab, const char *begin_sym, const char *end_sym, char **err)
 Reads a language model from a Wordnet file.

Function Documentation

tL_arpa2tlkformat ( gzFile  from,
FILE *  to,
const char *  begin_sym,
const char *  end_sym,
const char *  ninf,
const tLBool  binary,
char **  err 
)

Reads a language model from an ARPA file and writes it in TLK format.

The effect is similar to calling tL_lm_new_from_arpa_file and then tL_lm_print, but less memory is used.

Parameters:
from  File where the text description is stored.
to  File to which the model will be written.
begin_sym  The token for the special initial word. Can be NULL.
end_sym  The token for the special final word.
ninf  The token representing -INF.
binary  Flag to indicate if binary output is desired.
err  Pointer to string variable. If not NULL, an error message is allocated in the variable in case of error.
Returns:
0 on success, or -1 in case of error.
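The return-code and err conventions described above can be sketched without the library. In the sketch below, demo_convert and demo_run are hypothetical stand-ins, not libTLK functions, and releasing the message with free() is an assumption, since this page does not say how the message must be released.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in that mimics the documented convention:
 * returns 0 on success, -1 on error, and allocates a message in *err
 * only when err is not NULL. It is NOT part of libTLK. */
static int demo_convert(const char *end_sym, const char *ninf, char **err)
{
    if (end_sym == NULL || ninf == NULL) {
        if (err != NULL) {
            const char *msg = "end_sym and ninf are required";
            *err = malloc(strlen(msg) + 1);
            if (*err != NULL)
                strcpy(*err, msg);
        }
        return -1;
    }
    return 0;
}

/* Caller side: check the return code, report the message, release it.
 * Releasing with free() is an assumption; the page does not say how. */
static int demo_run(const char *end_sym, const char *ninf)
{
    char *err = NULL;
    int rc = demo_convert(end_sym, ninf, &err);
    assert(rc == 0 || rc == -1);   /* the only documented return values */
    if (rc != 0) {
        fprintf(stderr, "conversion failed: %s\n", err ? err : "(no message)");
        free(err);
    }
    return rc;
}
```

Passing NULL as err is also allowed by the description; in that case only the return code reports the failure.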

tL_lm_free ( tLLM *  lm )

Frees memory.

Frees the memory allocated for the language model.

Parameters:
lm  The language model.

tL_lm_new_from_arpa_file ( gzFile  from,
tLDict *  vocab,
const tLBool  emptyvocab,
const char *  begin_sym,
const char *  end_sym,
const char *  ninf,
char **  err 
)

Reads a language model from an ARPA file.

This function creates a new language model from a text description in ARPA format stored in the given file. A tLDict is provided to register the words. If the vocabulary is empty, new words are registered; otherwise, an out-of-vocabulary word is treated as an error. The tokens for the special final word and for -INF are required; the token for the special initial word is optional. The same token may be used for both the initial and the final word.

Parameters:
from  File where the text description is stored.
vocab  The dictionary where the words are registered.
emptyvocab  Specifies whether the provided dictionary is empty (new words must be registered) or not (new words are treated as errors).
begin_sym  The token for the special initial word. Can be NULL.
end_sym  The token for the special final word.
ninf  The token representing -INF.
err  Pointer to string variable. If not NULL, an error message is allocated in the variable in case of error.
Returns:
The language model, or NULL in case of error.
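The emptyvocab rule can be illustrated with a toy vocabulary. This is a minimal sketch of the documented behaviour only: ToyVocab, toy_find and toy_lookup are hypothetical names, and the fixed-size array is a stand-in that says nothing about how tLDict is actually implemented.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_WORDS 16

/* Toy fixed-size vocabulary; a stand-in for tLDict, not its real layout. */
typedef struct {
    const char *words[TOY_MAX_WORDS];
    int n;
} ToyVocab;

/* Returns the index of word, or -1 if it is not registered. */
static int toy_find(const ToyVocab *v, const char *word)
{
    for (int i = 0; i < v->n; ++i)
        if (strcmp(v->words[i], word) == 0)
            return i;
    return -1;
}

/* Mirrors the documented rule: when the vocabulary was passed as empty,
 * unseen words are registered; otherwise they are an error (-1).
 * A full toy table is also reported as -1. */
static int toy_lookup(ToyVocab *v, int emptyvocab, const char *word)
{
    assert(v != NULL && word != NULL);
    int id = toy_find(v, word);
    if (id >= 0)
        return id;
    if (!emptyvocab || v->n == TOY_MAX_WORDS)
        return -1;
    v->words[v->n] = word;   /* stores the pointer, does not copy */
    return v->n++;
}
```

With emptyvocab set, the first two distinct words receive indices 0 and 1; with it cleared, any word not already present yields -1 and the vocabulary is left unchanged.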

tL_lm_new_from_file ( gzFile  from,
tLLexicon *  lexicon,
const tLBool  emptylex,
const tLDict *  syms,
char **  err 
)

Reads a language model from a file.

This function creates a new language model from a text description stored in the given file. The language model is assumed to be in the format generated by tL_lm_print. A tLLexicon is provided to register the words. If the lexicon is empty, new words are registered in the lexicon; otherwise, an out-of-vocabulary word is treated as an error. If a dictionary is provided, newly registered words are assumed to be encoded in UTF-8 and are split into characters; an out-of-dictionary character is an error. Otherwise, new words are registered with an empty sequence of symbols. In that case the resulting tLLexicon is badly formed, but it can still be used to print the language model.

Parameters:
from  File where the text description is stored.
lexicon  The lexicon where words are registered.
emptylex  Specifies if the provided lexicon is empty (new words must be registered) or not (new words are treated as errors).
syms  A character dictionary used to register new words. If not NULL, out-of-vocabulary characters are treated as errors; if NULL, characters are ignored.
err  Pointer to string variable. If not NULL, an error message is allocated in the variable in case of error.
Returns:
The language model, or NULL in case of error.
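The character-splitting step described above can be sketched as follows. This is an illustration of splitting a UTF-8 word and rejecting out-of-dictionary characters, not the library's code: utf8_len and toy_split_word are hypothetical helpers, and a NULL-terminated string array stands in for the character dictionary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length in bytes of the UTF-8 character starting at lead byte c,
 * or 0 if c cannot start a character. */
static size_t utf8_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Splits word into UTF-8 characters and checks each against syms
 * (a NULL-terminated array standing in for the character dictionary).
 * Returns the number of characters, or -1 on a truncated or invalid
 * lead byte or on an out-of-dictionary character. Continuation bytes
 * are not validated beyond the dictionary comparison. */
static int toy_split_word(const char *word, const char *const *syms)
{
    assert(word != NULL && syms != NULL);
    int count = 0;
    size_t left = strlen(word);
    while (left > 0) {
        size_t len = utf8_len((unsigned char)*word);
        if (len == 0 || len > left)
            return -1;
        int known = 0;
        for (size_t i = 0; syms[i] != NULL && !known; ++i)
            known = strlen(syms[i]) == len && memcmp(syms[i], word, len) == 0;
        if (!known)
            return -1;
        word += len;
        left -= len;
        ++count;
    }
    return count;
}
```

A word is accepted only if every one of its characters appears in the symbol set, which matches the rule that an out-of-dictionary character is an error.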

tL_lm_new_from_wordnet_file ( gzFile  from,
tLDict *  vocab,
const tLBool  emptyvocab,
const char *  begin_sym,
const char *  end_sym,
char **  err 
)

Reads a language model from a Wordnet file.

This function creates a new language model from a text description in Wordnet format stored in the given file. A tLDict is provided to register the words. If the vocabulary is empty, new words are registered; otherwise, an out-of-vocabulary word is treated as an error. The tokens for the special initial and final words are both required. The same token may be used for both the initial and the final word.

Parameters:
from  File where the text description is stored.
vocab  The dictionary where the words are registered.
emptyvocab  Specifies if the provided dictionary is empty (new words must be registered) or not (new words are treated as errors).
begin_sym  The token for the special initial word.
end_sym  The token for the special final word.
err  Pointer to string variable. If not NULL, an error message is allocated in the variable in case of error.
Returns:
The language model, or NULL in case of error.

tL_lm_print ( const tLLM *  lm,
FILE *  to,
const tLDict *  words,
const tLBool  binary 
)

Prints the language model.

This function writes the content of the language model to the given file, using the provided word dictionary. The dictionary is assumed to contain every word the model needs; otherwise an unexpected error may occur. The output uses neither the ARPA format nor the Wordnet format; instead, a private text representation is used.

Parameters:
lm  The language model.
to  File to which the model will be written.
words  Dictionary containing the words.
binary  Flag to indicate if binary output is desired.

tL_lm_set_gsf ( tLLM *  lm,
const tLFloat  gsf 
)

Sets the grammar scale factor.

This function modifies the given language model by applying the given grammar scale factor. Note that if a previous grammar scale factor (gsf0) was applied, the resulting model will have a grammar scale factor of gsf0*gsf.

Parameters:
lm  The language model to modify.
gsf  Grammar scale factor.
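The multiplicative composition (gsf0*gsf) is what one would get if the factor scales each stored log-probability. The sketch below assumes exactly that; the page does not state how the factor is applied internally, and ToyLM and toy_set_gsf are hypothetical names unrelated to the real tLLM layout.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: a flat array of log-probabilities. Not the tLLM layout. */
typedef struct {
    double *logp;
    size_t n;
} ToyLM;

/* Assumed behaviour: every log-probability is multiplied by gsf,
 * so two successive calls scale by the product of both factors. */
static void toy_set_gsf(ToyLM *lm, double gsf)
{
    assert(lm != NULL);
    for (size_t i = 0; i < lm->n; ++i)
        lm->logp[i] *= gsf;
}
```

Under this assumption, applying a factor of 2 and then a factor of 4 leaves the model scaled by 8 relative to the original, so a caller who wants an absolute factor has to account for any factor already applied.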

tL_lm_set_wip ( tLLM *  lm,
const tLProb  wip 
)

Sets the word insertion penalty.

This function modifies the given language model by applying the word insertion penalty. Note that if a previous word insertion penalty (wip0) was applied, the resulting model will have a word insertion penalty of wip0+wip.

Parameters:
lm  The language model to modify.
wip  Word insertion penalty.
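The additive composition (wip0+wip) is consistent with a penalty that is added to each word score in the log domain. The sketch below assumes that interpretation, which this page does not confirm; ToyWordScores and toy_set_wip are hypothetical names, not part of libTLK.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: one log-domain score per word. Not the tLLM layout. */
typedef struct {
    double *score;
    size_t n;
} ToyWordScores;

/* Assumed behaviour: the penalty is added to every word score,
 * so two successive calls accumulate the sum of both penalties. */
static void toy_set_wip(ToyWordScores *lm, double wip)
{
    assert(lm != NULL);
    for (size_t i = 0; i < lm->n; ++i)
        lm->score[i] += wip;
}
```

Under this assumption, applying -0.5 and then -1.5 shifts every score by -2.0 in total.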