indicngram
An n-gram generator for Indic languages.
What is an n-gram?
An n-gram model is a type of probabilistic model for predicting the
next item in a sequence. n-grams are used in various areas of
statistical natural language processing and genetic sequence analysis.
An n-gram is a subsequence of n items from a given sequence. The
items in question can be phonemes, syllables, letters, words or base
pairs according to the application.
An n-gram of size 1 is referred to as a “unigram”; size 2 is a
“bigram” (or, less commonly, a “digram”); size 3 is a “trigram”; and
size 4 or more is simply called an “n-gram”.
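The definition above amounts to sliding a window of n items across a sequence. The following is a minimal sketch of that idea, not the code used by indicngram; the function name `sliding_ngrams` is hypothetical and chosen only for illustration.

```python
def sliding_ngrams(items, n):
    """Return every contiguous run of n items from the sequence, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence of length L has L - n + 1 windows of size n (none if n > L).
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


words = ["the", "quick", "brown", "fox"]
unigrams = sliding_ngrams(words, 1)   # size 1
bigrams = sliding_ngrams(words, 2)    # size 2
trigrams = sliding_ngrams(words, 3)   # size 3

print(bigrams)
# [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
```

The same function works whether the items are words, letters, syllables or base pairs; only the way the sequence is split into items changes.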
API reference
class indicngram.core.Ngram
N-gram generator class. Create an instance of this class to call its methods.
get_info()
Returns information about the module.
get_module_name()
Returns the module’s name.
letterNgram(word, window_size=2)
Parameters:
- word (str) – The word to be split into n-grams.
- window_size (int) – The window size to use when building the n-grams.
Returns: A list of n-grams.
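To illustrate what a letter n-gram with a given window size looks like, here is a rough sketch. It is an assumption-laden stand-in, not the library's implementation: the function name `letter_ngrams` is hypothetical, and it treats each Unicode code point as one letter, which may differ from how letterNgram segments Indic text.

```python
def letter_ngrams(word, window_size=2):
    """Split a word into overlapping windows of window_size code points."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    # Python strings index by code point, so each slice is one window.
    return [word[i:i + window_size]
            for i in range(len(word) - window_size + 1)]


print(letter_ngrams("kerala"))
# ['ke', 'er', 'ra', 'al', 'la']

# Assumption: one code point per "letter". In Devanagari the vowel sign
# of "भा" is a separate code point, so a window can begin on a bare sign.
hindi_bigrams = letter_ngrams("भारत")
```

Because dependent vowel signs and other combining marks are separate code points, a window in this sketch can start in the middle of what a reader would see as one character. A tool aimed at Indic scripts would likely keep such marks attached to their base consonant.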
syllableNgram(text, window_size=2)
Parameters:
- text – The text to be split into n-grams.
- window_size (int) – The window size to use when building the n-grams.
Returns: A list of syllable n-grams.
wordNgram(text, window_size=2)
Parameters:
- text – The text to be split into n-grams.
- window_size (int) – The window size to use when building the n-grams.
Returns: A list of word n-grams.
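As an illustration of word-level windows, the sketch below splits text on whitespace and joins each window back into a string. This is only an approximation under stated assumptions: the name `word_ngrams` is hypothetical, the tokenization rule is a guess, and the reference above does not say whether wordNgram returns strings, tuples or lists for each n-gram.

```python
def word_ngrams(text, window_size=2):
    """Return overlapping runs of window_size whitespace-separated words."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    words = text.split()  # assumption: words are delimited by whitespace
    return [" ".join(words[i:i + window_size])
            for i in range(len(words) - window_size + 1)]


bigrams = word_ngrams("मेरा भारत महान")
print(bigrams)
# ['मेरा भारत', 'भारत महान']
```

Text with fewer words than the window size yields an empty list in this sketch; whether the library behaves the same way is not stated in the reference.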