indicngram

An n-gram generator for Indic languages.

What is an n-gram?

An n-gram model is a type of probabilistic model for predicting the next item in a sequence. n-grams are used in various areas of statistical natural language processing and genetic sequence analysis.
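To make the idea of predicting the next item concrete, here is a minimal sketch that counts pairs of adjacent words (the n = 2 case) and picks the most frequent follower. It does not use indicngram; the function names `train_bigram_model` and `predict_next` are hypothetical and exist only for this illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(tokens):
    """Count, for each token, how often every other token follows it."""
    followers = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        followers[current][nxt] += 1
    return followers


def predict_next(model, token):
    """Return the most frequent follower of `token`, or None if it was never seen."""
    if token not in model:
        return None
    return model[token].most_common(1)[0][0]


tokens = "the cat sat on the mat and the cat slept".split()
model = train_bigram_model(tokens)
print(predict_next(model, "the"))  # cat ("cat" follows "the" twice, "mat" once)
```

A real model would normalise the counts into probabilities and smooth them for unseen pairs; raw counts are enough to show the principle.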

An n-gram is a subsequence of n items from a given sequence. The items in question can be phonemes, syllables, letters, words or base pairs according to the application.

An n-gram of size 1 is referred to as a “unigram”; size 2 is a “bigram” (or, less commonly, a “digram”); size 3 is a “trigram”; and size 4 or more is simply called an “n-gram”.

API reference

class indicngram.core.Ngram

The Ngram class. Create an instance of this class to call the methods below.

get_info()

Returns information about the module.

get_module_name()

Returns the module’s name.

letterNgram(word, window_size=2)
Parameters:
  • word (str) – The word to be split into n-grams.
  • window_size (int) – Window size to use when building the n-grams.
Returns: A list of n-grams.
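As an illustration of what a letter-level window looks like, the sketch below slides a fixed-size window over the characters of a word. It is a standalone approximation, not the library's implementation: the helper name `letter_ngrams` is hypothetical, and it assumes that a "letter" is a single Unicode code point. In Indic scripts a vowel sign or virama is a separate code point, so the output of letterNgram itself may group characters differently.

```python
def letter_ngrams(word, window_size=2):
    """Slide a window of `window_size` code points across `word`.

    Assumption: each Unicode code point counts as one letter.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    word = word.strip()
    # If the word is shorter than the window, the range is empty.
    return [word[i:i + window_size] for i in range(len(word) - window_size + 1)]


print(letter_ngrams("hello", 2))   # ['he', 'el', 'll', 'lo']
print(letter_ngrams("नमस्ते", 2))   # windows over code points, not visual clusters
```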

syllableNgram(text, window_size=2)
Parameters:
  • text – The text to be split into n-grams.
  • window_size (int) – Window size to use when building the n-grams.
Returns: A list of syllable n-grams.

wordNgram(text, window_size=2)
Parameters:
  • text – The text to be split into n-grams.
  • window_size (int) – Window size to use when building the n-grams.
Returns: A list of word n-grams.
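The same sliding window applies at word level. The following sketch is a standalone approximation rather than the library's code: the helper name `word_ngrams` is hypothetical, and both the whitespace tokenisation and the choice to return each n-gram as a tuple are assumptions made for this example.

```python
def word_ngrams(text, window_size=2):
    """Split `text` on whitespace and return every run of `window_size` words.

    Assumptions: words are whitespace-separated, and each n-gram is a tuple.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    words = text.split()
    return [
        tuple(words[i:i + window_size])
        for i in range(len(words) - window_size + 1)
    ]


print(word_ngrams("to be or not to be", 3))
# [('to', 'be', 'or'), ('be', 'or', 'not'), ('or', 'not', 'to'), ('not', 'to', 'be')]

# str.split() is Unicode-aware, so Indic text is tokenised the same way.
print(len(word_ngrams("यह एक छोटा वाक्य है", 2)))  # 4 bigrams from 5 words
```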
