Implement the NGram model in Julia
Started in November 2013


Linear interpolation

This implementation builds the model using linear interpolation. As a starting point, a plain trigram model estimates a probability directly from counts:

p("book" | "the", "green") = count("the green book") / count("the green")
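The count-based estimate above can be sketched in a few lines of plain Julia. This is a minimal illustration, not the package's internals: `count_ngrams` is a hypothetical helper, tokens are assumed to be whitespace-separated, and the three-sentence corpus is made up for the example.

```julia
# Hypothetical helper (not part of the NGram package): count every n-gram
# of order `n` in a list of texts, splitting tokens on whitespace.
function count_ngrams(texts, n)
    counts = Dict{String,Int}()
    for text in texts
        tokens = split(text)
        # The range is empty when a text has fewer than `n` tokens.
        for i in 1:(length(tokens) - n + 1)
            gram = join(tokens[i:i+n-1], " ")
            counts[gram] = get(counts, gram, 0) + 1
        end
    end
    return counts
end

# Toy corpus, made up for this illustration.
corpus = ["the green book", "the green house", "the green book"]
trigrams = count_ngrams(corpus, 3)
bigrams = count_ngrams(corpus, 2)

# p("book" | "the", "green") = count("the green book") / count("the green")
p_mle = trigrams["the green book"] / bigrams["the green"]  # 2 / 3

# A trigram that never occurs in the corpus has a count of zero.
unseen = get(trigrams, "the blue book", 0)
```

Here "the green book" occurs twice and "the green" three times, so the estimate is 2/3, while any trigram missing from the corpus gets a count of zero.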

But this estimate has some limitations:

  • Training a trigram model reliably requires a bigger corpus than training a bigram or unigram model
  • count(trigram) is often zero, so many valid word sequences get a probability of zero
  • Bigrams and unigrams have denser counts, but on their own they capture less context

The idea is then to combine the trigram estimate with the bigram and unigram estimates. More generally, to compute an n-gram estimate we also use the (n-1)-gram, ..., bigram, and unigram estimates. Here is an example for a trigram model.

p("book" | "the", "green") = a * count("the green book") / count("the green")
                          +  b * count("green book") / count("green")
                          +  c * count("book") / count()
        a + b + c = 1
        a >= 0
        b >= 0
        c >= 0

# For example: a = b = c = 1 / 3
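As a rough sketch of how such a weighted combination can be computed, the snippet below uses the standard form of linear interpolation, in which every term predicts the same target word from a progressively shorter history: p(w3 | w1, w2), p(w3 | w2), and p(w3). The helpers `count_ngrams`, `ratio`, and `interpolated` are hypothetical names written for this illustration and are not the package's API. Treating an unseen context as probability 0 is also an assumption of the sketch.

```julia
# Hypothetical helper: count every n-gram of order `n` across the texts.
function count_ngrams(texts, n)
    counts = Dict{String,Int}()
    for text in texts
        tokens = split(text)
        for i in 1:(length(tokens) - n + 1)
            gram = join(tokens[i:i+n-1], " ")
            counts[gram] = get(counts, gram, 0) + 1
        end
    end
    return counts
end

# Relative frequency; returns 0.0 when the context was never seen (assumption).
ratio(num, den) = den == 0 ? 0.0 : num / den

# Interpolated estimate of p(w3 | w1, w2) with weights a + b + c = 1.
function interpolated(texts, w1, w2, w3; a = 1/3, b = 1/3, c = 1/3)
    uni = count_ngrams(texts, 1)
    bi = count_ngrams(texts, 2)
    tri = count_ngrams(texts, 3)
    total = sum(values(uni))  # total number of tokens in the corpus
    p3 = ratio(get(tri, "$w1 $w2 $w3", 0), get(bi, "$w1 $w2", 0))
    p2 = ratio(get(bi, "$w2 $w3", 0), get(uni, w2, 0))
    p1 = ratio(get(uni, w3, 0), total)
    return a * p3 + b * p2 + c * p1
end

texts = ["the green book", "my blue book", "his green house", "book"]

# Seen trigram: (1/1 + 1/2 + 3/10) / 3
p_seen = interpolated(texts, "the", "green", "book")

# Unseen trigram: the bigram and unigram terms keep the estimate above zero.
p_unseen = interpolated(texts, "my", "green", "book")
```

With equal weights, the seen trigram gets roughly 0.6, and the unseen trigram "my green book" still gets a nonzero estimate because the bigram and unigram terms contribute.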


using NGram

texts = String["the green book", "my blue book", "his green house", "book"]

# Train a trigram model on the documents
model = NGramModel(texts, 3)

# Query on the model
# p(book | the, green)
model["the green book"]