hultig.sumo
Class HNgram

java.lang.Object
  extended by java.util.Dictionary<K,V>
      extended by java.util.Hashtable<String,Integer>
          extended by hultig.sumo.HNgram
All Implemented Interfaces:
Serializable, Cloneable, Map<String,Integer>

public class HNgram
extends Hashtable<String,Integer>

An efficient representation of a large set of n-grams. Built on a Hashtable, it associates each n-gram with an integer: the frequency of that n-gram.
(9:37:45 13 April 2009)
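
The description above can be illustrated with a small stand-alone sketch. It does not use HNgram itself: the class NgramTableSketch, its count method, and the bigramsOf helper are hypothetical names, and the choice of a single space to join the words of an n-gram is an assumption, since this page does not state the key format.

```java
import java.util.Hashtable;

// Hypothetical stand-in for the idea behind HNgram: a Hashtable whose keys
// are n-gram strings and whose values are their frequencies.
final class NgramTableSketch extends Hashtable<String, Integer> {
    private static final long serialVersionUID = 1L;

    // Adds one occurrence of the given n-gram (assumed to be roughly what
    // countNGram does; its behaviour is not documented on this page).
    void count(String ngram) {
        merge(ngram, 1, Integer::sum);
    }

    // Counts the word bigrams of a whitespace-separated sentence.
    // ASSUMPTION: the words of an n-gram are joined with a single space.
    static NgramTableSketch bigramsOf(String sentence) {
        NgramTableSketch table = new NgramTableSketch();
        String[] words = sentence.trim().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            table.count(words[i] + " " + words[i + 1]);
        }
        return table;
    }
}
```

Because the table is an ordinary Hashtable, all inherited methods (get, containsKey, size, and so on) work directly on the n-gram keys.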

See Also:
Serialized Form

Constructor Summary
HNgram()
           
HNgram(String fname)
          Creates this object and loads the n-gram table from a given file.
HNgram(String fname, int n)
          Creates this object and loads the n-gram table from a given file.
 
Method Summary
 void countNGram(String sngram)
           
 int exclude(String pattern)
          Removes from this table all n-grams that match a given string pattern (regular expression).
 int freq(String ngram)
          Returns the frequency of a given n-gram.
 int freq(String[] v)
          Returns the frequency of a given n-gram, given as an array of strings.
 long getSum()
          Gives the sum of frequencies for all n-grams stored in this table, a value necessary for n-gram probability estimation.
static void main(String[] args)
          The main method is used for demonstration.
 double prob(String ngram)
          The estimated probability of a given n-gram, for the data in this table.
 double prob(String[] v)
          The estimated probability of a given n-gram, for the data in this table.
 double probabilidade(String sws)
          Computes the log-likelihood of a given word sequence, based on the n-gram model stored in this object.
 double probability(String sws)
          Computes the likelihood of a given word sequence, based on the n-gram model stored in this object.
 void set(FileIN f)
          This method loads the n-gram table from a given file.
static boolean test201110191137()
           
 
Methods inherited from class java.util.Hashtable
clear, clone, contains, containsKey, containsValue, elements, entrySet, equals, get, hashCode, isEmpty, keys, keySet, put, putAll, rehash, remove, size, toString, values
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

HNgram

public HNgram()

HNgram

public HNgram(String fname)
Creates this object and loads the n-gram table from a given file. The file is expected to be plain text, with one n-gram and its frequency per line. The default n-gram dimensionality is two (bigrams). This constructor invokes the set(FileIN) method.

Parameters:
fname - The name of the text file to be processed.
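
As a rough illustration of the "one n-gram and frequency pair per line" layout, the sketch below parses such lines into a table. The exact file syntax is not specified here, so the sketch assumes that the last whitespace-separated token on each line is the frequency and everything before it is the n-gram. NgramFileSketch and parse are hypothetical names, not part of hultig.sumo.

```java
import java.util.Arrays;
import java.util.Hashtable;
import java.util.List;

final class NgramFileSketch {
    // Parses lines of the ASSUMED form "<word> ... <word> <frequency>".
    // Blank lines and lines whose last token is not an integer are skipped.
    static Hashtable<String, Integer> parse(List<String> lines) {
        Hashtable<String, Integer> table = new Hashtable<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 2) {
                continue; // nothing that could be an n-gram plus a frequency
            }
            try {
                int freq = Integer.parseInt(parts[parts.length - 1]);
                String ngram = String.join(" ", Arrays.copyOf(parts, parts.length - 1));
                table.put(ngram, freq);
            } catch (NumberFormatException e) {
                // Last token is not a frequency: ignore the line.
            }
        }
        return table;
    }
}
```

The lines could come, for example, from java.nio.file.Files.readAllLines applied to the path of the text file.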

HNgram

public HNgram(String fname,
              int n)
Creates this object and loads the n-gram table from a given file. This constructor invokes the HNgram(String) constructor.

Parameters:
fname - The name of the text file to be processed.
n - The n-gram dimensionality value.
Method Detail

set

public final void set(FileIN f)
This method loads the n-gram table from a given file. It is assumed that the n-gram dimensionality has already been defined.

Parameters:
f - Represents the n-gram table file to be processed.

countNGram

public void countNGram(String sngram)

exclude

public int exclude(String pattern)
Removes from this table all n-grams that match a given string pattern (regular expression).

Parameters:
pattern - The regular expression.
Returns:
The number of n-grams removed.
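
A minimal sketch of this kind of pattern-based removal on a plain Hashtable follows. ExcludeSketch is a hypothetical helper, and it assumes that the whole n-gram string must match the expression; whether the real method requires a full match or accepts a partial one is not stated on this page.

```java
import java.util.Hashtable;
import java.util.regex.Pattern;

final class ExcludeSketch {
    // Removes every key that fully matches the regular expression and
    // returns the number of removed entries.
    // ASSUMPTION: full match (Matcher.matches), not a partial find.
    static int exclude(Hashtable<String, Integer> table, String regex) {
        Pattern pattern = Pattern.compile(regex);
        int before = table.size();
        // Removing through the key-set view also removes the table entries.
        table.keySet().removeIf(key -> pattern.matcher(key).matches());
        return before - table.size();
    }
}
```

Compiling the pattern once, outside the loop over the keys, avoids recompiling it for every n-gram in a large table.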

freq

public int freq(String ngram)
Returns the frequency of a given n-gram.

Parameters:
ngram - The indicated n-gram.
Returns:
The frequency, a value greater than zero, or -1 if the given n-gram does not exist in the table.

freq

public int freq(String[] v)
Returns the frequency of a given n-gram, given as an array of strings.

Parameters:
v - The array containing the n-gram word sequence.
Returns:
The frequency of the n-gram, or -1 if the n-gram does not exist in the table or an error occurs.
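
The -1 convention of the two freq overloads can be sketched as below. FreqSketch is a hypothetical helper working on a plain Hashtable; joining the array elements with a single space, and treating null or empty input as an error, are assumptions.

```java
import java.util.Hashtable;

final class FreqSketch {
    // Frequency of an n-gram string, or -1 if it is not in the table.
    static int freq(Hashtable<String, Integer> table, String ngram) {
        if (ngram == null) {
            return -1; // Hashtable does not accept null keys
        }
        Integer f = table.get(ngram);
        return f == null ? -1 : f;
    }

    // Frequency of an n-gram given as an array of words.
    // ASSUMPTION: the key is the words joined by single spaces.
    static int freq(Hashtable<String, Integer> table, String[] v) {
        if (v == null || v.length == 0) {
            return -1; // treated here as an erroneous situation
        }
        for (String word : v) {
            if (word == null) {
                return -1;
            }
        }
        return freq(table, String.join(" ", v));
    }
}
```

Since -1 is used as a marker, callers should test for it before using the value in arithmetic.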

prob

public double prob(String ngram)
The estimated probability of a given n-gram, for the data in this table.

Parameters:
ngram - The indicated n-gram.
Returns:
The estimated probability, as a percentage.

prob

public double prob(String[] v)
The estimated probability of a given n-gram, for the data in this table. The n-gram is represented through an array with n strings.

Parameters:
v - The n-gram representation.
Returns:
The estimated probability, as a percentage.
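
One plausible reading of "estimated probability, as a percentage" is the relative frequency of the n-gram scaled by 100. The sketch below shows that calculation only. ProbSketch and probPercent are hypothetical names, and returning 0 for an unseen n-gram is an assumption; the actual estimator used by prob is not described on this page.

```java
final class ProbSketch {
    // Relative frequency of an n-gram, expressed as a percentage.
    // freq: the n-gram frequency (for instance the result of freq(...)).
    // sum:  the sum of all frequencies (for instance the result of getSum()).
    // ASSUMPTION: unseen n-grams (freq <= 0) and an empty table yield 0.
    static double probPercent(int freq, long sum) {
        if (freq <= 0 || sum <= 0) {
            return 0.0;
        }
        return 100.0 * freq / sum;
    }
}
```

For example, an n-gram seen 3 times in a table whose frequencies add up to 12 would get 25 percent under this reading.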

probabilidade

public double probabilidade(String sws)
Computes the log-likelihood of a given word sequence, based on the n-gram model stored in this object.

Parameters:
sws - The word sequence.
Returns:
The estimated log-likelihood in the ]-∞, 0] interval.

probability

public double probability(String sws)
Computes the likelihood of a given word sequence, based on the n-gram model stored in this object.

Parameters:
sws - The word sequence.
Returns:
The estimated likelihood in the [0,1] interval.
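
The relation between the two sequence methods (a log-likelihood in ]-∞, 0] and a likelihood in [0,1]) can be sketched with a deliberately simple model: slide a window of n words over the sequence and add the logarithms of the relative frequencies of the windows. This is an assumption made for illustration; the class may well use conditional probabilities or smoothing instead. SequenceSketch and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.Map;

final class SequenceSketch {
    // Sum of log relative frequencies over all n-word windows of sws.
    // ASSUMPTIONS: keys are words joined by single spaces, windows are
    // treated as independent, and there is no smoothing, so an unseen
    // window gives negative infinity.
    static double logLikelihood(Map<String, Integer> table, long sum, int n, String sws) {
        String[] words = sws.trim().split("\\s+");
        double total = 0.0;
        for (int i = 0; i + n <= words.length; i++) {
            String key = String.join(" ", Arrays.copyOfRange(words, i, i + n));
            Integer f = table.get(key);
            if (f == null || f <= 0) {
                return Double.NEGATIVE_INFINITY;
            }
            total += Math.log((double) f / sum);
        }
        return total;
    }

    // The likelihood is the exponential of the log-likelihood, so it lies in [0, 1].
    static double likelihood(Map<String, Integer> table, long sum, int n, String sws) {
        return Math.exp(logLikelihood(table, sum, n, sws));
    }
}
```

Working in log space avoids numeric underflow for long sequences, which is the usual reason for offering a log-likelihood alongside the plain likelihood.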

getSum

public long getSum()
Gives the sum of frequencies for all n-grams stored in this table, a value necessary for n-gram probability estimation.

Returns:
The sum of frequencies.
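
The long return type matters because the frequencies are stored as int values and their total can exceed the int range. A minimal sketch of such a sum over any String-to-Integer map follows; SumSketch is a hypothetical helper, not the class's own code.

```java
import java.util.Map;

final class SumSketch {
    // Adds all frequencies into a long accumulator, so that the total
    // cannot overflow the int range of the individual values.
    static long sum(Map<String, Integer> table) {
        long total = 0L;
        for (int f : table.values()) {
            total += f;
        }
        return total;
    }
}
```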

main

public static void main(String[] args)
The main method is used for demonstration. It creates an instance of this class by loading a given table of a 4-gram model of part-of-speech tags. Afterwards, the likelihood of a tag sequence is calculated.

Parameters:
args - No arguments are expected.

test201110191137

public static boolean test201110191137()