hultig.sumo
Class OpenNLPKit

java.lang.Object
  extended by hultig.sumo.OpenNLPKit

public class OpenNLPKit
extends Object

This class gathers and simplifies access to the main features of the OpenNLP package, such as part-of-speech tagging and sentence parsing. The package must already be installed, and the path to its language models must be supplied to the constructor of this class, as shown below:

  OpenNLPKit model = new OpenNLPKit("/a/tools/opennlp-tools-1.5.0/models/english/");
 

The main method performs a general test, demonstrating the key features of the class.

University of Beira Interior (UBI)
Centre For Human Language Technology and Bioinformatics (HULTIG)


Field Summary
static int CHUNKER
          The chunker reference code.
static String[] modelFileName
          An array containing the file names of the language models, ordered from the sentence detector to the parser.
static int PARSER
          The parser reference code.
static int STDETECT
          The sentence detector reference code.
static int TAGGER
          The tagger reference code.
static int TOKENIZER
          The tokenizer reference code.
 
Constructor Summary
OpenNLPKit()
          The default constructor initializes the models path with null.
OpenNLPKit(String path)
          Creates an OpenNLP kit and tries to set the path to the main directory containing the language models.
 
Method Summary
 boolean allDefined()
          Tests if all the language models are well loaded and defined.
 boolean allDefined(boolean silent)
          Tests if all the language models are well loaded and defined.
 String chunk(Sentence stc)
          Returns the shallow-parsed string for an already marked sentence.
 String chunk(String tagedline)
          Shallow parses a given tagged sentence string.
 boolean definedSentenceDetector()
          Tests if the sentence detection model is well defined.
static void help()
          Prints the command line help, implemented in the main method.
 boolean loadAllModels()
          Tries to load all the language models from the defined path modelsPATH.
 void loadChunker()
          Tries to load the model necessary for sentence chunking, that is, shallow parsing.
 void loadParser()
          Tries to load the model necessary for sentence parsing (full parsing).
 void loadSentenceDetector()
          Tries to load the model necessary for sentence detection.
 void loadTagger()
          Tries to load the model necessary for part-of-speech tagging.
 void loadTokenizer()
          Tries to load the model necessary for string tokenization.
static void main(String[] args)
          Demonstrates the main features of the class and provides a small command line for performing several operations on sentences, such as tagging and parsing.
 String parse(String stc)
          Fully parses a sentence contained in a string.
 Sentence postag(Sentence stc)
          Performs part-of-speech tagging on a given Sentence.
 String postag(String stxt)
          Performs part-of-speech tagging on a given text string.
 boolean setModelsPath(String path)
          Sets and validates a given path as being a valid directory.
 void setSilentMode(boolean value)
          Sets the "silent mode" state, which controls whether some methods print log/info/status messages.
 String[] splitSentences(String stxt)
          Splits a text string, possibly containing several sentences, into an array of strings with one sentence per position.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

STDETECT

public static final int STDETECT
The sentence detector reference code.

See Also:
Constant Field Values

TOKENIZER

public static final int TOKENIZER
The tokenizer reference code.

See Also:
Constant Field Values

TAGGER

public static final int TAGGER
The tagger reference code.

See Also:
Constant Field Values

CHUNKER

public static final int CHUNKER
The chunker reference code.

See Also:
Constant Field Values

PARSER

public static final int PARSER
The parser reference code.

See Also:
Constant Field Values

modelFileName

public static String[] modelFileName
An array containing the file names of the language models, ordered from the sentence detector to the parser.
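The following sketch illustrates one plausible way the reference codes relate to modelFileName: each code is used as an index into the array. The constant values (0 to 4) and the file names are assumptions; the names follow those published for the OpenNLP 1.5 English models, but the ones OpenNLPKit actually uses may differ, and modelPath is a hypothetical helper.

```java
import java.nio.file.Paths;

// Hypothetical sketch: the real constant values are listed under
// "Constant Field Values"; the values and file names below are assumed.
class ModelCodes {
    static final int STDETECT = 0;  // assumed value
    static final int TOKENIZER = 1; // assumed value
    static final int TAGGER = 2;    // assumed value
    static final int CHUNKER = 3;   // assumed value
    static final int PARSER = 4;    // assumed value

    // Assumed file names, ordered from the sentence detector to the parser.
    static String[] modelFileName = {
        "en-sent.bin", "en-token.bin", "en-pos-maxent.bin",
        "en-chunker.bin", "en-parser-chunking.bin"
    };

    // Joins the models directory with the file name selected by a reference code.
    static String modelPath(String modelsDir, int code) {
        return Paths.get(modelsDir, modelFileName[code]).toString();
    }
}
```

For example, modelPath("/a/tools/opennlp-tools-1.5.0/models/english/", TAGGER) would select the tagger file under the given directory.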

Constructor Detail

OpenNLPKit

public OpenNLPKit()
The default constructor initializes the models path with null. This means that the path must be defined later (for example with setModelsPath) before this object can be used. Each model file name is also defined here, in the modelFileName array.


OpenNLPKit

public OpenNLPKit(String path)
Creates an OpenNLP kit and tries to set the path to the main directory containing the language models. If the path is not a valid OS directory, the modelsPATH variable is set to null.

Parameters:
path - The path to the directory containing the language models.
Method Detail

setModelsPath

public boolean setModelsPath(String path)
Sets and validates a given path as being a valid directory. In order for the language models to be found later, the path must point to the directory containing them.

Parameters:
path - The path to the directory containing the language models.
Returns:
true if the path points to a valid OS directory.
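A minimal sketch of the validation described for setModelsPath, using java.nio.file.Files.isDirectory. Only the modelsPATH field name comes from this documentation; resetting the field to null on failure mirrors what the constructor is documented to do and is an assumption for this method, and getModelsPath is a hypothetical accessor.

```java
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;

// Sketch of the documented setModelsPath contract. Only the field name
// modelsPATH comes from the documentation; the rest is assumed.
class ModelsPathHolder {
    private String modelsPATH;

    boolean setModelsPath(String path) {
        try {
            if (path != null && Files.isDirectory(Paths.get(path))) {
                modelsPATH = path;
                return true;
            }
        } catch (InvalidPathException e) {
            // A syntactically invalid path is treated like a missing directory.
        }
        modelsPATH = null; // assumed: same reset the constructor is documented to do
        return false;
    }

    // Hypothetical accessor, added only to make the sketch observable.
    String getModelsPath() {
        return modelsPATH;
    }
}
```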

setSilentMode

public void setSilentMode(boolean value)
Sets the "silent mode" state, which controls whether some methods print log/info/status messages.

Parameters:
value - true to activate silent mode, false to deactivate it.

loadAllModels

public boolean loadAllModels()
Tries to load all the language models from the defined path modelsPATH.

Returns:
true only means that the path is well defined, pointing to the assumed main models directory.

loadSentenceDetector

public void loadSentenceDetector()
Tries to load the model necessary for sentence detection. On error the model is not loaded, meaning that the stdetect variable remains null or keeps the previously loaded model object.


loadTokenizer

public void loadTokenizer()
Tries to load the model necessary for string tokenization. On error the model is not loaded, meaning that the tokenizer variable remains null or keeps the previously loaded model object.


loadTagger

public void loadTagger()
Tries to load the model necessary for part-of-speech tagging. On error the model is not loaded, meaning that the tagger variable remains null or keeps the previously loaded model object.


loadChunker

public void loadChunker()
Tries to load the model necessary for sentence chunking, that is, shallow parsing. On error the model is not loaded, meaning that the chunker variable remains null or keeps the previously loaded model object.


loadParser

public void loadParser()
Tries to load the model necessary for sentence parsing (full parsing). On error the model is not loaded, meaning that the parser variable remains null or keeps the previously loaded model object.
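All five load methods share the same error contract: on failure, the corresponding field keeps its previous value. The sketch below shows that pattern for a hypothetical loadTagger(String, String) variant; reading the file's raw bytes stands in for building an OpenNLP model object, which is an assumption made so the example runs without OpenNLP.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;

// Sketch of the "keep the old model on error" behaviour. The raw bytes of the
// file stand in for a real OpenNLP model object (an assumption).
class SafeLoader {
    private byte[] tagger; // null until a load succeeds

    void loadTagger(String modelsDir, String fileName) {
        try {
            byte[] loaded = Files.readAllBytes(Paths.get(modelsDir, fileName));
            tagger = loaded; // replace the field only after a successful read
        } catch (IOException | InvalidPathException e) {
            // On error the field is left untouched: still null, or the old model.
            System.err.println("Could not load tagger model: " + e.getMessage());
        }
    }

    byte[] getTagger() {
        return tagger;
    }
}
```

Assigning to a local variable first, and to the field only afterwards, is what guarantees that a failed load never overwrites a working model.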


definedSentenceDetector

public boolean definedSentenceDetector()
Tests if the sentence detection model is well defined.

Returns:
true on success, false on failure.

allDefined

public boolean allDefined()
Tests if all the language models are well loaded and defined. This method calls allDefined(boolean) to perform the verification silently, meaning that no output message is printed.

Returns:
true only if all language models are well defined.

allDefined

public boolean allDefined(boolean silent)
Tests if all the language models are well loaded and defined.

Parameters:
silent - If true, no error message is printed.
Returns:
true only if all language models are well defined.
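A minimal sketch of how allDefined() and allDefined(boolean) could fit together, assuming a model counts as defined when its field is non-null. The field names (stdetect, tokenizer, tagger, chunker, parser) come from the load method descriptions; representing each model as a plain Object is an assumption.

```java
// Sketch: each model is a plain Object field (assumed); "defined" means non-null.
class ModelRegistry {
    Object stdetect, tokenizer, tagger, chunker, parser;

    // The no-argument form delegates to the silent check, as documented.
    boolean allDefined() {
        return allDefined(true);
    }

    boolean allDefined(boolean silent) {
        String[] names = {"sentence detector", "tokenizer", "tagger", "chunker", "parser"};
        Object[] models = {stdetect, tokenizer, tagger, chunker, parser};
        boolean ok = true;
        for (int i = 0; i < models.length; i++) {
            if (models[i] == null) {
                ok = false; // keep checking so every missing model can be reported
                if (!silent) {
                    System.err.println("Model not defined: " + names[i]);
                }
            }
        }
        return ok;
    }
}
```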

splitSentences

public String[] splitSentences(String stxt)
Splits a text string, possibly containing several sentences, into an array of strings with one sentence per position.

Parameters:
stxt - The textual string.
Returns:
The array of sentences or null.
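Because splitSentences may return null, callers should check the result before iterating. This sketch, with the hypothetical helper SentenceListing.format, prints sentences in the layout of the demonstration output in main and treats null as an empty list.

```java
// Sketch: formats sentences like the "[LIST OF ALL SENTENCES FOUND IN TEXT]"
// block of the demonstration. The helper name is hypothetical.
class SentenceListing {
    static String format(String[] sentences) {
        if (sentences == null) {
            return ""; // splitSentences may return null
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < sentences.length; i++) {
            // %2d pads the index to two characters, giving "S( 0)".
            sb.append(String.format("    S(%2d):... [%s]", i, sentences[i])).append('\n');
        }
        return sb.toString();
    }
}
```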

postag

public String postag(String stxt)
Performs part-of-speech tagging on a given text string.

Parameters:
stxt - The textual string.
Returns:
The tagged string.
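The demonstration output in main shows the tagged string as space-separated word/TAG tokens. Assuming that format, the sketch below splits it into word and tag pairs; splitting at the last '/' keeps tokens such as $/$ and their/PRP$ intact. TaggedTokens is a hypothetical helper, not part of OpenNLPKit.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: parses "word/TAG word/TAG ..." into {word, tag} pairs.
// Assumes whitespace-separated tokens, each carrying a tag after its last '/'.
class TaggedTokens {
    static List<String[]> parse(String tagged) {
        List<String[]> pairs = new ArrayList<>();
        for (String token : tagged.trim().split("\\s+")) {
            int slash = token.lastIndexOf('/');
            if (slash <= 0) {
                continue; // skip empty or malformed tokens
            }
            pairs.add(new String[] {token.substring(0, slash), token.substring(slash + 1)});
        }
        return pairs;
    }
}
```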

postag

public Sentence postag(Sentence stc)
Performs part-of-speech tagging on a given Sentence. This method reconstructs the Sentence object, returning a new tagged one in which every word is POS tagged.

Parameters:
stc - The sentence to be tagged.
Returns:
The tagged sentence.

chunk

public String chunk(String tagedline)
Shallow parses a given tagged sentence string. It was obtained and adapted from the file opennlp-tools.models.english.chunker.TreebankChunker.java.

Parameters:
tagedline - The tagged sentence string to be chunked.
Returns:
The shallow-parsed sentence.
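The shallow-parsed strings in the demonstration output enclose each chunk in brackets, such as [NP  it/PRP ]. Assuming that format, with no nested brackets, this hypothetical helper collects each chunk label together with its words and ignores tokens outside any chunk (such as And/CC or ./.).

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: extracts "LABEL: words" entries from a bracketed shallow parse.
// Assumes non-nested brackets and word/TAG tokens, as in the demo output.
class ChunkReader {
    static List<String> chunks(String shallow) {
        List<String> result = new ArrayList<>();
        String label = null;                 // label of the chunk being read
        StringBuilder words = new StringBuilder();
        for (String tok : shallow.trim().split("\\s+")) {
            if (tok.startsWith("[") && tok.length() > 1) {
                label = tok.substring(1);    // "[NP" opens an NP chunk
                words.setLength(0);
            } else if (tok.equals("]") && label != null) {
                result.add(label + ": " + words.toString().trim());
                label = null;                // chunk closed
            } else if (label != null) {
                int slash = tok.lastIndexOf('/');
                words.append(' ').append(slash > 0 ? tok.substring(0, slash) : tok);
            }
        }
        return result;
    }
}
```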

chunk

public String chunk(Sentence stc)
Returns the shallow-parsed string for an already marked sentence.

Parameters:
stc - The marked sentence.
Returns:
The shallow-parsed string.

parse

public String parse(String stc)
Fully parses a sentence contained in a string.

Parameters:
stc - The sentence string.
Returns:
The parsed string.
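The demonstration output in main shows parse returning a Penn Treebank-style bracketed tree, for example (TOP (S ...)). As a sketch, assuming that format and that words never contain parentheses, the hypothetical helper below lists every constituent label in pre-order and rejects unbalanced input.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: lists constituent labels of a bracketed tree in pre-order.
// A label is the token immediately following each '('.
class TreeLabels {
    static List<String> labels(String tree) {
        List<String> out = new ArrayList<>();
        int depth = 0;
        for (int i = 0; i < tree.length(); i++) {
            char c = tree.charAt(i);
            if (c == '(') {
                depth++;
                int j = i + 1;
                // Read the label up to whitespace or the next parenthesis.
                while (j < tree.length() && !Character.isWhitespace(tree.charAt(j))
                        && tree.charAt(j) != '(' && tree.charAt(j) != ')') {
                    j++;
                }
                out.add(tree.substring(i + 1, j));
            } else if (c == ')') {
                if (--depth < 0) {
                    throw new IllegalArgumentException("Unbalanced ')' at index " + i);
                }
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("Unbalanced '(' in tree");
        }
        return out;
    }
}
```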

help

public static void help()
Prints the command line help, implemented in the main method.


main

public static void main(String[] args)
Demonstrates the main features of the class and provides a small command line for performing several operations on sentences, such as tagging and parsing. This class uses version 1.5 of the OpenNLP package; to run the demonstration, the package must be installed and the path to its models must be supplied to the constructor, for example:
  OpenNLPKit model = new OpenNLPKit("/a/tools/opennlp-tools-1.5.0/models/english/");
 
If everything is well defined, then the execution of this test will produce the following output:
 [X] - LOADING SENTENCE DETECTOR MODEL ... SENTENCE DETECTOR MODEL LOADED
 [X] - LOADING TOKENIZER MODEL ........... TOKENIZER MODEL LOADED
 [X] - LOADING TAGGER MODEL .............. TAGGER MODEL LOADED
 
 [NLP KIT GENERAL DEMONSTRATION]


 [A MULTI-SENTENCE TEXT SEGMENT]
    --------------------------------------------
    |Yes, said Mr. Heinberg. And it's written  |
    |for that audience. It's certainly not     |
    |written in such a way that only experts   |
    |could benefit from it. A general reader   |
    |can easily pick up this $22 book and find |
    |their way through it with no problem at   |
    |all.                                      |
    --------------------------------------------


 [LIST OF ALL SENTENCES FOUND IN TEXT]
    S( 0):... [Yes, said Mr. Heinberg.]
    S( 1):... [And it's written for that audience.]
    S( 2):... [It's certainly not written in such a way that only experts could benefit from it.]
    S( 3):... [A general reader can easily pick up this $22 book and find their way through it with no problem at all.]


 [PART-OF-SPEECH OF EACH SENTENCE]
    S( 0):... [Yes/UH ,/, said/VBD Mr./NNP Heinberg/NNP ./.]
    S( 1):... [And/CC it/PRP 's/VBZ written/VBN for/IN that/DT audience/NN ./.]
    S( 2):... [It/PRP 's/VBZ certainly/RB not/RB written/VBN in/IN such/JJ a/DT way/NN that/IN only/JJ experts/NNS could/MD benefit/VB from/IN it/PRP ./.]
    S( 3):... [A/DT general/JJ reader/NN can/MD easily/RB pick/VB up/RP this/DT $/$ 22/CD book/NN and/CC find/VB their/PRP$ way/NN through/IN it/PRP with/IN no/DT problem/NN at/IN all/DT ./.]


 [X] - LOADING CHUNKER MODEL ............. CHUNKER MODEL LOADED


 [SHALLOW PARSING OF EACH SENTENCE]
    S( 0):...  [INTJ  Yes/UH ]  ,/, [VP  said/VBD ] [NP  Mr./NNP  Heinberg/NNP ]  ./.
    S( 1):...   And/CC [NP  it/PRP ] [VP  's/VBZ  written/VBN ] [PP  for/IN ] [NP  that/DT  audience/NN ]  ./.
    S( 2):...  [NP  It/PRP ] [VP  's/VBZ  certainly/RB  not/RB  written/VBN ] [PP  in/IN ] [NP  such/JJ  a/DT  way/NN ] [PP  that/IN ] [NP  only/JJ  experts/NNS ] [VP  could/MD  benefit/VB ] [PP  from/IN ] [NP  it/PRP ]  ./.
    S( 3):...  [NP  A/DT  general/JJ  reader/NN ] [VP  can/MD  easily/RB  pick/VB ] [PRT  up/RP ] [NP  this/DT  $/$  22/CD  book/NN ]  and/CC [VP  find/VB ] [NP  their/PRP$  way/NN ] [PP  through/IN ] [NP  it/PRP ] [PP  with/IN ] [NP  no/DT  problem/NN ] [ADVP  at/IN  all/DT ]  ./.


 [X] - LOADING PARSER MODEL .............. PARSER MODEL LOADED


 [COMPLETE PARSING OF EACH SENTENCE]
    S( 0):... (TOP (S (NP (NNP Yes/UH)) (, ,/,) (VP (VBD said/VBD) (NP (NNP Mr./NNP) (NNP Heinberg/NNP))) (. ./.)))
    S( 1):... (TOP (NP (NP (NNP And/CC)) (PP (IN it/PRP) (NP (NP (JJ 's/VBZ) (NN written/VBN)) (PP (IN for/IN) (NP (DT that/DT) (NN audience/NN))))) (. ./.)))
    S( 2):... (TOP (S (NP (NP (DT It/PRP) (JJ 's/VBZ) (JJ certainly/RB) (NN not/RB) (NN written/VBN)) (PP (IN in/IN) (NP (NP (JJ such/JJ) (NN a/DT) (NN way/NN)) (PP (IN that/IN) (NP (JJ only/JJ) (NNS experts/NNS)))))) (VP (MD could/MD) (VP (VB benefit/VB) (PP (IN from/IN) (NP (PRP it/PRP))))) (. ./.)))
    S( 3):... (TOP (NP (NP (NNP A/DT) (NN general/JJ) (NN reader/NN)) (PP (IN can/MD) (NP (NP (DT easily/RB) (NN pick/VB)) (PP (IN up/RP) (NP (NP (DT this/DT) (JJ $/$) (CD 22/CD) (NN book/NN)) (PP (IN and/CC) (NP (NP (NN find/VB)) (NP (DT their/PRP$) (NN way/NN)) (PP (IN through/IN) (NP (NP (PRP it/PRP)) (PP (IN with/IN) (NP (NP (DT no/DT) (NN problem/NN)) (PP (IN at/IN) (NP (DT all/DT))))))))))))) (. ./.)))

 input sentence>
 
The last line is the command-line prompt. To list the available commands, type: help.

Parameters:
args - The command-line arguments.
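The demonstration ends with an interactive prompt. The sketch below only illustrates the general shape of such a loop: the help command is the one mentioned above, while the help text and the echo of other input are placeholders, not the commands OpenNLPKit actually implements.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;

// Sketch of a prompt loop like the one the demonstration ends with.
// Only "help" comes from the documentation; everything else is a placeholder.
class PromptLoop {
    static void help(PrintStream out) {
        out.println("(placeholder: list of available commands)");
    }

    static void run(BufferedReader in, PrintStream out) throws IOException {
        while (true) {
            out.print("input sentence> ");
            String line = in.readLine();
            if (line == null) {
                break; // end of input closes the loop
            }
            line = line.trim();
            if (line.equals("help")) {
                help(out);
            } else if (!line.isEmpty()) {
                out.println("received: " + line); // placeholder for tagging/parsing
            }
        }
    }
}
```

For interactive use it could be started with PromptLoop.run(new BufferedReader(new InputStreamReader(System.in)), System.out).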