OpenNLPKit

hultig.sumo
Class OpenNLPKit

java.lang.Object
  hultig.sumo.OpenNLPKit

public class OpenNLPKit
extends Object

This class gathers and simplifies access to the main features of the OpenNLP package, such as part-of-speech tagging and sentence parsing. The package must already be installed, and its installation path must be supplied to the constructor of this class, as exemplified below:

  OpenNLPKit model = new OpenNLPKit("/a/tools/opennlp-tools-1.5.0/models/english/");

The main method performs a general test, demonstrating the key features of the class.

University of Beira Interior (UBI)
Centre For Human Language Technology and Bioinformatics (HULTIG)

Field Summary
`static int`	`CHUNKER` The chunker reference code.
`static String[]`	`modelFileName` An array containing the file names of the language models, from the sentence detector through to the parser.
`static int`	`PARSER` The parser reference code.
`static int`	`STDETECT` The sentence detector reference code.
`static int`	`TAGGER` The tagger reference code.
`static int`	`TOKENIZER` The tokenizer reference code.

Constructor Summary
`OpenNLPKit()` The default constructor initializes the models path with null.
`OpenNLPKit(String path)` Creates an OpenNLP kit and tries to set the path to the main directory containing the language models.

Method Summary
`boolean`	`allDefined()` Tests if all the language models are well loaded and defined.
`boolean`	`allDefined(boolean silent)` Tests if all language models are well loaded and defined.
`String`	`chunk(Sentence stc)` Gives the shallow parsed string from an already marked sentence.
`String`	`chunk(String tagedline)` This method shallow parses a given sentence string.
`boolean`	`definedSentenceDetector()` Tests if the sentence detection model is well defined.
`static void`	`help()` Prints the command line help, implemented in the main method.
`boolean`	`loadAllModels()` Tries to load all the language models from the defined path `modelsPATH`.
`void`	`loadChunker()` Tries to load the model necessary for sentence chunking, that is shallow parsing.
`void`	`loadParser()` Tries to load the model necessary for sentence parsing (full parsing).
`void`	`loadSentenceDetector()` Tries to load the model necessary for sentence detection.
`void`	`loadTagger()` Tries to load the model necessary for part-of-speech tagging.
`void`	`loadTokenizer()` Tries to load the model necessary for string tokenization.
`static void`	`main(String[] args)` Demonstrates the main features of the class and provides a small command line for performing several operations on sentences, such as tagging and parsing.
`String`	`parse(String stc)` Fully parses a sentence contained in a string.
`Sentence`	`postag(Sentence stc)` Performs part-of-speech tagging on a given Sentence.
`String`	`postag(String stxt)` Performs part-of-speech tagging on a given textual string.
`boolean`	`setModelsPath(String path)` Sets and validates a given path as being a valid directory.
`void`	`setSilentMode(boolean value)` Sets the "silent mode" state, used in some methods for printing log/info/status messages.
`String[]`	`splitSentences(String stxt)` Splits a textual string, possibly containing several sentences, into an array of strings, with one sentence per position.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

STDETECT

public static final int STDETECT

The sentence detector reference code.

See Also:: Constant Field Values

TOKENIZER

public static final int TOKENIZER

The tokenizer reference code.

See Also:: Constant Field Values

TAGGER

public static final int TAGGER

The tagger reference code.

See Also:: Constant Field Values

CHUNKER

public static final int CHUNKER

The chunker reference code.

See Also:: Constant Field Values

PARSER

public static final int PARSER

The parser reference code.

See Also:: Constant Field Values

modelFileName

public static String[] modelFileName

An array containing the file names of the language models, from the sentence detector through to the parser.

Constructor Detail

OpenNLPKit

public OpenNLPKit()

The default constructor initializes the models path with null. This means that this path must be defined later, in order to be able to use this object. Each model filename is also defined here, in the modelFileName array.

OpenNLPKit

public OpenNLPKit(String path)

Creates an OpenNLP kit and tries to set the path to the main directory containing the language models. If the path is not a valid OS directory, the modelsPATH variable is set to null.

Parameters:: path - The string path

Method Detail

setModelsPath

public boolean setModelsPath(String path)

Sets and validates a given path as being a valid directory. In order to subsequently find the language models, the path must point to the directory containing these models.

Parameters:: path - The string path
Returns:: true if the path points to a valid OS directory.
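
The validation described above can be approximated with java.io.File. This is only a sketch of what such a check might look like, not the actual body of setModelsPath: the class ModelsPathCheck and the method isValidModelsDir are hypothetical names introduced for illustration.

```java
import java.io.File;

/** Hypothetical helper, not part of OpenNLPKit. */
final class ModelsPathCheck {
    private ModelsPathCheck() {}

    /**
     * Returns true only when the path is non-empty and names an existing
     * directory. Assumption: this mirrors the "valid OS directory" test
     * that the documentation describes for setModelsPath.
     */
    static boolean isValidModelsDir(String path) {
        if (path == null || path.isEmpty()) {
            return false;
        }
        return new File(path).isDirectory();
    }

    public static void main(String[] args) {
        // Check every path given on the command line.
        for (String path : args) {
            System.out.println(path + " -> " + isValidModelsDir(path));
        }
    }
}
```

A check like this only confirms that the directory exists. It says nothing about whether the model files are actually present inside it.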

setSilentMode

public void setSilentMode(boolean value)

Sets the "silent mode" state, used in some methods for printing log/info/status messages.

Parameters:: value - True for activation and false for deactivation.

loadAllModels

public boolean loadAllModels()

Tries to load all the language models from the defined path modelsPATH.

Returns:: true means only that the path is well defined and points to the assumed main directory of the models.
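
Because a true result only confirms the path, a caller would probably also want to check allDefined() before using the models. The sketch below illustrates that order of calls. The interface SentenceKit and the class TaggingWorkflow are hypothetical: the interface merely repeats four documented signatures so that the example compiles on its own, and OpenNLPKit does not declare it, so a thin adapter would be needed in practice.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical view of four documented methods; not declared by OpenNLPKit. */
interface SentenceKit {
    boolean loadAllModels();
    boolean allDefined();
    String[] splitSentences(String stxt);
    String postag(String stxt);
}

/** Hypothetical workflow: load, verify, split, then tag each sentence. */
final class TaggingWorkflow {
    private TaggingWorkflow() {}

    static List<String> tagAll(SentenceKit kit, String text) {
        List<String> tagged = new ArrayList<>();
        // loadAllModels() returning true only says the path is usable,
        // so allDefined() is checked as well before going further.
        if (!kit.loadAllModels() || !kit.allDefined()) {
            return tagged;
        }
        String[] sentences = kit.splitSentences(text);
        if (sentences == null) {
            // splitSentences is documented to return null in some cases.
            return tagged;
        }
        for (String sentence : sentences) {
            tagged.add(kit.postag(sentence));
        }
        return tagged;
    }
}
```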

loadSentenceDetector

public void loadSentenceDetector()

Tries to load the model necessary for sentence detection. On error the model will not be loaded, meaning that the stdetect variable will still be equal to null or to the old model object.

loadTokenizer

public void loadTokenizer()

Tries to load the model necessary for string tokenization. On error the model will not be loaded, meaning that the tokenizer variable will still be equal to null or to the old model object.

loadTagger

public void loadTagger()

Tries to load the model necessary for part-of-speech tagging. On error the model will not be loaded, meaning that the tagger variable will still be equal to null or to the old model object.

loadChunker

public void loadChunker()

Tries to load the model necessary for sentence chunking, that is, shallow parsing. On error the model will not be loaded, meaning that the chunker variable will still be equal to null or to the old model object.

loadParser

public void loadParser()

Tries to load the model necessary for sentence parsing (full parsing). On error the model will not be loaded, meaning that the parser variable will still be equal to null or to the old model object.
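
All five load methods share the same failure behaviour: a failed load leaves the previous value (null or the old model) in place. A minimal sketch of that pattern follows. The generic class ModelSlot is hypothetical and is not how OpenNLPKit is known to be implemented; it only illustrates the "keep the old model on error" rule.

```java
import java.util.concurrent.Callable;

/** Hypothetical holder illustrating "keep the old model on error". */
final class ModelSlot<T> {
    private T model; // stays null until the first successful load

    /** Runs the loader; replaces the model only if loading succeeds. */
    boolean tryLoad(Callable<T> loader) {
        try {
            T loaded = loader.call();
            if (loaded == null) {
                return false; // nothing loaded, previous model is kept
            }
            model = loaded;
            return true;
        } catch (Exception e) {
            return false; // loading failed, previous model is kept
        }
    }

    /** Returns the current model, or null if none was ever loaded. */
    T get() {
        return model;
    }

    /** True once at least one load has succeeded. */
    boolean isDefined() {
        return model != null;
    }
}
```

Since the documented load methods return void, a caller cannot learn from the call itself whether loading worked, which is why the defined-check methods exist.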

definedSentenceDetector

public boolean definedSentenceDetector()

Tests if the sentence detection model is well defined.

Returns:: true on success, false on failure.

allDefined

public boolean allDefined()

Tests if all the language models are well loaded and defined. This method uses allDefined(boolean) to make the verification silently, meaning that no output message will be printed.

Returns:: Gives true only if all language models are well defined.

allDefined

public boolean allDefined(boolean silent)

Tests if all language models are well loaded and defined.

Parameters:: silent - If activated (true) avoids printing any error message.
Returns:: Gives true only if all language models are well defined.

splitSentences

public String[] splitSentences(String stxt)

Splits a textual string, possibly containing several sentences, into an array of strings, with one sentence per position.

Parameters:: stxt - The textual string.
Returns:: The array of sentences or null.
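
Since the result may be null, callers should guard against it before iterating. The sketch below handles that case and prints sentences in the same "S( 0):... [text]" layout used by the demonstration output of the main method. The class SentenceListing is a hypothetical helper, not part of OpenNLPKit.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for listing the result of splitSentences. */
final class SentenceListing {
    private SentenceListing() {}

    /** Formats each sentence as "S( i):... [sentence]"; null yields an empty list. */
    static List<String> format(String[] sentences) {
        List<String> lines = new ArrayList<>();
        if (sentences == null) {
            return lines; // splitSentences is documented to possibly return null
        }
        for (int i = 0; i < sentences.length; i++) {
            // %2d right-aligns the index in two columns, as in the demo output.
            lines.add(String.format("S(%2d):... [%s]", i, sentences[i]));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Treats the command-line arguments as already-split sentences.
        for (String line : format(args)) {
            System.out.println(line);
        }
    }
}
```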

postag

public String postag(String stxt)

Performs part-of-speech tagging on a given textual string.

Parameters:: stxt - The textual string.
Returns:: The tagged string.
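
Judging from the demonstration output of the main method, the tagged string consists of whitespace-separated word/TAG tokens, for example "Yes/UH ,/, said/VBD". Under that assumption, the following sketch splits such a string into word and tag pairs. The class TaggedTokens is hypothetical. It splits on the last slash of each token so that words which themselves contain a slash are still handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for reading the word/TAG format. */
final class TaggedTokens {
    private TaggedTokens() {}

    /** Returns {word, tag} pairs; malformed tokens are skipped. */
    static List<String[]> parse(String tagged) {
        List<String[]> pairs = new ArrayList<>();
        if (tagged == null) {
            return pairs;
        }
        for (String token : tagged.trim().split("\\s+")) {
            int slash = token.lastIndexOf('/');
            // Require a non-empty word before and a non-empty tag after the slash.
            if (slash <= 0 || slash == token.length() - 1) {
                continue;
            }
            pairs.add(new String[] { token.substring(0, slash), token.substring(slash + 1) });
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String[] pair : parse("Yes/UH ,/, said/VBD Mr./NNP Heinberg/NNP ./.")) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
```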

postag

public Sentence postag(Sentence stc)

Performs part-of-speech tagging on a given Sentence. This method reconstructs the Sentence object, returning a new tagged one in which every word is POS tagged.

Parameters:: stc - Sentence The sentence to be tagged.
Returns:: Sentence The tagged sentence.

chunk

public String chunk(String tagedline)

This method shallow parses a given sentence string. It was obtained and adapted from the file opennlp-tools.models.english.chunker.TreebankChunker.java.

Parameters:: tagedline - The tagged sentence string to be chunked.
Returns:: String The shallow parsed sentence.

chunk

public String chunk(Sentence stc)

Gives the shallow parsed string from an already marked sentence.

Parameters:: stc - The marked sentence.
Returns:: String The shallow parsed string.
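
In the demonstration output of the main method, the shallow parsed string marks each chunk as "[LABEL  word/TAG ... ]" and leaves unchunked tokens outside the brackets. Assuming that layout, the sketch below extracts the labelled chunks with a regular expression. The class ChunkReader is hypothetical, and the pattern would not cope with tokens that themselves contain a closing square bracket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for reading the bracketed chunk format. */
final class ChunkReader {
    // "[", an upper-case label, whitespace, the tokens, optional whitespace, "]".
    private static final Pattern CHUNK =
            Pattern.compile("\\[([A-Z]+)\\s+([^\\]]*?)\\s*\\]");

    private ChunkReader() {}

    /** Returns {label, tokens} pairs with token whitespace normalized. */
    static List<String[]> chunks(String shallowParsed) {
        List<String[]> result = new ArrayList<>();
        if (shallowParsed == null) {
            return result;
        }
        Matcher m = CHUNK.matcher(shallowParsed);
        while (m.find()) {
            String tokens = m.group(2).trim().replaceAll("\\s+", " ");
            result.add(new String[] { m.group(1), tokens });
        }
        return result;
    }

    public static void main(String[] args) {
        String line = " [INTJ  Yes/UH ]  ,/, [VP  said/VBD ] [NP  Mr./NNP  Heinberg/NNP ]  ./.";
        for (String[] chunk : chunks(line)) {
            System.out.println(chunk[0] + ": " + chunk[1]);
        }
    }
}
```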

parse

public String parse(String stc)

Fully parses a sentence contained in a string.

Parameters:: stc - The sentence string.
Returns:: The parsed string.
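
The demonstration output of the main method shows the parsed string as a bracketed tree such as "(TOP (S (NP (NNP Yes/UH)) ...))". Assuming that format, the sketch below checks that the parentheses are balanced and collects the leaf tokens in order. The class ParseTreeText is hypothetical and works purely on the string; it does not use any OpenNLP type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for inspecting a bracketed parse string. */
final class ParseTreeText {
    // A preterminal node: "(TAG token)" with no nested parentheses.
    private static final Pattern LEAF =
            Pattern.compile("\\(([^\\s()]+) ([^\\s()]+)\\)");

    private ParseTreeText() {}

    /** True if every "(" has a matching ")" and none closes too early. */
    static boolean isBalanced(String tree) {
        int depth = 0;
        for (int i = 0; i < tree.length(); i++) {
            char c = tree.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')' && --depth < 0) {
                return false;
            }
        }
        return depth == 0;
    }

    /** Returns the leaf tokens from left to right. */
    static List<String> leaves(String tree) {
        List<String> tokens = new ArrayList<>();
        Matcher m = LEAF.matcher(tree);
        while (m.find()) {
            tokens.add(m.group(2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String tree = "(TOP (S (NP (NNP Yes/UH)) (, ,/,) (VP (VBD said/VBD) "
                + "(NP (NNP Mr./NNP) (NNP Heinberg/NNP))) (. ./.)))";
        System.out.println(isBalanced(tree) + " " + leaves(tree));
    }
}
```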

help

public static void help()

Prints the command line help, implemented in the main method.

main

public static void main(String[] args)

Demonstrates the main features of the class and provides a small command line for performing several operations on sentences, such as tagging and parsing. This class uses version 1.5 of the OpenNLP package. To run the demonstration, the package must be installed and its main system path must be supplied to the constructor, for example:

  OpenNLPKit model = new OpenNLPKit("/a/tools/opennlp-tools-1.5.0/models/english/");

If everything is well defined, then the execution of this test will produce the following output:

 [X] - LOADING SENTENCE DETECTOR MODEL ... SENTENCE DETECTOR MODEL LOADED
 [X] - LOADING TOKENIZER MODEL ........... TOKENIZER MODEL LOADED
 [X] - LOADING TAGGER MODEL .............. TAGGER MODEL LOADED
 
 [NLP KIT GENERAL DEMONSTRATION]


 [A MULTI-SENTENCE TEXT SEGMENT]
    --------------------------------------------
    |Yes, said Mr. Heinberg. And it's written  |
    |for that audience. It's certainly not     |
    |written in such a way that only experts   |
    |could benefit from it. A general reader   |
    |can easily pick up this $22 book and find |
    |their way through it with no problem at   |
    |all.                                      |
    --------------------------------------------


 [LIST OF ALL SENTENCES FOUND IN TEXT]
    S( 0):... [Yes, said Mr. Heinberg.]
    S( 1):... [And it's written for that audience.]
    S( 2):... [It's certainly not written in such a way that only experts could benefit from it.]
    S( 3):... [A general reader can easily pick up this $22 book and find their way through it with no problem at all.]


 [PART-OF-SPEECH OF EACH SENTENCE]
    S( 0):... [Yes/UH ,/, said/VBD Mr./NNP Heinberg/NNP ./.]
    S( 1):... [And/CC it/PRP 's/VBZ written/VBN for/IN that/DT audience/NN ./.]
    S( 2):... [It/PRP 's/VBZ certainly/RB not/RB written/VBN in/IN such/JJ a/DT way/NN that/IN only/JJ experts/NNS could/MD benefit/VB from/IN it/PRP ./.]
    S( 3):... [A/DT general/JJ reader/NN can/MD easily/RB pick/VB up/RP this/DT $/$ 22/CD book/NN and/CC find/VB their/PRP$ way/NN through/IN it/PRP with/IN no/DT problem/NN at/IN all/DT ./.]


 [X] - LOADING CHUNKER MODEL ............. CHUNKER MODEL LOADED


 [SHALLOW PARSING OF EACH SENTENCE]
    S( 0):...  [INTJ  Yes/UH ]  ,/, [VP  said/VBD ] [NP  Mr./NNP  Heinberg/NNP ]  ./.
    S( 1):...   And/CC [NP  it/PRP ] [VP  's/VBZ  written/VBN ] [PP  for/IN ] [NP  that/DT  audience/NN ]  ./.
    S( 2):...  [NP  It/PRP ] [VP  's/VBZ  certainly/RB  not/RB  written/VBN ] [PP  in/IN ] [NP  such/JJ  a/DT  way/NN ] [PP  that/IN ] [NP  only/JJ  experts/NNS ] [VP  could/MD  benefit/VB ] [PP  from/IN ] [NP  it/PRP ]  ./.
    S( 3):...  [NP  A/DT  general/JJ  reader/NN ] [VP  can/MD  easily/RB  pick/VB ] [PRT  up/RP ] [NP  this/DT  $/$  22/CD  book/NN ]  and/CC [VP  find/VB ] [NP  their/PRP$  way/NN ] [PP  through/IN ] [NP  it/PRP ] [PP  with/IN ] [NP  no/DT  problem/NN ] [ADVP  at/IN  all/DT ]  ./.


 [X] - LOADING PARSER MODEL .............. PARSER MODEL LOADED


 [COMPLETE PARSING OF EACH SENTENCE]
    S( 0):... (TOP (S (NP (NNP Yes/UH)) (, ,/,) (VP (VBD said/VBD) (NP (NNP Mr./NNP) (NNP Heinberg/NNP))) (. ./.)))
    S( 1):... (TOP (NP (NP (NNP And/CC)) (PP (IN it/PRP) (NP (NP (JJ 's/VBZ) (NN written/VBN)) (PP (IN for/IN) (NP (DT that/DT) (NN audience/NN))))) (. ./.)))
    S( 2):... (TOP (S (NP (NP (DT It/PRP) (JJ 's/VBZ) (JJ certainly/RB) (NN not/RB) (NN written/VBN)) (PP (IN in/IN) (NP (NP (JJ such/JJ) (NN a/DT) (NN way/NN)) (PP (IN that/IN) (NP (JJ only/JJ) (NNS experts/NNS)))))) (VP (MD could/MD) (VP (VB benefit/VB) (PP (IN from/IN) (NP (PRP it/PRP))))) (. ./.)))
    S( 3):... (TOP (NP (NP (NNP A/DT) (NN general/JJ) (NN reader/NN)) (PP (IN can/MD) (NP (NP (DT easily/RB) (NN pick/VB)) (PP (IN up/RP) (NP (NP (DT this/DT) (JJ $/$) (CD 22/CD) (NN book/NN)) (PP (IN and/CC) (NP (NP (NN find/VB)) (NP (DT their/PRP$) (NN way/NN)) (PP (IN through/IN) (NP (NP (PRP it/PRP)) (PP (IN with/IN) (NP (NP (DT no/DT) (NN problem/NN)) (PP (IN at/IN) (NP (DT all/DT))))))))))))) (. ./.)))

 input sentence>

The last line is the command line prompt. To list the available commands, type: help.

Parameters:: args - String[]
