java.lang.Object
  hultig.sumo.OpenNLPKit
public class OpenNLPKit
This class gathers and simplifies access to the main features of the OpenNLP package, such as part-of-speech tagging and sentence parsing. The package must already be installed, and its installation path must be supplied to the constructor of this class, as exemplified below:
OpenNLPKit model = new OpenNLPKit("/a/tools/opennlp-tools-1.5.0/models/english/");
The main method performs a general test, demonstrating the key features of the class.
University of Beira Interior (UBI)
Centre For Human Language Technology and Bioinformatics (HULTIG)
Field Summary | |
---|---|
static int | CHUNKER - The chunker reference code. |
static String[] | modelFileName - An array containing the file names of the language models, from the sentence detector to the parser. |
static int | PARSER - The parser reference code. |
static int | STDETECT - The sentence detector reference code. |
static int | TAGGER - The tagger reference code. |
static int | TOKENIZER - The tokenizer reference code. |
Constructor Summary | |
---|---|
OpenNLPKit() | The default constructor initializes the models path with null. |
OpenNLPKit(String path) | Creates an OpenNLP kit trying to define the path for the main directory containing the language models. |
Method Summary | |
---|---|
boolean | allDefined() - Tests if all the language models are correctly loaded and defined. |
boolean | allDefined(boolean silent) - Tests if all the language models are correctly loaded and defined. |
String | chunk(Sentence stc) - Gives the shallow parsed string from an already marked sentence. |
String | chunk(String tagedline) - Shallow parses a given tagged sentence string. |
boolean | definedSentenceDetector() - Tests if the sentence detection model is well defined. |
static void | help() - Prints the command line help, implemented in the main method. |
boolean | loadAllModels() - Tries to load all the language models from the defined path modelsPATH. |
void | loadChunker() - Tries to load the model necessary for sentence chunking, that is, shallow parsing. |
void | loadParser() - Tries to load the model necessary for sentence parsing (full parsing). |
void | loadSentenceDetector() - Tries to load the model necessary for sentence detection. |
void | loadTagger() - Tries to load the model necessary for part-of-speech tagging. |
void | loadTokenizer() - Tries to load the model necessary for string tokenization. |
static void | main(String[] args) - Demonstrates the main features of the class and provides a small command line for performing operations on sentences, such as tagging and parsing. |
String | parse(String stc) - Fully parses a sentence contained in a string. |
Sentence | postag(Sentence stc) - Performs part-of-speech tagging for a given Sentence. |
String | postag(String stxt) - Performs part-of-speech tagging for a given textual string. |
boolean | setModelsPath(String path) - Sets and validates a given path as being a valid directory. |
void | setSilentMode(boolean value) - Sets the "silent mode" state, used in some methods for printing log/info/status messages. |
String[] | splitSentences(String stxt) - Splits a textual string, possibly containing several sentences, into an array of strings with one sentence per position. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int STDETECT
public static final int TOKENIZER
public static final int TAGGER
public static final int CHUNKER
public static final int PARSER
public static String[] modelFileName
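The reference codes and the modelFileName array suggest a simple lookup from a component code to a model file. The following is a minimal sketch of that idea, not the class's actual code: the constant values, the file names (taken from the naming of the OpenNLP 1.5 English models) and the helper `modelFile` are all assumptions, since this page documents only the field names.

```java
import java.io.File;

/** Illustrative only: shows how reference codes could index a model file name array. */
final class ModelFileLookup {
    // Hypothetical values: the documentation names these constants but not their values.
    static final int STDETECT = 0, TOKENIZER = 1, TAGGER = 2, CHUNKER = 3, PARSER = 4;

    // Assumed file names, ordered from the sentence detector to the parser.
    static final String[] MODEL_FILE_NAME = {
        "en-sent.bin", "en-token.bin", "en-pos-maxent.bin",
        "en-chunker.bin", "en-parser-chunking.bin"
    };

    /** Resolves the model file for a reference code inside the models directory. */
    static File modelFile(String modelsPath, int code) {
        if (code < 0 || code >= MODEL_FILE_NAME.length) {
            throw new IllegalArgumentException("Unknown reference code: " + code);
        }
        return new File(modelsPath, MODEL_FILE_NAME[code]);
    }

    public static void main(String[] args) {
        // Prints the path that would be used for the tagger model.
        System.out.println(modelFile("/a/tools/opennlp-tools-1.5.0/models/english/", TAGGER));
    }
}
```

Keeping the codes and the file names in step like this means a new component only needs one extra constant and one extra array entry.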
Constructor Detail |
---|
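public OpenNLPKit()
The default constructor initializes the models path with null. See the modelFileName array.
public OpenNLPKit(String path)
Creates an OpenNLP kit trying to define the path for the main directory containing the language models. If the path is not valid, the modelsPATH variable is set to null.
path - The string path
Method Detail |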
---|
public boolean setModelsPath(String path)
Sets and validates a given path as being a valid directory.
path - The string path
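How the path is validated is not described on this page. The following sketch shows one plausible check, assuming that "valid" means an existing, readable directory and that a rejected path is represented by null (as the modelsPATH remark in the constructor detail suggests). The class `ModelsPathCheck` and its `validate` method are hypothetical names, not part of OpenNLPKit.

```java
import java.io.File;

/** Illustrative only: one way a models path could be validated. */
final class ModelsPathCheck {

    /** Returns the directory path with a trailing separator, or null if it is not usable. */
    static String validate(String path) {
        if (path == null || path.trim().isEmpty()) {
            return null; // nothing to validate
        }
        File dir = new File(path.trim());
        if (!dir.isDirectory() || !dir.canRead()) {
            return null; // missing, not a directory, or unreadable
        }
        // Keep a trailing separator so file names can be appended directly.
        String normalized = dir.getPath();
        return normalized.endsWith(File.separator) ? normalized : normalized + File.separator;
    }

    public static void main(String[] args) {
        System.out.println(validate(System.getProperty("java.io.tmpdir")));
        System.out.println(validate("/no/such/models/directory"));
    }
}
```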
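public void setSilentMode(boolean value)
Sets the "silent mode" state, used in some methods for printing log/info/status messages.
value - True for activation and false for deactivation.
public boolean loadAllModels()
Tries to load all the language models from the defined path modelsPATH.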
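public void loadSentenceDetector()
Tries to load the model necessary for sentence detection. If loading fails, the stdetect variable will still be equal to null or to the old model object.
public void loadTokenizer()
Tries to load the model necessary for string tokenization. If loading fails, the tokenizer variable will still be equal to null or to the old model object.
public void loadTagger()
Tries to load the model necessary for part-of-speech tagging. If loading fails, the tagger variable will still be equal to null or to the old model object.
public void loadChunker()
Tries to load the model necessary for sentence chunking, that is, shallow parsing. If loading fails, the chunker variable will still be equal to null or to the old model object.
public void loadParser()
Tries to load the model necessary for sentence parsing (full parsing). If loading fails, the parser variable will still be equal to null or to the old model object.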
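public boolean definedSentenceDetector()
Tests if the sentence detection model is well defined.
public boolean allDefined()
Tests if all the language models are correctly loaded and defined. Use allDefined(boolean) to make the verification silently, meaning that no output message will be printed.
public boolean allDefined(boolean silent)
Tests if all the language models are correctly loaded and defined.
silent - If activated (true), avoids printing any error message.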
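public String[] splitSentences(String stxt)
Splits a textual string, possibly containing several sentences, into an array of strings with one sentence per position.
stxt - The textual string.
public String postag(String stxt)
Performs part-of-speech tagging for a given textual string.
stxt - The textual string.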
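The demonstration output of main shows tagged text as whitespace-separated word/TAG tokens (for example Yes/UH ,/, said/VBD). Assuming postag(String) returns a string in that form, a caller could split it into word and tag pairs roughly as sketched here. `TaggedTokenParser` and `TaggedToken` are hypothetical helpers written for this illustration and are not part of OpenNLPKit.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: splits a line of word/TAG tokens into pairs. */
final class TaggedTokenParser {

    /** A word paired with its part-of-speech tag. */
    static final class TaggedToken {
        final String word;
        final String tag;

        TaggedToken(String word, String tag) {
            this.word = word;
            this.tag = tag;
        }
    }

    /** Parses a whitespace-separated line of word/TAG tokens. */
    static List<TaggedToken> parse(String taggedLine) {
        List<TaggedToken> tokens = new ArrayList<>();
        for (String item : taggedLine.trim().split("\\s+")) {
            if (item.isEmpty()) {
                continue; // empty input line
            }
            // Split on the LAST slash so a word that itself contains '/' stays intact.
            int slash = item.lastIndexOf('/');
            if (slash <= 0) {
                tokens.add(new TaggedToken(item, "")); // no tag found
            } else {
                tokens.add(new TaggedToken(item.substring(0, slash), item.substring(slash + 1)));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (TaggedToken t : parse("Yes/UH ,/, said/VBD Mr./NNP Heinberg/NNP ./.")) {
            System.out.println(t.word + "\t" + t.tag);
        }
    }
}
```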
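public Sentence postag(Sentence stc)
Performs part-of-speech tagging for a given Sentence.
stc - The sentence to be tagged.
public String chunk(String tagedline)
Shallow parses a given tagged sentence string.
tagedline - The tagged sentence string to be chunked.
public String chunk(Sentence stc)
Gives the shallow parsed string from an already marked sentence.
stc - The marked sentence.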
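In the demonstration output of main, shallow-parsed sentences appear as bracketed groups such as [NP it/PRP ], with some tokens (for example And/CC or ./.) left outside any bracket. Assuming the chunk methods return strings in that layout, the groups can be recovered with a regular expression, as in this sketch. `ChunkLineReader` is a hypothetical helper, and the label "O" for tokens outside a chunk is a convention chosen here, not something the class defines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: reads "[LABEL tok tok ]" groups from a shallow-parsed line. */
final class ChunkLineReader {

    // Either a bracketed chunk "[LABEL tokens ]" or a bare token outside any chunk.
    private static final Pattern ITEM =
            Pattern.compile("\\[(\\S+)\\s+([^\\]]*)\\]|(\\S+)");

    /** Returns one {label, text} pair per chunk; the label is "O" for bare tokens. */
    static List<String[]> read(String chunkedLine) {
        List<String[]> chunks = new ArrayList<>();
        Matcher m = ITEM.matcher(chunkedLine);
        while (m.find()) {
            if (m.group(1) != null) {
                chunks.add(new String[] { m.group(1), m.group(2).trim() });
            } else {
                chunks.add(new String[] { "O", m.group(3) });
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        String line = "And/CC [NP it/PRP ] [VP 's/VBZ written/VBN ] "
                + "[PP for/IN ] [NP that/DT audience/NN ] ./.";
        for (String[] chunk : read(line)) {
            System.out.println(chunk[0] + "\t" + chunk[1]);
        }
    }
}
```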
public String parse(String stc)
Fully parses a sentence contained in a string.
stc - The sentence string.
public static void help()
Prints the command line help, implemented in the main method.
public static void main(String[] args)
Demonstrates the main features of the class and provides a small command line for performing operations on sentences, such as tagging and parsing.
OpenNLPKit model = new OpenNLPKit("/a/tools/opennlp-tools-1.5.0/models/english/");
If everything is well defined, the execution of this test produces the following output:
[X] - LOADING SENTENCE DETECTOR MODEL ... SENTENCE DETECTOR MODEL LOADED
[X] - LOADING TOKENIZER MODEL ........... TOKENIZER MODEL LOADED
[X] - LOADING TAGGER MODEL .............. TAGGER MODEL LOADED
[NLP KIT GENERAL DEMONSTRATION]
[A MULTI-SENTENCE TEXT SEGMENT]
--------------------------------------------
|Yes, said Mr. Heinberg. And it's written  |
|for that audience. It's certainly not     |
|written in such a way that only experts   |
|could benefit from it. A general reader   |
|can easily pick up this $22 book and find |
|their way through it with no problem at   |
|all.                                      |
--------------------------------------------
[LIST OF ALL SENTENCES FOUND IN TEXT]
S( 0):... [Yes, said Mr. Heinberg.]
S( 1):... [And it's written for that audience.]
S( 2):... [It's certainly not written in such a way that only experts could benefit from it.]
S( 3):... [A general reader can easily pick up this $22 book and find their way through it with no problem at all.]
[PART-OF-SPEECH OF EACH SENTENCE]
S( 0):... [Yes/UH ,/, said/VBD Mr./NNP Heinberg/NNP ./.]
S( 1):... [And/CC it/PRP 's/VBZ written/VBN for/IN that/DT audience/NN ./.]
S( 2):... [It/PRP 's/VBZ certainly/RB not/RB written/VBN in/IN such/JJ a/DT way/NN that/IN only/JJ experts/NNS could/MD benefit/VB from/IN it/PRP ./.]
S( 3):... [A/DT general/JJ reader/NN can/MD easily/RB pick/VB up/RP this/DT $/$ 22/CD book/NN and/CC find/VB their/PRP$ way/NN through/IN it/PRP with/IN no/DT problem/NN at/IN all/DT ./.]
[X] - LOADING CHUNKER MODEL ............. CHUNKER MODEL LOADED
[SHALLOW PARSING OF EACH SENTENCE]
S( 0):... [INTJ Yes/UH ] ,/, [VP said/VBD ] [NP Mr./NNP Heinberg/NNP ] ./.
S( 1):... And/CC [NP it/PRP ] [VP 's/VBZ written/VBN ] [PP for/IN ] [NP that/DT audience/NN ] ./.
S( 2):... [NP It/PRP ] [VP 's/VBZ certainly/RB not/RB written/VBN ] [PP in/IN ] [NP such/JJ a/DT way/NN ] [PP that/IN ] [NP only/JJ experts/NNS ] [VP could/MD benefit/VB ] [PP from/IN ] [NP it/PRP ] ./.
S( 3):... [NP A/DT general/JJ reader/NN ] [VP can/MD easily/RB pick/VB ] [PRT up/RP ] [NP this/DT $/$ 22/CD book/NN ] and/CC [VP find/VB ] [NP their/PRP$ way/NN ] [PP through/IN ] [NP it/PRP ] [PP with/IN ] [NP no/DT problem/NN ] [ADVP at/IN all/DT ] ./.
[X] - LOADING PARSER MODEL .............. PARSER MODEL LOADED
[COMPLETE PARSING OF EACH SENTENCE]
S( 0):... (TOP (S (NP (NNP Yes/UH)) (, ,/,) (VP (VBD said/VBD) (NP (NNP Mr./NNP) (NNP Heinberg/NNP))) (. ./.)))
S( 1):... (TOP (NP (NP (NNP And/CC)) (PP (IN it/PRP) (NP (NP (JJ 's/VBZ) (NN written/VBN)) (PP (IN for/IN) (NP (DT that/DT) (NN audience/NN))))) (. ./.)))
S( 2):... (TOP (S (NP (NP (DT It/PRP) (JJ 's/VBZ) (JJ certainly/RB) (NN not/RB) (NN written/VBN)) (PP (IN in/IN) (NP (NP (JJ such/JJ) (NN a/DT) (NN way/NN)) (PP (IN that/IN) (NP (JJ only/JJ) (NNS experts/NNS)))))) (VP (MD could/MD) (VP (VB benefit/VB) (PP (IN from/IN) (NP (PRP it/PRP))))) (. ./.)))
S( 3):... (TOP (NP (NP (NNP A/DT) (NN general/JJ) (NN reader/NN)) (PP (IN can/MD) (NP (NP (DT easily/RB) (NN pick/VB)) (PP (IN up/RP) (NP (NP (DT this/DT) (JJ $/$) (CD 22/CD) (NN book/NN)) (PP (IN and/CC) (NP (NP (NN find/VB)) (NP (DT their/PRP$) (NN way/NN)) (PP (IN through/IN) (NP (NP (PRP it/PRP)) (PP (IN with/IN) (NP (NP (DT no/DT) (NN problem/NN)) (PP (IN at/IN) (NP (DT all/DT))))))))))))) (. ./.)))
input sentence>
The last line is the command line prompt. To list the available commands, type: help.
args - String[]