java.lang.Object
java.util.AbstractCollection<E>
java.util.AbstractList<E>
java.util.AbstractSequentialList<E>
java.util.LinkedList<Word>
hultig.sumo.Sentence
hultig.sumo.ChunkedSentence
public class ChunkedSentence
A specialization of the Sentence class, for handling shallow parsed
sentences in a more efficient way. It uses chunk marks (ChunkMark) to
represent the sequence of chunk boundaries.
University of Beira Interior (UBI)
Centre For Human Language Technology and Bioinformatics (HULTIG)
| Field Summary |
|---|
| Fields inherited from class hultig.sumo.Sentence |
|---|
| cod, label, parentise, pontuacao, stx |
| Fields inherited from class java.util.AbstractList |
|---|
| modCount |
| Constructor Summary | |
|---|---|
| ChunkedSentence() | The default constructor, which invokes several default settings, including the definition of chunk values. |
| ChunkedSentence(Sentence s, OpenNLPKit model) | This constructor receives a sentence and a language model, and creates an instance of a chunked sentence. |
| ChunkedSentence(String s, OpenNLPKit model) | This constructor receives a string and a language model, and creates a chunked sentence. |
| Method Summary | |
|---|---|
| Chunk | getChunk(int index): Gets a string with the k-th chunk, from this sentence chunk sequence. |
| ChunkMark | getChunkMark(int index): Gives the chunk mark (boundaries and tag) for the chunk at position index, in the sequence of sentence chunks. |
| ChunkMark | getChunkOnPosition(int index): Gives the chunk mark relative to the word at position index, in this sentence. |
| Chunk[] | getChunks(): Gets an array of strings containing the complete sequence of chunks from this sentence, one chunk per array position. |
| int | getNumChunks(): Gives the number of chunks contained in this sentence. |
| int | getNumChunks(String postag): Counts the number of chunks of a certain kind (tag). |
| int | getNumWords(): Gives the number of effective words contained in this sentence. |
| String | getPOStrFixed() |
| String | getSPOSig(): The same as getSPOSig(char chconnect), with the connection character set to the default, a blank space. |
| String | getSPOSig(char chconnect): Gives a string with the sequence of part-of-speech tags, corresponding to the sequence of words in the sentence. |
| String[] | getVPOSig(): Gives the array of part-of-speech tags, corresponding to the sequence of words in the sentence. |
| String | getWordChunkMark(int index): Gives the chunk mark for the word at position index, first identifying which chunk the word belongs to. |
| double | lexicoSyntacticEntailmentMetric(ChunkedSentence hypot): This function was designed to compute a likelihood value for the "lexico-syntactic entailment" between this sentence (thesis) and the entailed sentence, that is, the other sentence (hypothesis). |
| static void | main(String[] args): Generally exemplifies the operative features of this class. |
| void | printArrayWords(): This method is a default shortcut for printArrayWords(java.lang.String), with label = null. |
| void | printArrayWords(String label): Outputs the sequence of words in this shallow parsed sentence with their corresponding lexico-syntactic codes. |
| ChunkedSentence | subList(int fromIndex, int toIndex): Gives a subsequence of this sentence, in the form of a list of words. |
| String | toPOString(): Gives a string with only the part-of-speech tags. |
| String | toStringChunk(): Gives a shallow parsed representation of this chunked sentence. |
| String | toStringRegex(): Gives another shallow parsed representation of this sentence, in a format suitable for regular expression matching. |
| String | toStringRegexPOS(): A toString()-type method that gives a string representation of this chunked sentence, where each word is printed followed by its part-of-speech tag, as shown in the next example: the/dt lazy/jj fox/nn jumped/vbd over/in the/dt fence/nn (26, April 2009, 10:47) |
| String | toStringRegexPOSCHK(): This method is similar to toStringRegexPOS(), differing only in that the chunk tag is also included for each printed word, after the part-of-speech tag. |
| Methods inherited from class hultig.sumo.Sentence |
|---|
| addWord, codify, codify, compareTo, countIntersectLinks, countMatch, countMatchNGram, countNormIntersectLinks, countNotMatch, countNumWords, ctMatchNGram, demoForWebPage, dgauss, dgauss, distlex, dLinear, dParabolic, dsBLEU, dsEntropy, dSin, dsLevenshtein, dsNgram, dsNgram, dsuffixArrays, dsumo, dsumo, dsumoWSize, ensureCodification, equalArrays, fracNumWords, getCodes, getTag, getTags, getWord, getWords, indexOf, indexOf, isCodefied, isPunct, isWord, length, match, mutation, print, print, println, readLinks, readLinks, reload, set, setMetric, similarity, similarity, splitPunct, subcodes, subs, testaMetricas, toLowerCase, toLowerCase, toMWUString, toString, toStringPOS, x201102012359, x201102281055 |
| Methods inherited from class java.util.LinkedList |
|---|
| add, add, addAll, addAll, addFirst, addLast, clear, clone, contains, descendingIterator, element, get, getFirst, getLast, indexOf, lastIndexOf, listIterator, offer, offerFirst, offerLast, peek, peekFirst, peekLast, poll, pollFirst, pollLast, pop, push, remove, remove, remove, removeFirst, removeFirstOccurrence, removeLast, removeLastOccurrence, set, size, toArray, toArray |
| Methods inherited from class java.util.AbstractSequentialList |
|---|
| iterator |
| Methods inherited from class java.util.AbstractList |
|---|
| equals, hashCode, listIterator, removeRange |
| Methods inherited from class java.util.AbstractCollection |
|---|
| containsAll, isEmpty, removeAll, retainAll |
| Methods inherited from class java.lang.Object |
|---|
| finalize, getClass, notify, notifyAll, wait, wait, wait |
| Methods inherited from interface java.util.List |
|---|
| containsAll, equals, hashCode, isEmpty, iterator, listIterator, removeAll, retainAll |
| Methods inherited from interface java.util.Deque |
|---|
| iterator |
| Constructor Detail |
|---|
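public ChunkedSentence()
The default constructor, which invokes several default settings, including the definition of chunk values.
public ChunkedSentence(String s,
OpenNLPKit model)
This constructor receives a string and a language model (model), and creates a chunked sentence.
Parameters:
s - A string representing a textual sentence.
model - The language model, which should already have been adequately loaded/configured.
public ChunkedSentence(Sentence s,
OpenNLPKit model)
This constructor receives a sentence and a language model (model), and creates an instance of a chunked sentence.
Parameters:
s - The sentence for shallow parsing.
model - The language model.
| Method Detail |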
|---|
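public int getNumWords()
Gives the number of effective words contained in this sentence.
public int getNumChunks()
Gives the number of chunks contained in this sentence.
public int getNumChunks(String postag)
Counts the number of chunks of a certain kind (tag).
Parameters:
postag - The chunk tag to be counted, for example "NP", "VP".
Returns:
The number of chunks with tag postag.
public Chunk getChunk(int index)
Gets a string with the k-th chunk, from this sentence chunk sequence.
Parameters:
index - The chunk index.
Returns:
The k-th chunk in the "usual" format, for example: [NP the/DT Pet/NNP passport/NN ]. On error, null is returned.
public Chunk[] getChunks()
Gets an array of strings containing the complete sequence of chunks from this sentence, one chunk per array position.
Returns:
The array of chunks, or null on error.
public ChunkMark getChunkMark(int index)
Gives the chunk mark (boundaries and tag) for the chunk at position index, in the sequence of sentence chunks.
Parameters:
index - The chunk index.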
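public ChunkMark getChunkOnPosition(int index)
Gives the chunk mark relative to the word at position index, in this sentence.
Parameters:
index - A valid index of a sentence word. It must be greater than zero and less than the number of words in the sentence.
Returns:
The chunk mark, or null in erroneous cases.
public String getWordChunkMark(int index)
Gives the chunk mark for the word at position index, first identifying which chunk the word belongs to.
Parameters:
index - The sequential index of the word in the sentence.
Returns:
The chunk mark (for example NP, VP), or null if the index is out of bounds.
public String[] getVPOSig()
Gives the array of part-of-speech tags, corresponding to the sequence of words in the sentence.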
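The chunk-mark lookups described above (from a word position to its enclosing chunk) can be pictured with a small standalone sketch. It does not use the hultig.sumo classes: the `Mark` type, its field names, the inclusive boundaries, and the 0-based word indexing are all assumptions made for illustration, and the real ChunkMark may differ on each point.

```java
import java.util.Arrays;
import java.util.List;

final class ChunkMarkSketch {
    /** Hypothetical stand-in for ChunkMark: inclusive word boundaries plus a chunk tag. */
    static final class Mark {
        final int first;   // index of the first word in the chunk (assumed 0-based)
        final int last;    // index of the last word in the chunk (assumed inclusive)
        final String tag;  // chunk tag, e.g. "NP" or "VP"

        Mark(int first, int last, String tag) {
            this.first = first;
            this.last = last;
            this.tag = tag;
        }
    }

    /** Returns the mark whose boundaries contain wordIndex, or null if none does. */
    static Mark markOnPosition(List<Mark> marks, int wordIndex) {
        for (Mark m : marks) {
            if (wordIndex >= m.first && wordIndex <= m.last) {
                return m;
            }
        }
        return null;
    }

    /** Returns the chunk tag of the word at wordIndex, or null when out of bounds. */
    static String wordChunkTag(List<Mark> marks, int wordIndex) {
        Mark m = markOnPosition(marks, wordIndex);
        return m == null ? null : m.tag;
    }

    /** Counts the marks carrying the given chunk tag. */
    static int countChunks(List<Mark> marks, String tag) {
        int n = 0;
        for (Mark m : marks) {
            if (m.tag.equals(tag)) {
                n++;
            }
        }
        return n;
    }

    /** Marks for "The lazy fox jumped over the fence": NP, VP, PP, NP. */
    static List<Mark> demoMarks() {
        return Arrays.asList(
                new Mark(0, 2, "NP"), new Mark(3, 3, "VP"),
                new Mark(4, 4, "PP"), new Mark(5, 6, "NP"));
    }

    public static void main(String[] args) {
        List<Mark> marks = demoMarks();
        System.out.println(wordChunkTag(marks, 1));
        System.out.println(countChunks(marks, "NP"));
    }
}
```

Storing only boundaries and a tag per chunk keeps the word list itself untouched, which is one plausible reading of why the class represents chunks as marks.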
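public String getSPOSig(char chconnect)
Gives a string with the sequence of part-of-speech tags, corresponding to the sequence of words in the sentence.
Parameters:
chconnect - The connection character between two tags, usually a blank space.
Returns:
The tag sequence string, or null on error. For example: "NP VP PP NP VP".
public String getSPOSig()
The same as getSPOSig(char chconnect), with the connection character set to the default, a blank space.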
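The relationship between the two getSPOSig overloads can be sketched as a plain tag join. This is an illustration only, not the library's code: the class and method names below are hypothetical, and returning null for a null tag array is an assumption modelled on the "null on error" note.

```java
final class TagSignatureSketch {
    /** Joins the tags with the given connection character; null input yields null. */
    static String joinTags(String[] tags, char chconnect) {
        if (tags == null) {
            return null; // assumed error convention
        }
        return String.join(String.valueOf(chconnect), tags);
    }

    /** Overload that defaults the connection character to a blank space. */
    static String joinTags(String[] tags) {
        return joinTags(tags, ' ');
    }

    public static void main(String[] args) {
        String[] tags = {"DT", "JJ", "NN", "VBD", "IN", "DT", "NN"};
        System.out.println(joinTags(tags));
        System.out.println(joinTags(tags, '_'));
    }
}
```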
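public String toPOString()
Gives a string with only the part-of-speech tags.
public String toStringChunk()
Gives a shallow parsed representation of this chunked sentence. The representation has the form CHK1 CHK2 ... CHKn, where CHKi represents the i-th sentence chunk, with the following structure: CHKi = [CT W1/T1, W2/T2, ..., Wn/Tn], where CT represents the chunk tag, and Wj and Tj the j-th chunk word and POS tag. For example:
[NP The/DT lazy/JJ fox/NN] [VP jumped/VBD] [PP over/IN] [NP the/DT fence/NN]
public String toStringRegex()
Gives another shallow parsed representation of this sentence, in a format suitable for regular expression matching. For example:
np:<the/dt lazy/jj fox/nn>:np vp:<jumped/vbd>:vp pp:<over/in>:pp np:<the/dt fence/nn>:np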
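To make the relation between the two example strings concrete, the following standalone sketch rewrites the bracketed layout into the lower-case `tag:<...>:tag` layout using only java.util.regex. It is inferred from the two examples above and is not the library's implementation; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ChunkFormatSketch {
    // One bracketed chunk: "[TAG word/POS word/POS ...]", tolerating a space before "]".
    private static final Pattern BRACKET_CHUNK =
            Pattern.compile("\\[(\\S+)\\s+([^\\]]*?)\\s*\\]");

    /** Rewrites "[NP The/DT ...] [VP ...]" as "np:<the/dt ...>:np vp:<...>:vp". */
    static String bracketToRegexFormat(String bracketed) {
        List<String> parts = new ArrayList<>();
        Matcher m = BRACKET_CHUNK.matcher(bracketed);
        while (m.find()) {
            String tag = m.group(1).toLowerCase(Locale.ROOT);
            String words = m.group(2).toLowerCase(Locale.ROOT);
            parts.add(tag + ":<" + words + ">:" + tag);
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        String chunked =
                "[NP The/DT lazy/JJ fox/NN] [VP jumped/VBD] [PP over/IN] [NP the/DT fence/NN]";
        System.out.println(bracketToRegexFormat(chunked));
    }
}
```

Repeating the tag on both sides of each chunk lets a pattern anchor on a chunk's start or end without counting brackets.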
public String toStringRegexPOS()
A toString()-type method that gives a string representation of this chunked sentence, where each word is printed followed by its part-of-speech tag, as shown in the next example:
the/dt lazy/jj fox/nn jumped/vbd over/in the/dt fence/nn
(26, April 2009, 10:47)
public String toStringRegexPOSCHK()
This method is similar to toStringRegexPOS(), differing only in that the chunk tag is also included for each printed word, after the part-of-speech tag. For example:
the/dt/np lazy/jj/np fox/nn/np jumped/vbd/vp over/in/pp the/dt/np fence/nn/np
(27, April 2009, 20:10)
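The word/pos/chunk layout shown above can be derived from the `tag:<...>:tag` layout of toStringRegex() by appending each chunk's tag to every word it contains. The sketch below does this with java.util.regex on plain strings. It is an assumption-laden illustration based on the documented examples, with hypothetical class and method names, and does not reflect how the class builds its output internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WordTagFormatSketch {
    // One chunk in "tag:<word/pos ...>:tag" form; \1 requires the closing tag
    // to repeat the opening one.
    private static final Pattern REGEX_CHUNK = Pattern.compile("(\\w+):<([^>]*)>:\\1");

    /** Turns "np:<the/dt fox/nn>:np vp:<ran/vbd>:vp" into "the/dt/np fox/nn/np ran/vbd/vp". */
    static String toWordPosChunk(String regexFormat) {
        List<String> tokens = new ArrayList<>();
        Matcher m = REGEX_CHUNK.matcher(regexFormat);
        while (m.find()) {
            String chunkTag = m.group(1);
            for (String wordPos : m.group(2).trim().split("\\s+")) {
                if (!wordPos.isEmpty()) {
                    tokens.add(wordPos + "/" + chunkTag);
                }
            }
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        String s = "np:<the/dt lazy/jj fox/nn>:np vp:<jumped/vbd>:vp "
                + "pp:<over/in>:pp np:<the/dt fence/nn>:np";
        System.out.println(toWordPosChunk(s));
    }
}
```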
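public double lexicoSyntacticEntailmentMetric(ChunkedSentence hypot)
This function was designed to compute a likelihood value for the "lexico-syntactic entailment" between this sentence (thesis) and the entailed sentence, that is, the other sentence (hypothesis).
Parameters:
hypot - The sentence that represents the hypothesis.
Returns:
A value in the [0,1] interval.
public void printArrayWords()
This method is a default shortcut for printArrayWords(java.lang.String), with label = null.
public void printArrayWords(String label)
Outputs the sequence of words in this shallow parsed sentence with their corresponding lexico-syntactic codes.
Parameters:
label - A string to be printed before the whole sequence.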
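public ChunkedSentence subList(int fromIndex,
int toIndex)
Gives a subsequence of this sentence, in the form of a list of words.
Specified by:
subList in interface List<Word>
Overrides:
subList in class AbstractList<Word>
Parameters:
fromIndex - The inclusive starting index.
toIndex - The inclusive ending index.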
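The parameter notes above describe both indices as inclusive, whereas the standard java.util.List.subList contract treats toIndex as exclusive. The sketch below shows what an inclusive upper bound means in terms of the standard method, using plain strings instead of Word objects. The helper name is hypothetical, and whether ChunkedSentence is implemented this way is not stated in the documentation.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

final class InclusiveSubListSketch {
    /** Copies the elements from fromIndex to toIndex, both inclusive, into a new list. */
    static <T> LinkedList<T> inclusiveSubList(List<T> list, int fromIndex, int toIndex) {
        // List.subList excludes its upper bound, so add 1 to include toIndex.
        return new LinkedList<>(list.subList(fromIndex, toIndex + 1));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "lazy", "fox", "jumped");
        System.out.println(inclusiveSubList(words, 1, 2));
    }
}
```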
public String getPOStrFixed()
public static void main(String[] args)
Generally exemplifies the operative features of this class. The language model (OpenNLP object) must be previously set.
Parameters:
args - There are no arguments expected.