java.lang.Object
java.util.AbstractCollection<E>
java.util.AbstractList<E>
java.util.AbstractSequentialList<E>
java.util.LinkedList<Word>
hultig.sumo.Sentence
hultig.sumo.ChunkedSentence
public class ChunkedSentence
A specialization of the Sentence class, for handling shallow parsed sentences in a more efficient way. It uses chunk marks (ChunkMark) to represent the sequence of chunk boundaries.
University of Beira Interior (UBI)
Centre For Human Language Technology and Bioinformatics (HULTIG)
Field Summary |
---|
Fields inherited from class hultig.sumo.Sentence |
---|
cod, label, parentise, pontuacao, stx |
Fields inherited from class java.util.AbstractList |
---|
modCount |
Constructor Summary | |
---|---|
ChunkedSentence() | The default constructor, which invokes several default settings, including the definition of chunk values. |
ChunkedSentence(Sentence s, OpenNLPKit model) | This constructor receives a sentence and a language model, and creates an instance of a chunked sentence. |
ChunkedSentence(String s, OpenNLPKit model) | This constructor receives a string and a language model, and creates a chunked sentence. |
Method Summary | |
---|---|
Chunk | getChunk(int index): Gets the chunk at position index in this sentence's chunk sequence. |
ChunkMark | getChunkMark(int index): Gives the chunk mark (boundaries and tag) for the chunk at position index in the sequence of sentence chunks. |
ChunkMark | getChunkOnPosition(int index): Gives the chunk mark relative to the word at position index in this sentence. |
Chunk[] | getChunks(): Gets an array containing the complete sequence of chunks from this sentence, one chunk per array position. |
int | getNumChunks(): Gives the number of chunks contained in this sentence. |
int | getNumChunks(String postag): Counts the number of chunks of a certain kind (tag). |
int | getNumWords(): Gives the number of effective words contained in this sentence. |
String | getPOStrFixed() |
String | getSPOSig(): The same as getSPOSig(char chconnect), with the connection character set to the default, a blank space. |
String | getSPOSig(char chconnect): Gives a string with the sequence of part-of-speech tags corresponding to the sequence of words in the sentence. |
String[] | getVPOSig(): Gives the array of part-of-speech tags corresponding to the sequence of words in the sentence. |
String | getWordChunkMark(int index): Gives the chunk mark for the word at position index, after first identifying the chunk to which the word belongs. |
double | lexicoSyntacticEntailmentMetric(ChunkedSentence hypot): Computes a likelihood value for the "lexico-syntactic entailment" between this sentence (the thesis) and the other, entailed sentence (the hypothesis). |
static void | main(String[] args): Generally exemplifies the operative features of this class. |
void | printArrayWords(): A shortcut for printArrayWords(java.lang.String), with label = null. |
void | printArrayWords(String label): Outputs the sequence of words in this shallow parsed sentence with their corresponding lexico-syntactic codes. |
ChunkedSentence | subList(int fromIndex, int toIndex): Gives a subsequence of this sentence, in the form of a list of words. |
String | toPOString(): Gives a string with only the part-of-speech tags. |
String | toStringChunk(): Gives a shallow parsed representation of this chunked sentence. |
String | toStringRegex(): Gives another shallow parsed representation of this sentence, in a format suitable for regular expression matching. |
String | toStringRegexPOS(): A toString()-style method that gives a string representation of this chunked sentence, where each word is followed by its part-of-speech tag, for example: the/dt lazy/jj fox/nn jumped/vbd over/in the/dt fence/nn (26 April 2009, 10:47) |
String | toStringRegexPOSCHK(): Similar to toStringRegexPOS(), differing only in that each word's chunk tag is also included, after the part-of-speech tag. |
Methods inherited from class hultig.sumo.Sentence |
---|
addWord, codify, codify, compareTo, countIntersectLinks, countMatch, countMatchNGram, countNormIntersectLinks, countNotMatch, countNumWords, ctMatchNGram, demoForWebPage, dgauss, dgauss, distlex, dLinear, dParabolic, dsBLEU, dsEntropy, dSin, dsLevenshtein, dsNgram, dsNgram, dsuffixArrays, dsumo, dsumo, dsumoWSize, ensureCodification, equalArrays, fracNumWords, getCodes, getTag, getTags, getWord, getWords, indexOf, indexOf, isCodefied, isPunct, isWord, length, match, mutation, print, print, println, readLinks, readLinks, reload, set, setMetric, similarity, similarity, splitPunct, subcodes, subs, testaMetricas, toLowerCase, toLowerCase, toMWUString, toString, toStringPOS, x201102012359, x201102281055 |
Methods inherited from class java.util.LinkedList |
---|
add, add, addAll, addAll, addFirst, addLast, clear, clone, contains, descendingIterator, element, get, getFirst, getLast, indexOf, lastIndexOf, listIterator, offer, offerFirst, offerLast, peek, peekFirst, peekLast, poll, pollFirst, pollLast, pop, push, remove, remove, remove, removeFirst, removeFirstOccurrence, removeLast, removeLastOccurrence, set, size, toArray, toArray |
Methods inherited from class java.util.AbstractSequentialList |
---|
iterator |
Methods inherited from class java.util.AbstractList |
---|
equals, hashCode, listIterator, removeRange |
Methods inherited from class java.util.AbstractCollection |
---|
containsAll, isEmpty, removeAll, retainAll |
Methods inherited from class java.lang.Object |
---|
finalize, getClass, notify, notifyAll, wait, wait, wait |
Methods inherited from interface java.util.List |
---|
containsAll, equals, hashCode, isEmpty, iterator, listIterator, removeAll, retainAll |
Methods inherited from interface java.util.Deque |
---|
iterator |
Constructor Detail |
---|
public ChunkedSentence()
The default constructor, which invokes several default settings, including the definition of chunk values.
public ChunkedSentence(String s, OpenNLPKit model)
This constructor receives a string and a language model (model), and creates a chunked sentence.
Parameters:
s - A string representing a textual sentence.
model - The language model, which should already be adequately loaded/configured.
public ChunkedSentence(Sentence s, OpenNLPKit model)
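This constructor receives a sentence and a language model (model), and creates an instance of a chunked sentence.
Parameters:
s - The sentence for shallow parsing.
model - The language model.
Method Detail |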
---|
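public int getNumWords()
Gives the number of effective words contained in this sentence.
public int getNumChunks()
Gives the number of chunks contained in this sentence.
public int getNumChunks(String postag)
Counts the number of chunks of a certain kind (tag).
Parameters:
postag - The chunk tag to be counted, for example "NP", "VP".
Returns:
The number of chunks tagged with postag.
public Chunk getChunk(int index)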
Gets the k-th chunk, where k is the given index, from this sentence's chunk sequence.
Parameters:
index - The chunk index.
Returns:
The k-th chunk in the "usual" format, for example: [NP the/DT Pet/NNP passport/NN ]. On error, null will be returned.
public Chunk[] getChunks()
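Gets an array containing the complete sequence of chunks from this sentence, one chunk per array position.
Returns:
The array of chunks, or null on error.
public ChunkMark getChunkMark(int index)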
Gives the chunk mark (boundaries and tag) for the chunk at position index in the sequence of sentence chunks.
Parameters:
index - The chunk index.
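The idea of a chunk mark (boundaries plus tag) can be illustrated in isolation. The following is a minimal sketch, not the actual ChunkMark API: the Mark class, its fields, and the helper methods are hypothetical, and it assumes inclusive boundaries with word positions numbered from 0, which may not match the library's own convention.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: Mark is NOT the real hultig.sumo.ChunkMark.
final class ChunkMarkSketch {

    /** A chunk span: first and last word positions (both inclusive, an assumption) plus a tag. */
    static final class Mark {
        final int first;
        final int last;
        final String tag;

        Mark(int first, int last, String tag) {
            this.first = first;
            this.last = last;
            this.tag = tag;
        }
    }

    /** Returns the mark whose span covers wordIndex, or null if no chunk covers it. */
    static Mark markAtWord(List<Mark> marks, int wordIndex) {
        for (Mark m : marks) {
            if (m.first <= wordIndex && wordIndex <= m.last) {
                return m;
            }
        }
        return null;
    }

    /** Counts the marks that carry the given chunk tag. */
    static int countByTag(List<Mark> marks, String tag) {
        int count = 0;
        for (Mark m : marks) {
            if (m.tag.equals(tag)) {
                count++;
            }
        }
        return count;
    }

    /** Marks for "the lazy fox | jumped | over | the fence", words numbered from 0. */
    static List<Mark> demoMarks() {
        List<Mark> marks = new ArrayList<>();
        marks.add(new Mark(0, 2, "NP"));
        marks.add(new Mark(3, 3, "VP"));
        marks.add(new Mark(4, 4, "PP"));
        marks.add(new Mark(5, 6, "NP"));
        return marks;
    }

    public static void main(String[] args) {
        List<Mark> marks = demoMarks();
        System.out.println(markAtWord(marks, 1).tag);  // NP
        System.out.println(countByTag(marks, "NP"));   // 2
    }
}
```

Storing only boundaries and a tag keeps the word list itself untouched, so looking up the chunk of a word is a scan over a short list of marks.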
public ChunkMark getChunkOnPosition(int index)
Gives the chunk mark relative to the word at position index in this sentence.
Parameters:
index - A valid index of a sentence word. It must be greater than zero and less than the number of words in the sentence.
Returns:
The chunk mark, or null on erroneous cases.
public String getWordChunkMark(int index)
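Gives the chunk mark for the word at position index, after first identifying the chunk to which the word belongs.
Parameters:
index - The word sequential index, in the sentence.
Returns:
The chunk mark (for example NP or VP), or null if the index is out of bounds.
public String[] getVPOSig()
Gives the array of part-of-speech tags corresponding to the sequence of words in the sentence.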
public String getSPOSig(char chconnect)
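Gives a string with the sequence of part-of-speech tags corresponding to the sequence of words in the sentence.
Parameters:
chconnect - The connection character between two tags, usually a blank space.
Returns:
The tag sequence string, or null on error. For example: "NP VP PP NP VP".
public String getSPOSig()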
The same as getSPOSig(char chconnect), with the connection character set to the default, a blank space.
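The relationship between the array form and the string form of a tag signature can be sketched as a simple join. This is only an illustration under assumptions: the class and method names below are hypothetical, and the library's own implementation may differ.

```java
// Hypothetical sketch: joins a tag array into a signature string.
final class TagSignatureSketch {

    /** Joins the tags with the given connection character; returns null for a null array. */
    static String join(String[] tags, char connect) {
        if (tags == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tags.length; i++) {
            if (i > 0) {
                sb.append(connect);
            }
            sb.append(tags[i]);
        }
        return sb.toString();
    }

    /** Convenience overload that uses a blank space as the connection character. */
    static String join(String[] tags) {
        return join(tags, ' ');
    }

    public static void main(String[] args) {
        String[] tags = {"NP", "VP", "PP", "NP", "VP"};
        System.out.println(join(tags));       // NP VP PP NP VP
        System.out.println(join(tags, '-'));  // NP-VP-PP-NP-VP
    }
}
```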
public String toPOString()
Gives a string with only the part-of-speech tags.
public String toStringChunk()
Gives a shallow parsed representation of this chunked sentence, in the form CHK1 CHK2 ... CHKn, where CHKi represents the i-th sentence chunk, with the following structure: CHKi = [CT W1/T1, W2/T2, ..., Wn/Tn], where CT represents the chunk tag, and Wj and Tj the j-th chunk word and POS tag. For example:
[NP The/DT lazy/JJ fox/NN] [VP jumped/VBD] [PP over/IN] [NP the/DT fence/NN]
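To make the bracketed layout concrete, here is a minimal sketch that assembles a string in the same shape as the example above from plain arrays. The class, its methods, and the parallel-array input are hypothetical and only illustrate the format; they are not how the library stores or prints chunks.

```java
// Hypothetical sketch of the bracketed chunk format: [CT W1/T1 W2/T2 ...]
final class BracketFormatSketch {

    /** Formats one chunk as "[TAG word/pos word/pos ...]". */
    static String formatChunk(String chunkTag, String[] words, String[] posTags) {
        StringBuilder sb = new StringBuilder("[").append(chunkTag);
        for (int i = 0; i < words.length; i++) {
            sb.append(' ').append(words[i]).append('/').append(posTags[i]);
        }
        return sb.append(']').toString();
    }

    /** Formats all chunks, separated by single spaces. */
    static String formatSentence(String[] chunkTags, String[][] words, String[][] posTags) {
        StringBuilder sb = new StringBuilder();
        for (int c = 0; c < chunkTags.length; c++) {
            if (c > 0) {
                sb.append(' ');
            }
            sb.append(formatChunk(chunkTags[c], words[c], posTags[c]));
        }
        return sb.toString();
    }

    /** Rebuilds the example sentence used in this documentation. */
    static String demo() {
        String[] chunkTags = {"NP", "VP", "PP", "NP"};
        String[][] words = {{"The", "lazy", "fox"}, {"jumped"}, {"over"}, {"the", "fence"}};
        String[][] posTags = {{"DT", "JJ", "NN"}, {"VBD"}, {"IN"}, {"DT", "NN"}};
        return formatSentence(chunkTags, words, posTags);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```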
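public String toStringRegex()
Gives another shallow parsed representation of this sentence, in a format suitable for regular expression matching. For example: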
np:<the/dt lazy/jj fox/nn>:np vp:<jumped/vbd>:vp pp:<over/in>:pp np:<the/dt fence/nn>:np
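As an illustration of why this layout suits regular expression matching, the sketch below extracts the bodies of all chunks with a given tag from a string in the format shown above. The class, method, and pattern are hypothetical helpers built on java.util.regex only. The sketch assumes that no word contains the '>' character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: queries a string in the "tag:<...>:tag" layout.
final class RegexChunkSketch {

    // Group 1 is the chunk tag, group 2 the chunk body; \1 requires the same closing tag.
    private static final Pattern CHUNK = Pattern.compile("(\\w+):<([^>]*)>:\\1");

    /** Returns the bodies of all chunks carrying the given tag, in order of appearance. */
    static List<String> bodiesOf(String regexForm, String chunkTag) {
        List<String> bodies = new ArrayList<>();
        Matcher m = CHUNK.matcher(regexForm);
        while (m.find()) {
            if (m.group(1).equals(chunkTag)) {
                bodies.add(m.group(2));
            }
        }
        return bodies;
    }

    public static void main(String[] args) {
        String s = "np:<the/dt lazy/jj fox/nn>:np vp:<jumped/vbd>:vp "
                 + "pp:<over/in>:pp np:<the/dt fence/nn>:np";
        System.out.println(bodiesOf(s, "np"));  // [the/dt lazy/jj fox/nn, the/dt fence/nn]
    }
}
```

Because each chunk carries its tag on both sides, a single pattern with a backreference can delimit a chunk without any stateful parsing.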
public String toStringRegexPOS()
A toString()-style method that gives a string representation of this chunked sentence, where each word is printed followed by its part-of-speech tag, as shown in the next example:
the/dt lazy/jj fox/nn jumped/vbd over/in the/dt fence/nn
(26 April 2009, 10:47)
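The word/tag layout can be sketched and queried as follows. This is a hypothetical illustration: the class and method names are invented, and the lower-casing of words and tags is an assumption inferred from the example above, not a documented guarantee.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the "word/tag" layout and a simple regex query over it.
final class WordTagSketch {

    // An adjective (jj) immediately followed by a singular noun (nn).
    private static final Pattern ADJ_NOUN = Pattern.compile("(\\S+)/jj (\\S+)/nn\\b");

    /** Builds "word/tag word/tag ...", lower-casing both parts (assumed from the example). */
    static String toWordTagString(String[] words, String[] tags) {
        if (words.length != tags.length) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(words[i].toLowerCase(Locale.ROOT))
              .append('/')
              .append(tags[i].toLowerCase(Locale.ROOT));
        }
        return sb.toString();
    }

    /** Returns the first "adjective noun" pair as plain words, or null if there is none. */
    static String firstAdjectiveNoun(String wordTagString) {
        Matcher m = ADJ_NOUN.matcher(wordTagString);
        return m.find() ? m.group(1) + " " + m.group(2) : null;
    }

    public static void main(String[] args) {
        String[] words = {"The", "lazy", "fox", "jumped", "over", "the", "fence"};
        String[] tags = {"DT", "JJ", "NN", "VBD", "IN", "DT", "NN"};
        String s = toWordTagString(words, tags);
        System.out.println(s);
        System.out.println(firstAdjectiveNoun(s));  // lazy fox
    }
}
```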
public String toStringRegexPOSCHK()
This method is similar to toStringRegexPOS(), differing only in the fact that the chunk tag is also included in each word printing, after the part-of-speech tag. For example:
the/dt/np lazy/jj/np fox/nn/np jumped/vbd/vp over/in/pp the/dt/np fence/nn/np
(27 April 2009, 20:10)
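Since every token carries its chunk tag, a consumer can regroup the words by chunk. The following is a minimal sketch, assuming tokens of the exact form word/pos/chunk as in the example above. The class and method are hypothetical. The sketch cannot separate two adjacent chunks that share the same tag, because the flat format carries no explicit boundary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: regroups "word/pos/chunk" tokens into runs with the same chunk tag.
final class PosChunkTokenSketch {

    /** Returns entries such as "np:the lazy fox", one per run of equal chunk tags. */
    static List<String> groupByChunk(String posChk) {
        List<String> groups = new ArrayList<>();
        String currentTag = null;
        StringBuilder words = new StringBuilder();
        for (String token : posChk.trim().split("\\s+")) {
            String[] parts = token.split("/");
            if (parts.length != 3) {
                continue;  // skip anything that is not word/pos/chunk
            }
            if (!parts[2].equals(currentTag)) {
                if (currentTag != null) {
                    groups.add(currentTag + ":" + words);
                }
                currentTag = parts[2];
                words.setLength(0);
            }
            if (words.length() > 0) {
                words.append(' ');
            }
            words.append(parts[0]);
        }
        if (currentTag != null) {
            groups.add(currentTag + ":" + words);
        }
        return groups;
    }

    public static void main(String[] args) {
        String s = "the/dt/np lazy/jj/np fox/nn/np jumped/vbd/vp over/in/pp the/dt/np fence/nn/np";
        System.out.println(groupByChunk(s));
        // [np:the lazy fox, vp:jumped, pp:over, np:the fence]
    }
}
```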
public double lexicoSyntacticEntailmentMetric(ChunkedSentence hypot)
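This function was designed to compute a likelihood value for the "lexico-syntactic entailment" between this sentence (thesis) and the entailed sentence, that is, the other sentence (hypothesis).
Parameters:
hypot - The sentence that represents the hypothesis.
Returns:
A value in the [0,1] interval.
public void printArrayWords()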
This method is a default shortcut for printArrayWords(java.lang.String), with label = null.
public void printArrayWords(String label)
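Outputs the sequence of words in this shallow parsed sentence with their corresponding lexico-syntactic codes.
Parameters:
label - A string to be printed before the whole sequence.
public ChunkedSentence subList(int fromIndex, int toIndex)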
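Gives a subsequence of this sentence, in the form of a list of words.
Specified by:
subList in interface List<Word>
Overrides:
subList in class AbstractList<Word>
Parameters:
fromIndex - The inclusive starting index.
toIndex - The inclusive ending index.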
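The parameter descriptions above state that both indices are inclusive, whereas the standard java.util.List.subList treats toIndex as exclusive. Assuming that reading is correct, the difference can be sketched with a plain list as follows. The helper below is hypothetical and returns a copy. Whether the library returns a copy or a view is not stated here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch contrasting an inclusive end index with List.subList's exclusive one.
final class InclusiveSubListSketch {

    /** Copies the elements from fromIndex to toIndex, both inclusive. */
    static <T> List<T> inclusiveSubList(List<T> list, int fromIndex, int toIndex) {
        // List.subList excludes its end index, so shift it by one to include toIndex.
        return new ArrayList<>(list.subList(fromIndex, toIndex + 1));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "lazy", "fox", "jumped");
        System.out.println(inclusiveSubList(words, 1, 2));  // [lazy, fox]
        System.out.println(words.subList(1, 2));            // [lazy]
    }
}
```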
public String getPOStrFixed()
public static void main(String[] args)
Generally exemplifies the operative features of this class. The language model (OpenNLP object) must be previously set.
Parameters:
args - There are no arguments expected.