ChunkedSentence

hultig.sumo
Class ChunkedSentence

java.lang.Object
  java.util.AbstractCollection<E>
      java.util.AbstractList<E>
          java.util.AbstractSequentialList<E>
              java.util.LinkedList<Word>
                  hultig.sumo.Sentence
                      hultig.sumo.ChunkedSentence

All Implemented Interfaces:: Serializable, Cloneable, Comparable, Iterable<Word>, Collection<Word>, Deque<Word>, List<Word>, Queue<Word>

public class ChunkedSentence
extends Sentence

A specialization of the Sentence class, for handling shallow parsed sentences in a more efficient way. It uses chunk marks (ChunkMark) to represent the sequence of chunk boundaries.

University of Beira Interior (UBI)
Centre For Human Language Technology and Bioinformatics (HULTIG)

See Also:: Serialized Form

Field Summary

Fields inherited from class hultig.sumo.Sentence
`cod, label, parentise, pontuacao, stx`

Fields inherited from class java.util.AbstractList
`modCount`

Constructor Summary
`ChunkedSentence()` The default constructor, which invokes several default settings, including the `definition of chunk values`.
`ChunkedSentence(Sentence s, OpenNLPKit model)` This constructor receives a sentence and a language model, and creates an instance of a chunked sentence.
`ChunkedSentence(String s, OpenNLPKit model)` This constructor receives a string and a language model, and creates a chunked sentence.

Method Summary
`Chunk`	`getChunk(int index)` Gets the chunk at position `index` in this sentence's chunk sequence.
`ChunkMark`	`getChunkMark(int index)` Gives the chunk mark (boundaries and tag), for the chunk at position `index`, in the sequence of sentence chunks.
`ChunkMark`	`getChunkOnPosition(int index)` Gives the chunk mark relative to the word at position `index`, in this sentence.
`Chunk[]`	`getChunks()` Gets an array containing the complete sequence of chunks from this sentence, one chunk per array position.
`int`	`getNumChunks()` Gives the number of chunks contained in this sentence.
`int`	`getNumChunks(String postag)` Counts the number of chunks of a certain kind (tag).
`int`	`getNumWords()` Gives the number of effective words contained in this sentence.
`String`	`getPOStrFixed()`
`String`	`getSPOSig()` The same as `getSPOSig(char chconnect)` with the connection character being equal to the default of a blank space.
`String`	`getSPOSig(char chconnect)` Gives a string with the sequence of part-of-speech tags, corresponding to the sequence of words in the sentence.
`String[]`	`getVPOSig()` Gives the array of part-of-speech tags, corresponding to the sequence of words in the sentence.
`String`	`getWordChunkMark(int index)` Gives the chunk mark for a word at position `index`, identifying first to which chunk does the word belong.
`double`	`lexicoSyntacticEntailmentMetric(ChunkedSentence hypot)` Computes a likelihood value for the "lexico-syntactic entailment" between this sentence (the thesis) and the other, entailed sentence (the hypothesis).
`static void`	`main(String[] args)` Generally exemplifies the operative features of this class.
`void`	`printArrayWords()` This method is a default shortcut for `printArrayWords(java.lang.String)`, with `label = null`.
`void`	`printArrayWords(String label)` Outputs the sequence of words in this shallow parsed sentence with their corresponding lexico-syntactic codes.
`ChunkedSentence`	`subList(int fromIndex, int toIndex)` Gives a subsequence of this sentence, in the form of a list of words.
`String`	`toPOString()` Gives a string with only the part-of-speech tags.
`String`	`toStringChunk()` Gives a shallow parsed representation of this chunked sentence.
`String`	`toStringRegex()` Gives another format of a shallow parsed representation of this sentence, in a format suitable for regular expression matching.
`String`	`toStringRegexPOS()` A `toString()`-style method that gives a string representation of this chunked sentence, where each word is printed followed by its part-of-speech tag, for example: the/dt lazy/jj fox/nn jumped/vbd over/in the/dt fence/nn (26, April 2009, 10:47)
`String`	`toStringRegexPOSCHK()` This method is similar to `toStringRegexPOS()`, differing only in the fact that the chunk tag is also included in each word printing, after the part-of-speech tag.

Methods inherited from class hultig.sumo.Sentence
addWord, codify, codify, compareTo, countIntersectLinks, countMatch, countMatchNGram, countNormIntersectLinks, countNotMatch, countNumWords, ctMatchNGram, demoForWebPage, dgauss, dgauss, distlex, dLinear, dParabolic, dsBLEU, dsEntropy, dSin, dsLevenshtein, dsNgram, dsNgram, dsuffixArrays, dsumo, dsumo, dsumoWSize, ensureCodification, equalArrays, fracNumWords, getCodes, getTag, getTags, getWord, getWords, indexOf, indexOf, isCodefied, isPunct, isWord, length, match, mutation, print, print, println, readLinks, readLinks, reload, set, setMetric, similarity, similarity, splitPunct, subcodes, subs, testaMetricas, toLowerCase, toLowerCase, toMWUString, toString, toStringPOS, x201102012359, x201102281055

Methods inherited from class java.util.LinkedList
`add, add, addAll, addAll, addFirst, addLast, clear, clone, contains, descendingIterator, element, get, getFirst, getLast, indexOf, lastIndexOf, listIterator, offer, offerFirst, offerLast, peek, peekFirst, peekLast, poll, pollFirst, pollLast, pop, push, remove, remove, remove, removeFirst, removeFirstOccurrence, removeLast, removeLastOccurrence, set, size, toArray, toArray`

Methods inherited from class java.util.AbstractSequentialList
`iterator`

Methods inherited from class java.util.AbstractList
`equals, hashCode, listIterator, removeRange`

Methods inherited from class java.util.AbstractCollection
`containsAll, isEmpty, removeAll, retainAll`

Methods inherited from class java.lang.Object
`finalize, getClass, notify, notifyAll, wait, wait, wait`

Methods inherited from interface java.util.List
`containsAll, equals, hashCode, isEmpty, iterator, listIterator, removeAll, retainAll`

Methods inherited from interface java.util.Deque
`iterator`

Constructor Detail

ChunkedSentence

public ChunkedSentence()

The default constructor, which invokes several default settings, including the definition of chunk values.

ChunkedSentence

public ChunkedSentence(String s,
                       OpenNLPKit model)

This constructor receives a string and a language model, and creates a chunked sentence. The shallow parser is invoked from the language model object (model).

Parameters:: s - A string representing a textual sentence.; model - The language model, which should already have been loaded and configured.

ChunkedSentence

public ChunkedSentence(Sentence s,
                       OpenNLPKit model)

This constructor receives a sentence and a language model, and creates an instance of a chunked sentence. The shallow parser is invoked from the language model object (model).

Parameters:: s - The sentence for shallow parsing.; model - The language model.

Method Detail

getNumWords

public int getNumWords()

Gives the number of effective words contained in this sentence.

Returns:: The number of words.

getNumChunks

public int getNumChunks()

Gives the number of chunks contained in this sentence.

Returns:: The number of chunks.

getNumChunks

public int getNumChunks(String postag)

Counts the number of chunks of a certain kind (tag).

Parameters:: postag - The chunk tag to be counted, for example "NP", "VP".
Returns:: The number of chunks matching postag.

getChunk

public Chunk getChunk(int index)

Gets the chunk at position index in this sentence's chunk sequence.

Parameters:: index - The chunk index.
Returns:: The chunk at position index. Its string form follows the "usual" format, for example: [NP the/DT Pet/NNP passport/NN ]. On error, null is returned.

getChunks

public Chunk[] getChunks()

Gets an array containing the complete sequence of chunks from this sentence, one chunk per array position.

Returns:: The sequence of chunks, or null on error.

getChunkMark

public ChunkMark getChunkMark(int index)

Gives the chunk mark (boundaries and tag), for the chunk at position index, in the sequence of sentence chunks.

Parameters:: index - The chunk index.
Returns:: ChunkMark The chunk mark.

getChunkOnPosition

public ChunkMark getChunkOnPosition(int index)

Gives the chunk mark relative to the word at position index, in this sentence.

Parameters:: index - A valid index of a sentence word. It must be greater than zero and less than the number of words in the sentence.
Returns:: The corresponding chunk mark, or null on error.

getWordChunkMark

public String getWordChunkMark(int index)

Gives the chunk mark for the word at position index, first identifying the chunk to which the word belongs.

Parameters:: index - The word sequential index, in the sentence.
Returns:: The chunk tag (e.g. NP, VP), or null on index out of bounds.
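The chunk-mark lookups described above (by chunk index, by word position, and counting by tag) can be pictured with a small self-contained sketch. It does not use the hultig.sumo classes: the `ChunkMarkSketch` class, its nested `Mark` type and all method names are hypothetical, and it assumes zero-based, inclusive word boundaries and that consecutive words with the same chunk tag form one chunk, neither of which is confirmed by this page.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not the hultig.sumo implementation. */
final class ChunkMarkSketch {

    /** Stand-in for a chunk mark: inclusive word range [begin, end] plus a chunk tag. */
    static final class Mark {
        final int begin;
        final int end;
        final String tag;

        Mark(int begin, int end, String tag) {
            this.begin = begin;
            this.end = end;
            this.tag = tag;
        }
    }

    /** Groups consecutive words that share a chunk tag into one mark (a simplification). */
    static List<Mark> marks(String[] chunkTags) {
        List<Mark> out = new ArrayList<Mark>();
        int begin = 0;
        for (int i = 1; i <= chunkTags.length; i++) {
            if (i == chunkTags.length || !chunkTags[i].equals(chunkTags[begin])) {
                out.add(new Mark(begin, i - 1, chunkTags[begin]));
                begin = i;
            }
        }
        return out;
    }

    /** Returns the mark covering the word at wordIndex, or null if the index is out of range. */
    static Mark markOnPosition(List<Mark> marks, int wordIndex) {
        for (Mark m : marks) {
            if (wordIndex >= m.begin && wordIndex <= m.end) {
                return m;
            }
        }
        return null;
    }

    /** Counts the marks carrying the given chunk tag. */
    static int count(List<Mark> marks, String tag) {
        int n = 0;
        for (Mark m : marks) {
            if (m.tag.equals(tag)) {
                n++;
            }
        }
        return n;
    }
}
```

Returning null for an out-of-range position mirrors the null-on-error convention documented for these methods.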

getVPOSig

public String[] getVPOSig()

Gives the array of part-of-speech tags, corresponding to the sequence of words in the sentence.

Returns:: The array of part-of-speech tags.

getSPOSig

public String getSPOSig(char chconnect)

Gives a string with the sequence of part-of-speech tags, corresponding to the sequence of words in the sentence.

Parameters:: chconnect - The connection character, between two tags, usually a blank space.
Returns:: The string with part-of-speech sequence, or null on error. For example: "NP VP PP NP VP".
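As a rough picture of how a connection character turns a tag array (such as the one from getVPOSig()) into a single signature string, the sketch below joins tags by hand. `PosSignatureSketch` and `join` are hypothetical names, and the null handling is an assumption modelled on the "null on error" note above, not the library's code.

```java
/** Hypothetical illustration only; not the hultig.sumo implementation. */
final class PosSignatureSketch {

    /** Joins the tags with chconnect between neighbours; returns null for a null array. */
    static String join(String[] tags, char chconnect) {
        if (tags == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tags.length; i++) {
            if (i > 0) {
                sb.append(chconnect);
            }
            sb.append(tags[i]);
        }
        return sb.toString();
    }
}
```

With a blank space as the connection character this corresponds to the default described for the no-argument variant.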

getSPOSig

public String getSPOSig()

The same as getSPOSig(char chconnect) with the connection character being equal to the default of a blank space.

Returns:: The string with the part-of-speech sequence, with tags separated by a blank space.

toPOString

public String toPOString()

Gives a string with only the part-of-speech tags.

Returns:: The POS string.

toStringChunk

public String toStringChunk()

Gives a shallow parsed representation of this chunked sentence. The representation follows a conventional format: CHK1 CHK2 ... CHKn, where CHKi represents the i-th sentence chunk, with the following structure: CHKi = [CT W1/T1 W2/T2 ... Wm/Tm], where CT represents the chunk tag, and Wj and Tj the j-th chunk word and POS tag. For example:

    [NP The/DT lazy/JJ fox/NN] [VP jumped/VBD] [PP over/IN] [NP the/DT fence/NN]

Returns:: The string representing the shallow parsed sentence.
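To make the bracketed format concrete, here is a minimal self-contained sketch that renders it from three parallel arrays (words, POS tags, chunk tags). `ChunkFormatSketch` and its method are hypothetical and independent of hultig.sumo; the sketch assumes that consecutive words sharing a chunk tag belong to the same chunk, which a real shallow parser does not guarantee.

```java
/** Hypothetical illustration only; not the hultig.sumo implementation. */
final class ChunkFormatSketch {

    /** Renders "[CT W1/T1 W2/T2] [CT ...]" from parallel word, POS and chunk-tag arrays. */
    static String toStringChunk(String[] words, String[] pos, String[] chunkTags) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            // A chunk opens at the first word or whenever the chunk tag changes.
            boolean opens = (i == 0) || !chunkTags[i].equals(chunkTags[i - 1]);
            // A chunk closes at the last word or just before the tag changes.
            boolean closes = (i == words.length - 1) || !chunkTags[i].equals(chunkTags[i + 1]);
            if (opens) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append('[').append(chunkTags[i]);
            }
            sb.append(' ').append(words[i]).append('/').append(pos[i]);
            if (closes) {
                sb.append(']');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] words = {"The", "lazy", "fox", "jumped", "over", "the", "fence"};
        String[] pos = {"DT", "JJ", "NN", "VBD", "IN", "DT", "NN"};
        String[] chunks = {"NP", "NP", "NP", "VP", "PP", "NP", "NP"};
        // Prints: [NP The/DT lazy/JJ fox/NN] [VP jumped/VBD] [PP over/IN] [NP the/DT fence/NN]
        System.out.println(toStringChunk(words, pos, chunks));
    }
}
```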

toStringRegex

public String toStringRegex()

Gives another shallow parsed representation of this sentence, in a format suitable for regular expression matching. The idea was to be able to apply sentence simplification rules expressed through regular expressions (13, February 2009, 11:57). The format is shown in the following example:

    np:<the/dt lazy/jj fox/nn>:np  vp:<jumped/vbd>:vp  pp:<over/in>:pp  np:<the/dt fence/nn>:np

Returns:: The string representing the shallow parsed sentence.
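The sketch below suggests how a string in this format might be processed with java.util.regex. It works on a plain string and does not call hultig.sumo; `RegexChunkSketch`, `chunksWithTag` and `dropPpNp` are hypothetical names, and the "drop a prepositional phrase followed by a noun phrase" rule is only a toy example of a simplification rule, not one taken from the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not the hultig.sumo implementation. */
final class RegexChunkSketch {

    // One chunk "tag:<...>:tag"; the backreference \1 forces the closing tag to equal the opening tag.
    private static final Pattern CHUNK = Pattern.compile("(\\w+):<([^>]*)>:\\1");

    /** Returns the inner "word/pos ..." text of every chunk carrying the given tag. */
    static List<String> chunksWithTag(String regexForm, String tag) {
        List<String> found = new ArrayList<String>();
        Matcher m = CHUNK.matcher(regexForm);
        while (m.find()) {
            if (m.group(1).equals(tag)) {
                found.add(m.group(2));
            }
        }
        return found;
    }

    /** Toy rule: removes every pp chunk that is immediately followed by an np chunk. */
    static String dropPpNp(String regexForm) {
        return regexForm.replaceAll("\\s*pp:<[^>]*>:pp\\s+np:<[^>]*>:np", "");
    }
}
```

Because each chunk carries its tag on both sides of the angle brackets, a rule can anchor on chunk boundaries without having to balance square brackets.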

toStringRegexPOS

public String toStringRegexPOS()

A toString()-style method that gives a string representation of this chunked sentence, where each word is printed followed by its part-of-speech tag, as shown in the next example:

    the/dt lazy/jj fox/nn jumped/vbd over/in the/dt fence/nn

(26, April 2009, 10:47)

Returns:: A string representation of this chunked sentence.

toStringRegexPOSCHK

public String toStringRegexPOSCHK()

This method is similar to toStringRegexPOS(), differing only in the fact that the chunk tag is also included in each word printing, after the part-of-speech tag. For example:

    the/dt/np lazy/jj/np fox/nn/np jumped/vbd/vp over/in/pp the/dt/np fence/nn/np

(27, April 2009, 20:10)

Returns:: A string representation of this chunked sentence.
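The relation between the word/pos/chk form and the plain word/pos form can be illustrated by a small parser over the output string. This is a sketch on plain strings, not library code: `PosChkTokenSketch` and its methods are hypothetical, and it assumes that the last two slashes of each token are the separators, so that a word may itself contain a slash.

```java
/** Hypothetical illustration only; not the hultig.sumo implementation. */
final class PosChkTokenSketch {

    /** Splits one "word/pos/chk" token into {word, pos, chk}, using the last two slashes. */
    static String[] splitToken(String token) {
        int chk = token.lastIndexOf('/');
        int pos = token.lastIndexOf('/', chk - 1);
        if (chk < 0 || pos < 0) {
            throw new IllegalArgumentException("not a word/pos/chk token: " + token);
        }
        return new String[] {
            token.substring(0, pos),
            token.substring(pos + 1, chk),
            token.substring(chk + 1)
        };
    }

    /** Drops the chunk tag from every token, yielding the plain "word/pos" form. */
    static String stripChunkTags(String line) {
        StringBuilder sb = new StringBuilder();
        for (String token : line.trim().split("\\s+")) {
            String[] parts = splitToken(token);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(parts[0]).append('/').append(parts[1]);
        }
        return sb.toString();
    }
}
```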

lexicoSyntacticEntailmentMetric

public double lexicoSyntacticEntailmentMetric(ChunkedSentence hypot)

This function was designed to compute a likelihood value for the "lexico-syntactic entailment" between this sentence (the thesis) and the other, entailed sentence (the hypothesis). We say that sentence T entails sentence H if we can infer or conclude H by knowing T. This metric was created to work with data from the RTE collections. The calculations are based on lexical and syntactic (shallow parsed sentence) features.

Parameters:: hypot - The sentence that represents the hypothesis.
Returns:: A real value in the [0,1] interval.

printArrayWords

public void printArrayWords()

This method is a default shortcut for printArrayWords(java.lang.String), with label = null.

printArrayWords

public void printArrayWords(String label)

Outputs the sequence of words in this shallow parsed sentence with their corresponding lexico-syntactic codes.

Parameters:: label - A string to be printed before the whole sequence.

subList

public ChunkedSentence subList(int fromIndex,
                               int toIndex)

Gives a subsequence of this sentence, in the form of a list of words.

Specified by:: subList in interface List<Word>
Overrides:: subList in class AbstractList<Word>

Parameters:: fromIndex - The inclusive starting index.; toIndex - The inclusive ending index.
Returns:: A sublist representing a subsequence of words, from this sequence.

getPOStrFixed

public String getPOStrFixed()

main

public static void main(String[] args)

Generally exemplifies the operative features of this class. In order to run the tests contained in this method, a language model (OpenNLP object) must be previously set.

Parameters:: args - There are no arguments expected.
