hultig.sumo
Class Chunk

java.lang.Object
  extended by hultig.sumo.ChunkMark
      extended by hultig.sumo.Chunk

public class Chunk
extends ChunkMark

This class represents a phrasal chunk from a sentence. A shallow parser divides a given sentence into a sequence of chunks, where each one is formed by a sequence of one or more words. For example, the following sentence:

   The brown fox jumped over the fence.

has the following chunks:

   [NP The/DT brown/JJ fox/NN] [VP jumped/VBD] [PP over/IN] [NP the/DT fence/NN] ./.

That is, two noun phrases (NP), one verb phrase (VP), and one prepositional phrase (PP). The final period is tagged (./.) but lies outside any chunk.
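
The bracketed notation above can be read mechanically. The following is a minimal, self-contained sketch that does not use the hultig.sumo classes: the class name ChunkNotationDemo and its parse method are hypothetical, introduced only to illustrate how chunk tags and word/POS pairs are laid out in that notation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of hultig.sumo: reads the bracketed chunk notation.
class ChunkNotationDemo {
    // One bracketed chunk: "[", a chunk tag, whitespace, word/POS pairs, "]".
    private static final Pattern CHUNK = Pattern.compile("\\[(\\S+)\\s+([^\\]]+)\\]");

    // Each result array holds the chunk tag at index 0, followed by its word/POS pairs.
    static List<String[]> parse(String chunked) {
        List<String[]> chunks = new ArrayList<>();
        Matcher m = CHUNK.matcher(chunked);
        while (m.find()) {
            String[] words = m.group(2).trim().split("\\s+");
            String[] chunk = new String[words.length + 1];
            chunk[0] = m.group(1);
            System.arraycopy(words, 0, chunk, 1, words.length);
            chunks.add(chunk);
        }
        return chunks;
    }

    public static void main(String[] args) {
        String s = "[NP The/DT brown/JJ fox/NN] [VP jumped/VBD] [PP over/IN] [NP the/DT fence/NN] ./.";
        for (String[] c : parse(s)) {
            System.out.println(c[0] + " has " + (c.length - 1) + " word(s)");
        }
    }
}
```

Tokens outside any bracket, such as the trailing ./., are ignored by this sketch because they belong to no chunk.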

University of Beira Interior (UBI)
Centre For Human Language Technology and Bioinformatics (HULTIG)


Field Summary
 
Fields inherited from class hultig.sumo.ChunkMark
a, b, chtag
 
Constructor Summary
Chunk(ChunkedSentence cs, ChunkMark cm)
          This constructor requires a ChunkedSentence, which is a sentence marked with chunk positions through an array of ChunkMark objects.
 
Method Summary
 double connection(Chunk cother)
          This function computes the connection strength between two chunks, measured in terms of a numeric value.
 boolean contains(String cw)
          Tests if a tagged word occurs in this chunk.
 boolean eqaulPOS(Chunk cother)
          Tests if both chunks have the same POS tag.
 boolean equal(String sc)
          Tests if this chunk word sequence is equal to a given string.
 Word get(int i)
          Gives the word at a given position, from the chunk's sequence of words.
 String getPOS(int i)
          Gives the word's part-of-speech, at a given position from this chunk word sequence.
 String getToken(int i)
          Gives the token from this chunk at a given position.
 int index(String cw)
          Gives the index of a tagged word, represented by a string, in this Chunk.
static void main(String[] args)
          The main method exemplifies the role of a chunk, in the context of a chunked sentence (obtained from shallow parsing), as well as the connection strength method, for chunk comparison.
 int size()
          Gives the number of words contained in this chunk.
 String toString()
          Gives a string representation of this chunk, in the form of: POS[w1 w2 ... wn], where POS is the chunk part-of-speech tag and w1 ... wn the sequence of n words forming this chunk.
 String toStringRegex()
          Gives another string representation of this chunk, in the form of: <w1 w2 ... wn> : POS, where POS is the chunk tag, and w1 ... wn are the sequence of words in this chunk.
 
Methods inherited from class hultig.sumo.ChunkMark
a, b, POS, posUndefined, set, set
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Chunk

public Chunk(ChunkedSentence cs,
             ChunkMark cm)
Creates a chunk from a ChunkedSentence and a ChunkMark. A ChunkedSentence is a specialization of Sentence that holds an array of ChunkMark objects defining the boundaries of each chunk.

Parameters:
cs - The chunked sentence.
cm - The chunk mark.
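
To illustrate the relationship between a sentence, a mark, and a chunk, here is a minimal sketch built on hypothetical stand-in classes (Mark and ChunkView), not on the real ChunkedSentence, ChunkMark, or Chunk. The field names a, b, and chtag mirror the fields inherited from ChunkMark, but treating a and b as the first and last word index, with b inclusive, is an assumption made for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins, not the hultig.sumo classes.
class ChunkViewSketch {
    // Stand-in for a chunk mark: first word index, last word index, chunk tag.
    static final class Mark {
        final int a;
        final int b;
        final String chtag;

        Mark(int a, int b, String chtag) {
            this.a = a;
            this.b = b;
            this.chtag = chtag;
        }
    }

    // Stand-in for a chunk: the words of the sentence that the mark delimits.
    static final class ChunkView {
        final String tag;
        final List<String> words;

        ChunkView(List<String> sentence, Mark mark) {
            this.tag = mark.chtag;
            // Assumption: b is inclusive, so the slice ends at b + 1.
            this.words = sentence.subList(mark.a, mark.b + 1);
        }

        int size() {
            return words.size();
        }
    }

    public static void main(String[] args) {
        List<String> sentence = Arrays.asList(
            "The/DT", "brown/JJ", "fox/NN", "jumped/VBD",
            "over/IN", "the/DT", "fence/NN", "./.");
        ChunkView np = new ChunkView(sentence, new Mark(0, 2, "NP"));
        ChunkView vp = new ChunkView(sentence, new Mark(3, 3, "VP"));
        System.out.println(np.tag + " " + np.words + " size=" + np.size());
        System.out.println(vp.tag + " " + vp.words + " size=" + vp.size());
    }
}
```
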
Method Detail

size

public int size()
Gives the number of words contained in this chunk.

Returns:
The number of words.

get

public Word get(int i)
Gives the word at a given position, from the chunk's sequence of words.

Parameters:
i - The position in the chunk.
Returns:
The word at the given position, or null on error.

getToken

public String getToken(int i)
Gives the token from this chunk at a given position.

Parameters:
i - The word position, in the chunk word sequence.
Returns:
The corresponding token, or null if something goes wrong (e.g. invalid position).

getPOS

public String getPOS(int i)
Gives the word's part-of-speech, at a given position from this chunk word sequence.

Parameters:
i - The word position, in the chunk word sequence.
Returns:
The POS tag for the word at the given position, or null if something goes wrong.
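
The null-on-error contract shared by the positional accessors can be sketched as follows. This is an illustration only: TaggedWordSketch is a hypothetical class that models a chunk as an array of word/POS strings such as "fox/NN", which is an assumption about representation and not how the library stores its Word objects.

```java
// Hypothetical sketch, not the hultig.sumo implementation.
class TaggedWordSketch {
    // Token part of the tagged word at position i, or null for an invalid position.
    static String getToken(String[] chunk, int i) {
        if (i < 0 || i >= chunk.length) {
            return null;
        }
        String w = chunk[i];
        int slash = w.lastIndexOf('/');
        return slash < 0 ? w : w.substring(0, slash);
    }

    // POS part of the tagged word at position i, or null for an invalid
    // position or a word that carries no tag.
    static String getPOS(String[] chunk, int i) {
        if (i < 0 || i >= chunk.length) {
            return null;
        }
        String w = chunk[i];
        int slash = w.lastIndexOf('/');
        return slash < 0 ? null : w.substring(slash + 1);
    }

    public static void main(String[] args) {
        String[] np = {"The/DT", "brown/JJ", "fox/NN"};
        System.out.println(getToken(np, 2)); // fox
        System.out.println(getPOS(np, 2));   // NN
        System.out.println(getToken(np, 5)); // null
    }
}
```

Callers should therefore check for null before using the result of a positional lookup.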

connection

public double connection(Chunk cother)
This function computes the connection strength between two chunks, measured in terms of a numeric value. It is based on the lexical connectivity between the chunks, as well as the POS relatedness of their words.

Parameters:
cother - The other chunk to compare with.
Returns:
The connection strength, a value lying in the [0, 1] interval.
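
The exact formula used by connection is not given here. As a rough illustration of how a score in [0, 1] could combine lexical overlap with POS overlap, the sketch below averages two Jaccard coefficients. The Jaccard measure, the equal weighting, and the class name ConnectionSketch are all assumptions chosen for this example; they are not the library's method.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: a Jaccard-based stand-in, NOT the hultig.sumo formula.
class ConnectionSketch {
    // Jaccard overlap |x ∩ y| / |x ∪ y|; defined here as 0 when both sets are empty.
    static double jaccard(Set<String> x, Set<String> y) {
        if (x.isEmpty() && y.isEmpty()) {
            return 0.0;
        }
        Set<String> inter = new HashSet<>(x);
        inter.retainAll(y);
        Set<String> union = new HashSet<>(x);
        union.addAll(y);
        return (double) inter.size() / union.size();
    }

    // Collects either the lower-cased tokens or the POS tags of word/POS strings.
    static Set<String> part(String[] chunk, boolean wantPos) {
        Set<String> out = new HashSet<>();
        for (String w : chunk) {
            int slash = w.lastIndexOf('/');
            if (slash < 0) {
                continue; // skip words without a tag
            }
            out.add(wantPos
                ? w.substring(slash + 1)
                : w.substring(0, slash).toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Assumed weighting: half lexical overlap, half POS overlap.
    static double connection(String[] c1, String[] c2) {
        double lexical = jaccard(part(c1, false), part(c2, false));
        double pos = jaccard(part(c1, true), part(c2, true));
        return 0.5 * lexical + 0.5 * pos;
    }

    public static void main(String[] args) {
        String[] np1 = {"The/DT", "brown/JJ", "fox/NN"};
        String[] np2 = {"the/DT", "fence/NN"};
        System.out.println(connection(np1, np2));
        System.out.println(connection(np1, np1));
    }
}
```

Because each Jaccard coefficient lies in [0, 1], so does their average; a chunk compared with itself scores 1.0 under this stand-in.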

eqaulPOS

public boolean eqaulPOS(Chunk cother)
Tests if both chunks have the same POS tag.

Parameters:
cother - The chunk to compare with.
Returns:
true if both chunks have the same POS tag, and false otherwise.

equal

public boolean equal(String sc)
Tests if this chunk word sequence is equal to a given string.

Parameters:
sc - The string to compare to.
Returns:
true if the chunk's word sequence equals the given string, and false otherwise.

index

public int index(String cw)
Gives the index of a tagged word, represented by a string, in this Chunk.

Parameters:
cw - The tagged word, for example: "addicted/VBN"
Returns:
The index of the occurrence, or -1 if not found.

contains

public boolean contains(String cw)
Tests if a tagged word occurs in this chunk. This method is similar to index, but returns a boolean instead of a position.

Parameters:
cw - The tagged word, for example: "addicted/VBN"
Returns:
The true value if contained, and false otherwise.
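
The relationship between index and contains can be sketched as below. IndexSketch is a hypothetical class that models a chunk as an array of word/POS strings and uses exact, case-sensitive string matching; both choices are assumptions, since the matching rule of the real methods is not specified here.

```java
// Hypothetical sketch, not the hultig.sumo implementation.
class IndexSketch {
    // Position of the tagged word in the chunk, or -1 if it does not occur.
    static int index(String[] chunk, String cw) {
        for (int i = 0; i < chunk.length; i++) {
            if (chunk[i].equals(cw)) {
                return i;
            }
        }
        return -1;
    }

    // Membership test expressed through index.
    static boolean contains(String[] chunk, String cw) {
        return index(chunk, cw) >= 0;
    }

    public static void main(String[] args) {
        String[] vp = {"was/VBD", "addicted/VBN"};
        System.out.println(index(vp, "addicted/VBN"));    // 1
        System.out.println(contains(vp, "addicted/VBN")); // true
        System.out.println(index(vp, "addicted/JJ"));     // -1
    }
}
```
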

toString

public String toString()
Gives a string representation of this chunk, in the form of: POS[w1 w2 ... wn], where POS is the chunk part-of-speech tag and w1 ... wn the sequence of n words forming this chunk.

Overrides:
toString in class ChunkMark
Returns:
A string representation of this chunk in the previously described format.

toStringRegex

public String toStringRegex()
Gives another string representation of this chunk, in the form of: <w1 w2 ... wn> : POS, where POS is the chunk tag, and w1 ... wn are the sequence of words in this chunk. The method was designed to create regular expressions representing sentence reduction rules. (JPC, 13 February, 2009)

Returns:
A string representation of this chunk.
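
The two string layouts described for toString and toStringRegex can be compared side by side with a small sketch. ChunkFormatSketch is a hypothetical class; whether the real methods print bare tokens or word/POS pairs for w1 ... wn is not stated here, so bare tokens are assumed.

```java
// Hypothetical sketch, not the hultig.sumo implementation.
class ChunkFormatSketch {
    // Layout described for toString: POS[w1 w2 ... wn]
    static String bracketForm(String tag, String[] words) {
        return tag + "[" + String.join(" ", words) + "]";
    }

    // Layout described for toStringRegex: <w1 w2 ... wn> : POS
    static String regexForm(String tag, String[] words) {
        return "<" + String.join(" ", words) + "> : " + tag;
    }

    public static void main(String[] args) {
        String[] words = {"The", "brown", "fox"};
        System.out.println(bracketForm("NP", words)); // NP[The brown fox]
        System.out.println(regexForm("NP", words));   // <The brown fox> : NP
    }
}
```
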

main

public static void main(String[] args)
The main method exemplifies the role of a chunk, in the context of a chunked sentence (obtained from shallow parsing), as well as the connection strength method, for chunk comparison.

Parameters:
args - No argument is expected.