hultig.sumo
Class ParaphAlignPair

java.lang.Object
  extended by hultig.sumo.ParaphAlignPair

public class ParaphAlignPair
extends Object

This class represents an aligned paraphrase pair, that is, a pair of paraphrastic sentences whose common and similar words are aligned. It supports alignment representation at several levels of interpretation: lexical, syntactic, and chunk level.

University of Beira Interior (UBI)
Centre For Human Language Technology and Bioinformatics (HULTIG)

Since:
10:44:43 8/May/2008

Constructor Summary
ParaphAlignPair(String sa, String sb)
          The default constructor is based on the two sentence strings.
ParaphAlignPair(String sa, String sb, OpenNLPKit model)
          A more general constructor that also takes a language model, used for shallow parsing.
 
Method Summary
 void align(POSType postype)
          Uses the Needleman-Wunsch algorithm to globally align the paraphrastic sentences of this class.
 void codify(CorpusIndex dic)
          Codifies the aligned sentences according to a given corpus index.
 void codifyChunks(POSType postype)
          Codifies the chunks of the aligned sentences according to a given part-of-speech tag set.
 String[] colorizedChunks()
          Gives the string pair containing this alignment, marked with XML chromatic tags.
 String[] colorizedChunks(POSType post)
          Gives the string pair containing this alignment, marked with XML chromatic tags.
 Vector<XBubble> extractBubblesWithBoundaries()
          A new version of the extractNXBubbles() method in which the BEGIN and END meta-tags are considered.
 Vector<XBubble> extractNXBubbles()
          Extracts all possible bubbles from an aligned paraphrase.
 Vector<XBubble> extractNXBubbles(double minValue)
          Extracts all possible bubbles from an aligned paraphrase.
static void main(String[] args)
          The main method exemplifies the use of this class.
static int numTrueWords(Word[] v)
          Counts the number of true words contained in a given array of words.
 void print()
          A shortcut for the print(int level) method, with level equal to 3.
 void print(int level)
          Outputs this alignment.
 void printWithColors()
          Outputs this alignment marked with XML chromatic tags.
 int size()
          The length of this alignment, in terms of the number of tokens in each sentence, including the void tokens, usually marked with sequences of underscores.
 String[] subSequence(int a, int b)
          Gives a sub-sequence of this alignment, delimited by two positions.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ParaphAlignPair

public ParaphAlignPair(String sa,
                       String sb)
The default constructor is based on the two sentence strings.

Parameters:
sa - One sentence string.
sb - The other sentence string.

ParaphAlignPair

public ParaphAlignPair(String sa,
                       String sb,
                       OpenNLPKit model)
A more general constructor that also takes a language model, used for shallow parsing.

Parameters:
sa - One sentence string.
sb - The other sentence string.
model - The language model.
Method Detail

align

public void align(POSType postype)
Uses the Needleman-Wunsch algorithm to globally align the paraphrastic sentences of this class.

Parameters:
postype - The definition of the POS tags, to mark the generated alignment.
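
To illustrate the kind of global alignment this method refers to, here is a minimal, self-contained sketch of Needleman-Wunsch over word tokens. It is not the code of this class: the class name NeedlemanWunschSketch, the scoring values (match +1, mismatch -1, gap -1), the case-insensitive comparison, and the "____" void token are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of hultig.sumo.
final class NeedlemanWunschSketch {
    // Assumed marker for a void (gap) token.
    static final String GAP = "____";

    // Assumed scoring scheme; the real class may weight word similarity differently.
    private static final int MATCH = 1, MISMATCH = -1, GAP_PENALTY = -1;

    private static int sub(String x, String y) {
        return x.equalsIgnoreCase(y) ? MATCH : MISMATCH;
    }

    /** Returns two token arrays of equal length, padded with GAP tokens. */
    static String[][] align(String[] a, String[] b) {
        int n = a.length, m = b.length;
        int[][] score = new int[n + 1][m + 1];
        // First row and column: aligning a prefix against nothing costs only gaps.
        for (int i = 1; i <= n; i++) score[i][0] = i * GAP_PENALTY;
        for (int j = 1; j <= m; j++) score[0][j] = j * GAP_PENALTY;
        // Fill the dynamic programming table.
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diag = score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]);
                int up = score[i - 1][j] + GAP_PENALTY;
                int left = score[i][j - 1] + GAP_PENALTY;
                score[i][j] = Math.max(diag, Math.max(up, left));
            }
        }
        // Trace back from the bottom-right corner to recover one optimal alignment.
        List<String> ra = new ArrayList<>();
        List<String> rb = new ArrayList<>();
        int i = n, j = m;
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0
                    && score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1])) {
                ra.add(a[--i]);
                rb.add(b[--j]);
            } else if (i > 0 && score[i][j] == score[i - 1][j] + GAP_PENALTY) {
                ra.add(a[--i]);
                rb.add(GAP);
            } else {
                ra.add(GAP);
                rb.add(b[--j]);
            }
        }
        Collections.reverse(ra);
        Collections.reverse(rb);
        return new String[][] { ra.toArray(new String[0]), rb.toArray(new String[0]) };
    }
}
```

Under these assumptions, aligning "the cat sat" with "the black cat sat" yields two sequences of four tokens each, with a void token in the first sequence opposite "black".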

codify

public void codify(CorpusIndex dic)
Codifies the aligned sentences according to a given corpus index.

Parameters:
dic - The corpus index.

codifyChunks

public void codifyChunks(POSType postype)
Codifies the chunks of the aligned sentences according to a given part-of-speech tag set.

Parameters:
postype - The POS set considered.

size

public int size()
The length of this alignment, in terms of the number of tokens in each sentence, including the void tokens, usually marked with sequences of underscores.

Returns:
The alignment size/length.

subSequence

public String[] subSequence(int a,
                            int b)
Gives a sub-sequence of this alignment, delimited by two positions.

Parameters:
a - The left position.
b - The right position.
Returns:
The two strings from the aligned sub-sequence.

extractNXBubbles

public Vector<XBubble> extractNXBubbles(double minValue)
Extracts all possible bubbles from an aligned paraphrase. To extract a bubble, a certain confidence criterion must hold, namely that the length of the contexts (left and right) must outweigh the length of the middle region.

Parameters:
minValue - The minimum value upon which a bubble is extracted.
Returns:
The list with all bubbles found in this pair.
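
The confidence criterion described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of this method: the BubbleSketch class and its fields are hypothetical, both contexts are assumed to be non-empty, only maximal mismatching regions are considered, and "outweigh" is read as left + right > middle. How minValue enters the real computation is not specified here, so it is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of hultig.sumo.
final class BubbleSketch {
    /** A middle region [start, end) with the lengths of its matching contexts. */
    static final class Bubble {
        final int start, end, left, right;

        Bubble(int start, int end, int left, int right) {
            this.start = start;
            this.end = end;
            this.left = left;
            this.right = right;
        }

        int middle() {
            return end - start;
        }
    }

    /** Both arrays are assumed to be already aligned, hence of equal length. */
    static List<Bubble> extract(String[] a, String[] b) {
        List<Bubble> bubbles = new ArrayList<>();
        int n = a.length;
        int i = 0;
        while (i < n) {
            if (a[i].equals(b[i])) {
                i++;
                continue;
            }
            // Scan the maximal run of positions where the two sentences differ.
            int start = i;
            while (i < n && !a[i].equals(b[i])) i++;
            int end = i;
            // Count matching positions immediately to the left and to the right.
            int left = 0;
            for (int k = start - 1; k >= 0 && a[k].equals(b[k]); k--) left++;
            int right = 0;
            for (int k = end; k < n && a[k].equals(b[k]); k++) right++;
            // Assumed reading of the criterion: contexts must outweigh the middle.
            if (left > 0 && right > 0 && left + right > end - start) {
                bubbles.add(new Bubble(start, end, left, right));
            }
        }
        return bubbles;
    }
}
```

With this reading, a single differing word surrounded by matching words is kept as a bubble, while a three-word difference flanked by one matching word on each side is rejected.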

extractNXBubbles

public Vector<XBubble> extractNXBubbles()
Extracts all possible bubbles from an aligned paraphrase. To extract a bubble, a certain confidence criterion must hold, namely that the length of the contexts (left and right) must outweigh the length of the middle region.

Returns:
The list with all bubbles found in this pair.

extractBubblesWithBoundaries

public Vector<XBubble> extractBubblesWithBoundaries()
A new version of the extractNXBubbles() method in which the BEGIN and END meta-tags are considered. If one of the contexts is equal to one of these tags, the value of the bubble is recomputed differently, taking into account only the other context.

Returns:
The list with all bubbles found in this pair.
Since:
2012-04-25 11:45

numTrueWords

public static int numTrueWords(Word[] v)
Counts the number of true words contained in a given array of words. Here a string is considered a word if it starts with a letter and ends with a letter or a digit.

Parameters:
v - The given array of words.
Returns:
A value between 0 and v.length-1.
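
The "true word" rule stated above can be sketched on plain strings. The Word type of this library is not reproduced here, so the example works on String[] instead; the class name TrueWordSketch and the use of Character.isLetter and Character.isLetterOrDigit for the letter and digit tests are assumptions.

```java
// Hypothetical helper, not part of hultig.sumo.
final class TrueWordSketch {
    /** True if s starts with a letter and ends with a letter or a digit. */
    static boolean isTrueWord(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        char first = s.charAt(0);
        char last = s.charAt(s.length() - 1);
        return Character.isLetter(first) && Character.isLetterOrDigit(last);
    }

    /** Counts the tokens that satisfy isTrueWord. */
    static int numTrueWords(String[] tokens) {
        int count = 0;
        for (String t : tokens) {
            if (isTrueWord(t)) {
                count++;
            }
        }
        return count;
    }
}
```

For the tokens "The", "cat", ",", "____", "B2", "2nd" and "end.", this sketch counts three true words ("The", "cat" and "B2"): punctuation and void tokens do not start with a letter, "2nd" starts with a digit, and "end." ends with a period.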

print

public void print()
A shortcut for the print(int level) method, with level equal to 3.


print

public void print(int level)
Outputs this alignment.

Parameters:
level - A code stating the amount of information to be printed in the standard output.

colorizedChunks

public String[] colorizedChunks()
Gives the string pair containing this alignment, marked with XML chromatic tags. See colorTags(int cod).

Returns:
The alignment pair.

colorizedChunks

public String[] colorizedChunks(POSType post)
Gives the string pair containing this alignment, marked with XML chromatic tags. See colorTags(int cod). If the parameter flag is active (true), an array with four strings is returned: the first two contain the chromatically marked sentences, and the third and fourth are the corresponding part-of-speech strings for the first and second.

Parameters:
withPOS - The part-of-speech flag.
Returns:
An array with two or four strings, depending on the withPOS flag.

printWithColors

public void printWithColors()
Outputs this alignment marked with XML chromatic tags.


main

public static void main(String[] args)
The main method exemplifies the use of this class.

Parameters:
args -