java.lang.Object
  java.util.AbstractCollection<E>
    java.util.AbstractList<E>
      java.util.AbstractSequentialList<E>
        java.util.LinkedList<Word>
          hultig.sumo.Sentence
public class Sentence
Represents a textual sentence using various schemes or interpretations. For instance, a sentence may be interpreted as a sequence of characters or as a sequence of words, represented by a linked list of words. This class manages the different kinds of sentence representation.
University of Beira Interior (UBI)
Centre For Human Language Technology and Bioinformatics (HULTIG)
Field Summary | |
---|---|
int | cod - A sentence index, used in news clustering. |
String | label - This label defines a sentence meta-tag. |
static String | parentise - The set of text delimiters. |
static String | pontuacao - The set of punctuation marks. |
protected String | stx - Internal string representation of this sentence. |
Fields inherited from class java.util.AbstractList |
---|
modCount |
Constructor Summary | |
---|---|
Sentence() | Default constructor. |
Sentence(String s) | Creates a new sentence from a given string. |
Method Summary | |
---|---|
void | addWord(Word w) - Appends a new word to this sentence. |
void | codify(CorpusIndex dic) - Codifies this sentence according to a given, previously processed dictionary. |
static void | codify(Sentence... vs) - Static method to codify a set of sentences. |
int | compareTo(Object other) |
static int | countIntersectLinks(Sentence sa, Sentence sb) - Counts the number of link intersections existing between the two sentences. |
int | countMatch(String regex) |
static int | countMatchNGram(int N, Sentence sa, Sentence sb) - Counts the number of exclusive n-gram matches between two sentences. |
static double | countNormIntersectLinks(Sentence sa, Sentence sb) - Percentage of link intersections existing between the two sentences. |
int | countNotMatch(String regex) - Counts the number of words from this sentence that do not match a given regular expression. |
int | countNumWords() - Counts the number of words in this sentence. |
static int | ctMatchNGram(int N, Sentence sa, Sentence sb) - Counts the number of n-gram matches between two sentences. |
static void | demoForWebPage() |
double | dgauss(Sentence other) - A simple version of the Gaussian similarity between two sentences. |
double | dgauss(Sentence other, double p0, double r0, double sp0, double sr0) - The Gaussian similarity between two sentences. |
double | distlex(Word w) - The minimum lexical distance of a word to any word in this sentence. |
double | dLinear(Sentence other) - The linear similarity metric between two sentences. |
double | dParabolic(Sentence other) - The parabolic sentence similarity metric. |
static double | dsBLEU(Sentence sa, Sentence sb) - Computes the BLEU metric between two sentences. |
double | dsEntropy(Sentence other) - The "entropy metric" for calculating the similarity between two sentences. |
double | dSin(Sentence other) - The trigonometric function for calculating the similarity between two sentences. |
int | dsLevenshtein(Sentence other) - Applies the Edit Distance (ED) metric to compare this sentence with another one. |
static double | dsNgram(int N, Sentence sa, Sentence sb) - Computes the simple n-gram overlap between two sentences, considering a maximum number of n-grams. |
static double | dsNgram(Sentence sa, Sentence sb) - Computes the simple n-gram overlap between two sentences, with 4 as the maximum n-gram counted. |
static double | dsuffixArrays(Sentence sa, Sentence sb) - A metric for calculating sentence proximity, based on suffix array comparisons of n-grams, as defined by Church and Yamamoto. |
static double | dsumo(int[] u, int[] v) - The "sumo metric" for calculating the similarity between two sentences. |
double | dsumo(Sentence other) - The "sumo metric" for calculating the similarity between two sentences. |
double | dsumoWSize(Sentence other) - A different version of the sumo function for calculating sentence similarity between two sentences. |
static void | ensureCodification(Sentence... sentences) - Ensures that a given set of sentences is codified, which means that their words have been marked with a word indexer (a CorpusIndex object). |
static boolean | equalArrays(int[] u, int[] v) - Verifies whether two arrays are equal. |
double | fracNumWords() - The proportion of effective words contained in this sentence. |
int[] | getCodes() - Gives the array of lexical codes representing this sentence. |
String | getTag(int index) - The POS tag, if defined, for a given word. |
String[] | getTags() - Gives the array of POS tags for this sentence, assuming it was already tagged. |
String | getWord(int index) - Gives the sentence word positioned at a given index. |
String[] | getWords() - Gives an array of strings containing all the words in this sentence. |
int | indexOf(String s) - Gives the index of a string in this sentence. |
int | indexOf(String s, int from) - Gives the index of a string occurrence within this sentence, starting the search from a given position. |
boolean | isCodefied() - Verifies whether this sentence has been marked with a CorpusIndex object. |
static boolean | isPunct(String s) - Tests if a given string is a punctuation mark. |
static boolean | isWord(String s) - Tests if a given string is a word. |
int | length() - The string length of this sentence. |
static void | main(String[] args) - The main method contains a general class tester. |
static int | match(int[] vsub, int[] v) - Counts the number of occurrences of a sub-array inside another, presumably longer, array. |
Sentence | mutation(int n) - Produces a given number of random "mutations" in this sentence. |
void | print() - Outputs the string representing this sentence. |
void | print(int a, int b) - Outputs the words of this sentence between two positions. |
void | println() - Outputs all words from this sentence, one word per line. |
static int[][] | readLinks(int[] va, int[] vb) - Returns the set of links between two sentences: A = {a1, a2, ..., an}, where ak = (k1, k2) is an integer pair representing the link between the word at position k1 in one sentence and the word at position k2 in the other sentence. |
int[][] | readLinks(Sentence other) - Returns the set of links between two sentences: A = {a1, a2, ..., an}, where ak = (k1, k2) is an integer pair representing the link between the word at position k1 in one sentence and the word at position k2 in the other sentence. |
void | reload(Sentence s) - Recreates this sentence from another one. |
void | set(String s) - Recreates this sentence from a given string. |
void | setMetric(String smetric) - Defines the default similarity function to be used in the sentence similarity computation. |
double | similarity(Sentence other) - Computes the similarity metric between two sentences. |
double | similarity(Sentence other, String metric) - Calculates the similarity between two sentences using a given similarity function. |
Sentence[] | splitPunct() - Splits a sentence based on the punctuation marks found. |
int[] | subcodes(int start, int end) - Gives the array of sub-codes corresponding to a sub-sentence of this sentence. |
Sentence | subs(int a, int b) - Gives a sub-sentence from this sentence, between positions a and b, which should be valid. |
static void | testaMetricas(String s1, String s2) |
void | toLowerCase() - Converts every word to lower case and transforms their CorpusIndex codes to -1. |
void | toLowerCase(CorpusIndex dic) - Converts every word to lower case and redefines each word's lexical code, based on a supplied dictionary. |
String | toMWUString() - Transforms this sentence into a kind of multi-word-unit (MWU) expression. |
String | toString() - Overrides the toString() method. |
String | toStringPOS() - A toString()-type method giving each word joined with its respective part-of-speech tag. |
static void | x201102012359() |
static void | x201102281055() - Corrections following the exhaustive tests carried out by Steven Burrows. |
Methods inherited from class java.util.LinkedList |
---|
add, add, addAll, addAll, addFirst, addLast, clear, clone, contains, descendingIterator, element, get, getFirst, getLast, indexOf, lastIndexOf, listIterator, offer, offerFirst, offerLast, peek, peekFirst, peekLast, poll, pollFirst, pollLast, pop, push, remove, remove, remove, removeFirst, removeFirstOccurrence, removeLast, removeLastOccurrence, set, size, toArray, toArray |
Methods inherited from class java.util.AbstractSequentialList |
---|
iterator |
Methods inherited from class java.util.AbstractList |
---|
equals, hashCode, listIterator, removeRange, subList |
Methods inherited from class java.util.AbstractCollection |
---|
containsAll, isEmpty, removeAll, retainAll |
Methods inherited from class java.lang.Object |
---|
finalize, getClass, notify, notifyAll, wait, wait, wait |
Methods inherited from interface java.util.List |
---|
containsAll, equals, hashCode, isEmpty, iterator, listIterator, removeAll, retainAll, subList |
Methods inherited from interface java.util.Deque |
---|
iterator |
Field Detail |
---|
public static String pontuacao
public static String parentise
protected String stx
public String label
public int cod
Constructor Detail |
---|
public Sentence()
public Sentence(String s)
s - The string containing a sentence.
Method Detail |
---|
public int compareTo(Object other)
Specified by: compareTo in interface Comparable
public void reload(Sentence s)
s - The other sentence.
public void addWord(Word w)
w - The word to be appended.
public void set(String s)
s - The indicated string.
public void codify(CorpusIndex dic)
dic - The indicated dictionary.
public static void codify(Sentence... vs)
vs - Sentence[]
public int length()
public String getWord(int index)
index - The index to read from.
public String[] getWords()
public String getTag(int index)
index - The word position in the sentence.
public int[] getCodes()
public String[] getTags()
public boolean isCodefied()
Returns the true value on success.
public static boolean isPunct(String s)
s - The string to be tested.
public static boolean isWord(String s)
s - The string to be tested.
public int countNumWords()
public int indexOf(String s)
s - The string to be scanned in this sentence.
public int indexOf(String s, int from)
s - The string to be scanned in this sentence.
from - The starting index.
See also: indexOf(String s)
public Sentence subs(int a, int b)
a - One index.
b - The other index.
public Sentence[] splitPunct()
public int dsLevenshtein(Sentence other)
other - The other sentence.
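dsLevenshtein applies the Edit Distance metric to a pair of sentences. The following is a minimal, self-contained sketch of a word-level edit distance, assuming that the comparison is made word by word rather than character by character. The class WordEditDistance and its distance method are hypothetical names used for illustration only; they are not part of hultig.sumo and do not reproduce its implementation.

```java
/** Hypothetical helper (not part of hultig.sumo): word-level edit distance. */
final class WordEditDistance {

    /**
     * Minimum number of word insertions, deletions and substitutions
     * needed to turn sequence a into sequence b (unit costs assumed).
     */
    static int distance(String[] a, String[] b) {
        int[] prev = new int[b.length + 1];
        int[] curr = new int[b.length + 1];
        // Turning an empty prefix of a into the first j words of b takes j insertions.
        for (int j = 0; j <= b.length; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length; i++) {
            curr[0] = i; // i deletions to reach an empty prefix of b
            for (int j = 1; j <= b.length; j++) {
                int cost = a[i - 1].equals(b[j - 1]) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length];
    }

    public static void main(String[] args) {
        String[] a = "the cat sat on the mat".split(" ");
        String[] b = "the cat lay on a mat".split(" ");
        System.out.println(distance(a, b)); // prints 2
    }
}
```

Only two rows of the dynamic-programming table are kept, so memory use grows with the length of the second sentence only.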
public int[] subcodes(int start, int end)
start - The start position of the sub-sentence.
end - The end position of the sub-sentence.
public static int match(int[] vsub, int[] v)
vsub - The sub-array.
v - The longer array.
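match counts the occurrences of a sub-array inside another, presumably longer, array. A minimal sketch of such a count is given below, assuming that occurrences are contiguous and that overlapping occurrences are each counted; the documentation does not state how the library treats overlaps. SubArrayMatch and countOccurrences are hypothetical names, not part of hultig.sumo.

```java
/** Hypothetical helper (not part of hultig.sumo): sub-array occurrence counting. */
final class SubArrayMatch {

    /**
     * Counts the start positions in v at which vsub occurs as a contiguous run.
     * Overlapping occurrences are counted separately (an assumption).
     */
    static int countOccurrences(int[] vsub, int[] v) {
        if (vsub.length == 0 || vsub.length > v.length) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i + vsub.length <= v.length; i++) {
            int j = 0;
            while (j < vsub.length && v[i + j] == vsub[j]) {
                j++;
            }
            if (j == vsub.length) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] v = {7, 3, 7, 3, 7};
        int[] vsub = {7, 3, 7};
        System.out.println(countOccurrences(vsub, v)); // prints 2
    }
}
```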
public static boolean equalArrays(int[] u, int[] v)
u - The first array.
v - The second array.
public static int ctMatchNGram(int N, Sentence sa, Sentence sb)
N - The n-gram size.
sa - The first sentence.
sb - The second sentence.
public static int countMatchNGram(int N, Sentence sa, Sentence sb)
N - The n-gram size.
sa - The first sentence.
sb - The other sentence.
public int[][] readLinks(Sentence other)
other - The other sentence.
public static int[][] readLinks(int[] va, int[] vb)
va - The first sentence array.
vb - The second sentence array.
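readLinks returns links as integer pairs (k1, k2) connecting a word position in one sentence with a word position in the other. The sketch below shows one possible reading of that description, assuming that a link exists wherever the two code arrays hold the same lexical code and that every such pair is reported. Whether the library restricts links further (for example, to one link per word) is not stated here. LexicalLinks and links are hypothetical names, not part of hultig.sumo.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of hultig.sumo): links between two code arrays. */
final class LexicalLinks {

    /**
     * Returns every pair {i, j} such that va[i] == vb[j], in row-major order.
     * Reporting all equal-code pairs is an assumption of this sketch.
     */
    static int[][] links(int[] va, int[] vb) {
        List<int[]> out = new ArrayList<>();
        for (int i = 0; i < va.length; i++) {
            for (int j = 0; j < vb.length; j++) {
                if (va[i] == vb[j]) {
                    out.add(new int[] {i, j});
                }
            }
        }
        return out.toArray(new int[0][]);
    }

    public static void main(String[] args) {
        int[] va = {5, 8, 2};
        int[] vb = {2, 5, 5};
        System.out.println(Arrays.deepToString(links(va, vb)));
        // prints [[0, 1], [0, 2], [2, 0]]
    }
}
```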
public int countMatch(String regex)
public int countNotMatch(String regex)
regex - The indicated regular expression.
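countNotMatch counts the words of a sentence that do not match a regular expression; countMatch is presumably its complement. A minimal sketch over a plain array of words follows, assuming that a word "matches" only when the whole word matches the expression. RegexWordCount is a hypothetical class, not part of hultig.sumo.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not part of hultig.sumo): regex-based word counting. */
final class RegexWordCount {

    /** Counts the words that match regex entirely (full-match semantics assumed). */
    static int countMatch(String[] words, String regex) {
        Pattern pattern = Pattern.compile(regex);
        int count = 0;
        for (String word : words) {
            if (pattern.matcher(word).matches()) {
                count++;
            }
        }
        return count;
    }

    /** Counts the words that do not match regex. */
    static int countNotMatch(String[] words, String regex) {
        return words.length - countMatch(words, regex);
    }

    public static void main(String[] args) {
        String[] words = {"In", "2007", ",", "12", "papers"};
        System.out.println(countMatch(words, "[0-9]+"));    // prints 2
        System.out.println(countNotMatch(words, "[0-9]+")); // prints 3
    }
}
```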
public static int countIntersectLinks(Sentence sa, Sentence sb)
Cordeiro, J.P., Dias, G., Cleuziou, G. (2007). Biology Based Alignments of Paraphrases for Sentence Compression. In Proceedings of the Workshop on Textual Entailment and Paraphrasing (ACL-PASCAL / ACL2007). Prague, Czech Republic.
sa - The first sentence.
sb - The second sentence.
public static double countNormIntersectLinks(Sentence sa, Sentence sb)
sa - The first sentence.
sb - The second sentence.
public static double dsBLEU(Sentence sa, Sentence sb)
sa - The first sentence.
sb - The second sentence.
public static double dsNgram(Sentence sa, Sentence sb)
sa - The first sentence.
sb - The second sentence.
public static double dsNgram(int N, Sentence sa, Sentence sb)
N - The maximum number of n-grams.
sa - The first sentence.
sb - The second sentence.
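dsNgram computes a simple n-gram overlap up to a maximum n-gram size. The sketch below illustrates one common formulation, under stated assumptions: for each n from 1 to the maximum, the number of shared n-grams (counted with multiplicity) is divided by the number of n-grams in the shorter sentence, and the ratios are averaged over the sizes that fit. The exact normalisation used by the library is not given in this page, so this is an illustration only. NgramOverlap, ngrams and overlap are hypothetical names, not part of hultig.sumo.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of hultig.sumo): simple word n-gram overlap. */
final class NgramOverlap {

    /** Counts each word n-gram of size n, keyed by its space-joined words. */
    static Map<String, Integer> ngrams(String[] words, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= words.length; i++) {
            String key = String.join(" ", Arrays.copyOfRange(words, i, i + n));
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    /**
     * Averages, over n = 1..maxN, the fraction of shared n-grams relative to
     * the n-gram count of the shorter sentence (an assumed normalisation).
     * Sizes larger than the shorter sentence are skipped.
     */
    static double overlap(int maxN, String[] a, String[] b) {
        double sum = 0.0;
        int used = 0;
        for (int n = 1; n <= maxN; n++) {
            int inShorter = Math.min(a.length, b.length) - n + 1;
            if (inShorter <= 0) {
                break;
            }
            Map<String, Integer> ga = ngrams(a, n);
            Map<String, Integer> gb = ngrams(b, n);
            int shared = 0;
            for (Map.Entry<String, Integer> e : ga.entrySet()) {
                shared += Math.min(e.getValue(), gb.getOrDefault(e.getKey(), 0));
            }
            sum += (double) shared / inShorter;
            used++;
        }
        return used == 0 ? 0.0 : sum / used;
    }

    public static void main(String[] args) {
        String[] a = "the cat sat".split(" ");
        String[] b = "the cat ran".split(" ");
        // Unigrams: 2 of 3 shared; bigrams: 1 of 2 shared; average is 7/12.
        System.out.println(overlap(2, a, b));
    }
}
```

Identical sentences score 1.0 and sentences with no word in common score 0.0 under this formulation.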
public static double dsuffixArrays(Sentence sa, Sentence sb)
sa - The first sentence.
sb - The second sentence.
public double dsumo(Sentence other)
Cordeiro, J.P., Dias, G., Brazdil, P. (2007). Learning Paraphrases from WNS Corpora. 20th International FLAIRS Conference. AAAI Press. Key West, Florida, USA.
other - The other sentence to compare with.
public static double dsumo(int[] u, int[] v)
Cordeiro, J.P., Dias, G., Brazdil, P. (2007). Learning Paraphrases from WNS Corpora. 20th International FLAIRS Conference. AAAI Press. Key West, Florida, USA.
u - The first array of codes.
v - The second array of codes.
public double dsumoWSize(Sentence other)
A different version of the sumo function for calculating sentence similarity between two sentences. The main difference consists in counting differently the exclusive lexical links between the two sentences. The "weight" of each link depends directly on the sizes of the connected words.
other - The other sentence to compare with.
public double dsEntropy(Sentence other)
Cordeiro, J.P., Dias, G., Cleuziou, G., Brazdil, P. (2007). New Functions for Unsupervised Asymmetrical Paraphrase Detection. In Journal of Software. Volume: 2, Issue: 4, Page(s): 12-23. Academy Publisher. Finland. ISSN: 1796-217X. October 2007.
Date: 2007-06-18
other - The other sentence to compare with.
public double dgauss(Sentence other)
Cordeiro, J.P., Dias, G., Cleuziou, G., Brazdil, P. (2007). New Functions for Unsupervised Asymmetrical Paraphrase Detection. In Journal of Software. Volume: 2, Issue: 4, Page(s): 12-23. Academy Publisher. Finland. ISSN: 1796-217X. October 2007.
other - The other sentence.
public double dgauss(Sentence other, double p0, double r0, double sp0, double sr0)
Cordeiro, J.P., Dias, G., Cleuziou, G., Brazdil, P. (2007). New Functions for Unsupervised Asymmetrical Paraphrase Detection. In Journal of Software. Volume: 2, Issue: 4, Page(s): 12-23. Academy Publisher. Finland. ISSN: 1796-217X. October 2007.
other - The other sentence.
p0 - The expected precision of the token match between the sentences.
r0 - The expected recall of the token match between the sentences.
sp0 - The expected precision variance.
sr0 - The expected recall variance.
public double dParabolic(Sentence other)
Cordeiro, J.P., Dias, G., Cleuziou, G., Brazdil, P. (2007). New Functions for Unsupervised Asymmetrical Paraphrase Detection. In Journal of Software. Volume: 2, Issue: 4, Page(s): 12-23. Academy Publisher. Finland. ISSN: 1796-217X. October 2007.
other - The other sentence.
public double dLinear(Sentence other)
other - The other sentence.
public double dSin(Sentence other)
Cordeiro, J.P., Dias, G., Cleuziou, G., Brazdil, P. (2007). New Functions for Unsupervised Asymmetrical Paraphrase Detection. In Journal of Software. Volume: 2, Issue: 4, Page(s): 12-23. Academy Publisher. Finland. ISSN: 1796-217X. October 2007.
other - The other sentence.
public static void ensureCodification(Sentence... sentences)
Ensures that a given set of sentences is codified, which means that their words have been marked with a word indexer (a CorpusIndex object). If not, the set of sentences will be marked with a new and specific word indexer, constructed only from the set of sentences received as parameter.
sentences - The set of sentences.
public double distlex(Word w)
w - The input word.
public double fracNumWords()
public void setMetric(String smetric)
smetric - Contains the name of the similarity function. The possible values are: ngram, xgram, bleu, edit, entropy, or sumo.
See also: The defined metric codes.
public double similarity(Sentence other)
other - The other sentence.
See also: The defined metric codes.
public double similarity(Sentence other, String metric)
other - The other sentence.
metric - The name of the similarity function.
public void print(int a, int b)
a - The first position.
b - The second position.
public void print()
public void println()
public void toLowerCase()
Converts every word to lower case and transforms their CorpusIndex codes to -1. Thus, any lexical codification will be eliminated.
public void toLowerCase(CorpusIndex dic)
dic - The dictionary.
public String toString()
Overrides: toString in class AbstractCollection<Word>
public String toStringPOS()
public String toMWUString()
public Sentence mutation(int n)
n - The maximum and likely number of mutations.
public static void x201102012359()
public static void x201102281055()
public static void testaMetricas(String s1, String s2)
public static void demoForWebPage()
public static void main(String[] args)
args - The command-line arguments.