java.lang.Object
  hultig.sumo.Word
public class Word
A class to represent and process a textual word.
University of Beira Interior (UBI)
Centre For Human Language Technology and Bioinformatics (HULTIG)
| Field Summary | | |
|---|---|---|
| ChunkTag | CHTAG | |
| int[] | cods | Introduced later, in June 2008. |
| long | FREQ | |
| static String | RPUNCT | |
| static long | serialVersionUID | |

| Constructor Summary | |
|---|---|
| Word() | Default constructor. |
| Word(String word) | Create a new word from the given String. |
| Word(String word, int syntcod) | Create a Word and mark it with a syntactic code. |
| Word(String word, String meta_item) | Create a new word from the given String and label it with a meta-tag. |
| Word(String word, String[] meta) | Create a word, labeling it with an array of multi-tags. |

| Method Summary | | |
|---|---|---|
| char | charAt(int k) | Access a word character at a given position. |
| double | connectProb(Word w) | Similar to costAlign but inverted and normalized in the [0, 1] interval. |
| double | costAlign(Word w) | Cost of aligning two words. |
| double | distcos(Word w) | Another lexical metric, based on the cosine. |
| float | distlex(String s) | Calls "distlex(word.toString(), s, 2f)". |
| static float | distlex(String sa, String sb) | Calls "distlex(sa, sb, 2f)". |
| static float | distlex(String sa, String sb, float q) | Implements a metric that calculates the lexical distance between two words. |
| float | distlex(Word w) | Calls "distlex(word.toString(), w.toString(), 2f)". |
| float | distlex(Word w, float q) | Calls "distlex(word.toString(), w.toString(), q)". |
| static float | distlexSuffix(String sa, String sb) | Calls "distlexSuffix(sa, sb, 2f)". |
| static float | distlexSuffix(String sa, String sb, float q) | Implements a metric similar to "distlex". |
| static double | distSeqMax(String sa, String sb) | A normalized Edit Distance, normalized by the maximum common sequence between the two sentences (presented at ACL 2007). |
| double | dnormEditDistance(Word w) | Computes a normalized Edit Distance of two words. |
| static int | editDistance(String s, String t) | Computes the Levenshtein Distance, also known as the Edit Distance. |
| int | editDistance(Word w) | Calls the method "editDistance(this.toString(), w.toString())". |
| int | editProximity(Word w) | The Edit Distance complement. |
| boolean | equals(Word w) | Equality test for two words, this and the other one. |
| int | getChkCod() | Obtain the chunk code. |
| int | getLexCod() | Obtain this word's lexical code. |
| String | getMetaValue(String metatag) | Return a given meta-tag value associated with this word. |
| String | getPOS() | Gives the POS tag of this word. |
| String | getPOS(int size) | Get the first "size" characters of the POS label. |
| String | getPOS(POSType post) | |
| int | getPosCod() | Obtain the POS code. |
| String | getTag() | Get the POS tag of this word, if any is defined. |
| boolean | hasPOS() | Test whether this word is POS tagged or not. |
| boolean | isEmpty() | Test whether this word is undefined or not. |
| boolean | isNumWord() | Test if this is a number or a word. |
| static boolean | isPunct(char c) | Test if a given character is a punctuation mark. |
| boolean | isRPUNCT() | Test whether this is a punctuation mark. |
| boolean | isWord() | Test whether this is really a word, and not, for example, a number, a punctuation mark, or any other token. |
| int | length() | Gives the word length. |
| static void | main(String[] args) | The main method tests this class by executing several experiments for a predefined set of word pairs. |
| void | posLabel(POSType post) | |
| void | set(String word) | Redefines this word based on the received string, which is assumed to contain just the alpha sequence representing a single word. |
| void | set(String word, String meta_item) | Redefines this word based on the received string, which is assumed to contain just the alpha sequence representing a single word. |
| void | set(String word, String[] meta) | |
| void | setChkCod(int chkcod) | Sets the chunk code of this word, meaning that this word is contained in a chunk (shallow parsing) with that code. |
| void | setLexCod(int lexcod) | Defines the word lexical code. |
| void | setMetaTag(String metatag, String value) | |
| void | setPOS(char[] v) | Sets the POS tag of this word to a valid POS tag. |
| void | setPOS(String tag) | Sets the POS tag of this word to a valid POS tag. |
| void | setPosCod(int poscod) | Sets the POS tag code for this word. |
| String | toLowerCase() | Convert all characters from this word to lower case. |
| String | toString() | Override of the toString() method. |
| String | toString(boolean with_pos_tags) | A specific toString method. |
| String | toStringPOS() | A toString() type method giving the word string concatenated with its part-of-speech tag, if defined. |
| String | toStringPOS(POSType postype) | Similar to the toStringPOS() method, except that the part-of-speech representation is passed by parameter. |
| static String | words2StringPOS(Word[] words, POSType post) | Transform an array of words into a single string, with each word concatenated with its POS tag. |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
|---|
public static final long serialVersionUID
public int[] cods
public ChunkTag CHTAG
public static String RPUNCT
public long FREQ
| Constructor Detail |
|---|
public Word()
public Word(String word)
See the set(String word) method.
word - The String containing the word.
public Word(String word,
String meta_item)
See the set(String word) method. The created word is also labeled with a meta-tag.
word - The String containing the word.
meta_item - The meta-tag labeling the created word.
public Word(String word,
String[] meta)
word - The String containing the word.
meta - The array of multi-tags.
public Word(String word,
int syntcod)
word -
syntcod -
| Method Detail |
|---|
public final void set(String word)
word - The received string.
public void set(String word,
String meta_item)
word - The received string.
meta_item - The meta-tag associated with this word.
public char charAt(int k)
throws IndexOutOfBoundsException
k - The position to read.
IndexOutOfBoundsException
public void setLexCod(int lexcod)
lexcod - The code.
public void setPosCod(int poscod)
poscod - The POS code.
public void setChkCod(int chkcod)
chkcod - The chunk code.
public int getLexCod()
public int getPosCod()
public int getChkCod()
public void set(String word,
String[] meta)
public void setMetaTag(String metatag,
String value)
public boolean hasPOS()
public String getPOS()
public String getPOS(POSType post)
public String getPOS(int size)
size - int
public void setPOS(char[] v)
v - char[]
public void setPOS(String tag)
tag - String
public String toLowerCase()
public boolean equals(Word w)
w - The other word.
public String toString()
toString in class Object
public String toString(boolean with_pos_tags)
with_pos_tags - The part-of-speech flag.
public void posLabel(POSType post)
public String toStringPOS(POSType postype)
postype - The POS representation.
public String toStringPOS()
public static String words2StringPOS(Word[] words,
POSType post)
words - The array of words.
post - The POS representation.
public String getTag()
public String getMetaValue(String metatag)
metatag - The meta-tag (ex: "polarity")
public boolean isEmpty()
public static boolean isPunct(char c)
c - The character to be tested.
public boolean isWord()
public boolean isNumWord()
public boolean isRPUNCT()
public int length()
public static float distlex(String sa,
String sb,
float q)
sa - One word string.
sb - The other word string.
q - A formula parameter.
public static float distlexSuffix(String sa,
String sb,
float q)
sa - One word string.
sb - The other word string.
q - A formula parameter.
public static float distlexSuffix(String sa,
String sb)
sa - One word string.
sb - The other word string.
public static float distlex(String sa,
String sb)
sa - One word string.
sb - The other word string.
public float distlex(String s)
s - The other word string.
public float distlex(Word w,
float q)
w - The other word.
q - A formula parameter.
public float distlex(Word w)
w - The other word.
public double distcos(Word w)
w - The other word.
public double dnormEditDistance(Word w)
w - The other word to compare to.
public static double distSeqMax(String sa,
String sb)
sa - One string.
sb - The other string.
public int editProximity(Word w)
max(size(wa), size(wb)) - editDistance(wa, wb)
w - The other word to compare to.
public int editDistance(Word w)
w - The other word.
public static int editDistance(String s,
String t)
s - One string.
t - The other string.
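As an illustration of what editDistance(String s, String t) and editProximity(Word w) compute, here is a minimal standalone sketch. It assumes the textbook Levenshtein recurrence with unit costs for insertion, deletion and substitution, and the proximity formula given for editProximity (length of the longer word minus the edit distance). The class name EditDistanceSketch and the String-based editProximity helper are hypothetical and not part of hultig.sumo.Word; the actual implementation in this class may differ.

```java
// Hypothetical sketch, not the hultig.sumo.Word source code.
public class EditDistanceSketch {

    /** Levenshtein distance with unit costs, using two rolling rows of the DP table. */
    public static int editDistance(String s, String t) {
        int n = s.length();
        int m = t.length();
        if (n == 0) return m;   // only insertions are needed
        if (m == 0) return n;   // only deletions are needed

        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j;        // distance from the empty prefix of s
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i;        // distance to the empty prefix of t
            for (int j = 1; j <= m; j++) {
                int cost = (s.charAt(i - 1) == t.charAt(j - 1)) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            int[] tmp = prev;   // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }

    /** Edit proximity (assumed formula): length of the longer string minus the edit distance. */
    public static int editProximity(String a, String b) {
        return Math.max(a.length(), b.length()) - editDistance(a, b);
    }

    public static void main(String[] args) {
        System.out.println(editDistance("kitten", "sitting"));   // prints 3
        System.out.println(editProximity("kitten", "sitting"));  // prints 4 (7 - 3)
    }
}
```

Under these assumptions, identical words get a distance of 0 and a proximity equal to their length, so higher proximity values indicate more similar words.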
public double costAlign(Word w)
costAlign: Word × Word → [0, +∞)
w - Word
public double connectProb(Word w)
w - Word
public static void main(String[] args)
args - String[]