java.lang.Object
  hultig.sumo.Word
public class Word
A class to represent and process a textual word.
University of Beira Interior (UBI)
Centre For Human Language Technology and Bioinformatics (HULTIG)
Field Summary
| Modifier and Type | Field | Description |
|---|---|---|
| ChunkTag | CHTAG | |
| int[] | cods | Introduced later, in June 2008. |
| long | FREQ | |
| static String | RPUNCT | |
| static long | serialVersionUID | |
Constructor Summary
| Constructor | Description |
|---|---|
| Word() | Default constructor. |
| Word(String word) | Create a new word from a given String. |
| Word(String word, int syntcod) | Create a Word and mark it with a syntactic code. |
| Word(String word, String meta_item) | Create a new word from a given String and label it with a meta-tag. |
| Word(String word, String[] meta) | Create a word, labeling it with an array of multi-tags. |
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| char | charAt(int k) | Access the word character at a given position. |
| double | connectProb(Word w) | Similar to costAlign, but inverted and normalized to the [0, 1] interval. |
| double | costAlign(Word w) | Cost of aligning two words. |
| double | distcos(Word w) | Another lexical metric, based on the cosine. |
| float | distlex(String s) | Calls "distlex(word.toString(), s, 2f)". |
| static float | distlex(String sa, String sb) | Calls "distlex(sa, sb, 2f)". |
| static float | distlex(String sa, String sb, float q) | Implements a metric that calculates the lexical distance between two words. |
| float | distlex(Word w) | Calls "distlex(word.toString(), w.toString(), 2f)". |
| float | distlex(Word w, float q) | Calls "distlex(word.toString(), w.toString(), q)". |
| static float | distlexSuffix(String sa, String sb) | Calls "distlexSuffix(sa, sb, 2f)". |
| static float | distlexSuffix(String sa, String sb, float q) | Implements a metric similar to the one in "distlex". |
| static double | distSeqMax(String sa, String sb) | A normalized Edit Distance that normalizes by the maximum common sequence between the two sentences (presented at ACL 2007). |
| double | dnormEditDistance(Word w) | Computes a normalized Edit Distance of two words. |
| static int | editDistance(String s, String t) | Computes the Levenshtein Distance, also known as the Edit Distance. |
| int | editDistance(Word w) | Calls the method "editDistance(this.toString(), w.toString())". |
| int | editProximity(Word w) | The Edit Distance complement. |
| boolean | equals(Word w) | Equality test between this word and another one. |
| int | getChkCod() | Obtain the chunk code. |
| int | getLexCod() | Obtain the lexical code of this word. |
| String | getMetaValue(String metatag) | Return the value of a given meta-tag associated with this word. |
| String | getPOS() | Gives the POS tag of this word. |
| String | getPOS(int size) | Get the first size characters of the POS label. |
| String | getPOS(POSType post) | |
| int | getPosCod() | Obtain the POS code. |
| String | getTag() | Get the POS tag of this word, if one is defined. |
| boolean | hasPOS() | Test whether this word is POS tagged. |
| boolean | isEmpty() | Test whether this word is undefined. |
| boolean | isNumWord() | Test whether this is a number or a word. |
| static boolean | isPunct(char c) | Test whether a given character is a punctuation mark. |
| boolean | isRPUNCT() | Test whether this is a punctuation mark. |
| boolean | isWord() | Test whether this is really a word and not, for example, a number, a punctuation mark, or any other token. |
| int | length() | Gives the word length. |
| static void | main(String[] args) | Tests this class by running several experiments on a predefined set of word pairs. |
| void | posLabel(POSType post) | |
| void | set(String word) | Redefines this word from the received string, which is assumed to contain only the alphabetic sequence of a single word. |
| void | set(String word, String meta_item) | Redefines this word from the received string, as in set(String word), and associates it with a meta-tag. |
| void | set(String word, String[] meta) | |
| void | setChkCod(int chkcod) | Sets the chunk code of this word, meaning that this word belongs to a chunk (shallow parsing) with that code. |
| void | setLexCod(int lexcod) | Defines the lexical code of this word. |
| void | setMetaTag(String metatag, String value) | |
| void | setPOS(char[] v) | Sets the POS tag of this word to a valid POS tag. |
| void | setPOS(String tag) | Sets the POS tag of this word to a valid POS tag. |
| void | setPosCod(int poscod) | Sets the POS tag code for this word. |
| String | toLowerCase() | Convert all characters of this word to lower case. |
| String | toString() | Override of the toString() method. |
| String | toString(boolean with_pos_tags) | A specific toString method. |
| String | toStringPOS() | A toString()-type method giving the word string concatenated with its part-of-speech tag, if defined. |
| String | toStringPOS(POSType postype) | Similar to toStringPOS(), except that the part-of-speech representation is passed as a parameter. |
| static String | words2StringPOS(Word[] words, POSType post) | Transform an array of words into a single string, with each word concatenated with its POS tag. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public static final long serialVersionUID
public int[] cods
public ChunkTag CHTAG
public static String RPUNCT
public long FREQ
Constructor Detail |
---|
public Word()
public Word(String word)
Creates a new word from a given String, using the set(String word) method.
word
- The String containing the word.
public Word(String word, String meta_item)
Creates a new word from a given String, using the set(String word) method. The created word is also labeled with a meta-tag.
word
- The String containing the word.
meta_item
- The meta-tag labeling the created word.
public Word(String word, String[] meta)
word
- The String containing the word.
meta
- The array of multi-tags.
public Word(String word, int syntcod)
word
- The String containing the word.
syntcod
- The syntactic code.
Method Detail |
---|
public final void set(String word)
word
- The received string.
public void set(String word, String meta_item)
word
- The received string.
meta_item
- The meta-tag associated with this word.
public char charAt(int k) throws IndexOutOfBoundsException
k
- The position to read.
IndexOutOfBoundsException
- If k is not a valid position in this word.
public void setLexCod(int lexcod)
lexcod
- The lexical code.
public void setPosCod(int poscod)
poscod
- The POS code.
public void setChkCod(int chkcod)
chkcod
- The chunk code.
public int getLexCod()
public int getPosCod()
public int getChkCod()
public void set(String word, String[] meta)
public void setMetaTag(String metatag, String value)
public boolean hasPOS()
public String getPOS()
public String getPOS(POSType post)
public String getPOS(int size)
size
- The number of leading characters to take from the POS label.
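To illustrate getPOS(int size), here is a minimal sketch, not the library source, that truncates a POS label to its first size characters, for example reducing a Penn Treebank-style tag such as "NNS" to "NN". It assumes the label is held as a String, returns null for an untagged word, and clamps size to the label length; the class name PosPrefix is hypothetical.

```java
// Hypothetical sketch of getPOS(int size); not the hultig.sumo.Word source.
// Assumptions: the POS label is a String, null means "no POS tag", and a
// size larger than the label returns the whole label instead of throwing.
final class PosPrefix {
    private PosPrefix() {}

    /** Returns the first {@code size} characters of {@code posLabel}. */
    static String prefix(String posLabel, int size) {
        if (posLabel == null) {
            return null; // the word carries no POS tag
        }
        if (size < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        return posLabel.substring(0, Math.min(size, posLabel.length()));
    }
}
```

Clamping rather than throwing is a design choice of this sketch; the real method may behave differently when size exceeds the label length.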
public void setPOS(char[] v)
v
- The POS tag, as a character array.
public void setPOS(String tag)
tag
- The POS tag.
public String toLowerCase()
public boolean equals(Word w)
w
- The other word.
public String toString()
Overrides:
toString in class Object
public String toString(boolean with_pos_tags)
with_pos_tags
- The part-of-speech flag.
public void posLabel(POSType post)
public String toStringPOS(POSType postype)
postype
- The POS representation.
public String toStringPOS()
public static String words2StringPOS(Word[] words, POSType post)
words
- The array of words.
post
- The POS representation.
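words2StringPOS joins an array of words into one string, each word concatenated with its POS tag. The sketch below shows that idea on plain strings; the "word/TAG" format, the single space between tokens, and the class name PosJoin are assumptions, and POSType is not modeled.

```java
import java.util.StringJoiner;

// Hypothetical sketch of words2StringPOS on parallel (word, tag) arrays.
// The "word/TAG" format is an assumption; POSType is not modeled.
final class PosJoin {
    private PosJoin() {}

    static String words2StringPOS(String[] words, String[] tags) {
        if (words.length != tags.length) {
            throw new IllegalArgumentException("words and tags differ in length");
        }
        StringJoiner out = new StringJoiner(" ");
        for (int i = 0; i < words.length; i++) {
            // Untagged words are emitted without a "/TAG" suffix.
            out.add(tags[i] == null ? words[i] : words[i] + "/" + tags[i]);
        }
        return out.toString();
    }
}
```

For example, the words "the" and "cat" tagged DT and NN become "the/DT cat/NN" under this sketch's conventions.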
public String getTag()
public String getMetaValue(String metatag)
metatag
- The meta-tag (e.g., "polarity").
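getMetaValue pairs with setMetaTag(String metatag, String value). A plausible storage model, assumed here rather than taken from the source, is a per-word map from meta-tag name to value, as in this hypothetical MetaTags class:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of setMetaTag/getMetaValue. Assumes each word keeps
// its meta-tags in a map from tag name to value; not the library's storage.
final class MetaTags {
    private final Map<String, String> meta = new HashMap<>();

    /** Associates value with metatag, replacing any previous value. */
    void setMetaTag(String metatag, String value) {
        meta.put(metatag, value);
    }

    /** Returns the value of metatag, or null if the word does not carry it. */
    String getMetaValue(String metatag) {
        return meta.get(metatag);
    }
}
```

In this sketch, after setMetaTag("polarity", "positive"), getMetaValue("polarity") returns "positive", and a meta-tag that was never set yields null.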
public boolean isEmpty()
public static boolean isPunct(char c)
c
- The character to be tested.
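The set of characters that isPunct(char c) treats as punctuation is not listed here. One reasonable sketch, assuming Unicode general categories define punctuation, uses Character.getType; under this rule symbols such as '$' or '+' are not punctuation. The class name Punct is hypothetical.

```java
// Hypothetical sketch of isPunct(char c). The library's actual punctuation
// set is not documented; this version uses Unicode general categories.
final class Punct {
    private Punct() {}

    static boolean isPunct(char c) {
        switch (Character.getType(c)) {
            case Character.CONNECTOR_PUNCTUATION:     // e.g. '_'
            case Character.DASH_PUNCTUATION:          // e.g. '-'
            case Character.START_PUNCTUATION:         // e.g. '('
            case Character.END_PUNCTUATION:           // e.g. ')'
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:         // e.g. '.', ',', '!'
                return true;
            default:
                return false;                         // letters, digits, symbols
        }
    }
}
```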
public boolean isWord()
public boolean isNumWord()
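The summary describes isWord() as rejecting numbers, punctuation marks, and other tokens, and isNumWord() as accepting either a number or a word. A minimal sketch on plain strings, assuming "word" means letters only and "number or word" means letters or digits only, could look like this (TokenKind is a hypothetical name):

```java
// Hypothetical sketches of isWord() and isNumWord() on a plain String.
// The rules (letters only vs. letters or digits) are assumptions.
final class TokenKind {
    private TokenKind() {}

    /** True if s is non-empty and consists only of letters. */
    static boolean isWord(String s) {
        return !s.isEmpty() && s.codePoints().allMatch(Character::isLetter);
    }

    /** True if s is non-empty and consists only of letters or digits. */
    static boolean isNumWord(String s) {
        return !s.isEmpty() && s.codePoints().allMatch(Character::isLetterOrDigit);
    }
}
```

Under these assumptions, "B52" passes isNumWord but not isWord, and "." passes neither.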
public boolean isRPUNCT()
public int length()
public static float distlex(String sa, String sb, float q)
sa
- One word string.
sb
- The other word string.
q
- A formula parameter.
public static float distlexSuffix(String sa, String sb, float q)
sa
- One word string.
sb
- The other word string.
q
- A formula parameter.
public static float distlexSuffix(String sa, String sb)
sa
- One word string.
sb
- The other word string.
public static float distlex(String sa, String sb)
sa
- One word string.
sb
- The other word string.
public float distlex(String s)
s
- The other word string.
public float distlex(Word w, float q)
w
- The other word.
q
- A formula parameter.
public float distlex(Word w)
w
- The other word.
public double distcos(Word w)
w
- The other word.
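distcos is described only as a lexical metric based on the cosine. The sketch below shows one common way to build such a metric: cosine similarity between character-bigram count vectors. The bigram features, the handling of strings shorter than two characters, and whether distcos returns this similarity or 1 minus it are all assumptions; CosineLex is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: cosine similarity of character-bigram count vectors.
// Features and the similarity/distance convention are assumptions.
final class CosineLex {
    private CosineLex() {}

    private static Map<String, Integer> bigrams(String s) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 2 <= s.length(); i++) {
            counts.merge(s.substring(i, i + 2), 1, Integer::sum);
        }
        return counts;
    }

    private static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int c : v.values()) {
            sum += (double) c * c;
        }
        return Math.sqrt(sum);
    }

    /** Cosine of the angle between the bigram vectors of a and b, in [0, 1]. */
    static double cosine(String a, String b) {
        Map<String, Integer> va = bigrams(a);
        Map<String, Integer> vb = bigrams(b);
        if (va.isEmpty() || vb.isEmpty()) {
            return a.equals(b) ? 1.0 : 0.0; // too short to form any bigram
        }
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : va.entrySet()) {
            dot += e.getValue() * (double) vb.getOrDefault(e.getKey(), 0);
        }
        return dot / (norm(va) * norm(vb));
    }
}
```

For example, "night" and "nacht" each have four bigrams and share only "ht", so their cosine is 1 / (2 × 2) = 0.25.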
public double dnormEditDistance(Word w)
w
- The other word to compare to.
public static double distSeqMax(String sa, String sb)
sa
- One string.
sb
- The other string.
public int editProximity(Word w)
Computes max(length(wa), length(wb)) - editDistance(wa, wb), where wa is this word and wb is w.
w
- The other word to compare to.
public int editDistance(Word w)
w
- The other word.
public static int editDistance(String s, String t)
s
- One string.
t
- The other string.
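The edit-distance methods above can be sketched on plain strings as follows. editDistance is the standard two-row Levenshtein dynamic program, and editProximity applies the documented formula max(length(wa), length(wb)) - editDistance(wa, wb). How dnormEditDistance normalizes is not stated, so normEditDistance divides by the longer length purely as an assumption; distSeqMax is not sketched. EditDist is a hypothetical name.

```java
// Sketch of the edit-distance family on plain Strings. editDistance is the
// standard Levenshtein DP; normEditDistance's normalization is an assumption.
final class EditDist {
    private EditDist() {}

    /** Minimum number of insertions, deletions and substitutions. */
    static int editDistance(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j; // distance from the empty prefix of s
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int subst = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + subst);
            }
            int[] tmp = prev; // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    /** max(length(s), length(t)) - editDistance(s, t); never negative. */
    static int editProximity(String s, String t) {
        return Math.max(s.length(), t.length()) - editDistance(s, t);
    }

    /** Edit distance scaled into [0, 1] by the longer length (assumed). */
    static double normEditDistance(String s, String t) {
        int longest = Math.max(s.length(), t.length());
        return longest == 0 ? 0.0 : (double) editDistance(s, t) / longest;
    }
}
```

For example, "kitten" and "sitting" are at edit distance 3 (two substitutions and one insertion), so their proximity is 7 - 3 = 4.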
public double costAlign(Word w)
costAlign: Word × Word → [0, +∞)
w
- The other word.
public double connectProb(Word w)
w
- The other word.
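connectProb is described as costAlign inverted and normalized to [0, 1], where costAlign maps word pairs into [0, +∞). The exact mapping is not given here; the sketch below uses 1 / (1 + cost), one simple decreasing function with that range, as an explicit assumption. CostToProb is a hypothetical name, and it takes a precomputed cost rather than a Word.

```java
// Hypothetical sketch: turns a non-negative alignment cost into a score in
// [0, 1]. The 1 / (1 + cost) mapping is an assumption, not connectProb's.
final class CostToProb {
    private CostToProb() {}

    static double connectProb(double cost) {
        if (cost < 0 || Double.isNaN(cost)) {
            throw new IllegalArgumentException("cost must be in [0, +inf)");
        }
        // cost 0 -> 1.0; larger costs approach 0; infinite cost -> 0.0
        return 1.0 / (1.0 + cost);
    }
}
```

Any strictly decreasing map from [0, +∞) onto [0, 1] (for instance, exp(-cost)) would satisfy the same description; the choice affects how quickly scores fall off as cost grows.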
public static void main(String[] args)
args
- The command-line arguments.