java.lang.Object
java.util.AbstractCollection<E>
java.util.AbstractList<E>
java.util.AbstractSequentialList<E>
java.util.LinkedList<Sentence>
hultig.sumo.Text
public final class Text
A class to represent and manage text. It can represent a textual document or a list of independent sentences, since internally the text is stored as a linked list of sentences.
University of Beira Interior (UBI)
Centre For Human Language Technology and Bioinformatics (HULTIG)
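The representation described above, a text stored as a linked list of sentences, can be sketched with a small stand-in class. This is only an illustration under stated assumptions: `MiniText` and its plain `String` sentences are hypothetical and are not part of `hultig.sumo`, which uses its own `Sentence` type.

```java
import java.util.LinkedList;

// Hypothetical stand-in: sentences are plain strings, not hultig.sumo.Sentence objects.
final class MiniText extends LinkedList<String> {

    // Appends each array element as one sentence (loosely mirrors Text(String[] vs)).
    MiniText(String[] sentences) {
        for (String s : sentences) {
            add(s);
        }
    }

    public static void main(String[] args) {
        MiniText t = new MiniText(new String[] {"A first sentence.", "A second one."});
        // All LinkedList operations are inherited: size(), get(), getLast(), ...
        System.out.println(t.size());    // prints 2
        System.out.println(t.getLast()); // prints A second one.
    }
}
```

Extending the list type directly is what makes every inherited `LinkedList` method listed further down available on a text object.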
Field Summary |
---|
Fields inherited from class java.util.AbstractList |
---|
modCount |
Constructor Summary | |
---|---|
Text() | The default constructor. |
Text(String s) | Creates a new text from a given string. |
Text(String[] vs) | Creates a text from an array of strings. |
Text(String s, OpenNLPKit onlpk) | Creates a new text from a given string. |
Method Summary | | |
---|---|---|
boolean | add(Sentence s) | Adds a sentence to this text by appending it to the end of the list. |
boolean | add(String s) | Adds all the sentences contained in the given string to this text. |
boolean | add(String stxt, OpenNLPKit onlpk) | Adds all the sentences contained in the given string to this text. |
void | add(Text t) | Adds all the sentences contained in another Text object to this text. |
void | codify() | Codifies this text according to the corpus index referenced by CINDEX. |
void | codify(CorpusIndex idx) | Codifies every word of this text according to a given corpus index (CorpusIndex). |
void | cutIfLessThan(int numwords) | Eliminates all sentences having fewer words than a given minimum number. |
int | freq(String sw) | |
CorpusIndex | getCorpusIndex() | Gives the reference to the corpus index stored in this object, and possibly used to codify the text. |
int | getNumTokens() | |
Sentence | getSentence(int index) | Gives the i-th sentence from this text. |
Sentence[] | getSentences() | Gives an array with all the sentences from this text. |
String[] | getVocab() | |
String | getWord(int i, int j) | Tries to return the string of the j-th word from the i-th sentence of this text. |
static void | main(String[] argv) | The main method contains a general class tester. |
void | print() | Outputs the text sentences, one sentence per line. |
void | print(String sleft, String sright, boolean withIndex) | Outputs the text sentences, one sentence per line. |
void | printVocabulary() | |
double | prob(String sw) | |
void | randomDrop(int n) | Randomly eliminates n sentences from this text. |
boolean | readFile(String filename) | Adds all the sentences contained in a given text file to this text object. |
void | removeDuplicates() | Removes duplicate sentences from this text. |
boolean | saveFile() | Saves this text to a new file named after the current time stamp, in the format YYYYMTDDHHMMSS.txt, where YYYY, MT, DD, HH, MM, and SS are respectively the year, month, day, hour, minute, and second. |
boolean | saveFile(String filename) | Saves the current text to a given file. |
boolean | shuffle(Random r) | Randomly shuffles the sentences in this text. |
double | similarity(Text othr) | Computes a lexical similarity between two texts, based on local evidence. |
static boolean | testSimilarity() | |
void | toLowerCase() | Converts every word in this text to lower case. |
String | toString() | Gives a concatenation of the sentences from this text. |
String | toString(String separator) | Gives a concatenation of the sentences from this text. |
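The filtering summarized for `cutIfLessThan(int numwords)` can be illustrated with a minimal sketch over plain strings. The class `CutSketch`, the helper `countWords`, and the whitespace-based notion of a word are assumptions made for this example; the source does not say how the library tokenizes sentences.

```java
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch; not the hultig.sumo implementation.
final class CutSketch {

    // Removes, in place, every sentence with fewer than minWords words.
    static void cutIfLessThan(List<String> sentences, int minWords) {
        sentences.removeIf(s -> countWords(s) < minWords);
    }

    // Assumption: a word is a maximal run of non-whitespace characters.
    static int countWords(String sentence) {
        String trimmed = sentence.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        List<String> text = new LinkedList<>(
                List.of("Too short.", "This one has five words.", "Ok"));
        cutIfLessThan(text, 3);
        System.out.println(text); // prints [This one has five words.]
    }
}
```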
Methods inherited from class java.util.LinkedList |
---|
add, addAll, addAll, addFirst, addLast, clear, clone, contains, descendingIterator, element, get, getFirst, getLast, indexOf, lastIndexOf, listIterator, offer, offerFirst, offerLast, peek, peekFirst, peekLast, poll, pollFirst, pollLast, pop, push, remove, remove, remove, removeFirst, removeFirstOccurrence, removeLast, removeLastOccurrence, set, size, toArray, toArray |
Methods inherited from class java.util.AbstractSequentialList |
---|
iterator |
Methods inherited from class java.util.AbstractList |
---|
equals, hashCode, listIterator, removeRange, subList |
Methods inherited from class java.util.AbstractCollection |
---|
containsAll, isEmpty, removeAll, retainAll |
Methods inherited from class java.lang.Object |
---|
finalize, getClass, notify, notifyAll, wait, wait, wait |
Methods inherited from interface java.util.List |
---|
containsAll, equals, hashCode, isEmpty, iterator, listIterator, removeAll, retainAll, subList |
Methods inherited from interface java.util.Deque |
---|
iterator |
Constructor Detail |
---|
public Text()
public Text(String s)
s - The text string.
public Text(String[] vs)
Text(String s).
vs - The array of strings.
public Text(String s, OpenNLPKit onlpk)
s - The text string.
onlpk - The OpenNLP Kit.
Method Detail |
---|
public boolean add(String s)
s - The text string.
public boolean add(String stxt, OpenNLPKit onlpk)
stxt - The text string.
onlpk - The OpenNLP Kit.
public boolean add(Sentence s)
add in interface Collection<Sentence>
add in interface Deque<Sentence>
add in interface List<Sentence>
add in interface Queue<Sentence>
add in class LinkedList<Sentence>
s -
public void add(Text t)
t - The other text object.
public void cutIfLessThan(int numwords)
numwords - The minimum number of words.
public boolean readFile(String filename)
filename - The file from which to read the sentences.
public boolean saveFile()
public boolean saveFile(String filename)
filename - The name of the saved file.
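The time-stamp file name described for `saveFile()` (format YYYYMTDDHHMMSS.txt) could be produced along the following lines. This is a sketch, not the library's code: the class `TimestampName`, the method `fileNameFor`, and the choice of a 24-hour clock are assumptions, since the source only names the six fields.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch; not the hultig.sumo implementation.
final class TimestampName {

    // yyyyMMddHHmmss: year, month, day, hour (assumed 00-23), minute, second.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Builds a file name such as 20120307090530.txt for the given moment.
    static String fileNameFor(LocalDateTime when) {
        return when.format(STAMP) + ".txt";
    }

    public static void main(String[] args) {
        // prints 20120307090530.txt
        System.out.println(fileNameFor(LocalDateTime.of(2012, 3, 7, 9, 5, 30)));
        // File name for the current moment.
        System.out.println(fileNameFor(LocalDateTime.now()));
    }
}
```

Because the fields run from most to least significant, names built this way sort chronologically when sorted as plain strings.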
public void toLowerCase()
public void codify(CorpusIndex idx)
CINDEX.
idx - The corpus index.
public double similarity(Text othr)
othr - The other text.
[0, 1] interval.
public void codify()
CINDEX.
public CorpusIndex getCorpusIndex()
public String getWord(int i, int j)
i - The sentence index position.
j - The word index position in a given sentence.
public Sentence getSentence(int index)
index - The sentence index in the text.
public Sentence[] getSentences()
public int getNumTokens()
public String[] getVocab()
public void removeDuplicates()
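One common way to remove duplicate sentences while keeping the original order is to pass the list through a `LinkedHashSet`. The sketch below assumes that two sentences are duplicates when their strings are equal and that the first occurrence is the one kept; the source states neither, and `DedupSketch` is a hypothetical name.

```java
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch; not the hultig.sumo implementation.
final class DedupSketch {

    // Keeps the first occurrence of each sentence and preserves the order.
    static void removeDuplicates(List<String> sentences) {
        LinkedHashSet<String> unique = new LinkedHashSet<>(sentences);
        sentences.clear();
        sentences.addAll(unique);
    }

    public static void main(String[] args) {
        List<String> text = new LinkedList<>(List.of("One.", "Two.", "One."));
        removeDuplicates(text);
        System.out.println(text); // prints [One., Two.]
    }
}
```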
public void randomDrop(int n)
n - The number of sentences to be eliminated.
public boolean shuffle(Random r)
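Random elimination and shuffling of sentences can be sketched with the standard library. The names below are hypothetical, and two points are assumptions: this `randomDrop` takes an explicit `Random` so that runs can be reproduced (the documented `randomDrop(int n)` has no such parameter), and it stops early if the list becomes empty.

```java
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; not the hultig.sumo implementation.
final class RandomSketch {

    // Removes n randomly chosen sentences, or all of them if n >= size.
    static void randomDrop(List<String> sentences, int n, Random r) {
        for (int k = 0; k < n && !sentences.isEmpty(); k++) {
            sentences.remove(r.nextInt(sentences.size())); // remove by index
        }
    }

    // Reorders the sentences in place using the given random source.
    static void shuffle(List<String> sentences, Random r) {
        Collections.shuffle(sentences, r);
    }

    public static void main(String[] args) {
        List<String> text = new LinkedList<>(List.of("A.", "B.", "C.", "D.", "E."));
        Random r = new Random(42);
        randomDrop(text, 2, r);
        shuffle(text, r);
        System.out.println(text.size()); // prints 3
    }
}
```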
public void print()
public void print(String sleft, String sright, boolean withIndex)
sleft - The left string context.
sright - The right string context.
withIndex - Print the sequential sentence index.
public int freq(String sw)
public double prob(String sw)
public void printVocabulary()
public String toString()
toString in class AbstractCollection<Sentence>
public String toString(String separator)
separator - The separator connecting two sentences.
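Concatenating sentences with a separator between each pair can be sketched with `String.join`. The class `JoinSketch` and the use of plain strings are assumptions for illustration; whether the library also emits a trailing separator is not stated in the source, and this sketch does not.

```java
import java.util.List;

// Hypothetical sketch; not the hultig.sumo implementation.
final class JoinSketch {

    // Places the separator between consecutive sentences only.
    static String join(List<String> sentences, String separator) {
        return String.join(separator, sentences);
    }

    public static void main(String[] args) {
        List<String> text = List.of("First sentence.", "Second sentence.");
        System.out.println(join(text, " ")); // prints First sentence. Second sentence.
    }
}
```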
public static void main(String[] argv)
argv - One parameter may be indicated, containing the path to a file to be processed.
public static boolean testSimilarity()