hultig.sumo
Class NewsCluster
java.lang.Object
java.util.AbstractCollection<E>
java.util.AbstractList<E>
java.util.Vector<Text>
hultig.sumo.NewsCluster
- All Implemented Interfaces:
- Serializable, Cloneable, Iterable<Text>, Collection<Text>, List<Text>, RandomAccess
public class NewsCluster
- extends Vector<Text>
NOT YET WELL COMMENTED.
University of Beira Interior (UBI)
Centre For Human Language Technology and Bioinformatics (HULTIG)
- See Also:
- Serialized Form
Methods inherited from class java.util.Vector
add, add, addAll, addAll, addElement, capacity, clear, clone, contains, containsAll, copyInto, elementAt, elements, ensureCapacity, equals, firstElement, get, hashCode, indexOf, indexOf, insertElementAt, isEmpty, lastElement, lastIndexOf, lastIndexOf, remove, remove, removeAll, removeAllElements, removeElement, removeElementAt, removeRange, retainAll, set, setElementAt, setSize, size, subList, toArray, toArray, toString, trimToSize
NewsCluster
public NewsCluster()
numSentences
public int numSentences()
- This
- Returns:
codify
public void codify(CorpusIndex dict)
getSentences
public Sentence[] getSentences()
getCleanSentences
public Sentence[] getCleanSentences()
- Cleans every sentence in the news cluster.
- Returns:
- The set of cleaned sentences.
cleanSentence
public static Sentence cleanSentence(Sentence stc)
- Cleans a sentence of extraneous and meta symbols, such as HTML/XML tags.
- Parameters:
stc
- The input sentence.
- Returns:
- The cleaned sentence, which may be the same sentence if there was nothing to clean, or null if the whole sentence is a meaningless sequence of symbols.
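The contract described above (strip markup, return null when nothing meaningful remains) can be illustrated with a minimal sketch. It is not the hultig.sumo implementation: it works on plain String values instead of Sentence, and the class CleanSentenceSketch, the methods cleanText and cleanAll, and the rule "meaningful means at least one letter or digit survives" are all assumptions made for illustration. Whether getCleanSentences drops null results, as cleanAll does here, is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: none of these names belong to hultig.sumo.
final class CleanSentenceSketch {
    // Matches anything shaped like an HTML/XML tag, e.g. <p>, </b>, <br/>.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    // Matches runs of whitespace left behind after tags are removed.
    private static final Pattern SPACES = Pattern.compile("\\s+");

    /**
     * Removes tags and normalizes whitespace.
     * Returns null when no letter or digit survives (assumed meaning of
     * "a meaningless sequence of symbols").
     */
    static String cleanText(String raw) {
        String noTags = TAG.matcher(raw).replaceAll(" ");
        String cleaned = SPACES.matcher(noTags).replaceAll(" ").trim();
        boolean hasContent = cleaned.chars().anyMatch(Character::isLetterOrDigit);
        return hasContent ? cleaned : null;
    }

    /**
     * Cleans every entry and skips those that clean to null.
     * Skipping nulls is an assumption, not documented behavior.
     */
    static List<String> cleanAll(List<String> raw) {
        List<String> result = new ArrayList<>();
        for (String s : raw) {
            String cleaned = cleanText(s);
            if (cleaned != null) {
                result.add(cleaned);
            }
        }
        return result;
    }
}
```

Because cleanSentence is documented as possibly returning null, callers should check the result before using it, as cleanAll does in the sketch.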
startCluster
public static boolean startCluster(String s)
endCluster
public static boolean endCluster(String s)
startNew
public static boolean startNew(String s)
endNew
public static boolean endNew(String s)