hultig.sumo
Class NewsClusterList
java.lang.Object
java.util.AbstractCollection<E>
java.util.AbstractList<E>
java.util.ArrayList<NewsCluster>
hultig.sumo.NewsClusterList
- All Implemented Interfaces:
- Serializable, Cloneable, Iterable<NewsCluster>, Collection<NewsCluster>, List<NewsCluster>, RandomAccess
public class NewsClusterList
- extends ArrayList<NewsCluster>
- implements Serializable
NOT YET WELL COMMENTED.
University of Beira Interior (UBI)
Centre For Human Language Technology and Bioinformatics (HULTIG)
- See Also:
- Serialized Form
Methods inherited from class java.util.ArrayList
add, add, addAll, addAll, clear, clone, contains, ensureCapacity, get, indexOf, isEmpty, lastIndexOf, remove, remove, removeRange, set, size, toArray, toArray, trimToSize
to
public static Toolkit to
NewsClusterList
public NewsClusterList()
NewsClusterList
public NewsClusterList(String filename)
loadClusters
public boolean loadClusters(String filename)
- Loads all news groups from the given file, the one that is
defined with the infile attribute.
- Returns:
- boolean
readCluster
public boolean readCluster(BufferedReader br,
NewsCluster cluster)
throws Exception
- Reads the next news cluster from the current reader, given by br.
- Parameters:
br
- BufferedReader
cluster
- NewsCluster
- Returns:
- boolean
- Throws:
Exception
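The signature suggests a "fill the supplied container, report success" contract. The following is a minimal sketch of that pattern only, not the hultig.sumo implementation. The class `ClusterReaderSketch`, the methods `readBlock` and `readAll`, the use of `List<String>` in place of `NewsCluster`, the blank-line-delimited input format, and the meaning of the boolean (true when a block was read) are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ClusterReaderSketch {

    /**
     * Hypothetical stand-in for a readCluster-style method.
     * Assumption: a "cluster" is a run of non-blank lines, closed by a
     * blank line or by the end of the input.
     * Returns true if a block was read into 'cluster', false at end of input.
     */
    static boolean readBlock(BufferedReader br, List<String> cluster) throws IOException {
        cluster.clear();
        String line;
        while ((line = br.readLine()) != null) {
            if (line.trim().isEmpty()) {
                if (!cluster.isEmpty()) {
                    return true;   // a blank line closes the current block
                }
                continue;          // skip blank lines before a block starts
            }
            cluster.add(line);
        }
        return !cluster.isEmpty(); // last block may end without a blank line
    }

    /** Reads every block from 'text', reusing one container per call. */
    static List<List<String>> readAll(String text) {
        List<List<String>> all = new ArrayList<>();
        List<String> block = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            while (readBlock(br, block)) {
                all.add(new ArrayList<>(block)); // copy, since 'block' is reused
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return all;
    }

    public static void main(String[] args) {
        List<List<String>> all = readAll("a1\na2\n\nb1\n");
        System.out.println(all.size() + " blocks read");
    }
}
```

Returning a boolean lets the caller drive the loop with a plain `while`, which is presumably how a method like loadClusters would consume it.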
passfilter
public boolean passfilter(String line)
- Defines a filter to apply to the text.
- Parameters:
line
- String
- Returns:
- boolean
cleanSentence
public static String cleanSentence(String s)
- Cleans a sentence of extra and meta symbols, such as HTML/XML
tags.
- Parameters:
s
- The sentence as read from the input.
- Returns:
- The cleaned sentence, which may be unchanged if there is nothing to remove, or null
if the whole sentence is a meaningless sequence of symbols.
JPC 2008/12/08
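The documented contract (strip markup, return the input unchanged when it is already clean, return null when nothing meaningful remains) can be illustrated with a small sketch. This is not the hultig.sumo code. The class `CleanSentenceSketch`, the method `clean`, the regular expression used to detect tags, and the rule that a sentence without any letter or digit counts as meaningless are assumptions chosen for the example.

```java
import java.util.regex.Pattern;

public class CleanSentenceSketch {

    // Assumption: a tag is any run of characters enclosed in '<' and '>'.
    private static final Pattern TAG = Pattern.compile("<[^<>]*>");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    /**
     * Hypothetical stand-in for a cleanSentence-style method.
     * Returns the sentence without tags and with whitespace collapsed,
     * or null if no letter or digit is left after cleaning.
     */
    public static String clean(String s) {
        if (s == null) {
            return null;
        }
        // Replace each tag with a space so adjacent words do not merge.
        String out = TAG.matcher(s).replaceAll(" ");
        out = SPACES.matcher(out).replaceAll(" ").trim();
        // Assumption: text with no letter or digit is a meaningless symbol run.
        boolean hasWord = out.codePoints().anyMatch(Character::isLetterOrDigit);
        return hasWord ? out : null;
    }

    public static void main(String[] args) {
        System.out.println(clean("<p>Hello <b>world</b></p>"));
        System.out.println(clean("plain text"));
        System.out.println(clean("<br/> *** ---"));
    }
}
```

Because null signals a discarded sentence, callers need a null check before using the result.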
printAllSentences
public void printAllSentences(OpenNLPKit model)
gerar_pos_corpus
public static void gerar_pos_corpus(OpenNLPKit model,
String foutname)
main
public static void main(String[] args)
- MAIN - For testing.
- Parameters:
args
-