java.lang.Object
  java.io.File
    hultig.io.FileNewsCluster
public class FileNewsCluster
extends java.io.File
This class was designed to handle web news files, which are XML data files containing news stories extracted from the web. The stories are stored in clusters of related stories, so the general structure of such a file is as follows:

    <news-clusters>
        <cluster i="1" url="http://news.google.com/...">
            <new i="1" url="...">
                Wall Street stocks began the final week of one of their worst years...
            </new>
            ...
        </cluster>
        ...
    </news-clusters>

Each cluster is identified by a sequential number (attribute i) and records the URL of its source (attribute url), followed by each of its news stories.
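For illustration, the following sketch walks this structure with the JDK's standard DOM parser (javax.xml.parsers). It assumes the file is well-formed XML, and it is not how FileNewsCluster itself reads the file (readCluster works on a BufferedReader). The class NewsXmlSketch and its method countStories are hypothetical, and the sample URLs are placeholders.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not part of hultig.io): reads the <news-clusters>
// layout with the JDK DOM parser. Assumes the input is well-formed XML.
class NewsXmlSketch {

    // Returns, for each <cluster> element in document order,
    // the number of <new> story elements it contains.
    static List<Integer> countStories(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed news XML", e);
        }
        List<Integer> counts = new ArrayList<>();
        NodeList clusters = doc.getElementsByTagName("cluster");
        for (int c = 0; c < clusters.getLength(); c++) {
            Element cluster = (Element) clusters.item(c);
            NodeList stories = cluster.getElementsByTagName("new");
            // Attribute i is the cluster's sequence number, url its source.
            System.out.println("cluster " + cluster.getAttribute("i") + " ("
                    + cluster.getAttribute("url") + "): " + stories.getLength() + " stories");
            counts.add(stories.getLength());
        }
        return counts;
    }

    public static void main(String[] args) {
        String xml = "<news-clusters>"
                + "<cluster i=\"1\" url=\"http://example.com/a\">"
                + "<new i=\"1\" url=\"http://example.com/a1\">First story.</new>"
                + "<new i=\"2\" url=\"http://example.com/a2\">Second story.</new>"
                + "</cluster>"
                + "<cluster i=\"2\" url=\"http://example.com/b\">"
                + "<new i=\"1\" url=\"http://example.com/b1\">Third story.</new>"
                + "</cluster>"
                + "</news-clusters>";
        System.out.println(countStories(xml)); // prints [2, 1]
    }
}
```

A real web news file may contain characters such as unescaped `&` in URLs, which a strict XML parser rejects; that is one reason a line-oriented reader can be the more forgiving choice.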
Field Summary |
---|
Fields inherited from class java.io.File |
---|
pathSeparator, pathSeparatorChar, separator, separatorChar |
Constructor Summary | |
---|---|
FileNewsCluster(String fpath) | The main constructor; fpath is the path of the web news file. |
Method Summary | |
---|---|
static String | cleanSentence(String s) - Cleans a sentence string of certain extra/meta symbols, such as HTML/XML tags. |
CorpusIndex | getDictionary() - Gives the reference to the corpus index used in this object. |
NewsCluster | getNewsCluster(int index) |
ArrayList<NewsCluster> | getNewsClusters() - Gives the list of news clusters in this object. |
Sentence[] | getNewsClusterSentences(int index) - Gives the set of sentences contained in the i-th news cluster of this object. |
int | getNumClusters() - Gives the number of loaded clusters of web news stories. |
Sentence[] | loadAllSentences() |
boolean | loadClusters() - Loads the news clusters contained in the given file. |
static void | main(String[] args) - Demonstrates the main operations of the class, including loading and manipulating web news stories. |
boolean | passfilter(String line) - Defines a filter applied to the text that rejects certain exotic or uninteresting strings, for example lines with fewer than 5 characters or sentences with fewer than three words. |
boolean | readCluster(BufferedReader br, NewsCluster cluster) - Reads a news cluster from the current file reader (BufferedReader). |
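The summary of passfilter gives two example rules. A minimal sketch of just those two rules follows; the thresholds come from the description, while trimming the line first and counting whitespace-separated tokens as words are assumptions, and the real method may apply further checks. PassFilterSketch is a hypothetical name.

```java
// Hypothetical re-creation of the two example rules quoted for passfilter;
// the real method may apply further checks not documented here.
class PassFilterSketch {

    static final int MIN_CHARS = 5; // lines shorter than this are rejected
    static final int MIN_WORDS = 3; // sentences with fewer words are rejected

    static boolean passfilter(String line) {
        if (line == null) {
            return false;
        }
        String trimmed = line.trim(); // assumption: surrounding whitespace is ignored
        if (trimmed.length() < MIN_CHARS) {
            return false;
        }
        // Assumption: words are whitespace-separated tokens.
        String[] words = trimmed.split("\\s+");
        return words.length >= MIN_WORDS;
    }

    public static void main(String[] args) {
        System.out.println(passfilter("Hi"));                       // prints false (too short)
        System.out.println(passfilter("Stocks fell"));              // prints false (two words)
        System.out.println(passfilter("Wall Street stocks began")); // prints true
    }
}
```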
Methods inherited from class java.io.File |
---|
canExecute, canRead, canWrite, compareTo, createNewFile, createTempFile, createTempFile, delete, deleteOnExit, equals, exists, getAbsoluteFile, getAbsolutePath, getCanonicalFile, getCanonicalPath, getFreeSpace, getName, getParent, getParentFile, getPath, getTotalSpace, getUsableSpace, hashCode, isAbsolute, isDirectory, isFile, isHidden, lastModified, length, list, list, listFiles, listFiles, listFiles, listRoots, mkdir, mkdirs, renameTo, setExecutable, setExecutable, setLastModified, setReadable, setReadable, setReadOnly, setWritable, setWritable, toString, toURI, toURL |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public FileNewsCluster(String fpath)
Method Detail |
---|
public ArrayList<NewsCluster> getNewsClusters()
public NewsCluster getNewsCluster(int index)
public Sentence[] getNewsClusterSentences(int index)
Gives the set of sentences contained in the i-th news cluster of this object.
Parameters:
index - The i-th news cluster.
public Sentence[] loadAllSentences()
public CorpusIndex getDictionary()
public int getNumClusters()
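Several of the accessors above are views over the same set of loaded clusters. The sketch below shows one plausible relationship between them, under stated assumptions: plain Strings stand in for Sentence, the hypothetical ClusterSketch stands in for NewsCluster, indices are 0-based, and loadAllSentences is taken to concatenate every cluster's sentences. None of these details is confirmed by this page, and ClusterStoreSketch is also a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for NewsCluster: just the sentences of one cluster,
// stored as plain Strings rather than hultig Sentence objects.
class ClusterSketch {
    final List<String> sentences = new ArrayList<>();

    ClusterSketch(String... sentences) {
        this.sentences.addAll(Arrays.asList(sentences));
    }
}

// Hypothetical container showing how the accessors could share one list.
class ClusterStoreSketch {
    private final List<ClusterSketch> clusters = new ArrayList<>();

    void add(ClusterSketch cluster) {
        clusters.add(cluster);
    }

    List<ClusterSketch> getNewsClusters() {
        return clusters;
    }

    int getNumClusters() {
        return clusters.size();
    }

    // Assumes a 0-based index; the real indexing convention is not documented.
    ClusterSketch getNewsCluster(int index) {
        return clusters.get(index);
    }

    String[] getNewsClusterSentences(int index) {
        return getNewsCluster(index).sentences.toArray(new String[0]);
    }

    // Assumed behaviour: all sentences of all clusters, in cluster order.
    String[] loadAllSentences() {
        List<String> all = new ArrayList<>();
        for (ClusterSketch c : clusters) {
            all.addAll(c.sentences);
        }
        return all.toArray(new String[0]);
    }

    public static void main(String[] args) {
        ClusterStoreSketch store = new ClusterStoreSketch();
        store.add(new ClusterSketch("Stocks fell.", "Markets closed lower."));
        store.add(new ClusterSketch("Oil prices rose."));
        System.out.println(store.getNumClusters());                            // prints 2
        System.out.println(Arrays.toString(store.getNewsClusterSentences(1))); // prints [Oil prices rose.]
        System.out.println(store.loadAllSentences().length);                   // prints 3
    }
}
```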
public boolean loadClusters()
Loads the news clusters contained in the given file; the loaded clusters are stored in VCLUSTERS.
Returns:
true if the loading process succeeds, and false otherwise.
public boolean readCluster(BufferedReader br, NewsCluster cluster) throws Exception
Reads a news cluster from the current file reader (BufferedReader).
Parameters:
br - The file reader from which the news cluster should be read.
cluster - An output parameter that receives the news cluster that was read.
Returns:
true if the loading process succeeds, and false otherwise.
Throws:
Exception
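The readCluster contract (fill an output parameter from a BufferedReader, report success as a boolean) lends itself to a simple read loop. The sketch below is one way to implement that contract, not the library's code: it assumes each `<cluster>` element starts on its own line, stores stories as plain Strings instead of a NewsCluster, and uses a regular expression to extract the `<new>` elements. ClusterReaderSketch and its loadClusters loop are hypothetical; the real loadClusters takes no arguments.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the readCluster contract; stories are plain Strings
// here instead of hultig's NewsCluster/Sentence types.
class ClusterReaderSketch {

    // Captures the text between <new ...> and </new>, possibly across lines.
    private static final Pattern STORY =
            Pattern.compile("<new[^>]*>(.*?)</new>", Pattern.DOTALL);

    // Reads the next <cluster>...</cluster> block from br and appends its stories
    // to the output list. Returns false if the input ends before a complete cluster.
    // Assumes each <cluster> element begins on its own line.
    static boolean readCluster(BufferedReader br, List<String> stories) throws IOException {
        StringBuilder block = new StringBuilder();
        boolean inCluster = false;
        String line;
        while ((line = br.readLine()) != null) {
            if (!inCluster && line.contains("<cluster")) {
                inCluster = true;
            }
            if (inCluster) {
                block.append(line).append('\n');
                if (line.contains("</cluster>")) {
                    Matcher m = STORY.matcher(block);
                    while (m.find()) {
                        stories.add(m.group(1).trim());
                    }
                    return true;
                }
            }
        }
        return false;
    }

    // Hypothetical loop in the spirit of loadClusters: one list per cluster.
    static List<List<String>> loadClusters(BufferedReader br) throws IOException {
        List<List<String>> clusters = new ArrayList<>();
        List<String> current = new ArrayList<>();
        while (readCluster(br, current)) {
            clusters.add(current);
            current = new ArrayList<>();
        }
        return clusters;
    }

    public static void main(String[] args) throws IOException {
        String text = "<news-clusters>\n"
                + "<cluster i=\"1\" url=\"http://example.com/a\">\n"
                + "<new i=\"1\" url=\"http://example.com/a1\">\nFirst story.\n</new>\n"
                + "</cluster>\n"
                + "</news-clusters>\n";
        BufferedReader br = new BufferedReader(new StringReader(text));
        System.out.println(loadClusters(br)); // prints [[First story.]]
    }
}
```

Passing the output list in, rather than returning it, mirrors the output-parameter style documented for readCluster; the boolean result then doubles as the end-of-input signal for the loop.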
public boolean passfilter(String line)
Parameters:
line - The string to be tested.
Returns:
true if the input string passes the test, false otherwise.
public static String cleanSentence(String s)
Parameters:
s - The input sentence string.
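Beyond HTML/XML tags, this page does not say exactly which symbols cleanSentence removes. A minimal regular-expression sketch of the tag-stripping part follows. Replacing each tag with a space (so that words on either side of a tag such as `<br/>` do not fuse) and then collapsing whitespace are assumptions, and CleanSentenceSketch is a hypothetical name.

```java
// Hypothetical sketch of tag stripping in the spirit of cleanSentence;
// the full set of "extra/meta symbols" the real method removes is not documented.
class CleanSentenceSketch {

    static String cleanSentence(String s) {
        if (s == null) {
            return null;
        }
        return s.replaceAll("<[^>]*>", " ") // replace each HTML/XML tag with a space
                .replaceAll("\\s+", " ")    // collapse runs of whitespace
                .trim();
    }

    public static void main(String[] args) {
        System.out.println(cleanSentence("<p>Wall Street <b>stocks</b> began</p>"));
        // prints: Wall Street stocks began
    }
}
```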
public static void main(String[] args)
Parameters:
args - No command-line arguments are expected.