java.lang.Object
  java.io.File
    hultig.io.FileNewsCluster
public class FileNewsCluster extends File
This class is designed to handle web news files, which are XML data files containing news stories extracted from the web. The stories are stored in clusters of related stories. The general structure of such a file is illustrated below:
<news-clusters>
  <cluster i="1" url="http://news.google.com/...">
    <new i="1" url="...">
      Wall Street stocks began the final week of one of their worst
      years...
    </new>
    ...
  </cluster>
  ...
  ...
</news-clusters>
Each cluster is sequentially numbered and records the URL of its source,
as does each news story inside it.
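
To make the file layout concrete, the following is a minimal sketch that reads such a structure with the standard JAXP DOM parser and groups the story texts by cluster. It is not how FileNewsCluster itself loads data (the method signatures suggest it reads line by line from a BufferedReader); the class name `NewsClusterXmlSketch`, the method `parseClusters`, and the sample URLs are hypothetical, and the input is assumed to be well-formed XML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of hultig.io: illustrates the file layout only.
class NewsClusterXmlSketch {

    /** Returns, for each <cluster> in document order, the trimmed texts of its <new> stories. */
    static List<List<String>> parseClusters(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            List<List<String>> clusters = new ArrayList<>();
            NodeList clusterNodes = doc.getElementsByTagName("cluster");
            for (int c = 0; c < clusterNodes.getLength(); c++) {
                Element cluster = (Element) clusterNodes.item(c);
                List<String> stories = new ArrayList<>();
                NodeList storyNodes = cluster.getElementsByTagName("new");
                for (int s = 0; s < storyNodes.getLength(); s++) {
                    stories.add(storyNodes.item(s).getTextContent().trim());
                }
                clusters.add(stories);
            }
            return clusters;
        } catch (Exception e) {
            // Assumption: malformed input is reported as an unchecked exception.
            throw new IllegalArgumentException("Could not parse news-cluster XML", e);
        }
    }

    public static void main(String[] args) {
        // Illustrative data only; the URLs are placeholders.
        String xml =
              "<news-clusters>"
            + "<cluster i='1' url='http://example.com/c1'>"
            + "<new i='1' url='http://example.com/s1'>First story text.</new>"
            + "<new i='2' url='http://example.com/s2'>Second story text.</new>"
            + "</cluster>"
            + "<cluster i='2' url='http://example.com/c2'>"
            + "<new i='1' url='http://example.com/s3'>Third story text.</new>"
            + "</cluster>"
            + "</news-clusters>";
        List<List<String>> clusters = parseClusters(xml);
        for (int i = 0; i < clusters.size(); i++) {
            System.out.println("cluster " + (i + 1) + ": " + clusters.get(i).size() + " stories");
        }
    }
}
```

A DOM parser is used here only because it keeps the sketch short; real web news files may contain markup that is not strictly well-formed, in which case a line-oriented reader would be the more tolerant choice.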
| Field Summary |
|---|
| Fields inherited from class java.io.File |
|---|
| pathSeparator, pathSeparatorChar, separator, separatorChar |
| Constructor Summary | |
|---|---|
| FileNewsCluster(String fpath) | The class constructor, which takes the path of the news file. |
| Method Summary | | |
|---|---|---|
| static String | cleanSentence(String s) | Removes certain extra/meta symbols, such as HTML/XML tags, from a sentence string. |
| CorpusIndex | getDictionary() | Returns a reference to the corpus index used by this object. |
| NewsCluster | getNewsCluster(int index) | |
| ArrayList<NewsCluster> | getNewsClusters() | Returns the list of news clusters in this object. |
| Sentence[] | getNewsClusterSentences(int index) | Returns the sentences contained in the i-th news cluster of this object. |
| int | getNumClusters() | Returns the number of clusters of web news stories loaded. |
| Sentence[] | loadAllSentences() | |
| boolean | loadClusters() | Loads the news clusters contained in a given file. |
| static void | main(String[] args) | Demonstrates the main operations of the class, including loading and manipulating web news stories. |
| boolean | passfilter(String line) | Defines a filter applied to the text so that certain exotic or uninteresting strings are rejected, for example lines with fewer than 5 characters or sentences with fewer than three words. |
| boolean | readCluster(BufferedReader br, NewsCluster cluster) | Reads a given news cluster from the current file reader (BufferedReader). |
| Methods inherited from class java.io.File |
|---|
| canExecute, canRead, canWrite, compareTo, createNewFile, createTempFile, createTempFile, delete, deleteOnExit, equals, exists, getAbsoluteFile, getAbsolutePath, getCanonicalFile, getCanonicalPath, getFreeSpace, getName, getParent, getParentFile, getPath, getTotalSpace, getUsableSpace, hashCode, isAbsolute, isDirectory, isFile, isHidden, lastModified, length, list, list, listFiles, listFiles, listFiles, listRoots, mkdir, mkdirs, renameTo, setExecutable, setExecutable, setLastModified, setReadable, setReadable, setReadOnly, setWritable, setWritable, toString, toURI, toURL |
| Methods inherited from class java.lang.Object |
|---|
| clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
| Constructor Detail |
|---|
public FileNewsCluster(String fpath)
| Method Detail |
|---|
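public ArrayList<NewsCluster> getNewsClusters()
Returns the list of news clusters in this object.
public NewsCluster getNewsCluster(int index)
public Sentence[] getNewsClusterSentences(int index)
Returns the sentences contained in the i-th news cluster of this object.
Parameters:
index - The index i of the news cluster.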
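public Sentence[] loadAllSentences()
public CorpusIndex getDictionary()
Returns a reference to the corpus index used by this object.
public int getNumClusters()
Returns the number of clusters of web news stories loaded.
public boolean loadClusters()
Loads the news clusters contained in a given file. [...] VCLUSTERS.
Returns:
true if the loading process succeeds, and false otherwise.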
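public boolean readCluster(BufferedReader br, NewsCluster cluster) throws Exception
Reads a given news cluster from the current file reader (BufferedReader).
Parameters:
br - The file reader from which the news cluster should be read.
cluster - An output parameter that receives the news cluster read.
Returns:
true if the loading process succeeds, and false otherwise.
Throws:
Exception
public boolean passfilter(String line)
Defines a filter applied to the text so that certain exotic or uninteresting strings are rejected, for example lines with fewer than 5 characters or sentences with fewer than three words.
Parameters:
line - The string to be tested.
Returns:
true if the input string passes the test, false otherwise.
public static String cleanSentence(String s)
Removes certain extra/meta symbols, such as HTML/XML tags, from a sentence string.
Parameters:
s - The input sentence string.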
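
The behaviour documented for passfilter and cleanSentence can be sketched as below. This is only an illustration under stated assumptions, not the library's implementation: the class `TextFilterSketch` and the methods `passesFilter` and `stripTags` are hypothetical names, the filter applies just the two documented example rules (fewer than 5 characters, fewer than three words), and "extra/meta symbols" is assumed to mean anything shaped like an HTML/XML tag.

```java
// Hypothetical stand-ins for the documented passfilter and cleanSentence behaviour.
class TextFilterSketch {

    /** Rejects lines with fewer than 5 characters or fewer than three whitespace-separated words. */
    static boolean passesFilter(String line) {
        if (line == null) {
            return false;
        }
        String trimmed = line.trim();
        if (trimmed.length() < 5) {
            return false;
        }
        // After trim() there is no leading whitespace, so split() yields no empty first token.
        return trimmed.split("\\s+").length >= 3;
    }

    /** Replaces anything shaped like <...> with a space, then collapses runs of whitespace. */
    static String stripTags(String s) {
        return s.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String raw = "<new i=\"1\">Stocks <b>fell</b> sharply</new>";
        String clean = stripTags(raw);
        System.out.println(clean + " -> " + passesFilter(clean));
    }
}
```

Replacing each tag with a space before collapsing whitespace avoids gluing together words that were separated only by markup.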
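public static void main(String[] args)
Demonstrates the main operations of the class, including loading and manipulating web news stories.
Parameters:
args - No parameters are expected.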