Serialized Form


Package hultig.io

Class hultig.io.FileIN extends File implements Serializable

Serialized Fields

br

BufferedReader br

encode

String encode

Class hultig.io.FileNewsCluster extends File implements Serializable

Serialized Fields

dictionary

CorpusIndex dictionary
The corpus index reference for this class.


VCLUSTERS

ArrayList<E> VCLUSTERS
The list of news clusters loaded.

Class hultig.io.FileOUT extends File implements Serializable

Serialized Fields

writer

PrintWriter writer

encode

String encode

base

String base
The file's base name. For example, for the file "fxyz.dat", "fxyz" is the base name and ".dat" is the extension token.


ext

String ext

Class hultig.io.FileX extends File implements Serializable

Serialized Fields

base

String base
The file's base name. For example, for the file "fxyz.dat", "fxyz" is the base name and ".dat" is the extension token.


ext

String ext
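
As an illustration of the base/ext convention documented for FileOUT and FileX, here is a minimal sketch of how a file name could be split into the two parts. The class FileNameParts is hypothetical and not part of hultig.io, and the handling of names without a dot (or with only a leading dot) is an assumption.

```java
// Hypothetical helper, not part of hultig.io: splits a file name into the
// "base" and "ext" parts described above.
final class FileNameParts {
    final String base; // e.g. "fxyz"
    final String ext;  // e.g. ".dat"; empty when there is no extension

    FileNameParts(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // Assumption: no dot, or only a leading dot (".profile"), means "no extension".
        if (dot <= 0) {
            base = fileName;
            ext = "";
        } else {
            base = fileName.substring(0, dot); // everything before the last dot
            ext = fileName.substring(dot);     // the last dot and what follows it
        }
    }
}
```

With this sketch, new FileNameParts("fxyz.dat") yields base "fxyz" and ext ".dat", matching the example in the field description.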

Package hultig.sumo

Class hultig.sumo.ChunkedSentence extends Sentence implements Serializable

Serialized Fields

CHUNK_VALUE

double[] CHUNK_VALUE
Stores a numerical value for each chunk type. It was conceived to compute sentence proximity from the proximity of the sentences' chunks. The idea is to weight chunk types differently, for example by giving more value to NP and VP chunks.


vcmark

ChunkMark[] vcmark
The array of chunk marks defining the sentence chunk boundaries and types.
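
The weighting idea behind CHUNK_VALUE can be sketched as follows. This is not the library's actual proximity measure, which this page does not show; it is a simple weighted overlap chosen for illustration. The chunk-type codes, the weights, and the class name ChunkProximitySketch are all assumptions.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: a weighted overlap between the chunk types of two sentences.
final class ChunkProximitySketch {
    // Assumed chunk-type codes; the real ChunkMark encoding is not documented here.
    static final int NP = 0, VP = 1, PP = 2, ADVP = 3;

    // Assumed weights indexed by chunk-type code, playing the role of CHUNK_VALUE.
    // NP and VP chunks are given more value than the others.
    static final double[] CHUNK_VALUE = {1.0, 1.0, 0.5, 0.25};

    /** Returns the weight of shared chunk types divided by the weight of all chunk types present, in [0, 1]. */
    static double proximity(int[] typesA, int[] typesB) {
        Set<Integer> a = toSet(typesA);
        Set<Integer> b = toSet(typesB);
        double shared = 0.0;
        double total = 0.0;
        for (int type = 0; type < CHUNK_VALUE.length; type++) {
            boolean inA = a.contains(type);
            boolean inB = b.contains(type);
            if (inA || inB) total += CHUNK_VALUE[type];
            if (inA && inB) shared += CHUNK_VALUE[type];
        }
        return total == 0.0 ? 0.0 : shared / total;
    }

    private static Set<Integer> toSet(int[] values) {
        Set<Integer> set = new HashSet<>();
        for (int v : values) set.add(v);
        return set;
    }
}
```

Under these assumed weights, two sentences that share only an NP chunk score higher than two that share only an ADVP chunk, which is the effect the field description aims for.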

Class hultig.sumo.CorpusIndex extends Object implements Serializable

Serialized Fields

sdict

TreeMap<K,V> sdict
A corpus index keyed by word/token. Given a word, we can obtain its numeric index.


idict

TreeMap<K,V> idict
A corpus index keyed by numeric index. Given a numeric index, we can obtain the corresponding word.


hstab

Hashtable<K,V> hstab
A hash table for counting word frequencies in corpora.


TRUNCV

int TRUNCV
Size of word truncation. If this value is greater than zero, tokens read from the corpora are truncated and stored with a maximum length of TRUNCV.


ENCODE

String ENCODE
The text encoding string used to read the text corpora, for example UTF-8, or ISO-8859-1.
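
The interplay of the word-to-index map, the index-to-word map, the frequency table, and the truncation size can be sketched as below. This is a minimal illustration under stated assumptions, not the actual CorpusIndex implementation: the class CorpusIndexSketch, its method names, and the policy of assigning indices in order of first appearance are all hypothetical.

```java
import java.util.Hashtable;
import java.util.TreeMap;

// Hypothetical sketch of a two-way word index with frequency counting and truncation.
final class CorpusIndexSketch {
    private final TreeMap<String, Integer> sdict = new TreeMap<>();     // word -> index
    private final TreeMap<Integer, String> idict = new TreeMap<>();     // index -> word
    private final Hashtable<String, Integer> hstab = new Hashtable<>(); // word -> frequency
    private final int truncv; // maximum stored token length; values <= 0 disable truncation

    CorpusIndexSketch(int truncv) {
        this.truncv = truncv;
    }

    /** Registers one occurrence of a token and returns the token's numeric index. */
    int add(String token) {
        String key = truncate(token);
        hstab.merge(key, 1, Integer::sum);
        Integer index = sdict.get(key);
        if (index == null) {
            index = sdict.size(); // assumption: indices follow order of first appearance
            sdict.put(key, index);
            idict.put(index, key);
        }
        return index;
    }

    /** Returns the index of a token, or -1 if the token is unknown. */
    int indexOf(String token) {
        Integer index = sdict.get(truncate(token));
        return index == null ? -1 : index;
    }

    /** Returns the word stored at the given index, or null if there is none. */
    String wordAt(int index) {
        return idict.get(index);
    }

    /** Returns how many times the (truncated) token has been added. */
    int frequency(String token) {
        return hstab.getOrDefault(truncate(token), 0);
    }

    private String truncate(String token) {
        return (truncv > 0 && token.length() > truncv) ? token.substring(0, truncv) : token;
    }
}
```

Note that with a truncation size of 5, "summarization" and "summary" both collapse to "summa" and therefore share one index and one frequency counter; that side effect is inherent to truncating before indexing.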

Class hultig.sumo.HNgram extends Hashtable<String,Integer> implements Serializable

Serialized Fields

N

int N
The n-gram dimensionality: 2-gram, 3-gram, and so on. The default is a 2-gram, also known as a bigram.


soma

long soma
The sum of frequencies - the number of processed tokens.


hsubngram

Hashtable<K,V> hsubngram
The n-gram table, holding the frequency of each n-gram in the processed corpora.
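
To make the roles of N, soma, and the n-gram table concrete, here is a minimal counting sketch. It is an assumption-laden illustration rather than the HNgram implementation: the class NgramCounterSketch, its methods, and the choice of joining tokens with a single space to form a key are hypothetical.

```java
import java.util.Arrays;
import java.util.Hashtable;

// Hypothetical sketch: counts n-grams over token arrays.
final class NgramCounterSketch {
    final int n;   // n-gram dimensionality (2 = bigram)
    long soma = 0; // running number of processed tokens
    final Hashtable<String, Integer> counts = new Hashtable<>(); // n-gram -> frequency

    NgramCounterSketch(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        this.n = n;
    }

    /** Adds every contiguous n-gram of the token sequence to the table. */
    void process(String[] tokens) {
        soma += tokens.length;
        for (int i = 0; i + n <= tokens.length; i++) {
            // Assumption: an n-gram key is its tokens joined by single spaces.
            String key = String.join(" ", Arrays.copyOfRange(tokens, i, i + n));
            counts.merge(key, 1, Integer::sum);
        }
    }

    /** Returns the frequency of the given n-gram, or 0 if it was never seen. */
    int frequency(String... gram) {
        return counts.getOrDefault(String.join(" ", gram), 0);
    }
}
```

For the token sequence "the cat sat on the cat" with n = 2, the sketch records four distinct bigrams, counts "the cat" twice, and leaves soma at 6.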

Class hultig.sumo.NewsCluster extends Vector<Text> implements Serializable

Class hultig.sumo.NewsClusterList extends ArrayList<NewsCluster> implements Serializable

Serialized Fields

dictionary

CorpusIndex dictionary

Class hultig.sumo.POSType extends ArrayList<String[]> implements Serializable

Serialized Fields

ran

Random ran

Class hultig.sumo.RuleList extends ArrayList<Rule> implements Serializable

Serialized Fields

MODE

int MODE
Holds the sorting criteria.


STYPE

RuleList.SortType STYPE

Class hultig.sumo.Sentence extends LinkedList<Word> implements Serializable

Serialized Fields

stx

String stx
Internal string representation of this sentence.


label

String label
This label defines a sentence meta-tag.


metric

hultig.sumo.Sentence.Metric metric

cod

int cod
A sentence index, used in news clustering.

Since:
2008-06-05

Class hultig.sumo.Text extends LinkedList<Sentence> implements Serializable

Serialized Fields

CINDEX

CorpusIndex CINDEX
The corpus index used for this text.


VOCAB

HashMap<K,V> VOCAB
Dynamically stores the vocabulary of this text.


NUMTOKENS

int NUMTOKENS
The total number of tokens in this text.

Class hultig.sumo.Word extends Object implements Serializable

serialVersionUID: -5223039887894735826L

Serialized Fields

word

String word

META

Vector<E> META

cods

int[] cods
Introduced later, in June 2008. The idea is to use several codes representing different kinds of tags: lexical, syntactical, and possibly others. So far, the first three positions store the lexical, POS, and chunker codes, respectively.


POS

char[] POS
Holds the part-of-speech tag of this word. Introduced on 2007/11/11, but now obsolete due to the cods array, which was added to this class later.


CHTAG

ChunkTag CHTAG

FREQ

long FREQ
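
The following sketch shows how a class with an explicit serialVersionUID and a cods-style code array round-trips through standard Java serialization. WordSketch is a hypothetical stand-in, not the real hultig.sumo.Word; its serialVersionUID value, constructor, and the constant names for the three array positions are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a serializable word with a code array.
final class WordSketch implements Serializable {
    // Declaring the UID explicitly pins the stream version of the class.
    private static final long serialVersionUID = 1L; // hypothetical value

    // Assumed names for the first three positions of the code array.
    static final int LEXICAL = 0, POS = 1, CHUNK = 2;

    final String word;
    final int[] cods = new int[3];

    WordSketch(String word, int lexical, int pos, int chunk) {
        this.word = word;
        cods[LEXICAL] = lexical;
        cods[POS] = pos;
        cods[CHUNK] = chunk;
    }

    /** Serializes the word to bytes and reads it back as a new object. */
    static WordSketch roundTrip(WordSketch original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (WordSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }
}
```

The copy returned by roundTrip is a distinct object whose word and cods values equal those of the original. If the class's serialVersionUID differed between writing and reading, readObject would fail with an InvalidClassException, which is why the UIDs listed on this page matter for stored data.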

Class hultig.sumo.XBubble extends Object implements Serializable

serialVersionUID: -5798479126800064641L

Serialized Fields

WL

Word[] WL

WR

Word[] WR

WX

Word[][] WX

POST

POSType POST

rand

Random rand

Class hultig.sumo.XBubbleList extends ArrayList<XBubble> implements Serializable

serialVersionUID: -2118567303945736768L

Serialized Fields

FORMAT

int FORMAT

postype

POSType postype

Package hultig.util

Class hultig.util.HashStr extends Hashtable<String,Integer> implements Serializable

Serialized Fields

M

int M

ENCODE

String ENCODE