Serialized Form

Package hultig.io

Class hultig.io.FileIN extends File implements Serializable

Serialized Fields

br

BufferedReader br

encode

String encode

Class hultig.io.FileNewsCluster extends File implements Serializable

Serialized Fields

dictionary

CorpusIndex dictionary

The corpus index reference for this class.

VCLUSTERS

ArrayList<E> VCLUSTERS

The list of news clusters loaded.

Class hultig.io.FileOUT extends File implements Serializable

Serialized Fields

writer

PrintWriter writer

encode

String encode

base

String base

File base name. For example, for the file "fxyz.dat", "fxyz" is the base name and ".dat" is the extension token.

ext

String ext
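
The base/ext split described above can be sketched as follows. This is a minimal illustration, not the library's code: the class name BaseNameSketch and its methods are hypothetical, and the choice to split at the last dot (and to treat a leading dot as part of the name) is an assumption.

```java
// Hypothetical helper, not part of hultig.io: splits a file name into
// a base name and an extension token, as in the "fxyz.dat" example.
final class BaseNameSketch {

    /** Returns the text before the last dot, or the whole name if there is no extension. */
    static String baseOf(String name) {
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }

    /** Returns the extension token including its dot, or "" if there is none. */
    static String extOf(String name) {
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(dot) : "";
    }

    public static void main(String[] args) {
        System.out.println(baseOf("fxyz.dat")); // prints fxyz
        System.out.println(extOf("fxyz.dat"));  // prints .dat
    }
}
```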

Class hultig.io.FileX extends File implements Serializable

Serialized Fields

base

String base

File base name. For example, for the file "fxyz.dat", "fxyz" is the base name and ".dat" is the extension token.

ext

String ext

Package hultig.sumo

Class hultig.sumo.ChunkedSentence extends Sentence implements Serializable

Serialized Fields

CHUNK_VALUE

double[] CHUNK_VALUE

Stores a numerical value for each chunk type. It was conceived to compute sentence proximity based on the proximity of their chunks. The idea is to weight chunk types differently, for example giving more value to NP and VP chunks.

vcmark

ChunkMark[] vcmark

The array of chunk marks defining the sentence chunk boundaries and types.
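
One way to picture per-type chunk weighting is the sketch below. It is only an illustration under stated assumptions: the class ChunkWeightSketch, the integer chunk type codes, and the overlap formula are all hypothetical, and the actual proximity computation in ChunkedSentence is not shown in this page and may differ.

```java
// Hypothetical illustration of weighting chunk types; not the library's method.
final class ChunkWeightSketch {

    // Assumed chunk type codes; the real codes are not listed in the source.
    static final int NP = 0, VP = 1, PP = 2;

    /** Sums the weights of all chunk types in one sentence. */
    static double weight(int[] chunkTypes, double[] chunkValue) {
        double total = 0.0;
        for (int t : chunkTypes) total += chunkValue[t];
        return total;
    }

    /**
     * Weighted overlap in [0, 1]: the weight of the chunk types shared by both
     * sentences divided by the weight of the heavier sentence.
     */
    static double proximity(int[] a, int[] b, double[] chunkValue) {
        int[] countA = new int[chunkValue.length];
        int[] countB = new int[chunkValue.length];
        for (int t : a) countA[t]++;
        for (int t : b) countB[t]++;
        double shared = 0.0;
        for (int t = 0; t < chunkValue.length; t++) {
            shared += Math.min(countA[t], countB[t]) * chunkValue[t];
        }
        double denom = Math.max(weight(a, chunkValue), weight(b, chunkValue));
        return denom == 0.0 ? 0.0 : shared / denom;
    }

    public static void main(String[] args) {
        double[] chunkValue = {2.0, 2.0, 1.0}; // NP and VP count more than PP
        System.out.println(proximity(new int[]{NP, VP, PP}, new int[]{NP, VP}, chunkValue));
    }
}
```

With these example weights, losing a PP chunk costs less than losing an NP or VP chunk would, which is the effect the field description aims for.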

Class hultig.sumo.CorpusIndex extends Object implements Serializable

Serialized Fields

sdict

TreeMap<K,V> sdict

A corpus index keyed by word/token. Given a word, its numeric index can be obtained.

idict

TreeMap<K,V> idict

A corpus index keyed by numeric index. Given a numeric index, the corresponding word can be obtained.
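
The two maps above form a two-way lookup. A minimal sketch of that pattern follows; the class TwoWayIndexSketch is hypothetical, the type parameters (String and Integer) are assumed because the page only shows K,V, and assigning indices in insertion order from zero is also an assumption.

```java
import java.util.TreeMap;

// Hypothetical two-way index in the spirit of sdict/idict; not the CorpusIndex code.
final class TwoWayIndexSketch {

    // Word -> numeric index (the role described for sdict).
    private final TreeMap<String, Integer> byWord = new TreeMap<>();
    // Numeric index -> word (the role described for idict).
    private final TreeMap<Integer, String> byIndex = new TreeMap<>();

    /** Registers a word if it is new and returns its numeric index. */
    int add(String word) {
        Integer known = byWord.get(word);
        if (known != null) return known;
        int index = byWord.size(); // assumed numbering scheme
        byWord.put(word, index);
        byIndex.put(index, word);
        return index;
    }

    /** Returns the index of a word, or -1 if the word is unknown. */
    int indexOf(String word) {
        Integer i = byWord.get(word);
        return i == null ? -1 : i;
    }

    /** Returns the word stored at an index, or null if the index is unknown. */
    String wordAt(int index) {
        return byIndex.get(index);
    }

    public static void main(String[] args) {
        TwoWayIndexSketch idx = new TwoWayIndexSketch();
        idx.add("news");
        idx.add("cluster");
        System.out.println(idx.indexOf("cluster")); // prints 1
        System.out.println(idx.wordAt(0));          // prints news
    }
}
```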

hstab

Hashtable<K,V> hstab

A hash table for counting word frequencies in corpora.

TRUNCV

int TRUNCV

Word truncation size. If this value is greater than zero, tokens read from the corpora are truncated and stored with a maximum length of TRUNCV.
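
Frequency counting combined with truncation can be sketched as below. The class TruncatedCountSketch and its methods are hypothetical; the source does not say whether truncation happens before or after counting, so counting the truncated form is an assumption.

```java
import java.util.Hashtable;

// Hypothetical sketch of TRUNCV-style truncation feeding a frequency table.
final class TruncatedCountSketch {

    /** Cuts a token to at most truncv characters; a value of zero or less disables truncation. */
    static String truncate(String token, int truncv) {
        return (truncv > 0 && token.length() > truncv) ? token.substring(0, truncv) : token;
    }

    /** Counts how often each (possibly truncated) token occurs. */
    static Hashtable<String, Integer> count(String[] tokens, int truncv) {
        Hashtable<String, Integer> freq = new Hashtable<>();
        for (String t : tokens) {
            freq.merge(truncate(t, truncv), 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        String[] tokens = {"running", "runner", "run"};
        System.out.println(count(tokens, 3)); // prints {run=3}
    }
}
```

Note that truncation merges distinct words sharing a prefix, so the table gets smaller as the truncation size decreases.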

ENCODE

String ENCODE

The text encoding string used to read the text corpora, for example UTF-8 or ISO-8859-1.
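
An encoding string of this kind is typically handed to a reader when a stream is opened. The sketch below shows that with standard java.io classes only; EncodedReadSketch and firstLine are hypothetical names, and how CorpusIndex actually applies ENCODE is not shown in this page.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: decode a byte stream using an encoding given by name.
final class EncodedReadSketch {

    /** Reads the first line of the stream, decoding bytes with the named charset. */
    static String firstLine(InputStream in, String encode) {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, Charset.forName(encode)))) {
            return br.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "caf\u00e9" encoded as ISO-8859-1 is one byte per character.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(firstLine(new ByteArrayInputStream(latin1), "ISO-8859-1"));
    }
}
```

Reading the same bytes with a different charset name would decode the accented character incorrectly, which is why the encoding is stored alongside the index.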

Class hultig.sumo.HNgram extends Hashtable<String,Integer> implements Serializable

Serialized Fields

N

int N

The n-gram dimensionality: 2-gram, 3-gram, ... The default is a 2-gram, also known as a bigram.

soma

long soma

The sum of frequencies - the number of processed tokens.

hsubngram

Hashtable<K,V> hsubngram

The n-gram table, holding the frequency of each n-gram in the processed corpora.
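
A frequency table of n-grams can be sketched as follows. This is not the HNgram implementation: NgramCountSketch is a hypothetical name, and both the use of word-level n-grams and the space-joined key format are assumptions.

```java
import java.util.Arrays;
import java.util.Hashtable;

// Hypothetical sketch of an n-gram frequency table keyed by the n-gram text.
final class NgramCountSketch {

    /** Counts every run of n consecutive tokens, using a space-joined string as key. */
    static Hashtable<String, Integer> count(String[] tokens, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be at least 1");
        Hashtable<String, Integer> table = new Hashtable<>();
        for (int i = 0; i + n <= tokens.length; i++) {
            String key = String.join(" ", Arrays.copyOfRange(tokens, i, i + n));
            table.merge(key, 1, Integer::sum);
        }
        return table;
    }

    public static void main(String[] args) {
        String[] tokens = "the cat sat on the cat".split(" ");
        Hashtable<String, Integer> bigrams = count(tokens, 2);
        System.out.println(bigrams.get("the cat")); // prints 2
    }
}
```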

Class hultig.sumo.NewsCluster extends Vector<Text> implements Serializable

Class hultig.sumo.NewsClusterList extends ArrayList<NewsCluster> implements Serializable

Serialized Fields

dictionary

CorpusIndex dictionary

Class hultig.sumo.POSType extends ArrayList<String[]> implements Serializable

Serialized Fields

ran

Random ran

Class hultig.sumo.RuleList extends ArrayList<Rule> implements Serializable

Serialized Fields

MODE

int MODE

Holds the sorting criteria.

STYPE

RuleList.SortType STYPE

Class hultig.sumo.Sentence extends LinkedList<Word> implements Serializable

Serialized Fields

stx

String stx

Internal string representation of this sentence.

label

String label

This label defines a sentence meta-tag.

metric

hultig.sumo.Sentence.Metric metric

cod

int cod

A sentence index, used in news clustering.

Since: 2008-06-05

Class hultig.sumo.Text extends LinkedList<Sentence> implements Serializable

Serialized Fields

CINDEX

CorpusIndex CINDEX

The corpus index used for this text.

VOCAB

HashMap<K,V> VOCAB

Dynamically stores the vocabulary of this text.

NUMTOKENS

int NUMTOKENS

The total number of tokens in this text.

Class hultig.sumo.Word extends Object implements Serializable

serialVersionUID: -5223039887894735826L

Serialized Fields

word

String word

cods

int[] cods

Introduced later, in June 2008. The idea is to use several codes representing different kinds of tags: lexical, syntactical, and possibly others. So far the first three positions store, respectively, the lexical, POS, and chunker codes.

POS

char[] POS

Holds the part-of-speech tag of this word. Introduced on 2007/11/11, but now obsolete due to the cods array, added later to this class.

CHTAG

ChunkTag CHTAG

FREQ

long FREQ
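
What a serialized form means in practice can be shown with a round trip through Java object serialization. The sketch below uses a hypothetical stand-in class, WordLike, with a few fields named after those listed for Word; its serialVersionUID is an arbitrary example value, not the one listed above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical sketch of a serialization round trip; not the hultig.sumo.Word class.
final class SerialRoundTripSketch {

    /** Stand-in with a subset of the fields listed for Word. */
    static final class WordLike implements Serializable {
        private static final long serialVersionUID = 1L; // example value only
        final String word;
        final int[] cods;   // assumed layout: lexical, POS, and chunker codes
        final long FREQ;

        WordLike(String word, int[] cods, long freq) {
            this.word = word;
            this.cods = cods;
            this.FREQ = freq;
        }
    }

    /** Writes the object's serialized fields to a byte array. */
    static byte[] toBytes(Serializable obj) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
            oos.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Rebuilds an object from bytes produced by toBytes. */
    static Object fromBytes(byte[] data) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        WordLike copy = (WordLike) fromBytes(toBytes(new WordLike("cat", new int[]{7, 2, 1}, 5L)));
        System.out.println(copy.word + " " + copy.FREQ); // prints cat 5
    }
}
```

Declaring serialVersionUID explicitly, as Word, XBubble, and XBubbleList do, keeps previously written streams readable after compatible changes to the class.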

Class hultig.sumo.XBubble extends Object implements Serializable

serialVersionUID: -5798479126800064641L

Serialized Fields

WL

Word[] WL

WR

Word[] WR

WX

Word[][] WX

POST

POSType POST

rand

Random rand

Class hultig.sumo.XBubbleList extends ArrayList<XBubble> implements Serializable

serialVersionUID: -2118567303945736768L

Serialized Fields

FORMAT

int FORMAT

postype

POSType postype

Package hultig.util

Class hultig.util.HashStr extends Hashtable<String,Integer> implements Serializable

Serialized Fields

M

int M

ENCODE

String ENCODE

