java.lang.Object
  hultig.sumo.TxtFilter
public class TxtFilter
NOT YET WELL COMMENTED.
| Field Summary | |
|---|---|
| int | MINLEN - The string minimum length. |
| int | MINWORDS - The minimum number of words allowed per line. |
| byte | NUMTAG - If greater than 0, activates number tagging, i.e., every number will be replaced by the |
| int | numtokens - The number of tokens found in the last string processed (method procWords/1). |
| int | numUpperWords |
| int[] | NUMWORDS - Number of words with 1, 2, ..., 10. NUMWORDS[0] will contain the total number of words. |
| byte | OFF - A boolean false value defined. |
| byte | ON - A boolean true value defined. |
| Constructor Summary |
|---|
| TxtFilter() - Default constructor. |
| Method Summary | |
|---|---|
| String | filtering(String line) - Filters an input string according to a set of rules. |
| double | getPercUpperWords() |
| int[] | getWordHistogram() |
| static boolean | isLetter(byte c) |
| static void | main(String[] args) - The main method. |
| double | probBeText() |
| double | probBeText(String s) |
| double | probGoodText(String s) - Estimates the probability that s is "good text". |
| boolean | process(String file) - A shortcut for the process/2 method. |
| boolean | process(String file, String fout) - Processes a file by applying a set of filtering rules. |
| boolean | procstr(String s) - Processes a string and sets up a number of state variables, such as the number of characters, the number of upper-case and lower-case characters, the probability of being "good text", etc. |
| boolean | procWords(String s) |
| boolean | satisfySpecialRules(String sx) - Verifies whether a given string satisfies a number of text rules. |
| boolean | satisfyWordConstraints(double pwords, int[] reqhistogram) - Tests a set of constraints. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public byte ON
public byte OFF
public byte NUMTAG
public int MINLEN
public int MINWORDS
public int[] NUMWORDS
public int numtokens
public int numUpperWords
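
The NUMWORDS description suggests a histogram whose slot 0 holds the total word count and whose slots 1 to 10 hold counts per word size. The following is a minimal sketch of such a structure, not the TxtFilter implementation. It assumes that word size is measured in characters, that words are separated by whitespace, and that words longer than 10 characters fall into the last slot. The names WordHistogramSketch, MAX_LEN and histogram are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch; not part of hultig.sumo. */
final class WordHistogramSketch {
    /** Largest word size tracked individually (assumed from "1, 2, ..., 10"). */
    static final int MAX_LEN = 10;

    /** Returns counts where [0] is the total and [k] counts words of size k. */
    static int[] histogram(String s) {
        int[] counts = new int[MAX_LEN + 1];
        for (String word : s.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only occurs when the input is blank
            }
            counts[0]++;
            // Assumption: longer words are folded into the last slot.
            counts[Math.min(word.length(), MAX_LEN)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints [4, 1, 2, 0, 0, 0, 0, 0, 0, 0, 1]
        System.out.println(Arrays.toString(histogram("a bb bb internationalization")));
    }
}
```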
| Constructor Detail |
|---|
public TxtFilter()
| Method Detail |
|---|
public static boolean isLetter(byte c)
public boolean procstr(String s)
s - The string to be processed.
public boolean procWords(String s)
public double probBeText()
public double probBeText(String s)
public int[] getWordHistogram()
public double getPercUpperWords()
public boolean satisfyWordConstraints(double pwords, int[] reqhistogram)
pwords - The minimum word percentage.
reqhistogram - Requested word histogram satisfaction.
public double probGoodText(String s)
s - String
public boolean satisfySpecialRules(String sx)
sx - The input string or the string to be tested.
public boolean process(String file)
file - The file name to be processed.
public boolean process(String file, String fout)
file - The input file name.
fout - The output file name; if it is null, the output is written to standard output.
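
The null-means-standard-output convention described for fout can be sketched as follows. This is an illustration under stated assumptions, not the TxtFilter code: the class ProcessSketch and the keep rule (drop blank lines) are hypothetical stand-ins for the real filtering rules, the files are assumed to be UTF-8, the boolean result is assumed to signal success, and the one-argument overload is assumed to pass null for fout.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/** Hypothetical sketch; not part of hultig.sumo. */
final class ProcessSketch {
    /** Stand-in for the real filtering rules: keeps non-blank lines. */
    static boolean keep(String line) {
        return !line.trim().isEmpty();
    }

    /** Filters file into fout, or into standard output when fout is null. */
    static boolean process(String file, String fout) {
        PrintStream out = System.out;
        try {
            List<String> lines = Files.readAllLines(Paths.get(file), StandardCharsets.UTF_8);
            if (fout != null) {
                out = new PrintStream(new FileOutputStream(fout), true, "UTF-8");
            }
            for (String line : lines) {
                if (keep(line)) {
                    out.println(line);
                }
            }
            return true;
        } catch (IOException e) {
            return false; // assumption: failure is reported through the return value
        } finally {
            if (out != System.out) {
                out.close(); // never close standard output
            }
        }
    }

    /** Assumed shape of the shortcut overload: output goes to standard output. */
    static boolean process(String file) {
        return process(file, null);
    }
}
```

Reading the input before opening the output keeps a missing input file from leaving an empty output file behind.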
public String filtering(String line)
line - String
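
The MINLEN and MINWORDS fields suggest per-line thresholds of the kind a filtering rule might apply. The sketch below shows one way such a check could look. It is an assumption, not the documented behaviour of filtering: the class LineThresholdSketch and the method passesMinimums are hypothetical, length is measured on the trimmed line, and words are taken to be whitespace-separated.

```java
/** Hypothetical sketch; not part of hultig.sumo. */
final class LineThresholdSketch {
    /** True when the line has at least minLen characters and minWords words. */
    static boolean passesMinimums(String line, int minLen, int minWords) {
        String trimmed = line.trim();
        if (trimmed.length() < minLen) {
            return false;
        }
        // A blank line has zero words; split would otherwise report one empty token.
        int words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        return words >= minWords;
    }

    public static void main(String[] args) {
        System.out.println(passesMinimums("a short line of text", 10, 3)); // prints true
        System.out.println(passesMinimums("tiny", 10, 3));                 // prints false
    }
}
```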
public static void main(String[] args)
args - The array with the input arguments