java.lang.Object
  hultig.sumo.TxtFilter
public class TxtFilter
NOT YET WELL COMMENTED.
Field Summary | |
---|---|
int |
MINLEN
The minimum string length. |
int |
MINWORDS
The minimum number of words allowed per line. |
byte |
NUMTAG
If greater than 0, activates number tagging, i.e., every number will be replaced by the |
int |
numtokens
The number of tokens found in the last string processed (method procWords/1). |
int |
numUpperWords
|
int[] |
NUMWORDS
Number of words with 1, 2, ..., 10. NUMWORDS[0] will contain the total number of words. |
byte |
OFF
A predefined value representing boolean false. |
byte |
ON
A predefined value representing boolean true. |
Constructor Summary | |
---|---|
TxtFilter()
Default constructor. |
Method Summary | |
---|---|
String |
filtering(String line)
Filters an input string according to a set of rules. |
double |
getPercUpperWords()
|
int[] |
getWordHistogram()
|
static boolean |
isLetter(byte c)
|
static void |
main(String[] args)
The main method (program entry point). |
double |
probBeText()
|
double |
probBeText(String s)
|
double |
probGoodText(String s)
Estimates the probability that s is "good text". |
boolean |
process(String file)
A shortcut for the process/2 method. |
boolean |
process(String file,
String fout)
Process a file by applying a set of filtering rules. |
boolean |
procstr(String s)
Processes a string and sets up a number of state variables, such as the number of characters, the number of upper- and lower-case characters, the probability of being "good text", etc. |
boolean |
procWords(String s)
|
boolean |
satisfySpecialRules(String sx)
Verifies whether a given string satisfies a number of text rules. |
boolean |
satisfyWordConstraints(double pwords,
int[] reqhistogram)
Tests a set of constraints. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public byte ON
public byte OFF
public byte NUMTAG
public int MINLEN
public int MINWORDS
public int[] NUMWORDS
public int numtokens
public int numUpperWords
Constructor Detail |
---|
public TxtFilter()
Method Detail |
---|
public static boolean isLetter(byte c)
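The page gives no description for `isLetter(byte c)`. As an illustration only, a byte-level letter test could look like the sketch below. The class name `LetterCheckSketch` is hypothetical, and the assumption that bytes are ISO-8859-1 (Latin-1) encoded is ours; the actual method may use different rules.

```java
// Hypothetical sketch, not the hultig.sumo source.
final class LetterCheckSketch {

    // Assumption: the byte is an ISO-8859-1 (Latin-1) code unit, whose values
    // map one-to-one onto the Unicode code points U+0000..U+00FF.
    static boolean isLetter(byte c) {
        int code = c & 0xFF;                     // Java bytes are signed; widen to 0..255
        return Character.isLetter((char) code);  // standard Unicode letter test
    }

    public static void main(String[] args) {
        System.out.println(isLetter((byte) 'a'));   // ASCII letter
        System.out.println(isLetter((byte) '7'));   // digit
        System.out.println(isLetter((byte) 0xE9));  // Latin-1 "e with acute accent"
    }
}
```

The mask `c & 0xFF` matters because a plain cast from a negative byte would sign-extend and yield a code point outside the Latin-1 range.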
public boolean procstr(String s)
s
- The string to be processed.
public boolean procWords(String s)
public double probBeText()
public double probBeText(String s)
public int[] getWordHistogram()
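The `NUMWORDS` field is described as holding word counts for 1, 2, ..., 10, with index 0 holding the total. A minimal sketch of such a histogram follows. It assumes the buckets are word lengths in characters, that words are separated by whitespace, and that words longer than 10 characters count only toward the total. None of this is confirmed by the page, and `WordHistogramSketch` is a hypothetical name.

```java
// Hypothetical sketch, not the hultig.sumo source.
final class WordHistogramSketch {

    static final int MAX_LEN = 10; // assumption: buckets cover lengths 1..10

    // Returns an array h where h[0] is the total number of words and
    // h[k] (1 <= k <= 10) is the number of words with exactly k characters.
    static int[] wordHistogram(String s) {
        int[] hist = new int[MAX_LEN + 1];
        for (String w : s.trim().split("\\s+")) {
            if (w.isEmpty()) {
                continue;              // blank input yields one empty token
            }
            hist[0]++;                 // every word counts toward the total
            if (w.length() <= MAX_LEN) {
                hist[w.length()]++;    // longer words are counted in the total only
            }
        }
        return hist;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(wordHistogram("a bb ccc bb")));
    }
}
```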
public double getPercUpperWords()
public boolean satisfyWordConstraints(double pwords, int[] reqhistogram)
pwords
- The minimum word percentage.
reqhistogram
- Requested word histogram satisfaction.
public double probGoodText(String s)
s
- String
public boolean satisfySpecialRules(String sx)
sx
- The input string, i.e. the string to be tested.
public boolean process(String file)
file
- The file name to be processed.
public boolean process(String file, String fout)
file
- The input file name.
fout
- The output file name; if it is null, the output is written to standard output.
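The behaviour described for `process(String file, String fout)`, including the fallback to standard output when `fout` is null, can be sketched roughly as follows. This is not the library's implementation: the class `ProcessSketch`, the extra `filter` parameter (standing in for the class's own filtering rules), the UTF-8 encoding, and the convention that a rejected line maps to `null` are all assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.function.UnaryOperator;

// Hypothetical sketch, not the hultig.sumo source.
final class ProcessSketch {

    // Reads `file` line by line, applies `filter`, and writes every kept line
    // to `fout`, or to standard output when `fout` is null.
    // Returns true on success and false on any I/O error.
    static boolean process(String file, String fout, UnaryOperator<String> filter) {
        PrintStream out = null;
        try (BufferedReader in =
                 Files.newBufferedReader(Paths.get(file), StandardCharsets.UTF_8)) {
            out = (fout == null) ? System.out : new PrintStream(fout, "UTF-8");
            String line;
            while ((line = in.readLine()) != null) {
                String kept = filter.apply(line); // assumption: null means "rejected"
                if (kept != null) {
                    out.println(kept);
                }
            }
            out.flush();
            return true;
        } catch (IOException e) {
            return false;
        } finally {
            if (out != null && out != System.out) {
                out.close();                      // never close System.out
            }
        }
    }
}
```

A call such as `ProcessSketch.process("in.txt", null, s -> s.length() >= 3 ? s : null)` would print every line of at least three characters to the console; the file name and the length rule here are placeholders.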
public String filtering(String line)
line
- String
public static void main(String[] args)
args
- The array with the input arguments