Defines a filter to be applied to the text so that certain
exotic or uninteresting strings are rejected, for example
lines with fewer than 5 characters or sentences with fewer than
three words.
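
A minimal sketch of the two rejection rules mentioned above, in Python. The function name and parameter names are hypothetical, and applying both thresholds to the same string is a simplification; the actual filter may treat lines and sentences separately and apply further rules.

```python
def passes_length_filter(text: str, min_chars: int = 5, min_words: int = 3) -> bool:
    """Return True if the text is long enough to keep (illustrative only)."""
    stripped = text.strip()
    # Reject strings with fewer than `min_chars` characters.
    if len(stripped) < min_chars:
        return False
    # Reject strings with fewer than `min_words` whitespace-separated words.
    if len(stripped.split()) < min_words:
        return False
    return True
```

With the default thresholds, `passes_length_filter("abc")` rejects the string for being too short, and `passes_length_filter("hello world")` rejects it for having only two words.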
Processes a string and sets up a number of state
variables, such as the number of characters, the number of
uppercase and lowercase characters, the probability of
being "good text", etc.
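
The kind of state described above could be collected roughly as follows. This is only a sketch: the class name, field names, and especially the scoring heuristic (the fraction of cased letters in the string) are assumptions made for illustration, not the actual computation.

```python
from dataclasses import dataclass


@dataclass
class TextStats:
    """Hypothetical container for per-string state variables."""
    n_chars: int = 0
    n_upper: int = 0
    n_lower: int = 0
    good_text_score: float = 0.0

    def process(self, text: str) -> "TextStats":
        # Count total, uppercase, and lowercase characters.
        self.n_chars = len(text)
        self.n_upper = sum(1 for c in text if c.isupper())
        self.n_lower = sum(1 for c in text if c.islower())
        # Placeholder heuristic (an assumption): share of cased letters.
        cased = self.n_upper + self.n_lower
        self.good_text_score = cased / self.n_chars if self.n_chars else 0.0
        return self
```

Calling `TextStats().process("Hello World")` fills in the counters for that string; a real implementation would likely replace the placeholder score with a proper model of what counts as "good text".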