java.lang.Object
  lv.gotika.engine.GothicAnalyzer
public class GothicAnalyzer
Analyzer for Latvian historical texts (originally written in Gothic orthography). Supports several steps of analysis; the tags in GothicAnalyzer.ResultTag indicate the individual steps.
| Nested Class Summary | |
|---|---|
| static class GothicAnalyzer.DictView | A set of flags that indicate different views of the in-memory dictionary. |
| static class GothicAnalyzer.ResultTag | A set of tags that indicate different steps of the analysis process. |
| static class GothicAnalyzer.ResultView | A set of flags that indicate different views (output streams) of the analysis results. |
| Field Summary | |
|---|---|
| Boolean DUPLICATES | Whether references to all sources per word are included when the in-memory dictionary is loaded. |
| Boolean SYNONYMS | Whether analysis results are extended with synonyms. |
| Boolean SOUNDEX | Whether fuzzy search is applied. |
| Constructor Summary | |
|---|---|
| GothicAnalyzer(String conf, boolean duplicates) | Loads all the dictionaries and the morphological analyzer. |
| GothicAnalyzer(String conf, boolean duplicates, boolean soundex, boolean syn) | Loads all the dictionaries and the morphological analyzer. |
| Method Summary | |
|---|---|
| Properties analyzeText(Reader in) | Processes a whole text: analyzes it word by word (context is not taken into account) and prints the results to one or more output streams (in different formats), depending on the configuration of the analyzer, for example a stream for indexing purposes. |
| Pair<Boolean,ArrayList<String>> analyzeWord(String word) | Analyzes an individual word form (taken directly from a text). |
| TreeMap<String,TreeSet<String>> extractSynonyms(Pair<Boolean,ArrayList<String>> results) | Extracts synonyms from the result set returned by analyzeWord(String) and assigns them to the corresponding lemmas. |
| ArrayList<Pair<String,String>> getBySoundex(String p) | Searches for words in the in-memory dictionary that match the given soundex pattern. |
| ArrayList<Pair<String,String>> getSynonyms(String w) | Searches for synonyms of the given word. |
| boolean isOn(String view) | Checks whether the output stream for the specified result view is turned on. |
| boolean isOnAny() | Checks whether at least one output stream is turned on. |
| ArrayList<String> lemmatize(String word, boolean guess) | Finds potential lemmas for the given word form using the SemTi-Kamols morphological analyzer (http://www.semti-kamols.lv/). |
| void printDictionary(String view, String file) | Prints the in-memory dictionary into a file. |
| ArrayList<Pair<String,String>> searchDirectly(String w) | Searches for the given word in the in-memory dictionary as is. |
| ArrayList<Pair<String,String>> searchFuzzy(String w) | Searches for the given word in the in-memory dictionary in a fuzzy manner (a soundex pattern is created first). |
| String soundex(String word) | Creates a soundex pattern for the given word. |
| String transliterate(String word) | Transliterates the given word form from the Gothic to the contemporary orthography, as far as it can be done unambiguously. |
| boolean turnOff(String view) | Stops writing to the output stream for the specified result view. |
| void turnOffAll() | Stops writing to all output streams. |
| boolean turnOn(String view, Writer out) | Sets an output stream for the specified result view (format). |
| Methods inherited from class java.lang.Object |
|---|
| equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public final Boolean DUPLICATES
Whether references to all sources per word are included when the in-memory dictionary is loaded. If false, only the first reference is kept.
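The effect of this flag can be pictured with a minimal sketch. The class DuplicatesSketch, its load method, and the (word, source) entry layout are assumptions made for illustration only; they are not part of GothicAnalyzer, whose dictionary format is not described here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the GothicAnalyzer implementation.
final class DuplicatesSketch {

    /** Builds a word -> sources map from (word, source) entries. */
    static Map<String, List<String>> load(List<String[]> entries, boolean duplicates) {
        Map<String, List<String>> dict = new TreeMap<>();
        for (String[] entry : entries) {
            List<String> sources = dict.computeIfAbsent(entry[0], k -> new ArrayList<>());
            // With duplicates == false only the first reference per word is kept.
            if (duplicates || sources.isEmpty()) {
                sources.add(entry[1]);
            }
        }
        return dict;
    }

    public static void main(String[] args) {
        List<String[]> entries = List.of(
            new String[] {"maize", "source-A"},
            new String[] {"maize", "source-B"});
        System.out.println(load(entries, true));
        System.out.println(load(entries, false));
    }
}
```

With the flag set, both sources are stored for the word; without it, only the first one survives.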
public Boolean SOUNDEX
public Boolean SYNONYMS
| Constructor Detail |
|---|
public GothicAnalyzer(String conf, boolean duplicates) throws Exception
Parameters:
conf - configuration file containing references to the dictionaries.
duplicates - if true, references to all sources per word will be stored; otherwise only the first one.
Throws:
Exception - error while reading a dictionary, or unsuccessful initialization of the morphological analyzer.
public GothicAnalyzer(String conf, boolean duplicates, boolean soundex, boolean syn) throws Exception
Parameters:
conf - configuration file containing references to the dictionaries.
duplicates - if true, references to all sources per word will be stored; otherwise only the first one.
soundex - whether fuzzy search is applied.
syn - whether results are extended with synonyms.
Throws:
Exception - error while reading a dictionary, or unsuccessful initialization of the morphological analyzer.

| Method Detail |
|---|
public boolean turnOn(String view, Writer out) throws IllegalArgumentException
Parameters:
view - a flag that indicates the view.
out - output stream.
Returns:
true, if the requested output is already set; the output stream is changed anyway.
Throws:
IllegalArgumentException - invalid flag, or the output stream is null.
See Also:
GothicAnalyzer.ResultView
public boolean turnOff(String view) throws IllegalArgumentException
Parameters:
view - a flag that indicates the view.
Returns:
false, if such an output has not been turned on.
Throws:
IllegalArgumentException - invalid flag.
See Also:
GothicAnalyzer.ResultView

public void turnOffAll()
public boolean isOn(String view) throws IllegalArgumentException
Parameters:
view - a flag that indicates the view.
Returns:
true, if on; false otherwise.
Throws:
IllegalArgumentException - invalid flag.
See Also:
GothicAnalyzer.ResultView

public boolean isOnAny()
Returns:
true, if at least one output stream is set.
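The contract of turnOn, turnOff, turnOffAll, isOn, and isOnAny can be sketched as a small registry of writers keyed by view flag. This is an illustration under stated assumptions, not the library's code: the class ResultViewSketch and the flag names INDEX and TRANSLIT are hypothetical (the real flags are defined in GothicAnalyzer.ResultView), and whether the analyzer closes a writer when a view is turned off is not documented, so the sketch does not close anything.

```java
import java.io.StringWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the documented return values and exceptions.
final class ResultViewSketch {
    // Placeholder flag names, assumed for illustration.
    static final Set<String> VIEWS = Set.of("INDEX", "TRANSLIT");

    private final Map<String, Writer> outputs = new HashMap<>();

    private static void check(String view) {
        if (view == null || !VIEWS.contains(view)) {
            throw new IllegalArgumentException("invalid flag: " + view);
        }
    }

    /** Returns true if the view was already set; the writer is replaced anyway. */
    boolean turnOn(String view, Writer out) {
        check(view);
        if (out == null) {
            throw new IllegalArgumentException("output stream is null");
        }
        return outputs.put(view, out) != null;
    }

    /** Returns false if the view had not been turned on. */
    boolean turnOff(String view) {
        check(view);
        return outputs.remove(view) != null;
    }

    void turnOffAll() {
        outputs.clear();
    }

    boolean isOn(String view) {
        check(view);
        return outputs.containsKey(view);
    }

    boolean isOnAny() {
        return !outputs.isEmpty();
    }

    public static void main(String[] args) {
        ResultViewSketch views = new ResultViewSketch();
        System.out.println(views.turnOn("INDEX", new StringWriter()));
        System.out.println(views.turnOn("INDEX", new StringWriter()));
        System.out.println(views.isOnAny());
    }
}
```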
public Pair<Boolean,ArrayList<String>> analyzeWord(String word) throws Exception
Parameters:
word - a word form.
Returns:
a pair: the Boolean value indicates whether any approved lemma has been found for the given word form; the ArrayList<String> holds the results, with ResultTag.DELIMITER and a tag indicating the step of analysis assigned to each of them.
Throws:
Exception - unsuccessful initialization of the morphological analyzer.
See Also:
GothicAnalyzer.ResultTag
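A caller will typically want to separate each result from its tag. The sketch below groups result strings by tag, assuming each entry has the form value, then the delimiter, then the tag. The class ResultTagSketch, the tab delimiter, and the tag names DIRECT and FUZZY are hypothetical; the actual delimiter and tags are defined in GothicAnalyzer.ResultTag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: splits "value<DELIMITER>tag" strings and groups them by tag.
final class ResultTagSketch {
    // Assumed delimiter; the real value is ResultTag.DELIMITER.
    static final String DELIMITER = "\t";

    static Map<String, List<String>> groupByTag(List<String> results) {
        Map<String, List<String>> byTag = new TreeMap<>();
        for (String result : results) {
            int at = result.lastIndexOf(DELIMITER);
            if (at < 0) {
                continue; // entries without a tag are skipped in this sketch
            }
            String value = result.substring(0, at);
            String tag = result.substring(at + DELIMITER.length());
            byTag.computeIfAbsent(tag, k -> new ArrayList<>()).add(value);
        }
        return byTag;
    }

    public static void main(String[] args) {
        List<String> results = List.of("maize\tDIRECT", "meize\tFUZZY", "maiss\tDIRECT");
        System.out.println(groupByTag(results));
    }
}
```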
public Properties analyzeText(Reader in) throws Exception
Parameters:
in - a text stream.
Returns:
a Properties object with the keys recognized, unknown, and total; null, if only transliteration was performed.
Throws:
Exception - unsuccessful initialization of the morphological analyzer, inaccessible I/O streams, or no output stream set.
See Also:
analyzeWord(String), turnOn(String, Writer)
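The shape of the returned Properties can be illustrated with a word-by-word counter. This is only a sketch: the class TextStatsSketch, the letter-based tokenization, and the set of known words standing in for the dictionaries are assumptions, and the sketch stores the counts as decimal strings because Properties holds string values.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Locale;
import java.util.Properties;
import java.util.Set;

// Hypothetical sketch: counts recognized and unknown word forms in a text stream.
final class TextStatsSketch {

    static Properties count(Reader in, Set<String> known) throws IOException {
        int recognized = 0;
        int unknown = 0;
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            // Assumed tokenization: any run of non-letters separates word forms.
            for (String token : line.split("[^\\p{L}]+")) {
                if (token.isEmpty()) {
                    continue;
                }
                if (known.contains(token.toLowerCase(Locale.ROOT))) {
                    recognized++;
                } else {
                    unknown++;
                }
            }
        }
        Properties stats = new Properties();
        stats.setProperty("recognized", Integer.toString(recognized));
        stats.setProperty("unknown", Integer.toString(unknown));
        stats.setProperty("total", Integer.toString(recognized + unknown));
        return stats;
    }

    public static void main(String[] args) throws IOException {
        Properties stats = count(new StringReader("Maize un sahls.\nMaize!"), Set.of("maize", "un"));
        System.out.println(stats.getProperty("recognized") + " of " + stats.getProperty("total"));
    }
}
```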
public ArrayList<String> lemmatize(String word, boolean guess) throws Exception
Parameters:
word - a word form.
guess - if true and the word is not defined in the morphological lexicon, lemmas are guessed (suggestions should be verified in a dictionary).
Throws:
Exception

public String soundex(String word)
Parameters:
word - a word.
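The pattern scheme used by this method is not specified on this page. As a point of reference only, the sketch below implements the classic American Soundex code (first letter plus three digits); it is a stand-in, not the analyzer's algorithm, which presumably has to account for Latvian letters. The class name SoundexSketch is hypothetical, and the sketch simply skips non-ASCII letters.

```java
import java.util.Locale;

// Stand-in sketch: classic American Soundex, NOT the scheme used by GothicAnalyzer.
final class SoundexSketch {
    // Digit for each letter A..Z; '0' marks letters that are not coded (vowels, H, W, Y).
    private static final String CODES = "01230120022455012623010202";

    static String soundex(String word) {
        String s = word.toUpperCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        char prev = '0';
        for (int i = 0; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            if (c < 'A' || c > 'Z') {
                continue; // this sketch ignores non-ASCII letters and punctuation
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c); // the first letter is kept as is
                prev = code;
            } else if (c != 'H' && c != 'W') {
                // H and W do not separate equal codes; vowels do.
                if (code != '0' && code != prev) {
                    out.append(code);
                }
                prev = code;
            }
        }
        while (out.length() > 0 && out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(soundex("Robert") + " " + soundex("Rupert"));
    }
}
```

Words that sound alike in English, such as Robert and Rupert, receive the same code, which is the property a fuzzy dictionary lookup relies on.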
public String transliterate(String word)
Parameters:
word - a word form.
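One common way to implement rule-based transliteration is to scan the word from left to right and apply the longest matching rule at each position. The sketch below shows that technique only. The class TransliterationSketch and its four rules are placeholders chosen for illustration; they are not the analyzer's rule set and make no claim to cover the Gothic orthography correctly.

```java
// Hypothetical sketch: left-to-right, longest-rule-first replacement.
final class TransliterationSketch {
    // Placeholder rules (source, target), ordered with the longest source first.
    private static final String[][] RULES = {
        {"tsch", "\u010D"}, // c with caron
        {"sch", "\u0161"},  // s with caron
        {"ee", "ie"},
        {"w", "v"},
    };

    static String transliterate(String word) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < word.length()) {
            boolean matched = false;
            for (String[] rule : RULES) {
                if (word.startsWith(rule[0], i)) {
                    out.append(rule[1]);
                    i += rule[0].length();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                out.append(word.charAt(i)); // characters without a rule are copied unchanged
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(transliterate("weens"));
    }
}
```

Ordering the rules by source length matters: if the three-letter rule were tried before the four-letter one, the longer sequence would never match.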
public TreeMap<String,TreeSet<String>> extractSynonyms(Pair<Boolean,ArrayList<String>> results)
Extracts synonyms from the result set returned by analyzeWord(String) and assigns them to the corresponding lemmas.
Parameters:
results - a list of analysis results, formatted according to analyzeWord(String).
See Also:
analyzeWord(String)
public void printDictionary(String view, String file) throws IOException
Parameters:
view - a flag that indicates the view of interest.
file - destination filename.
Throws:
IOException - error while printing to the file.
IllegalArgumentException - invalid flag.
See Also:
GothicAnalyzer.DictView

public ArrayList<Pair<String,String>> searchDirectly(String w)
Parameters:
w - a word.
public ArrayList<Pair<String,String>> searchFuzzy(String w)
Parameters:
w - a word.

public ArrayList<Pair<String,String>> getBySoundex(String p)
Parameters:
p - a pattern.
See Also:
soundex(String)

public ArrayList<Pair<String,String>> getSynonyms(String w)
Parameters:
w - a word.
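The relationship between searchFuzzy, getBySoundex, and soundex described above (a pattern is created first, then the dictionary is searched by pattern) can be sketched with an index keyed by pattern. Everything here is assumed for illustration: the class FuzzyIndexSketch, the use of String[] {word, source} in place of Pair<String,String>, and the crude vowel-dropping pattern function used in main, which stands in for the real soundex(String).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: dictionary entries indexed by a sound pattern.
final class FuzzyIndexSketch {
    private final Map<String, List<String[]>> byPattern = new HashMap<>();
    private final Function<String, String> pattern;

    FuzzyIndexSketch(Function<String, String> pattern) {
        this.pattern = pattern;
    }

    /** Stores a (word, source) entry under the pattern of the word. */
    void add(String word, String source) {
        byPattern.computeIfAbsent(pattern.apply(word), k -> new ArrayList<>())
                 .add(new String[] {word, source});
    }

    /** Returns all entries whose pattern equals p. */
    List<String[]> getBySoundex(String p) {
        return byPattern.getOrDefault(p, Collections.emptyList());
    }

    /** Creates the pattern for w first, then looks it up. */
    List<String[]> searchFuzzy(String w) {
        return getBySoundex(pattern.apply(w));
    }

    public static void main(String[] args) {
        // Crude stand-in pattern: lower-case the word and drop ASCII vowels.
        FuzzyIndexSketch index = new FuzzyIndexSketch(
            w -> w.toLowerCase(Locale.ROOT).replaceAll("[aeiou]", ""));
        index.add("maize", "source-A");
        index.add("meize", "source-B");
        System.out.println(index.searchFuzzy("mize").size());
    }
}
```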