Upload two sentence-aligned text files. The files should share the same base name and differ only in their extension, which should be a two-letter language code (e.g. test.en and test.sr). These files are later fed into GIZA++.
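The naming rule above can be checked before the files reach GIZA++. The following is a minimal sketch, not the tool's actual validation; the function name check_pair is hypothetical, and the requirement that the two language codes differ is an assumption.

```python
from pathlib import Path


def check_pair(path_a, path_b):
    """Return the two language codes if the file names form a valid pair."""
    a, b = Path(path_a), Path(path_b)
    # Both files must share the same base name (everything before the last dot).
    if a.stem != b.stem:
        raise ValueError(f"base names differ: {a.stem!r} vs {b.stem!r}")
    codes = []
    for p in (a, b):
        code = p.suffix[1:]  # drop the leading dot
        # The extension must be exactly two ASCII letters, e.g. "en" or "sr".
        if len(code) != 2 or not (code.isascii() and code.isalpha()):
            raise ValueError(f"not a two-letter language code: {p.name!r}")
        codes.append(code.lower())
    # Assumption: a pair in the same language is not a useful input.
    if codes[0] == codes[1]:
        raise ValueError("both files use the same language code")
    return tuple(codes)
```

For example, check_pair("test.en", "test.sr") returns the pair of codes, while check_pair("test.en", "other.sr") raises ValueError.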
Upload a verticalized list of English terminology (one entry per line). Line format: term|extractor
Upload a verticalized list of Serbian terminology (one entry per line). Line format: term|frequency
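Both terminology lists use a pipe-separated line format, so one reader can serve both. This is a minimal sketch under assumptions: parse_term_list is a hypothetical name, the last pipe on a line is taken as the separator, and the second field is kept as a string rather than converted.

```python
def parse_term_list(lines):
    """Parse 'term|field' lines into (term, field) tuples, skipping blank lines."""
    entries = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        # Assumption: the last pipe separates the term from its second field.
        term, sep, field = line.rpartition("|")
        if not sep:
            raise ValueError(f"line {number}: missing '|' separator")
        entries.append((term.strip(), field.strip()))
    return entries
```

The same function can be applied to an open file object, since iterating over a file yields its lines.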
StringL and StringS: loose and strict string matching; Token: matches sets of normalised tokens.
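The three matching modes might be sketched as follows. The exact definitions are not given here, so everything in this block is an assumption: strict matching is taken to be exact equality, loose matching ignores case and extra whitespace, and token matching compares sets of lower-cased whitespace tokens. All function names are hypothetical.

```python
def _normalise(text):
    # Assumed normalisation: lower-case and collapse runs of whitespace.
    return " ".join(text.lower().split())


def match_strict(a, b):
    """Assumed StringS behaviour: the strings must be identical."""
    return a == b


def match_loose(a, b):
    """Assumed StringL behaviour: equal after normalisation."""
    return _normalise(a) == _normalise(b)


def match_token(a, b):
    """Assumed Token behaviour: the sets of normalised tokens are equal."""
    return set(_normalise(a).split()) == set(_normalise(b).split())
```

Under these assumptions, "Data  Base" and "data base" match loosely but not strictly, and "base data" matches "Data base" only in token mode, because word order is ignored there.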
Run GIZA++ on aligned sentences.
Discard chunks from the previous step that are certainly not from the desired domain, by checking them against the English dictionary (i.e. the list of English MWUs). This is a very rough, bag-of-words-based elimination.
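One way to read the rough elimination is: keep a chunk only if it shares at least one word with the vocabulary of the English MWU list. The sketch below follows that reading. It is an assumption, not the tool's actual criterion, and rough_filter is a hypothetical name.

```python
def rough_filter(chunks, english_terms):
    """Keep chunks that share at least one word with the English term list."""
    # Bag of words: every word that occurs in any English MWU.
    vocabulary = {word for term in english_terms for word in term.lower().split()}
    kept = []
    for chunk in chunks:
        words = set(chunk.lower().split())
        # A chunk with no word in common is certainly outside the domain.
        if words & vocabulary:
            kept.append(chunk)
    return kept
```

The filter is deliberately permissive: it may keep chunks that only share a common word with the term list, which a later, stricter step can remove.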
Perform spaCy lemmatization on the English chunks and Unitex lemmatization on the Serbian chunks.
After performing fine elimination (by intersecting with the English dictionary), obtain the other results.
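The fine elimination can be pictured as a set intersection. The sketch below assumes each chunk is an (English, Serbian) pair whose English side is already lemmatized, and that the comparison is case-insensitive. Both points are assumptions, and fine_filter is a hypothetical name.

```python
def fine_filter(pairs, english_terms):
    """Keep (english, serbian) pairs whose English side is in the dictionary."""
    # Assumption: terms are compared case-insensitively as whole strings.
    dictionary = {term.lower() for term in english_terms}
    return [(en, sr) for en, sr in pairs if en.lower() in dictionary]
```

Unlike the rough bag-of-words step, a pair survives here only if its whole English side is a dictionary entry.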