Copying the contents of a document means converting its image (a graphic file created by scanning a sheet of paper)
into editable text, which can be saved in one of the popular text file formats used by word processors.
Text recognition uses state-of-the-art algorithms that take full advantage of current computer technology.
At present, 66 languages are recognised.
MiceText has no problem correctly recognising a document even when the user chooses to recognise it in all 66 languages, but you should consider whether your business documents are really written in that many languages.
It is therefore recommended to limit the number of languages, e.g. to English and German.
Tailoring the OCR settings is the most important factor in determining recognition efficiency.
Many factors affect text recognition speed. The best results are obtained by configuring the application to suit the quality and complexity of the document being processed.
The application recognises a document without any manual correction if its content is skewed from the vertical axis by no more than about 2 degrees. If this value is exceeded, the user has to correct the skew with the Deskew function.
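The skew rule above can be sketched as a simple check. This is only an illustration under stated assumptions, not MiceText's actual code: the names `skew_from_baseline` and `needs_deskew` are hypothetical, the tolerance is fixed at exactly 2.0 degrees although the text says "about 2 degrees", and the skew is estimated from two points on a single text baseline.

```python
import math

# Assumed tolerance: the text says "about 2 degrees"; the exact value is hypothetical.
SKEW_TOLERANCE_DEG = 2.0


def skew_from_baseline(x1, y1, x2, y2):
    """Return the tilt, in degrees, of a text baseline running from (x1, y1) to (x2, y2)."""
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def needs_deskew(skew_deg, tolerance_deg=SKEW_TOLERANCE_DEG):
    """Return True when the skew exceeds the tolerance in either direction."""
    return abs(skew_deg) > tolerance_deg
```

For example, a baseline that rises 50 pixels over a width of 1000 pixels is tilted by a little under 3 degrees, so `needs_deskew` returns True for it, while a rise of 20 pixels over the same width stays within the tolerance.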
The Filter Module is used to add, in the correct order, a few predefined functions that affect the text quality of each loaded image.
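The idea of an ordered filter chain can be sketched as follows. This is a minimal illustration, not the Filter Module itself: the filters `brighten` and `threshold`, the helpers `apply_filters` and `process_all`, and the representation of an image as rows of greyscale values (0 to 255) are all assumptions made for the example.

```python
from functools import reduce


def threshold(image, cutoff=128):
    """Binarise: pixels at or above the cutoff become 255, the rest become 0."""
    return [[255 if px >= cutoff else 0 for px in row] for row in image]


def brighten(image, amount=40):
    """Add a constant to every pixel, clamped to the 0-255 range."""
    return [[min(255, px + amount) for px in row] for row in image]


def apply_filters(image, filters):
    """Run the filters on one image in the order they are listed."""
    return reduce(lambda img, f: f(img), filters, image)


def process_all(images, filters):
    """Apply the same ordered filter chain to every loaded image."""
    return [apply_filters(img, filters) for img in images]
```

With the one-row image `[[100, 200]]`, brightening before thresholding gives `[[255, 255]]`, whereas thresholding before brightening gives `[[40, 255]]`. The same functions in a different order produce a different image, which is why the order of the chain matters.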
H-Liner is an advanced line search algorithm.
The dictionary recognition technique is a mechanism that raises text recognition efficiency by comparing words with a dictionary. This practice can significantly alter the meaning of sentences by replacing words. It is unacceptable for an OCR program to change a word, e.g. from sma11 to small, while processing the text. Programs that recognise words by the dictionary method rely on their internal statistics, so they may affect the authenticity of the processed document to a greater or lesser extent. Therefore, no dictionary recognition mechanisms have been implemented in the program engine.
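The risk described above can be demonstrated with a small sketch. It does not reproduce any particular OCR engine: the tiny word list, the function names `dictionary_correct` and `literal_recognise`, and the similarity cutoff of 0.5 are assumptions chosen for illustration, and the closest-match lookup uses Python's standard `difflib` module rather than the statistics a real dictionary-based product would use.

```python
import difflib

# Hypothetical word list; a real dictionary-based engine would use a far larger one.
DICTIONARY = ["small", "smell", "mall"]


def dictionary_correct(word, dictionary=DICTIONARY, cutoff=0.5):
    """Replace a word with its closest dictionary entry, if one is similar enough."""
    if word in dictionary:
        return word
    matches = difflib.get_close_matches(word, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def literal_recognise(word):
    """Dictionary-free behaviour: return exactly what was read from the page."""
    return word
```

Here `dictionary_correct("sma11")` returns `small`, silently replacing what was actually printed, while `literal_recognise("sma11")` leaves the string untouched. If sma11 was, say, a part number rather than a misprint, the dictionary approach has changed the document.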