Optical Character Recognition: OCR/ICR


Character recognition, whether of machine-printed text (OCR) or handwritten text (ICR), is carried out as a sequence of operations. A general understanding of this process helps in choosing the settings the application requires, and so in obtaining better results.

The first step is to locate the individual characters within the field area to be recognized. This operation is called segmentation, and it relies on different algorithms depending on whether the text is machine-printed or handwritten, and on whether the characters are evenly spaced or touching.

Clearly, the more widely the characters are spaced from one another, the more reliably individual characters can be found. Even so, our system uses state-of-the-art algorithms and can also separate and recognize touching characters.
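As an illustration of segmentation in the simplest case (well-spaced characters), the sketch below splits a binarized field wherever a column contains no ink. This is only an assumed, textbook projection-profile approach, not the algorithm used by the product; the function name `segment_by_projection` and the sample data are hypothetical. Touching characters would not be separated by this method.

```python
def segment_by_projection(image):
    """Return (start, end) column ranges, end exclusive, one per run of inked columns.

    `image` is a list of rows; each pixel is 1 (ink) or 0 (background).
    """
    if not image:
        return []
    width = len(image[0])
    # Vertical projection: number of ink pixels in each column.
    profile = [sum(row[x] for row in image) for x in range(width)]

    segments = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                      # a character begins
        elif count == 0 and start is not None:
            segments.append((start, x))    # an empty column ends it
            start = None
    if start is not None:
        segments.append((start, width))    # character touching the right edge
    return segments


# Hypothetical 3-row field containing three separated blobs of ink.
FIELD = [
    [1, 1, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 1, 1],
    [1, 1, 0, 0, 1, 0, 0, 1],
]

segments = segment_by_projection(FIELD)
```

Each returned range can then be cropped out of the field and passed on to the next step.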

The second step is the extraction of features from each segmented character. There are several methods for extracting these features: some are based on statistical procedures (pixel frequency histograms), others on geometric procedures (curvature and direction of lines). In every case the basic idea is that the features should be common to all characters of the same type, should be sufficient to identify the character, and should be robust to the noise and distortion introduced by the scanning process.
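One simple statistical feature of the kind mentioned above is "zoning": the character image is divided into a grid and the fraction of ink pixels in each cell becomes one feature. The sketch below is a minimal example under that assumption; the name `zone_densities`, the 2x2 grid, and the sample glyph are illustrative choices, not the features actually used by the system.

```python
def zone_densities(glyph, zone_rows=2, zone_cols=2):
    """Return the ink density of each grid zone, row by row.

    `glyph` is a non-empty list of rows; each pixel is 1 (ink) or 0 (background).
    """
    height, width = len(glyph), len(glyph[0])
    features = []
    for zr in range(zone_rows):
        y0 = zr * height // zone_rows
        y1 = (zr + 1) * height // zone_rows
        for zc in range(zone_cols):
            x0 = zc * width // zone_cols
            x1 = (zc + 1) * width // zone_cols
            ink = sum(glyph[y][x] for y in range(y0, y1) for x in range(x0, x1))
            area = (y1 - y0) * (x1 - x0)
            # An empty zone (image smaller than the grid) contributes 0.0.
            features.append(ink / area if area else 0.0)
    return features


# Hypothetical 4x4 glyph shaped like the letter "L".
GLYPH_L = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]

features = zone_densities(GLYPH_L)
```

Because each value is a ratio rather than a pixel count, the same vector is obtained for a glyph scanned at a different resolution, which is one way of meeting the robustness requirement.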

The third step is classification: the extracted features are analyzed to determine the ASCII value from the character's shape, using a priori knowledge based on prototypes of the characters to be recognized.
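Classification against prototypes can be pictured as a nearest-prototype search: the feature vector of the unknown character is compared with a stored vector for each known character, and the closest one wins. The sketch below assumes squared Euclidean distance and uses made-up prototype values; `PROTOTYPES` and `classify` are hypothetical names, and the real classifier may work quite differently.

```python
# Hypothetical prototypes: one feature vector per known character.
PROTOTYPES = {
    "L": [0.5, 0.0, 0.75, 0.5],
    "I": [0.5, 0.5, 0.5, 0.5],
    "T": [1.0, 1.0, 0.5, 0.5],
}


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(features, prototypes=PROTOTYPES):
    """Return all candidate characters as (char, distance), best match first."""
    scored = sorted(
        (squared_distance(features, proto), ch) for ch, proto in prototypes.items()
    )
    return [(ch, dist) for dist, ch in scored]


# A slightly noisy "L": the closest prototype should still be "L".
ranked = classify([0.5, 0.1, 0.7, 0.5])
best_char = ranked[0][0]
ascii_value = ord(best_char)  # numeric character code of the best match
```

Returning the full ranked list, rather than only the winner, keeps the alternative hypotheses available for a later validation step.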

The last step is the validation of the classified characters. Context analysis can confirm or reject multiple classification hypotheses for the same character. Formatting masks, which describe the allowed sequences of permitted characters, and vocabularies help the validation phase produce a better result.
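The effect of a formatting mask can be sketched as follows. For each character position, the classifier is assumed to supply several hypotheses ordered from most to least likely, and the mask keeps the first one that is permitted at that position. The mask syntax used here ("A" for a letter, "9" for a digit, anything else for any character) and the function names are assumptions made for the example, not the application's actual mask format.

```python
import string


def matches(ch, mask_symbol):
    """Check one character against one symbol of the hypothetical mask syntax."""
    if mask_symbol == "9":
        return ch in string.digits
    if mask_symbol == "A":
        return ch in string.ascii_letters
    return True  # any other symbol places no restriction


def validate_with_mask(candidates, mask):
    """Pick, per position, the best hypothesis allowed by the mask.

    `candidates` holds one list of characters per position, best first.
    Returns the validated string, or None if some position has no
    permitted hypothesis or the lengths differ.
    """
    if len(candidates) != len(mask):
        return None
    result = []
    for hypotheses, symbol in zip(candidates, mask):
        chosen = next((ch for ch in hypotheses if matches(ch, symbol)), None)
        if chosen is None:
            return None
        result.append(chosen)
    return "".join(result)


# Ambiguous shapes: O/0, 1/I and S/5 are easily confused by a classifier.
CANDIDATES = [["O", "0"], ["1", "I"], ["S", "5"]]

as_code = validate_with_mask(CANDIDATES, "A99")
as_number = validate_with_mask(CANDIDATES, "999")
```

The same candidate lists yield different results under different masks, which is how context resolves characters that look alike. A vocabulary check could be added in the same spirit by accepting only strings found in a word list.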