Recogniform - Optical Recognition Technologies



Recogniform Workgroup Reader

Over 200 million forms processed in 2006!  

1. Description
2. Input Station
3. Recognition Station
4. Correction Station
5. Output Station
6. Control Server
7. Customization by scripts
8. Statistics


1. Description

Recogniform Workgroup Reader is an innovative solution for automatic data capture (optical reading) from paper forms.

Manual data entry is often the bottleneck in form processing and document filing.

Using Recogniform Workgroup Reader you will immediately and simultaneously get:

  • Lower costs;
  • Higher speed;
  • Quality 100% guaranteed.

Compared to manual data entry, our data-capture system needs fewer people and always runs at the full speed of the hardware, with none of the slowdowns and errors caused by operator fatigue and boredom.

Using the most advanced technologies, Recogniform Workgroup Reader achieves results that other products cannot match and probably never will.

Recogniform Workgroup Reader recognizes and reads:

  • Alphabetical, numerical and alphanumeric fields, hand-written in cursive: CHR
  • Alphabetical, numerical and alphanumeric data, hand-written in uppercase, lowercase, constrained and unconstrained capital letters: ICR
  • Typed or printed texts with any font type and size: OCR
  • Standard codelines with numbers and symbols: OCR-A and OCR-B
  • Any type of barcode: BCR
  • Check boxes: OMR

It also lets you:

  • Crop and extract whole pages or parts of them, creating digital images that can be easily filed: IDE
  • Enable manual data entry: AMK

Recognition of handwritten text in cursive or in unconstrained capital letters, which once seemed impossible, is now a reality!

Recogniform Workgroup Reader is the natural evolution of Recogniform Desktop Reader and includes all of its main features. The main difference is that Workgroup lets you operate with a distributed architecture instead of a single station.

With the Desktop version, all steps (input, recognition, verification, correction and output) take place on the same station; with Workgroup you can split these steps across as many stations as you wish, according to your resources and needs.

The Workgroup solution is made up of the following applications:

  • Recogniform Control Server
  • Recogniform Input Station
  • Recogniform Recognition Station
  • Recogniform Correction Station
  • Recogniform Output Station

The Workgroup configuration depends on your specific requirements. For example, if the correction phase requires more resources, you can set up multiple correction stations; likewise, if you have several scanners, it is convenient to set up several input stations.

Our licensing system lets you use any number of input, correction and output stations without additional charges.

Communication between stations and server is TCP/IP based. This makes Workgroup ready for internet use and lets you place the stations in different locations. One of the many applications is having operators work from home. You can also acquire the input forms without physically moving them from the place where they are kept.

 

Optical reading with a distributed architecture
 

2. Input Station

Recogniform Input Station is the Workgroup application that lets you carry out the optical reading of documents acquired in various ways: scanner, fax, file, internet.

All scanners equipped with TWAIN drivers are supported. More than one hundred scanner models (SCSI, proprietary, Xionics and Kofax) equipped with ISIS drivers are directly supported, too. ADF (automatic document feeder), front/rear scanning, drop-out ink lamps and image processing cards for dynamic thresholding are all handled. This means that you can freely choose the scanner best suited to your needs: from the simplest and cheapest one (able to scan only a few sheets a minute) to the most sophisticated, high-productivity one, able to scan hundreds of sheets a minute.

Recogniform Input Station also lets you process images previously acquired and stored on disk as TIFF files, uncompressed or compressed (CCITT G4, CCITT G3, Huffman, PackBits, JPEG), single-page or multi-page, as well as JPEG (JFIF), BMP, PNG and, in the newest version, PDF files.

It can also monitor a directory: files saved there by other applications are imported automatically, and processing starts as soon as a predetermined number of forms has been stored.
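
The directory-monitoring behaviour can be pictured with a short Python sketch. It is only an illustration, not the product's implementation: the function name `ready_batch`, the file extensions and the "oldest files first" ordering are all assumptions.

```python
import os
import tempfile


def ready_batch(watch_dir, batch_size, extensions=(".tif", ".tiff", ".pdf")):
    """Return the oldest `batch_size` image files in watch_dir, or [] if too few."""
    candidates = [
        os.path.join(watch_dir, name)
        for name in os.listdir(watch_dir)
        if name.lower().endswith(extensions)
    ]
    if len(candidates) < batch_size:
        return []  # threshold not reached yet: keep waiting
    # Oldest files first (assumption), so forms are handled in arrival order.
    candidates.sort(key=lambda path: (os.path.getmtime(path), path))
    return candidates[:batch_size]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        for index in range(3):
            with open(os.path.join(folder, f"form{index}.tif"), "wb") as handle:
                handle.write(b"")
        print(len(ready_batch(folder, batch_size=5)))  # too few files so far
        print(len(ready_batch(folder, batch_size=3)))  # threshold reached
```

A real monitor would call such a function periodically and hand the returned files to the import step.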

The latest version lets you use not only monochrome images but grayscale and color images, too: the system can drop out the colored form in software, just as hardware scanning does with the proper drop-out lamp.
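
As a rough idea of what dropping a form colour in software involves, the following Python sketch binarizes an RGB image and treats strongly red pixels as background. The red form colour, the function name `drop_color` and both threshold values are assumptions chosen for the example; the product's actual method is not documented here.

```python
def drop_color(pixels, dominance=60, black_level=128):
    """Binarize an RGB image, turning strongly red pixels into background.

    pixels: list of rows, each row a list of (r, g, b) tuples (0-255).
    Returns rows of 0 (background) and 1 (ink).
    """
    result = []
    for row in pixels:
        out_row = []
        for r, g, b in row:
            # Assumption: the pre-printed form is reddish, so red clearly
            # dominates the other two channels on form pixels.
            is_form_color = r - max(g, b) >= dominance
            brightness = (r + g + b) / 3
            is_ink = brightness < black_level and not is_form_color
            out_row.append(1 if is_ink else 0)
        result.append(out_row)
    return result


if __name__ == "__main__":
    # White paper, red form line, dark pen stroke.
    image = [[(255, 255, 255), (200, 40, 40), (20, 20, 30)]]
    print(drop_color(image))
```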

Fax input is directly supported too, without resorting to external products. With automatic fax reception enabled and an ordinary fax-modem, Recogniform Input Station, set in stand-alone mode, receives every incoming fax, in standard or fine quality, normalizes it and makes it available for optical reading.

Internet support is provided by the ability to automatically download the images to process from any server. Using the FTP protocol with ID and password, remote files can be downloaded quickly and safely. Because everything works over TCP/IP, forms can be collected and scanned in one place and read in another, so geographically distributed applications are easy to build and naturally supported.
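
For readers who want to picture the FTP step, here is a minimal Python sketch built on the standard `ftplib` module. It is not the product's code: the function names, the extension filter and the directory arguments are assumptions made for the example.

```python
import os
from ftplib import FTP


def fetch_images(ftp, remote_dir, local_dir, extensions=(".tif", ".jpg", ".pdf")):
    """Download matching files from remote_dir using a connected FTP session."""
    os.makedirs(local_dir, exist_ok=True)
    ftp.cwd(remote_dir)
    downloaded = []
    for name in ftp.nlst():
        if not name.lower().endswith(extensions):
            continue  # skip anything that is not an image to process
        target = os.path.join(local_dir, os.path.basename(name))
        with open(target, "wb") as handle:
            ftp.retrbinary("RETR " + name, handle.write)
        downloaded.append(target)
    return downloaded


def fetch_from_server(host, user, password, remote_dir, local_dir):
    """Log in with ID and password, then download the images."""
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        return fetch_images(ftp, remote_dir, local_dir)
```

Keeping the session handling in `fetch_from_server` separate from `fetch_images` makes the download logic easy to exercise without a live server.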

3. Recognition Station

Recogniform Recognition Station automatically detects the batches that are ready for recognition. Its strength is a set of sophisticated reading engines that can be used and combined according to the user's requirements.

  • CHR - Cursive Handwritten Recognition
    Reads handwritten data in cursive (natural writing, not in capital letters): what was only imaginable a few years ago is now real. This feature is essential for reading forms that were not expressly designed for automatic acquisition and contain unconstrained, freely written fields.
  • ICR - Intelligent Character Recognition
    Recognizes hand-printed data in constrained or unconstrained mode, where there is usually space between characters. The engine has been expressly trained on European and American writing styles, with high accuracy.
  • OCR - Optical Character Recognition
    The recognition technology for printed and typed text. It is omnifont and can recognize characters of any font style and size.
  • OCR-A/B - Optical Character Recognition for the OCR-A and OCR-B fonts
    This engine works on pre-printed OCR-A or OCR-B codelines of postal and banking documents.
  • OMR - Optical Mark Recognition
    Reads check boxes, that is, marks placed in predefined spaces. The strength of this engine, compared to similar products, is that it works with two operating parameters, evaluating both the quantity of ink present and the size of the mark in the box.
  • BCR - Bar Code Recognition
    Recognizes bar codes and decodes their content. It recognizes all the standard bar codes, including the pharmaceutical ones.
  • AMK - Assisted Manual Keying
    AMK lets you perform assisted manual data entry for data that you cannot or do not want to read automatically.
  • IDE - Image Data Export
    Exports page zones or whole pages as images. It is very useful when you need to keep portions of forms containing signatures, sketches, drafts or other information, without altering their characteristics or structure.

All these engines work on all Windows platforms and do not need any additional hardware board.
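
The two-parameter idea behind the OMR engine (quantity of ink and size of the mark) can be illustrated with a small Python sketch. The function name `is_box_checked`, the way mark size is measured (bounding box of the ink relative to the box) and both default thresholds are assumptions; the engine's real parameters are not published here.

```python
def is_box_checked(box, min_ink_ratio=0.05, min_mark_extent=0.3):
    """Decide whether a check box is marked.

    box: list of rows of 0/1 values (1 = ink), with the printed frame removed.
    """
    height, width = len(box), len(box[0])
    ink = [(y, x) for y, row in enumerate(box) for x, value in enumerate(row) if value]
    if not ink:
        return False
    # Parameter 1: quantity of ink inside the box.
    ink_ratio = len(ink) / (height * width)
    # Parameter 2: size of the mark, taken here as the bounding box of the
    # ink relative to the check box (an assumption for this sketch).
    rows = [y for y, _ in ink]
    cols = [x for _, x in ink]
    extent = max((max(rows) - min(rows) + 1) / height,
                 (max(cols) - min(cols) + 1) / width)
    return ink_ratio >= min_ink_ratio and extent >= min_mark_extent
```

Using both parameters lets the check reject a small dense blot (enough ink, too small) as well as a long faint scratch (large enough, too little ink).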

Suitable image pre-processing improves recognition quality and reduces file sizes, so we have introduced many functions covering every specific requirement.

  • Deskew
    Correction of the skew sometimes introduced by the scanner when it is fed through the ADF.
  • Black Border Removal
    Removal of black edges caused by a scanning area larger than the page.
  • Form Alignment and Removal
      - Alignment of the form to compensate for horizontal and vertical shift;
      - Removal of the pre-printed elements when the form is not printed with blind-ink.
  • Lines Removal
    Removal of horizontal and vertical lines rebuilding the characters crossed by them.
  • Despeckle
    Removal of black isolated points and spots.
  • Field Box Removal
    Removal of the box delimiting a single field, if it is not printed with blind-ink.
  • Character Box Removal
    Removal of the box delimiting the single characters of constrained fields, if it is not printed with blind-ink.
  • Color Inversion
    Inversion of colours in case of white characters on dark background.

These cleaning functions can be used on the whole image or locally, defining the specific areas to be processed.
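
As one example of these cleaning functions, despeckling can be sketched in Python as the removal of very small connected groups of black pixels. This is a generic textbook approach, not necessarily the one the product uses; the function name and the `max_speck_size` default are assumptions.

```python
def despeckle(image, max_speck_size=2):
    """Remove connected groups of black pixels no larger than max_speck_size.

    image: list of rows of 0/1 values (1 = black). Returns a new image.
    """
    height, width = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not image[y][x] or seen[y][x]:
                continue
            # Flood-fill one 8-connected component of black pixels.
            component, stack = [], [(y, x)]
            seen[y][x] = True
            while stack:
                cy, cx = stack.pop()
                component.append((cy, cx))
                for ny in range(max(cy - 1, 0), min(cy + 2, height)):
                    for nx in range(max(cx - 1, 0), min(cx + 2, width)):
                        if image[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            if len(component) <= max_speck_size:
                for cy, cx in component:
                    cleaned[cy][cx] = 0  # isolated spot: erase it
    return cleaned
```

Applying the function to a sub-region of the image rather than the whole page corresponds to the local use of the cleaning functions.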

4. Correction Station

Using a new validation and correction system, most alphabetic fields can be verified and the doubts generated by ambiguous recognitions can be resolved automatically.

This system uses statistical information about all the possible combinations of three consecutive characters in the target language, gathered by analyzing several thousand words: these combinations are called trigrams.

Trigrams are included and ready to use, so all trigram combinations, allowed and forbidden, and their frequency of use are fully known. For example, if a recognition engine reads the word "SMITK", the system corrects it to "SMITH" without operator intervention and with no risk of giving a wrong result. Using a new technology called CREP® (Common Recognition Errors Proofing) and with the help of an Artificial Intelligence system, Recogniform Correction Station investigates all proofing combinations and selects the most probable one. The whole process is fully autonomous.
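
The general idea of trigram-based correction can be sketched in a few lines of Python. This is not CREP®, whose internals are not described here: the confusion table, the tiny training list and the scoring formula are all assumptions chosen to reproduce the "SMITK" example.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical table of characters that recognition engines often confuse.
CONFUSIONS = {"K": "KH", "H": "HK", "0": "0O", "O": "O0", "1": "1I", "I": "I1"}


def trigrams(word):
    padded = f"^^{word}$"  # markers so word starts and ends are modelled too
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(words):
    """Count how often each trigram occurs in a list of known words."""
    return Counter(t for word in words for t in trigrams(word))


def score(word, counts):
    # Unseen ("forbidden") trigrams add log(1) = 0; frequent ones add more.
    return sum(math.log(1 + counts[t]) for t in trigrams(word))


def correct(word, counts):
    """Try every combination of confusable characters, keep the most probable."""
    options = [CONFUSIONS.get(ch, ch) for ch in word]
    candidates = ("".join(chars) for chars in product(*options))
    return max(candidates, key=lambda candidate: score(candidate, counts))


if __name__ == "__main__":
    model = train(["SMITH", "SMYTHE", "KEITH", "SMART", "BLYTH", "WHITE"])
    print(correct("SMITK", model))  # prints SMITH
```

A production system would train on a large word list for the target language and would use a far richer confusion model.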

When defining the application, you can have handwritten characters or those coming from codelines (all of them, or only the suspect ones whose confidence level is below a configurable threshold) presented to an operator for visual verification. Thanks to Recogniform Technologies' exclusive system called Eye Blow Verification®, the operator can verify thousands of characters a minute with minimum effort and maximum accuracy.

This system is based on the principle that, within a group of similar elements, the eye is automatically drawn to the ones that differ.

For example, if everybody in a group of people wears a blue t-shirt except one person in a red t-shirt, that person is noticed at once, without a detailed examination of the whole group!

Accordingly, Recogniform Correction Station shows together all the characters recognized and classified in the same way, so the operator can immediately spot any intruders produced by wrong classification.
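
The grouping step behind this kind of verification can be sketched as follows in Python. The data layout (tuples of image id, recognized character and confidence) and the function name are assumptions made for the illustration.

```python
from collections import defaultdict


def group_for_verification(characters, threshold=None):
    """Group recognized characters by the class assigned to them.

    characters: iterable of (image_id, recognized_char, confidence) tuples.
    threshold: if given, only characters whose confidence is below it are
    kept (the suspect ones); if None, all characters are presented.
    """
    groups = defaultdict(list)
    for image_id, recognized, confidence in characters:
        if threshold is None or confidence < threshold:
            groups[recognized].append(image_id)
    return dict(groups)
```

Each resulting group would be displayed as one screen of character images, for instance every image classified as "A", so that a misclassified one stands out.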

Characters verification and correction

 

In this way it is possible to find and correct all the recognition errors introduced by dirt that cannot be removed, scanning defects, ambiguity, etc. In ambiguous cases, Recogniform Correction Station lets you solve the problem instantly by automatically displaying the image of the whole field.

After character verification, the field correction phase starts, displaying both the data and the corresponding image together so they can be compared directly.

Fields correction

 

To save time when comparing, we have added an option that reads the recognized data aloud using text-to-speech technology: instead of reading the data twice, the operator reads only the data on the image while listening to the recognized data from the computer.

You can decide which fields need correction: all of them; only those whose estimated accuracy is below a certain threshold; or none. You can also set a different confidence level for each field.
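
The three selection modes and the per-field confidence levels can be summarized in a short Python sketch. The function name, the mode strings and the 0.9 default are assumptions used only for the illustration.

```python
def fields_to_correct(fields, mode="threshold", default_threshold=0.9, thresholds=None):
    """Select the fields an operator should review.

    fields: dict of field name -> estimated accuracy (0.0 to 1.0).
    mode: "all", "none", or "threshold".
    thresholds: optional per-field overrides of default_threshold.
    """
    if mode == "all":
        return list(fields)
    if mode == "none":
        return []
    if mode != "threshold":
        raise ValueError(f"unknown mode: {mode!r}")
    thresholds = thresholds or {}
    return [
        name for name, accuracy in fields.items()
        if accuracy < thresholds.get(name, default_threshold)
    ]
```

For example, raising the threshold of a critical field such as a postal code sends it to the operator even when the engine is fairly confident.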

Before being stored, the read data can be formatted in several ways, so the output can adapt to any need: conversion to uppercase or lowercase, removal of spaces, character substitution, addition of a prefix or suffix, etc.
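
A formatting step of this kind might look like the Python sketch below. The function name, its parameters and the order in which the rules are applied are assumptions, not the product's actual options.

```python
def format_field(value, case=None, strip_spaces=False, substitutions=None,
                 prefix="", suffix=""):
    """Apply simple output formatting rules to one recognized value."""
    if strip_spaces:
        value = value.replace(" ", "")
    for old, new in (substitutions or {}).items():
        value = value.replace(old, new)
    if case == "upper":
        value = value.upper()
    elif case == "lower":
        value = value.lower()
    return f"{prefix}{value}{suffix}"


if __name__ == "__main__":
    print(format_field("ab 12 o4", case="upper", strip_spaces=True,
                       substitutions={"o": "0"}, prefix="ID-"))  # prints ID-AB1204
```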

In addition to database formats and .txt files, you can also use compressed dictionaries (.dct) for the look-up functions. These files offer fast access and low memory consumption, and do not require alias settings, etc.

A dedicated utility creates a .dct file from an ASCII input file (one word per row). Look-up speed is very high: hundreds of thousands of comparisons per second. Compression is on the order of millions of words per Mb.
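
The .dct format itself is not described here, so the following Python sketch only illustrates the look-up idea using the same kind of input file (ASCII, one word per row) and an in-memory set. The function names and the case-insensitive comparison are assumptions.

```python
def load_word_list(path):
    """Read an ASCII file with one word per row into a look-up set."""
    with open(path, encoding="ascii") as handle:
        return frozenset(line.strip().upper() for line in handle if line.strip())


def is_known(word, dictionary):
    """Case-insensitive membership test against the loaded word list."""
    return word.strip().upper() in dictionary
```

A recognized value that fails the look-up would typically be flagged as suspect and routed to correction.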

The following .dct dictionaries are available:

  • Worldwide Nations;
  • Italian regions;
  • Italian towns;
  • Italian urban centers;
  • Italian Postal codes (CAP);
  • Italian Phone Codes;
  • Italian street addresses;
  • Italian Provinces;
  • Italian Surnames;
  • Italian Female names (common);
  • Italian Male names (common);
  • Italian Female names (uncommon);
  • Italian Male names (uncommon);
  • American Female names;
  • American Male names;
  • English Female names;
  • English Male names;
  • Spanish Female names;
  • Spanish Male names;
  • French Female names;
  • French Male names;
  • German Female names;
  • German Male names;
  • Name dictionaries available on request: African, Arab, Armenian, Australian, Catalan, Brazilian, Breton, Jewish, Finnish, Japanese, Greek, Hawaiian, Indian, Indian-American, Irish, Mexican, Nigerian, Norwegian, Dutch, Persian, Polish, Provençal, Russian, Swedish, Turkish, Vietnamese, Yiddish and many more...

You can also easily create your own dictionary.

5. Output Station

Recogniform Output Station offers flexible use of the recognized data by exporting them to other applications in different ways.

Data can be exported in three different ways:

  • Data output to database
  • Data output to files
  • Data output to applications.

Virtually all of the following databases are easily interfaced:

  • Oracle
  • Interbase
  • Sybase
  • Microsoft SQL Server
  • DB2
  • Paradox
  • dBase
  • FoxPro
  • Access
  • ODBC / ADO / dbExpress

The file formats supported for data output are:

  • ASCII text (.txt)
  • Fixed spaced text (.asc)
  • Comma Separated Values (.csv)
  • Tab Separated Values (.csv)
  • Hypertext Markup Language (.html)
  • Microsoft Excel (.xls)
  • Symbolic Link Interchange Format (.slk/.sylk)
  • SQL Scripts (.sql)
  • Extensible Markup Language (.xml)
  • dBase (.dbf)
  • Paradox (.db)
  • Access (.mdb)
  • Portable Document Format (.pdf)

The file formats supported for image output are:

  • Tiff (.tif)
  • Bitmap (.bmp)
  • Paintbrush (.pcx)

In addition to the listed formats, a mixed "data + images" format is now available that produces a single searchable PDF file as output. In this file the recognized data are placed over the processed images: the data are not visible, but they can be searched, copied and pasted. This feature is useful for creating a self-contained PDF file from uniformly filled documents, keeping the original scanned images together with the recognized or manually keyed data used to index the content.

The latest version can also overlay a pre-printed form on the PDF output, which is useful for rebuilding the original form if the scan was made with the blind-ink removed.

Recogniform Output Station can also directly dispatch data to other applications, using some script functions that emulate the operator's keyboard input.

6. Control Server

Recogniform Workgroup Reader lets you work in a client/server configuration: like Recogniform Desktop Reader, it uses a job-oriented approach, grouping documents in batches.

A batch can contain any number of documents. Each batch is processed as follows:

  • Images input (scanning or importing files)
  • Pre-processing (deskew, despeckle, form-removal, etc.)
  • Automatic reading (CHR, OCR, ICR, BCR, OMR, etc.)
  • Suspicious characters verification (Eye Blow Verification®)
  • Fields correction (comparison between data and image)
  • Post-processing (data transformation and normalization)
  • Data output (storing in files or in databases)

The Control Server tracks the progress of every batch, centralizing storage and managing the workflow.
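
The batch workflow can be modelled as a simple ordered sequence of steps. The Python sketch below is a hypothetical in-memory tracker, not the Control Server's actual design: the class names, the method names and the rule that steps must be completed strictly in order are assumptions.

```python
from enum import IntEnum


class Step(IntEnum):
    INPUT = 1
    PRE_PROCESSING = 2
    AUTOMATIC_READING = 3
    CHARACTER_VERIFICATION = 4
    FIELD_CORRECTION = 5
    POST_PROCESSING = 6
    OUTPUT = 7


class BatchTracker:
    """Keeps the next pending step of every batch, in memory."""

    def __init__(self):
        self._next_step = {}

    def register(self, batch_id):
        self._next_step[batch_id] = Step.INPUT

    def complete(self, batch_id, step):
        """Record that a station finished `step`; steps must run in order."""
        expected = self._next_step[batch_id]
        if expected is None:
            raise ValueError(f"batch {batch_id} is already finished")
        if step != expected:
            raise ValueError(f"batch {batch_id} expects {expected.name}")
        self._next_step[batch_id] = Step(step + 1) if step < Step.OUTPUT else None

    def pending(self, step):
        """Batches that a station handling `step` can pick up now."""
        return [b for b, s in self._next_step.items() if s == step]
```

With such a tracker, each station only needs to ask which batches are pending for its own step, which is what allows several stations of the same kind to work in parallel.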

7. Customization by scripts

A dedicated tool, the Recogniform Application Designer, lets you set all the parameters needed to extract data from the forms through a visual interface; you can also use a flexible scripting language to add new customized functions.

With this language you can attach a user-written procedure to each scheduled event, modifying the standard behaviour of the program or adding new features.

It is very similar to Basic and Pascal, so it is easy to learn and use.

The language syntax includes conditional expressions (IF), loops (FOR, REPEAT and WHILE) and offers thousands of built-in functions.

The system can automatically set specific variables, allowing effective interaction between the application and the optical recognition process. This lets you implement your own features, such as checks that related fields reconcile, multiple readings of the same field with different pre-processing options, or customized output layouts.
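
The product's own scripting syntax is not shown on this page, so the following example is written in Python purely to illustrate what a check between fields could do. It assumes the check means verifying that item amounts add up to a total; the function and field names are hypothetical.

```python
from decimal import Decimal, InvalidOperation


def check_total(fields, item_names, total_name):
    """Return True when the item fields add up to the total field.

    fields: dict of field name -> recognized text.
    Unreadable amounts make the check fail, so the form can be flagged.
    """
    try:
        items = [Decimal(fields[name]) for name in item_names]
        total = Decimal(fields[total_name])
    except (KeyError, InvalidOperation):
        return False
    return sum(items) == total
```

A script attached to a field-validation event could run such a check and mark the fields as suspect when it fails.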

With this system you can get a highly customized application starting from a standard product. The advantages are simple but significant: lower costs and better reliability and flexibility.

8. Statistics

Recogniform Workgroup Reader can automatically record all the statistical data for each job in a log file, which can be centralized when there are multiple installations.

The information stored is:

  • Starting date and time
  • Workstation name
  • User name
  • Form type
  • Automatic reading time
  • Characters verification time
  • Fields correction time
  • Number of processed forms
  • Number of correctly read fields
  • Number of suspect fields
  • Number of corrected fields
  • Number of correctly read characters
  • Number of suspect characters
  • Number of corrected characters

 

All data are saved in CSV format, so they can be imported straight into a spreadsheet to produce statistics and charts. These show:

  • Number of processed forms per workstation
  • Number of processed forms per user
  • Number of processed forms per day
  • Number of processed forms per type
  • Working time per workstation
  • Working time per user
  • Working time per day
  • Working time per type
  • Distribution of reading, verification and correction times
  • Total characters read, suspect and corrected
  • Total fields read, suspect and corrected

In short, this system lets you monitor users' productivity through a complete set of statistics that is both detailed and global.
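
Because the log is plain CSV, the same figures can also be computed with a few lines of code instead of a spreadsheet. The Python sketch below is only an illustration: the column names (`start`, `workstation`, `user`, `form_type`, `forms`) and the sample rows are invented for the example, since the actual log layout is not shown here.

```python
import csv
import io
from collections import defaultdict


def forms_per_key(csv_text, key_column, forms_column="forms"):
    """Sum the processed-forms column of a job log, grouped by key_column."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[key_column]] += int(row[forms_column])
    return dict(totals)


if __name__ == "__main__":
    # Made-up sample log with hypothetical column names.
    log = (
        "start,workstation,user,form_type,forms\n"
        "2006-03-01 09:00,WS1,anna,invoice,120\n"
        "2006-03-01 10:30,WS2,marco,invoice,80\n"
        "2006-03-02 09:15,WS1,marco,survey,200\n"
    )
    print(forms_per_key(log, "workstation"))  # {'WS1': 320, 'WS2': 80}
    print(forms_per_key(log, "user"))         # {'anna': 120, 'marco': 280}
```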


© Recogniform Technologies SpA - All rights reserved