Use an OCR Value Source

For files imported from an external source, you can define an automatically added property that uses the OCR value source. The OCR value source is a zone defined on the page: M-Files runs OCR on that zone and uses the recognized characters as the value of the selected property. To use it, select Use an OCR value source in the Define Property dialog. For more information about defining different properties, refer to Metadata.

The use of an OCR value source is only possible when using an external source. The OCR value source cannot be defined in M-Files Desktop.

Note: The M-Files OCR module is an M-Files add-on product available for an extra fee. It is activated with a license code: the old license code must be replaced by a license code that enables the use of OCR. For more information, refer to License Management. To enable OCR, you also need to download and install some additional files on your M-Files Server (for further information, contact our customer support). The OCR-related functions are then available in M-Files Admin and M-Files Desktop.

M-Files uses an OCR engine offered by I.R.I.S. M-Files OCR also offers barcode recognition. For purchase inquiries about the M-Files OCR module, please contact our sales team.

Defining an OCR value source

Start defining an OCR value source by adding a new property on the Metadata tab of the New Connection to External Source dialog. Then select Use an OCR value source and click Define...



The "OCR Value Source Definition" dialog.

Zone type

Specify whether the recognition is to be done via barcode or text.

Zone position

Define a zone in which the recognized characters are used as the value of a defined property. The characters may include any letters, numbers or punctuation marks. For example, an invoice number shown on a page can be added as the Invoice number property value for the scanned document. This enables you to automate scanning and storing specified documents in M-Files with metadata read directly from the page.

In most cases, to ensure that the defined zone is correctly positioned, the document to be scanned should be placed onto the scanner's glass plate by hand rather than fed via an automatic sheet feeder.

In some cases, OCR may recognize text incorrectly: for example, depending on the font type or size, the number 1 may be interpreted as the letter I. To ensure that the characters are added correctly to the document metadata, you can check the property values with event handlers and VBScript. For example, a script can check that all added characters are numbers. For more information, see Event Handlers.
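The kind of check described above can be sketched as follows. This is a minimal illustration of the validation logic only, written in Python; it is not an M-Files event handler, and it does not use the M-Files API. The helper names and the lookalike mapping (I, l and O replaced with 1, 1 and 0) are assumptions made for this example. In M-Files, the equivalent logic would be written in VBScript inside an event handler.

```python
# Hypothetical helpers for illustration; not part of any M-Files API.

# Characters that OCR may confuse with digits (an assumed, incomplete mapping).
LOOKALIKES = str.maketrans({"I": "1", "l": "1", "O": "0"})


def is_all_digits(value: str) -> bool:
    """Return True only if value is non-empty and contains the digits 0-9 only."""
    return value != "" and all(ch in "0123456789" for ch in value)


def check_invoice_number(recognized: str) -> str:
    """Validate an OCR-recognized invoice number.

    Trims whitespace, replaces the assumed lookalike characters, and returns
    the cleaned value. Raises ValueError if non-digit characters remain, so
    that the caller can reject the value instead of storing it.
    """
    candidate = recognized.strip().translate(LOOKALIKES)
    if not is_all_digits(candidate):
        raise ValueError(f"Invoice number is not numeric: {recognized!r}")
    return candidate
```

Whether to correct lookalike characters automatically or simply reject the value is a design choice. Rejecting is the safer option when a wrong invoice number would be costly.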



An example of a zone definition.

Barcode recognition

M-Files recognizes most 1D barcodes in use and two types of 2D barcode: PDF417 and QR Code.

If there is only one barcode to recognize on the page, you can specify the whole page as a zone. If there are several barcodes, restrict the zone in such a way that it contains only the desired barcode. With QR codes, you should specify a zone larger than the actual barcode.

If the specified zone has several barcodes, all of them are considered to be a property value.

If you are using an OCR-enabled license code that was delivered before version 9.0 and you want to use barcode recognition, please ask our customer service to provide you with a new license code.

Text recognition (OCR) guidance

Although the OCR automatically recognizes all Western languages and Cyrillic character sets, selecting a language often improves the quality of the text recognition results.

In ambiguous cases, a problematic recognition result may be resolved by a language-specific factor, such as recognition of the letter 'Ä' in Finnish. The list of secondary languages only includes languages that are allowed to be used together with the selected primary language.

The zone position determines the two corners (top left and bottom right) of the zone in relation to the origin of the coordinate system (the top left corner of the page). In the above example, the following values are used: left 144 mm, top 59 mm, right 170 mm, and bottom 68 mm.
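As a worked example of the coordinate system, the sketch below takes the zone values quoted above and computes the zone size in millimetres and the matching pixel rectangle on a scanned image. The Zone class and the 300 dpi scan resolution are assumptions made for illustration. M-Files itself takes the zone in millimetres, so this conversion is only useful for checking a zone against a scanned image in an image editor.

```python
from dataclasses import dataclass
from typing import Tuple

MM_PER_INCH = 25.4


@dataclass(frozen=True)
class Zone:
    """Zone corners in millimetres, measured from the top left corner of the page.

    Hypothetical helper class for illustration; not part of any M-Files API.
    """
    left: float
    top: float
    right: float
    bottom: float

    def size_mm(self) -> Tuple[float, float]:
        """Return (width, height) of the zone in millimetres."""
        return (self.right - self.left, self.bottom - self.top)

    def to_pixels(self, dpi: int) -> Tuple[int, int, int, int]:
        """Return (left, top, right, bottom) in pixels at the given resolution."""
        scale = dpi / MM_PER_INCH  # pixels per millimetre
        return tuple(
            round(v * scale) for v in (self.left, self.top, self.right, self.bottom)
        )


# Values from the example zone definition.
zone = Zone(left=144, top=59, right=170, bottom=68)
print(zone.size_mm())       # (26, 9): the zone is 26 mm wide and 9 mm high
print(zone.to_pixels(300))  # (1701, 697, 2008, 803) at an assumed 300 dpi
```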

Note: You can use the OCR value source without selecting the Use OCR to enable full-text search of scanned documents option on the Searchable PDF tab.