For files imported from an external source, you can define an automatically added property that uses an OCR value source. The OCR value source is a zone defined on the page: M-Files reads the contents of the zone with optical character recognition (OCR) and uses the result as the value of the selected property. To enable this, select Use an OCR value source in the Define Property dialog. For more information about defining different properties, refer to Metadata.
The use of an OCR value source is only possible when using an external source. The OCR value source cannot be defined in M-Files Desktop.
M-Files uses an OCR engine offered by I.R.I.S. M-Files OCR also offers barcode recognition. For purchase inquiries about the M-Files OCR module, please contact our sales team at [email protected].
Start defining an OCR value source by adding a new property via the Metadata tab of the New Connection to External Source dialog, and by then selecting Use an OCR value source and Define...
The "OCR Value Source Definition" dialog.
Specify whether the recognition is to be done via barcode or text.
Define a zone in which certain characters are recognized as values of a defined property. The characters may include any letters, numbers or punctuation marks. For example, an invoice number shown on a page can be added as the Invoice number property value for the scanned document. This enables you to automate scanning and storing specified documents in M-Files with metadata that is always correct.
In most cases, to ensure that the defined zone is correctly positioned, the document to be scanned should be placed onto the scanner's glass plate by hand rather than fed via an automatic sheet feeder.
In some cases, OCR may recognize text incorrectly: for example, depending on the font type or size, the number 1 may be interpreted as the letter I. To ensure that the characters are added correctly to the document metadata, you can check the property values with event handlers written in VBScript. For example, a script can check that all added characters are numbers. For more information, see Event Handlers.
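The "all characters are numbers" check described above can be sketched as follows. This is only an illustration of the validation logic, written in Python for readability; in M-Files the equivalent check would be written in VBScript inside an event handler. The function names are hypothetical and are not part of any M-Files API.

```python
import re


def is_all_digits(value: str) -> bool:
    """Return True only if value is non-empty and contains nothing but 0-9."""
    return re.fullmatch(r"[0-9]+", value) is not None


def offending_characters(value: str) -> list[str]:
    """List the characters that are not digits, in order of appearance."""
    return [ch for ch in value if ch not in "0123456789"]


# A misread such as the letter I in place of the number 1 is caught here.
recognized = "2048I"
if not is_all_digits(recognized):
    print("Rejected:", recognized, "unexpected:", offending_characters(recognized))
```

A check of this kind only detects a bad value. Whether to reject the document, flag it for review, or attempt a correction is a separate decision.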
An example of a zone definition.
M-Files recognizes most of the 1D barcodes in use and two types of 2D barcode: PDF417 and QR Code.
If there is only one barcode to recognize on the page, you can specify the whole page as a zone. If there are several barcodes, restrict the zone in such a way that it contains only the desired barcode. With QR codes, you should specify a zone larger than the actual barcode.
If the specified zone has several barcodes, all of them are considered to be a property value.
If your license code with OCR support was delivered before version 9.0 and you want to use barcode recognition, ask our customer service to provide you with a new license code.
Although the OCR automatically recognizes all Western languages and Cyrillic character sets, specifying a language selection often improves the quality of the text recognition results.
In ambiguous cases, a problematic recognition result may be resolved by a language-specific factor, such as recognition of the letter 'Ä' in Finnish. The list of secondary languages only includes languages that are allowed to be used together with the selected primary language.
The zone position determines the two corners (top left and bottom right) of the zone in relation to the origin of the coordinate system (the top left corner of the page). In the above example, the following values are used: left 144 mm, top 59 mm, right 170 mm, and bottom 68 mm.
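To get a feel for the size of such a zone, the millimeter values can be converted to pixel coordinates for a given scan resolution. The sketch below is a worked example only. The 300 dpi resolution and all names are assumptions made for illustration, and nothing here reflects how M-Files handles zones internally.

```python
from dataclasses import dataclass

MM_PER_INCH = 25.4


@dataclass(frozen=True)
class Zone:
    """Zone corners in millimeters, measured from the top left corner of the page."""
    left: float
    top: float
    right: float
    bottom: float

    @property
    def width(self) -> float:
        return self.right - self.left

    @property
    def height(self) -> float:
        return self.bottom - self.top


def to_pixels(zone: Zone, dpi: int) -> tuple[int, int, int, int]:
    """Convert the corners to whole pixels as (left, top, right, bottom)."""
    def scale(mm: float) -> int:
        return round(mm / MM_PER_INCH * dpi)

    return (scale(zone.left), scale(zone.top), scale(zone.right), scale(zone.bottom))


# Values from the example: left 144 mm, top 59 mm, right 170 mm, bottom 68 mm.
zone = Zone(left=144, top=59, right=170, bottom=68)
print(zone.width, zone.height)   # zone size in mm: 26 9
print(to_pixels(zone, dpi=300))  # assumed 300 dpi scan: (1701, 697, 2008, 803)
```

The example zone is 26 mm wide and 9 mm high, so a shift of a few millimeters in page placement can move the target text out of the zone.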