Defining an OCR Value Source
You can use optical character recognition (OCR) to extract text or barcodes from a scanned document and use them as automatic property values for files imported from an external file source, in this case a scanner. An OCR value source is a zone that you define on a scanned page. For more information on defining properties for objects imported from external file sources, see Defining Metadata for an External File Source.
OCR value sources can be used with the following file formats:
- TIF
- TIFF
- JPG
- JPEG
- BMP
- PNG
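If you stage files with a script before they are imported, a quick pre-check against the image formats listed above can flag files that the OCR value source would not process. The following is a minimal Python sketch under that assumption; the function name `is_ocr_supported` is hypothetical and is not part of M-Files.

```python
from pathlib import Path

# Extensions from the supported-format list, compared case-insensitively.
# Hypothetical helper: M-Files does not provide this function.
SUPPORTED_OCR_EXTENSIONS = {".tif", ".tiff", ".jpg", ".jpeg", ".bmp", ".png"}


def is_ocr_supported(filename: str) -> bool:
    """Return True if the file name ends in one of the listed image extensions."""
    return Path(filename).suffix.lower() in SUPPORTED_OCR_EXTENSIONS


if __name__ == "__main__":
    for name in ["invoice_001.TIF", "scan.png", "contract.pdf", "noextension"]:
        print(name, is_ocr_supported(name))
```

Checking only the extension is a simplification: a file named `.png` that is not a valid image would still pass.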
OCR value sources can be used only with external file sources. They cannot be defined in M-Files Desktop.
To define an OCR value source, complete the following steps:
To ensure that the defined zone is correctly positioned, in most cases the document to be scanned should be placed onto the scanner glass by hand rather than fed via an automatic sheet feeder.
In some cases, OCR may recognize text incorrectly: for example, depending on the font type or size, the digit 1 may be read as the letter I. To ensure that the correct characters are added to the document metadata, you can check the property values with event handlers written in VBScript, for example to verify that a value contains only digits. For more information, see Event Handlers.
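The validation logic itself is simple. The following Python sketch shows a digits-only check of the kind described above; an actual event handler would be written in VBScript against the M-Files API, which is not shown here, and the function name `validate_numeric_value` is hypothetical.

```python
def validate_numeric_value(value: str) -> bool:
    """Return True if the OCR result is non-empty and contains only ASCII digits 0-9.

    Hypothetical helper illustrating the check; it is not an M-Files API call.
    """
    # str.isdigit() also accepts characters such as superscript digits,
    # so compare against an explicit set of ASCII digits instead.
    return len(value) > 0 and all(ch in "0123456789" for ch in value)


if __name__ == "__main__":
    # "I015" and "10I5" are the kind of results OCR can produce
    # when the digit 1 is read as the letter I.
    for ocr_result in ["1015", "I015", "10I5", ""]:
        status = "accepted" if validate_numeric_value(ocr_result) else "rejected"
        print(f"{ocr_result!r}: {status}")
```

Running the example prints `'1015': accepted` and marks the other three values as rejected. Whether a failed check should block the operation or only flag the value for review is a design choice for your event handler.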
Supported Barcode Types
The M-Files OCR module supports the following barcode types:
- QR Code
- EAN-13
- EAN-8
- EAN-5
- EAN-2
- MSI Plessey
- MSI Pharma
- UPC-A
- UPC-E
- Codabar
- Interleaved 2 of 5
- Discrete 2 of 5
- Code 39
- Code 39 Extended
- Code 39 HIBC
- Code 93
- Code 128
- PDF 417
- Postnet
- Postnet 32
- Postnet 52
- Postnet 62
- Patchcode
- UCC-128
- UPCE Extended
- IATA 2 of 5
- Datalogic 2 of 5
- Reverse 2 of 5
- Code 39 (out-of-spec)
- Code 128 (out-of-spec)
- Codabar (out-of-spec)
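Several of the listed barcode types, such as EAN-13, EAN-8, and UPC-A, end in a check digit computed with the GS1 modulo-10 algorithm. If you validate barcode values in the same way as the text values described above, verifying the check digit detects any single misread digit. The following Python sketch assumes the value contains only the barcode digits; the function name `has_valid_gs1_check_digit` is hypothetical and not part of M-Files.

```python
def has_valid_gs1_check_digit(code: str) -> bool:
    """Check the trailing GS1 modulo-10 check digit (EAN-8, EAN-13, UPC-A).

    Hypothetical helper; assumes `code` holds only the digits of the barcode.
    """
    if len(code) < 2 or not all(ch in "0123456789" for ch in code):
        return False
    *data, check = (int(ch) for ch in code)
    # Weight the data digits 3, 1, 3, 1, ... starting from the rightmost data digit.
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(data)))
    return (10 - total % 10) % 10 == check


if __name__ == "__main__":
    # An EAN-13, the same EAN-13 with a wrong last digit, an EAN-8, and a UPC-A.
    for value in ["4006381333931", "4006381333937", "96385074", "036000291452"]:
        print(value, has_valid_gs1_check_digit(value))
```

Weighting from the rightmost data digit lets one function handle codes of different lengths. Other listed types, such as Code 39 or Codabar, use different or optional check characters and would need their own checks.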