[!IMPORTANT] As of 2024-03-15, this repo is now archived, for reference only. For Clearly Local users, please see:
- New demo site at https://cl-tools.deno.dev/pages/easylt-ocr/samples/ocr-sample-editor-poc.html (requires Clearly Local Toolkit account)
- New repo at https://github.com/clearlylocal/easylt-ocr (requires Clearly Local GitHub access)
OcrSdk class

import { OcrSdk } from 'path/to/OcrSdk.ts'
// credentials obtained from https://cloud.ocrsdk.com/Account/Register
const ocrSdk = new OcrSdk({
  applicationId: Deno.env.get('ABBYY_APPLICATION_ID')!, // e.g. 7ea53f47-8bbc-477b-b17c-989a3184c363
  password: Deno.env.get('ABBYY_PASSWORD')!, // e.g. n6WL0rCFlhU9bDXDri6AQEZV
  serviceUrl: Deno.env.get('ABBYY_SERVICE_URL')!, // e.g. https://cloud-eu.ocrsdk.com/
})
const { txt } = await ocrSdk.ocr(await Deno.readFile('input.jpg'), {
languages: ['English'],
exportFormats: ['txt'],
})
await Deno.writeFile('output.txt', new Uint8Array(await txt.arrayBuffer()))
To run the CLI, the ABBYY_APPLICATION_ID, ABBYY_PASSWORD, and ABBYY_SERVICE_URL environment variables must be set.
# view help
deno task cli --help
# `convert` command, specifying output formats (default "txt")
deno task cli convert path/to/image.jpg -o txt -o xml
# `html`/`json` commands
deno task cli html path/to/image.jpg
deno task cli json path/to/image.jpg
# specify languages (default "English")
deno task cli json path/to/image.jpg -l ChinesePRC -l English
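
Because the CLI depends on three environment variables, it can help to check them all up front and report every missing name at once. The following is a minimal sketch, not code from this repo: `readAbbyyEnv`, `EnvGetter`, and `REQUIRED_VARS` are hypothetical names, and the lookup function is passed in so the helper does not depend on any particular runtime.

```typescript
// Hypothetical helper (not part of this repo): reads the three ABBYY settings
// through a caller-supplied lookup and reports all missing names in one error.
type EnvGetter = (name: string) => string | undefined

const REQUIRED_VARS = ['ABBYY_APPLICATION_ID', 'ABBYY_PASSWORD', 'ABBYY_SERVICE_URL'] as const
type RequiredVar = typeof REQUIRED_VARS[number]

function readAbbyyEnv(getEnv: EnvGetter): Record<RequiredVar, string> {
  const missing: string[] = []
  const values = {} as Record<RequiredVar, string>
  for (const name of REQUIRED_VARS) {
    const value = getEnv(name)
    // Treat unset and empty values the same way
    if (value === undefined || value === '') missing.push(name)
    else values[name] = value
  }
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`)
  }
  return values
}
```

Under Deno, this could be called as `readAbbyyEnv((name) => Deno.env.get(name))`, which replaces the three non-null assertions (`!`) with a single descriptive error.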
- src/
  - core/
    - OcrSdk class, with various methods for interacting with the ABBYY Cloud OCR API. Loosely based on ABBYY’s sample JS code, but with the following changes:
      - ocr method to OCR an image and return the output file binary in the requested format
      - imageMap function, for converting XML output to an image map that can be rendered in HTML etc.
      - prettifyXml function, for pretty-printing XML output while preserving significant whitespace
    - OcrSdk
  - cli/
    - functions/
      - OcrSdk’s ocr method to get text and XML files of the OCRed content
      - imageMap.ts
- samples/
  - convertImage.ts on ocr-sample.jpg
  - convertImage.ts on ocr-sample.jpg
  - htmlImageMap.ts on ocr-sample.jpg
  - jsonImageMap.ts on ocr-sample.jpg
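
To illustrate what rendering an image map in HTML can look like, here is a rough sketch that turns a list of recognized regions into a `<map>` element with one rectangular `<area>` per region. It is not the repo's `imageMap` implementation: the `Box` shape, `toImageMap`, and `escapeHtml` are hypothetical names, and the sketch assumes the bounding boxes have already been extracted from the XML output as pixel coordinates.

```typescript
// Hypothetical shape for one recognized region; the actual XML schema and the
// repo's imageMap output may differ.
interface Box {
  text: string
  left: number
  top: number
  right: number
  bottom: number
}

// Escape characters that are unsafe inside HTML attribute values
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
}

// Build a <map> with one rectangular <area> per box (coords are x1,y1,x2,y2)
function toImageMap(name: string, boxes: Box[]): string {
  const areas = boxes.map((b) => {
    const label = escapeHtml(b.text)
    return `  <area shape="rect" coords="${b.left},${b.top},${b.right},${b.bottom}" alt="${label}" title="${label}">`
  })
  return [`<map name="${escapeHtml(name)}">`, ...areas, '</map>'].join('\n')
}
```

The resulting markup can be attached to the source image with `<img src="..." usemap="#name">`, so that hovering over a region shows its recognized text.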