Third-party software integration: OCR Cuneiform

From OpenKM Documentation

CuneiForm is an OCR tool. It was originally developed at Cognitive Technologies and, after a few years with no development, released as freeware on December 12, 2007. The kernel of the OCR engine was released under the open source BSD license at the beginning of April 2008.

You can grab binaries from these sites:

On Debian or Ubuntu, installation is much simpler:

 $ aptitude install cuneiform

Compile from source code

You can download the source code from https://launchpad.net/cuneiform-linux and compile it yourself. Also download the language files you need and uncompress them in the same folder as the application.

$ aptitude install cmake g++ imagemagick libmagick++-dev
$ tar xjvf cuneiform-linux-1.0.0.tar.bz2
$ cd cuneiform-linux-1.0.0
$ mkdir builddir
$ cd builddir
$ cmake -DCMAKE_BUILD_TYPE=release ..
$ make install
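Before starting the build above, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch, not part of the original instructions; the function name missing_tools and the list of tools checked are illustrative assumptions.

```shell
# Hypothetical helper: print the name of every tool that is NOT on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools assumed to be needed by the build steps above.
missing=$(missing_tools cmake g++ make tar bzip2)
if [ -n "$missing" ]; then
    echo "Install these before building:" $missing
else
    echo "All build tools found."
fi
```

If anything is reported as missing, install it with your package manager and run the check again before calling cmake.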

Once installed, edit the file /etc/bash.bashrc and add this line at the end so that the dynamic linker can find the Cuneiform shared libraries:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib64
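If the file is sourced more than once, the plain export above appends the same directory repeatedly. A possible alternative, sketched here with a hypothetical helper named add_lib_path, adds the directory only when it is not already listed:

```shell
# Hypothetical helper: append a directory to LD_LIBRARY_PATH only once.
add_lib_path() {
    dir="$1"
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$dir" ;;
    esac
    export LD_LIBRARY_PATH
}

# Same directory as in the export line above.
add_lib_path /usr/local/lib64
```

This also avoids a leading colon when LD_LIBRARY_PATH was empty, which the dynamic linker would otherwise interpret as the current directory.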

The Cuneiform executable will be located at:

/usr/local/bin/cuneiform
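
To check that the installation works, you can run the executable on a scanned image. The wrapper below is only a sketch: the function name ocr_to_text, the variable CUNEIFORM_BIN and the file names are assumptions for illustration. It relies on the -l (language), -f (output format) and -o (output file) options of the cuneiform command line.

```shell
# Path from the section above; adjust it if you installed from a package.
CUNEIFORM_BIN=/usr/local/bin/cuneiform

# Hypothetical wrapper: recognise one image and write plain text.
# Usage: ocr_to_text IMAGE OUTPUT [LANGUAGE]
ocr_to_text() {
    image="$1"
    output="$2"
    lang="${3:-eng}"   # default language is an assumption
    if [ ! -x "$CUNEIFORM_BIN" ]; then
        echo "cuneiform not found at $CUNEIFORM_BIN" >&2
        return 127
    fi
    "$CUNEIFORM_BIN" -l "$lang" -f text -o "$output" "$image"
}

# Example call (scan.bmp is a placeholder file name):
# ocr_to_text scan.bmp scan.txt eng
```

If the command fails with a message about missing shared libraries, check the LD_LIBRARY_PATH setting described above.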