Difference between revisions of "Third-party software integration"

From OpenKM Documentation
(Antivirus)
(Replaced content with 'In order o extend OpenKM functionalities, it can be integrated with some external software which improves the OpenKM user experience adding new features to the application. W…')
{{TOCright}} __TOC__
To extend its functionality, OpenKM can be integrated with external software that adds new features to the application and improves the user experience. We are working to expand this list of applications, so stay tuned!
  
== Apache ==
* [[Apache]]
Exposing OpenKM directly from JBoss can be dangerous if the application needs to be accessed from the Internet. Also, port 8080 may be blocked by a firewall. For these reasons, it is a good idea to expose your OpenKM installation through the standard web port 80. The following steps explain how to configure Apache to handle these requests and forward them to the JBoss application server using the AJP13 protocol.
* [[OCR]]
* [[OpenOffice.org]]
* [[Antivirus]]
  
From the Apache documentation: The AJP13 protocol is packet-oriented. A binary format was presumably chosen over the more readable plain text for reasons of performance. The web server communicates with the servlet container over TCP connections. To cut down on the expensive process of socket creation, the web server will attempt to maintain persistent TCP connections to the servlet container, and to reuse a connection for multiple request/response cycles.
[[Category:Installation Guide]]
 
 
The first step is to install the required Apache software. On Debian / Ubuntu you can install Apache with a single command:
 
 
 
$ sudo aptitude install apache2
 
 
 
Edit the file called /etc/apache2/apache2.conf and configure a ServerName to prevent warnings in the Apache startup process:
 
 
 
<source lang="apache">
 
ServerRoot "/etc/apache2"
 
ServerName "your-domain.com"
 
</source>
 
 
 
Enable the proxy module, which is needed to forward requests to JBoss:
 
 
 
$ sudo a2enmod proxy_ajp
 
 
 
Now create the configuration file /etc/apache2/sites-available/openkm.cfg with this content:
 
 
 
<source lang="apache">
 
<VirtualHost *>
 
    ServerName openkm.your-domain.com
 
    RedirectMatch ^/$ /OpenKM
 
    <Location /OpenKM>
 
        ProxyPass ajp://127.0.0.1:8009/OpenKM
 
        ProxyPassReverse http://openkm.your-domain.com/OpenKM
 
    </Location>
 
    CustomLog /var/log/apache2/openkm-access.log combined
 
</VirtualHost>
 
</source>
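
If you maintain several installations, the virtual host above can be generated from a template. The following is only a sketch: <code>render_openkm_vhost</code> is a hypothetical helper, and the default AJP port 8009 is taken from the example above.

```shell
#!/bin/sh
# render_openkm_vhost is a hypothetical helper, not part of OpenKM or Apache.
# $1: public host name; $2: AJP port of the JBoss connector (default 8009).
render_openkm_vhost() {
    host=$1
    ajp_port=${2:-8009}
    # Unquoted here-document: $host and $ajp_port are expanded, \$ stays literal.
    cat <<EOF
<VirtualHost *>
    ServerName $host
    RedirectMatch ^/\$ /OpenKM
    <Location /OpenKM>
        ProxyPass ajp://127.0.0.1:$ajp_port/OpenKM
        ProxyPassReverse http://$host/OpenKM
    </Location>
    CustomLog /var/log/apache2/openkm-access.log combined
</VirtualHost>
EOF
}
```

For example, <code>render_openkm_vhost openkm.your-domain.com</code> prints the same configuration shown above, which can then be redirected into the site file.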
 
 
 
The VirtualHost ServerName must differ from the ServerName in the main Apache configuration. Enable this site configuration:
 
 
 
$ sudo a2ensite openkm.cfg
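
On Debian / Ubuntu, enabling a site amounts to creating a symbolic link in ''sites-enabled'' that points to the file in ''sites-available''. The sketch below shows the equivalent manual step; <code>enable_site</code> is a hypothetical function, and the directory layout is assumed to be the Debian one.

```shell
#!/bin/sh
# enable_site is a hypothetical stand-in for a2ensite.
# $1: Apache configuration root (for example /etc/apache2); $2: site file name.
enable_site() {
    root=$1
    site=$2
    if [ ! -f "$root/sites-available/$site" ]; then
        echo "no such site: $site" >&2
        return 1
    fi
    mkdir -p "$root/sites-enabled"
    # Relative link from sites-enabled to sites-available.
    ln -sf "../sites-available/$site" "$root/sites-enabled/$site"
}
```

Called as <code>enable_site /etc/apache2 openkm.cfg</code>, it would leave the same link in place that the command above creates.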
 
 
 
You have to explicitly enable proxy access by editing the Apache configuration file ''/etc/apache2/mods-available/proxy.conf'':
 
 
 
<source lang="apache">
 
<Proxy *>
 
    AddDefaultCharset off
 
    Order deny,allow
 
    Allow from all
 
    Deny from all
 
    #Allow from .example.com
 
</Proxy>
 
</source>
 
 
 
Finally restart Apache:
 
 
 
$ sudo /etc/init.d/apache2 restart
 
 
 
Now you can access your OpenKM installation from http://openkm.your-domain.com/. Another advantage of using Apache is that you can log OpenKM access and generate web statistics.
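
As a small illustration of the statistics mentioned above, the access log written by the <code>CustomLog</code> directive can be summarised with standard tools. This is a minimal sketch; <code>count_by_status</code> is a hypothetical helper and it assumes the unmodified "combined" log format.

```shell
#!/bin/sh
# count_by_status is a hypothetical helper: it reads Apache "combined" log
# lines on stdin and prints "<status> <hits>" sorted by status code.
count_by_status() {
    # In the combined format the status code is the 9th whitespace-separated
    # field, right after the quoted request line.
    awk '{ hits[$9]++ } END { for (s in hits) print s, hits[s] }' | sort -n
}
```

A typical call would be <code>count_by_status < /var/log/apache2/openkm-access.log</code>.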
 
 
 
For more info, visit:
 
* http://httpd.apache.org/docs/2.2/mod/mod_proxy.html
 
* http://httpd.apache.org/docs/2.2/mod/mod_proxy_ajp.html
 
 
 
== OCR ==
 
Tesseract is an open source OCR engine adopted by Google. It works really well. It reads TIFF documents natively and achieves a high recognition rate with images scanned at 300 dpi and converted to line art (1-bit color).
 
 
 
You can download the source code from http://code.google.com/p/tesseract-ocr/ and compile it yourself. Also download the language files you need and uncompress them in the same folder as the application.
 
 
 
If you are using a computer with Debian / Ubuntu, the installation is much simpler:
 
 
 
$ aptitude install tesseract-ocr
 
 
 
To add support for the English language, also install its language pack:



$ aptitude install tesseract-ocr-eng



Now you have to tell OpenKM to use this OCR application. Edit the file OpenKM.cfg:
 
 
 
$ vim OpenKM.cfg
 
 
 
And set the system.ocr property to the path of the tesseract executable:
 
 
 
<source lang="java">
 
system.ocr=/usr/local/bin/tesseract
 
</source>
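
If you prefer to script this change instead of editing the file by hand, the property can be set idempotently. The following is only a sketch: <code>set_property</code> is a hypothetical helper, and it assumes OpenKM.cfg uses plain <code>key=value</code> lines as shown above.

```shell
#!/bin/sh
# set_property is a hypothetical helper, not an OpenKM tool.
# $1: properties file; $2: key; $3: value.
set_property() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    # Copy every line except an existing definition of the key...
    if [ -f "$file" ]; then
        awk -v k="$key" 'index($0, k "=") != 1' "$file" > "$tmp"
    fi
    # ...then append the new definition.
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For example, <code>set_property OpenKM.cfg system.ocr /usr/local/bin/tesseract</code> leaves exactly one <code>system.ocr</code> line in the file, however often it is run.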
 
 
 
For more info, go to http://code.google.com/p/tesseract-ocr/.
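
Before relying on OpenKM to call the engine, it can be useful to run Tesseract by hand on a sample scan. The wrapper below is a sketch for such a manual check; <code>ocr_to_text</code> and the <code>TESSERACT</code> variable are hypothetical names, and how OpenKM itself invokes the executable is not covered here.

```shell
#!/bin/sh
# TESSERACT may be overridden to point at the executable set in system.ocr.
TESSERACT=${TESSERACT:-tesseract}

# ocr_to_text is a hypothetical wrapper.
# $1: image file; $2: language code (default eng).
ocr_to_text() {
    image=$1
    lang=${2:-eng}
    base=${image%.*}            # strip the extension: scan.tif -> scan
    # Tesseract appends ".txt" to the output base name itself.
    "$TESSERACT" "$image" "$base" -l "$lang" || return 1
    echo "$base.txt"
}
```

Running <code>ocr_to_text scan.tif</code> prints the name of the text file that Tesseract wrote.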
 
 
 
There is also another interesting free OCR application called OCRopus. It has many improvements over Tesseract but is at an early stage of development. The latest released version (0.3.1) is quite usable and works very well, but it has to be compiled, which is currently a difficult task. Visit http://code.google.com/p/ocropus/ for more info.
 
 
 
== OpenOffice.org ==
 
OpenKM can convert some document types to PDF. This is a great help if you need to read a Microsoft Office / OpenOffice.org document and don't have the software installed on your computer.
 
 
 
You need an OpenOffice.org installation on the OpenKM server, and it has to be running in server mode (also known as headless). On Debian / Ubuntu, depending on your OpenOffice.org version, you may have to install an X11 virtual server:
 
 
 
$ apt-get install xvfb
 
 
 
And start it using this command:
 
 
 
$ xvfb-run /usr/lib/openoffice/program/soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard
 
 
 
From OpenOffice.org 2.3 onwards, the X11 virtual server is not necessary, but you should install these packages:
 
 
 
$ aptitude install openoffice.org-headless openoffice.org-java openoffice.org
 
 
 
Before doing so, you must enable a couple of repositories:
 
 
 
<source lang="text">
 
deb http://en.archive.ubuntu.com/ubuntu/ hardy-updates universe
 
deb http://en.archive.ubuntu.com/ubuntu/ hardy-updates multiverse
 
</source>
 
 
 
This script simplifies the start process (for security reasons, you should not start OpenOffice.org as root):
 
 
 
<source lang="bash">
 
#!/bin/sh
 
unset DISPLAY
 
/usr/lib/openoffice/program/soffice "-accept=socket,host=localhost,port=8100;urp;StarOffice.ServiceManager" -nologo -headless -nofirststartwizard
 
</source>
 
 
 
OpenOffice.org will listen on port 8100, so you can check that the application has started by running:
 
 
 
$ netstat -putan | grep 8100
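
The same check can be automated, for example in a monitoring script. This is a minimal sketch: <code>is_listening</code> is a hypothetical helper, and it assumes the column layout printed by Linux <code>netstat -tan</code> (local address in the fourth column, state in the sixth).

```shell
#!/bin/sh
# is_listening is a hypothetical helper: it reads netstat output on stdin and
# succeeds if some socket is in LISTEN state on the given port.
# $1: port number.
is_listening() {
    awk -v port="$1" '
        $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

For example: <code>netstat -tan | is_listening 8100 && echo "OpenOffice.org is up"</code>.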
 
 
 
You can also configure OpenOffice.org as a service with this script:
 
 
 
<source lang="bash">
 
#!/bin/bash
 
# openoffice.org headless server script
 
#
 
# chkconfig: 2345 80 30
 
# description: headless openoffice server script
 
# processname: openoffice
 
#
 
# Author: Vic Vijayakumar
 
# Modified by Paco Avila and Federico Ch. Tomasczik
 
#
 
SOFFICE=/usr/bin/soffice
 
PIDFILE=/var/run/openoffice-server.pid
 
set -e
 
case "$1" in
 
    start)
 
        if [ -f $PIDFILE ]; then
 
            echo "OpenOffice headless server has already started."
 
            sleep 5
 
            exit
 
        fi
 
        echo "Starting OpenOffice headless server"
 
        $SOFFICE -headless -nologo -nofirststartwizard -accept="socket,host=127.0.0.1,port=8100;urp" > /dev/null 2>&1 &
 
        touch $PIDFILE
 
        ;;
 
    stop)
 
        if [ -f $PIDFILE ]; then
 
            echo "Stopping OpenOffice headless server."
 
            killall -9 soffice && killall -9 soffice.bin
 
            rm -f $PIDFILE
 
            exit
 
        fi
 
        echo "OpenOffice headless server is not running."
 
        exit
 
        ;;
 
    *)
 
        echo "Usage: $0 {start|stop}"
 
        exit 1
 
esac
 
exit 0
 
</source>
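
The script above only creates an empty PID file and stops the server with <code>killall</code>. A possible variation, shown here as a sketch, records the real process ID so that only that process is signalled. The function names are hypothetical. Note also that the script above kills both <code>soffice</code> and <code>soffice.bin</code>, so the recorded PID may belong to a launcher rather than to the final process; check this on your system before adopting the approach.

```shell
#!/bin/sh
# Hypothetical helpers: start a background command and remember its real PID.
# start_with_pidfile PIDFILE COMMAND [ARGS...]
start_with_pidfile() {
    pidfile=$1
    shift
    # kill -0 sends no signal; it only tests whether the process exists.
    if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo "already running"
        return 0
    fi
    "$@" > /dev/null 2>&1 &
    echo $! > "$pidfile"
    echo "started"
}

# stop_with_pidfile PIDFILE
stop_with_pidfile() {
    pidfile=$1
    if [ -f "$pidfile" ]; then
        kill "$(cat "$pidfile")" 2>/dev/null || true
        rm -f "$pidfile"
        echo "stopped"
    else
        echo "not running"
    fi
}
```

Unlike the check on the mere existence of the PID file, this version also recovers from a stale file left behind after a crash.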
 
 
 
Save the script as /etc/init.d/openoffice and make it executable:
 
 
 
$ chmod 0755 /etc/init.d/openoffice
 
 
 
Install openoffice init script links:
 
 
 
$ update-rc.d openoffice defaults
 
 
 
This script will launch OpenOffice.org on every system boot. You can also launch it manually this way:
 
 
 
$ /etc/init.d/openoffice start
 
 
 
More info at:
 
* http://www.artofsolving.com/node/10
 
* http://www.oooforum.org/forum/viewtopic.phtml?t=11890
 
* http://code.google.com/p/openmeetings/wiki/OpenOfficeConverter
 
 
 
== Antivirus ==
 
OpenKM can check if a submitted document is infected. It works with an Open Source antivirus software called ClamAV. Edit OpenKM.cfg and add this line:
 
 
 
<source lang="java">
 
system.antivir=/path/to/clamscan
 
</source>
 
 
 
This screenshot shows an error message from OpenKM because the submitted document is infected by a virus:
 
 
 
<center>[[File:Okm 006.png]]</center>
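
To try the scanner by hand, note that <code>clamscan</code> reports its result through the exit status: 0 when no virus is found, 1 when a virus is found, and 2 when an error occurred. The wrapper below is a sketch for manual testing; <code>scan_file</code> and the <code>CLAMSCAN</code> variable are hypothetical names, and how OpenKM itself interprets the result is not described here.

```shell
#!/bin/sh
# CLAMSCAN may point at the executable configured in system.antivir.
CLAMSCAN=${CLAMSCAN:-clamscan}

# scan_file is a hypothetical wrapper: prints clean, infected or error.
# $1: file to scan.
scan_file() {
    status=0
    "$CLAMSCAN" --no-summary "$1" > /dev/null 2>&1 || status=$?
    case $status in
        0) echo "clean" ;;
        1) echo "infected" ;;
        *) echo "error" ;;
    esac
}
```

For example, <code>scan_file /tmp/upload.doc</code> prints one of the three words for the given file.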
 
 
 
To install ClamAV on a Debian / Ubuntu distribution:
 
 
 
  $ sudo aptitude install clamav
 
 
 
Installing ClamAV on CentOS 5.2 takes more work. First create a file named ''/etc/yum.repos.d/dag.repo'' with this content:
 
 
 
<source lang="text">
 
[dag]
 
name=Dag RPM Repository for Red Hat Enterprise Linux
 
baseurl=http://apt.sw.be/redhat/el$releasever/en/$basearch/dag/
 
gpgcheck=1
 
gpgkey=http://dag.wieers.com/packages/RPM-GPG-KEY.dag.txt
 
enabled=1
 
</source>
 
 
 
Now install the program as root:
 
 
 
$ yum install clamd.i386
 
 
 
Start the daemon:
 
 
 
$ /etc/init.d/clamd start
 
 
 
And update the virus database:
 
 
 
$ freshclam
 

Revision as of 10:44, 25 January 2010
