Written by Josep Llort on 4 January 2018
In this article, we will describe the migration from DocuWare to OpenKM. In a previous article, I described another migration process, in that case from KnowledgeTree to OpenKM.
DocuWare is a document management system that, among many other differences with respect to OpenKM, does not have an open source version; it is offered only under a commercial license. DocuWare GmbH, a German company, is the manufacturer of this document management solution.
Next, I will briefly review the main features of DocuWare document management software and compare them with the enterprise content management solution offered by OpenKM.
First, I would highlight that DocuWare can be installed only on the Microsoft Windows operating system; its architecture makes it a native solution of the Microsoft ecosystem. OpenKM, by contrast, has a Java-based architecture, which makes it a multiplatform solution.
At OpenKM, and I think the same applies to most manufacturers of document management, records management, and enterprise content management solutions, we advise our clients to use Linux as the operating system whenever possible, in any of its distributions; we recommend Ubuntu, Debian, CentOS, and Red Hat. The reason is simple: on the same hardware, Linux will always provide higher performance than other operating systems, because its input/output (I/O) subsystem is more efficient. Additionally, we observe that a Windows server tends to consume at least 2 GB of memory on its own, while the default resource consumption of the operating system on Linux is much lower.
In short, if we are in a scenario where performance is a critical factor, we should always consider installing Linux instead of Windows. This does not mean that high-performance environments cannot be built on Windows; we simply want to present objective data: all other things being equal, Linux will give us better performance.
Another difference, as I mentioned earlier, is the type of license. While the OpenKM document management system has a dual license (commercial and open source), DocuWare is offered under a single, commercial model.
Regarding databases, both the DocuWare document management solution and OpenKM can be configured to work with Microsoft SQL Server, Oracle, and MySQL Server; OpenKM can also use PostgreSQL. In general, OpenKM, DocuWare, and most other content management vendors rely on one of these four relational databases.
In general, users who want Microsoft SQL Server are found mainly in Windows Server environments, while users of PostgreSQL, Oracle, MySQL Server, and MariaDB are found mainly in Linux-based environments.
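Since a migration of this kind reads directly from the source database, a practical first step is opening a JDBC connection to whichever engine the installation uses. The following is a minimal sketch, not part of either product: the class `JdbcUrls` and the host, port, and database names are hypothetical placeholders, while the URL formats are the usual ones for each vendor's JDBC driver.

```java
// Hypothetical helper: builds a JDBC URL for the engine behind the source repository.
// Host, port and database name are placeholders to be replaced for each installation.
class JdbcUrls {

    enum Engine { SQLSERVER, ORACLE, MYSQL, POSTGRESQL }

    static String url(Engine engine, String host, int port, String database) {
        switch (engine) {
            case SQLSERVER:
                // Microsoft JDBC driver syntax
                return "jdbc:sqlserver://" + host + ":" + port + ";databaseName=" + database;
            case ORACLE:
                // Oracle thin driver, service-name syntax
                return "jdbc:oracle:thin:@//" + host + ":" + port + "/" + database;
            case MYSQL:
                return "jdbc:mysql://" + host + ":" + port + "/" + database;
            case POSTGRESQL:
                return "jdbc:postgresql://" + host + ":" + port + "/" + database;
            default:
                throw new IllegalArgumentException("Unsupported engine: " + engine);
        }
    }
}
```

With the matching vendor driver on the classpath, the resulting URL would be passed to `java.sql.DriverManager.getConnection(url, user, password)`.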
We can divide document management repositories into two broad groups: those that, like OpenKM, use a taxonomy, and those that do not use this paradigm. The word taxonomy originates in science, specifically in biology, where a mechanism was needed to rank and systematize the groups of animals and plants. A taxonomy is a system for classifying, or ordering into groups, things that share common characteristics.
Document management applications such as OpenKM, Nuxeo, and Alfresco use the concept of taxonomy to classify information. In everyday language, a taxonomy is simply a hierarchy of folders that allows us to sort and classify documents.
Other document management systems, such as Documentum (now ECM2) and OnBase, and also DocuWare, use cataloguing based on the concept of "boxes." In DocuWare, the correct term is "Cabinets." A Cabinet is the equivalent of a closet or drawer in the real world, where a single type of document (a single documentary series) is stored. In these document management systems, each type of document has its own separate "box" (Cabinet), with the metadata for that document type assigned to it.
Let's take an example: a company that stores invoices and contracts. In the case of a document management solution such as OpenKM, we could navigate through a directory tree to locate the information:
/okm:root/Dept management/Contracts/2019
/okm:root/Dept management/Contracts/2018
/okm:root/Dept management/Invoices/2019
/okm:root/Dept management/Invoices/2018
In DocuWare, Documentum, or OnBase, we must first select the type of document to reach a list screen where we can refine the search. There is no navigation as such; accessing the documentation requires prior knowledge of the type of document we are looking for.
The first document management solutions, such as Documentum and OnBase, appeared under this paradigm, probably because they carried the real-world organizational structure over to software without any changes. In the real world, information is archived on shelves, with each shelf dedicated to one type of documentation. Inspired by this, the first applications offered the same conceptual format in a digital environment.
Later, more modern document management, content management, and enterprise content management applications opted for a model based on classifying information within a taxonomy.
In my opinion, both models have advantages and some disadvantages. In solutions based on taxonomies, beyond any benefits in usability, the most significant point is that they break with the isolation of information that Cabinets entail. When we face not just a document management problem but a records management problem (case management or business case), where different types of documents have interrelated life cycles, an approach based on Cabinets becomes problematic.
A common scenario is one where the document manager is not a simple container of company evidence but has a workflow that controls the life cycle of the documents. At the same time, there are files in progress that involve several documents (business case). In this scenario, a solution based on Cabinets is, from my point of view, not optimal.
In general, companies increasingly go beyond having a simple document container and want to control information more actively. This is where workflows and the file plan (file plan and disposition) come in. The problem is no longer just storage, but control over validity and changes, as well as the distribution of and access to information when it is required, all actively integrated into the company's activity (typical cases here are integrations with CRM and ERP systems, among others). Document management applications go from being a mere container to being an active part of the software ecosystem with which the company operates.
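To make the contrast between the two models concrete, here is a small in-memory sketch. It is not code from OpenKM or DocuWare; `ClassificationModels`, `browse`, and `search` are hypothetical names. With a taxonomy we can list documents by folder path, while the cabinet model requires naming the document type before any search can run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: the two classification models as plain collections.
class ClassificationModels {

    // Taxonomy model: every document has a path in a folder hierarchy,
    // so we can browse, e.g. everything under "/okm:root/Dept management/Invoices".
    static List<String> browse(List<String> documentPaths, String folder) {
        String prefix = folder.endsWith("/") ? folder : folder + "/";
        List<String> result = new ArrayList<>();
        for (String path : documentPaths) {
            if (path.startsWith(prefix)) {
                result.add(path);
            }
        }
        return result;
    }

    // Cabinet model: one flat store per document type, each with its own metadata fields.
    // The caller must know the cabinet (document type) before searching at all.
    static List<Map<String, String>> search(Map<String, List<Map<String, String>>> cabinets,
                                            String cabinet, String field, String value) {
        List<Map<String, String>> records = cabinets.get(cabinet);
        if (records == null) {
            throw new IllegalArgumentException("Unknown cabinet: " + cabinet);
        }
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> record : records) {
            if (value.equals(record.get(field))) {
                result.add(record);
            }
        }
        return result;
    }
}
```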
We can find more information in the DocuWare Knowledge Center.
Returning to DocuWare, the topic at hand, we have carried out several migrations to OpenKM. At the time of writing, we have migrated from both version 6.x and version 7.x.
As usual, we have not found any information from the manufacturer to help its clients extract their data from the repository. This is the same situation we have met in migrations from other document management solutions: the client is locked into the software.
That is why I want to make an appeal again, as I did in the article about the migration from KnowledgeTree to OpenKM. In my view, when acquiring document management software, future users concentrate all their efforts on the functionality of the tools, the technology they are based on, and their costs, and leave aside the evaluation of something I consider key: how to get our data out of the system in the future.
Conceptually, at least, DocuWare is very similar to OnBase. For each Cabinet, that is, for each type of document, a set of specific tables is created where that type's information is stored separately. In other words, if we have 50 types of documents, the application will generate at least 50 tables, each containing the information and metadata of one kind of document.
Another significant aspect of DocuWare is where binary files are stored. Each type of document can be stored in a different location; that is, depending on its type, a binary document may be stored on a different drive. This configuration flexibility can complicate the migration process, because for each kind of document we must determine where the information is physically located on disk.
Another peculiarity of DocuWare is the way files are split when stored. For example, when we upload a 32-page PDF file, it may (depending on the DocuWare configuration) be processed and stored as 32 separate one-page files. We have found variations in this storage behaviour depending on whether the version was 6.x or 7.x, and on the document type (PDF or multipage TIFF).
Another problem is the way metadata is stored, especially date fields. This is a classic problem that we have found in practically every migration from other document management applications. It requires special attention: the date value must be captured in its source format and then stored in OpenKM in ISO 8601 format.
Finally, the last peculiarity of DocuWare is that the disk repository contains a file for each document with all of that document's information. Everything seems to indicate that the local DocuWare repository functions as a live export of the repository together with its metadata. This implies that any change we make to a document from the application affects both the database and these local files. The advantage is that the repository already contains a backup (a hot export); at the same time, all the data is duplicated (database and file system), with the corresponding hardware consumption. It also requires complex logic on the application side to keep both copies synchronized.
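Because each Cabinet has its own tables, a migration has to discover and iterate over many tables rather than one. A minimal sketch using standard JDBC metadata follows. Which tables actually belong to cabinets must be established during analysis, so the exclusion list of system tables and the class name `CabinetTables` are hypothetical.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class CabinetTables {

    // Lists every ordinary table visible through the connection, using standard JDBC metadata.
    static List<String> listTables(Connection connection) throws SQLException {
        DatabaseMetaData meta = connection.getMetaData();
        List<String> names = new ArrayList<>();
        try (ResultSet rs = meta.getTables(null, null, "%", new String[] {"TABLE"})) {
            while (rs.next()) {
                names.add(rs.getString("TABLE_NAME"));
            }
        }
        return names;
    }

    // Removes known system tables (case-insensitively) and returns the remaining
    // candidate cabinet tables in sorted order for the migration to iterate over.
    static List<String> migrationTables(List<String> allTables, Set<String> systemTables) {
        Set<String> excluded = new HashSet<>();
        for (String name : systemTables) {
            excluded.add(name.toLowerCase(Locale.ROOT));
        }
        List<String> result = new ArrayList<>();
        for (String table : allTables) {
            if (!excluded.contains(table.toLowerCase(Locale.ROOT))) {
                result.add(table);
            }
        }
        Collections.sort(result);
        return result;
    }
}
```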
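One way to handle per-type storage locations is to build an explicit map from cabinet to root directory before migrating anything. The sketch below assumes that mapping is collected from the DocuWare configuration during analysis; the class `StorageLocations` and the example paths are hypothetical, and on the DocuWare server itself the roots would be Windows drives or network shares.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical mapping from cabinet (document type) to the disk location of its binary files.
class StorageLocations {

    private final Map<String, Path> rootsByCabinet = new HashMap<>();

    void register(String cabinet, String rootDirectory) {
        rootsByCabinet.put(cabinet, Paths.get(rootDirectory));
    }

    // Resolves a stored file relative to its cabinet's root. It fails loudly when a
    // cabinet has no known location instead of silently guessing a default directory.
    Path resolve(String cabinet, String relativeFile) {
        Path root = rootsByCabinet.get(cabinet);
        if (root == null) {
            throw new IllegalStateException("No storage location registered for cabinet: " + cabinet);
        }
        return root.resolve(relativeFile).normalize();
    }
}
```

Failing on an unknown cabinet is deliberate: during a migration, a missing mapping is better discovered early than after documents have been read from the wrong place.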
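Before such a document is uploaded to OpenKM, its single-page files have to be put back in order. The sketch below assumes the page number of each file has already been recovered (how DocuWare records it varies by version, as noted above, so that part is left out); `PageAssembler` and `PageFile` are hypothetical names. The ordered list could then be merged into one file, for PDFs for example with Apache PDFBox's `PDFMergerUtility`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PageAssembler {

    // One single-page file belonging to a multipage document.
    static final class PageFile {
        final String documentId;
        final int pageNumber; // 1-based
        final String fileName;

        PageFile(String documentId, int pageNumber, String fileName) {
            this.documentId = documentId;
            this.pageNumber = pageNumber;
            this.fileName = fileName;
        }
    }

    // Groups page files by document and orders them by page number. It throws if a
    // page is missing or duplicated, so incomplete documents are not migrated silently.
    static Map<String, List<String>> orderedPages(List<PageFile> pages) {
        Map<String, List<PageFile>> byDocument = new TreeMap<>();
        for (PageFile page : pages) {
            byDocument.computeIfAbsent(page.documentId, k -> new ArrayList<>()).add(page);
        }
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<PageFile>> entry : byDocument.entrySet()) {
            List<PageFile> list = entry.getValue();
            list.sort(Comparator.comparingInt(p -> p.pageNumber));
            List<String> files = new ArrayList<>();
            for (int i = 0; i < list.size(); i++) {
                if (list.get(i).pageNumber != i + 1) {
                    throw new IllegalStateException("Document " + entry.getKey()
                            + " has a missing or duplicate page at position " + (i + 1));
                }
                files.add(list.get(i).fileName);
            }
            result.put(entry.getKey(), files);
        }
        return result;
    }
}
```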
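A minimal sketch of the date normalization step using `java.time`. The source patterns are assumptions (day-first dates with `/` or `.` separators, plus ISO input); the real formats must be confirmed against the values stored in each installation. The output is an ISO 8601 calendar date; if the target field expects a full timestamp, the output formatter would need to change.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.List;

class DateNormalizer {

    // Candidate source formats, tried in order. These are assumptions, and day-first
    // order is assumed; a month-first source would need its own pattern.
    private static final List<DateTimeFormatter> SOURCE_FORMATS = List.of(
            DateTimeFormatter.ofPattern("dd/MM/uuuu").withResolverStyle(ResolverStyle.STRICT),
            DateTimeFormatter.ofPattern("dd.MM.uuuu").withResolverStyle(ResolverStyle.STRICT),
            DateTimeFormatter.ISO_LOCAL_DATE);

    // Converts a raw date string to an ISO 8601 calendar date (yyyy-MM-dd).
    // STRICT resolution rejects impossible dates such as 31/02/2018.
    static String toIso8601(String raw) {
        String value = raw.trim();
        for (DateTimeFormatter format : SOURCE_FORMATS) {
            try {
                return LocalDate.parse(value, format).format(DateTimeFormatter.ISO_LOCAL_DATE);
            } catch (DateTimeParseException e) {
                // Not this format; try the next one.
            }
        }
        throw new IllegalArgumentException("Unrecognised date value: " + raw);
    }
}
```

For example, `DateNormalizer.toIso8601("04/01/2018")` returns `"2018-01-04"`.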
When the OpenKM R&D department performed the reverse engineering analysis for the data migration, we concluded that it was much more convenient to approach the migration with a Java program that worked against both the database and the file system and connected to OpenKM through the Java SDK. We decided against parsing the text files that contain the DocuWare metadata structure, because the format in which this information is stored was not clear. OpenKM, by contrast, exports its metadata structure to files in JSON format, which makes subsequent processing significantly easier.
The R&D department also investigated CMIS support in DocuWare. Although we are not CMIS fans, it is an option to consider; at the time of writing, however, it is not available. We found some examples of connecting to the DocuWare API through .NET, as in this example: DocuWare API. We discarded this path at the time, partly because of the whole Microsoft development environment that has to be set up, but we consider it an option worth taking into account for anyone who wants to migrate their repository. Everything seems to indicate that the DocuWare Platform API package comprises a set of libraries (similar to what OpenKM offers as SDKs for Java, .NET, and PHP) that give easy access to all the functionality of the repository through REST web services.
In general, whenever an API is available for accessing a repository, it is advisable to use it as the first approach to the problem.
However, the REST service lacks better documentation; we were not able to locate any online. At OpenKM, like other vendors, we have chosen to make the REST service documentation available through Swagger.io, directly from the application itself, as indicated in this section: swagger
I hope this article can serve as an introduction for any users who wish to migrate their data from DocuWare in the future, and that the experience we share here will make the problem easier to solve. The DocuWare repository migrations we have completed have been very similar, but not identical, so we cannot provide a magic recipe that works in every case. In this article, I have tried to share the common knowledge that I consider reusable in all the scenarios that we, at least, have encountered.
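The overall shape of such a Java migration program can be sketched as a loop that reads records from the source and writes them to the target. This is a hypothetical outline, not the program we used: `SourceRecord` and `TargetRepository` are illustrative names, binary content is left out for brevity, and in a real migration `TargetRepository` would wrap calls to the OpenKM Java SDK. Here each cabinet simply becomes a folder under the root, which is one possible way to map the cabinet model onto a taxonomy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical outline of the migration loop; not the actual OpenKM migration code.
class MigrationSketch {

    // One document read from the source database and file system.
    static final class SourceRecord {
        final String cabinet;               // DocuWare document type
        final String documentId;            // identifier in the source system
        final String fileName;              // name to give the document in the target
        final Map<String, String> metadata; // already normalized values

        SourceRecord(String cabinet, String documentId, String fileName, Map<String, String> metadata) {
            this.cabinet = cabinet;
            this.documentId = documentId;
            this.fileName = fileName;
            this.metadata = metadata;
        }
    }

    // Abstraction over the target; a real implementation would call the OpenKM Java SDK.
    interface TargetRepository {
        void createDocument(String path, Map<String, String> metadata);
    }

    // Uploads each record under rootFolder/<cabinet>/<fileName> and returns the ids
    // of the records that failed, so they can be reviewed and retried later.
    static List<String> migrate(List<SourceRecord> records, String rootFolder, TargetRepository target) {
        List<String> failed = new ArrayList<>();
        for (SourceRecord record : records) {
            String path = rootFolder + "/" + record.cabinet + "/" + record.fileName;
            try {
                target.createDocument(path, record.metadata);
            } catch (RuntimeException e) {
                failed.add(record.documentId);
            }
        }
        return failed;
    }
}
```

Collecting failed ids instead of aborting lets a long migration run to completion and deal with the problematic documents afterwards.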
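For those who take the REST route, a first step is usually a simple request against the Platform service. The sketch below only builds the request with the JDK's `java.net.http` client. The path `/DocuWare/Platform/FileCabinets` is an assumption that must be checked against the DocuWare Platform API documentation for the installed version, authentication is omitted, and `RestProbe` is a hypothetical name.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class RestProbe {

    // Builds (but does not send) a GET request asking for the list of file cabinets as JSON.
    // The endpoint path is an assumption, not a confirmed part of the DocuWare API.
    static HttpRequest listCabinets(String baseUrl) {
        String base = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        return HttpRequest.newBuilder(URI.create(base + "/DocuWare/Platform/FileCabinets"))
                .header("Accept", "application/json")
                .GET()
                .build();
    }
}
```

Sending it would be `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, once whatever session or token the server requires has been added.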