Process Mining (not just) for Legacy Applications – this is how it’s done!

Quality matters: workflow data without workflow engine

Process Mining depends on the quality of the data used. If you’re using a workflow engine, consider yourself lucky: you can expect meaningful results. But which legacy application is equipped like that? Do you know any? Neither do we.

So the only alternative is log data that was mostly recorded for other purposes, e.g., transaction logs, audit trails, message logs, etc. Almost none of it can be used "as is". More often than not, data is missing, imprecise, irrelevant or utterly wrong. Cleansing and preprocessing are cumbersome and complex. Even with high-quality data, a deduced process can only ever be that: a deduction, an interpretation.

And this is where we step in. Instead of settling for flawed data and potential processes, we intend to extract data explicitly for Process Mining.

How does that work?

We are able to edit, delete and extend the source code of a running application. Without changing the essence of the application, we can enhance it so that process-relevant information as well as the usage of the application is recorded. Almost as if a modern workflow engine were running.
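
To make this more tangible, here is a minimal sketch in Python of what such an enhancement could look like in principle. It is an illustration only, not a description of the actual tooling: the names record_event, EVENT_LOG and approve_order are made up for this example, and the in-memory list stands in for whatever storage a real setup would use. The point it shows is that the original function keeps doing exactly what it did before, while each call additionally leaves behind one event with case ID, activity and timestamp.

```python
import functools
from datetime import datetime, timezone

# Hypothetical in-memory sink; a real setup would write to a file or database.
EVENT_LOG = []


def record_event(activity, case_id_arg):
    """Wrap a function so that each call appends one process event to EVENT_LOG.

    activity    -- name of the process step to record (assumed to be known)
    case_id_arg -- name of the keyword argument that carries the case ID
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Run the original code first; its behaviour is left untouched.
            result = func(*args, **kwargs)
            EVENT_LOG.append({
                "case_id": kwargs[case_id_arg],
                "activity": activity,
                "timestamp": datetime.now(timezone.utc).isoformat(),
            })
            return result
        return wrapper
    return decorator


# Hypothetical piece of legacy business logic, enhanced by one added line.
@record_event(activity="Approve order", case_id_arg="order_id")
def approve_order(*, order_id, approver):
    return f"order {order_id} approved by {approver}"
```

Calling approve_order(order_id="4711", approver="alice") returns the same string as before the enhancement and appends one entry to EVENT_LOG. The sketch assumes the case ID is passed as a keyword argument; in a real legacy application it would more likely have to be looked up from the surrounding context.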

What are the pros?

We don’t just record results but also the usage of the application itself. Therefore, it is possible to recreate the process as it is actually used rather than deduce an interpretation of it. Instead of common log data, we record exactly the data needed for Process Mining. No more, no less.

What are the challenges?

Apart from activity and timestamp, the case ID plays an important role, as it enables us to identify a specific business case. Business objects such as customer ID, project ID or order ID are candidates for the case ID. Whether a particular business object is suitable depends on the process to be observed and the data model in use. This means you need in-depth knowledge of both.
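
A small sketch in Python shows why the choice matters. The events, field names and the helper traces_by are invented for illustration. The same four events are grouped once by order ID and once by customer ID, and the reconstructed process looks quite different in each case.

```python
from collections import defaultdict

# Invented sample events; each one carries two possible case ID candidates.
events = [
    {"order_id": "O1", "customer_id": "C1", "activity": "Create order",
     "timestamp": "2024-01-01T09:00:00"},
    {"order_id": "O2", "customer_id": "C1", "activity": "Create order",
     "timestamp": "2024-01-01T09:05:00"},
    {"order_id": "O1", "customer_id": "C1", "activity": "Ship order",
     "timestamp": "2024-01-02T10:00:00"},
    {"order_id": "O2", "customer_id": "C1", "activity": "Cancel order",
     "timestamp": "2024-01-02T11:00:00"},
]


def traces_by(events, case_key):
    """Group events by the chosen case ID and list the activities in time order."""
    traces = defaultdict(list)
    # ISO 8601 timestamps in the same format sort correctly as plain strings.
    for event in sorted(events, key=lambda e: e["timestamp"]):
        traces[event[case_key]].append(event["activity"])
    return dict(traces)


by_order = traces_by(events, "order_id")
by_customer = traces_by(events, "customer_id")
```

With the order ID as case ID there are two short cases, one ending in "Ship order" and one in "Cancel order". With the customer ID there is a single case in which "Create order" appears twice in a row, which says something about the customer but blurs the order process.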

How do you proceed?

Our idea is to automatically extract candidates for case IDs from both the source code and the data model. This is another aspect we pursue with our approach. And we’d love to talk about that some other time.