Traditional process mining techniques take as input event data in which each event is associated with exactly one object, where an object represents one instantiation of a process (a case).
Object-centric event data contain events associated with multiple objects expressing the interaction of multiple processes. See the figure below for an example of an object-centric event log and its underlying structure of directly-follows relationships induced by objects.
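The directly-follows relationships induced by objects can be made concrete with a minimal Python sketch. The toy log, its object identifiers, and the function name object_dfg_edges are hypothetical illustrations, not taken from the paper. The sketch orders each object's events by timestamp and links consecutive events in that object's lifecycle, so one event can have several predecessors and successors.

```python
from collections import defaultdict

# Hypothetical toy object-centric event log: an event may reference
# several objects (one order o1 and two items i1, i2).
EVENTS = [
    {"id": "e1", "activity": "place order", "time": 1, "objects": ["o1", "i1", "i2"]},
    {"id": "e2", "activity": "pick item", "time": 2, "objects": ["i1"]},
    {"id": "e3", "activity": "pick item", "time": 3, "objects": ["i2"]},
    {"id": "e4", "activity": "ship order", "time": 4, "objects": ["o1", "i1", "i2"]},
]

def object_dfg_edges(events):
    """Return the (source, target) event-id pairs such that target directly
    follows source in the lifecycle of at least one shared object."""
    by_object = defaultdict(list)
    for event in sorted(events, key=lambda e: e["time"]):
        for obj in event["objects"]:
            by_object[obj].append(event["id"])
    edges = set()
    for lifecycle in by_object.values():
        edges.update(zip(lifecycle, lifecycle[1:]))
    return edges

print(sorted(object_dfg_edges(EVENTS)))
# [('e1', 'e2'), ('e1', 'e3'), ('e1', 'e4'), ('e2', 'e4'), ('e3', 'e4')]
```

The result is a graph rather than a sequence: e2 and e3 both follow e1 and both precede e4, but no object orders them relative to each other.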
As traditional process mining techniques assume that each event is associated with exactly one object, they cannot be applied directly to object-centric event data.
To use traditional process mining techniques, object-centric event data are flattened by removing all object references but one. The flattening process is lossy, so features extracted from flattened data are inaccurate. Furthermore, the graph-like structure of object-centric event data is lost when flattening. See the figure below for different ways of flattening. Note that each method leads to missing events, duplicated events, or wrong precedence constraints.
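Two of these failure modes can be reproduced in a few lines. The following Python sketch assumes a common flattening scheme: pick one object type and copy each event into the trace of every object of that type it references. The log and the helper names flatten and report are hypothetical. Flattening on "item" duplicates the shared event e1 and drops e4, which references no item; flattening on "order" drops both picking events. Wrong precedence constraints, the third failure mode, are not shown here.

```python
from collections import defaultdict

# Hypothetical toy log with typed object references.
EVENTS = [
    {"id": "e1", "activity": "place order", "time": 1, "objects": {"order": ["o1"], "item": ["i1", "i2"]}},
    {"id": "e2", "activity": "pick item", "time": 2, "objects": {"item": ["i1"]}},
    {"id": "e3", "activity": "pick item", "time": 3, "objects": {"item": ["i2"]}},
    {"id": "e4", "activity": "send invoice", "time": 4, "objects": {"order": ["o1"]}},
]

def flatten(events, object_type):
    """Flatten on one object type: copy each event into the trace of every
    object of that type it references; all other references are dropped."""
    traces = defaultdict(list)
    for event in sorted(events, key=lambda e: e["time"]):
        for obj in event["objects"].get(object_type, []):
            traces[obj].append(event["id"])
    return dict(traces)

def report(events, object_type):
    """Return the flattened traces plus the missing and duplicated event ids."""
    traces = flatten(events, object_type)
    kept = [eid for trace in traces.values() for eid in trace]
    missing = {e["id"] for e in events} - set(kept)
    duplicated = {eid for eid in kept if kept.count(eid) > 1}
    return traces, missing, duplicated

print(report(EVENTS, "item"))   # e4 is missing, e1 is duplicated
print(report(EVENTS, "order"))  # e2 and e3 are missing
```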
In this paper, we introduce a general framework for extracting and encoding features from object-centric event data. We calculate features natively on the object-centric event data, leading to accurate measures. See the figure below for an overview of object-centric features.
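To illustrate what computing features natively means, the Python sketch below derives three event-level features directly from the object-centric log, without flattening it first. The three features are hypothetical examples, not the paper's feature set: the number of referenced objects, the number of distinct object types, and the time since the most recent earlier event that shares an object with the current event (None if there is no such event).

```python
# Hypothetical toy log with typed object references.
EVENTS = [
    {"id": "e1", "activity": "place order", "time": 1, "objects": {"order": ["o1"], "item": ["i1", "i2"]}},
    {"id": "e2", "activity": "pick item", "time": 2, "objects": {"item": ["i1"]}},
    {"id": "e3", "activity": "pick item", "time": 3, "objects": {"item": ["i2"]}},
    {"id": "e4", "activity": "send invoice", "time": 4, "objects": {"order": ["o1"]}},
]

def native_features(events):
    """Compute illustrative features per event on the unflattened log."""
    last_seen = {}  # object id -> timestamp of that object's latest event so far
    features = {}
    for event in sorted(events, key=lambda e: e["time"]):
        objs = [o for ids in event["objects"].values() for o in ids]
        previous = [last_seen[o] for o in objs if o in last_seen]
        features[event["id"]] = {
            "num_objects": len(objs),
            "num_object_types": sum(1 for ids in event["objects"].values() if ids),
            "time_since_related": event["time"] - max(previous) if previous else None,
        }
        for o in objs:
            last_seen[o] = event["time"]
    return features

for eid, feats in native_features(EVENTS).items():
    print(eid, feats)
```

Each event is counted exactly once here. Computed on a log flattened by item, the same event e1 would instead appear in two traces and contribute to every aggregate twice.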
Furthermore, we provide three encodings for these features: tabular, sequential, and graph-based. While tabular and sequential encodings have been heavily used in process mining, the graph-based encoding is a new technique preserving the structure of the object-centric event data. See the figure below for an example of all three encoding techniques.
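The difference between the three encodings can be sketched for a single process execution. In this Python sketch the event ids, timestamps, feature vectors, and edges are hypothetical, and the layouts (one table row per event, one feature vector per time step, one node per event) are simplifying assumptions rather than the paper's exact data structures.

```python
# Hypothetical process execution: events with timestamps, a feature vector
# per event ([num_objects, num_object_types]), and object-induced edges.
EVENTS = [("e1", 1), ("e2", 2), ("e3", 3), ("e4", 4)]
FEATURES = {"e1": [3, 2], "e2": [1, 1], "e3": [1, 1], "e4": [3, 2]}
EDGES = {("e1", "e2"), ("e1", "e3"), ("e2", "e4"), ("e3", "e4")}

def tabular(events, features, columns):
    """Tabular encoding: one row of feature values per event."""
    header = ["event_id", *columns]
    rows = [[eid, *features[eid]] for eid, _ in events]
    return header, rows

def sequential(events, features):
    """Sequential encoding: feature vectors ordered by timestamp. This
    imposes an order between e2 and e3 that no object actually induces."""
    return [features[eid] for eid, _ in sorted(events, key=lambda e: e[1])]

def graph(events, features, edges):
    """Graph encoding: feature vectors on nodes, object-induced edges kept."""
    nodes = {eid: features[eid] for eid, _ in events}
    return nodes, sorted(edges)

print(tabular(EVENTS, FEATURES, ["num_objects", "num_object_types"]))
print(sequential(EVENTS, FEATURES))
print(graph(EVENTS, FEATURES, EDGES))
```

Only the graph encoding retains that e2 and e3 are unordered with respect to each other; the other two encodings keep the feature values but drop this structure.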
We provide six use cases: a visualization and a prediction use case for each of the three encodings. We use explainable AI in the prediction use cases to show the utility of both the object-centric features and the structure of the sequential and graph-based encodings for a predictive model.
We can use tabular encodings to represent the event log as a time series.
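One way to turn a tabular encoding into a time series is to bucket events into fixed-width time windows and aggregate each window. This Python sketch assumes integer timestamps, a single numeric feature, and a hypothetical helper to_time_series. Windows without events are omitted, which a real time-series analysis might instead fill with zeros.

```python
from collections import defaultdict

def to_time_series(rows, window):
    """rows: (timestamp, feature value) pairs. Returns a time-ordered list of
    (window start, event count, mean feature value) per non-empty window."""
    buckets = defaultdict(list)
    for timestamp, value in rows:
        buckets[timestamp // window * window].append(value)
    return [(start, len(vals), sum(vals) / len(vals))
            for start, vals in sorted(buckets.items())]

# Hypothetical rows of a tabular encoding: (timestamp, num_objects).
ROWS = [(0, 3), (5, 1), (12, 1), (14, 5), (27, 2)]
print(to_time_series(ROWS, window=10))
# [(0, 2, 2.0), (10, 2, 3.0), (20, 1, 2.0)]
```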
We can use sequential encodings to show the traditional control-flow variants of an event log.
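Control-flow variants follow from a sequential encoding whose elements are activity labels: two traces share a variant if their activity sequences are identical. A minimal Python sketch with hypothetical traces:

```python
from collections import Counter

def variants(traces):
    """traces: mapping trace id -> activities in time order.
    Returns (activity sequence, frequency) pairs, most frequent first."""
    return Counter(tuple(acts) for acts in traces.values()).most_common()

TRACES = {
    "c1": ["place order", "pick item", "ship order"],
    "c2": ["place order", "pick item", "ship order"],
    "c3": ["place order", "cancel order"],
}
print(variants(TRACES))
# [(('place order', 'pick item', 'ship order'), 2), (('place order', 'cancel order'), 1)]
```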
We can use tabular (regression) and sequential (LSTM) encodings to predict the remaining time of a process execution and subsequently assess which features and structures are important for the prediction using SHAP values.
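For the regression case, SHAP values have a closed form: under feature independence, the SHAP value of feature i for input x is w_i (x_i - mean of feature i over the background data). The stdlib-only Python sketch below fits ordinary least squares and applies that formula. The data and helper names are hypothetical, this is not the authors' pipeline (in practice one would use the shap package and a library regressor), and the LSTM case is not covered.

```python
def fit_linear(X, y):
    """Ordinary least squares with intercept via the normal equations
    (A^T A) b = A^T y, solved by Gauss-Jordan elimination with pivoting."""
    A = [[1.0, *row] for row in X]  # prepend an intercept column
    n = len(A[0])
    M = [[sum(r[i] * r[j] for r in A) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(A, y))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]  # [intercept, w1, w2, ...]

def predict(weights, x):
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))

def linear_shap(weights, X, x):
    """Exact SHAP values of a linear model assuming independent features:
    phi_i = w_i * (x_i - mean of feature i over background data X)."""
    means = [sum(col) / len(col) for col in zip(*X)]
    return [w * (xi - m) for w, xi, m in zip(weights[1:], x, means)]

# Hypothetical features [elapsed time, num_objects] and remaining times,
# generated exactly as y = 2 + 3 * x1 + 0.5 * x2.
X = [[1, 0], [2, 1], [3, 0], [4, 2], [5, 1]]
y = [5.0, 8.5, 11.0, 15.0, 17.5]
weights = fit_linear(X, y)
print(weights)                          # close to [2.0, 3.0, 0.5]
print(linear_shap(weights, X, [4, 2]))  # close to [3.0, 0.6]
```

The SHAP values sum to the prediction for x minus the average prediction over the background data (15 - 11.4 = 3.6 here), which is the additivity property that makes them suitable for attributing a prediction to individual features.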
We can use the graph encoding as an input for a graph neural network to predict the remaining time of a process execution. Using SHAP values, we can assess the importance of different edges of the graph in the prediction. This shows that the GNN leverages the graph structure for prediction.
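Edge importance can be illustrated with a toy model. In the Python sketch below, the "GNN" is a single mean-aggregation message-passing layer with a sum readout and fixed, hypothetical weights, not a trained network or the paper's architecture. Each edge's Shapley value is computed exactly by enumerating all subsets of the other edges, where a coalition is evaluated on the graph that keeps only its edges. SHAP libraries approximate these values; exact enumeration is feasible only for small graphs.

```python
from itertools import combinations
from math import factorial

# Hypothetical graph encoding: one scalar feature per event node.
NODES = {"e1": 3.0, "e2": 1.0, "e3": 1.0, "e4": 3.0}
EDGES = [("e1", "e2"), ("e1", "e3"), ("e2", "e4"), ("e3", "e4")]
W_SELF, W_NBR = 1.0, 0.5  # hypothetical stand-ins for learned weights

def predict(nodes, edges):
    """One message-passing layer (own feature + mean of in-neighbours),
    followed by a sum readout as the remaining-time prediction."""
    total = 0.0
    for v, h in nodes.items():
        incoming = [nodes[src] for src, dst in edges if dst == v]
        agg = sum(incoming) / len(incoming) if incoming else 0.0
        total += W_SELF * h + W_NBR * agg
    return total

def edge_shapley(nodes, edges):
    """Exact Shapley value of every edge by enumerating all coalitions."""
    n = len(edges)
    phi = {}
    for i, edge in enumerate(edges):
        others = edges[:i] + edges[i + 1:]
        value = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                value += weight * (predict(nodes, [*subset, edge]) - predict(nodes, list(subset)))
        phi[edge] = value
    return phi

print(predict(NODES, EDGES), predict(NODES, []))  # 11.5 8.0
print({e: round(v, 6) for e, v in edge_shapley(NODES, EDGES).items()})
```

The edges into e2 and e3 each receive 1.5, while the two redundant edges into e4 share 0.25 each, because either one alone already supplies the same mean neighbour feature. Non-zero edge attributions of this kind are what indicate that a model's prediction depends on the graph structure.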
Authors
Jan Niklas Adams is a Ph.D. Student at the Chair of Process and Data Science, RWTH Aachen. His research interests are within Process Mining and Data Science. He graduated from RWTH Aachen with a Master’s in Computer Science and a Master’s in Economics.
Gyunam Park is a Ph.D. student in the Department of Computer Science, RWTH Aachen University. His research interests include process mining, data science, and machine learning.
Sergej Levich is a Ph.D. candidate at the Chair of Information Systems of the University of Freiburg, Germany. His research interest lies in the application of Machine Learning to operational processes in organizations. He holds a bachelor’s degree in Business Administration from the University of St. Gallen and a master’s degree in Decision Sciences from the London School of Economics.
Daniel Schuster is a scientist at the Fraunhofer Institute for Applied Information Technology. In addition, he is a Ph.D. candidate at the Chair for Process and Data Science at RWTH Aachen University.
Prof.dr.ir. Wil van der Aalst is a full professor at RWTH Aachen University, leading the Process and Data Science (PADS) group. He is also the Chief Scientist at Celonis, part-time affiliated with the Fraunhofer FIT, and a member of the Board of Governors of Tilburg University. He also has unpaid professorship positions at Queensland University of Technology (since 2003) and the Technische Universiteit Eindhoven (TU/e). Currently, he is also a distinguished fellow of Fondazione Bruno Kessler (FBK) in Trento, deputy CEO of the Internet of Production (IoP) Cluster of Excellence, and co-director of the RWTH Center for Artificial Intelligence.