Our research addresses anomaly detection on microservice traces. The sections below summarize why trace anomaly detection matters, what trace data is, our contributions, the proposed approach, the experiments, and a summary and outlook.

Significance. In recent years, microservice architecture has developed rapidly and been widely adopted. At the same time, it makes systems harder to operate and maintain and raises the probability of failure. Because trace data, as a microservice monitoring signal, reflects the dependencies and response times among services well, anomaly detection and root cause localization based on the call chain can help keep services stable and improve service quality.

Trace. A trace is a record of the HTTP and RPC calls between microservices. Each call in a trace is called an event or span, and the spans together preserve the call relationships and execution path between microservices. Because traces reflect the dependencies between services well, we choose the trace as the indicator for locating the root cause of failures. A trace consists of a series of spans arranged in a tree-like structure. Each trace has a unique traceID, and each span in it has a unique spanID. Each span records the initiator and the invoked service, along with the duration, URL, method, status code, and other fields of the call. Every span except the root call has a parent call as its initiator, so the topology of the trace can be reconstructed from the correspondence between each span's parent ID (pid) and span ID (id).
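
As a rough illustration of how the pid/id correspondence yields the trace topology, the following sketch rebuilds a call tree from a flat list of spans and enumerates its call paths. The field names (trace_id, id, pid, caller, callee, duration_ms, status) and the sample values are assumptions made for this example, not the schema of the dataset used in the paper.

```python
from collections import defaultdict

# Hypothetical span records; field names and values are illustrative only.
spans = [
    {"trace_id": "t1", "id": "a", "pid": None, "caller": "gateway",
     "callee": "order", "duration_ms": 120, "status": 200},
    {"trace_id": "t1", "id": "b", "pid": "a", "caller": "order",
     "callee": "payment", "duration_ms": 80, "status": 200},
    {"trace_id": "t1", "id": "c", "pid": "a", "caller": "order",
     "callee": "inventory", "duration_ms": 25, "status": 200},
    {"trace_id": "t1", "id": "d", "pid": "b", "caller": "payment",
     "callee": "bank", "duration_ms": 60, "status": 500},
]


def build_trace_tree(spans):
    """Return (root_id, children), where children maps a span id to its child ids."""
    children = defaultdict(list)
    root_id = None
    for span in spans:
        if span["pid"] is None:
            root_id = span["id"]          # the root call has no parent
        else:
            children[span["pid"]].append(span["id"])
    return root_id, dict(children)


def call_paths(root_id, children):
    """Enumerate root-to-leaf paths of span ids with an iterative depth-first walk."""
    paths, stack = [], [(root_id, [root_id])]
    while stack:
        node, path = stack.pop()
        kids = children.get(node, [])
        if not kids:
            paths.append(path)
        # Push in reverse so children are visited in their original order.
        for kid in reversed(kids):
            stack.append((kid, path + [kid]))
    return paths


root_id, children = build_trace_tree(spans)
paths = call_paths(root_id, children)
```

Each resulting path is one execution path through the services, which is the kind of structure a graph-based model can consume alongside per-span attributes such as duration and status code.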

 Table 1. Example of Microservice Trace.

Contribution.

(1) We propose a BiLSTM-based method for generating embedded representations of trace node attributes. By constructing attribute dependency graphs and self-attention mapping graphs, we obtain a unified modeling framework that effectively integrates microservice response time and call paths.

(2) We apply dualGCN to microservice trace anomaly detection. Information propagation through the multi-layer dualGCN, combined with a fusion mechanism based on mutual attention, generates an effective feature embedding representation of the trace.

(3) We construct a multi-point fault injection (MPFI) microservice trace dataset in a cloud environment and use it to validate the accuracy and robustness of the proposed anomaly detection algorithm.

Method. The model is divided into three parts to facilitate understanding: graph construction module, information fusion module, and anomaly detection module.

The framework of the proposed BSDG.

(1) Graph construction module. This module builds two graphs. For the attribute dependency graph, we obtain node representations of the microservice trace, generate learnable node embeddings with a BiLSTM, and construct the graph from these node embeddings and the call dependencies between microservices. For the self-attention mapping graph, we use the self-attention mechanism to compute the relevance between node embeddings, which yields a self-attention mapping matrix, and construct the graph from the node embeddings and this matrix.
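
To make the two graphs more concrete, here is a minimal NumPy sketch. It assumes scaled dot-product self-attention for the mapping matrix and a symmetric adjacency matrix with self-loops for the dependency graph. Both choices, the function names, and the random placeholder embeddings (standing in for BiLSTM outputs) are assumptions for illustration and may differ from the actual BSDG implementation.

```python
import numpy as np


def self_attention_map(H, W_q, W_k):
    """Scaled dot-product attention weights between all pairs of node embeddings.

    H: (n_nodes, d) node embeddings. W_q, W_k: (d, d_k) projection matrices.
    Returns an (n_nodes, n_nodes) matrix whose rows sum to 1.
    """
    Q, K = H @ W_q, H @ W_k
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)


def dependency_adjacency(n_nodes, edges):
    """Symmetric adjacency matrix with self-loops from (caller, callee) index pairs."""
    A = np.eye(n_nodes)
    for caller, callee in edges:
        A[caller, callee] = A[callee, caller] = 1.0
    return A


rng = np.random.default_rng(0)
n_nodes, d, d_k = 4, 8, 8
H = rng.normal(size=(n_nodes, d))      # placeholder for BiLSTM node embeddings
W_q = rng.normal(size=(d, d_k))        # untrained projections, for shape only
W_k = rng.normal(size=(d, d_k))

A_dep = dependency_adjacency(n_nodes, [(0, 1), (0, 2), (1, 3)])
A_att = self_attention_map(H, W_q, W_k)
```

In this sketch, A_dep encodes which services call each other, while A_att is a dense, learned-style relevance matrix over the same nodes, so the two graphs share nodes but differ in their edges.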

(2) Information fusion module. This module uses a dual graph convolutional network and a mutual attention network to fuse the attribute dependency features of the node embeddings in the attribute dependency graph with the self-attention mapping features in the self-attention mapping graph. The result is an effective feature embedding representation of the microservice trace within a unified modeling framework.
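
The fusion step could look roughly like the sketch below: one graph convolution per branch, followed by a bilinear mutual attention exchange in which each branch re-reads the other. This is a single-layer illustration with random, untrained weights. The normalization scheme, the bilinear attention form, the mean pooling, and all names are assumptions for the example, not details confirmed by the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def normalize_adjacency(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; A must include self-loops."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(A_hat, H, W):
    """One graph convolution: aggregate neighbours, project, apply ReLU."""
    return np.maximum(A_hat @ H @ W, 0.0)


def mutual_attention(H_a, H_b, W_ab, W_ba):
    """Each branch attends over the other branch through a bilinear score."""
    H_a_new = softmax(H_a @ W_ab @ H_b.T) @ H_b
    H_b_new = softmax(H_b @ W_ba @ H_a.T) @ H_a
    return H_a_new, H_b_new


rng = np.random.default_rng(0)
n, d = 4, 8
H = rng.normal(size=(n, d))                  # placeholder node embeddings

A_dep = np.eye(n)                            # dependency graph with self-loops
for i, j in [(0, 1), (0, 2), (1, 3)]:
    A_dep[i, j] = A_dep[j, i] = 1.0
A_att = softmax(H @ H.T / np.sqrt(d))        # placeholder attention graph

W_dep, W_att, W_ab, W_ba = (rng.normal(scale=0.1, size=(d, d)) for _ in range(4))

H_dep = gcn_layer(normalize_adjacency(A_dep), H, W_dep)   # dependency branch
H_att = gcn_layer(A_att, H, W_att)                        # attention branch
H_dep, H_att = mutual_attention(H_dep, H_att, W_ab, W_ba)

# Mean-pool each branch over nodes and concatenate into one trace embedding.
trace_embedding = np.concatenate([H_dep.mean(axis=0), H_att.mean(axis=0)])
```

A multi-layer variant would repeat the convolution and exchange steps before pooling. The pooled vector is what a downstream classifier would receive.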

(3) Anomaly detection module. This module feeds the feature embedding representation generated above into a multi-layer perceptron to detect microservice anomalies.
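
For intuition, a multi-layer perceptron head on top of a trace embedding might be sketched as follows. The layer sizes, the sigmoid output, the 0.5 threshold, and the random untrained weights are all assumptions for illustration. In practice the weights would be learned from labeled traces.

```python
import numpy as np


def mlp_anomaly_score(x, W1, b1, W2, b2):
    """Two-layer perceptron: ReLU hidden layer, sigmoid output in (0, 1)."""
    hidden = np.maximum(x @ W1 + b1, 0.0)
    logit = hidden @ W2 + b2
    return 1.0 / (1.0 + np.exp(-logit))


rng = np.random.default_rng(0)
embed_dim, hidden_dim = 16, 32
W1 = rng.normal(scale=0.1, size=(embed_dim, hidden_dim))
b1 = np.zeros(hidden_dim)
W2 = rng.normal(scale=0.1, size=hidden_dim)
b2 = 0.0

trace_embeddings = rng.normal(size=(5, embed_dim))   # placeholder batch of 5 traces
scores = mlp_anomaly_score(trace_embeddings, W1, b1, W2, b2)
is_anomalous = scores > 0.5                          # threshold is an assumption
```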

Experiment.

This section shows the experimental results of comparing our proposed BSDG with other methods.

Table 2. Overall performance results on TTFI. P: Precision, R: Recall, F1: F1-score. The best P, R and F1 scores are highlighted in bold.
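
For readers unfamiliar with the metrics in the caption, the sketch below shows how precision, recall, and F1-score are computed from binary labels, with 1 denoting an anomalous trace. The labels are made-up toy values, not results from the paper.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = anomalous)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Here three anomalies are detected correctly, one normal trace is flagged by mistake, and one anomaly is missed, so all three metrics come out at 0.75.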

Summary and outlook.

This paper proposes BSDG, a microservice trace anomaly detection method based on dualGCN. A BiLSTM generates the node attribute representations, and a trace attribute dependency graph and a self-attention mapping graph are constructed to model microservice response time and call paths in a unified way. Then, through propagation in the multi-layer dualGCN and information fusion based on mutual attention, an effective feature embedding representation of the microservice trace is generated, and anomalies are detected with a multi-layer perceptron. In future work, we will study root cause localization for microservice faults on top of anomaly detection. By combining microservice invocation information with anomaly detection and fault localization, we aim to realize Artificial Intelligence for Operations (AIOps) for microservices: anomalies are detected in a timely manner, and the root cause of a fault is located according to the propagation path and distribution of the anomalies.

Video link: https://vimeo.com/773055661

Authors

Kuanzhi Shi was born in 2000. He received the bachelor’s degree in engineering from the Nanjing University of Information Science and Technology in 2021. He is currently pursuing the M.S. degree with the Nanjing University of Aeronautics and Astronautics. His research interests include deep learning, data mining, and anomaly detection.

 

Jing Li was born in 1976. She received the B.S. and M.S. degrees from the School of Computer Science and Engineering, Changchun University of Technology, in 1998 and 2001, respectively, and the Ph.D. degree in computer science and technology from Nanjing University, China, in 2004. She is currently an Associate Professor with the Nanjing University of Aeronautics and Astronautics, China.

Yuecan Liu is affiliated with STATE GRID INFORMATION & TELECOMMUNICATION BRANCH in Beijing, China. His research interests include cloud architecture design and digitization of information systems.

Yuzhu Chang is affiliated with STATE GRID INFORMATION & TELECOMMUNICATION BRANCH in Beijing, China. His research interests include cloud computing and cloud architecture design on information systems.

Xuyang Li is affiliated with STATE GRID INFORMATION & TELECOMMUNICATION BRANCH in Beijing, China. His research interests include enterprise data analysis and business management.
