Anomaly Detection Engine for Linux Logs (ADE)
Overview
ADE detects anomalous time slices and messages in Linux logs (either RFC3164 or RFC5424 format) using statistical learning.
To detect anomalous behavior, ADE processes the Linux logs to create a model of expected behavior and compares that model with the behavior observed in the time periods of interest. It does not require that either messages or time slices be labelled. ADE uses unsupervised statistical learning algorithms, which depend on the behavior of enterprise IT solutions running on Linux being stable and predictable.
The ADE analysis results are written to files in XML format, which can be viewed using a web browser or used in other processing to support the enterprise Linux IT solution that is generating the logs.
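As an illustration of the two input formats named above, the following minimal sketch tells RFC3164 lines from RFC5424 lines by the shape of their headers. It is not ADE's parser. The function name `syslog_format` and both regular expressions are hypothetical, and the `<PRI>` prefix is treated as optional on the assumption that log files written to disk often omit it.

```python
import re

# Assumption: RFC5424 headers are "<PRI>VERSION ISO-TIMESTAMP HOSTNAME ...".
RFC5424 = re.compile(r"^(?:<\d{1,3}>)?\d{1,2} \d{4}-\d{2}-\d{2}T\S+ \S+ ")

# Assumption: RFC3164 headers are "<PRI>Mmm dd hh:mm:ss HOSTNAME ...",
# where a single-digit day is padded with a space.
RFC3164 = re.compile(r"^(?:<\d{1,3}>)?[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \S+ ")


def syslog_format(line):
    """Return "RFC5424", "RFC3164", or None if neither header shape matches."""
    if RFC5424.match(line):
        return "RFC5424"
    if RFC3164.match(line):
        return "RFC3164"
    return None
```

A line that matches neither pattern returns `None`, so a caller can decide whether to skip it or report it.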
For each time slice (interval), ADE measures how unusual the interval is by:
- Calculating an anomaly score that describes how unusual the time slice is
- Determining the number of similar message strings within an interval
- Determining the number of new message strings within an interval
ADE creates a summary file with this information for all of the time slices (intervals) within a day (period). Here is an example of the summary file for one day.
Example of ADE analysis for a day (period)
For each message, ADE determines whether the message strings or similar strings are unusual by calculating a consolidated anomaly score based on the following questions:
- Are the message strings issued as part of a pattern of message strings?
- Are the message strings occurring when expected?
- Have the message strings occurred more often in an interval than expected?
- Did the message strings occur in a large number of intervals?
For each interval, ADE creates a file that contains this information as a detailed description of the time slice (interval). Here is an example of one of the details provided about an interval.
Example of ADE analysis for a time slice (interval) - finding an anomalous message
The statistical algorithms used by ADE to detect unexpected behavior require that ADE:
- Generates message keys (message ids). Message ids are generated using Levenshtein distance to identify message strings which are similar. Message strings which are similar are assigned the same message key (message id).
- Chunks the continuous logs into time slices. These time slices are days (periods) and time slices within a period (intervals).
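The two preparation steps above can be sketched as follows. This is a minimal illustration, not ADE's implementation: the function names, the greedy first-match grouping, the similarity threshold `max_ratio=0.2`, and the 10-minute interval length are all assumptions chosen for the example.

```python
from collections import defaultdict
from datetime import datetime


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def assign_key(text, templates, max_ratio=0.2):
    """Return the key of the first template similar to text.

    If no template is close enough, text becomes a new template.
    max_ratio is a hypothetical threshold, not a value taken from ADE.
    """
    for key, template in enumerate(templates):
        limit = max_ratio * max(len(text), len(template))
        if levenshtein(text, template) <= limit:
            return key
    templates.append(text)
    return len(templates) - 1


def slice_log(records, interval_minutes=10):
    """Count message keys per (period, interval) for (timestamp, text) records.

    A period is a calendar day; an interval is a fixed-length slice of it.
    """
    templates = []
    slices = defaultdict(lambda: defaultdict(int))
    for ts, text in records:
        interval = (ts.hour * 60 + ts.minute) // interval_minutes
        key = assign_key(text, templates)
        slices[(ts.date(), interval)][key] += 1
    return templates, slices


records = [
    (datetime(2024, 1, 1, 0, 3), "Accepted password for alice from 10.0.0.1"),
    (datetime(2024, 1, 1, 0, 7), "Accepted password for alice from 10.0.0.2"),
    (datetime(2024, 1, 1, 0, 12), "Out of memory: killed process 4242"),
]
templates, slices = slice_log(records)
```

In this example the two login messages differ by one character, so they share a message key, while the out-of-memory message gets a key of its own and falls into the next interval.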
Content
The ADE repository is made available under the GPLv3 license.
The ADE repository contains:
- The code to detect anomalies
- The Maven control statements needed to build the class files and download the required packages
- Installation test
- Example data
The ADE repository and Maven control statements do not provide the JDBC-compliant database that is needed to run ADE. The ADE code delivery has been tested using Apache Derby.
How to participate
ADE is a project supported by the Open MainFrame Project. Contributing to ADE requires a Corporate Contributor License Agreement. To apply for one, use the following link: Open MainFrame Project ADE project signup.
To report problems, or to check the status of reported problems, use GitHub issues for the ADE repository.
Additional details on
Installing ADE
Tailoring ADE to your environment
Running ADE
How to run ADE in your environment
Verifying that the amount of data is sufficient
Generating a model of expected behavior
Analyzing the Linux Log to check for unusual behavior
Results
Using output from ADE to answer questions about the behavior of Linux systems
How the ADE output is organized - Directory Structure
Detailed description of the content of the period summary file, index.xml
Detailed description of the content of an interval file, interval_nnn.xml
How ADE detects unusual behavior of Linux systems
Examples
Example of ADE analysis for a day
Example of ADE analysis for a time slice
Example of ADE analysis for another time slice