Anomaly Detection Engine for Linux Logs (ADE)

Generating a model of 'normal' behavior - train

train is a bash script that sets up the required environment and then invokes the appropriate Java class. It creates a baseline of expected behavior against which analyze compares the Linux logs it processes.

Usage

Use train to create a model (baseline) that analyze uses to detect anomalies in Linux logs. train extracts the information it needs from the JDBC-compliant database, which is populated by upload and analyze. When train completes, it writes the results to the file system, where analyze reads them.

Notes

train uses all the information in the database to create a model unless either a start date, or both a start date and an end date, is provided.

"not seen in model" is based on the information processed during train.

Command syntax

Command	Options selected
train model-group	Processes all systems in the model group, from the first date with data in the database through the last date with data in the database.
train all	Processes all systems in the database, from the first date with data in the database through the last date with data in the database.
train model-group start-date	Processes all systems in the model group, using data between the specified start date and the last date with data in the database.
train model-group start-date end-date	Processes all systems in the model group, using data between the specified start and end dates.
train all start-date	Processes all systems, using data between the specified start date and the last date with data in the database.
train all start-date end-date	Processes all systems, using data between the specified start and end dates.
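
The argument forms in the table can be sketched as a small shell helper that assembles a train command line. This is an illustration only: build_train_command is a hypothetical function, not part of ADE. It only prints the command it would run, and it passes dates through unchanged because the accepted date format is not specified here.

```shell
# Hypothetical helper (not part of ADE): builds a train command line
# from the argument forms listed in the table above.
build_train_command() {
  # $1: a model group name, or the literal "all"
  # $2: optional start date
  # $3: optional end date (only meaningful together with a start date)
  if [ "$#" -lt 1 ] || [ "$#" -gt 3 ]; then
    echo "usage: build_train_command model-group|all [start-date [end-date]]" >&2
    return 1
  fi
  # Dates are not validated; the expected format is an open assumption.
  echo "train $*"
}
```

For example, build_train_command all prints train all, which corresponds to training on every system over the full date range in the database. Supplying one or two further arguments narrows the range to the given start date, or to the given start and end dates.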

Restrictions

The quality of the model depends on the amount of information available to create it. analyze marks certain analysis results as questionable using information created by train. This happens when train detects that the model will have problems differentiating between good and bad intervals. If verify indicates that there is insufficient data, train may not run successfully and is unlikely to produce a model with much explanatory power.
