Anomaly Detection Engine for Linux Logs (ADE)
Generating a model of 'normal' behavior - train
train is a bash script that sets up the required environment and then invokes the appropriate Java class. It creates a baseline of expected behavior against which analyze compares the Linux logs it processes.
Usage
Use train to create a model (baseline) that analyze uses to detect anomalies in Linux logs. train extracts the information it needs from the JDBC-compliant database populated by upload and analyze. When train completes, it writes the model to the file system, where analyze later reads it.
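The data flow described above (read from a database, build a baseline, persist it to the file system, read it back for analysis) can be illustrated with a toy Python sketch. This is not how ADE implements train: SQLite stands in for the JDBC-compliant database, and the `messages` table, the per-message counts, the JSON model file, and the `train_baseline`/`load_baseline` functions are all hypothetical.

```python
import json
import sqlite3
import tempfile
from collections import Counter
from pathlib import Path

# Hypothetical schema standing in for the database populated by upload and analyze.
SCHEMA = "CREATE TABLE messages (system TEXT, msg_id TEXT, day TEXT)"


def train_baseline(conn: sqlite3.Connection, model_path: Path) -> dict:
    """Toy 'train' step: read from the database and write a baseline file."""
    counts = Counter(msg_id for (msg_id,) in conn.execute("SELECT msg_id FROM messages"))
    model = {"message_counts": dict(counts)}
    model_path.write_text(json.dumps(model))  # persist the model to the file system
    return model


def load_baseline(model_path: Path) -> dict:
    """Toy 'analyze' side: read the baseline back from the file system."""
    return json.loads(model_path.read_text())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?)",
        [("sys1", "kernel-oom", "2016-01-01"),
         ("sys1", "sshd-login", "2016-01-01"),
         ("sys2", "sshd-login", "2016-01-02")],
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "model.json"
        train_baseline(conn, path)
        print(load_baseline(path))
```

Keeping the model in a file, rather than in the database, mirrors the split the documentation describes: train writes the model once, and analyze can reuse it on every run.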
Notes
- train uses all of the data in the database to create a model unless a start date, or both a start date and an end date, is provided.
- "Not seen in model" is determined by the information that train processed, so it reflects only the systems and date range used to build the model.
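The two notes above interact: if train is restricted to a date range, messages that appear only outside that range are unknown to the model. A minimal Python sketch of that idea follows, assuming a "not seen in model" check is a simple set difference over message IDs. The function names, the `(message_id, day)` record shape, and the inclusive date bounds are assumptions for illustration, not ADE's implementation.

```python
from datetime import date
from typing import Iterable, List, Optional, Set, Tuple


def build_known_messages(
    records: Iterable[Tuple[str, date]],
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> Set[str]:
    """Collect the message IDs that occur in the training window.

    records: (message_id, day) pairs. start/end are treated as inclusive
    bounds here, which is an assumption; None means "no bound".
    """
    known: Set[str] = set()
    for msg_id, day in records:
        if start is not None and day < start:
            continue  # before the training window
        if end is not None and day > end:
            continue  # after the training window
        known.add(msg_id)
    return known


def not_seen_in_model(message_ids: Iterable[str], known: Set[str]) -> List[str]:
    """Return the message IDs the model never saw, sorted for stable output."""
    return sorted(set(message_ids) - known)


if __name__ == "__main__":
    training = [("m1", date(2016, 1, 1)), ("m2", date(2016, 1, 10))]
    known = build_known_messages(training, end=date(2016, 1, 5))
    print(not_seen_in_model(["m1", "m2"], known))  # ['m2']
```

Here `m2` exists in the database but falls after the chosen end date, so it is reported as not seen in model.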
Command syntax
Command | Data processed |
---|---|
train model-group | All systems in the model group, from the first date with data in the database through the last date with data in the database |
train all | All systems in the database, from the first date with data in the database through the last date with data in the database |
train model-group start-date | All systems in the model group, using data between the specified start date and the last date with data in the database |
train model-group start-date end-date | All systems in the model group, using data between the specified start and end dates |
train all start-date | All systems in the database, using data between the specified start date and the last date with data in the database |
train all start-date end-date | All systems in the database, using data between the specified start and end dates |
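The argument forms in the table above reduce to one rule: the first argument selects the scope (a model group or `all`), a missing start date defaults to the first date with data, and a missing end date defaults to the last date with data. The Python sketch below shows that resolution step. It is not part of ADE; the ISO `YYYY-MM-DD` date format, the `TrainRequest` type, and `resolve_train_args` are hypothetical.

```python
from datetime import date
from typing import NamedTuple, Sequence


class TrainRequest(NamedTuple):
    scope: str   # "all" or the name of a model group
    start: date
    end: date


def resolve_train_args(args: Sequence[str], first_day: date, last_day: date) -> TrainRequest:
    """Turn train-style arguments into a concrete scope and date range.

    first_day/last_day are the first and last dates with data in the database.
    ISO dates (YYYY-MM-DD) are assumed for illustration only.
    """
    if not 1 <= len(args) <= 3:
        raise ValueError("usage: train {model-group|all} [start-date [end-date]]")
    scope = args[0]
    # Missing dates fall back to the bounds of the data in the database.
    start = date.fromisoformat(args[1]) if len(args) >= 2 else first_day
    end = date.fromisoformat(args[2]) if len(args) == 3 else last_day
    if start > end:
        raise ValueError("start date is after end date")
    return TrainRequest(scope, start, end)


if __name__ == "__main__":
    first, last = date(2016, 1, 1), date(2016, 3, 31)
    print(resolve_train_args(["all"], first, last))
    print(resolve_train_args(["my-group", "2016-02-01"], first, last))
```

Resolving the defaults up front means the rest of a training run only ever sees an explicit start and end date, whichever of the six command forms was used.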
Restrictions
The quality of the model depends on the amount of data available to create it. Using information created by train, analyze marks certain analysis results as questionable; this happens when train detects that the model will have difficulty distinguishing between good and bad intervals. If verify reported insufficient data, train may not run successfully, and any model it does produce is likely to have little explanatory power.
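One simple way to picture the link between training data and questionable results is a flag that train stores in the model and analyze copies onto each result. The Python sketch below uses a plain minimum interval count as that flag. ADE's actual criteria are not described here, so `MIN_TRAINING_INTERVALS`, `Model`, `Result`, and `label_results` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical threshold, for illustration only; not ADE's real criterion.
MIN_TRAINING_INTERVALS = 100


@dataclass
class Model:
    interval_count: int  # number of intervals train had available

    @property
    def is_weak(self) -> bool:
        """True when the training data is assumed too thin to separate good and bad intervals."""
        return self.interval_count < MIN_TRAINING_INTERVALS


@dataclass
class Result:
    interval_id: int
    score: float
    questionable: bool


def label_results(model: Model, scores: Dict[int, float]) -> List[Result]:
    """Attach the model's weakness flag to every analysis result, ordered by interval."""
    return [Result(i, s, model.is_weak) for i, s in sorted(scores.items())]


if __name__ == "__main__":
    for r in label_results(Model(interval_count=12), {2: 0.8, 1: 0.1}):
        print(r)
```

A real check would more likely look at how well the model separates intervals rather than a raw count, but the flow is the same: train records the weakness, and analyze reports it alongside the results.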