Automated test and evaluation¶
IMPORTANT: this page describes the ``gold-miner-tande`` tool, which is part of the ``gold-miner-ui`` python package. That package is distributed separately from the ``gold-miner`` python package.
The gold-miner-tande tool is designed to take a series of labeled
PCAP files listed in a YAML configuration file and:
take one listed set of PCAP files as a training set and use it to create a gold-miner training profile
take a second list of labeled files to evaluate the effectiveness of traffic detection using the profile and the gold-miner tool
output an HTML (and markdown) report that shows the results of these training and testing phases
Note: the ``gold-miner-tande`` tool requires ``pandoc`` to be installed on the system.
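Since the report step depends on ``pandoc``, it can be worth checking for it before starting a long run. The following is a minimal sketch of such a check using only the Python standard library; the helper name pandoc_available is hypothetical and is not part of gold-miner-tande.

```python
import shutil


def pandoc_available() -> bool:
    """Return True if a `pandoc` executable is found on the PATH."""
    # shutil.which returns the full path of the executable, or None.
    return shutil.which("pandoc") is not None


if __name__ == "__main__":
    if pandoc_available():
        print("pandoc found")
    else:
        print("pandoc not found; install it before running gold-miner-tande")
```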
Example output: see the Example section below, and the example report page.
YAML configuration overview¶
The gold-miner-tande tool is driven by a YAML configuration file
that is divided into three parts:
a general configuration section
a train section
a test section
The YAML configuration structure looks like the following example block. Each configuration section is further discussed below.
---
# number of (hyper)cores to use
processes: 32
# where to store the output results/images
output_directory: where-output-should-go
# a temporary directory to store filtered pcaps in
tmp_dir: directory-to-store-temporary-files
# details about the pcaps to use for training
train:
  files:
    # ...
# details about the pcaps to use for testing
test:
  files:
    # ...
Training¶
The train section should contain a list of pcap files to use for training, along with their appropriate labels. These files must not contain multiple types of data (i.e., multiple labels).
Note that both the training and testing section entries can take a
filter token that will apply a standard PCAP filter for selecting or
removing certain elements from supplied packet traces. Do note that
these filter sections will create secondary (filtered) PCAPs inside
the temporary directory that is specified by the tmp_dir directive.
Training produces a training-profile.fsdb file in the output directory.
train:
  files:
    - file: /path/to/something.pcap
      label: something
    - file: /path/to/other.pcap
      label: other
    - file: /path/to/toomuch.pcap
      label: filtered
      filter: not icmp and not arp
Testing¶
The test section works similarly to the train section. It
contains a list of PCAP files and their associated labels, which
are used to determine whether the gold-miner classifier predicts
the label of unknown traffic correctly or incorrectly.
test:
  files:
    - file: /path/to/something2.pcap
      label: something
    - file: /path/to/other2.pcap
      label: other
    - file: /path/to/toomuch2.pcap
      label: filtered
      filter: not icmp and not arp
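Because test labels are compared against what the classifier learned, a test label that never appears in the train section presumably cannot be predicted correctly. The sketch below checks for that on a configuration that has already been loaded into a Python dictionary (for example with a YAML parser). The function name unseen_test_labels is hypothetical; this is an illustrative sanity check, not something gold-miner-tande is documented to perform.

```python
def unseen_test_labels(config: dict) -> set:
    """Return labels used in the test section that never appear in train."""
    train_files = config.get("train", {}).get("files", [])
    test_files = config.get("test", {}).get("files", [])
    train_labels = {entry["label"] for entry in train_files}
    test_labels = {entry["label"] for entry in test_files}
    return test_labels - train_labels


# Dictionary form of the train and test examples shown above.
config = {
    "train": {"files": [
        {"file": "/path/to/something.pcap", "label": "something"},
        {"file": "/path/to/other.pcap", "label": "other"},
        {"file": "/path/to/toomuch.pcap", "label": "filtered",
         "filter": "not icmp and not arp"},
    ]},
    "test": {"files": [
        {"file": "/path/to/something2.pcap", "label": "something"},
        {"file": "/path/to/other2.pcap", "label": "other"},
        {"file": "/path/to/toomuch2.pcap", "label": "filtered",
         "filter": "not icmp and not arp"},
    ]},
}

print(unseen_test_labels(config))  # set()
```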
Optional attributes¶
The YAML configuration tokens can include additional directives that
affect the processing of the gold-miner-tande run. Be sure to read
the Inheritance and Overrides section below, which discusses how to
apply these at different levels of the configuration hierarchy.
packet_count N¶
This will limit the number of packets to read from a train and/or test file.
packet_count: 10000
skip_packets N¶
This will cause gold-miner-tande to skip the first N packets of
the pcap before processing it for the given section (train or test).
skip_packets: 10000
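The combined effect of the two directives can be pictured as a window over the packet stream. The sketch below assumes that skipping is applied first and that packet_count then limits how many of the remaining packets are read; that ordering is an assumption consistent with the examples on this page, not a description of the tool's internals. The function select_packets is hypothetical, and plain integers stand in for packets.

```python
from itertools import islice


def select_packets(packets, skip_packets=0, packet_count=None):
    """Skip the first `skip_packets` items, then yield at most `packet_count`."""
    # With no packet_count, read everything after the skipped prefix.
    stop = None if packet_count is None else skip_packets + packet_count
    return islice(packets, skip_packets, stop)


# Integers stand in for packets: skip 10, then read 5.
selected = list(select_packets(range(100), skip_packets=10, packet_count=5))
print(selected)  # [10, 11, 12, 13, 14]
```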
Inheritance and Overrides¶
A number of the directives that affect processing of individual
entries can be placed at the top level in the YAML file, underneath
just the test or train sections, or next to each file
itself. Lower level directives will override upper level directives.
As an example, consider the case where you want to read 10000 packets
from every pcap in the train section and 20000 from every pcap in the
test section, except for one file in particular that isn't that long.
For the test files, you want to read from the same files used for
training, but skip the packets that were already used for training.
The resulting YAML might look like:
# by default, read only 20,000 packets
packet_count: 20000
train:
  # for training, read 10,000 though
  packet_count: 10000
  files:
    - file: one.pcap
      label: one
    - file: two.pcap
      label: two
      # over-ride the 10,000 packet count to just 500
      packet_count: 500
test:
  # for testing, skip the first 10,000
  # (and evaluate the next 20,000)
  skip_packets: 10000
  files:
    - file: one.pcap
      label: one
    - file: two.pcap
      label: two
      # over-ride the packet and skip counts to just 500 each
      packet_count: 500
      skip_packets: 500
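The override rule can be sketched as a lookup that walks from the top level, through the section, down to the file entry, letting each lower level replace what was set above it. The code below is an illustration of that rule applied to the example configuration, written against a plain dictionary. The function effective_settings is hypothetical, and it handles only packet_count and skip_packets; whether other directives are inherited the same way is not stated here.

```python
INHERITED_KEYS = ("packet_count", "skip_packets")


def effective_settings(config: dict, section: str, index: int) -> dict:
    """Resolve directives for one file entry; lower levels override upper ones."""
    section_cfg = config[section]
    entry = section_cfg["files"][index]
    settings = {}
    for key in INHERITED_KEYS:
        # Visit levels from most general to most specific; the last hit wins.
        for level in (config, section_cfg, entry):
            if key in level:
                settings[key] = level[key]
    return settings


# Dictionary form of the YAML example above.
config = {
    "packet_count": 20000,
    "train": {
        "packet_count": 10000,
        "files": [
            {"file": "one.pcap", "label": "one"},
            {"file": "two.pcap", "label": "two", "packet_count": 500},
        ],
    },
    "test": {
        "skip_packets": 10000,
        "files": [
            {"file": "one.pcap", "label": "one"},
            {"file": "two.pcap", "label": "two",
             "packet_count": 500, "skip_packets": 500},
        ],
    },
}

print(effective_settings(config, "train", 0))  # {'packet_count': 10000}
print(effective_settings(config, "train", 1))  # {'packet_count': 500}
print(effective_settings(config, "test", 0))   # {'packet_count': 20000, 'skip_packets': 10000}
print(effective_settings(config, "test", 1))   # {'packet_count': 500, 'skip_packets': 500}
```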
Algorithm selection¶
The gold-miner suite supports four (sub)algorithms. The algorithm
to use can be specified with a top-level algorithm directive,
set to one of:
comparison
comparison-wide
linear
lms
There is additionally a special algorithm that gold-miner-tande
supports called all, which will run the train/test suite repeatedly,
once for each algorithm, and generate a resulting comparison summary.
See the gold-miner algorithm documentation for further details on selecting the best algorithm for your use case.
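The relationship between the four named algorithms and the special all value can be sketched as a simple expansion. The helper algorithms_to_run below is hypothetical and only mirrors the behavior described in this section; it is not the tool's implementation.

```python
ALGORITHMS = ("comparison", "comparison-wide", "linear", "lms")


def algorithms_to_run(algorithm: str) -> list:
    """Expand the algorithm directive into the list of algorithms to run."""
    if algorithm == "all":
        # One train/test pass per supported algorithm.
        return list(ALGORITHMS)
    if algorithm in ALGORITHMS:
        return [algorithm]
    raise ValueError(f"unknown algorithm: {algorithm!r}")


print(algorithms_to_run("lms"))  # ['lms']
print(algorithms_to_run("all"))  # ['comparison', 'comparison-wide', 'linear', 'lms']
```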
Output¶
The gold-miner-tande tool writes its results as a set of files
in the directory specified by the output_directory token. An
index.html file is built at the top of that directory to allow
easy browsing and understanding of all of the results.
Example¶
An example report shows what the results look like for a simple test case that involved testing client and server traffic from three different traffic types over an IPsec tunnel. The example configuration shows the YAML configuration file given to the gold-miner-tande application.
Command Line Arguments¶
introduction - CLI interface¶
Imports a list of end-to-end tasks to do for training and evaluation of pcaps for the rapid classifier.
introduction [-h] [--log-level LOG_LEVEL] [-c [CONFIG [CONFIG ...]]] [--train] [--all]
test_plan
introduction positional arguments¶
test_plan - Test plan to execute (default: None)
introduction optional arguments¶
--log-level LOG_LEVEL, --ll LOG_LEVEL - Define the logging verbosity level (debug, info, warning, error, fatal, critical). (default: info)
-c CONFIG, --config CONFIG - name/value pairs of configuration to pass to override YAML properties (default: [])
--train - Only conduct the training phase
--all - Conduct all parts (default if no other parts are specified)
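To make the shape of this interface concrete, the following sketch rebuilds an approximately equivalent parser with Python's argparse, based only on the usage text above. It is not the tool's actual source: the program name passed as prog, the help strings, and the plan.yml file name are assumptions, and the exact format of the name/value pairs accepted by -c is not specified on this page.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Approximate the documented command line (illustrative only)."""
    parser = argparse.ArgumentParser(prog="gold-miner-tande")
    parser.add_argument("--log-level", "--ll", default="info",
                        help="logging verbosity level")
    # Zero or more values, matching "[-c [CONFIG [CONFIG ...]]]" in the usage.
    parser.add_argument("-c", "--config", nargs="*", default=[],
                        help="name/value pairs overriding YAML properties")
    parser.add_argument("--train", action="store_true",
                        help="only conduct the training phase")
    parser.add_argument("--all", action="store_true",
                        help="conduct all parts")
    parser.add_argument("test_plan", help="test plan to execute")
    return parser


# plan.yml is a hypothetical test plan file name.
args = build_parser().parse_args(["--train", "--log-level", "debug", "plan.yml"])
print(args.train, args.log_level, args.test_plan)  # True debug plan.yml
```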