Workflow

To use the gold-miner suite to attempt classification of unknown traffic, perform the following steps in order:

  1. Analyze a set of labeled training pcaps containing known, encrypted protocol traffic using gold-miner-trainer.

  2. Combine those individual training results into a single, aggregated training profile using gold-miner-trainer-aggregator.

  3. Use the gold-miner tool with the training profile to analyze unknown traffic in a pcap or on an interface.

The sections below describe this process at a high level; the individual tool pages above provide greater detail about how each tool works.

Note that all of these steps can be automated with the Test and Evaluation Suite, which takes a YAML configuration file, creates a profile, analyzes the data for accuracy, and produces an HTML report (an example report is available to show what the output looks like). The Test and Evaluation Suite may be an easier starting point than running each of these steps by hand.

See also the Additional Tools document, which describes other useful tools distributed with the gold-miner package.

Steps to Classify Unknown Traffic Samples

The process for using the rapid classifier begins with generating individual training profiles, each of which measures a sample of labeled traffic to build a profile of what that type of traffic looks like. The results of these measurements must then be combined into a single profile covering all traffic types. Once these steps are complete, the resulting training profile can be used to analyze an unknown traffic stream to see how well it matches a known profile.

1. Generating individual training profiles

To generate an individual training profile for each type of known, labeled dataset, use the gold-miner-trainer command. It can analyze any number of pcaps and generates a starting statistical dataset to be used in later steps. [Hint: Use an output filename that reflects the type of data being analyzed.] Example usage:

gold-miner-trainer -T -o web_traffic.fsdb web_traffic.pcap
gold-miner-trainer -T -o mail_traffic.fsdb mail_traffic.pcap

In these examples, web_traffic.pcap is analyzed to produce the web_traffic.fsdb training profile, and mail_traffic.pcap is analyzed in the same way for (e)mail traffic.
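When there are many labeled pcaps, the trainer invocations can be generated rather than typed by hand. The following is a minimal sketch, not part of the gold-miner package: the helper name trainer_command and the list labeled_pcaps are hypothetical, and the only options used (-T and -o) are copied from the examples above. It only builds and prints the command lines; running them requires the gold-miner tools to be installed.

```python
from pathlib import Path


def trainer_command(pcap):
    """Build the gold-miner-trainer argument list for one labeled pcap.

    Hypothetical helper: the output name is derived from the pcap name
    (web_traffic.pcap -> web_traffic.fsdb), following the hint about
    using an output filename that reflects the data being analyzed.
    """
    pcap = Path(pcap)
    profile = pcap.with_suffix(".fsdb")
    # -T and -o are taken verbatim from the documented examples.
    return ["gold-miner-trainer", "-T", "-o", str(profile), str(pcap)]


# Hypothetical input: one pcap per known traffic type.
labeled_pcaps = ["web_traffic.pcap", "mail_traffic.pcap"]

commands = [trainer_command(p) for p in labeled_pcaps]
for cmd in commands:
    print(" ".join(cmd))
    # To execute instead of printing (assumes the tool is on PATH):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Building each command as a list of arguments, rather than one string, avoids quoting problems if a pcap path contains spaces.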

For further information see the gold-miner-trainer tool documentation.

2. Combining individual training profiles

Once the individual training profiles have been created, they must be merged before being passed to gold-miner in the next step. To merge them, use gold-miner-trainer-aggregator with label/file pairs to create an aggregated training-profile.fsdb:

gold-miner-trainer-aggregator -o training-profile.fsdb \
web web_traffic.fsdb \
mail mail_traffic.fsdb

Note that the arguments consist of repeated pairs: a short word used as a label (e.g. web) followed by the individual training profile for that label (e.g. web_traffic.fsdb).
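The label/file pairing can be made explicit by building the argument list from a mapping. This is a minimal sketch under stated assumptions: the helper name aggregator_command and the profiles mapping are hypothetical, and the command layout (-o followed by alternating label and profile arguments) is copied from the example above. The command is only printed, not executed.

```python
import shlex


def aggregator_command(output, profiles):
    """Build the gold-miner-trainer-aggregator argument list.

    `profiles` maps each label to its individual training profile.
    Hypothetical helper; the argument order mirrors the documented example.
    """
    cmd = ["gold-miner-trainer-aggregator", "-o", output]
    for label, profile in profiles.items():
        # Each label is immediately followed by its profile file.
        cmd += [label, profile]
    return cmd


# Labels and files from the example above; dicts preserve insertion order.
profiles = {
    "web": "web_traffic.fsdb",
    "mail": "mail_traffic.fsdb",
}

cmd = aggregator_command("training-profile.fsdb", profiles)
print(shlex.join(cmd))
```

Using a mapping keeps labels unique and makes it harder to pass a label without its file, or a file without its label.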

For further information see the gold-miner-trainer-aggregator tool documentation.

3. Analyzing an unknown traffic source

Now that we have a training profile, we can analyze an existing pcap file or watch an interface for traffic of interest. The output will be a stream of FSDB (tab-separated) data representing confidence values. We assume we have a protocol of interest matching one of the profile names in the training file (“mail” in the example below).

gold-miner -r unknown.pcap -p training-profile.fsdb -g mail

This tool, by default, generates tab-separated output that can be easily parsed. A confidence value is reported for each traffic type being detected; these values can be compared against one another to determine what the traffic most likely is.
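As an illustration of how such output might be parsed, the sketch below reads tab-separated FSDB text and picks the row with the highest confidence. Everything about the data here is an assumption: the column names label and confidence, the sample values, and the idea that a higher value means a better match are all hypothetical, so consult the gold-miner tool documentation for the real column layout. The sketch also assumes the standard FSDB header form "#fsdb -F t col1 col2 ..." and treats every other line starting with "#" as a comment.

```python
def read_fsdb(text):
    """Return a list of dicts, one per data row of tab-separated FSDB text."""
    columns = None
    rows = []
    for line in text.splitlines():
        if line.startswith("#fsdb -F t "):
            # Header line: the column names follow "#fsdb -F t".
            columns = line.split()[3:]
        elif line.startswith("#") or not line.strip():
            # Comment lines (such as trailers) and blank lines carry no data.
            continue
        elif columns is not None:
            rows.append(dict(zip(columns, line.split("\t"))))
    return rows


# Hypothetical sample, NOT real gold-miner output.
SAMPLE = (
    "#fsdb -F t label confidence\n"
    "web\t0.12\n"
    "mail\t0.87\n"
    "# hypothetical trailing comment\n"
)

rows = read_fsdb(SAMPLE)
# Assumes a larger confidence value indicates a closer match.
best = max(rows, key=lambda row: float(row["confidence"]))
print(best["label"], best["confidence"])
```

In practice the text would come from the gold-miner command's standard output rather than from a string constant.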

For further information see the gold-miner tool documentation, which also covers the output in greater detail, describes the other output format options, and explains how to select among the available sub-algorithms.