Workflow
================

To use the ``gold-miner`` suite to attempt classification of unknown traffic, perform the following steps in order:

1. Analyze a set of labeled training pcaps containing known, encrypted protocol traffic using gold-miner-trainer_.
2. Combine those individual training results into a single, aggregated *training profile* using gold-miner-trainer-aggregator_.
3. Use the gold-miner_ tool with the training profile to analyze unknown traffic in a pcap file or on a network interface.

.. _gold-miner-trainer: tools/goldminertrainer.html
.. _gold-miner-trainer-aggregator: tools/goldminertraineraggregator.html
.. _gold-miner: tools/goldminer.html

The sections below describe this process at a high level; the individual tool pages linked above describe each tool in greater detail.

Note that all of these steps can be executed automatically by the `Test and Evaluation Suite `_, which takes a YAML configuration file, creates a profile, analyzes the data for accuracy, and produces an HTML report (an `example report`_ shows what the output looks like). The `Test and Evaluation Suite `_ may be an easier starting point than running each of these steps by hand.

.. _example report: tande-example/index.html

Also see the `Additional Tools `__ document, which describes other useful tools distributed with the ``gold-miner`` package.

Steps to Classify Unknown Traffic Samples
--------------------------------------------

Using the rapid classifier first involves *generating individual training profiles*: each one measures a sample of labeled traffic to build a profile of what that type of traffic looks like. The results of these measurements must then be *combined into a single profile of all traffic*. Once both steps are complete, the resulting training profile can be used to analyze an unknown traffic stream and see how closely it matches each known profile.

1. Generating individual training profiles
--------------------------------------------

To generate an individual training profile for each type of known, labeled dataset, use the ``gold-miner-trainer`` command. It can analyze any number of pcaps and generates a starting statistical dataset for use in the later steps. [Hint: use an output filename that reflects the type of data being analyzed.]

Example usage::

    gold-miner-trainer -T -o web_traffic.fsdb web_traffic.pcap
    gold-miner-trainer -T -o mail_traffic.fsdb mail_traffic.pcap

In the first example, the ``web_traffic.pcap`` file is analyzed and a ``web_traffic.fsdb`` training profile is produced; the second does the same for (e)mail traffic. For further information, see the gold-miner-trainer_ tool documentation.

2. Combine individual training profiles together
--------------------------------------------------

Once the individual training profiles have been created, they must be merged before being passed to ``gold-miner`` in step 3. To merge them, run ``gold-miner-trainer-aggregator`` with label/file pairs to create an aggregated ``training-profile.fsdb``::

    gold-miner-trainer-aggregator -o training-profile.fsdb \
        web web_traffic.fsdb \
        mail mail_traffic.fsdb

Note that the arguments to the script are a repeated sequence of pairs, each consisting of a generic label (e.g. ``web``) followed by the individual training profile for that label (e.g. ``web_traffic.fsdb``). For further information, see the gold-miner-trainer-aggregator_ tool documentation.

3. Analyzing an unknown traffic source
----------------------------------------

Now that we have a trained profile, we can analyze an existing pcap file or watch an interface for traffic of interest. The output is FSDB-formatted (tab-separated) data representing confidence values. We assume the protocol of interest matches one of the labels in the training profile (``mail`` in the example below).

::

    gold-miner -r unknown.pcap -p training-profile.fsdb -g mail

By default, this tool generates tab-separated output that is easy to parse. A confidence value is given for each traffic type being detected; these values can be compared against one another to determine what the traffic most likely is. For further information, see the gold-miner_ tool documentation, which describes the output in greater detail, covers the other output format options, and explains how to select among the available sub-algorithms.
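
Because the default output is plain FSDB text, comparing confidence values can be scripted. The Python sketch below is not part of the ``gold-miner`` package and makes several assumptions: the input starts with an ``#fsdb`` header line listing the column names, data rows are tab-separated, ``#`` lines are comments, and each row carries a label column and a numeric confidence column. The column names are placeholders supplied by the caller (the real names are documented on the gold-miner_ tool page), and averaging confidence per label is just one illustrative way to compare traffic types.

.. code-block:: python

    from collections import defaultdict
    from typing import Dict, Iterable, List, Optional


    def parse_fsdb(lines: Iterable[str]) -> List[Dict[str, str]]:
        """Parse tab-separated FSDB text into a list of {column: value} rows."""
        columns: Optional[List[str]] = None
        rows: List[Dict[str, str]] = []
        for raw in lines:
            line = raw.rstrip("\r\n")
            if columns is None:
                # The first line is the header, e.g. "#fsdb -F t label confidence".
                tokens = line.split()
                if not tokens or tokens[0] != "#fsdb":
                    raise ValueError("input does not start with an #fsdb header")
                columns = []
                i = 1
                while i < len(tokens):
                    if tokens[i].startswith("-"):
                        i += 2  # skip a header option and its argument ("-F t")
                    else:
                        # Drop an optional ":type" suffix such as "confidence:d".
                        columns.append(tokens[i].split(":", 1)[0])
                        i += 1
                continue
            if not line or line.startswith("#"):
                continue  # blank lines and trailing comments carry no data
            rows.append(dict(zip(columns, line.split("\t"))))
        return rows


    def mean_confidence_by_label(
        rows: List[Dict[str, str]], label_col: str, confidence_col: str
    ) -> Dict[str, float]:
        """Average the confidence column separately for each label."""
        totals: Dict[str, float] = defaultdict(float)
        counts: Dict[str, int] = defaultdict(int)
        for row in rows:
            value = row.get(confidence_col, "-")
            if value == "-":
                continue  # FSDB uses "-" for an empty field
            totals[row[label_col]] += float(value)
            counts[row[label_col]] += 1
        return {label: totals[label] / counts[label] for label in totals}


    def most_likely_label(
        rows: List[Dict[str, str]], label_col: str, confidence_col: str
    ) -> Optional[str]:
        """Return the label with the highest mean confidence, or None if no data."""
        means = mean_confidence_by_label(rows, label_col, confidence_col)
        if not means:
            return None
        return max(means, key=means.get)

For instance, a small wrapper script could pipe the output of ``gold-miner -r unknown.pcap -p training-profile.fsdb -g mail`` into ``most_likely_label(parse_fsdb(sys.stdin), "label", "confidence")``, where ``label`` and ``confidence`` are hypothetical column names to be replaced with the ones the tool actually emits.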