
The three payload features in this example were converted to bigram and trigram features, as shown in Table 1 and Table 2 respectively.
Each table’s header represents the standard feature vector for all the payload features in the example. The payload features appear in the table in the order of their presentation to the feature extraction algorithm: the first row corresponds to the first payload feature discovered during dictionary generation, the second row to the second payload feature discovered, and so on.
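The conversion described above can be sketched as follows. This is a minimal illustration of character n-gram counting, not the authors' exact extraction code; the function name and the sample payload string are hypothetical.

```python
from collections import Counter

def ngram_features(payload: str, n: int = 2) -> Counter:
    """Count overlapping character n-grams in a payload string.

    With n=2 this yields bigram counts, with n=3 trigram counts; the
    union of n-grams seen during dictionary generation defines the
    columns of the feature vector.
    """
    return Counter(payload[i:i + n] for i in range(len(payload) - n + 1))

# Hypothetical payload value for illustration only.
bigrams = ngram_features("GET /index", n=2)
trigrams = ngram_features("GET /index", n=3)
```

Each payload feature thus becomes a row of counts over the shared n-gram dictionary, which is the layout shown in the tables.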

After this preprocessing, a pre-ranking step was conducted to prepare the resulting data set for feature selection, using a quick filter method: the well-known Correlation-based Feature Selection (CFS). The reason for this ranking is that the reduced feature set, while still informative, also contained distracting features. Ranking the features with a very fast method such as CFS makes it easy to perform controlled experiments by manipulating relevant and irrelevant features.
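The spirit of this pre-ranking can be illustrated with a simplified univariate stand-in: ranking each feature by the absolute Pearson correlation between the feature and the class label. Full CFS also penalizes inter-feature redundancy, so this sketch covers only the feature-class term of the CFS merit; the function name is hypothetical.

```python
import numpy as np

def rank_by_correlation(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Rank features by |Pearson correlation| with the class label.

    A simplified stand-in for the feature-class component of CFS merit;
    it ignores the feature-feature redundancy term of true CFS.
    """
    Xc = X - X.mean(axis=0)          # center each feature column
    yc = y - y.mean()                # center the labels
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    corr = np.abs(Xc.T @ yc) / np.where(denom == 0, 1.0, denom)
    return np.argsort(-corr)         # feature indices, best first
```

Constant (zero-variance) features get a correlation of zero and therefore sink to the bottom of the ranking.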

Because the current number of features is very large and feature selection is a time-consuming activity, a small subset of 250 of the original features was taken for the experiments. From the ranked list obtained with the CFS method, a smaller subset was extracted in such a way that it consists of one portion of the top-ranked features and nine portions of the lowest-ranked features.

The portion size in this work is taken as 25 features, so the subset contains 250 features in total. This way of selection is intentional: it tests the efficacy of the proposed system when 90% of the features are bad features compared to the good ones. The small subset is taken as a sample from the huge set of total features to save time and computational effort.
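The subset construction described above (one top portion plus nine bottom portions of a ranked list) can be sketched directly; the function name and default portion sizes are illustrative, matching the 25 + 9 × 25 = 250 setup.

```python
def build_test_subset(ranked, portion=25, portions_low=9):
    """Take one portion of top-ranked features plus several portions of
    the lowest-ranked ones, yielding a deliberately noisy subset
    (25 good + 225 bad = 250 features in this work's setup)."""
    top = ranked[:portion]                    # best-ranked features
    low = ranked[-portion * portions_low:]    # worst-ranked features
    return top + low
```

Applied to the full CFS ranking, this yields exactly the 250-feature subset used in the experiments, with 90% of its members drawn from the bottom of the list.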

The resulting 250 features were used to generate data sets of 20, 40, 100, and 400 examples respectively. To simulate “zero-day” attacks, the data sets were chosen to be small in terms of number of examples. We generated them as balanced data sets (i.e. with equal numbers of normal and attack examples), and used different numbers of examples to monitor the behavior of feature selection at each data-set size.
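Drawing such balanced data sets can be sketched as follows; the function name and the use of a fixed seed are assumptions for reproducibility, not details from the original setup.

```python
import random

def balanced_sample(normal, attack, size, seed=0):
    """Draw a data set of `size` examples split equally between the
    normal and attack pools (size is assumed even)."""
    rng = random.Random(seed)        # fixed seed for reproducibility
    half = size // 2
    return rng.sample(normal, half) + rng.sample(attack, half)

# Usage sketch: one balanced data set per target size.
# datasets = {n: balanced_sample(normal_pool, attack_pool, n)
#             for n in (20, 40, 100, 400)}
```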

To observe the effect of including the payload features on detection accuracy, an important experiment was conducted. The well-known SVM classifier was used to measure accuracy and F-measure on the ISCX 2012 data sets in two cases: first without the payload (bigram/trigram) features, and second with them. The performance metrics were measured before and after converting the payload features to bigram and trigram features and applying feature selection.
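The two metrics compared in this experiment can be computed from a classifier's predictions as follows. This is a generic sketch of binary accuracy and F-measure (F1), not the authors' evaluation code; the function name is hypothetical.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Compute accuracy and F-measure (F1) for binary attack/normal
    labels, treating `positive` as the attack class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0   # precision
    rec = tp / (tp + fn) if tp + fn else 0.0    # recall
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, f1
```

Running this once on predictions from the feature set without payload features, and once with them, yields the before/after comparison reported in the experiment.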


The main objective of this experiment is to show that the payload features carry important and useful information for improving detection accuracy. Because of the difficulty of handling long payload features, many previous researchers excluded payload features from the original feature set before searching for intrusions. The feature selection method was applied to the four generated ISCX 2012 data sets, and the maximum obtained accuracy and F-measure are presented in Table 3 and Table 4 respectively.