Network Traffic Analysis: Anomaly Detection and Some Implications of Neutrality

Open Access
Kocak, Fatih
Graduate Program:
Electrical Engineering
Doctor of Philosophy
Document Type:
Date of Defense:
February 14, 2014
Committee Members:
  • David Jonathan Miller, Dissertation Advisor
  • George Kesidis, Dissertation Advisor
  • Kenneth Jenkins, Committee Member
  • John F Doherty, Committee Member
  • Anna Cinzia Squicciarini, Committee Member
Keywords:
  • machine learning
  • anomaly detection
  • p-value
  • clustering
  • feature selection
  • network neutrality
  • game theory
  • caching
  • networks
This thesis makes contributions to two distinct but related topic areas, namely anomaly detection and network neutrality. In the first part, we focus on detecting samples from anomalous latent classes buried within a collected batch of known (normal) class samples, where each sample has a large number of features. We assume, and observe to be true, that careful feature selection within unsupervised anomaly detection may be needed to achieve the most accurate results (depending on the particular feature representation in use). We form pairwise feature tests based on Gaussian mixture models, with one test for every pair of features. The mixtures are estimated using known class samples (the null training set). Using these mixture models, p-values are obtained for the test batch samples under the null hypothesis. We use these p-values in two different ways. In our first approach, we consider sample-by-sample detection of anomalous class samples amongst the batch of collected samples. We propose a novel sample-wise sequential anomaly detection procedure with a growing number of tests. New tests are included only when they are needed, i.e., when their use on currently undetected samples will yield greater aggregate statistical significance of multiple-testing-corrected detections than is obtainable using the existing test set. This approach aims to maximize the aggregate statistical significance of all detections made up to a finite horizon. In our second approach, we treat this anomaly detection problem as a clustering problem. We calculate approximate joint p-values for candidate anomalous clusters, defined by (sample subset, test subset) pairs. This approach sequentially detects the most significant clusters of samples in a networking context. We use different kinds of feature representations and conditioning contexts and experiment on many datasets to evaluate performance comprehensively.
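As a rough illustration of a single pairwise mixture test, the sketch below scores one two-dimensional feature pair against a bivariate Gaussian mixture whose parameters are assumed to have already been estimated on the null training set. It uses a mixture-weighted tail probability, the sum over components of w_m * exp(-r_m^2 / 2), where r_m is the Mahalanobis distance to component m; for a single bivariate Gaussian, exp(-r^2 / 2) is exactly the chi-square (2 degrees of freedom) tail probability of the squared Mahalanobis distance. The helper names, the mixture parameters, this particular statistic, and the Bonferroni rule used to flag a sample across tests are all hypothetical choices for illustration; the thesis's sequential, growing-test-set criterion is not reproduced here.

```python
import numpy as np

def pair_pvalue(x, weights, means, covs):
    """Mixture-weighted tail probability for one 2-D feature pair (hypothetical helper).

    For a bivariate Gaussian, P(R^2 > r^2) = exp(-r^2 / 2), where R^2 is the
    squared Mahalanobis distance (chi-square with 2 degrees of freedom).
    """
    x = np.asarray(x, dtype=float)
    p = 0.0
    for w, mu, cov in zip(weights, means, covs):
        diff = x - mu
        r2 = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
        p += w * np.exp(-0.5 * r2)
    return p

def bonferroni_flag(pvalues, alpha=0.05):
    """Flag a sample if its smallest p-value survives a Bonferroni correction
    over all of its pairwise tests (a simplification used only for illustration)."""
    pvalues = np.asarray(pvalues, dtype=float)
    return bool(pvalues.min() * pvalues.size < alpha)

# Hypothetical null mixture for one feature pair: two unit-covariance components.
weights = [0.6, 0.4]
means = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
covs = [np.eye(2), np.eye(2)]

normal_p = pair_pvalue([0.1, -0.2], weights, means, covs)
outlier_p = pair_pvalue([10.0, -6.0], weights, means, covs)
print(f"normal: {normal_p:.3f}, outlier: {outlier_p:.2e}")

# Each sample would get one p-value per feature pair; here two other tests are made up.
print(bonferroni_flag([outlier_p, 0.5, 0.7]))  # True
print(bonferroni_flag([normal_p, 0.5, 0.7]))   # False
```

Bonferroni is used only because it is the simplest multiple-testing correction to show; any correction that controls error across the full set of pairwise tests could take its place.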
Our p-value clustering algorithm is compared, using ROC curves, with alternative p-value-based methods, our sample-by-sample sequential detection, and the one-class SVM. All the competing methods make sample-wise detections, i.e., they do not jointly detect anomalous clusters. The anomalous class was either an HTTP bot (Zeus) or peer-to-peer (P2P) traffic. For certain feature representations, our p-value clustering approach gives promising results for detecting Zeus bot and P2P traffic amongst Web traffic. In the second part, we analyze some issues related to network neutrality. We investigate the relations between caching, pricing, and the revenues of entities in light of network neutrality concerns. First, we consider a model with two "eyeball" Internet Service Providers (ISPs), i.e., ISPs acting as both network access and content providers (CPs), with transit pricing of net traffic at their peering point. That is, there is an inter-provider service-level agreement (SLA) involving a revenue based on net transit traffic flow across their peering point(s). We study the effects of caching remote content via a game between the ISPs on a platform of usage-priced subscribers. We do this for two cases: one with a different congestion point in each ISP (depending on traffic origin), which leads to tractable Nash equilibria, and the other with a single congestion point, which we study numerically. Second, we consider a game between an ISP and a CP on a platform of end-user demand. A price-convex demand-response is motivated by delay-sensitive applications, which are expected to be subject to the assumed usage-priced priority service over best-effort service. Thus, we consider a two-sided market with multiclass demand wherein one class (the one considered here) is delay-sensitive. Both the Internet and proposed Information Centric Network (ICN, encompassing Content Centric Networking (CCN)) scenarios are considered.
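Since the detectors above are compared through ROC curves, the following generic sketch shows how ROC points and the area under the curve can be computed from per-sample anomaly scores. The function names and the data are hypothetical; turning p-values into scores via -log10(p) is one common convention assumed here, not necessarily the one used in the thesis.

```python
import numpy as np

def roc_curve_points(scores, labels):
    """Return (FPR, TPR) arrays obtained by sweeping a threshold over the scores.

    Higher scores mean "more anomalous"; labels are 1 for anomalous, 0 for normal.
    Tied scores are grouped so each distinct threshold yields one ROC point.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores, kind="mergesort")
    scores, labels = scores[order], labels[order]
    # Last index of each group of tied scores, plus the final index.
    distinct = np.r_[np.flatnonzero(np.diff(scores)), labels.size - 1]
    tps = np.cumsum(labels)[distinct]
    fps = (distinct + 1) - tps
    tpr = np.r_[0.0, tps / labels.sum()]
    fpr = np.r_[0.0, fps / (labels.size - labels.sum())]
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Hypothetical per-sample p-values from some detector, with ground-truth labels.
pvals = np.array([1e-6, 1e-5, 1e-4, 1e-3, 0.003, 0.01, 0.05, 0.2])
labels = np.array([1, 1, 0, 1, 0, 0, 1, 0])
fpr, tpr = roc_curve_points(-np.log10(pvals), labels)
print(auc(fpr, tpr))  # 0.75
```

Computing every method's ROC from the same labeled batch in this way keeps the comparison independent of any particular detection threshold.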
For our purposes, the ICN case differs mainly in the polarity of the side-payment (from ISP to CP in an ICN) and, more importantly here, in that content caching by the ISP is incentivized. A price-convex demand-response model is extended to account for content caching. The corresponding Nash equilibria are derived and studied numerically.
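To illustrate only the numerical side of finding a Nash equilibrium, the sketch below treats a deliberately simplified, hypothetical ISP-CP game. End users pay the sum of the two prices, and demand follows D(p) = D_max (1 - p / p_max)^alpha, which is convex in p for alpha >= 1. Caching, side-payments, and multiclass demand are all omitted, and every name and parameter value is an assumption rather than the thesis's model. Equilibrium prices are found by iterated best responses on a price grid. For this toy model the first-order condition gives the best response p = (p_max - q) / (1 + alpha), so the symmetric equilibrium p* = p_max / (2 + alpha) provides a check on the numerical result.

```python
import numpy as np

P_MAX = 1.0   # hypothetical price at which demand vanishes
ALPHA = 2.0   # hypothetical exponent; ALPHA >= 1 makes demand convex in price
D_MAX = 1.0   # hypothetical demand at zero total price

def demand(total_price):
    """Hypothetical price-convex demand: D(p) = D_MAX * (1 - p / P_MAX)^ALPHA."""
    return D_MAX * np.clip(1.0 - total_price / P_MAX, 0.0, None) ** ALPHA

def best_response(other_price, grid):
    """Grid price that maximizes own revenue: price * D(price + other_price)."""
    revenue = grid * demand(grid + other_price)
    return grid[np.argmax(revenue)]

def nash_equilibrium(iters=100, tol=1e-9):
    """Alternate best responses (ISP, then CP) until prices stop changing."""
    grid = np.linspace(0.0, P_MAX, 100_001)
    p_isp, p_cp = 0.5, 0.5  # arbitrary starting prices
    for _ in range(iters):
        new_isp = best_response(p_cp, grid)
        new_cp = best_response(new_isp, grid)
        converged = abs(new_isp - p_isp) < tol and abs(new_cp - p_cp) < tol
        p_isp, p_cp = new_isp, new_cp
        if converged:
            break
    return p_isp, p_cp

p_isp, p_cp = nash_equilibrium()
print(p_isp, p_cp, P_MAX / (2 + ALPHA))  # numerical vs. closed-form equilibrium
```

Best-response iteration converges here because each best response changes by only 1 / (1 + ALPHA) per unit change in the rival's price; in richer models (for example with caching costs) convergence would need to be checked separately.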