Exploring the Impact of Failures on Network Monitoring Techniques

Open Access
Tati, Srikar S
Graduate Program:
Electrical Engineering
Doctor of Philosophy
Document Type:
Date of Defense:
February 14, 2014
Committee Members:
  • Thomas F Laporta, Dissertation Advisor
  • Aylin Yener, Committee Member
  • Constantino Manuel Lagoa, Committee Member
  • Guohong Cao, Committee Member
  • Kultegin Aydin, Committee Member
  • Network Monitoring; Network Analytics; Independent Failures; Lar
In this thesis, we investigate the impact of failures on network monitoring systems. Network monitoring either explicitly focuses on diagnosing the performance bottlenecks or on localizing the failures in the network. Depending on the objective of network monitoring and the characteristics of failures, different types of challenges and problems arise. We work on two main problems: diagnosis of large-scale failures and robust network tomography to diagnose bottlenecks. In both problems, a Network Operation Controller (NOC) requires end-to-end measurements or connectivity information from end nodes in the network; and failures may impede this process significantly. In the first two chapters, I focus on the effects of failures on collection of end-to-end connectivity information at NOC while diagnosing large-scale failures. We propose inference algorithms that accomplish high accuracy in reasonable time compared to existing approaches that are developed for independent failures. In the first chapter, we propose a combinatorial algorithm called netCSI, and in the second chapter, we introduce a greedy approach called CMC, which performs better than netCSI in terms of runtime. We validate our algorithms with extensive simulations using realistic topologies and realistic failure models. In the latter part of the thesis, I focus on the impact of independent failures on measurement collection in when using network tomography techniques. We model the robustness of paths in this setting by a metric called expected rank. We formulate an optimization problem to cover two complementary performance metrics: robustness and probing cost. We propose an efficient algorithm called RoMe (RObust MEasurements) with a guaranteed approximation ratio and polynomial time complexity. We provide some future directions for the work and other interesting open problems.