Resource and Knowledge Discovery in Large Scale Dynamic Networks
Open Access
- Author:
- Li, Mei
- Graduate Program:
- Computer Science and Engineering
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- March 02, 2007
- Committee Members:
- Wang Chien Lee, Committee Chair/Co-Chair
Anand Sivasubramaniam, Committee Member
Thomas La Porta, Committee Member
Chao Hsien Chu, Committee Member
Peng Liu, Committee Member
- Keywords:
- Distributed Systems
Networks
Data Mining
Information Management
Network Security
Database
- Abstract:
- A massive amount of information, including multimedia files, relational data, scientific data, and system usage logs, is being collected and stored in a large number of host nodes connected as large scale dynamic networks (LSDNs), such as peer-to-peer (P2P) systems and sensor networks. A wide spectrum of applications, e.g., resource locating, network attack detection, market analysis, and scientific exploration, relies on efficient discovery and retrieval of resources and knowledge from the vast amount of data distributed in these network systems. With the rapid growth in the volume of data and the scale of networks, simply transferring the data generated at different host nodes to a single site for storing and processing becomes impractical, incurring excessive communication overhead while raising privacy concerns. Thus, a major challenge for LSDNs is to design decentralized infrastructures and algorithms that enable efficient resource and knowledge discovery. In this dissertation, various resource and knowledge discovery tasks, ranging from simple tasks such as query processing to complex tasks such as network attack detection, are systematically investigated, with a synergy of research efforts spanning multiple disciplines, including distributed computing, networking, and data management. Efficient and robust infrastructures and algorithms are proposed to support these tasks, with particular attention paid to various system issues including load balancing, maintenance, adaptivity to dynamic changes, data distribution, and user access patterns in the networks. The superiority of these proposed ideas is demonstrated through extensive experiments using both synthetic data and real data.
This dissertation provides profound insights on exploiting the vast amount of data for different applications, e.g., system performance tuning, network attack detection, and market analysis; opens a new research direction in distributed data mining; and provides a solid foundation for exploring various data management tasks in network systems. It is expected that this study will have a deep impact on the deployment of various applications that mandate efficient management and mining of the vast amount of data distributed in network systems.