LOCAL COMMUNITY DETECTION ON COMPLEX NETWORKS
Open Access
- Author:
- Bian, Yuchen
- Graduate Program:
- Information Sciences and Technology
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- September 25, 2019
- Committee Members:
- Xiang Zhang, Dissertation Advisor/Co-Advisor
Xiang Zhang, Committee Chair/Co-Chair
Dongwon Lee, Committee Member
Zihan Zhou, Committee Member
Chaopeng Shen, Outside Member
Mary Beth Rosson, Program Head/Chair - Keywords:
- Local community detection
random walk
memory-based random walk
multi-walker chain
query replacement
query node
topology potential field - Abstract:
- Community detection or clustering plays a vital role in data mining and machine learning. Recently local community detection attracts more attention due to its online manner. Given a community-structured graph and a set of query nodes (one query node or multiple query nodes), local community detection aims to detect local communities (one community or multiple communities) containing the queries. Two main assumptions exist in the literature. One is the query-bias assumption which assumes that nodes in the target communities locate near the query nodes. The other is the homology assumption which assumes that the query nodes come from the same community. However, these two assumptions prevent local community detection methods to spot target communities accurately. In this dissertation, three models are proposed to address issues resulting from the two assumptions. Suppose only one query node is given and it does not come from communities overlapping region (i.e., there is only one target community). The query-bias assumes that nodes near the query node have high chances to be detected in the target community. Usually, many existing methods send one random walker from the query to explore the graph. The probability of the walker visiting one node is the chance of that node being detected in the target community. Nevertheless, when the query is on the community boundary region, the random walker has a high probability to travel to nearby communities but less probability to visit nodes in the target community but disconnect the query. The query-bias assumption results in false positives (nodes near the query node but not in the target community will be detected) and false negatives (nodes in the target community but not near the query are missed). To avoid the query-bias, in this dissertation, a multiple walkers chain (MWC) model is proposed where a group of mutually affected random walkers is sent from the query node. Due to the interaction between walkers, the walkers group in MWC is very hard to escape from the target community. The results on synthetic networks and real networks show that MWC outperforms other existing works. Suppose multiple query nodes are given. The homology assumption considers the query nodes from the same target community. However, the homology assumption does not hold in many cases especially when different domains are taken into account. Even when the homology assumption is neglected, if there are too many query nodes, detecting the target community for each query at a time will not only suffer from the query-bias but also cost too much. If several query nodes are known in advance that they are homologous, they should be taken care of together to detect their target community. In this dissertation, a memory-based random walker (MRW) model and an interactive framework are proposed to detect multiple communities for a set of query nodes without holding the homology assumption. One random walker is sent from each query node and each walker memorizes its visiting history. During the exploring process, the interaction between walkers helps to divide the multiple query nodes into groups in the meantime of detecting. In this way, the algorithm can execute just once to detect all target communities for multiple queries. Experimental results on real-world networks show that the proposed MRW performs better and faster than the alternatives. Local community detection for query nodes from communities overlapping region becomes extremely hard for exiting methods due to the two assumptions. Most works cannot detect multiple communities for one overlapping query node. In this dissertation, a general replacement mechanism is proposed to deal with all the aforementioned three intractable query cases: boundary query, multiple queries, and overlapping query. The main idea is to replace intractable queries with detection-friendly new queries (nodes in the community core region). First, brought from the field theory in physics, a query-oriented topology potential (TP) filed is built around one query node. Then TP values of core nodes are emphasized with a designed amplifier. Finally, the query-oriented core nodes are distinguished and easy to detect. The detected core nodes are used as new queries to find the target local communities. More importantly, the core nodes detection strategy can be applied in other graph mining applications. Experimental results verify the advantages of the replacement strategy.