Open Access
Author:
Lin, Tao
Graduate Program:
Information Sciences and Technology
Degree:
Master of Science
Document Type:
Master Thesis
Date of Defense:
March 28, 2018
Committee Members:
  • Peng Liu, Thesis Advisor
Keywords:
  • Recurrent Neural Network
  • Machine Learning
  • Retrieval
  • Security
Abstract:
Triage analysis is a fundamental stage of cyber operations in Security Operations Centers (SOCs). The massive data sources place heavy demands on cyber security analysts' capacity for information processing and analytical reasoning. Furthermore, most junior security analysts are much less efficient than senior analysts at deciding which data triage operations to perform. To help them, retrieval methods are needed that facilitate data triage by retrieving the relevant historical data triage operations of senior security analysts. This thesis investigates retrieval methods based on recurrent neural networks, including rule-based retrieval and context-based retrieval of data triage operations. It further discusses new directions for solving the data triage operation retrieval problem.

At present, most novice analysts responsible for data triage struggle with the complexity and intensity of their tasks. To fill this gap, we propose giving novice analysts on-the-job suggestions by presenting the relevant data triage operations that senior analysts conducted in previous tasks. A tracing method has been developed to track an analyst's data triage operations. This thesis mainly presents a data triage operation retrieval system that (1) models the context of a data triage analytic process, (2) uses a recurrent neural network to compare and match contexts, and (3) presents the matched traces to novice analysts as suggestions. We implemented the system and evaluated its performance through both automated testing and human evaluation. The results show that the proposed retrieval system can effectively identify the relevant traces based on an analyst's current analytic process.
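The three steps of the retrieval system can be illustrated with a minimal sketch. It is not the thesis's implementation: the operation names (`filter`, `search`, `select`, `link`), the trace names, and the tiny recurrent encoder with fixed random (untrained) weights are all assumptions made for illustration. A real system would learn the weights from senior analysts' traces. The sketch only shows the general shape: encode a sequence of operations as a context vector, compare context vectors, and return the closest stored traces.

```python
import math
import random

# Hypothetical vocabulary of data triage operations (not from the thesis).
VOCAB = {"filter": 0, "search": 1, "select": 2, "link": 3}


def make_rnn(input_dim, hidden_dim, seed=0):
    """Build fixed random weights for a simple Elman-style RNN (untrained)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)]
            for _ in range(hidden_dim)]
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
           for _ in range(hidden_dim)]
    return w_in, w_h


def encode(sequence, rnn):
    """Step 1: model a context as the RNN's final hidden state."""
    w_in, w_h = rnn
    size = len(w_h)
    hidden = [0.0] * size
    for op in sequence:
        idx = VOCAB[op]  # one-hot input selects one column of w_in
        hidden = [
            math.tanh(w_in[i][idx]
                      + sum(w_h[i][j] * hidden[j] for j in range(size)))
            for i in range(size)
        ]
    return hidden


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def retrieve(query, traces, rnn, top_k=1):
    """Steps 2 and 3: compare contexts and return the best-matching traces."""
    q = encode(query, rnn)
    scored = sorted(
        ((cosine(q, encode(ops, rnn)), name) for name, ops in traces.items()),
        reverse=True,
    )
    return [name for _, name in scored[:top_k]]


# Hypothetical traces recorded from senior analysts.
TRACES = {
    "trace_a": ["filter", "select", "link"],
    "trace_b": ["search", "search", "filter"],
    "trace_c": ["link", "filter", "search", "select"],
}

RNN = make_rnn(input_dim=len(VOCAB), hidden_dim=8)

# A novice's current process that repeats trace_a exactly.
suggestions = retrieve(["filter", "select", "link"], TRACES, RNN, top_k=2)
```

With untrained weights the ranking beyond an exact match carries little meaning. The point is only that retrieval reduces to comparing encoded contexts once an encoder is available.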