Evaluating the alignment of privacy policies to NIST cybersecurity framework using Natural Language Processing and Deep Learning

Open Access
- Author:
- Chaudhary, Namrata
- Graduate Program:
- Data Analytics
- Degree:
- Master of Science
- Document Type:
- Master Thesis
- Date of Defense:
- March 15, 2021
- Committee Members:
- Adrian Sorin Barb, Thesis Advisor/Co-Advisor
Nil Hande Ergin, Committee Member
Guanghua Qiu, Committee Member
Colin Neill, Program Head/Chair
- Keywords:
- Natural language processing
NLP
IoT Privacy Policy
NIST cybersecurity framework
BERT
ELMo
Word2Vec
Doc2Vec
text similarity
Privacy policy gaps
- Abstract:
- With the growing need to convert urban areas into smart cities, the use of Information & Communications Technology has increased, and tasks such as sustainable energy utilization, law enforcement, traffic decongestion, and waste management are handled efficiently using Internet of Things (IoT) devices. These devices, however, give rise to various security and privacy concerns, which makes it important to identify any gaps in a device’s privacy policy. We chose the National Institute of Standards and Technology (NIST) cybersecurity framework as a benchmark resource in the IoT security and privacy domain and used deep learning algorithms to measure text similarity between sentences from an IoT device’s privacy document and the functions and categories from NIST. A detailed quantitative evaluation using four pre-labelled, standard datasets showed that a Siamese implementation of the Bidirectional Encoder Representations from Transformers (BERT) algorithm performs best for the task of similarity on policy text. Based on the similarity scores obtained using BERT, we labelled each section of a privacy policy document, with NIST categories and functions as the labels. Aggregating the resulting labels and scores yielded insights such as: ~50% of all four policy documents address the Protect domain, and the policy documents of Amazon Alexa and the Apple IoT device contained gaps regarding the Recover domain, whereas the policies of the Nest and August home devices were relevant to Recover. The complete set of results details how relevant each NIST category was to each of the privacy policies, along with its proportional relevance.
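The labelling and aggregation steps described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the `embed` function is a bag-of-words stand-in for the Siamese BERT encoder so that the example runs without model downloads, the one-line descriptions of the five NIST functions are illustrative paraphrases, and the `threshold` value and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Illustrative paraphrases of the five NIST CSF functions (assumed wording,
# not the official framework text).
NIST_FUNCTIONS = {
    "Identify": "identify assets risks and data collected by the organization",
    "Protect": "protect data with encryption access control and safeguards",
    "Detect": "detect anomalies events and unauthorized activity through monitoring",
    "Respond": "respond to incidents and notify affected users",
    "Recover": "recover and restore services after an incident",
}


def embed(text):
    """Stand-in encoder: word counts. A Siamese BERT model would return
    a dense sentence vector here instead."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label_sentence(sentence, threshold=0.1):
    """Return (best function, score), or (None, score) when no function
    reaches the (hypothetical) similarity threshold."""
    vec = embed(sentence)
    scores = {name: cosine(vec, embed(desc)) for name, desc in NIST_FUNCTIONS.items()}
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return None, scores[best]
    return best, scores[best]


def function_proportions(sentences):
    """Share of policy sentences labelled with each NIST function.
    A share of 0.0 marks a candidate gap in the policy."""
    if not sentences:
        return {name: 0.0 for name in NIST_FUNCTIONS}
    counts = Counter(label_sentence(s)[0] for s in sentences)
    return {name: counts.get(name, 0) / len(sentences) for name in NIST_FUNCTIONS}
```

Calling `function_proportions` on the sentences of one policy gives a per-function share, and a function whose share is zero would be flagged as a possible gap, in the spirit of the Recover finding above. Swapping `embed` for a real sentence encoder changes only that one function.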