Computational complexity of data mining algorithms used in fraud detection

Open Access
Author:
Iyer, Karthik Thyagarajan
Graduate Program:
Industrial Engineering
Degree:
Master of Science
Document Type:
Master Thesis
Date of Defense:
None
Committee Members:
  • Vittaldas V Prabhu, Thesis Advisor
Keywords:
  • Data mining
  • Complexity
  • Big O
Abstract:
According to estimates by certain government agencies, 10% of the total medical expenditure is lost to healthcare fraud. Similarly the credit card industry loses billions of dollars every year due to fraudulent transactions. The datasets used to identify these fraudulent transactions have millions of rows and each transaction is defined by 15-40 attributes. It is not possible for a human to sift through these massive datasets and find the fraudulent transactions. Hence credit card companies and insurance companies use data mining algorithms to identify fraudulent transactions. These data mining algorithms need to identify the fraudulent transactions efficiently and at the same time they need to process the dataset as quickly as possible. The time taken by the algorithm to execute is a function of the computational complexity of the algorithm. In this thesis, the theoretical running time complexities of the various data mining algorithms used in fraud detection are compared, i.e. the complexity is expressed as a function of the number of instances in the database. These algorithms were then run on statistical tools like Weka and R and on comparing the performance of these algorithms to the theoretical computational efficiency it was found that, all algorithms agree with the big O complexity. Support vector machines and decision trees performed better than the big O complexity.