Dhiraj's Data Analytics Blog: Is it time to reboot our approach to fraud detection?

Disclaimer - The views, thoughts, and opinions expressed in the text belong solely to the author, and should not in any way be attributed to the author’s employer, or to the author as a representative, officer or employee of any organisation.

This article is an excerpt from my book “Practical Data Analysis: Using Open Source Tools & Techniques” (available on Amazon worldwide, iBook Store, and Barnes & Noble).

Advances in computer technology, online banking and e-commerce have brought with them increased vulnerability to fraud. Hackers and cyber criminals are continuously finding new ways to target their victims, from phishing attacks and stolen credit card details to creating false accounts. According to a recent report published by Financial Fraud Action UK (FFA UK), in the UK alone, financial fraud related losses totalled £768.8 million in 2016, eighty percent of which was attributed to payment card fraud (see figure below). Prevented fraud during the same period totalled £1.38 billion. This figure represents incidents that were detected and stopped by the banks and card companies, and is equivalent to £6.40 in every £10 of attempted fraud being stopped. In other words, UK banks and card companies failed to detect and prevent in time 36% of attempted fraud by value. These figures clearly indicate that there is still much work to be done in the area of payment card fraud detection.
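As a quick sanity check on the arithmetic above, here is a minimal Python sketch that derives the prevention rate from the two totals quoted from the FFA UK report. It assumes that "attempted fraud" is simply losses plus prevented fraud; the variable names are mine.

```python
# Figures quoted above from FFA UK's "Fraud The Facts 2017" report, in £ million.
losses = 768.8       # fraud losses in 2016
prevented = 1380.0   # prevented fraud in 2016 (£1.38 billion)

# Assumption: attempted fraud = fraud that succeeded + fraud that was stopped.
attempted = losses + prevented
prevented_share = prevented / attempted
missed_share = losses / attempted

print(f"Stopped per £10 of attempted fraud: £{10 * prevented_share:.2f}")
print(f"Share of attempted fraud not prevented: {missed_share:.0%}")
```

Under that assumption the stopped amount comes out at about £6.42 per £10, consistent with the roughly £6.40 quoted above, and the share that got through rounds to 36%.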

Yearly breakdown of fraud related losses on UK-issued cards.

(Source - FFA UK's "Fraud The Facts 2017" report)

The traditional approach to tackling payment card fraud is to use a set of rigid rules and parameters to query transactions, and to direct the suspicious ones to the fraud department for human review. Rules are extremely easy to understand and are developed by domain experts and consultants, who translate their experience and best practices into code that makes automated decisions. But when a rules-based fraud detection system is operationalised, one starts with, say, 100 fraud scenarios and 100 rules to handle them. As time goes by and the fraudsters change tactics, we encounter more and more fraud scenarios and keep adding rules to hold the number of false positives and false negatives under control. There comes a point where nobody really knows, or can measure, how well the rules work or how many exceptions there are - this is the situation today with a lot of legacy hand-crafted, rules-based fraud and anomaly detection systems.
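To make the rules-based approach concrete, here is a minimal Python sketch of such a system. Everything in it is hypothetical: the Transaction fields, the rule names and the thresholds are invented for illustration and are not taken from any real fraud system.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float         # transaction value in pounds
    country: str          # country where the card was used
    home_country: str     # cardholder's home country
    hour: int             # local hour of day, 0-23
    txns_last_hour: int   # transactions on this card in the past hour


# Hypothetical hand-written rules, each encoding one fraud scenario.
RULES = {
    "large_amount": lambda t: t.amount > 2000,
    "foreign_at_night": lambda t: t.country != t.home_country and t.hour < 6,
    "rapid_fire": lambda t: t.txns_last_hour >= 5,
}


def triggered_rules(txn):
    """Return the names of all rules that the transaction trips."""
    return [name for name, rule in RULES.items() if rule(txn)]


txn = Transaction(amount=2500.0, country="FR", home_country="GB",
                  hour=3, txns_last_hour=1)
flags = triggered_rules(txn)
if flags:
    print("Refer to fraud team:", flags)
```

Every new fraud scenario means another entry in RULES, and every false alarm means another exception inside an existing rule, which is how such systems grow to the point where nobody can say how well they work.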

But do we have a better alternative? The answer is yes, we do, and it is called Machine Learning! Machine Learning is simply a form of artificial intelligence that enables computers to "learn" (i.e. progressively improve performance on a specific task) with data, without being explicitly programmed to do so. It is based around the idea that we should really just be able to give machines access to data and let them learn for themselves.

Traditional Programming vs. Machine Learning
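The contrast can be sketched in a few lines of Python. In the first function a human picks the cut-off; in the second, the cut-off is chosen from labelled examples. The data is a made-up toy sample and the "model" is a single-feature threshold, far simpler than the techniques discussed later, so treat this purely as an illustration of the idea.

```python
# Made-up toy history: (transaction amount, was it fraud?)
history = [(12.0, False), (35.5, False), (80.0, False), (150.0, False),
           (420.0, True), (95.0, False), (610.0, True), (300.0, True)]


def rule_based(amount):
    """Traditional programming: the cut-off is hard-coded by an expert."""
    return amount > 1000.0


def learn_threshold(examples):
    """Machine learning in miniature: pick the cut-off that makes the
    fewest mistakes on the labelled examples."""
    amounts = sorted(a for a, _ in examples)
    # Candidate cut-offs are the midpoints between neighbouring amounts.
    candidates = [(lo + hi) / 2 for lo, hi in zip(amounts, amounts[1:])]

    def errors(threshold):
        return sum((a > threshold) != is_fraud for a, is_fraud in examples)

    return min(candidates, key=errors)


threshold = learn_threshold(history)
print("Learned cut-off:", threshold)
print("Hand-written rule flags £420?", rule_based(420.0))
print("Learned rule flags £420?", 420.0 > threshold)
```

If the fraudsters change tactics, the hand-written cut-off stays where it is until someone edits it, whereas the learned one moves as soon as the function is re-run on fresh labelled data.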

But why is Machine Learning a better alternative to a rules-based expert system designed by domain specialists and consultants? Simply because machines are much better than humans at identifying patterns in data (especially in the world of big data that we live in) and at detecting anomalies in those patterns. They can also process large datasets, and can recognise thousands of features on a user’s purchasing journey instead of the few that can be captured by writing rules. This ability to see deep into the data and make concrete predictions for large volumes of transactions makes Machine Learning a very promising alternative to the traditional rules-based approach for detecting and preventing fraud.

But words are meaningless without proof! Hence, to demonstrate the power of Machine Learning techniques in detecting fraudulent transactions and anomalous events, in chapter seven (“Fraud Detection Using Machine Learning Techniques”) of my book (“Practical Data Analysis”), I used a real-world credit card fraud dataset (anonymized and freely available) to test the prediction accuracy of four Machine Learning techniques – Random Forest, Boosted Trees (XGBoost), an Autoencoder, and an Ensemble model. The table below summarises how a credit card fraud detection system built around these four techniques performed (a detailed narrative on how to design and implement them using open source tools is included in my book).

Based on these results, the Ensemble model is clearly the winner - it was able to detect 83% of the fraudulent transactions, and only 9 out of every 100 transactions that the system flagged as fraudulent were genuine (in other words, a recall of 83% and a precision of about 91%). Although the system failed to detect 17% of the fraudulent transactions, given the limited size of the dataset that I used to train these models (60% of two days’ credit card transactions), in my opinion, this is still a very good result. With a bigger training dataset and some feature engineering, it is very likely that the prediction accuracy of these models will improve even further.
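For readers who want to see how the two headline percentages relate to raw counts, here is a minimal Python sketch. The counts are hypothetical, chosen only so that they reproduce the percentages quoted above; they are not the actual figures from the book's test set.

```python
def recall(tp, fn):
    """Share of actual frauds that the system caught."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Share of flagged transactions that really were fraud."""
    return tp / (tp + fp)


# Hypothetical counts, picked only to match the quoted percentages:
tp = 83   # frauds correctly flagged
fn = 17   # frauds missed
fp = 8    # genuine transactions wrongly flagged

print(f"Frauds detected (recall): {recall(tp, fn):.0%}")
print(f"Flags that were real fraud (precision): {precision(tp, fp):.0%}")
print(f"Genuine transactions per 100 flags: {100 * (1 - precision(tp, fp)):.0f}")
```

The first number tells the fraud team how much fraud slips through; the third tells them how much of their review time is spent on genuine customers.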

Dhiraj Bhuyan, 29 July 2017


Sunday, 29 July 2018
