You have almost certainly heard of Machine Learning and Deep Learning. But what exactly do these terms mean, and what can these techniques contribute to the fight against fraud?
What are Machine Learning and Deep Learning?
Machine Learning is a branch of artificial intelligence based primarily on automatically building statistical models from as large a training corpus as possible. It is everywhere these days, often unobtrusively: in the spam filter in your mailbox, in the recommendation engines on shopping sites, in search engines, and so on.
Deep Learning is a sub-branch of this discipline that uses so-called “deep” neural networks as models, i.e. neural networks with many layers. The approach has recently become popular thanks to the availability of low-cost computing power, in particular modern graphics cards (GPUs: Graphics Processing Units), and it delivers excellent results, particularly on images.
Field of application
Netheos uses Machine Learning in a variety of ways to process identity documents. In particular, we have trained networks to classify documents using a large set of sample ID documents. This lets us classify documents automatically, with a high level of confidence, and above all discard those whose appearance is too far removed from anything present in the training data. The main advantage is that the system learns by itself which elements must be present on a document (known as “invariants”), without anyone having to specify them by hand beforehand. As a result, the system is easy to adapt: integrating a new type of document is just a matter of retraining on samples of it.
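To make the idea of “classify, but discard what looks too unfamiliar” concrete, here is a minimal sketch. It deliberately swaps the neural network for a much simpler nearest-centroid classifier, and it is not Netheos's implementation. The two features (aspect ratio and share of dark pixels), the sample values, the labels, and the `max_distance` threshold are all hypothetical, chosen only for illustration.

```python
import math
from collections import defaultdict


def train_centroids(samples):
    """Average the feature vectors of each label.

    samples: list of (feature_vector, label) pairs.
    Returns a dict mapping each label to its centroid.
    """
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def classify(features, centroids, max_distance):
    """Return (label, distance) for the closest centroid.

    The label is None when even the closest centroid is farther than
    max_distance, i.e. the document looks unlike anything seen in training.
    """
    best_label, best_dist = None, math.inf
    for label, centroid in centroids.items():
        dist = math.dist(features, centroid)
        if dist < best_dist:
            best_label, best_dist = label, dist
    if best_dist > max_distance:
        return None, best_dist
    return best_label, best_dist


# Hypothetical training data: (aspect ratio, share of dark pixels).
training_samples = [
    ((1.42, 0.30), "passport"),
    ((1.40, 0.32), "passport"),
    ((1.44, 0.28), "passport"),
    ((1.58, 0.20), "id_card"),
    ((1.60, 0.22), "id_card"),
    ((1.59, 0.18), "id_card"),
]
centroids = train_centroids(training_samples)

print(classify((1.41, 0.31), centroids, max_distance=0.1))
print(classify((1.00, 0.90), centroids, max_distance=0.1))
```

Under these assumptions, supporting a new document type only requires appending labeled samples to `training_samples` and calling `train_centroids` again; no rule describing the new document has to be written by hand.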
Netheos also uses Deep Learning in a more targeted way, for data extraction and to verify the authenticity of documents (by comparing security features). Here again, the aim is to gather a large corpus of reference data that is used to train a neural network, this time focusing on elements selected in advance for each document type. This approach lets us achieve a much higher level of performance than conventional OCR software, particularly on low-quality images. Indeed, we are seeing more and more images taken with smartphones in poor conditions (low light, noise, perspective distortion, etc.).
The approaches above are what we call “supervised” learning, in the sense that the data used for training must be correctly labeled beforehand (for example, images must be correctly sorted by document type). For fraud detection, we do not have access to a corpus of correctly labeled data: by definition we do not know in advance whether a file is fraudulent, and the time lag between processing the file and detecting the fraud can be very long. What's more, fraud cases are (fortunately) relatively rare events, and are therefore poorly represented in the training data. We therefore use an unsupervised approach. This time, the aim is to identify abnormal points that deviate too far from typical behavior: this is known as anomaly detection. This type of system is notably used in electronic payment systems, where it enables your bank to block suspicious payments that deviate from your usual purchasing habits.
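The payment example can be sketched with one of the simplest forms of anomaly detection, a z-score test: learn the mean and standard deviation of a customer's past amounts, then flag any new amount that lies too many standard deviations away. This is only an illustration of the principle, not the method Netheos or any bank actually uses; the payment history and the threshold of 3 standard deviations are assumptions.

```python
import statistics


def fit_profile(amounts):
    """Summarize past behavior as (mean, sample standard deviation).

    No fraud labels are needed: the profile is built from unlabeled history.
    """
    return statistics.mean(amounts), statistics.stdev(amounts)


def is_anomalous(amount, profile, threshold=3.0):
    """Flag an amount lying more than `threshold` standard deviations
    from the mean of the profile."""
    mean, std = profile
    if std == 0:
        # All past amounts were identical: anything different is unusual.
        return amount != mean
    return abs(amount - mean) / std > threshold


# Hypothetical history of one customer's card payments, in euros.
history = [23.5, 41.0, 18.9, 35.2, 27.8, 30.1, 22.4, 38.6]
profile = fit_profile(history)

for amount in (32.0, 950.0):
    print(amount, "suspicious" if is_anomalous(amount, profile) else "ok")
```

A real system would look at many more dimensions than the amount (merchant, location, time of day, and so on), but the principle stays the same: model what is normal and flag what deviates from it.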
Machine Learning algorithms are particularly well suited to the fight against fraud. Their key advantage is that they can be adapted quickly to other document types or customer journeys, whereas conventional approaches based on hard-coded logic have to be reworked from scratch for each new situation. What's more, these approaches make no prior assumptions about which data is relevant, so they make the most of the data available. An “expert system”, by contrast, is based on intuition, which may come from the business but can overlook elements that are nonetheless relevant.
However, this conclusion needs one qualification: it is knowledge of the business that determines how all of these algorithms should be applied.