
Machine learning and spoofing - New technology, rules and tools

15 November 2015


Neurensic is a Chicago-based company that has developed a set of tools for the detection of disruptive trading practices such as spoofing. It uses a form of artificial intelligence to search through trading data and identify patterns of activity that may raise a red flag for regulators. In this article, two of Neurensic’s founders explain how machine learning works and why it is particularly suitable for this type of compliance problem.

Since the passage of Dodd-Frank, market regulators and law enforcement officials have had a new tool in their efforts to prevent market abuse. “Spoofing” has been codified as a prohibited market activity and the Commodity Futures Trading Commission is now working with the Department of Justice and the Federal Bureau of Investigation to enforce this new law through several high-profile cases.

Many would argue, however, that the term is ill-defined. The regulators appear to believe that their written guidance communicates their expectations for market participants, but at the same time they describe the law as necessarily vague. Market participants are now struggling to understand what types of trading activity might be viewed as spoofing, and compliance officers seeking to protect their firms from possible enforcement action are combing through massive amounts of trading data to determine whether any of their trading might be construed as spoofing.

Machine Learning

Existing market surveillance tools search for violations of specific rules. This approach works well with something like cross trading. Has a broker matched the buy order of one customer against the sell order of another? This yes/no question can be evaluated and alerts can be generated when the rule is broken.
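
To make the yes/no nature of such a rule concrete, here is a minimal sketch in Python. The record layout (`Trade` and its fields) and the sample identifiers are hypothetical, invented for illustration; a real surveillance system would work from the exchange's or firm's own trade records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    # Hypothetical record layout, for illustration only.
    trade_id: str
    buy_broker: str
    buy_customer: str
    sell_broker: str
    sell_customer: str


def is_cross_trade(trade: Trade) -> bool:
    """Yes/no rule: one broker on both sides, acting for two different customers."""
    return (trade.buy_broker == trade.sell_broker
            and trade.buy_customer != trade.sell_customer)


def cross_trade_alerts(trades):
    """Return the IDs of all trades that break the rule."""
    return [t.trade_id for t in trades if is_cross_trade(t)]


trades = [
    Trade("T1", "BRK-A", "CUST-1", "BRK-A", "CUST-2"),  # same broker, two customers
    Trade("T2", "BRK-A", "CUST-1", "BRK-B", "CUST-3"),  # different brokers
]
alerts = cross_trade_alerts(trades)
```

The rule either fires or it does not, which is why this style of check suits cross trading.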

The problem with spoofing is that the definition is vague by design. A spoof can involve five contracts or five thousand, seven order messages or seven thousand. The duration could be measured in milliseconds or hours. To be considered spoofing, an order must have been placed with the intent to cancel before execution, and intent cannot be read directly from an order message. A simple rules-based software solution therefore cannot solve this problem. The evaluation of trading behavior is not a yes/no problem and requires a new type of tool.

Machine learning is that tool. It is a field of artificial intelligence that involves the development of self-learning algorithms. Rather than following a list of instructions in a program, the machine learns to make classifications from examples. The more data it is exposed to, the more effective it becomes in making those classifications. 

Machine learning techniques are being applied all around us. Today’s cameras, for example, are able to identify faces and adjust the focus accordingly. Although this is an easy task for people, a software solution to this problem proved elusive until advanced machine learning techniques met microprocessors small enough and powerful enough to fit into a camera. 

One of the techniques cameras use is called a support vector machine (SVM). While the mathematics involved is tricky, the technique relies upon the concept of features. A feature is a measurable piece of data: an area with facial colors, for instance, or the ratio of the distances between the eyes, nose and chin. The software that controls the camera’s viewfinder is continuously analyzing every pixel it captures and scoring pieces of the image based upon those features.

If you plotted these scores on a graph, you would find clusters of points with similar scores. In the middle you would find sets of pixels that scored high and are more likely to be a face. A computer can take this several steps further and plot this problem on as many axes as there are useful features. It is not possible to visualize a support vector machine that evaluates a problem in fifteen dimensions, but the concept is the same.
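
For readers who want to see the idea in code, the following is a rough sketch of a linear support vector machine working on two features. It is not how any camera or our own product is implemented: the feature names, the sample values and the training settings are all assumptions made for illustration, and the training loop is a simple sub-gradient descent on the hinge loss rather than a production solver.

```python
# Illustrative linear SVM trained by sub-gradient descent on the hinge loss.
# The two features and all sample values below are hypothetical.

def decision_score(weights, bias, x):
    """Positive score means 'face', negative means 'not a face'."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_linear_svm(samples, labels, epochs=500, lr=0.05, reg=0.01):
    """Learn (weights, bias) from examples labeled +1 or -1."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * decision_score(weights, bias, x)
            # Regularization gently shrinks the weights on every step.
            weights = [w * (1.0 - lr * reg) for w in weights]
            if margin < 1.0:
                # Example is misclassified or too close to the boundary:
                # move the boundary away from it.
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
    return weights, bias


# Each sample: [skin_tone_fraction, eye_nose_chin_ratio], scaled to 0..1.
faces = [[0.90, 0.80], [0.80, 0.90], [0.85, 0.75], [0.95, 0.85]]
non_faces = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.30], [0.05, 0.25]]
samples = faces + non_faces
labels = [1] * len(faces) + [-1] * len(non_faces)

weights, bias = train_linear_svm(samples, labels)

# Classify image patches the model has never seen.
looks_like_face = decision_score(weights, bias, [0.9, 0.9]) > 0
looks_like_background = decision_score(weights, bias, [0.1, 0.1]) < 0
```

Adding a third or a fifteenth feature only lengthens each sample list; the code itself does not change, which is the point made above about extra dimensions.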

There are many examples of companies applying machine learning to solve a host of practical problems. We see machine learning tools applied in spam filters that improve over time, recommendation engines on websites such as Netflix, medical technology for disease diagnosis and of course the driverless cars that have attracted so much attention in the media.

Our company is now applying machine learning to the problem of detecting spoofing. We offer no view on whether the regulatory definition of spoofing is right or wrong, or whether a trading pattern constitutes a violation. We simply offer a tool that allows firms to identify trading activities that may fit this definition. In other words, we train the computer to know what regulators are looking for and then search client data for patterns that resemble this activity. Each pattern receives a risk score from 200 to 800; the higher the score, the greater the risk of attracting regulatory attention.
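
As a purely illustrative sketch of how a score in that range could be produced, the snippet below maps a model's estimated probability onto the 200 to 800 scale with a straight line. The linear mapping and the function name `risk_score` are assumptions made for this example; the article does not describe the actual scoring formula.

```python
def risk_score(probability, low=200, high=800):
    """Map a probability in [0, 1] onto the low..high scale (assumed linear)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return round(low + probability * (high - low))


lowest = risk_score(0.0)    # pattern looks like ordinary trading
midpoint = risk_score(0.5)
highest = risk_score(1.0)   # pattern closely resembles known cases
```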

Keep in mind the monstrous amounts of data that need to be processed. The North American futures industry, for instance, generates over 100 billion order messages each day and the securities markets billions more. In our view, a machine learning solution is much more effective at processing this amount of data than rules-based software solutions.

Risk Score

The anonymized data in the table below is a real example of order messages generated by a trader in the coffee futures market. Our tool identified a cluster of activity that appeared to be related by behavioral intent and scored this cluster with a very high probability of attracting regulatory attention.

The trading activity shown is similar to patterns that regulators have identified as spoofing in recent enforcement actions, where a trader’s alleged intention was to cancel a bid or offer before execution.

An examination of the rest of this trader’s activity found multiple instances of this trading pattern, further strengthening the impression that the intent of the trader’s order actions was to gain an unfair execution advantage.

Our second tool takes this one step further. We are training the computer to understand normal market activity and tell us when the market is unbalanced, that is, when liquidity is illusory or volume is artificial.

As we said above, it is not for us to judge who is or is not a bad actor. We cannot judge intent. What we can do is identify clusters of actions that appear to share intent, score the likelihood of attracting regulatory attention and measure the effect that the actions had on the market. The rest is in the hands of the market participants.

The detection of spoofing is, of course, not the only application for machine learning in the financial markets. In addition to our firm, there are a number of solution providers who have begun to apply it to difficult problems.

On the software side, Palantir has done a great deal of work on risk indicators and detection tools to protect firms from rogue traders, financial fraud and cyber threats. Their solutions seek to separate actionable intelligence from noise, using an interactive process of machine-driven modeling and analysis. 

On the hardware side, Nervana Systems has developed a cloud computing solution that allows deep learning solutions to better handle large-scale datasets. In other words, they are embedding intelligent software inside computer hardware so that data can be stored, accessed and moved far faster than with conventional hardware.

[Chart: Trading Activity: Searching for a Needle in a Haystack]
  • MarketVoice
  • Technology