Slow Momentum with Fast Reversion: A Trading Strategy Using Deep Learning and Changepoint Detection

Kieran Wood, Stephen Roberts and Stefan Zohren
The Journal of Financial Data Science Winter 2022, jfds.2021.1.081; DOI: https://doi.org/10.3905/jfds.2021.1.081
Kieran Wood is a DPhil student with the Machine Learning Research Group and the Oxford-Man Institute of Quantitative Finance at the University of Oxford in Oxford, UK.

Stephen Roberts is the RAEng/Man Professor of Machine Learning at the University of Oxford and the director of the Oxford-Man Institute of Quantitative Finance in Oxford, UK.

Stefan Zohren is an associate professor (research) with the Machine Learning Research Group and the Oxford-Man Institute of Quantitative Finance at the University of Oxford in Oxford, UK.

Abstract

Momentum strategies are an important part of alternative investments and are at the heart of the work of commodity trading advisors. These strategies have, however, been found to have difficulties adjusting to rapid changes in market conditions, such as during the 2020 market crash. In particular, immediately after momentum turning points, when a trend reverses from an uptrend (downtrend) to a downtrend (uptrend), time-series momentum strategies are prone to making bad bets. To improve the responsiveness to regime change, the authors introduce a novel approach, in which they insert an online changepoint detection (CPD) module into a deep momentum network pipeline, which uses a long short-term memory deep-learning architecture to simultaneously learn both trend estimation and position sizing. Furthermore, their model is able to optimize the way in which it balances (1) a slow momentum strategy that exploits persisting trends but does not overreact to localized price moves and (2) a fast mean-reversion strategy regime by quickly flipping its position and then swapping back again to exploit localized price moves. The CPD module outputs a changepoint location and severity score, allowing the model to learn to respond to varying degrees of disequilibrium, or smaller and more localized changepoints, in a data-driven manner. The authors back test their model over the period 1995–2020, and the addition of the CPD module leads to a 33% improvement in the Sharpe ratio. The module is especially beneficial in periods of significant nonstationarity; in particular, over the most recent years tested (2015–2020), the performance boost is approximately 66%. This is especially interesting because traditional momentum strategies underperformed in this period.

Key Findings

  • Momentum strategies, including deep learning–based deep momentum networks, have underperformed in recent years owing to difficulties in adjusting to rapid changes in the market, such as when a trend reverses from an uptrend to a downtrend, or vice versa.

  • Inserting an online changepoint detection module into a deep momentum network pipeline leads to large performance gains, especially during periods of significant nonstationarity, as observed in recent years.

  • The model achieves superior risk-adjusted returns by blending a slow momentum strategy with a fast mean-reversion strategy, with the changepoint detection module helping to balance the two in a data-driven manner.

Time-series momentum (TSMOM) (Moskowitz, Ooi, and Pedersen 2012) strategies are derived from the philosophy that strong price trends tend to persist. These trends have been observed to hold across a range of timescales, asset classes, and time periods (Lempérière et al. 2014; Baz et al. 2015; Hurst, Ooi, and Pedersen 2017). Momentum strategies are often referred to as follow the winner because it is assumed that winners will continue to be winners in the subsequent period.

Momentum strategies are an important part of alternative investments and are at the heart of the work of commodity trading advisors. Much effort goes into quantifying the magnitude of trends (Bruder et al. 2013; Baz et al. 2015; Levine and Pedersen 2016) and sizing traded positions accordingly (Kim, Tse, and Wald 2016; Baltas and Kosowski 2017; Harvey et al. 2018). Rather than using handcrafted techniques to identify trends and select positions, Lim, Zohren, and Roberts (2019) introduced deep momentum networks (DMNs), in which a long short-term memory (LSTM) (Hochreiter and Schmidhuber 1997) deep learning architecture achieves this task by directly optimizing on the Sharpe ratio of the signal. Deep learning has been widely used for time-series forecasting (Lim and Zohren 2020), achieving a high level of accuracy across various fields, including the field of finance for daily data (Bao, Yue, and Rao 2017; Gu, Kelly, and Xiu 2017; Lim, Zohren, and Roberts 2019; Kim 2019; Poh et al. 2021) and in a high-frequency setting, using limit order book data (Sirignano and Cont 2018; Zhang, Zohren, and Roberts 2019). In recent years, implementation of such deep learning models has been made accessible via extensive open-source frameworks such as TensorFlow (Abadi et al. 2015) and PyTorch (Paszke et al. 2017).

Momentum strategies aim to capitalize on persisting price trends; however, occasionally these trends break down, which we label momentum turning points. At these turning points, momentum strategies are prone to performing poorly because they are unable to adapt quickly to this abrupt change in regime. This concept is explored by Garg et al. (2021), who blended a slow momentum signal based on a long lookback window (LBW), such as 12 months, with a fast momentum signal based on a short LBW, such as 1 month. This approach is a balancing act between reducing noise and being quick enough to respond to turning points. Adopting the terminology from Garg et al. (2021), a bull or bear market is when the two momentum signals agree on a long or short position, respectively. If slow momentum suggests a long (short) position and fast momentum a short (long) position, we term this a correction (rebound) phase.

Correction and rebound phases, in which the momentum assumption breaks down, are examples of mean reversion (De Bondt and Thaler 1985; Poterba and Summers 1988; Jegadeesh 1991) regimes. Mean-reversion trading strategies, often referred to as follow the loser strategies, assume losers (winners) over some LBW will be winners (losers) in the subsequent period. If we observe the positions taken by a DMN, we see that, alongside exploiting persisting trends, the model also exploits fluctuations in return data at a shorter time horizon by regularly flipping its position and then quickly changing back again. We argue that the high Sharpe ratio achieved by DMNs can be largely attributed to this fast mean-reversion property.

Changepoint detection (CPD) is a field that involves the identification of abrupt changes in sequential data, in which the generative parameters for our model after the changepoint are independent of those that come before. The nonstationarity of real-world time series in fields such as finance, robotics, and sensor data has led to a plethora of research in this field. To respond to changepoints in real time, we require an online algorithm, which processes each data point as it becomes available, as opposed to offline algorithms that consider the entire dataset at once and detect changepoints retrospectively. First introduced by Adams and MacKay (2007), Bayesian approaches to online CPD, which naturally accommodate noisy, uncertain, and incomplete time-series data, have proven to be very successful. Assuming a changepoint model of the parameters, the Bayesian approach integrates out the uncertainty for these parameters as opposed to using a point estimate. Gaussian processes (GPs) (Williams and Rasmussen 1996; Rasmussen 2003), which are collections of random variables, any finite number of which have a joint Gaussian distribution, are well suited to time-series modeling (Roberts et al. 2013). GPs are often referred to as Bayesian nonparametric models and have the ability to handle changepoints (Garnett et al. 2010; Saatçi, Turner, and Rasmussen 2010; Lloyd et al. 2014). Rather than comparing slow and fast momentum signals to detect regime change, we use GPs as a more principled method for detecting momentum turning points. For our experiments, we use the Python package GPflow (Matthews et al. 2017) to build Gaussian process models, which leverage the TensorFlow framework.

In this article, we introduce a novel approach, in which we add an online CPD module to a DMN pipeline to improve overall strategy returns. By incorporating the CPD module, we optimize our response to momentum turning points in a data-driven manner by passing outputs from the module as inputs to a DMN, which in turn learns trading rules and optimizes positions based on a financial objective function, such as the Sharpe ratio (Sharpe 1994). This approach helps to correctly identify when we are in a bull or bear market and select the momentum strategy accordingly. With the addition of the CPD module, the new model learns how to exploit, but not overreact to, noise at a shorter time scale. Our strategy is able to exploit the fast reversion we observe in DMNs but effectively balance this with a slow momentum strategy and improve returns across an entire bull or bear regime. Effectively, the new pipeline has more knowledge of how to respond to abrupt changes, or a lack of changes, in a data-driven way.

We argue that a changepoint is an artificial construct that can have varying degrees of severity and is dependent on choices such as the length of the lookback horizon. Rather than specifying regimes based on some criterion or threshold, we use our CPD module to quantify, or score, the level of disequilibrium, allowing the model to consider smaller or more localized regime changes. The length of the LBW is the most sensitive design choice for the CPD module—if the lookback horizon is too long, we miss smaller but still potentially significant regime changes, and if the horizon is too short, the data become too noisy and are of little value. We introduce the LBW length as a structural hyperparameter that we optimize using the outer optimization loop of our model. This allows the module to be more tightly coupled with our LSTM module, thus helping us to maximize the efficiency of the CPD and allowing us to tweak the LSTM hyperparameters in conjunction with the LBW.

It can be noted that the performance of DMNs, without CPD, deteriorates in more recent years. The deterioration in performance is especially notable in the 2015–2020 period, which exhibits a greater degree of turbulence, or disequilibrium, than the preceding years. One possible explanation for the deterioration of momentum strategies in recent years is the concept of factor crowding, which is discussed in depth by Baltas (2019), who argued that arbitrageurs inflict negative externalities on one another. By using the same models, and hence taking the same positions, a coordination problem is created, pushing the price away from fundamentals. It is argued that momentum strategies are susceptible to this scenario. Impressively, the addition of a CPD module helps to alleviate the deterioration in performance, and our model significantly outperforms the standard DMN model during the 2015–2020 period. A similar phenomenon can be observed from around 2003, when electronic trading was becoming more common and the deep learning–based strategies started to significantly outperform classic TSMOM strategies.

CHANGEPOINT DETECTION USING GAUSSIAN PROCESSES

A classic univariate regression problem of the form y(x) = f(x) + ϵ, where ϵ is an additive noise process, has the goal of evaluating the function f and the probability distribution p(y*|x*) of some point y* given some x*. Our daily time-series data, for asset i, consist of a sequence of observations for (closing) price $\{p^{(i)}_t\}_{t=1}^{T}$, up to time T. Because financial time series are nonstationary in the mean, for each time t we take the first difference of the time series, otherwise known as the arithmetic returns,

$$r^{(i)}_{t-1,t} = p^{(i)}_t - p^{(i)}_{t-1}, \qquad (1)$$

in an attempt to remove any linear trend in the mean. Throughout this article, for brevity, we will refer to $r_{t-1,t}$ simply as $r_t$. For the purposes of CPD, it is not computationally feasible, nor is it necessary, to consider the entire time series; hence, we consider the series $\{r^{(i)}_t\}_{t=T-l+1}^{T}$, with lookback horizon l from time T. For every CPD window, with time steps $t \in \{T-l+1, \dots, T\}$, we standardize our returns as

$$\hat{r}^{(i)}_t = \frac{r^{(i)}_t - \mu_r}{\sigma_r}, \qquad (2)$$

where $\mu_r$ and $\sigma_r$ are the sample mean and standard deviation of returns over the window. This step is taken for two reasons: We can assume that the mean over our window is zero, and with unit variance, we have more consistency across all windows when we run our CPD module.

Our approach to changepoint detection involves fitting a curve to input–output pairs $\{(t, \hat{r}^{(i)}_t)\}_{t=T-l+1}^{T}$ via GP regression (Rasmussen 2003). GP regression is a probabilistic, nonparametric method, popular in the fields of machine learning and time-series analysis (Roberts et al. 2013). It is a kernel-based technique in which the prior over functions is specified by a covariance function kξ(·,·), which is in turn parameterized by a set of hyperparameters ξ. In its common guise, the GP has a stationary kernel; however, it should be noted that GPs can readily work well even when the time series is nonstationary (Brahim-Belhouari and Bermak 2004). We define the GP as a distribution over functions where

$$y(x) \sim \mathcal{GP}\big(0,\; k_\xi(x, x') + \sigma_n^2\,\delta(x, x')\big), \qquad (3)$$

given noise variance σn, which helps to deal with noisy outputs that are uncorrelated.

Rizvi (2018) and Liu, Kiskin, and Roberts (2020) demonstrated that a Matérn 3/2 kernel is a good choice of covariance function for noisy financial data, which tend to be highly nonsmooth and not infinitely differentiable. This problem setting favors the least smooth of the Matérn family of kernels, which is the 3/2 kernel. We parametrize our Matérn 3/2 kernel as

$$k_M(x, x') = \sigma_h^2\left(1 + \frac{\sqrt{3}\,|x - x'|}{\lambda}\right)\exp\left(-\frac{\sqrt{3}\,|x - x'|}{\lambda}\right), \qquad (4)$$

with kernel hyperparameters ξM = (λ, σh, σn), where λ is the input scale and σh the output scale. We define our covariance matrix for a set of locations x = [x1, x2, …, xn] as

$$\mathbf{K}_\xi(\mathbf{x}, \mathbf{x}) = \big[k_\xi(x_i, x_j)\big]_{i,j=1}^{n}. \qquad (5)$$

Using the observations $\mathbf{y} = [y_1, \dots, y_n]^\top$ at locations $\mathbf{x}$, we integrate out the function variables to give the marginal likelihood $p(\mathbf{y}\,|\,\mathbf{x}, \xi)$, with $\mathbf{y}\,|\,\mathbf{x}, \xi \sim \mathcal{N}\big(\mathbf{0}, \mathbf{K}_\xi(\mathbf{x}, \mathbf{x})\big)$. Because the posterior $p(\xi\,|\,\mathbf{y}, \mathbf{x})$ is intractable, we instead apply Bayes' rule

$$p(\xi\,|\,\mathbf{y}, \mathbf{x}) = \frac{p(\mathbf{y}\,|\,\mathbf{x}, \xi)\,p(\xi)}{p(\mathbf{y}\,|\,\mathbf{x})} \qquad (6)$$

and perform type II maximum likelihood on $p(\mathbf{y}\,|\,\mathbf{x}, \xi)$. We minimize the negative log marginal likelihood:

$$\mathrm{nlml}_\xi = -\log p(\mathbf{y}\,|\,\mathbf{x}, \xi) = \tfrac{1}{2}\mathbf{y}^\top \mathbf{K}_\xi^{-1}\mathbf{y} + \tfrac{1}{2}\log\big|\mathbf{K}_\xi\big| + \tfrac{n}{2}\log 2\pi. \qquad (7)$$

We use the GPflow framework to compute the hyperparameters ξ, which in turn uses the L-BFGS-B optimization algorithm (Zhu et al. 1997) via the scipy.optimize.minimize function.
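As a concrete illustration, the following minimal sketch fits the Matérn 3/2 GP of Equations 3–5 and extracts the negative log marginal likelihood of Equation 7. It assumes GPflow 2.x, and the toy window of standardized returns is a stand-in for one CPD window:

```python
import gpflow
import numpy as np

# Toy stand-in for one CPD window of standardized returns (assumption).
l = 21
X = np.arange(l, dtype=np.float64).reshape(-1, 1)     # time indices
Y = np.random.default_rng(0).standard_normal((l, 1))  # standardized returns r_hat

# Matérn 3/2 kernel; the GPR likelihood adds the sigma_n^2 noise term of Equation 3.
matern = gpflow.kernels.Matern32()
model_m = gpflow.models.GPR(data=(X, Y), kernel=matern)

# Type II maximum likelihood via L-BFGS-B (scipy.optimize.minimize under the hood).
gpflow.optimizers.Scipy().minimize(
    model_m.training_loss, model_m.trainable_variables, method="L-BFGS-B"
)
nlml_matern = model_m.training_loss().numpy()  # nlml for the Matérn fit (Equation 7)
```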

Garnett et al. (2010) and Roberts et al. (2013) assumed that our function of interest is well behaved, except for a drastic change, or changepoint, at c ∈ {t − l + 1, t − l + 2, …, t − 1}, after which all observations before c are completely uninformative about the observations after this point. It is important to note that the LBW l for this approach needs to be prespecified, and it is assumed that it contains a single changepoint. Each of the two regions is described by different covariance functions kξ1, kξ2, in our case Matérn 3/2 kernels, which are parameterized by hyperparameters ξ1 and ξ2, respectively. The region-switching kernel is

$$k_R(x, x') = \begin{cases} k_{\xi_1}(x, x') & x, x' < c \\ k_{\xi_2}(x, x') & x, x' \geq c \\ 0 & \text{otherwise,} \end{cases} \qquad (8)$$

with a full set of hyperparameters ξR = {ξ1, ξ2, c, σn}. Here, a changepoint can take multiple forms, with these cases being a drastic change in covariance, a sudden change in the input scale, or a sudden change in the output scale. In the context of financial time series, we can think of these cases as a change in correlation length, a change in mean-reversion length, or a change in volatility.

It is computationally inefficient to fit 2(l − 1) GPs, to minimize nlmlξR as in Equation 7, owing to the introduction of the discrete hyperparameter c. We instead borrow an idea from Lloyd et al. (2014) and approximate the abrupt change of covariance in Equation 8 using a sigmoid function σ(x) = 1/(1 + e−s(x−c)), writing σ(x, x′) = σ(x)σ(x′) and σ̄(x, x′) = (1 − σ(x))(1 − σ(x′)). Here, c ∈ (t − l, t) is the changepoint location, and s > 0 is the steepness parameter. Our changepoint kernel is

$$k_C(x, x') = \bar\sigma(x, x')\,k_{\xi_1}(x, x') + \sigma(x, x')\,k_{\xi_2}(x, x'), \qquad (9)$$

with a full set of hyperparameters ξC = {ξ1, ξ2, c, s, σn}. We can compute nlmlξC by optimizing the parameters of a single GP, which is significantly more efficient than computing nlmlξR, despite having additional hyperparameters. This new kernel has the added benefit of capturing more gradual transitions from one covariance function to another, owing to the addition of the steepness parameter s. We implement Equation 9 in GPflow via the gpflow.kernels.ChangePoints class, adding the constraint c ∈ (t − l, t), which is not enforced by default.
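A minimal sketch of the changepoint fit, continuing from the X, Y window above and again assuming GPflow 2.x; the bounded transform that would enforce c ∈ (t − l, t) is omitted for brevity:

```python
# Two Matérn 3/2 kernels, one per regime, blended by a sigmoid (Equation 9).
k1 = gpflow.kernels.Matern32()
k2 = gpflow.kernels.Matern32()
changepoint = gpflow.kernels.ChangePoints(
    [k1, k2], locations=[float(l) / 2.0], steepness=1.0  # c initialized mid-window
)

model_c = gpflow.models.GPR(data=(X, Y), kernel=changepoint)
gpflow.optimizers.Scipy().minimize(
    model_c.training_loss, model_c.trainable_variables, method="L-BFGS-B"
)
nlml_changepoint = model_c.training_loss().numpy()  # nlml for the changepoint fit
```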

To quantify the level of disequilibrium, we look at the reduction in negative log marginal likelihood achieved via the introduction of the changepoint kernel hyperparameters through comparison to nlmlξM. If the introduction of additional hyperparameters leads to no reduction in negative log marginal likelihood, then the level of disequilibrium is low. Conversely, a large reduction indicates significant disequilibrium, or a stronger changepoint, because the data are better described by two covariance functions. Our changepoint score $\nu^{(i)}_t$ and location $\gamma^{(i)}_t$ are

$$\nu^{(i)}_t = \max\!\left(0,\; \frac{\mathrm{nlml}_{\xi_M} - \mathrm{nlml}_{\xi_C}}{\mathrm{nlml}_{\xi_M}}\right), \qquad \gamma^{(i)}_t = \frac{t - c}{l}, \qquad (10)$$

both of which are normalized, which helps to improve the stability and performance of our LSTM module.
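Putting the two fits together, a sketch of the score computation follows. The normalization here is our reading of the text (the score as the relative reduction in nlml, floored at zero, and the location measured in window lengths), so treat these lines as an assumption rather than the authors' verbatim definition:

```python
# Changepoint severity: relative reduction in negative log marginal likelihood
# (assumed normalization; see Equation 10).
score = max(0.0, (nlml_matern - nlml_changepoint) / nlml_matern)

# Changepoint location, normalized by the lookback window length (assumption).
c_hat = float(model_c.kernel.locations.numpy()[0])
gamma = (X[-1, 0] - c_hat) / l  # steps since the changepoint, in window lengths
```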

Exhibit 1 shows plots of daily returns for the S&P 500 composite ratio-adjusted continuous futures contract during the first quarter of 2020, in which returns have been standardized as per Equation 2. The top plot fits a GP using the Matérn 3/2 kernel, and the bottom uses the changepoint kernel specified in Equation 9. The shaded blue region covers ±2 standard deviations from the mean, and we can see that the top plot is dominated by the white noise term σn ≈ 1. The black dotted line indicates the location of the changepoint hyperparameter c after minimizing negative log marginal likelihood, which aligns with the COVID-19 market crash. The negative log marginal likelihood is reduced from 88.0 to 47.9, which corresponds to a high changepoint score $\nu^{(i)}_t$.

EXHIBIT 1

Fitting the Matérn 3/2 (top) and Changepoint (bottom) Kernels to Daily Return Data

MOMENTUM STRATEGIES REVIEW

Classical Strategies

In this article, we focus on univariate time-series approaches (Moskowitz, Ooi, and Pedersen 2012), as opposed to cross-sectional (Jegadeesh and Titman 1993) strategies, which trade assets against each other and select a portfolio based on relative ranking. Volatility scaling (Kim, Tse, and Wald 2016; Harvey et al. 2018) has been shown to play a crucial role in the positive performance of TSMOM strategies, including deep learning strategies (Lim, Zohren, and Roberts 2019). We scale the returns of each asset by its volatility so that each asset has a similar contribution to the overall portfolio returns, ensuring that our strategy targets a consistent amount of risk. The consistency over time and across assets has the added benefit of allowing us to benchmark strategies. Targeting an annualized volatility σtgt, which we take to be 15% in this article, the realized return of our strategy from day t to t + 1 is

$$r^{\mathrm{TSMOM}}_{t,t+1} = \frac{1}{N}\sum_{i=1}^{N} X^{(i)}_t\,\frac{\sigma_{\mathrm{tgt}}}{\sigma^{(i)}_t}\,r^{(i)}_{t,t+1}, \qquad (11)$$

where $X^{(i)}_t$ is our position size, N the number of assets in our portfolio, and $\sigma^{(i)}_t$ the ex ante volatility estimate of the ith asset. We compute $\sigma^{(i)}_t$ using a 60-day exponentially weighted moving standard deviation.
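A pandas sketch of this scaling, under the assumption that ewm(span=60) is an acceptable stand-in for the paper's 60-day exponentially weighted standard deviation:

```python
import numpy as np
import pandas as pd

def tsmom_returns(returns: pd.DataFrame, positions: pd.DataFrame,
                  vol_target: float = 0.15) -> pd.Series:
    """Equation 11: average volatility-scaled captured return across assets.

    returns:   daily returns, one column per asset
    positions: positions X_t in [-1, 1], decided at the close of day t
    """
    # Ex ante volatility: 60-day EWM standard deviation, annualized.
    ex_ante_vol = returns.ewm(span=60).std() * np.sqrt(252)
    # Position and volatility known at t are applied to the return over (t, t+1].
    captured = positions.shift(1) * (vol_target / ex_ante_vol.shift(1)) * returns
    return captured.mean(axis=1)
```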

The simplest trading strategy against which we benchmark performance is long only, which always selects the maximum position $X^{(i)}_t = 1$. The original article on time-series momentum (Moskowitz, Ooi, and Pedersen 2012), which we will refer to as Moskowitz, selects a position as $X^{(i)}_t = \mathrm{sgn}\big(r^{(i)}_{t-252,t}\big)$, where we are using the volatility scaling framework and $r_{t-252,t}$ is the annual return. In an attempt to react more quickly to momentum turning points, Garg et al. (2021) blended a slow signal based on annual returns and a fast signal based on monthly returns to give an intermediate strategy:

$$X^{(i)}_t = (1 - w)\,\mathrm{sgn}\big(r^{(i)}_{t-252,t}\big) + w\,\mathrm{sgn}\big(r^{(i)}_{t-21,t}\big) \qquad (12)$$

We control the relative contribution of the fast and slow signal via w ∈ [0, 1], with the case w = 0 corresponding to the Moskowitz strategy. We additionally use moving average convergence/divergence (MACD) (Baz et al. 2015) as a benchmark; for details on the implementation, we invite the reader to see Lim, Zohren, and Roberts (2019).
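For concreteness, a sketch of the intermediate position of Equation 12; the (1 − w)/w weighting is implied by the statement that w = 0 recovers the Moskowitz strategy:

```python
def intermediate_position(prices: pd.Series, w: float) -> pd.Series:
    """Blend slow (annual) and fast (monthly) momentum signs (Equation 12)."""
    slow = np.sign(prices / prices.shift(252) - 1.0)  # sign of 12-month return
    fast = np.sign(prices / prices.shift(21) - 1.0)   # sign of 1-month return
    return (1.0 - w) * slow + w * fast
```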

Deep Learning

We adopt a number of key choices that lead to the improved performance of DMNs.

LSTM architecture. Of the deep-learning architectures assessed by Lim, Zohren, and Roberts (2019), the LSTM (Hochreiter and Schmidhuber 1997) architecture yields the best results. LSTM is a special kind of recurrent neural network (RNN) (Goodfellow, Bengio, and Courville 2016), initially proposed to address the vanishing and exploding gradient problem (Bengio, Simard, and Frasconi 1994). An RNN takes an input sequence and, through the use of a looping mechanism in which information can flow from one step to another, can be used to transform this into an output sequence while taking into account contextual information in a flexible way. An LSTM operates with cells, which store both short-term memory and long-term memory, using gating mechanisms to summarize and filter information. Internal memory states are sequentially updated with new observations at each step. The resulting model has fewer trainable parameters, is able to learn representations of long-term relationships, and typically achieves better generalization results.

Trading signal and position sizing. Trading signals are learned directly by DMNs, removing the need to manually specify both the trend estimator and the mapping from this estimate into a position. The output of the LSTM is followed by a time-distributed, fully connected layer with a tanh(·) activation function, which is a squashing function that directly outputs positions $X^{(i)}_t \in (-1, 1)$. The advantage of this approach is that we learn trading rules and position sizing directly from the data. Once our network parameters θ have been trained via backpropagation (LeCun et al. 2012), our LSTM architecture g(·; θ) takes input features $\mathbf{u}^{(i)}_t$ for all time steps in the LSTM looking back from time T with τ steps and directly outputs a sequence of positions:

$$\big[X^{(i)}_{T-\tau+1}, \dots, X^{(i)}_{T}\big] = g\big(\mathbf{u}^{(i)}_{T-\tau+1}, \dots, \mathbf{u}^{(i)}_{T};\, \theta\big) \qquad (13)$$

In an online prediction setting, only the final position in the sequence, $X^{(i)}_T$, is of relevance to our strategy.
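A minimal Keras sketch of such an architecture; the feature count, hidden width, and dropout rate are placeholders for the values tuned in the outer optimization loop (Exhibit 7):

```python
import tensorflow as tf

def build_dmn(seq_len: int = 63, n_features: int = 10,
              hidden: int = 64, dropout: float = 0.3) -> tf.keras.Model:
    """LSTM deep momentum network: feature sequences in, position sequences out.

    n_features = 10 assumes 5 normalized returns, 3 MACD indicators, and the
    CPD score and location; hidden and dropout are placeholder hyperparameters.
    """
    inputs = tf.keras.Input(shape=(seq_len, n_features))
    x = tf.keras.layers.LSTM(hidden, return_sequences=True, dropout=dropout)(inputs)
    # Time-distributed dense layer with tanh squashes each output into (-1, 1).
    positions = tf.keras.layers.TimeDistributed(
        tf.keras.layers.Dense(1, activation="tanh"))(x)
    return tf.keras.Model(inputs, positions)
```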

Loss function. It has been observed (Potters and Bouchaud 2016) that correctly predicting the direction of a stock move does not translate directly into a positive strategy return, because the driving moves can often be large but infrequent. Furthermore, we want to account for trade-offs between risk and reward; hence, we explicitly optimize networks for risk-adjusted performance metrics. One such metric used by DMNs is the Sharpe ratio (Sharpe 1994), which calculates the return per unit of volatility. Our Sharpe loss function is

$$\mathcal{L}_{\mathrm{Sharpe}}(\theta) = -\frac{\mathrm{mean}_{\Omega}\big(R^{(i)}_{t,t+1}\big)\,\sqrt{252}}{\mathrm{std}_{\Omega}\big(R^{(i)}_{t,t+1}\big)}, \quad \text{with } R^{(i)}_{t,t+1} = X^{(i)}_t\,\frac{\sigma_{\mathrm{tgt}}}{\sigma^{(i)}_t}\,r^{(i)}_{t,t+1}, \qquad (14)$$

where Ω is the set of all asset–time pairs {(i, t)|i ∈ {1, 2, …, N}, t ∈ {T − τ + 1, …, T}}. Automatic differentiation is used to compute gradients for backpropagation (Goodfellow, Bengio, and Courville 2016), which explicitly optimizes networks for our chosen performance metric.
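A Keras-compatible sketch of this loss, where y_pred holds the position sequence and y_true the corresponding volatility-scaled next-day returns; that tensor pairing is our assumption about how the data would be fed:

```python
def sharpe_loss(y_true: tf.Tensor, y_pred: tf.Tensor) -> tf.Tensor:
    """Negative annualized Sharpe ratio over all asset-time pairs (Equation 14)."""
    captured = y_true * y_pred  # position times volatility-scaled return
    mean = tf.reduce_mean(captured)
    std = tf.math.reduce_std(captured)
    return -mean * tf.sqrt(252.0) / (std + 1e-9)  # epsilon guards division by zero
```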

Model inputs. For each time step, our model can benefit from inputting signals from various time scales. We normalize returns as $r^{(i)}_{t-t',t}\big/\big(\sigma^{(i)}_t\sqrt{t'}\big)$, given a time offset of t′ days. We use offsets t′ ∈ {1, 21, 63, 126, 252}, corresponding to daily, monthly, quarterly, semiannual, and annual returns. We also encode additional information by inputting MACD indicators (Baz et al. 2015). MACD is a volatility-normalized moving-average convergence–divergence signal, defining the relationship between a short and a long signal. For implementation details, please refer to Lim, Zohren, and Roberts (2019). We use short/long timescale pairs in {(8, 24), (16, 48), (32, 96)}. We can think of these indicators as performing a function similar to a convolutional layer.
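A sketch of the feature construction; the EWM spans and rolling normalizations follow the Baz et al. (2015) recipe only loosely (the exact half-life conventions are in Lim, Zohren, and Roberts 2019), so treat the MACD lines as an approximation:

```python
def build_features(prices: pd.Series, ex_ante_vol: pd.Series) -> pd.DataFrame:
    feats = {}
    # Volatility-normalized returns at several horizons.
    for offset in (1, 21, 63, 126, 252):
        r = prices / prices.shift(offset) - 1.0
        feats[f"ret_{offset}"] = r / (ex_ante_vol * np.sqrt(offset))
    # Approximate MACD indicators for each short/long timescale pair.
    for short, long in ((8, 24), (16, 48), (32, 96)):
        m = prices.ewm(span=short).mean() - prices.ewm(span=long).mean()
        q = m / prices.rolling(63).std()  # normalize by 63-day price std
        feats[f"macd_{short}_{long}"] = q / q.rolling(252).std()
    return pd.DataFrame(feats)
```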

TRADING STRATEGY

Strategy Definition

Because we are using a data-driven approach, we split our training data as a first step, setting aside the first 90% for training and the last 10% for validation for each asset. We calibrate our model using the training data by optimizing on the Sharpe loss function (Equation 14) via minibatch stochastic gradient descent (SGD), using the Adam (Kingma and Ba 2015) optimizer. We observe validation loss after each epoch, which is a full pass of the data, to determine convergence. We also use the validation set for the outer optimization loop, in which we tune our model hyperparameters. The hyperparameter optimization process is detailed in Appendix B.

It is necessary to precompute the CPD location $\gamma^{(i)}_t$ and severity $\nu^{(i)}_t$ parameters as detailed by Equation 10. We do this for each time–asset pair in our training and validation set. It is necessary to do this for a chosen l ∈ {10, 21, 63, 126, 252}, corresponding to two weeks, a month, a quarter, half a year, and a full year. We selected these LBW sizes to correspond to input return timescales, with the exception of the 10-day LBW, which was selected to be as close to daily return data as reasonably possible. We reinitialize our Matérn 3/2 kernel for each time step, with all hyperparameters set to 1. This approach was found to be more stable than borrowing parameters from the previous time step. For our changepoint kernel, we initialize the location hyperparameter c to the midpoint of the window and s = 1. All other parameters are initialized as the equivalent parameter from fitting the Matérn 3/2 kernel, initializing kξ1 and kξ2 with the same values. In the rare case this process fails, we try again by reinitializing all changepoint kernel parameters to 1, with the exception of the location c, which is again set to the window midpoint. In the event the module still fails for a given time step, we fill the outputs $\nu^{(i)}_t$ and $\gamma^{(i)}_t$ using the outputs from the previous time step, noting that we need to increment the changepoint location by an additional step.

For each LSTM input, we pass in the normalized returns from the different time scales, our MACD indicators, and CPD severity and location for a chosen l. We can either fix l for our strategy or introduce it as a structural hyperparameter, which is tuned by the outer optimization loop. By doing this, we have information exchange from our CPD module all the way through to our Sharpe ratio loss function and traded positions. Once our model has been fully trained, we can run it online by computing the CPD module for the most recent data points and then using our LSTM module to select positions to hold for the next day for each asset.

Experiments via Backtesting

For all of our experiments, we used a portfolio of 50 liquid, continuous futures contracts over the period 1990–2020. The combination of commodities, equities, fixed income, and FX futures was selected to make up a well-balanced portfolio. The data were extracted from the Pinnacle Data Corp. CLC database (Pinnacle Data Corp. 2021), and the selected futures contracts are listed in Appendix A. All of the selected assets have less than 10% of data missing.

To back test our model, we use an expanding window approach, in which we start by using 1990–1995 for training/validation and then test out of sample on the period 1995–2000. With each successive iteration, we expand the training/validation window by an additional five years, perform the hyperparameter optimization again, and test on the subsequent five-year period. Data were not available from 1990 for every asset, and we only use an asset if there is enough data available in the validation set for at least one LSTM sequence. All of our results are recorded as an average of the test windows. We test our LSTM with the CPD strategy using an LBW l ∈ {10, 21, 63, 126, 252} and then with the optimized l for each window, based on validation loss.
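A schematic of this expanding-window protocol; split_90_10, tune_and_fit, and evaluate are hypothetical helpers standing in for the procedures described above and in Appendix B:

```python
# Expanding-window backtest: train/validate on all data before each test window.
TEST_WINDOWS = [(1995, 2000), (2000, 2005), (2005, 2010), (2010, 2015), (2015, 2020)]

results = []
for test_start, test_end in TEST_WINDOWS:
    train_val = data[data.index.year < test_start]  # expanding history from 1990
    train, val = split_90_10(train_val)             # hypothetical helper
    model = tune_and_fit(train, val)                # random search + refit
    test = data[(data.index.year >= test_start) & (data.index.year < test_end)]
    results.append(evaluate(model, test))           # out-of-sample metrics
```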

We benchmark our strategy against those we have discussed, in which we choose w ∈ {0, 0.5, 1} for the intermediate strategy. We also compare our strategy to a DMN that does not have the CPD module. To maintain consistency with previous work by Lim, Zohren, and Roberts (2019), we benchmark strategies on

  1. profitability, through annualized returns and percentage of positive captured returns;

  2. risk, through annualized volatility, annualized downside deviation, and maximum drawdown (MDD); and

  3. risk-adjusted performance, through annualized Sharpe, Sortino, and Calmar ratios.

We provide results both for the raw signal output and with an additional layer of volatility rescaling to the 15% target, for ease of comparison between strategies. It should be noted that this article selects a more realistic 50-asset portfolio instead of the full 88 assets previously selected by Lim, Zohren, and Roberts (2019). We focus on the raw predictive power of the model and do not account for transaction costs at this stage; however, this is a simple adjustment and can easily be incorporated into the loss function. We include details and analysis of transaction costs in Appendix C. For further information on the implementation and the effects of transaction costs, please refer to Lim, Zohren, and Roberts (2019).

RESULTS AND DISCUSSION

Our aggregated out-of-sample prediction results, averaged across all five-year windows from 1995–2020, are recorded in Exhibit 3 and again in Exhibit 4 using volatility rescaling. We plot the effect of CPD LBW size on the average Sharpe ratio in Exhibit 2 and demonstrate how optimizing on this as a hyperparameter can improve overall performance. Impressively, due to our GP framework for CPD, we are able to achieve superior results with limited data and hence very small LBWs. There is a notable performance boost from only a two-week LBW, and performance almost maxes out after only one month, with an LBW of one quarter leading to the highest Sharpe ratio. As we approach an LBW of one year, we lose the benefit of the CPD module because it places too much emphasis on larger changepoints that are further in the past. We also note that the CPD computation becomes more intensive for l ∈ {126, 252}. If we introduce LBW as a hyperparameter to be reevaluated as the training window continues to expand, we observe an additional 4% increase in the Sharpe ratio, leading to a total increase of 33% over the LSTM baseline.

EXHIBIT 2

Risk-Adjusted Strategy Returns for Different Changepoint LBW Lengths

EXHIBIT 3

Strategy Performance Benchmark—Raw Signal Output

EXHIBIT 4

Strategy Performance Benchmark—Rescaled to Target Volatility of 15%

Another idea involved passing in outputs from multiple CPD modules with different LBWs in parallel as inputs to the LSTM. This was not found to improve the model and actually resulted in degraded performance. Multiple LBWs could be useful if using a more complex deep learning architecture than LSTM.

In Exhibit 5, we observe slow momentum and fast reversion strategies happening simultaneously. By introducing CPD, we are able to achieve superior returns because we are better able to learn the timing of these strategies and when to place more emphasis on one of them, using a data-driven approach. These plots examine the positions our DMN takes for single assets during periods of regime change, providing a comparison of a DMN with and without the CPD module. The top plots track the daily closing price, with the alternating white and gray regions indicating regimes separated by significant changepoints. CPD is performed online with a 63-day LBW, with a changepoint indicated when the severity score $\nu^{(i)}_t$ exceeds a threshold, chosen separately for each plot. Each case uses a 63-day burn-in time before we can classify a subsequent changepoint. The middle plots compare moving averages of the position size taken over a long timescale of one year, indicated by the solid lines, and a shorter timescale of one month, indicated by the dashed lines. The bottom plots indicate cumulative returns for each strategy. The plots on the left look at the FTSE 100 Index in the lead-up to the 2008 financial crash and its aftermath. With the addition of CPD, our strategy is able to exploit persisting trends with better timing. It is quicker to react to the first dip in 2008, taking short positions to exploit the bear market with a slow momentum strategy, and it similarly adapts to the bull market established in 2009 by moving more quickly to a long strategy. Both approaches exhibit a fast-reverting strategy; however, with the addition of CPD, the strategy is slightly less aggressive with positions taken in response to localized changes. The plots on the right look at the British pound exchange rate in the lead-up to the Brexit vote in 2016 and its aftermath. Here, the bull and bear regimes are both less defined, and there is a higher level of nonstationarity. With the addition of the CPD module, our model takes a much more conservative slow momentum strategy and instead opts to focus more on achieving positive returns via a fast mean-reverting strategy.

EXHIBIT 5

Slow Momentum and Fast Reversion Strategies Happening Simultaneously

Our results demonstrate that, via the introduction of the CPD module, we outperform the standard DMN in all performance metrics. Our model correctly classifies the direction of the return more often and has a higher average profit-to-loss ratio. We can see that the CPD module helps to reduce risk, thus reducing volatility, downside deviation, and MDD while still achieving slightly higher raw returns. This translates to an improvement in risk-adjusted performance, improving the Sortino ratio by 35% and the Calmar ratio by 25%. These metrics suggest that the CPD module makes our model more robust to market crashes. We observe an improvement in the Sharpe ratio, our target metric, of 33%, which translates to an improvement of 130% in comparison to the best-performing TSMOM strategy.

We plot the raw and rescaled signals to benchmark strategies in Exhibit 6. The plot on the left, of raw signal output, demonstrates that via the introduction of the CPD module, we are able to reduce the strategy volatility, especially during the market nonstationarity of more recent years. With the exception of long only, we omit the reference strategies in this plot to avoid clutter. The plot on the right, of signal with rescaled volatility, demonstrates that our strategy outperforms all benchmarks with risk-adjusted performance. We show intermediate strategy output for w ∈ {0, 0.5, 1}. We can see the difficulties of trying to address regime change with handcrafted techniques such as the intermediate w = 0.5, which in our experiments actually fails to outperform the w = 0 Moskowitz strategy on all risk-adjusted performance ratios.

EXHIBIT 6

Benchmarking Strategy Performance

EXHIBIT 7

Hyperparameter Search Range

NOTES: *CPD LBW length can be either a hyperparameter or fixed.

EXHIBIT 8

Looking at the Impact of Increasing Average Transaction Cost C from 0 to 5 bps and Comparing with Long-Only Benchmark (dashed line)

We note that up until about 2003, when the uptake of electronic trading was becoming much more widespread, the traditional TSMOM and MACD strategies are comparable to the results achieved via the LSTM DMN architecture. At this point, the LSTM starts to significantly outperform these traditional strategies until more recent years, when we see volatility increase and performance, especially risk-adjusted performance, drop significantly. This drop in performance can be largely attributed to increased market nonstationarity. Impressively, with the addition of the CPD module, our DMN pipeline continues to perform well even during the market nonstationarity of the 2015–2020 period. Using five repeated trials of the entire experiment, with and without CPD, the average improvement for the Sharpe ratio in this period is 70%, for LBW l = 21.

CONCLUSIONS

We have demonstrated that the introduction of an online CPD module is a simple, yet effective, way to significantly improve model performance, specifically for DMNs. Our model is able to blend different strategies at different timescales, learning to do so in a data-driven manner directly based on our desired risk-adjusted performance metric. In periods of stability, our model is able to achieve superior returns by focusing on slow momentum while exploiting but not overreacting to local mean reversion. The impressive performance increase in periods of nonstationarity, such as recent years, can be attributed to the fact that we (1) can effectively incorporate CPD online with a very short LBW, because we do so using GPs, and (2) pass the changepoint score $\nu^{(i)}_t$ from our CPD module to the DMN, helping our model learn how to respond to varying degrees of disequilibrium. As a result, we enhance performance in such conditions, in which we observe a more conservative slow momentum strategy with a focus on fast mean reversion.

Future work includes incorporating a CPD module into other deep learning architectures or performing CPD on a model representation as opposed to model inputs. The work in this article has natural parallels to the field of continual learning, which is a paradigm whereby an agent sequentially learns new tasks. Another direction of work will involve using continual learning for momentum trading, in which CPD is used to determine task boundaries.

ACKNOWLEDGMENT

We would like to thank the Oxford-Man Institute of Quantitative Finance for financial and computing support.

APPENDIX A

EXHIBIT A1

Dataset Details

APPENDIX B

EXPERIMENT DETAILS

We split our data into training and validation datasets using a 90%/10% split. We winsorize our data by limiting them to be within five times their exponentially weighted moving (EWM) standard deviations from their EWM average, using a 252-day half-life. We calibrate our model using the training data by optimizing on the Sharpe loss function via minibatch SGD, using the Adam optimizer. We limit our training to 300 epochs, with an early stopping patience of 25 epochs, meaning training is terminated if there is no decrease in validation loss during this time period. The model is implemented via the Keras API in TensorFlow. Our LSTM sequence length was set to 63 for all experiments. For training and validation, in an attempt to prevent overfitting, we split our data into non-overlapping sequences, rather than using a sliding window approach. A stateless LSTM is used, meaning the last state from the previous batch is not used as the initial state for the subsequent batch. Keeping the order of each individual sequence intact, we shuffle the order in which each sequence appears in an epoch. We employ dropout regularization (Srivastava et al. 2014) as another technique to avoid overfitting, applying it to LSTM inputs and outputs.
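A sketch of this training setup in Keras, reusing the hypothetical build_dmn and sharpe_loss from the sketches above; the learning rate and batch size are stand-ins for the tuned values in Exhibit 7:

```python
model = build_dmn(seq_len=63)  # hypothetical constructor from the DMN sketch above
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss=sharpe_loss)  # Sharpe loss (Equation 14)

# Early stopping: terminate after 25 epochs with no validation-loss improvement.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=25,
                                              restore_best_weights=True)

# Non-overlapping sequences; shuffle the order of sequences, not within them.
model.fit(train_x, train_y, validation_data=(val_x, val_y),
          epochs=300, batch_size=64, shuffle=True, callbacks=[early_stop])
```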

We tune our hyperparameters, with options listed in Exhibit 7, using an outer optimization loop. We achieve this via 50 iterations of random grid search to identify the optimal model. We perform the full experiment for each choice of CPD LBW length and then use the model that achieved the lowest validation loss for the optimized CPD model.

APPENDIX C

TRANSACTION COSTS

In Exhibit 8, we demonstrate the impact of transaction costs on our raw signal, in which we increase the average transaction cost from 0 to 5 bps. The black dotted line indicates the long-only reference. Our strategy outperforms classical strategies for transaction costs of up to 2 bps, at which point it rapidly deteriorates owing to the fast-reverting component. We note that a larger CPD LBW size becomes favorable as we increase C. We suspect this is because the model focuses on larger long-term changepoints and favors slow momentum over fast reversion. For average transaction costs greater than 1 bp, we suggest incorporating turnover-adjusted returns into the loss function (Equation 14). This adjustment is detailed by Lim, Zohren, and Roberts (2019), who demonstrated that it works well when transaction costs are high. Assuming an average transaction cost of C, we calculate turnover-adjusted returns as

$$\tilde{R}^{(i)}_{t,t+1} = X^{(i)}_t\,\frac{\sigma_{\mathrm{tgt}}}{\sigma^{(i)}_t}\,r^{(i)}_{t,t+1} \;-\; C\,\sigma_{\mathrm{tgt}}\left|\frac{X^{(i)}_t}{\sigma^{(i)}_t} - \frac{X^{(i)}_{t-1}}{\sigma^{(i)}_{t-1}}\right| \qquad (\mathrm{C1})$$

REFERENCES

  1. Abadi, M., Agarwal, A., Barham, P., et al. 2015. "TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems." https://www.tensorflow.org/.
  2. Adams, R. P., and MacKay, D. J. C. 2007. "Bayesian Online Changepoint Detection." arXiv 0710.3742.
  3. Baltas, N. 2019. "The Impact of Crowding in Alternative Risk Premia Investing." Financial Analysts Journal 75 (3): 89–104.
  4. Baltas, N., and Kosowski, R. 2017. "Demystifying Time-Series Momentum Strategies: Volatility Estimators, Trading Rules and Pairwise Correlations." SSRN, https://ssrn.com/abstract=2140091.
  5. Bao, W., Yue, J., and Rao, Y. 2017. "A Deep Learning Framework for Financial Time Series Using Stacked Autoencoders and Long-Short Term Memory." PLOS ONE 12 (7): 1–24.
  6. Baz, J., Granger, N., Harvey, C. R., Le Roux, N., and Rattray, S. 2015. "Dissecting Investment Strategies in the Cross Section and Time Series." SSRN, https://ssrn.com/abstract=2695101.
  7. Bengio, Y., Simard, P., and Frasconi, P. 1994. "Learning Long-Term Dependencies with Gradient Descent Is Difficult." IEEE Transactions on Neural Networks 5 (2): 157–166.
  8. Brahim-Belhouari, S., and Bermak, A. 2004. "Gaussian Process for Nonstationary Time Series Prediction." Computational Statistics & Data Analysis 47 (4): 705–712.
  9. Bruder, B., Dao, T. L., Richard, J. C., and Roncalli, T. 2013. "Trend Filtering Methods for Momentum Strategies." SSRN, https://ssrn.com/abstract=2289097.
  10. De Bondt, W. F. M., and Thaler, R. 1985. "Does the Stock Market Overreact?" The Journal of Finance 40 (3): 793–805.
  11. Garg, A., Goulding, C. L., Harvey, C. R., and Mazzoleni, M. 2021. "Momentum Turning Points." SSRN, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3489539.
  12. Garnett, R., Osborne, M. A., Reece, S., Rogers, A., and Roberts, S. J. 2010. "Sequential Bayesian Prediction in the Presence of Changepoints and Faults." The Computer Journal 53 (9): 1430–1446.
  13. Goodfellow, I., Bengio, Y., and Courville, A. 2016. Deep Learning. Cambridge, MA: MIT Press.
  14. Gu, S., Kelly, B. T., and Xiu, D. 2017. "Empirical Asset Pricing via Machine Learning." Research paper no. 18-04, Chicago Booth, https://ssrn.com/abstract=3159577.
  15. Harvey, C. R., Hoyle, E., Korgaonkar, R., Rattray, S., Sargaison, M., and Van Hemert, O. 2018. "The Impact of Volatility Targeting." SSRN, https://ssrn.com/abstract=3175538.
  16. Hochreiter, S., and Schmidhuber, J. 1997. "Long Short-Term Memory." Neural Computation 9 (8): 1735–1780.
  17. Hurst, B., Ooi, Y. H., and Pedersen, L. H. 2017. "A Century of Evidence on Trend-Following Investing." The Journal of Portfolio Management 44 (1): 15–29.
  18. Jegadeesh, N. 1991. "Seasonality in Stock Price Mean Reversion: Evidence from the US and the UK." The Journal of Finance 46 (4): 1427–1444.
  19. Jegadeesh, N., and Titman, S. 1993. "Returns to Buying Winners and Selling Losers: Implications for Stock Market Efficiency." The Journal of Finance 48 (1): 65–91.
  20. Kim, S. 2019. "Enhancing the Momentum Strategy through Deep Regression." Quantitative Finance: 1–13.
  21. Kim, A. Y., Tse, Y., and Wald, J. K. 2016. "Time Series Momentum and Volatility Scaling." Journal of Financial Markets 30: 103–124.
  22. Kingma, D., and Ba, J. 2015. "Adam: A Method for Stochastic Optimization." International Conference on Learning Representations.
  23. LeCun, Y. A., Bottou, L., Orr, G. B., and Muller, K. R. 2012. "Efficient BackProp." In Neural Networks: Tricks of the Trade, pp. 9–48. Berlin: Springer.
  24. Lempérière, Y., Deremble, C., Seager, P., Potters, M., and Bouchaud, J.-P. 2014. "Two Centuries of Trend Following." Journal of Investment Strategies 3 (3): 41–61.
  25. Levine, A., and Pedersen, L. H. 2016. "Which Trend Is Your Friend." Financial Analysts Journal 72 (3).
  26. Lim, B., and Zohren, S. 2020. "Time Series Forecasting with Deep Learning: A Survey." arXiv 2004.13408.
  27. Lim, B., Zohren, S., and Roberts, S. 2019. "Enhancing Time-Series Momentum Strategies Using Deep Neural Networks." The Journal of Financial Data Science 1 (4): 19–38.
  28. Liu, B., Kiskin, I., and Roberts, S. 2020. "An Overview of Gaussian Process Regression for Volatility Forecasting." 2020 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), pp. 681–686. IEEE.
  29. Lloyd, J. R., Duvenaud, D., Grosse, R., Tenenbaum, J. B., and Ghahramani, Z. 2014. "Automatic Construction and Natural-Language Description of Nonparametric Regression Models." Proceedings of the AAAI Conference on Artificial Intelligence 28 (1).
  30. Matthews, A. G. G., van der Wilk, M., Nickson, T., Fujii, K., Boukouvalas, A., Leon-Villagra, P., Ghahramani, Z., and Hensman, J. 2017. "GPflow: A Gaussian Process Library Using TensorFlow." Journal of Machine Learning Research 18 (40): 1–6.
  31. Moskowitz, T. J., Ooi, Y. H., and Pedersen, L. H. 2012. "Time Series Momentum." Journal of Financial Economics 104 (2): 228–250.
  32. Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., and Lerer, A. 2017. "Automatic Differentiation in PyTorch." Autodiff Workshop, Conference on Neural Information Processing Systems (NIPS).
  33. Pinnacle Data Corp. 2021. Pinnacle Data Corp. CLC Database. https://pinnacledata2.com/clc.html.
  34. Poh, D., Lim, B., Zohren, S., and Roberts, S. 2021. "Building Cross-Sectional Systematic Strategies by Learning to Rank." The Journal of Financial Data Science 3 (2): 70–86.
  35. Poterba, J. M., and Summers, L. H. 1988. "Mean Reversion in Stock Prices: Evidence and Implications." Journal of Financial Economics 22 (1): 27–59.
  36. Potters, M., and Bouchaud, J.-P. 2016. "Trend Followers Lose More Than They Gain." Wilmott Magazine.
  37. Rasmussen, C. E. 2003. "Gaussian Processes in Machine Learning." In Summer School on Machine Learning, pp. 63–71. New York: Springer.
  38. Rizvi, S. A. A. 2018. "Analysis of Financial Time Series Using Non-Parametric Bayesian Techniques." PhD thesis, University of Oxford.
  39. Roberts, S., Osborne, M., Ebden, M., Reece, S., Gibson, N., and Aigrain, S. 2013. "Gaussian Processes for Time-Series Modelling." Philosophical Transactions of the Royal Society A 371: 20110550.
  40. Saatçi, Y., Turner, R. D., and Rasmussen, C. E. 2010. "Gaussian Process Change Point Models." ICML.
  41. Sharpe, W. F. 1994. "The Sharpe Ratio." The Journal of Portfolio Management 21 (1): 49–58.
  42. Sirignano, J., and Cont, R. 2018. "Universal Features of Price Formation in Financial Markets: Perspectives from Deep Learning." SSRN, https://ssrn.com/abstract=3141294.
  43. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. 2014. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting." Journal of Machine Learning Research 15: 1929–1958.
  44. Williams, C. K. I., and Rasmussen, C. E. 1996. "Gaussian Processes for Regression." NeurIPS Proceedings.
  45. Zhang, Z., Zohren, S., and Roberts, S. 2019. "DeepLOB: Deep Convolutional Neural Networks for Limit Order Books." IEEE Transactions on Signal Processing.
  46. Zhu, C., Byrd, R. H., Lu, P., and Nocedal, J. 1997. "Algorithm 778: L-BFGS-B: Fortran Subroutines for Large-Scale Bound-Constrained Optimization." ACM Transactions on Mathematical Software 23 (4): 550–560.