TY  - JOUR
T1  - Benchmark Dataset for Short-Term Market Prediction of Limit Order Book in China Markets
JF  - The Journal of Financial Data Science
DO  - 10.3905/jfds.2021.1.074
SP  - jfds.2021.1.074
AU  - Charles Huang
AU  - Weifeng Ge
AU  - Hongsong Chou
AU  - Xin Du
Y1  - 2021/09/06
UR  - https://pm-research.com/content/early/2021/09/05/jfds.2021.1.074.abstract
N2  - Limit order books (LOBs) generate large volumes of financial data that both the academic community and industry practitioners use for analysis and prediction. This article presents a benchmark LOB dataset from the Chinese stock market, covering a few thousand stocks for the period of June to September 2020. Experiment protocols are designed for model performance evaluation: at the end of every second, the model forecasts the upcoming volume-weighted average price change and volume over 12 horizons ranging from 1 second to 300 seconds. Results based on a linear regression model and deep learning models are compared. A practical short-term trading strategy framework based on the generated alpha signal is presented. The data and code are available on GitHub (github.com/HKGSAS). TOPICS: Security analysis and valuation, emerging markets, big data/machine learning, performance measurement. Key Findings: ▪ Researchers lack a benchmark high-frequency LOB dataset and model against which to objectively assess prediction performance; this article serves to bridge that gap. ▪ A more practically effective set of features is proposed to capture both LOB snapshots and periodic data. The prediction target in the published literature, the mid-price direction change over the next few events, is likewise too simplistic and is not suitable for a practical trading strategy. The authors propose to predict the price change and volume magnitude over 12 short-term horizons. ▪ This article compares the performance of a baseline linear regression model and state-of-the-art deep learning models, based on both accuracy statistics and trading profits.
ER  - 