The selection of models for an ensemble is problem dependent. In the first article, “Ensemble Meta-Labeling,” Dennis Thumm, Paolo Barucca, and Jacques Francois Joubert develop a framework for choosing model architectures for meta-labeling and investigate the incorporation of ensembles. Their framework demonstrates how ensembles achieve better generalization performance and explains how they improve regime detection and extraction, leading to greater model robustness. Furthermore, the authors show how ensembles increase model efficiency by decreasing the rate of false positives, thus increasing the meta premium. Their article builds on two articles published in 2022 in this journal: “Meta-Labeling Architecture” by Meyer, Joubert, and Mesias, which proposed various meta-labeling architectures, including ensembles, and “Meta-Labeling: Theory and Framework” by Joubert, which described an experiment and performance attribution for meta-labeling. Within the ensemble framework, the authors find that LightGBM and homogeneous dynamically selected ensembles (with a random forest classifier) offer the most promising results. Given their strong performance and superior robustness, the authors recommend that practitioners begin with these before branching out into heterogeneous pools, depending on the sophistication of their primary model.
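The core mechanics can be illustrated compactly. In a minimal sketch of the meta-labeling idea (not the authors' specific implementation), the secondary model's target is 1 when acting on the primary model's side would have been profitable, and an ensemble of secondary models trades only when its averaged confidence clears a threshold, which is how false positives are filtered out:

```python
import numpy as np

def meta_labels(primary_side, realized_return):
    """Target for the secondary (meta) model: 1 if acting on the
    primary model's side (+1 long, -1 short) would have been profitable."""
    return (np.asarray(primary_side) * np.asarray(realized_return) > 0).astype(int)

def ensemble_filter(member_probs, threshold=0.5):
    """Average the secondary-model ensemble members' probabilities and
    act only when the ensemble is confident, reducing false positives.
    `member_probs` has shape (n_members, n_signals)."""
    avg = np.mean(member_probs, axis=0)
    return (avg > threshold).astype(int)
```

Any classifier (LightGBM, a random forest, or a dynamically selected pool) can serve as the ensemble members; the sketch only shows the labeling and gating logic.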
In “Relevance-Based Prediction: A Transparent and Adaptive Alternative to Machine Learning,” Megan Czasonis, Mark Kritzman, and David Turkington propose a new approach to prediction based on relevance, a measure of the importance of an observation to a prediction, and fit, a measure of the reliability of a specific prediction task. Using a regression algorithm that they propose, which they call “CKT regression,” the authors demonstrate how relevance-based prediction can address the codependence of observations and variables by identifying the optimal combination of observations and predictive variables for any given prediction task. Relevance-based prediction, they argue, compares favorably to linear regression analysis because CKT regression is more transparent and adapts efficiently to asymmetry between predictive variables and outcomes. It also has important advantages with respect to machine learning, the authors claim, owing to its greater transparency and flexibility and because it is less arbitrary than commonly used machine learning algorithms.
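The flavor of the approach can be sketched without reproducing the published algorithm. The following is a hedged, simplified illustration of relevance weighting in the spirit of the authors' framework (the details of CKT regression differ): each past observation's relevance to the current circumstance is built from Mahalanobis-based similarity and informativeness, and the prediction averages outcomes over the most relevant subset.

```python
import numpy as np

def relevance_weighted_prediction(X, y, x_t, keep_frac=0.5):
    """Simplified sketch, not the published CKT regression: weight past
    observations by Mahalanobis-based relevance to the current
    circumstance x_t, then average outcomes over the most relevant
    fraction of the sample."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    mu = X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))

    def maha(a, b):
        d = a - b
        return float(d @ inv_cov @ d)

    # Similarity: closer circumstances are more relevant.
    sim = np.array([-0.5 * maha(xi, np.asarray(x_t, float)) for xi in X])
    # Informativeness: unusual circumstances carry more signal.
    info = np.array([0.5 * maha(xi, mu) for xi in X])
    relevance = sim + info

    # Partial sample: keep only the most relevant observations.
    k = max(1, int(keep_frac * len(y)))
    idx = np.argsort(relevance)[-k:]
    w = relevance[idx] - relevance[idx].min() + 1e-12
    return float(np.average(y[idx], weights=w))
```

Because the weights are explicit, one can inspect exactly which historical observations drove a given prediction, which is the transparency argument in a nutshell.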
Asset managers are searching for methods to integrate environmental, social, and governance (ESG) criteria into portfolio selection. While text sources such as company statements, press releases, and regulatory disclosures are available for manually extracting ESG data, doing so can be expensive and inconsistent due to human interpretation. In “ESG Text Classification: An Application of the Prompt-Based Learning Approach,” Zhengzheng Yang, Le Zhang, Xiaoyun Wang, and Yubo Mai show how prompt-based learning, a cutting-edge natural language processing (NLP) technique, can be applied to classify textual data into ESG and non-ESG categories. More specifically, using data from Refinitiv, the authors create a prompt-based ESG classifier and benchmark it against a traditional pre-train-and-fine-tune classifier using statistical tests. Based on their experiments, they report that the prompt-based approach outperforms traditional fine-tuning and can perform well even when the amount of labeled data is small.
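The mechanics of prompt-based classification are worth making concrete. In a hedged illustration (the template, verbalizer words, and scoring interface below are hypothetical, not the authors'), classification is recast as a fill-in-the-blank task: a cloze template wraps the input text, and a verbalizer maps candidate words for the masked slot to class labels, so a pretrained masked language model can be used with little or no fine-tuning.

```python
# Hypothetical cloze template: the [MASK] slot is filled by a masked LM.
TEMPLATE = "{text} This statement concerns [MASK] topics."

# Verbalizer: candidate mask words mapped to the two class labels.
VERBALIZER = {
    "environmental": "ESG",
    "social": "ESG",
    "governance": "ESG",
    "general": "non-ESG",
}

def classify(text, mask_word_scores):
    """`mask_word_scores` maps candidate words to the (hypothetical)
    masked-LM scores for the [MASK] slot of TEMPLATE.format(text=text).
    The label of the highest-scoring verbalizer word wins."""
    best = max(VERBALIZER, key=lambda w: mask_word_scores.get(w, float("-inf")))
    return VERBALIZER[best]
```

Because the pretrained model already "knows" the verbalizer words, this setup needs far fewer labeled examples than training a classification head from scratch, which is the intuition behind the small-data result.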
In credit risk modeling, emphasis is placed on building predictive yet intuitive and explainable risk models. In “An Integrated Framework on Human-in-the-Loop Risk Analytics,” Peng Liu proposes a constrained and partially regularized logistic regression (CPR-LR) model in the context of credit scoring. The proposed model is designed to flexibly incorporate user preferences and constraints on coefficient sign and feature importance. To provide sufficient transparency and user control in the model development process while ensuring decent predictive performance, every constraint in the model is explicitly added as either a soft or a hard constraint. Running experiments on several benchmark datasets, Liu demonstrates the advantages of the proposed model with respect to predictive performance.
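To make the idea of a hard sign constraint concrete, here is a minimal sketch (not the CPR-LR model itself, whose formulation is in the article): logistic regression fit by gradient descent, where after each step the coefficients are projected onto the user-specified sign orthant, so a feature the analyst believes should only increase risk can never receive a counterintuitive coefficient.

```python
import numpy as np

def fit_sign_constrained_logit(X, y, signs, lr=0.1, n_iter=2000, l2=0.0):
    """Sketch of hard sign constraints via projected gradient descent on
    the logistic loss. `signs[j]` is +1 (coefficient must be >= 0),
    -1 (must be <= 0), or 0 (unconstrained)."""
    n, p = X.shape
    signs = np.asarray(signs)
    w, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        pred = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        err = pred - y
        w -= lr * (X.T @ err / n + l2 * w)
        b -= lr * err.mean()
        # Hard constraint: project onto the feasible sign orthant.
        w = np.where(signs > 0, np.maximum(w, 0.0), w)
        w = np.where(signs < 0, np.minimum(w, 0.0), w)
    return w, b
```

A soft version of the same preference would instead add a penalty for sign violations to the loss, trading intuitiveness against fit rather than enforcing it absolutely.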
An innovative approach to building a real-time geopolitical risk index from news data using textual analysis is introduced by Matthias Apel, André Betzer, and Bernd Scherer in their article “Point-in-Time Language Model for Geopolitical Risk Events.” With little input required, the proposed method generates point-in-time dictionaries of terms related to political tension. Selecting topic-related news articles based on these dictionaries, the authors construct a global media attention index from country-by-country data in order to identify different dimensions of geopolitical risk. Their findings suggest that topic identification and news index construction may benefit from time-dependent dictionary generation and show that their approach can approximate the results of other, more supervised methods.
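The index-construction step can be sketched in its most naive form. Assuming (hypothetically) that a dictionary of tension-related terms is already in hand, a simple attention index for each day is the share of that day's articles containing at least one dictionary term; the authors' contribution, by contrast, lies in generating those dictionaries point-in-time rather than fixing them ex ante.

```python
import re

def attention_index(daily_articles, dictionary):
    """Naive sketch of a media-attention index: for each day, the share
    of articles containing at least one dictionary term.
    `daily_articles` is a list of days, each a list of article strings."""
    terms = {t.lower() for t in dictionary}
    index = []
    for articles in daily_articles:
        hits = sum(
            1 for text in articles
            if set(re.findall(r"[a-z']+", text.lower())) & terms
        )
        index.append(hits / len(articles) if articles else 0.0)
    return index
```

Run country by country, a series like this becomes one column of a cross-country attention panel from which risk dimensions can be extracted.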
It is well established through empirical evidence that financial time series exhibit autocorrelation, nonstationarity, and nonlinearity. Gabriel Borrageiro, Nick Firoozye, and Paolo Barucca, in “Online Learning with Radial Basis Function Networks,” report on experiments that demonstrate the added value provided by feature selection, nonlinear modeling, and online learning when making multihorizon forecasts. By combining feature representation transfer with sequential optimization to produce multihorizon return forecasts, their online-learning radial basis function network (RBFNet) outperforms a random-walk baseline and several powerful batch learners.
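The basic machinery, stripped of the authors' transfer and multihorizon components, looks as follows: Gaussian RBF units provide a fixed nonlinear feature representation, and a linear readout is updated one observation at a time (here with a plain least-mean-squares rule, a simplifying assumption), which is what lets the model track nonstationary data.

```python
import numpy as np

class OnlineRBFNet:
    """Minimal sketch: Gaussian RBF features with a linear readout
    trained online, one observation at a time."""

    def __init__(self, centers, width, lr=0.1):
        self.centers = np.asarray(centers, float)  # shape (n_centers, n_inputs)
        self.width = width
        self.lr = lr
        self.w = np.zeros(len(self.centers) + 1)   # +1 for the bias term

    def _features(self, x):
        d2 = np.sum((self.centers - np.asarray(x, float)) ** 2, axis=1)
        return np.concatenate([np.exp(-d2 / (2 * self.width ** 2)), [1.0]])

    def predict(self, x):
        return float(self.w @ self._features(x))

    def update(self, x, target):
        # Least-mean-squares step on the linear readout only.
        phi = self._features(x)
        self.w += self.lr * (target - self.w @ phi) * phi
```

In a walk-forward loop one would call `predict` before each new observation arrives and `update` after, so the network never sees the future it is forecasting.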
The complex nature of blockchain and the inability to clearly measure and properly communicate the risks associated with this new technology have hampered the development, growth, proper regulation, and, ultimately, the truly beneficial societal contributions of blockchains. In an attempt to clarify the confusion about blockchain risks, Jiarui Chen, Qilin QiQi Tong, Himanshu Verma, Avinash Sharma, Anton Dahbura, and Jim Kyung-Soo Liew offer some initial thoughts in their article “The Complexity of Blockchain Risks Simplified and Displayed: Introduction of the Johns Hopkins Blockchain Risk Map.” The authors introduce a new, independent, and academically rigorous attempt to both measure and add transparency to the complex dimensions of blockchain risks. Their article presents the risk map prototype, their current multidimensional exhibit of risks across the various stakeholders, and their modest progress to date, with some data on their current risk measures.
It is challenging to visualize a financial time-series dataset, not only because it is usually high dimensional but also because the data contain a great deal of noise that can obscure the patterns of interest. In “Visualizing Structures in Financial Time-Series Datasets through Affinity-Based Diffusion Transition Embedding,” Rui Ding tackles the problem of visualizing financial assets in a low-dimensional embedding (generally a two-dimensional graph). The author proposes a modification of a diffusion transition embedding algorithm (PHATE) adapted to financial time-series data. The new embedding algorithm, which the author calls FATE, is based on distance metrics that are meaningful for time-series data. Applying FATE to multiple stock-return datasets and synthetic time-series data, the author finds that the resulting visualizations reveal meaningful local and global structures underlying the data, and concludes that the choice of distance metric is critical to the kind of structure that can be uncovered.
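The general pipeline behind such methods can be sketched in a few lines. The following is a diffusion-maps-style illustration, not the author's FATE algorithm: a time-series-appropriate distance (here, correlation distance between return series) is converted to an affinity matrix, normalized into a diffusion operator, and eigendecomposed to obtain low-dimensional coordinates.

```python
import numpy as np

def diffusion_embedding(returns, sigma=0.5, t=2, dim=2):
    """Diffusion-maps-style sketch (not the author's exact FATE): embed
    assets in `dim` dimensions from a correlation-distance affinity.
    `returns` has shape (n_periods, n_assets)."""
    corr = np.corrcoef(returns, rowvar=False)
    # Correlation distance: a metric that is meaningful for time series.
    dist = np.sqrt(np.maximum(2.0 * (1.0 - corr), 0.0))
    A = np.exp(-(dist / sigma) ** 2)            # Gaussian affinity
    P = A / A.sum(axis=1, keepdims=True)        # row-stochastic diffusion operator
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    # Skip the trivial constant eigenvector; scale by eigenvalue^t.
    return np.column_stack([
        (vals.real[order[k]] ** t) * vecs.real[:, order[k]]
        for k in range(1, dim + 1)
    ])
```

Swapping the correlation distance for, say, a dynamic-time-warping distance changes which structures the embedding reveals, which is precisely the author's point about the choice of metric.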
Francesco A. Fabozzi
Managing Editor
© 2023 Pageant Media Ltd