## Abstract

In this article, the authors present a conceptual framework named *adaptive seriational risk parity* (ASRP) to extend hierarchical risk parity (HRP) as an asset allocation heuristic. The first step of HRP (quasi-diagonalization), determining the hierarchy of assets, is required for the actual allocation done in the second step (recursive bisectioning). In the original HRP scheme, this hierarchy is found using single-linkage hierarchical clustering of the correlation matrix, which is a static tree-based method. The authors compare the performance of the standard HRP with other static and adaptive tree-based methods, as well as seriation-based methods that do not rely on trees. Seriation is a broader concept allowing reordering of the rows or columns of a matrix to best express similarities between the elements. Each discussed variation leads to a different time series reflecting portfolio performance using a 20-year backtest of a multi-asset futures universe. Unsupervised learning based on these time series creates a taxonomy that groups the strategies in high correspondence to the construction hierarchy of the various types of ASRP. Performance analysis of the variations shows that most of the static tree-based alternatives to HRP outperform the single-linkage clustering used in HRP on a risk-adjusted basis. Adaptive tree methods show mixed results, and most generic seriation-based approaches underperform.

**Key Findings**

▪ The authors introduce the *adaptive seriational risk parity* (ASRP) framework as a hierarchy of decisions to implement the quasi-diagonalization step of hierarchical risk parity (HRP) with seriation-based and tree-based variations as alternatives to single linkage. Tree-based variations are further separated in static and adaptive versions. Altogether, 57 variations are discussed and connected to the literature.

▪ Backtests of the 57 different HRP-type asset allocation variations applied to a multi-asset futures universe lead to a correlation matrix of the resulting 57 portfolio return time series. This portfolio return correlation matrix can be visualized as a dendrogram using single-linkage clustering. The correlation hierarchy reflected by the dendrogram is similar to the construction hierarchy of the quasi-diagonalization step. Most seriation-based strategies seem to underperform HRP on a risk-adjusted basis. Most static tree-based variations outperform HRP, whereas adaptive tree-based methods show mixed results.

▪ The presented variations fit into a triple artificial intelligence approach to connect synthetic data generation with explainable machine learning. This approach generates synthetic market data in the first step. The second step applies an HRP-type portfolio allocation approach as discussed in this article. The third step uses a model-agnostic explanation such as the SHAP framework to explain the resulting performance with features of the synthetic market data and with model selection in the second step.

To achieve more robust portfolios, there is a growing effort to replace mean–variance optimization in allocation models. Many modeling approaches reflect the hierarchical correlation dynamics of financial markets. Such approaches have gained much attention in the academic and quantitative investment community, and many extensions and modifications have been proposed in recent years.

Recent approaches for simulating realistic financial correlation matrixes explicitly address hierarchy as stylized facts (see Huettner and Mai 2019; Marti 2019; Jaeger et al. 2021; Papenbrock et al. 2021). Modeling the correlation hierarchy of markets has also been used for the recognition of market regimes (Papenbrock and Schwendner 2015).

Related approaches model market complexity as networks or use partitional/flat clustering. Examples include work by Onnela et al. (2003), Lohre, Papenbrock, and Poonia (2014), and Baitinger and Papenbrock (2015, 2017). In this article, however, we focus on purely hierarchical approaches. In recent years, new algorithms for portfolio construction explicitly take the hierarchy of financial markets into account, acknowledging that the behavior of the financial markets is similar to a complex system. These new methods address three major concerns about quadratic optimizers: instability, concentration, and out-of-sample underperformance. Some of these hierarchical methods are even optimization free, recognizing traditional optimizers as being vulnerable because they require a well-conditioned covariance matrix. These types of sampling errors stemming from historical time series data are well documented (e.g., Pafka and Kondor 2002; Menchero and Ji 2019).

According to López de Prado (2016a), there are numerical instabilities caused by both noise (i.e., when the dimension of the covariance matrix outnumbers observations) and signal (the source of this instability is distinct and unrelated to noise). In financial markets, correlation clusters exist as a consequence of hierarchical relationships destabilizing optimization. Therefore, López de Prado (2016b) proposed an optimization-free, heuristic algorithm called hierarchical risk parity (HRP). Such portfolios can outperform traditional approaches out of sample.

This article starts by discussing the origins and properties of hierarchical approaches such as HRP. Second, we highlight practical implementation issues and discuss underlying assumptions. Third, we introduce variations, extensions, and generalizations in a conceptual framework labeled *adaptive seriational risk parity* (ASRP).

In an empirical study, we backtest the discussed ASRP variations using a multi-asset universe of 17 liquid futures markets. We further analyze the correlation hierarchy of the resulting portfolio return time series in a dendrogram and show the risk-adjusted returns relative to the HRP strategy. The risk-adjusted return measures are validated with bootstrapping. Finally, we suggest embedding the discussed allocation variations in a *triple artificial intelligence* (AI) approach to validate investment strategies.

## ORIGINS OF THE HIERARCHICAL APPROACHES

The idea of grouping stocks in hierarchical ways dates back to the 1960s (for a short introduction, see Papenbrock 2011). A seminal work on hierarchical clustering of financial markets was presented by Mantegna (1999), followed by more than two decades of research on correlations, hierarchies, networks, and clustering in financial markets (see Marti et al. 2017 for a very comprehensive overview).

Some papers focus on cleaning or filtering the correlation or covariance matrix, with a special focus on hierarchical clustering. Based on that input, in a second step, a global minimum variance portfolio is built (e.g., Tola et al. 2008,^{1} and Bongiorno and Challet 2020). These approaches could be called hierarchical minimum variance (HMV). A second stream of literature employs factor models to identify hierarchies (Tumminello, Lillo, and Mantegna 2007; Avellaneda 2019).

This article focuses on outright utilization of hierarchical clustering in portfolio allocation. Hierarchical models can build portfolios bottom-up or top-down. A top-down example is the so-called waterfall approach by Papenbrock (2011) that was further discussed by Raffinot (2017). It is a machine learning (ML) approach that uses a distance matrix based on a group of asset return time series and learns a hierarchical representation using a clustering procedure that is visualized as a hierarchical tree or dendrogram. It reveals the hierarchical distance or proximity among assets in a first step. Thereafter, it allocates capital according to hierarchical splits from top to bottom in a waterfall-like style: At each binary split, it allocates capital 50/50 downward to the leaves of the tree. In the following, we will call this procedure *hierarchical equal weight* (HEW). The HEW approach implicitly assumes each splitting point in the cluster tree of two clusters on the left and right branch exhibits the same risk level. Therefore, the splitting of capital allocation at each consecutive branch in the top-down hierarchy assumes risk parity among the two branch clusters.
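The top-down 50/50 allocation described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of Papenbrock (2011): the function name `hew_weights`, the use of SciPy's `linkage` and `to_tree`, the single-linkage default, and the correlation distance sqrt(0.5(1 − ρ)) are all assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.spatial.distance import squareform


def hew_weights(corr, method="single"):
    """Hierarchical equal weight: split capital 50/50 at every tree node."""
    # Assumed distance transform: d = sqrt(0.5 * (1 - rho)).
    dist = np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, None))
    np.fill_diagonal(dist, 0.0)
    root = to_tree(linkage(squareform(dist, checks=False), method=method))

    weights = np.zeros(corr.shape[0])
    stack = [(root, 1.0)]  # (tree node, capital arriving at that node)
    while stack:
        node, capital = stack.pop()
        if node.is_leaf():
            weights[node.id] = capital
        else:
            # Each binary split passes half of the capital to each branch.
            stack.append((node.get_left(), capital / 2.0))
            stack.append((node.get_right(), capital / 2.0))
    return weights


# Toy example: assets 0 and 1 are highly correlated, asset 2 is not.
corr = np.array([[1.0, 0.9, 0.1],
                 [0.9, 1.0, 0.2],
                 [0.1, 0.2, 1.0]])
weights = hew_weights(corr)
```

In this toy example the two highly correlated assets share one branch and receive 25% each, while the third asset receives 50%.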

These approaches can be varied in a number of ways: Different investors may choose different linkage types of the clustering algorithms, representing differently implemented strategies. For example, single-linkage hierarchical clustering exhibits a chaining effect, whereas a Ward-based hierarchical clustering tends to create clusters of similar sizes (Ward 1963). The results are based on the correlation coefficient so that individual risk measures such as variance or reward measures (e.g., expected returns) may enter the model. The distance matrix could incorporate correlation- and risk-level information on the assets.

López de Prado (2016b) proposed a similar method called HRP. The author introduced it in the following way:

The Hierarchical Risk Parity approach addresses three major concerns of quadratic optimizers, in general, and Markowitz’s critical line algorithm (CLA), in particular: instability, concentration, and underperformance. HRP applies modern mathematics (graph theory and machine-learning techniques) to build a diversified portfolio based on the information contained in the covariance matrix. However, unlike quadratic optimizers, HRP does not require the invertibility of the covariance matrix. In fact, HRP can compute a portfolio on an ill-degenerated or even a singular covariance matrix—an impossible feat for quadratic optimizers. Monte Carlo experiments show that HRP delivers lower out-of-sample variance than CLA, even though minimum variance is CLA’s optimization objective. HRP also produces less risky portfolios out of sample compared to traditional risk parity methods.

HRP also asserts that the correlation structure contains ordinal information, which can be exploited by organizing the assets into a hierarchy. HRP reorganizes the covariance matrix such that it is as close as possible to a diagonal matrix, without altering the covariance estimates. The minimum variance portfolio of a diagonal matrix is the inverse variance portfolio. For this reason, the method is also known as HMV.

HRP basically consists of two steps:

**1. Quasi-diagonalization:** Permutation of the covariance matrix according to the leaf order of a hierarchical clustering (tree/dendrogram).

**2. Recursive bisectioning:** Splitting the matrix into equally sized clusters and allocating capital in inverse proportion to the risk of the clusters. This splitting is executed recursively until the clusters consist of single assets. Thus, the algorithm naively bisects all assets into two equal-sized groups. The covariance matrix is simply split into two equal-sized covariance submatrixes.
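The two steps can be sketched as follows. This is a simplified illustration under stated assumptions, not a reference implementation: the helper names are hypothetical, the clustering is run directly on the correlation distance sqrt(0.5(1 − ρ)), cluster risk is measured as the variance of the inverse-variance portfolio within each half, and the input data are synthetic.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform


def quasi_diagonal_order(corr, method="single"):
    """Step 1: leaf order of a hierarchical clustering of the distances."""
    dist = np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, None))  # assumed metric
    np.fill_diagonal(dist, 0.0)
    link = linkage(squareform(dist, checks=False), method=method)
    return leaves_list(link)


def cluster_variance(cov, items):
    """Variance of the inverse-variance portfolio restricted to `items`."""
    sub = cov[np.ix_(items, items)]
    ivp = 1.0 / np.diag(sub)
    ivp /= ivp.sum()
    return float(ivp @ sub @ ivp)


def recursive_bisection(cov, order):
    """Step 2: halve the ordered asset list, allocate by inverse risk."""
    weights = np.ones(cov.shape[0])
    clusters = [list(order)]
    while clusters:
        next_clusters = []
        for items in clusters:
            if len(items) < 2:
                continue  # single assets are not split further
            half = len(items) // 2
            left, right = items[:half], items[half:]
            var_left = cluster_variance(cov, left)
            var_right = cluster_variance(cov, right)
            alpha = 1.0 - var_left / (var_left + var_right)
            weights[left] *= alpha          # riskier half gets less capital
            weights[right] *= 1.0 - alpha
            next_clusters += [left, right]
        clusters = next_clusters
    return weights


# Synthetic returns with two correlated pairs (illustrative data only).
rng = np.random.default_rng(0)
returns = rng.normal(size=(500, 6))
returns[:, 1] += returns[:, 0]
returns[:, 3] += returns[:, 2]
cov = np.cov(returns, rowvar=False)
vol = np.sqrt(np.diag(cov))
corr = cov / np.outer(vol, vol)

order = quasi_diagonal_order(corr)
hrp_weights = recursive_bisection(cov, order)
```

Because every split multiplies the two halves by factors that sum to one, the resulting weights are positive and fully invested without any matrix inversion.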

HRP and HEW have similarities and differences: They both start with hierarchical clustering and distribute capital according to binary splits. Both may also be adapted by changing the hierarchical clustering and/or distance matrix. One difference between the approaches is that HRP divides the matrix into equally sized bins, whereas HEW splits along the hierarchical tree structure. A second difference is that the HEW approach creates a 50/50 notional split, whereas HRP splits according to the inverse risk of the two clusters. If the tree is symmetric and contains clusters of equal risk at all tree levels, then HRP and HEW do coincide. A final difference is that HRP uses both the correlation matrix and the volatilities of the assets to weight the portfolio. In contrast, HEW only uses all pairwise correlation information. However, HEW can be modified to also account for the asset volatilities.

Jaeger et al. (2021) confirm the robustness of HRP for a multi-asset futures universe. It minimizes the variance, but not in too concentrated a way, as can be observed with the minimum variance approach. Its weight structure is balanced, as can be quantified by a number of diversification measures. In this way, it is relatively robust against idiosyncratic and systemic shocks.

## AGONY OF CHOICE

HRP and related approaches based on hierarchical clusters can be designed, configured, and parameterized on an almost infinite space of combinations and variations. The reported two steps of HRP may be adapted, as is often done in the literature. For example, the tree clustering step can be done with different linkage approaches to hierarchical clustering. The sectioning step does not necessarily have to be carried out in equal-size splits but, rather, can explicitly use the dendrogram structure. Bisectioning might also be stopped at a certain point, especially when portfolios are large, correlations increase, and clustering becomes less separated. In these cases, cutting of the dendrogram may take place at any plausible point, such as when an optimal number of clusters is reached or when the number of clusters approximates some external grouping criteria such as industry sector, style, geography, or (sub) asset class. Consequently, there are many more choices for preprocessing regarding the input matrixes of correlation, similarity, and distance.

Next, the risk of an asset or cluster may be estimated by variance, volatility, or other risk measures, and the correlations within and across clusters may be included. Tail correlation and other higher-order effects may also be modeled. There are also configuration choices related to the investment mandate and to the institutional requirements and constraints, addressing questions such as the size of the universe, the rebalancing and data frequency, the trading cost, the amount of turnover allowed during the year, the level of concentration allowed in the portfolio, how it is supposed to respond to specific and systematic shocks, and whether there are group or box constraints.

Introducing more or less sophistication and precision in modeling leads to different sets of assumptions and potential overexpression in balancing precision, bias, and time lag. For example, a higher update frequency for a sampled covariance matrix might decrease the tracking error of a portfolio while increasing costly portfolio turnover. The portfolio should reflect the data-generating process, nonstationarity, asset class and hedging relationships, as well as the market conditions.

These examples show that finding the right HRP configuration or hierarchical cluster model is not trivial. Therefore, the following section gives an overview of HRP variations and some examples of how to address these issues.

## LITERATURE ON HRP EXTENSIONS

### Tree-Based Sectioning

In the standard HRP approach, bisectioning might separate highly correlated assets by construction. The waterfall approach of HEW by Papenbrock (2011) instead splits according to the (correlation cluster) tree structure and is therefore called tree-based sectioning. The loss of information resulting from ignoring the correlation between two clusters should, however, be minimized. The next step in HRP weights the two split clusters according to the inverse of their risk, whereas HEW weights them equally.

Other approaches, such as those by Alipour et al. (2016) and Lohre, Rother, and Schäfer (2020), also apply the tree-sectioning method. They are thus similar to the HEW approach. However, they use alternative tree construction methodologies and do not equally weight the clusters at each sectioning step.

Pfitzinger and Katzke (2019) developed a flexible extension of the bisectioning step of HRP in which a parameter, tau, between zero and one interpolates between the bisection of HRP at one end and the split according to the tree-section method at the other. The tree-section step seems to be a superior and more intuitive way to approach the recursive sectioning step of HRP. However, note that such methods rely on tree cluster quality not only for the diagonalization step but also for the sectioning. Whenever there is suboptimal fit of tree cluster structures to real data, tree-section strategies are subject to higher model risk.^{2}

### Cleaned/Filtered Correlation/Covariance Matrix

Molyboga (2020) introduced a modified HRP (MHRP) approach that extends the HRP approach by incorporating three intuitive elements commonly used by practitioners. The new approach (1) replaces the sample covariance matrix with an exponentially weighted covariance matrix with Ledoit–Wolf shrinkage; (2) improves diversification across portfolio constituents both within and across clusters by relying on an equal volatility, rather than an inverse variance, allocation approach; and (3) improves diversification across time by applying volatility targeting to portfolios. The author examines the impact of the enhancements on portfolios of commodity trading advisors within a large-scale Monte Carlo simulation framework that accounts for the realistic constraints of institutional investors. The author finds a striking improvement in the out-of-sample Sharpe ratio of 50%, on average, along with a reduction in downside risk.

Jothimani and Bener (2019) combined the idea of HRP with robust Gerber statistics (HRP-GS). They tested the model using stocks composing the TSX index for a period of 10 years (2007–2016). Their results suggest that the proposed HRP-GS model outperforms the standard HRP model.

### Alternative Codependence and Distance Metrics

Barziy and Chlebus (2020) compared HRP’s performance under various codependence and distance metrics. The algorithm is tested using modified codependence metrics for the instruments in a portfolio (distance correlation, mutual information, variation of information) and distance metrics to transform the codependence matrix into the distance matrix (angular, absolute angular, and squared angular).
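To make the three distance transforms concrete, the sketch below uses definitions that are common in the HRP literature. The exact formulas used by Barziy and Chlebus (2020) are not given here, so these definitions and the function names are assumptions for illustration only.

```python
import numpy as np


def angular_distance(rho):
    """Assumed definition: sqrt(0.5 * (1 - rho)); anti-correlation is far."""
    return np.sqrt(0.5 * (1.0 - rho))


def absolute_angular_distance(rho):
    """Assumed definition: sqrt(1 - |rho|); the sign of rho is ignored."""
    return np.sqrt(1.0 - np.abs(rho))


def squared_angular_distance(rho):
    """Assumed definition: sqrt(1 - rho**2); the sign of rho is ignored."""
    return np.sqrt(1.0 - rho ** 2)


rho = np.array([-1.0, 0.0, 1.0])
d_angular = angular_distance(rho)
d_absolute = absolute_angular_distance(rho)
d_squared = squared_angular_distance(rho)
```

Under these definitions, a perfectly negative correlation is the maximum distance for the angular metric but zero distance for the absolute and squared variants, which matters for long-short universes in which strongly anti-correlated assets should be treated as close.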

Jain and Jain (2019) highlighted the need to account for covariance misspecification and test for predictive ability in out-of-sample portfolio performance. Next to sample-based covariance (SMPL), they used exponentially weighted moving average and dynamic conditional correlation GARCH. They found that when the covariance estimates are crude, inverse volatility weighted portfolios are more robust, followed by HRP. Minimum variance and maximum diversification are most sensitive to covariance misspecification. HRP seems to offer a compromise: It is less sensitive to covariance misspecification compared with minimum variance or maximum diversification portfolio, but it is not as robust as the inverse volatility weighted portfolio.

Lohre, Rother, and Schäfer (2020) tested a codependence measure incorporating tail behavior of assets. In a multifactor multistyle universe, they investigated the tail-HRP approach using the conditional Spearman’s rho estimator by Schmid and Schmidt (2007). They also used the following distance measure: *d*(*X*, *Y*) = –log(lambda), where lambda is the tail dependence coefficient.

### Simultaneous Modifications to HRP

Pfitzinger and Katzke (2019) introduced a constrained HRP with box and group constraints relevant for solving practical portfolio problems. These constraints can be combined with other extensions that the authors described: bisectioning or tree sectioning (with respective parameter tau) plus an alternative quasi-diagonalization generated by genetic HRP.

Raffinot (2018) combined several potential modifications of HRP in the HERC algorithm: The cluster dendrogram can be pruned at a point corresponding to some optimality criterion regarding the number of flat/partitional clusters, and cluster risk contributions can be measured by variance, standard deviation, conditional value at risk, and conditional drawdown at risk. Different weighting schemes can be applied to the assets within the clusters.

### Other HRP Solutions

Burggraf (2020) applied HRP to a large portfolio of cryptocurrencies and found that HRP outperforms in terms of tail risk-adjusted return.

Finally, HRP leads occasionally to higher turnover, as reported by Kolrep et al. (2020). The authors suggest a smooth HRP method to mitigate this issue.

### Adaptive Seriational Strategies in Heuristic Portfolio Construction

The main idea of ASRP is that the tree-based quasi-diagonalization step of HRP is adaptively extended to a wider class of methodologies to diagonalize a distance matrix based on a technique called *seriation*. Remember that the goal of HRP is to translate/reorganize the covariance matrix such that it is as close as possible to a diagonal matrix, without altering the covariance estimates. The minimum variance portfolio of a diagonal matrix is the inverse variance portfolio. For this reason, HRP is sometimes also called HMV. Inverse-variance asset allocation is most appropriate for assets with an approximately diagonal correlation matrix. High correlations are placed adjacent and close to the matrix diagonal, achieving the desired quasi-diagonal structure.
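The statement about the diagonal case follows from the closed-form solution of the fully invested minimum variance problem. As a short worked derivation (a standard result; the symbols $\Sigma$, $\sigma_i^2$, $w$, $N$, and $\mathbf{1}$ are notation introduced here for illustration):

```latex
w^{\mathrm{MV}} = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}^{\top}\Sigma^{-1}\mathbf{1}},
\qquad
\Sigma = \operatorname{diag}\!\left(\sigma_1^2,\dots,\sigma_N^2\right)
\;\Longrightarrow\;
w_i^{\mathrm{MV}} = \frac{\sigma_i^{-2}}{\sum_{j=1}^{N}\sigma_j^{-2}} .
```

The closer the reordered matrix is to this diagonal form within each block, the smaller the approximation error of using inverse-variance weights inside that block.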

There are many ways to diagonalize a matrix, so it does not necessarily require a tree structure. The tree structure, however, often has the advantage of fast computation, and it extracts the hierarchical nature of complex data. However, sometimes non–tree-based seriation can be more appropriate. Even when sticking to tree structures, there are numerous hierarchical clustering algorithms and not just a single one like single linkage, as in standard HRP. There are many tree-based and non–tree-based seriation methods. Therefore, we introduce an automated selection procedure that picks the most suitable seriation for the quasi-diagonalization step in HRP.

As markets evolve over time, the choice of seriation alters, requiring updating whenever another seriation method becomes more appropriate. Consequently, the seriation method could be chosen adaptively depending on market situations. Even if no adaptive procedure is needed, it is crucial to decide which seriation method to use for an entire strategy or dataset as a one-time choice. The following section will describe methodologies for choosing appropriate seriation and criteria. Both seriation and criteria may or may not be based on trees.

Exhibit 1 shows our ASRP construction hierarchy of strategies. The first choice is between static seriation-based and tree-based approaches, which include static and adaptive variations. HRP uses single linkage as an example for a static tree-based method. Adaptive methods may be based on the distance matrix but also on other methods, as discussed earlier. We compare this construction hierarchy to the empirical hierarchy of performance similarity for the empirical dataset.

### Empirical Study

We rely on a multi-asset universe of equity index, sovereign bond, and commodity futures from May 3, 2000 to June 30, 2020 with a daily frequency, as in Jaeger et al. (2021) and Papenbrock et al. (2021). Exhibit 2 shows the Bloomberg tickers, asset classes, currency, and names of the 17 futures markets. At each monthly rebalancing date, the ASRP variations are applied to a rolling historical one-year window. The leverage of the resulting portfolio allocation is set with the aim to realize a 5% p.a. volatility target. This is achieved by setting leverage to the ratio of the target volatility to the maximum of the empirical portfolio volatilities computed in rolling windows of 20 and 60 trading days. We account for a transaction cost of 2 bps per half-turn.
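The leverage rule can be written down directly. The sketch below is a hedged illustration: the function name `target_leverage`, the annualization factor of 252 trading days, and the use of the sample standard deviation of unlevered daily portfolio returns are assumptions that the text does not specify.

```python
import numpy as np


def target_leverage(portfolio_returns, target_vol=0.05,
                    windows=(20, 60), periods_per_year=252):
    """Leverage = target volatility / max of the trailing-window volatilities."""
    returns = np.asarray(portfolio_returns, dtype=float)
    annualized_vols = []
    for window in windows:
        recent = returns[-window:]  # last `window` daily returns
        annualized_vols.append(recent.std(ddof=1) * np.sqrt(periods_per_year))
    # Taking the maximum makes the rule react quickly to volatility spikes.
    return target_vol / max(annualized_vols)


# Illustrative input: daily returns alternating between +1% and -1%.
daily_returns = np.tile([0.01, -0.01], 30)
leverage = target_leverage(daily_returns)
```

Using the larger of the two volatility estimates is conservative: leverage is cut as soon as the short window picks up turbulence, and it is restored only once both windows have calmed down.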

### The Impact of Quasi-Diagonalization

We use this multi-asset universe and take 2019 market data for estimation. The period from January 1, 2020 to June 30, 2020 is the investment period for an out-of-sample test. The standard HRP strategy with static weights from January 1, 2020 without further rebalancing and without a volatility target results in a Sharpe ratio of 0.732 for the six-month investment period. To assess the impact of the quasi-diagonalization step, we sample 10,000 random asset permutations instead of the single-linkage hierarchical clustering of HRP. Thereafter, we apply the bisection step of the standard HRP strategy to each permutation using the original covariance matrix and measure the annualized Sharpe ratios. Exhibit 3 presents the Sharpe ratio statistics.

Exhibit 4 shows the smoothed density of Sharpe ratios resulting from HRP strategies across the sampled random matrix permutations. The Sharpe ratio of the unchanged HRP strategy (0.732) is marked as a vertical red line. The large standard deviation σ = 0.184 of Sharpe ratios translates into a half-width of about σ × 2.4 = 0.44, illustrating that the permutation in this dataset plays a huge role.
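The permutation experiment can be reproduced in outline as follows. This is a sketch on synthetic data, not the study's code: the function names are hypothetical, the Sharpe ratio assumes a zero risk-free rate and 252 trading days per year, and static weights are approximated by applying constant weights to daily out-of-sample returns.

```python
import numpy as np


def ivp_variance(cov, items):
    """Variance of the inverse-variance portfolio restricted to `items`."""
    sub = cov[np.ix_(items, items)]
    w = 1.0 / np.diag(sub)
    w /= w.sum()
    return float(w @ sub @ w)


def bisection_weights(cov, order):
    """Standard HRP bisection applied to an arbitrary asset ordering."""
    weights = np.ones(cov.shape[0])

    def split(items):
        if len(items) < 2:
            return
        half = len(items) // 2
        left, right = items[:half], items[half:]
        v_left, v_right = ivp_variance(cov, left), ivp_variance(cov, right)
        weights[left] *= v_right / (v_left + v_right)
        weights[right] *= v_left / (v_left + v_right)
        split(left)
        split(right)

    split([int(i) for i in order])
    return weights


def permutation_sharpes(est_returns, oos_returns, n_perm=1000, seed=0,
                        periods_per_year=252):
    """Out-of-sample Sharpe ratios across random quasi-diagonal orderings."""
    rng = np.random.default_rng(seed)
    cov = np.cov(est_returns, rowvar=False)
    sharpes = np.empty(n_perm)
    for i in range(n_perm):
        order = rng.permutation(cov.shape[0])  # replaces the clustering step
        pnl = oos_returns @ bisection_weights(cov, order)
        sharpes[i] = pnl.mean() / pnl.std(ddof=1) * np.sqrt(periods_per_year)
    return sharpes


# Synthetic estimation and investment periods (illustrative data only).
rng = np.random.default_rng(42)
scale = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
est = rng.normal(0.0, 0.01, size=(250, 5)) * scale
oos = rng.normal(0.0003, 0.01, size=(125, 5)) * scale
sharpes = permutation_sharpes(est, oos, n_perm=200)
```

The dispersion of `sharpes` isolates the contribution of the ordering alone, because the covariance matrix and the bisection rule are held fixed across all permutations.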

### Seriation-Based Quasi-Diagonalization

Seriation, also referred to as *ordination* or *matrix permutation*, dates back to Petrie (1899). One more recent application of seriation is the visualization of tables, matrixes, clusters, and networks outlined by Behrisch et al. (2016).

In this article, we focus on the criteria and seriation methods as described by Hahsler, Hornik, and Buchta (2008) and Hahsler (2017). These resources contain a number of seriation methods (tree and non-tree), as well as criteria that measure the quality of some matrix permutations. One such seriation method, called *inertia*, is further described by Hahsler, Hornik, and Buchta (2008) and Caraux and Pinloche (2005). Alipour et al. (2016) used a variation of it for HRP-like strategies. Quantum computers could solve this problem, but currently it can also be addressed by a genetic algorithm, as shown by Pfitzinger and Katzke (2019), if the number of iterations is reasonably small.

Specifically, Alipour et al. (2016) used a variation of inertia to quasi-diagonalize the matrix for the first HRP step. Many alternatives to this seriation are outlined in the section “Non–Tree-Based Quasi-Diagonalization.” After the inertia-like seriation step, the authors do not execute the bisection step of HRP but insert another additional step that produces a tree based on the seriation. This in turn enables fine-tuned sectioning that better addresses a block structure in the diagonalization, namely a tree-sectioning instead of a naive bisectioning.^{3} According to the authors, this approach yields more robust results than HRP. An explanation could be that the diagonalization of the seriation method is very effective and that the tree construction based on the seriation enables a tree-sectioning instead of naive bisectioning. An important step would be to analyze the performance attribution of (1) the introduction of inertia-like seriation and (2) tree-sectioning instead of bisectioning.

To summarize, the methodology of Alipour et al. (2016) can be generalized in the following two ways:

**1.** The seriation method may be replaced by numerous alternatives that potentially better meet some criteria for quasi-diagonalization.

**2.** The extra step of creating a tree out of a seriation may be applied or skipped, depending on the extra performance contribution of this step.

Steps 1 and 2 in combination are very advantageous for HRP-like portfolio construction because inverse-variance asset allocation is most appropriate for assets with an approximately diagonal correlation matrix and because the potential block structure is recognized by the tree-sectioning and not ignored as in the naive bisectioning. However, superimposed tree structures for tree-sectioning may carry additional model risk in that the tree might not represent the structures very well.

In our empirical study, we use the non–tree-based (nonhierarchical) seriation methods described by Hahsler, Hornik, and Buchta (2008) and shown in Exhibit 5, accepting a distance matrix as input.

The VAT method by Bezdek and Hathaway (2002) creates an order based on Prim’s algorithm for finding a minimum spanning tree in a weighted connected graph representing the distance matrix. The order is given by the order in which the nodes (objects) are added to the MST.
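A compact sketch of this ordering is given below. It follows the usual description of VAT, in which the starting object is one end of the most dissimilar pair; that starting rule, the function name `vat_order`, and the tie-breaking by lowest index are assumptions of this illustration rather than details taken from the text.

```python
import numpy as np


def vat_order(dist):
    """Order objects by the sequence in which Prim's algorithm adds them."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Assumed starting rule: begin at one end of the most dissimilar pair.
    start = int(np.unravel_index(np.argmax(dist), dist.shape)[0])
    order = [start]
    remaining = [j for j in range(n) if j != start]
    # best[j] = shortest distance from object j to any object already added.
    best = dist[start].copy()
    while remaining:
        nxt = min(remaining, key=lambda j: best[j])
        order.append(nxt)
        remaining.remove(nxt)
        best = np.minimum(best, dist[nxt])
    return order


# Four points on a line at positions 0, 5, 1, 6 form two tight pairs.
positions = np.array([0.0, 5.0, 1.0, 6.0])
dist = np.abs(positions[:, None] - positions[None, :])
order = vat_order(dist)
```

For this toy input the ordering is `[0, 2, 1, 3]`, which places the two close pairs next to each other so that the reordered distance matrix shows two dark blocks along the diagonal.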

### Tree-Based Quasi-Diagonalization

The topology or inherent shape and form of an object is important. A formal definition of hierarchical structure is provided by ultrametric topology. According to Murtagh (2007), ultrametricity is a pervasive property of observational data. Thus, identifying and exploiting ultrametricity is important when analyzing complex financial data. Ultrametricity is a natural property of sparse, high-dimensional spaces, and it emerges as a consequence of randomness and the law of large numbers.

The strong triangular inequality, or ultrametric inequality, is *d*(*x*, *z*) ≤ max[*d*(*x*, *y*), *d*(*y*, *z*)] for any triplet *x*, *y*, *z*. The subdominant ultrametric is also known as the ultrametric distance resulting from the single-linkage agglomerative hierarchical clustering method as used in HRP. Closely related graph structures include the minimum spanning tree. Agglomerative nesting is a bottom-up procedure in which objects initially represent individual clusters and are successively merged into larger clusters until the full hierarchical structure is obtained. Single-linkage clustering is a particular type of agglomerative nesting that calculates the distance between two clusters as the shortest distance between any member of one cluster and any member of another cluster. By contrast, divisive analysis is a top-down approach that begins with a single cluster containing all objects and successively subdivides it into smaller clusters until each cluster contains only a single object. Characterization, stability, and convergence of hierarchical clustering have been discussed by, for example, Carlsson and Mémoli (2010).
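The ultrametric inequality and the subdominant property can be checked numerically. The sketch below is an illustration on random points, with the helper name `is_ultrametric` and the tolerance chosen for this example; it uses SciPy's `linkage` and `cophenet` to obtain the single-linkage ultrametric.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist, squareform


def is_ultrametric(d, tol=1e-12):
    """Check d(x, z) <= max(d(x, y), d(y, z)) for every triplet x, y, z."""
    for y in range(d.shape[0]):
        # bound[x, z] = max(d[x, y], d[y, z]) for the fixed middle point y
        bound = np.maximum.outer(d[:, y], d[y, :])
        if np.any(d > bound + tol):
            return False
    return True


# Random points in 3D (illustrative data only).
rng = np.random.default_rng(1)
points = rng.normal(size=(8, 3))
condensed = pdist(points)
original = squareform(condensed)

# Cophenetic distances of single linkage give the subdominant ultrametric.
ultra = squareform(cophenet(linkage(condensed, method="single")))

original_is_ultrametric = is_ultrametric(original)
tree_is_ultrametric = is_ultrametric(ultra)
is_subdominant = bool(np.all(ultra <= original + 1e-12))
```

Euclidean distances between generic points violate the strong inequality, whereas the single-linkage cophenetic distances satisfy it and never exceed the original distances, which is what "subdominant" refers to.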

Subdominance provides a good fit to a given distance, but it suffers from the friends-of-friends, or chaining, effect. Many other hierarchical clustering approaches have been developed with specific properties, such as creating spherical and more equally sized clusters as in Ward (1963) clustering.

Tree-based quasi-diagonalization has the advantage of mostly being fast to compute and extracting block structures that are helpful for the tree-sectioning step in HRP. It is, however, less optimized for diagonalization, the first step of HRP. HRP-style portfolio construction will exhibit further advancements in the future because hierarchical clustering and the exploitation of ultrametricity are ongoing, active research fields with applications in many domains. Examples include hierarchical clustering with prior knowledge (e.g., Ma and Dhavala 2018) and ultrametric fitting (e.g., Chierchia and Perret 2019). Recently, López de Prado (2019) introduced an approach to estimate forward-looking correlation matrixes implied by economic theory. Given a particular theoretical representation of the hierarchical structure that governs a universe of securities, the method fits the correlation matrix that complies with that theoretical representation of the future. The output is a tree, so it is straightforward to use in the quasi-diagonalization step of HRP. Babynin (2020) has illustrated this idea.

We provide a description of the hierarchical cluster methods used in the empirical part of this article:

▪ Fast hierarchical, agglomerative clustering routines are available in fastcluster (Müllner 2013; R version 1.1.25, also available in Python).

▪ Another approach is to first build a minimal spanning tree with igraph and then find a hierarchical community (Csardi and Nepusz 2006; R version 1.2.5, also available in Python).

▪ Minimax linkage hierarchical clustering is available in protoclust (Bien and Tibshirani 2011; R version 1.6.3).

▪ Divisive hierarchical clustering is available in cluster (Kaufman and Rousseeuw 1990; R version 2.1.0).

▪ Further agglomerative methods and dendrogram descriptive measures are available in mdendro (Fernández and Gómez 2008; R version 1.0.1).

Exhibit 6 shows names and types of the clustering approaches used.

The following two sections define criteria for finding the best clustering approach. The first section compares the original correlation distance matrix with the hierarchical clustering output. The second section introduces criteria to determine the quality of seriations (tree-based seriations in our case).

Strategy names are given by HRP_hcs_XXX with the following meanings:

▪ hcs: hierarchical clustering

▪ XXX: one of the clustering methods from Exhibit 6

### Adaptive Tree-Based Strategies Based on the Distance Matrix

The quasi-diagonalization step reorders the rows and columns such that the largest values lie close to the diagonal. This is achieved by rearranging the matrix based on the ordering generated by the cluster algorithm as described in the previous section. A criterion to find the best match between a hierarchical cluster and a given distance matrix is to generate the ultrametric distance of a hierarchical cluster and compare it with the original distance matrix without clustering.
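The reordering can be sketched in a few lines. This is an illustration only, assuming SciPy's `linkage` and `leaves_list` and a hypothetical four-asset distance matrix; because the input here is a distance matrix, it is the smallest distances (the highest correlations) that end up next to the diagonal.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform


def quasi_diagonalize(dist, method="single"):
    """Reorder a symmetric distance matrix by the dendrogram leaf order."""
    condensed = squareform(dist, checks=False)
    order = leaves_list(linkage(condensed, method=method))
    return order, dist[np.ix_(order, order)]


# Hypothetical toy matrix: assets {0, 2} and {1, 3} form two tight pairs,
# but the pairs are interleaved in the original ordering.
dist = np.array([
    [0.0, 0.9, 0.1, 0.8],
    [0.9, 0.0, 0.8, 0.2],
    [0.1, 0.8, 0.0, 0.9],
    [0.8, 0.2, 0.9, 0.0],
])

order, reordered = quasi_diagonalize(dist)
print(order)       # leaf order with each tight pair placed side by side
print(reordered)   # small distances now sit next to the diagonal
```

Changing `method` swaps the clustering rule while leaving the reordering logic untouched, which is the degree of freedom the static tree-based variations explore.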

The *cophenetic distance* between two observations that have been clustered is defined as the intergroup dissimilarity at which the two observations first combine into a single cluster. In a dendrogram, this can be compared to traversing from one leaf to another and recording the dendrogram height at which the two leaves are connected.

To derive the ultrametric distance matrix, we first compute all pairwise cophenetic distances and then order the resulting matrix identically to the original distance matrix. Ultrametric matrixes of different clustering methods can be compared with each other to determine how close or similar the methods are. The ultrametric matrix also can be compared with the original distance matrix in the same way.
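As an illustration of this construction, the sketch below derives the ultrametric matrix for a hypothetical four-asset distance matrix with SciPy's `cophenet`, which returns the cophenetic distances in the original observation order. Average linkage is an arbitrary choice here, and the helper `is_ultrametric` is introduced only for this example.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform


def is_ultrametric(u, tol=1e-12):
    """Check u[i, k] <= max(u[i, j], u[j, k]) for every triple (i, j, k)."""
    bound = np.maximum(u[:, :, None], u[None, :, :])   # bound[i, j, k]
    return bool(np.all(u[:, None, :] <= bound + tol))


# Hypothetical toy distance matrix (symmetric, zero diagonal).
dist = np.array([
    [0.0, 0.9, 0.1, 0.8],
    [0.9, 0.0, 0.8, 0.2],
    [0.1, 0.8, 0.0, 0.9],
    [0.8, 0.2, 0.9, 0.0],
])

tree = linkage(squareform(dist, checks=False), method="average")

# Pairwise cophenetic distances, expanded to a square matrix whose rows and
# columns follow the same asset order as `dist`.
ultra = squareform(cophenet(tree))

print(ultra)
print(is_ultrametric(dist), is_ultrametric(ultra))
```

In this example the pairs (0, 2) and (1, 3) keep their original distances of 0.1 and 0.2, and every cross-pair distance is replaced by the height of the final merge, 0.85, the average of the four cross-pair distances.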

We show the measures used (some from package clue, R version 0.3-57; see Hornik 2005) in Exhibit 7 with *u* as the ultrametric distance matrix and *v* as the original distance matrix.

Strategy names are given by HRP_hcs-adaptive_XXX_YYY with the following meanings:

▪ hcs-adaptive: adaptive hierarchical clustering

▪ XXX: source package of the measure, either clue or mdendro

▪ YYY: criterion from Exhibit 7
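The adaptive idea can be sketched as follows: build several candidate trees, score each by how well its ultrametric matrix *u* fits the original distances *v*, and keep the best one. The sketch rests on assumptions that are not taken from the article: the cophenetic correlation stands in for the measures of Exhibit 7, the candidate set is four SciPy linkage methods, and the data are simulated.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, leaves_list, linkage
from scipy.spatial.distance import squareform

METHODS = ("single", "complete", "average", "weighted")   # illustrative candidates


def select_tree(dist, methods=METHODS):
    """Pick the linkage method whose ultrametric u best matches the distances v."""
    v = squareform(dist, checks=False)           # original distances, condensed
    scores, trees = {}, {}
    for name in methods:
        trees[name] = linkage(v, method=name)
        u = cophenet(trees[name])                # ultrametric distances, condensed
        scores[name] = np.corrcoef(u, v)[0, 1]   # cophenetic correlation (assumed criterion)
    best = max(scores, key=scores.get)
    return best, leaves_list(trees[best]), scores


# Simulated returns with a two-factor structure (toy data, 10 assets).
rng = np.random.default_rng(42)
factors = rng.standard_normal((750, 2))
loadings = rng.uniform(-1.0, 1.0, size=(2, 10))
returns = factors @ loadings + 0.5 * rng.standard_normal((750, 10))

corr = np.corrcoef(returns, rowvar=False)
dist = np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, None))
dist = 0.5 * (dist + dist.T)
np.fill_diagonal(dist, 0.0)

best, order, scores = select_tree(dist)
print(best, order)
print(scores)
```

Because the selection is repeated on every estimation window, the chosen method can change over time, which is what distinguishes these strategies from the static ones.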

### Adaptive Tree-Based Strategies Based on Other Criteria

Based on a given distance matrix, the effectiveness of a permutation can be evaluated by certain criteria. Exhibit 8 shows our selected seriation criteria as described by Hahsler, Hornik, and Buchta (2008) and Hahsler (2017).

Strategy names are given by HRP_hcs-adaptive_criterions_XXX_YYY with the following meanings:

▪ hcs-adaptive: adaptive hierarchical clustering

▪ criterions: adaptive tree-based strategies based on other criteria

▪ XXX: criterion from Exhibit 8

▪ YYY: criterion is minimized or maximized
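To make the notion of a seriation criterion concrete, the sketch below scores permutations of a hypothetical distance matrix with two measures described in the seriation literature, the Hamiltonian path length and the number of anti-Robinson events, and then selects among candidate tree orders. Whether these two measures appear in Exhibit 8 is an assumption, and the function names are invented for this example.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform


def path_length(dist, order):
    """Sum of distances between neighbors in the given order."""
    order = np.asarray(order)
    return float(dist[order[:-1], order[1:]].sum())


def ar_events(dist, order):
    """Count violations of the anti-Robinson form in the reordered matrix."""
    m = dist[np.ix_(order, order)]
    n = len(order)
    events = 0
    for i in range(n):
        for j in range(i + 2, n):
            for k in range(i + 1, j):
                events += int(m[i, k] > m[i, j])   # closer in order, yet farther apart
                events += int(m[k, j] > m[i, j])
    return events


def select_order(dist, candidates, criterion, minimize=True):
    """Choose the candidate order with the best criterion value."""
    scores = {name: criterion(dist, order) for name, order in candidates.items()}
    pick = min if minimize else max
    return pick(scores, key=scores.get), scores


dist = np.array([
    [0.0, 0.9, 0.1, 0.8],
    [0.9, 0.0, 0.8, 0.2],
    [0.1, 0.8, 0.0, 0.9],
    [0.8, 0.2, 0.9, 0.0],
])
condensed = squareform(dist, checks=False)
candidates = {name: leaves_list(linkage(condensed, method=name))
              for name in ("single", "complete", "average")}

best, scores = select_order(dist, candidates, path_length, minimize=True)
print(best, scores)
print(ar_events(dist, [0, 1, 2, 3]), ar_events(dist, [0, 2, 1, 3]))
```

The `minimize` flag mirrors the YYY part of the strategy name: some criteria are loss measures to be minimized, whereas others are merit measures to be maximized.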

## BACKTESTS OF ALL ASRP VARIATIONS

We apply all 57 ASRP variations to the multi-asset futures portfolio from May 3, 2000 to June 30, 2020 and compute the 57 × 57 correlation matrix between all strategy return time series. Exhibit 9 shows the results of a single-linkage clustering applied to this correlation matrix and the related Sharpe ratios of each strategy relative to HRP.
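The mechanics of this step can be sketched on simulated data. Everything specific in the sketch is an assumption made for illustration: five hypothetical strategies instead of 57, random returns, the correlation distance sqrt(0.5 (1 − ρ)), an annualization factor of 252 with a zero risk-free rate, and "relative to HRP" read as a simple difference of Sharpe ratios.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform

# Hypothetical strategy names and simulated daily returns (not the article's data).
names = ["HRP", "S1", "S2", "S3", "S4"]
rng = np.random.default_rng(1)
common = rng.standard_normal((1000, 1))
noise = rng.standard_normal((1000, len(names)))
returns = 0.0003 + 0.01 * (0.8 * common + 0.6 * noise)

# Strategy-by-strategy correlation matrix and the associated distance matrix.
corr = np.corrcoef(returns, rowvar=False)
dist = np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, None))
dist = 0.5 * (dist + dist.T)
np.fill_diagonal(dist, 0.0)

# Single-linkage tree over the strategies; the leaf order is the dendrogram order.
tree = linkage(squareform(dist, checks=False), method="single")
leaf_order = [names[i] for i in leaves_list(tree)]

# Annualized Sharpe ratios, shown relative to the HRP benchmark (difference assumed).
sharpe = np.sqrt(252) * returns.mean(axis=0) / returns.std(axis=0, ddof=1)
relative = sharpe - sharpe[names.index("HRP")]

for name in leaf_order:
    print(f"{name}: {relative[names.index(name)]:+.2f}")
```

Passing `tree` to `scipy.cluster.hierarchy.dendrogram` would draw the hierarchy itself; the printed list corresponds to the column of relative Sharpe ratios next to it.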

We compare the strategy correlation hierarchy from the dendrogram to the construction hierarchy of Exhibit 1. Most seriation-based strategies are in cluster 1. Static tree-based strategies can be found in clusters 2, 3, and 4. Adaptive tree-based strategies are in all clusters except cluster 1. This confirms our initial hypothesis, derived from the broad Sharpe ratio density in Exhibit 3, about the importance of the quasi-diagonalization step.

The right-hand side of the dendrogram shows the Sharpe ratios relative to HRP for each strategy. Most seriation-based strategies seem to underperform HRP on a risk-adjusted basis. Most static tree-based variations outperform HRP, whereas adaptive tree-based methods show mixed results.

To assess the robustness of the dendrogram, we employ a bootstrap study: 1,000 resamples of the asset return time series with a block length of 60 days lead to 1,000 portfolio return time series for each strategy. For each element of the lower triangle of the 57 × 57 strategy correlation matrix, we compute a signal-to-noise ratio: the point estimate of the correlation divided by the standard deviation of that correlation across the 1,000 resamples. The lowest signal-to-noise ratio across all elements of the lower triangle is 161. This large value indicates that statistical noise in the correlation estimates is small.
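A minimal sketch of such a study is given below. It is not the article's implementation: the bootstrap variant is assumed to be a moving-block bootstrap with overlapping blocks, `run_strategies` is a hypothetical callable that maps asset returns to strategy returns, and fixed-weight toy portfolios stand in for the ASRP strategies.

```python
import numpy as np


def block_bootstrap_indices(n_obs, block_len, rng):
    """Draw row indices by concatenating randomly chosen contiguous blocks."""
    n_blocks = int(np.ceil(n_obs / block_len))
    starts = rng.integers(0, n_obs - block_len + 1, size=n_blocks)
    idx = (starts[:, None] + np.arange(block_len)[None, :]).ravel()
    return idx[:n_obs]


def correlation_snr(asset_returns, run_strategies, n_resamples=1000,
                    block_len=60, seed=0):
    """Point-estimate correlation divided by its bootstrap standard deviation."""
    rng = np.random.default_rng(seed)
    point = np.corrcoef(run_strategies(asset_returns), rowvar=False)
    lower = np.tril_indices_from(point, k=-1)          # strict lower triangle
    samples = np.empty((n_resamples, lower[0].size))
    for b in range(n_resamples):
        idx = block_bootstrap_indices(len(asset_returns), block_len, rng)
        strategy_returns = run_strategies(asset_returns[idx])
        samples[b] = np.corrcoef(strategy_returns, rowvar=False)[lower]
    return point[lower] / samples.std(axis=0, ddof=1)


# Toy setup: 4 assets and 3 fixed-weight portfolios as stand-in strategies.
rng = np.random.default_rng(7)
asset_returns = 0.01 * rng.standard_normal((600, 4))
weights = np.array([[0.25, 0.40, 0.10],
                    [0.25, 0.30, 0.20],
                    [0.25, 0.20, 0.30],
                    [0.25, 0.10, 0.40]])

snr = correlation_snr(asset_returns, lambda r: r @ weights, n_resamples=200)
print(snr.min())
```

The minimum over the returned vector corresponds to the lowest signal-to-noise ratio reported in the text; the toy value is not comparable with the article's figure.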

## CONCLUSION

We present a systematic approach to generate a family of variations of the classical HRP for portfolio construction and asset allocation. These variations are directed toward the first step of HRP (quasi-diagonalization). We call this concept ASRP. Randomization of the quasi-diagonalization step shows a large variation in Sharpe ratios and thus indicates the large impact that model choice in this step has on the resulting performance.

Backtests of all 57 strategies with a multi-asset futures universe of 17 liquid markets across 20 years of data lead to a taxonomy-like representation of the resulting 57 strategy return time series in the form of a dendrogram. Bootstrap validation points to the robustness of this dendrogram. The pronounced hierarchy of strategies closely resembles the original construction hierarchy of ASRP and thus confirms the large impact of the quasi-diagonalization step in HRP-style strategies. From the viewpoint of risk-adjusted returns, most of the static tree-based alternatives outperform HRP, whereas seriation-based and adaptive methods tend to underperform, with a few notable exceptions (e.g., the seriation-based VAT and SPIN_STS strategies).

## OUTLOOK: SYNTHETIC DATA AND EXPLAINABLE ML

The overall ambition to develop robust and transparent investment strategies requires several additional building blocks based on intelligent analytics. Together with the procedure to compute synthetic correlation matrixes provided by Papenbrock et al. (2021) and the procedure to deliver local and global explanations using the SHAP framework discussed by Jaeger et al. (2021), this article addresses variations of the allocation step. Thus, it complements the resulting workflow of a triple AI approach for robust, adaptive, risk-based portfolio construction. This workflow can be used to test, explore, and understand the variations on HRP described in this article and consists of the following three elements:

**1.** A market generator to create synthetic time series or synthetic correlation data for Monte Carlo simulation, both serving as appropriate input to the HRP variations. Examples are matrix evolutions (Papenbrock et al. 2021) and CorrGAN (Marti 2019).

**2.** One or several of the ASRP alternatives to HRP as presented in this article (some use tree-based representation learning).

**3.** Explainable ML to explain and understand the risk and performance attributions of single strategies and the drivers of a performance ranking in terms of features of the synthetic time series, as described by Jaeger et al. (2021) and Papenbrock et al. (2021), using a model-agnostic explanation method such as the SHAP framework.

## ACKNOWLEDGMENTS

The implementation was sponsored by Munich Re Markets. We appreciate the infrastructure from Open Telekom Cloud and the NVIDIA GPU resources provided for this research.

We also acknowledge support from the European Union’s Horizon 2020 research and innovation program “FIN-TECH: A Financial Supervision and Technology Compliance Training Programme” under grant agreement no. 825215 (Topic: ICT-35-2018, Type of action: CSA) and from the COST Action CA19130 “Fintech and Artificial Intelligence in Finance—Towards a Transparent Financial Industry.”

## ENDNOTES

^{1} Markowitz with average and single linkage because only these create positive semi-definite matrixes.

^{2} Fermin Cota (2019) additionally has reported that: “A second shortfall we have identified occurs when there is a highly correlated group of assets. We have noticed poor performance out-of-sample compared with a naive equal-weight or inverse volatility weighting. To address this shortfall, we have introduced an early stopping step within the recursive bisection that identifies when a submatrix of the covariance matrix contains high pairwise correlations. Weighting the assets according to equal-weight or inverse volatility has proven to provide better out-of-sample results based on our research.”

^{3} It is a recursive search over the range of potential split positions that optimizes a suitably chosen metric, such as the mean absolute distance matrix values of the off-diagonal blocks’ entries or the mean absolute correlation of off-diagonal cluster blocks in the matrix.

- © 2021 Pageant Media Ltd