Introduction

This series of articles/analyses is meant to represent my journey in working out how to define archetypes.

Archetypes in Legends of Runeterra are currently defined by me as combinations of champions and regions. While it’s a lazy method, it is (was?) a first approximation that worked decently, aside from some exceptions. A limitation I always knew about is that the number of possible archetypes increases too rapidly with the release of new cards and will, sooner or later, probably get too high compared to the number of available games (at least at Master). On the other side of the spectrum, the most restrictive classification would use a categorical (maybe even just dichotomous) variable, something like the Super Archetypes from viciousSyndicate (vS): Initiative and Resource decks ¹. Every other archetype classification lies in between, and it’s a vast ocean of possibilities with no ‘correct’ solution.

As everything needs to be done step by step, I’ll start with what was pointed out to me a few weeks ago:

How do you feel about combining Akshan/Zed/Sivir with Akshan/Sivir since it's generally just 1 card difference? And maybe even with Zed/Sivir (a bit more different on average, but still conceptually the same deck).
— Dr. LoR (@drlor4) July 28, 2021

Around the time Dragons and Overwhelm (SH/FR) decks turned into legit meta options, decks with 3 champions became more widely used and more widely accepted as a legit option, no longer being just a bad choice by default.

At the time of that tweet, during patches 2.12/2.13, there was a highly performing deck with three champions: Akshan/Sivir/Zed (ASZ).

What I aim to answer is related to the suggestion in the tweet: is my current aggregation too strict? Is Sivir/Zed or Akshan/Sivir the same deck as ASZ, only lacking a champion? While by intuition they may look similar, intuition is not a proper answer, which is what I try to provide with this article. ²

Data

The sample is made of 845370 Ranked games from 2021-07-14 21:00:00 to 2021-08-25 21:00:00, covering patches 2.12/2.13 after the start of the Ruination Event, so that the card pool does not change within the analyzed timeframe, either in the number of cards or through balance changes.

Because of the rise of the Demacian deck with Akshan/Sivir, these decks are removed as they could create a few problems in the following steps. In addition, I only consider decks with 6 copies of champions (the maximum value). This is done to remove the outliers in deck creation and to make the sample more homogeneous for a step that will follow.³

The sample is reduced to 704464 games.

Characteristic	Overall, N = 704,464¹	Zed: no Sivir/Zed, N = 662,982¹	Zed: Sivir/Zed, N = 41,482¹	Akshan: no Sivir/Akshan, N = 644,741¹	Akshan: Sivir/Akshan, N = 59,723¹
#Champion
2	649,177 (92%)	634,195 (96%)	14,982 (36%)	617,513 (96%)	31,664 (53%)
3	54,373 (7.7%)	27,878 (4.2%)	26,495 (64%)	26,414 (4.1%)	27,959 (47%)
4	718 (0.1%)	713 (0.1%)	5 (<0.1%)	663 (0.1%)	55 (<0.1%)
5	38 (<0.1%)	38 (<0.1%)	0 (0%)	37 (<0.1%)	1 (<0.1%)
6	158 (<0.1%)	158 (<0.1%)	0 (0%)	114 (<0.1%)	44 (<0.1%)
¹ n (%)

Note: The percentages are column-wise.

The overall prevalence of ‘3 champions decks’ is around 7.72%. This value is mostly carried by the ASZ decks, with 26418 games, which amount to almost half (48.6%) of the 3-champion cases.

More specifically, those 26418 ASZ games are the main subset of the 3-champion cases for both SZ decks (99.7% of the cases) and AS decks (94.5% of the cases). In other words, when a third champion is added to SZ or AS decks the result is almost always ASZ, meaning ASZ seems to be common ground for these two archetypes. But is ASZ just common ground, or are the general cases (AS and SZ) already almost equal?
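The shares quoted above can be rechecked directly from the counts in the table. The following is a minimal Python sketch; the variable names are mine and chosen only for illustration. The 99.7% and 94.5% figures are taken over the "3" row of the Sivir/Zed and Sivir/Akshan columns respectively.

```python
# Counts copied from the table above (decks with 6 champion copies).
total_games = 704_464
three_champ_games = 54_373      # "Overall" column, row "3"
sz_three_champ_games = 26_495   # "Sivir/Zed" column, row "3"
as_three_champ_games = 27_959   # "Sivir/Akshan" column, row "3"
asz_games = 26_418              # ASZ games quoted in the text


def pct(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to 2 decimals."""
    return round(100 * part / whole, 2)


shares = {
    "3-champion decks among all games": pct(three_champ_games, total_games),
    "ASZ among 3-champion decks": pct(asz_games, three_champ_games),
    "ASZ among 3-champion Sivir/Zed decks": pct(asz_games, sz_three_champ_games),
    "ASZ among 3-champion Sivir/Akshan decks": pct(asz_games, as_three_champ_games),
}

for label, value in shares.items():
    print(f"{label}: {value}%")
```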

Methods

Hierarchy of steps

This section illustrates the mathematical/statistical theory, and its application, that I’ll use to tackle the article’s question.

But before that, what are the steps I want to follow?

Figure 1: ASZ as special case of AS

Figure 2: ASZ as special case of SZ

As mentioned, I first need to see whether the hypothesis represented in the Venn diagrams (Fig:1 and Fig:2) is correct: ASZ is just a special case of one or both of AS and SZ.

Figure 3: ASZ as special case of both AS and SZ

If it’s true for both, then I will check whether AS and SZ can be considered similar, that is, looking at Fig:3, what the intersection looks like: whether Sivir/Zed and Akshan/Sivir do seem to overlap, and how.

Distance among decks

When looking at a deck, it’s possible to visualize it as a network graph where each node is a card and the edges are the connections between cards.

Figure 4: Deck as a Network graph

It’s then possible to expand the concept to archetypes, seen as collections of decks, expanding the graph to more cards and, for example, adding information on the play rates of cards and on which are commonly played together by using weighted edges.

The advantage of this approach for the question of this article would be having to compare a single item for each archetype, the archetype network; the disadvantage is working with a more complex structure, which is the reason it’s not applied in this article.

To compare archetypes we start by comparing decks, but what does it mean to “compare” decks? To compare things we must be able to measure their similarity or dissimilarity, which requires the use of a metric. What follows is an example of a metric and of how to evaluate the similarity of decks.

Suppose we have the following decks of 10 cards:

  cardcode count faction set card_number             name
1  01IO009     2      IO   1         009              Zed
2  04SH020     2      SH   4         020            Sivir
3  04SH093     2      SH   4         093     Shaped Stone
4  04SH103     2      SH   4         103 Merciless Hunter
5  04SH130     2      SH   4         130           Akshan

  cardcode count faction set card_number             name
1  04SH020     3      SH   4         020            Sivir
2  04SH130     3      SH   4         130           Akshan
3  04SH093     2      SH   4         093     Shaped Stone
4  04SH103     2      SH   4         103 Merciless Hunter

The only differences are the champions and their amount.

Overall there’s a similarity of 8 (out of 10) or a distance of 2 (out of 10).
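As a minimal sketch, the count above can be reproduced in Python by treating each deck as a multiset of cards. The `Counter` representation and the function names `shared_cards` and `card_difference` are illustrative choices of mine, not the code used for the article.

```python
from collections import Counter

# The two 10-card example decks above, as card name -> number of copies.
deck_asz = Counter({"Zed": 2, "Sivir": 2, "Shaped Stone": 2,
                    "Merciless Hunter": 2, "Akshan": 2})
deck_as = Counter({"Sivir": 3, "Akshan": 3, "Shaped Stone": 2,
                   "Merciless Hunter": 2})


def shared_cards(deck_x, deck_y):
    """Number of card copies the two decks have in common (multiset intersection)."""
    return sum((deck_x & deck_y).values())


def card_difference(deck_x, deck_y):
    """Copies of deck_x that are not matched in deck_y.

    For two decks of the same size this equals deck size minus shared cards.
    """
    return sum((deck_x - deck_y).values())


similarity = shared_cards(deck_asz, deck_as)
distance = card_difference(deck_asz, deck_as)
print(similarity, distance)  # 8 2
```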

It can be proven that this is indeed what is called a distance or metric, as it has all the necessary properties:

Non-negative codomain, so \([0,\infty)\)⁴. This is true as it’s defined on \([0,40]_\mathbb{N}\)

\(d(x,y) = d(y,x)\) (Symmetry)
\(d(x,x)=0\)
\(d(x,y)\leq d(x,z)+d(z,y)\) (The Triangle inequality)
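
One way to see why these properties hold is the following sketch. The notation is introduced here and is not part of the original: two decks \(x\) and \(y\) of the same size \(N\) are written as vectors of copy counts \(x_i\), \(y_i\) for each card \(i\). Since \(\min(a,b) = \frac{a+b-|a-b|}{2}\), the card-count distance is half the \(L_1\) (Manhattan) distance:

\[ d(x,y) = N - \sum_i \min(x_i, y_i) = \frac{1}{2}\sum_i |x_i - y_i| \]

The \(L_1\) distance is a metric, so symmetry, \(d(x,x)=0\) and the triangle inequality carry over. For the two example decks, with cards ordered as (Zed, Sivir, Akshan, Shaped Stone, Merciless Hunter), \(x=(2,2,2,2,2)\) and \(y=(0,3,3,2,2)\) give \(\frac{1}{2}(2+1+1+0+0)=2\), matching the count above.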

Drisoth already showed that it can be used with good success in clustering (DBSCAN in his case)⁵.

While it is a proper metric, there is a problem with this measure that makes us prefer another one. It could be said that “counting the card differences” lacks subtlety, as it doesn’t reward decks that have a similar distribution of card copies, as can be seen in the following example:

  cardcode count faction set card_number             name
1  04SH020     3      SH   4         020            Sivir
2  04SH130     3      SH   4         130           Akshan
3  04SH093     2      SH   4         093     Shaped Stone
4  04SH103     2      SH   4         103 Merciless Hunter

  cardcode count faction set card_number             name
1  04SH020     3      SH   4         020            Sivir
2  04SH130     3      SH   4         130           Akshan
3  04SH055     2      SH   4         055      Ruin Runner
4  04SH103     2      SH   4         103 Merciless Hunter

  cardcode count faction set card_number             name
1  04SH020     3      SH   4         020            Sivir
2  04SH130     3      SH   4         130           Akshan
3  04SH103     2      SH   4         103 Merciless Hunter
4  04SH055     1      SH   4         055      Ruin Runner
5  04SH093     1      SH   4         093     Shaped Stone

These three decks share the same 8 cards and differ only in the remaining 2: either 2 copies of Shaped Stone, 2 copies of Ruin Runner, or a single copy of each.

Again, we want to remark that this is not “wrong”; we would just prefer an alternative with more discriminatory power. A possible solution is a commonly used measure that adds this nuance: the cosine distance.

The cosine similarity is defined as:

\[\begin{equation} \cos(A,B) = \frac{A \cdot B}{||A||_2||B||_2} \tag{1} \end{equation}\]

which can be written as

\[\begin{equation} \cos(A,B) = \frac{\sum{A_iB_i}}{ \sqrt{\sum{A^2_i}} \sqrt{\sum{B^2_i}}} \tag{2} \end{equation}\]

For vectors with non-negative entries, such as card counts, the measure runs from 0 (orthogonal vectors, or maximum dissimilarity) to 1 (parallel vectors, or maximum similarity), so its maximum and minimum are the opposite of what we want, as it is a measure of similarity and not of dissimilarity. The cosine distance is simply defined as 1 - cosine similarity.

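With the card-count metric, rescaled by the deck size of 10, the previous example has the distance matrix below (the third deck shares 9 of its 10 cards with each of the other two, while the first two decks share 8):

\[ \begin{bmatrix} 0 & 0.2 & 0.1 \\ 0.2 & 0 & 0.1 \\ 0.1 & 0.1 & 0 \end{bmatrix} \]

With the cosine distance, the matrix becomes: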

\[\begin{bmatrix}0&0.15&0.04 \\0.15&0&0.04 \\0.04&0.04&0 \\\end{bmatrix}\]
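
The cosine distance matrix above can be reproduced with a few lines of Python. This is only a sketch: the dictionary representation of a deck and the function name `cosine_distance` are assumptions of mine, not the code behind the article (whose outputs appear to come from R).

```python
import math

# The three example decks above, as card name -> number of copies.
decks = [
    {"Sivir": 3, "Akshan": 3, "Shaped Stone": 2, "Merciless Hunter": 2},
    {"Sivir": 3, "Akshan": 3, "Ruin Runner": 2, "Merciless Hunter": 2},
    {"Sivir": 3, "Akshan": 3, "Merciless Hunter": 2,
     "Ruin Runner": 1, "Shaped Stone": 1},
]


def cosine_distance(deck_a, deck_b):
    """1 - cosine similarity between two decks seen as vectors of copy counts."""
    dot = sum(copies * deck_b.get(card, 0) for card, copies in deck_a.items())
    sq_a = sum(copies * copies for copies in deck_a.values())
    sq_b = sum(copies * copies for copies in deck_b.values())
    # max() guards against tiny negative values from floating-point rounding.
    return max(0.0, 1.0 - dot / math.sqrt(sq_a * sq_b))


distance_matrix = [
    [round(cosine_distance(a, b), 2) for b in decks] for a in decks
]
for row in distance_matrix:
    print(row)
```

The third deck, which splits its last two slots between Shaped Stone and Ruin Runner, ends up much closer to the other two (0.04) than they are to each other (0.15).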

It’s now possible to better explain why the data was restricted to decks with 6 copies of champion cards.

We have to measure the similarity among three different archetypes (AS, SZ, ASZ) and, no matter our choices, from the raw data we would always find a difference of at least one card between these archetypes.

For example, between an AS and an SZ deck the minimum card difference would be a single copy of Akshan against a single copy of Zed, because of the definition of archetypes applied.

In order not to have to account for the difference in champions and in the number of their copies, we evaluate the difference among all non-champion cards. If the distance is zero or near it, it would mean that, aside from the champions of choice, the decks are similar, giving support to the hypothesis that they are part of the same “archetype”. Or, to put it differently, we modify our data so that, when comparing decks from different archetypes, the cosine distance can span the whole \([0,1]\) range, which helps the comparison in the following steps without, for example, having to rescale the values. Having removed the champions, it is also sensible to filter out all decks with fewer than 6 champion cards. While it’s possible to compute the cosine distance between decks with a different number of cards, for this problem we consider it more appropriate to work with decks that have the same number of cards.
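
The filtering and champion removal described above could look like the following Python sketch. The `CHAMPIONS` set, the helper names and the third toy deck are hypothetical and only serve the illustration; a real pipeline would read the champion list from card metadata.

```python
import math

# Hypothetical champion lookup: in practice this would come from card metadata.
CHAMPIONS = {"Zed", "Sivir", "Akshan"}


def champion_copies(deck):
    """Total number of champion copies in a deck (card name -> copies)."""
    return sum(copies for card, copies in deck.items() if card in CHAMPIONS)


def without_champions(deck):
    """Copy of the deck with every champion card removed."""
    return {card: copies for card, copies in deck.items()
            if card not in CHAMPIONS}


def cosine_distance(deck_a, deck_b):
    """1 - cosine similarity between two decks seen as vectors of copy counts."""
    dot = sum(copies * deck_b.get(card, 0) for card, copies in deck_a.items())
    sq_a = sum(copies * copies for copies in deck_a.values())
    sq_b = sum(copies * copies for copies in deck_b.values())
    return max(0.0, 1.0 - dot / math.sqrt(sq_a * sq_b))


decks = {
    # The two 10-card example decks used earlier in the article.
    "ASZ": {"Zed": 2, "Sivir": 2, "Akshan": 2,
            "Shaped Stone": 2, "Merciless Hunter": 2},
    "AS": {"Sivir": 3, "Akshan": 3, "Shaped Stone": 2, "Merciless Hunter": 2},
    # Made-up deck with only 5 champion copies, to show the filter at work.
    "AS_5_champions": {"Sivir": 3, "Akshan": 2,
                       "Shaped Stone": 3, "Merciless Hunter": 2},
}

# Keep only decks with 6 champion copies, then drop the champions themselves.
kept = {name: without_champions(deck) for name, deck in decks.items()
        if champion_copies(deck) == 6}

distance = cosine_distance(kept["ASZ"], kept["AS"])
print(sorted(kept), distance)
```

Once the champions are removed, the two example decks contain exactly the same cards, so their distance drops to zero, which is the situation the text describes as support for the "same archetype" hypothesis.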

Decklist Distance Matrix

We described how to compare single decks, but we want to answer a question related to archetypes. To do so, let’s say we start with the comparison between AS and ASZ: from the decklists/deck codes of these archetypes we create the distance matrix among decks.

\(A={\begin{pmatrix}A_{{11}}&A_{{12}}\\A_{{21}}&A_{{22}}\end{pmatrix}}\)

Where A is a \((n+m)×(n+m)\) symmetric block matrix where

n is the number of deckcodes/decklist from \(Archetype_1\)
m is the number of deckcodes/decklist from \(Archetype_2\)
\(A_{{11}}\) is the \(n×n\) distance matrix relative to deck of \(Archetype_1\)
\(A_{{22}}\) is the \(m×m\) distance matrix relative to deck of \(Archetype_2\)
\(A_{{12}}=A_{{21}}^T\) is the \(n×m\) matrix containing the distances between \(Archetype_1\) vs \(Archetype_2\)

What we propose is to compare the values of \(A_{{11}}\), \(A_{{22}}\) ⁶ and \(A_{{12}}\), with the hypothesis that, if the archetypes are indeed similar/equal, the distances should have the same mean (and variance), which amounts to applying an ANOVA test.
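
The procedure above can be sketched in plain Python as follows. Everything here is illustrative: the toy decks are made up (champions already removed), the function names are mine, and only the F statistic is computed by hand; a p-value would additionally need the F distribution, for instance through `scipy.stats.f_oneway`.

```python
import math
from itertools import combinations


def cosine_distance(deck_a, deck_b):
    """1 - cosine similarity between two decks seen as vectors of copy counts."""
    dot = sum(copies * deck_b.get(card, 0) for card, copies in deck_a.items())
    sq_a = sum(copies * copies for copies in deck_a.values())
    sq_b = sum(copies * copies for copies in deck_b.values())
    return max(0.0, 1.0 - dot / math.sqrt(sq_a * sq_b))


def within_distances(decks):
    """Upper triangle, without diagonal, of an archetype's own block (A11 or A22)."""
    return [cosine_distance(a, b) for a, b in combinations(decks, 2)]


def between_distances(decks_1, decks_2):
    """All entries of the off-diagonal block A12."""
    return [cosine_distance(a, b) for a in decks_1 for b in decks_2]


def one_way_anova_f(groups):
    """F statistic and degrees of freedom of a one-way ANOVA."""
    values = [v for group in groups for v in group]
    grand_mean = sum(values) / len(values)
    means = [sum(group) / len(group) for group in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Made-up non-champion card counts for two toy archetypes.
archetype_1 = [
    {"Shaped Stone": 2, "Merciless Hunter": 2, "Ruin Runner": 2},
    {"Shaped Stone": 3, "Merciless Hunter": 2, "Ruin Runner": 1},
    {"Shaped Stone": 1, "Merciless Hunter": 2, "Ruin Runner": 3},
]
archetype_2 = [
    {"Shaped Stone": 2, "Merciless Hunter": 3, "Ruin Runner": 1},
    {"Shaped Stone": 1, "Merciless Hunter": 3, "Ruin Runner": 2},
    {"Shaped Stone": 3, "Merciless Hunter": 3},
]

within_1 = within_distances(archetype_1)                # values from A11
within_2 = within_distances(archetype_2)                # values from A22
between = between_distances(archetype_1, archetype_2)   # values from A12

f_value, df_between, df_within = one_way_anova_f([within_1, within_2, between])
print(f_value, df_between, df_within)
```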

Analysis

Not all deck codes have been used to create the submatrices of the block matrix. The number of unique deck codes for each archetype was considered too big relative to the number of games ⁷ and, as each deck has the same weight, using all deck codes could risk not properly representing the distribution of distances. Only the most frequent deck codes, those that together account for at least 50% of the games, have been used: 60 decks for AS, 49 decks for SZ, 10 decks for ASZ.
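
A possible way to implement this selection is sketched below in Python. The function name `most_played_decks`, the placeholder deck codes and the tie-breaking (whatever order `Counter.most_common` returns) are assumptions of mine; the article does not say how ties were handled.

```python
from collections import Counter


def most_played_decks(deck_codes, coverage=0.5):
    """Most frequent deck codes that together reach `coverage` of the games.

    `deck_codes` holds one entry per game played with that deck code.
    """
    counts = Counter(deck_codes)
    total = sum(counts.values())
    selected, covered = [], 0
    for code, n_games in counts.most_common():
        if covered >= coverage * total:
            break
        selected.append(code)
        covered += n_games
    return selected


# Placeholder deck codes, one entry per game: 4 + 3 + 2 + 1 = 10 games.
games = ["CODE_A"] * 4 + ["CODE_B"] * 3 + ["CODE_C"] * 2 + ["CODE_D"]
selected = most_played_decks(games)
print(selected)  # ['CODE_A', 'CODE_B']
```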

Two distance matrices have been created: one for AS decks vs ASZ decks and one for SZ decks vs ASZ decks.

Results are provided in Tab:1 and Fig:5 for the ASZ vs AS decks, and in Tab:2 and Fig:6 for the ASZ vs SZ decks.

Table 1: Summary statistic ASZvsAS decks
Group	Mean	Sd	Skew
AS	0.069	0.035	0.333
ASZ	0.056	0.029	0.715
ASZvsAS	0.058	0.036	0.329

Figure 5: distribution of cosine distances for ASZ vs AS decks

Table 2: Summary statistic ASZvsSZ decks
Group	Mean	Sd	Skew
ASZ	0.056	0.029	0.715
ASZvsSZ	0.061	0.043	0.930
SZ	0.078	0.049	0.881

Figure 6: distribution of cosine distances for ASZ vs SZ decks

And lastly we show the results of the ANOVA, applied first to the AS decks:

             Df  Sum Sq  Mean Sq F value Pr(>F)
group         2 0.00444 0.002217   1.895  0.154
Residuals   168 0.19659 0.001170

Here we don’t reject \(H_0\), as the p-value (the value in the \(Pr(>F)\) column) is above 0.05, giving support to the hypothesis that ASZ is just a special case of AS and that they can be aggregated.

In the case of SZ decks the ANOVA test gives:

             Df  Sum Sq  Mean Sq F value Pr(>F)  
group         2 0.01075 0.005375   3.106 0.0474 *
Residuals   168 0.29076 0.001731                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Meaning we reject the null hypothesis \(H_0\) that the distances are from the same population and that their distribution is the same. But while this is true, we can also see that the p-value is near 0.05, meaning it would be wiser to check the Tukey post-hoc tests.

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = value ~ group, data = DSim.tbl_2)

$group
                   diff           lwr        upr     p adj
ASZvsSZ-ASZ 0.005239898 -0.0127209395 0.02320074 0.7697297
SZ-ASZ      0.022181593  0.0001841494 0.04417904 0.0476100
SZ-ASZvsSZ  0.016941695 -0.0024582271 0.03634162 0.1003015

Figure 7: Tukey Post-Hoc test

We can see that the rejection in the ANOVA test can be explained by the difference between SZ and ASZ decks. But simply looking at the threshold of 0.05 would ignore how borderline this result is. So, while rules are not meant to be broken (when doing analysis), this is a rare case where we don’t blindly follow the raw numbers, meaning we accept the hypothesis that SZ and ASZ are also from the same population, or the same archetype.

Conclusion

The analysis gives support to the hypothesis of aggregating the archetypes defined as ASZ, SZ and AS, as Dr. LoR suggested. ⁸

Further testing should check whether the sample and conditions we chose are too strict or too lenient.

The next article in this series will introduce the application of hierarchical clustering methods to the archetypes problem, both replicating Drisoth’s methodology and exploring possible alternatives.

¹ Their definition, examples and so on will be a topic for the future.
² I’m aware that the aggregation problem is not limited to cases with three champions; some cases may even be related to a single card like “Feel the Rush” or “ARAM (Howling Abyss)”, but, as mentioned, one must proceed with baby steps and this is probably a good starting point.
³ All decks with no champion or with a single champion are excluded by default because of this.
⁴ Not actually a property but a requirement in its definition.
⁵ Archetypes - Cluster
⁶ As \(A_{{11}}\) and \(A_{{22}}\) are symmetric matrices relative to the same archetype’s decks, we’ll only use the values from the upper triangular matrix without the diagonal.
⁷ 2113 decks for AS, 1350 decks for SZ, 1336 decks for ASZ.
⁸ Any change will probably occur in the first report after the release of this article.

Defining Archetypes #1: Looking at the similarity of Akshan/Sivir/Zed with similar archetypes