1. The Origins of Elo
By 1960, Arpad Elo, a physics professor at Marquette University and a master-level chess player, had devised a statistical system to rate chess players. His method replaced the inconsistent class-and-norm systems that preceded it with a single numerical scale built on three ideas: every player has a numerical rating; the expected probability of one player beating another is a function of their rating difference; and after each game, ratings update by an amount proportional to how surprising the outcome was.
The system was adopted by the United States Chess Federation (USCF) in 1960 and by the World Chess Federation (FIDE) in 1970. Elo published the definitive treatment in 1978 — The Rating of Chessplayers, Past and Present — and the system has since spread far beyond chess to nearly every competitive ranking domain: Go, Scrabble, video games (Microsoft's TrueSkill is a multiplayer extension), American football (FiveThirtyEight, ESPN ratings), tennis (the Universal Tennis Rating), and now horse racing.
The Elo formula's appeal is its simplicity and its theoretical grounding. Each rating is, in effect, an approximate Bayesian point estimate of the player's true strength given their results to date; each update is the revision after one more observation. Two players with equal ratings are expected to share results 50/50; a 200-point gap implies an expected score of roughly 76% for the higher-rated player.
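The 50/50 and 76% figures follow from the standard logistic Elo expectation used in chess. A minimal Python sketch, assuming the conventional 400-point scale (the function name `elo_expected` is illustrative, and the source does not say whether RaceMetrics uses the same constant):

```python
def elo_expected(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    print(f"{elo_expected(1500, 1500):.3f}")  # equal ratings -> 0.500
    print(f"{elo_expected(1700, 1500):.3f}")  # 200-point gap -> 0.760
```

The curve is symmetric: the two players' expectations always sum to 1, so a 200-point underdog is expected to score roughly 24%.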
2. Why Elo Works for Horse Racing
The same logical structure that fits chess fits racing — with one critical adaptation. A race is not a one-on-one game. A 12-runner handicap at Cheltenham is, in Elo terms, 66 simultaneous pairwise contests (each runner against each other runner). Naïve, textbook Elo can't handle this directly: it expects one winner, one loser, two ratings to update.
RaceMetrics' adaptation, in plain English: for every horse in every race, we compute the Elo expectation against the entire field — what percentage of rivals would this horse be expected to beat, given its rating versus the field's ratings? — then update each horse's rating once based on how its actual finishing position compared to that expectation.
A horse rated 1620 in a field averaging 1500 is expected to finish near the top; finishing 8th surprises the model and pulls its rating downward. A horse rated 1480 in the same field finishing 2nd surprises the model in the other direction and pulls its rating up.
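The two horses above can be worked through numerically. The sketch below is one plausible reading, not the production formula: it assumes the expectation is the mean of standard pairwise Elo expectations (400-point scale) and that every rival in a 12-runner field is rated exactly 1500. All function names are illustrative.

```python
def elo_expected(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Standard logistic Elo expectation of A beating B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def expected_prb(rating: float, rival_ratings: list[float]) -> float:
    """Mean pairwise expectation against every rival (assumed form)."""
    return sum(elo_expected(rating, r) for r in rival_ratings) / len(rival_ratings)


def actual_prb(position: int, field_size: int) -> float:
    """Share of rivals beaten: 1.0 for the winner, 0.0 for last place."""
    if field_size < 2:
        raise ValueError("a race needs at least two runners")
    return (field_size - position) / (field_size - 1)


# Hypothetical field: eleven rivals, all rated 1500.
rivals = [1500.0] * 11

# Surprise = actual minus expected; the sign gives the direction of the update.
surprise_1620 = actual_prb(8, 12) - expected_prb(1620.0, rivals)
surprise_1480 = actual_prb(2, 12) - expected_prb(1480.0, rivals)

print(f"1620-rated, 8th of 12: {surprise_1620:+.3f}")  # negative: rating falls
print(f"1480-rated, 2nd of 12: {surprise_1480:+.3f}")  # positive: rating rises
```

Under these assumptions the 1620-rated horse is expected to beat about two thirds of its rivals but beats only 4 of 11, while the 1480-rated horse is expected to beat slightly under half and beats 10 of 11.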
The same machinery runs in parallel for the connection types — trainer, jockey, owner, sire, dam, damsire. Each is an Elo pool of its own, populated by every race the connection has been involved in, going back through 25+ years of data.
3. The RaceMetrics Adaptation
Three departures from textbook Elo make racing-Elo work:
3.1 Multi-runner expectation
Instead of one Elo expectation per pair, we compute each horse's expected Percentage of Rivals Beaten (PRB) — a function of its rating versus the average of the field's ratings, weighted by field size. The actual PRB after the race (finishing position relative to field size) is compared to the expectation, and the rating updates proportionally. The K-factor — the magnitude of the update per unit of surprise — is calibrated against the historical race population, large enough to track real form change, small enough to filter noise.
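A minimal sketch of that update step follows. It assumes a logistic expectation against the mean rival rating and a hypothetical K-factor of 24; the calibrated K and the field-size weighting mentioned above are not specified in this article, so the weighting is left out and the names are illustrative.

```python
from statistics import mean


def update_rating(
    rating: float,
    rival_ratings: list[float],
    position: int,
    k: float = 24.0,      # hypothetical K-factor, not the calibrated value
    scale: float = 400.0,  # conventional chess scale, assumed here
) -> float:
    """Return one runner's post-race rating.

    rival_ratings: pre-race ratings of the other runners.
    position: finishing position, 1 = winner.
    """
    field_size = len(rival_ratings) + 1
    if field_size < 2:
        raise ValueError("a race needs at least two runners")
    # Expected PRB: logistic curve on the gap to the field average (assumed form).
    expected = 1.0 / (1.0 + 10.0 ** ((mean(rival_ratings) - rating) / scale))
    # Actual PRB: share of rivals that finished behind this runner.
    actual = (field_size - position) / (field_size - 1)
    # Move the rating in proportion to the surprise.
    return rating + k * (actual - expected)


if __name__ == "__main__":
    rivals = [1500.0] * 11
    print(round(update_rating(1620.0, rivals, position=8), 1))
    print(round(update_rating(1480.0, rivals, position=2), 1))
```

A larger K makes ratings follow recent form more closely at the cost of more noise; a runner that performs exactly to expectation keeps its rating unchanged.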
3.2 Seven independent Elo pools
Horse, Trainer, Jockey, Owner, Sire, Dam and Damsire each have their own independent rating populations. The horse's H rating reflects its own form. The Trainer's T rating reflects the cumulative performance of every runner the trainer has saddled in the dataset. The Sire's S rating reflects the cumulative performance of every progeny that has run. This is a more granular form picture than the single-figure horse-only ratings published by most racing-data providers.
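One way to picture the seven pools is as seven separate rating tables that never share entries. The sketch below is an assumption about structure only: the pool names, the `apply_result` helper, the K value and the example identifiers are all hypothetical, and each pool's surprise is taken as an input rather than computed.

```python
from collections import defaultdict

POOLS = ("horse", "trainer", "jockey", "owner", "sire", "dam", "damsire")
ANCHOR = 1500.0

# One independent table per pool; an unseen entity starts at the anchor.
ratings = {pool: defaultdict(lambda: ANCHOR) for pool in POOLS}


def apply_result(runner: dict[str, str], surprise_by_pool: dict[str, float],
                 k: float = 24.0) -> None:
    """Update all seven ratings attached to one runner after a race.

    runner maps pool name -> entity identifier.
    surprise_by_pool maps pool name -> (actual PRB - expected PRB), where each
    expectation is computed from that pool's own ratings.
    """
    for pool in POOLS:
        ratings[pool][runner[pool]] += k * surprise_by_pool[pool]


# Hypothetical runner and surprises, for illustration only.
runner = {pool: f"example-{pool}" for pool in POOLS}
apply_result(runner, {pool: 0.1 for pool in POOLS})
print(ratings["trainer"]["example-trainer"])
```

Because the tables are separate, a trainer's T rating accumulates evidence from every runner they saddle, while each horse's H rating moves only on that horse's own results.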
3.3 Combined Score weighting
For race-level prediction, we combine the six non-horse connection ratings into a single Combined Score. The weighting was derived empirically from a held-out test set, choosing weights that maximised the rank-correlation between Combined Score and finishing position:
| Connection | Combined Score weight |
|---|---|
| Owner | 20% |
| Trainer | 20% |
| Jockey | 20% |
| Dam | 18% |
| Sire | 12% |
| Damsire | 10% |
The horse's own H rating is reported separately and is deliberately not folded into the Combined Score. The two figures answer different questions: "How talented is this horse?" (H) versus "How strong is the package around this horse?" (Combined Score). A horse with a modest H rating but elite connections (high T, J, O, S, D, DS) carries a real Combined Score signal that a single horse-rating system would miss entirely.
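The Combined Score is a weighted average of the six connection ratings, using the weights in the table above. A short sketch, with hypothetical ratings for one runner and an illustrative function name:

```python
# Weights from the Combined Score table; they sum to 100%.
WEIGHTS = {
    "owner": 0.20,
    "trainer": 0.20,
    "jockey": 0.20,
    "dam": 0.18,
    "sire": 0.12,
    "damsire": 0.10,
}


def combined_score(connection_ratings: dict[str, float]) -> float:
    """Weighted average of the six non-horse ratings (H is excluded)."""
    missing = WEIGHTS.keys() - connection_ratings.keys()
    if missing:
        raise KeyError(f"missing ratings for: {sorted(missing)}")
    return sum(WEIGHTS[name] * connection_ratings[name] for name in WEIGHTS)


# Hypothetical runner, for illustration only.
example = {
    "owner": 1600.0,
    "trainer": 1580.0,
    "jockey": 1550.0,
    "dam": 1500.0,
    "sire": 1520.0,
    "damsire": 1490.0,
}
print(round(combined_score(example), 1))  # 1547.4
```

Because the weights sum to one, the Combined Score stays on the same 1500-anchored scale as the individual ratings.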
4. Reading the Scale
The RaceMetrics rating scale follows the chess convention of a 1500 anchor — that is, the population mean sits around 1500 by construction, and the standard deviation is calibrated so that bands at 50-point intervals correspond to roughly one quarter of a standard deviation of underlying strength:
| Rating band | Interpretation |
|---|---|
| 1600 + | Elite — top-tier performance |
| 1550 – 1599 | Strong — consistently above average |
| 1500 – 1549 | Average |
| 1450 – 1499 | Below average |
| Under 1450 | Struggling form |
The cut-offs aren't arbitrary. They were chosen to match observed strike-rate brackets in historical data: the band that empirically wins close to 1-in-2 at the top is called Elite, the band that wins close to 1-in-3 is called Strong, and so on (see Section 5, which splits the same scale into finer 25-point tiers).
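The band table translates directly into a lookup. A small sketch, with an illustrative function name; fractional ratings are assigned to the band whose lower bound they reach, which is an assumption since the table lists whole numbers only.

```python
# Lower bound of each band, highest first, taken from the rating-band table.
BANDS = [
    (1600, "Elite"),
    (1550, "Strong"),
    (1500, "Average"),
    (1450, "Below average"),
]


def rating_band(rating: float) -> str:
    """Return the interpretation band for a rating on the 1500-anchored scale."""
    for floor, label in BANDS:
        if rating >= floor:
            return label
    return "Struggling form"


for r in (1620, 1550, 1499, 1430):
    print(r, rating_band(r))
```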
5. Empirical Validation
Across 25+ years of UK and Irish racing in the Proform Racing dataset, the horse with the highest Combined Score in each race wins 22.97% of the time. Given the average field size, random selection would produce a win rate of around 10–11%, so the top Combined Score outperforms random selection by roughly 2.15×.
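The multiple is simply the strike rate divided by the random-selection baseline. The sketch below reproduces the arithmetic; the exact baseline is not stated, so the value of about 10.7% is back-derived from the two published figures and is an inference rather than a reported number.

```python
def lift(strike_rate_pct: float, baseline_pct: float) -> float:
    """How many times better than random a selection method performs."""
    return strike_rate_pct / baseline_pct


TOP_RATED_STRIKE = 22.97  # % of races won by the top Combined Score
REPORTED_LIFT = 2.15

# Baseline implied by the two published figures (an inference).
implied_baseline = TOP_RATED_STRIKE / REPORTED_LIFT

print(f"implied random baseline: {implied_baseline:.2f}%")
for baseline in (10.0, 11.0):
    print(f"lift at {baseline:.0f}% baseline: {lift(TOP_RATED_STRIKE, baseline):.2f}x")
```

The implied baseline sits inside the quoted 10–11% range, so the headline figures are mutually consistent.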
Breaking that down by the Combined Score tier of the top-rated horse in each race:
| Combined Score tier | Score range | Strike rate | vs random |
|---|---|---|---|
| Elite | 1600 + | 43.75% | 3.54× |
| Very High | 1575 – 1599 | 33.41% | 3.18× |
| High | 1550 – 1574 | 24.06% | 2.49× |
| Above Avg | 1525 – 1549 | 17.78% | 1.86× |
| Average | 1500 – 1524 | 12.38% | 1.32× |
| Below Avg | 1475 – 1499 | 7.99% | 0.87× |
| Low | 1450 – 1474 | 4.95% | 0.54× |
| Bottom | Under 1450 | 2.81% | 0.30× |
The relationship is monotonic across all eight tiers: each higher band has a higher strike rate than the band below it, with no inversions. That monotonicity is a basic diagnostic of a well-behaved rating system. A system whose strike-rate tiers cross over is miscalibrated; one where they stack cleanly shows the ordinal scale is doing what an ordinal scale is meant to do.
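The monotonicity claim can be checked mechanically against the tier table. A brief sketch using the published figures (the helper name is illustrative):

```python
# (tier, strike rate %, multiple of random), highest tier first, from the table.
TIERS = [
    ("Elite", 43.75, 3.54),
    ("Very High", 33.41, 3.18),
    ("High", 24.06, 2.49),
    ("Above Avg", 17.78, 1.86),
    ("Average", 12.38, 1.32),
    ("Below Avg", 7.99, 0.87),
    ("Low", 4.95, 0.54),
    ("Bottom", 2.81, 0.30),
]


def is_strictly_decreasing(values: list[float]) -> bool:
    """True when every value is lower than the one before it (no inversions)."""
    return all(a > b for a, b in zip(values, values[1:]))


strike_rates = [strike for _, strike, _ in TIERS]
multiples = [multiple for _, _, multiple in TIERS]

print(is_strictly_decreasing(strike_rates))  # True
print(is_strictly_decreasing(multiples))     # True
```

The same check can be rerun whenever the tier table is refreshed; a single inversion would show up as `False`.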
6. What Elo Doesn't Tell You
An honest accounting of what a single rating figure cannot capture:
6.1 Going, distance, course, class
A horse's RaceMetrics rating is form-based. It does not condition on the race conditions in front of it. A 1620-rated horse running on a surface it has never coped with may still underperform a 1550-rated horse perfectly suited to today's conditions. RaceMetrics layers separate tools on top of the Elo ratings — Form Expert for historical condition-specific strike rates, Pattern detection for saved profitable angles, Trip Predictor for surface/distance preference, Draw & Pace for course-specific draw biases — to answer the "is this rating likely to translate today?" question.
6.2 Small-sample priors
A brand-new sire's progeny haven't run yet. The S rating starts at the 1500 anchor and converges toward its true value as runners accumulate. For the first dozen or so progeny, the rating is dominated by the prior, not the data. This is intentional — overconfident priors built from pedigree alone would inject bias. The cost is that genuinely elite young sires take a year or two to climb the rankings; the benefit is that we don't penalise horses for being sired by a stallion with an unfortunate Group race or two.
6.3 Recency and regime change
Pure Elo applies the same update weight to every race, however long ago it was run. Yet form genuinely degrades and recovers: a horse that ran in 2018 is not the same horse running in 2026. RaceMetrics' practical implementation adds recency weighting so that the most recent results dominate while older races still contribute at reduced weight, a compromise between full-history accuracy and current-form responsiveness.
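The article does not say which recency scheme is used. One common choice is exponential decay, sketched below purely as an illustration: the half-life of 365 days, the base K of 24 and both function names are assumptions.

```python
def recency_weight(age_days: float, half_life_days: float = 365.0) -> float:
    """Weight of a result that is age_days old; halves every half_life_days."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return 0.5 ** (age_days / half_life_days)


def effective_k(base_k: float, age_days: float, half_life_days: float = 365.0) -> float:
    """Scale the K-factor so older races move the rating less."""
    return base_k * recency_weight(age_days, half_life_days)


for age in (0, 365, 730, 2920):  # today, 1 year, 2 years, 8 years
    print(age, round(effective_k(24.0, age), 3))
```

Under this scheme an old race never drops to zero weight, but a result from eight years ago moves the rating by well under 1% of what today's result would.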
6.4 Single-figure simplification
Any rating is a point estimate of a probabilistic quantity. A 1620 rating doesn't mean the horse will win — it means the model's central expectation, given everything it knows, places this horse higher than a 1550-rated horse. Variance, draw, going, jockey choice, traffic and pace dynamics all matter on the day, and none are inside the Elo number. They are inside the surrounding tools.
7. Comparison with Other Rating Systems
Each of the major UK racing rating systems answers a slightly different question. They are not directly comparable across scales:
| System | Basis | Updated | Scale |
|---|---|---|---|
| BHA Official Rating (OR) | Handicapper judgement, anchored on weight | Weekly | 0 – 140+ |
| Topspeed (TS) | Time-based — finishing time vs par | Per race | Internal |
| Form-based commercial ratings | Subjective form + context | Per race | Internal |
| RaceMetrics (H, T, J, O, S, D, DS) | Elo-style — form vs opposition quality | Daily | 1500-anchored |
RaceMetrics' structural advantage is breadth and statistical foundation: separate ratings for all seven connection types, updated daily, on a single scale calibrated against 25+ years of out-of-sample race results. The handicapper's OR is an authority figure; the Elo system is a statistical estimator. Different tools for different jobs — and ideally both inputs to a serious form student's judgement.
8. References
- Elo, A. E. (1978). The Rating of Chessplayers, Past and Present. Arco Publishing, New York.
- Glickman, M. E. (1995). A comprehensive guide to chess ratings. American Chess Journal, 3, 59-102.
- Glickman, M. E. (2012). Example of the Glicko-2 system. Boston University.
- Herbrich, R., Minka, T., & Graepel, T. (2007). TrueSkill: A Bayesian skill rating system. Advances in Neural Information Processing Systems, 19.
- Sismanis, Y. (2010). How I won the "Chess Ratings — Elo vs the Rest of the World" competition. arXiv:1012.4571.
- World Chess Federation (FIDE). FIDE Rating Regulations. Current edition. FIDE Handbook.
- Wikipedia: Elo rating system. Sports rating system.