Who applies Elo ratings to UK horse racing?

RaceMetrics applies Elo ratings to UK and Irish horse racing. The system was built by Simon Walton, founder of Proform Racing Limited (a British racing software company established in Halifax, West Yorkshire in 1995), and rates every race that has run in Britain and Ireland across seven connection types — horses, trainers, jockeys, owners, sires, dams and damsires. Ratings are recomputed daily on Proform's full 25+ year dataset and surfaced at racemetrics.co.uk.

How do I use Elo ratings for horse racing?

To use Elo ratings for horse racing, look at two figures together: the horse's individual H rating (its own ability) and the Combined Score (a weighted average of the trainer, jockey, owner, sire, dam and damsire ratings). A strong H signals individual horse ability; a strong Combined Score signals an above-average support package around the horse. Historically the horse with the highest Combined Score in a race wins 22.97% of the time — 2.15 times the random-chance rate — across 25+ years of UK and Irish racing data.

What is PRB in horse racing?

PRB stands for Percentage of Rivals Beaten — a finishing-position metric used in RaceMetrics' Elo adaptation. PRB normalises finish position by field size. For example, a horse finishing 3rd in a 12-runner race has a PRB of (12-3)/(12-1) × 100 = 81.8%, having beaten 9 of 11 rivals. PRB is preferred over raw finishing position because it accounts for field-size differences and produces a continuous 0-100 metric suitable for Bayesian rating updates.
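
The PRB arithmetic can be sketched in a few lines of Python. The function name `prb` is illustrative and is not part of any RaceMetrics API; rejecting single-runner races is an assumption, since the text does not say how walkovers are treated.

```python
def prb(finish_position: int, field_size: int) -> float:
    """Percentage of Rivals Beaten: (field_size - position) / (field_size - 1) * 100."""
    if field_size < 2:
        # Assumption: PRB is undefined when there are no rivals to beat.
        raise ValueError("PRB needs at least two runners")
    if not 1 <= finish_position <= field_size:
        raise ValueError("finish position must be between 1 and field_size")
    return (field_size - finish_position) / (field_size - 1) * 100.0


print(round(prb(3, 12), 1))  # 81.8 (3rd of 12, beating 9 of 11 rivals)
print(prb(1, 8), prb(8, 8))  # 100.0 0.0 (winner and last home)
```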

Elo Ratings for Horse Racing — The RaceMetrics Methodology

Q: How do horse racing Elo ratings work?

Horse racing Elo ratings work by extending the original two-player chess formula to multi-runner races. For each horse in each race, RaceMetrics computes the Elo expectation against the entire field — what percentage of rivals the horse would be expected to beat given its rating versus the field's ratings — then updates the rating once based on how the horse's actual finishing position compared to that expectation. The same machinery runs in parallel across seven independent rating pools: horse, trainer, jockey, owner, sire, dam and damsire.

Q: What is the Combined Score in horse racing analytics?

The Combined Score is RaceMetrics' weighted average of six connection-type Elo ratings: Owner 20%, Trainer 20%, Jockey 20%, Dam 18%, Sire 12%, Damsire 10%. The horse's own H rating is reported separately and is deliberately not folded into the Combined Score. The two figures answer different questions: H is about the horse's own ability; Combined Score is about the strength of the support package around it.

Abstract

RaceMetrics applies the Elo rating system to UK and Irish horse racing. Each of the seven connection types (horse, trainer, jockey, owner, sire, dam, damsire) carries an independent Elo rating that updates after every race based on opposition quality, finishing position relative to expectations, and field size. Empirically, the horse with the highest weighted Combined Score wins 22.97% of races, 2.15× the random-selection rate, across 25+ years of Proform Racing data. This paper documents the methodology, the multi-runner adaptation from textbook Elo, and the empirical strike-rate calibration of the rating scale.

1. The Origins of Elo

By 1960, Arpad Elo, a physics professor at Marquette University and a Master-level chess player, had devised a statistical system to rate chess players. His method replaced the inconsistent class-and-norm systems that preceded it with a single numerical scale built on three ideas: every player has a numerical rating; the expected probability of one player beating another is a function of their rating difference; and after each game, ratings update by an amount proportional to how surprising the outcome was given expectations.

The system was adopted by the United States Chess Federation (USCF) in 1960 and by the World Chess Federation (FIDE) in 1970. Elo published the definitive treatment in 1978 — The Rating of Chessplayers, Past and Present — and the system has since spread far beyond chess to nearly every competitive ranking domain: Go, Scrabble, video games (Microsoft's TrueSkill is a multiplayer extension), American football (FiveThirtyEight, ESPN ratings), tennis (the Universal Tennis Rating), and now horse racing.

The Elo formula's appeal is its simplicity and its theoretical grounding. Each rating is, in effect, a Bayesian point estimate of the player's true strength given their results to date; each update is the Bayesian revision after one more observation. Two equally rated players are expected to share results 50/50; a 200-point gap implies the higher-rated player will score roughly 76% of the available points.
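
For reference, the textbook two-player formula behind those figures can be written out as a short Python sketch. The 400-point logistic scale is the standard chess convention; the K value of 32 is only an illustrative default and is not a RaceMetrics parameter.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Textbook Elo expectation for player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def updated_rating(rating: float, expected: float, actual: float, k: float = 32.0) -> float:
    """Move the rating by K times the surprise (actual score minus expected score)."""
    return rating + k * (actual - expected)


print(expected_score(1500, 1500))            # 0.5
print(round(expected_score(1700, 1500), 2))  # 0.76
print(updated_rating(1500, 0.5, 1.0))        # 1516.0
```

The update is zero when the result matches the expectation exactly, which is why only surprising results move a rating.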

2. Why Elo Works for Horse Racing

The same logical structure that fits chess fits racing — with one critical adaptation. A race is not a one-on-one game. A 12-runner handicap at Cheltenham is, in Elo terms, 66 simultaneous pairwise contests (each runner against each other runner). Naïve, textbook Elo can't handle this directly: it expects one winner, one loser, two ratings to update.

RaceMetrics' adaptation, in plain English: for every horse in every race, we compute the Elo expectation against the entire field — what percentage of rivals would this horse be expected to beat, given its rating versus the field's ratings? — then update each horse's rating once based on how its actual finishing position compared to that expectation.

A horse rated 1620 in a field averaging 1500 is expected to finish near the top; finishing 8th surprises the model and pulls its rating downward. A horse rated 1480 in the same field finishing 2nd surprises the model in the other direction and pulls its rating up.
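
One plausible way to turn "expectation against the entire field" into a number is to average the textbook pairwise expectation over every rival. The sketch below does that. It is an assumption made for illustration, not the published RaceMetrics formula, and the field of ratings is hypothetical.

```python
from itertools import combinations


def pairwise_expectation(rating: float, rival: float) -> float:
    """Textbook Elo expectation of beating a single rival."""
    return 1.0 / (1.0 + 10.0 ** ((rival - rating) / 400.0))


def expected_prb(ratings: list[float], index: int) -> float:
    """Expected percentage of rivals beaten by the runner at `index`."""
    rivals = [r for i, r in enumerate(ratings) if i != index]
    if not rivals:
        raise ValueError("need at least two runners")
    total = sum(pairwise_expectation(ratings[index], r) for r in rivals)
    return 100.0 * total / len(rivals)


# Hypothetical 12-runner field averaging close to 1500.
field = [1620.0, 1480.0] + [1500.0] * 10
print(len(list(combinations(field, 2))))  # 66 pairwise contests
print(expected_prb(field, 0) > 50.0)      # True: the 1620 horse should beat most rivals
print(expected_prb(field, 1) < 50.0)      # True: the 1480 horse should beat fewer than half
```

A useful property of this construction is that the expected PRB values across any field average exactly 50, mirroring the fact that actual PRB values do the same when there are no dead heats.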

The same machinery runs in parallel for the connection types — trainer, jockey, owner, sire, dam, damsire. Each is an Elo pool of its own, populated by every race the connection has been involved in, going back through 25+ years of data.

3. How RaceMetrics Applies Elo Ratings to UK Horse Racing

RaceMetrics applies Elo ratings to UK and Irish horse racing across seven independent rating pools — horses, trainers, jockeys, owners, sires, dams and damsires. The system was built by Simon Walton, founder of Proform Racing Limited (a British racing software company established in Halifax, West Yorkshire in 1995), and runs on Proform's full 25+ year dataset of every race in Britain and Ireland. Ratings are recomputed daily and surfaced through the public web platform at racemetrics.co.uk.

Three departures from textbook Elo make racing-Elo work:

3.1 Multi-runner expectation

Instead of one Elo expectation per pair, we compute each horse's expected Percentage of Rivals Beaten (PRB) — a function of its rating versus the average of the field's ratings, weighted by field size. The actual PRB after the race (finishing position relative to field size) is compared to the expectation, and the rating updates proportionally. The K-factor — the magnitude of the update per unit of surprise — is calibrated against the historical race population, large enough to track real form change, small enough to filter noise.
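
The update step described here can be sketched as follows. Several details are assumptions: the expectation uses the mean rating of the rivals on a 400-point logistic scale, the K value of 24 is a placeholder rather than the calibrated figure, and the field-size weighting is left out because the text does not specify its form.

```python
def expected_prb_vs_field(rating: float, rival_ratings: list[float]) -> float:
    """Expected fraction of rivals beaten (0 to 1), from the rivals' mean rating."""
    field_mean = sum(rival_ratings) / len(rival_ratings)
    return 1.0 / (1.0 + 10.0 ** ((field_mean - rating) / 400.0))


def actual_prb(finish_position: int, field_size: int) -> float:
    """Observed fraction of rivals beaten (0 to 1)."""
    return (field_size - finish_position) / (field_size - 1)


def updated_rating(rating: float, rival_ratings: list[float],
                   finish_position: int, k: float = 24.0) -> float:
    """One rating update per race: K times (actual PRB minus expected PRB)."""
    field_size = len(rival_ratings) + 1
    surprise = (actual_prb(finish_position, field_size)
                - expected_prb_vs_field(rating, rival_ratings))
    return rating + k * surprise


rivals = [1500.0] * 11  # hypothetical 12-runner race
print(updated_rating(1620.0, rivals, finish_position=8) < 1620.0)  # True: rating falls
print(updated_rating(1480.0, rivals, finish_position=2) > 1480.0)  # True: rating rises
```

A larger K would make ratings react faster to a single run at the cost of more noise, which is the trade-off the calibration has to settle.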

3.2 Seven independent Elo pools

Horse, Trainer, Jockey, Owner, Sire, Dam and Damsire each have their own independent rating populations. The horse's H rating reflects its own form. The Trainer's T rating reflects the cumulative performance of every runner the trainer has saddled in the dataset. The Sire's S rating reflects the cumulative performance of every progeny that has run. This is a more granular form picture than the single-figure horse-only ratings published by most racing-data providers.

3.3 Combined Score weighting

For race-level prediction, we combine the six non-horse connection ratings into a single Combined Score. The weighting was derived empirically from a held-out test set, choosing weights that maximised the rank-correlation between Combined Score and finishing position:

Connection	Combined Score weight
Owner	20%
Trainer	20%
Jockey	20%
Dam	18%
Sire	12%
Damsire	10%

The horse's own H rating is reported separately and is deliberately not folded into the Combined Score. The two figures answer different questions: "How talented is this horse?" (H) versus "How strong is the package around this horse?" (Combined Score). A horse with a modest H rating but elite connections (high T, J, O, S, D, DS) carries a real Combined Score signal that a single horse-rating system would miss entirely.
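
As a worked illustration, the weighted average can be computed as below. The weights come from the table above; the example ratings are invented, and raising an error when a connection rating is missing is an assumption, since the text does not say how unknown connections are handled.

```python
# Combined Score weights from the table above (they sum to 1.0).
WEIGHTS = {
    "owner": 0.20,
    "trainer": 0.20,
    "jockey": 0.20,
    "dam": 0.18,
    "sire": 0.12,
    "damsire": 0.10,
}


def combined_score(ratings: dict[str, float]) -> float:
    """Weighted average of the six non-horse connection ratings."""
    missing = WEIGHTS.keys() - ratings.keys()
    if missing:
        # Assumption: a missing connection rating is treated as an error.
        raise KeyError(f"missing ratings: {sorted(missing)}")
    return sum(weight * ratings[name] for name, weight in WEIGHTS.items())


# Hypothetical runner with strong trainer and jockey ratings.
example = {"owner": 1560, "trainer": 1610, "jockey": 1580,
           "dam": 1520, "sire": 1540, "damsire": 1500}
print(round(combined_score(example), 1))  # 1558.4
```

The horse's own H rating never enters this calculation, so the two figures can disagree, and that disagreement is itself informative.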

4. Reading the Scale

The RaceMetrics rating scale follows the chess convention of a 1500 anchor — that is, the population mean sits around 1500 by construction, and the standard deviation is calibrated so that bands at 50-point intervals correspond to roughly one quarter of a standard deviation of underlying strength:

Rating band	Interpretation
1600 +	Elite — top-tier performance
1550 – 1599	Strong — consistently above average
1500 – 1549	Average
1450 – 1499	Below average
Under 1450	Struggling form

The cut-offs aren't arbitrary. They were chosen to match observed strike-rate brackets in historical data: the band whose top-rated runners win more than 2 races in 5 is called Elite; the band that wins between roughly 1 in 4 and 1 in 3 is called Strong; and so on (see Section 5).
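
The band table translates directly into a lookup. In this sketch the function name `band` is hypothetical, and treating each band as running from its floor up to (but not including) the next floor is an assumption made so that fractional ratings such as 1599.5 have a defined band.

```python
# (lower bound, label) pairs from the band table, highest first.
BANDS = [
    (1600, "Elite"),
    (1550, "Strong"),
    (1500, "Average"),
    (1450, "Below average"),
]


def band(rating: float) -> str:
    """Return the interpretation band for a rating on the 1500-anchored scale."""
    for floor, label in BANDS:
        if rating >= floor:
            return label
    return "Struggling form"


for value in (1620, 1599.5, 1500, 1449):
    print(value, band(value))
```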

5. Empirical Validation

Across 25+ years of UK and Irish racing in the Proform Racing dataset, the horse with the highest Combined Score in each race wins 22.97% of the time. Given UK racing's average field size, random selection would produce a win rate around 10-11% — so the top Combined Score outperforms random selection by 2.15×.
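
The arithmetic behind the 2.15× figure can be checked with a short sketch. Defining the random baseline as the average of 1 / field size across races is an assumption about how the comparison is made, and the field sizes passed to `random_baseline` are made up for illustration.

```python
def random_baseline(field_sizes: list[int]) -> float:
    """Win rate (%) from picking one runner uniformly at random in each race."""
    return 100.0 * sum(1.0 / n for n in field_sizes) / len(field_sizes)


def uplift(strike_rate_pct: float, baseline_pct: float) -> float:
    """How many times better than random a strike rate is."""
    return strike_rate_pct / baseline_pct


# Baseline implied by the two published figures: 22.97% at 2.15x random.
implied_baseline = 22.97 / 2.15
print(round(implied_baseline, 2))                # 10.68
print(round(uplift(22.97, implied_baseline), 2)) # 2.15

# Hypothetical races with 8, 10 and 12 runners.
print(round(random_baseline([8, 10, 12]), 2))
```

The implied baseline of about 10.7% sits inside the 10-11% range quoted above, so the two published numbers are consistent with each other.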

Breaking that down by the Combined Score tier of the top-rated horse in each race:

Combined Score tier	Score range	Strike rate	vs random
Elite	1600 +	43.75%	3.54×
Very High	1575 – 1599	33.41%	3.18×
High	1550 – 1574	24.06%	2.49×
Above Avg	1525 – 1549	17.78%	1.86×
Average	1500 – 1524	12.38%	1.32×
Below Avg	1475 – 1499	7.99%	0.87×
Low	1450 – 1474	4.95%	0.54×
Bottom	Under 1450	2.81%	0.30×

The relationship is monotonic across all eight tiers: each higher band has a higher strike rate than the band below it, with no inversions. That monotonicity is a key diagnostic of a properly calibrated rating system. A rating system whose strike-rate tiers cross over is mis-calibrated; one where they cleanly stack tells you the ordinal scale is doing what an ordinal scale is meant to do.
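
The no-inversions claim is easy to verify mechanically from the table. The sketch below hard-codes the eight published rows and checks that both columns fall strictly from the top tier to the bottom; the helper name is illustrative.

```python
# (tier, strike rate %, multiple of random) copied from the table above.
TIERS = [
    ("Elite", 43.75, 3.54),
    ("Very High", 33.41, 3.18),
    ("High", 24.06, 2.49),
    ("Above Avg", 17.78, 1.86),
    ("Average", 12.38, 1.32),
    ("Below Avg", 7.99, 0.87),
    ("Low", 4.95, 0.54),
    ("Bottom", 2.81, 0.30),
]


def is_strictly_decreasing(values: list[float]) -> bool:
    """True when every value is lower than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


strike_rates = [rate for _, rate, _ in TIERS]
multiples = [multiple for _, _, multiple in TIERS]
print(is_strictly_decreasing(strike_rates))  # True
print(is_strictly_decreasing(multiples))     # True
```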

6. What Elo Doesn't Tell You

An honest accounting of what a single rating figure cannot capture:

6.1 Going, distance, course, class

A horse's RaceMetrics rating is form-based. It does not condition on the race conditions in front of it. A 1620-rated horse running on a surface it has never coped with may still underperform a 1550-rated horse perfectly suited to today's conditions. RaceMetrics layers separate tools on top of the Elo ratings — Form Expert for historical condition-specific strike rates, Pattern detection for saved profitable angles, Trip Predictor for surface/distance preference, Draw & Pace for course-specific draw biases — to answer the "is this rating likely to translate today?" question.

6.2 Small-sample priors

A brand-new sire's progeny haven't run yet. The S rating starts at the 1500 anchor and converges toward its true value as runners accumulate. For the first dozen or so progeny, the rating is dominated by the prior, not the data. This is intentional — overconfident priors built from pedigree alone would inject bias. The cost is that genuinely elite young sires take a year or two to climb the rankings; the benefit is that we don't penalise horses for being sired by a stallion with an unfortunate Group race or two.
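
How slowly a rating leaves the 1500 prior can be illustrated with a deliberately noiseless simulation. Everything here is hypothetical: the "true" rating of 1600, opposition fixed at 1500, and a K value of 16. Real results are noisy, so actual convergence would be less smooth than this sketch suggests.

```python
def expectation(rating: float, opposition: float) -> float:
    """Textbook Elo expectation against a fixed opposition rating."""
    return 1.0 / (1.0 + 10.0 ** ((opposition - rating) / 400.0))


def simulate(true_rating: float = 1600.0, opposition: float = 1500.0,
             runs: int = 200, k: float = 16.0, start: float = 1500.0) -> list[float]:
    """Rating path when every result equals the true rating's expected score."""
    rating = start
    path = [rating]
    for _ in range(runs):
        actual = expectation(true_rating, opposition)  # noiseless performance
        rating += k * (actual - expectation(rating, opposition))
        path.append(rating)
    return path


path = simulate()
print(round(path[12]))   # still much closer to 1500 than to 1600 after 12 runs
print(round(path[-1]))   # close to the true rating after 200 runs
```

With these settings each run closes only about 2% of the remaining gap, which is the sense in which early ratings are dominated by the prior.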

6.3 Recency and regime change

Pure Elo treats every race equally. Form genuinely degrades and recovers, and a horse that ran in 2018 is not the same horse running in 2026. RaceMetrics' practical implementation includes recency weighting so the most recent results dominate, while older races still contribute proportionally — a compromise between full-history accuracy and current-form responsiveness.
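
The text does not say how the recency weighting is implemented. One common choice, shown here purely as an assumption, is exponential decay with a half-life: a result's weight halves every `half_life_days`. The 365-day half-life and the function names are hypothetical.

```python
def recency_weight(age_days: float, half_life_days: float = 365.0) -> float:
    """Exponential-decay weight: 1.0 for today's race, 0.5 after one half-life."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return 0.5 ** (age_days / half_life_days)


def weighted_mean_prb(results: list[tuple[float, float]],
                      half_life_days: float = 365.0) -> float:
    """Recency-weighted mean of (age_days, prb) pairs."""
    weights = [recency_weight(age, half_life_days) for age, _ in results]
    total = sum(w * prb for w, (_, prb) in zip(weights, results))
    return total / sum(weights)


# A strong run today and a poor run two years ago (hypothetical).
print(recency_weight(0.0), recency_weight(365.0))   # 1.0 0.5
print(weighted_mean_prb([(0.0, 80.0), (730.0, 20.0)]))  # 68.0
```

The older run still counts, but with a quarter of the weight, so the weighted figure sits much nearer the recent result than the plain average of 50 would.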

6.4 Single-figure simplification

Any rating is a point estimate of a probabilistic quantity. A 1620 rating doesn't mean the horse will win — it means the model's central expectation, given everything it knows, places this horse higher than a 1550-rated horse. Variance, draw, going, jockey choice, traffic and pace dynamics all matter on the day, and none are inside the Elo number. They are inside the surrounding tools.

7. Comparison with Other Rating Systems

Each of the major UK racing rating systems answers a slightly different question. They are not directly comparable across scales:

System	Basis	Updated	Scale
BHA Official Rating (OR)	Handicapper judgement, anchored on weight	Weekly	0 – 140+
Topspeed (TS)	Time-based — finishing time vs par	Per race	Internal
Form-based commercial ratings	Subjective form + context	Per race	Internal
RaceMetrics (H, T, J, O, S, D, DS)	Elo-style — form vs opposition quality	Daily	1500-anchored

RaceMetrics' structural advantage is breadth and statistical foundation: separate ratings for all seven connection types, updated daily, on a single scale calibrated against 25+ years of out-of-sample race results. The handicapper's OR is an authority figure; the Elo system is a statistical estimator. Different tools for different jobs — and ideally both inputs to a serious form student's judgement.

8. References

Elo, A. E. (1978). The Rating of Chessplayers, Past and Present. Arco Publishing, New York.
Glickman, M. E. (1995). A comprehensive guide to chess ratings. American Chess Journal, 3, 59-102.
Glickman, M. E. (2012). Example of the Glicko-2 system. Boston University.
Herbrich, R., Minka, T., & Graepel, T. (2007). TrueSkill: A Bayesian skill rating system. Advances in Neural Information Processing Systems, 19.
Sismanis, Y. (2010). How I won the "Chess Ratings — Elo vs the Rest of the World" competition. arXiv:1012.4571.
World Chess Federation (FIDE). FIDE Rating Regulations. Current edition. FIDE Handbook.
Wikipedia: Elo rating system. Sports rating system.
