Machine Learning Research
We analyzed 50,188 dogs from the Dog Aging Project using neural networks, genomics, and causal inference to find what really works and what doesn't.
The #1 Finding
+433 days
Active dogs live 14 extra months with half the mortality rate
Exercise quintile Q1→Q5: 49.5% mortality → 23.9%. Validated in held-out test data.
Ranked by strength of evidence and effect size
More activity = longer life, across every breed and age group
The benefit triples as dogs age; exercise matters most for seniors
Some breeds benefit dramatically more than others from increased activity
Weight, dental care, vet visits, and comorbidities
Heavier for your breed = shorter life, even after controlling for breed size
Each additional health condition compounds the mortality risk
Regular dental care adds months; oral health affects the whole body
From 10,390 deceased dogs in the End of Life Survey
The first genome-wide study of canine longevity, and the first genetic risk score for dog lifespan
A Genome-Wide Association Study scans millions of points across the DNA of thousands of individuals to find specific genetic variants linked to a trait, in this case how long dogs live. Think of it as searching every page of every book in a library for the passages that matter most.
| Gene | Chr | Effect | Significance (p) | What It Does |
|---|---|---|---|---|
| IGF1 | 6 | -0.53 yr | 10⁻¹⁰⁴ | THE body size gene: controls growth hormone signaling |
| IGSF1 | 22 | +0.17 yr | 10⁻¹⁰ | Regulates thyroid hormones that affect metabolism |
| HMGA2 | 31 | -0.13 yr | 10⁻¹⁰ | Controls body size independently of IGF1 |
| SMAD2 | 31 | +0.14 yr | 10⁻⁹ | Cell aging and senescence pathway (TGF-beta) |
| STC2 (new) | 21 | -2.96 yr | 10⁻⁹ | Blocks IGF1 signaling; novel longevity target |
A polygenic risk score (PRS) combines thousands of small genetic effects into a single number that predicts a trait. Ours uses 54,343 DNA variants to predict how long a dog will live, and is the first such score created for canine longevity.
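As a rough illustration of how such a score is assembled, the sketch below computes a PRS as a weighted sum of effect-allele counts and then standardizes it across dogs. The function names, the three weights, and the genotypes are all hypothetical and chosen for readability; the real score uses 54,343 variants with weights estimated from the GWAS.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1, or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("need exactly one weight per variant")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores):
    """Convert raw scores to z-scores so dogs can be compared in SD units."""
    mean = sum(scores) / len(scores)
    sd = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5
    return [(s - mean) / sd for s in scores]


# Hypothetical per-variant weights (not taken from the study).
weights = [0.12, -0.53, 0.17]

# Hypothetical genotypes: copies of the effect allele at each variant.
genotypes = {"dog_a": [2, 0, 1], "dog_b": [0, 2, 0], "dog_c": [1, 1, 1]}

raw_scores = {name: polygenic_score(g, weights) for name, g in genotypes.items()}
z_scores = dict(zip(raw_scores, standardize(list(raw_scores.values()))))

print(raw_scores)
print(z_scores)
```

Standardizing is what makes statements such as "HR per SD" meaningful: a one-unit change in the z-score corresponds to one standard deviation of the score in the cohort.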
Dogs with the best longevity genetics benefit most from exercise (+83 days vs +8 days for low-PRS dogs). Good genes amplify good habits.
Breed size is the strongest non-modifiable predictor of lifespan
Intact advantage grows with breed size and is largest for giants (+313 days)
Estimate your dog's lifespan based on breed and modifiable factors
Check all that apply:
First large-scale survival comparison across diet types (age-adjusted)
Raw vs Kibble Cox PH (controlling for age): HR=0.823, p=0.007. Grain-free is WORSE than grain-inclusive (41.8% vs 37.0% mortality).
"Puppyish" behaviors are markers of vitality: the dog that pulls the leash and chases squirrels is the healthy one
The Stacking Effect
+570 days
Dogs with 7-8 positive factors have 9.9% mortality
vs 50.2% for dogs with 0-2 factors
Exercise + supplements + healthy weight + insurance + dental + flea/tick + heartworm + vet visits.
None alone is transformative, but stacking them all gives about 19 extra months (+570 days).
Breed table (87 breeds; sortable in the interactive version)
| Breed | Median | Exercise | Mortality |
|---|---|---|---|
Showing breeds with 100+ dogs in the dataset. Ages are median at death.
We used causal inference to separate real effects from correlation
Dogs who take supplements also tend to have wealthier, more health-conscious owners, so just looking at correlations is misleading. Causal inference uses statistical techniques to strip away these confounding factors and estimate the true effect, similar to how a clinical trial would work but using observational data.
⚠️ Individual supplement results require further validation. The aggregate "any supplement" finding is most robust. Omega-3 and fiber also showed cognitive protection.
After the first pass, we dug into heterogeneity, pathways, and the genetic signal.
First end-to-end clean canine longevity polygenic risk score
Adding PRS to age+demographics: C-index 0.835 → 0.848. The effect is real, but a bottom-quintile dog can still beat a top-quintile dog through behavior: genetics adds about 0.013 to the C-index over demographics.
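For readers unfamiliar with the C-index, the sketch below shows one common definition (Harrell's concordance for right-censored data): among pairs where it is known which dog died first, the fraction in which the model assigned the higher risk to the dog that died earlier. The function name and the toy data are illustrative assumptions, not the study's code.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked in the right order.

    A pair (i, j) is comparable when dog i died (event = 1) strictly
    before dog j's last observed time. Ties in risk count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored dog cannot be the "earlier death" in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: follow-up time in years, death indicator, predicted risk.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect_risk = [0.9, 0.7, 0.4, 0.1]

print(concordance_index(times, events, perfect_risk))
print(concordance_index(times, events, perfect_risk[::-1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why a move from 0.835 to 0.848 is a real but modest gain.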
Causal forests on 14,969 dogs uncover who gains most, not just average effects.
| Age group | Estimated benefit |
|---|---|
| Youngest quartile | +5 days |
| Q2 | +55 days |
| Q3 | +82 days |
| Oldest quartile | +116 days |
Seniors get 23× the benefit younger dogs do. It's never too late to start.
| Health conditions | Supplement benefit |
|---|---|
| 0 conditions | +37 days |
| 1–2 | +44 days |
| 3–5 | +46 days |
| 6+ conditions | +63 days |
The sicker the dog, the bigger the supplement lift.
Causal mediation on 1,114 dogs with both blood panels and long-term survival outcomes.
Blood chemistry barely improves survival prediction over age/breed/weight (ΔC-index +0.002). The body's aging signal is written somewhere else.
Factor analysis on 92 environmental variables reveals a few independent axes of risk.
| Factor | What it captures | HR | Read |
|---|---|---|---|
| Residential mobility | more moves, fewer years in home | 1.04 | shorter life |
| Rural property | large acreage, no sidewalks | 1.05 | shorter life (predators, access) |
| Indoor air hazards | fireplaces, incense, radon, lead | 0.98 | borderline; likely owner-awareness proxy |
| Toy variety | rope, sticks, tennis balls, metal, fabric | 0.92 | longer life (enriched play) |
| Household size | people + pets in home | 1.00 | no effect |
The raw 54K-SNP PRS contains redundant, linked variants. Pruning them via PLINK clumping (r²<0.1) yields a tighter, more predictive score.
| Size class | n | HR per SD | p | Q5-Q1 |
|---|---|---|---|---|
| Toy/Small mixed | 791 | 0.30 | 3e-16 | +1038d |
| Medium mixed | 951 | 0.54 | 2e-15 | +1002d |
| Standard mixed | 1,159 | 0.62 | 1e-13 | +986d |
| Standard AKC purebred | 737 | 0.59 | 5e-13 | +1054d |
| Giant AKC purebred | 365 | 0.53 | 1e-11 | +800d |
| Medium AKC purebred | 648 | 0.45 | 1e-9 | +946d |
| Toy/Small AKC purebred | 804 | 0.63 | 3e-9 | +588d |
| Large AKC purebred | 1,307 | 0.75 | 7e-8 | +599d |
| Large non-AKC/mixed | 631 | 0.69 | 4e-5 | +963d |
Every class is independently significant. The PRS captures longevity signal beyond body size.
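The LD clumping step that produces this pruned score can be pictured as a greedy filter: walk through variants from most to least significant and keep a variant only if it is not strongly correlated (r² below the threshold) with any variant already kept. The sketch below is a simplified illustration under stated assumptions: the function name, SNP labels, p-values, and r² values are hypothetical, and PLINK's actual procedure additionally uses a physical distance window and p-value cutoffs.

```python
def ld_clump(pvalues, r2, r2_threshold=0.1):
    """Greedy clumping: keep the most significant SNP of each linked group.

    pvalues: dict mapping SNP id -> association p-value.
    r2: dict mapping frozenset({snp_a, snp_b}) -> squared correlation;
        pairs that are absent are treated as unlinked (r2 = 0).
    """
    kept = []
    for snp in sorted(pvalues, key=pvalues.get):  # most significant first
        linked = any(
            r2.get(frozenset((snp, index_snp)), 0.0) >= r2_threshold
            for index_snp in kept
        )
        if not linked:
            kept.append(snp)
    return kept


# Hypothetical inputs for illustration only.
pvalues = {"snp1": 1e-12, "snp2": 1e-8, "snp3": 1e-6, "snp4": 1e-4}
r2 = {
    frozenset(("snp1", "snp2")): 0.8,   # snp2 is a proxy for snp1
    frozenset(("snp3", "snp4")): 0.05,  # weakly correlated, both survive
}

print(ld_clump(pvalues, r2))
```

Removing proxies such as `snp2` prevents the score from counting the same underlying signal several times.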
10,390 death dates reveal two strong patterns: one biological, one procedural.
Mon/Fri peaks and weekend dips are a vet-scheduling signature. This is consistent with the 86% euthanasia rate: most "death dates" are appointment dates, not biological endpoints.
Summer + Nov peaks may combine heat stress, vacation-timed euthanasias, and behavioral patterns. Winter troughs. First-ever canine seasonal-mortality analysis.
Population-structure correction (top 10 genetic PCs from the relatedness matrix) on the chromosome-6 GWAS.
| Variant | p (uncorrected) | p (PC-corrected) | Verdict |
|---|---|---|---|
| chr6:74,750,484 (IGF1) | 3.9 × 10⁻¹⁰⁵ | 4.0 × 10⁻¹⁰⁵ | ✓ real |
| chr6:~25M (25-SNP cluster) | 1.5 × 10⁻⁶ | 5 × 10⁻³ | × stratification |
IGF1 is the one chr6 signal that survives. Other "hits" from the uncorrected pass were breed-size stratification, a known pitfall in canine GWAS.
Cox HR vs Labrador Retriever, controlling for exercise, weight, sex, supplements, vet frequency, owner education, and existing health conditions.
| Breed | HR vs Labrador | p |
|---|---|---|
| Pembroke Welsh Corgi | 0.59 | p=5e-4 |
| Jack Russell Terrier | 0.76 | p=0.12 |
| Chihuahua | 0.77 | p=0.14 |
| Dachshund | 0.78 | p=0.10 |
| Bernese Mountain Dog | 2.99 | p=4e-18 |
| Doberman Pinscher | 2.18 | p=7e-16 |
| Great Dane | 2.01 | p=3e-8 |
| Boxer | 1.84 | p=9e-10 |
| French Bulldog | 1.72 | p=4e-3 |
| Cavalier King Charles | 1.65 | p=3e-3 |
| German Shepherd Dog | 1.47 | p=4e-9 |
| Greyhound | 1.42 | p=8e-4 |
Bernese Mountain Dogs face about 3× the mortality hazard of Labs regardless of care (hemangiosarcoma-driven). Corgis have about 41% lower hazard (HR 0.59). For extreme breeds, breed genetics is a larger lever than all controllable factors combined.
NMF topic model on 3,183 end-of-life narratives surfaces clinically coherent death patterns.
Full morning briefing: ~/Desktop/dev/dog-longevity/reports/MORNING_BRIEFING_20260414.md
The earlier "diet archetype 3 protects +163 days (HR=0.84, p=2e-6)" finding was traced to an NMF scaling artifact: the archetypes loaded on binary weight-history questions (ever_overweight / ever_underweight), not actual foods. Re-analysis with standardized inputs is pending. Science is self-correcting in public.
Before submission, we ran a proper external validation: a stratified 20% holdout the models never saw during training. We also pooled every p-value in the study under one Benjamini-Hochberg correction and caught two bugs worth disclosing.
Stratified split on breed-size × event indicator. Random seed 20260414. Nothing about the 20% was used for training, hyperparameter tuning, or feature selection.
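A stratified holdout of this kind can be sketched with the standard library alone. The version below groups dogs by (size class, event), shuffles each group with a fixed seed, and moves 20% of every group into the holdout. The seed matches the one quoted above; the function name, field names, and toy records are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(records, holdout_frac=0.2, seed=20260414):
    """Split records so each (size_class, event) stratum keeps its share."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[(rec["size_class"], rec["event"])].append(rec)

    train, holdout = [], []
    for key in sorted(strata):  # fixed iteration order keeps the split reproducible
        group = strata[key]
        rng.shuffle(group)
        n_hold = round(len(group) * holdout_frac)
        holdout.extend(group[:n_hold])
        train.extend(group[n_hold:])
    return train, holdout


# Toy cohort: 2 size classes x 2 event states x 10 dogs each (hypothetical).
records = [
    {"dog_id": f"{size}-{event}-{i}", "size_class": size, "event": event}
    for size in ("large", "small")
    for event in (0, 1)
    for i in range(10)
]

train, holdout = stratified_split(records)
print(len(train), len(holdout))
```

Stratifying on the event indicator keeps the death rate similar in both partitions, which matters when the holdout is used to report a C-index.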
The earlier headline claim of PRS C-index = 0.848 was a survival model with PRS plus age, sex, and breed size as covariates, not PRS alone. On the held-out cohort, PRS by itself hits C-index 0.65, which is still strong for a polygenic score but not a standalone 0.85 predictor. The combined model (PRS + demographics) does cross 0.85; both numbers will appear in the paper with explicit descriptions of what's in each model.
Dominated by the somatotropic-axis GWAS (IGF1 at p=1.6e-104), the causal supplement ATE pipeline, the owner-quality composite, and the cognitive-trajectory survival models. Details: analysis/global_fdr_results/master_pvalues.csv.
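The pooled Benjamini-Hochberg correction mentioned above can be illustrated in a few lines. This is a generic sketch of the standard step-up procedure, not the project's pipeline; the function name and the five p-values are made up for the example.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values pooled from several analyses.
pooled = [0.001, 0.008, 0.039, 0.041, 0.6]
q_values = benjamini_hochberg(pooled)
significant = [p for p, q in zip(pooled, q_values) if q <= 0.05]

print(q_values)
print(significant)
```

Pooling every test into one correction is stricter than correcting each analysis separately, because the rank-based multiplier is computed over the full set of p-values.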
Dropping any single popular breed (Labrador, Golden, German Shepherd, Australian Shepherd, Poodle, Border Collie, the Unknown-breed pool, or three common mix lineages) barely moves the PRS C-index:
Effect is not driven by any single breed. The polygenic signal is cross-breed.
Re-examining the AIPW results with fresh eyes: individual supplement "effects" are dominated by confounding by indication, meaning sick dogs get more supplements. E-values above 1015 are mathematically meaningless; they flag a confounding signal orders of magnitude larger than any plausible treatment effect.
Only supp_any = +141 days holds up as a defensible positive causal effect. The per-supplement table will be reported with explicit confounding disclaimers, not causal claims.
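For context on the E-values discussed above: for a risk ratio RR ≥ 1, the E-value is RR + sqrt(RR × (RR − 1)), and protective ratios are inverted first. It gives the minimum strength of association an unmeasured confounder would need with both treatment and outcome to fully explain the estimate. The sketch below is a generic calculator with a hypothetical function name; it is not the study's code.

```python
import math


def e_value(risk_ratio):
    """E-value for a point estimate on the risk-ratio scale."""
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    rr = risk_ratio if risk_ratio >= 1 else 1.0 / risk_ratio  # invert protective effects
    return rr + math.sqrt(rr * (rr - 1.0))


# Illustrative inputs: a doubling of risk and the equivalent halving.
print(e_value(2.0))
print(e_value(0.5))
print(e_value(1.0))
```

Because the E-value grows roughly as twice the risk ratio, an astronomically large E-value implies an implausibly large estimated effect, which is the pattern flagged above as a sign of confounding rather than benefit.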
Built one XGBoost AFT model that ingests all modalities: phenome (195 HLES features) + PRS + environmental exposome + cognitive trajectory + activity trajectory. It was evaluated on the SAME 8,293 held-out dogs. Ablation shows where the lift comes from:
| Scenario | N features | Holdout C-index | Δ vs base |
|---|---|---|---|
| Phenome only (base) | 195 | 0.8427 | n/a |
| + PRS | 197 | 0.8462 | +0.0035 |
| + PRS + exposome | 203 | 0.8467 | +0.0040 |
| + PRS + exposome + cognitive | 206 | 0.8735 | +0.0308 |
| FUSION (all modalities) | 207 | 0.8751 | +0.0324 |
⚠ Caveat: cognitive and activity trajectory features require ≥2 follow-up surveys, so they carry survival bias: dogs that lived longer had more assessments. The 0.875 is an optimistic upper bound. The clean-inference fusion number is 0.8467 (phenome + PRS + exposome, no follow-up-dependent features).
A cross-breed PRS that only works between breeds is just capturing breed-size differences. A PRS that works within a single breed is capturing real biology. Our LD-clumped PRS passes the within-breed test:
| Breed | n | events | HR per SD (age-adj) | p |
|---|---|---|---|---|
| All dogs | 7,587 | 1,605 | 0.738 | 3×10⁻⁵⁰ |
| Labrador Retriever | 430 | 96 | 0.597 | 3×10⁻⁶ |
| German Shepherd Dog | 187 | 49 | 0.630 | 4×10⁻³ |
| Lab mixes | 101 | 22 | 0.606 | 0.047 |
| Poodle | 123 | 22 | 0.535 | 0.060 |
| Golden Retriever | 413 | 82 | 0.874 | 0.22 (null) |
Within-breed HRs range 0.535–0.874. The Golden Retriever null effect is interesting: hemangiosarcoma/lymphoma dominates Golden mortality and may overwhelm the somatotropic PRS signal. Worth its own analysis.
For each of 16 canonical human longevity genes, does the top SNP in our DAP GWAS show the same directional effect as published human evidence?
Every member of the canonical IGF1/GH/FOXO3/KL longevity axis agrees in direction. Disagreements concentrate on genes with known cancer-vs-aging pleiotropy. Conservation of the insulin/IGF longevity pathway from worms → mice → dogs → humans.
Hypergeometric test of our canine longevity signal genes against curated aging pathways (universe = 20,000 canine protein-coding genes).
| Pathway | Path size | Overlap | Fold | p |
|---|---|---|---|---|
| Insulin / IGF signaling | 38 | 7 | 160× | 1.2×10⁻¹⁴ |
| FOXO signaling | 26 | 4 | 134× | 2.0×10⁻⁸ |
| DNA damage response | 29 | 4 | 120× | 3.1×10⁻⁸ |
| Growth hormone axis | 19 | 3 | 137× | 1.3×10⁻⁶ |
| Cellular senescence | 27 | 3 | 97× | 3.8×10⁻⁶ |
| Telomere maintenance | 20 | 2 | 87× | 2.4×10⁻⁴ |
| mTOR signaling | 27 | 2 | 64× | 4.4×10⁻⁴ |
| TGF-β / SMAD signaling | 30 | 1 | 29× | 0.03 |
⚠ Caveat: signal-gene list combines the 5 GWS hits with the 16 cross-species nominal genes, which introduces modest circularity. Even restricting to the 5 GWS loci alone, IGF1+STC2 in the Insulin/IGF pathway is a significant enrichment (p≈10⁻⁴).
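The enrichment test in the table above can be sketched with the standard library. The universe (20,000 genes), pathway size (38), and overlap (7) come from the Insulin / IGF row. The signal-list size of 23 is an assumption, back-calculated so the fold enrichment lands near the reported 160×; it is not stated in the text, and the function name is hypothetical.

```python
from math import comb


def hypergeom_enrichment(universe, pathway_size, n_signal, overlap):
    """Fold enrichment and upper-tail hypergeometric p-value.

    p = P(X >= overlap) when n_signal genes are drawn without replacement
    from a universe containing pathway_size pathway genes.
    """
    expected = n_signal * pathway_size / universe
    fold = overlap / expected
    total = comb(universe, n_signal)
    upper = min(pathway_size, n_signal)
    tail = sum(
        comb(pathway_size, i) * comb(universe - pathway_size, n_signal - i)
        for i in range(overlap, upper + 1)
    )
    return fold, tail / total


# n_signal=23 is an assumed list size (see the note above).
fold, p_value = hypergeom_enrichment(
    universe=20_000, pathway_size=38, n_signal=23, overlap=7
)
print(f"fold enrichment: {fold:.0f}x, p = {p_value:.1e}")
```

With a universe this large and a pathway this small, even a handful of overlapping genes yields an extreme p-value, which is why the circularity caveat matters for interpreting the table.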
Among genotyped DAP dogs, Goldens are the only major breed where the somatotropic PRS is null. Cause-of-death stratification reveals why: 70.7% of Golden deaths are from cancer, the highest rate among top breeds, and the somatotropic axis does not modulate Golden cancer mortality.
The somatotropic/IGF axis modulates Lab mortality whether cancer or not. In Goldens the axis is silent either way, pointing to breed-specific cancer biology (likely TP53 / BRCA / hemangiosarcoma pathway). Candidate topic for a third paper.
First quantitative bounds on h² for canine lifespan from DAP data. Full GCTA-GREML would need the raw GRM; this triangulation is reviewer-defensible.
Narrow-sense h² likely 0.15–0.30, consistent with human-longevity heritability estimates (0.20–0.30). Suggests that genetic risk scores can meaningfully inform but not determine canine lifespan, as it should be.
Stratifying dead dogs by age at death reveals clean selection: dogs that died young have low PRS, dogs that survived to old age have high PRS. A straight-line dose-response across four age-at-death quartiles.
| Quartile | Median age at death | Mean PRS (z) | Mann-Whitney p |
|---|---|---|---|
| Early | 7.8 yr | −0.313 | 1.5×10⁻⁹ |
| Mid-early | 11.3 yr | −0.108 | 0.08 |
| Mid-late | 13.4 yr | +0.027 | 0.013 |
| Late | 15.5 yr | +0.201 | 1.0×10⁻⁷ |
Formal PRS × age interaction: HR = 0.987 per year × SD (p=0.032). PRS effect strengthens modestly with age. Age-stratified HRs are stable across enrollment ages (0.79–0.85). The monotonic age-at-death gradient is the cleanest single-figure demonstration that the PRS captures real longevity biology.
Splitting deaths by cause using the EOL survey: the somatotropic PRS captures more non-cancer than cancer longevity biology. The IGF/GH pathway appears to mediate aging-in-place more than cancer survival.
PRS ΔHR (cancer vs non-cancer) = 13%. Non-cancer longevity appears more polygenic under somatotropic control; cancer longevity is more stochastic or under different genetic architecture.
Deeper dive: Goldens have 55% less PRS variance than Labs (0.0017 vs 0.0038, Levene p=2×10⁻¹⁹). Breed-level selection/bottleneck compressed genetic diversity at longevity SNPs. Less variance → less power to detect signal. The Golden PRS null is partly a statistical-power issue, not purely different biology.
Within-Golden means span only ~0.01 PRS-unit; compare to cross-breed Lab non-cancer vs alive spread of 0.030 (p=0.003). The Golden breed has simply run out of signal. Reduced variance is the proximal cause; whether residual cancer biology is also attenuated is under-powered at n=58 cancer events.
Eight Nature-formatted figures rendered at 300 DPI (PNG + PDF + SVG each). Saved to analysis/54_figures/figures/.
Dog Aging Project 2024 Open Access Release. 50,188 companion dogs, 15,056 deaths, 933 survey variables covering diet, exercise, environment, health conditions, medications, and owner demographics. 7,688 dogs with low-coverage whole-genome sequencing (22.7M imputed SNPs).
XGBoost AFT: Gradient-boosted accelerated failure time; holdout C-index 0.8505 on 8,293 unseen dogs. DeepSurv: PyTorch neural survival (CV 0.844). Cox PH (top 30): classical baseline on XGB-selected features; holdout C-index 0.8479. Random Survival Forest: 100-tree ensemble (CV 0.825). Multiple imputation via MICE on 41,464 dogs. Stratified 80/20 split on breed-size × event for external validation.
AIPW (Augmented Inverse Probability Weighting): Doubly robust estimator with cross-fitted XGBoost propensity and outcome models. 13 confounders including income, education, vet visits, exercise, breed, and insurance. FDR correction for multiple testing.
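To make the "doubly robust" property concrete, the sketch below implements the AIPW average-treatment-effect formula for precomputed nuisance estimates. In the real pipeline those estimates would come from cross-fitted models; here the function name and all inputs are hypothetical toy values.

```python
def aipw_ate(y, t, e_hat, mu1_hat, mu0_hat):
    """AIPW estimate of the average treatment effect.

    y: observed outcomes; t: treatment indicators (0 or 1);
    e_hat: estimated propensity P(T=1 | X);
    mu1_hat, mu0_hat: predicted outcomes under treatment and control.
    """
    total = 0.0
    for yi, ti, ei, m1, m0 in zip(y, t, e_hat, mu1_hat, mu0_hat):
        total += (
            (m1 - m0)                            # outcome-model contrast
            + ti * (yi - m1) / ei                # residual correction, treated
            - (1 - ti) * (yi - m0) / (1 - ei)    # residual correction, control
        )
    return total / len(y)


# Toy data (hypothetical): lifespan in years, supplement use, propensities.
y = [10, 12, 7, 9]
t = [1, 1, 0, 0]
e_hat = [0.5, 0.5, 0.5, 0.5]

# Case 1: outcome model fits the observed outcomes exactly.
good_mu1, good_mu0 = [10, 12, 11, 11], [8, 8, 7, 9]
# Case 2: outcome model is useless (all zeros) but propensities are right.
bad_mu1, bad_mu0 = [0, 0, 0, 0], [0, 0, 0, 0]

print(aipw_ate(y, t, e_hat, good_mu1, good_mu0))
print(aipw_ate(y, t, e_hat, bad_mu1, bad_mu0))
```

In the second case the outcome-model term contributes nothing and the estimate reduces to plain inverse-probability weighting, which illustrates why the estimator remains usable when only one of the two nuisance models is well specified.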
GWAS: PLINK2 linear regression across 22.7M variants on 38 autosomes. Bonferroni threshold P<2.2e-9. PRS: 54,343 SNPs at P<0.001, clumping + thresholding. Validation: 70/30 train/test split with 0.005 overfit gap.
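The Bonferroni threshold quoted above is consistent with dividing a family-wise error rate by the number of variants tested. The sketch below assumes a family-wise alpha of 0.05, which is not stated in the text but reproduces the 2.2e-9 figure; the function name is hypothetical, and the two p-values are taken from the chromosome-6 results reported earlier.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# 22.7M imputed variants; alpha=0.05 is an assumption (see above).
threshold = bonferroni_threshold(22_700_000)
print(f"{threshold:.2e}")

# Compare two reported p-values against the threshold.
igf1_p = 1.6e-104      # IGF1 lead signal
cluster_p = 1.5e-6     # uncorrected chr6 25-SNP cluster
print(igf1_p < threshold, cluster_p < threshold)
```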