Sniff is an open canine knowledge graph.
The centerpiece is the Sniff Atlas: 18,477 research dogs drawn from CanVAS, the Golden Retriever Lifetime Study, NHGRI 722, and Darwin's Ark. The catalog of 9.67 million common canine variants, with calibrated AI pathogenicity scores (ESM2 AUC 0.935 vs OMIA, n=115), is open on Zenodo at DOI 10.5281/zenodo.20566358 under CC-BY 4.0, citable in any paper or tool that can resolve a DOI.
Every breed page, gene page, disease page, and variant page on this site is computed from that catalog. The food scoring engine at sniff.world/methodology runs on the same contract: every claim cited, every limitation surfaced. The MCP server that makes this resource agent-callable from Claude, ChatGPT, Cursor, or any conformant client is deployed at mcp.sniff.world (live as DNS finalizes).
What we cannot yet do is named on every page where it applies. The MAF ≥ 1% imputation floor. The breeds with sparse sampling. The diseases without per-dog penetrance data. The pathogenicity scores that are computational, not clinical. The point of a research resource is that someone can fact-check it. So we made the facts checkable.
Adding your dog is free. Reading the science is free. It will always be free.
Canine genetics has a distribution problem. The science is excellent. Researchers spend careers on it, publish under open licenses, and deposit their data on Zenodo. Brundage 2026 (CanVAS), Plassais 2019 (NHGRI 722), Donner 2023 (carrier frequencies), the OMIA disease catalog, the Dog10K consortium, the Golden Retriever Lifetime Study, Darwin's Ark: all free to anyone who can write PLINK queries against raw genotype files on CanFam4.
Almost no dog owner can. Almost no clinical veterinarian can. Until now, almost no AI agent could. The most informative datasets in the field are locked away by accident, behind that technical barrier; the most accessible are paywalled by design (the largest consumer dog-DNA testing companies have genotyped millions of dogs whose data sits on corporate servers that will never open).
That is the bridge we built. The science is open. The catalog is open. The reference pages on this site are computed from open sources, hyperlinked back to the original studies, and free. When an AI agent grounds an answer on Sniff, it inherits a citation chain it could not invent. When a vet queries the MCP, it gets back numbers with their provenance attached.
The atlas. 18,477 dogs in a 3D genetic universe, computed from a 256-dimensional PCA of a stride-4 19,304-SNP backbone and projected through UMAP. Cluster them, navigate them, click any star to see who that dog is. The visualization lives at /atlas/.
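Here is a minimal sketch of the first two steps of that pipeline in plain Python: thinning the SNP panel and computing principal-component scores. It makes two assumptions. First, "stride-4" is taken to mean keeping every fourth SNP of an ordered panel. Second, genotypes are coded as 0/1/2 alternate-allele dosages. The function names are hypothetical and this is not Sniff's production code. A run over 18,477 dogs would use an optimized linear-algebra library. The 3D layout would then come from running UMAP on the 256 PC scores (for example, with the umap-learn package), a step this sketch leaves out.

```python
import math

def thin_backbone(snp_ids, stride=4):
    """Keep every `stride`-th SNP from an ordered panel (assumed meaning of "stride-4")."""
    return snp_ids[::stride]

def top_pc_scores(X, k, iters=500):
    """Top-k principal-component scores for each dog (rows of X = 0/1/2 dosages)."""
    n, m = len(X), len(X[0])
    # Center each SNP column so PCA captures variation between dogs, not allele frequency.
    means = [sum(row[j] for row in X) / n for j in range(m)]
    Xc = [[row[j] - means[j] for j in range(m)] for row in X]
    # Gram matrix G = Xc Xc^T (dog x dog); its eigenvectors, scaled by
    # sqrt(eigenvalue), are the PC scores.
    G = [[sum(a * b for a, b in zip(Xc[i], Xc[j])) for j in range(n)] for i in range(n)]
    scores = [[0.0] * k for _ in range(n)]
    for c in range(k):
        # Deterministic start; all-ones would lie in G's null space after centering.
        v = [float(i + 1) for i in range(n)]
        for _ in range(iters):
            w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(G[i][j] * v[j] for j in range(n)) for i in range(n))
        for i in range(n):
            scores[i][c] = math.sqrt(max(lam, 0.0)) * v[i]
        # Deflate so the next pass finds the next-largest component.
        G = [[G[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return scores
```

On a toy matrix where two dogs share one haplotype pattern and two share the opposite, PC1 places the pairs on opposite sides of zero. Clusters in the atlas start from the same kind of separation.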
The variant catalog. 9,667,790 common canine variants on CanFam4, breed-stratified across 188 breeds, with calibrated ESM2 pathogenicity (AUC 0.935 vs OMIA, n=115), Pangolin splice scores, 241-mammal phyloP conservation, and SnpEff consequence annotations. Validated against the NHGRI 722 directly sequenced cohort at allele-frequency correlation r = 0.760 (MAF-matched control r = 0.742; common-variant background r = 0.953). Open on Zenodo, DOI 10.5281/zenodo.20566358, CC-BY 4.0.
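The validation figures above are Pearson correlations between allele frequencies in two cohorts. Here is a minimal sketch of that comparison. It assumes frequencies arrive as dictionaries keyed by a variant ID and uses an optional MAF filter. The function names and the dictionary layout are illustrative assumptions. This is not the exact matching procedure behind the published control and background values.

```python
import math

def minor_allele_frequency(af):
    """Fold an alternate-allele frequency onto [0, 0.5]."""
    return min(af, 1.0 - af)

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length frequency vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length vectors with at least two values")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)

def af_concordance(imputed, sequenced, min_maf=0.0):
    """Correlate allele frequencies for variants present in both cohorts.
    imputed, sequenced: dict variant_id -> alternate-allele frequency.
    min_maf: optional filter on the imputed MAF, e.g. to restrict to common variants."""
    shared = sorted(v for v in set(imputed) & set(sequenced)
                    if minor_allele_frequency(imputed[v]) >= min_maf)
    return pearson_r([imputed[v] for v in shared], [sequenced[v] for v in shared])
```

Only variants present in both cohorts enter the correlation, so the variant set being compared matters as much as the r value itself. That is why a MAF-matched control and a common-variant background are reported next to the headline figure.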
The knowledge graph. 647 nodes and 66,015 edges in Biolink-compatible KGX parquet, with provenance, an evidence grade in ClinGen terms (Definitive / Strong / Moderate / Limited / Predicted), and a confounding-risk flag (LOW / MED / HIGH) on every node and edge. v1.0.1 contains the Donner 2023 breed-by-Mendelian-variant carrier-frequency layer, OMIA-free pending curator confirmation; the clinical disease layer arrives in v1.1.
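KGX edges are keyed on subject, predicate, and object. Here is a minimal sketch of an edge record that refuses to exist without the grade, flag, and provenance described above. The extra column names (evidence_grade, confounding_risk, provenance) and the EXAMPLE: identifiers are hypothetical, not the released schema. biolink:related_to stands in for whatever specific predicate a real edge would carry.

```python
from dataclasses import dataclass, asdict

EVIDENCE_GRADES = ("Definitive", "Strong", "Moderate", "Limited", "Predicted")
CONFOUNDING_RISKS = ("LOW", "MED", "HIGH")

@dataclass(frozen=True)
class GraphEdge:
    subject: str
    predicate: str
    object: str
    evidence_grade: str    # hypothetical column name
    confounding_risk: str  # hypothetical column name
    provenance: str        # hypothetical column name

    def __post_init__(self):
        # Reject edges that would violate the "grade, flag, and source on every edge" rule.
        if self.evidence_grade not in EVIDENCE_GRADES:
            raise ValueError(f"unknown evidence grade: {self.evidence_grade!r}")
        if self.confounding_risk not in CONFOUNDING_RISKS:
            raise ValueError(f"unknown confounding risk: {self.confounding_risk!r}")
        if not self.provenance:
            raise ValueError("every edge must carry provenance")

edge = GraphEdge("EXAMPLE:breed_1", "biolink:related_to", "EXAMPLE:variant_1",
                 "Limited", "MED", "Donner 2023")
row = asdict(edge)  # a flat dict, ready to append to a table before writing parquet
```

Validating at construction time means an ungraded or unsourced edge fails before it reaches the parquet file. The alternative is auditing after the fact.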
The reference pages. Every breed in the atlas, every gene in the trait-loci tier, every Mendelian disease in the Donner cohort, every variant we have hand-curated coverage for. Every numeric claim links back to its source study or dataset. Heterozygosity rank across 215 breeds. Intra-breed PCA distance. Nearest-neighbor relationships. Trait-locus allele frequencies. Sub-population structure inside single breeds. These numbers were technically computable from open data; nobody had put the answers where a dog owner, a vet, or a breed-club committee could look.
The food scoring engine. Nine-component rubric, twenty-five flagged ingredients with published reasoning, no brand pays. The methodology page at sniff.world/methodology is the full audit.
The MCP server. Live at mcp.sniff.world over Streamable HTTP, registered in the official MCP Registry as sniff-mcp. Thirteen RPCs including the joined query ask_variant_context, which takes a CanFam4 position and returns the variant's frequency across all breeds, calibrated pathogenicity scores, gene context, disease links, and cross-breed comparison with provenance on every field. Sub-millisecond. The v1.1 discovery layer (semantic search, nearest_breeds, breed_similarity) extends the engine from lookup to discovery. No agent-callable resource for canine genomics existed before this; Sniff is the first.
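MCP tools are invoked with a JSON-RPC 2.0 request whose method is tools/call. Here is a minimal sketch of building that request for ask_variant_context. Several details are assumptions: the argument names chrom and pos, the placeholder position, and the endpoint path. The real input schema should be read from the server's tools/list response. Per the MCP specification, a client must also complete the initialize handshake before calling tools. That step, and the network call itself, are left out.

```python
import json

MCP_ENDPOINT = "https://mcp.sniff.world"  # host from this page; the exact path is an assumption

def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 'tools/call' request as defined by the MCP spec."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# Hypothetical argument names and a placeholder position, not a real variant.
request = build_tool_call(1, "ask_variant_context", {"chrom": "chr1", "pos": 123456})
body = json.dumps(request)
headers = {
    "Content-Type": "application/json",
    # Streamable HTTP clients advertise both plain JSON and SSE responses.
    "Accept": "application/json, text/event-stream",
}
```

The body and headers would be POSTed to the server's MCP endpoint. The joined query arrives as one response rather than five separate lookups.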
Sniff does not generate the foundational science. We synthesize it. The difference matters.
CanVAS (Brundage 2026, bioRxiv) is the genotype substrate. 14,478 dogs across 342 breeds, harmonized from 15 contributing studies, imputed against Dog10K to 9.67 million variants on CanFam4 (UU_Cfam_GSD_1.0). License: CC-BY 4.0. Every dog in the Sniff Atlas is sourced from CanVAS.
NHGRI 722 (Plassais 2019, Nature Communications 10:1489) is the validation cohort: 722 directly sequenced canine genomes. Because it predates Dog10K, it serves as the non-circular replication target for the Sniff Atlas allele frequencies. CC-BY 4.0.
Donner 2023 (PLOS Genetics 19:e1010651) contributes the breed-by-Mendelian-variant carrier-frequency catalog: 250 variants screened in 1,054,293 dogs. Because this is a direct-to-consumer cohort, we grade these data Evidence Limited / Confounding MEDIUM on every page where it applies. CC-BY 4.0.
OMIA (Online Mendelian Inheritance in Animals; Nicholas, Tammen, and the Sydney Informatics Hub at the Sydney School of Veterinary Science, The University of Sydney; doi:10.25910/2AMR-PV70) is the canonical reference for canine genetic disease. The clinical disease layer ships in Sniff v1.1 pending OMIA curator confirmation.
OFA (Orthopedic Foundation for Animals) supplies the hip, elbow, cardiac, and patellar screening data on the breed pages where we have it. OFA data is self-selected: breeders choose which dogs to submit, and dogs that fail screening are less likely to be submitted, so true population prevalence is higher than OFA numbers suggest. We say this wherever we cite them.
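The direction of the OFA bias follows from a short calculation. It assumes submission depends only on whether a dog's result is affected, with illustrative submission rates that are not OFA estimates. Under those assumptions, observed prevalence falls below true prevalence whenever affected dogs are submitted less often than unaffected ones.

```latex
% p   = true prevalence of a failing result in the breed
% s_a = probability an affected dog's result is submitted
% s_u = probability an unaffected dog's result is submitted
p_{\text{obs}} = \frac{p\,s_a}{p\,s_a + (1-p)\,s_u},
\qquad
p_{\text{obs}} < p \iff s_a < s_u \quad (0 < p < 1).
% Illustrative numbers only: p = 0.20, s_a = 0.5, s_u = 1.0 gives
% p_obs = 0.10 / (0.10 + 0.80) \approx 0.11.
```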
Golden Retriever Lifetime Study (Morris Animal Foundation; Guy and Page 2022, doi:10.1371/journal.pone.0269425) contributes the densest within-breed slice. 3,197 Golden Retrievers tracked from puppyhood. In Sniff terms, every GRLS dog is a Hero.
Darwin's Ark (Karlsson lab, UMass Chan Medical School and the Broad Institute; Lord 2025 PNAS) is the open mixed-breed cohort. 3,277 dogs in the atlas. CC0. The Morrill et al. 2022 finding that breed explains roughly 9% of behavioral variance reshaped how we talk about breed and behavior on every page.
Dog10K Consortium (Meadows et al. 2023, Genome Biology 24:187) is the imputation reference panel CanVAS imputes against. 1,929 fully-sequenced canids. We did not build it. We use it. We cite it.
- Return any of 9.67 million CanFam4 common-variant frequencies across 188 breeds, with calibration and provenance, in under a millisecond.
- Score the protein-damaging potential of any coding variant with a calibrated ESM2 LLR (AUC 0.935 vs OMIA pathogenic, n=115; ACMG subset AUC 0.942).
- Answer the joined query (a variant's frequency, pathogenicity, gene context, disease links, and cross-breed comparison) in a single MCP call.
- Give a vet, a breed-club health committee, or a journal reviewer a defensible second opinion grounded in published cohort data, with the citation chain attached.
- Ground an AI agent's answer on canine genetics in a citable, dated, CC-BY-licensed source. Every MCP response carries the Zenodo DOI in its provenance block.
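An ESM-style log-likelihood ratio compares how plausible a protein language model finds the alternate versus the reference amino acid at a position. An AUC summarizes how well those scores rank pathogenic variants above benign ones. Here is a minimal sketch of both, assuming per-residue probabilities are already in hand. It shows one common scoring convention, not Sniff's calibration step, which is not reproduced here.

```python
import math

def esm_style_llr(p_alt, p_ref):
    """Log-likelihood ratio log P(alt) - log P(ref) at one masked position.
    More negative values mean the model finds the substitution less plausible."""
    return math.log(p_alt) - math.log(p_ref)

def roc_auc(scores, labels):
    """Rank-based AUC: the probability that a random positive outscores a
    random negative, with ties counted as half a win."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Negate LLRs so that "higher" means "more damaging" before ranking.
llrs = [-8.1, -6.5, -0.4, 0.2]
labels = [1, 1, 0, 0]  # 1 = known pathogenic (toy labels)
auc = roc_auc([-x for x in llrs], labels)
```

With the toy values above, both pathogenic variants outrank both benign ones, so the AUC is 1.0. An AUC like 0.935 means that ranking holds for most, but not all, pathogenic/benign pairs.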
- Predict whether your specific dog will develop a specific disease. The pathogenicity funnel is an ESM-damaging-variant detector, not a disease detector. The positive control missed SOD1-DM and PRCD-PRA, two of the most famous breed-common disease variants in dogs. We say UNPROVEN on every pathogenicity output.
- Catalog rare Mendelian variants. The MAF ≥ 1% imputation floor cuts 89 of 94 known canine pathogenic variants that lift cleanly to CanFam4. Sniff is a common-variant resource by design. OMIA remains the canonical clinical reference for rare-Mendelian disease.
- Replace your veterinarian. We inform. Vets diagnose. Every health-related page on the site is built to support that division of labor, not blur it.
- Speak to breeds with very small atlas samples (n < 50) with the same precision as the well-sampled breeds. We surface sample size on every breed page and weight our claims by it.
- Tell you what to feed your dog when the data is sparse for that breed × stage × condition. Where we have it, we publish it. Where we do not, we say so.
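Two of the limits above come down to simple arithmetic. Here is a minimal sketch covering both. The first helper applies the MAF ≥ 1% floor, assuming it is applied to a single pooled frequency. The second gives the binomial standard error of an allele frequency from n dogs (2n alleles). It shows why a 50-dog breed sample carries roughly three times the uncertainty of a 500-dog one. The helper names are hypothetical. The standard error also assumes unrelated dogs, and relatedness within a breed sample makes real uncertainty larger.

```python
import math

MAF_FLOOR = 0.01  # the MAF >= 1% floor stated on this page

def passes_maf_floor(alt_af, floor=MAF_FLOOR):
    """True if the minor-allele frequency clears the floor (pooled AF assumed)."""
    return min(alt_af, 1.0 - alt_af) >= floor

def allele_freq_se(p, n_dogs):
    """Binomial standard error of an allele frequency from n diploid dogs (2n alleles),
    assuming the dogs are unrelated."""
    return math.sqrt(p * (1.0 - p) / (2 * n_dogs))
```

For an allele at 20% frequency, allele_freq_se(0.2, 50) is 0.04, while allele_freq_se(0.2, 500) is about 0.013. A rare pathogenic allele at 0.5% fails passes_maf_floor regardless of how well it is sampled.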
Sequenced dogs. If you have your dog's raw genetic data from a test you have already taken, we project your dog into the atlas from their actual genotype data. The projection runs against the same 19,304-SNP backbone CanVAS uses, so the position is comparable to every research dog's. The confidence of the placement is shown explicitly. We would rather give you a truthful star than a flattering guess.
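Projecting a new dog onto an existing PCA means centering its genotypes with the reference means and taking dot products with the reference loadings. Here is a minimal sketch of that step. Backbone SNPs the consumer test did not cover are mean-imputed. The fraction of backbone SNPs actually observed is returned as a crude confidence proxy. Both the missing-data handling and that proxy are assumptions, not Sniff's confidence model. The names are hypothetical, and the later step of placing the point into the 3D UMAP layout is omitted.

```python
def project_dog(genotypes, snp_ids, means, loadings):
    """Place a new dog on reference principal components.
    genotypes: dict snp_id -> alt-allele dosage (0, 1, 2); missing SNPs allowed.
    means: reference mean dosage per SNP, aligned to snp_ids.
    loadings: list of PC loading vectors, each aligned to snp_ids.
    Returns (coordinates, fraction of backbone SNPs observed)."""
    observed = 0
    centered = []
    for sid, mu in zip(snp_ids, means):
        g = genotypes.get(sid)
        if g is None:
            centered.append(0.0)  # mean-impute: a missing SNP contributes nothing
        else:
            centered.append(g - mu)
            observed += 1
    coords = [sum(w * x for w, x in zip(pc, centered)) for pc in loadings]
    return coords, observed / len(snp_ids)
```

Reusing the reference means and loadings, rather than refitting PCA with the new dog included, is what keeps a user's star comparable to every research dog.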
Charted dogs. If you do not have raw genetic data, you tell us what you know about your dog: breed if you know it, size, coat, the shape of the ears, the way they move. We project from what their person reports and label that star Charted, not Sequenced. The star is real. The placement is genuine. A Charted dog does not change the underlying genetic model. That is the honest deal, stated on every page where it matters.
A dog whose person could not pay for a sequencing test is not a lesser dog. The observation is real. That is enough to earn a star.
- Charge for the science. The breed pages, gene pages, disease pages, variant pages, methodology, atlas, and MCP. Free, forever.
- Sell individual dog data. Patterns aggregated across the atlas are public. Individual data is yours. Not to brands, not to insurers, not to anyone.
- Take money from a brand to influence a food score. The food engine is independent. A brand that wants a better number makes a better food.
- Pretend to know more than we do. UNPROVEN on every pathogenicity output. Sample-size disclosure on every breed page. Evidence grade on every Mendelian variant.
- Treat an estimate as a measurement. That single rule is what keeps the rest honest.
Sniff is an open canine knowledge graph. The dataset is on Zenodo. The reference pages are computed from it. The MCP server makes it agent-callable. The food scoring engine sits next to it on the same independence contract. Everything free. Everything cited. Everything yours to query, yours to cite, yours to fact-check.
When Sniff becomes a profitable company, ten percent of every year's profit goes to shelters, rescues, and working-dog programs. That is the Pledge.
For questions, corrections, research inquiries, or to cite something properly: [email protected]. Citation formats at sniff.world/cite.
Sniff is built by Matt Gehring (ORCID 0009-0001-9531-2861), based in Kansas City, operating as Candor Systems LLC. No equity in any pet food, pet-DNA-testing, or veterinary-services company. No brand sponsorships. No advertising. Owner-funded with compute credits from NVIDIA Inception and the major cloud-startup programs (AWS, Google Cloud, Microsoft Azure, Lambda).
The work began with a dog whose raw genome got locked behind a corporate DNA test. The origin story lives on the Pledge page and is not the load-bearing claim of this resource. The work is.
Built on: CanVAS (Brundage 2026). Golden Retriever Lifetime Study (Morris Animal Foundation). Darwin's Ark (Karlsson Lab, Broad Institute). Dog10K Consortium (Meadows et al. 2023). Donner et al. 2023. OMIA (Nicholas, Tammen, and the Sydney Informatics Hub at the Sydney School of Veterinary Science, The University of Sydney).
Published as: the Sniff Atlas v1.0.1 (Gehring 2026, doi:10.5281/zenodo.20566358, CC-BY 4.0). Full methodology at sniff.world/methodology.