Exploring a Distilled Genealogy Model

Feasibility, Cost, and Next Steps

Why This Conversation?

  • LLMs are powerful, but expensive at scale
  • Genealogy workflows are structured and repetitive

  • Long term, we may need:

    ✔ lower latency

    ✔ predictable behavior

    ✔ private inference

    ✔ cost efficiency

What Is a Distilled Model?

Teacher → Student training approach

  • Training data: curated genealogical examples + reasoning
  • Teacher (LLM): GPT-5.1 / Claude Sonnet
  • Student (SLM): fine-tuned smaller model (e.g., 7B)
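
One way to picture the teacher → student flow is as a data-collection step: the teacher answers curated prompts, and the prompt/answer pairs become the student's fine-tuning set. The sketch below is illustrative only. `fake_teacher` is a hypothetical stand-in for a real GPT-5.1 or Claude Sonnet call, and the `prompt`/`completion` field names are assumptions, not a required format.

```python
import json
from typing import Callable, Iterable


def build_distillation_records(
    prompts: Iterable[str], teacher: Callable[[str], str]
) -> list[dict]:
    """Ask the teacher for an answer to each prompt and keep non-empty pairs."""
    records = []
    for prompt in prompts:
        answer = teacher(prompt)
        if answer.strip():  # skip prompts the teacher could not answer
            records.append({"prompt": prompt, "completion": answer})
    return records


def to_jsonl(records: list[dict]) -> str:
    """Serialize records one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


def fake_teacher(prompt: str) -> str:
    # Hypothetical placeholder: a real pipeline would call the hosted LLM here.
    return f"Reasoning and answer for: {prompt}"


records = build_distillation_records(
    ["Are these two census records the same person?"], fake_teacher
)
print(to_jsonl(records))
```

The resulting lines would then feed whatever fine-tuning tooling is chosen for the smaller model.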

Is it any good?

According to ChatGPT:

  • Less drift than LLMs

  • Faster

  • Cheaper

  • More predictable/accurate

Why 7B?

Size    Runs Where                 Notes
3B      Mobile/laptop              Fast, but may lack reasoning depth
7B      AWS GPU                    Best balance of accuracy + cost
13B     Larger GPU                 More capable, more cost
300B?   Dedicated farm of H200s    LLMs
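
A rough way to see why size determines where a model runs is to estimate the memory its weights need: parameter count times bytes per parameter. The sketch below covers weights only (no KV cache, activations, or batching overhead), and the precision choices are assumptions for illustration.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str = "fp16") -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    total_bytes = params_billions * 1e9 * BYTES_PER_PARAM[precision]
    return total_bytes / 1e9


for size in (3, 7, 13, 300):
    print(f"{size}B: ~{weight_memory_gb(size):.0f} GB at fp16, "
          f"~{weight_memory_gb(size, 'int4'):.1f} GB at int4")
```

At fp16 this gives about 14 GB for a 7B model versus about 600 GB for a 300B model, which is consistent with the single-GPU versus multi-GPU split in the table.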

Cost

Cost to Distill

Stage                          Cost         Notes
Dataset prep (internal work)   Time-based   Needs sampling and formatting
Training compute               $2K-$10K     Depends on dataset size + iterations
Evaluation/refinement          $0K-$5K      Optional second pass

AI Research Team - $30K on hardware (need to confirm)

ATB Team - $10K to train a model (need to confirm)

Inference Cost

Approach             Pricing Model   Cost per month at 150 rps
Hosted Sonnet/GPT    Per-token       $1M - $14M
Self-hosted (AWS)    Fixed compute   $5K - $10K

Tree-data costs around $1,100/month

150 rps = requests per second to the person detail page over the last month
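
To sanity-check the monthly figures, it helps to convert 150 rps into a monthly request count and back out the per-request cost each estimate implies. This is a back-of-the-envelope sketch that assumes the 150 rps rate is sustained around the clock for a 30-day month. It does not use any provider's actual pricing.

```python
SECONDS_PER_DAY = 86_400


def monthly_requests(rps: float, days: int = 30) -> float:
    """Total requests in a month at a constant request rate (assumption)."""
    return rps * SECONDS_PER_DAY * days


def implied_cost_per_request(monthly_cost_usd: float, rps: float, days: int = 30) -> float:
    """Per-request cost implied by a monthly total at the given rate."""
    return monthly_cost_usd / monthly_requests(rps, days)


total = monthly_requests(150)
print(f"Requests per month at 150 rps: {total:,.0f}")
for label, cost in [("Hosted, low", 1_000_000), ("Hosted, high", 14_000_000),
                    ("Self-hosted, low", 5_000), ("Self-hosted, high", 10_000)]:
    print(f"{label}: ${implied_cost_per_request(cost, 150):.6f} per request")
```

Under these assumptions, 150 rps is roughly 389 million requests per month, so the hosted range works out to about $0.003 to $0.036 per request.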

Proposed Next Steps

MVP

  1. Curate 20 experiences using an LLM
  2. Hypothesize what we think users might want
  3. Test those experiences with tens of users and see if we're on the right track

MVTP

Minimum Viable Technical Proof

  1. Gather high-quality genealogist contributions
  2. Have an LLM help determine why edits were made
  3. Build a small 2-3B-parameter prototype model to see if it works
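
The first two steps above could be sketched as follows: compare a record before and after a genealogist's edit, then build a prompt asking an LLM to explain the changes. This is a minimal illustration, not a proposed design. The record fields (`name`, `birth_year`, `birth_place`) and the prompt wording are hypothetical.

```python
def diff_fields(before: dict, after: dict) -> dict:
    """Return {field: (old, new)} for every field whose value changed."""
    changed = {}
    for key in sorted(set(before) | set(after)):
        if before.get(key) != after.get(key):
            changed[key] = (before.get(key), after.get(key))
    return changed


def rationale_prompt(before: dict, after: dict) -> str:
    """Build a prompt asking an LLM why each change was made."""
    lines = [
        "A genealogist edited a person record.",
        "Explain the most likely reason for each change:",
    ]
    for field, (old, new) in diff_fields(before, after).items():
        lines.append(f"- {field}: {old!r} -> {new!r}")
    return "\n".join(lines)


# Hypothetical example edit.
before = {"name": "John Smith", "birth_year": 1850, "birth_place": None}
after = {"name": "John Smith", "birth_year": 1852, "birth_place": "Ohio"}
print(rationale_prompt(before, after))
```

The LLM's explanation, paired with the edit itself, would be one candidate training example for the small prototype model.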

Unsolved Problems

  • Third-party sources (Ancestry, web)
    • We can't scrape them
  • Image sources (memories)
    • Maybe we can distill a model for this?
    • Maybe we can leverage Indexing's model?
  • Unknown unknowns

By Tyler Graf