Exploring a Distilled Genealogy Model
Feasibility, Cost, and Next Steps
Why This Conversation?
- LLMs are powerful, but expensive at scale
- Genealogy workflows are structured and repetitive
- Long term, we may need:
✔ lower latency
✔ predictable behavior
✔ private inference
✔ cost efficiency
What Is a Distilled Model?
- Teacher → Student training approach
- Training data: curated genealogical examples + reasoning
- Teacher (LLM): GPT-5.1 / Claude Sonnet
- Student (SLM): fine-tuned smaller model (e.g., 7B)
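One way to picture the teacher → student flow is as a dataset-building step: each curated example goes to the teacher, and the teacher's reasoning and answer become a training record for the student. The sketch below is illustrative only. `ask_teacher`, `stub_teacher`, the record fields, and the sample prompt are hypothetical, and the teacher call is stubbed rather than wired to a real hosted API.

```python
import json
from typing import Callable


def build_distillation_records(
    examples: list[dict],
    ask_teacher: Callable[[str], dict],
) -> list[str]:
    """Turn curated examples into JSONL lines for fine-tuning a student model."""
    lines = []
    for example in examples:
        # The teacher returns both its reasoning and its final answer.
        teacher_output = ask_teacher(example["prompt"])
        record = {
            "prompt": example["prompt"],
            "reasoning": teacher_output["reasoning"],
            "answer": teacher_output["answer"],
        }
        lines.append(json.dumps(record, sort_keys=True))
    return lines


def stub_teacher(prompt: str) -> dict:
    """Placeholder for a hosted teacher LLM call (hypothetical)."""
    return {
        "reasoning": "stubbed reasoning for: " + prompt,
        "answer": "stubbed answer",
    }


if __name__ == "__main__":
    curated = [{"prompt": "Are these two census records the same person?"}]
    for line in build_distillation_records(curated, stub_teacher):
        print(line)
```

Keeping the teacher behind a plain callable means the same builder could be pointed at either candidate teacher without changing the record format.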
Is it any good?
According to ChatGPT:
- Less drift than LLMs
- Faster
- Cheaper
- More predictable/accurate
Why 7B?
| Size | Runs Where | Notes |
|---|---|---|
| 3B | mobile/laptop | Fast, but may lack reasoning depth |
| 7B | AWS GPU | Best balance of accuracy + cost |
| 13B | Larger GPU | More capable, more cost |
| 300B? | Dedicated farm of H200s | Hosted LLM scale (size is a guess) |
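A rough way to sanity-check the "Runs Where" column is to estimate the memory needed just to hold the weights: parameter count times bytes per parameter. The sketch below assumes common precisions (2 bytes for fp16, 1 for int8, 0.5 for int4) and ignores KV cache, activations, and runtime overhead, so real requirements will be higher. The function and table names are illustrative, and 300B is the deck's own guess.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str = "fp16") -> float:
    """Approximate memory for model weights only, in decimal GB."""
    # (params_billions * 1e9 params) * bytes / 1e9 bytes-per-GB simplifies to this.
    return params_billions * BYTES_PER_PARAM[precision]


if __name__ == "__main__":
    for size in (3, 7, 13, 300):
        fp16 = weight_memory_gb(size, "fp16")
        int4 = weight_memory_gb(size, "int4")
        print(f"{size}B: ~{fp16:g} GB at fp16, ~{int4:g} GB at int4")
```

By this estimate a 7B model needs about 14 GB for fp16 weights, which fits on a single mid-range GPU, while a 300B model needs about 600 GB and has to be split across many accelerators.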
Cost
Cost to Distill
| Stage | Cost | Notes |
|---|---|---|
| Dataset prep (internal work) | Time-based | Needs sampling and formatting |
| Training Compute | $2K-$10K | Depends on dataset size + iterations |
| Evaluation/Refinement | $0K-$5K | Optional second pass |
AI Research Team - $30K on hardware (need to confirm)
ATB Team - $10K to train a model (need to confirm)
Inference Cost
| Approach | Pricing Model | Monthly cost at 150 rps |
|---|---|---|
| Hosted Sonnet/GPT | per-token | $1M - $14M |
| Self Hosted (AWS) | Fixed compute | $5K - $10K |
Tree-data costs around $1,100/month
150 rps = requests per second on the person detail page over the last month
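To make the monthly figures easier to compare, they can be converted to a per-request cost. The sketch below assumes traffic holds at a constant 150 rps for a 30-day month, which the source does not confirm (the figure may be a peak or an average), so treat the output as an order-of-magnitude check. Function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def requests_per_month(rps: float, days: int = 30) -> int:
    """Total requests in a month at a constant request rate."""
    return round(rps * SECONDS_PER_DAY * days)


def cost_per_request(monthly_cost_usd: float, monthly_requests: int) -> float:
    """Average cost of one request, in USD."""
    return monthly_cost_usd / monthly_requests


if __name__ == "__main__":
    monthly = requests_per_month(150)
    print(f"{monthly:,} requests/month")
    # Monthly cost ranges taken from the inference cost table.
    ranges = [
        ("Hosted Sonnet/GPT", 1_000_000, 14_000_000),
        ("Self Hosted (AWS)", 5_000, 10_000),
    ]
    for label, low, high in ranges:
        low_each = cost_per_request(low, monthly)
        high_each = cost_per_request(high, monthly)
        print(f"{label}: ${low_each:.6f} to ${high_each:.6f} per request")
```

Under these assumptions, 150 rps is about 389 million requests per month, which puts hosted inference at roughly $0.0026 to $0.036 per request and self-hosted at roughly $0.000013 to $0.000026.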
Proposed Next Steps
MVP
- Curate 20 experiences using an LLM
- Hypothesize what we think users might want
- Test that experience with tens of users and see if we're on the right track
MVTP (Minimum Viable Technical Proof)
- Gather high-quality genealogist contributions
- Have an LLM help determine why edits were made
- Build a small 2-3B parameter prototype model to see if the approach works
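The "why was this edit made" step could start as a simple prompt template applied to each genealogist contribution. This is a minimal sketch, assuming an edit can be represented as a field name, before and after values, and attached sources. The `Edit` class, its fields, the prompt wording, and the sample values are all hypothetical and do not reflect a real tree-data schema.

```python
from dataclasses import dataclass, field


@dataclass
class Edit:
    """One genealogist contribution (hypothetical shape)."""
    field_name: str
    before: str
    after: str
    sources: list[str] = field(default_factory=list)


def build_rationale_prompt(edit: Edit) -> str:
    """Build a prompt asking an LLM to explain why an edit was made."""
    source_text = "; ".join(edit.sources) if edit.sources else "none attached"
    return (
        "A genealogist changed a person record.\n"
        f"Field: {edit.field_name}\n"
        f"Before: {edit.before}\n"
        f"After: {edit.after}\n"
        f"Sources: {source_text}\n"
        "Explain the most likely reason for this edit in one or two sentences."
    )


if __name__ == "__main__":
    # Made-up example values for illustration only.
    sample = Edit("birth_date", "1850", "12 Mar 1851", ["1860 census"])
    print(build_rationale_prompt(sample))
```

The LLM's explanation for each edit could then be stored alongside the edit as the reasoning portion of a training example for the 2-3B prototype.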
Unsolved Problems
- 3rd-party sources (Ancestry/www)
  - We can't scrape
- Image sources (memories)
  - Maybe we can distill a model for this?
  - Maybe we can leverage Indexing's model?
- Unknown unknowns
By Tyler Graf