The mystery of the remnant symbol
Agriya Khetarpal
software, society, and our digital heritage
Slides to follow along

- Computer science + applied math background
- Twenty-three years old and graduated in late 2024
- Software engineer at Quansight
- Privileged to contribute to the Scientific Python ecosystem and help maintain the Pyodide ecosystem
- Other work: JupyterLite, autograd, and the PyBaMM ecosystem (Python Battery Mathematical Modelling)
- Interested in
- Python packaging 📦🐍
- Scientific computing ➗🧪
- Compilers and toolchains 🛠️⛓️
- Documentation and technical writing 📝🌉
- ...and more 👾
About me 😁
Python packaging 📦
Python wheels 🛞
A wheel is a ZIP file that contains:
- Compiled extension modules and shared libraries (.so, .dylib, .pyd), for projects with compiled code
- Python source code (.py files)
- Package metadata (dependencies, version info)
- Entry points (console scripts)
What's in a Python wheel?
package-1.0-py3-none-any.whl
├── package/
│   ├── __init__.py
│   └── lib_package.so   ← binary file!
└── package-1.0.dist-info/
    ├── METADATA
    └── WHEEL
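Because a wheel is a ZIP file, its contents can be listed with Python's standard `zipfile` module. The following is a minimal sketch, not how pydistcheck works internally: the helper names `list_compiled_files` and `make_demo_wheel` are made up for illustration, and the demo archive only mimics the layout shown above with placeholder bytes instead of a real shared library.

```python
import io
import zipfile

# File suffixes that usually indicate compiled code inside a wheel.
BINARY_SUFFIXES = (".so", ".dylib", ".pyd")


def list_compiled_files(wheel_file):
    """Return the archive members whose names end in a binary suffix."""
    with zipfile.ZipFile(wheel_file) as archive:
        return [
            name for name in archive.namelist()
            if name.endswith(BINARY_SUFFIXES)
        ]


def make_demo_wheel():
    """Build an in-memory ZIP that mimics the layout of the example wheel."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("package/__init__.py", "")
        # Placeholder bytes only; this is not a loadable library.
        archive.writestr("package/lib_package.so", b"placeholder")
        archive.writestr(
            "package-1.0.dist-info/METADATA", "Name: package\nVersion: 1.0\n"
        )
        archive.writestr("package-1.0.dist-info/WHEEL", "Wheel-Version: 1.0\n")
    buffer.seek(0)
    return buffer


print(list_compiled_files(make_demo_wheel()))
```

For a real wheel, pass its path instead of the in-memory buffer, for example `list_compiled_files("dist/package-1.0-py3-none-any.whl")`.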
What's in a Python wheel?
pydistcheck
A tool and a linter for examining built files in Python distributions (source distributions and wheels)
Many thanks to James Lamb (@jameslamb on GitHub) for authoring it!
A build pipeline for a compiled wheel
Debug symbols are added during compilation and remain in the binary unless you strip them.
The strip utility
Removes debug information from binaries
What it removes
- Function names and variable names
- Source file references
- Line number information
- Debugging metadata
Why strip binaries?
- Smaller file sizes
- Faster loading
- Hides implementation details
[compiled-objects-have-debug-symbols]
$ pydistcheck --inspect ./dist/*.whl
[compiled-objects-have-debug-symbols] Found compiled object
containing debug symbols. For details, extract the distribution
contents and run 'nm -a "lightgbm/lib/lib_lightgbm.dylib"'.
errors found while checking: 1
🤔 ...but the binary was stripped!

Let's check nm's output
$ nm -a "lightgbm/lib/lib_lightgbm.dylib"
0000000005614542 - 00 0000 OPT radr://5614542
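To tell a lone leftover entry like this apart from real debug information, the `nm -a` output can be scanned in a few lines of Python. This is a sketch under assumptions: the field layout (value, a literal `-`, section, description, type, name) is inferred only from the single line shown above, and `parse_debug_entry` and `only_placeholder_remains` are hypothetical helpers, not part of pydistcheck or nm.

```python
PLACEHOLDER = "radr://5614542"


def parse_debug_entry(line):
    """Parse one nm line; return a dict for debug-style entries, else None.

    Assumed layout: value, "-", section, description, type, name.
    """
    parts = line.split()
    if len(parts) >= 6 and parts[1] == "-":
        return {
            "value": parts[0],
            "type": parts[4],
            "name": " ".join(parts[5:]),
        }
    return None


def only_placeholder_remains(nm_output):
    """True if every debug-style entry in the output is the placeholder."""
    entries = [
        entry
        for entry in (parse_debug_entry(line) for line in nm_output.splitlines())
        if entry is not None
    ]
    return bool(entries) and all(e["name"] == PLACEHOLDER for e in entries)


sample = "0000000005614542 - 00 0000 OPT radr://5614542"
print(parse_debug_entry(sample))
print(only_placeholder_remains(sample))
```

A check like this would let a linter ignore the placeholder while still flagging binaries that carry other debug-style entries.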
What is radr://5614542?




Apple’s bug tracking system: Radar
The “black hole” of users’ bug reports
Used as an internal ticketing system, too
Bug reports were not natively searchable, and this has not necessarily been addressed since
For bugs across the web, iOS, macOS, and all other Apple devices and software; from compilers to end-user apps
But where is Apple’s source code?
https://opensource.apple.com/source/cctools/cctools-822/misc/strip.c
https://opensource.apple.com/source/cctools/cctools-973.0.1/misc/strip.c.auto.html
https://opensource.apple.com/source/cctools/cctools-751/misc/strip.c.auto.html
https://github.com/opensource-apple/cctools/blob/fdb4825f303fd5c0751be524babd32958181b3ed/misc/strip.c
taken down, no longer accessible
not the same code; various commits in the version control history are lost
{



/*
 * If there is a chance that we could end up with an indirect symbol
 * with an index of zero we need to avoid that due to a work around
 * in the dynamic linker for a bug it is working around that was in
 * the old classic static linker. See radar bug 5614542...
 */
if(new_nlocalsym == 0 && nindirectsyms != 0){
    len = strlen("radr://5614542") + 1;
    new_strsize += len;
    new_nlocalsym++;
    new_nsyms++;
    hack_5614542 = TRUE;
}
The solution: add a dummy symbol at index 0 to prevent dynamic linker confusion
Why this workaround exists
The original problem
- Dynamic linker expects no indirect symbols at index 0
- Classic static linker sometimes put them there
- This caused runtime linking failures
The fix
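- When stripping leaves no local symbols but indirect symbols remain, add radr://5614542, a harmless dummy symbol at index 0
- Push any indirect symbols to index 1+
- Sidesteps the dynamic linker's workaround for the old static linker bug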
🎯 Like a reserved parking spot
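The "reserved parking spot" idea can be modelled in a few lines of Python. This is a toy sketch, not Apple's implementation: real Mach-O symbol tables are binary structures, and `build_tables` with its list-based tables is invented for illustration. It mirrors only the condition in the excerpt: if no local symbols remain and there are indirect symbols, a placeholder takes index 0.

```python
PLACEHOLDER = "radr://5614542"


def build_tables(external_symbols, indirect_names):
    """Rebuild toy symbol tables after all local symbols were stripped.

    Returns (symbols, indirect), where each indirect entry is an index
    into the symbols list.
    """
    local_symbols = []  # stripping removed every local symbol
    if not local_symbols and indirect_names:
        # Reserve index 0 so that no indirect entry can point at it.
        local_symbols.append(PLACEHOLDER)
    symbols = local_symbols + list(external_symbols)
    indirect = [symbols.index(name) for name in indirect_names]
    return symbols, indirect


symbols, indirect = build_tables(["_printf", "_malloc"], ["_printf"])
print(symbols)
print(indirect)
```

Without the placeholder, `_printf` would sit at index 0 and the indirect entry would point there, which is the situation the comment in strip.c says must be avoided.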
The digital archaeology hunt
What I tried to find:
- Apple's cctools source code
- Original strip.c implementation
- Documentation about RADR bugs
- Historical context
The result:
- ❌ Most opensource.apple.com links broken
- ❌ Internet Archive incomplete
- ❌ Multiple mirrors down
- ✅ Found ONE working archive.is link
🔍 The Search
Hours of digital detective work to understand a 20-year-old workaround

All we do is political
critical knowledge is disappearing from the world
>8,000 web pages from federal websites and ~3,000 datasets removed
Attacks on gender identity, DEIA initiatives, and LGBTQIA+ content
Research funding cuts
Takedowns of critical research data





Source code for the AGC, Apollo 11
- 145,000 lines of moon landing code
- Nearly lost forever in the 1990s
- Museums showed no interest in preservation
- One amateur enthusiast (Ron Burkey) saved it in 2003
- The MIT Museum had paper printouts gathering dust
🚀 The code that took us to the moon
almost lost to bureaucratic indifference

Apple's selective pruning of source code
- ZFS project (2009) - 3 years of development terminated
- Darwin binary releases (2005) - OS discontinued
- OpenDarwin community (2006) - hosting shutdown
- Source documentation gaps - cctools, WebKit histories


left-pad package removal incident from npm
module.exports = leftpad;
function leftpad (str, len, ch) {
  str = String(str);
  var i = -1;
  if (!ch && ch !== 0) ch = ' ';
  len = len - str.length;
  while (++i < len) {
    str = ch + str;
  }
  return str;
}

GeoCities: the largest single digital loss
October 26, 2009 - Digital apocalypse
- 38 million user websites deleted
- 190 million hours of collective work lost
- Early web culture, erased forever
- Yahoo's decision with a 30-day notice
GeoCities: the largest single digital loss: rescue
🌐 The web's first mass extinction
38M sites gone forever
190M hours of human creativity


The economic cost of digital amnesia
$31.5B
Lost annually due to poor knowledge sharing [1]
$47M
Per company annually (30K employees on average) [2]
The hidden cost: Teams spend more time recreating knowledge than creating new solutions
[1] International Data Corp, 2017; [2] Panopto 2018

- 143+ million projects archived
- 9.1+ billion unique source files
- UNESCO partnership for digital heritage
- Network of global mirrors
Digital preservation
Key preservation strategies:
📥 Capture
Automated ingestion from multiple sources
🔄 Replicate
Geographic distribution and redundancy
🏷️ Metadata
Rich descriptions and provenance tracking
🔍 Access
APIs and tools for discovery
⏰ Monitor
Long-term integrity and migration
Tools: Archivematica, DSpace, Fedora, LOCKSS
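The "Monitor" step is often implemented as a fixity check: record a checksum for every file, then recompute later and compare. The sketch below shows the idea with SHA-256 and the standard library only. It is not how the tools listed above implement it, and the function names `build_manifest` and `changed_files` are hypothetical.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(
            path.read_bytes()
        ).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root, manifest):
    """List files that were added, removed, or modified since the manifest."""
    current = build_manifest(root)
    return sorted(
        name
        for name in manifest.keys() | current.keys()
        if manifest.get(name) != current.get(name)
    )
```

A typical use would be to store the manifest next to each replica and run `changed_files` on a schedule, treating any non-empty result as a signal to restore from another copy.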
Policy responses
- EU Cyber Resilience Act – security requirements
- US Federal Source Code Policy – open by default
- Germany's Sovereign Tech Fund – €10M annually
- UNESCO Software Heritage – heritage protection
- Internet Archive – legal protection
- The UK's National Archives – obsolete file formats
- Nationaal Archief (Netherlands) – video games, via emulation
- National Digital Preservation Program by MeitY – electronic evidence, डिजिटालय, and so on
What you can do today
🔄 Mirror everywhere
- Don't rely only on GitHub
- Use GitLab, Codeberg, sourcehut,...
- Self-host Git repositories if you can (Gitea)
- Regular backups to multiple cloud providers or network storage
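Mirroring can be scripted so it happens on every release rather than only when someone remembers. The sketch below builds one `git push --mirror` command per mirror and can optionally run them. The URLs are hypothetical placeholders, and note that `--mirror` makes each remote match the local refs exactly, including deleting remote refs that no longer exist locally, so it is best pointed at repositories dedicated to mirroring.

```python
import subprocess


def mirror_push_commands(mirror_urls):
    """Build one `git push --mirror` command per mirror URL."""
    return [["git", "push", "--mirror", url] for url in mirror_urls]


def push_to_mirrors(repo_dir, mirror_urls):
    """Run the pushes from inside repo_dir, stopping at the first failure."""
    for command in mirror_push_commands(mirror_urls):
        subprocess.run(command, cwd=repo_dir, check=True)


# Hypothetical mirror URLs, for illustration only.
MIRRORS = [
    "git@gitlab.com:example/project.git",
    "git@codeberg.org:example/project.git",
]

# Print the commands instead of running them, so this is safe to try.
for command in mirror_push_commands(MIRRORS):
    print(" ".join(command))
```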
🏛️ Use code archival platforms
- Submit save requests and get permanent identifiers
- Include your code in citations
- Support the mission however you can
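As one concrete example of a permanent identifier, and assuming the archival platform is Software Heritage, a file's content identifier (a SWHID of type `cnt`) can be computed locally: it wraps the same SHA-1 that Git uses for blobs. The sketch below covers only single-file contents, not directories or revisions, and `swhid_for_content` is a name chosen for illustration.

```python
import hashlib


def swhid_for_content(data: bytes) -> str:
    """Compute the SWHID of a file's content (same hash as a Git blob)."""
    # Git hashes the header "blob <size>\0" followed by the raw bytes.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


# The empty file has a well-known Git blob hash.
print(swhid_for_content(b""))
# → swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the identifier is derived from the bytes themselves, anyone holding a copy of the file can check that it matches the cited version, whichever mirror it came from.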
Takeaways
- 🔍 Technical archaeology is now routine
Modern developers regularly need detective skills to understand legacy systems and missing documentation
- 🌐 Knowledge preservation is political
Those who control access to technical information shape who can innovate and participate in digital society
- 🤝 Collective memory requires collective action
Individual efforts to document and preserve create shared infrastructure that benefits everyone
Questions and discussion
Thank you for your time! Please feel free to say hello!

in/agriyakhetarpal
agriyakhetarpal
agriyakhetarpal
agriyakhetarpal [at] outlook [dot] com
Content licensed under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0)

Emoji by openmoji.org
@agriyakhetarpal@fosstodon.org
@agriyakhetarpal.bsky.social
These slides

pydelhi-july-2025
By Agriya Khetarpal
A talk that is a potpourri of topics around Python packaging, macOS binaries and linkers, and digital preservation in the age of disappearing code