The mystery of the remnant symbol

Agriya Khetarpal

software, society, and our digital heritage

Slides to follow along

  • Computer science + applied math background
  • Twenty-three years old and graduated in late 2024
  • Software engineer at Quansight
  • Privileged to contribute to the Scientific Python ecosystem and help maintain the Pyodide ecosystem
  • Other work: JupyterLite, autograd, and the PyBaMM ecosystem (Python Battery Mathematical Modelling)

  • Interested in
    • Python packaging 📦🐍
    • Scientific computing ➗🧪
    • Compilers and toolchains 🛠️⛓️
    • Documentation and technical writing 📝🌉
    • ...and more 👾

About me 😁

Python packaging 📦

Python wheels 🛞

A wheel is a ZIP file that contains:

  • Compiled object files and libraries (.so, .dylib, .pyd), for projects with compiled code
  • Python source code (.py files)
  • Package metadata (dependencies, version info)
  • Entry points (console scripts)

What's in a Python wheel?

package-1.0-py3-none-manylinux_2_28_x86_64.whl
├── package/
│   ├── __init__.py
│   └── lib_package.so ← binary file!
└── package-1.0.dist-info/
    ├── METADATA
    └── WHEEL

What's in a Python wheel?
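
Because a wheel is an ordinary ZIP file, the layout above can be inspected with Python's standard library alone. The following is a minimal sketch, not how pydistcheck or any other tool does it; the function name `summarize_wheel` and the suffix list are assumptions made for illustration.

```python
import zipfile
from pathlib import PurePosixPath

# Suffixes that usually indicate compiled code inside a wheel (an assumed,
# non-exhaustive list; versioned names such as "libfoo.so.1" are not matched).
COMPILED_SUFFIXES = {".so", ".dylib", ".pyd", ".dll"}


def summarize_wheel(wheel_path):
    """Group the members of a wheel into compiled files, Python sources,
    metadata, and everything else."""
    summary = {"compiled": [], "python": [], "metadata": [], "other": []}
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            path = PurePosixPath(name)
            if path.parts[0].endswith(".dist-info"):
                summary["metadata"].append(name)
            elif path.suffix in COMPILED_SUFFIXES:
                summary["compiled"].append(name)
            elif path.suffix == ".py":
                summary["python"].append(name)
            else:
                summary["other"].append(name)
    return summary
```

Calling `summarize_wheel("dist/some.whl")` (a placeholder path) returns a dictionary of lists; any entry under `"compiled"` is a candidate for the debug-symbol check discussed next.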

pydistcheck

A tool and a linter for examining built files in Python distributions (source distributions and wheels)

Many thanks to James Lamb (@jameslamb on GitHub) for authoring it!

A build pipeline for a compiled wheel 

📝 Source Code
🔧 Compiler
📦 Object Files
🔗 Linker
📚 Shared Library
🎯 Python Wheel

Debug symbols are added during compilation and stay in the binary unless you strip them.

The strip utility

Removes debug information from binaries


What it removes

  • Function names and variable names
  • Source file references
  • Line number information
  • Debugging metadata


Why strip binaries?

  • Smaller file sizes
  • Faster loading
  • Hides implementation details

[compiled-objects-have-debug-symbols]

$ pydistcheck --inspect ./dist/*.whl

[compiled-objects-have-debug-symbols] Found compiled object 
containing debug symbols. For details, extract the distribution 
contents and run 'nm -a "lightgbm/lib/lib_lightgbm.dylib"'.

errors found while checking: 1

🤔 ...but the binary was stripped!

Let's check nm's output

$ nm -a "lightgbm/lib/lib_lightgbm.dylib"

0000000005614542 - 00 0000   OPT radr://5614542

What is radr://5614542?

A mysterious symbol that appears in macOS binaries after they have been stripped
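
The warning fires even though the only debugger-style entry left in the binary is this placeholder. A checker could tell the two cases apart roughly as sketched below. This is a hypothetical helper, not pydistcheck's actual implementation: the name `debug_symbol_lines` is made up, and the parsing assumes the macOS `nm -a` layout shown above, where debugger entries carry `-` in the second column.

```python
RADAR_WORKAROUND_SYMBOL = "radr://5614542"


def debug_symbol_lines(nm_output):
    """Return the lines of `nm -a` output that look like debugger entries,
    ignoring the placeholder symbol that strip leaves behind."""
    remaining = []
    for line in nm_output.splitlines():
        fields = line.split()
        # Debugger entries are printed with "-" as their symbol type.
        if len(fields) < 2 or fields[1] != "-":
            continue
        if fields[-1] == RADAR_WORKAROUND_SYMBOL:
            continue  # placeholder added by strip, not real debug info
        remaining.append(line)
    return remaining
```

An empty result means the binary carries no debugger entries other than the placeholder, so reporting it as "containing debug symbols" would be a false positive.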

radr://5614542

Apple’s bug tracking system: Radar

The “black hole” of users’ bug reports

Used as an internal ticketing system, too

Bug reports were not publicly searchable, and there was no guarantee that a report would be acted upon

For bugs across the web, iOS, macOS, and all other Apple devices and software; from compilers to end-user apps

But where is Apple’s source code?

https://opensource.apple.com/source/cctools/cctools-822/misc/strip.c

https://opensource.apple.com/source/cctools/cctools-973.0.1/misc/strip.c.auto.html

https://opensource.apple.com/source/cctools/cctools-751/misc/strip.c.auto.html

https://github.com/opensource-apple/cctools/blob/fdb4825f303fd5c0751be524babd32958181b3ed/misc/strip.c

taken down, no longer accessible

not the same code; various commits in the version control history are lost

/*
 * If there is a chance that we could end up with an indirect symbol
 * with an index of zero we need to avoid that due to a work around
 * in the dynamic linker for a bug it is working around that was in
 * the old classic static linker. See radar bug 5614542...
 */
if(new_nlocalsym == 0 && nindirectsyms != 0){
    len = strlen("radr://5614542") + 1;
    new_strsize += len;
    new_nlocalsym++;
    new_nsyms++;
    hack_5614542 = TRUE;
}

The solution: add a dummy symbol at index 0 to prevent dynamic linker confusion

Why this workaround exists

The original problem

  • Dynamic linker expects no indirect symbols at index 0
  • Classic static linker sometimes put them there
  • This caused runtime linking failures

The fix

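  • If stripping leaves no local symbols but indirect symbols remain, add radr://5614542, a harmless dummy local symbol, at index 0
  • Push every other symbol, including any referenced by indirect symbols, to index 1+
  • Avoids triggering the dynamic linker's workaround for the old static linker bug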

🎯 Like a reserved parking spot
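
The condition in the strip.c excerpt can be modelled in a few lines of Python. This is only a toy model for intuition, not a port of Apple's code: `plan_symbol_table` and its parameters are hypothetical names, and symbols are represented as plain strings. It relies on the fact that a Mach-O symbol table lists local symbols first, so index 0 is only at risk when no local symbol survives stripping.

```python
PLACEHOLDER = "radr://5614542"


def plan_symbol_table(local_symbols, other_symbols, n_indirect_symbols):
    """Return the symbol order after stripping, adding the placeholder
    only when index 0 could otherwise be used by an indirect symbol."""
    locals_out = list(local_symbols)
    # Mirrors `new_nlocalsym == 0 && nindirectsyms != 0` from the excerpt.
    if not locals_out and n_indirect_symbols != 0:
        locals_out.append(PLACEHOLDER)
    # Local symbols come first, so the placeholder lands at index 0.
    return locals_out + list(other_symbols)
```

With no surviving local symbols and at least one indirect symbol, the placeholder takes index 0; in every other case the table is left alone, which is why the symbol does not show up in every stripped binary.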

The digital archaeology hunt

What I tried to find:

  • Apple's cctools source code
  • Original strip.c implementation
  • Documentation about Radar bugs
  • Historical context

The result:

  • ❌ Most opensource.apple.com links broken
  • ❌ Internet Archive incomplete
  • ❌ Multiple mirrors down
  • ✅ Found ONE working archive.is link

🔍 The Search

Hours of digital detective work to understand a 20-year-old workaround

All we do is political

critical knowledge is disappearing from the world

>8,000 web pages from US federal websites and ~3,000 datasets removed

Attacks on gender identity, DEIA initiatives, and LGBTQIA+ content

Research funding cuts

Takedowns of critical research data

Source code for the Apollo Guidance Computer (AGC), Apollo 11

  • 145,000 lines of moon landing code
  • Nearly lost forever in the 1990s
  • Museums showed no interest in preservation
  • One amateur enthusiast (Ron Burkey) saved it in 2003
  • The MIT Museum had paper printouts gathering dust

 

🚀 The code that took us to the moon

almost lost to bureaucratic indifference

 

Apple's selective pruning of source code

  • ZFS project (2009) - terminated after 3 years of development
  • Darwin binary releases (2005) - OS discontinued
  • OpenDarwin community (2006) - hosting shut down
  • Source documentation gaps - cctools, WebKit histories

left-pad package removal incident from npm

module.exports = leftpad;
function leftpad (str, len, ch) {
  str = String(str);
  var i = -1;
  if (!ch && ch !== 0) ch = ' ';
  len = len - str.length;
  while (++i < len) {
    str = ch + str;
  }
  return str;
}

GeoCities: the largest single digital loss

October 26, 2009 - Digital apocalypse

  • 38 million user websites deleted
  • 190 million hours of collective work lost
  • Early web culture, erased forever
  • Yahoo's decision with a 30-day notice

GeoCities: the largest single digital loss: rescue

🌐 The web's first mass extinction

 

38M sites gone forever

190M hours of human creativity

 

The economic cost of digital amnesia

$31.5B

Lost annually due to poor knowledge sharing [1]

$47M

Per company annually (30K employees on average) [2]

 

The hidden cost: Teams spend more time recreating knowledge than creating new solutions

[1] International Data Corporation (IDC), 2017; [2] Panopto, 2018

  • 143+ million projects archived
  • 9.1+ billion unique source files
  • UNESCO partnership for digital heritage
  • Network of global mirrors

Digital preservation

Key preservation strategies:

📥 Capture

Automated ingestion from multiple sources

🔄 Replicate

Geographic distribution and redundancy

🏷️ Metadata

Rich descriptions and provenance tracking

🔍 Access

APIs and tools for discovery

⏰ Monitor

Long-term integrity and migration

Tools: Archivematica, DSpace, Fedora, LOCKSS
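
The "Monitor" step usually comes down to fixity checking: record a checksum for every file at ingest time and recompute it later. Below is a minimal standard-library sketch of that idea. It does not show how Archivematica, DSpace, Fedora, or LOCKSS implement it, and the names `sha256_of`, `build_manifest`, and `verify_manifest` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (as a relative POSIX path) to its SHA-256."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root, manifest):
    """Return the relative paths that are missing or whose checksum changed."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

Storing the manifest next to each replica and running `verify_manifest` on a schedule would flag silent corruption or deletion; an empty list means every recorded file is still intact.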

Policy responses

  • EU Cyber Resilience Act – security requirements
  • US Federal Source Code Policy – open by default
  • Germany's Sovereign Tech Fund – €10M annually
  • UNESCO Software Heritage – heritage protection
  • Internet Archive – legal protection
  • The UK's National Archives – obsolete file formats
  • Nationaal Archief (Netherlands) – video games, via emulation
  • National Digital Preservation Program by MeitY – electronic evidence, डिजिटालय, and so on

What you can do today

🔄 Mirror everywhere

  • Don't rely only on GitHub
  • Use GitLab, Codeberg, sourcehut, ...
  • Self-host Git repositories if you can (e.g., with Gitea)
  • Regular backups to multiple cloud providers or network storage

 

🏛️ Use code archival platforms

  • Submit save requests and get permanent identifiers
  • Include your code in citations
  • Support the mission however you can
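
As an illustration of what a permanent identifier can look like, the sketch below assumes the archival platform in question is Software Heritage (listed under policy responses), whose identifiers are called SWHIDs. For a single file, the identifier can be computed locally because it reuses Git's blob hash. The function name `content_swhid` is made up for this example, and only content (`cnt`) identifiers are covered.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Compute a version-1 SWHID for raw file contents.

    The hash is the one Git uses for blobs: SHA-1 over the header
    "blob <size>" plus a NUL byte, followed by the raw bytes.
    """
    header = b"blob %d\x00" % len(data)
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()
```

Because the identifier depends only on the bytes, the same file yields the same SWHID wherever it is mirrored, which is what makes it suitable for citations.
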

Remember: Every technical problem has social dimensions

Takeaways

  • 🔍 Technical archaeology is now routine
    Modern developers regularly need detective skills to understand legacy systems and missing documentation
  • 🌐 Knowledge preservation is political
    Those who control access to technical information shape who can innovate and participate in digital society
  • 🤝 Collective memory requires collective action
    Individual efforts to document and preserve create shared infrastructure that benefits everyone

Questions and discussion

Thank you for your time! Please feel free to say hello!

in/agriyakhetarpal

agriyakhetarpal

agriyakhetarpal

agriyakhetarpal [at] outlook [dot] com

Emoji by openmoji.org

 @agriyakhetarpal@fosstodon.org 

@agriyakhetarpal.bsky.social

These slides

pydelhi-july-2025

By Agriya Khetarpal

A talk that is a potpourri of topics around Python packaging, macOS binaries and linkers, and digital preservation in the age of disappearing code