Introspecting your machine with osquery
texas linux fest 2025
Slides at https://ian.im/txlf25
Ian Littman / @ian@phpc.social / @ian.im / @iansltx
osquery: query systems with SQL(ite)
installable via apt/yum/tarball
let's get the sha1 of all open files <= 100KiB
SELECT path, sha1 FROM hash WHERE path IN ( SELECT path FROM process_open_files JOIN file USING (path) WHERE file.size <= 1024 * 100 );
Why and when?
- Why? Facebook Sysadmins and security engineers had some problems
- Needed to collect telemetry from tons of sources
- Existing tools were particularly scarce on macOS and Linux
- Most existing tools hide internals
- Timeline
- 2014: osquery is built as a Facebook project
- 2016: Windows support is added (still serviceable but WIP)
- 2019: Joined The Linux Foundation
Key Tenets
- Write SQL, not procedural code
- osquery pulls and parses requisite data
- SQLite (embedded in osquery) queries data via virtual tables
- If you know how to write SQL, you can figure out how to use osquery
- Reliable and performant enough to run on corporate machines and in prod
- Worker process does the heavy lifting
- Watchdog process kills/respawns the worker if resource limits are exceeded
- Heavy queries are denylisted
- Some tables require WHERE clauses because e.g. hashing files is expensive
- Dual-licensed open-source GPLv2 + Apache 2.0
270+ tables built in
Hardware
-
acpi_tables
-
battery
-
block_devices
-
chassis_info
-
connected_displays
-
cpu_info
-
cpuid
-
device_firmware
-
fan_speed_sensors
-
hardware_events
-
ibridge_info
-
intel_me_info
-
memory_devices
-
nvram
-
pci_devices
-
power_sensors
-
temperature_sensors
-
usb_devices
-
video_info
-
drivers (Windows)
software*
-
apps (macOS)
-
apt_sources
-
chocolatey_packages
-
deb_packages
-
gatekeeper
-
gatekeeper_approved_apps
-
homebrew_packages
-
package_bom
-
package_install_history
-
package_receipts
-
python_packages
-
yum_sources
-
authenticode (Windows)
-
startup_items
-
launchd
-
chrome_extensions
-
firefox_addons
-
safari_extensions
-
patches (Windows)
-
npm_packages
-
programs (Windows)
-
rpm_packages
-
rpm_package_files
-
windows_update_history
Auth and certs
-
authorized_keys
-
certificates
-
groups
-
keychain_items
-
known_hosts
-
last (logins + logouts)
-
logged_in_users
-
logon_sessions (Windows)
-
password_policy (macOS)
-
security_profile_info (Windows)
-
selinux_*
-
shadow
-
ssh_configs
-
sudoers
-
suid_bin
-
user_events
-
user_groups
-
user_ssh_keys
-
users
Disk encryption
-
disk_encryption
-
bitlocker_info
Networking
-
connectivity
-
dns_cache (Windows)
-
dns_resolvers
-
etc_hosts
-
interface_addresses
-
interface_details
-
interface_ipv6
-
listening_ports
-
process_open_sockets
-
routes
-
wifi_*
-
curl
Firewalls
-
alf
-
alf_exceptions
-
alf_explicit_auths
-
windows_firewall_rules
-
windows_security_center
-
iptables
What's running?
-
crontab
-
scheduled_tasks (Windows)
-
startup_items
-
processes
-
process_*
-
running_apps
-
systemd_units
Logging
-
windows_event_logs
-
syslog_events
-
unified_log
-
asl
-
shell_history
Files (including file integrity monitoring)
-
carves
-
disk_*
-
file
-
file_events
-
hash
-
md_* (software RAID)
-
mdls / mdfind (Spotlight)
-
mounts
-
nfs_shares
-
ntfs_*
-
office_mru
-
process_file_events
-
signature
Containers
-
docker_*
-
lxd_*
sudo ln -s $HOME/.docker/run/docker.sock /var/run/docker.sock
Cloud
-
azure_instance_metadata
-
azure_instance_tags
-
ec2_instance_metadata
-
ec2_instance_tags
configuration
-
plist
-
registry
-
kernel_*
Time
-
time
-
uptime
-
load_average
But wait, with ATC, there's more!
if you need sqlite in your sqlite
But wait, with extensions, there's more!
for example, fleetd adds 30+ tables
Comprehensiveness over consistency.
you might want an abstraction layer on top.
...at least likely join keys tend to be named the same (pid, uid, path)
Nuance #1: Cross joins for users
- osquery queries as the current user
- osqueryd generally runs as root
- Retired: SELECT * FROM chrome_extensions
- Inspired: SELECT cx.* FROM users CROSS JOIN chrome_extensions cx USING (uid)
- Some tables work without the cross join, but give incomplete results
Bonus tip: Check out the SQLite docs on SELECT and the SQLite tutorial
Nuance #2: Computation effort
- ORDER/LIMIT tend not to reduce computation effort
- WHERE, depending on table, does
- Some tables (e.g. hash) require a WHERE condition
- For file paths, % == glob *; use %% for recursive subdirectory scans
- Sometimes you can get a hash more cheaply (e.g. signature table)
- Sometimes you need subqueries rather than JOINs
Displaying results
-
Provide query as an osqueryi argument to output and quit
-
At the CLI: --json, --csv
-
Inside the shell: .mode line, .mode column
osqueryd: JSON query log to disk
can also push externally over TLS with host identification
Can run scheduled over time with either the entire result or diffs
Threat Hunting && Indicators of Compromise
-
SELECT * FROM processes WHERE on_disk = 0
-
SELECT COUNT(*) > 0 FROM files WHERE path = "/my/malware"
-
SELECT * FROM process_open_sockets WHERE remote_address = ""
- YARA allows for more file contents searching
- Use diff logging to catch anomalies
Threat Hunting && Indicators of Compromise
Thanks!
Slides at https://ian.im/txlf25
Questions? Find me here / @ian@phpc.social / @ian.im / @iansltx
Introspecting your machine with osquery - Texas Linux Fest 2025
By Ian Littman
Introspecting your machine with osquery - Texas Linux Fest 2025
Somewhere in your organization, maybe on your desk, maybe in your data center, is infrastructure that isn't immutable, and it's helpful to know what's going on with that infrastructure. osquery lets you do just that with SQL, joins and all. We'll take a look at the sorts of data you can pull with osquery, and the use cases that data fits into, ranging from "is this machine set up correctly?" to "is this machine infected?"
- 60