Alan King
Software Developer, iRODS Consortium
June 8-10, 2021
iRODS User Group Meeting 2021
Virtual
iRODS Logical Locking
...but first, a story
Why did it take so long?
- 2019: The Missing Link
- https://slides.com/irods/ugm2019-technology-update#/9
- 2020: Logical Locking
- https://slides.com/irods/ugm2020-technology-update#/12
- 2021: Logical locking is real, but we had to change everything inside
What is a Data Object? What is a Replica?
Data Object: a logical representation of data that maps to one or more physical instances (Replicas) of the data at rest in Storage Resources
Replica: an identical, physical copy of a Data Object
How do we create and modify data in iRODS?
iRODS supports a POSIX-like interface for opening, writing, and closing. Every data movement operation in iRODS boils down to:
Open replica, move data to replica, close replica
Most users deal with high-level APIs (put, cp, repl, etc.) which are built using these lower-level APIs.
Policy can be invoked as a result of an operation, and that policy can be, and often is, itself a data-moving operation.
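The open/move/close pattern can be illustrated with a local-file analogy. This is only a sketch using Python's POSIX-style `os` calls on an ordinary file; it does not call any iRODS API, and the `put` helper and its parameters are hypothetical names chosen for illustration.

```python
import os


def put(data: bytes, replica_path: str) -> int:
    """Hypothetical high-level 'put' built from open, write, and close."""
    # Open replica (here: a plain local file standing in for a replica).
    fd = os.open(replica_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # Move data to replica; os.write may write fewer bytes than requested.
        written = 0
        while written < len(data):
            written += os.write(fd, data[written:])
    finally:
        # Close replica, even if the write failed part-way.
        os.close(fd)
    return written
```

The point of the sketch is the shape: the high-level call is nothing more than the three low-level steps in sequence, which is why locking can be reasoned about at open and close.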
How do we define the state of data? What is Truth?
Truth: The latest data known to be "correct"; or, how the data "should" be
Replica status: The state of the data as it relates to the physical storage, the catalog, and the Truth
Good: Data is at rest, matches the catalog, and reflects the Truth
Stale: Data is at rest, but does not meet all criteria for being Good
- It may not match what is in the catalog: data transfer errors, mismatched checksum, corruption, etc.
- It may not reflect the Truth (anymore): more recently written data understood to be correct exist (and may or may not differ!)
- Note: Stale does not necessarily mean the data are incorrect; it means only that they are not known to reflect the Truth
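The Good/Stale definitions can be read as a small decision rule. The following is a minimal sketch of that rule; `ReplicaCheck`, its field names, and `classify` are hypothetical names for illustration and are not taken from the iRODS code base.

```python
from dataclasses import dataclass
from enum import Enum


class ReplicaStatus(Enum):
    GOOD = "good"
    STALE = "stale"


@dataclass
class ReplicaCheck:
    at_rest: bool          # no agent currently has the replica open
    matches_catalog: bool  # e.g. size and checksum agree with the catalog
    reflects_truth: bool   # no newer data is known to be correct


def classify(check: ReplicaCheck) -> ReplicaStatus:
    """Good requires every criterion; anything else at rest is Stale."""
    if not check.at_rest:
        raise ValueError("Good/Stale only describe data at rest")
    if check.matches_catalog and check.reflects_truth:
        return ReplicaStatus.GOOD
    return ReplicaStatus.STALE
```

Note that a replica whose bytes happen to be identical to the Truth still classifies as Stale here if it is not known to reflect the Truth, which matches the note above.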
Why Locking? Concurrency in a Distributed System
Uncoordinated, concurrent writing to a single replica can lead to data corruption.
Uncoordinated, concurrent writing to multiple replicas of the same data object causes truth corruption.
Uncoordinated, concurrent operation execution can lead to policy violations.
All of these things endanger our understanding of the state of the data, which is how we know that our data is stored and cataloged safely.
Example: iput
Data Corruption: Intermediate Replicas
Uncoordinated, concurrent writing to a single replica can lead to data corruption.
Problem: In-flight replicas can be opened and modified concurrently by multiple agents in an uncoordinated fashion, and the catalog does not reflect the current, true state of the data.
Solution: Mark in-flight replicas as intermediate at open time and update the status at close to reflect the state of the replica
- Status of the replica is accurately represented in the catalog
- The system and users can take appropriate action based on whether or not the replica is at rest
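As a rough illustration of the intermediate status, the sketch below marks a replica as in flight at open and finalizes its status at close. The `Catalog` class is a hypothetical in-memory stand-in for the real catalog, and refusing a second open of an in-flight replica is an assumption about what "appropriate action" could mean, not documented behavior.

```python
from enum import Enum


class Status(Enum):
    GOOD = "good"
    STALE = "stale"
    INTERMEDIATE = "intermediate"


class Catalog:
    """Hypothetical in-memory stand-in for the catalog's replica statuses."""

    def __init__(self) -> None:
        self.status: dict[str, Status] = {}

    def open_for_write(self, replica_id: str) -> None:
        # Assumption: a replica that is already in flight cannot be reopened.
        if self.status.get(replica_id) is Status.INTERMEDIATE:
            raise RuntimeError(f"{replica_id} is in flight")
        # The catalog now records that the replica is not at rest.
        self.status[replica_id] = Status.INTERMEDIATE

    def close(self, replica_id: str, transfer_ok: bool) -> None:
        if self.status.get(replica_id) is not Status.INTERMEDIATE:
            raise RuntimeError(f"{replica_id} is not open")
        # Finalize: the status reflects the outcome of the transfer.
        self.status[replica_id] = Status.GOOD if transfer_ok else Status.STALE
```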
Truth Corruption: Logical Locking
Uncoordinated, concurrent writing to multiple replicas of the same data object causes truth corruption.
Problem: It is unclear which replica for a given data object represents the Truth when multiple replicas are in flight at the same time.
Solution: Prevent opening any replica for a given data object when any one of its replicas is opened for write.
- The opened replica is marked intermediate, as shown previously
- The other replicas are write-locked, which prevents any additional opens for read or write, so it is clear which replica represents the Truth
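The write-lock behavior can be sketched with a small in-memory model. `DataObject` and its methods are hypothetical names; the close step assumes the write succeeded and that sibling replicas then become stale because newer data is known to be correct, which is an assumption consistent with the Stale definition rather than something stated on this slide.

```python
from enum import Enum


class Status(Enum):
    GOOD = "good"
    STALE = "stale"
    INTERMEDIATE = "intermediate"
    WRITE_LOCKED = "write-locked"


AT_REST = {Status.GOOD, Status.STALE}


class DataObject:
    """Hypothetical model of one data object and its replica statuses."""

    def __init__(self, statuses: dict[int, Status]) -> None:
        self.statuses = dict(statuses)

    def _require_unlocked(self) -> None:
        if any(s not in AT_REST for s in self.statuses.values()):
            raise PermissionError("data object is locked")

    def open_for_read(self, replica: int) -> None:
        self._require_unlocked()

    def open_for_write(self, replica: int) -> None:
        if replica not in self.statuses:
            raise KeyError(replica)
        self._require_unlocked()
        # Opened replica becomes intermediate; all siblings are write-locked.
        for number in self.statuses:
            self.statuses[number] = (
                Status.INTERMEDIATE if number == replica else Status.WRITE_LOCKED
            )

    def close(self, replica: int) -> None:
        if self.statuses.get(replica) is not Status.INTERMEDIATE:
            raise RuntimeError("replica is not open for write")
        # Assumption: the write succeeded, so this replica is now the Truth
        # and its siblings no longer reflect it.
        for number in self.statuses:
            self.statuses[number] = (
                Status.GOOD if number == replica else Status.STALE
            )
```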
Known Limitations/Trade-offs
Maintaining replica information implies catalog round-trips (time):
- Create: up to 3 (lock object, register replica, close/finalize)
- Open for write: 2 (lock object, close/finalize)
Additional system metadata in the catalog and in memory for in-flight data (space)
What write locks do NOT solve:
- Database race conditions
- Protection against rogue administrators
Locking is currently scoped by open/close, not by operation
Future Work
Example: iput to a replication resource hierarchy
Policy Violation: Operation Locking (name pending)
Uncoordinated, concurrent operation execution can lead to policy violations.
Problem: When policy fired by a data-modifying operation triggers further data-modifying operations, other concurrent, uncoordinated data-modifying operations can cause that policy to be violated.
Solution: Keep the data object locked over the lifetime of any given data-modifying operation.
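Since operation locking is described as future work, the following is only a sketch of the idea: a lock held across an entire operation rather than across a single open/close pair. `operation_lock`, `locked_objects`, and `put_with_replication` are hypothetical names, and the single-process set stands in for whatever shared state a real implementation would use.

```python
from contextlib import contextmanager

# Hypothetical registry of logical paths that are locked by an operation.
locked_objects: set[str] = set()


@contextmanager
def operation_lock(logical_path: str):
    """Hold a lock on a data object for the lifetime of one operation."""
    if logical_path in locked_objects:
        raise RuntimeError(f"{logical_path} is locked by another operation")
    locked_objects.add(logical_path)
    try:
        yield
    finally:
        # Released only when the whole operation (and its policy) is done.
        locked_objects.discard(logical_path)


def put_with_replication(logical_path: str, replicas: list[str]) -> list[str]:
    """Write one replica, then let 'policy' replicate to the others."""
    steps = []
    with operation_lock(logical_path):
        for replica in replicas:
            # Each open/close pair happens under the same operation lock.
            steps += [f"open {replica}", f"write {replica}", f"close {replica}"]
    return steps
```

With open/close scoping, the object would be unlocked between the first close and the policy-driven replication; holding the lock around the whole loop closes that window.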
Future Work: Read Locks
Problem: Modifying or unlinking objects which are being read.
Solution: Extend ILL to allow multi-reader, single-writer access
Implementation details:
- Disallow open for write, unlink, or rename while locked
- Maintain list of agent PIDs/hostnames holding locks in the catalog
- Last agent to release lock will be responsible for unlocking the data object by restoring the replica states
- Agents need to be more self-aware with respect to locking to avoid deadlocks (also useful for operation locking); possibly use irods::replica_access_table
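The read-lock bookkeeping described for multi-reader, single-writer access might look roughly like the sketch below. Everything here is hypothetical (class name, method names, the "read-locked" status string), and the interaction with write locks is deliberately left out to keep the example short.

```python
class ReadLockableObject:
    """Hypothetical data object tracking read-lock holders."""

    def __init__(self, statuses: dict[int, str]) -> None:
        self.statuses = dict(statuses)
        # Agents holding a read lock, identified by (hostname, pid).
        self.holders: set[tuple[str, int]] = set()
        self._saved: dict[int, str] = {}

    def acquire_read(self, hostname: str, pid: int) -> None:
        if not self.holders:
            # First reader saves the replica states and locks the object.
            self._saved = dict(self.statuses)
            self.statuses = {n: "read-locked" for n in self.statuses}
        self.holders.add((hostname, pid))

    def release_read(self, hostname: str, pid: int) -> None:
        self.holders.remove((hostname, pid))
        if not self.holders:
            # Last reader out unlocks by restoring the saved replica states.
            self.statuses = self._saved
            self._saved = {}

    def require_modifiable(self, action: str) -> None:
        # Open for write, unlink, and rename are refused while readers exist.
        if self.holders:
            raise PermissionError(f"{action} refused: object is read-locked")
```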
Future Work: Lock Checker
Problem: The agent holding open descriptors is responsible for locking and unlocking. A killed agent can leave objects stuck in a locked or intermediate state, leaving admins to identify and modify replica states so they can be healed.
Solution: An asynchronous server task checks for locked data objects and checks whether the listed agent(s) are still running. It schedules asynchronous unlocking of data objects which are no longer owned by a living agent.
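One pass of such a lock checker could be sketched as follows. This is a synchronous, single-pass illustration only; the `locks` mapping, the `is_alive` probe, and the `schedule_unlock` callback are hypothetical stand-ins for the catalog query, the agent liveness check, and the asynchronous unlock scheduling, respectively.

```python
from typing import Callable

Holder = tuple[str, int]  # (hostname, pid) of an agent holding a lock


def orphaned_objects(
    locks: dict[str, set[Holder]],
    is_alive: Callable[[str, int], bool],
) -> list[str]:
    """Return locked objects whose listed agents are all gone."""
    # An object with no listed holders at all is also treated as orphaned.
    return sorted(
        path
        for path, holders in locks.items()
        if not any(is_alive(host, pid) for host, pid in holders)
    )


def run_check(
    locks: dict[str, set[Holder]],
    is_alive: Callable[[str, int], bool],
    schedule_unlock: Callable[[str], None],
) -> list[str]:
    """One pass of the checker: schedule an unlock for each orphan."""
    orphans = orphaned_objects(locks, is_alive)
    for path in orphans:
        schedule_unlock(path)
    return orphans
```

Passing the liveness probe in as a function keeps the sketch independent of how agents on remote hosts would actually be checked, which the slides do not specify.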
Thanks for listening