GitHub: excsn/ironwal
Crates.io: ironwal
For a while now, I've been diving deep into the foundations of durable storage. Every high-performance database or distributed system eventually needs a Write-Ahead Log (WAL): that critical, append-only record that ensures your data survives a crash. I wanted one that was deterministic, multi-stream, and above all, "battle-tested" against real-world failures. I also wanted one without write amplification; typically I would just build a logical WAL on top of RocksDB with FIFO compaction, but I believed a dedicated WAL made far more sense for my use case.
I call it IronWal, and I've just released it as an open-source crate under the MPL 2.0 license. It was born out of the need for a foundational piece of Hi Stakes Market Game's ledger and state-synchronization features, and building it has been a masterclass in Rust systems programming and the brutal reality of filesystem I/O.
There just wasn't a WAL crate out there that I trusted, so I built one that I do trust: one that handles all of my use cases, and probably many of yours too.
Durability without the Bottleneck
Many WAL implementations are simple: you append to a file and occasionally sync. But for the systems I'm building, I needed more. I needed a WAL that could handle thousands of independent streams (like individual user ledgers) without one high-traffic stream blocking another.
I also wanted determinism. In IronWal, stream keys map explicitly to physical directories on disk. This makes auditing, manual replication, and backups straightforward because the physical layout matches your logical data model.
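To make that concrete, here's a hypothetical usage sketch. The names (Wal::open, append) are illustrative assumptions on my part, not necessarily IronWal's actual API; check the crate docs for the real surface:

```rust
// Hypothetical API sketch; method names are assumptions, see the docs.
use ironwal::Wal;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let wal = Wal::open("/var/lib/myapp/wal")?;

    // Each stream key maps deterministically to its own directory, e.g.
    // /var/lib/myapp/wal/ledger-user-42/ holds that stream's segments,
    // so auditing or backing up one ledger is a plain `cp -r`.
    wal.append("ledger-user-42", b"debit:100")?;
    wal.append("ledger-user-99", b"credit:250")?;
    Ok(())
}
```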
Fine-Grained Concurrency
One of the first challenges I tackled was the global lock bottleneck. In early iterations, writing to "Stream A" would block "Stream B" because they shared a global mutex. That doesn't scale.
I built the StreamStateMap to solve this. It's a sharded locking container: an RwLock manages the collection of streams, and an individual Mutex guards each stream's metadata. The result is throughput that scales with your thread count. In my benchmarks, moving from 1 to 16 threads yielded nearly an 80% increase in ops/sec, showing that the locking is exactly where it needs to be: out of the way.
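The pattern is simple enough to sketch. Here is a minimal version of the two-level locking idea under my own naming, leaving out the sharding and the rest of the real StreamStateMap's bookkeeping:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};

// Illustrative per-stream metadata; the real fields will differ.
struct StreamState {
    next_index: u64,
}

#[derive(Default)]
struct StreamStateMap {
    // The outer RwLock guards only the *collection*: lookups take a cheap
    // read lock, so writers to different streams never contend here.
    streams: RwLock<HashMap<String, Arc<Mutex<StreamState>>>>,
}

impl StreamStateMap {
    fn get_or_create(&self, key: &str) -> Arc<Mutex<StreamState>> {
        // Fast path: the stream already exists, read lock only.
        if let Some(state) = self.streams.read().unwrap().get(key) {
            return Arc::clone(state);
        }
        // Slow path: take the write lock just long enough to insert.
        let mut map = self.streams.write().unwrap();
        Arc::clone(map.entry(key.to_string()).or_insert_with(|| {
            Arc::new(Mutex::new(StreamState { next_index: 0 }))
        }))
    }
}
```

A write to "Stream A" locks only Stream A's Mutex, so it never serializes with a write to "Stream B".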
Testing: The Power Plug Simulation
You can't claim a WAL is durable just because the code looks correct. You have to try to break it. During development, I implemented a "Power Plug" simulation suite: tests that sabotage the WAL (one helper is sketched after this list) by:
Truncating payloads in the middle of a write.
Leaving partial 32-byte headers.
Flipping random bits in the middle of a file (Bit Rot).
Appending 1KB of zeros to simulate filesystem pre-allocation crashes.
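As an example of what these saboteurs look like, here's a sketch of the bit-flipping one. It assumes nothing about IronWal's internals beyond "segments are files":

```rust
use std::fs::OpenOptions;
use std::io::{Read, Seek, SeekFrom, Write};
use std::path::Path;

// "Bit Rot" saboteur: flip one bit at a given offset in a segment file,
// then the test reopens the WAL and asserts that recovery is clean.
fn flip_bit(path: &Path, offset: u64, bit: u8) -> std::io::Result<()> {
    let mut file = OpenOptions::new().read(true).write(true).open(path)?;
    let mut byte = [0u8; 1];
    file.seek(SeekFrom::Start(offset))?;
    file.read_exact(&mut byte)?;
    byte[0] ^= 1 << (bit % 8); // corrupt exactly one bit
    file.seek(SeekFrom::Start(offset))?;
    file.write_all(&byte)
}
```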
This led to the creation of the repair() logic. Now, every time IronWal opens an active segment, it performs a structural scan. If it finds a partial frame or a corrupted tail caused by a sudden power loss, it detects the last valid boundary and truncates the garbage automatically. It’s a self-healing system that prioritizes data integrity over everything else.
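To show the shape of that recovery scan, here is a simplified sketch. I'm assuming a toy frame layout (4-byte length + 4-byte CRC32, via the crc32fast crate) rather than IronWal's real 32-byte header, but the truncate-at-the-last-valid-boundary logic is the same idea:

```rust
use std::fs::OpenOptions;
use std::io::{Read, Seek, SeekFrom};
use std::path::Path;

// Toy frame: [len: u32 LE][crc32: u32 LE][payload]. Scan forward until a
// frame fails to parse or verify, then truncate everything after it.
fn repair(path: &Path) -> std::io::Result<u64> {
    let mut file = OpenOptions::new().read(true).write(true).open(path)?;
    let file_len = file.metadata()?.len();
    let mut valid_end = 0u64; // offset just past the last verified frame

    loop {
        file.seek(SeekFrom::Start(valid_end))?;
        let mut header = [0u8; 8];
        if file.read_exact(&mut header).is_err() {
            break; // partial header: a torn write at the tail
        }
        let len = u32::from_le_bytes(header[0..4].try_into().unwrap()) as u64;
        let crc = u32::from_le_bytes(header[4..8].try_into().unwrap());
        if len == 0 {
            break; // zero-fill from pre-allocation; real headers carry a magic
        }
        let mut payload = vec![0u8; len as usize];
        if file.read_exact(&mut payload).is_err() {
            break; // truncated payload
        }
        if crc32fast::hash(&payload) != crc {
            break; // bit rot: stop at the last good frame
        }
        valid_end += 8 + len;
    }

    if valid_end < file_len {
        file.set_len(valid_end)?; // chop off the corrupted tail
    }
    Ok(valid_end)
}
```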
Hybrid Compression and Mmap Performance
Storage isn't free, but neither is CPU time. IronWal uses a Hybrid Compression strategy built on LZ4: frames are evaluated against a min_compression_size threshold. Small single appends stay raw to keep latency low, while large batches are compressed automatically. This gives you the best of both worlds: snappy writes for small events and massive disk savings for bulk operations.
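A sketch of that decision, using the lz4_flex crate and an illustrative threshold value (the real default and frame flags may differ):

```rust
use lz4_flex::compress_prepend_size;

// Illustrative threshold; the text above implies min_compression_size
// is a tunable parameter.
const MIN_COMPRESSION_SIZE: usize = 4096;

enum Encoded {
    Raw(Vec<u8>),
    Lz4(Vec<u8>),
}

fn encode(payload: &[u8]) -> Encoded {
    if payload.len() < MIN_COMPRESSION_SIZE {
        // Small single appends stay raw: no compressor on the hot path.
        return Encoded::Raw(payload.to_vec());
    }
    let compressed = compress_prepend_size(payload);
    // Keep the compressed form only if it actually saved space.
    if compressed.len() < payload.len() {
        Encoded::Lz4(compressed)
    } else {
        Encoded::Raw(payload.to_vec())
    }
}
```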
For reading, I leaned heavily into Mmap. While standard file I/O is provided for safety, the memory-mapped read strategy is a beast, achieving random read latencies of ~200µs in benchmarks. That’s a 75x speedup over standard syscall-based reads.
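For flavor, here's the mmap read path in miniature, using the memmap2 crate (my assumption; IronWal's internals may differ):

```rust
use memmap2::Mmap;
use std::fs::File;
use std::path::Path;

// Sketch of an mmap-backed segment reader with a known frame offset/len.
struct SegmentReader {
    map: Mmap,
}

impl SegmentReader {
    fn open(path: &Path) -> std::io::Result<Self> {
        let file = File::open(path)?;
        // Safety: the segment is sealed (no concurrent truncation or
        // resizing), so the mapping stays valid for the reader's lifetime.
        let map = unsafe { Mmap::map(&file)? };
        Ok(Self { map })
    }

    // A random read is a zero-copy slice into the page cache: no read()
    // syscall, no buffer copy, just a bounds check and, at worst, a
    // page fault.
    fn frame(&self, offset: usize, len: usize) -> &[u8] {
        &self.map[offset..offset + len]
    }
}
```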
Lessons from the Fuzzer
The real "aha!" moment came when I integrated proptest for property-based fuzzing. I built a harness that generates random sequences of appends, batch writes, and sudden restarts, comparing the WAL's state against a simple in-memory model.
The fuzzer found edge cases I never would have scripted: subtle bugs in how segments.idx was initialized for empty streams, and race conditions in the truncation logic. Fixing these took IronWal from "it works on my machine" to a library I would actually trust with a real financial ledger.
What’s Next?
IronWal is now live on GitHub and Crates.io. It’s currently a synchronous library, which is perfect for embedded use cases or systems where you want total control over threading.
The next mountain to climb is a native async API, and potentially sharding strategies for even higher throughput. But for now, I'm happy with the result: a robust, deterministic, battle-tested log that does exactly what it says on the tin.
If you’re building a system that needs to remember what happened, no matter what, give IronWal a look.
