[Video Podcast] Unlocking Valkey's Potential with Madelyn Olson
Transcript
Thomas Betts: Welcome to the InfoQ Podcast. I'm Thomas Betts, and today I'm thrilled to be joined by Madelyn Olson, the driving force behind the Valkey project and a principal software development engineer at Amazon ElastiCache and Amazon MemoryDB. Madelyn's expertise lies in crafting secure and highly reliable features for the Valkey engine, and she recently shared her insights at QCon San Francisco.
Madelyn Olson: Thank you for having me, Thomas. The QCon conference was exceptional, with an engaged audience asking insightful questions. Valkey's journey began in 2020 when I became a maintainer of the open-source Redis project. Along with other major contributors, we cultivated a vibrant development community. However, in March 2024, Redis shifted from an open-source BSD license to a commercial SSPL license, prompting us to take action.
We, along with another Redis maintainer from Alibaba, gathered engineers from Ericsson, Tencent, Huawei, and Google, and with the Linux Foundation's support, we swiftly created Valkey in just eight days. Since its inception 18 months ago, Valkey has been making waves. We've had several releases: Valkey 7.2, the initial fork of Redis; Valkey 8.0, our first significant release, showcasing our capabilities; and, more recently, Valkey 8.1 and 9.0, the latter released in November.
Valkey's Growing Ecosystem
Thomas Betts: It's impressive to see the momentum Valkey has gained. How can developers get started with Valkey? Is it offered as a service? Can they easily switch from Redis?
Madelyn Olson: Valkey is designed as a drop-in replacement for Redis open-source 7.2, the last BSD version. Upgrading to any Valkey version is seamless. However, newer Redis versions, now under the AGPL license, may present compatibility issues. Most users can migrate smoothly: because Valkey and Redis are so often used as caches, migration typically just means deleting the cache and pointing the application at Valkey.
Valkey's High Availability
Thomas Betts: One of Valkey's standout features is its high availability. How does this work?
Madelyn Olson: Valkey's high availability is achieved by attaching replicas to your existing cluster, synchronizing data, and enabling failovers. Managed services like ElastiCache, Memorystore, and Aiven streamline this process, making it as simple as clicking a button. For instance, in ElastiCache, you can initiate the migration with a single click, ensuring a seamless and online upgrade.
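To make the mechanics concrete, here is a minimal sketch, assuming a plain primary/replica pair (not cluster mode) and the hiredis C client, of the commands a managed service drives for you: attaching a replica, checking replication state, and later promoting the replica. The hostnames are hypothetical.

```c
/* Minimal sketch (not the ElastiCache tooling): attach a node as a replica of
 * an existing primary, check replication state, and later promote it manually.
 * Hostnames are hypothetical. Build with: cc -o ha ha.c -lhiredis */
#include <stdio.h>
#include <hiredis/hiredis.h>

int main(void) {
    /* Connect to the node that should become a replica. */
    redisContext *c = redisConnect("replica.example.internal", 6379);
    if (c == NULL || c->err) {
        fprintf(stderr, "connect failed: %s\n", c ? c->errstr : "out of memory");
        return 1;
    }

    /* Point it at the existing primary; the node syncs the dataset over. */
    redisReply *r = redisCommand(c, "REPLICAOF %s %s", "primary.example.internal", "6379");
    printf("REPLICAOF: %s\n", r->str);
    freeReplyObject(r);

    /* Replication lag and link status show up here once the sync completes. */
    r = redisCommand(c, "INFO replication");
    printf("%s\n", r->str);
    freeReplyObject(r);

    /* To fail over manually, promote the replica to a standalone primary. */
    r = redisCommand(c, "REPLICAOF NO ONE");
    printf("promotion: %s\n", r->str);
    freeReplyObject(r);

    redisFree(c);
    return 0;
}
```

A managed service wraps this kind of sequence, plus client redirection, behind the one-click experience Madelyn describes.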
Valkey's Compatibility and Tooling
Thomas Betts: What about compatibility and tooling? How does Valkey ensure a smooth transition for developers?
Madelyn Olson: Most clients that work with Redis will also work seamlessly with Valkey. Popular libraries like redis-py and Spring Data Redis work equally well with both. We're encouraging users to share their migration experiences, but the process is so effortless that we often hear, "We migrated and learned nothing. We just clicked a button." It's a testament to Valkey's ease of use.
Valkey's Core: A Hash Map Over TCP
Thomas Betts: Let's dive into the heart of Valkey. You mentioned it's more than just a cache or key-value store. Can you elaborate on its core functionality?
Madelyn Olson: Valkey is indeed a hash map, but its true power lies in its ability to handle complex data types as values. For instance, you can store sets to track user login status and display targeted advertisements. This is what sets Valkey apart from simple hash maps. Building the hash map is straightforward, but the real complexity lies in the surrounding features: horizontal clustering, replication, durability, observability, and statistics.
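As a rough illustration of that login-status example, here is a small sketch using the hiredis C client; the key name `logged_in_users` and the user IDs are made up for the example.

```c
/* Sketch of the login-status example using a set. The key name and user IDs
 * are illustrative only. Build with: cc -o sets sets.c -lhiredis */
#include <stdio.h>
#include <hiredis/hiredis.h>

int main(void) {
    redisContext *c = redisConnect("127.0.0.1", 6379);
    if (c == NULL || c->err) {
        fprintf(stderr, "connect failed: %s\n", c ? c->errstr : "out of memory");
        return 1;
    }

    /* Record that two users are currently logged in. */
    redisReply *r = redisCommand(c, "SADD logged_in_users %s %s", "user:42", "user:99");
    freeReplyObject(r);

    /* Membership check: decide whether to show the targeted ad. */
    r = redisCommand(c, "SISMEMBER logged_in_users %s", "user:42");
    printf("user:42 logged in? %lld\n", r->integer);
    freeReplyObject(r);

    /* Remove a user when they log out. */
    r = redisCommand(c, "SREM logged_in_users %s", "user:99");
    freeReplyObject(r);

    redisFree(c);
    return 0;
}
```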
Performance Enhancements Without Compromise
Thomas Betts: These features transform Valkey into a robust product. Your QCon presentation focused on performance improvements. How did you achieve this without disrupting existing functionality?
Madelyn Olson: Our goal was to enhance performance while maintaining compatibility. We rebuilt the hash table, ensuring no performance regressions. In 2022, we realized that our original hash table, designed in 2009, could be significantly improved. We were making excessive memory allocations and using older techniques, underutilizing modern hardware capabilities. We set out to address these issues, and it took us until the end of 2023 to complete the new hash table.
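To see where those excessive allocations came from, here is a simplified sketch in the spirit of a 2009-style chained hash-table entry. It is not the actual Valkey source, but it shows why every key used to cost several separate allocations plus per-entry pointer overhead.

```c
/* Simplified sketch of a chained hash-table entry in the older style (not the
 * actual Valkey source): each stored key needs separate allocations for the
 * entry and the key, plus 24 bytes of pointers per entry on a 64-bit machine. */
#include <stdlib.h>
#include <string.h>

typedef struct entry {
    char *key;          /* separately allocated key string         */
    void *value;        /* separately allocated value object       */
    struct entry *next; /* chaining pointer for collision handling */
} entry;

static entry *entry_create(const char *key, void *value, entry *bucket_head) {
    entry *e = malloc(sizeof(*e)); /* allocation #1: the entry            */
    e->key = strdup(key);          /* allocation #2: the key              */
    e->value = value;              /* the value was allocated separately  */
    e->next = bucket_head;         /* push onto the bucket's chain        */
    return e;
}
```

The Valkey 8 and 9 changes discussed later collapse most of this into a single allocation.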
Challenges in Clustering Mode
Thomas Betts: What challenges did you encounter, especially in clustering mode?
Madelyn Olson: In clustering mode, Valkey distributes keys across multiple servers. Each key is hashed to one of 16,384 slots, and these slots are dynamically distributed across nodes. To migrate keys, we needed an efficient way to iterate over the keys in a given slot. Initially, we used a linked list data structure, which was costly in terms of memory overhead. We replaced this with a binary indexed tree (a Fenwick tree), allowing us to sample data across the per-slot dictionaries proportionally to their size.
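The sampling idea can be sketched generically: keep a Fenwick (binary indexed) tree of per-slot key counts, draw a random number in [0, total keys), and walk the tree to find the slot it falls into. The sketch below shows the technique in general form, not the Valkey implementation; the 16,384 comes from the cluster's fixed slot count.

```c
/* Generic sketch (not the Valkey source) of picking a slot with probability
 * proportional to how many keys it holds, using a Fenwick (binary indexed)
 * tree over per-slot key counts. */
#include <stdlib.h>

#define NUM_SLOTS 16384                 /* the cluster keyspace has 16,384 slots */

static long long tree[NUM_SLOTS + 1];   /* 1-based Fenwick tree of key counts */

/* Adjust a slot's key count (called when a key is added or removed). */
static void slot_count_add(int slot, long long delta) {
    for (int i = slot + 1; i <= NUM_SLOTS; i += i & -i)
        tree[i] += delta;
}

/* Total number of keys across all slots. */
static long long total_keys(void) {
    long long sum = 0;
    for (int i = NUM_SLOTS; i > 0; i -= i & -i)
        sum += tree[i];
    return sum;
}

/* Pick a slot with probability proportional to its key count: draw r in
 * [0, total) and descend the tree to the first slot whose cumulative
 * count exceeds r. */
static int sample_slot(void) {
    long long r = rand() % total_keys();        /* assumes at least one key exists */
    int pos = 0;
    for (int step = NUM_SLOTS; step > 0; step >>= 1) {
        if (pos + step <= NUM_SLOTS && tree[pos + step] <= r) {
            pos += step;
            r -= tree[pos];
        }
    }
    return pos;   /* 0-based slot; a random key is then drawn from its dictionary */
}
```

A flat array of counts would make sampling linear in the number of slots; the tree keeps both the count updates and the sampling logarithmic.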
Memory Optimization and Performance
Thomas Betts: How did these changes impact memory usage and performance?
Madelyn Olson: We optimized memory usage by consolidating memory allocations and adopting a more modern approach. This resulted in a significant memory reduction for customers with small keys and values. While most real-world examples won't see such dramatic savings, even an 8% reduction can significantly delay the need for scaling. Additionally, we focused on throughput, ensuring that we didn't hit performance bottlenecks.
Performance Measurement and Optimization
Thomas Betts: How do you measure and optimize performance?
Madelyn Olson: Performance is often associated with latency, but Valkey's speed is so high that network latency dominates. Most commands take around one microsecond, with network hops adding hundreds of microseconds to milliseconds. We focus on throughput, ensuring we don't hit engine contention-related latency spikes. We measure throughput by sending traffic to the engine and analyzing its processing capacity. We also use micro-benchmarking, running C code multiple times to gauge performance.
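Here is a sketch of what such a micro-benchmark harness can look like; `lookup_key` is a hypothetical stand-in for whichever internal routine is being measured.

```c
/* Micro-benchmark sketch: call the routine under test in a tight loop and
 * report average nanoseconds per call. lookup_key() is a hypothetical
 * stand-in for the real internal function being measured. */
#include <stdio.h>
#include <stdint.h>
#include <time.h>

static int lookup_key(const char *key) {
    return key[0];                      /* placeholder body for the sketch */
}

int main(void) {
    const long iterations = 10 * 1000 * 1000;
    struct timespec start, end;
    volatile int sink = 0;              /* stop the compiler from eliding the loop */

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        sink += lookup_key("benchmark:key");
    clock_gettime(CLOCK_MONOTONIC, &end);

    int64_t ns = (int64_t)(end.tv_sec - start.tv_sec) * 1000000000
               + (end.tv_nsec - start.tv_nsec);
    printf("%.1f ns/op over %ld iterations\n", (double)ns / iterations, iterations);
    return 0;
}
```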
Performance Monitoring and Prefetching
Thomas Betts: How do you monitor performance and ensure optimal memory usage?
Madelyn Olson: We closely monitor CPU counters to understand memory access times and prefetching efficiency. We aim to prefetch memory before executing commands, ensuring data is readily available in CPU caches for quick execution. We also analyze performance using tools like perf and flame graphs, which provide insights into the program's behavior.
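The prefetching idea can be sketched generically: once a batch of commands has been parsed, issue a prefetch for the memory each one will touch before executing any of them, so the cache misses overlap instead of happening one at a time. The sketch below is illustrative, not the Valkey code; the table, toy hash, and `run` callback are stand-ins.

```c
/* Illustrative sketch (not the Valkey source) of batched prefetching: start
 * the memory loads for every command in a parsed batch before executing any
 * of them, so the cache misses overlap instead of serializing. */
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 1024

typedef struct entry { const char *key; void *value; } entry;
static entry table[TABLE_SIZE];

/* Toy hash, just for the sketch. */
static uint64_t hash_key(const char *key) {
    uint64_t h = 0xcbf29ce484222325ull;            /* FNV-1a 64-bit */
    while (*key) { h ^= (unsigned char)*key++; h *= 0x100000001b3ull; }
    return h;
}

/* run() stands in for "execute the command against this bucket". */
void execute_batch(const char **keys, size_t n, void (*run)(entry *)) {
    entry *buckets[16];
    if (n > 16) n = 16;

    /* Pass 1: compute each command's bucket and kick off the memory load. */
    for (size_t i = 0; i < n; i++) {
        buckets[i] = &table[hash_key(keys[i]) % TABLE_SIZE];
        __builtin_prefetch(buckets[i]);
    }

    /* Pass 2: by the time each command executes, its bucket is likely in cache. */
    for (size_t i = 0; i < n; i++)
        run(buckets[i]);
}
```

Tools like perf and flame graphs then confirm whether those prefetches actually turned cache misses into hits.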
Designing for Diverse Use Cases
Thomas Betts: How do you accommodate various key and value sizes in your design?
Madelyn Olson: We analyze key and value sizes to ensure optimal performance. While most keys are small (16-32 bytes), values can be larger (70-100 bytes). Valkey's flexibility allows users to store any key size and value up to 512 MB. We test a range of sizes to ensure performance, from 50-60 bytes to tens or hundreds of kilobytes. We also consider performance regressions in unusual cases, as demonstrated in my QCon talk.
Valkey 8 and 9: Under the Hood
Thomas Betts: Can you explain the changes you made in Valkey 8 and 9, especially regarding the hash table and memory usage?
Madelyn Olson: In Valkey 8, we embedded the key within the structure, reducing memory overhead. In the previous design, we had a separate pointer for the key, but now we store it directly in the structure. This required custom code to locate the key within the memory block, but modern hardware excels at this, minimizing performance degradation. We also embedded the container, saving additional pointers.
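A simplified sketch of the key-embedding idea, assuming a flexible array member at the end of the entry; this is not the actual Valkey struct layout, just the shape of the change.

```c
/* Simplified sketch of key embedding (not the actual Valkey layout): the key
 * bytes live in the same allocation as the entry, so one malloc replaces two
 * and the key sits next to its metadata in memory. */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct embedded_entry {
    void *value;           /* the stored value                          */
    uint32_t key_len;      /* length of the embedded key                */
    char key[];            /* flexible array member: key bytes go here  */
} embedded_entry;

static embedded_entry *entry_create(const char *key, uint32_t key_len, void *value) {
    /* A single allocation holds the header and the key bytes together. */
    embedded_entry *e = malloc(sizeof(*e) + key_len + 1);
    e->value = value;
    e->key_len = key_len;
    memcpy(e->key, key, key_len);
    e->key[key_len] = '\0';
    return e;
}
```

One allocation replaces two, and the key bytes sit right next to the entry's metadata, which is what keeps the extra "find the key inside the block" work cheap on modern hardware.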
In Valkey 9, we improved collision handling. Instead of linked lists, we adopted a modern approach called open addressing, specifically linear probing. If a key hashes to bucket 10 but that bucket is already occupied, we place it in bucket 11; on lookup, if the key isn't in bucket 10, we check bucket 11 and keep walking forward until we find it or hit an empty bucket. This keeps key retrieval efficient. We also adopted a SwissTable-inspired layout, which leverages CPU cache lines to store multiple pointers in a 64-byte block, reducing memory overhead.
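For readers who want the probing mechanics spelled out, here is a generic open-addressing sketch with linear probing; it is not the Valkey hash table (which adds the SwissTable-style 64-byte bucket layout on top), and the toy hash function is only for illustration.

```c
/* Generic open-addressing sketch (not the Valkey hash table): linear probing
 * places a colliding key in the next free bucket, and lookups walk forward
 * from the home bucket until they hit the key or an empty slot. */
#include <stdint.h>
#include <string.h>

#define CAPACITY 1024                      /* power of two for cheap masking */

typedef struct { const char *key; void *value; } bucket;
static bucket table[CAPACITY];

static uint64_t hash_key(const char *key) {
    uint64_t h = 0xcbf29ce484222325ull;    /* FNV-1a, toy hash for the sketch */
    while (*key) { h ^= (unsigned char)*key++; h *= 0x100000001b3ull; }
    return h;
}

int insert(const char *key, void *value) {
    uint64_t i = hash_key(key) & (CAPACITY - 1);     /* home bucket */
    for (size_t probes = 0; probes < CAPACITY; probes++) {
        if (table[i].key == NULL) {                  /* free: claim it */
            table[i].key = key;
            table[i].value = value;
            return 1;
        }
        i = (i + 1) & (CAPACITY - 1);                /* taken: try the next bucket */
    }
    return 0;                                        /* table full */
}

void *lookup(const char *key) {
    uint64_t i = hash_key(key) & (CAPACITY - 1);
    for (size_t probes = 0; probes < CAPACITY; probes++) {
        if (table[i].key == NULL) return NULL;       /* empty: key is absent */
        if (strcmp(table[i].key, key) == 0) return table[i].value;
        i = (i + 1) & (CAPACITY - 1);
    }
    return NULL;
}
```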
Performance and Memory Savings
Thomas Betts: How did these changes impact performance and memory savings?
Madelyn Olson: Our primary goal was to save memory without degrading performance. For the main key-value workload, performance remained flat due to aggressive memory prefetching. However, other workloads saw significant improvements, with up to 30% higher throughput. These savings can delay the need to scale out or evict data, giving users more headroom.
Performance Benchmarks and Scalability
Thomas Betts: What are your performance benchmarks, and how does Valkey scale?
Madelyn Olson: We measure performance in two ways: throughput per core, and vertical scalability for a single key. We can handle approximately 250,000 requests per second per core, and for a single key we can serve 1.2 million requests per second, with improvements in the pipeline to reach 1.4 million. These benchmarks ensure Valkey's efficiency and scalability across various use cases.
Valkey's Open Source Governance
Thomas Betts: What's the current state of Valkey's open-source governance?
Madelyn Olson: Valkey's governance is led by the original six creators, who form the Technical Steering Committee (TSC). While there are plans to expand the TSC, it remains vendor-neutral. We encourage new engineers to get involved and contribute to the project.
Valkey's Unique Use Cases
Thomas Betts: Where have you seen Valkey used in unexpected ways?
Madelyn Olson: Ericsson's use case in telecommunication equipment is fascinating. I've also run Valkey on my Steam Deck for conference demos. Valkey's versatility extends beyond cloud and business applications, powering home automation systems and more.
Valkey's Language Choice: C vs. Rust
Thomas Betts: Valkey is written in C. Are there plans to rewrite it in Rust?
Madelyn Olson: No, we have no plans to rewrite Valkey in Rust. While I'm a Rust enthusiast, porting C code to Rust without significant structural changes doesn't fully leverage Rust's benefits. We might risk performance and memory efficiency, and the gains in dependency management and testing don't outweigh these potential drawbacks.
Valkey's Future: Rust for New Features
Thomas Betts: So, you're advocating for writing new code in Rust but not porting existing C code?
Madelyn Olson: Exactly. We have Rust module support, and some Valkey extensions are written in Rust. For instance, LDAP authentication is elegantly implemented in 300 lines of Rust code. While Rust offers advantages, it's essential to consider the trade-offs. For deep core infrastructure like Valkey, sticking with C is a prudent choice, but new features should be explored in Rust.
Conclusion and Resources
Thomas Betts: Madelyn, thank you for sharing your insights on Valkey's journey and performance enhancements. It's been fascinating to learn about the technical intricacies and the project's future direction.
Madelyn Olson: Thank you, Thomas. I encourage listeners to explore Valkey's blog (http://valkey.io/blog) for more information, especially the new hash table details. Join our Slack community (http://valkey.io/slack) to connect with the team and stay updated on Valkey's development.