<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Database on Yarang's Tech Lair</title><link>https://blog.fcoinfup.com/tags/database/</link><description>Recent content in Database on Yarang's Tech Lair</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Tue, 05 May 2026 09:00:52 +0900</lastBuildDate><atom:link href="https://blog.fcoinfup.com/tags/database/index.xml" rel="self" type="application/rss+xml"/><item><title>The Evolution of Redis Arrays: An Architectural Analysis for Large-Scale Data Processing</title><link>https://blog.fcoinfup.com/post/the-evolution-of-redis-arrays-an-architectural-analysis-for-large-scale-data-processing/</link><pubDate>Tue, 05 May 2026 09:00:52 +0900</pubDate><guid>https://blog.fcoinfup.com/post/the-evolution-of-redis-arrays-an-architectural-analysis-for-large-scale-data-processing/</guid><description>&lt;h1 id="the-evolution-of-redis-arrays-an-architectural-analysis-for-large-scale-data-processing"&gt;The Evolution of Redis Arrays: An Architectural Analysis for Large-Scale Data Processing
&lt;/h1&gt;&lt;p&gt;Hello everyone! I recently came across an interesting article on Hacker News, written by Oran Agra, one of Redis&amp;rsquo;s core developers, titled &lt;strong&gt;&amp;ldquo;Redis array: short story of a long development process.&amp;rdquo;&lt;/strong&gt; This wasn&amp;rsquo;t just a story about adding a new feature; it was a testament to the dedication of developers who tackled 25-year-old legacy code, ensuring performance, maintaining stability, and formatting a massive codebase overnight.&lt;/p&gt;
&lt;p&gt;Today, based on this article, we&amp;rsquo;ll dive deep into how the Array data structure has evolved within Redis and what lessons we can learn for designing large-scale systems.&lt;/p&gt;
&lt;h2 id="1-the-problem-the-shackle-of-25-year-old-legacy-code"&gt;1. The Problem: The Shackle of 25-Year-Old Legacy Code
&lt;/h2&gt;&lt;p&gt;Redis&amp;rsquo;s &lt;code&gt;LIST&lt;/code&gt; data structure internally uses &lt;code&gt;QuickList&lt;/code&gt;: a doubly linked list whose nodes are compact &lt;code&gt;ziplist&lt;/code&gt; blocks (&lt;code&gt;listpack&lt;/code&gt; in recent versions), combining the memory efficiency of &lt;code&gt;ziplist&lt;/code&gt; with the flexibility of a &lt;code&gt;linkedlist&lt;/code&gt;. However, for massive lists containing tens of millions of elements, memory fragmentation and cache misses caused significant performance degradation.&lt;/p&gt;
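&lt;p&gt;Before any structural overhaul, the practical knob for this trade-off is configuration. As a sketch (parameter names as in the stock &lt;code&gt;redis.conf&lt;/code&gt;; older releases call the first one &lt;code&gt;list-max-ziplist-size&lt;/code&gt;, and defaults may vary by version), you can tune how large each &lt;code&gt;QuickList&lt;/code&gt; node grows and how deeply the list is compressed:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;# redis.conf (sketch): cap how many entries each QuickList node may hold.
# Negative values cap by size instead; per the stock config comments,
# -2 means roughly 8 KB per node.
list-max-listpack-size 128

# Keep the head and tail nodes uncompressed for fast push/pop,
# while compressing deeper nodes to save memory.
list-compress-depth 1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Larger nodes improve locality for scans but make single-element edits costlier, which is exactly the tension the rework described below tries to resolve.&lt;/p&gt;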
&lt;p&gt;Specifically, when processing array-type data, the existing structure had the following bottlenecks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Memory Overhead:&lt;/strong&gt; Additional memory usage due to pointer connections.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sequential Access Cost:&lt;/strong&gt; Latency from cache misses, since non-contiguous nodes make poor use of CPU cache lines.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To address this, the development team decided to overhaul the internal structure at the C language level. The biggest challenge here was the &lt;strong&gt;&amp;ldquo;legacy code that had to be changed.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="2-the-solution-formatting-a-25m-line-codebase"&gt;2. The Solution: Formatting a 25M-line Codebase
&lt;/h2&gt;&lt;p&gt;The most impressive part of the article was &lt;strong&gt;&amp;ldquo;Formatting a 25M-line codebase overnight.&amp;rdquo;&lt;/strong&gt; Reformatting and refactoring 25 million lines of code posed more than a technical challenge; it demanded strategy akin to a game of chess.&lt;/p&gt;
&lt;h3 id="21-preparations-for-refactoring"&gt;2.1. Preparations for Refactoring
&lt;/h3&gt;&lt;p&gt;The biggest fear in large-scale refactoring is &lt;strong&gt;&amp;ldquo;regression.&amp;rdquo;&lt;/strong&gt; Modifying the array structure could affect hundreds of Redis commands (like &lt;code&gt;LPUSH&lt;/code&gt;, &lt;code&gt;RPUSH&lt;/code&gt;, &lt;code&gt;LINDEX&lt;/code&gt;, etc.).&lt;/p&gt;
&lt;p&gt;To mitigate this, the team adopted the following approach:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Expand Test Coverage:&lt;/strong&gt; Ensure existing commands pass unit tests.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Strengthen CI/CD Pipeline:&lt;/strong&gt; Implement benchmarking scripts to immediately detect performance degradation upon code changes.&lt;/li&gt;
&lt;/ol&gt;
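&lt;p&gt;The article does not spell out the team&amp;rsquo;s actual CI scripts, but the second point can be sketched as the kind of gate such a pipeline needs: benchmark a baseline and a candidate, and fail the build when the candidate is measurably slower. Everything below (the &lt;code&gt;bench&lt;/code&gt; and &lt;code&gt;assert_no_regression&lt;/code&gt; names, the 1.5&amp;times; tolerance) is illustrative, not Redis&amp;rsquo;s real tooling:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-python" data-lang="python"&gt;import time

def bench(fn, reps=5):
    """Return the best wall-clock time over reps runs (best-of reduces noise)."""
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best

def assert_no_regression(baseline_fn, candidate_fn, tolerance=1.5):
    """Fail the build if the candidate is slower than baseline * tolerance."""
    base = bench(baseline_fn)
    cand = bench(candidate_fn)
    if cand &amp;gt; base * tolerance:
        raise AssertionError(f"regression: {cand:.6f}s candidate vs {base:.6f}s baseline")
    return base, cand

# Dummy workloads standing in for the 'old' and 'new' implementations.
payload = list(range(10_000))
assert_no_regression(lambda: sorted(payload), lambda: sorted(payload))
print("ok")
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A generous tolerance and best-of-N timing keep the gate from flaking on noisy CI machines while still catching real slowdowns.&lt;/p&gt;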
&lt;h3 id="22-the-new-structure-of-redis-arrays"&gt;2.2. The New Structure of Redis Arrays
&lt;/h3&gt;&lt;p&gt;The improved Array structure moved beyond simply allocating memory and was modified to maximize data locality. The core principle was &lt;strong&gt;&amp;ldquo;maximizing the use of contiguous memory blocks while allowing for segmentation and management when necessary.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This yielded the following benefits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Improved CPU Cache Hit Rate:&lt;/strong&gt; Significantly increased L1/L2 cache hit rates due to contiguous memory access.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory Savings:&lt;/strong&gt; Reduced actual data storage space by minimizing unnecessary pointer connections.&lt;/li&gt;
&lt;/ul&gt;
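&lt;p&gt;Redis&amp;rsquo;s change lives in C, but the memory half of this claim is easy to reproduce in miniature. The sketch below (plain Python, nothing Redis-specific) compares a pointer-based container with a contiguous buffer holding the same integers:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-python" data-lang="python"&gt;import sys
from array import array

n = 100_000
values = list(range(n))
packed = array("q", values)  # one contiguous block of signed 64-bit integers

# A Python list stores pointers; every int is a separate heap object on top.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
packed_bytes = sys.getsizeof(packed)

print(f"pointer-based list: {list_bytes} bytes")
print(f"contiguous array:   {packed_bytes} bytes")
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The contiguous buffer is several times smaller, and walking it touches adjacent cache lines instead of chasing pointers across the heap.&lt;/p&gt;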
&lt;h2 id="3-practical-guide-efficient-array-usage-in-redis"&gt;3. Practical Guide: Efficient Array Usage in Redis
&lt;/h2&gt;&lt;p&gt;Now that we&amp;rsquo;ve covered the theoretical background, let&amp;rsquo;s look at how to apply it in practice with code.&lt;/p&gt;
&lt;h3 id="31-problems-with-existing-list-usage"&gt;3.1. Problems with Existing List Usage
&lt;/h3&gt;&lt;p&gt;First, consider the traditional way of adding tens of millions of items to a list. This operates on &lt;code&gt;QuickList&lt;/code&gt;, and as the number of items grows, so does the number of pointer hops between nodes.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Traditional Method (QuickList based)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Add 10,000,000 items (potential for memory and speed degradation)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# NOTE: each iteration also spawns a new redis-cli process; for real mass&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# insertion, pipe the commands into a single redis-cli --pipe invocation&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; i in &lt;span style="color:#f92672"&gt;{&lt;/span&gt;1..10000000&lt;span style="color:#f92672"&gt;}&lt;/span&gt;; &lt;span style="color:#66d9ef"&gt;do&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; redis-cli LPUSH my_huge_list &lt;span style="color:#e6db74"&gt;&amp;#34;item:&lt;/span&gt;$i&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;done&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="32-optimization-using-streams-and-hashes"&gt;3.2. Optimization using Streams and Hashes
&lt;/h3&gt;&lt;p&gt;The internal improvements to Redis Arrays are transparent to users, but at design time we still need to consider &lt;strong&gt;&amp;ldquo;data size&amp;rdquo;&lt;/strong&gt; and &lt;strong&gt;&amp;ldquo;access patterns.&amp;rdquo;&lt;/strong&gt; If all you need is simple sequential storage, simply running the latest version of Redis already delivers the benefit.&lt;/p&gt;
&lt;p&gt;However, if you need to search or modify data within the array, it&amp;rsquo;s advisable to use &lt;code&gt;HASH&lt;/code&gt; instead of &lt;code&gt;LIST&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;import&lt;/span&gt; redis
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;import&lt;/span&gt; time
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;r &lt;span style="color:#f92672"&gt;=&lt;/span&gt; redis&lt;span style="color:#f92672"&gt;.&lt;/span&gt;Redis(host&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;localhost&amp;#39;&lt;/span&gt;, port&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6379&lt;/span&gt;, db&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Scenario: Storing Log Data (Large Scale)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# 1. Using List (for sequential storage)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;def&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;push_to_list&lt;/span&gt;(count):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; start &lt;span style="color:#f92672"&gt;=&lt;/span&gt; time&lt;span style="color:#f92672"&gt;.&lt;/span&gt;time()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pipe &lt;span style="color:#f92672"&gt;=&lt;/span&gt; r&lt;span style="color:#f92672"&gt;.&lt;/span&gt;pipeline() &lt;span style="color:#75715e"&gt;# pipeline both tests for a fair comparison&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; i &lt;span style="color:#f92672"&gt;in&lt;/span&gt; range(count):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pipe&lt;span style="color:#f92672"&gt;.&lt;/span&gt;lpush(&lt;span style="color:#e6db74"&gt;&amp;#34;logs:timeline&amp;#34;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;f&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;log_entry_&lt;/span&gt;&lt;span style="color:#e6db74"&gt;{&lt;/span&gt;i&lt;span style="color:#e6db74"&gt;}&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pipe&lt;span style="color:#f92672"&gt;.&lt;/span&gt;execute()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; print(&lt;span style="color:#e6db74"&gt;f&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;List pushed &lt;/span&gt;&lt;span style="color:#e6db74"&gt;{&lt;/span&gt;count&lt;span style="color:#e6db74"&gt;}&lt;/span&gt;&lt;span style="color:#e6db74"&gt; items in &lt;/span&gt;&lt;span style="color:#e6db74"&gt;{&lt;/span&gt;time&lt;span style="color:#f92672"&gt;.&lt;/span&gt;time() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; start&lt;span style="color:#e6db74"&gt;:&lt;/span&gt;&lt;span style="color:#e6db74"&gt;.4f&lt;/span&gt;&lt;span style="color:#e6db74"&gt;}&lt;/span&gt;&lt;span style="color:#e6db74"&gt;s&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# 2. Using Hash (for search and modification)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;def&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;push_to_hash&lt;/span&gt;(count):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; start &lt;span style="color:#f92672"&gt;=&lt;/span&gt; time&lt;span style="color:#f92672"&gt;.&lt;/span&gt;time()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pipe &lt;span style="color:#f92672"&gt;=&lt;/span&gt; r&lt;span style="color:#f92672"&gt;.&lt;/span&gt;pipeline()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; i &lt;span style="color:#f92672"&gt;in&lt;/span&gt; range(count):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pipe&lt;span style="color:#f92672"&gt;.&lt;/span&gt;hset(&lt;span style="color:#e6db74"&gt;&amp;#34;logs:details&amp;#34;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;f&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;entry_&lt;/span&gt;&lt;span style="color:#e6db74"&gt;{&lt;/span&gt;i&lt;span style="color:#e6db74"&gt;}&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;f&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;log_content_&lt;/span&gt;&lt;span style="color:#e6db74"&gt;{&lt;/span&gt;i&lt;span style="color:#e6db74"&gt;}&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pipe&lt;span style="color:#f92672"&gt;.&lt;/span&gt;execute()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; print(&lt;span style="color:#e6db74"&gt;f&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;Hash pushed &lt;/span&gt;&lt;span style="color:#e6db74"&gt;{&lt;/span&gt;count&lt;span style="color:#e6db74"&gt;}&lt;/span&gt;&lt;span style="color:#e6db74"&gt; items in &lt;/span&gt;&lt;span style="color:#e6db74"&gt;{&lt;/span&gt;time&lt;span style="color:#f92672"&gt;.&lt;/span&gt;time() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; start&lt;span style="color:#e6db74"&gt;:&lt;/span&gt;&lt;span style="color:#e6db74"&gt;.4f&lt;/span&gt;&lt;span style="color:#e6db74"&gt;}&lt;/span&gt;&lt;span style="color:#e6db74"&gt;s&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; __name__ &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;__main__&amp;#34;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;# Test inserting 100,000 data points&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; push_to_list(&lt;span style="color:#ae81ff"&gt;100000&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; push_to_hash(&lt;span style="color:#ae81ff"&gt;100000&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Execution Result Analysis:&lt;/strong&gt;
In recent Redis versions (7.x and above), the internal Array structure is optimized, so bulk &lt;code&gt;LPUSH&lt;/code&gt; is very fast. However, if you frequently need to retrieve data at a specific index, &lt;code&gt;LINDEX&lt;/code&gt; costs O(N) for elements away from either end of the list, which makes the O(1) lookup via &lt;code&gt;HGET&lt;/code&gt; far more advantageous.&lt;/p&gt;
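&lt;p&gt;The complexity gap is easy to see with a toy model. The code below is a hypothetical in-memory stand-in, not Redis source: &lt;code&gt;lindex&lt;/code&gt; must walk the chain node by node, while &lt;code&gt;hget&lt;/code&gt; resolves in a single hash lookup.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-python" data-lang="python"&gt;class Node:
    """One linked-list node: a value plus a pointer to the next node."""
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

def lindex(head, index):
    """O(N): must follow index pointers from the head, like LINDEX."""
    node = head
    for _ in range(index):
        if node is None:
            return None
        node = node.next
    return node.value if node else None

def hget(table, field):
    """O(1) expected: a single hash lookup, like HGET."""
    return table.get(field)

# The same three entries in both shapes (LPUSH order: newest at the head).
head = Node("log_entry_2", Node("log_entry_1", Node("log_entry_0")))
table = {f"entry_{i}": f"log_content_{i}" for i in range(3)}

print(lindex(head, 1))         # log_entry_1 (after walking one link)
print(hget(table, "entry_1"))  # log_content_1 (direct lookup)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With three entries the walk is trivial; with tens of millions, those pointer hops are exactly the O(N) cost the analysis above warns about.&lt;/p&gt;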
&lt;h2 id="4-conclusion-the-harmony-of-development-culture-and-technology"&gt;4. Conclusion: The Harmony of Development Culture and Technology
&lt;/h2&gt;&lt;p&gt;The development process of Redis Arrays offers us important lessons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Performance Isn&amp;rsquo;t Free:&lt;/strong&gt; Improving 25-year-old code requires commensurate refactoring and testing costs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Investment in Tools:&lt;/strong&gt; This work was possible due to automated tools and a CI/CD environment capable of formatting 25 million lines of code.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;When we design systems, we need to go beyond simply asking &amp;ldquo;Is it fast?&amp;rdquo; and consider &amp;ldquo;How can we achieve maintainable performance?&amp;rdquo; As the Redis team demonstrated, sometimes we must not shy away from large-scale improvements that shake the foundations of the architecture.&lt;/p&gt;
&lt;h2 id="5-references"&gt;5. References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://news.ycombinator.com/item?id=41284521" target="_blank" rel="noopener"
 &gt;Formatting a 25M-line codebase overnight (Hacker News)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://redis.io/docs/data-types/lists/" target="_blank" rel="noopener"
 &gt;Redis Documentation: Lists&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Thank you!&lt;/p&gt;
</description></item></channel></rss>