An In-Memory Database (IMDB), also called Main Memory Database (MMDB), is a database whose primary data store is the RAM. This is in contrast with traditional databases which keep their data in disk storage and use RAM as a buffer cache for frequently/recently used data.
IMDB has gained traction in recent times because of increasing speeds, reducing costs and growing size of RAM devices, combined with powerful multi-core processors. Since only RAM is accessed and there is no disk operation for an IMDB query, speeds are extremely high.
However, RAM storage is volatile, so data is lost when the device loses power. To overcome this, IMDBs employ several techniques such as checkpoints, transaction logging, and non-volatile RAM.
SAP HANA is one commercially available IMDB. Open source options include Redis, VoltDB, Memcached, and an extension of SQLite.
What's the context for the growing interest in IMDB?
From 1995 to 2015, RAM got 6000 times cheaper. Prior to that, disk drives were the main option to store databases. Disks were great for sequential access but poor for random access since a lot of time was spent in rotating the disk and seeking the exact location.
Meanwhile, computing power has increased via faster clocks and multi-core processors. Networking speeds have also gone up. However, disk access speeds have not gone up fast enough. In fact, they've become the bottleneck in computing systems. Worse still, the amount of data being generated has grown exponentially. There's also a need to analyse all this data (via Machine Learning) in almost real time.
This is where in-memory databases become suitable. They're now affordable enough to store large amounts of data, particularly compressed columnar data. They're fast enough for real-time analytics. We can continue to use disks where sequential access is desired, such as for logging. The notable database scientist Jim Gray once supposedly said,
Memory is the new disk
What are the key features of IMDB? How do they vary from disk-based RDBMS?
Disks are best suited to sequential access. In disk-based DBs, the seek time to locate records on the physical disk is the biggest contributor to query time. In an IMDB, since the entire data set is in memory, this burden is eliminated.
While RAM is volatile, data persistence is achieved efficiently through multiple methods. These safeguard against data loss due to power failure. Since only a minority of DB operations are data change operations (about 10-15%), disk operations are minimal.
Most IMDB support columnar data storage, where table records are stored as a sequence of columns, in contiguous memory locations. This speeds up analytical querying greatly and minimizes CPU cycles. When data is stored in columnar form, column-wise compression techniques (such as sparse columns) are used to minimise memory footprint. Moreover, there's less dependence on indexing. IMDB delivers performance similar to having an index on every column, but with much less transactional overhead.
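The columnar layout described above can be sketched in a few lines. This is only an illustration, not how any particular IMDB is implemented: the sample rows and field names are hypothetical, and dictionary encoding is used as one common column-wise compression technique, chosen here for simplicity rather than the sparse-column technique mentioned above.

```python
from typing import Any

# Hypothetical sample rows; the field names are made up for illustration.
rows = [
    {"id": 1, "city": "Pune", "amount": 120},
    {"id": 2, "city": "Oslo", "amount": 80},
    {"id": 3, "city": "Pune", "amount": 200},
    {"id": 4, "city": "Pune", "amount": 50},
]


def to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot a row-oriented table into one contiguous list per column."""
    columns: dict[str, list[Any]] = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def dictionary_encode(values: list[Any]) -> tuple[list[Any], list[int]]:
    """Store each distinct value once and replace the column with integer codes."""
    distinct: list[Any] = []
    code_of: dict[Any, int] = {}
    codes: list[int] = []
    for value in values:
        if value not in code_of:
            code_of[value] = len(distinct)
            distinct.append(value)
        codes.append(code_of[value])
    return distinct, codes


columns = to_columns(rows)

# An aggregate scans only the one column it needs, not whole records.
total_amount = sum(columns["amount"])

# A repetitive column shrinks to a short dictionary plus small integer codes.
city_dictionary, city_codes = dictionary_encode(columns["city"])
```

Because the aggregate reads a single list, the scan stays within contiguous values of one type. The same property is what makes column-wise compression effective, since repeated values sit next to each other.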
What are the popular applications of IMDB?
IMDB works well for applications that require very fast data access, storage and manipulation. The most prevalent applications of in-memory databases are real-time embedded systems, music and call databases in mobile phones, telecommunication access networks, programming data in set-top boxes, e-commerce applications, social media sites, and equity financial services. IMDBs have also gained a lot of traction in the data analytics space.
How do the performance characteristics compare between IMDB and traditional DB?
- CPU and Memory - Accessing data in memory is much faster than writing to/reading from file systems. IMDB design is simpler than on-disk databases, so they have significantly lower memory/CPU requirements. Even if a machine hosting an RDBMS has enough memory on board to fit the entire data, IMDB would be faster. That's because it performs fewer copy operations and has more advanced in-memory data structures, optimized for working with memory. However, IMDB scales poorly to multiple CPU cores.
- Data Query and Update functions - Applications requiring random data access under 1ms (such as online/real-time applications) benefit from IMDB. If access time over 100ms is acceptable, traditional RDBMS works fine. For sequential access, the difference is less pronounced, since disks handle sequential reads and writes well. Persistence operations don’t affect data update times in IMDB since they happen offline.
- Size constraints - Generally, not more than 200-300GB RAM is installed on a machine in order to keep machine start-up time acceptable. So when DB size is in TB range supporting millions of transactions per second, memory sharding is done to partition one logical DB into multiple physical DBs.
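The sharding mentioned under size constraints can be sketched as hash-based routing of keys to shards. This is a minimal illustration under stated assumptions: the shard count, the `user:` key format and the helper names `shard_for`, `put` and `get` are all hypothetical, and each shard is a plain dictionary standing in for one physical in-memory database.

```python
import zlib

NUM_SHARDS = 4  # assumed number of physical databases


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard with a stable hash (CRC32), so routing is repeatable."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


# Each dict stands in for one physical in-memory database.
shards = [dict() for _ in range(NUM_SHARDS)]


def put(key: str, value) -> None:
    """Write the value to the single shard that owns the key."""
    shards[shard_for(key)][key] = value


def get(key: str):
    """Read from the single shard that owns the key; no other shard is touched."""
    return shards[shard_for(key)].get(key)


for i in range(1000):
    put(f"user:{i}", {"visits": i})
```

A stable hash is used instead of Python's built-in `hash()` because the latter is randomized per process for strings, which would route the same key to different shards after a restart. Simple modulo routing also means that changing the shard count moves most keys, which real systems typically avoid with other partitioning schemes.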
What are the ways in which persistence is supported in IMDB?
There are many methods by which an IMDB might persist the data state on disk. The end goal is to ensure complete recovery of data without compromising on query speeds.
- Transaction Logs - Each data update is applied to the IMDB and also recorded in a transaction log on disk. Change entries are appended to the end of the append-only log file. When the file size rolls over, its contents are archived.
- Checkpoint Images - Checkpoint files contain an image of the database on disk. Some IMDB use dual checkpoint files for additional safety, in case the system fails while a checkpoint operation is in progress. For recovery, the database checkpoint on disk is merged with the latest transaction log entries.
- High Availability - To protect against memory outages in data centers, the data cluster is replicated asynchronously into a second read-only cluster. If an outage occurs, a hot swap is triggered to configure the secondary as the primary.
- Non-volatile RAM - Using battery powered RAM devices or supercapacitors, all write operations can be persistent even after power loss. These are slightly slower than DRAM, but much faster than RDBMS disk operations.
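The interplay of transaction logs and checkpoint images can be sketched with a toy key-value store. This is a simplified illustration, not the mechanism of any specific product: the class `TinyStore`, the file names and the JSON format are hypothetical, and the sketch writes to the log before updating memory, which is one possible ordering.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key-value store: data lives in a dict, durability comes from disk files."""

    def __init__(self, directory: str):
        self.log_path = os.path.join(directory, "txn.log")
        self.checkpoint_path = os.path.join(directory, "checkpoint.json")
        self.data = {}
        self._recover()
        self.log = open(self.log_path, "a", encoding="utf-8")

    def _recover(self) -> None:
        # Load the last checkpoint image, then replay log entries on top of it.
        if os.path.exists(self.checkpoint_path):
            with open(self.checkpoint_path, encoding="utf-8") as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as f:
                for line in f:
                    entry = json.loads(line)
                    self.data[entry["key"]] = entry["value"]

    def set(self, key: str, value) -> None:
        # Append the change to the log and force it to disk, then apply it in memory.
        self.log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.data[key] = value

    def checkpoint(self) -> None:
        # Write the image to a temporary file and rename it into place, so a
        # failure during the checkpoint leaves the previous image intact.
        tmp_path = self.checkpoint_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.checkpoint_path)
        # The image now covers every logged change, so start a fresh log.
        self.log.close()
        self.log = open(self.log_path, "w", encoding="utf-8")

    def close(self) -> None:
        self.log.close()


with tempfile.TemporaryDirectory() as directory:
    store = TinyStore(directory)
    store.set("a", 1)
    store.checkpoint()
    store.set("b", 2)  # logged after the checkpoint
    store.close()      # the in-memory dict is discarded, as after a power loss

    recovered = TinyStore(directory)  # checkpoint image + log replay
    recovered_data = dict(recovered.data)
    recovered.close()
```

Replaying a log of plain key overwrites is idempotent, so recovery still gives the right state if the process stops after the new image is written but before the log is reset. Reads never touch the disk, and each write costs only one sequential append.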
What are some commercial IMDB products?
Without being exhaustive, we describe three commercial IMDB products:
- SAP HANA - In-memory, column-based data store from SAP. Available as a local appliance or a cloud service. The in-memory data is CPU-aligned, with no expensive calculation of LRU or logical block addresses, just direct (pointer) addressing of data. Supports server scripts in SQLScript, JSON and R formats. Good support for predictive analytics, spatial data processing, text analytics, text search, streaming analytics, graph data processing, and ETL operations. Guarantees microsecond response and extremely high throughput performance.
- Oracle TimesTen - In-memory OLTP RDBMS acquired by Oracle. Guarantees microsecond response and extremely high throughput performance. Provides application level data cache for improved response time. High availability through replication.
- eXtremeDB - Combines on-disk and in-memory data storage in a single embedded database system. It can be deployed as an embedded database system or elastically scalable client/server distributed database.
What are some open source IMDB platforms?
Here are some open source IMDB platforms to consider:
- Apache Ignite - Java-based middleware that forms an in-memory layer over any existing DB. Can work in a single or distributed environment. Seamless integration with MapReduce and Hadoop systems.
- Altibase - Hybrid database that combines an in-memory database and an on-disk database into a single product to achieve the speed of memory and the storage capacity of disk.
- SQLite - Instruct an SQLite database to exist purely in memory using the special filename :memory: instead of the real disk filename.
- Redis - Key-value store based system with support for key data structures. Schema free. Data durability feature is optional. Good programming language support.
- VoltDB - Traditional RDBMS with schema support. Works using Java Stored Procedures that applications can invoke through JDBC. The company is collaborating with Samsung for Scaling In-Memory Data Processing with Samsung DRAM/SSD devices.
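As a quick illustration of the SQLite option in the list above, the following sketch opens a purely in-memory database from Python's standard `sqlite3` module. The table name, columns and sample data are hypothetical.

```python
import sqlite3

# ":memory:" asks SQLite to keep the whole database in RAM; no file is created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (caller TEXT, seconds INTEGER)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [("alice", 30), ("bob", 95), ("alice", 12)],
)
conn.commit()

total_by_caller = conn.execute(
    "SELECT caller, SUM(seconds) FROM calls GROUP BY caller ORDER BY caller"
).fetchall()

# Closing the connection discards the database along with its data.
conn.close()
```

The database exists only for the lifetime of the connection, which illustrates the volatility that checkpoints and transaction logs are meant to address.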
What are the use cases where an IMDB is not suitable?
Volatile memory is affordable and servers with support for 24TB of RAM are now available. But IMDB cannot replace the traditional RDBMS in all scenarios. Following are some of the use cases where IMDB is not suitable:
- Persistence is critical - Applications with confidential/critical data undergoing frequent updates might be at risk in IMDB. Unless persistence features are in the IMDB, there's risk of data loss during power failure.
- Very small scale data - Small and medium enterprises can simply run a traditional database on a low-cost server with acceptable performance.
- Memory-intensive applications - When IMDB is used, the bulk of RAM is going to be occupied by the DB itself. So if the application itself requires high memory (such as 3D games, live streaming), then memory costs will rise significantly.
- Very large scale data applications - Memory requirements would be prohibitively expensive, hence not recommended.
- Non-mission-critical operations - Backend operations or applications where data can be batch processed offline don't require millisecond query response times of IMDB.
- Database Sharding
- Database Compression
- Big Data
- Database Partitioning
- Types of Databases