• In-memory Database Architecture. Source: Kodamasimham 2013.
• Column Store Indexing and Compression. Source: shyamuthaman 2016.
• Performance comparison IMDB Vs RDBMS. Source: Altibase 2019.
• SAP HANA Business Data Platform. Source: HANA 2019.

# In-Memory Database

2 authors have contributed to this article
Last updated by arvindpdmn on 2019-09-23 13:31:36
on 2019-04-03 16:46:41

## Summary

An In-Memory Database (IMDB), also called Main Memory Database (MMDB), is a database whose primary data store is RAM. This is in contrast with traditional databases, which keep their data in disk storage and use RAM as a buffer cache for frequently or recently used data.

IMDBs have gained traction in recent times because RAM devices keep getting faster, cheaper and larger, and are paired with powerful multi-core processors. Since an IMDB query accesses only RAM, with no disk operations, speeds are extremely high.

However, RAM storage is volatile, so data is lost when the device loses power. To overcome this, IMDBs employ several techniques such as checkpoints, transaction logging, and non-volatile RAM.

Among commercially available IMDBs is SAP HANA. Open source options include Redis, VoltDB, Memcached, and SQLite's in-memory mode.

## Milestones

1970

The term "relational database" is invented by E. F. Codd at IBM in 1970.

1978

IBM's IMS/VS FastPath is one of the earliest in-memory engines.

1980

In telecom and defence domains, some companies start using in-memory databases. However, these are built for internal use and are generally not available for purchase.

1992

Oracle version 7 supports data buffers, in which a snapshot of a data block is taken from disk and stored in RAM for faster access.

1997

TimesTen releases the first version of its in-memory database. TimesTen is later acquired by Oracle.

2009

Redis, an in-memory data structure project, makes its first release with a BSD license. It becomes one of the most popular key-value store databases.

2012

Early versions of SAP HANA existed from 2005. In 2012, SAP promotes its commercial version for cloud-based applications. HANA now includes a business suite of applications covering ERP and CRM domains under one umbrella.

2014

Oracle releases its Oracle 12c cloud RDBMS business suite with built-in support for in-memory DB operations.

## Discussion

• What's the context for the growing interest in IMDB?

From 1995 to 2015, RAM got 6000 times cheaper. Prior to that, disk drives were the main option for storing databases. Disks were great for sequential access but poor for random access, since a lot of time was spent rotating the disk and seeking the exact location.

Meanwhile, computing power has increased via faster clocks and multi-core processors. Networking speeds have also gone up. However, disk access speeds have not kept pace; in fact, disks have become the bottleneck in computing systems. Worse still, the amount of data being generated has grown exponentially, and there's a need to analyse all this data (for example, via machine learning) in near real time.

This is where in-memory databases become suitable. They're now affordable enough to store large amounts of data, particularly compressed columnar data. They're fast enough for real-time analytics. We can continue to use disks where sequential access is desired, such as for logging. The notable database scientist Jim Gray once supposedly said,

"Memory is the new disk."

• What are the key features of IMDB? How do they vary from disk-based RDBMS?

Disks favour sequential access. In disk-based DBs, the seek time to locate records on the physical disk is the biggest contributor to query time. In an IMDB, since all data is in memory, this overhead is eliminated entirely.

Although RAM is volatile, data persistence is achieved efficiently through several methods that safeguard against data loss due to power failure. Since only a minority of DB operations (about 10-15%) change data, disk operations are minimal.

Most IMDBs support columnar data storage, in which table records are stored column by column, with each column's values in contiguous memory locations. This greatly speeds up analytical queries and minimizes CPU cycles. When data is stored in columnar form, column-wise compression techniques (such as sparse columns) are used to minimise the memory footprint. Moreover, there's less dependence on indexing: an IMDB delivers performance similar to having an index on every column, but with much less transactional overhead.
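To make the columnar idea concrete, here is a minimal Python sketch, not the layout of any particular IMDB. It stores a small hypothetical table column by column, then applies dictionary encoding and run-length encoding to a low-cardinality column. These two column-wise compression techniques are chosen here purely for illustration. An aggregate over one column touches only that column's storage. The table contents and helper names are assumptions.

```python
from array import array
from itertools import groupby

# Hypothetical sample table, given row by row as (date, country, amount).
rows = [
    ("2019-01-01", "IN", 120),
    ("2019-01-01", "IN", 80),
    ("2019-01-02", "US", 200),
    ("2019-01-02", "US", 50),
    ("2019-01-03", "IN", 70),
]

# Columnar layout: each column is stored as its own sequence.
# The numeric column is packed into a contiguous array of 64-bit integers.
columns = {
    "date": [r[0] for r in rows],
    "country": [r[1] for r in rows],
    "amount": array("q", [r[2] for r in rows]),
}

def dictionary_encode(values):
    """Replace each value with a small integer code; return (dictionary, codes)."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]

def run_length_encode(codes):
    """Collapse runs of identical consecutive codes into (code, run_length) pairs."""
    return [(code, len(list(run))) for code, run in groupby(codes)]

def run_length_decode(runs):
    """Expand (code, run_length) pairs back into the full code list."""
    return [code for code, length in runs for _ in range(length)]

dictionary, codes = dictionary_encode(columns["country"])
runs = run_length_encode(codes)

# An analytical query such as SUM(amount) scans only the 'amount' column.
total = sum(columns["amount"])

# A filter on country compares small integer codes instead of strings.
in_code = dictionary.index("IN")
in_total = sum(
    a for c, a in zip(run_length_decode(runs), columns["amount"]) if c == in_code
)

print(dictionary, runs)  # ['IN', 'US'] [(0, 2), (1, 2), (0, 1)]
print(total, in_total)   # 520 270
```

In a real engine the codes would typically be bit-packed, and some queries can run directly on the compressed runs without decoding them first.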

• What are the popular applications of IMDB?

IMDB works well for applications that require very fast data access, storage and manipulation. The most prevalent applications of in-memory databases include real-time embedded systems, music and call databases in mobile phones, telecommunication access networks, programming data in set-top boxes, e-commerce applications, social media sites, and equity financial services. IMDBs have also gained a lot of traction in the data analytics space.

• How do the performance characteristics compare between IMDB and traditional DB?
• CPU and Memory - Accessing data in memory is much faster than reading from or writing to a file system. IMDB design is simpler than that of on-disk databases, so IMDBs have significantly lower memory and CPU requirements. Even if a machine hosting an RDBMS has enough memory to fit the entire data set, an IMDB would still be faster, because it performs fewer copy operations and uses more advanced data structures optimized for working with memory. However, an IMDB can scale poorly across multiple CPU cores.
• Data Query and Update functions - Applications requiring random data access in under 1ms (such as online or real-time applications) benefit from IMDB. If access times over 100ms are acceptable, a traditional RDBMS works fine. For sequential access, the difference is even more pronounced. Persistence operations don't affect data update times in an IMDB since they happen offline, in the background.
• Size constraints - Generally, no more than 200-300GB of RAM is installed on a machine, in order to keep machine start-up time acceptable. So when a DB is in the TB range and must support millions of transactions per second, memory sharding is used to partition one logical DB into multiple physical DBs.
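The memory sharding mentioned under size constraints can be illustrated with a minimal Python sketch using simple hash partitioning: each key is hashed and mapped to one of N shards, so one logical DB is spread across several physical ones. The class name, the shard count, and the use of in-process dicts as stand-ins for separate IMDB nodes are all hypothetical; this is not any product's sharding scheme.

```python
import hashlib

class ShardedStore:
    """Hypothetical logical DB partitioned across N in-memory shards.

    Each dict stands in for a separate physical IMDB node.
    """

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def shard_for(self, key: str) -> int:
        # Use a stable hash (not Python's per-process randomized hash())
        # so the same key always maps to the same shard.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, value) -> None:
        self.shards[self.shard_for(key)][key] = value

    def get(self, key: str, default=None):
        # A lookup touches exactly one shard.
        return self.shards[self.shard_for(key)].get(key, default)

store = ShardedStore(num_shards=4)
for i in range(1000):
    store.put(f"account:{i}", i * 10)

print(store.get("account:42"))         # 420
print([len(s) for s in store.shards])  # how the 1000 keys spread over 4 shards
```

Plain modulo hashing remaps most keys when the shard count changes, which is why schemes such as consistent hashing or range partitioning are often preferred when shards are added or removed.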
• What are the ways in which persistence is supported in IMDB?

An IMDB can persist its data state to disk in several ways. The end goal is complete recovery of data without compromising query speed.

• Transaction Logs - Each data update is applied to the in-memory data and also recorded in a transaction log on disk. Change entries are appended to the end of the log file. When the file reaches its size limit and rolls over, its contents are archived.
• Checkpoint Images - Checkpoint files contain an image of the database on disk. Some IMDBs use dual checkpoint files for additional safety, in case the system fails while a checkpoint operation is in progress. For recovery, the latest checkpoint on disk is loaded and merged with the transaction log entries written after it.
• High Availability - To protect against memory outages in data centers, the data cluster is replicated asynchronously to a second, read-only cluster. If an outage occurs, a hot swap is triggered to promote the secondary to primary.
• Non-volatile RAM - With battery-powered RAM devices or supercapacitors, write operations persist even after power loss. These are slightly slower than DRAM, but much faster than RDBMS disk operations.
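The interplay of transaction logs and checkpoint images can be sketched in Python as follows. This is a simplified illustration, not any product's on-disk format. The hypothetical `TinyIMDB` keeps a dict in memory and appends each update as a JSON line to an append-only log before applying it. It writes a checkpoint by saving a full snapshot to a temporary file and atomically renaming it, so a crash mid-checkpoint leaves the previous image intact (similar in spirit to dual checkpoint files). It recovers by loading the checkpoint and replaying log entries newer than it. File names, the log format, and the sequence numbers are assumptions. Log rollover/archiving and handling of a torn final log record are omitted for brevity.

```python
import json
import os
import tempfile

class TinyIMDB:
    """Hypothetical key-value IMDB with a transaction log and checkpoints."""

    def __init__(self, directory: str):
        self.log_path = os.path.join(directory, "txn.log")
        self.ckpt_path = os.path.join(directory, "checkpoint.json")
        self.data = {}
        self.seq = 0  # sequence number of the last applied update
        self._recover()
        self._log = open(self.log_path, "a", encoding="utf-8")

    def put(self, key: str, value) -> None:
        self.seq += 1
        entry = {"seq": self.seq, "key": key, "value": value}
        # Append the change to the on-disk log first, then apply it in memory.
        self._log.write(json.dumps(entry) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self.data[key] = value

    def checkpoint(self) -> None:
        # Write the full image to a temp file, then atomically replace the old
        # checkpoint, so a crash mid-checkpoint never corrupts the last good image.
        image = {"seq": self.seq, "data": self.data}
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self.ckpt_path))
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(image, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.ckpt_path)

    def _recover(self) -> None:
        # 1. Load the latest checkpoint image, if any.
        if os.path.exists(self.ckpt_path):
            with open(self.ckpt_path, encoding="utf-8") as f:
                image = json.load(f)
            self.data, self.seq = image["data"], image["seq"]
        # 2. Replay log entries written after that checkpoint.
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as f:
                for line in f:
                    entry = json.loads(line)
                    if entry["seq"] > self.seq:
                        self.data[entry["key"]] = entry["value"]
                        self.seq = entry["seq"]

    def close(self) -> None:
        self._log.close()

with tempfile.TemporaryDirectory() as d:
    db = TinyIMDB(d)
    db.put("a", 1)
    db.checkpoint()
    db.put("b", 2)   # recorded only in the log, not in the checkpoint
    db.close()       # simulate a restart after losing the in-memory state
    restored = TinyIMDB(d)
    recovered = dict(restored.data)
    restored.close()

print(recovered)  # {'a': 1, 'b': 2}
```

Recovery here rebuilds the state from the checkpoint plus the log tail, which is the "merge" step the checkpoint bullet refers to.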
• What are some commercial IMDB products?

Without being exhaustive, we describe three commercial IMDB products:

• SAP HANA - In-memory, column-based data store from SAP. Available as a local appliance or a cloud service. The in-memory data is CPU-aligned: there is no expensive calculation of LRU lists or logical block addresses, just direct (pointer) addressing of data. Supports server scripts in SQLScript, JSON and R formats. Good support for predictive analytics, spatial data processing, text analytics, text search, streaming analytics, graph data processing, and ETL operations. Guarantees microsecond response and extremely high throughput performance.
• Oracle TimesTen - In-memory OLTP RDBMS acquired by Oracle. Guarantees microsecond response and extremely high throughput performance. Provides an application-level data cache for improved response time. High availability through replication.
• eXtremeDB - Combines on-disk and in-memory data storage in a single embedded database system. It can be deployed as an embedded database system or elastically scalable client/server distributed database.
• What are some open source IMDB platforms?

Here are some open source IMDB platforms to consider:

• Apache Ignite – Java-based middleware that forms an in-memory layer over any existing DB. Can work in a single or distributed environment. Seamless integration with MapReduce and Hadoop systems.
• Altibase - Hybrid database that combines an in-memory database and an on-disk database into a single product to achieve the speed of memory and the storage capacity of disk.
• SQLite - An SQLite database can be made to exist purely in memory by opening the special filename :memory: instead of a real disk filename.
• Redis - Key-value store with support for data structures such as strings, lists, sets and hashes. Schema free. Data durability is optional. Good programming language support.
• VoltDB - Traditional RDBMS with schema support. Applications invoke Java stored procedures through JDBC. The company is collaborating with Samsung to scale in-memory data processing using Samsung DRAM/SSD devices.
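As a quick illustration of SQLite's in-memory mode, the sketch below uses Python's standard `sqlite3` module. Each connection to ":memory:" opens a separate, private database that disappears when the connection closes. The `Connection.backup` method (Python 3.7+) is used here to copy it to a file, a simple form of the checkpointing described earlier. The table name, sample rows and snapshot location are hypothetical.

```python
import os
import sqlite3
import tempfile

# ":memory:" is SQLite's special filename for a database that lives only in RAM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (caller TEXT, seconds INTEGER)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [("alice", 30), ("bob", 120), ("alice", 45)],
)
conn.commit()

rows = conn.execute(
    "SELECT caller, SUM(seconds) FROM calls GROUP BY caller ORDER BY caller"
).fetchall()
print(rows)  # [('alice', 75), ('bob', 120)]

# Optional persistence: copy the in-memory DB to a file on disk.
snapshot_path = os.path.join(tempfile.mkdtemp(), "snapshot.db")  # hypothetical location
disk = sqlite3.connect(snapshot_path)
conn.backup(disk)
disk.close()
conn.close()  # the in-memory copy is gone; the snapshot file remains
```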
• What are the use cases where an IMDB is not suitable?

Volatile memory is affordable, and servers supporting 24TB of RAM are now available. Yet IMDB cannot replace the traditional RDBMS in all scenarios. An IMDB is not suitable when:

• Persistence is critical - Applications with confidential or critical data undergoing frequent updates might be at risk in an IMDB. Unless the IMDB has persistence features, there's a risk of data loss during a power failure.
• Very small scale data - Small and medium enterprises can simply run a low-cost server with acceptable performance.
• Memory-intensive applications - When an IMDB is used, the bulk of RAM is occupied by the DB itself. So if the application itself requires a lot of memory (such as 3D games or live streaming), memory costs will rise significantly.
• Very large scale data applications - Memory requirements would be prohibitively expensive, so an IMDB is not recommended.
• Non-mission-critical operations - Backend operations, or applications whose data can be batch processed offline, don't need the millisecond query response times of an IMDB.

## Cite As

Devopedia. 2019. "In-Memory Database." Version 5, September 23. Accessed 2020-09-18. https://devopedia.org/in-memory-database