Many companies now deal with huge amounts of data, and all of them struggle to make efficient use of it. According to EMC, at the end of 2013 there were 4.4 zettabytes of data (2.9 zettabytes generated by consumers and 1.5 zettabytes generated by companies).
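For reference, the units in ascending order are: kilobyte, megabyte, gigabyte, terabyte, petabyte, exabyte, zettabyte, yottabyte. Each unit is 1,000 times larger than the previous one.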
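To get a feeling for these magnitudes, the following minimal sketch converts the EMC figure into terabytes. It assumes decimal (SI) units, where each step is a factor of 1,000; the function names are illustrative only.

```python
# Decimal (SI) storage units: each step is 1,000 times the previous one.
UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def to_bytes(value, unit):
    """Convert a value expressed in the given unit to bytes."""
    exponent = UNITS.index(unit) + 1  # kilobyte is 1000**1, megabyte 1000**2, ...
    return value * 1000 ** exponent


def convert(value, from_unit, to_unit):
    """Convert a value from one unit to another."""
    return to_bytes(value, from_unit) / to_bytes(1, to_unit)


# The 4.4 zettabytes reported for the end of 2013, expressed in terabytes.
total_2013_tb = convert(4.4, "zettabyte", "terabyte")
print(f"{total_2013_tb:,.0f} TB")
```

In other words, 4.4 zettabytes correspond to roughly 4.4 billion terabytes.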
There must be a way to process and analyze this huge amount of data in a short time. This is where SAP HANA, the new in-memory database from SAP, comes in. Unlike other "traditional" databases, HANA loads all data into RAM.
The HANA acronym stands for High-Performance Analytic Appliance, and the product is a combination of hardware and software.
In recent years, two major trends have dominated the hardware world: multi-core CPUs and larger main memory.
Instead of further increasing the clock speed of each core, manufacturers have increased the number of cores per CPU (central processing unit).
From the point of view of program execution, this means that instructions no longer have to be executed sequentially: work can be split across cores and run in parallel to reach the desired performance.
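The idea of splitting work across cores can be sketched as follows. This is an illustrative example, not how HANA itself is implemented: the data is cut into chunks, each chunk is summed by its own worker, and the partial results are combined. Threads are used to keep the sketch portable; in CPython a process pool would be needed to get a real speed-up for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(values, n_chunks):
    """Split a list into at most n_chunks contiguous slices of similar size."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division, at least 1
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum(values, workers=4):
    """Sum each chunk on its own worker, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunked(values, workers)))
    return sum(partial_sums)


# Hypothetical column of one million values.
amounts = list(range(1, 1_000_001))
total = parallel_sum(amounts)
```

The result is the same as a sequential `sum(amounts)`; only the way the work is scheduled changes.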
A server node contains up to 8 CPUs with 10 cores each and 4 TB of RAM, with the possibility of extension (scale-up). For internal tests, nodes have already been combined into systems with up to 100 TB of RAM and 4,000 CPU cores.
In addition to the increase in RAM, hard disk drives have been replaced with solid-state drives to reduce data access time. See Fig. 1.
Thus, reading data from disk is no longer the bottleneck, since the data is already in RAM. The new challenge is the transfer from RAM to the CPU (see Fig. 2).
Such servers are built in partnership with companies like HP, IBM, Fujitsu, Cisco and Dell; therefore, HANA can only run on hardware certified by SAP. When setting up a HANA server, enough RAM should be allocated so that all the data fits. If there is not enough RAM, HANA keeps only the most frequently used tables in memory.
As we mentioned at the beginning of this paper, HANA is a combination of hardware and software; hence, its high performance derives not only from hardware but from software innovation as well. Among these innovations, data layout, compression and partitioning are worth mentioning. In software development projects, developers' work is simplified, since compression and partitioning are fully automated.
In any relational database, the information must be stored in a certain layout, irrespective of whether it resides in RAM (as in SAP HANA) or on HDD/SSD. There are two options: storage by rows or storage by columns. HANA can work with both.
With row storage, all the data of a table row (Table 1) is stored "side by side", which makes reading an entire row easy. Access to a single column is more problematic, since transferring the data from memory to the CPU is less efficient than with column storage.
With column storage, the content of each table column is saved "side by side" in memory. This means that operations performed on columns (SUM, AVG) are executed faster. The disadvantage of this type of storage is that accessing an entire table row is slower.
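The difference between the two layouts can be illustrated with a small sketch. The table, its column names and the helper functions below are hypothetical and only model the memory order of the values; they do not reflect HANA's internal structures.

```python
# A hypothetical table with three columns: id, country, amount.
COLUMNS = ("id", "country", "amount")
rows = [
    (1, "DE", 100),
    (2, "RO", 250),
    (3, "DE", 75),
]

# Row storage: the values of one row are neighbours in memory.
row_store = [value for row in rows for value in row]

# Column storage: the values of one column are neighbours in memory.
column_store = {
    name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)
}


def read_row_from_row_store(index):
    """Reading a row is one contiguous slice in the row layout."""
    width = len(COLUMNS)
    return tuple(row_store[index * width:(index + 1) * width])


def read_row_from_column_store(index):
    """Reading a row touches every column in the column layout."""
    return tuple(column_store[name][index] for name in COLUMNS)


def sum_amount_row_store():
    """Aggregating a column must skip over the other columns' values."""
    return sum(row_store[COLUMNS.index("amount")::len(COLUMNS)])


def sum_amount_column_store():
    """Aggregating a column scans one contiguous list."""
    return sum(column_store["amount"])
```

Both layouts return the same answers; what changes is how many unrelated values sit between the values a given operation needs.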
As can be seen, each storage mode has advantages and disadvantages. When creating tables, developers must choose the storage type, depending on the operations performed on that table.
Data compression has a positive impact on performance because it reduces the volume of data transferred from memory to the CPU. The use of compression reduces the data volume by a factor of 5 to 10.
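The article does not say which compression techniques are applied, so the following is only a sketch of one technique commonly used in column stores, dictionary encoding: each distinct value is stored once, and the column itself holds small integer IDs. The column content and function names are made up for illustration.

```python
def dictionary_encode(column):
    """Replace each value with an integer ID pointing into a sorted dictionary."""
    dictionary = sorted(set(column))
    ids = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [ids[value] for value in column]


def dictionary_decode(dictionary, encoded):
    """Rebuild the original column from the dictionary and the IDs."""
    return [dictionary[i] for i in encoded]


# Hypothetical column with many repeated values.
countries = ["Germany", "Romania", "Germany", "Germany", "Romania", "France"]
dictionary, encoded = dictionary_encode(countries)
```

The fewer distinct values a column has, the more this kind of encoding saves, because long strings are replaced by short integers.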
Partitioning helps when we are dealing with large volumes of data. For example, when we want to delete data, we do not have to look up individual records in the database; we can delete an entire partition instead. There are two types of partitioning: vertical and horizontal.
With vertical partitioning, tables are divided into smaller sections based on columns. For example, columns 1-5 are stored in one partition and columns 6-9 in another.
With horizontal partitioning, tables are divided into smaller sections based on rows. For example, rows 1 to 100,000 are stored in one partition and rows 100,001 to 200,000 in another. SAP HANA uses only horizontal partitioning: the data is distributed across partitions by rows, while within each partition it is still stored in columns.
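A minimal sketch of horizontal partitioning by row range is shown below, using the same ranges as the example above. The partition size, the dictionary of partitions and the function names are assumptions made for illustration, not HANA's actual mechanism.

```python
PARTITION_SIZE = 100_000  # rows per partition, as in the example above


def partition_of(row_number):
    """Return the 0-based partition index for a 1-based row number."""
    return (row_number - 1) // PARTITION_SIZE


def build_partitions(row_numbers):
    """Distribute rows into partitions keyed by partition index."""
    partitions = {}
    for row_number in row_numbers:
        partitions.setdefault(partition_of(row_number), []).append(row_number)
    return partitions


def drop_partition(partitions, index):
    """Delete all rows of one partition at once, without a per-row lookup."""
    return partitions.pop(index, [])


partitions = build_partitions(range(1, 200_001))
removed = drop_partition(partitions, 0)  # removes rows 1 to 100,000
```

Dropping partition 0 removes the first 100,000 rows in a single step and leaves the second partition untouched.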
SAP HANA is a relational database, similar to other databases used by SAP. It combines column storage and row storage, and is optimized for the parallel processing provided by the new hardware technology.
In the next issue of the magazine, we will talk about SAP HANA as a Platform and how this helps reduce implementation time for various projects.