Best practices for Linux tuning

These steps outline the basic Linux tuning and configuration parameters required for best Aerospike stability and performance.

min_free_kbytes

The min_free_kbytes kernel parameter controls how much memory the kernel keeps free rather than filling it with filesystem caches. Normally, the kernel uses almost all free RAM for filesystem caches and releases that memory for allocation by processes as required. Because Aerospike performs large allocations in shared memory (1GB chunks), the default kernel value may result in an unexpected OOM (out-of-memory) kill. It is advisable to set the parameter to at least 1.1GB, preferably 1.25GB if using cloud vendor drivers, as these can also make large allocations. This ensures that Linux always keeps enough memory free and available for large allocations.

note

Setting this value too high will cause the machine to run out of memory (OOM) immediately.

Check the parameter value:

$ cat /proc/sys/vm/min_free_kbytes

If the value is lower than recommended, apply the new value to the running kernel and persist it across reboots:

$ echo 3 > /proc/sys/vm/drop_caches
$ echo 1310720 > /proc/sys/vm/min_free_kbytes
$ echo "vm.min_free_kbytes=1310720" >> /etc/sysctl.conf
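
The value 1310720 is 1.25GB expressed in kilobytes (1280 x 1024). As a minimal sketch, the conversion and a read-only check of the running value could look like the following. The function names mib_to_kib and below_target are illustrative only and are not part of any Aerospike tooling.

```shell
#!/bin/sh
# Convert MiB to KiB, the unit used by vm.min_free_kbytes.
mib_to_kib() {
  echo $(( $1 * 1024 ))
}

# Succeed (exit status 0) when the current value is below the target.
below_target() {
  [ "$1" -lt "$2" ]
}

target_kib=$(mib_to_kib 1280)   # 1280 MiB = 1.25 GiB
param=/proc/sys/vm/min_free_kbytes

# Read-only check: nothing is written to the kernel here.
if [ -r "$param" ]; then
  current_kib=$(cat "$param")
  if below_target "$current_kib" "$target_kib"; then
    echo "min_free_kbytes is $current_kib, below target $target_kib"
  else
    echo "min_free_kbytes is $current_kib, at or above target $target_kib"
  fi
fi
```

Because the script only reads the parameter, it can be run without root privileges before deciding whether to apply the change.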

swappiness

For low-latency operations, any use of swap will drastically degrade performance. It is advisable to disable swap with swapoff -a and remove the swap partition from /etc/fstab.

If that's not possible for operational reasons, at the very least set the swappiness to 0, as per below:

$ echo 0 > /proc/sys/vm/swappiness
$ echo "vm.swappiness=0" >> /etc/sysctl.conf
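
To confirm the result, a small read-only check can report whether any swap device is still active and what the current swappiness is. This is a sketch that assumes the usual /proc/swaps layout (one header line followed by one line per swap device). The helper name count_swap_devices is hypothetical.

```shell
#!/bin/sh
# Count active swap devices listed in a /proc/swaps-style file.
# The first line is a header, so only later non-empty lines are counted.
count_swap_devices() {
  awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$1"
}

if [ -r /proc/swaps ]; then
  echo "active swap devices: $(count_swap_devices /proc/swaps)"
fi

if [ -r /proc/sys/vm/swappiness ]; then
  echo "vm.swappiness: $(cat /proc/sys/vm/swappiness)"
fi
```

A count of 0 indicates that swap is fully disabled on the running system.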

THP - transparent huge pages

To improve overall system responsiveness and allocation speed, the Linux kernel has a feature called Transparent Huge Pages (THP). Unfortunately, for high-throughput, low-latency databases, which perform many small allocations, this can be counterproductive. With THP enabled, the system can run out of RAM, with symptoms similar to a memory leak. Another issue is latency caused by page locking during THP defragmentation.

THP must be disabled before the asd daemon (the Aerospike process) starts. If asd was already running, first apply the configuration below, then restart the operating system.

Create an init.d file:

cat << 'EOF' >/etc/init.d/disable-transparent-hugepages
#!/bin/bash
### BEGIN INIT INFO
# Provides:          disable-transparent-hugepages
# Required-Start:    $local_fs
# Required-Stop:
# X-Start-Before:    aerospike
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Disable Linux transparent huge pages
# Description:       Disable Linux transparent huge pages, to improve
#                    database performance.
### END INIT INFO

case $1 in
  start)
    # Locate the THP control directory (the path differs on some Red Hat kernels).
    if [ -d /sys/kernel/mm/transparent_hugepage ]; then
      thp_path=/sys/kernel/mm/transparent_hugepage
    elif [ -d /sys/kernel/mm/redhat_transparent_hugepage ]; then
      thp_path=/sys/kernel/mm/redhat_transparent_hugepage
    else
      # THP is not available; exit (return is only valid inside a function).
      exit 0
    fi

    echo 'never' > ${thp_path}/enabled
    echo 'never' > ${thp_path}/defrag

    # khugepaged/defrag takes 0/1 on some kernels and yes/no on others.
    re='^[0-1]+$'
    if [[ $(cat ${thp_path}/khugepaged/defrag) =~ $re ]]
    then
      echo 0  > ${thp_path}/khugepaged/defrag
    else
      echo 'no' > ${thp_path}/khugepaged/defrag
    fi

    unset re
    unset thp_path
    ;;
esac
EOF

Make the file executable:

chmod +x /etc/init.d/disable-transparent-hugepages

Enable the script (non-systemd system):

# on debian/ubuntu
update-rc.d disable-transparent-hugepages defaults
# on RHEL/centos
chkconfig --add disable-transparent-hugepages

If using systemd, create a systemd unit file:

cat << 'EOF' > /etc/systemd/system/disable-transparent-huge-pages.service
[Unit]
Description=Disable Transparent Huge Pages

[Service]
Type=oneshot
ExecStart=/bin/bash /etc/init.d/disable-transparent-hugepages start

[Install]
WantedBy=multi-user.target
EOF

Enable the systemd service:

systemctl daemon-reload
systemctl enable disable-transparent-huge-pages.service
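
After the reboot, it can be useful to verify that THP is actually off. The kernel marks the active setting in each control file with square brackets, for example "always madvise [never]". The sketch below extracts that bracketed value; the helper name thp_active_mode is illustrative, and the two directories are the ones checked by the init script above.

```shell
#!/bin/sh
# Print the active (bracketed) mode from a THP control file,
# e.g. "always madvise [never]" prints "never".
thp_active_mode() {
  sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Read-only check of both possible THP locations.
for thp_dir in /sys/kernel/mm/transparent_hugepage \
               /sys/kernel/mm/redhat_transparent_hugepage; do
  for knob in enabled defrag; do
    if [ -r "$thp_dir/$knob" ]; then
      echo "$thp_dir/$knob: $(thp_active_mode "$thp_dir/$knob")"
    fi
  done
done
```

Both enabled and defrag should report never once the service has run.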

NVMe partitioning

NVMe devices are normally capable of 4 simultaneous I/O operations because of their connection design: they occupy 4 PCIe I/O lanes. If using raw devices for Aerospike storage, it is therefore advisable to split each NVMe device into at least 4 partitions. This allows 4 write threads to operate in Aerospike and greatly improves disk throughput. If a single partition is used as an Aerospike raw device, iostat may show 100% disk utilization (%util) while the await statistic shows no queuing (await <1 means no queuing is happening). This indicates that the disk itself can do more, while the PCIe lanes in use are already saturated.

Refer to the Partition Your Flash Devices paragraph for further details on device partitioning.
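
As a rough illustration of an even 4-way split, the sketch below prints (but does not run) parted commands that divide a device into equal partitions by percentage. The device name /dev/nvme0n1 and the function name partition_plan are assumptions for the example; creating a new partition table destroys existing data, so review the printed commands carefully before running any of them.

```shell
#!/bin/sh
# Print (do not execute) parted commands that split a device
# into N equally sized partitions, using percentage offsets.
partition_plan() {
  pp_dev=$1
  pp_count=$2
  echo "parted -s $pp_dev mklabel gpt"
  pp_i=0
  while [ "$pp_i" -lt "$pp_count" ]; do
    pp_start=$(( pp_i * 100 / pp_count ))
    pp_end=$(( (pp_i + 1) * 100 / pp_count ))
    echo "parted -s -a optimal $pp_dev mkpart primary ${pp_start}% ${pp_end}%"
    pp_i=$(( pp_i + 1 ))
  done
}

# Hypothetical device name; substitute the real NVMe device.
partition_plan /dev/nvme0n1 4
```

Printing the plan first keeps the step a dry run, which is a deliberate choice given how destructive repartitioning is.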

vm.max_map_count

If using k8s or Docker, it is advisable to raise the vm.max_map_count parameter. This parameter sets the maximum number of memory map areas a process may have. The default value can be too low and may result in memory allocation issues during normal operation.

To change this parameter:

$ echo "vm.max_map_count=262144" >> /etc/sysctl.conf
$ echo 262144 > /proc/sys/vm/max_map_count

note

You may need to restart the Docker daemon and all its containers after making this change for it to take effect.
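
Appending with >> adds a new line every time the command is repeated, which can leave several conflicting entries for the same key in /etc/sysctl.conf. One possible way to keep the file tidy is sketched below: a hypothetical helper, persist_sysctl, that removes any existing line for the key before appending the new value. The demonstration runs against a scratch file rather than the real configuration.

```shell
#!/bin/sh
# Set key=value in a sysctl-style file, replacing any existing
# entry for the same key instead of appending a duplicate.
persist_sysctl() {
  ps_conf=$1
  ps_key=$2
  ps_value=$3
  ps_tmp=$(mktemp) || return 1
  if [ -f "$ps_conf" ]; then
    # Keep every line whose key (text before "=", spaces removed) differs.
    awk -F= -v k="$ps_key" \
      '{ name = $1; gsub(/[ \t]/, "", name); if (name != k) print }' \
      "$ps_conf" > "$ps_tmp"
  fi
  printf '%s=%s\n' "$ps_key" "$ps_value" >> "$ps_tmp"
  cat "$ps_tmp" > "$ps_conf"
  rm -f "$ps_tmp"
}

# Demonstration on a scratch file: the second call replaces the first entry.
demo_conf=$(mktemp)
persist_sysctl "$demo_conf" vm.max_map_count 65530
persist_sysctl "$demo_conf" vm.max_map_count 262144
cat "$demo_conf"    # prints: vm.max_map_count=262144
rm -f "$demo_conf"
```

To use it for real, the first argument would be /etc/sysctl.conf and the script would need to run as root.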

Containers - Networks

When using k8s or Docker, the default behavior is to use the EXPOSE and PUBLISH features to publish ports from a container through the host to the outside world. This causes the Docker process to listen on a given port on the host and forward all packets to the container itself. This is highly inefficient and, under heavy load, may cause latency, packet drops, and even crashes within the containers.

If using containers, it is advisable to configure them to do one of the following:

  1. Use bridged networking instead of Docker-only NAT.
  2. Use iptables to forward packets to the Aerospike containers on the NAT network instead of relying on the Docker EXPOSE port feature.

Both solutions presented above will result in much better network latencies and a more stable network.

Refer to the docker configuration manuals for further details.
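
For the iptables option, the general shape of the rules is a DNAT rule in the nat table plus a matching FORWARD rule. The sketch below only prints candidate rules so they can be reviewed first. The container address 172.17.0.2, port 3000, and the function name forward_rules are assumptions for illustration; a real deployment may need additional rules depending on the existing firewall policy.

```shell
#!/bin/sh
# Print (do not execute) iptables rules that forward a TCP port
# on the host to the same port on a container address.
forward_rules() {
  fr_ip=$1
  fr_port=$2
  echo "iptables -t nat -A PREROUTING -p tcp --dport $fr_port -j DNAT --to-destination $fr_ip:$fr_port"
  echo "iptables -A FORWARD -p tcp -d $fr_ip --dport $fr_port -j ACCEPT"
}

# Hypothetical container address and port; substitute real values.
forward_rules 172.17.0.2 3000
```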

Max Open File limits

Aerospike clients open connections to the database nodes dynamically, as and when required. This may result in many active connections. On a Linux system, each of these connections holds a file descriptor and is treated as an open file. Aerospike has a configuration parameter, proto-fd-max, that controls the maximum number of allowed client connections. If this is higher than the Linux maximum open files limit for the process, it will cause the Aerospike process to crash.

After installing Aerospike, ensure that the maximum open files limit for the asd process is higher than proto-fd-max, to allow for fabric and heartbeat connections as well as any other open files.

Non-systemd: edit /etc/init.d/aerospike.conf and change the value in the line below.

$ ulimit -n 100000

For systemd, create an override.conf file to control this:

$ mkdir -p /etc/systemd/system/aerospike.service.d
$ cat <<EOF > /etc/systemd/system/aerospike.service.d/override.conf
[Service]
LimitNOFILE=<MAX NUMBER OF FILE DESCRIPTORS>
EOF

Then reload the systemd daemon:

$ systemctl daemon-reload

note

This change will require restarting the Aerospike server for the new value to be applied.

For versions 5.0 and above, you may also apply this change dynamically to the running asd process if prlimit is available:

$ prlimit --pid $(pgrep asd) --nofile=200000
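
To confirm that the limit in effect leaves headroom above proto-fd-max, the soft limit can be read from /proc/<pid>/limits. The following is a sketch only: the helper names soft_nofile and limit_covers are hypothetical, and the proto_fd_max value is a placeholder that should be replaced with the value from your own Aerospike configuration.

```shell
#!/bin/sh
# Print the soft "Max open files" limit from a /proc/<pid>/limits file.
soft_nofile() {
  awk '/^Max open files/ { print $4 }' "$1"
}

# Succeed when the limit is unlimited or strictly greater than proto-fd-max.
limit_covers() {
  [ "$1" = "unlimited" ] || [ "$1" -gt "$2" ]
}

proto_fd_max=15000   # placeholder: use the value from your aerospike.conf
asd_pid=$(pgrep -o -x asd 2>/dev/null || true)

if [ -n "$asd_pid" ] && [ -r "/proc/$asd_pid/limits" ]; then
  soft=$(soft_nofile "/proc/$asd_pid/limits")
  if [ -n "$soft" ] && limit_covers "$soft" "$proto_fd_max"; then
    echo "ok: open file limit $soft exceeds proto-fd-max $proto_fd_max"
  else
    echo "warning: open file limit '$soft' does not exceed proto-fd-max $proto_fd_max"
  fi
else
  echo "asd is not running; nothing to check"
fi
```

The check is read-only, so it is safe to run both before and after applying the prlimit or systemd change.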