
Server test routine (load and benchmarking).

18.09.2023

Written by Hauke Beer

Introduction

Besides providing open-source and energy-efficient cloud technologies, we specialise in the development of customised liquid cooling solutions for servers. We also offer a comprehensive energy efficiency assessment of servers. To verify the effectiveness of our liquid cooling conversions and to provide server manufacturers with a holistic evaluation of their water-cooled solutions, we conduct extensive server testing.

Motivation

Our motivation is to replace conventional air cooling systems with efficient liquid cooling and significantly improve the performance of the servers. To ensure that the converted servers function optimally, we carry out extensive tests before and after the conversion. Our focus is on monitoring and controlling the chip temperatures in a fully loaded state. For this purpose, we use specialised test benches that simulate data centre conditions. In addition to benchmarks to evaluate IT computing performance, we also determine the thermal power dissipated into the water. This aspect is crucial for heat recovery, where the discharged heat can be used for various processes and applications. Our goal is to achieve an outlet temperature of about 60 °C.

Objective

Our main objective in server testing is to bring the converted servers to full load and then run benchmarks to evaluate their IT performance. During testing, we also determine the energy efficiency of the server in the form of the heat capture rate. This value indicates how much of the server's electrical power is dissipated into the cooling fluid. To minimise man-hours, it is very important to automate the entire process. This article covers how we load the servers and which benchmarks we use. We show how we continuously optimise the performance of our liquid cooling solutions and offer our customers customised solutions that ensure the highest efficiency, reliability and performance. In another article, we will discuss ways to automate this testing process.
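The heat capture rate can be illustrated with a short calculation. The following is only a minimal sketch and not our measurement script: it assumes that the thermal power is derived from the coolant flow and the inlet and outlet temperatures (Q = m · c_p · ΔT), that the coolant is water with c_p = 4186 J/(kg·K) and a density of 1 kg/l, and the function names and sample values are hypothetical.

```shell
#!/bin/sh
# Thermal power carried away by the coolant, in watts.
# Assumption (not from the article): Q = m_dot * c_p * dT for water,
# with c_p = 4186 J/(kg*K) and a density of 1 kg/l.
# Arguments: flow in l/min, inlet temperature in °C, outlet temperature in °C.
thermal_power_w() {
    LC_ALL=C awk -v flow="$1" -v t_in="$2" -v t_out="$3" 'BEGIN {
        kg_per_s = flow / 60.0
        printf "%.1f\n", kg_per_s * 4186 * (t_out - t_in)
    }'
}

# Heat capture rate in percent: thermal power into the water divided by
# the electrical power drawn by the server.
# Arguments: thermal power in W, electrical power in W.
heat_capture_rate_pct() {
    LC_ALL=C awk -v q="$1" -v p="$2" 'BEGIN { printf "%.1f\n", 100 * q / p }'
}

# Hypothetical example values:
#   thermal_power_w 0.6 55 60          # prints 209.3
#   heat_capture_rate_pct 209.3 265    # prints 79.0
```

With these made-up numbers, a server drawing 265 W electrically and delivering 209.3 W into the water would have a heat capture rate of about 79%.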

Description of the test routine

Utilisation

A central part of the energy efficiency assessment of a server is to keep it at 100% load for a period of at least 2 hours. Two tools that can be used for this are Stress-NG and Firestarter.

Stress-NG

Stress-NG is an open-source tool, developed primarily by Colin Ian King, for performing various types of stress tests on computer systems. These stress tests are used to test the resilience and stability of hardware components and the performance of operating systems. Its original purpose is to test the limits of a system and reveal potential weaknesses or problems. It creates a variety of artificial stress scenarios to test the system's resource utilisation, responsiveness and robustness. It is used, for example, to test new architectures for hardware problems and to trigger thermal overloads.

Since Stress-NG can be installed on Ubuntu servers directly via the package manager, it has long been our tool of choice for keeping servers busy. It can be installed via the command

sudo snap install stress-ng

Subsequently, 85% of the RAM can be continuously filled with data via the command

stress-ng --vm 180 --vm-bytes 85% --vm-method all --verify -t 120m -v

This command also simultaneously loads the processor on all cores. The time limit here is 120 minutes. The utilisation can be viewed using "htop":

As can be seen, all 96 cores are at 100% load, and 951 GB of the 1.11 TB of RAM are in use. In addition, Stress-NG offers options to load other hardware components in servers. The options "--hdd" and "--hdd-ops" stress the hard disk. Stress-NG also has options for network load: adding "--sock" generates traffic over a TCP/IP socket connection, and UDP traffic can be generated using "--udp".
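To reuse the Stress-NG call on machines with different core counts or test durations, it can be wrapped in a small helper. This is a minimal sketch under our own assumptions: the function name stress_ng_cmd is hypothetical, and it only prints the command line instead of running it, so that it can be checked before a two-hour test is started.

```shell
#!/bin/sh
# Print (do not run) a stress-ng command line like the one used above.
# Arguments: number of vm workers, share of RAM in percent, runtime in
# minutes, followed by optional extra stress-ng options such as
# "--sock 4" or "--hdd 2" (each of these takes a worker count).
stress_ng_cmd() {
    vm_workers="$1"; mem_pct="$2"; minutes="$3"
    shift 3
    printf 'stress-ng --vm %s --vm-bytes %s%% --vm-method all --verify -t %sm -v' \
        "$vm_workers" "$mem_pct" "$minutes"
    for extra in "$@"; do
        printf ' %s' "$extra"
    done
    printf '\n'
}

# Example: reproduce the command from the text.
#   stress_ng_cmd 180 85 120
# Example: same memory load plus four TCP socket workers.
#   stress_ng_cmd 180 85 120 --sock 4
```

The printed line can be reviewed and then pasted into the shell; the worker counts for the extra options are illustrative values only.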

The following figure shows the measured electrical power consumption of a server that is loaded via the above-mentioned command using Stress-NG. It can be seen that the power varies between 220 W and 255 W. The average power consumption is 230 W.
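Minimum, maximum and average power can be derived from a series of logged measurements. The sketch below assumes a hypothetical log format with one power reading in watts per line; how the readings are recorded (power meter, PDU, BMC) is not specified here, and the function name power_stats is our own.

```shell
#!/bin/sh
# Read one power sample (in watts) per line from standard input and
# print the minimum, maximum and arithmetic mean.
power_stats() {
    LC_ALL=C awk '
        NF == 0 { next }                          # skip empty lines
        {
            v = $1 + 0                            # force numeric value
            if (n == 0) { min = v; max = v }      # first sample
            if (v < min) min = v
            if (v > max) max = v
            sum += v
            n++
        }
        END {
            if (n == 0) { print "no samples"; exit 1 }
            printf "min=%.1f max=%.1f mean=%.1f\n", min, max, sum / n
        }'
}

# Example with made-up samples:
#   printf '220\n230\n225\n255\n220\n' | power_stats
#   # prints: min=220.0 max=255.0 mean=230.0
```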

Firestarter

Firestarter was developed at the Dresden University of Technology. The freely available programme was designed to bring the power consumption of standard compute nodes close to its peak value. It can also be used to test cooling infrastructure and system stability, or to determine the maximum power consumption in energy efficiency studies. The open-source tool is easy to use and promises to reliably exceed the power consumption reached by other stress tests such as Stress-NG or LINPACK. Since Firestarter is not yet available via the package manager, the archive must first be downloaded and unpacked:

wget https://tu-dresden.de/zih/forschung/ressourcen/dateien/projekte/firestarter/FIRESTARTER_2.0.tar.gz

tar -xf FIRESTARTER_2.0.tar.gz

This is version 2.0. Whether a newer version is available can be checked on the Firestarter project page. The programme can then be used. The following command loads the machine for 300 seconds (5 minutes):

./FIRESTARTER -t 300

The different "load functions" are particularly interesting. The available load functions can be displayed with the command

./FIRESTARTER -a

and a specific function can then be selected by its ID (here 1 to 22) with

./FIRESTARTER -i <ID>

Depending on the machine and the components installed, different functions are available. Function 21 ("FUNC_ZEN_2_EPYC_FMA_1T") is particularly effective and frequently used. In contrast to Stress-NG, this test does not fill the working memory with data.
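The options -t and -i can be combined, for example to run a specific load function for the two hours required by the test routine. The helper below is a minimal sketch: the function name firestarter_cmd is hypothetical, it only prints the command, and the function ID must be checked with ./FIRESTARTER -a on the target machine, because the numbering depends on the hardware.

```shell
#!/bin/sh
# Print (do not run) a FIRESTARTER command line.
# Arguments: runtime in seconds, optional load function ID as listed
# by "./FIRESTARTER -a" on the target machine.
firestarter_cmd() {
    duration_s="$1"
    func_id="${2:-}"
    if [ -n "$func_id" ]; then
        printf './FIRESTARTER -t %s -i %s\n' "$duration_s" "$func_id"
    else
        printf './FIRESTARTER -t %s\n' "$duration_s"
    fi
}

# Example: two hours (7200 s) with load function 21.
#   firestarter_cmd 7200 21    # prints ./FIRESTARTER -t 7200 -i 21
```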

However, the electrical power measurement shows that Firestarter produces a higher load. As can be seen in Figure 4, the power varies between 255 W and 290 W and averages 265 W. Thus, the average electrical power consumption is about 15% higher than with Stress-NG.
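The figure of about 15% follows directly from the two average values (230 W with Stress-NG, 265 W with Firestarter). As a small worked example, with a helper name of our own choosing:

```shell
#!/bin/sh
# Relative increase of a new value over a baseline, in percent.
# Arguments: baseline value, new value (same unit).
pct_increase() {
    LC_ALL=C awk -v base="$1" -v new="$2" 'BEGIN {
        printf "%.1f\n", 100 * (new - base) / base
    }'
}

# Average power with Stress-NG (230 W) vs. Firestarter (265 W):
#   pct_increase 230 265    # prints 15.2
```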

Furthermore, Firestarter offers additional options, such as oscillating loads, a freely selectable load between 1% and 100%, and an optimisation mode that determines the best load function. This flexibility, and especially the higher electrical load, make Firestarter the preferred choice when testing servers.

Benchmarking

Benchmarking plays a central role in evaluating and, if necessary, improving the functionality of the fluid-based cooling solutions we develop. Furthermore, it enables us to prove that the servers function without restrictions even when operated at high fluid temperatures. To do this, we use the Phoronix Test Suite, a comprehensive open-source tool designed to run performance tests and benchmarks on various computers and operating systems. It offers a variety of test tools and benchmark suites that allow users to measure and compare the performance of their hardware, drivers and software.

To install the Test Suite, gdebi-core is required. The following steps are to be carried out on an Ubuntu server:

sudo apt update
sudo apt install gdebi-core
wget http://phoronix-test-suite.com/releases/repo/pts.debian/files/phoronix-test-suite_10.8.4_all.deb

sudo gdebi phoronix-test-suite_10.8.4_all.deb
sudo reboot

The Test Suite can be operated in different ways: either via the interactive mode, which guides the user through the operation,

phoronix-test-suite interactive

or directly by specifying a test:

phoronix-test-suite benchmark <test-name>

The results are displayed and can also be saved in a PDF file or uploaded to a website anonymously. For our test routine, we have selected benchmarks for central components such as CPU and RAM that are easy to install and provide comparable results.

OpenSSL Multithread CPU-Benchmark

The OpenSSL benchmark measures RSA signing and verification with a key length of 4096 bits and uses all cores. The results are expressed in sign/s and verify/s, with higher values indicating higher system performance. Figure 5 shows the results of two Intel® Xeon® Platinum 8368Q processors at different fluid temperatures.

The graph clearly shows how the performance of the processor decreases as the fluid temperature increases.

RNNoise Singlethread CPU-Benchmark

The RNNoise benchmark performs noise reduction by means of a neural network. The time required to denoise a 26-minute 16-bit RAW audio file is measured. Only one core is utilised. Single-thread benchmarks are useful because some specific applications or tasks are limited to one processor core.

RAMspeed SMP RAM-Benchmarks

The RAMspeed SMP benchmark determines the maximum achievable performance of cache and main memory when copying data blocks consisting of floating-point numbers. The result is given in MB/s; the higher this value, the faster the main memory. In this test, we repeatedly observed that higher RAM temperatures can increase performance by about 1%. However, this increase is not significant.
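The three benchmarks described above can also be started in one non-interactive run using the batch mode of the Phoronix Test Suite. The following sketch makes several assumptions: the helper name pts_batch_cmd is hypothetical, the test profile names pts/openssl, pts/rnnoise and pts/ramspeed are the identifiers we assume correspond to these benchmarks, and the batch mode must have been configured once beforehand with "phoronix-test-suite batch-setup". The helper only prints the command.

```shell
#!/bin/sh
# Print (do not run) a Phoronix Test Suite batch command for the given
# test profiles.
pts_batch_cmd() {
    printf 'phoronix-test-suite batch-benchmark'
    for profile in "$@"; do
        printf ' %s' "$profile"
    done
    printf '\n'
}

# Example (profile names are assumptions, see the text above):
#   pts_batch_cmd pts/openssl pts/rnnoise pts/ramspeed
```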

Conclusion

This article provides a detailed insight into how servers are examined at Cloud and Heat Technologies GmbH. The examinations serve to monitor the quality of our own developments and to offer this know-how to interested companies for the evaluation of their systems. The article shows which tools are necessary; how the process can be automated to minimise human effort is covered in a separate article.

Outlook

In the future, we plan to expand the test routine to include further benchmarks. For AI servers, benchmarking GPU performance is an important part of the evaluation. We have also considered including the SERT suite, but have refrained from doing so because it focuses one-sidedly on CPU performance.
