Resource Managers
This section contains information on the configuration required by the resource managers supported by CEEMS.
SLURM
The SLURM collector in the CEEMS exporter relies on the job accounting information (like CPU time and memory usage) in the cgroups that SLURM creates for each job to estimate the energy and emissions for a given job. However, depending on the cgroups version and SLURM configuration, this accounting information might not be available. The following section provides guidelines on how to configure SLURM to ensure that this accounting information is always available.
Starting from SLURM 22.05, SLURM supports both cgroups v1 and v2. With cgroups v1, the cgroups that SLURM creates for a job might not contain accounting information unless SLURM is configured explicitly to provide it.
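Since the required configuration depends on the cgroups version, it can help to check which version a compute node uses. The following is a minimal sketch, not part of CEEMS: it assumes the cgroup filesystem is mounted at /sys/fs/cgroup and relies on the fact that the cgroups v2 unified hierarchy exposes a cgroup.controllers file at its root, whereas cgroups v1 mounts each controller in its own subdirectory. The function name detect_cgroup_version is hypothetical.

```python
from pathlib import Path


def detect_cgroup_version(root: str = "/sys/fs/cgroup") -> str:
    """Return "v2", "v1", or "unknown" for the cgroup hierarchy under root."""
    base = Path(root)
    # The cgroups v2 unified hierarchy has cgroup.controllers at its root.
    if (base / "cgroup.controllers").is_file():
        return "v2"
    # With cgroups v1, each controller has its own directory, e.g. memory.
    if (base / "memory").is_dir():
        return "v1"
    return "unknown"


if __name__ == "__main__":
    # Assumption: the default mount point is used on this node.
    print(detect_cgroup_version())
```

A node running systemd in hybrid mode is reported as "v1" by this sketch, because its controllers are still mounted in the v1 layout.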
cgroups v1
The following configuration enables the necessary cgroups controllers and provides the accounting information for jobs when cgroups v1 is used.
As stated in the cgroups docs of SLURM, the cgroups plugin is controlled by the configuration in the cgroup.conf file. An example config is also provided there, which serves as a good starting point.
Along with the cgroup.conf file, certain configuration parameters are required in the slurm.conf file. These are also described in the SLURM docs.
Although JobAcctGatherType=jobacct_gather/cgroup is presented as an optional configuration parameter, it must be set to get the accounting information for CPU usage. Without it, the CPU time of the job will not be available in the job's cgroups.
Besides the above configuration, SelectTypeParameters must be configured to set the core or CPU, together with memory, as consumable resources. This is highlighted in the documentation of the ConstrainRAMSpace configuration parameter in the cgroup.conf docs.
In conclusion, here are the necessary configuration excerpts:
# cgroup.conf
ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
# slurm.conf
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup,task/affinity
JobAcctGatherType=jobacct_gather/cgroup
SelectType=select/cons_tres
SelectTypeParameters=CR_CPU_Memory # or CR_Core_Memory
AccountingStorageTRES=gres/gpu # or any other TRES resources declared in your SLURM config
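The slurm.conf excerpt above can be checked mechanically. The following is a minimal sketch, not part of CEEMS or SLURM: it parses the text of a slurm.conf file and reports which of the parameters discussed above are missing or set differently. The function names are hypothetical, the parser handles only simple Key=Value lines, and it does not follow Include directives.

```python
def parse_slurm_conf(text: str) -> dict[str, str]:
    """Parse simple Key=Value lines; keys and values are lower-cased."""
    params = {}
    for raw_line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip().lower()] = value.strip().lower()
    return params


def find_missing_settings(text: str) -> list[str]:
    """Return the settings from this section that the config lacks."""
    params = parse_slurm_conf(text)
    missing = []
    if params.get("proctracktype") != "proctrack/cgroup":
        missing.append("ProctrackType=proctrack/cgroup")
    if "task/cgroup" not in params.get("taskplugin", "").split(","):
        missing.append("TaskPlugin=task/cgroup")
    if params.get("jobacctgathertype") != "jobacct_gather/cgroup":
        missing.append("JobAcctGatherType=jobacct_gather/cgroup")
    select_params = set(params.get("selecttypeparameters", "").split(","))
    if not {"cr_cpu_memory", "cr_core_memory"} & select_params:
        missing.append("SelectTypeParameters=CR_CPU_Memory (or CR_Core_Memory)")
    return missing


if __name__ == "__main__":
    # Illustrative input: JobAcctGatherType is deliberately left out.
    sample = """
    ProctrackType=proctrack/cgroup
    TaskPlugin=task/cgroup,task/affinity
    SelectTypeParameters=CR_CPU_Memory # or CR_Core_Memory
    """
    print(find_missing_settings(sample))
```

An empty result means that all four settings were found. The check says nothing about cgroup.conf, which would need to be inspected separately.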
cgroups v2
For cgroups v2, SLURM should create the proper cgroups for every job without any special configuration. However, the configuration presented for cgroups v1 is applicable to cgroups v2, and it is advised to use that configuration for cgroups v2 as well.
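To confirm that accounting data is actually present for a running job under cgroups v2, the standard interface files can be read directly. The following is a minimal sketch under stated assumptions: it reads usage_usec from cpu.stat and the value in memory.current, both standard cgroups v2 files, from a job cgroup directory passed by the caller. Whether the CEEMS exporter reads exactly these files is an assumption, and the function name read_job_accounting is hypothetical.

```python
from pathlib import Path


def read_job_accounting(cgroup_dir: str) -> dict[str, int]:
    """Read CPU time (microseconds) and memory usage (bytes) of a v2 cgroup."""
    base = Path(cgroup_dir)
    stats = {}
    # cpu.stat is a flat-keyed file with lines such as "usage_usec 1500".
    for line in (base / "cpu.stat").read_text().splitlines():
        key, _, value = line.partition(" ")
        if key == "usage_usec":
            stats["cpu_usage_usec"] = int(value)
    # memory.current holds a single integer: current memory usage in bytes.
    stats["memory_current_bytes"] = int((base / "memory.current").read_text())
    return stats
```

The location of a job's cgroup depends on the SLURM version and the system setup, so the directory has to be looked up on the compute node. A FileNotFoundError for memory.current suggests that the memory controller is not enabled for that cgroup.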
Libvirt
The libvirt collector is meant to be used for OpenStack clusters. There is no special configuration needed, as OpenStack will take care of configuring libvirt and QEMU to enable all relevant cgroup controllers.