Resource Managers
This section contains information on the configuration required by the supported resource managers by CEEMS.
SLURM
Slurm collector in the CEEMS exporter relies on the job accounting information (like CPU time and memory usage) in the cgroups that SLURM create for each job to estimate the energy and emissions for a given job. However, depending on the cgroups version and SLURM configuration, this accounting information might not be available. The following section will give guidelines on how to configure SLURM to ensure that this accounting information is always available.
Starting from SLURM 22.05 SLURM supports both cgroups v1 and v2. When cgroups v1 is being used, SLURM might not contain accounting information in the cgroups.
cgroups v1
The following configuration will enable necessary cgroups controllers and provide the accounting information of the jobs when cgroups v1 is used.
As stated in the cgroups docs of SLURM, cgroups plugin can be controlled by the configuration in this file. An example config is also provided which should be a good starting point.
Along the cgroups.conf
file, certain configuration parameters are required in the
slurm.conf
file as well. This is provided in the
SLURM docs as
well.
Although JobAcctGatherType=jobacct_gather/cgroup
is presented as optional
configuration parameter, it must be used to get the accounting information on CPU
usage. Without this configuration parameter, CPU time of the job will not be available
in the job's cgroups
Besides the above configuration, SelectTypeParameters
must be configured to set core or CPU and memory as consumable resources. This is
highlighted in the documentation of ConstrainRAMSpace
configuration parameter in cgroups.conf
docs.
In conclusion, here are the excerpts of configuration needed:
# cgroups.conf
ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
# slurm.conf
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup,task/affinity
JobAcctGatherType=jobacct_gather/cgroup
SelectType=select/con_tres
SelectTypeParameters=CR_CPU_Memory # or CR_Core_Memory
cgroups v2
In the case of cgroups v2, SLURM should create a proper cgroup for every job without any special configuration. However, the configuration presented for cgroups v1 is applicable for cgroups v2 and it is advised to use that configuration for cgroups v2 as well.
Libvirt
The libvirt collector is meant to be used for Openstack clusters. There is no special configuration needed as Openstack will take care of confirguring libvirt and QEMU to enable all relevant cgroup controllers.