Guide

This section presents a guide for operators to deploy the CEEMS stack. There are two principal components in the CEEMS stack:

  • CEEMS exporter, which runs on the compute nodes and exports metrics of compute units
  • CEEMS API server, which runs on a service node and stores the jobs/VMs data in a standardized DB

Optionally, a third component, CEEMS LB, can be installed on the same service node as the CEEMS API server to enforce access control on metrics.

Prerequisites

Before starting the installation, ensure that the resource managers (SLURM/OpenStack) have the necessary configuration to work with the CEEMS exporter and API server.

Compute nodes

There are no special requirements for the CEEMS exporter to work on compute nodes. Although the exporter has not been extensively tested on different OS distros/architectures, it should work on all the major distros supported by SLURM/OpenStack. The exporter is very light: with all available collectors enabled, it consumes at most around 150 MB of memory and around 0.05 seconds of CPU time per scrape request.

If the compute nodes have NVIDIA GPUs, NVIDIA DCGM and NVIDIA DCGM Exporter must be installed on them. Installation instructions for those packages can be found in their corresponding docs.

Similarly, if the compute nodes have AMD GPUs, AMD SMI Exporter must be installed on the compute nodes to get power consumption and performance metrics of GPUs.

Finally, for SLURM or k8s clusters, if continuous profiling of jobs/pods is required, Grafana Alloy must be installed on the compute nodes.

Service node

Different services must be deployed for CEEMS. They can all be deployed on the same service node or on different nodes. Installing them on the same machine makes the services easier to manage and reduces the attack surface, as all services can be bound to localhost. The list of required services is:

  • Prometheus (compulsory): To scrape metrics from exporters running on compute nodes
  • CEEMS API server (compulsory): To store the jobs/VMs data in a standardized DB
  • Grafana (compulsory): To construct dashboards that expose metrics to operators and end users
  • CEEMS LB (optional): To enforce access control to the Prometheus metrics
  • Pyroscope (optional): When continuous profiling of SLURM jobs/k8s pods is needed
NOTE

The present guide assumes that Prometheus, Pyroscope (if needed) and Grafana are already installed and configured on the service node. Installation instructions for each component can be consulted in their respective documentation and are therefore omitted here.

The CEEMS API server and CEEMS LB require very modest system resources and hence can run alongside Prometheus and Pyroscope on the same service node. The sizing of this service node must take into account the size of the cluster, the number of Prometheus targets, the Prometheus data retention period, etc. As a starting point, 32 GiB of memory and 8 CPUs should be enough to host all the necessary services.

When it comes to storage, Prometheus works best on local disk storage. Thus, depending on the required retention period, local SSD/NVMe disks with RAID for fault tolerance are a good starting point. There are also options like Thanos and Cortex for long term storage and fault tolerance of Prometheus data.

Installation Steps

The installation steps in this section make the following assumptions:

  • There are two sets of compute nodes: one compute node without GPUs (compute-0) and one compute node with NVIDIA GPUs (compute-gpu-0)
  • A single service node (service-0) is used to install all CEEMS related services

For containerized deployments, Podman will be used along with Quadlet to manage container services.

Installing Exporter(s)

Firstly, all the necessary repositories must be added to the local YUM or DEB repositories. If local repositories are not maintained, the package files can be downloaded and installed directly. The following packages and/or repositories must be added:

  • CEEMS Exporter, API Server and Load Balancer RPM and DEB files can be downloaded from GH Releases.
  • When NVIDIA GPUs are present on the cluster, the CUDA repos must be added.

Once all the necessary packages are downloaded and/or added to the repositories, they can be installed on the compute nodes.

On the compute nodes, the following packages must be installed:

RHEL/CentOS/Rockylinux/Alma
  whoami
# root

hostname
# compute-0 or compute-gpu-0 or service-0

dnf install ceems_exporter -y

When nodes have NVIDIA GPUs, we need to install NVIDIA DCGM and NVIDIA DCGM exporter.

NOTE

This guide assumes that NVIDIA driver >= 550 and CUDA >= 12 are available on the compute nodes.

  whoami
# root

hostname
# compute-gpu-0

dnf install datacenter-gpu-manager-4-core datacenter-gpu-manager-4-cuda12 datacenter-gpu-manager-4-devel datacenter-gpu-manager-4-proprietary datacenter-gpu-manager-4-proprietary-cuda12 datacenter-gpu-manager-exporter -y
Debian/Ubuntu
  whoami
# root

hostname
# compute-0

apt-get install ceems_exporter -y

When nodes have NVIDIA GPUs, we need to install NVIDIA DCGM and NVIDIA DCGM exporter.

NOTE

This guide assumes that NVIDIA driver >= 550 and CUDA >= 12 are available on the compute nodes.

  whoami
# root

hostname
# compute-gpu-0

apt-get install datacenter-gpu-manager-4-core datacenter-gpu-manager-4-cuda12 datacenter-gpu-manager-4-devel datacenter-gpu-manager-4-proprietary datacenter-gpu-manager-4-proprietary-cuda12 datacenter-gpu-manager-exporter -y

We also install ceems_exporter on the service node service-0 to export real time and static emission factor data.

Configuring Exporter(s)

CEEMS Exporter

At a minimum, the CEEMS exporter must be configured with the CLI arguments that enable the relevant collectors. This can be done using environment variables provided to the systemd service installed by the package. For instance, to enable the SLURM collector and to disable the exporter's own metrics, the following override must be added to the systemd service:

whoami
# root

hostname
# compute-0 or compute-gpu-0

mkdir -p /etc/systemd/system/ceems_exporter.service.d
cat > /etc/systemd/system/ceems_exporter.service.d/override.conf << EOF
[Service]
Environment=CEEMS_EXPORTER_OPTIONS="--collector.slurm --web.disable-exporter-metrics"
EOF

Similarly, for OpenStack compute nodes, a basic runtime configuration would be as follows:

whoami
# root

hostname
# compute-0 or compute-gpu-0

mkdir -p /etc/systemd/system/ceems_exporter.service.d
cat > /etc/systemd/system/ceems_exporter.service.d/override.conf << EOF
[Service]
Environment=CEEMS_EXPORTER_OPTIONS="--collector.libvirt --web.disable-exporter-metrics"
EOF

Optionally, if emissions must be estimated using real time emission factors, we need to deploy another instance of the CEEMS exporter on the service node, service-0, to pull the emission factors and export them to Prometheus. To enable real time emission factors from Electricity Maps and RTE eCO2 Mix, the CLI options for this exporter must be:

whoami
# root

hostname
# service-0

mkdir -p /etc/systemd/system/ceems_exporter.service.d
cat > /etc/systemd/system/ceems_exporter.service.d/override.conf << EOF
[Service]
Environment=CEEMS_EXPORTER_OPTIONS="--collector.emissions --collector.emissions.provider=rte --collector.emissions.provider=emaps --collector.disable-defaults --web.disable-exporter-metrics"
EOF
WARNING

Operators need to verify the usage policy of the Electricity Maps API before using it in production.

The CEEMS package supports static emission factors from historical data provided by OWID. To estimate emissions using this static factor, there is no need to deploy the above instance of the CEEMS exporter; emissions will be estimated directly using the static factor value for a given country.

More details on runtime configuration of CEEMS exporter can be consulted from the docs.

By default, no authentication is enabled on the CEEMS exporter and it is strongly recommended to add at least basic authentication. This is done using a web configuration file, which is installed by the packages. More details on all the available options for the web configuration can be found in its dedicated section.

There is a utility tool, ceems_tool, distributed with the CEEMS API server package that can be used to generate the web config file. Assuming ceems_tool is available on the current host, the web config file can be generated as follows:

ceems_tool config create-web-config

This command generates a web config file with a basic auth configuration, named web-config.yml, in a config folder in the current directory. The config file only contains the hashed password; the command's output shows the password in plain text. For example, the output of the above command would be:

web config file created at config/web-config.yml
plain text password for basic auth is <PASSWORD_WILL_BE_DISPLAYED_HERE>
store the plain text password securely as you will need it to configure Prometheus

This password must be stored securely, as it is needed when configuring Prometheus. The generated web configuration file must be placed at /etc/ceems_exporter/web-config.yml on the compute nodes.
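For instance, the file can be copied from the host where it was generated to a compute node as follows (a minimal sketch; adapt the source path and transfer method to your environment):

scp config/web-config.yml root@compute-0:/etc/ceems_exporter/web-config.yml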

Finally, the CEEMS exporter must be enabled to start at boot and restarted for the changes to take effect.

whoami
# root

hostname
# compute-0 or compute-gpu-0

systemctl daemon-reload
systemctl enable ceems_exporter.service
systemctl restart ceems_exporter.service

For containerized deployments using Podman Quadlets, sample systemd Quadlet files are provided in the repository. Steps to deploy the Quadlets:

  • Copy the ceems_exporter.network and ceems_exporter.container files to the /etc/containers/systemd folder.
  • Create the /etc/ceems_exporter folder on the host and copy the generated web configuration file to /etc/ceems_exporter/web-config.yml.
  • Modify the Exec directive in the ceems_exporter.container file to add the relevant CLI options.
  • Execute systemctl daemon-reload, which should generate the necessary service files.
  • Finally, launch the service using systemctl start ceems_exporter.service.

DCGM Exporter

The DCGM exporter needs a CSV file that lists all the metrics to be monitored. The datacenter-gpu-manager-exporter package installs a default file at /etc/dcgm-exporter/default-counters.csv which enables the important metrics. Replace the contents of the default-counters.csv file with the one provided in the CEEMS repo, which enables more profiling metrics than the default one.
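Assuming the counters file from the CEEMS repo has been downloaded to the current directory as counters.csv (a hypothetical local file name), the replacement could look like this:

# Keep a backup of the packaged defaults before overwriting them
cp /etc/dcgm-exporter/default-counters.csv /etc/dcgm-exporter/default-counters.csv.bak
cp counters.csv /etc/dcgm-exporter/default-counters.csv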

By default, the DCGM exporter runs without any authentication and it is desirable to run it behind basic auth. The DCGM exporter supports the same web configuration file as the CEEMS exporter, and hence the same web configuration can be used for both exporters. Assuming the web configuration file is installed at /etc/dcgm-exporter/web-config.yml, it can be passed to the DCGM exporter using the environment variable DCGM_EXPORTER_WEB_CONFIG_FILE.

whoami
# root

hostname
# compute-0 or compute-gpu-0

mkdir -p /etc/systemd/system/nvidia-dcgm-exporter.service.d
cat > /etc/systemd/system/nvidia-dcgm-exporter.service.d/override.conf << EOF
[Service]
Environment=DCGM_EXPORTER_WEB_CONFIG_FILE=/etc/dcgm-exporter/web-config.yml
EOF

The final step is to enable and start the DCGM exporter service.

whoami
# root

hostname
# compute-0 or compute-gpu-0

systemctl daemon-reload
systemctl enable nvidia-dcgm-exporter.service
systemctl start nvidia-dcgm-exporter.service
IMPORTANT

To deploy the DCGM exporter as a Podman container, ensure the version of Podman is > 4.3. Ensure that the NVIDIA Container Toolkit is installed before deploying the DCGM exporter container. For Podman, the Container Device Interface (CDI) must be configured; more details can be found in the NVIDIA CDI Docs.

For containerized deployments using Podman Quadlets, sample systemd Quadlet files are provided in the repository. Steps to deploy the Quadlets:

  • Copy the nvidia-dcgm-exporter.container file to the /etc/containers/systemd folder.
  • Create the /etc/dcgm-exporter folder on the host, copy the generated web configuration file to /etc/dcgm-exporter/web-config.yml and the counters.csv file to /etc/dcgm-exporter/default-counters.csv.
  • Execute systemctl daemon-reload, which should generate the necessary service files.
  • Finally, launch the service using systemctl start nvidia-dcgm-exporter.service.

Configuring Prometheus

Assuming Prometheus has already been installed on service-0, the following scrape configuration must be added to Prometheus. Remember that in the current deployment scenario, we have the following targets:

  • 1 compute node without GPUs (compute-0)
  • 1 compute node with NVIDIA GPUs (compute-gpu-0)
  • 1 service node (service-0) where emission factors are fetched and exported

We define three different scrape jobs, cpu-nodes, gpu-nodes and service-nodes, to set up the CEEMS exporter targets. We can either add the DCGM exporter targets to the gpu-nodes job or define a separate scrape job for the DCGM exporter. In the current scenario, we set up the DCGM exporters in the same job.

NOTE

We will need the plain text basic auth password generated for the CEEMS and DCGM exporters in the previous step to configure the Prometheus scrape jobs.

The scrape jobs configuration would be as follows:

# A list of scrape configurations.
scrape_configs:
  - job_name: cpu-nodes
    scheme: http
    metrics_path: /metrics
    basic_auth:
      username: ceems
      password: <BASIC_AUTH_PLAIN_TEXT_PASSWORD>
    static_configs:
      - targets:
          - compute-0:9010

  - job_name: gpu-nodes
    scheme: http
    metrics_path: /metrics
    basic_auth:
      username: ceems
      password: <BASIC_AUTH_PLAIN_TEXT_PASSWORD>
    # This relabel config must be added to all
    # scrape jobs that have DCGM targets
    metric_relabel_configs:
      - source_labels:
          - modelName
          - UUID
        target_label: gpuuuid
        regex: NVIDIA(.*);(.*)
        replacement: $2
        action: replace
      - source_labels:
          - modelName
          - GPU_I_ID
        target_label: gpuiid
        regex: NVIDIA(.*);(.*)
        replacement: $2
        action: replace
      - regex: UUID
        action: labeldrop
      - regex: GPU_I_ID
        action: labeldrop
    static_configs:
      - targets:
          - compute-gpu-0:9010
          - compute-gpu-0:9400

  # This job is needed only when the exporter is deployed
  # on the service node to pull real time emission factors
  # from RTE eCO2 Mix and/or Electricity Maps
  - job_name: service-nodes
    scheme: http
    metrics_path: /metrics
    basic_auth:
      username: ceems
      password: <BASIC_AUTH_PLAIN_TEXT_PASSWORD>
    static_configs:
      - targets:
          - service-0:9010
IMPORTANT

All the Prometheus scrape jobs that have DCGM exporter targets must include a metric_relabel_configs section as follows:

metric_relabel_configs:
  - source_labels:
      - modelName
      - UUID
    target_label: gpuuuid
    regex: NVIDIA(.*);(.*)
    replacement: $2
    action: replace
  - source_labels:
      - modelName
      - GPU_I_ID
    target_label: gpuiid
    regex: NVIDIA(.*);(.*)
    replacement: $2
    action: replace
  - regex: UUID
    action: labeldrop
  - regex: GPU_I_ID
    action: labeldrop

This is only a basic configuration; more options can be found in the Prometheus configuration docs. Once this configuration has been added, reload Prometheus and check that it is able to scrape the targets. This can be verified using the Prometheus web UI.
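A quick way to validate and reload the configuration from the shell is sketched below. It assumes the config lives at /etc/prometheus/prometheus.yml, that promtool is installed alongside Prometheus and that Prometheus was started with --web.enable-lifecycle; add basic auth credentials to the curl calls if Prometheus itself sits behind basic auth.

# Validate the scrape configuration before applying it
promtool check config /etc/prometheus/prometheus.yml

# Ask Prometheus to reload its configuration without a restart
curl -X POST http://localhost:9090/-/reload

# List the health of the active targets
curl -s http://localhost:9090/api/v1/targets | grep -o '"health":"[a-z]*"'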

Once Prometheus is able to scrape targets and ingest metrics, we need to add recording rules to create new derived metrics from the raw metrics exported by the CEEMS and DCGM exporters. The advantage of using recording rules is that Prometheus calculates these metrics once at ingest time, so there is no need to repeat the calculation every time we make queries.

Recording rules can be created with ceems_tool using the following command:

ceems_tool tsdb create-recording-rules --url=http://<PROMETHEUS_BASIC_AUTH_USERNAME>:<PROMETHEUS_BASIC_AUTH_PASSWORD>@service-0:9090 --country-code=FR
IMPORTANT

When the Redfish collector is enabled on the CEEMS exporters and the Redfish server has multiple chassis defined, the above command will ask the user which chassis must be used to estimate power consumption. As different chassis can report the power consumption of different components, operators must choose the chassis that reports the power consumption of the host.

The --url must be the URL at which the Prometheus server is running and --country-code must be the ISO2 country code used to fetch the emission factor. This command generates recording rules files in a folder named rules in the current directory. Copy these rules files to the /etc/prometheus/rules directory and set the following configuration for Prometheus:

rule_files:
  - /etc/prometheus/rules/*.rules

Reload Prometheus and verify that the rules are being evaluated and recorded correctly.
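The rule files can be syntax-checked before reloading, assuming promtool is available on the host:

promtool check rules /etc/prometheus/rules/*.rules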

Installing and Configuring CEEMS API Server

Before moving to this step, ensure that Prometheus is able to scrape all the targets and that the metrics are being ingested correctly.

The CEEMS API server can be installed on the same host as Prometheus or on a different one. In the current example, we use the same host as Prometheus for simplicity. Assuming Prometheus has been installed on the service node service-0, the CEEMS API server can be installed as follows:

RHEL/CentOS/Rockylinux/Alma
  whoami
# root

hostname
# service-0

dnf install ceems_api_server -y
Debian/Ubuntu
  whoami
# root

hostname
# service-0

apt-get install ceems_api_server -y

The CEEMS API server stores all the data related to compute units and hence it is strongly recommended to protect the server using authentication. It supports the same authentication mechanism as the CEEMS and DCGM exporters, as explained in the previous section. ceems_tool can be leveraged to generate a web configuration file. Copy the generated configuration file to /etc/ceems_api_server/web-config.yml.
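For example, assuming ceems_tool writes the file to config/web-config.yml in the current directory as shown earlier:

ceems_tool config create-web-config
cp config/web-config.yml /etc/ceems_api_server/web-config.yml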

Now the CEEMS API server config must be updated. The ceems_api_server package installs a default configuration file at /etc/ceems_api_server/config.yml with sane defaults. More details about the configuration parameters can be consulted in the Configuration Reference. Here we need to add configuration to the clusters and updaters sections of the file.

First, we start with the updaters section. updaters is a list of servers that are used to estimate aggregate metrics of each compute unit and store them in a SQL DB. For example, in the current scenario, the Prometheus server is an updater from which we can estimate the aggregate metrics of compute units. The advantage of using updaters is that we do not need to make expensive repeated queries to Prometheus to get aggregate values of the metrics.

In order to estimate aggregated metrics, we need to configure the updater with the TSDB queries that estimate them. Assuming the recording rules for Prometheus have been added as explained in the Configuring Prometheus section, we can generate the queries needed for the updater using ceems_tool as follows:

ceems_tool tsdb create-ceems-tsdb-updater-queries --url=http://<PROMETHEUS_BASIC_AUTH_USERNAME>:<PROMETHEUS_BASIC_AUTH_PASSWORD>@service-0:9090

The --url must point to the Prometheus URL; the above command prints the queries configuration section to the terminal. Copy this output to the clipboard. Every updater must have a unique identifier; assuming prom-tsdb as the identifier and QUERIES_OUTPUT as the queries returned by the above command, the following configuration must be added to the updaters section in the file /etc/ceems_api_server/config.yml:

updaters:
  - id: prom-tsdb
    updater: tsdb
    web:
      url: http://service-0:9090
      basic_auth:
        username: <PROMETHEUS_BASIC_AUTH_USERNAME>
        password: <PROMETHEUS_BASIC_AUTH_PASSWORD>
    extra_config:
      queries: <QUERIES_OUTPUT>

Finally, we need to configure the clusters section of the configuration file. The clusters section defines the list of clusters from which we fetch compute units data. It can contain multiple clusters of the same kind or of different kinds. Each cluster must be identified by a unique identifier, as in the case of the updater.

We assume that the resource manager in the current scenario is SLURM. In this case, the host where the CEEMS API server is deployed must be configured as a SLURM client so that it can execute the sacct command to get the list of jobs; a quick check is shown after the configuration below. Assuming that has been done, the clusters section in the file /etc/ceems_api_server/config.yml must have the following configuration:

clusters:
  - id: slurm-cluster
    manager: slurm
    # Updater id that we defined in the `updaters` section.
    # Aggregate metrics of jobs will be estimated by querying
    # this Prometheus server.
    updaters:
      - prom-tsdb
    # If the `sacct` command is installed in a non-standard location,
    # set the path here
    cli:
      path: /usr/bin
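As mentioned above, a quick way to verify that this host is a working SLURM client is to run sacct directly (a sketch; option support may vary with the Slurm version):

# Should list recent jobs if SLURM client access is configured correctly
sacct --allusers --starttime now-1hour --format=JobID,User,State | head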

With the above clusters and updaters configurations in place in /etc/ceems_api_server/config.yml, we can enable and start the CEEMS API server:

systemctl enable ceems_api_server.service
systemctl start ceems_api_server.service

Once the API server has started, we can check its health by hitting the endpoint http://localhost:9020/api/v1/health, assuming we are on the host where the API server has been deployed.
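For example, using the basic auth credentials configured for the API server (assuming the username ceems generated earlier):

curl -u ceems:<CEEMS_API_SERVER_BASIC_AUTH_PASSWORD> http://localhost:9020/api/v1/health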

Once Prometheus and the CEEMS API server are up and running, we can configure Grafana to use these servers as datasources for building dashboards.

Configuring Grafana

The final step of the deployment guide is to configure Grafana to use Prometheus and the CEEMS API server as datasources to build dashboards. Assuming the Grafana server is also installed on the same service node service-0, we first need to ensure that the Grafana server is configured to send the user header to datasources. This can be done with the following configuration in the grafana.ini file:

[dataproxy]
send_user_header = true

or by setting the GF_DATAPROXY_SEND_USER_HEADER=true environment variable on the Grafana server.
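For instance, the environment variable can be set through a systemd override, assuming the standard grafana-server unit name:

mkdir -p /etc/systemd/system/grafana-server.service.d
cat > /etc/systemd/system/grafana-server.service.d/override.conf << EOF
[Service]
Environment=GF_DATAPROXY_SEND_USER_HEADER=true
EOF

systemctl daemon-reload
systemctl restart grafana-server.service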

Next, we need to install the Grafana Infinity Datasource plugin using the following command:

grafana-cli plugins install yesoreyeram-infinity-datasource

Once the plugin has been installed, restart Grafana server.

IMPORTANT

Plugin versions of yesoreyeram-infinity-datasource older than 3.x do not support the X-Grafana-User header, which the CEEMS API server needs to identify the current user. Hence, it is recommended to use version >= 3.x.

We use Grafana provisioning to define the datasources. A sample provisioned datasources file is provided in the repository. For the current scenario, the provisioning file would be as follows:

# Configuration file version
apiVersion: 1

# List of datasources that CEEMS uses
datasources:
  # Vanilla Prometheus datasource that DOES NOT IMPOSE ANY ACCESS CONTROL
  - name: prom
    type: prometheus
    access: proxy
    # Replace it with the Prometheus URL
    url: <PROMETHEUS_URL>
    basicAuth: true
    # Replace it with the Prometheus basic auth username
    basicAuthUser: <PROMETHEUS_BASIC_AUTH_USERNAME>
    secureJsonData:
      # Replace it with the Prometheus basic auth password
      basicAuthPassword: <PROMETHEUS_BASIC_AUTH_PASSWORD>

  # CEEMS API server JSON datasource
  - name: ceems-api
    type: yesoreyeram-infinity-datasource
    url: <CEEMS_API_SERVER_URL>
    basicAuth: true
    # Replace it with the CEEMS API server basic auth username
    basicAuthUser: <CEEMS_API_SERVER_BASIC_AUTH_USERNAME>
    jsonData:
      auth_method: basicAuth
      timeout: 120
      # Replace it with the CEEMS API server URL
      allowedHosts:
        - <CEEMS_API_SERVER_URL>
      httpHeaderName1: X-Grafana-User
    secureJsonData:
      # Replace it with the CEEMS API server basic auth password
      basicAuthPassword: <CEEMS_API_SERVER_BASIC_AUTH_PASSWORD>
      # This will be replaced by the username before passing to the API server
      # This feature is available only for yesoreyeram-infinity-datasource >= 3.x
      # IMPORTANT: Need $$ to escape $
      httpHeaderValue1: $${__user.login}

Replace the placeholders with actual values and install the file at /etc/grafana/provisioning/datasources. After restarting Grafana, the newly provisioned datasources should be available.

The next step is to set up dashboards to visualize the metrics of compute units. This can be done using Grafana provisioning as well. A reference set of dashboards is provided in the repository. More details on the dashboards are provided in the README.

Optional Steps

With the above installation steps, a functional CEEMS deployment is in place. However, if access control to Prometheus data must be enforced, an additional component, CEEMS LB, must also be deployed. In a nutshell, this component sits between Grafana and Prometheus and introspects the queries coming from Grafana to verify that the user making a query has view access to the metrics of the compute unit they are querying.

As discussed in the Prerequisites, in order to enable continuous profiling of SLURM jobs or k8s pods, Grafana Alloy must be installed on the compute nodes and Pyroscope must be installed on the service node.

Deploying Grafana Alloy and Pyroscope

Firstly, ensure that the Grafana Alloy and Pyroscope repositories are added and enabled.

We must install the Pyroscope server first so that Grafana Alloy running on the compute nodes can send profiling data to it. We deploy Pyroscope on the service node service-0:

RHEL/CentOS/Rockylinux/Alma
  whoami
# root

hostname
# service-0

dnf install pyroscope -y
Debian/Ubuntu
  whoami
# root

hostname
# service-0

apt-get install pyroscope -y

A basic configuration file is provided in the repository and can be used as a good starting point. It must be installed at /etc/pyroscope/config.yml. More details on Pyroscope configuration can be found in its documentation.

NOTE

It is highly recommended to configure TLS auth for Pyroscope to enforce authentication. If managing TLS certificates is not desired, we recommend using basic auth by exposing Pyroscope behind a reverse proxy like nginx and configuring the nginx server block with basic auth credentials. In the absence of any form of authentication, end users in a typical HPC environment would be able to query the Pyroscope server directly, which is not desired.
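A minimal nginx server block for the basic auth option might look like the sketch below. It assumes Pyroscope listens on localhost:4040, that the credentials file was created with htpasswd and that the front-end port 9041 is free; adjust all three to your environment.

server {
    # Arbitrary front-end port for authenticated access to Pyroscope
    listen 9041;

    location / {
        auth_basic "Pyroscope";
        # Created with e.g.: htpasswd -c /etc/nginx/.htpasswd ceems
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass http://localhost:4040;
        proxy_set_header Host $host;
    }
}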

On the compute nodes, the following packages must be installed:

RHEL/CentOS/Rockylinux/Alma
  whoami
# root

hostname
# compute-0 or compute-gpu-0

dnf install alloy -y
Debian/Ubuntu
  whoami
# root

hostname
# compute-0 or compute-gpu-0

apt-get install alloy -y

A sample configuration file is provided in the repository. The necessary placeholders in the sample config file must be replaced and the file must be installed at /etc/alloy/config.alloy.

We need to enable the Grafana Alloy targets discoverer component of the CEEMS exporter so that it provides Grafana Alloy with the list of targets to profile. This can be done by configuring the CEEMS_EXPORTER_OPTIONS environment variable for the CEEMS exporter service:

whoami
# root

hostname
# compute-0 or compute-gpu-0

mkdir -p /etc/systemd/system/ceems_exporter.service.d
cat > /etc/systemd/system/ceems_exporter.service.d/override.conf << EOF
[Service]
Environment=CEEMS_EXPORTER_OPTIONS="--collector.slurm --collector.alloy-targets --web.disable-exporter-metrics"
EOF

Finally, enable and restart both CEEMS Exporter and Grafana Alloy services:

whoami
# root

hostname
# compute-0 or compute-gpu-0

systemctl daemon-reload
systemctl enable ceems_exporter.service
systemctl restart ceems_exporter.service

systemctl enable alloy.service
systemctl restart alloy.service
TIP

If Grafana Alloy throws any errors, ensure that alloy.service is running as the root user in the systemd service file. Grafana Alloy needs to access a lot of files in the /proc and /sys file systems to continuously profile processes, which is not permitted for non-privileged users.

After this step, Grafana Alloy should be sending profiling data to Pyroscope for every SLURM job on the compute node.

Installing and Configuring CEEMS LB

Before moving to this step, ensure that the CEEMS API server, Prometheus and Grafana are installed, configured and working as expected.

CEEMS LB can be installed on the same host as Prometheus. It is more practical and secure to deploy it on the same node where Prometheus is running. It is a simple proxy/load balancer that does not need a lot of resources.

In the current example, we use the same host as Prometheus for simplicity. Assuming Prometheus has been installed on service node service-0, CEEMS LB can be installed as follows:

RHEL/CentOS/Rockylinux/Alma
  whoami
# root

hostname
# service-0

dnf install ceems_lb -y
Debian/Ubuntu
  whoami
# root

hostname
# service-0

apt-get install ceems_lb -y

Just like the CEEMS exporter and API server, it is strongly recommended to protect the load balancer using authentication. It supports the same authentication mechanism as the other components, as explained in the previous section. ceems_tool can be leveraged to generate a web configuration file. Copy the generated configuration file to /etc/ceems_lb/web-config.yml.

Now the CEEMS LB config must be updated. The ceems_lb package installs a default configuration file at /etc/ceems_lb/config.yml with sane defaults. More details about the configuration parameters can be consulted in the Configuration Reference. The core configuration of CEEMS LB is simple: it takes two keys, strategy and backends. strategy is the load balancing strategy, whereas backends is the list of TSDB (and/or Pyroscope) backends.

The true value CEEMS LB offers is the ability to provide access control to Prometheus query data; deploying CEEMS LB without access control enabled is not very useful and not recommended. To enable access control, an additional section, ceems_api_server, must be provided in the CEEMS LB config. This section must provide either the ceems_api_server.data section or the ceems_api_server.web section. If CEEMS LB is able to access the DB files of the CEEMS API server, it is recommended to set ceems_api_server.data.path so that CEEMS LB makes queries directly to the DB. If the CEEMS API server's DB files are not available to CEEMS LB, it makes HTTP requests to the CEEMS API server to verify the ownership of compute units. Giving CEEMS LB direct access to the DB files is preferred, as it maximizes performance and minimizes latency.

In the current scenario, as both the CEEMS API server and CEEMS LB are deployed on the same physical host, we use the ceems_api_server.data.path method for DB access. The configuration file would be as follows:

ceems_lb:
  # Load balancing strategy
  strategy: round-robin

  # List of Prometheus and/or Pyroscope backends
  backends:
    # `id` should be the same as configured in the `clusters` config
    - id: slurm-cluster
      tsdb:
        - web:
            url: http://<PROMETHEUS_URL>
            basic_auth:
              username: <PROMETHEUS_BASIC_AUTH_USERNAME>
              password: <PROMETHEUS_BASIC_AUTH_PASSWORD>

      # When Pyroscope is also deployed
      pyroscope:
        - web:
            url: <PYROSCOPE_URL>

# Must be the same config as configured for `ceems_api_server`
# at `/etc/ceems_api_server/config.yml`
ceems_api_server:
  data:
    path: /var/lib/ceems

Replace the content of /etc/ceems_lb/config.yml with the above file after replacing the placeholders, then enable and start the CEEMS LB service.

whoami
# root

hostname
# service-0

systemctl enable ceems_lb.service
systemctl start ceems_lb.service

This should ensure that CEEMS LB is running at localhost:9030. When a Pyroscope server has also been deployed and configured in ceems_lb.backends, we will notice another HTTP server running at localhost:9040. The server running at localhost:9030 is the load balancer for Prometheus backends, while the server running at localhost:9040 is the load balancer for Pyroscope backends. This can be confirmed by looking at the logs of ceems_lb:

time=2025-02-13T16:43:51.775Z level=INFO source=frontend.go:220 msg="Starting ceems_lb" backend_type=pyroscope listening=127.0.0.1:9040
time=2025-02-13T16:43:51.775Z level=INFO source=tls_config.go:347 msg="Listening on" backend_type=pyroscope address=127.0.0.1:9040
time=2025-02-13T16:43:51.775Z level=INFO source=tls_config.go:350 msg="TLS is disabled." backend_type=pyroscope http2=false address=127.0.0.1:9040
time=2025-02-13T16:43:51.775Z level=INFO source=helpers.go:55 msg="Starting health checker" backend_type=tsdb
time=2025-02-13T16:43:51.775Z level=INFO source=frontend.go:220 msg="Starting ceems_lb" backend_type=tsdb listening=127.0.0.1:9030
time=2025-02-13T16:43:51.776Z level=INFO source=tls_config.go:347 msg="Listening on" backend_type=tsdb address=127.0.0.1:9030
time=2025-02-13T16:43:51.776Z level=INFO source=tls_config.go:350 msg="TLS is disabled." backend_type=tsdb http2=false address=127.0.0.1:9030
time=2025-02-13T16:43:51.776Z level=INFO source=helpers.go:55 msg="Starting health checker" backend_type=pyroscope
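As a quick smoke test, the TSDB frontend can be queried through the load balancer, sending the same X-Ceems-Cluster-Id header that the Grafana datasources below send (a sketch assuming the basic auth username ceems):

curl -u ceems:<CEEMS_LB_BASIC_AUTH_PASSWORD> \
  -H "X-Ceems-Cluster-Id: slurm-cluster" \
  "http://localhost:9030/api/v1/query?query=up"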

Adding CEEMS LB and Pyroscope Datasources on Grafana

When CEEMS LB and Pyroscope have been deployed, in addition to the datasources configured for Grafana in the section above, we need to add three new datasources: two for CEEMS LB (the Prometheus and Pyroscope backends) and one for vanilla Pyroscope (without any access control). A sample provisioning config file for these datasources is shown below:

# Configuration file version
apiVersion: 1

# List of additional datasources that CEEMS uses
datasources:
  # Vanilla Pyroscope datasource that DOES NOT IMPOSE ANY ACCESS CONTROL
  - name: pyro
    type: pyroscope
    access: proxy
    # Replace it with the Pyroscope URL
    url: <PYROSCOPE_URL>
    # If the Pyroscope server has basic authentication
    # configured, ensure that it has been added here as well

  - name: ceems-lb-tsdb
    # It should be of type Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9030
    basicAuth: true
    basicAuthUser: <CEEMS_LB_BASIC_AUTH_USERNAME>
    jsonData:
      prometheusVersion: 2.51
      prometheusType: Prometheus
      timeInterval: 30s
      incrementalQuerying: true
      cacheLevel: Medium
      # This is CRUCIAL. We need to send this header for CEEMS LB
      # to proxy the request to the correct backend
      httpHeaderName1: X-Ceems-Cluster-Id
    secureJsonData:
      basicAuthPassword: <CEEMS_LB_BASIC_AUTH_PASSWORD>
      # It must be the same `id` configured across CEEMS components
      httpHeaderValue1: slurm-cluster

  - name: ceems-lb-pyro
    # It should be of type Pyroscope
    type: pyroscope
    access: proxy
    url: http://localhost:9040
    basicAuth: true
    basicAuthUser: <CEEMS_LB_BASIC_AUTH_USERNAME>
    jsonData:
      # This is CRUCIAL. We need to send this header for CEEMS LB
      # to proxy the request to the correct backend
      httpHeaderName1: X-Ceems-Cluster-Id
    secureJsonData:
      basicAuthPassword: <CEEMS_LB_BASIC_AUTH_PASSWORD>
      # It must be the same `id` configured across CEEMS components
      httpHeaderValue1: slurm-cluster

After replacing the placeholders, this file must be installed in the /etc/grafana/provisioning/datasources folder and the Grafana server restarted.

Finally, while importing dashboards, the datasources for SLURM Single Job Metrics and for Openstack Single VM Metrics must be configured as ceems-lb-tsdb and ceems-lb-pyro (the latter only for SLURM). This ensures that the queries made by Grafana are intercepted by CEEMS LB, which enforces the access control and then decides whether to proxy the request to the backend or not.

Conclusion

This guide provides an overall view of all the steps needed to configure CEEMS, Prometheus and Grafana. It should only be used as a guide and must be adapted to the needs and constraints of each individual data center. Any suggestions to improve this guide are always welcome; please do not hesitate to open a bug report if any errors are found here.