CEEMS Load Balancer
The CEEMS load balancer supports TSDB and Pyroscope servers. When both TSDB and Pyroscope backend servers are configured, the CEEMS LB will launch two different web servers listening at different ports - one for TSDB and one for Pyroscope.
CEEMS Load Balancer Configuration
The CEEMS Load Balancer configuration has one main section and two optional sections. A basic configuration skeleton is as follows:
# CEEMS Load Balancer configuration skeleton
ceems_lb: <CEEMS LB CONFIG>
# Optional section
ceems_api_server: <CEEMS API SERVER CONFIG>
# Optional section
clusters: <CLUSTERS CONFIG>
The CEEMS LB uses the same configuration sections for ceems_api_server and clusters, so it is possible to merge the config files of the CEEMS API server and CEEMS LB. Each component will read the necessary configuration from the same file.
A valid sample configuration file can be found in the repository.
A sample CEEMS LB config file is shown below:
ceems_lb:
  strategy: round-robin
  backends:
    - id: slurm-0
      tsdb:
        - web:
            url: http://localhost:9090
      pyroscope:
        - web:
            url: http://localhost:4040
    - id: slurm-1
      tsdb:
        - web:
            url: http://localhost:9090
    - id: slurm-2
      pyroscope:
        - web:
            url: http://localhost:4040
strategy: Load balancing strategy. Besides the classical round-robin and least-connection strategies, a custom resource-based strategy is supported. In the resource-based strategy, the query will be proxied to the TSDB instance that has the data based on the time period in the query.
backends: A list of objects describing each TSDB backend.
backends[0].id: The id in the backend must be the same id used in the Clusters Configuration. This is how the CEEMS LB will know which cluster to target.
backends[0].tsdb: A list of TSDB servers that scrape metrics from the cluster identified by id.
backends[0].tsdb.web: Client HTTP configuration of the TSDB.
backends[0].tsdb.filter_labels: A list of labels to filter before sending the response to the client. Useful to filter hypervisor or compute node specific information for OpenStack and k8s clusters.
backends[0].pyroscope: A list of Pyroscope servers that store profiling data from the cluster identified by id.
backends[0].pyroscope.web: Client HTTP configuration of Pyroscope.
The resource-based strategy is only supported for TSDB. When it is used along with Pyroscope, the load balancing strategy for Pyroscope servers defaults to least-connection.
The CEEMS LB is meant to be deployed in the same DMZ as the TSDB servers and hence, it does not support TLS for the backends.
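To make the time-period-based routing described for the strategy option more concrete, here is a minimal Python sketch. It is not the CEEMS implementation: the Backend class, its retention field, and the pick_backend helper are hypothetical names, and the rule of preferring the shortest retention that still covers the query is an assumption made purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Backend:
    url: str
    retention: timedelta  # assumed: how far back this TSDB keeps data


def pick_backend(backends, query_start, now):
    """Pick the backend with the shortest retention that still covers query_start."""
    age = now - query_start
    candidates = [b for b in backends if b.retention >= age]
    if not candidates:
        return None  # no backend holds data that old
    return min(candidates, key=lambda b: b.retention)


# Hypothetical servers: a short-retention TSDB and a long-retention one.
backends = [
    Backend("http://tsdb-hot", timedelta(days=30)),
    Backend("http://tsdb-cold", timedelta(days=365)),
]
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
recent = pick_backend(backends, now - timedelta(days=2), now)
old = pick_backend(backends, now - timedelta(days=90), now)
```

With these assumed retention periods, a query starting two days ago could be served by either server and the sketch prefers tsdb-hot, while a query starting 90 days ago can only be answered by tsdb-cold.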
CEEMS Load Balancer CLI configuration
By default, the CEEMS LB servers listen at ports 9030 and 9040 when both TSDB and Pyroscope backend servers are configured. To use custom ports, the CLI flag --web.listen-address must be repeated to set up ports for the TSDB and Pyroscope backends. For instance, for the sample config shown above, the CLI arguments to launch LB servers at custom ports will be:
ceems_lb --config.file config.yml --web.listen-address ":8000" --web.listen-address ":9000"
This will launch the TSDB load balancer listening at port 8000 and the Pyroscope load balancer listening at port 9000.
When both TSDB and Pyroscope backend servers are configured, the first listen address is attributed to TSDB and the second one to Pyroscope.
Matching backends.id with clusters.id
Using custom header
This is the tricky part of the configuration, which is best explained with an example. Suppose we are running the CEEMS API server with the following configuration:
ceems_api_server:
  data:
    path: /var/lib/ceems
    update_interval: 15m
clusters:
  - id: slurm-0
    manager: slurm
    updaters:
      - tsdb-0
    cli:
      <omitted for brevity>
  - id: slurm-1
    manager: slurm
    updaters:
      - tsdb-1
    cli:
      <omitted for brevity>
updaters:
  - id: tsdb-0
    updater: tsdb
    web:
      url: http://tsdb-0
    extra_config:
      <omitted for brevity>
  - id: tsdb-1
    updater: tsdb
    web:
      url: http://tsdb-1
    extra_config:
      <omitted for brevity>
Here we are monitoring two SLURM clusters: slurm-0 and slurm-1. There are two different TSDB servers, tsdb-0 and tsdb-1, where tsdb-0 scrapes metrics only from slurm-0 and tsdb-1 scrapes metrics only from slurm-1. Assuming tsdb-0 is replicating data onto tsdb-0-replica and tsdb-1 onto tsdb-1-replica, we need to use the following config for ceems_lb:
ceems_lb:
  strategy: round-robin
  backends:
    - id: slurm-0
      tsdb:
        - web:
            url: http://tsdb-0
        - web:
            url: http://tsdb-0-replica
    - id: slurm-1
      tsdb:
        - web:
            url: http://tsdb-1
        - web:
            url: http://tsdb-1-replica
As the metrics data of slurm-0 exists only in tsdb-0 or tsdb-0-replica, we need to set backends.id to slurm-0 for these TSDB backends.
Effectively, we will use the CEEMS LB as a Prometheus datasource in Grafana, and while doing so, we need to target the correct cluster. This is done using a custom header X-Ceems-Cluster-Id. When configuring the datasource in Grafana, we need to add X-Ceems-Cluster-Id to the custom headers section and set the value to the cluster ID. For instance, for the slurm-0 cluster, the provisioned datasource config for Grafana will look as follows:
- name: CEEMS-TSDB-LB
  type: prometheus
  access: proxy
  url: http://localhost:9030
  basicAuth: true
  basicAuthUser: ceems
  jsonData:
    prometheusVersion: 2.51
    prometheusType: Prometheus
    timeInterval: 30s
    incrementalQuerying: true
    cacheLevel: Medium
    httpHeaderName1: X-Ceems-Cluster-Id
  secureJsonData:
    basicAuthPassword: <ceems_lb_basic_auth_password>
    httpHeaderValue1: slurm-0
This assumes the CEEMS LB is running at port 9030 on the same host as Grafana. Similarly, for Pyroscope, the provisioned config must look like:
- name: CEEMS-Pyro-LB
  type: pyroscope
  access: proxy
  url: http://localhost:9040
  basicAuth: true
  basicAuthUser: ceems
  jsonData:
    httpHeaderName1: X-Ceems-Cluster-Id
  secureJsonData:
    basicAuthPassword: <ceems_lb_basic_auth_password>
    httpHeaderValue1: slurm-0
Notice that we set the header and value in jsonData and secureJsonData, respectively. This ensures that the datasource will send the header with every request to the CEEMS LB, and then the LB will redirect the query request to the correct backend. This allows a single instance of the CEEMS LB to load balance across different clusters.
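For clients other than Grafana, the same header can be set by hand. The following Python sketch builds, without sending, a request to the standard Prometheus instant-query endpoint through the LB. The build_query_request helper, the query "up", and the password value are placeholders chosen for illustration; the URL and user name follow the datasource example above.

```python
import base64
import urllib.parse
import urllib.request

CLUSTER_HEADER = "X-Ceems-Cluster-Id"


def build_query_request(lb_url, cluster_id, promql, user, password):
    """Build (but do not send) an instant-query request aimed at one cluster."""
    params = urllib.parse.urlencode({"query": promql})
    req = urllib.request.Request(f"{lb_url}/api/v1/query?{params}")
    # The LB reads this header to choose the backends of the target cluster.
    req.add_header(CLUSTER_HEADER, cluster_id)
    # Basic auth credentials, matching basicAuth in the datasource config.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req


req = build_query_request("http://localhost:9030", "slurm-0", "up", "ceems", "secret")
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```

Sending the request with a different cluster_id is all that is needed to target another cluster through the same LB instance.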
Using query label
If, for any reason, the above strategy does not work for a given deployment, it is also possible to identify target clusters using query labels. However, for this strategy to work, labels must be injected into the Prometheus metrics. For example, in the above case, using static_configs, we can set a custom label as follows:
- job_name: ceems
  static_configs:
    - targets:
        - compute-0:9100
      labels:
        ceems_id: slurm-0
The CEEMS LB will read the value of the ceems_id label and then redirect the query to the appropriate backend.
If both the custom header and the ceems_id label are present in the request to the CEEMS LB, the query label will take precedence.
Similarly, for setting up this label on profiling data in Pyroscope, it is necessary to use the external_labels config parameter for Grafana Alloy when exporting profiles to the Pyroscope server. A sample config for Grafana Alloy that pushes profiling data can be as follows:
pyroscope.write "monitoring" {
  endpoint {
    url = "http://pyroscope:4040"
  }
  external_labels = {
    "ceems_id" = "slurm-0",
  }
}
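The precedence between the ceems_id query label and the X-Ceems-Cluster-Id header described in this section can be sketched as follows. This is only an illustration of the rule, not the LB's actual code: resolve_cluster_id is a hypothetical helper, and the regular expression is a simplification that handles only plain equality matchers rather than parsing the query properly.

```python
import re

# Matches ceems_id="value" but not ceems_id=~"..." or ceems_id!="...".
LABEL_RE = re.compile(r'\bceems_id\s*=\s*"([^"]+)"')


def resolve_cluster_id(query, headers):
    """Return the target cluster ID: the query label wins over the header."""
    match = LABEL_RE.search(query)
    if match:
        return match.group(1)
    # Fall back to the custom header when the query carries no ceems_id label.
    return headers.get("X-Ceems-Cluster-Id")


headers = {"X-Ceems-Cluster-Id": "slurm-0"}
from_label = resolve_cluster_id('up{ceems_id="slurm-1"}', headers)
from_header = resolve_cluster_id("up", headers)
```

Here from_label resolves to slurm-1 even though the header names slurm-0, while from_header falls back to the header value.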
CEEMS API Server Configuration
This is an optional configuration that, when provided, will enforce access control for the backend TSDBs. A sample config file is given below:
ceems_api_server:
  web:
    url: http://localhost:9020
web.url: Address at which the CEEMS API server is running. The CEEMS LB will make a request to the CEEMS API server to verify the ownership of the compute unit before proxying the request to TSDB. All the possible configuration parameters for web can be found in the Web Client Configuration Reference.
If both the CEEMS API server and CEEMS LB have access to the CEEMS data path, it is also possible to use ceems_api_server.db.path to query the DB directly instead of making an API request. This avoids an HTTP round trip for every verification and therefore has much lower latency.
Clusters Configuration
The same configuration as discussed in the CEEMS API Server's Cluster Configuration can be provided as an optional configuration to verify the backends configuration. This is not mandatory and, if not provided, the CEEMS LB will verify the backend ids by making an API request to the CEEMS API server.
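The consistency check between backend ids and cluster ids can be sketched in a few lines of Python. This is an illustration only, not CEEMS code: unknown_backend_ids is a hypothetical helper, and the configuration is assumed to be already parsed into plain dictionaries that mirror the clusters and ceems_lb sections.

```python
def unknown_backend_ids(config):
    """Return backend ids that do not correspond to any configured cluster id."""
    cluster_ids = {cluster["id"] for cluster in config.get("clusters", [])}
    backends = config.get("ceems_lb", {}).get("backends", [])
    return [b["id"] for b in backends if b["id"] not in cluster_ids]


# Hypothetical parsed config with one mismatched backend id (os-1).
config = {
    "clusters": [{"id": "slurm-0"}, {"id": "os-0"}],
    "ceems_lb": {"backends": [{"id": "slurm-0"}, {"id": "os-1"}]},
}
mismatched = unknown_backend_ids(config)
```

An empty result means every backend maps to a known cluster; any id that is returned points to a backend the LB would not be able to associate with a cluster.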
Example configuration files
As is clear from the above sections, there is a lot of common configuration between the CEEMS API server and CEEMS LB. Thus, when possible, it is advised to merge the two configurations into one file.
Taking one of the examples in the CEEMS API server section, we can add the CEEMS LB config as follows:
ceems_api_server:
  data:
    path: /var/lib/ceems
    update_interval: 15m
  admin:
    users:
      - adm1
  web:
    url: http://localhost:9020
clusters:
  - id: slurm-0
    manager: slurm
    updaters:
      - tsdb-0
    cli:
      <omitted for brevity>
  - id: os-0
    manager: openstack
    updaters:
      - tsdb-1
    web:
      <omitted for brevity>
updaters:
  - id: tsdb-0
    updater: tsdb
    web:
      url: http://tsdb-0
    extra_config:
      <omitted for brevity>
  - id: tsdb-1
    updater: tsdb
    web:
      url: http://tsdb-1
    extra_config:
      <omitted for brevity>
ceems_lb:
  strategy: round-robin
  backends:
    - id: slurm-0
      tsdb:
        - web:
            url: http://tsdb-0
        - web:
            url: http://tsdb-0-replica
    - id: os-0
      tsdb:
        - web:
            url: http://tsdb-1
        - web:
            url: http://tsdb-1-replica
This config assumes tsdb-0 is replicating data to tsdb-0-replica, tsdb-1 to tsdb-1-replica, and the CEEMS API server is running on port 9020 on the same host as the CEEMS LB.