CEEMS Load Balancer
CEEMS load balancer (LB) can load balance requests across TSDB and Pyroscope servers. When both TSDB and Pyroscope backend servers are configured, CEEMS LB launches two web servers listening on two different ports, one for TSDB and one for Pyroscope.
CEEMS Load Balancer Configuration
The CEEMS Load Balancer configuration has one main section and two optional sections. A basic skeleton of the configuration is as follows:
# CEEMS Load Balancer configuration skeleton
ceems_lb: <CEEMS LB CONFIG>
# Optional section
ceems_api_server: <CEEMS API SERVER CONFIG>
# Optional section
clusters: <CLUSTERS CONFIG>
CEEMS LB uses the same ceems_api_server and clusters configuration sections as CEEMS API server, so the config files of CEEMS API server and CEEMS LB can be merged. Each component will read the configuration it needs from the same file.
A valid sample configuration file can be found in the repo.
A sample CEEMS LB config file is shown below:
ceems_lb:
  strategy: resource-based
  backends:
    - id: slurm-0
      tsdb_urls:
        - http://localhost:9090
      pyroscope_urls:
        - http://localhost:4040
    - id: slurm-1
      tsdb_urls:
        - http://localhost:9090
    - id: slurm-2
      pyroscope_urls:
        - http://localhost:4040
strategy: Load balancing strategy. Besides the classical round-robin and least-connection strategies, a custom resource-based strategy is supported. In the resource-based strategy, the query is proxied to the TSDB instance that has the data for the time period in the query.
backends: A list of objects describing each backend.
backends.id: The id of the backend must be the same id used in the Clusters Configuration. This is how CEEMS LB knows which cluster to target.
backends.tsdb_urls: A list of TSDB servers that scrape metrics from the cluster identified by id.
backends.pyroscope_urls: A list of Pyroscope servers that store profiling data from the cluster identified by id.
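To make the resource-based idea concrete, here is a minimal Python sketch of choosing TSDB instances by the time period of a query. It assumes that the retention period of each instance is known up front. The retention mapping, the function name, and the tsdb-hot/tsdb-cold URLs are hypothetical; they are not CEEMS LB configuration keys and this is not CEEMS LB's actual implementation.

```python
from datetime import datetime, timedelta, timezone


def backends_covering(retentions, query_start, now):
    """Return the URLs whose retention window still contains query_start.

    retentions maps a TSDB URL to a timedelta (an assumed, hypothetical
    input; it is not a CEEMS LB config key).
    """
    return [
        url
        for url, retention in retentions.items()
        # The oldest sample an instance can hold is at (now - retention).
        if now - retention <= query_start
    ]


# Hypothetical instances: a short-retention and a long-retention TSDB.
retentions = {
    "http://tsdb-hot": timedelta(days=7),
    "http://tsdb-cold": timedelta(days=365),
}
now = datetime(2024, 1, 31, tzinfo=timezone.utc)

# A query starting 30 days ago can only be served by the long-retention instance.
candidates = backends_covering(retentions, now - timedelta(days=30), now)
```

When several instances cover the requested period, any of them could serve the query; how CEEMS LB picks among them is not covered by this sketch.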
The resource-based strategy is only supported for TSDB. When it is used along with Pyroscope, the load balancing strategy for Pyroscope servers defaults to least-connection.
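The fallback rule for Pyroscope can be written down as a small Python sketch. The function name and the "tsdb"/"pyroscope" type strings are illustrative assumptions; only the strategy names come from the configuration reference above.

```python
def effective_strategy(configured_strategy, backend_type):
    """Return the strategy that applies to a given backend type.

    backend_type is "tsdb" or "pyroscope" (illustrative names).
    """
    # resource-based is only supported for TSDB, so Pyroscope falls back
    # to least-connection; every other combination is left unchanged.
    if configured_strategy == "resource-based" and backend_type == "pyroscope":
        return "least-connection"
    return configured_strategy
```

For example, with strategy set to resource-based, TSDB queries keep the resource-based strategy while Pyroscope queries are balanced with least-connection.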
CEEMS LB is meant to be deployed in the same DMZ as the TSDB servers and hence, it does not support TLS for the backends.
CEEMS Load Balancer CLI configuration
By default, CEEMS LB servers listen on ports 9030 and 9040 when both TSDB and Pyroscope backend servers are configured. To use custom ports, the CLI flag --web.listen-address must be repeated to set the ports for the TSDB and Pyroscope load balancers. For instance, for the sample config shown above, the CLI arguments to launch the LB servers on custom ports will be:
ceems_lb --config.file config.yml --web.listen-address ":8000" --web.listen-address ":9000"
This will launch the TSDB load balancer listening on port 8000 and the Pyroscope load balancer listening on port 9000.
When both TSDB and Pyroscope backend servers are configured, the first listen address is assigned to TSDB and the second one to Pyroscope.
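The ordering rule can be illustrated with a short Python sketch. It only covers the case where both TSDB and Pyroscope backends are configured; the function name and the dictionary keys are hypothetical and do not reflect how CEEMS LB parses its flags internally.

```python
def assign_listen_addresses(addresses):
    """Map repeated --web.listen-address values to load balancer types.

    Assumes both TSDB and Pyroscope backends are configured, so exactly
    two addresses are expected, in the order they appear on the CLI.
    """
    if len(addresses) != 2:
        raise ValueError("expected two listen addresses: TSDB first, Pyroscope second")
    tsdb_address, pyroscope_address = addresses
    return {"tsdb": tsdb_address, "pyroscope": pyroscope_address}


# Mirrors the CLI example above: ":8000" first, ":9000" second.
assignment = assign_listen_addresses([":8000", ":9000"])
```

Swapping the order of the two flags would therefore swap the ports of the two load balancers.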
Matching backends.id with clusters.id
Using custom header
This is the tricky part of the configuration, which is best explained with an example. Consider that we are running CEEMS API server with the following configuration:
ceems_api_server:
  data:
    path: /var/lib/ceems
    update_interval: 15m
clusters:
  - id: slurm-0
    manager: slurm
    updaters:
      - tsdb-0
    cli:
      <omitted for brevity>
  - id: slurm-1
    manager: slurm
    updaters:
      - tsdb-1
    cli:
      <omitted for brevity>
updaters:
  - id: tsdb-0
    updater: tsdb
    web:
      url: http://tsdb-0
    extra_config:
      <omitted for brevity>
  - id: tsdb-1
    updater: tsdb
    web:
      url: http://tsdb-1
    extra_config:
      <omitted for brevity>
Here we are monitoring two SLURM clusters: slurm-0 and slurm-1. There are two different TSDB servers, tsdb-0 and tsdb-1, where tsdb-0 scrapes metrics from slurm-0 and tsdb-1 scrapes metrics only from slurm-1. Assuming tsdb-0 is replicating data onto tsdb-0-replica and tsdb-1 onto tsdb-1-replica, we need to use the following config for ceems_lb:
ceems_lb:
  strategy: resource-based
  backends:
    - id: slurm-0
      tsdb_urls:
        - http://tsdb-0
        - http://tsdb-0-replica
    - id: slurm-1
      tsdb_urls:
        - http://tsdb-1
        - http://tsdb-1-replica
As the metrics data of slurm-0 exists only in tsdb-0 or tsdb-0-replica, we need to set backends.id to slurm-0 for these TSDB backends.
Effectively, we will use CEEMS LB as a Prometheus datasource in Grafana and, while doing so, we need to target the correct cluster. This is done using a custom header, X-Ceems-Cluster-Id. When configuring the datasource in Grafana, we need to add X-Ceems-Cluster-Id to the custom headers section and set its value to the cluster ID. For instance, for the slurm-0 cluster, the provisioned datasource config for Grafana will look as follows:
- name: CEEMS-TSDB-LB
  type: prometheus
  access: proxy
  url: http://localhost:9030
  basicAuth: true
  basicAuthUser: ceems
  jsonData:
    prometheusVersion: 2.51
    prometheusType: Prometheus
    timeInterval: 30s
    incrementalQuerying: true
    cacheLevel: Medium
    httpHeaderName1: X-Ceems-Cluster-Id
  secureJsonData:
    basicAuthPassword: <ceems_lb_basic_auth_password>
    httpHeaderValue1: slurm-0
assuming CEEMS LB is running at port 9030 on the same host as Grafana. Similarly, for Pyroscope the provisioned config must look like:
- name: CEEMS-Pyro-LB
  type: pyroscope
  access: proxy
  url: http://localhost:9040
  basicAuth: true
  basicAuthUser: ceems
  jsonData:
    httpHeaderName1: X-Ceems-Cluster-Id
  secureJsonData:
    basicAuthPassword: <ceems_lb_basic_auth_password>
    httpHeaderValue1: slurm-0
Notice that we set the header and its value in jsonData and secureJsonData, respectively. This ensures that the datasource sends the header with every request to CEEMS LB, which then redirects the query to the correct backend. This allows a single instance of CEEMS LB to load balance across different clusters.
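Outside Grafana, the same header can be attached by any HTTP client. The following Python sketch builds (but does not send) such a request using only the standard library. It assumes that CEEMS LB exposes the usual Prometheus query path /api/v1/query and is protected by basic auth as in the datasource config above; the function name, the example query, and the password are placeholders.

```python
import base64
import urllib.parse
import urllib.request


def build_query_request(lb_url, cluster_id, promql, user, password):
    """Build a Prometheus instant-query request that targets one cluster."""
    params = urllib.parse.urlencode({"query": promql})
    request = urllib.request.Request(f"{lb_url}/api/v1/query?{params}")
    # The custom header tells CEEMS LB which cluster's backends to use.
    request.add_header("X-Ceems-Cluster-Id", cluster_id)
    # Basic auth credentials, matching basicAuthUser/basicAuthPassword.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder credentials; replace with the real CEEMS LB basic auth values.
request = build_query_request("http://localhost:9030", "slurm-0", "up", "ceems", "secret")
```

Passing the returned object to urllib.request.urlopen would send the query to the LB, which would then pick a backend configured under the slurm-0 id.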
Using query label
If, for any reason, the above strategy does not work for a given deployment, it is also possible to identify target clusters using query labels. However, for this strategy to work, a label must be injected into the Prometheus metrics. For example, in the above case, we can set a custom label using static_configs as follows:
- job_name: ceems
  static_configs:
    - targets:
        - compute-0:9100
      labels:
        ceems_id: slurm-0
CEEMS LB will read the value of the ceems_id label and then redirect the query to the appropriate backend.
If both the custom header and the ceems_id label are present in the request to CEEMS LB, the query label takes precedence.
Similarly, to set this label on profiling data in Pyroscope, it is necessary to use the external_labels config parameter of Grafana Alloy when exporting profiles to the Pyroscope server. A sample config for Grafana Alloy that pushes profiling data can be as follows:
pyroscope.write "monitoring" {
  endpoint {
    url = "http://pyroscope:4040"
  }
  external_labels = {
    "ceems_id" = "slurm-0",
  }
}
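The precedence between the custom header and the ceems_id query label can be sketched in Python as follows. This is an illustration only: the regular expression is a simplification that handles plain equality matchers such as ceems_id="slurm-0", and the function name is hypothetical. How CEEMS LB actually parses queries is not described here.

```python
import re

# Simplified matcher for an equality selector like ceems_id="slurm-0".
CEEMS_ID_MATCHER = re.compile(r'\bceems_id\s*=\s*"([^"]+)"')


def resolve_cluster_id(header_value, query):
    """Pick the target cluster ID for a request.

    header_value is the X-Ceems-Cluster-Id header (or None if absent);
    query is the query string sent by the client.
    """
    match = CEEMS_ID_MATCHER.search(query)
    if match:
        # The query label takes precedence over the custom header.
        return match.group(1)
    return header_value
```

For instance, a request carrying the header value slurm-0 together with the query up{ceems_id="slurm-1"} would be resolved to slurm-1.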
CEEMS API Server Configuration
This is an optional config which, when provided, enforces access control for the backend TSDBs. A sample config file is given below:
ceems_api_server:
  web:
    url: http://localhost:9020
web.url: Address at which CEEMS API server is running. CEEMS LB will make a request to CEEMS API server to verify the ownership of the compute unit before proxying the request to TSDB. All the possible configuration parameters for web can be found in Web Client Configuration Reference.
If both CEEMS API server and CEEMS LB have access to the CEEMS data path, it is also possible to use ceems_api_server.db.path to query the DB directly instead of making an API request. This will have much lower latency and higher performance.
Clusters Configuration
The same configuration as discussed in CEEMS API Server's Cluster Configuration can be provided as an optional configuration to verify the backends configuration. This is not mandatory and, if it is not provided, CEEMS LB will verify the backend ids by making an API request to CEEMS API server.
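The check that this verification implies, namely that every backends.id has a matching clusters.id, can be sketched in Python on an already parsed config. The function name is hypothetical and the config is given as a plain dictionary to keep the example free of third-party YAML libraries; CEEMS LB performs its own verification and this sketch only illustrates the rule.

```python
def unmatched_backend_ids(config):
    """Return backend ids that have no cluster with the same id."""
    cluster_ids = {cluster["id"] for cluster in config.get("clusters", [])}
    backends = config.get("ceems_lb", {}).get("backends", [])
    return [backend["id"] for backend in backends if backend["id"] not in cluster_ids]


# Parsed form of a merged config; slurm-2 has no matching cluster on purpose.
config = {
    "clusters": [{"id": "slurm-0"}, {"id": "slurm-1"}],
    "ceems_lb": {
        "backends": [
            {"id": "slurm-0", "tsdb_urls": ["http://tsdb-0"]},
            {"id": "slurm-2", "tsdb_urls": ["http://tsdb-1"]},
        ]
    },
}
missing = unmatched_backend_ids(config)
```

A non-empty result points at backends that CEEMS LB would not be able to associate with any cluster.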
Example configuration files
As is clear from the above sections, there is a lot of common configuration between CEEMS API server and CEEMS LB. Thus, whenever possible, it is advised to merge the two configurations into one file.
Taking one of the examples from the CEEMS API server section, we can add the CEEMS LB config as follows:
ceems_api_server:
  data:
    path: /var/lib/ceems
    update_interval: 15m
  admin:
    users:
      - adm1
  web:
    url: http://localhost:9020
    requests_limit: 30
clusters:
  - id: slurm-0
    manager: slurm
    updaters:
      - tsdb-0
    cli:
      <omitted for brevity>
  - id: os-0
    manager: openstack
    updaters:
      - tsdb-1
    web:
      <omitted for brevity>
updaters:
  - id: tsdb-0
    updater: tsdb
    web:
      url: http://tsdb-0
    extra_config:
      <omitted for brevity>
  - id: tsdb-1
    updater: tsdb
    web:
      url: http://tsdb-1
    extra_config:
      <omitted for brevity>
ceems_lb:
  strategy: resource-based
  backends:
    - id: slurm-0
      tsdb_urls:
        - http://tsdb-0
        - http://tsdb-0-replica
    - id: os-0
      tsdb_urls:
        - http://tsdb-1
        - http://tsdb-1-replica
This config assumes that tsdb-0 is replicating data to tsdb-0-replica, tsdb-1 to tsdb-1-replica, and that CEEMS API server is running on port 9020 on the same host as CEEMS LB.