HPC Platforms
This document outlines reference deployments of CEEMS alongside Prometheus and Grafana on typical HPC platforms.
Using Thanos
Key Features
- Thanos is used for data replication and long-term storage of TSDB data
- Litestream is used to replicate and create snapshots of the CEEMS API server SQLite database
- MinIO is used to store TSDB data on the parallel file systems typical of HPC platforms, such as Lustre, Spectrum Scale, or BeeGFS
- CEEMS load balancer is used to enforce access control on TSDB backends
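
As an illustration of how Thanos can be pointed at MinIO, the fragment below is a minimal sketch of a Thanos object storage configuration using the S3-compatible client. The bucket name, endpoint, and credentials are placeholders assumed for this example and are not taken from the reference deployment.

```yaml
# Hypothetical objstore.yml, passed to Thanos components via --objstore.config-file.
# All values below are placeholders; adapt them to the actual MinIO deployment.
type: S3
config:
  bucket: "thanos-tsdb"                 # assumed bucket name
  endpoint: "minio.example.org:9000"    # assumed MinIO host and port
  access_key: "CHANGE_ME"
  secret_key: "CHANGE_ME"
  insecure: true                        # plain HTTP; set to false when MinIO serves TLS
```

In a setup like this, MinIO's data directory would live on the parallel file system, so the TSDB blocks uploaded by Thanos end up on Lustre, Spectrum Scale, or a similar backend.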
Using Prometheus' remote write
Key Features
- A second Prometheus instance, fed through Prometheus' remote write protocol, is used for data replication and long-term storage
- Litestream is used to replicate and create snapshots of the CEEMS API server SQLite database
- CEEMS load balancer is used to enforce access control on TSDB backends
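
The remote write setup described above could look roughly like the following fragment in the primary Prometheus configuration. This is a sketch under stated assumptions: the hostname and port of the second instance are hypothetical, and the second instance is assumed to have been started with the `--web.enable-remote-write-receiver` flag so that it accepts incoming samples.

```yaml
# Fragment of prometheus.yml on the primary (scraping) instance.
# The target URL is a placeholder for the second, long-term Prometheus instance.
remote_write:
  - url: "http://prometheus-longterm.example.org:9090/api/v1/write"
```

Retention on the second instance can then be set independently of the primary, for example with a longer `--storage.tsdb.retention.time` value.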