Deployment
Service Deployment
The services on the virtual machine can be deployed using a single docker-compose file:
The configuration parameters for all services are listed below, separately for production and development:
Production
```shell
### Global
# Host name of this instance
HOST=scopem-openem.ethz.ch
ENVIRONMENT=prod
# Certificate (bundle)
CERTIFICATE_FILE=.certs/cert_bundle.pem
# Private key
CERTIFICATE_KEY_FILE=.certs/cert.key
### Identity Provider / Broker (Keycloak)
IDP_URL=https://kc.psi.ch
IDP_USERNAME=scopem-archiver-service
IDP_REALM=awi
IDP_AUDIENCE=account
IDP_CLIENT_ID=scopem-archiver-service-api
IDP_CLIENT_SECRET_FILE=./.secrets/idpclientsecret_prod.txt
IDP_PASSWORD_FILE=./.secrets/idppassword_prod.txt
### Archiver Service API
# Image used for the backend service
OPENEM_BACKEND_IMAGE_NAME=ghcr.io/swissopenem/scopemarchiver-archiver-service-api
OPENEM_IMAGE_TAG=latest
# Archiver root folder
ARCHIVER_SCRATCH_FOLDER=/storage
# Root path of the backend server API
API_ROOT_PATH=/archiver/api/v1
#### Minio
S3_REGION="eu-west-1"
S3_ENDPOINT="sp109.ethz.ch:18000"
S3_EXTERNAL_ENDPOINT=scopem-openem.ethz.ch
S3_TOTAL_LANDING_SPACE_TB=100
#### Prefect
# Prefect version used in all images
PREFECT_VERSION=3.7.2-python3.13
# Logging level
PREFECT_LOGGING_LEVEL=INFO
# Image for the containers that execute flows
PREFECT_RUNTIME_IMAGE_NAME=ghcr.io/swissopenem/scopemarchiver-archiver-service-workflow
# Image for the configuration container
PREFECT_CONFIG_IMAGE_NAME=ghcr.io/swissopenem/scopemarchiver-archiver-service-config
# Working directory of the archiver
PREFECT_ARCHIVER_HOST_SCRATCH=/mnt/openemdata/scratch
# Production Prefect job template
PREFECT_JOB_TEMPLATE=prefect-jobtemplate-prod.json
# Work pool name for archival jobs
PREFECT_ARCHIVAL_WORKPOOL_NAME=archival-docker-workpool
# Work pool name for retrieval jobs
PREFECT_RETRIEVAL_WORKPOOL_NAME=retrieval-docker-workpool
# Prefect variables for this environment
PREFECT_VARS_FILE=../backend/prefect/vars_prod.toml
### Authentik
# Use `AUTH_MIDDLEWARE=authentik` to protect access to the dashboards
AUTH_MIDDLEWARE=authentik
AUTHENTIK_HOST=https://authentik.ethz.ch
# true if the Authentik instance uses a self-signed certificate, false otherwise
AUTHENTIK_INSECURE=true
### SciCat
SCICAT_ENDPOINT=https://dacat.psi.ch
SCICAT_API_PREFIX=/api/v3
SCICAT_USER_FILE=.secrets/scicatuser_prod.txt
SCICAT_PASSWORD_FILE=.secrets/scicatpass_prod.txt
SCICAT_INGESTOR_GROUPS=ethz-scopem;ethz-scopem-ops
```
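Several of the settings above (`*_FILE`: certificate, key, Keycloak and SciCat credentials, Prefect variables) point at files that have to exist before the services start. A minimal preflight sketch, assuming the production values live in `.env` (the file name used by the worker commands further down), that the file contains plain `KEY=VALUE` lines, and that it is run from the directory holding the compose file so relative paths resolve the same way; `check_file_vars` is a hypothetical helper, not part of the repository:

```shell
#!/bin/sh
# check_file_vars ENV_FILE
# Hypothetical preflight helper: every KEY ending in _FILE in ENV_FILE must
# point to an existing, non-empty file. Relative paths are checked against the
# current directory, assumed to be the directory holding the compose file.
# The body runs in a subshell ( ... ) so its variables do not leak.
check_file_vars() (
  status=0
  while IFS='=' read -r key value || [ -n "$key" ]; do
    case "$key" in
      '#'*|'') continue ;;            # comment or blank line
      *_FILE)
        value="${value%\"}"           # drop optional surrounding quotes
        value="${value#\"}"
        if [ ! -s "$value" ]; then
          echo "missing or empty: $key=$value" >&2
          status=1
        fi
        ;;
    esac
  done < "$1"
  exit "$status"
)
```

For example, `check_file_vars .env && docker compose --env-file .env up -d` only starts the stack when every referenced file is in place.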
Development
For development, it is useful to override some of the configuration:
Note: the `lts-mock-volume` used here is a local volume, not the LTS share.
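The worker commands further down pass two env files, `--env-file .env --env-file .development.env`. Docker Compose reads them in order and a variable defined in a later file overrides the earlier value, so `.development.env` only needs the keys that differ. The hypothetical helper below mimics that "last assignment wins" rule for plain `KEY=VALUE` lines to check what a single setting resolves to; it ignores Compose features such as interpolation, and `docker compose --env-file .env --env-file .development.env config` remains the authoritative view of the merged configuration.

```shell
#!/bin/sh
# effective_env_value KEY FILE...
# Hypothetical helper: print the value KEY ends up with when the env files are
# applied in the given order, the last assignment winning. Only plain
# KEY=VALUE lines are understood: no interpolation, quotes are kept verbatim.
# Exits with status 1 if KEY is not set in any of the files.
effective_env_value() (
  key="$1"
  shift
  found=no
  result=
  for file in "$@"; do
    while IFS='=' read -r name value || [ -n "$name" ]; do
      if [ "$name" = "$key" ]; then
        result="$value"               # later lines and files overwrite
        found=yes
      fi
    done < "$file"
  done
  [ "$found" = yes ] || exit 1
  printf '%s\n' "$result"
)
```

For example, `effective_env_value ENVIRONMENT .env .development.env` shows which environment the development stack will run with.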
Prefect Deployment
Prefect is set up in a slightly non-standard way compared to the use cases described in its documentation. Two workers are deployed (archival and retrieval); each mounts the host's Docker socket in order to create, at runtime, the containers in which the flows run. The flows are baked into these containers, and the code is not pulled from any repository (Prefect would also allow, for example, storing the code in an S3 bucket). The ETHZ LTS share is exposed as a Docker volume so that the runtime containers can mount it at startup.
| Name | Description | Endpoint |
|---|---|---|
| Prefect Server | Workflow orchestration (https://www.prefect.io) | http://localhost/prefect-ui/dashboard |
| Postgres Database | Database for Prefect | n/a |
| Prefect Worker | Creates the containers in which flows run (https://docs.prefect.io/3.0/deploy/infrastructure-concepts/workers) | n/a |
| Runtime Container | Image in which the flows run (`runtime.Dockerfile`) | n/a |
Prefect Server
To run the Prefect server, its variables, secrets, and concurrency limits need to be configured.
Configuration
All of the configuration can be done by running
with the appropriate PREFECT_API_URL set.
Variables
Variables are used at runtime and are fetched from the server by the flow. External endpoints and other parameters of the flow belong here:
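As a sketch of how such variables can be set from the command line, assuming a flat `key = "value"` layout for the file referenced by `PREFECT_VARS_FILE` (its real structure is not shown here, and the configuration container may do this differently), `prefect variable set` can be called once per entry. `set_prefect_variables` and `PREFECT_BIN` are hypothetical names:

```shell
#!/bin/sh
# set_prefect_variables TOML_FILE
# Hypothetical sketch: create or update one Prefect variable per top-level
# `name = "string"` line of TOML_FILE. Assumes a flat file; tables, arrays,
# escaped quotes and non-string values are skipped or misread. Also assumes
# the Prefect CLI is on PATH and PREFECT_API_URL is exported.
# PREFECT_BIN=echo prints the commands instead of running them.
set_prefect_variables() (
  [ -r "$1" ] || { echo "cannot read $1" >&2; exit 1; }
  sed -n 's/^[[:space:]]*\([A-Za-z0-9_][A-Za-z0-9_]*\)[[:space:]]*=[[:space:]]*"\(.*\)"[[:space:]]*$/\1 \2/p' "$1" |
    while read -r name value; do
      # </dev/null stops the CLI from reading the loop's input
      ${PREFECT_BIN:-prefect} variable set "$name" "$value" --overwrite </dev/null || exit 1
    done
)
```

Running `PREFECT_BIN=echo set_prefect_variables ../backend/prefect/vars_prod.toml` first lists the commands for review. Inside a flow, a value set this way is read with `Variable.get("name")` from `prefect.variables`.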
Concurrency Limits
Some sections of the code (tasks) may only run with limited concurrency (e.g. writing to the LTS); see https://docs.prefect.io/3.0/develop/task-run-limits#limit-concurrent-task-runs-with-tags.
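Tag-based limits are created with the Prefect CLI command `prefect concurrency-limit create TAG LIMIT`; every task run carrying the tag (attached in the flow code with `@task(tags=[...])`) counts against the limit. A thin wrapper sketch with input validation; `set_tag_limit`, `PREFECT_BIN` and the tag `lts-write` are hypothetical, the real tag names are whatever the flow code uses:

```shell
#!/bin/sh
# set_tag_limit TAG LIMIT
# Hypothetical wrapper around Prefect's tag-based task-run concurrency limits:
# at most LIMIT task runs carrying TAG run at the same time.
# Assumes PREFECT_API_URL is exported; PREFECT_BIN=echo prints the command.
set_tag_limit() {
  case "$2" in
    ''|*[!0-9]*)
      echo "LIMIT must be a non-negative integer, got '$2'" >&2
      return 2
      ;;
  esac
  ${PREFECT_BIN:-prefect} concurrency-limit create "$1" "$2"
}
```

With this, `set_tag_limit lts-write 1` would allow only one tagged LTS write at a time.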
Internal Secrets
Internal secrets can be created at deployment time.
```shell
mkdir -p .secrets
# Postgres
echo "postgres_user" > .secrets/postgresuser.txt
openssl rand -base64 12 > .secrets/postgrespass.txt # creates a random string
```
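Rerunning `openssl rand ... >` overwrites an existing password. If, as with the official `postgres` image, the password is only applied when the database is first initialised, a redeploy would leave the database and the secret file out of sync. A sketch of an idempotent alternative, assuming `openssl` is installed; `new_secret` is a hypothetical helper, not part of the repository:

```shell
#!/bin/sh
# new_secret FILE [BYTES]
# Idempotent variant of the openssl command above: writes BYTES (default 12)
# random bytes, base64-encoded, to FILE only if FILE is missing or empty, and
# makes it readable by its owner only. Runs in a subshell so umask stays local.
new_secret() (
  file="$1"
  bytes="${2:-12}"
  if [ -s "$file" ]; then
    echo "keeping existing $file" >&2
    exit 0
  fi
  umask 077                           # new directories/files: owner only
  mkdir -p "$(dirname "$file")" || exit 1
  openssl rand -base64 "$bytes" > "$file" || exit 1
  chmod 600 "$file"
)
```

The same helper covers the other generated passwords, e.g. `new_secret .secrets/postgrespass.txt` and `new_secret .secrets/miniopass.txt`.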
External Secrets
The Minio deployment may already provide its own secrets; they can also be added manually in the UI.
```shell
# Minio
echo "minioadminuser" > .secrets/miniouser.txt
openssl rand -base64 12 > .secrets/miniopass.txt # creates a random string
# TODO: needed?
# GitHub
echo "<github_user>" > .secrets/githubuser.txt
echo "<github_access_token>" > .secrets/githubpass.txt
```
| Name | Description |
|---|---|
| github-openem-username | Username for the GitHub container registry |
| github-openem-access-token | Personal access token for the GitHub container registry |
Prefect Worker
Workers can only be deployed on a machine that has access to:
- the Prefect server (Prefect currently implements no authentication for on-premises deployments)
- the S3 storage
- the ETHZ LTS share (IP whitelisting within the ETHZ network)
They can be started with the following commands:
```shell
docker compose --env-file .env --env-file .development.env up -d prefect-archival-worker
docker compose --env-file .env --env-file .development.env up -d prefect-retrieval-worker
```
Note: due to a bug in Prefect, the workers' concurrency limit must be set manually in the UI.
Flows
The flows can be deployed using a container:
This deploys the flows as defined in `prefect.yaml` and requires the secrets set up above.