Scaling Down Prometheus in OpenShift Monitoring

How to Scale Down Prometheus on OpenShift

This article looks at how to scale down Prometheus on OpenShift effectively.

Environment

A customer wanted to set the number of Prometheus replicas on OpenShift 4.12 to 0 and asked me to look into it.

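Findings

OpenShift 4.12 currently does not provide a way to change the number of Prometheus replicas.

I initially assumed that editing the Prometheus custom resource would be enough to change the replica count, but shortly after the edit the Operator reverts it. Here is what I found:

In 3.11 the replicas setting could be adjusted, but it was later removed: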

https://github.com/openshift/cluster-monitoring-operator/pull/330/commits/53453691c9d47f5c621a9d538aba12431541a2c8

I could not find a parameter for this in the cluster-monitoring-operator configuration either, and setting replicas there had no effect when I tested it:

https://github.com/openshift/cluster-monitoring-operator/blob/master/Documentation/api.md#prometheusk8sconfig
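To compare a cluster against the fields listed in that API document, the applied monitoring configuration can be read back with oc. The following is a minimal sketch, assuming the optional cluster-monitoring-config ConfigMap has been created in openshift-monitoring (it does not exist on clusters where monitoring was never customized). The function names show_monitoring_config and has_replicas_setting are hypothetical helpers introduced for illustration, not part of oc.

```shell
#!/usr/bin/env bash
# Sketch: read back the cluster monitoring configuration.
# Assumption: the ConfigMap cluster-monitoring-config exists in the
# openshift-monitoring namespace and stores its settings under config.yaml.

show_monitoring_config() {
  # The dot in the key name "config.yaml" must be escaped in jsonpath.
  oc get configmap cluster-monitoring-config -n openshift-monitoring \
    -o jsonpath='{.data.config\.yaml}'
}

has_replicas_setting() {
  # Succeeds if any line of the configuration sets a "replicas" key,
  # for example one left over from an earlier experiment.
  show_monitoring_config | grep -Eq '^[[:space:]]*replicas[[:space:]]*:'
}

# Usage (commented out so that sourcing this file changes nothing):
#   show_monitoring_config
#   has_replicas_setting && echo "a replicas key is present in config.yaml"
```

A replicas key showing up here only means it was written into the ConfigMap; as noted above, it had no effect in testing.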

I also found this issue; as of August 2020 the feature was still not supported:

https://github.com/openshift/cluster-monitoring-operator/issues/896
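The revert behaviour described above can be observed by patching the Prometheus custom resource and then polling spec.replicas. This is only a sketch under assumptions: the resource name prometheus/k8s and the namespace come from the commands later in this article, while the helper names get_prometheus_replicas and watch_for_revert and the polling defaults (30 tries, 10 seconds apart) are hypothetical choices, not measured values.

```shell
#!/usr/bin/env bash
# Sketch: poll spec.replicas on the Prometheus custom resource to see
# whether an operator restores the original value after a manual patch.

get_prometheus_replicas() {
  oc get prometheus/k8s -n openshift-monitoring -o jsonpath='{.spec.replicas}'
}

# watch_for_revert WANT [TRIES] [INTERVAL_SECONDS]
# Returns 0 as soon as spec.replicas differs from WANT (the change was
# reverted), 1 if it still equals WANT after TRIES polls, and 2 if the
# value could not be read.
watch_for_revert() {
  local want="$1" tries="${2:-30}" interval="${3:-10}"
  local current i
  for ((i = 1; i <= tries; i++)); do
    if ! current="$(get_prometheus_replicas)"; then
      echo "could not read spec.replicas" >&2
      return 2
    fi
    echo "poll ${i}: spec.replicas=${current}"
    if [ "${current}" != "${want}" ]; then
      return 0
    fi
    sleep "${interval}"
  done
  return 1
}

# Usage (commented out so that sourcing this file changes nothing):
#   oc patch prometheus/k8s -n openshift-monitoring --type=merge \
#     -p '{"spec":{"replicas":0}}'
#   watch_for_revert 0 && echo "the operator reverted the change"
```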

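Workaround

This is the most brute-force approach I have seen so far: first patch clusterversion/version so that the cluster-monitoring-operator Deployment in openshift-monitoring is marked as unmanaged, then scale down all the deployments. I have tested it myself and it works, but it is heavy-handed: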

https://gist.github.com/waynedovey/cbf23d0a9c798c8de68b5f2043ba945b

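# Step 1: tell the Cluster Version Operator to stop managing the
# cluster-monitoring-operator Deployment, so that scaling it down is not undone.
# Note: a merge patch replaces any existing spec.overrides list as a whole.
# The group value is kept as in the gist; if the override does not take effect
# on your release, the bare API group name (apps) may be what is expected.
oc patch clusterversion/version --type='merge' -p "$(cat <<- EOF
spec:
  overrides:
  - group: apps/v1
    kind: Deployment
    name: cluster-monitoring-operator
    namespace: openshift-monitoring
    unmanaged: true
EOF
)"

# Step 2: set replicas to 0 on the Prometheus and Alertmanager custom resources.
oc patch prometheus/k8s -n openshift-monitoring --type='merge' -p "$(cat <<- EOF
spec:
  replicas: 0
EOF
)"

oc patch alertmanagers/main -n openshift-monitoring --type='merge' -p "$(cat <<- EOF
spec:
  replicas: 0
EOF
)"

# Step 3: stop the operator that would otherwise revert the patches above,
# then scale down the remaining monitoring workloads.
oc scale --replicas=0 deploy/cluster-monitoring-operator -n openshift-monitoring
oc scale --replicas=0 deployment.apps/prometheus-adapter -n openshift-monitoring
oc scale --replicas=0 deployment.apps/thanos-querier -n openshift-monitoring
# grafana may not exist on newer releases; the command then fails with NotFound.
oc scale --replicas=0 deployment.apps/grafana -n openshift-monitoring
oc scale --replicas=0 deployment.apps/kube-state-metrics -n openshift-monitoring
oc scale --replicas=0 deployment.apps/openshift-state-metrics -n openshift-monitoring
oc scale --replicas=0 deployment.apps/telemeter-client -n openshift-monitoring
oc scale --replicas=0 deployment.apps/prometheus-operator -n openshift-monitoring
oc scale --replicas=0 statefulset.apps/alertmanager-main -n openshift-monitoring

# Step 4: a DaemonSet cannot be scaled to 0, so node-exporter is deleted.
oc delete DaemonSet node-exporter -n openshift-monitoring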
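
Once the commands above have run, it helps to confirm that nothing is left running and to have a way back. The sketch below is not taken from the gist. It assumes that the monitoring entry is the only item in spec.overrides (the JSON patch removes the whole list) and that the re-enabled cluster-monitoring-operator will then reconcile the remaining components on its own. The function names count_monitoring_pods and restore_monitoring are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: verify the scale-down and undo it later.
NS=openshift-monitoring

# Print the number of pods still present in the monitoring namespace.
count_monitoring_pods() {
  oc get pods -n "${NS}" --no-headers 2>/dev/null | wc -l | tr -d '[:space:]'
}

# Hand the monitoring stack back to the operators.
restore_monitoring() {
  # Assumption: spec.overrides holds only the monitoring entry, so the
  # whole list can be removed. Edit the list by hand if it holds more.
  oc patch clusterversion/version --type=json \
    -p '[{"op":"remove","path":"/spec/overrides"}]' &&
  # Bring the operator back; it is expected to restore the other workloads.
  oc scale --replicas=1 deployment/cluster-monitoring-operator -n "${NS}"
}

# Usage (commented out so that sourcing this file changes nothing):
#   echo "pods remaining: $(count_monitoring_pods)"
#   restore_monitoring
```

Removing the override rather than leaving it in place matters because an unmanaged entry in clusterversion/version is meant to be temporary.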

Reference