Verifying Prometheus + Grafana monitoring of an NGINX/Redis demo app on AKS in one day

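Recently, in Madrid, I have been evaluating how to monitor k8s so that I can propose an approach. For some reason, most of the k8s projects I get involved in do not have monitoring set up properly.

I needed to prepare a demo quickly, and if the requirement is OSS monitoring then Prometheus is the only real choice, so I started setting it up in my own environment.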

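Demo environment
Installing Prometheus by helm
Checking Prometheus targets
Extending the metrics Prometheus monitors
Customizing Grafana dashboards
- Dashboards I liked
What's not done yet

(In the end we also wanted tracing, so I switched to introducing Istio; that will be a separate article.)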

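Demo environment

https://docs.microsoft.com/ja-jp/azure/aks/kubernetes-walkthrough

This was my first time building a k8s environment and I wasn't sure where to start, but since the target environment for the proposal is AKS Engine on VMs, I decided to start with AKS.

f:id:kashionki38:20200229212401p:plain
AKS lets you create a demo voting app like this one, so I decided to set up Prometheus monitoring for it.

Following the walkthrough steps produces pods like these.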

$ kubectl get po
NAME READY STATUS RESTARTS AGE
azure-vote-back-679f7b955f-pwdpd 3/3 Running 0 2d1h
azure-vote-front-b47b4fbf8-4c8rk 3/3 Running 1 27h

azure-vote-front contains nginx and Python Flask, and azure-vote-back contains Redis, so it appears to be a two-tier setup.

Installing Prometheus by helm

Installing with helm is said to be easy, so I'll use the prometheus-operator chart. (github.com)

I followed this article as a reference. (qiita.com)

The installation itself is just a helm install.

$ helm install pg-op stable/prometheus-operator

Check what is now installed.

$ kubectl get all
NAME                                                        READY   STATUS    RESTARTS   AGE
pod/alertmanager-pg-op-prometheus-operator-alertmanager-0   2/2     Running   0          10m
pod/azure-vote-back-5966fd4fd4-d87zv                        1/1     Running   0          94m
pod/azure-vote-front-67fc95647d-sgr42                       1/1     Running   0          94m
pod/pg-op-grafana-5b75f465d7-wq9rf                          2/2     Running   0          11m
pod/pg-op-kube-state-metrics-5fc85698d4-pjmzr               1/1     Running   0          11m
pod/pg-op-prometheus-node-exporter-2r6p8                    1/1     Running   0          11m
pod/pg-op-prometheus-node-exporter-cm2x8                    1/1     Running   0          11m
pod/pg-op-prometheus-node-exporter-fjfd2                    1/1     Running   0          11m
pod/pg-op-prometheus-operator-operator-7c7cb98579-xhlgt     2/2     Running   0          11m
pod/prometheus-pg-op-prometheus-operator-prometheus-0       3/3     Running   1          10m

NAME                                             TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)                      AGE
service/alertmanager-operated                    ClusterIP      None           <none>         9093/TCP,9094/TCP,9094/UDP   10m
service/azure-vote-back                          ClusterIP      10.0.120.31    <none>         6379/TCP                     94m
service/azure-vote-front                         LoadBalancer   10.0.176.255   51.138.50.33   80:32737/TCP                 94m
service/kubernetes                               ClusterIP      10.0.0.1       <none>         443/TCP                      124m
service/pg-op-grafana                            ClusterIP      10.0.81.187    <none>         80/TCP                       11m
service/pg-op-kube-state-metrics                 ClusterIP      10.0.7.118     <none>         8080/TCP                     11m
service/pg-op-prometheus-node-exporter           ClusterIP      10.0.3.136     <none>         9100/TCP                     11m
service/pg-op-prometheus-operator-alertmanager   ClusterIP      10.0.7.11      <none>         9093/TCP                     11m
service/pg-op-prometheus-operator-operator       ClusterIP      10.0.67.126    <none>         8080/TCP,443/TCP             11m
service/pg-op-prometheus-operator-prometheus     ClusterIP      10.0.67.201    <none>         9090/TCP                     11m
service/prometheus-operated                      ClusterIP      None           <none>         9090/TCP                     10m

NAME                                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/pg-op-prometheus-node-exporter   3         3         3       3            3           <none>          11m

NAME                                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/azure-vote-back                      1/1     1            1           94m
deployment.apps/azure-vote-front                     1/1     1            1           94m
deployment.apps/pg-op-grafana                        1/1     1            1           11m
deployment.apps/pg-op-kube-state-metrics             1/1     1            1           11m
deployment.apps/pg-op-prometheus-operator-operator   1/1     1            1           11m

NAME                                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/azure-vote-back-5966fd4fd4                      1         1         1       94m
replicaset.apps/azure-vote-front-67fc95647d                     1         1         1       94m
replicaset.apps/pg-op-grafana-5b75f465d7                        1         1         1       11m
replicaset.apps/pg-op-kube-state-metrics-5fc85698d4             1         1         1       11m
replicaset.apps/pg-op-prometheus-operator-operator-7c7cb98579   1         1         1       11m

NAME                                                                   READY   AGE
statefulset.apps/alertmanager-pg-op-prometheus-operator-alertmanager   1/1     10m
statefulset.apps/prometheus-pg-op-prometheus-operator-prometheus       1/1     10m

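A whole lot got installed... Everything went into the default namespace without being separated, but I'll carry on regardless.

Checking Prometheus targets

Prometheus uses a pull architecture, so it has to pull metrics from the exporter for each monitoring target.
The targets page shows what it is currently able to pull.
Prometheus is not exposed externally right now, so I forward a local port to reach it.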

$ kubectl port-forward $(kubectl get pod -l app=prometheus -o template --template "{{(index .items 0).metadata.name}}") 9090:9090

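Now it's accessible at http://localhost:9090.
f:id:kashionki38:20200229215647p:plain

Here I check that the items I want to monitor are actually there.
What I want covered is the following.

Node resource metrics

Collected by node exporter. This should be enabled automatically once prometheus-operator is installed.

Pod/container resource metrics

These seem to be collected from cAdvisor through the kubelet exporter. As the Qiita article notes, scraping the kubelet exporter does not work right after installation. Changing https to http with the steps below made the kubelet exporter scrapable as well. (The ServiceMonitor name and namespace in the quoted command are the ones from the Qiita article; check your own with kubectl get servicemonitors.)

https://qiita.com/nmatsui/items/6d8319f3216bd8786eb9

Change the port used by the kubelet exporter from https to http
By default, exporting over https does not seem to work on Azure AKS. Change the port that exports kubelet status to Prometheus from https to http.
$ kubectl get servicemonitors pg-exporter-kubelets --namespace monitoring -o yaml | sed 's/https/http/' | kubectl replace -f -
See https://github.com/coreos/prometheus-operator/issues/926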
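
To see what that sed substitution actually changes, here is a small sketch run against a made-up ServiceMonitor fragment. The field values (port: https-metrics, scheme: https) are my assumption of what a kubelet endpoint typically looks like, not output copied from this cluster, and to_http and sample_endpoint are hypothetical helper names.

```shell
#!/bin/sh
# Same substitution as in the quoted command: replace the first "https" on each line.
to_http() {
  sed 's/https/http/'
}

# Illustrative fragment of a kubelet ServiceMonitor endpoint (assumed, not from a live cluster).
sample_endpoint() {
  cat <<'EOF'
  - port: https-metrics
    scheme: https
EOF
}

# Prints the fragment with both the port name and the scheme switched to http.
sample_endpoint | to_http
```

Because sed rewrites every line that contains https, it is worth diffing the result against the original manifest before piping it into kubectl replace.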

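Metrics for each middleware (nginx, redis)

These are of course not included by default, so an exporter has to be installed and a scrape (the place Prometheus pulls from) configured for each. The rest of this article covers that.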

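Extending the metrics Prometheus monitors

Installing the nginx exporter

https://github.com/nginxinc/nginx-prometheus-exporter
There are various exporters to choose from, but for now I'll use this one. It can monitor nginx connection counts.
Other exporters also monitor access logs using mtail, so I'd like to try one of those next time.

Prerequisites

For NGINX, expose the stub_status page at /stub_status on port 8080.

As the GitHub README says, nginx's stub_status has to be enabled beforehand. It is the endpoint that returns connection counts over HTTP.
This requires changing nginx.conf.

How to change a conf file in a container or in Kubernetes probably varies from project to project, but this time I used a ConfigMap and mounted the file as a volume.

Preparing nginx.conf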

nginx.conf

http {
    # (omitted)
    server {
        listen 8080;
        location /stub_status {
            stub_status on;
            allow 127.0.0.1;
            deny all;
        }
    }
}

First, prepare an nginx.conf with the stub_status setting added.

Mounting nginx.conf into the container with a ConfigMap

$ kubectl create configmap nginx-config --from-file nginx.conf

Create the ConfigMap. With --from-file, the file name becomes the key as-is and the value is the contents of nginx.conf.

$ kubectl get configmap nginx-config -o yaml

data:
  nginx.conf: |-
    # (omitted)
    http {
        # (omitted)
        server {
            listen 8080;
            location /stub_status {
                stub_status on;
                allow 127.0.0.1;
                deny all;
            }
        }
    # (omitted)

Add the ConfigMap as a volume under .spec.template.spec.volumes of the Deployment, and mount it at /etc/nginx/nginx.conf with volumeMounts.

$ kubectl get deploy azure-vote-front -o yaml

# (omitted)
spec:
# (omitted)
  template:
# (omitted)
    spec:
      containers:
# (omitted)
        name: azure-vote-front
# (omitted)
        volumeMounts:
        - mountPath: /etc/nginx/nginx.conf
          name: nginx-config
          subPath: nginx.conf
# (omitted)
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: nginx.conf
            path: nginx.conf
          name: nginx-config
        name: nginx-config

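That's it. Check below that nginx.conf is actually reflected.

$ kubectl exec -it azure-vote-front-67fc95647d-sgr42 //bin/cat //etc/nginx/nginx.conf

ConfigMap changes are normally propagated into mounted volumes automatically, but a subPath mount like this one does not receive updates, so if the change never shows up, delete the pod and let it be redeployed.

Once it is reflected, check that stub_status is working. (There's no curl, so I use wget.)

$ kubectl exec -it azure-vote-front-67fc95647d-sgr42 //bin/sh
wget http://127.0.0.1:8080/stub_status -qO -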
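
If you only want the connection count out of that response, a sketch like the following can pull it out with awk. The sample text mirrors the layout the stub_status module prints, but the numbers are made up, and active_connections and sample_stub_status are hypothetical helper names.

```shell
#!/bin/sh
# Extract the "Active connections" value from stub_status output read on stdin.
active_connections() {
  awk '/^Active connections:/ { print $3 }'
}

# Illustrative sample in the stub_status layout (values are invented).
sample_stub_status() {
  cat <<'EOF'
Active connections: 2 
server accepts handled requests
 16 16 31 
Reading: 0 Writing: 1 Waiting: 1 
EOF
}

# Prints the third field of the first line, i.e. the active connection count.
sample_stub_status | active_connections
```

Against the real pod, something along the lines of `kubectl exec azure-vote-front-67fc95647d-sgr42 -c azure-vote-front -- wget -qO - http://127.0.0.1:8080/stub_status | active_connections` should work from the workstation, assuming the stub_status server listens on 8080 as in the ConfigMap.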

Placing the nginx exporter as a sidecar

The nginx exporter is placed as a sidecar in another container in the same pod.
It seems the argument -nginx.scrape-uri http://<nginx>:8080/stub_status has to be passed, so I specify it in .spec.template.spec.containers[1].args.

$ kubectl get deploy azure-vote-front -o yaml

# (omitted)
spec:
# (omitted)
  template:
# (omitted)
    spec:
      containers:
# (omitted)
      - args:
        - -nginx.scrape-uri
        - http://127.0.0.1:8080/stub_status
        env:
        - name: name
          value: nginx-prom-exp
        image: nginx/nginx-prometheus-exporter:0.6.0
        imagePullPolicy: IfNotPresent
        name: nginx-prom-exp
        ports:
        - containerPort: 9113
          protocol: TCP
        resources:
          limits:
            cpu: 250m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 128Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File

The sidecar is now in place. Open the port on the Service so that other pods can reach the exporter on port 9113.

$ kubectl get service azure-vote-front -o yaml

# (omitted)
spec:
# (omitted)
  ports:
  - name: web
    nodePort: 32737
    port: 80
    protocol: TCP
    targetPort: 80
  - name: nginx-prom-exp
    nodePort: 31217
    port: 9113
    protocol: TCP
    targetPort: 9113

That completes the exporter setup. Next, check that nginx metrics can be pulled from the Prometheus pod against azure-vote-front.

Check the cluster IP of azure-vote-front.

$ kubectl get service
NAME                                     TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)                       AGE
azure-vote-front                         LoadBalancer   10.0.176.255       80:32737/TCP,9113:31217/TCP   6d1h

Run wget against that cluster IP to check that the metrics can be pulled.

$ kubectl exec -it prometheus-pg-op-prometheus-operator-prometheus-0 -c prometheus //bin/sh
$ wget http://10.0.176.255:9113/metrics -qO -
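
The /metrics response is long, so a small filter helps when eyeballing it. This is a minimal sketch that drops the "# HELP" / "# TYPE" comment lines and keeps samples with a given prefix. The sample input is illustrative (values invented, other_metric is a placeholder), and filter_metrics and sample_metrics are hypothetical helper names.

```shell
#!/bin/sh
# Keep only sample lines whose metric name starts with the given prefix.
# Reads Prometheus text exposition format on stdin; the prefix is used as a regex.
filter_metrics() {
  prefix="$1"
  grep -v '^#' | grep "^${prefix}"
}

# Illustrative exporter output; the values are made up.
sample_metrics() {
  cat <<'EOF'
# HELP nginx_connections_active Active client connections
# TYPE nginx_connections_active gauge
nginx_connections_active 2
# HELP nginx_up Status of the last metric scrape
# TYPE nginx_up gauge
nginx_up 1
other_metric 8
EOF
}

# Prints the two nginx_ sample lines and nothing else.
sample_metrics | filter_metrics nginx_
```

From the workstation this could be combined with the check above, for example `kubectl exec prometheus-pg-op-prometheus-operator-prometheus-0 -c prometheus -- wget -qO - http://10.0.176.255:9113/metrics | filter_metrics nginx_`.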

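Installing the Redis exporter

(github.com) I'll use this one as the Redis exporter.

Place it as a sidecar. No address argument is passed: the exporter's default of redis://localhost:6379 reaches Redis because containers in the same pod share a network namespace.

$ kubectl get deploy azure-vote-back -o yaml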

      - image: oliver006/redis_exporter:latest
        imagePullPolicy: Always
        name: redis-exporter
        ports:
        - containerPort: 9121
          protocol: TCP
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File

Open the port.

$ kubectl get service azure-vote-back -o yaml

spec:
  clusterIP: 10.0.120.31
  ports:
  - name: redis
    port: 6379
    protocol: TCP
    targetPort: 6379
  - name: redis-exp
    port: 9121
    protocol: TCP
    targetPort: 9121

That completes the exporter.

Likewise, check that Prometheus can pull from it.

Check the cluster IP of azure-vote-back.

$ kubectl get service
NAME                                     TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)                       AGE
azure-vote-back                          ClusterIP      10.0.120.31    <none>          6379/TCP,9121/TCP             6d1h

Run wget against that cluster IP to check that the metrics can be pulled.

$ kubectl exec -it prometheus-pg-op-prometheus-operator-prometheus-0 -c prometheus //bin/sh
/prometheus $ wget http://10.0.120.31:9121/metrics -qO -

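Configuring the Prometheus scrape config

Even with the exporters set up there is still work to do. Prometheus is pull-based, so Prometheus itself has to know the exporters' addresses.
There are two ways to specify them, static and dynamic (service discovery); this site explained the service discovery setup clearly.
christina04.hatenablog.com

I wanted to get this working quickly, so I went with static configuration.
This is set in Prometheus's scrape_configs, but I got stuck because I didn't know how to add entries under prometheus-operator.

In the end, following the page below, I got it working by creating a secret and having the Prometheus custom resource load it as additionalScrapeConfigs. (github.com)

Create the secret.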

$ kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml

prometheus-additional.yaml

- job_name: custome/nginx-exporter/0
  static_configs:
    - targets:
      - 10.0.176.255:9113
- job_name: redis_exporter
  static_configs:
  - targets: 
      - '10.0.120.31:9121'
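
Cluster IPs change if a Service is recreated, so one variation is to point the static targets at the Services' in-cluster DNS names instead. The following is only a sketch of how such a file could be generated. It assumes the Services live in the default namespace as in this article, and scrape_job is a hypothetical helper name.

```shell
#!/bin/sh
# Emit one static scrape job as Prometheus scrape_config YAML.
# $1 = job name, $2 = target as host:port
scrape_job() {
  printf '%s job_name: %s\n' '-' "$1"
  printf '  static_configs:\n'
  printf '  %s targets:\n' '-'
  printf "    %s '%s'\n" '-' "$2"
}

# Service DNS names follow <service>.<namespace>.svc.cluster.local.
# "default" is an assumption taken from this article's setup.
scrape_job custome/nginx-exporter/0 azure-vote-front.default.svc.cluster.local:9113
scrape_job redis_exporter azure-vote-back.default.svc.cluster.local:9121
```

Redirecting the output into prometheus-additional.yaml and recreating the secret from it would be the next step; the job names are kept identical to the ones used in this article.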

Edit the Prometheus custom resource so that it references the secret as additionalScrapeConfigs.

$ kubectl get prometheus pg-op-prometheus-operator-prometheus -o yaml

# (omitted)
spec:
  additionalScrapeConfigs:
    key: prometheus-additional.yaml
    name: additional-scrape-configs
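
As an alternative to editing the resource by hand, the same two fields could be set with a JSON merge patch. This is a sketch rather than what was actually run for this article, and additional_scrape_patch is a hypothetical helper name.

```shell
#!/bin/sh
# Build a JSON merge patch that points the Prometheus custom resource at a
# Secret key holding additional scrape configs.
# $1 = Secret name, $2 = key inside the Secret
additional_scrape_patch() {
  printf '{"spec":{"additionalScrapeConfigs":{"name":"%s","key":"%s"}}}\n' "$1" "$2"
}

# Prints the patch document for the secret created earlier.
additional_scrape_patch additional-scrape-configs prometheus-additional.yaml
```

The output can be passed to kubectl, for example `kubectl patch prometheus pg-op-prometheus-operator-prometheus --type merge -p "$(additional_scrape_patch additional-scrape-configs prometheus-additional.yaml)"`.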

At this point the Prometheus and exporter configuration should all be done.
Check at http://localhost:9090/targets that custome/nginx-exporter/0 and redis_exporter appear.
f:id:kashionki38:20200301001559p:plain f:id:kashionki38:20200301001615p:plain
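
The same check can be done without the browser through Prometheus's HTTP API at /api/v1/targets. The sketch below counts healthy targets with a plain text match. It is not a real JSON parser and assumes the response is compact JSON; the sample response is heavily trimmed and illustrative, and count_up_targets and sample_targets are hypothetical helper names.

```shell
#!/bin/sh
# Count targets reported as healthy in a /api/v1/targets response on stdin.
# Plain-text match on "health":"up"; assumes compact JSON without spaces.
count_up_targets() {
  grep -o '"health":"up"' | wc -l | tr -d ' '
}

# Trimmed, illustrative response (real responses carry many more fields).
sample_targets() {
  cat <<'EOF'
{"status":"success","data":{"activeTargets":[{"labels":{"job":"redis_exporter"},"health":"up"},{"labels":{"job":"custome/nginx-exporter/0"},"health":"down"}]}}
EOF
}

# Prints the number of targets whose health is "up" in the sample.
sample_targets | count_up_targets
```

With the port-forward still running, `curl -s http://localhost:9090/api/v1/targets | count_up_targets` would give the live count.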

At http://localhost:9090/graph, type nginx or redis into the query box; if something matches, everything is working. What a feeling.
f:id:kashionki38:20200301001748p:plain f:id:kashionki38:20200301001810p:plain

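Customizing Grafana dashboards

Prometheus alone is poor for visibility and usability, so I view things in Grafana.
prometheus-operator ships with several Grafana dashboards by default, but they can't show multiple pods together and there are none for the exporters I added, so customization is needed.
That said, building something elaborate myself is a hassle, so I import publicly available templates. Seeing my own environment in a really slick dashboard is quite a thrill.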
f:id:kashionki38:20200301002502p:plain

Dashboards I liked

For nginx, only connections are collected and it's simple, so I built my own.
f:id:kashionki38:20200301002541p:plain

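With that, I can now monitor node, pod/container, nginx, and redis resources for the azure-vote demo app on AKS.

What's not done yet

Exposing Grafana externally (I did it but ran a bit out of steam, and the security aspects are missing, so I'm not writing it up)
Persisting Prometheus data (right now, if Prometheus dies the data goes with it.)
RED metrics monitoring (I wanted to visualize response time and throughput from access logs but haven't managed it yet. After trying Istio, I figured that is good enough for now.)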

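*1: Incidentally, when running kubectl from Windows I first used Git Bash, but on Windows 10 you can run Ubuntu in a console through WSL, and I recommend that instead.
Commands that Git Bash lacks, such as watch, are of course available, and (though it belongs in a separate article) installing istioctl is easy too.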
f:id:kashionki38:20200229213421p:plain

*2: Incidentally, kubectl get all does not actually output every resource.
The following outputs all of them.
kubectl get "$(kubectl api-resources --namespaced=true --verbs=list -o name | tr "\n" "," | sed -e 's/,$//')"
https://github.com/superbrothers/text.superbrothers.dev/blob/master/content/190616-kubectl-get-all-does-not-include-most-resources.md