Health Monitoring

1. Overview

Alurkerja monitoring covers two categories:

API Service: 12 services, each exposing a POST /public/healthz endpoint, checked every 60 seconds with an expected status of 200.
Infrastructure: supporting components (database, queue, storage, SSL, disk/memory), each with its own monitoring method and interval.

2. Monitor Reference

Use the tables below as the reference when configuring monitors in any tool.

2.1 API Service (12 Services)

The 12 services and their /public/healthz endpoints:

No	Service	Base Path	Health Endpoint
1	authentication	`/api/v1/authentication`	`/api/v1/authentication/public/healthz`
2	bpm	`/api/v1/probis`	`/api/v1/probis/public/healthz`
3	company_profile	`/api/v1/compro`	`/api/v1/compro/public/healthz`
4	generate_test	`/api/v1/generate-test`	`/api/v1/generate-test/public/healthz`
5	integration	`/api/v1/integration`	`/api/v1/integration/public/healthz`
6	migration	`/api/v1/migration`	`/api/v1/migration/public/healthz`
7	notification	`/api/v1/notif`	`/api/v1/notif/public/healthz`
8	proxy	`/api/v1/proxy`	`/api/v1/proxy/public/healthz`
9	report	`/api/v1/report`	`/api/v1/report/public/healthz`
10	simulation	`/api/v1/simulation`	`/api/v1/simulation/public/healthz`
11	tenant_management	`/api/v1/tenant`	`/api/v1/tenant/public/healthz`
12	camunda tasklist	`/api/v1/tasklist`	`/api/v1/tasklist/public/healthz`
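
Every health endpoint in the table is the base path followed by /public/healthz, so the full URL list can be generated rather than typed by hand. The following is a minimal sketch; the function name health_url and the BASE_PATHS array are illustrative, and https://your-domain.com is the same placeholder domain used elsewhere on this page.

```shell
#!/bin/bash
# Build a full health-check URL from a base URL and a service base path.
# One trailing slash is trimmed from each part so the result has no "//".
health_url() {
  local base_url="${1%/}"
  local base_path="${2%/}"
  printf '%s%s/public/healthz\n' "$base_url" "$base_path"
}

# Base paths copied from the table above.
BASE_PATHS=(
  /api/v1/authentication /api/v1/probis /api/v1/compro
  /api/v1/generate-test /api/v1/integration /api/v1/migration
  /api/v1/notif /api/v1/proxy /api/v1/report
  /api/v1/simulation /api/v1/tenant /api/v1/tasklist
)

# Print all 12 URLs for the placeholder domain.
for path in "${BASE_PATHS[@]}"; do
  health_url "https://your-domain.com" "$path"
done
```

The output can be pasted into any of the tool configurations in the following sections.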

2.2 Infrastructure Components

Component	Monitor Type	Method	Interval	Alert Threshold	Target
Database (PostgreSQL)	TCP Port	-	60 seconds	Connection refused	`your-db-host:5432`
Queue (Redis)	TCP Port	-	60 seconds	Connection refused	`your-redis-host:6379`
Queue (RabbitMQ)	TCP Port	-	60 seconds	Connection refused	`your-rabbitmq-host:5672`
File Storage	HTTP(s)	`GET`	120 seconds	Status ≠ 200	`https://your-storage-host/public/healthz`
SSL Certificate	SSL/TLS	`GET`	Daily	Expires in < 30 days	`https://your-domain.com`
Disk & Memory	Push (Cron)	-	5 minutes	Disk > 85%, RAM > 90%	Push to Uptime Kuma

3. Uptime Kuma

Uptime Kuma is a lightweight, self-hosted monitoring tool that is easy to configure.

3.1 API Service

Repeat the following steps for each service:

Click "Add New Monitor" → select the HTTP(s) type.
Fill in the configuration:

Field	Value
Monitor Type	`HTTP(s)`
Friendly Name	`Alurkerja - [service name]`, for example `Alurkerja - Authentication`
URL	The URL from the table in Section 2.1
HTTP Method	`POST` (no body)
Heartbeat Interval	`60` seconds
Retries	`3`
Expected Status Code	`200`

3.2 Database (PostgreSQL)

"Add New Monitor" → TCP Port type.

Field	Value
Friendly Name	`Alurkerja - PostgreSQL`
Hostname	`your-db-host`
Port	`5432`
Heartbeat Interval	`60` seconds
Retries	`3`

3.3 Queue Service (Redis & RabbitMQ)

Create two separate monitors of type TCP Port:

Redis:

Field	Value
Friendly Name	`Alurkerja - Redis`
Hostname	`your-redis-host`
Port	`6379`
Heartbeat Interval	`60` seconds

RabbitMQ:

Field	Value
Friendly Name	`Alurkerja - RabbitMQ`
Hostname	`your-rabbitmq-host`
Port	`5672`
Heartbeat Interval	`60` seconds

3.4 File Storage

"Add New Monitor" → HTTP(s) type.

Field	Value
Friendly Name	`Alurkerja - File Storage`
URL	`https://your-storage-host/public/healthz`
HTTP Method	`GET`
Heartbeat Interval	`120` seconds
Expected Status Code	`200`

3.5 SSL Certificate

Uptime Kuma checks the SSL certificate automatically on every HTTP(s) monitor. Enable the expiry notification explicitly:

Open an existing monitor (or create a new one) of type HTTP(s).
Turn on "Enable SSL Certificate Expiry Notification".
Set the threshold to 30 days.

Field	Value
Friendly Name	`Alurkerja - SSL Certificate`
URL	`https://your-domain.com`
HTTP Method	`GET`
Heartbeat Interval	`86400` seconds (1 day)
SSL Expiry Threshold	`30` days

3.6 Disk & Memory (Push Monitor)

Disk and memory use the Push mechanism: the server sends a heartbeat to Uptime Kuma from a cron job.

Monitor setup:

"Add New Monitor" → Push type.

Copy the generated push URL, for example:

https://uptime-kuma.your-domain.com/api/push/xxxxxx?status=up&msg=OK&ping=

Set Heartbeat Interval to 300 seconds (5 minutes).

Push script (/opt/scripts/health-push.sh):

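#!/bin/bash

PUSH_URL="https://uptime-kuma.your-domain.com/api/push/xxxxxx"

# Highest usage percentage across mounted filesystems.
# -P keeps each filesystem on one line; /dev/loop (read-only snap images) always reports 100%.
DISK_USAGE=$(df -P | grep -vE '^Filesystem|tmpfs|cdrom|/dev/loop' \
  | awk '{ print $5 }' | sed 's/%//' | sort -n | tail -1)

# Used memory as a percentage of total memory.
MEM_TOTAL=$(free | awk '/^Mem:/ {print $2}')
MEM_USED=$(free | awk '/^Mem:/ {print $3}')
MEM_USAGE=$(( MEM_USED * 100 / MEM_TOTAL ))

STATUS="up"
MSG="Disk: ${DISK_USAGE}%, RAM: ${MEM_USAGE}%"

if [ "$DISK_USAGE" -gt 85 ] || [ "$MEM_USAGE" -gt 90 ]; then
  STATUS="down"
  MSG="ALERT - Disk: ${DISK_USAGE}%, RAM: ${MEM_USAGE}%"
fi

# -G with --data-urlencode encodes the spaces and % signs in MSG.
curl -fsS -G "${PUSH_URL}" \
  --data-urlencode "status=${STATUS}" \
  --data-urlencode "msg=${MSG}" \
  --data-urlencode "ping=" > /dev/null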

Register it as a cron job:

chmod +x /opt/scripts/health-push.sh
crontab -e
# Add:
*/5 * * * * /opt/scripts/health-push.sh >> /var/log/health-push.log 2>&1
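
The push script derives memory usage from the "used" column of free, while the Prometheus query in Section 4.7 is based on MemAvailable. If the two figures should line up, memory usage can be computed from /proc/meminfo instead. This is a sketch under that assumption; the function name mem_usage_pct is hypothetical and it expects /proc/meminfo-style text on stdin.

```shell
#!/bin/bash
# Compute memory usage (%) from /proc/meminfo-style input on stdin,
# using MemAvailable rather than the "used" column of free.
mem_usage_pct() {
  awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    END {
      if (total > 0) printf "%d\n", (total - avail) * 100 / total
      else exit 1
    }'
}

# Usage on a Linux host:
#   MEM_USAGE=$(mem_usage_pct < /proc/meminfo)
```

Reading from stdin keeps the function easy to check with sample input before it is wired into the cron script.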

3.7 Notifikasi Alert

In the "Notifications" tab of each monitor, add a notification channel (Slack, Telegram, Email, etc.).

4. Grafana + Prometheus

Use this stack for monitoring with deeper visualization, metric history, and centralized alerting.

4.1 Blackbox Exporter Setup

Prometheus Blackbox Exporter handles the HTTP and TCP probing.

docker run -d \
  --name blackbox_exporter \
  -p 9115:9115 \
  -v $(pwd)/blackbox.yml:/config/blackbox.yml \
  prom/blackbox-exporter:latest \
  --config.file=/config/blackbox.yml

blackbox.yml:

modules:
  http_post_2xx:
    prober: http
    timeout: 10s
    http:
      method: POST
      valid_status_codes: [200]
      preferred_ip_protocol: "ip4"

  http_get_2xx:
    prober: http
    timeout: 10s
    http:
      method: GET
      valid_status_codes: [200]
      preferred_ip_protocol: "ip4"

  tcp_connect:
    prober: tcp
    timeout: 10s

  https_ssl:
    prober: http
    timeout: 10s
    http:
      method: GET
      tls_config:
        insecure_skip_verify: false

4.2 API Service (12 Services)

prometheus.yml, API service job:

scrape_configs:
  - job_name: 'alurkerja-api'
    metrics_path: /probe
    params:
      module: [http_post_2xx]
    static_configs:
      - targets:
          - https://your-domain.com/api/v1/authentication/public/healthz
          - https://your-domain.com/api/v1/probis/public/healthz
          - https://your-domain.com/api/v1/compro/public/healthz
          - https://your-domain.com/api/v1/generate-test/public/healthz
          - https://your-domain.com/api/v1/integration/public/healthz
          - https://your-domain.com/api/v1/migration/public/healthz
          - https://your-domain.com/api/v1/notif/public/healthz
          - https://your-domain.com/api/v1/proxy/public/healthz
          - https://your-domain.com/api/v1/report/public/healthz
          - https://your-domain.com/api/v1/simulation/public/healthz
          - https://your-domain.com/api/v1/tenant/public/healthz
          - https://your-domain.com/api/v1/tasklist/public/healthz
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox_exporter:9115
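
The relabel rules above move each target into the target query parameter and point the scrape at the exporter, so Prometheus ends up requesting /probe?module=...&target=... on blackbox_exporter:9115. The same request can be built by hand to debug a single probe. This is a sketch only; probe_url is a hypothetical helper, and localhost:9115 assumes the exporter port is published on the machine where the command runs.

```shell
#!/bin/bash
# Build the URL that Prometheus scrapes after relabeling:
#   http://<exporter>/probe?module=<module>&target=<target>
# Targets containing "&" or "?" would need URL-encoding first.
probe_url() {
  local exporter="$1" module="$2" target="$3"
  printf 'http://%s/probe?module=%s&target=%s\n' "$exporter" "$module" "$target"
}

# Usage (assumes the exporter is reachable on localhost:9115):
#   curl -s "$(probe_url localhost:9115 http_post_2xx \
#     https://your-domain.com/api/v1/authentication/public/healthz)" | grep probe_success
```

A probe_success value of 1 in the response means the module's checks passed for that target.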

4.3 Database & Queue Service (TCP)

  - job_name: 'alurkerja-tcp'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
          - your-db-host:5432        # PostgreSQL
          - your-redis-host:6379     # Redis
          - your-rabbitmq-host:5672  # RabbitMQ
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox_exporter:9115

4.4 File Storage

  - job_name: 'alurkerja-storage'
    metrics_path: /probe
    params:
      module: [http_get_2xx]
    static_configs:
      - targets:
          - https://your-storage-host/public/healthz
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox_exporter:9115
    scrape_interval: 2m

4.5 SSL Certificate

  - job_name: 'alurkerja-ssl'
    metrics_path: /probe
    params:
      module: [https_ssl]
    static_configs:
      - targets:
          - https://your-domain.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox_exporter:9115
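    # By default Prometheus treats a series as stale 5 minutes after its last sample,
    # so a 24h interval would leave the expiry alert without data for most of the day.
    scrape_interval: 2m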

4.6 Disk & Memory (Node Exporter)

docker run -d \
  --name node_exporter \
  --pid="host" \
  -v "/:/host:ro,rslave" \
  -p 9100:9100 \
  quay.io/prometheus/node-exporter:latest \
  --path.rootfs=/host

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['your-server-host:9100']

4.7 PromQL Queries & Alerting

# API/Storage service down
probe_success{job=~"alurkerja-api|alurkerja-storage"} == 0

# TCP connection failed (DB / Queue)
probe_success{job="alurkerja-tcp"} == 0

# API response time > 3 seconds
probe_duration_seconds{job="alurkerja-api"} > 3

# SSL certificate expires in < 30 days
probe_ssl_earliest_cert_expiry - time() < 86400 * 30

# Disk usage > 85%
(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 > 85

# Memory usage > 90%
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90

Create the alert rules in Grafana: Alerting → Alert Rules → New Alert Rule, then connect them to a Contact Point (Slack, PagerDuty, email).

5. Better Uptime / BetterStack

BetterStack is a SaaS monitoring service with built-in status pages and on-call management.

5.1 API Service (12 Monitors)

Navigate to "Monitors" → "New Monitor".
Repeat for each service:

Field	Value
Monitor type	`Website / API`
Name	`Alurkerja - [service name]`
URL	The URL from the table in Section 2.1
Request type	`POST`
Request body	(empty)
Expected status code	`200`
Check frequency	`1 minute`
Regions	Singapore (or the nearest region)

5.2 Database & Queue Service (TCP)

BetterStack supports TCP checks natively:

"New Monitor" → TCP type.

Field	Value
Name	`Alurkerja - PostgreSQL` / `Redis` / `RabbitMQ`
Host	The host of each server
Port	`5432` / `6379` / `5672`
Check frequency	`1 minute`

5.3 File Storage

Field	Value
Monitor type	`Website / API`
Name	`Alurkerja - File Storage`
URL	`https://your-storage-host/public/healthz`
Request type	`GET`
Expected status code	`200`
Check frequency	`2 minutes`

5.4 SSL Certificate

BetterStack checks the SSL certificate automatically on every HTTPS monitor. Additional configuration:

Open the monitor → "SSL" tab.
Enable "Notify before SSL certificate expires".
Set the threshold to 30 days.

5.5 Disk & Memory

BetterStack does not natively support push or custom-script checks. Use the Uptime Kuma Push Monitor or Node Exporter + Grafana for this component (see Section 3.6 and Section 4.6).

5.6 On-Call & Escalation

In the "On-call" tab, configure escalation to the responsible team based on how critical each service tier is.

6. Gatus

Gatus is a lightweight, YAML-based, self-hosted monitoring tool that is Kubernetes-native and easy to keep under version control.

Installation

docker run -d \
  --name gatus \
  -p 8080:8080 \
  -v $(pwd)/config:/config \
  twinproduction/gatus:latest

`config/config.yaml`: Complete (All Monitors)

# Global alerting configuration
alerting:
  slack:
    webhook-url: "https://hooks.slack.com/services/xxx/yyy/zzz"
    default-alert:
      failure-threshold: 3
      success-threshold: 1
      send-on-resolved: true

endpoints:
  # ───────────────────────────────
  # API Service (12 Services)
  # ───────────────────────────────
  - name: authentication
    group: api-service
    url: https://your-domain.com/api/v1/authentication/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack

  - name: bpm
    group: api-service
    url: https://your-domain.com/api/v1/probis/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack

  - name: company_profile
    group: api-service
    url: https://your-domain.com/api/v1/compro/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack

  - name: generate_test
    group: api-service
    url: https://your-domain.com/api/v1/generate-test/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack

  - name: integration
    group: api-service
    url: https://your-domain.com/api/v1/integration/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack

  - name: migration
    group: api-service
    url: https://your-domain.com/api/v1/migration/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack

  - name: notification
    group: api-service
    url: https://your-domain.com/api/v1/notif/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack

  - name: proxy
    group: api-service
    url: https://your-domain.com/api/v1/proxy/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack

  - name: report
    group: api-service
    url: https://your-domain.com/api/v1/report/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack

  - name: simulation
    group: api-service
    url: https://your-domain.com/api/v1/simulation/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack

  - name: tenant_management
    group: api-service
    url: https://your-domain.com/api/v1/tenant/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack

  - name: camunda_tasklist
    group: api-service
    url: https://your-domain.com/api/v1/tasklist/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack

  # ───────────────────────────────
  # Infrastructure
  # ───────────────────────────────
  - name: postgresql
    group: infrastructure
    url: tcp://your-db-host:5432
    interval: 1m
    conditions:
      - "[CONNECTED] == true"
    alerts:
      - type: slack

  - name: redis
    group: infrastructure
    url: tcp://your-redis-host:6379
    interval: 1m
    conditions:
      - "[CONNECTED] == true"
    alerts:
      - type: slack

  - name: rabbitmq
    group: infrastructure
    url: tcp://your-rabbitmq-host:5672
    interval: 1m
    conditions:
      - "[CONNECTED] == true"
    alerts:
      - type: slack

  - name: file-storage
    group: infrastructure
    url: https://your-storage-host/public/healthz
    method: GET
    interval: 2m
    conditions:
      - "[STATUS] == 200"
    alerts:
      - type: slack

  - name: ssl-certificate
    group: infrastructure
    url: https://your-domain.com
    method: GET
    interval: 24h
    conditions:
      - "[STATUS] == 200"
      - "[CERTIFICATE_EXPIRATION] > 720h"   # 30 days
    alerts:
      - type: slack
        failure-threshold: 1
        description: "SSL certificate expires in < 30 days"

Gatus does not support disk or memory checks natively. Use Node Exporter + Grafana or the Uptime Kuma Push Monitor for those components.
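
The twelve API entries in the configuration differ only in name and base path, so they can be generated instead of maintained by hand. The sketch below prints one entry per call in the same shape as the entries above; gatus_endpoint is a hypothetical helper, and the output is meant to be pasted under endpoints: in config/config.yaml.

```shell
#!/bin/bash
# Emit one Gatus endpoint entry for an API service.
# Arguments: <name> <base path>; the domain is the page's placeholder.
gatus_endpoint() {
  local name="$1" base_path="$2"
  cat <<EOF
  - name: ${name}
    group: api-service
    url: https://your-domain.com${base_path}/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack

EOF
}

# Example: regenerate two of the twelve entries.
gatus_endpoint authentication /api/v1/authentication
gatus_endpoint bpm /api/v1/probis
```

Generating the entries keeps the interval and conditions identical across services, which makes later threshold changes a one-line edit.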

7. UptimeRobot

UptimeRobot is a SaaS service with a free tier of up to 50 monitors (minimum interval of 5 minutes on the free plan, 1 minute on Pro).

7.1 API Service (12 Monitors)

Click "+ Add New Monitor" → HTTP(s) type.
Repeat for each service:

Field	Value
Monitor Type	`HTTP(s)`
Friendly Name	`Alurkerja - [service name]`
URL	The URL from the table in Section 2.1
HTTP Method	`POST` (no body)
Expected status code	`200`
Monitoring Interval	`1 minute` (Pro) / `5 minutes` (Free)

7.2 Database & Queue Service (TCP)

"Add New Monitor" → Port type.

Component	Host	Port
PostgreSQL	`your-db-host`	`5432`
Redis	`your-redis-host`	`6379`
RabbitMQ	`your-rabbitmq-host`	`5672`

Set Monitoring Interval to 1 minute for each (Pro plan; the free plan allows a minimum of 5 minutes).

7.3 File Storage

Field	Value
Monitor Type	`HTTP(s)`
Friendly Name	`Alurkerja - File Storage`
URL	`https://your-storage-host/public/healthz`
HTTP Method	`GET`
Monitoring Interval	`5 minutes`

7.4 SSL Certificate

UptimeRobot checks the SSL certificate automatically. Enable the alert in the monitor settings:

Open the monitor → "Edit".
Tick "Alert when SSL expires in" → set it to 30 days.

7.5 Disk & Memory

UptimeRobot does not support push or custom scripts. Use the Uptime Kuma Push Monitor (see Section 3.6) for this component.

8. OneUptime

OneUptime is an all-in-one open-source observability platform (uptime monitoring, incident management, status pages). It can be self-hosted or used as SaaS.

Self-Hosted Installation

git clone https://github.com/OneUptime/oneuptime.git
cd oneuptime
cp config.env.example config.env
docker compose up -d

8.1 API Service (12 Monitors)

Navigate to "Monitors" → "Create Monitor".
Repeat for each service:

Field	Value
Monitor Type	`API`
Name	`[service name]`, for example `authentication`
URL	The URL from the table in Section 2.1
Request Method	`POST`
Request Body	(empty)
Expected Status Code	`200`
Check Interval	`1 minute`

8.2 Database & Queue Service (TCP)

"Create Monitor" → TCP type.

Field	Value
Monitor Type	`TCP`
Name	`PostgreSQL` / `Redis` / `RabbitMQ`
Host	The host of each server
Port	`5432` / `6379` / `5672`
Check Interval	`1 minute`

8.3 File Storage

Field	Value
Monitor Type	`API`
Name	`File Storage`
URL	`https://your-storage-host/public/healthz`
Request Method	`GET`
Expected Status Code	`200`
Check Interval	`2 minutes`

8.4 SSL Certificate

"Create Monitor" → Website type.
Enable "SSL Certificate Monitoring" → set the expiry alert to 30 days.

8.5 Disk & Memory

OneUptime supports a Custom Code Monitor (a Node.js script) for checking server resources:

"Create Monitor" → Custom Code type.
Enter the following script:

// This script runs on the OneUptime server, so it measures that machine's resources
const { execSync } = require('child_process');

const disk = execSync("df / | tail -1 | awk '{print $5}'")
  .toString().trim().replace('%', '');
const mem = execSync("free | awk '/^Mem:/ {printf \"%.0f\", $3/$2*100}'")
  .toString().trim();

if (parseInt(disk, 10) > 85 || parseInt(mem, 10) > 90) {
  throw new Error(`Resource alert: Disk: ${disk}%, RAM: ${mem}%`);
}

Set the interval to 5 minutes.

9.1 Nagios

API Service

define service {
    host_name               alurkerja-prod
    service_description     Health - Authentication
    check_command           check_http!-H your-domain.com -u /api/v1/authentication/public/healthz --ssl --method=POST
    check_interval          1
    max_check_attempts      3
    notification_interval   30
}

Run the check_http plugin from the command line:

/usr/lib/nagios/plugins/check_http \
  -H your-domain.com \
  -u /api/v1/authentication/public/healthz \
  --method=POST \
  --ssl \
  -w 3 -c 5

Database & Queue (TCP)

# PostgreSQL
/usr/lib/nagios/plugins/check_tcp -H your-db-host -p 5432

# Redis
/usr/lib/nagios/plugins/check_tcp -H your-redis-host -p 6379

# RabbitMQ
/usr/lib/nagios/plugins/check_tcp -H your-rabbitmq-host -p 5672

File Storage

/usr/lib/nagios/plugins/check_http \
  -H your-storage-host \
  -u /public/healthz \
  --ssl -w 5 -c 10

SSL Certificate

/usr/lib/nagios/plugins/check_http \
  -H your-domain.com \
  --ssl \
  -C 30,14    # warning at 30 days, critical at 14 days

Disk & Memory

# Disk
/usr/lib/nagios/plugins/check_disk -w 15% -c 10% -p /

# Memory (requires the check_mem plugin)
/usr/lib/nagios/plugins/check_mem -w 90 -c 95

9.2 Zabbix

API Service

"Configuration" → "Hosts" → "Items" → "Create Item".

Field	Value
Name	`Health - [service name]`
Type	`HTTP agent`
URL	The healthz endpoint URL
Request method	`POST`
Update interval	`1m`

Create a trigger: last(/host/item.key)<>200. This assumes the item stores the HTTP status code rather than the response body.

Database & Queue (TCP)

Use a Zabbix agent item with the key net.tcp.port[host,port] (the check runs from the agent's host):

Service	Key
PostgreSQL	`net.tcp.port[your-db-host,5432]`
Redis	`net.tcp.port[your-redis-host,6379]`
RabbitMQ	`net.tcp.port[your-rabbitmq-host,5672]`

Create a trigger: last(/host/net.tcp.port[...])<>1

Disk & Memory

Use the following Zabbix agent item keys:

system.cpu.util: CPU usage
vm.memory.size[pavailable]: available memory (%)
vfs.fs.size[/,pused]: disk usage (%)

Set triggers: disk > 85%, available memory < 10%.

10. Manual Verification (cURL)

Check a Single Service

curl -X POST -o /dev/null -s -w "%{http_code}\n" \
  https://your-domain.com/api/v1/authentication/public/healthz

Check TCP (Database / Queue)

nc -zv your-db-host 5432       # PostgreSQL
nc -zv your-redis-host 6379    # Redis
nc -zv your-rabbitmq-host 5672 # RabbitMQ

Check the SSL Certificate

echo | openssl s_client -connect your-domain.com:443 2>/dev/null \
  | openssl x509 -noout -dates
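
The command above prints the raw validity dates; comparing them against the 30-day threshold takes one more step. The sketch below converts the notAfter date into days remaining. The function names are hypothetical, and cert_days_left assumes GNU date (the -d option) to parse the date string printed by openssl.

```shell
#!/bin/bash
# Whole days between two Unix timestamps (integer division, truncated toward zero).
days_left() {
  local now_epoch="$1" expiry_epoch="$2"
  echo $(( (expiry_epoch - now_epoch) / 86400 ))
}

# Days until the certificate served by host $1 on port 443 expires.
# Assumes GNU date (-d) to parse the notAfter value printed by openssl.
cert_days_left() {
  local host="$1" not_after
  not_after=$(echo | openssl s_client -connect "${host}:443" 2>/dev/null \
    | openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2)
  # An empty value means the connection or the certificate parsing failed.
  [ -n "$not_after" ] || return 1
  days_left "$(date +%s)" "$(date -d "$not_after" +%s)"
}

# Usage:
#   [ "$(cert_days_left your-domain.com)" -lt 30 ] && echo "renew soon"
```

Keeping the arithmetic in its own function makes the threshold logic checkable without a network connection.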

Script to Check Everything at Once

Save as check-health.sh:

#!/bin/bash

BASE_URL="https://your-domain.com"

echo "=============================="
echo " Alurkerja Health Check"
echo " $(date)"
echo "=============================="

# ─── API Service ───────────────
declare -A API_SERVICES=(
  ["authentication"]="${BASE_URL}/api/v1/authentication/public/healthz"
  ["bpm"]="${BASE_URL}/api/v1/probis/public/healthz"
  ["company_profile"]="${BASE_URL}/api/v1/compro/public/healthz"
  ["generate_test"]="${BASE_URL}/api/v1/generate-test/public/healthz"
  ["integration"]="${BASE_URL}/api/v1/integration/public/healthz"
  ["migration"]="${BASE_URL}/api/v1/migration/public/healthz"
  ["notification"]="${BASE_URL}/api/v1/notif/public/healthz"
  ["proxy"]="${BASE_URL}/api/v1/proxy/public/healthz"
  ["report"]="${BASE_URL}/api/v1/report/public/healthz"
  ["simulation"]="${BASE_URL}/api/v1/simulation/public/healthz"
  ["tenant_management"]="${BASE_URL}/api/v1/tenant/public/healthz"
  ["camunda_tasklist"]="${BASE_URL}/api/v1/tasklist/public/healthz"
)

echo ""
echo "── API Service ──"
ALL_OK=true
for SERVICE in "${!API_SERVICES[@]}"; do
  STATUS=$(curl -X POST -o /dev/null -s -w "%{http_code}" \
    --max-time 10 "${API_SERVICES[$SERVICE]}")
  if [ "$STATUS" == "200" ]; then
    echo "  ✅ $SERVICE → $STATUS"
  else
    echo "  ❌ $SERVICE → $STATUS"
    ALL_OK=false
  fi
done

# ─── TCP Check ─────────────────
declare -A TCP_SERVICES=(
  ["postgresql"]="your-db-host:5432"
  ["redis"]="your-redis-host:6379"
  ["rabbitmq"]="your-rabbitmq-host:5672"
)

echo ""
echo "── Infrastructure (TCP) ──"
for SERVICE in "${!TCP_SERVICES[@]}"; do
  HOST=$(echo "${TCP_SERVICES[$SERVICE]}" | cut -d: -f1)
  PORT=$(echo "${TCP_SERVICES[$SERVICE]}" | cut -d: -f2)
  if nc -z -w 5 "$HOST" "$PORT" 2>/dev/null; then
    echo "  ✅ $SERVICE ($HOST:$PORT)"
  else
    echo "  ❌ $SERVICE ($HOST:$PORT): connection failed"
    ALL_OK=false
  fi
done

# ─── File Storage ───────────────
echo ""
echo "── File Storage ──"
FS_STATUS=$(curl -o /dev/null -s -w "%{http_code}" \
  --max-time 10 "https://your-storage-host/public/healthz")
if [ "$FS_STATUS" == "200" ]; then
  echo "  ✅ file-storage → $FS_STATUS"
else
  echo "  ❌ file-storage → $FS_STATUS"
  ALL_OK=false
fi

# ─── SSL Certificate ────────────
echo ""
echo "── SSL Certificate ──"
EXPIRY=$(echo | openssl s_client -connect your-domain.com:443 2>/dev/null \
  | openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2)
if [ -n "$EXPIRY" ]; then
  echo "  ✅ SSL expires: $EXPIRY"
else
  echo "  ❌ SSL check failed"
  ALL_OK=false
fi

echo ""
echo "=============================="
if [ "$ALL_OK" = true ]; then
  echo " All monitors are healthy ✅"
else
  echo " Some monitors have problems ❌"
  exit 1
fi

chmod +x check-health.sh
./check-health.sh

11. Recommended Configuration

Component	Method	Interval	Timeout	Retry	Alert Threshold
API Service (12 services)	`POST`	60 seconds	10 seconds	3	Status ≠ 200, latency > 3s
Database (PostgreSQL)	TCP	60 seconds	10 seconds	3	Connection refused
Queue (Redis / RabbitMQ)	TCP	60 seconds	10 seconds	3	Connection refused
File Storage	`GET`	120 seconds	10 seconds	3	Status ≠ 200
SSL Certificate	`GET`	Daily	10 seconds	1	Expires in < 30 days
Disk & Memory	Push/Agent	5 minutes	-	-	Disk > 85%, RAM > 90%
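
The API Service row combines two conditions: the status must be 200 and the latency must stay at or below 3 seconds. As a rough sketch of how a script could apply both with the 10-second timeout from the table, see below. The function names and the OK / SLOW / DOWN labels are illustrative and do not come from any of the tools above.

```shell
#!/bin/bash
# Classify one probe result against the thresholds in the table:
# status must be 200 and latency must not exceed 3000 ms.
classify_probe() {
  local status="$1" latency_ms="$2"
  if [ "$status" != "200" ]; then
    echo "DOWN"
  elif [ "$latency_ms" -gt 3000 ]; then
    echo "SLOW"
  else
    echo "OK"
  fi
}

# Probe a URL with POST and a 10-second timeout, then classify it.
# curl reports time_total in seconds; awk converts it to whole milliseconds.
probe() {
  local url="$1" status seconds
  read -r status seconds < <(curl -X POST -s -o /dev/null \
    --max-time 10 -w '%{http_code} %{time_total}' "$url"; echo)
  classify_probe "$status" "$(awk -v s="$seconds" 'BEGIN { printf "%d", s * 1000 }')"
}

# Usage:
#   probe https://your-domain.com/api/v1/authentication/public/healthz
```

Separating the classification from the network call means the thresholds can be verified with fixed inputs.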

12. Troubleshooting

API Service returns 404

The latest deployment has not finished or has failed: check the pod/container status.
The base path is wrong: recheck the table in Section 2.1.

API Service returns 401

The /public/healthz endpoint must be configured as a public route without authentication:

Make sure the /public/healthz route is registered outside the auth middleware.
Check the API Gateway / reverse proxy and add an exception for the /public/healthz path.
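
When checking by hand, the status code alone usually points to the right section of this troubleshooting list. The sketch below maps a code to the first thing to inspect, based on the 404 and 401 cases above; diagnose_status is a hypothetical helper, and 000 is the value curl prints for %{http_code} when no HTTP response was received.

```shell
#!/bin/bash
# Map a health-check status code to the first thing to inspect.
diagnose_status() {
  case "$1" in
    200) echo "healthy" ;;
    401) echo "check that /public/healthz is outside the auth middleware and gateway rules" ;;
    404) echo "check the deployment status and the base path" ;;
    000) echo "no HTTP response: check DNS, firewall, or timeout" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Usage:
#   code=$(curl -X POST -o /dev/null -s -w "%{http_code}" \
#     https://your-domain.com/api/v1/authentication/public/healthz)
#   diagnose_status "$code"
```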

TCP check fails (Database / Queue)

Make sure the firewall allows connections from the monitoring server to the target port.
Verify that the service is running: systemctl status postgresql / redis-cli ping / rabbitmqctl status.

SSL check reports an inaccurate expiry

Make sure the monitoring server has internet access and can reach the target domain.
Check manually: echo | openssl s_client -connect your-domain.com:443 2>/dev/null | openssl x509 -noout -dates

Monitoring tool reports DOWN while the service is running

Make sure the method used for API services is POST (not GET).
Check the IP whitelist: the monitoring tool may be blocked by a firewall.
Verify manually with cURL from the monitoring server.

False positive alerts

Raise Retries to 3 so that an alert is sent only after repeated failures.
Use a minimum interval of 1 minute to avoid noise from network fluctuations.
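
For ad-hoc scripts that have no built-in Retries setting, the same idea can be approximated with a small wrapper that reports failure only when every attempt fails. This is a sketch under that assumption; check_with_retries is a hypothetical helper and the 5-second delay in the usage line is an arbitrary choice.

```shell
#!/bin/bash
# Run a check up to <attempts> times; succeed on the first pass,
# fail only when every attempt fails.
# Arguments: <attempts> <delay seconds> <command> [args...]
check_with_retries() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      return 0
    fi
    # Wait before the next attempt, but not after the last one.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Usage: alert only after 3 consecutive failures, 5 seconds apart.
#   check_with_retries 3 5 nc -z -w 5 your-db-host 5432 || echo "ALERT"
```

A single dropped packet then produces a retry rather than a notification, which is the behavior the Retries field provides in the tools above.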
