Health Monitoring

Implementing health monitoring in Alurkerja

1. Overview

Alurkerja monitoring covers two categories:

  • API Service: 12 services, each exposing a POST /public/healthz endpoint, monitored every 60 seconds with an expected status of 200.
  • Infrastructure: supporting components (database, queue, storage, SSL, disk/memory), each with its own monitoring method and interval.

2. Monitor Reference

Use the following tables as a reference when configuring monitors in any tool.

2.1 API Service (12 Services)

The 12 services and their /public/healthz endpoints are listed below:

No  Service            Base Path               Health Endpoint
1   authentication     /api/v1/authentication  /api/v1/authentication/public/healthz
2   bpm                /api/v1/probis          /api/v1/probis/public/healthz
3   company_profile    /api/v1/compro          /api/v1/compro/public/healthz
4   generate_test      /api/v1/generate-test   /api/v1/generate-test/public/healthz
5   integration        /api/v1/integration     /api/v1/integration/public/healthz
6   migration          /api/v1/migration       /api/v1/migration/public/healthz
7   notification       /api/v1/notif           /api/v1/notif/public/healthz
8   proxy              /api/v1/proxy           /api/v1/proxy/public/healthz
9   report             /api/v1/report          /api/v1/report/public/healthz
10  simulation         /api/v1/simulation      /api/v1/simulation/public/healthz
11  tenant_management  /api/v1/tenant          /api/v1/tenant/public/healthz
12  camunda tasklist   /api/v1/tasklist        /api/v1/tasklist/public/healthz
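
Every health endpoint in the table follows the same pattern: base path plus /public/healthz. As a minimal sketch, the full URLs can be derived from the path segments instead of being typed by hand. The function names health_url and all_health_urls are illustrative assumptions, not part of Alurkerja.

```shell
# health_url BASE_URL PATH_SEGMENT
# Prints the full health-check URL for one service (hypothetical helper).
health_url() {
  base="${1%/}"   # drop a trailing slash from the base URL, if present
  printf '%s/api/v1/%s/public/healthz\n' "$base" "$2"
}

# all_health_urls BASE_URL
# Prints the 12 health-check URLs from the table, one per line.
all_health_urls() {
  for segment in authentication probis compro generate-test integration \
                 migration notif proxy report simulation tenant tasklist; do
    health_url "$1" "$segment"
  done
}
```

For example, `all_health_urls https://your-domain.com` prints a list that can be pasted into any of the tools described in the following sections.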

2.2 Infrastructure Components

Component              Monitor Type  Method  Interval     Alert Threshold        Target
Database (PostgreSQL)  TCP Port      -       60 seconds   Connection refused     your-db-host:5432
Queue (Redis)          TCP Port      -       60 seconds   Connection refused     your-redis-host:6379
Queue (RabbitMQ)       TCP Port      -       60 seconds   Connection refused     your-rabbitmq-host:5672
File Storage           HTTP(s)       GET     120 seconds  Status ≠ 200           https://your-storage-host/public/healthz
SSL Certificate        SSL/TLS       GET     Daily        Expires in < 30 days   https://your-domain.com
Disk & Memory          Push (Cron)   -       5 minutes    Disk > 85%, RAM > 90%  Push to Uptime Kuma

3. Uptime Kuma

Uptime Kuma is a lightweight, self-hosted monitoring tool that is easy to configure.

3.1 API Service

Repeat the following steps for each service:

  1. Click "Add New Monitor" → select the HTTP(s) type.
  2. Fill in the configuration:

Field                 Value
Monitor Type          HTTP(s)
Friendly Name         Alurkerja - [service name], e.g. Alurkerja - Authentication
URL                   URL from the table in Section 2.1
HTTP Method           POST (no body)
Heartbeat Interval    60 seconds
Retries               3
Expected Status Code  200

3.2 Database (PostgreSQL)

  1. "Add New Monitor" → select the TCP Port type.

Field               Value
Friendly Name       Alurkerja - PostgreSQL
Hostname            your-db-host
Port                5432
Heartbeat Interval  60 seconds
Retries             3

3.3 Queue Service (Redis & RabbitMQ)

Create two separate monitors of type TCP Port:

Redis:

Field               Value
Friendly Name       Alurkerja - Redis
Hostname            your-redis-host
Port                6379
Heartbeat Interval  60 seconds

RabbitMQ:

Field               Value
Friendly Name       Alurkerja - RabbitMQ
Hostname            your-rabbitmq-host
Port                5672
Heartbeat Interval  60 seconds

3.4 File Storage

  1. "Add New Monitor" → select the HTTP(s) type.

Field                 Value
Friendly Name         Alurkerja - File Storage
URL                   https://your-storage-host/public/healthz
HTTP Method           GET
Heartbeat Interval    120 seconds
Expected Status Code  200

3.5 SSL Certificate

Uptime Kuma checks the SSL certificate automatically on every HTTP(s) monitor. Enable the SSL notification explicitly:

  1. Open an existing monitor (or create a new one) → HTTP(s) type.
  2. Enable "Enable SSL Certificate Expiry Notification".
  3. Set the threshold to 30 days.

Field                 Value
Friendly Name         Alurkerja - SSL Certificate
URL                   https://your-domain.com
HTTP Method           GET
Heartbeat Interval    86400 seconds (1 day)
SSL Expiry Threshold  30 days

3.6 Disk & Memory (Push Monitor)

Disk and memory use the Push mechanism: the server sends a heartbeat to Uptime Kuma from a cron job.

Monitor setup:

  1. "Add New Monitor" → select the Push type.
  2. Copy the generated push URL, for example:
    https://uptime-kuma.your-domain.com/api/push/xxxxxx?status=up&msg=OK&ping=
  3. Set Heartbeat Interval to 300 seconds (5 minutes).

Push script (/opt/scripts/health-push.sh):

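#!/bin/bash
# Pushes disk and memory usage to an Uptime Kuma Push monitor.

PUSH_URL="https://uptime-kuma.your-domain.com/api/push/xxxxxx"

# Highest disk usage (%) across real filesystems.
# -P keeps every filesystem on a single line so the column index is stable.
DISK_USAGE=$(df -P | grep -vE '^Filesystem|tmpfs|cdrom' \
  | awk '{ print $5 }' | sed 's/%//' | sort -n | tail -1)

# Memory usage (%) computed from the "Mem:" row of free.
MEM_TOTAL=$(free | awk '/^Mem:/ {print $2}')
MEM_USED=$(free | awk '/^Mem:/ {print $3}')
MEM_USAGE=$(( MEM_USED * 100 / MEM_TOTAL ))

STATUS="up"
MSG="Disk: ${DISK_USAGE}%, RAM: ${MEM_USAGE}%"

# Report "down" when either threshold is exceeded.
if [ "$DISK_USAGE" -gt 85 ] || [ "$MEM_USAGE" -gt 90 ]; then
  STATUS="down"
  MSG="ALERT - Disk: ${DISK_USAGE}%, RAM: ${MEM_USAGE}%"
fi

# MSG contains spaces and "%" characters, so let curl URL-encode the query string.
curl -fsS -G "${PUSH_URL}" \
  --data-urlencode "status=${STATUS}" \
  --data-urlencode "msg=${MSG}" \
  --data-urlencode "ping=" > /dev/null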

Register the script as a cron job:

chmod +x /opt/scripts/health-push.sh
crontab -e
# Add:
*/5 * * * * /opt/scripts/health-push.sh >> /var/log/health-push.log 2>&1
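
The threshold logic of the push script can be isolated so its boundaries are easy to check. The sketch below assumes integer percentages and uses the hypothetical helper names mem_usage_pct and resource_status. It mirrors the script's strict "greater than" comparison, so exactly 85% disk or 90% RAM still reports "up".

```shell
# mem_usage_pct USED TOTAL
# Integer percentage of memory in use (same arithmetic as the push script).
mem_usage_pct() {
  echo $(( $1 * 100 / $2 ))
}

# resource_status DISK_PCT MEM_PCT
# Prints "down" when disk > 85 or memory > 90, otherwise "up".
resource_status() {
  if [ "$1" -gt 85 ] || [ "$2" -gt 90 ]; then
    echo "down"
  else
    echo "up"
  fi
}
```

Because the division is integer division, the percentage is rounded down, so an actual usage of 90.9% is still treated as 90%.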

3.7 Alert Notifications

In the "Notifications" tab of each monitor, add a notification channel (Slack, Telegram, Email, etc.).

4. Grafana + Prometheus

Use this stack for monitoring with in-depth visualization, metric history, and centralized alerting.

4.1 Blackbox Exporter Setup

The Prometheus Blackbox Exporter is used for HTTP and TCP probing.

docker run -d \
  --name blackbox_exporter \
  -p 9115:9115 \
  -v $(pwd)/blackbox.yml:/config/blackbox.yml \
  prom/blackbox-exporter:latest \
  --config.file=/config/blackbox.yml

blackbox.yml:

modules:
  http_post_2xx:
    prober: http
    timeout: 10s
    http:
      method: POST
      valid_status_codes: [200]
      preferred_ip_protocol: "ip4"

  http_get_2xx:
    prober: http
    timeout: 10s
    http:
      method: GET
      valid_status_codes: [200]
      preferred_ip_protocol: "ip4"

  tcp_connect:
    prober: tcp
    timeout: 10s

  https_ssl:
    prober: http
    timeout: 10s
    http:
      method: GET
      tls_config:
        insecure_skip_verify: false

4.2 API Service (12 Services)

prometheus.yml, API service job:

scrape_configs:
  - job_name: 'alurkerja-api'
    metrics_path: /probe
    params:
      module: [http_post_2xx]
    static_configs:
      - targets:
          - https://your-domain.com/api/v1/authentication/public/healthz
          - https://your-domain.com/api/v1/probis/public/healthz
          - https://your-domain.com/api/v1/compro/public/healthz
          - https://your-domain.com/api/v1/generate-test/public/healthz
          - https://your-domain.com/api/v1/integration/public/healthz
          - https://your-domain.com/api/v1/migration/public/healthz
          - https://your-domain.com/api/v1/notif/public/healthz
          - https://your-domain.com/api/v1/proxy/public/healthz
          - https://your-domain.com/api/v1/report/public/healthz
          - https://your-domain.com/api/v1/simulation/public/healthz
          - https://your-domain.com/api/v1/tenant/public/healthz
          - https://your-domain.com/api/v1/tasklist/public/healthz
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox_exporter:9115
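
The relabel rules above move each target into the `target` query parameter and point the scrape at the exporter itself. As a rough sketch, the resulting request can be rebuilt by hand to test a module before Prometheus scrapes it. The helper name probe_url and its default exporter address are assumptions for illustration.

```shell
# probe_url MODULE TARGET [EXPORTER_ADDRESS]
# Prints the Blackbox Exporter URL equivalent to what Prometheus scrapes
# after relabeling (hypothetical helper).
probe_url() {
  exporter="${3:-blackbox_exporter:9115}"
  printf 'http://%s/probe?module=%s&target=%s\n' "$exporter" "$1" "$2"
}
```

For example, `curl -s "$(probe_url http_post_2xx https://your-domain.com/api/v1/authentication/public/healthz localhost:9115)" | grep probe_success` should show the same probe_success metric that the alert queries rely on.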

4.3 Database & Queue Service (TCP)

  - job_name: 'alurkerja-tcp'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
          - your-db-host:5432        # PostgreSQL
          - your-redis-host:6379     # Redis
          - your-rabbitmq-host:5672  # RabbitMQ
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox_exporter:9115

4.4 File Storage

  - job_name: 'alurkerja-storage'
    metrics_path: /probe
    params:
      module: [http_get_2xx]
    static_configs:
      - targets:
          - https://your-storage-host/public/healthz
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox_exporter:9115
    scrape_interval: 2m

4.5 SSL Certificate

  - job_name: 'alurkerja-ssl'
    metrics_path: /probe
    params:
      module: [https_ssl]
    static_configs:
      - targets:
          - https://your-domain.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox_exporter:9115
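    # Note: Prometheus instant queries look back only 5 minutes by default,
    # so with a 24h interval the expiry alert has data only shortly after each
    # scrape; consider a shorter interval if the alert shows no data.
    scrape_interval: 24h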

4.6 Disk & Memory (Node Exporter)

docker run -d \
  --name node_exporter \
  --pid="host" \
  -v "/:/host:ro,rslave" \
  -p 9100:9100 \
  quay.io/prometheus/node-exporter:latest \
  --path.rootfs=/host

prometheus.yml, Node Exporter job:

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['your-server-host:9100']

4.7 PromQL Queries & Alerting

# API/Storage service down
probe_success{job=~"alurkerja-api|alurkerja-storage"} == 0

# TCP connection failed (DB / Queue)
probe_success{job="alurkerja-tcp"} == 0

# API response time > 3 seconds
probe_duration_seconds{job="alurkerja-api"} > 3

# SSL certificate expires in < 30 days
probe_ssl_earliest_cert_expiry - time() < 86400 * 30

# Disk usage > 85%
(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 > 85

# Memory usage > 90%
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90

Create the alert rules in Grafana: Alerting → Alert Rules → New Alert Rule, then connect them to a Contact Point (Slack, PagerDuty, email).
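
The SSL query compares the seconds remaining until expiry against 86400 * 30 = 2592000 seconds, i.e. 30 days. The same arithmetic is sketched below with epoch timestamps. The function names are hypothetical, and the inputs are assumed to be integer Unix timestamps.

```shell
# days_until_expiry EXPIRY_EPOCH NOW_EPOCH
# Whole days left before the certificate expires.
days_until_expiry() {
  echo $(( ($1 - $2) / 86400 ))
}

# expires_soon EXPIRY_EPOCH NOW_EPOCH [DAYS]
# Succeeds (exit 0) when fewer than DAYS days remain (default 30),
# matching: probe_ssl_earliest_cert_expiry - time() < 86400 * 30
expires_soon() {
  threshold_days="${3:-30}"
  [ $(( $1 - $2 )) -lt $(( 86400 * $threshold_days )) ]
}
```

A call such as `expires_soon "$expiry" "$(date +%s)" && echo "renew certificate"` reproduces the alert condition outside Prometheus.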

5. Better Uptime / BetterStack

BetterStack is a SaaS monitoring service with built-in status pages and on-call management.

5.1 API Service (12 Monitors)

  1. Navigate to "Monitors" → "New Monitor".
  2. Repeat for each service:

Field                 Value
Monitor type          Website / API
Name                  Alurkerja - [service name]
URL                   URL from the table in Section 2.1
Request type          POST
Request body          (empty)
Expected status code  200
Check frequency       1 minute
Regions               Singapore (or the nearest region)

5.2 Database & Queue Service (TCP)

BetterStack supports TCP checks natively:

  1. "New Monitor" → select the TCP type.

Field            Value
Name             Alurkerja - PostgreSQL / Redis / RabbitMQ
Host             Host of each server
Port             5432 / 6379 / 5672
Check frequency  1 minute

5.3 File Storage

Field                 Value
Monitor type          Website / API
Name                  Alurkerja - File Storage
URL                   https://your-storage-host/public/healthz
Request type          GET
Expected status code  200
Check frequency       2 minutes

5.4 SSL Certificate

BetterStack checks the SSL certificate automatically on every HTTPS monitor. Additional configuration:

  1. Open the monitor → "SSL" tab.
  2. Enable "Notify before SSL certificate expires".
  3. Set the threshold to 30 days.

5.5 Disk & Memory

BetterStack does not natively support push/custom scripts. Use the Uptime Kuma Push Monitor or Node Exporter + Grafana for these components (see Section 3.6 and Section 4.6).

5.6 On-Call & Escalation

In the "On-call" tab, configure escalation to the relevant teams based on each service's criticality tier.


6. Gatus

Gatus is a lightweight, YAML-based, self-hosted monitoring tool that is Kubernetes-native and easy to keep under version control.

Installation

docker run -d \
  --name gatus \
  -p 8080:8080 \
  -v $(pwd)/config:/config \
  twinproduction/gatus:latest

config/config.yaml: Complete (All Monitors)

# Global alerting configuration
alerting:
  slack:
    webhook-url: "https://hooks.slack.com/services/xxx/yyy/zzz"
    default-alert:
      failure-threshold: 3
      success-threshold: 1
      send-on-resolved: true

endpoints:
  # ───────────────────────────────
  # API Service (12 Service)
  # ───────────────────────────────
  - name: authentication
    group: api-service
    url: https://your-domain.com/api/v1/authentication/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack

  - name: bpm
    group: api-service
    url: https://your-domain.com/api/v1/probis/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack

  - name: company_profile
    group: api-service
    url: https://your-domain.com/api/v1/compro/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack

  - name: generate_test
    group: api-service
    url: https://your-domain.com/api/v1/generate-test/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack

  - name: integration
    group: api-service
    url: https://your-domain.com/api/v1/integration/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack

  - name: migration
    group: api-service
    url: https://your-domain.com/api/v1/migration/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack

  - name: notification
    group: api-service
    url: https://your-domain.com/api/v1/notif/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack

  - name: proxy
    group: api-service
    url: https://your-domain.com/api/v1/proxy/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack

  - name: report
    group: api-service
    url: https://your-domain.com/api/v1/report/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack

  - name: simulation
    group: api-service
    url: https://your-domain.com/api/v1/simulation/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack

  - name: tenant_management
    group: api-service
    url: https://your-domain.com/api/v1/tenant/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack

  - name: camunda_tasklist
    group: api-service
    url: https://your-domain.com/api/v1/tasklist/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack

  # ───────────────────────────────
  # Infrastructure
  # ───────────────────────────────
  - name: postgresql
    group: infrastructure
    url: tcp://your-db-host:5432
    interval: 1m
    conditions:
      - "[CONNECTED] == true"
    alerts:
      - type: slack

  - name: redis
    group: infrastructure
    url: tcp://your-redis-host:6379
    interval: 1m
    conditions:
      - "[CONNECTED] == true"
    alerts:
      - type: slack

  - name: rabbitmq
    group: infrastructure
    url: tcp://your-rabbitmq-host:5672
    interval: 1m
    conditions:
      - "[CONNECTED] == true"
    alerts:
      - type: slack

  - name: file-storage
    group: infrastructure
    url: https://your-storage-host/public/healthz
    method: GET
    interval: 2m
    conditions:
      - "[STATUS] == 200"
    alerts:
      - type: slack

  - name: ssl-certificate
    group: infrastructure
    url: https://your-domain.com
    method: GET
    interval: 24h
    conditions:
      - "[STATUS] == 200"
      - "[CERTIFICATE_EXPIRATION] > 720h"   # 30 days
    alerts:
      - type: slack
        failure-threshold: 1
        description: "SSL certificate expires in < 30 days"

Gatus does not natively support disk/memory checks. Use Node Exporter + Grafana or the Uptime Kuma Push Monitor for those components.
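
The 12 API entries in config.yaml differ only in name and path segment, so they can be generated rather than copied. The sketch below prints one endpoint entry in the same shape as the configuration above. The function name gatus_api_endpoint and the optional base-URL argument are assumptions for illustration.

```shell
# gatus_api_endpoint NAME PATH_SEGMENT [BASE_URL]
# Prints one Gatus endpoint entry for an API service (hypothetical helper).
gatus_api_endpoint() {
  base="${3:-https://your-domain.com}"
  cat <<EOF
  - name: $1
    group: api-service
    url: $base/api/v1/$2/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack
EOF
}
```

Calling `gatus_api_endpoint bpm probis` emits the entry for the bpm service. The output still has to be placed under the `endpoints:` key by hand or by a wrapper script.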

7. UptimeRobot

UptimeRobot is a SaaS service with a free tier of up to 50 monitors (minimum interval of 5 minutes on the free tier, 1 minute on Pro).

7.1 API Service (12 Monitors)

  1. Click "+ Add New Monitor" → select the HTTP(s) type.
  2. Repeat for each service:

Field                 Value
Monitor Type          HTTP(s)
Friendly Name         Alurkerja - [service name]
URL                   URL from the table in Section 2.1
HTTP Method           POST (no body)
Expected status code  200
Monitoring Interval   1 minute (Pro) / 5 minutes (Free)

7.2 Database & Queue Service (TCP)

  1. "Add New Monitor" → select the Port type.

Component   Host                Port
PostgreSQL  your-db-host        5432
Redis       your-redis-host     6379
RabbitMQ    your-rabbitmq-host  5672

Set Monitoring Interval to 1 minute for each (Pro; 5 minutes on the free tier).

7.3 File Storage

Field                Value
Monitor Type         HTTP(s)
Friendly Name        Alurkerja - File Storage
URL                  https://your-storage-host/public/healthz
HTTP Method          GET
Monitoring Interval  5 minutes

7.4 SSL Certificate

UptimeRobot checks the SSL certificate automatically. Enable the alert in the monitor settings:

  1. Open the monitor → "Edit".
  2. Tick "Alert when SSL expires in" → set it to 30 days.

7.5 Disk & Memory

UptimeRobot does not support push or custom scripts. Use the Uptime Kuma Push Monitor (see Section 3.6) for these components.

8. OneUptime

OneUptime is an open-source, all-in-one observability platform (uptime monitoring, incident management, status pages). It can be self-hosted or used as SaaS.

Self-Hosted Installation

git clone https://github.com/OneUptime/oneuptime.git
cd oneuptime
cp config.env.example config.env
docker compose up -d

8.1 API Service (12 Monitors)

  1. Navigate to "Monitors" → "Create Monitor".
  2. Repeat for each service:

Field                 Value
Monitor Type          API
Name                  [service name], e.g. authentication
URL                   URL from the table in Section 2.1
Request Method        POST
Request Body          (empty)
Expected Status Code  200
Check Interval        1 minute

8.2 Database & Queue Service (TCP)

  1. "Create Monitor" → select the TCP type.

Field           Value
Monitor Type    TCP
Name            PostgreSQL / Redis / RabbitMQ
Host            Host of each server
Port            5432 / 6379 / 5672
Check Interval  1 minute

8.3 File Storage

Field                 Value
Monitor Type          API
Name                  File Storage
URL                   https://your-storage-host/public/healthz
Request Method        GET
Expected Status Code  200
Check Interval        2 minutes

8.4 SSL Certificate

  1. "Create Monitor" → select the Website type.
  2. Enable "SSL Certificate Monitoring" → set the expiry alert to 30 days.

8.5 Disk & Memory

OneUptime supports a Custom Code Monitor (Node.js script) for checking server resources:

  1. "Create Monitor" → select the Custom Code type.
  2. Enter the following script:

// The script runs on the OneUptime server
const { execSync } = require('child_process');

// Disk usage (%) of the root filesystem
const disk = execSync("df / | tail -1 | awk '{print $5}'")
  .toString().trim().replace('%', '');
// Memory usage (%) from the "Mem:" row of free
const mem = execSync("free | awk '/^Mem:/ {printf \"%.0f\", $3/$2*100}'")
  .toString().trim();

// Fail the check when either threshold is exceeded
if (parseInt(disk, 10) > 85 || parseInt(mem, 10) > 90) {
  throw new Error(`Resource alert: Disk: ${disk}%, RAM: ${mem}%`);
}

  3. Set the interval to 5 minutes.

8.6 Status Page & On-Call

Connect all monitors to the configured Status Page and On-Call Policy so that incidents are escalated automatically to the relevant teams.

9. Checkmk / Nagios / Zabbix

9.1 Nagios / Checkmk

API Service

Add a service definition to the Nagios config for each service:

define service {
    host_name               alurkerja-prod
    service_description     Health - Authentication
    check_command           check_http!-H your-domain.com -u /api/v1/authentication/public/healthz --ssl --method=POST
    check_interval          1
    max_check_attempts      3
    notification_interval   30
}
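
Since the definition has to be repeated for all 12 services, a small generator can avoid copy-paste errors. The sketch below reproduces the definition above with the description and path segment filled in. The function name nagios_health_service is hypothetical, and the host names are the same placeholders used in this document.

```shell
# nagios_health_service DESCRIPTION PATH_SEGMENT
# Prints one Nagios service definition for an API health check
# (hypothetical helper; mirrors the definition shown above).
nagios_health_service() {
  cat <<EOF
define service {
    host_name               alurkerja-prod
    service_description     Health - $1
    check_command           check_http!-H your-domain.com -u /api/v1/$2/public/healthz --ssl --method=POST
    check_interval          1
    max_check_attempts      3
    notification_interval   30
}
EOF
}
```

For instance, `nagios_health_service Notification notif >> services.cfg` appends the definition for the notification service. The file name services.cfg is only an example.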

Run the check_http plugin from the command line:

/usr/lib/nagios/plugins/check_http \
  -H your-domain.com \
  -u /api/v1/authentication/public/healthz \
  --method=POST \
  --ssl \
  -w 3 -c 5

Database & Queue (TCP)

# PostgreSQL
/usr/lib/nagios/plugins/check_tcp -H your-db-host -p 5432

# Redis
/usr/lib/nagios/plugins/check_tcp -H your-redis-host -p 6379

# RabbitMQ
/usr/lib/nagios/plugins/check_tcp -H your-rabbitmq-host -p 5672

File Storage

/usr/lib/nagios/plugins/check_http \
  -H your-storage-host \
  -u /public/healthz \
  --ssl -w 5 -c 10

SSL Certificate

/usr/lib/nagios/plugins/check_http \
  -H your-domain.com \
  --ssl \
  -C 30,14    # warning at 30 days, critical at 14 days

Disk & Memory

# Disk
/usr/lib/nagios/plugins/check_disk -w 15% -c 10% -p /

# Memory (requires the check_mem plugin)
/usr/lib/nagios/plugins/check_mem -w 90 -c 95

9.2 Zabbix

API Service

  1. "Configuration" → "Hosts" → "Items" → "Create Item".

Field            Value
Name             Health - [service name]
Type             HTTP agent
URL              healthz endpoint URL
Request method   POST
Update interval  1m

Create a trigger: last(/host/item.key)<>200

Database & Queue (TCP)

Use an item with the key net.tcp.port[host,port] (this is a Zabbix agent key; the agentless Simple check equivalent is net.tcp.service[tcp,host,port]):

Service     Key
PostgreSQL  net.tcp.port[your-db-host,5432]
Redis       net.tcp.port[your-redis-host,6379]
RabbitMQ    net.tcp.port[your-rabbitmq-host,5672]

Create a trigger: last(/host/net.tcp.port[...])<>1

File Storage

Use an HTTP agent item with the GET method and the storage endpoint URL; trigger when the status is <>200.

SSL Certificate

Use an item with the key net.tcp.port[your-domain.com,443] as a basic check, or install the Zabbix Agent with an SSL certificate template to check expiry.

Disk & Memory

Use the Zabbix Agent with the built-in template "Linux by Zabbix agent":

  • system.cpu.util: CPU usage
  • vm.memory.size[pavailable]: available memory
  • vfs.fs.size[/,pused]: disk usage

Set the triggers: disk > 85%, available memory < 10%.

10. Manual Verification (cURL)

Check a Single Service

curl -X POST -o /dev/null -s -w "%{http_code}\n" \
  https://your-domain.com/api/v1/authentication/public/healthz

Check TCP (Database / Queue)

nc -zv your-db-host 5432       # PostgreSQL
nc -zv your-redis-host 6379    # Redis
nc -zv your-rabbitmq-host 5672 # RabbitMQ

Check the SSL Certificate

echo | openssl s_client -connect your-domain.com:443 2>/dev/null \
  | openssl x509 -noout -dates

Script to Check Everything at Once

Save as check-health.sh:

#!/bin/bash

BASE_URL="https://your-domain.com"

echo "=============================="
echo " Alurkerja Health Check"
echo " $(date)"
echo "=============================="

# ─── API Service ───────────────
declare -A API_SERVICES=(
  ["authentication"]="${BASE_URL}/api/v1/authentication/public/healthz"
  ["bpm"]="${BASE_URL}/api/v1/probis/public/healthz"
  ["company_profile"]="${BASE_URL}/api/v1/compro/public/healthz"
  ["generate_test"]="${BASE_URL}/api/v1/generate-test/public/healthz"
  ["integration"]="${BASE_URL}/api/v1/integration/public/healthz"
  ["migration"]="${BASE_URL}/api/v1/migration/public/healthz"
  ["notification"]="${BASE_URL}/api/v1/notif/public/healthz"
  ["proxy"]="${BASE_URL}/api/v1/proxy/public/healthz"
  ["report"]="${BASE_URL}/api/v1/report/public/healthz"
  ["simulation"]="${BASE_URL}/api/v1/simulation/public/healthz"
  ["tenant_management"]="${BASE_URL}/api/v1/tenant/public/healthz"
  ["camunda_tasklist"]="${BASE_URL}/api/v1/tasklist/public/healthz"
)

echo ""
echo "── API Service ──"
ALL_OK=true
for SERVICE in "${!API_SERVICES[@]}"; do
  STATUS=$(curl -X POST -o /dev/null -s -w "%{http_code}" \
    --max-time 10 "${API_SERVICES[$SERVICE]}")
  if [ "$STATUS" == "200" ]; then
    echo "  ✅ $SERVICE → $STATUS"
  else
    echo "  ❌ $SERVICE → $STATUS"
    ALL_OK=false
  fi
done

# ─── TCP Check ─────────────────
declare -A TCP_SERVICES=(
  ["postgresql"]="your-db-host:5432"
  ["redis"]="your-redis-host:6379"
  ["rabbitmq"]="your-rabbitmq-host:5672"
)

echo ""
echo "── Infrastructure (TCP) ──"
for SERVICE in "${!TCP_SERVICES[@]}"; do
  HOST=$(echo "${TCP_SERVICES[$SERVICE]}" | cut -d: -f1)
  PORT=$(echo "${TCP_SERVICES[$SERVICE]}" | cut -d: -f2)
  if nc -z -w 5 "$HOST" "$PORT" 2>/dev/null; then
    echo "  ✅ $SERVICE ($HOST:$PORT)"
  else
    echo "  ❌ $SERVICE ($HOST:$PORT) — connection refused"
    ALL_OK=false
  fi
done

# ─── File Storage ───────────────
echo ""
echo "── File Storage ──"
FS_STATUS=$(curl -o /dev/null -s -w "%{http_code}" \
  --max-time 10 "https://your-storage-host/public/healthz")
if [ "$FS_STATUS" == "200" ]; then
  echo "  ✅ file-storage → $FS_STATUS"
else
  echo "  ❌ file-storage → $FS_STATUS"
  ALL_OK=false
fi

# ─── SSL Certificate ────────────
echo ""
echo "── SSL Certificate ──"
EXPIRY=$(echo | openssl s_client -connect your-domain.com:443 2>/dev/null \
  | openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2)
if [ -n "$EXPIRY" ]; then
  echo "  ✅ SSL expires: $EXPIRY"
else
  echo "  ❌ SSL check failed"
  ALL_OK=false
fi

echo ""
echo "=============================="
if [ "$ALL_OK" = true ]; then
  echo " All monitors are healthy ✅"
else
  echo " Some monitors have problems ❌"
  exit 1
fi

Make the script executable and run it:

chmod +x check-health.sh
./check-health.sh

11. Recommended Configuration

Component                  Method      Interval     Timeout     Retries  Alert Threshold
API Service (12 services)  POST        60 seconds   10 seconds  3        Status ≠ 200, latency > 3s
Database (PostgreSQL)      TCP         60 seconds   10 seconds  3        Connection refused
Queue (Redis / RabbitMQ)   TCP         60 seconds   10 seconds  3        Connection refused
File Storage               GET         120 seconds  10 seconds  3        Status ≠ 200
SSL Certificate            GET         Daily        10 seconds  1        Expires in < 30 days
Disk & Memory              Push/Agent  5 minutes    -           -        Disk > 85%, RAM > 90%

12. Troubleshooting

API Service returns 404

  • The latest deployment has not finished or has failed: check the pod/container status.
  • The base path is wrong: recheck the table in Section 2.1.

API Service returns 401

The /public/healthz endpoint must be configured as a public route without authentication:

  • Make sure the /public/healthz route is registered outside the auth middleware.
  • Check the API gateway / reverse proxy and add an exception for the /public/healthz path.

TCP check fails (Database / Queue)

  • Make sure the firewall allows connections from the monitoring server to the target port.
  • Verify that the service is running: systemctl status postgresql / redis-cli ping / rabbitmqctl status.

SSL check shows an inaccurate expiry

  • Make sure the monitoring server has internet access so it can reach the target domain.
  • Check manually: echo | openssl s_client -connect your-domain.com:443 2>/dev/null | openssl x509 -noout -dates

Monitoring tool shows DOWN although the service is running

  • Make sure the method is POST for API services (not GET).
  • Check the IP whitelist: the monitoring tool may be blocked by the firewall.
  • Verify manually with cURL from the monitoring server.

False positive alerts

  • Raise Retries to 3 so that an alert is sent only after repeated failures.
  • Use a minimum interval of 1 minute to avoid noise from network fluctuations.
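
The effect of the Retries setting can be illustrated with a small sketch: an alert fires only when the most recent checks form an unbroken run of failures. The function name should_alert is hypothetical, and the tools above implement their own retry logic; this only models the idea.

```shell
# should_alert RETRIES CODE...
# Succeeds (exit 0) when the trailing run of non-200 codes is at least
# RETRIES long; a single 200 resets the run.
should_alert() {
  retries="$1"
  shift
  streak=0
  for code in "$@"; do
    if [ "$code" = "200" ]; then
      streak=0
    else
      streak=$(( $streak + 1 ))
    fi
  done
  [ "$streak" -ge "$retries" ]
}
```

With Retries set to 3, the sequence 500 500 200 500 does not alert, because the successful check in between resets the failure count.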
