Health Monitoring
Health monitoring implementation for Alurkerja
1. Overview
Alurkerja monitoring covers two categories:
- API services: 12 services, each exposing a POST /public/healthz endpoint, checked every 60 seconds with an expected status of 200.
- Infrastructure: supporting components (database, queue, storage, SSL, disk/memory), each with its own monitoring method and interval.
2. Monitor Reference
Use the tables below as the reference when configuring monitors in any tool.
2.1 API Services (12 Services)
The 12 services and their /public/healthz endpoints:
| No | Service | Base Path | Health Endpoint |
|---|---|---|---|
| 1 | authentication | /api/v1/authentication | /api/v1/authentication/public/healthz |
| 2 | bpm | /api/v1/probis | /api/v1/probis/public/healthz |
| 3 | company_profile | /api/v1/compro | /api/v1/compro/public/healthz |
| 4 | generate_test | /api/v1/generate-test | /api/v1/generate-test/public/healthz |
| 5 | integration | /api/v1/integration | /api/v1/integration/public/healthz |
| 6 | migration | /api/v1/migration | /api/v1/migration/public/healthz |
| 7 | notification | /api/v1/notif | /api/v1/notif/public/healthz |
| 8 | proxy | /api/v1/proxy | /api/v1/proxy/public/healthz |
| 9 | report | /api/v1/report | /api/v1/report/public/healthz |
| 10 | simulation | /api/v1/simulation | /api/v1/simulation/public/healthz |
| 11 | tenant_management | /api/v1/tenant | /api/v1/tenant/public/healthz |
| 12 | camunda tasklist | /api/v1/tasklist | /api/v1/tasklist/public/healthz |
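Every health endpoint in the table is the service's base path followed by /public/healthz. The sketch below derives the full URLs from the base paths. It assumes the your-domain.com placeholder used throughout this document. `health_url` and `ALURKERJA_BASE_PATHS` are hypothetical names chosen for illustration and are not part of Alurkerja.

```bash
#!/usr/bin/env bash
# Sketch: derive each service's health URL from its base path.
# BASE_URL is a placeholder; health_url is a hypothetical helper name.
BASE_URL="${BASE_URL:-https://your-domain.com}"

# Base paths copied from the table in Section 2.1.
ALURKERJA_BASE_PATHS=(
  /api/v1/authentication
  /api/v1/probis
  /api/v1/compro
  /api/v1/generate-test
  /api/v1/integration
  /api/v1/migration
  /api/v1/notif
  /api/v1/proxy
  /api/v1/report
  /api/v1/simulation
  /api/v1/tenant
  /api/v1/tasklist
)

# health_url BASE PATH -> prints BASE + PATH + /public/healthz
health_url() {
  local base="${1%/}"   # drop one trailing slash from the base URL
  printf '%s%s/public/healthz\n' "$base" "$2"
}

for path in "${ALURKERJA_BASE_PATHS[@]}"; do
  health_url "$BASE_URL" "$path"
done
```

Redirecting the output to a file gives a target list you can paste into the Prometheus, Gatus, or cURL examples in this document. Overriding `BASE_URL` points the list at another environment.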
2.2 Infrastructure Components
| Component | Monitor Type | Method | Interval | Alert Threshold | Target |
|---|---|---|---|---|---|
| Database (PostgreSQL) | TCP Port | N/A | 60 seconds | Connection refused | your-db-host:5432 |
| Queue (Redis) | TCP Port | N/A | 60 seconds | Connection refused | your-redis-host:6379 |
| Queue (RabbitMQ) | TCP Port | N/A | 60 seconds | Connection refused | your-rabbitmq-host:5672 |
| File Storage | HTTP(s) | GET | 120 seconds | Status ≠ 200 | https://your-storage-host/public/healthz |
| SSL Certificate | SSL/TLS | GET | Daily | Expires in < 30 days | https://your-domain.com |
| Disk & Memory | Push (cron) | N/A | 5 minutes | Disk > 85%, RAM > 90% | Push to Uptime Kuma |
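The Disk & Memory thresholds above reduce to one rule: the check fails when disk usage is above 85% or RAM usage is above 90%. The sketch below expresses that rule as a shell function. It assumes integer percentages as input. `resource_status` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch of the Disk & Memory alert rule from Section 2.2.
# resource_status is a hypothetical helper, not part of any tool.
DISK_LIMIT=85   # alert when disk usage is strictly above this (%)
MEM_LIMIT=90    # alert when RAM usage is strictly above this (%)

# resource_status DISK_PCT MEM_PCT -> prints "up" or "down"
resource_status() {
  local disk="$1" mem="$2"
  if [ "$disk" -gt "$DISK_LIMIT" ] || [ "$mem" -gt "$MEM_LIMIT" ]; then
    echo "down"
  else
    echo "up"
  fi
}

resource_status 70 80   # -> up
resource_status 86 50   # -> down (disk over the limit)
```

The comparisons are strict, so exactly 85% disk or 90% RAM is still reported as up. This matches the `-gt 85` / `-gt 90` checks in the push script of Section 3.6.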
3. Uptime Kuma
Uptime Kuma is a lightweight, self-hosted monitoring tool that is easy to configure.
3.1 API Service
Repeat the following steps for each service:
- Click "Add New Monitor" and choose the HTTP(s) type.
- Fill in the configuration:
| Field | Value |
|---|---|
| Monitor Type | HTTP(s) |
| Friendly Name | Alurkerja - [service name], e.g. Alurkerja - Authentication |
| URL | URL from the table in Section 2.1 |
| HTTP Method | POST (no body) |
| Heartbeat Interval | 60 seconds |
| Retries | 3 |
| Expected Status Code | 200 |
3.2 Database (PostgreSQL)
- Click "Add New Monitor" and choose the TCP Port type.
| Field | Value |
|---|---|
| Friendly Name | Alurkerja - PostgreSQL |
| Hostname | your-db-host |
| Port | 5432 |
| Heartbeat Interval | 60 seconds |
| Retries | 3 |
3.3 Queue Service (Redis & RabbitMQ)
Create two separate monitors of type TCP Port:
Redis:
| Field | Value |
|---|---|
| Friendly Name | Alurkerja - Redis |
| Hostname | your-redis-host |
| Port | 6379 |
| Heartbeat Interval | 60 seconds |
RabbitMQ:
| Field | Value |
|---|---|
| Friendly Name | Alurkerja - RabbitMQ |
| Hostname | your-rabbitmq-host |
| Port | 5672 |
| Heartbeat Interval | 60 seconds |
3.4 File Storage
- Click "Add New Monitor" and choose the HTTP(s) type.
| Field | Value |
|---|---|
| Friendly Name | Alurkerja - File Storage |
| URL | https://your-storage-host/public/healthz |
| HTTP Method | GET |
| Heartbeat Interval | 120 seconds |
| Expected Status Code | 200 |
3.5 SSL Certificate
Uptime Kuma checks the SSL certificate automatically on every HTTP(s) monitor, but expiry notifications must be enabled explicitly:
- Open an existing monitor (or create a new one) of type HTTP(s).
- Tick the "Enable SSL Certificate Expiry Notification" option.
- Set the threshold to 30 days.
| Field | Value |
|---|---|
| Friendly Name | Alurkerja - SSL Certificate |
| URL | https://your-domain.com |
| HTTP Method | GET |
| Heartbeat Interval | 86400 seconds (1 day) |
| SSL Expiry Threshold | 30 days |
3.6 Disk & Memory (Push Monitor)
Disk and memory use a push mechanism: the server itself sends heartbeats to Uptime Kuma from a cron job.
Monitor setup:
- Click "Add New Monitor" and choose the Push type.
- Copy the generated push URL, for example: https://uptime-kuma.your-domain.com/api/push/xxxxxx?status=up&msg=OK&ping=
- Set the Heartbeat Interval to 300 seconds (5 minutes).
Push script (/opt/scripts/health-push.sh):
#!/bin/bash
PUSH_URL="https://uptime-kuma.your-domain.com/api/push/xxxxxx"
DISK_USAGE=$(df -h | grep -vE '^Filesystem|tmpfs|cdrom' \
| awk '{ print $5 }' | sed 's/%//' | sort -n | tail -1)
MEM_TOTAL=$(free | awk '/^Mem:/ {print $2}')
MEM_USED=$(free | awk '/^Mem:/ {print $3}')
MEM_USAGE=$(( MEM_USED * 100 / MEM_TOTAL ))
STATUS="up"
MSG="Disk: ${DISK_USAGE}%, RAM: ${MEM_USAGE}%"
if [ "$DISK_USAGE" -gt 85 ] || [ "$MEM_USAGE" -gt 90 ]; then
STATUS="down"
MSG="ALERT - Disk: ${DISK_USAGE}%, RAM: ${MEM_USAGE}%"
fi
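# URL-encode the query string: MSG contains spaces, "%" and ","
curl -fsS -G "${PUSH_URL}" \
  --data-urlencode "status=${STATUS}" \
  --data-urlencode "msg=${MSG}" \
  --data-urlencode "ping=" > /dev/null
Register it as a cron job: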
chmod +x /opt/scripts/health-push.sh
crontab -e
# Add:
*/5 * * * * /opt/scripts/health-push.sh >> /var/log/health-push.log 2>&1
3.7 Alert Notifications
In the "Notifications" tab of each monitor, add a notification channel (Slack, Telegram, Email, etc.).
4. Grafana + Prometheus
For monitoring with in-depth visualization, metric history, and centralized alerting.
4.1 Setup Blackbox Exporter
Prometheus Blackbox Exporter performs the HTTP and TCP probing.
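Prometheus reaches Blackbox Exporter through its `/probe` endpoint, passing the module name and the target as query parameters. The `relabel_configs` in the jobs below perform exactly this rewrite. The sketch below builds the same URL by hand for a quick manual probe. It assumes the exporter is reachable at localhost:9115, the port published by the `docker run` command below. `urlencode` and `probe_url` are hypothetical helpers, and the encoder only handles ASCII input.

```bash
#!/usr/bin/env bash
# Sketch: build a Blackbox Exporter /probe URL for a module and a target.
# urlencode and probe_url are hypothetical helpers; ASCII input only.
BLACKBOX="${BLACKBOX:-http://localhost:9115}"

# Percent-encode every character except RFC 3986 unreserved ones.
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      *) out+=$(printf '%%%02X' "'$c") ;;   # "'c" yields the char code
    esac
  done
  printf '%s\n' "$out"
}

# probe_url MODULE TARGET -> prints the /probe URL Prometheus would scrape
probe_url() {
  printf '%s/probe?module=%s&target=%s\n' \
    "$BLACKBOX" "$(urlencode "$1")" "$(urlencode "$2")"
}

probe_url http_post_2xx https://your-domain.com/api/v1/notif/public/healthz
```

Fetching the printed URL with curl and filtering for `probe_success` shows `1` for a healthy target and `0` otherwise.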
docker run -d \
--name blackbox_exporter \
-p 9115:9115 \
-v $(pwd)/blackbox.yml:/config/blackbox.yml \
prom/blackbox-exporter:latest \
--config.file=/config/blackbox.yml
blackbox.yml:
modules:
  http_post_2xx:
    prober: http
    timeout: 10s
    http:
      method: POST
      valid_status_codes: [200]
      preferred_ip_protocol: "ip4"
  http_get_2xx:
    prober: http
    timeout: 10s
    http:
      method: GET
      valid_status_codes: [200]
      preferred_ip_protocol: "ip4"
  tcp_connect:
    prober: tcp
    timeout: 10s
  https_ssl:
    prober: http
    timeout: 10s
    http:
      method: GET
      tls_config:
        insecure_skip_verify: false
4.2 API Services (12 Services)
prometheus.yml, API service job:
scrape_configs:
  - job_name: 'alurkerja-api'
    metrics_path: /probe
    params:
      module: [http_post_2xx]
    static_configs:
      - targets:
          - https://your-domain.com/api/v1/authentication/public/healthz
          - https://your-domain.com/api/v1/probis/public/healthz
          - https://your-domain.com/api/v1/compro/public/healthz
          - https://your-domain.com/api/v1/generate-test/public/healthz
          - https://your-domain.com/api/v1/integration/public/healthz
          - https://your-domain.com/api/v1/migration/public/healthz
          - https://your-domain.com/api/v1/notif/public/healthz
          - https://your-domain.com/api/v1/proxy/public/healthz
          - https://your-domain.com/api/v1/report/public/healthz
          - https://your-domain.com/api/v1/simulation/public/healthz
          - https://your-domain.com/api/v1/tenant/public/healthz
          - https://your-domain.com/api/v1/tasklist/public/healthz
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox_exporter:9115
4.3 Database & Queue Services (TCP)
  - job_name: 'alurkerja-tcp'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
          - your-db-host:5432        # PostgreSQL
          - your-redis-host:6379     # Redis
          - your-rabbitmq-host:5672  # RabbitMQ
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox_exporter:9115
4.4 File Storage
  - job_name: 'alurkerja-storage'
    metrics_path: /probe
    params:
      module: [http_get_2xx]
    static_configs:
      - targets:
          - https://your-storage-host/public/healthz
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox_exporter:9115
    scrape_interval: 2m
4.5 SSL Certificate
  - job_name: 'alurkerja-ssl'
    metrics_path: /probe
    params:
      module: [https_ssl]
    static_configs:
      - targets:
          - https://your-domain.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox_exporter:9115
    # PromQL only looks back 5m for the latest sample by default, so with a 24h
    # interval the SSL alert rule has no data most of the day; a shorter
    # interval (e.g. 1h) avoids that gap.
    scrape_interval: 24h
4.6 Disk & Memory (Node Exporter)
docker run -d \
--name node_exporter \
--pid="host" \
-v "/:/host:ro,rslave" \
-p 9100:9100 \
quay.io/prometheus/node-exporter:latest \
--path.rootfs=/host
prometheus.yml, node exporter job:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['your-server-host:9100']
4.7 PromQL Queries & Alerting
# API/Storage service down
probe_success{job=~"alurkerja-api|alurkerja-storage"} == 0
# TCP connection failed (DB / Queue)
probe_success{job="alurkerja-tcp"} == 0
# API response time > 3 seconds
probe_duration_seconds{job="alurkerja-api"} > 3
# SSL certificate expires in < 30 days
probe_ssl_earliest_cert_expiry - time() < 86400 * 30
# Disk usage > 85%
(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 > 85
# Memory usage > 90%
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
Create the alert rules in Grafana via Alerting → Alert Rules → New Alert Rule, then connect them to a Contact Point (Slack, PagerDuty, email).
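The memory rule above divides `node_memory_MemAvailable_bytes` by `node_memory_MemTotal_bytes`. On Linux, Node Exporter reads both values from `/proc/meminfo`. The sketch below computes the same percentage directly on a host, which is useful for sanity-checking a Grafana panel. `mem_usage_pct` is a hypothetical helper. Its optional file argument exists only so it can be tested against a sample file.

```bash
#!/usr/bin/env bash
# Sketch: memory usage % computed like the PromQL rule
#   (1 - MemAvailable / MemTotal) * 100
# mem_usage_pct is a hypothetical helper; defaults to /proc/meminfo (Linux).
mem_usage_pct() {
  local file="${1:-/proc/meminfo}"
  awk '
    /^MemTotal:/     { total = $2 }   # values are in kB
    /^MemAvailable:/ { avail = $2 }
    END {
      if (total == 0) exit 1           # missing data: signal an error
      printf "%d\n", (1 - avail / total) * 100
    }' "$file"
}

# On a Linux host, print the current value for comparison with Grafana.
if [ -r /proc/meminfo ]; then
  echo "RAM usage: $(mem_usage_pct)%"
fi
```

The push script in Section 3.6 uses the `used` column of `free` instead. Depending on the procps version, that column may not equal total minus available, so the two figures can differ slightly.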
5. Better Uptime / BetterStack
BetterStack is a SaaS monitoring service with a built-in status page and on-call management.
5.1 API Services (12 Monitors)
- Navigate to "Monitors" → "New Monitor".
- Repeat for each service:
| Field | Value |
|---|---|
| Monitor type | Website / API |
| Name | Alurkerja - [service name] |
| URL | URL from the table in Section 2.1 |
| Request type | POST |
| Request body | (empty) |
| Expected status code | 200 |
| Check frequency | 1 minute |
| Regions | Singapore (or the nearest region) |
5.2 Database & Queue Service (TCP)
BetterStack supports TCP checks natively:
- Click "New Monitor" and choose the TCP type.
| Field | Value |
|---|---|
| Name | Alurkerja - PostgreSQL / Redis / RabbitMQ |
| Host | Host of each server |
| Port | 5432 / 6379 / 5672 |
| Check frequency | 1 minute |
5.3 File Storage
| Field | Value |
|---|---|
| Monitor type | Website / API |
| Name | Alurkerja - File Storage |
| URL | https://your-storage-host/public/healthz |
| Request type | GET |
| Expected status code | 200 |
| Check frequency | 2 minutes |
5.4 SSL Certificate
BetterStack checks SSL automatically on every HTTPS monitor. Additional configuration:
- Open the monitor and go to the "SSL" tab.
- Enable "Notify before SSL certificate expires".
- Set the threshold to 30 days.
5.5 Disk & Memory
BetterStack does not natively support push/custom scripts. Use the Uptime Kuma Push Monitor or Node Exporter + Grafana for this component (see Section 3.6 and Section 4.6).
5.6 On-Call & Escalation
In the "On-call" tab, configure escalation to the relevant team based on each service's criticality tier.
6. Gatus
Gatus is a lightweight, YAML-based, self-hosted monitoring tool that is Kubernetes-native and easy to keep under version control.
Installation
docker run -d \
--name gatus \
-p 8080:8080 \
-v $(pwd)/config:/config \
twinproduction/gatus:latest
config/config.yaml, complete configuration (all monitors):
# Global alerting configuration
alerting:
  slack:
    webhook-url: "https://hooks.slack.com/services/xxx/yyy/zzz"
    default-alert:
      failure-threshold: 3
      success-threshold: 1
      send-on-resolved: true
endpoints:
  # ───────────────────────────────
  # API Services (12 Services)
  # ───────────────────────────────
  - name: authentication
    group: api-service
    url: https://your-domain.com/api/v1/authentication/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack
  - name: bpm
    group: api-service
    url: https://your-domain.com/api/v1/probis/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack
  - name: company_profile
    group: api-service
    url: https://your-domain.com/api/v1/compro/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack
  - name: generate_test
    group: api-service
    url: https://your-domain.com/api/v1/generate-test/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack
  - name: integration
    group: api-service
    url: https://your-domain.com/api/v1/integration/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack
  - name: migration
    group: api-service
    url: https://your-domain.com/api/v1/migration/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack
  - name: notification
    group: api-service
    url: https://your-domain.com/api/v1/notif/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack
  - name: proxy
    group: api-service
    url: https://your-domain.com/api/v1/proxy/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack
  - name: report
    group: api-service
    url: https://your-domain.com/api/v1/report/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack
  - name: simulation
    group: api-service
    url: https://your-domain.com/api/v1/simulation/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack
  - name: tenant_management
    group: api-service
    url: https://your-domain.com/api/v1/tenant/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack
  - name: camunda_tasklist
    group: api-service
    url: https://your-domain.com/api/v1/tasklist/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack
  # ───────────────────────────────
  # Infrastructure
  # ───────────────────────────────
  - name: postgresql
    group: infrastructure
    url: tcp://your-db-host:5432
    interval: 1m
    conditions:
      - "[CONNECTED] == true"
    alerts:
      - type: slack
  - name: redis
    group: infrastructure
    url: tcp://your-redis-host:6379
    interval: 1m
    conditions:
      - "[CONNECTED] == true"
    alerts:
      - type: slack
  - name: rabbitmq
    group: infrastructure
    url: tcp://your-rabbitmq-host:5672
    interval: 1m
    conditions:
      - "[CONNECTED] == true"
    alerts:
      - type: slack
  - name: file-storage
    group: infrastructure
    url: https://your-storage-host/public/healthz
    method: GET
    interval: 2m
    conditions:
      - "[STATUS] == 200"
    alerts:
      - type: slack
  - name: ssl-certificate
    group: infrastructure
    url: https://your-domain.com
    method: GET
    interval: 24h
    conditions:
      - "[STATUS] == 200"
      - "[CERTIFICATE_EXPIRATION] > 720h" # 30 days
    alerts:
      - type: slack
        failure-threshold: 1
        description: "SSL certificate expires in less than 30 days"
Gatus does not support disk/memory checks natively. Use Node Exporter + Grafana or an Uptime Kuma Push Monitor for those components.
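The twelve API entries in config.yaml above differ only in name and path. The sketch below generates them from a name/path list so the file stays in sync with the table in Section 2.1. `gatus_api_endpoints` is a hypothetical helper. Its output is indented for placement under `endpoints:` and mirrors the fields used above.

```bash
#!/usr/bin/env bash
# Sketch: emit the Gatus API endpoint entries from a "name path" list.
# gatus_api_endpoints is a hypothetical helper; BASE_URL is a placeholder.
BASE_URL="${BASE_URL:-https://your-domain.com}"

# Reads "NAME PATH" pairs from stdin and prints one YAML entry per pair.
gatus_api_endpoints() {
  local name path
  while read -r name path; do
    [ -z "$name" ] && continue   # skip blank lines
    cat <<EOF
  - name: ${name}
    group: api-service
    url: ${BASE_URL}${path}/public/healthz
    method: POST
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 3000"
    alerts:
      - type: slack
EOF
  done
}

gatus_api_endpoints <<'LIST'
authentication    /api/v1/authentication
bpm               /api/v1/probis
company_profile   /api/v1/compro
generate_test     /api/v1/generate-test
integration       /api/v1/integration
migration         /api/v1/migration
notification      /api/v1/notif
proxy             /api/v1/proxy
report            /api/v1/report
simulation        /api/v1/simulation
tenant_management /api/v1/tenant
camunda_tasklist  /api/v1/tasklist
LIST
```

Adding a service then becomes a one-line change to the list rather than a ten-line copy of YAML.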
7. UptimeRobot
UptimeRobot is a SaaS service with a free tier of up to 50 monitors (minimum interval of 5 minutes on Free and 1 minute on Pro).
7.1 API Services (12 Monitors)
- Click "+ Add New Monitor" and choose the HTTP(s) type.
- Repeat for each service:
| Field | Value |
|---|---|
| Monitor Type | HTTP(s) |
| Friendly Name | Alurkerja - [service name] |
| URL | URL from the table in Section 2.1 |
| HTTP Method | POST (no body) |
| Expected status code | 200 |
| Monitoring Interval | 1 minute (Pro) / 5 minutes (Free) |
7.2 Database & Queue Service (TCP)
- Click "Add New Monitor" and choose the Port type.
| Component | Host | Port |
|---|---|---|
| PostgreSQL | your-db-host | 5432 |
| Redis | your-redis-host | 6379 |
| RabbitMQ | your-rabbitmq-host | 5672 |
Set the Monitoring Interval to 1 minute for each (Pro; the Free plan allows 5 minutes at minimum).
7.3 File Storage
| Field | Value |
|---|---|
| Monitor Type | HTTP(s) |
| Friendly Name | Alurkerja - File Storage |
| URL | https://your-storage-host/public/healthz |
| HTTP Method | GET |
| Monitoring Interval | 5 minutes |
7.4 SSL Certificate
UptimeRobot checks SSL automatically. Enable the alert in the monitor settings:
- Open the monitor and click "Edit".
- Tick "Alert when SSL expires in" and set it to 30 days.
7.5 Disk & Memory
UptimeRobot does not support push or custom scripts. Use the Uptime Kuma Push Monitor (see Section 3.6) for this component.
8. OneUptime
OneUptime is an all-in-one, open-source observability platform (uptime monitoring, incident management, status pages), available self-hosted or as SaaS.
Self-Hosted Installation
git clone https://github.com/OneUptime/oneuptime.git
cd oneuptime
cp config.env.example config.env
docker compose up -d
8.1 API Services (12 Monitors)
- Navigate to "Monitors" → "Create Monitor".
- Repeat for each service:
| Field | Value |
|---|---|
| Monitor Type | API |
| Name | [service name], e.g. authentication |
| URL | URL from the table in Section 2.1 |
| Request Method | POST |
| Request Body | (empty) |
| Expected Status Code | 200 |
| Check Interval | 1 minute |
8.2 Database & Queue Service (TCP)
- Click "Create Monitor" and choose the TCP type.
| Field | Value |
|---|---|
| Monitor Type | TCP |
| Name | PostgreSQL / Redis / RabbitMQ |
| Host | Host of each server |
| Port | 5432 / 6379 / 5672 |
| Check Interval | 1 minute |
8.3 File Storage
| Field | Value |
|---|---|
| Monitor Type | API |
| Name | File Storage |
| URL | https://your-storage-host/public/healthz |
| Request Method | GET |
| Expected Status Code | 200 |
| Check Interval | 2 minutes |
8.4 SSL Certificate
- Click "Create Monitor" and choose the Website type.
- Enable "SSL Certificate Monitoring" and set the expiry alert to 30 days.
8.5 Disk & Memory
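OneUptime supports Custom Code Monitors (Node.js scripts) for checking server resources:
- Click "Create Monitor" and choose the Custom Code type.
- Enter the following script:
// Runs on the OneUptime server, so it measures that machine's disk and RAM,
// not the Alurkerja server's, unless both run on the same host.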
const { execSync } = require('child_process');
const disk = execSync("df / | tail -1 | awk '{print $5}'")
.toString().trim().replace('%', '');
const mem = execSync("free | awk '/^Mem:/ {printf \"%.0f\", $3/$2*100}'")
.toString().trim();
if (parseInt(disk) > 85 || parseInt(mem) > 90) {
throw new Error(`Resource alert — Disk: ${disk}%, RAM: ${mem}%`);
}
- Set the interval to 5 minutes.
8.6 Status Page & On-Call
Link all monitors to the configured Status Page and On-Call Policy so incidents are escalated automatically to the relevant team.
9. Checkmk / Nagios / Zabbix
9.1 Nagios / Checkmk
API Service
Add a service definition to the Nagios configuration for each service:
define service {
host_name alurkerja-prod
service_description Health - Authentication
check_command check_http!-H your-domain.com -u /api/v1/authentication/public/healthz --ssl --method=POST
check_interval 1
max_check_attempts 3
notification_interval 30
}
Run the check_http plugin from the command line:
/usr/lib/nagios/plugins/check_http \
-H your-domain.com \
-u /api/v1/authentication/public/healthz \
--method=POST \
--ssl \
-w 3 -c 5
Database & Queue (TCP)
# PostgreSQL
/usr/lib/nagios/plugins/check_tcp -H your-db-host -p 5432
# Redis
/usr/lib/nagios/plugins/check_tcp -H your-redis-host -p 6379
# RabbitMQ
/usr/lib/nagios/plugins/check_tcp -H your-rabbitmq-host -p 5672
File Storage
/usr/lib/nagios/plugins/check_http \
-H your-storage-host \
-u /public/healthz \
--ssl -w 5 -c 10
SSL Certificate
/usr/lib/nagios/plugins/check_http \
-H your-domain.com \
--ssl \
-C 30,14 # warning at 30 days, critical at 14 days
Disk & Memory
# Disk
/usr/lib/nagios/plugins/check_disk -w 15% -c 10% -p /
# Memory (requires the check_mem plugin)
/usr/lib/nagios/plugins/check_mem -w 90 -c 95
9.2 Zabbix
API Service
- "Configuration" → "Hosts" → "Items" → "Create Item".
| Field | Value |
|---|---|
| Name | Health - [service name] |
| Type | HTTP agent |
| URL | healthz endpoint URL |
| Request method | POST |
| Update interval | 1m |
Create a trigger: last(/host/item.key)<>200
Database & Queue (TCP)
Use Simple check items with the key net.tcp.service[tcp,host,port] (net.tcp.port[host,port] is a Zabbix agent key, evaluated from the agent's own host):
| Service | Key |
|---|---|
| PostgreSQL | net.tcp.service[tcp,your-db-host,5432] |
| Redis | net.tcp.service[tcp,your-redis-host,6379] |
| RabbitMQ | net.tcp.service[tcp,your-rabbitmq-host,5672] |
Create a trigger: last(/host/net.tcp.service[...])<>1
File Storage
Create an HTTP agent item with method GET and the storage endpoint URL, and trigger when the status is not 200.
SSL Certificate
For a basic reachability check, use a Simple check item with the key net.tcp.service[tcp,your-domain.com,443]. To check expiry, install Zabbix Agent with an SSL certificate template.
Disk & Memory
Use Zabbix Agent with the built-in template Linux by Zabbix agent:
- system.cpu.util: CPU usage
- vm.memory.size[pavailable]: available memory (%)
- vfs.fs.size[/,pused]: disk usage (%)
Set triggers for disk > 85% and available memory < 10%.
10. Manual Verification (cURL)
Check a Single Service
curl -X POST -o /dev/null -s -w "%{http_code}\n" \
https://your-domain.com/api/v1/authentication/public/healthz
Check TCP (Database / Queue)
nc -zv your-db-host 5432 # PostgreSQL
nc -zv your-redis-host 6379 # Redis
nc -zv your-rabbitmq-host 5672 # RabbitMQ
Check SSL Certificate
echo | openssl s_client -connect your-domain.com:443 2>/dev/null \
| openssl x509 -noout -dates
Script to Check Everything at Once
Save the following as check-health.sh:
#!/bin/bash
BASE_URL="https://your-domain.com"
echo "=============================="
echo " Alurkerja Health Check"
echo " $(date)"
echo "=============================="
# ─── API Service ───────────────
declare -A API_SERVICES=(
["authentication"]="${BASE_URL}/api/v1/authentication/public/healthz"
["bpm"]="${BASE_URL}/api/v1/probis/public/healthz"
["company_profile"]="${BASE_URL}/api/v1/compro/public/healthz"
["generate_test"]="${BASE_URL}/api/v1/generate-test/public/healthz"
["integration"]="${BASE_URL}/api/v1/integration/public/healthz"
["migration"]="${BASE_URL}/api/v1/migration/public/healthz"
["notification"]="${BASE_URL}/api/v1/notif/public/healthz"
["proxy"]="${BASE_URL}/api/v1/proxy/public/healthz"
["report"]="${BASE_URL}/api/v1/report/public/healthz"
["simulation"]="${BASE_URL}/api/v1/simulation/public/healthz"
["tenant_management"]="${BASE_URL}/api/v1/tenant/public/healthz"
["camunda_tasklist"]="${BASE_URL}/api/v1/tasklist/public/healthz"
)
echo ""
echo "── API Service ──"
ALL_OK=true
for SERVICE in "${!API_SERVICES[@]}"; do
STATUS=$(curl -X POST -o /dev/null -s -w "%{http_code}" \
--max-time 10 "${API_SERVICES[$SERVICE]}")
if [ "$STATUS" == "200" ]; then
echo " ✅ $SERVICE → $STATUS"
else
echo " ❌ $SERVICE → $STATUS"
ALL_OK=false
fi
done
# ─── TCP Check ─────────────────
declare -A TCP_SERVICES=(
["postgresql"]="your-db-host:5432"
["redis"]="your-redis-host:6379"
["rabbitmq"]="your-rabbitmq-host:5672"
)
echo ""
echo "── Infrastructure (TCP) ──"
for SERVICE in "${!TCP_SERVICES[@]}"; do
HOST=$(echo "${TCP_SERVICES[$SERVICE]}" | cut -d: -f1)
PORT=$(echo "${TCP_SERVICES[$SERVICE]}" | cut -d: -f2)
if nc -z -w 5 "$HOST" "$PORT" 2>/dev/null; then
echo " ✅ $SERVICE ($HOST:$PORT)"
else
echo " ❌ $SERVICE ($HOST:$PORT) — connection refused"
ALL_OK=false
fi
done
# ─── File Storage ───────────────
echo ""
echo "── File Storage ──"
FS_STATUS=$(curl -o /dev/null -s -w "%{http_code}" \
--max-time 10 "https://your-storage-host/public/healthz")
if [ "$FS_STATUS" == "200" ]; then
echo " ✅ file-storage → $FS_STATUS"
else
echo " ❌ file-storage → $FS_STATUS"
ALL_OK=false
fi
# ─── SSL Certificate ────────────
echo ""
echo "── SSL Certificate ──"
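CERT=$(echo | openssl s_client -connect your-domain.com:443 -servername your-domain.com 2>/dev/null)
EXPIRY=$(echo "$CERT" | openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2)
if [ -z "$EXPIRY" ]; then
  echo " ❌ SSL check failed"
  ALL_OK=false
elif echo "$CERT" | openssl x509 -noout -checkend $(( 30 * 86400 )) > /dev/null 2>&1; then
  # -checkend exits 0 when the certificate is still valid 30 days from now
  echo " ✅ SSL expires: $EXPIRY"
else
  echo " ❌ SSL expires in less than 30 days: $EXPIRY"
  ALL_OK=false
fi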
echo ""
echo "=============================="
if [ "$ALL_OK" = true ]; then
echo " All monitors healthy ✅"
else
echo " Some monitors are failing ❌"
exit 1
fi
chmod +x check-health.sh
./check-health.sh
11. Recommended Configuration
| Component | Method | Interval | Timeout | Retries | Alert Threshold |
|---|---|---|---|---|---|
| API Services (12 services) | POST | 60 seconds | 10 seconds | 3 | Status ≠ 200, latency > 3 s |
| Database (PostgreSQL) | TCP | 60 seconds | 10 seconds | 3 | Connection refused |
| Queue (Redis / RabbitMQ) | TCP | 60 seconds | 10 seconds | 3 | Connection refused |
| File Storage | GET | 120 seconds | 10 seconds | 3 | Status ≠ 200 |
| SSL Certificate | GET | Daily | 10 seconds | 1 | Expires in < 30 days |
| Disk & Memory | Push/Agent | 5 minutes | N/A | N/A | Disk > 85%, RAM > 90% |
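The retry count in the table means an alert should fire only after several consecutive failed checks, which filters out one-off network blips. The sketch below reproduces that behaviour for ad-hoc scripts. It assumes a fixed pause between attempts. `with_retries` and `RETRY_DELAY` are hypothetical names. The monitoring tools above implement retries through their own settings.

```bash
#!/usr/bin/env bash
# Sketch: treat a check as failed only after N consecutive failures.
# with_retries is a hypothetical helper; RETRY_DELAY is in seconds.
RETRY_DELAY="${RETRY_DELAY:-5}"

# with_retries N COMMAND [ARGS...] -> 0 on the first success, 1 if all N fail
with_retries() {
  local attempts="$1" i
  shift
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      return 0
    fi
    # Pause before the next attempt, but not after the last one.
    if (( i < attempts )); then
      sleep "$RETRY_DELAY"
    fi
  done
  return 1
}

# Example: 3 attempts against the PostgreSQL port, matching the table above.
# with_retries 3 nc -z -w 10 your-db-host 5432 || echo "ALERT: PostgreSQL unreachable"
```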
12. Troubleshooting
API service returns 404
- The latest deployment has not finished or has failed: check the pod/container status.
- Wrong base path: double-check the table in Section 2.1.
API service returns 401
The /public/healthz endpoint must be configured as a public route without authentication:
- Make sure the /public/healthz route is registered outside the auth middleware.
- Check the API Gateway / reverse proxy and add an exception for the /public/healthz path.
TCP check fails (Database / Queue)
- Make sure the firewall allows connections from the monitoring server to the target port.
- Verify that the service is running: systemctl status postgresql / redis-cli ping / rabbitmqctl status.
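If the monitoring server has no `nc`, bash's built-in `/dev/tcp` pseudo-device can test the same TCP reachability. The sketch below assumes bash (not sh/dash) and the coreutils `timeout` command. `tcp_open` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: TCP reachability check using bash's /dev/tcp, for hosts without nc.
# tcp_open is a hypothetical helper; it needs bash and coreutils `timeout`.

# tcp_open HOST PORT [TIMEOUT_SECONDS] -> exit 0 if a TCP connection succeeds
tcp_open() {
  local host="$1" port="$2" limit="${3:-5}"
  # A child bash opens fd 3 on /dev/tcp/HOST/PORT; the socket closes when it exits.
  timeout "$limit" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example usage against the hosts from Section 2.2:
# for target in your-db-host:5432 your-redis-host:6379 your-rabbitmq-host:5672; do
#   tcp_open "${target%%:*}" "${target##*:}" && echo "open: $target" || echo "closed: $target"
# done
```

A non-zero result means the connection was refused, timed out, or the host name did not resolve. The firewall and service checks above still apply.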
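SSL check shows an inaccurate expiry date
- Make sure the monitoring server has internet access to reach the target domain.
- On servers that host several certificates, make sure the SNI name is sent (-servername); otherwise the default certificate may be checked.
- Check manually:
echo | openssl s_client -connect your-domain.com:443 -servername your-domain.com 2>/dev/null | openssl x509 -noout -dates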
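`-dates` only prints the validity window. `openssl x509 -checkend SECONDS` instead exits non-zero when the certificate expires within the given number of seconds, which maps directly onto the 30-day threshold. The sketch below works on a PEM certificate file. `cert_expires_within_days` is a hypothetical helper, and the demo uses a throwaway self-signed certificate rather than a real one.

```bash
#!/usr/bin/env bash
# Sketch: pass/fail check for "certificate expires in less than N days".
# cert_expires_within_days is a hypothetical helper; it requires openssl.

# cert_expires_within_days CERT_PEM DAYS
#   exit 0 -> the certificate expires within DAYS (alert)
#   exit 1 -> the certificate is still valid DAYS from now
cert_expires_within_days() {
  local cert="$1" days="$2"
  # -checkend N exits 0 if the cert is still valid N seconds from now.
  # Any other result (expiring soon, or an unreadable file) counts as an alert.
  if openssl x509 -in "$cert" -noout -checkend $(( days * 86400 )) > /dev/null 2>&1; then
    return 1
  fi
  return 0
}

# Demo with a throwaway self-signed certificate valid for 10 days.
demo_dir=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 10 -subj "/CN=demo" \
  -keyout "$demo_dir/key.pem" -out "$demo_dir/cert.pem" 2> /dev/null
if cert_expires_within_days "$demo_dir/cert.pem" 30; then
  echo "ALERT: certificate expires in less than 30 days"
fi
rm -rf "$demo_dir"
```

To check a live host, save its certificate first, for example with `echo | openssl s_client -connect your-domain.com:443 -servername your-domain.com 2>/dev/null | openssl x509 > /tmp/cert.pem`. Then pass `/tmp/cert.pem` to the function.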
Monitoring tool shows DOWN although the service is running
- Make sure API services are checked with POST (not GET).
- Check the IP whitelist: the firewall may be blocking the monitoring tool.
- Verify manually with cURL from the monitoring server.
False positive alert
- Raise Retries to 3 so an alert is sent only after repeated failures.
- Use an interval of at least 1 minute to avoid noise from network fluctuations.
