Hello guys, Ridwan here! Back again with a new notes update!
This time I want to share a few troubleshooting tips for OpenStack. As a SysAdmin, you have surely had the moment when the monitoring system triggers an alert on one of the Compute Nodes (hypervisors) because of high resource usage.
Such an alert only tells us that CPU usage has exceeded the normal threshold. It gives no further details, such as what the cause is or which application is consuming the resource. To find those details, we need to go "back to basics" to the Red Hat 124 course, in the material titled "Monitoring and Managing Linux Processes".
1. Analyzing Anomalous Resource Usage on the Compute Server
The output of the "top" command will show many qemu-kvm processes, because this server is a hypervisor where the VMs run.
top - 13:31:48 up 66 days, 23:50, 1 user, load average: 126.82, 126.23, 128.78
Tasks: 3357 total, 4 running, 3353 sleeping, 0 stopped, 0 zombie
%Cpu(s): 52.4 us, 8.6 sy, 0.0 ni, 35.2 id, 0.0 wa, 1.9 hi, 1.8 si, 0.0 st
MiB Mem : 3094507.+total, 614593.1 free, 2479829.+used, 12083.9 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 614678.2 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
691730 qemu 20 0 518.2g 510.5g 22936 R 2646 16.9 565461:25 qemu-kvm
75529 qemu 20 0 35.0g 31.8g 22584 S 1165 1.1 725669:37 qemu-kvm
213227 qemu 20 0 132.2g 124.6g 22696 S 775.4 4.1 529504:26 qemu-kvm
Look at the first row, specifically the PID and %CPU columns. It clearly shows a qemu-kvm process with PID 691730 using an abnormal amount of CPU (as high as 2646%, which is possible because top counts 100% per core on a multicore host). This is what is pushing the server's load average so high.
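To make the multicore percentage easier to reason about, it can be converted into an approximate number of fully busy cores. The helper below is only a sketch: the name cpu_pct_to_cores is made up for this note, and it assumes top is in its default mode where 100% equals one core.

```shell
# Hypothetical helper (not part of top or ps): convert a %CPU value,
# where 100% means one fully busy core, into an approximate core count.
cpu_pct_to_cores() {
    awk -v pct="$1" 'BEGIN { printf "%.2f\n", pct / 100 }'
}

# Example with the value reported by top for PID 691730:
cpu_pct_to_cores 2646
```

For 2646% this works out to about 26 busy cores, which is a large share of the 64 vCPUs (-smp 64) that this VM is started with.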
2. Finding the OpenStack Instance ID from the Process ID (PID)
[tripleo-admin@Openstack-Compute-2 ~]$ ps aux | grep 691730
tripleo+ 578303 0.0 0.0 6408 2316 pts/27 S+ 13:31 0:00 grep --color=auto 691730
qemu 691730 1201 16.8 543362420 535312312 ? Sl 2025 565465:21 /usr/libexec/qemu-kvm -name guest=instance-00003161,debug-threads=on -S -object {"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-324-instance-00003161/master-key.aes"} -machine pc-q35-rhel9.0.0,usb=off,dump-guest-core=off,memory-backend=pc.ram -accel kvm -cpu Cascadelake-Server-noTSX -m 524288 -object {"qom-type":"memory-backend-ram","id":"pc.ram","size":549755813888} -overcommit mem-lock=off -smp 64,sockets=64,dies=1,cores=1,threads=1 -uuid 457990e4-8d26-4f18-9940-72f652b99572 -smbios type=1,manufacturer=Red Hat,product=OpenStack Compute,version=23.2.3-17.1.20231018130828.el9ost,serial=457990e4-8d26-4f18-9940-72f652b99572,uuid=457990e4-8d26-4f18-9940-72f652b99572,family=Virtual Machine -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=59,server=on,wait=off -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device {"driver":"pcie-root-port","port":16,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x2"} -device {"driver":"pcie-root-port","port":17,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x2.0x1"} -device {"driver":"pcie-root-port","port":18,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x2.0x2"} -device {"driver":"pcie-root-port","port":19,"chassis":4,"id":"pci.4","bus":"pcie.0","addr":"0x2.0x3"} -device {"driver":"pcie-root-port","port":20,"chassis":5,"id":"pci.5","bus":"pcie.0","addr":"0x2.0x4"} -device {"driver":"pcie-root-port","port":21,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x2.0x5"} -device {"driver":"pcie-root-port","port":22,"chassis":7,"id":"pci.7","bus":"pcie.0","addr":"0x2.0x6"} -device {"driver":"pcie-root-port","port":23,"chassis":8,"id":"pci.8","bus":"pcie.0","addr":"0x2.0x7"} -device {"driver":"pcie-root-port","port":24,"chassis":9,"id":"pci.9","bus":"pcie.0","multifunction":true,"addr":"0x3"} 
-device {"driver":"pcie-root-port","port":25,"chassis":10,"id":"pci.10","bus":"pcie.0","addr":"0x3.0x1"} -device {"driver":"pcie-root-port","port":26,"chassis":11,"id":"pci.11","bus":"pcie.0","addr":"0x3.0x2"} -device {"driver":"pcie-root-port","port":27,"chassis":12,"id":"pci.12","bus":"pcie.0","addr":"0x3.0x3"} -device {"driver":"pcie-root-port","port":28,"chassis":13,"id":"pci.13","bus":"pcie.0","addr":"0x3.0x4"} -device {"driver":"pcie-root-port","port":29,"chassis":14,"id":"pci.14","bus":"pcie.0","addr":"0x3.0x5"} -device {"driver":"pcie-root-port","port":30,"chassis":15,"id":"pci.15","bus":"pcie.0","addr":"0x3.0x6"} -device {"driver":"pcie-root-port","port":31,"chassis":16,"id":"pci.16","bus":"pcie.0","addr":"0x3.0x7"} -device {"driver":"pcie-root-port","port":32,"chassis":17,"id":"pci.17","bus":"pcie.0","addr":"0x4"} -device {"driver":"pcie-pci-bridge","id":"pci.18","bus":"pci.1","addr":"0x0"} -device {"driver":"piix3-usb-uhci","id":"usb","bus":"pci.18","addr":"0x1"} -blockdev {"driver":"host_device","filename":"/dev/dm-8","aio":"native","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"} -device {"driver":"virtio-blk-pci","bus":"pci.4","addr":"0x0","drive":"libvirt-1-format","id":"virtio-disk0","bootindex":1,"write-cache":"on","serial":"eb6afa3f-0f95-407e-a71f-94f013d98902"} -netdev {"type":"tap","fd":"62","vhost":true,"vhostfd":"64","id":"hostnet0"} -device {"driver":"virtio-net-pci","rx_queue_size":512,"host_mtu":8942,"netdev":"hostnet0","id":"net0","mac":"fa:16:3e:65:81:25","bus":"pci.2","addr":"0x0"} -netdev {"type":"tap","fd":"65","vhost":true,"vhostfd":"66","id":"hostnet1"} -device {"driver":"virtio-net-pci","rx_queue_size":512,"host_mtu":9000,"netdev":"hostnet1","id":"net1","mac":"fa:16:3e:63:c6:c3","bus":"pci.3","addr":"0x0"} -add-fd 
set=0,fd=61,opaque=serial0-log -chardev pty,id=charserial0,logfile=/dev/fdset/0,logappend=on -device {"driver":"isa-serial","chardev":"charserial0","id":"serial0","index":0} -device {"driver":"usb-tablet","id":"input0","bus":"usb.0","port":"1"} -device {"driver":"usb-kbd","id":"input1","bus":"usb.0","port":"2"} -audiodev {"id":"audio1","driver":"none"} -vnc 172.22.74.116:8,audiodev=audio1 -device {"driver":"virtio-vga","id":"video0","max_outputs":1,"bus":"pcie.0","addr":"0x1"} -device {"driver":"virtio-balloon-pci","id":"balloon0","bus":"pci.5","addr":"0x0"} -object {"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"} -device {"driver":"virtio-rng-pci","rng":"objrng0","id":"rng0","bus":"pci.6","addr":"0x0"} -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
Read in full, the argument list of the qemu command is very long, even though all we need from it is the instance ID.
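When the PID is already known, one option is to read the process arguments from /proc instead of scanning the ps output by eye. The following is a minimal sketch and not necessarily how it has to be done: qemu_ids_from_args is a hypothetical helper name, and it assumes the qemu-kvm command line carries "-name guest=..." and "-uuid ..." arguments like the ones in the ps output for PID 691730.

```shell
# Hypothetical helper: reads one process argument per line on stdin and
# prints "<libvirt instance name> <instance UUID>".
qemu_ids_from_args() {
    awk '
        # The argument right after -name looks like guest=instance-XXXX,debug-threads=on
        prev == "-name" { split($0, kv, /[=,]/); name = kv[2] }
        # The argument right after -uuid is the instance UUID
        prev == "-uuid" { uuid = $0 }
        { prev = $0 }
        END { print name, uuid }
    '
}

# Possible usage on the compute node (arguments in /proc/<PID>/cmdline are
# NUL-separated; reading them may need sudo depending on the host setup):
# tr '\0' '\n' < /proc/691730/cmdline | qemu_ids_from_args
```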
You can use the following command to find the instances with the highest CPU or memory usage (--sort=-%cpu or --sort=-%mem).
[tripleo-admin@Openstack-Compute-2 ~]$ ps aux --sort=-%cpu | awk 'NR==1 {print $1, $2, $3, $4, "UUID"} / -uuid / {for(i=1;i<=NF;i++) if($i=="-uuid") {print $1, $2, $3, $4, $(i+1)}}' | column -t
USER PID %CPU %MEM UUID
qemu 691730 2646 16.9 457990e4-8d26-4f18-9940-72f652b99572
qemu 75529 1165 1.1 e6175dac-d40f-4375-855c-b48632a28d68
qemu 213227 775.4 4.1 148f0a0b-b13d-4718-aff2-1bc2b11ec505
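admin 1642873 0.0 0.0 /
The output of this command is much tidier and easier to read. The last row appears to be the awk process matching its own command line (which also contains "-uuid"), so it can be ignored.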
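A slightly stricter variant of the filter only accepts rows whose executable is qemu-kvm, so that unrelated processes which merely mention "-uuid" are not listed. This is a sketch under assumptions: qemu_uuid_table is a hypothetical name, and it relies on the usual "ps aux" layout where the command starts in the 11th column.

```shell
# Hypothetical helper: reads "ps aux" output on stdin and prints
# USER PID %CPU %MEM UUID for qemu-kvm processes only.
qemu_uuid_table() {
    awk '
        # Replace the ps header with our own five-column header
        NR == 1 { print "USER", "PID", "%CPU", "%MEM", "UUID"; next }
        # Column 11 is the executable in "ps aux" output
        $11 ~ /qemu-kvm$/ {
            for (i = 12; i < NF; i++)
                if ($i == "-uuid") { print $1, $2, $3, $4, $(i + 1); break }
        }
    '
}

# Possible usage on the compute node:
# ps aux --sort=-%cpu | qemu_uuid_table | column -t
```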
3. Checking the Instance ID via the OpenStack Client
After obtaining the instance ID, the next step is to look it up with the OpenStack CLI client or the Horizon dashboard to get the detailed information about the VM.
[stack@DIRECTOR ~]$ source overcloudrc
(overcloud) [stack@DIRECTOR ~]$ openstack server show 457990e4-8d26-4f18-9940-72f652b99572 --fit-width
+-------------------------------------+----------------------------------------------------------------------+
| Field                               | Value                                                                |
+-------------------------------------+----------------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                               |
| OS-EXT-AZ:availability_zone         | nova                                                                 |
| OS-EXT-SRV-ATTR:host                | compute-2                                                            |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-2                                                            |
| OS-EXT-SRV-ATTR:instance_name       | instance-00003161                                                    |
| OS-EXT-STS:power_state              | Running                                                              |
| OS-EXT-STS:task_state               | None                                                                 |
| OS-EXT-STS:vm_state                 | active                                                               |
| OS-SRV-USG:launched_at              | 2025-12-03T14:13:20.000000                                           |
| addresses                           | 2024399561-IT-Corp-DRC=10.24.216.154; VPC LAN CRM NG=192.168.100.253 |
| flavor                              | a1.xxxlarge.rc (df018f79-910f-4487-965b-2068585cb1ca)                |
| id                                  | 457990e4-8d26-4f18-9940-72f652b99572                                 |
| name                                | LAB-RDW001                                                           |
| project_id                          | 4af09b6255794f4bbc9b285fb5d7eb3d                                     |
| status                              | ACTIVE                                                               |
| user_id                             | 2ce9c0fc6163463b85c323ea07151e75                                     |
+-------------------------------------+----------------------------------------------------------------------+
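When only one or two fields are needed, for example the VM name or the project ID, a single value can be pulled out of the table. The sketch below uses a hypothetical helper called table_field that parses the ASCII table from stdin. It assumes the value does not contain a "|" character, and it returns only the first line of a value that --fit-width has wrapped.

```shell
# Hypothetical helper: print the value of one field from the two-column
# table produced by "openstack server show".
table_field() {
    awk -F'|' -v want="$1" '
        NF >= 3 {
            key = $2; val = $3
            # Trim the padding around the key and the value
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (key == want) { print val; exit }
        }
    '
}

# Possible usage on the director node:
# openstack server show 457990e4-8d26-4f18-9940-72f652b99572 | table_field name
```

The client's own output formatter options, such as "-f value -c name", are another way to get a single field without parsing the table at all.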