diff --git a/docs/06-Troubleshooting/shell.md b/docs/06-Troubleshooting/shell.md
index f0d12f355a..2163d6808c 100644
--- a/docs/06-Troubleshooting/shell.md
+++ b/docs/06-Troubleshooting/shell.md
@@ -4,7 +4,7 @@
 
 The `retina shell` command allows you to start an interactive shell on a Kubernetes node or pod for adhoc debugging.
 
-This runs a container image built from the Dockerfile in the `/shell` directory, with many common networking tools installed (`ping`, `curl`, etc.), as well as specialized tools such as [bpftool](#bpftool), [bpftrace](#bpftrace) [pwru](#pwru) or [Inspektor Gadget](#inspektor-gadget-ig).
+This runs a container image built from the Dockerfile in the `/shell` directory, with many common networking tools installed (`ping`, `curl`, etc.), as well as specialized tools such as [bpftool](#bpftool), [bpftrace](#bpftrace), [pwru](#pwru), [hotspot-bpf](#hotspot-bpf), or [Inspektor Gadget](#inspektor-gadget-ig).
 
 Currently the Retina Shell only works in Linux environments. Windows support will be added in the future.
 
@@ -286,6 +286,44 @@ ig -h
 ig run trace_dns:latest
 ```
 
+## [hotspot-bpf](https://github.com/SRodi/hotspot-bpf)
+
+eBPF performance lens for real-time root-cause diagnosis of Linux processes. hotspot-bpf correlates CPU time, scheduler contention, page-fault pressure, and RSS growth in a single terminal view, automatically classifying processes into diagnoses such as **CPU-bound**, **Starved**, **Noisy neighbor**, **Mem-thrashing**, or **OOM risk**.
+
+Requires the `SYS_ADMIN` and `PERFMON` capabilities (for eBPF program loading).
+
+```shell
+kubectl retina shell --capabilities=SYS_ADMIN,PERFMON
+```
+
+You can then run, for example:
+
+```shell
+# Sample over a 5s window and show the top 5 processes
+hotspot -interval 5s -topk 5
+
+# Filter by cgroup (useful for targeting specific pods)
+hotspot -interval 5s -cgroup-filter
+```
+
+### Custom thresholds
+
+All diagnosis thresholds are configurable via a YAML config file. To generate the default configuration as a starting point:
+
+```shell
+hotspot -generate-config > /tmp/thresholds.yaml
+```
+
+Edit the file to adjust thresholds for your environment, then pass it at runtime:
+
+```shell
+hotspot -config /tmp/thresholds.yaml -interval 5s
+```
+
+Any value not specified in the file retains its compiled-in default. This is especially useful on **multi-core machines** where single-threaded workloads produce low system-wide CPU percentages; lowering the thresholds helps avoid missed classifications.
+
+For detailed information on all configurable parameters, see the [hotspot-bpf documentation](https://github.com/SRodi/hotspot-bpf#custom-thresholds).
+
 ## [mpstat](https://www.man7.org/linux/man-pages/man1/mpstat.1.html)
 
 Tool for detailed reporting of processor-related statistics. `mpstat` is useful for network troubleshooting because it shows how much CPU time is spent handling SoftIRQs, which are often triggered by network traffic, helping identify interrupt bottlenecks or imbalanced CPU usage. SoftIRQs (Software Interrupt Requests) are a type of deferred interrupt handling mechanism in the Linux kernel used to process time-consuming tasks—like network packet handling or disk I/O—outside the immediate hardware interrupt context, allowing faster and more efficient interrupt processing without blocking the system.
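The multi-core remark in the custom-thresholds text of the docs change can be made concrete with a little arithmetic. The following is a minimal sketch, assuming the system-wide CPU percentage is averaged across all cores (an assumption about normalization, not something the hotspot-bpf text states). `single_core_share` is a hypothetical helper name and is not part of hotspot-bpf.

```shell
# Hypothetical helper (not part of hotspot-bpf): prints the system-wide CPU
# percentage that one fully busy core represents on a machine with N cores,
# assuming the percentage is averaged across all cores.
single_core_share() {
    awk -v n="$1" 'BEGIN { printf "%.2f\n", 100 / n }'
}

# Share for the current node, using the core count reported by nproc.
single_core_share "$(nproc)"
```

Under that assumption, `single_core_share 16` prints `6.25`, so a saturated single-threaded process on a 16-core node may sit far below a CPU threshold chosen with small machines in mind.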
diff --git a/shell/Dockerfile b/shell/Dockerfile
index c885460ef2..2fa9d40cc8 100644
--- a/shell/Dockerfile
+++ b/shell/Dockerfile
@@ -74,4 +74,24 @@ RUN set -eux; \
     rm /tmp/pwru.tar.gz; \
     file /usr/local/bin/pwru | grep -q 'ELF'
 
+# https://github.com/SRodi/hotspot-bpf/releases
+ARG HOTSPOT_TAG="v0.1.1"
+ENV HOTSPOT_TAG=${HOTSPOT_TAG}
+
+# Download and extract hotspot-bpf release (amd64 only for now)
+RUN set -eux; \
+    case "$ARCH" in \
+        amd64|x86_64) HOTSPOT_ARCH="amd64" ;; \
+        *) echo "Skipping hotspot-bpf: unsupported arch $ARCH" && exit 0 ;; \
+    esac; \
+    HOTSPOT_TAR="hotspot-bpf-linux-${HOTSPOT_ARCH}.tar.gz"; \
+    curl -fL -o "/tmp/${HOTSPOT_TAR}" "https://github.com/SRodi/hotspot-bpf/releases/download/${HOTSPOT_TAG}/${HOTSPOT_TAR}"; \
+    curl -fL -o "/tmp/${HOTSPOT_TAR}.sha256" "https://github.com/SRodi/hotspot-bpf/releases/download/${HOTSPOT_TAG}/${HOTSPOT_TAR}.sha256"; \
+    cd /tmp && sha256sum -c "${HOTSPOT_TAR}.sha256"; \
+    tar -xz -C /usr/local/bin -f "/tmp/${HOTSPOT_TAR}" "hotspot-bpf-linux-${HOTSPOT_ARCH}"; \
+    mv "/usr/local/bin/hotspot-bpf-linux-${HOTSPOT_ARCH}" /usr/local/bin/hotspot; \
+    chmod +x /usr/local/bin/hotspot; \
+    rm "/tmp/${HOTSPOT_TAR}" "/tmp/${HOTSPOT_TAR}.sha256"; \
+    file /usr/local/bin/hotspot | grep -q 'ELF'
+
 CMD ["/bin/bash", "-l"]
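
The `cd /tmp && sha256sum -c` step in the Dockerfile change can be tried in isolation. The sketch below assumes the published `.sha256` file names the tarball by bare filename, which the `cd /tmp` suggests but the diff does not state. `verify_in_dir` and the stand-in `artifact.tar.gz` are hypothetical names used only for illustration; no network access is involved.

```shell
# Hypothetical helper: verify a checksum file from inside the directory that
# holds the artifact, so a bare filename in the checksum file resolves.
verify_in_dir() {
    dir="$1"
    sumfile="$2"
    ( cd "$dir" && sha256sum -c "$sumfile" >/dev/null 2>&1 )
}

# Build a stand-in artifact and its checksum file in a scratch directory.
demo_dir="$(mktemp -d)"
printf 'stand-in payload\n' > "$demo_dir/artifact.tar.gz"
( cd "$demo_dir" && sha256sum artifact.tar.gz > artifact.tar.gz.sha256 )

# The untouched artifact verifies.
verify_in_dir "$demo_dir" artifact.tar.gz.sha256 && echo "checksum ok"

# After tampering, verification fails; under `set -e` (as in the RUN step)
# a failure like this would abort the build.
printf 'tampered\n' >> "$demo_dir/artifact.tar.gz"
verify_in_dir "$demo_dir" artifact.tar.gz.sha256 || echo "checksum mismatch"
```

Running the check in a subshell keeps the caller's working directory unchanged, whereas the `cd /tmp` in the RUN step persists for the rest of that shell invocation; the later commands there use absolute paths, so this appears harmless.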