40 changes: 39 additions & 1 deletion docs/06-Troubleshooting/shell.md
@@ -4,7 +4,7 @@

The `retina shell` command allows you to start an interactive shell on a Kubernetes node or pod for adhoc debugging.

This runs a container image built from the Dockerfile in the `/shell` directory, with many common networking tools installed (`ping`, `curl`, etc.), as well as specialized tools such as [bpftool](#bpftool), [bpftrace](#bpftrace) [pwru](#pwru) or [Inspektor Gadget](#inspektor-gadget-ig).
This runs a container image built from the Dockerfile in the `/shell` directory, with many common networking tools installed (`ping`, `curl`, etc.), as well as specialized tools such as [bpftool](#bpftool), [bpftrace](#bpftrace), [pwru](#pwru), [hotspot-bpf](#hotspot-bpf), or [Inspektor Gadget](#inspektor-gadget-ig).

Currently the Retina Shell only works in Linux environments. Windows support will be added in the future.

@@ -286,6 +286,44 @@ ig -h
ig run trace_dns:latest
```

## [hotspot-bpf](https://github.com/SRodi/hotspot-bpf)

eBPF performance lens for real-time root-cause diagnosis of Linux processes. hotspot-bpf correlates CPU time, scheduler contention, page-fault pressure, and RSS growth in a single terminal view, automatically classifying processes into diagnoses such as **CPU-bound**, **Starved**, **Noisy neighbor**, **Mem-thrashing**, or **OOM risk**.

Requires the `SYS_ADMIN` and `PERFMON` capabilities (for eBPF program loading).

```shell
kubectl retina shell <node-name> --capabilities=SYS_ADMIN,PERFMON
```

You can then run, for example:

```shell
# Sample over a 5s window and show the top 5 processes
hotspot -interval 5s -topk 5

# Filter by cgroup (useful for targeting specific pods)
hotspot -interval 5s -cgroup-filter <cgroup-substring>
```
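
To target a single pod, one option is to derive the substring from the pod UID, since kubelet embeds the UID in the pod's cgroup name. The sketch below is illustrative only: `pod_cgroup_substring` is a hypothetical helper, not part of hotspot-bpf or Retina, and it assumes that `-cgroup-filter` matches against the cgroup path. With the systemd cgroup driver the dashes in the UID appear as underscores; with the cgroupfs driver the UID appears unchanged.

```shell
# Hypothetical helper (not part of hotspot-bpf or Retina): build a candidate
# cgroup substring from a pod UID.
#   $1: pod UID, e.g. from
#       kubectl get pod <pod-name> -o jsonpath='{.metadata.uid}'
#   $2: kubelet cgroup driver, "systemd" (default) or "cgroupfs"
pod_cgroup_substring() {
  uid="$1"
  driver="${2:-systemd}"
  case "$driver" in
    systemd)
      # systemd slice names replace the dashes in the UID with underscores.
      printf 'pod%s\n' "$(printf '%s' "$uid" | sed 's/-/_/g')"
      ;;
    cgroupfs)
      # cgroupfs directory names keep the UID as-is.
      printf 'pod%s\n' "$uid"
      ;;
    *)
      echo "unknown cgroup driver: $driver" >&2
      return 1
      ;;
  esac
}
```

The result could then be passed on the node as `hotspot -interval 5s -cgroup-filter "$(pod_cgroup_substring "$POD_UID")"`, where `POD_UID` is a placeholder variable holding the UID. Check the hotspot-bpf documentation for the exact matching behavior before relying on this.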

### Custom thresholds

All diagnosis thresholds are configurable via a YAML config file. To generate the default configuration as a starting point:

```shell
hotspot -generate-config > /tmp/thresholds.yaml
```

Edit the file to adjust thresholds for your environment, then pass it at runtime:

```shell
hotspot -config /tmp/thresholds.yaml -interval 5s
```

Any value not specified in the file retains its compiled-in default. Custom thresholds are especially useful on **multi-core machines**, where a single-threaded workload produces a low system-wide CPU percentage; lowering the thresholds helps avoid missed classifications.

For detailed information on all configurable parameters, see the [hotspot-bpf documentation](https://github.com/SRodi/hotspot-bpf#custom-thresholds).

## [mpstat](https://www.man7.org/linux/man-pages/man1/mpstat.1.html)

Tool for detailed reporting of processor-related statistics. `mpstat` is useful for network troubleshooting because it shows how much CPU time is spent handling SoftIRQs, which are often triggered by network traffic, helping identify interrupt bottlenecks or imbalanced CPU usage. SoftIRQs (Software Interrupt Requests) are a type of deferred interrupt handling mechanism in the Linux kernel used to process time-consuming tasks—like network packet handling or disk I/O—outside the immediate hardware interrupt context, allowing faster and more efficient interrupt processing without blocking the system.
20 changes: 20 additions & 0 deletions shell/Dockerfile
@@ -74,4 +74,24 @@ RUN set -eux; \
rm /tmp/pwru.tar.gz; \
file /usr/local/bin/pwru | grep -q 'ELF'

# https://github.com/SRodi/hotspot-bpf/releases
ARG HOTSPOT_TAG="v0.1.1"
ENV HOTSPOT_TAG=${HOTSPOT_TAG}

# Download and extract hotspot-bpf release (amd64 only for now)
RUN set -eux; \
case "$ARCH" in \
amd64|x86_64) HOTSPOT_ARCH="amd64" ;; \
*) echo "Skipping hotspot-bpf: unsupported arch $ARCH" && exit 0 ;; \
esac; \
HOTSPOT_TAR="hotspot-bpf-linux-${HOTSPOT_ARCH}.tar.gz"; \
curl -fL -o "/tmp/${HOTSPOT_TAR}" "https://github.com/SRodi/hotspot-bpf/releases/download/${HOTSPOT_TAG}/${HOTSPOT_TAR}"; \
curl -fL -o "/tmp/${HOTSPOT_TAR}.sha256" "https://github.com/SRodi/hotspot-bpf/releases/download/${HOTSPOT_TAG}/${HOTSPOT_TAR}.sha256"; \
cd /tmp && sha256sum -c "${HOTSPOT_TAR}.sha256"; \
tar -xz -C /usr/local/bin -f "/tmp/${HOTSPOT_TAR}" "hotspot-bpf-linux-${HOTSPOT_ARCH}"; \
mv "/usr/local/bin/hotspot-bpf-linux-${HOTSPOT_ARCH}" /usr/local/bin/hotspot; \
chmod +x /usr/local/bin/hotspot; \
rm "/tmp/${HOTSPOT_TAR}" "/tmp/${HOTSPOT_TAR}.sha256"; \
file /usr/local/bin/hotspot | grep -q 'ELF'

CMD ["/bin/bash", "-l"]