What you'll practice

Diagnosing high load — top, ps, and finding the process eating the box.
Disk & memory issues — df -h, du, OOM kills, and full filesystems.
Service failures — reading logs with journalctl and restarting units cleanly.
Networking & DNS — ss, dig, and tracing why a connection hangs.
A repeatable method — observe, hypothesise, narrow down, confirm — instead of guessing.
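For the networking item, a useful first question when a connection hangs is whether the TCP handshake is refused or simply gets no answer. A refusal is a fast failure. A hang runs until a timeout and often means a firewall is silently dropping packets. Below is a minimal sketch of that check. It assumes GNU coreutils `timeout` and bash (for its built-in `/dev/tcp` redirection). `probe_tcp` and `classify_connect` are hypothetical helpers, not simulator commands.

```shell
#!/bin/sh
# Sketch only: probe_tcp and classify_connect are hypothetical helpers.
# Assumes GNU coreutils `timeout` and bash (for its /dev/tcp redirection).

# classify_connect STATUS -> maps the exit status of one probe to a verdict.
classify_connect() {
  case "$1" in
    0)   echo "open" ;;                    # handshake completed
    124) echo "hang" ;;                    # timeout(1) exits 124 when the limit expires
    *)   echo "refused-or-unreachable" ;;  # usually fast: closed port, no route, or DNS error
  esac
}

# probe_tcp HOST PORT [SECONDS] -> tries one TCP connect, giving up after SECONDS (default 3).
probe_tcp() {
  host=$1 port=$2 secs=${3:-3} rc=0
  # bash opens a TCP connection when fd 3 is redirected to /dev/tcp/HOST/PORT.
  timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null || rc=$?
  classify_connect "$rc"
}

# Run as: sh probe.sh HOST PORT [SECONDS]
if [ "$#" -ge 2 ]; then
  probe_tcp "$@"
fi
```

Usage, with a hypothetical host: `sh probe.sh db.internal 5432`. A refused connection points at the far end, such as nothing listening or a REJECT rule. A hang more often means packets are being dropped on the way. While it hangs, `ss -tn state syn-sent` on the client should show the handshake stuck in SYN-SENT. Running `dig +short HOST` first rules DNS in or out before you start on the network path.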

The loop you'll run

A node goes amber on the topology map. You SSH in and start narrowing it down:

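$ ssh srv-web-01
$ top                                  # load average 18.4: something is pegging the CPU
$ ps aux --sort=-%cpu | head           # heaviest CPU consumers first
$ df -h                                # /var at 100%: logs never rotated
$ sudo du -xh /var | sort -h | tail    # which directories under /var are biggest?
$ journalctl -u nginx --since "10 min ago"
$ sudo systemctl restart nginx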
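The first two observations in that session can also be scripted, which helps when you want the same quick check on every box. Below is a minimal sketch. It assumes a Linux host with coreutils and awk. `load_verdict`, `full_filesystems`, and the 90% threshold are illustrative choices, not part of the simulator.

```shell
#!/bin/sh
# Sketch only: load_verdict and full_filesystems are hypothetical helpers,
# not commands that ship with the simulator or with Linux.

# load_verdict LOAD CPUS -> prints "overloaded" if the load average exceeds
# the CPU count (a rule of thumb: work is queueing), otherwise "ok".
load_verdict() {
  awk -v load="$1" -v cpus="$2" 'BEGIN { print ((load + 0 > cpus + 0) ? "overloaded" : "ok") }'
}

# full_filesystems PCT -> reads `df -P` output on stdin and prints
# "MOUNT USE%" for each filesystem at or above PCT percent used.
# Assumes mount points contain no spaces.
full_filesystems() {
  awk -v t="$1" 'NR > 1 { pct = $5; sub(/%$/, "", pct); if (pct + 0 >= t + 0) print $6, $5 }'
}

# Live checks. /proc/loadavg is Linux-specific; its first field is the 1-minute load.
if [ -r /proc/loadavg ]; then
  read -r load1 rest < /proc/loadavg
  cpus=$(nproc)
  echo "load1=$load1 cpus=$cpus -> $(load_verdict "$load1" "$cpus")"
fi
df -P | full_filesystems 90
```

Comparing load to `nproc` is only a heuristic. Linux load also counts tasks blocked on disk I/O, so a box stuck on slow storage can show a high load average with mostly idle CPUs. Confirm with `top` or `ps` before blaming a single process.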

Why a simulator beats a cheat sheet

Cheat sheets list commands; they don't build the instinct for which one to run next when you don't yet know what's wrong. In the simulator, boxes break in realistic ways and the SLA clock is ticking, so you practise the actual skill: reasoning your way from a symptom to a root cause. That is exactly what a troubleshooting interview or a real on-call page tests.

Keep going

Troubleshooting is the foundation. Put it to work in a full incident response shift, automate the fixes with Ansible, and prep for your SRE interview.

Practice Linux troubleshooting in a real terminal

Ready to open a terminal?