If you’re looking at your EC2 console and seeing “1/2 checks failed” or “2/2 checks failed” under Status Checks, you know something is wrong, but the difference between “system” and “instance” checks isn’t immediately clear. In my experience, most teams panic at this point and reboot, but that’s often the wrong move. The fix depends entirely on which check is failing. I’ll walk you through the diagnostics and the correct fix for each scenario.
## The Problem
Your EC2 instance shows a failed status check in the Instances view. You see one of these states:
| Error Type | Description |
|---|---|
| System Status Check Failed | “1/2 checks passed” — system check is red, instance check is green |
| Instance Status Check Failed | “1/2 checks passed” — instance check is red, system check is green |
| Both Failed | “0/2 checks passed” — both are red, instance is in bad state |
The instance may or may not be responding. The EC2 console gives no hints about what’s actually wrong.
## Why Does This Happen?
- **System check failure = AWS infrastructure problem.** The hardware, network fabric, or power infrastructure at the physical host level has an issue. AWS detected a problem with the underlying Nitro system or physical host. You can’t fix this yourself.
- **Instance check failure = OS/software problem.** The operating system is not responding to health checks: the kernel panicked, the filesystem is corrupted, the disk is full, or a critical process is stuck consuming CPU or memory.
- **Both failed = severe issue.** Either the instance is completely unresponsive, or a host-level hardware failure is affecting multiple layers at once.
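The decision logic above is simple enough to capture in a small shell helper. This is a sketch; `decide_fix` is an illustrative name, not an AWS tool, and it takes the two status values as arguments:

```shell
# decide_fix SYSTEM INSTANCE — map the two status-check results to the
# recommended recovery action (system failures take priority).
decide_fix() {
  system="$1"; instance="$2"
  if [ "$system" != "ok" ]; then
    echo "stop-and-start"            # infrastructure problem: move to new hardware
  elif [ "$instance" != "ok" ]; then
    echo "reboot-then-investigate"   # OS-level problem
  else
    echo "healthy"
  fi
}

decide_fix impaired ok   # prints: stop-and-start
decide_fix ok impaired   # prints: reboot-then-investigate
```

Note that when both checks fail, the system-level fix comes first: there is no point debugging the OS while the underlying host is unhealthy.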
## The Fix
### For System Status Check Failure
A system check failure means AWS detected a problem with the physical host. The standard fix is to Stop and Start the instance. This is different from a reboot.
```shell
# Stop the instance (not reboot)
aws ec2 stop-instances \
  --instance-ids i-0abc123def456ghij \
  --region us-east-1

# Block until the instance has fully stopped (this can take more than 30 seconds)
aws ec2 wait instance-stopped \
  --instance-ids i-0abc123def456ghij \
  --region us-east-1

# Start the instance (this typically places it on a different physical host)
aws ec2 start-instances \
  --instance-ids i-0abc123def456ghij \
  --region us-east-1
```
The stop-and-start migration takes the instance off the failed physical host and launches it on new hardware. Within 2–3 minutes, check the status again:
```shell
aws ec2 describe-instance-status \
  --instance-ids i-0abc123def456ghij \
  --region us-east-1 \
  --include-all-instances \
  --query 'InstanceStatuses[0].[SystemStatus.Status,InstanceStatus.Status]' \
  --output text
```
If the system check now shows “ok,” you’re done. If it still shows “impaired,” AWS may need more time or there’s a deeper hardware issue — contact AWS Support.
**Important:** Do not reboot when the system check fails. A reboot keeps the instance on the same (failing) physical host.
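The stop, wait, start sequence can be scripted end to end. This is a sketch assuming AWS CLI v2; the `run()` wrapper and `DRY_RUN` flag are illustrative helpers that echo the commands instead of executing them until you opt in, so the sequence can be previewed safely:

```shell
#!/bin/sh
# Recover from a system status check failure by stopping and starting the
# instance. run() only echoes commands while DRY_RUN=1 (the default here).
INSTANCE_ID="${1:-i-0abc123def456ghij}"
REGION="${2:-us-east-1}"
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "would run: $*"; else "$@"; fi
}

run aws ec2 stop-instances --instance-ids "$INSTANCE_ID" --region "$REGION"
run aws ec2 wait instance-stopped --instance-ids "$INSTANCE_ID" --region "$REGION"
run aws ec2 start-instances --instance-ids "$INSTANCE_ID" --region "$REGION"
run aws ec2 wait system-status-ok --instance-ids "$INSTANCE_ID" --region "$REGION"
```

The `system-status-ok` waiter polls `describe-instance-status` for you, so the script blocks until the system check recovers or the waiter times out.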
### For Instance Status Check Failure
An instance check failure means the OS or application layer is unhealthy. Try a reboot first:
```shell
# Reboot the instance (stays on the same host)
aws ec2 reboot-instances \
  --instance-ids i-0abc123def456ghij \
  --region us-east-1
```
Wait 2–3 minutes for the reboot to complete, then check the console. If the instance status is now “ok,” you’re done. If it’s still failed, there’s a deeper OS problem.
Get the console output to see kernel panic messages or boot errors:
```shell
aws ec2 get-console-output \
  --instance-id i-0abc123def456ghij \
  --region us-east-1 \
  --latest \
  --output text
```
Look for lines like “Kernel panic,” “Oops,” “EXT4-fs error,” or “Out of memory.” These point to filesystem corruption, a full disk, or memory pressure. If you see a kernel panic and the instance is Nitro-based, you can use the EC2 Serial Console to reach the instance’s console (you’ll need an OS user with a password to log in) and investigate further:
```shell
# One-time per account and Region: aws ec2 enable-serial-console-access
aws ec2-instance-connect send-serial-console-ssh-public-key \
  --instance-id i-0abc123def456ghij \
  --serial-port 0 \
  --ssh-public-key file://~/.ssh/id_rsa.pub \
  --region us-east-1
# The key is valid for about 60 seconds; connect via the regional endpoint
ssh -i ~/.ssh/id_rsa i-0abc123def456ghij.port0@serial-console.ec2-instance-connect.us-east-1.aws
```
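Scanning saved console output for those signatures can be automated with standard text tools. In this sketch the heredoc stands in for real `get-console-output` text, so the pipeline is self-contained:

```shell
# Write sample console output to a file (stand-in for the real log)
cat > /tmp/console-sample.txt <<'EOF'
[   12.345678] EXT4-fs error (device xvda1): ext4_find_entry: reading directory lblock 0
EOF

# Case-insensitive scan for the common failure signatures
grep -i -E 'kernel panic|oops|ext4-fs error|out of memory' /tmp/console-sample.txt
```

If the scan matches nothing but the instance check still fails, the problem is more likely a hung service or network misconfiguration than a boot-time failure.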
### Common Causes and Fixes
**Full Disk:** Check available space. The instance check fails because the filesystem can’t write logs or swap.

```shell
# SSH to the instance if possible
df -h /

# If the disk is 100% full, free space (double-check paths before deleting anything)
sudo rm -rf /tmp/*
sudo rm -rf /var/log/old-logs/*
```
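To catch this before it takes the instance down, disk usage can be checked in a script. This is a sketch; `disk_usage_pct` is an illustrative helper, not a standard tool:

```shell
# disk_usage_pct PATH — print the Use% of the filesystem containing PATH
disk_usage_pct() {
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Warn when the root filesystem is nearly full
if [ "$(disk_usage_pct /)" -ge 90 ]; then
  echo "root filesystem is over 90% full: clean up or extend the EBS volume"
fi
```

A cron job running this every few minutes and pushing the result to a log or CloudWatch is a cheap early-warning system.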
**Memory Pressure (OOM Killer):** If swap is thrashing, move to a larger instance type or stop the offending process.

```shell
# Check memory usage
free -h

# Identify the processes consuming the most memory
ps aux --sort=-%mem | head -5

# If safe, stop the process; try SIGTERM before resorting to -9
kill <PID>
kill -9 <PID>   # only if the process ignores SIGTERM
```
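The ranking above can also be done with plain `sort` and `awk`, which helps on minimal images where `ps` may lack `--sort`. The heredoc stands in for live `ps aux` output so the pipeline is self-contained:

```shell
# Sort sample `ps aux` lines by the %MEM column (field 4), highest first,
# then print the command name and its memory share for the top offender
cat <<'EOF' | sort -k4 -rn | head -1 | awk '{print $11, $4 "%"}'
root     1201  0.1  2.0 123456  7890 ?  Ss  10:00  0:01 /usr/sbin/sshd
app      2345  5.0 61.3 987654 54321 ?  Sl  10:05  9:42 java
app      2399  0.3  8.7 222222 11111 ?  S   10:06  0:12 node
EOF
```

With this sample input the pipeline prints `java 61.3%`. Swap the heredoc for `ps aux | tail -n +2` to run it against live processes.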
**Filesystem Corruption:** If you see “EXT4-fs error” in the console output, the filesystem needs an fsck check. You may need to:

- Stop the instance.
- Detach the root EBS volume.
- Attach it to a rescue instance.
- Run `sudo fsck -y /dev/xvdf1` to repair.
- Reattach the volume and start the instance.
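The detach/attach steps can be sketched with the CLI. The volume and rescue instance IDs below are placeholders (the rescue instance must be in the same Availability Zone), and `run()` is an illustrative wrapper that echoes commands while `DRY_RUN=1` so the sequence can be reviewed before running it for real:

```shell
#!/bin/sh
# Rescue-volume workflow sketch: move the broken root volume to a healthy
# instance, repair it there, then reverse the steps.
VOLUME_ID="vol-0123456789abcdef0"     # root volume of the impaired instance
RESCUE_ID="i-0fedcba9876543210f"      # healthy instance in the same AZ
DRY_RUN="${DRY_RUN:-1}"

run() { if [ "$DRY_RUN" = "1" ]; then echo "would run: $*"; else "$@"; fi; }

run aws ec2 detach-volume --volume-id "$VOLUME_ID"
run aws ec2 wait volume-available --volume-ids "$VOLUME_ID"
run aws ec2 attach-volume --volume-id "$VOLUME_ID" \
    --instance-id "$RESCUE_ID" --device /dev/sdf
# Then, on the rescue instance: sudo fsck -y /dev/xvdf1
```

The `volume-available` waiter matters: `attach-volume` fails if it runs before the detach has completed.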
## How to Run This
- Open the EC2 Dashboard → select your instance → check Status Checks (in the Details tab).
- Identify whether the failure is “System Status Check” or “Instance Status Check.”
- If the system check failed: `aws ec2 stop-instances --instance-ids i-xxx`, then `aws ec2 start-instances --instance-ids i-xxx`.
- If the instance check failed: `aws ec2 reboot-instances --instance-ids i-xxx`.
- Wait 3 minutes and re-check with `aws ec2 describe-instance-status --instance-ids i-xxx`.
- If it’s still failing, run `aws ec2 get-console-output --instance-id i-xxx --latest` to see detailed boot logs.
## Is This Safe?
Rebooting is low-risk: the instance stays on the same host and keeps its IP addresses. Stop-and-start causes brief downtime (typically 1–2 minutes), changes the auto-assigned public IPv4 address if you’re not using an Elastic IP, and erases any instance store (ephemeral) volume data. Always use Elastic IPs for production instances.
## Key Takeaway
System check failure = AWS infrastructure problem, fix with stop-and-start. Instance check failure = OS problem, fix with reboot or filesystem repair. Don’t confuse the two — using the wrong fix wastes time. Always check the console output to identify the root cause.
Dealing with persistent status check failures? Connect with me on LinkedIn or X.