GitHub Actions Troubleshooting: A Real 8-Hour Pages Freeze From Start to Finish
On April 28, 2026, my blog pipeline hit a rare GitHub infrastructure failure. From the 22:00 cron trigger to confirming the root cause, it took nearly 8 hours. This article documents the complete troubleshooting process, including three issues I hadn't encountered before: a workflow_dispatch run that was created but never assigned a runner, an SSL handshake timeout on raw.githubusercontent.com, and a Pages build stuck on an old commit.
Pitfall 1: workflow_dispatch Created, Runner Never Assigned
First instinct: check the GitHub Actions tab. This is what I saw:
Workflow: GitHub Pages Deploy
Status: queued
Jobs: 0
Started: 2026-04-28 22:XX:XX UTC
Duration: running...
The workflow_dispatch call itself succeeded: the API returned 200 and the run was created. The problem was that the run stayed in queued status indefinitely; no runner ever started and the job count stayed at 0.
This points to a stall at the GitHub infrastructure level, not in the workflow itself. Under high load or a regional incident, GitHub's runner allocation system can behave this way: runs are queued but never assigned to a specific machine.
I tried manually re-triggering workflow_dispatch three times:
# Via GitHub CLI
gh workflow run deploy.yml
# Result: each attempt created a new run, each run stuck at queued with 0 runners
Three attempts, three identical results. This ruled out a one-off network hiccup and pointed to a sustained GitHub-side infrastructure issue.
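The "queued forever" symptom can be detected automatically instead of by watching the Actions tab. Below is a minimal sketch, assuming the GitHub CLI is installed and authenticated. The function names and the 15-minute threshold are illustrative assumptions, not part of the original pipeline; `status`, `createdAt`, and `databaseId` are JSON fields exposed by `gh run list --json`.

```python
import json
import subprocess
from datetime import datetime, timedelta, timezone


def find_stuck_runs(runs, now, max_queued=timedelta(minutes=15)):
    """Return runs that have sat in 'queued' status longer than max_queued."""
    stuck = []
    for run in runs:
        if run["status"] != "queued":
            continue
        # gh emits ISO 8601 timestamps with a trailing "Z" for UTC.
        created = datetime.fromisoformat(run["createdAt"].replace("Z", "+00:00"))
        if now - created > max_queued:
            stuck.append(run)
    return stuck


def fetch_runs(workflow="deploy.yml", limit=5):
    """Ask the GitHub CLI for recent runs of one workflow, parsed from JSON."""
    out = subprocess.run(
        ["gh", "run", "list", "--workflow", workflow, "--limit", str(limit),
         "--json", "databaseId,status,createdAt"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

A cron job could call `find_stuck_runs(fetch_runs(), datetime.now(timezone.utc))` and send an alert when the result is non-empty, rather than re-triggering the workflow by hand.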
Pitfall 2: raw.githubusercontent.com SSL Timeout
While waiting for runner allocation, I manually tested GitHub API availability with curl:
# GitHub API - normal
curl -s -o /dev/null -w "%{http_code}" https://api.github.com/repos/yaohehe/yaohehe.github.io
# Returns: 200 ✅
# GitHub raw content - SSL timeout
curl -s -o /dev/null -w "%{http_code}" --max-time 10 https://raw.githubusercontent.com/
# Returns: timeout ❌
Around 23:14, the SSL handshake with raw.githubusercontent.com timed out. This suggested that GitHub's CDN layer was also intermittently unreachable.
This detail is critical: the Pages workflow needs to pull resources from raw.githubusercontent.com during the build. If the CDN times out, the artifact upload phase fails — even if the runner eventually starts.
Important distinction: GitHub API (api.github.com) and GitHub raw CDN (raw.githubusercontent.com) are two independent infrastructure stacks. One working doesn't guarantee the other.
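Because the two hosts can fail independently, a health check should probe each one on its own and report the results separately. The following is a rough sketch using only the Python standard library. The helper names `http_status` and `probe` are hypothetical, and the 10-second timeout simply mirrors the `--max-time 10` used in the curl test.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url; raise OSError on network failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # A 4xx/5xx reply still proves the host completed the TLS handshake.
        return err.code


def probe(urls, fetch=http_status):
    """Check each endpoint separately so one failure never hides another."""
    results = {}
    for name, url in urls.items():
        try:
            results[name] = str(fetch(url))
        except OSError as err:  # URLError, timeouts, and SSL errors subclass OSError
            results[name] = f"unreachable: {err}"
    return results
```

Calling `probe({"api": "https://api.github.com", "raw": "https://raw.githubusercontent.com/"})` returns one entry per host, so a healthy API cannot mask a failing CDN.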
Pitfall 3: Pages Build Stuck on an Old Commit
I manually checked the commit history on GitHub and confirmed that the new article files had been pushed: the SHA existed and the file contents were intact.
Yet the live site returned 404 for those articles.
Checking the latest Pages deploy via GitHub CLI:
gh run list --workflow=deploy.yml --limit 5
The output revealed the truth: the last successful Pages build was commit 3bc9c6e, timestamped April 23. This meant from April 23 to April 28 — five full days — Pages never rebuilt. Every commit after that point, including newly pushed articles, was invisible to the live site.
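The same comparison can be scripted: take the commit of the last successful deploy run and compare it with the current branch head. This is a sketch under assumptions. The function names are hypothetical, and the run dictionaries are assumed to come from `gh run list --workflow=deploy.yml --json headSha,conclusion,createdAt`.

```python
def same_commit(a, b):
    """Compare two SHAs, tolerating one of them being abbreviated."""
    return bool(a) and bool(b) and (a.startswith(b) or b.startswith(a))


def last_successful_sha(runs):
    """Return the headSha of the most recent successful run, or None."""
    # ISO 8601 timestamps in the same format sort correctly as plain strings.
    newest_first = sorted(runs, key=lambda r: r["createdAt"], reverse=True)
    for run in newest_first:
        if run.get("conclusion") == "success":
            return run["headSha"]
    return None


def pages_is_stale(branch_head_sha, runs):
    """True when the last successful deploy does not match the branch head."""
    deployed = last_successful_sha(runs)
    return deployed is None or not same_commit(deployed, branch_head_sha)
```

The branch head could be read with `git rev-parse HEAD` in a fresh checkout. In this incident the check would have reported stale for every commit pushed after 3bc9c6e.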
Root Cause Chain
Connecting the three pitfalls, the complete failure chain looks like this:
- 22:00 cron fires: pipeline generates articles, GitHub API push succeeds (PUT returns 200)
- workflow_dispatch created: Pages rebuild triggered, run gets queued
- Runner allocation freezes: GitHub infrastructure issue, queued persists, runner never starts
- raw.githubusercontent.com timeout (~23:14): even if runner starts, CDN timeout causes artifact upload to fail
- Pages never rebuilds: stuck on April 23's 3bc9c6e, all new commits ignored
- Articles return 404: files are in the GitHub repo but never go live
This wasn't a single point of failure. Three GitHub infrastructure layers failed at the same time: the runner allocation system, CDN availability, and the Pages build queue. That was beyond what my pipeline code could handle.
Solution: API Push + Wait for GitHub Recovery
Since the stalled Pages rebuild was a GitHub infrastructure issue, I took the most pragmatic approach:
- Confirmed articles were in GitHub: verified file SHA via GitHub API, content intact
- Waited for GitHub infrastructure recovery: once runner allocation and CDN recovered, Pages would auto-rebuild
- Fixed pipeline silent failures: added push_file_with_retry() to publish-articles.py, which now exits 1 on failure instead of a silent 0
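The actual body of push_file_with_retry() is not shown in this article, so the following is only a hypothetical sketch of what a retry wrapper with a non-zero exit code might look like. The `push` callable, the attempt count, the backoff delays, and the `publish` helper are all assumptions made for illustration.

```python
import sys
import time


def push_file_with_retry(push, path, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call push(path) until it succeeds or the attempts run out.

    Returns True on success, False once every attempt has failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            push(path)
            return True
        except Exception as err:  # broad on purpose: every failure is retried and logged
            print(f"push {path} failed (attempt {attempt}/{attempts}): {err}",
                  file=sys.stderr)
            if attempt < attempts:
                sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff: 2s, 4s, ...
    return False


def publish(push, paths, sleep=time.sleep):
    """Push every file; return an exit code (0 = all pushed, 1 = any failure)."""
    results = [push_file_with_retry(push, p, sleep=sleep) for p in paths]
    return 0 if all(results) else 1
```

Ending the script with `sys.exit(publish(push, paths))` means a failed push surfaces as a failed cron job instead of a silent success.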
The next day Pages resumed normal rebuilds and all articles went live. This 8-hour incident caused zero data loss — because the GitHub API push was successful. The real problem was the Pages build queue blockage, combined with the pipeline not detecting it promptly.
Three Key Lessons
- workflow_dispatch success ≠ Pages rebuild success: API returning 200 only means GitHub received the request, not that the build completed. Actively check gh run list to confirm actual status.
- GitHub API and GitHub CDN are independent infrastructure: api.github.com working doesn't guarantee raw.githubusercontent.com is accessible. Pages builds depend on CDN; CDN timeout causes silent artifact upload failures.
- Pages build status belongs in pipeline observability: if Pages deploys use the same commit twice in a row, that should trigger an alert. This is a gap in the current pipeline.
This failure made one thing clear: pipeline observability isn't just "check if logs have errors." It means actively verifying the end result — whether articles are actually live. Push success doesn't mean users can access the content.
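One way to verify the end result is to request each published article URL and flag anything that does not return HTTP 200. Here is a minimal sketch using the standard library. The function names are hypothetical, and the list of article URLs is assumed to be supplied by the pipeline.

```python
import urllib.error
import urllib.request


def url_status(url, timeout=10):
    """Return the HTTP status for url, or None if the request never completed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 while Pages is still serving an old build
    except OSError:
        return None  # DNS failure, timeout, or TLS error


def not_live(urls, fetch=url_status):
    """Return {url: status} for every article URL that is not serving HTTP 200."""
    failures = {}
    for url in urls:
        status = fetch(url)
        if status != 200:
            failures[url] = status
    return failures
```

Running `not_live(article_urls)` a few minutes after each push, and alerting on a non-empty result, would have caught this incident on the first night.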
Related reading:
- Ubuntu 24.04 Docker + UFW Firewall Setup — if you're deploying self-hosted CI runners
- Thunderbolt Self-Hosted AI Panel: 5 Real Pitfalls — real-world self-hosted deployment lessons
- n8n Self-Hosted Docker Deployment: 5 Real Problems — another self-hosted tool, same failure patterns