
GitHub Actions Troubleshooting: A Real 8-Hour Pages Freeze From Start to Finish

Tags: GitHub, DevOps, troubleshooting, CI/CD, GitHub Pages

On April 28, 2026, my blog pipeline hit a rare GitHub infrastructure failure. From the 22:00 cron trigger to confirming the root cause took nearly 8 hours. This article documents the complete troubleshooting process, covering three issues I hadn't encountered before: a workflow_dispatch run that was created but never assigned a runner, an SSL handshake timeout on raw.githubusercontent.com, and a Pages build stuck on an old commit.

⚠️ Context: This failure happened during the 22:00 scheduled run. GitHub Actions stalled completely: workflow_dispatch created a run but no runner ever started it, and Pages was still serving a build from an April 23 commit. New articles were pushed to GitHub but never went live.

Pitfall 1: workflow_dispatch Created, Runner Never Assigned

First instinct: check the GitHub Actions tab. This is what I saw:

Workflow: GitHub Pages Deploy
Status: queued
Jobs: 0
Started: 2026-04-28 22:XX:XX UTC
Duration: running...

The workflow_dispatch call itself succeeded: the API accepted the request and the run was created. The problem was that the run stayed queued indefinitely, no runner ever started, and the job count stayed at 0.

This points to a stall inside GitHub's own infrastructure rather than in my workflow. Under high load or during regional incidents, GitHub's runner allocation can behave this way: runs are queued but never assigned to a machine.
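Instead of eyeballing the Actions tab, this state can be detected automatically. The sketch below assumes the run list comes from `gh run list --workflow=deploy.yml --json databaseId,status,createdAt` parsed with `json.loads`; the 15-minute threshold and the helper names are hypothetical choices, not part of the original pipeline.

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold: a run queued longer than this is treated as stuck.
# Pick something well above your workflow's normal queue time.
STUCK_AFTER = timedelta(minutes=15)


def parse_ts(value: str) -> datetime:
    """Parse GitHub timestamps such as '2026-04-28T22:00:05Z' into aware datetimes."""
    # fromisoformat() only accepts a trailing 'Z' on Python 3.11+, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def find_stuck_runs(runs: list[dict], now: datetime) -> list[dict]:
    """Return runs whose status is still 'queued' after STUCK_AFTER has elapsed."""
    stuck = []
    for run in runs:
        if run.get("status") != "queued":
            continue
        if now - parse_ts(run["createdAt"]) > STUCK_AFTER:
            stuck.append(run)
    return stuck


def stuck_now(runs: list[dict]) -> list[dict]:
    """Convenience wrapper that evaluates the runs against the current UTC time."""
    return find_stuck_runs(runs, datetime.now(timezone.utc))
```

A non-empty result is a reasonable signal to alert a human and stop re-dispatching, rather than piling more runs onto a queue that is not moving.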

I tried manually re-triggering workflow_dispatch three times:

# Via GitHub CLI
gh workflow run deploy.yml

# Result: each attempt created a new run, each run stuck at queued with 0 runners

Three attempts, three identical results. That ruled out a one-off network hiccup and pointed to a sustained infrastructure issue on GitHub's side.

Pitfall 2: raw.githubusercontent.com SSL Timeout

While waiting for a runner, I manually tested GitHub endpoint availability with curl:

# GitHub API - normal
curl -s -o /dev/null -w "%{http_code}" https://api.github.com/repos/yaohehe/yaohehe.github.io
# Returns: 200 ✅

# GitHub raw content - SSL timeout
curl -s -o /dev/null -w "%{http_code}" --max-time 10 https://raw.githubusercontent.com/
# Returns: timeout ❌

Around 23:14, the SSL handshake to raw.githubusercontent.com timed out. This suggested that GitHub's CDN layer was also having intermittent availability problems.

This detail matters: my Pages workflow pulls resources from raw.githubusercontent.com during the build. If the CDN times out, the build fails before the artifact upload can complete, even if a runner does eventually start.

Important distinction: the GitHub API (api.github.com) and the raw content CDN (raw.githubusercontent.com) run on independent infrastructure stacks. One working doesn't guarantee the other is.
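Because the two hosts can fail independently, a health check should probe each one separately. Here is a minimal standard-library sketch that mirrors the two curl commands above. The 10-second timeout matches `--max-time 10`, and the function names are illustrative.

```python
import urllib.error
import urllib.request

# Same endpoints as the curl checks above.
ENDPOINTS = {
    "api": "https://api.github.com/repos/yaohehe/yaohehe.github.io",
    "raw": "https://raw.githubusercontent.com/",
}


def probe(url: str, timeout: float = 10.0) -> str:
    """Return the HTTP status as a string, or 'error: ...' if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return str(resp.status)
    except urllib.error.HTTPError as exc:
        # The server answered with a non-2xx status, so the host itself is reachable.
        return str(exc.code)
    except OSError as exc:
        # URLError, socket timeouts and ssl.SSLError are all OSError subclasses, so
        # DNS failures, refused connections and TLS handshake timeouts end up here.
        return f"error: {exc}"


def check_all(timeout: float = 10.0) -> dict[str, str]:
    """Probe every endpoint independently; one failing never skips the others."""
    return {name: probe(url, timeout) for name, url in ENDPOINTS.items()}


def down_endpoints(results: dict[str, str]) -> list[str]:
    """Names of endpoints that could not be reached at all."""
    return [name for name, result in results.items() if result.startswith("error")]
```

In the situation described above, `down_endpoints(check_all())` would have listed only `raw`, which is exactly the split a single "is GitHub up?" check would hide.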

Pitfall 3: Pages Build Stuck on an Old Commit

I checked the repository's commit history manually and confirmed that the new article files had been pushed: their SHAs existed and the file contents were intact.

But the live site returned 404 for those articles.

Checking the latest Pages deploy via GitHub CLI:

gh run list --workflow=deploy.yml --limit 5

The output made the cause clear: the last successful Pages build was for commit 3bc9c6e, dated April 23. Pages had not rebuilt in the five days from April 23 to April 28, so every later commit, including the newly pushed articles, was invisible on the live site.
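The same check can be scripted so the pipeline notices a stale site without anyone opening a browser. A sketch, assuming the runs come from `gh run list --workflow=deploy.yml --json status,conclusion,createdAt,headSha` and the branch tip from something like `gh api repos/yaohehe/yaohehe.github.io/commits/main --jq .sha` (the `main` branch name is an assumption). The helper names are hypothetical.

```python
from typing import Optional


def last_successful_deploy(runs: list[dict]) -> Optional[dict]:
    """Pick the newest completed, successful run from `gh run list --json` output."""
    ok = [
        r for r in runs
        if r.get("status") == "completed" and r.get("conclusion") == "success"
    ]
    # ISO-8601 UTC timestamps with the same format sort chronologically as strings.
    return max(ok, key=lambda r: r["createdAt"], default=None)


def deploy_is_stale(runs: list[dict], head_sha: str) -> bool:
    """True if the live site was not built from the current branch HEAD."""
    last = last_successful_deploy(runs)
    return last is None or last["headSha"] != head_sha
```

`headSha` is the full 40-character SHA, so compare it against the full HEAD SHA rather than a short form like 3bc9c6e.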

Root Cause Chain

Putting the three pitfalls together, the complete failure chain looks like this:

  1. 22:00 cron fires: the pipeline generates articles and the GitHub API push succeeds (PUT returns a 2xx success)
  2. workflow_dispatch is created: a Pages rebuild is triggered and the run is queued
  3. Runner allocation stalls: because of the GitHub infrastructure issue, the run stays queued and no runner starts
  4. raw.githubusercontent.com times out (~23:14): even if a runner started, the CDN timeout would fail the build and its artifact upload
  5. Pages never rebuilds: the site stays on April 23's 3bc9c6e and ignores all newer commits
  6. Articles return 404: the files are in the GitHub repo but never go live

This wasn't a single point of failure. Three GitHub infrastructure layers failed at the same time (runner allocation, CDN availability, and the Pages build queue), which was beyond anything my pipeline code could handle.

Solution: API Push + Wait for GitHub Recovery

Since the stalled Pages rebuild was a GitHub infrastructure issue, I took the most pragmatic approach:

  1. Confirmed the articles were in GitHub: verified the file SHAs via the GitHub API and checked that the content was intact
  2. Waited for GitHub's infrastructure to recover: once runner allocation and the CDN came back, Pages would rebuild automatically
  3. Fixed the pipeline's silent failures: added push_file_with_retry() to publish-articles.py, which now exits with status 1 on failure instead of silently exiting 0
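The article doesn't show publish-articles.py, so the following is only a hypothetical sketch of what a `push_file_with_retry()` with a non-zero exit code could look like. The `push` callable stands in for whatever actually calls the GitHub Contents API and is assumed to raise on failure. The attempt count and backoff delays are illustrative.

```python
import sys
import time
from typing import Callable


def push_file_with_retry(
    push: Callable[[str], None],
    path: str,
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call push(path), retrying with exponential backoff. Returns True on success."""
    for attempt in range(1, attempts + 1):
        try:
            push(path)
            return True
        except Exception as exc:  # hypothetical: narrow to your HTTP client's errors
            print(f"push {path} failed (attempt {attempt}/{attempts}): {exc}",
                  file=sys.stderr)
            if attempt < attempts:
                # Wait 2s, then 4s, ... before the next attempt.
                sleep(base_delay * 2 ** (attempt - 1))
    return False


def publish(paths: list[str], push: Callable[[str], None], **kwargs) -> int:
    """Return a process exit code: 0 only if every file was pushed."""
    failed = [p for p in paths if not push_file_with_retry(push, p, **kwargs)]
    return 1 if failed else 0
```

The script's entry point would then end with something like `sys.exit(publish(paths, push))`, so a failed push turns the cron run red instead of passing silently with exit code 0.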

The next day Pages resumed normal rebuilds and all articles went live. The 8-hour incident caused zero data loss, because the GitHub API push had succeeded. The real problem was the blocked Pages build queue, combined with a pipeline that did not detect it promptly.
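Step 1 above, verifying the pushed files through the GitHub API, can be made exact. The Contents API (`GET /repos/{owner}/{repo}/contents/{path}`, e.g. via `gh api repos/yaohehe/yaohehe.github.io/contents/<path>`, where `<path>` is a placeholder) returns a `sha` field holding the git blob SHA-1 of the file, which can be recomputed locally. A minimal sketch with illustrative function names:

```python
import hashlib


def git_blob_sha(content: bytes) -> str:
    """SHA-1 that git assigns to a blob: sha1(b"blob <size>" + NUL byte + content)."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def matches_remote(local_content: bytes, contents_api_response: dict) -> bool:
    """True if the file on GitHub is byte-for-byte the file we meant to push."""
    return contents_api_response.get("sha") == git_blob_sha(local_content)
```

This is the same hash `git hash-object` prints, so a match proves the remote file is intact without downloading and diffing its base64 `content`.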

Three Key Lessons

This failure made one thing clear: pipeline observability isn't just checking whether the logs contain errors. It means actively verifying the end result, that is, whether the articles are actually live. A successful push doesn't mean users can access the content.
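In that spirit, the final pipeline step could poll the published URL instead of trusting the push. Below is a sketch with a hypothetical `wait_until_live()` helper. The 15-minute deadline and 30-second interval are assumptions to tune against how long your Pages builds normally take. `fetch`, `sleep` and `clock` are injectable so the logic can be tested offline.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for url, or 0 if the host could not be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # e.g. 404 while Pages is still serving an old build
    except OSError:
        return 0  # DNS failure, refused connection, TLS or connect timeout


def wait_until_live(
    url: str,
    deadline_s: float = 900.0,
    interval_s: float = 30.0,
    fetch: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll url until it returns 200; give up once the next wait would pass the deadline."""
    start = clock()
    while True:
        if fetch(url) == 200:
            return True
        if clock() - start + interval_s > deadline_s:
            return False
        sleep(interval_s)
```

Ending the pipeline with `sys.exit(0 if wait_until_live(article_url) else 1)`, where `article_url` is the new post's address, would have turned the April 28 incident into a failed cron run instead of a silent success.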

