
Search Console Bulk Reports: How I Fixed 480 SEO Issues in 90 Minutes Without Breaking Anything


On the morning of June 12, 2026, I received 4 separate Search Console reports in the same hour, covering five issue categories: 414 4xx status codes, 15 missing meta descriptions, 34 titles too long, 15 missing h1 tags, and 2 pages with multiple h1 tags. Total: 480 flagged issues.

Most people would panic. The default reaction is "delete the broken pages, rewrite the titles, add h1 tags." But that instinct is exactly wrong for at least 30 of the 480 issues. This is the story of how I worked through all 4 reports in 90 minutes using a 4-step decision tree, fixed only what needed fixing, and hardened the system against future occurrences.

The 4 Reports (Same Morning)

| Report | Count | First Instinct | Right Answer |
| --- | --- | --- | --- |
| 4xx status codes | 414 | Rewrite the missing pages | Migrate 414 from archive/ to root, use Contents API for batch push |
| Meta description missing | 15 | Add description meta to all 15 | 15/15 are 301 redirect pages: do nothing |
| Title too long | 34 | Rewrite all 34 titles | 5 are Moved pages, 29 are real (fix only 29) |
| H1 missing | 15 | Add h1 to all 15 | 15/15 are Moved pages: do nothing |
| Multiple h1 tags | 2 | Delete the extra h1 | Demote 6 h1s to h2 in 2 real articles |

The 4-Step Decision Tree

Without a systematic approach, you will mis-fix at least one of these categories. Here is the decision tree that turned 480 panic-inducing flags into 31 real fixes + 449 known ignorable items.

Step 1: Local Classification (Fingerprint Match)

Before touching anything, sort each URL into one of 5 buckets:

The Moved redirect fingerprint is gold. If you ran a mass rename (mine was on 5/17) that turned 16 short slugs into longer SEO-friendly ones, the old URLs were kept as 301 redirects. They are supposed to be tiny. They are supposed to lack a description and an h1. They are working as designed.
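As a rough illustration, a fingerprint check could look like the sketch below. It assumes the redirect pages are small HTML stubs that use a meta refresh tag, which this article does not spell out; the function name, the 2 KB threshold, and the regex patterns are all hypothetical and should be tuned to whatever your own rename script emits.

```python
import re

def looks_like_moved_redirect(html, max_bytes=2048):
    """Heuristic: True if the page looks like a tiny redirect stub.

    Assumptions (not from the article): stubs are under max_bytes,
    carry a meta refresh tag, and contain no <h1>.
    """
    small = len(html.encode("utf-8")) <= max_bytes
    has_refresh = re.search(
        r'<meta[^>]+http-equiv=["\']?refresh', html, re.IGNORECASE
    ) is not None
    has_h1 = re.search(r"<h1[\s>]", html, re.IGNORECASE) is not None
    return small and has_refresh and not has_h1
```

A page that passes this check goes straight to the ignore pile; anything else still needs a human look.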

Step 2: curl Verification

For each URL, hit it with curl -sS -o /dev/null -w "%{http_code}" followed by the URL. Confirm HTTP 200. If you see 4xx here, do not panic: the 4xx might come from a stale CDN edge cache rather than reflect the real state. That is where Step 3 comes in.
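For a long URL list, the same check can be scripted. The following is a minimal sketch using only the standard library; http_status and its opener parameter are illustrative names, not part of any existing script. Note one difference from plain curl: urllib follows redirects by default, so a redirected URL reports the status of its final target.

```python
import urllib.error
import urllib.request

def http_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code for url, or None on a network-level failure.

    opener is injectable so the function can be exercised without a network.
    """
    req = urllib.request.Request(url, method="HEAD")
    try:
        with opener(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; the code is what we want.
        return err.code
    except urllib.error.URLError:
        # DNS failure, refused connection, timeout, and similar.
        return None
```

Calling http_status on each flagged URL and collecting everything that is not 200 gives you the short list to carry into Step 3.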

Step 3: GitHub Contents API (The Anti-False-4xx Layer)

This is the most underused technique in the SEO toolkit. When Tencent Cloud → GitHub Pages has a flaky CDN edge, your curl gets a 404 or 503, but the file is actually in the repo. To eliminate false negatives:

curl -sS "https://api.github.com/repos/you/you.github.io/contents/path/to/file.html" | python3 -c "import sys,json,base64; d=json.load(sys.stdin); print(base64.b64decode(d['content']).decode()[:200])"

The Contents API goes through GitHub's main infrastructure, not your CDN. It tells you the actual repo state, bypassing the cache layer that lies to curl. This single technique turned a "414 files are missing" report into "0 files are actually missing — 309 are in main archive/, 105 are in affiliate-blog/".
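To combine the Step 2 and Step 3 results in code, something like the sketch below may help. It assumes you have already parsed the Contents API JSON response into a Python object; decode_contents_payload and diagnose are hypothetical helper names. It relies on the documented response shape for a single file (a dict with type, encoding, and base64 content fields) and does not handle very large files, which may come back without inline content.

```python
import base64

def decode_contents_payload(payload):
    """Return the file text from a Contents API JSON payload, or None.

    A directory listing comes back as a list and an error as a dict with a
    "message" key, so anything that is not a base64 file payload yields None.
    """
    if not isinstance(payload, dict):
        return None
    if payload.get("type") != "file" or payload.get("encoding") != "base64":
        return None
    # b64decode ignores the newlines GitHub inserts into the content field.
    return base64.b64decode(payload["content"]).decode("utf-8", errors="replace")

def diagnose(edge_status, payload):
    """Combine the curl result (Step 2) with the repo state (Step 3)."""
    if edge_status == 200:
        return "ok"
    if decode_contents_payload(payload) is not None:
        return "false 4xx: file exists in repo"
    return "missing from repo"
```

Only URLs that come back as "missing from repo" are candidates for a real fix.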

Step 4: Decision Tree (Fix vs. Ignore)

Now you classify each issue into one of two actions:

IGNORE (mark as known noise in your memory file)
FIX (push via Contents API, single-file PUT, no full-pipeline triggers)

The 90-Minute Result

With the decision tree applied:

Beyond the Fix: Make It Permanent

The 4-step tree is the symptomatic cure. The root cause cure is to make the system immune to future occurrences. For the title-too-long category, that meant editing generate-html.py to enforce a hard limit:

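import re

def _limit_title_for_seo(title, suffix=' - TechPassive', max_total=70):
    """Hard limit title length to prevent future title-too-long reports."""
    # Strip HTML tags and turn &quot; entities back into plain quotes.
    clean = re.sub(r'<[^>]+>', '', title).replace('&quot;', '"').strip()
    if not clean:
        return clean
    full_len = len(clean) + len(suffix)
    # Titles containing CJK characters get a slightly larger budget.
    is_cn = bool(re.search(r'[\u4e00-\u9fff]', clean))
    effective_max = 75 if is_cn else max_total
    if full_len <= effective_max:
        return clean
    # Cut to the budget left after the suffix, then back up to the last
    # separator, provided that keeps at least 60% of the budget.
    budget = effective_max - len(suffix)
    cut = clean[:budget]
    for sep in [' - ', ' — ', ', ', ': ', ':', ',', '、', ' ']:
        idx = cut.rfind(sep)
        if idx >= budget * 0.6:
            cut = cut[:idx]
            break
    # Drop any dangling punctuation left at the cut point.
    return cut.rstrip(' ,:—-')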

One function, one line in the metadata dict: 'title': _limit_title_for_seo(title). From now on, every generated article has its title clipped so that title plus suffix stays within 70 characters (75 for titles containing Chinese characters, which the function gives a separate threshold) before the HTML is written. Generated pages should no longer show up in the title-too-long report.

The Lesson

Search Console reports are not your problem. Your problem is the gap between what the report says and what the page actually is. Until you build a decision tree that bridges that gap, every batch report will trigger a panic-edit cycle that destroys more than it fixes.

The 4-step tree is not specific to any one report category. It applies to:

Once you have the tree, every future report is a 30-second lookup. The first-time cost is high. Every subsequent use costs almost nothing.

Build Your Own Tree

Three things to set up before the next Search Console email lands:

  1. Manifest file in your repo: .moved-pages-manifest.json listing every 301 redirect page with its target URL and mtime. Update this whenever you do a mass rename.
  2. Contents API helper script in /tmp/: a 30-line Python file that wraps the GitHub Contents API for single-file push. Avoid the full-pipeline scripts for surgical fixes — they trigger IndexNow and 15-article pushes that are pure noise.
  3. Decision tree in your memory: a one-page reference of the 5 buckets, the fingerprint patterns, and the 2-action conclusion. Future-you will thank present-you when the next report arrives.
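For item 2, the core of such a helper might look like the sketch below. It only builds the request for GitHub's "create or update file contents" endpoint; the function name, the parameters, and the default branch name "main" are assumptions for illustration. When updating an existing file, the API requires the current blob sha, which you can read from a prior GET on the same path.

```python
import base64
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"

def build_put_request(owner, repo, path, text, message, token,
                      sha=None, branch="main"):
    """Build a single-file PUT request for the GitHub Contents API."""
    body = {
        "message": message,  # commit message
        "content": base64.b64encode(text.encode("utf-8")).decode("ascii"),
        "branch": branch,
    }
    if sha is not None:
        body["sha"] = sha  # required when overwriting an existing file
    url = f"{API_ROOT}/repos/{owner}/{repo}/contents/{urllib.parse.quote(path)}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned request to urllib.request.urlopen performs the push. Keeping request construction separate from sending makes it easy to inspect exactly what will be written before anything touches the repo.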

The investment is one afternoon. The return is permanent immunity to the most common SEO panic cycle in self-hosted blogs.
