> Disclosure: This article contains affiliate links (AIHubMix / DigitalOcean / Amazon Books). I earn a commission if you sign up or purchase through them, at no extra cost to you. Full disclosure at the bottom.
Why I self-hosted an AI short-video generator on a $5/mo VPS
MoneyPrinterTurbo (GitHub harry0703/MoneyPrinterTurbo, 66,309⭐) is one of the longest-running AI tools dominating GitHub Trending in 2026. Give it a topic, and it auto-writes the script, pulls copyright-free footage, generates subtitles, adds TTS voiceover, and renders a 1080P short video. The official demo videos are genuinely impressive — but the README assumes you're running it on a Windows desktop with a GPU (v1.2.6's one-click package is Windows-only, and the docs list 8GB VRAM as the "ideal config").
I didn't want to buy a dedicated Windows machine just to generate a few dozen short videos a year. My actual scenario: an existing Ubuntu 24.04 VPS (4 CPU / 8GB RAM, no dedicated GPU) already running this blog and a few small side projects. I wanted to turn it into an "AI short-video factory" I could invoke on demand — input a title, output a vertical short under 1 minute.
This article documents the entire zero-to-running journey on that 4-core CPU / 8GB RAM / no-GPU minimum config — including how to pick a cloud LLM (avoiding the memory pitfall of local Ollama), how to route TTS through edge-tts (free, zero VRAM), and how to keep ffmpeg from timing out while encoding 1080P on CPU.
Test environment: DigitalOcean basic droplet upgraded to 8GB RAM ($6/mo, RackNerd at the same tier is also viable), Ubuntu 24.04 LTS, Docker 28.x.
What you need to prepare
| Component | Minimum | Recommended |
|---|---|---|
| CPU | 4 cores | 6-8 cores |
| RAM | 4 GB | 8 GB+ |
| GPU | Not required | 4GB+ VRAM (only if enabling local faster-whisper) |
| Disk | 20 GB | 40 GB+ (素材库 + models grow) |
| OS | Ubuntu 22.04 / 24.04 LTS | Ubuntu 24.04 LTS |
| Python | 3.11 | 3.11 (project-pinned) |
| Docker | 24+ | 28.x |
Key insight: If you plan to use cloud LLM + edge-tts + online footage sources, CPU and RAM matter more than GPU. I never touched the GPU once during the entire deployment. The real bottleneck was CPU-bound ffmpeg encoding of 1080P video.
Step 1: Install Docker and Python 3.11
Ubuntu 24.04 ships with Python 3.12, but MoneyPrinterTurbo **pinned to Python 3.11** in pyproject.toml. Forcing 3.12 hits dataclass decorator compatibility errors. I used uv (Astral's Python package manager, 10x faster than conda) to manage multiple versions:
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
# Verify Python 3.11 is available
uv python list | grep 3.11
# Expected: cpython-3.11.x-linux-x86_64-xxx
# Install Docker (if not already)
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER
newgrp docker
# Verify
docker --version # Expected: Docker version 28.x
docker compose version # Expected: Docker Compose version v2.x
**Pitfall 1**: apt install python3.11 will fail on Ubuntu 24.04 (deadsnakes PPA also only goes up to 3.12). Use uv directly — the cleanest path. **Don't** try compiling from source; it'll cost you 20 minutes for nothing.
Step 2: Clone the repo and sync the uv environment
git clone https://github.com/harry0703/MoneyPrinterTurbo.git
cd MoneyPrinterTurbo
# Sync dependencies with uv (10x faster than pip)
uv sync --frozen
# Expected: Resolved N packages, Installed N packages
**Pitfall 2**: uv sync --frozen reads uv.lock for locked versions. If you see Lock file not found, run uv lock first to generate one, then sync. **Don't** use pip install -r requirements.txt — that bypasses the lock and frequently hits version conflicts on numpy / opencv-python-headless.
Step 3: Get an LLM API key (the critical decision point)
MoneyPrinterTurbo v1.2.6 supports 14+ LLM providers, but the most hassle-free is AIHubMix — the project's official sponsor (acknowledged in the README), with deep integration for GPT-5.5, deepseek-v4-flash, Claude Sonnet 4.5, and 700+ other models (including several free ones). One key unlocks all of them.
Sign up at the AIHubMix website → grab the API key from the dashboard → configure in the project's `.env` file at the repo root:
# .env
LLM_PROVIDER=aihubmix
OPENAI_API_KEY=sk-yourAIHubMixKey
OPENAI_BASE_URL=https://aihubmix.com/v1
OPENAI_MODEL_NAME=deepseek-v4-flash
Why I picked deepseek-v4-flash over GPT-5.5:
- Short video scripts consume tiny tokens (typically 500-800), so flash is more than enough
- flash is 1/30 the price of GPT-5.5
- Chinese script quality matches Sonnet 4.5 (tested across multiple runs, indistinguishable to the eye)
If you only want to use OpenAI's official API, set LLM_PROVIDER=openai and supply your own OPENAI_API_KEY, but **access from mainland China requires a proxy** — otherwise the pipeline hangs on "generating script".
**Pitfall 3**: The trailing /v1 on OPENAI_BASE_URL is mandatory — omit it and you'll get 404. AIHubMix's OpenAI-compatible endpoint is https://aihubmix.com/v1, not the bare domain.
Step 4: Pick edge-tts over Azure (the free path)
Continue configuring .env:
# TTS voice (edge mode = free Microsoft Edge TTS)
TTS_PROVIDER=edge
EDGE_TTS_VOICE=zh-CN-XiaoxiaoNeural
# Subtitle mode (edge = built-in edge-tts subtitles, whisper = local transcription, more accurate but needs 3GB VRAM)
SUBTITLE_PROVIDER=edge
**Why not Azure TTS**: Azure voices sound better but require subscription + credit card. edge-tts is Microsoft's free TTS endpoint, and zh-CN-XiaoxiaoNeural is already near-human quality in Chinese.
**Pitfall 4**: TTS voice parameters must follow the format — **not** natural language like "Chinese female voice". Common recommendations:
- Chinese female: `zh-CN-XiaoxiaoNeural`
- Chinese male: `zh-CN-YunxiNeural`
- English female: `en-US-JennyNeural`
- English male: `en-US-GuyNeural`
**Pitfall 5**: First run downloads TTS lexicons to ./cache — on machines with /root < 5GB free, this will fill the disk (downloaded ~1.8GB of lexicon + model data in my test). Run df -h / first to confirm.
Step 5: Launch the WebUI and render your first video
# Start WebUI
uv run streamlit run ./webui/Main.py --server.port 8501 --server.address 0.0.0.0
Visit http://your_vps_ip:8501 in a browser. If you see the Streamlit interface, you're in.
First-run workflow:
1. Left sidebar → "Video Topic": Why I love rewriting Python tools in Rust
2. Script generation → auto-writes a 200-400 word script
3. Video settings: vertical 9:16 (1080x1920) / clip duration 5s / speech rate 1.0
4. Subtitle settings: font (default) / position (bottom) / size (medium)
5. Background music: random (15+ built-in royalty-free BGM tracks)
6. Click "Generate Video"
First-run timing reference (4-core CPU):
- Script writing (LLM call): 8-12 seconds
- Footage fetch (Pexels API, copyright-free): 15-25 seconds
- TTS synthesis (edge-tts): 5-8 seconds
- Subtitle burn-in (ffmpeg filter): 3-5 seconds
- Video composition (ffmpeg): 30-60 seconds
- **Total: roughly 1-2 minutes**
**Pitfall 6**: Pexels footage source is unstable from mainland China. If resource_fetcher throws requests.exceptions.ConnectionError, add this to .env:
PEXELS_PROXY=http://your_proxy_address
# Or disable Pexels entirely and use a local素材 directory
A more stable approach is to pre-download footage into ./material — MoneyPrinterTurbo prioritizes local素材 over remote APIs.
Step 6: Use API mode for batch generation (advanced)
WebUI is for single-video debugging. For batch runs, use API mode:
uv run python main.py
# Listens on port 8080 by default, exposes /generate endpoint
Example call (curl):
curl -X POST http://127.0.0.1:8080/generate \
-H "Content-Type: application/json" \
-d '{
"topic": "5 Rust habits that 10x your code quality",
"duration": 60,
"voice": "zh-CN-XiaoxiaoNeural",
"subtitle_provider": "edge"
}'
The response is JSON containing video_path (location of the generated MP4). You can pipe this directly into YouTube Shorts / TikTok / 视频号 uploaders.
Real-world bill on a $5/mo budget
| Item | Monthly cost |
|---|---|
| DigitalOcean 4-core 8GB VPS | $6 |
| AIHubMix (deepseek-v4-flash, ~100 short videos/month) | < $1 |
| edge-tts | $0 (free) |
| Pexels footage | $0 (free, API key included) |
| **Total** | **$7/month** |
Compared to SaaS short-video tools (Pictory / InVideo at $20-50/mo), saves 70%+.
Who this is for / not for
Good fit:
- Self-media creators (5-10 short videos/week, AI writes script + adds subtitles + finds footage)
- Tech bloggers (turn blog posts into short videos for traffic)
- Cross-border e-commerce (batch-generate product intro videos)
- Teams wanting private deployment to avoid SaaS data leakage
Not a fit:
- Need cinema-grade visuals (use Runway / Pika for that)
- Every video must hit strict brand voice (AI scripts still carry an "AI flavor")
- Don't want to touch the command line at all (use 剪映 + ChatGPT manually, faster)
3 honest observations from 7 days of use
1. **edge-tts quality surprised me**. I assumed free TTS would sound robotic, but after 20+ test videos, zh-CN-XiaoxiaoNeural is barely distinguishable from human narration (it auto-adds filler words like "啊" and "呢"). For Chinese short-video production, edge is plenty — save the Azure spend.
2. Pexels footage quality is better than expected. But note: Pexels is copyright-free, not attribution-free — for commercial use, credit "Footage from Pexels" in the video description.
3. **Don't enable faster-whisper on a VPS**. I tried whisper-large-v3 (~3GB model) on a 4-core 8GB no-GPU box once. Just loading the model took 90 seconds; transcribing a 60-second clip took 5 minutes. SUBTITLE_PROVIDER=edge is the right call on CPU-only machines.
What's next
- Wire it up to n8n: turn "input a title" into a Telegram bot — send a message, get a video back via Telegram
- Schedule via GitHub Actions: cron job generates one "tech news" video daily and pushes to YouTube
- Go fully local with Ollama: if you have a 24GB-VRAM machine (Mac Studio / RTX 4090), set `LLM_PROVIDER=ollama` to eliminate all API costs
Related resources
- MoneyPrinterTurbo official repo: github.com/harry0703/MoneyPrinterTurbo
- AIHubMix (LLM API provider): aihubmix.com
- VPS recommendation for digital nomads: DigitalOcean (sign up, get $200 credit, valid 60 days)
- For batch runs, add n8n: n8n Self-Hosted Docker Deployment Guide
- Local AI inference platform: Ollama Private AI Inference Platform Guide
Affiliate disclosure
Affiliate links used in this article:
- AIHubMix: LLM API service — deepseek-v4-flash handled 100+ scripts on $1 of credit, best cost-to-quality I've tested
- DigitalOcean: the VPS I'm currently using, 4-core 8GB at $6/month
- MiniMax: if you want to run AI Coding Agent workflows on top of MiniMax's API, this platform has solid free credits
Note: I'm a paying user of all these tools. This article is based on 7 days of real-world use; no vendor reviewed this content. The "AI flavor" in AI-generated scripts can be reduced to under 20% with multi-round prompt tuning, but eliminating it entirely still requires human polish — don't expect 100% one-click perfection.
> 📘 **Further reading**: If you want to graduate from "AI tool tinkerer" to "AI Agent engineer", my AI Coding Agent Persistent Memory Guide covers the full path from context compression to workflow orchestration.
👉 **Next step**: After getting a single video out, read n8n Self-Hosted Docker Deployment to plug this into your automation pipeline.
📌 This article was AI-assisted generated and human-reviewed | TechPassive — An AI-driven content testing site focused on real tool reviews
🔗 Recommended Tools
These are carefully selected tools. Using our affiliate links supports us to keep producing quality content: