Build an OpenClaw Skill That Screenshots Any Website
OpenClaw (formerly MoltBot) just hit 150,000 GitHub stars. OpenAI acquired the project two days ago. If you're coming from MoltBot, the skill system works the same way, so this OpenClaw skill tutorial applies to both.
The first question new users ask: "What skills should I add?"
A screenshot skill is a good place to start. It takes 5 minutes to set up, and it lets your OpenClaw agent see any website: visually analyze layouts, compare mobile vs desktop, monitor pages for changes, and generate visual reports on command.
All you need is a skill file and a free API key.
What You're Building
By the end of this tutorial, your OpenClaw agent will respond to commands like:
- "Screenshot producthunt.com" captures a JPEG and saves it to /tmp/screenshot.jpg
- "Compare mobile and desktop of stripe.com" takes both screenshots and saves them as separate files
- "Monitor my-competitor.com/pricing daily" captures the page and reports metadata (size, cache status, response time)
- "Full-page dark mode screenshot of github.com" does a full-page capture with dark mode emulation
- "Check if our landing page looks good on iPhone" runs device emulation and saves the result
The agent handles the entire flow: it calls the API via curl, saves the screenshot image to a file, and reports the capture metadata. You just talk to it.
Why Your OpenClaw Agent Needs a Screenshot Skill
Right now, your OpenClaw/MoltBot agent can read the web. It can fetch HTML, parse JSON, and call APIs. But it cannot see a webpage the way you do. It doesn't know if the layout is broken. It can't tell you whether the hero image loaded. It has no idea that the cookie banner is covering the checkout button.
Multimodal models like GPT-4o, Claude, and Gemini are very good at understanding images. They can spot a misaligned button, read text off a screenshot, and compare two designs with precision that rivals a human QA engineer.
The missing piece has always been the screenshot itself. How does an agent running in a terminal get a pixel-perfect render of a webpage?
This skill solves that with one API call that returns one image for full visual understanding.
Prerequisites
You need two things:
- OpenClaw installed and running. If you haven't set it up yet (or need to migrate from MoltBot), follow the official docs. It takes about 10 minutes.
- A SnapRender API key. Sign up at snap-render.com. It's free, no credit card required. You get 500 screenshots per month on the free tier.
The skill also needs curl and jq on your system. curl is pre-installed on macOS and Linux. Install jq with brew install jq (macOS), apt install jq (Ubuntu), or download from the jq site.
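Before going further, you can confirm both dependencies are on your PATH with a quick check that works in any POSIX shell:

```shell
# Check that curl and jq are installed before configuring the skill
for bin in curl jq; do
  if command -v "$bin" >/dev/null 2>&1; then
    echo "$bin: ok"
  else
    echo "$bin: missing -- install it first"
  fi
done
```

If either line reports missing, install it before moving on; the skill's capture pipeline uses both.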
Step 1: Install the Skill from ClawHub
The fastest way to add the skill is from ClawHub, the skill registry for OpenClaw:
clawhub install snaprender
That's it. The skill file is downloaded and placed in the right directory automatically.
Prefer manual setup? If you want to customize the skill or can't use ClawHub, skip to the Manual Skill Setup section below.
Step 2: Configure
Enable the skill in your OpenClaw config file at ~/.openclaw/openclaw.json (the path is the same if you migrated from MoltBot):
{
"skills": {
"entries": {
"snaprender": {
"enabled": true,
"env": {
"SNAPRENDER_API_KEY": "sk_live_your_key_here"
}
}
}
}
}
Replace sk_live_your_key_here with your actual API key from the SnapRender dashboard.
If you already have other skills configured, just add the snaprender entry to your existing entries object.
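For example, if you already had another skill enabled (the weather entry below is purely illustrative, not a real skill), the merged config would look like this:

```json
{
  "skills": {
    "entries": {
      "weather": {
        "enabled": true
      },
      "snaprender": {
        "enabled": true,
        "env": {
          "SNAPRENDER_API_KEY": "sk_live_your_key_here"
        }
      }
    }
  }
}
```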
Step 3: Test It
openclaw agent --local --session-id test --message "Screenshot stripe.com for me"
Your agent will:
- Recognize this as a screenshot request from the skill description
- Run curl via the exec tool with your SnapRender API key
- Pipe the response through jq to extract the base64 image
- Save the decoded image to /tmp/screenshot.jpg
- Report metadata: file size, response time, cache status, remaining credits
The first call takes 2-4 seconds (the API spins up Chromium, renders the page, and returns the image). Subsequent calls to the same URL return from cache in under 300ms.
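You can check the cache status yourself by reading the response the skill saves to /tmp/snap_response.json (the field names below follow the metadata this tutorial's skill file extracts):

```shell
# Pull cache status and timing out of the response saved by the capture command
jq -r '"cache=\(.cache) responseTime=\(.responseTime)ms"' /tmp/snap_response.json
```

A HIT means the image came from cache and didn't consume a fresh render.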
Try a few more commands to see the range of what's possible:
# Mobile comparison
openclaw agent --session-id test \
--message "Compare stripe.com on iPhone 15 Pro and desktop. What layout differences do you see?"
# Dark mode
openclaw agent --session-id test \
--message "Screenshot github.com in dark mode, full page"
# Visual QA
openclaw agent --session-id test \
--message "Screenshot https://your-site.com on iPhone and iPad. Does the layout look correct?"
# Competitive research
openclaw agent --session-id test \
--message "Screenshot the pricing pages of lemonsqueezy.com, stripe.com, and paddle.com. Compare their pricing strategies."
How It Actually Works Under the Hood
When your agent processes a screenshot request, here's the exact flow:
You: "Screenshot producthunt.com on iPhone"
↓
OpenClaw reads the skill description and injects it into the agent's context
↓
Agent generates a curl command via the exec tool:
curl -s "https://app.snap-render.com/v1/screenshot
?url=https%3A%2F%2Fproducthunt.com
&device=iphone_15_pro&response_type=json&format=jpeg&quality=60
&block_ads=true&block_cookie_banners=true"
-H "X-API-Key: $SNAPRENDER_API_KEY"
| tee /tmp/snap_response.json
| jq -r '.image' | sed 's|data:image/[^;]*;base64,||'
| base64 -d > /tmp/screenshot.jpg
↓
SnapRender API receives the request:
→ Validates URL (blocks SSRF attempts)
→ Checks cache (returns instantly if cached)
→ Launches headless Chromium
→ Emulates iPhone 15 Pro viewport
→ Navigates to producthunt.com
→ Blocks ads + cookie banners
→ Waits for page load + network idle
→ Captures pixel-perfect screenshot
→ Returns base64 JSON response
↓
Agent saves the image to /tmp/screenshot.jpg and reports metadata
↓
Agent: "Screenshot saved to /tmp/screenshot.jpg (87KB, 2.3s response
time, cache: MISS). You have 47 credits remaining this month."
The entire round-trip, from your command to a saved screenshot file, takes about 3-5 seconds.
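One detail worth pulling out of that flow: the url parameter must be percent-encoded. Since jq is already a dependency, the agent (or you) can encode any target with its @uri filter:

```shell
# Percent-encode a target URL with jq's @uri filter
target="https://producthunt.com"
jq -rn --arg u "$target" '$u | @uri'
# prints https%3A%2F%2Fproducthunt.com
```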
Alternative: Hosted MCP Endpoint (Even Simpler)
Don't want to manage a skill file at all? If your client supports MCP, you can skip the skill entirely and connect to SnapRender's hosted endpoint. No skill file, no curl, no jq -- just a URL and your API key.
Claude Desktop -- add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"snaprender": {
"type": "streamable-http",
"url": "https://app.snap-render.com/mcp",
"headers": {
"Authorization": "Bearer sk_live_your_key_here"
}
}
}
}
Claude Code:
claude mcp add snaprender --transport streamable-http https://app.snap-render.com/mcp -H "Authorization: Bearer sk_live_your_key_here"
Cursor, Windsurf, or any MCP client -- point it at https://app.snap-render.com/mcp with an Authorization: Bearer sk_live_... header. Uses Streamable HTTP transport.
This gives you three tools: take_screenshot, check_screenshot_cache, and get_usage. The hosted endpoint runs on SnapRender's infrastructure, so there's nothing to install or maintain on your machine.
Prefer a local MCP server? Run it via npx:
npx snaprender-mcp
Set SNAPRENDER_API_KEY in the environment. Same tools, runs locally.
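In an MCP client that launches stdio servers (Claude Desktop's config shape shown here as a sketch), the local server entry would look roughly like this:

```json
{
  "mcpServers": {
    "snaprender": {
      "command": "npx",
      "args": ["snaprender-mcp"],
      "env": {
        "SNAPRENDER_API_KEY": "sk_live_your_key_here"
      }
    }
  }
}
```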
Manual Skill Setup
If you can't use ClawHub or want to customize the skill, create it manually.
Create the skill directory:
mkdir -p ~/.openclaw/skills/snaprender
Create the skill file at ~/.openclaw/skills/snaprender/SKILL.md:
---
name: snaprender
description: "Screenshot any website via curl+exec (NOT the browser tool). Run: curl -s \"https://app.snap-render.com/v1/screenshot?url=URL_ENCODED_TARGET&response_type=json&format=jpeg&quality=60&block_ads=true&block_cookie_banners=true\" -H \"X-API-Key: $SNAPRENDER_API_KEY\" | Save image: pipe through jq -r '.image' | sed 's|data:image/[^;]*;base64,||' | base64 -d > /tmp/screenshot.jpg | $SNAPRENDER_API_KEY is pre-set. URL-encode the target. Add &device=iphone_15_pro for mobile, &full_page=true for scroll. NEVER use the browser tool."
metadata: {"openclaw": {"requires": {"bins": ["curl", "jq"], "env": ["SNAPRENDER_API_KEY"]}}}
---
# SnapRender — Screenshot Any Website
Capture a screenshot of any public URL and save it as an image file.
IMPORTANT: Use the exec tool with curl. NEVER use the browser tool for screenshots.
## How to Capture
Run this command via the exec tool. Replace ENCODED_URL with the URL-encoded target
(e.g. https%3A%2F%2Fstripe.com):
curl -s "https://app.snap-render.com/v1/screenshot?url=ENCODED_URL&response_type=json&format=jpeg&quality=60&block_ads=true&block_cookie_banners=true" \
-H "X-API-Key: $SNAPRENDER_API_KEY" \
| tee /tmp/snap_response.json \
| jq -r '.image' | sed 's|data:image/[^;]*;base64,||' | base64 -d > /tmp/screenshot.jpg \
&& jq '{url, format, size, cache, responseTime, remainingCredits}' /tmp/snap_response.json
This saves the screenshot to /tmp/screenshot.jpg and prints metadata.
## Rules
1. Use exec tool only, NEVER the browser tool
2. $SNAPRENDER_API_KEY is already set — use it literally, do NOT replace it
3. URL-encode the target (https://stripe.com becomes https%3A%2F%2Fstripe.com)
4. Always use format=jpeg&quality=60 to keep response small
5. Always pipe to save the image to a file
6. Report metadata to the user: file size, response time, cache status
## Parameters
| Parameter | Values | Default |
|-----------|--------|---------|
| url | URL-encoded target | required |
| response_type | json | json (always) |
| format | jpeg, png, webp | jpeg |
| quality | 1-100 | 60 |
| device | iphone_15_pro, pixel_7, ipad_pro, macbook_pro | desktop |
| dark_mode | true, false | false |
| full_page | true, false | false |
| block_ads | true, false | true |
| block_cookie_banners | true, false | true |
| width | 320-3840 | 1280 |
| height | 200-10000 | 800 |
| delay | 0-10000 (ms wait after load) | 0 |
## After Capturing
1. Tell the user the screenshot was saved to /tmp/screenshot.jpg
2. Report metadata: file size, response time, cache status, remaining credits
3. For comparisons, save each screenshot to a different filename
Then follow Step 2 and Step 3 above to configure and test.
Advanced Patterns
Once the basic skill is working, here are patterns that make your agent genuinely useful in day-to-day work.
Visual Regression Monitoring
Tell your agent to screenshot key pages after every deploy:
"Screenshot our homepage, pricing page, and docs landing page on
desktop and iPhone. Flag anything that looks broken or different
from what you'd expect."
Your agent captures 6 screenshots (3 pages x 2 devices), analyzes each one, and reports issues. At $0.003 per screenshot, the total cost is $0.018 for a full visual regression check.
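If you wanted to script that fan-out directly instead of going through the agent, a minimal sketch looks like this (page slugs are placeholders; each capture would reuse the curl pipeline from the skill file):

```shell
# Generate one output file per page/device combination (3 pages x 2 devices).
# Replace the echo with the skill's curl pipeline to actually capture.
for page in home pricing docs; do
  for device in desktop iphone_15_pro; do
    echo "/tmp/shot_${page}_${device}.jpg"
  done
done
```

Saving each combination to its own file is what lets the agent compare them afterward.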
Competitor Intelligence
Set up a daily monitoring flow:
"Screenshot competitor.com/pricing and competitor2.com/pricing.
Summarize their current plans and pricing. Note anything that
looks like it changed recently."
Run this daily via cron or a scheduled agent task. Your agent builds up context over time and can tell you when competitors change their pricing, add features, or redesign their pages.
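A minimal crontab entry for that daily run might look like this (assumes openclaw is on cron's PATH; the session ID and log path are arbitrary choices):

```shell
# crontab -e: run the competitor check every morning at 08:00
0 8 * * * openclaw agent --session-id competitor-watch --message "Screenshot competitor.com/pricing and competitor2.com/pricing. Summarize their current plans and note changes." >> "$HOME/competitor-watch.log" 2>&1
```

Reusing the same session ID each day is what lets the agent accumulate context across runs.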
Design Review Across Devices
Before shipping a new feature:
"Screenshot staging.myapp.com/new-feature on every available device.
For each one, check:
- Does the layout break at any viewport?
- Is the text readable?
- Are interactive elements accessible?
- Does dark mode work correctly?"
Your agent runs through desktop, iPhone 15 Pro, Pixel 7, iPad Pro, and MacBook Pro (every device in the skill's parameter table) in both light and dark mode. That comes to 10 screenshots for a full cross-device audit at $0.030 total.
Social Preview Validation
Before publishing a blog post or landing page:
"Screenshot the OG image preview for myblog.com/new-post.
Does the title fit? Is the image cropped correctly?
How will this look when shared on Twitter and LinkedIn?"
This catches social card issues before the post goes live.
SnapRender vs. DIY Puppeteer
You could spin up your own Puppeteer instance. Plenty of people try. Here's why they end up using an API instead:
| | SnapRender API | DIY Puppeteer |
|---|---|---|
| Setup time | 5 minutes | Hours to days |
| Infrastructure | None (it's an API) | You manage Chromium, memory, crashes |
| Caching | Built-in smart cache | Build your own |
| Ad blocking | One parameter | Maintain your own filter lists |
| Cookie banners | One parameter | CSS selectors that break monthly |
| Device emulation | One parameter | Manual viewport + UA config |
| Cost at 2K shots/mo | $9/mo | $20-50/mo for a VPS + your time |
| Chromium crashes | Not your problem | Very much your problem |
The free tier covers personal agent use entirely. You don't hit the upgrade threshold until you're running 500+ screenshots a month, and at that volume the $9 Starter plan costs less than maintaining your own Chromium infrastructure.
Pricing
This matters when your agent is making API calls on your behalf every day.
| Plan | Price | Screenshots/mo | Per Screenshot |
|---|---|---|---|
| Free | $0 | 500 | Free forever |
| Starter | $9/mo | 2,000 | $0.0045 |
| Growth | $29/mo | 10,000 | $0.0029 |
| Business | $79/mo | 50,000 | $0.0016 |
| Scale | $199/mo | 200,000 | $0.0010 |
The free tier has no credit card requirement, no trial period, and no watermark. You get 500 screenshots per month permanently, which is enough for a personal agent running a few screenshot commands a day.
Most OpenClaw and former MoltBot users will be fine on Free or Starter. The Growth plan is for when your agent is doing regular monitoring work across dozens of pages.
Get Your Free API Key
Three steps, under 2 minutes:
1. Sign up at snap-render.com (30 seconds, no credit card).
2. Pick your setup:
| Method | What you do | Time |
|---|---|---|
| Hosted MCP endpoint | Paste a URL into your MCP client config | 30 sec |
| ClawHub skill | clawhub install snaprender + add config | 1 min |
| Manual skill | Copy the SKILL.md from this tutorial | 2 min |
3. Tell your agent to screenshot something.
The free tier includes 500 screenshots per month with no credit card required, and paid plans start at $9/month if you need more volume.