Build an OpenClaw Skill That Screenshots Any Website
OpenClaw (formerly MoltBot) just hit 150,000 GitHub stars. OpenAI acquired the project two days ago. If you're coming from MoltBot, the skill system works the same way, so this OpenClaw skill tutorial applies to both.
The first question new users ask: "What skills should I add?"
A screenshot skill is a good place to start. It takes 5 minutes to set up, and it lets your OpenClaw agent see any website: visually analyze layouts, compare mobile vs desktop, monitor pages for changes, and generate visual reports on command.
All you need is a skill file and a free API key.
What You're Building
By the end of this tutorial, your OpenClaw agent will respond to commands like:
- "Screenshot producthunt.com" captures a JPEG and saves it to /tmp/screenshot.jpg
- "Compare mobile and desktop of stripe.com" takes both screenshots and saves them as separate files
- "Monitor my-competitor.com/pricing daily" captures the page and reports metadata (size, cache status, response time)
- "Full-page dark mode screenshot of github.com" does a full-page capture with dark mode emulation
- "Check if our landing page looks good on iPhone" runs device emulation and saves the result
The agent handles the entire flow: it calls the API via curl, saves the screenshot image to a file, and reports the capture metadata. You just talk to it.
Why Your OpenClaw Agent Needs a Screenshot Skill
Right now, your OpenClaw/MoltBot agent can read the web. It can fetch HTML, parse JSON, and call APIs. But it cannot see a webpage the way you do. It doesn't know if the layout is broken. It can't tell you whether the hero image loaded. It has no idea that the cookie banner is covering the checkout button.
Multimodal models like GPT-4o, Claude, and Gemini are very good at understanding images. They can spot a misaligned button, read text off a screenshot, and compare two designs with precision that rivals a human QA engineer.
The missing piece has always been the screenshot itself. How does an agent running in a terminal get a pixel-perfect render of a webpage?
This skill solves that with one API call that returns one image for full visual understanding.
Prerequisites
You need two things:
- OpenClaw installed and running. If you haven't set it up yet (or need to migrate from MoltBot), follow the official docs. It takes about 10 minutes.
- A SnapRender API key. Sign up at snap-render.com. It's free, no credit card required. You get 500 screenshots per month on the free tier.
The skill also needs curl and jq on your system. curl is pre-installed on macOS and Linux. Install jq with brew install jq (macOS), apt install jq (Ubuntu), or download from the jq site.
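Before going further, you can confirm both dependencies are on your PATH with a quick check that works in any POSIX shell:

```shell
# Check that curl and jq are installed before configuring the skill
for bin in curl jq; do
  if command -v "$bin" >/dev/null 2>&1; then
    echo "$bin: ok"
  else
    echo "$bin: missing -- install it first"
  fi
done
```

If either line reports missing, install it before moving on; the skill's capture pipeline uses both.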
Step 1: Install the Skill from ClawHub
The fastest way to add the skill is from ClawHub, the skill registry for OpenClaw:
clawhub install snaprender
That's it. The skill file is downloaded and placed in the right directory automatically.
Prefer manual setup? If you want to customize the skill or can't use ClawHub, skip to the Manual Skill Setup section below.
Step 2: Configure
Enable the skill in your OpenClaw config file at ~/.openclaw/openclaw.json (the path is the same if you migrated from MoltBot):
{
"skills": {
"entries": {
"snaprender": {
"enabled": true,
"env": {
"SNAPRENDER_API_KEY": "sk_live_your_key_here"
}
}
}
}
}
Replace sk_live_your_key_here with your actual API key from the SnapRender dashboard.
If you already have other skills configured, just add the snaprender entry to your existing entries object.
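For example, if you already had another skill enabled (the weather entry below is purely illustrative, not a real skill), the merged config would look like this:

```json
{
  "skills": {
    "entries": {
      "weather": {
        "enabled": true
      },
      "snaprender": {
        "enabled": true,
        "env": {
          "SNAPRENDER_API_KEY": "sk_live_your_key_here"
        }
      }
    }
  }
}
```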
Step 3: Test It
openclaw agent --local --session-id test --message "Screenshot stripe.com for me"
Your agent will:
- Recognize this as a screenshot request from the skill description
- Run curl via the exec tool with your SnapRender API key
- Pipe the response through jq to extract the base64 image
- Save the decoded image to /tmp/screenshot.jpg
- Report metadata: file size, response time, cache status, remaining credits
The first call takes 2-4 seconds (the API spins up Chromium, renders the page, and returns the image). Subsequent calls to the same URL return from cache in under 300ms.
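You can check the cache status yourself by reading the response the skill saves to /tmp/snap_response.json (the field names below follow the metadata this tutorial's skill file extracts):

```shell
# Pull cache status and timing out of the response saved by the capture command
jq -r '"cache=\(.cache) responseTime=\(.responseTime)ms"' /tmp/snap_response.json
```

A HIT means the image came from cache and didn't consume a fresh render.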
Try a few more commands to see the range of what's possible:
# Mobile comparison
openclaw agent --session-id test \
--message "Compare stripe.com on iPhone 15 Pro and desktop. What layout differences do you see?"
# Dark mode
openclaw agent --session-id test \
--message "Screenshot github.com in dark mode, full page"
# Visual QA
openclaw agent --session-id test \
--message "Screenshot https://your-site.com on iPhone and iPad. Does the layout look correct?"
# Competitive research
openclaw agent --session-id test \
--message "Screenshot the pricing pages of lemonsqueezy.com, stripe.com, and paddle.com. Compare their pricing strategies."
How It Actually Works Under the Hood
When your agent processes a screenshot request, here's the exact flow:
You: "Screenshot producthunt.com on iPhone"
↓
OpenClaw reads the skill description and injects it into the agent's context
↓
Agent generates a curl command via the exec tool:
curl -s "https://app.snap-render.com/v1/screenshot
?url=https%3A%2F%2Fproducthunt.com
&device=iphone_15_pro&response_type=json&format=jpeg&quality=60
&block_ads=true&block_cookie_banners=true"
-H "X-API-Key: $SNAPRENDER_API_KEY"
| tee /tmp/snap_response.json
| jq -r '.image' | sed 's|data:image/[^;]*;base64,||'
| base64 -d > /tmp/screenshot.jpg
↓
SnapRender API receives the request:
→ Validates URL (blocks SSRF attempts)
→ Checks cache (returns instantly if cached)
→ Launches headless Chromium
→ Emulates iPhone 15 Pro viewport
→ Navigates to producthunt.com
→ Blocks ads + cookie banners
→ Waits for page load + network idle
→ Captures pixel-perfect screenshot
→ Returns base64 JSON response
↓
Agent saves the image to /tmp/screenshot.jpg and reports metadata
↓
Agent: "Screenshot saved to /tmp/screenshot.jpg (87KB, 2.3s response
time, cache: MISS). You have 47 credits remaining this month."
The entire round-trip, from your command to a saved screenshot file, takes about 3-5 seconds.
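One detail worth pulling out of that flow: the url parameter must be percent-encoded. Since jq is already a dependency, the agent (or you) can encode any target with its @uri filter:

```shell
# Percent-encode a target URL with jq's @uri filter
target="https://producthunt.com"
jq -rn --arg u "$target" '$u | @uri'
# prints https%3A%2F%2Fproducthunt.com
```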
Alternative: Hosted MCP Endpoint (Even Simpler)
Don't want to manage a skill file at all? If your client supports MCP, you can skip the skill entirely and connect to SnapRender's hosted endpoint. No skill file, no curl, no jq -- just a URL and your API key.
Claude Desktop -- add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"snaprender": {
"type": "streamable-http",
"url": "https://app.snap-render.com/mcp",
"headers": {
"Authorization": "Bearer sk_live_your_key_here"
}
}
}
}
Claude Code:
claude mcp add snaprender --transport streamable-http https://app.snap-render.com/mcp -H "Authorization: Bearer sk_live_your_key_here"
Cursor, Windsurf, or any MCP client -- point it at https://app.snap-render.com/mcp with an Authorization: Bearer sk_live_... header. Uses Streamable HTTP transport.
This gives you three tools: take_screenshot, check_screenshot_cache, and get_usage. The hosted endpoint runs on SnapRender's infrastructure, so there's nothing to install or maintain on your machine.
Prefer a local MCP server? Run it via npx:
npx snaprender-mcp
Set SNAPRENDER_API_KEY in the environment. Same tools, runs locally.
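In an MCP client that launches stdio servers (Claude Desktop's config shape shown here as a sketch), the local server entry would look roughly like this:

```json
{
  "mcpServers": {
    "snaprender": {
      "command": "npx",
      "args": ["snaprender-mcp"],
      "env": {
        "SNAPRENDER_API_KEY": "sk_live_your_key_here"
      }
    }
  }
}
```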
Manual Skill Setup
If you can't use ClawHub or want to customize the skill, create it manually.
Create the skill directory:
mkdir -p ~/.openclaw/skills/snaprender
Create the skill file at ~/.openclaw/skills/snaprender/SKILL.md:
---
name: snaprender
description: "Screenshot any website via curl+exec (NOT the browser tool). Run: curl -s \"https://app.snap-render.com/v1/screenshot?url=URL_ENCODED_TARGET&response_type=json&format=jpeg&quality=60&block_ads=true&block_cookie_banners=true\" -H \"X-API-Key: $SNAPRENDER_API_KEY\" | Save image: pipe through jq -r '.image' | sed 's|data:image/[^;]*;base64,||' | base64 -d > /tmp/screenshot.jpg | $SNAPRENDER_API_KEY is pre-set. URL-encode the target. Add &device=iphone_15_pro for mobile, &full_page=true for scroll. NEVER use the browser tool."
metadata: {"openclaw": {"requires": {"bins": ["curl", "jq"], "env": ["SNAPRENDER_API_KEY"]}}}
---
# SnapRender — Screenshot Any Website
Capture a screenshot of any public URL and save it as an image file.
IMPORTANT: Use the exec tool with curl. NEVER use the browser tool for screenshots.
## How to Capture
Run this command via the exec tool. Replace ENCODED_URL with the URL-encoded target
(e.g. https%3A%2F%2Fstripe.com):
curl -s "https://app.snap-render.com/v1/screenshot?url=ENCODED_URL&response_type=json&format=jpeg&quality=60&block_ads=true&block_cookie_banners=true" \
-H "X-API-Key: $SNAPRENDER_API_KEY" \
| tee /tmp/snap_response.json \
| jq -r '.image' | sed 's|data:image/[^;]*;base64,||' | base64 -d > /tmp/screenshot.jpg \
&& jq '{url, format, size, cache, responseTime, remainingCredits}' /tmp/snap_response.json
This saves the screenshot to /tmp/screenshot.jpg and prints metadata.
## Rules
1. Use exec tool only, NEVER the browser tool
2. $SNAPRENDER_API_KEY is already set — use it literally, do NOT replace it
3. URL-encode the target (https://stripe.com becomes https%3A%2F%2Fstripe.com)
4. Always use format=jpeg&quality=60 to keep response small
5. Always pipe to save the image to a file
6. Report metadata to the user: file size, response time, cache status
## Parameters
| Parameter | Values | Default |
|-----------|--------|---------|
| url | URL-encoded target | required |
| response_type | json | json (always) |
| format | jpeg, png, webp | jpeg |
| quality | 1-100 | 60 |
| device | iphone_15_pro, pixel_7, ipad_pro, macbook_pro | desktop |
| dark_mode | true, false | false |
| full_page | true, false | false |
| block_ads | true, false | true |
| block_cookie_banners | true, false | true |
| width | 320-3840 | 1280 |
| height | 200-10000 | 800 |
| delay | 0-10000 (ms wait after load) | 0 |
## After Capturing
1. Tell the user the screenshot was saved to /tmp/screenshot.jpg
2. Report metadata: file size, response time, cache status, remaining credits
3. For comparisons, save each screenshot to a different filename
Then follow Step 2 and Step 3 above to configure and test.
Advanced Patterns
Once the basic skill is working, here are patterns that make your agent genuinely useful in day-to-day work.
Visual Regression Monitoring
Tell your agent to screenshot key pages after every deploy:
"Screenshot our homepage, pricing page, and docs landing page on
desktop and iPhone. Flag anything that looks broken or different
from what you'd expect."
Your agent captures 6 screenshots (3 pages x 2 devices), analyzes each one, and reports issues. At $0.003 per screenshot, the total cost is $0.018 for a full visual regression check.
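If you wanted to script that fan-out directly instead of going through the agent, a minimal sketch looks like this (page slugs are placeholders; each capture would reuse the curl pipeline from the skill file):

```shell
# Generate one output file per page/device combination (3 pages x 2 devices).
# Replace the echo with the skill's curl pipeline to actually capture.
for page in home pricing docs; do
  for device in desktop iphone_15_pro; do
    echo "/tmp/shot_${page}_${device}.jpg"
  done
done
```

Saving each combination to its own file is what lets the agent compare them afterward.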
Competitor Intelligence
Set up a daily monitoring flow:
"Screenshot competitor.com/pricing and competitor2.com/pricing.
Summarize their current plans and pricing. Note anything that
looks like it changed recently."
Run this daily via cron or a scheduled agent task. Your agent builds up context over time and can tell you when competitors change their pricing, add features, or redesign their pages.
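A minimal crontab entry for that daily run might look like this (assumes openclaw is on cron's PATH; the session ID and log path are arbitrary choices):

```shell
# crontab -e: run the competitor check every morning at 08:00
0 8 * * * openclaw agent --session-id competitor-watch --message "Screenshot competitor.com/pricing and competitor2.com/pricing. Summarize their current plans and note changes." >> "$HOME/competitor-watch.log" 2>&1
```

Reusing the same session ID each day is what lets the agent accumulate context across runs.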
Design Review Across Devices
Before shipping a new feature:
"Screenshot staging.myapp.com/new-feature on every available device.
For each one, check:
- Does the layout break at any viewport?
- Is the text readable?
- Are interactive elements accessible?
- Does dark mode work correctly?"
Your agent runs through desktop, iPhone 15 Pro, Pixel 7, iPad Pro, and MacBook Pro (every device in the skill's parameter table) in both light and dark mode. That comes to 10 screenshots for a full cross-device audit at $0.030 total.
Social Preview Validation
Before publishing a blog post or landing page:
"Screenshot the OG image preview for myblog.com/new-post.
Does the title fit? Is the image cropped correctly?
How will this look when shared on Twitter and LinkedIn?"
This catches social card issues before the post goes live.
SnapRender vs. DIY Puppeteer
You could spin up your own Puppeteer instance. Plenty of people try. Here's why they end up using an API instead:
| | SnapRender API | DIY Puppeteer |
|---|---|---|
| Setup time | 5 minutes | Hours to days |
| Infrastructure | None (it's an API) | You manage Chromium, memory, crashes |
| Caching | Built-in smart cache | Build your own |
| Ad blocking | One parameter | Maintain your own filter lists |
| Cookie banners | One parameter | CSS selectors that break monthly |
| Device emulation | One parameter | Manual viewport + UA config |
| Cost at 2K shots/mo | $9/mo | $20-50/mo for a VPS + your time |
| Chromium crashes | Not your problem | Very much your problem |
The free tier covers personal agent use entirely. You don't hit the upgrade threshold until you're running 500+ screenshots a month, and at that volume the $9 Starter plan costs less than maintaining your own Chromium infrastructure.
Pricing
This matters when your agent is making API calls on your behalf every day.
| Plan | Price | Screenshots/mo | Per Screenshot |
|---|---|---|---|
| Free | $0 | 500 | Free forever |
| Starter | $9/mo | 2,000 | $0.0045 |
| Growth | $29/mo | 10,000 | $0.0029 |
| Business | $79/mo | 50,000 | $0.0016 |
| Scale | $199/mo | 200,000 | $0.0010 |
The free tier has no credit card requirement, no trial period, and no watermark. You get 500 screenshots per month permanently, which is enough for a personal agent running a few screenshot commands a day.
Most OpenClaw and former MoltBot users will be fine on Free or Starter. The Growth plan is for when your agent is doing regular monitoring work across dozens of pages.
Get Your Free API Key
Three steps, under 2 minutes:
1. Sign up at snap-render.com (30 seconds, no credit card).
2. Pick your setup:
| Method | What you do | Time |
|---|---|---|
| Hosted MCP endpoint | Paste a URL into your MCP client config | 30 sec |
| ClawHub skill | clawhub install snaprender + add config | 1 min |
| Manual skill | Copy the SKILL.md from this tutorial | 2 min |
3. Tell your agent to screenshot something.
The free tier includes 500 screenshots per month with no credit card required, and paid plans start at $9/month if you need more volume.