Lesson 19 — OpenClaw Nano Banana Pro: Conversational AI Image Generation and Editing (Gemini-Powered, 2026)
Goal: Install the Nano Banana Pro Skill so OpenClaw can generate images and edit existing ones through natural language, supporting 1K/2K/4K resolutions.
What Is Nano Banana Pro?
Nano Banana Pro is one of ClawHub's top image Skills (72k downloads), powered by Google's Gemini image model. It supports both text-to-image and image-to-image modes. No need to separately register for Midjourney or DALL-E — all image creation happens through OpenClaw's conversational interface.
Step 1: Install the Dependency CLI Tool
Nano Banana Pro depends on the nano-banana-pro CLI. Run this in your terminal:
npm install -g nano-banana-pro
Verify installation:
nano-banana-pro --version
# Should output a version number like 2.4.1
If you get a permission error, add sudo or use pnpm global install:
pnpm add -g nano-banana-pro
Step 2: Install the Skill
In WebChat or Telegram, send:
/install @steipete/nano-banana-pro
Verify successful installation:
pnpm openclaw skills list
# nano-banana-pro should appear in the list
After installation, OpenClaw automatically recognizes image generation intent. No additional API key configuration is needed (it uses the Gemini quota already bound to OpenClaw).
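As an optional sanity check before moving on, the sketch below looks for the CLI from Step 1 on your PATH and reads its version. It assumes only what the steps above state: the binary is named nano-banana-pro and accepts --version. The helper name cli_version is made up for this illustration and is not part of OpenClaw or the Skill.

```python
import shutil
import subprocess
from typing import Optional


def cli_version(binary: str = "nano-banana-pro") -> Optional[str]:
    """Return the CLI's reported version string, or None if it is not usable.

    Hypothetical helper: it only relies on the binary name and the
    --version flag shown in Step 1.
    """
    path = shutil.which(binary)  # None when the binary is not on PATH
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,  # a non-zero exit code raises CalledProcessError
        )
    except (subprocess.SubprocessError, OSError):
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = cli_version()
    if version is None:
        print("nano-banana-pro not found on PATH; repeat Step 1")
    else:
        print(f"nano-banana-pro {version}")
```

If the function returns None, the global install from Step 1 most likely did not land on your PATH.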
Step 3: Text-to-Image Basics
Describe the image you want in plain language:
Generate an image for me: cyberpunk-style Tokyo street, neon lights reflecting on rain-slicked pavement, night atmosphere
Or use the slash command:
/image A tabby cat sitting on a cloud, Studio Ghibli animation style, soft color palette
More prompt examples:
/image Minimalist coffee brand logo, black and white, no background, suitable for commercial use
/image Futuristic product showcase image: smart earbuds floating in midair, background is a gradient purple-blue glow
Sample output: OpenClaw displays the generated image directly in the conversation, along with a download link and the generation time.
Step 4: Image Editing (image-to-image)
If you want to modify an existing image, use the --input-image parameter to pass in the original:
Edit this image — change the background to white while keeping the subject unchanged: --input-image ~/Desktop/product.jpg
Or just describe the edit intent directly:
Change the style of this photo to watercolor: ~/Downloads/photo.png
Remove the text from this image while keeping the background looking natural
Add a "NEW" badge to this product image — red background, white text, upper right corner
The key to natural-language image editing is to describe the result you want; there is no need to open Photoshop or Figma.
Step 5: Resolution Control (1K/2K/4K)
Control output size with the --resolution parameter:
# 1K (1024×1024) — Quick preview, good for prototyping, lowest quota consumption
/image --resolution 1k city night skyline aerial view
# 2K (2048×2048) — Sweet spot for everyday use, good for social media posts
/image --resolution 2k product promotional image, clean background
# 4K (4096×4096) — Print-quality, ideal for posters and covers, higher quota consumption
/image --resolution 4k grand landscape art style exhibition backdrop
| Resolution | Use Case | Generation Time | Quota |
|---|---|---|---|
| 1K | Quick tests, avatars | ~5 sec | 1× |
| 2K | Social media, websites | ~15 sec | 3× |
| 4K | Print, exhibitions | ~45 sec | 8× |
Recommendation: refine your prompt at 1K first until you're happy with the composition, then scale up to 4K.
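To see why drafting at 1K pays off, the arithmetic can be sketched with the multipliers from the table. This is a rough illustration: it assumes quota scales linearly with image count and that the 1×/3×/8× figures apply to your account. The function name quota_units is hypothetical.

```python
from typing import Dict

# Relative quota multipliers copied from the table above.
QUOTA_MULTIPLIER = {"1k": 1, "2k": 3, "4k": 8}


def quota_units(images_by_resolution: Dict[str, int]) -> int:
    """Total relative quota for a mix of images, e.g. {"1k": 5, "4k": 1}."""
    total = 0
    for resolution, count in images_by_resolution.items():
        key = resolution.lower()
        if key not in QUOTA_MULTIPLIER:
            raise ValueError(f"unknown resolution: {resolution!r}")
        if count < 0:
            raise ValueError("image count must not be negative")
        total += QUOTA_MULTIPLIER[key] * count
    return total


# Five drafts at 1K, then one final render at 4K: 5*1 + 1*8
draft_first = quota_units({"1k": 5, "4k": 1})
# Iterating at 4K for all six attempts: 6*8
all_4k = quota_units({"4k": 6})
print(draft_first, all_4k)  # prints: 13 48
```

Under these assumptions, refining at 1K and rendering once at 4K uses roughly a quarter of the quota of iterating at 4K throughout.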
Step 6: Practical Prompt Tips
Style control: Add style keywords at the end of the prompt
A modern library interior, abundant natural light, warm tones — photography style, 85mm lens, shallow depth of field
Detail layering: Describe from foreground to background
Foreground: a steaming latte
Midground: wooden desk with a laptop half-open
Background: floor-to-ceiling windows with a snowy cityscape, bokeh effect
Overall style: Instagram aesthetic, high saturation, natural light
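If you reuse this layered structure often, a small helper can assemble the prompt for you. The sketch below is only an illustration of the foreground-to-background ordering described above; layered_prompt is a hypothetical name, and the labels are simply the ones used in this example, not keywords the model requires.

```python
def layered_prompt(foreground: str, midground: str, background: str, style: str) -> str:
    """Join scene layers front to back, with the style keywords last."""
    layers = [
        ("Foreground", foreground),
        ("Midground", midground),
        ("Background", background),
        ("Overall style", style),
    ]
    # Skip empty layers so the prompt never contains a dangling label.
    return "\n".join(
        f"{label}: {text.strip()}" for label, text in layers if text.strip()
    )


print(
    layered_prompt(
        foreground="a steaming latte",
        midground="wooden desk with a laptop half-open",
        background="floor-to-ceiling windows with a snowy cityscape, bokeh effect",
        style="Instagram aesthetic, high saturation, natural light",
    )
)
```

The printed text matches the four-line prompt above and can be pasted straight into the conversation.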
Negative prompts (exclude unwanted elements):
/image a modern app UI screenshot, clean interface design --negative blurry text, low resolution, distortion
Step 7: Batch Generate Multiple Versions
Generate multiple versions at once to compare:
Generate 4 brand logo options for me, theme is "AI + Ocean", each with a different style
Or specify a count:
/image --count 4 --resolution 1k minimal tech icon, circular background, different color schemes
After batch generation, you can do a second-pass edit on whichever version you like best.
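When you script many variations, it can help to build the /image line programmatically. The sketch below uses only the flags shown in this lesson (--count, --resolution, --negative) and mirrors the ordering of the examples above. Whether the Skill accepts other orderings is not something this lesson confirms, and image_command is a hypothetical helper, not part of OpenClaw.

```python
from typing import Optional, Sequence

VALID_RESOLUTIONS = ("1k", "2k", "4k")


def image_command(
    prompt: str,
    resolution: Optional[str] = None,
    count: Optional[int] = None,
    negative: Sequence[str] = (),
) -> str:
    """Build an /image chat command string from a prompt and optional flags."""
    parts = ["/image"]
    if count is not None:
        if count < 1:
            raise ValueError("count must be at least 1")
        parts += ["--count", str(count)]
    if resolution is not None:
        if resolution.lower() not in VALID_RESOLUTIONS:
            raise ValueError(f"resolution must be one of {VALID_RESOLUTIONS}")
        parts += ["--resolution", resolution.lower()]
    parts.append(prompt.strip())
    if negative:
        # Negative terms go last, comma-separated, as in the Step 6 example.
        parts += ["--negative", ", ".join(negative)]
    return " ".join(parts)


print(
    image_command(
        "minimal tech icon, circular background, different color schemes",
        resolution="1k",
        count=4,
    )
)
```

The call above reproduces the batch command from this step; passing negative=["blurry text", "low resolution", "distortion"] appends the exclusion list in the same form as the Step 6 example.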
FAQ
Can OpenClaw generate images for free?
OpenClaw itself is a self-hosted open-source framework, so any cost comes from the Gemini image model API that the Nano Banana Pro Skill calls. If you're using Google AI Studio's free quota, generating 1K/2K images within that quota is free. Beyond the free quota, standard Gemini API billing applies, typically around $0.003 per 1K image, which is far cheaper than a Midjourney subscription. We recommend setting a monthly usage cap in your OpenClaw config file to avoid unexpected charges.
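To pick a sensible monthly cap, you can estimate how many 1K images a budget covers. The sketch below takes the approximate $0.003 per 1K image figure from the answer above as its default. Treat that price as an assumption and check current Gemini API pricing before relying on it. The function name images_within_budget is hypothetical.

```python
from decimal import Decimal


def images_within_budget(budget_usd: str, price_per_image_usd: str = "0.003") -> int:
    """Number of whole 1K images that fit in a budget.

    Amounts are passed as strings and handled as Decimal to avoid
    floating-point rounding surprises with small prices.
    """
    budget = Decimal(budget_usd)
    price = Decimal(price_per_image_usd)
    if price <= 0:
        raise ValueError("price per image must be positive")
    if budget < 0:
        raise ValueError("budget must not be negative")
    return int(budget // price)  # floor division: partial images do not count


print(images_within_budget("5.00"))  # prints: 1666
```

Higher resolutions are not covered here, because this lesson gives a quota multiplier for them but no per-image price.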
What's the difference between Nano Banana Pro and Midjourney?
Nano Banana Pro's core advantage is being integrated into OpenClaw's conversational workflow — you can complete an entire flow like "search for reference images → generate → edit → save to Notion" within the same conversation without switching between tools. Midjourney currently has a slight edge in image quality and artistic style, but it requires working through Discord and doesn't support programmatic calls. If your main needs are batch generation, automated workflows, or image editing, Nano Banana Pro is a better fit.
Are generated images copyrighted?
According to Google Gemini's terms of service, images generated through the API are owned by the user and may be used commercially. However, generating images with real people's likenesses, well-known trademarks, or closely imitated artistic styles (like "Ghibli-style") falls into legal gray areas, so consult legal counsel before commercial use. Generating illegal content is not permitted; the Gemini API has built-in safety filters that automatically reject violating requests.
What types of edits can image-to-image do?
Supported edit types include: background replacement (background removal or swap), style transfer (convert a photo to oil painting/watercolor/comic style), local modifications (describe the region to change in natural language), image restoration (remove watermarks, fill missing areas), and color adjustments (change the color of a specific region). For precise local edits, describe "which region" and "what change" as clearly as possible in the prompt — the model will try to leave undescribed regions unchanged.