Lesson 19 — OpenClaw Nano Banana Pro: Conversational AI Image Generation and Editing (Gemini-Powered, 2026)

Goal: Install the Nano Banana Pro Skill so OpenClaw can generate images and edit existing ones through natural language, supporting 1K/2K/4K resolutions.

What Is Nano Banana Pro?

Nano Banana Pro is one of ClawHub's top image Skills (72k downloads), powered by Google's Gemini image model. It supports both text-to-image and image-to-image modes. You don't need a separate Midjourney or DALL-E account; all image creation happens through OpenClaw's conversational interface.

Step 1: Install the Dependency CLI Tool

Nano Banana Pro depends on the nano-banana-pro CLI. Run this in your terminal:

npm install -g nano-banana-pro

Verify installation:

nano-banana-pro --version
# Should output a version number like 2.4.1

If you get a permission error, rerun the command with sudo or do the global install with pnpm instead:

pnpm add -g nano-banana-pro
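If you want to script the verification step, the check can be sketched in Python. This is a minimal sketch, assuming only what the lesson shows: that the CLI is on your PATH under the name nano-banana-pro and that it accepts --version. The function name cli_version is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def cli_version(command: str = "nano-banana-pro") -> Optional[str]:
    """Return the version string the CLI prints, or None if it is not on PATH."""
    executable = shutil.which(command)
    if executable is None:
        # Not installed globally, or the global bin directory is not on PATH.
        return None
    result = subprocess.run(
        [executable, "--version"], capture_output=True, text=True, check=False
    )
    # An empty output is treated the same as "not found".
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = cli_version()
    print(version if version else "nano-banana-pro not found on PATH")
```

A None result after a successful install usually means the global bin directory of npm or pnpm is missing from PATH.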

Step 2: Install the Skill

In WebChat or Telegram, send:

/install @steipete/nano-banana-pro

Verify the installation from your terminal:

pnpm openclaw skills list
# nano-banana-pro should appear in the list

After installation, OpenClaw automatically recognizes image-generation intent. No additional API key configuration is needed, because the Skill uses the Gemini quota already bound to OpenClaw.

Step 3: Text-to-Image Basics

Describe the image you want in plain language:

Generate an image for me: cyberpunk-style Tokyo street, neon lights reflecting on rain-slicked pavement, night atmosphere

Or use the slash command:

/image A tabby cat sitting on a cloud, Studio Ghibli animation style, soft color palette

More prompt examples:

/image Minimalist coffee brand logo, black and white, no background, suitable for commercial use

/image Futuristic product showcase image: smart earbuds floating in midair, background is a gradient purple-blue glow

Sample output: OpenClaw displays the generated image directly in the conversation, along with a download link and the generation time.

Step 4: Image Editing (image-to-image)

To modify an existing image, pass the original with the --input-image parameter:

Edit this image — change the background to white while keeping the subject unchanged: --input-image ~/Desktop/product.jpg

Or just describe the edit intent directly:

Change the style of this photo to watercolor: ~/Downloads/photo.png
Remove the text from this image while keeping the background looking natural
Add a "NEW" badge to this product image — red background, white text, upper right corner

The key to natural-language image editing is describing the change you want; you never need to open Photoshop or Figma.

Step 5: Resolution Control (1K/2K/4K)

Control output size with the --resolution parameter:

# 1K (1024×1024) — Quick preview, good for prototyping, lowest quota consumption
/image --resolution 1k city night skyline aerial view
 
# 2K (2048×2048) — Sweet spot for everyday use, good for social media posts
/image --resolution 2k product promotional image, clean background
 
# 4K (4096×4096) — Print-quality, ideal for posters and covers, higher quota consumption
/image --resolution 4k grand landscape art style exhibition backdrop

Resolution	Use Case	Approx. Generation Time	Relative Quota Cost
1K	Quick tests, avatars	~5 sec	1×
2K	Social media, websites	~15 sec	3×
4K	Print, exhibitions	~45 sec	8×

Recommendation: refine your prompt at 1K first until you're happy with the composition, then scale up to 4K.
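To see why drafting at 1K pays off, the relative quota figures from the table can be added up. The following is a rough sketch in Python; the multipliers are copied from the table above, while the function name and the example counts are illustrative assumptions.

```python
from typing import Mapping

# Relative quota cost per image, copied from the table above.
QUOTA_MULTIPLIER = {"1k": 1, "2k": 3, "4k": 8}


def quota_units(images_by_resolution: Mapping[str, int]) -> int:
    """Sum the relative quota cost for a set of generations."""
    return sum(
        QUOTA_MULTIPLIER[resolution] * count
        for resolution, count in images_by_resolution.items()
    )


# Five 1K drafts followed by one 4K final render.
draft_then_final = quota_units({"1k": 5, "4k": 1})
# The same six attempts, all rendered at 4K.
all_at_4k = quota_units({"4k": 6})

print(draft_then_final, all_at_4k)
```

With these example counts, drafting at 1K uses 13 units, against 48 for running every attempt at 4K.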

Step 6: Practical Prompt Tips

Style control: Add style keywords at the end of the prompt

A modern library interior, abundant natural light, warm tones — photography style, 85mm lens, shallow depth of field

Detail layering: Describe from foreground to background

Foreground: a steaming latte
Midground: wooden desk with a laptop half-open
Background: floor-to-ceiling windows with a snowy cityscape, bokeh effect
Overall style: Instagram aesthetic, high saturation, natural light
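If you reuse this layered structure often, you can assemble it with a small helper. This is a minimal Python sketch; layered_prompt is a hypothetical helper for building the text you paste into the chat, not part of OpenClaw or the Skill.

```python
def layered_prompt(foreground: str, midground: str, background: str, style: str) -> str:
    """Build a prompt that describes the scene from front to back, style last."""
    layers = [
        ("Foreground", foreground),
        ("Midground", midground),
        ("Background", background),
        ("Overall style", style),
    ]
    # One labeled line per layer, in the same order as the example above.
    return "\n".join(f"{label}: {text.strip()}" for label, text in layers)


prompt = layered_prompt(
    foreground="a steaming latte",
    midground="wooden desk with a laptop half-open",
    background="floor-to-ceiling windows with a snowy cityscape, bokeh effect",
    style="Instagram aesthetic, high saturation, natural light",
)
print(prompt)
```

Keeping the style on the last line matches the style-control tip: style keywords go at the end of the prompt.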

Negative prompts (exclude unwanted elements):

/image a modern app UI screenshot, clean interface design --negative blurry text, low resolution, distortion

Step 7: Batch Generate Multiple Versions

Generate multiple versions at once to compare:

Generate 4 brand logo options for me, theme is "AI + Ocean", each with a different style

Or specify a count:

/image --count 4 --resolution 1k minimal tech icon, circular background, different color schemes

After batch generation, you can do a second-pass edit on whichever version you like best.
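When you send many similar requests, it can help to build the /image line programmatically. The sketch below is Python and uses only the flags that appear in this lesson (--count, --resolution, --negative), placed in the same order as the examples. The helper image_command is hypothetical, and how the Skill actually parses flags is an assumption based on those examples.

```python
from typing import Optional

# Resolution values used in Step 5.
ALLOWED_RESOLUTIONS = ("1k", "2k", "4k")


def image_command(
    prompt: str,
    resolution: Optional[str] = None,
    count: Optional[int] = None,
    negative: Optional[str] = None,
) -> str:
    """Assemble an /image chat command in the flag order used in this lesson."""
    parts = ["/image"]
    if count is not None:
        if count < 1:
            raise ValueError("count must be at least 1")
        parts += ["--count", str(count)]
    if resolution is not None:
        if resolution not in ALLOWED_RESOLUTIONS:
            raise ValueError(f"resolution must be one of {ALLOWED_RESOLUTIONS}")
        parts += ["--resolution", resolution]
    parts.append(prompt.strip())
    if negative:
        # The lesson's example puts the negative prompt after the main prompt.
        parts += ["--negative", negative.strip()]
    return " ".join(parts)


print(image_command("minimal tech icon, circular background", resolution="1k", count=4))
```

Validating the resolution before sending avoids spending a request on a typo such as 3k.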

FAQ

Can OpenClaw generate images for free?

OpenClaw itself is a self-hosted open-source framework. The Nano Banana Pro Skill calls the Gemini image model API. If you're using Google AI Studio's free quota, generating 1K/2K images within that quota is free. Beyond the free quota, standard Gemini API billing applies, typically around $0.003 per 1K image, which is far cheaper than a Midjourney subscription. We recommend setting a monthly usage cap in your OpenClaw config file to avoid unexpected charges.
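To pick a sensible monthly cap, you can work out how many paid images a given budget covers. This is a minimal sketch in Python, assuming a flat per-image price. The $0.003 value is the figure quoted in this answer, so substitute the current rate from your own billing page. The function name and the $5.00 cap are illustrative.

```python
from decimal import Decimal


def images_within_cap(monthly_cap_usd: Decimal, price_per_image_usd: Decimal) -> int:
    """Return how many whole images fit under a monthly spending cap."""
    if price_per_image_usd <= 0:
        raise ValueError("price_per_image_usd must be positive")
    # Floor division: a partially paid image does not count.
    return int(monthly_cap_usd // price_per_image_usd)


# Price taken from the figure quoted above; the cap is an example value.
print(images_within_cap(Decimal("5.00"), Decimal("0.003")))
```

Decimal is used instead of float so that small per-image prices do not pick up rounding error.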

What's the difference between Nano Banana Pro and Midjourney?

Nano Banana Pro's core advantage is being integrated into OpenClaw's conversational workflow: you can complete an entire flow like "search for reference images → generate → edit → save to Notion" within the same conversation without switching between tools. Midjourney currently has a slight edge in image quality and artistic style, but it is used through Discord or its own web app and has no official public API for programmatic calls. If your main needs are batch generation, automated workflows, or image editing, Nano Banana Pro is a better fit.

Are generated images copyrighted?

According to Google Gemini's terms of service, Google does not claim ownership of images generated through the API, and they may be used commercially. However, generating images with real people's likenesses, well-known trademarks, or distinctive styles closely tied to a particular studio or artist (like "Ghibli-style") falls into legal gray areas, so consult legal counsel before commercial use. Generated images cannot be used for illegal content; the Gemini API has built-in safety filters that automatically reject violating requests.

What types of edits can image-to-image do?

Supported edit types include: background replacement (background removal or swap), style transfer (convert a photo to oil painting/watercolor/comic style), local modifications (describe the region to change in natural language), image restoration (remove watermarks, fill missing areas), and color adjustments (change the color of a specific region). For precise local edits, describe "which region" and "what change" as clearly as possible in the prompt — the model will try to leave undescribed regions unchanged.

Next Steps

Lesson 20 — Install the Obsidian Skill to automatically save generated images and the creative process to your Obsidian notes
Lesson 11 — Run a security check with Skill Vetter before installing