The Optimization Rabbit Hole
What followed was an intense six-hour sprint of optimization. Over 40 commits, we tried everything we could think of to make LLM icon generation work well.
Model Upgrades
We started with GPT-4o-mini, then upgraded to GPT-5.2, hoping for better quality. We enabled reasoning mode, then disabled it. We tried different temperature settings. Each change helped a little, but never enough.
Prompt Engineering Adventures
We developed what we called "joyful minimalism"—a set of five design principles for our icons:
- One gesture per icon
- Round over sharp
- Let it breathe (70% empty canvas)
- Subtract until it breaks
- No text ever
We reverse-engineered the Lucide icon style and fed those instructions to the LLM. We provided reference SVGs for common icon types. We tried fixed icon vocabularies where the LLM could only choose from predefined shapes. We experimented with different viewBox sizes—24x24, 256x256, 128x128—each with proportionally scaled stroke widths.
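The "proportionally scaled stroke widths" idea can be sketched in a few lines. This is an illustrative helper, not code from the project; the baseline values follow the Lucide convention of a 2px stroke on a 24x24 viewBox.

```python
# Hypothetical helper: keep stroke weight visually consistent across viewBox sizes.
# Lucide's convention is a 2px stroke on a 24x24 viewBox, so we scale from there.
BASE_VIEWBOX = 24
BASE_STROKE = 2.0

def scaled_stroke(viewbox_size: int) -> float:
    """Return a stroke width proportional to the viewBox edge length."""
    return BASE_STROKE * (viewbox_size / BASE_VIEWBOX)
```

With this, a 128x128 viewBox gets roughly a 10.7px stroke and a 256x256 one roughly 21.3px, so icons render at the same visual weight regardless of canvas size.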
The Image Prompt Era
We added an image_prompt field that generated detailed visual descriptions before the SVG. For weather icons, we tried location-specific skylines—the Golden Gate Bridge for San Francisco, the Empire State Building for New York. For groceries, we described containers and identifying marks. The prompts got increasingly elaborate.
```
## Weather Icon: San Francisco, Rainy
- Silhouette of Golden Gate Bridge in background
- Gentle rain drops falling at 15-degree angle
- Low clouds obscuring bridge towers
- Minimalist, monochrome style
- No text or labels
```
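The two-stage shape of this approach can be sketched as follows. The field names and JSON structure here are assumptions for illustration; the point is that the model writes the image_prompt first, so the description conditions the SVG that follows.

```python
# Illustrative sketch (field names are assumptions): have the model emit a
# detailed image_prompt before the SVG, then parse both out of one response.
import json

def build_messages(icon_request: str) -> list[dict]:
    system = (
        "You design minimalist monochrome icons. "
        "First write an image_prompt describing the icon in visual detail, "
        "then draw it. Respond as JSON: "
        '{"image_prompt": "...", "svg": "<svg ...>...</svg>"}'
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": icon_request},
    ]

def parse_icon_response(raw: str) -> tuple[str, str]:
    """Split the model's JSON reply into (image_prompt, svg)."""
    data = json.loads(raw)
    return data["image_prompt"], data["svg"]
```

Logging the intermediate image_prompt alongside the SVG is also what makes the later MongoDB analysis possible: you can see what the model *intended* to draw, not just what it drew.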
Caching and Logging
We implemented LLM prefix caching to speed up responses. We added client-side caching to prevent duplicate API calls. We logged every generated icon to MongoDB so we could analyze patterns and improve over time. We added cache versioning so we could invalidate old icons when we improved the prompts.
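The cache-versioning piece is simple but worth making concrete. A minimal sketch, with hypothetical names: fold a version number into the cache key, so bumping it after a prompt change orphans every icon generated under the old prompts.

```python
# Sketch of a versioned cache key (names are assumptions, not the project's code).
# Bumping PROMPT_VERSION invalidates all icons generated under older prompts,
# since their keys can no longer be reproduced.
import hashlib

PROMPT_VERSION = 7  # bump whenever the icon prompts change

def cache_key(icon_description: str, viewbox: int = 24) -> str:
    """Deterministic key: same description + version -> same cached icon."""
    payload = f"v{PROMPT_VERSION}|{viewbox}|{icon_description.strip().lower()}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

The same key works on the client (to suppress duplicate API calls for identical requests) and as the document ID in the MongoDB icon log.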
Looking back at the commit log, we can see the growing desperation in the commit messages: "improve icon quality," "simplify prompts for cleaner icons," "constrain to fixed icon vocabulary," "enhance icon prompts with richer descriptions." Each one a small step forward, but the fundamental problems remained.
The icons were still slow. They were still inconsistent. And we were still paying for every generation. Worse, debugging was nearly impossible—when an icon looked wrong, we couldn't easily explain why the LLM had made that choice.