SVG-Master: Fine-Tuned Model for SVG Code Generation

A 3B-parameter fine-tuned model that generates valid SVG code from natural language descriptions, built with LoRA on the Apple MLX framework over three training iterations.

SVG-Master converts natural language descriptions into valid SVG code. Teaching language models to generate syntactically correct vector graphics is challenging: SVG demands simultaneous precision in XML syntax, coordinate mathematics, and visual composition. Three iterative training cycles brought the model to stable, render-ready output.

πŸ” Problem

SVG generation sits at the intersection of structured code generation and visual reasoning. General-purpose models struggle because:

Visual Creativity β€” Aesthetic judgment and design principles must be encoded implicitly in training data

Mathematical Precision β€” Coordinates, paths, and geometry require exact numeric output with no tolerance for approximation

Syntax Strictness β€” One invalid attribute renders the entire graphic blank or broken with no partial rendering

Contextual Understanding β€” The same description requires different outputs at different scales and viewBox settings

A model that produces plausible-looking but syntactically invalid SVG is functionally useless. Output must render immediately without post-processing correction.
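The all-or-nothing nature of SVG parsing can be illustrated with Python's standard-library XML parser (an illustrative sketch, not part of the model's pipeline): a single malformed tag invalidates the entire document.

```python
# Illustration of strict XML parsing: one malformed tag (a missing "/>")
# makes the whole SVG unparseable, so nothing renders.
import xml.etree.ElementTree as ET

valid = '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100"><circle cx="50" cy="50" r="40"/></svg>'
broken = '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100"><circle cx="50" cy="50" r="40"</svg>'

ET.fromstring(valid)  # parses cleanly

try:
    ET.fromstring(broken)
except ET.ParseError as err:
    print(f"broken SVG rejected: {err}")
```

Browsers are similarly strict with standalone `.svg` files, which is why "plausible-looking" output is not enough.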

πŸ› οΈ Architecture & Training

Base Model: Llama 3.2 3B Instruct

Framework: Apple MLX on Apple Silicon

Technique: LoRA (Low-Rank Adaptation)

Training Data: Curated SVG-description pairs with three refinement cycles

Output Format: Valid XML with proper SVG namespace declarations, viewBox, and optimized syntax

Llama 3.2 3B was selected for instruction-following capability. The model had no visual pretraining, but three training cycles built SVG competency:

Cycle 1 β€” Generated syntactically invalid SVG. Path data contained incorrect coordinate formats. Output showed no visual composition awareness.

Cycle 2 β€” Extensive data cleaning and manual curation. Syntax validity improved substantially. Pattern logic remained inconsistent for multi-element scenes.

Cycle 3 β€” Manual validation layers added to training pipeline. Edge cases introduced. Complex compositions improved. Output reached stable syntactic validity.
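The validation layers used in Cycle 3 are not published; the sketch below shows the kind of check such a layer could apply to filter training samples (function name and criteria are hypothetical).

```python
# Hypothetical sketch of a training-data validation check, NOT the
# actual Cycle 3 pipeline: accept a sample only if it parses as XML,
# has an svg root in the SVG namespace, and declares a viewBox.
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

def is_valid_svg(text: str) -> bool:
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    if root.tag != f"{{{SVG_NS}}}svg":
        return False
    return "viewBox" in root.attrib

sample = '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><rect width="24" height="24"/></svg>'
print(is_valid_svg(sample))
```

Filtering on checks like these keeps systematically invalid outputs out of the next cycle's training set.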

πŸ“Š Capabilities

The model generates complete, ready-to-render SVG code with proper structure:

Valid XML with correct SVG namespace declarations

Responsive Design with explicit viewBox and width/height attributes

Gradient Definitions in <defs> blocks where applicable

Multi-element Compositions with layered shapes and proper element ordering

Example input: β€œA minimalist sunset over a calm ocean with orange and purple gradients”

Example output: Complete SVG with layered rectangles, gradients, and proper viewBox scaling.
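An output in this style might look like the hand-written SVG below (illustrative only, not actual model output for this prompt); the parse call confirms the example is well-formed.

```python
# Hand-written illustration of the described output shape: layered
# rectangles, a gradient in <defs>, and an explicit viewBox.
# NOT actual model output.
import xml.etree.ElementTree as ET

sunset = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 120" width="200" height="120">
  <defs>
    <linearGradient id="sky" x1="0" y1="0" x2="0" y2="1">
      <stop offset="0%" stop-color="#6b2d8b"/>
      <stop offset="100%" stop-color="#f28c38"/>
    </linearGradient>
  </defs>
  <rect width="200" height="80" fill="url(#sky)"/>
  <circle cx="100" cy="78" r="18" fill="#ffb347"/>
  <rect y="80" width="200" height="40" fill="#2a3d66"/>
</svg>"""

ET.fromstring(sunset)  # raises ParseError if the markup were invalid
```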

πŸ›‘οΈ Limitations

  • Training data limited to common design patterns β€” specialized visual styles will underperform
  • Static SVG only β€” complex animations require manual adjustment
  • Text rendering occasionally needs font and positioning corrections
  • Highly detailed illustrations may require post-processing
  • Not a replacement for human design work in production contexts

See the official Hugging Face documentation for complete technical details.

πŸš€ Quick Start

Hugging Face

Access the model and documentation

Ollama

```shell
ollama pull fahidnasir/svg-master && ollama run fahidnasir/svg-master "Generate a blue glowing circuit board icon"
```

Python

```python
from mlx_lm import load, generate

model, tokenizer = load("fahidnasir/SVG-Master")
# generate() returns the completion text; print it to see the SVG
print(generate(model, tokenizer, prompt="Generate a blue glowing circuit board icon", max_tokens=512))
```

πŸ’‘ Key Takeaways

  1. Data quality is the primary determinant of output validity β€” clean training examples matter more than dataset size for structured code generation.
  2. Three training cycles were necessary, not exceptional β€” systematic output failures at each stage revealed specific gaps requiring targeted data additions.
  3. Base model choice shapes the ceiling β€” code-unspecialized models require more fine-tuning data to reach comparable syntax reliability.
  4. Visual tasks require both structural and semantic alignment β€” prompt descriptions must accurately match their paired SVG, or the model learns inconsistent mappings.
  5. Partial success is not success for SVG β€” 90% valid output is not deployable if the remaining 10% renders as blank or broken graphics.