In the realm of generative AI, especially for creative industries, the concept of a "semantic attractor" offers insight into how models interpret and produce coherent, thematically driven content. Let’s explore what a semantic attractor is, why it occurs, and how it reveals itself in practice, with examples from photography, image generation, and animation tools.

What is a Semantic Attractor?
A semantic attractor functions like a "gravity field" within AI-generated content, drawing associations toward a central theme or idea. It creates an implicit bias in the model’s latent space—the multidimensional landscape where all learned concepts reside. These attractors guide the AI to produce outputs that naturally align with specific themes or patterns, lending a cohesive or "thematic" quality to content, even without explicit rules. A simple way to observe this concept in action is by prompting a language model to take on a specific role, like “an expert in physics” or “a 5-year-old child.” Such roleplaying prompts act as boundaries, narrowing the model's range of choices and subtly shifting the semantic tone of every output to align with the requested mood.
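To make this concrete, here is a minimal sketch of role prompting using the OpenAI Python SDK. The model name is just an illustrative choice; any chat model would show the same narrowing effect.

```python
# Minimal sketch: the same question asked under two different "roles".
# Assumes the OpenAI Python SDK and an API key; the model name is illustrative.
from openai import OpenAI

client = OpenAI()

def ask_as(role_description: str, question: str) -> str:
    """Send a question with a system prompt that acts as a semantic attractor."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model; this name is an assumption
        messages=[
            {"role": "system", "content": f"You are {role_description}."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

question = "Why is the sky blue?"
print(ask_as("an expert in physics", question))   # pulls toward Rayleigh scattering and wavelengths
print(ask_as("a 5-year-old child", question))     # pulls toward simple, playful wording
```

The system message never forbids anything explicitly; it simply shifts which region of the latent space the answers are drawn from.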
The first time I saw this concept in action in images was in early 2023, thanks to Francesco D’Isa, an Italian AI artist who showed how using the word "crow" in a positive prompt on a text-to-image AI changed the whole atmosphere of the image (sometimes even without a crow appearing!). It also works with negative prompting, so if you don’t want your image too dark, writing something like “no crows” in the prompt can work.
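Here is a minimal sketch of the same idea with an explicit negative prompt, using the Hugging Face diffusers library as a stand-in for tools like Midjourney (which does this with its --no parameter). The checkpoint name and prompts are illustrative.

```python
# Minimal sketch of positive vs. negative prompting with a diffusion model.
# The checkpoint is illustrative; a GPU is assumed for reasonable speed.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The word "crow" as a positive attractor: the whole scene tends to darken.
dark = pipe(prompt="a quiet village street at dusk, a crow on a fence").images[0]

# The same attractor pushed away via the negative prompt: the mood lightens.
light = pipe(
    prompt="a quiet village street at dusk",
    negative_prompt="crow, dark, gloomy",
).images[0]

dark.save("village_with_crow.png")
light.save("village_no_crow.png")
```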
In simpler terms, imagine each semantic attractor as a magnet drawing related ideas, visual features, or associations into the generated output.
Examples of semantic attractors in action
The Rabbit R1 camera: redefining reality with multimodal AI prompting
The Rabbit R1 camera captures images in real time and uses generative AI to interpret and modify reality through what’s called a multimodal prompt system. Unlike traditional cameras, the Rabbit R1 doesn’t just "see" and replicate what’s in front of it. Instead, it processes and transforms the visual input based on prominent elements it detects.
How does the Rabbit R1 camera’s multimodal system work?
The Rabbit R1 operates with a multimodal AI that integrates visual recognition with generative capabilities. Essentially, it combines elements of object recognition and style transformation. When it detects a strong visual motif—like a chess piece, a type of flooring, or a specific texture—it doesn’t simply document that motif. Instead, it interprets the object as a "theme" and subtly shifts the surrounding environment to reflect that theme. This shift creates a cohesive, stylized version of reality rather than a pure reproduction, like adding a filter that adapts specifically to elements within the frame.
Here's the process broken down (a rough code sketch of the flow follows the list):
Image recognition as a prompt: the Rabbit R1 first identifies prominent objects or textures in the scene. These serve as multimodal prompts, meaning they’re both visual data and thematic cues that the AI interprets as anchors. If it sees a chess piece, for instance, it interprets this as an attractor for a "chess theme."
Generative modification: once a theme is identified, the camera’s AI system generates stylistic adjustments to replicate and extend that theme across the rest of the frame. This means that recognizing a chess piece might result in a stylized effect where other objects take on "chess-like" qualities, creating a unified aesthetic as though the whole scene belongs in a chess-inspired world.
Replication of detected patterns: the AI is trained to create visual coherence based on detected patterns. When it recognizes certain textures—like wooden flooring (parquet)—the Rabbit R1’s multimodal AI tends to render other materials in the scene with wood-like textures. If it detects concrete, it might make everything appear rugged or textured, reflecting a cohesive industrial vibe.
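Since the Rabbit R1's internals aren't public, the following is only a hypothetical reconstruction of the three steps above; every function, label, and mapping in it is an assumption made for illustration.

```python
# Hypothetical reconstruction of the detect -> theme -> restyle pipeline.
# All names and mappings below are invented for illustration.
from dataclasses import dataclass

@dataclass
class Detection:
    label: str        # e.g. "chess piece", "parquet floor"
    confidence: float

# A detected anchor maps to a theme, and the theme to stylistic instructions.
THEME_MAP = {
    "chess piece": "chess: checkered patterns, muted black-and-white palette",
    "parquet floor": "wood: warm tones, organic grain textures",
    "concrete": "industrial: rugged, grey, rough surfaces",
}

def pick_anchor(detections: list[Detection]) -> Detection | None:
    """Step 1: the most prominent recognized object becomes the multimodal prompt."""
    strong = [d for d in detections if d.label in THEME_MAP]
    return max(strong, key=lambda d: d.confidence, default=None)

def build_style_prompt(anchor: Detection) -> str:
    """Step 2: translate the anchor into a thematic instruction for the generator."""
    return THEME_MAP[anchor.label]

def stylize_frame(frame, style_prompt: str):
    """Step 3: re-render the whole frame so other objects echo the detected theme.
    Placeholder: a real system would call an image-to-image generative model here."""
    return {"frame": frame, "applied_style": style_prompt}

detections = [Detection("chess piece", 0.91), Detection("coffee mug", 0.85)]
anchor = pick_anchor(detections)
if anchor:
    print(stylize_frame("raw_capture.jpg", build_style_prompt(anchor)))
```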
Chessifying reality: imagine photographing a single chess piece on a table, against a clear background, in a wide view of a room. For the Rabbit R1, this chess piece doesn’t exist in isolation; it serves as a prompt for a chess-themed aesthetic. The model might add a subtle grid pattern to the table, or other items in the background might take on muted, chessboard-like hues, creating a scene reminiscent of a chessboard without placing explicit chess elements everywhere.
Wooden world with parquet detection: if the Rabbit R1 sees a patch of wooden parquet, it’s likely to interpret the scene through a "wooden" lens. Objects might take on warmer, wood-like textures, and any metallic items could appear bronzed or rustic, adding a naturalistic coherence to the scene. The dominant colors would be warm: instead of the room’s elements standing out in stark contrast, they align with the parquet's warm, organic tone.
Why does the camera do this?
Similar to how generative models handle text prompts, the Rabbit R1’s AI system contains a latent space—a multidimensional array of data that captures relationships between textures, materials, and themes. When an object like a chess piece or parquet is detected, the latent space activates "anchors" that map related textures or styles, guiding the generative AI to subtly alter other visual elements to fit the anchor.
Moreover, the model is trained on vast datasets of thematic scenes, learning the textures and styles typically associated with specific objects. Parquet floors are associated with organic, warm tones and textures.
Thus, when the Rabbit R1’s AI recognizes one of these anchors, it applies a style transfer, harmonizing the rest of the scene with the detected element's aesthetic.
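As a rough illustration of how such an anchor could be selected, the sketch below uses CLIP text embeddings as a stand-in latent space: the detected object is matched against a handful of candidate themes by cosine similarity, and the closest theme steers the restyling. The anchor list and checkpoint are assumptions, not the camera's actual mechanism.

```python
# Sketch: how a detected object could activate the nearest thematic "anchor"
# in a shared embedding space. CLIP is used here only as a stand-in latent space.
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

anchors = ["chess theme", "warm wooden interior", "industrial concrete", "floral garden"]
detected = "a patch of wooden parquet flooring"

with torch.no_grad():
    anchor_emb = model.get_text_features(**tokenizer(anchors, padding=True, return_tensors="pt"))
    object_emb = model.get_text_features(**tokenizer([detected], padding=True, return_tensors="pt"))

# Cosine similarity: the closest anchor "wins" and guides the style transfer.
sims = torch.nn.functional.cosine_similarity(object_emb, anchor_emb)
best = sims.argmax().item()
print(f"Detected '{detected}' -> anchor '{anchors[best]}' (similarity {sims[best].item():.2f})")
```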
These strong associations within the model that align related concepts are biases too: they ensure that the scene feels unified, as though everything aligns with the detected theme. But they may also lead to standardized – or worse – images, reflecting the human biases of the dataset.
Semantic attractor in action with the AI Muse spot
We used a mix of Midjourney and Runway to create the AI Muse promo (see the video here) featuring the statue, and we saw the semantic attractor in action in an unexpected way.
Runway’s lipsync tool can animate still images, making static faces "speak." But when Runway tries to animate something unconventional—like the face of our statue—its AI model defaults to human-centric features due to its semantic attractors for speech. Since its training data lacks examples of speaking statues, Runway’s AI relies on attractors related to human speech, resulting in some fascinating transformations. The statue’s mouth becomes slightly more colorful, the lips softer and more expressive, all to align with the AI’s "understanding" of a face that is speaking.
This humanization effect arises from Runway’s reliance on a latent space populated with speaking humans. When attempting to animate a statue, the absence of "speaking statues" in the training data causes the model to fall back on human-related features—such as warm tones, naturalistic lip movement, and subtle facial expressions. Essentially, the AI adapts the statue’s face to match human norms for speaking, softening or coloring features in ways that mimic a real human’s expressions. And this needs to be corrected if you truly want a speaking statue.
Semantic attractor for creatives
Creatives can use semantic attractors to build a recognizable personal style in their outputs. By choosing certain recurring elements or tones as attractors, artists can encourage the AI to consistently apply these themes across projects. For example, an artist might use floral motifs or warm lighting as attractors, resulting in a body of work that visually feels connected and signature.
Attractors allow creatives to explore themes in a nuanced way, probe their own recurring patterns, or quickly iterate on ideas with small adjustments to prompts or object choices.
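A minimal sketch of this practice: keep a fixed set of signature attractor terms and append them to every subject, so each output is pulled toward the same look. The terms and subjects below are only examples.

```python
# Small sketch: a personal "signature" built by appending the same attractor
# terms to every prompt. Terms and subjects are illustrative.
SIGNATURE_ATTRACTORS = "delicate floral motifs, warm golden-hour lighting, soft film grain"

def signature_prompt(subject: str) -> str:
    """Combine a varying subject with fixed attractors so outputs feel related."""
    return f"{subject}, {SIGNATURE_ATTRACTORS}"

for subject in ["a portrait of a violinist", "a city rooftop at dawn", "a still life with ceramics"]:
    print(signature_prompt(subject))
```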
Brands can train custom AI models with semantic attractors aligned to their identity, reinforcing brand voice and style. For example, a fashion brand might train an AI model with specific attractors related to luxury or natural textures, producing visual assets that consistently feel on-brand without extensive manual adjustments.
Artists can also experiment with atypical attractor combinations to push the boundaries of their art. Combining attractors like “minimalist architecture” with “organic textures” might yield unexpected visuals, merging clean lines with nature-inspired details in novel ways. This process of combining unrelated attractors lets artists explore new visual territories and generate unique, innovative styles.
In storytelling, whether through video, gaming, or marketing, semantic attractors enable the AI to weave mood, emotion, and narrative depth into visual outputs. For example, in video game design, attractors linked to specific settings (like “mystical forest” or “urban dystopia”) could guide the AI in rendering scenes that feel immersive and atmospherically unified. This storytelling potential is particularly powerful in adaptive media, where attractors can shift in response to user choices, creating a dynamic visual narrative that evolves with the player’s journey.
On a more technical level, creatives with access to training data – as with open-source, offline models – or with fine-tuning can influence the attractors within their own models. By training an AI model on a carefully selected dataset that emphasizes certain visual qualities, creatives can establish strong attractors around these qualities. This leads to a model that naturally gravitates toward certain aesthetics, like soft lighting or vibrant color palettes, even when given varied prompts.
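As a simple sketch of that curation step, the snippet below filters a set of candidate images down to the ones carrying the desired quality; the actual fine-tune would then run on this subset with whatever trainer the model supports. File names and tags here are invented for illustration.

```python
# Sketch: curating a fine-tuning set so one aesthetic dominates and becomes an attractor.
# Paths, tags, and captions are illustrative; the training step itself is out of scope.
candidate_images = [
    {"path": "shoot_01.jpg", "tags": ["soft lighting", "pastel palette"]},
    {"path": "shoot_02.jpg", "tags": ["harsh flash", "neon"]},
    {"path": "shoot_03.jpg", "tags": ["soft lighting", "film grain"]},
]

TARGET_QUALITIES = {"soft lighting"}

# Keep only images that carry the qualities the model should gravitate toward.
training_set = [img for img in candidate_images if TARGET_QUALITIES & set(img["tags"])]

for img in training_set:
    print(f"include {img['path']} -> caption: 'a photo with {', '.join(img['tags'])}'")
```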
Certain AI platforms allow conditional prompts, where different attractors can be layered or adjusted based on user preferences. This flexibility enables a model to shift between multiple thematic attractors, allowing for dynamic adjustments in style without retraining the model.
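For instance, Midjourney's multi-prompt syntax lets you attach a numeric weight to each part of a prompt with "::". The small sketch below builds such a layered prompt, with each attractor's pull adjusted per user preference; the weights and attractor names are illustrative.

```python
# Sketch: layering attractors with adjustable weights, without retraining.
# Targets Midjourney-style "::" multi-prompt weights; values are illustrative.
def layered_prompt(base: str, attractors: dict[str, float]) -> str:
    """Build a weighted multi-prompt where each attractor's pull can be tuned."""
    parts = [f"{base}::1"] + [f"{name}::{weight}" for name, weight in attractors.items()]
    return " ".join(parts)

# Dial "minimalist architecture" up or "organic textures" down per user preference.
print(layered_prompt(
    "an interior courtyard",
    {"minimalist architecture": 1.5, "organic textures": 0.5},
))
```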