Art-Free Generative Models: Art Creation Without Graphic Art Knowledge

Hui Ren*,1  Joanna Materzynska*,2  Rohit Gandikota3  David Bau3  Antonio Torralba2
1ShanghaiTech University 2MIT 3Northeastern University

*Co-first authors

Teaser Image

(a) We introduce Art-Free SAM, a carefully curated text-to-image dataset with minimal graphic art content, used to pretrain the Art-Free Diffusion model (θ).
(b) We show the styles of three famous artists reproduced and generalized by Art-Free Diffusion after exposing an Art Adapter to a small sample (A) of each artist's work.

We explore the question: "How much prior art knowledge is needed to create art?" To find out, we design a text-to-image generation model that skips training on art-related content entirely. We then develop a straightforward method to create an Art Adapter, which learns artistic styles from just a handful of examples. Our experiments show that art generated this way is rated by users as on par with pieces from models trained on massive, art-heavy datasets. Finally, through data attribution techniques, we illustrate how examples from both artistic and non-artistic datasets contributed to the creation of new artistic styles.


Text-to-image generation models have revolutionized generative AI, producing high-quality, user-defined images that have even won art competitions. However, their ability to replicate specific artistic styles and potential memorization of training data has raised ethical and legal concerns, leading to lawsuits from artists. While opt-out strategies and watermarking have been proposed to address these issues, they often face scalability and effectiveness challenges. Our approach takes a different path, showing that it is potentially possible to develop tools valuable to artists without training on art datasets at all. By focusing on post-training adaptations, we offer a solution that respects intellectual property while fostering creativity and innovation.

Art-Free Diffusion

Distinguishing "art" from "not art" in natural images is challenging, as artistic expression often emerges in unexpected places, from sculptural designs to logos and branding on everyday objects. Our objective is to separate visual art from natural imagery, ensuring everyday scenes and objects are represented while minimizing intentional artistic elements. We exclude graphic arts but retain other forms, such as architecture, and place images on a spectrum ranging from "definitely art" (e.g., photographs of tapestries, baroque architecture, or paintings) to "maybe art" (e.g., accidental art that is not the main subject), "maybe not art" (e.g., artistic elements in everyday objects such as doors, signboards, or decorative cakes), and "definitely not art" (e.g., nature or landscapes).

We train Art-Free Diffusion on a dataset containing minimal graphic art content. The model is built on a latent diffusion architecture; to prevent art-related knowledge from leaking in through the text embeddings, we use a language-only text encoder based on BERT in place of the usual vision-language text encoder.
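To make the conditioning path concrete, here is a minimal sketch of how a language-only text encoder can stand in for the usual vision-language encoder; the checkpoint (bert-base-uncased) and the CLIP-style sequence length of 77 are illustrative assumptions, not necessarily the configuration used in the paper.

import torch
from transformers import AutoModel, AutoTokenizer

# Encode a prompt with a language-only BERT encoder; the resulting hidden
# states play the role of the cross-attention conditioning that a latent
# diffusion U-Net would otherwise receive from a CLIP text encoder.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text_encoder = AutoModel.from_pretrained("bert-base-uncased")

prompt = "people walking along a riverside path with colorful trees"
tokens = tokenizer(prompt, padding="max_length", max_length=77,
                   truncation=True, return_tensors="pt")
with torch.no_grad():
    cond = text_encoder(**tokens).last_hidden_state  # shape (1, 77, 768)
print(cond.shape)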


Method

To train an Art-Style Adapter, we collect a small set A of example artworks x0 ∈ A in a specific style and caption the content of each artwork; this can be done automatically or manually. To connect the newly learned style information with specific tokens in the prompt, we append the text "in the style of V* art" to the content prompt, yielding the style prompt C*. To enable the model to learn this new artistic style, we fine-tune the U-Net module using LoRA, as sketched below. The generated image should match the style of the small exemplar dataset when prompted with a caption C* that includes the style prefix V*. For example, if C* = "People walking along a riverside path with colorful trees in the style of V* art", the image should reflect both the scene (content) and the specified artistic style. A content loss ensures that the visual elements of the plain prompt C = "People walking along a riverside path with colorful trees" are accurately depicted, while a style loss maintains the distinct artistic qualities associated with V*.
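The sketch below illustrates the LoRA mechanism behind this fine-tuning, under our own assumptions about rank and initialization rather than the paper's exact settings. It wraps a frozen linear layer (e.g., a cross-attention projection inside the U-Net) with a trainable low-rank residual; during adapter training, only these low-rank factors are optimized on the style-prefixed captions C*, while the pretrained Art-Free Diffusion weights stay frozen.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base projection plus a trainable low-rank update: y = Wx + (alpha/r) * B(A(x))."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)            # pretrained weights stay frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)   # A
        self.up = nn.Linear(rank, base.out_features, bias=False)    # B
        nn.init.normal_(self.down.weight, std=1.0 / rank)
        nn.init.zeros_(self.up.weight)         # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))

# Toy usage: wrap a stand-in cross-attention projection and confirm that only
# the adapter parameters are trainable.
proj = LoRALinear(nn.Linear(768, 320), rank=4)
x = torch.randn(2, 77, 768)
print(proj(x).shape)                                                 # torch.Size([2, 77, 320])
print(sum(p.numel() for p in proj.parameters() if p.requires_grad))  # 768*4 + 4*320 = 4352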


Demo

Evaluation

Our Art-Free Diffusion model shows limited style transfer with training-free methods, suggesting that traditional models may rely on inherent artistic biases. Unlike our model, traditional models have seen vast amounts of art, enabling them to internalize stylistic patterns for effective style transfer.


(Left) Results of the perceptual user study: Art-Free Diffusion with the Art Adapter (green bar) is preferred over image-editing baselines, on par with the Adapter on the SD1.4 backbone, and favored slightly less than StyleAligned (SD1.4); the margins of preference between baselines are narrow. (Right) Quantitative evaluation of the baselines: Art-Free Diffusion with the Art Adapter achieves a good trade-off between style and content.


Qualitative Comparison - Image Stylization

Comparison of our method and other image stylization baselines for the artist Van Gogh. All captions contain the suffix "in the style of Vincent van Gogh / V* art".


Qualitative Results

Results of art generation and image stylization, together with the training images, are shown in Figures 20–36. We demonstrate our model's ability to replicate diverse artistic styles: Impressionism (Monet, van Gogh, Corot), Art Nouveau (Klimt), Fauvism (Derain), Abstract Expressionism (Matisse, Pollock, Richter), Abstract Art (Kandinsky), Cubism (Picasso, Gleizes), Pop Art (Lichtenstein, Warhol), Ukiyo-e (Hokusai), Expressionism (Escher), and Postmodern and Geometric Abstraction (Miró, Battiss). The captions and reference images are sampled from the LAION Pop dataset. The examples shown are generated after training on just 10–15 samples from each artist, representing the model's only exposure to that artistic style.



Data Attribution

We find that our Art Adapter can generalize from a small Art-Style training set and generate seemingly novel images that are coherent with the given artistic style. To better understand which training images contributed to the synthesized image, and to check whether the art filtering may have overlooked some art content that influenced the result, we applied an off-the-shelf data attribution technique. For each generated image, we retrieved the top five attributed images from both Art-Free SAM and Art-Style examples. While we expect stylistic elements to dominate, real-world influences from the Art-Free dataset play a significant role. In the style-inspired generation, distinctive artistic features capture the essence of the style, yet the attribution method uncovers real-world elements beneath, as though the style has been gently overlaid on the content.
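The attribution technique itself is off-the-shelf and not detailed here; as a loose illustration of the retrieval step only, the sketch below assumes image features have already been extracted with some image encoder and simply returns the five training images most cosine-similar to a generated image.

import torch
import torch.nn.functional as F

def top_attributions(query_feat, train_feats, k=5):
    # Indices of the k training images whose precomputed features are most
    # cosine-similar to the generated image's feature.
    sims = F.cosine_similarity(query_feat.unsqueeze(0), train_feats, dim=-1)
    return sims.topk(k).indices

# Toy usage with random stand-in features (e.g., Art-Free SAM images on one
# side and the Art-Style exemplars on the other).
train_feats = F.normalize(torch.randn(1000, 512), dim=-1)
query_feat = F.normalize(torch.randn(512), dim=-1)
print(top_attributions(query_feat, train_feats, k=5))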


BibTeX

@misc{ren2024art-free,
      title={Art-Free Generative Models: Art Creation Without Graphic Art Knowledge},
      author={Hui Ren and Joanna Materzynska and Rohit Gandikota and David Bau and Antonio Torralba},
      year={2024},
      eprint={2412.00176},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.00176},
      }