Distinguishing "art" from "not art" in natural images is challenging, as artistic expression often emerges in unexpected places, from sculptural designs to logos and branding on everyday objects.
Our objective is to separate visual art from natural imagery, so that everyday scenes and objects remain represented while intentional artistic elements are minimized. This figure illustrates how we define the boundary between art and non-art images. We exclude graphic arts but retain other forms, such as architecture. The spectrum spans four levels: "definitely art" (e.g., photographs of tapestries, baroque architecture, or paintings), "maybe art" (e.g., accidental art that isn't the main subject), "maybe not art" (e.g., artistic elements in everyday objects like doors, signboards, or decorative cakes), and "definitely not art" (e.g., nature or landscapes).
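As a minimal sketch, the four-level spectrum can be treated as an ordered label that a filtering step compares against a cutoff. The names `ArtLevel`, `LabeledImage`, and `keep_art_free`, the file names, and the default cutoff are all hypothetical; the text does not state where the threshold is drawn or how labels are assigned, so this illustrates the idea and is not the actual curation pipeline.

```python
from enum import IntEnum
from typing import Iterable, List, NamedTuple


class ArtLevel(IntEnum):
    """Four-point spectrum from the text, ordered from least to most art-like."""
    DEFINITELY_NOT_ART = 0  # e.g., nature or landscapes
    MAYBE_NOT_ART = 1       # e.g., doors, signboards, decorative cakes
    MAYBE_ART = 2           # e.g., accidental art that is not the main subject
    DEFINITELY_ART = 3      # e.g., tapestries, baroque architecture, paintings


class LabeledImage(NamedTuple):
    path: str        # hypothetical file name
    level: ArtLevel  # assumed to come from an annotator or a classifier


def keep_art_free(
    images: Iterable[LabeledImage],
    max_level: ArtLevel = ArtLevel.MAYBE_NOT_ART,  # assumed cutoff, not from the text
) -> List[LabeledImage]:
    """Keep images whose label is at or below the chosen cutoff."""
    return [img for img in images if img.level <= max_level]


# Illustrative samples only; these are not items from the real dataset.
samples = [
    LabeledImage("landscape.jpg", ArtLevel.DEFINITELY_NOT_ART),
    LabeledImage("signboard.jpg", ArtLevel.MAYBE_NOT_ART),
    LabeledImage("mural_in_background.jpg", ArtLevel.MAYBE_ART),
    LabeledImage("tapestry.jpg", ArtLevel.DEFINITELY_ART),
]

kept = keep_art_free(samples)
print([img.path for img in kept])
```

Using an ordered enum makes the cutoff a single tunable parameter: a stricter dataset would pass `ArtLevel.DEFINITELY_NOT_ART`, while a more permissive one would raise the level.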
We train Art-Free Diffusion on a dataset containing minimal graphic content. The model is built on a latent diffusion architecture. To prevent any art-related knowledge from leaking through the text embeddings, we use a language-only text encoder based on BERT.