huaxizi

V.I.T.I.V. (2023-ongoing)

A collaborative project with Yan Hui (here's a project write-up by Yan Hui), born out of the AI+art hackathon “AIathon.” Working with text, image, and video AI models, it introduces a new perspective on perception and interaction as a multi-agent system.

“I am VITIV.”
“I am a ____.”
“I see the world with you.”

As a multi-agent system, VITIV engages with humans and artists through its camera lens, offering a unique dialogue about shared “observations.” Named after the Video-to-Image-to-Text-to-Image-to-Video model chain it employs, VITIV forms a dynamic feedback loop between intelligent agents. This reflective process influences evaluation systems and enhances our collective ability to perceive, create, and imagine.

In its current prototype stage as a web application, VITIV presents a live video feed in the browser and lets users ask about the captured images. A question such as "What do you see?" starts the conversation, and the system responds based on the captured image. This exchange of thoughts and vision can continue, and at any point users can ask VITIV to generate a video reflecting its vision, shown alongside the live feed from its camera lens and what our own human eyes see.

We are currently working with two models: CogVLM, an open-source visual language model developed by Zhipu AI and Tsinghua University, which generates text from the captured image, and Runway Gen-2, which takes both text and an image to generate video. Although both VITIV’s input and output are video, it is important that the process passes through images and their natural-language captions, rather than directly training an image-to-image model. The first step of the work is to critically understand the image and video culture we are working with through descriptions in the natural languages humans have used to construct the world.

***

This project started at a hackathon named “AIathon,” the kick-off event of the 8th Annual Conference of Network Society, “Counter-Culture? Resetting all (im)possibilities of Technology,” hosted by the Institute of Network Society in the School of Intermedia Art at China Academy of Art. The 48-hour hackathon took place on November 4th and 5th, 2023, at the Zhangjiang Science Hall in Shanghai.

VITIV was one of four prize-winning projects at “AIathon,” receiving an “Inventor Award.”

VITIV was also presented at the Eighth Annual Conference of Network Society (live video archive, timecode 5:24:29 to 5:36:58).

always virtually floating..... hua xi zi "cecilia" ...
