Generative AI Video And Film Production

Published 1st January 2024, updated April 22nd, 2025

Introducing Generative AI Technologies

During 2022 and 2023, a variety of photo and video generative artificial intelligence (AI) technologies were released to a wildly enthusiastic audience of digital content creators. Examples include Midjourney and Adobe Firefly, both designed to create images from text prompts.

As a progression from image generation, similar principles have been applied to the creation of video content and, as with still images, the results are often close to but not exactly what the user intended. Examples of video generators include Kaiber.ai, Pika.art and Runway's Gen-1 and Gen-2, with new apps, each offering its own take on the process, appearing all the time. Creating exciting, novel content by providing a text prompt or image is becoming easier, and more sophisticated use of prompts, filters and styles is enabling users to generate content closer to their original vision.

How Generative AI Works

Generative AI creates image, video and audio content using deep learning, a machine learning technique. A vast array of content is ingested into the system, where different styles, techniques and compositions are analysed and differentiated. Neural networks (computer systems running learning processes loosely inspired by the human brain) analyse patterns, shapes, objects and the overall feel or intent to establish a set of principles upon which to build a new image, video or audio clip. The greater the volume and variety of the source material, the more effective and realistic the results. In very simple terms, when a user requests a video via a text prompt, or provides a visual guide in the form of an image, the system interprets the intention and then synthesises what it considers the most plausible and pleasing result by reconstituting styles, shapes, textures and overall feel from a vast number of sources. The next section describes some of the components that enable this process.
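To make that learn-then-synthesise idea concrete, here is a deliberately tiny sketch in Python using a word-level Markov chain: it ingests a few example shot descriptions, learns which words tend to follow which, then recombines them into a new sequence. Real video generators use deep neural networks operating on pixels rather than words, but the underlying principle of analysing patterns in source material and reconstituting them is the same in miniature.

```python
import random
from collections import defaultdict

# Toy "training data": a few example shot descriptions.
corpus = (
    "a wide shot of a mountain at sunset "
    "a close shot of a wave at daybreak "
    "a wide shot of a desert at sunset"
)

# 1. Ingest: count which word tends to follow which in the source material.
transitions = defaultdict(list)
words = corpus.split()
for current, following in zip(words, words[1:]):
    transitions[current].append(following)

# 2. Synthesise: walk the learned transitions to produce a new sequence
#    that resembles, but does not copy, the training examples.
random.seed(7)
word = "a"
output = [word]
for _ in range(8):
    choices = transitions.get(word)
    if not choices:
        break
    word = random.choice(choices)
    output.append(word)

print(" ".join(output))
```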

Components

There are three key computing principles upon which video AI systems are built:

Machine Learning

Machine learning algorithms enable computers to predict decisions and outcomes from datasets. As the system collects more data, the way the data is processed adapts to produce the most effective or efficient outcome.
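As a small illustration of the principle, the sketch below uses the scikit-learn library (an assumption; any ML toolkit would do) to fit a model to a toy dataset and predict an outcome for unseen input. The behaviour comes from the data rather than hand-written rules, which is the point being made above.

```python
# A minimal sketch of learning a prediction from data.
# Assumes scikit-learn is installed (pip install scikit-learn).
from sklearn.linear_model import LinearRegression

# Toy dataset: some input feature vs. an observed outcome.
X = [[1], [2], [3], [4]]
y = [2.1, 3.9, 6.2, 8.1]

# Fit the model to the data, then predict the outcome for unseen input.
model = LinearRegression().fit(X, y)
print(model.predict([[5]]))
```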

Computer Vision

Computer vision allows computers to understand the content of visual data such as images and video, including the identification of objects, the type of video being shown and an understanding of context. Computer vision systems are trained to recognise environments, lighting conditions and movement.
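One low-level building block is easy to demonstrate: the sketch below uses the OpenCV library to detect movement by differencing two consecutive frames of a video. The file name is illustrative, and full computer-vision systems layer object and scene recognition on top of primitives like this.

```python
# Detect motion between two consecutive frames by differencing them.
# Assumes OpenCV (pip install opencv-python) and a local video file.
import cv2

cap = cv2.VideoCapture("clip.mp4")  # illustrative file name
ok1, frame1 = cap.read()
ok2, frame2 = cap.read()
if ok1 and ok2:
    grey1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
    grey2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(grey1, grey2)
    # Pixels that changed noticeably between frames indicate motion.
    _, motion_mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    print("moving pixels:", cv2.countNonZero(motion_mask))
cap.release()
```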

Language Processing

Natural language processing is crucial to the video AI production process because many interfaces rely on text prompts. In this case, detail, context and even sentiment are all interpreted by the system to produce an appropriate array of results.
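As a small illustration, the sketch below uses a pretrained model from the Hugging Face transformers library to extract sentiment, one of the signals mentioned above, from an example prompt. A real video system would combine many such signals.

```python
# Interpret the sentiment of a text prompt with a pretrained model.
# Assumes transformers is installed (pip install transformers);
# the pipeline downloads a default sentiment model on first run.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
prompt = "beautiful mountain range at sunset, cinematic, romantic"
print(classifier(prompt))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```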

Video Processing

With these three principles in mind, we can now examine the key processes involved in creating the final product:

1. Establish A Dataset

The first step is the collection and analysis of a large number of videos which, depending upon the focus of the chosen model, may cover a wide variety of themes and styles. Many of these videos are preprocessed to extract relevant elements which can then be identified and logged later in the process; a common preprocessing step is sketched below.
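For example, a minimal frame-sampling pass with OpenCV might look like this (file names and the sampling rate are illustrative):

```python
# Sample individual frames from a clip so they can be analysed
# and tagged later when building a dataset.
import os
import cv2

os.makedirs("frames", exist_ok=True)

cap = cv2.VideoCapture("source_clip.mp4")  # illustrative file name
frame_index = 0
saved = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_index % 25 == 0:  # keep roughly one frame per second at 25 fps
        cv2.imwrite(f"frames/frame_{saved:05d}.jpg", frame)
        saved += 1
    frame_index += 1
cap.release()
print(f"extracted {saved} frames")
```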
2. Select A Generative Model

Next, a generative model is required. These come in a variety of flavours, including Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs) and Recurrent Neural Networks (RNNs). Each flavour has its own algorithm and method for synthesising new video content from the existing video material within the dataset.
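To give a flavour of what such a model looks like in code, here is a minimal GAN skeleton in PyTorch: a generator that maps random noise to a tiny flattened "frame", and a discriminator that judges whether a frame looks real. Real video GANs are vastly larger and also model motion over time; this is only a structural sketch.

```python
# Minimal GAN skeleton (assumes PyTorch: pip install torch).
import torch
import torch.nn as nn

LATENT, FRAME = 16, 64 * 64  # noise size and flattened frame size (toy values)

# Generator: random noise in, synthetic frame out.
generator = nn.Sequential(
    nn.Linear(LATENT, 256), nn.ReLU(),
    nn.Linear(256, FRAME), nn.Tanh(),   # pixel values in [-1, 1]
)

# Discriminator: frame in, probability that it is real out.
discriminator = nn.Sequential(
    nn.Linear(FRAME, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

noise = torch.randn(1, LATENT)
fake_frame = generator(noise)
print(discriminator(fake_frame))  # untrained, so the score is meaningless yet
```

During training the two networks compete: the discriminator learns to tell dataset frames from generated ones, and the generator learns to fool it, which gradually pushes its output towards the look of the dataset.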
3. Analyse The Data

The synthesis process requires an 'understanding' of the material within its library in terms of subject matter, context, meaning and intent. The patterns and structures within each video are assessed using computer vision. This is referred to as training the generative model, and where necessary it can be adapted to suit specific requirements. It is also important to train the model to minimise the difference in 'intent' between its output and the visual dataset, so that it fulfils the original text or image prompt.
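One common way to score that 'intent' gap is with a joint text-image model such as CLIP; the sketch below (using the transformers library, with an illustrative file name) measures how well a candidate frame matches a prompt, and a training process can push generated output towards higher scores. This is a representative technique, not necessarily what any particular product uses.

```python
# Score how well a candidate frame matches a text prompt with CLIP.
# Assumes transformers and Pillow are installed.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("candidate_frame.png")  # illustrative file name
inputs = processor(
    text=["a big ocean wave at daybreak"], images=image,
    return_tensors="pt", padding=True,
)
outputs = model(**inputs)
print(outputs.logits_per_image)  # higher means image and text agree more
```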
4. Prompt Using Text, Image Or Video

The generative AI system needs a starting point. Text, images and even video can be used as prompts to guide the AI, via natural language processing and machine learning, towards the type of content you are looking to produce. Many applications allow the user to apply a "style": essentially a selection of filters that give the video a particular look and feel, with options such as steampunk, cartoon, cinematic or 3D animation. Due to the slightly unpredictable nature of generative AI content, most applications provide two or three preview options with variations on the theme for the user to choose from before committing.
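The exact interface varies from service to service, but conceptually a prompt, style preset, seed and preview count are bundled into a single request. The payload below is purely hypothetical and is not Runway's or any other vendor's real API; it simply shows the typical ingredients.

```python
# A hypothetical generation request; the field names are illustrative.
import json

request = {
    "prompt": "digging hole in a post-apocalyptic desert with spade",
    "style": "cinematic",   # preset look-and-feel filter
    "seed": 42,             # fixes the randomness so results can be repeated
    "num_previews": 3,      # variations to choose from before committing
}
print(json.dumps(request, indent=2))
```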
5. Generate Video Output

With the previous four steps completed, an optimised AI system can now generate a complete video clip based upon its interpretation of text, image or source video prompts. The system organises the visual components, composition and filters to produce the required output. The larger the dataset (the source video material in this case), the greater the chance of the final video matching the user's intended outcome.
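The assembly stage can be illustrated in miniature: once individual frames exist, they are encoded into a playable file. The sketch below uses OpenCV, with paths matching the earlier frame-extraction sketch (illustrative throughout).

```python
# Encode a sequence of frames into a playable 25 fps MP4 file.
import glob
import cv2

# Gather the frames in order; paths are illustrative.
frames = sorted(glob.glob("frames/frame_*.jpg"))
first = cv2.imread(frames[0])
height, width = first.shape[:2]

writer = cv2.VideoWriter(
    "output.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 25, (width, height)
)
for path in frames:
    writer.write(cv2.imread(path))
writer.release()
print(f"wrote {len(frames)} frames to output.mp4")
```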

It is important to note that generative AI video is a computationally intensive process, usually undertaken remotely on powerful machines trained upon large datasets of original and licensed video material. The ongoing costs of running these systems have led most companies to offer subscription-based plans ranging from around $5 to $100 a month, depending upon the platform and the amount of video you want to generate.

Generative AI Video Examples

Below are examples illustrating how generative video AI can produce a variety of interesting results from text and image prompts. These examples are from Runway's Gen-2 model, where text prompts can be used to create a single AI-generated image which is then animated automatically or by using a 'motion brush' to highlight areas of movement. In addition, it is possible to adjust camera angles and position. The prompts used to generate these images are indicated beneath each example.

“digging hole in a post-apocalyptic desert with spade, cinematic, sunset”

“beautiful mountain range at sunset, cinematic, romantic, dynamic”

“a big ocean wave at daybreak. cinematic, film, moody, high resolution”

(image only prompt with camera rotation and zoom out)

Uses For Content Creators

So how might this technology be useful to video content creators? This rapidly developing technology, whilst hailed as a threat to traditional film making, remains in its infancy and any serious disruption remains to be seen. As already mentioned, there are some significant shortcomings. Some of them are highlighted in this article from No Film School discussing whether AI will replace the role of the video editor, along with the ethical problems and disadvantages described in this article from Wired magazine. However, on the positive side, I believe generative AI has the potential to make a positive contribution in the following ways:

  • Storyboarding – allowing rapid exploration of ideas and concepts during the writing stages for presentation purposes
  • Art direction – visual design development including the exploration of set design, characters, clothing and environments
  • Camera movement – experiment with camera movement and position within a scene
  • Education – learning through the exploration of video production concepts and short film production

Summary

The ability to match original intention and the technical quality of results are developing at a rapid rate, but there is a long way to go before these systems could be described as reliable. For every inspired, original, dynamic video there are many more disappointing outcomes. The movement of organic life such as humans and animals is notoriously difficult and usually disappointing. Whether we reach a point where a director can realise their entire vision for a story without pandering to the constraints of the system (i.e. changing the content to suit the tool's limitations) remains to be seen. Generative AI can help video producers during key stages of production and can be used to create meaningful video sequences to an extent, but do not underestimate the level of human intervention required. Try it for yourself: Runway and Kaiber are both worth a look, each with their own advantages, styles and challenges.

Useful links

Early exploration of Midjourney
Artificial Intelligence and The Singularity
IBM’s Article on Generative Artificial Intelligence (AI)
IBM’s Article on Computer Vision

Peter Simcoe

Simcoemedia is the company created by Peter Simcoe. Peter is a freelance video producer, designer and photographer based in Chester, England. His clients include Airbus, Matterport.com, Toyota Motor Manufacturing, Loughborough University and many more companies across the UK and beyond.