Generative AI Video And Film Production

Published 1st January 2024, updated April 22nd, 2025

Introducing Generative AI Technologies

During 2022 and 2023, a variety of photo and video generative artificial intelligence (AI) technologies were released to a wildly enthusiastic audience of digital content creators. Examples include Midjourney and Adobe Firefly, both designed to create images from text prompts.

As a progression from image generation, similar principles have been applied to the creation of video content and, as with still images, the results are often close to but not exactly what the user intended. Examples of video generators include Kaiber.ai, Pika.art and Runway's Gen-1 and Gen-2, with new apps, each offering its own take on the process, appearing all the time. Creating exciting, novel content by providing a text prompt or image is becoming easier, and more sophisticated use of prompts, filters and styles is enabling users to generate content closer to their original vision.

How Generative AI Works

Generative AI creates image, video and audio content using deep learning, a machine learning technique. A vast array of content is ingested into the system, where different styles, techniques and compositions are analysed and differentiated. Neural networks (computer systems running learning processes loosely inspired by the human brain) analyse patterns, shapes, objects and the overall feel or intent to establish a set of principles upon which to build a new image, video or audio clip. The greater the volume and variety of the source material, the more effective and realistic the results. In very simple terms, when a user requests a video via a text prompt, or provides a visual guide in the form of an image, the system interprets the intention and then synthesises what it considers the most plausible and pleasing result by reconstituting styles, shapes, textures and overall feel from a vast number of sources. The next section describes some of the components that enable this process.
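To make that learn-then-synthesise idea concrete, here is a deliberately tiny sketch in Python using a word-level Markov chain: it ingests a few example shot descriptions, learns which words tend to follow which, then recombines them into a new sequence. Real video generators use deep neural networks operating on pixels rather than words, but the underlying principle of analysing patterns in source material and reconstituting them is the same in miniature.

```python
import random
from collections import defaultdict

# Toy "training data": a few example shot descriptions.
corpus = (
    "a wide shot of a mountain at sunset "
    "a close shot of a wave at daybreak "
    "a wide shot of a desert at sunset"
)

# 1. Ingest: count which word tends to follow which in the source material.
transitions = defaultdict(list)
words = corpus.split()
for current, following in zip(words, words[1:]):
    transitions[current].append(following)

# 2. Synthesise: walk the learned transitions to produce a new sequence
#    that resembles, but does not copy, the training examples.
random.seed(7)
word = "a"
output = [word]
for _ in range(8):
    choices = transitions.get(word)
    if not choices:
        break
    word = random.choice(choices)
    output.append(word)

print(" ".join(output))
```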

Components

There are three key computing principles upon which video AI systems are built:

Machine Learning

Machine learning algorithms enable computers to predict decisions and outcomes from datasets. As the system collects more data, the way the data is processed adapts to produce the most effective or efficient outcome.
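As a small illustration of the principle, the sketch below uses the scikit-learn library (an assumption; any ML toolkit would do) to fit a model to a toy dataset and predict an outcome for unseen input. The behaviour comes from the data rather than hand-written rules, which is the point being made above.

```python
# A minimal sketch of learning a prediction from data.
# Assumes scikit-learn is installed (pip install scikit-learn).
from sklearn.linear_model import LinearRegression

# Toy dataset: some input feature vs. an observed outcome.
X = [[1], [2], [3], [4]]
y = [2.1, 3.9, 6.2, 8.1]

# Fit the model to the data, then predict the outcome for unseen input.
model = LinearRegression().fit(X, y)
print(model.predict([[5]]))
```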

Computer Vision

Computer vision allows computers to understand the content of visual data such as images and video, including the identification of objects, the type of video being shown and an understanding of context. Computer vision systems are trained to recognise environments, lighting conditions and movement.
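One low-level building block is easy to demonstrate: the sketch below uses the OpenCV library to detect movement by differencing two consecutive frames of a video. The file name is illustrative, and full computer-vision systems layer object and scene recognition on top of primitives like this.

```python
# Detect motion between two consecutive frames by differencing them.
# Assumes OpenCV (pip install opencv-python) and a local video file.
import cv2

cap = cv2.VideoCapture("clip.mp4")  # illustrative file name
ok1, frame1 = cap.read()
ok2, frame2 = cap.read()
if ok1 and ok2:
    grey1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
    grey2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(grey1, grey2)
    # Pixels that changed noticeably between frames indicate motion.
    _, motion_mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    print("moving pixels:", cv2.countNonZero(motion_mask))
cap.release()
```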

Language Processing

Natural language processing is crucial to the video AI production process because many interfaces rely on text prompts. In this case, detail, context and even sentiment are all interpreted by the system to produce an appropriate array of results.
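As a small illustration, the sketch below uses a pretrained model from the Hugging Face transformers library to extract sentiment, one of the signals mentioned above, from an example prompt. A real video system would combine many such signals.

```python
# Interpret the sentiment of a text prompt with a pretrained model.
# Assumes transformers is installed (pip install transformers);
# the pipeline downloads a default sentiment model on first run.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
prompt = "beautiful mountain range at sunset, cinematic, romantic"
print(classifier(prompt))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```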

Video Processing

With these three principles in mind, we can now examine the key processes involved in creating the final product:

1. Establish A Dataset

The first step is the collection and analysis of a large number of videos which, depending upon the focus of the chosen model, may cover a wide variety of themes and styles. Many of these videos are preprocessed to extract relevant elements which can then be identified and logged later in the process; a common preprocessing step is sketched below.
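For example, a minimal frame-sampling pass with OpenCV might look like this (file names and the sampling rate are illustrative):

```python
# Sample individual frames from a clip so they can be analysed
# and tagged later when building a dataset.
import os
import cv2

os.makedirs("frames", exist_ok=True)

cap = cv2.VideoCapture("source_clip.mp4")  # illustrative file name
frame_index = 0
saved = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_index % 25 == 0:  # keep roughly one frame per second at 25 fps
        cv2.imwrite(f"frames/frame_{saved:05d}.jpg", frame)
        saved += 1
    frame_index += 1
cap.release()
print(f"extracted {saved} frames")
```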
2. Select A Generative Model

Next, a generative model is required. These come in a variety of flavours, including Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs) and Recurrent Neural Networks (RNNs). Each flavour has its own algorithm and method for synthesising new video content from the existing video material within the dataset.
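To give a flavour of what such a model looks like in code, here is a minimal GAN skeleton in PyTorch: a generator that maps random noise to a tiny flattened "frame", and a discriminator that judges whether a frame looks real. Real video GANs are vastly larger and also model motion over time; this is only a structural sketch.

```python
# Minimal GAN skeleton (assumes PyTorch: pip install torch).
import torch
import torch.nn as nn

LATENT, FRAME = 16, 64 * 64  # noise size and flattened frame size (toy values)

# Generator: random noise in, synthetic frame out.
generator = nn.Sequential(
    nn.Linear(LATENT, 256), nn.ReLU(),
    nn.Linear(256, FRAME), nn.Tanh(),   # pixel values in [-1, 1]
)

# Discriminator: frame in, probability that it is real out.
discriminator = nn.Sequential(
    nn.Linear(FRAME, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

noise = torch.randn(1, LATENT)
fake_frame = generator(noise)
print(discriminator(fake_frame))  # untrained, so the score is meaningless yet
```

During training the two networks compete: the discriminator learns to tell dataset frames from generated ones, and the generator learns to fool it, which gradually pushes its output towards the look of the dataset.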
3. Analyse The Data

The synthesis process requires an 'understanding' of the material within its library in terms of subject matter, context, meaning and intent. The patterns and structures within each video are assessed using computer vision. This is referred to as training the generative model, and where necessary it can be adapted to suit specific requirements. It is also important to train the model to minimise the difference in 'intent' between its output and the visual dataset, so that it fulfils the original text or image prompt.
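One common way to score that 'intent' gap is with a joint text-image model such as CLIP; the sketch below (using the transformers library, with an illustrative file name) measures how well a candidate frame matches a prompt, and a training process can push generated output towards higher scores. This is a representative technique, not necessarily what any particular product uses.

```python
# Score how well a candidate frame matches a text prompt with CLIP.
# Assumes transformers and Pillow are installed.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("candidate_frame.png")  # illustrative file name
inputs = processor(
    text=["a big ocean wave at daybreak"], images=image,
    return_tensors="pt", padding=True,
)
outputs = model(**inputs)
print(outputs.logits_per_image)  # higher means image and text agree more
```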
4. Prompt Using Text, Image Or Video

The generative AI system needs a starting point. Text, images and even video can be used as prompts to guide the AI, via natural language processing and machine learning, towards the type of content you are looking to produce. Many applications allow the user to apply a "style": essentially a selection of filters that give the video a particular look and feel, with options such as steampunk, cartoon, cinematic or 3D animation. Due to the slightly unpredictable nature of generative AI content, most applications provide two or three preview options with variations on the theme for the user to choose from before committing.
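The exact interface varies from service to service, but conceptually a prompt, style preset, seed and preview count are bundled into a single request. The payload below is purely hypothetical and is not Runway's or any other vendor's real API; it simply shows the typical ingredients.

```python
# A hypothetical generation request; the field names are illustrative.
import json

request = {
    "prompt": "digging hole in a post-apocalyptic desert with spade",
    "style": "cinematic",   # preset look-and-feel filter
    "seed": 42,             # fixes the randomness so results can be repeated
    "num_previews": 3,      # variations to choose from before committing
}
print(json.dumps(request, indent=2))
```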
5. Generate Video Output

With the previous four steps completed, an optimised AI system can now generate a complete video clip based upon its interpretation of text, image or source video prompts. The system organises the visual components, composition and filters to produce the required output. The larger the dataset (the source video material in this case), the greater the chance of the final video matching the user's intended outcome.
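The assembly stage can be illustrated in miniature: once individual frames exist, they are encoded into a playable file. The sketch below uses OpenCV, with paths matching the earlier frame-extraction sketch (illustrative throughout).

```python
# Encode a sequence of frames into a playable 25 fps MP4 file.
import glob
import cv2

# Gather the frames in order; paths are illustrative.
frames = sorted(glob.glob("frames/frame_*.jpg"))
first = cv2.imread(frames[0])
height, width = first.shape[:2]

writer = cv2.VideoWriter(
    "output.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 25, (width, height)
)
for path in frames:
    writer.write(cv2.imread(path))
writer.release()
print(f"wrote {len(frames)} frames to output.mp4")
```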

It is important to note that generative AI video is a computationally intensive process, usually undertaken remotely on powerful machines trained upon large datasets of original and licensed video material. The ongoing costs of running these systems have led most companies to offer subscription-based plans ranging from around $5 to $100 a month, depending upon the platform and the amount of video you want to generate.

Generative AI Video Examples

Below are examples illustrating how generative video AI can produce a variety of interesting results from text and image prompts. These examples are from Runway's Gen-2 model, where text prompts can be used to create a single AI-generated image which is then animated automatically or by using a 'motion brush' to highlight areas of movement. In addition, it is possible to adjust camera angles and position. The prompts used to generate these images are indicated beneath each example.

“digging hole in a post-apocalyptic desert with spade, cinematic, sunset”

“beautiful mountain range at sunset, cinematic, romantic, dynamic”

“a big ocean wave at daybreak. cinematic, film, moody, high resolution”

(image only prompt with camera rotation and zoom out)

Uses For Content Creators

So how might this technology be useful to video content creators? This rapidly developing technology, whilst hailed as a threat to traditional film making, remains in its infancy and any serious disruption remains to be seen. As already mentioned, there are some significant shortcomings. Some of them are highlighted in this article from No Film School discussing whether AI will replace the role of the video editor, along with the ethical problems and disadvantages described in this article from Wired magazine. However, on the positive side, I believe generative AI has the potential to make a positive contribution in the following ways:

  • Storyboarding – allowing rapid exploration of ideas and concepts during the writing stages for presentation purposes
  • Art direction – visual design development including the exploration of set design, characters, clothing and environments
  • Camera movement – experiment with camera movement and position within a scene
  • Education – learning through the exploration of video production concepts and short film production

Summary

The ability to match original intention and the technical quality of results are developing at a rapid rate, but there is a long way to go before these systems could be described as reliable. For every inspired, original, dynamic video there are many more disappointing outcomes. The movement of organic life such as humans and animals is notoriously difficult and usually disappointing. Whether we reach a point where a director can realise their entire vision for a story without pandering to the constraints of the system (i.e. changing the content to suit the tool's limitations) remains to be seen. Generative AI can help video producers during key stages of production and can be used to create meaningful video sequences to an extent, but do not underestimate the level of human intervention required. Try it for yourself: Runway and Kaiber are both worth a look, each with their own advantages, styles and challenges.

Useful links

Early exploration of Midjourney
Artificial Intelligence and The Singularity
IBM’s Article on Generative Artificial Intelligence (AI)
IBM’s Article on Computer Vision

Peter Simcoe

Simcoemedia is the company created by Peter Simcoe. Peter is a freelance video producer, designer and photographer based in Chester, England. His clients include Airbus, Matterport.com, Toyota Motor Manufacturing, Loughborough University and many more companies across the UK and beyond.