Introducing Generative AI Technologies
During 2022 and 2023 a variety of photo and video Generative Artificial Intelligence (AI) technologies were released to a wildly enthusiastic audience of digital content creators. Examples include Midjourney and Adobe Firefly – both designed to create images from text prompts.
As a progression from image generation, similar principles have been applied to the creation of video content and, as with still images, the results are often close to, but not exactly, what the user intended. Examples of video generators include Kaiber.ai, Pika.art and Runway Gen1 and Gen2, with many new apps, each with its own take on the process, being added all the time. Creating exciting, novel content by providing a text prompt or image is becoming easier, and more sophisticated use of prompts, filters and styles is enabling users to generate content closer to their original vision.
How Generative AI Works
Generative AI creates image, video and audio content by using deep learning (a ‘machine learning’ technique). A vast array of content is ingested into the system, where different styles, techniques and compositions are analysed and differentiated. Neural networks (essentially computers running learning processes loosely modelled on those of the human brain) analyse patterns, shapes, objects and the overall feel or intent to establish a series of principles upon which to build a new image, video or audio clip. The larger and more varied the source material, the more effective and realistic the results. In very simple terms, when a user makes a request to generate video via a text prompt or provides a visual guide by using an image, the system analyses and interprets the intention, then synthesises what it considers to be the most likely pleasing result by reconstituting styles, shapes, textures and overall feel from a vast number of sources. The next section further describes some of the components that enable this process.
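The idea of learning patterns from source material and then recombining them into something new can be sketched in a few lines of Python. To be clear, this is not deep learning and it is not how any of the products mentioned here work: it is a word-level Markov chain, a far simpler technique swapped in purely for illustration. The example sentences, function names and parameters are all invented for this sketch.

```python
import random
from collections import defaultdict

def learn_transitions(examples):
    """Record which word follows which across all example sentences."""
    transitions = defaultdict(list)
    for sentence in examples:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            transitions[current].append(following)
    return transitions

def generate(transitions, start, max_words=8, seed=0):
    """Build a new sentence by repeatedly picking a word seen after the last one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    words = [start]
    while len(words) < max_words and words[-1] in transitions:
        words.append(rng.choice(transitions[words[-1]]))
    return " ".join(words)

# Hypothetical "source material" for the toy model.
examples = [
    "a big ocean wave at daybreak",
    "a beautiful mountain range at sunset",
    "a big mountain at daybreak",
]
transitions = learn_transitions(examples)
sentence = generate(transitions, start="a")
print(sentence)
```

Every pair of neighbouring words in the output was seen somewhere in the examples, but the sentence as a whole may be one that never appeared in them. Real generative systems work with far richer patterns than word pairs, but the principle of recombining what was learned is similar in spirit.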
Components
There are three key computing principles upon which video AI systems are built:
Machine Learning
Machine learning algorithms enable computers to predict decisions and outcomes from datasets. As the computer system collects more data, the way the data is processed adapts to produce the most effective or efficient outcome.
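As a rough illustration of a system adapting as it collects more data, here is a minimal Python sketch of a one-parameter model that corrects itself a little after every sample. The data, the learning rate and the function name are assumptions made up for this example; real video models adjust billions of parameters rather than one.

```python
def train_online(samples, learning_rate=0.05):
    """Fit y ≈ weight * x one sample at a time; return the weight after each update."""
    weight = 0.0
    history = []
    for x, y in samples:
        prediction = weight * x
        error = prediction - y
        # Nudge the weight in the direction that reduces the squared error.
        weight -= learning_rate * error * x
        history.append(weight)
    return history

# Hypothetical data that follows the hidden rule y = 3x.
samples = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0) * 20]
history = train_online(samples)
print(f"after 1 sample: weight = {history[0]:.3f}")
print(f"after {len(history)} samples: weight = {history[-1]:.3f}")
```

The weight starts well away from the hidden rule and moves towards 3 as more samples arrive, which is the "more data, better outcome" behaviour described above in miniature.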
Computer Vision
Computer vision allows computers to understand the content of visual data such as images and video, including the identification of objects, types of video and an understanding of context. Computer vision models are trained to recognise environments, lighting conditions and movement.
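To give a feel for what recognising movement can mean at the pixel level, the sketch below compares two tiny greyscale frames and flags the pixels that changed. This is classical frame differencing, a much simpler technique than the trained models used by video AI systems, and the frames, threshold and function name are hypothetical.

```python
def motion_mask(frame_a, frame_b, threshold=30):
    """Mark pixels whose brightness changed by more than `threshold` between frames."""
    return [
        [abs(b - a) > threshold for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]

# Two hypothetical 3x4 greyscale frames (0 = black, 255 = white):
# a bright object moves one pixel to the right.
frame_1 = [
    [10, 10, 10, 10],
    [10, 200, 10, 10],
    [10, 10, 10, 10],
]
frame_2 = [
    [10, 10, 10, 10],
    [10, 10, 200, 10],
    [10, 10, 10, 10],
]

mask = motion_mask(frame_1, frame_2)
moving = [(r, c) for r, row in enumerate(mask) for c, changed in enumerate(row) if changed]
print(moving)  # (row, column) positions where movement was detected
```

The two flagged positions are where the object was and where it ended up. A trained model goes much further, working out what the object is and how it is likely to move next.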
Language Processing
Natural language processing is crucial to the video AI production process because many interfaces require the use of text prompts. In this case detail, context and even sentiment are all interpreted by computer systems to produce an appropriate array of results.
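As a very rough sketch of what interpreting a prompt involves, the Python below splits one of the prompts quoted later in this article into a subject, a list of style modifiers and any mood words. Real systems use learned language models rather than hand-written rules like these, and the mood vocabulary and function name are invented for the example.

```python
import re

# Hypothetical mood vocabulary, invented for this illustration.
MOOD_WORDS = {"moody", "romantic", "dynamic"}

def parse_prompt(prompt):
    """Split a prompt into a subject, style modifiers and any mood words."""
    parts = [p.strip().lower() for p in re.split(r"[.,]", prompt) if p.strip()]
    subject, modifiers = parts[0], parts[1:]
    moods = [m for m in modifiers if m in MOOD_WORDS]
    return {"subject": subject, "modifiers": modifiers, "moods": moods}

parsed = parse_prompt("a big ocean wave at daybreak. cinematic, film, moody, high resolution")
print(parsed["subject"])
print(parsed["modifiers"])
print(parsed["moods"])
```

Even this crude version shows why prompt wording matters: the subject, the style keywords and the mood are separate signals, and each one steers the result.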
Video Processing
Considering these three principles, we can now examine the key processes involved in creating the final product:
It is important to note that Generative AI video is an intensive computational process, usually undertaken remotely on powerful computers running models trained upon large datasets of original and licensed video material. The ongoing cost of running these systems has led most companies to create subscription-based options ranging from $5 to $100 a month, depending upon the platform and the amount of video you want to generate.
Generative AI Video Examples
Below are examples illustrating how generative video AI can produce a variety of interesting results from text and image prompts. These examples are from Runway’s Gen 2 model, where text prompts can be used to create a single AI-generated image, which is then animated automatically or by using a ‘motion brush’ to highlight areas of movement. In addition, it is possible to adjust camera angles and position. The prompts used to generate these images are indicated beneath each example.
“digging hole in a post-apocalyptic desert with spade, cinematic, sunset”
“beautiful mountain range at sunset, cinematic, romantic, dynamic”
“a big ocean wave at daybreak. cinematic, film, moody, high resolution”
(image only prompt with camera rotation and zoom out)
Uses For Content Creators
So how might this technology be useful to video content creators? Whilst hailed as a threat to traditional film making, this rapidly developing technology is still in its infancy and any serious disruption remains to be seen. As already mentioned, there are some reasonably significant shortcomings. Some of them are highlighted in this article from No Film School, which discusses whether AI will replace the role of the video editor, and the ethical problems and disadvantages are described in this article from Wired magazine. On the positive side, however, I believe Generative AI has the potential to contribute in the following ways:
- Storyboarding – allowing rapid exploration of ideas and concepts during the writing stages for presentation purposes
- Art direction – visual design development including the exploration of set design, characters, clothing and environments
- Camera movement – experimenting with camera movement and position within a scene
- Education – learning through the exploration of video production concepts and short film production
Summary
The ability to match the original intention, along with the technical quality of results, is developing at a rapid rate, but there is a long way to go before these systems could be described as reliable. For every inspired, original, dynamic video there are many more disappointing outcomes. The movement of organic life such as humans and animals is notoriously difficult to generate and the results are usually disappointing. Whether we reach a point where a director can produce their entire vision for a story without pandering to the constraints of the system (i.e. changing the content to suit content creation limitations) remains to be seen. Generative AI can help video producers during key stages of production and can be used to create meaningful video sequences to an extent, but do not underestimate the level of human intervention required. Try it for yourself – Runway and Kaiber are both worth a look, each with its own advantages, styles and challenges.
Useful links
Early exploration of Midjourney
Artificial Intelligence and The Singularity
IBM’s Article on Generative Artificial Intelligence (AI)
IBM’s Article on Computer Vision