
Sora, the AI which is triggering fear and awe


19.03


Article author:

François Genette

News addict, geek culture fan, digital tech aficionado and hardcore gamer, François Genette is passionate about everything digital. A journalist for nearly 15 years with major national and local media outlets, he now uses his pen to share his discoveries from the worlds he loves.

On February 15, OpenAI, the firm known worldwide for its ChatGPT artificial intelligence, unveiled its latest offspring, Sora. Or rather, it would be more accurate to say that it ‘teased’ it, because this new video generator, with its mind-boggling visual rendering, is still only in the testing phase. That has not prevented it from once again leaving the digital tech sector quaking in its boots.

‘Several giant woolly mammoths approach treading through a snowy meadow, their long woolly fur lightly blows in the wind as they walk, snow-covered trees and dramatic snow-capped mountains in the distance…’ This is an excerpt from the prompt (the text instruction given to the artificial intelligence) that OpenAI used in one of its first demonstrations of Sora’s power. And here is the result…

Prompt: “A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.” pic.twitter.com/0JzpwPUGPB
— OpenAI (@OpenAI) February 15, 2024

This example is far from the only one. On that same day, February 15, 2024, dozens of videos generated by OpenAI’s artificial intelligence were shared on social networks. They all have two things in common: strikingly realistic imagery, and ‘camerawork’ of a technical quality that bears comparison with that of the most seasoned directors and camera operators.

An unexpected technological leap

But the most surprising thing about Sora is its capabilities, which are a good deal more powerful than expected. After the ChatGPT tsunami, the whole sector was anticipating a certain stagnation for a couple of years before the next major advance in AI. But events caught everybody off guard: nobody, not even the experts in the field, thought that an artificial intelligence would be capable of producing videos of such quality in 2024.

So, how can such stupendous progress be explained? For many, it is the cutthroat competition between OpenAI and its rivals that spurred the Californian firm to take things to the next level so quickly. Google, Meta and Elon Musk are just some of the key players to have thrown themselves into the AI race. This battle is driving rapid growth in experimentation across the field and leading to the emergence of revolutionary new tools, Sora among them.

This competition is certainly also the reason why OpenAI decided to ‘tease’ Sora so early, even though the tool is not ready. This is clearly shown by the fact that Sora is only accessible to a very small number of users. Then there are the statements by OpenAI’s boss, Sam Altman, who himself admits that the platform’s current model still contains flaws that must be resolved before its launch.

How does Sora function?

To understand how Sora works, let us first recall what it can do. OpenAI’s new AI can produce high-quality videos up to one minute long from a text prompt. But that’s not all: it can also turn an existing still image into an animated sequence, paying close attention to every last detail. It can also extend an existing video, or fill gaps in it by adding missing frames that it generates itself.

To achieve these results, Sora relies on a transformer architecture similar to that of the latest GPT models. This architecture makes use of hundreds of millions of connections driven by powerful algorithms. These algorithms break the text they receive down into small units (known as tokens) that are far easier to analyse, and these are then combined with descriptive and visual information.
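To make the idea of ‘units’ more concrete, here is a deliberately simplified Python sketch. It assumes a toy word-level scheme in which each distinct word receives an integer ID and unknown words map to a reserved ID 0. Real GPT-style models use subword tokenizers (such as byte-pair encoding) with much larger vocabularies, and nothing here reflects Sora’s actual implementation; `build_vocab` and `tokenize` are hypothetical names chosen for illustration.

```python
import re

# ASSUMPTION: a toy word-level tokenizer, not the subword scheme real models use.
WORD = re.compile(r"[a-z0-9']+")

def build_vocab(corpus: list[str]) -> dict[str, int]:
    """Assign an integer ID to every distinct lowercase word in the corpus."""
    vocab: dict[str, int] = {"<unk>": 0}  # ID 0 is reserved for unknown words
    for text in corpus:
        for word in WORD.findall(text.lower()):
            if word not in vocab:
                vocab[word] = len(vocab)  # next free ID
    return vocab

def tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Split a prompt into word units and map each one to its ID."""
    return [vocab.get(word, 0) for word in WORD.findall(text.lower())]

corpus = ["Several giant woolly mammoths approach treading through a snowy meadow"]
vocab = build_vocab(corpus)
print(tokenize("Woolly mammoths in a snowy desert", vocab))  # → [3, 4, 0, 8, 9, 0]
```

Each prompt thus becomes a sequence of numbers, which is the form in which a model can actually process it; words the toy vocabulary has never seen ("in", "desert") all collapse onto the same unknown ID.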

Sora then uses this combined information to generate the videos, drawing on what it has learned from the immense volume of data it was trained on. The process goes through several stages before the final result, each one progressively improving the quality of the samples until a sharp, high-resolution image is obtained.

This approach enables Sora not only to produce incredibly realistic videos, but also to train and improve itself with each generation.
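OpenAI’s own technical material describes Sora as a diffusion model: generation starts from noise, and each of the ‘several stages’ mentioned above removes some of it. The toy Python sketch below illustrates only that idea of progressive refinement, under strong simplifying assumptions: the ‘image’ is a list of five numbers, and `predict_clean` is a hypothetical stand-in that already knows the answer, whereas in Sora that prediction would come from the trained network. The names `refine`, `predict_clean` and `max_error` are illustrative, not part of any real API.

```python
import random

def predict_clean(noisy: list[float], target: list[float]) -> list[float]:
    # HYPOTHETICAL stand-in for the trained model: it "predicts" the clean
    # sample perfectly by returning the target. A real model only estimates it.
    return list(target)

def max_error(sample: list[float], target: list[float]) -> float:
    """Largest per-pixel distance between a sample and the target."""
    return max(abs(s - t) for s, t in zip(sample, target))

def refine(target: list[float], steps: int = 10, keep: float = 0.5,
           seed: int = 0) -> list[list[float]]:
    """Toy multi-stage refinement starting from pure random noise."""
    rng = random.Random(seed)
    sample = [rng.gauss(0.0, 1.0) for _ in target]  # stage 0: pure noise
    history = [sample]
    for _ in range(steps):
        predicted = predict_clean(sample, target)
        # Keep a fraction `keep` of the current sample and move the rest
        # of the way toward the predicted clean sample.
        sample = [keep * s + (1 - keep) * p for s, p in zip(sample, predicted)]
        history.append(sample)
    return history

target = [0.0, 0.25, 0.5, 0.75, 1.0]  # a tiny "image": five pixel intensities
history = refine(target)
for stage, sample in enumerate(history):
    print(f"stage {stage:2d}: max error {max_error(sample, target):.4f}")
```

With `keep=0.5` the remaining error halves at every stage, so the printed errors shrink steadily from noisy to sharp. Real diffusion models use carefully designed noise schedules rather than this fixed blending factor.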

The current limitations of the Sora model

As impressive as it is, Sora has its limits, and they are much the same as those found in most image generators. One expression sums them up marvellously: ‘The devil is in the details’. While the images are generally very realistic, the artificial intelligence often struggles to stay faithful to reality. A number of errors, and even inconsistencies, can be spotted in some shots, such as a cat with one paw too many, a face in which the nose isn’t quite where it should be, or buildings whose architecture is frankly implausible.

Another problem is Sora’s difficulty with the physics of our world. The artificial intelligence does not yet appear able to handle realistically the complex interactions between a person’s actions and their surroundings. For example, one video shows a man blowing on candles without putting them out.

Finally, there is the recurring problem of consistency from one generation to the next. As with Midjourney, DALL-E and the other image generators, Sora is incapable of precisely reproducing the characters or graphical elements it generates from one request to another. Because it draws on an astronomical quantity of data for each of its productions, it produces ‘works’ that stem from ‘unique’ combinations which it cannot reproduce. At least, up until now…

Sora, a menace or a blessing?

In any event, the advent of Sora raises a series of legitimate concerns about its impact on society. Alongside other similar technologies, this new artificial intelligence represents a major challenge in terms of security and ethics. The obvious concern is the spread of disinformation, already worrying today, which could intensify considerably with the arrival of such a powerful and easily accessible tool.

Sora could also make a large number of professions in the audiovisual sector obsolete, leaving skilled workers vulnerable to unemployment and threatening the livelihoods of many artists. Large-scale accessibility and extensive automation could do away with the need for talent and expertise, little by little replacing art with content generated from mere lines of text.

On the other hand, some see in Sora an opportunity to revolutionise content creation, promising unprecedented gains in time and creativity. For these optimists, Sora will be limited to the role of a superpowered assistant, making it easier to produce illustrative sequences and relevant contextual images, and thereby doing away with generic stock images, a widely used practice until now.

It remains to be seen what the actual launch of the tool, which has yet to be given a definitive date, has in store for us. Knowing OpenAI, it is not out of the question that, between now and then, other major surprises will be added to Sora, triggering awe and fear in equal measure about this new tool’s potential.
