Sean Thomas

AI just exploded. Again

This technology will shake our world


When they come to write the history of the AI revolution, there’s a good chance that the writers will devote many chapters to the early 2020s. Indeed, such is the pace, scale and wildness of the development that entire books may be devoted to, say, what happened in the last week or so.

If you’ve not been paying attention, let me talk you through it. On 15 February Sundar Pichai, CEO of Google, and Demis Hassabis, the head tech bro at DeepMind (the London-based AI company bought by Google in 2014), announced the launch of Gemini 1.5 Pro. It may sound like a slightly superior razor blade, but in truth it is a very serious machine.

To demo Gemini 1.5’s capabilities, Google fed it the entire 402-page transcript of the Apollo 11 mission. They then asked Gemini to find ‘three comedic moments’ within that text. It did so, in 30 seconds – for example, it found astronaut Michael Collins ‘betting someone a cup of coffee’. To repeat: Google didn’t search for specific words; the researchers asked a machine to find ‘three funny bits’ in 402 pages. It did so. They also showed Gemini a simple, rubbish drawing, apparently by a four-year-old, of what could be the tread of a boot, and asked it to identify the moment in the transcript that the drawing depicted. Gemini correctly identified it as picturing the moment Neil Armstrong stepped on to the lunar surface.

In other words, Gemini 1.5 has hugely impressive conceptual abilities, and it can consume, interpret and parse textual and visual information far faster than any human. Not a good moment for researchers, historians, biographers or analysts. Such is the achievement of Gemini 1.5 that some experts believe it has passed an ‘advanced Turing Test’ of the kind devised by AI sceptic and NYU academic Gary Marcus back in 2014, thus becoming true AI.

If Marcus is looking embarrassed by the pace of AI development, he’s not alone. So is Yann LeCun – and he is a ‘godfather of machine learning’ and the chief AI scientist at Meta (alias Facebook), so he should know what he’s talking about. Apparently, he doesn’t. On 13 February, at the World Governments Summit in Dubai, LeCun publicly mused on the possibility of AI text-to-video – that is, the idea that one day AI might be able to create convincing videos from mere verbal prompts, in the same way that ChatGPT can create poems, novellas, stories and code from brief suggestions.

LeCun said: ‘basically we don’t know how to do this, properly. It doesn’t work for video. What works for text doesn’t work for video.’ He then went on to explain that this might become possible in the future using a new architecture which he himself, perhaps relevantly, is creating.

Oops. Two days later, on 15 February, and about 90 minutes (I’m not joking) after Google announced Gemini 1.5 Pro, the $80 billion AI company OpenAI – which increasingly resembles Roald Dahl’s chocolate factory, a mysterious building concealing many wonders – revealed Sora. And Sora is a machine which can do precisely what Yann LeCun had just said was impossible. It can create convincing 60-second videos from verbal prompts. If you ask it to create a scene of a woman walking through Tokyo, it does that. It can create finely detailed videos of dogs playing in snow, cars driving past dinosaurs, an ant marching down an ant tunnel, and people staring out of train windows (capturing the complex reflections with unnerving skill).

If Gemini 1.5 was a scary moment for anyone who analyses and summarises words and pictures for a living, then Sora is an existentially terrifying moment for anyone who works in video, TV, movies or advertising – and that means everyone, from the actors to the directors to the make-up artists.

In fact, we can already see this in action. Since the advent of Sora one Hollywood mogul, Tyler Perry, has indefinitely halted an $800 million expansion of his studio, citing his ‘shock’ at Sora’s abilities. At the same time, James Hawes, a director of the hit TV drama Slow Horses, is predicting entirely AI-made TV shows within three to five years.

Moreover, Sora 1.0, as has been noted, is ‘as bad as this technology will ever be’. It will only improve from here, and probably at a stupefying pace. Barely a year ago, the best text-to-video imagery was laughably poor: scenes of Will Smith eating spaghetti, where his face fell apart on contact with the pasta. Now we have a machine which can render, in exquisite and plausible detail, a lizard morphing into a bird, found footage from the California gold rush, a kangaroo disco dancing, or imagined scenes of Lagos, Nigeria, in 2050.

Unsurprisingly, the advent of Sora has provoked howls of anguish – and anger – from people in the creative sector. And you cannot blame them. Yes, it might be fun to chortle at overpaid Hollywood stars and smarmy Soho copywriters finally getting their comeuppance, but it is worth sparing a thought for the people in this field.

Imagine you are 26 and really good at, say, animation, a creative task you love, which you have spent years refining. With one announcement, OpenAI have just taken away your raison d’être, the unique creative purpose in your life, at the same time as removing your means of employment, and of feeding your family. That is going to hurt, intensely; we may see suicides. Nor am I exaggerating the threat. Jeffrey Katzenberg, the former Walt Disney Studios chief and co-founder of DreamWorks, reckons ‘90 per cent of animation jobs’ will be gone in three years. This is happening now, not in some dystopian future.

The impact of Gemini and, in particular, Sora goes deeper still. Such is its power that it has experts wondering whether this is not just a glimpse of actual artificial general intelligence, but something closer to artificial superintelligence: a computer so much smarter than us that we might never truly understand it.

Jim Fan, a senior AI research scientist at the mega-chip-corp Nvidia, has speculated on Twitter/X that Sora seems to have an ‘intuitive grasp’ of physics: it appears to understand how objects move through space-time and interact with the world. Sora wasn’t taught this physics; it has somehow acquired it, so this is possibly an ‘emergent skill’ – in the same way, perhaps, that complex language emerged from primate grunting and howling. No one sat humans down and taught us how to talk; language simply emerged, followed by writing.

If all this sounds bewildering, console yourself that the feeling is widely shared. Some philosophers are wondering whether Sora might be proof that we exist in an AI simulation – an extreme cosmological theory which has been gaining ground in recent years. Others have suggested that most people cannot grasp what is happening in AI in the same way that most people cannot grasp the concept of exponential growth: just as we find it hard to comprehend that if you put one grain of rice on the first square of a chessboard, then two on the next, then four, and keep doubling, you soon end up with more rice than exists in the world, so we fail to comprehend how speedily AI will scale up in power, especially if it becomes recursive – if it learns how to improve itself.
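For the sceptical, the chessboard arithmetic is easy to check. Here is a minimal sketch in Python – my own illustration, not anything from Google or OpenAI, and assuming only the standard 64-square framing of the parable – which doubles the grains square by square and prints the running total:

```python
# Chessboard doubling: one grain on the first square, two on the second,
# four on the third, and so on across all 64 squares.
total = 0
grains = 1
for square in range(1, 65):
    total += grains
    if square in (8, 16, 32, 64):
        print(f"after square {square}: {total:,} grains in total")
    grains *= 2

# The grand total is 2**64 - 1, i.e. 18,446,744,073,709,551,615 grains,
# roughly 1.8 x 10**19. The curve, not the starting quantity, is what
# runs away from intuition.
```

By the eighth square the total is a modest 255 grains; by the last it is about 18 quintillion, which is the sense in which a few dozen doublings outrun everyday intuition, and why a recursively self-improving system would be so hard to reason about.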

That, then, was one week in February 2024 in the field of AI. It is undeniably exciting – and, for many, deeply disturbing. And we are only two months into the year.
