Duty Now For The Future [AI edition]
First the eyes gave it away; then the AIs figured out how to make them the same size and facing the same direction.
Fingers and limbs proved to be the next tells, but now limbs usually look anatomically correct and fingers -- while still problematic -- are getting better.
Rooms and vehicles and anything that needs to interact with human beings typically show some detail that's wrong, revealing the image as AI-generated.
But again, the mistakes are fewer and fewer, smaller and smaller, and more and more pushed to the periphery of the image, thus avoiding glaring errors.
Letters and numbers -- especially when called on to spell out a word -- provide an easy tell, typically rendered as arcane symbols or complete gibberish. But now AI can spell out short words correctly on images, and it's only a matter of time before that ability merges with generative text AI to produce seamless, readable signs and paragraphs.
All this in just a few years. We can practically see AI evolving right before our eyes.
Numerous problems still must be dealt with, but based on the progress already displayed, we are in the ballpark. All of this is a preamble to a look at where AI is heading and what we'll find when we get there. I haven't even touched on AI generated music or text yet, but I will include them going forward.
. . .
The single biggest challenge facing image-generating AI is that it still doesn't grasp the concept of staying on model.
For those not familiar with this animation term, it refers to the old hand-drawn model sheets showing cartoon characters in a variety of poses and expressions. Animators relied on model sheets to keep their characters consistent from cartoon to cartoon, scene to scene, even frame to frame of the animation. Violate that reference -- go “off model,” as it were -- and the effect could look quite jarring.*
AI still struggles to show the same thing the same way twice. Currently it can come close, but as the saying goes, “Close don't count except in horseshoes, hand grenades, and hydrogen warfare.”
There are some workarounds to this problem, some clever (e.g., isolate the approved character, then copy and paste them into other scenes), some requiring brute force (e.g., generate thousands of images from the same prompt, then select the ones that look closest to one another).
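For the brute-force route, the selection step is just a similarity ranking. Here's a minimal sketch in Python, assuming each image has already been turned into a feature vector by some embedding model such as CLIP (the generation and embedding steps themselves are not shown, and `most_consistent` is a name invented for illustration):

```python
import numpy as np

def most_consistent(images: list, vecs: np.ndarray, keep: int = 10) -> list:
    """Return the `keep` images whose embeddings agree most with the batch.

    `images` is the list of generated images; `vecs` is an (N, D) array of
    their precomputed feature vectors, one row per image.
    """
    # Normalize the vectors so dot products become cosine similarities.
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    sims = vecs @ vecs.T            # pairwise cosine-similarity matrix
    scores = sims.sum(axis=1)       # how much each image agrees with the rest
    best = np.argsort(scores)[::-1][:keep]
    return [images[i] for i in best]
```

Rank every image by how much it agrees with the rest of the batch, keep the top few, and you have a crude, statistical approximation of a model sheet.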
When done carefully enough, AI can produce short narrative videos -- narrative in the sense that they use narration to appear thematically linked.
Usually, however, they're just an endless flow of images that we, the human audience, link together in our minds. This gives the final product, at least from a human POV, a surreal, dreamlike quality.
In and of themselves, these can be interesting, but they convey no meaning or intent; rather, it's the meaning we, the audience, ascribe to them.
Years ago, when I had my first job in show biz (lot attendant at a drive-in theater), a farmer with property adjoining ours raised peacocks as a hobby. The first few times I heard them were unnerving: They sounded like a woman screaming “help me.”
But once I learned the sounds came from peacocks, I stopped hearing cries for help and only heard birds calling out in a way that sounded similar to a woman in distress.
Currently AI does that with video. This will change with blinding speed once AI learns to stay on model. The dreamlike / nightmarish / hallucinogenic visions we see now will be replaced with video that shows the same characters shot to shot, making it possible to actually tell stories.
How to achieve this?
Well, we already use standard digital modeling for animated films and video games. Contemporary video games show characters not only looking consistent but moving in a realistic manner. Tell the AI to draw only those digital models, and it can generate uniformity. Already in video game design a market exists for plug-in models of humans, animals, mythical beasts, robots, vehicles, spacecraft, buildings, and assorted props. There are further programs to provide skins and textures to these, plus programs to create a wide variety of visual effects and renderings.
Add to this the literally thousands of preexisting model sheets and there's no reason AI can't be tweaked to render the same character or setting again and again.
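That tweak already has a rough do-it-yourself analog: render one fixed 3D asset in different poses, then let an image model restyle each render while a low "strength" setting keeps it from drifting off model. Here's a hedged sketch using the open-source diffusers library -- the model name, file paths, prompt, and settings are illustrative assumptions, not a production recipe, and the renders are assumed to already exist:

```python
# Sketch: keep a character on model by anchoring generation to renders of
# one fixed 3D asset, then restyling each render with image-to-image diffusion.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example model id
    torch_dtype=torch.float16,
).to("cuda")

prompt = "the same red-jacketed explorer, hand-painted animation style"

for n in range(3):
    # Hypothetical files: the same character rendered in different poses.
    render = Image.open(f"renders/explorer_pose_{n}.png").convert("RGB")
    # Low strength = stay close to the render, i.e., stay on model.
    frame = pipe(prompt=prompt, image=render,
                 strength=0.4, guidance_scale=7.5).images[0]
    frame.save(f"frames/explorer_pose_{n}.png")
```

The render supplies the consistency; the diffusion pass supplies the style. Lower the strength and the character stays on model; raise it and you're back in dreamland.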
As mentioned, current AI images and video show a dreamlike quality. Much as our sleeping minds attempt to weave a myriad of self-generated stimuli into some coherent narrative form, resulting in dreams, current AI produces some rather haunting images when it hits on something that carries symbolic significance in many minds.
This is why the most effective AI videos touch on the strange and uncanny in some form. Morphing faces and blurring limbs appear far more acceptable in video fantastique than attempts to recreate reality. Like a Rorschach blot, the meaning is supplied by the viewer, not the creator.
This, of course, leads down the philosophical rabbit hole of quantum mechanics and whether objects really exist independent of an observer, but that's an even deeper dive for a different day.
© Buzz Dixon
* (There are times animators deliberately go off model for a given effect, of course, but most of the time they strive for visual continuity.)