Stacking AI Creatively

The Speed of AI

OpenAI has just released Point-E for 3D modeling.

Over the past weeks, Artificial Intelligence (AI) has continued to evolve at a rapid pace. AI technology is being applied in a range of areas, from healthcare to finance, from manufacturing to retail, and from education to transportation.

One of the most significant developments in AI has been the use of AI-powered robots in healthcare. In Japan, work is being done on robot-assisted surgery that can detect tumor cells in a patient's body and identify them with high accuracy.

In the transportation sector, self-driving cars are becoming increasingly commonplace. Companies such as Waymo, Tesla, and Uber are actively developing and testing autonomous vehicles. These vehicles use AI to identify objects in the environment and make decisions on how to drive safely. 

In the manufacturing sector, AI is being used to automate processes and improve efficiency. Companies such as GE, Siemens, and ABB are using AI to automate production lines and reduce costs. AI is also being used to optimize supply chains, and to monitor and predict market trends.

High Concept, Low Knowledge

But these are all pretty high-concept applications of AI. Good applications, to be sure, but one thing we’ve observed over the years in IT development is that the high-concept stuff is often the only approach that gets taken, and that leaves the less technologically adept in the dust.

The evidence of these shortcomings in public understanding is everywhere:

  • Data science is something that only a select few can even begin to approach, much less do well. And for good reason - that stuff gets complicated - but it’s also a promising section of the industry, with players like Airtable coming in to make it easier.

  • Web3 is a nebulous term that includes cryptocurrency, blockchain, DeFi, and more, all terms that your average consumer cannot parse. This is one reason why the Terra, Celsius, and FTX catastrophes are so spectacular: a layperson can’t understand what happened.

  • Game development involves an immensely complicated network of production tools that can’t realistically be handled without a devoted team. That’s why Stardew Valley’s success is so impressive - a single person created all of it.

And it’s the same with AI. Self-driving cars, robot-assisted surgery, and the like are all great applications, but they don’t really help the average person get their own hands dirty with this flabbergastingly awesome technology.

We Don’t Like That

First, we realized that there’s not much talk about stacking AI resources. It’s common enough these days to hear about text completion or image generation (that’s not art, but that’s also a longer and separate discussion), but we’ve never seen any real talk about combining these things, which is, in a word, dumb.

This is a missed opportunity. The combination of content is what makes truly cohesive and engaging narratives possible; leaving each tool on its lonesome is just a waste. And so, understanding that we’ve got text generation, image generation, translation, and vocal reproduction tools at our fingertips, it wasn’t very hard for us to realize that we could make some pretty good, albeit rudimentary, video content.

AI Story Creation

But what is a movie? It’s a story. So we needed that first. Off to OpenAI’s Playground we went, and fed it the following prompt:

There once was a man who lived on top of a mountain on the outskirts of Tokyo, Japan. He had lived there for many years, and his beard was long and gray. His only friend was Mich the goat, and it was Mich that went out to hunt every day for them to eat.

From that, we received:

One day, Mich did not come back, and the man became worried. He went out to look for his friend, and he found him lying on the ground, dead. The man was so sad that he decided to kill himself. He tied a rope around his neck and jumped off the cliff. As he was falling, he saw a beautiful woman walking along the path below. He reached out his hand to grab her, and she pulled him to safety. The woman told the man that her name was Kiku and that she was a spirit who had come to save him. She said that she would be his guardian angel and would always watch over him. The man was so grateful that he decided to live the rest of his days on the mountain with Kiku.

WOW! That got dark quickly, huh? But no matter, because it’s a viable tale. 
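For anyone who wants to reproduce this step outside the Playground, the same completion can be scripted. Here’s a minimal sketch, assuming the openai Python package (pre-1.0) and an API key in the OPENAI_API_KEY environment variable; the model name and sampling parameters are our assumptions, not necessarily what the Playground used.

import os
import openai

# Assumes OPENAI_API_KEY is set in the environment.
openai.api_key = os.environ["OPENAI_API_KEY"]

prompt = (
    "There once was a man who lived on top of a mountain on the outskirts of "
    "Tokyo, Japan. He had lived there for many years, and his beard was long "
    "and gray. His only friend was Mich the goat, and it was Mich that went "
    "out to hunt every day for them to eat."
)

# text-davinci-003 was the Playground default at the time; treat it as an assumption.
response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=300,
    temperature=0.7,
)

print(response["choices"][0]["text"].strip())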

AI Translation

Next, we jumped over to DeepL to get a Japanese translation of the above text. It’s a pretty good product, but not without its failures.

昔、東京の郊外の山の上に住んでいた男がいた。その人は長年そこに住んでいて、ひげも長く、白髪も生えていた。彼の唯一の友だちはヤギのミチで、毎日ミチがヤギを狩りに行って食べていた。ある日、ミッちゃんが帰ってこないので、男は心配になり、ミッちゃんを探しに出かけた。すると、ミッちゃんが地面に倒れて死んでいるのを発見した。男は悲しくなって自殺することにした。首にロープをかけ、崖から飛び降りた。その時、下の道を美しい女性が歩いているのが見えた。彼は手を伸ばして彼女をつかまえ、彼女は彼を安全な場所まで引っ張った。その女性は、自分の名前が「キク」であり、自分を救いに来た精霊であると男に告げた。そして、キクと名乗り、キクの守護神としていつも見守ってくれていると告げた。男は感謝し、残りの日々をキクと一緒に山で暮らそうと決心した。

There are two interesting things to notice here:

  1. It sorta screwed up Mich’s character and suggested that Mich (a goat) goes out every day to hunt goats. Not what we intended, so we’d mark this as a miss.

  2. It ended up referring to Mich as Micchan, which is a typical Japanese diminutive used as an affectionate nickname sorta thing. How the fuck did DeepL know to do that? Count that as an unexpected win.
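If you’d rather not paste text into the web UI every time, DeepL also has an official Python package. A minimal sketch, assuming the deepl package and an API key in the DEEPL_AUTH_KEY environment variable (the input file name is just a stand-in):

import os
import deepl

# Assumes DEEPL_AUTH_KEY is set in the environment.
translator = deepl.Translator(os.environ["DEEPL_AUTH_KEY"])

# The English story generated above, saved to a file (stand-in name).
story = open("story_en.txt", encoding="utf-8").read()

# Translate English -> Japanese; result.text holds the translated string.
result = translator.translate_text(story, target_lang="JA")
print(result.text)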

AI Voiceovers

From there, we needed to generate a voiceover to use on the video, so we went to Narakeet for that. Watch the video and you’ll hear what it gave us, but sound-wise it’s perfect. Nothing but respect for that.
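Narakeet also offers an HTTP API if you’d rather script the voiceover than use the web UI. The sketch below is only an illustration: the endpoint, header, and voice parameter are our assumptions based on how typical REST text-to-speech services work, so check Narakeet’s own API docs before relying on any of it.

import os
import requests

# Hypothetical endpoint and parameters - verify against Narakeet's API docs.
API_KEY = os.environ["NARAKEET_API_KEY"]
url = "https://api.narakeet.com/text-to-speech/mp3"  # assumed endpoint

# The Japanese script from the translation step (stand-in file name).
japanese_script = open("story_ja.txt", encoding="utf-8").read()

response = requests.post(
    url,
    params={"voice": "yuriko"},  # assumed voice name
    headers={"x-api-key": API_KEY, "Content-Type": "text/plain"},
    data=japanese_script.encode("utf-8"),
)
response.raise_for_status()

# Save the returned audio for the video assembly step.
with open("narration.mp3", "wb") as f:
    f.write(response.content)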

AI Image Generation

And then comes the linchpin: visuals. There are a lot of AI image generators out there, but we chose to go with Midjourney and Craiyon. The former because it’s pretty damn good at producing very dramatic and rather fantasy-oriented images from prompts. The latter because it’s free, and our trial period for Midjourney only allows so many images.

Is It Complete?

We also needed music, but Aaron’s pretty good at that, so he just threw something together real quick to add a touch of the human. We could’ve also used Boomy, Jukebox, or Aiva, to name a few.

And then it was just a matter of assembly, which I (Matt) am pretty good at, and honestly this was low-level stuff that only requires iMovie and maybe 10 minutes. Load the images and the audio into the video editor, cut the narration into chunks to give it some breathing room, mess around with Ken Burns effects, crops, and transitions, make intro and outro scenes, and then throw the soundtrack over it, and you’ve got a stew cooking.
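We did the assembly by hand in iMovie, but if you wanted to script that part too, something like moviepy gets you most of the way there. This is a rough sketch under assumed file names (images, narration, and soundtrack from the earlier steps), minus the Ken Burns pans and the intro/outro polish:

from moviepy.editor import (
    AudioFileClip,
    CompositeAudioClip,
    ImageClip,
    concatenate_videoclips,
)

# Assumed file names from the earlier steps.
image_files = ["scene1.png", "scene2.png", "scene3.png", "scene4.png"]
narration = AudioFileClip("narration.mp3")
soundtrack = AudioFileClip("soundtrack.mp3")

# Spread the images evenly across the narration to give it breathing room.
per_image = narration.duration / len(image_files)
clips = [ImageClip(path).set_duration(per_image) for path in image_files]
video = concatenate_videoclips(clips, method="compose")

# Narration on top, soundtrack quietly underneath.
audio = CompositeAudioClip([narration, soundtrack.volumex(0.3)])
video = video.set_audio(audio.set_duration(video.duration))

video.write_videofile("mountain_story.mp4", fps=24)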
