
Copyright© Schmied Enterprises LLC, 2025.

Generative AI is not a new concept, but the hardware it needs was once prohibitively expensive. Artificial intelligence ideas have been around for decades, yet it wasn't until the mid-2010s that hardware began to catch up with the demands of image recognition. By 2020, hardware had become capable of training large language models (LLMs).

These days, running text-generation models, and even fine-tuning smaller ones, can be done on a laptop or a Mac Studio. The next frontier is image generation, which typically requires larger NVIDIA Tesla or H100 GPUs housed in data centers. However, the user base for image generation is much smaller. Text generation serves a broad range of white- and blue-collar workers, while image generation is mostly confined to the marketing and media industries, where stock images are often reused for long periods.

What will the next generation of AI startups accomplish? It's not as complex as it may seem. Imagine you are an author. You take the PDF of your latest novel, open The App, and upload the document. The system uses LLMs to summarize each chapter and extract descriptions of roughly one hundred scenes across twenty chapters. It then runs inference to convert each English scene description into an OpenUSD file. These files can be fed directly into NVIDIA tools to render each scene.
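To make the OpenUSD step more concrete, here is a minimal sketch of what such a pipeline might emit for a single scene. It assumes the LLM has already turned a scene's English description into a small structured record. The `SceneSpec` and `Prop` types, the prim names, and the coordinates are all hypothetical. The sketch writes plain-text USDA (the ASCII form of OpenUSD) by hand, so it needs no dependencies. A real system would more likely use the official `pxr` Python bindings.

```python
from dataclasses import dataclass, field


# Hypothetical structured output of the LLM step for one scene.
@dataclass
class Prop:
    name: str                              # must be a valid USD identifier, e.g. "Table"
    kind: str                              # a USD geometry schema such as "Cube" or "Sphere"
    position: tuple[float, float, float]   # world-space translation, Y-up


@dataclass
class SceneSpec:
    name: str                              # becomes the root prim and defaultPrim
    camera_position: tuple[float, float, float]
    props: list[Prop] = field(default_factory=list)


def _translate(pos: tuple[float, float, float], indent: int) -> str:
    """Emit a translate op plus the xformOpOrder that activates it."""
    x, y, z = pos
    pad = " " * indent
    return (f"{pad}double3 xformOp:translate = ({x}, {y}, {z})\n"
            f'{pad}uniform token[] xformOpOrder = ["xformOp:translate"]\n')


def scene_to_usda(scene: SceneSpec) -> str:
    """Render a SceneSpec as a minimal USDA layer: one camera plus simple props."""
    parts = [
        "#usda 1.0\n",
        "(\n",
        f'    defaultPrim = "{scene.name}"\n',
        '    upAxis = "Y"\n',
        ")\n\n",
        f'def Xform "{scene.name}"\n{{\n',
        '    def Camera "MainCamera"\n    {\n',
        _translate(scene.camera_position, 8),
        "    }\n",
    ]
    for prop in scene.props:
        parts.append(f'    def {prop.kind} "{prop.name}"\n    {{\n')
        parts.append(_translate(prop.position, 8))
        parts.append("    }\n")
    parts.append("}\n")
    return "".join(parts)


if __name__ == "__main__":
    spec = SceneSpec("Scene01", camera_position=(0.0, 1.6, 5.0),
                     props=[Prop("Table", "Cube", (0.0, 0.5, 0.0))])
    print(scene_to_usda(spec))
```

Keeping the scene as structured data until the last moment is a deliberate choice. A later pass, such as moving the camera or adding a character, only has to edit `SceneSpec` and re-emit the file. It never has to patch USD text directly.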

The software's potential doesn't end there. It can modify the OpenUSD file for each scene using Stable Diffusion or a similar diffusion-based generative model, changing camera positions and adding characters.

A text-to-speech model can turn the dialogue into voiced audio. And just like that, the movie adapted from your book is ready.

Of course, such a movie always leaves room for refinement. Advanced user interfaces can let you tweak camera positions, text, tone, and speech.

All of this hinges on computing power. Image generation requires significantly more computing power than text generation, and video generation demands far more than either.

This is the crux of the matter. Whether you invest in a cluster or rent one from a cloud provider, the opportunities are immense.

Envision a retired widower. He opens The App, uploads old photos of his late wife, and writes a script of cherished moments with her and their children. The App generates videos that bring him joy and may help him cope with his dementia.

Students can write scripts for school projects and, with just a few clicks, create videos that don't compromise their own privacy or that of their peers.

You can record your dancing and replace your image with that of a stuffed bear, easily sharing the video without revealing your identity, while still providing enjoyment to others.

Picture a famous actor who licenses images from their prime, in their twenties, to a filmmaker. Once the images are loaded into The App, the movie is made and starts selling, and the actor earns significant income from royalties.

Who will establish that startup? There will likely be many. The future is now. Celebrate!