Welcome to Generation AI, the first issue of a new newsletter by Kurt Cagle that focuses on Generative AI, emphasizing prompts, techniques, plugins, models, and similar user-facing optimizations. It is part of The Cagle Report collection of newsletters and is geared primarily to a lay technical audience. It’s also a convenient place for me to park ideas for reference later. If you have a prompt or toolset you want to share (or if you simply want to know how to ask prompt questions), please feel free to drop me a line at kurt.cagle@gmail.com.
An Author’s Notebook
When talking about ChatGPT, there is a natural inclination to try to focus everything on one (or perhaps two) prompts. This is the search-engine mentality applied to chat, and while helpful, it tends to misrepresent what I believe is a far more valuable mode of operation: the notebook.
This mode is more exploratory and is intended to develop themes and ideas rather than simply get answers. Programmers frequently use Jupyter notebooks to explore datasets or when trying to develop code. Authors of books, on the other hand, can use notebooks to keep track of characters, places, and plot ideas and can also take advantage of chat to provide some surprisingly sophisticated insights into both theme and process.
The examples here all made use of the paid ChatGPT 4 service. I’m still waiting for the upgrade to the unified service, so this part may change slightly. If you haven’t been upgraded to multimodal either, you will also need to select the DallE-3 mode, especially if you expect to do a lot of visualization (this should be available automatically in multimodal).
Authors are often described as either pantsers (they work by the seat of their pants) or plotters (they work from a particular outline). It turns out that pantsers also tend to be character-driven - they let the characters determine the direction of the plot, oftentimes with only a loose idea of where they expect the story to go. This makes for a more spontaneous story but also has a good chance of ending up in the swamp of despair, where the character wanders into unexpected plotlines that prove difficult to resolve and usually result in the author never getting the book to the finish line.
Plotters, conversely, may have a clear idea about where they want their characters to go (they are plot-driven), but the stories, as a consequence, may feel somewhat arbitrary and formulaic. Plotters get the stories done, but the results can often feel duller and more lifeless than the pantser version.
The ideal solution is to work somewhere in the middle - let the story be character-driven but with the plot providing structure. Starting from scratch, this process can be very time-consuming and frustrating, but this is also one of those areas where ChatGPT and AI can come to the rescue, helping the author better structure their approach without actually letting the AI write their story.
(Re-)writing Storm Crow
Writing is an iterative process, and sometimes, those iterations can go in unexpected directions. I started writing Storm Crow several years ago, featuring as a protagonist a young female medical doctor named Breannen (Bree) Storme, a woman living in Seattle who finds herself with powerful mage-like powers at the age of 23 in a world where paranormal creatures from vampires and werewolves to dryads and minotaurs existed but lived in the shadows. The character was intended to support multiple books in a series.
The story ended up pivoting between three different storylines - one where Bree went from Seattle to Portland, a second where she stayed in Seattle and became part of an FBI-like task force, and a third in which she went to Europe to find the people responsible for the murder of her family. As you can probably guess from this alone, I am a pantser, and as such am much more character-oriented as an author than I am plot-driven. I’ve written three separate novels but haven’t been happy with them.
Consequently, when I was casting about for a topic to illustrate ChatGPT as a writer’s assistant, Storm Crow and the adventures of Breannen Storme seemed a good candidate.
So, powering up ChatGPT, I started typing in what I knew about the characters (there are a number of them). For Bree, as an example, I wrote the following:
[Me] Breean Lily Storme (Bree) is a character in the Storm Crow series by Kurt Cagle. She is 23 years old, was born in Edinburgh, Scotland, in 2000, and is the daughter of David Storme and Maeve Sinclair. She describes herself as looking like what Seattle would look like if Seattle was a young, Gothic woman.
DallE-3 returns two images when requested, though I’ll only show the one that I think best illustrates the concept in question. In this case it responded with:
[ChatGPT] Here are the images inspired by the character Breean Lily Storme, embodying the essence of Seattle with a Gothic touch.
I then elaborated on the character’s role as a doctor:
[Me] Bree is in the last year of her medical residency, after studying medical pathologies, with an emphasis on diseases and injuries of the Fae and other paranormals in the area. She is often sought out by the halfling street population, including dwarfs (Dwaerves), vampires (Vampyr), werewolves (Wyrs), Dryads and Satyrs (Dryas), Orcs (Arucs), and others, many of whom disguise themselves. She's intelligent, agile and graceful, but tends to suffer from ADHD. She's also a mage, telekinetic, and seer, though most of these talents haven't fully manifested yet.
With the response being:
[ChatGPT] The images reflect the character of Bree, a medical resident with a specialization in paranormal pathologies.
For the next step, I described the breakdown that Bree had at the beginning of the story, and asked for an analysis.
[Me] At the start of the book, Bree had a mental breakdown after a nine year old girl patient of hers died from late stage leukemia. She tried using her magic to fight the infection within the girl's body. As a consequence, she took a leave of absence and sequestered herself in her apartment above her sister Cassandra's (Cassie's) bakery, working late nights trying to figure out how she could have saved the girl. She's taken to wearing t-shirts and jeans, the apartment is a mess, and she spends all of her time when not sleeping or reviewing her father's notebooks on magic at the computer trying to create a magical solution.
Analyze this.
This shifts the focus away from rendering images and into looking at character motivations and story-telling hints:
[ChatGPT] Bree's state of mind at the start of the book sets a tone of deep introspection and highlights the struggle of coping with the limitations of her powers and the harsh realities of life and death. The scene you describe paints a picture of someone who has been deeply affected by a traumatic event, driving her to obsessively search for answers. This determination, combined with her isolation, suggests a character on the brink of a significant transformation or revelation.
Her attire of t-shirts and jeans, along with the disarray of her living space, conveys her disregard for her own well-being and the external world as she delves into her late father's notebooks, seeking knowledge that could have changed the outcome for her young patient. The use of magic as a potential solution indicates her desperation and the blurred lines between her medical practice and her magical heritage.
This starting point also creates a basis for a narrative arc that could lead Bree through a journey of healing, acceptance, and perhaps a redefinition of her own limits and strengths. It's a poignant setup that can resonate with readers who have experienced their own forms of loss or helplessness.
Wow! I had not been expecting that level of introspection when I wrote the prompt, and the response helped me better define several key aspects of the character at this point in the story. This was when I began to think that ChatGPT might prove to be more valuable than simply a vehicle for generating pictures. Speaking of which:
[Me] Render a photograph of Bree in her apartment.
The response?
Oooof! That image captures so much - despair, loneliness, isolation, the disorganization in her life, her lack of self-care. She’s really punishing herself. Already, I’m beginning to think of aspects of the character I hadn’t picked up before, all due to a single image.
I slipped into author mode at this point, writing a short passage similar to one from the first chapter:
[Me] Bree picked up a notebook, one that she'd found in a box in her room from several years before when she'd lived with her grandfather Isaac. The notebook contained drawn sigils accompanied by descriptions in Greek, Hebrew, Aramaic, and mathematical notation, all languages that she'd taught herself while trying over the years to understand the significance of the notebook.
ChatGPT spun its wheels, then responded to my prompt WITH a prompt:
[ChatGPT] To visualize this scene, I'll create a detailed prompt for an image of a woman in her apartment examining an old, worn notebook filled with sigils and inscriptions in various languages.
Prompt: "Create an image of a young woman, with fair skin and long, wavy chestnut hair, sitting at a cluttered desk in a dimly lit apartment. She wears a simple t-shirt and jeans, her expression one of deep concentration. In her hands, she holds an ancient-looking notebook, its pages yellowed and edges frayed. The open notebook reveals intricate sigils and arcane symbols drawn meticulously on the left side, with annotations in Greek, Hebrew, Aramaic, and complex mathematical equations on the right side. The desk is strewn with other magical tomes and scattered notes, illuminated by the soft glow of a computer screen displaying more esoteric diagrams. The room around her is a mix of shadows and the warm light from a desk lamp, emphasizing the late hour and her dedication."
Let's generate two images using this prompt to bring the scene to life.
This is getting into meta territory, with ChatGPT in essence responding with its own prompt to generate an image. All I needed to do was confirm this:
[Me] Do it.
A few seconds later, DallE-3 came back with two spectacular images.
The level of detail in this image is simply incredible, including the nice touch of reflecting one of the sigils on the computer behind her. The lamp and candle give just enough illumination to make details visible without breaking the rather dark and mysterious mood. Significantly, it also gave me ideas about exposition I hadn’t begun to explore before. Again, jumping into author mode:
[Me] After some searching, Bree reached out and began constructing a series of sigils drawn from the notebooks, speaking each glyph’s Aramaic name in turn. The sigils emerged, one by one, from the book and created a multi-dimensional shape glowing blue in the air before her, the same color reflected in her own aura flashing blue in resonance. To one side, her Russian Blue kitten Catya watched with wide eyes, her fur sparking with the same resonance. Breathily she repeated the spell, then whispered the terminating “Amen”. The world paused in anticipation.
ChatGPT took a long time to parse through this, but the results were worth it:
(I told ChatGPT to render new images to get the second batch.)
Of all of these, I think I like the third one best, especially with Catya, the kitten, entranced by the sigils escaping from the page with a very worried look on its face, Bree’s absorption in what she’s seeing, and the window acting as a framing device while hinting at a storm going on outside. There’s a feeling of anticipation here, as if something BIG is about to happen.
So, let’s make that happen:
[Me] Then reality exhaled. The room turned a brilliant blue white as lightning caused the window behind her to explode inward. Deep within her, Bree felt as if an interconnected latticework of chains broke free, chains that had become so much a part of her that she no longer noticed them. A fierce, rain-laden wind howled through the room, picking her and Catya up and throwing them outward, tearing through the fabric of the universe itself, until the world turned black.
I ended up doing six images with this one, and while I’m not as satisfied with any of them, I will lay the blame on myself for trying to describe events that occurred in the space of a moment, events that I would have trouble illustrating myself.
The second one, with Bree center stage and the cat going flying, is probably the best overall, but again, these are primarily acting as catalysts for ideas.
This is a seminal point in the book as it shows the exact moment that the bindings that had prevented Bree from accessing most of her powers finally broke.
Using ChatGPT in this manner allows you as an author to create a scene-by-scene representation of what’s going on. It can also be used to identify characters, locations, and relationships. These will be covered in the next post.
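For readers who like to keep such a notebook outside the chat window as well, the bookkeeping can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the Character and Notebook classes and the image_prompt method are hypothetical names, not part of ChatGPT or any library, and the text it produces would still be pasted into the chat by hand.

```python
from dataclasses import dataclass, field


@dataclass
class Character:
    """A named character plus the facts recorded about them so far."""
    name: str
    facts: list[str] = field(default_factory=list)


@dataclass
class Notebook:
    """Hypothetical author's notebook: character facts and numbered scenes."""
    characters: dict[str, Character] = field(default_factory=dict)
    scenes: list[str] = field(default_factory=list)

    def add_fact(self, name: str, fact: str) -> None:
        # Create the character on first mention, then record the fact.
        self.characters.setdefault(name, Character(name)).facts.append(fact)

    def add_scene(self, text: str) -> int:
        # Scenes are numbered from 1 in the order they are written.
        self.scenes.append(text)
        return len(self.scenes)

    def image_prompt(self, scene_number: int) -> str:
        # Combine the scene text with known facts about the characters
        # whose names appear in it, so each prompt carries its own context.
        scene = self.scenes[scene_number - 1]
        lines = [f"Render an image of scene {scene_number}: {scene}"]
        for character in self.characters.values():
            if character.name in scene and character.facts:
                lines.append(f"{character.name}: " + "; ".join(character.facts))
        return "\n".join(lines)


notebook = Notebook()
notebook.add_fact("Bree", "23 years old")
notebook.add_fact("Bree", "in the last year of her medical residency")
notebook.add_fact("Catya", "Russian Blue kitten")
n = notebook.add_scene("Bree picked up a notebook filled with sigils.")
prompt = notebook.image_prompt(n)
print(prompt)
```

Only characters named in the scene text are included, so Catya's details stay out of a prompt for a scene she isn't in. Matching on the bare name is deliberately naive; a longer-lived notebook would probably want nicknames and aliases as well.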
Things to Watch
Clicking on images will let you see them at full size (1024 pixels square) and, more importantly, save them. Currently, it’s very easy to lose links to images in chats. The chats themselves can be persisted, however, by clicking on the Share Chat button at the upper right of the session.
This can be addictive, and a real boon to authors who may be struggling to visualize their worlds. Have fun!
In Media Res,
this is awesome! The pictures are amazing and the insights and wording from Chat is great. Did it take you long to have this deep conversation with Chat? I'm sure there was plenty of fumbling around to get it to this state?