AI photo generation · 8 min read
Midjourney: a prompt guide
Prompt anatomy, parameters, sref and cref, consistent characters and fixing hands. A hands-on guide.
Midjourney v7 does not read your mind. It reads your prompt. The gap between a random image and one that lands in a client moodboard is not “a better idea” — it is a better prompt structure and deliberate use of parameters. This guide is for designers and creators who want to move from “rolling the dice” to a repeatable process. No magic. Just control.
Anatomy of a prompt
A good prompt has six layers. You do not need all of them in every image, but when the result disappoints you, one of them is almost always missing:
- Subject — what we see: “an elderly watchmaker at her bench”.
- Medium — the technique: photography, watercolour, 3D render, vector illustration.
- Style — an aesthetic reference: art deco, brutalism, 1970s cinema.
- Lighting — rim light, golden hour, neon, soft window light.
- Composition — the frame and lens: close-up, wide shot, top-down, 35mm.
- Mood — melancholic, energetic, clinical, fairy-tale.
Order matters. Midjourney weights words more heavily at the start of a prompt. Subject first, then medium and style, technical detail last. What you actually want goes up front.
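The layering and ordering above can be sketched as a small helper. This is a minimal illustration, not a Midjourney API: the names PromptLayers and build_prompt are hypothetical, and the only thing the code encodes is the layer order described in this section.

```python
from dataclasses import dataclass

# Layer order from the section above: what you want most goes first.
LAYER_ORDER = ("subject", "medium", "style", "lighting", "composition", "mood")


@dataclass
class PromptLayers:
    """Hypothetical container for the six layers; only subject is required."""
    subject: str
    medium: str = ""
    style: str = ""
    lighting: str = ""
    composition: str = ""
    mood: str = ""


def build_prompt(layers: PromptLayers) -> str:
    """Join the non-empty layers in the fixed order, comma-separated."""
    parts = [getattr(layers, name).strip() for name in LAYER_ORDER]
    return ", ".join(part for part in parts if part)


prompt = build_prompt(PromptLayers(
    subject="an elderly watchmaker at her bench",
    medium="photography",
    lighting="soft window light",
    composition="close-up, 35mm",
))
print(prompt)
```

Skipped layers simply drop out, so a disappointing result can be debugged by checking which fields were left empty.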
The parameters you actually use
Parameters go at the end of the prompt, after a double dash. The list worth memorising:
- --ar 16:9 sets the aspect ratio: --ar 2:3 for posters, --ar 3:2 for photography, --ar 1:1 for icons.
- --stylize 100 sets how hard the model applies its own taste. Lower = more faithful to the prompt, higher = prettier but “looser”.
- --chaos 25 sets the spread across the four variants in a grid. High chaos gives four different directions instead of four similar ones.
- --weird 250 adds controlled weirdness: unusual, sometimes surreal aesthetic choices.
- --sref is a style reference: you attach an image (or style code) and the model borrows its aesthetics, not its content.
- --cref is a character reference: it keeps a character consistent across shots (face, clothing).
- --no is an exclusion: --no text removes lettering, --no people empties the scene.
Ranges: --stylize takes 0–1000, --chaos 0–100, --weird 0–3000. Do not touch all three at once when starting: change one parameter and watch what happens. Otherwise you never learn what each one does.
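If you assemble prompts in a script, the ranges above are easy to check before you spend a generation. The sketch below is only an illustration: format_params is a hypothetical helper, and the ranges are the ones quoted in this guide.

```python
# Ranges as quoted in the text above.
RANGES = {"stylize": (0, 1000), "chaos": (0, 100), "weird": (0, 3000)}


def format_params(ar=None, no=None, **numeric):
    """Build the trailing parameter string, rejecting out-of-range values."""
    parts = []
    if ar:
        parts.append(f"--ar {ar}")
    for name, value in numeric.items():
        if name not in RANGES:
            raise ValueError(f"unknown numeric parameter: {name}")
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"--{name} must be within {low}-{high}, got {value}")
        parts.append(f"--{name} {value}")
    if no:
        # Several exclusions go into one comma-separated --no list.
        parts.append("--no " + ", ".join(no))
    return " ".join(parts)


print(format_params(ar="16:9", stylize=150, no=["people"]))
```

Passing only one numeric parameter per experiment keeps the "change one thing at a time" rule visible in the code.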
When to raise and when to lower stylize
This is the most common question, so let me be concrete. --stylize is a slider between “do exactly what I wrote” and “make it pretty your way”.
- Low (0–100) — for a precise brief, product layout, diagram, exact composition. The model obeys the words.
- Medium (100–300) — the default zone for most commercial work. A balance of control and beauty.
- High (500–1000) — when you want inspiration, a poster, something striking, and detail fidelity does not matter.
Practical rule: the more detailed your prompt, the lower the --stylize. High stylize on a long, precise prompt is a recipe for frustration — the model starts “improving” the very things you wanted exactly as written.
Permutation prompts
Permutation prompts are the fastest way to explore variants without retyping by hand. Curly braces with comma-separated options generate every combination at once. You write the subject, then in the medium slot you drop a brace with three options, {photography, watercolour, isometric render}, and in the time-of-day slot a second brace, {dawn, dusk}. Three times two gives six jobs from a single command.
Use this at the start of a project: one axis is medium, the other is mood. In a few minutes you have a board of directions and you pick one to refine. Mind the cost — permutations multiply the number of generations, so keep the lists short (2–4 options per axis).
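To see how quickly the job count grows, you can expand a permutation prompt locally before submitting it. The following is a rough sketch that handles flat (non-nested) braces only. The subject line is an illustrative placeholder, and expand_permutations is a hypothetical helper, not part of Midjourney.

```python
import itertools
import re

# One flat brace group such as "{dawn, dusk}"; nested braces are not handled.
BRACE = re.compile(r"\{([^{}]*)\}")


def expand_permutations(prompt: str) -> list[str]:
    """Return every combination a flat permutation prompt would produce."""
    groups = [
        [option.strip() for option in match.group(1).split(",")]
        for match in BRACE.finditer(prompt)
    ]
    results = []
    for combo in itertools.product(*groups):
        choices = iter(combo)
        # Replace each brace group, left to right, with the chosen option.
        results.append(BRACE.sub(lambda _m: next(choices), prompt))
    return results


jobs = expand_permutations(
    "a lighthouse on a cliff, {photography, watercolour, isometric render}, "
    "at {dawn, dusk} --ar 3:2"
)
print(len(jobs))
for job in jobs:
    print(job)
```

Checking len(jobs) before you submit is a cheap guard against a third axis quietly turning six generations into twenty-four.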
Image prompts and blending
You can paste one or more image URLs at the start of a prompt — Midjourney treats them as a visual starting point. Three uses:
- Image + text: the image sets direction, the text dictates changes (“this building, but at dusk, in the rain”).
- Blending two images: you fuse aesthetics and form from two sources into one hybrid.
- Image weight: the --iw parameter controls how strongly the image outweighs the text description.
Distinguish this from --sref. An image prompt affects both content and style. --sref takes only the aesthetic (palette, texture, the way light is painted) and leaves the subject to your text. For a visually consistent series use --sref, not a raw image prompt.
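The difference shows up in the shape of the prompt string itself. In this small sketch the example.com URLs, the --iw value and both helper names are placeholders chosen for illustration.

```python
def image_prompt(image_urls, text, iw=None):
    """Image prompt: URLs lead the prompt and influence content and style."""
    parts = [*image_urls, text]
    if iw is not None:
        parts.append(f"--iw {iw}")
    return " ".join(parts)


def style_reference_prompt(text, style_url):
    """Style reference: text owns the subject, --sref supplies the aesthetic."""
    return f"{text} --sref {style_url}"


print(image_prompt(["https://example.com/building.png"],
                   "this building, but at dusk, in the rain", iw=1.5))
print(style_reference_prompt("a harbour at dawn, wide shot",
                             "https://example.com/style.png"))
```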
Consistent characters
The hardest problem in Midjourney work, and the most common reason clients come back with notes. --cref solves most cases (it is a V6 and Niji 6 parameter; in V7 the same job is done by Omni Reference, --oref):
- First generate one strong character portrait you like. That is your canon.
- Attach its URL via --cref to the next prompts with new scenes and poses.
- The --cw parameter (0–100) controls how much the model keeps: high --cw 100 copies face, clothing and hair; --cw 0 keeps only the face and lets the rest go.
To change the character’s outfit between scenes, lower --cw, otherwise the model insists on the same coat. Consistency is not magic — it is discipline: one canonical image and sticking to it across the whole series.
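The "one canonical image" discipline is easy to enforce if the series is generated from a single function. This is a minimal sketch following the --cref and --cw usage described above. The URL is a placeholder and character_series is a hypothetical helper.

```python
def character_series(canon_url, scenes, cw=100, ar="1:1"):
    """Build one prompt per scene, all pointing at the same canonical portrait."""
    if not 0 <= cw <= 100:
        raise ValueError(f"--cw must be within 0-100, got {cw}")
    return [f"{scene} --cref {canon_url} --cw {cw} --ar {ar}" for scene in scenes]


series = character_series(
    "https://example.com/canon.png",  # placeholder for your canonical portrait
    [
        "the mascot drinking coffee in an office, flat vector illustration",
        "the mascot riding a bicycle in the rain, flat vector illustration",
    ],
    cw=80,
)
for line in series:
    print(line)
```

For an outfit change, call the same function again with a lower cw instead of swapping the reference image.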
Fixing hands and text
Two classic weak spots. Hands improved in v7 but can still fail. What works:
- Hands: avoid frames where hands are in the foreground and spread out. Use Vary (Region), select only the hand and regenerate just that patch.
- Text: short quoted strings sometimes come out, longer ones almost never. For logotypes and lettering you genuinely need, add them in a graphics editor; do not fight the model over spelling.
- If you do not want random lettering on the image, add --no text.
Upscaling and Vary (Region)
From a grid of four images you pick one and raise the resolution (Upscale). This is the finishing stage, not the exploratory one. Two tools worth knowing:
- Vary (Subtle / Strong) — generates variants of the whole image: subtle keeps the composition, strong lets it drift.
- Vary (Region) — you select a patch (background, hand, detail) and regenerate only that, optionally with a new prompt for that area. This is inpainting and the most powerful tool for fixes without losing the rest of the frame.
Studio rule: compose a good base through the prompt first, then fix with Vary (Region). Trying to nail everything with one perfect prompt is a waste of time.
Common mistakes
Most Midjourney disappointments come from a handful of repeatable causes. Before you blame the model, check whether you are making one of these:
- An overloaded prompt: twenty adjectives do not give a richer image, just a muddier one. The model spreads its “attention” across every word, so none of them lands. Stick to one clear subject and a few strong traits.
- Conflicting signals: “minimalist, richly ornamented, baroque, clean” is four directions at once. The model averages them into mush. Decide.
- Fighting parameters instead of the prompt: if the composition is wrong, --stylize will not fix it. Fix the subject and framing description first, leave parameters for last.
- Writing negatives into the body: “no background” in the prompt body sometimes adds a background, because the model sees the word “background”. Use --no for exclusions, not a negated sentence.
- Abandoning a good seed: when you catch a composition close to target, do not start from scratch. Use Vary or --sref from that image to keep what already works.
The overriding rule: one prompt, one intent. Explore broadly at the start, then narrow. The best results rarely land in the first grid — they land in the third or fourth, once you know what you want and say it to the model more simply.
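Two of the mistakes above (negatives in the body, overloaded prompts) are mechanical enough to catch with a quick check before submitting. The sketch below is a rough heuristic, not a real validator. The function name lint_prompt and the threshold of 12 descriptors are assumptions picked for illustration.

```python
import re

# Matches "no <word>" or "without <word>" in the descriptive part of a prompt.
NEGATION = re.compile(r"\b(?:no|without)\s+(\w+)", re.IGNORECASE)


def lint_prompt(prompt: str, max_descriptors: int = 12) -> list[str]:
    """Return warnings for negated phrases and overly long descriptor lists."""
    # Everything before the first " --" is treated as the prompt body.
    body = prompt.split(" --", 1)[0]
    warnings = []
    for match in NEGATION.finditer(body):
        warnings.append(
            f"negation in body ('{match.group(0)}'): try --no {match.group(1)}"
        )
    descriptors = [part for part in body.split(",") if part.strip()]
    if len(descriptors) > max_descriptors:
        warnings.append(
            f"{len(descriptors)} comma-separated descriptors: consider trimming"
        )
    return warnings


print(lint_prompt("a chair, studio photo, no background --ar 1:1"))
```

Conflicting signals still need a human eye; the check only flags what can be spotted by pattern.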
Five example prompts
- Product portrait: a watch on dark stone, macro photography, soft window light from the left, shallow depth of field, minimalist mood, --ar 4:5 --stylize 80.
- Film poster: a lone figure on an empty street in the rain at night, neon, 1980s cinema, wide shot, --ar 2:3 --stylize 600 --chaos 20.
- Consistent mascot series: canonical portrait + a new scene (the mascot drinking coffee in an office), flat vector illustration, --cref [url] --cw 80 --ar 1:1.
- Style exploration: fixed subject, a permutation brace on medium (watercolour, ink, isometric render), --ar 3:2 --chaos 40.
- Interior render: a Scandinavian living room at golden hour, architectural photography, 24mm lens, warm light, --ar 16:9 --stylize 150 --no people.
TL;DR
Build the prompt in layers: subject, medium, style, lighting, composition, mood, in that order. Low --stylize for precise briefs, high for inspiration. --sref copies the aesthetic, --cref holds the character. Permutations for exploration, image prompts for hybrids. Fix hands and details with Vary (Region), add text in an editor. Midjourney rewards structure, not luck.