There’s a temptation when you first open the Story AI Pipeline Configurator to turn everything on.
Transcription. Speaker diarization. Emotion analysis. Scene detection. B-roll queries. Thumbnail generation. AI music. The full stack.
Resist it.
Not because those tools aren’t good — they are. But because raw footage doesn’t need everything at once. Neither do you.
The Three Stages of a Pipeline
Every Story AI job moves through up to three stages. Each stage builds on the one before it.
Stage 01 is the prerequisite for everything that works from text: you cannot run Emotion analysis without a transcript. Visual can run independently, but it’s most powerful when it can cross-reference with text.
This dependency chain is also a cost chain. You pay for what you actually need, in the order you actually need it.
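To make the dependency chain concrete, here is a minimal sketch of how a configurator could check a selection before a job runs. The service names and the `REQUIRES` map are hypothetical, inferred only from the rules stated above (Emotion analysis needs a transcript, Visual has no hard prerequisite); this is not Story AI's actual implementation.

```python
# Hypothetical dependency map, inferred from the article's description.
REQUIRES = {
    "transcription": set(),
    "emotion_analysis": {"transcription"},
    "visual": set(),  # can run on its own
}

def missing_prerequisites(selected, already_processed=frozenset()):
    """Map each selected service to prerequisites that are neither
    selected in this job nor already processed for the file."""
    available = set(selected) | set(already_processed)
    problems = {}
    for service in selected:
        missing = REQUIRES[service] - available
        if missing:
            problems[service] = missing
    return problems

# Emotion analysis alone is rejected: there is no transcript yet.
print(missing_prerequisites({"emotion_analysis"}))

# Once a transcript exists for the file, the same selection is fine.
print(missing_prerequisites({"emotion_analysis", "visual"},
                            already_processed={"transcription"}))
```

The second call returns an empty result because the transcript from an earlier job counts as available, which is the same reason you only pay for the stages you still need.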
The Live Calculator
Before committing to any job, the Pipeline Configurator shows you the exact cost. Drag the duration slider to match your footage, toggle services on or off, and the total updates instantly.
The purple bar — Remaining after job — shows your current minute balance minus what this job will consume. If it goes red, you’re short. Top up before you run.
The duration shown is the actual bin footage in that project — only files you explicitly added to the project’s Media Bin are counted. Library files that aren’t in the project bin don’t inflate the estimate.
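The balance check can be sketched in a few lines. This assumes a job consumes one minute of balance per minute of footage in the project bin, which is our reading of the minute-based balance rather than a documented rule; the function name and return shape are hypothetical.

```python
def remaining_after_job(balance_minutes, bin_durations_minutes):
    """Return (job_minutes, remaining, is_short) for a project's Media Bin.

    Only the durations passed in are counted, mirroring the rule that
    library files outside the project bin do not inflate the estimate.
    """
    job_minutes = sum(bin_durations_minutes)
    remaining = balance_minutes - job_minutes
    return job_minutes, remaining, remaining < 0

# Two clips in the bin, 90 minutes of balance: enough to run.
print(remaining_after_job(90, [42.5, 17.5]))

# Same bin, 45 minutes of balance: short, so top up first.
print(remaining_after_job(45, [42.5, 17.5]))
```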
How the Calculation Works
Every service has a base cost per minute of footage. A margin is applied on top — visible in the calculator as the final price.
The formula is simple:
Final price = base cost/min × (1 + margin) × duration in minutes
Example: Whisper base cost is €0.006/min. With 20% margin applied: €0.0072/min. On 60 minutes of footage: €0.43.
Add Story Summary (€0.019/min after margin) and you’re at €1.57 for the full 60-minute job. That includes a full transcript with word-level timestamps and a GPT-4o story summary. For a news piece, that’s your script draft.
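If you want to check the arithmetic yourself, the example above can be reproduced with a short sketch. The constant and function names are hypothetical; the figures are the ones quoted in this article, and the margin is treated as a 1.2 multiplier for 20%. `Decimal` is used so the cents round predictably.

```python
from decimal import Decimal, ROUND_HALF_UP

# Figures quoted in the article; names are illustrative only.
WHISPER_BASE_PER_MIN = Decimal("0.006")          # EUR, before margin
MARGIN = Decimal("0.20")                         # 20% margin
SUMMARY_PER_MIN_AFTER_MARGIN = Decimal("0.019")  # EUR, margin included

def price_with_margin(base_per_min, margin, minutes):
    """Final price = base cost/min x (1 + margin) x duration in minutes."""
    return base_per_min * (1 + margin) * minutes

def to_cents(amount):
    """Round a EUR amount to two decimal places."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

minutes = 60
transcription = price_with_margin(WHISPER_BASE_PER_MIN, MARGIN, minutes)
summary = SUMMARY_PER_MIN_AFTER_MARGIN * minutes
total = transcription + summary

print(to_cents(transcription))  # 0.43
print(to_cents(total))          # 1.57
```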
Start With Transcribe. Always.
Our recommendation, every time: run Transcription first.
The moment your footage has a transcript, something changes. You can talk to your material. Open the AI chat in your project, ask questions:
- “When does the interview get emotional?”
- “Find every mention of the deadline”
- “What’s the strongest quote in the first ten minutes?”
The AI answers in timecodes. You’re not scrubbing anymore. You’re reading.
Why this works: Transcription is cheap and fast — a 60-minute interview through Whisper costs under €0.50. Once you’ve had that conversation with your footage, you know exactly which additional services are worth running. You’re not guessing.
From there, you build. Maybe you add Emotion Analysis because you realised the sentiment curve would’ve found the key scene faster. Maybe you add B-roll queries because you’re spending too much time hunting for cutaways. You construct the pipeline around what the story is asking for — not what the feature list suggests.
Processed Once. Stays Forever.
This is the part that changes how you think about cost.
The results of every processing stage a file passes through (Transcription, Emotions, Visual) stay on the server permanently, stored with the media file itself.
This means the data is not tied to a specific project — it’s tied to the file. Delete a project, create a new one with the same footage, and all the enrichment data is still there. You don’t pay to run Whisper on the same footage twice. You don’t re-generate keyframes six months later for a follow-up piece. You don’t repeat Visual AI because a client wants a different cut.
It’s already there. On the file. Forever.
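One way to picture "tied to the file, not the project" is a store keyed by the file and the stage. The sketch below is an illustration under our own assumptions: it identifies a file by a hash of its content and keeps results in an in-memory dictionary. How Story AI actually identifies files and stores enrichment data is not described here.

```python
import hashlib

# Hypothetical stand-in for server-side storage. Keys are
# (file identity, stage), so results follow the file, not a project.
_enrichment = {}

def file_key(content: bytes) -> str:
    """Identify a file by its content (an assumption for this sketch)."""
    return hashlib.sha256(content).hexdigest()

def get_or_run(content, stage, run_stage):
    """Return stored results for this file and stage, running the stage
    only if nothing has been stored yet."""
    key = (file_key(content), stage)
    if key not in _enrichment:
        _enrichment[key] = run_stage(content)
    return _enrichment[key]

calls = []

def fake_transcribe(content):
    """Placeholder for a paid transcription run; counts invocations."""
    calls.append(1)
    return "transcript of %d bytes" % len(content)

footage = b"same footage"
first = get_or_run(footage, "transcription", fake_transcribe)   # original project
second = get_or_run(footage, "transcription", fake_transcribe)  # new project, later

print(len(calls))  # 1
```

The second lookup never reaches the transcription function, which is the behaviour described above: the expensive stage runs once per file, however many projects reuse it.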
What you can do — at any time, for a fraction of the cost — is ask Story AI to re-contextualise it.
| Layer | Run | Cost | Example |
|---|---|---|---|
| Visual AI | Once | Higher — GPU + vision models | Scene detection, keyframes, GPT-4o frame descriptions |
| Speaker diarization | Once | Medium — per minute | Who says what, when — permanent record |
| Transcription | Once | Low — fast model | Full transcript, word timestamps, 99 languages |
| Story Summary | Re-runnable | Very low — text only | Different angle, shorter format, different language |
| Narrative Outline | Re-runnable | Very low — text only | 3-act structure, Hero’s Journey, newsroom format |
| Quote Extraction | Re-runnable | Very low — text only | Different selection criteria, different tone |
The foundational work makes your footage machine-readable. The story layer makes it editable by intent. Once your footage is machine-readable, Story AI can slice it a hundred different ways without touching the underlying analysis.
Where You Buy Processing Minutes
Processing jobs run against your AI Credits balance — measured in minutes of footage.
1 credit = 1 minute of video. The balance is shared across all your jobs.
Settings → Story AI → Buy Credits
All payment and account management lives in Settings — the same place where you manage your subscription plan, billing history, invoices, and profile. Credits are sold in packs (S / M / L) via Stripe. They never expire. They’re yours for as long as your account exists.
If you’re logged in, your current balance is always visible in Settings. The Pipeline Configurator shows you exactly how many minutes a job will consume before you confirm — so you know whether to top up or trim the pipeline.
We put payments here, not in a separate billing portal or a pop-up, because everything that touches your account should be in one place. Plan upgrades, credit purchases, invoices, connected services. One tab. No surprises.
A Practical First Week
If you’re new to Story AI, here’s how to think about your first few jobs:
Day 1: Run Transcription only. Talk to your material. See what it knows.
Day 2–3: Add Summary + Quotes. Now you can write a script by talking, not scrubbing.
Week 1: If you’re working with multiple speakers, add Diarization. If you need visual coverage, add Scene Detection and Keyframes.
When you’re ready: Run Visual AI on the sequences where it will actually save time. It’s not needed on every file — it’s needed on the ones where you’re hunting for shots.
By then, you’ll have a pipeline config that’s yours. Not ours. Not the template’s.
The footage is already there. You just need to tell the machine what you’re looking for.
Start with a transcript. The rest follows.