Video Subtitle Translator
When to Use
Use this skill when the user asks to create subtitles for a local audio or
video file, translate subtitles, produce .srt or .vtt sidecar files, embed
soft subtitles into a video, or burn translated subtitles into video frames.
The ASR path uses OOMOL's built-in Fusion API Qwen ASR file transcription. Do not
ask the user for an ASR provider key and do not rediscover the ASR capability at
runtime. The LLM translation path uses the model configuration from
oo llm config --json; do not ask the user for an OpenAI key unless oo llm config
fails or returns unusable values.
Inputs
- Required: a local audio or video path, or a publicly reachable audio URL.
- Translation intent: most video-subtitle product requests are for translated subtitles because the viewer usually does not share the source language. Translate when the user explicitly asks for translated subtitles, a translated video, video translation, or “subtitles and translation”.
- If the user asks to “add subtitles”, “make subtitles”, “subtitle this video”, or otherwise requests video subtitles without clearly saying whether they want translation, pause before execution and ask them to choose a target language for translation or explicitly confirm same-language/no-translation subtitles.
- If the request is only to transcribe, generate captions, or add same-language subtitles, and that same-language/no-translation intent is explicit, do not translate.
- Default translation target: when translation is clearly requested and the user does not name a target language, infer the target from the natural language of the user’s request. The language they used to ask is usually their preferred subtitle language. For example, a Chinese request defaults to Simplified Chinese (zh), a Japanese request defaults to Japanese (ja), a Spanish request defaults to Spanish (es), and an English request defaults to English (en). For Chinese, use Simplified Chinese unless the user writes in Traditional Chinese or explicitly asks for Traditional Chinese, Cantonese, or another variant.
- Ask when unclear: if the wording makes it unclear whether translation is wanted, ask the user which language to translate subtitles into, or ask them to confirm they want same-language/no-translation subtitles. If translation is clearly wanted but the user’s preferred target language cannot be inferred from the request, ask which target language to use.
- Defaults: source language omitted for unknown or multilingual audio; enableWords: true; enableITN: false; SRT output; burned-in MP4 for translated videos; sidecar-only output for audio.
- Default video delivery mode: burned-in MP4. Most users want a single MP4 file whose subtitles are visible everywhere, especially on social platforms, mobile players, and upload workflows that ignore subtitle tracks.
- Secondary video delivery mode: soft-subtitle MKV. Use MKV when the user asks for editable/selectable subtitles, no re-encoding, subtitle tracks, multiple languages, MKV specifically, soft subtitles, or when burn-in encoding is not available or unsuitable.
- Optional: source language code, output directory, VTT output, soft subtitle mode (soft-mkv or soft-mp4), burn-in mode, subtitle language code, and whether to keep both original and translated subtitle tracks.
- Optional translation style inputs: translation_profile, domain, audience, style_notes, glossary, and video_context.
  - translation_profile defaults to general.
  - Supported profiles: general, film_tv, youtube_explainer, technical_course, interview_podcast, news_documentary, business_training, gaming_stream, and kids_content.
  - Infer translation_profile from the user’s wording when obvious. Use film_tv for films, TV dramas, sitcoms, streaming shows, and scripted dialogue. Use youtube_explainer for YouTube-style explainers, reviews, tutorials, and creator videos. Use technical_course for courses, API walkthroughs, coding videos, academic talks, or other terminology-heavy material. Use interview_podcast for interviews, podcasts, panel conversations, or unscripted long-form speech. Use news_documentary for news, documentary, or factual narration. Use business_training for corporate, sales, compliance, or onboarding material. Use gaming_stream for game streams and esports content. Use kids_content for children’s videos. If the scene is unclear, keep general instead of asking.
  - domain should briefly describe the topic when known, such as AI developer tools, medical lecture, or sitcom dialogue.
  - audience should describe the expected viewer when known, such as general viewers, software developers, or adult streaming viewers.
  - style_notes should preserve the user’s requested style, for example natural Netflix-style Chinese subtitles or accurate but not too formal.
  - glossary is an optional list of source-to-target term mappings for names, products, acronyms, technical terms, and recurring phrases.
  - video_context is optional high-level context such as title, description, speaker notes, episode setting, or project-specific terminology.
- Supported Fusion API source language codes are zh, yue, en, ja, de, ko, ru, fr, pt, ar, it, es, hi, id, th, tr, uk, vi, cs, da, fil, fi, is, ms, no, pl, and sv. Omit language instead of guessing when the user says auto, unknown, or multilingual.
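The source-language rule above can be sketched as a small lookup. This is only an illustration, not code from the bundled script: the name resolveSourceLanguage is hypothetical, and omitting unsupported codes (rather than raising an error) is an assumption.

```javascript
// Source language codes listed above for Fusion API ASR.
const SUPPORTED_SOURCE_LANGUAGES = new Set([
  "zh", "yue", "en", "ja", "de", "ko", "ru", "fr", "pt", "ar", "it", "es",
  "hi", "id", "th", "tr", "uk", "vi", "cs", "da", "fil", "fi", "is", "ms",
  "no", "pl", "sv",
]);

// Hypothetical helper: returns a supported code, or undefined when the
// language field should be omitted from the ASR payload.
function resolveSourceLanguage(input) {
  if (input === undefined || input === null) return undefined;
  const code = String(input).trim().toLowerCase();
  if (["", "auto", "unknown", "multilingual"].includes(code)) return undefined;
  // Assumption: an unsupported code is omitted instead of being guessed at.
  return SUPPORTED_SOURCE_LANGUAGES.has(code) ? code : undefined;
}
```

A caller would add language to the submit payload only when this returns a value.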
Execution
This skill ships a bundled helper script at scripts/subtitle-tools.mjs.
Resolve it relative to this SKILL.md directory and prefer it for local
transcript-to-subtitle conversion and LLM subtitle translation. Do not recreate
the conversion or translation code inline when the script is available.
1. Check FFmpeg First
Run:
ffmpeg -version
ffprobe -version
If either command is missing, stop and guide the user to install FFmpeg before processing video or extracting audio. Suggested installs:
- macOS with Homebrew: brew install ffmpeg
- Windows/Linux: download the matching prebuilt FFmpeg archive from https://github.com/jellyfin/jellyfin-ffmpeg/releases, extract it, and add the directory containing ffmpeg and ffprobe to PATH.
Do not recommend sudo apt install ffmpeg for Linux by default. It is more
invasive than needed for this workflow and may install an older or differently
configured distro build.
After installation, ask the user to open a new terminal or refresh PATH, then
rerun the version checks. Do not continue with video work when FFmpeg is
missing. For a publicly reachable audio URL and sidecar-only subtitle output,
FFmpeg is not needed unless conversion, muxing, or burning is requested.
2. Check Node.js First
The bundled helper script requires a local JavaScript runtime. Use Node.js 18
or newer because the script is an ES module and uses the built-in fetch API.
Run:
node --version
node -e "const major=Number(process.versions.node.split('.')[0]); process.exit(major >= 18 ? 0 : 1)"
If node is missing or the version check fails, stop and guide the user to
install Node.js 18 LTS or newer before running scripts/subtitle-tools.mjs.
Suggested installs:
- macOS with Homebrew: brew install node
- Windows with winget: winget install OpenJS.NodeJS.LTS
- Ubuntu/Debian: install Node.js 18+ with NodeSource or nvm; distro apt install nodejs npm is acceptable only when it provides Node.js 18+.
After installation, ask the user to open a new terminal or refresh PATH, then
rerun the version checks. Do not continue with local subtitle conversion or LLM
translation when the JavaScript runtime is missing or older than Node.js 18.
3. Prepare Audio
Create a stable work directory such as outputs/subtitles-<input-name>/. Reuse
the same directory on reruns when the user is retrying the same input.
For local video or any media that needs normalization, extract mono 16 kHz WAV:
ffmpeg -y -i "$INPUT_MEDIA" -vn -ac 1 -ar 16000 -c:a pcm_s16le "$WORK_DIR/audio.wav"
For a local audio file that Fusion API can read directly, uploading the
original file is acceptable. If the upload or ASR rejects the format, convert it
with the same FFmpeg command and upload audio.wav.
Upload the audio that will be transcribed:
oo file upload "$AUDIO_PATH" --json
Use the returned downloadUrl as fileURL. Do not pass local filesystem paths
to Fusion API connector actions.
4. Submit Fusion API ASR
Use this exact connector action:
oo connector run "fusion-api" \
--action "qwen_asr_filetrans_submit" \
--data @submit-asr.json \
--json
Payload skeleton:
{
"fileURL": "https://...",
"language": "en",
"enableITN": false,
"enableWords": true,
"channelID": [0]
}
Rules:
- fileURL is required and must be the uploaded audio downloadUrl or a public audio URL.
- Omit language when unknown or multilingual; otherwise use one supported code from the input list above.
- Set enableWords: true so the result can be converted into timed subtitles.
- Omit channelID unless the user specifically asks for one or more channels.
- The submit response returns sessionId. Save it in job.created.json.
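As a rough sketch of the payload rules above, the function below builds the JSON that would be written to submit-asr.json. The name buildSubmitPayload and the http(s) scheme check are assumptions made for illustration; they are not part of the connector or the bundled script.

```javascript
// Hypothetical helper that applies the payload rules listed above.
function buildSubmitPayload({ fileURL, language, channelID } = {}) {
  // Assumption: a usable fileURL starts with http:// or https://, which also
  // rejects local filesystem paths.
  if (typeof fileURL !== "string" || !/^https?:\/\//i.test(fileURL)) {
    throw new Error("fileURL must be an uploaded downloadUrl or a public audio URL");
  }
  const payload = { fileURL, enableITN: false, enableWords: true };
  // Omit language when unknown or multilingual.
  if (language) payload.language = language;
  // Omit channelID unless specific channels were requested.
  if (Array.isArray(channelID) && channelID.length > 0) payload.channelID = channelID;
  return payload;
}
```

Serializing the result with JSON.stringify gives a body suitable for the --data @submit-asr.json argument.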
5. Poll and Fetch Result
Poll state with:
oo connector run "fusion-api" \
--action "qwen_asr_filetrans_state" \
--data "{\"sessionID\":\"$SESSION_ID\"}" \
--json
Expected states:
- {"state":"processing","progress":...}: wait and poll again.
- {"state":"completed"}: fetch the result.
- {"state":"not_found","error":"..."}: stop and report the missing session.
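The three expected states map naturally onto a small decision function. The sketch below is illustrative only: nextPollAction is a hypothetical name, and treating any unlisted state as a reason to stop is an assumption.

```javascript
// Hypothetical helper: decide what to do with one state response.
function nextPollAction(response) {
  switch (response && response.state) {
    case "processing":
      return { action: "wait", progress: response.progress };
    case "completed":
      return { action: "fetch" };
    case "not_found":
      return { action: "stop", reason: response.error || "session not found" };
    default:
      // Assumption: unknown states are reported instead of polled forever.
      return { action: "stop", reason: `unexpected state: ${JSON.stringify(response)}` };
  }
}
```

A polling loop would call the state action, pass the parsed response to this function, and sleep between attempts while the action is wait.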
Fetch result with:
oo connector run "fusion-api" \
--action "qwen_asr_filetrans_result" \
--data "{\"sessionID\":\"$SESSION_ID\"}" \
--json
The completed result has state: "completed" and carries the transcript under
data. Save the full response as job.done.json and data as transcript.json. The
schema indicates that data includes taskID, transcriptionURL, usage, and
transcription details. Only this field shape is schema-confirmed; the exact
nested transcript content can vary by media.
6. Build Source Subtitles
Use the bundled script to convert the saved Fusion API result or
transcript.json into timed subtitle files:
node "$SKILL_DIR/scripts/subtitle-tools.mjs" fusion-to-subtitles \
--input "$WORK_DIR/transcript.json" \
--out-dir "$WORK_DIR" \
--formats srt
Pass --formats all when VTT was requested. The script contains the
Sublinea-style timed-word cue segmentation defaults and writes
transcript.txt, transcript.srt, transcript.word-timed.srt, and optionally
transcript.word-timed.vtt.
Convert Fusion API transcript data into an internal timed-word list:
- Iterate data.transcription.transcripts[].
- Use transcript.text for plain text when present.
- For each transcript.sentences[], use the sentence's beginTime, endTime, text, language, and words[].
- For each word, use beginTime, endTime, text, and optional punctuation.
- Preserve punctuation by appending it to the preceding word when present.
Normalize timestamps to seconds. Fusion API results may use millisecond-style
integer timestamps or second-style numeric timestamps; if values are larger
than normal media seconds, divide by 1000. Keep a copy of the raw
transcript.json.
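A minimal sketch of the flattening and timestamp normalization described above is shown below. It is not the bundled script's implementation. The function name toTimedWords, the output field names start and end, and the "more than ten times the media duration means milliseconds" threshold are all assumptions; the media duration would come from ffprobe.

```javascript
// Hypothetical helper: flatten transcript data into timed words in seconds.
function toTimedWords(data, mediaDurationSeconds) {
  const words = [];
  const transcripts = (data && data.transcription && data.transcription.transcripts) || [];
  for (const transcript of transcripts) {
    for (const sentence of transcript.sentences || []) {
      for (const word of sentence.words || []) {
        words.push({
          // Punctuation is appended to the word it follows.
          text: `${word.text || ""}${word.punctuation || ""}`,
          start: Number(word.beginTime),
          end: Number(word.endTime),
        });
      }
    }
  }
  // Assumed heuristic: timestamps far beyond the media duration are
  // milliseconds. With no known duration the values are left unchanged.
  const maxEnd = words.reduce((max, w) => Math.max(max, w.end), 0);
  const scale = maxEnd > mediaDurationSeconds * 10 ? 1000 : 1;
  return words.map((w) => ({ ...w, start: w.start / scale, end: w.end / scale }));
}
```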
Segment timed words into subtitle cues with these defaults, adapted from the Sublinea project:
- maximum cue duration: 4.2 seconds
- target cue duration: 2.8 seconds
- maximum cue characters: 54
- maximum words per cue: 12
- split at pauses of at least 0.55 seconds
- cue start padding: 0.08 seconds
- cue end padding: 0.16 seconds
- minimum gap between cues: 0.05 seconds
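One way these defaults could drive segmentation is sketched below. This is an assumed greedy algorithm, not the Sublinea or bundled-script logic: the names CUE_DEFAULTS and segmentWords are hypothetical, words are joined with spaces (which suits space-delimited languages only), and splitting at sentence punctuation once the target duration is reached is an interpretation of "target cue duration".

```javascript
const CUE_DEFAULTS = {
  maxDuration: 4.2, targetDuration: 2.8, maxChars: 54, maxWords: 12,
  pauseSplit: 0.55, startPadding: 0.08, endPadding: 0.16, minGap: 0.05,
};

// Hypothetical greedy segmentation of [{ text, start, end }] words (seconds).
function segmentWords(words, options = {}) {
  const o = { ...CUE_DEFAULTS, ...options };
  const groups = [];
  let group = [];
  for (const word of words) {
    if (group.length > 0) {
      const first = group[0];
      const last = group[group.length - 1];
      const text = group.map((w) => w.text).join(" ");
      const shouldSplit =
        word.start - last.end >= o.pauseSplit ||
        group.length >= o.maxWords ||
        text.length + 1 + word.text.length > o.maxChars ||
        word.end - first.start > o.maxDuration ||
        (last.end - first.start >= o.targetDuration && /[.!?。！？]$/.test(last.text));
      if (shouldSplit) {
        groups.push(group);
        group = [];
      }
    }
    group.push(word);
  }
  if (group.length > 0) groups.push(group);

  // Apply padding, then pull cue ends back so neighbours keep the minimum gap.
  const cues = groups.map((g) => ({
    start: Math.max(0, g[0].start - o.startPadding),
    end: g[g.length - 1].end + o.endPadding,
    text: g.map((w) => w.text).join(" "),
  }));
  for (let i = 0; i < cues.length - 1; i += 1) {
    const latestEnd = cues[i + 1].start - o.minGap;
    if (cues[i].end > latestEnd) cues[i].end = Math.max(cues[i].start, latestEnd);
  }
  return cues;
}
```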
Write:
- transcript.txt
- transcript.srt
- transcript.word-timed.srt
- transcript.word-timed.vtt, when VTT was requested
Use SRT as the stable exchange format for translation and soft subtitle muxing. For burned-in subtitles, convert the final SRT to ASS first so font size, outline, alignment, and bottom margin are interpreted in an explicit script resolution instead of relying on FFmpeg’s SRT-to-ASS defaults.
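For reference, serializing cues to SRT is mostly a matter of timestamp formatting. The sketch below assumes cues shaped as { start, end, text } with times in seconds; the names formatSrtTime and toSrt are hypothetical and not taken from the bundled script.

```javascript
// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function formatSrtTime(seconds) {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n, width) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Number cues from 1 and separate them with a blank line.
function toSrt(cues) {
  return cues
    .map((cue, i) => `${i + 1}\n${formatSrtTime(cue.start)} --> ${formatSrtTime(cue.end)}\n${cue.text}\n`)
    .join("\n");
}
```

VTT output differs mainly in its WEBVTT header line and in using a period instead of a comma before the milliseconds.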
7. Translate Subtitles With OO LLM Config
When translation is requested, use the target language named by the user. If the user clearly requested translation but omitted the target language, infer the target from the natural language of the user’s request. Then run:
oo llm config --json
Use the returned apiKey, baseUrl, and model for an OpenAI-compatible chat
completions request to ${baseUrl without trailing slash}/chat/completions.
Do not hardcode, persist, log, or print the API key.
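The endpoint rule above can be sketched as follows. The helper names are hypothetical, and the Bearer authorization header is the usual convention for OpenAI-compatible endpoints rather than something the oo CLI output confirms. The key is read from the config object only at call time and is never logged or stored.

```javascript
// Strip trailing slashes from baseUrl before appending the path.
function chatCompletionsUrl(baseUrl) {
  return `${String(baseUrl).replace(/\/+$/, "")}/chat/completions`;
}

// Hypothetical helper: build fetch arguments from the parsed
// `oo llm config --json` output ({ apiKey, baseUrl, model }).
function buildChatRequest(config, messages) {
  return {
    url: chatCompletionsUrl(config.baseUrl),
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Assumed convention for OpenAI-compatible servers.
        Authorization: `Bearer ${config.apiKey}`,
      },
      body: JSON.stringify({ model: config.model, temperature: 0.2, messages }),
    },
  };
}
```

With Node.js 18 or newer, the result can be passed directly to the built-in fetch as fetch(url, init).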
Then use the bundled script to translate SRT cue text while preserving cue indexes and timing. Example for a Chinese-language request:
node "$SKILL_DIR/scripts/subtitle-tools.mjs" translate-srt \
--input "$WORK_DIR/transcript.srt" \
--out-dir "$WORK_DIR" \
--source-language auto \
--target-language "Simplified Chinese" \
--target-code zh \
--profile youtube_explainer \
--formats srt
Adjust --target-language, --target-code, --source-language, --profile,
--domain, --audience, --style-notes, --glossary-json, and
--video-context-json from the user’s request and the inferred target language.
For example, use --target-language "Simplified Chinese" --target-code zh for
a Chinese-language request, --target-language "Japanese" --target-code ja for
a Japanese-language request, and --target-language "Spanish" --target-code es
for a Spanish-language request. If the request language is mixed, ambiguous, or
not a stable signal of the user’s preferred subtitle language, ask for the
target language before translating. Pass --formats all when VTT was requested.
The script calls oo llm config --json, sends OpenAI-compatible chat
completions requests, writes translation.<target-code>.json as a resumable
checkpoint after each batch, retries failed batches at smaller sizes, and writes
translation.<target-code>.srt plus optional VTT.
Translate only subtitle cue text. Preserve cue indexes and all timing fields.
Use batches of about 30 cues with a small context window before and after the
batch. Include any available translation style inputs in the user payload:
translation_profile, domain, audience, style_notes, glossary, and
video_context. Keep the prompt stable across profiles; let the profile and
metadata drive style adaptation. Require the model to return JSON:
{
"items": [
{ "index": 1, "text": "translated subtitle text" }
]
}
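The batching described above might look like the sketch below. The function name makeBatches is hypothetical, and a context window of two cues on each side is an assumption standing in for "small".

```javascript
// Hypothetical helper: split cues ({ index, text }) into batches of about
// 30, each with a few neighbouring cues as read-only context.
function makeBatches(cues, batchSize = 30, contextSize = 2) {
  const batches = [];
  for (let start = 0; start < cues.length; start += batchSize) {
    const end = Math.min(start + batchSize, cues.length);
    batches.push({
      context_before: cues.slice(Math.max(0, start - contextSize), start),
      subtitles: cues.slice(start, end),
      context_after: cues.slice(end, end + contextSize),
    });
  }
  return batches;
}
```

Each batch object can be merged with the style inputs and serialized as the user message; only the cues in subtitles are expected back in items.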
Recommended request body:
{
"model": "<oo llm model>",
"temperature": 0.2,
"messages": [
{
"role": "system",
"content": "You are a professional subtitle translator.\n\nReturn only valid JSON with this exact shape: {\"items\":[{\"index\":1,\"text\":\"translated subtitle text\"}]}.\n\nHard requirements:\n- Translate only subtitle cue text.\n- Preserve every requested cue index exactly.\n- Do not add, remove, merge, split, or reorder subtitle cues.\n- Preserve meaning, speaker intent, tone, names, numbers, dates, brands, code terms, and important repeated phrases.\n- Keep each subtitle concise, natural, and readable on screen.\n- Use context_before and context_after only to resolve meaning, references, pronouns, tone, and continuity.\n- Follow glossary entries when provided. Keep source terms unchanged when the glossary says so or when a product, API, command, code symbol, or proper noun should remain in the source language.\n- Do not add explanations, notes, markdown, or extra JSON fields.\n\nStyle adaptation:\n- If translation_profile is \"film_tv\", translate into natural spoken dialogue. Preserve character emotion, humor, subtext, register, and relationship dynamics. Avoid stiff literal phrasing. Adapt slang, insults, jokes, and profanity to an equivalent natural intensity in the target language.\n- If translation_profile is \"youtube_explainer\", translate clearly and naturally for online video viewers. Keep domain terminology accurate while avoiding overly academic wording. Preserve tool names, product names, acronyms, and technical concepts unless a standard target-language translation exists.\n- If translation_profile is \"technical_course\", prioritize precision, terminology consistency, and instructional clarity. Use standard technical terms. Avoid embellishment or casual paraphrase that may reduce accuracy.\n- If translation_profile is \"interview_podcast\", preserve the speaker's tone and conversational rhythm. Lightly clean filler words only when they hurt subtitle readability, without changing the speaker's position.\n- If translation_profile is \"news_documentary\", use a neutral, accurate, polished style. Avoid slang unless it is essential to the source.\n- If translation_profile is \"business_training\", use concise professional language with consistent business terminology.\n- If translation_profile is \"gaming_stream\", use energetic, natural spoken language. Preserve game-specific terms, memes, reactions, and player intent.\n- If translation_profile is \"kids_content\", use simple, friendly, age-appropriate wording.\n- Otherwise use a natural general subtitle style."
},
{
"role": "user",
"content": "{\"source_language\":\"auto\",\"target_language\":\"Simplified Chinese\",\"translation_profile\":\"youtube_explainer\",\"domain\":\"AI developer tools\",\"audience\":\"general technical viewers\",\"style_notes\":\"Natural, accurate subtitles; keep product names and standard technical terms consistent.\",\"video_context\":{\"title\":\"Building an AI agent with tool calling\",\"description\":\"A YouTube tutorial for developers\",\"speaker_notes\":\"One speaker explaining a workflow casually.\"},\"glossary\":[{\"source\":\"agent\",\"target\":\"智能体\"},{\"source\":\"tool calling\",\"target\":\"工具调用\"}],\"context_before\":[],\"subtitles\":[{\"index\":1,\"text\":\"Today we're going to build a simple agent with tool calling.\"}],\"context_after\":[]}"
}
]
}
For film, TV, and other scripted dialogue, prefer a payload like:
{
"source_language": "auto",
"target_language": "Simplified Chinese",
"translation_profile": "film_tv",
"domain": "scripted dialogue",
"audience": "adult streaming viewers",
"style_notes": "Natural spoken Chinese subtitles; avoid translationese.",
"video_context": {
"title": "Episode or scene title when known",
"description": "Brief setting, relationship, or plot context when known"
},
"glossary": [],
"context_before": [
{ "index": 28, "text": "What the hell are you doing here?" }
],
"subtitles": [
{ "index": 29, "text": "I told you, I had nowhere else to go." }
],
"context_after": [
{ "index": 30, "text": "You shouldn't have come back." }
]
}
Validate that every requested cue index has a non-empty translated text. On partial or invalid JSON, retry with a smaller batch; for a single-cue failure, report the model error. Write:
- translation.<target-code>.json as a resumable checkpoint
- translation.<target-code>.srt
- translation.<target-code>.vtt, when VTT was requested
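The validation step can be sketched as a pure function over the model's raw reply. The name validateTranslation and the shape of its return value are assumptions for illustration, not the bundled script's interface.

```javascript
// Hypothetical helper: check that every requested cue index came back with
// non-empty text. Returns the accepted items and the indexes still missing.
function validateTranslation(requestedIndexes, responseText) {
  let parsed;
  try {
    parsed = JSON.parse(responseText);
  } catch (error) {
    return { ok: false, reason: "invalid_json", missing: [...requestedIndexes], items: [] };
  }
  const rawItems = Array.isArray(parsed && parsed.items) ? parsed.items : [];
  const byIndex = new Map();
  for (const item of rawItems) {
    if (item && typeof item.text === "string" && item.text.trim() !== "") {
      byIndex.set(item.index, item.text);
    }
  }
  const missing = requestedIndexes.filter((index) => !byIndex.has(index));
  // Extra indexes the model invented are dropped here.
  const items = requestedIndexes
    .filter((index) => byIndex.has(index))
    .map((index) => ({ index, text: byIndex.get(index) }));
  return { ok: missing.length === 0, reason: missing.length === 0 ? null : "missing_cues", missing, items };
}
```

When ok is false, the caller would retry the missing indexes in a smaller batch and report them if a single-cue request still fails.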
8. Prepare Display Subtitles
Before creating external subtitles, soft subtitles, or burned-in subtitles, normalize the final translated SRT into a display SRT. This keeps all delivery forms consistent: sidecar SRT, styled ASS, soft MKV ASS, and burned-in MP4 should use the same cue text and line breaks.
For Simplified Chinese and other CJK subtitles, do not use the English-style
37 characters per line as the visual line length. Use a CJK-aware line limit:
- default CJK line length: 18 characters per line
- strict Simplified Chinese delivery: 16 characters per line
- maximum lines per cue: 2
- prefer one-line subtitles when the cue fits
- when two lines are needed, prefer a bottom-heavy shape and avoid leaving only one or two characters on the top line
- split overlong cues into multiple sequential cues before generating ASS, rather than forcing a second line beyond the line limit
Run:
node "$SKILL_DIR/scripts/subtitle-tools.mjs" prepare-display-srt \
--input "$WORK_DIR/translation.$TARGET_CODE.srt" \
--output "$WORK_DIR/translation.$TARGET_CODE.display.srt" \
--cjk-line-length 18 \
--max-lines 2
Use --cjk-line-length 16 when the user asks for stricter professional or
Netflix-style Simplified Chinese line limits. Use the display SRT as
$SUBTITLE_SRT for the rest of the workflow.
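To make the CJK line rules concrete, here is a simplified sketch of the wrapping decision for one cue. It is not what prepare-display-srt does internally: wrapCjkLines is a hypothetical name, every code point counts as one character, and the break is placed at the midpoint without looking for punctuation or word boundaries.

```javascript
// Hypothetical helper: returns one or two display lines, or null when the
// cue is too long and should be split into sequential cues instead.
function wrapCjkLines(text, lineLength = 18) {
  const chars = Array.from(text); // iterate by code point, not UTF-16 unit
  if (chars.length <= lineLength) return [text];
  if (chars.length > lineLength * 2) return null;
  // Bottom-heavy: the top line takes the smaller half, so the bottom line is
  // at least as long and never exceeds the line limit.
  const top = Math.floor(chars.length / 2);
  return [chars.slice(0, top).join(""), chars.slice(top).join("")];
}
```

A null result signals that the cue must be divided in time as well as in text, which needs timing information this sketch does not handle.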
9. Add Subtitles to Video
For audio-only inputs, deliver sidecar subtitle files. For video inputs, choose the user’s requested mode or default to burned-in MP4.
Default mode rules:
- If the user does not specify an output mode, create a burned-in MP4.
- If the user asks for “subtitled video”, “video with subtitles”, “translated subtitles on this video”, “add subtitles to this video”, or similar generic wording, create a burned-in MP4 unless the user asks for soft/selectable subtitle tracks.
- If the user asks for styled subtitles, visual parity, social sharing, upload compatibility, or subtitles that always display, prefer burned-in MP4.
- If the user asks for “烧录”, “硬字幕”, “hard subtitles”, “burned-in”, “permanent subtitles”, “export MP4”, or “MP4 with subtitles”, create a burned-in MP4.
- If the user asks for “soft subtitles”, “外挂字幕”, “可开关字幕”, “subtitle track”, “selectable subtitles”, “no re-encode”, “MKV”, or multiple subtitle languages in one file, create a soft MKV.
- When the user wants sidecar subtitles and burned-in subtitles from the same job, generate both from the display SRT so cue boundaries and line breaks match.
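The default mode rules can be approximated with a keyword check, as in the sketch below. This is a deliberately naive illustration: chooseDeliveryMode and the hint list are assumptions, substring matching will misfire on things like an input filename ending in .mkv, and a real agent should read the request in context instead.

```javascript
// Phrases from the rules above that point to soft, selectable subtitles.
const SOFT_SUBTITLE_HINTS = [
  "soft subtitle", "外挂字幕", "可开关字幕", "subtitle track",
  "selectable subtitle", "no re-encode", "mkv",
];

// Hypothetical helper: audio gets sidecar files, video defaults to
// burned-in MP4 unless the request asks for soft subtitles.
function chooseDeliveryMode({ isVideo, request = "" }) {
  if (!isVideo) return "sidecar";
  const text = request.toLowerCase();
  return SOFT_SUBTITLE_HINTS.some((hint) => text.includes(hint)) ? "soft-mkv" : "burned-mp4";
}
```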
Sidecar files:
- Return translation.<target-code>.display.srt as the compatibility-first external subtitle file.
- Also return translation.<target-code>.display.ass when styled external subtitles are useful or when the user wants visual parity with burned-in output.
Create the styled ASS sidecar from the display SRT:
node "$SKILL_DIR/scripts/subtitle-tools.mjs" srt-to-burn-ass \
--input "$SUBTITLE_SRT" \
--output "$WORK_DIR/translation.$TARGET_CODE.display.ass" \
--video-width "$VIDEO_WIDTH" \
--video-height "$VIDEO_HEIGHT" \
--font-name "PingFang SC" \
--font-size 56 \
--margin-v 38 \
--margin-l 80 \
--margin-r 80 \
--outline 5
Soft MKV, compatibility-first:
ffmpeg -y -i "$INPUT_VIDEO" -i "$SUBTITLE_SRT" \
-map 0:v? -map 0:a? -map 1:0 \
-c copy -c:s srt \
-disposition:s:0 default \
-metadata:s:s:0 language="$LANG_CODE" \
-metadata:s:s:0 title="Translated subtitles" \
"$OUTPUT_VIDEO.mkv"
Soft MKV, style-consistent with burn-in:
ffmpeg -y -i "$INPUT_VIDEO" -i "$SUBTITLE_ASS" \
-map 0:v? -map 0:a? -map 1:0 \
-c copy -c:s ass \
-disposition:s:0 default \
-metadata:s:s:0 language="$LANG_CODE" \
-metadata:s:s:0 title="Styled translated subtitles" \
"$OUTPUT_VIDEO.styled.mkv"
Soft MP4:
ffmpeg -y -i "$INPUT_VIDEO" -i "$SUBTITLE_SRT" \
-map 0:v? -map 0:a? -map 1:0 \
-c copy -c:s mov_text \
-disposition:s:0 default \
-metadata:s:s:0 language="$LANG_CODE" \
-metadata:s:s:0 title="Translated subtitles" \
"$OUTPUT_VIDEO.mp4"
Burned-in MP4:
Before burning, convert the display SRT to ASS with explicit video resolution and bottom-centered styling:
node "$SKILL_DIR/scripts/subtitle-tools.mjs" srt-to-burn-ass \
--input "$SUBTITLE_SRT" \
--output "$WORK_DIR/subtitles.burn.ass" \
--video-width "$VIDEO_WIDTH" \
--video-height "$VIDEO_HEIGHT" \
--font-name "PingFang SC" \
--font-size 56 \
--margin-v 38 \
--margin-l 80 \
--margin-r 80 \
--outline 5
Use ffprobe to set VIDEO_WIDTH and VIDEO_HEIGHT from the actual input
video. For 1920x1080 videos, the default burn-in style is bottom-centered
Chinese subtitles with PlayResX: 1920, PlayResY: 1080, Alignment=2, and
MarginV=38. Keep this as the fixed default unless the user explicitly asks
for a different position. Do not reposition subtitles by checking individual
screenshots frame by frame.
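One way to obtain VIDEO_WIDTH and VIDEO_HEIGHT is to ask ffprobe for JSON and parse it, as sketched below. The helper name parseVideoSize is hypothetical, and the ffprobe invocation in the comment is one common form rather than the command the skill prescribes.

```javascript
// Parses the JSON printed by, for example:
//   ffprobe -v error -select_streams v:0 \
//     -show_entries stream=width,height -of json "$INPUT_VIDEO"
// Hypothetical helper: returns { width, height } of the first video stream.
function parseVideoSize(ffprobeJsonText) {
  const parsed = JSON.parse(ffprobeJsonText);
  const stream = (parsed.streams || []).find(
    (s) => Number.isInteger(s.width) && Number.isInteger(s.height),
  );
  if (!stream) throw new Error("no video stream with width and height found");
  return { width: stream.width, height: stream.height };
}
```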
Then burn the ASS file:
ffmpeg -y -i "$INPUT_VIDEO" \
-vf "ass=$WORK_DIR/subtitles.burn.ass" \
-c:v libx264 -crf 18 -preset medium -c:a copy -sn \
"$OUTPUT_VIDEO.burned.mp4"
Use burned-in MP4 as the normal video delivery default. Prefer soft subtitles only when the user values editability, selectable tracks, multiple languages, smaller processing cost, or avoiding video re-encoding over universal playback.
Soft MP4 subtitles (mov_text) cannot preserve the same typography, outline,
or exact positioning as ASS/burn-in. Treat soft MP4 as a niche compatibility
mode, not the default. If visual consistency matters, prefer burned-in MP4; use
ASS sidecar or soft MKV with ASS only when the user asks for editable or
selectable subtitles.
Burn-in positioning rules:
- Prefer ASS over direct subtitles=$SUBTITLE_SRT for burned-in output.
- Always include PlayResX and PlayResY matching the input video. ASS margins, font size, and coordinates are script-resolution pixels; mismatched or implicit resolution can make MarginV appear much higher than intended.
- For normal horizontal subtitles, use bottom center (Alignment=2) and a fixed bottom margin. On 1080p output, use MarginV=38; for other heights, default to about 3.5% of video height, with a floor near 28 pixels.
- Treat the whole video as one canvas by default. Only avoid lower-third graphics or on-screen text when the user specifically requests manual per-scene placement.
- If the user says the subtitles are too high or too low, adjust the ASS MarginV only. Smaller MarginV moves subtitles closer to the bottom; larger MarginV moves them upward.
- Use white text with a black outline by default for readability. For Chinese on 1080p, Fontsize=56, Outline=5, Shadow=0, MarginL=80, and MarginR=80 are the fixed defaults unless the user asks otherwise.
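The bottom-margin rule above reduces to a one-line calculation. In the sketch below, defaultMarginV and burnLayout are hypothetical names, and rounding to the nearest pixel is an assumption chosen so that 1080p gives the documented value of 38.

```javascript
// About 3.5% of the video height, with a floor of 28 pixels.
function defaultMarginV(videoHeight) {
  return Math.max(28, Math.round(videoHeight * 0.035));
}

// Hypothetical helper: layout values for the ASS header and style, with the
// script resolution set to match the input video.
function burnLayout(videoWidth, videoHeight) {
  return {
    PlayResX: videoWidth,
    PlayResY: videoHeight,
    Alignment: 2, // bottom center
    MarginV: defaultMarginV(videoHeight),
  };
}
```

Font size, outline, and side margins are left out because the text gives fixed values only for 1080p and does not say how they scale to other resolutions.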
Result Handling
Report the generated files with clear local paths. At minimum, return the source SRT path and, when requested, the translated SRT path. For video inputs, also return the subtitled video path when soft muxing or burning was requested.
If a generated video or subtitle file is practical to preview in the current environment, open or display it for the user; otherwise provide the exact path. Do not print raw transcript JSON unless the user asks for debugging details. Do not print the OO LLM API key.
Typical output names:
- job.created.json
- job.done.json
- transcript.json
- transcript.txt
- transcript.srt
- transcript.word-timed.srt
- translation.<target-code>.json
- translation.<target-code>.srt
- translation.<target-code>.display.srt
- translation.<target-code>.display.ass
- translation.<target-code>.burn.ass
- <name>.subtitled.mkv
- <name>.styled.mkv
- <name>.subtitled.mp4
- <name>.burned.mp4
Failure Handling
- Missing FFmpeg or FFprobe: stop, give install instructions, and ask the user to rerun after installation.
- Missing input media: stop and ask for the path or URL.
- oo file upload fails: report the upload error; for local files, verify the path exists and retry with a normalized audio file when format rejection is likely.
- Fusion API submit schema rejection: check fileURL, language code, enableWords, and channelID; do not rediscover actions.
- ASR timeout: report the saved sessionId and explain that the agent can resume by polling qwen_asr_filetrans_state and qwen_asr_filetrans_result.
- ASR result has no word timestamps: write transcript.txt if text exists, but explain that timed subtitles require enableWords: true or a transcript with sentence/word timing.
- oo llm config --json fails: report that OO-hosted LLM configuration is not available and ask the user to fix oo CLI authentication/configuration.
- LLM translation returns invalid JSON or missing cue indexes: retry smaller batches, then fail with the specific cue indexes that could not be translated.
- Display SRT still has CJK lines over the requested limit: rerun prepare-display-srt with a smaller --cjk-line-length, usually 16, and regenerate ASS from the display SRT.
- FFmpeg soft MP4 subtitle muxing fails: use burned-in MP4 as the default fallback. If burn-in is unavailable or the user explicitly needs selectable subtitle tracks, retry soft MKV.
- Burn-in encoding fails because libx264 is unavailable: ask the user to install a full FFmpeg build with libx264 support or choose soft subtitles.
- Burned-in subtitles look too high: verify that the burn-in input is ASS, not raw SRT, and that the ASS file has PlayResX and PlayResY matching the video. If those are correct, reduce MarginV instead of moving subtitles with per-frame screenshot adjustments.