-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathappendix-en.html
More file actions
180 lines (153 loc) · 12.3 KB
/
appendix-en.html
File metadata and controls
180 lines (153 loc) · 12.3 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Appendix - Boundless Flow</title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<div class="sidebar">
<a href="welcome-en.html" class="sidebar-logo">
<svg width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" style="color: var(--primary-color);"><path d="M12 2a3 3 0 0 0-3 3v7a3 3 0 0 0 6 0V5a3 3 0 0 0-3-3Z"></path><path d="M19 10v2a7 7 0 0 1-14 0v-2"></path><line x1="12" y1="19" x2="12" y2="22"></line></svg>
Boundless Flow
</a>
<div class="sidebar-group">Getting Started</div>
<ul>
<li><a href="welcome-en.html">What is Boundless Flow?</a></li>
<li><a href="onboarding-en.html">Onboarding Wizard</a></li>
</ul>
<div class="sidebar-group">Core Features</div>
<ul>
<li><a href="stt-en.html">Real-time STT & Models</a></li>
<li><a href="translation-en.html">Real-time Translation</a></li>
<li><a href="proofreading-summary-en.html">AI Proofreading & Summary</a></li>
<li><a href="tts-voice-cloning-en.html">TTS & Voice Cloning</a></li>
<li><a href="sts-en.html">STS Speech Workbench</a></li>
<li><a href="linglu-en.html">LingLu · Live Topic Tree</a></li>
</ul>
<div class="sidebar-group">Appendix</div>
<ul>
<li><a href="appendix-en.html" class="active">Beginner Guide</a></li>
</ul>
<div style="margin-top: auto; padding-top: 1rem; border-top: 1px solid var(--border-color);">
<a href="appendix.html" style="display: flex; align-items: center; gap: 0.5rem;">
<svg width="16" height="16" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><circle cx="12" cy="12" r="10"></circle><line x1="2" y1="12" x2="22" y2="12"></line><path d="M12 2a15.3 15.3 0 0 1 4 10 15.3 15.3 0 0 1-4 10 15.3 15.3 0 0 1-4-10 15.3 15.3 0 0 1 4-10z"></path></svg>
中文版本
</a>
</div>
</div>
<div class="main-content">
<div class="content-wrapper">
<h1>Appendix (Beginner-friendly)</h1>
<p>If this is your first time dealing with “model downloads / local LLMs / cloud APIs”, follow the steps below.</p>
<h2 id="appendix-modelscope">Appendix A: ModelScope (Model Downloads)</h2>
<p><strong>When you need it:</strong> Download local STT (SenseVoice) and local TTS (Qwen3-TTS / Index-TTS2) model files.</p>
<p><strong>Official docs:</strong> <a href="https://www.modelscope.cn/docs/" target="_blank" rel="noopener noreferrer">https://www.modelscope.cn/docs/</a></p>
<h3>1) Do I need to sign up?</h3>
<p>Most public models can be downloaded without login. If you see an access/permission error, you typically need to log in on ModelScope and request access as prompted.</p>
<h3>2) Check your OS / CPU architecture</h3>
<ul>
<li><strong>Windows</strong>: Settings → System → About → System type; or PowerShell <code>$env:PROCESSOR_ARCHITECTURE</code></li>
<li><strong>macOS</strong>: Apple menu → About This Mac → Chip; or Terminal <code>uname -m</code></li>
<li><strong>Linux</strong>: Terminal <code>uname -m</code></li>
</ul>
<h3>3) Install ModelScope CLI</h3>
<p>The download command comes from the ModelScope CLI (requires Python + pip):</p>
<pre><code>pip install modelscope
modelscope --help</code></pre>
<h3>4) Find models (optional)</h3>
<p>Use the ModelScope website search, open the model page, and copy the model ID.</p>
<h3>5) Download examples (aligned with this manual)</h3>
<pre><code>modelscope download --model iic/SenseVoiceSmall --local_dir ./SenseVoiceSmall
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-Base --local_dir ./Qwen/Qwen3-TTS-12Hz-1.7B-Base
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice --local_dir ./Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign --local_dir ./Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign
modelscope download --model IndexTeam/IndexTTS-2 --local_dir ./IndexTeam/IndexTTS-2</code></pre>
<h3>6) Where do I set the path in Boundless Flow?</h3>
<ul>
<li><strong>STT Model Directory</strong>: point to the SenseVoice download folder</li>
<li><strong>TTS Model Directory</strong>: point to the specific TTS model folder you want to use</li>
</ul>
<h2 id="appendix-ollama">Appendix B: Ollama (Local LLMs for Translation / Summary)</h2>
<p><strong>When you need it:</strong> Run translation/summary models locally or on your LAN. Boundless Flow calls Ollama's <strong>native</strong> <code>/api/chat</code> endpoint, so the Base URL only needs the <strong>server address</strong> — do not append <code>/v1</code> (typical Base URL: <code>http://localhost:11434</code>).</p>
<p><strong>Official docs:</strong> <a href="https://docs.ollama.com/" target="_blank" rel="noopener noreferrer">https://docs.ollama.com/</a></p>
<p><strong>API references:</strong> <a href="https://github.com/ollama/ollama/blob/main/docs/api.md" target="_blank" rel="noopener noreferrer">Ollama native API</a> · <a href="https://docs.ollama.com/api/openai-compatibility" target="_blank" rel="noopener noreferrer">OpenAI compatibility</a></p>
<h3>1) Install</h3>
<p>Follow the official “Download / Get started” instructions for your OS and CPU architecture.</p>
<h3>2) Verify</h3>
<pre><code>ollama --version</code></pre>
<h3>3) Pull models (aligned with this manual)</h3>
<pre><code>ollama pull ZimaBlueAI/HY-MT1.5-1.8:1.8b # or :7b
ollama pull qwen3:4b</code></pre>
<p><strong>Heads up:</strong> this translation model has no <code>latest</code> tag in Ollama, so you <strong>must</strong> include the <code>:tag</code> suffix (e.g. <code>:1.8b</code> or <code>:7b</code>). Without it Ollama returns <code>model 'ZimaBlueAI/HY-MT1.5-1.8' not found</code>.</p>
<h3>4) Suggested settings in Boundless Flow</h3>
<ul>
<li><strong>Translation</strong>: Base URL <code>http://localhost:11434</code> (local) or <code>http://<LAN-IP>:11434</code> (LAN); Model <code>ZimaBlueAI/HY-MT1.5-1.8:1.8b</code></li>
<li><strong>Proofreading/Summary</strong>: Base URL <code>http://localhost:11434</code>; Model <code>qwen3:4b</code></li>
<li><strong>STS workbench → Translation API Base URL</strong>: same as above, <code>http://localhost:11434</code></li>
</ul>
<p><strong>Compatibility note:</strong> the recommended form is without <code>/v1</code> (which routes to Ollama's native <code>/api/chat</code>). If you paste a Base URL that ends in <code>/v1</code>, Boundless Flow will recognize the Ollama port and strip the suffix automatically — so all three places (realtime STT translation, summary, STS translation) behave identically.</p>
<h2 id="appendix-volcengine">Appendix C: Volcengine Cloud TTS</h2>
<p><strong>When you need it:</strong> Use Volcengine cloud voices / higher quality TTS / voice replication features.</p>
<p><strong>Useful links:</strong></p>
<ul>
<li><a href="https://www.volcengine.com/product/tts" target="_blank" rel="noopener noreferrer">Volcengine TTS product page</a></li>
<li><a href="https://www.volcengine.com/docs/6561" target="_blank" rel="noopener noreferrer">Speech docs hub</a></li>
<li><a href="https://www.volcengine.com/docs/6561/1257544" target="_blank" rel="noopener noreferrer">VoiceType list</a></li>
</ul>
<h3>1) Do I need to sign up?</h3>
<p>Yes. You need a Volcengine account and enabled speech services in the console.</p>
<h3>2) What info do I need?</h3>
<ul>
<li><strong>AppId</strong></li>
<li><strong>Token</strong></li>
<li><strong>Cluster</strong> (e.g. <code>volcano_tts</code> / <code>volcengine_tts</code>, depends on your console)</li>
<li><strong>VoiceType</strong> (pick from the VoiceType list)</li>
</ul>
<h3>3) Minimal setup in Boundless Flow</h3>
<ol>
<li>Settings → TTS Model: select <strong>Volcengine TTS</strong></li>
<li>Choose <strong>HTTP</strong> mode</li>
<li>Fill AppId / Token / Cluster / VoiceType</li>
<li>Synthesize a short sentence to verify audio output</li>
</ol>
<h2 id="appendix-speaker-diarization">Appendix D: sherpa-onnx Speaker Diarization</h2>
<p><strong>When you need it:</strong> Separate <code>Speaker_1 / Speaker_2 / Speaker_3</code> in real-time STT. In addition to the main STT model (SenseVoice / FunASR), you need two extra ONNX files driven by the <code>sherpa-onnx</code> runtime.</p>
<p><strong>Official repo:</strong> <a href="https://github.com/k2-fsa/sherpa-onnx.git" target="_blank" rel="noopener noreferrer">https://github.com/k2-fsa/sherpa-onnx</a></p>
<p><strong>Official model guide:</strong> <a href="https://k2-fsa.github.io/sherpa/onnx/speaker-diarization/models.html" target="_blank" rel="noopener noreferrer">sherpa-onnx speaker diarization models</a></p>
<h3>1) Two files you need</h3>
<ul>
<li><code>segmentation.onnx</code> — speaker segmentation model (recommended: <code>sherpa-onnx-pyannote-segmentation-3-0</code>)</li>
<li><code>embedding.onnx</code> — speaker embedding model (recommended: <code>3dspeaker_speech_eres2net_base_sv_zh-cn_3dspeaker_16k.onnx</code>)</li>
</ul>
<h3>2) Where to download</h3>
<ul>
<li>Segmentation models: <a href="https://github.com/k2-fsa/sherpa-onnx/releases/tag/speaker-segmentation-models" target="_blank" rel="noopener noreferrer">speaker-segmentation-models</a></li>
<li>Embedding models: <a href="https://github.com/k2-fsa/sherpa-onnx/releases/tag/speaker-recongition-models" target="_blank" rel="noopener noreferrer">speaker-recongition-models</a></li>
</ul>
<p>Download the <code>.onnx</code> files directly from the GitHub Releases above — no need to build sherpa-onnx from source. If the asset is an archive, extract <code>segmentation.onnx</code> and <code>embedding.onnx</code> first and keep the filenames as-is.</p>
<h3>3) Recommended directory layout</h3>
<pre><code>./speaker-diarization/
segmentation.onnx
embedding.onnx</code></pre>
<h3>4) Where do I set the paths in Boundless Flow?</h3>
<ul>
<li><strong>Speaker segmentation model</strong>: point to <code>segmentation.onnx</code></li>
<li><strong>Speaker embedding model</strong>: point to <code>embedding.onnx</code></li>
</ul>
<p>You can also point both fields to the same directory — the app will auto-detect <code>segmentation.onnx</code> and <code>embedding.onnx</code> inside it.</p>
<h3>5) Notes</h3>
<ul>
<li>The SenseVoice / FunASR "Model Directory" only handles speech-to-text; it <strong>cannot</strong> replace the speaker diarization models.</li>
<li>Speaker diarization in this project supports <code>ONNX</code> only. A <code>Fun-ASR-Nano-2512</code> folder cannot substitute for <code>segmentation.onnx / embedding.onnx</code>.</li>
<li>If diarization still fails, the UI now surfaces a readable error. The most common cause is an <code>onnx</code> file whose schema doesn't match the bundled <code>sherpa-onnx</code> runtime — start with the two recommended models above.</li>
</ul>
<div class="doc-copyright">
<p>Copyright(c) ZimaBlueAI</p>
<p>齐码蓝智能(大理市 )有限责任公司</p>
</div>
</div>
</div>
</body>
</html>