179 changes: 179 additions & 0 deletions OPTIMIZATION_REPORT.md
@@ -0,0 +1,179 @@
# Bailing AI Chatbot Performance Optimization Report

## Executive Summary

This report documents performance bottlenecks and optimization opportunities identified in the Bailing AI chatbot codebase. The analysis identifies seven issues (one high-severity, three medium, and three low) that affect real-time audio processing, memory usage, and overall system responsiveness.

## Critical Issues Identified

### 1. **CRITICAL: Blocking Queue Operations in Main Processing Loop**
**File:** `bailing/robot.py`
**Lines:** 178-179
**Severity:** High 🔴

**Issue:** The main audio processing loop calls `vad_queue.get()` without a timeout, so the calling thread blocks until an item arrives:
```python
def _duplex(self):
    data = self.vad_queue.get()  # Blocks indefinitely if no data is available
```

**Impact:**
- The processing loop stalls whenever the VAD queue is empty
- Breaks real-time audio processing requirements
- Can cause the entire voice assistant to become unresponsive

**Solution:** Replace with timeout-based queue operations to maintain responsiveness.
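
A minimal sketch of the timeout-based pattern, assuming `vad_queue` is a standard `queue.Queue`. The names `poll_once`, `run_loop`, and `stop_event` are hypothetical and do not come from the codebase.

```python
import queue
import threading


def poll_once(q, timeout=0.1):
    """Return the next item, or None if nothing arrives within `timeout` seconds."""
    try:
        return q.get(timeout=timeout)
    except queue.Empty:
        return None


def run_loop(q, stop_event, handle):
    """Consume items until `stop_event` is set.

    Each iteration waits at most one timeout, so a stop request is noticed
    even when the queue stays empty.
    """
    while not stop_event.is_set():
        item = poll_once(q)
        if item is not None:
            handle(item)
```

Using `None` as the "no data" marker only works if `None` is never a valid queue item. The timeout value trades shutdown latency against idle wake-ups.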

### 2. **Memory Inefficiency: Redundant Audio File Creation**
**File:** `bailing/asr.py`
**Lines:** 90-95
**Severity:** Medium 🟡

**Issue:** The ASR module creates a temporary WAV file for every recognition request:
```python
def recognizer(self, stream_in_audio):
    session_id = str(uuid.uuid4())
    tmpfile = os.path.join(self.output_dir, f"asr-{datetime.now().date()}@{session_id}.wav")
    self._save_audio_to_file(stream_in_audio, tmpfile)
```

**Impact:**
- Disk I/O overhead for every speech recognition
- Temporary files accumulate over time
- Unnecessary file system operations in real-time processing

**Recommendation:** Process audio data in memory when possible, and save files only for debugging.
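
One way to avoid the temporary file is to build the WAV container in memory, as sketched here. The function `pcm_to_wav_bytes` is hypothetical, and the 16 kHz, mono, 16-bit defaults are assumptions. Whether the ASR backend accepts bytes or a file-like object rather than a path also needs to be checked.

```python
import io
import wave


def pcm_to_wav_bytes(pcm_frames, sample_rate=16000, channels=1, sample_width=2):
    """Wrap raw PCM chunks in a WAV container held entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"".join(pcm_frames))
    return buf.getvalue()
```

The same bytes can still be written to disk behind a debug flag, which keeps the file available for troubleshooting without paying for it on every request.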

### 3. **TTS Module: Multiple Event Loop Creation**
**File:** `bailing/tts.py`
**Lines:** 128-131, 304-307
**Severity:** Medium 🟡

**Issue:** The TTS modules create a new asyncio event loop for each request:
```python
def to_tts(self, text):
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    result = loop.run_until_complete(self._stream_tts(text, tmpfile))
    loop.close()
```

**Impact:**
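- Event loop creation and teardown overhead on every request
- Potential resource leaks: in the excerpt above, `loop.close()` is skipped if `run_until_complete` raises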
- Inefficient async handling

**Recommendation:** Use a single event loop or proper async context management.
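
A single shared loop could look like the following sketch, which runs one event loop in a background thread and submits coroutines to it from synchronous code. `BackgroundLoop` is a hypothetical helper, not part of the codebase.

```python
import asyncio
import threading


class BackgroundLoop:
    """Run one asyncio event loop in a daemon thread and reuse it for every call."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro, timeout=None):
        """Submit a coroutine from synchronous code and wait for its result."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout)

    def close(self):
        """Stop the loop, wait for its thread to exit, then release the loop."""
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()
```

With such a helper, a synchronous method like `to_tts` would call `run(...)` instead of building and closing its own loop, and `close()` would be called once at shutdown.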

### 4. **Synchronous HTTP Requests in Real-time Context**
**File:** `bailing/memory.py`
**Lines:** 60-67
**Severity:** Medium 🟡

**Issue:** Memory updates use synchronous HTTP requests:
```python
response = requests.post(
    f"{self.endpoint}/chat-messages",
    headers=self.headers,
    json={...}
)
```

**Impact:**
- Blocks execution during API calls
- Can cause delays in real-time processing
- No `timeout` argument is passed, so a stalled connection can block indefinitely

**Recommendation:** Use async HTTP clients with proper timeout handling.
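
The smallest change is to pass a `timeout` to `requests.post`, which accepts that argument. The sketch below goes one step further and swaps in a different technique from the async client named above: a standard-library request with an explicit timeout, scheduled on a worker thread so the caller is not blocked. `post_json`, `submit_post`, and the injectable `opener` parameter are hypothetical names used for illustration.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def post_json(url, payload, headers=None, timeout=5.0, opener=urllib.request.urlopen):
    """POST `payload` as JSON and return the decoded JSON response.

    `timeout` bounds how long the call may wait on the network; `opener`
    exists only so the function can be exercised without a real server.
    """
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", **(headers or {})},
        method="POST",
    )
    with opener(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def submit_post(executor, url, payload, **kwargs):
    """Schedule the request on a worker thread and return a Future."""
    return executor.submit(post_json, url, payload, **kwargs)
```

The returned `Future` lets the caller decide whether to wait for the result, attach a callback, or ignore it for fire-and-forget memory updates.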

### 5. **Inefficient Thread Pool Usage**
**File:** `bailing/robot.py`
**Lines:** 87-88
**Severity:** Low 🟢

**Issue:** The thread pool size is hard-coded and may not suit the actual workload:
```python
self.executor = ThreadPoolExecutor(max_workers=10)
```

**Impact:**
- May create too many threads for simple tasks
- Resource usage not optimized for actual workload

**Recommendation:** Make thread pool size configurable based on system resources.
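
A configurable pool size might be read as in this sketch. The `max_workers` config key and the `build_executor` helper are assumptions. The fallback is a CPU-based heuristic similar to the one `ThreadPoolExecutor` applies when `max_workers` is omitted.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def build_executor(config):
    """Create a pool sized from config, falling back to a CPU-based default."""
    default_workers = min(32, (os.cpu_count() or 1) + 4)
    workers = int(config.get("max_workers", default_workers))
    if workers < 1:
        raise ValueError("max_workers must be at least 1")
    return ThreadPoolExecutor(max_workers=workers)
```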

### 6. **VAD Processing: Redundant Audio Conversion**
**File:** `bailing/vad.py`
**Lines:** 86-93
**Severity:** Low 🟢

**Issue:** Audio data is converted multiple times:
```python
if self.channels > 1:
    audio_int16 = self._process_multichannel(data)
else:
    audio_int16 = np.frombuffer(data, dtype=np.int16)
audio_float32 = self.int2float(audio_int16)
```

**Impact:**
- Unnecessary CPU cycles for audio conversion
- Memory allocations for intermediate arrays

**Recommendation:** Optimize audio pipeline to minimize conversions.

### 7. **Dialogue History: Inefficient File I/O**
**File:** `bailing/dialogue.py`
**Lines:** 44-51
**Severity:** Low 🟢

**Issue:** Dialogue history is written to disk after every conversation:
```python
def dump_dialogue(self):
    # Processes entire dialogue list every time
    dialogue = []
    for d in self.get_llm_dialogue():
        if d["role"] not in ("user", "assistant"):
            continue
        dialogue.append(d)
    file_name = os.path.join(self.dialogue_history_path, f"dialogue-{self.current_time}.json")
    write_json_file(file_name, dialogue)
```

**Impact:**
- Frequent disk I/O operations
- Processing overhead for filtering

**Recommendation:** Batch dialogue writes or use in-memory buffering.
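
Batching could be sketched as follows: messages are filtered once when they are added, and the file is rewritten only every few turns. `BufferedDialogueWriter` and its `flush_every` parameter are hypothetical and do not exist in the codebase.

```python
import json
import os


class BufferedDialogueWriter:
    """Write the dialogue to disk every `flush_every` turns instead of every turn."""

    def __init__(self, path, flush_every=5):
        self.path = path
        self.flush_every = flush_every
        self._dialogue = []
        self._pending = 0  # messages added since the last write

    def add(self, message):
        if message.get("role") not in ("user", "assistant"):
            return  # filter once, at insertion time
        self._dialogue.append(message)
        self._pending += 1
        if self._pending >= self.flush_every:
            self.flush()

    def flush(self):
        """Write the full history if anything changed since the last write."""
        if self._pending == 0:
            return
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self._dialogue, f, ensure_ascii=False)
        self._pending = 0
```

The trade-off is durability: turns that have not been flushed are lost on a crash, so `flush()` should also run at shutdown.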

## Performance Impact Analysis

### Real-time Processing Impact
- **Critical:** Blocking queue operations can cause system freezes
- **High:** Synchronous operations break real-time requirements
- **Medium:** Memory inefficiencies cause gradual performance degradation

### Resource Usage Impact
- **Memory:** Temporary file creation and redundant audio conversions
- **CPU:** Inefficient threading and repeated processing
- **I/O:** Frequent file operations and synchronous network requests

## Recommended Implementation Priority

1. **Immediate (Critical):** Fix blocking queue operations in robot.py
2. **Short-term (High):** Convert synchronous HTTP requests to async
3. **Medium-term (Medium):** Optimize audio processing pipeline
4. **Long-term (Low):** Fine-tune thread pool and file I/O operations

## Testing Recommendations

- **Unit Tests:** Add tests for timeout handling in queue operations
- **Integration Tests:** Verify real-time audio processing performance
- **Load Tests:** Test system behavior under sustained audio input
- **Memory Tests:** Monitor for memory leaks in long-running sessions
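
A unit test for the timeout behavior might start from a sketch like this. `Consumer` is a hypothetical stand-in for the object that owns `vad_queue`. A real test would exercise the actual class.

```python
import queue
import time
import unittest


class Consumer:
    """Hypothetical stand-in for the object that owns `vad_queue`."""

    def __init__(self):
        self.vad_queue = queue.Queue()

    def step(self, timeout=0.1):
        try:
            data = self.vad_queue.get(timeout=timeout)
        except queue.Empty:
            return False  # nothing to process this round
        return data


class TimeoutTest(unittest.TestCase):
    def test_empty_queue_returns_promptly(self):
        consumer = Consumer()
        started = time.monotonic()
        self.assertIs(consumer.step(timeout=0.05), False)
        # Generous bound: the call must not hang on an empty queue.
        self.assertLess(time.monotonic() - started, 1.0)

    def test_available_item_is_returned(self):
        consumer = Consumer()
        consumer.vad_queue.put(b"frame")
        self.assertEqual(consumer.step(), b"frame")
```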

## Conclusion

The most critical issue is the blocking queue operation in the main processing loop, which can cause complete system freezes. This should be addressed immediately to ensure system reliability. Other optimizations can be implemented incrementally to improve overall performance and resource efficiency.

The fix included with this report replaces the blocking `get()` with `get(timeout=0.1)` and returns `False` when `queue.Empty` is raised. Processing of available data is unchanged, while an empty queue no longer stalls the loop.
7 changes: 5 additions & 2 deletions bailing/robot.py
@@ -175,8 +175,11 @@ def start_recording_and_vad(self):
         self._tts_priority()
 
     def _duplex(self):
-        # Process recognition results
-        data = self.vad_queue.get()
+        try:
+            data = self.vad_queue.get(timeout=0.1)  # 100ms timeout
+        except queue.Empty:
+            return False  # No data available, continue loop
+
         # VAD start detected
         if self.vad_start:
             self.speech.append(data)