# Server-Side Speech Recognition Deployment Guide (Intranet Environment)

## 🎯 Architecture Overview

```
Frontend (uni-app)
    ↓ record audio
    ↓ upload audio file
Backend (192.168.1.80:30091)
    ↓ receive audio
    ↓ run the speech recognition engine
    ↓ return the recognized text and score
Frontend
    ↓ display the result
```

---

## 📦 Option Comparison

### Option A: Vosk (recommended, lightweight) ⭐

**Pros:**
- ✅ Fully offline; works on an intranet
- ✅ Lightweight; runs on CPU alone
- ✅ Supports Chinese
- ✅ Open source and free
- ✅ Simple to integrate

**Cons:**
- ⚠️ Moderate recognition accuracy (80-85%)

**Best for:**
- Intranet environments
- Modest server hardware
- Rapid deployment

---

### Option B: Whisper (more accurate)

**Pros:**
- ✅ Open-sourced by OpenAI
- ✅ High recognition accuracy (90-95%)
- ✅ Multilingual
- ✅ Fully offline

**Cons:**
- ⚠️ GPU recommended
- ⚠️ Large models (hundreds of MB to several GB)
- ⚠️ Slower responses (2-5 s)

**Best for:**
- Servers with a GPU
- High accuracy requirements

---

### Option C: Self-hosted speech recognition (most flexible)

Use PaddleSpeech or FunASR.

**Pros:**
- ✅ Developed in China, with strong Chinese-language optimization
- ✅ Extra capabilities such as voiceprint and emotion recognition
- ✅ Tunable performance

**Cons:**
- ⚠️ Complex deployment
- ⚠️ Requires a Python environment

---

## 🚀 Recommended Implementation: Vosk + Python Flask

### Backend example (Python)

```python
# speech_recognition_server.py

from flask import Flask, request, jsonify
from vosk import Model, KaldiRecognizer
import wave
import json
import os
from difflib import SequenceMatcher

app = Flask(__name__)

# Load the Vosk Chinese model
MODEL_PATH = "./model/vosk-model-cn-0.22"
model = Model(MODEL_PATH)


def recognize_audio(audio_path):
    """Recognize speech in an audio file."""
    wf = wave.open(audio_path, "rb")

    # Validate the audio format: mono, 16-bit PCM, supported sample rate
    if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getframerate() not in [8000, 16000, 32000, 48000]:
        print("Unsupported audio format")
        return None

    rec = KaldiRecognizer(model, wf.getframerate())
    rec.SetWords(True)

    result_text = ""
    while True:
        data = wf.readframes(4000)
        if len(data) == 0:
            break
        if rec.AcceptWaveform(data):
            result = json.loads(rec.Result())
            result_text += result.get('text', '')

    # Flush the final partial result
    final_result = json.loads(rec.FinalResult())
    result_text += final_result.get('text', '')

    return result_text


def calculate_similarity(text1, text2):
    """Compute text similarity as a 0-100 score."""
    # Strip whitespace and punctuation
    text1 = ''.join(filter(str.isalnum, text1))
    text2 = ''.join(filter(str.isalnum, text2))

    similarity = SequenceMatcher(None, text1, text2).ratio()
    return round(similarity * 100, 2)


def evaluate_pronunciation(reference_text, recognized_text):
    """Score pronunciation quality against the reference text."""
    score = calculate_similarity(reference_text, recognized_text)

    if score >= 90:
        comment = "优秀!发音非常标准"    # Excellent! Very accurate pronunciation
    elif score >= 80:
        comment = "良好,发音较为准确"    # Good, fairly accurate
    elif score >= 70:
        comment = "一般,需要多加练习"    # Fair, keep practicing
    else:
        comment = "需要改进,请注意发音"  # Needs improvement

    return {
        'score': score,
        'comment': comment,
        'recognizedText': recognized_text,
        'referenceText': reference_text
    }


@app.route('/api/speech/recognize', methods=['POST'])
def speech_recognize():
    """Speech recognition and evaluation endpoint."""
    try:
        # Get the uploaded audio file
        if 'audio' not in request.files:
            return jsonify({
                'code': 400,
                'msg': '未上传音频文件'
            }), 400

        audio_file = request.files['audio']
        reference_text = request.form.get('referenceText', '')

        # Save to a temporary file
        temp_path = './temp_audio.wav'
        audio_file.save(temp_path)

        # Recognize the audio
        recognized_text = recognize_audio(temp_path)

        if not recognized_text:
            return jsonify({
                'code': 500,
                'msg': '音频识别失败'
            }), 500

        # Evaluate
        result = evaluate_pronunciation(reference_text, recognized_text)

        # Delete the temporary file
        os.remove(temp_path)

        return jsonify({
            'code': 200,
            'msg': '成功',
            'data': result
        })

    except Exception as e:
        print(f"Error: {str(e)}")
        return jsonify({
            'code': 500,
            'msg': str(e)
        }), 500


if __name__ == '__main__':
    # Intranet deployment: listen on all interfaces
    app.run(host='0.0.0.0', port=5000, debug=True)
```
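
The scoring logic above is independent of Vosk, so it can be exercised on its own. A minimal standalone sketch of the same similarity scoring (the example texts are illustrative):

```python
# Standalone demo of the server's scoring logic: strip everything that is not
# alphanumeric (CJK characters count as alphanumeric in Python), then compare
# with difflib.SequenceMatcher and scale the ratio to 0-100.
from difflib import SequenceMatcher

def calculate_similarity(text1, text2):
    """Compute text similarity as a 0-100 score."""
    text1 = ''.join(filter(str.isalnum, text1))
    text2 = ''.join(filter(str.isalnum, text2))
    return round(SequenceMatcher(None, text1, text2).ratio() * 100, 2)

# Punctuation and spacing do not affect the score
print(calculate_similarity("你好, 世界!", "你好世界"))  # 100.0
# Two of four characters match -> ratio 0.5 -> score 50.0
print(calculate_similarity("你好世界", "你好地球"))      # 50.0
```

Because the score is pure string similarity, it measures what was recognized rather than pronunciation per se; treat it as a rough proxy.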

---

## 📦 Deployment Steps

### 1. Install dependencies

```bash
# On the server (192.168.1.80)

# Install Python 3.8+
sudo apt update
sudo apt install python3 python3-pip

# Install Vosk and Flask
pip3 install vosk flask

# Install audio processing tools
sudo apt install ffmpeg
```

### 2. Download the Chinese model

```bash
# Download the Vosk Chinese model
cd /path/to/your/project
mkdir model
cd model

# Download the model (about 1.8 GB)
wget https://alphacephei.com/vosk/models/vosk-model-cn-0.22.zip

# Unzip
unzip vosk-model-cn-0.22.zip
```

### 3. Run the service

```bash
# Start the recognition service
python3 speech_recognition_server.py

# The service now listens at: http://192.168.1.80:5000
```

### 4. Integrate with the existing backend

If you already have a Spring Boot backend, you can forward requests to the Vosk service:

```java
// SpeechRecognitionController.java

@RestController
@RequestMapping("/api/speech")
public class SpeechRecognitionController {

    // Vosk service address
    private static final String VOSK_SERVICE_URL = "http://localhost:5000/api/speech/recognize";

    @PostMapping("/recognize")
    public Result recognize(@RequestParam("audio") MultipartFile audioFile,
                            @RequestParam("referenceText") String referenceText) {
        try {
            // Forward to the Vosk service
            RestTemplate restTemplate = new RestTemplate();

            MultiValueMap<String, Object> body = new LinkedMultiValueMap<>();
            body.add("audio", audioFile.getResource());
            body.add("referenceText", referenceText);

            HttpHeaders headers = new HttpHeaders();
            headers.setContentType(MediaType.MULTIPART_FORM_DATA);

            HttpEntity<MultiValueMap<String, Object>> requestEntity = new HttpEntity<>(body, headers);

            ResponseEntity<String> response = restTemplate.postForEntity(
                VOSK_SERVICE_URL,
                requestEntity,
                String.class
            );

            return Result.success(response.getBody());
        } catch (Exception e) {
            return Result.error("识别失败: " + e.getMessage());
        }
    }
}
```

---

## 🔧 Configuration

### uni-app configuration

Confirm the server address in `utils/config.js`:

```javascript
const DEFAULT_SERVER_HOST = '192.168.1.80'
const DEFAULT_SERVER_PORT = 30091
```

### Audio format requirements

```
- Sample rate: 16000 Hz
- Channels: mono
- Container: WAV or MP3
- Encoding: PCM 16-bit
```

---

## ✅ Testing

### 1. Test the Vosk service

```bash
# Test with curl
curl -X POST http://192.168.1.80:5000/api/speech/recognize \
  -F "audio=@test.wav" \
  -F "referenceText=你好世界"
```

### 2. Test in uni-app

```javascript
// Call from a page
import speechRecorder from '@/utils/speech-recorder.js'

// Start recording
speechRecorder.start()

// Stop recording and upload
const filePath = await speechRecorder.stop()
const result = await speechRecorder.uploadAndRecognize(filePath, {
  referenceText: '测试文本'
})

console.log('Recognition result:', result)
```

---

## 🎯 Optimization Suggestions

### 1. Cache the model

```python
# Use a singleton to avoid reloading the model on every request
class VoskRecognizer:
    _instance = None
    _model = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._model = Model(MODEL_PATH)
        return cls._instance
```

### 2. Asynchronous processing

```python
from celery import Celery

app = Celery('speech_recognition')

@app.task
def recognize_async(audio_path, reference_text):
    # Run recognition asynchronously
    return recognize_and_evaluate(audio_path, reference_text)
```

### 3. Add a queue

Use Redis + Celery to handle concurrent requests.

---

## 📊 Performance Reference

| Configuration | Response time | Concurrency |
|---------------|---------------|-------------|
| 2-core CPU | 1-2 s | 5-10 req/s |
| 4-core CPU | 0.5-1 s | 15-20 req/s |
| GPU | 0.2-0.5 s | 30+ req/s |

---

## 🔐 Security Recommendations

1. **Add authentication**: validate a token on each request
2. **Rate limiting**: block abusive request bursts
3. **File size limit**: reject uploads over 10 MB
4. **Scheduled cleanup**: delete temporary audio files
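
The cleanup item can be handled by a small cron-style job. A minimal stdlib-only sketch, assuming temporary files are `.wav` files in a known directory (the function name and age threshold are illustrative assumptions):

```python
# Delete temporary .wav files older than max_age_seconds from a directory.
# Could be run periodically from cron or a background thread.
import os
import time

def cleanup_temp_audio(directory, max_age_seconds=3600):
    removed = []
    now = time.time()
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".wav"):
            continue
        path = os.path.join(directory, name)
        if now - os.path.getmtime(path) > max_age_seconds:
            os.remove(path)      # old enough: delete
            removed.append(name)
    return removed
```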

---

## 📝 Summary

### Advantages

1. ✅ **Fully intranet** - no dependency on external services
2. ✅ **Open source and free** - no paid plugins required
3. ✅ **Easy to maintain** - a simple Python service
4. ✅ **Extensible** - more features can be added later

### Next Steps

1. Deploy the Vosk service on the server (192.168.1.80)
2. Integrate it with the existing Spring Boot backend
3. Call the new recording API from uni-app
4. Test and optimize

---

**You can now start the cloud build - no more dependency on the UTS plugin!** 🎉