Merge pull request #2181 from 6vision/webp_images

Support images in webp format.
Merge pull request #2203 from 6vision/fix_issues
22 changed files with 188 additions and 42 deletions
@@ -5,7 +5,7 @@
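 The latest version supports the following features:

 -  ✅   **Multi-platform deployment:** Multiple fully featured deployment options; currently supports WeChat Official Accounts, WeCom apps, Feishu, DingTalk, and more
-  ✅   **Basic chat:** Smart replies in private and group chats with multi-turn context memory; supports GPT-3.5, GPT-4, GPT-4o, Claude-3.5, Gemini, 文心一言 (ERNIE Bot), 讯飞星火 (iFlytek Spark), 通义千问 (Qwen), ChatGLM-4, Kimi (Moonshot AI), MiniMax
+-  ✅   **Basic chat:** Smart replies in private and group chats with multi-turn context memory; supports GPT-3.5, GPT-4o-mini, GPT-4o, GPT-4, Claude-3.5, Gemini, 文心一言 (ERNIE Bot), 讯飞星火 (iFlytek Spark), 通义千问 (Qwen), ChatGLM-4, Kimi (Moonshot AI), MiniMax
 -  ✅   **Voice:** Recognizes voice messages and replies with text or voice; supports azure, baidu, google, openai (whisper/tts), and other speech models
 -  ✅   **Images:** Image generation, image recognition, and image-to-image (e.g. photo restoration); choose from Dall-E-3, stable diffusion, replicate, midjourney, CogView-3, or vision models
 -  ✅   **Rich plugins:** Supports custom plugin extensions; available plugins include multi-role switching, text adventure, sensitive-word filtering, chat history summarization, document summarization and Q&A, web search, and more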
@@ -46,6 +46,8 @@ DEMO video: https://cdn.link-ai.tech/doc/cow_demo.mp4

 # 🏷 Changelog

+>**2024.07.19:** [Version 1.6.9](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/1.6.9) adds the gpt-4o-mini model and Alibaba Cloud speech recognition, and optimizes routing for the WeCom app channel
+
 >**2024.07.05:** [Version 1.6.8](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/1.6.8) and [Version 1.6.7](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/1.6.7): Claude 3.5, Gemini 1.5 Pro, and MiniMax models, image input for workflows, and a more complete model list

 >**2024.06.04:** [Version 1.6.6](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/1.6.6) and [Version 1.6.5](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/1.6.5): gpt-4o model, DingTalk streaming cards, iFlytek speech recognition/synthesis
@@ -173,7 +175,7 @@ pip3 install -r requirements-optional.txt

 **4. Other settings**

-+ `model`: model name; currently supports `gpt-3.5-turbo`, `gpt-4o`, `gpt-4-turbo`, `gpt-4`, `wenxin`, `claude`, `gemini`, `glm-4`, `xunfei`, `moonshot`, and more; see [common/const.py](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/common/const.py) for the full list of model names
+ `model`: model name; currently supports `gpt-3.5-turbo`, `gpt-4o-mini`, `gpt-4o`, `gpt-4`, `wenxin`, `claude`, `gemini`, `glm-4`, `xunfei`, `moonshot`, and more; see [common/const.py](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/common/const.py) for the full list of model names
 + `temperature`, `frequency_penalty`, `presence_penalty`: Chat API parameters; see the [official OpenAI documentation](https://platform.openai.com/docs/api-reference/chat) for details.
 + `proxy`: because the `openai` API is currently not reachable from mainland China, set this to the address of a proxy client; see [#351](https://github.com/zhayujie/chatgpt-on-wechat/issues/351) for details
 + For image generation, besides meeting the private- or group-chat trigger conditions, an additional keyword prefix is required to trigger it, configured via `image_create_prefix`
@@ -208,11 +208,33 @@ class AzureChatGPTBot(ChatGPTBot):
            headers = {"api-key": api_key, "Content-Type": "application/json"}
            try:
                body = {"prompt": query, "size": conf().get("image_create_size", "1024x1024"), "quality": conf().get("dalle3_image_quality", "standard")}
-                submission = requests.post(url, headers=headers, json=body)
-                image_url = submission.json()['data'][0]['url']
-                return True, image_url
+                response = requests.post(url, headers=headers, json=body)
+                response.raise_for_status()  # raise an HTTPError for 4xx/5xx responses
+                data = response.json()
+
+                # Check that the response actually contains an image URL
+                if 'data' in data and len(data['data']) > 0 and 'url' in data['data'][0]:
+                    image_url = data['data'][0]['url']
+                    return True, image_url
+                else:
+                    error_message = "响应中没有图像 URL"
+                    logger.error(error_message)
+                    return False, "图片生成失败"
+
+            except requests.exceptions.RequestException as e:
+                # Catch all request-related exceptions. If requests.post itself failed
+                # (e.g. a connection error), e.response is None and there is no body to read.
+                error_detail = str(e)
+                if e.response is not None:
+                    try:
+                        error_detail = e.response.json().get('error', {}).get('message', str(e))
+                    except ValueError:
+                        pass
+                error_message = error_detail
+                logger.error(error_message)
+                return False, error_message
+
            except Exception as e:
-                logger.error("create image error: {}".format(e))
+                # Catch any other exception
+                error_message = f"生成图像时发生错误: {e}"
+                logger.error(error_message)
                return False, "图片生成失败"
        else:
            return False, "图片生成失败，未配置text_to_image参数"
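The success and failure checks in `create_img` can be pulled out into a pure function and tested without network access. A minimal sketch, assuming success bodies of the form `{"data": [{"url": ...}]}` (the shape the code above checks) and error bodies of the form `{"error": {"message": ...}}` (the shape the `except` branch reads); `parse_dalle_response` is a hypothetical name, not part of the project.

```python
from typing import Any, Tuple


def parse_dalle_response(data: Any) -> Tuple[bool, str]:
    """Return (True, image_url) on success, else (False, error message).

    Hypothetical helper that mirrors the checks in AzureChatGPTBot.create_img.
    """
    if not isinstance(data, dict):
        return False, "unexpected response body"
    # Error bodies are assumed to look like {"error": {"message": "..."}}
    error = data.get("error")
    if isinstance(error, dict) and error.get("message"):
        return False, str(error["message"])
    items = data.get("data")
    # Same condition as the diff: a non-empty "data" list whose first item has a "url"
    if isinstance(items, list) and items and isinstance(items[0], dict) and "url" in items[0]:
        return True, items[0]["url"]
    return False, "no image URL in response"
```

`create_img` could call such a helper on `response.json()` after `raise_for_status()`; keeping the parsing separate from the HTTP call makes the "no URL" path easy to unit-test.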
@@ -67,7 +67,7 @@ def num_tokens_from_messages(messages, model):
    elif model in ["gpt-4-0314", "gpt-4-0613", "gpt-4-32k", "gpt-4-32k-0613", "gpt-3.5-turbo-0613",
                   "gpt-3.5-turbo-16k", "gpt-3.5-turbo-16k-0613", "gpt-35-turbo-16k", "gpt-4-turbo-preview",
                   "gpt-4-1106-preview", const.GPT4_TURBO_PREVIEW, const.GPT4_VISION_PREVIEW, const.GPT4_TURBO_01_25,
-                   const.GPT_4o, const.LINKAI_4o, const.LINKAI_4_TURBO]:
+                   const.GPT_4o, const.GPT_4o_MINI, const.LINKAI_4o, const.LINKAI_4_TURBO]:
        return num_tokens_from_messages(messages, model="gpt-4")
    elif model.startswith("claude-3"):
        return num_tokens_from_messages(messages, model="gpt-3.5-turbo")
@@ -399,6 +399,7 @@ class LinkAIBot(Bot):
            return
        max_send_num = conf().get("max_media_send_count")
        send_interval = conf().get("media_send_interval")
+        file_type = (".pdf", ".doc", ".docx", ".csv", ".xls", ".xlsx", ".txt", ".rtf", ".ppt", ".pptx")
        try:
            i = 0
            for url in image_urls:
@@ -407,7 +408,7 @@ class LinkAIBot(Bot):
                i += 1
                if url.endswith(".mp4"):
                    reply_type = ReplyType.VIDEO_URL
-                elif url.endswith(".pdf") or url.endswith(".doc") or url.endswith(".docx") or url.endswith(".csv"):
+                elif url.endswith(file_type):
                    reply_type = ReplyType.FILE
                    url = _download_file(url)
                    if not url:
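The `elif url.endswith(file_type)` change works because `str.endswith` accepts a tuple and returns `True` if any suffix matches. A minimal sketch of the same classification; `classify_media_url` is hypothetical, and unlike the diff it checks only the URL path (ignoring any query string) and compares case-insensitively. The `"image"` fallback is an assumption, since the diff does not show the final `else` branch.

```python
from urllib.parse import urlparse

# Suffix tuple copied from LinkAIBot in the diff
FILE_TYPES = (".pdf", ".doc", ".docx", ".csv", ".xls", ".xlsx", ".txt", ".rtf", ".ppt", ".pptx")


def classify_media_url(url: str) -> str:
    """Hypothetical helper: return "video", "file" or "image" for a media URL."""
    # Look only at the path, lowercased, so "?sig=..." or ".PDF" do not break matching
    path = urlparse(url).path.lower()
    if path.endswith(".mp4"):
        return "video"
    # A single endswith call with a tuple replaces the chain of "or" checks
    if path.endswith(FILE_TYPES):
        return "file"
    return "image"  # assumed fallback
```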
@@ -41,14 +41,15 @@ class XunFeiBot(Bot):
        self.api_key = conf().get("xunfei_api_key")
        self.api_secret = conf().get("xunfei_api_secret")
        # Default: Spark Max, domain "generalv3.5" (can be overridden via xunfei_domain / xunfei_spark_url)
-        # v1.5: "general"
-        # v3.0: "generalv3"
-        self.domain = "generalv3"
-        # Defaults to v2.0: "ws://spark-api.xf-yun.com/v2.1/chat"
-        # v1.5: "ws://spark-api.xf-yun.com/v1.1/chat"
-        # v3.0: "ws://spark-api.xf-yun.com/v3.1/chat"
-        # v3.5: "wss://spark-api.xf-yun.com/v3.5/chat"
-        self.spark_url = "wss://spark-api.xf-yun.com/v3.5/chat"
+        # Spark Lite request URL (spark_url): wss://spark-api.xf-yun.com/v1.1/chat, domain: "general"
+        # Spark V2.0 request URL (spark_url): wss://spark-api.xf-yun.com/v2.1/chat, domain: "generalv2"
+        # Spark Pro request URL (spark_url): wss://spark-api.xf-yun.com/v3.1/chat, domain: "generalv3"
+        # Spark Pro-128K request URL (spark_url): wss://spark-api.xf-yun.com/chat/pro-128k, domain: "pro-128k"
+        # Spark Max request URL (spark_url): wss://spark-api.xf-yun.com/v3.5/chat, domain: "generalv3.5"
+        # Spark4.0 Ultra request URL (spark_url): wss://spark-api.xf-yun.com/v4.0/chat, domain: "4.0Ultra"
+        # For models released later, see the official documentation: https://www.xfyun.cn/doc/spark/Web.html
+        self.domain = conf().get("xunfei_domain", "generalv3.5")
+        self.spark_url = conf().get("xunfei_spark_url", "wss://spark-api.xf-yun.com/v3.5/chat")
        self.host = urlparse(self.spark_url).netloc
        self.path = urlparse(self.spark_url).path
        # Uses the same session mechanism as wenxin
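The comment block above pairs each Spark model with one `spark_url` and one `domain`, and the bot then splits the URL into `host` and `path`. A lookup table makes that pairing explicit. The values are copied from the comments; `SPARK_MODELS` and `resolve_spark` are hypothetical names, and newer models should be checked against the official documentation.

```python
from urllib.parse import urlparse

# (spark_url, domain) pairs copied from the XunFeiBot comments
SPARK_MODELS = {
    "lite": ("wss://spark-api.xf-yun.com/v1.1/chat", "general"),
    "v2.0": ("wss://spark-api.xf-yun.com/v2.1/chat", "generalv2"),
    "pro": ("wss://spark-api.xf-yun.com/v3.1/chat", "generalv3"),
    "pro-128k": ("wss://spark-api.xf-yun.com/chat/pro-128k", "pro-128k"),
    "max": ("wss://spark-api.xf-yun.com/v3.5/chat", "generalv3.5"),
    "4.0ultra": ("wss://spark-api.xf-yun.com/v4.0/chat", "4.0Ultra"),
}


def resolve_spark(name: str = "max"):
    """Hypothetical helper: return (spark_url, domain, host, path) for a model name."""
    spark_url, domain = SPARK_MODELS[name.lower()]
    parsed = urlparse(spark_url)
    # XunFeiBot derives host and path from spark_url in the same way
    return spark_url, domain, parsed.netloc, parsed.path
```

In the project the pair is supplied through the `xunfei_domain` and `xunfei_spark_url` settings added in this change, so the two values should always be changed together.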
@@ -100,7 +100,7 @@ class DingTalkChanel(ChatChannel, dingtalk_stream.ChatbotHandler):
        super(dingtalk_stream.ChatbotHandler, self).__init__()
        self.logger = self.setup_logger()
        # Cache of received message IDs, used for idempotency (deduplication)
-        self.receivedMsgs = ExpiredDict(conf().get("expires_in_seconds"))
+        self.receivedMsgs = ExpiredDict(conf().get("expires_in_seconds", 3600))
        logger.info("[DingTalk] client_id={}, client_secret={} ".format(
            self.dingtalk_client_id, self.dingtalk_client_secret))
        # No group check or prefix required
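The `conf().get("expires_in_seconds", 3600)` change matters because an expiring dictionary has to turn its TTL into a time delta, and `timedelta(seconds=None)` raises `TypeError`. A minimal sketch of such a dictionary; `TTLDict` is hypothetical and is not the project's `common.expired_dict.ExpiredDict`, whose internals are not shown here.

```python
from datetime import datetime, timedelta


class TTLDict(dict):
    """Hypothetical minimal expiring dict (not the project's ExpiredDict).

    Each value is stored with an expiry timestamp; expired keys behave as absent.
    """

    def __init__(self, expires_in_seconds):
        super().__init__()
        # Raises TypeError when expires_in_seconds is None, hence the config fallback
        self.ttl = timedelta(seconds=expires_in_seconds)

    def __setitem__(self, key, value):
        super().__setitem__(key, (value, datetime.now() + self.ttl))

    def __getitem__(self, key):
        value, expiry = super().__getitem__(key)
        if datetime.now() > expiry:
            super().__delitem__(key)  # drop the stale entry on access
            raise KeyError(key)
        return value

    def __contains__(self, key):
        try:
            self[key]
            return True
        except KeyError:
            return False

    def get(self, key, default=None):
        # dict.get would bypass __getitem__, so route it through the expiry check
        try:
            return self[key]
        except KeyError:
            return default
```

A channel would store each incoming message ID and skip any ID already present, which is the idempotency check the `receivedMsgs` comment describes.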
@@ -9,7 +9,6 @@ import json
 import os
 import threading
 import time
-
 import requests

 from bridge.context import *
@@ -21,6 +20,7 @@ from common.expired_dict import ExpiredDict
 from common.log import logger
 from common.singleton import singleton
 from common.time_check import time_checker
+from common.utils import convert_webp_to_png
 from config import conf, get_appdata_dir
 from lib import itchat
 from lib.itchat.content import *
@@ -109,7 +109,7 @@ class WechatChannel(ChatChannel):

    def __init__(self):
        super().__init__()
-        self.receivedMsgs = ExpiredDict(conf().get("expires_in_seconds"))
+        self.receivedMsgs = ExpiredDict(conf().get("expires_in_seconds", 3600))
        self.auto_login_times = 0

    def startup(self):
@@ -229,6 +229,12 @@ class WechatChannel(ChatChannel):
                image_storage.write(block)
            logger.info(f"[WX] download image success, size={size}, img_url={img_url}")
            image_storage.seek(0)
+            if ".webp" in img_url:
+                try:
+                    image_storage = convert_webp_to_png(image_storage)
+                except Exception as e:
+                    logger.error(f"Failed to convert image: {e}")
+                    return
            itchat.send_image(image_storage, toUserName=receiver)
            logger.info("[WX] sendImage url={}, receiver={}".format(img_url, receiver))
        elif reply.type == ReplyType.IMAGE:  # read the image from a file
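The channels decide whether to convert by testing `".webp" in img_url`, a substring check that also fires for names such as `cat.webp.jpg`. A stricter variant looks only at the path suffix, reusing the same logic as `get_path_suffix` from `common/utils.py` shown later in this diff. A minimal sketch; `is_webp_url` is a hypothetical helper.

```python
import os
from urllib.parse import urlparse


def get_path_suffix(path):
    # Same logic as common.utils.get_path_suffix in the diff
    path = urlparse(path).path
    return os.path.splitext(path)[-1].lstrip('.')


def is_webp_url(img_url: str) -> bool:
    """Hypothetical helper: True only when the URL path itself ends in .webp."""
    return get_path_suffix(img_url).lower() == "webp"
```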
@@ -266,6 +272,7 @@ def _send_login_success():
    except Exception as e:
        pass

+
 def _send_logout():
    try:
        from common.linkai_client import chat_client
@@ -274,6 +281,7 @@ def _send_logout():
    except Exception as e:
        pass

+
 def _send_qr_code(qrcode_list: list):
    try:
        from common.linkai_client import chat_client
@@ -281,3 +289,4 @@ def _send_qr_code(qrcode_list: list):
            chat_client.send_qrcode(qrcode_list)
    except Exception as e:
        pass
+
@@ -17,7 +17,7 @@ from channel.wechatcom.wechatcomapp_client import WechatComAppClient
 from channel.wechatcom.wechatcomapp_message import WechatComAppMessage
 from common.log import logger
 from common.singleton import singleton
-from common.utils import compress_imgfile, fsize, split_string_by_utf8_length
+from common.utils import compress_imgfile, fsize, split_string_by_utf8_length, convert_webp_to_png
 from config import conf, subscribe_msg
 from voice.audio_convert import any_to_amr, split_audio

@@ -44,7 +44,7 @@ class WechatComAppChannel(ChatChannel):

    def startup(self):
        # start message listener
-        urls = ("/wxcomapp", "channel.wechatcom.wechatcomapp_channel.Query")
+        urls = ("/wxcomapp/?", "channel.wechatcom.wechatcomapp_channel.Query")
        app = web.application(urls, globals(), autoreload=False)
        port = conf().get("wechatcomapp_port", 9898)
        web.httpserver.runsimple(app.wsgifunc(), ("0.0.0.0", port))
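In web.py the first element of each `urls` pair is a regular expression. Assuming web.py matches it against the whole request path (anchored at both ends), changing `/wxcomapp` to `/wxcomapp/?` makes the trailing slash optional. The sketch below approximates that anchored match with `re.fullmatch`; `route_matches` is a hypothetical helper, not web.py's API.

```python
import re

OLD_PATTERN = "/wxcomapp"
NEW_PATTERN = "/wxcomapp/?"  # "?" makes the trailing slash optional


def route_matches(pattern: str, path: str) -> bool:
    # fullmatch approximates an anchored route match (^pattern$)
    return re.fullmatch(pattern, path) is not None
```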
@@ -99,6 +99,12 @@ class WechatComAppChannel(ChatChannel):
                image_storage = compress_imgfile(image_storage, 10 * 1024 * 1024 - 1)
                logger.info("[wechatcom] image compressed, sz={}".format(fsize(image_storage)))
            image_storage.seek(0)
+            if ".webp" in img_url:
+                try:
+                    image_storage = convert_webp_to_png(image_storage)
+                except Exception as e:
+                    logger.error(f"Failed to convert image: {e}")
+                    return
            try:
                response = self.client.media.upload("image", image_storage)
                logger.debug("[wechatcom] upload image response: {}".format(response))
@@ -32,6 +32,7 @@ GPT4_TURBO_11_06 = "gpt-4-1106-preview"
 GPT4_VISION_PREVIEW = "gpt-4-vision-preview"

 GPT4 = "gpt-4"
+GPT_4o_MINI = "gpt-4o-mini"
 GPT4_32k = "gpt-4-32k"
 GPT4_06_13 = "gpt-4-0613"
 GPT4_32k_06_13 = "gpt-4-32k-0613"
@@ -57,7 +58,7 @@ GEMINI_15_PRO = "gemini-1.5-pro"

 MODEL_LIST = [
              GPT35, GPT35_0125, GPT35_1106, "gpt-3.5-turbo-16k",
-              GPT_4o, GPT4_TURBO, GPT4_TURBO_PREVIEW, GPT4_TURBO_01_25, GPT4_TURBO_11_06, GPT4, GPT4_32k, GPT4_06_13, GPT4_32k_06_13,
+              GPT_4o, GPT_4o_MINI, GPT4_TURBO, GPT4_TURBO_PREVIEW, GPT4_TURBO_01_25, GPT4_TURBO_11_06, GPT4, GPT4_32k, GPT4_06_13, GPT4_32k_06_13,
              WEN_XIN, WEN_XIN_4,
              XUNFEI, ZHIPU_AI, MOONSHOT, MiniMax,
              GEMINI, GEMINI_PRO, GEMINI_15_flash, GEMINI_15_PRO,
@@ -45,8 +45,11 @@ class ChatClient(LinkAIClient):
            elif reply_voice_mode == "always_reply_voice":
                local_config["always_reply_voice"] = True

-        if config.get("admin_password") and plugin_config.get("Godcmd"):
-            plugin_config["Godcmd"]["password"] = config.get("admin_password")
+        if config.get("admin_password"):
+            if not plugin_config.get("Godcmd"):
+                plugin_config["Godcmd"] = {"password": config.get("admin_password"), "admin_users": []}
+            else:
+                plugin_config["Godcmd"]["password"] = config.get("admin_password")
            PluginManager().instances["GODCMD"].reload()

        if config.get("group_app_map") and pconf("linkai"):
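Before this fix, `admin_password` was silently ignored when no `Godcmd` entry existed in the plugin config. The new branch logic can be expressed as a pure function, which makes both cases easy to check. A minimal sketch; `apply_admin_password` is hypothetical, and the `PluginManager().instances["GODCMD"].reload()` call is deliberately left out.

```python
def apply_admin_password(config: dict, plugin_config: dict) -> dict:
    """Hypothetical pure version of the admin_password branch in ChatClient.

    Creates a Godcmd entry when missing, otherwise only overwrites its password.
    """
    password = config.get("admin_password")
    if password:
        if not plugin_config.get("Godcmd"):
            # Same default shape as the diff: no admin users yet
            plugin_config["Godcmd"] = {"password": password, "admin_users": []}
        else:
            plugin_config["Godcmd"]["password"] = password
    return plugin_config
```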
@@ -2,7 +2,7 @@ import io
 import os
 from urllib.parse import urlparse
 from PIL import Image
-
+from common.log import logger

 def fsize(file):
    if isinstance(file, io.BytesIO):
@@ -54,3 +54,17 @@ def split_string_by_utf8_length(string, max_length, max_split=0):
 def get_path_suffix(path):
    path = urlparse(path).path
    return os.path.splitext(path)[-1].lstrip('.')
+
+
+def convert_webp_to_png(webp_image):
+    """Convert a WEBP image in a binary file-like object to an in-memory PNG (BytesIO at position 0)."""
+    try:
+        webp_image.seek(0)
+        img = Image.open(webp_image).convert("RGBA")
+        png_image = io.BytesIO()
+        img.save(png_image, format="PNG")
+        png_image.seek(0)
+        return png_image
+    except Exception as e:
+        logger.error(f"Failed to convert WEBP to PNG: {e}")
+        raise
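`convert_webp_to_png` is only invoked when the URL contains `.webp`, so an image served as WebP under another name is sent unconverted. A WebP file is a RIFF container: bytes 0 to 3 are `RIFF`, bytes 4 to 7 are a little-endian size, and bytes 8 to 11 are `WEBP`. A minimal content-sniffing sketch that works on the same `BytesIO` objects as the helpers above; `looks_like_webp` is hypothetical.

```python
import io


def looks_like_webp(stream: io.BytesIO) -> bool:
    """Hypothetical helper: sniff the RIFF/WEBP signature without consuming the stream."""
    pos = stream.tell()
    stream.seek(0)
    header = stream.read(12)
    stream.seek(pos)  # leave the stream where the caller had it
    # WebP layout: b"RIFF" + 4-byte little-endian size + b"WEBP"
    return len(header) == 12 and header[:4] == b"RIFF" and header[8:12] == b"WEBP"
```

Such a check could be combined with the URL test before calling `convert_webp_to_png`, so conversion depends on the actual bytes rather than the file name.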
@@ -17,7 +17,7 @@ available_setting = {
    "open_ai_api_base": "https://api.openai.com/v1",
    "proxy": "",  # proxy used for openai
    # chatgpt model; when use_azure_chatgpt is true, this is the name of the model deployment on Azure
-    "model": "gpt-3.5-turbo",  # options: gpt-4o, gpt-4-turbo, claude-3-sonnet, wenxin, moonshot, qwen-turbo, xunfei, glm-4, minimax, gemini, etc.; see common/const.py for all available models
+    "model": "gpt-3.5-turbo",  # options: gpt-4o, gpt-4o-mini, gpt-4-turbo, claude-3-sonnet, wenxin, moonshot, qwen-turbo, xunfei, glm-4, minimax, gemini, etc.; see common/const.py for all available models
    "bot_type": "",  # optional; set to "chatGPT" when using a third-party service compatible with the OpenAI API format. See the bot_type values in common/const.py for bot names; if empty, the bot is inferred from the model name
    "use_azure_chatgpt": False,  # whether to use chatgpt on Azure
    "azure_deployment_id": "",  # Azure model deployment name
@@ -73,6 +73,8 @@ available_setting = {
    "xunfei_app_id": "",  # iFlytek app ID
    "xunfei_api_key": "",  # iFlytek API key
    "xunfei_api_secret": "",  # iFlytek API secret
+    "xunfei_domain": "",  # domain parameter of the iFlytek model, "4.0Ultra" for Spark4.0 Ultra; for other models see: https://www.xfyun.cn/doc/spark/Web.html
+    "xunfei_spark_url": "",  # request URL of the iFlytek model, wss://spark-api.xf-yun.com/v4.0/chat for Spark4.0 Ultra; for other models see: https://www.xfyun.cn/doc/spark/Web.html
    # claude settings
    "claude_api_cookie": "",
    "claude_uuid": "",
@@ -95,8 +97,8 @@ available_setting = {
    "group_speech_recognition": False,  # whether to enable speech recognition in group chats
    "voice_reply_voice": False,  # whether to reply to voice with voice; requires the api key of the corresponding speech synthesis engine
    "always_reply_voice": False,  # whether to always reply with voice
-    "voice_to_text": "openai",  # speech recognition engine: openai, baidu, google, azure
-    "text_to_voice": "openai",  # speech synthesis engine: openai, baidu, google, pytts (offline), ali, azure, elevenlabs, edge (online)
+    "voice_to_text": "openai",  # speech recognition engine: openai, baidu, google, azure, xunfei, ali
+    "text_to_voice": "openai",  # speech synthesis engine: openai, baidu, google, azure, xunfei, ali, pytts (offline), elevenlabs, edge (online)
    "text_to_voice_model": "tts-1",
    "tts_voice_id": "alloy",
    # baidu speech API settings, required when using Baidu speech recognition and synthesis
@@ -10,9 +10,7 @@
    },
    "tool": {
        "tools": [
-            "python",
            "url-get",
-            "terminal",
            "meteo-weather"
        ],
        "kwargs": {
@@ -55,7 +55,7 @@ class Keyword(Plugin):
            reply_text = self.keyword[content]

            # Determine the type of the matched content
-            if (reply_text.startswith("http://") or reply_text.startswith("https://")) and any(reply_text.endswith(ext) for ext in [".jpg", ".jpeg", ".png", ".gif", ".img"]):
+            if (reply_text.startswith("http://") or reply_text.startswith("https://")) and any(reply_text.endswith(ext) for ext in [".jpg", ".webp", ".jpeg", ".png", ".gif", ".img"]):
            # If it starts with http:// or https:// and ends with ".jpg", ".webp", ".jpeg", ".png", ".gif" or ".img", treat it as an image URL.
                reply = Reply()
                reply.type = ReplyType.IMAGE_URL
@@ -18,6 +18,7 @@ class Plugin:
        if not plugin_conf:
            # If there is no global config, load config.json from the plugin directory
            plugin_config_path = os.path.join(self.path, "config.json")
+            logger.debug(f"loading plugin config, plugin_config_path={plugin_config_path}, exist={os.path.exists(plugin_config_path)}")
            if os.path.exists(plugin_config_path):
                with open(plugin_config_path, "r", encoding="utf-8") as f:
                    plugin_conf = json.load(f)
@@ -99,7 +99,8 @@ class Role(Plugin):
        if e_context["context"].type != ContextType.TEXT:
            return
        btype = Bridge().get_bot_type("chat")
-        if btype not in [const.OPEN_AI, const.CHATGPT, const.CHATGPTONAZURE, const.LINKAI]:
+        if btype not in [const.OPEN_AI, const.CHATGPT, const.CHATGPTONAZURE, const.QWEN_DASHSCOPE, const.XUNFEI, const.BAIDU, const.ZHIPU_AI, const.MOONSHOT, const.MiniMax]:
+            logger.warning(f'不支持的bot: {btype}')
            return
        bot = Bridge().get_bot("chat")
        content = e_context["context"].content[:]
@@ -22,7 +22,7 @@
    },
    "pictureChange": {
      "url": "https://github.com/Yanyutin753/pictureChange.git",
-      "desc": "利用stable-diffusion和百度Ai进行图生图或者画图的插件"
+      "desc": "1. 支持百度AI和Stable Diffusion WebUI进行图像处理，提供多种模型选择，支持图生图、文生图自定义模板。2. 支持Suno音乐AI可将图像和文字转为音乐。3. 支持自定义模型进行文件、图片总结功能。4. 支持管理员控制群聊内容与参数和功能改变。"
    },
    "Blackroom": {
      "url": "https://github.com/dividduang/blackroom.git",
@@ -1,8 +1,6 @@
 {
  "tools": [
-    "python",
    "url-get",
-    "terminal",
    "meteo"
  ],
  "kwargs": {
@@ -22,11 +22,13 @@ class Tool(Plugin):
    def __init__(self):
        super().__init__()
        self.handlers[Event.ON_HANDLE_CONTEXT] = self.on_handle_context
-
        self.app = self._reset_app()
-
+        if not self.tool_config.get("tools"):
+            logger.warning("[tool] init failed, ignore ")
+            raise Exception("config.json not found")
        logger.info("[tool] inited")

+
    def get_help_text(self, verbose=False, **kwargs):
        help_text = "这是一个能让chatgpt联网，搜索，数字运算的插件，将赋予强大且丰富的扩展能力。"
        trigger_prefix = conf().get("plugin_trigger_prefix", "$")
@@ -8,6 +8,7 @@ Description:

 """

+import http.client
 import json
 import time
 import requests
@@ -61,6 +62,69 @@ def text_to_speech_aliyun(url, text, appkey, token):

    return output_file

+def speech_to_text_aliyun(url, audioContent, appkey, token):
+    """
+    Recognize the speech in an audio clip using Alibaba Cloud's speech recognition service.
+
+    Parameters:
+    - url (str): endpoint URL of the Alibaba Cloud speech recognition service.
+    - audioContent (bytes): PCM audio data.
+    - appkey (str): your Alibaba Cloud appkey.
+    - token (str): authentication token for the Alibaba Cloud API.
+
+    Returns:
+    - str: the recognized text on success, otherwise None.
+    """
+    audio_format = 'pcm'
+    sample_rate = 16000
+    enablePunctuationPrediction = True
+    enableInverseTextNormalization = True
+    enableVoiceDetection = False
+
+    # Build the RESTful request parameters
+    request = url + '?appkey=' + appkey
+    request = request + '&format=' + audio_format
+    request = request + '&sample_rate=' + str(sample_rate)
+
+    if enablePunctuationPrediction:
+        request = request + '&enable_punctuation_prediction=' + 'true'
+
+    if enableInverseTextNormalization:
+        request = request + '&enable_inverse_text_normalization=' + 'true'
+
+    if enableVoiceDetection:
+        request = request + '&enable_voice_detection=' + 'true'
+
+    # Note: the connection always goes to this host, whatever host appears in `url`
+    host = 'nls-gateway-cn-shanghai.aliyuncs.com'
+
+    # Set the HTTPS request headers
+    httpHeaders = {
+        'X-NLS-Token': token,
+        'Content-type': 'application/octet-stream',
+        'Content-Length': len(audioContent)
+    }
+
+    conn = http.client.HTTPSConnection(host)
+    conn.request(method='POST', url=request, body=audioContent, headers=httpHeaders)
+
+    response = conn.getresponse()
+    body = response.read()
+    try:
+        body = json.loads(body)
+        status = body['status']
+        if status == 20000000:
+            result = body['result']
+            if result:
+                logger.info(f"阿里云语音识别到了：{result}")
+            conn.close()
+            return result
+        else:
+            logger.error(f"语音识别失败，状态码: {status}")
+    except ValueError:
+        logger.error(f"语音识别失败，收到非JSON格式的数据: {body}")
+    conn.close()
+    return None
+
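`speech_to_text_aliyun` builds its query string by concatenation, which does not escape the values. `urllib.parse.urlencode` produces the same query for the default settings and percent-encodes anything unusual. A minimal sketch with the parameter names copied from the function above; `build_asr_request` is a hypothetical helper, not part of the project or of any Alibaba Cloud SDK.

```python
from urllib.parse import urlencode


def build_asr_request(url: str, appkey: str, audio_format: str = "pcm",
                      sample_rate: int = 16000, punctuation: bool = True,
                      inverse_text_normalization: bool = True,
                      voice_detection: bool = False) -> str:
    """Hypothetical helper: same query parameters as speech_to_text_aliyun."""
    params = {"appkey": appkey, "format": audio_format, "sample_rate": sample_rate}
    # Optional switches are only sent when enabled, as in the original code
    if punctuation:
        params["enable_punctuation_prediction"] = "true"
    if inverse_text_normalization:
        params["enable_inverse_text_normalization"] = "true"
    if voice_detection:
        params["enable_voice_detection"] = "true"
    # urlencode keeps dict insertion order and escapes the values
    return url + "?" + urlencode(params)
```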

 class AliyunTokenGenerator:
    """
@@ -15,9 +15,9 @@ import time

 from bridge.reply import Reply, ReplyType
 from common.log import logger
+from voice.audio_convert import get_pcm_from_wav
 from voice.voice import Voice
-from voice.ali.ali_api import AliyunTokenGenerator
-from voice.ali.ali_api import text_to_speech_aliyun
+from voice.ali.ali_api import AliyunTokenGenerator, speech_to_text_aliyun, text_to_speech_aliyun
 from config import conf


@@ -34,7 +34,8 @@ class AliVoice(Voice):
            self.token = None
            self.token_expire_time = 0
            # By default, reuse the access_key and access_secret configured for Alibaba Cloud Qwen
-            self.api_url = config.get("api_url")
+            self.api_url_voice_to_text = config.get("api_url_voice_to_text")
+            self.api_url_text_to_voice = config.get("api_url_text_to_voice")
            self.app_key = config.get("app_key")
            self.access_key_id = conf().get("qwen_access_key_id") or config.get("access_key_id")
            self.access_key_secret = conf().get("qwen_access_key_secret") or config.get("access_key_secret")
@@ -53,7 +54,7 @@ class AliVoice(Voice):
                      r'äöüÄÖÜáéíóúÁÉÍÓÚàèìòùÀÈÌÒÙâêîôûÂÊÎÔÛçÇñÑ，。！？,.]', '', text)
        # Get a valid token
        token_id = self.get_valid_token()
-        fileName = text_to_speech_aliyun(self.api_url, text, self.app_key, token_id)
+        fileName = text_to_speech_aliyun(self.api_url_text_to_voice, text, self.app_key, token_id)
        if fileName:
            logger.info("[Ali] textToVoice text={} voice file name={}".format(text, fileName))
            reply = Reply(ReplyType.VOICE, fileName)
@@ -61,6 +62,25 @@ class AliVoice(Voice):
            reply = Reply(ReplyType.ERROR, "抱歉，语音合成失败")
        return reply

+    def voiceToText(self, voice_file):
+        """
+        Convert a voice file to text.
+
+        :param voice_file: the voice file to convert.
+        :return: a Reply object containing the recognized text or an error message.
+        """
+        # Get a valid token
+        token_id = self.get_valid_token()
+        logger.debug("[Ali] voice file name={}".format(voice_file))
+        pcm = get_pcm_from_wav(voice_file)
+        text = speech_to_text_aliyun(self.api_url_voice_to_text, pcm, self.app_key, token_id)
+        if text:
+            logger.info("[Ali] VoicetoText = {}".format(text))
+            reply = Reply(ReplyType.TEXT, text)
+        else:
+            reply = Reply(ReplyType.ERROR, "抱歉，语音识别失败")
+        return reply
+
    def get_valid_token(self):
        """
        Get a valid Alibaba Cloud token.
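`voiceToText` sends raw PCM to the service, obtained through `get_pcm_from_wav` from `voice/audio_convert`, whose implementation is not shown in this diff. The sketch below shows one way to extract PCM frames with the standard `wave` module, assuming the WAV is already mono, 16-bit, and 16 kHz to match `format='pcm'` and `sample_rate = 16000`. `read_pcm_frames` is a hypothetical stand-in; the project's helper may do more, such as resampling.

```python
import io
import wave


def read_pcm_frames(wav_source, expected_rate: int = 16000) -> bytes:
    """Hypothetical stand-in for get_pcm_from_wav: return the raw PCM frames.

    wav_source may be a path, raw WAV bytes, or a binary file-like object.
    It does not resample; a mismatched format raises ValueError instead.
    """
    if isinstance(wav_source, (bytes, bytearray)):
        wav_source = io.BytesIO(wav_source)
    with wave.open(wav_source, "rb") as wav:
        if wav.getframerate() != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz, got {wav.getframerate()} Hz")
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        # Everything after the WAV header: the interleaved sample data
        return wav.readframes(wav.getnframes())
```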
@@ -1,5 +1,6 @@
 {
-    "api_url": "https://nls-gateway-cn-shanghai.aliyuncs.com/stream/v1/tts",
+    "api_url_text_to_voice": "https://nls-gateway-cn-shanghai.aliyuncs.com/stream/v1/tts",
+    "api_url_voice_to_text": "https://nls-gateway.cn-shanghai.aliyuncs.com/stream/v1/asr",
    "app_key": "",
    "access_key_id": "",
    "access_key_secret": ""
Author	SHA1	Message	Date
vision	3f5b976a87	Merge pull request #2181 from 6vision/webp_images Support images in webp format.	2024-08-02 13:47:39 +08:00
vision	49f2339cc2	Merge pull request #2203 from 6vision/fix_issues Fix issues	2024-08-02 13:30:14 +08:00
vision	29f1699de8	Merge pull request #2198 from 6vision/update_spark Support Spark4.0 Ultra model, optimize model configuration.	2024-08-02 01:38:15 +08:00
6vision	c415485801	Support Spark4.0 Ultra model, optimize model configuration.	2024-08-01 17:57:48 +08:00
zhayujie	6937673472	Merge pull request #2193 from 6vision/fix_tool Default close tool plugin.	2024-07-31 14:09:33 +08:00
6vision	c4f10fe876	fix: Default close tool plugin.	2024-07-31 00:01:56 +08:00
6vision	55ca652ad8	Default close tool plugin.	2024-07-30 23:14:23 +08:00
Saboteur7	000c2029de	fix: remove some tools	2024-07-30 12:35:12 +08:00
Saboteur7	ab88e3af06	fix: remove some default tools	2024-07-30 12:15:35 +08:00
6vision	b544a4c954	fix: Use default expiration time for ExpiredDict if not set in config	2024-07-29 20:14:41 +08:00
6vision	baff5fafec	Optimization	2024-07-28 00:03:16 +08:00
6vision	1673de73ba	Role plugin supports more bots.	2024-07-25 22:58:57 +08:00
6vision	e68936e36e	Support images in webp format.	2024-07-25 01:19:44 +08:00
6vision	7dbd195e45	Support images in webp format.	2024-07-25 01:12:53 +08:00
vision	3dc22f98bf	Merge pull request #2177 from 6vision/Opti-azure-dalle Optimize error messages when using Azure Dalle	2024-07-24 12:38:13 +08:00
6vision	805e870c18	Optimize error messages when using Azure Dalle	2024-07-24 00:06:18 +08:00
Saboteur7	de2c031797	docs: update readme	2024-07-19 15:46:19 +08:00
Saboteur7	3aa571aa1b	Merge pull request #2163 from 6vision/wechatcom_app Ensure compatibility for /wxcomapp URL with trailing slash	2024-07-19 15:38:20 +08:00
Saboteur7	3e4969efe6	Merge branch 'master' into wechatcom_app	2024-07-19 15:38:08 +08:00
Saboteur7	446e94df76	Merge pull request #2164 from 6vision/mini_bot Support gpt-4o-mini model	2024-07-19 15:37:30 +08:00
Saboteur7	5b26066a4c	Merge pull request #2154 from distiny-cool/ali_api Add an engine for speech recognition using Alibaba Cloud	2024-07-19 15:37:05 +08:00
Saboteur7	8a80de5c3f	Merge pull request #2141 from Yanyutin753/new PictureChange plugin feature upgrade	2024-07-19 15:36:02 +08:00
6vision	52a490c87e	Support gpt-4o-mini model	2024-07-19 11:04:45 +08:00
6vision	29490741fd	Ensure compatibility for /wxcomapp URL with trailing slash	2024-07-18 23:21:45 +08:00
kody	f0e416455f	Add an engine for speech recognition using Alibaba Cloud	2024-07-15 22:03:31 +08:00
vision	f7a2c97943	Merge pull request #2153 from 6vision/update_linkaibot support more file types.	2024-07-15 19:09:05 +08:00
6vision	993853757b	Linkai bot supports more file types.	2024-07-15 18:57:58 +08:00
6vision	a3abfb987d	update	2024-07-15 18:50:38 +08:00
Saboteur7	2711fa1b1b	Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat	2024-07-08 19:00:03 +08:00
Saboteur7	1f7afaba07	fix: client cmd config bug	2024-07-08 18:57:27 +08:00
Clivia	e02c8bff81	PictureChange plugin feature upgrade	2024-07-08 17:58:59 +08:00