AI Search Project LangChain-SearXNG: Streamlit WebUI Support for a Quick Hands-On

Webmaster

September 6, 2024, 10:43 · 113 views

✨ I recently updated the LangChain-SearXNG project with Streamlit WebUI support, so that more people can get started with it quickly. Let's look at how it is implemented.

This is the sixth article in the LangChain-SearXNG search engine series; interested readers can go back to the earlier articles in the series.

1. Running the Demo

Download the project

GitHub: github.com/ptonlix/Lan…

Set up the environment by following the project README.

Start the project:

# Start the backend service
python -m langchain_searxng

# View the API: open http://localhost:8002/docs for the API documentation

# Start the web frontend
cd webui
streamlit run webui.py
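Before launching the WebUI, it can help to confirm that the backend is reachable. The sketch below is not part of the project: it assumes the default port 8002 and relies on FastAPI serving its OpenAPI schema at /openapi.json by default (the schema the /docs page renders). The helper name is_server_up is hypothetical.

```python
import json
import urllib.error
import urllib.request


def is_server_up(base_url: str = "http://localhost:8002", timeout: float = 3.0) -> bool:
    """Return True if the server answers /openapi.json with an OpenAPI schema."""
    url = base_url.rstrip("/") + "/openapi.json"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            schema = json.load(resp)
            # FastAPI's default schema is a JSON object with an "openapi" version key
            return resp.status == 200 and isinstance(schema, dict) and "openapi" in schema
    except (urllib.error.URLError, OSError, ValueError):
        # connection refused, timeout, HTTP error, or a body that is not JSON
        return False
```

For example, print(is_server_up()) after starting python -m langchain_searxng should print True; if it prints False, check the service before pointing the WebUI at it.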


Configuration options

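Service URL: the address of the FastAPI server started by python -m langchain_searxng
Search method: SearXNG search or Zhipu search
Search model: the LLM used for SearXNG search; zhipuai, deepseek and openai are currently supported, and the specific model version must be set in the langchain_searxng configuration file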

Trying the features

Example 1: select SearXNG search with zhipuai, then enter the query "Which Hollywood blockbusters are being released this year?"


Order of the streamed output:

Related videos
Search sources
Summary answer

Feel free to try the other models yourself.

Example 2: select Zhipu search and enter the same query, "Which Hollywood blockbusters are being released this year?"


Order of the streamed output:

Related videos
Summary answer

Zhipu search has no cited search sources, because the Zhipu API does not return that data; it returns only the summarized content.

2. Implementing It with Streamlit

Most readers are probably familiar with Streamlit: it lets you build a product prototype or demo quickly in pure Python, which is very convenient.

Hooking Streamlit up to the streaming output is the tricky part of this search demo, so let's walk through it.

Defining the API clients

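from typing import Any, Dict, Optional

import requests
import sseclient  # assumed to be the sseclient-py package: SSEClient(response).events()

# ApiRequest, RespModel, VideoSearchResponse and logger are defined elsewhere in the project


class SearchVideo(ApiRequest):
    """Client for the LangChain-SearXNG video search endpoint."""

    def __init__(self, base_url: str):
        super().__init__(base_url=base_url)

    def search_video(
        self, endpoint: str, data: Any, headers: Optional[Dict[str, str]] = None
    ):
        """Return the parsed response model, or None if anything goes wrong."""
        try:
            resp_dict = self.post(endpoint=endpoint, data=data, headers=headers)
            logger.info(resp_dict)
            return RespModel[VideoSearchResponse](**resp_dict)
        except Exception as e:
            logger.exception(f"Unexpected error in SearchVideo request: {e}")
            return None


class SearchSSE:
    """Client for the LangChain-SearXNG streaming (SSE) search endpoint."""

    def __init__(self, base_url: str):
        self.base_url = base_url
        self.session = requests.Session()

    def with_requests(self, endpoint, data, headers):
        """Open a streaming POST request on the shared session."""
        # Use self.session (not requests.post) so that close() actually releases it
        return self.session.post(
            self.base_url + endpoint,
            json=data,
            stream=True,
            headers=headers,
        )

    def connect(
        self, endpoint: str, data: Any, headers: Optional[Dict[str, str]] = None
    ):
        """Yield SSE events; a non-200 response is logged and simply ends the stream."""
        try:
            response = self.with_requests(endpoint, data, headers)
            if response.status_code != 200:
                logger.warning(response.text)
                raise Exception(response.text)

            client = sseclient.SSEClient(response)
            for event in client.events():
                yield event

        except requests.RequestException as e:
            # Network-level failures are propagated to the caller
            logger.warning(f"Error in SSE request to {endpoint}: {e}")
            raise
        except Exception as e:
            # Also catches the non-200 exception above, so the generator just stops
            logger.exception(e)

    def close(self):
        self.session.close()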

I define two classes here, SearchVideo and SearchSSE, which call the LangChain-SearXNG service's video search endpoint and its streaming (SSE) search endpoint respectively.
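For readers new to SSE: the stream is plain text in which each event is a block of "field: value" lines terminated by a blank line. In SearchSSE, sseclient.SSEClient does this parsing. The simplified parser below (hypothetical name parse_sse, ignoring the id and retry fields) only illustrates the wire format that connect turns into events; the event names source and message match the WebUI code later in this article, while the payloads in the test are made up.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class SSEEvent:
    event: str
    data: str


def parse_sse(lines: Iterable[str]) -> Iterator[SSEEvent]:
    """Simplified SSE parser: handles event/data fields and comment lines only."""
    event_type = ""
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event (if it has any data)
            if data_lines:
                yield SSEEvent(event_type or "message", "\n".join(data_lines))
            event_type, data_lines = "", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive ping
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)  # multiple data lines are joined with "\n"
        # "id" and "retry" fields are ignored in this sketch
    # An unterminated trailing event is discarded
```

Note that an event without an explicit event: line defaults to the type message, which is why plain text chunks from the server can arrive as message events.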


For the full implementation, see the LangChain-SearXNG project code.

WebUI layout

The overall layout is a simple sidebar plus main page.

Sidebar layout code:

import streamlit as st

# Sidebar configuration
with st.sidebar:
    st.markdown("## Settings")
    base_url = st.text_input(
        "LangChain-SearXNG service URL",
        value="http://127.0.0.1:8002",
        key="langchain_searxng_base_url",
    )
    search_model_options = ["SearXNG search", "Zhipu search"]
    search_model_index = st.selectbox(
        "LangChain-SearXNG search method",
        options=range(len(search_model_options)),
        format_func=lambda x: search_model_options[x],
        key="search_modelssearxng_search_model",
    )
    llm_model = ""
    retriever_model = ""
    if search_model_index == 0:
        # SearXNG retrieval, summarized by the selected LLM
        llm_model = st.selectbox(
            "LangChain-SearXNG search model",
            options=["zhipuai", "deepseek", "openai"],
            key="langchain_searxng_llm_model",
        )
        retriever_model = "searx"
    elif search_model_index == 1:
        # Zhipu web search returns an already summarized answer
        llm_model = "zhipuwebsearch"
        retriever_model = "zhipuwebsearch"

    network_flag = st.toggle(
        "Enable web search",
        value=True,
        key="network_search",
    )
    # Bare string literals are rendered as Markdown by Streamlit's "magic"
    "[View the source code](https://github.com/ptonlix/LangChain-SearXNG)"
    "[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/ptonlix/LangChain-SearXNG?quickstart=1)"

The sidebar uses st.sidebar and holds the project's configuration, so input widgets such as text_input, selectbox and toggle are used to collect the settings.
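The branching in the sidebar boils down to a mapping from the chosen search method to the llm_model / retriever_model pair sent to the server. Pulled out as a pure function (hypothetical name resolve_models, not in the project), it can be unit-tested without Streamlit; the model names are copied from the sidebar code.

```python
from typing import Tuple

# LLMs offered for SearXNG search, as listed in the sidebar selectbox
LLM_CHOICES = ("zhipuai", "deepseek", "openai")


def resolve_models(search_method_index: int, llm_choice: str = "zhipuai") -> Tuple[str, str]:
    """Map sidebar choices to (llm_model, retriever_model)."""
    if search_method_index == 0:
        # SearXNG retrieval, summarized by the user-selected LLM
        if llm_choice not in LLM_CHOICES:
            raise ValueError(f"unsupported LLM: {llm_choice}")
        return llm_choice, "searx"
    if search_method_index == 1:
        # Zhipu web search handles retrieval and summarization itself
        return "zhipuwebsearch", "zhipuwebsearch"
    raise ValueError(f"unknown search method index: {search_method_index}")
```

One design difference: the sidebar leaves both values empty for an unknown index, whereas this sketch raises, so a bad configuration fails loudly instead of sending empty model names to the server.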

Search feature

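def main():
    """Entry point: initialize session state, replay the history, handle new input."""
    initialize_session_state()
    display_chat_history()

    # base_url, network_flag, llm_model and retriever_model come from the sidebar
    if prompt := st.chat_input(placeholder="Enter your search query"):
        process_user_input(prompt, base_url, network_flag, llm_model, retriever_model)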

The search feature has three parts:

Initialize the session state
Display the chat history
Read the user's input and run the search

Initializing the session state

We use Streamlit's session state to keep the chat history, so we initialize it first.

def initialize_session_state():
    """Initialize session state so the required keys always exist."""
    if "messages" not in st.session_state:
        # Seed the history with a greeting from the assistant
        st.session_state.messages = [
            {
                "role": "assistant",
                "content": "Hello! I'm your AI search assistant. How can I help you?",
                "video_sources": [],
                "sources_markdown": "",
            }
        ]
    st.session_state.setdefault("video_sources", None)
    st.session_state.setdefault("sources_markdown", "**Sources:**\n\n")

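Each message record carries three pieces of content that we initialize here:

content: the text of the chat message
video_sources: the related videos
sources_markdown: the search sources, as Markdown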

Displaying the chat history

Every interaction on a Streamlit page reruns the whole script, so we need a function that re-renders the chat history stored in session_state.messages.

from io import BytesIO
from typing import Any, List

import requests
import streamlit as st


def display_chat_history():
    """Re-render the chat history stored in session state."""
    for msg in st.session_state.messages:
        with st.chat_message(msg["role"]):
            display_video_sources(msg["video_sources"])
            if msg["sources_markdown"]:
                st.markdown(msg["sources_markdown"])
            st.write(msg["content"])


def display_video_sources(video_sources: List[Any]):
    """Show related videos side by side, one column per video."""
    if not video_sources:
        return

    st.markdown("**Related videos:**")
    cols = st.columns(len(video_sources))
    for idx, video in enumerate(video_sources):
        with cols[idx]:
            try:
                # Work around malformed thumbnail URLs ("////" -> "//"); time out slow hosts
                response = requests.get(video.pic.replace("////", "//"), timeout=10)
                img = BytesIO(response.content)
                st.image(img, use_column_width=True)
            except Exception as e:
                st.error(f"Failed to load image: {e}")
            st.markdown(f"[{video.author}]({video.arcurl})")
            st.caption(f"{video.description[:50]}...")
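The video.pic.replace("////", "//") call is a workaround for malformed thumbnail URLs. Assuming the API returns either a scheme-prefixed URL with extra slashes (such as http:////host/...) or a protocol-relative one (//host/...), a slightly more defensive helper could normalize both shapes before the image is fetched. normalize_image_url and the https default are hypothetical choices, not project code.

```python
import re


def normalize_image_url(url: str, default_scheme: str = "https") -> str:
    """Normalize thumbnail URLs of the two assumed malformed shapes."""
    url = url.strip()
    if url.startswith("//"):
        # Protocol-relative URL: requests needs an explicit scheme
        url = f"{default_scheme}:{url}"
    # Collapse three or more slashes after the scheme, e.g. "http:////x" -> "http://x"
    return re.sub(r"^(https?:)/{3,}", r"\1//", url)
```

In display_video_sources it would be used as requests.get(normalize_image_url(video.pic), timeout=10).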

Reading user input and running the search

First, convert the message history into the chat_history format:

# Skip the greeting at index 0, then map Streamlit roles to ai/human
chat_history = [
    (msg["role"], msg["content"]) for msg in st.session_state.messages[1:]
]
chat_history = [
    ("ai" if role == "assistant" else "human", content)
    for role, content in chat_history
]
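The same conversion can be done in a single pass and kept out of Streamlit so it is easy to test. The function name to_chat_history is hypothetical; the ai/human labels and the skipped greeting at index 0 follow the code above.

```python
from typing import Any, Dict, List, Tuple


def to_chat_history(messages: List[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Convert Streamlit chat messages into (speaker, content) pairs."""
    return [
        ("ai" if msg["role"] == "assistant" else "human", msg["content"])
        for msg in messages[1:]  # index 0 is the canned greeting
    ]
```

With it, the two list comprehensions reduce to chat_history = to_chat_history(st.session_state.messages).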

Defining the answer containers

with st.chat_message("assistant"):
    video_placeholder = st.empty()
    sources_placeholder = st.empty()
    content_placeholder = st.empty()
    progress_bar = st.empty()
    timer_text = st.empty()

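st.chat_message("assistant") is the overall container for the answer. The five st.empty() placeholders inside it (video_placeholder and the others) are filled with different content later, in display order: related videos ➡️ search sources ➡️ model summary ➡️ progress bar and total time.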

Streaming the search results

import ast
import time

import streamlit as st

# update_progress and display_video_sources_in_placeholder are defined elsewhere in webui.py


def process_sse_events(
    sse_client,
    data,
    headers,
    video_client,
    prompt,
    video_placeholder,
    sources_placeholder,
    content_placeholder,
    progress_bar,
    timer_text,
    start_time,
):
    """Handle server-sent events (SSE) and render them into the placeholders."""
    full_response = ""
    sources_markdown = "**Sources:**\n\n"

    # 1) Related videos first; search_video returns None on failure, so guard .data
    video_resp = video_client.search_video(
        endpoint="/v1/search/video",
        data={"query": prompt, "conversation_id": ""},
        headers=headers,
    )
    video_sources = video_resp.data if video_resp else None

    update_progress(progress_bar, timer_text, 0.1, time.time() - start_time)

    if video_sources:
        display_video_sources_in_placeholder(
            video_placeholder, video_sources, progress_bar, timer_text, start_time
        )

    st.session_state.video_sources = video_sources

    # 2) Search sources and the summary arrive as typed SSE events
    source_idx = 1
    for event in sse_client.connect("/v2/search/sse", data, headers):
        if event.event == "source":
            # The payload is parsed as a Python literal (a dict with title/url)
            source_data = ast.literal_eval(event.data)
            sources_markdown += f"{source_idx}. [{source_data.get('title', '')}]({source_data.get('url', '')})\n"
            source_idx += 1
            sources_placeholder.markdown(sources_markdown)
        elif event.event == "message":
            full_response += event.data
            # "▌" acts as a typing cursor while the answer streams in
            content_placeholder.markdown(full_response + "▌")
            progress = min(0.2 + len(full_response) / 1000, 0.9)
            update_progress(
                progress_bar, timer_text, progress, time.time() - start_time
            )
        elif event.event == "error":
            st.error(f"Error: {event.data}")

    content_placeholder.markdown(full_response)
    return full_response, sources_markdown

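The function first fetches and displays the related videos, then dispatches the SSE events by type: source events are appended to the numbered list of search sources, message events are appended to the summary text, and both are rendered through their placeholders' markdown method; error events are shown with st.error.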
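Separating the event handling from the Streamlit placeholders makes the streaming logic testable offline. The sketch below is a hypothetical refactor, not the project's code: Event stands in for sseclient's event objects (only .event and .data are used), and ast.literal_eval is kept because the original parses source payloads as Python literals. stream_progress reproduces the progress formula: the bar starts at 0.2 and advances 0.1 per 100 streamed characters, capped at 0.9 (reached at 700 characters) until the stream ends.

```python
import ast
from typing import Iterable, List, NamedTuple, Tuple


class Event(NamedTuple):
    """Stand-in for an SSE event; only the fields the WebUI reads."""
    event: str
    data: str


def stream_progress(chars_received: int) -> float:
    """Progress value while streaming: 0.2 + 0.1 per 100 chars, capped at 0.9."""
    return min(0.2 + chars_received / 1000, 0.9)


def fold_events(events: Iterable[Event]) -> Tuple[str, str, List[str]]:
    """Accumulate (full_response, sources_markdown, errors) from SSE events."""
    full_response = ""
    sources_markdown = "**Sources:**\n\n"
    errors: List[str] = []
    source_idx = 1
    for ev in events:
        if ev.event == "source":
            src = ast.literal_eval(ev.data)  # e.g. "{'title': ..., 'url': ...}"
            sources_markdown += f"{source_idx}. [{src.get('title', '')}]({src.get('url', '')})\n"
            source_idx += 1
        elif ev.event == "message":
            full_response += ev.data
        elif ev.event == "error":
            errors.append(ev.data)
    return full_response, sources_markdown, errors
```

In the WebUI loop, each branch would then only call the matching placeholder with the accumulated values.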

Finally, record the new search result in the session history:

total_time = time.time() - start_time
update_progress(progress_bar, timer_text, 1.0, total_time)

full_response_with_time = f"{full_response}\n\n*Total time: {total_time:.2f} s*"
st.session_state.messages.append(
    {
        "role": "assistant",
        "content": full_response_with_time,
        # Store the plain video list so display_video_sources can replay it on rerun
        "video_sources": (
            st.session_state.video_sources.video_list
            if st.session_state.video_sources
            else []
        ),
        "sources_markdown": sources_markdown,
    }
)

3. Final Notes


GitHub: github.com/ptonlix/Lan…

👏 Feel free to clone the project, try it out, and share your suggestions.

🚩 Next steps for the project

The overall results have improved, but the search features can still be optimized further. That is my plan for the next step:

Dockerize the project for easier deployment and distribution
Optimize the prompts to produce richer output

💥 Contributing

Contributions are welcome; let's build LangChain-SearXNG together. Anything helpful counts:

Reporting bugs
Suggesting improvements
Contributing documentation
Contributing code ... 👏👏👏

Reposted from: https://juejin.cn/post/7407648740927291392