LangChain 在 FastAPI 中的部署與應用整合

簡介

在當前的 AI 驅動應用程式開發中，LangChain 已成為串接大型語言模型（LLM）與外部工具、資料庫的核心框架。它提供了「鏈」的概念，讓開發者可以把 LLM、檢索、工具呼叫等功能模組化、可重用化。

另一方面，FastAPI 以其高效能、簡潔的宣告式路由與自動產生 OpenAPI 文件的特性，成為建構 RESTful 服務與微服務的首選框架。將 LangChain 融入 FastAPI，不僅能讓模型推論即時化，還能把複雜的對話流程、資料檢索或工具呼叫，封裝成易於呼叫的 HTTP API，讓前端、行動端或其他系統都能輕鬆整合 AI 能力。

本篇文章將從 核心概念、實作範例、常見陷阱與最佳實踐，一步步帶你在 FastAPI 中部署 LangChain，最終呈現可直接上線的範例應用。

核心概念

1. LangChain 的「Chain」與「Agent」

Chain：把多個 LLM 呼叫、檢索、資料前處理等步驟串成一條流水線。每個節點只負責單一職責，最後回傳結果。
Agent：在執行過程中能根據 LLM 的回應動態決定要使用哪個工具（如搜尋、計算、資料庫查詢），屬於更高階的自動化決策層。

在 FastAPI 中，我們通常把 Chain 包裝成單一的路由端點，而 Agent 則可以用於需要多步互動的對話式 API。

2. FastAPI 與非同步（async）

FastAPI 完全支援 非同步 處理，這對於呼叫遠端 LLM（例如 OpenAI、Azure OpenAI）或向資料庫發送查詢時，能有效避免阻塞。LangChain 的大部分鏈結元件皆支援 async，只要在 FastAPI 路由上加上 async def 即可。

3. 環境變數與機密管理

LLM 的 API 金鑰、向量資料庫的連線字串等機密資訊不應硬寫在程式碼中。建議使用 python‑dotenv 或 Pydantic Settings 來統一管理，並在 Docker / CI/CD 流程中以環境變數方式注入。

4. 輸入驗證與回傳結構

FastAPI 內建的 Pydantic 能自動完成請求參數的驗證與序列化。對於 LLM 輸入，我們通常會：

定義一個 PromptRequest 模型，限制字數、必填欄位。
在回傳時使用 BaseModel 包裝 LLM 的回應，保持 API 的一致性與可讀性。

程式碼範例

以下範例以 Python 3.11、FastAPI 0.110、LangChain 0.1.x 為基礎，示範三個常見情境：

簡易問答 Chain（單一步驟）
檢索增強生成（RAG）（結合向量資料庫）
Agent 呼叫外部工具（動態工具選擇）

⚠️ 注意：請先在本機或容器中安裝 fastapi[all] langchain openai chromadb 等相依套件。

1. 基礎設定與環境變數

# app/config.py
import os
from pathlib import Path
from dotenv import load_dotenv
from pydantic import BaseSettings, Field

# 讀取 .env 檔案
env_path = Path(__file__).parent.parent / ".env"
load_dotenv(dotenv_path=env_path)

class Settings(BaseSettings):
    openai_api_key: str = Field(..., env="OPENAI_API_KEY")
    # 若使用 Azure OpenAI，可自行擴充 endpoint、api_version 等欄位

    class Config:
        env_file = ".env"
        env_file_encoding = "utf-8"

settings = Settings()

.env 範例（切勿上傳至 Git）：

OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

2. 建立 FastAPI 應用與共用 LLM 客戶端

# app/main.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from app.config import settings

app = FastAPI(title="LangChain + FastAPI Demo")

# 建立全域的 OpenAI LLM 實例（支援 async）
llm = OpenAI(api_key=settings.openai_api_key, temperature=0.0, model_name="gpt-3.5-turbo")

3. 範例一：簡易問答 Chain

# app/schemas.py
class PromptRequest(BaseModel):
    question: str = Field(..., min_length=1, max_length=500, description="使用者的問題")

class PromptResponse(BaseModel):
    answer: str = Field(..., description="LLM 回答的文字")

# app/routes/simple_qa.py
from fastapi import APIRouter
from app.main import llm, app
from app.schemas import PromptRequest, PromptResponse
from langchain.chains import LLMChain

router = APIRouter()

# 建立 PromptTemplate（可重複使用）
qa_template = PromptTemplate(
    input_variables=["question"],
    template="請用中文簡潔回答以下問題：\n\n{question}"
)

# 建立 LLMChain（同步版示範，實務上建議 async）
qa_chain = LLMChain(llm=llm, prompt=qa_template)

@router.post("/qa", response_model=PromptResponse)
async def ask_question(payload: PromptRequest):
    """接受使用者問題，回傳 LLM 的答案。"""
    try:
        # 呼叫鏈結，取得文字回應
        result = await qa_chain.arun(question=payload.question)
        return PromptResponse(answer=result)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

app.include_router(router, prefix="/api")

重點：qa_chain.arun 為非同步版本，能與 FastAPI 完美協同。

4. 範例二：檢索增強生成（RAG）

此範例使用 Chroma 向量資料庫，將本地文件（例如 FAQ）向量化，並在查詢時先檢索相關段落，再交給 LLM 產生答案。

# app/rag.py
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import RetrievalQA
from pathlib import Path
import json

# 1️⃣ 建立或載入向量資料庫
persist_dir = Path(__file__).parent / "chroma_db"
embeddings = OpenAIEmbeddings(openai_api_key=settings.openai_api_key)

if not persist_dir.exists():
    # 假設有一個 faq.json，格式為 [{ "question": "...", "answer": "..." }, ...]
    raw = json.loads(Path("data/faq.json").read_text(encoding="utf-8"))
    docs = [f"Q: {item['question']}\nA: {item['answer']}" for item in raw]
    vectorstore = Chroma.from_texts(docs, embeddings, persist_directory=str(persist_dir))
else:
    vectorstore = Chroma(persist_directory=str(persist_dir), embedding_function=embeddings)

# 2️⃣ 建立檢索器
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# 3️⃣ 建立 RAG Chain
rag_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",          # 直接把檢索結果拼接到 Prompt
    retriever=retriever,
    return_source_documents=True
)

# app/routes/rag.py
from fastapi import APIRouter, HTTPException
from app.schemas import PromptRequest, PromptResponse
from app.rag import rag_chain

router = APIRouter()

@router.post("/rag", response_model=PromptResponse)
async def ask_rag(payload: PromptRequest):
    """使用 RAG 方式回答問題，會先檢索相關文件再產生答案。"""
    try:
        # rag_chain.arun 會回傳 dict，包含 answer 與 source_documents
        res = await rag_chain.arun(question=payload.question)
        answer = res["result"] if isinstance(res, dict) else res
        return PromptResponse(answer=answer)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

app.include_router(router, prefix="/api")

技巧：在正式環境建議把向量資料庫放在 Redis, Pinecone 或 Weaviate，以提升擴展性與查詢效能。

5. 範例三：Agent 呼叫外部工具（天氣查詢）

此範例示範如何使用 OpenAI Functions（或 LangChain 的 Tool）讓 LLM 動態決定是否需要呼叫天氣 API。

# app/tools/weather.py
import httpx
from typing import Dict

async def get_weather(city: str) -> str:
    """呼叫公開的天氣 API，回傳簡易文字描述。"""
    url = f"https://wttr.in/{city}?format=3"
    async with httpx.AsyncClient() as client:
        resp = await client.get(url, timeout=5.0)
        resp.raise_for_status()
        return resp.text.strip()

# app/agent.py
from langchain.agents import initialize_agent, Tool
from langchain.llms import OpenAI
from app.tools.weather import get_weather

# 建立 Tool 物件
weather_tool = Tool(
    name="weather",
    func=get_weather,
    description="取得指定城市的即時天氣資訊，輸入參數為城市名稱（中文或英文）"
)

# 初始化 Agent（使用 OpenAI Functions 方式）
agent = initialize_agent(
    tools=[weather_tool],
    llm=llm,
    agent_type="openai-functions",
    verbose=True
)

# app/routes/agent.py
from fastapi import APIRouter, HTTPException
from app.schemas import PromptRequest, PromptResponse
from app.agent import agent

router = APIRouter()

@router.post("/agent", response_model=PromptResponse)
async def chat_with_agent(payload: PromptRequest):
    """使用 Agent 讓 LLM 自動決定是否呼叫天氣工具。"""
    try:
        # agent.arun 為非同步呼叫
        answer = await agent.arun(payload.question)
        return PromptResponse(answer=answer)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

app.include_router(router, prefix="/api")

要點：

Tool 必須是 非同步（async def）才能在 FastAPI 中不阻塞。

若使用 OpenAI Functions，務必在 settings 中設定 openai_api_key，且模型需支援 function_call（如 gpt-3.5-turbo-1106、gpt-4-1106-preview）。

常見陷阱與最佳實踐

常見問題	為什麼會發生	解決方案／最佳實踐
LLM 呼叫阻塞	使用同步的 `run()` 而非 `arun()`，FastAPI 只能在單一執行緒內處理請求，容易卡住。	全部改成 `async`，同時確保底層的 HTTP 客戶端（如 `httpx.AsyncClient`）也是非同步。
Prompt 注入攻擊	直接把使用者輸入拼接到 Prompt，LLM 可能被惡意指令利用。	使用 PromptTemplate 只接受白名單變數，並在 Pydantic 中限制字數、過濾特殊字元。
向量資料庫同步讀寫	大量查詢時向量資料庫的同步 API 會導致 FastAPI 併發瓶頸。	選擇支援 async 的向量服務（如 Milvus, Weaviate）或將查詢放入背景任務（`BackgroundTasks`）。
API 金鑰外洩	金鑰硬寫在程式碼或 Dockerfile 中。	透過環境變數、Secret Manager（AWS Secrets Manager、GCP Secret Manager）注入。
回傳過長或不安全的文字	LLM 產生的文字可能包含敏感資訊或過長導致前端卡頓。	在回傳前截斷（如 `max_tokens=200`）或使用內容過濾（OpenAI moderation endpoint）。
模型溫度設定不當	溫度過高會產生不穩定答案，過低則缺乏創意。	針對問答使用 `temperature=0`，對於創意寫作可調高至 `0.7~0.9`，並在設定檔中統一管理。

其他最佳實踐

日誌與監控：使用 loguru 或 structlog 統一記錄請求、LLM 呼叫耗時、錯誤訊息，並結合 Prometheus / Grafana 監控 API latency。
單元測試：利用 pytest 搭配 httpx.AsyncClient 測試 FastAPI 端點；對 LangChain 的 Chain 使用 Mock LLM（如 FakeListLLM）避免外部呼叫。
版本鎖定：LangChain 與 LLM 客戶端變化頻繁，請在 requirements.txt 中明確指定版本，並在 CI 中跑 dependabot 更新提醒。
容器化部署：在 Dockerfile 中使用 multi‑stage build，只保留執行環境；將向量資料庫掛載為 volume，或使用外部服務（如 Pinecone）減少容器大小。

實際應用場景

場景	為什麼適合用 LangChain + FastAPI	範例實作
客服機器人	需要即時回覆、可查詢 FAQ、並在必要時呼叫訂單 API。	使用 RAG 讀取 FAQ，搭配 Agent 呼叫訂單查詢工具。
內部知識庫搜尋	員工輸入自然語言問題，系統從文件、Confluence、GitHub Issue 中檢索答案。	建立 Chroma 向量索引，結合 RetrievalQA，提供 `/api/search` 端點。
金融報表分析	使用者問「本月營收較去年同期成長多少？」需要先抓取資料庫、計算、產出文字說明。	使用 Agent 內建 SQLTool，自動生成 SQL 並回傳結果。
IoT 裝置控制	透過語音或文字指令控制智慧燈、恆溫器等，需要即時回應與安全驗證。	建立 Tool 包裝 MQTT 或 REST 控制介面，Agent 決定是否執行。
內容生成平台	為行銷人員產出部落格、社群貼文，需結合品牌語調、關鍵字。	使用 PromptTemplate 與 LLMChain，提供 `/api/generate`，加入 `temperature` 調整。

這些案例皆展示了 FastAPI 作為 API 門面，將 LangChain 的高階 AI 流程抽象化、服務化的威力。只要把 Chain 或 Agent 包裝成一個 HTTP 端點，就能讓前端、行動 App，甚至第三方系統以標準化方式使用 AI 能力。

總結

LangChain 為 LLM 的功能擴充提供了「鏈」與「代理人」兩大抽象，讓檢索、工具呼叫、資料前處理等複雜流程變得可組合、可測試。
FastAPI 的非同步設計與自動文件生成，使得把 LangChain 的 Chain/Agent 直接暴露為 RESTful API 成為自然且高效的選擇。
透過 Pydantic 進行請求驗證、dotenv / Settings 管理機密、以及 async 呼叫 LLM，能在開發階段就避免常見的阻塞與安全問題。
本文提供了 三個實務範例（簡易問答、RAG、Agent+Tool），說明如何從 環境設定 → Chain 建構 → FastAPI 路由 完整落地。
最後，別忘了加入 日誌、監控、單元測試，並以容器化方式部署，才能在生產環境中穩定提供 AI 服務。

結語：把 LangChain 融入 FastAPI，不僅能讓開發者快速將 LLM 能力商品化，更能在保持彈性與可維護性的同時，為各式應用場景提供即時、可信的智慧服務。祝你開發順利，AI 應用無限可能！ 🚀