Python Concurrency and Asynchrony: CPU‑bound vs I/O‑bound

Introduction

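When writing performance-sensitive Python code, you will often hear the terms "CPU‑bound" and "I/O‑bound". Understanding the difference is a prerequisite for choosing the right concurrency model: threads, processes, or coroutines.

If you hand CPU‑bound work to threading, the global interpreter lock (GIL) may keep you from getting the expected speedup. If you hand I/O‑bound work to asyncio, a single thread can juggle a large number of network operations at once and noticeably raise throughput. Disk I/O is different: asyncio has no native non-blocking file API, so file work is usually pushed to threads.

This article walks through the concepts, code examples, common pitfalls, and best practices, step by step. The goal is to help you pick the concurrency approach that fits the kind of work at hand.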

Core Concepts

1. What is CPU‑bound work?

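CPU‑bound work spends most of its execution time on computation: heavy arithmetic, algorithms, or data processing. Typical examples include:

Mathematical computation (matrix multiplication, encryption and decryption)
Large-scale data analysis (sorting, aggregation)
Image or video encoding/decoding

This kind of work keeps the CPU busy the whole time. Run in a single thread, it drives one core to nearly 100% utilization.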

2. What is I/O‑bound work?

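I/O‑bound work spends most of its execution time waiting for external resources to respond: disk, network, database, or user input. Common cases:

Network requests (HTTP, WebSocket)
Reading and writing files, or querying a database
Handling user-interface events

Here the CPU sits idle most of the time; the real bottleneck is I/O latency.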
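One practical way to tell the two apart is to compare CPU time with wall-clock time for the same call. The sketch below is a minimal illustration under two assumptions: a pure-Python loop (`cpu_work`) represents CPU‑bound code, and `time.sleep` (`io_work`) is a fair stand-in for waiting on I/O. Both function names, and `profile_call`, are hypothetical. `time.process_time()` counts only the CPU time used by this process, while `time.perf_counter()` measures elapsed time.

```python
import time


def cpu_work(n: int) -> int:
    """CPU-bound stand-in (hypothetical): sum of squares in a pure-Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def io_work(seconds: float) -> str:
    """I/O-bound stand-in (hypothetical): time.sleep waits without using the CPU."""
    time.sleep(seconds)
    return "done"


def profile_call(func, *args):
    """Return (result, cpu_seconds, wall_seconds) for one call of func(*args)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of the current process only
    result = func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, cpu, wall


if __name__ == "__main__":
    for name, func, arg in [("cpu_work", cpu_work, 5_000_000), ("io_work", io_work, 0.5)]:
        _, cpu, wall = profile_call(func, arg)
        print(f"{name}: cpu={cpu:.2f}s wall={wall:.2f}s ratio={cpu / wall:.2f}")
```

A CPU/wall ratio close to 1 points to CPU‑bound work. A ratio close to 0 means the call spent most of its time waiting.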

3. Why make the distinction?

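Performance: different bottlenecks need different fixes. CPU‑bound tasks suit multiple processes (multiprocessing) or C extension modules that release the GIL. I/O‑bound tasks suit asynchronous code (asyncio) or multiple threads (threading).
Resource use: the wrong concurrency model wastes memory or degrades performance.
Code design: asynchronous code is written differently from ordinary synchronous code, so knowing when to use it avoids needless complexity.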

Code Examples

The examples below show typical ways to handle CPU‑bound and I/O‑bound work and compare how threading, multiprocessing, and asyncio perform.

1️⃣ Example 1: Computing Fibonacci numbers (CPU‑bound)

```python
# fibonacci.py
import time
from multiprocessing import Pool, cpu_count

def fib(n: int) -> int:
    """Naive recursive Fibonacci: pure CPU work."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

def serial():
    start = time.perf_counter()
    results = [fib(35) for _ in range(8)]   # 8 jobs, one after another
    print("Serial:", time.perf_counter() - start, "seconds")

def multi_process():
    start = time.perf_counter()
    with Pool(cpu_count()) as p:            # one worker process per CPU core
        results = p.map(fib, [35] * 8)
    print("Multiprocessing:", time.perf_counter() - start, "seconds")

if __name__ == "__main__":   # required under the spawn start method (Windows, macOS)
    serial()
    multi_process()
```

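Notes

fib(35) takes noticeable time on its own, which makes it a good CPU‑bound benchmark.

multiprocessing genuinely uses multiple cores and sidesteps the GIL, because each worker process has its own interpreter and its own GIL. It is typically 4 to 8 times faster than the serial version, depending on how many cores you have. The ceiling is the smaller of the core count and the 8 jobs, minus process startup and data-transfer overhead.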
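To see the GIL effect directly, the sketch below runs the same `fib` workload on a thread pool and on a process pool. It assumes a standard CPython build with the GIL; free-threaded builds behave differently. It uses `fib(30)` instead of `fib(35)` only to keep the run short, and `timed_map` is a hypothetical helper.

```python
import time
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def fib(n: int) -> int:
    """Same naive recursive Fibonacci as fibonacci.py: pure CPU work."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def timed_map(executor: Executor, n: int, jobs: int):
    """Hypothetical helper: run fib(n) `jobs` times on `executor`; return (results, seconds)."""
    start = time.perf_counter()
    results = list(executor.map(fib, [n] * jobs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    n, jobs = 30, 8
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        _, t_threads = timed_map(pool, n, jobs)
    with ProcessPoolExecutor() as pool:
        _, t_procs = timed_map(pool, n, jobs)
    print(f"threads: {t_threads:.2f}s  processes: {t_procs:.2f}s")
```

On a GIL build, expect the thread-pool time to be close to the serial time, because only one thread runs Python bytecode at a time. The process-pool time should shrink as cores are added.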

2️⃣ Example 2: Sending 100 concurrent HTTP GETs (I/O‑bound) with `threading`

```python
# threading_http.py
import threading
import time

import requests

URL = "https://httpbin.org/delay/1"   # the server delays each response by 1 second

def fetch(session, idx):
    resp = session.get(URL)
    print(f"Thread-{idx} status: {resp.status_code}")

def main():
    start = time.perf_counter()
    threads = []
    # One shared Session reuses pooled connections. requests does not document
    # Session as thread-safe, so one Session per thread is the cautious choice.
    with requests.Session() as session:
        for i in range(100):
            t = threading.Thread(target=fetch, args=(session, i))
            t.start()
            threads.append(t)

        for t in threads:
            t.join()
    print("Threading total:", time.perf_counter() - start, "seconds")

if __name__ == "__main__":
    main()
```

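Notes

requests is blocking, but with multiple threads each thread releases the GIL while it waits for the network response, letting the other threads run.

This approach suits I/O‑bound work, but too many threads still exhaust system resources. Each thread needs its own stack memory, and each open connection uses a file descriptor, which is subject to a per-process limit.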
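Rather than starting one raw thread per request, a fixed-size `ThreadPoolExecutor` caps how many threads, and therefore sockets, exist at once. In this sketch, `fake_fetch` is a hypothetical placeholder that sleeps for 0.1 s instead of calling `session.get(URL)`, so it runs offline. The cap of 10 workers is an arbitrary example value.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(idx: int) -> int:
    """Hypothetical stand-in for session.get(URL): sleep 0.1 s, return a fake status code."""
    time.sleep(0.1)
    return 200


def fetch_all(count: int, max_workers: int = 10) -> list[int]:
    """Run `count` blocking calls with at most `max_workers` threads alive at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_fetch, range(count)))


if __name__ == "__main__":
    start = time.perf_counter()
    statuses = fetch_all(100)
    # 100 calls of 0.1 s on 10 threads run in about 10 waves: roughly 1 s, not 10 s.
    print(len(statuses), "responses in", round(time.perf_counter() - start, 2), "seconds")
```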

3️⃣ Example 3: Sending 100 concurrent HTTP GETs (I/O‑bound) with `asyncio` + `aiohttp`

```python
# async_http.py
import asyncio, time
import aiohttp

URL = "https://httpbin.org/delay/1"

async def fetch(session, idx):
    async with session.get(URL) as resp:
        print(f"Task-{idx} status:", resp.status)

async def main():
    start = time.time()
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, i) for i in range(100)]
        await asyncio.gather(*tasks)
    print("AsyncIO total:", time.time() - start, "seconds")

if __name__ == "__main__":
    asyncio.run(main())
```

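Notes

aiohttp is non-blocking, so a single thread can handle hundreds of requests at once.

For large numbers of I/O‑bound network calls, asyncio often outperforms threading and uses fewer resources, since an idle coroutine costs far less memory than an idle thread.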
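Even with asyncio, firing every request at once can overload the remote server. A common way to cap in-flight requests is an `asyncio.Semaphore`. aiohttp's `TCPConnector(limit=...)` also caps connections (the default is 100).

The sketch below assumes `asyncio.sleep` is an acceptable stand-in for a real `session.get` call. `fake_request` and `run_all` are hypothetical names, and the `stats` dictionary exists only to record the peak concurrency.

```python
import asyncio


async def fake_request(idx: int, limit: asyncio.Semaphore, stats: dict) -> int:
    """Hypothetical stand-in for session.get(URL); asyncio.sleep simulates latency."""
    async with limit:  # at most N coroutines are inside this block at once
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.05)
        stats["in_flight"] -= 1
        return idx


async def run_all(count: int, max_concurrency: int) -> tuple[list[int], int]:
    """Run `count` fake requests, never more than `max_concurrency` at a time."""
    limit = asyncio.Semaphore(max_concurrency)
    stats = {"in_flight": 0, "peak": 0}
    results = await asyncio.gather(*(fake_request(i, limit, stats) for i in range(count)))
    return list(results), stats["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(run_all(100, max_concurrency=20))
    print(len(results), "results, peak concurrency:", peak)
```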

4️⃣ Example 4: Mixing CPU‑bound and I/O‑bound work (with `concurrent.futures`)

```python
# mixed.py
import random
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

import requests

def cpu_task(x):
    # Simulated CPU-heavy work: a long pure-Python loop
    total = 0
    for _ in range(10_000_000):
        total += x * random.random()
    return total

def io_task(url):
    # Blocking HTTP request; the thread releases the GIL while it waits
    return requests.get(url).status_code

def mixed_workflow():
    urls = ["https://httpbin.org/delay/1"] * 20

    # 1. Run the CPU tasks first, in a process pool
    with ProcessPoolExecutor() as proc:
        cpu_results = list(proc.map(cpu_task, range(20)))

    # 2. Then run the I/O tasks in a thread pool
    with ThreadPoolExecutor(max_workers=10) as thr:
        io_results = list(thr.map(io_task, urls))

    print("CPU results:", sum(cpu_results) % 1000)
    print("I/O results:", io_results)

if __name__ == "__main__":
    start = time.perf_counter()
    mixed_workflow()
    print("Total elapsed:", time.perf_counter() - start, "seconds")
```

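Notes

ProcessPoolExecutor runs the CPU‑heavy work in several processes first, free of the GIL limit.

ThreadPoolExecutor then handles the I/O. The I/O tasks need little CPU, so threads are enough. The two phases run one after the other, so the total time is roughly the sum of both.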
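When each item must be downloaded and then processed, the CPU and I/O phases can overlap: each CPU job starts as soon as its download finishes. This is a minimal sketch under stated assumptions:

- `fake_download` is hypothetical and simulates I/O with `time.sleep`.
- `checksum` is a hypothetical CPU step, a simple modular byte sum.
- The CPU executor class is a parameter, so the same function can run with threads where starting processes is inconvenient.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed


def fake_download(idx: int) -> bytes:
    """Hypothetical I/O step: sleep to mimic a download, then return some bytes."""
    time.sleep(0.1)
    return bytes([idx % 256]) * 10_000


def checksum(data: bytes) -> int:
    """Hypothetical CPU step: a pure-Python modular sum over the payload."""
    total = 0
    for b in data:
        total = (total + b) % 65521
    return total


def pipeline(count: int, cpu_executor_cls=ProcessPoolExecutor) -> list[int]:
    """Submit each payload to the CPU pool as soon as its download completes."""
    with ThreadPoolExecutor(max_workers=10) as io_pool, cpu_executor_cls() as cpu_pool:
        downloads = [io_pool.submit(fake_download, i) for i in range(count)]
        cpu_jobs = [cpu_pool.submit(checksum, f.result()) for f in as_completed(downloads)]
        # as_completed yields in finish order, so sort for a deterministic result
        return sorted(job.result() for job in cpu_jobs)


if __name__ == "__main__":
    start = time.perf_counter()
    print(pipeline(20)[:3], "elapsed:", round(time.perf_counter() - start, 2), "s")
```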

5️⃣ Example 5: Moving a blocking function to a thread with `asyncio.to_thread` (calling sync code from async)

```python
# async_to_thread.py
import asyncio
import hashlib
import time

def heavy_hash(data: bytes) -> str:
    """Blocking, CPU-bound work: 100,000 rounds of SHA-256."""
    for _ in range(100_000):
        data = hashlib.sha256(data).digest()
    return data.hex()

async def main():
    start = time.perf_counter()
    # Run heavy_hash in a worker thread so it does not block the event loop outright.
    # The thread still competes for the GIL, so this gives no parallel speedup.
    result = await asyncio.to_thread(heavy_hash, b"hello world")
    print("Hash result length:", len(result))
    print("Elapsed:", time.perf_counter() - start, "seconds")

if __name__ == "__main__":
    asyncio.run(main())
```

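Notes

asyncio.to_thread (Python 3.9+) runs a blocking function in a worker thread from the event loop's default executor. You do not have to manage a thread pool yourself, and the event loop can keep serving other coroutines. It is best suited to blocking I/O.

For CPU‑bound functions such as heavy_hash, the worker thread still has to hold the GIL while it runs. hashlib also keeps the GIL for small inputs like these 32-byte digests. As a result there is no parallel speedup, and other coroutines may respond more slowly. For real parallelism, pass a ProcessPoolExecutor to loop.run_in_executor.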
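When the blocking function is truly CPU‑bound, a process pool gives the event loop real parallelism. The sketch below keeps `heavy_hash` from Example 5, with a hypothetical `rounds` parameter added. It sends that function to a `ProcessPoolExecutor` via `loop.run_in_executor`. At the same time, it runs a simulated blocking I/O call (`blocking_io`, also hypothetical) through `asyncio.to_thread`.

```python
import asyncio
import hashlib
import time
from concurrent.futures import ProcessPoolExecutor


def heavy_hash(data: bytes, rounds: int = 100_000) -> str:
    """CPU-bound: repeated SHA-256 (`rounds` is a hypothetical parameter)."""
    for _ in range(rounds):
        data = hashlib.sha256(data).digest()
    return data.hex()


def blocking_io(seconds: float) -> str:
    """Hypothetical blocking I/O stand-in: the kind of call asyncio.to_thread suits."""
    time.sleep(seconds)
    return "io done"


async def main() -> tuple[str, str]:
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # CPU work goes to a separate process; blocking I/O goes to a thread.
        digest, io_result = await asyncio.gather(
            loop.run_in_executor(pool, heavy_hash, b"hello world"),
            asyncio.to_thread(blocking_io, 0.2),
        )
    return digest, io_result


if __name__ == "__main__":
    start = time.perf_counter()
    digest, io_result = asyncio.run(main())
    print(len(digest), io_result, round(time.perf_counter() - start, 2), "seconds")
```

The functions passed to a process pool must be picklable, which is why `heavy_hash` stays at module top level.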

Common Pitfalls and Best Practices

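Pitfall	Why it happens	Correct approach
Handing CPU‑bound tasks to `threading`	Because of the GIL, only one thread at a time can execute Python bytecode.	Use `multiprocessing` or `ProcessPoolExecutor`, or rewrite the hot path as a C/Cython extension module that releases the GIL.
Calling a blocking function directly inside `asyncio`	It blocks the event loop, so every coroutine stalls.	Move blocking I/O to a thread with `asyncio.to_thread` or `await loop.run_in_executor(...)`; for CPU-heavy functions, pass a `ProcessPoolExecutor` to `run_in_executor`.
Creating too many threads or processes	File descriptors or memory run out, and performance drops instead of improving.	Set a sensible cap for the workload, e.g. `ThreadPoolExecutor(max_workers=cpu_count()*5)` (since Python 3.8 the default is `min(32, os.cpu_count() + 4)`), and reuse pools.
Forgetting to close an `aiohttp.ClientSession` or `requests.Session`	Connections are never released, causing connection leaks.	Use `async with` or `with` statements to guarantee they are closed.
Omitting `if __name__ == '__main__'` when using `multiprocessing` on Windows (or macOS, where spawn is the default since Python 3.8)	Each child process re-imports the main module and tries to start processes again, raising a RuntimeError during bootstrapping.	Put the startup code inside an `if __name__ == '__main__':` block.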
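The second pitfall is easy to reproduce. In this sketch, `bad_worker` calls `time.sleep` directly inside a coroutine, while `good_worker` moves the same call to a thread with `asyncio.to_thread`. Both names, `run_three`, and the 0.1 s delay are illustrative assumptions. For a wait you control, `await asyncio.sleep(...)` is the natural non-blocking choice. `to_thread` is shown here because it is the fix for blocking calls you cannot rewrite.

```python
import asyncio
import time


async def bad_worker() -> None:
    time.sleep(0.1)  # blocking call: freezes the whole event loop until it returns


async def good_worker() -> None:
    await asyncio.to_thread(time.sleep, 0.1)  # same wait, moved to a worker thread


async def run_three(worker) -> float:
    """Run three copies of `worker` concurrently and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(worker(), worker(), worker())
    return time.perf_counter() - start


if __name__ == "__main__":
    # Expect roughly 0.3 s for the blocking version (waits run back to back)
    # and roughly 0.1 s for the to_thread version (waits overlap).
    print("blocking:", round(asyncio.run(run_three(bad_worker)), 2))
    print("to_thread:", round(asyncio.run(run_three(good_worker)), 2))
```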

Best Practices in Brief

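Identify the task type first: CPU‑bound → multiple processes; I/O‑bound → asyncio or multiple threads.
Use standard-library abstractions: concurrent.futures and asyncio provide cross-platform, maintainable interfaces.
Pool resources: ThreadPoolExecutor, ProcessPoolExecutor, and aiohttp.ClientSession should be reused rather than created for every call.
Test and measure: use timeit, cProfile, time.perf_counter(), or the event loop's loop.time() to measure performance instead of guessing.
Abstract sensibly: wrap the CPU‑bound vs I/O‑bound choice in a function or class so the code stays readable and maintainable across environments.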
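The last point can be as small as one helper function. This is a hypothetical sketch, not a standard-library API. `run_batch` picks a `ProcessPoolExecutor` for `"cpu"` work and a `ThreadPoolExecutor` for `"io"` work, so call sites only state what kind of work they submit. For `"cpu"`, the function and its arguments must be picklable, which in practice means defining them at module top level.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Iterable, Literal, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_batch(func: Callable[[T], R], items: Iterable[T],
              kind: Literal["cpu", "io"], max_workers: Optional[int] = None) -> list:
    """Hypothetical helper: choose the executor from the workload type."""
    if kind == "cpu":
        executor_cls = ProcessPoolExecutor  # separate processes sidestep the GIL
    elif kind == "io":
        executor_cls = ThreadPoolExecutor   # threads are cheap while they mostly wait
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(func, items))


def square(x: int) -> int:
    return x * x


if __name__ == "__main__":
    print(run_batch(square, range(5), kind="cpu"))       # [0, 1, 4, 9, 16]
    print(run_batch(str.upper, ["a", "b"], kind="io"))   # ['A', 'B']
```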

Real-World Scenarios

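Scenario	Task type	Recommended concurrency model	Why
Web crawler (fetching thousands of pages concurrently)	I/O‑bound (network requests)	`asyncio + aiohttp`	A single thread can manage many connections at once, keeping memory and CPU overhead low.
Video transcoding	CPU‑bound (encoding)	`multiprocessing` or `ProcessPoolExecutor`	Each video file can go to its own process, making full use of all cores.
Real-time chat server	I/O‑bound (WebSocket, database)	`asyncio` (or `uvicorn` + `FastAPI`)	High concurrency with low latency; a single thread can typically serve thousands of mostly idle connections.
Machine-learning model inference (heavy matrix math)	CPU‑bound (or GPU)	`multiprocessing` + `numpy`/`torch` (or run directly on the GPU)	Compute-intensive, so it needs multiple cores or accelerator hardware; numpy and torch already multithread many operations internally, so watch for oversubscription when adding processes.
Data ETL workflow (read files + process + write)	Mixed	`ProcessPoolExecutor` for the CPU side; `ThreadPoolExecutor` for the I/O side	Each bottleneck goes to the model that suits it best, raising overall throughput.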

Summary

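Whether work is CPU‑bound or I/O‑bound determines where the performance bottleneck lies.
For CPU‑bound work, multiple processes (multiprocessing, ProcessPoolExecutor) are the main way around the GIL. For I/O‑bound work, asyncio waits on many I/O operations within a single thread, while threading overlaps the waits across several threads.
The right approach is to identify the task type first, then pick the best-fitting concurrency model, while paying attention to resource pooling, error handling, and performance testing.
With the examples and best practices in this article, you can quickly locate bottlenecks in everyday development, choose the right tools, and get the most out of Python on both the CPU and the I/O side.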

Remember: "Whether the work is CPU‑bound or I/O‑bound, the most important rule is to measure first, then optimize."

Here's to writing faster, more reliable Python programs! 🚀