Sliding Window

Keep only the most recent N turns of the conversation and discard older ones outright. This is the most basic strategy.

Details

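A sliding window is the simplest context-management strategy: fix an upper limit N and keep only the most recent N messages (or the most recent N turns) of the conversation history, discarding everything older. Its advantage is that it adds no cost: no extra LLM calls are needed, which suits short conversations whose turns are largely independent. Its drawback is that hard truncation causes "context amnesia": if the user says "I can't eat seafood" in the first turn and that message slides out of the window ten turns later, the agent may still recommend shrimp. Implementations also need to respect message boundaries: Anthropic's messages list requires a valid user/assistant alternation, so the trimmed window should start at a user message; a blind `history[-N:]` can leave the list starting with an assistant message. Preferences or task background that must persist for the whole conversation should be condensed into a summary or written to long-term memory.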
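The seafood scenario above can be sketched without calling any model. This is a minimal illustration under stated assumptions, not a definitive design: `build_request` and the `pinned` list are hypothetical names, the pinned fact is entered by hand rather than extracted automatically, and carrying it in a `system` string is only one possible way to keep a fact outside the window.

```python
from typing import Dict, List

Message = Dict[str, str]


def build_request(pinned: List[str], history: List[Message], size: int) -> dict:
    """Hypothetical helper: build keyword arguments for a Messages-style call."""
    window = history[-size:]
    # Drop leading assistant messages so the window opens with a user message.
    while window and window[0]["role"] != "user":
        window = window[1:]
    request = {"messages": window}
    if pinned:
        # Assumption: pinned facts travel in the system prompt, outside the window.
        facts = "\n".join(f"- {fact}" for fact in pinned)
        request["system"] = "Facts to respect throughout:\n" + facts
    return request


# The first user message carries the dietary restriction.
history: List[Message] = [
    {"role": "user", "content": "I can't eat seafood."},
    {"role": "assistant", "content": "Noted."},
]
# Five more filler turns push the restriction out of a small window.
for i in range(2, 7):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})
history.append({"role": "user", "content": "What should I order tonight?"})

pinned = ["The user cannot eat seafood."]  # entered by hand in this sketch
request = build_request(pinned, history, size=4)

# The restriction is no longer in the window, but it survives in the system text.
in_window = any("seafood" in m["content"] for m in request["messages"])
in_system = "seafood" in request["system"]
```

The resulting dictionary could then be unpacked into a call such as `client.messages.create(model=..., max_tokens=..., **request)`. How facts get into `pinned` in the first place (manual rules, a summarization step, a memory store) is left open here.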

An analogy

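It is like typing on a paper tape of fixed length: new text is added at the right end, and the oldest text falls off the left edge. You can see only what is currently on the tape; whatever has fallen off is gone for good, with no copy kept.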
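The paper tape maps closely onto Python's `collections.deque` with a `maxlen`. The short sketch below only illustrates the analogy; the names `tape` and `visible` are chosen for this example.

```python
from collections import deque

# A tape that holds at most 3 items; appending to a full deque
# silently discards the item at the opposite (left) end.
tape = deque(maxlen=3)
for ch in "ABCDE":
    tape.append(ch)

visible = list(tape)  # what is still on the tape
```

After the loop, `visible` holds only the last three characters; "A" and "B" have fallen off with no copy kept. A bounded deque drops items purely by count and knows nothing about user and assistant roles, so for a message list the window may still need trimming to a user boundary.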

An example

import anthropic

client = anthropic.Anthropic()

WINDOW_SIZE = 10  # upper bound: the 10 most recent messages (5 exchanges); boundary trimming may send fewer
history = []

def trim_to_user_boundary(msgs: list, size: int) -> list:
    """Keep the most recent messages and make sure the window starts with a user message, never an assistant message."""
    window = msgs[-size:]
    while window and window[0]["role"] != "user":
        window = window[1:]
    return window

def chat(user_input: str) -> str:
    history.append({"role": "user", "content": user_input})

    # Core of the sliding window: keep the most recent N messages while preserving a message structure Anthropic accepts
    if len(history) > WINDOW_SIZE:
        history[:] = trim_to_user_boundary(history, WINDOW_SIZE)

    resp = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=256,
        messages=history,
    )
    reply = resp.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply

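# Temporary-window variant: the stored history is never trimmed (it still grows by appends); only this call receives the trimmed window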
def chat_v2(history: list, user_input: str) -> str:
    history.append({"role": "user", "content": user_input})
    window = trim_to_user_boundary(history, WINDOW_SIZE)
    resp = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=256,
        messages=window,
    )
    reply = resp.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply
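How many messages actually reach the model can be checked offline. The sketch below repeats the trimming helper so it runs on its own and replaces the API call with a canned reply; the `sent_sizes` list and the canned replies are assumptions made for this illustration.

```python
def trim_to_user_boundary(msgs: list, size: int) -> list:
    """Same trimming rule as in the example: last `size` messages, starting at a user message."""
    window = msgs[-size:]
    while window and window[0]["role"] != "user":
        window = window[1:]
    return window


WINDOW_SIZE = 10
history = []
sent_sizes = []  # number of messages that would be sent on each turn

for turn in range(1, 9):
    history.append({"role": "user", "content": f"question {turn}"})
    window = trim_to_user_boundary(history, WINDOW_SIZE)
    sent_sizes.append(len(window))
    # Stand-in for the model reply, so no network call is needed.
    history.append({"role": "assistant", "content": f"answer {turn}"})
```

With strictly alternating roles the history always has odd length at call time, so an even `WINDOW_SIZE` of 10 makes the raw slice start on an assistant message, and the trimmed window settles at 9 messages: four complete earlier exchanges plus the new user message. An odd value such as 11 would keep five complete exchanges plus the new user message.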
