Blog | 모하지?

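Source: Koog Documentation, prompts/prompt-creation/cache-control.md. This post was written as a Korean translation of the prompts/prompt-creation/cache-control.md page from the official Koog documentation. It keeps the document structure and the meaning of the links, while MkDocs-specific UI syntax has been tidied up so it reads well on a blog.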

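Prompt caching control

Prompt caching control lets you instruct supported LLM providers to store part of a prompt, so that subsequent requests sharing the same prefix are served from the cache instead of reprocessing the tokens. This reduces both latency and cost for repetitive workloads such as multi-turn conversations, large system prompts, or fixed tool definitions.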

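Note: Prompt caching vs. response caching. Prompt caching control is a provider-side feature: the provider stores the prompt prefix, not the response. This is different from CachedPromptExecutor, which stores the entire LLM response locally so that an identical prompt can skip the network call entirely.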
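To make the distinction in the note concrete, here is a minimal toy sketch in plain Java. It does not use Koog or any provider API: `CachingContrastSketch`, `sharedPrefixLength`, and `ResponseCache` are hypothetical names invented for illustration, and real providers apply additional rules (breakpoints, minimum lengths, TTLs) that are not modeled here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only; not how Koog or any provider is implemented.
final class CachingContrastSketch {

    /** Prompt caching idea: only the shared leading blocks of two prompts can be reused. */
    static int sharedPrefixLength(List<String> cached, List<String> incoming) {
        int limit = Math.min(cached.size(), incoming.size());
        int i = 0;
        while (i < limit && cached.get(i).equals(incoming.get(i))) {
            i++;
        }
        return i;
    }

    /** Response caching idea: the whole prompt is the key, so only exact repeats hit. */
    static final class ResponseCache {
        private final Map<List<String>, String> store = new HashMap<>();

        void put(List<String> prompt, String response) {
            store.put(List.copyOf(prompt), response);
        }

        /** Returns the stored response, or null when the exact prompt was never seen. */
        String get(List<String> prompt) {
            return store.get(prompt);
        }
    }

    public static void main(String[] args) {
        List<String> first = List.of("system", "tools", "user: question 1");
        List<String> second = List.of("system", "tools", "user: question 2");

        // The two prompts share their first two blocks, so a prefix cache can help.
        System.out.println("shared prefix blocks: " + sharedPrefixLength(first, second));

        // A response cache only helps when the full prompt repeats.
        ResponseCache cache = new ResponseCache();
        cache.put(first, "answer 1");
        System.out.println("exact repeat hit: " + (cache.get(first) != null));
        System.out.println("changed prompt hit: " + (cache.get(second) != null));
    }
}
```

In this toy model, a follow-up question still benefits from the shared prefix, while the response cache only pays off when the same prompt is sent again unchanged.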

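Koog supports prompt caching control for Anthropic and Amazon Bedrock.

Anthropic

Anthropic supports two complementary approaches to prompt caching.

Automatic caching (request level)

Set the cacheControl property on AnthropicParams and pass it to the prompt. Anthropic automatically places a cache breakpoint on the last cacheable block of the request, so you do not need to annotate individual messages. This is the recommended approach for multi-turn conversations.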

Kotlin

// Enable automatic caching with the default 5-minute TTL
val params = AnthropicParams(cacheControl = AnthropicCacheControl.Default)

val prompt = prompt("assistant", params = params) {
    system("You are a helpful assistant with a very long system prompt...")
    user("What can you help me with?")
}

val response = client.execute(prompt, AnthropicModels.Sonnet_4)
println(response)

Java

// Enable automatic caching with the default 5-minute TTL
AnthropicParams params = new AnthropicParams(
    null, null, null, null, null, null, null, null,
    null, null, null, null, null, null, null,
    AnthropicCacheControl.Default.INSTANCE
);

Prompt prompt = Prompt.builder("assistant")
    .system("You are a helpful assistant with a very long system prompt...")
    .user("What can you help me with?")
    .build()
    .withParams(params);

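Manual caching (block level)

Attach a cacheControl argument to an individual message or tool definition to set a cache breakpoint at a specific position. Everything up to and including the annotated block becomes eligible for caching.

System message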

Kotlin

val prompt = prompt("assistant") {
    // Cache the system prompt for 1 hour
    system("You are a knowledgeable assistant...", AnthropicCacheControl.OneHour)
    user("Summarize the latest AI research.")
}

val response = client.execute(prompt, AnthropicModels.Sonnet_4)
println(response)

Java

Prompt prompt = Prompt.builder("assistant")
    // Cache the system prompt for 1 hour
    .system("You are a knowledgeable assistant...", AnthropicCacheControl.OneHour.INSTANCE)
    .user("Summarize the latest AI research.")
    .build();

User and assistant messages

Kotlin

val prompt = prompt("conversation") {
    system("You are a helpful assistant.")
    // Cache after a large user message (e.g. document content)
    user(listOf(ContentPart.Text("Here is a long document: ...")), AnthropicCacheControl.Default)
    assistant("I have read the document.", AnthropicCacheControl.Default)
    user("Summarize it.")
}

val response = client.execute(prompt, AnthropicModels.Sonnet_4)
println(response)

Java

Prompt prompt = Prompt.builder("conversation")
    .system("You are a helpful assistant.")
    // Cache after a large user message (e.g. document content)
    .user(List.of(new ContentPart.Text("Here is a long document: ...")), AnthropicCacheControl.Default.INSTANCE)
    .assistant("I have read the document.", AnthropicCacheControl.Default.INSTANCE)
    .user("Summarize it.")
    .build();

Tool definitions

When the tool list is fixed across requests, putting the cache marker on the last tool definition means that all tool schemas are cached together.

Kotlin

val searchTool = ToolDescriptor(
    name = "web_search",
    description = "Search the web for information.",
    requiredParameters = listOf(
        ToolParameterDescriptor("query", "Search query", ToolParameterType.String)
    ),
    // Cache all tool definitions up to and including this one
    cacheControl = AnthropicCacheControl.Default
)

Java

ToolDescriptor searchTool = new ToolDescriptor(
    "web_search",
    "Search the web for information.",
    List.of(
        new ToolParameterDescriptor("query", "Search query", ToolParameterType.String.INSTANCE)
    ),
    Collections.emptyList(),
    // Cache all tool definitions up to and including this one
    AnthropicCacheControl.Default.INSTANCE
);

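Cache TTL options

Option	TTL	Price multiplier (cache writes)
`AnthropicCacheControl.Default`	5 minutes	1.25× base input price
`AnthropicCacheControl.OneHour`	1 hour	2× base input price

Cache writes are billed at a higher rate than regular input tokens, while cache reads are cheaper. See the Anthropic prompt caching docs for current pricing.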
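The trade-off between a more expensive write and cheaper reads can be worked through with a little arithmetic. The sketch below is a toy cost model in plain Java, not part of Koog. The 1.25× write multiplier comes from the table above; the 0.1× read multiplier is an assumption used only for illustration, so check the provider's pricing page for real numbers. It also assumes the first request writes the cache and every later request reads it within the TTL. `CacheCostSketch` and its methods are hypothetical names.

```java
// Toy cost model; costs are expressed in units of "one base-priced input token".
final class CacheCostSketch {

    // From the TTL table: a 5-minute cache write costs 1.25x the base input price.
    static final double WRITE_MULTIPLIER_5M = 1.25;

    // ASSUMPTION for illustration only: a cache read costs 0.1x the base input price.
    static final double ASSUMED_READ_MULTIPLIER = 0.10;

    /** Cost of sending the same prefix in every request without caching. */
    static double costWithoutCache(long prefixTokens, int requests) {
        return (double) prefixTokens * requests;
    }

    /** Cost when the first request writes the cache and all later requests read it. */
    static double costWithCache(long prefixTokens, int requests,
                                double writeMultiplier, double readMultiplier) {
        if (requests <= 0) {
            return 0.0;
        }
        double write = prefixTokens * writeMultiplier;
        double reads = prefixTokens * readMultiplier * (requests - 1);
        return write + reads;
    }

    public static void main(String[] args) {
        long prefixTokens = 10_000;
        for (int requests = 1; requests <= 3; requests++) {
            System.out.printf("requests=%d uncached=%.0f cached=%.0f%n",
                    requests,
                    costWithoutCache(prefixTokens, requests),
                    costWithCache(prefixTokens, requests,
                            WRITE_MULTIPLIER_5M, ASSUMED_READ_MULTIPLIER));
        }
    }
}
```

Under these assumed multipliers, a 10,000-token prefix costs more with caching for a single request (12,500 vs. 10,000 units) and less from the second request onward (13,500 vs. 20,000 units), which is why caching suits prefixes that are reused.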

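Monitoring cache usage

Anthropic reports cache statistics in the usage section of the response. They are available through the raw API response and can be observed through tracing or logging features.

Field	Meaning
`cacheReadInputTokens`	Tokens read from an existing cache entry
`cacheCreationInputTokens`	Tokens written to a new cache entry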
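Once the two counters are in hand, a simple ratio shows how much of the cached portion was actually reused. The sketch below takes the values as plain numbers, because how they are read from the raw response is not shown here. `CacheUsageSketch`, `cacheHitRatio`, and `describe` are hypothetical helpers, not Koog API.

```java
// Hypothetical helper working on the two counters from the table above.
final class CacheUsageSketch {

    /** Share of cache-related input tokens that were read from cache; 0.0 when there are none. */
    static double cacheHitRatio(long cacheReadInputTokens, long cacheCreationInputTokens) {
        long total = cacheReadInputTokens + cacheCreationInputTokens;
        return total == 0 ? 0.0 : (double) cacheReadInputTokens / total;
    }

    /** Rough label for a single request based on its two counters. */
    static String describe(long cacheReadInputTokens, long cacheCreationInputTokens) {
        if (cacheReadInputTokens == 0 && cacheCreationInputTokens == 0) {
            return "no caching";
        }
        if (cacheReadInputTokens == 0) {
            return "cache write";
        }
        if (cacheCreationInputTokens == 0) {
            return "cache hit";
        }
        return "partial hit";
    }

    public static void main(String[] args) {
        // First request of a conversation: the prefix is written, nothing is read yet.
        System.out.println(describe(0, 4_000) + " ratio=" + cacheHitRatio(0, 4_000));
        // Later request: the old prefix is read and a new tail is written.
        System.out.println(describe(4_000, 1_000) + " ratio=" + cacheHitRatio(4_000, 1_000));
    }
}
```

A ratio that stays near zero across a multi-turn conversation suggests the prefix is changing between requests or the cache entry is expiring before it is reused.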

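Combining automatic and block-level caching

Both modes can be used at the same time. Block-level cacheControl markers give you fine-grained control over where breakpoints are placed, while the request-level cacheControl in AnthropicParams automatically handles the tail of the conversation.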

Kotlin

// Block-level: pin the system prompt in the 1-hour cache tier
// Automatic: let Anthropic manage breakpoints for the conversation tail
val params = AnthropicParams(cacheControl = AnthropicCacheControl.Default)

val prompt = prompt("combined", params = params) {
    system("You are a helpful assistant...", AnthropicCacheControl.OneHour)
    user("Hello!")
}

Java

// Block-level: pin the system prompt in the 1-hour cache tier
// Automatic: let Anthropic manage breakpoints for the conversation tail
AnthropicParams params = new AnthropicParams(
    AnthropicCacheControl.Default.INSTANCE
);

Prompt prompt = Prompt.builder("combined")
    .system("You are a helpful assistant...", AnthropicCacheControl.OneHour.INSTANCE)
    .user("Hello!")
    .build()
    .withParams(params);

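Amazon Bedrock

Amazon Bedrock uses a block-level caching model through the Converse API. When cacheControl is set on a message or tool, Bedrock inserts a CachePoint block immediately after the annotated element.

Note: The Bedrock client itself is JVM-only, so Bedrock prompt caching is a JVM-only feature.

System message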

Kotlin

val prompt = prompt("assistant") {
    // Cache the system prompt using the default TTL
    system("You are a knowledgeable assistant...", BedrockCacheControl.Default)
    user("What is prompt caching?")
}

val response = client.execute(prompt, BedrockModels.AnthropicClaude4Sonnet)
println(response)

Java

Prompt prompt = Prompt.builder("assistant")
    // Cache the system prompt using the default TTL
    .system("You are a knowledgeable assistant...", BedrockCacheControl.Default.INSTANCE)
    .user("What is prompt caching?")
    .build();

User and assistant messages

Kotlin

val prompt = prompt("conversation") {
    system("You are a helpful assistant.")
    // Cache after the large context message
    user("Here is the document: ...", BedrockCacheControl.FiveMinutes)
    assistant("I have read the document.", BedrockCacheControl.Default)
    user("Summarize it.")
}

val response = client.execute(prompt, BedrockModels.AnthropicClaude4Sonnet)
println(response)

Java

Prompt prompt = Prompt.builder("conversation")
    .system("You are a helpful assistant.")
    // Cache after the large context message
    .user("Here is the document: ...", BedrockCacheControl.FiveMinutes.INSTANCE)
    .assistant("I have read the document.", BedrockCacheControl.Default.INSTANCE)
    .user("Summarize it.")
    .build();

Tool definitions

Kotlin

val searchTool = ToolDescriptor(
    name = "web_search",
    description = "Search the web for information.",
    requiredParameters = listOf(
        ToolParameterDescriptor("query", "Search query", ToolParameterType.String)
    ),
    // Cache all tool definitions up to and including this one
    cacheControl = BedrockCacheControl.Default
)

Java

ToolDescriptor searchTool = new ToolDescriptor(
    "web_search",
    "Search the web for information.",
    List.of(
        new ToolParameterDescriptor("query", "Search query", ToolParameterType.String.INSTANCE)
    ),
    Collections.emptyList(),
    // Cache all tool definitions up to and including this one
    BedrockCacheControl.Default.INSTANCE
);

Cache TTL options

Option	TTL
`BedrockCacheControl.Default`	Provider default (no explicit TTL is sent)
`BedrockCacheControl.FiveMinutes`	5 minutes
`BedrockCacheControl.OneHour`	1 hour

See the Amazon Bedrock prompt caching docs for supported models and pricing.

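Choosing a caching strategy

Situation	Recommended approach
Multi-turn chat with a large, fixed system prompt	Anthropic automatic caching, or Bedrock block-level caching on the system message
Stable tool definitions reused across requests	Block-level `cacheControl` on the last tool definition
Long document passed as user context	Block-level `cacheControl` on the user message
Arbitrary multi-turn conversation (Anthropic)	Automatic caching via `AnthropicParams.cacheControl`
1-hour cache retention required	`AnthropicCacheControl.OneHour` / `BedrockCacheControl.OneHour`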