Ollama API

Generate a chat completion

Overview

POST /api/chat
Generate the next message in a chat with a provided model. This is a streaming endpoint, so there will be a series of responses. Streaming can be disabled using "stream": false. The final response object will include statistics and additional data from the request.
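For example, a minimal streaming request might look like the following (the model name llama3.2 is an assumption; substitute any model you have pulled locally):

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'

Each line of the streamed response is a JSON object; the final object has "done": true and carries the request statistics. Adding "stream": false to the body returns a single object instead.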

Parameters

model: (required) the model name
messages: the messages of the chat; this can be used to maintain conversation history
tools: a list of tools, in JSON, for the model to use if supported

The message object has the following fields:

role: the role of the message, either system, user, assistant, or tool
content: the content of the message
images (optional): a list of images to include in the message (for multimodal models such as llava)
tool_calls (optional): a list of tool calls, in JSON, that the model wants to make

Advanced parameters (optional), illustrated in the sketch after this list:

format: the format to return a response in. Can be json or a JSON schema.
options: additional model parameters listed in the documentation for the Modelfile, such as temperature
stream: if false, the response will be returned as a single response object rather than a stream of objects
keep_alive: controls how long the model will stay loaded in memory following the request (default: 5m)
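As a sketch of the advanced parameters used together (the model name and option values are illustrative, not prescriptive):

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ],
  "stream": false,
  "keep_alive": "10m",
  "options": {
    "temperature": 0.1,
    "seed": 42
  }
}'

Here options carries Modelfile parameters (temperature, seed), stream: false collapses the reply into one object, and keep_alive keeps the model in memory for ten minutes after the request.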

Structured outputs

Structured outputs are supported by providing a JSON schema in the format parameter. The model will generate a response that matches the schema. See the Chat request (Structured outputs) example below.
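As a sketch, a structured-output request could look like this (the schema and model name are illustrative assumptions):

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    { "role": "user", "content": "Tell me about Canada." }
  ],
  "stream": false,
  "format": {
    "type": "object",
    "properties": {
      "name": { "type": "string" },
      "capital": { "type": "string" },
      "languages": {
        "type": "array",
        "items": { "type": "string" }
      }
    },
    "required": ["name", "capital", "languages"]
  }
}'

The response's message content will be a JSON string conforming to the schema. Setting the temperature to 0 via options tends to make structured output more deterministic.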