Ollama API

    Conventions

    Model names

    Model names follow a model:tag format, where model can have an optional namespace, as in example/model. Examples include orca-mini:3b-q4_1 and llama3:70b. The tag identifies a specific version of the model. It is optional and defaults to latest when omitted.
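The naming rule above can be sketched as a small parser. This is only an illustration of the model:tag convention with an optional namespace and the latest default; the names `ModelName` and `parse_model_name` are hypothetical and not part of the Ollama API, and the sketch assumes the namespace contains no colon.

```python
from typing import NamedTuple, Optional


class ModelName(NamedTuple):
    """Hypothetical container for the parts of a model name."""
    namespace: Optional[str]
    model: str
    tag: str


def parse_model_name(name: str) -> ModelName:
    """Split 'namespace/model:tag' into parts, defaulting the tag to 'latest'."""
    # Everything before the last slash is treated as the optional namespace.
    namespace, _, rest = name.rpartition("/")
    # The tag follows the first colon in the remaining 'model:tag' part.
    model, sep, tag = rest.partition(":")
    # A missing or empty tag falls back to the documented default.
    return ModelName(namespace or None, model, tag if sep and tag else "latest")


if __name__ == "__main__":
    for example in ("orca-mini:3b-q4_1", "llama3:70b", "llama3", "example/model"):
        print(example, "->", parse_model_name(example))
```

Keeping the default in one place means callers can compare names without caring whether the tag was written out explicitly.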

    Durations

    All durations are returned in nanoseconds.
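Because durations arrive in nanoseconds, clients usually convert them before display. The following is a minimal sketch of that conversion. The field names in `sample_response` (`total_duration`, `eval_count`, `eval_duration`) and their values are assumptions made up for illustration, not taken from this section.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def ns_to_seconds(duration_ns: int) -> float:
    """Convert a duration in nanoseconds to seconds."""
    return duration_ns / NANOSECONDS_PER_SECOND


# Illustrative values only; the field names are assumed, not defined here.
sample_response = {
    "total_duration": 5_000_000_000,
    "eval_count": 100,
    "eval_duration": 4_000_000_000,
}

total_seconds = ns_to_seconds(sample_response["total_duration"])
tokens_per_second = sample_response["eval_count"] / ns_to_seconds(
    sample_response["eval_duration"]
)

if __name__ == "__main__":
    print(f"total: {total_seconds:.2f} s, rate: {tokens_per_second:.1f} tokens/s")
```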

    Streaming responses

    Certain endpoints stream their responses as a sequence of JSON objects. To disable streaming for these endpoints, include {"stream": false} in the request body.
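A client can handle both modes with a small amount of code. The sketch below assumes each streamed JSON object arrives on its own line; the helper names `iter_stream_objects` and `without_streaming`, the model name, and the `response` and `done` fields in the simulated data are hypothetical and used only for illustration. It runs offline against simulated lines rather than a live server.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_stream_objects(lines: Iterable[bytes]) -> Iterator[Dict[str, Any]]:
    """Yield one decoded JSON object per non-empty line of a streamed body."""
    for raw in lines:
        line = raw.strip()
        if line:  # skip blank keep-alive or trailing lines
            yield json.loads(line)


def without_streaming(body: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a request body with streaming disabled."""
    return {**body, "stream": False}


if __name__ == "__main__":
    # Simulated streamed body; the field names are assumed for illustration.
    simulated = [
        b'{"response": "Hel", "done": false}\n',
        b'{"response": "lo", "done": true}\n',
    ]
    text = "".join(obj["response"] for obj in iter_stream_objects(simulated))
    print(text)
    print(json.dumps(without_streaming({"model": "llama3", "prompt": "Hi"})))
```

Any iterable of byte lines works as input, so the same helper can be pointed at an HTTP response object that supports line iteration.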