Ollama API
  1. Generate a completion

Generate request (With options)

POST http://localhost:11434/api/generate
If you want to set custom options for the model at runtime rather than in the Modelfile, you can do so with the options parameter. This example sets several of the available options, but you can set any of them individually and omit the ones you do not want to override.
Request Example (Shell)
curl --location --request POST 'http://localhost:11434/api/generate' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "llama3.2",
    "prompt": "Why is the sky blue?",
    "stream": false,
    "options": {
        "num_keep": 1024,
        "seed": 42,
        "num_predict": 128,
        "top_k": 40,
        "top_p": 0.9,
        "temperature": 0.8,
        "repeat_penalty": 1.1,
        "stop": [
            "\n",
            "。"
        ]
    }
}'
Response Example
{
  "model": "llama3.2",
  "created_at": "2023-08-04T19:22:45.499127Z",
  "response": "The sky is blue because it is the color of the sky.",
  "done": true,
  "context": [1, 2, 3],
  "total_duration": 4935886791,
  "load_duration": 534986708,
  "prompt_eval_count": 26,
  "prompt_eval_duration": 107345000,
  "eval_count": 237,
  "eval_duration": 4289432000
}
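The same request can be issued from any language with an HTTP client. Below is a minimal Python sketch, assuming a local Ollama server on the default port; the `build_generate_request` and `generate` helpers are illustrative, not part of any SDK:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint


def build_generate_request(model, prompt, **options):
    """Build the JSON body for /api/generate with runtime options.

    Any keyword arguments (seed, temperature, top_k, ...) are passed
    through in the "options" object, overriding the Modelfile defaults.
    """
    body = {"model": model, "prompt": prompt, "stream": False}
    if options:
        body["options"] = options
    return body


def generate(model, prompt, **options):
    """POST the request and return the parsed JSON response."""
    data = json.dumps(build_generate_request(model, prompt, **options)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example (requires a running Ollama server):
# result = generate("llama3.2", "Why is the sky blue?", seed=42, temperature=0.8)
# print(result["response"])
```

Because `stream` is set to `false`, the server returns a single JSON object rather than a stream of partial responses.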

Request

Body Params application/json
  • model: string, required. Example: "llama3.2"
  • prompt: string, required
  • stream: boolean, required. Example: false
  • options: object, required
    • num_keep: integer, required. Example: 1024
    • seed: integer, required. Example: 42
    • num_predict: integer, required. Example: 128
    • top_k: integer, required. Example: 40
    • top_p: number, required. Example: 0.9
    • temperature: number, required. Example: 0.8
    • repeat_penalty: number, required. Example: 1.1
    • stop: array[string], required. Example: ["\n","。"]
Responses

🟢 200 Success
application/json

Body
  • model: string, required. The model that produced the response
  • created_at: string, required. Timestamp of when the response was created
  • response: string, required. The generated completion text
  • done: boolean, required. Whether generation has finished
  • context: array[integer], required. An encoding of the conversation; send it in a follow-up request to keep conversational memory
  • total_duration: integer, required. Total time spent on the request, in nanoseconds
  • load_duration: integer, required. Time spent loading the model, in nanoseconds
  • prompt_eval_count: integer, required. Number of tokens in the prompt
  • prompt_eval_duration: integer, required. Time spent evaluating the prompt, in nanoseconds
  • eval_count: integer, required. Number of tokens in the response
  • eval_duration: integer, required. Time spent generating the response, in nanoseconds
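Since the duration fields are reported in nanoseconds, generation speed in tokens per second can be derived from eval_count and eval_duration. A small sketch, using the values from the response example above:

```python
def tokens_per_second(eval_count, eval_duration_ns):
    """Convert a token count and a nanosecond duration to tokens/second."""
    return eval_count / eval_duration_ns * 1e9


# Values from the response example above: 237 tokens in 4289432000 ns,
# which works out to roughly 55 tokens per second.
print(round(tokens_per_second(237, 4289432000), 2))
```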