Ollama API
  1. Create a Model
Ollama API
  • Endpoints
  • Conventions
  • Generate a completion
    • Overview
    • Generate request (Streaming)
      POST
    • Request (No streaming)
      POST
    • Request (with suffix)
      POST
    • Request (Structured outputs)
      POST
    • Request (JSON mode)
      POST
    • Request (with images)
      POST
    • Request (Raw Mode)
      POST
    • Request (Reproducible outputs)
      POST
    • Generate request (With options)
      POST
    • Load a model
      POST
    • Unload a model
      POST
  • Generate a chat completion
    • Overview
    • Chat Request (Streaming)
      POST
    • Chat request (No streaming)
      POST
    • Chat request (Structured outputs)
      POST
    • Chat request (With History)
      POST
    • Chat request (with images)
      POST
    • Chat request (Reproducible outputs)
      POST
    • Chat request (with tools)
      POST
    • Load a model
      POST
    • Unload a model
      POST
  • Create a Model
    • Overview
    • Create a new model
      POST
    • Quantize a model
      POST
    • Create a model from GGUF
      POST
    • Create a model from a Safetensors directory
      POST
  • Check if a Blob Exists
    • Overview
  • Push a Blob
    • Overview
  • List Local Models
    • Overview
    • Examples
  • Show Model Information
    • Overview
    • Examples
  • Copy a Model
    • Overview
    • Examples
  • Delete a Model
    • Overview
    • Examples
  • Pull a Model
    • Overview
    • Examples
  • Push a Model
    • Overview
  • Generate Embeddings
    • Overview
    • Examples
    • Request (Multiple input)
  • List Running Models
    • Overview
    • Examples
  • Generate Embedding
    • Overview
    • Examples
  • Version
    • Overview
  1. Create a Model

Quantize a model

POST
http://localhost:11434/api/create
Quantize a non-quantized model.
Request Request Example
Shell
JavaScript
Java
Swift
curl --location --request POST 'http://localhost:11434/api/create' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "llama3.1:quantized",
    "from": "llama3.1:8b-instruct-fp16",
    "quantize": "q4_K_M"
}'
Response Response Example
{"status":"quantizing F16 model to Q4_K_M"}
{"status":"creating new layer sha256:667b0c1932bc6ffc593ed1d03f895bf2dc8dc6df21db3042284a6f4416b06a29"}
{"status":"using existing layer sha256:11ce4ee3e170f6adebac9a991c22e22ab3f8530e154ee669954c4bc73061c258"}
{"status":"using existing layer sha256:0ba8f0e314b4264dfd19df045cde9d4c394a52474bf92ed6a3de22a4ca31a177"}
{"status":"using existing layer sha256:56bb8bd477a519ffa694fc449c2413c6f0e1d3b1c88fa7e3c9d88d3ae49d4dcb"}
{"status":"creating new layer sha256:455f34728c9b5dd3376378bfb809ee166c145b0b4c1f1a6feca069055066ef9a"}
{"status":"writing manifest"}
{"status":"success"}

Request

Body Params application/json
model
enum<string> 
required
Allowed value:
llama3.1:quantized
Example:
llama3.1:quantized
from
enum<string> 
required
Allowed value:
llama3.1:8b-instruct-fp16
Example:
llama3.1:8b-instruct-fp16
quantize
enum<string> 
required
Allowed value:
q4_K_M
Example:
q4_K_M
Examples

Responses

🟢200Success
application/json
A stream of JSON objects is returned:
Body
status
string 
required
Previous
Create a new model
Next
Create a model from GGUF
Built with