Generate a response for a given prompt with a provided model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and additional data from the request.

### Parameters
- `model`: (required) the model name
- `prompt`: the prompt to generate a response for
- `suffix`: the text after the model response
- `images`: (optional) a list of base64-encoded images (for multimodal models such as `llava`)
Advanced parameters (optional):

- `format`: the format to return a response in. Format can be `json` or a JSON schema
- `options`: additional model parameters listed in the documentation for the Modelfile, such as `temperature`
- `system`: system message (overrides what is defined in the Modelfile)
- `template`: the prompt template to use (overrides what is defined in the Modelfile)
- `stream`: if `false`, the response will be returned as a single response object rather than a stream of objects
- `raw`: if `true`, no formatting will be applied to the prompt. You may choose to use the `raw` parameter if you are specifying a full templated prompt in your request to the API
- `keep_alive`: controls how long the model will stay loaded into memory following the request (default: `5m`)
- `context` (deprecated): the context parameter returned from a previous request to `/generate`; this can be used to keep a short conversational memory
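As a concrete illustration, here is a minimal sketch of a streaming request using Python's `requests` library. The base URL assumes the server's default local address (`http://localhost:11434`), and the model name `llama3.2` is a placeholder for any model you have available.

```python
import json

import requests

# Assumed default local server address; adjust for your deployment.
url = "http://localhost:11434/api/generate"
payload = {
    "model": "llama3.2",  # placeholder: any model you have pulled
    "prompt": "Why is the sky blue?",
}

# The endpoint streams one JSON object per line; the final object has
# "done": true and includes the request statistics.
with requests.post(url, json=payload, stream=True) as response:
    for line in response.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            print()
```

Setting `"stream": False` in the payload instead returns the whole completion as one response object, which is often simpler when you do not need incremental output.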
### Structured outputs

Structured outputs are supported by providing a JSON schema in the `format` parameter. The model will generate a response that matches the schema. See the structured outputs example below.
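A sketch of a structured-output request, under the same assumptions as above (default local address, placeholder model name and prompt): the `format` field carries a JSON schema rather than the string `json`, and `stream` is set to `false` so the reply arrives as one object.

```python
import json

import requests

# A JSON schema describing the shape the model's reply must match.
schema = {
    "type": "object",
    "properties": {
        "age": {"type": "integer"},
        "available": {"type": "boolean"},
    },
    "required": ["age", "available"],
}

payload = {
    "model": "llama3.2",  # placeholder model name
    "prompt": "Ollama is 22 years old and is busy saving the world. "
              "Respond using JSON.",
    "format": schema,  # a JSON schema, not the string "json"
    "stream": False,   # return a single response object
}

response = requests.post("http://localhost:11434/api/generate", json=payload)

# The generated text is itself a JSON document matching the schema.
print(json.loads(response.json()["response"]))
```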
### JSON mode

Enable JSON mode by setting the `format` parameter to `json`. This will structure the response as a valid JSON object. See the JSON mode example below.

It's important to instruct the model to use JSON in the prompt.
Otherwise, the model may generate large amounts of whitespace.
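A matching sketch for JSON mode, again assuming the default local address and a placeholder model name; note that the prompt itself asks for JSON, per the caveat above.

```python
import json

import requests

payload = {
    "model": "llama3.2",  # placeholder model name
    "prompt": "What color is the sky at different times of the day? "
              "Respond using JSON.",
    "format": "json",  # enable JSON mode
    "stream": False,
}

response = requests.post("http://localhost:11434/api/generate", json=payload)

# With format set to "json", the response text parses as a single JSON object.
print(json.loads(response.json()["response"]))
```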