
Using Models

Cortex’s Models API is compatible with OpenAI’s Models endpoint; it is a fork of the OpenAI API adapted for model management. In addition, Cortex exposes lower-level model operations, such as downloading models from a model hub and loading models into an engine.

Model Operations

Model Operations let you pull, run, and stop models.

Run Model

curl --request POST \
  --url http://localhost:39281/v1/models/mistral/start \
  --header 'Content-Type: application/json' \
  --data '{
    "prompt_template": "system\n{system_message}\nuser\n{prompt}\nassistant",
    "stop": [],
    "ngl": 4096,
    "ctx_len": 4096,
    "cpu_threads": 10,
    "n_batch": 2048,
    "caching_enabled": true,
    "grp_attn_n": 1,
    "grp_attn_w": 512,
    "mlock": false,
    "flash_attn": true,
    "cache_type": "f16",
    "use_mmap": true,
    "engine": "llamacpp"
  }'
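
The same start request can be built and sent from Python. The following is a minimal sketch using only the standard library; the endpoint and parameter names mirror the curl call above, and a Cortex server listening on localhost:39281 is assumed.

```python
import json
import urllib.request

BASE_URL = "http://localhost:39281"

def build_start_request(model: str, **params) -> urllib.request.Request:
    """Build the POST request that starts a model, mirroring the curl call above."""
    url = f"{BASE_URL}/v1/models/{model}/start"
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_start_request("mistral", ctx_len=4096, ngl=4096, engine="llamacpp")
# urllib.request.urlopen(req)  # uncomment when the Cortex server is running
```

Only the parameters you pass end up in the JSON body, so you can supply as few or as many of the options shown above as you need.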

Stop Model

curl --request POST \
  --url http://localhost:39281/v1/models/mistral/stop

Pull Model

curl --request POST \
  --url http://localhost:39281/v1/models/mistral/pull
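
Pulling and starting are separate steps, so a typical first run chains them. A hypothetical sketch (standard library only; assumes the server at localhost:39281 and the endpoints shown in this section):

```python
import urllib.request

BASE_URL = "http://localhost:39281"

def model_action_url(model: str, action: str) -> str:
    """URL for a model operation endpoint: pull, start, or stop."""
    return f"{BASE_URL}/v1/models/{model}/{action}"

def run_model(model: str) -> None:
    """Download the model first, then load it into the engine."""
    for action in ("pull", "start"):
        req = urllib.request.Request(model_action_url(model, action), method="POST")
        urllib.request.urlopen(req)  # raises on non-2xx responses
```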

Model Management

Model Management lets you manage your local models, which are stored under your home directory in ~/cortex/models.

List Models

curl --request GET \
  --url http://localhost:39281/v1/models
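
Because the endpoint is OpenAI-compatible, the response can be assumed to follow OpenAI's list envelope ({"object": "list", "data": [...]}); that shape is an assumption here, not something this page documents. A sketch that extracts the model ids:

```python
import json

def model_ids(response_body: str) -> list[str]:
    """Extract model ids, assuming the OpenAI-style list envelope."""
    payload = json.loads(response_body)
    return [m["id"] for m in payload.get("data", [])]

# Example response in the assumed shape:
sample = '{"object": "list", "data": [{"id": "mistral"}, {"id": "llama3"}]}'
```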

Get Model

curl --request GET \
  --url http://localhost:39281/v1/models/mistral

Delete Model

curl --request DELETE \
  --url http://localhost:39281/v1/models/mistral

Update Model

curl --request PATCH \
  --url http://localhost:39281/v1/models/mistral \
  --header 'Content-Type: application/json' \
  --data '{}'
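
The curl above sends an empty body; in practice you PATCH only the fields you want to change. A sketch of building that request (the field name used in the example is borrowed from the start payload above as an assumption; which keys the endpoint accepts is not specified on this page):

```python
import json
import urllib.request

def build_update_request(model: str, **fields) -> urllib.request.Request:
    """PATCH only the supplied fields on a model."""
    url = f"http://localhost:39281/v1/models/{model}"
    return urllib.request.Request(
        url,
        data=json.dumps(fields).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )

req = build_update_request("mistral", ctx_len=2048)
# urllib.request.urlopen(req)  # uncomment when the Cortex server is running
```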