Getting Started with Cortex
Installation
Cortex ships with a Local Installer that bundles all required dependencies, so once the installer has been downloaded, no internet connection is needed during installation.
Starting the Server
Cortex runs an API server on localhost:39281 by default. The port can be customized in .cortexrc with the apiServerPort parameter.
# Start the API server (default port 39281)
cortex start
# Start the server on a custom port
cortex -p <port_number>
# Start the server with a custom data folder
cortex --data_folder_path <your_directory>
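The apiServerPort setting mentioned above lives in the .cortexrc file. Below is a minimal sketch of what the relevant entries might look like, assuming YAML-style key/value pairs; the data folder key name is an assumption, so verify both against the file Cortex generates on first run.

# .cortexrc (sketch; exact keys and format may vary by version)
apiServerPort: 39281        # default port shown above
dataFolderPath: ~/cortex    # assumption: key name for the data folder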
Engine Management
Cortex supports specialized engines for different multi-modal foundation models: llama.cpp and ONNXRuntime. By default, Cortex installs llama.cpp as its main engine.
For more information, check out Engine Management.
List Available Engines
curl --request GET \
  --url http://127.0.0.1:39281/v1/engines
Install an Engine
curl http://127.0.0.1:39281/v1/engines/llama-cpp/install \
  --request POST \
  --header 'Content-Type: application/json'
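To confirm the engine installed correctly, you can re-run the engine listing request shown above; the newly installed engine should now appear in the response.

curl --request GET \
  --url http://127.0.0.1:39281/v1/engines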
Model Management
Pull a Model
You can download models from:
- Cortex Built-in Models
- Hugging Face (GGUF): cortex pull <author/ModelRepo>

# Pull a built-in model
cortex pull llama3.3
# Or pull a specific model from Hugging Face
cortex pull bartowski/Meta-Llama-3.1-8B-Instruct-GGUF
curl --request POST \
  --url http://127.0.0.1:39281/v1/models/pull \
  -H "Content-Type: application/json" \
  --data '{"model": "tinyllama:1b-gguf-q3-km"}'
All model files are stored in the ~/cortex/models folder.
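Because models are stored as ordinary files, you can inspect what has been downloaded directly from the shell. The path below assumes the default data folder; adjust it if you changed --data_folder_path or .cortexrc.

# List downloaded model files (default location)
ls ~/cortex/models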
Stop Model Download
curl --request DELETE \
  --url http://127.0.0.1:39281/v1/models/pull \
  --header 'Content-Type: application/json' \
  --data '{"taskId": "tinyllama:tinyllama:1b-gguf-q3-km"}'
List All Models
curl --request GET \
  --url http://127.0.0.1:39281/v1/models
Delete a Model
curl --request DELETE \
  --url http://127.0.0.1:39281/v1/models/tinyllama:1b-gguf-q3-km
Running Models
Start a Model
# This downloads (if needed) and starts the model in one command
cortex run llama3.3
curl --request POST \
  --url http://127.0.0.1:39281/v1/models/start \
  --header 'Content-Type: application/json' \
  --data '{"model": "llama3.1:8b-gguf-q4-km"}'
Create Chat Completion
curl --request POST \
  --url http://localhost:39281/v1/chat/completions \
  -H "Content-Type: application/json" \
  --data '{
    "model": "llama3.1:8b-gguf",
    "messages": [
      { "role": "user", "content": "Write a Haiku about cats and AI" }
    ],
    "stream": false
  }'
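Since the request follows the familiar chat-completions format, you can pull just the reply text out of the JSON with a tool such as jq. This is a sketch assuming a standard choices[0].message.content response shape and that jq is installed.

# Same request, printing only the assistant's reply
curl --silent --request POST \
  --url http://localhost:39281/v1/chat/completions \
  -H "Content-Type: application/json" \
  --data '{
    "model": "llama3.1:8b-gguf",
    "messages": [
      { "role": "user", "content": "Write a Haiku about cats and AI" }
    ],
    "stream": false
  }' | jq -r '.choices[0].message.content'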
System Status
Check which models are running and the system's hardware status (RAM, VRAM, engine, uptime).
cortex ps
Stop a Model
cortex models stop llama3.3
curl --request POST \
  --url http://127.0.0.1:39281/v1/models/stop \
  --header 'Content-Type: application/json' \
  --data '{"model": "tinyllama:1b-gguf"}'
Stopping the Server
cortex stop
curl --request DELETE \
  --url http://127.0.0.1:39281/processManager/destroy
What’s Next?
Now that Cortex is set up, you can continue to:
- Adjust the folder path and configuration using the .cortexrc file
- Explore Cortex's data folder to understand how data gets stored
- Learn about the structure of the model.yaml file in Cortex