Getting Started with Cortex
Installation
Cortex ships with a Local Installer that bundles all required dependencies, so once the installer has been downloaded, no internet connection is needed during installation.
Starting the Server
Cortex runs an API server on localhost:39281 by default. The port can be customized in .cortexrc with the apiServerPort parameter.
# Start the API server (default port 39281)
cortex start
# Start the server on a custom port
cortex -p <port_number>
# Start the server with a custom data folder
cortex --data_folder_path <your_directory>
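The apiServerPort setting mentioned above lives in the .cortexrc file. Below is a minimal sketch of what the relevant entries might look like, assuming YAML-style key/value pairs; the data folder key name is an assumption, so verify both against the file Cortex generates on first run.

# .cortexrc (sketch; exact keys and format may vary by version)
apiServerPort: 39281        # default port shown above
dataFolderPath: ~/cortex    # assumption: key name for the data folder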
Engine Management
Cortex supports specialized engines for different multi-modal foundation models: llama.cpp and ONNXRuntime. By default, Cortex installs llama.cpp as its main engine.
For more information, check out Engine Management.
List Available Engines
curl --request GET \
  --url http://127.0.0.1:39281/v1/engines
Install an Engine
curl http://127.0.0.1:39281/v1/engines/llama-cpp/install \
  --request POST \
  --header 'Content-Type: application/json'
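To confirm the engine installed correctly, you can re-run the engine listing request shown above; the newly installed engine should now appear in the response.

curl --request GET \
  --url http://127.0.0.1:39281/v1/engines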
Model Management
Pull a Model
You can download models from:
- Cortex Built-in Models
- Hugging Face (GGUF): cortex pull <author/ModelRepo>

# Pull a built-in model
cortex pull llama3.3
# Or pull a specific model from Hugging Face
cortex pull bartowski/Meta-Llama-3.1-8B-Instruct-GGUF
curl --request POST \
  --url http://127.0.0.1:39281/v1/models/pull \
  -H "Content-Type: application/json" \
  --data '{"model": "tinyllama:1b-gguf-q3-km"}'
All model files are stored in the ~/cortex/models folder.
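Because models are stored as ordinary files, you can inspect what has been downloaded directly from the shell. The path below assumes the default data folder; adjust it if you changed --data_folder_path or .cortexrc.

# List downloaded model files (default location)
ls ~/cortex/models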
Stop Model Download
curl --request DELETE \
  --url http://127.0.0.1:39281/v1/models/pull \
  --header 'Content-Type: application/json' \
  --data '{"taskId": "tinyllama:tinyllama:1b-gguf-q3-km"}'
List All Models
curl --request GET \
  --url http://127.0.0.1:39281/v1/models
Delete a Model
curl --request DELETE \
  --url http://127.0.0.1:39281/v1/models/tinyllama:1b-gguf-q3-km
Running Models
Start a Model
# This downloads (if needed) and starts the model in one command
cortex run llama3.3
curl --request POST \
  --url http://127.0.0.1:39281/v1/models/start \
  --header 'Content-Type: application/json' \
  --data '{"model": "llama3.1:8b-gguf-q4-km"}'
Create Chat Completion
curl --request POST \
  --url http://localhost:39281/v1/chat/completions \
  -H "Content-Type: application/json" \
  --data '{
    "model": "llama3.1:8b-gguf",
    "messages": [
      { "role": "user", "content": "Write a Haiku about cats and AI" }
    ],
    "stream": false
  }'
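Since the request follows the familiar chat-completions format, you can pull just the reply text out of the JSON with a tool such as jq. This is a sketch assuming a standard choices[0].message.content response shape and that jq is installed.

# Same request, printing only the assistant's reply
curl --silent --request POST \
  --url http://localhost:39281/v1/chat/completions \
  -H "Content-Type: application/json" \
  --data '{
    "model": "llama3.1:8b-gguf",
    "messages": [
      { "role": "user", "content": "Write a Haiku about cats and AI" }
    ],
    "stream": false
  }' | jq -r '.choices[0].message.content'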
System Status
Check which models are running and the system's hardware status (RAM, VRAM, engine, uptime).
cortex ps
Stop a Model
cortex models stop llama3.3
curl --request POST \
  --url http://127.0.0.1:39281/v1/models/stop \
  --header 'Content-Type: application/json' \
  --data '{"model": "tinyllama:1b-gguf"}'
Stopping the Server
cortex stop
curl --request DELETE \
  --url http://127.0.0.1:39281/processManager/destroy
What’s Next?
Now that Cortex is set up, you can continue to:
- Adjust the folder path and configuration using the .cortexrc file
- Explore Cortex's data folder to understand how data gets stored
- Learn about the structure of the model.yaml file in Cortex