CPU Inference Optimizations
Discover techniques to optimize LLM inference on CPU-only machines for better performance.
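One of the simplest and most impactful CPU optimizations is controlling the thread count used for matrix multiplication. Most CPU inference runtimes built on GGML or a BLAS backend honor the standard `OMP_NUM_THREADS` environment variable; matching it to your core count (rather than letting the runtime oversubscribe logical threads) often improves token throughput. This is a general technique, not a Cortex-specific flag — check your runtime's documentation for its own thread setting.

```shell
# Pin inference threads to the machine's CPU count (general technique,
# not Cortex-specific; most GGML/BLAS-backed runtimes honor this).
CORES=$(nproc)                  # logical CPU count on this machine
export OMP_NUM_THREADS="$CORES"
echo "inference threads: $OMP_NUM_THREADS"
```

On machines with hyperthreading, it can be worth benchmarking with half this value (the physical core count), since two inference threads sharing one core frequently contend for the same execution units.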
```shell
# Install Cortex
pip install cortexcpp

# Initialize with your first model
cortex pull llama3

# Start serving
cortex serve --model llama3
```
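Once the server is up, you can send it a chat request. The sketch below builds an OpenAI-style chat payload and shows the request shape; the `/v1/chat/completions` path and port `39281` are assumptions for illustration — check the output of `cortex serve` for the actual host and port it binds to.

```shell
# An OpenAI-style chat payload; "llama3" matches the model pulled above.
PAYLOAD='{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'

# Sanity-check that the payload is valid JSON before sending it.
echo "$PAYLOAD" | python3 -m json.tool > /dev/null && echo "payload OK"

# Uncomment once `cortex serve` is running (endpoint and port are
# assumptions -- verify them against the server's startup output):
# curl -s http://localhost:39281/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d "$PAYLOAD"
```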