Embeddings
Cortex now supports an embeddings endpoint that is fully compatible with OpenAI's. This tutorial shows you how to create embeddings in Cortex using the OpenAI Python SDK.
Creating embeddings with the OpenAI-compatible endpoint
1. Start the server and run a model in detached mode.
cortex run -d llama3.1:8b-gguf-q4-km
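If you want to confirm the model is loaded before continuing, the Cortex CLI can list active models (this assumes your Cortex version ships the ps command):
cortex ps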
2. Create a directory and a Python virtual environment, then start a Python or IPython shell.
mkdir test-embeddings
cd test-embeddings
python -m venv .venv
source .venv/bin/activate
pip install ipython openai
ipython
Import the necessary modules and create a client.
from openai import OpenAI
client = OpenAI(
    base_url="http://localhost:39281/v1",
    api_key="not-needed",
)
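Optionally, you can verify the connection by listing the models the server exposes. This is a minimal sketch; it assumes Cortex serves the OpenAI-compatible /v1/models route, which follows from the compatibility claim above.
# Optional: confirm the server is reachable and the model is listed
models = client.models.list()
print([m.id for m in models.data])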
3. Create embeddings
output_embs = client.embeddings.create(
    input="Roses are red, violets are blue, Cortex is great, and so is Jan too!",
    model="llama3.1:8b-gguf-q4-km",
    # encoding_format="base64",
)
print(output_embs)
CreateEmbeddingResponse(
    data=[
        Embedding(
            embedding=[-0.017303412780165672, -0.014513173140585423, ...],
            index=0,
            object='embedding'
        )
    ],
    model='llama3.1:8b-gguf-q4-km',
    object='list',
    usage=Usage(prompt_tokens=22, total_tokens=22)
)
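The response mirrors the OpenAI embeddings schema, so the raw vector lives under data[0].embedding. A quick way to inspect it:
# Each item in `data` carries one embedding vector
vec = output_embs.data[0].embedding
print(len(vec))                         # dimensionality of the model's embeddings
print(output_embs.usage.prompt_tokens)  # tokens consumed by the request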
Cortex also supports the same input types as OpenAI's embeddings endpoint: a single string, a list of strings, a list of tokens, or a list of token lists.
MODEL = "llama3.1:8b-gguf-q4-km"

# input as a single string
response = client.embeddings.create(input="single prompt or article or other", model=MODEL)

# input as an array of strings
response = client.embeddings.create(input=["list", "of", "prompts"], model=MODEL)

# input as an array of tokens
response = client.embeddings.create(input=[12, 44, 123], model=MODEL)

# input as an array of arrays of tokens
response = client.embeddings.create(input=[[912, 312, 54], [12, 433, 1241]], model=MODEL)
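To see the vectors doing something useful, here is a minimal cosine-similarity sketch. It assumes numpy is installed (pip install numpy, which the earlier install step does not cover); the two prompts are arbitrary examples.
import numpy as np

def cosine_similarity(a, b):
    # standard cosine similarity between two embedding vectors
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

resp = client.embeddings.create(
    input=["Roses are red", "Cortex is great"],
    model=MODEL,
)
v1, v2 = (item.embedding for item in resp.data)
print(cosine_similarity(v1, v2))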