API Reference

Two endpoints for discovering embedding models and generating vectors. Both require a Bearer token.

GET /v1/models
POST /v1/embeddings

Authentication

Both endpoints require an Authorization header with a Bearer token. Generate keys from the dashboard.

Authorization: Bearer $API_KEY
GET /v1/models

Returns all available embedding models with their capabilities and limits.

curl https://api.vectors.space/v1/models \
  -H "Authorization: Bearer $API_KEY"
Response — model object

- `id` (string) — Model identifier used in embedding requests.
- `type` (string) — Always "embedding" for embedding models.
- `provider` (string) — Inference provider (e.g. llama).
- `embedding_dim` (number) — Default output vector dimension.
- `max_input_tokens` (number) — Maximum tokens the model accepts per input.
- `max_batch_inputs` (number) — Maximum number of inputs per request.
- `max_batch_tokens` (number) — Maximum total tokens across the entire batch.
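The three limit fields above can be checked client-side before sending a batch. A minimal sketch, assuming a crude 4-characters-per-token estimate (the API uses its own estimator, so treat this as a pre-flight check, not a guarantee):

```python
# Sketch: pre-flight check of a batch against the limits reported by
# GET /v1/models. `estimate_tokens` is a hypothetical stand-in for your
# token estimator; the dict keys match the model object fields above.

def estimate_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token (assumption, not the API's estimator).
    return max(1, len(text) // 4)

def check_batch(model: dict, inputs: list[str]) -> list[str]:
    """Return a list of limit violations; empty means the batch looks OK."""
    problems = []
    if len(inputs) > model["max_batch_inputs"]:
        problems.append(f"too many inputs: {len(inputs)} > {model['max_batch_inputs']}")
    total = 0
    for i, text in enumerate(inputs):
        tokens = estimate_tokens(text)
        total += tokens
        if tokens > model["max_input_tokens"]:
            problems.append(f"input {i} exceeds max_input_tokens")
    if total > model["max_batch_tokens"]:
        problems.append(f"batch exceeds max_batch_tokens ({total} > {model['max_batch_tokens']})")
    return problems

model = {"max_input_tokens": 2048, "max_batch_inputs": 2, "max_batch_tokens": 4096}
problems = check_batch(model, ["short text", "x" * 10000, "third"])
```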
POST /v1/embeddings

Generate embeddings for one or more texts. Pass a string or an array of strings to input.

curl -X POST https://api.vectors.space/v1/embeddings \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "embeddinggemma-300m",
    "provider": "llama",
    "input": "text to embed",
    "content_type": "text",
    "output_dimension": 768,
    "strategy": {
      "type": "fail",
      "max_tokens": 2048
    }
  }'
Request body

- `model` (string, required) — Model ID. Use GET /v1/models to list available IDs.
- `provider` (string) — Inference provider (e.g. llama).
- `input` (string | string[], required) — Text or array of texts to embed.
- `strategy` (object) — Overflow strategy. Defaults to fail with the model's max_input_tokens. See Strategies below.
- `content_type` (string) — Optional. Adjusts token estimation multipliers for more accurate counting.
- `output_dimension` (number) — Optional. Reduces the output vector to this size. Also accepted as output_dim.
Response

- `data[].embedding` (number[]) — The embedding vector.
- `data[].index` (number) — Position of this result relative to the request input.
- `provider` (string) — Inference provider used.
- `output_dimension` (number) — Final resolved vector dimension.
- `strategy` (object) — Normalized strategy with all fields resolved.
- `content_type` (string) — Echoed from the request.
- `usage.prompt_tokens` (number) — Actual prompt tokens billed.
- `usage.total_tokens` (number) — Total tokens billed.
- `usage.estimated_prompt_tokens` (number) — Estimated token count used for strategy decisions.
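The same request the curl example sends can be assembled with only the Python standard library. A sketch (the API key is a placeholder and nothing is sent here; field names follow the request body reference above):

```python
# Sketch: building an /v1/embeddings request with the standard library.
# No network call is made; the Request object is just constructed.
import json
import urllib.request

def build_embeddings_request(api_key: str, model: str, texts, **options):
    payload = {"model": model, "input": texts, **options}
    return urllib.request.Request(
        "https://api.vectors.space/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_embeddings_request(
    "sk-example", "embeddinggemma-300m", ["text to embed"],
    provider="llama", output_dimension=768,
    strategy={"type": "fail", "max_tokens": 2048},
)
# To send: urllib.request.urlopen(req), then json-decode the response body.
```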

Strategies

The strategy object controls what happens when an input exceeds max_tokens. If max_tokens is omitted, it defaults to the model's max_input_tokens; when set explicitly, it must not exceed the model's max_input_tokens.

fail

Returns an error if any input exceeds max_tokens. Use this for strict pipelines where oversized inputs should never be silently altered.

curl -sS https://api.vectors.space/v1/embeddings \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "embeddinggemma-300m",
    "provider": "llama",
    "input": "very long text...",
    "strategy": { "type": "fail", "max_tokens": 2048 }
  }'

truncate

Trims the input to fit within max_tokens. Accepts a string or array. Each response item includes truncation metadata.

curl -sS https://api.vectors.space/v1/embeddings \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "embeddinggemma-300m",
    "provider": "llama",
    "input": ["first long input", "second long input"],
    "strategy": { "type": "truncate", "max_tokens": 2048 }
  }'
Additional response fields

- `truncated` (boolean) — true if the input was trimmed to fit.
- `original_chars` (number) — Character count of the original input.
- `used_chars` (number) — Character count of the portion that was embedded.
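The truncation metadata can be mirrored client-side, e.g. to predict what the API will embed. A sketch, assuming a 4-characters-per-token estimate (the API uses its own estimator, so the cut point may differ):

```python
# Sketch: the truncate strategy's metadata, reproduced client-side.
# chars_per_token is an assumed heuristic, not the API's estimator.

def truncate_to_tokens(text: str, max_tokens: int, chars_per_token: int = 4) -> dict:
    limit = max_tokens * chars_per_token
    used = text[:limit]
    return {
        "text": used,
        "truncated": len(text) > limit,
        "original_chars": len(text),
        "used_chars": len(used),
    }
```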

chunk

Splits each input into overlapping chunks. chunk_overlap controls overlap between chunks (token-estimate based). Use pooling to control the output shape.

chunk strategy fields

- `max_tokens` (number) — Max tokens per chunk. Defaults to model max_input_tokens.
- `chunk_overlap` (number) — Token overlap between consecutive chunks.
- `pooling` ("none" | "mean") — none (default): one embedding per chunk. mean: one embedding per input, averaged across all its chunks.
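The sliding-window behavior these fields describe can be sketched as follows. This is an illustration under an assumed character-based token estimate; the API's chunk boundaries are computed from its own estimator, and the offsets here are character indexes standing in for the API's rune indexes:

```python
# Sketch: overlapping chunk spans by estimated token budget, as the chunk
# strategy describes. Each span advances by (max_tokens - chunk_overlap).

def chunk_spans(text: str, max_tokens: int, chunk_overlap: int = 0,
                chars_per_token: int = 4):
    size = max_tokens * chars_per_token
    step = (max_tokens - chunk_overlap) * chars_per_token
    assert step > 0, "chunk_overlap must be smaller than max_tokens"
    spans, start = [], 0
    while start < len(text):
        end = min(start + size, len(text))
        spans.append((start, end))
        if end == len(text):
            break
        start += step
    return spans
```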
pooling: none

Returns one embedding per chunk with source-mapping fields.

curl -sS https://api.vectors.space/v1/embeddings \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "embeddinggemma-300m",
    "provider": "llama",
    "input": ["first input", "second input"],
    "strategy": {
      "type": "chunk",
      "max_tokens": 1024,
      "chunk_overlap": 128,
      "pooling": "none"
    }
  }'
Per-chunk response fields

- `input_index` (number) — Index of the source input in the request array.
- `chunk_index` (number) — Position of this chunk within its source input.
- `chunk_start` (number) — Start rune index in the original input string.
- `chunk_end` (number) — End rune index in the original input string.

Output order follows input order then chunk order. Offsets are rune indexes in the original string.
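Because of that ordering, the flat data array can be regrouped into one list of chunk results per input using input_index alone. A sketch (the sample items are illustrative, not real API output):

```python
# Sketch: regrouping a flat pooling:"none" response into per-input lists
# via the input_index field, relying on the documented ordering
# (input order, then chunk order).

def group_by_input(data: list[dict]) -> list[list[dict]]:
    groups: list[list[dict]] = []
    for item in data:
        while len(groups) <= item["input_index"]:
            groups.append([])
        groups[item["input_index"]].append(item)
    return groups

data = [
    {"input_index": 0, "chunk_index": 0, "embedding": [0.1]},
    {"input_index": 0, "chunk_index": 1, "embedding": [0.2]},
    {"input_index": 1, "chunk_index": 0, "embedding": [0.3]},
]
groups = group_by_input(data)
```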

pooling: mean

Returns one embedding per input by averaging all chunk vectors. Each item includes a chunks array with per-chunk metadata.

curl -sS https://api.vectors.space/v1/embeddings \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "embeddinggemma-300m",
    "provider": "llama",
    "input": ["first input", "second input"],
    "strategy": {
      "type": "chunk",
      "max_tokens": 1024,
      "chunk_overlap": 128,
      "pooling": "mean"
    }
  }'
Pooled response fields

- `input_index` (number) — Index of the source input in the request array.
- `chunk_count` (number) — Total number of chunks the input was split into.
- `chunks[].chunk_index` (number) — Position of this chunk within its source input.
- `chunks[].chunk_start` (number) — Start rune index in the original input string.
- `chunks[].chunk_end` (number) — End rune index in the original input string.
- `chunks[].estimated_tokens` (number) — Estimated token count for this chunk.
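The pooling itself is a plain element-wise average of the chunk vectors. A minimal sketch of that operation, assuming all chunk vectors share the same dimension:

```python
# Sketch: mean pooling as described above — one vector per input, the
# element-wise average of all its chunk embeddings.

def mean_pool(chunk_vectors: list[list[float]]) -> list[float]:
    assert chunk_vectors, "need at least one chunk vector"
    dim = len(chunk_vectors[0])
    return [sum(v[i] for v in chunk_vectors) / len(chunk_vectors)
            for i in range(dim)]
```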