API Reference

Two endpoints for discovering embedding models and generating vectors. Both require a Bearer token.

GET /v1/models
POST /v1/embeddings

Authentication

Both endpoints require an Authorization header with a Bearer token. Generate keys from the dashboard.

Authorization: Bearer $API_KEY
GET /v1/models

Returns all available embedding models with their capabilities and limits.

curl https://api.vectors.space/v1/models \
  -H "Authorization: Bearer $API_KEY"
Response — model object

- `id` (string) — Model identifier used in embedding requests.
- `type` (string) — Always "embedding" for embedding models.
- `provider` (string) — Inference provider (e.g. llama).
- `embedding_dim` (number) — Default output vector dimension.
- `max_input_tokens` (number) — Maximum tokens the model accepts per input.
- `max_batch_inputs` (number) — Maximum number of inputs per request.
- `max_batch_tokens` (number) — Maximum total tokens across the entire batch.
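The three limit fields above can be checked client-side before sending a batch. A minimal sketch, assuming a crude 4-characters-per-token estimate (the API uses its own estimator, so treat this as a pre-flight check, not a guarantee):

```python
# Sketch: pre-flight check of a batch against the limits reported by
# GET /v1/models. `estimate_tokens` is a hypothetical stand-in for your
# token estimator; the dict keys match the model object fields above.

def estimate_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token (assumption, not the API's estimator).
    return max(1, len(text) // 4)

def check_batch(model: dict, inputs: list[str]) -> list[str]:
    """Return a list of limit violations; empty means the batch looks OK."""
    problems = []
    if len(inputs) > model["max_batch_inputs"]:
        problems.append(f"too many inputs: {len(inputs)} > {model['max_batch_inputs']}")
    total = 0
    for i, text in enumerate(inputs):
        tokens = estimate_tokens(text)
        total += tokens
        if tokens > model["max_input_tokens"]:
            problems.append(f"input {i} exceeds max_input_tokens")
    if total > model["max_batch_tokens"]:
        problems.append(f"batch exceeds max_batch_tokens ({total} > {model['max_batch_tokens']})")
    return problems

model = {"max_input_tokens": 2048, "max_batch_inputs": 2, "max_batch_tokens": 4096}
problems = check_batch(model, ["short text", "x" * 10000, "third"])
```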
POST /v1/embeddings

Generate embeddings for one or more texts. Pass a string or an array of strings to input.

curl -X POST https://api.vectors.space/v1/embeddings \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "embeddinggemma-300m",
    "provider": "llama",
    "input": "text to embed",
    "content_type": "text",
    "output_dimension": 768,
    "strategy": {
      "type": "fail",
      "max_tokens": 2048
    }
  }'
Request body

- `model` (string, required) — Model ID. Use GET /v1/models to list available IDs.
- `provider` (string) — Inference provider (e.g. llama).
- `input` (string | string[], required) — Text or array of texts to embed.
- `strategy` (object) — Overflow strategy. Defaults to fail with the model's max_input_tokens. See Strategies below.
- `content_type` (string) — Optional. Adjusts token estimation multipliers for more accurate counting.
- `output_dimension` (number) — Optional. Reduces the output vector to this size. Also accepted as output_dim.
Response

- `data[].embedding` (number[]) — The embedding vector.
- `data[].index` (number) — Position of this result relative to the request input.
- `provider` (string) — Inference provider used.
- `output_dimension` (number) — Final resolved vector dimension.
- `strategy` (object) — Normalized strategy with all fields resolved.
- `content_type` (string) — Echoed from the request.
- `usage.prompt_tokens` (number) — Actual prompt tokens billed.
- `usage.total_tokens` (number) — Total tokens billed.
- `usage.estimated_prompt_tokens` (number) — Estimated token count used for strategy decisions.
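The same request the curl example sends can be assembled with only the Python standard library. A sketch (the API key is a placeholder and nothing is sent here; field names follow the request body reference above):

```python
# Sketch: building an /v1/embeddings request with the standard library.
# No network call is made; the Request object is just constructed.
import json
import urllib.request

def build_embeddings_request(api_key: str, model: str, texts, **options):
    payload = {"model": model, "input": texts, **options}
    return urllib.request.Request(
        "https://api.vectors.space/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_embeddings_request(
    "sk-example", "embeddinggemma-300m", ["text to embed"],
    provider="llama", output_dimension=768,
    strategy={"type": "fail", "max_tokens": 2048},
)
# To send: urllib.request.urlopen(req), then json-decode the response body.
```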

Strategies

The strategy object controls what happens when an input exceeds max_tokens. If max_tokens is omitted, it defaults to the model's max_input_tokens; when set explicitly, it must not exceed the model's max_input_tokens.

fail

Returns an error if any input exceeds max_tokens. Use this for strict pipelines where oversized inputs should never be silently altered.

curl -sS https://api.vectors.space/v1/embeddings \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "embeddinggemma-300m",
    "provider": "llama",
    "input": "very long text...",
    "strategy": { "type": "fail", "max_tokens": 2048 }
  }'

truncate

Trims the input to fit within max_tokens. Accepts a string or array. Each response item includes truncation metadata.

curl -sS https://api.vectors.space/v1/embeddings \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "embeddinggemma-300m",
    "provider": "llama",
    "input": ["first long input", "second long input"],
    "strategy": { "type": "truncate", "max_tokens": 2048 }
  }'
Additional response fields

- `truncated` (boolean) — true if the input was trimmed to fit.
- `original_chars` (number) — Character count of the original input.
- `used_chars` (number) — Character count of the portion that was embedded.
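The truncation metadata can be mirrored client-side, e.g. to predict what the API will embed. A sketch, assuming a 4-characters-per-token estimate (the API uses its own estimator, so the cut point may differ):

```python
# Sketch: the truncate strategy's metadata, reproduced client-side.
# chars_per_token is an assumed heuristic, not the API's estimator.

def truncate_to_tokens(text: str, max_tokens: int, chars_per_token: int = 4) -> dict:
    limit = max_tokens * chars_per_token
    used = text[:limit]
    return {
        "text": used,
        "truncated": len(text) > limit,
        "original_chars": len(text),
        "used_chars": len(used),
    }
```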

chunk

Splits each input into overlapping chunks. chunk_overlap controls overlap between chunks (token-estimate based). Use pooling to control the output shape.

chunk strategy fields

- `max_tokens` (number) — Max tokens per chunk. Defaults to model max_input_tokens.
- `chunk_overlap` (number) — Token overlap between consecutive chunks.
- `pooling` ("none" | "mean") — none (default): one embedding per chunk. mean: one embedding per input, averaged across all its chunks.
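The sliding-window behavior these fields describe can be sketched as follows. This is an illustration under an assumed character-based token estimate; the API's chunk boundaries are computed from its own estimator, and the offsets here are character indexes standing in for the API's rune indexes:

```python
# Sketch: overlapping chunk spans by estimated token budget, as the chunk
# strategy describes. Each span advances by (max_tokens - chunk_overlap).

def chunk_spans(text: str, max_tokens: int, chunk_overlap: int = 0,
                chars_per_token: int = 4):
    size = max_tokens * chars_per_token
    step = (max_tokens - chunk_overlap) * chars_per_token
    assert step > 0, "chunk_overlap must be smaller than max_tokens"
    spans, start = [], 0
    while start < len(text):
        end = min(start + size, len(text))
        spans.append((start, end))
        if end == len(text):
            break
        start += step
    return spans
```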
pooling: none

Returns one embedding per chunk with source-mapping fields.

curl -sS https://api.vectors.space/v1/embeddings \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "embeddinggemma-300m",
    "provider": "llama",
    "input": ["first input", "second input"],
    "strategy": {
      "type": "chunk",
      "max_tokens": 1024,
      "chunk_overlap": 128,
      "pooling": "none"
    }
  }'
Per-chunk response fields

- `input_index` (number) — Index of the source input in the request array.
- `chunk_index` (number) — Position of this chunk within its source input.
- `chunk_start` (number) — Start rune index in the original input string.
- `chunk_end` (number) — End rune index in the original input string.

Output order follows input order then chunk order. Offsets are rune indexes in the original string.
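Because of that ordering, the flat data array can be regrouped into one list of chunk results per input using input_index alone. A sketch (the sample items are illustrative, not real API output):

```python
# Sketch: regrouping a flat pooling:"none" response into per-input lists
# via the input_index field, relying on the documented ordering
# (input order, then chunk order).

def group_by_input(data: list[dict]) -> list[list[dict]]:
    groups: list[list[dict]] = []
    for item in data:
        while len(groups) <= item["input_index"]:
            groups.append([])
        groups[item["input_index"]].append(item)
    return groups

data = [
    {"input_index": 0, "chunk_index": 0, "embedding": [0.1]},
    {"input_index": 0, "chunk_index": 1, "embedding": [0.2]},
    {"input_index": 1, "chunk_index": 0, "embedding": [0.3]},
]
groups = group_by_input(data)
```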

pooling: mean

Returns one embedding per input by averaging all chunk vectors. Each item includes a chunks array with per-chunk metadata.

curl -sS https://api.vectors.space/v1/embeddings \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "embeddinggemma-300m",
    "provider": "llama",
    "input": ["first input", "second input"],
    "strategy": {
      "type": "chunk",
      "max_tokens": 1024,
      "chunk_overlap": 128,
      "pooling": "mean"
    }
  }'
Pooled response fields

- `input_index` (number) — Index of the source input in the request array.
- `chunk_count` (number) — Total number of chunks the input was split into.
- `chunks[].chunk_index` (number) — Position of this chunk within its source input.
- `chunks[].chunk_start` (number) — Start rune index in the original input string.
- `chunks[].chunk_end` (number) — End rune index in the original input string.
- `chunks[].estimated_tokens` (number) — Estimated token count for this chunk.
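The pooling itself is a plain element-wise average of the chunk vectors. A minimal sketch of that operation, assuming all chunk vectors share the same dimension:

```python
# Sketch: mean pooling as described above — one vector per input, the
# element-wise average of all its chunk embeddings.

def mean_pool(chunk_vectors: list[list[float]]) -> list[float]:
    assert chunk_vectors, "need at least one chunk vector"
    dim = len(chunk_vectors[0])
    return [sum(v[i] for v in chunk_vectors) / len(chunk_vectors)
            for i in range(dim)]
```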