Generating Embeddings

Pinecone hosts embedding models so you can generate vectors without managing your own embedding infrastructure. Call pc.inference.embed and pass your text inputs directly.

Basic usage

from pinecone import Pinecone

pc = Pinecone(api_key="your-api-key")

result = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=["The quick brown fox", "A second piece of text"],
    parameters={"input_type": "passage"},
)

for embedding in result:
    print(embedding.values[:5])   # first five values

The parameters dict is model-specific. Common keys:

  • input_type: "query" for search queries, "passage" for documents being indexed.

  • truncate: "END" (default) or "NONE" to raise an error on overlong input.
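
For example, a query-side call can set both keys in one parameters dict. A minimal sketch (the input text is illustrative):

result = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=["which fox jumped over the dog?"],
    parameters={
        "input_type": "query",   # embedding a search query, not a document
        "truncate": "NONE",      # fail on overlong input instead of cutting it off
    },
)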

Discover supported parameters for any model:

info = pc.inference.model.get("multilingual-e5-large")
print(info.supported_parameters)

Response: EmbeddingsList

embed returns an EmbeddingsList containing:

  • .data — list of DenseEmbedding or SparseEmbedding objects (one per input).

  • .model — model name used.

  • .usage.total_tokens — token count consumed.

Iterate to access individual embeddings:

for emb in result:
    print(emb.values)       # DenseEmbedding: list of floats
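
The list-level fields live on the result object itself:

print(result.model)               # name of the model that produced the embeddings
print(result.usage.total_tokens)  # tokens consumed by this call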

For sparse embeddings (e.g. pinecone-sparse-english-v0), access sparse_indices and sparse_values instead:

result = pc.inference.embed(
    model="pinecone-sparse-english-v0",
    inputs=["machine learning frameworks"],
    parameters={"input_type": "passage"},
)
sparse = result.data[0]
print(sparse.sparse_indices)
print(sparse.sparse_values)
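
The two lists are parallel: each entry in sparse_indices pairs with the value at the same position in sparse_values. For example:

for idx, val in zip(sparse.sparse_indices, sparse.sparse_values):
    print(idx, val)   # non-zero dimension index and its weight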

Some models return hybrid (dense + sparse) embeddings as two separate items per input.

Using the EmbedModel enum

Use the EmbedModel enum for tab-completion and typo safety:

from pinecone import Pinecone, EmbedModel

pc = Pinecone(api_key="your-api-key")

result = pc.inference.embed(
    model=EmbedModel.Multilingual_E5_Large,
    inputs=["search query"],
    parameters={"input_type": "query"},
)

Batch size

Send multiple inputs in a single call to amortize network overhead. The API caps how many inputs one call can accept (96 for multilingual-e5-large), so for large collections split the inputs into chunks and iterate:

texts = [...]   # potentially hundreds of documents

batch_size = 96
all_embeddings = []
for i in range(0, len(texts), batch_size):
    batch = texts[i : i + batch_size]
    result = pc.inference.embed(
        model="multilingual-e5-large",
        inputs=batch,
        parameters={"input_type": "passage"},
    )
    all_embeddings.extend(result.data)

Storing embeddings in an index

Extract raw values and upsert into a standard (non-integrated) index:

index = pc.Index("product-search")

vectors = [
    (f"doc-{i}", emb.values)
    for i, emb in enumerate(result.data)
]
index.upsert(vectors=vectors)
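
To search the index later, embed the query text with input_type="query" and pass the raw vector to query(). A minimal sketch (the query text and top_k are illustrative):

query_result = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=["what does the fox jump over?"],
    parameters={"input_type": "query"},
)
matches = index.query(vector=query_result.data[0].values, top_k=3)
print(matches)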

For server-side embedding (no manual embed step), use an integrated index and upsert_records() instead — see Integrated Records (Server-Side Embedding).

List available models

models = pc.inference.model.list(type="embed")
print(models.names())