Generating Embeddings¶
Pinecone hosts embedding models so you can generate vectors without managing your own
embedding infrastructure. Call pc.inference.embed and pass your text inputs directly.
Basic usage¶
from pinecone import Pinecone
pc = Pinecone(api_key="your-api-key")
result = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=["The quick brown fox", "A second piece of text"],
    parameters={"input_type": "passage"},
)
for embedding in result:
    print(embedding.values[:5])  # first five values
The parameters dict is model-specific. Common keys:
- input_type — "query" for search queries, "passage" for documents being indexed.
- truncate — "END" (default) or "NONE" to raise an error on overlong input.
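For example, a query-side call that sets both keys might look like this (a minimal sketch; the input text is illustrative):
result = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=["which framework supports sparse vectors?"],
    parameters={"input_type": "query", "truncate": "NONE"},  # raise on overlong input instead of truncating
)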
Discover supported parameters for any model:
info = pc.inference.model.get("multilingual-e5-large")
print(info.supported_parameters)
Response: EmbeddingsList¶
embed returns an EmbeddingsList containing:
- .data — list of DenseEmbedding or SparseEmbedding objects (one per input).
- .model — model name used.
- .usage.total_tokens — token count consumed.
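A quick look at the response metadata (a minimal sketch, reusing the result from the basic-usage call above):
print(result.model)               # "multilingual-e5-large"
print(result.usage.total_tokens)  # tokens consumed by this request
print(len(result.data))           # one embedding per input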
Iterate to access individual embeddings:
for emb in result:
    print(emb.values)  # DenseEmbedding: list of floats
For sparse embeddings (e.g. pinecone-sparse-english-v0), access sparse_indices
and sparse_values instead:
result = pc.inference.embed(
model="pinecone-sparse-english-v0",
inputs=["machine learning frameworks"],
)
sparse = result.data[0]
print(sparse.sparse_indices)
print(sparse.sparse_values)
Some models return hybrid (dense + sparse) embeddings as two separate items per input.
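If you are working with such a model, one way to separate the two is to check which attributes each item carries (a hedged sketch; the attribute names follow the dense and sparse examples above):
dense_items = [e for e in result.data if getattr(e, "values", None) is not None]
sparse_items = [e for e in result.data if getattr(e, "sparse_values", None) is not None]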
Using the EmbedModel enum¶
Use the EmbedModel enum for tab-completion and typo
safety:
from pinecone import Pinecone, EmbedModel
pc = Pinecone(api_key="your-api-key")
result = pc.inference.embed(
    model=EmbedModel.Multilingual_E5_Large,
    inputs=["search query"],
    parameters={"input_type": "query"},
)
Batch size¶
Send multiple inputs in a single call to amortize network overhead. The API enforces a per-call token limit; for large batches, split inputs into chunks and iterate:
texts = [...] # potentially hundreds of documents
batch_size = 96
all_embeddings = []
for i in range(0, len(texts), batch_size):
    batch = texts[i : i + batch_size]
    result = pc.inference.embed(model="multilingual-e5-large", inputs=batch)
    all_embeddings.extend(result.data)
Storing embeddings in an index¶
Extract raw values and upsert into a standard (non-integrated) index:
index = pc.Index("product-search")
vectors = [
(f"doc-{i}", emb.values)
for i, emb in enumerate(result.data)
]
index.upsert(vectors=vectors)
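If you generated sparse embeddings as in the earlier example, the same upsert call can carry them in the sparse_values field of each record. A hedged sketch, assuming dense_emb and sparse_emb are embeddings produced by the calls above and that the index accepts sparse values (for example a dotproduct index):
index.upsert(vectors=[
    {
        "id": "doc-0",
        "values": dense_emb.values,  # dense component (assumed from a dense embed call)
        "sparse_values": {
            "indices": sparse_emb.sparse_indices,  # sparse component (assumed from a sparse embed call)
            "values": sparse_emb.sparse_values,
        },
    },
])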
For server-side embedding (no manual embed step), use an integrated index and
upsert_records() instead — see
Integrated Records (Server-Side Embedding).
List available models¶
models = pc.inference.model.list(type="embed")
print(models.names())