GrpcIndex

Obtain a GrpcIndex instance via pinecone.Pinecone.index() with grpc=True, or construct one directly.

from pinecone import Pinecone

pc = Pinecone(api_key="your-api-key")

# Resolve host automatically by index name
idx = pc.index("my-index", grpc=True)

# — or — construct directly with a host URL
from pinecone.grpc import GrpcIndex
idx = GrpcIndex(host="my-index-abc123.svc.pinecone.io", api_key="your-api-key")

GrpcIndex exposes the same data-plane operations as Index but uses gRPC transport (backed by a Rust extension) and returns PineconeFuture objects from the *_async() methods.


class pinecone.grpc.GrpcIndex(*, host, api_key=None, api_version='2025-10', source_tag=None, secure=True, timeout=20.0, connect_timeout=1.0)[source]

Bases: object

Synchronous gRPC data plane client targeting a specific Pinecone index.

Provides the same interface as Index but routes data-plane operations through a gRPC transport (via the Rust-backed GrpcChannel) instead of HTTP/REST.

Parameters:
  • host (str) – The index-specific data plane host URL.

  • api_key (str | None) – Pinecone API key. Falls back to PINECONE_API_KEY env var.

  • api_version (str) – API version string. Defaults to the current data plane version.

  • source_tag (str | None) – Tag appended to the User-Agent string for request attribution.

  • secure (bool) – Whether to use TLS encryption. Defaults to True.

  • timeout (float) – Request timeout in seconds. Defaults to 20.0.

  • connect_timeout (float) – Connection timeout in seconds. Defaults to 1.0.

Raises:

ValidationError – If no API key can be resolved or the host is invalid.

Examples

from pinecone.grpc import GrpcIndex

idx = GrpcIndex(host="movie-recs-abc123.svc.pinecone.io", api_key="...")
__init__(*, host, api_key=None, api_version='2025-10', source_tag=None, secure=True, timeout=20.0, connect_timeout=1.0)[source]
Return type:

None

property host: str

The data plane host URL for this index.

upsert(*, vectors, namespace='', batch_size=None, max_concurrency=4, show_progress=True, timeout=None)[source]

Upsert a batch of vectors into a namespace.

If a vector with the same ID already exists in the namespace, it is overwritten.

Parameters:
  • vectors (Sequence[Vector | tuple[str, list[float]] | tuple[str, list[float], dict[str, Any]] | dict[str, Any]]) – Sequence of vectors to upsert. Each element can be a Vector instance, a tuple of (id, values) or (id, values, metadata), or a dict with id, values, and optional sparse_values / metadata keys.

  • namespace (str) – Target namespace. Defaults to the default (empty-string) namespace.

  • batch_size (int | None) – If set, splits vectors into batches of this size and submits them in parallel via a ThreadPoolExecutor. None (default) sends all vectors in a single channel call. Must be a positive integer when set.

  • max_concurrency (int) – Number of parallel threads used when batch_size is set. Default 4, range [1, 64]. Ignored when batch_size is None.

  • show_progress (bool) – If True and tqdm is installed, display a progress bar while submitting batches. Ignored when batch_size is None. Defaults to True.

  • timeout (float | None) – Per-call timeout in seconds. Applied per batch when batching. None uses the client-level default.

Returns:

UpsertResponse with the count of vectors upserted.

Raises:
  • TypeError – If a vector element is not a recognized format.

  • ValueError – If a vector element is malformed.

  • PineconeValueError – If batch_size is not a positive integer or max_concurrency is outside [1, 64].

  • PineconeTimeoutError – If the call exceeds timeout or the server returns CANCELLED with a timeout cause.

Return type:

UpsertResponse

Notes

When batch_size is set, batches are submitted in parallel via a ThreadPoolExecutor of max_concurrency workers (default 4, range 1–64). Per-batch retries are handled by the gRPC channel’s own retry policy. Partial failures do not raise — the returned UpsertResponse carries upserted_count, failed_item_count, errors, and failed_items for inspection / retry. Pass response.failed_items back to upsert(...) to retry only the failures.

Examples

from pinecone.grpc import GrpcIndex
from pinecone.models.vectors.vector import Vector

idx = GrpcIndex(host="article-search-abc123.svc.pinecone.io", api_key="...")
response = idx.upsert(
    vectors=[
        Vector(
            id="article-101",
            values=[0.012, -0.087, 0.153, ...],  # 1536-dim
        ),
        ("article-102", [0.045, 0.021, -0.064, ...]),
        {"id": "article-103", "values": [0.091, -0.032, 0.178, ...]},
    ],
    namespace="articles-en",
)
print(response.upserted_count)
query(*, top_k, vector=None, id=None, namespace='', filter=None, include_values=False, include_metadata=False, sparse_vector=None, scan_factor=None, max_candidates=None, timeout=None)[source]

Query a namespace for the nearest neighbors of a vector.

Parameters:
  • top_k (int) – Number of results to return (must be >= 1).

  • vector (list[float] | None) – Dense query vector values.

  • id (str | None) – ID of a stored vector to use as the query.

  • namespace (str) – Namespace to query. Defaults to the default namespace.

  • filter (dict[str, Any] | None) – Metadata filter expression.

  • include_values (bool) – Whether to include vector values in results.

  • include_metadata (bool) – Whether to include metadata in results.

  • sparse_vector (SparseValues | dict[str, Any] | None) – Sparse query vector with indices and values.

  • scan_factor (float | None) – DRN optimization — adjusts how much of the index is scanned. Range 0.5–4.0. Only supported for dedicated read node indexes. None uses server default.

  • max_candidates (int | None) – DRN optimization — caps candidate vectors to rerank. Range 1–100000. Only supported for dedicated read node indexes. None uses server default.

  • timeout (float | None) – Per-call timeout in seconds. None uses the client-level default.

Returns:

QueryResponse with matches, namespace, and usage info.

Raises:
  • ValidationError – If top_k < 1, both vector and id are provided, or none of vector, id, or sparse_vector are provided.

  • PineconeTimeoutError – If the call exceeds timeout or the server returns CANCELLED with a timeout cause.

Return type:

QueryResponse

Examples

response = idx.query(
    top_k=10,
    vector=[0.012, -0.087, 0.153, ...],  # 1536-dim embedding
)
for match in response.matches:
    print(match.id, match.score)
fetch(*, ids, namespace='', timeout=None)[source]

Fetch vectors by their IDs from a namespace.

Parameters:
  • ids (list[str]) – List of vector IDs to fetch (must be non-empty).

  • namespace (str) – Namespace to fetch from. Defaults to the default namespace.

  • timeout (float | None) – Per-call timeout in seconds. None uses the client-level default.

Returns:

FetchResponse with a map of vector IDs to Vector objects, namespace, and usage info.

Raises:
  • ValidationError – If ids is empty.

  • PineconeTimeoutError – If the call exceeds timeout or the server returns CANCELLED with a timeout cause.

Return type:

FetchResponse

Examples

response = idx.fetch(ids=["article-101", "article-102"])
for vid, vec in response.vectors.items():
    print(vid, vec.values)
delete(*, ids=None, delete_all=False, filter=None, namespace='', timeout=None)[source]

Delete vectors from a namespace by ID, filter, or delete-all flag.

Exactly one of ids, delete_all, or filter must be specified.

Parameters:
  • ids (list[str] | None) – List of vector IDs to delete.

  • delete_all (bool) – If True, delete all vectors in the namespace.

  • filter (dict[str, Any] | None) – Metadata filter expression selecting vectors to delete.

  • namespace (str) – Namespace to delete from. Defaults to the default namespace.

  • timeout (float | None) – Per-call timeout in seconds. None uses the client-level default.

Returns:

None

Raises:
  • ValidationError – If zero or more than one deletion mode is specified.

  • PineconeTimeoutError – If the call exceeds timeout or the server returns CANCELLED with a timeout cause.

Return type:

None

Examples

# Delete by IDs
idx.delete(ids=["article-101", "article-102"])

# Delete all vectors in a namespace
idx.delete(delete_all=True, namespace="articles-deprecated")

# Delete by metadata filter
idx.delete(filter={"category": {"$eq": "obsolete"}})
update(*, id=None, values=None, sparse_values=None, set_metadata=None, namespace='', filter=None, dry_run=False, timeout=None)[source]

Update vectors by ID or metadata filter.

Parameters:
  • id (str | None) – ID of the vector to update.

  • values (list[float] | None) – New dense vector values.

  • sparse_values (SparseValues | dict[str, Any] | None) – New sparse vector.

  • set_metadata (dict[str, Any] | None) – Metadata fields to set or overwrite.

  • namespace (str) – Namespace to target. Defaults to the default namespace.

  • filter (dict[str, Any] | None) – Metadata filter expression selecting vectors to update.

  • dry_run (bool) – If True, return the count of records that would be affected without applying changes.

  • timeout (float | None) – Per-call timeout in seconds. None uses the client-level default.

Returns:

UpdateResponse with matched_records count (when available).

Raises:
  • ValidationError – If both or neither of id and filter are provided.

  • PineconeTimeoutError – If the call exceeds timeout or the server returns CANCELLED with a timeout cause.

Return type:

UpdateResponse

Examples

# Update by ID
idx.update(id="article-101", values=[0.012, -0.087, 0.153, ...])

# Bulk-update metadata by filter
idx.update(
    filter={"genre": {"$eq": "drama"}},
    set_metadata={"year": 2020},
)
list_paginated(*, prefix=None, limit=None, pagination_token=None, namespace='', timeout=None)[source]

Fetch a single page of vector IDs from a namespace.

Parameters:
  • prefix (str | None) – Return only IDs starting with this prefix.

  • limit (int | None) – Maximum number of IDs to return in this page.

  • pagination_token (str | None) – Token from a previous response to fetch the next page.

  • namespace (str) – Namespace to list from. Defaults to the default namespace.

  • timeout (float | None) – Per-call timeout in seconds. None uses the client-level default.

Returns:

ListResponse with vector IDs, pagination info, namespace, and usage.

Raises:

PineconeTimeoutError – If the call exceeds timeout or the server returns CANCELLED with a timeout cause.

Return type:

ListResponse

Examples

response = idx.list_paginated(prefix="doc1#", limit=50)
for item in response.vectors:
    print(item.id)
list(*, prefix=None, limit=None, namespace='', timeout=None)[source]

List vector IDs in a namespace, automatically following pagination.

Yields one ListResponse per page.

Parameters:
  • prefix (str | None) – Return only IDs starting with this prefix.

  • limit (int | None) – Maximum number of IDs to return per page.

  • namespace (str) – Namespace to list from. Defaults to the default namespace.

  • timeout (float | None) – Per-call timeout in seconds applied to each page request. None uses the client-level default.

Yields:

ListResponse for each page of results.

Raises:

PineconeTimeoutError – If any page call exceeds timeout or the server returns CANCELLED with a timeout cause.

Return type:

Iterator[ListResponse]

Examples

for page in idx.list(prefix="doc1#"):
    for item in page.vectors:
        print(item.id)
describe_index_stats(*, filter=None, timeout=None)[source]

Return statistics for this index.

Parameters:
  • filter (dict[str, Any] | None) – Metadata filter expression. When provided, only vectors matching the filter are counted.

  • timeout (float | None)

Returns:

DescribeIndexStatsResponse with namespace summaries, dimension, total vector count, and fullness metrics.

Return type:

DescribeIndexStatsResponse

Examples

stats = idx.describe_index_stats()
print(stats.total_vector_count, stats.dimension)

# With filter — only count vectors matching the expression
stats = idx.describe_index_stats(
    filter={"genre": {"$eq": "drama"}}
)
upsert_from_dataframe(df, namespace='', batch_size=500, show_progress=True)[source]

Upsert vectors from a pandas DataFrame using async batching.

Splits the DataFrame into batches of batch_size rows and submits each batch asynchronously via upsert_async(), then aggregates the results.

Parameters:
  • df (pd.DataFrame) – A pandas.DataFrame with at least id and values columns. sparse_values and metadata columns are included when present and non-None.

  • namespace (str) – Target namespace. Defaults to the default namespace.

  • batch_size (int) – Number of rows per upsert batch. Defaults to 500.

  • show_progress (bool) – If True and tqdm is installed, display a progress bar. If tqdm is not installed, silently falls back to no progress bar.

Returns:

UpsertResponse with the total count of vectors upserted across all batches.

Return type:

UpsertResponse

Examples

import pandas as pd
from pinecone.grpc import GrpcIndex

idx = GrpcIndex(
    host="article-search-abc123.svc.pinecone.io",
    api_key="your-api-key",
)
df = pd.DataFrame([
    {"id": "article-101", "values": [0.012, -0.087, 0.153]},
    {"id": "article-102", "values": [0.045, 0.021, -0.064]},
])
response = idx.upsert_from_dataframe(df)
response.upserted_count

With metadata columns and a custom batch size:

df = pd.DataFrame([
    {
        "id": "article-101",
        "values": [0.012, -0.087, 0.153],
        "metadata": {"topic": "science", "year": 2024},
    },
    {
        "id": "article-102",
        "values": [0.045, 0.021, -0.064],
        "metadata": {"topic": "technology", "year": 2024},
    },
])
response = idx.upsert_from_dataframe(
    df,
    namespace="articles-en",
    batch_size=100,
)
upsert_async(*, vectors, namespace='', timeout=None)[source]

Submit an upsert operation and return a PineconeFuture.

Same parameters as upsert(), including timeout (float | None) which sets a per-call timeout in seconds.

Returns:

PineconeFuture [UpsertResponse] that resolves to the upsert result.

Return type:

PineconeFuture[UpsertResponse]

Examples

future = index.upsert_async(
    vectors=[("doc-42", [0.012, -0.087, 0.153])],
)
result = future.result()
result.upserted_count  # 1
query_async(*, top_k, vector=None, id=None, namespace='', filter=None, include_values=False, include_metadata=False, sparse_vector=None, scan_factor=None, max_candidates=None, timeout=None)[source]

Submit a query operation and return a PineconeFuture.

Same parameters as query(), including timeout (float | None) which sets a per-call timeout in seconds.

Returns:

PineconeFuture [QueryResponse] that resolves to the query result containing scored matches.

Return type:

PineconeFuture[QueryResponse]

Examples

future = index.query_async(
    vector=[0.012, -0.087, 0.153],
    top_k=5,
)
result = future.result()
result.matches[0].id    # 'doc-42'
result.matches[0].score  # 0.95
fetch_async(*, ids, namespace='', timeout=None)[source]

Submit a fetch operation and return a PineconeFuture.

Same parameters as fetch(), including timeout (float | None) which sets a per-call timeout in seconds.

Returns:

PineconeFuture [FetchResponse] that resolves to the fetched vectors keyed by ID.

Return type:

PineconeFuture[FetchResponse]

Examples

future = index.fetch_async(ids=["doc-42", "doc-43"])
result = future.result()
result.vectors["doc-42"].values  # [0.012, -0.087, 0.153]
delete_async(*, ids=None, delete_all=False, filter=None, namespace='', timeout=None)[source]

Submit a delete operation and return a PineconeFuture.

Same parameters as delete(), including timeout (float | None) which sets a per-call timeout in seconds.

Returns:

PineconeFuture [None] that resolves when the delete operation completes.

Return type:

PineconeFuture[None]

Examples

future = index.delete_async(ids=["doc-42", "doc-43"])
future.result()
future = index.delete_async(delete_all=True, namespace="docs")
future.result()
update_async(*, id=None, values=None, sparse_values=None, set_metadata=None, filter=None, namespace='', dry_run=False, timeout=None)[source]

Submit an update operation and return a PineconeFuture.

Same parameters as update(), including timeout (float | None) which sets a per-call timeout in seconds.

Return type:

PineconeFuture[UpdateResponse]

upsert_records(*, records, namespace, timeout=None)[source]

Upsert records for indexes with integrated inference.

Records are sent as newline-delimited JSON (NDJSON) over REST. Embeddings are generated server-side. This method delegates to the REST endpoint because the Pinecone gRPC API does not expose a records upsert operation.

Parameters:
  • records (list[dict[str, Any]]) – List of record dicts. Each must contain an _id or id field. Additional fields are passed through for server-side embedding.

  • namespace (str) – Target namespace (required). Use "" for the default namespace.

  • timeout (float | None)

Returns:

UpsertRecordsResponse with the count of records submitted.

Raises:
  • PineconeValueError – If namespace is not a string or is empty/whitespace, records is empty, or a record is missing an identifier field.

  • ApiError – If the API returns an error response.

  • PineconeConnectionError – If a network-level connection fails (DNS, refused, transport error).

  • PineconeTimeoutError – If the request exceeds the configured timeout.

Return type:

UpsertRecordsResponse

Examples

pc = Pinecone(api_key="YOUR_API_KEY")
idx = pc.index("my-index", grpc=True)
response = idx.upsert_records(
    namespace="articles-en",
    records=[
        {"_id": "article-101", "text": "Vector DBs enable similarity search."},
        {"_id": "article-102", "text": "RAG combines search with LLMs."},
    ],
)
print(response.record_count)
search(*, namespace, top_k, inputs=None, vector=None, id=None, filter=None, fields=None, rerank=None, match_terms=None, timeout=None)[source]

Search records by text, vector, or ID with optional reranking.

Delegates to the REST endpoint because the Pinecone gRPC API does not expose a records search operation for integrated inference indexes.

Note

Use this method for indexes with integrated inference. For classic indexes where you provide your own vectors, use query().

Parameters:
  • namespace (str) – Namespace to search in (required).

  • top_k (int) – Number of results to return (must be >= 1).

  • inputs (SearchInputs | dict[str, Any] | None) – Inputs for server-side embedding (e.g. {"text": "query text"}).

  • vector (list[float] | None) – Dense query vector values.

  • id (str | None) – ID of an existing record to use as the query.

  • filter (dict[str, Any] | None) – Metadata filter expression.

  • fields (list[str] | None) – Field names to include in results. When None, the server returns all available fields.

  • rerank (RerankConfig | dict[str, Any] | None) – Reranking configuration with model (required), rank_fields (required), and optional top_n, parameters, query keys. Use RerankConfig for IDE autocompletion.

  • match_terms (dict[str, Any] | None) – Term-matching constraint for sparse search. Requires keys "strategy" (currently only "all") and "terms" (list of strings). Only supported for sparse indexes using pinecone-sparse-english-v0. None disables term matching.

  • timeout (float | None)

Returns:

SearchRecordsResponse with hits and usage statistics.

Return type:

SearchRecordsResponse

Examples

response = idx.search(
    namespace="articles-en",
    top_k=10,
    inputs={"text": "benefits of vector databases for search"},
)
for hit in response.result.hits:
    print(hit.id, hit.score)

Search with reranking:

response = idx.search(
    namespace="articles-en",
    top_k=10,
    inputs={"text": "benefits of vector databases"},
    rerank={
        "model": "bge-reranker-v2-m3",
        "rank_fields": ["text"],
        "top_n": 5,
    },
)
for hit in response.result.hits:
    print(hit.id, hit.score)

Note

Use inline rerank when searching and reranking in a single call. Use pc.inference.rerank() when reranking results from a different source or when you need to rerank without searching.

search_records(*, namespace, top_k, inputs=None, vector=None, id=None, filter=None, fields=None, rerank=None, match_terms=None, timeout=None)[source]

Alias for search().

Prefer calling search() directly — this alias exists for backwards compatibility.

Return type:

SearchRecordsResponse

close()[source]

Close the underlying gRPC channel and REST client, and release associated resources.

Return type:

None

__enter__()[source]
Return type:

GrpcIndex

__exit__(*args)[source]
Parameters:

args (Any)

Return type:

None

PineconeFuture

*_async() methods on GrpcIndex return a PineconeFuture which is fully compatible with concurrent.futures.as_completed() and concurrent.futures.wait().

class pinecone.grpc.future.PineconeFuture(underlying)[source]

Bases: Future[_T]

Future returned by GrpcIndex.*_async() methods.

Wraps a concurrent.futures.Future and is fully compatible with concurrent.futures.as_completed() and concurrent.futures.wait().

The default result() timeout is 5 seconds. When the timeout elapses, PineconeTimeoutError is raised with the message "deadline exceeded".

Examples

from pinecone.grpc import GrpcIndex
idx = GrpcIndex(host="article-search-abc123.svc.pinecone.io", api_key="your-api-key")
future = idx.upsert_async(vectors=[("article-101", [0.012, -0.087, 0.153, ...])])
result = future.result()  # blocks up to 5 seconds
result.upserted_count
# 1
from concurrent.futures import as_completed
futures = [
    idx.upsert_async(vectors=[("article-101", [0.012, -0.087, 0.153, ...])]),
    idx.upsert_async(vectors=[("article-102", [0.045, 0.021, -0.064, ...])]),
]
for future in as_completed(futures):
    print(future.result().upserted_count)
Parameters:

underlying (Future[_T])

__init__(underlying)[source]

Initializes the future. Should not be called by clients.

Parameters:

underlying (Future[_T])

Return type:

None

add_done_callback(fn)[source]

Attach a callable to be called when the future finishes.

The callable will be called with the future as its only argument.

Parameters:

fn (Callable[[...], Any])

Return type:

None

cancel()[source]

Attempt to cancel the underlying call.

Returns True if the call was successfully cancelled, False if the call has already completed or is running.

Return type:

bool

cancelled()[source]

Return True if the call was successfully cancelled.

Return type:

bool

done()[source]

Return True if the call has completed or was cancelled.

Return type:

bool

exception(timeout=5.0)[source]

Return the exception raised by the call, or None.

Parameters:

timeout (float | None) – Maximum seconds to wait. Defaults to 5.0.

Raises:

PineconeTimeoutError – If timeout seconds elapse.

Return type:

BaseException | None

result(timeout=5.0)[source]

Return the result of the call that the future represents.

Parameters:

timeout (float | None) – Maximum seconds to wait. Defaults to 5.0. Pass None to block indefinitely.

Returns:

The result value set by the underlying future.

Raises:

PineconeTimeoutError – If timeout seconds elapse before the result is available.

Return type:

_T

Examples

future = idx.upsert_async(vectors=[("article-101", [0.012, -0.087, 0.153, ...])])
result = future.result()
result.upserted_count  # 1
future = idx.upsert_async(vectors=large_batch)
result = future.result(timeout=30.0)
result = future.result(timeout=None)
running()[source]

Return True if the call is currently being executed.

Return type:

bool