Index¶
Obtain an Index instance via pinecone.Pinecone.index().
```python
from pinecone import Pinecone

pc = Pinecone(api_key="your-api-key")

# Resolve host automatically by index name
idx = pc.index("my-index")

# — or — connect directly with a host URL
idx = pc.index(host="my-index-abc123.svc.pinecone.io")
```
Method groups:

- Vectors — upsert(), upsert_from_dataframe(), upsert_records(), query(), query_namespaces(), fetch(), fetch_by_metadata(), update(), delete(), list(), list_paginated()
- Stats — describe_index_stats()
- Integrated Inference — search(), search_records()
- Namespaces — create_namespace(), describe_namespace(), delete_namespace(), list_namespaces(), list_namespaces_paginated()
- Bulk Import — start_import(), describe_import(), cancel_import(), list_imports(), list_imports_paginated()
- Lifecycle — close()
- class pinecone.index.Index(*, host, api_key=None, additional_headers=None, timeout=30.0, proxy_url=None, proxy_headers=None, ssl_ca_certs=None, ssl_verify=True, source_tag=None, connection_pool_maxsize=0, **kwargs)[source]¶
Bases: object

Synchronous data plane client targeting a specific Pinecone index.

Can be constructed directly with a host URL, or via the Pinecone.index() factory method.

- Parameters:
host (str) – The index-specific data plane host URL.
api_key (str | None) – Pinecone API key. Falls back to the PINECONE_API_KEY env var.
additional_headers (dict[str, str] | None) – Extra headers included in every request.
timeout (float) – Request timeout in seconds. Defaults to 30.0.
proxy_url (str | None) – HTTP proxy URL for outgoing requests.
ssl_ca_certs (str | None) – Path to a CA certificate bundle for SSL verification.
ssl_verify (bool) – Whether to verify SSL certificates. Defaults to True.
source_tag (str | None) – Tag appended to the User-Agent string for request attribution.
connection_pool_maxsize (int) – Maximum number of connections to keep in the pool. 0 (default) uses httpx defaults.
pool_threads (int | None) – Tune the thread pool used by the legacy async_req=True execution model on upsert, query, describe_index_stats, and list_paginated. Defaults to 10. The pool is lazy-constructed on the first async_req=True call and shut down by close(); multiprocessing.pool is not imported until then. For new code, prefer AsyncIndex or concurrent.futures.ThreadPoolExecutor. This kwarg exists for backcompat with pre-rewrite callers.
kwargs (Any)
- Raises:
PineconeValueError – If no API key can be resolved or the host is invalid.
Examples
```python
from pinecone import Index

idx = Index(host="movie-recs-abc123.svc.pinecone.io", api_key="...")
```
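The pool_threads parameter above is legacy; the docstring recommends concurrent.futures for parallel requests in new code. A minimal sketch of fanning out several query() calls with a ThreadPoolExecutor — query_concurrently is a hypothetical helper, and the FakeIndex stub stands in for a real Index so the snippet runs standalone:

```python
from concurrent.futures import ThreadPoolExecutor

def query_concurrently(index, query_vectors, top_k=10, max_workers=4):
    """Submit one index.query() call per vector; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(index.query, top_k=top_k, vector=v)
                   for v in query_vectors]
        return [f.result() for f in futures]

# FakeIndex is a stand-in so the sketch runs without network access;
# substitute a real Index instance in practice.
class FakeIndex:
    def query(self, *, top_k, vector):
        return {"matches": [], "top_k": top_k}

responses = query_concurrently(FakeIndex(), [[0.1, 0.2], [0.3, 0.4]], top_k=5)
print(len(responses))  # 2
```

Because each future is collected in submission order, results line up with the input vectors even when requests complete out of order.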
- __init__(*, host, api_key=None, additional_headers=None, timeout=30.0, proxy_url=None, proxy_headers=None, ssl_ca_certs=None, ssl_verify=True, source_tag=None, connection_pool_maxsize=0, **kwargs)[source]¶
- upsert(*, vectors, namespace='', batch_size=None, show_progress=True, max_concurrency=4, timeout=None)[source]¶
Upsert a batch of vectors into a namespace.
If a vector with the same ID already exists in the namespace, it is overwritten.
- Parameters:
vectors (Sequence[Vector | tuple[str, list[float]] | tuple[str, list[float], dict[str, Any]] | dict[str, Any]]) – Sequence of vectors to upsert. Each element can be a Vector instance, a tuple of (id, values) or (id, values, metadata), or a dict with id, values, and optional sparse_values / metadata keys.
namespace (str) – Target namespace. Defaults to the default (empty-string) namespace.
batch_size (int | None) – Split vectors into chunks of this size and send one request per chunk. Default None sends a single request (current behaviour). Must be a positive integer if provided.
show_progress (bool) – When True and tqdm is installed, display a progress bar across batches. Has no effect when batch_size is None or tqdm is not installed. Defaults to True.
max_concurrency (int) – Thread pool size for concurrent batch requests (range 1–64, default 4). Only used when batch_size is set.
timeout (float | None) – Per-request timeout in seconds. Overrides the client-level default for this call only.
- Returns:
UpsertResponse with the count of vectors upserted. When batch_size triggers multiple requests, response_info carries the aggregate LSN from all successful batches (or None if no LSN headers were returned).

- Raises:
PineconeTypeError – If a vector element is not a recognized format.
PineconeValueError – If a vector element is malformed.
PineconeValueError – If batch_size is not a positive integer.
PineconeValueError – If max_concurrency is outside [1, 64].
ApiError – If the API returns an error response (e.g. authentication failure or server error).
PineconeConnectionError – If a network-level connection fails (DNS, refused, transport error).
PineconeTimeoutError – If the request exceeds the configured timeout. Pass timeout=<seconds> to override the client-level default for this call only.
- Return type:
UpsertResponse
Notes
When batch_size is set, batches are submitted in parallel via a ThreadPoolExecutor of max_concurrency workers (default 4, range 1–64). Per-batch HTTP retries are handled by the client’s configured RetryConfig (connection errors and retryable status codes).

Partial failures do not raise. When batch_size is set, per-batch errors are captured on the returned UpsertResponse (see response.has_errors, response.errors, response.failed_items). To retry only the failures, pass response.failed_items back to upsert(...).

Examples
```python
from pinecone import Index, Vector

idx = Index(host="article-search-abc123.svc.pinecone.io", api_key="...")

response = idx.upsert(
    vectors=[
        Vector(
            id="article-101",
            values=[0.012, -0.087, 0.153],  # truncated; use your actual dimension
        ),
        ("article-102", [0.045, 0.021, -0.064]),  # truncated
        {"id": "article-103", "values": [0.091, -0.032, 0.178]},  # truncated
    ],
    namespace="articles-en",
)
print(response.upserted_count)

# Upsert 1000 vectors in batches of 100
response = idx.upsert(
    vectors=large_vector_list,
    batch_size=100,
    show_progress=True,
)
print(response.upserted_count)
```
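The Notes above describe retrying partial failures via response.failed_items. A sketch of that loop, where upsert_with_retry is a hypothetical helper (not part of the SDK) and the Fake/Flaky stubs stand in for a real index and its responses so the example runs standalone:

```python
def upsert_with_retry(index, vectors, *, namespace="", batch_size=100, max_attempts=3):
    """Upsert in batches, re-submitting only failed items between attempts."""
    response = index.upsert(vectors=vectors, namespace=namespace, batch_size=batch_size)
    attempts = 1
    while getattr(response, "has_errors", False) and attempts < max_attempts:
        response = index.upsert(vectors=response.failed_items,
                                namespace=namespace, batch_size=batch_size)
        attempts += 1
    return response

# Stubs so the sketch runs without network access; a real Index
# returns UpsertResponse objects with the same attributes.
class FakeResponse:
    def __init__(self, has_errors, failed_items):
        self.has_errors = has_errors
        self.failed_items = failed_items

class FlakyIndex:
    def __init__(self):
        self.calls = 0

    def upsert(self, *, vectors, namespace="", batch_size=None):
        self.calls += 1
        if self.calls == 1:  # pretend the first item failed on the first attempt
            return FakeResponse(True, vectors[:1])
        return FakeResponse(False, [])

idx_stub = FlakyIndex()
final = upsert_with_retry(idx_stub, [("a", [0.1]), ("b", [0.2])])
print(idx_stub.calls, final.has_errors)  # 2 False
```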
See also
- upsert_records() — for indexes with integrated inference (text in, server-side embedding).
- upsert_from_dataframe() — for loading from a pandas DataFrame with automatic batching.
- start_import() — for bulk loading millions of vectors from cloud storage (S3, GCS).
- upsert_from_dataframe(df, namespace=None, batch_size=500, show_progress=True, timeout=None)[source]¶
Upsert vectors from a pandas DataFrame.
Convenience method that accepts a DataFrame with columns id, values, and optionally sparse_values and metadata, batches the rows, and upserts them via upsert().

- Parameters:
df (pd.DataFrame) – A pandas.DataFrame with at least id and values columns. sparse_values and metadata columns are included when present and non-None.
namespace (str | None) – Target namespace. Defaults to the default namespace.
batch_size (int) – Number of rows per upsert batch. Defaults to 500.
show_progress (bool) – If True and tqdm is installed, display a progress bar. If tqdm is not installed, silently falls back to no progress bar.
timeout (float | None)
- Returns:
UpsertResponse with the total count of vectors upserted across all batches.

- Raises:
RuntimeError – If pandas is not installed.
PineconeValueError – If df is not a pandas.DataFrame.
PineconeValueError – If batch_size is not a positive integer.
- Return type:
UpsertResponse
Examples
```python
# Upsert article embeddings from a DataFrame
import pandas as pd
from pinecone import Pinecone

pc = Pinecone(api_key="your-api-key")
index = pc.index("article-search")

df = pd.DataFrame([
    {"id": "article-101", "values": [0.012, -0.087, 0.153]},
    {"id": "article-102", "values": [0.045, 0.021, -0.064]},
])
response = index.upsert_from_dataframe(df)
response.upserted_count  # 2

# Upsert with metadata, a custom namespace, and a smaller batch size
df = pd.DataFrame([
    {
        "id": "article-101",
        "values": [0.012, -0.087, 0.153],
        "metadata": {"topic": "science", "year": 2024},
    },
    {
        "id": "article-102",
        "values": [0.045, 0.021, -0.064],
        "metadata": {"topic": "technology", "year": 2024},
    },
])
response = index.upsert_from_dataframe(
    df,
    namespace="articles-en",
    batch_size=100,
)
```
See also
- upsert() — for upserting vectors directly (accepts optional batch_size; no DataFrame dependency).
- upsert_records() — for indexes with integrated inference (text in, server-side embedding).
- start_import() — for bulk loading millions of vectors from cloud storage (S3, GCS).
- upsert_records(*, records, namespace, timeout=None)[source]¶
Upsert records for indexes with integrated inference.
Records are sent as newline-delimited JSON (NDJSON). Embeddings are generated server-side.
- Parameters:
records (list[dict[str, Any]]) – List of record dicts. Each must contain an _id or id field. Additional fields are passed through for server-side embedding.
namespace (str) – Target namespace (required). Unlike upsert(), namespace has no default because the records API requires an explicit namespace (must be non-empty).
timeout (float | None)
- Returns:
UpsertRecordsResponse with the count of records submitted.

- Raises:
PineconeValueError – If namespace is not a string or is empty/whitespace, records is empty, or a record is missing an identifier field.
ApiError – If the API returns an error response.
PineconeConnectionError – If a network-level connection fails (DNS, refused, transport error).
PineconeTimeoutError – If the request exceeds the configured timeout.
- Return type:
UpsertRecordsResponse
Examples
```python
response = idx.upsert_records(
    namespace="articles-en",
    records=[
        {"_id": "article-101", "text": "Vector databases for search."},
        {"_id": "article-102", "text": "RAG combines search with LLMs."},
    ],
)
print(response.record_count)
```
See also
- upsert() — for indexes where you provide your own vectors (no server-side embedding).
- upsert_from_dataframe() — for loading vectors from a pandas DataFrame with automatic batching.
- start_import() — for bulk loading millions of vectors from cloud storage (S3, GCS).
- query(*, top_k, vector=None, id=None, namespace='', filter=None, include_values=False, include_metadata=False, sparse_vector=None, scan_factor=None, max_candidates=None, timeout=None)[source]¶
Query a namespace for the nearest neighbors of a vector.
Note
Use this method for indexes where you provide your own vectors. For indexes with integrated inference (IntegratedSpec), use search() which handles embedding server-side.

- Parameters:
top_k (int) – Number of results to return (must be >= 1).
vector (list[float] | None) – Dense query vector values.
id (str | None) – ID of a stored vector to use as the query.
namespace (str) – Namespace to query. Defaults to the default namespace.
filter (dict[str, Any] | None) – Metadata filter expression.
include_values (bool) – Whether to include vector values in results.
include_metadata (bool) – Whether to include metadata in results.
sparse_vector (SparseValues | dict[str, Any] | None) – Sparse query vector with indices and values.
scan_factor (float | None) – DRN optimization — adjusts how much of the index is scanned. Range 0.5–4.0. Only supported for dedicated read node indexes. None uses server default.
max_candidates (int | None) – DRN optimization — caps candidate vectors to rerank. Range 1–100000. Only supported for dedicated read node indexes. None uses server default.
timeout (float | None)
- Returns:
QueryResponse with matches, namespace, and usage info.

- Raises:
PineconeValueError – If top_k < 1, both vector and id are provided, or none of vector, id, or sparse_vector are provided.
ApiError – If the API returns an error response (e.g. authentication failure or server error).
PineconeConnectionError – If a network-level connection fails (DNS, refused, transport error).
PineconeTimeoutError – If the request exceeds the configured timeout.
- Return type:
QueryResponse
Examples
```python
response = idx.query(
    top_k=10,
    vector=[0.012, -0.087, 0.153],  # truncated; use your actual dimension
)
for match in response.matches:
    print(match.id, match.score)
```
- query_namespaces(*, vector=None, namespaces, metric, top_k=None, filter=None, include_values=False, include_metadata=False, sparse_vector=None, scan_factor=None, max_candidates=None, timeout=None)[source]¶
Query multiple namespaces in parallel and return merged top results.
Fans out individual query() calls across all given namespaces using a thread pool, then merges results via a heap-based aggregator that returns the overall top-k matches ranked by the specified metric.

- Parameters:
vector (list[float] | None) – Dense query vector values. Required for dense and hybrid indexes; omit for sparse-only indexes (use sparse_vector instead).
namespaces (list[str]) – Namespaces to query (must be non-empty). Duplicates are removed while preserving order.
metric (str) – Distance metric — "cosine", "euclidean", or "dotproduct".
top_k (int | None) – Maximum number of results to return. Defaults to 10.
filter (dict[str, Any] | None) – Metadata filter expression applied to every namespace.
include_values (bool) – Whether to include vector values in results.
include_metadata (bool) – Whether to include metadata in results.
sparse_vector (SparseValues | dict[str, Any] | None) – Sparse query vector with indices and values. Required for sparse-only indexes when vector is omitted.
scan_factor (float | None) – DRN performance tuning — controls how much of the index is scanned during a query. Higher values scan more data and may improve recall at the cost of latency.
max_candidates (int | None) – DRN performance tuning — maximum number of candidate vectors to consider during the search phase.
timeout (float | None)
- Returns:
QueryNamespacesResults with the merged top-k matches, total usage, and per-namespace usage.

- Raises:
PineconeValueError – If namespaces is empty, or if both vector and sparse_vector are absent/empty.
ValueError – If metric is not a recognized value.
ApiError – If any individual namespace query fails.
PineconeConnectionError – If a network-level connection fails (DNS, refused, transport error).
PineconeTimeoutError – If the request exceeds the configured timeout.
- Return type:
QueryNamespacesResults
Examples
```python
# Dense query
results = idx.query_namespaces(
    vector=[0.012, -0.087, 0.153],  # truncated; use your actual dimension
    namespaces=["articles-en", "articles-fr", "articles-de"],
    metric="cosine",
    top_k=10,
)

# Sparse-only query (sparse index)
results = idx.query_namespaces(
    sparse_vector={"indices": [0, 1, 2], "values": [0.1, 0.2, 0.3]},
    namespaces=["docs-en", "docs-fr"],
    metric="dotproduct",
    top_k=10,
)
for match in results.matches:
    print(match.id, match.score)
```
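The heap-based merge described above can be illustrated with heapq. This simplified sketch assumes matches reduce to (id, score) pairs and that higher scores are better for cosine and dotproduct while lower distances are better for euclidean; the real aggregator operates on full match objects and also tracks usage:

```python
import heapq

def merge_top_k(per_namespace_matches, top_k, metric="cosine"):
    """Merge per-namespace (id, score) lists into one overall top-k ranking."""
    all_matches = [m for matches in per_namespace_matches for m in matches]
    if metric == "euclidean":
        # Euclidean is a distance: smaller is better.
        return heapq.nsmallest(top_k, all_matches, key=lambda m: m[1])
    # Cosine and dotproduct are similarities: larger is better.
    return heapq.nlargest(top_k, all_matches, key=lambda m: m[1])

merged = merge_top_k(
    [
        [("en-1", 0.91), ("en-2", 0.40)],
        [("fr-1", 0.87)],
        [("de-1", 0.95), ("de-2", 0.10)],
    ],
    top_k=3,
)
print([m[0] for m in merged])  # ['de-1', 'en-1', 'fr-1']
```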
- fetch(*, ids, namespace='', timeout=None)[source]¶
Fetch vectors by their IDs from a namespace.
- Parameters:
- Returns:
FetchResponse with a map of vector IDs to Vector objects, namespace, and usage info. IDs that do not exist are omitted from the map rather than raising an error.

- Raises:
PineconeValueError – If ids is empty.
ApiError – If the API returns an error response (e.g. authentication failure or server error).
PineconeConnectionError – If a network-level connection fails (DNS, refused, transport error).
PineconeTimeoutError – If the request exceeds the configured timeout.
- Return type:
FetchResponse
Examples
```python
response = idx.fetch(ids=["article-101", "article-102"])
for vid, vec in response.vectors.items():
    print(vid, vec.values)
```
- fetch_by_metadata(*, filter, namespace='', limit=None, pagination_token=None, timeout=None)[source]¶
Fetch vectors matching a metadata filter expression.
Returns vectors whose metadata satisfies the given filter, with pagination support. The server returns up to 100 vectors per page when no limit is specified.
- Parameters:
filter (dict[str, Any]) – Metadata filter expression (required).
namespace (str) – Namespace to fetch from. Defaults to the default namespace.
limit (int | None) – Maximum number of vectors to return per page. When None, the server default (100) is used.
pagination_token (str | None) – Token from a previous response to fetch the next page. When None, fetches the first page.
timeout (float | None)
- Returns:
FetchByMetadataResponse with matched vectors, namespace, usage, and pagination token for the next page (if any).

- Raises:
ApiError – If the API returns an error response (e.g. authentication failure or server error).
- Return type:
FetchByMetadataResponse
Examples
```python
response = idx.fetch_by_metadata(
    filter={"genre": {"$eq": "comedy"}},
    namespace="movies",
)
for vid, vec in response.vectors.items():
    print(vid, vec.values)

# Paginate through all results
token = response.pagination.next if response.pagination else None
while token:
    response = idx.fetch_by_metadata(
        filter={"genre": {"$eq": "comedy"}},
        namespace="movies",
        pagination_token=token,
    )
    token = response.pagination.next if response.pagination else None
```
- delete(*, ids=None, delete_all=False, filter=None, namespace='', timeout=None)[source]¶
Delete vectors from a namespace by ID, filter, or delete-all flag.
Exactly one of ids, delete_all, or filter must be specified. Deleting IDs that do not exist does not raise an error.

- Parameters:
- Returns:
None — a successful delete returns no payload.
- Raises:
PineconeValueError – If zero or more than one deletion mode is specified.
ApiError – If the API returns an error response (e.g. authentication failure or server error).
PineconeConnectionError – If a network-level connection fails (DNS, refused, transport error).
PineconeTimeoutError – If the request exceeds the configured timeout.
- Return type:
None
Examples
```python
# Delete by IDs
idx.delete(ids=["article-101", "article-102"])

# Delete all vectors in a namespace
idx.delete(delete_all=True, namespace="articles-deprecated")

# Delete by metadata filter
idx.delete(filter={"category": {"$eq": "obsolete"}})
```
- update(*, id=None, values=None, sparse_values=None, set_metadata=None, namespace='', filter=None, dry_run=False, timeout=None)[source]¶
Update vectors by ID or metadata filter.
Updates a single vector’s dense values, sparse values, or metadata by identifier, or bulk-updates metadata on all vectors matching a filter.
Exactly one of id or filter must be specified.

- Parameters:
id (str | None) – ID of the vector to update.
values (list[float] | None) – New dense vector values.
sparse_values (SparseValues | dict[str, Any] | None) – New sparse vector with indices and values keys.
set_metadata (dict[str, Any] | None) – Metadata fields to set or overwrite.
namespace (str) – Namespace to target. Defaults to the default namespace.
filter (dict[str, Any] | None) – Metadata filter expression selecting vectors to update.
dry_run (bool) – If True, return the count of records that would be affected without applying changes. Only applies to filter-based updates.
timeout (float | None)
- Returns:
UpdateResponse with matched_records count (when available).

- Raises:
PineconeValueError – If both or neither of id and filter are provided.
ApiError – If the API returns an error response (e.g. authentication failure or server error).
PineconeConnectionError – If a network-level connection fails (DNS, refused, transport error).
PineconeTimeoutError – If the request exceeds the configured timeout.
- Return type:
UpdateResponse
Examples
```python
# Update by ID
idx.update(id="article-101", values=[0.012, -0.087, 0.153])  # truncated; use your actual dimension

# Bulk-update metadata by filter
idx.update(
    filter={"genre": {"$eq": "drama"}},
    set_metadata={"year": 2020},
)
```
- describe_index_stats(*, filter=None, timeout=None)[source]¶
Return statistics for this index.
Returns aggregate statistics including total vector count, per-namespace vector counts, dimension, and index fullness.
- Parameters:
- Returns:
DescribeIndexStatsResponse with namespace summaries, dimension, total vector count, and fullness metrics.

- Raises:
ApiError – If the API returns an error response (e.g. authentication failure or server error).
PineconeConnectionError – If a network-level connection fails (DNS, refused, transport error).
PineconeTimeoutError – If the request exceeds the configured timeout.
- Return type:
DescribeIndexStatsResponse
Examples
```python
stats = idx.describe_index_stats()
print(stats.total_vector_count, stats.dimension)

# With filter — only count vectors matching the expression
stats = idx.describe_index_stats(
    filter={"genre": {"$eq": "drama"}}
)
```
- search(*, namespace, top_k, inputs=None, vector=None, id=None, filter=None, fields=None, rerank=None, match_terms=None, timeout=None)[source]¶
Search records by text, vector, or ID with optional reranking.
Searches a namespace using integrated inference (text inputs embedded server-side), a raw vector, or an existing record ID as the query.
Note
Use this method for indexes with integrated inference. For classic indexes where you provide your own vectors, use query().

- Parameters:
namespace (str) – Namespace to search in (required).
top_k (int) – Number of results to return (must be >= 1).
inputs (SearchInputs | dict[str, Any] | None) – Inputs for server-side embedding (e.g. {"text": "query text"}). Use SearchInputs for typed key validation and IDE autocompletion (e.g. SearchInputs(text="query text")).
vector (list[float] | None) – Raw query vector to search with.
id (str | None) – ID of an existing record to use as the query.
filter (dict[str, Any] | None) – Metadata filter expression.
fields (list[str] | None) – Field names to include in results. When None, the server returns all available fields.
rerank (RerankConfig | dict[str, Any] | None) – Reranking configuration with model (required), rank_fields (required), and optional top_n, parameters, query keys. Use RerankConfig for IDE autocompletion.
match_terms (dict[str, Any] | None) – Term-matching constraint for sparse search. Requires keys "strategy" (currently only "all") and "terms" (list of strings). Only supported for sparse indexes using pinecone-sparse-english-v0. None disables term matching.
timeout (float | None)
- Returns:
SearchRecordsResponse with hits and usage statistics.

- Raises:
PineconeValueError – If namespace is not a string, top_k < 1, or rerank is missing required keys.
ApiError – If the API returns an error response.
PineconeConnectionError – If a network-level connection fails (DNS, refused, transport error).
PineconeTimeoutError – If the request exceeds the configured timeout.
- Return type:
SearchRecordsResponse
Examples
```python
# Basic search
response = idx.search(
    namespace="articles-en",
    top_k=10,
    inputs={"text": "benefits of vector databases for search"},
)
for hit in response.result.hits:
    print(hit.id, hit.score)

# Search with reranking
response = idx.search(
    namespace="articles-en",
    top_k=10,
    inputs={"text": "benefits of vector databases"},
    rerank={
        "model": "bge-reranker-v2-m3",
        "rank_fields": ["text"],
        "top_n": 5,
    },
)
for hit in response.result.hits:
    print(hit.id, hit.score)
```
Note
Use inline rerank when searching and reranking in a single call. Use pc.inference.rerank() when reranking results from a different source or when you need to rerank without searching.
- search_records(*, namespace, top_k, inputs=None, vector=None, id=None, filter=None, fields=None, rerank=None, match_terms=None, timeout=None)[source]¶
Alias for search().

Prefer calling search() directly — this alias exists for backwards compatibility.

- Parameters:
- Return type:
- create_namespace(*, name, schema=None)[source]¶
Create a named namespace in the index.
- Parameters:
- Returns:
NamespaceDescription with the namespace name and record count.

- Raises:
PineconeValueError – If the name is not a string or is empty/whitespace.
ApiError – If the API returns an error response (e.g. 409 conflict when namespace already exists).
PineconeConnectionError – If a network-level connection fails (DNS, refused, transport error).
PineconeTimeoutError – If the request exceeds the configured timeout.
- Return type:
NamespaceDescription
Examples
```python
ns = idx.create_namespace(name="movies-en")
print(ns.name, ns.record_count)

ns = idx.create_namespace(
    name="movies-en",
    schema={"fields": {"genre": {"filterable": True}}},
)
```
- describe_namespace(*, name=None, **kwargs)[source]¶
Describe a namespace by name.
- Parameters:
- Returns:
NamespaceDescription with the namespace name, record count, and schema information.

- Raises:
PineconeValueError – If the name is not a string or is empty/whitespace.
ApiError – If the API returns an error response.
PineconeConnectionError – If a network-level connection fails (DNS, refused, transport error).
PineconeTimeoutError – If the request exceeds the configured timeout.
- Return type:
NamespaceDescription
Examples
```python
ns = idx.describe_namespace(name="movies-en")
print(ns.name, ns.record_count)
```
- delete_namespace(*, name=None, timeout=None, **kwargs)[source]¶
Delete a namespace by name, removing all its vectors.
- Parameters:
- Returns:
None — a successful delete returns no payload.
- Raises:
PineconeValueError – If the name is not a string or is empty/whitespace.
ApiError – If the API returns an error response.
PineconeConnectionError – If a network-level connection fails (DNS, refused, transport error).
PineconeTimeoutError – If the request exceeds the configured timeout.
- Return type:
None
Examples
```python
idx.delete_namespace(name="movies-deprecated")
```
- list_namespaces_paginated(*, prefix=None, limit=None, pagination_token=None)[source]¶
Fetch a single page of namespace descriptions.
- Parameters:
- Returns:
ListNamespacesResponse with namespace descriptions, pagination info, and total count.

- Raises:
ApiError – If the API returns an error response.
- Return type:
ListNamespacesResponse
Examples
```python
response = idx.list_namespaces_paginated(prefix="prod-", limit=10)
for ns in response.namespaces:
    print(ns.name, ns.record_count)
```
- list_namespaces(*, prefix=None, limit=None)[source]¶
List namespaces, automatically following pagination.
Yields one ListNamespacesResponse per page. The generator automatically follows pagination tokens until all pages have been retrieved.

- Parameters:
- Yields:
ListNamespacesResponse for each page of results.

- Return type:
Examples
```python
for page in idx.list_namespaces(prefix="prod-"):
    for ns in page.namespaces:
        print(ns.name, ns.record_count)
```
- list_paginated(*, prefix=None, limit=None, pagination_token=None, namespace='', timeout=None)[source]¶
Fetch a single page of vector IDs from a namespace.
- Parameters:
prefix (str | None) – Return only IDs starting with this prefix.
limit (int | None) – Maximum number of IDs to return in this page.
pagination_token (str | None) – Token from a previous response to fetch the next page.
namespace (str) – Namespace to list from. Defaults to the default namespace.
timeout (float | None)
- Returns:
ListResponse with vector IDs, pagination info, namespace, and usage.

- Raises:
PineconeValueError – If inputs are invalid.
ApiError – If the API returns an error response (e.g. authentication failure or server error).
- Return type:
ListResponse
Examples
```python
response = idx.list_paginated(prefix="doc1#", limit=50)
for item in response.vectors:
    print(item.id)
```
- list(*, prefix=None, limit=None, namespace='', timeout=None)[source]¶
List vector IDs in a namespace, automatically following pagination.
Yields one ListResponse per page. The generator automatically follows pagination tokens until all pages have been retrieved.

- Parameters:
- Yields:
ListResponse for each page of results.

- Return type:
Examples
```python
for page in idx.list(prefix="doc1#"):
    for item in page.vectors:
        print(item.id)
```
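Because list() yields pages rather than individual IDs, collecting every ID takes a small loop. all_vector_ids is a hypothetical helper (not part of the SDK), demonstrated against a stub index so the sketch runs standalone:

```python
from types import SimpleNamespace

def all_vector_ids(index, *, prefix=None, namespace=""):
    """Flatten the pages yielded by index.list() into one list of IDs."""
    ids = []
    for page in index.list(prefix=prefix, namespace=namespace):
        ids.extend(item.id for item in page.vectors)
    return ids

# Stub that yields two fake pages; substitute a real Index in practice.
class FakePagedIndex:
    def list(self, *, prefix=None, namespace=""):
        yield SimpleNamespace(vectors=[SimpleNamespace(id="doc1#a"),
                                       SimpleNamespace(id="doc1#b")])
        yield SimpleNamespace(vectors=[SimpleNamespace(id="doc1#c")])

ids = all_vector_ids(FakePagedIndex(), prefix="doc1#")
print(ids)  # ['doc1#a', 'doc1#b', 'doc1#c']
```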
- start_import(uri, *, error_mode='continue', integration_id=None)[source]¶
Start a bulk import operation from an external data source.
Initiates an asynchronous bulk import of vectors from cloud storage into the index. The import runs server-side; use describe_import() to poll for progress and completion.

Note

The import URI must point to a directory of Parquet files in cloud storage (s3:// or gs://). Each Parquet file must follow the Pinecone-required schema. See Pinecone import docs for the required Parquet schema and supported storage formats.

- Parameters:
- Returns:
StartImportResponse with the ID of the created import operation.

- Raises:
PineconeValueError – If error_mode is not "continue" or "abort".
ApiError – If the API returns an error response.
PineconeConnectionError – If a network-level connection fails (DNS, refused, transport error).
PineconeTimeoutError – If the request exceeds the configured timeout.
- Return type:
StartImportResponse
Examples
```python
# Start an import and poll until complete
import time

response = idx.start_import(uri="s3://my-bucket/vectors/")
import_id = response.id

# Poll until the import finishes
import_op = idx.describe_import(import_id)
while import_op.status not in ("Completed", "Failed", "Cancelled"):
    time.sleep(10)
    import_op = idx.describe_import(import_id)
print(f"Status: {import_op.status}, records imported: {import_op.records_imported}")

# Abort on first error instead of continuing
response = idx.start_import(
    uri="s3://my-bucket/vectors/",
    error_mode="abort",
)
```
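The polling loop above can be wrapped in a reusable helper with an overall deadline. wait_for_import is hypothetical (not part of the SDK); the terminal status names come from the example, and the injectable sleep plus a stub index let the sketch run standalone:

```python
import time
from types import SimpleNamespace

TERMINAL_STATUSES = ("Completed", "Failed", "Cancelled")

def wait_for_import(index, import_id, *, poll_seconds=10,
                    max_wait_seconds=3600, sleep=time.sleep):
    """Poll describe_import() until the operation reaches a terminal status."""
    waited = 0
    op = index.describe_import(import_id)
    while op.status not in TERMINAL_STATUSES:
        if waited >= max_wait_seconds:
            raise TimeoutError(f"import {import_id} still {op.status} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
        op = index.describe_import(import_id)
    return op

# Stub returning a scripted sequence of statuses; use a real Index in practice.
class FakeImportIndex:
    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def describe_import(self, import_id):
        return SimpleNamespace(status=next(self._statuses))

op = wait_for_import(FakeImportIndex(["Pending", "InProgress", "Completed"]),
                     "import-123", sleep=lambda s: None)
print(op.status)  # Completed
```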
See also
- upsert() — for upserting vectors directly in small batches (single request per call).
- upsert_records() — for indexes with integrated inference (text in, server-side embedding).
- upsert_from_dataframe() — for loading vectors from a pandas DataFrame with automatic batching.
- describe_import(id)[source]¶
Describe a bulk import operation by ID.
- Parameters:
id (str | int) – Import operation ID. Integers are converted to strings silently.
- Returns:
ImportModel with the import operation details.

- Raises:
PineconeValueError – If the ID is empty or exceeds 1000 characters.
ApiError – If the API returns an error response.
PineconeConnectionError – If a network-level connection fails (DNS, refused, transport error).
PineconeTimeoutError – If the request exceeds the configured timeout.
- Return type:
ImportModel
Examples
```python
import_op = idx.describe_import("import-123")
print(import_op.status, import_op.percent_complete)
```
- cancel_import(id)[source]¶
Cancel a bulk import operation by ID.
- Parameters:
id (str | int) – Import operation ID. Integers are converted to strings silently.
- Returns:
None — a successful cancellation returns no payload.
- Raises:
PineconeValueError – If the ID is empty or exceeds 1000 characters.
ApiError – If the API returns an error response.
PineconeConnectionError – If a network-level connection fails (DNS, refused, transport error).
PineconeTimeoutError – If the request exceeds the configured timeout.
- Return type:
None
Examples
```python
idx.cancel_import("import-123")
```
- list_imports(*, limit=None, pagination_token=None)[source]¶
List bulk import operations, automatically following pagination.
Yields individual ImportModel objects, fetching additional pages transparently until all results have been returned.

- Parameters:
- Yields:
ImportModel for each import operation.

- Raises:
ApiError – If the API returns an error response.
- Return type:
Examples
```python
for imp in idx.list_imports():
    print(imp.id, imp.status)
```
- list_imports_paginated(*, limit=None, pagination_token=None)[source]¶
Fetch a single page of bulk import operations.
Returns an ImportList for one page. The caller is responsible for managing the pagination token.

- Parameters:
- Returns:
ImportList with the import operations for the requested page.

- Raises:
ApiError – If the API returns an error response.
- Return type:
ImportList
Examples
```python
page = idx.list_imports_paginated(limit=10)
for imp in page:
    print(imp.id, imp.status)
```