Retrieval API

Embedding provider helpers.

class sqldbagent.retrieval.embeddings.HashEmbeddings(*, dimensions=256)

Bases: object

Deterministic local embeddings for offline tests and smoke flows.

Parameters: dimensions (int, default: 256)

__init__(*, dimensions=256)

Initialize the hash embeddings backend.

Parameters: dimensions (int, default: 256) – Number of output dimensions.
Return type: None

embed_documents(texts)

Embed a batch of documents.

Parameters: texts (list[str]) – Input texts to embed.
Returns: Deterministic unit vectors, one per input text.
Return type: list[list[float]]

embed_query(text)

Embed one query.

Parameters: text (str) – Query text.
Returns: Deterministic unit vector.
Return type: list[float]
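The reference above does not show how HashEmbeddings derives its vectors. As a rough illustration of the general idea only, the stand-alone sketch below hashes each token into one of `dimensions` buckets and L2-normalizes the result. The function name `hash_embed` and the SHA-256 bucketing scheme are assumptions made for this example, not the library's implementation.

```python
import hashlib
import math


def hash_embed(text: str, dimensions: int = 256) -> list[float]:
    """Hypothetical stand-in: map text to a deterministic unit vector."""
    vector = [0.0] * dimensions
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        # The first 4 bytes choose the bucket, the next byte chooses the sign.
        bucket = int.from_bytes(digest[:4], "big") % dimensions
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vector[bucket] += sign
    norm = math.sqrt(sum(value * value for value in vector))
    if norm == 0.0:
        # Empty input (or tokens that cancel out): return the zero vector.
        return vector
    return [value / norm for value in vector]


first = hash_embed("orders table primary key")
second = hash_embed("orders table primary key")
```

Because the vector depends only on the input text, repeated calls return identical results without any network access or model download, which is the property that makes this kind of backend convenient for offline tests.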

sqldbagent.retrieval.embeddings.build_embeddings(*, embeddings_settings, llm_settings, artifacts)

Build a cached embeddings backend.

Parameters:

embeddings_settings (EmbeddingSettings) – Embedding backend settings.
llm_settings (LLMSettings) – Provider API settings.
artifacts (ArtifactSettings) – Artifact directory settings.

Returns:

LangChain-compatible embeddings backend.

Return type:

Any
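The description says the returned backend is cached but does not say how. One way to picture it is a thin wrapper that memoizes vectors per input text, sketched below. `CachedEmbedder`, `toy_embed`, and the in-memory dictionary are hypothetical names and choices for this example; the real function may well persist its cache under the artifact directory instead.

```python
from typing import Callable


class CachedEmbedder:
    """Hypothetical wrapper: memoize per-text vectors from any embed function."""

    def __init__(self, embed_one: Callable[[str], list[float]]) -> None:
        self._embed_one = embed_one
        self._cache: dict[str, list[float]] = {}
        self.misses = 0  # Counts calls that reached the underlying function.

    def embed_query(self, text: str) -> list[float]:
        if text not in self._cache:
            self.misses += 1
            self._cache[text] = self._embed_one(text)
        # Return a copy so callers cannot mutate the cached vector.
        return list(self._cache[text])

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self.embed_query(text) for text in texts]


def toy_embed(text: str) -> list[float]:
    # Toy embedding used only to exercise the cache.
    return [float(len(text)), float(text.count(" "))]


embedder = CachedEmbedder(toy_embed)
vectors = embedder.embed_documents(["a b", "a b", "c"])
```

The wrapper exposes the same `embed_documents` and `embed_query` method names documented for HashEmbeddings, so in this sketch it can stand in wherever that interface is expected.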

Retrieval and vector-index models.

class sqldbagent.retrieval.models.RetrievedDocumentModel(**data)

Bases: BaseModel

One retrieved document returned from the vector store.

Variables:

document_id – Stable document identifier.
page_content – Retrieved page content.
metadata – Filterable metadata associated with the document.
score – Optional similarity score.
summary – Short result summary.

Parameters:

data (Any)
document_id (str)
page_content (str)
metadata (dict[str, Any])
score (float | None)
summary (str | None)

document_id: str

page_content: str

metadata: dict[str, Any]

score: float | None

summary: str | None

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
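Since `score` is optional, code that orders retrieved documents has to decide where unscored entries go. The sketch below uses a plain dataclass that mirrors the documented fields (it is a stand-in, not the real Pydantic model) and a hypothetical helper `rank_by_score`. It assumes a higher score means a closer match, which depends on the distance metric configured for the collection.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class RetrievedDocument:
    """Stand-in mirroring the documented fields; not the real Pydantic model."""

    document_id: str
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)
    score: Optional[float] = None
    summary: Optional[str] = None


def rank_by_score(documents: list[RetrievedDocument]) -> list[RetrievedDocument]:
    # Scored documents first, highest score first; unscored documents last.
    return sorted(
        documents,
        key=lambda doc: (doc.score is None, -(doc.score or 0.0)),
    )


docs = [
    RetrievedDocument("a", "orders table", score=0.42),
    RetrievedDocument("b", "customers table"),
    RetrievedDocument("c", "payments table", score=0.91),
]
ranked_ids = [doc.document_id for doc in rank_by_score(docs)]
```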

class sqldbagent.retrieval.models.RetrievalIndexManifestModel(**data)

Bases: BaseModel

Persisted manifest for one vector-indexing pass.

Variables:

datasource_name – Datasource identifier.
schema_name – Indexed schema name.
snapshot_id – Snapshot identifier that was indexed.
collection_name – Target Qdrant collection name.
document_bundle_path – Saved document-bundle path.
document_count – Number of indexed documents.
embedding_provider – Embedding provider used to build vectors.
embedding_model – Embedding model or hash backend name.
created_at – Manifest creation timestamp.
summary – Short index summary.

Parameters:

data (Any)
datasource_name (str)
schema_name (str)
snapshot_id (str)
collection_name (str)
document_bundle_path (str)
document_count (int)
embedding_provider (str)
embedding_model (str)
created_at (datetime)
summary (str | None)

datasource_name: str

schema_name: str

snapshot_id: str

collection_name: str

document_bundle_path: str

document_count: int

embedding_provider: str

embedding_model: str

created_at: datetime

summary: str | None

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class sqldbagent.retrieval.models.RetrievalResultModel(**data)

Bases: BaseModel

Retrieval query result.

Variables:

query – User or agent retrieval query.
datasource_name – Datasource identifier bound to the service.
schema_name – Optional schema filter.
table_name – Optional table filter.
snapshot_id – Optional snapshot filter.
collection_name – Qdrant collection that served the search.
documents – Retrieved documents.
summary – Short retrieval summary.

Parameters:

data (Any)
query (str)
datasource_name (str)
schema_name (str | None)
table_name (str | None)
snapshot_id (str | None)
collection_name (str)
documents (list[RetrievedDocumentModel])
summary (str | None)

query: str

datasource_name: str

schema_name: str | None

table_name: str | None

snapshot_id: str | None

collection_name: str

documents: list[RetrievedDocumentModel]

summary: str | None

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

Snapshot retrieval service backed by Qdrant.

class sqldbagent.retrieval.service.SnapshotRetrievalService(*, datasource_name, snapshotter, document_service, artifacts, embeddings_settings, llm_settings, retrieval_settings, embeddings=None, client=None)

Bases: object

Index and retrieve snapshot documents through Qdrant.

Parameters:

datasource_name (str)
snapshotter (SnapshotService)
document_service (SnapshotDocumentService)
artifacts (ArtifactSettings)
embeddings_settings (EmbeddingSettings)
llm_settings (LLMSettings)
retrieval_settings (RetrievalSettings)
embeddings (Any | None, default: None)
client (Any | None, default: None)

__init__(*, datasource_name, snapshotter, document_service, artifacts, embeddings_settings, llm_settings, retrieval_settings, embeddings=None, client=None)

Initialize the retrieval service.

Parameters:

datasource_name (str) – Datasource identifier.
snapshotter (SnapshotService) – Snapshot service used to load latest snapshots.
document_service (SnapshotDocumentService) – Service used to export snapshot documents.
artifacts (ArtifactSettings) – Artifact directory settings.
embeddings_settings (EmbeddingSettings) – Embedding backend settings.
llm_settings (LLMSettings) – Provider API settings.
retrieval_settings (RetrievalSettings) – Vectorstore settings.
embeddings (Any | None, default: None) – Optional explicit embeddings backend override.
client (Any | None, default: None) – Optional explicit Qdrant client override.

Return type:

None

index_snapshot_bundle(bundle, *, recreate_collection=False)

Index one snapshot bundle into Qdrant.

Parameters:

bundle (SnapshotBundleModel) – Snapshot bundle to index.
recreate_collection (bool, default: False) – Whether to recreate the collection first.

Returns:

Persisted index manifest.

Return type:

RetrievalIndexManifestModel

index_latest_schema_snapshot(schema_name, *, recreate_collection=False)

Index the latest saved snapshot for one schema.

Parameters:

schema_name (str) – Schema name to index.
recreate_collection (bool, default: False) – Whether to recreate the collection first.

Returns:

Persisted index manifest.

Return type:

RetrievalIndexManifestModel

retrieve(query, *, schema_name=None, table_name=None, snapshot_id=None, artifact_types=None, limit=None)

Retrieve relevant schema context from Qdrant.

Parameters:

query (str) – Retrieval query.
schema_name (str | None, default: None) – Optional schema filter.
table_name (str | None, default: None) – Optional table filter.
snapshot_id (str | None, default: None) – Optional snapshot filter.
artifact_types (list[str] | None, default: None) – Optional artifact-type filters.
limit (int | None, default: None) – Optional result limit override.

Returns:

Retrieval result payload.

Return type:

RetrievalResultModel
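All filter arguments to `retrieve` default to None, so only the ones a caller sets should narrow the search. The sketch below shows one plausible way to collect the active filters into a dictionary before translating them into a vector-store filter. The helper `build_filter_conditions` and the choice of metadata keys are assumptions for illustration; the service's actual filter construction is not shown in this reference.

```python
from typing import Any, Optional


def build_filter_conditions(
    *,
    schema_name: Optional[str] = None,
    table_name: Optional[str] = None,
    snapshot_id: Optional[str] = None,
    artifact_types: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: keep only the filters the caller actually set."""
    candidates: dict[str, Any] = {
        "schema_name": schema_name,
        "table_name": table_name,
        "snapshot_id": snapshot_id,
        "artifact_types": artifact_types,
    }
    # A value of None means "no filter on this field", so it is dropped.
    return {key: value for key, value in candidates.items() if value is not None}


conditions = build_filter_conditions(schema_name="public", artifact_types=["table"])
unfiltered = build_filter_conditions()
```

With no arguments the helper returns an empty dictionary, which in this sketch corresponds to an unfiltered search across the whole collection.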

static load_manifest(path)

Load a saved retrieval manifest.

Parameters: path (str | Path)
Return type: RetrievalIndexManifestModel

property manifest_dir: Path

Return the retrieval-manifest root directory.

manifest_path(*, schema_name, snapshot_id)

Return the saved manifest path for one schema snapshot.

Parameters:

schema_name (str)
snapshot_id (str)

Return type:

Path

load_saved_manifest(*, schema_name, snapshot_id)

Load a saved retrieval manifest when one exists.

Parameters:

schema_name (str)
snapshot_id (str)

Return type:

RetrievalIndexManifestModel | None
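The `RetrievalIndexManifestModel | None` return type suggests a "load if present" pattern: resolve the manifest path, return None when no file exists, and parse it otherwise. The sketch below illustrates that pattern with plain JSON and dictionaries. The function names, the `<root>/<schema_name>/<snapshot_id>.json` layout, and the JSON encoding are all assumptions for this example; the real on-disk layout is not documented here.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Optional


def sketch_manifest_path(root: Path, *, schema_name: str, snapshot_id: str) -> Path:
    # Hypothetical layout: <root>/<schema_name>/<snapshot_id>.json
    return root / schema_name / f"{snapshot_id}.json"


def sketch_load_saved_manifest(
    root: Path, *, schema_name: str, snapshot_id: str
) -> Optional[dict[str, Any]]:
    path = sketch_manifest_path(root, schema_name=schema_name, snapshot_id=snapshot_id)
    if not path.exists():
        # No manifest saved yet for this schema snapshot.
        return None
    return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Before anything is written, the lookup reports "not found" as None.
    missing = sketch_load_saved_manifest(root, schema_name="public", snapshot_id="snap-1")
    target = sketch_manifest_path(root, schema_name="public", snapshot_id="snap-1")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps({"document_count": 3}), encoding="utf-8")
    loaded = sketch_load_saved_manifest(root, schema_name="public", snapshot_id="snap-1")
```

Returning None instead of raising lets a caller decide whether a missing manifest means "index this snapshot now" or "skip retrieval", without wrapping the call in exception handling.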