Retrieval API

Embedding provider helpers.

class sqldbagent.retrieval.embeddings.HashEmbeddings(*, dimensions=256)[source]

Bases: object

Deterministic local embeddings for offline tests and smoke flows.

Parameters:

dimensions (int, default: 256) – Number of output dimensions.

__init__(*, dimensions=256)[source]

Initialize the hash embeddings backend.

Parameters:

dimensions (int, default: 256) – Number of output dimensions.

Return type:

None

embed_documents(texts)[source]

Embed a batch of documents.

Parameters:

texts (list[str]) – Input texts to embed.

Returns:

Deterministic unit vectors.

Return type:

list[list[float]]

embed_query(text)[source]

Embed one query.

Parameters:

text (str) – Query text.

Returns:

Deterministic unit vector.

Return type:

list[float]
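
The source for HashEmbeddings is not reproduced on this page, so the following is only a minimal sketch of how a deterministic, offline embedding with unit-length output could be built. The class name SketchHashEmbeddings, the whitespace tokenization, and the SHA-256 bucket scheme are all assumptions made for illustration, not the library's actual algorithm.

```python
import hashlib
import math


class SketchHashEmbeddings:
    """Hypothetical stand-in; not the sqldbagent implementation."""

    def __init__(self, *, dimensions: int = 256) -> None:
        if dimensions <= 0:
            raise ValueError("dimensions must be positive")
        self.dimensions = dimensions

    def embed_query(self, text: str) -> list[float]:
        vector = [0.0] * self.dimensions
        for token in text.lower().split():
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            # The first 4 bytes pick the bucket, the next byte picks the sign.
            index = int.from_bytes(digest[:4], "big") % self.dimensions
            sign = 1.0 if digest[4] % 2 == 0 else -1.0
            vector[index] += sign
        norm = math.sqrt(sum(v * v for v in vector))
        if norm == 0.0:
            # Empty input or fully cancelled tokens: return a fixed unit vector.
            vector[0] = 1.0
            return vector
        return [v / norm for v in vector]

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        # Each document is embedded independently with the same routine.
        return [self.embed_query(text) for text in texts]
```

Because the output depends only on the input text and the dimension count, repeated calls return identical vectors, which is the property that makes this kind of backend convenient for offline tests.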

sqldbagent.retrieval.embeddings.build_embeddings(*, embeddings_settings, llm_settings, artifacts)[source]

Build a cached embeddings backend.

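Parameters:
  • embeddings_settings (EmbeddingSettings) – Embedding backend settings.

  • llm_settings (LLMSettings) – Provider API settings.

  • artifacts (ArtifactSettings) – Artifact directory settings.

Returns: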

LangChain-compatible embeddings backend.

Return type:

Any
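
How build_embeddings caches results is not described here. As a rough illustration of what a "cached embeddings backend" can mean, the sketch below wraps any embedding function with an on-disk cache keyed by a hash of the input text. DiskCachedEmbedder, its namespace argument, and the one-JSON-file-per-text layout are hypothetical and are not taken from sqldbagent or LangChain.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable


class DiskCachedEmbedder:
    """Hypothetical cache wrapper; names and layout are illustrative only."""

    def __init__(
        self,
        embed: Callable[[str], list[float]],
        cache_dir: Path,
        namespace: str,
    ) -> None:
        self._embed = embed
        # A namespace keeps vectors from different models in separate folders.
        self._cache_dir = cache_dir / namespace
        self._cache_dir.mkdir(parents=True, exist_ok=True)

    def embed_query(self, text: str) -> list[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        path = self._cache_dir / f"{key}.json"
        if path.exists():
            # Cache hit: skip the (possibly remote and billed) embedding call.
            return json.loads(path.read_text(encoding="utf-8"))
        vector = self._embed(text)
        path.write_text(json.dumps(vector), encoding="utf-8")
        return vector


calls: list[str] = []


def fake_embed(text: str) -> list[float]:
    # Stand-in for a real provider call; records how often it is invoked.
    calls.append(text)
    return [float(len(text)), 1.0]


with tempfile.TemporaryDirectory() as tmp:
    cached = DiskCachedEmbedder(fake_embed, Path(tmp), namespace="hash-256")
    first = cached.embed_query("customers")
    second = cached.embed_query("customers")
```

In this sketch the second call is served from disk, so the wrapped function runs only once per distinct text.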

Retrieval and vector-index models.

class sqldbagent.retrieval.models.RetrievedDocumentModel(**data)[source]

Bases: BaseModel

One retrieved document returned from the vector store.

Variables:
  • document_id – Stable document identifier.

  • page_content – Retrieved page content.

  • metadata – Filterable metadata associated with the document.

  • score – Optional similarity score.

  • summary – Short result summary.

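Parameters:
  • data (Any)

  • document_id (str)

  • page_content (str)

  • metadata (dict[str, Any])

  • score (float | None)

  • summary (str | None)
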
document_id: str
page_content: str
metadata: dict[str, Any]
score: float | None
summary: str | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class sqldbagent.retrieval.models.RetrievalIndexManifestModel(**data)[source]

Bases: BaseModel

Persisted manifest for one vector-indexing pass.

Variables:
  • datasource_name – Datasource identifier.

  • schema_name – Indexed schema name.

  • snapshot_id – Snapshot identifier that was indexed.

  • collection_name – Target Qdrant collection name.

  • document_bundle_path – Saved document-bundle path.

  • document_count – Number of indexed documents.

  • embedding_provider – Embedding provider used to build vectors.

  • embedding_model – Embedding model or hash backend name.

  • created_at – Manifest creation timestamp.

  • summary – Short index summary.

Parameters:
  • data (Any)

  • datasource_name (str)

  • schema_name (str)

  • snapshot_id (str)

  • collection_name (str)

  • document_bundle_path (str)

  • document_count (int)

  • embedding_provider (str)

  • embedding_model (str)

  • created_at (datetime)

  • summary (str | None)

datasource_name: str
schema_name: str
snapshot_id: str
collection_name: str
document_bundle_path: str
document_count: int
embedding_provider: str
embedding_model: str
created_at: datetime
summary: str | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
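
Since the manifest is described as persisted, a JSON round trip is a natural way to picture it. The sketch below mirrors the documented fields in a plain dataclass so that it runs without pydantic. ManifestStandIn, manifest_to_json, and manifest_from_json are hypothetical names, and the on-disk format actually used by sqldbagent is an assumption here.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class ManifestStandIn:
    """Dataclass mirror of the documented fields; not the real model."""

    datasource_name: str
    schema_name: str
    snapshot_id: str
    collection_name: str
    document_bundle_path: str
    document_count: int
    embedding_provider: str
    embedding_model: str
    created_at: datetime
    summary: str | None = None


def manifest_to_json(manifest: ManifestStandIn) -> str:
    payload = asdict(manifest)
    # datetime is not JSON-serializable, so store it as an ISO 8601 string.
    payload["created_at"] = manifest.created_at.isoformat()
    return json.dumps(payload, indent=2, sort_keys=True)


def manifest_from_json(text: str) -> ManifestStandIn:
    payload = json.loads(text)
    payload["created_at"] = datetime.fromisoformat(payload["created_at"])
    return ManifestStandIn(**payload)


# All values below are made up for the example.
example = ManifestStandIn(
    datasource_name="warehouse",
    schema_name="public",
    snapshot_id="snap-001",
    collection_name="warehouse_public",
    document_bundle_path="artifacts/documents/snap-001.json",
    document_count=42,
    embedding_provider="hash",
    embedding_model="hash-256",
    created_at=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
restored = manifest_from_json(manifest_to_json(example))
```

With the real pydantic model, the equivalent round trip would normally go through pydantic's own serialization helpers rather than hand-written functions like these.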

class sqldbagent.retrieval.models.RetrievalResultModel(**data)[source]

Bases: BaseModel

Retrieval query result.

Variables:
  • query – User or agent retrieval query.

  • datasource_name – Datasource identifier bound to the service.

  • schema_name – Optional schema filter.

  • table_name – Optional table filter.

  • snapshot_id – Optional snapshot filter.

  • collection_name – Qdrant collection that served the search.

  • documents – Retrieved documents.

  • summary – Short retrieval summary.

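Parameters:
  • data (Any)

  • query (str)

  • datasource_name (str)

  • schema_name (str | None)

  • table_name (str | None)

  • snapshot_id (str | None)

  • collection_name (str)

  • documents (list[RetrievedDocumentModel])

  • summary (str | None)
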
query: str
datasource_name: str
schema_name: str | None
table_name: str | None
snapshot_id: str | None
collection_name: str
documents: list[RetrievedDocumentModel]
summary: str | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
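
Because score is optional on each retrieved document, code that consumes a result has to decide where unscored documents go. The sketch below uses a dataclass stand-in with the documented field names and a hypothetical rank_documents helper. It assumes that a higher score means a closer match, which this page does not state.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class DocStandIn:
    """Dataclass mirror of the documented fields; not the real model."""

    document_id: str
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)
    score: float | None = None
    summary: str | None = None


def rank_documents(documents: list[DocStandIn]) -> list[DocStandIn]:
    # Scored documents come first, highest score first (assumed ordering).
    # Unscored documents keep their relative order at the end because
    # sorted() is stable and they all share the same sort key.
    return sorted(
        documents,
        key=lambda doc: (doc.score is None, -(doc.score or 0.0)),
    )


docs = [
    DocStandIn("a", "orders table", score=0.2),
    DocStandIn("b", "customers table"),
    DocStandIn("c", "orders.customer_id foreign key", score=0.9),
]
ranked_ids = [doc.document_id for doc in rank_documents(docs)]
```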

Snapshot retrieval service backed by Qdrant.

class sqldbagent.retrieval.service.SnapshotRetrievalService(*, datasource_name, snapshotter, document_service, artifacts, embeddings_settings, llm_settings, retrieval_settings, embeddings=None, client=None)[source]

Bases: object

Index and retrieve snapshot documents through Qdrant.

__init__(*, datasource_name, snapshotter, document_service, artifacts, embeddings_settings, llm_settings, retrieval_settings, embeddings=None, client=None)[source]

Initialize the retrieval service.

Parameters:
  • datasource_name (str) – Datasource identifier.

  • snapshotter (SnapshotService) – Snapshot service used to load latest snapshots.

  • document_service (SnapshotDocumentService) – Service used to export snapshot documents.

  • artifacts (ArtifactSettings) – Artifact directory settings.

  • embeddings_settings (EmbeddingSettings) – Embedding backend settings.

  • llm_settings (LLMSettings) – Provider API settings.

  • retrieval_settings (RetrievalSettings) – Vectorstore settings.

  • embeddings (Any | None, default: None) – Optional explicit embeddings backend override.

  • client (Any | None, default: None) – Optional explicit Qdrant client override.

Return type:

None

index_snapshot_bundle(bundle, *, recreate_collection=False)[source]

Index one snapshot bundle into Qdrant.

Parameters:
  • bundle (SnapshotBundleModel) – Snapshot bundle to index.

  • recreate_collection (bool, default: False) – Whether to recreate the collection first.

Returns:

Persisted index manifest.

Return type:

RetrievalIndexManifestModel

index_latest_schema_snapshot(schema_name, *, recreate_collection=False)[source]

Index the latest saved snapshot for one schema.

Parameters:
  • schema_name (str) – Schema name to index.

  • recreate_collection (bool, default: False) – Whether to recreate the collection first.

Returns:

Persisted index manifest.

Return type:

RetrievalIndexManifestModel
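
Both indexing methods accept recreate_collection. The exact behavior against Qdrant is not spelled out on this page, so the in-memory sketch below only illustrates one plausible reading: with the flag off, documents are upserted into the existing collection, and with it on, the collection is dropped and rebuilt first. InMemoryIndex and its index method are hypothetical.

```python
class InMemoryIndex:
    """Hypothetical stand-in for a vector store; holds text instead of vectors."""

    def __init__(self) -> None:
        # collection name -> {document id -> content}
        self.collections: dict[str, dict[str, str]] = {}

    def index(
        self,
        collection_name: str,
        documents: dict[str, str],
        *,
        recreate_collection: bool = False,
    ) -> int:
        if recreate_collection:
            # Drop everything previously indexed under this name.
            self.collections.pop(collection_name, None)
        points = self.collections.setdefault(collection_name, {})
        # Upsert: existing ids are overwritten, new ids are added.
        points.update(documents)
        return len(points)


store = InMemoryIndex()
after_first = store.index("demo", {"a": "v1", "b": "v1"})
after_upsert = store.index("demo", {"b": "v2", "c": "v1"})
after_recreate = store.index("demo", {"z": "v1"}, recreate_collection=True)
```

Under this reading, leaving the flag at its default keeps documents from earlier snapshots in the collection, while setting it removes them.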

retrieve(query, *, schema_name=None, table_name=None, snapshot_id=None, artifact_types=None, limit=None)[source]

Retrieve relevant schema context from Qdrant.

Parameters:
  • query (str) – Retrieval query.

  • schema_name (str | None, default: None) – Optional schema filter.

  • table_name (str | None, default: None) – Optional table filter.

  • snapshot_id (str | None, default: None) – Optional snapshot filter.

  • artifact_types (list[str] | None, default: None) – Optional artifact-type filters.

  • limit (int | None, default: None) – Optional result limit override.

Returns:

Retrieval result payload.

Return type:

RetrievalResultModel
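
The optional arguments to retrieve act as filters, and one way to picture them is as a Qdrant-style filter payload in which only the supplied values become conditions. The sketch below builds such a payload as a plain dictionary. The build_filter helper and the metadata.* key names are assumptions; how the service actually names its payload keys is not shown here.

```python
from __future__ import annotations

from typing import Any


def build_filter(
    *,
    schema_name: str | None = None,
    table_name: str | None = None,
    snapshot_id: str | None = None,
    artifact_types: list[str] | None = None,
) -> dict[str, Any] | None:
    """Hypothetical helper: turn optional filters into a filter dictionary."""
    must: list[dict[str, Any]] = []
    exact_matches = (
        ("schema_name", schema_name),
        ("table_name", table_name),
        ("snapshot_id", snapshot_id),
    )
    for key, value in exact_matches:
        if value is not None:
            # Exact-match condition; the key prefix is an assumed layout.
            must.append({"key": f"metadata.{key}", "match": {"value": value}})
    if artifact_types:
        # A document qualifies if its type is any one of the listed values.
        must.append(
            {"key": "metadata.artifact_type", "match": {"any": list(artifact_types)}}
        )
    # No filters supplied means an unfiltered search.
    return {"must": must} if must else None


unfiltered = build_filter()
filtered = build_filter(schema_name="public", artifact_types=["table", "view"])
```

All conditions sit under "must", so a document has to satisfy every supplied filter to be returned.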

static load_manifest(path)[source]

Load a saved retrieval manifest.

Parameters:

path (str | Path) – Path to the saved manifest file.

Return type:

RetrievalIndexManifestModel

property manifest_dir: Path

Return the retrieval-manifest root directory.

manifest_path(*, schema_name, snapshot_id)[source]

Return the saved manifest path for one schema snapshot.

Parameters:
  • schema_name (str) – Schema name.

  • snapshot_id (str) – Snapshot identifier.

Return type:

Path

load_saved_manifest(*, schema_name, snapshot_id)[source]

Load a saved retrieval manifest when one exists.

Parameters:
  • schema_name (str) – Schema name.

  • snapshot_id (str) – Snapshot identifier.

Return type:

RetrievalIndexManifestModel | None
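
The difference between load_manifest and load_saved_manifest appears to be that the latter returns None when no manifest has been written yet. A minimal sketch of that pattern follows. The directory layout (one folder per schema, one JSON file per snapshot) and the sketch_* function names are hypothetical, and the sketch returns a plain dictionary rather than a RetrievalIndexManifestModel.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Any


def sketch_manifest_path(
    manifest_dir: Path, *, schema_name: str, snapshot_id: str
) -> Path:
    # Assumed layout: <manifest_dir>/<schema_name>/<snapshot_id>.json
    return manifest_dir / schema_name / f"{snapshot_id}.json"


def sketch_load_saved_manifest(
    manifest_dir: Path, *, schema_name: str, snapshot_id: str
) -> dict[str, Any] | None:
    path = sketch_manifest_path(
        manifest_dir, schema_name=schema_name, snapshot_id=snapshot_id
    )
    if not path.is_file():
        # Nothing has been indexed for this snapshot yet.
        return None
    return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    missing = sketch_load_saved_manifest(
        root, schema_name="public", snapshot_id="snap-1"
    )
    target = sketch_manifest_path(root, schema_name="public", snapshot_id="snap-1")
    target.parent.mkdir(parents=True)
    target.write_text(json.dumps({"document_count": 3}), encoding="utf-8")
    found = sketch_load_saved_manifest(
        root, schema_name="public", snapshot_id="snap-1"
    )
```

Returning None instead of raising lets a caller check whether a snapshot still needs indexing without wrapping the lookup in exception handling.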