Profiling API

Generic SQLAlchemy-backed profiling service.

class sqldbagent.profile.service.SQLAlchemyProfilingService(engine, inspector, settings=None)

Bases: object

Profiling service backed by SQLAlchemy queries.

__init__(engine, inspector, settings=None)

Initialize the profiling service.

Return type:

None

profile_table(table_name, schema=None, *, sample_size=5, top_value_limit=5)

Build a normalized table profile.

Parameters:
  • table_name (str) – Table name to profile.

  • schema (str | None, default: None) – Optional schema name.

  • sample_size (int, default: 5) – Number of sample rows to include.

  • top_value_limit (int, default: 5) – Number of top values to include per column.

Returns:

Profile result for the table.

Return type:

TableProfileModel
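The per-column counts that a table profile reports (null_count, non_null_count, unique_value_count, min_value, max_value) can typically be gathered with a single aggregate query. The following is a minimal sketch of that idea using the standard-library sqlite3 module; it is not the service's implementation, and the helpers quote_identifier and column_counts and the products table are hypothetical.

```python
from __future__ import annotations

import sqlite3


def quote_identifier(name: str) -> str:
    """Quote a table or column name; identifiers cannot be bound as parameters."""
    return '"' + name.replace('"', '""') + '"'


def column_counts(
    conn: sqlite3.Connection, table_name: str, column_name: str
) -> dict[str, object | None]:
    """Collect basic column statistics in one pass (hypothetical helper)."""
    table, column = quote_identifier(table_name), quote_identifier(column_name)
    # COUNT(col) and COUNT(DISTINCT col) both skip NULLs; COUNT(*) does not.
    row_count, non_null, distinct, min_value, max_value = conn.execute(
        f"SELECT COUNT(*), COUNT({column}), COUNT(DISTINCT {column}), "
        f"MIN({column}), MAX({column}) FROM {table}"
    ).fetchone()
    return {
        "name": column_name,
        "null_count": row_count - non_null,
        "non_null_count": non_null,
        "unique_value_count": distinct,
        "min_value": min_value,
        "max_value": max_value,
    }


# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany(
    "INSERT INTO products (price) VALUES (?)",
    [(10.0,), (20.0,), (None,), (10.0,)],
)
stats = column_counts(conn, "products", "price")
```

The dictionary keys mirror the field names documented for ColumnProfileModel, but the mapping is only an assumption made for illustration.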

sample_table(table_name, schema=None, *, limit=5)

Return sample rows from a table.

Parameters:
  • table_name (str) – Table name to sample.

  • schema (str | None, default: None) – Optional schema name.

  • limit (int, default: 5) – Maximum number of rows to return.

Returns:

Sample rows.

Return type:

list[dict[str, object | None]]
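To make the return shape concrete, the sketch below builds a list[dict[str, object | None]] from a LIMIT query with the standard-library sqlite3 module. It only illustrates the shape and is not the service's implementation; the helper sample_rows and the users table are hypothetical.

```python
from __future__ import annotations

import sqlite3


def sample_rows(
    conn: sqlite3.Connection, table_name: str, limit: int = 5
) -> list[dict[str, object | None]]:
    """Return up to `limit` rows as column-name -> value dictionaries."""
    # Identifiers cannot be bound as parameters, so quote the name defensively.
    quoted = '"' + table_name.replace('"', '""') + '"'
    cursor = conn.execute(f"SELECT * FROM {quoted} LIMIT ?", (limit,))
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("a@example.com",), (None,), ("c@example.com",)],
)
rows = sample_rows(conn, "users", limit=2)
```

SQL NULL values arrive as None, which is why the documented value type is object | None.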

get_unique_values(table_name, column_name, schema=None, *, limit=20)

Return distinct values and counts for one column.

Parameters:
  • table_name (str) – Table name to inspect.

  • column_name (str) – Column name whose distinct values should be returned.

  • schema (str | None, default: None) – Optional schema name.

  • limit (int, default: 20) – Maximum number of distinct values to return.

Returns:

Distinct-value distribution for the column.

Return type:

ColumnUniqueValuesModel
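A distinct-value distribution of this kind is commonly computed with a GROUP BY query capped by the limit. The sketch below shows one way to do that with the standard-library sqlite3 module, including a truncated flag set when the cap hides some values. It is not the service's implementation: the helper unique_values, the orders table, and the "value"/"count" keys inside each entry are assumptions, since the reference only states that values is a list[dict[str, object]].

```python
from __future__ import annotations

import sqlite3


def unique_values(
    conn: sqlite3.Connection, table_name: str, column_name: str, limit: int = 20
) -> dict[str, object]:
    """Return distinct non-null values with frequencies (hypothetical helper)."""

    def quote(identifier: str) -> str:
        return '"' + identifier.replace('"', '""') + '"'

    table, column = quote(table_name), quote(column_name)
    # COUNT(DISTINCT col) ignores NULLs, so this is the non-null distinct count.
    unique_count = conn.execute(
        f"SELECT COUNT(DISTINCT {column}) FROM {table}"
    ).fetchone()[0]
    cursor = conn.execute(
        f"SELECT {column} AS v, COUNT(*) AS n FROM {table} "
        f"WHERE {column} IS NOT NULL GROUP BY {column} "
        f"ORDER BY n DESC, v LIMIT ?",
        (limit,),
    )
    values = [{"value": v, "count": n} for v, n in cursor.fetchall()]
    return {
        "unique_value_count": unique_count,
        "values": values,
        # True when the cap removed at least one distinct value.
        "truncated": unique_count > len(values),
    }


# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (status) VALUES (?)",
    [("paid",), ("paid",), ("paid",), ("open",), ("open",), ("void",), (None,)],
)
result = unique_values(conn, "orders", "status", limit=2)
```

With three distinct statuses and a limit of 2, the returned list holds the two most frequent values and truncated is True.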

Normalized profiling models.

class sqldbagent.core.models.profile.ColumnProfileModel(**data)

Bases: BaseModel

Normalized column profile.

Variables:
  • name – Column name.

  • data_type – Reflected column data type.

  • null_count – Exact null count when available.

  • non_null_count – Exact non-null count when available.

  • null_ratio – Null ratio when row count is available.

  • unique_value_count – Exact unique count when available.

  • unique_ratio – Ratio of unique non-null values to total rows when available.

  • min_value – Best-effort minimum value.

  • max_value – Best-effort maximum value.

  • sample_values – Best-effort sample values for the column.

  • top_values – Most frequent values and counts.

  • summary – Generated short summary.

name: str
data_type: str
null_count: int | None
non_null_count: int | None
null_ratio: float | None
unique_value_count: int | None
unique_ratio: float | None
min_value: object | None
max_value: object | None
sample_values: list[object]
top_values: list[dict[str, object]]
summary: str | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
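The two ratio fields follow from the counts and the table row count: null_ratio is null_count divided by the row count, and unique_ratio is unique_value_count divided by the row count. A minimal sketch of that arithmetic is shown below. The helper safe_ratio is hypothetical, and returning None for a zero row count is an assumption rather than documented behavior.

```python
from __future__ import annotations


def safe_ratio(numerator: int | None, row_count: int | None) -> float | None:
    """Divide a count by the row count, or return None when undefined."""
    # Assumption: a missing count, a missing row count, or an empty table
    # all yield None instead of raising or returning 0.0.
    if numerator is None or not row_count:
        return None
    return numerator / row_count


# A column with 1 null and 2 distinct non-null values in a 4-row table.
null_ratio = safe_ratio(1, 4)
unique_ratio = safe_ratio(2, 4)
```

Here null_ratio is 0.25 and unique_ratio is 0.5.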

class sqldbagent.core.models.profile.ColumnUniqueValuesModel(**data)

Bases: BaseModel

Normalized unique-values payload for one column.

Variables:
  • database – Optional database name containing the table.

  • schema_name – Optional schema containing the table.

  • table_name – Table name containing the column.

  • column_name – Column name whose values were inspected.

  • row_count – Exact table row count when available.

  • null_count – Exact null count for the column when available.

  • non_null_count – Exact non-null count for the column when available.

  • unique_value_count – Exact number of distinct non-null values.

  • values – Distinct values with their frequencies.

  • truncated – Whether the values list was cut off at the caller-supplied limit.

  • summary – Generated short summary.

Parameters:
  • data (Any)

  • database (str | None)

  • schema_name (str | None)

  • table_name (str)

  • column_name (str)

  • row_count (int | None)

  • null_count (int | None)

  • non_null_count (int | None)

  • unique_value_count (int | None)

  • values (list[dict[str, object]])

  • truncated (bool)

  • summary (str | None)

database: str | None
schema_name: str | None
table_name: str
column_name: str
row_count: int | None
null_count: int | None
non_null_count: int | None
unique_value_count: int | None
values: list[dict[str, object]]
truncated: bool
summary: str | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class sqldbagent.core.models.profile.TableProfileModel(**data)

Bases: BaseModel

Normalized, low-cost table profile.

Variables:
  • database – Optional database name containing the table.

  • schema_name – Optional schema name containing the table.

  • table_name – Table name.

  • row_count – Exact row count when available.

  • row_count_exact – Whether the row count is exact.

  • storage_bytes – Best-effort storage bytes when available.

  • storage_scope – Scope represented by storage_bytes.

  • storage_source – How storage bytes were obtained.

  • entity_kind – Heuristic entity classification for the table.

  • related_tables – Related tables inferred from foreign keys.

  • relationships – Relationships inferred from foreign keys.

  • relationship_count – Number of inferred relationships.

  • columns – Per-column profile summaries.

  • sample_rows – Sample rows from the table.

  • summary – Generated short summary.

database: str | None
schema_name: str | None
table_name: str
row_count: int | None
row_count_exact: bool
storage_bytes: int | None
storage_scope: str | None
storage_source: str | None
entity_kind: str | None
related_tables: list[str]
relationships: list[ForeignKeyModel]
relationship_count: int
columns: list[ColumnProfileModel]
sample_rows: list[dict[str, object | None]]
summary: str | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
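Because row_count may be missing and row_count_exact says whether the count is exact, callers will usually want to check both before displaying the number. The sketch below shows one possible way to render them. The helper describe_row_count and its output format are hypothetical and not part of the library.

```python
from __future__ import annotations


def describe_row_count(row_count: int | None, row_count_exact: bool) -> str:
    """Render a row count, marking inexact counts with a leading '~'."""
    if row_count is None:
        return "row count unavailable"
    prefix = "" if row_count_exact else "~"
    return f"{prefix}{row_count:,} rows"


exact_label = describe_row_count(1200, True)
estimated_label = describe_row_count(1200, False)
missing_label = describe_row_count(None, False)
```

The same pattern applies to storage_bytes, which is also optional and best-effort.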