Implementing data sources

Updated Oct 20, 2025

DataSource abstracts item storage and distance calculations. Implementations must provide three methods:

  • len returns the number of items.
  • name yields a human-readable identifier surfaced in telemetry and errors.
  • distance computes a pairwise distance, returning DataSourceError on failure.
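
As a rough illustration of the three required methods, here is a minimal in-memory sketch. The trait and error enum are local stand-ins written for this example: the method signatures, the OutOfBounds and DimensionMismatch variants, and the VecSource type are all assumptions, so check the crate's own definitions before copying anything.

```rust
// Stand-in definitions for illustration only. The real Chutoro trait and
// error type may use different signatures and variants.
#[derive(Debug, PartialEq)]
pub enum DataSourceError {
    OutOfBounds { index: usize },                    // hypothetical variant
    DimensionMismatch { left: usize, right: usize }, // hypothetical variant
}

pub trait DataSource {
    fn len(&self) -> usize;
    fn name(&self) -> &str;
    fn distance(&self, i: usize, j: usize) -> Result<f32, DataSourceError>;
}

/// Hypothetical in-memory source of dense vectors using Euclidean distance.
pub struct VecSource {
    name: String,
    rows: Vec<Vec<f32>>,
}

impl VecSource {
    pub fn new(name: &str, rows: Vec<Vec<f32>>) -> Self {
        Self { name: name.to_owned(), rows }
    }

    // Look up a row, turning a bad index into an error instead of a panic.
    fn row(&self, index: usize) -> Result<&[f32], DataSourceError> {
        self.rows
            .get(index)
            .map(Vec::as_slice)
            .ok_or(DataSourceError::OutOfBounds { index })
    }
}

impl DataSource for VecSource {
    fn len(&self) -> usize {
        self.rows.len()
    }

    fn name(&self) -> &str {
        &self.name
    }

    fn distance(&self, i: usize, j: usize) -> Result<f32, DataSourceError> {
        let (a, b) = (self.row(i)?, self.row(j)?);
        if a.len() != b.len() {
            return Err(DataSourceError::DimensionMismatch {
                left: a.len(),
                right: b.len(),
            });
        }
        // Sum of squared differences, then the square root.
        let sum: f32 = a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum();
        Ok(sum.sqrt())
    }
}
```

Returning an error for a bad index, rather than panicking, keeps failures visible to the caller through DataSourceError.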

The default distance_batch helper calls distance for each pair to fill an output buffer, and leaves the buffer unchanged if any pair fails. Override it when the backend can compute batches more efficiently.
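
One way to get the "unchanged on failure" behaviour is to compute into a scratch buffer and copy it out only after every pair has succeeded. The sketch below assumes a distance_batch signature taking a slice of index pairs and a mutable output slice; that signature, the OutputLengthMismatch variant, and the Points type are assumptions made for illustration, not the crate's confirmed API.

```rust
// Stand-in definitions for illustration only. The real trait may differ.
#[derive(Debug, PartialEq)]
pub enum DataSourceError {
    OutOfBounds { index: usize },                         // hypothetical variant
    OutputLengthMismatch { out: usize, expected: usize }, // hypothetical variant
}

pub trait DataSource {
    fn len(&self) -> usize;
    fn name(&self) -> &str;
    fn distance(&self, i: usize, j: usize) -> Result<f32, DataSourceError>;

    // Assumed signature: one output slot per (i, j) pair.
    fn distance_batch(
        &self,
        pairs: &[(usize, usize)],
        out: &mut [f32],
    ) -> Result<(), DataSourceError> {
        if pairs.len() != out.len() {
            return Err(DataSourceError::OutputLengthMismatch {
                out: out.len(),
                expected: pairs.len(),
            });
        }
        // Collect into a scratch buffer; the first failing pair stops the
        // iteration and returns before `out` is touched.
        let scratch = pairs
            .iter()
            .map(|&(i, j)| self.distance(i, j))
            .collect::<Result<Vec<f32>, _>>()?;
        out.copy_from_slice(&scratch);
        Ok(())
    }
}

/// Hypothetical one-dimensional source using absolute difference.
pub struct Points(pub Vec<f32>);

impl DataSource for Points {
    fn len(&self) -> usize {
        self.0.len()
    }

    fn name(&self) -> &str {
        "points"
    }

    fn distance(&self, i: usize, j: usize) -> Result<f32, DataSourceError> {
        let a = self.0.get(i).ok_or(DataSourceError::OutOfBounds { index: i })?;
        let b = self.0.get(j).ok_or(DataSourceError::OutOfBounds { index: j })?;
        Ok((a - b).abs())
    }
}
```

An override that computes batches natively would need to preserve the same guarantee, for example by writing to the output only once all results are known.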

Handle empty inputs during ingestion by returning DataSourceError::EmptyData or DataSourceError::ZeroDimension. Chutoro rejects a DataSource with zero items, or one with fewer than min_cluster_size items, before invoking the backend.
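
A validating constructor is one place to perform the ingestion check. The sketch below uses a local stand-in enum carrying only the two variants named above; the DenseSource type and its try_new constructor are hypothetical, and the real variants may carry extra fields.

```rust
// Stand-in error type with only the two variants discussed here.
#[derive(Debug, PartialEq)]
pub enum DataSourceError {
    EmptyData,
    ZeroDimension,
}

/// Hypothetical dense source that validates its rows at construction time.
#[derive(Debug)]
pub struct DenseSource {
    rows: Vec<Vec<f32>>,
}

impl DenseSource {
    pub fn try_new(rows: Vec<Vec<f32>>) -> Result<Self, DataSourceError> {
        // No rows at all: nothing to cluster.
        let first = rows.first().ok_or(DataSourceError::EmptyData)?;
        // Rows exist but carry no coordinates.
        if first.is_empty() {
            return Err(DataSourceError::ZeroDimension);
        }
        Ok(Self { rows })
    }

    pub fn len(&self) -> usize {
        self.rows.len()
    }
}
```

Rejecting bad input at construction means a value of this type never holds an empty or zero-dimension dataset.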