df12 Productions — Running the clustering pipeline

Construct a Chutoro instance with ChutoroBuilder, then call run with a DataSource implementation.

use chutoro_core::{ChutoroBuilder, DataSource, DataSourceError, ExecutionStrategy};

// Minimal one-dimensional data source: each item is a single f32.
struct Dummy(Vec<f32>);

impl DataSource for Dummy {
    fn len(&self) -> usize { self.0.len() }
    fn name(&self) -> &str { "dummy" }
    // Distance is the absolute difference; an out-of-range index is an error.
    fn distance(&self, i: usize, j: usize) -> Result<f32, DataSourceError> {
        let a = self.0.get(i).ok_or(DataSourceError::OutOfBounds { index: i })?;
        let b = self.0.get(j).ok_or(DataSourceError::OutOfBounds { index: j })?;
        Ok((a - b).abs())
    }
}

let chutoro = ChutoroBuilder::new()
    .with_min_cluster_size(8)
    .with_execution_strategy(ExecutionStrategy::CpuOnly)
    .build()?;
// Four items with a min_cluster_size of 8 end up in a single cluster.
let result = chutoro.run(&Dummy(vec![1.0, 2.0, 4.0, 8.0]))?;
assert_eq!(result.cluster_count(), 1);
# Ok::<(), chutoro_core::ChutoroError>(())

ExecutionStrategy::Auto currently resolves to the CPU skeleton. Once a GPU backend ships, Auto will prefer GPU execution in builds compiled with the gpu feature.
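
The resolution rule can be sketched roughly as follows. This is an illustration only, not chutoro's implementation: the Strategy and Backend enums, the resolve function, and the gpu_compiled flag are hypothetical stand-ins. The flag takes the place of a compile-time check on the gpu feature, and the sketch models the planned behaviour once a GPU backend exists.

```rust
/// Hypothetical stand-in for the crate's execution strategy (not the real type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strategy {
    Auto,
    CpuOnly,
}

/// Hypothetical backend selected after resolution.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Cpu,
    Gpu,
}

/// Resolve a requested strategy to a concrete backend.
/// `gpu_compiled` stands in for a check such as `cfg!(feature = "gpu")`.
fn resolve(strategy: Strategy, gpu_compiled: bool) -> Backend {
    match strategy {
        // An explicit CPU request is always honoured.
        Strategy::CpuOnly => Backend::Cpu,
        // Auto prefers the GPU only when GPU support was compiled in.
        Strategy::Auto if gpu_compiled => Backend::Gpu,
        // Otherwise Auto falls back to the CPU path.
        Strategy::Auto => Backend::Cpu,
    }
}
```

Passing the feature state as a plain bool keeps the sketch testable without Cargo features; a real build would more likely decide this at compile time.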

The walking skeleton partitions input indices into contiguous buckets sized by min_cluster_size. This behaviour is suitable only for smoke-testing the orchestration; the algorithm will change once the full FISHDBC pipeline lands.
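
The bucketing described above can be approximated with integer division. The sketch below is an assumption about how such a partition might look, not the crate's code: bucket_labels and bucket_count are hypothetical helpers, and the treatment of a trailing partial bucket (kept as its own bucket) and of a zero bucket size (rejected) is a guess.

```rust
/// Assign each index in `0..len` to a contiguous bucket of `bucket_size` items.
/// Returns `None` when `bucket_size` is zero, since no partition exists.
fn bucket_labels(len: usize, bucket_size: usize) -> Option<Vec<usize>> {
    if bucket_size == 0 {
        return None;
    }
    // Index i lands in bucket i / bucket_size, so labels are non-decreasing
    // and each bucket covers a contiguous index range.
    Some((0..len).map(|i| i / bucket_size).collect())
}

/// Count the buckets in a label vector produced by `bucket_labels`.
/// Labels are non-decreasing, so the last label plus one is the count.
fn bucket_count(labels: &[usize]) -> usize {
    labels.last().map_or(0, |last| last + 1)
}
```

Under these assumptions, four items with a bucket size of 8 all receive label 0, which matches the single cluster asserted in the example above.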