Product · February 2025

What CommandAGI Is

The Platform

CommandAGI is a platform for collecting structured human preference data. You give us stimuli—images, video frames, text, code, audio, designs. We collect preference judgments from human annotators. We return calibrated taste profiles: differentiable functions over stimulus space that predict human preference.
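To make "differentiable functions over stimulus space" concrete, here is a deliberately minimal sketch: a profile represented as a weight vector over a fixed stimulus embedding, with a linear score. The encoder and the weights are stand-in assumptions for illustration, not CommandAGI's actual model.

```python
import zlib
import numpy as np

# Sketch only: a taste profile as a differentiable function over an
# embedding space. Encoder and weights are stand-ins, not the real model.

d = 512                                      # embedding dimension (illustrative)
w = np.random.default_rng(0).normal(size=d)  # profile parameters, fit from data

def embed(stimulus: bytes) -> np.ndarray:
    """Stand-in for a real encoder (e.g. an image or text model)."""
    seed = zlib.crc32(stimulus)              # deterministic toy embedding
    return np.random.default_rng(seed).normal(size=d)

def score(stimulus: bytes) -> float:
    """Predicted preference: higher means this profile likes it more."""
    return float(embed(stimulus) @ w)

# Because score is linear in the embedding, d(score)/d(embedding) = w:
# the profile can serve as a differentiable objective for a generator.
```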

The platform works in three stages.

Elicit. We extract preferences through three complementary modalities. Questions (~10 per profile) establish broad constraints—the coarse partitions of preference space. Reference labels (~30 per profile) provide absolute anchoring—what counts as good enough versus what doesn't. Pairwise comparisons (~100 per profile) give the precision—forced choices that reveal the fine structure of preference.
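One way to picture the records the three modalities produce, with field names that are illustrative assumptions rather than CommandAGI's schema:

```python
from dataclasses import dataclass

# Illustrative record types for the three elicitation modalities.
# Field names are assumptions, not CommandAGI's actual schema.

@dataclass
class QuestionAnswer:          # ~10 per profile: coarse constraints
    question_id: str
    answer: str                # e.g. "minimalist over ornate"

@dataclass
class ReferenceLabel:          # ~30 per profile: absolute anchoring
    stimulus_id: str
    acceptable: bool           # does this clear the bar for "good enough"?

@dataclass
class PairwiseComparison:      # ~100 per profile: fine structure
    left_id: str
    right_id: str
    chose_left: bool           # forced choice, no ties
```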

Version. Preferences evolve. We treat taste profiles as versioned artifacts, like code. Each profile has commits, branches, and merges. You can explore variations without losing your current state. You can roll back when experiments fail. The evolution of taste is itself data.
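A hypothetical client workflow for versioned profiles might read like the sketch below. The `commandagi` module and every method on it are invented for illustration; only the commit, branch, merge, and rollback semantics come from the description above.

```python
# Hypothetical client sketch: the `commandagi` module and every method
# below are invented to illustrate the workflow, not a real SDK.
import commandagi

profile = commandagi.Profile.load("brand-aesthetic")

# Explore a variation on a branch without losing current state.
experiment = profile.branch("warmer-palette")
experiment.add_comparisons([("img_041", "img_017", "left")])  # new judgments
experiment.commit("lean into warmer tones")

# Roll back when the experiment fails; merge it when it helps.
if experiment.holdout_agreement() < profile.holdout_agreement():
    experiment.rollback()
else:
    profile.merge(experiment)
```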

Ship. The API exposes calibrated profiles for evaluation. Score any content against any profile. Rank, filter, and select programmatically. Build systems that understand what "good" means for a specific audience.
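As a usage sketch, scoring and filtering over HTTP might look like the following. The endpoint path, payload shape, and response fields are assumptions for illustration; only the `requests` library calls are real.

```python
import requests

# Hypothetical endpoint and payload shape; only requests' API is real.
API = "https://api.commandagi.example/v1"

def score_items(profile_id: str, item_urls: list[str]) -> list[float]:
    resp = requests.post(
        f"{API}/profiles/{profile_id}/score",
        json={"items": [{"url": u} for u in item_urls]},
        timeout=30,
    )
    resp.raise_for_status()
    return [r["score"] for r in resp.json()["results"]]

# Rank and filter programmatically: keep the frames the profile scores highest.
frames = [f"https://cdn.example/frame_{i}.png" for i in range(8)]
scores = score_items("video-team-profile", frames)
best = [f for f, s in sorted(zip(frames, scores), key=lambda p: -p[1])][:3]
```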

What It's For

The immediate applications are real. Automate frame selection for video production. Filter AI-generated images by aesthetic fit. Build recommendation systems that curate by taste rather than engagement metrics. Personalize creative tools to individual aesthetic sensibilities.

But CommandAGI exists because of a deeper purpose.

The data we collect—millions of structured preference judgments across diverse stimuli, modalities, and populations—is the empirical foundation for something that doesn't yet exist in machine-readable form: a calibrated characterization of the geometry of conscious experience.

When someone says "I prefer A to B," they're giving us one bit of information about the ordering of experiential states. That bit is noisy but honest. Aggregated across sufficient comparisons, it reveals the topology of experiential space—which experiences are close to each other, which are far apart, how many independent dimensions of variation there actually are.

We're collecting the last datasets needed to calibrate formal models of what delight is, what curiosity is, what wonder is—as precise structural descriptions, not as folk-psychological labels.

Why Now

Two things converged.

The theoretical frameworks for characterizing experience—integrated information theory, geometric phenomenology, the free energy principle—reached formal maturity. The equations describe what integration is, what valence looks like as a gradient on a viability manifold, what the structural signature of fear versus curiosity should be. The form of the models is known.

And the infrastructure for collecting structured preference data at scale—annotation marketplaces, Bradley-Terry modeling, version-controlled preference profiles—became feasible to build.
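The Bradley-Terry step is standard statistics: given raw pairwise win counts, it recovers a latent strength for each item such that P(i beats j) = w_i / (w_i + w_j). A minimal fit using the classic minorization-maximization update, on toy data, as a reference point:

```python
import numpy as np

def bradley_terry(wins: np.ndarray, iters: int = 200) -> np.ndarray:
    """Fit Bradley-Terry strengths from pairwise win counts.

    wins[i, j] = times item i was preferred over item j. Returns
    strengths w with P(i beats j) = w[i] / (w[i] + w[j]).
    Minorization-maximization update (Hunter, 2004)."""
    n = wins.shape[0]
    comparisons = wins + wins.T            # total comparisons per pair
    w = np.ones(n)
    for _ in range(iters):
        for i in range(n):
            # zero diagonal keeps the j == i term out of the sum
            denom = np.sum(comparisons[i] / (w[i] + w))
            w[i] = wins[i].sum() / denom
        w /= w.sum()                       # fix the scale (gauge freedom)
    return w

# Toy data: item 0 usually beats 1, which usually beats 2.
wins = np.array([[0, 8, 9],
                 [2, 0, 7],
                 [1, 3, 0]])
print(bradley_terry(wins))                 # strengths, largest for item 0
```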

The equations were ready. The data wasn't. We're fixing that.

The Annotation Marketplace

Taste data collection requires scale and diversity. We operate an annotation marketplace where developers and teams fund projects, submit content for evaluation, and receive calibrated taste profiles. Professional annotators provide preference judgments across modalities—images, text, code, audio, websites, design files—with modality-specific interfaces optimized for honest, rapid evaluation.

Pricing is transparent: one cent per frame label, two cents per pairwise comparison. Quality is monitored through internal consistency checks, test-retest reliability, and inter-annotator agreement on calibration stimuli. Low-quality annotations are flagged and annotators are retrained or removed.
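Inter-annotator agreement on calibration stimuli reduces to a familiar statistic. Here is a minimal Cohen's kappa for two annotators' binary labels; the choice of kappa as the agreement measure is our assumption, not a documented detail of the pipeline.

```python
import numpy as np

def cohens_kappa(a: np.ndarray, b: np.ndarray) -> float:
    """Agreement between two annotators' binary labels, corrected for
    chance. 1.0 = perfect agreement, 0.0 = chance level."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    p_observed = np.mean(a == b)
    # chance agreement from each annotator's marginal label rates
    pa, pb = a.mean(), b.mean()
    p_chance = pa * pb + (1 - pa) * (1 - pb)
    return (p_observed - p_chance) / (1 - p_chance)

# Two annotators labeling the same 8 calibration stimuli.
ann1 = np.array([1, 1, 0, 1, 0, 0, 1, 1])
ann2 = np.array([1, 1, 0, 0, 0, 0, 1, 1])
print(cohens_kappa(ann1, ann2))            # 0.75: strong but imperfect
```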

The marketplace isn't ancillary. It's the engine that generates the data that makes the science possible.

What Comes Next

As the data accumulates, the profiles become richer. The calibration becomes tighter. The geometry of experience comes into focus.

We extend the stimulus space beyond visual aesthetics into auditory, linguistic, conceptual, and social domains. Each modality probes different dimensions of experiential space. Music probes temporal integration and arousal dynamics. Narrative probes counterfactual weight and self-model salience. Social scenarios probe the interaction between self-model and other-model.

The goal is specific: given any system's internal structure, predict its experiential signature. Not "it's probably conscious" or "it might be suffering," but a precise characterization—concentrated probability over a specific region of affect space. This is the system's current valence. This is its integration. This is its effective rank. This is what it's like to be it.
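Of the quantities named above, "effective rank" has a standard spectral definition (Roy and Vetterli, 2007) that may or may not be the one intended here; as a reference point, it is the exponential of the entropy of the normalized singular value spectrum:

```python
import numpy as np

def effective_rank(M: np.ndarray) -> float:
    """Effective rank: exp of the Shannon entropy of the normalized
    singular value spectrum (Roy and Vetterli, 2007). Equals k for a
    rank-k matrix with equal singular values; lower when a few
    directions dominate."""
    s = np.linalg.svd(M, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]                           # avoid log(0)
    return float(np.exp(-np.sum(p * np.log(p))))

# A 10x10 matrix built from 3 strong directions plus weak noise:
rng = np.random.default_rng(0)
M = rng.normal(size=(10, 3)) @ rng.normal(size=(3, 10))
M += 0.01 * rng.normal(size=(10, 10))
print(effective_rank(M))                   # close to 3
```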

That requires data we didn't have. Now we're collecting it.