physical data infrastructure · 01

The AI models that win are trained on data you can’t scrape.

23 Bulbs is the world’s first physical data factory. By fusing an ultra-low-cost, mass-producible on-body capture suit with ARRAY — our physics-accurate simulation and rendering engine — we have bridged the gap between physical reality and model-ready intelligence.

We don’t estimate movement; we capture its exact ground-truth mechanics through a global, decentralized operator network. Then, we multiply it. For Generative AI and Embodied Robotics developers, this is the ultimate data moat: a continuous pipeline that turns real-world actions into hundreds of millions of unique, physically validated datasets every year.

100% proprietary · Zero copyright risk · Delivered training-ready to your ML architecture

Request a Data Sample · See the capture system →

200M+

datasets / year

0

legal exposure

∞

edge case coverage

Generating · +1 dataset / 0.3s · Queue: 47,291 pending

ARRAY Engine · Active · 200,000,000+ datasets / year

the data famine · 02

GenAI has a data famine. We built the farm.

The open web has been scraped to the studs, and what remains is legally contested. At the same time, Embodied AI and Robotics developers are hitting a wall: vision-only models are physics-blind, which makes them brittle in the unstructured real world. Buying more compute does not help when there is nothing new to learn from.

We solve the raw human data scarcity crisis directly at the source. By deploying our mass-producible on-body suits across a decentralized, global operator network, we capture the unconstrained, ground-truth physics — inertia, torque and real-world friction — that models desperately need to scale.

Our hardware architecture captures reality. Then, our software engine multiplies it. 23 Bulbs manufactures the supply: proprietary, labelled and infinite.

$1T+

Value at risk from data-provenance litigation

90%

Of usable internet text already scraped

10×

Projected data demand vs. supply by 2030

inside the product · 03

Inside an ARRAY dataset

Every dataset ARRAY generates contains thousands of frames, each carrying deep metadata that no manual labelling pipeline could produce at this scale.

FRAME GRID · 2,400 frames

Frame 124 / 2,400

METADATA TREE · per frame

{
  "dataset_id": "ARR-20260512",
  "frame_count": 2400,
  "capture_fps": 25,
  "render_fps": 120,
  "resolution": "3840×2160",
  "annotations": {
    "object_classes": [ "person", "hand", "surface" ],
    "geometry": "mesh_3d",
    "kinematics": {
      "accel_ms2": [3.42, 0.18, 9.79],
      "angular_vel": 128.4,
      "torque_nm": 1.18
    },
    "material_props": [ "fabric", "specular:0.4" ],
    "semantic_tags": [ "indoor", "motion", "interaction" ],
    "temporal_coherence": true
  },
  "delivery": {
    "format": "mp4+json",
    "endpoint": "/v2/datasets/stream",
    "copyright_safe": true
  }
}
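As a rough illustration of how a consumer might read one of these records, the sketch below loads a trimmed copy of the sample metadata into typed Python objects. The class names `Kinematics` and `FrameMetadata` and the helper `parse_metadata` are hypothetical and not part of any 23 Bulbs SDK; only the field names and values come from the sample above. The duration calculation assumes `frame_count` counts rendered frames, which the sample does not state.

```python
import json
from dataclasses import dataclass
from typing import Tuple

# Trimmed copy of the sample record shown above.
SAMPLE = """
{
  "dataset_id": "ARR-20260512",
  "frame_count": 2400,
  "capture_fps": 25,
  "render_fps": 120,
  "annotations": {
    "kinematics": {
      "accel_ms2": [3.42, 0.18, 9.79],
      "angular_vel": 128.4,
      "torque_nm": 1.18
    }
  }
}
"""


@dataclass(frozen=True)
class Kinematics:  # hypothetical type, mirrors the "kinematics" object
    accel_ms2: Tuple[float, float, float]
    angular_vel: float
    torque_nm: float


@dataclass(frozen=True)
class FrameMetadata:  # hypothetical type, subset of the full record
    dataset_id: str
    frame_count: int
    capture_fps: int
    render_fps: int
    kinematics: Kinematics

    def rendered_seconds(self) -> float:
        # Assumption: frame_count refers to frames at render_fps.
        return self.frame_count / self.render_fps


def parse_metadata(raw: str) -> FrameMetadata:
    """Parse a JSON metadata record into typed objects."""
    doc = json.loads(raw)
    kin = doc["annotations"]["kinematics"]
    return FrameMetadata(
        dataset_id=doc["dataset_id"],
        frame_count=doc["frame_count"],
        capture_fps=doc["capture_fps"],
        render_fps=doc["render_fps"],
        kinematics=Kinematics(
            accel_ms2=tuple(kin["accel_ms2"]),
            angular_vel=kin["angular_vel"],
            torque_nm=kin["torque_nm"],
        ),
    )


meta = parse_metadata(SAMPLE)
print(meta.dataset_id, meta.rendered_seconds())
```

Under that assumption, 2,400 frames at 120 fps works out to 20 seconds of rendered footage per dataset.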

DELIVERY SPEC · API

GET /v2/datasets/stream

Query params:
  render_fps   120
  format       mp4+json
  batch        50

Response: 200 OK
  Content-Type: application/x-ndjson
  X-Frame-Count: 120000
  X-Copyright-Safe: true
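A minimal sketch of how a client might build this request and read the newline-delimited JSON body, using only the Python standard library. The host `api.example.com` is a placeholder, since the spec gives only the path. The helpers `build_stream_url` and `iter_ndjson` are hypothetical names, and the shape of each NDJSON line is assumed. No authentication is shown because the spec does not describe any.

```python
import json
from typing import Dict, Iterable, Iterator
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # placeholder host, not from the spec


def build_stream_url(render_fps: int = 120, fmt: str = "mp4+json", batch: int = 50) -> str:
    """Build the GET URL with the three query params listed in the spec."""
    # urlencode percent-encodes the "+" in "mp4+json" as %2B, so the
    # server does not decode it as a space.
    query = urlencode({"render_fps": render_fps, "format": fmt, "batch": batch})
    return f"{BASE_URL}/v2/datasets/stream?{query}"


def iter_ndjson(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one parsed object per non-empty line of an NDJSON body."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


url = build_stream_url()

# Stand-in response body, invented for illustration only.
body = (
    '{"dataset_id": "ARR-20260512", "frame_count": 2400}\n'
    "\n"
    '{"dataset_id": "ARR-20260512", "frame_count": 2400}\n'
)
records = list(iter_ndjson(body.splitlines()))
print(url)
print(len(records), "records")
```

If every dataset carries 2,400 frames as in the sample record, a batch of 50 holds 50 × 2,400 = 120,000 frames, which matches the `X-Frame-Count` value in the spec.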

capabilities · 04

Built for training pipelines

Every dataset is built from proprietary physical capture — never scraped. No contested provenance, no litigation exposure.

Physics & spatial depth

Per-frame inertia, torque and motion vectors alongside depth and 3D geometry — the physical dimensions a model needs to learn the real world.

Rich metadata DNA

Object classes, kinetic vectors, semantic tags and confidence scores, tagged against anatomical ground-truth at 23 Bulbs HQ rather than hand-annotated.

Massive scale & variation

200M+ datasets a year across lighting, motion, material and scene permutations no manual pipeline can reach.

API-native delivery

Stream training-ready datasets straight into your pipeline as MP4 + JSON. No reformatting, no cleanup.

On-body capture hardware

A mass-producible, machine-washable capture suit feeds the engine ground-truth physics that simulation alone cannot produce.

the engine, in numbers · 05

200M+

labelled datasets / year

0

scraped frames used

∞

edge-case permutations

“We stopped negotiating data licences. ARRAY ships datasets our lawyers never have to read.”

Director of ML Infrastructure · frontier vision lab

“The metadata depth is the unlock. Our models converge on physical reasoning faster than on any scraped corpus.”

Principal Research Scientist · robotics foundation model

how it works · 06

How it works.

Request. Capture. Multiply. Deliver.

01 Request & Dispatch

A partner specifies a target data requirement (e.g. “high-performance running”). We deploy the request to our global, decentralized network of freelance operators equipped with modular on-body capture garments.

02 Unconstrained Capture

Operators put on the machine-washable garments and hit record in our mobile app, then go about their day-to-day lives in the real world. The adaptive sensor architecture captures high-frequency kinetic vectors (inertia, torque and friction) at 25 fps or higher, while concurrent video is recorded where possible to supply high-fidelity visual streams without camera occlusion.

03 HQ Ingestion & Tagging

Raw data streams are transmitted via mobile edge processing straight to 23 Bulbs HQ. Our engineering pipeline ingests, aligns and tags the exact anatomical ground-truth, mapping the physics of human movement and intent with precise metadata.

04 Algorithmic Multiplication

The verified data is fed into ARRAY, our software backbone, which fuses an ultra-lightweight renderer with a physics-accurate simulation engine. ARRAY acts as a “Reality Multiplier,” permuting the original physical recording into hundreds of millions of unique, physically validated training variations.

05 Industrial Delivery

The completed datasets are packaged as high-density MP4 + JSON metadata streams and delivered directly via API to your machine learning architecture: 100% proprietary, copyright-safe and training-ready.
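One way to picture the multiplication step is as a Cartesian product over variation axes. The sketch below is illustrative only. The four axis names (lighting, motion, material, scene) come from the capabilities list, but the values on each axis, the `Variation` type and the `permute` helper are invented here; ARRAY's actual permutation logic is not described on this page.

```python
from dataclasses import dataclass
from itertools import product
from math import prod
from typing import Dict, Iterator, List

# Axis names follow the capabilities list; the values are made up.
AXES: Dict[str, List[str]] = {
    "lighting": ["daylight", "overcast", "indoor_warm"],
    "motion": ["as_captured", "time_scaled"],
    "material": ["fabric", "leather"],
    "scene": ["gym", "street", "kitchen", "office"],
}


@dataclass(frozen=True)
class Variation:  # hypothetical record for one rendered variant
    capture_id: str
    lighting: str
    motion: str
    material: str
    scene: str


def permute(capture_id: str, axes: Dict[str, List[str]]) -> Iterator[Variation]:
    """Yield one Variation per combination of axis values."""
    names = list(axes)
    for combo in product(*(axes[name] for name in names)):
        yield Variation(capture_id=capture_id, **dict(zip(names, combo)))


variations = list(permute("capture-001", AXES))
expected = prod(len(values) for values in AXES.values())
print(len(variations), expected)
```

The count grows multiplicatively with each axis: the toy axes above give 3 × 2 × 2 × 4 = 48 variants from one capture, and richer axes scale the total accordingly.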

ARRAY

Train on data nobody else can make.

Request a sample dataset and run it against your pipeline. See the metadata depth for yourself.

Request a Data Sample · Read the thesis