The AI models that win are trained on data you can’t scrape.

23 Bulbs is the world’s first physical data factory. By fusing an ultra-low-cost, mass-producible on-body capture suit with ARRAY — our physics-accurate simulation and rendering engine — we have bridged the gap between physical reality and model-ready intelligence.

We don’t estimate movement; we capture its exact ground-truth mechanics through a global, decentralized operator network. Then we multiply it. For Generative AI and Embodied Robotics developers, this is the ultimate data moat: a continuous pipeline that transforms a single real-world action into hundreds of millions of unique, physically validated datasets annually.

100% proprietary  ·  Zero copyright risk  ·  Delivered training-ready to your ML architecture

200M+ datasets / year  ·  0 legal exposure  ·  Edge-case coverage
ARRAY Engine · Active · 200,000,000+ datasets / year

GenAI has a data famine. We built the farm.

The open web has been scraped to the studs, and what remains is legally contested. At the same time, Embodied AI and Robotics developers are hitting a wall: vision-only models are physics-blind, which leaves them brittle in the unstructured real world. Buying more compute does not help when there is nothing new to learn from.

We solve the raw human data scarcity crisis directly at the source. By deploying our mass-producible on-body suits across a decentralized, global operator network, we capture the unconstrained, ground-truth physics — inertia, torque and real-world friction — that models desperately need to scale.

Our hardware architecture captures reality. Then, our software engine multiplies it. 23 Bulbs manufactures the supply — proprietary, labeled and infinite.

$1T+  ·  Value at risk from data-provenance litigation
90%  ·  Of usable internet text already scraped
10×  ·  Projected data demand vs. supply by 2030

Inside an ARRAY dataset

Every dataset ARRAY generates contains hundreds to thousands of frames (2,400 in the sample shown here), each carrying deep metadata that no manual labelling pipeline could produce at this scale.
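To make the metadata concrete, here is a minimal Python sketch that loads a record carrying the fields from the metadata tree in this section and derives two quantities from it. The nesting of `annotations`, `kinematics` and `delivery` is an assumption inferred from the field order, and `load_record`, `clip_seconds` and `accel_magnitude` are hypothetical helper names, not part of any 23 Bulbs SDK.

```python
import json
import math

# Sample record using field names and values from the metadata tree.
# The nesting is an assumption; the published sample shows a flat listing.
SAMPLE = """
{
  "dataset_id": "ARR-20260512",
  "frame_count": 2400,
  "capture_fps": 25,
  "render_fps": 120,
  "annotations": {
    "object_classes": ["person", "hand", "surface"],
    "kinematics": {
      "accel_ms2": [3.42, 0.18, 9.79],
      "angular_vel": 128.4,
      "torque_nm": 1.18
    }
  },
  "delivery": {"format": "mp4+json", "copyright_safe": true}
}
"""

# Top-level keys this sketch relies on (hypothetical minimum, not a schema).
REQUIRED = ("dataset_id", "frame_count", "render_fps", "annotations", "delivery")


def load_record(text):
    """Parse a metadata record and check the top-level keys used below."""
    record = json.loads(text)
    missing = [key for key in REQUIRED if key not in record]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return record


def clip_seconds(record):
    """Playback length, assuming frame_count counts frames at render_fps."""
    return record["frame_count"] / record["render_fps"]


def accel_magnitude(record):
    """Euclidean norm of the 3-axis acceleration vector, in m/s^2."""
    return math.hypot(*record["annotations"]["kinematics"]["accel_ms2"])


record = load_record(SAMPLE)
print(clip_seconds(record))
print(round(accel_magnitude(record), 2))
```

If `frame_count` does refer to rendered frames, 2,400 frames at 120 fps works out to a 20-second clip. Checking required keys up front is a small guard that makes a malformed record fail at load time rather than deep inside a training loop.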

METADATA TREE · per frame
{
  "dataset_id": "ARR-20260512",
  "frame_count": 2400,
  "capture_fps": 25,
  "render_fps": 120,
  "resolution": "3840×2160",
  "annotations": {
    "object_classes": ["person", "hand", "surface"],
    "geometry": "mesh_3d",
    "kinematics": {
      "accel_ms2": [3.42, 0.18, 9.79],
      "angular_vel": 128.4,
      "torque_nm": 1.18
    },
    "material_props": ["fabric", "specular:0.4"],
    "semantic_tags": ["indoor", "motion", "interaction"],
    "temporal_coherence": true
  },
  "delivery": {
    "format": "mp4+json",
    "endpoint": "/v2/datasets/stream",
    "copyright_safe": true
  }
}
DELIVERY SPEC · API
GET /v2/datasets/stream

Query params:
  render_fps  120
  format      mp4+json
  batch       50

Response: 200 OK
  Content-Type: application/x-ndjson
  X-Frame-Count: 120000
  X-Copyright-Safe: true
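A client for the delivery spec above might look like the following Python sketch. Only the path `/v2/datasets/stream`, the three query parameters and the `application/x-ndjson` content type come from the spec. The host `api.example.com`, the function names, the second dataset ID and the shape of each NDJSON line are placeholders assumed for illustration, and no network call is made.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://api.example.com"  # placeholder host, not a real endpoint


def build_stream_url(render_fps=120, fmt="mp4+json", batch=50):
    """Build the GET URL; urlencode escapes the '+' in 'mp4+json' as %2B."""
    query = urlencode({"render_fps": render_fps, "format": fmt, "batch": batch})
    return f"{BASE_URL}/v2/datasets/stream?{query}"


def parse_ndjson(body):
    """Yield one dict per non-empty line of a newline-delimited JSON body."""
    for line in body.splitlines():
        line = line.strip()
        if line:
            yield json.loads(line)


url = build_stream_url()
print(url)

# Made-up response body standing in for a streamed batch of two datasets.
fake_body = (
    '{"dataset_id": "ARR-20260512", "frame_count": 2400}\n'
    "\n"
    '{"dataset_id": "ARR-20260513", "frame_count": 2400}\n'
)
records = list(parse_ndjson(fake_body))
print(sum(r["frame_count"] for r in records))
```

Escaping the `+` matters because an unescaped `+` in a query string is commonly decoded as a space. As a sanity check on the spec itself: if `X-Frame-Count` is the total across the batch, then 50 datasets of 2,400 frames each gives 50 × 2,400 = 120,000, which matches the header value shown.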

Built for training pipelines

01 Copyright-safe by design
Every dataset is built from proprietary physical capture — never scraped. No contested provenance, no litigation exposure.
02 Physics & spatial depth
Per-frame inertia, torque and motion vectors alongside depth and 3D geometry — the physical dimensions a model needs to learn the real world.
03 Rich metadata DNA
Object classes, kinetic vectors, semantic tags and confidence — tagged against anatomical ground-truth at 23 Bulbs HQ, not hand-annotated.
04 Massive scale & variation
200M+ datasets a year across lighting, motion, material and scene permutations no manual pipeline can reach.
05 API-native delivery
Stream training-ready datasets straight into your pipeline as mp4 + JSON. No reformatting, no cleanup.
06 On-body capture hardware
A mass-producible, machine-washable capture suit feeds the engine ground-truth physics that simulation alone cannot produce.
200M+ labelled datasets / year  ·  0 scraped frames used  ·  Copyright-safe by construction  ·  Edge-case permutations
“We stopped negotiating data licences. ARRAY ships datasets our lawyers never have to read.”
Director of ML Infrastructure · frontier vision lab
“The metadata depth is the unlock. Our models converge on physical reasoning faster than on any scraped corpus.”
Principal Research Scientist · robotics foundation model

How it works.

Request. Capture. Multiply. Deliver.

01 Request & Dispatch
A partner specifies a target data requirement (e.g. “high-performance running”). We dispatch the request to our global, decentralized network of freelance operators equipped with modular on-body capture garments.
02 Unconstrained Capture
Operators put on the machine-washable garments, hit record in our mobile app, and go about their day-to-day lives in the real world. The adaptive sensor architecture captures high-frequency kinetic vectors (inertia, torque and friction) at 25 fps or higher. Where possible, video is recorded at the same time to supply a high-fidelity visual stream without camera occlusion.
03 HQ Ingestion & Tagging
Raw data streams are transmitted via mobile edge processing straight to 23 Bulbs HQ. Our engineering pipeline ingests and aligns the streams, then tags them against anatomical ground truth, mapping the physics of human movement and intent with precise metadata.
04 Algorithmic Multiplication
The verified data is fed into ARRAY, our software backbone, which fuses an ultra-lightweight renderer with a physics-accurate simulation engine. ARRAY acts as a “Reality Multiplier,” permuting the original physical recording into hundreds of millions of unique, physically validated training variations.
05 Industrial Delivery
The completed datasets are packaged as high-density MP4 + JSON metadata streams and delivered directly via API to your machine learning architecture: 100% proprietary, copyright-safe and instantly training-ready.
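Step 04 describes permuting one recording into many variations. One simple way to picture that is a Cartesian product over variation axes. The Python sketch below is an illustration only and makes no claim about how ARRAY works internally: the axis names echo the lighting, material and scene dimensions this page mentions, while the axis values, the capture ID and the `Variation` type are invented for the example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Variation:
    """One rendered variant of a single physical capture (hypothetical type)."""
    capture_id: str
    lighting: str
    material: str
    scene: str


# Invented axis values; a real pipeline would have far more per axis.
AXES = {
    "lighting": ["daylight", "overcast", "tungsten"],
    "material": ["fabric", "leather"],
    "scene": ["indoor", "street", "gym", "trail"],
}


def multiply(capture_id, axes=AXES):
    """Yield one Variation per combination of axis values."""
    names = list(axes)
    for combo in product(*(axes[name] for name in names)):
        yield Variation(capture_id=capture_id, **dict(zip(names, combo)))


variations = list(multiply("capture-0001"))
print(len(variations))  # 3 * 2 * 4 = 24
print(variations[0])
```

The count grows multiplicatively: axes with 3, 2 and 4 values give 24 variants of one capture, and every added axis or value multiplies that total again. That is the arithmetic behind turning a modest number of recordings into a very large number of distinct training variations.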

Train on data nobody else can make.

Request a sample dataset and run it against your pipeline. See the metadata depth for yourself.

Typical sample delivery: 1 business day · NDA available on request