Habilis-β: A Fast-Motion and Long-Lasting On-Device Vision-Language-Action Model

1-hour successful continuous operation of Habilis-β on the dual-bin conveyor packing task

Abstract

We introduce Habilis-β, a fast-motion, long-lasting, on-device vision-language-action (VLA) model designed for real-world deployment. Current VLA evaluation remains largely confined to single-trial success rates under curated resets, which fails to capture the fast-motion and long-lasting capabilities essential for practical operation. To address this, we introduce the Productivity–Reliability Plane (PRP), which evaluates performance through Tasks per Hour (TPH) and Mean Time Between Intervention (MTBI) under a continuous-run protocol that demands both high-speed execution and sustained robustness. Habilis-β achieves high performance by combining language-free pre-training on large-scale play data, which provides robust interaction priors, with post-training on cyclic demonstrations that capture state drift across consecutive task iterations. The system further employs ESPADA for phase-adaptive motion shaping to accelerate free-space transit, uses rectified-flow distillation to enable high-frequency control on edge devices, and incorporates classifier-free guidance (CFG) as a deployment-time knob to balance instruction adherence against learned interaction priors. In 1-hour continuous-run evaluations, Habilis-β significantly outperforms π0.5 in both simulation and real-world environments. In simulation, Habilis-β improves MTBI to 39.2 s and TPH to 572.6 (from 30.5 s and 120.5 for π0.5); on a real-world humanoid logistics workflow, it achieves 137.4 s MTBI and 124 TPH, compared with π0.5's 46.1 s MTBI and 19 TPH. Finally, Habilis-β achieves the highest reported performance on the standard RoboTwin 2.0 leaderboard across representative tasks, validating its effectiveness in complex manipulation scenarios.

Rethinking Evaluation

PRP

We introduce a continuous-run protocol, where a robot operates for a fixed wall-clock duration without manual resets. Performance is measured along two axes:

  • Productivity — Tasks per Hour (TPH)
  • Reliability — Mean Time Between Intervention (MTBI)

Together, they form the Productivity–Reliability Plane (PRP). Deployment readiness means moving up and to the right on this plane, that is, toward both higher TPH and higher MTBI.
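Both PRP axes can be computed from a summary of a continuous run. The sketch below is a minimal illustration, not the authors' evaluation code. The `RunLog` structure and its field names are hypothetical. It assumes TPH is the number of completed tasks normalized to one hour of wall-clock time, and MTBI is the run duration divided by the number of interventions (taken as the full run duration when no intervention occurs). The exact definitions used for the reported numbers may differ.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RunLog:
    """Hypothetical summary of one continuous run (field names are illustrative)."""
    duration_s: float      # wall-clock duration of the run, in seconds
    tasks_completed: int   # tasks finished during the run
    interventions: int     # number of human interventions during the run


def tasks_per_hour(log: RunLog) -> float:
    """Productivity axis: completed tasks normalized to one hour."""
    return log.tasks_completed * 3600.0 / log.duration_s


def mean_time_between_intervention(log: RunLog) -> float:
    """Reliability axis (assumed definition): seconds of run time per intervention."""
    if log.interventions == 0:
        # No intervention observed: the estimate is bounded by the run length.
        return log.duration_s
    return log.duration_s / log.interventions


def prp_point(log: RunLog) -> Tuple[float, float]:
    """Return the (TPH, MTBI) coordinates of a run on the PRP."""
    return tasks_per_hour(log), mean_time_between_intervention(log)


# Made-up numbers for illustration only, not results from the paper.
example = RunLog(duration_s=3600.0, tasks_completed=120, interventions=24)
tph, mtbi = prp_point(example)
print(f"TPH={tph:.1f}, MTBI={mtbi:.1f} s")
```

Because both metrics are derived from the same uninterrupted run, a policy cannot raise TPH by moving faster without also paying for any additional interventions in its MTBI.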

System Overview

[Figure: Habilis-β model overview]

Simulation Performance

[Videos: Habilis-β, Habilis-β (w/o ESPADA), and π0.5]

Real-World Performance

Habilis-β with Different CFG Values

[Videos: Habilis-β w/ CFG 0.8 and Habilis-β w/ CFG 1.5]
  • Lower guidance (CFG < 1.0): Prioritizes learned interaction priors, yielding smoother motion and improved stability.
  • Higher guidance (CFG > 1.0): Strengthens instruction adherence, increasing task speed and aggressiveness.
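
The behaviour described above is consistent with the standard classifier-free guidance blend, in which a conditional and an unconditional prediction are combined with a scale. The sketch below assumes that standard form applied to a per-step action (or velocity) prediction. How Habilis-β produces the two predictions is not specified here, and the function and variable names are hypothetical.

```python
from typing import List, Sequence


def apply_cfg(cond: Sequence[float], uncond: Sequence[float], scale: float) -> List[float]:
    """Standard CFG blend: uncond + scale * (cond - uncond).

    scale = 1.0 returns the instruction-conditioned prediction unchanged,
    scale < 1.0 interpolates toward the unconditional (prior) prediction,
    scale > 1.0 extrapolates past the conditional prediction.
    """
    if len(cond) != len(uncond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# Toy 2-D predictions with made-up values.
cond = [1.0, 0.0]      # prediction given the language instruction
uncond = [0.0, 1.0]    # prediction without the instruction (learned prior)

low = apply_cfg(cond, uncond, 0.8)      # pulled toward the prior
neutral = apply_cfg(cond, uncond, 1.0)  # equals the conditional prediction
high = apply_cfg(cond, uncond, 1.5)     # pushed beyond the conditional prediction
```

Since the scale only changes how two existing predictions are combined, it can be adjusted at deployment time without retraining.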

Baseline (π0.5) Failure Cases

Over-grasping

Recovery failure

Success then stuck

Timeout

BibTeX

@article{tr2026habilisbeta,
  author  = {Jesoon Kang and Taegeon Park and Jisu An and Soo Min Kimm and Jaejoon Kim and Jinu Pahk and Byungju Kim and Junseok Lee and Namheon Baek and Sungwan Ha and Hojun Baek and Eduardo Ayerve Cruz and Wontae Kim and Junghyeon Choi and Yousuk Lee and Joonmo Han and Sunghyun Cho and Sunghyun Kwon and Soyoung Lee and Jun Ki Lee and Seung-Joon Yi and Byoung-Tak Zhang and Theo Taeyeong Kim},
  title   = {{Habilis-$\beta$: A Fast-Motion and Long-Lasting On-Device Vision-Language-Action Model}},
  year    = {2026},
}