We present OpenRoboCare, a multi-modal dataset for robot-assisted caregiving, capturing expert occupational therapist demonstrations of Activities of Daily Living (ADLs).
Caregiving tasks involve complex physical human-robot interaction, requiring precise perception under occlusion, safe physical contact, and long-horizon planning. While recent advances in robot learning from demonstrations have shown promise, the field lacks a large-scale, diverse, expert-driven dataset that captures real-world caregiving routines.
To address this gap, we collected data from 21 occupational therapists performing 15 ADL tasks on two manikins. The dataset spans five modalities: RGBD video, pose tracking, eye-gaze tracking, task and action annotations, and tactile sensing. Together, these provide rich multi-modal insight into caregiver movement, attention, force application, and task-execution strategies. We further analyze expert caregiving principles and strategies, offering insights that can improve robot efficiency and task feasibility. Finally, our evaluations show that OpenRoboCare poses significant challenges for state-of-the-art robot perception and human activity recognition methods, both of which are critical for developing safe and adaptive assistive robots.
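To make the data layout concrete, the following is a minimal sketch of how a single demonstration and its five modalities might be organized in code. All field names, shapes, and types here are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass
import numpy as np

# Hypothetical per-frame record for one demonstration. Field names, shapes,
# and types are illustrative assumptions, not the dataset's actual schema.
@dataclass
class CareFrame:
    rgb: np.ndarray              # (H, W, 3) uint8 color image
    depth: np.ndarray            # (H, W) float32 depth map in meters
    caregiver_pose: np.ndarray   # (J, 3) 3D joint positions from pose tracking
    gaze_dir: np.ndarray         # (3,) unit gaze direction from eye tracking
    tactile: np.ndarray          # flattened pressure readings from the tactile skin
    timestamp: float             # seconds since the start of the demonstration

@dataclass
class Demonstration:
    therapist_id: int            # one of the 21 occupational therapists
    task: str                    # one of the 15 ADL tasks, e.g. "bathing"
    actions: list                # (start_s, end_s, label) action annotations
    frames: list                 # chronologically ordered CareFrame records
```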
Previous work in the rehabilitation and public health literature has collected survey- and interview-based data on caregiving. These efforts focus on the health, social, and financial aspects of caregiving rather than the physical process of caregiving that we consider here. The works closest to ours either do not collect data from expert caregivers [6] or lack multimodality [4, 5]. To the best of our knowledge, OpenRoboCare is the first multi-task, multi-modal dataset of expert caregiving.
Left: setup of sensors and equipment. Center: assistive devices used by caregivers. Right: sequence of tasks performed by each caregiver.
We develop a custom tactile skin to fit the manikins and record physical interactions between the caregiver and the manikin. The sensor design is guided by three key considerations: (1) customizability to accommodate various manikin body shapes and sizes, (2) flexibility to ensure secure attachment to curved surfaces, and (3) durability to withstand pressure exerted by the manikin’s weight.
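As a rough illustration of how readings from such a skin could be processed, the sketch below converts raw taxel values to calibrated pressure and summarizes contact per body region. The linear calibration constants, grid sizes, and region names are assumptions for illustration, not the actual sensor pipeline.

```python
import numpy as np

def taxels_to_pressure(raw, gain=0.05, offset=2.0):
    """Map raw taxel readings to pressure (kPa) with an assumed linear calibration."""
    return np.clip(gain * (raw.astype(np.float32) - offset), 0.0, None)

def contact_summary(patches, threshold_kpa=1.0):
    """Per-region peak pressure and fraction of taxels in contact."""
    summary = {}
    for region, raw in patches.items():
        p = taxels_to_pressure(raw)
        summary[region] = {"peak_kpa": float(p.max()),
                           "contact_fraction": float((p > threshold_kpa).mean())}
    return summary

# Synthetic readings standing in for two skin patches on the manikin.
patches = {"pelvis": np.random.randint(0, 256, (24, 24)),
           "left_calf": np.random.randint(0, 256, (16, 16))}
print(contact_summary(patches))
```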
Throughout data collection, we observe various techniques that OTs use to perform tasks more efficiently while minimizing physical effort. We collaborate with an experienced OT to analyze the observed techniques and distill the underlying principles that can guide robot design for caregiving tasks.
Principle 1: OTs prioritize safety by carefully preparing the care recipient before initiating a task. They ensure that the care recipient’s posture, stability, joint angles, and supporting surfaces are appropriate for task execution.
Principle 2: OTs anticipate and organize their body mechanics to support the entire task sequence, particularly for large-scale movements. They anticipate both the final position and the trajectory of the care recipient’s body and limbs, which influences task execution decisions.
Principle 3: OTs prioritize accuracy and timely completion of tasks to ensure efficiency. Care recipients with severe mobility limitations often have medical conditions, making efficient ADL execution crucial.
Technique 1: Bridge Strategy — bending the care recipient’s knees and applying pressure behind the knees at the top of the calf to momentarily elevate the pelvis.
Technique 2: Segmental Roll — gradually turning the care recipient’s body. The OT bends the care recipient’s opposite-side knee and applies pressure on the bent knee to initiate a progressive rolling motion toward the OT.
Technique 3: Wheelchair Recline During Transfer — reclining the wheelchair to a 45-degree backward tilt before transferring the care recipient helps improve positioning on the chair.
Technique 4: Stabilizing Key Points of Control — the pelvic bone, shoulders, and head serve as the primary points of control. OTs place their hands on key control points, such as the scapula and pelvis, to initiate, support, and control movement.
We visualize top-down views of the caregivers' head trajectories. Caregiver trajectories for transfer tasks involve more complex movements than those for other tasks. Caregivers also take different approaches to tasks like bathing and dressing, with some approaching the manikin from one side and others from both sides. This diversity in strategy highlights the importance of a comprehensive expert demonstration dataset.
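A minimal sketch of this kind of top-down visualization is shown below, assuming head positions are available as (N, 3) arrays per demonstration; the synthetic trajectories stand in for real head-tracking data.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_topdown(trajectories):
    """Plot (N, 3) head trajectories from above by dropping the z coordinate."""
    fig, ax = plt.subplots(figsize=(5, 5))
    for task, xyz in trajectories.items():
        ax.plot(xyz[:, 0], xyz[:, 1], alpha=0.7, label=task)
    ax.set_xlabel("x (m)")
    ax.set_ylabel("y (m)")
    ax.set_aspect("equal")
    ax.legend()
    plt.show()

# Synthetic trajectories standing in for real head-tracking data.
t = np.linspace(0, 2 * np.pi, 200)
plot_topdown({"transfer": np.c_[np.cos(t), np.sin(2 * t), np.zeros_like(t)],
              "bathing": np.c_[t / np.pi - 1, 0.3 * np.sin(t), np.zeros_like(t)]})
```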
We visualize the global workspace of the caregivers' hands relative to the hospital bed and assistive wheelchair to provide insights into the robot workspace needed for caregiving tasks.
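One simple way to quantify such a workspace, sketched below under the assumption that hand positions and a bed-frame transform are available, is to express the hand positions in the bed frame and take their axis-aligned extents.

```python
import numpy as np

def workspace_extent(hand_pts_world, T_bed_world):
    """Axis-aligned workspace bounds, in the bed frame, of (N, 3) hand positions."""
    pts_h = np.c_[hand_pts_world, np.ones(len(hand_pts_world))]  # homogeneous coords
    pts_bed = (T_bed_world @ pts_h.T).T[:, :3]                   # world -> bed frame
    return pts_bed.min(axis=0), pts_bed.max(axis=0)

# Synthetic hand positions and an identity bed-frame transform for illustration.
pts = np.random.uniform(-0.5, 1.5, (1000, 3))
lo, hi = workspace_extent(pts, np.eye(4))
print("workspace extents (m):", hi - lo)
```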
Distribution of data across tasks, location, manikin, and task duration.
Distribution of occupational therapist strategies across tasks.
Physical contact distribution across tasks and body regions.
Differences in force magnitude across tasks and during a bathing task.
We run VidChapter-7M on a subset of the RGB videos to demonstrate that OpenRoboCare poses significant challenges for off-the-shelf state-of-the-art video captioning models. Compared to the ground truth action labels annotated by occupational therapists, the captions predicted by VidChapter-7M are noticeably sparser and lack fine-grained detail. Furthermore, the model struggles with domain-specific terminology and context, evident in cases where it mislabels key caregiving actions. For example, when the caregiver aligns the Hoyer lift to the bed, the model inaccurately captions the activity as "positioning the foyer."
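As a hedged sketch of how such predicted captions could be scored against the OT annotations, the snippet below matches predicted chapters to ground-truth action segments by temporal IoU; the metric and threshold are assumptions for illustration, not the paper's evaluation protocol.

```python
def iou(a, b):
    """Temporal IoU of two (start, end) intervals in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def segment_recall(gt, pred, thresh=0.5):
    """Fraction of ground-truth segments matched by any prediction at IoU >= thresh."""
    hits = sum(any(iou(g[:2], p[:2]) >= thresh for p in pred) for g in gt)
    return hits / len(gt) if gt else 0.0

# Toy example: a single coarse, mislabeled prediction misses fine-grained segments.
gt = [(0.0, 12.5, "align Hoyer lift to bed"), (12.5, 30.0, "attach sling")]
pred = [(0.0, 30.0, "positioning the foyer")]
print(segment_recall(gt, pred))  # 0.5
```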