OpenRoboCare: A Multi-Modal Multi-Task Expert Demonstration Dataset
for Robot Caregiving

In Submission

1Cornell University, 2Columbia University, 3National University of Singapore, 4University of Massachusetts Lowell
*Equal contribution
Teaser


OpenRoboCare is an expert demonstration dataset for robot caregiving, featuring 21 occupational therapists demonstrating 15 common caregiving tasks, captured across five data modalities. It comprises 315 sessions totaling 19.8 hours and 31,185 samples.



Expert Demonstrations

Caregiving Tasks

Choose a task to visualize:


Occupational Therapists

Choose an OT to visualize:



Abstract

We present OpenRoboCare, a multi-modal dataset for robot-assisted caregiving, capturing expert occupational therapist demonstrations of Activities of Daily Living (ADLs).

Caregiving tasks involve complex physical human-robot interactions, requiring precise perception under occlusions, safe physical contact, and long-horizon planning. While recent advances in robot learning from demonstrations have shown promise, there remains a lack of large-scale, diverse, and expert-driven datasets that capture real-world caregiving routines.

To address this gap, we collected data from 21 occupational therapists performing 15 ADL tasks on two manikins. The dataset spans five modalities (RGBD video, pose tracking, eye-gaze tracking, task and action annotations, and tactile sensing), providing rich multi-modal insights into caregiver movement, attention, force application, and task execution strategies. We further analyze expert caregiving principles and strategies, offering insights that can improve robot efficiency and task feasibility. Additionally, our evaluations demonstrate that OpenRoboCare poses challenges for state-of-the-art robot perception and human activity recognition methods, both of which are critical for developing safe and adaptive assistive robots, highlighting the value of our contribution.
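As a concrete illustration of how these five streams might be consumed together, the sketch below enumerates one session and loads each modality. The directory layout, file names, and array shapes are hypothetical assumptions for illustration, not the dataset's actual release format.

```python
# Minimal sketch of reading one OpenRoboCare session across the five modalities.
# The directory layout, file names, and shapes below are hypothetical placeholders.
from pathlib import Path
import json

import numpy as np

ROOT = Path("openrobocare")           # hypothetical dataset root
OT_ID, TASK = "ot_01", "bed_bathing"  # 21 OTs x 15 tasks = 315 sessions

session = ROOT / OT_ID / TASK

# RGBD video: per-frame color and depth arrays (hypothetical .npz container).
rgbd = np.load(session / "rgbd.npz")
color, depth = rgbd["color"], rgbd["depth"]        # (T, H, W, 3), (T, H, W)

# Caregiver pose tracking: per-frame 3D joint positions.
pose = np.load(session / "pose.npy")               # (T, num_joints, 3)

# Eye-gaze tracking: per-frame gaze direction or gaze point.
gaze = np.load(session / "gaze.npy")               # (T, 2) or (T, 3)

# Tactile sensing: per-frame pressure readings from the custom skin.
tactile = np.load(session / "tactile.npy")         # (T, num_taxels)

# Task and action annotations: labeled segments with start/end times.
with open(session / "annotations.json") as f:
    actions = json.load(f)                         # [{"label", "start", "end"}, ...]

print(f"{OT_ID}/{TASK}: {len(color)} frames, {len(actions)} annotated actions")
```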



Video

OpenRoboCare


Visualizing in RCareWorld


Tactile Skin Activation and Eye Gaze

Choose a task to visualize the tactile skin activation and eye gaze as the OT performs the task:



Comparing To Existing Datasets

Previous works in the rehabilitation and public health literature have collected survey- and interview-based data on caregiving. These efforts focus on the health, social, and financial aspects of caregiving rather than the physical process of caregiving that we consider here. The works closest to ours either do not collect data from expert caregivers [6] or lack multimodality [4, 5]. To the best of our knowledge, OpenRoboCare is the first multi-task, multi-modal expert caregiving dataset.

Comparison table.



Data Collection Setup

Left: setup of sensors and equipment. Center: assistive devices used by caregivers. Right: sequence of tasks performed by each caregiver.

Data collection setup.


Tactile Skin Design

We develop a custom tactile skin to fit the manikins and record physical interactions between the caregiver and the manikin. The sensor design is guided by three key considerations: (1) customizability to accommodate various manikin body shapes and sizes, (2) flexibility to ensure secure attachment to curved surfaces, and (3) durability to withstand pressure exerted by the manikin’s weight.
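As an illustration of how readings from such a skin could be inspected, the sketch below treats one tactile frame as a flat array of taxel pressures and summarizes where contact is concentrated. The taxel count, pressure units, and threshold are illustrative assumptions, not the sensor's actual specification.

```python
# Sketch: summarizing a single tactile-skin frame.
# Taxel count, units, and threshold are illustrative assumptions.
import numpy as np

NUM_TAXELS = 256          # hypothetical number of taxels on one manikin
CONTACT_THRESHOLD = 0.05  # hypothetical activation threshold (arbitrary units)

rng = np.random.default_rng(0)
frame = rng.random(NUM_TAXELS) * 0.1   # stand-in for one recorded frame

active = frame > CONTACT_THRESHOLD
print(f"{active.sum()} / {NUM_TAXELS} taxels in contact")
print(f"peak pressure at taxel {frame.argmax()} = {frame.max():.3f}")
print(f"mean pressure over active taxels = {frame[active].mean():.3f}")
```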

Tactile skin design.


Insights from Human Caregiving


Throughout data collection, we observe various techniques that OTs use to perform tasks more efficiently while minimizing physical effort. We collaborate with an experienced OT to analyze the observed techniques and distill their underlying principles that can guide robot design for caregiving tasks.

Principles

Principle 1: OTs prioritize safety by carefully preparing the care recipient before initiating a task. They ensure that the care recipient’s posture, stability, joint angles, and supporting surfaces are appropriate for task execution.

Principle 2: OTs anticipate and organize their body mechanics to support the entire task sequence, particularly for large-scale movements. They anticipate both the final position and the trajectory of the care recipient’s body and limbs, which influences task execution decisions.

Principle 3: OTs prioritize accuracy and timely completion of tasks to ensure efficiency. Care recipients with severe mobility limitations often have medical conditions, making efficient ADL execution crucial.


Techniques

Technique 1: Bridge Strategy — bending the care recipient’s knees and applying pressure behind the knees at the top of the calf to momentarily elevate the pelvis.

Technique 2: Segmental Roll — gradually turning the care recipient’s body. The OT bends the care recipient’s opposite-side knee and applies pressure on the bent knee to initiate a progressive rolling motion toward the OT.

Technique 3: Stabilizing Key Points of Control — using the pelvic bone, shoulders, and head as the primary points of control. OTs place their hands on these key control points, such as the scapula and pelvis, to initiate, support, and control movement.



Dataset Analysis

Caregiver Trajectory Across Tasks

We visualize the top-down view of caregiver head trajectories. Caregiver trajectories for transfer tasks involve more complex movements compared to other tasks. Caregivers take different approaches for tasks like bathing and dressing, with some approaching the manikin from one side and others from both sides. This diversity in strategy highlights the importance of a comprehensive expert demonstration dataset.
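A top-down plot of this kind can be produced directly from the pose-tracking stream by projecting head positions onto the ground plane. The sketch below uses a synthetic stand-in trajectory; the array shape and the assumption that the vertical axis is the third coordinate are illustrative.

```python
# Sketch: top-down (bird's-eye) view of a caregiver head trajectory.
# The head-position array and its axis convention (third axis = up) are assumptions.
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for head positions from the pose-tracking modality, shape (T, 3).
t = np.linspace(0, 2 * np.pi, 500)
head_xyz = np.stack(
    [1.5 * np.cos(t), 0.8 * np.sin(2 * t), 1.6 + 0.02 * np.sin(5 * t)], axis=1
)

x, y = head_xyz[:, 0], head_xyz[:, 1]   # drop the vertical axis for a top-down view

plt.figure(figsize=(4, 4))
plt.plot(x, y, lw=1)
plt.scatter(x[0], y[0], c="green", label="start")
plt.scatter(x[-1], y[-1], c="red", label="end")
plt.gca().set_aspect("equal")
plt.xlabel("x (m)")
plt.ylabel("y (m)")
plt.title("Caregiver head trajectory (top-down)")
plt.legend()
plt.show()
```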


Caregiver Hand Workspace

We visualize the global workspace of the caregivers' hands relative to the hospital bed and assistive wheelchair to provide insights into the robot workspace needed for caregiving tasks.
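One simple way to quantify such a workspace is to take the extent, or convex hull, of all recorded hand positions expressed in the bed frame. The sketch below does this with SciPy's ConvexHull; the hand-position array is a random placeholder for real pose-tracking data.

```python
# Sketch: estimating the caregiver hand workspace from pose tracking.
# The hand-position array below is a random placeholder for real data.
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(1)
hand_xyz = rng.normal(loc=[0.0, 0.4, 0.9], scale=[0.35, 0.25, 0.20], size=(5000, 3))

# Axis-aligned bounding box of the workspace (in the bed frame, meters).
lo, hi = hand_xyz.min(axis=0), hand_xyz.max(axis=0)
print("bounding box extent (m):", np.round(hi - lo, 2))

# Convex-hull volume as a tighter summary of the reachable region.
hull = ConvexHull(hand_xyz)
print(f"convex hull volume: {hull.volume:.3f} m^3")
```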


General Data Statistics

Distribution of data across tasks, location, manikin, and task duration.





Occupational Therapist Strategies

Distribution of occupational therapist strategies across tasks.





Physical Contact

Physical contact distribution across tasks and body regions.

Force

Differences in force magnitude across tasks and during a bathing task.
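Differences like these can be computed by aggregating the tactile stream per task. A minimal sketch, assuming each session provides a frames-by-taxels pressure array in arbitrary units (the arrays below are synthetic stand-ins):

```python
# Sketch: comparing force magnitude across tasks from tactile recordings.
# Session arrays and units below are illustrative stand-ins for real data.
import numpy as np

rng = np.random.default_rng(2)
sessions = {
    "bed_bathing": rng.random((3000, 256)) * 0.4,
    "dressing":    rng.random((2500, 256)) * 0.2,
    "transfer":    rng.random((1500, 256)) * 0.9,
}

for task, frames in sessions.items():
    per_frame_total = frames.sum(axis=1)       # total pressure per frame
    print(f"{task:12s} mean={per_frame_total.mean():7.1f} "
          f"peak={per_frame_total.max():7.1f} (arbitrary units)")
```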





Qualitative Evaluations

Video Captioning

We run VidChapters-7M on a subset of the RGB videos to demonstrate that OpenRoboCare poses significant challenges for off-the-shelf state-of-the-art video captioning models. Compared to the ground-truth action labels annotated by occupational therapists, the captions predicted by VidChapters-7M are noticeably sparser and lack fine-grained detail. Furthermore, the model struggles with domain-specific terminology and context, evident in cases where it mislabels key caregiving actions. For example, when the caregiver aligns the Hoyer lift with the bed, the model inaccurately captions the activity as "positioning the foyer."
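To make this kind of comparison concrete, one simple way to score predicted chapters against OT-annotated action segments is best temporal intersection-over-union. The sketch below uses made-up segments and labels, not actual model outputs or dataset annotations.

```python
# Sketch: scoring predicted video chapters against ground-truth action segments
# by best temporal IoU. All segments and labels below are made-up examples.

def temporal_iou(a, b):
    """IoU of two (start, end) intervals in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

# Ground-truth action segments annotated by OTs: (label, start, end).
ground_truth = [
    ("align Hoyer lift with bed", 0.0, 35.0),
    ("attach sling and raise care recipient", 35.0, 110.0),
]

# Sparser predicted chapters from a captioning model.
predictions = [
    ("positioning the foyer", 0.0, 90.0),
]

for label, gs, ge in ground_truth:
    best = max((temporal_iou((gs, ge), (ps, pe)), p) for p, ps, pe in predictions)
    print(f"{label!r}: best IoU {best[0]:.2f} with predicted chapter {best[1]!r}")
```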