Data Format Reference

HDF5 input format

The app reads accelerometry data from HDF5 files (.h5). Each file must contain a readings table with these columns:

Column	Type	Description
`timestamp`	datetime64	Measurement timestamp
`x`	float	Acceleration along the x-axis
`y`	float	Acceleration along the y-axis
`z`	float	Acceleration along the z-axis

The readings table is stored as a PyTables (HDF5) table so the app can query a time window without loading the whole file.

File placement

Place HDF5 files in:

visualize_accelerometry/data/readings/

The application discovers every file in this directory that has an .h5 extension.
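As a rough illustration of this discovery step, the sketch below lists the .h5 files in a directory. The function name discover_h5_files, the non-recursive search, and the sorted result are assumptions made for the example and may differ from what the app does.

```python
from __future__ import annotations

from pathlib import Path

# Directory taken from the path given above.
READINGS_DIR = Path("visualize_accelerometry/data/readings")


def discover_h5_files(directory: Path = READINGS_DIR) -> list[str]:
    """Return the sorted names of all .h5 files directly inside `directory`.

    Hypothetical helper: the search is not recursive, and a missing
    directory yields an empty list instead of an error.
    """
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob("*.h5") if p.is_file())
```

Sorting the names keeps the list stable across runs, which is convenient for any step that depends on file order.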

File naming

Any .h5 filename works. The file picker dropdown shows filenames as-is, so use something descriptive. A common pattern:

participant_id-date.h5

File assignment

Files are spread across annotators with a deterministic shuffle (fixed seed). Because the seed is fixed, each user gets the same assignment in every session, and the workload is balanced across users. Different annotators receive the files in different orders, which reduces redundant work.

The assignment runs at startup from the file list and registered users. Admins see all files and can impersonate another user to view their assignments.
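One possible reading of this scheme is a seeded shuffle followed by round-robin dealing, sketched below. The function name assign_files, the default seed, and the round-robin step are assumptions for illustration, not the app's actual code.

```python
import random


def assign_files(files, users, seed=0):
    """Deal a seeded shuffle of `files` round-robin to `users`.

    Hypothetical sketch: the same inputs and seed always give the same
    mapping, and per-user counts differ by at most one.
    """
    ordered_users = sorted(users)          # stable user order across runs
    if not ordered_users:
        return {}
    shuffled = sorted(files)               # sort first so input order is irrelevant
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a fixed permutation
    assignment = {user: [] for user in ordered_users}
    for i, fname in enumerate(shuffled):
        assignment[ordered_users[i % len(ordered_users)]].append(fname)
    return assignment
```

Using a private random.Random instance keeps the shuffle independent of any other code that touches the global random state.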

Annotation output format

Annotations are saved as Excel files (.xlsx) in:

visualize_accelerometry/data/output/

Each user’s annotations are stored in a separate file named annotations_{username}.xlsx. Clicking Export in the toolbar writes the current user’s complete annotation set to this file.

Column schema

The annotation DataFrame uses the following columns, defined in config.py as ANNOTATION_COLUMNS:

Column	Type	Description
`fname`	string	Source HDF5 filename that was annotated
`artifact`	string	Activity type: `"chairstand"`, `"tug"`, `"3mw"`, or `"6mw"`
`segment`	bool	`True` if this annotation marks an individual repetition segment
`scoring`	bool	`True` if this segment was selected for frailty assessment scoring
`review`	bool	`True` if this annotation is flagged for peer review
`start_epoch`	float	Start time as Unix epoch (seconds since 1970-01-01)
`end_epoch`	float	End time as Unix epoch (seconds since 1970-01-01)
`start_time`	string	Human-readable start time (e.g., `"Nov 08 2021 11:39 AM"`)
`end_time`	string	Human-readable end time
`annotated_at`	string	Timestamp when the annotation was created or last modified
`user`	string	Username of the annotator who created the annotation
`notes`	string	Free-text notes (e.g., `"uncertain boundary"`, `"possible artifact"`)

The subset of columns displayed in the in-app data table is defined by DISPLAYED_ANNOTATION_COLUMNS and omits fname, start_epoch, and end_epoch.
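To make the schema concrete, here is a minimal sketch that builds one annotation row as a plain dictionary. The column lists are local copies of the table above, not imports from config.py. The helpers format_epoch and make_annotation, the use of UTC, and the ISO format for annotated_at are assumptions for the example.

```python
from datetime import datetime, timezone

# Local copy of the columns listed above; the app defines its own in config.py.
ANNOTATION_COLUMNS = [
    "fname", "artifact", "segment", "scoring", "review",
    "start_epoch", "end_epoch", "start_time", "end_time",
    "annotated_at", "user", "notes",
]
HIDDEN = {"fname", "start_epoch", "end_epoch"}
DISPLAYED_ANNOTATION_COLUMNS = [c for c in ANNOTATION_COLUMNS if c not in HIDDEN]

VALID_ARTIFACTS = {"chairstand", "tug", "3mw", "6mw"}
TIME_FORMAT = "%b %d %Y %I:%M %p"  # matches "Nov 08 2021 11:39 AM"


def format_epoch(epoch):
    """Render epoch seconds as a readable string (UTC is an assumption)."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime(TIME_FORMAT)


def make_annotation(fname, artifact, start_epoch, end_epoch, user, *,
                    segment=False, scoring=False, review=False, notes=""):
    """Build one row whose keys follow ANNOTATION_COLUMNS (hypothetical helper)."""
    if artifact not in VALID_ARTIFACTS:
        raise ValueError(f"unknown artifact: {artifact!r}")
    if end_epoch <= start_epoch:
        raise ValueError("end_epoch must be later than start_epoch")
    values = {
        "fname": fname, "artifact": artifact,
        "segment": segment, "scoring": scoring, "review": review,
        "start_epoch": float(start_epoch), "end_epoch": float(end_epoch),
        "start_time": format_epoch(start_epoch),
        "end_time": format_epoch(end_epoch),
        # ISO timestamp is an assumption; the app's own format is not documented here.
        "annotated_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "user": user, "notes": notes,
    }
    return {col: values[col] for col in ANNOTATION_COLUMNS}
```

A list of such rows can be passed to pandas.DataFrame(rows, columns=ANNOTATION_COLUMNS) to obtain a frame with the column order shown in the table.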

Converting from CSV to HDF5

If your accelerometry data is in CSV format, you can convert it to the required HDF5 format using pandas:

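import pandas as pd

# Read the CSV file, parsing the timestamp column as datetime64
df = pd.read_csv("recording.csv", parse_dates=["timestamp"])

# Fail early with a clear message if a required column is missing
missing = {"timestamp", "x", "y", "z"} - set(df.columns)
if missing:
    raise ValueError(f"CSV is missing required columns: {sorted(missing)}")

# Write to HDF5 in PyTables format (requires the `tables` package)
df.to_hdf(
    "recording.h5",
    key="readings",
    mode="w",            # start from a fresh file instead of adding to an existing one
    format="table",      # use 'table' format for queryable storage
    data_columns=True,   # index all columns for fast time-range queries
)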

The format="table" argument matters: it creates a PyTables table that supports row-level queries, which the app needs to load only the visible time window. The pandas default, format="fixed", does not support such queries.
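To check that a converted file supports such windowed reads, a sketch like the following can help. The helper name load_window and the half-open [start, end) interval are assumptions for the example. It relies on pandas.read_hdf with a where expression, which requires the PyTables package and a file written with format="table" and timestamp stored as a data column.

```python
import pandas as pd


def load_window(path, start, end, key="readings"):
    """Read only the rows whose timestamp lies in [start, end).

    Hypothetical helper: the filter is applied by PyTables on disk, so the
    full table is never loaded into memory.
    """
    start_iso = pd.Timestamp(start).isoformat()
    end_iso = pd.Timestamp(end).isoformat()
    where = (
        f"timestamp >= Timestamp('{start_iso}') & "
        f"timestamp < Timestamp('{end_iso}')"
    )
    return pd.read_hdf(path, key=key, where=where)
```

For example, load_window("recording.h5", "2021-11-08 11:39:00", "2021-11-08 11:40:00") would return one minute of readings, provided the file contains data in that range.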