Data Format Reference
HDF5 input format
The app reads accelerometry data from HDF5 files (.h5). Each file must contain a readings table with these columns:
| Column | Type | Description |
|---|---|---|
| timestamp | datetime64 | Measurement timestamp |
| x | float | Acceleration along the x-axis |
| y | float | Acceleration along the y-axis |
| z | float | Acceleration along the z-axis |
The readings table is stored as a PyTables (HDF5) table so the app can query a time window without loading the whole file.
File placement
Place HDF5 files in:
visualize_accelerometry/data/readings/
Any file with an .h5 extension in this directory will be discovered by the application.
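The discovery rule above can be sketched in a few lines. This is only an illustration of "every .h5 file in the directory", not the application's actual code; the function name `discover_h5_files` is hypothetical, and the sketch assumes a case-sensitive match on the extension and no recursion into subdirectories.

```python
from pathlib import Path


def discover_h5_files(readings_dir):
    """Return the sorted names of all .h5 files directly inside readings_dir.

    Hypothetical helper: the real app may discover files differently.
    """
    directory = Path(readings_dir)
    # Compare the final suffix so that notes.txt or old.h5.bak are skipped.
    return sorted(
        p.name for p in directory.iterdir() if p.is_file() and p.suffix == ".h5"
    )
```

Calling `discover_h5_files("visualize_accelerometry/data/readings")` would list the files that should appear in the file picker under these assumptions.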
File naming
Any .h5 filename works. The file picker dropdown shows filenames as-is, so use something descriptive. A common pattern:
participant_id-date.h5
File assignment
Files are distributed across annotators with a deterministic shuffle (fixed seed), so each user gets the same assignment in every session and the workload stays balanced. Different annotators see different assignment orders, which reduces redundant work.
The assignment is computed at startup from the file list and the registered users. Admins see all files and can impersonate another user to view that user's assignments.
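One way to get a reproducible, balanced assignment is a seeded shuffle followed by round-robin dealing. The sketch below illustrates that idea only; it is not taken from the application, and the function name `assign_files`, the default seed, and the round-robin step are all assumptions.

```python
import random


def assign_files(filenames, usernames, seed=0):
    """Deal filenames out to users round-robin after a seeded shuffle.

    Hypothetical sketch: the app's real assignment logic may differ.
    """
    users = sorted(usernames)
    if not users:
        return {}
    # Sort first so the result does not depend on directory listing order.
    shuffled = sorted(filenames)
    # A private Random instance keeps the shuffle reproducible and leaves
    # the global random state untouched.
    random.Random(seed).shuffle(shuffled)
    assignment = {user: [] for user in users}
    for i, fname in enumerate(shuffled):
        assignment[users[i % len(users)]].append(fname)
    return assignment
```

Because both inputs are sorted and the seed is fixed, the same file list and user list give the same mapping on every startup, and no user receives more than one file beyond any other.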
Annotation output format
Annotations are saved as Excel files (.xlsx) in:
visualize_accelerometry/data/output/
Each user’s annotations are stored in a separate file named annotations_{username}.xlsx. Clicking Export in the toolbar writes the current user’s complete annotation set to this file.
Column schema
The annotation DataFrame uses the following columns, defined in config.py as ANNOTATION_COLUMNS:
| Column | Type | Description |
|---|---|---|
| fname | string | Source HDF5 filename that was annotated |
| | string | Activity type: |
| | bool | |
| | bool | |
| | bool | |
| start_epoch | float | Start time as Unix epoch (seconds since 1970-01-01) |
| end_epoch | float | End time as Unix epoch (seconds since 1970-01-01) |
| | string | Human-readable start time (e.g., |
| | string | Human-readable end time |
| | string | Timestamp when the annotation was created or last modified |
| | string | Username of the annotator who created the annotation |
| | string | Free-text notes (e.g., |
The subset of columns displayed in the in-app data table is defined by DISPLAYED_ANNOTATION_COLUMNS and omits fname, start_epoch, and end_epoch.
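When post-processing exported annotations, the start_epoch and end_epoch values can be turned back into datetimes and a duration. The sketch below assumes the epoch values are in UTC, which this reference does not state; the helper name `describe_interval` is hypothetical.

```python
from datetime import datetime, timezone


def describe_interval(start_epoch, end_epoch):
    """Convert an annotation's epoch bounds to datetimes plus a duration.

    Assumes the epochs are seconds since 1970-01-01 in UTC (an assumption,
    not something the annotation schema guarantees).
    """
    if end_epoch < start_epoch:
        raise ValueError("end_epoch must not precede start_epoch")
    start = datetime.fromtimestamp(start_epoch, tz=timezone.utc)
    end = datetime.fromtimestamp(end_epoch, tz=timezone.utc)
    # Duration in seconds, computed from the raw floats.
    return start, end, end_epoch - start_epoch
```

If the recordings use local time instead, replace `timezone.utc` with the appropriate time zone.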
Converting from CSV to HDF5
If your accelerometry data is in CSV format, you can convert it to the required HDF5 format using pandas:
import pandas as pd

# Read the CSV file
df = pd.read_csv("recording.csv", parse_dates=["timestamp"])

# Ensure the expected columns exist
assert {"timestamp", "x", "y", "z"}.issubset(df.columns)

# Write to HDF5 in PyTables format
df.to_hdf(
    "recording.h5",
    key="readings",
    format="table",     # use 'table' format for queryable storage
    data_columns=True,  # index all columns for fast time-range queries
)
The format="table" argument matters. It creates a PyTables table that supports row-level queries, which the app needs to load just the visible time window.
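To check that a converted file supports this kind of query, a time window can be read back with the `where` argument of `pandas.read_hdf`. This is a minimal sketch of such a query, not the app's own loading code; the function name `read_window` is hypothetical, and it assumes PyTables is installed and that timestamp was written as a data column, as in the conversion above.

```python
import pandas as pd


def read_window(path, start, end, key="readings"):
    """Load only the rows whose timestamp lies in [start, end).

    Hypothetical helper; requires a file written with format="table"
    and timestamp stored as a queryable data column.
    """
    t0, t1 = pd.Timestamp(start), pd.Timestamp(end)
    # The condition is evaluated by PyTables against the file on disk,
    # so rows outside the window are never loaded into memory.
    query = f"timestamp >= '{t0}' & timestamp < '{t1}'"
    return pd.read_hdf(path, key=key, where=query)
```

For example, `read_window("recording.h5", "2024-01-01 00:00:00", "2024-01-01 00:05:00")` would return the first five minutes of a recording that starts at that (illustrative) date. On a file written with the default fixed format, the same call raises an error instead of returning rows.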