Data Format Reference
HDF5 input format
The application reads accelerometry data from HDF5 files (.h5). Each file must contain a readings table with the following columns:
| Column | Type | Description |
|---|---|---|
| timestamp | datetime64 | Measurement timestamp |
| x | float | Acceleration along the x-axis |
| y | float | Acceleration along the y-axis |
| z | float | Acceleration along the z-axis |
The readings table is stored as a PyTables (HDF5) table, which allows efficient time-range queries without loading the entire file into memory.
File placement
Place HDF5 files in:
visualize_accelerometry/data/readings/
Any file with an .h5 extension in this directory will be discovered by the application.
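As a rough illustration of this discovery rule, the sketch below lists the .h5 files found directly inside a directory. The function name discover_h5_files is hypothetical, and the choices to skip subdirectories and to sort the result are assumptions; the application's own discovery code may differ.

```python
from pathlib import Path


def discover_h5_files(directory):
    """Return the sorted names of .h5 files directly inside `directory`.

    Hypothetical helper: non-recursive matching and sorting are assumptions,
    not confirmed behaviour of the application.
    """
    root = Path(directory)
    if not root.is_dir():
        # A missing directory simply yields no files.
        return []
    return sorted(p.name for p in root.glob("*.h5") if p.is_file())
```

For example, discover_h5_files("visualize_accelerometry/data/readings") would return the filenames that appear in the file picker under these assumptions.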
File naming
There are no strict naming requirements beyond the .h5 extension. However, the application displays filenames in the file picker dropdown, so descriptive names are recommended. A common convention is:
participant_id-date.h5
File assignment
Files are distributed across annotators using a deterministic shuffle with a fixed seed. This ensures that:
- Each annotator sees a consistent set of files across sessions
- The workload is evenly distributed
- Two annotators never see the same assignment order (reducing redundant work)
The assignment is computed at startup based on the list of available files and registered users. Admins can see all files and can impersonate other users to view their assignments.
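One way such a deterministic assignment could work is sketched below: shuffle the file list with a seeded random generator, then deal the files out round-robin. This is an assumption-laden illustration, not the application's actual algorithm. The function name assign_files, the seed value 42, and the round-robin split are all hypothetical.

```python
import random


def assign_files(files, users, seed=42):
    """Deterministically split `files` among `users` (hypothetical sketch).

    Sorting both inputs first makes the result independent of the order in
    which files and users were discovered; the fixed seed makes the shuffle
    reproducible across sessions.
    """
    ordered_users = sorted(users)
    if not ordered_users:
        return {}
    shuffled = sorted(files)
    random.Random(seed).shuffle(shuffled)

    assignments = {user: [] for user in ordered_users}
    for i, fname in enumerate(shuffled):
        # Round-robin dealing keeps per-user counts within one of each other.
        assignments[ordered_users[i % len(ordered_users)]].append(fname)
    return assignments
```

Because the shuffle depends only on the seed and the sorted inputs, restarting the application with the same files and users reproduces the same assignment. Adding or removing a file or user changes the result.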
Annotation output format
Annotations are saved as Excel files (.xlsx) in:
visualize_accelerometry/data/output/
Each user’s annotations are stored in a separate file named annotations_{username}.xlsx. Clicking Export in the toolbar writes the current user’s complete annotation set to this file.
Column schema
The annotation DataFrame uses the following columns, defined in config.py as ANNOTATION_COLUMNS:
| Column | Type | Description |
|---|---|---|
| fname | string | Source HDF5 filename that was annotated |
|  | string | Activity type: |
|  | bool |  |
|  | bool |  |
|  | bool |  |
| start_epoch | float | Start time as Unix epoch (seconds since 1970-01-01) |
| end_epoch | float | End time as Unix epoch (seconds since 1970-01-01) |
|  | string | Human-readable start time (e.g., |
|  | string | Human-readable end time |
|  | string | Timestamp when the annotation was created or last modified |
|  | string | Username of the annotator who created the annotation |
|  | string | Free-text notes (e.g., |
The subset of columns displayed in the in-app data table is defined by DISPLAYED_ANNOTATION_COLUMNS and omits fname, start_epoch, and end_epoch.
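The relationship between the epoch columns and their human-readable counterparts can be illustrated with the standard library. The helper name epoch_to_text, the output format string, and the use of UTC are assumptions made for this sketch; the application may use a different format or time zone.

```python
from datetime import datetime, timezone


def epoch_to_text(epoch_seconds, fmt="%Y-%m-%d %H:%M:%S"):
    """Format a Unix epoch (seconds since 1970-01-01) as text.

    Hypothetical helper: the format string and the UTC time zone are
    assumptions, not the application's documented behaviour.
    """
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime(fmt)


print(epoch_to_text(0.0))  # 1970-01-01 00:00:00
```

Storing the float epoch alongside the text form keeps the annotations easy to sort and compare numerically, while the text form stays readable in the exported spreadsheet.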
Converting from CSV to HDF5
If your accelerometry data is in CSV format, you can convert it to the required HDF5 format using pandas:
```python
import pandas as pd

# Read the CSV file, parsing the timestamp column as datetime64
df = pd.read_csv("recording.csv", parse_dates=["timestamp"])

# Ensure the expected columns exist
assert {"timestamp", "x", "y", "z"}.issubset(df.columns)

# Write to HDF5 in PyTables format (requires the PyTables package, `tables`)
df.to_hdf(
    "recording.h5",
    key="readings",
    format="table",     # use 'table' format for queryable storage
    data_columns=True,  # index all columns for fast time-range queries
)
```
The format="table" argument is important: it creates a PyTables table that supports efficient row-level queries, which the application uses to load only the visible time window rather than the entire file. The pandas default, format="fixed", does not support such queries.
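To check that a converted file supports windowed reads, a time range can be selected with the where argument of pandas.read_hdf. The sketch below uses synthetic data and a temporary file, both of which are illustrative assumptions. It shows one way to query such a table and is not necessarily how the application does it. It requires the PyTables package (tables).

```python
import os
import tempfile

import pandas as pd

# Synthetic recording for illustration: one sample per second for a minute.
df = pd.DataFrame(
    {
        "timestamp": pd.date_range("2024-01-01 00:00:00", periods=60, freq="s"),
        "x": 0.0,
        "y": 0.0,
        "z": 1.0,
    }
)

path = os.path.join(tempfile.mkdtemp(), "example.h5")
df.to_hdf(path, key="readings", format="table", data_columns=True)

# Select a 10-second window; the filter is evaluated against the table on
# disk, so only the matching rows are returned.
start = pd.Timestamp("2024-01-01 00:00:10")
end = pd.Timestamp("2024-01-01 00:00:20")
window = pd.read_hdf(
    path,
    key="readings",
    where=f"(timestamp >= '{start}') & (timestamp < '{end}')",
)
```

The same call against a file written with format="fixed" raises an error, because fixed-format stores cannot be filtered with where.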