Kwik format
Cortex Lab, University College London
Cyrille Rossant
Overview
- Description of the format
- Performance considerations
- Analysis and visualization framework
Einevoll 2012
Spike sorting
The Kwik format
3 HDF5 files
- .kwik (tens of MB): spike times, clustering, metadata, probe
- .kwx (tens of GB): features, masks, waveforms
- .kwd (tens of GB): raw data, filtered data
The .kwik file
# https://github.com/klusta-team/kwiklib/wiki/Kwik-format
/kwik_version* [=2]
/name*
/application_data
    spikedetekt
        MY_SPIKEDETEKT_PARAM*
        ...
/user_data
/channel_groups
    [X] # channel group index
        name*
        channel_order* # list of channels
        adjacency_graph* [Kx2 array of integers]
        application_data
        user_data
        channels
            [X] # Relative channel index from 0 to shanksize-1
                name*
                ignored*
                position*
                voltage_gain*
                display_threshold*
                application_data
                    klustaviewa
                    spikedetekt
                user_data
        spikes
            time_samples* [N-long EArray of UInt64]
            time_fractional* [N-long EArray of UInt8]
            recording* [N-long EArray of UInt16]
            clusters
                main* [N-long EArray of UInt32]
                original* [N-long EArray of UInt32]
            features_masks
                hdf5_path* [='{kwx}/channel_groups/X/features_masks']
            waveforms_raw
                hdf5_path* [='{kwx}/channel_groups/X/waveforms_raw']
            waveforms_filtered
                hdf5_path* [='{kwx}/channel_groups/X/waveforms_filtered']
        clusters
            [clustering_name]
                [X] # Cluster number
                    application_data
                        klustaviewa
                            color*
                    cluster_group*
                    mean_waveform_raw*
                    mean_waveform_filtered*
                    quality_measures
                        isolation_distance*
                        matrix_isolation*
                        refractory_violation*
                        amplitude*
                    user_data
                        ...
        cluster_groups
            [clustering_name]
                [X] # Cluster group number
                    name*
                    application_data
                        klustaviewa
                            color*
                    user_data
/recordings
    [X] # Recording index from 0 to Nrecordings-1
        name*
        start_time*
        start_sample*
        sample_rate*
        bit_depth*
        band_high*
        band_low*
        raw
            hdf5_path* [='{raw.kwd}/recordings/X']
        high
            hdf5_path* [='{high.kwd}/recordings/X']
        low
            hdf5_path* [='{low.kwd}/recordings/X']
        user_data
/event_types
    [X] # The name of the event type.
        user_data
        application_data
            klustaviewa
                color*
        events
            time_samples* [N-long EArray of UInt64]
            recording* [N-long EArray of UInt16]
            user_data [group or EArray]
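The hdf5_path entries in the .kwik file point into the companion .kwx and .kwd files. A minimal sketch of how such a link might be resolved is given here. The helper name resolve_hdf5_path is hypothetical, and two assumptions are made that the slides do not spell out: the {...} placeholder expands to "<basename>.<extension>", and the path component X stands for the group or recording index.

```python
import re


def resolve_hdf5_path(link, basename, index):
    """Split a Kwik-style link into (filename, path inside that file).

    Hypothetical helper: the expansion rules below are assumptions.
    """
    match = re.fullmatch(r"\{([^}]+)\}(/.*)", link)
    if match is None:
        raise ValueError(f"not an external link: {link!r}")
    extension, internal = match.groups()
    # Assumption: '{kwx}' means '<basename>.kwx', '{raw.kwd}' means '<basename>.raw.kwd'.
    filename = f"{basename}.{extension}"
    # Assumption: the component 'X' is replaced by the group/recording index.
    parts = [str(index) if part == "X" else part for part in internal.split("/")]
    return filename, "/".join(parts)


print(resolve_hdf5_path("{kwx}/channel_groups/X/features_masks", "experiment", 0))
print(resolve_hdf5_path("{raw.kwd}/recordings/X", "experiment", 0))
```

The returned pair could then be passed to any HDF5 library to open the file and look up the dataset.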
The .kwx file
/channel_groups
    [0]
        features_masks # [(N x NFEATURES x 2) EArray of Float32]
        waveforms_raw # [(N x NWAVESAMPLES x NCHANNELS) EArray of Int16]
        waveforms_filtered # [(N x NWAVESAMPLES x NCHANNELS) EArray of Int16]
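The features_masks array packs two values per spike and per feature along its last axis. The sketch below shows how it might be split with NumPy. It assumes that index 0 holds the feature and index 1 holds the mask, which the listing above does not state, and it uses a tiny in-memory array in place of the on-disk EArray.

```python
import numpy as np


def split_features_masks(features_masks):
    """Split an (N x NFEATURES x 2) array into features and masks."""
    # Assumption: last-axis index 0 is the feature value, index 1 is the mask.
    features = features_masks[:, :, 0]
    masks = features_masks[:, :, 1]
    return features, masks


# Tiny hypothetical array standing in for the on-disk EArray.
n_spikes, n_features = 4, 6
features_masks = np.zeros((n_spikes, n_features, 2), dtype=np.float32)
features_masks[:, :, 0] = np.arange(n_spikes * n_features).reshape(n_spikes, n_features)
features_masks[:, :3, 1] = 1.0  # mark the first three features as unmasked

features, masks = split_features_masks(features_masks)
print(features.shape, masks.shape)
```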
The .kwd files
# experiment.raw.kwd
/recordings
    [0]
        data # [(Nsamples x Nchannels) EArray of Int16]
        [and some metadata...]
# same layout for:
# experiment.high.kwd
# experiment.low.kwd
Probe format
channel_groups = {
    0: {
        # list of channels to keep
        "channels": [0, 1, 2, 3],
        # list of pairs of connected (nearby) channels
        "graph": [[0, 1], [2, 3], ...],
        "geometry": {
            0: [0.1, 0.2],
            1: [0.3, 0.4], ...}
    },
    1: {
        "channels": [4, 5, 6, 7],
        "graph": [[4, 5], [6, 7], ...],
        "geometry": {
            4: [0.1, 0.2],
            5: [0.3, 0.4], ...}
    }
}
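A probe file is a plain Python dictionary, so it can be sanity-checked before use. The sketch below is one possible check, not part of the format: the function names check_channel_group and neighbors are hypothetical, and the example group only uses the values that appear in the listing above (the elided entries are left out).

```python
def check_channel_group(group):
    """Return a list of problems found in one channel group of a probe."""
    channels = set(group["channels"])
    problems = []
    # Every edge of the adjacency graph should connect two kept channels.
    for a, b in group["graph"]:
        for channel in (a, b):
            if channel not in channels:
                problems.append(f"graph references unknown channel {channel}")
    # Every position should belong to a kept channel.
    for channel in group["geometry"]:
        if channel not in channels:
            problems.append(f"geometry references unknown channel {channel}")
    return problems


def neighbors(group):
    """Map each channel to the set of channels it is connected to."""
    result = {channel: set() for channel in group["channels"]}
    for a, b in group["graph"]:
        result[a].add(b)
        result[b].add(a)
    return result


group = {
    "channels": [0, 1, 2, 3],
    "graph": [[0, 1], [2, 3]],
    "geometry": {0: [0.1, 0.2], 1: [0.3, 0.4]},
}
print(check_channel_group(group))
print(neighbors(group))
```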
Limitations: memory
- nspikes = 10M
- nchannels = 1000
- sample_rate = 30 kHz
- duration = 1 hour
- raw data: 200 GB
- features_masks: 220 GB
- waveforms: 930 GB
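The sizes above can be roughly reproduced from the array shapes and data types of the format. The sketch below does the arithmetic under two assumptions that the slide does not state: 3 features per channel and 50 samples per waveform. Sizes are reported in GiB (2**30 bytes).

```python
GIB = 2 ** 30

nspikes = 10_000_000
nchannels = 1000
sample_rate = 30_000          # Hz
duration = 3600               # seconds (1 hour)
features_per_channel = 3      # assumption, not given on the slide
waveform_samples = 50         # assumption, not given on the slide


def storage_sizes():
    """Return the estimated size in bytes of each array."""
    return {
        # (Nsamples x Nchannels) Int16: 2 bytes per value
        "raw data": sample_rate * duration * nchannels * 2,
        # (N x NFEATURES x 2) Float32: 4 bytes per value
        "features_masks": nspikes * features_per_channel * nchannels * 2 * 4,
        # (N x NWAVESAMPLES x NCHANNELS) Int16: 2 bytes per value
        "waveforms": nspikes * waveform_samples * nchannels * 2,
    }


for name, size in storage_sizes().items():
    print(f"{name}: {size / GIB:.0f} GiB")
```

With these assumptions the estimates come out at about 201, 224 and 931 GiB, close to the rounded figures on the slide.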
Solutions
- Sparse arrays: not supported natively in HDF5, but they can be emulated using the masks
- Waveforms: do we need to store them, or can we recompute them on the fly from raw data?
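Recomputing waveforms on the fly amounts to slicing the raw data around each spike time. The sketch below illustrates the idea with NumPy on an in-memory array. The function name extract_waveforms and the window sizes n_before and n_after are hypothetical, and spikes near the edges of the recording are zero-padded, which is one possible choice among others.

```python
import numpy as np


def extract_waveforms(raw, spike_samples, n_before=10, n_after=10):
    """Cut an (N x NWAVESAMPLES x NCHANNELS) array out of raw data.

    raw is an (Nsamples x Nchannels) array, spike_samples a sequence of
    spike times in samples. Window sizes are hypothetical defaults.
    """
    n_samples, n_channels = raw.shape
    n_wave = n_before + n_after
    waveforms = np.zeros((len(spike_samples), n_wave, n_channels), dtype=raw.dtype)
    for i, t in enumerate(spike_samples):
        t = int(t)
        start, stop = t - n_before, t + n_after
        # Clip the window to the recording; the out-of-range part stays zero.
        lo, hi = max(start, 0), min(stop, n_samples)
        waveforms[i, lo - start:hi - start] = raw[lo:hi]
    return waveforms


# Tiny hypothetical recording: 100 samples on 2 channels.
raw = np.arange(200, dtype=np.int16).reshape(100, 2)
waveforms = extract_waveforms(raw, [50, 1], n_before=2, n_after=3)
print(waveforms.shape)
```

In a real dataset the same slicing would be applied to the data array of a .kwd file, so that only the samples around each spike are read from disk.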
Limitations: performance
- Random access is slow on HDDs: hard drives are the limiting factor (SSDs are much faster)
- Seek time: ~10 ms
- Transfer rate: ~100 MB/s
- Loading features from the disk for all spikes in a given cluster is slow
- Solution: load the whole array, or big chunks of it, into memory
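A back-of-the-envelope model shows why reading the whole array can beat random access. The sketch below uses the seek time and transfer rate quoted above. The dataset sizes (1 million spikes, 96 features, a cluster of 10,000 spikes) are hypothetical, and the model assumes the worst case of one disk seek per spike with no caching.

```python
SEEK_TIME = 0.010        # seconds, from the slide (~10 ms)
TRANSFER_RATE = 100e6    # bytes per second, from the slide (~100 MB/s)


def random_access_time(n_reads, bytes_per_read):
    """Time to read n_reads scattered items, one seek each (worst case)."""
    return n_reads * (SEEK_TIME + bytes_per_read / TRANSFER_RATE)


def sequential_read_time(total_bytes):
    """Time to read one contiguous block after a single seek."""
    return SEEK_TIME + total_bytes / TRANSFER_RATE


# Hypothetical dataset: 1M spikes, 96 features, (feature, mask) pairs of Float32.
n_spikes = 1_000_000
bytes_per_spike = 96 * 2 * 4
cluster_size = 10_000

scattered = random_access_time(cluster_size, bytes_per_spike)
whole_array = sequential_read_time(n_spikes * bytes_per_spike)
print(f"one cluster, random access: {scattered:.1f} s")
print(f"whole array, sequential:    {whole_array:.1f} s")
```

Under these assumptions, fetching 1% of the spikes one by one takes on the order of 100 s, while reading the entire array sequentially takes under 10 s.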
[Figure labels: spikes, features]
KlustaViewa
DEMO
KlustaViewa: limitations
- Installation may be difficult
- Local installation of software and data required (no remote work, hardly scalable to huge datasets)
Future major release
IPython notebook
DEMO
Hardware-accelerated visualization with the graphics card (GPU)
Conclusion
- Kwik format for storing large-scale electrophysiological data
- Memory and performance improvements
- Distributed analysis with the IPython notebook
- Remote visualization in the notebook with Vispy and WebGL