Kwik format

Cortex Lab, University College London

Cyrille Rossant

Overview

  • Description of the format
  • Performance considerations
  • Analysis and visualization framework

Spike sorting

[Figure: Einevoll 2012]

The Kwik format

3 HDF5 files:

  • .kwik (tens of MB): spike times, clustering, metadata, probe
  • .kwx (tens of GB): features, masks, waveforms
  • .kwd (tens of GB): raw data, filtered data

# https://github.com/klusta-team/kwiklib/wiki/Kwik-format

/kwik_version* [=2]
/name*
/application_data
    spikedetekt
        MY_SPIKEDETEKT_PARAM*
        ...
/user_data
/channel_groups
    [X]  # channel group index
        name*
        channel_order*  # list of channels
        adjacency_graph* [Kx2 array of integers]
        application_data
        user_data
        channels
            [X]  # Relative channel index from 0 to shanksize-1
                name*
                ignored*
                position*
                voltage_gain*
                display_threshold*
                application_data
                    klustaviewa
                    spikedetekt
                user_data
        spikes
            time_samples* [N-long EArray of UInt64]
            time_fractional* [N-long EArray of UInt8]
            recording* [N-long EArray of UInt16]
            clusters
                main* [N-long EArray of UInt32]
                original* [N-long EArray of UInt32]
            features_masks
                hdf5_path* [='{kwx}/channel_groups/X/features_masks']
            waveforms_raw
                hdf5_path* [='{kwx}/channel_groups/X/waveforms_raw']
            waveforms_filtered
                hdf5_path* [='{kwx}/channel_groups/X/waveforms_filtered']
        clusters
            [clustering_name]
                [X]  # Cluster number
                    application_data
                        klustaviewa
                            color*
                    cluster_group*
                    mean_waveform_raw*
                    mean_waveform_filtered*
                    quality_measures
                        isolation_distance*
                        matrix_isolation*
                        refractory_violation*
                        amplitude*
                    user_data
                        ...
        cluster_groups
            [clustering_name]
                [X]  # Cluster group number
                    name*
                    application_data
                        klustaviewa
                            color*
                    user_data
/recordings
    [X]  # Recording index from 0 to Nrecordings-1
        name*
        start_time*
        start_sample*
        sample_rate*
        bit_depth*
        band_high*
        band_low*
        raw
            hdf5_path* [='{raw.kwd}/recordings/X']
        high
            hdf5_path* [='{high.kwd}/recordings/X']
        low
            hdf5_path* [='{low.kwd}/recordings/X']
        user_data
/event_types
    [X]  # The name of the event type.
        user_data
        application_data
            klustaviewa
                color*
        events
            time_samples* [N-long EArray of UInt64]
            recording* [N-long EArray of UInt16]
            user_data [group or EArray]
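
To illustrate how the spike-time fields in this layout fit together, here is a minimal sketch that converts `time_samples` and `time_fractional` into seconds using a recording's `sample_rate`. It assumes, which the listing does not state, that `time_fractional` encodes the sub-sample offset in units of 1/256 of a sample. The function name `spike_times_seconds` and the sample values are hypothetical.

```python
def spike_times_seconds(time_samples, time_fractional, sample_rate):
    """Convert spike times to seconds.

    Assumption: time_fractional (UInt8) is the sub-sample offset in
    1/256ths of a sample. Check the format specification before
    relying on this.
    """
    return [(s + f / 256.0) / sample_rate
            for s, f in zip(time_samples, time_fractional)]


# Made-up values for illustration only.
times = spike_times_seconds([30000, 45000], [0, 128], sample_rate=30000.0)
print(times)
```

Keeping the integer sample index and the fractional part in separate arrays avoids losing precision on long recordings, which a single Float32 time stamp would do.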

The .kwik file

The .kwx file


/channel_groups
  [0]
    features_masks # [(N x NFEATURES x 2) EArray of Float32]
    waveforms_raw # [(N x NWAVESAMPLES x NCHANNELS) EArray of Int16]
    waveforms_filtered # [(N x NWAVESAMPLES x NCHANNELS) EArray of Int16]

The .kwd files

# experiment.raw.kwd
/recordings
  [0]
    data # [(Nsamples x Nchannels) EArray of Int16]
    [and some metadata...]

# same layout for:
# experiment.high.kwd
# experiment.low.kwd
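
The `hdf5_path` entries of the .kwik file (for example `'{raw.kwd}/recordings/X'`) point into these files. A minimal sketch of how such a link could be split into a file name and an internal HDF5 path is given below. It assumes that the `{...}` placeholder stands for the file sharing the .kwik file's base name; the helper `resolve_hdf5_path` is hypothetical and not part of any library.

```python
import os


def resolve_hdf5_path(kwik_filename, hdf5_path):
    """Split a link such as '{raw.kwd}/recordings/0' into (file, node).

    Assumption: '{ext}' refers to the file with the same base name as
    the .kwik file and the extension 'ext'.
    """
    if not hdf5_path.startswith("{"):
        raise ValueError("expected a '{ext}/...' link, got %r" % hdf5_path)
    ext, _, node = hdf5_path[1:].partition("}")
    base, _ = os.path.splitext(kwik_filename)
    return base + "." + ext, node


filename, node = resolve_hdf5_path("experiment.kwik", "{raw.kwd}/recordings/0")
print(filename, node)
```

The returned pair can then be handed to whichever HDF5 library is in use to open the file and read the node.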

Probe format

channel_groups = {
    0: {
        # list of channels to keep
        "channels": [0, 1, 2, 3],
        # list of pairs of connected (nearby) channels
        "graph": [[0, 1], [2, 3], ...],
        # position of each channel
        "geometry": {
            0: [0.1, 0.2],
            1: [0.3, 0.4], ...},
    },
    1: {
        "channels": [4, 5, 6, 7],
        "graph": [[4, 5], [6, 7], ...],
        "geometry": {
            4: [0.1, 0.2],
            5: [0.3, 0.4], ...},
    },
}
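
As a sketch of how the `graph` entry of a channel group might be used, the snippet below turns the list of connected pairs into a per-channel neighbor lookup. The helper `neighbors_from_graph` and the fully specified shank are hypothetical, since the example above elides some entries.

```python
def neighbors_from_graph(channels, graph):
    """Map each channel to the sorted list of channels it is connected to."""
    neighbors = {ch: set() for ch in channels}
    for a, b in graph:
        # The graph is undirected: record the edge in both directions.
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {ch: sorted(adj) for ch, adj in neighbors.items()}


# Hypothetical shank with every edge written out.
group = {
    "channels": [0, 1, 2, 3],
    "graph": [[0, 1], [1, 2], [2, 3]],
}
neighbors = neighbors_from_graph(group["channels"], group["graph"])
print(neighbors)
```

A pair that mentions a channel missing from `channels` raises a `KeyError` here, which doubles as a basic consistency check on the probe file.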

Limitations: memory

  • nspikes = 10M
  • nchannels = 1000
  • sample_rate = 30 kHz
  • duration = 1 hour
  • raw data: 200 GB
  • features_masks: 220 GB
  • waveforms: 930 GB
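
These orders of magnitude can be reproduced with a back-of-the-envelope calculation from the array shapes and data types of the format. The sketch below rests on assumptions that the slide does not state: 3 features per channel, 50 samples per waveform, a single waveform array, and sizes expressed in binary gigabytes.

```python
GIB = 2 ** 30  # assumption: the sizes are binary gigabytes

nspikes = 10_000_000
nchannels = 1000
sample_rate = 30_000   # Hz
duration = 3600        # seconds (1 hour)

# Assumptions, not given in the list above.
nfeatures_per_channel = 3
nwavesamples = 50

INT16, FLOAT32 = 2, 4  # bytes per value

# (Nsamples x Nchannels) Int16
raw_bytes = duration * sample_rate * nchannels * INT16
# (N x NFEATURES x 2) Float32
features_masks_bytes = nspikes * (nfeatures_per_channel * nchannels) * 2 * FLOAT32
# (N x NWAVESAMPLES x NCHANNELS) Int16
waveforms_bytes = nspikes * nwavesamples * nchannels * INT16

sizes_gib = {
    "raw data": raw_bytes / GIB,
    "features_masks": features_masks_bytes / GIB,
    "waveforms": waveforms_bytes / GIB,
}
for name, size in sizes_gib.items():
    print("%s: %.0f GiB" % (name, size))
```

Under these assumptions the three arrays come out at roughly 201, 224 and 931 GiB, in line with the figures listed.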

Solutions

  • Sparse arrays: HDF5 has no native support, but we can emulate them using masks
  • Waveforms: do we need to store them, or can we recompute them on the fly from raw data?
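
Both ideas can be sketched in a few lines of plain Python. The first function keeps only the features whose mask is non-zero. The second cuts a waveform out of the raw data around a spike instead of reading a stored copy. The names `to_sparse` and `extract_waveform`, the window sizes and the toy data are hypothetical. A real implementation would work on HDF5 arrays and would also need to re-apply the filtering step.

```python
def to_sparse(features, masks):
    """Keep only the features whose mask is non-zero.

    Returns (indices, values): for each spike, the positions of the
    unmasked features and the corresponding feature values.
    """
    indices, values = [], []
    for row, mask_row in zip(features, masks):
        keep = [j for j, m in enumerate(mask_row) if m > 0]
        indices.append(keep)
        values.append([row[j] for j in keep])
    return indices, values


def extract_waveform(raw, spike_sample, before, after):
    """Cut the window [spike_sample - before, spike_sample + after) out of
    raw data given as a list of samples, each a list of channel values.
    The window is truncated at the edges of the recording."""
    start = max(spike_sample - before, 0)
    stop = min(spike_sample + after, len(raw))
    return raw[start:stop]


# Toy data: 2 spikes x 4 features, and 10 samples x 2 channels.
features = [[0.5, 0.0, 0.0, 1.2], [0.0, 0.3, 0.0, 0.0]]
masks = [[1.0, 0.0, 0.0, 0.4], [0.0, 1.0, 0.0, 0.0]]
indices, values = to_sparse(features, masks)

raw = [[t, -t] for t in range(10)]
waveform = extract_waveform(raw, spike_sample=5, before=2, after=3)
print(indices, values, len(waveform))
```

When most channels are masked for a given spike, the sparse representation stores a small fraction of the dense array, at the cost of variable-length rows.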

Limitations: performance

  • Random access is slow on HDDs: hard drives are the limiting factor (SSDs are much faster)
    • Seek time: ~10 ms
    • Transfer rate: ~100 MB/s
  • Loading features from disk for all spikes in a given cluster is slow
    • Solution: load the whole array, or large chunks, into memory
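
A rough cost model makes the trade-off concrete. The sketch below compares reading every row of the features array with one seek per spike against a single sequential pass, using the seek time and transfer rate quoted above. The row size assumes 1000 channels, 3 features per channel and Float32 (feature, mask) pairs. The model ignores caching, chunking and read-ahead, so the numbers are only indicative.

```python
SEEK_TIME = 0.010       # seconds per random access (~10 ms)
TRANSFER_RATE = 100e6   # bytes per second (~100 MB/s)


def random_access_time(nrows, row_bytes):
    """One seek plus one small transfer per row."""
    return nrows * (SEEK_TIME + row_bytes / TRANSFER_RATE)


def sequential_time(nrows, row_bytes):
    """A single seek followed by one contiguous transfer."""
    return SEEK_TIME + nrows * row_bytes / TRANSFER_RATE


nspikes = 10_000_000
# Assumption: 1000 channels x 3 features x (feature, mask) x 4 bytes.
row_bytes = 1000 * 3 * 2 * 4

t_random = random_access_time(nspikes, row_bytes)
t_sequential = sequential_time(nspikes, row_bytes)
print("random: %.1f h, sequential: %.1f min"
      % (t_random / 3600, t_sequential / 60))
```

With these figures, visiting all spikes one seek at a time takes on the order of a day, while one sequential pass takes well under an hour, which is why reading large contiguous chunks is preferred.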

[Figure: spikes, features]

KlustaViewa

DEMO

KlustaViewa: limitations

  • Installation may be difficult
  • Software and data must be installed locally (no remote work, scales poorly to huge datasets)

Future major release

IPython notebook

DEMO

Hardware-accelerated visualization with the graphics card (GPU)

Conclusion

  • Kwik format for storing large-scale electrophysiological data
  • Memory and performance improvements
  • Distributed analysis with the IPython notebook
  • Remote visualization in the notebook with Vispy and WebGL
