braindecode.datautil package

Utilities for data manipulation.

Submodules

braindecode.datautil.iterators module

braindecode.datautil.iterators.get_balanced_batches(n_trials, rng, shuffle, n_batches=None, batch_size=None)[source]

Create indices for batches balanced in size (batch sizes differ by at most 1). Supply either the batch size or the number of batches. The resulting batches will not necessarily have the given batch size but rather the next largest batch size that allows splitting the set into balanced batches (maximum size difference 1).

Parameters:
  • n_trials (int) – Size of set.
  • rng (RandomState) – Random generator used to shuffle the indices.
  • shuffle (bool) – Whether to shuffle indices before splitting set.
  • n_batches (int, optional) – Number of batches.
  • batch_size (int, optional) – Desired batch size.
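A minimal sketch of balanced splitting in plain NumPy, assuming the number of batches is derived by floor division (`n_trials // batch_size`), which never yields batches smaller than `batch_size`. The function name `balanced_batches` is hypothetical and this is not the library source.

```python
import numpy as np


def balanced_batches(n_trials, rng, shuffle, n_batches=None, batch_size=None):
    """Split range(n_trials) into batches whose sizes differ by at most 1."""
    if n_batches is None and batch_size is None:
        raise ValueError("Supply either n_batches or batch_size")
    if n_batches is None:
        # Assumption: floor division, so batches are at least batch_size large.
        n_batches = max(1, n_trials // batch_size)
    min_size, n_with_extra = divmod(n_trials, n_batches)
    inds = np.arange(n_trials)
    if shuffle:
        rng.shuffle(inds)
    batches = []
    start = 0
    for i_batch in range(n_batches):
        # The first n_with_extra batches receive one additional trial.
        stop = start + min_size + (1 if i_batch < n_with_extra else 0)
        batches.append(inds[start:stop])
        start = stop
    return batches


batches = balanced_batches(10, np.random.RandomState(0), shuffle=True, batch_size=3)
print([len(b) for b in batches])  # [4, 3, 3]
```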
class braindecode.datautil.iterators.BalancedBatchSizeIterator(batch_size, seed=328774)[source]

Bases: object

Create batches of balanced size.

Parameters:
  • batch_size (int) – Resulting batches will not necessarily have the given batch size but rather the next largest batch size that allows splitting the set into balanced batches (maximum size difference 1).
  • seed (int) – Random seed for initialization of numpy.RandomState random generator that shuffles the batches.
get_batches(dataset, shuffle)[source]
reset_rng()[source]
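The `seed` / `reset_rng()` pattern can be sketched as follows. `SimpleBalancedIterator` is a hypothetical class that takes `X` and `y` arrays directly instead of a dataset, and it uses `np.array_split` to obtain batches whose sizes differ by at most 1. Resetting the generator reproduces the same shuffled order.

```python
import numpy as np


class SimpleBalancedIterator:
    """Hypothetical stand-in illustrating the seed / reset_rng pattern."""

    def __init__(self, batch_size, seed=328774):
        self.batch_size = batch_size
        self.seed = seed
        self.reset_rng()

    def reset_rng(self):
        # Re-create the generator so the shuffling sequence restarts.
        self.rng = np.random.RandomState(self.seed)

    def get_batches(self, X, y, shuffle):
        n_trials = len(X)
        n_batches = max(1, n_trials // self.batch_size)
        inds = np.arange(n_trials)
        if shuffle:
            self.rng.shuffle(inds)
        # array_split yields sections whose sizes differ by at most 1.
        for batch_inds in np.array_split(inds, n_batches):
            yield X[batch_inds], y[batch_inds]


X = np.random.randn(10, 2, 50).astype(np.float32)  # trials x channels x time
y = np.arange(10) % 2
it = SimpleBalancedIterator(batch_size=3)
first_epoch = [b_y.tolist() for _, b_y in it.get_batches(X, y, shuffle=True)]
it.reset_rng()
repeat_epoch = [b_y.tolist() for _, b_y in it.get_batches(X, y, shuffle=True)]
print(first_epoch == repeat_epoch)  # True
```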
class braindecode.datautil.iterators.ClassBalancedBatchSizeIterator(batch_size, seed=328774)[source]

Bases: object

Create batches of balanced size that are also balanced per class, i.e. each class should be sampled with roughly the same frequency during training.

Parameters:
  • batch_size (int) – Resulting batches will not necessarily have the given batch size but rather the next largest batch size that allows splitting the set into balanced batches (maximum size difference 1).
  • seed (int) – Random seed for initialization of numpy.RandomState random generator that shuffles the batches.
get_batches(dataset, shuffle)[source]
reset_rng()[source]
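One way to sample each class with roughly the same frequency is to weight every trial by the inverse frequency of its class. The sketch below does this with `RandomState.choice`; it is an assumed technique for illustration, and the hypothetical `class_balanced_indices` is not necessarily how the iterator above is implemented.

```python
import numpy as np


def class_balanced_indices(y, n_samples, rng):
    """Draw trial indices so that every class is sampled about equally often."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    # Each trial is weighted by the inverse frequency of its class.
    class_weight = {c: 1.0 / n for c, n in zip(classes, counts)}
    p = np.array([class_weight[label] for label in y])
    p /= p.sum()
    return rng.choice(len(y), size=n_samples, replace=True, p=p)


y = np.array([0] * 90 + [1] * 10)  # imbalanced: 90% class 0
rng = np.random.RandomState(328774)
inds = class_balanced_indices(y, n_samples=10000, rng=rng)
frac = np.bincount(y[inds]) / len(inds)
print(frac)  # roughly [0.5, 0.5]
```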
class braindecode.datautil.iterators.CropsFromTrialsIterator(batch_size, input_time_length, n_preds_per_input, seed=(2017, 6, 28))[source]

Bases: object

Iterator that samples crops out of the trials so that each sample in each trial (after the receptive field of the ConvNet) is predicted.

Predicting the given input batches can lead to some samples being predicted multiple times if the receptive field size (input_time_length - n_preds_per_input + 1) is not a divisor of the trial length. compute_preds_per_trial_from_crops() can be used to remove the overlapping predictions again for evaluation.

Parameters:
  • batch_size (int)
  • input_time_length (int) – Input time length of the ConvNet, determines size of batches in 3rd dimension.
  • n_preds_per_input (int) – Number of predictions the ConvNet makes per input. Can be computed by making a forward pass with the given input time length; the output length in the 3rd dimension is n_preds_per_input.
  • seed (int or sequence of int) – Random seed for initialization of numpy.RandomState random generator that shuffles the batches.

See also

braindecode.experiments.monitors.compute_preds_per_trial_from_crops
Assigns predictions to trials, removes overlaps.
reset_rng()[source]
get_batches(dataset, shuffle)[source]
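The crop placement can be sketched as follows, assuming crops advance by `n_preds_per_input` samples and a final crop is aligned with the trial end. Both helper functions are hypothetical. With `input_time_length=500` and `n_preds_per_input=200`, the receptive field is 301 samples, so predictions start at sample index 300.

```python
def crop_starts(trial_len, input_time_length, n_preds_per_input):
    """Start indices of crops covering every predictable sample of one trial."""
    if trial_len < input_time_length:
        raise ValueError("Trial is shorter than input_time_length")
    last_start = trial_len - input_time_length
    starts = list(range(0, last_start + 1, n_preds_per_input))
    if starts[-1] != last_start:
        # The final crop overlaps the previous one, so some samples are predicted twice.
        starts.append(last_start)
    return starts


def predicted_samples(start, input_time_length, n_preds_per_input):
    # A crop predicts its last n_preds_per_input samples.
    stop = start + input_time_length
    return range(stop - n_preds_per_input, stop)


starts = crop_starts(1000, input_time_length=500, n_preds_per_input=200)
print(starts)  # [0, 200, 400, 500]
```

In this example the last two crops overlap by 100 predicted samples (indices 800 to 899), which is the kind of duplicate that compute_preds_per_trial_from_crops() is described as removing.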

braindecode.datautil.lazy_iterators module

braindecode.datautil.lazy_iterators.custom_collate(batch, rng_state=None)[source]

Puts each data field into an ndarray with outer dimension batch size. Taken and adapted from PyTorch to return ndarrays instead of tensors: https://pytorch.org/docs/0.4.1/_modules/torch/utils/data/dataloader.html

This function is needed because tensors require more system RAM, which lazy loading is meant to reduce.
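The idea behind custom_collate can be sketched with NumPy only: stack arrays along a new batch axis, turn scalars into a 1d array, and recurse into tuples so that X and y are collated separately. `numpy_collate` is a hypothetical name; the actual function adapts PyTorch's collate code.

```python
import numbers

import numpy as np


def numpy_collate(batch):
    """Merge a list of samples into ndarrays whose outer dimension is the batch size."""
    first = batch[0]
    if isinstance(first, np.ndarray):
        return np.stack(batch, axis=0)
    if isinstance(first, numbers.Number):
        return np.array(batch)
    if isinstance(first, (tuple, list)):
        # Collate each field (e.g. X and y) separately.
        return [numpy_collate(list(field)) for field in zip(*batch)]
    raise TypeError(f"Unsupported sample type: {type(first)!r}")


samples = [(np.zeros((3, 500), dtype=np.float32), i % 2) for i in range(4)]
X, y = numpy_collate(samples)
print(X.shape, y.shape)  # (4, 3, 500) (4,)
```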

class braindecode.datautil.lazy_iterators.LazyCropsFromTrialsIterator(input_time_length, n_preds_per_input, batch_size, seed=328774, num_workers=0, collate_fn=<function custom_collate>, check_preds_smaller_trial_len=True, reset_rng_after_each_batch=False)[source]

Bases: object

This is essentially the same code as CropsFromTrialsIterator, adapted to work with lazy datasets. It uses the PyTorch DataLoader to load recordings from disk with multiple workers only when the data is actually needed, which reduces overall RAM requirements.

Parameters:
  • input_time_length (int) – Input time length of the ConvNet, determines size of batches in 3rd dimension.
  • n_preds_per_input (int) – Number of predictions the ConvNet makes per input. Can be computed by making a forward pass with the given input time length; the output length in the 3rd dimension is n_preds_per_input.
  • batch_size (int)
  • seed (int) – Random seed for initialization of numpy.RandomState random generator that shuffles the batches.
  • num_workers (int) – The number of workers used to load crops in parallel.
  • collate_fn (func) – Merges a list of samples to form a mini-batch.
  • check_preds_smaller_trial_len (bool) – Whether to check the validity of predictions and trial lengths. Disable to decrease runtime.
reset_rng()[source]
get_batches(dataset, shuffle)[source]
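Lazy loading in general means reading only the part of a recording that a batch needs. As a stand-in for the DataLoader-based approach described above, the sketch below uses a NumPy memory-mapped `.npy` file; the file layout and the `load_crop` helper are assumptions for illustration, not part of the library.

```python
import os
import tempfile

import numpy as np

# Write a hypothetical continuous recording (channels x time) to disk once.
path = os.path.join(tempfile.mkdtemp(), "recording.npy")
np.save(path, np.random.randn(4, 10000).astype(np.float32))

# mmap_mode="r" maps the file instead of reading it into RAM.
recording = np.load(path, mmap_mode="r")


def load_crop(recording, start, input_time_length):
    # Only the requested slice is read from disk; np.array copies it into RAM.
    return np.array(recording[:, start:start + input_time_length])


crop = load_crop(recording, start=2000, input_time_length=500)
print(type(recording).__name__, crop.shape)  # memmap (4, 500)
```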

braindecode.datautil.signal_target module

class braindecode.datautil.signal_target.SignalAndTarget(X, y)[source]

Bases: object

Simple data container class.

Parameters:
  • X (3darray or list of 2darrays) – The input signal per trial.
  • y (1darray or list) – Labels for each trial.
braindecode.datautil.signal_target.apply_to_X_y(fn, *sets)[source]

Apply a function to all X and y attributes of all given sets.

Applies function to list of X arrays and to list of y arrays separately.

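Parameters:
  • fn (function) – Function to apply; it is applied once to the X attributes and once to the y attributes of all sets.
  • sets (SignalAndTarget objects) – The sets whose X and y attributes are passed to fn.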
Returns:

result_set – Dataset with X and y as the result of the application of the function.

Return type:

SignalAndTarget
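A sketch of the container and of apply_to_X_y, assuming `fn` receives the X attributes of all sets as separate positional arguments (and then the y attributes the same way). The `SignalAndTarget` class here is a minimal stand-in, not the library class.

```python
import numpy as np


class SignalAndTarget:
    """Minimal stand-in: X (trials x channels x time), y (one label per trial)."""

    def __init__(self, X, y):
        assert len(X) == len(y)
        self.X = X
        self.y = y


def apply_to_X_y(fn, *sets):
    # fn receives all X attributes at once, then all y attributes at once.
    X = fn(*[s.X for s in sets])
    y = fn(*[s.y for s in sets])
    return SignalAndTarget(X, y)


a = SignalAndTarget(np.zeros((5, 3, 100)), np.zeros(5, dtype=int))
b = SignalAndTarget(np.ones((2, 3, 100)), np.ones(2, dtype=int))
merged = apply_to_X_y(lambda *arrs: np.concatenate(arrs, axis=0), a, b)
print(merged.X.shape, merged.y.tolist())  # (7, 3, 100) [0, 0, 0, 0, 0, 1, 1]
```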

braindecode.datautil.signalproc module

braindecode.datautil.signalproc.exponential_running_standardize(data, factor_new=0.001, init_block_size=None, eps=0.0001)[source]

Perform exponential running standardization.

Compute the exponential running mean \(m_t\) at time t as \(m_t=\mathrm{factornew} \cdot mean(x_t) + (1 - \mathrm{factornew}) \cdot m_{t-1}\).

Then, compute exponential running variance \(v_t\) at time t as \(v_t=\mathrm{factornew} \cdot (m_t - x_t)^2 + (1 - \mathrm{factornew}) \cdot v_{t-1}\).

Finally, standardize the data point \(x_t\) at time t as: \(x'_t=(x_t - m_t) / max(\sqrt{v_t}, eps)\).

Parameters:
  • data (2darray (time, channels))
  • factor_new (float)
  • init_block_size (int, optional) – Standardize data before this index with regular standardization.
  • eps (float) – Stabilizer for division by zero variance.
Returns:

standardized – Standardized data.

Return type:

2darray (time, channels)
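The recursion in the equations above can be sketched as a per-channel loop. This sketch treats \(mean(x_t)\) as the value of each channel at time t, initializes \(m\) with the first sample and \(v\) with zeros, and ignores init_block_size; all three are assumptions, and the library implementation may differ, especially in how the recursion is initialized.

```python
import numpy as np


def exp_running_standardize(data, factor_new=0.001, eps=1e-4):
    """Loop form of the running standardization for data of shape (time, channels)."""
    data = np.asarray(data, dtype=np.float64)
    out = np.empty_like(data)
    m = data[0].copy()  # assumption: start the running mean at the first sample
    v = np.zeros(data.shape[1])  # assumption: start the running variance at zero
    for t, x_t in enumerate(data):
        # Per-channel updates following the equations above.
        m = factor_new * x_t + (1 - factor_new) * m
        v = factor_new * (m - x_t) ** 2 + (1 - factor_new) * v
        out[t] = (x_t - m) / np.maximum(np.sqrt(v), eps)
    return out


rng = np.random.RandomState(0)
raw = 50.0 + 10.0 * rng.randn(20000, 2)  # offset 50, scale 10, 2 channels
standardized = exp_running_standardize(raw)
tail = standardized[10000:]  # skip the initial adaptation period
print(tail.mean(axis=0), tail.std(axis=0))  # means close to 0, standard deviations close to 1
```

With factor_new=0.001 the running statistics effectively average over the last few thousand samples, which is why the first part of the output is excluded before checking the result.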

braindecode.datautil.signalproc.exponential_running_demean(data, factor_new=0.001, init_block_size=None)[source]

Perform exponential running demeaning.

Compute the exponential running mean \(m_t\) at time t as \(m_t=\mathrm{factornew} \cdot mean(x_t) + (1 - \mathrm{factornew}) \cdot m_{t-1}\).

Demean the data point \(x_t\) at time t as: \(x'_t=(x_t - m_t)\).

Parameters:
  • data (2darray (time, channels))
  • factor_new (float)
  • init_block_size (int, optional) – Demean data before this index with regular demeaning.
Returns:

demeaned – Demeaned data.

Return type:

2darray (time, channels)

braindecode.datautil.signalproc.highpass_cnt(data, low_cut_hz, fs, filt_order=3, axis=0)[source]
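
Highpass-filter the signal using a causal Butterworth filter of the given order.

Parameters:
  • data (2d-array) – Time x channels
  • low_cut_hz (float) – Cutoff frequency in Hz.
  • fs (float) – Sampling rate in Hz.
  • filt_order (int) – Order of the Butterworth filter.
  • axis (int) – Axis along which to filter; 0 is the time axis for time x channels data.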
Returns:

highpassed_data – Data after applying highpass filter.

Return type:

2d-array

braindecode.datautil.signalproc.lowpass_cnt(data, high_cut_hz, fs, filt_order=3, axis=0)[source]
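
Lowpass-filter the signal using a causal Butterworth filter of the given order.

Parameters:
  • data (2d-array) – Time x channels
  • high_cut_hz (float) – Cutoff frequency in Hz.
  • fs (float) – Sampling rate in Hz.
  • filt_order (int) – Order of the Butterworth filter.
  • axis (int) – Axis along which to filter; 0 is the time axis for time x channels data.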
Returns:

lowpassed_data – Data after applying lowpass filter.

Return type:

2d-array

braindecode.datautil.signalproc.bandpass_cnt(data, low_cut_hz, high_cut_hz, fs, filt_order=3, axis=0, filtfilt=False)[source]
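
Bandpass-filter the signal using a Butterworth filter of the given order. The filter is causal unless filtfilt is True.

Parameters:
  • data (2d-array) – Time x channels
  • low_cut_hz (float) – Lower cutoff frequency in Hz.
  • high_cut_hz (float) – Upper cutoff frequency in Hz.
  • fs (float) – Sampling rate in Hz.
  • filt_order (int) – Order of the Butterworth filter.
  • axis (int) – Axis along which to filter; 0 is the time axis for time x channels data.
  • filtfilt (bool) – Whether to use filtfilt (zero-phase forward-backward filtering, not causal) instead of lfilter (causal).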
Returns:

bandpassed_data – Data after applying bandpass filter.

Return type:

2d-array
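The difference between the causal and zero-phase variants can be sketched directly with `scipy.signal.butter`, `lfilter`, and `filtfilt`. The `bandpass` helper below is hypothetical and normalizes the cutoffs by the Nyquist frequency; it is not the braindecode source.

```python
import numpy as np
from scipy import signal


def bandpass(data, low_cut_hz, high_cut_hz, fs, filt_order=3, axis=0, filtfilt=False):
    # Cutoffs are given as fractions of the Nyquist frequency fs / 2.
    nyq = fs / 2.0
    b, a = signal.butter(filt_order, [low_cut_hz / nyq, high_cut_hz / nyq], btype="bandpass")
    if filtfilt:
        # Forward-backward filtering: zero phase shift, but not causal.
        return signal.filtfilt(b, a, data, axis=axis)
    # Single forward pass: causal, introduces a phase delay.
    return signal.lfilter(b, a, data, axis=axis)


fs = 250.0
t = np.arange(2500) / fs  # 10 s
# Time x channels: a 10 Hz plus a 50 Hz component in each of 2 channels.
one_channel = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 50 * t)
data = np.stack([one_channel, one_channel], axis=1)
causal = bandpass(data, 8.0, 13.0, fs)
zero_phase = bandpass(data, 8.0, 13.0, fs, filtfilt=True)
```

The causal variant is the one usable online, since each output sample depends only on past input; filtfilt avoids the phase delay at the cost of using future samples.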

braindecode.datautil.signalproc.filter_is_stable(a)[source]

Check if filter coefficients of IIR filter are stable.

Parameters: a (list or 1darray of number) – Denominator filter coefficients a.
Returns: is_stable – Whether the filter is stable.
Return type: bool

Notes

A filter is stable if the absolute values of all roots of the denominator polynomial a are smaller than 1, see [1].

References

[1] HYRY, “SciPy ‘lfilter’ returns only NaNs”, StackOverflow, http://stackoverflow.com/a/8812737/1469195
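The stability criterion from the Notes can be checked with `np.roots` on the denominator coefficients. `is_stable` is a hypothetical helper.

```python
import numpy as np


def is_stable(a):
    # Stable if every root (pole) of the denominator lies inside the unit circle.
    return bool(np.all(np.abs(np.roots(a)) < 1))


print(is_stable([1.0, -0.5]))  # True: y[n] = x[n] + 0.5 * y[n-1], pole at 0.5
print(is_stable([1.0, -1.5]))  # False: pole at 1.5
```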

braindecode.datautil.splitters module

braindecode.datautil.splitters.concatenate_sets(sets)[source]

Concatenate all sets together.

Parameters: sets (list of SignalAndTarget)
Returns: concatenated_set
Return type: SignalAndTarget
braindecode.datautil.splitters.concatenate_two_sets(set_a, set_b)[source]

Concatenate two sets together.

Parameters: set_a, set_b (SignalAndTarget)
Returns: concatenated_set
Return type: SignalAndTarget
braindecode.datautil.splitters.concatenate_np_array_or_add_lists(a, b)[source]
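concatenate_np_array_or_add_lists has no docstring. Judging only from its name, a plausible behavior is sketched below: concatenate two ndarrays along the trial axis, otherwise join them as lists (useful when trials have different lengths). This reading is an assumption, and `concat_array_or_lists` is a hypothetical name.

```python
import numpy as np


def concat_array_or_lists(a, b):
    # Assumed behavior: arrays are concatenated along the first (trial) axis ...
    if isinstance(a, np.ndarray) and isinstance(b, np.ndarray):
        return np.concatenate((a, b), axis=0)
    # ... anything else is joined as a plain list.
    return list(a) + list(b)


print(concat_array_or_lists(np.zeros((2, 3)), np.ones((1, 3))).shape)  # (3, 3)
# Trials of different lengths stay a list of 2d arrays.
trials = concat_array_or_lists([np.zeros((3, 100))], [np.zeros((3, 120))])
print(len(trials))  # 2
```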
braindecode.datautil.splitters.split_into_two_sets(dataset, first_set_fraction=None, n_first_set=None)[source]

Split set into two sets either by fraction of first set or by number of trials in first set.

Parameters:
  • dataset (SignalAndTarget)
  • first_set_fraction (float, optional) – Fraction of trials in first set.
  • n_first_set (int, optional) – Number of trials in first set.
Returns:

first_set, second_set – The two split sets.

Return type:

SignalAndTarget

braindecode.datautil.splitters.select_examples(dataset, indices)[source]

Select examples from dataset.

Parameters:
  • dataset (SignalAndTarget)
  • indices (list of int, 1d-array of int) – Indices to select
Returns:

reduced_set – Dataset with only examples selected.

Return type:

SignalAndTarget
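Selecting examples has to work for both forms of X accepted by SignalAndTarget: a 3d array and a list of 2d arrays. A sketch with the hypothetical helper `select`, operating on X and y directly:

```python
import numpy as np


def select(X, y, indices):
    """Return the trials at `indices` from X (3d array or list of 2d arrays) and y."""
    indices = np.asarray(indices, dtype=int)
    if isinstance(X, np.ndarray):
        new_X = X[indices]  # fancy indexing copies the selected trials
    else:
        new_X = [X[i] for i in indices]  # trials may differ in length
    new_y = np.asarray(y)[indices]
    return new_X, new_y


# Three trials of different lengths, stored as a list of (channels, time) arrays.
X = [np.zeros((3, 100)), np.zeros((3, 120)), np.zeros((3, 90))]
y = [0, 1, 0]
sub_X, sub_y = select(X, y, [2, 0])
print([trial.shape[1] for trial in sub_X], sub_y.tolist())  # [90, 100] [0, 0]
```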

braindecode.datautil.splitters.split_into_train_valid_test(dataset, n_folds, i_test_fold, rng=None)[source]

Split dataset into folds, select one validation fold and one test fold, and merge the rest into the training fold.

Parameters:
  • dataset (SignalAndTarget)
  • n_folds (int) – Number of folds to split dataset into.
  • i_test_fold (int) – Index of the test fold (0-based). The validation fold is the fold immediately preceding the test fold.
  • rng (numpy.random.RandomState, optional) – Random generator for shuffling; None means no shuffling.
Returns:

train_set, valid_set, test_set – The training, validation, and test sets.

Return type:

SignalAndTarget
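The fold assignment described above can be sketched on indices alone. This assumes `np.array_split` for balanced folds; in this sketch, `i_test_fold=0` makes Python's negative indexing pick the last fold as the validation fold. `train_valid_test_indices` is hypothetical.

```python
import numpy as np


def train_valid_test_indices(n_trials, n_folds, i_test_fold, rng=None):
    inds = np.arange(n_trials)
    if rng is not None:
        rng.shuffle(inds)  # rng=None keeps the original order
    folds = np.array_split(inds, n_folds)
    test_inds = folds[i_test_fold]
    # The validation fold immediately precedes the test fold.
    valid_inds = folds[i_test_fold - 1]
    # Everything else forms the training fold.
    train_inds = np.setdiff1d(inds, np.union1d(test_inds, valid_inds))
    return train_inds, valid_inds, test_inds


train, valid, test = train_valid_test_indices(10, n_folds=5, i_test_fold=2)
print(train, valid, test)  # [0 1 6 7 8 9] [2 3] [4 5]
```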

braindecode.datautil.splitters.split_into_train_test(dataset, n_folds, i_test_fold, rng=None)[source]

Split dataset into folds, select one test fold, and merge the rest into the training fold.

Parameters:
  • dataset (SignalAndTarget)
  • n_folds (int) – Number of folds to split dataset into.
  • i_test_fold (int) – Index of the test fold (0-based)
  • rng (numpy.random.RandomState, optional) – Random generator for shuffling; None means no shuffling.
Returns:

train_set, test_set – The training and test sets.

Return type:

SignalAndTarget

braindecode.datautil.trial_segment module

braindecode.datautil.trial_segment.create_signal_target_from_raw_mne(raw, name_to_start_codes, epoch_ival_ms, name_to_stop_codes=None, prepad_trials_to_n_samples=None, one_hot_labels=False, one_label_per_trial=True)[source]

Create a SignalAndTarget set from the given mne.io.RawArray.

Parameters:
  • raw (mne.io.RawArray)
  • name_to_start_codes (OrderedDict (str -> int or list of int)) – Ordered dictionary mapping class names to marker code or marker codes. y-labels are assigned in the order of the keys, i.e. the first class name gets y-value 0, the second class name y-value 1, etc.
  • epoch_ival_ms (iterable of (int,int)) – Epoching interval in milliseconds. If only name_to_start_codes is given, it represents the start and stop offsets from the start markers. If name_to_stop_codes is also given, it represents the offset from the start marker and the offset from the stop marker. E.g. [500, -500] would mean from 500 ms after the start marker until 500 ms before the stop marker.
  • name_to_stop_codes (dict (str -> int or list of int), optional) – Dictionary mapping class names to stop marker code or stop marker codes. Order does not matter; the dictionary should contain each class in the name_to_start_codes dictionary.
  • prepad_trials_to_n_samples (int, optional) – Pad trials that would be too short with the signal before them (only valid if name_to_stop_codes is not None).
  • one_hot_labels (bool, optional) – Whether to have the labels in a one-hot format, e.g. [0,0,1], or just as an int, e.g. 2.
  • one_label_per_trial (bool, optional) – Whether to have a timeseries of labels or just a single label per trial.
Returns:

dataset – Dataset with X as the trial signals and y as the trial labels.

Return type:

SignalAndTarget

braindecode.datautil.trial_segment.create_signal_target(data, events, fs, name_to_start_codes, epoch_ival_ms, name_to_stop_codes=None, prepad_trials_to_n_samples=None, one_hot_labels=False, one_label_per_trial=True)[source]

Create a SignalAndTarget set from the given continuous data.

Parameters:
  • data (2d-array of number) – The continuous recorded data. Channels x times order.
  • events (2d-array) – Dimensions: Number of events, 2. For each event, should contain sample index and marker code.
  • fs (number) – Sampling rate.
  • name_to_start_codes (OrderedDict (str -> int or list of int)) – Ordered dictionary mapping class names to marker code or marker codes. y-labels are assigned in the order of the keys, i.e. the first class name gets y-value 0, the second class name y-value 1, etc.
  • epoch_ival_ms (iterable of (int,int)) – Epoching interval in milliseconds. If only name_to_start_codes is given, it represents the start and stop offsets from the start markers. If name_to_stop_codes is also given, it represents the offset from the start marker and the offset from the stop marker. E.g. [500, -500] would mean from 500 ms after the start marker until 500 ms before the stop marker.
  • name_to_stop_codes (dict (str -> int or list of int), optional) – Dictionary mapping class names to stop marker code or stop marker codes. Order does not matter; the dictionary should contain each class in the name_to_start_codes dictionary.
  • prepad_trials_to_n_samples (int, optional) – Pad trials that would be too short with the signal before them (only valid if name_to_stop_codes is not None).
  • one_hot_labels (bool, optional) – Whether to have the labels in a one-hot format, e.g. [0,0,1], or just as an int, e.g. 2.
  • one_label_per_trial (bool, optional) – Whether to have a timeseries of labels or just a single label per trial.
Returns:

dataset – Dataset with X as the trial signals and y as the trial labels.

Return type:

SignalAndTarget
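The basic epoching step (start markers only, no stop markers, no padding) can be sketched as follows. `epoch_by_start_markers` is hypothetical; it assumes `data` is channels x time, `events` rows are (sample index, marker code), and ms offsets are rounded to the nearest sample. It does not check that every trial fits inside the recording.

```python
from collections import OrderedDict

import numpy as np


def epoch_by_start_markers(data, events, fs, name_to_start_codes, epoch_ival_ms):
    """Cut trials from channels x time `data` around start markers."""
    # Convert the ms interval to sample offsets relative to each marker.
    start_offset = int(round(epoch_ival_ms[0] * fs / 1000.0))
    stop_offset = int(round(epoch_ival_ms[1] * fs / 1000.0))
    # Class index follows the key order of the OrderedDict.
    code_to_class = {}
    for i_class, codes in enumerate(name_to_start_codes.values()):
        for code in np.atleast_1d(codes):
            code_to_class[int(code)] = i_class
    X, y = [], []
    for i_sample, code in events:
        if int(code) in code_to_class:
            X.append(data[:, i_sample + start_offset:i_sample + stop_offset])
            y.append(code_to_class[int(code)])
    return np.stack(X), np.array(y)


fs = 250.0
data = np.random.randn(3, 5000)  # 3 channels, 20 s
events = np.array([[500, 1], [1500, 2], [2500, 1], [3500, 99]])
name_to_start_codes = OrderedDict([("left", 1), ("right", [2, 3])])
X, y = epoch_by_start_markers(data, events, fs, name_to_start_codes, (0, 1000))
print(X.shape, y.tolist())  # (3, 3, 250) [0, 1, 0]
```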

braindecode.datautil.trial_segment.create_signal_target_with_breaks_from_mne(cnt, name_to_start_codes, trial_epoch_ival_ms, name_to_stop_codes, min_break_length_ms, max_break_length_ms, break_epoch_ival_ms, prepad_trials_to_n_samples=None)[source]

Create a SignalAndTarget set, including breaks between trials, from the given mne.io.RawArray.

Parameters:
  • cnt (mne.io.RawArray)
  • name_to_start_codes (OrderedDict (str -> int or list of int)) – Ordered dictionary mapping class names to marker code or marker codes. y-labels are assigned in the order of the keys, i.e. the first class name gets y-value 0, the second class name y-value 1, etc.
  • trial_epoch_ival_ms (iterable of (int,int)) – Epoching interval in milliseconds. Represents the offset from the start marker and the offset from the stop marker. E.g. [500, -500] would mean from 500 ms after the start marker until 500 ms before the stop marker.
  • name_to_stop_codes (dict (str -> int or list of int)) – Dictionary mapping class names to stop marker code or stop marker codes. Order does not matter; the dictionary should contain each class in the name_to_start_codes dictionary.
  • min_break_length_ms (number) – Breaks shorter than this length are excluded.
  • max_break_length_ms (number) – Breaks longer than this length are excluded.
  • break_epoch_ival_ms (iterable of (number, number)) – Break interval: offset in ms from the trial end to the start of the break, and offset in ms from the next trial start to the end of the break.
  • prepad_trials_to_n_samples (int, optional) – Pad trials that would be too short with the signal before them (only valid if name_to_stop_codes is not None).
Returns:

dataset – Dataset with X as the trial signals and y as the trial labels. Labels as timeseries and of integers, i.e., not one-hot encoded.

Return type:

SignalAndTarget

braindecode.datautil.trial_segment.add_breaks(events, fs, break_start_code, break_stop_code, name_to_start_codes, name_to_stop_codes, min_break_length_ms=None, max_break_length_ms=None, break_start_offset_ms=None, break_stop_offset_ms=None)[source]

Add break events to given events.

Parameters:
  • events (2d-array) – Dimensions: Number of events, 2. For each event, should contain sample index and marker code.
  • fs (number) – Sampling rate.
  • break_start_code (int) – Marker code that will be used for break start markers.
  • break_stop_code (int) – Marker code that will be used for break stop markers.
  • name_to_start_codes (OrderedDict (str -> int or list of int)) – Ordered dictionary mapping class names to start marker code or start marker codes.
  • name_to_stop_codes (dict (str -> int or list of int)) – Dictionary mapping class names to stop marker code or stop marker codes.
  • min_break_length_ms (number, optional) – Minimum length in milliseconds a break must have to be included.
  • max_break_length_ms (number, optional) – Maximum length in milliseconds a break can have to be included.
  • break_start_offset_ms (number, optional) – Offset in ms from the trial end to the start of the break.
  • break_stop_offset_ms (number, optional) – Offset in ms from the start of the next trial to the end of the preceding break.
Returns:

events – Events with break start and stop markers.

Return type:

2d-array
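A sketch of break insertion, assuming non-overlapping trials and that offsets are added to the trial stop and to the next trial start (so a negative break_stop_offset_ms ends the break before the next trial begins). `add_break_events` is hypothetical, defaults the offsets to 0, and takes flat sets of start and stop codes instead of the name-to-code dictionaries.

```python
import numpy as np


def add_break_events(events, fs, break_start_code, break_stop_code,
                     start_codes, stop_codes, min_break_length_ms=None,
                     max_break_length_ms=None, break_start_offset_ms=0,
                     break_stop_offset_ms=0):
    # Convert the ms offsets to sample offsets.
    start_off = int(round(break_start_offset_ms * fs / 1000.0))
    stop_off = int(round(break_stop_offset_ms * fs / 1000.0))
    new_events = [tuple(event) for event in events]
    last_stop = None
    for i_sample, code in events:
        if code in stop_codes:
            last_stop = i_sample
        elif code in start_codes and last_stop is not None:
            # Candidate break between the previous trial's stop and this start.
            b_start = last_stop + start_off
            b_stop = i_sample + stop_off
            length_ms = (b_stop - b_start) * 1000.0 / fs
            too_short = min_break_length_ms is not None and length_ms < min_break_length_ms
            too_long = max_break_length_ms is not None and length_ms > max_break_length_ms
            if b_stop > b_start and not (too_short or too_long):
                new_events += [(b_start, break_start_code), (b_stop, break_stop_code)]
            last_stop = None
    # Keep events ordered by sample index.
    return np.array(sorted(new_events))


events = np.array([[0, 1], [300, 2], [500, 1], [800, 2], [2000, 1], [2300, 2]])
with_breaks = add_break_events(
    events, fs=100.0, break_start_code=3, break_stop_code=4,
    start_codes={1}, stop_codes={2}, max_break_length_ms=5000,
    break_start_offset_ms=100, break_stop_offset_ms=-100)
print(with_breaks.tolist())
# [[0, 1], [300, 2], [310, 3], [490, 4], [500, 1], [800, 2], [2000, 1], [2300, 2]]
```

Here the second gap (11.8 s after applying the offsets) exceeds max_break_length_ms and is therefore not marked as a break.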

braindecode.datautil.util module

braindecode.datautil.util.ms_to_samples(ms, fs)[source]

Convert milliseconds to number of samples.

Parameters:
  • ms (number) – Milliseconds
  • fs (number) – Sampling rate
Returns:

n_samples – Number of samples

Return type:

int

braindecode.datautil.util.samples_to_ms(n_samples, fs)[source]

Convert number of samples to milliseconds.

Parameters:
  • n_samples (number) – Number of samples
  • fs (number) – Sampling rate
Returns:

milliseconds

Return type:

int
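Both conversions are a single multiplication. The sketch below uses hypothetical names and returns floats; wrap the result in `int(round(...))` where an integer sample count is required.

```python
def convert_ms_to_samples(ms, fs):
    # n_samples = ms * fs / 1000, e.g. 500 ms at 250 Hz is 125 samples.
    return ms * fs / 1000.0


def convert_samples_to_ms(n_samples, fs):
    # Inverse conversion: ms = n_samples * 1000 / fs.
    return n_samples * 1000.0 / fs


print(convert_ms_to_samples(500, 250.0))  # 125.0
print(convert_samples_to_ms(125, 250.0))  # 500.0
```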