Sanity checks as data sidekicks

Abe Gong asked for good examples of 'data sidekicks', i.e. "small, curated data" that "can accelerate analysis, solve cold start problems, and simplify complicated data pipelines".

My response was "brain-data sidekick: sanity-check you can classify blank screen vs stimuli" - e.g. when classifying mental states with fMRI, if you can't tell stimulus from no stimulus, something is probably awry upstream in your pipeline.

I still haven't got the hang of distilling complex thoughts into 140 characters, and so I was worried my reply might have been compressed into cryptic nonsense. Here's what I was trying to say:

Let's say you're trying to do a difficult classification on a dataset that has had a lot of preprocessing/transformation, like fMRI brain data. There are a million reasons why things could be going wrong.

All successful analyses are alike, but every unsuccessful analysis is unsuccessful in its own way (sorry, Tolstoy).

Things could be failing for meaningful reasons, e.g. the mental states you're trying to distinguish may not produce reliably different patterns of activity, or your classifier may not be sensitive enough to pick them apart.

But the most likely explanation is that you screwed up your preprocessing (mis-imported the data, mis-aligned the labels, mixed up the X-Y-Z dimensions, etc.).

If you can't classify someone staring at a blank screen vs a screen with something on it, it's probably something like this, since visual input is pretty much the strongest and most widespread signal in the brain - your whole posterior cortex lights up in response to high-salience images (like faces and places).
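To make that check concrete: take your preprocessed data as a samples-by-voxels matrix, label each sample as "blank screen" or "stimulus on screen", and make sure a simple classifier gets cross-validated accuracy comfortably above chance. Here's a minimal sketch with scikit-learn - the file names and the 0.7 threshold are just illustrative placeholders, not part of any real pipeline:

```python
# Sanity-check sketch: can we tell "blank screen" from "stimulus"?
# Assumes you've already extracted a samples x voxels matrix from your
# preprocessed fMRI data; the file names here are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X = np.load("bold_samples.npy")      # (n_volumes, n_voxels) preprocessed BOLD data
y = np.load("stimulus_labels.npy")   # (n_volumes,) labels: 0 = blank screen, 1 = stimulus

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validated accuracy

print(f"blank-vs-stimulus accuracy: {scores.mean():.2f} (chance ~ 0.50)")
if scores.mean() < 0.7:  # arbitrary cutoff - this contrast should be easy
    print("Can't reliably separate blank screen from stimulus;")
    print("check data import, label alignment, and axis ordering before anything else.")
```

Logistic regression is just a stand-in - any simple, fast classifier will do, because the point is the sanity check, not the model. If even this easy contrast comes out near chance, go hunting upstream in the pipeline before blaming the science.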

In the time I spent writing this, Abe had already figured out what I meant :).