
Here come the chest X-rays!

  • Dr. Candace Makeda Moore
  • Feb 26, 2019
  • 2 min read

CheXpert is here! It's a new dataset of X-rays...but maybe with some of the same problems. I'm downloading the actual data in the background as I blog, and this is no small issue.

When the NIH released ChestX-ray8, they proclaimed it was free and open, but it wasn't. You needed to download Box to get the data. Box had different pricing plans and subscription models...it was a fitting name, because the experience was like being told you could have a gift of free platinum, but you had to pay for the box it comes in. You will pay only shipping and handling...where have I heard that before?

Now, funny thing: last I checked, the N in NIH stood for national, as in I paid for it in tax dollars. The data came from patients like me. And now I must ostensibly PAY to get hold of data from me, created by a group I fund. Well, I can't say I'm surprised. I would also have to pay to get a copy of a scientific paper I published, based on work funded by the NIH.

I'm thrilled I was able to just register and get this data! Here: https://stanfordmlgroup.github.io/competitions/chexpert/

But the data itself...we will see. I see from the associated documentation and paper that it isn't categorized the way I would want it. Without getting into the nitty-gritty of every quirk...let's just say the world could use some differently labelled data to accomplish the maximum with AI.

We need to remember that AI is somewhat of a misnomer. Artificial intelligence, at least in terms of convolutional neural nets, isn't all that 'smart'...a pure, simple convolutional neural net can NOT take into account CONTEXT. Is that infiltrate a pneumonia or a pulmonary hemorrhage? Well, the context of a previously healthy person who just fell off a motorbike would give you a very different probability of pneumonia than a feverish 70-year-old. But what if that feverish 70-year-old is being treated for lung cancer? And on and on it goes...It actually becomes quite easy to see that, depending on the overall diagnostic algorithm a program is built around, one could get very different results.

And since it's health, those results are quite important. The results might determine whether you can get certain kinds of health insurance at all. This problem is why keeping not only datasets but also algorithms open is an ethical imperative. If a 'black-box' algorithm decides whether or not I can get insurance or certain drugs, it gives the illusion of objectivity without the substance of it...and I can't even check what is driving the decisions.
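To make the 'context' point concrete, here is a minimal PyTorch sketch of my own; nothing like this appears in the CheXpert paper. It contrasts an image-only CNN with a fusion model that also ingests a small vector of clinical context (age, fever, recent trauma). The layer sizes, class names, and feature choices are illustrative assumptions, not a real diagnostic model.

```python
# Minimal sketch (illustrative assumptions only): an image-only CNN has no way
# to use clinical context; a fusion model must be given that context explicitly.
import torch
import torch.nn as nn

class ImageOnlyCNN(nn.Module):
    """Pure CNN: sees only pixels, so it cannot tell 'infiltrate after a
    motorbike accident' from 'infiltrate in a febrile 70-year-old'."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, xray):
        return self.classifier(self.features(xray).flatten(1))

class ImagePlusContext(nn.Module):
    """Fusion model: concatenates the CNN's image features with a small
    vector of clinical covariates before classifying."""
    def __init__(self, n_context=3, n_classes=2):
        super().__init__()
        self.features = ImageOnlyCNN().features  # reuse the same image branch
        self.classifier = nn.Linear(32 + n_context, n_classes)

    def forward(self, xray, context):
        img_feats = self.features(xray).flatten(1)
        return self.classifier(torch.cat([img_feats, context], dim=1))

# Toy usage: one grayscale film plus [age, fever, recent_trauma] (made-up features).
xray = torch.randn(1, 1, 224, 224)
context = torch.tensor([[70.0, 1.0, 0.0]])
print(ImageOnlyCNN()(xray).shape)               # torch.Size([1, 2])
print(ImagePlusContext()(xray, context).shape)  # torch.Size([1, 2])
```

The point of the sketch is simply that context has to be fed in explicitly: a network that only ever sees pixels has no way to weigh a motorbike accident against a fever.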

So few people are paying attention to these issues that we have already let 'black-box' algorithms rule over many aspects of our lives. Thank you to Stanford for doing the right thing!!!





