Reproducible Data Science

I recently gave a talk entitled “Data Pipelines A La Mode”, with the following premise: we can use techniques from functional programming and distributed build systems in our (big) data (science) pipelines to allow us to know what code was used for »