Data engineers and their unlocking potential for business use-cases

Nate Kupp is currently Director of Infrastructure and Data Science at Thumbtack, and this year he presented his talk and success story, “From humble beginnings: building the data stack at Thumbtack”. It is one of the presentations I enjoyed most, because it resembled one of the pains I have also experienced in my day-to-day work.

One difference between Nate’s approach and mine is the executive sponsorship (plus a bit of luck: being in the right place at the right time, with the right management mentality). My own experience, on the other hand, was a failure from my perspective, though others saw it as a small success against overwhelming odds.

Continue reading →

Unicorn data engineers & scientists, a guide to catch, keep and sh*t rainbows

This year at CrunchConf 2018 there was an interesting talk by Andrey Sharapov, a Data Engineer & Scientist at Lidl. Yes, Lidl. The store in your back alley or in your neighbourhood. Did you know it does Big Data? I assumed so, given that it wants to both minimize waste and increase profits (e.g. how much of X does one store need to order to ensure it’s gone by EOD).

Andrey’s talk was centered around “Building data products: from zero to hero!”, and I personally want to praise the realism of his presentation, which gives me content for more than one article on the subject. He is one of several presenters at this year’s conference who called out companies’ strategy of investing too much in data scientists, then discovering they lack the infrastructure those scientists need, then trying to hire data engineers (who are even scarcer than scientists) a bit too late in the game.

Continue reading →

On workflow engines and where Airflow fits in

At CrunchConf 2018 there was a presentation on “Operating data pipeline using Airflow @ Slack” by Ananth Packkildurai. If you don’t know what Airflow is, it’s a workflow engine in the same vein as Oozie and Azkaban. It’s based on the concept of a DAG (directed acyclic graph), which you write in Python and execute on a cluster.
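To make the DAG idea concrete, here is a minimal sketch in plain Python. It does not use Airflow's API at all: it relies only on the standard library's `graphlib` module (Python 3.9+), and the task names (`extract_users`, `extract_orders`, `join`, `report`) are hypothetical, invented purely for illustration. It shows the core of what a workflow engine has to do with a DAG: work out an order in which every task runs after the tasks it depends on, and refuse graphs that contain a cycle.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract_users": set(),
    "extract_orders": set(),
    "join": {"extract_users", "extract_orders"},
    "report": {"join"},
}


def run_order(deps):
    """Return one valid execution order: every task after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def is_acyclic(deps):
    """Return True if the dependency graph is a valid DAG (has no cycle)."""
    try:
        run_order(deps)
        return True
    except CycleError:
        return False


order = run_order(dependencies)
print(order)
```

The two extract tasks have no dependency on each other, so an engine would be free to run them in parallel; only `join` and `report` are constrained. A real engine adds scheduling, retries and distribution across workers on top of this ordering step.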

As with the Kafka presentation by Tim Berglund, we asked the hard questions, and they got popular pretty quickly. For Airflow, given the ecosystem of workflow engines, we had quite a heavy question.

Continue reading →