Going to the Crunch Data Engineering and Analytics Conference, 29-31 October 2018 in Budapest

I remember that in 2016 my current employer gave me the opportunity to go to Cassandra Summit 2016 in San Jose. After a long and exhausting 30-hour flight, tons of preparations for the US visa a few weeks ahead, a booking mistake that I had to pay for with my own card until it was fixed, and many more “troubles”, I was finally there.

The thing about some conferences is that not all presentations are put online. For Cassandra Summit 2016, the DataStax community published recordings of all the conference presentations, but this is not true for most conferences. That was nice of them to do for the community, as such material can be referred to later.

This year, through the Bucharest Big Data meetup, and with a small nudge from Valentina Crisan, our local Big Data community maintainer, came the opportunity to go to CrunchConf, which boasts an interesting lineup of data engineering and analytics speakers, many of them well-known authors of open-source libraries or high-profile employees of one of the Big Tech firms (e.g. Google) talking about their data challenges. And it’s always nice to hear war stories from others, as they may apply to you in some not-so-distant future.

Joining me will be my wife, and because Bucharest to Budapest is an 840 km ride, we went cheap and wanted to do some drive-by sightseeing, so we’re going to take the car, checking in at Cotton House near Nyugati station. The conference will take place at the Hungarian Railway Museum (Magyar Vasuttorteneti Park), which is reachable by direct trains in the morning. However, I may choose the car and try to find a parking spot nearby. Later update: the organizers were kind enough to inform us that the museum parking is free (yay!).

Of the talks there, I have a short list of speakers and presentations that are of interest to my work, mostly centered on the engineering side of Big Data tooling:

  • JACEK LASKOWSKI is going to deep dive into the internals of the Spark 2.3 execution engine. I came up with the idea of, let’s say, a Groovy-based execution engine that bound the Spark session object to a script, compiled the script and sent the JAR for distribution across the cluster, effectively allowing you to script against Spark. So you can understand why such a topic attracts me most.
  • CHRIS TRAVERS is going to present PostgreSQL at 20TB and beyond, which tickles my current attraction towards Greenplum and HAWQ, both forks of PostgreSQL, working against Hadoop-stored data in the case of HAWQ or over a shared-nothing architecture in the case of Greenplum. The guys have 4PB of data in PostgreSQL, while we’re managing 1PB of data on Hadoop at my current employer. It’s incomparable.
  • ABHISHEK TIWARI is going to talk about Apache Gobblin, a data ingestion framework. Data ingestion is very much my current pain point as an architect trying to integrate the company’s different data sources with ease.
  • JEFFREY THEOBALD shares his war stories on bringing ML to production. I’m mostly interested in these to learn the hurdles one could face when trying to put such an intricate assembly of data flows into production.
  • MILENE DARNIS & ATUL GUPTE are going to talk about democratic data access at Uber, which is something that I personally believe is ‘A Good Thing’, as it gives freedom to DS/DA people and off-loads some of the data-specific work from data engineers, who can then concentrate on the systems (and on integrating them better, instead of having to deal with day-to-day data munging).
  • WES MCKINNEY talks about the Apache Arrow project, a library meant to be implemented/used as part of most Big Data projects to share columnar representations of data not only across language boundaries (Python/Java) but across technology barriers as well (e.g. Arrow data in Spark memory can be sent as-is to Cassandra or some other compatible technology). Call it the Parquet/ORC of in-memory data; that’s what Arrow means to the world.

Now, I did not list all the speakers, and that is not because the others have less important presentations. It’s just that they’re not as closely related to my current work. I am a bit excited to see and hear their war stories, while also having a few hours in the day to visit Budapest a bit (from the outside, as I don’t think there’s sufficient time for sightseeing or museums, other than this venue). My wife, on the other hand, will enjoy the city much more than I will ever be able to.

C’est la vie!