Secret Sauce Partners, Inc.
Transforming apparel & footwear shopping through data.
Why on Earth would you want to replace your data warehouse with a bunch of files lying around “in the cloud” and expect everyone from engineers to data analysts and scientists to tap into that data more to power their analyses, their research, and their work in general? Separating the storage of data from the computational work and giving everyone the right tool for their job within a single, coherent environment has made it easy for our engineers, analysts, data scientists, and even non-technical people to collaborate on and work with data. Working with data is hard. Far from the La La Land of Machine Learningstan and the United States of AI, there are many of us who still need to deal with messy data, ETL, backfilling, failed jobs, inefficient SQL queries, overloading production databases, partitioning, and, with the advent of cloud computing, unterminated Spark clusters and infrastructure costs. This talk is about how we tackled some of those challenges while building out our new data warehouse using S3 and Spark. This talk is for the rest of us.
Trends in 2017 and predictions for 2018
SSP provides services to apparel retailers and marketplaces, helping shoppers choose the right size and discover new styles. Our Data Platform team uses Python and PyData tools for a wide range of problems. In my talk, I will briefly share our experience in each of the areas where Python has proven useful.
A Kafka consumer framework. Like Rack but for Kafka.
We have been using EMR (Elastic MapReduce) to run ephemeral Hadoop clusters in AWS since 2013. As a matter of fact, we have no persistent Hadoop cluster and store all our data in S3. Is EMR a service? Is EMR a distribution? Is EMR open source? I will answer all of these questions and more as I introduce EMR and explain how we run Hadoop workloads in the cloud.
I believe in using the right tool for the right job, and thus we use a handful of programming languages at Secret Sauce. Python is the de facto standard among dynamic languages when it comes to working with data, so I spent a considerable amount of time beefing up my Python skills over the summer. I'll talk about how I did it and what my experience with Python, in contrast with Ruby, has been so far.
Containers and containerization have given life to a new breed of technologies concerned with managing clusters of computers, orchestrating services and scheduling containers. Docker, Kubernetes, Nomad and Amazon ECS all solve slightly different sets of problems, and my talk will give an overview of each technology and some of the new features that were announced at DockerCon and HashiConf EU. Most importantly, I will also answer the most pressing question of all: Do I need one?
Solving problems of apparel e-commerce with data Budapest Data Forum, 6/4/2015