Hi,

Lately, I've become involved in two data analytics projects. In both cases I was asked to help clarify requirements, facilitate discussions with data source owners, and mentor engineers on early-stage platform challenges.

The goal of the first project is reporting on top of a custom-built set of mobile and web applications. Data gets collected from internal Event Hubs, augmented with core systems data, and should eventually be accessible in Power BI. The team consists of an analyst and two .NET application developers.

The second project tackles interaction data from Google Analytics. The output is similar: the team should prepare a set of reports about customer behavior, sales funnels, etc. The team consists of experienced analysts and a Power BI visualization magician.

Both teams have chosen the Azure Databricks architecture.

It became obvious rather quickly that both teams lacked a key competence needed to meet the projects' objectives - a data modeler.

The biggest challenge fell on the .NET developers, who struggled to get clear requirements from the analysts:

"Do you want to aggregated dataset by age or gender?"

"What do you mean by drilling up and down using different factors?"

"What should we put in Silver and what should land in Gold?"

Here are a few links about Kimball modelling (the approach preferred by the company) that I've shared with the teams:

Compare the Kimball and Inmon Data Warehouse Architectures

Is Kimball Still Relevant in the Modern Data Warehouse?

Achieving Lakehouse Models with Spark 3.0

Tech Chat | Slowly Changing Dimensions (SCD) Type 2
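On that last link: SCD Type 2 keeps dimension history by expiring the current row and inserting a new version, rather than updating in place. A minimal in-memory sketch with made-up customer fields - in Databricks you would express the same logic as a Delta Lake merge:

```python
from datetime import date

# Dimension table: exactly one current row per business key at any time.
dim_customer = [
    {"customer_id": 42, "city": "Vilnius", "valid_from": date(2020, 1, 1),
     "valid_to": None, "is_current": True},
]

def apply_scd2(dim, customer_id, new_city, changed_on):
    """SCD Type 2 upsert: expire the current row, append a new version."""
    current = next(
        (r for r in dim if r["customer_id"] == customer_id and r["is_current"]),
        None,
    )
    if current and current["city"] == new_city:
        return  # attribute unchanged, nothing to record
    if current:
        current["valid_to"] = changed_on
        current["is_current"] = False
    dim.append({"customer_id": customer_id, "city": new_city,
                "valid_from": changed_on, "valid_to": None, "is_current": True})

apply_scd2(dim_customer, 42, "Kaunas", date(2021, 6, 1))
# dim_customer now holds the expired Vilnius row and a current Kaunas row
```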

What is your approach to data modelling on a data lake? What is your favorite modelling technique?


Valdas Maksimavičius

IT Architect & Microsoft Data Platform MVP

https://www.dataplatformschool.com 

Vilnius
Lithuania
