
📊 Sales by Country, Category & Time

This dashboard is part of the Kurumin Sports analytical environment and represents the output of a fully integrated data engineering pipeline that emulates a modern enterprise-grade architecture. The underlying data is produced, processed, and modeled through a multi-layered stack with automated ingestion, transformation, and reporting capabilities.

🔄 Pipeline Overview

Infrastructure & Data Flow Diagram

[Image: PostgreSQL_Pipeline_updated.png]

Source System Layer

  • PostgreSQL (Docker, Linux VPS) operating as the OLTP simulation environment.
  • A scheduled stored procedure inserts new sales records incrementally each day, simulating continuous operational activity.
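The incremental generation idea can be sketched in Python. This is not the stored procedure itself, which is not shown on this page; the column names, country codes, categories, and row counts below are hypothetical placeholders. The sketch only illustrates producing rows for the days after the last loaded date.

```python
import random
from datetime import date, timedelta

# Hypothetical reference values; the real product and country data are not shown.
COUNTRIES = ["BR", "US", "DE"]
CATEGORIES = ["Footwear", "Apparel", "Equipment"]


def generate_daily_sales(last_loaded: date, today: date,
                         rows_per_day: int = 3, seed: int = 0) -> list:
    """Return synthetic sales rows for every day after last_loaded, up to today."""
    rng = random.Random(seed)  # seeded so repeated runs give the same rows
    rows = []
    day = last_loaded + timedelta(days=1)
    while day <= today:
        for _ in range(rows_per_day):
            rows.append({
                "sale_date": day.isoformat(),
                "country": rng.choice(COUNTRIES),
                "category": rng.choice(CATEGORIES),
                "quantity": rng.randint(1, 5),
                "unit_price": round(rng.uniform(10, 200), 2),
            })
        day += timedelta(days=1)
    return rows
```

Calling the function with `last_loaded` equal to `today` returns an empty list, which is the property that makes a daily schedule safe to re-run.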

Ingestion & Landing Layer

  • Apache NiFi (Docker) orchestrates ETL flows, including JDBC extraction from PostgreSQL.
  • Extracted datasets are serialized and pushed into AWS S3 as the raw landing zone.
  • NiFi pipelines handle flow management, schema normalization, and delivery guarantees.
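As a rough illustration of the serialize-and-land step, the sketch below turns extracted rows into CSV bytes and derives a date-partitioned object key. NiFi performs this work in the actual pipeline; the key layout, the `raw` prefix, and the function names are assumptions made for the example, not the pipeline's real configuration.

```python
import csv
import io
from datetime import date


def landing_key(table: str, extract_date: date, prefix: str = "raw") -> str:
    """Build a date-partitioned object key (hypothetical layout)."""
    return f"{prefix}/{table}/extract_date={extract_date.isoformat()}/{table}.csv"


def to_csv_bytes(rows: list, columns: list) -> bytes:
    """Serialize a list of dict rows to UTF-8 CSV with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")
```

The resulting bytes and key are what an upload step would hand to the object store; the upload itself is left out so the sketch stays free of cloud dependencies.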

Data Warehouse Loading Layer

  • NiFi and SnowSQL batch jobs automate COPY INTO loads from AWS S3 into Snowflake.
  • Data lands unchanged in the RAW layer for immutable storage, then is transformed into STG (cleaned and standardized) and DWH (dimensionally modeled).
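A batch load of this kind might be assembled as follows. The table name, stage name, file format options, and connection name are assumptions for illustration; the real job definitions are not shown here. The sketch only builds the statement and the SnowSQL argument list, without executing anything.

```python
def build_copy_statement(table: str, stage_path: str) -> str:
    """Build a Snowflake COPY INTO statement for CSV files under a stage path."""
    return (
        f"COPY INTO {table}\n"
        f"FROM @{stage_path}\n"
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)\n"  # assumed file format
        "ON_ERROR = ABORT_STATEMENT;"                   # fail the load on bad rows
    )


def build_snowsql_command(connection_name: str, sql: str) -> list:
    """Build the argument list for a SnowSQL call (-c connection, -q query)."""
    return ["snowsql", "-c", connection_name, "-q", sql]


# Hypothetical names: RAW.SALES table, RAW.S3_LANDING external stage.
sql = build_copy_statement("RAW.SALES", "RAW.S3_LANDING/sales/")
command = build_snowsql_command("kurumin", sql)
```

Passing the list to `subprocess.run(command, check=True)` would run the load, assuming SnowSQL is installed and a connection named `kurumin` exists in its configuration.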

Analytical Warehouse Layer (Snowflake)

  • Multi-tier architecture: RAW → STG → DWH.
  • Model built using dimensional structures (fact tables + conformed dimensions).
  • Warehouse optimized for read-heavy analytical workloads and BI consumption.
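The RAW → STG → DWH progression can be illustrated with a small, self-contained sketch. It uses an in-memory SQLite database as a stand-in for Snowflake, so the SQL dialect differs from the real warehouse, and all table names, column names, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- RAW: values kept exactly as landed, all text
CREATE TABLE raw_sales (sale_date TEXT, country TEXT, category TEXT, amount TEXT);
INSERT INTO raw_sales VALUES
  ('2024-01-02', ' br ', 'Footwear', '100.50'),
  ('2024-01-02', 'US',   'Apparel',  '40.00'),
  ('2024-01-03', 'BR',   'Footwear', '59.50');

-- STG: trim and upper-case codes, cast amounts to numbers
CREATE TABLE stg_sales AS
SELECT sale_date,
       UPPER(TRIM(country)) AS country,
       TRIM(category)       AS category,
       CAST(amount AS REAL) AS amount
FROM raw_sales;

-- DWH: a dimension with a surrogate key and a fact table that references it
CREATE TABLE dim_country (country_key INTEGER PRIMARY KEY, country_code TEXT UNIQUE);
INSERT INTO dim_country (country_code)
SELECT DISTINCT country FROM stg_sales ORDER BY country;

CREATE TABLE fact_sales AS
SELECT s.sale_date, d.country_key, s.category, s.amount
FROM stg_sales s
JOIN dim_country d ON d.country_code = s.country;
""")


def sales_by_country(connection: sqlite3.Connection) -> list:
    """Aggregate the fact table through the country dimension."""
    query = """
        SELECT d.country_code, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_country d USING (country_key)
        GROUP BY d.country_code
        ORDER BY d.country_code
    """
    return connection.execute(query).fetchall()
```

The untidy `' br '` value in the RAW rows collapses into the same `BR` dimension member during the STG step, which is the kind of standardization the middle layer exists for.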

Semantic Modeling & Visualization Layer

  • Power BI Desktop (Import Mode) used to model semantic relationships, measures, hierarchies, and time intelligence.
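  • Because datasets are fully imported, visuals render from Power BI's in-memory copy rather than querying Snowflake live, which keeps report performance decoupled from the warehouse.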

Publishing & Presentation Layer

  • Report deployed to Power BI Service, serving as the centralized visualization endpoint.
  • The visualization is embedded in BookStack (Docker) through secure iframe integration, enabling documentation and dashboard presentation within a unified platform.

This dashboard illustrates the analytical output of the complete end-to-end pipeline, connecting operational simulation, data ingestion, cloud storage, warehouse transformations, semantic modeling, and dashboard delivery within a coherent and continuously running system.

📈 Interactive Dashboard

(If the dashboard does not load immediately, please wait a few seconds while Power BI initializes the visual.)


🚀 Next Steps

The current dashboard represents the first fully functional iteration of the Kurumin Sports analytical pipeline. The next development phase will extend the platform into a more complete, multi-source and multi-format data ecosystem. Upcoming enhancements include:

Multi-Cloud File Ingestion

File-based datasets generated by PyFeeder will be added to the ingestion pipeline. PyFeeder produces synthetic operational data in several common formats (log, xml, orc, json, parquet, txt) and distributes the files across multiple cloud storage providers:

  • AWS S3
  • Google Cloud Storage
  • IBM Cloud Object Storage
  • Azure Blob Storage

Apache NiFi will be expanded to orchestrate ingestion from all buckets, standardizing schemas and routing each dataset to the appropriate RAW ingestion zone in Snowflake.
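One way to picture the planned routing is a function that maps a file's location and extension to a RAW target. NiFi would handle this in the pipeline itself; the URI schemes, the target naming pattern, and the bucket names in the usage note are assumptions, since this part of the platform is not yet built.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to a provider label.
PROVIDERS = {"s3": "AWS_S3", "gs": "GCS", "cos": "IBM_COS", "azure": "AZURE_BLOB"}
# Formats listed for PyFeeder output.
FORMATS = {"log", "xml", "orc", "json", "parquet", "txt"}


def route(uri: str) -> str:
    """Return a RAW target name for a file URI, or raise ValueError."""
    parsed = urlparse(uri)
    provider = PROVIDERS.get(parsed.scheme)
    extension = PurePosixPath(parsed.path).suffix.lstrip(".").lower()
    if provider is None:
        raise ValueError(f"unsupported storage provider: {parsed.scheme!r}")
    if extension not in FORMATS:
        raise ValueError(f"unsupported file format: {extension!r}")
    return f"RAW.{provider}_{extension.upper()}"
```

For example, `route("s3://kurumin-landing/orders/orders.parquet")` yields `RAW.AWS_S3_PARQUET`, while an unknown scheme or extension raises `ValueError` so unexpected files are rejected rather than silently loaded.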

API Integration Layer

Additional external API sources will be incorporated using WireMock to emulate real services. These mocked APIs (payments, taxes, shipping, analytics, and more) will enable the pipeline to ingest REST-based data alongside file and database sources, providing a richer operational simulation.
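A mocked endpoint of this kind is usually described to WireMock as a stub mapping with a `request` matcher and a canned `response`. The sketch below builds one such mapping as JSON; the `/payments/{id}` path and the response fields are hypothetical, since the actual API contracts are not defined yet.

```python
import json


def payment_stub(payment_id: str, amount: float) -> dict:
    """Build a WireMock stub mapping for a hypothetical GET /payments/{id}."""
    return {
        "request": {
            "method": "GET",
            "urlPath": f"/payments/{payment_id}",
        },
        "response": {
            "status": 200,
            "headers": {"Content-Type": "application/json"},
            "jsonBody": {
                "payment_id": payment_id,
                "amount": amount,
                "status": "settled",
            },
        },
    }


# Serialized form, ready to be registered with a WireMock server.
stub_json = json.dumps(payment_stub("p-1001", 149.90), indent=2)
```

The serialized mapping can be registered by sending it in a POST request to the WireMock admin endpoint `/__admin/mappings`, or saved as a file in the server's `mappings` directory.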

Real-Time Streaming Integration

A streaming ingestion layer will be added using Debezium for CDC extraction and Kafka/Redpanda as the event streaming platform. This component will ingest real-time change events from PostgreSQL into Snowflake, complementing the existing batch-oriented ingestion.
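To show what consuming CDC events involves, here is a minimal sketch that applies Debezium-style change events to an in-memory table. It assumes the event payload is already unwrapped to its `op`, `before`, and `after` fields and that rows are keyed by an `id` column; the real sink into Snowflake would look different.

```python
def apply_change(table: dict, event: dict, key: str = "id") -> None:
    """Apply one change event to a dict that maps primary key -> row."""
    op = event["op"]
    if op in ("c", "r", "u"):       # create, snapshot read, update
        row = event["after"]
        table[row[key]] = row
    elif op == "d":                 # delete: only the "before" image is set
        table.pop(event["before"][key], None)
    else:
        raise ValueError(f"unhandled operation: {op!r}")


# Hypothetical events for a sales table.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "amount": 50}},
    {"op": "u", "before": {"id": 1, "amount": 50}, "after": {"id": 1, "amount": 75}},
    {"op": "c", "before": None, "after": {"id": 2, "amount": 20}},
    {"op": "d", "before": {"id": 2, "amount": 20}, "after": None},
]

sales = {}
for change in events:
    apply_change(sales, change)
```

Replaying the four events in order leaves a single row with the updated amount, which mirrors how a CDC consumer keeps a target table in step with the source.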

Secure Secret Management

The pipeline will be migrated gradually to AWS Secrets Manager for credential rotation, connection management, and secure authentication across:

  • NiFi processors and controller services
  • PyFeeder (future integration)
  • SnowSQL batch operations
  • GitHub Actions workflows
  • Power BI to Snowflake connectivity (where applicable)
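A component that reads its credentials from AWS Secrets Manager could follow the pattern sketched below. The function takes the client as a parameter so it can be exercised without AWS access; the secret name and the `username`/`password` keys inside the secret are assumptions about how the secret would be structured.

```python
import json


def load_db_credentials(client, secret_id: str) -> tuple:
    """Fetch a JSON secret and return (username, password).

    `client` is expected to behave like a boto3 Secrets Manager client,
    i.e. to offer get_secret_value(SecretId=...) returning a dict with
    a "SecretString" entry.
    """
    response = client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return secret["username"], secret["password"]
```

In a real job the client would come from `boto3.client("secretsmanager")`, and the call might look like `load_db_credentials(client, "kurumin/postgres")`, where the secret name is hypothetical.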

These enhancements will evolve the Kurumin Sports platform into a comprehensive, multi-modal data ecosystem combining batch, streaming, file ingestion, APIs, and secure cross-cloud integration.