WHAT IS A DATA PLATFORM?

UNLOCKING THE POTENTIAL OF YOUR DATA

Data is often siloed within business applications. The Data Platform unlocks the value of this data by connecting it across different domains. Data truly becomes an “asset,” making the company “data-centric.”

An ecosystem that prioritizes access to rich and trustworthy data.

Data mastery: security, observability, and compliance.

Data as an asset to optimize operations and develop new practices.

Platformization, based on a centralized architecture, is built around workspaces. It moves beyond the logic of individual business “use cases” and empowers everyone to achieve their goals. Data becomes a reliable and available “product.”

The platform ensures data mastery:

But also governs it:

3 DATA LAYERS

The platform acts as a hub: it ingests data at the entry point, transforms it, and propagates it at the output. Traditionally, the data processing workflow has three layers:

  1. Storage, also known as the Bronze layer. A dedicated storage layer, separate from processing, keeps data in its raw format. This approach offers maximum flexibility in ingestion type (streaming or batch) and in scalability. Access to this space is restricted, usually to technical managers.
  2. Processing, or Silver layer. This contains a validated and enriched version of the data with a high level of trust. It is obtained through transformations, joins, aggregations, and formatting, performed with processing-oriented frameworks (e.g., parallel processing). The data can be queried with common languages such as SQL and Python.
  3. Data presentation layer, or Gold layer. Here, the data must be usable for the platform’s intended purposes: analytics, ML, or data as a product. End users interact with the data through their query language or through interfaces and applications.
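
The three layers described above can be illustrated with a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the sample records, the field names, and the function names `to_bronze`, `to_silver`, and `to_gold` are hypothetical, and a real platform would rely on a storage service and a distributed processing framework rather than in-memory lists.

```python
from collections import defaultdict

# Hypothetical raw records, as they might arrive from a source application.
RAW_ORDERS = [
    {"order_id": "1", "customer": " alice ", "amount": "120.50"},
    {"order_id": "2", "customer": "Bob", "amount": "not-a-number"},
    {"order_id": "3", "customer": "alice", "amount": "30"},
]


def to_bronze(records):
    """Bronze: keep every record unchanged, in its raw format."""
    return [dict(r) for r in records]


def to_silver(bronze):
    """Silver: validate and normalize; drop records that fail validation."""
    silver = []
    for r in bronze:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # rejected: the amount is not numeric
        silver.append({
            "order_id": int(r["order_id"]),
            "customer": r["customer"].strip().lower(),
            "amount": amount,
        })
    return silver


def to_gold(silver):
    """Gold: aggregate into a consumption-ready view (total per customer)."""
    totals = defaultdict(float)
    for r in silver:
        totals[r["customer"]] += r["amount"]
    return dict(totals)


bronze = to_bronze(RAW_ORDERS)
silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'alice': 150.5}
```

The invalid record stays in the Bronze layer, so it can be inspected or reprocessed later, but it never reaches the Silver and Gold layers.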

Data Centricity

The Data Platform aims to centralize all of the company’s data: reference, transactional, and behavioral. A number of automated pipelines and processes are implemented to:

Data Warehouse, Machine Learning, and Digital Fronts

A Data Platform is an integrated framework that aims to combine the functionalities of a data lake, data warehouse, and data hub. It addresses analytical needs, Data Science, and operational services.

Our ambition is to implement innovative solutions that address your business imperatives.

Data sharing and marketplace

Once the best version of the data is centralized, sharing it must be designed specifically as a service. This means pairing data consumption tools with strict access governance. A marketplace meets both the need for internal collaboration and the need to publish, or even monetize, data for external stakeholders.

The marketplace can also be seen as one of the outcomes of the Data Mesh logic, allowing data to be presented as products.
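
As a rough illustration of pairing data products with access governance, the sketch below models a tiny marketplace catalog. It is a toy under stated assumptions: the `DataProduct` class, the `CATALOG` entries, the role names, and the `visible_products` function are all hypothetical and do not come from any specific marketplace product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical catalog entry for a data product on the marketplace."""
    name: str
    owner: str                # business domain accountable for the product
    allowed_roles: frozenset  # internal roles permitted to consume it
    external: bool = False    # True if it may be published externally


CATALOG = [
    DataProduct("customer_360", "marketing",
                frozenset({"analyst", "data_scientist"})),
    DataProduct("store_footfall", "retail_ops",
                frozenset({"analyst"}), external=True),
    DataProduct("payroll", "hr", frozenset({"hr_manager"})),
]


def visible_products(role=None):
    """Return the names of the products a consumer may see.

    Internal consumers pass their role. External stakeholders pass no role
    and only see products explicitly flagged as external.
    """
    if role is None:
        return [p.name for p in CATALOG if p.external]
    return [p.name for p in CATALOG if role in p.allowed_roles]


print(visible_products("analyst"))  # ['customer_360', 'store_footfall']
print(visible_products())           # ['store_footfall']
```

Keeping the owner and the access rule on the product itself reflects the idea of data presented as a product: each entry carries the information needed to decide who may consume it.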

Warehouse and Workspace

Should warehouse and workspace be set against each other? Beyond semantic disputes, we can distinguish three different needs:

Machine Learning

Machine Learning (ML) aims to uncover structure in data, such as relationships and segments, and to turn this information into predictive models that serve as the foundation for commercial policies, regulations, etc. ML runs within a storage and services environment that benefits from Cloud Computing platforms and their storage and computation performance.

Running on a Big Data-type infrastructure with provisioned resources, ML is structured around several steps:

Data Mesh

The Data Mesh is not a technology but rather an organization and architecture around data. It combines accountability (business units take ownership of data availability and quality) with management (data producers are given control over their own processes). These two conditions enable data to be transformed into “products.”

However, this organization must be framed within a controlled enterprise architecture to ensure data availability, security, and cost management. The goal is thus to remove the bottleneck of a centralized organization, with IT retaining the role of defining:

AI and Data Governance

AI is known for its contribution to data analysis and modeling, and to data presentation through generative AI. The power of AI also shows in:

Generative AI

The upheavals brought by generative AI do not stem from a new way of thinking about data: it too relies on AI algorithms built on a massive data stack. However, generative AI, through the use of a Large Language Model (LLM), tends to change and simplify the way of working:

The challenge for companies is that generative AI must operate on their internal data. Beyond data preparation, it is crucial to keep that data confidential while using a commercially available LLM. Several options exist to achieve this, notably through various software providers. How this will play out remains to be seen.
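
One possible option, shown here only as an illustrative sketch and not as a recommended or complete solution, is to pseudonymize identifiers before a prompt leaves the company and to restore them in the response. The function names `pseudonymize` and `restore`, the `<EMAIL_n>` token format, and the simplified e-mail pattern are assumptions. The call to the LLM itself is deliberately left out.

```python
import re

# Simplified e-mail pattern, for illustration only. Real-world detection of
# sensitive data needs far broader coverage (names, account numbers, etc.).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize(text):
    """Replace each distinct e-mail address with a stable placeholder token.

    Returns the masked text and the token-to-original mapping, which must
    stay inside the company.
    """
    mapping = {}

    def _replace(match):
        value = match.group(0)
        for token, original in mapping.items():
            if original == value:
                return token  # reuse the token for a repeated address
        token = f"<EMAIL_{len(mapping) + 1}>"
        mapping[token] = value
        return token

    return EMAIL_RE.sub(_replace, text), mapping


def restore(text, mapping):
    """Put the original values back into a text containing tokens."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


prompt = "Summarize the complaint from jane.doe@example.com and cc jane.doe@example.com"
masked, mapping = pseudonymize(prompt)
print(masked)  # Summarize the complaint from <EMAIL_1> and cc <EMAIL_1>
print(restore(masked, mapping) == prompt)  # True
```

Only the masked text would be sent to the external model. The mapping never leaves the company, so the provider does not see the original addresses.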