Since the first computers appeared, data has been their indispensable prerequisite and raison d'être.

Computers process and analyze data in all areas of our daily lives. In the beginning, data was processed to handle customer relationships passively: invoices were created and printed for ordered goods, customer accounts were managed, services were recorded, flight bookings were made, and production workflows were documented. As technology advanced and machines became more powerful, the volume of data grew and its importance changed. Data that was previously used for documentation and administrative purposes is increasingly used for strategic orientation and for planning the further development and expansion of business areas.

In modern IT, data is collected and stored centrally, often in the cloud. This is one of the main tasks of tcVISION: to collect base data and change data from the traditional file and database systems of an IBM mainframe, or from the databases of Linux, Unix, and Windows platforms, and to make them available to one or more target systems.

It is important for companies to have as much data about their customers as possible in order to better understand them and their needs and demands. To achieve this goal, data warehouses have been set up, whose structures are used mainly for analytics and reporting.

In addition, new concepts for data management are emerging: data lakes and data hubs.

Before we look at the differences, it should be noted that both concepts are candidates for breaking down data silos.

Both data lakes and data hubs are designed to make the same data accessible across domains.

This article attempts to highlight the differences between these two concepts.

Data Lake

Wikipedia describes a data lake as follows:

A data lake is a system or repository of data stored in its natural/raw format, usually object blobs or files. A data lake is usually a single store of data including raw copies of source system data, sensor data, social data etc., and transformed data used for tasks such as reporting, visualization, advanced analytics and machine learning.

Data lakes became popular with the success of Hadoop, a system that makes it very easy to send data in its raw state to a central repository for cost-effective storage. Structured as well as unstructured data, such as extracts from relational databases, comma-separated files, XML documents, PDF files, emails, audio, video, or image files, can be stored in a data lake without the need to translate the data. Since all data is captured in a data lake, it serves as a repository for data from all parts of an organization.
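
To make this more concrete, the following minimal Python sketch, assuming a hypothetical S3 bucket named raw-data-lake and local sample files, shows how heterogeneous source files could be landed in an object-store data lake in their raw form; HDFS or Azure Data Lake Storage could be used in the same way.

# Minimal sketch: landing raw source files in an object-store data lake.
# The bucket name and file paths are hypothetical; credentials come from
# the standard AWS configuration.
import boto3
from datetime import date

s3 = boto3.client("s3")
bucket = "raw-data-lake"                      # hypothetical bucket
partition = date.today().isoformat()          # group raw files by load date

# Heterogeneous sources are stored as-is; no schema or translation is applied.
raw_files = [
    ("exports/customers.csv",   f"crm/{partition}/customers.csv"),
    ("exports/orders.xml",      f"erp/{partition}/orders.xml"),
    ("exports/invoice_001.pdf", f"documents/{partition}/invoice_001.pdf"),
    ("exports/callcenter.wav",  f"audio/{partition}/callcenter.wav"),
]

for local_path, key in raw_files:
    s3.upload_file(local_path, bucket, key)   # raw copy; structure is applied only at read time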

However, the data must first be retrieved, and technical skills and tools are required to process it in its original form. Vendors such as Amazon Web Services and Microsoft support data lake architectures.
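
The following sketch illustrates this schema-on-read step, again assuming the hypothetical bucket and object keys from the previous example: structure is applied only when the raw object is retrieved and processed, here with boto3 and pandas.

# Minimal sketch of schema-on-read: the raw CSV object is only given a
# structure when it is retrieved and parsed (bucket and key are hypothetical).
import io

import boto3
import pandas as pd

s3 = boto3.client("s3")
obj = s3.get_object(Bucket="raw-data-lake", Key="crm/2024-01-31/customers.csv")
customers = pd.read_csv(io.BytesIO(obj["Body"].read()))  # schema applied at read time
print(customers.head())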

Compared to other storage solutions, a data lake is a simple way to store data.

Data Hub

A data hub is a collection of data from multiple sources, organized for distribution, sharing, and often subdivision.

In a data hub, the data is homogenized and can be provided in multiple desired formats. The ultimate goal is to unify mission-critical data and make it available to multiple applications. Data integrity is fully preserved.
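
As a simple illustration of this homogenization step, the following Python sketch, with purely hypothetical field names and mapping rules, brings customer records from two different sources into one unified schema and then serves the result in two formats; a real data hub would drive such mappings from metadata and governance rules.

# Minimal sketch of homogenization in a data hub (field names and mapping
# rules are hypothetical).
import csv
import io
import json

def to_canonical(record, source):
    # Map source-specific field names to one unified customer schema.
    if source == "crm":
        return {"customer_id": record["CustNo"], "name": record["Name"], "country": record["Country"]}
    if source == "erp":
        return {"customer_id": record["kunnr"], "name": record["name1"], "country": record["land"]}
    raise ValueError(f"unknown source: {source}")

unified = [
    to_canonical({"CustNo": "4711", "Name": "Acme Corp", "Country": "DE"}, "crm"),
    to_canonical({"kunnr": "0815", "name1": "Globex Ltd", "land": "US"}, "erp"),
]

# The same unified data can be delivered in multiple desired formats.
as_json = json.dumps(unified, indent=2)

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["customer_id", "name", "country"])
writer.writeheader()
writer.writerows(unified)
as_csv = buf.getvalue()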

Data hubs are well suited for integrating multi-structured, changing data. They provide agility in terms of both data ingestion and the rapid delivery of value.

Data hubs are ideally the point of entry for data within an organization. The point-to-point connections previously used between data consumers and data suppliers no longer need to be established.

tcVISION

tcVISION works with data lakes as well as with data hubs.

tcVISION is an extremely powerful and agile replication platform. It acts as the central supplier of data originating from online processing on a mainframe system (CICS, IMS/DB, Adabas/Natural, CA IDMS) as well as transactional data from applications in the distributed environment and various cloud systems. Changes to the datasets on all platforms are captured by tcVISION in real time (change data capture) and replicated to the target systems or transmitted as a data stream.
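
To illustrate the principle of change data capture, and only the principle, the following Python sketch applies a stream of change events to a relational target. The event structure, table, and in-memory SQLite target are hypothetical; this is not tcVISION's interface, which performs the capture and apply work itself.

# Conceptual sketch of applying change-data-capture events to a target.
# This is NOT tcVISION's API: the event format, table, and target are
# placeholders for illustration only.
import sqlite3

events = [
    {"op": "INSERT", "table": "customers", "row": {"id": 1, "name": "Acme Corp"}},
    {"op": "UPDATE", "table": "customers", "row": {"id": 1, "name": "Acme Inc"}},
    {"op": "DELETE", "table": "customers", "row": {"id": 1}},
]

conn = sqlite3.connect(":memory:")  # placeholder target database
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

for ev in events:
    row = ev["row"]
    if ev["op"] == "INSERT":
        conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)", (row["id"], row["name"]))
    elif ev["op"] == "UPDATE":
        conn.execute("UPDATE customers SET name = ? WHERE id = ?", (row["name"], row["id"]))
    elif ev["op"] == "DELETE":
        conn.execute("DELETE FROM customers WHERE id = ?", (row["id"],))
conn.commit()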

The tcVISION solution is ideally suited to synchronizing data held on the traditional mainframe, across all IBM Z operating systems such as z/OS and z/VSE as well as Linux on Z, with a variety of database systems in distributed environments, Big Data environments, or cloud systems.

An overview of all supported input and output targets can be found here.