Data Lake

A data lake is an approach to collecting and managing data of any form in a single repository: raw and unstructured, semi-structured, or fully structured.

The term was coined in 2010 by Pentaho founder James Dixon, who explained the concept by contrasting a data lake with a data mart. A data mart is like bottled water: purified and packaged. A data lake is an open body of water fed from various sources. You can dive into it, or you can take samples from the surface.

Data lakes are well suited to collecting, storing and processing large streams of information that arrive continuously. Data entering the lake is tagged with metadata such as time of arrival, source, format and structure.
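To illustrate how metadata can be attached on arrival, here is a minimal sketch in Python using only the standard library. It is not a reference implementation of any particular data lake product. The function name `ingest`, the directory layout, and the `.meta.json` sidecar convention are all assumptions made for the example.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def ingest(lake_root: Path, payload: bytes, source: str, fmt: str) -> dict:
    """Store a raw payload unchanged and write a metadata sidecar next to it."""
    arrived = datetime.now(timezone.utc)
    digest = hashlib.sha256(payload).hexdigest()

    # Hypothetical layout: <root>/<source>/<YYYY-MM-DD>/<digest prefix>.<fmt>
    target_dir = lake_root / source / arrived.strftime("%Y-%m-%d")
    target_dir.mkdir(parents=True, exist_ok=True)
    data_path = target_dir / f"{digest[:16]}.{fmt}"
    data_path.write_bytes(payload)  # the data itself is stored as-is

    # Metadata recorded at arrival: time, source, format, plus size and checksum.
    metadata = {
        "arrived_at": arrived.isoformat(),
        "source": source,
        "format": fmt,
        "size_bytes": len(payload),
        "sha256": digest,
        "path": str(data_path),
    }
    meta_path = data_path.with_name(data_path.name + ".meta.json")
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return metadata


if __name__ == "__main__":
    # Demo against a temporary directory standing in for the lake.
    with tempfile.TemporaryDirectory() as tmp:
        meta = ingest(Path(tmp), b'{"sensor": 7, "temp": 21.5}', "iot-gateway", "json")
        print(meta["source"], meta["format"], meta["arrived_at"])
```

The payload is written without transformation, and everything known about it at arrival goes into the sidecar file. In a real deployment this metadata would more likely live in a catalog service, but the principle is the same.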

Traditional data warehouses for analytics and decision support have been in use for over 30 years. Data lakes, by contrast, are typically built on open source and other freely available technologies, which helps reduce the cost of collecting and processing data.