Did you know that there are more connected objects than human beings in the world? Recent estimates put the figures at about 8.3 billion things and 7.5 billion people, with an expected average annual growth rate of 12% that will take connected objects past 125 billion by 2030.
This astonishing spread is shifting the attention of digital transformation processes towards data, as all connected devices generate an incredible quantity of events and information. About 2.5 quintillion bytes of data are generated every day, and they need to be stored and somehow managed. Of course, not all of these data are equally important for businesses, and not all of them add value to products and processes. Smarter data management is therefore important for an organisation to distinguish between rich and irrelevant information, and to inject worthy data into the workflows and teams that can take advantage of them.
The real turning point is the ability to become data-centric: this means investing in architectures where data are the primary and permanent assets, and applications are built around them. In such architectures, the data model is defined before any software application is implemented and remains valid long after that application is retired.
Data-centricity might sound easy, but don’t fall into that trap. When a company develops or buys a piece of software, that software often comes with its own data model, which can be hard to change or adapt. It can be even harder to integrate it with the other existing, customised business applications the organisation relies on.
Traditional data warehouse systems are based on ETL (extract, transform, and load) routines that scrutinise, normalise and conform data to a predesigned schema. However, this takes considerable time and resources, so it is unlikely to fit Big Data and advanced analytics needs.
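To make the three ETL steps concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a production pipeline: the sample CSV, the column names and the `readings` table are all invented for the example, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: column names and values are invented for illustration.
RAW_CSV = """device_id,temp,unit
 A1 ,21.5,C
b2,70.7,F
c3,,C
"""

def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalise identifiers and units, drop rows that cannot conform."""
    clean = []
    for row in rows:
        if not row["temp"]:
            continue  # a missing reading cannot satisfy the NOT NULL schema below
        value = float(row["temp"])
        if row["unit"].strip().upper() == "F":
            value = (value - 32) * 5 / 9  # convert Fahrenheit to Celsius
        clean.append((row["device_id"].strip().upper(), round(value, 1)))
    return clean

def load(conn, records):
    """Load: write conformed records into the predesigned table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(device_id TEXT NOT NULL, temp_c REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
load(conn, transform(extract(RAW_CSV)))
result = conn.execute(
    "SELECT device_id, temp_c FROM readings ORDER BY device_id"
).fetchall()
print(result)
```

Every row has to pass through `transform` before it can be stored, and the row with no reading is discarded for good. That up-front effort, repeated for every source, is where the time and resources go.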
That’s probably why data lakes have become so popular. As information management systems or repositories, they allow organisations to store and transform data from relational databases, semi-structured sources (CSV, logs, XML, JSON), and even unstructured ones (emails, documents, images, audio, video), making them available for a variety of processes including visualisation and reporting, analytics and machine learning.
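The idea can be sketched in a few lines of Python. This is only a toy illustration: a temporary directory stands in for the lake's storage, the source names, file names and payloads are hypothetical, and the "interpret the data only when someone reads it" pattern shown here (often called schema-on-read) is one common way of working with a lake rather than the only one.

```python
import json
import tempfile
from pathlib import Path

lake = Path(tempfile.mkdtemp())  # stand-in for the lake's storage layer

def ingest(source, name, payload):
    """Store the payload untouched, partitioned by source system."""
    target = lake / source / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target

# Semi-structured, tabular and unstructured data land side by side, as-is.
ingest("sensors", "2024-01-01.json",
       b'{"device_id": "A1", "temp_c": 21.5}\n{"device_id": "B2", "temp_c": 19.0}\n')
ingest("crm", "customers.csv", b"id,name\n1,Acme\n")
ingest("support", "ticket-42.eml",
       b"Subject: Sensor offline\n\nDevice A1 stopped reporting.")

def read_sensor_temps():
    """Interpret the raw JSON lines only when a consumer actually needs them."""
    temps = {}
    for path in sorted((lake / "sensors").glob("*.json")):
        for line in path.read_text().splitlines():
            record = json.loads(line)
            temps[record["device_id"]] = record["temp_c"]
    return temps

temps = read_sensor_temps()
stored = sorted(p.relative_to(lake).as_posix() for p in lake.rglob("*") if p.is_file())
print(temps)
print(stored)
```

Nothing is rejected or reshaped at ingestion time, which makes loading cheap. The cost moves to the readers, each of which has to know how to interpret the files it consumes.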
Data lakes represent great progress, but a company needs to beware of the proliferation of data sources, which can clog the mechanism with inconsistency and inefficiency. Further benefits can be achieved by turning to a truly data-centric approach. In this case, a single data model is established and shared with all current and future applications, enabling consistent analysis and correlation processes that speed up decisions and make them more accurate. That’s the foundation of what a data-driven enterprise should be.
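As a rough sketch of what a shared data model can look like in code, the Python example below defines one `Reading` type and lets two small "applications" consume it. All names here are hypothetical: the `Reading` fields, the vendor payload format handled by `from_vendor_a`, and the reporting and alerting functions are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Reading:
    """Shared model: defined once, independently of any single application."""
    device_id: str
    temp_c: float
    taken_at: datetime

def from_vendor_a(raw):
    """Adapter for a hypothetical vendor payload (tenths of a degree, epoch seconds)."""
    return Reading(
        device_id=raw["id"].upper(),
        temp_c=raw["temp_decicelsius"] / 10,
        taken_at=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
    )

def average_temp(readings):
    """Reporting application: relies only on the shared model."""
    return sum(r.temp_c for r in readings) / len(readings)

def overheated(readings, limit_c=30.0):
    """Alerting application: relies on the same model, with no extra mapping."""
    return [r.device_id for r in readings if r.temp_c > limit_c]

readings = [
    Reading("A1", 21.5, datetime(2024, 1, 1, tzinfo=timezone.utc)),
    from_vendor_a({"id": "b2", "temp_decicelsius": 350, "ts": 1704067200}),
]
print(average_temp(readings))
print(overheated(readings))
```

The vendor-specific format is translated once, at the boundary. A new application added later would reuse `Reading` as it is, instead of bringing yet another data model of its own.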
Photo credit: Designed by rawpixel.com / Freepik