When IT experts first theorized data warehousing about 30 years ago, they probably could not have imagined that businesses would need to manage hundreds or thousands of data sources. But that’s the scenario we face today, in the age of the Internet of Things, when data is generated with such complexity, heterogeneity and speed that traditional systems can hardly keep up.

ETL (extract, transform and load) tools were useful for increasing productivity and accelerating data integration without requiring custom coding. However, they are limited in the number and origin of the data sources they can handle, and they do not provide the features and advanced analytics most companies are now used to. Today we have to deal with multiple data stores, dispersed intermediate files and contents, and hybrid cloud and on-premises infrastructures, so it’s increasingly difficult for IT to live up to business requirements.

Data management solutions should be able to reduce complexity, provide prompt and secure access to any kind of content, allow integration and collaboration, and, not surprisingly, keep costs and risks under control. User interface and analytics are the priorities to discuss when choosing a software product. Company users expect data sources to be accessible anytime and anywhere, from any device, with a seamless experience and an intuitive, effortless interface. A recent conversation with one of our customers in the banking industry confirmed that this was the first request raised when the IT department introduced the topic to business colleagues. Robust data profiling and mapping capabilities are also taken for granted, since contents should be easy to search and retrieve.

Data management solutions should feature advanced analytics that go far beyond queries and reporting: they range from predictive model generation to deep learning, from natural language processing to graph analysis and streaming analytics.

To govern data in such an elaborate environment, precise policies and rules need to be defined. Data tags are used to identify and classify contents, linking them to specific rules that automate recognition and routine management processes, including versioning and storage, user authorization and export. Artificial Intelligence (AI) comes into play here, as it makes it possible to eliminate almost all manual intervention and achieve higher accuracy at a lower cost.
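As a rough illustration of how tags can be linked to rules, here is a minimal Python sketch. Everything in it is hypothetical and chosen only for the example: the tag names, the roles, the retention figures and the choice to combine tags by taking the most restrictive setting are assumptions, not a description of any specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Rules attached to a tag (all fields are illustrative assumptions)."""
    retention_days: int        # how long the content is stored
    keep_versions: int         # how many versions are retained
    allowed_roles: frozenset   # roles authorized to access the content
    exportable: bool           # whether the content may leave the platform


# Hypothetical tag-to-policy table; in practice this would live in a catalog.
POLICIES = {
    "public": Policy(365, 1, frozenset({"analyst", "manager", "auditor"}), True),
    "financial": Policy(3650, 10, frozenset({"manager", "auditor"}), False),
    "personal": Policy(730, 3, frozenset({"auditor"}), False),
}


def effective_policy(tags):
    """Combine the policies of all known tags, keeping the strictest setting."""
    matched = [POLICIES[t] for t in tags if t in POLICIES]
    if not matched:
        raise ValueError("content carries no known tag")
    return Policy(
        retention_days=max(p.retention_days for p in matched),
        keep_versions=max(p.keep_versions for p in matched),
        # Only roles allowed by every tag keep access.
        allowed_roles=frozenset.intersection(*(p.allowed_roles for p in matched)),
        # Export is allowed only if every tag allows it.
        exportable=all(p.exportable for p in matched),
    )


def can_export(tags, role):
    """Return True if a user with this role may export content with these tags."""
    policy = effective_policy(tags)
    return policy.exportable and role in policy.allowed_roles
```

With this table, `can_export(["public"], "analyst")` is allowed, while a document tagged both `public` and `financial` cannot be exported by anyone, because the stricter tag wins. An AI-based classifier would sit in front of such a table and assign the tags automatically instead of relying on manual labeling.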

That’s why AI comes up more and more often in discussions about smart data management. By adopting this approach in increasingly multi-dimensional architectures, we can keep our data warehouse agile and supportive of business operations.

Author: Sabis Chu, IT Technology Evangelist at KRIU