– “What color do you want that database?”
– “I think mauve has the most RAM.”
(Dilbert and the Pointy-haired Boss)
Why this post?
For many years I treated data and databases as inseparable parts of the IT kingdom, which at that time was my kingdom. I remember how we SQL programmers laughed at Dilbert’s immortal episode about database colors (see the Dilbert comic strip of November 17, 1995). After all, what could business know about databases? They are difficult, they require qualifications, and you need to know how to work with them. And data? It’s typically dirty, distributed, and hard to understand and analyze.
However, when I started working on data analysis projects, my views changed thoroughly. I began to think about “the why of databases”: why businesses actually need them, what their best use cases are, and what requirements they have to meet in order to be useful for different industries. I also started to see “the bigger picture”. I realized that databases are just one piece of a puzzle called the data platform. And, most of all, I learned that in real life the true value of data can be obtained only if there is someone who can make use of the insights from that data.
This post is my modest vision of what the data platform of the future will look like. I think we are not there yet, but this is the direction in which I see the world of data heading.
- The data platform of the future will put business outcomes and business users first
- It will require little or no maintenance and hide the complexity using automation
  - Automated administration – re-indexing, statistics, partitioning, and so forth
  - Automated performance – indexing, query fixing, execution plan optimization (see the next point – who will write queries in the future? ;-))
  - Automated scalability – each layer should scale independently in response to changing workloads, whether up/down or out/in, while keeping the cost of the underlying resources optimized (and all the scaling will remain transparent to the end user)
- It will require minimal or no coding skills – users will communicate with the system and work with data through a natural-language UI (text, speech, gestures). The same applies to analytics (including advanced analytics) and visualizations – no need to be an expert in dashboarding, reporting or machine learning
- The system will learn the patterns followed by the users and optimize the way they interact with data (also suggesting important things they may have missed)
- There will be no need to know the schema or location of data – NLP-based mechanisms (e.g. chat and voice bots) will guide users through the available data entities and suggest appropriate pieces of information to explore, and internal replication mechanisms will bring data closer to the user to optimize the user experience (users won’t need to know where the data is in the system)
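To make the “automated scalability” point above a little more concrete, here is a minimal sketch of how a single layer could decide its own size from its current load. Everything in it is an assumption made for illustration: the `LayerLoad` type, the `target_nodes` function and the utilization thresholds are hypothetical and do not describe any existing platform.

```python
import math
from dataclasses import dataclass


@dataclass
class LayerLoad:
    """Hypothetical snapshot of one platform layer (e.g. compute or ingestion)."""
    name: str
    nodes: int
    utilization: float  # average utilization across nodes, 0.0 to 1.0


def target_nodes(layer, low=0.3, high=0.75, min_nodes=1, max_nodes=64):
    """Return the node count that moves utilization back into the [low, high] band.

    The thresholds and limits are illustrative defaults, not recommendations.
    """
    if low <= layer.utilization <= high:
        return layer.nodes  # already inside the comfort band, leave it alone
    # Aim for the middle of the band so the layer does not flap around an edge.
    target_utilization = (low + high) / 2
    wanted = math.ceil(layer.nodes * layer.utilization / target_utilization)
    # Clamp to the allowed range to keep both availability and cost under control.
    return max(min_nodes, min(max_nodes, wanted))


# Each layer is evaluated on its own, so one can grow while another shrinks.
layers = [
    LayerLoad("compute", nodes=4, utilization=0.9),
    LayerLoad("storage", nodes=4, utilization=0.5),
    LayerLoad("ingestion", nodes=10, utilization=0.1),
]
plan = {layer.name: target_nodes(layer) for layer in layers}
```

In this sketch the compute layer scales out, the ingestion layer scales in and the storage layer stays as it is, which is the kind of independent, cost-aware behavior the end user should never have to think about.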
Any data, any workload, any terms
- Any volume and format of data will be accepted and automatically recognized (parsed? analyzed?) by the platform
- Every data ingestion model from any source will be possible, including no ingestion – querying external data without copying
- The process of ingestion and transformation of external data will be automatically generated depending on the data source, its content and relationships with data already existing in the platform
- Both transactional and analytical workloads (e.g. built-in ML features) will be mixed together (HTAP?)
- Any number of users asking any business questions will be acceptable (see the one about automatic scalability above) and will not lead to performance problems
- The platform will be available for use in the cloud, on-premises or at the edge (containers or their next generation?)
Secure and compliant
- Zero trust by default – before you see the data you will have to prove you’re authorized, that your device is authorized and healthy, and that you meet all the policies
- The system will perform intelligent and automatic threat detection (and prevention!)
- All sensitive data will be automatically labeled and protected from unauthorized access and copying
- Only strong authentication will be allowed (a mix of MFA, HSM, SSO and their successors)
- Authorization will be easy to set up and maintain – no complex RBAC
- The platform will allow secure data sharing/exchange (for both B2B and B2C models) to enable new business models built on data
- The platform will be able to meet the requirements of all regulations, and there will be a common protocol for communicating new requirements arising from new regulations and policies
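The “zero trust by default” idea from the list above can be sketched as a deny-by-default check: access is granted only when the user, the device and every policy pass. This is only an illustration under my own assumptions; `AccessRequest`, `evaluate` and the field names are hypothetical, and a real platform would pull these signals from identity, device management and policy services.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical signals collected before any data is shown."""
    user_authorized: bool
    device_authorized: bool
    device_healthy: bool
    failed_policies: tuple = ()  # names of policies the request does not meet


def evaluate(request):
    """Return (granted, reasons). Access is denied unless every check passes."""
    reasons = []
    if not request.user_authorized:
        reasons.append("user is not authorized")
    if not request.device_authorized:
        reasons.append("device is not authorized")
    if not request.device_healthy:
        reasons.append("device is not healthy")
    for policy in request.failed_policies:
        reasons.append(f"policy not met: {policy}")
    # No reasons collected means every check passed.
    return (not reasons, reasons)


granted, reasons = evaluate(
    AccessRequest(
        user_authorized=True,
        device_authorized=True,
        device_healthy=False,
        failed_policies=("data-residency",),
    )
)
```

Here the request is rejected even though the user is authorized, because the device is unhealthy and one policy is not met. Returning the reasons alongside the decision is a design choice that keeps the denial explainable to the user.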
I’m pretty sure all the above characteristics sound familiar, and there are platforms on the market (including those offered by Microsoft, my employer) that partially fit my vision. In my opinion, that’s the key: strong competition in the Data & AI market drives the constant development of the leading platforms (and from time to time a new, innovative one is born, speeding up the market by filling some major gaps). And yes, it’s very likely we will never get to the point where the above vision becomes reality. But at least I would love to see the world of data head in this direction.
Now it’s your turn. Use your imagination. If anything were possible and the sky were the limit, what would you like to have in your dream data platform?