Data science is much like fine cuisine. Anyone can make a meal, but a good chef consistently creates culinary masterpieces, and to do that a good chef needs a kitchen capable of producing an entree of enduring value. It is the same with data science: the data platform is the kitchen where a data scientist goes to work.
Data science software is undergoing a transformation prompted by the accelerating development of the underlying cloud technology. Each of the largest cloud providers now offers native database platforms. These platforms boast high-speed storage and compute capabilities that are more than sufficient to provide scalable, economically practical capacity for large-scale advanced analytics and data science practices. In addition, multi-cloud database platforms are emerging with these same capabilities, increasing the available vendor options. These data platform engines now provide native support for XML and JSON data storage, a must-have for any modern integration platform. Because data science requires bringing together data from many disparate systems and platforms, a strong database foundation at the center of the technology platform is critically important.
Using these native cloud platforms, data engineers can focus on data definition, integration, and transformation. Even small data shops can create full-stack advanced analytics practices with minimal investment in staff while still reaping the benefits of laughably cheap cloud storage. Large shops can cut the cost of their advanced analytics platform to a fraction of their current investment. All data teams can deliver value to their organizations faster. Those who invest deeply in these modern platforms will reap the largest benefits.
Databases are not the only cloud platforms that have dramatically increased in scale and function over the past few years. Native cloud queuing and messaging platforms have also matured significantly and are now capable of handling the volume and speed requirements of top data programs.