The Data Standard
In this episode of The Data Standard, Catherine Tao and Vinoo Ganash talk about large-scale data and the challenges of processing it. Vinoo starts the conversation by explaining his current responsibilities and how his company uses data to find working solutions for a wide range of problems.
Then he talks about OLTP and OLAP models and how large-scale data can help improve workflows and deliver better results. Every application needs its own optimization, and Vinoo talks about the methods he uses to enhance existing platforms. Even when newly developed systems show positive results, the work is never done, because optimization is a constant, dynamic process.
He then goes over the techniques used to extract useful data. The distribution of the data and its data types have the most significant impact on data quality. Vinoo talks about the challenges of working with data, where even a simple data movement can become a massive problem. Constant profiling is needed to help scale the data and make sure that the computing power can cope.
Finally, the guest talks about handling messy data that falls short of the required quality. He covers the multiple problems data scientists have to consider when sorting through messy data to make it more useful.