How do you get started when you have large datasets on hand?
Every day, so much data arrives from databases, Excel files, and image files. How can we collect it and aggregate it into something useful? Chow Wai Hei, a solutions architect at Amazon Web Services (AWS), shares some experience and insights!
The first step is always data integration, the process of combining data from multiple sources into a single, consistent data store. Second, work out the relationships between the various data sources. Look for correlations, and expect to clean the data to remove anything irrelevant. Next, carry out ETL (Extract, Transform, Load) to enrich, combine, and normalize the data. Finally, you can put the processed data to good use in machine learning.
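The steps above can be sketched in a few dozen lines of Python using only the standard library. This is a minimal illustration under stated assumptions, not the workflow Chow Wai Hei uses in practice: the two sources (an `orders` table standing in for a database and a CSV string standing in for an Excel export) and all column names (`customer_id`, `total`, `visits`, `notes`) are hypothetical. Pearson correlation and min-max scaling are chosen here as simple examples of "seeking the correlation" and "normalizing"; a real pipeline might use different measures or managed services.

```python
import csv
import io
import math
import sqlite3

# Hypothetical source 1: a database table (an in-memory SQLite table stands in here).
def extract_from_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 120.0), (2, 80.0), (3, 200.0), (4, None)],  # customer 4 has a missing total
    )
    rows = conn.execute("SELECT customer_id, total FROM orders").fetchall()
    conn.close()
    return [{"customer_id": cid, "total": total} for cid, total in rows]

# Hypothetical source 2: an Excel sheet exported to CSV.
SHEET_CSV = """customer_id,visits,notes
1,12,vip
2,6,
3,20,follow up
4,5,
"""

def extract_from_csv(text):
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"customer_id": int(r["customer_id"]), "visits": int(r["visits"]), "notes": r["notes"]}
        for r in reader
    ]

# Step 1, integration: merge both sources into one record per customer_id.
def integrate(db_rows, sheet_rows):
    by_id = {r["customer_id"]: dict(r) for r in db_rows}
    for r in sheet_rows:
        by_id.setdefault(r["customer_id"], {"customer_id": r["customer_id"]}).update(r)
    return [by_id[k] for k in sorted(by_id)]

# Cleaning: drop the free-text column (assumed irrelevant) and rows with missing values.
def clean(records, keep=("customer_id", "total", "visits")):
    cleaned = []
    for r in records:
        row = {k: r.get(k) for k in keep}
        if all(v is not None for v in row.values()):
            cleaned.append(row)
    return cleaned

# Step 2, relationships: Pearson correlation between two numeric columns.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# ETL "transform": min-max scale each column to [0, 1].
def normalize(records, cols):
    out = [dict(r) for r in records]
    for c in cols:
        lo = min(r[c] for r in records)
        hi = max(r[c] for r in records)
        span = (hi - lo) or 1.0  # avoid division by zero on constant columns
        for r in out:
            r[c] = (r[c] - lo) / span
    return out

# ETL "load": write the features into a single, consistent store.
def load(records):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE features (customer_id INTEGER PRIMARY KEY, total REAL, visits REAL)")
    conn.executemany("INSERT INTO features VALUES (:customer_id, :total, :visits)", records)
    conn.commit()
    return conn

def run_pipeline():
    merged = integrate(extract_from_db(), extract_from_csv(SHEET_CSV))
    cleaned = clean(merged)
    r = pearson([row["total"] for row in cleaned], [row["visits"] for row in cleaned])
    store = load(normalize(cleaned, ["total", "visits"]))
    return r, store

if __name__ == "__main__":
    r, store = run_pipeline()
    print(f"correlation(total, visits) = {r:.3f}")
    for row in store.execute("SELECT * FROM features ORDER BY customer_id"):
        print(row)
```

Customer 4 is dropped during cleaning because its database total is missing, so only three rows reach the feature table. The scaled `total` and `visits` columns are the kind of input a machine learning model could then consume.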
Chow Wai Hei
Amazon Web Services (AWS)