Data Processing


We support data processing and manipulation tasks on huge datasets (10M+ rows). Our support focuses mainly on three areas: Content Extraction, Data Cleansing and Organization, and Data Formatting. And we can do even more...
When a dataset grows beyond 1M rows, Excel and other traditional data processing approaches may not work well.
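As a rough illustration of why row count matters, the sketch below streams a CSV one row at a time instead of loading the whole sheet. It is only a minimal example under assumptions: the function name `total_by_key` and the `region` and `amount` columns are hypothetical, and this is not a description of our actual pipeline.

```python
import csv
import io
from collections import defaultdict


def total_by_key(lines, key_col, value_col):
    """Sum value_col per distinct key_col, reading one row at a time.

    Memory use depends on the number of distinct keys, not on the
    number of rows, so the same loop works for very long files.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        totals[row[key_col]] += float(row[value_col])
    return dict(totals)


# Small in-memory stand-in; in practice an open file handle is passed instead.
demo = io.StringIO("region,amount\nasia,10.5\neurope,4\nasia,2.5\n")
totals = total_by_key(demo, "region", "amount")
print(totals)
```

The same function accepts any iterable of text lines, so `open("big.csv", newline="")` (a hypothetical file name) could replace the in-memory demo.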

Content Extraction


  • Web Data. We extract the content you are interested in from any website, and the output can be further formatted and cleansed as well. For example, given the Marina Bay Sands Singapore page on Tripadvisor, we can extract User Reviews, User Ratings, Hotel Rankings, Hotel Location...
    Another example is the popular game "PLAYERUNKNOWN'S BATTLEGROUNDS" on Steam, for which we can extract Release Date, Developer, Publisher, ABOUT THIS GAME, User Reviews... For more details, please see the data samples here: Web Crawling.
  • Textual Data. Given a set of text files, we help clean, format and extract useful content such as dates, prices, company names and email addresses.
  • Excel Data. When a worksheet is very large, traditional formula-based approaches in Excel may not work properly. In this case, we can handle it effectively and efficiently with our carefully prepared algorithms.
  • PDF Data. PDF is always a nightmare for content extraction. Suppose you are facing a stack of financial reports and are interested in only part of their contents, such as stock, budget or accounting information. Instead of locating, copying and pasting manually, we can help do it automatically!
    However, we cannot guarantee 100% accuracy, since some PDFs are scanned images or encrypted.
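To make the Textual Data case above more concrete, here is a minimal sketch of pulling dates, prices and email addresses out of free text with regular expressions. The patterns and the `extract_fields` helper are assumptions made for illustration only; real documents usually need stricter, locale-aware rules.

```python
import re

# Hypothetical, deliberately simple patterns (assumptions, not production rules).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")           # ISO dates only
PRICE_RE = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?")   # US-dollar amounts only


def extract_fields(text):
    """Return every email, ISO date and dollar price found in text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "dates": DATE_RE.findall(text),
        "prices": PRICE_RE.findall(text),
    }


sample = "Invoice dated 2018-03-15 for $1,250.00. Contact billing@example.com."
result = extract_fields(sample)
print(result)
```

Company names are left out on purpose: unlike emails or dates they follow no fixed pattern, so they typically need a dictionary or a named-entity model rather than a regular expression.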

Data Cleansing and Organization


Data Cleansing is the process of altering data in a given storage resource to make sure that it is accurate and correct. Here, we help remove redundancy and reduce dependencies in your data. Data Organization restructures data according to the entity-relationship model, which is the most commonly used model for relational databases.
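The following sketch shows one possible reading of these two steps on a tiny example: values are normalized, duplicate rows are dropped, and a flat list is split into a customer entity and an order relation. The field names (`email`, `item`) and the `clean_and_split` function are hypothetical and chosen only for illustration.

```python
def clean_and_split(rows):
    """Normalize values, drop duplicates, and split flat rows into two tables."""
    customers = {}   # normalized email -> surrogate customer id
    orders = []      # (customer_id, item) pairs with duplicates removed
    seen = set()
    for row in rows:
        email = row["email"].strip().lower()   # cleansing: trim and lowercase
        item = row["item"].strip()
        # Reuse the existing id for a known customer, otherwise assign the next one.
        customer_id = customers.setdefault(email, len(customers) + 1)
        if (customer_id, item) not in seen:    # redundancy removal
            seen.add((customer_id, item))
            orders.append((customer_id, item))
    return customers, orders


raw = [
    {"email": "Ann@Example.com ", "item": "book"},
    {"email": "ann@example.com", "item": "book"},
    {"email": "bob@example.com", "item": "pen"},
]
customers, orders = clean_and_split(raw)
print(customers)
print(orders)
```

Storing each customer once and referring to it by id is what keeps a later change, such as a corrected email address, from having to be repeated on every order row.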

Data Formatting


Data Formatting converts a dataset into a ready-to-use form. For example, we can transform CSV into XML or JSON format.
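A conversion of this kind can be sketched with the Python standard library alone. The example below is a minimal illustration, assuming a small CSV whose column names are valid XML tag names; the `rows` and `row` element names and the helper function names are arbitrary choices, not a fixed output schema.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def csv_to_records(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def records_to_json(records):
    """Serialize the records as a JSON array of objects."""
    return json.dumps(records, indent=2)


def records_to_xml(records, root_tag="rows", row_tag="row"):
    """Serialize the records as XML, one child element per column.

    Assumes every column name is a valid XML tag name.
    """
    root = ET.Element(root_tag)
    for record in records:
        row = ET.SubElement(root, row_tag)
        for name, value in record.items():
            ET.SubElement(row, name).text = value
    return ET.tostring(root, encoding="unicode")


csv_text = "name,price\npen,1.20\n"
records = csv_to_records(csv_text)
json_text = records_to_json(records)
xml_text = records_to_xml(records)
print(json_text)
print(xml_text)
```

All values stay as strings here, because CSV carries no type information; converting `price` to a number would be a separate, explicit step.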