Data Processing

We support data manipulation on very large datasets (more than 10 million records). Our work focuses on three areas: Content Extraction, Data Cleansing and Organization, and Data Formatting. We can take on other data tasks as well.
Once a dataset exceeds about 1 million records, Excel and other traditional data processing approaches may not work well; a single Excel worksheet, for example, holds at most 1,048,576 rows.

Content Extraction

  • Web: We extract specific content from web pages. For example, from the Tripadvisor page for Marina Bay Sands Singapore, we can accurately extract user reviews, user ratings, hotel rankings, hotel location, and more.
    Likewise, from the Steam page for the popular game "PLAYERUNKNOWN'S BATTLEGROUNDS", we can accurately extract the release date, developer, publisher, the ABOUT THIS GAME description, user reviews, and more. For more details, please see the samples: Web Crawling
  • Free Text: From a batch of text files, we help extract useful content such as dates, prices, company names, and email addresses.
  • Excel: When a spreadsheet is very large, the formula-based extraction approaches built into Excel may not work properly. We handle such spreadsheets effectively and efficiently with our own techniques.
  • PDF: PDF is always a nightmare for content extraction. Faced with a batch of financial reports, you would usually copy and paste specific content such as stock indices, budgets, or accounting information into a spreadsheet by hand and then process it. We can do this automatically with our techniques.
    However, we cannot guarantee 100% accuracy, since some PDFs consist of scanned images or are encrypted.
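
To illustrate the Free Text case, here is a minimal sketch of pattern-based extraction using Python's standard `re` module. It is not our production pipeline: the three patterns, the `extract_fields` helper, and the sample sentence are assumptions made for this example, and real documents usually need broader rules (other date formats, other currencies, company-name detection).

```python
import re

# Illustrative patterns only (assumptions for this sketch).
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")          # ISO dates such as 2018-03-15
PRICE_RE = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?")  # US-style prices such as $1,299.00
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")  # simple email addresses


def extract_fields(text):
    """Return every date, price and email address found in `text`."""
    return {
        "dates": DATE_RE.findall(text),
        "prices": PRICE_RE.findall(text),
        "emails": EMAIL_RE.findall(text),
    }


# Hypothetical sample text.
sample = "Invoice dated 2018-03-15: total $1,299.00. Contact sales@example.com."
print(extract_fields(sample))
```

The same function can be applied to each file in a batch and the results collected into one table.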

Data Cleansing and Organization

Data Cleansing is the process of altering data in a given storage resource to make sure that it is accurate and correct. Here, we help remove redundancy and reduce dependencies in your data. Data Organization restructures data according to the entity-relationship model, the model most commonly used for designing relational databases.
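
As a rough sketch of what cleansing and organization can look like, the example below cleans a flat list of order rows (trimming whitespace and lowercasing emails), drops duplicates, and splits the result into two related tables. The field names `email` and `item`, the `normalize` helper, and the sample rows are all hypothetical; an actual schema depends on your data.

```python
def normalize(rows):
    """Split flat order rows into a customers table and an orders table."""
    customers = {}  # cleaned email -> customer_id
    orders = []
    seen = set()    # (customer_id, item) pairs already emitted
    for row in rows:
        # Cleansing: trim whitespace and lowercase the email.
        email = row["email"].strip().lower()
        item = row["item"].strip()
        # Organization: each customer is stored once and referenced by id.
        customer_id = customers.setdefault(email, len(customers) + 1)
        if (customer_id, item) in seen:
            continue  # skip redundant rows
        seen.add((customer_id, item))
        orders.append({"customer_id": customer_id, "item": item})
    customer_table = [
        {"customer_id": cid, "email": email} for email, cid in customers.items()
    ]
    return customer_table, orders


# Hypothetical input with inconsistent formatting and one duplicate.
rows = [
    {"email": " Ann@Example.com", "item": "laptop"},
    {"email": "ann@example.com ", "item": "laptop"},
    {"email": "bob@example.com", "item": "mouse"},
    {"email": "ann@example.com", "item": "mouse"},
]
customer_table, orders = normalize(rows)
print(customer_table)
print(orders)
```

Storing each customer once and linking orders by `customer_id` is what removes the redundancy: a changed email then needs to be updated in only one place.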

Data Formatting

Data Formatting turns a dataset into a ready-to-use one, for example by transforming free-text files into a CSV file.
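
The free-text-to-CSV example can be sketched with Python's standard `csv` module. The input layout assumed here (records separated by blank lines, one `Key: value` pair per line), the `text_to_csv` helper, and the sample records are assumptions for illustration; real free text rarely arrives this tidy and usually needs an extraction step first.

```python
import csv
import io


def text_to_csv(text, fields):
    """Convert blank-line-separated 'Key: value' records into CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for block in text.strip().split("\n\n"):
        record = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:  # ignore lines without a 'Key: value' shape
                record[key.strip().lower()] = value.strip()
        writer.writerow(record)  # missing fields are written as empty cells
    return out.getvalue()


# Hypothetical free-text input.
sample = "Name: Acme Ltd\nPrice: 12.50\n\nName: Widget, Inc.\nPrice: 8\n"
print(text_to_csv(sample, ["name", "price"]))
```

Using `csv.DictWriter` rather than joining strings with commas means values that themselves contain commas, such as "Widget, Inc.", are quoted correctly.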