How to Pivot Data with Catalytic (Using MapReduce!)

Hi all, this is another "How To" document, which I originally wrote to track details from a workflow I was building. It's on the advanced side: it requires a decent understanding of data manipulation (think Excel PivotTable) and JavaScript. It does not require any prior knowledge of MapReduce, since the document includes a high-level overview of the concept.

If you've had to work with larger data sets (say, a couple of thousand rows or more), this is an efficient and powerful way to do so. Because a MapReduce engine does the work, processing large data sets is very fast. If you have a workflow that iterates over a large data set and takes a while to run, this approach could speed it up significantly.
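
For anyone new to the idea, here is a minimal sketch of a map/reduce-style pivot in plain JavaScript. It is not the Catalytic action itself or the code from the document; the sample rows, the field names (`region`, `quarter`, `sales`), and the helpers `mapRow`, `reduceValues`, and `pivot` are all made up for illustration.

```javascript
// Hypothetical sample rows; the field names are assumptions for illustration.
const rows = [
  { region: "East", quarter: "Q1", sales: 100 },
  { region: "East", quarter: "Q2", sales: 150 },
  { region: "West", quarter: "Q1", sales: 80 },
  { region: "East", quarter: "Q1", sales: 20 },
];

// Map step: emit one key/value pair per row.
function mapRow(row) {
  return { key: `${row.region}|${row.quarter}`, value: row.sales };
}

// Reduce step: combine all values that share a key (here, a sum).
function reduceValues(values) {
  return values.reduce((sum, v) => sum + v, 0);
}

function pivot(inputRows) {
  // Group the emitted values by key.
  const groups = new Map();
  for (const { key, value } of inputRows.map(mapRow)) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }

  // Reduce each group, then lay the results out as
  // one entry per region with one property per quarter.
  const table = {};
  for (const [key, values] of groups) {
    const [region, quarter] = key.split("|");
    table[region] = table[region] || {};
    table[region][quarter] = reduceValues(values);
  }
  return table;
}

const pivoted = pivot(rows);
console.log(pivoted);
```

The map step decides what becomes a pivot cell (the key) and the reduce step decides how values in that cell are combined, so swapping the sum for a count or a maximum only changes `reduceValues`.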

Comments

  • Really great and powerful features!

  • Sean_510793 (admin)

    Thanks @William_134612! The document doesn't cover it specifically, but it's also very useful for summarizing and rolling up data, as well as for finding unique entries. For example, we recently used it to search through a large data set and return only the top percentage of the data based on complex requirements.

  • Thanks @Sean_510793 for pointing this out. I believe CSV has no maximum row limit. Does this approach, using "CSV: Summarize rows with formulas", have any limitations, such as a row limit?
    It sounds like the action "Tables: Convert data table to text" may have issues when the data set exceeds a certain size, even though built-in functional JS functions can be applied to the text through "Field: Field Formulas" for filtering and summarizing. Can you help clarify? Thanks a lot.

  • Sean_510793 (admin)

    Hi @William_134612, CSV doesn't have a max row limit, but it does have a max size limit of 1 GB. Fortunately, that can hold a lot of rows, and I'm sure it's something Catalytic will continue to increase over time. The Tables: Convert data table to text action saves its output to a field, so you will be limited to the maximum field size, which is currently 128K. For most data sets, you are better off using the Excel: Create a Spreadsheet from a Data Table action. Remember to name the file with a .csv extension so that it is automatically saved as CSV rather than as an Excel file.

  • This looks great! Looking forward to the future action Table: Pivot data with MapReduce, which seems to be a flexible and more direct way to manipulate data in a table. Thanks!
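
The roll-up, unique-entry, and top-percentage uses mentioned in the comments above can be sketched in plain JavaScript as follows. This is only an illustration under assumed data: the `orders` records, their field names, and the `topPercent` helper are hypothetical, and a simple ranking by total stands in for whatever more complex requirements a real workflow would apply.

```javascript
// Hypothetical records; field names are assumptions for illustration.
const orders = [
  { customer: "A", amount: 50 },
  { customer: "B", amount: 200 },
  { customer: "A", amount: 75 },
  { customer: "C", amount: 10 },
  { customer: "B", amount: 40 },
];

// Roll-up: total amount per customer.
const totals = orders.reduce((acc, order) => {
  acc[order.customer] = (acc[order.customer] || 0) + order.amount;
  return acc;
}, {});

// Unique entries: the distinct customers fall out of the roll-up keys.
const uniqueCustomers = Object.keys(totals);

// Top percentage: rank by total, then keep the top `percent` of entries
// (always at least one, rounding up).
function topPercent(totalsByKey, percent) {
  const ranked = Object.entries(totalsByKey).sort((a, b) => b[1] - a[1]);
  const keep = Math.max(1, Math.ceil((ranked.length * percent) / 100));
  return ranked
    .slice(0, keep)
    .map(([customer, total]) => ({ customer, total }));
}

const top = topPercent(totals, 50);
console.log(totals, uniqueCustomers, top);
```

Rounding up in `topPercent` is a design choice made here so that small data sets still return at least one entry; a real workflow might prefer rounding down or a threshold on the value instead.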