Pattern: Convert Table in PDF Document to Data Table
You may have a table in a .pdf file that you need to extract and manipulate (see the workflow-level PDF field in the attached .catalytic file). This pattern also works for .docx files if you add a step at the beginning of the workflow that saves the document in .pdf format.
Say, for example, you have a table like the one below:
The table may sit between two anchor points (labels) in the document. After you've extracted the document's text with PDF: Extract text to a field or Images: Optical character recognition, you may need to use Text: Find text next to other text to separate the table from the rest of the document.
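Outside Catalytic, the same "text between two anchors" idea can be sketched in a few lines of Python. This is a minimal sketch, not how the Text: Find text next to other text action is implemented; the function name `text_between`, the sample text, and the anchor labels "Sales Summary" and "Notes" are all hypothetical.

```python
def text_between(text: str, start_label: str, end_label: str) -> str:
    """Return the text after the first start_label and before the next end_label."""
    start = text.find(start_label)
    if start == -1:
        raise ValueError(f"start label not found: {start_label!r}")
    start += len(start_label)  # skip past the label itself
    end = text.find(end_label, start)
    if end == -1:
        raise ValueError(f"end label not found: {end_label!r}")
    return text[start:end].strip()


# Hypothetical extracted document text; the anchor labels are assumptions.
document_text = """Quarterly Report
Sales Summary
Region Q1 Q2
North 120 135
South 98 110
Notes
Figures are unaudited."""

table_text = text_between(document_text, "Sales Summary", "Notes")
print(table_text)
```

Searching for the end label only after the start label keeps an earlier, unrelated occurrence of "Notes" from cutting the table short.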
After that, we replace each space ( ) with a comma (,) to make the text comma-delimited. This pattern assumes your cell values do not contain spaces. If a cell does (for example, a place name such as "New York"), the replacement splits that value across two columns, and you will need a different way to separate the columns.
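A minimal Python sketch of the space-to-comma step, assuming the table text has already been isolated. One deliberate difference from a plain one-for-one replacement: it collapses each run of spaces or tabs into a single comma, because extracted or OCR'd text often has uneven spacing. The function name `to_comma_delimited` and the sample data are hypothetical. The last line demonstrates the limitation described above, where a cell containing a space is split into two columns.

```python
import re


def to_comma_delimited(table_text: str) -> str:
    """Replace each run of spaces/tabs with one comma, line by line.

    Assumes no cell value contains a space (the same assumption as the pattern).
    """
    lines = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines left over from extraction
        lines.append(re.sub(r"[ \t]+", ",", line))
    return "\n".join(lines)


# Hypothetical table text; note the double space after "North".
table_text = "Region Q1 Q2\nNorth  120 135\nSouth 98 110"
csv_text = to_comma_delimited(table_text)
print(csv_text)
# Region,Q1,Q2
# North,120,135
# South,98,110

# The pitfall: a cell value containing a space becomes two columns.
print(to_comma_delimited("New York 75 80"))  # New,York,75,80
```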
After the workflow has run, view the pdf-to-data-table field in the Create data table from comma-delimited text step.
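If you want to check the result outside Catalytic, comma-delimited text with a header row can be parsed into rows using Python's standard `csv` module. This is a sketch of an equivalent step, not the Create data table from comma-delimited text action itself; `parse_table` and the sample text are hypothetical.

```python
import csv
import io


def parse_table(csv_text: str) -> list[dict[str, str]]:
    """Parse comma-delimited text whose first line is a header into row dicts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return list(reader)


# Hypothetical comma-delimited text produced by the previous step.
csv_text = "Region,Q1,Q2\nNorth,120,135\nSouth,98,110"
rows = parse_table(csv_text)
for row in rows:
    print(row)
# {'Region': 'North', 'Q1': '120', 'Q2': '135'}
# {'Region': 'South', 'Q1': '98', 'Q2': '110'}
```

Every value comes back as a string, so numeric columns need converting before any arithmetic.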
Once the data is in table format, you can summarize it as needed using the Tables:, Excel:, and/or CSV: suites of actions.
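As one example of the kind of summary you might build, the sketch below totals numeric columns in Python. It assumes rows shaped like the output of `csv.DictReader` (every value a string); the `column_totals` helper and the sample rows are hypothetical, and the code does not reflect how the Tables:, Excel:, or CSV: actions work.

```python
def column_totals(rows: list[dict[str, str]], columns: list[str]) -> dict[str, int]:
    """Sum the given columns across all rows, converting string values to int."""
    return {col: sum(int(row[col]) for row in rows) for col in columns}


# Hypothetical rows, as parsed from comma-delimited text.
rows = [
    {"Region": "North", "Q1": "120", "Q2": "135"},
    {"Region": "South", "Q1": "98", "Q2": "110"},
]
totals = column_totals(rows, ["Q1", "Q2"])
print(totals)  # {'Q1': 218, 'Q2': 245}
```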