Extract from a table
The following rule identifies tables that contain particular header rows and extracts data from them.
It targets tables contained in documents with a particular file attribute that contain two given headers (here: "Test Species" and "Growth Rate"). It extracts the test species and growth rates separately from each matching table and creates the corresponding entities, labeled "test_species" and "growth_rate", from the semantic nodes of the identified table cells.
You need a representation of both entities within DocuMine that your rule can refer to. For further information, please see Create entity. In the given example, the entities are called "Test Species" (technical name: test_species) and "Growth Rate" (technical name: growth_rate).
Code example:

    rule "DOC.7.0: Test Species Growth Rate"
        when
            // Match tables that contain both headers
            $table: Table(hasHeader("Test Species") && hasHeader("Growth Rate"))
        then
            // Create a "test_species" entity for each cell under the "Test Species" header
            Stream.of($table.streamTableCellsWithHeader("Test Species"))
                    .flatMap(a -> a)
                    .map(tableCell -> entityCreationService.bySemanticNode(tableCell, "test_species", EntityType.ENTITY))
                    .filter(Optional::isPresent)
                    .map(Optional::get)
                    .forEach(entity -> {
                        entity.apply("DOC.7.0", "Test Species found.");
                    });

            // Create a "growth_rate" entity for each cell under the "Growth Rate" header
            Stream.of($table.streamTableCellsWithHeader("Growth Rate"))
                    .flatMap(a -> a)
                    .map(tableCell -> entityCreationService.bySemanticNode(tableCell, "growth_rate", EntityType.ENTITY))
                    .filter(Optional::isPresent)
                    .map(Optional::get)
                    .forEach(entity -> {
                        entity.apply("DOC.7.0", "Growth Rate found.");
                    });
    end

The following provides a detailed breakdown of the rule syntax:
Syntax | Explanation |
---|---|
rule "DOC.7.0: Test Species Growth Rate" | Name of the rule. Each rule must have a unique name. For further information, please see Rule naming. |
$table: Table(hasHeader("Test Species") && hasHeader("Growth Rate")) | Filters for tables that contain both the header "Test Species" and the header "Growth Rate". |
entityCreationService | Invokes the class responsible for creating entities. |
Stream.of($table.streamTableCellsWithHeader("Test Species")).flatMap(a -> a).map(tableCell -> | Creates a stream that processes each table cell under the "Test Species" header. |
.bySemanticNode(tableCell, "test_species", EntityType.ENTITY) | Invokes the “bySemanticNode” method to create an entity labeled as "test_species" using the semantic node of the identified table cell related to test species. |
Stream.of($table.streamTableCellsWithHeader("Growth Rate")).flatMap(a -> a).map(tableCell -> | Creates a stream that processes each table cell under the "Growth Rate" header. |
.bySemanticNode(tableCell, "growth_rate", EntityType.ENTITY) | Invokes the "bySemanticNode" method to create an entity labeled as "growth_rate" using the semantic node of the identified table cell related to the growth rate. |
.forEach(entity -> { entity.apply("DOC.7.0", "Growth Rate found."); }); | Applies the "DOC.7.0" rule identifier and the message "Growth Rate found." to each entity created. |
Notice
For further information about the methods listed in the table, please refer to the Javadoc.
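The stream pipeline used in the rule can be tried out in isolation. The following is a minimal, self-contained Java sketch of the same pipeline shape, not DocuMine code: `Cell`, `Entity`, `cellsWithHeader` and `createEntity` are hypothetical stand-ins for the table cell, the entity, `streamTableCellsWithHeader` and `bySemanticNode`. The sketch assumes that entity creation returns an empty `Optional` when no entity can be created (here: for blank cells), which is why the rule filters with `Optional::isPresent` before unwrapping.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TableCellPipelineSketch {

    /** Hypothetical stand-in for a table cell: its column header and its text. */
    public static final class Cell {
        public final String header;
        public final String text;

        public Cell(String header, String text) {
            this.header = header;
            this.text = text;
        }
    }

    /** Hypothetical stand-in for a created entity: its type and the text it covers. */
    public static final class Entity {
        public final String type;
        public final String value;

        public Entity(String type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    /** Stand-in for streamTableCellsWithHeader: keeps the cells under one header. */
    public static Stream<Cell> cellsWithHeader(List<Cell> table, String header) {
        return table.stream().filter(cell -> cell.header.equals(header));
    }

    /** Stand-in for entity creation. ASSUMPTION: blank cells yield no entity. */
    public static Optional<Entity> createEntity(Cell cell, String type) {
        String text = cell.text.trim();
        return text.isEmpty() ? Optional.empty() : Optional.of(new Entity(type, text));
    }

    /** Same pipeline shape as the rule, but collects the entities into a list. */
    public static List<Entity> extract(List<Cell> table, String header, String type) {
        return Stream.of(cellsWithHeader(table, header))
                .flatMap(cells -> cells)                // flatten Stream<Stream<Cell>> to Stream<Cell>
                .map(cell -> createEntity(cell, type))  // one Optional<Entity> per cell
                .filter(Optional::isPresent)            // drop cells that produced no entity
                .map(Optional::get)                     // unwrap the remaining entities
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative sample data only
        List<Cell> table = Arrays.asList(
                new Cell("Test Species", "Daphnia magna"),
                new Cell("Growth Rate", "0.42"),
                new Cell("Test Species", "   "),
                new Cell("Growth Rate", "0.37"));

        for (Entity e : extract(table, "Test Species", "test_species")) {
            System.out.println(e.type + ": " + e.value);
        }
        for (Entity e : extract(table, "Growth Rate", "growth_rate")) {
            System.out.println(e.type + ": " + e.value);
        }
    }
}
```

In plain Java, wrapping a single stream in `Stream.of(...)` and flattening it straight away yields the same elements as using the stream directly. The sketch keeps both steps only so that it mirrors the rule line by line.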
After the rule has run, the document and the annotation list show the following results:

Results in document and annotation
Learn how to write a component rule that combines the two extracted entities into one component: Combining multiple entities into one component