
DocuMine Documentation

Extract from a table

The following rule identifies tables that contain particular headers and extracts data from them.

It targets tables in documents with a particular file attribute that contain two given headers (here: "Test Species" and "Growth Rate"). From each matching table, it extracts the test species and the growth rates separately and creates the corresponding entities, labeled "test_species" and "growth_rate", from the semantic nodes of the identified table cells.
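To make the matching logic concrete, here is a minimal plain-Java sketch that runs outside DocuMine. `SimpleTable`, `streamCellsWithHeader`, `TableExtractionSketch` and the sample values are all hypothetical: they only model the idea of keeping tables that have both headers and collecting the cells under each header. They are not DocuMine's `Table` API, and cells are plain strings here rather than semantic nodes. It assumes Java 16 or later (records, `Stream.toList`).

```java
import java.util.List;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class TableExtractionSketch {

    // Mirrors the rule's condition: the table must contain both headers.
    static boolean matches(SimpleTable table) {
        return table.hasHeader("Test Species") && table.hasHeader("Growth Rate");
    }

    // Collects the cell values under the given header from every matching table.
    static List<String> extract(List<SimpleTable> tables, String header) {
        return tables.stream()
                .filter(TableExtractionSketch::matches)
                .flatMap(table -> table.streamCellsWithHeader(header))
                .toList();
    }

    // Hypothetical sample input: only the first table has both headers.
    static List<SimpleTable> sampleTables() {
        return List.of(
                new SimpleTable(List.of("Test Species", "Growth Rate"),
                        List.of(List.of("Species A", "0.35"), List.of("Species B", "0.12"))),
                new SimpleTable(List.of("Test Species", "Dose"),
                        List.of(List.of("Species C", "10"))));
    }

    public static void main(String[] args) {
        List<SimpleTable> tables = sampleTables();
        System.out.println(extract(tables, "Test Species")); // [Species A, Species B]
        System.out.println(extract(tables, "Growth Rate"));  // [0.35, 0.12]
    }
}

// Hypothetical, simplified table: one header row plus data rows (not DocuMine's Table class).
record SimpleTable(List<String> headers, List<List<String>> rows) {

    // True if some header cell has exactly this text.
    boolean hasHeader(String header) {
        return headers.contains(header);
    }

    // Streams the cells of every column whose header has exactly this text.
    Stream<String> streamCellsWithHeader(String header) {
        return IntStream.range(0, headers.size())
                .filter(col -> headers.get(col).equals(header))
                .boxed()
                .flatMap(col -> rows.stream().map(row -> row.get(col)));
    }
}
```

The second sample table is skipped because it lacks the "Growth Rate" header, just as the rule's `when` condition would skip it.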

You need a representation of both entities within DocuMine that your rule can refer to. For further information, please see Create entity. In the given example, the entities are called "Test Species" (technical name: test_species) and "Growth Rate" (technical name: growth_rate).

Code example:

rule "DOC.7.0: Test Species Growth Rate"
    when
        // Match tables that contain both the "Test Species" and the "Growth Rate" header
        $table: Table(hasHeader("Test Species") && hasHeader("Growth Rate"))
    then
        // Create a "test_species" entity from each cell under the "Test Species" header
        Stream.of($table.streamTableCellsWithHeader("Test Species"))
            .flatMap(a -> a)
            .map(tableCell -> entityCreationService.bySemanticNode(tableCell, "test_species", EntityType.ENTITY))
            .filter(Optional::isPresent)   // keep only cells for which an entity was created
            .map(Optional::get)
            .forEach(entity -> {
                entity.apply("DOC.7.0", "Test Species found.");
            });
        // Create a "growth_rate" entity from each cell under the "Growth Rate" header
        Stream.of($table.streamTableCellsWithHeader("Growth Rate"))
            .flatMap(a -> a)
            .map(tableCell -> entityCreationService.bySemanticNode(tableCell, "growth_rate", EntityType.ENTITY))
            .filter(Optional::isPresent)   // keep only cells for which an entity was created
            .map(Optional::get)
            .forEach(entity -> {
                entity.apply("DOC.7.0", "Growth Rate found.");
            });
    end
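The rule wraps each cell stream as `Stream.of(...).flatMap(a -> a)`. Because `flatMap(a -> a)` only compiles when the wrapped value is itself a `java.util.stream.Stream`, this implies that `streamTableCellsWithHeader` returns a stream, and the wrapper then yields exactly the same elements as that stream. The sketch below illustrates this with the plain JDK; `StreamIdiomSketch` and `cells()` are hypothetical stand-ins, and it assumes Java 16 or later for `Stream.toList`.

```java
import java.util.List;
import java.util.stream.Stream;

class StreamIdiomSketch {

    // Hypothetical stand-in for a method that already returns a Stream,
    // such as streamTableCellsWithHeader in the rule.
    static Stream<String> cells() {
        return Stream.of("cell 1", "cell 2", "cell 3");
    }

    // The pattern used in the rule: wrap the stream in a one-element
    // Stream<Stream<String>>, then flatten it again with the identity function.
    static List<String> wrappedAndFlattened() {
        return Stream.of(cells()).flatMap(a -> a).toList();
    }

    // The direct form: use the returned stream as-is.
    static List<String> direct() {
        return cells().toList();
    }

    public static void main(String[] args) {
        System.out.println(wrappedAndFlattened()); // [cell 1, cell 2, cell 3]
        System.out.println(direct());              // [cell 1, cell 2, cell 3]
    }
}
```

Both forms produce the same list, so in plain Java the wrapper adds no filtering or reordering of its own.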

The following provides a detailed breakdown of the rule syntax:

Syntax

Explanation

rule "DOC.7.0: Test Species Growth Rate"

Name of the rule

Each rule must have a unique name. For further information, please see Rule naming.

$table: Table(hasHeader("Test Species") && hasHeader("Growth Rate"))

Filters for tables that contain both the header "Test Species" and the header "Growth Rate".

entityCreationService

References the service responsible for creating entities.

Stream.of($table.streamTableCellsWithHeader("Test Species")).flatMap(a -> a).map(tableCell ->

Streams all table cells under the "Test Species" header so that each cell can be processed individually.

.bySemanticNode(tableCell, "test_species", EntityType.ENTITY)

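Invokes the "bySemanticNode" method to create an entity labeled "test_species" from the semantic node of each table cell under the "Test Species" header. The following .filter(Optional::isPresent).map(Optional::get) steps keep only the cells for which an entity was actually created.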

Stream.of($table.streamTableCellsWithHeader("Growth Rate")).flatMap(a -> a).map(tableCell ->

Streams all table cells under the "Growth Rate" header so that each cell can be processed individually.

.bySemanticNode(tableCell, "growth_rate", EntityType.ENTITY)

Invokes the "bySemanticNode" method to create an entity labeled "growth_rate" from the semantic node of each table cell under the "Growth Rate" header.

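.forEach(entity -> { entity.apply("DOC.7.0", "Growth Rate found."); });

Applies the rule identifier "DOC.7.0" and the message "Growth Rate found." to each growth rate entity created. The "Test Species" stream does the same with the message "Test Species found."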

Notice

For further information about the methods listed in the table, please refer to the Javadoc.
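The last steps of each stream (create an entity per cell, drop empty results, then apply the rule identifier and message) can be modeled in plain Java as below. This is a sketch under stated assumptions: `SketchEntity`, `EntityPipelineSketch` and the blank-cell condition are hypothetical, and the only thing taken from the rule is that `bySemanticNode` returns an `Optional` (implied by `.filter(Optional::isPresent).map(Optional::get)`). When the real service returns an empty result is not documented here. It assumes Java 16 or later.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

class EntityPipelineSketch {

    // Hypothetical stand-in for entityCreationService.bySemanticNode(...):
    // here a blank cell yields no entity (an assumed condition), any other cell yields one.
    static Optional<SketchEntity> bySemanticNode(String cellText, String type) {
        return cellText.isBlank() ? Optional.empty() : Optional.of(new SketchEntity(type, cellText));
    }

    // Same shape as the rule's consequence: create, keep present results, unwrap, then apply.
    static List<SketchEntity> createEntities(Stream<String> cells, String type, String ruleId, String reason) {
        List<SketchEntity> entities = cells
                .map(cell -> bySemanticNode(cell, type))
                .filter(Optional::isPresent)   // drop cells for which no entity was created
                .map(Optional::get)            // unwrap the remaining entities
                .toList();
        entities.forEach(entity -> entity.apply(ruleId, reason));
        return entities;
    }

    public static void main(String[] args) {
        List<SketchEntity> entities = createEntities(
                Stream.of("0.35", " ", "0.12"), "growth_rate", "DOC.7.0", "Growth Rate found.");
        entities.forEach(e -> System.out.println(e.type + ": " + e.text + " (" + e.ruleId + ")"));
    }
}

// Hypothetical, minimal entity; not DocuMine's entity class.
class SketchEntity {
    final String type;
    final String text;
    String ruleId;  // set by apply(...)
    String reason;  // set by apply(...)

    SketchEntity(String type, String text) {
        this.type = type;
        this.text = text;
    }

    // Records which rule matched and why, mirroring entity.apply(...) in the rule.
    void apply(String ruleId, String reason) {
        this.ruleId = ruleId;
        this.reason = reason;
    }
}
```

On Java 9 and later, `.flatMap(Optional::stream)` is an equivalent single step for the `filter`/`map` pair.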

After the rule has run, you can see the following results in the document and in the annotation list:


Results in document and annotation

Learn how to write a component rule that combines the two extracted entities into one component: Combining multiple entities into one component