DocuMine Documentation

Extract entire section

The following rule identifies sections whose headline contains a particular word (here: "references") and creates a section entity for each matching section from its SemanticNode.

The rule must refer to an entity that is already defined in DocuMine. For further information, please see Create entity. In this example, the entity is called "References"; the extraction rule uses the entity's technical name (here: references).

Code example:

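rule "DOC.36.0: References"
    when
        // Match every section whose headline contains "references" (case-insensitive)
        $section: Section(getHeadline().containsStringIgnoreCase("references"))
    then
        // Create a "references" entity that spans the whole section, then annotate it
        entityCreationService.bySemanticNode($section, "references", EntityType.ENTITY)
            .ifPresent(snode -> snode.apply("DOC.36.0", "References found."));
    end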

The following table provides a detailed breakdown of the rule syntax:

| Syntax | Explanation |
| --- | --- |
| rule "DOC.36.0: References" | Name of the rule. Each rule must have a unique name. For further information, please see Rule naming. |
| $section: Section(getHeadline().containsStringIgnoreCase("references")) | Filters for section elements whose headline contains the word "references", ignoring capitalization (case-insensitive). |
| entityCreationService | Invokes the class responsible for creating entities. |
| .bySemanticNode($section, "references", EntityType.ENTITY) | Invokes the "bySemanticNode" method to create an entity of the type "references" that contains the provided section. |
| .ifPresent(snode -> snode.apply("DOC.36.0", "References found.")) | Applies the "DOC.36.0" identifier and the message "References found." to each entity created. |

Notice

For further information about the methods listed in the table, please refer to the Javadoc.
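
The filter and the ifPresent step described in the table can be tried out in plain Java. The following is a minimal sketch only: Section, Entity, containsIgnoreCase, bySection and run are hypothetical stand-ins written for this illustration and are not the DocuMine classes or methods. It also assumes, purely for demonstration, that no entity is created for a section with empty text, so that the empty Optional case becomes visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

public class SectionRuleSketch {

    // Hypothetical stand-in for a document section (not the DocuMine Section class).
    static final class Section {
        final String headline;
        final String text;
        Section(String headline, String text) {
            this.headline = headline;
            this.text = text;
        }
    }

    // Hypothetical stand-in for a created entity; records what was applied to it.
    static final class Entity {
        final String type;
        final Section section;
        final List<String> applied = new ArrayList<>();
        Entity(String type, Section section) {
            this.type = type;
            this.section = section;
        }
        void apply(String ruleId, String reason) {
            applied.add(ruleId + ": " + reason);
        }
    }

    // Case-insensitive substring test, analogous to the headline filter in the rule.
    static boolean containsIgnoreCase(String headline, String word) {
        return headline.toLowerCase(Locale.ROOT).contains(word.toLowerCase(Locale.ROOT));
    }

    // Assumption for this sketch: creation yields an empty Optional for empty section text.
    static Optional<Entity> bySection(Section section, String type) {
        if (section.text.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new Entity(type, section));
    }

    // Mirrors the rule: filter by headline, create the entity, apply id and message if present.
    static List<Entity> run(List<Section> sections) {
        List<Entity> created = new ArrayList<>();
        for (Section s : sections) {
            if (containsIgnoreCase(s.headline, "references")) {
                bySection(s, "references").ifPresent(e -> {
                    e.apply("DOC.36.0", "References found.");
                    created.add(e);
                });
            }
        }
        return created;
    }

    public static void main(String[] args) {
        List<Section> sections = Arrays.asList(
            new Section("1 Introduction", "Some text"),
            new Section("7 REFERENCES", "[1] First source"),
            new Section("References (empty)", ""));
        for (Entity e : run(sections)) {
            System.out.println(e.section.headline + " -> " + e.applied);
        }
    }
}
```

With the sample data in main, only the "7 REFERENCES" section produces an entity: "1 Introduction" fails the headline filter, and the section with empty text passes the filter but yields an empty Optional, so the ifPresent callback is skipped.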

After the rule has run, you can observe the following results in the editor:

The extracted section is highlighted in the document in the color you defined for the "References" entity. The corresponding entry in the annotations list shows the entity name (Type: References).

Figure: Result in document and annotation list