Extract entire section
The following rule identifies sections whose headline contains a particular word (here: "references") and creates a section entity for each matching section from its SemanticNode.
You need a representation of the entity within DocuMine that your rule can refer to. For further information, please see Create entity. In the given example, the entity is called "References"; you need the entity’s technical name to write the respective extraction rule (here: references).
Code example:
rule "DOC.36.0: References"
    when
        $section: Section(getHeadline().containsStringIgnoreCase("references"))
    then
        entityCreationService.bySemanticNode($section, "references", EntityType.ENTITY)
            .ifPresent(snode -> snode.apply("DOC.36.0", "References found."));
end
The following table provides a detailed breakdown of the rule syntax:
Syntax | Explanation |
---|---|
rule "DOC.36.0: References" | Name of the rule. Each rule must have a unique name. For further information, please see Rule naming. |
$section: Section(getHeadline().containsStringIgnoreCase("references")) | Filters for section elements whose headline contains the word "references", ignoring the capitalization of the word (case-insensitive). |
entityCreationService | Invokes the class responsible for creating entities. |
.bySemanticNode($section, "references", EntityType.ENTITY) | Invokes the “bySemanticNode” method to create an entity of the type "references" (the entity’s technical name) that covers the provided section. |
.ifPresent(snode -> snode.apply("DOC.36.0", "References found.")) | If an entity was created, applies the “DOC.36.0” identifier and the message “References found.” to it. |
Notice
For further information about the methods listed in the table, please refer to the Javadoc.
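The control flow described in the table can be sketched in plain Java. This is an illustration only: the Section and Entity records, the containsIgnoreCase helper, and the createEntity method are hypothetical stand-ins, not the DocuMine classes, and the condition under which no entity is created (blank section text) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

public class HeadlineRuleSketch {

    // Hypothetical stand-in for a document section (not the DocuMine Section class).
    record Section(String headline, String text) {}

    // Hypothetical stand-in for a created entity.
    record Entity(String type, Section source) {}

    // Case-insensitive containment check, similar in spirit to containsStringIgnoreCase.
    static boolean containsIgnoreCase(String value, String word) {
        return value.toLowerCase(Locale.ROOT).contains(word.toLowerCase(Locale.ROOT));
    }

    // Assumption for this sketch: no entity is created for a section without text.
    static Optional<Entity> createEntity(Section section, String type) {
        if (section.text().isBlank()) {
            return Optional.empty();
        }
        return Optional.of(new Entity(type, section));
    }

    // Mirrors the rule: the "when" part filters sections, the "then" part creates and marks entities.
    static List<String> run(List<Section> sections) {
        List<String> applied = new ArrayList<>();
        for (Section section : sections) {
            if (containsIgnoreCase(section.headline(), "references")) {
                createEntity(section, "references")
                    .ifPresent(entity -> applied.add(
                        "DOC.36.0: References found. [" + entity.source().headline() + "]"));
            }
        }
        return applied;
    }

    public static void main(String[] args) {
        List<Section> document = List.of(
            new Section("1 Introduction", "Some text."),
            new Section("7 REFERENCES", "[1] First entry."),
            new Section("Further references", ""));
        run(document).forEach(System.out::println);
    }
}
```

In this sketch, only the section "7 REFERENCES" yields an entry: "1 Introduction" does not match the headline filter, and "Further references" matches but produces an empty Optional, so the ifPresent callback is skipped.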
Following the rule execution, you will be able to observe the following outcomes in the editor:
The extracted section is highlighted in the document using the color you defined for the "References" entity. The corresponding entry in the annotations list shows the entity name (Type: References).

Result in document and annotation list