DocuMine Documentation

Extract word from section

The following rule shows how to identify and extract a specific word (here: EIA) from a layout element that contains it. It matches sections whose text contains the word and creates "eia" entities for the occurrences found in them.

You need a representation of the entity within DocuMine that your rule can refer to. For further information on how to create an entity, please see Create entity. In the given example, the entity is called "EIA"; you need the entity’s technical name to write the respective extraction rule (here: eia).

Code example:

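rule "DOC.37.0: EIA"
  when
      // Match every section whose text contains the string "EIA"
      $section: Section(
          containsString("EIA")
      )
  then
      // Create "eia" entities from the string "EIA" in the matched section
      entityCreationService.byString("EIA", "eia", EntityType.ENTITY, $section)
          .forEach(entity -> entity.apply("DOC.37.0", "EIA found."));
  end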
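
To make the matching step more concrete, the following plain-Java sketch mimics what a string-based extraction conceptually does: it scans a section's text and records one match per occurrence of the search string. The names ByStringSketch, Match and findAll are hypothetical stand-ins and not part of the DocuMine API; the actual entityCreationService may behave differently, for example with regard to word boundaries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT the DocuMine API: illustrates string-based matching only.
class ByStringSketch {

    // One match: the entity type name plus the character range of the hit.
    static final class Match {
        final String type;
        final int start; // inclusive
        final int end;   // exclusive

        Match(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    // Returns one Match per case-sensitive, non-overlapping occurrence of needle in text.
    static List<Match> findAll(String needle, String type, String text) {
        List<Match> matches = new ArrayList<>();
        if (needle.isEmpty()) {
            return matches; // nothing to search for
        }
        int from = text.indexOf(needle);
        while (from >= 0) {
            matches.add(new Match(type, from, from + needle.length()));
            from = text.indexOf(needle, from + needle.length());
        }
        return matches;
    }

    public static void main(String[] args) {
        String section = "The EIA was filed. A revised EIA followed.";
        for (Match m : findAll("EIA", "eia", section)) {
            System.out.println(m.type + " at [" + m.start + ", " + m.end + ")");
        }
    }
}
```

Each Match here plays the role of one entity in the rule: the forEach call in the "then" block would be applied once per element of the returned list.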

The following table provides a detailed breakdown of the rule syntax:

| Syntax | Explanation |
|---|---|
| rule "DOC.37.0: EIA" | Name of the rule. Each rule must have a unique name. For further information, please see Rule naming. |
| $section: Section(containsString("EIA")) | Filters for section elements that contain the string "EIA". |
| entityCreationService | Invokes the class responsible for creating entities. |
| .byString("EIA", "eia", EntityType.ENTITY, $section) | Invokes the “byString” method, which searches the matching section for the string "EIA" and creates entities of type "eia" from it. |
| .forEach(entity -> entity.apply("DOC.37.0", "EIA found.")) | Applies the "DOC.37.0" identifier and the message "EIA found." to each entity created. |

Notice

For further information about the methods listed in the table, please refer to the Javadoc.

After the rule has run, the editor shows the following results:

The extracted word is highlighted in the document using the color you defined for the "EIA" entity. The corresponding entry in the annotations list shows the entity name (Type: EIA).

Figure: Result in document and annotation list

You can filter the annotations list by the "EIA" entity. The page list to the right of the document will then display all the pages where the entity was found.

Figure: Filter for entity

If you do not want to extract the entity from every section in which it appears, add further constraints to the rule's "when" block. For instance, you can restrict the search to sections with a specific headline using getHeadline. The example below limits the search to sections whose headline contains the string "references", ignoring case (so "References" matches as well).

rule "DOC.37.0: EIA"
  when
      // Match sections that contain "EIA" and whose headline contains "references" (case-insensitive)
      $section: Section(
          containsString("EIA")
          && getHeadline().containsStringIgnoreCase("references")
      )
  then
      entityCreationService.byString("EIA", "eia", EntityType.ENTITY, $section)
          .forEach(entity -> entity.apply("DOC.37.0", "EIA found."));
  end
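
The combined condition can be illustrated with a small plain-Java sketch. It assumes that containsString is case-sensitive and that containsStringIgnoreCase ignores case, which the method names suggest but the text above does not confirm. HeadlineFilterSketch and its methods are hypothetical names, not DocuMine classes.

```java
import java.util.Locale;

// Hypothetical stand-in, NOT the DocuMine API: models the two constraints of the "when" block.
class HeadlineFilterSketch {

    // Case-insensitive containment check (assumed semantics of containsStringIgnoreCase).
    static boolean containsIgnoreCase(String text, String needle) {
        return text.toLowerCase(Locale.ROOT).contains(needle.toLowerCase(Locale.ROOT));
    }

    // True only if the body contains the needle (case-sensitive, an assumption)
    // AND the headline contains the keyword regardless of case.
    static boolean matches(String headline, String body, String needle, String headlineKeyword) {
        return body.contains(needle) && containsIgnoreCase(headline, headlineKeyword);
    }

    public static void main(String[] args) {
        System.out.println(matches("7. REFERENCES", "See EIA (2019).", "EIA", "references"));
        System.out.println(matches("Introduction", "See EIA (2019).", "EIA", "references"));
    }
}
```

Under these assumptions, a section headed "7. REFERENCES" that mentions EIA qualifies, while the same text under "Introduction" does not, so no entity would be created for it.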