DocuMine Documentation

Rule syntax details: entity rules

In this chapter, we will explore the when and then sections of entity rules, detailing how to define the conditions and specify the actions for entity extraction in DocuMine.

When part: target layout element

The "when" section of a DocuMine extraction rule narrows down the search space within the document(s) and defines the conditions that must be met for the rule to execute. It targets one or more of the available layout elements and defines constraints following one of the available patterns.

Please find below various examples that illustrate how to target different layout elements and set different types of constraints.

Available classes and methods

The Javadoc lists the layout objects available in DocuMine. Click an object to see the methods it provides.

For detailed entity rule use cases, please see Use cases: entity extraction.

Then part: create entity

The “then” section defines the action(s) to be executed when the conditions in the “when” section are met.

When the conditions of an entity rule (“when” section) are satisfied, usually the EntityCreationService class or the getEntitiesOfType method is invoked to create an entity. You need an entity representation within the application to ensure that the respective text passage is annotated in the document. (For further information, please see: Create entity in DocuMine settings.)

The apply method is used to annotate the entity in the document.

Example #1:

rule "DOC.8.0: GLP Study"
  when
    $headline: Headline(containsString("GOOD LABORATORY PRACTICE COMPLIANCE"))
  then
    entityCreationService.bySemanticNode($headline, "glp_study", EntityType.ENTITY).ifPresent(entity -> {
      entity.apply("DOC.8.0", "GLP Study found", "n-a");
    });
  end
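
The ifPresent call in Example #1 suggests that bySemanticNode returns a java.util.Optional, so apply would only run when an entity could actually be created. The following plain Java sketch illustrates that control flow in isolation. SketchEntity, createEntity, runThen, and the recorded string format are hypothetical stand-ins invented for illustration, not DocuMine classes or methods; only the Optional.ifPresent behavior is standard Java.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a DocuMine entity; NOT the real class.
class SketchEntity {
    final String type;
    final List<String> applied = new ArrayList<>();

    SketchEntity(String type) {
        this.type = type;
    }

    // Mirrors the three-argument apply call from Example #1 by recording it.
    // The meaning of the third argument is not stated in the source.
    void apply(String ruleId, String reason, String extra) {
        applied.add(ruleId + " | " + reason + " | " + extra);
    }
}

class ThenPartSketch {
    // Hypothetical creation step (assumption): yields an entity only for non-blank text.
    static Optional<SketchEntity> createEntity(String text, String type) {
        if (text == null || text.isBlank()) {
            return Optional.empty();
        }
        return Optional.of(new SketchEntity(type));
    }

    // Runs the "then" logic; the returned list stays empty if no entity was created.
    static List<String> runThen(String headlineText) {
        List<String> result = new ArrayList<>();
        createEntity(headlineText, "glp_study").ifPresent(entity -> {
            entity.apply("DOC.8.0", "GLP Study found", "n-a");
            result.addAll(entity.applied);
        });
        return result;
    }

    public static void main(String[] args) {
        System.out.println(runThen("GOOD LABORATORY PRACTICE COMPLIANCE"));
        System.out.println(runThen(""));
    }
}
```

With this pattern, a rule whose creation step yields nothing skips apply without raising an error, which is presumably why the rule wraps the call in ifPresent rather than calling apply directly.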

Example #2:

rule "DOC.1.2: Guidelines"
  when
    $section: Section(
      (
        containsString("DATA REQUIREMENT")
        || containsString("TEST GUIDELINE")
        || containsString("MÉTODO(S) DE REFERÊNCIA(S):")
      )
    )
  then
    $section.getEntitiesOfType(List.of("project_guideline","ec_guideline", "epa_guideline")).forEach(entity -> {
      entity.apply("DOC.1.2", "Project guideline found.", "n-a");
    });
  end
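
Example #2 calls getEntitiesOfType with a list of type names and applies the rule to each returned entity. The method name suggests that it selects the section's entities whose type is in the list, although the source does not spell this out. The plain Java sketch below models that assumed behavior. TypedEntity, SketchSection, and FilterByTypeSketch are hypothetical stand-ins, not DocuMine classes, and the filtering logic is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for an entity attached to a section; NOT the real class.
class TypedEntity {
    final String type;
    String appliedRule; // stays null until apply is called

    TypedEntity(String type) {
        this.type = type;
    }

    // Records only the rule identifier; the other arguments are ignored here.
    void apply(String ruleId, String reason, String extra) {
        this.appliedRule = ruleId;
    }
}

// Hypothetical stand-in for a section node that holds entities.
class SketchSection {
    final List<TypedEntity> entities = new ArrayList<>();

    // Assumed behavior: return only the entities whose type is in the requested list.
    List<TypedEntity> getEntitiesOfType(List<String> types) {
        return entities.stream()
                .filter(e -> types.contains(e.type))
                .collect(Collectors.toList());
    }
}

class FilterByTypeSketch {
    // Builds a section with three entities and runs the "then" logic of Example #2.
    static SketchSection demo() {
        SketchSection section = new SketchSection();
        section.entities.add(new TypedEntity("ec_guideline"));
        section.entities.add(new TypedEntity("glp_study"));
        section.entities.add(new TypedEntity("epa_guideline"));
        section.getEntitiesOfType(List.of("project_guideline", "ec_guideline", "epa_guideline"))
                .forEach(entity -> entity.apply("DOC.1.2", "Project guideline found.", "n-a"));
        return section;
    }

    public static void main(String[] args) {
        for (TypedEntity e : demo().entities) {
            System.out.println(e.type + " -> " + e.appliedRule);
        }
    }
}
```

In this sketch, the entity of type glp_study is left untouched because its type is not in the requested list, while the two guideline entities receive the rule identifier.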

For detailed component rule use cases, please see Use cases: component construction.