Rule syntax details: entity rules
In this chapter, we will explore the when and then sections of entity rules, detailing how to define the conditions and specify the actions for entity extraction in DocuMine.
When part: target layout element
The "when" section of a DocuMine extraction rule narrows down the search space within the document(s) and defines the conditions that must be met for the rule to execute. It targets one or several of the available layout elements and constrains them using one of the available patterns.
Please find below various examples that illustrate how to target different layout elements and set different types of constraints.
Available classes and methods
The Javadoc lists the layout objects available in DocuMine. Clicking on one of the available objects will allow you to access the available methods.
For detailed entity rule use cases, please see Use cases: entity extraction.
Then part: create entity
The "then" section outlines the action(s) to be executed when the conditions in the "when" section are met.
When the conditions of an entity rule ("when" section) are satisfied, usually either the EntityCreationService (as in Example #1) or the getEntitiesOfType method of the matched layout element (as in Example #2) is invoked to obtain the entity. You need an entity representation within the application to ensure the respective text passage will be annotated in the document. (For further information, please see Create entity in DocuMine settings.)
The apply method is used to annotate the entity in the document.
Example #1:
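rule "DOC.8.0: GLP Study"
    when
        // Match any headline that contains the GLP compliance phrase
        $headline: Headline(containsString("GOOD LABORATORY PRACTICE COMPLIANCE"))
    then
        // Create a "glp_study" entity from the headline and annotate it if creation succeeded
        entityCreationService.bySemanticNode($headline, "glp_study", EntityType.ENTITY).ifPresent(entity -> {
            entity.apply("DOC.8.0", "GLP Study found", "n-a");
        });
end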
Example #2:
rule "DOC.1.2: Guidelines"
    when
        $section: Section(
            (
                containsString("DATA REQUIREMENT")
                || containsString("TEST GUIDELINE")
                || containsString("MÉTODO(S) DE REFERÊNCIA(S):")
            )
        )
    then
        $section.getEntitiesOfType(List.of("project_guideline", "ec_guideline", "epa_guideline")).forEach(entity -> {
            entity.apply("DOC.1.2", "Project guideline found.", "n-a");
        });
end
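The two examples use different call shapes in the "then" section: Example #1 chains ifPresent, while Example #2 chains forEach. The following plain Java sketch illustrates that difference outside of the rule engine. It is not the DocuMine API: the class ThenPartSketch, the Entity stand-in, and the helpers createEntity and entitiesOfType are hypothetical, and the sketch assumes that bySemanticNode returns a java.util.Optional (inferred from the ifPresent call) and that getEntitiesOfType returns a collection (inferred from the forEach call). The meaning of the third apply argument is not stated in this chapter, so the sketch simply passes it through.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch, not the DocuMine API: it only mimics the call shapes
// used in the "then" sections of Example #1 and Example #2.
class ThenPartSketch {

    // Stand-in for an entity; only apply(...) mirrors the rule examples.
    static final class Entity {
        final String type;
        final String text;
        String appliedRule; // stays null until apply(...) is called
        String reason;

        Entity(String type, String text) {
            this.type = type;
            this.text = text;
        }

        // Same argument order as in the examples: rule identifier, reason, third value.
        void apply(String ruleId, String reason, String third) {
            this.appliedRule = ruleId;
            this.reason = reason;
        }
    }

    // Assumption: creation can yield no entity, which is why the result is an Optional.
    static Optional<Entity> createEntity(String nodeText, String type) {
        if (nodeText == null || nodeText.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new Entity(type, nodeText));
    }

    // Assumption: the lookup returns zero or more entities whose type is in the given list.
    static List<Entity> entitiesOfType(List<Entity> found, List<String> types) {
        List<Entity> result = new ArrayList<>();
        for (Entity e : found) {
            if (types.contains(e.type)) {
                result.add(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Example #1 shape: zero or one entity, so ifPresent guards the apply call.
        Optional<Entity> glp = createEntity("GOOD LABORATORY PRACTICE COMPLIANCE", "glp_study");
        glp.ifPresent(entity -> entity.apply("DOC.8.0", "GLP Study found", "n-a"));

        // Example #2 shape: zero or more entities, so forEach applies to each match.
        List<Entity> found = List.of(
                new Entity("ec_guideline", "sample guideline text"),
                new Entity("glp_study", "sample headline text"));
        List<Entity> guidelines = entitiesOfType(
                found, List.of("project_guideline", "ec_guideline", "epa_guideline"));
        guidelines.forEach(entity -> entity.apply("DOC.1.2", "Project guideline found.", "n-a"));

        System.out.println("GLP entity annotated: " + glp.isPresent());
        System.out.println("Guideline entities annotated: " + guidelines.size());
    }
}
```

In both shapes, apply is only reached when there is something to annotate: an empty Optional or an empty list leaves the "then" section without effect instead of raising an error.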
For detailed component rule use cases, please see Use cases: component construction.