DocuMine Documentation

Entity rule creation

Entity rules define the target layout element to extract for an entity and provide the information extraction commands. They are the first step of component extraction; component mapping follows.

Creating entity rules

To extract an entity, you first need a representation of that entity in DocuMine. For details, see "Create an entity".

Entity rule skeleton:

rule "<Rule Group>.<Rule Unit>.<Rule Number>: <Rule Name>"
    when
        // Define the target layout element for the information extraction
    then
        // Apply information extraction commands
    end

In entity rules, the "when" and "then" parts define the following conditions and actions:

  • When:

    The "when" part filters the document for relevant layout elements (semantic nodes) and content markers (e.g., a specific keyword or document type).

  • Then:

    The "then" part tells DocuMine what to do with the information found in the filtered layout elements, how to apply the annotation, and how to extract the entity values to include in a component.

Example:

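rule "T.2.0"
	when
		// Match paragraphs under a headline containing "references" (case-insensitive)
		// whose own text contains the string "GLP"
		$paragraph: Paragraph(
			getHeadline().containsStringIgnoreCase("references")
			&& containsString("GLP")
		)
	then
		// Create a "glp_reference" entity from the matched paragraph; if one is
		// returned, apply it with the rule identifier and a reason
		entityCreationService.bySemanticNode($paragraph, "glp_reference", EntityType.ENTITY)
			.ifPresent(
				entity -> entity.apply("T.2.0", "Reference paragraph found.")
			);
	end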
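To make the filter-then-act flow of the example rule easier to follow, here is a minimal plain-Java sketch of the same logic. It does not use DocuMine or the rule engine. The class `EntityRuleSketch`, its nested `Paragraph` and `Entity` types, and the methods `matches` and `applyRule` are hypothetical stand-ins introduced only for illustration. The case-insensitive headline check is an assumption about how `containsStringIgnoreCase` behaves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class EntityRuleSketch {

    // Hypothetical stand-in for a paragraph node: the headline it sits under and its text.
    public static final class Paragraph {
        public final String headline;
        public final String text;

        public Paragraph(String headline, String text) {
            this.headline = headline;
            this.text = text;
        }
    }

    // Hypothetical stand-in for the entity that a rule creates.
    public static final class Entity {
        public final String type;
        public final String value;
        public final String ruleId;
        public final String reason;

        public Entity(String type, String value, String ruleId, String reason) {
            this.type = type;
            this.value = value;
            this.ruleId = ruleId;
            this.reason = reason;
        }
    }

    // "when": the headline contains "references" (any case) and the text contains "GLP".
    public static boolean matches(Paragraph p) {
        return p.headline.toLowerCase(Locale.ROOT).contains("references")
                && p.text.contains("GLP");
    }

    // "then": create one entity per matching paragraph, tagged with rule ID and reason.
    public static List<Entity> applyRule(List<Paragraph> paragraphs) {
        List<Entity> entities = new ArrayList<>();
        for (Paragraph p : paragraphs) {
            if (matches(p)) {
                entities.add(new Entity("glp_reference", p.text,
                        "T.2.0", "Reference paragraph found."));
            }
        }
        return entities;
    }

    public static void main(String[] args) {
        List<Paragraph> document = List.of(
                new Paragraph("References", "Study conducted under GLP."),
                new Paragraph("References", "No standard cited."),
                new Paragraph("Introduction", "GLP is mentioned here."));
        for (Entity e : applyRule(document)) {
            System.out.println(e.type + ": " + e.value);
        }
    }
}
```

Only the first paragraph satisfies both conditions, so only it yields an entity. In the actual rule, the engine performs the matching and `entityCreationService` creates the entity; the loop here only imitates that behavior.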
Available methods