DocuMine Documentation

Extract paragraph with specific word and headline

The following rule identifies and extracts paragraphs whose headline contains a particular word (here: "references") and whose body contains a particular string (here: "GLP"). For each matching paragraph, it creates a "glp_reference" entity from the paragraph's semantic node.

The rule needs an entity within DocuMine that it can refer to. For further information, please see Create entity. In the given example, the entity is called "GLP References". To write the extraction rule, you need the entity's technical name (here: glp_reference).

Code example:

rule "T.2.0"
	when
		$paragraph: Paragraph(
			getHeadline().containsStringIgnoreCase("references")
			&& containsString("GLP")
		)
	then
		entityCreationService
			.bySemanticNode($paragraph, "glp_reference", EntityType.ENTITY)
			.ifPresent(entity -> entity.apply("T.2.0", "Reference paragraph found."));
end
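
To make the matching condition concrete, the following plain Java sketch mimics the two checks of the "when" part outside of DocuMine. The class `GlpReferenceFilter`, the nested `Paragraph` class, and the helpers `containsIgnoreCase` and `matches` are hypothetical stand-ins, not DocuMine classes. The sketch also assumes that `containsString` is case-sensitive, which the documentation above does not state explicitly.

```java
import java.util.List;
import java.util.Locale;

class GlpReferenceFilter {

    // Hypothetical stand-in for a DocuMine paragraph: a headline plus body text.
    static final class Paragraph {
        final String headline;
        final String text;

        Paragraph(String headline, String text) {
            this.headline = headline;
            this.text = text;
        }
    }

    // Case-insensitive containment, mirroring what containsStringIgnoreCase is described to do.
    static boolean containsIgnoreCase(String haystack, String needle) {
        return haystack.toLowerCase(Locale.ROOT).contains(needle.toLowerCase(Locale.ROOT));
    }

    // Both conditions of the rule's "when" part must hold for a paragraph to match.
    static boolean matches(Paragraph p) {
        return containsIgnoreCase(p.headline, "references")
                && p.text.contains("GLP"); // assumption: this check is case-sensitive
    }

    public static void main(String[] args) {
        List<Paragraph> paragraphs = List.of(
                new Paragraph("7. REFERENCES", "The study followed GLP principles."),
                new Paragraph("References", "No relevant keyword here."),
                new Paragraph("Introduction", "GLP is mentioned, but the headline differs."));

        for (Paragraph p : paragraphs) {
            System.out.println(p.headline + " -> " + matches(p));
        }
    }
}
```

Only the first sample paragraph satisfies both conditions. The second fails the body check and the third fails the headline check, so the rule would create no entity for either of them.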

The following provides a detailed breakdown of the rule syntax:

| Syntax | Explanation |
| --- | --- |
| rule "T.2.0" | Name of the rule. Each rule must have a unique name. For further information, please see Rule naming. |
| $paragraph: Paragraph(getHeadline().containsStringIgnoreCase("references") | Filters for paragraph elements whose headline contains the word "references", regardless of capitalization (case-insensitive). Each matching paragraph is bound to the variable $paragraph. |
| && containsString("GLP") | Adds a second condition: the paragraph must also contain the string "GLP". |
| entityCreationService | Invokes the class responsible for creating entities. |
| .bySemanticNode($paragraph, "glp_reference", EntityType.ENTITY) | Invokes the "bySemanticNode" method to create a "glp_reference" entity that covers the provided paragraph. |
| .ifPresent(entity -> entity.apply("T.2.0", "Reference paragraph found.")) | If an entity was created, applies the rule identifier "T.2.0" and the message "Reference paragraph found." to it. |
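
The `.ifPresent(...)` call follows the pattern known from `java.util.Optional`: the lambda runs only when a value is present. Whether `bySemanticNode` actually returns a `java.util.Optional` is an assumption made for this sketch; please check the Javadoc for the real return type. The class `IfPresentSketch`, the nested `Entity` class, and the methods `createEntity` and `process` are hypothetical and only illustrate the control flow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class IfPresentSketch {

    // Hypothetical stand-in for a DocuMine entity; it only records what was applied.
    static final class Entity {
        final String type;
        String ruleId;
        String reason;

        Entity(String type) {
            this.type = type;
        }

        void apply(String ruleId, String reason) {
            this.ruleId = ruleId;
            this.reason = reason;
        }
    }

    // Hypothetical factory: returns an empty Optional when no entity can be created
    // (in this sketch: when the paragraph text is null or blank).
    static Optional<Entity> createEntity(String paragraphText, String type) {
        if (paragraphText == null || paragraphText.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new Entity(type));
    }

    // Applies the rule identifier and message only to entities that were actually created.
    static List<Entity> process(List<String> paragraphTexts) {
        List<Entity> applied = new ArrayList<>();
        for (String text : paragraphTexts) {
            createEntity(text, "glp_reference").ifPresent(entity -> {
                entity.apply("T.2.0", "Reference paragraph found.");
                applied.add(entity);
            });
        }
        return applied;
    }

    public static void main(String[] args) {
        List<Entity> result = process(List.of("This study complies with GLP.", "   "));
        System.out.println(result.size() + " entity created, marked with rule " + result.get(0).ruleId);
    }
}
```

Because the lambda is skipped for an empty result, the rule needs no explicit null check before calling `apply`.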

Notice

For further information about the methods listed in the table, please refer to the Javadoc.