
DocuMine Documentation

Extract list-like information

The following rule extracts specific information from lists. It targets layout elements (here: paragraphs) that contain a given term (here: COMPLETION DATE) and creates entities from the text between that term and the end of the line in the matching paragraph.
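To make the extraction behaviour concrete, here is a minimal plain-Java sketch of "take the text between a term and the end of its line". It is not DocuMine's implementation of lineAfterString: the class name LineAfterStringSketch, the helper's signature, the trimming of whitespace and the skipping of empty values are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class LineAfterStringSketch {

    // Hypothetical helper (not the DocuMine API): returns the text between each
    // occurrence of `term` and the end of the line it appears on, trimmed.
    static List<String> lineAfterString(String text, String term) {
        List<String> results = new ArrayList<>();
        if (term.isEmpty()) {
            return results; // guard against matching the empty string forever
        }
        int from = 0;
        while (true) {
            int start = text.indexOf(term, from);
            if (start < 0) {
                break; // no further occurrences of the term
            }
            int valueStart = start + term.length();
            int lineEnd = text.indexOf('\n', valueStart);
            if (lineEnd < 0) {
                lineEnd = text.length(); // term is on the last line
            }
            String value = text.substring(valueStart, lineEnd).trim();
            if (!value.isEmpty()) { // assumption: empty values are skipped
                results.add(value);
            }
            from = lineEnd; // continue searching after this line
        }
        return results;
    }

    public static void main(String[] args) {
        String paragraph = "STUDY TITLE: Example study\n"
                + "COMPLETION DATE: 12 March 2021\n"
                + "SPONSOR: Example sponsor";
        System.out.println(lineAfterString(paragraph, "COMPLETION DATE:"));
    }
}
```

Running main prints [12 March 2021]: only the remainder of the matching line is captured, and the following SPONSOR line is left alone.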

Your rule must refer to an entity that is defined in DocuMine. For further information, please see Create entity. In the given example, the entity is called "Completion Date"; to write the extraction rule, you need the entity's technical name (here: completion_date).

Code example:

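rule "DOC.36.0: Completion Date"
    when
        // Match every paragraph that contains the search term
        $paragraph: Paragraph(
            containsString("COMPLETION DATE")
        )
    then
        // Create "completion_date" entities from the text that follows
        // "COMPLETION DATE:" up to the end of the line, then tag each
        // entity with the rule ID and a reason
        entityCreationService
            .lineAfterString(
                "COMPLETION DATE:",
                "completion_date",
                EntityType.ENTITY,
                $paragraph
            )
            .forEach(entity -> entity.apply("DOC.36.0", "Completion date found."));
    end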

The following provides a detailed breakdown of the rule syntax:

Syntax

Explanation

rule "DOC.36.0: Completion Date"

Name of the rule

Each rule must have a unique name. For further information, please see Rule naming.

$paragraph: Paragraph(containsString("COMPLETION DATE"))

Filters for paragraph elements that contain the string "COMPLETION DATE" and binds each match to the variable $paragraph.

entityCreationService

Refers to the service responsible for creating entities.

.lineAfterString("COMPLETION DATE:", "completion_date", EntityType.ENTITY, $paragraph)

Invokes the lineAfterString method to create entities named "completion_date" from the text that follows "COMPLETION DATE:" up to the end of the line within $paragraph.

.forEach(entity -> entity.apply("DOC.36.0", "Completion date found."))

Applies the "DOC.36.0" identifier and the message "Completion date found." to each entity created.

Notice

For further information about the methods listed in the table, please refer to the Javadoc.
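To see how the when and then parts of the rule fit together, the following plain-Java sketch mimics the rule without Drools. SketchParagraph, SketchEntity, lineAfterString and run are hypothetical stand-ins, not DocuMine classes or methods. The sketch matches strings case-sensitively, which may differ from how DocuMine's containsString behaves.

```java
import java.util.ArrayList;
import java.util.List;

public class CompletionDateRuleSketch {

    // Hypothetical stand-in for a paragraph node; not DocuMine's Paragraph class.
    record SketchParagraph(String text) {
        boolean containsString(String s) {
            return text.contains(s); // case-sensitive in this sketch
        }
    }

    // Hypothetical stand-in for an entity; records which rule created it and why.
    static final class SketchEntity {
        final String type;
        final String value;
        String ruleId;
        String reason;

        SketchEntity(String type, String value) {
            this.type = type;
            this.value = value;
        }

        void apply(String ruleId, String reason) {
            this.ruleId = ruleId;
            this.reason = reason;
        }
    }

    // Creates one entity per line that contains `term`, using the rest of that line.
    static List<SketchEntity> lineAfterString(String term, String type, SketchParagraph p) {
        List<SketchEntity> entities = new ArrayList<>();
        for (String line : p.text().split("\n")) {
            int idx = line.indexOf(term);
            if (idx >= 0) {
                String value = line.substring(idx + term.length()).trim();
                if (!value.isEmpty()) {
                    entities.add(new SketchEntity(type, value));
                }
            }
        }
        return entities;
    }

    // Mirrors the rule: the "when" part filters paragraphs,
    // the "then" part creates and annotates entities.
    static List<SketchEntity> run(List<SketchParagraph> paragraphs) {
        List<SketchEntity> created = new ArrayList<>();
        for (SketchParagraph p : paragraphs) {
            if (!p.containsString("COMPLETION DATE")) {
                continue; // "when": skip paragraphs without the term
            }
            List<SketchEntity> found = lineAfterString("COMPLETION DATE:", "completion_date", p);
            found.forEach(entity -> entity.apply("DOC.36.0", "Completion date found."));
            created.addAll(found);
        }
        return created;
    }

    public static void main(String[] args) {
        List<SketchParagraph> paragraphs = List.of(
                new SketchParagraph("COMPLETION DATE: 12 March 2021"),
                new SketchParagraph("Study completion date is pending"),
                new SketchParagraph("COMPLETION DATE (planned) 2022"));
        for (SketchEntity e : run(paragraphs)) {
            System.out.println(e.type + " = " + e.value + " [" + e.ruleId + "]");
        }
    }
}
```

Running main prints a single line, completion_date = 12 March 2021 [DOC.36.0]. The third paragraph passes the "when" filter because it contains "COMPLETION DATE", but it yields no entity because the extraction term "COMPLETION DATE:" (with the colon) does not occur in it.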

After the rule has run, the editor shows the following results:

Figure: Results in the document and annotation list