Extract list-like information
The following rule aims to identify and extract specific information from lists. It targets layout elements containing a certain term (here: Completion Date) and creates entities containing the text found between the given term and the end of the line of the matching paragraph.
You need a representation of the entity within DocuMine that your rule can refer to. For further information, please see Create entity. In the given example, the entity is called "Completion Date"; you need the entity’s technical name to write the extraction rule (here: completion_date).
Code example:
rule "DOC.36.0: Completion Date"
    when
        $paragraph: Paragraph( containsString("COMPLETION DATE") )
    then
        entityCreationService
            .lineAfterString( "COMPLETION DATE:", "completion_date", EntityType.ENTITY, $paragraph )
            .forEach(entity -> entity.apply("DOC.36.0", "Completion date found.") );
end
The following provides a detailed breakdown of the rule syntax:
Syntax | Explanation |
---|---|
rule "DOC.36.0: Completion Date" | Name of the rule. Each rule must have a unique name. For further information, please see Rule naming. |
$paragraph: Paragraph( containsString("COMPLETION DATE") ) | Filters for paragraph elements that contain the string "COMPLETION DATE". |
entityCreationService | Invokes the class responsible for creating entities. |
.lineAfterString("COMPLETION DATE:", "completion_date", EntityType.ENTITY, $paragraph) | Invokes the "lineAfterString" method to create an entity named "completion_date" containing the text between the string and the end of the line. |
.forEach(entity -> entity.apply("DOC.36.0", "Completion date found.")) | Applies the "DOC.36.0" identifier and the message "Completion date found." to each entity created. |
Notice
For further information about the methods listed in the table, please refer to the Javadoc.
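To make the "text between the term and the end of the line" behavior more concrete, the following plain-Java sketch imitates it on an ordinary string. It is only an illustration: the class LineAfterStringSketch, its helper method, and the sample paragraph are hypothetical and not part of DocuMine, and the actual lineAfterString method may treat case, whitespace, or repeated matches differently.

```java
import java.util.ArrayList;
import java.util.List;

public class LineAfterStringSketch {

    // Hypothetical helper, NOT the DocuMine implementation.
    // For every line that contains `term`, returns the trimmed text
    // between the first occurrence of `term` and the end of that line.
    public static List<String> lineAfterString(String term, String paragraphText) {
        List<String> results = new ArrayList<>();
        // "\\R" matches any line break sequence (Java 8+).
        for (String line : paragraphText.split("\\R")) {
            int start = line.indexOf(term);
            if (start < 0) {
                continue; // term not on this line
            }
            String value = line.substring(start + term.length()).trim();
            if (!value.isEmpty()) {
                results.add(value);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Made-up sample text for illustration only.
        String paragraph = "STUDY TITLE: Example study\n"
                + "COMPLETION DATE: 12 March 2021\n"
                + "SPONSOR: Example Corp";
        System.out.println(lineAfterString("COMPLETION DATE:", paragraph));
    }
}
```

In the rule itself, each extracted value would become an entity of the type passed as the second argument (here: completion_date), rather than a plain string as in this sketch.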
After the rule has been executed, the editor shows the following results:

Results in the document and annotation list