DocuMine Documentation

Rule condition patterns

A pattern in a rule condition (in the “when” block) defines the criteria to be matched by the decision engine. It can match any fact that has been inserted into the rule engine’s working memory. Constraints within patterns further restrict which facts are matched.

Drools uses the following rule condition patterns:

  • Without constraints

    The simplest pattern is one without constraints. It matches any fact of the given type. For example, the following condition means that a file attribute must exist.

    when
      FileAttribute()
  • With constraints

    A pattern with constraints matches a fact of the given type only if the additional restrictions in parentheses, each of which evaluates to true or false, are met.

    For example, the following condition means that the document must have an “OECD Number” file attribute with one of the given values (402, 403, or 404).

    when
      FileAttribute(label == "OECD Number", valueEqualsAnyOf("402","403","404"))

    You can also combine multiple constraints using constraint operators.

    The following example shows a condition block with two conditions. The first targets documents with the OECD number 402, 403, or 404 (see the example above). The second targets, within these documents, sections whose headline contains the string “Analytical Report” or “Certificate of Analysis” and that include the string “batch”, “bath”, or “barch”. (The obviously incorrect spellings “bath” and “barch” have been added as options to account for typical OCR errors.)

    rule "DOC.9.0: Batch number from Certificate of Analysis"
        when
            FileAttribute(label == "OECD Number", valueEqualsAnyOf("402","403","404"))
            $section: Section(
                (
                    anyHeadlineContainsString("Analytical Report")
                    || anyHeadlineContainsStringIgnoreCase("Certificate of Analysis")
                )
                && (
                    containsStringIgnoreCase("batch")
                    || containsStringIgnoreCase("bath")
                    || containsStringIgnoreCase("barch")
                )
            )
  • With binding variable

    A binding on a pattern serves as a concise reference that can be used in other parts of the rule to refer back to the defined pattern. Binding variables can be used to enhance rule clarity.

    In Drools, it is considered good style to start variable names with the dollar symbol (“$”). This is not strictly necessary, but it helps distinguish variable names from field names. The colon (“:”) binds the variable to the matched fact.

    when
      $paragraph: Paragraph(containsString("Study Initiation Date:"))

    Variables are commonly used in DocuMine to refer back to layout elements that meet certain conditions. In the above example, the “when” section targets paragraphs that contain the string “Study Initiation Date:”. To refer back to paragraphs meeting this condition, you can use the $paragraph variable in the following parts of the rule.

    rule "T.3.0"
        when
            $paragraph: Paragraph(
                containsString("Study Initiation Date:")
            )
        then
            entityCreationService
                .lineAfterString(
                    "Study Initiation Date:",
                    "study_initiation_date",
                    EntityType.ENTITY,
                    $paragraph
                )
                .forEach(entity -> entity.apply("T.3.0", "Study initiation date found."));
        end
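
How the three pattern forms above select facts can be modelled in plain Java. The sketch below is an illustration only, not how Drools or DocuMine implement matching: the `match` helper, the simplified `FileAttribute` and `Paragraph` records, and the sample facts are all assumptions made up for this example.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Illustrative model only: these types are simplified stand-ins, not DocuMine classes.
final class PatternSketch {
    record FileAttribute(String label, String value) {}
    record Paragraph(String text) {}

    // A pattern selects every fact of the given type that satisfies the constraints.
    static <T> List<T> match(List<Object> workingMemory, Class<T> type, Predicate<T> constraints) {
        return workingMemory.stream()
                .filter(type::isInstance)
                .map(type::cast)
                .filter(constraints)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample facts.
        List<Object> workingMemory = List.of(
                new FileAttribute("OECD Number", "402"),
                new FileAttribute("Study Type", "Acute"),
                new Paragraph("Study Initiation Date: (sample text)"));

        // FileAttribute() : no constraints, so every FileAttribute fact matches.
        List<FileAttribute> anyAttribute = match(workingMemory, FileAttribute.class, f -> true);

        // FileAttribute(label == "OECD Number", valueEqualsAnyOf(...)) : both constraints must hold.
        List<FileAttribute> oecd = match(workingMemory, FileAttribute.class,
                f -> f.label().equals("OECD Number")
                        && List.of("402", "403", "404").contains(f.value()));

        // $paragraph: Paragraph(...) : the binding variable refers to each matched fact.
        for (Paragraph $paragraph : match(workingMemory, Paragraph.class,
                p -> p.text().contains("Study Initiation Date:"))) {
            System.out.println("bound: " + $paragraph.text());
        }
        System.out.println(anyAttribute.size() + " attribute(s), "
                + oecd.size() + " with a listed OECD number");
    }
}
```

In this model, the unconstrained pattern returns both file attributes, the constrained pattern returns only the one labelled “OECD Number”, and the loop variable plays the role of the binding that the “then” block would use.
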
Order of conditions and constraints

Notice

Performance tip

If your “when” section includes multiple conditions, it is recommended that you place the most restrictive one at the top and also sort the constraints from most to least restrictive. This saves the rule engine from having to evaluate the entire set of conditions if the most restrictive ones are not met.

Example for the DocuMine use case:

Do not start by targeting layout elements in the text if you do not want your rule to apply to all documents. Instead, start by targeting documents with particular file attributes. This ensures the second condition will only be matched against relevant documents.

rule "DOC.5.0: Ignore species and strain in irrelevant study types"
     when
         FileAttribute(label == "OECD Number", valueEqualsAnyOf("406","428","438"))
         $section: Section(hasEntitiesOfType("species") || hasEntitiesOfType("strain"))
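
The effect of putting the most restrictive condition first can be illustrated with a deliberately simplified Java model. It is not how the Drools engine evaluates rules internally; the two `countChecks…` helpers and their parameters are hypothetical and only count how often a section-level check would run for a single document.

```java
import java.util.List;

// Simplified illustration only; the real engine does not evaluate rules as nested loops.
final class ConditionOrderSketch {
    // Section-level checks for one document when the file-attribute condition
    // is evaluated before the section condition.
    static int countChecksAttributeFirst(String oecdNumber, List<String> sections) {
        if (!List.of("406", "428", "438").contains(oecdNumber)) {
            return 0; // restrictive condition failed: no section is inspected
        }
        return sections.size(); // one check per section
    }

    // Section-level checks when sections are inspected first and the file
    // attribute is only tested afterwards.
    static int countChecksSectionFirst(String oecdNumber, List<String> sections) {
        return sections.size(); // every section is inspected, whatever the document
    }

    public static void main(String[] args) {
        // Made-up document: OECD number 402 is not one of the targeted study types.
        List<String> sections = List.of("Species", "Strain", "Summary");
        System.out.println(countChecksAttributeFirst("402", sections)); // prints 0
        System.out.println(countChecksSectionFirst("402", sections));   // prints 3
    }
}
```

For an irrelevant document, the attribute-first ordering performs no section checks in this model, while the section-first ordering inspects every section before discarding the document.
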
Condition absence and inversion

You can also use negation and inversion to control rule execution:

  • Ensure absence of criteria

    Using 'not' in a Drools rule ensures that actions are triggered only when no objects in the working memory meet the specified condition.

    For example, the following rule activates only if there is no table on the first page containing “Final Report” or “SPL”.

    rule "DOC.6.2: study title"
        when
            not Table(onPage(1), (containsString("Final Report") || containsString("SPL"))) 
  • Target objects that do not fulfill the criteria

    Invert criteria using an exclamation mark to target objects which do not fulfill the criteria.

    The following condition, for example, targets tables that are neither located on the first page nor contain “Final Report” or “SPL”.

    when
      Table(!onPage(1), !(containsString("Final Report") || containsString("SPL")))
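
The difference between the two forms above can be sketched in plain Java: 'not' asks whether any matching fact exists at all, whereas the exclamation mark selects each individual fact that fails the criteria. The `Table` record, the helper names, and the sample tables below are assumptions made up for illustration; they are not DocuMine classes.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative model only; Table is a simplified stand-in for the real layout element.
final class NegationSketch {
    record Table(int page, String text) {}

    static boolean containsMarker(Table t) {
        return t.text().contains("Final Report") || t.text().contains("SPL");
    }

    // Models: not Table(onPage(1), (containsString("Final Report") || containsString("SPL")))
    // True only if no table in working memory meets the criteria.
    static boolean noMatchingTableExists(List<Table> tables) {
        return tables.stream().noneMatch(t -> t.page() == 1 && containsMarker(t));
    }

    // Models: Table(!onPage(1), !(containsString("Final Report") || containsString("SPL")))
    // Returns each table that is not on page 1 and contains neither string.
    static List<Table> tablesFailingCriteria(List<Table> tables) {
        return tables.stream()
                .filter(t -> t.page() != 1 && !containsMarker(t))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample tables.
        List<Table> tables = List.of(
                new Table(1, "Final Report"),
                new Table(2, "Results"),
                new Table(3, "Raw data"));

        System.out.println(noMatchingTableExists(tables));        // prints false
        System.out.println(tablesFailingCriteria(tables).size()); // prints 2
    }
}
```

With these sample tables, the 'not' condition is false because the page-1 table exists, so a rule using it would not activate. The inverted pattern still matches the two tables on pages 2 and 3, once for each table.
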