Version 1.1

Discover the most important new features, fixes, and improvements contained in DocuMine 1.1.

Patches

Patch 1.1.8

Improvement:

Date converter extension:
The date converter now also supports the following formats:
- dd-MMMM-yyyy
- d-MMMM-yyyy
Explanation:
Format
Description
Example
d
Day of the month without leading zero
1st, 2nd, 3rd
dd
Day of the month with suffix and with leading zero
10th, 23rd, 26th
MMMM
Full month name (letters)
January, February, March

Format	Description	Example
d	Day of the month without leading zero	1st, 2nd, 3rd
dd	Day of the month with suffix and with leading zero	10th, 23rd, 26th
MMMM	Full month name (letters)	January, February, March

Patch 1.1.7

Improvement:

Date converter extension:

The date converter now also supports the following formats:

d['th']['st']['nd']['rd'] MMMM yyyy
e.g. 1st May 1988
dd['th']['st']['nd']['rd'] MMMM yyyy
e.g. 26th September 2022

Explanation:

Format	Description	Example
d	Day of the month without leading zero	1, 2, 3
dd	Day of the month with suffix and with leading zero	01, 02, 12,
MMMM	Full month name (letters)	January, February, March
yyyy	Year as a four-digit number	1988, 2003, 2023

Patch 1.1.6

Improvement:

Date converter extension:
The date converter now also supports the following formats:
- MMMM, dd yyyy
- MMMM, d yyyy
- MMMM dd,yyyy
- MMMM d,yyyy
Explanation:
Format
Description
Example
MMMM
Full month name (letters)
January, February, March
d
Day of the month without leading zero
3
dd
Day of the month with leading zero
03

Format	Description	Example
MMMM	Full month name (letters)	January, February, March
d	Day of the month without leading zero	3
dd	Day of the month with leading zero	03

Patch 1.1.5

Fix:

Fixed blank page issue caused by version caching problem

Improvement:

Re-analysis button available for files in error state
The reanalyze button is displayed to dossier owners and assignees for files in error state, helping prevent team blockages.
A file in error state can be identified by the document name being red and not clickable, with the status indicating "Re-processing required".
Re-processing required
When hovering over the list entry, the Analyze file button is displayed.
Analyze file
For further information, please see Document features.

Patch 1.1.4

Improvement:

Date converter extension

The date converter now supports additional formats, including:

d mm yyyy
MMM-d-yyyy
yyyy-MMM-d
yyyymmd
d.MMMM-yyyy
d.MMM.yyyy
m-dd-yyyy
m.dd.yyyy
d MMM yy
yyyy/mm/d
d/mm/yy
yyyy-mm-d
d.MMMM.yy
d.MMM.yy
d-mm-yy
d.MMMM.yyyy
d mm yyyy
yyyy.mm.d.
dd MMM. yyyy
yyyy MMM d

Explanation:

Format	Description	Example
MMM	Abbreviated month (letters)	Jan, Feb, Mar
MMMM	Full month name (letters)	January, February, March
mm	Month in numbers	01, 06, 11
d	Day of the month without leading zero	3
dd	Day of the month with leading zero	03
d['th']['st']['nd']['rd']	Day of the month with suffix	10th, 1st, 2nd, 23rd

Update rule X.0.0: Remove Entity contained by Entity of same type

Please replace rule X.0.0 with the new version below. This update solves the problem that sometimes rules are still matched because the section has not been updated, although the entity has been removed.

// Rule unit: X.0rule "X.0.0: Remove Entity contained by Entity of same type"
    salience 65
    when        
        $larger: TextEntity($type: type(), $entityType: entityType, !removed())        
        $contained: TextEntity(containedBy($larger), type() == $type, entityType == $entityType, this != $larger, !hasManualChanges())    
    then        
        $contained.getIntersectingNodes().forEach(node -> update(node));
        $contained.remove("X.0.0", "remove Entity contained by Entity of same type");
        retract($contained);
    end

The following part has been added to the then section of the rule to solve the problem detailed above. It should be added to all X rules that still do not contain it:

$contained.getIntersectingNodes().forEach(node -> update(node));

Patch 1.1.3

Bug fixes:

Fixed headline and footer detection issues
Headline and footer detection issues have been fixed to ensure more reliable identification of headlines and prevent footers from being mistaken for paragraphs.
The rules will need to be modified to ensure the fixes achieve their full effect. Please find further information on the details in the developer section of these patch notes.
Fixed occasional problems with report download generation
Fixed document loading issue in the document viewer
Fixed issue with dictionary entry annotation

Improvement:

Date detection has been improved

Fix:

Fixed headline and footer detection⸺rule changes required

Headline and footer detection issues have been fixed to ensure more reliable identification of headlines and prevent footers from being mistaken for paragraphs.

Please modify the entity rules as follows to ensure these fixes achieve their full effect:

Add "salience 11" to rule H.0.0
This change guarantees rule H.0.0 is executives before rule X.50.1, which removes entities after tables and appendices, to ensure headlines will be annotated on the pages that follow the TOC.
```
rule "H.0.0: retract table of contents page"
    salience 11
```

Adjust rule X.50.0:

Add the following variable to the "when" block of the rule:

$headline: Headline(containsString("SUMMARY"))

Add the following condition to the $section variable:

&& !anyHeadlineContainsStringIgnoreCase("SUMMARY") && !getSectionIdentifier().isChildOf($headline.getSectionIdentifier())
        )

These changes ensure that this rule does not affect sections related to or beneath "SUMMARY" headlines.

Complete rule:

rule "X.50.0: Remove entities in Tables and Appendix sections"
    when
        $headline: Headline(containsString("SUMMARY"))
        $section: Section(
            (anyHeadlineContainsStringIgnoreCase("Appendix") || anyHeadlineContainsStringIgnoreCase("Appendices") || anyHeadlineContainsStringIgnoreCase("Table") || containsAnyString("APPENDICES", "APPENDIX", "TABLE"))
            && !anyHeadlineContainsStringIgnoreCase("SUMMARY") && !getSectionIdentifier().isChildOf($headline.getSectionIdentifier())
        )
        $child: Section(getParent() == $section || this == $section)
        $entity: TextEntity(!getType().equals("batch_number_coa"),
             intersectingNodes contains $child)
    then
        $entity.remove("X.50.0", "Removed due to being in Appendix/Table");
    end

Adjust rule DOC.41.0

Change the "when" block as follows:

rule "DOC.41.0: Executive summary"    
    when       
        $headline: Headline(containsString("SUMMARY"))        
        $child: Headline(getSectionIdentifier().isChildOf($headline.getSectionIdentifier()) || getSectionIdentifier().equals($headline.getSectionIdentifier()))

Change the parameter of the createHeadlineEntityIfPresent class in the "then" block:

then
    createHeadlineEntityIfPresent((Section) $child.getParent(), "executive_summary", "DOC.41.0", "Executive summary (study design) header found.", entityCreationService);

The changes ensure the rule targets headlines specifically containing "SUMMARY" instead of broader sections. Actions are applied to the parent sections of these headlines, broadening the contextual scope and improving the rule's precision.

Complete rule:

rule "DOC.41.0: Executive summary"
    when
        $headline: Headline(containsString("SUMMARY"))
        $child: Headline(getSectionIdentifier().isChildOf($headline.getSectionIdentifier()) || getSectionIdentifier().equals($headline.getSectionIdentifier()))
    then
        createHeadlineEntityIfPresent((Section) $child.getParent(), "executive_summary", "DOC.41.0", "Executive summary (study design) header found.", entityCreationService);
        entityCreationService.bySemanticNodeParagraphsOnly($child.getParent(), "executive_summary", EntityType.ENTITY)
                .forEach(entity -> {
                    entity.apply("DOC.41.0", "Executive summary (study design) found.", "n-a");
                });
    end

Patch 1.1.2

Security fixes:

Fixed cross-site scripting vulnerability that could have led to an account takeover
Issue: A malicious PDF file would have been able to execute scripts in a victim's browser, posing risks of session hijacking, website defacement, or redirection to malicious sites.
Resolution: We have enhanced PDF sanitization to better recognize non-standard JavaScript elements, effectively mitigating this risk.
Fixed theoretical clickjacking vulnerability on the dashboard
Issue: Clickjacking allows attackers to trick users into interacting with a page in unintended ways. The former Content Security Policy (CSP) was not restrictive enough to completely prevent clickjacking, should an attacker have figured out a way to exploit this vulnerability.
Resolution: The appropriate CSP 'frame-ancestors' directive has been implemented, sending response headers that instruct the browser to disallow framing from external domains.

Bug fix:

Fixed issue with scrambled text
Issue: If a PDF had a scrambled structure, the text extraction process resulted in scrambled text, despite being handled correctly by the PDF viewer.
Resolution: The PDF processing now includes a descrambling functionality that follows the visual reading order, mimicking the behavior of the PDF viewer.

Patch 1.1.1

For users

General

Dossier

For developers

OpenAPI 3.0

The DocuMine API offers three new endpoints:

GET /api/dossier-templates/{dossierTemplateId}/entity-rules
The endpoint retrieves the entity rules that define components based on a file’s content within the respective dossier.
GET /api/dossier-templates/{dossierTemplateId}/component-rules
The endpoint retrieves the component rules that define components based on a file’s content within the respective dossier.
GET /api/dossier-templates/{dossierTemplateId}/file-attribute-definitions
The endpoint retrieves the file attribute definitions for a given file within the respective dossier.

The API documentation is provided to you as an OpenAPI 3.0 specification. Please ask your support contact for the specification.

For further information on the OpenAPI specification, please see: https://swagger.io/specification/

Entity rules

We recommend adding the following entity rule. It prevents annotations from being skipped when users change the type of an annotation. This will allow the recategorization to be applied directly.

rule "MAN.3.3: Apply recategorization entities by default"    
    salience 128
    when
        $entity: IEntity(getManualOverwrite().getRecategorized().orElse(false))    
    then        
        $entity.apply("MAN.3.3", "Recategorized entities are applied by default.");    
    end

We recommend changing rule MAN.1.0 as follows:

rule "MAN.1.0: Apply id removals that are valid and not in forced redactions to Entity"    
    salience 128    
    when
        $idRemoval: IdRemoval(
            $id: annotationId,
            !removeFromDictionary,
            !removeFromAllDossiers,
            status == AnnotationStatus.APPROVED)
        $entityToBeRemoved: TextEntity(matchesAnnotationId($id))
    then
        $entityToBeRemoved.getManualOverwrite().addChange($idRemoval);
        update($entityToBeRemoved);
        retract($idRemoval);
        $entityToBeRemoved.getIntersectingNodes().forEach(node -> update(node));
    end

Behavior so far: If you remove a dictionary-based annotation manually and then add this same annotation again, the annotation is skipped.

New behavior: The annotation is applied and not skipped.

In this section:

DocuMine Documentation

Version 1.1

Patches

Patch 1.1.8

Patch 1.1.7

Patch 1.1.6

Patch 1.1.5

Patch 1.1.4

Patch 1.1.3

Patch 1.1.2

Patch 1.1.1

For users

General

Dossier

For developers

OpenAPI 3.0

Entity rules

Search results