Class Document
java.lang.Object
com.iqser.red.service.redaction.v1.server.model.document.nodes.AbstractSemanticNode
com.iqser.red.service.redaction.v1.server.model.document.nodes.Document
- All Implemented Interfaces:
- com.iqser.red.service.redaction.v1.server.model.document.nodes.GenericSemanticNode, SemanticNode
public class Document
extends com.iqser.red.service.redaction.v1.server.model.document.nodes.AbstractSemanticNode
Represents the entire document as a node within the document's semantic structure.
-
Nested Class Summary
Nested Classes
static class Document.DocumentBuilder<C extends Document, B extends Document.DocumentBuilder<C, B>>
Nested classes/interfaces inherited from class com.iqser.red.service.redaction.v1.server.model.document.nodes.AbstractSemanticNode
com.iqser.red.service.redaction.v1.server.model.document.nodes.AbstractSemanticNode.AbstractSemanticNodeBuilder<C extends com.iqser.red.service.redaction.v1.server.model.document.nodes.AbstractSemanticNode, B extends com.iqser.red.service.redaction.v1.server.model.document.nodes.AbstractSemanticNode.AbstractSemanticNodeBuilder<C, B>>
-
Constructor Summary
Constructors
Document()
-
Method Summary
Modifier and Type, Method, and Description
static Document.DocumentBuilder<?, ?> builder()
boolean equals
getAllSections
    Gets the sections of the document as a list.
getBBox()
    If this node is a leaf, calculates the bounding box of its LeafTextBlock; otherwise calculates the union of the bounding boxes of all its children.
getChildrenOfTypeSectionOrSuperSection
    Gets the direct children of type SECTION or SUPER_SECTION of the document as a list of SemanticNode objects.
getHeadline
    Traverses the tree upward until it reaches a Headline, or a Section, in which case the first Headline among that Section's children is returned.
getMainSections
    Deprecated, for removal: This API element is subject to removal in a future version. This method is marked for removal.
getNumberOfPages
getPages()
    Each AtomicTextBlock is assigned a page, so to get the pages this node appears on, it collects the PageNodes from each AtomicTextBlock belonging to this node's TextBlock.
getSectionIdentifier
    Returns the SectionIdentifier as a child of the SectionIdentifier returned by the getHeadline() method.
getTreeId
    The id is a List of Integers uniquely identifying this node in the DocumentTree.
getType()
    Returns the type of this node, such as Section, Paragraph, etc.
int hashCode()
void setNumberOfPages(Integer numberOfPages)
void setPages
void setTreeId
    This should only be used during graph construction.
streamAllImages
    Streams all image nodes contained within the document.
Stream<com.iqser.red.service.redaction.v1.server.model.document.textblock.TextBlock> streamTerminalTextBlocksInOrder
    Streams all terminal (leaf) text blocks within the document in their natural order.
toString()
Methods inherited from class com.iqser.red.service.redaction.v1.server.model.document.nodes.AbstractSemanticNode
getBBoxCache, getDocumentTree, getEngines, getEntities, getTextBlock, setBBoxCache, setDocumentTree, setEngines, setEntities, setTextBlock
Methods inherited from interface com.iqser.red.service.redaction.v1.server.model.document.nodes.SemanticNode
accept, addEngine, addThisToEntityIfIntersects, containsAllStrings, containsAllStringsIgnoreCase, containsAllWords, containsAllWordsIgnoreCase, containsAnyString, containsAnyString, containsAnyStringIgnoreCase, containsAnyWord, containsAnyWordIgnoreCase, containsRectangle, containsString, containsStringIgnoreCase, containsWord, containsWordIgnoreCase, getEntitiesOfType, getEntitiesOfType, getEntitiesOfType, getFirstPage, getHighestParent, getLeafTextBlock, getNextSibling, getNumberOnPage, getPages, getParent, getPositionsPerPage, getPreviousSibling, getTextBlocksByPageNumbers, getTextRange, hasEntitiesOfAllTypes, hasEntitiesOfAnyType, hasEntitiesOfType, hasParent, hasText, intersectsRectangle, isLeaf, length, matchesRegex, matchesRegexIgnoreCase, onlyOnPage, onPage, setLeafTextBlock, streamAllSubNodes, streamAllSubNodesOfType, streamChildren, streamChildrenOfType, streamValidEntities
-
Constructor Details
-
Document
public Document()
-
Method Details
-
getType
Description copied from interface: SemanticNode
Returns the type of this node, such as Section, Paragraph, etc.
- Returns:
- NodeType of this node
-
getAllSections
Gets the sections of the document as a list.
- Returns:
- A list of all sections within the document.
-
getMainSections
Deprecated, for removal: This API element is subject to removal in a future version.
This method is marked for removal. Use SemanticNode.streamChildrenOfType(NodeType) instead, or getChildrenOfTypeSectionOrSuperSection(), which returns children of type SECTION as well as SUPER_SECTION.
Gets the main sections of the document as a list.
- Returns:
- A list of main sections within the document
-
getChildrenOfTypeSectionOrSuperSection
Gets the direct children of type SECTION or SUPER_SECTION of the document as a list of SemanticNode objects.
- Returns:
- A list of all children of type SECTION or SUPER_SECTION.
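The filtering described above can be illustrated with a small, self-contained sketch. It does not use the library's classes: the NodeType enum, the Node class, and the helper names below are hypothetical stand-ins, and the real method may be implemented differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in types; the library's own NodeType and SemanticNode are not used here.
final class SectionChildrenSketch {

    enum NodeType { DOCUMENT, SUPER_SECTION, SECTION, HEADLINE, PARAGRAPH }

    static final class Node {
        final NodeType type;
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(NodeType type, String label) {
            this.type = type;
            this.label = label;
        }

        // Appends a child and returns this node so calls can be chained.
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Keeps only the direct children whose type is SECTION or SUPER_SECTION.
    static List<Node> childrenOfTypeSectionOrSuperSection(Node document) {
        return document.children.stream()
                .filter(child -> child.type == NodeType.SECTION || child.type == NodeType.SUPER_SECTION)
                .collect(Collectors.toList());
    }

    static Node sampleDocument() {
        Node methods = new Node(NodeType.SUPER_SECTION, "2 Methods")
                .add(new Node(NodeType.SECTION, "2.1 Sampling")); // nested, so not a direct child
        return new Node(NodeType.DOCUMENT, "document")
                .add(new Node(NodeType.HEADLINE, "Title"))
                .add(new Node(NodeType.SECTION, "1 Introduction"))
                .add(methods);
    }

    public static void main(String[] args) {
        for (Node node : childrenOfTypeSectionOrSuperSection(sampleDocument())) {
            System.out.println(node.type + " " + node.label);
        }
    }
}
```

In this sketch the nested section "2.1 Sampling" is not returned, because only direct children of the document node are inspected.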
-
streamTerminalTextBlocksInOrder
public Stream<com.iqser.red.service.redaction.v1.server.model.document.textblock.TextBlock> streamTerminalTextBlocksInOrder()
Streams all terminal (leaf) text blocks within the document in their natural order.
- Returns:
- A stream of terminal TextBlock objects.
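As a rough illustration of streaming leaf text blocks in order, the sketch below flattens a small tree depth-first, left to right. It assumes that "natural order" corresponds to this traversal order, which the description above does not confirm; the Node class and its leaf and branch factories are hypothetical stand-ins for the library's types.

```java
import java.util.List;
import java.util.stream.Stream;

// Hypothetical stand-in for a document tree whose leaves carry text.
final class TerminalTextBlockSketch {

    static final class Node {
        final String text;         // non-null only for leaves
        final List<Node> children; // empty for leaves

        private Node(String text, List<Node> children) {
            this.text = text;
            this.children = children;
        }

        static Node leaf(String text) {
            return new Node(text, List.of());
        }

        static Node branch(Node... children) {
            return new Node(null, List.of(children));
        }
    }

    // Depth-first, left-to-right flattening: a leaf yields its own text,
    // a branch yields the concatenated streams of its children.
    static Stream<String> streamLeafTextInOrder(Node node) {
        if (node.text != null) {
            return Stream.of(node.text);
        }
        return node.children.stream().flatMap(TerminalTextBlockSketch::streamLeafTextInOrder);
    }

    static Node sampleDocument() {
        return Node.branch(
                Node.leaf("Title"),
                Node.branch(
                        Node.leaf("First paragraph."),
                        Node.branch(Node.leaf("Cell A"), Node.leaf("Cell B"))),
                Node.leaf("Footer"));
    }

    public static void main(String[] args) {
        streamLeafTextInOrder(sampleDocument()).forEach(System.out::println);
    }
}
```

Because the stream is built lazily with flatMap, callers can stop early (for example with findFirst) without visiting the whole tree.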
-
getTreeId
Description copied from interface: SemanticNode
The id is a List of Integers uniquely identifying this node in the DocumentTree.
- Specified by:
- getTreeId in interface SemanticNode
- Overrides:
- getTreeId in class com.iqser.red.service.redaction.v1.server.model.document.nodes.AbstractSemanticNode
- Returns:
- the DocumentTree ID
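One way a list of integers can uniquely identify a node is as a path of child indices from the root. The sketch below shows that idea only; whether the library's tree IDs are built this way is an assumption, and the Node class, addChild, and resolve are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: a tree ID as the path of child indices from the root.
final class TreeIdSketch {

    static final class Node {
        final String label;
        final List<Integer> treeId; // assumed: empty list for the root
        final List<Node> children = new ArrayList<>();

        Node(String label, List<Integer> treeId) {
            this.label = label;
            this.treeId = treeId;
        }

        // The child's ID is the parent's ID extended by the child's index.
        Node addChild(String childLabel) {
            List<Integer> childId = new ArrayList<>(treeId);
            childId.add(children.size());
            Node child = new Node(childLabel, List.copyOf(childId));
            children.add(child);
            return child;
        }
    }

    // Follows the indices in the ID from the root down to the identified node.
    static Node resolve(Node root, List<Integer> treeId) {
        Node current = root;
        for (int index : treeId) {
            current = current.children.get(index);
        }
        return current;
    }

    public static void main(String[] args) {
        Node root = new Node("document", List.of());
        root.addChild("section 0");
        Node paragraph = root.addChild("section 1").addChild("paragraph");
        System.out.println(paragraph.treeId + " -> " + resolve(root, paragraph.treeId).label);
    }
}
```

Under this assumption, two nodes can never share an ID, since each ID encodes exactly one path through the tree.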
-
setTreeId
Description copied from interface: SemanticNode
This should only be used during graph construction.
- Specified by:
- setTreeId in interface SemanticNode
- Overrides:
- setTreeId in class com.iqser.red.service.redaction.v1.server.model.document.nodes.AbstractSemanticNode
- Parameters:
- tocId - List of Integers
-
getSectionIdentifier
Description copied from interface: SemanticNode
Returns the SectionIdentifier as a child of the SectionIdentifier returned by the getHeadline() method.
- Returns:
- The SectionIdentifier from the first Headline.
-
getHeadline
Description copied from interface: SemanticNode
Traverses the tree upward until it reaches a Headline, or a Section, in which case the first Headline among that Section's children is returned. If no Headline is found this way, it continues upward and tries again until it reaches the root, where it performs a breadth-first search (BFS). If no Headline exists anywhere in the Document, a dummy Headline is returned.
- Returns:
- First Headline found.
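The lookup order described above can be sketched as follows. This is one reading of the description, not the library's implementation: the Node class, the NodeType values, and the empty-text dummy headline are hypothetical, and details such as how the real dummy Headline is constructed are not specified here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in types illustrating the headline lookup order.
final class HeadlineSketch {

    enum NodeType { DOCUMENT, SECTION, HEADLINE, PARAGRAPH }

    static final class Node {
        final NodeType type;
        final String text;
        final Node parent;
        final List<Node> children = new ArrayList<>();

        Node(NodeType type, String text, Node parent) {
            this.type = type;
            this.text = text;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);
            }
        }
    }

    static Node findHeadline(Node start) {
        Node current = start;
        // Step 1: walk upward; a Headline is returned directly,
        // a Section contributes its first Headline child if it has one.
        while (true) {
            if (current.type == NodeType.HEADLINE) {
                return current;
            }
            if (current.type == NodeType.SECTION) {
                for (Node child : current.children) {
                    if (child.type == NodeType.HEADLINE) {
                        return child;
                    }
                }
            }
            if (current.parent == null) {
                break; // reached the root
            }
            current = current.parent;
        }
        // Step 2: breadth-first search from the root.
        Deque<Node> queue = new ArrayDeque<>();
        queue.add(current);
        while (!queue.isEmpty()) {
            Node node = queue.poll();
            if (node.type == NodeType.HEADLINE) {
                return node;
            }
            queue.addAll(node.children);
        }
        // Step 3: no Headline anywhere; return a placeholder (assumed shape).
        return new Node(NodeType.HEADLINE, "", null);
    }

    public static void main(String[] args) {
        Node doc = new Node(NodeType.DOCUMENT, "doc", null);
        Node sectionA = new Node(NodeType.SECTION, "A", doc);
        new Node(NodeType.HEADLINE, "Headline A", sectionA);
        Node sectionB = new Node(NodeType.SECTION, "B", doc);
        Node paragraphB = new Node(NodeType.PARAGRAPH, "para B", sectionB);
        System.out.println(findHeadline(paragraphB).text);
    }
}
```

In the usage example, section B has no Headline of its own, so the upward walk fails and the breadth-first search from the root supplies the Headline of section A.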
-
streamAllImages
Streams all image nodes contained within the document.
- Returns:
- A stream of Image nodes.
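A minimal sketch of the idea, assuming that streaming all images amounts to walking every node in the tree and keeping those of an image type. The Node class, the NodeType values, and streamSelfAndDescendants are hypothetical names, not the library's API.

```java
import java.util.List;
import java.util.stream.Stream;

// Hypothetical stand-in types; not the library's Image or SemanticNode classes.
final class ImageStreamSketch {

    enum NodeType { DOCUMENT, SECTION, PARAGRAPH, IMAGE }

    static final class Node {
        final NodeType type;
        final String label;
        final List<Node> children;

        Node(NodeType type, String label, Node... children) {
            this.type = type;
            this.label = label;
            this.children = List.of(children);
        }
    }

    // Pre-order stream of a node followed by all of its descendants.
    static Stream<Node> streamSelfAndDescendants(Node node) {
        return Stream.concat(
                Stream.of(node),
                node.children.stream().flatMap(ImageStreamSketch::streamSelfAndDescendants));
    }

    // Keeps only the nodes of type IMAGE, at any depth.
    static Stream<Node> streamAllImages(Node document) {
        return streamSelfAndDescendants(document).filter(node -> node.type == NodeType.IMAGE);
    }

    static Node sampleDocument() {
        return new Node(NodeType.DOCUMENT, "document",
                new Node(NodeType.SECTION, "section 1",
                        new Node(NodeType.PARAGRAPH, "text"),
                        new Node(NodeType.IMAGE, "figure 1")),
                new Node(NodeType.IMAGE, "logo"));
    }

    public static void main(String[] args) {
        streamAllImages(sampleDocument()).forEach(image -> System.out.println(image.label));
    }
}
```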
-
toString
- Overrides:
- toString in class com.iqser.red.service.redaction.v1.server.model.document.nodes.AbstractSemanticNode
-
getBBox
Description copied from interface: SemanticNode
If this node is a leaf, calculates the bounding box of its LeafTextBlock; otherwise calculates the union of the bounding boxes of all its children. If called on the Document, returns the cropbox of each page.
- Specified by:
- getBBox in interface SemanticNode
- Overrides:
- getBBox in class com.iqser.red.service.redaction.v1.server.model.document.nodes.AbstractSemanticNode
- Returns:
- Rectangle2D fully encapsulating this Node for each page.
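The leaf and union cases can be sketched with java.awt.geom.Rectangle2D, whose createUnion method returns the smallest rectangle containing two others. The sketch keys boxes by an integer page number, which is an assumption (the description only says "for each page"), and it leaves out the Document special case that returns page cropboxes. The Node class and the bBox helper are hypothetical.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for per-page bounding boxes on a node tree.
final class BBoxSketch {

    static final class Node {
        final Map<Integer, Rectangle2D> leafBoxes; // page number -> box; null for inner nodes
        final List<Node> children = new ArrayList<>();

        Node(Map<Integer, Rectangle2D> leafBoxes) {
            this.leafBoxes = leafBoxes;
        }

        static Node leaf(int page, double x, double y, double width, double height) {
            Rectangle2D box = new Rectangle2D.Double(x, y, width, height);
            return new Node(Map.of(page, box));
        }

        static Node inner(Node... children) {
            Node node = new Node(null);
            node.children.addAll(List.of(children));
            return node;
        }
    }

    // Leaf: its own box per page. Inner node: per-page union of the children's boxes.
    static Map<Integer, Rectangle2D> bBox(Node node) {
        if (node.leafBoxes != null) {
            return node.leafBoxes;
        }
        Map<Integer, Rectangle2D> unionPerPage = new TreeMap<>();
        for (Node child : node.children) {
            bBox(child).forEach((page, box) -> unionPerPage.merge(page, box, Rectangle2D::createUnion));
        }
        return unionPerPage;
    }

    public static void main(String[] args) {
        Node section = Node.inner(
                Node.leaf(1, 10, 10, 100, 20),
                Node.leaf(1, 50, 40, 100, 20),
                Node.leaf(2, 0, 0, 5, 5));
        bBox(section).forEach((page, box) -> System.out.println("page " + page + ": " + box));
    }
}
```

Keeping one rectangle per page avoids merging boxes from different pages into a single, meaningless rectangle when a node spans a page break.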
-
builder
-
getPages
Description copied from interface: SemanticNode
Each AtomicTextBlock is assigned a page, so to get the pages this node appears on, it collects the PageNodes from each AtomicTextBlock belonging to this node's TextBlock.
- Returns:
- Set of PageNodes this node appears on.
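The collection step described above can be sketched in a few lines. The AtomicBlock class below is a hypothetical stand-in that carries a plain page number rather than a page node, and pagesOf is an illustrative helper, not a library method.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in: each atomic block knows the single page it sits on.
final class PagesSketch {

    static final class AtomicBlock {
        final String text;
        final int pageNumber;

        AtomicBlock(String text, int pageNumber) {
            this.text = text;
            this.pageNumber = pageNumber;
        }
    }

    // Collects the distinct pages of all atomic blocks, keeping first-seen order.
    static Set<Integer> pagesOf(List<AtomicBlock> atomicBlocks) {
        return atomicBlocks.stream()
                .map(block -> block.pageNumber)
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        List<AtomicBlock> blocks = List.of(
                new AtomicBlock("A paragraph that starts on page 1", 1),
                new AtomicBlock("and continues", 1),
                new AtomicBlock("onto page 2.", 2));
        System.out.println(pagesOf(blocks));
    }
}
```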
-
getNumberOfPages
-
setPages
-
setNumberOfPages
-
equals
- Overrides:
- equals in class com.iqser.red.service.redaction.v1.server.model.document.nodes.AbstractSemanticNode
-
hashCode
public int hashCode()
- Overrides:
- hashCode in class com.iqser.red.service.redaction.v1.server.model.document.nodes.AbstractSemanticNode
-