Interface SemanticNode
- All Known Implementing Classes:
com.iqser.red.service.redaction.v1.server.model.document.nodes.AbstractSemanticNode, Document, Header, Headline, Image, Paragraph, Section, Table, TableCell
public interface SemanticNode
-
Method Summary
Modifier and Type | Method | Description
default void | accept(com.iqser.red.service.redaction.v1.server.service.document.NodeVisitor visitor) | Accepts a NodeVisitor and initiates a depth-first traversal of the semantic tree rooted at this node.
default void | addEngine(com.knecon.fforesight.service.layoutparser.internal.api.data.redaction.LayoutEngineProto.LayoutEngine engine) |
default void | addThisToEntityIfIntersects(com.iqser.red.service.redaction.v1.server.model.document.entity.TextEntity textEntity) | Used during insertion of entities into the graph; checks whether the TextRange of this node intersects or even contains the TextRange of the given TextEntity.
default boolean | containsAllStrings(String... strings) | Checks whether this SemanticNode contains all the provided Strings.
default boolean | containsAllStringsIgnoreCase(String... strings) | Checks whether this SemanticNode contains all the provided Strings case-insensitive.
default boolean | containsAllWords(String... words) | Checks whether this SemanticNode contains all the provided Strings as words.
default boolean | containsAllWordsIgnoreCase(String... words) | Checks whether this SemanticNode contains all the provided Strings as words case-insensitive.
default boolean | containsAnyString(String... strings) | Checks whether this SemanticNode contains any of the provided Strings.
default boolean | containsAnyString(List<String> strings) | Checks whether this SemanticNode contains any of the provided Strings.
default boolean | containsAnyStringIgnoreCase(String... strings) | Checks whether this SemanticNode contains any of the provided Strings case-insensitive.
default boolean | containsAnyWord(String... words) | Checks whether this SemanticNode contains any of the provided Strings as a word.
default boolean | containsAnyWordIgnoreCase(String... words) | Checks whether this SemanticNode contains any of the provided Strings as a word case-insensitive.
default boolean | containsRectangle(Rectangle2D rectangle2D, Integer pageNumber) | Checks whether the Bounding Box of this SemanticNode contains the provided rectangle on the provided page.
default boolean | containsString(String string) | Checks whether this SemanticNode contains the provided String.
default boolean | containsStringIgnoreCase(String string) | Checks whether this SemanticNode contains the provided String case-insensitive.
default boolean | containsWord(String word) | Checks whether this SemanticNode contains exactly the provided String as a word.
default boolean | containsWordIgnoreCase(String word) | Checks whether this SemanticNode contains exactly the provided String as a word case-insensitive.
default Map<Page, Rectangle2D> | getBBox() | If this Node is a Leaf it will calculate the boundingBox of its LeafTextBlock, otherwise it will calculate the Union of the BoundingBoxes of all its Children.
com.iqser.red.service.redaction.v1.server.model.document.DocumentTree | getDocumentTree() | Returns the DocumentTree Object.
Set<com.knecon.fforesight.service.layoutparser.internal.api.data.redaction.LayoutEngineProto.LayoutEngine> | getEngines() |
Set<com.iqser.red.service.redaction.v1.server.model.document.entity.TextEntity> | getEntities() | Any Node maintains its own Set of Entities.
default List<com.iqser.red.service.redaction.v1.server.model.document.entity.TextEntity> | getEntitiesOfType(String type) | Returns a List of Entities in this SemanticNode which are of the provided type such as "CBI_author".
default List<com.iqser.red.service.redaction.v1.server.model.document.entity.TextEntity> | getEntitiesOfType(String... types) | Returns a List of Entities in this SemanticNode which have any of the provided types.
default List<com.iqser.red.service.redaction.v1.server.model.document.entity.TextEntity> | getEntitiesOfType(List<String> types) | Returns a List of Entities in this SemanticNode which have any of the provided types such as "CBI_author".
default Page | getFirstPage | Finds the first page associated with this Node.
default Headline | getHeadline() | Traverses the Tree up, until it hits a Headline or hits a Section which will then return the first Headline from its children.
default SemanticNode | getHighestParent |
default com.iqser.red.service.redaction.v1.server.model.document.textblock.TextBlock | getLeafTextBlock() | Leaf means a SemanticNode has direct access to a TextBlock; by default this is false and must be overridden.
default Optional<SemanticNode> | getNextSibling | Returns the next sibling node of this SemanticNode in the document tree, if any.
default Integer | getNumberOnPage | Each AtomicTextBlock has an index on its page; this returns the number of the first AtomicTextBlock underneath this node.
 | getPages() | Each AtomicTextBlock is assigned a page, so to get the pages this node appears on, it collects the PageNodes from each AtomicTextBlock belonging to this node's TextBlock.
 | getPages | Each AtomicTextBlock is assigned a page, so to get the pages for this TextRange, it collects the PageNodes from each AtomicTextBlock belonging to this node's TextBlock.
default SemanticNode | getParent |
default Map<Page, List<Rectangle2D>> | getPositionsPerPage(TextRange textRange) | For a given TextRange this function returns a List of rectangles around the text in the range.
default Optional<SemanticNode> | getPreviousSibling | Returns the previous sibling node of this SemanticNode in the document tree, if any.
default SectionIdentifier | getSectionIdentifier | Returns the SectionIdentifier as a child of the SectionIdentifier returned by the getHeadline() method.
default com.iqser.red.service.redaction.v1.server.model.document.textblock.TextBlock | getTextBlock() | Searches all Nodes located underneath this Node in the DocumentTree and concatenates their AtomicTextBlocks into a single TextBlock.
default List<com.iqser.red.service.redaction.v1.server.model.document.textblock.TextBlock> | getTextBlocksByPageNumbers(Set<Integer> pageNumbers) | Searches all Nodes located underneath this Node in the DocumentTree that are found on the given pages.
default TextRange | getTextRange | The TextRange is the start and end string offsets in the reading order of the document.
 | getTreeId | The id is a List of Integers uniquely identifying this node in the DocumentTree.
NodeType | getType() | Returns the type of this node, such as Section, Paragraph, etc.
default boolean | hasEntitiesOfAllTypes(String... types) | Checks whether this SemanticNode has at least one Entity of each of the provided types.
default boolean | hasEntitiesOfAnyType(String... types) | Checks whether this SemanticNode has any Entity of the provided types.
default boolean | hasEntitiesOfType(String type) | Checks whether this SemanticNode has any Entity of the provided type.
default boolean | hasParent() | Checks if its TreeId has a length greater than zero.
default boolean | hasText() | Checks if the SemanticNode contains any text.
default boolean | intersectsRectangle(int x, int y, int w, int h, int pageNumber) | Checks whether this SemanticNode intersects the provided rectangle.
default boolean | isLeaf() | Leaf means a SemanticNode has direct access to a TextBlock; by default this is false and must be overridden.
default int | length() | Returns the length of the text content in this Node's TextBlock.
default boolean | matchesRegex(String regexPattern) | Checks whether this SemanticNode matches the provided regex pattern.
default boolean | matchesRegexIgnoreCase(String regexPattern) | Checks whether this SemanticNode matches the provided regex pattern case-insensitive.
default boolean | onlyOnPage(Page page) | Checks whether this SemanticNode appears on a single page only, and if that page is the provided one.
default boolean | onPage(int pageNumber) | Checks if the given page number exists in the list of pages.
default void | setLeafTextBlock(com.iqser.red.service.redaction.v1.server.model.document.textblock.TextBlock textBlock) | Should only be used during construction of the Graph.
void | setTreeId | This should only be used during graph construction.
default Stream<SemanticNode> | streamAllSubNodes | Recursively streams all SemanticNodes located underneath this node in the DocumentTree in order.
default Stream<SemanticNode> | streamAllSubNodesOfType(NodeType nodeType) | Recursively streams all SemanticNodes of the provided type located underneath this node in the DocumentTree in order.
default Stream<SemanticNode> | streamChildren | Streams all children located directly underneath this node in the DocumentTree.
default Stream<SemanticNode> | streamChildrenOfType(NodeType nodeType) | Streams all children located directly underneath this node in the DocumentTree of the provided type.
default Stream<com.iqser.red.service.redaction.v1.server.model.document.entity.TextEntity> | streamValidEntities() | A view of the Entity Set of this SemanticNode including only the active (APPLIED or SKIPPED) Entities which are of a valid type (ENTITY or HINT).
-
Method Details
-
getType
NodeType getType()Returns the type of this node, such as Section, Paragraph, etc.- Returns:
- NodeType of this node
-
getTextBlock
default com.iqser.red.service.redaction.v1.server.model.document.textblock.TextBlock getTextBlock()Searches all Nodes located underneath this Node in the DocumentTree and concatenates their AtomicTextBlocks into a single TextBlock. So, for a Section all TextBlocks of Subsections, Paragraphs, and Tables are concatenated into a single TextBlock If the Node is a Leaf, the LeafTextBlock will be returned instead.- Returns:
- TextBlock containing all AtomicTextBlocks that are located under this Node.
-
getTextBlocksByPageNumbers
default List<com.iqser.red.service.redaction.v1.server.model.document.textblock.TextBlock> getTextBlocksByPageNumbers(Set<Integer> pageNumbers) Searches all Nodes located underneath this Node in the DocumentTree that are found on the given pages. Then consecutive AtomicTextBlocks are concatenated where possible and the list of the resulting TextBlocks is returned.- Returns:
- List of TextBlocks containing all AtomicTextBlocks that are located under this Node on the given pages.
-
getEntities
Set<com.iqser.red.service.redaction.v1.server.model.document.entity.TextEntity> getEntities()Any Node maintains its own Set of Entities. This Set contains all Entities whose TextRange intersects the TextRange of this node.- Returns:
- Set of all Entities associated with this Node
-
streamValidEntities
default Stream<com.iqser.red.service.redaction.v1.server.model.document.entity.TextEntity> streamValidEntities()
A view of the Entity Set of this SemanticNode including only the active (APPLIED or SKIPPED) Entities which are of a valid type (ENTITY or HINT). This view is used by all functions that check for the existence of an Entity, such as hasEntitiesOfType().
- Returns:
- Stream of valid TextEntities
-
getPages
Each AtomicTextBlock is assigned a page, so to get the pages this node appears on, it collects the PageNodes from each AtomicTextBlock belonging to this node's TextBlock.- Returns:
- Set of PageNodes this node appears on.
-
getFirstPage
Finds the first page associated with this Node.
- Returns:
- The first Page this node appears on.
-
getPages
Each AtomicTextBlock is assigned a page, so to get the pages for this TextRange, it collects the PageNodes from each AtomicTextBlock belonging to this node's TextBlock.- Returns:
- Set of PageNodes this node appears on.
-
onPage
default boolean onPage(int pageNumber) Checks if the given page number exists in the list of pages.- Parameters:
pageNumber
- the page number to be checked- Returns:
- true if the page number exists, otherwise false
-
getDocumentTree
com.iqser.red.service.redaction.v1.server.model.document.DocumentTree getDocumentTree()Returns the DocumentTree Object.- Returns:
- the DocumentTree of the Document this node belongs to
-
getTreeId
The id is a List of Integers uniquely identifying this node in the DocumentTree.- Returns:
- the DocumentTree ID
-
setTreeId
This should only be used during graph construction.- Parameters:
tocId
- List of Integers
-
getHeadline
Traverses the Tree up, until it hits a Headline or hits a Section which will then return the first Headline from its children. If no Headline is found this way, it will recursively traverse the tree up and try again until it hits the root, where it will perform a BFS. If no Headline exists anywhere in the Document a dummy Headline is returned.- Returns:
- First Headline found.
-
getSectionIdentifier
Returns the SectionIdentifier as a child of the SectionIdentifier returned by the getHeadline() method.- Returns:
- The SectionIdentifier from the first Headline.
-
hasParent
default boolean hasParent()Checks if its TreeId has a length greater than zero.- Returns:
- boolean indicating whether this Node has a Parent in the DocumentTree
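The relationship between the tree ID and hasParent() can be illustrated with a small, self-contained sketch. It assumes that the root carries an empty ID and that a parent's ID is the child's ID without its last element; the second point is an assumption made for illustration and is not stated in this page. TreeIdSketch and its methods are hypothetical names.

```java
import java.util.List;

// Hypothetical helper, not part of the documented API.
final class TreeIdSketch {

    // Mirrors the documented rule: a node has a parent if its ID is non-empty.
    static boolean hasParent(List<Integer> treeId) {
        return !treeId.isEmpty();
    }

    // Assumption: the parent's ID is the child's ID minus its last index.
    static List<Integer> parentId(List<Integer> treeId) {
        if (treeId.isEmpty()) {
            throw new IllegalStateException("the root has no parent");
        }
        return treeId.subList(0, treeId.size() - 1);
    }

    public static void main(String[] args) {
        System.out.println(hasParent(List.of()));      // false
        System.out.println(parentId(List.of(0, 2)));   // [0]
    }
}
```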
-
getParent
- Returns:
- The SemanticNode representing the parent in the DocumentTree. Throws NotFoundException when no parent is present.
-
getHighestParent
- Returns:
- The ancestor of this node that is located directly underneath the Document. If this node is itself directly underneath the Document, or is the Document, it returns itself.
-
getNextSibling
Returns the next sibling node of this SemanticNode in the document tree, if any. If there is no next sibling node, an empty Optional is returned.- Returns:
- Optional containing the next sibling node, or empty if there is none
-
getPreviousSibling
Returns the previous sibling node of this SemanticNode in the document tree, if any. If there is no previous sibling node, an empty Optional is returned.- Returns:
- Optional containing the previous sibling node, or empty if there is none
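The Optional-based sibling lookup can be sketched as follows. This is a minimal illustration that assumes siblings are held in an ordered list and addressed by index; SiblingSketch and its methods are hypothetical and do not reflect how the DocumentTree stores its nodes.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of the documented API.
final class SiblingSketch {

    // Next element in the sibling list, or empty when index is the last one.
    static <T> Optional<T> next(List<T> siblings, int index) {
        return index + 1 < siblings.size() ? Optional.of(siblings.get(index + 1)) : Optional.empty();
    }

    // Previous element in the sibling list, or empty when index is the first one.
    static <T> Optional<T> previous(List<T> siblings, int index) {
        return index > 0 ? Optional.of(siblings.get(index - 1)) : Optional.empty();
    }

    public static void main(String[] args) {
        List<String> siblings = List.of("Headline", "Paragraph", "Table");
        System.out.println(next(siblings, 0).orElse("none"));
        System.out.println(previous(siblings, 0).orElse("none"));
    }
}
```

Callers would typically chain on the result, for example with ifPresent or map, rather than test for null.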
-
isLeaf
default boolean isLeaf()Leaf means a SemanticNode has direct access to a TextBlock, by default this is false and must be overridden. Currently only Sections, Images, and Tables are not leaves. A TableCell might be a leaf depending on its area compared to the page.- Returns:
- boolean, indicating if a Node has direct access to a TextBlock
-
getLeafTextBlock
default com.iqser.red.service.redaction.v1.server.model.document.textblock.TextBlock getLeafTextBlock()
Leaf means a SemanticNode has direct access to a TextBlock; by default this is false and must be overridden. Currently only Sections and Tables are not leaves.
- Returns:
- AtomicTextBlock
-
setLeafTextBlock
default void setLeafTextBlock(com.iqser.red.service.redaction.v1.server.model.document.textblock.TextBlock textBlock) Should only be used during construction of the Graph. Sets the LeafTextBlock of this SemanticNode.- Parameters:
textBlock
- the TextBlock to set as the LeafTextBlock of this SemanticNode
-
hasEntitiesOfType
Checks whether this SemanticNode has any Entity of the provided type. Ignores entities with ignored == true or removed == true.
- Parameters:
type
- string representing the type of entity to check for- Returns:
- true, if this SemanticNode has at least one Entity of the provided type
-
hasEntitiesOfAnyType
Checks whether this SemanticNode has any Entity of the provided types. Ignores entities with ignored == true or removed == true.
- Parameters:
types
- an array of strings representing the types of entities to check for- Returns:
- true, if this SemanticNode has at least one Entity of the provided types
-
hasEntitiesOfAllTypes
Checks whether this SemanticNode has at least one Entity of each of the provided types. Ignores entities with ignored == true or removed == true.
- Parameters:
types
- an array of strings representing the types of entities to check for- Returns:
- true, if this SemanticNode has at least one Entity of each of the provided types
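The difference between the "any type" and "all types" checks can be shown with a short sketch over plain strings. It assumes the entity types present on a node have already been collected into a set; EntityTypeSketch is a hypothetical helper, and "CBI_address" and "PII" are made-up type names used only for the example.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;

// Hypothetical helper, not part of the documented API.
final class EntityTypeSketch {

    // True if at least one of the requested types is present.
    static boolean hasAny(Collection<String> presentTypes, String... types) {
        return Arrays.stream(types).anyMatch(presentTypes::contains);
    }

    // True only if every requested type is present.
    static boolean hasAll(Collection<String> presentTypes, String... types) {
        return Arrays.stream(types).allMatch(presentTypes::contains);
    }

    public static void main(String[] args) {
        Set<String> present = Set.of("CBI_author", "CBI_address");
        System.out.println(hasAny(present, "CBI_author", "PII")); // true
        System.out.println(hasAll(present, "CBI_author", "PII")); // false
    }
}
```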
-
getEntitiesOfType
default List<com.iqser.red.service.redaction.v1.server.model.document.entity.TextEntity> getEntitiesOfType(String type)
Returns a List of Entities in this SemanticNode which are of the provided type, such as "CBI_author". Ignores entities that are not active, i.e. those with ignored == true or removed == true.
- Parameters:
type
- string representing the type of entities to return
- Returns:
- List of TextEntities of the provided type
-
getEntitiesOfType
default List<com.iqser.red.service.redaction.v1.server.model.document.entity.TextEntity> getEntitiesOfType(List<String> types) Returns a List of Entities in this SemanticNode which have any of the provided types such as "CBI_author". Ignores Entity that are not valid.- Parameters:
types
- A list of strings representing the types of entities to return- Returns:
- List of TextEntities of any of the provided types
-
getEntitiesOfType
default List<com.iqser.red.service.redaction.v1.server.model.document.entity.TextEntity> getEntitiesOfType(String... types) Returns a List of Entities in this SemanticNode which have any of the provided types. Ignores Entity that are not valid.- Parameters:
types
- An array of strings representing the types of entities to return
- Returns:
- List of TextEntities that match any of the provided types
-
getNumberOnPage
Each AtomicTextBlock has an index on its page; this returns the number of the first AtomicTextBlock underneath this node. If this node does not have any AtomicTextBlocks underneath it, e.g. an empty TableCell, it returns -1.
- Returns:
- Integer representing the number on the page
-
hasText
default boolean hasText()Checks if the SemanticNode contains any text.- Returns:
- true, if this node's TextBlock is not empty
-
containsString
Checks whether this SemanticNode contains the provided String.- Parameters:
string
- A String which the TextBlock might contain- Returns:
- true, if this node's TextBlock contains the string
-
getEngines
Set<com.knecon.fforesight.service.layoutparser.internal.api.data.redaction.LayoutEngineProto.LayoutEngine> getEngines() -
addEngine
default void addEngine(com.knecon.fforesight.service.layoutparser.internal.api.data.redaction.LayoutEngineProto.LayoutEngine engine) -
containsAllStrings
Checks whether this SemanticNode contains all the provided Strings.- Parameters:
strings
- A List of Strings which the TextBlock might contain- Returns:
- true, if this node's TextBlock contains all strings
-
containsAnyString
Checks whether this SemanticNode contains any of the provided Strings.- Parameters:
strings
- A List of Strings to check if they are contained in the TextBlock- Returns:
- true, if this node's TextBlock contains any of the provided strings
-
containsAnyString
Checks whether this SemanticNode contains any of the provided Strings.- Parameters:
strings
- A List of Strings which the TextBlock might contain- Returns:
- true, if this node's TextBlock contains any of the strings
-
containsStringIgnoreCase
Checks whether this SemanticNode contains the provided String case-insensitive.
- Parameters:
string
- A String which the TextBlock might contain- Returns:
- true, if this node's TextBlock contains the string case-insensitive
-
containsAnyStringIgnoreCase
Checks whether this SemanticNode contains any of the provided Strings case-insensitive.- Parameters:
strings
- A List of Strings which the TextBlock might contain- Returns:
- true, if this node's TextBlock contains any of the strings
-
containsAllStringsIgnoreCase
Checks whether this SemanticNode contains all the provided Strings case-insensitive.
- Parameters:
strings
- A List of Strings which the TextBlock might contain
- Returns:
- true, if this node's TextBlock contains all of the strings case-insensitive
-
containsWord
Checks whether this SemanticNode contains exactly the provided String as a word.- Parameters:
word
- String which the TextBlock might contain
- Returns:
- true, if this node's TextBlock contains the string as a word
-
containsWordIgnoreCase
Checks whether this SemanticNode contains exactly the provided String as a word case-insensitive.- Parameters:
word
- String which the TextBlock might contain
- Returns:
- true, if this node's TextBlock contains the string as a word case-insensitive
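The distinction between a substring check and a whole-word check can be sketched with regular-expression word boundaries. This is one plausible way to implement word matching and is an assumption; the page does not say how containsWord decides where a word starts and ends. WordMatchSketch is a hypothetical helper operating on plain strings.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the documented API.
final class WordMatchSketch {

    // Plain substring check: also matches inside longer words.
    static boolean containsString(String text, String s) {
        return text.contains(s);
    }

    // Whole-word check: the quoted word must be delimited by \b word boundaries.
    static boolean containsWord(String text, String word) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text).find();
    }

    // Same as containsWord, but ignoring case (including non-ASCII letters).
    static boolean containsWordIgnoreCase(String text, String word) {
        int flags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b", flags).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsString("The authors agreed", "author")); // true
        System.out.println(containsWord("The authors agreed", "author"));   // false
    }
}
```

Quoting the search term with Pattern.quote keeps characters such as "." or "(" from being interpreted as regex syntax.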
-
containsAnyWord
Checks whether this SemanticNode contains any of the provided Strings as a word.- Parameters:
words
- A List of Strings which the TextBlock might contain
- Returns:
- true, if this node's TextBlock contains any of the provided strings
-
containsAnyWordIgnoreCase
Checks whether this SemanticNode contains any of the provided Strings as a word case-insensitive.- Parameters:
words
- A List of Strings which the TextBlock might contain
- Returns:
- true, if this node's TextBlock contains any of the provided strings
-
containsAllWords
Checks whether this SemanticNode contains all the provided Strings as words.
- Parameters:
words
- A List of Strings which the TextBlock might contain
- Returns:
- true, if this node's TextBlock contains all the provided strings
-
containsAllWordsIgnoreCase
Checks whether this SemanticNode contains all the provided Strings as words case-insensitive.
- Parameters:
words
- A List of Strings which the TextBlock might contain
- Returns:
- true, if this node's TextBlock contains all the provided strings
-
matchesRegex
Checks whether this SemanticNode matches the provided regex pattern.- Parameters:
regexPattern
- A String representing a regex pattern, which the TextBlock might contain- Returns:
- true, if this node's TextBlock contains the regex pattern
-
matchesRegexIgnoreCase
Checks whether this SemanticNode matches the provided regex pattern case-insensitive.- Parameters:
regexPattern
- A String representing a regex pattern, which the TextBlock might contain- Returns:
- true, if this node's TextBlock contains the regex pattern case-insensitive
-
intersectsRectangle
default boolean intersectsRectangle(int x, int y, int w, int h, int pageNumber)
Checks whether this SemanticNode intersects the provided rectangle.
- Parameters:
x
- the lower left corner X value
y
- the lower left corner Y value
w
- width
h
- height
pageNumber
- the pageNumber of the rectangle
- Returns:
- true if intersects, false otherwise
-
addThisToEntityIfIntersects
default void addThisToEntityIfIntersects(com.iqser.red.service.redaction.v1.server.model.document.entity.TextEntity textEntity)
This function is used during insertion of entities into the graph. It checks whether the TextRange of this node intersects or even contains the TextRange of the given TextEntity. It sets the fields accordingly and recursively calls this function on all its children.
- Parameters:
textEntity
- the TextEntity which is being inserted into the graph
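The "intersects or even contains" distinction can be made concrete with a small range class. The sketch assumes half-open offsets [start, end); that convention, along with the RangeSketch class itself, is an assumption for illustration and not taken from the real TextRange.

```java
// Hypothetical stand-in for a text range; not the real TextRange class.
final class RangeSketch {
    final int start; // inclusive offset
    final int end;   // exclusive offset (assumed convention)

    RangeSketch(int start, int end) {
        if (start > end) {
            throw new IllegalArgumentException("start must not exceed end");
        }
        this.start = start;
        this.end = end;
    }

    // True if the two ranges share at least one offset.
    boolean intersects(RangeSketch other) {
        return start < other.end && other.start < end;
    }

    // True if the other range lies completely inside this one.
    boolean contains(RangeSketch other) {
        return start <= other.start && other.end <= end;
    }

    public static void main(String[] args) {
        RangeSketch node = new RangeSketch(0, 100);
        RangeSketch entity = new RangeSketch(90, 110);
        System.out.println(node.intersects(entity)); // true
        System.out.println(node.contains(entity));   // false
    }
}
```

An entity that straddles a node boundary therefore intersects both neighbouring nodes but is contained only by a common ancestor.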
-
streamChildren
Streams all children located directly underneath this node in the DocumentTree.- Returns:
- Stream of all children
-
streamChildrenOfType
Streams all children located directly underneath this node in the DocumentTree of the provided type.- Returns:
- Stream of all children
-
streamAllSubNodes
Recursively streams all SemanticNodes located underneath this node in the DocumentTree in order.- Returns:
- Stream of all SubNodes
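The difference between streaming direct children and recursively streaming all sub-nodes can be sketched with a tiny tree. The sketch assumes the recursive stream excludes the starting node and yields descendants in document order, as the description suggests; StreamNode is a hypothetical stand-in for SemanticNode.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for a tree node; not the real SemanticNode.
final class StreamNode {
    final String name;
    final List<StreamNode> children;

    StreamNode(String name, StreamNode... children) {
        this.name = name;
        this.children = List.of(children);
    }

    // Direct children only.
    Stream<StreamNode> streamChildren() {
        return children.stream();
    }

    // Every descendant: each child followed by that child's own descendants.
    Stream<StreamNode> streamAllSubNodes() {
        return children.stream()
                .flatMap(child -> Stream.concat(Stream.of(child), child.streamAllSubNodes()));
    }

    static StreamNode sample() {
        return new StreamNode("Document",
                new StreamNode("Section", new StreamNode("Headline"), new StreamNode("Paragraph")),
                new StreamNode("Table"));
    }

    public static void main(String[] args) {
        List<String> names = sample().streamAllSubNodes().map(n -> n.name).collect(Collectors.toList());
        System.out.println(names); // [Section, Headline, Paragraph, Table]
    }
}
```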
-
streamAllSubNodesOfType
Recursively streams all SemanticNodes of the provided type located underneath this node in the DocumentTree in order.- Returns:
- Stream of all SubNodes
-
getTextRange
The TextRange is the start and end string offsets in the reading order of the document.- Returns:
- TextRange of this Node's TextBlock
-
length
default int length()Returns the length of the text content in this Node's TextBlock.- Returns:
- The length of the text content
-
getPositionsPerPage
For a given TextRange this function returns a List of rectangles around the text in the range. These Rectangles are split either by a new line or by a large gap in the current line. This is mainly used to find the positions of TextEntities.
- Parameters:
textRange
- A TextRange to calculate the positions for.- Returns:
- A Map, where the keys are the pages and the values are a list of rectangles describing the position of words
-
getBBox
If this Node is a Leaf, it will calculate the bounding box of its LeafTextBlock; otherwise it will calculate the union of the bounding boxes of all its children. If called on the Document, it will return the cropbox of each page.
- Returns:
- Rectangle2D fully encapsulating this Node for each page.
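The union of child bounding boxes per page can be sketched with java.awt.geom.Rectangle2D. For simplicity the sketch keys the map by page number rather than by Page object; that simplification and the BBoxSketch helper are assumptions for illustration.

```java
import java.awt.geom.Rectangle2D;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the documented API.
final class BBoxSketch {

    // Merges the per-page boxes of several children into one enclosing box per page.
    static Map<Integer, Rectangle2D> union(List<Map<Integer, Rectangle2D>> childBoxes) {
        Map<Integer, Rectangle2D> result = new HashMap<>();
        for (Map<Integer, Rectangle2D> boxes : childBoxes) {
            // createUnion returns the smallest rectangle enclosing both arguments.
            boxes.forEach((page, box) -> result.merge(page, box, Rectangle2D::createUnion));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Rectangle2D> first = Map.of(1, new Rectangle2D.Double(0, 0, 10, 10));
        Map<Integer, Rectangle2D> second = Map.of(1, new Rectangle2D.Double(5, 5, 10, 10));
        System.out.println(union(List.of(first, second)).get(1));
    }
}
```

A node spanning two pages ends up with two entries, one enclosing rectangle for each page.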
-
containsRectangle
Checks whether the Bounding Box of this SemanticNode contains the provided rectangle on the provided page.- Parameters:
rectangle2D
- The rectangle to check if it is contained
pageNumber
- The Page number on which the rectangle should be checked
- Returns:
- true if the Bounding Box contains the rectangle on the given page, false otherwise
-
accept
default void accept(com.iqser.red.service.redaction.v1.server.service.document.NodeVisitor visitor)
Accepts a NodeVisitor and initiates a depth-first traversal of the semantic tree rooted at this node. The visitor's NodeVisitor.visit(SemanticNode) method is invoked for each node encountered during the traversal.
- Parameters:
visitor
- The NodeVisitor to accept and apply during the traversal.
- See Also:
- NodeVisitor
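A depth-first visitor traversal of this kind can be sketched as follows. The sketch assumes a pre-order walk, visiting a node before its children; the page only states that the traversal is depth-first. SketchVisitor, SketchNode and VisitorDemo are hypothetical stand-ins for NodeVisitor and SemanticNode.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for NodeVisitor.
interface SketchVisitor {
    void visit(SketchNode node);
}

// Hypothetical stand-in for SemanticNode.
final class SketchNode {
    final String name;
    final List<SketchNode> children;

    SketchNode(String name, SketchNode... children) {
        this.name = name;
        this.children = List.of(children);
    }

    // Pre-order depth-first traversal: this node first, then each child subtree in order.
    void accept(SketchVisitor visitor) {
        visitor.visit(this);
        for (SketchNode child : children) {
            child.accept(visitor);
        }
    }
}

final class VisitorDemo {
    // Records the names of all nodes in the order the visitor sees them.
    static List<String> visitOrder() {
        SketchNode document = new SketchNode("Document",
                new SketchNode("Section", new SketchNode("Headline"), new SketchNode("Paragraph")),
                new SketchNode("Table"));
        List<String> order = new ArrayList<>();
        document.accept(node -> order.add(node.name));
        return order;
    }

    public static void main(String[] args) {
        System.out.println(visitOrder()); // [Document, Section, Headline, Paragraph, Table]
    }
}
```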
-
onlyOnPage
Checks whether this SemanticNode appears on a single page only, and if that page is the provided one.
- Parameters:
page
- the page to check- Returns:
- true, when SemanticNode is on a single page only and the page is the provided page. Otherwise, false.
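The contrast between onPage and onlyOnPage can be sketched over a plain set of page numbers. PageCheckSketch is a hypothetical helper; the real methods work on the node's Page objects rather than on integers.

```java
import java.util.Set;

// Hypothetical helper, not part of the documented API.
final class PageCheckSketch {

    // True if the node appears on the given page, possibly among others.
    static boolean onPage(Set<Integer> pages, int pageNumber) {
        return pages.contains(pageNumber);
    }

    // True only if the node appears on exactly one page and it is the given one.
    static boolean onlyOnPage(Set<Integer> pages, int pageNumber) {
        return pages.size() == 1 && pages.contains(pageNumber);
    }

    public static void main(String[] args) {
        System.out.println(onPage(Set.of(3, 4), 3));     // true
        System.out.println(onlyOnPage(Set.of(3, 4), 3)); // false
    }
}
```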
-