All Known Implementing Classes:
com.iqser.red.service.redaction.v1.server.model.document.nodes.AbstractSemanticNode, Document, Header, Headline, Image, Paragraph, Section, Table, TableCell

public interface SemanticNode
  • Method Summary

    Modifier and Type
    Method
    Description
    default void
    accept(com.iqser.red.service.redaction.v1.server.service.document.NodeVisitor visitor)
    Accepts a NodeVisitor and initiates a depth-first traversal of the semantic tree rooted at this node.
    default void
    addEngine(com.knecon.fforesight.service.layoutparser.internal.api.data.redaction.LayoutEngineProto.LayoutEngine engine)
     
    default void
    addThisToEntityIfIntersects(com.iqser.red.service.redaction.v1.server.model.document.entity.TextEntity textEntity)
    This function is used during insertion of EntityNodes into the graph; it checks whether the TextRange of this node intersects or even contains the TextEntity.
    default boolean
    containsAllStrings(String... strings)
    Checks whether this SemanticNode contains all the provided Strings.
    default boolean
    containsAllStringsIgnoreCase(String... strings)
    Checks whether this SemanticNode contains all the provided Strings case-insensitive.
    default boolean
    containsAllWords(String... words)
    Checks whether this SemanticNode contains all the provided Strings as words.
    default boolean
    containsAllWordsIgnoreCase(String... words)
    Checks whether this SemanticNode contains all the provided Strings as words case-insensitive.
    default boolean
    containsAnyString(List<String> strings)
    Checks whether this SemanticNode contains any of the provided Strings.
    default boolean
    containsAnyString(String... strings)
    Checks whether this SemanticNode contains any of the provided Strings.
    default boolean
    containsAnyStringIgnoreCase(String... strings)
    Checks whether this SemanticNode contains any of the provided Strings case-insensitive.
    default boolean
    containsAnyWord(String... words)
    Checks whether this SemanticNode contains any of the provided Strings as a word.
    default boolean
    containsAnyWordIgnoreCase(String... words)
    Checks whether this SemanticNode contains any of the provided Strings as a word case-insensitive.
    default boolean
    containsRectangle(Rectangle2D rectangle2D, Integer pageNumber)
    Checks whether the Bounding Box of this SemanticNode contains the provided rectangle on the provided page.
    default boolean
    containsString(String string)
    Checks whether this SemanticNode contains the provided String.
    default boolean
    containsStringIgnoreCase(String string)
    Checks whether this SemanticNode contains the provided String case-insensitive.
    default boolean
    containsWord(String word)
    Checks whether this SemanticNode contains exactly the provided String as a word.
    default boolean
    containsWordIgnoreCase(String word)
    Checks whether this SemanticNode contains exactly the provided String as a word case-insensitive.
    default Map<Page,Rectangle2D>
    getBBox()
    If this Node is a Leaf it will calculate the boundingBox of its LeafTextBlock, otherwise it will calculate the Union of the BoundingBoxes of all its Children.
    com.iqser.red.service.redaction.v1.server.model.document.DocumentTree
    getDocumentTree()
    Returns the DocumentTree Object.
    Set<com.knecon.fforesight.service.layoutparser.internal.api.data.redaction.LayoutEngineProto.LayoutEngine>
    getEngines()
     
    Set<com.iqser.red.service.redaction.v1.server.model.document.entity.TextEntity>
    getEntities()
    Any Node maintains its own Set of Entities.
    default List<com.iqser.red.service.redaction.v1.server.model.document.entity.TextEntity>
    getEntitiesOfType(String type)
    Returns a List of Entities in this SemanticNode which are of the provided type such as "CBI_author".
    default List<com.iqser.red.service.redaction.v1.server.model.document.entity.TextEntity>
    getEntitiesOfType(String... types)
    Returns a List of Entities in this SemanticNode which have any of the provided types.
    default List<com.iqser.red.service.redaction.v1.server.model.document.entity.TextEntity>
    getEntitiesOfType(List<String> types)
    Returns a List of Entities in this SemanticNode which have any of the provided types such as "CBI_author".
    default Page
    getFirstPage()
    Finds the first page associated with this Node.
    default Headline
    getHeadline()
    Traverses the Tree up, until it hits a Headline or hits a Section which will then return the first Headline from its children.
    default SemanticNode
    getHighestParent()
     
    default com.iqser.red.service.redaction.v1.server.model.document.textblock.TextBlock
    getLeafTextBlock()
    Leaf means a SemanticNode has direct access to a TextBlock; by default this is false and must be overridden.
    default Optional<SemanticNode>
    getNextSibling()
    Returns the next sibling node of this SemanticNode in the document tree, if any.
    default Integer
    getNumberOnPage()
    Each AtomicTextBlock has an index on its page; this returns the number of the first AtomicTextBlock underneath this node.
    default Set<Page>
    getPages()
    Each AtomicTextBlock is assigned a page, so to get the pages this node appears on, it collects the PageNodes from each AtomicTextBlock belonging to this node's TextBlock.
    default Set<Page>
    getPages(TextRange textRange)
    Each AtomicTextBlock is assigned a page, so to get the pages for this TextRange, it collects the PageNodes from each AtomicTextBlock belonging to this node's TextBlock.
    default SemanticNode
    getParent()
     
    default Map<Page,List<Rectangle2D>>
    getPositionsPerPage(TextRange textRange)
    For a given TextRange this function returns a List of rectangles around the text in the range.
    default Optional<SemanticNode>
    getPreviousSibling()
    Returns the previous sibling node of this SemanticNode in the document tree, if any.
    default SectionIdentifier
    getSectionIdentifier()
    Returns the SectionIdentifier as a child of the SectionIdentifier returned by the getHeadline() method.
    default com.iqser.red.service.redaction.v1.server.model.document.textblock.TextBlock
    getTextBlock()
    Searches all Nodes located underneath this Node in the DocumentTree and concatenates their AtomicTextBlocks into a single TextBlock.
    default List<com.iqser.red.service.redaction.v1.server.model.document.textblock.TextBlock>
    getTextBlocksByPageNumbers(Set<Integer> pageNumbers)
    Searches all Nodes located underneath this Node in the DocumentTree that are found on the given pages.
    default TextRange
    getTextRange()
    The TextRange is the start and end string offsets in the reading order of the document.
    List<Integer>
    getTreeId()
    The id is a List of Integers uniquely identifying this node in the DocumentTree.
    NodeType
    getType()
    Returns the type of this node, such as Section, Paragraph, etc.
    default boolean
    hasEntitiesOfAllTypes(String... types)
    Checks whether this SemanticNode has at least one Entity of each of the provided types.
    default boolean
    hasEntitiesOfAnyType(String... types)
    Checks whether this SemanticNode has any Entity of the provided types.
    default boolean
    hasEntitiesOfType(String type)
    Checks whether this SemanticNode has any Entity of the provided type.
    default boolean
    hasParent()
    Checks if its TreeId has a length greater than zero.
    default boolean
    hasText()
    Checks if the SemanticNode contains any text.
    default boolean
    intersectsRectangle(int x, int y, int w, int h, int pageNumber)
    Checks whether this SemanticNode intersects the provided rectangle.
    default boolean
    isLeaf()
    Leaf means a SemanticNode has direct access to a TextBlock; by default this is false and must be overridden.
    default int
    length()
    Returns the length of the text content in this Node's TextBlock.
    default boolean
    matchesRegex(String regexPattern)
    Checks whether this SemanticNode matches the provided regex pattern.
    default boolean
    matchesRegexIgnoreCase(String regexPattern)
    Checks whether this SemanticNode matches the provided regex pattern case-insensitive.
    default boolean
    onlyOnPage(Page page)
    Checks whether this SemanticNode appears on a single page only, and if that page is the provided one.
    default boolean
    onPage(int pageNumber)
    Checks if the given page number exists in the list of pages.
    default void
    setLeafTextBlock(com.iqser.red.service.redaction.v1.server.model.document.textblock.TextBlock textBlock)
    Should only be used during construction of the Graph.
    void
    setTreeId(List<Integer> tocId)
    This should only be used during graph construction.
    default Stream<SemanticNode>
    streamAllSubNodes()
    Recursively streams all SemanticNodes located underneath this node in the DocumentTree in order.
    default Stream<SemanticNode>
    streamAllSubNodesOfType(NodeType nodeType)
    Recursively streams all SemanticNodes of the provided type located underneath this node in the DocumentTree in order.
    default Stream<SemanticNode>
    streamChildren()
    Streams all children located directly underneath this node in the DocumentTree.
    default Stream<SemanticNode>
    streamChildrenOfType(NodeType nodeType)
    Streams all children located directly underneath this node in the DocumentTree of the provided type.
    default Stream<com.iqser.red.service.redaction.v1.server.model.document.entity.TextEntity>
    streamValidEntities()
    A view of the Entity Set of this SemanticNode including only the active (APPLIED or SKIPPED) Entities which are of a valid type (ENTITY or HINT).
  • Method Details

    • getType

      NodeType getType()
      Returns the type of this node, such as Section, Paragraph, etc.
      Returns:
      NodeType of this node
    • getTextBlock

      default com.iqser.red.service.redaction.v1.server.model.document.textblock.TextBlock getTextBlock()
      Searches all Nodes located underneath this Node in the DocumentTree and concatenates their AtomicTextBlocks into a single TextBlock. So, for a Section, all TextBlocks of Subsections, Paragraphs, and Tables are concatenated into a single TextBlock. If the Node is a Leaf, the LeafTextBlock will be returned instead.
      Returns:
      TextBlock containing all AtomicTextBlocks that are located under this Node.
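      The leaf versus non-leaf behaviour described above can be sketched as follows. This is only an illustration: TextNodeSketch is a hypothetical stand-in, and text is modelled as a plain String instead of a TextBlock.

```java
import java.util.List;

// Hypothetical stand-in for a node; not the real SemanticNode or TextBlock types.
class TextNodeSketch {
    private final String leafText;               // non-null only for leaves
    private final List<TextNodeSketch> children; // empty for leaves

    private TextNodeSketch(String leafText, List<TextNodeSketch> children) {
        this.leafText = leafText;
        this.children = children;
    }

    static TextNodeSketch leaf(String text) {
        return new TextNodeSketch(text, List.of());
    }

    static TextNodeSketch branch(TextNodeSketch... children) {
        return new TextNodeSketch(null, List.of(children));
    }

    boolean isLeaf() {
        return leafText != null;
    }

    // A leaf returns its own text; any other node concatenates the text of its children in order.
    String getText() {
        if (isLeaf()) {
            return leafText;
        }
        StringBuilder sb = new StringBuilder();
        for (TextNodeSketch child : children) {
            sb.append(child.getText());
        }
        return sb.toString();
    }
}
```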
    • getTextBlocksByPageNumbers

      default List<com.iqser.red.service.redaction.v1.server.model.document.textblock.TextBlock> getTextBlocksByPageNumbers(Set<Integer> pageNumbers)
      Searches all Nodes located underneath this Node in the DocumentTree that are found on the given pages. Then consecutive AtomicTextBlocks are concatenated where possible and the list of the resulting TextBlocks is returned.
      Returns:
      List of TextBlocks containing all AtomicTextBlocks that are located under this Node on the given pages.
    • getEntities

      Set<com.iqser.red.service.redaction.v1.server.model.document.entity.TextEntity> getEntities()
      Any Node maintains its own Set of Entities. This Set contains all Entities whose TextRange intersects the TextRange of this node.
      Returns:
      Set of all Entities associated with this Node
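      What "intersects" means for two ranges can be sketched as below. RangeSketch is a hypothetical stand-in for TextRange, and the half-open interval [start, end) is an assumption; the real class may define its boundaries differently.

```java
// Hypothetical stand-in for TextRange, assuming half-open offsets [start, end).
class RangeSketch {
    final int start;
    final int end;

    RangeSketch(int start, int end) {
        if (start > end) {
            throw new IllegalArgumentException("start must not exceed end");
        }
        this.start = start;
        this.end = end;
    }

    // Two half-open ranges intersect when each one starts before the other ends.
    boolean intersects(RangeSketch other) {
        return start < other.end && other.start < end;
    }

    // A range contains another when it starts no later and ends no earlier.
    boolean contains(RangeSketch other) {
        return start <= other.start && other.end <= end;
    }
}
```

      Under this assumption an entity spanning offsets 5 to 15 would be added to a node spanning 0 to 10, but not to a node spanning 15 to 20.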
    • streamValidEntities

      default Stream<com.iqser.red.service.redaction.v1.server.model.document.entity.TextEntity> streamValidEntities()
      A view of the Entity Set of this SemanticNode including only the active (APPLIED or SKIPPED) Entities which are of a valid type (ENTITY or HINT). This is used by all functions that check for the existence of an Entity, such as hasEntitiesOfType().
      Returns:
      Stream of valid TextEntities
    • getPages

      default Set<Page> getPages()
      Each AtomicTextBlock is assigned a page, so to get the pages this node appears on, it collects the PageNodes from each AtomicTextBlock belonging to this node's TextBlock.
      Returns:
      Set of PageNodes this node appears on.
    • getFirstPage

      default Page getFirstPage()
      Finds the first page associated with this Node.
      Returns:
      The first Page this node appears on.
    • getPages

      default Set<Page> getPages(TextRange textRange)
      Each AtomicTextBlock is assigned a page, so to get the pages for this TextRange, it collects the PageNodes from each AtomicTextBlock belonging to this node's TextBlock.
      Returns:
      Set of PageNodes the provided TextRange appears on.
    • onPage

      default boolean onPage(int pageNumber)
      Checks if the given page number exists in the list of pages.
      Parameters:
      pageNumber - the page number to be checked
      Returns:
      true if the page number exists, otherwise false
    • getDocumentTree

      com.iqser.red.service.redaction.v1.server.model.document.DocumentTree getDocumentTree()
      Returns the DocumentTree Object.
      Returns:
      the DocumentTree of the Document this node belongs to
    • getTreeId

      List<Integer> getTreeId()
      The id is a List of Integers uniquely identifying this node in the DocumentTree.
      Returns:
      the DocumentTree ID
    • setTreeId

      void setTreeId(List<Integer> tocId)
      This should only be used during graph construction.
      Parameters:
      tocId - List of Integers
    • getHeadline

      default Headline getHeadline()
      Traverses the Tree up until it hits a Headline, or hits a Section, which will then return the first Headline from its children. If no Headline is found this way, it will recursively traverse the tree up and try again until it hits the root, where it will perform a BFS. If no Headline exists anywhere in the Document, a dummy Headline is returned.
      Returns:
      First Headline found.
    • getSectionIdentifier

      default SectionIdentifier getSectionIdentifier()
      Returns the SectionIdentifier as a child of the SectionIdentifier returned by the getHeadline() method.
      Returns:
      The SectionIdentifier from the first Headline.
    • hasParent

      default boolean hasParent()
      Checks if its TreeId has a length greater than zero.
      Returns:
      boolean indicating whether this Node has a Parent in the DocumentTree
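      A minimal sketch of how a List-based tree id can answer hasParent(). The parentId helper is hypothetical and assumes that a child's id is its parent's id plus one trailing index; the source only states that the root check is based on the id's length.

```java
import java.util.List;

class TreeIdSketch {

    // Mirrors the documented check: a node has a parent when its id is non-empty.
    static boolean hasParent(List<Integer> treeId) {
        return !treeId.isEmpty();
    }

    // Assumption: the parent's id is the child's id without its last element.
    static List<Integer> parentId(List<Integer> treeId) {
        if (treeId.isEmpty()) {
            throw new IllegalStateException("the root has no parent");
        }
        return List.copyOf(treeId.subList(0, treeId.size() - 1));
    }
}
```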
    • getParent

      default SemanticNode getParent()
      Returns:
      The SemanticNode representing the Parent in the DocumentTree. Throws NotFoundException when no parent is present.
    • getHighestParent

      default SemanticNode getHighestParent()
      Returns:
      The SemanticNode that is directly underneath the Document and contains this node. If this node is itself the highest child node or the Document, it returns itself.
    • getNextSibling

      default Optional<SemanticNode> getNextSibling()
      Returns the next sibling node of this SemanticNode in the document tree, if any. If there is no next sibling node, an empty Optional is returned.
      Returns:
      Optional containing the next sibling node, or empty if there is none
    • getPreviousSibling

      default Optional<SemanticNode> getPreviousSibling()
      Returns the previous sibling node of this SemanticNode in the document tree, if any. If there is no previous sibling node, an empty Optional is returned.
      Returns:
      Optional containing the previous sibling node, or empty if there is none
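      The Optional-based sibling lookups can be illustrated with a plain list of siblings. The helper names and the index-based lookup are assumptions made for this sketch; the real implementation resolves siblings through the DocumentTree.

```java
import java.util.List;
import java.util.Optional;

class SiblingSketch {

    // Returns the element after the given index, or empty if it is the last one.
    static <T> Optional<T> nextSibling(List<T> siblings, int index) {
        return index + 1 < siblings.size() ? Optional.of(siblings.get(index + 1)) : Optional.empty();
    }

    // Returns the element before the given index, or empty if it is the first one.
    static <T> Optional<T> previousSibling(List<T> siblings, int index) {
        return index > 0 ? Optional.of(siblings.get(index - 1)) : Optional.empty();
    }
}
```

      Returning Optional lets callers chain calls such as map or ifPresent instead of checking for null at the first and last child.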
    • isLeaf

      default boolean isLeaf()
      Leaf means a SemanticNode has direct access to a TextBlock; by default this is false and must be overridden. Currently only Sections, Images, and Tables are not leaves. A TableCell might be a leaf depending on its area compared to the page.
      Returns:
      boolean, indicating if a Node has direct access to a TextBlock
    • getLeafTextBlock

      default com.iqser.red.service.redaction.v1.server.model.document.textblock.TextBlock getLeafTextBlock()
      Leaf means a SemanticNode has direct access to a TextBlock; by default this is false and must be overridden. Currently only Sections and Tables are not leaves.
      Returns:
      AtomicTextBlock
    • setLeafTextBlock

      default void setLeafTextBlock(com.iqser.red.service.redaction.v1.server.model.document.textblock.TextBlock textBlock)
      Should only be used during construction of the Graph. Sets the LeafTextBlock of this SemanticNode.
      Parameters:
      textBlock - the TextBlock to set as the LeafTextBlock of this SemanticNode
    • hasEntitiesOfType

      default boolean hasEntitiesOfType(String type)
      Checks whether this SemanticNode has any Entity of the provided type. Ignores Entities with ignored == true or removed == true.
      Parameters:
      type - string representing the type of entity to check for
      Returns:
      true, if this SemanticNode has at least one Entity of the provided type
    • hasEntitiesOfAnyType

      default boolean hasEntitiesOfAnyType(String... types)
      Checks whether this SemanticNode has any Entity of the provided types. Ignores Entities with ignored == true or removed == true.
      Parameters:
      types - an array of strings representing the types of entities to check for
      Returns:
      true, if this SemanticNode has at least one Entity of the provided types
    • hasEntitiesOfAllTypes

      default boolean hasEntitiesOfAllTypes(String... types)
      Checks whether this SemanticNode has at least one Entity of each of the provided types. Ignores Entities with ignored == true or removed == true.
      Parameters:
      types - an array of strings representing the types of entities to check for
      Returns:
      true, if this SemanticNode has at least one Entity of each of the provided types
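      The difference between the "any" and "all" checks, including the filtering of ignored or removed Entities, can be sketched as follows. EntitySketch and its boolean fields are hypothetical stand-ins for TextEntity, and "example_type" is a made-up type name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for TextEntity with only the fields this sketch needs.
class EntitySketch {
    final String type;
    final boolean ignored;
    final boolean removed;

    EntitySketch(String type, boolean ignored, boolean removed) {
        this.type = type;
        this.ignored = ignored;
        this.removed = removed;
    }
}

class EntityTypeCheckSketch {

    // Collects the types of all entities that are neither ignored nor removed.
    static Set<String> activeTypes(List<EntitySketch> entities) {
        return entities.stream()
                .filter(e -> !e.ignored && !e.removed)
                .map(e -> e.type)
                .collect(Collectors.toSet());
    }

    static boolean hasEntitiesOfAnyType(List<EntitySketch> entities, String... types) {
        Set<String> active = activeTypes(entities);
        return Arrays.stream(types).anyMatch(active::contains);
    }

    static boolean hasEntitiesOfAllTypes(List<EntitySketch> entities, String... types) {
        Set<String> active = activeTypes(entities);
        return Arrays.stream(types).allMatch(active::contains);
    }

    // Sample data: one active entity and one ignored entity of a made-up type.
    static List<EntitySketch> sample() {
        return List.of(
                new EntitySketch("CBI_author", false, false),
                new EntitySketch("example_type", true, false));
    }
}
```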
    • getEntitiesOfType

      default List<com.iqser.red.service.redaction.v1.server.model.document.entity.TextEntity> getEntitiesOfType(String type)
      Returns a List of Entities in this SemanticNode which are of the provided type such as "CBI_author". Ignores Entities that are not active, i.e. those with ignored == true or removed == true.
      Parameters:
      type - string representing the type of entities to return
      Returns:
      List of TextEntities of the provided type
    • getEntitiesOfType

      default List<com.iqser.red.service.redaction.v1.server.model.document.entity.TextEntity> getEntitiesOfType(List<String> types)
      Returns a List of Entities in this SemanticNode which have any of the provided types such as "CBI_author". Ignores Entities that are not valid.
      Parameters:
      types - A list of strings representing the types of entities to return
      Returns:
      List of RedactionEntities of any provided type
    • getEntitiesOfType

      default List<com.iqser.red.service.redaction.v1.server.model.document.entity.TextEntity> getEntitiesOfType(String... types)
      Returns a List of Entities in this SemanticNode which have any of the provided types. Ignores Entities that are not valid.
      Parameters:
      types - An array of strings representing the types of entities to return
      Returns:
      List of RedactionEntities that match any of the provided types
    • getNumberOnPage

      default Integer getNumberOnPage()
      Each AtomicTextBlock has an index on its page; this returns the number of the first AtomicTextBlock underneath this node. If this node does not have any AtomicTextBlocks underneath it, e.g. an empty TableCell, it returns -1.
      Returns:
      Integer representing the number on the page
    • hasText

      default boolean hasText()
      Checks if the SemanticNode contains any text.
      Returns:
      true, if this node's TextBlock is not empty
    • containsString

      default boolean containsString(String string)
      Checks whether this SemanticNode contains the provided String.
      Parameters:
      string - A String which the TextBlock might contain
      Returns:
      true, if this node's TextBlock contains the string
    • getEngines

      Set<com.knecon.fforesight.service.layoutparser.internal.api.data.redaction.LayoutEngineProto.LayoutEngine> getEngines()
    • addEngine

      default void addEngine(com.knecon.fforesight.service.layoutparser.internal.api.data.redaction.LayoutEngineProto.LayoutEngine engine)
    • containsAllStrings

      default boolean containsAllStrings(String... strings)
      Checks whether this SemanticNode contains all the provided Strings.
      Parameters:
      strings - A List of Strings which the TextBlock might contain
      Returns:
      true, if this node's TextBlock contains all strings
    • containsAnyString

      default boolean containsAnyString(String... strings)
      Checks whether this SemanticNode contains any of the provided Strings.
      Parameters:
      strings - A List of Strings to check if they are contained in the TextBlock
      Returns:
      true, if this node's TextBlock contains any of the provided strings
    • containsAnyString

      default boolean containsAnyString(List<String> strings)
      Checks whether this SemanticNode contains any of the provided Strings.
      Parameters:
      strings - A List of Strings which the TextBlock might contain
      Returns:
      true, if this node's TextBlock contains any of the strings
    • containsStringIgnoreCase

      default boolean containsStringIgnoreCase(String string)
      Checks whether this SemanticNode contains the provided String case-insensitive.
      Parameters:
      string - A String which the TextBlock might contain
      Returns:
      true, if this node's TextBlock contains the string case-insensitive
    • containsAnyStringIgnoreCase

      default boolean containsAnyStringIgnoreCase(String... strings)
      Checks whether this SemanticNode contains any of the provided Strings case-insensitive.
      Parameters:
      strings - A List of Strings which the TextBlock might contain
      Returns:
      true, if this node's TextBlock contains any of the strings
    • containsAllStringsIgnoreCase

      default boolean containsAllStringsIgnoreCase(String... strings)
      Checks whether this SemanticNode contains all the provided Strings case-insensitive.
      Parameters:
      strings - A List of Strings which the TextBlock might contain
      Returns:
      true, if this node's TextBlock contains all of the strings
    • containsWord

      default boolean containsWord(String word)
      Checks whether this SemanticNode contains exactly the provided String as a word.
      Parameters:
      word - String which the TextBlock might contain
      Returns:
      true, if this node's TextBlock contains the string as a word
    • containsWordIgnoreCase

      default boolean containsWordIgnoreCase(String word)
      Checks whether this SemanticNode contains exactly the provided String as a word case-insensitive.
      Parameters:
      word - String which the TextBlock might contain
      Returns:
      true, if this node's TextBlock contains the string as a word, ignoring case
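      The page does not say how "as a word" is decided. One plausible reading, sketched below purely as an assumption, is a regex word-boundary match around the quoted search string, in contrast to the plain substring check of containsString.

```java
import java.util.regex.Pattern;

class WordMatchSketch {

    // Plain substring check: also matches inside longer words.
    static boolean containsString(String text, String string) {
        return text.contains(string);
    }

    // Assumed word semantics: the string must be delimited by regex word boundaries.
    static boolean containsWord(String text, String word) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text).find();
    }

    static boolean containsWordIgnoreCase(String text, String word) {
        int flags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b", flags).matcher(text).find();
    }
}
```

      Pattern.quote keeps characters such as "." or "(" in the search string from being interpreted as regex syntax.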
    • containsAnyWord

      default boolean containsAnyWord(String... words)
      Checks whether this SemanticNode contains any of the provided Strings as a word.
      Parameters:
      words - A List of Strings which the TextBlock might contain
      Returns:
      true, if this node's TextBlock contains any of the provided strings
    • containsAnyWordIgnoreCase

      default boolean containsAnyWordIgnoreCase(String... words)
      Checks whether this SemanticNode contains any of the provided Strings as a word case-insensitive.
      Parameters:
      words - A List of Strings which the TextBlock might contain
      Returns:
      true, if this node's TextBlock contains any of the provided strings
    • containsAllWords

      default boolean containsAllWords(String... words)
      Checks whether this SemanticNode contains all the provided Strings as words.
      Parameters:
      words - A List of Strings which the TextBlock might contain
      Returns:
      true, if this node's TextBlock contains all the provided strings
    • containsAllWordsIgnoreCase

      default boolean containsAllWordsIgnoreCase(String... words)
      Checks whether this SemanticNode contains all the provided Strings as words case-insensitive.
      Parameters:
      words - A List of Strings which the TextBlock might contain
      Returns:
      true, if this node's TextBlock contains all the provided strings
    • matchesRegex

      default boolean matchesRegex(String regexPattern)
      Checks whether this SemanticNode matches the provided regex pattern.
      Parameters:
      regexPattern - A String representing a regex pattern, which the TextBlock might contain
      Returns:
      true, if this node's TextBlock contains the regex pattern
    • matchesRegexIgnoreCase

      default boolean matchesRegexIgnoreCase(String regexPattern)
      Checks whether this SemanticNode matches the provided regex pattern case-insensitive.
      Parameters:
      regexPattern - A String representing a regex pattern, which the TextBlock might contain
      Returns:
      true, if this node's TextBlock contains the regex pattern case-insensitive
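      Since the return descriptions speak of the TextBlock "containing" the pattern, the sketch below assumes find() semantics (a match anywhere in the text) rather than a full-string match. The helper class and that interpretation are assumptions, not confirmed by this page.

```java
import java.util.regex.Pattern;

class RegexMatchSketch {

    // Assumed semantics: true if the pattern matches anywhere in the text.
    static boolean matchesRegex(String text, String regexPattern) {
        return Pattern.compile(regexPattern).matcher(text).find();
    }

    static boolean matchesRegexIgnoreCase(String text, String regexPattern) {
        return Pattern.compile(regexPattern, Pattern.CASE_INSENSITIVE).matcher(text).find();
    }
}
```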
    • intersectsRectangle

      default boolean intersectsRectangle(int x, int y, int w, int h, int pageNumber)
      Checks whether this SemanticNode intersects the provided rectangle.
      Parameters:
      x - the lower left corner X value
      y - the lower left corner Y value
      w - width
      h - height
      pageNumber - the pageNumber of the rectangle
      Returns:
      true if intersects, false otherwise
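      A sketch of the rectangle checks using java.awt.geom.Rectangle2D. It assumes the node's bounding boxes are available as a map from page number to rectangle; RectangleCheckSketch and that map shape are hypothetical simplifications of getBBox().

```java
import java.awt.geom.Rectangle2D;
import java.util.Map;

class RectangleCheckSketch {

    // True if the node has a bounding box on the page and it overlaps the given rectangle.
    static boolean intersectsRectangle(Map<Integer, Rectangle2D> bBoxPerPage, int x, int y, int w, int h, int pageNumber) {
        Rectangle2D box = bBoxPerPage.get(pageNumber);
        return box != null && box.intersects(x, y, w, h);
    }

    // True if the node's bounding box on the page fully encloses the given rectangle.
    static boolean containsRectangle(Map<Integer, Rectangle2D> bBoxPerPage, Rectangle2D rectangle, int pageNumber) {
        Rectangle2D box = bBoxPerPage.get(pageNumber);
        return box != null && box.contains(rectangle);
    }

    // Sample data: a single 10 x 10 bounding box on page 1.
    static Map<Integer, Rectangle2D> sample() {
        return Map.of(1, new Rectangle2D.Double(0, 0, 10, 10));
    }
}
```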
    • addThisToEntityIfIntersects

      default void addThisToEntityIfIntersects(com.iqser.red.service.redaction.v1.server.model.document.entity.TextEntity textEntity)
      This function is used during insertion of EntityNodes into the graph. It checks whether the TextRange of this node intersects or even contains the TextEntity, sets the fields accordingly, and recursively calls this function on all its children.
      Parameters:
      textEntity - the TextEntity which is being inserted into the graph
    • streamChildren

      default Stream<SemanticNode> streamChildren()
      Streams all children located directly underneath this node in the DocumentTree.
      Returns:
      Stream of all children
    • streamChildrenOfType

      default Stream<SemanticNode> streamChildrenOfType(NodeType nodeType)
      Streams all children located directly underneath this node in the DocumentTree of the provided type.
      Returns:
      Stream of all children of the provided type
    • streamAllSubNodes

      default Stream<SemanticNode> streamAllSubNodes()
      Recursively streams all SemanticNodes located underneath this node in the DocumentTree in order.
      Returns:
      Stream of all SubNodes
    • streamAllSubNodesOfType

      default Stream<SemanticNode> streamAllSubNodesOfType(NodeType nodeType)
      Recursively streams all SemanticNodes of the provided type located underneath this node in the DocumentTree in order.
      Returns:
      Stream of all SubNodes of the provided type
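      The difference between streaming direct children and streaming all sub-nodes can be sketched with a recursive flatMap. StreamNodeSketch is a hypothetical stand-in that uses a String for the node type; excluding the node itself from the result is an assumption based on the wording "located underneath this node".

```java
import java.util.List;
import java.util.stream.Stream;

// Hypothetical stand-in for a node with a type label and ordered children.
class StreamNodeSketch {
    final String type;
    final List<StreamNodeSketch> children;

    StreamNodeSketch(String type, StreamNodeSketch... children) {
        this.type = type;
        this.children = List.of(children);
    }

    // Direct children only.
    Stream<StreamNodeSketch> streamChildren() {
        return children.stream();
    }

    // Every node underneath this one, each child followed by its own sub-nodes.
    Stream<StreamNodeSketch> streamAllSubNodes() {
        return children.stream().flatMap(c -> Stream.concat(Stream.of(c), c.streamAllSubNodes()));
    }

    Stream<StreamNodeSketch> streamAllSubNodesOfType(String nodeType) {
        return streamAllSubNodes().filter(n -> n.type.equals(nodeType));
    }

    // Sample tree: Document -> [Section -> [Headline, Paragraph], Paragraph]
    static StreamNodeSketch sample() {
        return new StreamNodeSketch("Document",
                new StreamNodeSketch("Section",
                        new StreamNodeSketch("Headline"),
                        new StreamNodeSketch("Paragraph")),
                new StreamNodeSketch("Paragraph"));
    }
}
```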
    • getTextRange

      default TextRange getTextRange()
      The TextRange is the start and end string offsets in the reading order of the document.
      Returns:
      TextRange of this Node's TextBlock
    • length

      default int length()
      Returns the length of the text content in this Node's TextBlock.
      Returns:
      The length of the text content
    • getPositionsPerPage

      default Map<Page,List<Rectangle2D>> getPositionsPerPage(TextRange textRange)
      For a given TextRange this function returns a List of rectangles around the text in the range. These Rectangles are split either by a new line or by a large gap in the current line. This is mainly used to find the positions of TextEntities.
      Parameters:
      textRange - A TextRange to calculate the positions for.
      Returns:
      A Map, where the keys are the pages and the values are a list of rectangles describing the position of words
    • getBBox

      default Map<Page,Rectangle2D> getBBox()
      If this Node is a Leaf it will calculate the boundingBox of its LeafTextBlock, otherwise it will calculate the Union of the BoundingBoxes of all its Children. If called on the Document, it will return the cropbox of each page.
      Returns:
      Rectangle2D fully encapsulating this Node for each page.
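      The union step for non-leaf nodes can be sketched with Rectangle2D.createUnion. The sketch assumes pages are keyed by their number and that each child already reports one rectangle per page; both are simplifications of the real Map<Page,Rectangle2D>.

```java
import java.awt.geom.Rectangle2D;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BBoxUnionSketch {

    // Merges the per-page boxes of all children into one enclosing box per page.
    static Map<Integer, Rectangle2D> unionPerPage(List<Map<Integer, Rectangle2D>> childBoxes) {
        Map<Integer, Rectangle2D> result = new HashMap<>();
        for (Map<Integer, Rectangle2D> child : childBoxes) {
            // createUnion returns a new rectangle enclosing both operands.
            child.forEach((page, box) -> result.merge(page, box, Rectangle2D::createUnion));
        }
        return result;
    }

    // Sample data: two children overlap on page 1, and only the second appears on page 2.
    static List<Map<Integer, Rectangle2D>> sample() {
        Map<Integer, Rectangle2D> first = Map.of(1, new Rectangle2D.Double(0, 0, 10, 10));
        Map<Integer, Rectangle2D> second = Map.of(
                1, new Rectangle2D.Double(5, 5, 10, 10),
                2, new Rectangle2D.Double(1, 1, 2, 2));
        return List.of(first, second);
    }
}
```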
    • containsRectangle

      default boolean containsRectangle(Rectangle2D rectangle2D, Integer pageNumber)
      Checks whether the Bounding Box of this SemanticNode contains the provided rectangle on the provided page.
      Parameters:
      rectangle2D - The rectangle to check if it is contained
      pageNumber - The Page number on which the rectangle should be checked
      Returns:
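      true, if the Bounding Box of this SemanticNode contains the rectangle on the provided page, otherwise false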
    • accept

      default void accept(com.iqser.red.service.redaction.v1.server.service.document.NodeVisitor visitor)
      Accepts a NodeVisitor and initiates a depth-first traversal of the semantic tree rooted at this node. The visitor's NodeVisitor.visit(SemanticNode) method is invoked for each node encountered during the traversal.
      Parameters:
      visitor - The NodeVisitor to accept and apply during the traversal.
      See Also:
      • NodeVisitor
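      A minimal sketch of the visitor-driven depth-first traversal. SketchVisitor and SketchNode are hypothetical stand-ins for NodeVisitor and SemanticNode, and visiting a node before its children (pre-order) is an assumption; the page only states that the traversal is depth-first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for NodeVisitor with a single visit method.
interface SketchVisitor {
    void visit(SketchNode node);
}

// Hypothetical stand-in for SemanticNode.
class SketchNode {
    final String name;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String name) {
        this.name = name;
    }

    SketchNode add(SketchNode child) {
        children.add(child);
        return this;
    }

    // Visits this node first, then each child subtree in order.
    void accept(SketchVisitor visitor) {
        visitor.visit(this);
        for (SketchNode child : children) {
            child.accept(visitor);
        }
    }
}

class VisitorSketch {

    // Builds a small tree and records the order in which nodes are visited.
    static List<String> visitOrder() {
        SketchNode root = new SketchNode("Document")
                .add(new SketchNode("Section")
                        .add(new SketchNode("Headline"))
                        .add(new SketchNode("Paragraph")))
                .add(new SketchNode("Table"));
        List<String> order = new ArrayList<>();
        root.accept(node -> order.add(node.name));
        return order;
    }
}
```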
    • onlyOnPage

      default boolean onlyOnPage(Page page)
      Checks whether this SemanticNode appears on a single page only, and if that page is the provided one.
      Parameters:
      page - the page to check
      Returns:
      true, when SemanticNode is on a single page only and the page is the provided page. Otherwise, false.
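      The contrast between onPage and onlyOnPage can be sketched with a set of page numbers. Using Integer page numbers instead of Page objects is a simplification made for this example.

```java
import java.util.Set;

class PageCheckSketch {

    // True if the node appears on the given page, possibly among others.
    static boolean onPage(Set<Integer> pages, int pageNumber) {
        return pages.contains(pageNumber);
    }

    // True only if the node appears on exactly one page and it is the given one.
    static boolean onlyOnPage(Set<Integer> pages, int pageNumber) {
        return pages.size() == 1 && pages.contains(pageNumber);
    }
}
```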