java.lang.Object
com.iqser.red.service.redaction.v1.server.model.document.nodes.AbstractSemanticNode
com.iqser.red.service.redaction.v1.server.model.document.nodes.Document
All Implemented Interfaces:
com.iqser.red.service.redaction.v1.server.model.document.nodes.GenericSemanticNode, SemanticNode

public class Document extends com.iqser.red.service.redaction.v1.server.model.document.nodes.AbstractSemanticNode
Represents the entire document as a node within the document's semantic structure.
  • Constructor Details

    • Document

      public Document(Set<Page> pages, Integer numberOfPages)
    • Document

      public Document()
  • Method Details

    • getType

      public NodeType getType()
      Description copied from interface: SemanticNode
      Returns the type of this node, such as Section, Paragraph, etc.
      Returns:
      NodeType of this node
    • getAllSections

      public List<Section> getAllSections()
      Gets the sections of the document as a list.
      Returns:
      A list of all sections within the document.
    • getMainSections

      @Deprecated(forRemoval=true) public List<Section> getMainSections()
      Deprecated, for removal: This API element is subject to removal in a future version.
      This method is marked for removal. Use SemanticNode.streamChildrenOfType(NodeType) instead, or getChildrenOfTypeSectionOrSuperSection(), which returns children of type SECTION as well as SUPER_SECTION.
      Gets the main sections of the document as a list.
      Returns:
      A list of main sections within the document.
    • getChildrenOfTypeSectionOrSuperSection

      public List<SemanticNode> getChildrenOfTypeSectionOrSuperSection()
      Gets the direct children of type SECTION or SUPER_SECTION of the document as a list of SemanticNode objects.
      Returns:
      A list of all children of type SECTION or SUPER_SECTION.
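      Because getMainSections() is deprecated, code that needs the top-level sections can filter the document's direct children by type. The sketch below shows that filter on hypothetical stand-in types (SketchNodeType, SketchNode, SectionChildren), not the library's own classes. With the real API, getChildrenOfTypeSectionOrSuperSection() or SemanticNode.streamChildrenOfType(NodeType) would be used.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-ins for NodeType and SemanticNode; only what the sketch needs.
enum SketchNodeType { DOCUMENT, SECTION, SUPER_SECTION, HEADLINE, PARAGRAPH, IMAGE }

record SketchNode(SketchNodeType type, List<SketchNode> children) {
    // Convenience constructor for nodes without children.
    SketchNode(SketchNodeType type) { this(type, List.of()); }
}

final class SectionChildren {
    // Keeps only the direct children whose type is SECTION or SUPER_SECTION,
    // preserving their original order. Grandchildren are not inspected.
    static List<SketchNode> childrenOfTypeSectionOrSuperSection(SketchNode parent) {
        return parent.children().stream()
                .filter(c -> c.type() == SketchNodeType.SECTION
                        || c.type() == SketchNodeType.SUPER_SECTION)
                .collect(Collectors.toList());
    }
}
```

      Filtering on both types, rather than SECTION alone, matters when a document groups sections under a SUPER_SECTION.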
    • streamTerminalTextBlocksInOrder

      public Stream<com.iqser.red.service.redaction.v1.server.model.document.textblock.TextBlock> streamTerminalTextBlocksInOrder()
      Streams all terminal (leaf) text blocks within the document in their natural order.
      Returns:
      A stream of terminal TextBlocks.
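      The "natural order" of terminal text blocks is assumed here to be reading order: a depth-first, pre-order walk that visits children left to right. A minimal sketch of such a walk on a hypothetical TextNode record, where a leaf carries a String in place of a real TextBlock:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical tree node: a leaf carries its text, an inner node only has children (text == null).
record TextNode(String text, List<TextNode> children) {
    static TextNode leaf(String text) { return new TextNode(text, List.of()); }
    static TextNode inner(TextNode... children) { return new TextNode(null, List.of(children)); }
    boolean isLeaf() { return text != null; }
}

final class TerminalTextBlocks {
    // Iterative depth-first, pre-order walk. Children are pushed in reverse order,
    // so they are popped left to right and the leaves come out in reading order.
    static Stream<String> streamTerminalTextBlocksInOrder(TextNode root) {
        List<String> leaves = new ArrayList<>();
        Deque<TextNode> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            TextNode node = stack.pop();
            if (node.isLeaf()) {
                leaves.add(node.text());
                continue;
            }
            List<TextNode> kids = node.children();
            for (int i = kids.size() - 1; i >= 0; i--) {
                stack.push(kids.get(i));
            }
        }
        return leaves.stream();
    }
}
```

      An explicit stack avoids deep recursion on heavily nested documents; a recursive flatMap would produce the same order.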
    • getTreeId

      public List<Integer> getTreeId()
      Description copied from interface: SemanticNode
      The id is a List of Integers uniquely identifying this node in the DocumentTree.
      Specified by:
      getTreeId in interface SemanticNode
      Overrides:
      getTreeId in class com.iqser.red.service.redaction.v1.server.model.document.nodes.AbstractSemanticNode
      Returns:
      the DocumentTree ID
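      This page does not say how the ids are assigned. One scheme that yields unique ids, assumed here purely for illustration, is a path of child positions: the root gets the empty list and each child extends its parent's id with its own index. IdNode and TreeIds are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type for illustration; not the library's DocumentTree.
final class IdNode {
    final List<IdNode> children = new ArrayList<>();
    List<Integer> treeId = List.of();

    IdNode add(IdNode child) { children.add(child); return this; }
}

final class TreeIds {
    // Assumed scheme: the root gets the empty path, and each child gets its parent's id
    // extended by its position among the parent's children. Paths are unique by construction.
    static void assignTreeIds(IdNode node, List<Integer> id) {
        node.treeId = List.copyOf(id);
        for (int i = 0; i < node.children.size(); i++) {
            List<Integer> childId = new ArrayList<>(id);
            childId.add(i);
            assignTreeIds(node.children.get(i), childId);
        }
    }
}
```

      Under this scheme, the second child of the first child of the root has the id [0, 1].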
    • setTreeId

      public void setTreeId(List<Integer> tocId)
      Description copied from interface: SemanticNode
      This should only be used during graph construction.
      Specified by:
      setTreeId in interface SemanticNode
      Overrides:
      setTreeId in class com.iqser.red.service.redaction.v1.server.model.document.nodes.AbstractSemanticNode
      Parameters:
      tocId - the List of Integers that uniquely identifies this node in the DocumentTree
    • getSectionIdentifier

      public SectionIdentifier getSectionIdentifier()
      Description copied from interface: SemanticNode
      Returns the SectionIdentifier as a child of the SectionIdentifier of the Headline returned by the getHeadline() method.
      Returns:
      The SectionIdentifier from the first Headline.
    • getHeadline

      public Headline getHeadline()
      Description copied from interface: SemanticNode
      Traverses the tree upward until it reaches either a Headline, which is returned, or a Section, in which case the first Headline among that Section's children is returned. If no Headline is found this way, the traversal continues recursively up the tree until it reaches the root, where it performs a breadth-first search (BFS). If no Headline exists anywhere in the Document, a dummy Headline is returned.
      Returns:
      First Headline found.
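      The lookup described above can be sketched on hypothetical HNode/HType stand-ins. DUMMY_HEADLINE plays the role of the dummy Headline, and the exact order in which the real implementation checks node types is an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

// Hypothetical node types and tree node; not the library's classes.
enum HType { DOCUMENT, SECTION, HEADLINE, PARAGRAPH }

final class HNode {
    final HType type;
    final String text;
    HNode parent;
    final List<HNode> children = new ArrayList<>();

    HNode(HType type, String text) { this.type = type; this.text = text; }

    HNode add(HNode child) { child.parent = this; children.add(child); return this; }
}

final class HeadlineLookup {
    // Hypothetical placeholder returned when the tree contains no headline at all.
    static final HNode DUMMY_HEADLINE = new HNode(HType.HEADLINE, "");

    static HNode getHeadline(HNode start) {
        // Walk from the node towards the root.
        for (HNode current = start; current != null; current = current.parent) {
            if (current.type == HType.HEADLINE) {
                return current;
            }
            if (current.type == HType.SECTION) {
                // A section answers with the first headline among its direct children.
                Optional<HNode> first = current.children.stream()
                        .filter(c -> c.type == HType.HEADLINE)
                        .findFirst();
                if (first.isPresent()) {
                    return first.get();
                }
            }
            if (current.parent == null) {
                // Reached the root: breadth-first search over the whole tree.
                return bfsFirstHeadline(current).orElse(DUMMY_HEADLINE);
            }
        }
        return DUMMY_HEADLINE;
    }

    private static Optional<HNode> bfsFirstHeadline(HNode root) {
        Deque<HNode> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            HNode node = queue.poll();
            if (node.type == HType.HEADLINE) {
                return Optional.of(node);
            }
            queue.addAll(node.children);
        }
        return Optional.empty();
    }
}
```

      The BFS at the root returns the shallowest headline first, which for a well-formed document is usually the one closest to its beginning.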
    • streamAllImages

      public Stream<Image> streamAllImages()
      Streams all image nodes contained within the document.
      Returns:
      A stream of Image nodes.
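      Collecting every image amounts to streaming the whole subtree and filtering by node type. A sketch using a recursive Stream.flatMap over a hypothetical ImgNode record (a boolean flag stands in for the Image node type):

```java
import java.util.List;
import java.util.stream.Stream;

// Hypothetical node: only the fields the sketch needs.
record ImgNode(boolean isImage, String name, List<ImgNode> children) {
    static ImgNode image(String name) { return new ImgNode(true, name, List.of()); }
    static ImgNode container(ImgNode... children) { return new ImgNode(false, null, List.of(children)); }

    // Pre-order stream of this node followed by all of its descendants.
    Stream<ImgNode> streamAllSubNodes() {
        return Stream.concat(Stream.of(this), children.stream().flatMap(ImgNode::streamAllSubNodes));
    }
}

final class Images {
    // Keeps only the image nodes from the full subtree stream.
    static Stream<ImgNode> streamAllImages(ImgNode root) {
        return root.streamAllSubNodes().filter(ImgNode::isImage);
    }
}
```

      Because the stream is lazy, a caller that only needs the first image stops the traversal early with findFirst().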
    • toString

      public String toString()
      Overrides:
      toString in class com.iqser.red.service.redaction.v1.server.model.document.nodes.AbstractSemanticNode
    • getBBox

      public Map<Page,Rectangle2D> getBBox()
      Description copied from interface: SemanticNode
      If this node is a leaf, it calculates the bounding box of its LeafTextBlock; otherwise, it calculates the union of the bounding boxes of all its children. If called on the Document, it returns the crop box of each page.
      Specified by:
      getBBox in interface SemanticNode
      Overrides:
      getBBox in class com.iqser.red.service.redaction.v1.server.model.document.nodes.AbstractSemanticNode
      Returns:
      A Rectangle2D fully enclosing this node on each page.
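      For non-leaf nodes, the union step can be sketched with java.awt.geom.Rectangle2D.createUnion, merging the children's per-page boxes. In this sketch page numbers stand in for Page objects, the leaf case and the Document's crop-box case are not modeled, and BBoxes is a hypothetical name.

```java
import java.awt.geom.Rectangle2D;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BBoxes {
    // For every page, the result is the smallest rectangle containing all of the
    // children's rectangles on that page. createUnion returns a new rectangle,
    // so the children's boxes are never mutated.
    static Map<Integer, Rectangle2D> unionByPage(List<Map<Integer, Rectangle2D>> childBoxes) {
        Map<Integer, Rectangle2D> result = new HashMap<>();
        for (Map<Integer, Rectangle2D> boxes : childBoxes) {
            for (Map.Entry<Integer, Rectangle2D> e : boxes.entrySet()) {
                result.merge(e.getKey(), e.getValue(), Rectangle2D::createUnion);
            }
        }
        return result;
    }
}
```

      Keying by page keeps a node that spans a page break from producing one rectangle stretched across unrelated coordinate spaces.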
    • builder

      public static Document.DocumentBuilder<?,?> builder()
    • getPages

      public Set<Page> getPages()
      Description copied from interface: SemanticNode
      Each AtomicTextBlock is assigned a page, so to get the pages this node appears on, it collects the PageNodes from each AtomicTextBlock belonging to this node's TextBlock.
      Returns:
      Set of PageNodes this node appears on.
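      A sketch of that collection step, assuming hypothetical SketchPage and SketchAtomicTextBlock records in place of Page and AtomicTextBlock: the node's pages are the distinct pages of its atomic blocks.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-ins: each atomic block lies on exactly one page.
record SketchPage(int number) { }
record SketchAtomicTextBlock(String text, SketchPage page) { }

final class PagesOfNode {
    // Maps every atomic block to its page and collects into a Set, so a page that
    // holds several blocks of the same node is reported only once.
    static Set<SketchPage> getPages(List<SketchAtomicTextBlock> atomicBlocks) {
        return atomicBlocks.stream()
                .map(SketchAtomicTextBlock::page)
                .collect(Collectors.toSet());
    }
}
```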
    • getNumberOfPages

      public Integer getNumberOfPages()
    • setPages

      public void setPages(Set<Page> pages)
    • setNumberOfPages

      public void setNumberOfPages(Integer numberOfPages)
    • equals

      public boolean equals(Object o)
      Overrides:
      equals in class com.iqser.red.service.redaction.v1.server.model.document.nodes.AbstractSemanticNode
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class com.iqser.red.service.redaction.v1.server.model.document.nodes.AbstractSemanticNode