Unit simplehtmltreeparser

Description

This unit contains an HTML/XML -> tree converter

Overview

Classes, Interfaces, Objects and Records

Name Description
Object TTreeNodeEnumeratorConditions  
Object TTreeNodeEnumerator  
Object TTreeAttributeEnumerator  
Class TTreeNode This class represents an element of the HTML file
Class TTreeAttribute  
Record TBlockAllocator  
Object TTreeBuilder  
Class TTreeDocument  
Record TTreeDocumentOwnershipTracker  
Class ETreeParseException  
Class TTreeParser This parses an HTML/SGML/XML file into a tree-like structure.

Functions and Procedures

function CSSHasHiddenStyle(const style: string): boolean;
function guessFormat(const data, uri, contenttype: string): TInternetToolsFormat;
function strEncodingFromContentType(const contenttype: string): TSystemCodePage;
function isInvalidUTF8Guess(const s: string; cutoff: integer): boolean;

Types

TTreeNodeType = (...);
TTreeNodeTypes = set of TTreeNodeType;
TTreeNodeFindOptions = set of (tefoIgnoreType, tefoIgnoreText, tefoCaseSensitive, tefoNoChildren, tefoNoGrandChildren);
TStringComparisonFunc = function (const a,b: string): boolean of object;
TTreeNodeEnumeratorNextCallback = function (current: TTreeNode): TTreeNode;
TTreeNodeEnumeratorAxis = (...);
TTreeNodeIntOffset = longint;
TNodeNameHash = cardinal;
TTreeNodeClass = class of TTreeNode;
TBasicParsingState = (...);
TParsingModel = (...);
TInternetToolsFormat = (...);

Constants

TreeNodesWithChildren = [tetOpen, tetDocument];

Description

Functions and Procedures

function CSSHasHiddenStyle(const style: string): boolean;
 
function guessFormat(const data, uri, contenttype: string): TInternetToolsFormat;
 
function strEncodingFromContentType(const contenttype: string): TSystemCodePage;
 
function isInvalidUTF8Guess(const s: string; cutoff: integer): boolean;
 

Types

TTreeNodeType = (...);

The type of a tree element, for example an opening tag (<open>), text, or a closing tag (</close>).

Values
  • tetOpen
  • tetClose
  • tetText
  • tetComment
  • tetProcessingInstruction
  • tetAttribute
  • tetDocument
  • tetNamespace
TTreeNodeTypes = set of TTreeNodeType;
 
TTreeNodeFindOptions = set of (tefoIgnoreType, tefoIgnoreText, tefoCaseSensitive, tefoNoChildren, tefoNoGrandChildren);

Controls the search for a tree element.
  • ignore type (tefoIgnoreType): do not check for a matching type
  • ignore text (tefoIgnoreText): do not check for a matching text
  • case sensitive (tefoCaseSensitive): do not ignore the case
  • no descend: only check elements that are direct children of the current node
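
The options form an ordinary Pascal set, so they are combined with set syntax and tested with `in`. The following is a minimal sketch of that pattern, not the unit's implementation: the set type is copied locally so the program compiles without simplehtmltreeparser, and `NamesMatch` is a hypothetical helper that only illustrates the case-sensitivity rule described above.

```pascal
program FindOptionsSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  // Local copy of the declaration shown above, so that the sketch
  // compiles without the simplehtmltreeparser unit.
  TTreeNodeFindOptions = set of (tefoIgnoreType, tefoIgnoreText,
    tefoCaseSensitive, tefoNoChildren, tefoNoGrandChildren);

// Hypothetical helper (not part of the unit): compares two node names
// and ignores the case unless tefoCaseSensitive is set.
function NamesMatch(const a, b: string; options: TTreeNodeFindOptions): boolean;
begin
  if tefoCaseSensitive in options then
    Result := a = b
  else
    Result := SameText(a, b);
end;

begin
  // Without tefoCaseSensitive the names match regardless of case.
  if not NamesMatch('DIV', 'div', []) then Halt(1);
  // With tefoCaseSensitive the same names no longer match.
  if NamesMatch('DIV', 'div', [tefoCaseSensitive]) then Halt(2);
  // Several options are combined in one set literal.
  if not NamesMatch('div', 'div', [tefoIgnoreText, tefoCaseSensitive]) then Halt(3);
  WriteLn('all checks passed');
end.
```

An empty set `[]` selects the default behavior for every option, which is why the options are phrased as deviations (ignore, case sensitive, no descend).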

TStringComparisonFunc = function (const a,b: string): boolean of object;
 
TTreeNodeEnumeratorNextCallback = function (current: TTreeNode): TTreeNode;
 
TTreeNodeEnumeratorAxis = (...);
 
Values
  • tneaSameNode
  • tneaDirectParent
  • tneaDirectChildImplicit
  • tneaDirectChild
  • tneaSameOrDescendant
  • tneaDescendant
  • tneaFollowing
  • tneaFollowingSibling
  • tneaAncestor
  • tneaPrecedingSibling
  • tneaPreceding
  • tneaSameOrAncestor
  • tneaDocumentRoot
  • tneaFunctionSpecialCase
  • tneaAttribute
TTreeNodeIntOffset = longint;
 
TNodeNameHash = cardinal;
 
TTreeNodeClass = class of TTreeNode;
 
TBasicParsingState = (...);
 
Values
  • bpmBeforeHtml
  • bpmBeforeHead
  • bpmInHead
  • bpmAfterHead
  • bpmInBody
  • bpmInFrameset
  • bpmAfterBody
  • bpmAfterAfterBody
TParsingModel = (...);

Parsing model used to interpret the document.
  • pmStrict: every tag must be closed explicitly (otherwise an exception is raised)
  • pmHTML: accepts everything and tries to create the best-fitting tree, using a heuristic to recover from faulty documents (no exceptions are raised); also detects the encoding

Values
  • pmStrict
  • pmHTML
  • pmUnstrictXML
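
The difference between pmStrict and pmHTML can be illustrated with a toy model of a stack of open elements. This is only a sketch under stated assumptions: `CloseElement` and the stack are hypothetical, the local enumeration covers just the two models described above, and the recovery rule (implicitly close everything inside the element being closed) is a simple stand-in, not the heuristic the parser actually uses.

```pascal
program ParsingModelSketch;
{$mode objfpc}{$H+}

uses
  SysUtils, Classes;

type
  // Local, reduced copy: only the two models described above.
  TParsingModel = (pmStrict, pmHTML);

// Hypothetical helper: handles a closing tag against a stack of open elements.
// pmStrict requires the innermost open element to match and raises otherwise.
// pmHTML implicitly closes the elements that are still open inside it.
procedure CloseElement(stack: TStringList; const tagName: string;
  model: TParsingModel);
var
  i: integer;
begin
  i := stack.Count - 1;
  if (i >= 0) and (stack[i] = tagName) then
  begin
    stack.Delete(i); // well-formed case, identical in both models
    exit;
  end;
  if model = pmStrict then
    raise Exception.Create('Unexpected closing tag: ' + tagName);
  // Lenient recovery: find the nearest open element with that name.
  while (i >= 0) and (stack[i] <> tagName) do
    Dec(i);
  if i < 0 then
    exit; // stray closing tag, ignored
  // Close the matching element and everything opened after it.
  while stack.Count > i do
    stack.Delete(stack.Count - 1);
end;

var
  stack: TStringList;
  raised: boolean;
begin
  stack := TStringList.Create;
  try
    // Simulates <div><p></div>: the <p> is never closed.
    stack.Add('div');
    stack.Add('p');
    CloseElement(stack, 'div', pmHTML);
    if stack.Count <> 0 then Halt(1); // both elements were closed

    stack.Add('div');
    stack.Add('p');
    raised := false;
    try
      CloseElement(stack, 'div', pmStrict);
    except
      on e: Exception do
        raised := true;
    end;
    if not raised then Halt(2); // strict mode must reject the input
    WriteLn('all checks passed');
  finally
    stack.Free;
  end;
end.
```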
TInternetToolsFormat = (...);
 
Values
  • itfUnknown
  • itfXML
  • itfHTML
  • itfJSON
  • itfXMLPreparsedEntity
  • itfPlainText

Constants

TreeNodesWithChildren = [tetOpen, tetDocument];
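
A set constant like this is typically used for membership tests. The sketch below assumes that purpose: the enumeration and the constant are copied locally (the order of the values follows the listing above and is otherwise an assumption), and `CanHaveChildren` is a hypothetical helper, not a function of the unit.

```pascal
program NodesWithChildrenSketch;
{$mode objfpc}{$H+}

type
  // Local copy of the enumeration listed above, so that the sketch
  // compiles without the simplehtmltreeparser unit.
  TTreeNodeType = (tetOpen, tetClose, tetText, tetComment,
    tetProcessingInstruction, tetAttribute, tetDocument, tetNamespace);

const
  TreeNodesWithChildren = [tetOpen, tetDocument];

// Hypothetical helper: true for node types that can contain other nodes.
function CanHaveChildren(typ: TTreeNodeType): boolean;
begin
  Result := typ in TreeNodesWithChildren;
end;

begin
  if not CanHaveChildren(tetOpen) then Halt(1);
  if not CanHaveChildren(tetDocument) then Halt(2);
  // Text and comment nodes are leaves.
  if CanHaveChildren(tetText) then Halt(3);
  if CanHaveChildren(tetComment) then Halt(4);
  WriteLn('all checks passed');
end.
```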
 

Generated by PasDoc 0.16.0.