Class TTreeParser

Unit

Declaration

type TTreeParser = class(TObject)

Description

Parses an HTML/SGML/XML document into a tree-like structure.

To use it, call parseTree with a string containing the document. Afterwards, call getLastTree to get the document root node.

The data structure is like a stream of annotated tokens with back links, so you can traverse it like a tree.

If TargetEncoding is not CP_NONE, the parsed data is automatically converted to that encoding. The initial encoding is detected from the Unicode BOM, the XML declaration, the content-type header, the http-equiv meta tag and invalid characters.

You can change the class used for the elements in the tree with the field treeNodeClass.
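
The basic call sequence described above might look like the following minimal sketch. The unit name simplehtmltreeparser is an assumption (the Unit field on this page is empty), the sample markup is made up, and the sketch assumes the parser keeps ownership of the trees it creates, so only the parser is freed.

```pascal
program ParseExample;
{$mode objfpc}{$H+}

uses
  simplehtmltreeparser; // assumed unit name, not stated on this page

var
  parser: TTreeParser;
  doc: TTreeDocument;
begin
  parser := TTreeParser.Create;
  try
    // Optional settings; both default to false according to this page.
    parser.trimText := true;
    parser.readComments := true;

    // parseTree returns the document root of the newly created tree.
    doc := parser.parseTree('<html><body><p> Hello </p><!-- note --></body></html>');

    // getLastTree is documented to return the last created tree.
    if doc = parser.getLastTree then
      writeln('parseTree result is also available via getLastTree');
  finally
    // Assumption: the parser owns its trees, so doc is not freed separately.
    parser.Free;
  end;
end.
```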

Hierarchy

Overview

Fields

Public treeNodeClass: TTreeNodeClass;
Public globalNamespaces: TNamespaceList;
Public allowTextAtRootLevel: boolean;

Methods

Public constructor Create;
Public destructor destroy; override;
Public procedure clearTrees;
Public function parseTree(html: string; uri: string = ''; contentType: string = ''): TTreeDocument; virtual;
Public function parseTreeFromFile(filename: string): TTreeDocument; virtual;
Public function getLastTree: TTreeDocument;
Public procedure addTree(t: TTreeDocument);
Public procedure removeEmptyTextNodes(const whenTrimmed: boolean);

Properties

Published property parsingModel: TParsingModel read FParsingModel write FParsingModel;
Published property repairMissingStartTags: boolean read FrepairMissingStartTags write FrepairMissingStartTags;
Published property repairMissingEndTags: boolean read FRepairMissingEndTags write FRepairMissingEndTags;
Published property trimText: boolean read FTrimText write FTrimText;
Published property readComments: boolean read FReadComments write FReadComments;
Published property readProcessingInstructions: boolean read FReadProcessingInstructions write FReadProcessingInstructions;
Published property autoDetectHTMLEncoding: boolean read FAutoDetectHTMLEncoding write fautoDetectHTMLEncoding;
Published property TargetEncoding: TSystemCodePage read FEncodingTarget write FEncodingTarget;

Description

Fields

Public treeNodeClass: TTreeNodeClass;

Class of the tree nodes. You can subclass TTreeNode if you need to store additional data at every node.
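
A subclass carrying extra per-node data could be registered roughly as sketched below. TAnnotatedNode and its userTag field are hypothetical names invented for illustration, and the unit name simplehtmltreeparser is an assumption. The sketch also assumes that treeNodeClass must be set before parseTree is called for it to take effect.

```pascal
program CustomNodeClass;
{$mode objfpc}{$H+}

uses
  simplehtmltreeparser; // assumed unit name, not stated on this page

type
  // Hypothetical subclass that stores one extra integer at every node.
  TAnnotatedNode = class(TTreeNode)
  public
    userTag: integer;
  end;

var
  parser: TTreeParser;
begin
  parser := TTreeParser.Create;
  try
    // Ask the parser to use the subclass for tree nodes
    // (assumed to apply to trees parsed after this assignment).
    parser.treeNodeClass := TAnnotatedNode;
    parser.parseTree('<root><item/></root>');
  finally
    parser.Free;
  end;
end.
```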

Public globalNamespaces: TNamespaceList;
 
Public allowTextAtRootLevel: boolean;
 

Methods

Public constructor Create;
 
Public destructor destroy; override;
 
Public procedure clearTrees;
 
Public function parseTree(html: string; uri: string = ''; contentType: string = ''): TTreeDocument; virtual;

Creates a new tree from an HTML document contained in html.
contentType is used to detect the encoding.
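
As a hedged sketch of passing a content type, the call below hands the parser a header-style value together with a target encoding. The URL, the markup and the header value are made-up sample data, the unit name simplehtmltreeparser is an assumption, and the exact contentType format the parser accepts is not specified on this page; the value shown simply mirrors a typical HTTP Content-Type header.

```pascal
program ParseWithContentType;
{$mode objfpc}{$H+}

uses
  simplehtmltreeparser; // assumed unit name, not stated on this page

var
  parser: TTreeParser;
  doc: TTreeDocument;
begin
  parser := TTreeParser.Create;
  try
    // Request conversion of the parsed data to UTF-8.
    parser.TargetEncoding := CP_UTF8;

    // Byte 233 is "e with acute accent" in ISO-8859-1.
    // The uri argument is passed only to reach the third parameter.
    doc := parser.parseTree(
      '<html><body><p>caf'#233'</p></body></html>',
      'http://example.org/page.html',
      'text/html; charset=iso-8859-1');

    if doc <> nil then
      writeln('document parsed');
  finally
    parser.Free;
  end;
end.
```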

Public function parseTreeFromFile(filename: string): TTreeDocument; virtual;
 
Public function getLastTree: TTreeDocument;

Returns the last created tree.

Public procedure addTree(t: TTreeDocument);
 
Public procedure removeEmptyTextNodes(const whenTrimmed: boolean);
 

Properties

Published property parsingModel: TParsingModel read FParsingModel write FParsingModel;

Parsing model, see TParsingModel.

Published property repairMissingStartTags: boolean read FrepairMissingStartTags write FrepairMissingStartTags;
 
Published property repairMissingEndTags: boolean read FRepairMissingEndTags write FRepairMissingEndTags;
 
Published property trimText: boolean read FTrimText write FTrimText;

If this is true (default is false), whitespace is removed from text nodes.

Published property readComments: boolean read FReadComments write FReadComments;

If this is true (default is false), comments are included in the generated tree.

Published property readProcessingInstructions: boolean read FReadProcessingInstructions write FReadProcessingInstructions;

If this is true (default is false), processing instructions are included in the generated tree.

Published property autoDetectHTMLEncoding: boolean read FAutoDetectHTMLEncoding write fautoDetectHTMLEncoding;

Determines whether the encoding is detected automatically (default is true).

Published property TargetEncoding: TSystemCodePage read FEncodingTarget write FEncodingTarget;
 

Generated by PasDoc 0.16.0.