rowbot/dom is an attempt to implement the Document Object Model (DOM) in PHP in a way that is more in line with current standards. While PHP already has its own DOM implementation (ext-dom), it is somewhat outdated and geared more towards XML/XHTML/HTML4. This is very much a work in progress, so some things may be broken.
Requirements:
- PHP >= 7.1
- ext-mbstring
- rowbot/url
Features:
- Does not rely on ext-dom
- Robust HTML5 tokenizer and parser
- Supports the <template> element
- Supports innerHTML and outerHTML
- Supports live ranges
- Extensive test suite ported from the web-platform-tests project
The primary entry point is the DocumentBuilder class. It lets you create a document while specifying things such as the document's URL and whether or not scripting should be emulated.
DocumentBuilder::create()
Returns a new instance of DocumentBuilder.
DocumentBuilder::setContentType()
Required. Sets the content type of the document, which determines both the type of document returned and the parser used. If the given content type is invalid, a TypeError will be thrown. The content type must be one of the following:
- 'text/html'
- 'text/xml'
- 'application/xml'
- 'application/xhtml+xml'
- 'image/svg+xml'
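The README does not show how the builder validates the content type. A minimal sketch of an equivalent check follows, assuming an exact, case-sensitive match and assuming that, as with the DOM standard's DOMParser, 'text/html' selects an HTML document while the other four types select an XML document. The function name documentKindForContentType() and its 'html'/'xml' return values are hypothetical and not part of rowbot/dom.

```php
<?php
// Hypothetical sketch, not rowbot/dom's implementation: it assumes the five
// documented content types are matched exactly (case-sensitive).
const XML_CONTENT_TYPES = ['text/xml', 'application/xml', 'application/xhtml+xml', 'image/svg+xml'];

function documentKindForContentType(string $contentType): string
{
    // 'text/html' selects an HTML document (and the HTML parser).
    if ($contentType === 'text/html') {
        return 'html';
    }

    // Assumption: the remaining types select an XML document, as with DOMParser.
    if (in_array($contentType, XML_CONTENT_TYPES, true)) {
        return 'xml';
    }

    // Anything else mirrors the documented behaviour of throwing a TypeError.
    throw new TypeError(sprintf('Invalid content type "%s".', $contentType));
}

echo documentKindForContentType('image/svg+xml'), "\n"; // prints "xml"
```

Strict comparison (in_array with true) avoids PHP's loose string juggling, so only the literal strings listed above are accepted.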
DocumentBuilder::setDocumentUrl()
Sets the URL of the document. This is used for resolving links in tags such as <a href="/index.php"></a> and for resolving the URL given by any <base> element in the document. If not set, the document defaults to the "about:blank" URL. This must be an absolute URL; if the given URL fails parsing, a TypeError will be thrown. Note that not every URL that parses successfully is a sensible document URL: for example, a mailto: URI will be happily accepted, so take care when setting this value.
DocumentBuilder::create()->setDocumentUrl('http://example.com/');
DocumentBuilder::create()->setDocumentUrl('file:///C:/example.html');
DocumentBuilder::create()->setDocumentUrl('https://my.domain.net/index.php');
DocumentBuilder::create()->setDocumentUrl('https://searchengine.fr/search');
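Because setDocumentUrl() only rejects strings that fail URL parsing, you may want to narrow the accepted schemes yourself before calling it. The following is a hypothetical pre-check, not part of rowbot/dom, assuming that http, https, and file are the only schemes your application needs. It relies only on PHP's built-in parse_url().

```php
<?php
// Hypothetical pre-check, not part of rowbot/dom. setDocumentUrl() accepts any
// absolute URL that parses, including mailto: URIs, so this narrows the input
// to an allow-list of schemes before it reaches the builder.
function isUsableDocumentUrl(string $url, array $allowedSchemes = ['http', 'https', 'file']): bool
{
    // parse_url() returns null when there is no scheme and false on malformed input.
    $scheme = parse_url($url, PHP_URL_SCHEME);

    if (!is_string($scheme)) {
        // Relative or unparsable input would be rejected by the builder anyway.
        return false;
    }

    // URL schemes are case-insensitive, so normalize before comparing.
    return in_array(strtolower($scheme), $allowedSchemes, true);
}

var_dump(isUsableDocumentUrl('https://example.com/')); // bool(true)
var_dump(isUsableDocumentUrl('/index.php'));           // bool(false)
```

Only pass the value on to setDocumentUrl() when the check returns true; extend $allowedSchemes if your documents legitimately use other schemes.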
DocumentBuilder::emulateScripting()
Enables scripting emulation. Enabling this does not cause any scripts to be executed; it only affects how the parser and serializer handle <noscript> tags. If scripting emulation is enabled, their content is treated as plain text by the DOM. If emulation is disabled, which is the default, their content is parsed as part of the DOM.
$document = DocumentBuilder::create()
->setContentType('text/html')
->emulateScripting(true)
->createEmptyDocument();
$el = $document->createElement('div');
$el->innerHTML = '<noscript><p id="foo">You must enable scripting!</p></noscript>';
$el->textContent; // <p id="foo">You must enable scripting!</p>
$foo = $el->getElementById('foo'); // null
$el->firstChild->firstChild->nodeName; // #text
$document = DocumentBuilder::create()
->setContentType('text/html')
->emulateScripting(false)
->createEmptyDocument();
$el = $document->createElement('div');
$el->innerHTML = '<noscript><p id="foo">You must enable scripting!</p></noscript>';
$el->textContent; // You must enable scripting!
$foo = $el->getElementById('foo'); // HTMLParagraphElement
$el->firstChild->firstChild->nodeName; // P
DocumentBuilder::createFromString()
Parses the input string and returns the resulting Document object. This will throw a TypeError if the content type has not been specified.
DocumentBuilder::createEmptyDocument()
Returns an empty Document object. The type of Document returned depends on the specified content type. This will throw a TypeError if the content type has not been specified.
<?php
require_once 'vendor/autoload.php';
use Rowbot\DOM\DocumentBuilder;
// Creates a new DocumentBuilder, and saves the resulting document to $document
$document = DocumentBuilder::create()
// This is required. Tells the builder what type of document to create and which parser to use.
->setContentType('text/html')
// Sets the document's URL, for more accurate link parsing. Not setting this will cause the
// document to default to the "about:blank" URL. This must be an absolute URL.
->setDocumentUrl('https://example.com')
// Whether or not the environment should emulate scripting, which mostly affects how <noscript>
// tags are parsed and serialized. The default is false.
->emulateScripting(true)
// Returns a new document using the input string.
->createFromString(file_get_contents('path/to/my/index.html'));
// Do some things with the document
$document->getElementById('foo');
<?php
require_once "vendor/autoload.php";
use Rowbot\DOM\DOMParser;
$parser = new DOMParser();
// Currently "text/html" is the only supported option.
$document = $parser->parseFromString(file_get_contents('/path/to/file.html'), 'text/html');
// Do some things with the document
$document->getElementById('foo');
<?php
require_once "vendor/autoload.php";
use Rowbot\DOM\DocumentBuilder;
/**
* This creates a new empty HTML Document.
*/
$doc = DocumentBuilder::create()
->setContentType('text/html')
->createEmptyDocument();
/**
* Want a skeleton framework for an HTML Document?
*/
$doc = $doc->implementation->createHTMLDocument();
// Set the page title
$doc->title = "My HTML Document!";
// Create an HTML anchor tag
$a = $doc->createElement("a");
$a->href = "http://www.example.com/";
// Insert it into the document
$doc->body->appendChild($a);
// Convert the DOM tree into an HTML string
echo $doc->toString();
- Only UTF-8 encoded documents are supported.
- All string input is expected to be in UTF-8.
- All strings returned to the user, such as those returned from Text.data, are in UTF-8 rather than UTF-16.
- All string offsets and lengths, such as those in Text.replaceData() or Text.length, are expressed in Unicode code points rather than UTF-16 code units.
- No XML parser exists at this time. However, XML documents can be built manually and serialized.
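To make the difference in offsets concrete, the sketch below measures one string three ways using ext-mbstring, which is already a requirement. The replaceDataByCodePoints() helper is hypothetical and not taken from the library's source; it only assumes that offsets behave like Text.replaceData() with code points substituted for UTF-16 code units.

```php
<?php
// Measure the same string as UTF-8 bytes, code points, and UTF-16 code units.
$data = "h\u{00E9}llo \u{1F600}"; // "héllo 😀"

$bytes = strlen($data);                                                      // 11 UTF-8 bytes
$codePoints = mb_strlen($data, 'UTF-8');                                     // 7 code points
$utf16Units = strlen(mb_convert_encoding($data, 'UTF-16LE', 'UTF-8')) / 2;  // 8 UTF-16 code units

// Hypothetical helper (not rowbot/dom's code) that mimics Text.replaceData()
// using code point offsets instead of UTF-16 code units.
function replaceDataByCodePoints(string $data, int $offset, int $count, string $replacement): string
{
    $length = mb_strlen($data, 'UTF-8');
    if ($offset > $length) {
        // The DOM standard throws an IndexSizeError here; a plain SPL exception stands in for it.
        throw new OutOfRangeException('Offset is greater than the data length.');
    }

    // Keep everything before the offset, insert the replacement, then skip $count code points.
    return mb_substr($data, 0, $offset, 'UTF-8')
        . $replacement
        . mb_substr($data, $offset + $count, null, 'UTF-8');
}

echo replaceDataByCodePoints($data, 6, 1, ':-)'), "\n"; // prints "héllo :-)"
```

In a browser, the same replaceData() call on a Text node would count the emoji as two UTF-16 code units, so replacing one unit at offset 6 would leave half of a surrogate pair behind.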
- For the entire Document:
  - You may call the toString() method on the Document, e.g. $document->toString(), or you may cast the Document to a string, e.g. (string) $document.
- For Elements:
  - Depending on your needs, you may use the innerHTML property to serialize all of the Element's descendants, e.g. $element->innerHTML, or you may use the outerHTML property to serialize the Element itself along with all of its descendants, e.g. $element->outerHTML.
- For Text nodes:
  - You may use the data property, e.g. $textNode->data, to get the text data from the node.
- For Range objects:
  - You may call the toString() method, e.g. $range->toString(), or you may cast the Range to a string, e.g. (string) $range.