PHP TutorialCompile PHP ExtensionsContributing to the PHP CoreContributing to the PHP ManualCreate PDF files in PHPInstalling a PHP environment on WindowsPHP Alternative Syntax for Control StructuresPHP APCuPHP Array iterationPHP ArraysPHP Asynchronous programmingPHP Autoloading PrimerPHP BC Math (Binary Calculator)PHP Built in serverPHP CachePHP Classes and ObjectsPHP ClosurePHP Coding ConventionsPHP Command Line Interface (CLI)PHP CommentsPHP Common ErrorsPHP Compilation of Errors and WarningsPHP Composer Dependency ManagerPHP ConstantsPHP Control StructuresPHP CookiesPHP CryptographyPHP DateTime ClassPHP DebuggingPHP Dependency InjectionPHP Design PatternsPHP Docker deploymentPHP Exception Handling and Error ReportingPHP Executing Upon an ArrayPHP File handlingPHP Filters & Filter FunctionsPHP Functional ProgrammingPHP FunctionsPHP GeneratorsPHP Headers ManipulationPHP How to break down an URLPHP How to Detect Client IP AddressPHP HTTP AuthenticationPHP Image Processing with GDPHP ImagickPHP IMAPPHP Installing on Linux/Unix EnvironmentsPHP JSONPHP LocalizationPHP LoopsPHP Machine learningPHP Magic ConstantsPHP Magic MethodsPHP Manipulating an ArrayPHP mongo-phpPHP Multi Threading ExtensionPHP MultiprocessingPHP MySQLiPHP MySQLi affected rows returns 0 when it should return a positive integerPHP NamespacesPHP Object SerializationPHP OperatorsPHP Output BufferingPHP Outputting the Value of a VariablePHP Parsing HTMLPHP Password Hashing FunctionsPHP PDOPHP PerformancePHP PHPDocPHP Processing Multiple Arrays TogetherPHP PSRPHP Reading Request DataPHP RecipesPHP ReferencesPHP ReflectionPHP Regular Expressions (regexp/PCRE)PHP Secure Remeber MePHP SecurityPHP Sending EmailPHP SerializationPHP SessionsPHP SimpleXMLPHP SOAP ClientPHP SOAP ServerPHP SocketsPHP SPL data structuresPHP SQLite3PHP StreamsPHP String formattingPHP String Parsing

PHP Parsing HTML

From WikiOD

Parsing HTML from a string[edit | edit source]

PHP implements a DOM Level 2 compliant parser, allowing you to work with HTML using familiar methods like getElementById() or appendChild().

$html = '<html><body><span id="text">Hello, World!</span></body></html>';

$doc = new DOMDocument();

echo $doc->getElementById("text")->textContent;


Hello, World!

Note that PHP will emit warnings about any problems with the HTML, especially if you are importing a document fragment. To avoid these warnings, tell the DOM library (libxml) to handle its own errors by calling libxml_use_internal_errors() before importing your HTML. You can then use libxml_get_errors() to handle errors if needed.

Using XPath[edit | edit source]

$html = '<html><body><span class="text">Hello, World!</span></body></html>';

$doc = new DOMDocument();

$xpath = new DOMXPath($doc);
$span = $xpath->query("//span[@class='text']")->item(0);

echo $span->textContent;


Hello, World!

SimpleXML[edit | edit source]

Presentation[edit | edit source]

  • SimpleXML is a PHP library which provides an easy way to work with XML documents (especially reading and iterating through XML data).
  • The only restraint is that the XML document must be well-formed.

Parsing XML using procedural approach[edit | edit source]

// Load an XML string
$xmlstr = file_get_contents('library.xml');
$library = simplexml_load_string($xmlstr);

// Load an XML file
$library = simplexml_load_file('library.xml');

// You can load a local file path or a valid URL (if allow_url_fopen is set to "On" in php.ini

Parsing XML using OOP approach[edit | edit source]

// $isPathToFile: it informs the constructor that the 1st argument represents the path to a file,
// rather than a string that contains 1the XML data itself.

// Load an XML string
$xmlstr = file_get_contents('library.xml');
$library = new SimpleXMLElement($xmlstr);

// Load an XML file
$library = new SimpleXMLElement('library.xml', NULL, true);

// $isPathToFile: it informs the constructor that the first argument represents the path to a file, rather than a string that contains 1the XML data itself.

Accessing Children and Attributes[edit | edit source]

  • When SimpleXML parses an XML document, it converts all its XML elements, or nodes, to properties of the resulting SimpleXMLElement object
  • In addition, it converts XML attributes to an associative array that may be accessed from the property to which they belong.

When you know their names:[edit | edit source]

$library = new SimpleXMLElement('library.xml', NULL, true);
foreach ($library->book as $book){
    echo $book['isbn'];
    echo $book->title;
    echo $book->author;
    echo $book->publisher;
  • The major drawback of this approach is that it is necessary to know the names of every element and attribute in the XML document.

When you don't know their names (or you don't want to know them):[edit | edit source]

foreach ($library->children() as $child){
    echo $child->getName();
    // Get attributes of this element
    foreach ($child->attributes() as $attr){
        echo ' ' . $attr->getName() . ': ' . $attr;
    // Get children
    foreach ($child->children() as $subchild){
        echo ' ' . $subchild->getName() . ': ' . $subchild;