Parse HTML using jQuery-like syntax in PHP

Quite often you need to parse HTML and extract some values from deeply nested tables or similar markup. The most obvious solution is to use regular expressions, but they cope badly with nested tags. Another way is to use XPath, which works much better here but does not have a simple syntax.
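To give a feel for the XPath route mentioned above, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath classes. The two-row `$html` sample is an assumption made up for illustration (a trimmed version of the kind of table used later in this post), and the class-matching expression is just one common way to select by class in XPath 1.0, not the only one.

```php
<?php
// Hypothetical two-row sample, trimmed for illustration.
$html = '<table class="oDescTable oJobClient">
<tr><th>Hours Billed</th><td><strong>7,575</strong></td></tr>
<tr><th>Jobs Posted</th><td><strong>136</strong></td></tr>
</table>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them;
// real-world HTML is often not perfectly valid.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$values = array();

// XPath 1.0 has no class selector, so match "oJobClient" as a
// whitespace-separated token inside the class attribute.
$rows = $xpath->query(
    '//table[contains(concat(" ", normalize-space(@class), " "), " oJobClient ")]//tr'
);

foreach ($rows as $tr) {
    // Relative queries: first <th> and first <td> child of this row.
    $th = $xpath->query('th', $tr)->item(0);
    $td = $xpath->query('td', $tr)->item(0);
    if ($th !== null && $td !== null) {
        $values[trim($th->textContent)] = trim($td->textContent);
    }
}

print_r($values);
```

It works, but the expression needed just to say "table with class oJobClient" shows why the syntax is not exactly friendly.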

Nowadays almost all PHP developers know jQuery, which has become something of a standard in front-end development. Why not use its familiar syntax for HTML parsing?

Of course, the JavaScript-based jQuery itself cannot be used for parsing in PHP, but there are PHP implementations(!!!). They allow you to do the DOM manipulation the original jQuery can do.

Let's get to a simple example. From the table below, I need to extract the values into an array.

<table class="oDescTable oJobClient">
<tr>
<th>Total Spent</th>
<td><strong>
Over $10,000
</strong></td>
</tr>
<tr>
<th>Hours Billed</th>
<td><strong>7,575</strong></td>
</tr>
<tr>
<th>Jobs Posted</th>
<td><strong>136</strong></td>
</tr>
<tr>
<th>Hires</th>
<td><strong>51</strong></td>
</tr>
<tr>
<th>Open Jobs</th>
<td><strong>1</strong></td>
</tr>
<tr>
<th>Current Team Size</th>
<td><strong>0</strong></td>
</tr>
</table>

# Using phpQuery

The basics of this library are covered in its documentation. All the magic is done by a single function, pq(), which is the analog of jQuery's $().

// Include library
include_once 'phpQuery.php';
// Load HTML document ($html holds the markup of the table above)
phpQuery::newDocumentHTML($html);
$p = array();
// Call pq() to extract needed values
foreach(pq('table.oJobClient tr') as $tr) {
   $tr = pq($tr);
   // Save values to the array
   $p[ trim($tr->find('th')->text()) ] = trim($tr->find('td')->text());
}

The result is an array with the values:

array (
  'Total Spent' => 'Over $10,000',
  'Hours Billed' => '7,575',
  'Jobs Posted' => '136',
  // etc.
)

Simple? And this is still only a very small portion of what this library is capable of.
