Because PHP generates its output dynamically, it is rather easy to produce messy HTML. While this is not a problem in itself, it reflects badly on you and your website, and it also makes the generated HTML source hard to read when you have debugging to do.
However, PHP has an extension called Tidy that, amongst other things, can clean up and repair poorly written HTML. More advanced users may want to use it to traverse their HTML documents in PHP, but, let's face it, it's called Tidy for a reason - that's what it does best.
Here's an example HTML document, saved as lame.html:
<TITLE>This is bad HTML</title>
<BODY>
This would get rejected as XHTML for a number of reasons.
First, the <FOO> tag doesn't exist.<br>Second, the tags aren't the same case.
Third, tags that don't end, like <HR>, aren't allowed.<br>
Tidy should fix all this for us!
As you can see, it's quite messy. Let's put it through Tidy with no particular options set:
<?php
// Parse lame.html using Tidy's default settings
$tidy = new tidy("lame.html");
$tidy->cleanRepair();
echo $tidy;
?>
That will output the following:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title>This is bad HTML</title>
</head>
<body>
This would get rejected as XHTML for a number of reasons. First,
the tag doesn't exist.<br>
Second, the tags aren't the same case. Third, tags that don't end,
like
<hr>
, aren't allowed.<br>
Tidy should fix all this for us!
</body>
</html>
As you can see, Tidy has made several changes. First, it has added all the right header and footer tags to make the overall content compliant, and normalised the case of the elements. Second, it has taken away the FOO tag because it's invalid. Third, it has wrapped the lines so they aren't too long. Finally, it has added a new line after each tag.
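If you'd like to see what Tidy complained about while it made those repairs, the tidy object keeps a log in its errorBuffer property. Here's a minimal sketch, assuming the Tidy extension is loaded. It parses a cut-down copy of the bad HTML from a string rather than from lame.html so that it stands alone, and the helper name repairAndReport() is my own invention, not part of Tidy:

```php
<?php
// Hypothetical helper (not part of Tidy): repairs an HTML string and
// returns both the cleaned markup and Tidy's log of what it found.
function repairAndReport(string $html): array
{
    $tidy = new tidy();
    $tidy->parseString($html, [], "utf8");
    $tidy->cleanRepair();

    return [
        // tidy_get_output() returns the same text that "echo $tidy" prints
        "html" => tidy_get_output($tidy),
        // errorBuffer holds the warnings and errors from parsing
        "log"  => (string) $tidy->errorBuffer,
    ];
}

$badHtml = <<<HTML
<TITLE>This is bad HTML</title>
<BODY>
First, the <FOO> tag doesn't exist.<br>
Third, tags that don't end, like <HR>, aren't allowed.<br>
HTML;

if (extension_loaded("tidy")) {
    $result = repairAndReport($badHtml);
    echo $result["log"], "\n";   // what Tidy noticed
    echo $result["html"];        // the repaired document
} else {
    echo "The Tidy extension is not loaded.\n";
}
```

The exact wording of the log depends on the version of the Tidy library your PHP was built against, so treat it as something to read rather than something to parse.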
I don't know about you, but I find line-wrapping at a fixed width an alien concept. Even when I'm programming on a command-line Linux box (which I do quite regularly!) I still type very long lines of code - I make them as long as they need to be! Fortunately, we can curb Tidy's desire to wrap lines by passing it a list of options. Tidy accepts quite a variety of options, and we'll go over some of the popular ones in a moment. First things first, though: let's blast line wrapping and make the output actually look tidy!
<?php
$tidyoptions = array("indent" => true, "wrap" => 1000);
$tidy = new tidy("lame.html", $tidyoptions);
$tidy->cleanRepair();
echo $tidy;
?>
This time we use an array to store the options, enabling indent mode and setting the character-wrap limit to 1000 characters. Here's how that looks:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title>
This is bad HTML
</title>
</head>
<body>
This would get rejected as XHTML for a number of reasons. First, the tag doesn't exist.<br>
Second, the tags aren't the same case. Third, tags that don't end, like
<hr>
, aren't allowed.<br>
Tidy should fix all this for us!
</body>
</html>
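If you want to confirm that Tidy really picked up those settings, you can read them back from the object with getOpt(). This is only a small sketch, assuming the Tidy extension is loaded; it parses a short string instead of lame.html so it runs on its own, and $html is just a stand-in document:

```php
<?php
// Same options as in the example above
$tidyoptions = array("indent" => true, "wrap" => 1000);

// Stand-in input (an assumption for this sketch, not the real lame.html)
$html = "<TITLE>This is bad HTML</title><BODY>Tidy should fix all this for us!";

if (extension_loaded("tidy")) {
    $tidy = new tidy();
    $tidy->parseString($html, $tidyoptions, "utf8");
    $tidy->cleanRepair();

    // getOpt() returns the value Tidy is using for a named option
    var_dump($tidy->getOpt("wrap"));
    var_dump($tidy->getOpt("indent"));
}
```

As an aside, a wrap value of 0 tells Tidy not to wrap lines at all, which saves guessing at a big enough number.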
Much better, but not yet perfect: it's valid HTML 3.2 now, but I'd much rather we went the whole hog and made it valid XHTML. Does it involve rewriting the HTML? Of course not - thanks to Tidy!
<?php
$tidyoptions = array("indent" => true, "wrap" => 1000, "output-xhtml" => true);
$tidy = new tidy("lame.html", $tidyoptions);
$tidy->cleanRepair();
echo $tidy;
?>
That extra option makes the world of difference. Take a look at the output now:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>
This is bad HTML
</title>
</head>
<body>
This would get rejected as XHTML for a number of reasons. First, the tag doesn't exist.<br />
Second, the tags aren't the same case. Third, tags that don't end, like
<hr />
, aren't allowed.<br />
Tidy should fix all this for us!
</body>
</html>
Now we get the works: a full XHTML doctype, all our tags are indented, and all our tags are closed. This is what we should be aiming for as standard.
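Since the original complaint was about output that PHP generates dynamically, here is one way the same options might be applied to a page as it is produced rather than to a file on disk. Treat it as a sketch under assumptions: it needs the Tidy extension, tidyCaptured() is a hypothetical helper of my own, and the little page inside the callback is made up for the demonstration. It captures whatever the callback echoes with output buffering and passes it through tidy_repair_string():

```php
<?php
// Options from the final example above
$tidyoptions = array("indent" => true, "wrap" => 1000, "output-xhtml" => true);

// Hypothetical helper: runs a callback that echoes HTML, captures what it
// printed, and returns the repaired version.
function tidyCaptured(callable $render, array $options): string
{
    ob_start();
    $render();
    $raw = ob_get_clean();

    $clean = tidy_repair_string($raw, $options, "utf8");

    // Fall back to the untouched output if Tidy could not process it
    return $clean === false ? $raw : $clean;
}

// Made-up page for the demonstration
$page = function () {
    echo "<TITLE>Generated page</title>";
    echo "<BODY>Line one<br>Line two<HR>";
};

if (extension_loaded("tidy")) {
    echo tidyCaptured($page, $tidyoptions);
} else {
    echo "The Tidy extension is not loaded.\n";
}
```

Keeping the options in one array means every page that goes through the helper gets the same XHTML treatment.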
Copyright ©2015 Paul Hudson. Follow me: @twostraws.