HTML Tidy integrated into MidCOM

Today I integrated John Coggeshall’s Tidy PECL extension into Midgard CMS. This means that all WYSIWYG-edited content should now be stored as proper, cleaned and indented XHTML.

The way this works is that widget_html, the Datamanager widget for the HTMLAREA editor, checks whether the Tidy PHP functions are available and, if they are, runs the HTML through the cleaner.
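
As a rough illustration of that check, here is a minimal sketch. It is not the actual widget_html code: the function name clean_html_if_tidy is hypothetical, it uses tidy_repair_string from the Tidy extension bundled with current PHP rather than the PECL release discussed here, and it sets only two of the options listed later in this post.

```php
<?php
// Hypothetical helper for illustration; not the code committed to MidCOM.
function clean_html_if_tidy(string $html): string
{
    // If the Tidy extension is not loaded, keep the content unchanged.
    if (!function_exists('tidy_repair_string')) {
        return $html;
    }

    // Assumption: a reduced option set, just enough to show the idea.
    $config = [
        'show-body-only' => true,
        'output-xhtml'   => true,
    ];

    $cleaned = tidy_repair_string($html, $config, 'utf8');

    // tidy_repair_string() can return false; fall back to the input then.
    return is_string($cleaned) ? $cleaned : $html;
}

// The unclosed <b> is the kind of markup Tidy is expected to repair.
echo clean_html_if_tidy('<p>Hello <b>world</p>');
```

Falling back to the untouched input keeps the editor usable on servers where the extension is missing, which matches the "only if available" behaviour described above.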

The commit didn’t quite make it into the MidCOM 2.4.3 release, so get it from CVS or patch manually if you want to try it out.

There are still some things I would like to do with the Tidy extension, including displaying possible content accessibility warnings and making the Tidy options configurable. For now, the Tidy options used are:

  • show-body-only
  • output-xhtml
  • enclose-block-text
  • drop-empty-paras
  • indent
  • break-before-br
  • drop-font-tags
  • drop-proprietary-attributes
  • bare
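
To make the list concrete, the options above could be written as a PHP array and rendered as Tidy configuration-file lines (one "name: value" per line). This is only a sketch: the variable $tidy_options and the helper tidy_options_to_config are hypothetical names, and switching every option on with "yes" is my assumption, since only the option names are listed here.

```php
<?php
// The option names from the list above; the "true" values are assumed.
$tidy_options = [
    'show-body-only'              => true,
    'output-xhtml'                => true,
    'enclose-block-text'          => true,
    'drop-empty-paras'            => true,
    'indent'                      => true,
    'break-before-br'             => true,
    'drop-font-tags'              => true,
    'drop-proprietary-attributes' => true,
    'bare'                        => true,
];

// Hypothetical helper: render the options as Tidy config-file lines.
function tidy_options_to_config(array $options): string
{
    $lines = [];
    foreach ($options as $name => $value) {
        // Tidy config files write booleans as yes/no.
        if (is_bool($value)) {
            $value = $value ? 'yes' : 'no';
        }
        $lines[] = $name . ': ' . $value;
    }
    return implode("\n", $lines) . "\n";
}

echo tidy_options_to_config($tidy_options);
```

A file with these lines can be handed to the command-line tool with tidy -config, and the PHP Tidy functions accept either such a file name or the array itself as their configuration argument. Which option names are recognised depends on the installed Tidy library version.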

I had planned to do this integration much earlier, but always had trouble getting the Tidy extension to compile. The reason was that the binaries distributed on the Tidy site don’t include the shared libraries, so Tidy itself had to be compiled from source first. Some Linux distributions apparently provide packages for the Tidy libraries.

