Sunday Notes from MgdSchema Workshop


Midgard Query Builder

Jukka has developed a Query Builder extension to the MgdSchema system that lets Midgard developers easily optimize the SQL queries their applications use.

[Photo: Jukka's Query Builder presentation]

Currently the Query Builder is available only through the Midgard C API. The PHP binding should be relatively easy to add, and Piotras or Jukka will implement it next week.

The PHP API will provide a MidgardQueryBuilder class that will work like this:

<?php
// Instantiate the Query Builder for seeking MidgardArticles
$query = new MidgardQueryBuilder("MidgardArticle");

// Next add the SQL constraints you need

// List articles only from specific topic
$query->addConstraint("topic", "=", $topic->id);

// List only articles that have been approved since some timestamp
$query->addConstraint("approved", ">", $starting_time);

// Order the articles based on their approval time
$query->addOrder("approved", "DESC");

// Get only 20 articles for this particular view
$query->setLimit(20);

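// Skip the first N articles to show a later page (N comes from the request,
// so cast it to an integer and default to 0 when it is missing)
$query->setOffset(isset($_REQUEST["startfrom"]) ? (int) $_REQUEST["startfrom"] : 0);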

// Execute the query returning an array of matching MidgardArticle objects
// The MidgardArticles are the full article objects with all regular methods
$articles = $query->execute();

if (!$articles)
{
	// Handle the error (note that an empty result array is also falsy)
}

// And then display your articles
print_r($articles);
?>
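
To make the idea concrete, here is a minimal pure-PHP sketch of how a builder like this might collect its calls and render them as a single parameterized SQL statement. The class name SqlQueryBuilderSketch, the table name `article` and the exact SQL it emits are assumptions for illustration only; this is not the MgdSchema Query Builder, which lives in the Midgard C API.

```php
<?php
// Hypothetical illustration only: NOT the MgdSchema Query Builder.
// Collects constraints, ordering and paging, then renders one SQL statement
// with "?" placeholders. Field names are assumed to come from trusted code.
class SqlQueryBuilderSketch
{
    private $table;
    private $constraints = array();
    private $params = array();
    private $orders = array();
    private $limit = null;
    private $offset = 0;

    public function __construct($table)
    {
        $this->table = $table;
    }

    public function addConstraint($field, $operator, $value)
    {
        // Whitelist operators so caller input never becomes SQL syntax
        $allowed = array("=", "<>", "<", "<=", ">", ">=", "LIKE");
        if (!in_array($operator, $allowed, true)) {
            throw new InvalidArgumentException("Unsupported operator: $operator");
        }
        $this->constraints[] = "$field $operator ?";
        $this->params[] = $value; // values are bound later, never interpolated
    }

    public function addOrder($field, $direction = "ASC")
    {
        $direction = strtoupper($direction) === "DESC" ? "DESC" : "ASC";
        $this->orders[] = "$field $direction";
    }

    public function setLimit($limit)
    {
        $this->limit = max(0, (int) $limit);
    }

    public function setOffset($offset)
    {
        $this->offset = max(0, (int) $offset);
    }

    // Returns the SQL text and the values to bind to its placeholders.
    // In this sketch the offset is only emitted together with a limit.
    public function toSql()
    {
        $sql = "SELECT * FROM {$this->table}";
        if ($this->constraints) {
            $sql .= " WHERE " . implode(" AND ", $this->constraints);
        }
        if ($this->orders) {
            $sql .= " ORDER BY " . implode(", ", $this->orders);
        }
        if ($this->limit !== null) {
            $sql .= " LIMIT {$this->limit} OFFSET {$this->offset}";
        }
        return array($sql, $this->params);
    }
}

$qb = new SqlQueryBuilderSketch("article");
$qb->addConstraint("topic", "=", 42);
$qb->addConstraint("approved", ">", "2005-01-01 00:00:00");
$qb->addOrder("approved", "DESC");
$qb->setLimit(20);
$qb->setOffset(40);
list($sql, $params) = $qb->toSql();
echo $sql, "\n";
```

Because filtering, sorting and paging all end up in one statement, the database can do that work instead of the application fetching every row and filtering in PHP, which is presumably where much of the optimization benefit comes from.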

Once the Query Builder is available for PHP, we can really start developing Midgard2. One of the tasks I’m eager to begin is a compatibility layer that implements the Classic Midgard API in PHP. Midgard Lite already has a roughly 70% complete pure-PHP implementation of the API, which should be easy to modify to use the Query Builder instead of DB_DataObject. Once the API is implemented in PHP, we can start removing large chunks of legacy code from midgard-php.

Java in Midgard

While PHP remains the web development language of choice in Midgard CMS, Java support is also growing. Jukka has already implemented support for the Java Content Repository (JCR) standard. JCR was originally developed by Day Software in Switzerland as a generic content management API, the “JDBC of Content Management Systems”.

[Photo: Site building tutorial]

JSR-170, the Java Content Repository

In JCR, the content repository is divided into workspaces, which Midgard represents as Sitegroups. The JCR specification also defines a method for copying content between workspaces. midgard-java does not implement it yet, but it could become an interesting way to implement staging-live in the future.
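
The staging-live idea can be pictured with a toy model in which each workspace (a Sitegroup in Midgard terms) is a map from node paths to properties, and publishing copies a subtree from one workspace into another. Everything here, including the function name copySubtree and the path layout, is a hypothetical sketch; it is neither the JCR workspace-copy API nor midgard-java code.

```php
<?php
// Hypothetical toy model: each workspace maps node paths to property arrays.
// copySubtree() copies $srcPath and everything below it from one workspace
// into another, overwriting nodes that already exist at the same paths.
function copySubtree(array $from, array $to, $srcPath)
{
    $prefix = rtrim($srcPath, "/") . "/";
    foreach ($from as $path => $properties) {
        if ($path === $srcPath || strpos($path, $prefix) === 0) {
            $to[$path] = $properties;
        }
    }
    return $to; // arrays are passed by value, so return the updated copy
}

$staging = array(
    "/topics/news"           => array("title" => "News"),
    "/topics/news/article-1" => array("title" => "Fresh headline"),
    "/topics/newsletter"     => array("title" => "Newsletter"),
);
$live = array(
    "/topics/news" => array("title" => "Old news"),
);

// "Publish" the news topic from staging to live
$live = copySubtree($staging, $live, "/topics/news");
```

The trailing slash in the prefix check keeps /topics/newsletter from being treated as a child of /topics/news.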

Within a workspace, content is managed as a tree. In Midgard there is a virtual root node, with content roots such as the topic and style trees beneath it. Non-hierarchical structures are stored as references.
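
A minimal sketch of that layout: a virtual root node with topic and style trees beneath it, path-based lookup, and a non-hierarchical link stored as a reference (here, the GUID of another node). The ContentNode class, the node names and the `style` property are hypothetical illustrations, not the midgard-java node model.

```php
<?php
// Hypothetical sketch of a JCR-style content tree, not midgard-java code.
class ContentNode
{
    public $name;
    public $guid;
    public $properties = array();
    public $children = array();

    public function __construct($name, $guid)
    {
        $this->name = $name;
        $this->guid = $guid;
    }

    public function addChild(ContentNode $child)
    {
        $this->children[$child->name] = $child;
        return $child;
    }

    // Resolve a path such as "topics/news" by walking child names;
    // returns null when a segment does not exist
    public function getNode($path)
    {
        $node = $this;
        foreach (explode("/", trim($path, "/")) as $segment) {
            if ($segment === "") {
                continue;
            }
            if (!isset($node->children[$segment])) {
                return null;
            }
            $node = $node->children[$segment];
        }
        return $node;
    }
}

// In this sketch the virtual root is just an in-memory node with the
// content roots (topic and style trees) hanging below it
$root   = new ContentNode("", "virtual-root");
$topics = $root->addChild(new ContentNode("topics", "guid-topics"));
$styles = $root->addChild(new ContentNode("styles", "guid-styles"));
$news   = $topics->addChild(new ContentNode("news", "guid-news"));
$layout = $styles->addChild(new ContentNode("site-layout", "guid-layout"));

// A non-hierarchical link (the topic uses a style) is stored as a reference
$news->properties["style"] = $layout->guid;
```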

JCR provides an API for traversing the content hierarchy and modifying node properties. Modifications can be collected into a batch of atomic operations that are saved together. JCR could also provide real transactions, but Midgard does not support them yet.
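
Collecting modifications and saving them together can be sketched as a session that queues property changes and applies them only on save(), or throws them away on refresh(). The method names loosely echo JCR's session save and refresh operations, but the PendingSession class is a hypothetical illustration: it only batches changes in memory and is not a database transaction.

```php
<?php
// Hypothetical sketch: batch property changes and apply them together.
// This is in-memory batching only, not a database transaction.
class PendingSession
{
    private $store;             // committed state: guid => properties
    private $pending = array(); // queued changes: list of [guid, name, value]

    public function __construct(array $store)
    {
        $this->store = $store;
    }

    public function setProperty($guid, $name, $value)
    {
        $this->pending[] = array($guid, $name, $value);
    }

    public function hasPendingChanges()
    {
        return count($this->pending) > 0;
    }

    // Apply every queued change in order, then clear the queue
    public function save()
    {
        foreach ($this->pending as $change) {
            list($guid, $name, $value) = $change;
            $this->store[$guid][$name] = $value;
        }
        $this->pending = array();
    }

    // Discard queued changes without touching committed state
    public function refresh()
    {
        $this->pending = array();
    }

    public function getProperty($guid, $name)
    {
        return isset($this->store[$guid][$name]) ? $this->store[$guid][$name] : null;
    }
}

$session = new PendingSession(array("guid-news" => array("title" => "News")));
$session->setProperty("guid-news", "title", "Latest news");
$session->setProperty("guid-news", "abstract", "Updated daily");
$before = $session->getProperty("guid-news", "title"); // still "News"
$session->save();
$after = $session->getProperty("guid-news", "title");  // now "Latest news"

$session->setProperty("guid-news", "title", "Typo");
$session->refresh();
$afterRefresh = $session->getProperty("guid-news", "title"); // still "Latest news"
```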

Midgard's JCR implementation also supports XPath queries and XML import/export, which we already use in the Exorcist cross-CMS content migration tool. JCR also has an introspection system that clients can use to build custom administrative interfaces that automatically support any new content types added to the repository.
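
Because content can be exported as XML, XPath is a natural way to query it. The sketch below uses PHP's built-in DOM extension (DOMDocument and DOMXPath) on a small, made-up export. The element and attribute names are assumptions and do not reproduce the actual JCR or Exorcist export format; real JCR XPath queries also follow their own conventions, such as starting from /jcr:root.

```php
<?php
// Hypothetical export: element and attribute names are illustrative only.
$xml = <<<XML
<root>
  <topic name="news">
    <article name="first" approved="2005-03-01">Hello</article>
    <article name="second">Draft</article>
  </topic>
  <topic name="events">
    <article name="party" approved="2005-03-05">Workshop report</article>
  </topic>
</root>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// All approved articles anywhere in the tree, in document order
$approved = array();
foreach ($xpath->query("//article[@approved]") as $node) {
    $approved[] = $node->getAttribute("name");
}

// Number of articles directly under the "news" topic
$newsCount = $xpath->query("/root/topic[@name='news']/article")->length;
```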

With the JCR Server system, the Midgard repository is also available through RMI and WebDAV.

[Photo: Preparing for the sightseeing flight]

The big question with JCR is whether other CMS developers will adopt the standard. If it catches on, the benefits will be significant, especially for building cross-CMS tools. IBM has already announced support for the standard, and OSCOM has also taken note of it.

Midgard-java installation

One suggestion for making JCR installation easier is to add it to the Midgard Core package. The Java Native Interface bindings would then be compiled by default, and server administrators could enable JCR simply by installing a Java Virtual Machine.

Another JVM consumer in Midgard is MidCOM's Lucene-based indexing system, so the installation locations and dependencies of the two should be kept in sync.

Jukka will try to produce an installation HOWTO for setting up midgard-java together with the JCR Server next week.

MidCOM indexing with Lucene

MidCOM uses a Lucene-based indexer to provide full-text search over a “live” index of the site data. All MidCOM components notify the indexer whenever they change data, so searches always return current content.
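
The "live" behaviour comes from components calling the indexer on every write rather than from a periodic crawl. Below is a minimal sketch of that notification pattern; InMemoryIndexer and ArticleStore are hypothetical stand-ins with a naive substring search, not MidCOM's Lucene-backed service.

```php
<?php
// Hypothetical sketch of "notify the indexer on every change"; not MidCOM code.
class InMemoryIndexer
{
    private $documents = array(); // guid => text

    public function index($guid, $text)
    {
        $this->documents[$guid] = $text; // re-indexing replaces the old entry
    }

    public function delete($guid)
    {
        unset($this->documents[$guid]);
    }

    // Case-insensitive substring match, standing in for a real full-text query
    public function search($word)
    {
        $hits = array();
        foreach ($this->documents as $guid => $text) {
            if (stripos($text, $word) !== false) {
                $hits[] = $guid;
            }
        }
        return $hits;
    }
}

class ArticleStore
{
    private $indexer;
    private $articles = array();

    public function __construct(InMemoryIndexer $indexer)
    {
        $this->indexer = $indexer;
    }

    // Every write also notifies the indexer, so the index never goes stale
    public function save($guid, $text)
    {
        $this->articles[$guid] = $text;
        $this->indexer->index($guid, $text);
    }

    public function remove($guid)
    {
        unset($this->articles[$guid]);
        $this->indexer->delete($guid);
    }
}

$indexer = new InMemoryIndexer();
$store = new ArticleStore($indexer);
$store->save("guid-1", "MgdSchema workshop notes");
$store->save("guid-2", "Lucene indexing notes");
$store->save("guid-1", "Query Builder presentation"); // an edit re-indexes guid-1
$store->remove("guid-2");                             // a delete drops guid-2
```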

With the search system, users can either search for text anywhere in the Midgard content structure or use an advanced syntax to search specific content fields or value ranges.
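
To illustrate what field-specific and range searches mean, here is a tiny evaluator for a Lucene-like subset: clauses joined by AND, each either FIELD:value or FIELD:[low TO high] (inclusive). It runs over plain PHP arrays and is a hypothetical teaching sketch, not the Lucene query parser, whose syntax is much richer. Dates are assumed to be fixed-width YYYYMMDD strings; MidCOM's stored timestamp format may differ.

```php
<?php
// Hypothetical sketch: evaluate "FIELD:value AND FIELD:[low TO high]" queries
// against documents stored as field => value arrays. Not Lucene itself.
function matchesClause(array $doc, $clause)
{
    // Inclusive range clause, e.g. __CREATED:[20050101 TO 20051231]
    if (preg_match('/^(\w+):\[(\S+) TO (\S+)\]$/', $clause, $m)) {
        if (!isset($doc[$m[1]])) {
            return false;
        }
        $value = (string) $doc[$m[1]];
        // Compared as strings, so values must share a fixed-width format
        return strcmp($value, $m[2]) >= 0 && strcmp($value, $m[3]) <= 0;
    }
    // Exact field match, e.g. ISO:400
    if (preg_match('/^(\w+):(\S+)$/', $clause, $m)) {
        return isset($doc[$m[1]]) && (string) $doc[$m[1]] === $m[2];
    }
    return false; // anything else is unsupported in this sketch
}

function runQuery(array $docs, $query)
{
    $clauses = array_map('trim', explode(" AND ", $query));
    $hits = array();
    foreach ($docs as $id => $doc) {
        $ok = true;
        foreach ($clauses as $clause) {
            if (!matchesClause($doc, $clause)) {
                $ok = false;
                break;
            }
        }
        if ($ok) {
            $hits[] = $id;
        }
    }
    return $hits;
}

$docs = array(
    "photo-1"   => array("ISO" => "400", "__COMPONENT" => "net.siriux.photos", "__CREATED" => "20050312"),
    "photo-2"   => array("ISO" => "100", "__COMPONENT" => "net.siriux.photos", "__CREATED" => "20050315"),
    "photo-3"   => array("ISO" => "400", "__COMPONENT" => "net.siriux.photos", "__CREATED" => "20041120"),
    "article-1" => array("__CREATED" => "20050301"),
);

$hits = runQuery($docs, "ISO:400 AND __COMPONENT:net.siriux.photos AND __CREATED:[20050101 TO 20051231]");
```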

[Photo: Discussion after sauna]

Documents in the MidCOM index are identified by their resource identifier, which is typically the object GUID. Each field is indexed separately, and all fields are also combined into the content field used by the regular full-text search. For native MidCOM content, the topic is stored in the index as well; externally indexed data such as OpenPSA content should not use that field.
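
A sketch of the document layout just described: the GUID serves as the resource identifier, each field is kept separately, all field values are also concatenated into a combined content field for plain full-text search, and the topic is set only for native MidCOM content. The helper buildIndexDocument and the field names RI, __TOPIC and content are hypothetical; the real names are defined by MidCOM's midcom_services_indexer_document class.

```php
<?php
// Hypothetical sketch of an index document; field names are illustrative.
function buildIndexDocument($guid, array $fields, $topicGuid = null)
{
    $doc = array("RI" => $guid); // resource identifier, typically the GUID
    foreach ($fields as $name => $value) {
        $doc[$name] = $value; // each field stays individually searchable
    }
    // Combined field used by the regular full-text search across all data
    $doc["content"] = implode(" ", array_values($fields));
    // Only native MidCOM content carries a topic; external data leaves it out
    if ($topicGuid !== null) {
        $doc["__TOPIC"] = $topicGuid;
    }
    return $doc;
}

$article = buildIndexDocument(
    "guid-article-1",
    array("title" => "Query Builder", "abstract" => "Faster SQL for Midgard"),
    "guid-topic-news"
);
$external = buildIndexDocument(
    "guid-openpsa-task-7",
    array("title" => "Prepare workshop notes")
);
```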

The index also contains metadata such as creation, revision and indexing timestamps, which can be used to limit searches.

Indexing is handled by the PHP class midcom_services_indexer_document and its more specialized subclasses for datamanager documents and file attachments. A component uses the indexer service like this:

<?php
// Get the indexer service from MidCOM
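$indexer = & $GLOBALS["midcom"]->get_service('indexer');

// Avoid an undefined-index notice when no action was requested
$action = isset($_REQUEST["action"]) ? $_REQUEST["action"] : "";

if ($action == "update")
{
	// Pass the datamanager holding your content to be indexed
	$indexer->index($datamanager);
}
elseif ($action == "delete")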
{
	// Drop the document from the index
	$indexer->delete($article->guid());

	// Delete the actual content object
	midcom_helper_purge_object($article->guid());
}
?>

The datamanager schemas can contain some hints for the indexer on how to handle them.

The MidCOM indexer is relatively easy to set up, but it has yet to be integrated into the Midgard packages. The suggested directory for the index is $MIDGARD_PREFIX/share/midgard/indexer/$INDEX_NAME.

The index can be accessed in two ways. The midcom.helper.search component provides a regular site search engine with both a simple interface and an advanced search that can limit results by content type, topic tree and modification date. The simple search form can easily be included in the site layout using MidCOM’s dynamic_load method.

The other way is to use the midcom.services.indexer API directly. For example, to list all images in a photo gallery that were taken at ISO 400:

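<?php
// Get the indexer service from MidCOM
$indexer = & $GLOBALS["midcom"]->get_service('indexer');

// Search for value "400" in schema field "ISO"
$query = "ISO:400";

// Search only in photo galleries
$query .= " AND __COMPONENT:net.siriux.photos";

// Search only photos whose objects were created since $date (this is the
// Midgard creation time, not the shooting date). Note: standard Lucene query
// syntax has no ">" operator, so depending on the parser in use this may
// need to be written as a range query such as __CREATED:[from TO to]
$query .= " AND __CREATED > $date";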

// Execute the query with Lucene
$result = $indexer->query($query);

// $result contains all matches as midcom_services_indexer_document objects
// sorted by relevance
print_r($result);
?>

At the moment the indexer is only available within the MidCOM context, but Torben is working on support for external Midgard/PHP applications.

