Motorcycle Adventures and Free Software
Henri Bergius
Biker, free software consultant, neogeographer

See also my JavaScript blog, The Universal Runtime

There is a total of 861 posts.

Weblog: category "oscom"

Business analytics with CouchDB and NoFlo

Posted on 2011-09-21 17:52:53 UTC in 47° 0.000 N 13° 0.000 E 48km SE of Saalfelden am Steinernen Meer, AT to . 0 comments.

The purpose of business analytics is to find data from the company's information systems that can be used to support decision making. What do customers buy most? What do they do before a buying decision? What are the signs that a customer may be leaving?

For the last month we've been working in Salzburg to build such a system, the Intelligent Project Controlling Tool needed for running large collaborative research projects like IKS. Since the design we went with can be reused for other business analytics needs, I wanted to write a bit about it.

But first, here is what our system looks like:

Proggis displaying IKS project plan

Where does the data come from?

There are many ways to gather business data. Often the information systems already contain the data needed. But it may also be hidden in a jungle of spreadsheets. Or maybe some data is simply not available, and has to be filled in manually.

Handling all these cases in one system is a tricky question. To solve it, we went with a two-layered strategy:

  • All data used for analytics is stored as Linked Data in a CouchDB system
  • NoFlo workflows are used for gathering data from the diverse sources and converting it to the format needed

In IKS's case, much of the data was available in a series of spreadsheets. With these, we built the necessary workflows for first converting the spreadsheets into XML with Apache Tika, and then extracting the information from them in a sensible subset of JSON-LD.

Because IKS is a collaborative project, information needs to be gathered from a diverse group of partner organizations. Some of them have systems that provide the needed APIs (like Basecamp, which we use), and we can just periodically import the data. But with many we decided on a simple data interchange approach: spreadsheets handled over email.

In this approach, a user files a data request into the system. This gets picked up by NoFlo, which sends an email with the appropriate spreadsheet template to the partner. Then it starts waiting for a reply. When a reply arrives, it extracts the data from the attached spreadsheet and imports it to the system.

Our NoFlo processes are mostly initiated by the CouchDB change notification API. We keep them running persistently using the forever tool for Node.js, so whenever some operation needs to be run it happens nearly immediately.
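The trigger side can be sketched roughly as follows. This is only an illustration: it assumes data requests are stored as documents with a hypothetical '@type' of 'datarequest' and a hypothetical 'status' field, and the startWorkflow callback stands in for whatever actually launches the NoFlo graph (it is not NoFlo's API). The response shape is the one CouchDB's _changes feed returns when queried with include_docs=true.

```javascript
// Picks the documents that should start a workflow out of one _changes response.
// The response shape ({results: [{seq, id, changes, doc}], last_seq}) is CouchDB's;
// the 'datarequest' type and the 'status' field are assumptions for illustration.
function handleChanges(response, startWorkflow) {
  response.results.forEach(function (change) {
    if (change.deleted || !change.doc) {
      return; // nothing to process for deletions or rows without a document
    }
    if (change.doc['@type'] === 'datarequest' && change.doc.status === 'new') {
      startWorkflow(change.doc);
    }
  });
  // Return the sequence number so the next poll can continue from it
  return response.last_seq;
}

// Example run against a hand-written response
const started = [];
const lastSeq = handleChanges({
  results: [
    {seq: 1, id: 'req-1', changes: [{rev: '1-a'}],
     doc: {_id: 'req-1', '@type': 'datarequest', status: 'new'}},
    {seq: 2, id: 'effort-7', changes: [{rev: '1-b'}],
     doc: {_id: 'effort-7', '@type': 'effort', value: 3}},
    {seq: 3, id: 'old', changes: [{rev: '2-c'}], deleted: true}
  ],
  last_seq: 3
}, function (doc) { started.push(doc._id); });
```

Only the data request reaches the callback; the effort document and the deletion are skipped.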

Ensuring data consistency

With any automation, and especially with the email-based data interchange, things can go wrong. Because of this we tag all data that we receive with its origin, whether it was some automated operation or an imported spreadsheet. These origins are called execution documents. Users can browse all completed workflow executions and see what data came in from them. These can then be either accepted or rejected.

This way if some partner accidentally sends faulty data, or something else breaks, the incorrect information received can be easily removed. CouchDB's versioning capabilities help here.
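A minimal sketch of the idea, under the assumption that each imported document carries the identifier of its execution document in a field called 'execution' (a made-up name; the real document layout is not shown here). Rejecting an execution then means collecting its documents and marking them deleted, in the form CouchDB's _bulk_docs endpoint accepts.

```javascript
// Attach the origin to every imported document. The 'execution' field name
// is hypothetical; it is not taken from the real system.
function tagWithOrigin(docs, executionId) {
  return docs.map(function (doc) {
    return Object.assign({}, doc, {execution: executionId});
  });
}

// Rejecting an execution: build deletion stubs for its documents only.
// Documents that came from other executions are left untouched.
function rejectExecution(docs, executionId) {
  return docs
    .filter(function (doc) { return doc.execution === executionId; })
    .map(function (doc) {
      return {_id: doc._id, _rev: doc._rev, _deleted: true};
    });
}

const imported = tagWithOrigin([{_id: 'e1', _rev: '1-x', value: 5}], 'exec-42')
  .concat(tagWithOrigin([{_id: 'e2', _rev: '1-y', value: 8}], 'exec-43'));
const deletions = rejectExecution(imported, 'exec-42');
```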

Analyzing the data

CouchDB is built on top of the concept of map/reduce. Here you can modify and combine the data in lots of different ways using simple JavaScript functions. In our case we elected to write all our CouchDB code in CoffeeScript for simplicity. For example, here is the reduce function in CoffeeScript that counts totals of time planned, time used, and time left per task or partner in a project:

(keys, values, rereduce) ->
    roundNumber = (rnum, rlength) ->
        Math.round(parseFloat(rnum) * Math.pow(10, rlength)) / Math.pow(10, rlength)
    data =
        planned: 0.0
        spent: 0.0
        left: 0.0

    if rereduce
        for reducedData in values
            data.planned += reducedData.planned
            data.spent += reducedData.spent
        data.left = data.planned - data.spent
        return data

    for doc in values
        if doc['@type'] is 'effortallocation'
            data.planned += roundNumber doc.value, 1
        if doc['@type'] is 'effort'
            data.spent += roundNumber doc.value, 1
    data.left = roundNumber data.planned - data.spent, 1
    return data

If you figure out a new way to look at the data you have, simply write the needed map and reduce functions and save them into the database. CouchDB will then run them against existing data and produce numbers.
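For illustration, a map function that could feed the reduce function above might look like the sketch below, written in plain JavaScript. The 'task' field used as the key is an assumption, and emit() is defined here only so the sketch runs outside CouchDB, where it is provided by the view server.

```javascript
// Stand-in for CouchDB's built-in emit(), so the sketch runs outside the database.
const rows = [];
function emit(key, value) {
  rows.push({key: key, value: value});
}

// Map function: pass effort documents on to the reduce step, keyed by task.
// The reduce function reads '@type' and 'value' from each emitted value.
// 'task' is a hypothetical field name.
function map(doc) {
  if (doc['@type'] === 'effortallocation' || doc['@type'] === 'effort') {
    emit(doc.task, {'@type': doc['@type'], value: doc.value});
  }
}

[
  {'@type': 'effortallocation', task: 'task-1', value: 10},
  {'@type': 'effort', task: 'task-1', value: 4},
  {'@type': 'person', name: 'Not part of the view'}
].forEach(map);
```

Grouping the view by key would then give one planned/spent/left total per task.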

Data visualizations

Numbers are good, but to really see the information buried in them you need some visualizations. For this we decided to follow the CouchApp idea where the user interface code is stored in the database together with the data itself. This way no application servers are needed, and you can take the whole system with you just by replicating the database. Think of the possibility of doing some analysis on your company while flying to a meeting!

The visuals are in our case provided by JavaScript InfoVis Toolkit, a nice, MIT-licensed interactive graph library.

CouchDB views handle the number crunching, then CouchDB list functions process the numbers into the format needed for visualization. This leaves only a minimal amount of work for the client side.
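The conversion step could look something like this sketch. It takes grouped view rows whose values have the planned/spent/left shape produced by the reduce function above, and reshapes them into a labels-and-values structure for a bar chart. The exact output format is an assumption, so check it against the charting library you use. Inside a real list function the rows would come from getRow() and the result would be written with send().

```javascript
// Reshape reduced view rows into a simple chart structure.
// The output layout is an assumed format, not a confirmed library contract.
function toChartData(rows) {
  return {
    label: ['planned', 'spent', 'left'],
    values: rows.map(function (row) {
      return {
        label: row.key,
        values: [row.value.planned, row.value.spent, row.value.left]
      };
    })
  };
}

const chart = toChartData([
  {key: 'task-1', value: {planned: 10, spent: 4, left: 6}},
  {key: 'task-2', value: {planned: 5, spent: 5, left: 0}}
]);
```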

For consistency our application has been built with CoffeeApp, so all the database and user interface code is in CoffeeScript.

In a nutshell

Any business analytics system dealing with moderate amounts of data can be built following this approach.

Simple architecture for a business analytics system

This way you have a business analytics environment that is easy to extend with more data when it becomes available. New analysis can be done by writing reasonably simple map/reduce functions, and CouchDB's replication capabilities allow you to take the system and data with you.

Using JSON-LD for the data storage makes a lot of sense, as this way the relations between different pieces of information are easy to handle. And using URIs for data identifiers means you can easily mash up information coming from different sources together.
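As a small illustration of the mash-up point: when two sources describe the same thing under the same URI, combining them is a matter of merging on the identifier. The sketch below uses the '@id' keyword from current JSON-LD, and the example URIs and property names are invented.

```javascript
// Merge documents from different sources that share the same identifier.
// Later documents overwrite properties of earlier ones with the same name.
function mergeById(documents) {
  const merged = {};
  documents.forEach(function (doc) {
    const id = doc['@id'];
    merged[id] = Object.assign(merged[id] || {}, doc);
  });
  return merged;
}

const fromSpreadsheet = {'@id': 'http://example.net/project/task/1', plannedHours: 40};
const fromProjectTool = {'@id': 'http://example.net/project/task/1', title: 'Write report'};
const combined = mergeById([fromSpreadsheet, fromProjectTool]);
```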

The two-layered approach of using NoFlo for data imports, and CouchDB for analysis also allows for clean separation of concerns. In our case, I did the workflow part of things, and Szaby built the visualizations.

VIE 2.0 is starting to emerge

Posted on 2011-09-21 15:01:28 UTC in 47° 0.000 N 13° 0.000 E 48km SE of Saalfelden am Steinernen Meer, AT to . 0 comments.

VIE is a JavaScript library that makes RDFa-annotated entities on web pages editable. We started the work towards the next major version of it, codenamed Zart (for Mozart), at an IKS hackathon in Salzburg a couple of weeks ago.

VIE

Yesterday I merged the Zart codebase into the VIE repository. This blog post describes some of the improvements it brings.

VIE now has an instance

For VIE 1.x users the first visible change (and probably the only necessary API change) is that now VIE needs to be instantiated before being used. Singletons are evil, and so we are not a singleton any longer.

So, for existing VIE code, you need to:

var vie = new VIE();
// and then any traditional VIE calls, like:
var entities = vie.RDFaEntities.getInstances('div.article');
console.log("There are " + entities.length + " RDFa entities in your articles");

The VIE 1.0 API can be disabled by passing a setting when instantiating VIE:

var vie = new VIE({classic: false});

Services and VIE

The other big change in VIE is that now the API has been built in a service-oriented manner. This means that for example reading and writing RDFa is just a service you can enable and disable at will.

The benefit here is that we can easily add support for other formats and capabilities without having to touch VIE internals. Thanks to the schema.org situation, Microdata is getting more use, and so at some point we'll probably add a service for it.

Registering and accessing services is easy:

// Instantiate VIE
var vie = new VIE();

// Pass the service instance and a name you want to use for it
vie.use(new vie.RdfaService, 'rdfa');

// Call a method from the service using the name
// this one would give us the RDF subject of the
// element matched by the jQuery selector
vie.service('rdfa').getElementSubject('div.article');

An immediate benefit here is that we can have two RDFa parsing implementations. If you have problems with our own custom jQuery-based RDFa parser, then you can use the more strict rdfQuery powered implementation instead:

vie.use(new RdfaRdfQueryService, 'rdfa');

Using deferreds

For the new main VIE API we created a sort of a Domain-Specific Language for handling semantic entities. A core part of it is that now all operations utilize jQuery's Deferred objects. With them you can attach different callbacks to the results of your operation, and they will fire either when the operation completes, or immediately if the operation has already been run.

This gives a lot of flexibility in using the API, and allows us to provide the same API for services that deal with the DOM, and services that talk to external APIs like Stanbol.
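To make the "fires on completion, or immediately if already run" behavior concrete, here is a deliberately tiny stand-in for a Deferred. It is not jQuery's implementation and supports only done() and resolve(), but the callback timing matches the description above.

```javascript
// Minimal stand-in for a Deferred, for illustration only (NOT jQuery's code).
function MiniDeferred() {
  this.state = 'pending';
  this.value = undefined;
  this.callbacks = [];
}

MiniDeferred.prototype.done = function (callback) {
  if (this.state === 'resolved') {
    callback(this.value); // operation already finished: fire immediately
  } else {
    this.callbacks.push(callback); // still pending: fire on resolve
  }
  return this; // allow chaining
};

MiniDeferred.prototype.resolve = function (value) {
  if (this.state !== 'pending') {
    return this; // a deferred can only be resolved once
  }
  this.state = 'resolved';
  this.value = value;
  this.callbacks.forEach(function (callback) { callback(value); });
  this.callbacks = [];
  return this;
};

const log = [];
const operation = new MiniDeferred();
operation.done(function (entities) { log.push('early: ' + entities.length); });
operation.resolve(['entity1', 'entity2']);
operation.done(function (entities) { log.push('late: ' + entities.length); });
```

Both callbacks run: the first when the operation resolves, the second right away because the result is already there.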

For example, parsing RDFa from a given DOM element (provided with a jQuery selector) happens like this:

vie.load({
        element: 'div.article'
    }).
    from('rdfa').
    execute().
    done(function(entities) {
        console.log(entities);
    });

The chain here is: operation (in this case, load), from service (rdfa), execute operation, then when done, do callback.

With the RDFa service we register Backbone Views for the elements our entities came from, so just like with VIE 1.x, they will update automatically when you change the contents of your entities. But manual writing is also available in case you need it. Here is how it works:

vie.save({
        element: 'div.article',
        entity: someBackboneModel
    }).
    to('rdfa').
    execute().
    done(function() {
        console.log("Saved!");
    });

In addition to done, which fires if the operation succeeds, there is fail for failed operations, and then, which fires regardless of success or failure.

Accessing external services

The new VIE is not just about RDFa. In addition to working with the entities you have on a page, you can also access external repositories of semantic information, like DBpedia.

For example, to find out everything that Wikipedia knows about Salzburg, you could run:

vie.use(new vie.DBPediaService, 'dbpedia');
vie.load({
        entity: '<http://dbpedia.org/resource/Salzburg>'
    }).
    using('dbpedia').
    execute().
    done(function(entity) {
        console.log("This is what we know of Salzburg");
        console.log(entity);
    });

In browser usage these calls to external services are subject to cross-domain AJAX limitations. A way to work around those is to set up a proxy, and tell the DBpedia service to use that. To do this, pass the proxy URL to the service when instantiating:

vie.use(new vie.DBPediaService({proxyUrl: 'http://localhost:8080'}), 'dbpedia');

With this, all the factual information from Wikipedia will be at your disposal. The size of every city, the height of every mountain. Birthdates and places of birth for famous people. Your web app can do quite a bit with this information.

Finding entities from text

Apache Stanbol is a semantic engine that can extract all kinds of entities from text documents. It can be used for auto-tagging and other things.

Here is how you can use it with VIE:

vie.use(new vie.StanbolService, 'stanbol');
vie.analyze({
        element: 'div.article'
    }).
    using('stanbol').
    execute().
    done(function(entities) {
        console.log("We got the following enhancements for article content");
        console.log(entities);
    });

Stanbol can tell you what a piece of content talks about. People mentioned, places, concepts. It will also give you the language of the text.

Moving forward

The new version of VIE is still under heavy development. Most of the things work, but some details may still change. It is a good idea to start taking a look at it now, but before a beta release at least, VIE 1.0 is the recommended tool to use.

If you already use VIE 1.0 for making your content editable, VIE 2.x will give you a lot of additional power. Enhancements, data queries, namespace handling, and much more.

Thanks to Szaby and Sebastian for helping to make this happen!

Embrace and extend

Posted on 2011-09-11 23:14:02 UTC in 60° 9.834 N 24° 55.734 E Helsinki, FI to . 6 comments.

I'm getting worried about Google. They have long been one of the champions of the open web alongside Mozilla, but the rise of social networking silos and the app economy seems to have scared them. And like any scared organism, they lash out.

Many of their plans to make the web competitive against native development environments are good; there is indeed much to improve in the stack. But what I'm uneasy with is the unilateral way they go about it, preferring "big reveals" and post-facto standardization instead of the open conversation that built most of the Internet we have today. This is not the way to collaborate.

Consider some of their recent efforts:

  • SPDY, a protocol to replace HTTP, which the Web is built on. Currently only supported by Chrome, which uses it to talk to several Google services
  • Dart, their JavaScript-killer which recently surfaced through a leaked email
  • Microdata and Schema.org, which seek to replace the last ten years of semantic web development with a spec cooked up by a couple of big vendors in secret

These - together with WebSQL, NaCl, WebM and WebP - mean that Google has active efforts to replace practically every layer of the web (except HTML itself) with something of their own design.

The way all of these were introduced bears strong reminders of how Microsoft tried to embrace, extend, and extinguish the web in the late 90s. That period brought horrors like ActiveX and the awful, unkillable IE6. Though, for the sake of fairness, it also brought us XMLHttpRequest, which was the enabler of the AJAX revolution.

Google's new technologies may end up being beneficial for web developers, but they also threaten to fragment the platform. After all, as the competition in the "post-PC" space heats up, the competitors are unlikely to embrace Google's extensions of the web stack. That would be a loss to all.

Brendan Eich, the original author of JavaScript, comments on Hacker News:

So "Works best in Chrome" and even "Works only in Chrome" are new norms promulgated intentionally by Google. We see more of this fragmentation every day. As a user of Chrome and Firefox (and Safari), I find it painful to experience, never mind the political bad taste.

Ok, counter-arguments. What's wrong with playing hardball to advance the web, you say? As my blog tries to explain, the standards process requires good social relations and philosophical balance among the participating competitors.

Google's approach with Dart is thus pretty much all wrong and doomed to leave Dart in excellent yet non-standardized and non-interoperable implementation status. Dart is GBScript to NaCl/Pepper's ActiveG.

Disclaimer: I've been a long-time fan of many of Google's services, and have visited some of their offices a few times. I like the company. Which is exactly why I'm so concerned about this unilateral approach to standards. I am also involved in some standards processes through the IKS Project.

My secret agenda for PHP Content Management Systems

Posted on 2011-07-08 16:25:27 UTC in 48° 0.000 N 2° 0.000 E 10km NE of Saran, FR to . 5 comments.

As I've written before, I'm concerned about the state of the PHP ecosystem. There are lots of good applications written in the language, but there is very little code sharing between different projects, mainly because of framework incompatibilities, but also because of quite a strong NIH culture.

But there are also bright points. I've recently seen lots of exchange of ideas, and even potential code sharing between some communities including Symfony2, Midgard, TYPO3 and eZ Publish. Much of the vision in these systems is similar, as are many of the engineering principles. When everybody uses reasonable object-oriented design, namespaces, and test-driven development, it is much easier to share.

If I had to list three areas where there is most potential for collaboration, these would be:

Content model on the browser: VIE and RDFa

The age of communicating with your web audience via forms is almost over, and it is time to evolve. HTML5 includes support for the contentEditable attribute which allows rich editing interaction straight on the pages, and there are cool editors supporting that, including Aloha Editor and Mercury.

To do proper front-end editing, your CMS and the JavaScript environment have to agree on the content model. Fortunately there is a great solution for this: just annotate your content with some RDFa.

Having RDFa on a page allows the browser to understand the content. What is a collection of blog posts for instance, and what is the title of a blog post. With this, my VIE library will provide you with a nice in-browser content management API based on Backbone.js. Getting there is easy:

  1. Annotate your pages with RDFa
  2. Include vie.js in the pages
  3. Implement Backbone.sync

This allows a great deal of decoupling in the CMS stack. Suddenly the server side just has to worry about content management and page generation, and newer in-browser technologies can be used for actual content authoring.

Using RDFa annotations in your content also comes with another benefit: suddenly your pages themselves are an API into your content model. And search engines can understand and present your content better.

If you want to learn more about this, watch my talk from the Aloha Editor Dev Con.

Content persistence and retrieval: PHPCR

Historically, all CMSs have implemented persistence in their own way. There have been systems using relational databases like MySQL, systems providing their own content repository APIs like Midgard, and also some systems just using XML and the file system. This has reduced integration and code re-use possibilities between systems. In the Java world, a solution exists for this: the Java Content Repository standard (JCR).

Now JCR has been ported to PHP. PHPCR provides a standard interface for all content management needs, and has multiple back-ends available. Depending on your deployment needs, you could store your content into a relational database, into Apache Jackrabbit, or into for example MongoDB.

PHPCR is great in that you can start small: just model your content with a simple, filesystem-like tree of nodes and properties. Then when you need it, a wealth of functionality is available. Versioning? Query builders? Access control? It is all there for you to use. And, depending on the PHPCR back-end, you'll have the ability to scale up to insane amounts of content.

While I've advocated using content repositories for years now, this is the first time PHP has a true standardized, vendor-neutral API for it. And PHPCR is even being integrated into the JCR specification, eventually making it an official standard.

PHPCR discussion in Sursee, Switzerland

Adoption is also picking up. Yesterday I was in a meeting where we had developers from TYPO3, Symfony2, Doctrine and Midgard discussing issues and solutions in the content repository space. I just hope the other projects also pick this specification up.

Improving performance: AppServer-in-PHP

Of the three, this is probably the most controversial idea. Traditionally PHP is run as a scripting environment on a regular web server, like Apache or Nginx. In such a setup, when the server receives a request, it passes it on to the PHP environment. The PHP environment loads all the code needed to fulfill the request, runs it, sends the response back, and unloads everything loaded.

This is fine when PHP is being used in the way Rasmus originally intended, as a simple display layer. But nowadays most of PHP runs on a big framework, whether it is MVC or something custom like Drupal. And loading and then discarding a whole framework for each request is simply insane.

With AppServer-in-PHP (AiP), you have an environment where even a big framework can perform. AiP provides you with a full server environment for PHP, written in PHP. In this setup, your framework is loaded when the server boots up, and then each request just runs the request processing part of it.

During the San Francisco Aloha Dev Con we ported TYPO3 to run on AiP, and the performance results were staggering. A simpler request with not much I/O would run 3-4 times faster than the same code on a regular PHP setup, and an I/O-intensive request would still be twice as fast. AiP can't do much about I/O performance, but at least the cost of having a framework is greatly reduced.

In short, AppServer-in-PHP is something any developer running web services with a PHP framework should consider. It is also a great way for framework developers to see if they have request isolation problems in their design.

This post was written at the TYPO3 Developer Days 2011 event, where I was invited to discuss these ideas and also help run the RDFa part of the TYPO3 Goes Semantic workshop.

Want to do something similar to PostRank?

Posted on 2011-06-04 08:29:33 UTC in 45° 0.000 N 122° 0.000 W 65km SE of Gresham, US to . 0 comments.

So, Google acquired PostRank, the service calculating impact of blog posts and other items in social media.

If you want something similar but without the Google tie-in, then a good option is my social impact calculator which is fully free software written in PHP. It was originally written in 2007, but the newer version has been cleaned of Midgard dependencies and updated to reflect the current popular social networking services. Usage example from my earlier post:

require('calculate.php');

$url = 'http://bergie.iki.fi/blog/introducing_the_midgard_create_user_interface/';

// Get the raw count for only one source
echo com_meego_planet_calculate::hackernews($url); // 145
echo com_meego_planet_calculate::facebook($url); // 1

// Get weighted total score for all sources
echo com_meego_planet_calculate::all($url); // 130.8

Openwashing

Posted on 2011-05-05 16:31:33 UTC in 47° 0.000 N 13° 0.000 E 48km SE of Saalfelden am Steinernen Meer, AT to . 0 comments.

Somehow I had missed this term being coined:

The old "open vs. proprietary" debate is over and open won. As IT infrastructure moves to the cloud, openness is not just a priority for source code but for standards and APIs as well. Almost every vendor in the IT market now wants to position its products as "open." Vendors that don't have an open source product instead emphasize having a product that uses "open standards" or has an "open API."

"Openwashing" is a term derived from "greenwashing" to refer to dubious vendor claims about openness. Openwashing brings the old "open vs. proprietary" debate back into play - not as "which one is better" but as "which one is which?"

Especially Google seems to be doing this quite a bit. If you want to be open, work in the open. This is the only way to ensure acceptance and sustainability for your code.

The beginning of a JavaScript journey

Posted on 2011-04-10 19:51:27 UTC in 60° 0.000 N 24° 0.000 E 28km S of Lojo, FI to . 0 comments.

While PHP remains my primary programming language for various reasons, my recent projects have involved quite a bit of JavaScript development. And I have to say I like it: the event-driven paradigm is quite elegant, closures are a joy to work with, and tools like Node.js and jQuery really open up the possibilities of the language.

But there is one weakness in the JS ecosystem: as things are just now picking up, information on building larger applications in particular is quite sparse. To help solve this issue, I decided to start a new blog dedicated to what I learn in that space.

Like my earlier piece on the importance of the language, the blog is called The Universal Runtime.

If you run into interesting tutorials on JavaScript, or are doing something cool in the space, let me know!

VIE - Decoupled content management moves forward

Posted on 2011-03-09 17:54:15 UTC in 60° 0.000 N 24° 0.000 E 28km S of Lojo, FI to . 0 comments.

My posts on Decoupling Content Management, and especially the introduction to the "build a CMS, no forms allowed" approach we took with Midgard Create have generated a lot of interest.

When I first presented the approach at the recent Aloha Editor Developer Conference, many CMSs wanted to do something similar. And so we decided to strip the Midgard-specific parts out and make the tool a generic JavaScript library. As part of this work, the library was adopted by the IKS project and named VIE, or "Vienna IKS Editables". The first CMS implementations of VIE included WordPress, TYPO3 and KaraCos, with more on their way.

To get started with VIE, check these pages out:

In addition to Midgard Create, one of the first projects I'm implementing with VIE is Palsu, an interactive meeting tool powered by Node.js and Socket.io. It should explain the power of VIE outside of the traditional CMS space.

Update: VIE is now also available on npm:

npm install vie

Trying out Cloud9IDE: Developing software in your browser

Posted on 2011-03-02 14:29:23 UTC in 69° 0.000 N 20° 0.000 E 85km SE of Tromsø, NO to . 0 comments.

As I wrote in Better one file in the cloud than ten on the hard drive, when you mostly work on free software projects, the main frustration with a change of computer or a crashed hard drive is not lost files, but having to rebuild your development environment. The browser-based software development tool Cloud9IDE aims to solve that by moving the whole development environment to the "cloud".

cloud9ide-small.png

With Cloud9IDE you get an excellent code editor thanks to the ACE project (formerly known as Mozilla Bespin), Git integration, and the possibility to run your Node.js code right on the server. The editor supports multiple programming languages including PHP, Python and Ruby, but obviously JavaScript is the main target with very nice debugging tools. Cloud9IDE can either be used in the cloud, or you can run your own installation of the GPL-licensed system.

What is good:

  • Tight GitHub integration. All your GitHub projects are there, and you can register and log in using your GitHub credentials
  • Fast and nice UI, at least on modern browsers like Chromium. It is probably also very fast on Firefox 4
  • Integrated console and debugging tools
  • No need to install anything, but still the possibility to run your own instance if you want
  • Cloud9IDE is free software

What still needs love:

  • In the current hosted version there is no way to commit stuff back to Git, or to work with branches
  • No way to work with other Git repositories than GitHub
  • Running code in Node.js doesn't seem to support any installable modules
  • Some parts of the UI had glitches, like menus becoming transparent on hover

Decoupling Content Management

Posted on 2011-02-23 16:32:47 UTC in 60° 0.000 N 24° 0.000 E 28km S of Lojo, FI to . 5 comments.

Traditional content management systems are monolithic beasts. Just to make your website editable you need to accept the web framework imposed by the system, the templating engine used by the system, and the editing tools used by the system. Want to have a better user interface? Be prepared to rewrite your whole website, and for the pain of having to migrate content between different storage systems.

But none of this should be necessary. When web editing tools were more immature, it made sense for the same people to build the whole stack from database content models to web page generation and editing tools. But that was ten years ago; now we can do better.

Here is what a traditional CMS looks like:

cms-monolithic-approach.png

As you can see, the whole system is a monolithic block. The CMS provides content storage, routing, templating, editing tools, the kitchen sink. Probably you're even tied to a particular relational database for content storage. Want to use a cool new editor like Aloha, or a different templating engine, or maybe a trendy NoSQL storage back-end? You'll have to convince the whole CMS project or vendor to switch over.

A much better picture would be something like the following:

cms-decoupled-approach.png

In this scenario, the concept of Content Management is decoupled. There is a content repository that manages content models and how to store them. This could be something like JCR, PHPCR, CouchDB or Midgard2. Then there is a web framework, responsible for matching URL requests to particular content and generating corresponding web pages. This could be Drupal, Flow3, Django, CodeIgniter, Midgard MVC, or something similar. And finally there is the web editing tool. The web editing tool provides an interface for managing contents of the web pages. This includes functionalities like rich text editing, workflows and image handling.

The web editing tools have traditionally been part of the web framework, the framework serving forms and toolbars to the user as part of the generated web pages. But with modern browsers you could throw forms out of the window and just make pages editable as they are.

Common representation of content on HTML level

How would the communication between the web editing tool and the backend work, then?

cms-decoupled-communications.png

First of all, the web editing tool has to understand the contents of the page. It has to understand what parts of the page should be editable, and how they connect together. If there is a list of news for instance, the tool needs to understand it enough to enable users to add new news items. The easy way of accomplishing this is to add some semantic annotations to the HTML pages. These annotations could be handled via Microformats or HTML5 microdata, but the most power lies with RDFa.

RDFa is a way to describe the meaning of particular HTML elements using simple attributes. For example:

<div typeof="http://rdfs.org/sioc/ns#Post" about="http://example.net/blog/news_item">
    <h1 property="dcterms:title">News item title</h1>
    <div property="sioc:content">News item contents</div>
</div>

Here we get all the necessary information for making a blog entry editable:

  • typeof tells us the type of the editable object. On typical CMSs this would map to a content model or a database table
  • about gives us the identifier of a particular object. On typical CMSs this would be the object identifier or database row primary key
  • property ties a particular HTML element to a property of the content object. On a CMS this could be a database column

As a side effect, we also manage to make our page more understandable to search engines and other semantic tools. So the annotations are not just needed for UI, but also for SEO.
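The mapping from the three attributes to a content object can be sketched as below. To keep it runnable without a browser, the annotated markup is represented by a plain-object stand-in for the DOM, and extractEntity is a made-up helper, not part of any library. It also ignores nested entities, which a real parser has to handle.

```javascript
// Plain-object stand-in for the annotated DOM subtree from the example.
const element = {
  attrs: {
    'typeof': 'http://rdfs.org/sioc/ns#Post',
    about: 'http://example.net/blog/news_item'
  },
  children: [
    {attrs: {property: 'dcterms:title'}, text: 'News item title', children: []},
    {attrs: {property: 'sioc:content'}, text: 'News item contents', children: []}
  ]
};

// Hypothetical helper: 'about' becomes the identifier, 'typeof' the type,
// and every 'property' found below the element becomes an attribute.
function extractEntity(node) {
  const entity = {id: node.attrs.about, type: node.attrs['typeof'], properties: {}};
  (function walk(children) {
    children.forEach(function (child) {
      if (child.attrs.property) {
        entity.properties[child.attrs.property] = child.text;
      }
      walk(child.children);
    });
  })(node.children);
  return entity;
}

const entity = extractEntity(element);
```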

Common representation of content on JavaScript level

Having contents of a page described via RDFa makes it very easy to extract the content model into JavaScript. We can have a common utility library for doing this, but we also should have a common way of keeping track of these content objects. Enter Backbone.js:

Backbone supplies structure to JavaScript-heavy applications by providing models with key-value binding and custom events, collections with a rich API of enumerable functions, views with declarative event handling, and connects it all to your existing application over a RESTful JSON interface.

With Backbone, the content extracted from the RDFa-annotated HTML page is easily manageable via JavaScript. Consider for example:

objectInstance.set({title: 'Hello, world'});
objectInstance.save(null, {
    success: function(savedModel, response) {
        alert("Your article '" + savedModel.get('title') + "' was saved to server");
    }
});

This JS would work across all the different CMS implementations. Backbone.js provides quite a nice RESTful implementation of communicating with the server with JSON, but it can easily be overridden with a CMS-specific implementation by just implementing a new Backbone.sync method. Look for example at the localStorage Backbone.js Sync implementation.
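A rough sketch of what such an override could look like, following the Backbone.sync(method, model, options) signature. The in-memory store stands in for a CMS back-end, and the sketch assumes every model already has an id (for RDFa-based content this would come from the about attribute). It is written so that it runs without Backbone loaded.

```javascript
const store = {}; // in-memory stand-in for the CMS back-end

// Replacement sync: persists models into the store instead of issuing REST calls.
function cmsSync(method, model, options) {
  switch (method) {
    case 'create':
    case 'update':
      store[model.id] = model.toJSON();
      options.success(store[model.id]);
      break;
    case 'read':
      options.success(store[model.id]);
      break;
    case 'delete':
      delete store[model.id];
      options.success({});
      break;
  }
}
// With Backbone loaded, this would be installed with: Backbone.sync = cmsSync;

// Minimal fake model, providing only what cmsSync needs
const fakeModel = {
  id: 'http://example.net/blog/news_item',
  toJSON: function () { return {title: 'Hello, world'}; }
};
const responses = [];
cmsSync('update', fakeModel, {success: function (resp) { responses.push(resp); }});
cmsSync('read', fakeModel, {success: function (resp) { responses.push(resp); }});
```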

New possibilities for collaboration

Once the different Content Management Systems describe their content with RDFa, and provide a unified JavaScript API to it, lots of things become possible. While most systems probably want to have their own look-and-feel, still many functionalities can be shared. Consider for example:

  • Using browser's localStorage for storing drafts of content edited by user. Never lose content!
  • Collaborative editing via XMPP or WebSockets
  • Versioning and undo
  • Semantic enrichment of content using tools like Apache Stanbol

All of these would be quite hard to implement by an individual CMS project. But if we have a common JS layer available, the effort can be shared by all CMS projects implementing these ideas.
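As an example of how small such a shared piece can be, here is a sketch of draft storage. DraftStore is a hypothetical helper, and the 'draft:' key prefix is an arbitrary choice. It takes the storage object as a parameter, so in a browser you would pass window.localStorage, while the example below uses an in-memory object with the same getItem, setItem and removeItem methods.

```javascript
// Hypothetical helper for keeping unsaved edits around between page loads.
function DraftStore(storage) {
  this.storage = storage;
}
DraftStore.prototype.save = function (id, attributes) {
  this.storage.setItem('draft:' + id, JSON.stringify(attributes));
};
DraftStore.prototype.load = function (id) {
  const raw = this.storage.getItem('draft:' + id);
  return raw === null ? null : JSON.parse(raw);
};
DraftStore.prototype.discard = function (id) {
  this.storage.removeItem('draft:' + id);
};

// In-memory object with the same three methods as window.localStorage
const memoryStorage = {
  items: {},
  setItem: function (key, value) { this.items[key] = String(value); },
  getItem: function (key) {
    return Object.prototype.hasOwnProperty.call(this.items, key) ? this.items[key] : null;
  },
  removeItem: function (key) { delete this.items[key]; }
};

const drafts = new DraftStore(memoryStorage);
drafts.save('http://example.net/blog/news_item', {title: 'Unsaved title'});
const restored = drafts.load('http://example.net/blog/news_item');
```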

There have been prior efforts at doing something similar. In the early 2000s, OSCOM made the Twingle tool that was able to edit and save content with multiple CMSs. Then there was the Atom Publishing Protocol and the Neutrol Protocol efforts, and also CMIS. But all of these mandated that the systems would have to implement some particular server-side protocol. The advantage of the approach promoted here is that the only server-side change needed is adding RDFa annotations to HTML templates, and then the rest happens on JavaScript level.

The new CMS interface we've built for Midgard2 already uses these concepts. Now here in the Aloha Editor Developer conference we're talking with Drupal and TYPO3 developers about rolling out the same ideas in their systems. Other systems and projects are also more than welcome to participate.

Update: The work is underway to generalize the RDFa-Backbone.js bridge I originally wrote for Midgard Create. You can find it on GitHub. We're currently experimenting with it on both Midgard2 and TYPO3.