Motorcycle Adventures and Free Software
Henri Bergius
Biker, free software consultant, neogeographer

There is a total of 751 posts.

Weblog: category "midgard"

What is a content repository

Posted on 2009-11-19 10:02:03 UTC in 60° 10.272 N 24° 55.956 E Helsinki, FI.

A joint post by Henri Bergius and Michael Marth, cross-posted here and here.

Web content repositories are more than just plain old relational databases. The requirements that arise when managing web content have led to a class of content repository implementations that are comparable at a conceptual level. At the IKS community workshop in Rome we got together to compare JCR (in its Jackrabbit implementation) with Midgard's content repository. Although the terminology sometimes differs, many of the underlying ideas are identical, so we came up with a list of common traits and features of our content repositories. For comparison, the list also includes Apache CouchDB.

So why use a content repository for your application instead of the familiar old RDBMS? Repositories provide several advantages:

  • Common rules for data access mean that multiple applications can work with the same content without breaking the consistency of the data

  • Signals about changes let applications know when another application using the repository modifies something, enabling collaborative data management between apps

  • Objects instead of SQL mean that developers can deal with data using APIs more compatible with the rest of their desktop programming environment, and without having to fear issues like SQL injection

  • Data model is scriptable when you use a content repository, meaning that users can easily write Python or PHP scripts to perform batch operations on their data without having to learn your storage format

  • Synchronization and sharing features can be implemented at the content repository level, meaning that you gain these features without having to worry about them
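
To make the "signals about changes" point concrete, here is a minimal, in-process Python sketch. ToyRepository and its connect/create/update/delete methods are hypothetical names chosen for illustration, not the Midgard2, JCR or CouchDB API. A real repository would also deliver such notifications to other processes, which Midgard2 does over D-Bus.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical toy repository: an in-process stand-in for change signals,
# not the Midgard2, JCR or CouchDB API.
Handler = Callable[[str, dict], None]


@dataclass
class ToyRepository:
    objects: Dict[str, dict] = field(default_factory=dict)
    handlers: List[Handler] = field(default_factory=list)

    def connect(self, handler: Handler) -> None:
        """Register a callback that is called as handler(event, obj) on every change."""
        self.handlers.append(handler)

    def _emit(self, event: str, obj: dict) -> None:
        # Notify every registered listener, in registration order.
        for handler in self.handlers:
            handler(event, obj)

    def create(self, **props) -> dict:
        obj = {"guid": uuid.uuid4().hex, **props}
        self.objects[obj["guid"]] = obj
        self._emit("created", obj)
        return obj

    def update(self, guid: str, **props) -> dict:
        obj = self.objects[guid]
        obj.update(props)
        self._emit("updated", obj)
        return obj

    def delete(self, guid: str) -> None:
        obj = self.objects.pop(guid)
        self._emit("deleted", obj)


# Usage: a second "application" (here just a list) follows changes made by the first.
repo = ToyRepository()
log = []
repo.connect(lambda event, obj: log.append((event, obj["title"])))
note = repo.create(title="Draft")
repo.update(note["guid"], title="Final")
print(log)  # [('created', 'Draft'), ('updated', 'Final')]
```

The listener never polls or re-reads the data; it simply reacts when the repository announces a change, which is what lets several applications share the same content safely.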

Feature comparison: JCR / Jackrabbit, Midgard and CouchDB

Content type system
  • JCR / Jackrabbit: Structured or unstructured nodes are supported and can be mixed at will in a content tree.
  • Midgard: Content types are defined in MgdSchema types. All content must be stored in an MgdSchema type, but types can be extended at the content-instance level using "parameter" triplets.
  • CouchDB: Type-free.

Type hierarchy
  • JCR / Jackrabbit: Structured node types support inheritance of types; additional cross-cutting aspects can be added with "mixins". Node types can define which node types are allowed as child nodes in the content hierarchy.
  • Midgard: MgdSchemas allow inheritance, and an extended type can be instantiated either as the extended type or as the base type.
  • CouchDB: Type-free.

IDs
  • JCR / Jackrabbit: Nodes with the "referenceable" mixin have a GUID. In practice the node path is often used to reference nodes.
  • Midgard: Every object has a GUID used for referencing. Objects located in trees that have a "name" property can also be referred to by path.
  • CouchDB: All objects can be accessed via a UUID.

References
  • JCR / Jackrabbit: Nodes can reference each other with a hard link (a special property type) or a soft link (by referring to the node path).
  • Midgard: MgdSchema types can have properties linking to other objects of the same or a different type. A link of the "parentfield" type places an MgdSchema type in a tree.
  • CouchDB: No built-in reference support.

Content hierarchy
  • JCR / Jackrabbit: All content is hierarchical, in a tree.
  • Midgard: Content can exist in a tree or independently of it, depending on the MgdSchema type definition.
  • CouchDB: Flat structure.

Interesting property types
  • JCR / Jackrabbit: Multi-valued properties (like an array) and binary properties (e.g. for files); nodes have an implicit sort order.
  • Midgard: Binary properties are stored using the Midgard attachment system.
  • CouchDB: Support for binary properties.

Transactions
  • JCR / Jackrabbit: Multiple content modifications are written in transactions.
  • Midgard: Transactions can be used optionally.

Events
  • JCR / Jackrabbit: JCR observers can register for content changes by path, node type and/or CRUD operation, and receive notifications of changes as serialized nodes.
  • Midgard: All transactions cause both in-process GObject signals and interprocess D-Bus signals.
  • CouchDB: Support for one external event notification shell script.

Workspaces
  • JCR / Jackrabbit: Workspaces provide separate root trees.
  • Midgard: No workspace support in Midgard 9.03; coming in the next version.
  • CouchDB: Multiple databases within one CouchDB instance.

Import and export
  • JCR / Jackrabbit: Nodes, parts of the repository or the whole repository can be imported or exported as XML, in two formats: docview for a human-friendly representation, and sysview including all technical aspects.
  • Midgard: Objects can be exported and imported in XML format. There are tools supporting replication via HTTP, tarballs, XMPP and the CouchDB replication protocol.
  • CouchDB: JSON serialization is the standard way of accessing the repository. The CouchDB replication protocol supports full synchronization between instances.

Versioning
  • JCR / Jackrabbit: Check-in/checkout model to create new versions of nodes; complete subtrees can optionally be versioned, and branching of versions is supported.
  • Midgard: No versioning.
  • CouchDB: All versions of content are stored and accessible separately; no branching.

Locking
  • JCR / Jackrabbit: Nodes can be locked and unlocked.
  • Midgard: Objects can be locked and unlocked.

Object mapping
  • JCR / Jackrabbit: Not in the standard, but implemented in Jackrabbit. Rarely used in practice.
  • Midgard: Object mapping is the standard way of accessing the repository.
  • CouchDB: All content is accessed via JSON objects.

Queries
  • JCR / Jackrabbit: SQL or XPath in JCR 1; JCR 2 also adds a QueryBuilder.
  • Midgard: Query Builder.
  • CouchDB: JavaScript map/reduce.

Access control
  • JCR / Jackrabbit: Done at the repository level, i.e. all access control is independent of the application. Jackrabbit has pluggable authentication/authorization handlers.
  • Midgard: No access control in the Midgard repository; it is usually implemented at the application level. Midgard provides a user authentication API.
  • CouchDB: No access control.

Persistence
  • JCR / Jackrabbit: In Jackrabbit, different persistence managers can be plugged in (RDBMS, tar file, ...).
  • Midgard: libgda allows storage in different RDBMSs like MySQL, SQLite and Postgres.
  • CouchDB: CouchDB has its own storage.

Architecture
  • JCR / Jackrabbit: Jackrabbit is available as a library (jar), JEE resource, OSGi bundle or standalone server.
  • Midgard: Library.
  • CouchDB: Erlang-based daemon.

APIs
  • JCR / Jackrabbit: The standard is Java-based, with PHP coming up. Jackrabbit also offers WebDAV and an HTTP-based API.
  • Midgard: C, Objective-C, PHP, Python.
  • CouchDB: HTTP+JSON.

Full-text search
  • JCR / Jackrabbit: Included in the repository; Jackrabbit bundles Lucene.
  • Midgard: No (SOLR is used at the application level).
  • CouchDB: Plugin for using Lucene, not installed by default.

Standard metadata
  • JCR / Jackrabbit: All nodes have access rights and jcr:primaryType and jcr:mixinTypes properties. JCR 2.0 standardizes a set of optional metadata properties.
  • Midgard: All objects have a set of standard metadata including creator, revisor, timestamps etc.
  • CouchDB: No standard properties.
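
The IDs and References rows describe two ways of pointing at content: a stable GUID, and a path built from the "name" properties of objects in a tree. The following Python sketch shows how the two relate, assuming a simple parent link similar in spirit to Midgard's "parentfield". Node, Tree and their methods are hypothetical illustrations, not any repository's real API.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical sketch of GUID- and path-based referencing; Node and Tree
# are illustrative names, not a real repository API.


@dataclass
class Node:
    name: str
    parent: Optional[str] = None  # GUID of the parent node (a "parentfield"-style link)
    guid: str = field(default_factory=lambda: uuid.uuid4().hex)


class Tree:
    def __init__(self) -> None:
        self.by_guid: Dict[str, Node] = {}

    def add(self, name: str, parent: Optional[Node] = None) -> Node:
        node = Node(name, parent.guid if parent else None)
        self.by_guid[node.guid] = node
        return node

    def get(self, guid: str) -> Node:
        """Stable lookup: the GUID never changes, even if the node is renamed or moved."""
        return self.by_guid[guid]

    def path_of(self, node: Node) -> str:
        # Walk up the parent links and join the names from the root down.
        parts = []
        current: Optional[Node] = node
        while current is not None:
            parts.append(current.name)
            current = self.by_guid[current.parent] if current.parent else None
        return "/" + "/".join(reversed(parts))

    def resolve(self, path: str) -> Node:
        """Convenient lookup by path; raises KeyError if no node matches."""
        parent_guid: Optional[str] = None
        node: Optional[Node] = None
        for name in [part for part in path.split("/") if part]:
            node = next(
                (n for n in self.by_guid.values() if n.parent == parent_guid and n.name == name),
                None,
            )
            if node is None:
                raise KeyError(path)
            parent_guid = node.guid
        if node is None:
            raise KeyError(path)
        return node


tree = Tree()
site = tree.add("site")
news = tree.add("news", site)
post = tree.add("mjolnir", news)
print(tree.path_of(post))  # /site/news/mjolnir
print(tree.resolve("/site/news/mjolnir") is post)  # True
```

Renaming or moving a node changes its path but not its GUID, which is why GUIDs are the safer choice for long-lived references while paths remain handy for humans and URLs.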

Raise the hammer! Midgard2 Mjolnir goes live

Posted on 2009-11-18 13:21:07 UTC in 60° 10.272 N 24° 55.956 E Helsinki, FI.

Mjolnir, the new major release of the Midgard2 content repository, is now out. Named after the hammer of Thor, this release finally provides a real content repository that can be used by both desktop and web application developers.

In addition to being a GObject-powered content repository for PHP, Python and Objective-C, the Mjolnir release provides several significant goodies on top of the older Midgard2 series:

We've been testing the Qaiku microblogging service on Mjolnir. Exactly the same PHP code that we used with Midgard 8.09 LTS performs 20-60% better when running on Mjolnir.

Get Midgard2 9.09 Mjolnir while it is hot! Builds for various Linux distributions are already starting to hit OBS repositories...

In defence of URLs and the Open Web

Posted on 2009-11-17 19:19:36 UTC in 60° 10.272 N 24° 55.956 E Helsinki, FI.

An increasing number of web services and applications are emphasising search terms or pre-selected websites instead of allowing users to enter any address they choose. This is worrying: while searches may be more user-friendly, URLs are the heart of an open web where anybody can publish without obscure business dealings or oppressive app store policies.

There are many examples of this happening, from Facebook's framing of the web to netbook systems like JoliCloud that have no address bar. Certainly many companies are looking at Mozilla's search engine revenue and Apple's app store model and want to emulate them, moving the web into silos under their own control. At the same time, we are thinking about Linked Data and open, interoperable web standards.

The web is indeed at a new crossroads.

Chris Messina predicts the death of URLs:

a future without URLs and without the infinite organicity of the web frightens me. It’s not that I know what we’ll lose by removing this artifact of one of the most generative periods in history — and that’s exactly the point! The URL and the ability for anyone to mint a new one and then propagate it is what makes the web so resilient, so empowering, and so interesting! That I don’t need to ask anyone permission to create a new website or webpage is a kind of ideological freedom that few generations in history have known!

Tim O'Reilly presents a call to arms:

It could be that everyone will figure out how to play nicely with each other, and we'll see a continuation of the interoperable web model we've enjoyed for the past two decades. But I'm betting that things are going to get ugly. We're heading into a war for control of the web. And in the end, it's more than that, it's a war against the web as an interoperable platform. Instead, we're facing the prospect of Facebook as the platform, Apple as the platform, Google as the platform, Amazon as the platform, where big companies slug it out until one is king of the hill.

And it's time for developers to take a stand. If you don't want a repeat of the PC era, place your bets now on open systems. Don't wait till it's too late.

 

Midgard Weekly Summaries are back

Posted on 2009-10-02 12:39:48 UTC in 60° 10.272 N 24° 55.956 E Helsinki, FI.

Midgard is a very active free software project, and it is quite difficult to keep up with all the changes, decisions and discussions happening around it. Therefore I decided to bring the Midgard Weekly Summaries back.

MWS has run before, with 66 issues released between 1999 and 2002 and 8 more in 2007. This time we follow the idea of a Collaborative MWS.

Notices about newly published summaries will be sent to the Midgard user mailing list, the Qaiku #midgard channel and Twitter (@MidgardProject), and are also available via RSS. Enjoy!

Technorati Tags:

Fall conference schedule

Posted on 2009-09-27 15:58:47 UTC in 60° 10.272 N 24° 55.956 E Helsinki, FI.

After a brief summer motorcycling break, the fall is shaping up to be quite full of conferences. Here is the current list:

Explaining signals at Gran Canaria Desktop Summit

I'm looking forward to all the interesting discussions and ideas that will surely come up at these events. If you will be at one of them, make sure to look me up so we can chat. The events will also be covered in my Qaiku stream.

Content management starts with the repository

Posted on 2009-09-07 14:16:47 UTC in 60° 10.272 N 24° 55.956 E Helsinki, FI.

Gadgetopia makes an argument for building your own CMS:

"See — the problem with a full scale Content Management System is that it has too many opinions. Those opinions were though of by somebody other than you and the needs of your organization. The more developed a content management system (or any piece of software, really) the more “opinions” it has. And the more “opinions” it has, the more likely one of them is going to really tick you off."

I can relate to this. We work with one system in particular that makes an astonishing array of presumptions about how you’re going to use it, and if you try to step outside those presumptions, demons fly out of the abyss and try to suck your eyeballs out.

This goes back to a previous discussion we had about Content Management as an API. In that post, we had a great experience with hand-rolling a CMS...

The term they are looking for is content repository: a service that provides common APIs for content storage, retrieval, signaling and so forth. With Midgard we're following this approach, providing content retrieval and web functionality APIs first, and then building some reusable user interfaces on top of them.

In addition to Midgard, content repositories worth looking at include Apache CouchDB and Jackrabbit. All of them let you stop worrying about storage and retrieval methods and focus on the actual end-user functionality, while keeping the whole system accessible and scriptable for integration purposes.

How Midgard and Midgard2 differ

Posted on 2009-09-04 07:57:34 UTC in 60° 10.272 N 24° 55.956 E Helsinki, FI.

I had to update the architecture diagrams, so I thought I would publish them here to show the difference. Midgard was a CMS framework for PHP:

Midgard 8.09 architecture

Midgard2 is a more universal content repository where CMS is just one application:

Midgard2 9.09 architecture

Please note that more choice in databases and web servers is not the only goodie provided by Midgard2. You also get things like a completely rewritten MVC framework, database views, transactions and native datetime objects. And all of this for multiple programming languages, not just PHP.
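
Transactions mean that a group of changes is applied completely or not at all. Midgard2's own transaction API is not shown here; as a hedged illustration of the idea only, this sketch uses Python's built-in sqlite3 module, where using the connection as a context manager commits on success and rolls back when an exception escapes. The page table and the add_pages function are hypothetical.

```python
import sqlite3

# Illustration of transactional writes with Python's built-in sqlite3;
# Midgard2's own transaction API is not shown here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (name TEXT PRIMARY KEY, parent TEXT)")


def add_pages(conn: sqlite3.Connection, pages) -> None:
    """Insert several pages atomically: either all rows are stored or none are."""
    with conn:  # commits if the block succeeds, rolls back if an exception escapes
        for name, parent in pages:
            conn.execute("INSERT INTO page (name, parent) VALUES (?, ?)", (name, parent))


add_pages(conn, [("news", None), ("mjolnir", "news")])
try:
    # "news" already exists, so this batch fails and "events" is rolled back too.
    add_pages(conn, [("events", None), ("news", None)])
except sqlite3.IntegrityError:
    pass

print([row[0] for row in conn.execute("SELECT name FROM page ORDER BY name")])
# ['mjolnir', 'news']
```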

Content repository talk in FrOSCon

Posted on 2009-08-07 19:57:56 UTC in 60° 9.792 N 24° 55.662 E Helsinki, FI.

Content repositories can be useful for your application. In the PHP track of FrOSCon on Aug 22nd there will be a talk about this: Midgard2: Content repository for your PHP application

Content repositories allow you to separate the actual front end of your application from background processing tools. More than just the databases underneath them, they impose common rules for data access and keep multiple applications up to date on data changes through signaling. Midgard2 provides a flexible content repository that avoids the restrictions of the traditional ORM approach, and it is available not only to your PHP web application but also to any Python, Objective-C and C# tools you use.

This enables you to split applications into smaller, easily maintainable and scalable pieces that can be run on different systems and platforms as needed. In addition to the web, the Midgard2 library can be used for desktop and mobile application development, building software that synchronizes with web services. It is built entirely on top of the desktop (GNOME) software stack. Being highly modular and having very few dependencies, it scales from a note-taking application to a full-blown CMS. Combined with advanced replication capabilities, it allows you to synchronize data between offline and online instances of your service.
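
Synchronizing offline and online instances needs some rule for deciding which copy of an object wins. One simple rule, sketched here in Python, is "last writer wins": compare revision timestamps per GUID and copy the newer record to the other side. This is an assumption for illustration, not a description of how Midgard2's replication actually works; Record and sync are hypothetical names.

```python
import copy
from dataclasses import dataclass
from typing import Dict

# Hypothetical last-writer-wins synchronization between an offline and an
# online store; this is not Midgard2's actual replication format or protocol.


@dataclass
class Record:
    guid: str
    revised: int  # revision timestamp, e.g. seconds since the epoch
    data: dict


def sync(local: Dict[str, Record], remote: Dict[str, Record]) -> None:
    """Make both stores hold the newest revision of every object, keyed by GUID."""
    for guid in set(local) | set(remote):
        mine, theirs = local.get(guid), remote.get(guid)
        if mine is None or (theirs is not None and theirs.revised > mine.revised):
            local[guid] = copy.deepcopy(theirs)
        elif theirs is None or mine.revised > theirs.revised:
            remote[guid] = copy.deepcopy(mine)
        # Equal timestamps are assumed to mean identical content and are left alone.


laptop = {"a1": Record("a1", 100, {"title": "Offline edit"})}
server = {
    "a1": Record("a1", 90, {"title": "Old title"}),
    "b2": Record("b2", 95, {"title": "Posted on the web"}),
}
sync(laptop, server)
print(server["a1"].data["title"], sorted(laptop))  # Offline edit ['a1', 'b2']
```

Deletions and conflicting edits with equal timestamps are deliberately ignored here; a real replication tool has to handle both.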

Contrary to what the program says, the talk will be given by Arttu Manninen this time, as I will be off motorcycling somewhere around Asia Minor. Arttu? Yep, this guy:

Tank surfing

Will content repositories kill the file?

Posted on 2009-07-30 17:10:24 UTC in 60° 10.524 N 24° 55.146 E Helsinki, FI.

MDK laments the demise of the simple file in the onslaught of storage services:

Sure, the applications still give you a way to share things and take them out of the storage. You can export a contact out of your address book as a vcard file. But the role of The File here is slowly being reduced to a role of an intermediate storage medium. The business card is temporarily put in the .vcf file before it gets injected into somebody else’s database (another address book?).

As more and more applications operate on databases, the computer is becoming a monolithic black-box that “has things”. How exactly (and where) the data is stored is becoming less clear. The application and the interface becomes united with the user data. It becomes one.

This echoes the sentiments of Alex Payne, who warned against what he calls Everything Buckets:

Computers work best with structured data. Everything Buckets discourage the use of structured data by providing a convenient place to commingle “structureless” data like RTF and PDF documents. Rather than forcing the user to figure out the rhyme and reason of their data (for example, by putting receipts in a financial management application and addresses in an address book), Everything Buckets cry: “throw it all in here! Search it! Maybe I’ll corrupt my proprietary database, but maybe I won’t and you’ll have the joy of sifting through a mire of RTF documents. Doesn’t that sound great?”

And yes, I agree that obscure application-specific databases are not really better than obscure proprietary file formats.

This is exactly why I've been talking about content repositories: services like Midgard2 and CouchDB that can not only provide superior content storage and organization, but also do it in a way that multiple applications can share. You can easily write your own scripts to perform batch operations on the data, and receive D-Bus notifications when something changes.

And good repositories also provide easy synchronization tools so you can have your data available on all of your computers, and even on the web. If they can also do peer-to-peer sharing, we're close to achieving the fully free cloud.

Why you should use a content repository for your application

Posted on 2009-07-08 11:37:50 UTC in 28° 7.752 N 15° 27.078 W 5 km NW of Las Palmas de Gran Canaria, ES.

Yesterday I gave my talk "Midgard2: Content repository for desktop and the web" at GCDS. The slides are available on SlideShare. The main idea was that any application that deals with structured data could benefit from using a content repository like Midgard2 or CouchDB.

So, what is a content repository? It is a service that sits between an application and a data store. It provides several advantages:

  • Common rules for data access mean that multiple applications can work with the same content without breaking the consistency of the data
  • Signals about changes let applications know when another application using the repository modifies something, enabling collaborative data management between apps
  • Objects instead of SQL mean that developers can deal with data using APIs more compatible with the rest of their desktop programming environment, and without having to fear issues like SQL injection
  • Data model is scriptable when you use a content repository, meaning that users can easily write Python or PHP scripts to perform batch operations on their data without having to learn your storage format
  • Synchronization and sharing features can be implemented at the content repository level, meaning that you gain these features without having to worry about them

Midgard2 is a content repository library built on top of GLib, libgda and D-Bus, which makes it fit the general free desktop infrastructure very well. You can use it in any application written in C, Objective-C, Python or PHP, and soon Mono. Learn more from the slides!
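
The "objects instead of SQL" point can be illustrated with a minimal Python sketch that uses the standard sqlite3 module as storage. Article and ArticleStore are hypothetical names, and this hand-rolled layer only stands in for what a content repository provides; it is not the Midgard2 API. Application code deals with Article objects, and because values are only ever passed as query parameters, a hostile string is treated as data rather than as SQL.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical minimal object layer over sqlite3; Article and ArticleStore are
# illustrative names, not the Midgard2 API.


@dataclass
class Article:
    title: str
    author: str
    id: Optional[int] = None


class ArticleStore:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS article (id INTEGER PRIMARY KEY, title TEXT, author TEXT)"
        )

    def save(self, article: Article) -> Article:
        # Values travel as query parameters and are never pasted into the SQL text.
        cur = self.conn.execute(
            "INSERT INTO article (title, author) VALUES (?, ?)", (article.title, article.author)
        )
        article.id = cur.lastrowid
        return article

    def by_author(self, author: str) -> List[Article]:
        rows = self.conn.execute(
            "SELECT id, title, author FROM article WHERE author = ?", (author,)
        )
        return [Article(title, writer, row_id) for row_id, title, writer in rows]


store = ArticleStore(sqlite3.connect(":memory:"))
store.save(Article("Mjolnir released", "henri"))
hostile = "x' OR '1'='1"
print(len(store.by_author("henri")), len(store.by_author(hostile)))  # 1 0
```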
