
Three Ring Circus: Making Content, Code, and Configuration Work Together

August 24, 2011

Since I started working at Magnolia, I’ve been asked one question over and over, more than any other:

“Can I please have some money?” This is because I’m a father.

The next most common question I’ve been asked, however, is this:

“I have content authors writing content on my production system. I have developers creating Java and Freemarker code on their own machines. I have system administrators setting up and tweaking configurations on our test server. How can I possibly get all the changes coming from all different directions where they need to be?”

This is a challenging problem, but — unlike convincing a teenager to wash the dishes without grumbling — not an insurmountable one. There are several approaches one can take; here’s one that we eventually settled on at Texas State University during my tenure there, and which worked quite well for us.

To start with, we need to decide what the canonical location for each of these sorts of information should be:

  1. Content: Since the content that site editors enter is the most up-to-date at any time, the production Author or Public instances of Magnolia are the places we’ll look for this.
  2. Code: Custom functionality in Magnolia is wrapped up into Modules, which are then compiled by Maven into JARs. Because these are built out of standard Java code, they’re usually managed with standard source control tools like Subversion, Git, or Mercurial, and the repository is the place to go to find the “real” version of the code.
  3. Configuration: Each system running Magnolia, be it a production instance, a testing instance, or a development instance, will likely need some different configuration tweaks. (They might use different SMTP configurations, for example, or a different persistence manager.) In this case, where you store these data is less clear-cut, but since it’s not necessarily shared among instances, it’s reasonable to store it on each box in some place outside of the web app (so that it’s not erased when Magnolia is redeployed).

Now that we’ve identified the various bits that we need to pull together and where they live, let’s look at how we might assemble things on the various instances of Magnolia in a fairly typical deployment: development instances, a shared test instance, and production.

First off, let’s discuss the development instances. These are going to be the most changeable, as developers configure and reconfigure things to handle the need of the moment. (You know how developers are!) Code in this case comes from Subversion (or whatever SCM system your team uses), combined with whatever changes the developer is working on. Content can either be the sample content that’s distributed with Magnolia, or can be bootstrapped or imported from content exported from a production instance. Configuration may be left with default values, or can be overridden by the developer in a variety of ways.

Helpful hint: When developing a module, sometimes you may want to have the module’s configuration rebootstrapped without having to delete all of the content and start the instance from scratch. You can do this by deleting the module’s configuration node at Configuration -> modules -> [module name] and then bouncing Magnolia. The system will see the module’s JAR, but will think it has never been installed, and will dutifully go through the bootstrap process again.

Next, the test instance. Here we want to have the latest code the developers have committed combined with up-to-date content from content authors. To achieve this, our Continuous Integration system watches the source code repository, and whenever new versions of our module code are checked in, it rebuilds the WAR, deletes the content repositories on the test system, and deploys the new WAR there. (You do have a Continuous Integration system, don’t you? If not, set one up! It’s one of the best things you can do for your team’s development workflow. We like Hudson, but there are lots of great ones out there.)

But where does the content come from? We’ve set up an automatic process to fetch an XML backup of the content in our production instance each night and to store it in a known location. (This can be as simple as having cron call wget each night with the path to Magnolia’s export servlet.) By setting the magnolia.bootstrap.dir property in magnolia.properties to the directory where that content export is located, our instance will bootstrap this exported content along with all of its other initial setup tasks. We recommend a path outside of the web application directory in this case, so that any content you place here will survive redeployments.
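As a concrete sketch of that nightly fetch (the schedule and paths are examples, and the export-servlet URL is a placeholder; check your Magnolia version’s documentation for the servlet’s real path and parameters):

```
# crontab entry: every night at 02:00, pull an XML export of the website
# workspace from production into the bootstrap directory
0 2 * * *  wget -q -O /var/magnolia/bootstrap/website.xml "https://production.example.com/magnoliaAuthor/<export-servlet-path>"
```

Then point the test instance at that directory:

```
# magnolia.properties on the test instance
magnolia.bootstrap.dir=/var/magnolia/bootstrap
```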

Finally, let’s consider configuration. For things like SMTP that are set up in the JCR, it’s pretty simple to export those configurations to XML and put those in the bootstrap directory along with the content. Note, however, that this works best if you get down to the individual items that you want to have imported. For example, if you have a particular set of users you want bootstrapped, it’s best to export each of those users individually. And if you want particular settings to be used everywhere, export those settings — not the whole Config tree, since there are other settings in that tree that won’t be the same across all your instances.
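To make that concrete, here is what such a bootstrap directory might end up looking like. The file names follow Magnolia’s usual workspace-dot-path bootstrap naming convention as best I recall it; compare against the bootstrap files shipped inside any standard module to confirm the exact pattern for your version:

```
bootstrap/
  config.server.smtp.xml       # just the SMTP node from the config workspace
  users.admin.jsmith.xml       # one individually exported user
  website.example.com.xml      # nightly content export from production
```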

But what about items — like which persistence manager to use — that have to be specified in magnolia.properties? For that, you can actually specify a different properties file for each of your environments, and have it applied automatically. (It’s like magic!)

The trick is to use a properties file at a path that corresponds to one of these naming conventions, from most specific to least specific:

  1. WEB-INF/config/${servername}/${webapp}/magnolia.properties
  2. WEB-INF/config/${servername}/magnolia.properties
  3. WEB-INF/config/${webapp}/magnolia.properties
  4. WEB-INF/config/default/magnolia.properties
  5. WEB-INF/config/magnolia.properties

When it starts up, Magnolia marches down this list looking for properties files that match. ${servername} is the host name of the machine that Magnolia is running on, while ${webapp} reflects the application context in which it is deployed. Thus, if you’re running an instance called “magnoliaAuthor” on a box named “production”, Magnolia will check the following paths:

  1. WEB-INF/config/production/magnoliaAuthor/magnolia.properties
  2. WEB-INF/config/production/magnolia.properties
  3. WEB-INF/config/magnoliaAuthor/magnolia.properties
  4. WEB-INF/config/default/magnolia.properties
  5. WEB-INF/config/magnolia.properties

Even better, properties defined in the more-specific paths will override properties defined in the less-specific paths. Thus, you could define your persistence manager as Derby in WEB-INF/config/default/magnolia.properties and as Oracle in WEB-INF/config/production/magnolia.properties. The result would be that Derby would be used everywhere except in production, where Oracle would be used instead.
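For instance, reusing the magnolia.bootstrap.dir property from earlier (the values here are examples):

```
# WEB-INF/config/default/magnolia.properties -- the fallback for every instance
magnolia.bootstrap.dir=WEB-INF/bootstrap

# WEB-INF/config/production/magnolia.properties -- wins on the host named "production"
magnolia.bootstrap.dir=/var/magnolia/bootstrap
```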

(For more information on this mechanism, see the article on Using a Single WAR File with Multiple Configurations on our Documentation site.)

Finally, let’s take a look at our production system. This is the canonical location of our content, so we’re always up to date here. And since this is a production system, and you’re an extremely clever person, you have your content in an RDBMS, which ensures that the content is safely retained when you deploy new versions of Magnolia or your module.

The code for your module should also make its way to the production system through your continuous integration environment. (This ensures that the version you’re testing is the same version that gets deployed — always a good thing.) You can entirely automate the process if you’re a trusting sort, or simply have the CI environment dump your module’s JAR file somewhere for you to copy over yourself if you’re of a more skeptical nature.

“But I’ve added a new paragraph in my module! How does it get set up?” Well, I’m glad you asked, Hypothetical Reader! Back around version 3.5, Magnolia introduced the Version Handler, a mechanism for automating version upgrade tasks. This has been a tremendous boon for deployment, as you might imagine.

Before, we maintained a manual list of things that needed to be changed when a module was updated, and had to go in and perform those tasks manually, usually at three in the morning, under time pressure, and while possibly hallucinating due to lack of sleep. (We saw a porcupine in the parking garage after one particularly grueling deployment. Surprisingly, it turned out to be real.) With this reproducible mechanism, we can do dry runs of our upgrades first thing in the morning (around 11:00am — we’re programmers, after all) while we’re still fresh and have a high degree of confidence that we’ll have flushed out any problems before we try to upgrade our production system.

The best practice here is to tag your code in your SCM each time you deploy to production. Then, when you want to try out your new code, you can build the previous deployed version, bootstrap it, and then try running your upgrade against that instance on a machine where it won’t get you fired when things go horribly wrong.
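The tagging step itself is a one-liner at deploy time. Here’s a sketch (the tag format and repository layout are our own conventions, nothing Magnolia-specific; we demonstrate in a throwaway repository, but in practice you’d run the tag commands in your module’s working copy right after deploying):

```shell
set -e

# Throwaway repository standing in for your module's working copy.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "module code as deployed"

# Tag whatever revision just went to production.
TAG="deploy-$(date +%Y-%m-%d)"          # e.g. deploy-2011-08-24
git tag -a "$TAG" -m "Deployed to production"
# git push origin "$TAG"                # in a real repo, publish the tag

git tag -l "$TAG"                       # prints the tag name back
```

With Subversion, the equivalent is an `svn copy` of trunk into a tags/ directory. Either way, rebuilding last month’s production deployment for an upgrade dry run is then just a checkout of the tag.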

Lastly, configuration which is stored in the JCR will persist across deployments, just as content does. Anything that needs to be set in magnolia.properties can be set up using the same per-host, per-webapp configuration mechanism that we discussed for the test environment.

And that’s it! A soup-to-nuts setup that lets everyone do their jobs, and gets your information where it needs to go. Of course, you may want to tweak this at times for various reasons — for example, if you’re developing a new feature and want to add content, you might make your test server the canonical location for content temporarily — but this should give you a good idea of where to start getting all of these pieces working together.

If you have any questions about this configuration, don’t hesitate to leave me a note here, drop me an email, or send a carrier pigeon. (Warn me first, though, so I can lock up the cats.)