Version control in Drupal 6 using the Features module

Warning: in this article I assume that you understand the concepts of modules and hooks in Drupal, and that you also know the Views module, which is used as an example. I also assume that you are at least somewhat familiar with the general principles of software version control and are looking to apply them to developing Drupal 6 sites. It is also advisable that you have a brief look at the monkey-see-monkey-do screencast about the features module, which currently substitutes for comprehensible documentation.

In 1972 programmers realized that efficient collaboration within software projects is difficult without tools that efficiently support exchanging and comparing changes made to source code. In that year the first source code control system was constructed. With the later release of CVS, which remains popular to this day, the problem has pretty much been solved in a satisfactory way (though Mr. Linus "Git" Torvalds and others like to disagree).

All traditional version control tools exploit a simple observation about the nature of source code evolution: most of the time, in most well-structured programs, meaningful changes are represented as alterations of consecutive lines of source code. Changes, and their effects, tend to be local. Most programmers also naturally work and think in terms of changing lines of source code within files. Thus, all you need to satisfy most programmers is a system which quickly and conveniently presents which lines have been changed by whom, given two or more points in time. With that ability, you can also package sets of changes into compact files called patches and then apply them to an original version of a system to reliably reproduce someone else's work or store away to track individual contributions (and blame for bugs).
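To make the line-oriented view of changes concrete, here is a minimal sketch using Python's standard difflib module. The two revisions of greet.py are invented for illustration. The output is a unified diff, which is essentially what a patch file contains.

```python
import difflib

# Two hypothetical revisions of the same small source file.
old = ["def greet(name):\n", "    print('Hello ' + name)\n", "\n", "greet('world')\n"]
new = ["def greet(name):\n", "    print('Hello, ' + name + '!')\n", "\n", "greet('world')\n"]

# A unified diff records only the changed lines plus a little context.
patch = list(difflib.unified_diff(old, new, fromfile="greet.py@r1", tofile="greet.py@r2"))

# Keep only real additions and removals, skipping the two file header lines.
changed = [line for line in patch
           if line[0] in "+-" and not line.startswith(("+++", "---"))]

print("".join(patch))
```

Only one line was removed and one added, although the file has four lines. This is the locality that version control tools rely on.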

The idea easily generalizes to tools for managing configuration items rather than source code. The term configuration item is rather related to ideas such as "configuration of electrons in an atom" or "configuration of stars", not to be confused with the idea of configuration files. It refers to any artifact which might change over time and changes to which must be propagated between individual developers or systems. If changes to configuration items are also local, it is most straightforward to map them to simple text files and use a traditional version control tool to manage them. The opposite, bad approach, would be to convert simple text files into something that the version control tool cannot effectively deal with (something neither text- nor line-oriented) and thus to produce a classic configuration management problem.

Fast forward to 2009, Drupal 6 (and other Content Management Systems, to be fair). It appears that in this new context young people are now making some noise about the SCM trap. Putting configuration items, such as Drupal CCK content types, views, permissions settings - just about anything - into a relational database reliably defeats traditional version control tools. It becomes as cumbersome to distribute and coordinate changes to these items as it used to be for source code changes pre-1972. Yet, luckily for everyone, approaches and tools for solving this age-old problem are also being (re-)invented. The unfortunately named features module for Drupal 6 (SEO, anyone?) provides a workable solution, which I elaborate in the rest of this article.

However, before we go any further, a word of caution regarding terminology is in order. The established term "configuration item" (coined by Professor W. Tichy during the 80's) is apparently unfamiliar to or not good enough for inventive young Drupal developers. They like to call configuration items "exportables" or (much worse) "components". This is the terminology found in the original code of the features module, the documentation, and the API. You had better get used to it if you want to understand these sources, but I will try to shield you from confusion as much as possible by academically sticking to configuration items in this article. It is intriguing why so many otherwise intelligent and dedicated people choose to pollute the world with lazy rubbish terminology. In the case of key projects it is then likely to be perpetuated by hundreds (thousands?) of unsuspecting follower-developers. A worthy topic of research for a psychologist, no doubt. But I digress.

The Drupal features module allows you to:

Export configuration items from the database to a set of text files. (Ideally, one file per configuration item should be used to reflect the locality of changes and to reduce the risk of file-level conflicts. Grouping of similar configuration items in one file is also acceptable, and is indeed employed by Drupal's features module.)
Exchange and manage contents of the exported text files using traditional version control tools.
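The export step can be sketched in a few lines of Python. This is only a model of the idea, not of the features module itself: the item names, the JSON format, and the one-file-per-item layout are assumptions made for illustration (the features module generates source code files and groups items by kind).

```python
import json
import tempfile
from pathlib import Path

def export_items(items, target_dir):
    """Write each configuration item to its own text file and return the paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name, definition in sorted(items.items()):
        path = target / f"{name}.json"
        # sort_keys and indent give a stable, line-oriented layout, so a small
        # change to an item shows up as a small diff.
        path.write_text(json.dumps(definition, indent=2, sort_keys=True) + "\n")
        written.append(path)
    return written

# Hypothetical items as they might be read from a database.
database_items = {
    "frontpage_view": {"type": "view", "fields": ["title", "teaser"]},
    "article": {"type": "content_type", "fields": ["body"]},
}

export_dir = tempfile.mkdtemp()
paths = export_items(database_items, export_dir)
print([p.name for p in paths])
```

The deterministic serialization matters more than the file format: if the same item can be written in two different ways, version control reports changes that nobody made.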

At this point in our design reconstruction, different approaches seem viable. An intuitive one, which I and my colleagues have been successfully using in another (non-Drupal) project for a few years, is to provide a symmetric import operation which loads the (now merged/resolved) configuration items from the file system back into the database. However, the solution used by Drupal's features is even neater, as it eliminates the need to perform manual imports (and thus also the danger of forgetting an import):

Use the exported files as the authoritative (live) version of the represented configuration items...
...unless another, more up-to-date version of those same items exists in the database. In this case, treat the database version as authoritative. That is, database settings override, or take precedence over, file system settings.
Make it possible to re-export the modified configuration items from the database. Consequently, the exported files' content becomes identical to the database, and so again, these files may be treated as authoritative.
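The three rules above amount to a simple precedence scheme, sketched here in Python. The function names and the dictionary-based storage are hypothetical and only model the behaviour described; they are not part of any Drupal API.

```python
def effective_item(name, file_defaults, database):
    """Return the live definition: the database copy wins if one exists."""
    if name in database:
        return database[name]
    return file_defaults[name]

def re_export(names, file_defaults, database):
    """Return new file contents that match whatever is currently live."""
    return {name: effective_item(name, file_defaults, database) for name in names}

# Hypothetical state: one view exported to files, then edited in the browser.
file_defaults = {"frontpage": {"fields": ["title"]}}
database = {"frontpage": {"fields": ["title", "teaser"]}}

# Before re-export, the database version overrides the file version.
live_before = effective_item("frontpage", file_defaults, database)

# After re-export, files and database agree, so the files are authoritative again.
file_defaults = re_export(["frontpage"], file_defaults, database)
in_sync = file_defaults["frontpage"] == database["frontpage"]
```

Note that no import step appears anywhere: the files are read directly whenever the database has nothing to say about an item.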

In summary, people are supposed to edit configuration items in the database and always re-export them before synchronizing with the version control repository. In this way their changes become available to others and others' changes can be merged in through the file system, while the GUI used for editing may still be implemented in the browser (making it more or less comfortable, but certainly more fool-proof than source code editing).

A specification of which changes are actually exported (i.e. "belong to a feature") is maintained within a (custom) feature module; not to be confused with Drupal's features module, which is used to generate the former one. A custom feature module furthermore includes a plain-text version of the actual configuration items, not just references to them. More precisely, a source code representation of those items is kept within the feature module. By the way, note that a feature module is in itself a configuration item represented in the file system and thus readily accessible to version control. However, it is not a configuration item in the sense of possibly having a shadow copy within the database. It's more akin to a normal Drupal module. However, as mentioned, its content is generated by the features module and it also does contain some additional metadata that allows the features module to treat it as its own creation when it comes to re-exporting.

Asking inconvenient questions early is a virtue. Given the astonishing number of Drupal modules, each of which may contribute its own configuration items to a web site by introducing its own database tables, or even worse, extending existing ones, how does the features module find out how to export all those alien configuration items? And how do those alien modules know how to use the disk versions stored within the exported feature module instead of looking for versions in the database? Under which conditions is an exported configuration item reusable across different versions of an originating module? Is it even sufficient to check that the originating module is installed in the right version, or must we ensure that other supporting modules are also available on an item-per-item basis? More generally, is the beta version of the features module ready for use in real projects? Does it introduce any significant risks (such as data loss, lack-of-support time bombs), glitches or limitations? To address these issues, in the absence of case studies, a closer look under the hood of the features module is necessary.

To see what happens (and where and why things might go wrong), let's walk through a scenario of creating and using a feature which contains just a couple of view definitions:

To begin creating our feature, we fill out the export form located at /admin/build/features/create. Here we see an "Add components" combo box filled with entries such as "Content types", "Permissions", "Views", "Dependencies". In short, what is referred to as components here is really the kinds of configuration items that can be exported as part of a feature. Elsewhere, the developers carelessly use the same bland term component to refer to an individual configuration item, thus committing the ~~modelling shortcut~~ embarrassing error of confusing types and instances, and wreaking havoc in the minds of their readers. To populate the combo box, the features module invokes hook_features_api.
Interestingly, implementations of this hook are not contained within the contributing modules, but rather within the features module itself. This might be interpreted as a commitment of the features module's maintainers to keep up-to-date with changes in important (core) third-party modules. It might be also interpreted as a non-commitment of the third-party modules' authors to play nice with features. As the features module is rather new, the choice by its original inventors to put the burden of first-time integration on their own shoulders is understandable and smart from the division of labor viewpoint. In the long term, it might represent a weak point of the overall scheme and a stability risk, especially given the currently inadequate amount of examples/documentation for third-party integration.
Anyway, we already see an answer to one of the questions posed above about how the features module knows what to export. It either includes direct support for a particular kind of configuration items, or (preferably) relies on third-party module authors to describe their own kinds of configuration items through an API.
When we select a kind of configuration item in the "Add components" combo box, a list of individual exportable configuration items of that kind is displayed below. The content of this list is generated by hook_features_export_options. In case of views, the features module implements this hook directly to return names of all enabled views. This hook also provides an opportunity for the implementor to influence in which file the exported configuration items will be stored within the feature module (e.g. in the same or separate file from configuration items of the other kinds). The views-specific implementation opts to store all exported views definitions in a single, separate file.
Suppose we click on a particular view in the list. Through an AHAH callback the server is now consulted about further details of the selected configuration item, and the right-side pane of the form is updated with those details. More precisely, hook_features_export is invoked with a list of identifiers of all currently selected configuration items (e.g. view names). An empty output container is also passed in as a parameter. The hook's implementation is supposed to transfer the input list to the output container. At this stage additional items, not present in the original list, might be added. Furthermore, module dependencies of the selected configuration items are determined and recorded in the output container.
The provision of dependency recording gives a (partial) answer to the question of reusability of the exported configuration items in new contexts. The capability to record module dependencies allows the features module to disallow enabling a feature which contains configuration items not supported by available modules (or alternatively, to prompt the user to download the required modules).
The hook implementation may also designate follow-up functions to be called after return to continue the process of filling in the output container. This mechanism is used if the exported configuration items in fact consist of sub-items, which must be handled by other modules. In case of views, there is no need for such a cascade. The implementation simply transfers the requested view names to the output container and records names of modules on which the associated view definitions depend.
When we finally submit the export form, hook_features_export_render is called to query implementing modules for the actual exported source code to be stored within the feature module. The views-specific implementation calls the export method on each view to obtain this source code. Generally, it's entirely up to the implementor how each configuration item is represented in the source code. The views-specific implementation generates source code that populates and returns an array of view definitions.
Besides the source file which holds all configuration items of a kind, the features module also generates three other files: an .info file, which includes a description of module dependencies, a boilerplate .module file, and a .features.inc file. This last file is of particular interest, as it defines a single hook for each kind of configuration item that was declared using hook_features_api (the same hook used to populate the combo box in the first step, remember?). In case of views, the generated hook's name is declared to be views_default_views. This hook is what the views module will later call back to look for any exported views definitions. The hook's code is boilerplate - it delegates to the real function, which returns the array with configuration items (view definitions, in this case).
Having generated all those files, the features module packs them together and offers the resulting feature module for download and installation. The installation (enabling of features) occurs through the /admin/build/features page, rather than the usual modules page.
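The export walkthrough above can be condensed into a small Python model. Everything in it is an illustrative assumption: the function names merely mirror the hook names mentioned in the text, and the signatures, the container layout, and the VIEWS table are invented. The real hooks are PHP functions with their own conventions.

```python
# Toy stand-in for the Views module's storage: enabled views and what they need.
VIEWS = {
    "frontpage": {"definition": {"fields": ["title"]}, "requires": ["views"]},
    "calendar": {"definition": {"fields": ["date"]}, "requires": ["views", "date"]},
}

def views_features_api():
    """Declare one kind of configuration item and the hook that will serve it."""
    return {"views": {"label": "Views", "default_hook": "views_default_views"}}

def views_features_export_options():
    """List the individual items of this kind that can be selected."""
    return sorted(VIEWS)

def views_features_export(selected, export):
    """Copy the selection into the output container and record dependencies."""
    for name in selected:
        export["items"].setdefault("views", []).append(name)
        for module in VIEWS[name]["requires"]:
            if module not in export["dependencies"]:
                export["dependencies"].append(module)
    return []  # no follow-up handlers: views have no sub-items in this model

def views_features_export_render(names):
    """Produce source text for a function returning the selected definitions."""
    lines = ["def views_default_views():", "    views = {}"]
    for name in names:
        lines.append(f"    views[{name!r}] = {VIEWS[name]['definition']!r}")
    lines.append("    return views")
    return "\n".join(lines) + "\n"

# Walk through the four steps for a feature containing a single view.
kinds = views_features_api()
options = views_features_export_options()
export = {"items": {}, "dependencies": []}
views_features_export(["calendar"], export)
source = views_features_export_render(export["items"]["views"])

# The generated text is ordinary code; loading it yields the default hook.
namespace = {}
exec(source, namespace)
defaults = namespace["views_default_views"]()
```

The point of the last step is that an exported configuration item is source code rather than data: it lives in a file, diffs line by line, and is simply executed when the owning module asks for its defaults.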

Thus far, we have roughly examined what happens during (re-)creating or (re-)defining a feature. Now it's time to look at what happens when a feature is enabled. As it turns out, not so much. The function features_install_modules is called with a list of all feature modules that you have chosen to enable. This function silently ignores all already enabled feature modules and only processes the additional ones. For each new feature module, it determines a transitive closure of dependencies and enables all those required modules if need be. The new feature module itself is also enabled.
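Computing a transitive closure of dependencies is a standard graph traversal. The following Python sketch shows the idea under stated assumptions: the module names and the dependency table are made up, and install_features is a hypothetical stand-in, not a port of features_install_modules.

```python
def dependency_closure(module, dependencies):
    """Return every module reachable from `module` via declared dependencies."""
    seen = []
    stack = list(dependencies.get(module, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # also protects against circular declarations
        seen.append(current)
        stack.extend(dependencies.get(current, []))
    return seen

def install_features(requested, enabled, dependencies):
    """Enable requested features and their dependencies; skip enabled ones."""
    enabled = set(enabled)
    for feature in requested:
        if feature in enabled:
            continue  # already enabled features are silently ignored
        enabled.update(dependency_closure(feature, dependencies))
        enabled.add(feature)
    return enabled

# Hypothetical dependency declarations, as they might appear in .info files.
dependencies = {
    "my_feature": ["views", "calendar"],
    "calendar": ["date"],
    "date": ["date_api"],
}

result = install_features(["my_feature"], {"views"}, dependencies)
```

Here my_feature names only views and calendar, yet date and date_api end up enabled as well, because they are reachable through calendar.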

The next step is to examine how modules that own and export configuration items utilize their exported representations. More particularly, how does the views module determine whether to use a view definition stored in the database or the one contributed by a feature module through the file system? The solution is rather simple and implemented with little support from the features module. As already mentioned, a hook (views_default_views) is generated during the export. The views module calls this hook to find out which views are represented as exported configuration items (aka "default views"). If a database version of a view is also available, views_get_view returns it instead of the default one from the file system. When a view is updated, a version of it is created in the database, possibly using the default view as a prototype. No surprises.

Finally, let's tackle the remaining few operations that can be performed on an enabled feature: reverting, rebuilding (aka recreating or re-exporting), and disabling:

Reverting a feature means restoring all configuration items to their original exported state. Any changes made to the database versions of the configuration items are undone. The implementation within the features module simply calls hook_features_revert and lets the modules that own the configuration items do the job. In case of views, the database version of each view included in the feature is deleted, thus giving precedence back to the default view. However, according to my tests, the views module seems to remember whether a view originally had a database version equal to the file system version, and in that case does not delete the database version during the revert, though you can delete it using the Revert action in the views GUI.
Recreating a feature means re-exporting the current versions of all included configuration items. The only difference from the original export is that now you don't have to choose which items to include in the feature (although you are given this opportunity). The process gives you a new .tgz feature module with which you should overwrite the installed module (unless it contains external changes, in which case you should merge both using your version control tool).
Disabling a feature is quite simple - it just disables the feature module, making the database versions of the configuration items all that remains active. A word of warning: if after enabling the feature you remove the database versions of the configuration items and subsequently disable the feature, you will effectively remove these configuration items from your site. For example, if you revert a view to only rely on the file system version, and disable the enclosing feature, the view will be gone (until you re-enable the feature, that is). This is to be expected. Also, if you choose to recreate such a disabled feature, the set of exported configuration items will not match the original one - in particular, the views which don't have versions in the database will not be re-exported unless you explicitly add them again to the export set.
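The interplay of reverting and disabling, including the warning about a view disappearing, can be replayed with a toy model in Python. The Site class and its methods are hypothetical and capture only the precedence rules described in this article, not the actual behaviour of the views or features code.

```python
class Site:
    """Toy model: views come from enabled feature modules or from the database."""

    def __init__(self):
        self.features = {}    # feature name -> {view name: default definition}
        self.enabled = set()  # names of enabled feature modules
        self.database = {}    # view name -> overriding definition

    def defaults(self):
        """Collect default views from all enabled feature modules."""
        merged = {}
        for name in sorted(self.enabled):
            merged.update(self.features[name])
        return merged

    def get_view(self, name):
        """Database version first, then the file system default, else None."""
        if name in self.database:
            return self.database[name]
        return self.defaults().get(name)

    def revert(self, feature):
        # Drop database copies so the file system defaults take over again.
        for view in self.features[feature]:
            self.database.pop(view, None)

    def disable(self, feature):
        self.enabled.discard(feature)

site = Site()
site.features["news"] = {"frontpage": {"fields": ["title"]}}
site.enabled.add("news")
site.database["frontpage"] = {"fields": ["title", "teaser"]}

overridden = site.get_view("frontpage")     # database version wins
site.revert("news")
reverted = site.get_view("frontpage")       # back to the exported default
site.disable("news")
after_disable = site.get_view("frontpage")  # nothing left to serve the view
```

After the revert, the only remaining copy of the view lives in the feature module, so disabling that module leaves the site with no definition at all.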

Having provided (admittedly roundabout) answers to the questions posed in the beginning of the article, let's turn in conclusion to the bigger issue: is the features module safe to use today?

Obviously, an answer would benefit from more real-world experience to better estimate the frequency and severity of bugs. Still, even after a short examination of the overall design (and some peeks at the source code as well), it appears to me that the core concept of features is solid and ready for prime time. The unfortunate terminology and confusing documentation are flaws, but they are not insurmountable and certainly not show-stoppers, judging from the quality of other popular Drupal modules. I shall give the features module a serious try in my project to alleviate the pressing need for conveniently exchanging view and content type definitions between developers and tracking changes. One apparent weakness is that at present some important kinds of configuration items might not be supported (e.g. localized strings). However, the features module can be augmented with other export-to-file-system approaches where they are available (.po files), and with the background information included in this study, integration with new modules should also be relatively easy.

Having written this article, I actually started using the Features module in one project to keep the development, test and production sites in sync. The following updates document some discovered glitches and "lessons learned".

Update 2010-04-18, uploading updated CCK content types to production

I discovered the following caveat when using features to synchronize CCK content types from development to production: when you add a new field to a CCK content type, recreate the feature, then upload the updated feature module to a production site, the new field is not added to the content type immediately (neither is it added to the content type's database table). Instead, the feature appears marked as "needs review" and you have to "revert" it manually in order to make changes active. I would expect the "needs review" state to only occur if there have been independent changes to the feature's configuration items at the production site since the last upload, that is, when there is room for conflict, so this behavior is surprising.
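The behavior I would expect can be written down as a three-way comparison between the last synchronized version, the uploaded code, and the database. The Python sketch below encodes only that expectation. It is not how the features module computes its states, and the label "safe to apply" is my own invention.

```python
def expected_state(base, code, database):
    """Classify a feature by comparing code and database with a common base."""
    if code == database:
        return "default"        # nothing to reconcile
    code_changed = code != base
    db_changed = database != base
    if code_changed and db_changed:
        return "needs review"   # both sides moved independently: possible conflict
    if db_changed:
        return "overridden"     # only the site was edited
    return "safe to apply"      # only the uploaded files changed (hypothetical label)

# The scenario described above: a field was added in development and uploaded,
# while the production database still matches the last synchronized version.
base = {"fields": ["body"]}
uploaded = {"fields": ["body", "subtitle"]}
production = {"fields": ["body"]}

state = expected_state(base, uploaded, production)
```

Under this scheme the uploaded change could be applied without asking, because the production site has not diverged from the common base.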

Update 2010-05-12, added CCK fields not included in feature automatically

Suppose you have an existing feature which contains a content type. You add a new field to the content type in your database. What state would you expect to be reported for the feature then? Well, I would like it to be treated the same way as the case of a view to which I added a new field - the feature should show up as "overridden", and recreating the feature should export the new CCK field. Yet it doesn't. Apparently you have to add the new field to feature.info manually for this configuration (sub)item to be included in the feature at all.

I think that the reason for the current state of affairs might be the developers' speculation that a single content type may be governed by more than one feature (as well as the user herself). However, the same could be said about fields of a view. In any case, the current behavior is inconsistent.

Update 2010-05-12, modules added as dependencies not enabled automatically

Suppose you enable a module in your development site and add it as a dependency to a feature. You then upload your feature to a production site which has the module in disabled state. What would you expect to happen when you revert the feature to "default" state? I would expect the module to become enabled (or some kind of error message to be displayed, had the module been unavailable). However, what happens instead is that the module stays disabled and - worse - can no longer be enabled manually because it is "locked" through the feature. You have to remember to enable the module manually before uploading the updated feature which depends on it. This is exactly the kind of thing which you should not have to remember when relying on a configuration management tool.

11 comments:

Andy Lowe said...: Well written. Thank you for an interesting read. I second the idea that "features" is a poor name choice from a SEO perspective. Maybe call them "configuration items", "solutions" "outlines", "structures". Something besides a term already commonly searched for in relation to Drupal. Sometimes I don't envy Google's task :); April 25, 2010 at 5:42 AM
Asaph said...: Excellent article Jan, thank you.; April 25, 2010 at 11:12 PM
Anonymous said...: Marvelous; June 8, 2010 at 5:10 AM
Matt said...: Thanks so much for taking the time to write this... I've been doing a lot of research on config management in Drupal, and there's a lack of in-depth info like this. Your article helped out a lot with understanding the technical underpinnings of Features.; June 22, 2010 at 10:42 PM
jpl said...: You are welcome!; June 22, 2010 at 11:33 PM
Anonymous said...: Loving it ! - well written - good insight, and some food for thought.; June 24, 2010 at 2:41 PM
Kolier said...: Very helpful.
Thanks, Jan.; August 20, 2010 at 12:55 PM
Anonymous said...: Anyone who seriously wants to be using the Features module should read this. It's the best analysis so far and probably will be for a while; September 9, 2010 at 8:50 PM
Jon said...: Great article!

One thing to note - features 6.x-1.0 is still a bit buggy. Really wrestling it at the moment.; October 18, 2010 at 6:41 PM
john said...: Thank you for a great writeup.

Any additional thoughts since your last update?; November 5, 2010 at 9:41 PM
jpl said...: @john: I haven't gathered new insight since the last update, as the project in which the Features module was used got suspended. My interest in this module is possibly going to be renewed some time in December or in 2011, after a D5->D6 migration of another Drupal project.; November 5, 2010 at 9:47 PM

plosquare.com blog