Monday, March 10, 2014

The Complexities of Configuration

With this post I start a series of blogs about configuration. I will start rather general and then dive deeper into several topics where useful. I do not claim scientific correctness, so please be polite with me if I miss a point. Nevertheless I think, and hope, that somebody else finds it interesting. Feedback of any kind is always welcome, so we can have some kind of two-way communication here...

Understanding Configuration

General Aspects

Basically one might ask what configuration is at all. When looking at a computation model where some input is converted to some output, configuration can be seen as some kind of control flow that affects the transformation. Nevertheless configuration is not equal to the program converting input to output. Configuration is more like a constrained recipe that tells the program in place what to do, but only within the boundaries of what the program allows to be configured. Obviously, if the configuration is so powerful that it is capable of performing any task, it is questionable whether it should still be called configuration (it might rather be called a script or recipe).
So, summarizing, configuration
  • should be constrained and limited to its purpose.
  • must be interpreted by some algorithmic logic.

Configuration is an API

Configuration is not there just for fun. With configuration your program logic defines an API, which clients interact with. If you change it, you break the contract. If you replace configuration, you must deprecate the old one. But things get worse. With code you have a compiler that flags deprecations and fails if pieces do not fit together anymore. With configuration you do not have any such tools.
As a consequence, like with APIs, you must think about what should be configurable. In general, similar to designing programmatic APIs, reduce your API footprint to an absolute minimum. Frankly speaking, if something is not really meant to be configured, is very complex to configure, or is only very rarely used, consider making it non-configurable at all. Instead ensure the component is well encapsulated as a Java artifact, so customers can still replace it with their own version if needed.

Configuration Types

When thinking about configuration types, there are a couple of things that are commonly used to "configure" a program:

  • command line arguments
  • environment properties
  • system properties
  • files and classpath resources, using different formats; including standardized deployment descriptors as well as vendor specific formats
  • databases
  • remote configuration services
  • ...
This list is for sure far from being complete. Nevertheless there are some similarities you will find in most cases:
  • a configuration entry is identified by some literal key.
  • configuration values are literal values most of the time. As a consequence configuration can basically be mapped to key/value pairs.
  • configuration is single valued most of the time, but sometimes also multi valued (e.g. collections).
  • keys often use a naming scheme similar to package and class names (though property names are typically in lower case), e.g. a.b.c.myvalue. Here myvalue can be called the parameter name and a.b.c the parameter area.
  • theoretically configuration values (as well as keys) may be of any type. Nevertheless, if we do not constrain anything, we are again struggling with complexity, and overlapping functionality with other standards, e.g. CDI, is the natural consequence.
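
To make the key/value view a bit more tangible, here is a minimal sketch of how command line arguments could be mapped to key/value pairs and how a key can be split into parameter area and parameter name. The class ConfigKeys, its method names and the --key=value argument syntax are assumptions I made up for illustration. They are not taken from any existing library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, made up for this post. */
final class ConfigKeys {

    private ConfigKeys() {}

    /** Parses arguments of the assumed form --key=value; anything else is skipped. */
    static Map<String, String> fromArgs(String... args) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            // eq > 2 guarantees a non-empty key between "--" and "="
            if (arg.startsWith("--") && eq > 2) {
                result.put(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        return result;
    }

    /** Returns the parameter area, e.g. "a.b.c" for "a.b.c.myvalue", or "" if there is none. */
    static String area(String key) {
        int dot = key.lastIndexOf('.');
        return dot < 0 ? "" : key.substring(0, dot);
    }

    /** Returns the parameter name, e.g. "myvalue" for "a.b.c.myvalue". */
    static String name(String key) {
        return key.substring(key.lastIndexOf('.') + 1);
    }
}
```

Calling ConfigKeys.fromArgs("--a.b.c.myvalue=42") yields a map with the single entry a.b.c.myvalue=42, which ConfigKeys.area and ConfigKeys.name split into a.b.c and myvalue.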

Configuration Building Blocks

So given the list above, configuration is not a monolithic thing. It is a composite of

  • different configuration providers
  • different configuration sources
  • override and priority rules for the resolution of ambiguous entries
  • filters and views for limiting access and ensuring that only the required information is visible
  • finally, composition can be done in different ways:
    • unions, which render redundant entries as corresponding multi-value entries.
    • resolving unions, where overriding and prioritization mechanisms decide which entries are visible in the composite configuration.
    • extending, where only additional entries not contained in the base configuration are added, but the redundant ones are ignored.
    • exclusive add, where only entries contained in exactly one of the base configurations, but never in both, are taken up into the composite.
    • subtractive, where the entries contained in the second configuration are removed from the base configuration.
    • ...
Additionally configuration
  • may be static
  • may be different depending on the current runtime environment
  • or even mutable to some extent (or at least updateable). I will cover this topic in a separate blog, since mutability of configuration implies a bunch of possible issues you may face, especially when running in an EE environment, where concurrency is the default case!
  • may be public or may contain entries to be protected by security mechanisms
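
The composition variants listed in this section can be sketched on plain maps. This is only one possible reading of the terms. The class ConfigComposition and its method names are hypothetical, and I assume that values are never null.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, one possible reading of the composition variants. */
final class ConfigComposition {

    private ConfigComposition() {}

    /** Union: keys present in both inputs become multi-valued entries. */
    static Map<String, List<String>> union(Map<String, String> a, Map<String, String> b) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        a.forEach((k, v) -> result.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        b.forEach((k, v) -> result.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        return result;
    }

    /** Resolving union: entries of the override configuration win over the base. */
    static Map<String, String> resolvingUnion(Map<String, String> base, Map<String, String> override) {
        Map<String, String> result = new LinkedHashMap<>(base);
        result.putAll(override);
        return result;
    }

    /** Extending: only keys missing in the base are added, redundant ones are ignored. */
    static Map<String, String> extend(Map<String, String> base, Map<String, String> extension) {
        Map<String, String> result = new LinkedHashMap<>(base);
        extension.forEach(result::putIfAbsent);
        return result;
    }

    /** Exclusive add: keeps only keys contained in exactly one of the inputs. */
    static Map<String, String> exclusiveAdd(Map<String, String> a, Map<String, String> b) {
        Map<String, String> result = new LinkedHashMap<>();
        a.forEach((k, v) -> { if (!b.containsKey(k)) result.put(k, v); });
        b.forEach((k, v) -> { if (!a.containsKey(k)) result.put(k, v); });
        return result;
    }

    /** Subtractive: removes from the base every key contained in the other configuration. */
    static Map<String, String> subtract(Map<String, String> base, Map<String, String> other) {
        Map<String, String> result = new LinkedHashMap<>(base);
        result.keySet().removeAll(other.keySet());
        return result;
    }
}
```

All methods return a new map and leave their inputs untouched, so the base configurations can be reused for several composites.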

Configuration Metadata

Configuration metadata allows storing and providing additional data that describes configuration. It can be scoped to:

  • a complete configuration
  • a partial configuration
  • a single configuration entry
Possible metadata could be:
  • the data provider
  • any additional provider settings
  • the type of data source
  • the configuration data's sensitivity
  • the configuration data owner
  • the exact source of the data, e.g. the jar and file path, where a classpath resource was loaded from.

Configuration Model

So given the points above, I will stick to the working assumption of constraining a Configuration to be nothing else than a Map<String,String> instance with some additional metadata and an identifier that identifies the Configuration:


import java.util.Map;

public interface Configuration extends Map<String,String>{

    /**
     * Access the identifying key of a configuration.
     * @return the configuration's key
     */
    public ConfigId getConfigId();

    /**
     * Get the meta information for the given key.
     *
     * @param key the key, not {@code null}.
     * @return the according meta-info, or {@code null}.
     */
    public Map<String,String> getMetaInfo(String key);

    /**
     * This method allows to check if an instance is mutable. If
     * an instance is not mutable, most of the so-called
     * <i>optional</i> methods of {@link java.util.Map} will throw
     * an {@link java.lang.UnsupportedOperationException}:
     * <ul>
     *     <li>{@link #put(Object, Object)}</li>
     *     <li>{@link #putAll(java.util.Map)}</li>
     *     <li>{@link #clear()}</li>
     *     <li>{@link #putIfAbsent(Object, Object)}</li>
     *     <li>{@link #remove(Object)}</li>
     *     <li>{@link #remove(Object, Object)}</li>
     *     <li>{@link #replace(Object, Object)}</li>
     *     <li>{@link #replace(Object, Object, Object)}</li>
     *     <li>{@link #replaceAll(java.util.function.BiFunction)}
     *     </li>
     * </ul>
     * @return true, if this instance is mutable.
     */
    public boolean isMutable();
}

I am well aware that this looks quite simple, so I will refine this model in subsequent blogs to cover additional requirements such as
  • extension points like queries and type adapters
  • basic type support for the JDK's standard types (boolean, characters, numbers)
  • enabling submodules
  • and more...
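
To show that the interface above is easy to implement, here is a minimal read-only sketch based on java.util.AbstractMap. ConfigId is not defined in this post, so the ConfigId class below is a placeholder I assumed for illustration, and ImmutableConfiguration is a hypothetical name as well. The interface is repeated without Javadoc only to keep the snippet self-contained.

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Placeholder identifier type (assumption, not defined in the post). */
final class ConfigId {
    private final String name;
    ConfigId(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

/** The interface from above, repeated without Javadoc. */
interface Configuration extends Map<String, String> {
    ConfigId getConfigId();
    Map<String, String> getMetaInfo(String key);
    boolean isMutable();
}

/**
 * Read-only implementation. AbstractMap.put() throws
 * UnsupportedOperationException, and the unmodifiable entry set
 * rejects removals and value changes.
 */
final class ImmutableConfiguration extends AbstractMap<String, String> implements Configuration {
    private final ConfigId id;
    private final Map<String, String> entries;
    private final Map<String, Map<String, String>> metaInfo;

    ImmutableConfiguration(ConfigId id, Map<String, String> entries,
                           Map<String, Map<String, String>> metaInfo) {
        this.id = id;
        // Defensive copies, so later changes to the arguments are not visible here.
        this.entries = Collections.unmodifiableMap(new LinkedHashMap<>(entries));
        this.metaInfo = new LinkedHashMap<>(metaInfo);
    }

    @Override public Set<Map.Entry<String, String>> entrySet() { return entries.entrySet(); }

    // Overridden because AbstractMap.get() would scan the entry set linearly.
    @Override public String get(Object key) { return entries.get(key); }

    @Override public ConfigId getConfigId() { return id; }

    /** Returns the meta-info for the key, or null if there is none. */
    @Override public Map<String, String> getMetaInfo(String key) { return metaInfo.get(key); }

    @Override public boolean isMutable() { return false; }
}
```

Since isMutable() returns false here, a call like put("x", "y") ends in an UnsupportedOperationException, as described in the Javadoc of isMutable().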

Configuration Locations

Separate Configuration from Code

An area of discussion is sometimes whether configuration must be strictly separated from code. I will not join any of the sometimes religious discussions on that, but define some rules of thumb for when I think configuration should be separated and when it should be deployed along with the code.
  • Configuration that is internal only, meaning it is not meant to be used by clients, should always be deployed with the code, basically within the same jar if possible. This makes sense since such configuration is highly coupled to the code.
  • Default configuration that may be overridden should still be deployed along with the code. This ensures the defaults are always visible when the code is deployed (with an appropriate configuration reading mechanism in place, e.g. one that honors the same classloading boundaries). It is also a precondition for convention-over-configuration to work effectively.
  • As a next step I would think of configuration that controls the overall basic deployment setup, but still targets rather general concerns. Configuration defining which modules are loaded, depending on the current deployment stage, is such a case. Such configuration, though it may be stage specific, will not be affected by changes within the current runtime environment. I would recommend deploying such configuration with the application as well, e.g. as part of the deployed ear or war archives. The reason is that I tend to see such configuration also as an (optionally stage specific) default configuration.
  • Finally there is configuration that targets direct deployment aspects and that may change for every single deployment, regardless of whether it is performed manually or in an automated cloud-like environment. This configuration should be separated from the code, meaning deployed independently. There are several options to achieve this:
    • Deploy the required files with ssh, sftp or similar to the target node, where they can be read.
    • Mount some specific area into the file system, where the files are locally visible, e.g. with nfs.
    • Access configuration from a configuration server (pull scenario).
    • Open a connection and wait for the configuration server to push the required configuration onto your node (push scenario).

Add Configuration as Classpath Resources

Many people tend to see configuration as files that must be deployed to the target system. Nevertheless, in the case of internal and default configuration (refer to the previous section for more details), deploying this configuration type as files through a separate deployment channel also creates some possible issues:

  • It is cumbersome if clients have to care about what additional configuration must be installed to get things running. They want to define the dependency on the library and start working with it. In practice this may be even worse when different versions of the classes require different (default) configuration. Often outdated configuration is then shipped with a newer version of the component, which often ends up in hard-to-find errors.
  • Also on the deployment side (DevOps) it makes the deployment bigger (more files to be deployed) and configuration updates more complex.
Whereas when configuration is deployed as classpath resources, there are some real benefits:
  • The classloader hierarchy ensures the configuration is only visible where it should be visible. There is less risk that configuration from different deployment levels (= class loaders) is mixed up.
  • Reading classpath resources is a standard mechanism of the JDK. It is also possible during very early points of server startup or logging initialization.
  • Reading classpath resources is relatively fast and can also be secured, if necessary.
But deploying configuration as classpath resources also has some disadvantages:
  • First of all, it is less transparent. Theoretically each jar in a 200 jar deployment can contain relevant configuration. Finding all the relevant entries may be very difficult, especially if no common configuration lookup mechanism is defined and each piece of code looks up configuration at arbitrary locations.
  • Overriding may also be more complex. You can easily override a file deployed to some file system, whereas changing a file contained in a jar basically requires exchanging the whole jar (we ignore other possibilities here).

Fortunately the disadvantages can be handled relatively easily by externalizing the concern of configuration reading and management into a dedicated configuration service. 
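
As a rough sketch of how reading configuration from the classpath could look, the following helper collects all resources with a given name that are visible to a class loader and remembers for each key where it came from, which addresses the transparency concern a bit. ClassLoader.getResources and java.util.Properties are standard JDK APIs. The class ClasspathConfigReader, the properties format and the rule that the first occurrence of a key wins are my assumptions for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical reader, made up for this post. */
final class ClasspathConfigReader {

    private ClasspathConfigReader() {}

    /**
     * Loads every properties resource with the given name visible to the
     * class loader. For each key the URL of its origin is written into
     * sourcesByKey, so it can later be exposed as metadata.
     */
    static Map<String, String> read(ClassLoader loader, String resourceName,
                                    Map<String, String> sourcesByKey) {
        Map<String, String> result = new LinkedHashMap<>();
        try {
            for (URL url : Collections.list(loader.getResources(resourceName))) {
                Properties props = new Properties();
                try (InputStream in = url.openStream()) {
                    props.load(in);
                }
                for (String key : props.stringPropertyNames()) {
                    // Assumption: the first occurrence of a key wins.
                    if (result.putIfAbsent(key, props.getProperty(key)) == null) {
                        sourcesByKey.put(key, url.toString());
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read " + resourceName, e);
        }
        return result;
    }
}
```

A resource name that is not found on the classpath simply results in an empty map, so a missing default configuration can be handled by the caller.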

Using a Configuration Service

If you let each piece of code look up its configuration individually, you may end up with systems that are hard to control because
  • you will have to know which code is reading and using which configuration, and have to look into the source code to see what is happening
  • configuration locations are scattered across your system
  • you will probably have to deal with several different formats

Core Functionality

Using a dedicated configuration service for reading and managing configuration has several advantages:
  • It allows defining one (or several) configuration meta models, defining
    • where configuration is located (CLI arguments, system properties, environment properties, classpath, filesystem, remote resources etc).
    • how configuration can be overridden (ordering of declarations, explicit priorities and overrides etc).
    • in what format configuration must be provided (properties, XML, JSON, ...).
  • It can manage the configuration read, depending on the current runtime environment.
  • It can optimize configuration access, e.g. by caching or preloading.
  • It can provide hooks for listening to configuration changes (new configuration added, configuration altered or deleted).
  • Such a service can also provide additional metadata about configuration and configuration entries.
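
A very reduced sketch of such a service could look as follows. It covers only three of the points above: ordered sources where later ones override earlier ones, a cache that is served on every read, and a hook that reports changed keys. The class SimpleConfigService and all its method names are hypothetical. The sketch is not thread-safe and assumes that values are never null.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical service, made up for this post (not thread-safe). */
final class SimpleConfigService {
    // Sources in ascending priority: later sources override earlier ones.
    private final List<Supplier<Map<String, String>>> sources = new ArrayList<>();
    private final List<Consumer<Set<String>>> listeners = new ArrayList<>();
    private Map<String, String> cache = new LinkedHashMap<>();

    void addSource(Supplier<Map<String, String>> source) { sources.add(source); }

    void addChangeListener(Consumer<Set<String>> listener) { listeners.add(listener); }

    /** Served from the cache; the sources are only read in reload(). */
    String get(String key) { return cache.get(key); }

    /** Re-reads all sources and reports added, altered and deleted keys to the listeners. */
    void reload() {
        Map<String, String> fresh = new LinkedHashMap<>();
        for (Supplier<Map<String, String>> source : sources) {
            fresh.putAll(source.get());
        }
        Set<String> changed = new TreeSet<>();
        for (String key : fresh.keySet()) {
            if (!fresh.get(key).equals(cache.get(key))) {
                changed.add(key); // added or altered
            }
        }
        for (String key : cache.keySet()) {
            if (!fresh.containsKey(key)) {
                changed.add(key); // deleted
            }
        }
        cache = fresh;
        if (!changed.isEmpty()) {
            for (Consumer<Set<String>> listener : listeners) {
                listener.accept(changed);
            }
        }
    }
}
```

A source can be anything that delivers a map, for example () -> System.getenv() for the environment, followed by a source with a higher priority that reads the command line arguments.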

Extended Functionality

As a benefit, since a configuration service controls everything happening in the area of configuration, it can provide additional services:
  • It can intercept configuration access to enforce security constraints.
  • It can intercept configuration access to log which code is using what kind of configuration. This can also easily be used for configuration evolution, e.g. by writing warning messages when deprecated parameters are read.
  • It can add additional configuration sources and locations to a configuration transparently, without having to change any client code.
  • It can be made remotely accessible, so it acts as a configuration server (pull scenario), or
  • it can be triggered, so it pushes configuration changes to the corresponding remote instances (push scenario).
  • ...
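
The interception idea, used for warnings on deprecated parameters, can be sketched with a simple wrapper. The class AuditingConfig is hypothetical. To keep the snippet small the warnings are collected in a list. A real service would rather hand them to a logging framework.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical wrapper: records reads of deprecated keys before delegating. */
final class AuditingConfig {
    private final Map<String, String> delegate;
    private final Set<String> deprecatedKeys;
    private final List<String> warnings = new ArrayList<>();

    AuditingConfig(Map<String, String> delegate, Set<String> deprecatedKeys) {
        this.delegate = delegate;
        this.deprecatedKeys = deprecatedKeys;
    }

    String get(String key) {
        if (deprecatedKeys.contains(key)) {
            // Frame 0 is this method, frame 1 is the code that asked for the key.
            StackTraceElement[] stack = new Throwable().getStackTrace();
            String caller = stack.length > 1 ? stack[1].getClassName() : "unknown";
            warnings.add("Deprecated key '" + key + "' read by " + caller);
        }
        return delegate.get(key);
    }

    /** The warnings recorded so far, in the order they occurred. */
    List<String> warnings() { return warnings; }
}
```

Client code keeps calling get(key) as before, so the warnings show up without any change on the client side.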

Configuration Injection

We have seen that a configuration service can create huge benefits. Nevertheless we have to be careful to avoid a hard dependency on the configuration service component. This would happen if we accessed all our configuration using a service locator pattern, e.g.

Configuration config =
       ConfigService.getConfiguration(MyConfigs.MainConfig);


Fortunately, since Java EE 6 we have CDI in place, which allows us to transparently inject things, so we might think of doing things as follows:

public class MyClass{
  @Configured
  private String userName;
  
  @Configured
  private int userAge;

  ...
}

The code snippet above depends only on the @Configured annotation. All configuration management logic is completely hidden. Although the code above looks very compelling, it has some severe drawbacks, which we will discover in later blogs. Do you see them already?
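
To give an idea of what could happen behind such an annotation, here is a small sketch that uses plain reflection instead of CDI. It is not how a CDI extension would be written. The definition of @Configured, the class ConfigInjector and the rule that the field name is used as the configuration key are all assumptions for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;

/** Assumed definition of the marker annotation (not shown in the post). */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Configured {}

/** Plain-reflection stand-in for what an injection container might do. */
final class ConfigInjector {

    private ConfigInjector() {}

    /** Sets each @Configured String or int field from the entry named like the field. */
    static void inject(Object target, Map<String, String> config) {
        for (Field field : target.getClass().getDeclaredFields()) {
            if (!field.isAnnotationPresent(Configured.class)) {
                continue;
            }
            String value = config.get(field.getName());
            if (value == null) {
                continue; // assumption: a missing key leaves the field untouched
            }
            try {
                field.setAccessible(true);
                if (field.getType() == int.class) {
                    field.setInt(target, Integer.parseInt(value));
                } else if (field.getType() == String.class) {
                    field.set(target, value);
                } else {
                    throw new IllegalArgumentException(
                            "Unsupported type for field " + field.getName());
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot inject " + field.getName(), e);
            }
        }
    }
}

/** Example target with made-up field names. */
class InjectionDemo {
    @Configured private String userName;
    @Configured private int maxUsers;

    String userName() { return userName; }
    int maxUsers() { return maxUsers; }
}
```

With a configuration containing userName=anna and maxUsers=5, a call to ConfigInjector.inject(new InjectionDemo(), config) fills both fields, and the class itself never touches the configuration service.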


2 comments:

  1. Please finish the JSR-354 first in a version although Oracle agrees with. Java Money is still an important topic for the JDK but still to complex IMHO. For the EE topic JavaEE Configuration - that was originally announced by Mike Keith (Oracle) on JFocus 2013 - I propose to have 'active' EE members within this JSR. Cheers
  2. Hi Heiko. Please make comments that are related to this topic here. I don't think people want to read how you think things are going. Summarizing both of your assumptions are completely invalid. Please follow the GitHub repo on JavaMoney and you will see quite a lot of activity there. BTW you are very welcome (and also multiple times invited) to join one of our Hackergarten in Zurich and contribute to the JSR 354 TCK ;-)