Thursday, March 20, 2014

Adding Additional Type Support to Configuration

In my previous blog post I discussed roughly what configuration is. I ended with the working assumption that configuration is mainly a Map<String,String> with additional metadata. In this post I would like to elaborate on how non-literal types can be supported. In summary, I would define the following requirements:
  • it should be possible to access configuration as non literal type
  • all types contained in java.lang should be supported.
  • nevertheless arbitrary other types should also be enabled
  • it should be possible to register "converters"
  • it should also be possible to pass a matching "converter" programmatically
First of all we have to think about what kind of functionality we want to add to the basic Configuration interface (this is also the reason why "converter" is written in quotes above).
Basically, adding type support requires a configuration entry's value, which is a String, to be made compatible with some arbitrary type:
 "...allows the interface of an existing class to be used from another interface."
This is exactly the GoF adapter pattern. So let us define an adapter:

@FunctionalInterface
public interface PropertyAdapter<T>{
  T adapt(String value);
}

Now we can extend Configuration to allow access to configuration values using such an adapter:

public interface Configuration{
   ...
   <T> T getAdapted(String key, PropertyAdapter<T> adapter);

}
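To make this more concrete, here is a minimal sketch of how such an adapter could be used. SimpleConfiguration is a hypothetical, map-backed stand-in for a real Configuration implementation, and reporting a missing key with an IllegalArgumentException is an assumption made only for this illustration.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

// The single-value adapter from above.
@FunctionalInterface
interface PropertyAdapter<T> {
    T adapt(String value);
}

// Hypothetical map-backed stand-in for a Configuration implementation.
final class SimpleConfiguration {
    private final Map<String, String> entries = new HashMap<>();

    SimpleConfiguration with(String key, String value) {
        entries.put(key, value);
        return this;
    }

    // Looks up the raw String and lets the adapter convert it.
    <T> T getAdapted(String key, PropertyAdapter<T> adapter) {
        String value = entries.get(key);
        if (value == null) {
            // Assumption: a missing entry is reported with an unchecked exception.
            throw new IllegalArgumentException("No configuration entry for key: " + key);
        }
        return adapter.adapt(value);
    }
}

final class AdapterUsage {
    public static void main(String[] args) {
        SimpleConfiguration config = new SimpleConfiguration()
                .with("server.port", "8080")
                .with("cache.ttl", "PT30S");

        // Method references work because PropertyAdapter is a functional interface.
        Integer port = config.getAdapted("server.port", Integer::valueOf);
        Duration ttl = config.getAdapted("cache.ttl", Duration::parse);
        System.out.println(port + " " + ttl);
    }
}
```

Since the adapter is a functional interface, many existing JDK factory methods can be passed directly, without writing a dedicated adapter class.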

Obviously this is a very minimalistic approach. Considering all the basic types defined in java.lang, it is a good idea to add corresponding convenience methods:


public interface Configuration{
   ...
   Character getCharacter(String key);
   Byte getByte(String key);
   Short getShort(String key);
   Integer getInteger(String key);
   Long getLong(String key);
   Float getFloat(String key);
   Double getDouble(String key);     

}

By default, I would suggest throwing a RuntimeException if a value is missing, so these methods will never return null. Additionally it might be a good idea to allow default values to be returned, so we also add the following methods:


   Character getCharacterOrDefault(String key, Character defaultValue);
   Byte getByteOrDefault(String key, Byte defaultValue);
   Short getShortOrDefault(String key, Short defaultValue);
   Integer getIntegerOrDefault(String key, Integer defaultValue);
   Long getLongOrDefault(String key, Long defaultValue);
   Float getFloatOrDefault(String key, Float defaultValue);
   Double getDoubleOrDefault(String key, Double defaultValue);
   <T> T getAdaptedOrDefault(String key, PropertyAdapter<T> adapter,
                             T defaultValue);
 

With the above signatures passing null as a default value is completely valid. So one might write:

Byte myNumber = config.getByteOrDefault("minNumber", null);
if(myNumber==null){
   // do whatever needed
}
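The difference between the strict and the defaulting accessor can be sketched as follows. ByteAccess is a hypothetical helper over a plain Map, and NoSuchElementException is just one possible choice of RuntimeException; neither is part of the interface discussed here.

```java
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical helper illustrating the two access styles for one type.
final class ByteAccess {
    private final Map<String, String> entries;

    ByteAccess(Map<String, String> entries) {
        this.entries = entries;
    }

    // Strict variant: never returns null, fails fast on a missing key.
    Byte getByte(String key) {
        String value = entries.get(key);
        if (value == null) {
            throw new NoSuchElementException("Missing configuration entry: " + key);
        }
        return Byte.valueOf(value.trim());
    }

    // Defaulting variant: the default (which may be null) is returned for a missing key.
    Byte getByteOrDefault(String key, Byte defaultValue) {
        String value = entries.get(key);
        return value == null ? defaultValue : Byte.valueOf(value.trim());
    }
}
```

Note that in this sketch a value that is present but not parseable still raises a NumberFormatException in both variants; only a missing entry falls back to the default.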

Though code like the above is quite common, it might be worthwhile to think about additional utility functionality, e.g. using JDK 8 features:

Configurator configurator = Configurator.of(config);
configurator.forByte("minNumber", MyNumbers::configure);
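How such a Configurator might look is left open here. The following is only one possible sketch: it assumes the configuration can be passed as a Map<String,String> (which matches the working model of this series), that forByte takes a java.util.function.Consumer, and that MyNumbers is a hypothetical class with a static configure method.

```java
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical utility: invokes a callback only if the entry exists.
final class Configurator {
    private final Map<String, String> config;

    private Configurator(Map<String, String> config) {
        this.config = config;
    }

    static Configurator of(Map<String, String> config) {
        return new Configurator(config);
    }

    // Returns true if the key was present and the consumer was called.
    boolean forByte(String key, Consumer<Byte> consumer) {
        String value = config.get(key);
        if (value == null) {
            return false;
        }
        consumer.accept(Byte.valueOf(value.trim()));
        return true;
    }
}

// Hypothetical target of the callback.
final class MyNumbers {
    static byte minNumber = 0;

    static void configure(Byte value) {
        minNumber = value;
    }
}
```

With this shape the null check from the previous snippet moves into the utility, and the calling code only states what should happen when a value is available.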

Though it would be interesting to investigate this area further, I would like to take one step back and ask whether a single String configuration value is always enough to create every type of adapted instance. We have seen that a configuration entry basically consists of
  • a configuration key
  • a configuration value
  • configuration entry metadata
So to cover that we would extend our interface slightly as follows:

@FunctionalInterface
public interface PropertyAdapter<T>{
  T adapt(String key, String value,
          Map<String,String> metadata);
}
 
But this is still not optimal. We would be much more flexible if we passed the Configuration instance on which the adapter is used to the adapter implementation:

@FunctionalInterface
public interface PropertyAdapter<T>{
  T adapt(String key, Configuration config);
}

More generally the above is simply a specialization of a query against a configuration, but just for one specific key:

@FunctionalInterface
public interface ConfigurationQuery<T> {
   T queryFrom(Configuration config);
}
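As an illustration of the query idea, the sketch below shows one query that is not tied to a single key and one that expresses the single-key case as a query. Configuration is reduced to an empty extension of Map<String,String> here, and MapConfiguration and Queries are hypothetical helpers for this example only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Stand-in for the model of this series, where a Configuration is a Map<String,String>.
interface Configuration extends Map<String, String> {}

// Hypothetical minimal implementation used only for this illustration.
class MapConfiguration extends HashMap<String, String> implements Configuration {}

@FunctionalInterface
interface ConfigurationQuery<T> {
    T queryFrom(Configuration config);
}

final class Queries {
    private Queries() {}

    // A query spanning several entries: all keys below a given area.
    static ConfigurationQuery<SortedSet<String>> keysInArea(String area) {
        String prefix = area + ".";
        return config -> {
            SortedSet<String> keys = new TreeSet<>();
            for (String key : config.keySet()) {
                if (key.startsWith(prefix)) {
                    keys.add(key);
                }
            }
            return keys;
        };
    }

    // The single-key adapter case expressed as a query.
    // A missing entry leads to a NumberFormatException in this sketch.
    static ConfigurationQuery<Integer> intValue(String key) {
        return config -> Integer.valueOf(config.get(key));
    }
}
```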

Now these concepts can be useful when thinking about collection support. Imagine the following configuration:

foo.a=aValue
foo.b=bValue
foo.c=cValue

Given this configuration foo can be adapted in different ways:
  • It can be adapted to a Map with a=aValue, b=bValue, c=cValue
  • But it can also be mapped to a List (interpreting a, b, c as the ordering criterion) with aValue, bValue, cValue.
  • Similarly it could also be mapped to a tree:
              (root)
             /  |   \
           a    b    c
           |    |    |   
      aValue  bValue cValue

With the extended adapter definition all that is possible.
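The Map and List variants could be sketched like this, using the adapter signature that receives the key and the whole Configuration. MapConfiguration and CollectionAdapters are hypothetical names, and sorting the sub-keys alphabetically is just one way to interpret them as an ordering.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-in for the model of this series, where a Configuration is a Map<String,String>.
interface Configuration extends Map<String, String> {}

// Hypothetical minimal implementation used only for this illustration.
class MapConfiguration extends HashMap<String, String> implements Configuration {}

@FunctionalInterface
interface PropertyAdapter<T> {
    T adapt(String key, Configuration config);
}

final class CollectionAdapters {
    private CollectionAdapters() {}

    // Collects all entries below "<key>." into a map, sorted by the remaining sub-key.
    static final PropertyAdapter<Map<String, String>> TO_MAP = (key, config) -> {
        String prefix = key + ".";
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> entry : config.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                result.put(entry.getKey().substring(prefix.length()), entry.getValue());
            }
        }
        return result;
    };

    // Uses the sorted sub-keys as the ordering and keeps only the values.
    static final PropertyAdapter<List<String>> TO_LIST =
            (key, config) -> new ArrayList<>(TO_MAP.adapt(key, config).values());
}
```

Both adapters read the same entries; only the target type decides how the sub-keys a, b and c are interpreted.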

With that our Configuration is already very flexible. For example, consider the following entry:

javax.persistence.unit.MyUnit=https://myconfigserver.net/myApp/persistence/MyUnit.xml?myinstance=${instance.host} 

This entry could reference a persistence.xml that is provided remotely. With ${instance.host}, an expression language (EL) could be used to include dynamic aspects in the configuration as well.
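A full expression language is beyond the scope of this post, but the basic idea can be sketched with plain placeholder substitution. The PlaceholderResolver below is a hypothetical helper, not EL: it only replaces ${name} with a value from a lookup map and leaves unknown placeholders untouched.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: simple ${name} substitution, not a real expression language.
final class PlaceholderResolver {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    private PlaceholderResolver() {}

    static String resolve(String value, Map<String, String> lookup) {
        Matcher matcher = PLACEHOLDER.matcher(value);
        StringBuffer resolved = new StringBuffer();
        while (matcher.find()) {
            String replacement = lookup.get(matcher.group(1));
            if (replacement == null) {
                // Unknown placeholders are kept as they are.
                replacement = matcher.group();
            }
            // quoteReplacement prevents '$' and '\' from being interpreted specially.
            matcher.appendReplacement(resolved, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(resolved);
        return resolved.toString();
    }
}
```

An adapter could apply such a resolution step to the raw value before it creates the URL.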
Now an adapter might return a String containing the descriptor file, but this would load the file completely into memory. As an alternative, we might allow it to return an InputStream:

InputStream myUnitConfigStream = config.getAdapted(
                           "javax.persistence.unit.MyUnit",
                           URLResolver.of());


URLResolver hereby implements an adapter that creates a URL from the configured value and tries to open it, returning the InputStream. It is accessed using a static factory method:

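import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

public final class URLResolver implements PropertyAdapter<InputStream>{

  private static final URLResolver INSTANCE = new URLResolver();

  private URLResolver(){}

  @Override
  public InputStream adapt(String key, Configuration config){
    // check the raw value first, since new URL(null) would throw
    String value = config.getValueOrDefault(key, null);
    if(value==null){
      return null;
    }
    try{
      return new URL(value).openStream();
    }
    catch (IOException e){
      // log error (covers malformed URLs as well as I/O failures)
    }
    return null;
  }

  public static URLResolver of(){
    return INSTANCE;
  }

}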
  
So we have seen that with rather small additions to the Configuration interface we have already gained a lot of flexibility in what we can do with it. With the new features of Java 8, configuration will surely be much more fun than it was in the past.
I would be interested in what you think would be useful scenarios for the mechanism presented in this post. You are invited to leave your comments and ideas!



Monday, March 10, 2014

The Complexities of Configuration

With this post I start a series of blog posts about configuration. I will start rather general and then dive deeper into several topics as useful. I do not claim scientific correctness, so be gentle with me if I miss a point. Nevertheless I think, and hope, that somebody else finds it interesting. Feedback of any kind is always welcome, so we can also have some two-way communication here...

Understanding Configuration

General Aspects

Basically one might ask what configuration is at all. Looking at a computation model where some input is converted to some output, configuration can be seen as a kind of control flow that affects the transformation. Nevertheless configuration is not equal to the program converting input to output. Configuration is more like a constrained recipe that tells the program in place what to do, but only within the boundaries in which the program allows itself to be configured. Obviously, if the configuration is so powerful that it is capable of performing any task, it is questionable whether it should still be called configuration (it would be more of a script or recipe).
So, summarizing, configuration
  • should be constrained and limited in purpose.
  • must be interpreted by some algorithmic logic.

Configuration is an API

Configuration is not there just for fun. With configuration your program logic defines an API that clients interact with. If you change it, you break the contract. If you replace a configuration parameter, you must deprecate the old one. But things get worse: with code you have a compiler that flags deprecations and fails if pieces do not fit together anymore. With configuration you do not have any such tools.
As a consequence, as with APIs, you must think about what should be configurable. In general, just as when designing programmatic APIs, reduce your API footprint to an absolute minimum. Frankly speaking, if something is not really meant to be configured, is very complex to configure, or is only very rarely used, consider making it non-configurable altogether. Instead, ensure the component is well encapsulated as a Java artifact, so customers can still replace it with their own version if needed.

Configuration Types

When thinking about configuration types, there are a couple of things that are commonly used to "configure" a program:

  • command line arguments
  • environment properties
  • system properties
  • files and classpath resources, using different formats; including standardized deployment descriptors as well as vendor specific formats
  • databases
  • remote configuration services
  • ...
This list is surely far from complete. Nevertheless there are some similarities you will find in most cases:
  • a configuration entry is identified by some literal key.
  • configuration values are literal values most of the time. As a consequence configuration can basically be mapped to key/value pairs.
  • configuration is single valued most of the time, but sometimes also multi valued (e.g. collections).
  • keys often use a naming scheme similar to package and class names (though property names are typically in lower case), e.g. a.b.c.myvalue. Here myvalue can be called the parameter name and a.b.c the parameter area.
  • theoretically configuration values (as well as keys) may be of any type. Nevertheless, if we do not constrain anything, we are again struggling with complexity, and overlapping functionality with other standards, e.g. CDI, is the natural consequence.

Configuration Building Blocks

So given the list above configuration is not a monolithic thing. It is a composite of

  • different configuration providers
  • different configuration sources
  • override and priority rules for the resolution of ambiguous entries
  • filters and views for limiting access and ensuring that only the required information is visible
  • finally composition can be made in different ways:
    • unions, rendering redundant entries as corresponding multi-value entries.
    • resolving unions, where overriding and prioritization mechanisms determine which entries are visible in the composite configuration.
    • extending, where only additional entries not contained in the base configuration are added, and the redundant ones are ignored.
    • exclusive add, where only entries that are contained in exactly one of the base configurations, but never in both, are taken up into the composite.
    • subtractive, where you remove from the base configuration the entries that are contained in the second configuration.
    • ...
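The composition variants listed above map quite directly to operations on plain maps. The following sketch assumes configurations are given as Map<String,String>; the class and method names are hypothetical and only chosen to mirror the list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating the composition variants on plain maps.
final class Compositions {
    private Compositions() {}

    // Union: redundant keys become multi-value entries.
    static Map<String, List<String>> union(Map<String, String> first, Map<String, String> second) {
        Map<String, List<String>> result = new HashMap<>();
        first.forEach((k, v) -> result.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        second.forEach((k, v) -> result.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        return result;
    }

    // Resolving union: entries of 'override' win over entries of 'base'.
    static Map<String, String> resolvingUnion(Map<String, String> base, Map<String, String> override) {
        Map<String, String> result = new HashMap<>(base);
        result.putAll(override);
        return result;
    }

    // Extending: only keys missing in 'base' are taken from 'extension'.
    static Map<String, String> extending(Map<String, String> base, Map<String, String> extension) {
        Map<String, String> result = new HashMap<>(extension);
        result.putAll(base);
        return result;
    }

    // Exclusive add: keys contained in exactly one of the two configurations.
    static Map<String, String> exclusiveAdd(Map<String, String> first, Map<String, String> second) {
        Map<String, String> result = new HashMap<>();
        first.forEach((k, v) -> { if (!second.containsKey(k)) result.put(k, v); });
        second.forEach((k, v) -> { if (!first.containsKey(k)) result.put(k, v); });
        return result;
    }

    // Subtractive: entries of 'base' whose keys do not occur in 'other'.
    static Map<String, String> subtract(Map<String, String> base, Map<String, String> other) {
        Map<String, String> result = new HashMap<>(base);
        result.keySet().removeAll(other.keySet());
        return result;
    }
}
```

In a real configuration system the resolving union would typically be driven by explicit priorities rather than by argument order; the argument order here is only the simplest possible rule.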
Additionally configuration
  • may be static
  • may be different depending on the current runtime environment
  • or even mutable to some extent (or at least updateable). I will cover this topic in a separate post, since mutability of configuration implies a bunch of possible issues you may face, especially when running in an EE environment, where concurrency is the default case!
  • may be public or may contain entries to be protected by security mechanisms

Configuration Metadata

Configuration metadata allows storing and providing additional data that describes configuration. It can be scoped to:

  • a complete configuration
  • a partial configuration
  • a single configuration entry
Possible metadata could be:
  • the data provider
  • any additional provider settings
  • the type of data source
  • the configuration data's sensitivity
  • the configuration data owner
  • the exact source of the data, e.g. the jar and file path, where a classpath resource was loaded from.

Configuration Model

So given the points above I will stick to the working assumption to constrain a Configuration to be nothing else than a Map<String,String> instance, with some additional metadata and an identifier, that identifies a Configuration:


public interface Configuration extends Map<String,String>{

    /**
     * Access the identifying key of a configuration.
     * @return the configuration's key
     */
    public ConfigId getConfigId();

    /**
     * Get the meta information for the given key.
     *
     * @param key the key, not {@code null}.
     * @return the according meta-info, or {@code null}.
     */
    public Map<String,String> getMetaInfo(String key);

    /**
     * This method allows to check, if an instance is mutable. If
     * an instance is not mutable most of the so called
     * <i>optional</i> methods of {@link java.util.Map} will throw
     * an {@link java.lang.UnsupportedOperationException}:
     * <ul>
     *     <li>{@link #put(Object, Object)}</li>
     *     <li>{@link #putAll(java.util.Map)}</li>
     *     <li>{@link #clear()}</li>
     *     <li>{@link #putIfAbsent(Object, Object)}</li>
     *     <li>{@link #remove(Object)}</li>
     *     <li>{@link #remove(Object, Object)}</li>
     *     <li>{@link #replace(Object, Object)}</li>
     *     <li>{@link #replace(Object, Object, Object)}</li>
     *     <li>{@link #replaceAll(java.util.function.BiFunction)}
     *     </li>
     * </ul>
     * @return true, if this instance is mutable.
     */
    public boolean isMutable();
}
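To show that this model is easy to implement, here is a sketch of a read-only implementation based on java.util.AbstractMap. ConfigId is not defined in this post, so the class below is a hypothetical placeholder, and ImmutableConfiguration is likewise only an illustration.

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical identifier type; the post does not define ConfigId.
final class ConfigId {
    private final String name;

    ConfigId(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        return name;
    }
}

interface Configuration extends Map<String, String> {
    ConfigId getConfigId();
    Map<String, String> getMetaInfo(String key);
    boolean isMutable();
}

// Read-only implementation: AbstractMap.put() throws UnsupportedOperationException
// by default, and the unmodifiable entry set makes clear() and the removal of an
// existing key fail in the same way.
final class ImmutableConfiguration extends AbstractMap<String, String> implements Configuration {
    private final ConfigId id;
    private final Map<String, String> entries;
    private final Map<String, Map<String, String>> metaInfo;

    ImmutableConfiguration(ConfigId id, Map<String, String> entries,
                           Map<String, Map<String, String>> metaInfo) {
        this.id = id;
        this.entries = Collections.unmodifiableMap(new HashMap<>(entries));
        this.metaInfo = new HashMap<>(metaInfo);
    }

    @Override
    public Set<Map.Entry<String, String>> entrySet() {
        return entries.entrySet();
    }

    @Override
    public ConfigId getConfigId() {
        return id;
    }

    @Override
    public Map<String, String> getMetaInfo(String key) {
        return metaInfo.get(key);
    }

    @Override
    public boolean isMutable() {
        return false;
    }
}
```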

I am well aware that this looks quite simple, so I will refine this model in subsequent posts to cover additional requirements such as
  • extension points like queries and type adapters
  • adding basic type support for the JDK's standard types (boolean, characters, numbers)
  • enabling submodules
  • and more...

Configuration Locations

Separate Configuration from Code

One area of discussion is whether configuration must be strictly separated from code. I will not join any of the sometimes religious discussions on that, but define some rules of thumb for when I think configuration should be separated and when it should be deployed along with the code.
  • Configuration that is internal only, meaning it is not meant to be used by clients, should always be deployed with the code, basically within the same jar if possible. This makes sense since such configuration is highly coupled to the code.
  • Default configuration that may be overridden should still be deployed along with the code. This ensures the defaults are always visible when the code is deployed (with an according configuration reading mechanism in place, e.g. one that honors the same classloading boundaries). It is also a precondition for convention-over-configuration to work effectively.
  • As a next step I would think of configuration that controls the overall basic deployment setup, but still targets rather general concerns. For example, configuration defining which modules are loaded depending on the current deployment stage is such a case. Such configuration, though it may be stage specific, will not be affected by changes within the current runtime environment. I would recommend deploying such configuration with the application as well, e.g. as part of the deployed ear or war archives. The reason is that I tend to see this configuration also as an (optionally stage specific) default configuration.
  • Finally there is configuration that targets direct deployment aspects and that may change for each single deployment, regardless of whether it is performed manually or in an automated cloud-like environment. This configuration should be separated from the code, meaning independently deployed. There are several options for achieving this:
    • Deploy the files required with ssh, sftp or similar to the target node, where it can be read.
    • Mount some specific area into the file system, where the files are locally visible, e.g. nfs etc.
    • Access configuration from a configuration server (pull scenario).
    • Open a connection and wait for the configuration server to push the required configuration onto your node (push scenario).

Add Configuration as Classpath Resources

Many people tend to see configuration as files that must be deployed to the target system. Nevertheless, in the case of internal and default configuration (refer to the previous section for more details), deploying this configuration type as files through a separate deployment channel creates some possible issues:

  • It is cumbersome if clients have to care about what additional configuration must be installed to get things running. They want to define the dependency on the library and start working with it. In practice this may be even worse when different versions of the classes require different (default) configuration. Often outdated configuration is then shipped with a newer version of the component, which ends up in hard-to-find errors.
  • Also on the deployment side (DevOps) it makes the deployment bigger (more files to be deployed) and configuration updates more complex.
Whereas when configuration is deployed as classpath resources there are some real benefits:
  • The classloader hierarchy ensures the configuration is only visible where it should be visible. There is less risk that configuration from different deployment levels (= class loaders) is mixed up.
  • Reading classpath resources is a standard mechanism of the JDK; it is also possible during very early points of server startup or logging initialization.
  • Reading classpath resources is relatively fast and can also be secured, if necessary.
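Reading configuration from the classpath, as described in the benefits above, needs nothing beyond the JDK. The sketch below collects all resources with a given name that are visible to a class loader and merges them into one Properties instance; ClasspathConfigReader and the merge rule (resources read later override earlier ones) are assumptions for this illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.Enumeration;
import java.util.Properties;

// Hypothetical reader that merges all classpath resources of the same name.
final class ClasspathConfigReader {
    private ClasspathConfigReader() {}

    static Properties readAll(String resourceName, ClassLoader loader) throws IOException {
        Properties merged = new Properties();
        // getResources returns every matching resource visible to this loader,
        // e.g. one per jar that ships a file with this name.
        Enumeration<URL> resources = loader.getResources(resourceName);
        while (resources.hasMoreElements()) {
            URL url = resources.nextElement();
            try (InputStream in = url.openStream()) {
                // Assumption: resources read later override earlier ones.
                merged.load(in);
            }
        }
        return merged;
    }
}
```

Because the lookup goes through a specific ClassLoader, only the configuration visible at that level of the classloader hierarchy is read.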
But deploying configuration as classpath resources also has some disadvantages:
  • First of all, it is less transparent. Theoretically each jar in a 200 jar deployment can contain relevant configuration. Finding all the relevant entries may be very difficult, especially if no common configuration lookup mechanism is defined and each piece of code looks up configuration at arbitrary locations.
  • Overriding may also be more complex. You can easily override a file deployed to some file system, whereas changing a file contained in a jar basically requires exchanging the whole jar (we ignore other possibilities here).

Fortunately the disadvantages can be handled relatively easily by externalizing the concern of configuration reading and management into a dedicated configuration service. 

Using a Configuration Service

If you let each piece of code look up its configuration individually, you may end up with systems that are hard to control because
  • you will have to know which code is reading and using which configuration, and have to look into the source code to see what is happening
  • configuration locations are scattered across your system
  • you will probably have to deal with several different formats

Core Functionality

Using a dedicated configuration service for reading and managing configuration has several advantages:
  • It allows defining one (or several) configuration meta models, defining
    • where configuration is located (CLI arguments, system properties, environment properties, classpath, filesystem, remote resources etc).
    • how configuration can be overridden (ordering of declarations, explicit priorities and overrides etc).
    • in what format configuration must be provided (properties, XML, JSON, ...)
  • It can manage the configuration read, depending on the current runtime environment.
  • It can optimize configuration access, e.g. by caching or preloading.
  • It can provide hooks for listening to configuration changes (new configuration added, configuration altered or deleted).
  • Such a service can also provide additional metadata about configuration and configuration entries.
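A few of these responsibilities (explicit priorities, caching and change hooks) can be sketched in a handful of lines. SimpleConfigService is a hypothetical, single-threaded illustration; a real service would also have to deal with concurrency, formats and locations.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical, non-thread-safe sketch of a configuration service.
final class SimpleConfigService {

    private static final class Source {
        final int priority;
        final Map<String, String> entries;

        Source(int priority, Map<String, String> entries) {
            this.priority = priority;
            this.entries = entries;
        }
    }

    private final List<Source> sources = new ArrayList<>();
    private final List<Consumer<Map<String, String>>> listeners = new ArrayList<>();
    private Map<String, String> cached = Collections.emptyMap();

    // Hook for listening to configuration changes.
    void addListener(Consumer<Map<String, String>> listener) {
        listeners.add(listener);
    }

    // Adds a source and rebuilds the cached, merged view.
    void addSource(int priority, Map<String, String> entries) {
        sources.add(new Source(priority, new HashMap<>(entries)));
        sources.sort(Comparator.comparingInt(s -> s.priority));
        Map<String, String> merged = new HashMap<>();
        for (Source source : sources) {
            // Sources with higher priority are applied later and therefore win.
            merged.putAll(source.entries);
        }
        cached = Collections.unmodifiableMap(merged);
        for (Consumer<Map<String, String>> listener : listeners) {
            listener.accept(cached);
        }
    }

    // Reads are served from the cached view only.
    String get(String key) {
        return cached.get(key);
    }
}
```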

Extended Functionality

As a benefit, since a configuration service controls everything happening in the area of configuration, it can provide additional services:
  • It can intercept configuration access to ensure security constraints
  • It can intercept configuration access to log which code is using what kind of configuration. This can also easily be used for configuration evolution, e.g. by writing warning messages when deprecated parameters are read.
  • It can include additional configuration sources and locations to a configuration transparently, without having to change any client code.
  • a configuration service can be made remotely accessible, so it acts as a configuration server (pull scenario), or
  • it can be triggered, so it pushes configuration changes, to the according remote instances (push scenario)
  • ...
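Intercepting access for configuration evolution can be as simple as a decorator around the actual lookup. In the sketch below, AuditingConfiguration, the set of deprecated keys and the warning sink are all hypothetical; a real service would probably write to its logging framework instead.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical decorator that reports reads of deprecated parameters.
final class AuditingConfiguration {
    private final Map<String, String> delegate;
    private final Set<String> deprecatedKeys;
    private final Consumer<String> warningSink;

    AuditingConfiguration(Map<String, String> delegate,
                          Set<String> deprecatedKeys,
                          Consumer<String> warningSink) {
        this.delegate = delegate;
        this.deprecatedKeys = deprecatedKeys;
        this.warningSink = warningSink;
    }

    String get(String key) {
        if (deprecatedKeys.contains(key)) {
            // The value is still returned, the read is only reported.
            warningSink.accept("Deprecated configuration parameter read: " + key);
        }
        return delegate.get(key);
    }
}
```

Since clients only talk to the service, such a decorator can be added without changing any client code.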

Configuration Injection

We have seen that a configuration service can create huge benefits. Nevertheless we have to be careful to avoid a hard dependency on the configuration service component. This would happen if we accessed all our configuration using a service locator pattern, e.g.

Configuration config =
       ConfigService.getConfiguration(MyConfigs.MainConfig);


Fortunately, since Java EE 6 we have CDI in place, which allows us to inject things transparently, so we might think of doing things as follows:

public class MyClass{
  @Configured
  private String userName;
  
  @Configured
  private int userAge;

  ...
}

The code snippet above depends only on the @Configured annotation. All configuration management logic is completely hidden. Although the code above looks very compelling, it has some severe drawbacks, which we will also discover in later posts. Do you see them already?
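To make the idea of annotation-driven configuration a bit more tangible, here is a small reflection-based sketch. It is not CDI and not how a container would do it: the @Configured annotation, the ConfigInjector helper, the key convention "<SimpleClassName>.<fieldName>" and the field names are all assumptions for this illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical marker annotation, kept at runtime so reflection can see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Configured {}

// Hypothetical injector: fills annotated String and int fields from a map.
final class ConfigInjector {
    private ConfigInjector() {}

    static void inject(Object target, Map<String, String> config) {
        Class<?> type = target.getClass();
        for (Field field : type.getDeclaredFields()) {
            if (!field.isAnnotationPresent(Configured.class)) {
                continue;
            }
            // Assumed key convention: "<SimpleClassName>.<fieldName>".
            String key = type.getSimpleName() + "." + field.getName();
            String value = config.get(key);
            if (value == null) {
                throw new IllegalStateException("Missing configuration entry: " + key);
            }
            try {
                field.setAccessible(true);
                if (field.getType() == int.class) {
                    field.setInt(target, Integer.parseInt(value));
                } else if (field.getType() == String.class) {
                    field.set(target, value);
                } else {
                    throw new IllegalStateException("Unsupported field type: " + field.getType());
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot inject " + key, e);
            }
        }
    }
}

class MyClass {
    @Configured
    private String userName;

    @Configured
    private int userAge;

    String getUserName() { return userName; }
    int getUserAge() { return userAge; }
}
```

Even this tiny sketch already has to decide how keys are derived, which types are supported and what happens when a value is missing.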