By Steve Parsons
Active Data Services (ADS) – What is it good for? Quite a lot actually, as the song does not go. This article discusses what ADS is, what it brings to the table over and above Oracle Coherence and the positive and negative points of the technology.
ADS is typically first adopted for use with “Active Position Manager” (APM), but we are now seeing several Openlink clients starting to fulfil the promise of the technology by building custom interfaces using the ADS APIs. Upcoming versions of Openlink Endur/Findur will see the usage of ADS expand into new product areas.
What is ADS?
It is an in-memory data grid designed for use with APM, with supported Java and C# APIs that allow other applications to utilise the data that the APM services have inserted into the grid. Applications can also inject their own data into the grid. ADS is a layer that sits on top of Oracle Coherence. APM uses it as its data store and to perform the position aggregations required by APM page designs.
What is Oracle Coherence?
Oracle Coherence is an in-memory data grid technology. It supports caching and is designed with high availability in mind. If used properly it should allow performance to scale linearly, i.e. it’s a scale-out technology (add more servers to the grid) rather than a scale-up technology (add more CPUs and memory to a box). It also has very powerful publish and subscribe functionality for data updates.
This blog gives a good overview of the stated benefits of the technology.
It is used by many institutions in the finance domain. Some use it primarily as a data caching technology; we have also heard of others building trading systems around it.
But ADS is just Oracle Coherence!
I have heard this comment several times. No, it’s not. The Openlink ADS development team have built a layer on top of Oracle Coherence. Coherence provides a concept called “Continuous Callback”: a listener registers an interest in a cache of data, and every time a row is changed, added or deleted the listener receives a callback with the before and after state of the affected rows. ADS builds upon this concept by providing “Continuous Aggregation”: a listener registers an interest in a cache by specifying the aggregation operation it is interested in. For example:
SUM(mtm) WHERE portfolio = 'X' GROUP BY toolset, trader name
(Note that ADS does not use a SQL-like syntax; the SQL here is only an analogy for the purpose of this example.)
The listener will receive an initial aggregated total and then every time the total changes the listener receives a callback with the new total. This is a very powerful concept when thinking about real time position management problems. It allows client code to become very lightweight, whilst delegating the computationally heavy aggregation code to a horizontally scalable data grid. There are good reasons why the Openlink ADS Development team won an Oracle innovation award.
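The idea of continuous aggregation can be sketched in a few lines of plain Python. This is only a toy, single-process illustration of the concept described above, not the ADS or Coherence API: the class `ToyGrid`, the method `subscribe_sum_mtm` and the row field names are all hypothetical, chosen to mirror the SUM(mtm) example.

```python
from collections import defaultdict
from typing import Callable, Dict, Optional, Tuple

Row = Dict[str, object]
GroupKey = Tuple[str, str]  # (toolset, trader)
Totals = Dict[GroupKey, float]


class ToyGrid:
    """Toy stand-in for a grid cache. Hypothetical; not the ADS API."""

    def __init__(self) -> None:
        self._rows: Dict[str, Row] = {}
        self._subs = []  # one (portfolio, running totals, callback) per listener

    def subscribe_sum_mtm(self, portfolio: str,
                          callback: Callable[[Totals], None]) -> None:
        """SUM(mtm) WHERE portfolio = ... GROUP BY toolset, trader."""
        totals: Totals = defaultdict(float)
        for row in self._rows.values():
            if row["portfolio"] == portfolio:
                totals[(row["toolset"], row["trader"])] += row["mtm"]
        self._subs.append((portfolio, totals, callback))
        callback(dict(totals))  # the initial aggregated total

    def put(self, key: str, row: Row) -> None:
        old = self._rows.get(key)
        self._rows[key] = row
        self._notify(old, row)

    def remove(self, key: str) -> None:
        self._notify(self._rows.pop(key, None), None)

    def _notify(self, old: Optional[Row], new: Optional[Row]) -> None:
        # Apply the before/after state of the row to each listener's totals
        # and call back only with the groups whose total actually changed.
        for portfolio, totals, callback in self._subs:
            deltas: Totals = defaultdict(float)
            for row, sign in ((old, -1.0), (new, 1.0)):
                if row is not None and row["portfolio"] == portfolio:
                    deltas[(row["toolset"], row["trader"])] += sign * row["mtm"]
            changed = {}
            for group, delta in deltas.items():
                if delta != 0.0:
                    totals[group] += delta
                    changed[group] = totals[group]
            if changed:
                callback(changed)


grid = ToyGrid()
grid.put("deal-1", {"portfolio": "X", "toolset": "Power", "trader": "alice", "mtm": 100.0})

updates = []  # the "client": it only ever stores finished totals
grid.subscribe_sum_mtm("X", updates.append)

grid.put("deal-2", {"portfolio": "X", "toolset": "Power", "trader": "alice", "mtm": 50.0})
grid.put("deal-3", {"portfolio": "Y", "toolset": "Gas", "trader": "bob", "mtm": 10.0})
grid.put("deal-1", {"portfolio": "X", "toolset": "Power", "trader": "alice", "mtm": 70.0})
```

The client receives the initial total (100.0), then 150.0 when deal-2 arrives, nothing for the deal in portfolio Y, and 120.0 when deal-1 is amended. All the arithmetic stays on the grid side, which is what keeps the client code lightweight; in a real grid that work would be spread across nodes rather than done in one process.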
Why Use ADS?
There are four common reasons for using ADS with APM.
- Ability to scale APM to deal with datasets comprising many millions of datarows
Without ADS, APM can cope with a couple of million datapoints in the local client session data cache (especially on a 64-bit OS, where process limits are 4GB). But if you have 2GB process limits, or many millions of datapoints that need to be aggregated, then APM will break without ADS. This is especially relevant if you have a large portfolio of power deals and are starting to trade at 15-minute granularity.
- Move the data cache and associated aggregations off the client sessions
The older APM architecture (the “SQLITE” architecture) loads datapoints into a local data cache and aggregates locally. This is a poor fit for Citrix environments because of its high CPU and memory usage. The load is also unpredictable, as it depends on which APM pages the user has loaded.
- Writing custom interfaces to pass the APM data to downstream applications
Prior to ADS, the way the datapoints were stored in APM was proprietary and inaccessible to other applications (a closed architecture). ADS is an open architecture which allows other applications to take advantage of the business knowledge encapsulated in the APM service and associated simulation results.
- Writing custom applications to replace or complement the APM client
This is starting to become more prevalent. The APM client is a generic solution that deliberately avoids being market specific. In a front office environment traders ideally want custom solutions that fit their needs. Some Openlink clients have written additional applications that use the APM caches but present them in a different way, whilst still benefiting from the scalability of ADS. One client has even replaced the APM client with their own. BUT – do not underestimate the complexity of replacing the APM client in full. APM comes with several sophisticated pieces of functionality (e.g. bucketing methods, MW conversion) which rely on “inside” knowledge of how Endur/Findur works. ADS custom applications work best when the data is suitable for pivoting rather than complex bucketing.
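To put the first reason (scaling) into numbers, here is a rough back-of-envelope sketch of how quickly 15-minute granularity multiplies datapoints. The function name and the deal counts are hypothetical, and it assumes one datapoint per deal per interval; real APM datapoint counts depend on the simulation results and page designs in use.

```python
INTERVALS_PER_DAY = 24 * 60 // 15  # fifteen-minute intervals in one day


def datapoints(deals: int, delivery_days: int, values_per_interval: int = 1) -> int:
    """Rough datapoint count, assuming one row per deal per interval."""
    return deals * delivery_days * INTERVALS_PER_DAY * values_per_interval


one_deal_one_year = datapoints(deals=1, delivery_days=365)
small_portfolio = datapoints(deals=500, delivery_days=365)

print(f"{INTERVALS_PER_DAY} intervals per day")
print(f"{one_deal_one_year:,} datapoints for one year-long deal")
print(f"{small_portfolio:,} datapoints for 500 year-long deals")
```

Under these assumptions a single year-long deal already produces 35,040 datapoints, so a few dozen such deals reach the couple of million that a local client cache can hold.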
ADS can also be used for applications that have nothing to do with APM. For instance, Openlink are now using it with their OpenRisk framework.
What are the downsides?
ADS is difficult to size with precision, as many variables determine the number of nodes required: for example, the JVM heap size, the number of rows and columns in the cache, and how many listeners will be subscribing to the caches for aggregated updates. Initial node-count estimates can be too optimistic or too pessimistic.
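A rough sketch of why the estimates swing so much is shown below. It is not an Openlink or Oracle sizing formula: the function, the bytes-per-row figures, the usable heap fractions and the assumption of one backup copy per row are all hypothetical inputs, and the cost of listeners and aggregation is not modelled at all.

```python
import math


def estimate_storage_nodes(rows: int,
                           bytes_per_row: int,
                           heap_gb_per_node: float,
                           usable_heap_fraction: float = 0.5,
                           backup_copies: int = 1) -> int:
    """Very rough node count: total data (plus backups) / usable heap per node."""
    data_bytes = rows * bytes_per_row * (1 + backup_copies)
    usable_bytes_per_node = heap_gb_per_node * 1024 ** 3 * usable_heap_fraction
    return math.ceil(data_bytes / usable_bytes_per_node)


# Same 20 million rows, two equally plausible sets of guesses.
optimistic = estimate_storage_nodes(20_000_000, bytes_per_row=500,
                                    heap_gb_per_node=4)
pessimistic = estimate_storage_nodes(20_000_000, bytes_per_row=1500,
                                     heap_gb_per_node=4,
                                     usable_heap_fraction=0.25)

print(f"optimistic: {optimistic} nodes, pessimistic: {pessimistic} nodes")
```

With these made-up inputs the same dataset comes out at 10 nodes or 56 nodes, depending only on the guesses for row size and usable heap. Measuring real row sizes on a test grid before committing to a node count is the obvious way to narrow that range.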
ADS is not cheap (although Openlink have changed the pricing model to make it more accessible). Depending on the number of nodes required, it can represent a substantial investment. Coherence is embedded within ADS so you don’t have to pay independently for Coherence, but if you already have a Coherence license you still have to pay for ADS. The licensing model also means that you can’t use the Coherence APIs natively, only through the ADS APIs (which offer calls similar to native Coherence). So in licensing terms it is an independent product, unrelated to Coherence. In theory Openlink could rip Coherence out and replace it with a different underlying data cache (e.g. GemFire or an open-source product such as Infinispan). We doubt that will happen any time soon though.
- Heavily Dependent on network performance
Having a highly performant network with no packet loss is critical to Oracle Coherence working properly. Oracle provide tools to test network performance (e.g. the datagram test), which often disprove statements by clients that they have a clean network. We have seen several clients spend months swapping NICs, switches and entire hardware infrastructures in the quest for a network that works as advertised. If you’re considering ADS (or just Oracle Coherence), check your network before you even purchase. Otherwise you could be in for some expensive, unexpected remediation. In our experience network guys will always say the network is fine, even when the datagram test shows substantial packet loss. Don’t believe them.
- Heavily Dependent on Support from Openlink
Openlink have stated that ADS is a fundamental plank of their architecture going forward. In V14 onwards ADS has been integrated into Services Mgr to simplify the configuration (in earlier versions it can be complicated). But recent major new versions of ADS have proved problematic for the first customers taking them. Try to find other ADS clients using versions similar to the one you are targeting and discuss their experiences. They will have a lot of good advice.
- Push Architecture
A push architecture (i.e. sending out updates to subscribers) is inherently far more complex than a pull architecture (where data is retrieved and aggregated once). Modern cube technologies offer huge scalability without the complexity of the push architecture. If you don’t truly have a real time (or very near real time) requirement then you should consider a cube as a possible alternative. Use the right tool for the right job. ADS can be considered as a real time cube, but it’s not a cube. If you fill it with lots of historical data for deep analytical mining you will need a server farm big enough to dim half the lights of London. For that job, use a cube.
Why not just use a cube?
Glad you asked that. Cubes are a pull architecture and therefore not real time in the way ADS can be. Also, designing a good cube is an art in itself. Cubes tend to start off small and performant, then over time gradually warp into a different shape as requirements are added or change. Performance degrades and maintainability suffers. The technology is also pretty obscure (although not as obscure as Oracle Coherence). Adding a new dimension to a cube can be a very painful experience as the impact on existing data and recalculation times can be high. The APIs around ADS are relatively simple, and it is easy to destroy caches and instantiate them with different schemas. It is a good option if you have real time (or very near real time) requirements.
Java Versus C# APIs
ADS is natively coded in Java. The C# API is created via a conversion process, and there have been issues with the C# APIs in the past. If possible, use the Java API in preference to the C# API. You will probably have fewer issues.
ADS is a little-understood technology that allows APM to scale to much higher volumes of data points, whilst opening up the APM architecture to other applications. It also has applications outside of APM. Several Openlink clients are starting to utilise this potential successfully. It is a powerful tool that has many plus points. However, if you choose to invest in ADS then prepare carefully. Try to go for ADS versions that have been proved at other clients first, check your network, and clearly understand what benefits you can expect to get.