Notes to Keep You, and Your Implementation, Sane
by Glen Byrne
1. Introduction
Openlink Endur provides the means to validate trade entry both as it occurs and after the fact, by triggering AVS processes that run in script engines and in the Trading Manager presentation layer. The scripts are attached to definitions that allow the process to be triggered for specific instrument types, parties, portfolios, statuses and so on.
This can play a role in effecting a straight-through processing, “hands-off” architecture, so it can be a critical element in cost savings on a trade-by-trade basis for large organisations with significant commodities flow businesses, and therefore a critical element in your Endur set-up. Most people regard it as a trivial exercise to set up, to be used on an ad hoc basis, but we will show that a strategy and a rigorous set of implementation steps are required if you are to build a stable, scalable and supportable Operations Services implementation.
There is good Openlink documentation explaining the set-up of Ops Services scripts, how to write them, and even some “dos and don’ts” on script writing and setup. However, despite the apparent simplicity of this functionality, we have yet to see a really good, scalable, supportable, clearly defined and well understood implementation of Ops Services processing in any organisation.
Aside from trade entry, script processes can also be triggered, according to definition parameters, by transaction field changes, party data amendments, nomination events, power scheduling events, clearing the instrument number, credit/risk checks, portfolio creation, and logging on/logging off events. However, the majority of Ops Services definitions are generally centred around Service Type = Trading; even so, the following discussion applies equally to all of those other types of definitions.
2. Strategy
Depending on your organisation and your requirements, you should choose a strategic goal for your use of Ops Services, even if that decision is not to use it at all! You may want Operations Services to fulfil multiple purposes, and that is also perfectly acceptable, as long as you have a forced ranking of those priorities, so that when decisions arise there is a clear idea of which functionality should take precedence.
For example, in banks, hedge funds or organisations with a financial focus, it’s clear Ops Services should be used to assist in increasing STP rates. For organisations that are asset heavy or asset focussed, it may be that Ops Services is strategically employed to ensure data quality and integrity for scheduling and nominations; for organisations with a critical client-facing function, the focus may be on ensuring data integrity and quality for confirmations and invoicing systems. The reason for using Ops Services could easily be a mix of all these elements, but even so, once these are stated and acknowledged, it will be easier to prioritise them.
By agreeing priorities, and remaining flexible about which is most important at any one point in time, you make Ops Services a strategic tool within whose context sensible decisions can be taken about its use, and even about when it should not be used. Having this framework agreed upfront will help make for a successful implementation and easy maintenance.
In a “green field” implementation it’s possible to outline a strategy for use of Operations Services pre and post processing.
Examples of potential strategic goals for the use of Ops Services in Endur:
- to improve STP rates
- to guarantee data quality to systems and feeds downstream of Openlink in an organisation where particular data items and quality have been identified as critical to success/failure
- clear definition of when pre and post processing should be used (generally only as a last resort)
- definition of why it should be used – to what ultimate end pre- and post-processing is helping, e.g. greater data quality, ensuring pre-trade credit/market risk checks, and so on
- what kind of error trapping and support and maintenance will be set up and maintained – who will respond to trades generating errors
- change control – if issues occur in Ops Services processing they are generally very high visibility issues and can impact negatively on the reputation and user perception of the IT team and the application itself, even for errors that are highly trivial and/or highly trivial to fix. Therefore, give careful consideration to how difficult you choose to make altering e.g. definitions or Ops Services scripts, as immediate flexibility is required in this area.
- build rigorous business processes around ensuring that Ops Services definitions are maintained to the highest degree possible, i.e. when the OL data model is extended with portfolios, user defined instrument types, and so on, these are pro-actively REMOVED from definitions where it’s clear they are not relevant, since these data model extensions are included in all definitions by default.
- aim to use Ops Services ONLY where it’s not possible to guarantee data quality in any other way. Have a clear strategy that Ops Services are NOT there to correct for poor data entry habits. Analyse to root cause and look to fix that issue.
- aim to use Ops Services to increase STP rates – this may have to be done in conjunction with wider programmes to improve STP in your technology function or in the wider organisation. This may involve liaising with Middle Office on issues like template creation, agreement of economic and non-economic deal parameters, locking down, as far as possible, who creates templates, and a rigorous testing process to approve the use of templates and Ops Services to ensure only UNMODIFIED templates can be used to book transactions.
- avoid over specification of ops services error messages and processing in the first instance – try and keep it to a minimum until users understand more about the system and how they interact with it. Encourage users to seek root causes to data quality issues before agreeing to create ops services to simply check and fail.
- design Ops Services so that USERS bear the brunt of the investigation into the sources of data quality issues and any resulting modifications – this can help them to remove/rationalise Ops Services and ensures that maintenance falls with them.
- ops services can also be reserved for use in cases where Openlink bugs cause particular issues
- never use Ops Services to export data. If for some reason you have to, do that export offline, i.e. post messages and export using a separate process.
- if you have to create an Ops Services definition to perform a data entry check, e.g. check for/remove carriage returns from a field because a downstream system will lose formatting when that field is sent – try to build a sunset clause into the request, e.g. “we will remove this in six months’ time, because we expect you to fix your system to handle this type of data…”
- bear in mind that the first “culprit” to be blamed for any performance issues in trade entry will always be Ops Services, and generally, unless there are some gross coding inefficiencies in pre-processing scripts, the source of the issue will nearly always be found elsewhere. A lean list of pre-processing definitions is both easier to quickly investigate and prove not to be at fault, and more convincing even for sceptical users, thus allowing you to focus on finding the real cause.
The above examples of strategic aims and ideals can be applied to a current Ops Services implementation in order to test its fitness and target definitions for removal or adjustment.
In the case where you are coming to a pre-existing Ops Services set-up, even if it has only existed for a brief period of time, Section 3 outlines “Rules” that can be applied to all Ops Services definitions to decide whether they are to be kept, re-worked, or removed pending fuller solutions in the world upstream or downstream of the Endur implementation.
3. Rules for Ops Services Processing
Rules
Generally, the fashion these days when writing advice (or legislation!) is to provide “guidelines” that one is to intelligently interpret to suit one’s needs. While there is a place for “guidelines”, when it comes to Operations Services that place is completely secondary to the following “Rules”, which we have assembled on the basis of what we would modestly describe as vast experience in the field, having seen how the lack of application of clear rules has adversely impacted the implementation of a critically useful piece of Endur functionality in so many institutions.
Rule 1 – Intelligent Design
Use a central design authority to determine whether an Ops Services definition is appropriate for what you are trying to achieve. E.g. could proper template usage avoid the data entry errors, instead of enforcing correctness via Ops Services? This relies on system knowledge to ensure that there is really no other way to achieve the data check or data integrity for deals being booked. The Design Authority should have an overview of all Openlink functionality but also of upstream and downstream functionality, or at least access to expertise across the whole range of technologies that impact the particular case in question.
Rule 2 – Define the Definition
As a code design and code review standard, ins types should not be referenced in Ops Services scripts. Ins types, portfolios, entities etc. can all be configured and encoded at the definition level, so should not be referenced in the code. Perhaps the only exception might be where the data model differs between ins types and data has to be retrieved differently. Even then, you should look at grouping, e.g. all power toolset schedule processing together, which requires an overall design vision for how Ops Services work (Rule 1).
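To make the distinction concrete, here is a minimal sketch, in plain Java rather than real AVS/JVS, of a validation structured this way. The Deal record, toolset names and field names (delivery_point, grid_point) are purely illustrative assumptions: the point is that nothing in the code decides which instrument types the check runs for – that scoping lives entirely in the definition – and the only toolset-specific code is the tolerated exception where the data model differs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class LocationValidation {

    // Stand-in for a booked transaction as handed to a pre-processing script;
    // not the real Openlink API.
    record Deal(String toolset, Map<String, String> fields) {}

    // The only toolset-specific knowledge in the script: where the data model
    // differs, the field carrying the delivery location differs (names assumed).
    static String deliveryLocation(Deal deal) {
        return switch (deal.toolset()) {
            case "ComPhys" -> deal.fields().get("delivery_point");
            case "Power"   -> deal.fields().get("grid_point");
            default        -> null;
        };
    }

    // One error message per deal with a missing location; no ins type filtering
    // here - the Ops Services definition decides which deals reach this code.
    static List<String> validate(List<Deal> deals) {
        List<String> errors = new ArrayList<>();
        for (Deal deal : deals) {
            String location = deliveryLocation(deal);
            if (location == null || location.isBlank()) {
                errors.add("Deal on toolset " + deal.toolset()
                        + " has no delivery location. Correct it before booking.");
            }
        }
        return errors;
    }
}
```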
Rule 3 – Depart Gracefully
Never call an exit or terminate from a pre-processing script. Depart gracefully instead: return a failure status with a clear, comprehensive message (see Rule 8) and let that failure stop the booking.
Rule 4 – Get the Balance Right
Never encode too much processing in a single script. In theory, all the Ops Services you require could be done in one script running every time any deal is booked. Equally, in theory, you could have a definition for every ins type/entity/portfolio/status change, but either extreme defeats the purpose of the functionality provided. The bottom line is that you need a relatively flexible grouping of functionality into scripts/definitions, based on the types of transaction commonly traded and their relative volumes. By definition, if you are using Openlink then you probably trade a variety of commodities (if you are using OL to trade just electricity or just gas, or even worse just financial swaps, then you need to rethink that strategy). Think about how you need to group trades: e.g. processing of futures, irrespective of exchange or underlying type, will typically require the same type of validation, so should be grouped together. Processing of financial instruments, depending on market, may be served by splitting up into different definitions; physical power scheduling and data validation may best be served separately from gas because of different data model footprints, and so on.
How many operations services definitions is too many, or how many is too few?
- we would suggest that you should have MORE pre-processing definitions than post-processing definitions. If it’s the other way around, you really need to examine that seriously.
- often Ops Services is the first thing to be blamed for deal entry performance issues. You should consider building performance measures into all ops scripts – start and end times, durations – plus an easy way to measure total durations (see the sketch after this list). If you have that, you can easily rule it in or out as a cause. To be fair, in the absence of very egregious coding, we have yet to experience a situation where Ops Services definitions, either cumulatively or individually, were a significant source of performance issues for the booking of INDIVIDUAL trades by individual traders. More commonly, for large scale (>10,000 deal) migrations, amendments, or particularly cancellations for some reason, turning all pre- and post-processing off has helped for testing. In production, you can target specific ops definitions you may want to keep running during a bulk migration, amendment, tran regeneration and so on, and hence make some performance gains while still maintaining required data integrity.
- you should act to minimise the number of post-processing definitions that you have set up, and those you do have should be configured to run on dedicated run sites where possible. This removes the dependence on user sessions to complete, and allows complete error trapping where that’s required. Technically it is entirely possible through Openlink’s broadcast messaging service to ping error messages for post-process services to users, but we’ve found that in practice all such messages are completely ignored by users.
- when coding up pre- and post-processing scripts, always ensure to test for the case where multiple trades are entered in one go, as from blotters and desktops – the code to handle multiple deals is trivial (see the sketch below), and while it’s not always required, it should be the default standard for your ops code.
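The sketch below, again in plain Java with an assumed DealSource stand-in rather than the real AVS/JVS API, pulls those two habits together: it loops over every deal delivered in the event rather than assuming a single one, and it records a start/end duration so that the script’s contribution to trade entry time is a measurement rather than a debate.

```java
import java.util.ArrayList;
import java.util.List;

public class PreProcessSkeleton {

    // Stand-in for "the deals in this Ops Services event"; not the real Openlink API.
    interface DealSource {
        int numDeals();
        String dealReference(int index);   // e.g. deal tracking number or blotter row
        boolean passesChecks(int index);   // whatever validations this definition owns
    }

    // Returns the accumulated error messages; an empty list means booking may proceed.
    static List<String> run(String scriptName, DealSource deals) {
        long start = System.currentTimeMillis();
        List<String> errors = new ArrayList<>();

        // Never assume a single deal: blotters and desktops submit several per event.
        for (int i = 0; i < deals.numDeals(); i++) {
            if (!deals.passesChecks(i)) {
                errors.add(scriptName + ": deal " + deals.dealReference(i)
                        + " failed validation");
            }
        }

        long durationMs = System.currentTimeMillis() - start;
        // Write to wherever your ops scripts normally log; per-script durations make
        // it easy to rule Ops Services in or out of any performance investigation.
        System.out.println(scriptName + " checked " + deals.numDeals()
                + " deal(s) in " + durationMs + " ms");
        return errors;
    }
}
```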
Rule 5 – Order! Order!
Ops Services sequencing is not required, but you should always have a completely deliberate sequence in mind. This serves a number of purposes. First, it shows that there is a deliberate flow to the sequence of events when Ops Services kicks off, so that anyone extending or amending it can clearly understand the impact of any changes. Second, it implies that there is a deliberate and meaningful flow to the data checks. Even if it is irrelevant where a new or amended definition sits in the flow, having an understood sequence makes it easy to decide where to put a new definition, and easy to change the ordering of existing functions should that need arise.
Rule 6 – If it’s Offline, it’s Off the System
NEVER retain offline definitions in the Ops Services manager. We know of at least one instance where an investment bank had very many offline definitions in the Ops Services manager, simply through bad housekeeping. Over a weekend some changes were made which involved turning all Ops Services definitions off and then on again. Due to an error the definitions were NOT turned back on, and on the Monday it was impossible to tell which were valid Ops Services and which were the redundant ones. Normally a copy environment would have been available, but since the change happened at the weekend, and the copy environment was made late in the weekend, the copy had the same status. Many dev and test environments were available, but all were so old, or so modified, that none could be trusted. Back-up copies of production could be restored, but because of the size of the database this would take several hours. In the interim, turning on all definitions resulted in multiple deal entry failures because of out-of-date code, but leaving all definitions off meant potentially serious data quality or even financial impact downstream. The situation had to be resolved by examining history tables, which led to a significant delay, followed by verification once a production copy was eventually restored. The impact was that trades booked in the interim were subject to many data quality issues that users had to spend time correcting. In this high deal volume environment, the impact was quite large.
Rule 7 – Catch up – not Ketchup
Because post-processing is not guaranteed, many OL clients run a post-processing “catch up” task to ensure execution of any post-processing not completed for any given deal, as determined by logging of executions. You need to choose the frequency of execution for this process carefully. For example, we know of one instance where, because of Openlink version database issues, a post-processing task used to auto-book transactions following a trader’s booking of an original trade took several minutes to complete. The catch-up task ran so frequently that, while the original post-processing script was still running, the catch-up would run, see that a new trade had been booked, and examine which post-processing scripts had completed. As the post-processing for the original trade was still running, it had not completed, so the catch-up task would kick off a second run of the script. The result was that two trades would get booked instead of one. The issue was made very difficult to resolve because in all of the test databases where it was investigated, the intra-day catch-up process was NOT being run, so the issue was very hard to discover.
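A hedged sketch of the guard this story implies follows, in plain Java with assumed status values rather than real Endur log tables: the catch-up job should only re-trigger a post-process that never started, or whose in-flight claim is old enough that the original run has almost certainly died, never one that is merely still running.

```java
import java.time.Duration;
import java.time.Instant;

public class CatchUpGuard {

    // Assumed states recorded in the post-processing execution log.
    enum Status { NOT_STARTED, IN_PROGRESS, COMPLETED }

    record PostProcessRun(Status status, Instant claimedAt) {}

    // Re-trigger only if the run never started, or an IN_PROGRESS claim is older
    // than a generous stale threshold (i.e. the original run almost certainly died).
    static boolean shouldRetrigger(PostProcessRun run, Instant now, Duration staleAfter) {
        return switch (run.status()) {
            case NOT_STARTED -> true;
            case IN_PROGRESS -> run.claimedAt() != null
                    && Duration.between(run.claimedAt(), now).compareTo(staleAfter) > 0;
            case COMPLETED   -> false;
        };
    }
}
```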
Independently, this makes an interesting case for ensuring that there is a production copy environment in which all intraday processing is up and constantly polling and running, for the purpose of investigating production issues like this. This is possibly a very obvious point, yet many OL clients do not have this type of configuration set up.
Rule 8 – Error Capture (and Defeat)
Standardize error trapping in scripts. For any given script, capture all errors for all checks and present the full list of errors at the end of the script. There’s nothing worse than having to repeatedly book a deal to discover, one at a time, all four errors you happen to have: all errors should appear in one comprehensive message.
In terms of scripting, this can be achieved by building a single error string for pre-processing which, on any one or more Ops Services failures, is displayed to the user.
Note there is a distinction between failures which stop deals being booked and warnings which tell the user the impact of a decision. As with failures, warnings need to be amalgamated into a single message at the end of deal booking.
All errors need to include specific instructions as to how to fix the issue. E.g. “Party has no agreement. Contact operations support on xtn 1234 to get one set up.”
Olisten console logging needs to show which script is running and which script is generating errors. When users report errors with a screenshot, it’s immensely useful to see the script or definition name in the error or warning message.
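As an illustration of those points (plain Java, not the Openlink API), a small collector like the one sketched below can gather every failure and warning across all checks, tag them with the script/definition name so a screenshot identifies the source, and emit one comprehensive message with a fix instruction per entry.

```java
import java.util.ArrayList;
import java.util.List;

public class OpsMessageCollector {

    private final String scriptName;
    private final List<String> failures = new ArrayList<>();
    private final List<String> warnings = new ArrayList<>();

    public OpsMessageCollector(String scriptName) {
        this.scriptName = scriptName;
    }

    // A failure should always carry an instruction telling the user how to fix it.
    public void fail(String problem, String howToFix) {
        failures.add(problem + " - " + howToFix);
    }

    public void warn(String problem, String impact) {
        warnings.add(problem + " - " + impact);
    }

    public boolean hasFailures() {
        return !failures.isEmpty();
    }

    // One comprehensive message: the user never has to rebook repeatedly to
    // discover errors one at a time, and the script name is always visible.
    public String message() {
        StringBuilder sb = new StringBuilder("[").append(scriptName).append("]\n");
        if (!failures.isEmpty()) {
            sb.append("Deal cannot be booked:\n");
            failures.forEach(f -> sb.append("  - ").append(f).append('\n'));
        }
        if (!warnings.isEmpty()) {
            sb.append("Warnings (booking will proceed):\n");
            warnings.forEach(w -> sb.append("  - ").append(w).append('\n'));
        }
        return sb.toString();
    }
}
```

Usage in a pre-processing check would then read, for example, collector.fail("Party has no agreement", "Contact operations support on xtn 1234 to get one set up.").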
As an aside, a useful note here is to reiterate the point that debugging print statements and error trapping in AVS need to take place inline. That is, NEVER bury debugging print statements in the kind of deeply nested, overloaded function calls typically found in Java or .NET technologies. That style makes debugging AVS code tedious in the extreme, is not warranted by AVS or indeed JVS, and serves only to frustrate any attempt to debug when resolving issues.
Rule 9 – KISS
In general, in the first instance, users will tend to over specify ops services checks and tend to view them as panaceas for data quality issues. Neither of these is a good thing. So, as far as possible when creating/setting up Operations Services processing – Keep It Simple Stupid!
Rule 10 – Don’t Even Think About it!
Never use post-processing to directly book transactions in the system. There is a good argument that you should never use post-processing even to trigger the asynchronous booking of trades, but we have found that a difficult standard to enforce consistently, given typical user requirements.
That said, if it is a requirement, then the best thing to do is to ensure that no deal booking takes place in the post-processing script itself, and that the script is used merely to write to post-processing logs which are then polled by tasks/workflows running on dedicated run sites, which can a) guarantee execution with error trapping and b) ensure that deal entry and script engines are not encumbered locally on user machines or sessions. The main issue is to ensure that the required deal entry is not dependent on, for example, the user session remaining open long enough to complete, and that if bulk deal entry or amendment takes place there is a minimum of subsequent post-processing activity on local systems.
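The split described above might look something like the following sketch – plain Java with an in-memory queue standing in for a database log table, and the booking call left as a comment, since the real Openlink booking API is not shown here. The post-processing script only records an instruction; a scheduled workflow on a dedicated run site picks instructions up and does the real work with full error trapping.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class DeferredBooking {

    record BookingInstruction(long sourceDealNumber, String templateName) {}

    // Stand-in for the database log table the post-processing script writes to.
    static final Queue<BookingInstruction> BOOKING_LOG = new ConcurrentLinkedQueue<>();

    // Called from post-processing: record the request and return immediately, so
    // nothing depends on the user's session staying open.
    static void requestBooking(long sourceDealNumber, String templateName) {
        BOOKING_LOG.add(new BookingInstruction(sourceDealNumber, templateName));
    }

    // Run by a scheduled workflow on a dedicated run site, not in a user session.
    static void pollAndBook() {
        BookingInstruction instruction;
        while ((instruction = BOOKING_LOG.poll()) != null) {
            try {
                // A real implementation would book the deal via the Openlink API here
                // and mark the instruction as completed in the log table.
                System.out.println("Booking deal derived from " + instruction);
            } catch (RuntimeException e) {
                // Error trapping is guaranteed here, unlike in a user's session.
                System.err.println("Failed to book " + instruction + ": " + e.getMessage());
            }
        }
    }
}
```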
Rule 11 – What’s in a Name?
Define a clear naming convention for your code in general, but in particular for Ops Services. Use Openlink code categories to define all Ops Services, pre- and post-processing. Make good use of include and utility scripts when coding. You really need to have a proper design and architecture in mind so that new and existing code can be modified in a way that serves existing and possible future needs. One possible convention is sketched below.
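By way of illustration only (the prefixes and abbreviations here are our invention, not an Openlink standard), a convention might encode process stage, service type and functional area in each name:
- OPS_PRE_TRD_PWR_ScheduleValidation – pre-processing, Trading service, power scheduling checks
- OPS_PRE_TRD_FIN_MandatoryFields – pre-processing, Trading service, mandatory field checks for financial toolsets
- OPS_POST_TRD_ALL_ConfirmationFeed – post-processing, Trading service, all toolsets, confirmation feed logging
- OPS_UTIL_ErrorCollector – shared include/utility script
The name alone should tell a support analyst when a script runs, what it touches and roughly why, before they open the code or the definition.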
Rule 12 – Not Just for Christmas
Ops Services needs constant monitoring and maintenance. This is because, as the Openlink data model is extended with portfolios and new instrument types, these are added by default to all existing Ops Services definitions. This is an understandable feature of Endur and Ops Services in general. However, unless it happens to lead to a deal entry failure, it means that all definitions, no matter how carefully configured and restricted they were when first set up, are subject to a kind of scope creep whereby new portfolios, ins types and entities are added and simply not removed because of bad housekeeping practice, and end up kicking off ops definitions for which they are not at all relevant.
4. Guidelines
See Rules 🙂
5. Operations Services Maintenance
A regular check of definitions can often reveal desperate, lazy and poor housekeeping. E.g. in one case post-processing was used to generate invoicing XML data which was fed to a document management service outside Openlink. However, futures, which are never invoiced, were included in the definition. Hundreds of thousands of futures were booked to this system in any 2–3 month period and all were causing the invoice script to run. Not only that, they were causing failures of the invoice script, which was massively inflating the post-processing error log tables. At other times, definitions that were completely redundant or superseded by other non-Ops-Services processes were left to run and burn system cycles completely uselessly. Regular scrutiny to weed out such definitions and to update definition criteria is completely necessary for a proper, scalable Ops Services set-up.
We have yet to see anywhere where this has not occurred, in some cases to the point where it is impossible to tell by looking at the definition what trade set it’s really meant for; often, some detailed examination of the code is required. This is an easily avoidable headache, since adding new portfolios or instruments is typically BAU or project work in itself, subject to testing etc., so removing the new data from definitions where it is not required should be an easy thing to add to the process. Of course, that’s if you have a good understanding of all Ops Services in the first place. See also the Testing section below.
As well as being wasteful of system resource, this introduces the chance that deals are modified or checked in a way that is not appropriate. While one would expect such issues to be recognised, resource constraints in systems with very large data volumes mean in practice these issues often only get discovered at audit, at which point remediation costs can be high and remediation is highly prioritised, meaning impact on other scheduled priorities.
6. Testing of Operations Services
Philosophically speaking, the way to test the addition of a new Ops Services definition might be to have a test harness that books samples of all the trade types to which the definition applies, and to ensure they are captured or modified, or the user is requested to amend them, in the required way. But this only unit-tests the definition. How can we ensure that adding the definition doesn’t somehow include or modify an existing deal type being booked in some unexpected way?
For example, perhaps we use COMM-PHYS deals to model physical gas deals. A new implementation extends COMM-PHYS usage to emissions and physical oil. We introduce data checks for emissions as pre- and post-processing checks. Any COMM-PHYS deal will trigger our emissions checks, which perhaps require the user to, e.g., specify a registry – irrelevant for gas. If that occurs there will be significant impact on the gas desk. How can we ensure that never happens? Should we ensure we have a test harness for all gas deals as well, book a full representative trade set, and examine all before-and-after reporting for deals, or data from the data model? In this example the solution would be to introduce a check on index groups in the emissions definition code, but more complex examples might still beg the question of how one can be sure.
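A hedged sketch of that index group guard follows, in plain Java with an assumed indexGroup attribute rather than the real data model: the emissions code exits early for anything that is not an emissions deal, so sharing the COMM-PHYS ins type never imposes the registry check on the gas desk.

```java
import java.util.ArrayList;
import java.util.List;

public class EmissionsChecks {

    // Simplified stand-in for a COMM-PHYS deal; field names are assumptions.
    record CommPhysDeal(String indexGroup, String registry) {}

    static List<String> validate(List<CommPhysDeal> deals) {
        List<String> errors = new ArrayList<>();
        for (CommPhysDeal deal : deals) {
            // Guard: gas (or oil) deals booked on the same ins type are ignored.
            if (!"EMISSIONS".equalsIgnoreCase(deal.indexGroup())) {
                continue;
            }
            if (deal.registry() == null || deal.registry().isBlank()) {
                errors.add("Emissions deal has no registry specified. "
                        + "Select a registry before booking.");
            }
        }
        return errors;
    }
}
```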
Between the extremes of only unit testing the new or amended definition (which in our experience is really the only type of testing that is ever applied, particularly in poorly understood Ops Services implementations) and using test harnesses to book a before-and-after full representative test trade set and regression testing, e.g. data model extracts of all the trades before and after the change, we argue there is a sensible middle ground – whether you have a solid grasp of some or all of your Ops Services set-up, or whether it’s a full-on indecipherable mess that no one convincingly understands.
In the above instance you should target for testing, in addition to unit testing, any ops definition that includes the ins type used by the new or amended definition. There may be cases where, for example, a definition applies to all trades. We have seen this where, e.g., every trade is given an organisation-specific internal code, or every counterparty has an organisation-specific code received from another system in the organisation. Apart from noting that such data should and could be piped directly to OL through a separate feed, or be applied to trades at the point of entry via that other system or auto-booking – meaning we would suggest it should not be handled via Ops Services at all – these types of definitions should also be included in testing. As we imply here, if there are many of these types of definitions, you really need to look at fundamentally re-organising your Ops Services.
7. Production Implementation
Typically, wait until EOD or a green zone at weekends to turn off Ops Services to amend a definition or upload new code. Often, user sessions may have to be restarted to pick up code changes. To change a definition it must be offline, so you need to be sure no trades are booked that would normally kick off the definition; otherwise you run the risk of data quality issues for trades booked during the downtime.
In practice, the amount of time it takes to change an Ops Services definition is so small that it is very unlikely a trade with exactly the errors that the definition happens to check for will be booked during the downtime.
However, increasingly, large organisations treat even minor or irrelevant configuration changes in exactly the same way as systemic or deep code or infrastructure changes, because of misplaced regulatory concerns.
This can hamper an agile response to trivial user requests, as in this case. We recommend a risk-based test and implementation approach, backed up with evidence from the database: modifying a definition that impacts a trade type booked once per week should not be subject to the same testing or post-trading-hours constraints as one impacting a high volume flow trade type.
That is all very well, but change control processes must be followed, although we would argue for flexibility on configuration changes.
8. Operations Services and STP
In many organisations, STP (straight-through processing) rates are taken very seriously as KPIs, particularly in very high volume areas like equities and FX. In commodity trading in general this tends to be less of a consideration, given the relative immaturity of some aspects of the markets and the typical volumes of transactions, even in so-called flow businesses.
However, recently amongst some larger players in the commodities market this has become a significant consideration as cost cutting, regulatory pressures and internal rates of return at a business level become competitive yardsticks for the allocation of increasingly scarce and costly capital. These factors conspire to require a drive to greater efficiency.
In that context, ops services in Endur, as well as process workflows and other similar engines internal and external to the Endur product are linked together to reduce the amount of manual handling and intervention required for an end to end handling of a single trade.
The breadth and depth of an organisation’s trading activity will determine the value of investing in STP, and Ops Services can play a critical role in that, but only in the specific context of enabling clear, well defined business rules and processes around a well understood trade scope. That requires very clearly understood distinctions between economic information (that is, details that affect the MTM and value of the trade, such as price and volume) and non-economic data, the relative importance of that data (e.g. counterparty name and address and so on), and how that data links to and can affect OL trade-level economic data such as reset convention and payment periods.
Only with clear inputs from front, middle and back office can all of that data, across the whole scope of a representative trade set, be processed and amalgamated into business rules that can be meaningfully encoded in system templates and Ops Services.
In this context we can look to OL templates to capture and lock down system, economic and non-economic parameters, data and fields. A feature of this would be, for example, Ops Services ensuring that a) no deal can be booked without a template, b) only Middle Office can approve and set up a template, and c) no deal can be booked against a template that has been modified away from the form and details of an approved template. A combination of system security permissions and Ops Services could be linked to do this but note: you can never use system security or configuration purely on its own to prevent users from booking a particular type of transaction or creating new templates. However, this combination of Ops Services, template creation, and amendment checks against listings of approved fields, economic/non-economic parameters and so on, can be used to guarantee data quality for a well understood, very high volume trade type. In conjunction with other STP engines inside and outside OL, some fairly basic set-up and configuration can be applied to increase and improve on, or even just start, STP processing.
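Point c) above might be sketched as follows – plain Java, with the locked field names purely assumed for illustration: compare the locked-down (non-economic) fields of a booked deal against the approved template snapshot and fail the booking if any of them has been changed, while leaving economic fields (price, volume, dates) out of the comparison.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

public class TemplateLockDown {

    // Fields Middle Office has locked on the approved template (names assumed).
    static final Set<String> LOCKED_FIELDS =
            Set.of("reset_convention", "payment_period", "settlement_instruction");

    // One message per locked field the user has changed; an empty list means the
    // deal was booked against an unmodified, approved template.
    static List<String> checkAgainstTemplate(Map<String, String> dealFields,
                                             Map<String, String> approvedTemplateFields) {
        List<String> errors = new ArrayList<>();
        for (String field : LOCKED_FIELDS) {
            if (!Objects.equals(dealFields.get(field), approvedTemplateFields.get(field))) {
                errors.add("Field '" + field + "' differs from the approved template. "
                        + "Only unmodified, Middle Office approved templates may be used.");
            }
        }
        return errors;
    }
}
```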
9. The Good, the Bad, and the Ugly
We present some examples where Ops Services processing was badly abused to correct data entry issues, rather than address underlying issues:
- auto-booked trades have an incorrect holiday calendar. Users request automation of the manual changes to set the correct holiday calendar, so an Ops Services definition is set up for these trades to change the holiday calendar. Why it’s bad: this is more than likely a template issue or something to do with the auto-booking process, and that root cause is not being fixed.
- a reconciliation for multi-month exchange-cleared products downstream of OL reconciles against a feed from the exchange, at a monthly level. Traders manually book the exchange-cleared transactions in Openlink, but the exchange feed to downstream systems books the exchange representation, which is at a monthly level. However, traders are not willing to book, e.g., 6×1 monthly trades in Openlink, because they are aware that they can simply book one trade covering six months. This causes an issue in the downstream reconciliation, because it cannot reconcile six individual monthly trades against a single trade from Openlink. An Operations Services post-processing script was written that recognises when a multi-month exchange-cleared trade is booked, immediately cancels it and rebooks it as individual monthly trades. This “solution” persisted for several years until it was pointed out that the export from OL to the downstream reconciliation could be modified to handle the transactions, and the egregious Ops Services definition removed.
Good example:
- multiple legal agreements may be applicable to a given deal, but the system defaults to the first in the list. Often this is wrong but is not noticed at deal entry. On investigation it’s found that it is correct for multiple agreements to be available, so an Ops Services script is set up, with a definition, to warn the user, request confirmation that the default agreement is acceptable, and then proceed. This is added to an existing definition/script which performs similar validations for, e.g., tran info fields, settlement instructions and so on.
10. Conclusion
Openlink Endur offers some powerful, configurable functionality to ensure data quality before and after trade entry occurs. The ease of use of the functionality often causes clients to abuse it, invariably resulting in a morass of code, overlapping definitions, lack of scalability, and a headache for users, developers and support teams. By following and applying a small number of fairly simple rules, users can maximise the undoubted benefits of a well designed Operations Services configuration, script base and implementation.
Glossary
EOD = End of Day, typically means all processes that run in batch mode after final closing marks are generated by FO.
STP = straight through processing – generally taken to mean the process whereby trades are automatically booked and processed through to settlement/invoicing and confirmations, including all necessary life cycle events such as revaluation, fixing, exercising and so on, as required.
Ops Svs = Operations Services – the module in Endur’s Operations Manager (Services tab) that allows the configuration of definitions and scripted processes that are triggered when trades are booked. This also covers other processes, such as presentation layer field changes, counterparty information changes and amendments, and so on.
Other people's views
Hi Glen, thanks for a great article. I’d like to respectfully disagree with you on a few points.
Removal of offline Ops Services – I agree that completely redundant Ops Services should be removed. However, those used for infrequent activities such as portfolio compression or deal migrations could be left in a ‘Suspended’ state, stopping them from being put ‘Online’ by mistake.
The issue of relying on the users to stay logged on while post processes are running has been largely negated by the Post Process service, whereby Ops Service processing is immediately delegated to an app server.