Working In Uncertainty
Better management of large scale financial and business processes using predictive statistics
by Matthew Leitch, first published 4 December 2006.
Science is a way to learn more from experience. Intuition from experience feels so convincing, but if you've ever logged facts and studied the patterns in your data you probably discovered things you never noticed before, even though they were right under your nose for weeks, months, or even years.
This article is about using data in scientific ways to get better, more economical control of large scale business and financial processes. We will look at ways to find out what drives errors and how much, to detect misleading performance figures, and to streamline monitoring of performance. It should be of interest to anyone involved with operational risk, management of processes and systems, or audit.
The techniques described below will work in any size of organisation, but they do require a reasonable quantity of data. Up to a point, the more the better. Also, since some of them require skill at getting data out of systems and at using analytical software, and these skills are hard to come by, the usual context is large scale business and financial processes. For example, billing in a utility company or trading in a large investment bank.
People managing processes of this scale typically have regular meetings to discuss performance, issues, and actions. Usually they will look at regular reports of performance indicators such as the number of transactions processed, the speed of processing, the number of staff working, and some error rates.
In banks these numbers are often called ‘Key Risk Indicators.’
At these meetings people will often discuss particular problems that have occurred and try to decide what the causes and fixes are.
Scope for improvement
All of these activities can be dramatically improved by using more scientific methods because there are hidden weaknesses in what is being done.
Easier than it seems
It would be exaggerating to describe the scientific methods involved as ‘easy’ but things are a lot easier now than they would have been 10 years ago, and there are efficient ways to use them that greatly reduce the work and risks involved. Most likely, it would be easier in your organisation than you imagine.
Data is easier to come by than ever. Operational systems tend to be based on a few well established database management systems with tools that make it easy to generate reports and define your own queries to view or download just about any data you like in a matter of minutes. Computer systems hold more and more details and every action is logged somewhere, usually timed to the millisecond.
These days computer hardware is so powerful that massive quantities of data can easily be loaded onto a memory stick and popped in your pocket, and intensive calculations and graphics can be handled on the average manager's laptop.
The analysis tools have never been more powerful or easier to use. The humble spreadsheet program is not humble. It can deal with tens of thousands of records and perform multivariate regressions as well as generating a wide range of graphs.
Beyond that there are more advanced tools such as MATLAB, S-PLUS and R, and the software from SAS Institute, to name but a few, that offer a vast range of tools for discovering patterns in experience and presenting them clearly.
The big idea
How can these tools be combined with that data to get some useful insights? The most cost effective way to start is to look for correlations in data you already have. Science prefers experiments but in working organisations it is hard to set up experiments so most of the evidence must come from studying what happens, without intervention.
Correlations do not prove causality, but they highlight the few factors that could be causes and eliminate many others that cannot be causes. Talking to people and occasionally even performing experiments can help establish causality if that is necessary.
For example, you might be wondering why people make mistakes when they enter data into a particular system. Some people make more mistakes than others. Are they careless? Do they need training? Is the system hard to use? Do people have too much work to do? Do they make more errors when they work on into the evening? Is there something about the particular data they enter that makes errors more likely? Do changes to the entry screen increase or decrease errors?
To examine these possibilities we need data on who has been trained, and when, what changes have been made to software, and when, what time each piece of data was entered (an easy one), and of course we need to know about what errors were made.
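As an illustration, the kind of correlation check involved takes only a few lines. This sketch uses Python with invented records and field names (entry hour, whether the operator was trained, whether an error was found); a real analysis would run over the downloaded data:

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical extract: one row per data-entry transaction, as
# (entry_hour, operator_trained, error_found). Invented data.
records = [
    (9, 1, 0), (10, 1, 0), (11, 0, 1), (14, 1, 0),
    (16, 0, 1), (17, 0, 1), (18, 1, 1), (19, 0, 1),
    (9, 1, 0), (13, 1, 0), (18, 0, 1), (20, 0, 1),
]

hours   = [r[0] for r in records]
trained = [r[1] for r in records]
errors  = [float(r[2]) for r in records]

# How strongly does each candidate driver move with the error flag?
print("hour vs error:    %+.2f" % pearson(hours, errors))
print("trained vs error: %+.2f" % pearson(trained, errors))
```

With these invented numbers, errors correlate positively with late entry and negatively with training; on real data the same two lines per candidate driver quickly sort the plausible causes from the rest.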
Knowing what errors were made is the most difficult part, because not all errors are discovered. There are three sources. Firstly, we can look at amendments made after initial data entry, including adjustments to accounting records, bills, and so on. Secondly, we can use computer queries and comparisons to find at least some of the errors that have so far escaped detection; some types of error can be found exhaustively with the right analysis. Thirdly, we can look for a log in which people have noted down the errors they discovered.
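The first of these sources can be queried very simply. This sketch treats any amendment made well after initial entry as a likely corrected error; the extract layout and the one-hour threshold are assumptions to be tuned against reality:

```python
from datetime import datetime, timedelta

# Hypothetical extract: (record_id, entered_at, last_amended_at).
rows = [
    ("A1", "2006-11-01 09:00", "2006-11-01 09:00"),
    ("A2", "2006-11-01 09:05", "2006-11-03 14:20"),  # amended two days later
    ("A3", "2006-11-01 09:10", "2006-11-01 09:10"),
]

fmt = "%Y-%m-%d %H:%M"
# Amendments made more than an hour after entry are treated as
# probable error corrections (an assumed, tunable threshold).
candidates = [rid for rid, entered, amended in rows
              if datetime.strptime(amended, fmt)
                 - datetime.strptime(entered, fmt) > timedelta(hours=1)]
print(candidates)
```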
What approach people think of first seems to depend in part on the industry. In banking, for some reason, it is common practice to keep a database of operational risk incidents that is populated by people making reports of unexpected things that have happened. Staff are strongly encouraged to report everything, consistently and promptly.
In contrast, in telecommunications companies it is common practice to rely heavily on computer tools to check and compare as much data as possible. For example, if processing a customer order for some new circuits involves five different computer applications (which it often does) then comparable data from all five will be extracted and compared using additional systems, with all discrepancies being checked and corrected. This kind of total, detailed reconciliation is practical and saves money through the errors it removes.
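The core of such a reconciliation is a comparison of keyed extracts from each system. A minimal Python sketch, with system names, order identifiers, and values invented for illustration:

```python
# Hypothetical extracts of the same orders from two of the systems
# involved, keyed by order ID with the value each system holds.
billing  = {"ORD-1001": 120.00, "ORD-1002": 75.50, "ORD-1003": 210.00}
ordering = {"ORD-1001": 120.00, "ORD-1002": 80.00, "ORD-1004": 55.00}

all_ids = sorted(set(billing) | set(ordering))
discrepancies = []
for oid in all_ids:
    a, b = billing.get(oid), ordering.get(oid)
    if a != b:
        # None on either side means the record is missing entirely.
        discrepancies.append((oid, a, b))

for oid, a, b in discrepancies:
    print(oid, a, b)
```

With five systems rather than two the same pattern applies pairwise or against one system chosen as the reference, and every discrepancy becomes a work item to check and correct.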
Automated error discovery is efficient, consistent, and perfect for statistical analysis. On the other hand, it is limited to only some kinds of error or delay. Manual reporting of ‘incidents’ is flexible and potentially more wide-ranging, but inconsistent and unreliable. People report incidents only when they have time. They do not want to record errors that make them look incompetent. They do not want to note errors that are of a common type or that have a low value.
What you should expect to gain
In an ideal world, valuable insights about how problems are caused would come flowing from your analysis straight away. In our real world, be prepared for some less pleasing but no less important discoveries.
Analysis often reveals problems with the data – perhaps data the company has been relying on for some time. Here are some real examples taken from a number of different organisations:
Some of these problems will be spotted by just using a critical eye and perhaps some well designed graphs. Others will come to light because an expected predictive relationship is not found. Looking for correlations between possible causes and their effects is a form of what auditors call ‘analytical review’ and is a very efficient way to find errors in data. If things aren't what you expect, question the data first.
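One simple form of this analytical review is to flag any figure that breaks an expected relationship. This sketch compares weekly error rates to the typical rate; the numbers are invented and the doubling threshold is an assumption, not a rule:

```python
from statistics import median

# Weekly transaction volumes and reported error counts;
# the week-5 figure is suspect (invented, illustrative data).
volumes = [100, 120, 90, 110, 130, 105, 115]
errors  = [ 10,  12,  9,  11,  40,  10,  12]

rates = [e / v for e, v in zip(errors, volumes)]
typical = median(rates)  # median resists distortion by the odd value

# Flag any week whose error rate is more than double the typical rate --
# and question the data before questioning the people.
suspect_weeks = [i + 1 for i, r in enumerate(rates) if r > 2 * typical]
print("weeks to query:", suspect_weeks)
```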
Gradually, problems with data and analysis techniques get solved and, link by link, the causal relationships that result in errors, rework, and lost money will begin to emerge and be quantified. Each new discovery can and should be put to use immediately.
Four keys to an efficient approach
There are four keys to an efficient approach:
Value from the discoveries
The immediate value of gaining understanding this way is likely to be found in the misleading performance figures it exposes, and the myths about what drives errors that it dispels.
Beyond that, quantified understanding of relationships between drivers of risk and the mistakes and losses that result enables you to manage better. For example:
Changes to management policies. Understanding what really causes problems helps managers rethink their habitual decision making and invent new systems and policies. For example, suppose you discover that claim processing errors are more common at the end of the week as people scramble to complete their work by the weekend. The data show this gives rise to extra work the next week, perpetuating the cycle. How would you manage differently with that knowledge?
Interpreting performance. Understanding how results are driven by the conditions people face helps in understanding when people have made an improvement and when it is just conditions getting easier. For example, if invoicing performance seems better this week it could be because people read your email reminding them to be careful, or it could be because there were fewer difficult invoices to raise this week. Conversely, steady or even declining performance may hide a tremendous improvement in working practices if conditions have also got tougher.
More accurate Statistical Process Control (SPC). Many large companies are interested in using SPC methods but have processes whose conditions and outputs are affected by ongoing but uncertain trends. SPC usually assumes relatively stable conditions, so these trends are a big problem. However, if you have quantified how changing conditions predict results, this information can be used to separate the variation you understand from the variation you do not yet understand. The control charts need to show the difference between predicted and actual results, rather than the actual results alone. Done this way, SPC is applicable to these processes.
Process measures as evidence of effective control. The steep costs of complying with section 404 of the Sarbanes-Oxley Act have made more people aware of the cost of demonstrating that controls are effective. One promising way to cut these costs is to use metrics of process health, such as detected error rates and backlogs, as evidence. However, the health metrics need to be reliable. If they were exposed as flawed by external audit work this would be a serious problem. The approach we have described is an efficient way to search for and correct flawed process health metrics.
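The residual-based control chart described in the SPC example above can be sketched in a few lines. Here the quantified relationship is assumed to be roughly 2 errors per 100 transactions, and the data are invented; limits are set on the residuals over a baseline period, then later points are monitored:

```python
from statistics import mean, stdev

# Invented data: rising volumes would breach a chart of raw error
# counts, but the residuals isolate the genuinely unusual point.
volumes = [200, 220, 250, 260, 300, 320, 350, 360]
errors  = [  4,   5,   5,   5,   6,  13,   7,   7]

# Residual = actual errors minus the prediction from the (assumed)
# quantified relationship of 2 errors per 100 transactions.
residuals = [e - 0.02 * v for e, v in zip(errors, volumes)]

# Set control limits from a baseline period, then monitor later points.
baseline = residuals[:5]
centre, sigma = mean(baseline), stdev(baseline)
signals = [i + 1 for i, r in enumerate(residuals[5:], start=5)
           if abs(r - centre) > 3 * sigma]
print("points to investigate:", signals)
```

Charted this way, period 6 stands out even though its raw error count would be unremarkable once volumes had grown; the chart flags unexplained variation rather than the trend already understood.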
French mathematician Henri Poincaré wrote that ‘Science is facts; just as houses are made of stone, so is science made of facts; but a pile of stones is not a house, and a collection of facts is not necessarily science.’
Managing business and financial processes illustrates this perfectly. Every day the team of people running such a process gains a vast number of facts, but really understanding and managing the process requires an organised and efficient approach. Predictive statistics is one of the key tools required to succeed.
2 December 2006: First version.
14 June 2019: Added a fourth key to efficiency: having a strong flow of ideas.
Words © 2006 Matthew Leitch. First published 4 December 2006.