Category Archives: LSCITS

Is it possible to validate LSCITS research?

For the past 5 years or so, I’ve been working on a UK programme of research and education into large-scale complex IT systems (LSCITS). This has involved partners in other universities and in industry. Overall, I think we’ve done a good job, with lots of interesting research results. Thanks to the flexibility of EPSRC funding, we’ve been able to be responsive to new developments that weren’t anticipated when we put the proposal together, such as social networking and cloud computing.

You can see a list of what we’ve produced at the LSCITS web site.

So, academically all is well. Lots of publications, students have received PhDs and staff have been promoted. We’ve run successful workshops and achieved our aim of creating an LSCITS community.

Yet, in spite of this, I am left with a feeling of unease. So far, very few of our results have had any impact on practice. This is not, in itself, a problem as it takes a while after a project finishes before the results can have an impact. But, if and when they are used, how will we know how good they are? I feel uneasy because, frankly, even with commitment and support from industrial users, I have no idea how we can assess the value of our work for improving real large-scale systems engineering practice.

Let us assume that some company or collaboration decides to take some of our ideas on board – let’s say those on socio-technical analysis.  They apply these on a project and eventually go on to create a system that the stakeholders are happy with. Does this mean our ideas have helped? Or, if the project is deemed to be a failure, does this mean that our ideas don’t work?

The problem with large-scale systems is just that – they are large-scale and their size means that there are lots of factors that can affect the success or otherwise of development projects. These factors are present in all projects but the influence of particular factors varies significantly – for example, real-time response is a key success factor in some systems but less important in others. Not only do we not know in advance which factors are likely to be significant, but we don’t really maintain enough information from previous projects even to hazard a guess.  We don’t understand how these factors relate to each other so we don’t know the consequences of changing one or more of them.

So, is it impossible to validate whether LSCITS research makes a difference? If so, what is the purpose of doing that research? My answer to the first question is that I think it is practically, if not theoretically, impossible; the second I’ll make the topic of another blog post.


Filed under LSCITS, research

The Fear Index – a novel about LSCITS

I read The Fear Index by Robert Harris on holiday last week. Harris states in an afterword that ‘I would like to write a new version of Nineteen Eighty-Four, based on the idea that it was the modern corporation, strengthened by computer technology, that had supplanted the state as the greatest threat to individual liberty’.

In a nutshell, the book is about algorithmic trading and a trading program created by a reclusive physicist that uses machine learning to predict the market and make trades on that basis. Its premise is that the market is affected by fear – as indicated by the use of certain words in the news, websites etc. as well as future trading indexes and that this information is a predictor of future stock prices.  So far, so good. Then it gets silly – in Harris’s scenario the machine learning creates a ubiquitous ‘super intelligent machine’ that builds its own data centers to ensure its survivability, tries to kill its creator (for reasons that are never clear and using a stupidly obscure approach) and manipulates not just the market but world events that will change the market.  The novel ends with the Flash Crash which is supposedly created by this machine to hide its actions.

I like Harris’s novels but, like Woody Allen films, the earlier ones were the best. Fatherland and Enigma were, I thought, excellent and his novels of classical Rome were pretty good. I wasn’t impressed by The Ghost – reflecting Harris’s dislike of Tony Blair – and this one was really pretty grim.

I think it’s great that popular novelists write about technology and no-one expects them to do anything but simplify and exaggerate for effect.  This could have been an excellent book about the dangers of algorithmic trading and complex systems – we are creating systems whose operation we don’t understand. But Harris’s ignorance of the technology means that he has written a book that is anti-technology and which grossly exaggerates the dangers.  He is absolutely right about the risks of algorithmic trading but exaggerating these means that his message will simply not get through.

Harris is an excellent writer but he should stick to history – this is a bad book.


Filed under Book review, LSCITS

Abstraction and complexity

I gave a talk recently about complex systems engineering at Stirling University where I discussed my notion that software engineering is essentially reductionist and that we need to rethink software engineering approaches to cope with the complex systems that we are now building. I was challenged by a questioner who claimed that abstraction was an effective way to deal with complexity and I’m afraid that I dismissed this rather glibly, without any real rationale for why it was inappropriate.

Having now thought about this, I think I can present a better rationale for why abstraction is ineffective for complexity management. In a nutshell, complexity arises because of the interactions between the elements of a system (see my blog post on complexity). Systems are inherently complex when these interactions are dynamic and change their nature over time and in response to environmental stimuli. Complicated systems are ones where there are many elements, perhaps of different types, and where elements may have many distinct characteristics, but where the relationships between these elements are static. For example, a topographic map is complicated but it is not complex.

Abstraction, however, is a mechanism for dealing with diversity in the system elements where abstractions represent the essential (for that system) characteristics of a collection of elements. Therefore, if we are building a transport model, we may have an abstraction ‘car’ which has characteristics of size and speed – we don’t care about marque, colour, etc. This is an absolutely essential mechanism for understanding and reasoning about systems and for helping us create software – but it helps us deal with complicated not complex systems.
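As a concrete sketch of this point (a hypothetical fragment, not from any real transport model – the names and numbers are mine), the ‘car’ abstraction might look like this:

```python
from dataclasses import dataclass

# Hypothetical transport-model abstraction: 'Car' keeps only the
# characteristics this model cares about (size and speed) and
# discards marque, colour and everything else.
@dataclass
class Car:
    length_m: float
    speed_kmh: float

def fits_on_road(road_length_m: float, cars: list) -> bool:
    # A static relationship between elements: it never changes its
    # nature over time or in response to the environment.
    return sum(c.length_m for c in cars) <= road_length_m

print(fits_on_road(100.0, [Car(4.5, 50.0), Car(4.0, 60.0)]))  # → True
```

The abstraction tames diversity – many different vehicles map onto one type – but the relationships it encodes are fixed at design time, which is exactly why it helps with complicated rather than complex systems.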


Filed under LSCITS

What is complexity?

I’m part of the LSCITS project where LSCITS stands for ‘Large Scale Complex IT Systems’ and we have been having discussions about what is meant by ‘complexity’. Some argue that the term complexity should be reserved for complex adaptive systems, systems which are dominated by emergent behaviour. Others argue that ‘conventionally’ engineered systems can also be complex in that their non-functional characteristics and (sometimes) their functional behaviour cannot be predicted. This is particularly likely when we create systems by integrating different parts (often other systems) which are independently developed and managed. In such cases, it is practically impossible to predict how the characteristics of one system will interfere with the characteristics of others.

We suggested that ‘complex’ and ‘complicated’ were not the same. A complicated system is one that is understandable in principle, although the effort involved may be so great that it is not understandable in practice. This was my own view at one time, but I’ve now changed my mind and think that there is no practical difference between a complex and a very complicated system. This position has emerged from musings on the roots of complexity.

Some systems are inherently complex – we cannot deduce their properties by studying their components and we cannot predict the consequences of changes to these systems. System behaviour and properties are emergent and non-deterministic. I believe that such inherent complexity stems from the fact that there are dynamic, dependent relationships between the parts of the system. These relationships evolve in time and according to stimuli from the system’s environment. New relationships may be created and existing relationships may change. As a consequence, deterministic modelling techniques cannot be used to make predictions about such systems, although statistical approaches may be used in some cases.

As soon as you consider the people who use a system to be part of the system, you have dynamic, dependent relationships between components in the system so I argue that all large, socio-technical systems can be considered to be complex systems.

There is also another aspect to complexity – what might be called epistemic complexity. This relates to the predictability of system properties when changes are proposed. If you don’t have enough knowledge about a system’s components and their relationships, you cannot make predictions about it, even if, in principle, that system does not have dynamic dependent relationships between its components. Therefore, I argue that large complicated systems are also complex systems when it is practically impossible to acquire the necessary knowledge to understand the system.

This means, of course, that complexity is not just a property of the system but also of the system observer. We have all encountered system experts who know about some system and can make changes to it in a reliable way. Their knowledge is hard to articulate and when they are no longer available, someone taking over the system may find it impossible to develop the same level of understanding. Therefore, what was a complicated system has become a complex system.

Where is all this leading – who cares?

Well, I think it is important to emphasise that complexity isn’t simple. Striving for a simple, universal definition of complexity isn’t really going to get us anywhere.

If we are to try and manage complexity we need a toolbox of theories and methods to do so. To give an example, if you think about dynamic dependencies between components, formal methods of computer science don’t really help. However, if you think about epistemic complexity, they may be very useful indeed as they allow us to state ‘truths’ about a system – filling in our knowledge about that system.

The notion of dynamic, dependent relationships may also be useful in helping us manage complexity. By developing a better understanding of such relationships (e.g. through socio-technical analysis of organisations), we may be able to change the type of these relationships from dynamic to static and hence reduce the complexity of the system.

As I said, complexity isn’t simple so there’s lots of scope for disagreement here.


Filed under LSCITS

What is failure?

The terms fault and failure are sometimes used loosely to mean the same thing but they are actually quite different. A fault is something inherent in the software – a failure is something that happens in the real world. Faults do not necessarily lead to failures and failures often occur in software that is not ‘faulty’.

The reason for this is that whether some behaviour is a failure or not depends on the judgement of the observer and their expectations of the software. For example, I recently tried to buy 2 day passes on the Lisbon metro for myself and my wife. The metro uses reusable cards, so you buy 2 cards and then credit them with the appropriate pass. The dialogue with the machine went as follows:

How many cards (0.5€ each): 2
How many passes (3.7€ each): 2
Total to pay: 15.8€

To put it mildly, I was surprised. I tried twice and the same thing happened. I then bought the passes one at a time and all was fine – I paid the correct fee of 8.4€.

From my perspective, this was a software failure. It meant that I had to spend longer than I should have buying these passes. On the train, I tried to think about what might have happened. My guess is that it is possible to buy more than 1 day pass at a time and have it credited to the card. So, the 2nd question should have been:

How many passes on each card?

From a testing perspective, the software was probably fine and free of defects and, if you understood the system, then you would have entered 1 pass per card.
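The two readings of that question can be made precise in a few lines of Python – a hypothetical reconstruction of the machine’s pricing logic, using only the prices shown in the dialogue above:

```python
CARD_PRICE = 0.5  # € per reusable card
PASS_PRICE = 3.7  # € per day pass

def price_user_expected(cards, passes_total):
    # The buyer's reading: 'how many passes?' means passes in total.
    return cards * CARD_PRICE + passes_total * PASS_PRICE

def price_machine_computed(cards, passes_per_card):
    # The machine's reading: 'how many passes?' means passes per card,
    # so 2 cards x 2 passes = 4 passes are charged.
    return cards * CARD_PRICE + cards * passes_per_card * PASS_PRICE

print(round(price_user_expected(2, 2), 2))     # → 8.4
print(round(price_machine_computed(2, 2), 2))  # → 15.8
```

Both functions are defect-free on their own terms; the failure lies in the mismatch between the question the machine asked and the question the user thought was being asked.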

So, failures are not some absolute thing that can be tested for. They will always happen because different people will have different expectations of systems. That’s the theme of my keynote talk at the SEPG Europe 2010 conference in Porto. We need to design software to help people understand what it’s doing and to help them recover from failures.


Filed under dependability, LSCITS

Every cut has a silver lining

The UK Government has announced cuts of up to 25% which will be imposed on universities over the next 4 years.

I’ve been reading a recent document from IBM called Capitalising on Complexity, which emphasises the importance of innovation and creativity and this has triggered a reflection on the contribution that the computer science research community in universities can make to this. Sadly, the conclusion I’ve come to is “if we can do anything it is in spite of rather than because of existing research structures and management”.

The problems that we suffer from are primarily imposed by the need for research quality to be ‘measured’ – either at the individual level (career progression, tenure, etc.) or at the institutional level. We are all encouraged to publish regularly in ‘high-quality journals or conferences’ and to write research proposals for external research support. More and more people are now competing for very limited funding.

The end-result of this is conservatism and incrementalism. It is dangerous to your career to go into a new area or to think differently as there are no ‘high quality’ journals and conferences to publish in. If you make proposals where you suggest interesting questions to explore with no clear idea of the results you will achieve (what I think of as real research), you have zero chance of funding because your proposal will inevitably have lots of holes in it that reviewers can challenge.

Research funding bodies, to their credit, are aware of this problem and sometimes support special initiatives (like the LSCITS project) to try and be more innovative. By and large, however, these rarely work as the pressures for incrementalism that are imposed by the current university system are just too great. Researchers have to think of their future – if they take 3 or 5 years out to ‘think differently’, then they will probably never get another research job.

All of this means that CS research in universities is not the lever for innovation that it should be, it does not encourage creativity, nor is it addressing the grand societal challenges that we face.

Paradoxically, perhaps, the inevitable cuts in university and research funding may offer us a way out of this situation. If there are no research jobs, then the notion of a research career is less important and smart people don’t have to be so concerned about publications. Cuts in travel budgets mean that less time is spent travelling to conferences to present papers to people who are mostly reading their email anyway. The hateful research assessment may disappear and we can start thinking long term rather than writing about another incremental advance. Maybe some of us oldsters will be kicked into early retirement before senility sets in and we will have time to think differently.

But we must try to maintain support for our PhD students. PhDs themselves are mostly incremental – students have to write and defend a thesis, and innovation is inherently risky. But PhD students have time to think, to be innovative and to come up with new and exciting ideas for the future. With fewer research jobs, they may focus on startups, which seem to me to be the true source of innovation nowadays.


Filed under LSCITS

Designing for failure

Software is now so complex that, for sure, it’s sometimes going to go wrong. Most software systems don’t accept this and make it harder than it should be to detect and fix problems. Take my experience today with Mac Mail. This is an email client that picks up mail from one or more servers. Mine is configured to access 4 separate accounts. This morning, it connected to 3 of them with no problem but when trying to connect to the 4th (the main account) it simply hung and the connecting icon kept spinning.

This being an Exchange server I immediately dismissed it as a server problem and got on with something else. I tried again and was surprised it didn’t work because our sys admins are usually quite quick at rebooting the Exchange server (lots of practice!). So, I connected with another machine and that was absolutely fine. I asked for suggestions – none were forthcoming – so I tried periodically throughout the morning with no success. Then I noticed that my incremental backup was trying to back up 40GB – and I certainly hadn’t created anything like that since last night.

Then problem number 2. Time Machine on the Mac doesn’t tell you what it has backed up or is trying to back up. Into Google, found a utility that does this, installed it and discovered that the culprit was a file in the Mail library called Recovered Mail, which was huge. Google again, and discovered that my problem was known and that the Recovered Mail file had to be deleted along with the offline cache. Into Terminal, deleted these and all was well.

What I found annoying was that it would have been so easy to design these systems differently so that the problems could have been diagnosed and fixed in 2 minutes instead of several hours. If Mail had kept a log that could be examined, then I could easily have found out what it was trying to do. And, if it published which files it used and where they were installed, that would also have been helpful. Even better, if Mail had asked me, when it tried to create a very large file, whether I really wanted to do this, I would have picked up the problem immediately. If Time Machine had an elementary interface that showed what it was backing up, that would also have helped.

So, if you are designing software, think about what happens when it goes wrong. Don’t assume your users are stupid – provide ways to make the state of the system visible. When you create exceptionally large files, ask the user to confirm and make sure that users can delete them. Don’t use ‘invisible’ files. And use timeouts – if something takes 50 times longer than normal, it really isn’t right.
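The timeout point can be sketched in a few lines of Python. This is a hypothetical fragment – the account names, baseline time and factor are mine, and nothing here is the real Mail client – but it shows a fetch that logs what it is doing and gives up visibly rather than spinning forever:

```python
import logging
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("mailclient")

TYPICAL_FETCH_SECS = 0.01  # assumed baseline for a normal fetch

def fetch_with_timeout(fetch, account, factor=50):
    """Run a fetch, but if it takes 'factor' times longer than normal,
    log the failure and give up instead of hanging."""
    deadline = TYPICAL_FETCH_SECS * factor
    log.info("connecting to %s (giving up after %.2fs)", account, deadline)
    with ThreadPoolExecutor(max_workers=1) as pool:
        try:
            return pool.submit(fetch, account).result(timeout=deadline)
        except TimeoutError:
            log.error("%s: no response after %.2fs", account, deadline)
            return None

# A fetch that hangs, like the fourth account in the story above.
def hung_fetch(account):
    time.sleep(1.0)

print(fetch_with_timeout(lambda account: "3 messages", "imap-1"))
print(fetch_with_timeout(hung_fetch, "exchange-main"))  # → None
```

Note that the `with` block still waits for the stuck worker thread to finish before exiting; a real client would also need to cancel the underlying network connection. But the user sees a logged error in seconds rather than a spinning icon all morning.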


Filed under LSCITS, Uncategorized