
Monday, July 28, 2008

Peak Hour Scenario

In my day-to-day experience as a consultant I meet a lot of IT managers who think of performance testing as stressing the system until it no longer responds to the load.

This is an interesting and useful approach, of course, but it is not realistic and most of the time it is meaningless. We have to face the reality of IT projects: they are always late, and performance testing is always planned in the very last stage, most of the time a few days before going live. This means you have a "challenging" timeframe and have to make the best of your time.

I suggest approaching performance testing by starting with the peak hour test. This particular stress scenario is designed to emulate the worst load situation that a web site or application will face in its production life. Testing in this condition means you are not looking for the "failure" point; you are verifying that your infrastructure can handle the forecasted load.
  
In order to do this, the application's "owner" should be able to say what the worst realistic situation for the application is. I say "should be able" because, surprisingly, very often nobody has a clear answer to this question. It's as if an airport designer did not know the winds in the airport's area, or the designer of a dam did not know the water level in the reservoir.

Assuming that the peak hour load can be described in terms of concurrent users and frequency of operations, my suggestion is to add a little more (10%) to the load and use this as the load scenario!

This test is called the peak hour test and, in my opinion, it is the best test to run if you have a limited budget and time constraints. It tells you whether your application can "work", and how it responds, in the worst expected operating condition. By reaching the failure point you only learn the total load the system can bear; you do not get a clear picture of the user experience. Furthermore, the application's tuning has to be done in the operating range, not in the failure zone.
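As a minimal sketch, the forecast-plus-10% idea can be turned into concrete test targets. The function name, the input figures, and the 3600-second conversion window are my own illustrative assumptions; only the 10% headroom comes from the post.

```python
# Turn a forecasted peak hour into load-test targets,
# adding the 10% headroom suggested above.

def peak_hour_scenario(concurrent_users, ops_per_user_per_hour, headroom=0.10):
    """Return the load targets for a peak hour test."""
    users = round(concurrent_users * (1 + headroom))
    total_ops = users * ops_per_user_per_hour
    ops_per_second = total_ops / 3600  # spread the hour's operations evenly
    return {"virtual_users": users,
            "target_throughput_ops_s": round(ops_per_second, 2)}

# Hypothetical forecast: 2,000 concurrent users,
# each performing 30 operations per hour.
print(peak_hour_scenario(2000, 30))
# -> {'virtual_users': 2200, 'target_throughput_ops_s': 18.33}
```

These two numbers (virtual users and target throughput) are exactly what most load-testing tools ask for when you configure a scenario.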
 


 

Wednesday, July 23, 2008

Quality Disasters



In 1940 the Tacoma Narrows Bridge collapsed because of poor design. At that time bridge design did not include aerodynamic studies. After this failure, more accurate mathematical verification was incorporated into bridge design.




In 1997 the Mercedes A-Class became famous worldwide because it failed the moose test. Mercedes had to fix the "bug" by adding ESP, which made the A-Class one of the safest cars in its category and forced Mercedes to recall all the units sold.

These are two very famous examples of poor design. Both turned into lost money, and it is easy to understand why.

A few years before the Mercedes A-Class problem, in 1993, one of the worst IT disasters ever began: the FoxMeyer Delta III project. I am not going to analyze the project or the causes of the disaster here. What I want to underline is that a failure in IT, even in a non-software company like FoxMeyer, can lead to serious consequences: FoxMeyer went bankrupt. Today if you google FoxMeyer you find only lawsuit-related articles and the "tale" of the disaster.

A failure in an IT project can turn into a catastrophe for the entire company. IT applications today are the business. To put it the other way round: the business today is strongly "supported" by IT applications. This means that extra quality effort is justified even from the business point of view.

It is generally agreed that analysis should account for 30% of the budget, coding and development for 40%, and quality and deployment for the remaining 30%. I always challenge IT managers to split the budget: 70% to a system integrator in charge of analyzing, designing and developing the application, and 30% to a system integrator in charge of quality, and by quality I mean all the activities that aim to assure that the application complies with the business requirements.

The problem is that most IT managers prefer to choose a single system integrator to build their application. This is probably a cost-saving choice, but it can seriously harm quality.

Monday, July 21, 2008

What Is Total Quality Control?

This weekend I was reading a nice book, "What Is Total Quality Control?" written by Kaoru Ishikawa. What captured my attention is the idea that quality control based on inspection is not effective. He states clearly that "the fundamental concept of Quality Control is the prevention of errors". As simple as that! Ishikawa's book was focused mainly on the manufacturing industry; nevertheless, I have two things in mind:
  1. It applies to the IT industry as well.
  2. It has not been applied to the IT industry yet.
It is important to underline that the lifecycle of an IT application is quite short compared to normal manufactured goods, but it is also important to understand that testing is not the most effective way to improve an application's quality.
Functional testing, unit testing, performance testing, automated testing and so on are inspection controls that can be improved, but they represent a reactive step: when you start testing you already have a product.

The quality lifecycle has to start in the design phase with a strong analysis of requirements and coding methods.

Friday, July 18, 2008

Capacity Planning

Victorious warriors win first and then go to war, while defeated warriors go to war first and then seek to win.
Sun Tzu
Sun Tzu was not thinking about IT systems; nevertheless, the ability to forecast behavior is quite useful in modern IT. One very important discipline where the "war" can be won before fighting is capacity planning. Capacity planning is one of the most useful proactive phases of application quality management. At the end of the day it answers the question: when do I have to buy extra hardware to support my online business? The question itself is not stupid or pointless: it involves spending and rational cost management. If I buy today because I will need it next year, I am probably wasting money, because within six months the same hardware will be cheaper. The answer is not trivial either, although I have seen CIOs deciding to deploy huge CRM systems with poor or no performance testing and without a clue about the system's behavior under load. In those cases the answer to the question was easy: get me extra RAM today, buy extra hardware, work around the clock to fix the damned app server. Usually CIOs are more concerned about the performance and availability of their applications. Let's find out how good capacity planning can be carried out.

Three main tasks can be identified:
  • Analyze the actual system in order to find out the current peak load and the current total capacity of the system.
  • Predict the growth of the load on the system. This task is always a prediction, so it has the precision of a forecast: the more data you have to support it, the more accurate it is, but you still have several degrees of uncertainty.
  • Calculate the time before failure.
The first task can be done by conducting a performance test that stresses the application to the failure point, or by looking at the system's statistics if available.

The second task has to be worked out by producing a mathematical model. This task is obviously very important and very complicated. You can assume that the growth of the load will be linear, or parabolic, or whatever you think fits your business and your BI data. When I have to deal with this kind of issue I start with some hypotheses that I state very clearly at the beginning of the process; then I work out a model and review the results: if they are not reasonable I go back and revise my hypotheses. This is trivial, but it is better to state clearly that it is the only valid method. The core of capacity planning is hidden behind this task, and it has to be done carefully. It cannot be a five-minute job.

Then working out the time left before performance degrades is easy.
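For the linear-growth case the last step really is a one-liner. Here is a minimal sketch with hypothetical numbers: a performance test found the capacity limit (point C below), monitoring gives the current peak load (point A) and its growth rate.

```python
# Time left before the load reaches the measured capacity,
# assuming linear growth (the simplest model discussed above).

def months_to_capacity(current_peak, capacity, growth_per_month):
    """How many months until the load line crosses the capacity line."""
    if current_peak >= capacity:
        return 0.0  # already at or beyond capacity
    return (capacity - current_peak) / growth_per_month

# Hypothetical figures: current peak 1,200 req/s, tested capacity
# 2,000 req/s, load growing by about 80 req/s each month.
print(months_to_capacity(1200, 2000, 80))  # -> 10.0
```

With a parabolic or exponential model the arithmetic changes, but the structure is the same: intersect the predicted load curve with the measured capacity.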


In the graph above I have summarized the overall idea of capacity planning. This approach is heuristic and its strength is the starting point: you have the real situation of the application! A performance test discovered the limits of the actual architecture, so you know points A and C in the graph. You do not know how quickly the application will get to C; this is really unknown and depends on several factors. Working out a good model is the trick and the most important task. Assuming linear behavior is the easiest way and can be very accurate in the short term.

Thursday, July 17, 2008

The Cost of Quality



Talking about quality, we have to consider costs. When it comes to investing in quality I always hear people talking about ROI. It is hard to calculate the ROI of quality practices, and it is even harder when it comes to software quality. First of all, it is hard to define what the costs of quality are.

The term “quality costs” has different meanings to different people. Some equate “quality costs” with the costs of poor quality (mainly the costs of finding and correcting defective work); others equate the term with the costs to attain quality; still others use the term to mean the costs of running the Quality department.
Juran's Quality Handbook
What is the total cost of a defect? The truth is that nobody can tell exactly how much money the IT department has spent fixing a defect, or whether the spending on testing and quality tools has been worth it.
Costs of poor quality do not exist as a homogeneous mass. Instead, they occur in specific segments, each traceable to some specific cause. These segments are unequal in size, and a relative few of the segments account for the bulk of the costs.

How can we prove that it is really worth spending money on quality? We have to go back to fundamentals and try to define a quality/cost model. A solution has been pointed out by Joseph Moses Juran in his quality handbook.

First of all we need to define the macro areas that make up the costs of quality:
  1. Internal Failure Costs (defects discovered before delivery)
  2. External Failure Costs (defects in production)
  3. Appraisal Costs (costs of testing)
  4. Prevention Costs (costs of the quality organization)
We can group together the first, the third and the fourth and call them the direct costs of quality, whilst the second can be called an indirect cost, since it is not proactive spending for quality but a reactive expense.


If we plot quality as a percentage of conformance (to the requirements) on the x axis and costs on the y axis, the cost of quality rises from zero at 0% quality to a certain amount, which is the cost of 100% quality. The failure costs are zero at 100% quality and rise as quality decreases. It is trivial to work out the total cost of quality: it is the sum of the two curves. This graph is somehow misleading: in reality the cost of 100% quality is much higher and usually looks like the graph below:



In this graph you can see that the cost of reaching 100% quality grows without limit. Therefore the right spending for quality, at the minimum of the total cost curve, does not correspond to 100% quality. This is the right spending for quality if cost is the only parameter to be considered.
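The shape of the argument can be reproduced numerically. The curve shapes and coefficients below are entirely made up for illustration; the only things taken from the model above are that direct costs blow up near 100% conformance while failure costs vanish there, so the cheapest point is somewhere below 100%.

```python
# Toy cost-of-quality model with hypothetical curve shapes.

def cost_of_quality(q, k_direct=10.0, k_failure=100.0):
    """Total cost at conformance level 0 < q < 1."""
    direct = k_direct * q / (1 - q)   # grows without limit as q -> 1
    failure = k_failure * (1 - q)     # zero at 100% conformance
    return direct + failure

# Scan conformance levels and pick the cheapest one.
levels = [i / 100 for i in range(50, 100)]
optimum = min(levels, key=cost_of_quality)
print(optimum)  # -> 0.68
```

Whatever coefficients you plug in, the minimum of the total cost curve lands strictly below 100% conformance, which is exactly Juran's point.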

With this in mind, we can now move on to the other big question: how can quality costs be optimized?

Indirect costs can be reduced, but they are unpredictable, and relying on them is always a reactive way to manage your quality spending. On the other hand, direct costs of quality can be cut to reduce the total cost of quality, but this solution leads to poorer quality in the organization.

The real challenge is to spend the same amount of money while improving overall quality. How can you achieve this? The answer is easy: by spending the same budget better, introducing quality best practices in your IT division.

Summarizing: it is hard to calculate the ROI of your quality spending. This does not mean that you cannot prove the benefits of spending on quality. By approaching the problem with a quality-costs model, the optimum spending for quality can be identified, and it does not correspond to the spending for 0% defects. Therefore, improving quality in the organization most of the time does not require extra investment but a better use of the budget.

Wednesday, July 16, 2008

CHAOS Report

The Economist reported in March that the Standish Group has recently released its CHAOS Report on the state of software development. The report analyzes the outcomes of IT projects. It is quite interesting that "35% of software projects started in 2006 were completed on time, on budget and did what they were supposed to, up from 16% in 1994; the proportion that failed outright fell from 31% to 19%". The CHAOS Report has a third category, the challenged projects, or, to use the report's words: "The project is completed and operational but over-budget, over the time estimate, and offers fewer features and functions than originally specified". Doing the math you find something very interesting: 65% of IT projects were either late, over budget, offered fewer features than expected, or were cancelled at some point. This percentage has remained more or less the same over the past 12 years. What does it mean?

What are the causes of such a gloomy scenario? The main reason can easily be found in the early stage of the lifecycle, when the final user (the customer, indeed) requests an application from the IT department. The requirements at this stage are fuzzy! IT managers try to be vague when taking in the requirements, and the business does not explain its needs very well (sometimes, at this stage, it does not really understand them). Afterwards the IT department starts its job while the business waits for the outcome.
We know it; it is an old story. The task is: how do we stop this? There are several ways: use cases, behavioral models, requirements analysis and so on.

The overall idea is to have a constant review of what has to be done! The business has to exercise constant supervision over the development or, seen from the other side, the developers have to interact constantly with the business.

It is not uncommon that business decisions at the application level are taken by the team leader. This is done to save money, but the CHAOS Report shows that you are wasting money.

Big programs always have a steering committee and project boards, which deal with high-level issues. Below the board level there are several perfect organizations that just do not talk to each other. Again, it is a well-known problem: communication and the exchange of ideas within the company is a key to success.

It seems obvious, but often the project plan comes first. Unfortunately there is no way out: you have to understand that an IT project has two legs, business and technology, and during the IT initiative they have to collaborate.