You might be thinking, “No! Not a new acronym to learn!” You would be right… FRACAS is not a new idea by any stretch. It is a proven approach with a long history of assisting organizations in managing equipment failures and the actions to take as a result.

What is a Failure Reporting, Analysis, and Corrective Action System? Well, it can take many forms, all of which need to suit the needs of your organization. Is it software? Sure. Could it be based upon paper? Sure. The tools do not really matter; what matters is the process you develop to report failures and then do something about them.

You might be surprised that you are most likely already “doing” this process for your equipment, just in a not-so-formal way. Equipment fails, you figure out why, and then you do something to prevent the failure in the future. See? You are an expert already!

 

Failure Reporting

The first step is to define what a failure is. Not every “negative” event is a failure… but it could be. A good example is a pump that has degraded flow. Has the pump failed? No, unless the degraded flow goes below some threshold where the receiving process cannot continue. Are spikes in performance failures? Maybe, maybe not.

The point is to define what thresholds must be exceeded for a failure to be declared. When those thresholds are exceeded, the field crews report the event as a failure. You might consider starting by reporting everything that looks like a defined failure, so you have some real data to work with. In a very short period, you will find the sweet spot of failure situations that are of value to you. Adjust your data capture methods from there.
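To make the threshold idea concrete, here is a minimal sketch of a failure definition for the pump example. The class name, function name, and flow numbers are all hypothetical and chosen only for illustration; your own thresholds will come from what the receiving process can tolerate.

```python
from dataclasses import dataclass


@dataclass
class FailureThreshold:
    """Hypothetical failure definition for one measured quantity."""
    name: str
    minimum: float  # readings below this value count as a failure


def classify_reading(value: float, nominal: float, threshold: FailureThreshold) -> str:
    """Return 'failure', 'degraded', or 'normal' for a single reading."""
    if value < threshold.minimum:
        return "failure"
    if value < nominal:
        return "degraded"
    return "normal"


# Illustrative numbers only: a pump rated at 100 units of flow whose
# receiving process needs at least 60 to keep running.
pump_flow = FailureThreshold(name="pump flow", minimum=60.0)

for flow in (100.0, 75.0, 55.0):
    print(flow, classify_reading(flow, nominal=100.0, threshold=pump_flow))
```

Keeping "degraded" separate from "failure" mirrors the point above: degraded flow is worth noticing, but it only becomes a reportable failure once it crosses the line you defined.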

You don’t know what you don’t know… of course. You might be surprised at what constitutes a failure to your field crews. The simple act of collecting failure instances may open a completely different view of your equipment health. Consider the situation where your crews are repeatedly restarting a piece of equipment. While the downtime per instance is very short and perhaps unnoticed, the overall impact could be greater than you imagine. What if Fred is on vacation and no one knows that the device needs to be rebooted to solve whatever problem Production is reporting?
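The repeated-restart situation is easy to see once the short events are added up. The sketch below assumes a hypothetical log of (equipment, downtime minutes) entries; the equipment names and numbers are made up for illustration.

```python
from collections import defaultdict

# Hypothetical log entries: (equipment id, downtime in minutes).
restart_log = [
    ("CONVEYOR-7", 3),
    ("CONVEYOR-7", 4),
    ("PUMP-2", 45),
    ("CONVEYOR-7", 2),
    ("CONVEYOR-7", 5),
]


def summarize_downtime(log):
    """Return {equipment: (event count, total downtime minutes)}."""
    totals = defaultdict(lambda: [0, 0])
    for equipment, minutes in log:
        totals[equipment][0] += 1        # one more event for this equipment
        totals[equipment][1] += minutes  # accumulate its downtime
    return {equipment: tuple(pair) for equipment, pair in totals.items()}


summary = summarize_downtime(restart_log)
for equipment, (count, minutes) in sorted(summary.items()):
    print(f"{equipment}: {count} events, {minutes} minutes total")
```

No single conveyor restart looks like much, but the event count is what tells you someone is quietly nursing that equipment along.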

As you know, Maximo has built-in Failure Reporting functionality within Work Orders. Are you using it? If not, why not?

 

Analysis

If visions of equations and stacks of data come to mind, then I have not lost you as a reader yet! You bet, all this failure data can land on your desk in a very raw form that may take sorting and stacking to gain even a small degree of usefulness. It may take a number of equations and data science approaches to filter the data down into a form that exposes what is happening.

I once had a statistics professor tell the class that the first thing you do with data is to just take a high-level look at it and see if there are any patterns. No formulas, no models… just lean your chair back and ponder what you see. It changed my perspective on the reams of operational data that I was expected to make sense of.

Are certain people on the crew more apt to report a failure than others? Do certain types of failures happen more than others? Are there groupings or collections of failures during certain times? Do certain kinds of equipment have reported failures, while other equipment that you would expect to fail shows none?
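A first look at those questions needs nothing more than counting. The sketch below assumes a hypothetical list of failure reports and a hypothetical list of equipment types you expect to hear about; the names and records are invented for illustration.

```python
from collections import Counter
from typing import NamedTuple


class FailureReport(NamedTuple):
    """Hypothetical minimal failure record."""
    reporter: str
    equipment_type: str
    failure_type: str


reports = [
    FailureReport("Fred", "pump", "bearing"),
    FailureReport("Fred", "pump", "seal leak"),
    FailureReport("Fred", "conveyor", "belt slip"),
    FailureReport("Maria", "pump", "bearing"),
    FailureReport("Fred", "pump", "bearing"),
]

# Equipment types you would expect to see in the data (an assumption).
expected_equipment = {"pump", "conveyor", "compressor"}

by_reporter = Counter(r.reporter for r in reports)
by_failure_type = Counter(r.failure_type for r in reports)

# Equipment that never shows up in a report: healthy, or just unreported?
silent_equipment = expected_equipment - {r.equipment_type for r in reports}

print("Reports per person:", by_reporter.most_common())
print("Reports per failure type:", by_failure_type.most_common())
print("No reports at all for:", sorted(silent_equipment))
```

The counts do not answer anything by themselves, but they tell you who to go talk to and which silence deserves a second look.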

Once you have a qualitative sense of what the failure data is telling you, you can proceed to perform more detailed quantitative analysis if you have enough data to do so. One interesting result is the good old Mean Time Between Failures (MTBF). Given enough consistent data, MTBF gives you a sense of how often your repair teams must touch the equipment. Maybe it’s OK that each week there is a failure to address. Maybe that’s horrible! Keep in mind that a mean (average) is not an actual observation in your data set… it is a calculated value. For that reason, it is important to have quite a bit of data to base the calculation upon before it can be truly useful.
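As a rough sketch of the MTBF calculation, the example below averages the number of days between consecutive failure dates. It assumes repair time is negligible compared with the time between failures, and the dates are invented for illustration.

```python
from datetime import date
from statistics import mean

# Hypothetical failure dates for one piece of equipment.
failure_dates = [
    date(2023, 1, 2),
    date(2023, 1, 9),
    date(2023, 1, 30),
    date(2023, 2, 3),
]


def days_between_failures(dates):
    """Return the gaps, in days, between consecutive failures."""
    ordered = sorted(dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def mtbf_days(dates):
    """Mean time between failures in days (repair time assumed negligible)."""
    intervals = days_between_failures(dates)
    if not intervals:
        raise ValueError("need at least two failures to compute an interval")
    return mean(intervals)


intervals = days_between_failures(failure_dates)
mtbf = mtbf_days(failure_dates)
print("Intervals (days):", intervals)
print(f"MTBF: {mtbf:.1f} days")
```

Here the intervals are 7, 21, and 4 days, so the mean of roughly 10.7 days matches none of them. With only three intervals, one more failure could move the number a lot, which is why the amount of data behind it matters.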

Another aspect of this analysis step is determining the root cause of the failure(s). Remember those people who are more apt to report a failure? Go talk to them and find out what is really going on with the equipment. Oh, the horror when you find out that the crews are always fixing something because the initial installation was crooked and caused the bearings to fail prematurely. Why did we not know this?

 

Corrective Action

Great, you have gotten to the root cause of the failures. Now you must do something about it. Or do you? Some corrective and preventive measures are so costly or invasive that they don’t make sense. Run to failure, or quick and repeated correctives, may be valid actions (or inactions) if the context is right. Much care should be taken if this is the approach you are considering. You do not want to allow a failure that may have real consequences or safety exposure to continue.

But let’s say you do have a reasonable corrective action to perform. Ideally, you would act only when you have high confidence that it will solve the problem. In practice, there is nothing wrong with having to try a few things before settling on a solution. It may take a few turns around the track to get to the bottom of an issue with a permanent solution.

Either way, documenting what you plan on doing, what its impact is expected to be, and what the end results are is very important. It may take months or years for your corrective action to bear fruit and provide enough evidence (back to data collection) to say that it is the winner. You may also find that your efforts do nothing to improve the situation.
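One simple way to tie the documentation back to the data is to record the plan alongside the failure rate before and after the change. The sketch below is only an illustration: the record fields, the dates, and the 30-day rate are assumptions, not a prescribed format.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CorrectiveAction:
    """Hypothetical record of a corrective action and its outcome."""
    description: str
    expected_impact: str
    implemented_on: date
    observed_result: Optional[str] = None  # filled in once evidence exists


def failures_per_30_days(failure_dates, start, end):
    """Failure rate per 30 days for failures dated in [start, end)."""
    days = (end - start).days
    if days <= 0:
        raise ValueError("end must be after start")
    count = sum(1 for d in failure_dates if start <= d < end)
    return count * 30 / days


action = CorrectiveAction(
    description="Realign pump base and replace bearings",
    expected_impact="Fewer premature bearing failures",
    implemented_on=date(2023, 7, 1),
)

# Invented failure history: six failures before the fix, two after.
failures = [
    date(2023, 1, 15), date(2023, 2, 10), date(2023, 3, 5),
    date(2023, 4, 20), date(2023, 5, 18), date(2023, 6, 12),
    date(2023, 9, 3), date(2023, 11, 22),
]

before = failures_per_30_days(failures, date(2023, 1, 2), action.implemented_on)
after = failures_per_30_days(failures, action.implemented_on, date(2023, 12, 28))
action.observed_result = f"{before:.2f} -> {after:.2f} failures per 30 days"
print(action)
```

Two equal 180-day windows keep the comparison fair in this example; with real data you would also want enough failures on each side before calling the action a winner.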

 

Finally…

Operating your FRACAS process does take some discipline. You must collect data and understand what it is telling you. Then, you must do something with that data… make decisions… take actions. Tools such as Maximo already have the foundations for FRACAS built in, making it easy to get started. Performing root cause analysis with your field teams can be kind of fun! You will be amazed at their perspective and the value it has. Of course, acting does cost money and effort. In the end, you will have a much clearer picture of what is happening with your equipment. You will also have confidence that the actions you are taking are reducing or eliminating failures.

 

Article by John Q. Todd, Sr. Business Consultant at TRM. Reach out to us at AskTRM@trmnet.com if you have any questions or would like to discuss deploying MAS 8 or Maximo AAM for condition-based maintenance / monitoring.

 
