In the world of risk management, maintenance of mission-critical equipment drives priorities and budgets. It is the ultimate test of proactive maintenance and smart decision making. Managing assets that “cannot be allowed to fail” is more than an emotionally charged mandate that forces managers into a continual state of alert. It is the harsh reality for technicians tasked with ensuring continuous performance or service. The stakes are high. Fortunately, technology can help mitigate the risks.
The scope and scale of critical assets and equipment vary greatly, from electrical grids and security systems to back-up generators at hospitals, refrigeration in the food and beverage industry, and air traffic control for the airline industry. National defense systems and communications systems in the public sector, such as 911 call centers or alerts for fire departments, are other examples of high-tech equipment that cannot be allowed to fail. Whether it involves protecting the health of consumers, the safety of the workforce, or national security, mission-critical assets require special attention to detail and proactive monitoring of status. Prevention is the goal. Early detection of warning signs makes intervention possible.
Upgrade to meet the challenge
Modern software technology is essential for keeping today’s highly complex mission-critical assets performing up to standards. Enterprise asset-management solutions play a vital role in protecting performance when it matters most. Advanced solutions, purpose-built for the intense demands of manufacturing, can help track and monitor the operational performance of assets throughout the organization, focusing first on the most critical. Fact-based assessment of risk and impact helps set priorities and ensure the most critical assets receive prescriptive and preventive maintenance on a timely basis.
Innovative software providers have brought many game-changing features and functions to enterprise asset-management solutions in recent years, enabling users to turn asset reliability into a strategic differentiator. Some manufacturers are exploiting this ability to the fullest, promising customers on-time deliveries. Those who continue with old-school solutions can seldom make such claims, because unexpected delays plague their schedules and hinder reliability.
Without today’s new technologies, like analytics driven by artificial intelligence (AI), maintenance teams often find themselves fighting fires, jumping from one emergency to the next. Getting ahead of this type of reactive model requires predictive insights, using internet of things (IoT) sensors to monitor for key symptoms, like excessive heat or vibration, and tracking performance down to the component and part level. A change in mind-set is often required, too. It’s time to get strategic, preventive, and data-centric.
Assess risk and potential consequences
Reliability-centered maintenance tools allow users to define a risk assessment or a risk analysis for a location, a system, a position, or an asset. During a risk assessment, users define functions, failures, and the consequences of these failures. For example, the team may assess the risk associated with the finishing machinery as very high, because all work orders would be affected if the equipment goes down; this custom-built machinery has no off-the-shelf replacements. So, machine failure here could cause a six-week delay in orders shipped, and impact cash flow by millions of dollars. A risk assessment is typically performed on equipment at broad, higher levels in the equipment hierarchy, such as a storage location or a production line, rather than specific assets, like packaging equipment.
A risk analysis is like a risk assessment but also includes definitions of failure modes, that is, the specific ways in which each failure can occur. Understanding the risk and potential consequences is an important step in planning a strategic approach to keeping critical assets running, no matter what.
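The distinction between a risk assessment and a risk analysis can be pictured as a small data structure. The sketch below is illustrative only: the class names, field names, and the two failure modes are hypothetical and do not come from any particular asset-management product. The target, failure, and consequence reuse the finishing-machinery example above.

```python
from dataclasses import dataclass, field


@dataclass
class Failure:
    description: str
    consequence: str
    # Failure modes are only defined in a risk analysis;
    # a risk assessment leaves this list empty.
    failure_modes: list[str] = field(default_factory=list)


@dataclass
class RiskRecord:
    target: str    # a location, system, position, or asset
    function: str  # what the target is expected to do
    failures: list[Failure] = field(default_factory=list)

    def is_risk_analysis(self) -> bool:
        """True once at least one failure has failure modes defined."""
        return any(f.failure_modes for f in self.failures)


assessment = RiskRecord(
    target="Finishing line",
    function="Finish every work order before shipment",
    failures=[Failure("Finishing machinery goes down",
                      "Six-week delay in orders shipped")],
)

# Adding failure modes turns the same record into a risk analysis.
# The two modes below are invented purely for illustration.
analysis = RiskRecord(
    target=assessment.target,
    function=assessment.function,
    failures=[Failure("Finishing machinery goes down",
                      "Six-week delay in orders shipped",
                      failure_modes=["drive motor burnout", "controller fault"])],
)

print(assessment.is_risk_analysis())  # False
print(analysis.is_risk_analysis())    # True
```

Keeping both in one record type reflects the idea that an analysis is an assessment with one more level of detail, not a separate exercise.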
Assess conditions to set priorities
Condition-based monitoring is a highly effective approach to tracking real-time asset conditions and understanding the relative influence of an asset that may be starting to degrade or show signs of diminishing productivity. Condition-based monitoring uses a scoring system that considers several criteria, from the age of the asset to the time and cost to replace it. This scoring method is based on carefully written definitions and objective criteria, eliminating the guesswork that can sometimes slow the process or make it unreliable.
The numerical system also becomes a language that can be easily understood by other departments. Telling the plant manager that a critical valve is scoring a 2 (out of 5) for condition assessment provides a clearer view than saying “it’s old and failing,” which can be open to interpretation.
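One simple way to arrive at such a score is a weighted average of per-criterion ratings. The following is a minimal sketch under stated assumptions: the three criteria come from the text above, but the weights, the 1-to-5 rating per criterion, and the function name are hypothetical, and real products may score conditions quite differently.

```python
def condition_score(ratings: dict[str, int], weights: dict[str, int]) -> int:
    """Weighted average of 1-5 criterion ratings, rounded to a whole score."""
    for name in weights:
        if not 1 <= ratings[name] <= 5:
            raise ValueError(f"{name} rating must be between 1 and 5")
    weighted_sum = sum(ratings[name] * weight for name, weight in weights.items())
    return round(weighted_sum / sum(weights.values()))


# Hypothetical weights: age counts slightly more than the replacement criteria.
WEIGHTS = {"age": 4, "time_to_replace": 3, "cost_to_replace": 3}

# Hypothetical ratings for a critical valve (1 = worst, 5 = best).
valve = {"age": 1, "time_to_replace": 3, "cost_to_replace": 2}

print(condition_score(valve, WEIGHTS))  # 2
```

Because the weights and rating definitions are written down once and applied to every asset, two people rating the same valve should land on the same number.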
Putting more solutions to work
In addition to the foundational enterprise asset-management solution, other technologies will help professionals manage mission-critical assets with greater confidence, including:
• Cloud deployment. Moving systems to the cloud means turning over tasks like back-ups, data recovery, security, and compliance to a cloud provider that specializes in these issues and devotes full attention to data security. Cloud solutions are also updated continuously by the provider, ensuring that the solution is always modern and takes advantage of new functionality as it becomes available. This helps ensure that the solution is resilient to new threats, viruses, and malware.
• Internet of things (IoT). Sensors installed on or in assets can measure, collect, and store data concerning a wide variety of physical conditions, such as temperature, vibration, moisture, and density. This data is sent to the cloud, where it is aggregated and analyzed for anomalies that may be early warning signs of the asset’s potential failure. For example, a temperature spike on a conveyor may be a warning sign that the bearings need to be replaced or coolant needs to be replenished.
When red-flag issues are caught in the early stages, teams can intervene with the least interruption to performance. In some cases, the anomalies, if serious, can trigger an automatic response, such as switching over to auxiliary power, stopping a line, or even automatically checking inventory and ordering a replacement part. In other cases, a technician can be notified to make further investigations. Early intervention is the best way to curtail the negative impact of failing parts or components.
• Business-intelligence tools. Modern analytics are vital to delving into the cause-and-effect relationships between influencing factors and possible performance issues, particularly looking for issues that can be influenced or remedied. Managers can dive into data around costs, vendor reliability, lifecycle longevity, and user engagement. Analytics may uncover that components from one vendor tend to fail more often, or machinery fails more often on the second shift—such details point to issues that can be addressed. Some factors, like weather, can’t be changed, but managers who understand the impact can plan accordingly, such as adjusting inventory levels of engine coolant and antifreeze during certain months.
• Predictive analytics. Not only will today’s advanced BI solutions help analyze the past, but they can also help professionals project the future. Data science algorithms and AI applications help companies forecast trends, anticipate peaks in demand, and prepare for shifts in consumer buying habits. Patterns, previously unrecognized, can be spotted and analyzed with today’s BI solutions. For plant maintenance teams, predictive analytics help foresee the lifespan of machinery, identify stock items that need to be replenished, like ink or oil, and flag parts that need replacement based on wear, like machine filters and belts.
• Artificial intelligence (AI). AI has many more applications in mission-critical maintenance, including the ability to create models and explore possible outcomes. Pertinent information from other databases can be layered in, such as weather, seasonal trends, or typical buying cycles. This can help when making complex decisions such as choosing between the most cost-effective repair option and the one that offers the least amount of risk.
Solutions with advanced machine learning can reveal insights and advise the user on the best courses of action, such as scheduling the calibration of heat-sealing machinery during October when the ambient temperature of the plant is within tolerance levels, or determining the best times for rebuilds or offline maintenance. Bringing new equipment online, too, needs careful timing. AI solutions can help managers explore “what if” scenarios to project the impact on customer deliveries or cash flow.
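The IoT monitoring pattern described in the list above, where a sensor anomaly leads either to a technician notification or to an automatic response, can be sketched in a few lines. This is a hedged illustration, not how any specific platform works: the function name, the sigma thresholds, the action labels, and the sample temperatures are all assumptions, and a production system would likely use a more robust baseline than a plain mean and standard deviation.

```python
from statistics import mean, stdev


def classify_reading(history: list[float], reading: float,
                     warn_sigma: float = 3.0,
                     critical_sigma: float = 6.0) -> str:
    """Compare a new reading with recent history and pick a response.

    Returns "normal", "notify_technician", or "stop_line".
    The thresholds are illustrative defaults, not industry values.
    """
    baseline = mean(history)
    spread = stdev(history)  # needs at least two historical readings
    if spread == 0:
        # Perfectly flat history: any change at all is worth a look.
        return "normal" if reading == baseline else "notify_technician"
    deviation = abs(reading - baseline) / spread
    if deviation >= critical_sigma:
        return "stop_line"
    if deviation >= warn_sigma:
        return "notify_technician"
    return "normal"


# Hypothetical conveyor bearing temperatures in degrees Celsius.
recent = [60.0, 61.0, 59.0, 60.5, 59.5]

print(classify_reading(recent, 60.4))  # normal
print(classify_reading(recent, 63.0))  # notify_technician
print(classify_reading(recent, 70.0))  # stop_line
```

Separating the two thresholds mirrors the split in the text: moderate deviations go to a person for investigation, while only severe ones trigger an automatic action such as stopping a line.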
Closing thoughts
Mission-critical machinery in manufacturing requires special attention. These assets may be critical to the manufacture of products or the operational activities of the facility, such as back-up generators, overhead doors, or forklifts. Often mission-critical equipment and assets are highly complex and can include security protection, data encryption, fail-safe back-ups, IoT sensors, and remote monitoring of warning triggers. Maintenance of these high-tech assets is seldom as simple as installing a replacement part. Troubleshooting performance issues can take time. Modern software can help simplify the process, finding correlational insights and predicting outcomes.
Organizations often operate on lean budgets with skeleton maintenance staffs. That is why companies must rely on technology to make smart decisions about priorities, use of resources, and scheduling for planned maintenance. By mitigating risk, modern software solutions help organizations protect their mission-critical assets and keep their facilities performing as needed.
Comments
Another Approach to Risk Management
Hello Kevin:
Great article. When I was at Hewlett-Packard, we designed risk management into our Non-Stop unix product line (formerly the Tandem products), that offered "five-nines" of uptime, including planned and unplanned outages. One of our products was used in 911 systems, where an outage could be beyond mission-critical, which were termed "Life Critical." In designing the risk management for that system, we turned our usual approach inside out and said, essentially, "If failure can always happen, even in a hardened and resilient system, how do we mitigate THAT risk?" and we came up with a "fail fast, recover fast" model. Our systems could detect a non-conforming subsystem, "kill" the subsystem and recover, even rebooting the entire unix kernel, in 60 seconds or less while a second identical parallel node continued handling all transactions and maintaining uptime. It was beautiful to watch in operation.
So, keep in mind a comprehensive view of what it means to have continuous uptime. There are often other approaches that can yield equivalent or better results than what you might have expected at first.
Howard Hudson