Preventive maintenance intervals are usually set one of three ways: the OEM says so, somebody who retired 15 years ago decided, or somebody had a bad day and overcorrected. None of these approaches is optimal. The optimal PM interval is the one that minimizes total maintenance cost per unit of operating time, balancing the cost of planned replacements against the expected cost of unplanned failures.
Finding this optimal interval requires two things: a statistical model of your equipment's failure behavior (most commonly the Weibull distribution), and the cost of a planned PM versus the cost of an unplanned corrective repair. With these inputs, the age replacement model calculates the total expected cost per operating hour at every possible PM interval and identifies the minimum. This guide walks through the data requirements, the Weibull interpretation, the cost model, and the practical decisions that follow from the analysis.
Why PM Intervals Matter Economically
If you do PM too frequently, you spend money replacing components that still had useful life remaining. The parts cost is wasted, the labor is wasted, and you might introduce installation errors. If you do PM too infrequently, more components fail in service, causing unplanned downtime, collateral damage, safety incidents, and emergency repair costs that are 3-10 times higher than planned replacements.
The sweet spot is the interval where the sum of PM costs and expected failure costs is minimized. This is not a guess; it is a calculation. The age replacement model provides the answer, and it depends on exactly two things: the failure distribution (how quickly do components wear out?) and the cost ratio (how much more expensive is a failure than a planned replacement?).
In practice, most plants find that 30-50% of their time-based PMs are either too frequent (wasting money on unnecessary replacements) or too infrequent (allowing preventable failures). A systematic review using Weibull analysis typically reduces total maintenance cost by 10-25% without increasing failure rates. That is real money at scale: a plant spending $5 million per year on maintenance can save $500,000 to $1,250,000 per year by optimizing PM intervals alone.
Weibull Distribution: What the Parameters Mean
The Weibull distribution is the standard reliability model for mechanical components. In the two-parameter form used here, it is defined by a shape parameter (beta) and a scale parameter (eta). These two numbers completely describe the failure behavior of a component population.
Beta, the shape parameter, tells you the failure pattern. Beta less than 1 means the failure rate is decreasing over time. This is the infant mortality pattern, where new components fail early and survivors become more reliable with age. Beta equal to 1 means the failure rate is constant, which is the random failure pattern seen in electronics. Beta greater than 1 means the failure rate is increasing, which is the wear-out pattern seen in bearings, seals, belts, and most mechanical components.
Eta, the scale parameter, is the characteristic life. It is the age at which 63.2% of the population has failed. For a bearing population with eta = 20,000 hours, about 63% of bearings will fail before 20,000 operating hours and 37% will survive past it. Eta is not the same as MTBF (mean time between failures), though they are related. MTBF equals eta times the gamma function of (1 + 1/beta), which for beta around 2.0 gives MTBF approximately equal to 0.89 times eta.
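The two relationships above are easy to check numerically. The following is a minimal sketch using only the Python standard library; eta = 20,000 hours comes from the bearing example, while beta = 2.0 and the function names are assumptions chosen for illustration.

```python
import math

def weibull_cdf(t, beta, eta):
    """Fraction of the population that has failed by age t."""
    return 1.0 - math.exp(-((t / eta) ** beta))

def weibull_mtbf(beta, eta):
    """Mean life of the population: eta * Gamma(1 + 1/beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)

eta = 20_000.0   # characteristic life in hours (bearing example)
beta = 2.0       # assumed shape parameter, for illustration only

frac_failed_at_eta = weibull_cdf(eta, beta, eta)
mtbf = weibull_mtbf(beta, eta)

print(f"Failed by eta: {frac_failed_at_eta:.1%}")
print(f"MTBF: {mtbf:,.0f} h ({mtbf / eta:.2f} x eta)")
```

The fraction failed at t = eta is 1 - 1/e regardless of beta, which is why eta is a convenient reference point even before the shape is known.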
The practical takeaway: if beta is less than 1, time-based PM is counterproductive because replacing the component resets the failure rate to its highest point. If beta equals 1, time-based PM has no effect on the failure rate, so you should run to failure or use condition monitoring. Only when beta is greater than 1 does time-based PM reduce the failure rate, making it a candidate for interval optimization.
Beta < 1.0: Infant mortality (decreasing failure rate) → PM is counterproductive
Beta = 1.0: Random failures (constant failure rate) → PM has no effect
Beta > 1.0: Wear-out (increasing failure rate) → PM is beneficial
Most mechanical components: beta = 1.5 to 4.0
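The three regimes can be seen directly in the Weibull failure rate (hazard) function, h(t) = (beta/eta) * (t/eta)^(beta-1). Here is a small sketch that compares the failure rate early and late in life; the beta values, the ages, and the helper names are illustrative assumptions, not values from any particular dataset.

```python
def weibull_hazard(t, beta, eta):
    """Instantaneous failure rate h(t) = (beta/eta) * (t/eta)**(beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)

def failure_rate_trend(beta, eta=20_000.0, early=2_000.0, late=18_000.0):
    """Compare the hazard at an early and a late age and name the trend."""
    h_early = weibull_hazard(early, beta, eta)
    h_late = weibull_hazard(late, beta, eta)
    if h_late > h_early:
        return "increasing"
    if h_late < h_early:
        return "decreasing"
    return "constant"

for beta in (0.7, 1.0, 2.5):   # assumed example shapes
    print(f"beta = {beta}: failure rate is {failure_rate_trend(beta)}")
```

Only the increasing case gives a replacement anything to reset, which is the reason time-based PM is limited to beta greater than 1.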
Collecting the Right Failure Data
The quality of your Weibull analysis depends entirely on the quality of your failure data. You need time-to-failure records: the number of operating hours, calendar days, or cycles between installation and failure for each component that has failed. You also need suspension records: the operating time of components that were replaced preventively (before failure) or are still running. Ignoring suspensions biases your analysis toward shorter life estimates.
The data must be for a single failure mode. Mixing failure modes (e.g., bearing fatigue failures and seal leakage failures on the same pump) produces a meaningless Weibull fit. If your CMMS work orders describe failures generically ("pump failed," "replaced bearing"), you need to go back and classify them by root cause before running the analysis.
You need a minimum of 5-7 failure records for a usable fit. With 10-15 records, the confidence interval tightens considerably. Above 20 records, additional data provides diminishing returns. If you have fewer than 5 failures of the same mode, you do not have enough data for statistical analysis. Use engineering judgment, OEM data, or industry benchmarks as a starting point and refine as more data accumulates.
Data sources: CMMS work order history (primary), maintenance logs, equipment history files, operator logbooks, and vibration analysis trend data that shows the point of functional failure. The best plants have failure data organized by equipment, component, failure mode, and operating time. Most plants have to reconstruct this data from narrative work order descriptions.
1. Separate failure modes (do not mix bearing and seal failures).
2. Record operating time, not calendar time, if utilization varies.
3. Include suspensions (PM replacements and survivors).
4. Minimum 5-7 failure records for a usable Weibull fit.
5. Use consistent units (hours, days, or cycles) across all records.
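To illustrate why suspensions matter, here is a sketch of a two-parameter Weibull fit by maximum likelihood that treats suspensions as right-censored records. This is one common fitting approach, not necessarily the one a given tool uses (many use rank regression instead). The failure and suspension times are invented for the example, and the bisection bracket for beta is an assumption that should cover typical maintenance data.

```python
import math

def fit_weibull_mle(failures, suspensions=()):
    """Two-parameter Weibull MLE with right-censored (suspended) units.

    failures:    operating times at failure, single failure mode
    suspensions: operating times of units removed or still running unfailed
    """
    all_times = list(failures) + list(suspensions)
    r = len(failures)
    mean_log_fail = sum(math.log(t) for t in failures) / r

    def score(beta):
        # Profile-likelihood equation for beta; its root is the MLE.
        s0 = sum(t ** beta for t in all_times)
        s1 = sum((t ** beta) * math.log(t) for t in all_times)
        return s1 / s0 - 1.0 / beta - mean_log_fail

    lo, hi = 0.05, 20.0            # assumed bracket for beta
    for _ in range(200):           # bisection; score() increases with beta
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)
    eta = (sum(t ** beta for t in all_times) / r) ** (1.0 / beta)
    return beta, eta

# Hypothetical records in operating hours, one failure mode only.
failures = [6200, 8100, 9400, 11000, 12500, 14300, 16800]
suspensions = [5000, 7500, 10000, 12000, 15000]

beta_all, eta_all = fit_weibull_mle(failures, suspensions)
beta_naive, eta_naive = fit_weibull_mle(failures)   # suspensions ignored

print(f"With suspensions:    beta = {beta_all:.2f}, eta = {eta_all:,.0f} h")
print(f"Ignoring them:       beta = {beta_naive:.2f}, eta = {eta_naive:,.0f} h")
```

On this invented dataset the fit that ignores suspensions reports a shorter characteristic life, which is the bias described above.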
The Age Replacement Model
The age replacement model is the mathematical framework that connects Weibull reliability to cost optimization. It calculates the total expected cost per unit of operating time as a function of the PM interval. The formula is:
C(t) = [Cp * R(t) + Cf * (1 - R(t))] / integral of R(x) from 0 to t
Where Cp is the cost of a planned PM replacement, Cf is the cost of an unplanned corrective repair, R(t) is the reliability function (probability of surviving to time t), and the integral in the denominator is the expected cycle length. The interval t that minimizes C(t) is the cost-optimal PM interval.
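As a rough sketch of how this formula can be evaluated, the code below integrates R(x) with the trapezoid rule and searches a grid of candidate intervals for the lowest cost rate. The Weibull parameters, the costs, the grid range (up to 3 times eta), and the function names are all assumptions for illustration.

```python
import math

def reliability(t, beta, eta):
    """Weibull survival probability R(t)."""
    return math.exp(-((t / eta) ** beta))

def cost_rate(t, beta, eta, cp, cf, steps=500):
    """C(t) = [Cp*R(t) + Cf*(1 - R(t))] / integral of R(x) from 0 to t."""
    dx = t / steps
    r_t = reliability(t, beta, eta)
    area = 0.5 * (1.0 + r_t)                      # end points; R(0) = 1
    area += sum(reliability(i * dx, beta, eta) for i in range(1, steps))
    cycle_length = area * dx                      # expected cycle length
    return (cp * r_t + cf * (1.0 - r_t)) / cycle_length

def optimal_interval(beta, eta, cp, cf, grid=400):
    """Grid search over (0, 3*eta]; returns (best interval, cost rate there)."""
    candidates = [3.0 * eta * k / grid for k in range(1, grid + 1)]
    best = min(candidates, key=lambda t: cost_rate(t, beta, eta, cp, cf))
    return best, cost_rate(best, beta, eta, cp, cf)

# Hypothetical inputs: wear-out component, failure costs 5x a planned PM.
beta, eta = 2.5, 20_000.0
cp, cf = 1_000.0, 5_000.0

t_opt, c_opt = optimal_interval(beta, eta, cp, cf)
run_to_failure = cf / (eta * math.gamma(1.0 + 1.0 / beta))   # Cf / MTBF

print(f"Optimal interval: {t_opt:,.0f} h at ${c_opt:.3f} per hour")
print(f"Run to failure:   ${run_to_failure:.3f} per hour")
```

Comparing the minimum against the run-to-failure cost rate (Cf divided by MTBF) shows how much the PM is actually worth, not just when to do it.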
Intuitively: at short intervals, you spend a lot on PM but almost never have failures (R(t) is high). At long intervals, PM cost per cycle is low but failures become frequent (R(t) drops). The optimal interval is where the total rate of spending is minimized.
The cost ratio Cf/Cp drives the result. If a corrective repair costs 10 times more than a PM, the optimal interval is relatively short because avoiding failures is very valuable. If the cost ratio is close to 1, the optimal interval extends toward infinity (run to failure), because there is little economic penalty for failing.
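The effect of the cost ratio can be sketched by normalizing Cp to 1 and sweeping Cf/Cp. The version below accumulates the integral of R(x) as it steps through candidate intervals; beta = 2.5, eta = 20,000 hours, the search limit of 5 times eta, and the list of ratios are assumptions for illustration.

```python
import math

def optimal_interval(beta, eta, cost_ratio, t_max_factor=5.0, steps=20_000):
    """Interval minimizing the cost rate, with Cp = 1 and Cf = cost_ratio."""
    dt = t_max_factor * eta / steps
    area, r_prev = 0.0, 1.0                 # running integral of R(x); R(0) = 1
    best_t, best_cost = None, math.inf
    for i in range(1, steps + 1):
        t = i * dt
        r = math.exp(-((t / eta) ** beta))
        area += 0.5 * (r_prev + r) * dt     # one trapezoid slice
        cost = (r + cost_ratio * (1.0 - r)) / area
        if cost < best_cost:
            best_t, best_cost = t, cost
        r_prev = r
    return best_t

ratios = [2, 5, 10, 20]                     # assumed Cf/Cp values
intervals = [optimal_interval(2.5, 20_000.0, ratio) for ratio in ratios]

for ratio, t in zip(ratios, intervals):
    print(f"Cf/Cp = {ratio:>2}: optimal interval about {t:,.0f} h")
```

Because only the ratio matters for the location of the minimum, a rough estimate of Cf/Cp is often enough to get a usable interval even when absolute costs are uncertain.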
A useful sanity check: for moderate cost ratios, the optimal interval typically falls near the B20-B40 life (the age at which 20-40% of components have failed). If your analysis suggests a PM interval where reliability is 95% or above, you are probably over-maintaining, unless the cost ratio is very high and wear-out is steep. If it suggests an interval where reliability is below 50%, check whether the cost ratio is genuinely low; if it is not, you may have data quality issues.
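For a Weibull distribution the B-life has a closed form, so this check takes only a few lines. The sketch below computes B20 and B40 and the fraction failed at a candidate interval; the parameters and the 12,000-hour candidate are hypothetical.

```python
import math

def b_life(p_failed, beta, eta):
    """Age at which a fraction p_failed of the population has failed."""
    return eta * (-math.log(1.0 - p_failed)) ** (1.0 / beta)

def fraction_failed(t, beta, eta):
    """Weibull cumulative failure probability F(t)."""
    return 1.0 - math.exp(-((t / eta) ** beta))

beta, eta = 2.5, 20_000.0        # assumed Weibull fit
candidate = 12_000.0             # hypothetical PM interval under review

b20 = b_life(0.20, beta, eta)
b40 = b_life(0.40, beta, eta)
failed_at_candidate = fraction_failed(candidate, beta, eta)

print(f"B20 = {b20:,.0f} h, B40 = {b40:,.0f} h")
print(f"At {candidate:,.0f} h, {failed_at_candidate:.0%} of units have failed")
```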
C(t) = [Cp × R(t) + Cf × F(t)] / E[cycle length]
Where R(t) = survival probability at time t
F(t) = 1 - R(t) = failure probability
E[cycle length] = integral of R(x) from 0 to t
Optimal interval = t that minimizes C(t)
Practical Decisions After the Analysis
The calculator gives you a number. Now you have to make practical decisions. The optimal interval might be 8,247 hours, but nobody schedules PM at 8,247 hours. You round to 8,000 or 9,000, align with a planned outage window, and consider seasonal factors. The cost curve is usually fairly flat near the minimum, so plus or minus 10-15% does not significantly change total cost.
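The flatness of the cost curve can be checked before committing to a rounded interval. The sketch below finds the minimum on a 100-hour grid, then reports the cost penalty for moving 15% either way and for rounding to the nearest 1,000 hours. All inputs are hypothetical, and how flat the curve is will vary with beta and the cost ratio.

```python
import math

def cost_rate(t, beta, eta, cp, cf, steps=1000):
    """Age replacement cost per operating hour at PM interval t."""
    dx = t / steps
    rel = [math.exp(-((i * dx / eta) ** beta)) for i in range(steps + 1)]
    cycle_length = dx * (sum(rel) - 0.5 * (rel[0] + rel[-1]))   # trapezoid rule
    return (cp * rel[-1] + cf * (1.0 - rel[-1])) / cycle_length

# Hypothetical inputs for illustration.
beta, eta, cp, cf = 2.5, 20_000.0, 1_000.0, 5_000.0

grid = [100.0 * k for k in range(1, 401)]          # 100 h to 40,000 h
t_opt = min(grid, key=lambda t: cost_rate(t, beta, eta, cp, cf))
c_opt = cost_rate(t_opt, beta, eta, cp, cf)

def penalty(t):
    """Relative cost increase versus the grid optimum."""
    return cost_rate(t, beta, eta, cp, cf) / c_opt - 1.0

t_rounded = round(t_opt / 1000.0) * 1000.0         # nearest 1,000 h

print(f"Grid optimum: {t_opt:,.0f} h")
print(f"15% shorter:  +{penalty(0.85 * t_opt):.1%} cost")
print(f"15% longer:   +{penalty(1.15 * t_opt):.1%} cost")
print(f"Rounded to {t_rounded:,.0f} h: +{penalty(t_rounded):.1%} cost")
```

If the penalty for a convenient interval is a percent or two, aligning with an outage window is almost always the better choice.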
If the optimal interval is longer than your current PM, you can extend the interval and save money. Start by extending 20-30% and monitoring failure rates. If failures do not increase, extend further toward the calculated optimum. Gradual extension is safer than jumping to the calculated value because data quality issues can bias the calculation.
If the optimal interval is shorter than your current PM, you need to increase PM frequency. This is harder to sell to management because it increases PM cost immediately while reducing failure cost over time. Present it as a risk reduction investment with a calculated ROI.
If beta is less than 1.2 or the cost ratio is below 2:1, the analysis may recommend run-to-failure. This is uncomfortable for many maintenance managers but is sometimes the correct economic decision for non-critical components. Run-to-failure should only be applied where the failure mode has no safety, environmental, or secondary-damage consequences.