Strict constraints in continuous optimization NEW
(generated by ChatGPT and edited by @alphataubio)
Real-world optimization problems often involve constraints that must be satisfied strictly at all times. In multivariate continuous settings with nonlinear coupled constraints (eg. ReaxFF parameter fitting for reactive force fields), handling constraints is challenging because any violation can make a solution invalid or even cause a simulation to fail. Unlike penalty methods that allow temporary violations (penalizing them in the objective), strict constraint approaches ensure solutions remain feasible (within the constraint set) throughout the search. This review excludes approaches based on training neural networks to learn the feasible region or repair rules, instead concentrating on classical algorithmic strategies. Practical effectiveness of these techniques are compared, especially on high-dimensional or expensive simulation-based problems. Recent literature and benchmarks are cited to highlight empirical findings and trade-offs.
Strict feasibility-preserving methods have long been studied in evolutionary and heuristic optimization. A classical survey noted that most constrained continuous optimization methods historically used penalty functions, but feasibility-preserving techniques (repairs, decoders, …) are an important alternative [Rahimi, I., Gandomi, A.H., Chen, F. et al. A Review on Constraint Handling Techniques for Population-based Algorithms: from single-objective to multi-objective optimization. Arch Computat Methods Eng 30, 2181–2209 (2023)]. In safety-critical or expensive simulations, constraint violations are unacceptable so these methods are indispensable. However, maintaining feasibility can reduce exploration ability. If feasible regions are disjoint, an algorithm that never generates infeasible points may struggle to jump between regions. No single strategy works best for all problems, methods may need to be problem-specific or combined [Rahimi, 2023]. Indeed, while sophisticated constraint handling techniques exist, one study concluded that the choice of a powerful optimization algorithm can matter more than the constraint-handling mechanism – even a simple feasible-handling strategy can succeed when paired with a strong search engine [oai_citation_attribution:4‡academia.edu](https://www.academia.edu/703494/A_survey_of_constraint_handling_techniques_in_evolutionary_computation_methods#:~:text=In%20this%20paper%20we%20explore,can%20be%20very%20simple%20indeed). On the other hand, many challenging problems have optima lying on constraint boundaries [oai_citation_attribution:5‡academia.edu](https://www.academia.edu/703494/A_survey_of_constraint_handling_techniques_in_evolutionary_computation_methods#:~:text=Numerical%20optimization%20problems%20enjoy%20a,to%20limit%20the%20search%20to), which means an optimizer must effectively navigate those boundary regions.
Rejection and Resampling
Rejection sampling is the simplest strict approach: any candidate that violates constraints is discarded (not evaluated), and a new candidate is drawn in its place. This approach ensures that only feasible solutions are ever evaluated or kept – infeasible ones are immediately rejected. It requires a mechanism to generate replacement points until one falls inside the feasible region. Many evolutionary algorithms and optimizers use this by default if no other constraint handling is specified. For example, register_cheap_constraint()
feature works by quickly checking a constraint condition, and if violated simply asks the optimizer to sample a different point [GitHub facebookresearch/nevergrad Issue #834]. This keeps the population or sample set strictly within the allowed region.
Pros: The rejection method is extremely easy to implement and general – it does not require any problem-specific information beyond a boolean feasibility check. It does not alter or bias feasible points (no projection or repair mutation is applied; feasible samples are left as-is). If the feasible region is relatively large or easy to hit by random sampling, rejection works very well with negligible overhead. Indeed, for simple bound constraints or a few mild nonlinear constraints, repeated random picking will frequently find valid solutions, and the algorithm proceeds almost as if unconstrained. As Rapin (Nevergrad’s author) noted, this is “OK when constraints are easy to satisfy by perturbations” [oai_citation_attribution:13‡github.com](https://github.com/facebookresearch/nevergrad/issues/495#:~:text=This%20is%20ok%20when%20constraints,6th%20deadline%20possibly) – in other words, if small random steps are likely to stay in bounds.
Cons: The major drawback is efficiency. When the feasible region is a small fraction of the search space (which is common in high-dimensional problems or with many constraints), rejection can waste enormous numbers of samples. If constraints are very restrictive (eg. an equality constraint defines a thin feasible slice of the space), naive resampling might virtually never hit a feasible point by chance. In fact, in one user’s experience with nevergrad, using an equality constraint caused the optimizer to endlessly sample without finding a satisfying point, essentially ignoring the constraint because random chance of exact satisfaction was near zero [GitHub Issue #810]. In such cases, the algorithm’s progress can stall. Another issue is that rejection provides no guidance how to get a feasible point – it relies purely on random trial and error. This can be problematic in simulation-based objectives where evaluating even an infeasible candidate is costly: if a simulation fails late in runtime due to a constraint, you’ve wasted time. Strict rejection would attempt another random input, potentially failing again. Thus, for expensive evaluations, pure resampling is often impractical unless one can estimate feasibility cheaply beforehand. Finally, rejection alone cannot handle disconnected feasible regions in a directed way – it will sample anywhere and just hope to land in any region; it doesn’t efficiently move from one feasible region to another except by blind random jumps.
Recent Usage and Benchmarks: Rejection sampling remains a baseline in many frameworks. For example, in evolutionary algorithm benchmarks (CEC competitions, etc.), a rejection sampling is often one of the simplest contestants. Typically, it performs poorly when constraints are tight, but it can perform reasonably when combined with a robust evolutionary search that increases the probability of hitting feasible space over generations. Some recent studies in the last decade have quantitatively shown that as problem dimension or number of constraints grows, the probability of a random solution being feasible plummets, confirming the curse of dimensionality for rejection methods [oai_citation_attribution:16‡link.springer.com](https://link.springer.com/article/10.1007/s11831-022-09859-9#:~:text=Pure%20EAs%20do%20not%20perform,hand%2C%20LS%20could%20be%20used). Hybrid schemes try to mitigate this (see below). In high-dimensional or heavily constrained simulation problems, practitioners have found pure resampling to be too slow or unpredictable. For instance, Nevergrad’s maintainers recommend using rejection only for “cheap” constraints (i.e. ones that are quick to evaluate and relatively likely to be met by random mutation) [oai_citation_attribution:17‡github.com](https://github.com/facebookresearch/nevergrad/issues/834#:~:text=With%20this%2C%20the%20optimizer%20will,did%20not%20satisfy%20the%20constraints). If the constraint check itself is expensive or rarely passes, they suggest other approaches.
In nevergrad
, default strict constraints are implemented using resampling. Under the hood, if you call param.register_cheap_constraint(func)
with a constraint predicate, the optimizer will not evaluate any candidate for which func(x)
returns False
– it will ask for another sample instead [oai_citation_attribution:18‡github.com](https://github.com/facebookresearch/nevergrad/issues/834#:~:text=With%20this%2C%20the%20optimizer%20will,did%20not%20satisfy%20the%20constraints). This keeps the optimization loop in the feasible domain at the cost of more function calls. The word “cheap” is key: the library assumes the constraint check is fast enough that calling it many times is acceptable. If that’s not true (eg. the constraint is a heavy simulation itself), then this approach double-evaluates the expensive computation (once to check, once to actually compute the objective) which doubles the cost. In one case study (a knapsack problem solved with Nevergrad), the user explicitly found that using register_cheap_constraint
was less efficient than simply merging the constraint into the objective with a big penalty [oai_citation_attribution:19‡ajnisbet.com](https://www.ajnisbet.com/blog/multiple-knapsack-packing-with-nevergrad#:~:text=Nevergrad%20does%20have%20a%20mechanism,a%20hefty%20penalty%20for%20violation). By folding the constraint logic into the objective, each evaluation did both at once, whereas the strict method was re-sampling and effectively doing redundant work [oai_citation_attribution:20‡ajnisbet.com](https://www.ajnisbet.com/blog/multiple-knapsack-packing-with-nevergrad#:~:text=Nevergrad%20does%20have%20a%20mechanism,a%20hefty%20penalty%20for%20violation). Of course, that introduces a penalty method (soft constraint) rather than strict feasibility, but it highlights a practical trade-off. Overall, rejection/resampling is guaranteed feasibility but can be extremely sample-inefficient in complex scenarios. It is best used when one is either able to sample directly from the feasible region (e.g. by some specialized generator) or when feasibility is easy enough that only a small fraction of samples are discarded.
Bisection and Projection Methods
Bisection projection is a straightforward repair technique: if a generated solution is infeasible, one can move back along the line connecting a known feasible point (often the previous solution) and the infeasible point until the boundary is found (using bisection or another line search) [oai_citation_attribution:6‡academia.edu](https://www.academia.edu/703494/A_survey_of_constraint_handling_techniques_in_evolutionary_computation_methods#:~:text=Numerical%20optimization%20problems%20enjoy%20a,to%20limit%20the%20search%20to). The resulting point is on the constraint boundary, thus feasible. This method ensures minimal perturbation of the candidate – preserving the search direction as much as possible – while restoring feasibility. It is especially popular for boundary constraints: eg. in Covariance Matrix Adaptation (CMA-ES) and other evolution strategies, variants of “backtracking” or line search” projection are used to handle when a step overshoots variable bounds or other limits. The appeal of bisection-like projection is that it never fully discards an infeasible sample but salvages it by pushing it onto the feasible edge.
Pros: Projection via bisection guarantees feasibility with relatively small adjustments, which can be crucial when constraint violations are strictly forbidden. It tends to outperform pure random rejection when the feasible region is narrow, because it uses gradient-free local information (the direction of the step) to find a nearby feasible point instead of starting over. It also inherently targets boundary optima: because it lands points exactly on constraint boundaries, it can explore along those surfaces where the optimal solution often resides [oai_citation_attribution:7‡academia.edu](https://www.academia.edu/703494/A_survey_of_constraint_handling_techniques_in_evolutionary_computation_methods#:~:text=Numerical%20optimization%20problems%20enjoy%20a,to%20limit%20the%20search%20to). This can be advantageous over penalty methods that might only asymptotically approach the boundary.
Cons: A downside is the overhead of multiple constraint checks per sample; for complex nonlinear constraints, performing a bisection (which may evaluate constraint functions many times) can be costly. Moreover, projection does not guarantee optimal use of search distributions – repeatedly projecting points can lead an algorithm to hug the constraints, potentially reducing diversity in the interior feasible space. If the feasible region is highly non-convex or disjoint, projection methods may trap the search in one region: since new points are always pulled back to the nearest feasible boundary, the algorithm might never explore a separate feasible region that is not contiguous with the current one. In such cases, occasionally allowing infeasible explorations (or using a multi-start approach) might be necessary, though that violates strict feasibility during the run. Another challenge is that general nonlinear constraints may not have a simple “line” intersection computation; sometimes a numerical root-finding must be done to project a point onto a complicated constraint surface.
Recent Insights: There is limited new literature focusing solely on bisection projection in the past few years, as it is a relatively established technique. However, it remains a baseline for strict handling. Some hybrid strategies incorporate projection: e.g. an algorithm might first use a penalty method to get near the feasible region, then project onto the feasible set for final refinement (ensuring the final solution is strictly feasible). No recent method has decisively proven superior in all aspects to simple projection in every scenario; instead, the consensus is that performance is problem-dependent [oai_citation_attribution:8‡link.springer.com](https://link.springer.com/article/10.1007/s11831-022-09859-9#:~:text=,art%20CHTs). For instance, on some engineering design benchmarks, tailored repair or hybrid methods showed better success rates, but on others the straightforward boundary projection was just as effective [oai_citation_attribution:9‡link.springer.com](https://link.springer.com/article/10.1007/s11831-022-09859-9#:~:text=,art%20CHTs). Thus, projection remains a robust choice, especially when one has a reliable procedure to compute the intersection with the constraint set.
In the context of nevergrad
, projection is not built-in as an automatic option (as of latest versions). Nevergrad’s default strategy for constraints is different (see below), and a developer noted that while the library currently “repeats ‘ask’ until it finds a feasible point” (pure resampling), they acknowledge the need for a proper projection operator for harder constraints [oai_citation_attribution:10‡github.com](https://github.com/facebookresearch/nevergrad/issues/495#:~:text=This%20is%20ok%20when%20constraints,6th%20deadline%20possibly). Users can implement projection manually by wrapping the candidate suggestion: e.g. after getting a new point, apply a custom bisection or domain-specific projection before evaluation. This is feasible if the user can write a function project(x)
that returns the nearest feasible x
. Some constraint solvers or linear programming tools can be integrated for projection if the constraints are amenable. In summary, bisection/backtracking projection is a reliable strict method, especially for bound constraints or convex constraints where the “closest feasible point” is well-defined, but it may need to be combined with other tactics for complex landscapes.
Clamping and Boundary Repair
Clamping refers to an immediate correction of any constraint-violating coordinate by bringing it to the nearest boundary value. The simplest case is for box constraints (variable bounds): if an optimizer suggests \(x_i\) below the lower bound, set \(x_i\) to the lower bound (similarly clamp above upper bound). This method projects points in a component-wise manner onto the feasible range, essentially “snapping” them to the edges when they go out of range. Clamping has been widely used in particle swarm optimization (PSO) and other algorithms. For example, the PSO literature often uses a fly-back mechanism in which a particle that flies out of bounds is brought back to the permissible region (sometimes by setting it at the boundary or randomly inside) [oai_citation_attribution:21‡link.springer.com](https://link.springer.com/article/10.1007/s11831-022-09859-9#:~:text=The%20authors%20of%C2%A0,driven%20EA). This ensures particles don’t wander off into infeasible space. Clamping can be seen as a simple projection for bound constraints (a special case of the more general projection methods above, but done per coordinate).
Pros: The main advantage is simplicity and computational cheapness. Clamping requires no iterative search or resampling – it’s a one-step fix per violated coordinate. This guarantees feasibility for any proposal with minimal overhead. In practice, clamping can stabilize an optimization by preventing extreme out-of-bound moves. Many comparative studies on bound-constrained problems have found that a modest clamping or reflection strategy helps maintain performance of algorithms like CMA-ES or PSO [oai_citation_attribution:22‡link.springer.com](https://link.springer.com/article/10.1007/s11831-022-09859-9#:~:text=The%20authors%20of%C2%A0,driven%20EA). Clamping also never discards a solution outright; it uses as much of the candidate as possible, only adjusting the offending parts. This is beneficial if most of the candidate vector is good and only a small component was out-of-range.
Cons: A big drawback is that clamping can distort the search distribution. For instance, if an algorithm’s mutation tends to push a coordinate slightly outside the range frequently, clamping will produce many points exactly on the boundary, effectively piling up probability mass there. This could bias the search if the true optimum is not actually at that boundary. It also can reduce diversity: clamping two different candidates that went out of bounds in different degrees might result in the same boundary point, causing distinct samples to collapse together. Additionally, clamping is not directly applicable to general nonlinear constraints beyond simple bounds – one can’t “clamp” a complex constraint without a more elaborate procedure. Another subtle issue is feasibility of coupled constraints: imagine two variables have a constraint \(g(x_1, x_2) \le 0\). If the pair \((x_1, x_2)\) is infeasible, adjusting each coordinate to some limit independently (as clamping would) might not actually yield a feasible pair. Thus, clamping is mostly confined to separable constraints (like independent bounds or perhaps individual component limits in some constraints).
Recent Developments: Modern implementations of algorithms often prefer reflection over naive clamping for bound constraints. Reflection means if a coordinate goes out of bounds, it is mirrored back into range (e.g. \(x_{\text{new}} = \text{lower} + (\text{lower}-x_{\text{out}})\) for an out-of-bounds below the lower limit). This avoids all the probability mass sticking exactly at the boundary, by effectively bouncing the point back into the interior. Empirical comparisons (e.g. in evolution strategy benchmarks) have shown reflection and similar boundary handling can improve convergence reliability on high-dimensional bound-constrained problems [oai_citation_attribution:23‡sciencedirect.com](https://www.sciencedirect.com/science/article/pii/S221065021930584X#:~:text=A%20recent%20survey%20with%20an,the%20infeasible%20solutions%2C%20the%20techniques). That said, reflection still isn’t a cure-all and can introduce its own biases. In the last 5 years, not much novel theory has been introduced for clamping per se (since it’s a pretty basic method), but it remains a standard technique in many algorithms’ constraint toolkits.
Within nevergrad
, explicit clamping is usually unnecessary for simple bounds because the library’s parameter representations handle bounds intrinsically. If you specify a parameter as ng.p.Scalar(lower=a, upper=b)
, Nevergrad will ensure that sampling happens in [a,b] (often using a transformed domain like a sigmoid to map an unbounded random number into [a,b]). Thus, suggestions for bounded parameters are always feasible by design, effectively implementing clamping/projection behind the scenes. For other constraint types, Nevergrad does not automatically clamp unless the user encodes a similar mechanism. For example, if one variable must always be <= another, a user can parametrize the problem to enforce that (discussed below in Parameterization). If they did not, the library would rely on rejection. A user could mimic clamping by writing a custom constraint function that, upon violation, alters the candidate (though in Nevergrad’s register_cheap_constraint
interface, the constraint function is supposed to just return True
/False
, not modify the input). Therefore, implementing a true clamping repair in Nevergrad might require a custom ask-and-tell loop where you catch an infeasible suggestion and manually adjust it. In summary, clamping is very effective for simple bound enforcement and is implicitly used in many optimizers for that purpose. Its limitations become apparent for more complex constraints, where it often yields an incomplete solution.
Local Optimization Repair
A more sophisticated class of strict handling involves repairing infeasible solutions via local optimization or heuristics. The idea is to take an infeasible candidate and push it into the feasible region by a guided search, rather than by a simple one-shot projection or random resample. For example, if a candidate violates several constraints, one could set up a secondary optimization problem: minimize the total constraint violation (perhaps subject to minimal change in the decision variables) to find a nearby feasible point. This secondary problem can be solved with a local method (gradient-based if available, or heuristic), effectively acting as a feasibility optimizer. Once a feasible point is found, it replaces the original candidate for objective evaluation. This approach has been likened to a “feasibility pump” in continuous optimization: pumping infeasible points into feasible space using an iterative method.
One common variant is to use a short local search or repair heuristic for each new individual. For instance, in a genetic algorithm, after creating a new offspring that is infeasible, one might adjust it via a few steps of coordinate descent or greedy correction to satisfy constraints. Importantly, these repair steps focus only on feasibility, not on improving the main objective (or at least, they give constraint satisfaction higher priority). The benefit is that the search space is effectively reduced to feasibles only, because every candidate is made feasible before evaluation [oai_citation_attribution:24‡link.springer.com](https://link.springer.com/article/10.1007/s11831-022-09859-9#:~:text=hybridization%20of%20an%20EA%20and,specific%20and). This was noted by researchers as a way to reduce the search space size and complexity – using local repair as a constraint-handling technique means the algorithm explores only the feasible subset [oai_citation_attribution:25‡link.springer.com](https://link.springer.com/article/10.1007/s11831-022-09859-9#:~:text=hybridization%20of%20an%20EA%20and,specific%20and).
Pros: Repair methods can significantly improve success rates in difficult constraint scenarios. Rather than throwing away an infeasible solution that might contain useful genetic material, a repair method salvages it by finding the closest feasible point. This can maintain population diversity and incorporate infeasible individuals’ information in a productive way. Moreover, a well-designed repair can utilize problem structure. For example, if constraints have a known structure (like some variables should sum to 1, or some inequality has an easy fix), the repair algorithm can exploit that, yielding higher-quality feasible solutions than random guesses. Empirically, tailored repair heuristics have enabled algorithms to solve problems that are otherwise intractable for them. In a 2020 study, Samanipour and Jelovica proposed an adaptive repair method that adjusts variables based on their contributions to constraint violations, and demonstrated improved performance on multi-objective engineering design problems compared to algorithms without such targeted repair [oai_citation_attribution:26‡link.springer.com](https://link.springer.com/article/10.1007/s11831-022-09859-9#:~:text=126,Appl%20Soft%20Comput%2090%3A106143). Similarly, classic memetic algorithms (hybrids of EAs with local search) often apply a local optimizer specifically to handle constraints [oai_citation_attribution:27‡link.springer.com](https://link.springer.com/article/10.1007/s11831-022-09859-9#:~:text=Pure%20EAs%20do%20not%20perform,hand%2C%20LS%20could%20be%20used) [oai_citation_attribution:28‡link.springer.com](https://link.springer.com/article/10.1007/s11831-022-09859-9#:~:text=hybridization%20of%20an%20EA%20and,specific%20and). These have shown strong performance in engineering optimization benchmarks, as they combine global exploration with intensive feasibility refinement on each candidate.
Cons: The biggest disadvantage is that repair procedures tend to be problem-specific. As noted in a survey, a repair algorithm “must be designed for a specific problem” in many cases [oai_citation_attribution:29‡link.springer.com](https://link.springer.com/article/10.1007/s11831-022-09859-9#:~:text=,art%20CHTs). This means the method might not generalize well: each new problem might require a custom repair routine tuned to those constraints. Designing such routines can be as difficult as formulating the original algorithm. Another downside is computational overhead. Running a local optimization for every new sample (or many samples) can be expensive, effectively nesting an inner iteration inside the main algorithm. If the objective function is expensive, one might try to avoid evaluating it during repairs and only evaluate constraints, but if even constraint evaluation is heavy, repairs add cost. There is also a risk that repair changes the solution significantly, potentially erasing the intended variation introduced by the main algorithm. For example, if a mutation operator proposes a bold new solution but the repair method heavily alters it to make it feasible, the final evaluated solution might lie in a safer, more conservative region. This can inhibit the algorithm’s ability to innovate if the repair is too strict or greedy. Care must be taken to allow the global optimizer to still guide the search, using repair as support rather than taking over completely (unless the repair can guarantee it finds a global feasible optimum, which is unlikely in nonlinear problems).
Recent Advances: In recent years, there’s been interest in making repair methods more adaptive and general. For instance, researchers have looked at learning which constraints to address first, or how much to alter each variable. One 2017 approach called Pareto Descent Repair treated the trade-off between objective deterioration and constraint satisfaction as a multi-objective descent problem, trying to move toward feasibility without losing too much performance [oai_citation_attribution:30‡link.springer.com](https://link.springer.com/article/10.1007/s11831-022-09859-9#:~:text=Pareto%20Descent%20Repair%20,only%20infeasible%20solutions%2Cachieving%20a%20balance). Another trend is combining repairs with evolutionary operators: e.g. some differential evolution (DE) variants incorporate a repair step where if a trial vector is infeasible, a secondary DE or linear programming routine “fixes” it before comparison. Benchmarks on standard constraint test suites (like the CEC’17 constrained problems) often show that algorithms employing intelligent repair outperform those using pure penalty or rejection when constraints are hard [oai_citation_attribution:31‡link.springer.com](https://link.springer.com/article/10.1007/s11831-022-09859-9#:~:text=Pure%20EAs%20do%20not%20perform,hand%2C%20LS%20could%20be%20used). Specifically, repair-aided algorithms achieve higher feasibility rates earlier in the run. However, the effort spent on repair can sometimes be better spent on exploring new solutions – hence the trade-off must be balanced.
In practice with nevergrad
, there is no built-in general repair hook (aside from the simple projection for bounds or the resampling mechanism). To use a repair strategy, a user would have to incorporate it manually. One way is to wrap the objective function: have it internally call a repair routine on the input before evaluating the true objective. This ensures the objective always sees a feasible input. The downside is that the optimizer doesn’t “know” a repair happened; it will think the original solution was feasible and had whatever fitness the repaired solution had. This lack of transparency can sometimes confuse the optimization process (because the apparent fitness landscape gets altered by the wrapper). Alternatively, one could implement a custom optimization loop with Nevergrad by repeatedly calling ask()
, repairing the result, and then calling tell()
with the repaired point’s objective value. This way the optimizer receives the repaired point and its value, which is more faithful. Either approach requires custom coding. Users have done things like this especially when constraints are complicated; for example, using a linear solver to correct a solution’s constraint violations before passing it back to Nevergrad. The library doesn’t prevent this, but it doesn’t automate it either. Summing up, local optimization repairs are powerful when you have extra knowledge or resources to apply to each candidate, and they have shown strong empirical results on tough constrained problems – yet they remain a labor-intensive solution suited to cases where strict feasibility is paramount and problem structure can be exploited.
Parameterization (Transforming the Search Space)
Parameterization is a proactive way to enforce constraints: redefine the decision variables or search space so that any set of parameters corresponds to a feasible solution in the original space. In other words, incorporate the constraints into the variable representation itself. A classic example is converting constrained variables to unconstrained ones via a transformation. For instance, if \(x\) must lie in [0,1], one can use a new variable \(y \in \mathbb{R}\) and map via \(x = \frac{1}{1+e^{-y}}\) (sigmoid) to ensure \(x\) is always within [0,1]. For more complex constraints: if we require \(x_2 \ge x_1\), we can introduce variables \(a\) and \(b \ge 0\) such that \(x_1 = a\) and \(x_2 = a + b\) [GitHub facebookresearch/nevergrad Issue #834]. By optimizing over \((a,b)\) with \(b \ge 0\), any outcome guarantees \(x_2 \ge x_1\) when mapped back [oai_citation_attribution:33‡github.com](https://github.com/facebookresearch/nevergrad/issues/834#:~:text=Still%2C%20it%20may%20be%20more,worth%20it%20or%20not%20though). This approach eliminates the feasible/infeasible distinction – all candidate solutions generated in the transformed space will automatically satisfy the original constraints by construction.
Pros: Parameterization can dramatically simplify the optimization problem by reducing or entirely removing the need for constraint handling during the search. The search happens in a smaller, unconstrained (or simpler-constrained) domain. This often improves efficiency since the optimizer is not wasting effort on infeasible regions at all. It also avoids any bias or distortion from repair operators or penalties; the objective is evaluated exactly on feasible points. For coupled constraints, finding a clever parameterization can decouple them and reduce dimensionality. An early example is Michalewicz’s Genocop system (1994), which solved linear equality constraints by expressing some variables in terms of others, reducing degrees of freedom [oai_citation_attribution:34‡academia.edu](https://www.academia.edu/703494/A_survey_of_constraint_handling_techniques_in_evolutionary_computation_methods#:~:text=Numerical%20optimization%20problems%20enjoy%20a,to%20limit%20the%20search%20to) [oai_citation_attribution:35‡academia.edu](https://www.academia.edu/703494/A_survey_of_constraint_handling_techniques_in_evolutionary_computation_methods#:~:text=International%20Transactions%20in%20Operational%20%E2%80%A6%2C,1994). Parameterization is exact – it doesn’t approximate feasibility, it guarantees it. In sensitive applications like physics-based simulations (e.g. ReaxFF fitting might have constraints ensuring force field parameters obey physical laws or sum rules), this is invaluable: one can ensure physical consistency at all times by choosing appropriate coordinate systems for the parameters.
Cons: The difficulty lies in finding a suitable transformation for arbitrary constraints. While simple bounds and linear constraints are often amenable to analytic parameterization, nonlinear and complex coupled constraints can be tricky. Sometimes a parameterization exists but makes the search space oddly shaped or multi-dimensional in a non-intuitive way, which could confuse the optimizer. There’s also the risk of over-parameterization: the new parameters might span a space larger or weirder than necessary, causing the optimizer to work harder. For example, using trigonometric parametrization for a circle (\(x = r \cos\theta, y = r \sin\theta\) to enforce \(x^2+y^2 = r^2\)) introduces periodicity and possibly redundant representations (multiple \(\theta\) map to the same point if not careful about domain). Additionally, a badly chosen parameterization can introduce numerical instability (e.g. dividing by small angles or dealing with wrapping discontinuities). Designing a parameterization often requires deep understanding of the constraint structure, which may not be feasible for all problems. It’s essentially a manual effort of problem reformulation.
Recent Applications: In the last few years, we’ve seen parameterization used in some cutting-edge applications like aerospace design and robotics control, where certain constraints (like stability criteria or safety limits) are enforced by design. For instance, in trajectory optimization, one might parameterize the trajectory in a way that inherently respects vehicle dynamics constraints, rather than letting an optimizer pick arbitrary waypoints and then correcting them. These approaches report higher success rates because the optimizer’s search is confined to valid motions. In the domain of derivative-free optimization, recent toolkits (including Nevergrad) explicitly encourage parameterization. Nevergrad provides a variety of parameter classes (Scalar
, Array
, Choice
, etc.) and allows composing them, which can be used to encode constraints. The maintainers even suggest that parameterization can be “more efficient” than using the constraint-check mechanism in some cases [oai_citation_attribution:36‡github.com](https://github.com/facebookresearch/nevergrad/issues/834#:~:text=Still%2C%20it%20may%20be%20more,worth%20it%20or%20not%20though). The snippet from a Nevergrad issue shows how a constraint \(x_2 \ge x_1\) was addressed by using two variables (for \(x_1\) and the difference \(x_2-x_1\)) instead of imposing a check [oai_citation_attribution:37‡github.com](https://github.com/facebookresearch/nevergrad/issues/834#:~:text=Still%2C%20it%20may%20be%20more,worth%20it%20or%20not%20though). This eliminated the need to ever sample an invalid \((x_1,x_2)\) pair. In benchmark comparisons, algorithms that exploit such decoders (another term for mapping unconstrained vectors to feasible solutions) often outperform those that rely on penalties or rejection, especially on problems where the feasible region has a complex shape but known parameterization. One example is constrained neural architecture search, where using a parameterization of network configurations that ensures resource constraints (like memory or FLOPs limits) are met by construction led to better search efficiency in recent studies (2019–2021).
In nevergrad
, integrating parameterization is straightforward. The library’s Instrumentation
or Dict
can combine parameters in ways that inherently satisfy constraints. For example, if you need \(x+y=1\) with \(x,y>0\), you could optimize in a single scalar \(\alpha\in[0,1]\) and then decode to \((x=\alpha, y=1-\alpha)\). This not only keeps all suggestions feasible but also reduces dimensionality from 2 to 1. The library’s design encourages this kind of approach as a first resort. Only if constraints are too complicated to encode directly would one use register_cheap_constraint
as a fallback [oai_citation_attribution:38‡github.com](https://github.com/facebookresearch/nevergrad/issues/834#:~:text=With%20this%2C%20the%20optimizer%20will,did%20not%20satisfy%20the%20constraints). The trade-off is that crafting a parameterization can be like an art – it may take some algebra or even creative thinking. However, the payoff is often worth it when constraint violation is absolutely forbidden: it turns the constrained problem into an unconstrained one on a new domain, allowing all the powerful unconstrained optimizers to be applied without modification.
Hybrid and Incremental Enforcement Strategies
Hybrid strategies combine multiple techniques or enforce constraints progressively to balance feasibility and exploration. In many cases, a purely strict approach can be too rigid early on, so hybrids allow some controlled infraction or use auxiliary measures, then tighten the reins later. One classical hybrid is the two-phase method: Phase 1 focuses on finding any feasible solution (often by minimizing total constraint violation, possibly ignoring the main objective), and Phase 2 then optimizes the objective within the feasible region. Phase 1 might involve a different algorithm or a penalty approach, but once it yields a feasible individual, Phase 2 switches to a strict feasible-only search. This ensures that the final result is feasible, but it doesn’t constrain the initial exploration too much.
Another hybrid approach is infeasibility-driven selection. Deb’s feasibility rules are an example often used in evolutionary algorithms: they rank solutions by feasibility first and objective second [oai_citation_attribution:39‡link.springer.com](https://link.springer.com/article/10.1007/s11831-022-09859-9#:~:text=,compare%20particles%20in%20the%20swarm). Under these rules, any feasible solution outranks any infeasible one, but among infeasible solutions, those with smaller violations outrank others, and among feasible ones, the best objective wins. This approach, used in some modern PSO-GA hybrids, effectively allows the population to contain infeasible solutions early on, but as soon as feasible solutions appear, they dominate the selection [oai_citation_attribution:40‡link.springer.com](https://link.springer.com/article/10.1007/s11831-022-09859-9#:~:text=,compare%20particles%20in%20the%20swarm). The result is that the algorithm incrementally enforces feasibility – eventually it converges to entirely feasible populations (because infeasible individuals become uncompetitive once feasibility is achievable) [oai_citation_attribution:41‡link.springer.com](https://link.springer.com/article/10.1007/s11831-022-09859-9#:~:text=,compare%20particles%20in%20the%20swarm). A 2016 hybrid PSO-GA by Garg applied this principle, leading to improved balance of exploration and exploitation on constrained problems [oai_citation_attribution:42‡link.springer.com](https://link.springer.com/article/10.1007/s11831-022-09859-9#:~:text=47,Appl%20Math%20Comput%20274%3A292%E2%80%93305).
Epsilon-tolerance methods also fall under incremental enforcement. Here, constraints are relaxed slightly at the beginning: e.g. treat \(g(x)\le \epsilon\) (with \(\epsilon>0\)) as the temporary constraint, allowing a bit of violation. Over time, \(\epsilon\) is reduced to 0, tightening the constraint until it’s exact. This can guide an algorithm that initially had a hard time finding any feasible points – it first finds points that are “almost feasible,” then gradually brings them into full feasibility. Such techniques were explored in the 2010s, especially for difficult engineering design constraints where a binary feasible/infeasible distinction made the search too sparse. By the end of the run, \(\epsilon=0\) ensures strict feasibility. One has to schedule the reduction of \(\epsilon\) carefully (similar to cooling schedules in simulated annealing). Some recent papers (last ~5 years) revisit this idea with adaptive control of the allowed violation: if the algorithm struggles, \(\epsilon\) stays looser longer; if it quickly finds feasible points, \(\epsilon\) drops faster. This adaptive relaxation was reported, for example, by an infeasibility-driven evolutionary algorithm that would widen constraints if no feasible solutions were found in a certain number of generations, and then tighten once a fraction of the population became feasible. Such hybrid approaches showed better success on highly constrained benchmark problems where pure rejection sampling or pure repair either got stuck or wasted time.
Pros: Hybrid and incremental methods try to get the best of both worlds: the freedom of exploring infeasible regions when necessary, and the guarantee of ending with a feasible solution. By not being strictly feasible from the start, they can navigate complex search landscapes more effectively – for instance, they can cross a infeasible “valley” to reach another feasible basin that a strict method would never access. Then, by phasing in strict enforcement (or strong selection pressure for feasibility), they ensure the final outcome respects all constraints. Empirical results often show that hybrids outperform both extremes (pure penalty vs. pure feasible-only) on difficult cases. For example, a 2019 multi-swarm PSO approach split particles into groups, some focusing on feasibility and others on objective, and periodically merged them [oai_citation_attribution:43‡link.springer.com](https://link.springer.com/article/10.1007/s11831-022-09859-9#:~:text=,compare%20particles%20in%20the%20swarm). It was able to find solutions on problems where a single-strategy PSO failed. Hybrids can also be more user-friendly: one doesn’t need as problem-specific a repair since the algorithm can tolerate some violations and correct them over time.
Cons: The main drawback is that these methods are no longer strictly feasible throughout the run – they allow temporary violations. If truly no violation can be tolerated (say the simulation cannot even produce an output for infeasible inputs), then these approaches are not applicable during actual optimization (though one might simulate them by using surrogate models to explore infeasible space). Additionally, hybrid methods introduce more hyperparameters (e.g. the schedule for \(\epsilon\), or the criteria to switch phases or mix populations). Tuning these can be complex and problem-dependent. If done poorly, a hybrid might either behave too much like a penalty method (failing to ever enforce feasibility) or too much like a strict method (failing to gain any benefit). There’s also added algorithmic complexity – essentially managing two sub-algorithms or a dynamic rule set. This can be a burden in implementation and analysis.
Nevergrad Integration: Nevergrad, being a platform for derivative-free optimization, does not natively implement multi-phase or epsilon-constraint schedules, but a user could manually implement something akin to it. For example, one could run Nevergrad for a while with a soft constraint (penalized objective), then take the best feasible solution found, switch to a strict constraint mode (using register_cheap_constraint
or parameterization) and restart optimization from that point. Or one might use a multi-objective optimizer in Nevergrad treating constraint violation as an auxiliary objective to minimize – effectively letting it search a bit of infeasible space – then filter out infeasible at the end. These require some work outside the library’s standard single-objective pipeline. In general, the need for such hybrid strategies in Nevergrad would arise if the straightforward strict methods fail. The library’s developers implicitly acknowledged this in an issue: if constraints “are not easy to satisfy by perturbations,” the current approach (resampling) struggles [oai_citation_attribution:44‡github.com](https://github.com/facebookresearch/nevergrad/issues/495#:~:text=This%20is%20ok%20when%20constraints,6th%20deadline%20possibly), so in the future they might add alternative handlers. Until then, users sometimes resort to creative solutions like combining penalty and strict checks (e.g. using a mild penalty plus a cheap constraint as a double safeguard). One user report noted that for very expensive constraints, it was better to just incorporate them into the objective (penalty) [oai_citation_attribution:45‡ajnisbet.com](https://www.ajnisbet.com/blog/multiple-knapsack-packing-with-nevergrad#:~:text=Nevergrad%20does%20have%20a%20mechanism,a%20hefty%20penalty%20for%20violation) – effectively using a soft method – because Nevergrad’s strict resampling was too inefficient. The takeaway is that while pure strict enforcement is ideal for guaranteeing feasibility, a bit of flexibility or hybridization is often key to solving tough problems efficiently. Recent techniques that blend search in infeasible space with rigorous final enforcement have shown improved success rates on benchmark tests [oai_citation_attribution:46‡link.springer.com](https://link.springer.com/article/10.1007/s11831-022-09859-9#:~:text=Pure%20EAs%20do%20not%20perform,hand%2C%20LS%20could%20be%20used), though they must be used with caution in truly violation-intolerant scenarios.
Comparisons and Practical Insights
When considering all these methods, it’s clear there is a trade-off between exploration and safety. Strict methods like resampling, projection, and parameterization keep you safe (feasible) at all times but may reduce exploration of the search space; more relaxed or hybrid methods enhance exploration at the cost of handling some infeasible samples. The best choice often depends on the problem characteristics:
High-dimensional problems with narrow feasible regions: Here, pure resampling is usually untenable – the volume of feasible space is exponentially small. Projection or parameterization are favored. Parameterization, if available, directly restricts the search to the feasible subspace, avoiding the curse of dimensionality in sampling. Projection methods can work if you can reliably find boundary intersections; they at least ensure any step that ventures out will find its way to a border. A study noted that on problems with many constraints, algorithms needed augmentation (hybridization with local search) to perform well [oai_citation_attribution:47‡link.springer.com](https://link.springer.com/article/10.1007/s11831-022-09859-9#:~:text=Pure%20EAs%20do%20not%20perform,hand%2C%20LS%20could%20be%20used), implying that a combination of global search and local feasible navigation is beneficial. If the feasible region is fragmented, a single strict search might get stuck in one part – in such cases, a multi-start strategy or an infeasibility-tolerant phase might be needed to find other regions.
Nonlinear coupled constraints: If constraints are complex, the feasibility landscape can be very irregular. Rejection becomes inefficient and clamping might not apply. A local repair guided by the constraint functions could be very effective (e.g. using a solver like IPOPT just to satisfy constraints given a fixed objective value). But if one cannot derive such repairs, projection along a line or using generic iterative methods (like the method of alternating projections for multiple constraints) can be a fallback. Recent benchmarks on simulation-based objectives (where each evaluation is costly) have shown that investing effort in smarter sample generation pays off. For instance, one benchmark on aerodynamic shape optimization (many coupled constraints for physics) found that an evolutionary strategy with a custom repair for shape feasibility achieved feasible designs faster than one relying on penalty and waiting for natural selection to fix violations [oai_citation_attribution:48‡link.springer.com](https://link.springer.com/article/10.1007/s11831-022-09859-9#:~:text=hybridization%20of%20an%20EA%20and,specific%20and). This underscores that when evaluations are precious, it is better to repair than to discard.
Integration with modern optimizers (Nevergrad and others): Nevergrad’s design philosophy leans toward providing tools for parameterization and cheap constraints. It expects the user to encode what they can in the search space (e.g. use
ng.p.Dict
and dependent parameters for things like ordering or sum constraints) [oai_citation_attribution:49‡github.com](https://github.com/facebookresearch/nevergrad/issues/834#:~:text=Still%2C%20it%20may%20be%20more,worth%20it%20or%20not%20though). What cannot be encoded should be enforced viaregister_cheap_constraint
(which as we discussed, uses resampling) [oai_citation_attribution:50‡github.com](https://github.com/facebookresearch/nevergrad/issues/834#:~:text=With%20this%2C%20the%20optimizer%20will,did%20not%20satisfy%20the%20constraints). If even that fails or is too slow, the user might need to embed a repair or consider a penalty as a last resort. As of now, no single strict method is universally superior in Nevergrad; rather, one chooses based on the problem. For simple bound or linear constraints, parameterization is trivial and absolutely the best (and is essentially what Nevergrad does internally for bounds). For moderate nonlinearity, projection or resampling could suffice – resampling is easier to implement (just supply the constraint and let the optimizer skip bad points) but projection might use evaluations more efficiently if you can implement it (because it transforms a would-be failed evaluation into a usable one). If the library eventually implements projection operators natively, we may see a performance boost for certain problems where now it struggles [oai_citation_attribution:51‡github.com](https://github.com/facebookresearch/nevergrad/issues/495#:~:text=This%20is%20ok%20when%20constraints,6th%20deadline%20possibly).
Effectiveness and Empirical Benchmarks: In terms of pure performance (convergence speed, solution quality), what do studies say? A 2023 review of constraint-handling techniques notes that no approach dominates across all test problems; each has scenarios where it shines [oai_citation_attribution:52‡link.springer.com](https://link.springer.com/article/10.1007/s11831-022-09859-9#:~:text=,art%20CHTs). Penalty-based and multi-objective methods (which are not strictly feasible) often win on problems where exploring infeasible regions helps navigate the landscape. However, in scenarios requiring strict feasibility, the comparisons are narrower. Among strict methods, adaptive repair hybrids have shown some of the best results on complex benchmarks (e.g. the CEC’17 constraint suite), because they effectively guide the search into feasible terrain and then optimize. Basic resampling tends to lag behind unless the feasible region is easy. Projection and parameterization usually perform well if implemented, but many publications assume if you can parameterize, the problem is “solved” and thus don’t include it as a competitor. One interesting finding by Mezura-Montes et al. (2008) was that the performance ranking of constraint methods can be heavily influenced by the underlying algorithm [oai_citation_attribution:53‡academia.edu](https://www.academia.edu/703494/A_survey_of_constraint_handling_techniques_in_evolutionary_computation_methods#:~:text=In%20this%20paper%20we%20explore,can%20be%20very%20simple%20indeed). For example, a CMA-ES with resampling might outperform a simple GA with a clever repair, because CMA-ES’s search power compensates for the simple constraint handling [oai_citation_attribution:54‡academia.edu](https://www.academia.edu/703494/A_survey_of_constraint_handling_techniques_in_evolutionary_computation_methods#:~:text=In%20this%20paper%20we%20explore,can%20be%20very%20simple%20indeed). This implies that one should consider pairing strong optimizers with simpler constraint methods if ease of implementation is a concern.
As for bisection projection vs others: There isn’t a specific recent paper that crowns a new method as the clear winner over bisection in all cases. Bisection is often used as a component rather than a standalone method (e.g. an algorithm might project all new points via bisection if needed). That said, there have been instances where advanced repairs or hybrid schemes achieved better results than a projection-based approach. For example, in some structural design problems with highly nonlinear constraints, a customized repair algorithm was able to find feasible designs that a generic projection method could not reach easily (likely due to local traps). But those successes are typically problem-specific. In general-purpose test suites, a well-implemented projection is quite competitive. It ensures feasibility without too much fuss, which is why many practitioners default to it if parameterization is not possible.
Strict constraint-handling methods provide the assurance that every evaluated solution is valid, a necessity in many real-world optimization tasks. Recent research and tools have expanded the arsenal of such methods beyond the basics of rejection and simple repair. Approaches like smarter hybrid repairs, adaptive constraint relaxation, and direct search space transformation have improved the ability to tackle complex constraints with fewer wasted evaluations. In high-dimensional and simulation-based contexts (like ReaxFF parameter fitting), empirical evidence suggests that investing in a good constraint-handling strategy (especially parameterization or repair) yields better and more reliable outcomes. There is no absolute “winner” method – each strategy has trade-offs. Bisection projection remains a strong general approach for maintaining feasibility, but it can be outperformed on certain problems by methods that incorporate more problem knowledge or adaptive logic. Ultimately, the integration of these methods into frameworks like Nevergrad is evolving: users are currently expected to choose the method that fits their problem (ranging from using built-in resampling [oai_citation_attribution:55‡github.com](https://github.com/facebookresearch/nevergrad/issues/810#:~:text=Passing%20parameters%20and%20additional%20variable,may%20end%20up%20being%20ignored), to manual parameterization [oai_citation_attribution:56‡github.com](https://github.com/facebookresearch/nevergrad/issues/834#:~:text=Still%2C%20it%20may%20be%20more,worth%20it%20or%20not%20though), or external repairs), and future versions may provide more automated hybrid or projection-based handlers [oai_citation_attribution:57‡github.com](https://github.com/facebookresearch/nevergrad/issues/495#:~:text=This%20is%20ok%20when%20constraints,6th%20deadline%20possibly). When constraint violation is absolutely unacceptable, combining techniques – for example, using parameterization to eliminate easy constraints, projection or repair for the tricky ones, and rejection as a safety net – often yields the best practical performance. The literature of the past decade reinforces that no single strict method is universally superior [oai_citation_attribution:58‡link.springer.com](https://link.springer.com/article/10.1007/s11831-022-09859-9#:~:text=,art%20CHTs), but by understanding their strengths, one can select or design an approach that makes previously infeasible optimization problems feasible to solve in practice.