Global Optimization

Summary

Global optimization has become a buzzword in optical design since Adaptive Simulated Annealing (ASA) was introduced in OSLO SIX in 1992. Several new optimization schemes have been introduced in other software to compete against ASA, but none seem to be designed to find a global optimum, vendor claims notwithstanding.

It's important to consider a few basic points about global optimization. First, a global optimum pertains to a specified error function in a specified region of search. Second, there can be only one global optimum in the region. Third, for continuous variables, the assertion of success is probabilistic. Finally, the search time should be specified for comparative evaluation. A typical result of a global optimization task might be something like "after a one hour search, point P has the highest probability of being the global optimum of error function F in region R". ASA has a solid theoretical basis for making such a claim.

Discussion

[Ed. Note] This article was originally published in 1993 as a Sinclair Optics Design Note titled "Global Optimization in Lens Design", by G.W. Forbes and Andrew E. W. Jones. It has been slightly edited, primarily to accommodate HTML typography and to bring references up to date.

OSLO SIX includes a global optimization algorithm called Adaptive Simulated Annealing (ASA). Although global optimization schemes have attracted substantial interest among lens designers in the past, only recently has affordable computer hardware become available with power sufficient to meet the high demands posed by global optimization problems in lens design.

The rationale for global optimization

The goal in lens design is the determination of the best possible lens that satisfies the given constraints. The performance of a lens system is measured by a merit function and the goal in the optimization stage of design is to determine the configuration of the lens system for which the merit function is smallest over the region of interest defined by the constraints—that is, to locate the global minimum of the merit function.

One simple global optimization scheme is the grid search: the merit function M is evaluated at points on a regular grid aligned to the coordinate axes, and the sample with the lowest value of M is taken as an estimate of the global minimum. Such a simple scheme, however, rapidly becomes unworkably inefficient with increasing dimensionality. Consider a problem in which five samples are required in each coordinate direction to locate the global minimum to the desired accuracy and in which the merit function requires 1 microsecond to evaluate. The time needed to perform a grid search in N dimensions is then 5^N microseconds: for N=10, the search is completed in less than ten seconds, but for N=15 more than eight hours is needed, and in just 20 dimensions, over three years is required. Clearly, even with such sparse sampling and with such unrealistically favorable estimates of evaluation speed, a grid search is impractical; the sampling of the region of interest in the configuration space must be more selective.
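
To make the arithmetic concrete, the sketch below evaluates a merit function on such a regular grid; the toy quadratic merit function and the five-point-per-axis grid are assumptions for illustration only and are not taken from OSLO. The cost is plainly samples_per_axis ** N evaluations.

```python
import itertools
import numpy as np

def grid_search(merit, bounds, samples_per_axis=5):
    """Exhaustive grid search: evaluate the merit function on a regular grid
    with `samples_per_axis` points along each coordinate axis and return the
    best sample.  The cost grows as samples_per_axis ** N."""
    axes = [np.linspace(lo, hi, samples_per_axis) for lo, hi in bounds]
    best_x, best_m = None, np.inf
    for point in itertools.product(*axes):        # 5**N merit evaluations
        m = merit(np.array(point))
        if m < best_m:
            best_x, best_m = np.array(point), m
    return best_x, best_m

# Toy merit function, used only to make the sketch runnable.
merit = lambda x: np.sum((x - 0.3) ** 2)
x_best, m_best = grid_search(merit, bounds=[(-1.0, 1.0)] * 3)
```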

The most widely employed optimization scheme consists of the selection of a sample point by the designer, followed by the application of a local optimizer (in OSLO, this is the "Iterate full" command) that selects subsequent samples with the aim of proceeding "downhill"—that is, in a direction of decreasing merit-function value. If the local optimum thus found is not of sufficient quality, a new initial sample is chosen and the process is repeated until a satisfactory local minimum is found. While computationally much less demanding than a grid search, the success of such a scheme is heavily dependent upon the choice of the starting point, and the resulting designs are typically similar in form to the starting point.
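
The following sketch illustrates this multi-start, downhill scheme; scipy.optimize.minimize merely stands in for a local optimizer such as "Iterate full", and the toy merit function and starting points are placeholders chosen for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def multi_start(merit, starting_points):
    """Run a downhill (local) optimizer from each designer-chosen starting
    point and keep the best local minimum found.  The quality of the result
    depends entirely on the starting points supplied."""
    best = None
    for x0 in starting_points:
        res = minimize(merit, x0)          # proceeds to a nearby local minimum
        if best is None or res.fun < best.fun:
            best = res
    return best

# Toy multi-modal merit function and arbitrary starting points.
merit = lambda x: np.sum(x ** 2) + np.sin(5 * x).sum()
result = multi_start(merit, [np.zeros(2), np.ones(2), -np.ones(2)])
```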

OSLO has an alternative optimization command, "Iterate standard", that does not impose the downhill requirement. This command has been used for many years to overcome the so-called "stagnation effect" exhibited by damped-least-squares optimizers. Recently, David Shafer [1] has documented a scheme for using this command to generate multiple solutions to optical design problems. Such solution generators are fundamentally different from the sort of global optimizer that is discussed here, however, in that the results depend heavily on the starting point.

By using a global optimization scheme, we can transcend some of the limitations of local optimizers. Although global optimization is more computationally demanding than local optimization, the rapid and continuing increases in the computer power that is available at a given monetary cost make global optimization ever more practical. Since the performance of global optimizers is essentially independent of the particular starting point, they are especially effective in situations in which suitable starting points are difficult to determine. One attractive global optimization scheme is the simulated annealing algorithm, first applied to lens design in 1984 [2]. Simulated annealing belongs to a class of global optimizers that are called controlled random search methods because the merit-function space is sampled randomly according to a scheme in which the values of several parameters determine the distribution of the random samples. Although these so-called Monte Carlo methods might at first seem as promising as throwing darts whilst blindfolded, there are sound reasons for their implementation (see Why go to Monte Carlo?).

Simulated annealing

The simulated annealing algorithm derives its name from the fact that its behavior is controlled principally by a parameter T, called "temperature", that is analogous to the temperature in the thermal annealing process. We call simulated annealing a "true" global optimization algorithm because each run attempts to search the entire region of interest for the global minimum rather than performing multiple downhill optimization runs in which the selection of the various starting points is automated. On the surface, simulated annealing is quite simple, as depicted by the following flowchart.

Figure 2. Simulated annealing flowchart

In simulated annealing, the optimization process is not required to proceed uniformly downhill, but is allowed to make occasional uphill moves. The typical increase in M that is acceptable in an uphill move is determined by the value of T. At the start of the annealing process, T has a relatively large value compared to the standard deviation of the merit function over the region of interest, and the random walk effectively traverses all of this region. As the random walk progresses, T is lowered, the allowed increases in M are then typically smaller, and the walk is effectively constrained to ever lower valleys. If T is reduced sufficiently slowly, the random walk can escape the higher valleys during its earlier stages, and it can be expected to terminate at the global minimum. If T is lowered more quickly, however, the random walk is more likely to become trapped in one of the higher valleys. This follows the analogy with the thermal annealing process: if a hot solid is cooled slowly, its final state is likely to be one of lower potential energy (usually, a more ordered arrangement of atoms); if it is cooled more quickly, the final state is likely to be one of higher potential energy.
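
A bare-bones version of this random walk is sketched below. The article does not spell out the acceptance rule or the cooling schedule, so the standard Metropolis criterion (accept an uphill move of size dM with probability exp(-dM/T)) and a fixed geometric schedule are used here purely for illustration; ASA itself controls these quantities adaptively, as described later.

```python
import numpy as np

def simulated_annealing(merit, x0, T0, cooling=0.99, steps_per_T=50,
                        T_min=1e-6, step_scale=0.1, rng=None):
    """Bare-bones simulated annealing.  Uphill moves of size dM are accepted
    with probability exp(-dM / T) (the standard Metropolis rule); T is lowered
    geometrically, so higher 'valleys' can still be escaped early in the walk."""
    if rng is None:
        rng = np.random.default_rng(0)
    x, m = np.asarray(x0, float), merit(np.asarray(x0, float))
    best_x, best_m = x.copy(), m
    T = T0
    while T > T_min:
        for _ in range(steps_per_T):
            x_new = x + step_scale * rng.standard_normal(x.size)
            m_new = merit(x_new)
            if m_new < m or rng.random() < np.exp(-(m_new - m) / T):
                x, m = x_new, m_new          # downhill move, or accepted uphill move
                if m < best_m:
                    best_x, best_m = x.copy(), m
        T *= cooling                         # lower the temperature
    return best_x, best_m

# Usage sketch with a toy merit function:
# x_opt, m_opt = simulated_annealing(lambda x: np.sum(x ** 2), x0=np.ones(3), T0=1.0)
```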

There exist numerous variants of simulated annealing, each of which represents an attempt to address the following concerns:

How is the initial value of T chosen, and how is T lowered?

How are the steps generated?

Most variants of simulated annealing require the user to answer these questions by providing the values of a set of parameters that control the course of optimization; for example, for the temperature, the user may be required to specify an initial value, a factor between 0 and 1 by which T is multiplied periodically, and a parameter used to control the length of this period. The parameters for tuning the step generation process are even more important. Determining appropriate values for all of these parameters is often accomplished through trial and error, which is prohibitively expensive for all but the simplest problems. Furthermore, many variants of annealing do not possess the desirable properties of invariance under linear transformations of the coordinates or of the merit function.
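
As an example of the kind of hand-tuned schedule such variants require, the fragment below generates a temperature sequence from three user-supplied parameters; the parameter names and default values are arbitrary and serve only to illustrate the tuning burden described above.

```python
def geometric_schedule(T0=10.0, alpha=0.95, period=100, n_steps=10_000):
    """A typical hand-tuned cooling schedule: an initial temperature T0, a
    factor 0 < alpha < 1, and a period (in steps) after which T is multiplied
    by alpha.  Yields the temperature to use at each step."""
    T = T0
    for step in range(n_steps):
        if step > 0 and step % period == 0:
            T *= alpha
        yield T

# temperatures = list(geometric_schedule())
```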

Adaptive Simulated Annealing

For these reasons, we have developed adaptive controls for all of the parameters in simulated annealing [3]. The performance of the resulting algorithm, which we call Adaptive Simulated Annealing (or just ASA), is invariant under linear transformations of both the merit function and, more importantly, of the coordinates in the configuration space. ASA requires the specification of only a single parameter, the annealing rate, which determines the average rate at which T is lowered. Given a value for this parameter, all else is automatically tuned to the particular problem at hand. Reducing the annealing rate slows the cooling (and hence increases the run time) and makes the search for the global minimum more thorough.

It follows from the earlier discussion of Monte Carlo algorithms that the effectiveness of any adaptive mechanism here will be crucially dependent upon the automatic control of the statistics of the random step generator. First, a fundamental requirement of simulated annealing is that the step distribution must be symmetric. That is, a step s is just as likely as the step –s. We have chosen to use a Gaussian distribution for the steps. In two dimensions, a general Gaussian distribution resembles an elliptical cloud that becomes less dense away from its center. The proportions, orientation, and size of the ellipse are the key parameters in this case. It is clear that, as the value of T is lowered, the region being explored in the configuration space is reduced. It therefore seems reasonable to expect that the optimal Gaussian for the steps should also be modified as the annealing process advances. If the step size is too large for the current value of T, almost all of the attempted moves are rejected and the process stagnates. On the other hand, if the steps are too small, the walk may cover only a fraction of the configuration space that should be explored at the current temperature for optimal efficiency. It also appears reasonable to expect that the relative shape and orientation of the Gaussian cloud will need to be adjusted continually in order to maintain optimal efficiency. We have proposed a simple procedure that automatically adjusts the scale, shape, and orientation of the n-dimensional Gaussian at each stage during annealing.
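
One standard way to draw steps from such an elliptical Gaussian cloud (not necessarily the mechanism used inside ASA, which is described next) is to apply a Cholesky factor of a covariance matrix to uncorrelated normal samples; the covariance then carries the size, proportions, and orientation of the ellipse.

```python
import numpy as np

def gaussian_step(cov, rng=np.random.default_rng()):
    """Draw one step from a zero-mean Gaussian with covariance `cov`.  The
    covariance encodes the size, proportions, and orientation of the elliptical
    step cloud; the distribution is symmetric, so a step s is exactly as likely
    as -s."""
    L = np.linalg.cholesky(cov)              # cov = L @ L.T
    return L @ rng.standard_normal(cov.shape[0])

# Example: an elongated two-dimensional step cloud.
# step = gaussian_step(np.diag([1.0, 0.25]))
```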

The basic idea for step control in ASA is based on what is called the central limit theorem. This theorem states that, if you average a collection of independent random numbers, the result is a random number with a distribution that is roughly a (one-dimensional) Gaussian. This holds regardless of the distributions of the individual random numbers and, if the combination is scaled appropriately, its variance (i.e., its mean-square spread) is just the average of the variances of the individual random numbers. As a result, it turns out that a Gaussian multi-dimensional step generator can be realized by taking linear combinations of a collection of random vectors.
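
A quick numerical check of this statement is given below; the uniform distributions and the 1/sqrt(m) scaling of the sum are choices made only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
m, trials = 12, 100_000

# m independent, decidedly non-Gaussian random numbers (uniform on [-1, 1],
# each with variance 1/3), combined with a 1/sqrt(m) scaling so that the
# variance of the combination equals the average of the individual variances.
samples = rng.uniform(-1.0, 1.0, size=(trials, m)).sum(axis=1) / np.sqrt(m)

print(samples.var())   # ~1/3, the average of the individual variances
# A histogram of `samples` is close to a Gaussian, as the central limit
# theorem predicts, even though each term is uniform.
```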

In ASA, the idea is to keep a record of the last m steps that have been accepted and to generate new steps by taking random linear combinations of these steps and scaling the result by an appropriate (dynamically adjusted) expansion factor. In this way, the statistics of the generated steps are directly coupled to both the behavior of the merit function and the current value of T. This scheme turns out to be not only effective but simple: without expensive matrix operations, the algorithm automatically adjusts to the multi-dimensional terrain in a way that is invariant under arbitrary linear transformations that scale and stretch the space in a manner that warps rectangular prisms to rotated, elongated parallelepipeds. Within the context of lens design, this invariance is of utmost importance since there is no natural measure of distance in the configuration space. (Recall that the coordinates are normally a mixture of refractive indices, thicknesses, curvatures, aspheric coefficients, etc., and there is no intuitive way to define the distance between two points in such a space.)
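
The sketch below illustrates this idea: new steps are random linear combinations of the last m accepted steps, scaled by an expansion factor. The +/-1 coefficients and the simple acceptance-based update of the expansion factor are placeholder choices made for illustration; they are not the actual ASA rules.

```python
import numpy as np

class StepGenerator:
    """Generate trial steps as random linear combinations of the last m
    accepted steps, scaled by an expansion factor.  A sketch of the idea only:
    the +/-1 coefficients and the acceptance-based update of the expansion
    factor are placeholder choices, not the actual ASA rules."""

    def __init__(self, initial_steps, expansion=2.0, rng=None):
        self.memory = list(initial_steps)     # the last m accepted steps
        self.expansion = expansion
        self.rng = rng if rng is not None else np.random.default_rng()

    def propose(self):
        m = len(self.memory)
        coeffs = self.rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)
        # By the central limit theorem this combination is roughly Gaussian,
        # with shape and orientation inherited from the accepted steps.
        return self.expansion * sum(c * h for c, h in zip(coeffs, self.memory))

    def record(self, step, accepted):
        if accepted:
            self.memory.pop(0)                # keep only the last m accepted steps
            self.memory.append(step)
            self.expansion *= 1.05            # grow when moves are being accepted...
        else:
            self.expansion *= 0.98            # ...shrink when they are being rejected

# Usage sketch: seed the memory with a few small random steps.
rng = np.random.default_rng(0)
gen = StepGenerator([0.01 * rng.standard_normal(4) for _ in range(8)])
trial = gen.propose()
gen.record(trial, accepted=True)
```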

Another important aspect of the guided random walk is the means for dealing with steps that take the system outside the region of interest. That is, if one or more constraints are violated by the proposed system, the step cannot be accepted. Efficiency is enhanced if, instead of simply rejecting such a system and requesting a new random step, the point at which the current step first crosses a constraint boundary is found and the step is then reflected off this constraint back into the region of interest. Without this reflection process, the regions near constraints turn out to be undersampled and this can adversely affect the overall efficiency.

For optimal efficiency it is important that the details of this process of reflection take a form that ensures invariance under linear transformations of the coordinates. Since our intuition is based in a space where there is a natural measure of distance, it seems unambiguous to state that a step should be reflected in a given plane as if it were a mirror. This is not so, however. The difficulty can be appreciated by observing that the idea of a normal to a plane surface is not invariant under linear transformations. To see this, consider two perpendicular lines on a page and imagine a scale transformation that stretches the plane along a direction that is not parallel to either of the lines. As a result, the lines will no longer be perpendicular. It follows that, to complete the specification of the reflection process, it is necessary to introduce a particular measure of distance — i.e., a "metric" — to the coordinate space.

In ASA, there is a natural metric defined by the elliptical Gaussian cloud of the step distribution: the distance between two points can be measured in terms of the number of steps (of mean size in that direction) required to move from one point to the other. Notice that, due to the coupling to the step distribution, the form of this metric evolves during the annealing process. Now, if the space is redrawn by reference to this metric (which amounts to a linear transformation) the elliptical Gaussian cloud takes the form of a sphere and this gives the natural representation in which to bounce off constraints as if they were mirrors. With this, the essential pieces of the crucial step-generation component of ASA are completely determined.
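
For a single linear constraint a·x <= b, the reflection just described can be sketched as follows. The covariance of the step distribution supplies the metric, so the formula below is a mirror reflection in the coordinates in which the Gaussian step cloud is spherical; the function and variable names are mine, not OSLO's.

```python
import numpy as np

def reflect_step(x, step, a, b, cov):
    """Take `step` from `x` subject to the linear constraint a.x <= b.  If the
    step crosses the boundary, walk to the crossing point and reflect the
    remainder back into the feasible region, using the metric defined by the
    step covariance `cov` (equivalent to a mirror reflection after the space
    has been transformed so that the step cloud is spherical)."""
    x_new = x + step
    if a @ x_new <= b:                       # constraint satisfied: plain step
        return x_new
    t = (b - a @ x) / (a @ step)             # fraction of the step to the boundary
    x_hit = x + t * step                     # point where the boundary is crossed
    remainder = (1.0 - t) * step
    n = cov @ a                              # boundary "normal" in the given metric
    reflected = remainder - 2.0 * (a @ remainder) / (a @ n) * n
    return x_hit + reflected
```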

The details of the temperature control aspects of ASA are more sophisticated, but it is intuitive that, for a given merit function, there will be particular ranges for the value of T during which a relative reduction in the rate of decrease of T will lead to better overall efficiency. This aspect of the adaptiveness in ASA is controlled by monitoring certain statistics of the recent stages of the search. The process is terminated when the typical relative change in the current merit function value falls below a certain level. On any given problem, ASA should be run as many times as is workable and the designs that emerge can then be reoptimized and fine-tuned by using the "Iterate full" or "Iterate standard" commands. There is a range of context-specific matters that must be addressed in applying an algorithm like ASA to lens design problems. One of these issues is the proper choice of non-linear coordinate transformations to enhance efficiency.

The general outline presented here should give some appreciation of the philosophy underlying the design of ASA. It is important not to forget that typical design problems are challenging tasks and the success of an automated algorithm is still dependent upon the designer's guidance. No longer is the designer asked for a starting point, but now the designer is asked for the specification of the region to be explored. We have enjoyed some impressive successes with ASA in lens design [4-6] and look forward to hearing of your experiences with this new algorithm in your own work.

References

1. D. Shafer, Laser Focus World, 135 (Jan. 1993).

2. I. O. Bohachevsky et al., Proc. SPIE 485, 104 (1984).

3. A. E. W. Jones and G. W. Forbes, J. Global Optimization 6, 1 (1995).

4. G. W. Forbes and A. E. W. Jones, Proc. SPIE 1354, 144 (1990).

5. G. W. Forbes and A. E. W. Jones, Optics and Photonics News 3, 22 (March 1992).

6. G. W. Forbes and A. E. W. Jones, OSA Proceedings 22, 42 (1994).
