Suppose you have function g(x) and you want to find x so that g(x) = 0. However, you don’t have direct access to g(x). Instead you can evaluate f(x) which is g(x) plus random noise. Typically f(x) would be an approximation of g(x) based on Monte Carlo simulation. I have spent countless hours solving problems like this.
One difficulty is that the value of f(x) will change every time you compute it. You might hunt down x and be sure you have it braketed between two values, only to find out you were wrong when you compute f(x) more accurately, say with more Monte Carlo steps.
Assume g(x), i.e. f(x) without noise, is monotone increasing. Assume also that f(x) is expensive to compute. Say each evaluation takes 20 minutes. The amplitude of the noise may vary with x.
Here’s a simple approach to trying to find where f(x) is zero. Use a bisection method, but stop when it looks like you’ve come as close as you can due to noise. If the monotonicity assumption is violated, use that as a signal that you’re within the resolution of the noise.
# Assume a < b, fa = f(a) < 0, fb = f(b) > 0. def noisy_bisect(f, a, b, fa, fb, tolerance): if b-a < tolerance: return (a, b) mid = 0.5*(a+b) fmid = f(mid) if fmid < fa or fmid > fb: # Monotonicity violated. # Reached resolution of noise. return (a, b) if fmid < 0: a, fa = mid, fmid else: b, fb = mid, fmid return noisy_bisect(f, a, b, fa, fb, tolerance)
The values of f(a) and f(b) are saved to avoid recalculation since they’re expensive to obtain.
You could experiment with this function as follows.
from scipy.stats import norm from time import sleep # Gaussian distribution w/ mean 0 and standard deviation 0.01 noise = norm(0, 0.01) def f(x): sleep(10) # wait 10 seconds return x**2 - 1 + noise.rvs() a = 0 b = 3 fa = f(a) fb = f(b) print noisy_bisect(f, a, b, fa, fb, 0.0001)
Without noise, the code would iterate until the zero is bracketed in an interval of width 0.0001. However, since the noise in on the order of 0.01, the method will typically stop when the bracket is a on the order of 0.01 wide, maybe a little smaller.
We could make the root-finding software a little faster by using a secant method rather than a bisection method. Instead of taking the midpoint, the secant method draws a line between (a, f(a)) and (b, f(b)) and uses the point where the line crosses the x-axis as the next guess.
A more sophisticated approach would be to increase the accuracy of f(x) over time. If f(x) is evaluated using N steps of a Monte Carlo simulation, the standard deviation of the noise is roughly proportional to 1/√N. So, for example, cutting the noise in half requires multiplying N by 4. You could get a rough bracket on the zero using a small value of N, then increase N as the bracket gets smaller. You could play with this by using a function like the following.
def f(x, N): sleep(N) # wait N seconds noise = norm(0, 0.1/sqrt(N)) return x**2 - 1 + noise.rvs()
You can make the noise as small as you’d like, but at a price. And the price goes up faster than the noise goes down, so it pays to not reduce the noise more than necessary.
11 thoughts on “Root-finding with noisy functions”
Well, in most cases, you have no a priori idea of a noise model. That being said, this only works if the noise is zero mean. Why would the noise be zero mean?
MathDr: Monte Carlo estimates are often either unbiased, or the bias goes to zero asymptotically, so it’s realistic to assume the noise has zero mean.
Since calculation of ‘f’ is so expensive, why not to use faster algorithms like Brent or Ridder? Their rate of convergence is superlinear and they are already implemented in Scipy.
I understand, that maybe this is not the point that you want to make, but since you are giving advice for root-finding, why not to mention the best alternatives?
How about using a form of robust regression on the closest m function evaluations to fit the location of the zero? Pro: uses hard-earned Monte Carlo information more efficiently. Con: more complicated to set up, and possibly more fragile as a result.
Panos: If f(x) were expensive to evaluate but noiseless, I would use Brent’s method as implemented in
scipy.optimize.fminbound. But noise adds an extra complication. I wrote up a bisection method to show a simple way to deal with noise.
You could use Brent’s method if you had an estimate of the accuracy of f(x), terminating the method when you thought you were getting down to the noise level. (An advantage of the noisy bisection method is that it determines for itself when it has reached this level.) So you might set the noise level fairly high, do a few steps of Brent’s method, turn down the noise and run a few more steps, etc.
Iain: I like your idea.
Like Iain, I like the idea of a local fit of the data for precise root finding. I am not sure about the details, though (size of the fitting x interval, estimation of the noise, etc.).
Another idea would be to use Newton’s method (or some other method that uses derivatives) with a differentiation method that can handle noisy functions. An example of such method can be found at http://math.lanl.gov/Research/Highlights/tvdiff.shtml; I don’t know whether this method can be readily applied to the problem of root finding (through, say, Newton’s method), but it looks promising.
PS: The bisection code could also be more simply implemented with a simple loop (instead of recursion).
Eric: Differentiation and noise generally don’t mix, so I’m curious what the LANL paper is getting at. I’ll take a look.
The secant method is the name for a method that works like Newton’s method using the secant line as an estimate of the derivative, whereas there is a method more like the bracketing method of bisection which uses the secant line as the next guess point for the bracket. This is often called the “method of false position” or “regula falsi”, the difference being that secant method always keeps the last two locations, and regula falsi looks at the new position to find the new bracket. The bracketing method is likely to be better in the presence of noise. However all of this is moot unless your function is really one dimensional, and many of the problems with real applications have multidimensional inputs.
local fitting *is* a method for finding smooth derivatives in the presence of noise, so it makes sense to me to try a local fit together with a derivative based root finding approach. It would be interesting to express the local fit in terms of various coefficients considered as hyperparameters of your model which themselves have bayesian posterior distributions, the addition of another data point evaluation of the objective function could then be used in a bayesian updating scheme to improve the knowledge of the coefficients of the local fit.
The uncertainty in the location of the root could be computed directly using samples from the distribution of local fit functions via sampling from the parameter posterior distributions. sort of a hierarchical model for the zero. You could stop when the uncertainty in the location of the zero is small enough for your purposes (or doesn’t decrease when you add a few more points from your objective function)
Of course this adds an additional monte-carlo process, but presumably the monte carlo process of sampling from the coefficients of your fit is faster than the monte carlo process of calculating the objective function.
now that I think about it, I should really write up this proposed method in my own blog…
This problem (finding the root of the expected value of a univariate noisy function) is perhaps the early instance of a wide class of problems falling unter the moniker of Stochastic Approximation (http://en.wikipedia.org/wiki/Stochastic_approximation). The earliest algorithm is the Robbins-Monro algorithm (the original is available online: http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=euclid.aoms/1177729586). Since 1951 many improvements have been made. Kushner and Yin’s “Stochastic Approximation” is a relatively up-to-date survey of the subject.
Incidentally, this type of problems finds applications in many fields: Asymptotic Statistics (e.g., work by Robbins and Lai’s), queueing theory (Sean Meyn), simulation (Peter Glynn), consensus formation in Economics (Tsitsiklis), and learning of massive data sets (Bottou, Tsybakov).
This is the classic “optimization of simulation output” problem in a slightly different guise. Very nasty, especially in multiple dimensions.
One approach I love is the one Lee Schruben came up with at Cornell in the ’80s. Within your Monte Carlo simulation, add periodic (e.g. sinusoidal) noise to the independent variables, with relatively prime driving frequencies. Then do a fourier transform of the output. The relationships between the driving frequencies and the periodic signal in the output provide information about the general shape of the response surface. You can then use an optimization technique (or root-finding technique) that is best adapted to functions with that shape.
This works best if your simulation is steady-state. If it really is one 20 minute independent run per evaluation, unrelated to any other evaluations, this method isn’t directly applicable.