Code katas taken more literally

Karate class

Code katas are programming exercises intended to develop programming skills, analogous to the way katas develop martial art skills.

But literal katas are choreographed. They are rituals rather than problem-solving exercises. There may be an element of problem solving, such as figuring how to better execute the prescribed movements, but katas are rehearsal rather than improvisation.

CodeKata.com brings up the analogy to musical practice in the opening paragraph of the home page. But musical practice is also more ritual than problem-solving, at least for classical music. A musician might go through major and minor scales in all 12 keys, then maybe a chromatic scale over the range of the instrument, then two different whole-tone scales, etc.

A code kata would be more like a jazz musician improvising a different melody to the same chord changes every day. (Richie Cole would show off by improvising over the chord changes to Cherokee in all twelve keys. I don’t know whether this was a ritual for him or something he would pull out for performances.)

This brings up a couple questions. What would a more literal analog of katas look like for programming? Would these be useful?

I could imagine someone going through a prescribed sequence of keystrokes that exercise a set of software features that they wanted to keep top of mind, sorta like practicing penmanship by writing out a pangram.

This is admittedly kind of an odd idea. It makes sense that the kinds of exercises programmers are interested in require problem solving rather than recall. But maybe it would appeal to some people.

***

Image “karate training” by Genista is licensed under CC BY-SA 2.0 .

Seconds to hours

Suppose you have a number of seconds n and you want to convert it to hours and seconds. If you divide n by 3600, the quotient is the number of hours and the remainder is the number of seconds.

For example, suppose n = 872022. Here’s the obvious way to do the calculation in Python:

    def div3600(n): return (n // 3600, n % 3600)

This returns (242, 822) if we pass in 872022, and so 872022 seconds equals 242 hours and 822 seconds.

However, as is often the case, the most obvious approach is not the fastest. In general, division is much slower than multiplication and addition, though division by powers of 2 is very fast. So we could speed up our calculation if we had a way to change the problem to one that requires dividing by powers of 2 rather than dividing by 3600.

Here’s another Python program that computes the same result a very different way.

    def div3600(n):
        assert(0 <= n < 2255761) 
        t = 1193047*n 
        q = t >> 32
        r = 3600*(t & 4294967295) >> 32
        assert(q == n // 3600)
        assert(r == n % 3600)
        return (q, r)

This algorithm does not work for all n, hence the assert statement at the top of the function verifying that n is in range. Notice that there are no explicit division statements. There is a bit shift, which amounts to a division by 2^32, and there is a bitwise and, which amounts to taking the remainder after dividing by 2^32.
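Here’s a quick sanity check of the range claim. This is a sketch of my own, not something taken from the paper: the name div3600_fast and the brute-force search are just for illustration. It compares the shift-and-mask version against Python’s divmod and looks for the first n where the two disagree.

```python
def div3600_fast(n):
    # Same arithmetic as above, without the asserts.
    t = 1193047 * n
    q = t >> 32                          # t // 2^32
    r = (3600 * (t & 0xFFFFFFFF)) >> 32  # 3600 * (t % 2^32) // 2^32
    return (q, r)

# The multiplier is 2^32 / 3600 rounded up to the next integer.
assert 1193047 == -(-2**32 // 3600)

# Search a little past the stated limit for the first disagreement.
LIMIT = 2255761
first_bad = next(
    (n for n in range(LIMIT + 1000) if div3600_fast(n) != divmod(n, 3600)),
    None,
)
print(first_bad)
```

This prints 2255761, so the bound in the assert statement is as large as it can be.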

The Python code is a prototype. If we cared so much about performance that we would consider using this algorithm, we shouldn’t be using Python. Here is a C implementation that would be more realistic.

    void div3600(long n, long* q, long* r) {
        long t = 1193047*n;
        *q = t >> 32;
        *r = 3600*(t & 4294967295) >> 32;
    }

The algorithm presented above is an example from [1]. In that paper the authors study integer functions of the form

(αr + β) // δ

and

(αr + β) % δ

and their application to speeding up software [2], especially calendar algorithms. The authors include benchmarks in several programming languages that verify that their algorithms are indeed significantly faster than the direct approach.

Of course most applications would do well to use direct calculations which are obviously correct. But a large scale application that does these kinds of calculations over and over might benefit from an obfuscated but more efficient algorithm.

***

[1] Cassio Neri and Lorenz Schneider. Euclidean Affine Functions and Applications to Calendar Algorithms. Arxiv 2102.06959v1.

[2] Here I’ve used Python’s integer division notation //. The authors use C notation, where / means quotient because the context is integer arguments. I prefer Python’s notation because it is more explicit.

Memorizing Planck’s constant with DALL-E

Planck’s constant used to be a measured quantity and now it is exact by definition.

h = 6.62607015 × 10^−34 J / Hz

Rather than the kilogram being implicit in the units used to measure Planck’s constant, the mass of a kilogram is now defined to be whatever it has to be to make Planck’s constant have the value above.

Now that it’s exact by definition, maybe you’d like to memorize it. Using the Major system described here we could encode the digits as “Judge enjoys quesadilla.” [1]

As with the previous post, I’m using a memorization exercise as an excuse to play around with DALL-E. I typed “A judge enjoying eating a quesadilla” into DALL-E 2 and got back four images, as always. The best of these was the following.

First attempt at judge enjoying a quesadilla

The food in the image looks like a quesadilla, but it’s not clear that the man eating it is a judge, or that he’s enjoying himself.

Next I changed “judge” to “a supreme court justice,” hoping DALL-E would create an image that more obviously features a judge.

Here’s one of the outputs:

Second attempt at judge enjoying a quesadilla

This fellow looks more like a judge, and he’s obviously enjoying himself. Maybe he’s eating a calzone, but we’ll call it a quesadilla.

Not all the images created by DALL-E are as accurate as the ones above. I suspect there’s a lot of selection bias in the examples of images posted online. I’m contributing to that selection bias by showing images that were good enough to include in a blog post. I tried other images for blogging on other topics, and the results were not worth sharing.

So in an attempt at mitigating selection bias, here’s another image generated from the prompt “A supreme court justice enjoying eating a quesadilla.”

Young lady who is not eating anything

This young lady is wearing black, as supreme court justices are wont to do. And she appears to be enjoying herself, but she’s definitely not eating a quesadilla.

Incidentally, another possible encoding of 662607015 is “Judge enjoys Costello” as in Abbott and Costello. When I typed “A supreme court justice enjoying watching Abbot and Costello on television” I got the following creepy image.

DALL-E attempt at a judge watching Abbott and Costello on television

[1] This mnemonic is a little bit of a cheat, depending on how you pronounce quesadilla. The sound of ll is sorta like that of a y in English. Here I’m using it to represent 5 just as the l sound does. Here in southeast Texas, I believe most people use the Spanish pronunciation, at least approximately. If you completely anglicize the pronunciation so that ll is pronounced as in pillow, then you can use the mnemonic with no qualms.

DALL-E 2 and mnemonic images

I recently got an account for using OpenAI’s DALL-E 2 image generator. The example images I’ve seen are sorta surreal combinations of common words, and that made me think of the Major memory system.

I’ve written about the Major system before. For example, I give an overview here and I describe how to use it to memorize an ASCII table here. In a nutshell, there are consonant sounds associated with each digit. Choose consonant sounds and add any vowel sounds you like to make words you can visualize.

There are a couple ways people use the Major system. One is simply to memorize numbers. Any encoding that leads to something you find easy to remember is OK. For example, suppose you want to encode 19. The consonant sounds for 1 are t, th, and d, and the consonant sounds for 9 are p and b. So you could encode 19 as adobe, Debbie, Ethiopia, tuba, etc.
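To make the decoding direction concrete, here’s a rough sketch in Python. It cheats: the Major system is phonetic, and this code only looks at letters. The letter table and the rule of collapsing doubled letters are simplifying assumptions of mine, and ambiguous letters like c and g are left out entirely, so it will get many English words wrong. It does handle the examples in this post.

```python
# Letter-based approximation of the Major system (an assumption for
# illustration; the real system maps sounds, not spellings).
DIGIT_FOR_LETTER = {
    "s": "0", "z": "0",
    "t": "1", "d": "1",
    "n": "2",
    "m": "3",
    "r": "4",
    "l": "5",
    "j": "6",
    "k": "7", "q": "7",
    "f": "8", "v": "8",
    "p": "9", "b": "9",
}

def decode(word):
    """Return the digit string for a word, ignoring vowels, h, w, and y."""
    digits = []
    prev = ""
    for ch in word.lower():
        # A doubled letter such as the "bb" in Debbie is one sound.
        if ch != prev and ch in DIGIT_FOR_LETTER:
            digits.append(DIGIT_FOR_LETTER[ch])
        prev = ch
    return "".join(digits)

for word in ["adobe", "Debbie", "Ethiopia", "tuba"]:
    print(word, decode(word))
```

All four words decode to 19. The "th" in Ethiopia works only because the t maps to 1 and the h is skipped.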

The other way people use the Major system is to memorize specific pegs for numbers. For example, you might choose tuba as your peg for 19. To memorize a list, you associate each item with its peg, such as associating the 19th item with a tuba. Pegs have to be unique so you can pull up a particular mental image to recall a list item, such as remembering what you associated with a tuba.

For example, suppose you wanted to memorize a list of the US presidents. The 19th president was Rutherford B. Hayes, and so you might want to imagine him playing a tuba. I uploaded a photo of Hayes and asked DALL-E to make him play a tuba. The software rejected my request, saying that realistic photos of persons are not allowed at this time.

Rutherford B. Hayes playing a tuba

DALL-E knows about some people but not others. For example, it doesn’t know who Evariste Galois is, but apparently it has some idea who Rutherford B. Hayes is. When I asked for “Rutherford B. Hayes playing a tuba” it came back with the image above.

Franklin Delano Roosevelt was the 32nd president. The consonant sound for 3 is m and the consonant sound for 2 is n. Suppose your peg for 32 is moon, and you’d like to imagine FDR looking up at the moon. When I asked DALL-E to make an image of this, I got a very strange image of FDR, but he was looking up at the moon.

FDR looking up at the moon

The only US president to serve two non-consecutive terms was Grover Cleveland, the 22nd and 24th president. I asked DALL-E for an image of Grover the Muppet holding an onion (22) in one hand and a wiener dog (24) in the other [1]. The result was not great.

Blue dog holding cucumber dog?

I thought Grover the Muppet would be more memorable than Grover Cleveland himself. But DALL-E did better with Mr. Cleveland. Maybe there’s some copyright issue with the muppets?

Grover Cleveland with an onion-banana and a dog

Well, he does have an onion, with something weird underneath. Bananas? Eggplant? Cow udders? And he has a dog, though not a wiener dog.

Creating your own mental images is far more efficient than having DALL-E come up with images for you, but the DALL-E images are useful examples of what you might imagine for yourself.

[1] The Major system doesn’t use w and so you can throw it in as you would a vowel. So wiener decodes as n and r, 24.

Naming probability functions

Given a random variable X, you often want to compute the probability that X will take on a value less than x or greater than x. Define the functions

F_X(x) = Prob(X ≤ x)

and

G_X(x) = Prob(X > x)

What do you call F and G? I tend to call them the CDF (cumulative distribution function) and CCDF (complementary cumulative distribution function) but conventions vary.

The names of software functions to compute these two functions can be confusing. For example, Python (SciPy) uses the names cdf and sf (the latter for “survival function”) while the R functions to compute the CDF take an optional argument to return the CCDF instead [1].

In the Emacs calculator, the function ltpn computes the CDF. At first glance I thought this was horribly cryptic. It’s actually a very nice naming convention; it just wasn’t what I was expecting.

The “ltp” stands for lower tail probability and “n” stands for normal. The complementary probability function is utpn where “utp” stands for upper tail probability. Unlike other software libraries, Emacs gives symmetric names to these two symmetrically related functions.

“Lower tail” probability is clearer than “cumulative” probability because it leaves no doubt whether you’re accumulating from the left or the right.
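Here’s a small Python sketch that borrows the Emacs names. It is only an illustration: these two functions are my own, written with the standard library’s erfc, and they handle only the standard normal distribution. It also shows why a separate upper tail function matters, the point made in footnote [1].

```python
from math import erfc, sqrt

def ltpn(x):
    """Lower tail probability Prob(X <= x) for a standard normal X."""
    return 0.5 * erfc(-x / sqrt(2))

def utpn(x):
    """Upper tail probability Prob(X > x) for a standard normal X."""
    return 0.5 * erfc(x / sqrt(2))

x = 10
print(1 - ltpn(x))  # the subtraction loses the answer to rounding
print(utpn(x))      # computed directly, the tiny probability survives
```

The first line printed is 0.0 because ltpn(10) rounds to exactly 1 in double precision. The second is a very small positive number.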

You can replace the “n” at the end of ltpn and utpn with the first letters of binomial, chi-square, t, F, and Poisson to get the corresponding functions for these distributions. For example, utpt gives the upper tail probability for the Student t distribution [2].

The Emacs calculator can be quirky, but props to the developers for choosing good names for the probability functions.

[1] Incidentally, the CCDF cannot always be computed by simply computing the CDF first and subtracting the value from 1. In theory this is possible, but not in floating point practice. See the discussion of erf and erfc in this post for an explanation.

[2] These names are very short and only a handful of distribution families are supported. But that’s fine in context. The reason to use the Emacs calculator is to do a quick calculation without having to leave Emacs, not to develop production quality statistical software.

Floating point inverses and stability

Let f be a monotone, strictly convex function on a real interval I and let g be its inverse. For example, we could have f(x) = e^x and g(x) = log x.

Now suppose we round our results to N digits. That is, instead of working with f and g we actually work with f_N and g_N where

f_N(x) = round(f(x), N)

and

g_N(x) = round(g(x), N)

and round(y, N) is the number y rounded to N significant figures [1].

This is what happens when we implement our functions f and g in floating point arithmetic. We don’t actually get the values of f and g but the values of f_N and g_N.

We assumed that f and g are inverses, but in general f_N and g_N will not be exact inverses. And yet in some sense the functions f_N and g_N are like inverses. Harold Diamond [2] proved that if we go back and forth applying f_N and g_N, then after two round trips the values quit changing.

To make this more precise, define

h_N(x) = g_N( f_N(x) ).

In general, h_N(x) does not equal x, but we do have the following:

h_N( h_N( h_N(x) ) ) = h_N( h_N(x) ).

The value we get out of h_N(x) might not equal x, but after we’ve applied h_N twice, the value stays the same if we apply h_N more times.

Connection to connections

Diamond’s stability theorem looks a lot like a theorem about Galois connections. My first reaction was that Diamond’s theorem was simply a special case of a more general theorem about Galois connections, but it cannot be.

A pair of monotone functions F and G form a Galois connection if for all a in the domain of F and for all b in the domain of G,

F(a) ≤ b ⇔ a ≤ G(b).

Let F and G form a Galois connection and define

H(x) = G( F(x) ).

Then

H( H(x) ) = H(x).

This result is analogous to Diamond’s result, and stronger. It says we get stability after just one round trip rather than two.
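A small concrete instance may help. The choice below is my own illustration, not something from Diamond’s paper: take F to be the ceiling function from reals to integers and G to be the inclusion of the integers back into the reals. The sketch checks the defining property on a grid of values and then checks that one round trip is enough.

```python
import math

def F(a):
    return math.ceil(a)   # reals -> integers

def G(b):
    return float(b)       # integers -> reals (inclusion)

def H(x):
    return G(F(x))

reals = [k / 4 for k in range(-20, 21)]   # -5.0, -4.75, ..., 5.0
ints = range(-6, 7)

# Defining property: F(a) <= b if and only if a <= G(b).
assert all((F(a) <= b) == (a <= G(b)) for a in reals for b in ints)

# One round trip stabilizes: H(H(x)) == H(x).
assert all(H(H(x)) == H(x) for x in reals)

print(H(2.25), H(H(2.25)))
```

Both values printed are 3.0.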

The hitch is that although the functions f and g form a Galois connection, the functions f_N and g_N do not. Nevertheless, Diamond proved that f_N and g_N form some sort of weaker numerical analog of a Galois connection.

Example

The following example comes from [2]. Note that the example rounds to two significant figures, not two decimal places.

    from decimal import getcontext, Decimal
    
    # round to two significant figures
    getcontext().prec = 2
    def round(x): return float( Decimal(x) + 0 )
    
    def f(x):  return 115 - 35/(x - 97)
    def f2(x): return round(f(x))
    def g(x):  return 97 + 35/(115 - x)
    def g2(x): return round(g(x))
    def h2(x): return g2(f2(x))
    
    N = 110
    print(h2(N), h2(h2(N)), h2(h2(h2(N))))

This prints

    100.0 99.0 99.0

showing that the function h2 satisfies Diamond’s theorem but does not satisfy the identity above for Galois connections. That is, we stabilize after two round trips but not after just one round trip.

[1] Our “digits” here need not be base 10 digits. The stability theorem applies in any radix b provided b^N ≥ 3.

[2] Harold G. Diamond. Stability of Rounded Off Inverses Under Iteration. Mathematics of Computation, Volume 32, Number 141. January 1978, pp. 227–232.

 

Inline computed content in org-mode

The previous post discussed how to use org-mode as a notebook. You can have blocks of code and blocks of results, analogous to cells in a Jupyter notebook. The code and the results export as obvious blocks when you export the org file to another format, such as LaTeX or HTML. And that’s fine for a notebook.

Now suppose you want to do something more subtle. You want to splice in the result of a computed value without being obvious about it. Maybe you want to compute a value rather than directly enter it so that the document will remain consistent. Maybe you have a template and you want to set the parameters of the template at the top of the file.

Web development languages like PHP do this well. You can write a PHP file that is essentially an HTML file with pieces of code spliced in. You do this by inserting

    <?php … ?>

into the HTML code, and when the page is rendered the code between the <?php and ?> tags is replaced with the result of executing the code. We’d like to do something analogous in org-mode with org-babel. (org-babel is the subsystem of org-mode that interacts with code.)

Here’s an org-mode example that sets length and width as variables at the top of a file and multiplies them later in the body of the file to get area.

We define our variables as follows. The block is marked :exports none because we do not want to display the code or the values. We just want the code to run when we export the file.

    #+begin_src python :session :exports none
    length, width = 7, 13
    #+end_src

The following almost does what we want [1].

    Area equals src_python[:session]{length*width}.

This renders as

Area equals 91.

if we export our org file to HTML. The number 91 is typeset differently than the words before it. This would be more obvious if the computed value were a string rather than a number.

Org-mode is wrapping <code> tags around the computed result. If we were to export the org file to LaTeX it would wrap the result with \texttt{}. This is because, by default, the output of a computation is displayed as computer output, which is conventionally set in a monospace font like Courier. That’s fine in a technical document when we want to make it obvious that a calculation is a calculation, but typically not in a business context. You wouldn’t want, for example, to generate a letter that starts

Dear Michael,

with Michael’s name set in Courier, announcing that this is a form letter.

The fix is to add :results raw to the header arguments, the part in square brackets between src_python and the Python code.

    Area equals src_python[:session :results raw]{length*width}.

Now the calculation result is reported “raw”, i.e. without any special markup surrounding it.

***

[1] In this example I’m using Python, and so I used the function src_python. org-babel supports dozens of languages, and each has its src_<language> counterpart.

Org-mode as a lightweight notebook

You can think of org-mode as simply a kind of markdown, a plain text file that can be exported to fancier formats such as HTML or PDF. It’s a lot more than that, but that’s a reasonable place to start.

Org-mode also integrates with source code. You can embed code in your file and have the code and/or the result of running the code appear when you export the file to another format.

Org-mode as notebook

You can use org-mode as a notebook, something like a Jupyter notebook, but much simpler. An org file is a plain text file, and you can execute embedded code right there in your editor. You don’t need a browser, and there’s no hidden state.

Here’s an example of mixing markup and code:

    The volume of an n-sphere of radius r is 

    $$\frac{\pi^{\frac{n}{2}}}{\Gamma\left(\frac{n}{2} + 1\right)}r^n.$$

    #+begin_src python :session
    from math import pi
    from scipy.special import gamma

    def vol(r, n):
        return pi**(n/2)*r**n/gamma(n/2 + 1)

    vol(1, 5)
    #+end_src

If you were to export the file to PDF, the equation for the volume of a sphere would be compiled into an image using LaTeX.

To run the code [1], put your cursor somewhere in the code block and type C-c C-c. When you do, the following lines will appear below your code.

    #+RESULTS:
    : 5.263789013914324

If you think of your org-mode file as primary, and you’re just inserting some code as a kind of scratch area, an advantage of org-mode is that you never leave your editor.

Jupyter notebooks

Now let’s compare that to a Jupyter notebook. Jupyter organizes everything by cells, and a cell can contain markup or code. So you could create a markup cell and enter the exact same introductory text [2].

    The volume of an n-sphere of radius r is 

    $$\frac{\pi^{\frac{n}{2}}}{\Gamma\left(\frac{n}{2} + 1\right)}r^n$$.

When you “run” the cell, the LaTeX is processed and you see the typeset expression rather than its LaTeX source. You can click on the cell to see the LaTeX code again.

Then you would enter the Python code in another cell. When you run the cell you see the result, much as in org-mode. And you could export your notebook to PDF as with org-mode.

File diff

Now suppose we make a couple small changes. We want the n and r in the comment section set in math italic, and we’d like to find the volume of a 5-sphere of radius 2 rather than radius 1. We do this, in Jupyter and in org-mode [3], by putting dollar signs around the “n” and the “r”, and we change vol(1, 5) to vol(2, 5).

Let’s run diff on the two versions of the org-mode file and on the two versions of the Jupyter notebook.

The differences in the org files are easy to spot:

    1c1
    < The volume of an n-sphere of radius r is 
    ---
    > The volume of an \(n\)-sphere of radius \(r\) is 
    12c12
    < vol(1, 5)
    ---
    > vol(2, 5)
    16c16,17
    < : 5.263789013914324
    ---
    > : 168.44124844525837

However, the differences in the Jupyter files are more complicated:

    5c5
    <    "id": "2a1b0bc4",
    ---
    >    "id": "a0a89fcf",
    8c8
    <     "The volume of an n-sphere of radius r is \n",
    ---
    >     "The volume of an $n$-sphere of radius $r$ is \n",
    15,16c15,16
    <    "execution_count": 1,
    <    "id": "888660a2",
    ---
    >    "execution_count": 2,
    >    "id": "1adcd8b1",
    22c22
    <        "5.263789013914324"
    ---
    >        "168.44124844525837"
    25c25
    <      "execution_count": 1,
    ---
    >      "execution_count": 2,
    37c37
    <     "vol(1, 5)"
    ---
    >     "vol(2, 5)"
    43c43
    <    "id": "f8d4d1b0",

There’s a lot of extra stuff in a Jupyter notebook. This is a trivial notebook, and more complex notebooks have more extra stuff. An apparently small change to the notebook can cause a large change in the underlying notebook file. This makes it difficult to track changes in a Jupyter notebook in a version control system.

[1] Before this will work, you have to tell Emacs that Python is one of the languages you want to run inside org-mode. I have the following line in my init file to tell Emacs that I want to be able to run Python, DITAA, R, and Perl.

    (org-babel-do-load-languages 'org-babel-load-languages '((python . t) (ditaa . t) (R . t) (perl . t)))

[2] Org-mode will let you use \[ and \] to bracket LaTeX code for a displayed equation, and it will also let you use $$. Jupyter only supports the latter.

[3] In org-mode, putting dollar signs around variables sometimes works and sometimes doesn’t. And in this example, it works for the “r” but not for the “n”. This is very annoying, but it can be fixed by using \( and \) to enter and leave math mode rather than use a dollar sign for both.

Not so fast

James Gregory’s series for π

\pi = 4\left(1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots \right)

is not so fast. It converges very slowly and so does not provide an efficient way to compute π. After summing half a million terms, we only get five correct decimal places. We can verify this with the following bc code.

    s = 0
    scale = 50
    for(k = 1; k <= 500000; k++) { 
        s += (-1)^(k-1)/(2*k-1) 
    }
    4*s

which returns

3.141590

which differs from π in the sixth decimal place. So does that mean there’s nothing interesting about Gregory’s formula? Not so fast!

When anyone speaks of a number of correct decimals, they nearly always mean number of consecutive correct digits following the decimal point. But for this post only I’ll take the term literally to mean the number of decimals that match the decimals in the correct answer.

The number of correct decimals (in this non-standard use of the term) in the series above is not so bad. Here’s the result, with carets marking the digits that differ from those of π:

    3.1415906535897932404626433832695028841972
           ^          ^^          ^          ^

So even though the sixth decimal value is wrong, the next 10 after that are correct, and then after a couple errors we get another string of correct digits.
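If you’d rather check this in Python than in bc, here’s a sketch using the decimal module. The function names and the hard-coded reference digits of π are my own choices for illustration. It sums the series with 60 significant digits and lists which of the first 40 decimal places disagree with π.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # working precision in significant digits

# Reference value of pi to 50 decimal places.
PI_DIGITS = "3.14159265358979323846264338327950288419716939937510"

def gregory(terms):
    """Partial sum of 4(1 - 1/3 + 1/5 - ...) with the given number of terms."""
    s = Decimal(0)
    sign = 1
    for k in range(1, terms + 1):
        s += Decimal(sign) / (2 * k - 1)
        sign = -sign
    return 4 * s

def mismatches(approx, places=40):
    """Decimal places (1-based) where approx and pi differ."""
    digits = str(approx)[2:2 + places]
    return [i + 1 for i, (a, b) in enumerate(zip(digits, PI_DIGITS[2:])) if a != b]

print(mismatches(gregory(500_000)))
```

This prints [6, 17, 18, 29, 40]. Only five of the first 40 decimals are wrong.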

In [1] the authors explain what makes this example tick, and show how to create similar sequences. For example, we see a similar pattern whenever the limit of the sum is half of a power of 10, but not so much for other limits. For example, let’s increase 500,000 to 600,000. We get

3.141590986923126572953384124016

which is completely wrong after the sixth digit. So even though the result is slightly more accurate, it has fewer correct decimals.

[1] Jonathan Borwein, Peter Borwein, Karl Dilcher. Pi, Euler Numbers, and Asymptotic Expansions. American Mathematical Monthly. vol 96, p. 681–687.

Complex AGM

The arithmetic-geometric mean (AGM) of two non-negative real numbers a and b is defined as the limit of the iteration starting with a_0 = a and b_0 = b and

a_{n+1} = ½ (a_n + b_n)
b_{n+1} = √(a_n b_n)

for n ≥ 0. This sequence converges very quickly and is useful in numerical algorithms. The limit can be expressed in terms of an elliptic function, and that elliptic function can then be related to other functions. See, for example, this post for how the AGM can be used to compute logarithms to arbitrary precision.
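To see how fast the convergence is in the real case, here’s a minimal sketch. The function name agm_real, the relative tolerance, and the starting values are my own choices for illustration.

```python
from math import sqrt

def agm_real(a, b, tol=1e-14):
    """AGM of non-negative reals a and b, plus the number of steps taken."""
    steps = 0
    # Stop when a and b agree to within a relative tolerance.
    while abs(a - b) > tol * max(a, b):
        a, b = 0.5 * (a + b), sqrt(a * b)
        steps += 1
    return a, steps

value, steps = agm_real(1.0, sqrt(2.0))
print(value, steps)
```

The gap between a and b is roughly squared at each step, so only a handful of iterations are needed to reach double precision.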

Since the AGM is useful in computing special functions, and we’re often interested in evaluating special functions at complex values, it’s natural to want to evaluate the AGM for complex numbers.

But we immediately run into a difficulty: which square root do we pick to find the new b?

For a non-negative real number x, √x is defined to be the non-negative real number y such that y² = x. But for more general values of x we have to choose a branch of the square root function. Which branch should we use in the AGM?

Often when we need to take the square root of complex numbers we can use the “principal branch,” the branch that gives positive values for positive real inputs and extends to the rest of the complex plane with the negative axis removed. If you compute the square root of a complex number in software, as we’ll do below, this is probably the value you’ll get by default.

But it turns out we cannot simply pick the principal branch. Gauss discovered two centuries ago that the right branch to take could vary at each step [0]. What does that even mean? How do we know we’ve made the “right” choice?

For one thing, we want our iteration to converge. And for another, we’d like it to converge to something non-zero if we start with non-zero inputs [1]. The right choice will guarantee this [2].

So what is this right choice? We provisionally update a and b as above, using either square root for b, and keep the value of b if

|a − b| < |a + b|

or

|a − b| = |a + b|

and the imaginary part of b/a is positive. In other words, choose the possible value of b that’s closer to a, and use the imaginary part of b/a as a tie breaker.

Here is the AGM implemented in Python, using the right square root at each step.

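    import numpy as np

    def right(a, b):
        # b is the right root if it is closer to a than -b is;
        # ties are broken by the sign of the imaginary part of b/a.
        d = abs(a + b) - abs(a - b)
        return d > 0 or (d == 0 and (b/a).imag > 0)

    def agm(a, b):
        while abs(a-b) > 1e-14:
            a1 = 0.5*(a+b)
            b1 = np.sqrt(a*b)  # principal square root
            if not right(a1, b1):
                b1 = -b1
            a, b = a1, b1
        # return a rather than a1, which is undefined if the loop never runs
        return a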

The test d == 0 should make you concerned. Testing floating point numbers for exact equality with zero is seldom the right thing to do, but we’ll gloss over numerical issues for this post.

Here’s an example, giving the a values of the iteration starting with 7 + 30i and 20 + 22i. The iteration converges in four steps.

    13.500000000000000 + 26.000000000000000j
    13.784944719026262 + 26.397404494892115j
    13.783557503026870 + 26.395953326186888j
    13.783557473769877 + 26.395953309190112j

In this example, the right root is the principal root every time. To find an example where the other root is chosen, I generated random starting points until I found one that took an alternate root.

Here are the values of b starting with -1.654 – 1.178i and 2.244 – 1.956i. An asterisk indicates that the principal root was not the right root.

     0.2328790598285062 - 0.728412421988127j
    -0.6829999999999998 - 0.589000000000000j
    -0.2254063569280081 - 0.799311791579126j *
    -0.2250604700857468 - 0.658706210994063j
    -0.2261796153095098 - 0.725905503054624j *
    -0.2252334135068774 - 0.729009001286595j
    -0.2257078598391289 - 0.727456168426402j *
    -0.2257065144081936 - 0.727457252170610j
    -0.2257071871240875 - 0.727456710298747j *
    -0.2257071871236612 - 0.727456710298506j
    -0.2257071871238743 - 0.727456710298626j *
    -0.2257071871238744 - 0.727456710298626j

[0] David Cox. The Arithmetic-Geometric Mean of Gauss. L’Enseignement Mathématique, 30 (1984), p. 275–330.

[1] If a = –b we get zero immediately. But if a and b are both not zero, and a does not equal –b, then taking the right square root at each iteration gives us what we want.

[2] Interestingly, you could make a finite number of wrong choices and still end up with something that might converge, albeit to a different value. This gives you a different branch of the AGM.