[{"id":247093,"date":"2026-06-04T15:12:11","date_gmt":"2026-06-04T20:12:11","guid":{"rendered":"https:\/\/www.johndcook.com\/blog\/?p=247093"},"modified":"2026-06-04T15:12:11","modified_gmt":"2026-06-04T20:12:11","slug":"the-latin-of-linux","status":"publish","type":"post","link":"https:\/\/www.johndcook.com\/blog\/2026\/06\/04\/the-latin-of-linux\/","title":{"rendered":"The Latin of Linux"},"content":{"rendered":"<p>One reason people study Latin is that it is the ancestor of many modern languages. English derives from West Germanic languages, not from Latin, but much of English vocabulary, perhaps as much as 60%, derives from Latin, either directly or indirectly through French.<\/p>\n<p>Knowing a bit of Latin makes sense of many things that would otherwise seem completely arbitrary, such as why the symbols for gold, silver, and lead are Au, Ag, and Pb respectively.<\/p>\n<p>Similarly, ed(1) is the Latin of Linux [1]. Many conventions in command line utilities follow conventions that go back to the ed(1) line editor. They may go back even further. Just as Latin didn&#8217;t come out of nowhere, neither did ed(1), but you can&#8217;t go back indefinitely. It&#8217;s convenient to start history somewhere, and this post will start with ed(1) just as much discussion of Western linguistics starts with Latin.<\/p>\n<p>The following are features of ed(1) that live on in sed, awk, grep, vi, perl, bash, etc.<\/p>\n<ol>\n<li>Using slashes to delimit regular expressions<\/li>\n<li>Using $ to indicate the end of a line or the end of a file<\/li>\n<li>The pattern of specifying address + action or address range + action<\/li>\n<li>Using regular expressions as address ranges<\/li>\n<li>Using \\1, \\2, etc to refer to regex captures<\/li>\n<li>Using &amp; to refer to the entire matched text<\/li>\n<li>The g\/regexp\/command pattern<\/li>\n<li>Using p for printing lines, as in g\/re\/p<\/li>\n<li>The commands a, c, d, i, j, l, p, q, r, and w in vi<\/li>\n<li>! for shell escape<\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<p>[1] Because the name &#8220;ed&#8221; is so short, and looks so much like the name Ed, it&#8217;s convenient to use its full Unix name ed(1). The parenthesized number is used to disambiguate different things that have the same name, such as the user command kill(1) and the system call kill(2). There is no ed(2) or any other higher-numbered ed. The number is there to make the name stand out, not to disambiguate anything.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>One reason people study Latin is that it is the ancestor of many modern languages. English derives from West Germanic languages, not from Latin, but much of English vocabulary, perhaps as much as 60%, derives from Latin, either directly or indirectly through French. Knowing a bit of Latin makes sense of many things that would [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[5],"tags":[],"class_list":["post-247093","post","type-post","status-publish","format-standard","hentry","category-computing"],"acf":[],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247093","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/comments?post=247093"}],"version-history":[{"count":2,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247093\/revisions"}],"predecessor-version":[{"id":247109,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247093\/revisions\/247109"}],"wp:attachment":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/media?parent=247093"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/categories?post=247093"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/tags?post=247093"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}},{"id":247102,"date":"2026-06-04T12:18:40","date_gmt":"2026-06-04T17:18:40","guid":{"rendered":"https:\/\/www.johndcook.com\/blog\/?p=247102"},"modified":"2026-06-04T12:18:40","modified_gmt":"2026-06-04T17:18:40","slug":"integrating-smooth-periodic-functions","status":"publish","type":"post","link":"https:\/\/www.johndcook.com\/blog\/2026\/06\/04\/integrating-smooth-periodic-functions\/","title":{"rendered":"Integrating smooth periodic functions"},"content":{"rendered":"<p>Several posts lately have looked at the function<\/p>\n<p style=\"padding-left: 40px;\"><em>f<\/em>(<em>x<\/em>) = cos(sin(<em>x<\/em>) +\u00a0<em>x<\/em>).<\/p>\n<p>This post will look at the function from a different angle. It&#8217;s a smooth function with period 2\u03c0, and it&#8217;s very flat at odd multiples of \u03c0, i.e. the first five derivatives are zero. For reasons I wrote about <a href=\"https:\/\/www.johndcook.com\/blog\/2017\/03\/01\/numerically-integrating-periodic-functions\/\">here<\/a>, this means that the trapezoid rule should be very efficient for integrating this function.<\/p>\n<p>In general, the error in the trapezoid rule is on the order of 1\/<em>N<\/em>\u00b2 where\u00a0<em>N<\/em> is the number of integration points. To be more specific, the error in integrating a function <em>f<\/em> over [<em>a<\/em>, <em>b<\/em>] with <em>N<\/em> points is bounded by<\/p>\n<p style=\"padding-left: 40px;\">(<em>b<\/em> \u2212 <em>a<\/em>)\u00b3 <em>M<\/em> \/ 12<em>N<\/em>\u00b2<\/p>\n<p>where <em>M<\/em> is the maximum absolute value of the second derivative of <em>f<\/em>. So in our case we should expect the error to be less than 82.67\/<em>N<\/em>\u00b2. In fact we do\u00a0<em>much<\/em> better than that. The error does not decrease quadratically, as it does in general, but exponentially.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-medium aligncenter\" src=\"https:\/\/www.johndcook.com\/periodic_gaussian.png\" width=\"640\" height=\"480\" \/><\/p>\n<p>With just 16 integration points, we&#8217;ve reached the limit of floating point representation.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Several posts lately have looked at the function f(x) = cos(sin(x) +\u00a0x). This post will look at the function from a different angle. It&#8217;s a smooth function with period 2\u03c0, and it&#8217;s very flat at odd multiples of \u03c0, i.e. the first five derivatives are zero. For reasons I wrote about here, this means that [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[9],"tags":[71],"class_list":["post-247102","post","type-post","status-publish","format-standard","hentry","category-math","tag-integration"],"acf":[],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247102","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/comments?post=247102"}],"version-history":[{"count":4,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247102\/revisions"}],"predecessor-version":[{"id":247106,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247102\/revisions\/247106"}],"wp:attachment":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/media?parent=247102"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/categories?post=247102"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/tags?post=247102"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}},{"id":247098,"date":"2026-06-04T08:48:56","date_gmt":"2026-06-04T13:48:56","guid":{"rendered":"https:\/\/www.johndcook.com\/blog\/?p=247098"},"modified":"2026-06-04T08:48:56","modified_gmt":"2026-06-04T13:48:56","slug":"partitions-over-permutations","status":"publish","type":"post","link":"https:\/\/www.johndcook.com\/blog\/2026\/06\/04\/partitions-over-permutations\/","title":{"rendered":"Partitions over permutations"},"content":{"rendered":"<p>I was thinking more about the cosine approximation to the Gaussian<\/p>\n<p style=\"padding-left: 40px;\">exp(\u2212<em>z<\/em>\u00b2) \u2248 (1 + cos(sin(<em>z<\/em>) +\u00a0<em>z<\/em>))\/2<\/p>\n<p>that I wrote about <a href=\"https:\/\/www.johndcook.com\/blog\/2026\/05\/31\/another-gaussian-approximation\/\">last<\/a> <a href=\"https:\/\/www.johndcook.com\/blog\/2026\/06\/01\/not-just-taylor-series\/\">week<\/a>. The two expressions above are close along the real axis but not along the imaginary axis.<\/p>\n<p>If\u00a0<em>z<\/em> =\u00a0<em>iy<\/em>, the right side grows much faster than the left, behaving like\u00a0exp(exp(<em>y<\/em>)).<\/p>\n<p>This led to me looking up the power series for the double exponential function exp(exp(<em>y<\/em>)). This is an interesting series because the coefficient of <em>x<\/em><sup><em>n<\/em><\/sup> is<\/p>\n<p style=\"padding-left: 40px;\"><em>e<\/em> <em>B<\/em><sub><em>n<\/em><\/sub> \/ <em>n<\/em>!<\/p>\n<p>where <em>B<\/em><sub><em>n<\/em><\/sub> is the <em>n<\/em>th <a href=\"https:\/\/www.johndcook.com\/blog\/2018\/06\/05\/bell-numbers\/\">Bell number<\/a>, which equals the number of ways to <strong>partition<\/strong> a set of\u00a0<em>n<\/em> labeled items [1]. And of course <em>n<\/em>! is the number of ways to <strong>permute<\/strong> a set of\u00a0<em>n<\/em> labeled items. So the <em>n<\/em>th coefficient in the power series for exp(exp(<em>y<\/em>)) is the ratio of the number of partitions to permutations for a set of\u00a0<em>n<\/em> labeled things, multiplied by <em>e<\/em>.<\/p>\n<p>The number of ways to partition a set of\u00a0<em>n<\/em> things grows quickly as\u00a0<em>n<\/em> increases, almost as quickly as the number of permutations, and so the series for the double exponential function converges very slowly.<\/p>\n<h2>Computing<\/h2>\n<p>SymPy has a function <code>bell<\/code> for computing Bell numbers, so you could compute the ratio of partitions to permutations as follows.<\/p>\n<pre>from sympy import bell, factorial\r\nf = lambda n: bell(n)\/factorial(n)\r\n<\/pre>\n<p>This returns a number of type <code>sympy.core.numbers.Rational<\/code> and so the result is exact. You can cast it to float for convenience.<\/p>\n<h2>Asymptotics<\/h2>\n<p>If we look at only the terms in the asymptotic series for log <em>B<\/em><sub><em>n<\/em><\/sub> and log\u00a0<em>n<\/em>! that grow with\u00a0<em>n<\/em> we have<\/p>\n<p style=\"padding-left: 40px;\">log <em>B<\/em><sub><em>n<\/em><\/sub> ~ <em>n<\/em> log\u00a0<em>n<\/em> \u2212\u00a0<em>n<\/em> log log\u00a0<em>n<\/em><br \/>\nlog\u00a0<em>n<\/em>! ~\u00a0<em>n<\/em> log\u00a0<em>n<\/em> \u2212 \u00bd log <em>n<\/em><\/p>\n<p>and so<\/p>\n<p style=\"padding-left: 40px;\">log( <em>B<\/em><sub><em>n<\/em><\/sub> \/\u00a0<em>n<\/em>! ) ~ \u00bd log <em>n \u2212\u00a0n<\/em> log log\u00a0<em>n<\/em><\/p>\n<p>There&#8217;s also an asymptotic series for log( <em>B<\/em><sub><em>n<\/em><\/sub> \/\u00a0<em>n<\/em>! ) involving the <a href=\"https:\/\/www.johndcook.com\/blog\/2021\/04\/23\/lambert-w\/\">Lambert\u00a0<em>W<\/em> function<\/a>:<\/p>\n<p style=\"padding-left: 40px;\">log( <em>B<\/em><sub><em>n<\/em><\/sub> \/\u00a0<em>n<\/em>! ) ~\u00a0<em>n<\/em>\/<em>r<\/em> \u2212 1 \u2212 <em>n<\/em> log <em>r<\/em><\/p>\n<p>where\u00a0<em>r <\/em>= <em>W<\/em>(<em>n<\/em>).<\/p>\n<h2>Related posts<\/h2>\n<ul>\n<li class=\"link\"><a href=\"https:\/\/www.johndcook.com\/blog\/2025\/04\/16\/bell-numbers-2\/\">Mr. Bell and Bell numbers<\/a><\/li>\n<li class=\"link\"><a href=\"https:\/\/www.johndcook.com\/blog\/2021\/11\/14\/estimating-the-number-of-integer-partitions\/\">Estimating partition numbers<\/a><\/li>\n<li class=\"link\"><a href=\"https:\/\/www.johndcook.com\/TwelvefoldWay.pdf\">Richard Stanley&#8217;s twelvefold way<\/a> (pdf)<\/li>\n<\/ul>\n<p>[1] It&#8217;s important that the items are labeled. Partition numbers are the number of partitions of an\u00a0<em>unlabeled<\/em> set. Partition numbers are much smaller than Bell numbers.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I was thinking more about the cosine approximation to the Gaussian exp(\u2212z\u00b2) \u2248 (1 + cos(sin(z) +\u00a0z))\/2 that I wrote about last week. The two expressions above are close along the real axis but not along the imaginary axis. If\u00a0z =\u00a0iy, the right side grows much faster than the left, behaving like\u00a0exp(exp(y)). This led to [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[9],"tags":[197,181],"class_list":["post-247098","post","type-post","status-publish","format-standard","hentry","category-math","tag-combinatorics","tag-complex-analysis"],"acf":[],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247098","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/comments?post=247098"}],"version-history":[{"count":3,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247098\/revisions"}],"predecessor-version":[{"id":247101,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247098\/revisions\/247101"}],"wp:attachment":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/media?parent=247098"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/categories?post=247098"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/tags?post=247098"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}},{"id":247085,"date":"2026-06-03T10:13:24","date_gmt":"2026-06-03T15:13:24","guid":{"rendered":"https:\/\/www.johndcook.com\/blog\/?p=247085"},"modified":"2026-06-03T10:29:38","modified_gmt":"2026-06-03T15:29:38","slug":"naive-sum","status":"publish","type":"post","link":"https:\/\/www.johndcook.com\/blog\/2026\/06\/03\/naive-sum\/","title":{"rendered":"Naively summing an alternating series"},"content":{"rendered":"<p>Suppose you run across the power series for the exponential function and decide to code it up. Good idea: you&#8217;ll probably learn something, though maybe not what you expect.<\/p>\n<p>Maybe you decide a tolerance of 10<sup>\u221212<\/sup> is good enough, and so you sum the terms until the next term to add is below the tolerance.<\/p>\n<pre>from math import factorial, exp\r\n\r\ndef naive_exp(x):\r\n    tolerance = 1e-12\r\n    s = 0\r\n    n = 0\r\n    while True:\r\n        delta = x**n \/ factorial(n)\r\n        s += delta\r\n        if abs(delta) &lt; tolerance:\r\n            return s\r\n        n += 1\r\n<\/pre>\n<p>You want to try your program out, so you compute <em>e<\/em> by calling the function at 1. If you compare this to calling <code>exp(1)<\/code> you find that you got all the digits correct.<\/p>\n<p>Now you try computing exp(-20). Calling <code>naive_exp(-20)<\/code> gives<\/p>\n<pre>    5.47893091802112e-10<\/pre>\n<p>but calling <code>exp(-20)<\/code> gives<\/p>\n<pre>    2.061153622438558e-09<\/pre>\n<p>Don&#8217;t brush things like this as flukes or compiler bugs [1]. This is your golden opportunity to learn something.<\/p>\n<p>Maybe you add a print statement to see the intermediate values of the sum stored in the variable <em>s<\/em>. If you do, you&#8217;ll see that the partial sums oscillate wildly before settling down.<\/p>\n<p>Maybe that seems wrong, but then you look more carefully at the series. The <em>n<\/em>th term is <em>x<\/em><sup><em>n<\/em><\/sup>\/<em>n<\/em>!. Since <em>x<\/em> is negative, the terms alternate in sign. And the absolute values of the term get bigger before they get smaller. When <em>x<\/em> = \u221220, each numerator is 20 times larger than the previous, and each denominator is <em>n<\/em> times larger than the previous. So the terms will get bigger until <em>n<\/em> &gt; 20. So the wild oscillations are real, not a bug.<\/p>\n<p>The largest partial sum is 21822593.77927747 in absolute value. You know that exp(\u221220) is a very small number, so there&#8217;s going to have to be a lot of cancellation before the partial sums settle down to a small number. Maybe you&#8217;ve heard that cancellation is where numerical calculations lose precision. If not, now you know!<\/p>\n<p>Look again at the largest partial sum. There are eight figures to the right of the decimal point. The code is printing out results to as much precision as it has, so the error at this point is on the order of 10<sup>\u22128<\/sup>. We&#8217;re trying to compute a number on the order of 10<sup>\u22129<\/sup>, and if <em>any<\/em> digits in our result are correct, it would be a coincidence.<\/p>\n<p>If you go back and try your code on <em>x<\/em> = \u221222, the result is even worse, giving a negative result for a quantity that for theoretical reasons cannot be negative. But you can see why: you&#8217;re asking the code to compute a number that is closer to zero than the accuracy of the code.<\/p>\n<p>Computers don&#8217;t represent numbers in base 10 internally, but the argument above is sufficient in this case. If you want to dig deeper, look into the <a href=\"https:\/\/www.johndcook.com\/blog\/2009\/04\/06\/anatomy-of-a-floating-point-number\/\">anatomy of a floating point number<\/a>.<\/p>\n<p>There is a simple way around the problem above, but discovering it sooner would short-circuit the learning process. You could calculate exp(\u221220) as 1\/exp(20) and avoid all the cancellation because the series for exp(20) does not alternate.<\/p>\n<p>&nbsp;<\/p>\n<p>[1] Compilers do have bugs occasionally, but it&#8217;s orders of magnitude more likely that something is wrong with your code.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Suppose you run across the power series for the exponential function and decide to code it up. Good idea: you&#8217;ll probably learn something, though maybe not what you expect. Maybe you decide a tolerance of 10\u221212 is good enough, and so you sum the terms until the next term to add is below the tolerance. [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[5],"tags":[],"class_list":["post-247085","post","type-post","status-publish","format-standard","hentry","category-computing"],"acf":[],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247085","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/comments?post=247085"}],"version-history":[{"count":7,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247085\/revisions"}],"predecessor-version":[{"id":247092,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247085\/revisions\/247092"}],"wp:attachment":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/media?parent=247085"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/categories?post=247085"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/tags?post=247085"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}},{"id":247071,"date":"2026-06-01T07:42:12","date_gmt":"2026-06-01T12:42:12","guid":{"rendered":"https:\/\/www.johndcook.com\/blog\/?p=247071"},"modified":"2026-06-01T07:42:12","modified_gmt":"2026-06-01T12:42:12","slug":"not-just-taylor-series","status":"publish","type":"post","link":"https:\/\/www.johndcook.com\/blog\/2026\/06\/01\/not-just-taylor-series\/","title":{"rendered":"It&#8217;s not just Taylor series"},"content":{"rendered":"<p>There is still active discussion on X about the approximation<\/p>\n<p style=\"padding-left: 40px;\">exp(\u2212<em>x<\/em>\u00b2) \u2248 (1 + cos(sin(<em>x<\/em>) + <em>x<\/em>))\/2<\/p>\n<p>and some are saying this can just be explained by Taylor series: the series for the two sides differ for the first time at the <em>x<\/em><sup>6<\/sup> term, so that&#8217;s why you get a good approximation. As I wrote <a href=\"https:\/\/www.johndcook.com\/blog\/2026\/05\/31\/another-gaussian-approximation\/\">yesterday<\/a>, that&#8217;s only part of it.<\/p>\n<p>If it were just about Taylor series you could use<\/p>\n<p style=\"padding-left: 40px;\">exp(\u2212<em>x<\/em>\u00b2) \u2248 1 \u2212 <em>x<\/em>\u00b2 + <em>x<\/em><sup>4<\/sup>\/2<\/p>\n<p>which also has error <em>O<\/em>(<em>x<\/em><sup>6<\/sup>). But this approximation is only good for fairly small <em>x<\/em>, say in [\u22120.5, 0.5], whereas the approximation at the top of the post is good over [\u22124, 4]. When <em>x<\/em> = 4, the error in the cosine approximation is 0.002579 but the error in the Taylor approximation is 113, five orders of magnitude larger.<\/p>\n<p>If the accuracy of the cosine approximation were due to Taylor series, then we&#8217;d expect the function<\/p>\n<p style=\"padding-left: 40px;\">exp(\u2212<em>x<\/em>\u00b2) \u2212 (1 + cos(sin(<em>x<\/em>) + <em>x<\/em>))\/2<\/p>\n<p>to be small not just over the interval [\u22124, 4] but over a disk of radius 4 in the complex plane. But it&#8217;s not. When <em>x<\/em> = 4<em>i<\/em> the function is on the order of 10<sup>13<\/sup>.<\/p>\n<p>Both the cosine approximation and the Taylor approximation are good for small disks, say of radius 0.5. They&#8217;re both bad for much larger disks, and in fact the cosine approximation is worse.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>There is still active discussion on X about the approximation exp(\u2212x\u00b2) \u2248 (1 + cos(sin(x) + x))\/2 and some are saying this can just be explained by Taylor series: the series for the two sides differ for the first time at the x6 term, so that&#8217;s why you get a good approximation. As I wrote [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[9],"tags":[181],"class_list":["post-247071","post","type-post","status-publish","format-standard","hentry","category-math","tag-complex-analysis"],"acf":[],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247071","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/comments?post=247071"}],"version-history":[{"count":2,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247071\/revisions"}],"predecessor-version":[{"id":247073,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247071\/revisions\/247073"}],"wp:attachment":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/media?parent=247071"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/categories?post=247071"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/tags?post=247071"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}},{"id":247067,"date":"2026-06-01T06:01:26","date_gmt":"2026-06-01T11:01:26","guid":{"rendered":"https:\/\/www.johndcook.com\/blog\/?p=247067"},"modified":"2026-06-01T06:01:26","modified_gmt":"2026-06-01T11:01:26","slug":"subscribe-by-email","status":"publish","type":"post","link":"https:\/\/www.johndcook.com\/blog\/2026\/06\/01\/subscribe-by-email\/","title":{"rendered":"Subscribe by email"},"content":{"rendered":"<p>Readers have subscribed to this blog via email almost from its beginning in 2008, but <em>how<\/em> they have subscribed has changed several times. I&#8217;ve used several services to provide email subscription that have come and gone.<\/p>\n<p>For the past two years I&#8217;ve been using Substack to send out emails announcing new blog posts. That has worked out well. Substack delivers email reliably, and that&#8217;s all I wanted. I&#8217;m not active on Substack other than using it to an email. I give a brief introduction to the latest two or three blog posts in each email, and sometimes I include additional ideas that occurred to me later.<\/p>\n<p>If you&#8217;d like to get blog post announcements and a little extra commentary via email, sign up for my free Substack newsletter <a href=\"https:\/\/johndcookconsulting.substack.com\/\">here<\/a>.<\/p>\n<p>If you&#8217;d like to learn about new posts sooner, you can subscribe via <a href=\"https:\/\/www.johndcook.com\/blog\/rss-feed\/\">RSS<\/a> or follow me on <a href=\"https:\/\/www.johndcook.com\/blog\/twitter_page\/\">X<\/a> or <a href=\"https:\/\/mathstodon.xyz\/@johndcook\">Mastodon<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Readers have subscribed to this blog via email almost from its beginning in 2008, but how they have subscribed has changed several times. I&#8217;ve used several services to provide email subscription that have come and gone. For the past two years I&#8217;ve been using Substack to send out emails announcing new blog posts. That has [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-247067","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"acf":[],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247067","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/comments?post=247067"}],"version-history":[{"count":1,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247067\/revisions"}],"predecessor-version":[{"id":247070,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247067\/revisions\/247070"}],"wp:attachment":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/media?parent=247067"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/categories?post=247067"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/tags?post=247067"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}},{"id":247061,"date":"2026-05-31T11:15:49","date_gmt":"2026-05-31T16:15:49","guid":{"rendered":"https:\/\/www.johndcook.com\/blog\/?p=247061"},"modified":"2026-06-01T07:51:18","modified_gmt":"2026-06-01T12:51:18","slug":"another-gaussian-approximation","status":"publish","type":"post","link":"https:\/\/www.johndcook.com\/blog\/2026\/05\/31\/another-gaussian-approximation\/","title":{"rendered":"Another Gaussian approximation"},"content":{"rendered":"<p>The function<\/p>\n<p style=\"padding-left: 40px;\">(1 + cos(<em>x<\/em>))\/2<\/p>\n<p>gives a fair approximation to the Gaussian density<\/p>\n<p style=\"padding-left: 40px;\">exp(\u2212<em>x<\/em>\u00b2)<\/p>\n<p>You can make the approximation much better by raising it to a power. The function<\/p>\n<p style=\"padding-left: 40px;\">((1 + cos(<em>x<\/em>))\/2)<sup>4<\/sup><\/p>\n<p>gives a good lower bound and<\/p>\n<p style=\"padding-left: 40px;\">((1 + cos(<em>x<\/em>))\/2)<sup>3.5597<\/sup><\/p>\n<p>gives a good upper bound. More on that <a href=\"https:\/\/www.johndcook.com\/blog\/2023\/07\/05\/normal-approximation\/\">here<\/a>.<\/p>\n<p>There are other ways of improving the cosine approximation to the Gaussian. <a href=\"https:\/\/x.com\/1strawberry_jam\/status\/2060730093404856692\">Yesterday<\/a> I came across one I hadn&#8217;t seen before, adding a sin(<em>x<\/em>) term to\u00a0<em>x<\/em>.<\/p>\n<p style=\"padding-left: 40px;\">(1 + cos(sin(<em>x<\/em>) + <em>x<\/em>))\/2<\/p>\n<p>This function matches the first few terms of the power series for exp(\u2212<em>x<\/em>\u00b2) and has an error on the order of <em>x<\/em><sup>6<\/sup>\/240. You can&#8217;t see the difference between the two functions in a plot for \u22124 \u2264 <em>x<\/em> \u2264 4.<\/p>\n<p style=\"text-align: center;\">***<\/p>\n<p>There&#8217;s a tension between the previous two statements. If the error in on the order of <em>x<\/em><sup>6<\/sup>\/240 then we&#8217;d expect the error to be huge at\u00a0<em>x<\/em> = 4. We have<\/p>\n<p style=\"padding-left: 40px;\">4<sup>6<\/sup>\/240 = 17.07<\/p>\n<p>and yet<\/p>\n<p style=\"padding-left: 40px;\">exp(\u22124\u00b2) \u2212 ((1 + cos(4 + sin(4)))\/2) = \u22120.002579,<\/p>\n<p>i.e. the error is between 3 and 4 orders of magnitude smaller than we might expect.<\/p>\n<p>We have an alternating series, so the truncation error should be roughly equal to the first term after the truncation, right? No, the alternating series theorem doesn&#8217;t apply because the absolute values of the terms in the series are not decreasing yet for <em>x<\/em> = 4. The terms have to decrease eventually because the series has infinite radius of convergence, but they&#8217;re not decreasing at the 6th term; the terms will get much larger in absolute value before they get smaller.<\/p>\n<p>The basic alternating series theorem gives only an upper bound on truncation error, but there are extensions that also give a lower bound. I wrote about these <a href=\"https:\/\/www.johndcook.com\/blog\/2026\/03\/17\/alternating-series-remainder\/\">extensions<\/a> a few weeks ago. But they don&#8217;t apply here because the terms have not started decreasing in absolute value.<\/p>\n<p><strong>Update<\/strong>: See further discussion in the post <a href=\"https:\/\/www.johndcook.com\/blog\/2026\/06\/01\/not-just-taylor-series\/\">It&#8217;s not just Taylor series<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The function (1 + cos(x))\/2 gives a fair approximation to the Gaussian density exp(\u2212x\u00b2) You can make the approximation much better by raising it to a power. The function ((1 + cos(x))\/2)4 gives a good lower bound and ((1 + cos(x))\/2)3.5597 gives a good upper bound. More on that here. There are other ways of [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-247061","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"acf":[],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247061","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/comments?post=247061"}],"version-history":[{"count":6,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247061\/revisions"}],"predecessor-version":[{"id":247075,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247061\/revisions\/247075"}],"wp:attachment":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/media?parent=247061"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/categories?post=247061"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/tags?post=247061"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}},{"id":247056,"date":"2026-05-30T16:06:49","date_gmt":"2026-05-30T21:06:49","guid":{"rendered":"https:\/\/www.johndcook.com\/blog\/?p=247056"},"modified":"2026-05-31T07:01:35","modified_gmt":"2026-05-31T12:01:35","slug":"schwartz-zippel","status":"publish","type":"post","link":"https:\/\/www.johndcook.com\/blog\/2026\/05\/30\/schwartz-zippel\/","title":{"rendered":"Spot checking polynomial identities"},"content":{"rendered":"<p>If a polynomial identity holds at a few random points, it&#8217;s very like true. We&#8217;ll make this statement more precise, but first let&#8217;s look at some applications.<\/p>\n<p>You may want to test an identity that naturally presents itself as a statement that two polynomials are equal. Or you might use something like <a href=\"https:\/\/www.johndcook.com\/blog\/2012\/07\/21\/binomial-coefficient-trick\/\">the binomial coefficient trick<\/a> to reframe a problem that isn&#8217;t obviously an identity about polynomials. And with algebraic circuits, you can reformulate a wide range of computations as polynomial identities; this is widely used in zero-knowledge proofs.<\/p>\n<p>The theorem alluded to at the top of the post is the Schwartz-Zippel lemma. It is formulated in terms of the probability of a non-zero polynomial <em>P<\/em> evaluating to zero. To prove that two polynomials <em>Q<\/em><sub>1<\/sub> and <em>Q<\/em><sub>2<\/sub> are equal, you can show that<\/p>\n<p style=\"padding-left: 40px;\"><em>P<\/em> = Q<sub>1<\/sub>(<em>x<\/em>) \u2212 <em>Q<\/em><sub>2<\/sub>(<em>x<\/em>) = 0.<\/p>\n<h2>Schwartz-Zippel lemma<\/h2>\n<p>Let\u00a0<em>F<\/em> be a (typically large) finite field and let <em>P<\/em> be a non-zero polynomial in <em>n<\/em> variables<\/p>\n<p style=\"padding-left: 40px;\"><em>P<\/em>(<em>x<\/em><sub>1<\/sub>, <em>x<\/em><sub>2<\/sub>, <em>x<\/em><sub>3<\/sub>, \u2026, <em>x<\/em><sub><em>n<\/em><\/sub>)<\/p>\n<p>of total degree <em>d<\/em>. If we choose the\u00a0<em>x<\/em>&#8216;s randomly from\u00a0<em>F<\/em> then the probability that\u00a0<em>P<\/em> evaluates to zero is no more than\u00a0<em>d<\/em>\/|<em>F<\/em>|. [1]<\/p>\n<p>If the total degree\u00a0<em>d<\/em> is small relative to the size of the field, then the probability of\u00a0<em>P<\/em> evaluating to zero is small. As long as\u00a0<em>d<\/em> is less than |<em>F<\/em>|, you can evaluate the polynomial\u00a0<em>k<\/em> times to make<\/p>\n<p style=\"padding-left: 40px;\">(<em>d<\/em> \/ |<em>F<\/em>|)<sup><em>k<\/em><\/sup><\/p>\n<p>as small as you&#8217;d like. If <em>d<\/em> isn&#8217;t too large, and <em>F<\/em> is large, like the integers mod\u00a0<em>p<\/em> = 2<sup>255<\/sup> \u2212 19 used in cryptography, one polynomial evaluation might be enough to give convincing evidence that the polynomial is zero.<\/p>\n<p>&nbsp;<\/p>\n<p>[1] The Schwartz-Zippel lemma in its full generality applies to polynomials over an integral domain\u00a0<em>R<\/em> with variables drawn from <em>S<\/em>, a finite subset of\u00a0<em>R<\/em>. Here we&#8217;re setting\u00a0<em>R<\/em> =\u00a0<em>S<\/em> =\u00a0<em>F<\/em>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If a polynomial identity holds at a few random points, it&#8217;s very like true. We&#8217;ll make this statement more precise, but first let&#8217;s look at some applications. You may want to test an identity that naturally presents itself as a statement that two polynomials are equal. Or you might use something like the binomial coefficient [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[9],"tags":[],"class_list":["post-247056","post","type-post","status-publish","format-standard","hentry","category-math"],"acf":[],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247056","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/comments?post=247056"}],"version-history":[{"count":4,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247056\/revisions"}],"predecessor-version":[{"id":247062,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247056\/revisions\/247062"}],"wp:attachment":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/media?parent=247056"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/categories?post=247056"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/tags?post=247056"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}},{"id":247047,"date":"2026-05-29T07:24:30","date_gmt":"2026-05-29T12:24:30","guid":{"rendered":"https:\/\/www.johndcook.com\/blog\/?p=247047"},"modified":"2026-05-29T19:06:35","modified_gmt":"2026-05-30T00:06:35","slug":"online-one-pass-algorithms","status":"publish","type":"post","link":"https:\/\/www.johndcook.com\/blog\/2026\/05\/29\/online-one-pass-algorithms\/","title":{"rendered":"Online (one-pass) algorithms"},"content":{"rendered":"<h2>Canonical example<\/h2>\n<p>The sample variance of a set of numbers is defined in terms of the sum of the squared distances from each point to the mean.<\/p>\n<p><img decoding=\"async\" class=\"aligncenter\" style=\"background-color: white;\" src=\"\/\/www.johndcook.com\/samplevariance.svg\" alt=\"s^2 = \\frac{1}{n-1}\\sum_{i=1}^n (x_i -\\bar{x})^2\" width=\"200\" \/><\/p>\n<p>So it would seem that you first need to calculate the mean, then go back and compute the squared differences from the mean. And yet sample variance can be computed in one pass through the data.<\/p>\n<p>You&#8217;ll find two equivalent equations in statistics books: the one described above and another based on the sum of the data points and the sum of the data points squared.<\/p>\n<p><img decoding=\"async\" class=\"aligncenter\" style=\"background-color: white;\" src=\"\/\/www.johndcook.com\/variance2.svg\" alt=\"s^2 = \\frac{1}{n(n-1)}\\left(n\\sum_{i=1}^n x_i^2 -\\left(\\sum_{i=1}^n x_i\\right)^2\\right)\" width=\"307\" \/><\/p>\n<p>While this equation is theoretically correct, it is numerically unstable. Code that directly implements this equation can return a negative value for a quantity that is theoretically positive. I&#8217;ve seen this happen with real data, causing a program to crash when taking the square root of the variance to get the standard deviation.<\/p>\n<p>However, there is an algorithm that computes mean and variance in one pass that is accurate and numerically stable. This algorithm was developed by B. P. Welford in 1962. I discuss Welford&#8217;s algorithm and give code for implementing it <a href=\"https:\/\/www.johndcook.com\/blog\/standard_deviation\/\">here<\/a>.<\/p>\n<h2>Online algorithms<\/h2>\n<p>Welford&#8217;s algorithm is known in computer science as an &#8220;online&#8221; algorithm. This term was coined well before the Internet. For example, see the paper [1] from 1965.<\/p>\n<p>But of course now &#8220;online&#8221; means something else, and so the technical and colloquial uses of &#8220;online algorithm&#8221; have split. Technical literature uses the phrase to describe the kinds of algorithms in this post. Most people would take &#8220;online algorithm&#8221; to mean code that runs on a remote server. You may see &#8220;streaming algorithm&#8221; as a contemporary technical term, but I&#8217;d still search on &#8220;online algorithm&#8221; to find papers.<\/p>\n<h2>Computing higher moments online<\/h2>\n<p>Welford&#8217;s algorithm computes the first two moments, mean and variance, of a data set online. It is also possible to <a href=\"https:\/\/www.johndcook.com\/blog\/skewness_kurtosis\/\">compute skewness and kurtosis online<\/a>, as well as higher moments.<\/p>\n<h2>Online regression<\/h2>\n<p>Simple linear regression is closely related to calculating mean and variance, and it is possible to compute simple regression coefficients online. I have some old notes on this <a href=\"https:\/\/www.johndcook.com\/running_regression.html\">here<\/a>.<\/p>\n<p>This post was motivated by an email asking me about multiple regression. It is also possible to compute multiple regression coefficients online, but I haven&#8217;t done this. I found a couple references, [2] and [3], but I have not read them. There is a simple procedure for two predictor variables but I believe things get a little more complicated with three or more predictors, requiring a recursive least squares algorithm.<\/p>\n<h2>Related posts<\/h2>\n<p>The notion of online algorithms is closely related to the notion of a fold in functional programming. Here are several posts on computing things with folds.<\/p>\n<ul>\n<li class=\"link\"><a href=\"https:\/\/www.johndcook.com\/blog\/2016\/06\/09\/insertion-sort-as-a-fold\/\">Insertion sort as a fold<\/a><\/li>\n<li class=\"link\"><a href=\"https:\/\/www.johndcook.com\/blog\/2016\/07\/14\/kalman-filters-and-functional-programming\/\">Kalman filters as folds<\/a><\/li>\n<li class=\"link\"><a href=\"https:\/\/www.johndcook.com\/blog\/2016\/06\/08\/computing-higher-moments-with-a-fold\/\">Computing higher moments with a fold<\/a><\/li>\n<\/ul>\n<p>[1] One-Tape, Off-Line Turing Machine Computations by F. C. Hennie. Information and Control. 8, 553-578 (1965). Available <a href=\"https:\/\/cse.iitkgp.ac.in\/~goutam\/toc\/readingMaterial\/one-tape-off-line.pdf\">here<\/a>. In this paper Hennie writes &#8220;In an <em>on-line computation<\/em> the input data are supplied to the machine, one symbol at a time, at a special input terminal. \u2026 In an <em>off-line computation<\/em> all of the input symbols are written on one of the machine&#8217;s tapes prior to the start of the computation.<\/p>\n<p>[2] Arthur Albert and Robert W. Sittler, \u201cA Method for Computing Least Squares Estimators that Keep Up with the Data,\u201d Journal of the Society for Industrial and Applied Mathematics, Series A: Control, 3(3), 384\u2013417, 1965. DOI: 10.1137\/0303026.<\/p>\n<p>[3] Petre Stoica and Per Ashgren. Exact initialization of the recursive least-squares algorithm. Int. J. Adapt. Control Signal Process. 2002; 16:219&amp;ndashh;230.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Canonical example The sample variance of a set of numbers is defined in terms of the sum of the squared distances from each point to the mean. So it would seem that you first need to calculate the mean, then go back and compute the squared differences from the mean. And yet sample variance can [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[5,9],"tags":[106],"class_list":["post-247047","post","type-post","status-publish","format-standard","hentry","category-computing","category-math","tag-probability-and-statistics"],"acf":[],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247047","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/comments?post=247047"}],"version-history":[{"count":4,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247047\/revisions"}],"predecessor-version":[{"id":247051,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247047\/revisions\/247051"}],"wp:attachment":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/media?parent=247047"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/categories?post=247047"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/tags?post=247047"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}},{"id":247046,"date":"2026-05-27T20:35:54","date_gmt":"2026-05-28T01:35:54","guid":{"rendered":"https:\/\/www.johndcook.com\/blog\/?p=247046"},"modified":"2026-05-28T06:04:34","modified_gmt":"2026-05-28T11:04:34","slug":"jensen-shannon","status":"publish","type":"post","link":"https:\/\/www.johndcook.com\/blog\/2026\/05\/27\/jensen-shannon\/","title":{"rendered":"Turning K-L divergence into a metric"},"content":{"rendered":"<h2>Kullback-Leibler divergence<\/h2>\n<p>Kullback-Leibler divergence is defined for two random variables\u00a0<em>X<\/em> and <em>Y<\/em> by<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter\" style=\"background-color: white;\" src=\"https:\/\/www.johndcook.com\/kl_def2.svg\" alt=\"D_{KL}(X || Y) = -\\int f_X(x) \\log \\frac{f_Y(x)}{f_X(x)} \\, dx\" width=\"304\" height=\"46\" \/><\/p>\n<p>K-L divergence is non-negative, and it&#8217;s zero if and only if <em>X<\/em> and\u00a0<em>Y<\/em> have the same distribution. But it is not a metric, for reasons explained <a href=\"https:\/\/www.johndcook.com\/blog\/2017\/11\/08\/why-is-kullback-leibler-divergence-not-a-distance\/\">here<\/a>. For one thing, it&#8217;s not symmetric.<\/p>\n<h2>Jeffreys divergence<\/h2>\n<p>We can fix the symmetry problem by defining<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter\" style=\"background-color: white;\" src=\"https:\/\/www.johndcook.com\/jeffreys_divergence.svg\" alt=\"J(X, Y) = D_{KL}(X || Y)\u00a0 + D_{KL}(Y || X)\" width=\"281\" height=\"18\" \/><\/p>\n<p>The\u00a0<em>J<\/em> above stands for Jeffreys, for Harold Jeffreys. <em>J<\/em> is called either the symmetrized K-L divergence or Jeffreys&#8217; divergence. It&#8217;s still a divergence, not a distance.<\/p>\n<p>A distance (metric) <em>d<\/em> has to have four properties:<\/p>\n<ol>\n<li><em>d<\/em>(<em>x<\/em>, <em>x<\/em>) = 0<\/li>\n<li><em>d<\/em>(<em>x<\/em>,\u00a0<em>y<\/em>) &gt; 0 if\u00a0<em>x<\/em> \u2260 <em>y<\/em><\/li>\n<li><em>d<\/em>(<em>x<\/em>,\u00a0<em>y<\/em>) =\u00a0<em>d<\/em>(<em>y<\/em>,\u00a0<em>x<\/em>)<\/li>\n<li><em>d<\/em>(<em>x<\/em>,\u00a0<em>z<\/em>) \u2264\u00a0<em>d<\/em>(<em>x<\/em>,\u00a0<em>y<\/em>) +\u00a0<em>d<\/em>(<em>y<\/em>,\u00a0<em>z<\/em>)<\/li>\n<\/ol>\n<p>K-L divergence satisfies the first two properties. Jeffreys&#8217; divergence satisfies the first three, but not the last one, the triangle inequality.<\/p>\n<p>To show that\u00a0<em>J<\/em> doesn&#8217;t satisfy the triangle inequality, let\u00a0<em>X<\/em>, <em>Y<\/em>, and\u00a0<em>Z<\/em> be Bernoulli random variables with\u00a0<em>p<\/em> equal to 0.1, 0.2, and 0.3 respectively. Then the following Python code shows that the divergence from\u00a0<em>X<\/em> to\u00a0<em>Y<\/em>, plus the divergence from\u00a0<em>Y<\/em> to\u00a0<em>Z<\/em>, is less than the divergence from\u00a0<em>X<\/em> to\u00a0<em>Z<\/em>. This would be like saying you could get from LA to NYC faster by having a layover in Denver rather than taking a direct flight.<\/p>\n<pre>from math import log\r\n\r\nkl = lambda p, q: p*log(p\/q) + (1-p)*log((1-p)\/(1-q))\r\nj  = lambda p, q: kl(p, q) + kl(q, p)\r\n\r\na = j(0.1, 0.2)\r\nb = j(0.2, 0.3)\r\nc = j(0.1, 0.3)\r\nprint(a + b, c)\r\n<\/pre>\n<p>This prints 0.135 and 0.270.<\/p>\n<h2>Jensen-Shannon distance<\/h2>\n<p>Jensen-Shannon distance turns K-L divergence into a metric as follows. First, define the random variable\u00a0<em>M<\/em> to be the average of\u00a0<em>X<\/em> and\u00a0<em>Y<\/em>. Then average the K-L divergence from\u00a0<em>M<\/em> to each of\u00a0<em>X<\/em> and\u00a0<em>Y<\/em>. This defines the Jensen-Shannon\u00a0<strong>divergence<\/strong>. It&#8217;s still not a metric, but its square root is, which defines the Jensen-Shannon <strong>distance<\/strong><em>.<\/em><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter\" style=\"background-color: white;\" src=\"https:\/\/www.johndcook.com\/jensen_shannon.svg\" alt=\"\\begin{align*} M &amp;= (X + Y)\/2 \\\\ \\text{JSD}(X, Y) &amp;= \\tfrac{1}{2}D_{KL}(X || M) + \\tfrac{1}{2}D_{KL}(Y || M) \\\\ d_{JS}(X, Y) &amp;= \\sqrt{\\text{JSD}(X, Y)} \\end{align*}\" width=\"331\" height=\"85\" \/><\/p>\n<p>The following code gives an example of Jensen-Shannon distance satisfying the triangle inequality.<\/p>\n<pre>def d(p, q):\r\n    m = 0.5*(p + q)\r\n    jsd = 0.5*kl(p, m) + 0.5*kl(q, m) \r\n    return jsd**0.5\r\n\r\na = d(0.1, 0.2)\r\nb = d(0.2, 0.3)\r\nc = d(0.1, 0.3)\r\nprint(a + b, c)\r\n<\/pre>\n<p>This prints 0.1817 and 0.1801. Now a layover makes the trip longer.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Kullback-Leibler divergence Kullback-Leibler divergence is defined for two random variables\u00a0X and Y by K-L divergence is non-negative, and it&#8217;s zero if and only if X and\u00a0Y have the same distribution. But it is not a metric, for reasons explained here. For one thing, it&#8217;s not symmetric. Jeffreys divergence We can fix the symmetry problem by [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[9],"tags":[68],"class_list":["post-247046","post","type-post","status-publish","format-standard","hentry","category-math","tag-information-theory"],"acf":[],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247046","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/comments?post=247046"}],"version-history":[{"count":0,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/posts\/247046\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/media?parent=247046"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/categories?post=247046"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.johndcook.com\/blog\/wp-json\/wp\/v2\/tags?post=247046"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}]