Named entity recognition

Named entity recognition (NER) is a task in natural language processing: pull named things out of text. It sounds trivial at first. Just create a giant list of named things and compare the text against that list.

But suppose, for example, University of Texas is on your list. If Texas is also on your list, do you report that you have a named entity inside a named entity? And how do you handle The University of Texas? Do you put it on your list as well? What about UT? Can you tell from context whether UT stands for University of Texas, University of Toronto, or the state of Utah?

Searching for Rice University would be even more fun. The original name of the school was The William Marsh Rice Institute for the Advancement of Letters, Science, and Art. I don’t know whether the name was ever officially changed. A friend who went to Rice told me they had a ridiculous cheer that spelled out every letter in the full name. And of course rice could refer to a grain.

Let’s see what happens when we run the following sentence through spaCy looking for named entities.

Researchers from the University of Texas at Austin organized a pickleball game with their colleagues from Rice University on Tuesday.

I deliberately did not capitalize the definite article in front of University of Texas because I suspected spaCy might include the article if it were capitalized but not otherwise. It included the article in either case.

The results depend on the language model used. When I used en_core_web_trf it included at Austin as part of the university name.

When I used the smaller en_core_web_sm model it pulled out Austin as a separate entity.

The tag ORG stands for organization and DATE obviously stands for date. GPE is a little less obvious, standing for geopolitical entity.
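
Here is a minimal sketch of the kind of script behind these experiments. It assumes the model has been downloaded ahead of time; swap in en_core_web_trf or another model to compare results.

    import spacy

    # Assumes the model was downloaded first:
    # python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")

    text = ("Researchers from the University of Texas at Austin "
            "organized a pickleball game with their colleagues "
            "from Rice University on Tuesday.")

    # Print each named entity and its tag (ORG, GPE, DATE, ...)
    for ent in nlp(text).ents:
        print(ent.text, ent.label_)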

When I changed Rice University to simply Rice, spaCy still recognized Rice as an organization. When I changed it to rice with no capitalization, it did not recognize it as an organization.

The other day I stress tested spaCy by giving it some text from Chaucer’s Canterbury Tales. Even though spaCy is trained on Modern English, it did better than I would have expected on Middle English.

Using the en_core_web_trf model, it recognized Engelond and Caunterbury as cities.

When I switched to en_core_web_sm it still recognized Caunterbury as a city, but tagged Engelond as a person.

Jaccard index and jazz albums

Miles Davis Kind of Blue album cover

The Jaccard index is a way of measuring the similarity of sets. The Jaccard index, or Jaccard similarity coefficient, of two sets A and B is the number of elements in their intersection, A ∩ B, divided by the number of elements in their union, A ∪ B.

J(A, B) = \frac{|A \cap B|}{|A \cup B|}

Jaccard similarity is a robust way to compare things in machine learning, say in clustering algorithms, because it is less sensitive to outliers than other similarity measures such as cosine similarity.

Miles Davis albums

Here we’ll illustrate Jaccard similarity by looking at the personnel on albums by Miles Davis. Specifically, which pair of albums had more similar personnel: Kind of Blue and Round About Midnight, or Bitches Brew and In a Silent Way?

There were four musicians who played on both Kind of Blue and Round About Midnight: Miles Davis, Cannonball Adderley, John Coltrane, and Paul Chambers.

There were six musicians who played on both Bitches Brew and In a Silent Way: Miles Davis, Wayne Shorter, Chick Corea, Dave Holland, John McLaughlin, and Joe Zawinul.

The latter pair of albums had more personnel in common, but they also had more personnel in total.

There were 9 musicians who performed on either Kind of Blue or Round About Midnight. Since 4 played on both albums, the Jaccard index comparing the personnel on the two albums is 4/9.

In a Silent Way and especially Bitches Brew used more musicians. A total of 17 musicians performed on at least one of these albums, including 6 who played on both. So the Jaccard index is 6/17.
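
In Python the Jaccard index is a one-liner on sets. Here is a sketch with toy sets standing in for the full album rosters:

    def jaccard(a, b):
        """Jaccard index: size of the intersection over size of the union."""
        return len(a & b) / len(a | b)

    # Toy sets standing in for complete album personnel lists
    a = {"Davis", "Coltrane", "Chambers", "Evans"}
    b = {"Davis", "Coltrane", "Chambers", "Garland"}
    print(jaccard(a, b))  # 3 in common out of 5 total: 0.6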

Jaccard distance

Jaccard distance is the complement of Jaccard similarity, i.e.

d_J(A, B) = 1 - J(A,B)

In our example, the Jaccard distance between Kind of Blue and Round About Midnight is 1 − 4/9 = 5/9 ≈ 0.556. The Jaccard distance between Bitches Brew and In a Silent Way is 1 − 6/17 = 11/17 ≈ 0.647.

Jaccard distance really is a distance. It is clearly a symmetric function of its arguments, unlike Kullback-Leibler divergence, which is not.

The difficulty in establishing that Jaccard distance is a distance function, i.e. a metric, is the triangle inequality. The triangle inequality does hold, though this is not simple to prove.

Trying NLP on Middle English

It’s not fair to evaluate NLP software on a language it wasn’t designed to process, but I wanted to try it anyway.

The models in the spaCy software library were trained on modern English text and not on Middle English. Nevertheless, spaCy does a pretty good job of parsing Chaucer’s Canterbury Tales, written over 600 years ago. I used the model en_core_web_lg in my little experiment below.

The text I used comes from the prologue:

From every shires ende
of Engelond to Caunterbury they wende
the hooly blisful martir for to seke
that hem hath holpen
whan that they were seeke.

The software correctly identifies, for example, wende (went) and seke (seek) as verbs, and seeke (sick) as an adjective. Overall it does a pretty good job. I imagine it would do worse on Middle English text that differed more from Modern English usage, but so would a contemporary human who doesn’t know Middle English.
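
For reference, here is a sketch of how one might produce those part-of-speech tags, assuming the en_core_web_lg model is installed.

    import spacy

    nlp = spacy.load("en_core_web_lg")

    text = ("From every shires ende of Engelond to Caunterbury they wende "
            "the hooly blisful martir for to seke that hem hath holpen "
            "whan that they were seeke.")

    # Print each token with its part-of-speech tag
    for token in nlp(text):
        print(token.text, token.pos_)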

Natural language processing and unnatural text

I recently evaluated two software applications designed to find PII (personally identifiable information) in free text using natural language processing. Both failed badly, passing over obvious examples of PII. By contrast, I also tried natural language processing software on a nonsensical poem, and the software did quite well.

Doctor’s notes

It occurred to me later that the software packages to search for PII probably assume “natural language” has the form of fluent prose, not choppy notes by physicians. The notes that I tested did not consist of complete sentences marked up with grammatically correct punctuation. The text may have been transcribed from audio.

Some software packages deidentify medical notes better than others. I’ve seen some work well and some work poorly. I suspect the former were written specifically for their purpose and the latter were more generic.

Jabberwocky

I also tried NLP software on Lewis Carroll’s poem Jabberwocky. It too is unnatural language, but in a different sense.

Jabberwocky uses nonsense words that Carroll invented for the poem, but otherwise it is grammatically correct. The poem is standard English at the level of structure, though not at the level of words. It is the opposite of medical notes that are standard English at the word level (albeit with a high density of technical terms), but not at a structural level.

I used the spaCy natural language processing library on a couple of stanzas from Carroll’s poem.

“Beware the Jabberwock, my son!
The jaws that bite, the claws that catch!
Beware the Jubjub bird, and shun
The frumious Bandersnatch!”

He took his vorpal sword in hand;
Long time the manxome foe he sought—
So rested he by the Tumtum tree
And stood awhile in thought.

I fed the lines into spaCy and asked it to diagram the lines, indicating parts of speech and dependencies. The software did a good job of inferring the use of even the nonsense words. I gave the software one line at a time rather than a stanza at a time because the latter results in diagrams that are awkwardly wide, too wide to display here. (The spaCy visualization software has a “compact” option, but this option does not make the visualizations much more compact.)

Here are the visualizations of the lines (one dependency diagram per line, rendered as SVG).
And here is the Python code I used to create the diagrams above.

    import spacy
    from spacy import displacy
    from pathlib import Path

    nlp = spacy.load("en_core_web_sm")

    lines = [
        "Beware the Jabberwock, my son!",
        "The jaws that bite, the claws that catch!",
        "Beware the Jubjub bird",
        "Shun the frumious Bandersnatch!",
        "He took his vorpal sword in hand.",
        "Long time the manxome foe he sought",
        "So rested he by the Tumtum tree",
        "And stood awhile in thought."
    ]

    for line in lines:
        doc = nlp(line)
        # Render a dependency parse diagram of the line as SVG
        svg = displacy.render(doc, style="dep", jupyter=False)
        # Name the file after the words in the line, e.g. Beware-the-Jubjub-bird.svg
        file_name = "-".join(w.text for w in doc if not w.is_punct) + ".svg"
        # write_text opens, writes, and closes the file
        Path(file_name).write_text(svg, encoding="utf-8")

How an LLM might leak medical data

Machine learning models occasionally memorize training data. Under the right prompt, a model could return portions of the training data verbatim.

If a large language model is trained on deidentified medical data, along with data that overlaps with the medical data, it could potentially leak details of a person’s medical history. I’m not saying that any particular LLM does leak data, but it’s something to watch out for if you are training an LLM.

One way to try to create a data leak would be to prompt the LLM with the data you have and see whether it will fill in more. For example, I asked Bing AI

Tell me about a man from Arkansas with ICD-10 code C50.921.

It came back with the following.

… Once upon a time, there was a man named John who lived in Arkansas. John was a hardworking man who loved spending time with his family and friends. One day, John noticed a lump in his right breast. He went to the doctor and was diagnosed with malignant neoplasm of unspecified site of the right male breast, classified under the ICD-10 code C50.921. …

I chose this diagnosis code because male breast cancer is extremely rare, and so the risk of reidentification is higher than for a more common diagnosis.

The model filled in the detail that the man’s name is John. This is not at all suspicious since John is one of the most common male names in history. If the man’s name had been Vivek, there would be more cause to suspect the model was telling us about a real man named Vivek, though of course that could be a randomly chosen male name.

If a neural network were trained on deidentified medical data, it could link fields together. If the story above had said “John, aged 42, …” the age might have been pulled from an actual patient record.

If the data the network was trained on was deidentified well, even leaking data verbatim should not create more than a very small risk of identification. However, if the data contained tokens linking the records to publicly available information, such as real estate records—this happens—then our hypothetical LLM might reveal more personal details that could be used to narrow down whose data is being leaked.

Experiments with Bing chat

My two previous posts looked at experiments with ChatGPT and Google Bard. This post will look at redoing the same experiments with Microsoft’s Bing Chat: looking for mnemonic encodings and simplifying Boolean expressions.

When you open up Bing chat you can select a conversational style:

  • More creative
  • More balanced
  • More precise

I chose “more precise” for my experiments.

Mnemonic encodings

Bing seems to “know” more about the Major mnemonic system than ChatGPT does. In response to my first question about the system it gave a link to the Wikipedia article on the Major system and a couple of other relevant links. It also gives a more complete table of encodings than ChatGPT did:

  • 0: s, z
  • 1: t, d, th
  • 2: n
  • 3: m
  • 4: r
  • 5: l
  • 6: j, ch, sh
  • 7: c, k, g, q, ck
  • 8: v, f, ph
  • 9: p, b

Directly asking for encodings

It does pretty well when I ask for an encoding of a two-digit number. When I ask for an encoding of a three-digit number, it encodes the first two digits correctly but ignores the third digit. However, when I asked it to encode 345 it did very well:

Using the Major System, the number 345 can be encoded as “molar” or “mailer” or “miller” or “mural” or “marl”, among others.

ChatGPT and Google Bard said that no English word can encode 345, though Google Bard came up with Merlin anyway. (Strictly speaking “Merlin” encodes 3452, but it’s a common practice to pay attention to just the first three consonants in words with more than three consonants.)

Asking for sounds

When I asked “Find a word with the sounds g, r, and f” Bing came back with

 One word that contains the sounds g, r, and f is “giraffe” 🦒

When I asked for the sounds l, f, and b, Bing came up with “belief.” When I added “in that order” to my prompt it came back with “bluff” and “flyby,” ignoring the order requirement.

In general Bing AI seems to do better than ChatGPT and Bard at word game questions.

Boolean minimization

Next I tried the Boolean minimization problem that I’d tried with ChatGPT and Bard using the prompt

Minimize the boolean expression ¬AB¬C¬D + A¬B¬C¬D + A¬B¬CD + A¬BC¬D + A¬BCD + AB¬C¬D + ABC¬D + ABCD

ChatGPT tried a Karnaugh map approach, but botched it. Bard tried an algebraic approach, but also botched it.

Bing tried something different:

The given boolean expression can be simplified using the consensus theorem. The consensus theorem states that for any three variables A, B, and C: AB + AC + BC = AB + AC. Applying this theorem to the given expression, we get

¬AB¬C¬D + A¬B¬C¬D + A¬B¬CD + A¬BC¬D + A¬BCD + AB¬C¬D + ABC¬D + ABCD
= ¬AB¬C¬D + A¬B¬C¬D + A¬B¬CD + A¬BC¬D + A¬BCD + AB¬C¬D + ABC
= ¬AB¬C¬D + A( ¬B¬C¬D + ¬B¬CD + ¬BC¬D + ¬BCD) + AB( ¬C¬D) + ABC
= ¬AB( ¬C) ( ¬D) + A( ¬B) ( ¬C( ¬D) + ¬CD) + AB( ¬C( ¬D)) + ABC
= ¬AB( ¬C) ( ¬D) + A( ¬B) ( ¬CD) + AB( ¬C( ¬D)) + ABC
= (A+B)(A+C)(A+D)(B+C)(B+D)(C+D)

This derivation starts out correctly but goes astray about halfway through.

Boolean function minimization with AI

Screenshot of Karnaugh map produced by ChatGPT

I was curious how well LLMs would do at minimizing a Boolean expression, that is, taking a Boolean expression and producing a smaller equivalent expression.

I didn’t expect good performance because this problem is more about logic than recall, but sometimes LLMs surprise you, so I wanted to give it a chance. I thought it might accomplish some reduction in size though not as much as possible. Let’s see what it did.

Prompt

My prompt for the three platforms I tried—GPT4All, ChatGPT, and Google Bard—was

Minimize the boolean expression ¬AB¬C¬D + A¬B¬C¬D + A¬B¬CD + A¬BC¬D + A¬BCD + AB¬C¬D + ABC¬D + ABCD

I knew the expression could be minimized to

B¬C¬D + A¬B + AC.
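
With only four variables there are just 16 possible inputs, so the minimization is easy to verify by brute force. Here is a quick check in Python:

    from itertools import product

    def original(A, B, C, D):
        return ((not A and B and not C and not D)
                or (A and not B and not C and not D)
                or (A and not B and not C and D)
                or (A and not B and C and not D)
                or (A and not B and C and D)
                or (A and B and not C and not D)
                or (A and B and C and not D)
                or (A and B and C and D))

    def minimized(A, B, C, D):
        return (B and not C and not D) or (A and not B) or (A and C)

    # Check that the two expressions agree on all 16 inputs
    assert all(original(*p) == minimized(*p)
               for p in product([False, True], repeat=4))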

GPT4All

I first tried GPT4All using the Groovy data set. It crashed immediately. I tried a few variations on the same prompt and it crashed right away every time.

This was on a MacBook Pro. I didn’t try it on Windows or Linux.

ChatGPT

I’ll go through ChatGPT’s response a little at a time.

To minimize the given boolean expression ¬AB¬C¬D + A¬B¬C¬D + A¬B¬CD + A¬BC¬D + A¬BCD + AB¬C¬D + ABC¬D + ABCD, we can use Karnaugh maps or Boolean algebra simplification techniques. I’ll demonstrate both methods.

This is a sensible start. Karnaugh maps are a common classroom technique for solving small Boolean minimization problems, so there were lots of examples of Karnaugh maps in the data scraped to train ChatGPT.

Next ChatGPT draws a Karnaugh map. The screenshot at the top of the post is from a different experiment, a version of the experiment described here but containing an error on my part. This prompt produced a black image containing only “CD”, but the response also included a Karnaugh map in text. Cleaning up the formatting, we get the following.

       | 00 | 01 | 11 | 10 |
    00 |  1 |  0 |  1 |  0 |
    01 |  1 |  1 |  1 |  1 |
    11 |  1 |  1 |  1 |  1 |
    10 |  0 |  1 |  0 |  0 | 

This is nonsense. Nine out of sixteen entries are wrong.

ChatGPT concludes

Grouping the adjacent 1’s in the Karnaugh map, we can simplify the expression to: ¬A¬C + ¬A¬D + A¬B + ACD

This is also nonsense. It does not correctly solve the original problem, nor does it solve the problem corresponding to the incorrect Karnaugh map above.

Next let’s see how it does with Boolean algebra. It starts out OK.

Boolean Algebra Method: We can simplify the expression using Boolean algebra identities and rules.

¬AB¬C¬D + A¬B¬C¬D + A¬B¬CD + A¬BC¬D + A¬BCD + AB¬C¬D + ABC¬D + ABCD

Rearranging the terms: A¬B¬C¬D + A¬B¬CD + AB¬C¬D + A¬BC¬D + ABC¬D + A¬BCD + ABCD + A¬B¬C¬D

Grouping the terms: A¬B¬C¬D + A¬B¬CD + AB¬C¬D + A¬BC¬D + ABC¬D + A¬BCD + ABCD

Combining like terms: (A + A)¬B¬C¬D + (A + A)¬B¬CD + AB¬C¬D + (A + A)¬BC¬D + ABC¬D + A¬BCD + ABCD

These steps haven’t accomplished anything, but they’re not wrong. Then it goes off the rails.

Using the complement law A + ¬A = 1: ¬B¬C¬D + ¬B¬CD + AB¬C¬D + ¬BC¬D + ABC¬D + A¬BCD + ABCD

The explanation is wrong and the solution is wrong.

The solution is completely wrong, but it looks plausible. At first glance I thought it might be right, or more right than it was. If you were making a movie and you needed to generate electrical engineering homework, this would be adequate for a realistic-looking homework solution.

Google Bard

Google Bard gives a more succinct solution with five steps:

  1. Identify the common terms in the expression. The common terms are ¬AB¬C¬D, A¬B¬CD, and ABC¬D.
  2. Combine the common terms. This gives us the expression (¬AB¬C¬D + A¬B¬CD + ABC¬D) + (A¬B¬C¬D + AB¬C¬D).
  3. Apply the distribution property. This gives us the expression ¬AB¬C¬D + A¬B¬CD + ABC¬D + A¬B¬C¬D + AB¬C¬D.
  4. Identify the redundant terms. The redundant terms are ¬AB¬C¬D and A¬B¬C¬D.
  5. Remove the redundant terms. This gives us the expression A¬B¬CD + ABC¬D.

This too is nonsense.

Large language models and mnemonics

The Major mnemonic system encodes numbers as words in order to make them easier to remember. Digits correspond to consonant sounds (not spellings) as explained here. You can use the system ad hoc, improvising an encoding of a word as needed, or you can memorize canonical encodings of numbers, also known as pegs.
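
As a toy illustration, here is a spelling-based sketch of the encoding in Python. The real system maps consonant sounds rather than letters, so this sketch mishandles digraphs like th, ch, and ph, and ambiguous letters like soft c, but it conveys the idea.

    # Toy spelling-based approximation of the Major system.
    # The real system maps consonant *sounds* to digits,
    # so digraphs (th, ch, sh, ph) and soft c/g are mishandled here.
    MAJOR = {
        "s": 0, "z": 0,
        "t": 1, "d": 1,
        "n": 2,
        "m": 3,
        "r": 4,
        "l": 5,
        "j": 6,
        "c": 7, "k": 7, "g": 7, "q": 7,
        "f": 8, "v": 8,
        "p": 9, "b": 9,
    }

    def encode(word):
        """Encode a word as a digit string, skipping vowels and w, h, y."""
        return "".join(str(MAJOR[ch]) for ch in word.lower() if ch in MAJOR)

    print(encode("cabin"))  # 792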

Pegs have a couple of advantages. For one, they are eventually faster. Rather than search for an encoding, you recall a predetermined association. For another, pegs are better for memorizing numbered lists. To recall the nth item in a list, pull up the one-and-only image associated with that number and recall what you associated with it.

Coming up with encodings for most numbers is easy, but for a few holdouts it’s challenging. I created a personal list of pegs for numbers up to 1000. A few of these are idiosyncratic, such as names of people who are not well known, and some bend the rules of the Major system.

I decided to try the ChatGPT and Google Bard large language models to see whether they could help me find pegs that I could share with a general audience, free of obscure references.

Asking directly about the Major system

ChatGPT and Google Bard both seem to have some idea of the Major mnemonic system. As part of its reply to one of my prompts, Google Bard returned this:

    0: S, Z, soft C
    1: D, T, TH
    2: N, L, M
    3: M
    4: R
    5: L
    6: J, SH, soft CH, DG
    7: K
    8: P
    9: H

which is correct except for 8 and 9, and which gives some incorrect alternatives for 2.

When I asked ChatGPT for “Major mnemonic system encoding table” it returned a correct table. But when I asked it for encodings of numbers, it used different rules. For example, it seemed to systematically use M for 2.

I asked both systems

Find a word that encodes the number 792 in the Major memory system

ChatGPT came back with KNP and GNP. These aren’t words, and it seems ChatGPT might have dyslexia.

Bard said

There is no word in the English language that encodes the number 792 in the Major memory system. The closest word is “Sink”, but it only encodes the digits 7 and 9.

This was meant to be an easy question since there are common English words that encode 792, such as cabin. Neither system came up with good responses to the numbers I found challenging, and it may be that there simply are no English words with the desired sequence of sounds.

Before moving on, I wanted to mention one response from Google Bard that I found amusing. When I asked

Find a word that encodes 345 using the Major mnemonic system

Bard responded with

… The encoded number 345 is therefore MERLE.

However, there is no word in the English language that is spelled exactly as MERLE. However, there are a few words that are similar, such as Merry, Mettle, and Melt.

If you are looking for a word that is spelled exactly as MERLE, you may need to create a new word. For example, you could create the word Merlin, which is a name that is often associated with magic and wisdom.

Asking for sounds

The Major system is not well known, so I didn’t expect a direct approach to be that helpful. Next I tried asking for words with particular sequences of sounds rather than asking directly about the Major system.

When I asked both systems

Find a word containing the consonant sounds k, b, and n, in that order

ChatGPT came back with cabin and Bard said

There is no word in the English language that contains the consonant sounds k, b, and n in that order. The closest word is “knob”, but the n sound is pronounced as “ng” in this word.

So the point goes to GPT on that one.

Both systems did poorly on the numbers I had a hard time finding pegs for, possibly because there aren’t any (or many) English words with the specified sounds. With easier questions, the results were mixed.

When I asked

Find a word with the sounds g, r, and f

I was thinking giraffe. ChatGPT returned graf and Bard returned graft.

When I asked

Find a word with sounds l, f, and b

I was thinking, for example, lifeboat. But ChatGPT came back with

A word that contains the consonant sounds “l,” “f,” and “b” is “elfishly.”

and Bard replied

The word “blurb” contains the sounds l, f, and b in that order.

Phonetics

Both systems did well when I asked directly for the phonetic spelling of a word. So I tried to get both systems to come up with lifeboat by asking

Words whose IPA phonetic spellings contain l, f, and b

Bard found words that had some of these letters but not all. When I was more specific, asking

Words whose IPA phonetic spellings contain each of the letters l, f, and b

the results weren’t better.

ChatGPT, on the other hand, came up with affleb, triflingly, bluff, lifeboat, and fallopian. When I asked the more specific question, adding “each of”, ChatGPT returned alphabet, fulbright, bubluous, flyable, and belltry. When I added “in that order” to the prompt, ChatGPT said there are no such words, despite having come up with several such words a moment earlier.

Buy one, get one free

phase portrait k = 1/2

The two latest blog posts have elaborated on the fact that when a function satisfies a second order differential equation, once you’ve evaluated the function and its derivative you get the second derivative for free, or at least for cheap.

Sometimes functions are so closely related that software libraries bundle them together. For example, in Python’s scientific computing library SciPy there is no function for returning just one Jacobi elliptic function. Because it takes about as much work to compute one of the functions sn, cn, dn, and ph as it does to compute all of them, SciPy provides a function ellipj that returns all four at once. If you only want to compute sn, call ellipj and discard all but the first of the four values returned. No loss. But if you need two or more of the functions, the bundling saves compute time.
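
For example, here’s what a call looks like, along with a check of two standard identities. Note that ellipj takes the parameter m = k² rather than the modulus k.

    import numpy as np
    from scipy.special import ellipj

    u, m = 0.5, 0.7  # argument u and parameter m = k^2
    sn, cn, dn, ph = ellipj(u, m)  # all four functions from one call

    # Sanity checks: sn^2 + cn^2 = 1 and dn^2 + m sn^2 = 1
    print(np.isclose(sn**2 + cn**2, 1.0))
    print(np.isclose(dn**2 + m*sn**2, 1.0))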

Jacobi functions

I haven’t looked at how SciPy computes Jacobi functions, but I imagine the code to compute the various functions overlaps so much that the additional cost of computing all the functions is minimal if you’re going to compute one of them. Jacobi functions are related to theta functions, and theta functions can be computed very efficiently.

I wrote about the interconnections between Jacobi functions in an earlier post. The three functions sn, cn, and dn all satisfy a simple system of differential equations

\begin{eqnarray*} x' &=& yz \\ y' &=& -zx \\ z' &=& -k^2 xy \end{eqnarray*}

with initial conditions x(0) = 0, y(0) = 1, and z(0) = 1. The image at the top of the post plots (sn(t), cn(t), dn(t)) and is taken from that post. The Jacobi functions are analogous to trig functions, and like trig functions they satisfy myriad identities relating the functions to each other.

Normal random values

An example BOGO (buy one, get one free) in probability is generating samples from a normal random variable. The most common approach generates two uniform samples and transforms them into two normal samples. So if you ask the software for one normal random value, it could give you a second one at no extra computational expense. Using this approach, a function to compute a vector of 2n random values could do so in the time it would take to generate 1 random value n times.
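
The classic transformation of this kind is the Box-Muller transform. Here is a minimal sketch:

    import math, random

    def box_muller():
        """Turn two uniform samples into two independent standard
        normal samples via the Box-Muller transform."""
        u1 = 1.0 - random.random()  # avoid log(0)
        u2 = random.random()
        r = math.sqrt(-2.0 * math.log(u1))
        return (r * math.cos(2.0 * math.pi * u2),
                r * math.sin(2.0 * math.pi * u2))

    z1, z2 = box_muller()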

Automatic differentiation

A final example that I’ll mention is automatic differentiation. With this technique it is often possible to evaluate a function and its gradient in little more time than it takes to evaluate the function alone. Having an explicit form of the derivative is not only unnecessary, it may be less efficient than letting automatic differentiation compute the derivative.
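
As a tiny illustration of the idea, forward-mode automatic differentiation can be implemented with dual numbers: each operation propagates a value and a derivative together, so you get f and f′ in one pass. A minimal sketch:

    class Dual:
        """Minimal dual number for forward-mode automatic differentiation."""
        def __init__(self, val, der=0.0):
            self.val, self.der = val, der

        def __add__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.val + other.val, self.der + other.der)

        def __mul__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            # Product rule: (uv)' = u'v + uv'
            return Dual(self.val * other.val,
                        self.der * other.val + self.val * other.der)

    def f(x):
        return x * x * x + x + 1  # f(x) = x^3 + x + 1

    y = f(Dual(2.0, 1.0))  # seed derivative dx/dx = 1
    print(y.val, y.der)    # f(2) = 11.0 and f'(2) = 3*2^2 + 1 = 13.0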

Cautionary example

On the flip side, it sometimes seems that you can buy one and get one free when you can’t or shouldn’t. For example, software libraries often have routines to evaluate the error function erf(x) and its complement erfc(x) = 1 – erf(x). In theory you can calculate one from the other, and often in practice too. But if erf(x) is close to 1, 1 – erf(x) might evaluate to zero in floating point arithmetic even though erfc(x) is not zero. For more examples along these lines, see Math functions that seem unnecessary.
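
Python’s math module includes both functions, and a few lines show the problem:

    import math

    for x in [1, 5, 10]:
        print(x, 1 - math.erf(x), math.erfc(x))

    # At x = 10, 1 - erf(x) underflows to exactly 0 in floating point,
    # while erfc(10) is about 2.1e-45.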

Halley’s variation on Newton’s method

Newton’s method for solving f(x) = 0 converges quickly once it starts to converge. Specifically, it converges quadratically: the error is roughly squared at each step. This means that the number of correct decimal places doubles at each iteration of Newton’s method.

Doubling the number of decimal places is nice, but what if we could triple the number of correct decimals at each step? Edmond Halley, best known for the comet that bears his name, came up with a way to do just that. Newton’s method converges quadratically, but Halley’s method converges cubically.

Newton’s method updates iterations using the rule

x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}

Halley’s variation uses the rule

x_{n+1} = x_n - \frac{1}{\dfrac{f'(x_n)}{f(x_n)} - \dfrac{f''(x_n)}{2f'(x_n)}}

Halley’s method makes more progress per iteration, but it also takes more work per iteration, and so in practice it’s not usually an improvement over Newton’s method. But in some cases it could be.

Evaluating efficiency

In root-finding problems, the function f is usually relatively time-consuming to evaluate. We can assume nearly all the work in root-finding goes into evaluating f and its derivatives, and we can ignore the rest of the arithmetic in the method.

Newton’s method requires evaluating the function f and its derivative f ′. Halley’s method requires these function evaluations as well, and also requires evaluating the second derivative f ′′.

Three iterations of Newton’s method make as much progress toward finding a root as two iterations of Halley’s method. But since Newton’s method requires two function evaluations and Halley’s method requires three, both require six function evaluations to make the same amount of progress. So in general Halley’s method is not an improvement over Newton’s method.

When Halley might be better

Suppose we had a way to quickly compute the second derivative of f at a point from the values of f and its first derivative at that point, rather than by having to evaluate the second derivative explicitly. In that case two iterations of Halley’s method might make as much progress as three iterations of Newton’s method but with less effort, four function evaluations rather than six.

Many of the functions that are used most in applied mathematics satisfy 2nd order differential equations, and so it is common to be able to compute the second derivative from the function and its derivative.

For example, suppose we want to find the zeros of Bessel functions Jν. Bessel functions satisfy Bessel’s differential equation; that’s where Bessel functions get their name.

x^2 y'' + x y' + \left(x^2 - \nu^2 \right) y = 0

This means that once we have evaluated Jν and its first derivative, we can compute the second derivative from these values as follows.

J_{\nu}''(x_n) = \frac{(\nu^2 - x_n^2)J_\nu(x_n) - x_nJ_\nu'(x_n)}{x_n^2}
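
Here is a sketch of this approach using SciPy’s Bessel routines jv and jvp, with the second derivative computed from Bessel’s equation rather than evaluated directly. The starting guess and tolerance are only illustrative.

    from scipy.special import jv, jvp

    def bessel_zero_halley(nu, x0, tol=1e-14, max_iter=20):
        """Find a zero of the Bessel function J_nu near x0 by Halley's method,
        getting the second derivative from Bessel's differential equation."""
        x = x0
        for _ in range(max_iter):
            f = jv(nu, x)
            fp = jvp(nu, x)  # first derivative J_nu'
            fpp = ((nu**2 - x**2)*f - x*fp) / x**2  # from Bessel's equation
            dx = 1.0 / (fp/f - fpp/(2.0*fp))
            x -= dx
            if abs(dx) < tol:
                break
        return x

    # First zero of J_0 is approximately 2.404825557695773
    print(bessel_zero_halley(0, 2.0))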

Maybe we’re not working with a well-studied function like a Bessel function but rather we are numerically solving a differential equation of our own creation and we want to know where the solution is zero. Halley’s method would be a natural fit because our numerical method will give us the values of y and y ′ at each step.
