RSA encryption exponents are mostly all the same

The big idea of public key cryptography is that it lets you publish an encryption key e without compromising your decryption key d. A somewhat surprising detail of RSA public key cryptography is that in practice e is nearly always the same number, specifically e = 65537. We will review RSA, explain how this default e was chosen, and discuss why this may or may not be a problem.

RSA setup

Here’s the setup for RSA encryption in a nutshell.

  1. Select two very large prime numbers p and q.
  2. Compute n = pq and φ(n) = (p – 1)(q – 1).
  3. Choose an encryption key e relatively prime to φ(n).
  4. Calculate the decryption key d such that ed ≡ 1 (mod φ(n)).
  5. Publish e and n, and keep d, p, and q secret.

The asymmetry in the encryption scheme comes from the fact that the person who knows p and q can compute n, φ(n), and d easily, but no one else can find d without factoring n. (Or so it seems.)
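
Here’s what the five steps look like in Python, with toy-sized primes and no padding scheme. This is only a sketch to illustrate the arithmetic, not anything to use in practice.

    from math import gcd

    # 1. Two (toy-sized) primes; real primes have hundreds of digits
    p, q = 61, 53

    # 2. n = pq and φ(n) = (p - 1)(q - 1)
    n = p * q                  # 3233
    phi = (p - 1) * (q - 1)    # 3120

    # 3. Encryption exponent relatively prime to φ(n)
    e = 17
    assert gcd(e, phi) == 1

    # 4. Decryption exponent: modular inverse of e mod φ(n) (Python 3.8+)
    d = pow(e, -1, phi)        # 2753
    assert e * d % phi == 1

    # 5. (e, n) is public; d, p, and q stay secret.
    # Encrypt and decrypt a message m with 0 <= m < n.
    m = 42
    c = pow(m, e, n)           # ciphertext: m^e mod n
    assert pow(c, d, n) == m   # recover m: c^d mod n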

The encryption exponent

Here we focus on the third step above. In theory e could be very large, but very often e = 65537. In that case, we don’t actually choose e to be relatively prime to φ(n). Instead, we pick p and q, and hence n, and usually φ(n) will be relatively prime to 65537, but if it’s not, we choose p and q again until gcd(65537, φ(n)) = 1. Kraft and Washington [1] have this to say about the choice of e:

… surprisingly, e does not have to be large. Sometimes e = 3 has been used, although it is usually recommended that larger values be used. In practice, the most common value is e = 65537. The advantages are that 65537 is a relatively large prime, so it’s easier to arrange that gcd(e, φ(n)) = 1, and it is one more than a power of 2, so raising a number to the 65537th power consists mostly of squarings …

In short, the customary value of e was chosen for efficiency.
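
Concretely, because 65537 = 2^16 + 1, raising m to the 65537th power mod n takes sixteen squarings followed by a single multiplication. Here’s a sketch in Python, checked against the built-in modular exponentiation.

    def pow65537(m, n):
        # m^65537 = m^(2^16) * m (mod n): square 16 times, multiply once
        t = m % n
        for _ in range(16):
            t = t * t % n
        return t * m % n

    assert pow65537(42, 3233) == pow(42, 65537, 3233)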

Heninger and Shacham [2] go into more detail.

The choice e = 65537 = 2^16 + 1 is especially widespread. Of the certificates observed in the UCSD TLS Corpus (which was obtained by surveying frequently-used TLS servers), 99.5% had e = 65537, and all had e at most 32 bits.

So the “most common value” mentioned by Kraft and Washington appears to be used 99.5% of the time.

Is this a problem?

Is it a problem that e is very often the same number? At first glance, no. If the number e is going to be public anyway, there doesn’t seem to be any harm in selecting it even before you choose p and q.

There may be a problem, however, with the fact that the value nearly everyone has chosen is fairly small. In particular, [2] gives an algorithm for recovering private RSA keys when some of the bits are known and the exponent e is small. How would some of the bits be known? If someone has physical access to a computer, they can recover some bits from its memory.

[1] James S. Kraft and Lawrence C. Washington. An Introduction to Number Theory with Cryptography. 2nd edition.

[2] Nadia Heninger and Hovav Shacham. Reconstructing RSA Private Keys from Random Key Bits.

Revealing information by trying to suppress it

FAS posted an article yesterday explaining how blurring military installations out of satellite photos draws attention to them, showing exactly where they are and how big they are. The Russian mapping service Yandex Maps blurred out sensitive locations in Israel and Turkey. As the article says, this is an example of the Streisand effect, named after Barbra Streisand’s failed attempt to suppress photos of her Malibu home. Her efforts to protect her privacy caused the photos to go viral.

A similar issue arises in differential privacy. A practical implementation of differential privacy must rule out some queries a priori, in some way that is not data-dependent. Otherwise it could be a violation of differential privacy either to answer or to refuse to answer the same query. Short of refusing the query entirely, it could also be informative to reply “Are you sure you want to submit that query? It’s going to use up a lot of your privacy budget.”

Differential privacy adds random noise to query results, but in a more sophisticated way than Yandex blurring out portions of photos. Selectively adding noise reveals information about what the noise is trying to conceal. As the FAS article indicated, by blurring out only the sensitive areas, the blurred maps point out their exact location.

Selective security measures can have a similar effect. By placing greater protection on some information, you can mark that information as being particularly important. Once you realize this, otherwise nonsensical security measures start to make sense: applying uniform security results in excessive protection for some information, but keeps attackers from being able to identify the most valuable targets.

Numerical methods blog posts

I recently got a review copy of Scientific Computing: A Historical Perspective by Bertil Gustafsson. I thought that thumbing through the book might give me ideas for new topics to blog about. It still may, but mostly it made me think of numerical methods I’ve already blogged about.

In historical order, or at least in the order of Gustafsson’s book:

The links above include numerical methods I have written about. What about the methods I have not written about?

PDEs

I like to write fairly short, self-contained blog posts, and I’ve written about algorithms compatible with that. The methods in Gustafsson’s book that I haven’t written about are mostly methods in partial differential equations. I don’t see how to write short, self-contained posts about numerical methods for PDEs. In any case, my impression is that not many readers would be interested. If you are one of the few readers who would like to see more about PDEs, you may enjoy the following somewhat personal rambling about PDEs.

I studied PDEs in grad school—mostly abstract theory, but also numerical methods—and expected PDEs to be a big part of my career. I’ve done some professional work with differential equations, but the demand for other areas, particularly probability and statistics, has been far greater. In college I had the impression that applied math was practically synonymous with differential equations. I think that was closer to being true a generation or two ago than it is now.

My impression of the market demand for various kinds of math is no doubt colored by my experience. When I was at MD Anderson we had one person in biomathematics working in PDEs, and then none when she left. There may have been people at MDACC doing research into PDEs for modeling cancer, but they weren’t in the biostatistics or biomathematics departments. I know that people are working with differential equations in cancer research, but not at the world’s largest cancer center, and so I expect there aren’t many doing research in that area.

Since leaving MDACC to work as a consultant I’ve seen little demand for differential equations, and what demand I have seen has been more for Kalman filters than for differential equations per se. A lot of companies hire people to numerically solve PDEs, but there don’t seem to be many who want to use a consultant for such work. I imagine most companies with an interest in PDEs are large and have a staff of engineers and applied mathematicians working together. There’s more demand for statistical consulting because companies are likely to have an occasional need for statistics. The companies that need PDEs, say for making finite element models of oil reservoirs or airplanes, have an ongoing need and hire accordingly.

Simulating identification by zip code, sex, and birthdate

As mentioned in the previous post, Latanya Sweeney estimated that 87% of Americans can be identified by the combination of zip code, sex, and birth date. We’ll do a quick-and-dirty estimate and a simulation to show that this result is plausible. There’s no point being too realistic with a simulation because the actual data that Sweeney used is even more realistic. We’re just showing that her result is reasonable.

Quick estimate

Suppose average life expectancy is around 78 years. If everyone is between 0 and 78, then there are 78*365 possible birth dates and twice that many combinations of birth date and sex.

What’s the average population of a US zip code? We’ll use 9,330 for reasons explained in [1].

We have 56,940 possible birth date and sex combinations for 9,330 people. There have to be many unused birth date and sex combinations in a typical zip code, and it’s plausible that many combinations will only be used once. We’ll run a Python simulation to see just how many we’d expect to be used one time.

Python simulation

The array demo below will keep track of the possible demographic values, i.e. combinations of birth date and sex. We’ll loop over the population of the zip code, randomly assigning everyone to a demographic value, then see what proportion of demographic values is only used once.

    from random import randrange
    from numpy import zeros
    
    zctasize = 9330      # average population of a ZCTA
    demosize = 365*78*2  # possible birth date and sex combinations
    demo = zeros(demosize)
    
    # randomly assign everyone in the zip code a demographic value
    for _ in range(zctasize):
        d = randrange(demosize)
        demo[d] += 1
    
    # proportion of demographic values used exactly once
    unique = len(demo[demo == 1])
    print(unique/zctasize)

I ran this simulation 10 times and got values ranging from 84.3% to 85.7%.

Analytic solution

As Road White points out in the comments, you can estimate the number of unique demographics by a probability calculation.

Suppose there are z inhabitants in our zip code and d demographic categories. We’re assuming here (and above) that all demographic categories are equally likely, even though that’s not true of birth dates.

We start by looking at a particular demographic category. The probability that exactly one person falls in that category is

\frac{z}{d}\left(1 - \frac{1}{d}\right)^{z-1}

To find the expected number of demographic slots with exactly one entry we multiply by d, and to get the proportion p of people this represents we divide by z. This gives p = (1 – 1/d)^(z-1), and so

\log p = (z-1)\log\left(1 - \frac{1}{d}\right) \approx - \frac{z}{d}

and so

p \approx \exp(-z/d)

which in our case is 84.8%, consistent with our simulation above.

Here we used the Taylor series approximation

\log(1 + x) = x + {\cal O}(x^2)

If z is of the same magnitude as d or smaller, then the error in the approximation above is O(1/d).

By the way, a similar calculation shows that the expected number of demographic slots containing two people is r exp(-r)/2 where r = z/d. So while 85% can be uniquely identified, another 7% can be narrowed down to two possibilities.
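
Here’s a quick check of both estimates with the numbers used above.

    from math import exp

    z = 9330        # people in the zip code
    d = 2*365*78    # demographic categories
    r = z/d

    print(exp(-r))       # about 0.849: proportion unique
    print(r*exp(-r)/2)   # about 0.07: narrowed down to two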

[1] I don’t know, but I do know the average population of a “zip code tabulation area” or ZCTA, and that’s 9,330 according to the data linked to here. As I discuss in that post, the Census Bureau reports population by ZCTA, not by zip code per se, for several reasons. Zip codes can cross state boundaries, they are continually adjusted, and some zip codes contain only post office boxes.

No funding for uncomfortable results

In 1997 Latanya Sweeney dramatically demonstrated that supposedly anonymized data was not anonymous. The state of Massachusetts had released data on 135,000 state employees and their families with obvious identifiers removed. However, the data contained zip code, birth date, and sex for each individual. Sweeney was able to cross reference this data with publicly available voter registration data to find the medical records of then Massachusetts governor William Weld.

An estimated 87% of Americans can be identified by the combination of zip code, birth date, and sex. A back-of-the-envelope calculation shows that this should not be surprising, but Sweeney appears to be the first to do this calculation and pursue the results. (Update: See such a calculation in the next post.)

In her paper Only You, Your Doctor, and Many Others May Know, Sweeney says that her research was unwelcome. Over 20 journals turned down her paper on the Weld study, and nobody wanted to fund privacy research that might reach uncomfortable conclusions.

A decade ago, funding sources refused to fund re-identification experiments unless there was a promise that results would likely show that no risk existed or that all problems could be solved by some promising new theoretical technology under development. Financial resources were unavailable to support rigorous scientific studies otherwise.

There’s a perennial debate over whether it is best to make security and privacy flaws public or to suppress them. The consensus, as much as there is a consensus, is that one should reveal flaws discreetly at first and then err on the side of openness. For example, a security researcher finding a vulnerability in Windows would notify Microsoft first and give the company a chance to fix the problem before announcing the vulnerability publicly. In Sweeney’s case, however, there was no single responsible party who could quietly fix the world’s privacy vulnerabilities. Calling attention to the problem was the only way to make things better.

Photo of Latanya Sweeney via Parker Higgins [CC BY 4.0], from Wikimedia Commons

Sine of a googol

How do you evaluate the sine of a large number in floating point arithmetic? What does the result even mean?

Sine of a trillion

Let’s start by finding the sine of a trillion (10^12) using floating point arithmetic. There are a couple ways to think about this. The floating point number t = 1.0e12 can only be represented to 15 or 16 significant decimal figures (to be precise, 53 bits [1]), and so you could think of it as a representative of the interval of numbers that all have the same floating point representation. Any result that is the sine of a number in that interval should be considered correct.

Another way to look at this is to say that t can be represented exactly—its binary representation requires only 40 bits and we have 53 bits of significand precision available—and so we expect sin(t) to return the sine of exactly one trillion, correct to full precision.

It turns out that IEEE arithmetic does the latter, computing sin(1e12) correctly to 16 digits.

Here’s the result in Python

    >>> from math import sin, pi
    >>> sin(1.0e12)
    -0.6112387023768895

and verified by Mathematica by computing the value to 20 decimal places

    In:= N[Sin[10^12], 20]
    Out= -0.61123870237688949819

Range reduction

Note that the result above is not what you’d get if you were first to take the remainder when dividing by 2π and then taking the sine.

    >>> sin(1.0e12 % (2*pi))
    -0.6112078499756778

This makes sense. The result of dividing a trillion by the floating point representation of 2π, 159154943091.89536, is correct to full floating point precision. But most of that precision is in the integer part. The fractional part has only about five digits of precision, and so we should expect the result above to be correct to at most five digits. In fact it’s accurate to four digits.

When your processor computes sin(1e12) it does not naively take the remainder by 2π and then compute the sine. If it did, you’d get four significant figures rather than 16.

We started out by saying that there were two ways of looking at the problem, and according to the first one, returning only four significant figures would be quite reasonable. If we think of the value t as a measured quantity, measured to the precision of floating point arithmetic (though hardly anything can be measured that accurately), then four significant figures would be all we should expect. But the people who designed the sine function implementation on your processor did more than they might be expected to do, finding the sine of a trillion to full precision. They do this using a range reduction algorithm that retains far more precision than naively doing a division. (I worked on this problem a long time ago.)
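
To see how much precision survives a more careful reduction, we can do the range reduction ourselves in extended precision, here with the third-party mpmath library (assuming it’s installed), and then take the sine of the reduced argument in ordinary double precision.

    from math import sin
    from mpmath import mp, mpf, fmod, pi

    mp.dps = 50                  # 50 significant decimal digits
    t = fmod(mpf(10)**12, 2*pi)  # reduce mod 2π in high precision
    print(sin(float(t)))         # agrees with sin(1.0e12) above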

Sine of a googol?

What if we try to take the sine of a ridiculously large number like a googol (10^100)? This won’t work because a googol cannot be represented exactly as a floating point number (i.e. as an IEEE 754 double). It’s not too big; floating point numbers can be as large as around 10^308. The problem is that a number that big cannot be represented to full precision. But we can represent 2^333 exactly, and a googol is between 2^332 and 2^333. And amazingly, Python’s sine function (or rather the sine function built into the AMD processor on my computer) returns the correct result to full precision.

    >>> sin(2**333)
    0.9731320373846353

How do we know this is right? I verified it in Mathematica:

    In:= Sin[2.0^333]
    Out= 0.9731320373846353

How do we know Mathematica is right? We can do naive range reduction using extended precision, say 200 decimal places.

    In:= x = 2^333
    In:= p = N[Pi, 200]
    In:= theta = x - IntegerPart[x/(2 p)] 2 p
    Out= 1.8031286178334699559384196689…

and verify that the sine of the range reduced value is correct.

    In:= Sin[theta]
    Out= 0.97313203738463526…

Interval arithmetic interpretation

Because floating point numbers have 53 bits of precision, all real numbers between 2^56 – 2^2 and 2^56 + 2^2 are represented as the floating point number 2^56. This is a range of width 8, wider than 2π, and so the sines of the numbers in this range cover the possible values of sine, i.e. [-1, 1]. So if you think of a floating point number as a sample from the set of all real numbers with the same binary representation, every possible value of sine is a valid return value for numbers bigger than 2^56. But if you think of a floating point number as representing itself exactly, then it makes sense to ask for the sine of numbers like 2^333 and larger, up to the limits of floating point representation [2].

[1] An IEEE 754 double has 52 significand bits, but these can represent 53 bits of precision because the leading bit of a normalized number is always 1, and so it does not need to be stored. See more here.

[2] The largest exponent of an IEEE double is 1023, and the largest significand is 2 – 2^-52 (i.e. all 1s), so the largest possible value of a double is

(2^53 – 1) 2^(1024-53)

and in fact the Python expression sin((2**53 - 1)*2**(1024-53)) returns the correct value to 16 significant figures.
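
You can check that expression against the largest double Python reports:

    >>> import sys
    >>> sys.float_info.max == (2**53 - 1) * 2**(1024 - 53)
    True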

Six degrees of Kevin Bacon, Paul Erdős, and Wikipedia

I just discovered the web site Six Degrees of Wikipedia. It lets you enter two topics and it will show you how few hops it can take to get from one to the other.

Since the mathematical equivalent of Six Degrees of Kevin Bacon is Six degrees of Paul Erdős, I tried looking for the distance between Kevin Bacon and Paul Erdős and found this:

Erdos to Bacon

Kevin Bacon and Paul Erdős are both known for their prolific collaborations. Many actors have acted with Kevin Bacon, or acted with actors who have acted with him, etc. Many mathematicians wrote papers with Erdős, or have written papers with people who have written papers with him, etc. I’m four degrees of separation from Paul Erdős last I checked.  [1]

Here’s a more complex graph showing the three degrees of separation between Thomas Aquinas and Thomas the Tank Engine.

Aquinas to Tank Engine

Note that the edges are directed: links go from the starting page to pages that lead to the target page. If you reverse the starting and ending page, you get different results. For example, there are still three degrees of separation if we reverse our last example, going from Thomas the Tank Engine to Thomas Aquinas, but there are about twice as many possible paths.

[1] Your Erdős number, the degrees of separation between you and Paul Erdős, can decrease over time because his collaborators are still writing papers. Even Erdős himself continued to publish posthumously because people who began papers with him finished them years later.

Mersenne prime trend

Mersenne primes have the form 2^p – 1 where p is a prime. The graph below plots the trend in the size of these numbers based on the 50 Mersenne primes currently known.

The vertical axis plots the exponents p, which are essentially the logs base 2 of the Mersenne primes. The scale is logarithmic, so this plot suggests that the Mersenne primes grow approximately linearly on a double logarithmic scale.

Trend in Mersenne primes
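
Here’s a sketch of how such a plot can be made. Only the first 15 exponents are typed in below; the full list of 50 is available from GIMPS.

    import matplotlib.pyplot as plt

    # exponents p of the first 15 Mersenne primes 2^p - 1
    p = [2, 3, 5, 7, 13, 17, 19, 31, 61, 89,
         107, 127, 521, 607, 1279]

    plt.semilogy(range(1, len(p) + 1), p, "o")
    plt.xlabel("Mersenne prime index")
    plt.ylabel("exponent p")
    plt.savefig("mersenne.svg")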

Spherical trig, Research Triangle, and Mathematica

This post will look at the triangle behind North Carolina’s Research Triangle using Mathematica’s geographic functions.

Spherical triangles

A spherical triangle is a triangle drawn on the surface of a sphere. It has three vertices, given by points on the sphere, and three sides. The sides of the triangle are portions of great circles running between two vertices. A great circle is a circle of maximum radius, a circle with the same center as the sphere.

An interesting aspect of spherical geometry is that both the sides and angles of a spherical triangle are angles. Because the sides of a spherical triangle are arcs, they have angular measure, the angle formed by connecting each vertex to the center of the sphere. The arc length of a side is its angular measure times the radius of the sphere.

Denote the three vertices by A, B, and C. Denote the side opposite A by a, etc. Denote the angles at A, B, and C by α, β, and γ respectively.

Research Triangle

Research Triangle is a (spherical!) triangle with vertices formed by Duke University, North Carolina State University, and University of North Carolina at Chapel Hill.

(That was the origin of the name, though it’s now applied more loosely to the general area around these three universities.)

We’ll take as our vertices

  • A = UNC Chapel Hill (35.9046 N, 79.0468 W),
  • B = Duke University in Durham (36.0011 N, 78.9389 W),
  • C = NCSU in Raleigh (35.7872 N, 78.6705 W).

Research Triangle

Mathematica

We’ll illustrate several features of Mathematica using the spherical triangle corresponding to Research Triangle.

Map

The map above was produced with the following Mathematica code.

    ptA = GeoPosition[{35.9046, -79.0468}]
    ptB = GeoPosition[{36.0011, -78.9389}]
    ptC = GeoPosition[{35.7872, -78.6705}]

    GeoGraphics[{Red, PointSize[Large], 
        Point[ptA], Point[ptB], Point[ptC]}, 
        GeoScaleBar -> "Imperial", 
        GeoRange -> 29000]

Note that longitude is in degrees east, so the longitudes above are negative.

Distance

To find the distance between two locations, you can use the function GeoDistance. For example,

    GeoDistance[ptA, ptB]

tells me that the distance between UNC and Duke is 8.99185 miles. I assume it displays miles by default based on my location in the US, though the GeoRange option above defaults to meters. You can make the system of units explicit. For example

    GeoDistance[ptA, ptB, UnitSystem -> "Metric"]

returns 14.471 km.

If we want to find the length of the side c in degrees, we can use the Earth’s radius.

    r = PlanetData["Earth", "Radius"]
    c = GeoDistance[ptA, ptB] 360 / (2 Pi r)

This shows that c is 0.13014°. Calculating the other sides similarly shows a = 0.30504° and b = 0.32739°.

Angles

Calling GeoDirection[ptA, ptB] returns 42.2432°, which says we need to head at an angle of about 42° to walk from UNC to Duke.

The code

    GeoDirection[ptA, ptB] - GeoDirection[ptA, ptC]

shows that the angle α is 68.6128°. (The code returns the negative of this angle because the angle is clockwise.) Similarly we find β = 87.9808° and γ = 23.4068°.

The angles add up to 180° only because our triangle is small compared to the earth’s total area. The actual sum should be slightly more than 180°, but we’ve not retained enough precision to detect the difference. In general the “spherical excess,” i.e. the amount by which the sum of the angles exceeds 180°, is proportional to the area of the triangle.
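
As a check on the numbers above, here’s a sketch in Python that recomputes the angles from the three sides using the spherical law of cosines, cos a = cos b cos c + sin b sin c cos α.

    from math import radians, degrees, sin, cos, acos

    # sides in degrees, from the Mathematica calculations above
    a, b, c = map(radians, (0.30504, 0.32739, 0.13014))

    def angle(a, b, c):
        # angle opposite side a by the spherical law of cosines
        return degrees(acos((cos(a) - cos(b)*cos(c)) / (sin(b)*sin(c))))

    alpha, beta, gamma = angle(a, b, c), angle(b, c, a), angle(c, a, b)
    print(alpha, beta, gamma)    # about 68.61, 87.98, 23.40
    print(alpha + beta + gamma)  # 180 to the precision of the inputs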

Topping out

There’s an ancient tradition of construction workers putting a Christmas tree on top of a building when it reaches its full height. I happened to drive by a recently topped out building this morning.

topping out

Complex exponentials

Here’s something that comes up occasionally, a case where I have to tell someone “It doesn’t work that way.” I’ll write it up here so next time I can just send them a link instead of retyping my explanation.

Rules for exponents

The rules for manipulating expressions with real numbers carry over to complex numbers so often that it can be surprising when a rule doesn’t carry over. For example, the rule

(b^x)^y = b^(xy)

holds when b is a positive real number and x and y are real numbers, but doesn’t necessarily hold when x or y are complex. In particular, if x is complex,

(e^x)^y = e^(xy)

does not hold in general, though it does hold when y is an integer. If it did hold, and this is where people get into trouble, you could make computations like

e^(2πi/3) = (e^(2πi))^(1/3) = 1^(1/3) = 1

even though e^(2πi/3) actually equals (-1 + i√3)/2.
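
You can watch this go wrong numerically with Python’s cmath module:

    import cmath

    # (e^(2πi))^(1/3) evaluates to (approximately) 1 ...
    print(cmath.exp(2j * cmath.pi) ** (1/3))   # about (1+0j)

    # ... but e^(2πi/3) is a primitive cube root of unity
    print(cmath.exp(2j * cmath.pi / 3))        # about (-0.5+0.866j)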

Complex exponential function

I usually use the notation exp(x) rather than e^x for several reasons. For one thing, it’s easier to read. When the argument is complicated, say if it contains exponents of its own, the superscripts can get tiny and hard to read. For example, here are two ways of writing the probability density function for the Gumbel distribution.

e^{-x - e^{-x}} = \exp(-x - \exp(-x))

Math publishers often require the exp() notation in their style guides.

Aside from legibility, there is a substantive reason to prefer the exp() notation. The function f(z) = exp(z) has a rigorous definition in terms of a power series. It’s a single-valued function, valid for any complex number you might stick in. By contrast, it’s subtle how b^x can even be defined rigorously, even if b = e. If you care to delve into the details, see the footnote [1]. This may seem pedantic, but it’s a hair worth splitting because it can avoid difficulties like the incorrect calculation above.

Now it is true that

exp(x + y) = exp(x) exp(y)

for all x and y, even complex x and y, and so it is true that

exp(nx) = exp(x)^n

when n is an integer. For positive integers you can see this by repeatedly adding x to itself. You can extend this to negative integers using

exp(-x) = 1/exp(x).

[1] For positive b, you can bootstrap your way up by defining b^x for integer exponents, then rational exponents, then real exponents by taking limits. But when b is negative this doesn’t work. The way you end up defining b^x in general is as exp(x log b), taking advantage of the rigorous definition of exp() alluded to above.

But what is log b? This gets subtle. The logarithm of b is a solution to the equation exp(x) = b, and this has infinitely many solutions for complex x. So now you need to either specify a particular branch to make the log function single valued, or you need to change your idea of a function to include multiply-valued functions.

If you get really technical, there’s a difference between exp(x) and e^x because the latter is exp(x log e), and log e has multiple values! So in that sense exp(x) and e^x are different functions, the former single valued and the latter multiply valued.

Sine sum

Sam Walters posted something interesting on Twitter yesterday I hadn’t seen before:

If for some reason your browser doesn’t render the embedded tweet, he points out that

-\frac{1}{7} < \sum_{n=1}^N \sin(n) < 2

for all positive integers N.

Here’s a little Python script to illustrate the sum in his tweet.

    from numpy import sin, arange
    import matplotlib.pyplot as plt
    
    # cumulative sums of sin(1), sin(2), ..., sin(99)
    x = arange(1, 100)
    y = sin(x).cumsum()
    plt.plot(x, y)
    
    # the bounds -1/7 and 2 from the tweet
    plt.plot((1, 100), (-1/7, -1/7), "g--")
    plt.plot((1, 100), (2, 2), "g--")
    plt.savefig("sinsum.svg")

Exponential sums

Exponential sums are interesting because crude application of the triangle inequality won’t get you anywhere. All it will let you conclude is that the sum is between –N and N.

(Why is this an exponential sum? Because it’s the imaginary part of the sum over e^(in).)
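
In fact, summing the geometric series in e^(in) and taking imaginary parts gives a closed form for the partial sums:

\sum_{n=1}^N \sin(n) = \frac{\sin(N/2)\, \sin((N+1)/2)}{\sin(1/2)}

so the partial sums are bounded by 1/sin(1/2) ≈ 2.086 in absolute value.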

For more on exponential sums, you might like the book Uniform Distribution of Sequences.

Also, I have a page that displays the plot of a different exponential sum each day, the coefficients in the sum

\sum_{n=0}^N \exp\left( 2\pi i \left( \frac{n}{m} + \frac{n^2}{d} + \frac{n^3}{y} \right ) \right )

being taken from the day’s date. Because consecutive numbers have very different number theoretic properties, the images vary quite a bit from day to day.
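
Here’s a rough sketch in Python of how such a plot could be drawn. The upper limit N and the convention for turning a date into m, d, and y below are guesses for illustration, not necessarily what the daily page uses.

    from cmath import exp, pi
    import matplotlib.pyplot as plt

    def partial_sums(m, d, y, N=1000):
        # partial sums of exp(2πi(n/m + n²/d + n³/y))
        s, points = 0, []
        for n in range(N + 1):
            s += exp(2j*pi*(n/m + n**2/d + n**3/y))
            points.append(s)
        return points

    z = partial_sums(12, 25, 18)  # December 25, say
    plt.plot([p.real for p in z], [p.imag for p in z])
    plt.gca().set_aspect("equal")
    plt.savefig("expsum.svg")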

Here’s a sneak peek at what the exponential sum for Christmas will be this year.

My Twitter graveyard

I ran into The Google Cemetery the other day, a site that lists Google products that have come and gone. Google receives a lot of criticism when they discontinue a product, which is odd for a couple reasons. First, the products are free, so no one is entitled to them. Second, it’s great for a company to try things that might not succeed; the alternative is to ossify and die.

In that spirit, I thought I’d celebrate some of my Twitter accounts that have come and gone.

Dormant accounts

My first daily tip Twitter account was SansMouse, an account that posted one keyboard shortcut per day. I later renamed the account ShortcutKeyTip. I also had an account PerlRegex for Perl regular expressions, DailySymbol for symbols, and BasicStatistics for gardening. Just kidding about that last one; it was for basic statistics.

Donated accounts

I started RLangTip and then handed it over to people who were more enthusiastic about R. I also started MusicTheoryTip and gave it away.

Renamed accounts

The account GrokEM was for electricity and magnetism. I renamed it ScienceTip and broadened the content to be science more generally. I also had an account SedAwkTip for, you guessed it, sed and awk. It also broadened its content and became UnixToolTip.

I renamed StatFact to DataSciFact and got an immediate surge in followers. Statistics sells better when you change the label to data science, machine learning, or artificial intelligence.

Resurrected accounts

I stopped posting to my signal processing account DSP_fact for a while and then started it up again. I also let my FormalFact account lie fallow for a while, then renamed it LogicPractice and restarted it. It’s the only one of my accounts I don’t post to regularly.

Rejected accounts

I had several other ideas for accounts that I never started. They will probably never see the light of day. I have no intention of starting any new accounts at this time, but I might someday. I also might retire some of my existing accounts.

Graphic design

Here’s a collage of the icons for my accounts as of eight years ago:

[Icons for SansMouse, RegexTip, TeXtip, ProbFact, StatFact, AlgebraFact, TopologyFact, AnalysisFact, and CompSciFact]

The icons became more consistent over time, and now all my Twitter accounts have similar icons: blue dots with a symbol inside. I’m happy with the current design for now.

Twitter icons

Poetic description of privacy-preserving analysis

Erlingsson et al. give a poetic description of privacy-preserving analysis in their RAPPOR paper [1]. They say that the goal is to

… allow the forest of client data to be studied, without permitting the possibility of looking at individual trees.

[1] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response. Available here.