Kibi, mebi, gibi

Computers like powers of 2, people like powers of 10, and the fact that 2^10 is approximately 10^3 makes it easy to convert between the two powers.

A kilobyte is 1000 bytes like a kilogram is 1000 grams. But for years the former was only approximate while the latter is exact. A kilobyte was actually 2^10 = 1024 bytes. Similarly a megabyte was 2^20 or approximately 10^6 bytes and a gigabyte was 2^30 or approximately 10^9 bytes. Then in 1998, the International Electrotechnical Commission (IEC) introduced new units to eliminate this ambiguity. The prefix “kibi” means exactly 2^10, “mebi” means 2^20, and “gibi” means 2^30. (The IEC also introduced tebi, pebi, exbi, zebi, and yobi as binary analogs for tera, peta, exa, zetta, and yotta.)

I’ll give arguments for and against using the new units and conclude with a note about their implementation in PowerShell.

Pro

The number of bytes in a kibibyte is about 2.5% larger than the number of grams in a kilogram. But the number of bytes in a mebibyte is about 5% larger than the number of grams in a megagram. Every time you go up by a factor of 2^10, the approximation degrades by about another 2.5%. So a tebibyte is almost 1.1 trillion bytes. At larger sizes, the traditional prefixes become progressively more misleading and hence the need for more precise alternatives is greater.
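As a rough check of these figures, the following Python sketch computes how much each binary prefix exceeds its decimal counterpart. The helper name `binary_excess_percent` is made up for this illustration.

```python
# Binary prefixes in increasing order; index k corresponds to 2^(10k).
PREFIXES = ["kibi", "mebi", "gibi", "tebi", "pebi", "exbi", "zebi", "yobi"]


def binary_excess_percent(k):
    """Percent by which 2^(10k) exceeds 10^(3k)."""
    return (2 ** (10 * k) / 10 ** (3 * k) - 1) * 100


for k, name in enumerate(PREFIXES, start=1):
    excess = binary_excess_percent(k)
    print(f"{name}: 2^{10 * k} is {excess:.1f}% larger than 10^{3 * k}")
```

For k = 4 the excess comes out just under 10%, which matches a tebibyte being almost 1.1 trillion bytes.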

Another advantage to the new prefixes is that “kibi, mebi, gibi” is reminiscent of “veni, vidi, vici.”

(Why does the quality of the approximation (2^10)^k ≈ 10^(3k) degrade by about 2.5% for each increment in k? Because 2^10/10^3 = 1.024 and, by the binomial theorem, (1 + 0.024)^k ≈ 1 + 0.024 k.)
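A small Python sketch can show how well the linear estimate tracks the exact ratio. It compares 1.024^k with 1 + 0.024 k for the eight prefixes; the function names are only illustrative.

```python
# Compare the exact ratio (2^10 / 10^3)^k = 1.024^k with the
# first-order binomial estimate 1 + 0.024 k.
def exact_ratio(k):
    return 1.024 ** k


def linear_estimate(k):
    return 1 + 0.024 * k


for k in range(1, 9):
    gap = exact_ratio(k) - linear_estimate(k)
    print(f"k={k}: exact {exact_ratio(k):.4f}, "
          f"linear {linear_estimate(k):.4f}, gap {gap:.4f}")
```

The gap grows with k because the estimate drops the higher-order binomial terms, but even at k = 8 (yobi) it is still under 0.02.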

Con

The only thing special about decimal powers of 2 (2^10, 2^20, 2^30, etc.) is that they are approximately round numbers in base 10 that have traditional prefixes associated with them. From a computer hardware perspective, numbers like 2^16, 2^32, and 2^64 are more natural. If you’re not going to use the traditional prefixes, abandon them rather than create near copies. Just state the number of bytes, using scientific notation if necessary.

PowerShell

In Windows PowerShell, the units kb, mb, and gb are used to mean 2^10, 2^20, and 2^30. For example, if you type 2mb + 5kb at the PowerShell command prompt, the result is 2102272. If you really want to compute 2×10^6 + 5×10^3 and can’t do it in your head, you could type 2e6 + 5e3.
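The same arithmetic can be cross-checked outside PowerShell. This Python sketch defines constants `KB`, `MB`, and `GB` as powers of 2; the names are chosen here to mirror PowerShell's suffixes and are not part of any Python library.

```python
# Binary multipliers, named after the PowerShell suffixes they mirror.
KB = 2 ** 10
MB = 2 ** 20
GB = 2 ** 30

binary_sum = 2 * MB + 5 * KB    # PowerShell: 2mb + 5kb
larger_sum = 2 * MB + 5 * GB    # PowerShell: 2mb + 5gb, for comparison
decimal_sum = int(2e6 + 5e3)    # the powers-of-10 version

print(binary_sum)   # 2102272
print(larger_sum)   # 5370806272
print(decimal_sum)  # 2005000
```

The result 2102272 corresponds to 2mb + 5kb. The binary and decimal sums differ by nearly 5%, in line with the mebibyte figure given earlier.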

Bruce Payette commented on this feature of PowerShell in his book PowerShell in Action.

Yes, the PowerShell team is aware that these notations are not consistent with the IEC recommendations (kibabyte, and so on). Since the point of this notation is convenience and most people in the IT space are more comfortable with Kb than with Ki, we choose to err on the side of comfort over conformance in this one case. Sorry. This particular issue generated easily the second most heated debate on the PowerShell internal and external beta tester lists.

17 thoughts on “Kibi, mebi, gibi”

  1. (1) I’ve repeatedly tried to publish research papers with the recommended IEC notations. But journals and conferences in Computer Science won’t allow it. There is intense resistance. Which is weird because I can prove that there is confusion and that it impacts reported numbers (in research papers).

    (2) It is wrong to say that IT people all agree on what the units mean. Bought a hard drive lately? Here’s what my Mac reports about my hard drive:

    2 TB (2,000,398,934,016 bytes)

    It is not at all uncommon for research papers to use GB to mean two different things.

    We ought to be ashamed of this mess. But instead, we refuse to put into question our messy habits.

    Sad.

  2. Because of the confusion, I believe it’s better to simply use bytes. People reading a scientific journal should be able to understand scientific notation.

    As for hard drives, although there is some confusion over terabytes and tebibytes, there’s also confusion over how much of that disk space is usable after the file system is installed. Because of that, I don’t think it would be misleading to say the drive is simply 2 TB. That’s close enough for most consumers. But a more technical customer would want to know exactly how many bytes the disk will hold, before and after it is formatted.

  3. The 2^x prefixes are established practice. Maybe not in “IT”, but among programmers. Byte and bit refer to imaginary counting units. Treating kilo and mega as SI prefixes for them makes no sense, because bytes aren’t physical measurement units. Otherwise the usage would imply that besides Kilobytes you could have Millibytes, Microbytes, Nanobytes and Attobytes. Which would be just plain silly compared with real physical units like kJ, cm, µs, mg, where ALL scientific prefixes are applicable.
    Hence it’s clear that the kilo in KB doesn’t have to be a physical scaling prefix.

  4. @mario

    Precisely. The term “kilo” in software means something different from everywhere else. Thus, *we* should be using a different term to avoid confusion, and that’s “kibi”.

  5. In my old days (late 80s) in the BIOS department at Dell, we referred to 1,000,000 bytes as a “marketing megabyte”.

  6. Another factor not mentioned here is that the IEC units sound incredibly stupid. Who would not be embarrassed to say their laptop has “four gibs” of memory?

  7. Giles: Since the »gi« is short for »giga binary« the phrase »four gigs« should still be applicable.

    Interestingly for me it has become normal to use KiB, MiB, GiB, etc. I have to actually force myself to write KB if I actually mean 1000 Bytes (interesting here to note also that the common abbreviation is »KB«, not »kB« which would be the correct usage of the prefix – so another point where it’s obvious we’re not in SI land anymore). Similarly for dates in non-ISO-8601 format.

  8. The only reason any problem ever arose here is because the hard drive manufacturers decided to warp the meaning of megabyte, etc., to make their drives look bigger to the average consumer. They knew exactly what kilo, mega, and giga meant (exact powers of two), but they chose to use deceptive marketing practices that they knew they could get away with if they ever got taken to court over it. By forcing new prefixes down our throats, the IEC is giving in to these kinds of practices and saying, “That’s OK. We’ll just change on our end to accommodate your dishonest ways.” Do not support these new prefixes because it actually supports shifty salesmanship! What we really need is a tiny law that defines kilobyte, megabyte, etc., as perfect powers of two. Then the jerks who distorted the meanings of those prefixes for personal profit will snap right back into line.

  9. Really, the only thing that still has good reason to be measured with binary prefixes is RAM size – because of the nature of how RAM works, you’re never going to get 1GB of RAM, and it’s easier to say “1GiB of RAM” than “1.07GB of RAM”. I think flash memory is similar, but not certain.

    Hard drive sizes aren’t restricted to powers of 2, and are currently sold in decimal. Bandwidth in Mb/s isn’t restricted to powers of 2, and has always been in decimal. Clock speed in GHz isn’t restricted to powers of 2, and has always been in decimal. Image size in megapixels isn’t restricted to powers of 2, and has always been in decimal.

    Really, the only point of contention is file sizes, and most OSes report file sizes in binary units, I think mostly out of tradition. Many OSes will also report hard drive sizes in binary units, again out of tradition. It’s a tradition that causes much confusion (the number of times I’ve seen someone be all “I bought this 2TB hard drive, but when I plugged it in it said it only had 1.8TB of space! What’s using the rest? It’s empty!”… and then people start thinking it’s filesystem overhead or something, but if your filesystem takes up 200GB then it’s time to switch to a different filesystem. Of course all it is is that 2TB = 1.8TiB).

    Personally, in code I write, I’ll still report file sizes in binary units, but I’ll include the little ‘i’s. Because if you don’t know what they mean, then you can still guess what the unit means… and if you do know what they mean, then the potential ambiguity is cleared up.

  10. @Eric

    I cannot agree with this. The prefixes kilo, mega and giga had well defined meaning well before we started manufacturing disk drives.

    And why would hard drive makers be any more “shifty” than memory makers? Are we to think that hard drive makers are especially evil?

  11. What I’m saying is that hard drive manufacturers made an “evil” decision in the 80s to exploit the dual meaning of mega/giga to their advantage, knowing very well that it would cause confusion. If you consider that sector sizes on every hard drive in existence are powers of two (minimum 512 bytes back then), then you’ll realize that it’s natural to continue using power-of-two-based measurements for the capacities of the drives because it would always reflect an integer number of sectors. I can’t say with certainty, but I’d bet my life that the engineers at the hard drive companies would have preferred that. The marketing creeps and bean counters are to blame for intentionally making consumers think they’re getting a bigger drive than they are, and we’re letting them get away with it by creating new units.

  12. Eric: Hard drive sizes are in no way a power of 2.

    Current hard drives (indeed, any hard drives 128GiB or larger that use LBA) typically have 512-byte sectors, 63 sectors per (virtual) track, and 255 (virtual) heads, making 512*63*255 = 8225280-byte cylinders, which a disk has an integer number of. You will notice that this number is not a power of 2, what with those factors of 63 and 255 in there. It’s not even an integer in KiB – it’s 8032.5KiB. About 7.8MiB. Using binary prefixes to refer to this size is just as awkward as decimal ones (indeed more so, for MiB vs MB, as you need more digits after the decimal point to specify the number exactly).

    And hard drives can be any integer multiple of that, they’re not forced to a power of 2. For instance, the hard drive I’m looking at now has 121601 cylinders. Seems a random number until you realise it means it’s slightly over 1TB of space (note: not 1TiB).

  13. And given that this supposedly “evil” decision is the same decision that brought forth the “1.44MB” diskette, which is, in itself, evidence of heavy confusion… I’m extremely tempted to chalk this one up to a misunderstanding that stuck rather than any sort of evil purpose.

    (If you’re unfamiliar… a 1.44MB diskette stores 1.44*1000*1024 bytes of data.)

  14. Philip, I never said that hard drive sizes were powers of two, just that they were an integer multiple of at least 2^9 bytes.

  15. I’m glad we now have a standard way to eliminate the ambiguity. k, M, G, T for 1000^n and Ki, Mi, Gi, Ti for 1024^n. But I agree that pronouncing them kibi, mebi, gibi, tebi makes one sound like Johnny English after he accidentally injects himself with muscle relaxant, “kibibyte flibble”. The solution is to pronounce them kili, megi, gigi, teri. But make sure that final “i” sounds like “ee” as in “bee”, otherwise the tendency to shorten unstressed vowels ends up making them sound too much like their decimal counterparts.

  16. My SSD drive is advertised as 512 GB, which is inaccurate no matter how you look at it.

  17. Mario writes: « Treating kilo and mega as SI prefixes for them makes no sense, because bytes aren’t physical measurement units. Otherwise the usage would imply that besides Kilobytes you could have Millibytes, Microbytes, Nanobytes and Attobytes. »

    Units like Millibytes certainly make sense to me, and I have used them before. For example, a ray-triangle-mesh intersection algorithm I developed has a memory cost of half a bit per vertex. Some fantastic compression scheme could have a resulting stream of 322 Millibytes per byte. Sure, you could express this in multiple ways, but there’s certainly nothing wrong with expressing relationships in partials of units.
