
From Kilo to Bronto

A Brief History of Big Data Prefixes


Once upon a time, Big Data was measured in Kilos, which are units of 1,000 (or 10³).

But Big Data has got a lot bigger since then, and the International Bureau of Weights and Measures has had to invent new (so-called SI) prefixes to keep up.

My MacBook has more than a 'terabyte' of storage. What, I wondered, does 'tera' mean, and what about all the other prefix words? Most of them come from Ancient Greek, but with some interesting twists. Here's a quick run-through of their history.

10³ KILO - From chilioi, meaning 'thousand'. It was introduced by the French in 1799, when the kilogram became the first official metric measure of mass (in some Victorian philosophy and science books it was referred to as the 'chiliogram'). In computing, a kilobyte has traditionally meant 1,024 (or 2¹⁰) bytes, but that is close enough to 1,000 that the kilo prefix is still used.
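To see just how 'close enough' the binary kilobyte is, here is a small Python sketch comparing the decimal (SI) value of each prefix, 1000ⁿ, with the binary value traditionally used for computer memory, 1024ⁿ. The names PREFIXES, decimal_value and binary_value are purely illustrative, not from any standard library.

```python
# Compare the decimal (SI) and binary readings of the storage prefixes.
# All names here are hypothetical, chosen only for this illustration.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def decimal_value(n: int) -> int:
    """Value of the n-th prefix in the SI sense: 1000 to the power n."""
    return 1000 ** n


def binary_value(n: int) -> int:
    """Value of the n-th prefix in the traditional computing sense: 1024 to the power n."""
    return 1024 ** n


for n, name in enumerate(PREFIXES, start=1):
    # Relative amount by which the binary reading exceeds the decimal one.
    gap = binary_value(n) / decimal_value(n) - 1
    print(f"{name:>6}: 1024^{n} is {gap:.1%} larger than 1000^{n}")
```

Run as-is, it shows the gap growing from 2.4% at kilo to about 10% at tera and nearly 21% at yotta, so 'close enough' holds up less well as the prefixes get bigger.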

10⁶ MEGA - From megas, meaning 'big' or 'tall'. Its first widespread use as a prefix may have been in 1821 to name the Megalosaurus, a giant lizard whose fossilised bones had been found near Oxford. Mega was introduced as a scientific prefix in 1868 (there are mentions of megohms of resistance), though it didn't become an official SI prefix until 1960. These days it is most frequently encountered in megabytes (big computer images) and megahertz (on FM radio).

10⁹ GIGA - From gigas, meaning 'giant'. Giga is said to have been first proposed as a prefix in Germany in the 1920s. It became an official SI prefix in 1960, but opinions differed on how to pronounce it. In the 1985 film Back to the Future, Marty asks the eccentric scientist Doc Brown: 'What the hell is a jigga-watt?' The pronunciation of giga with a hard G became standard in the 1990s.

10¹² TERA - From teras, meaning 'monster'. There is no connection to ptero (as in pterosaur), which means 'winged'. Tera was introduced as a prefix in 1960, along with mega and giga. Coincidentally, this is the fourth prefix (it is 1000⁴), and tera is very similar to tetra, the Greek word for four. This coincidence led to...

10¹⁵ PETA - In the 1970s, as scientists and computer technologists started dealing with measurements at ever higher orders of magnitude, the International Bureau needed prefixes that could cope with the bigger numbers. They opted for peta because it is like penta (five), but with a dropped letter (in the spirit of tera and tetra). There are allegedly two petabytes of data stored in all the US research libraries combined.

10¹⁸ EXA - While they were at it, in 1975 the authorities also introduced exa, which is hexa (six) without the H.

10²¹ ZETTA - In 1982, New Scientist published an article about the likely need for a higher, seventh prefix. I now make a fleeting appearance in the story. I submitted a letter suggesting they continue the pattern of dropping a consonant, turning hepta into hepa, pronounced 'heepa', as in 'That's a Hepajoules!' (geddit?). Not surprisingly, my suggestion was ignored, and in 1991 the authorities decided to make the prefix for 1000⁷ zetta, derived from the letter zeta, which the Greeks used for the number seven. (Confusingly, zeta is the sixth letter of the Greek alphabet.) Incidentally, the Latin word for seven is septem, which is why September used to be the seventh month, before January and February were moved to the start of the Roman year.

10²⁴ YOTTA - At the same time as zetta, the international authorities introduced yotta, which is loosely based on 'octo', meaning eight (and is not named after a Star Wars character, as is sometimes claimed). Presumably they didn't want a prefix starting with 'O', though slightly confusingly this means that the top three official SI prefixes in ascending order are (e)X, Z, Y.
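The thread running through this list is that each prefix is the next power of 1000: tera is 1000⁴, peta 1000⁵, and so on up to yotta at 1000⁸. As a rough illustration of that pattern in use, here is a minimal Python sketch that turns a raw byte count into a human-readable figure. It assumes decimal (SI) prefixes and stops at yotta, the largest prefix discussed in this article; the function name humanize is hypothetical.

```python
# Hypothetical helper: express a quantity using the decimal prefixes
# covered in this article, from kilo (1000**1) up to yotta (1000**8).
SYMBOLS = ["", "k", "M", "G", "T", "P", "E", "Z", "Y"]


def humanize(value: float, unit: str = "B") -> str:
    """Scale value by powers of 1000 and attach the matching prefix symbol."""
    index = 0
    # Keep dividing by 1000 until the number drops below 1000
    # or we run out of prefixes (yotta is the last one in SYMBOLS).
    while abs(value) >= 1000 and index < len(SYMBOLS) - 1:
        value /= 1000
        index += 1
    return f"{value:.1f} {SYMBOLS[index]}{unit}"


print(humanize(1_500))       # 1.5 kB
print(humanize(2 * 10**15))  # 2.0 PB
print(humanize(10**27))      # 1000.0 YB
```

Because the loop stops at the last symbol, anything of 10²⁷ or more is still reported in yottabytes, which is exactly the gap a new prefix would fill.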

And that's (officially) it for the time being. But as the amount of data being stored grows, and as scientists contemplate ever larger quantities of mass and power, there will eventually be a need for another SI prefix. There are already two candidates for 10²⁷. One is BRONTO, in the spirit of the giant dinosaur Brontosaurus. In fact it has already been adopted in some IT circles, where you'll find references to brontobytes. However, there is also a strong lobby for HELLA, which looks like it might eventually get the nod. What's the thinking behind hella? Apparently it's because it's a 'helluva big quantity'. Seriously! Suddenly my 1980s proposal of HEPA for the seventh prefix doesn't sound so silly.