And yet, why Posit are a worthy alternative to the IEEE 754

Month Posit declared habré open, so I can not pass by and ignore the criticism that had befallen them. In the previous series:

A new approach can help us to get rid of the floating-point
Posit-arithmetic: floating point victory over his own field. Part 1
Posit-arithmetic: floating point victory over his own field. Part 2
Test Posit adult

I think many of you can immediately remember at least one case from history when revolutionary ideas at the time of its formation met with opposition by a community of experts. As a rule, the reason for this behavior serves an extensive baggage of knowledge does not allow to look at an old problem in a new light. Thus, the new idea loses the characteristics of well-established approaches, because it is estimated to only those metrics that were considered important at the previous stage of development.

It is with this rejection today is facing format Posit: critics often simply “not look back“ and even banal misuse Posit in their experiments. In this article I will try to explain why.

On the merits of the Posit has been said is not enough: mathematical elegance, high precision value with a low exponent, a wide range of values, only one binary representation of NaN and zero, no subnormal values, the fight against overflow/underflow. There was little expressed criticism: poor accuracy for very large or very small values, the complex format of the binary representation and of course the lack of hardware support.

I do not want to repeat already mentioned arguments, instead try to focus on that aspect which is usually overlooked.

The game has changed

The IEEE 754 standard describes the floating-point number, implemented in the Intel 8087 almost 40 years ago. By the standards of our industry this is an incredible time; since then almost everything has changed: the performance of processors, storage cost, data volumes and scale computing. The format of the Posit was developed not just as the best version of IEEE 754, but as an approach to working with the numbers that meet the new requirements of the time.

The high-level task remains the same – we all need efficient computing over the rationals with minimal loss of precision. But the conditions, which solves the task is radically changed.

First, the priorities have changed for optimization. 40 years ago the computers performance is almost entirely dependent on processor performance. Today, the performance of most computing rests in memory. To verify this, just look at the key directions of development of processors in recent decades: three-level caching, speculative execution, pipelining of calculations, predicting branches. All these approaches aimed at achieving high performance in terms of fast computation and slow memory access.


Secondly, on the foreground there was a new requirement – effective energy consumption. Over the past decade, technology scale-out computing has advanced so much that we started to worry not so much the speed of these calculations as the electricity bill. Here I should emphasize the important to understand the detail. From the point of view of energy efficiency calculations are cheap, because the CPU registers are very close to his computers. Much more will have to pay for the data transfer between processor and memory, and over long distances.


Here is just one example of the scientific project, which plans to use Posit:


This network of telescopes generates 200 petabytes of data per second, the processing of which takes the power of a small power plant 10 MW. It is obvious that for such projects, reducing data volumes and energy consumption is critical.

In the beginning

So what does a standard Posit? To understand this, you need to go back to the beginning of the discourse and understand what is meant by the precision of floating point numbers.

There are actually two different aspects relevant to the accuracy. The first aspect is the accuracy of the calculations – how much rejected the results of a computation during the execution of different operations. The second aspect is the accuracy of representation – of how badly distorted the original value at the time of conversion from the field of rational numbers field of the floating point specific format.

Now it will be important for understanding the time. Posit is in the first format representing rational numbers, and not a way to perform operations on them. In other words, Posit is the compression format of rational numbers with losses. You could hear the assertion that 32-bit Posit is a good alternative for 64-bit Float. So, we are talking about reducing the required amount of data two times for storage and transmission of the same set of numbers. Two times less memory is almost 2 times less power consumption and high CPU performance due to lower expectations of access to memory.

The second end of the stick

Here you have had to arise a natural question: what is the meaning of effective representation of rational numbers, if it is not possible to calculate with high accuracy.

In fact, a way of performing accurate calculations is, and it is called the Quire. This is another format of representing rational numbers, is inextricably linked to the Posit. Unlike Posit, the format of Quire intended for calculations and for storing intermediate values in registers instead of in main memory.


If in short, Quire represent not that other, as a wide battery of integer (fixed point arithmetic). Unit as the binary representation corresponds to a Quire a minimum positive value Posit. Quire the maximum value corresponds to the maximum value of the Posit. Each value of the Posit has a unique representation in the Quire without loss of accuracy, but not Quire each value can be represented in a Posit without loss of accuracy.

The benefits Quire obvious. They allow you to perform operations with much more high precision than Float, and for operations of addition and multiplication of any loss of precision will not do. The price you have to pay for it – wide CPU registers (32-bit Posit with es=2 correspond to a 512-bit Quire), but for modern processors it is not a serious problem. And if 40 years ago a calculation on a 512 bit integers seemed an unacceptable luxury, today it is more adequate alternative to the wide memory access.

Collect the puzzle

Thus, a Posit offered not just a new standard in the alternative Float/Double, but rather a new approach for working with numbers. The novelty of the approach that uses two different representation formats, one for storage and transmission of numbers – Posit, and the other for computations and intermediate values – Quire.

When you solved a practical problem with the use of floating point numbers, from the point of view of the CPU work they can be represented in the form of a set of the following:

  1. To read the value of numbers from memory.
  2. To perform some sequence of operations. Sometimes the number of operations is large enough. Thus, all intermediate values of the computation are stored in registers.
  3. Record the results of operations in memory.

In the case of Float/Double precision is lost on each transaction. In the case of Posit+Quire the loss of precision during computations is negligible. It is lost only at the last stage, at the moment of value conversion Quire in Posit. That’s why most of the problems of “accumulation of error” for Posit+Quire simply not relevant.

In contrast to Float/Double when you use Posit+Quire we typically can afford a more compact representation of numbers. As a result, more rapid data access from memory (better performance) and more efficient storage and transmission of information.

The Ratio Of Muller

To demonstrate, I will cite only one example – a classical recurrence relation Mueller, invented specifically in order to demonstrate how the accumulation of errors when floating-point is radically distorts the result of the calculation.


In the case of arithmetic with arbitrary precision, recursive sequence should be limited to the value 5. In the case of floating point arithmetic the only question is which iteration of the calculation results will have adequately large deviation.

I conducted an experiment for IEEE 754 single and double precision, and 32-bit Posit+Quire. The calculations were performed in the arithmetic Quire, but each value in the table is converted into a Posit.

The results of the experiment

 # float(32) double(64) posit(32)
------------------------------------------------ 0 4 4.000000 4.000000 1 4.250000 4.250000 4.25 2 4.470589 4.470588 4.470588237047195 3 4.644745 4.644737 4.644736856222153 4 4.770706 4.770538 4.770538240671158 5 4.859215 4.855701 4.855700701475143 6 4.983124 4.910847 4.91084748506546 7 6.395432 4.945537 4.94553741812706 8 27.632629 4.966962 4.966962575912476 9 86.993759 4.980042 4.980045706033707
10 99.255508 4.987909 4.98797944188118
11 99.962585 4.991363 4.992770284414291
12 99.998131 4.967455 4.99565589427948
13 99.999908 4.429690 4.997391253709793
14 100.000000 -7.817237 4.998433947563171
15 100.000000 168.939168 4.9990600645542145
16 100.000000 102.039963 4.999435931444168
17 100.000000 100.099948 4.999661535024643
18 100.000000 100.004992 4.999796897172928
19 100.000000 100.000250 4.999878138303757
20 100.000000 100.000012 4.999926865100861
21 100.000000 100.000001 4.999956130981445
22 100.000000 100.000000 4.999973684549332
23 100.000000 100.000000 4.9999842047691345
24 100.000000 100.000000 4.999990522861481
25 100.000000 100.000000 4.999994307756424
26 100.000000 100.000000 4.999996602535248
27 100.000000 100.000000 4.999997943639755
100.000000 100.000000 28 4.999998778104782
100.000000 100.000000 29 4.99999925494194
100.000000 100.000000 30 4.999999552965164
100.000000 100.000000 31 4.9999997317790985
32 100.000000 100.000000 4.999999850988388
33 100.000000 100.000000 4.999999910593033
34 100.000000 100.000000 4.999999940395355
35 100.000000 100.000000 4.999999970197678
36 100.000000 100.000000 4.999999970197678
37 100.000000 100.000000 5
38 100.000000 100.000000 5
39 100.000000 100.000000 5
40 100.000000 100.000000 5
41 100.000000 100.000000 5
42 5 100.000000 100.000000
43 100.000000 100.000000 5
44 100.000000 100.000000 5
45 100.000000 100.000000 5
46 100.000000 100.000000 5
47 5 100.000000 100.000000
48 100.000000 100.000000 5
49 100.000000 100.000000 5
50 100.000000 100.000000 5
51 100.000000 100.000000 5
52 100.000000 100.000000 5.000000059604645
53 100.000000 100.000000 5.000000983476639
54 100.000000 100.000000 5.000019758939743
55 100.000000 100.000000 5.000394910573959
56 100.000000 100.000000 5.007897764444351
57 100.000000 100.000000 5.157705932855606
58 100.000000 100.000000 8.057676136493683
59 100.000000 100.000000 42.94736957550049
60 100.000000 100.000000 93.35784339904785
61 100.000000 100.000000 99.64426326751709
62 100.000000 100.000000 99.98215007781982
63 100.000000 100.000000 99.99910736083984
64 100.000000 100.000000 99.99995517730713
65 100.000000 100.000000 99.99999809265137
66 100.000000 100.000000 100
67 100.000000 100.000000 100
68 100.000000 100.000000 100
69 100.000000 100.000000 100
70 100.000000 100.000000 100

As the table shows, 32-bit Float give up after the seventh value, and the 64-bit Float lasted up to 14 iterations. At the same time, the calculations for Posit with the use of the Quire return a stable result up to 58 iterations!


For many practical cases and, when applied correctly Posit the format really allows one hand to save on memory, compressing the number of better than it does Float, on the other hand to ensure the best calculation accuracy due to the application of Quire.

But this is only a theory! When it comes to accuracy and performance, always do tests before you blindly trust a particular approach. Indeed, in practice, your specific case will be exceptional more often than in theory.

Oh and don’t forget the first law of Clark (free interpretation): When a respected and experienced expert claims that a new idea will work, he is almost certainly right. When he says that the new idea will not work — he is very probably wrong. I don’t consider myself a seasoned expert to allow you to rely on my opinion, but I ask you to wary of criticism even the most experienced and respected people. After all, the devil is in the details, and even experienced people can miss.


Leave a Reply

Your email address will not be published. Required fields are marked *