I wrote a variation of the double-to-string algorithm Ryu and I'm struggling with coverage tests

4 days ago 3
ARTICLE AD BOX

I wrote a variation of Ryu (specifically this double-to-string algorithm) for my use case:

made it double-to-OutputStream (so I do not have to call String.getBytes(), saving me to use String like and intermediate buffer)

removed the Double.toString compliance but kept the Double.parseDouble compatibility. This to ensure that what I'm writing to my OutputStream can be actually read back easily and without getting a different double value (I actually also tested that the Jackson library can read it back and it works, I think).

removed trailing zeroes handling (I'm okay with printing some more digits instead of adding more unpredictable ifs)

assumed RoundingMode to always be ROUND_EVEN

removed debug logs

only scientific notation supported (except when the exponent would be 0 and of course for NaN and +/-Infinity)

added a thread local scratch buffer

put the final keyword here and there

changed the function entry point. In the original code was public static String doubleToString(double value) in mine it is public static void doubleToStream(final double value, final OutputStream out)

my function handles the base cases (0, Infinity and NaN) before calling private static int fillBuffer(final long bits, final byte[] result), which is like starting in the original Ryu algorithm from this line.

Here is the code:

import java.io.IOException; import java.io.OutputStream; import java.math.BigInteger; public final class RyuDoubleIsh2 { private RyuDoubleIsh2() { } private static final int DOUBLE_MANTISSA_BITS = 52; private static final long DOUBLE_MANTISSA_MASK = (1L << DOUBLE_MANTISSA_BITS) - 1; private static final int DOUBLE_EXPONENT_BITS = 11; private static final int DOUBLE_EXPONENT_MASK = (1 << DOUBLE_EXPONENT_BITS) - 1; private static final int DOUBLE_EXPONENT_BIAS = (1 << (DOUBLE_EXPONENT_BITS - 1)) - 1; private static final int POS_TABLE_SIZE = 326; private static final int NEG_TABLE_SIZE = 291; private static final int POW5_BITCOUNT = 121; // max 3*31 = 124 private static final int POW5_QUARTER_BITCOUNT = 31; private static final long[][] POW5_SPLIT = new long[POS_TABLE_SIZE][4]; private static final int POW5_INV_BITCOUNT = 122; // max 3*31 = 124 private static final int POW5_INV_QUARTER_BITCOUNT = 31; private static final long[][] POW5_INV_SPLIT = new long[NEG_TABLE_SIZE][4]; private static final byte[] NAN = {'N', 'a', 'N'}; private static final byte[] POSITIVE_INFINITY = {'I', 'n', 'f', 'i', 'n', 'i', 't', 'y'}; private static final byte[] NEGATIVE_INFINITY = {'-', 'I', 'n', 'f', 'i', 'n', 'i', 't', 'y'}; private static final byte[] POSITIVE_ZERO = {'0', '.', '0'}; private static final byte[] NEGATIVE_ZERO = {'-', '0', '.', '0'}; private static final ThreadLocal<byte[]> TL_BUFFER = ThreadLocal.withInitial(() -> new byte[32]); static { final BigInteger mask = BigInteger.valueOf(1).shiftLeft(POW5_QUARTER_BITCOUNT).subtract(BigInteger.ONE); final BigInteger invMask = BigInteger.valueOf(1).shiftLeft(POW5_INV_QUARTER_BITCOUNT).subtract(BigInteger.ONE); for (int index = 0; index < POS_TABLE_SIZE; index++) { final BigInteger pow = BigInteger.valueOf(5).pow(index); final int pow5len = pow.bitLength(); for (int j = 0; j < 4; j++) { POW5_SPLIT[index][j] = pow .shiftRight(pow5len - POW5_BITCOUNT + (3 - j) * POW5_QUARTER_BITCOUNT) .and(mask) .intValueExact(); } if (index < NEG_TABLE_SIZE) { // We want floor(log_2 5^q) here, which is "pow5len - 1" final int j = pow5len - 1 + POW5_INV_BITCOUNT; final BigInteger inv = BigInteger.ONE.shiftLeft(j).divide(pow).add(BigInteger.ONE); for (int k = 0; k < 4; k++) { if (k == 0) { POW5_INV_SPLIT[index][k] = inv.shiftRight((3 - k) * POW5_INV_QUARTER_BITCOUNT).intValueExact(); } else { POW5_INV_SPLIT[index][k] = inv.shiftRight((3 - k) * POW5_INV_QUARTER_BITCOUNT).and(invMask).intValueExact(); } } } } } public static void doubleToStream(final double value, final OutputStream out) throws IOException { final byte[] toWrite; final int length; if (Double.isNaN(value)) { toWrite = NAN; length = NAN.length; } else if (value == Double.POSITIVE_INFINITY) { toWrite = POSITIVE_INFINITY; length = POSITIVE_INFINITY.length; } else if (value == Double.NEGATIVE_INFINITY) { toWrite = NEGATIVE_INFINITY; length = NEGATIVE_INFINITY.length; } else { final long bits = Double.doubleToRawLongBits(value); if (bits == 0x0000000000000000L) { toWrite = POSITIVE_ZERO; length = POSITIVE_ZERO.length; } else if (bits == 0x8000000000000000L) { toWrite = NEGATIVE_ZERO; length = NEGATIVE_ZERO.length; } else { toWrite = TL_BUFFER.get(); length = fillBuffer(bits, toWrite); } } out.write(toWrite, 0, length); } public static int fillBuffer(final long bits, final byte[] result) { // Otherwise extract the mantissa and exponent bits and run the full algorithm. final int ieeeExponent = (int) ((bits >>> DOUBLE_MANTISSA_BITS) & DOUBLE_EXPONENT_MASK); final long ieeeMantissa = bits & DOUBLE_MANTISSA_MASK; final int e2; final long m2; if (ieeeExponent == 0) { // Denormal number - no implicit leading 1, and the exponent is 1, not 0. e2 = 1 - DOUBLE_EXPONENT_BIAS - DOUBLE_MANTISSA_BITS - 2; m2 = ieeeMantissa; } else { // Add implicit leading 1. e2 = ieeeExponent - DOUBLE_EXPONENT_BIAS - DOUBLE_MANTISSA_BITS - 2; m2 = ieeeMantissa | (1L << DOUBLE_MANTISSA_BITS); } final boolean sign = bits < 0; // Step 2: Determine the interval of legal decimal representations. final boolean even = (m2 & 1) == 0; final long mv = m2 << 2; final long mp = mv + 2; final int mmShift = ((m2 != (1L << DOUBLE_MANTISSA_BITS)) || (ieeeExponent == 1)) ? 1 : 0; final long mm = mv - 1 - mmShift; // Step 3: Convert to a decimal power base using 128-bit arithmetic. // -1077 = 1 - 1023 - 53 - 2 <= e_2 - 2 <= 2046 - 1023 - 53 - 2 = 968 long dv, dp, dm; final int e10; if (e2 >= 0) { final int q = Math.max(0, ((e2 * 78913) >>> 18) - 1); // k = constant + floor(log_2(5^q)) final int k = POW5_INV_BITCOUNT + pow5bits(q) - 1; final int i = -e2 + q + k; dv = mulPow5InvDivPow2(mv, q, i); dp = mulPow5InvDivPow2(mp, q, i); dm = mulPow5InvDivPow2(mm, q, i); e10 = q; if (q <= 21 && mv % 5 != 0 && !even && pow5Factor(mp) >= q) { dp--; } } else { final int q = Math.max(0, ((-e2 * 732923) >>> 20) - 1); final int i = -e2 - q; final int k = pow5bits(i) - POW5_BITCOUNT; final int j = q - k; dv = mulPow5divPow2(mv, i, j); dp = mulPow5divPow2(mp, i, j); dm = mulPow5divPow2(mm, i, j); e10 = q + e2; if (q <= 1 && !even) { dp--; } } // Step 4 final int vplength = decimalLength(dp); int exp = e10 + vplength - 1; int removed = 0; int lastRemovedDigit = 0; long output; while (dp / 10 > dm / 10) { lastRemovedDigit = (int) (dv % 10); dp /= 10; dv /= 10; dm /= 10; removed++; } output = dv + ((dv == dm || (lastRemovedDigit >= 5)) ? 1 : 0); final int olength = vplength - removed; // Step 5: Print the scientific notation int index = 0; if (sign) { result[index++] = '-'; } // Print in the format x.xxxxxE-yy. for (int i = 0; i < olength - 1; i++) { int c = (int) (output % 10); output /= 10; result[index + olength - i] = (byte) ('0' + c); } result[index] = (byte) ('0' + output % 10); result[index + 1] = '.'; index += olength + 1; if (olength == 1) { result[index++] = '0'; } // Print 'E', the exponent sign, and the exponent, which has at most three digits. if (exp != 0) { result[index++] = 'E'; if (exp < 0) { result[index++] = '-'; exp = -exp; } if (exp >= 100) { result[index++] = (byte) ('0' + exp / 100); exp %= 100; result[index++] = (byte) ('0' + exp / 10); } else if (exp >= 10) { result[index++] = (byte) ('0' + exp / 10); } result[index++] = (byte) ('0' + exp % 10); } return index; } private static int pow5bits(final int e) { return ((e * 1217359) >>> 19) + 1; } // JIT somehow optimize this, and it's faster than performing log magic private static int decimalLength(final long v) { if (v >= 1000000000000000000L) return 19; if (v >= 100000000000000000L) return 18; if (v >= 10000000000000000L) return 17; if (v >= 1000000000000000L) return 16; if (v >= 100000000000000L) return 15; if (v >= 10000000000000L) return 14; if (v >= 1000000000000L) return 13; if (v >= 100000000000L) return 12; if (v >= 10000000000L) return 11; if (v >= 1000000000L) return 10; if (v >= 100000000L) return 9; if (v >= 10000000L) return 8; if (v >= 1000000L) return 7; if (v >= 100000L) return 6; if (v >= 10000L) return 5; if (v >= 1000L) return 4; if (v >= 100L) return 3; if (v >= 10L) return 2; return 1; } private static int pow5Factor(long value) { // We want to find the largest power of 5 that divides value. if ((value % 5) != 0) return 0; if ((value % 25) != 0) return 1; if ((value % 125) != 0) return 2; if ((value % 625) != 0) return 3; int count = 4; value /= 625; while (value > 0) { if (value % 5 != 0) { return count; } value /= 5; count++; } throw new IllegalArgumentException("" + value); } /** * Compute the high digits of m * 5^p / 10^q = m * 5^(p - q) / 2^q = m * 5^i / 2^j, with q chosen * such that m * 5^i / 2^j has sufficiently many decimal digits to represent the original floating * point number. */ private static long mulPow5divPow2(final long m, final int i, final int j) { // m has at most 55 bits. final long mHigh = m >>> 31; final long mLow = m & 0x7fffffff; final long bits13 = mHigh * POW5_SPLIT[i][0]; // 124 final long bits03 = mLow * POW5_SPLIT[i][0]; // 93 final long bits12 = mHigh * POW5_SPLIT[i][1]; // 93 final long bits02 = mLow * POW5_SPLIT[i][1]; // 62 final long bits11 = mHigh * POW5_SPLIT[i][2]; // 62 final long bits01 = mLow * POW5_SPLIT[i][2]; // 31 final long bits10 = mHigh * POW5_SPLIT[i][3]; // 31 final long bits00 = mLow * POW5_SPLIT[i][3]; // 0 final int actualShift = j - 3 * 31 - 21; return (((((( ((bits00 >>> 31) + bits01 + bits10) >>> 31) + bits02 + bits11) >>> 31) + bits03 + bits12) >>> 21) + (bits13 << 10)) >>> actualShift; } /** * Compute the high digits of m / 5^i / 2^j such that the result is accurate to at least 9 * decimal digits. i and j are already chosen appropriately. */ private static long mulPow5InvDivPow2(final long m, final int i, final int j) { // m has at most 55 bits. final long mHigh = m >>> 31; final long mLow = m & 0x7fffffff; final long bits13 = mHigh * POW5_INV_SPLIT[i][0]; final long bits03 = mLow * POW5_INV_SPLIT[i][0]; final long bits12 = mHigh * POW5_INV_SPLIT[i][1]; final long bits02 = mLow * POW5_INV_SPLIT[i][1]; final long bits11 = mHigh * POW5_INV_SPLIT[i][2]; final long bits01 = mLow * POW5_INV_SPLIT[i][2]; final long bits10 = mHigh * POW5_INV_SPLIT[i][3]; final long bits00 = mLow * POW5_INV_SPLIT[i][3]; final int actualShift = j - 3 * 31 - 21; return (((((( ((bits00 >>> 31) + bits01 + bits10) >>> 31) + bits02 + bits11) >>> 31) + bits03 + bits12) >>> 21) + (bits13 << 10)) >>> actualShift; } }

I am having real troubles finding a double value parameter of doubleToStream

that will cover the last line of the int decimalLength(final long v) function (basically it seems that it's parameter, dp right at the beginning of // Step 4 , is always >= 10)

that will make the int pow5Factor(long value) function return a value > 4 (function only called here if (q <= 21 && mv % 5 != 0 && !even && pow5Factor(mp) >= q) { and only with mp as parameter ). It seems that the internal loop will at most do only 1 iteration.

I tried walking back with math but I'm doing something wrong because I always end up with a double value that do not cover those cases. If that is actually true and those line can't be covered the algorithm can actually be improved. Please help me find those weird cases.

Read Entire Article