Fix precision loss in large integral string conversions#3405
Fix precision loss in large integral string conversions#3405fallintoplace wants to merge 4 commits into
Conversation
|
@fallintoplace Thank you for your contribution. Can we also handle and add tests for large integral-looking strings that are not plain integer literals, e.g. "9007199254740993.0", "9007199254740993e0", and f"{LongType.max}.0"? These still go through the float path today and either lose precision or overflow incorrectly. A test for "1e3" would also catch the scientific-notation regression from the previous behavior. |
|
Thanks, pushed a follow-up in d443f37 to handle the remaining integral-looking string cases without going through float, and added regressions for "9007199254740993.0", "9007199254740993e0", f"{LongType.max}.0", and "1e3". |
|
|
||
|
|
||
| def _truncate_numeric_string_to_int(value: str) -> int: | ||
| return int(Decimal(value)) |
There was a problem hiding this comment.
int(Decimal(value)) fixes the precision issue, but it also means we materialize the full integer before doing the bounds check. For example, literal("1e1000000").to(IntegerType()) eventually returns IntAboveMax, but only after constructing a very large Python int first. In my local check this took ~17s.
Can we compare the parsed Decimal against IntegerType.max/min or LongType.max/min before converting to int?
number = Decimal(self.value)
if number > IntegerType.max:
return IntAboveMax()
elif number < IntegerType.min:
return IntBelowMin()
return LongLiteral(int(number))That should preserve the precision fix while avoiding excessive CPU/memory use for obviously out-of-range values.
Summary
Fixes precision loss when converting large integral strings in two runtime paths:
StringLiteral.to(IntegerType/LongType)partition_to_py(...)for integral and time-based partition values backed by integersRoot cause
Both paths were converting through
floatbefore converting toint, which loses precision for values outside the IEEE-754 exact integer range.That caused valid 64-bit integers like
LongType.maxand9007199254740993to be corrupted.What changed
int(float(...))with exact integer parsing inpartition_to_pyStringLiteral.to(IntegerType/LongType), exact integral strings now use exact integer parsing while fractional numeric strings retain the existing truncation behaviorLongType.maxand9007199254740993Validation
uv run pytest tests/expressions/test_literals.py tests/test_conversions.pyCloses #3404.