By Alexander J. Yee
(Last updated: August 20, 2023)
Back To:
- Numberworld Home
- y-cruncher Home
General:
- Formulas + Algorithms
- Binary Splitting
- Recursion Library
- Processor-Specific Optimizations
Large Number Arithmetic:
- Number Representation
- Integer Addition
- Large Multiplication
- Inverse Square Root
- Division and Reciprocal
- Radix Conversion
Implementation (as of v0.8.2):
General Information:
- y-cruncher started off as a C99 program. Now it is requires C++17.
- Intel SSE and AVX compiler intrinsics are heavily used.
- Some inline assembly is used.
- C++ template metaprogramming is used extensively to reduce code duplication.
Libraries and Dependencies:
- WinAPI (Windows Only)
- POSIX (Linux Only)
- Cilk Plus
- Thread Building Blocks (TBB)
y-cruncher has no other non-system dependencies. No Boost. No GMP. Pretty much everything that isn't provided by C++ is built from ground up.
Furthermore, the Cilk and TBB dependencies can be trivially removed without affecting the core functionality of the program.
Compilers:
- Windows: Microsoft Visual Studio + Intel Compiler
- Linux: GCC
Other Internal Requirements:
- Little-endian addressing
- Sign-fill (arithmetic) right-shift
- IEEE double-precision floating-point
Code Organization:
y-cruncher's root source tree is (roughly) broken up into the following subdirectories. They are listed in order of build dependency.
Module | Files | Lines of Code | Open Sourced? | Description |
Public Libs | 153 | 15,217 | Yes | The public portion of the support libraries. |
Private Libs | 456 | 60,875 | No | The private portion of the support libraries. |
Dynamic Linking | 3 | 213 | No | Nothing here yet. |
Launcher | 10 | 931 | Yes | The CPU dispatcher that picks the optimal binary to run. It's the module that builds the y-cruncher(.exe) binary. |
Digit Viewer 2 | 93 | 11,626 | Yes | The new Digit Viewer. |
BBPv2 | 60 | 9,770 | No | The bundled BBP digit extraction app for Pi. |
Modules | 2,841 | 484,039 | No | Low-level arbitrary-precision arithmetic: Addition, subtraction, multiplication, single-word division, and checksum hashing. |
Objects | 84 | 17,469 | Partial | Large number objects. (BigInt, BigFloat, etc...) |
Functions | 349 | 54,065 | No | High-level mathematical code. Implementations for all functions and all the constants. |
YMP Library | 14 | 2,275 | Headers Only | A public interface to the internal large number library. |
Number Factory | 31 | 3,419 | Yes (outdated) | Research infrastructure and test app for the YMP library. |
y-cruncher | 299 | 43,170 | No | y-cruncher itself. Top-level code that includes all the UI menus. |
Experimental | 69 | 13,876 | No | Sandboxes for experimental code. |
External | 2 | 169 | No | Supporting DLLs. |
Misc. | 9 | 1,933 | No | Settings, versioning, and development sandbox. |
Total: | 4,473 | 719,047 | Software bloat anyone? |
Notes:
- The code size has grown significantly in the year leading up to v0.8.1 and v0.8.2. This is due to the ongoing rewrite of nearly all the large multiplication algorithms. As of this writing, that rewrite has not yet reached the point where the code that is being rewritten can be removed yet. Thus, it is expected that the code size will shrink in the next few releases.
- Roughly 70,000 lines of code are either generated, or significantly aided by code generation. Most of it is auto-generated superoptimizer tables.
- There is an estimated 100,000 lines of near duplicate code that are either not similar enough to de-dupe using generic programming techniques, or will incur an unacceptable performance hit to do so.
- It takes several hours and all 64GB ram on my dedicated build machine (Ryzen 7 1800x) to compile all of y-cruncher, both Windows and Linux.
Like most other programs, there are theoretical limits to y-cruncher. Most of these limits are enforced by the program.
32-bit | 64-bit | Comments | |
Ram Usage | ~1.8 GiB | ~ 1 EiB (1018 bytes) | Limited by memory address space. |
Disk Usage | ~ 1 EiB | Limited by 64-bit address space. | |
Task Decomposition | 65,536 | Arbitrary limit. | |
RAID - Level 1 | 8 paths | ||
RAID - Level 2 | 64 x Level 1 RAID groups | Limited by the # of bits in largest integer. Will likely be increased in the future. | |
Largest Multiplication | (2.02 * 1018) x (2.02 * 1018) bits (6.7 * 1017) x (6.7 * 1017) decimal digits | Small Primes Number-Theoretic Transform:
| |
Convolution Length | 4.03 * 1018 bits 1.21 * 1018 decimal digits | ||
Computation Size (for all constants) | 1015 decimal digits | Limited by double-precision floating-point.* | |
BBP Hexadecimal Offset | 246 - 1 | Implementation-specific limitation. |
*y-cruncher uses double-precision floating-point for things such as:
- Computing the number of terms needed for a computation.
- Determining how much precision is needed for an operation.
- Determining how large the numbers will be so that enough memory will be allocated.
- Finding the optimal work-partitioning for various algorithms.
The result of these calculations are generally rounded to integers and must be accurate to +/- 1 for the program to operate correctly. The problem is that double-precision floating-point only has 53 bits of precision which will run out at around 9 * 1015. Since there is round-off error, the limit will certainly be lower. The exact limit is unknown and will vary with the different constants. Therefore y-cruncher arbitrarily caps it to 1015 decimal digits. Colloquially, I call this the "float-indexing limit".
There are currently no plans to raise this limit since it is already well beyond the capability of current hardware (as of 2015).
It is worth mentioning that the float-indexing limit is the only thing left that prevents y-cruncher from going all the way up to the 64-bit limit. Without it, it should be possible to reach 6.1 * 1017 decimal digits (the limit of the Small Primes NTT).
Getting rid of the float-indexing limit will require a floating-point type with at least a 64-bit mantissa. A viable option is to use 80-bit extended-precision via the x87 FPU although some compilers don't support it. But since "float indexing" isn't exactly a performance bottleneck, any sort of software emulation will probably work as well.