Commit Graph

33 Commits (73654f751cea700bbb71a1cca0421c59cbe99465)

Author SHA1 Message Date
magical 73654f751c reduce buffer size to 8 bytes
instead of buffering an entire block, buffer only when the input is not
aligned to 8 bytes, and otherwise xor uint64-sized chunks directly into
the state.

the code is a little more complicated but i think it's worth it.
we could eliminate the buffer entirely but that requires either
shenanigans with unsafe, or fiddly code to xor partial uint64s

a caveat is that the implementation now only supports sponge capacities
that are a multiple of 8 bytes. that's fine for the standard
instantiations but may restrict unusual applications.

not only does this let us reduce the buffer from 200 bytes to 8,
it also provides a nice speedup

name      old time/op    new time/op    delta
256_8-2     1.45µs ± 0%    1.28µs ± 1%  -11.58%  (p=0.000 n=10+10)
256_1k-2    10.1µs ± 0%     9.3µs ± 0%   -7.67%  (p=0.000 n=10+10)
256_8k-2    75.6µs ± 0%    70.2µs ± 1%   -7.09%  (p=0.000 n=10+10)
512_8-2     1.39µs ± 1%    1.29µs ± 1%   -6.85%  (p=0.000 n=10+10)
512_1k-2    18.7µs ± 0%    17.0µs ± 0%   -8.70%   (p=0.000 n=9+10)
512_8k-2     146µs ± 1%     129µs ± 0%  -11.70%   (p=0.000 n=10+9)

name      old speed      new speed      delta
256_8-2   5.53MB/s ± 0%  6.25MB/s ± 0%  +13.06%  (p=0.000 n=10+10)
256_1k-2   102MB/s ± 0%   110MB/s ± 0%   +8.30%  (p=0.000 n=10+10)
256_8k-2   108MB/s ± 0%   117MB/s ± 1%   +7.64%  (p=0.000 n=10+10)
512_8-2   5.78MB/s ± 1%  6.20MB/s ± 1%   +7.32%  (p=0.000 n=10+10)
512_1k-2  54.9MB/s ± 0%  60.1MB/s ± 0%   +9.53%   (p=0.000 n=9+10)
512_8k-2  56.1MB/s ± 1%  63.5MB/s ± 0%  +13.26%   (p=0.000 n=10+9)
2024-10-06 18:07:16 -07:00
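
The buffering scheme described in 73654f751c can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `sponge` type, its field names, and `stubPermute` (a stand-in that is NOT Keccak-f) are all hypothetical, and padding and squeezing are left out. It assumes the rate is a multiple of 8 bytes, matching the caveat in the commit message.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// sponge is a hypothetical, simplified absorber (names are assumptions).
type sponge struct {
	a    [25]uint64 // state, 25 little-endian lanes
	rate int        // rate in bytes; assumed to be a multiple of 8
	pos  int        // byte offset of the next lane to absorb into
	buf  [8]byte    // holds at most one partial lane between writes
	n    int        // number of valid bytes in buf
}

// stubPermute is a placeholder so the sketch runs; it is NOT Keccak-f.
func stubPermute(a *[25]uint64) {
	for i := range a {
		a[i] = bits.RotateLeft64(a[i]^uint64(i+1), 7)
	}
}

// xorLane xors one full lane into the state and permutes at the end of a block.
func (s *sponge) xorLane(v uint64) {
	s.a[s.pos/8] ^= v
	s.pos += 8
	if s.pos == s.rate {
		stubPermute(&s.a)
		s.pos = 0
	}
}

// write absorbs p, buffering only the bytes that do not fill a whole lane.
func (s *sponge) write(p []byte) {
	if s.n > 0 {
		// Top up the partial lane left over from an earlier write.
		c := copy(s.buf[s.n:], p)
		s.n += c
		p = p[c:]
		if s.n < 8 {
			return
		}
		s.xorLane(binary.LittleEndian.Uint64(s.buf[:]))
		s.n = 0
	}
	// Aligned case: xor uint64-sized chunks straight from the input.
	for len(p) >= 8 {
		s.xorLane(binary.LittleEndian.Uint64(p))
		p = p[8:]
	}
	// Keep the unaligned tail (0 to 7 bytes) for the next write.
	s.n = copy(s.buf[:], p)
}

func main() {
	msg := []byte("absorbing in odd-sized pieces should not change the state")
	one := &sponge{rate: 16}
	one.write(msg)
	split := &sponge{rate: 16}
	for i := 0; i < len(msg); i += 3 {
		split.write(msg[i:min(i+3, len(msg))])
	}
	// Both sponges saw the same byte stream, so their states should match.
	fmt.Println(one.a == split.a && one.n == split.n)
}
```

The only state carried between calls is the partial lane in `buf`, which is why 8 bytes are enough; removing it would mean xoring partial uint64 values into the state, the "fiddly code" the commit message mentions. The built-in `min` used in `main` requires Go 1.21 or later.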
magical 70a9bfa87d help the bounds checker in le64dec 2024-10-06 00:47:45 -07:00
magical 79b27a1530 add a few more test vectors for SHA3-256
these test some boundary conditions for an optimization i'm about to do.

computed using https://emn178.github.io/online-tools/sha3_256.html
2024-10-06 00:40:46 -07:00
magical 883cbd827a sha3sum: remove dependency on fmt
probably silly, but it cuts the binary size down significantly
2024-10-05 21:18:39 -07:00
magical b5d2ed36ca gen: use short declarations for c and d vars 2024-10-05 21:02:12 -07:00
magical 055806bbad de-unroll Chi loop in roundGeneric
(reroll? no, that's something else)

makes it more similar to the templatized code in gen.go. this isn't the
optimized code, so performance doesn't matter.
2024-10-05 20:54:51 -07:00
magical 5e1178f8c2 remove a bunch of blank lines in keccak_gen.go 2024-10-05 20:47:40 -07:00
magical b64eff8ecd add digest.clone method and use it in Sum 2024-10-05 20:17:20 -07:00
magical d6e555a97c avoid an alloc in sha-512 benchmarks 2024-10-05 19:50:43 -07:00
magical 0de798ef8f avoid an indirect call 2024-10-05 19:22:41 -07:00
magical 517ccd27fd Remove unnecessary label 2024-10-04 23:46:16 -07:00
magical a0d95be4fb test: Rename some variables. 2015-01-08 15:09:38 -08:00
magical bdf20db1f3 gen: Split d into five separate variables. 2015-01-03 02:06:18 -08:00
magical 902ec9e896 gen: Alter mod function. Purely cosmetic. 2015-01-03 01:40:42 -08:00
magical 7b92fe3532 Add another test vector. 2015-01-03 01:36:56 -08:00
magical 16d859b6d8 More tests and benchmarks. 2015-01-03 00:03:28 -08:00
magical c04abc1bf8 Fix benchmark - stop allocating on every iteration. 2015-01-01 20:50:56 -08:00
magical 33dc508782 Gofmt. 2015-01-01 03:10:11 -08:00
magical 0b361a2be7 Combine Rho with Pi and Chi.
Starting to switch to plane-wise processing.
2015-01-01 03:06:32 -08:00
magical f67abd3a9d Refactor: use [25]uint64 instead of [5][5]uint64. 2015-01-01 03:00:28 -08:00
magical c8f826bc6a Begin to refactor gen.go 2015-01-01 00:41:19 -08:00
magical 7b01515ff6 Gofmt. 2014-12-31 23:59:40 -08:00
magical e7f1f3541f Add a simple sha3sum utility. 2014-12-31 23:32:14 -08:00
magical dd21e91ec1 Add 512-bit hash. Use SHA-3 padding. 2014-12-31 23:19:05 -08:00
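
For context on the "SHA-3 padding" mentioned in dd21e91ec1: the final SHA-3 standard appends the domain-separation bits 01 before the pad10*1 padding, which at byte granularity means the first pad byte is 0x06 rather than the 0x01 used by original Keccak, and the last byte of the block gets 0x80. The sketch below shows that rule only; `padSHA3` and its signature are hypothetical and not taken from the repository.

```go
package main

import "fmt"

// padSHA3 returns the final rate-sized block for a message whose last
// partial block is tail. It assumes len(tail) < rate.
// (Hypothetical helper for illustration; not the repository's code.)
func padSHA3(tail []byte, rate int) []byte {
	block := make([]byte, rate)
	copy(block, tail)
	block[len(tail)] ^= 0x06 // domain bits 01 followed by the first pad bit
	block[rate-1] ^= 0x80    // final pad bit in the last byte of the block
	return block
}

func main() {
	// SHA3-256 has a rate of 136 bytes; pad a 3-byte tail.
	b := padSHA3([]byte("abc"), 136)
	fmt.Printf("%x ... %x\n", b[:4], b[135:])
}
```

When the tail is exactly one byte short of the rate, both XORs land on the same byte and the pad collapses to the single byte 0x86.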
magical 58e2940852 Remove dead function. 2014-12-31 20:30:56 -08:00
magical 56a2055f6e Use fewer XORs in Theta and eliminate Pi.
Go's common subexpression elimination is apparently not up to snuff.

Pi is now done implicitly.
2014-12-31 17:52:09 -08:00
magical 64c5855490 Optimize loads and stores a bit. 2014-12-31 16:52:34 -08:00
magical df6edcd0bb Generate a faster round function. 2014-12-31 16:43:36 -08:00
magical c9dcfb85a1 Gofmt 2014-12-31 15:19:23 -08:00
magical e40b3562fb Perform keccak-f in-place. 2014-12-31 15:17:55 -08:00
magical 5ee886a4b3 Swap x and y. 2014-12-31 15:15:49 -08:00
magical 0ed98686b8 Combine keccakf steps. 2014-12-31 15:11:40 -08:00
magical ed04711f60 Initial commit. 2014-12-31 14:59:00 -08:00