free software resistance
the cost of computing freedom is eternal vigilance
### shuftail.fig
*originally posted:* may 2023
of the tools included (or not included) with openbsd, the one i miss the most is shuf.
recently i had a particularly important use for it-- and i wanted to work on a very large source, while limiting the size of the output.
basically, i wanted to work with 1.5 gb (millions of lines) of input and limit output to an arbitrary number of lines, but probably about 1000.
i didnt just want the equivalent of shuf | tail but rather shuf that selects from millions of lines, without actually shuffling millions of lines. shuf | tail wont do that.
shuftail will.
```
cat reallyhuge.txt | shuftail 1000 2000000
```
will select 1,000 lines (randomly) from 2,000,000 lines of input, and then shuffle the 1,000 lines it selected.
because it doesnt hold the entire input in a buffer, and because it selects randomly based on the total size, and because it doesnt want to require you to feed the input twice (taking twice as long and being very tedious), it requires you to specify the number of lines you will be giving it. if you give it more lines than that, it will only select from the first sourcelength lines. if you give it fewer lines, it might not work reliably, because some of the line numbers it picked may never arrive.
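to make the selection step easier to follow, here is a minimal python sketch of the same idea. it is not the fig code, just an illustration: the name pick_lines and the rng parameter are made up for this example. it picks the line numbers first, then reads the input once and keeps only the lines whose numbers were picked.

```python
import random

def pick_lines(lines, taillength, sourcelength, rng=random):
    """Keep taillength lines chosen at random from the first sourcelength lines.

    Hypothetical helper for illustration; only the kept lines are held in memory.
    """
    # choose the 1-based line numbers up front, before reading any input
    wanted = set(rng.sample(range(1, sourcelength + 1), taillength))
    kept = []
    for number, line in enumerate(lines, start=1):
        if number > sourcelength:
            break  # input past the stated length is never selected
        if number in wanted:
            kept.append(line)
    return kept
```

called as pick_lines(sys.stdin, 1000, 2000000) it would read standard input one line at a time. if the input is shorter than sourcelength, some picked numbers never show up and fewer than taillength lines come back.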
not knowing how the shuf routine works, i had the shuffle step iterate through every line (of the smaller specified buffer, not the entire source) and swap that line with a random line-- i wanted the shuffle to be thorough but efficient. this routine should take roughly the same amount of time (proportional to the number of lines) on every run.
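the swap routine described above can be sketched in python like this. again, this is an illustration and not the fig code: the name swap_shuffle and the rng parameter are assumptions for the example.

```python
import random

def swap_shuffle(items, rng=random):
    """Shuffle a list in place by swapping each position with a random one.

    Hypothetical helper for illustration; one pass, one swap per item.
    """
    count = len(items)
    for each in range(count):
        r = rng.randrange(count)  # any position, including each itself
        items[each], items[r] = items[r], items[each]
    return items
```

it does exactly one swap per line, so the running time grows with the number of lines kept. one known caveat: choosing r from the whole list makes some orderings slightly more likely than others. picking r only from each to the end of the list (the fisher-yates shuffle) avoids that for the same amount of work.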
the code is not indented, it took maybe 30-45 minutes to write, future versions should really use indentation.
like most things on this website, the code is licensed 0-clause bsd.
```
# shuftail
# usage: stdout | shuftail taillength sourcelength
# 2023
#
# Permission to use, copy, modify, and/or distribute this software for any
# purpose with or without fee is hereby granted.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
# ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
# OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
try
# l is taillength, p is sourcelength
l command mid 1 1 val
p command mid 2 1 val
ifless p l
now 1 divby 0 # sourcelength cannot be less than tail
next
except
now "usage: shuftail taillength sourcelength" print end
next
# pick l distinct random line numbers between 1 and p, stored in b
b 0 arr mid 1 0
c 0
while
n randint 1 p
f instr b n
ifequal f 0
b plus n
now c plus 1 swap now c
ifequal c l
break
next
next
wend
# read the input line by line, keeping only lines whose number is in b
c 0
buf "" arr mid 1 0
try
while
source lineinput
now c plus 1 swap now c
f instr b c
iftrue f
buf plus source
next
wend
except
p
next
# shuffle buf: swap each kept line with a randomly chosen kept line
for each 1 l 1
r randint 1 l
s1 buf mid each 1
s2 buf mid r 1
buf arrset each s2
buf arrset r s1
next
# print the shuffled lines
forin each buf
now each print
next
```
license: 0-clause bsd
```
# 2018, 2019, 2020, 2021, 2022, 2023
#
# Permission to use, copy, modify, and/or distribute this software for any
# purpose with or without fee is hereby granted.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
# ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
# OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
```
=> https://freesoftwareresistance.neocities.org