johnkerl · johnkerl · Dec 15, 2021 · Dec 13, 2021 · Dec 13, 2021 · Dec 7, 2021
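The `--hash-records` help text in the diff below says the flag controls how Miller accesses fields within records. As a rough illustration of that trade-off, here is a minimal sketch of an insertion-ordered record with an optional hash index. The `record` and `field` types and their methods are hypothetical names chosen for this example; this is not Miller's actual `Mlrmap` code.

```go
package main

import "fmt"

// field is one key-value pair in an insertion-ordered record.
type field struct {
	key, value string
}

// record is a hypothetical, simplified stand-in for a Miller record.
// It is NOT Miller's actual implementation.
type record struct {
	fields []field        // preserves insertion order
	index  map[string]int // key -> position in fields; nil when hashing is off
}

// newRecord builds an empty record; the map is allocated only when hashing is on.
func newRecord(hashRecords bool) *record {
	r := &record{}
	if hashRecords {
		r.index = make(map[string]int)
	}
	return r
}

// find returns the position of key, using the hash index when present
// and a linear scan over the fields otherwise.
func (r *record) find(key string) (int, bool) {
	if r.index != nil {
		i, ok := r.index[key]
		return i, ok
	}
	for i, f := range r.fields {
		if f.key == key {
			return i, true
		}
	}
	return -1, false
}

// put overwrites an existing field in place or appends a new one.
func (r *record) put(key, value string) {
	if i, ok := r.find(key); ok {
		r.fields[i].value = value
		return
	}
	r.fields = append(r.fields, field{key, value})
	if r.index != nil {
		r.index[key] = len(r.fields) - 1
	}
}

// get returns the value stored under key, if any.
func (r *record) get(key string) (string, bool) {
	if i, ok := r.find(key); ok {
		return r.fields[i].value, true
	}
	return "", false
}

func main() {
	for _, hashed := range []bool{false, true} {
		r := newRecord(hashed)
		r.put("a", "1")
		r.put("b", "2")
		r.put("a", "3") // overwrites the first field; does not append
		v, ok := r.get("a")
		fmt.Println(hashed, v, ok, len(r.fields))
	}
}
```

Both modes return the same results. In this sketch the unhashed mode skips a map allocation and a map update per field, while the hashed mode avoids a linear scan on each lookup. That is consistent with the help text's suggestion that hashing might only help when records have many fields and many of them are accessed.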
diff --git a/docs/src/manpage.md b/docs/src/manpage.md
@@ -478,6 +478,14 @@ MISCELLANEOUS FLAGS
                                 rather than after. May be used more than once.
                                 Example: `mlr --from a.dat --from b.dat cat` is the
                                 same as `mlr cat a.dat b.dat`.
+       --hash-records           This is an internal parameter which normally does not
+                                need to be modified. It controls the mechanism by
+                                which Miller accesses fields within records. In
+                                general --no-hash-records is faster, and is the
+                                default. For specific use-cases involving data having
+                                many fields, and many of them being processed during
+                                a given processing run, --hash-records might offer a
+                                slight performance benefit.
        --infer-int-as-float or -A
                                 Cast all integers in data files to floats.
        --infer-no-octal or -O   Treat numbers like 0123 in data files as string
@@ -508,12 +516,16 @@ MISCELLANEOUS FLAGS
                                 unlikely to be a noticeable performance improvement,
                                 since direct-to-screen output for large files has its
                                 own overhead.
+       --no-hash-records        See --hash-records.
        --nr-progress-mod {m}    With m a positive integer: print filename and record
                                 count to os.Stderr every m input records.
        --ofmt {format}          E.g. `%.18f`, `%.0f`, `%9.6e`. Please use
                                 sprintf-style codes for floating-point nummbers. If
                                 not specified, default formatting is used. See also
                                 the `fmtnum` function and the `format-values` verb.
+       --records-per-batch {n}  This is an internal parameter for maximum number of
+                                records in a batch size. Normally this does not need
+                                to be modified.
        --seed {n}               with `n` of the form `12345678` or `0xcafefeed`. For
                                 `put`/`filter` `urand`, `urandint`, and `urand32`.
        --tz {timezone}          Specify timezone, overriding `$TZ` environment
@@ -2994,5 +3006,5 @@ SEE ALSO
 
 
 
-                                  2021-12-07                         MILLER(1)
+                                  2021-12-15                         MILLER(1)
 </pre>
diff --git a/docs/src/manpage.txt b/docs/src/manpage.txt
@@ -457,6 +457,14 @@ MISCELLANEOUS FLAGS
                                 rather than after. May be used more than once.
                                 Example: `mlr --from a.dat --from b.dat cat` is the
                                 same as `mlr cat a.dat b.dat`.
+       --hash-records           This is an internal parameter which normally does not
+                                need to be modified. It controls the mechanism by
+                                which Miller accesses fields within records. In
+                                general --no-hash-records is faster, and is the
+                                default. For specific use-cases involving data having
+                                many fields, and many of them being processed during
+                                a given processing run, --hash-records might offer a
+                                slight performance benefit.
        --infer-int-as-float or -A
                                 Cast all integers in data files to floats.
        --infer-no-octal or -O   Treat numbers like 0123 in data files as string
@@ -487,12 +495,16 @@ MISCELLANEOUS FLAGS
                                 unlikely to be a noticeable performance improvement,
                                 since direct-to-screen output for large files has its
                                 own overhead.
+       --no-hash-records        See --hash-records.
        --nr-progress-mod {m}    With m a positive integer: print filename and record
                                 count to os.Stderr every m input records.
        --ofmt {format}          E.g. `%.18f`, `%.0f`, `%9.6e`. Please use
                                 sprintf-style codes for floating-point nummbers. If
                                 not specified, default formatting is used. See also
                                 the `fmtnum` function and the `format-values` verb.
+       --records-per-batch {n}  This is an internal parameter for maximum number of
+                                records in a batch size. Normally this does not need
+                                to be modified.
        --seed {n}               with `n` of the form `12345678` or `0xcafefeed`. For
                                 `put`/`filter` `urand`, `urandint`, and `urand32`.
        --tz {timezone}          Specify timezone, overriding `$TZ` environment
@@ -2973,4 +2985,4 @@ SEE ALSO
 
 
 
-                                  2021-12-07                         MILLER(1)
+                                  2021-12-15                         MILLER(1)
diff --git a/docs/src/reference-main-flag-list.md b/docs/src/reference-main-flag-list.md
@@ -341,6 +341,8 @@ These are flags which don't fit into any other category.
 `: Force buffered output to be written after every output record. The default is flush output after every record if the output is to the terminal, or less often if the output is to a file or a pipe. The default is a significant performance optimization for large files.  Use this flag to force frequent updates even when output is to a pipe or file, at a performance cost.
 * `--from {filename}
 `: Use this to specify an input file before the verb(s), rather than after. May be used more than once. Example: `mlr --from a.dat --from b.dat cat` is the same as `mlr cat a.dat b.dat`.
+* `--hash-records
+`: This is an internal parameter which normally does not need to be modified. It controls the mechanism by which Miller accesses fields within records. In general --no-hash-records is faster, and is the default. For specific use-cases involving data having many fields, and many of them being processed during a given processing run, --hash-records might offer a slight performance benefit.
 * `--infer-int-as-float or -A
 `: Cast all integers in data files to floats.
 * `--infer-no-octal or -O
@@ -355,10 +357,14 @@ These are flags which don't fit into any other category.
 `: Like `--load` but works with more than one filename, e.g. `--mload *.mlr --`.
 * `--no-fflush
 `: Let buffered output not be written after every output record. The default is flush output after every record if the output is to the terminal, or less often if the output is to a file or a pipe. The default is a significant performance optimization for large files.  Use this flag to allow less-frequent updates when output is to the terminal. This is unlikely to be a noticeable performance improvement, since direct-to-screen output for large files has its own overhead.
+* `--no-hash-records
+`: See --hash-records.
 * `--nr-progress-mod {m}
 `: With m a positive integer: print filename and record count to os.Stderr every m input records.
 * `--ofmt {format}
 `: E.g. `%.18f`, `%.0f`, `%9.6e`. Please use sprintf-style codes for floating-point nummbers. If not specified, default formatting is used.  See also the `fmtnum` function and the `format-values` verb.
+* `--records-per-batch {n}
+`: This is an internal parameter for maximum number of records in a batch size. Normally this does not need to be modified.
 * `--seed {n}
 `: with `n` of the form `12345678` or `0xcafefeed`. For `put`/`filter` `urand`, `urandint`, and `urand32`.
 * `--tz {timezone}

diff --git a/internal/pkg/cli/option_parse.go b/internal/pkg/cli/option_parse.go
@@ -2578,7 +2578,11 @@ this does not need to be modified.`,
 
 		{
 			name: "--hash-records",
-			help: `This is an internal parameter which normally does not need to be modified.`,
+			help: `This is an internal parameter which normally does not need to be modified.
+It controls the mechanism by which Miller accesses fields within records.
+In general --no-hash-records is faster, and is the default. For specific use-cases involving
+data having many fields, and many of them being processed during a given processing run,
+--hash-records might offer a slight performance benefit.`,
 			parser: func(args []string, argc int, pargi *int, options *TOptions) {
 				types.HashRecords(true)
 				*pargi += 1
@@ -2587,7 +2591,7 @@ this does not need to be modified.`,
 
 		{
 			name: "--no-hash-records",
-			help: `This is an internal parameter which normally does not need to be modified.`,
+			help: `See --hash-records.`,
 			parser: func(args []string, argc int, pargi *int, options *TOptions) {
 				types.HashRecords(false)
 				*pargi += 1

diff --git a/internal/pkg/types/mlrmap.go b/internal/pkg/types/mlrmap.go
@@ -58,7 +58,7 @@ package types
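-// Both these figures are for just doing mlr cat. At the moment I'm leaving this
-// default-on pending more profiling on more complex record-processing operations
-// such as mlr sort.
-var hashRecords = true
+// Both these figures are for just doing mlr cat. Hashing is now default-off;
+// use --hash-records to enable it. More profiling is still needed on more complex
+// record-processing operations such as mlr sort.
+var hashRecords = false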
 
 func HashRecords(onOff bool) {
 	hashRecords = onOff

diff --git a/man/manpage.txt b/man/manpage.txt
@@ -457,6 +457,14 @@ MISCELLANEOUS FLAGS
                                 rather than after. May be used more than once.
                                 Example: `mlr --from a.dat --from b.dat cat` is the
                                 same as `mlr cat a.dat b.dat`.
+       --hash-records           This is an internal parameter which normally does not
+                                need to be modified. It controls the mechanism by
+                                which Miller accesses fields within records. In
+                                general --no-hash-records is faster, and is the
+                                default. For specific use-cases involving data having
+                                many fields, and many of them being processed during
+                                a given processing run, --hash-records might offer a
+                                slight performance benefit.
        --infer-int-as-float or -A
                                 Cast all integers in data files to floats.
        --infer-no-octal or -O   Treat numbers like 0123 in data files as string
@@ -487,12 +495,16 @@ MISCELLANEOUS FLAGS
                                 unlikely to be a noticeable performance improvement,
                                 since direct-to-screen output for large files has its
                                 own overhead.
+       --no-hash-records        See --hash-records.
        --nr-progress-mod {m}    With m a positive integer: print filename and record
                                 count to os.Stderr every m input records.
        --ofmt {format}          E.g. `%.18f`, `%.0f`, `%9.6e`. Please use
                                 sprintf-style codes for floating-point nummbers. If
                                 not specified, default formatting is used. See also
                                 the `fmtnum` function and the `format-values` verb.
+       --records-per-batch {n}  This is an internal parameter for maximum number of
+                                records in a batch size. Normally this does not need
+                                to be modified.
        --seed {n}               with `n` of the form `12345678` or `0xcafefeed`. For
                                 `put`/`filter` `urand`, `urandint`, and `urand32`.
        --tz {timezone}          Specify timezone, overriding `$TZ` environment
@@ -2973,4 +2985,4 @@ SEE ALSO
 
 
 
-                                  2021-12-07                         MILLER(1)
+                                  2021-12-15                         MILLER(1)
diff --git a/man/mlr.1 b/man/mlr.1
@@ -2,12 +2,12 @@
 .\"     Title: mlr
 .\"    Author: [see the "AUTHOR" section]
 .\" Generator: ./mkman.rb
-.\"      Date: 2021-12-07
+.\"      Date: 2021-12-15
 .\"    Manual: \ \&
 .\"    Source: \ \&
 .\"  Language: English
 .\"
-.TH "MILLER" "1" "2021-12-07" "\ \&" "\ \&"
+.TH "MILLER" "1" "2021-12-15" "\ \&" "\ \&"
 .\" -----------------------------------------------------------------
 .\" * Portability definitions
 .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -576,6 +576,14 @@ These are flags which don't fit into any other category.
                          rather than after. May be used more than once.
                          Example: `mlr --from a.dat --from b.dat cat` is the
                          same as `mlr cat a.dat b.dat`.
+--hash-records           This is an internal parameter which normally does not
+                         need to be modified. It controls the mechanism by
+                         which Miller accesses fields within records. In
+                         general --no-hash-records is faster, and is the
+                         default. For specific use-cases involving data having
+                         many fields, and many of them being processed during
+                         a given processing run, --hash-records might offer a
+                         slight performance benefit.
 --infer-int-as-float or -A
                          Cast all integers in data files to floats.
 --infer-no-octal or -O   Treat numbers like 0123 in data files as string
@@ -606,12 +614,16 @@ These are flags which don't fit into any other category.
                          unlikely to be a noticeable performance improvement,
                          since direct-to-screen output for large files has its
                          own overhead.
+--no-hash-records        See --hash-records.
 --nr-progress-mod {m}    With m a positive integer: print filename and record
                          count to os.Stderr every m input records.
 --ofmt {format}          E.g. `%.18f`, `%.0f`, `%9.6e`. Please use
                          sprintf-style codes for floating-point nummbers. If
                          not specified, default formatting is used. See also
                          the `fmtnum` function and the `format-values` verb.
+--records-per-batch {n}  This is an internal parameter for maximum number of
+                         records in a batch size. Normally this does not need
+                         to be modified.
 --seed {n}               with `n` of the form `12345678` or `0xcafefeed`. For
                          `put`/`filter` `urand`, `urandint`, and `urand32`.
 --tz {timezone}          Specify timezone, overriding `$TZ` environment

diff --git a/todo.txt b/todo.txt
@@ -1,6 +1,7 @@
 ================================================================
 PUNCHDOWN LIST
 
+* --ifs-regex & --ips-regex -- guessing is not safe, as evidenced by '.' and '|'
 
 * big-picture item @ Rmd; also webdoc intro page