Module strscans

This module contains a scanf macro that can be used for extracting substrings from an input string. This is often easier than regular expressions. Some examples as an appetizer:

# check if input string matches a triple of integers:
const input = "(1,2,4)"
var x, y, z: int
if scanf(input, "($i,$i,$i)", x, y, z):
  echo "matches and x is ", x, " y is ", y, " z is ", z

# check if input string matches an ISO date followed by an identifier followed
# by whitespace and a floating point number:
var year, month, day: int
var identifier: string
var myfloat: float
if scanf(input, "$i-$i-$i $w$s$f", year, month, day, identifier, myfloat):
  echo "yes, we have a match!"

As can be seen from the examples, strings are matched verbatim except for substrings starting with $. These constructions are available:

$i      Matches an integer. This uses parseutils.parseInt.
$f      Matches a floating point number. Uses parseFloat.
$w      Matches an ASCII identifier: [A-Za-z_][A-Za-z_0-9]*.
$s      Skips optional whitespace.
$$      Matches a single dollar sign.
$.      Matches if the end of the input string has been reached.
$*      Matches until the token following the $* was found. The match is allowed to be of 0 length.
$+      Matches until the token following the $+ was found. The match must consist of at least one char.
${foo}  User defined matcher. Uses the proc foo to perform the match. See below for more details.
$[foo]  Call user defined proc foo to skip some optional parts in the input string. See below for more details.
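As a small illustration of how several of these constructions combine, the following sketch parses a made-up "name: $ amount" line. The input string and the variable names label and amount are assumptions chosen for this example only.

```nim
import std/strscans

var label: string
var amount: float

# $w reads the identifier, ':' is matched verbatim, $s skips optional
# whitespace, $$ matches a literal dollar sign, $f reads the number and
# $. requires that nothing is left in the input afterwards.
let matched = scanf("price: $ 3.5", "$w:$s$$$s$f$.", label, amount)
if matched:
  echo label, " costs ", amount
```

Because $s skips optional whitespace, the same pattern should also accept an input without spaces such as "price:$3.5".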

Even though $* and $+ look similar to the regular expressions .* and .+, they work quite differently: there is no non-deterministic state machine involved and the matches are non-greedy. [$*] matches [xyz] via parseutils.parseUntil.
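The non-greedy behaviour can be sketched with a key/value line that contains the separator twice. The input strings and the variable names key and value are assumptions made for this illustration.

```nim
import std/strscans

var key, value: string

# The first $* stops at the first '=' it encounters, so `key` receives only
# "a"; the second $* then runs up to the ';' and receives "b=c".
let nonGreedy = scanf("a=b=c;", "$*=$*;", key, value)
echo nonGreedy, ": ", key, " / ", value

# $+ needs at least one character before its terminating token, so an
# input that starts directly with '=' is rejected.
var k2, v2: string
let needsOneChar = scanf("=x;", "$+=$*;", k2, v2)
echo needsOneChar
```

A greedy regular expression such as (.*)=(.*); would instead put "a=b" into the first group.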

Furthermore, no backtracking is performed: if parsing fails after a value has already been bound to a matched subexpression, this value is not restored to its original value. This rarely causes problems in practice, and if it does for you, it's easy enough to bind to a temporary variable first.
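A minimal sketch of this effect and of the temporary-variable workaround follows. The names a, b, tmpX, tmpY, x and y are hypothetical and exist only for this example.

```nim
import std/strscans

var a = -1
var b = -1
# The first $i succeeds and writes into `a`; the second $i then fails on
# "x", so the overall match fails, but `a` keeps the newly parsed value.
let failed = not scanf("12,x", "$i,$i", a, b)
echo failed, " a=", a, " b=", b

# Workaround: scan into temporaries and copy only after a full match.
var x, y: int
var tmpX, tmpY: int
if scanf("12,x", "$i,$i", tmpX, tmpY):
  x = tmpX
  y = tmpY
echo "x=", x, " y=", y  # both untouched because the match failed
```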

Startswith vs full match

scanf returns true if the input string starts with the specified pattern. If it should instead return true only when nothing is left in the input, append $. to your pattern.
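The difference can be sketched with an integer followed by trailing characters. The input strings and the variable n are assumptions for this example.

```nim
import std/strscans

var n: int

# Without $. the pattern only has to match a prefix of the input.
let prefixOk = scanf("42abc", "$i", n)

# With $. the input must end right after the integer.
let fullFails = scanf("42abc", "$i$.", n)
let fullOk = scanf("42", "$i$.", n)

echo prefixOk, " ", fullFails, " ", fullOk
```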

User definable matchers

One very nice advantage over regular expressions is that scanf is extensible with ordinary Nim procs. The proc is either enclosed in ${} or in $[]. ${} matches and binds the result to a variable (that was passed to the scanf macro), while $[] merely skips optional tokens without binding a result.

In this example, we define a helper proc someSep that skips some separators which we then use in our scanf pattern to help us in the matching process:

proc someSep(input: string; start: int; seps: set[char] = {':','-','.'}): int =
  # Note: The parameters and return value must match what ``scanf`` requires
  result = 0
  # Stop at the end of the input to avoid indexing past the last char.
  while start+result < input.len and input[start+result] in seps: inc result

var key, value: string
if scanf(input, "$w$[someSep]$w", key, value):
  ...

It is also possible to pass arguments to a user definable matcher:

proc ndigits(input: string; intVal: var int; start: int; n: int): int =
  # matches exactly ``n`` digits. Matchers need to return 0 if nothing
  # matched or otherwise the number of processed chars.
  var x = 0
  var i = 0
  while i < n and i+start < input.len and input[i+start] in {'0'..'9'}:
    x = x * 10 + input[i+start].ord - '0'.ord
    inc i
  # only overwrite if we had a match
  if i == n:
    result = n
    intVal = x

# match an ISO date extracting year, month, day at the same time.
# Also ensure the input ends after the ISO date:
var year, month, day: int
if scanf("2013-01-03", "${ndigits(4)}-${ndigits(2)}-${ndigits(2)}$.", year, month, day):
  ...

Macros

macro scanf(input: string; pattern: static[string]; results: varargs[typed]): bool
See the top-level documentation of this module for how scanf works.
macro scanp(input, idx: typed; pattern: varargs[untyped]): bool
See the top-level documentation of this module for how scanp works.

Templates

template atom(input: string; idx: int; c: char): bool
Used in scanp for the matching of atoms (usually chars).
template atom(input: string; idx: int; s: set[char]): bool
template success(x: int): bool
template nxt(input: string; idx, step: int = 1)