[9fans] plan9 rc-shell level regex vs utf values questions
andrew zerger
2012-06-07 18:49:49 UTC
Access utf values and echo from indexes on rc?

Before I write any code over the issue, does anyone have a grep, sed, or
split application which matches by utf character values like \U2424 instead
of whatever built-in token like \n ?

Or another basic question I cannot get the manuals to answer yet, how to
'echo \U2424' value?

Just seems like this would have been done already, so asking,
tyvm
--
⎌⎺⎺├@┌␊├├≀-␍⎌␊▒␍:/␀⎺└␊/⎌␀⎺#
c***@gmx.de
2012-06-07 18:59:09 UTC
in plan9, you hit [Alt] then type X2424

echo '␤'

alternatively, you can run:

unicode 2424

also, no \n needed:

echo '
This
Is
a

Test'

there's no need to escape anything other than the quotes.
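
For a system without the Plan 9 compose key or unicode(1), here is a minimal sketch in POSIX sh (an assumption; the thread itself is about rc). U+2424 encodes in UTF-8 as the three bytes E2 90 A4, which printf can emit as octal escapes. The function name nl_symbol is made up for illustration.

```shell
#!/bin/sh
# nl_symbol (hypothetical helper): print U+2424 SYMBOL FOR NEWLINE.
# The UTF-8 bytes E2 90 A4 are written as the octal escapes \342 \220 \244,
# which POSIX printf interprets in its format string.
nl_symbol() {
    printf '\342\220\244'
}

# Usage: print the character followed by a real newline.
nl_symbol
echo
```

Typing the character literally inside single quotes, as shown above, works just as well wherever the editor and terminal are UTF-8 clean.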

--
cinap
erik quanstrom
2012-06-07 19:05:13 UTC
Post by andrew zerger
Access utf values and echo from indexes on rc?
Before I write any code over the issue, does anyone have a grep, sed, or
split application which matches by utf character values like \U2424 instead
of whatever built-in token like \n ?
do you mean that instead of matching lines you want to match records delimited
by \u2424? if so, you can use sam or acme structured regular expressions, sres, or
you can just tr ␤ '\n'. if you want to save the original newlines, you can first
change those to an unused character in your text.

as cinap mentions, there's no need to escape codepoints >= 0x80. rc treats
'em all as the same class of symbol as a-z.
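
A rough sketch of the "save the original newlines first" idea, written for POSIX sh rather than rc (an assumption). Some tr implementations, GNU coreutils among them, work byte by byte and cannot take the multi-byte ␤ as an argument, so this version swaps in awk for the second step instead of tr ␤ '\n'. The control character \001 stands in for "an unused character in your text", and the function name records_to_lines is made up.

```shell
#!/bin/sh
# records_to_lines (hypothetical helper): turn ␤-delimited records into lines.
# Step 1: map each original newline to \001, assumed not to occur in the input.
# Step 2: let awk split on ␤ and print one record per line.
records_to_lines() {
    tr '\n' '\001' | awk -v 'RS=␤' '{ print }'
}

# Usage: two records, each containing an embedded newline.
# cat -v makes the preserved \001 bytes visible as ^A.
printf 'a\nb␤c\nd' | records_to_lines | cat -v
```

Mapping \001 back with tr '\001' '\n' after processing restores the original newlines inside each record.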

- erik
Ethan Grammatikidis
2012-06-08 09:03:09 UTC
On Thu, 7 Jun 2012 15:05:13 -0400
Post by erik quanstrom
do you mean that instead of matching lines you want to match records delimited
by \u2424?
Also awk 'RS=␤', or awk -F ␤ where -F is field separator.
Ethan Grammatikidis
2012-06-08 11:12:58 UTC
On Fri, 8 Jun 2012 10:03:09 +0100
Post by Ethan Grammatikidis
On Thu, 7 Jun 2012 15:05:13 -0400
Post by erik quanstrom
do you mean that instead of matching lines you want to match records delimited
by \u2424?
Also awk 'RS=␤', or awk -F ␤ where -F is field separator.
my bad, that should be awk -v 'RS=␤', or set RS = "␤" in the BEGIN block. You can also set FS instead of using -F if you like.
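
A small sketch of the corrected forms, assuming a POSIX sh and an awk that accepts a multi-byte RS and FS (gawk and mawk should; very old awks that only look at the first byte of RS will not). The function names are made up for illustration.

```shell
#!/bin/sh
# number_records (hypothetical): ␤ as record separator, set with -v.
number_records() {
    awk -v 'RS=␤' '{ print NR ": " $0 }'
}

# number_records_begin (hypothetical): the same, with RS set in BEGIN.
# Inside the awk program the separator has to be a quoted string.
number_records_begin() {
    awk 'BEGIN { RS = "␤" } { print NR ": " $0 }'
}

# second_field (hypothetical): ␤ as field separator within ordinary lines.
second_field() {
    awk -F '␤' '{ print $2 }'
}

# Usage
printf 'one␤two␤three' | number_records
printf 'left␤middle␤right\n' | second_field
```

The bare awk 'RS=␤' from the earlier message puts the assignment in the program text, where awk reads it as a pattern rather than as a setting applied before input starts, which is presumably why it needed correcting.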