Discussion:
[9fans] empty *
Gorka Guardiola
2012-06-14 08:28:09 UTC
While playing with grep, I was surprised that grep '*\.c' does not give
an error (* is missing an operand). Arguably * applied to the empty
string can match the empty string, but surprisingly enough, Acme's Edit
behaves differently. Even grep is not consistent: grep '*' behaves
differently from grep '', whereas either both should be an empty pattern
or the first one should be an error. Another funny one is that Edit
complains of a missing operand for * when the regexp is empty.

Greps from other systems accept an empty pattern
(and are thus consistent, but they would not have
caught the error that started all this).


cpu% echo hola | grep '*a'
hola
cpu% echo hola | grep '*'
grep: *: syntax error
cpu% echo hola | grep ''
grep: empty pattern

Edit , s/*//
regexp: missing operand for *
Edit: bad regexp in s command

Edit , s/*c//
regexp: missing operand for *
Edit: bad regexp in s command

Edit , s///
regexp: missing operand for *
Edit: bad regexp in s command

G.
Peter A. Cejchan
2012-06-14 09:32:56 UTC
This is from the man page, but I am not sure what _exactly_ it means, or
whether it applies to your problem:

          Care should be taken when using the shell metacharacters
          $*[^|()=\ and newline in pattern; it is safest to enclose
          the entire expression in single quotes '...'.  An expression
          starting with '*' will treat the rest of the expression as
          literal characters.

more strange behavior:
% echo foo.c | 9 grep '*\.c'
%
% echo foo.c | 9 grep '*.c'
foo.c
% echo fooxc | 9 grep '*.c'
%
% echo fooxc | 9 grep '.*.c'
fooxc
% echo fooxc | 9 grep '.*\.c'
%
% echo foo.c | 9 grep '.*\.c'
foo.c
% echo foo.c | 9 grep '*foo.c'
foo.c
% echo foo.c | 9 grep '*.00.c'
%

So it looks like
"An expression starting with '*' will treat the rest of the expression
as literal characters."
(see above) really does apply (for unknown reasons).


However, I am just a 'toy programmer', so you were warned ;-)
Regards,
++pac
Post by Gorka Guardiola
While playing with grep, I was surprised by grep '*\.c' not giving
an error (* is missing an operand). Arguably * applied to empty ... [snip]
Gorka Guardiola
2012-06-14 09:54:28 UTC
Post by Peter A. Cejchan
This is from the man page, but I am not sure what _exactly_ it means, or
whether it applies to your problem:
          Care should be taken when using the shell metacharacters
          $*[^|()=\ and newline in pattern; it is safest to enclose
          the entire expression in single quotes '...'.  An expression
          starting with '*' will treat the rest of the expression as
          literal characters.
Everything is enclosed in '', so the shell is not seeing this.

G.
Anthony Martin
2012-06-14 09:42:19 UTC
% sed -n 20,32p /sys/src/cmd/grep/grep.y
prog:   /* empty */
        {
                yyerror("empty pattern");
        }
|       expr newlines
        {
                $$.beg = ral(Tend);
                $$.end = $$.beg;
                $$ = re2cat(re2star(re2or(re2char(0x00, '\n'-1), re2char('\n'+1, 0xff))), $$);
                $$ = re2cat($1, $$);
                $$ = re2cat(re2star(re2char(0x00, 0xff)), $$);
                topre = $$;
        }
%

The above code sets up the initial state
machine including the pattern passed on
the command line, $1.

This combined with the fact that multiple
"stars" are coalesced causes the weirdness
you're seeing.
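
As a rough illustration of what that grammar action appears to assemble (a sketch in Python, not the grep source): the user's pattern is sandwiched between "any bytes" in front and "anything but newline" behind. The helper name wrap_pattern is made up for this example, and Python's re syntax only approximates Plan 9 regexps.

```python
import re


def wrap_pattern(pat):
    """Hypothetical helper mirroring the three re2cat calls in the rule.

    (?s:.*)  stands in for re2star(re2char(0x00, 0xff)): any bytes, newline included
    (?:pat)  stands in for $1, the pattern given on the command line
    [^\\n]*  stands in for the star over everything except '\\n'
    """
    return re.compile(r"(?s:.*)(?:" + pat + r")[^\n]*")


# The leading "any bytes" star is what makes the search unanchored:
# the pattern may start anywhere in the input.
m = wrap_pattern(r"\.c").match("foo.c\nbar\n")
print(m.group() if m else None)
```

The trailing piece stops at the newline, so a match never runs past the end of the line on which the pattern was found.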

Anthony
erik quanstrom
2012-06-14 13:29:55 UTC
Post by Anthony Martin
This combined with the fact that multiple
"stars" are coalesced causes the weirdness
you're seeing.
there is no case of multiple '*'s in the patterns peter gave.
there is a case of patterns beginning with '*' which treats the
rest of the pattern as a literal, but that's different.

- erik
erik quanstrom
2012-06-14 13:33:45 UTC
Post by Gorka Guardiola
While playing with grep, I was surprised by grep '*\.c' not giving
an error (* is missing an operand). Arguably * applied to empty
nope, that's not right. * starting a pattern escapes the whole string.
this is unique to grep.

from grep(1):

[....] An expression
starting with '*' will treat the rest of the expression as
literal characters.

- erik
erik quanstrom
2012-06-14 13:44:25 UTC
Post by Gorka Guardiola
nope, that's not right. * starting a pattern escapes the whole string.
this is unique to grep.
Argh, yes, it has a special meaning. I have somehow managed to miss
this for all this time.
it's easy to miss, but critical especially since we have other implementations
that don't do this. i'd argue that they should for consistency.

- erik
Lucio De Re
2012-06-14 14:05:43 UTC
Post by erik quanstrom
nope, that's not right.  * starting a pattern escapes the whole string.
this is unique to grep.
Argh, yes, it has a special meaning. I have somehow managed to miss
this for all this time.
it's easy to miss, but critical especially since we have other implementations
that don't do this. i'd argue that they should for consistency.
It seems to me that grep requires it, in some sense, because the Unix
grep has a -F option that resembles it. Other regexp users almost
invariably achieve this by other, simpler means.

++L
t***@polynum.com
2012-06-14 14:00:47 UTC
nope, that's not right.  * starting a pattern escapes the whole string.
this is unique to grep.
I guess this is surprising because with a POSIX grep(1), if I read the
description correctly:

1) If the * is the very first character of a BRE (POSIX has both BRE and
ERE), it shall be treated as a literal *, but the rest of the expression
is still interpreted.

2) In an ERE, if the * is the very first character, or follows |,
^ or (, the result is undefined.
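
To make point 1 concrete, here is a small Python sketch of just that one BRE rule. Python's re module is neither BRE nor ERE, so this only models the leading-star case; the function names are invented for the example.

```python
import re


def bre_leading_star(pat):
    """Escape a '*' that opens the pattern (optionally after an initial '^').

    This models only the POSIX BRE rule for a leading '*'; the rest of the
    pattern is passed through and still interpreted as a regexp.
    """
    prefix = "^" if pat.startswith("^") else ""
    body = pat[len(prefix):]
    if body.startswith("*"):
        body = r"\*" + body[1:]
    return prefix + body


def python_rejects(pat):
    """True if Python's own re refuses the pattern as written."""
    try:
        re.compile(pat)
    except re.error:
        return True
    return False


# Under the BRE reading, '*.c' wants a literal '*', then any character, then 'c'.
print(re.search(bre_leading_star("*.c"), "*xc") is not None)
# Left untranslated, Python reports an error for a star with no operand.
print(python_rejects("*a"))
```

Compare this with the Plan 9 rule quoted from grep(1), where the leading '*' makes the whole remainder literal, so '*.c' would look for the two characters '.c' instead.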

But I must admit that I was unaware, till Erik's message, of both the
Plan9 behavior, and even the "details" of the POSIX behavior...
--
Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
Gorka Guardiola
2012-06-14 14:13:12 UTC
Post by t***@polynum.com
nope, that's not right.  * starting a pattern escapes the whole string.
this is unique to grep.
I guess this is surprising because with a POSIX grep(1), if I read the
description correctly:
1) If the * is the very first character of a BRE (POSIX has both BRE and
ERE), it shall be treated as a literal *, but the rest of the expression
is still interpreted.
2) In an ERE, if the * is the very first character, or follows |,
^ or (, the result is undefined.
Also this:

cpu% echo hola | grep '*'
grep: *: syntax error
cpu% echo hola | grep ''
grep: empty pattern


grep '*' and grep '' should still be the same, shouldn't they?

G.
t***@polynum.com
2012-06-14 14:36:02 UTC
Post by Gorka Guardiola
cpu% echo hola | grep '*'
grep: *: syntax error
Plan 9 regexps are mainly Extended Regular Expressions. Going by the POSIX
description, a leading '*' is a syntax error. I guess that a
leading '*' followed by some non-empty pattern is the Plan 9 way to get
"grep -F"?
Post by Gorka Guardiola
cpu% echo hola | grep ''
grep: empty pattern
From the POSIX description, an empty pattern is not allowed. '*' is not
an empty pattern.
erik quanstrom
2012-06-14 14:52:02 UTC
Post by t***@polynum.com
Post by Gorka Guardiola
cpu% echo hola | grep '*'
grep: *: syntax error
Plan 9 regexps are mainly Extended Regular Expressions. Going by the POSIX
description, a leading '*' is a syntax error. I guess that a
leading '*' followed by some non-empty pattern is the Plan 9 way to get
"grep -F"?
Post by Gorka Guardiola
cpu% echo hola | grep ''
grep: empty pattern
From the POSIX description, an empty pattern is not allowed. '*' is not
an empty pattern.
i'm sorry this just isn't correct. see the man page.

plan 9 grep has no intentions of being posix. grep '*' can be seen as a literal
escape plus the pattern of ''. this is an empty pattern, thus an error.
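
That reading can be written down as a tiny Python sketch. It is only a model of the argument, not how grep is implemented, and the function names are hypothetical. Under this model both inputs end in the same "empty pattern" error, whereas the transcript earlier in the thread shows the real grep printing two different messages.

```python
def split_literal_escape(pattern):
    """Split off a leading '*' and return (is_literal, rest)."""
    if pattern.startswith("*"):
        return True, pattern[1:]
    return False, pattern


def require_nonempty(pattern):
    """Reject a pattern whose body is empty once the escape is removed."""
    is_literal, rest = split_literal_escape(pattern)
    if rest == "":
        # Same complaint whether the input was '' or '*' (an assumption of
        # this sketch; the real grep words the two cases differently).
        raise ValueError("empty pattern")
    return is_literal, rest


for pat in ("*", "", "*a"):
    try:
        print(repr(pat), "->", require_nonempty(pat))
    except ValueError as err:
        print(repr(pat), "-> error:", err)
```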

- erik
t***@polynum.com
2012-06-14 15:08:32 UTC
Post by erik quanstrom
Post by t***@polynum.com
Post by Gorka Guardiola
cpu% echo hola | grep ''
grep: empty pattern
From the POSIX description, an empty pattern is not allowed. '*' is not
an empty pattern.
i'm sorry this just isn't correct. see the man page.
plan 9 grep has no intentions of being posix. grep '*' can be seen as a literal
escape plus the pattern of ''. this is an empty pattern, thus an error.
In this case, shouldn't the error message be exactly the same?
erik quanstrom
2012-06-14 14:46:58 UTC
Post by Gorka Guardiola
cpu% echo hola | grep '*'
grep: *: syntax error
cpu% echo hola | grep ''
grep: empty pattern
grep '*' and grep '' should still be the same, shouldn't they?
yes, but does it matter? you correctly get an error either way.

- erik
Gorka Guardiola
2012-06-14 14:50:22 UTC
On Thu, Jun 14, 2012 at 4:46 PM, erik quanstrom
Post by erik quanstrom
cpu% echo hola | grep '*'
grep: *: syntax error
cpu% echo hola | grep ''
grep: empty pattern
grep '*' and grep '' should still be the same, shouldn't they?
yes, but does it matter?
Probably not.

G.
Gorka Guardiola
2012-06-14 13:38:59 UTC
On Thu, Jun 14, 2012 at 3:33 PM, erik quanstrom
Post by Gorka Guardiola
While playing with grep, I was surprised by grep '*\.c' not giving
an error (* is missing an operand). Arguably * applied to empty
nope, that's not right.  * starting a pattern escapes the whole string.
this is unique to grep.
Argh, yes, it has a special meaning. I have somehow managed to miss
this for all this time.

G.
erik quanstrom
2012-06-14 13:28:07 UTC
Post by Peter A. Cejchan
% echo foo.c | 9 grep '*\.c'
correct. match \.c as a literal string. there is no match.
Post by Peter A. Cejchan
% echo foo.c | 9 grep '*.c'
foo.c
correct. match .c as a literal string. there is a match.
Post by Peter A. Cejchan
% echo fooxc | 9 grep '*.c'
%
% echo fooxc | 9 grep '.*.c'
fooxc
correct. match 0-n any character then 1 any character then a c. there is a match.
Post by Peter A. Cejchan
% echo fooxc | 9 grep '.*\.c'
correct. this time there's no match because '.' is treated as a literal not
a pattern.
Post by Peter A. Cejchan
% echo foo.c | 9 grep '.*\.c'
foo.c
correct. match 0-n any characters, then a literal '.' then literal 'c'. there is a match.
Post by Peter A. Cejchan
% echo foo.c | 9 grep '*foo.c'
foo.c
correct. match the literal string foo.c. there is a match.
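
The cases above can be checked against a small Python model of the documented rule. This is a sketch under assumptions: plan9_grep_match is an invented name, Python's re stands in for Plan 9 regexps (close enough for these simple patterns), and nothing here is taken from the grep source.

```python
import re


def plan9_grep_match(pattern, line):
    """Model of the grep(1) rule: a leading '*' makes the rest literal."""
    if pattern.startswith("*"):
        # Literal mode: plain substring search, no metacharacters at all.
        return pattern[1:] in line
    # Otherwise an ordinary, unanchored regexp search.
    return re.search(pattern, line) is not None


# (input line, pattern, whether the transcript shows a match)
CASES = [
    ("foo.c", r"*\.c", False),   # literal backslash-dot-c is not in foo.c
    ("foo.c", r"*.c", True),     # literal .c is in foo.c
    ("fooxc", r"*.c", False),    # literal .c is not in fooxc
    ("fooxc", r".*.c", True),    # regexp: anything, any char, then c
    ("fooxc", r".*\.c", False),  # regexp: needs a real dot
    ("foo.c", r".*\.c", True),
    ("foo.c", r"*foo.c", True),  # literal foo.c
    ("foo.c", r"*.00.c", False),
    ("hola", r"*a", True),
]

for line, pattern, expected in CASES:
    assert plan9_grep_match(pattern, line) == expected, (line, pattern)
print("all", len(CASES), "cases agree with the transcript")
```

Empty patterns are deliberately left out of this model, since the thread shows grep rejecting them before any matching happens.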

remember that the match doesn't have to be anchored by default, so i sometimes
do this
	grep $somesym `{find /sys/src|grep '\.[chys]$'}
this is also packaged up in the local version of 'g'; this would be equivalent
	g $somesym /sys/src
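
For readers without rc at hand, the same pipeline can be approximated in Python. This is only a sketch of the idea: grep_tree is a made-up name, and how the local 'g' script actually works is not shown in the thread.

```python
import os
import re

# Same filter as the inner grep: names ending in .c, .h, .y or .s.
# The '$' is needed because the search is otherwise unanchored.
SOURCE_NAME = re.compile(r"\.[chys]$")


def grep_tree(symbol_re, root):
    """Return (path, line number, line) for each matching source line."""
    pat = re.compile(symbol_re)
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            if not SOURCE_NAME.search(name):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps odd bytes from aborting the scan.
            with open(path, errors="replace") as f:
                for lineno, line in enumerate(f, 1):
                    if pat.search(line):
                        hits.append((path, lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    for path, lineno, text in grep_tree(r"topre", "."):
        print(f"{path}:{lineno}: {text}")
```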

- erik