Discussion:
[9fans] grep bug?
arisawa
2013-07-11 13:11:08 UTC
Hello,

It seems the -f option of grep is buggy.
Or are there limitations on using REs this way?

term% wc MD5dir
4584 9168 388756 MD5dir
term% wc x
4582 4582 151206 x
term% grep -f x MD5dir | wc
4580 9160 388463
term%
term% grep e54272690d513f8b2403568a7574b1ba MD5dir
e54272690d513f8b2403568a7574b1ba /usr/arisawa/src/taskfs/Q/task.387.a/
term% grep e54272690d513f8b2403568a7574b1ba x
e54272690d513f8b2403568a7574b1ba
term% grep -v -f x MD5dir
7b6d7ae369226b6d0195ac3fe4487ce7 /usr/arisawa/src/elnfs/WWW/
d44d788ad1237311d8282bbabca65977 /usr/arisawa/src/hg/python-2.5.1-ape/Modules/_ctypes/libffi/src/darwin/
e54272690d513f8b2403568a7574b1ba /usr/arisawa/src/taskfs/Q/task.387.a/
84a0f83f5020f16d0b277e8b19407791 /usr/arisawa/src/trans
term%

Kenji Arisawa
erik quanstrom
2013-07-11 14:05:18 UTC
a trick i often use for many fixed strings is sort + uniq
(internally, grep/comp.c:/^increment does O(n^2)
qsorts on the patterns). perhaps it could be used here to
double-check grep.

to find the md5 hashes that only appear in one file or the other
(only the first field is considered by uniq),

cat x MD5dir | sort | uniq -c | sed '/^ *2 /d'

to count the hashes that appear in both files

cat x MD5dir | sort | uniq -c | grep '^ *2 ' | wc -l
or
... | awk '$1==2{n++}END{print n}'
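The same double-check can be done without any regexp engine at all. Below is a minimal Python 3 sketch, assuming that the first whitespace-separated field of each line is the hash, that each hash occurs at most once per file, and that every pattern is a bare hex string with no metacharacters. Under those assumptions `grep -f x MD5dir` should print exactly the lines whose hash is in x (ignoring the unlikely case of a hash also appearing inside a path). The function names and the script invocation are illustrative only.

```python
import sys
from collections import Counter


def first_fields(lines):
    """First whitespace-separated field (the hash) of each non-blank line."""
    return [ln.split()[0] for ln in lines if ln.strip()]


def cross_check(pattern_lines, target_lines):
    """Mirror `cat x MD5dir | sort | uniq -c`, keyed on the hash alone.

    Returns (only_in_one, in_both): the sorted hashes whose combined count
    is not 2, and how many hashes were counted exactly twice.
    """
    counts = Counter(first_fields(pattern_lines) + first_fields(target_lines))
    only_in_one = sorted(h for h, n in counts.items() if n != 2)
    in_both = sum(1 for n in counts.values() if n == 2)
    return only_in_one, in_both


def expected_matches(pattern_lines, target_lines):
    """Target lines whose hash is in the pattern set, i.e. what
    `grep -f x MD5dir` should print when every pattern is a bare hash."""
    wanted = set(first_fields(pattern_lines))
    return [ln for ln in target_lines if ln.strip() and ln.split()[0] in wanted]


if __name__ == "__main__" and len(sys.argv) == 3:
    # hypothetical invocation: python3 crosscheck.py x MD5dir
    with open(sys.argv[1]) as f:
        pats = f.read().splitlines()
    with open(sys.argv[2]) as f:
        targets = f.read().splitlines()
    only_in_one, in_both = cross_check(pats, targets)
    print("in both:", in_both)
    for h in only_in_one:
        print("only in one file:", h)
    print("grep -f should print", len(expected_matches(pats, targets)), "lines")
```

Comparing the length of `expected_matches` with `grep -f x MD5dir | wc -l` shows whether grep dropped any lines, and diffing the two lists shows which ones.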

can you find a smaller test case that has the same issue? this
should be fixed.

- erik
arisawa
2013-07-11 21:39:26 UTC
Thanks erik,

It is not easy to get a small sample.
I tried to make a smaller sample by discarding the first line of the pattern file and of the target file.
The problem depends not only on the pattern file but also on the target file!
It is curious: why doesn't grep decide each line read from the target file independently?

My results are shown below.
t0 and t1 are target files;
z0 and z1 are pattern files.

term% echo $t
e54272690d513f8b2403568a7574b1ba
term% grep -n $t z0 z1 t0 t1
z0:4081: e54272690d513f8b2403568a7574b1ba
z1:4082: e54272690d513f8b2403568a7574b1ba
t0:4298: e54272690d513f8b2403568a7574b1ba /usr/arisawa/src/taskfs/Q/task.387.a/
t1:4299: e54272690d513f8b2403568a7574b1ba /usr/arisawa/src/taskfs/Q/task.387.a/
term% wc z0 z1 t0 t1
4575 4575 150975 z0
4576 4576 151008 z1
4583 9166 388702 t0
4584 9168 388756 t1
18318 27485 1079441 total
term% diff -c z0 z1
z0:1,3 - z1:1,4
+ 00775d6a004acb79a2cd3ec30f743a9e
008de90ca02f6c4f10e2f172e0511105
00cfd51ea3ea1152f98c1f90130be7d0
00cff9d5f2b4d08753f96564dac50a58
term% diff -c t0 t1
t0:1,3 - t1:1,4
+ f104706a82b7c20e0f2c3cf83033958c /usr/arisawa/bin/rc/
521db7c46291f0785d8d77f8e614350a /usr/arisawa/bin/rc/photo/
e76b64877aea9c8b4dcc24069436d52d /usr/arisawa/lib/
1d6ca6d83f5f51ebfdd94fea02a9fb8b /usr/arisawa/lib/cookies/
term% grep -f z0 t0 | grep $t
e54272690d513f8b2403568a7574b1ba /usr/arisawa/src/taskfs/Q/task.387.a/
term% grep -f z1 t0 | grep $t
e54272690d513f8b2403568a7574b1ba /usr/arisawa/src/taskfs/Q/task.387.a/
term% grep -f z0 t1 | grep $t
e54272690d513f8b2403568a7574b1ba /usr/arisawa/src/taskfs/Q/task.387.a/
term% grep -f z1 t1 | grep $t
term%
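Dropping lines by hand, as above, is exactly what a test-case reducer automates. Below is a minimal Python 3 sketch of a greedy chunk-removal reducer, a simplified form of delta debugging (not anything from grep or Plan 9). The `still_fails` predicate is a hypothetical callback you supply; it must return True for the input you start with.

```python
from typing import Callable, List


def shrink(lines: List[str], still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Greedy chunk-removal reducer, a simplified form of delta debugging.

    Tries deleting runs of `chunk` consecutive lines, starting with half
    the input and halving down to single lines, and keeps any deletion
    after which still_fails() is still True. Assumes still_fails(lines)
    is True for the input it is given.
    """
    chunk = max(1, len(lines) // 2)
    while True:
        removed = False
        i = 0
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]
            if candidate and still_fails(candidate):
                lines = candidate   # smaller input still fails: keep it
                removed = True      # retry at the same position
            else:
                i += chunk          # this run is needed; move past it
        if not removed:
            if chunk == 1:
                return lines        # no single line can be dropped
            chunk //= 2
```

For the case above, a hypothetical `still_fails` would write the candidate pattern lines to a file, run `grep -f` with it against t1, and report whether the line for e54272690d513f8b2403568a7574b1ba is missing from the output. Each call runs grep once, so shrinking a 4,500-line pattern file may take many runs.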

Kenji Arisawa
arisawa
2013-07-21 03:29:01 UTC
Hello,

grep -vf z z
is a good test.

term% p z
850f815f90e7498364c668b9e0774d96
851cef1c518242f977be3ce40714b7f8
852142a8b4f175d4fa9003ab30743106
8522f3bc66efa1edaa1e6c495e2e7b89
852684fef763a4e36d57b99a85ab366b
85377206b2292dd884a5f502f846c7d1
...
term% wc z
2209 2209 72897 z
term% grep -vf z z
feeaa668ee79fbbde6f7539dac41a2ed
term% grep feeaa668ee79fbbde6f7539dac41a2ed z
feeaa668ee79fbbde6f7539dac41a2ed
term%
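This self-test works because every line of z is also one of its patterns, so `grep -vf z z` should print nothing; any line it does print is a line grep failed to match against itself. Below is a minimal Python 3 sketch of the same check as a repeatable test, assuming a grep on PATH that accepts -v and -f (a different binary can be passed as `grep=`). The input is generated MD5 digests, not the real z, so a clean run here does not show the bug is absent; 2209 only mirrors the line count of z, and the helper names are illustrative.

```python
import hashlib
import os
import shutil
import subprocess
import tempfile


def self_match_leftovers(lines, grep="grep"):
    """Run `grep -v -f FILE FILE` on a temporary file holding `lines`.

    Every line is also a pattern, so every line should match and a
    correct grep prints nothing. Returns whatever grep printed.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("\n".join(lines) + "\n")
        path = f.name
    try:
        res = subprocess.run([grep, "-v", "-f", path, path],
                             capture_output=True, text=True)
        if res.returncode > 1:      # POSIX grep: status above 1 means an error
            raise RuntimeError(res.stderr)
        return res.stdout.splitlines()
    finally:
        os.remove(path)


def fake_hashes(n):
    """n distinct MD5 hex digests in sorted order (stand-ins for the file z)."""
    return sorted(hashlib.md5(str(i).encode()).hexdigest() for i in range(n))


if __name__ == "__main__" and shutil.which("grep"):
    leftovers = self_match_leftovers(fake_hashes(2209))
    print("lines grep failed to match against themselves:", leftovers)
```

Pointing `self_match_leftovers` at the real sample file's lines, rather than `fake_hashes`, reproduces the check from the transcript.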

The sample file is:
http://plan9.aichi-u.ac.jp/z.gz

Kenji Arisawa
