On Sat, Apr 09, 2005 at 05:14:26PM +0100, Prof A Olowofoyeku (The African Chief) wrote:
Hi
I have been trying to implement a "pos" function that only matches whole words. Below is my best effort. Is there a more efficient (or "better") way of implementing this? Thanks.
The efficiency of your solution (after correction) depends on the efficiency of pos.
Better efficiency might be obtained, depending on the characteristics of the actual parameters. Searching for short identifiers in relatively short strings versus searching for `tricky'/longer identifiers in trickier/longer texts (with many partial matches), but this requires that you rewrite the implementation of pos.
A typical implementation of pos traverses the text from left to right (low to high index values), and matches it against the substring, also from left to right. When the match fails, the next character in the text is taken as starting position of the next matching attempt. If you match the string from right to left against the text, you can often make a bigger jump in the text for the next candidate position.
If you are really interested in this art, you can consult e.g. one of (but there are other sources that may be more accessible):
A New Taxonomy of Sublinear Keyword Pattern Matching Algorithms, by L.G.W.A. Cleophas, B.W. Watson and G. Zwaan, Computing Science Report 2004/03, Faculty of Mathematics and Computing Science, Eindhoven University of Technology, Eindhoven, The Netherlands, March 2004.
http://alexandria.tue.nl/extra1/wskrap/publichtml/200407.pdf -- A Taxonomy of Keyword Pattern Matching Algorithms, by B.W. Watson and G. Zwaan, Computing Science Report 92/27, Faculty of Mathematics and Computing Science, Eindhoven University of Technology, Eindhoven, The Netherlands, 1992, ISSN 0926-4515.
http://alexandria.tue.nl/extra1/wskrap/publichtml/9310412.pdf --
{ test: correct results in comments } begin writeln (PosInString ('adam', 'he is adamant')); {0} writeln (PosInString ('adam', 'why is he so adamant about it?')); {0} writeln (PosInString ('adam', 'he knows about adam ant')); {16} writeln (PosInString ('adam', 'adam and eve')); {1} writeln (PosInString ('adam', 'eve and adam')); {9} writeln (PosInString ('adam', 'eve and adam ate the apple')); {9} end.
I am missing test cases like
writeln (PosInString ('adam', 'adam')); {1} writeln (PosInString ('adam', 'adamant')); {0} writeln (PosInString ('adam', 'madam')); {0} writeln (PosInString ('adam', 'madam i''m adam')); {11}
Tom