Re: 10206 lexical ambiguity

20 Jan 2017


      Paul Isaacs wrote:
...
On 19/01/17 06:00 PM, Peter wrote:
...
As the grammer for new_ordinal_type is new_ordinal_type: 
enumerated_type | subrange_type ; (a) would still be parsed as 
"new_ordinal_type". from that point, only one token, either ; or .. is 
needed to resolve which "new_ordinal_type" we have. Don't see the 
problem. Regards, Peter
Hello Peter,
Ahhh. Some light. If the parse is shift-reduce a-la bison, yes.
If the parse is recursive descent, it may be  different story. The 
enumerated/subrange choice can not be delayed.
I am  too green to know but I find it hard to believe that the type 
classification of a grammar ( LALR( 1 ), etc. ) depends on the parsing 
method.
I have chosen recursive descent. It is easier to understand and the 
error messages are more meaningful because the parser always knows where 
it is in the productions. The probability is higher that others ( who 
are not bison gurus and who want to stick with pascal source ) may take 
an interest in the compiler.
I have a circular buffers at both the lexical and token levels whose 
size can be set so the lookahead is not a problem.
Up until this point, I have not had a lookahead problem ( at least that 
I am aware of ) and it has been my intention to generate a phase 1 
parser for an eventual compiler that used strictly lexical information.
However, I am now leaning towards the idea that using the semantic hints 
as provided by the multiple identifier types in the productions may be 
wiser. Type mismatches would be detected and reported much earlier at 
the expense of maintaining a symbol table stack and the parser would 
more closely follow the 20106 standard. Is this: true? wise?
1) lexical level deals with tokens.  Correct question is if
   there is syntactic ambiguity in ISO grammar.  And the answer
   is that your example is _not_ an ambiguity.
2) Of course ability of parser to deal with tricky construct
   depends on parsing method.  Basicaly simple recursive descent
   parser (LL(1)) must recognize a construct seeing first token of
   that construct (or a following token if the construct is empty).
   LR(1) parser must recognize a construct seeing first token
   _after_ the construct.  In other words, LL(1) parser makes
   decisions quite early and may have trouble with constructs
   like "new_ordinal_type".  LR(1) has no trouble with this.
3) LL(1) parser can be modified to work more like LR(1), namely
   you handle common prefix separately and once you can
   decide which branch to take you follow that branch.
4) IME using semantic information during parse is very
   problematic.  Namely, semantic information for lookahead
   tokens is not available: you can put semantic information
   into symbol table only _after_ you recognized declaration.
   In many cases this does not cause problems, but you may
   get subtle errors.  In particular problem becomes much
   worse if you decide to use more than one token of
   lookahead.
5) My preffered way to handle ambiguites which may be resoved
   using semantic information is to postpone resoving them to
   semantic stage. For example one could allow enumaration
   as lower bound of a subrange during parsing and report
   error only during semantic checking: arguably putting
   enumeration there is a semantic error not different in
   spirit from using non-constant expression as lower bound
   or from type error in lower bound expression.
6) GPC uses GLR parser (which essentially automatically uses
   whatever lookahead is needed to resolve parse) and subranges
   are main reason for needing GLR.  However, ISO subranges
   are tame, the wild beasts are Borland Pascal extensions.
-- 
                              Waldek Hebisch

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

2001

2000

1999

1998

1997

1996

1995

Re: 10206 lexical ambiguity