Paul Isaacs wrote:
On 19/01/17 06:00 PM, Peter wrote:
As the grammer for new_ordinal_type is new_ordinal_type: enumerated_type | subrange_type ; (a) would still be parsed as "new_ordinal_type". from that point, only one token, either ; or .. is needed to resolve which "new_ordinal_type" we have. Don't see the problem. Regards, Peter
Hello Peter,
Ahhh. Some light. If the parse is shift-reduce a-la bison, yes.
If the parse is recursive descent, it may be different story. The enumerated/subrange choice can not be delayed.
I am too green to know but I find it hard to believe that the type classification of a grammar ( LALR( 1 ), etc. ) depends on the parsing method.
I have chosen recursive descent. It is easier to understand and the error messages are more meaningful because the parser always knows where it is in the productions. The probability is higher that others ( who are not bison gurus and who want to stick with pascal source ) may take an interest in the compiler.
I have a circular buffers at both the lexical and token levels whose size can be set so the lookahead is not a problem.
Up until this point, I have not had a lookahead problem ( at least that I am aware of ) and it has been my intention to generate a phase 1 parser for an eventual compiler that used strictly lexical information.
However, I am now leaning towards the idea that using the semantic hints as provided by the multiple identifier types in the productions may be wiser. Type mismatches would be detected and reported much earlier at the expense of maintaining a symbol table stack and the parser would more closely follow the 20106 standard. Is this: true? wise?
1) lexical level deals with tokens. Correct question is if there is syntactic ambiguity in ISO grammar. And the answer is that your example is _not_ an ambiguity. 2) Of course ability of parser to deal with tricky construct depends on parsing method. Basicaly simple recursive descent parser (LL(1)) must recognize a construct seeing first token of that construct (or a following token if the construct is empty). LR(1) parser must recognize a construct seeing first token _after_ the construct. In other words, LL(1) parser makes decisions quite early and may have trouble with constructs like "new_ordinal_type". LR(1) has no trouble with this. 3) LL(1) parser can be modified to work more like LR(1), namely you handle common prefix separately and once you can decide which branch to take you follow that branch. 4) IME using semantic information during parse is very problematic. Namely, semantic information for lookahead tokens is not available: you can put semantic information into symbol table only _after_ you recognized declaration. In many cases this does not cause problems, but you may get subtle errors. In particular problem becomes much worse if you decide to use more than one token of lookahead. 5) My preffered way to handle ambiguites which may be resoved using semantic information is to postpone resoving them to semantic stage. For example one could allow enumaration as lower bound of a subrange during parsing and report error only during semantic checking: arguably putting enumeration there is a semantic error not different in spirit from using non-constant expression as lower bound or from type error in lower bound expression. 6) GPC uses GLR parser (which essentially automatically uses whatever lookahead is needed to resolve parse) and subranges are main reason for needing GLR. However, ISO subranges are tame, the wild beasts are Borland Pascal extensions.