Re: Implicit length/upperbound declarations

2 Jun 2003

      CBFalconer wrote:
...
Why not start the thinking from what is as compatible as possible
with standard Pascal techniques.
Since you bring up the issue of ISO 7185 ("classic" Pascal) vs.
10206 (Extended Pascal) again, would you like to address the
following limitations of classic Pascal? (Excuse me if I sound a
little like Brian Kernighan, but I think these points are valid
criticisms of classic Pascal. Most of them have been addressed in
Extended Pascal -- and, I have to admit, most of them also in BP,
though often differently.)
- No variables of dynamic size. You can't, e.g., get a number of
  elements at runtime and do something with them.(*) Without using
  standard compliance level 1 (conformant array parameters), you
  can't even have routines that work on arrays of different size, so
  common routines have to be copied for each possible size.
  (*) The usually suggested work-arounds include:

  - Set a maximum size at compile time. This is suitable for demo
    and learning programs, not for the real world (unless perhaps
    the size is made unreasonably large, but this will waste too
    much memory).

  - Use linked lists. This works in some cases, but often it adds an
    O(n) factor to the complexity and is therefore unacceptable.

  - Use more complicated structures such as balanced trees. This
    will add a lot of unnecessary programming overhead.
- Strings padded with spaces. (EP shares this defect in some
  respects, e.g. string comparisons.) Treating `foo' and `foo   ' as
  equivalent might work sometimes, but will already fail when
  writing them, followed by other stuff in the same line.
- No modularization. Often used routines have to be copied (since
  there's also no include) into each program that uses them which
  becomes a maintenance nightmare when modifying them.
- No file names. Classic Pascal programs can only access Input,
  Output and files declared in the program header. It's not possible
  to choose a file name during runtime.
- No way to access routines written in other languages or system
  routines, except through the few predefined routines.
- No `otherwise' in `case'.
- No defined order of evaluation, in particular for Boolean
  operators. This makes loops in particular quite a bit more
  complicated to write (examples can be found in Kernighan's paper).
- No way to escape type-checking or "untyped pointers". Something
  like your nmalloc cannot be written in classic Pascal.
- No bitwise operators (and, or, xor). It's possible, but very
  cumbersome and inefficient to implement them using arithmetic.
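
To illustrate the evaluation-order item in the list above, here is a minimal sketch of a linear search written so that it works in classic Pascal no matter how the operands of `and' are evaluated. All identifiers (SearchDemo, Items, Key, Found) are made up for this example.

```pascal
program SearchDemo (Output);

{ Linear search that does not depend on the order in which the
  operands of the Boolean operator are evaluated.
  All names here are illustrative only. }

const
  MaxItems = 5;

type
  ItemIndex = 1 .. MaxItems;
  ItemList = array [ItemIndex] of Integer;

var
  Items: ItemList;
  Key, i: Integer;
  Found: Boolean;

begin
  Items[1] := 42;
  Items[2] := 17;
  Items[3] := 99;
  Items[4] := 8;
  Items[5] := 23;
  Key := 99;

  { The tempting form
      while (i <= MaxItems) and (Items[i] <> Key) do i := i + 1
    may evaluate Items[MaxItems + 1] when Key is absent, because
    both operands may be evaluated. An explicit flag avoids that. }
  i := 1;
  Found := False;
  while (i <= MaxItems) and not Found do
    if Items[i] = Key then
      Found := True
    else
      i := i + 1;

  if Found then
    WriteLn ('found at index ', i : 1)
  else
    WriteLn ('not found')
end.
```

In Extended Pascal the flag becomes unnecessary, since the loop condition can be written with the short-circuit operator as `(i <= MaxItems) and_then (Items[i] <> Key)'.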
Unless you can explain how to overcome those restrictions or why
they are irrelevant, I'm afraid I can't take your case for classic
Pascal in real-world programs too seriously.
Do you actually stick to these limitations of classic Pascal, or do
you use some extensions after all? (I'm reminded of a Usenet
discussion I had with someone about type escapes. First, he claimed
he can do anything by purely standard means, and when we finally got
to the critical points, he admitted to just abusing variant records
(which is in no way guaranteed to work by the standard) and using
assembler code to do the type conversions. This is not very
convincing to me.)
Actually I'm wondering, what's your point in bringing this up again?
Is it compiler-portability? That's a valid point(*) -- but then, of
course, you must not use any extensions or rely on any
implementation-dependent or implementation-defined features.
(*) For classic Pascal; for EP there are very few compilers at all,
    and BP is too imprecisely defined, so the several existing "BP
    compatible" compilers, including GPC, differ on details which
    are undocumented in BP, so much that you don't usually get
    portable programs without using many compiler conditionals.
Or do you want to prove that any program can be written in classic
Pascal? Sure, I'll give you that. Then again, any program can be
written on a Turing machine.
The difference is that one is a little more comfortable than the
other. And that's my reason for extensions such as this one. With
existing means, you have to modify two places to add one entry. It's
more comfortable to have to change only one place.
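
A minimal sketch of the "two places" problem, assuming a compiler that accepts BP-style initialized variables (the syntax also used in the declarations quoted further down). The names BoundDemo and Numbers are made up for illustration.

```pascal
program BoundDemo (Output);

{ Assumes BP-style initialized variables are available.
  `Numbers' is a made-up name. }

var
  { Place 1: the upper bound. Place 2: the value list.
    Adding a fourth value means editing both. }
  Numbers: array [1 .. 3] of Integer = (42, 17, 99);
  i: Integer;

begin
  { Low and High keep the remaining code independent of the bound,
    so no separate constant has to be kept in sync. }
  for i := Low (Numbers) to High (Numbers) do
    WriteLn (i : 1, ': ', Numbers[i] : 1)
end.
```

Only the declaration repeats the information. The loop already follows the declared bounds through `Low' and `High', which is why the declaration is the only part the proposed extension touches.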
...
There one would create an array
of messages with a variation on:
CONST
     maxmsglgh    = 30;  (* kept short to minimize my typing *)
     maxmsgcnt    = 3;   (* and here also *)
TYPE
     msgid        = 1..maxmsgcnt;  (* or an enumeration *)
     amsg         = PACKED ARRAY[1..maxmsglgh] OF char;
     msgs         = ARRAY[1..maxmsgcnt] OF amsg;
VAR
     resultmsgs   : msgs;
(* 1--------------1 *)
PROCEDURE initresultmsgs(VAR msggrp : msgs);
(* 2--------------2 *)

PROCEDURE initonemsg(ix : msgid; themsg : amsg);

  BEGIN (* initonemsg *)
  msggrp[ix] := themsg;
  END; (* initonemsg *)

(* 2--------------2 *)

BEGIN (* initresultmsgs *)
             (* 123456789-123456789-123456789- *)
initonemsg( 1, 'did not terminate with status ');
initonemsg( 2, 'terminated with status        ');
initonemsg( 3, 'was terminated by signal      ');
END; (* initresultmsgs *)

(* 1--------------1 *)
.....
initresultmsgs(resultmsgs);
Which, once set up, is easily modified, allows the bulky
initialization code to be segregated and gotten out of the way on
suitable platforms after use, etc.  The nuisance is that each
message has a fixed length.  We can fix this by modifying the type
of amsg to include a length field (say lgh), and doing eventual
writes with this as a parameter, such as:
 write(fp, resultmsg.body[ix] : resultmsg.lgh);

which can in turn be encapsulated in a convenient:
PROCEDURE writeresult(VAR fp : text; ix : msgid; VAR msgset : msgs);
At least that is the way I would attack the job :-)
It won't surprise you that I have a lot of objections to your
method:
- You add two procedures to initialize one variable. In a program
  with many initialized variables, this becomes quite a lot of
  procedures. (Besides, in classic Pascal they have to be separated
  in location from the respective variables due to the fixed order
  of declarations.)
  I prefer to write compact code (which is IMHO much more readable).
  This may be influenced by the fact that I'm not paid by LOC. ;-)
- To add one entry, you still have to change 2 places. If it's not
  the last one, you have to change N indices additionally.
  You wrote: "Notice that all the above is easily converted to use
  an enumerated set of msgid to avoid the use of magic constants."
  That's true, but then you might need some comments to easily match
  the enum IDs to the initialization lines (which is again extra
  effort). Also, it adds another set of (global!) identifiers, just
  for one variable -- not very convenient (unless you need the
  identifiers, anyway).
  There are more identifiers added (types, constants, etc.). Your
  use of urdabrvids (unreadable, abbreviated identifiers, IMHO)
  suggests to me that this is not a good idea. Using fewer
  identifiers allows you to use more expressive names that don't get
  ridiculously long ...
- You have to count the characters (as indicated by your `123...'
  ruler). Sure, also with an initialized array of EP strings,
  there's a limit, but the compiler will complain when one value is
  too large, and when it's too small it will only waste a little
  memory, so it's no real problem to declare a reasonable size and
  be told by the compiler if it gets too small.
- The handling of the length with `:' is specific to `Write'. Other
  usages have to be specially coded.
  Also for `Write', the suggested encapsulation doesn't help all
  that much. Instead of

    WriteLn ('The process ', ResultMessage[WaitPIDResult], Status, '.');

  you'd have to write:

    Write ('The process ');
    WriteResult (Output, WaitPIDResult, ResultMessage);
    WriteLn (Status, '.');

  Quite clumsy, isn't it? (Apart from the fact that having to add
  `Output' in `WriteResult' is less than intuitive ...)
  (And, of course, WriteResult is bound to an array of fixed size,
  so you can't use it for different lists of messages, unless they
  happen to have the same number of messages, or you're willing to
  fill them up with dummy entries (strings, not chars this time).
  This is the same problem on the outer dimension.)
- You wrote: "Even the generation of the suggested lgh field can be
  automated in the initialization procedure, by having it measure
  the number of trailing blanks present.  That leaves the only real
  7185 nuisance the necessity of typing in those blanks in the first
  place, and possibly the added storage space used." -- And added
  run time, and most of all, the impossibility to define strings
  with trailing blanks.
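
For concreteness, here is a rough sketch of what the quoted suggestion (a length field filled in by counting trailing blanks) might look like in classic Pascal. The record layout and the names Msg, Lgh, Body and InitMsg are my assumptions, not code from the original proposal.

```pascal
program TrimDemo (Output);

{ Sketch only: a fixed-size message with a length field that is
  computed by counting trailing blanks. All names are hypothetical. }

const
  MaxMsgLgh = 30;

type
  MsgBody = packed array [1 .. MaxMsgLgh] of Char;
  Msg = record
    Lgh: 0 .. MaxMsgLgh;
    Body: MsgBody
  end;

var
  M: Msg;
  k: Integer;

procedure InitMsg (var Dest: Msg; Source: MsgBody);
var
  n: Integer;
  Done: Boolean;
begin
  Dest.Body := Source;
  { Scan backwards over the padding. The flag avoids relying on
    the evaluation order of `and' when n reaches 0. }
  n := MaxMsgLgh;
  Done := False;
  while (n > 0) and not Done do
    if Source[n] = ' ' then
      n := n - 1
    else
      Done := True;
  Dest.Lgh := n
end;

begin
  { The literal still has to be padded by hand to exactly 30 chars. }
  InitMsg (M, 'terminated with status        ');
  for k := 1 to M.Lgh do
    Write (M.Body[k]);
  WriteLn ('|')
end.
```

The scan costs run time for every message, and a message that is supposed to end in a blank cannot be expressed at all, because the blank is indistinguishable from the padding.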
There might be more problems. My experience with such code is that
it draws so much attention to the formalities (both when writing
and later when reading/modifying the code) that it makes it harder
to focus on the "real" code.
...
The
point of any extended syntax is to ease that work, but NOT to
change the eventual calls that use it.  The tools you have in
10206 include strings with capacities and actual lengths.  There
is no need to make the end code look anything like C, but there is
a need to make it clear.
I don't quite understand these remarks. The discussed extensions
would not change any calls (or make it more like C in any way --
what gave you this idea???). It's just about simplifying the
declaration by allowing the programmer to omit a value which the
compiler can easily determine itself.
...
Extensions are like optimization.  The first rule is don't do it.
The second rule is don't do it yet.  The third rule is think it
over again.
I agree for pointless extensions. But extensions which make
programming more comfortable and don't entail serious problems are
not pointless IMHO.
(Likewise for optimizations, BTW. Much as I generally dislike
low-level tricks, I'm all for algorithmic optimizations when
reasonable. In a project I worked on recently, I easily achieved a
100* speedup, just by storing things in suitably linked data
structures, rather than looking them up again and again. Without any
dirty tricks.)
Adriaan van Os wrote:
...
...
What I don't like is that another identifier is introduced within
some other declarations. Enum types do the same, and this has caused
some extra work in the compiler. Since it's possible to get the
upper bound as `High (s)' or `High (a)' (BP compatible feature), I
think it's not necessary to introduce a new identifier in the
declaration.
The constant identifier could be optional:
var
    s: String ( const ) = 'Hi';
    a: array [1 .. const ] of Integer = (42, 17, 99);
Sorry, but this is exactly the opposite. This way, we'd not only
have the complexities of handling the identifier (in case it's
there), but also two syntactic variations. So this is an example of
an unwarranted extension IMHO. One way seems useful, two ways are
redundant. And the extension should be kept as simple as possible
(i.e., *no* additional identifier, since it's already possible to
declare it with existing features).
Besides, using `const' as a placeholder looks quite strange. To me,
it would suggest that the variable is constant, not that the upper
bound is omitted.
Frank
-- 
Frank Heckenbach, frank@g-n-u.de, http://fjf.gnu.de/, 7977168E
GPC To-Do list, latest features, fixed bugs:
http://www.gnu-pascal.de/todo.html
GPC download signing key: 51FF C1F0 1A77 C6C2 4482  4DDC 117A 9773 7F88 1707
