Our simple language, Pl\0, had only one type of variable, integers.
We have briefly discussed the problems of adding real numbers to the list of types. How
could be add arrays, pointers and virtual functions? Recall that when we wanted to access
memory assigned to a variable, we had to know its address relative to a given base point.
We can use this idea to store the beginning location of data in a structure that resides in
a different structure. If a variable x represents an array of integers, then the location of
x in the stack is the beginning of a block of memory used to store the array. Data stored
at is accessed by computing the offset from the beginning of the array to the desired position.
If the first subscript in our array, x, is min, then x[n] is located at the address
of x add to it (n - min) * size(integer), where size(integer) is the amount of memory
needed to store one integer. In our p-code, if we want to load x[3] in an array of integers with first subscript 1,
where x is the second variable defined, we would generate
lod (level difference) 4 + 2.The 4 is the address of the second variable (assuming the first is an integer) and the 2 comes from the subscript, 3, minus the minimum subscript, 1. Clearly we can store any type of data item in an array constructed this way by simply computing the offset by multipling n-min by the size of one item of the new data type.
One other problem we have is to know how much memory to allocate for our array so that the block of
memory will only be used for that array. When we allocate a block in pascal we specify
array[beg..max] of ourtype. We know the size of ourtype, so we allocate a block of size max*size(ourtype - beg + 1). C and C++ also require that statically allocated arrays specify a size. Since these arrays always use first subscript 0, only the size is specified -
ourtype x[max];. Any variable that follows
and array will have its beginning address altered to take into account the fact that the array takes more
than one location in the stack. If we had a variable y defined directly following the x in our example above,
and we knew that x was an array of seven integers, then the address of y would be the address of x plus 7, not
plus one.
The address of an item in an array depends on the fact that each item in the array is the same
size. Structures such as pascal's record and c's struct have fields of different sizes. On the other hand
each field has a name that can be looked up in a table at compile time. The offset of each field can be stored
in the lookup table, so if we have the following record:
x : record first :integer; second: real; third : char; end; x.third := 'a';We know that x.third is located at address(x) + size(integer) + size(real). The sum of the sizes is stored in the table so it can be substituted at run time. As with arrays, the total size of a record must be known before hand so that amount of memory may be allocated when variable addresses are computed.
One problem with storing all of the variables in the stack is that sizes must be known at compile time. One way around this problem is to store only the address of a structure in the stack and allocate the memory for the structure at runtime. Since all addresses on a given machine have a predetermined format, there is no need to know the exact size of the block of memory this will point to. To make this work, a block of memory separate from the stack must be allocated. This block of memory may be allocated and released during the run either by a programmer, the structure the programmer allocates (counted pointer) or by a memory manager that "collects garbage" (unused memory). We will worry more about memory management when we discuss lisp.
Most languages have special notation used for accessing a field of a pointer. If p is a pointer to a record of the same type as x above, then to access the field "third", we write p^.third. If we had the following c struct definition
struct{
int first;
double second;
char third;
} *p;
the we to access third we would write p->third.
The special notation, tells the compiler to generate indirect access code. In our p-code lid and sti replace lod and sto. These commands load and store items at the address stored in the location referenced. In c and c++, pointers can also be accessed as arrays. An array of integers can be created at runtime, by the following code:
int *p = new int[n];where n is the size of the array. The third item in the array is accessed by the expression x[2] (0 is the first location in a c array). The memory p points to is released by the following statement:
delete [] p;the [] indicates that p points to a block of memory. The size of the block is frequently stored at the address p - size(integer) so the amount of memory released is known.
Objects are members of classes, that are similar to structures in that they have fields that represent data items. In addition objects have member functions. C++ compilers treat classes in much the way they treat structs (structs are considered public classes). Data members are stored at predetermined offsets from the beginning of an object. Where are the functions stored? The actual location of the functions is immaterial, but their addresses are stored in a block of memory shared by all of the members of the class. For standard functions this means their address can be inserted in call routines at compile time, but for virtual functions, the address is retrieved from the function address block at runtime. Calls to virtual functions are handle in two steps. First the address is loaded, then the function is called. The diagram below demonstrates this relationship.