'Huge model' Turbo Pascal

Among Windows' many endearing features, the fact that protected mode programs can dynamically allocate megabytes of RAM -- not a mere handful of kilobytes, as under DOS -- is surely one of the most significant. Unfortunately, it's also one of the very few challenges that TPW (Turbo Pascal for Windows) fails to rise to brilliantly: While you can allocate a great deal of memory in total, you can't allocate more than 64K at a time. You can go straight to Windows to allocate large blocks of memory for, say, a bitmap or a large character vector, but TPW itself is still basically a 16-bit compiler without any support for 'huge' model programming. You can refer to the thousandth entry in an array of characters but you can't refer to the hundred thousandth. You can use WINMEM32.DLL to allocate a 'flat' (unsegmented) memory block, but then you have to write all your 32-bit code in TASM or BASM.

Of course, this is largely true of TP6, too, but in real mode we can do segment arithmetic with impunity. In Windows' protected mode environment, 32-bit pointers no longer consist of a 16-bit segment and a 16-bit offset, but of a 16-bit selector and a 16-bit offset. The selector is essentially a pointer into a relatively small array of segment descriptors, so that while any 32-bit value represents a valid address in real mode, the same is not true in protected mode. (In fact, since a selector is a 16-bit pointer to an eight byte descriptor, the odds are 7 to 1 that any given 16-bit value will be an invalid selector. What's more, since all 8192 possible descriptors will probably not be filled in, the odds against an arbitrary 16-bit value mapping to a valid selector are significantly higher than 7 to 1.) While, of course, this hardware pickiness about selector values is what makes invalid pointer bugs so much easier to find under Windows than under DOS, it also complicates random access to huge structures. We can't just increment the segment part of a pointer to slide our 64K window 16 bytes forward; we can only use selectors that correspond to descriptors that Windows has already built.

As it happens, whenever you allocate a block of memory larger than 64K, Windows defines only as many selectors as are necessary to completely 'cover' the block with 64K "tiles". That is, even though byte $10000 within a large structure is logically and (maybe) physically contiguous with byte $FFFF, we have to treat them almost as if they were in two completely different 64K structures: We cannot use one selector to refer to both bytes. Similarly, we have to be careful that all references to multibyte types (like numbers, or records, or the 640 byte arrays in Figure 1) are completely contained within a single segment. Trying to read or write past the end of a segment will cause a UAE (an Unrecoverable Application Error): We have to either make sure no multibyte structures straddle segment boundaries, refer to them byte-at-a-time, or use multiple block moves to move data to and from an intermediate buffer.
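
For instance, with the block's first byte at offset 0, you can tell whether a multibyte element straddles a tile boundary by comparing the tile numbers of its first and last bytes. This little function (a sketch of my own, not part of Listing 1) does just that:

  { True if an element of Size bytes, starting ByteIndex bytes into a block }
  { whose first byte is at offset 0, straddles a 64K "tile" boundary.       }
  function Straddles(ByteIndex: LongInt; Size: Word): Boolean;
  begin
    Straddles := (ByteIndex shr 16) <> ((ByteIndex + Size - 1) shr 16)
  end;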

Working within this rigid framework of 64K peepholes into large blocks of memory complicates and slows down any code that has to deal with huge arrays, but certainly shouldn't deter anyone who really deserves their 'programming license'. What does cause trouble is the "__ahincr double whammy": not only does TPW not provide a Pascal binding for __ahincr, the standard Windows constant that you can add to a pointer's selector to step it 64K forward, but the SDK manuals only talk about how to use __ahincr, not how to obtain it, since that is normally handled by the C runtime library, or by the file MACROS.INC that comes with MASM, not the SDK. Since I use TASM, not MASM, I had to ask around until I found someone who could tell me "Oh, __ahincr is just the offset of a routine you import from the KERNEL DLL just like you'd import any other API function." (Rule Number 1 of Windows programming is Know A Guru: sooner or later, you're bound to run into something that's not in the manuals and that you can't find by trial and error.)
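
For the record, the binding itself can look something like the sketch below. (This is an illustration, not the actual Listing 1 code; the ordinal shown is the one commonly documented for __AHINCR in KERNEL, and importing it by name should work as well.)

  { __ahincr is 'imported' like any other KERNEL routine; the value we }
  { want is simply the offset part of the imported entry's address.    }
  procedure AHIncr; far; external 'KERNEL' index 114;

  function AHIncrement: Word;
  begin
    AHIncrement := Ofs(AHIncr)
  end;

Stepping a pointer one tile forward is then just a matter of adding AHIncrement to its selector.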

As you can see, given __ahincr, the HugeModl unit in Listing 1 is pretty straightforward. The unit supports huge model programming on a variety of levels: It provides enhanced GetMem and FreeMem routines that allow you to allocate blocks larger than $FFFF bytes, and it provides three levels of tools for manipulating large data objects. The lowest level is, of course, a Pascal binding for __ahincr. This is used throughout the unit, and you may find your own uses for it, too. The middle level is a set of functions and macros that can step a pointer by any amount or calculate the offset of any byte within an object, while the top level is a Move routine that can move huge blocks without any apparent concern for the 64K tile boundaries.

The enhanced GetMem and FreeMem routines look exactly the same as the System unit routines they "shadow" except that the block size parameter can be greater than $FFFF. This lets you use the same set of routines for small data as for large data, without having to do anything but put HugeModl in your uses clause(s). GetMem and FreeMem pass 'small' (less than 64K) requests on to the System unit routines, and call GetHuge and FreeHuge to handle the 'large' (at least 64K) requests. (Bear in mind that in "Standard" (286) mode, Windows won't allocate any blocks larger than a megabyte.) The GetHuge routine uses the GlobalAlloc function, which returns a handle to the allocated memory block, and then uses the GlobalLock routine to convert the handle into a pointer. The FreeHuge routine in turn uses the GlobalHandle function to convert the pointer's selector back to a handle, and then uses the GlobalFree call to free the handle. One important thing to note is that FreeHuge (and, transitively, HugeModl.FreeMem) can only free pointers that came from GetHuge/GetMem: You cannot allocate a block and then free some bytes from the end (you'll have to use the Windows GlobalRealloc function for that), nor can you FreeMem a largish typed constant, say, from the middle of your data segment.
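
In outline, the allocation pair might look something like this (a hedged sketch of the approach just described, not the actual Listing 1 routines, which may differ in names and details; the GlobalUnlock is my own precaution, as the text describes FreeHuge only in terms of GlobalHandle and GlobalFree):

  {$X+}
  uses WinTypes, WinProcs;

  function GetHuge(Bytes: LongInt): Pointer;
  var
    H: THandle;
  begin
    H := GlobalAlloc(gmem_Moveable, Bytes);
    if H = 0
      then GetHuge := nil                 { allocation failed }
      else GetHuge := GlobalLock(H)       { handle -> pointer }
  end;

  procedure FreeHuge(P: Pointer);
  var
    H: THandle;
  begin
    { the low word of GlobalHandle's result is the block's handle }
    H := THandle(GlobalHandle(Seg(PChar(P)^)));
    GlobalUnlock(H);                      { undo the GetHuge lock }
    GlobalFree(H)
  end;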

Of course, once you lay claim to the continent, you have to get over the Appalachians! HugeModl supplies three pointer math routines that can take you safely past the 'mountains at the end of the segment': OfsOf, which adds a long offset to a base pointer; PostInc, which steps a pointer by a long and returns the original value; and PreInc, which steps a pointer by a long and returns the new value. All three add the (32-bit) step to the base pointer's (16-bit) offset, using the low word of the result as the new offset, and using the high word to step the selector by the appropriate number of 64K "tiles". (With the {$X+} enhancements, PreInc and PostInc can be used as procedures that modify the pointer argument. PreInc is better for this than PostInc, as it does two fewer memory accesses and is thus a little faster.)
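
By way of illustration, here is roughly what that arithmetic looks like in plain Pascal (a sketch, not the unit's actual inline/assembler code; AHIncr is the KERNEL import sketched earlier, and for simplicity the sketch assumes the step lands at a non-negative linear offset):

  procedure AHIncr; far; external 'KERNEL' index 114;   { as before }

  { The low word of (offset + delta) becomes the new offset; the }
  { high word, times __ahincr, steps the selector tile by tile.  }
  function OfsOf(Base: Pointer; Delta: LongInt): Pointer;
  var
    Sum: LongInt;
  begin
    Sum := Ofs(PChar(Base)^) + Delta;
    OfsOf := Ptr(Seg(PChar(Base)^) + Word(Sum shr 16) * Ofs(AHIncr),
                 Word(Sum))
  end;

  { Step the pointer and return the new value; PostInc differs only }
  { in returning the pointer's original value.                      }
  function PreInc(var P: Pointer; Delta: LongInt): Pointer;
  begin
    P := OfsOf(P, Delta);
    PreInc := P
  end;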

All three pointer math routines are defined both as inline macros and as assembler routines. If the HugeModl unit is compiled with Huge_InLine $define-d, the routines are defined as macros; otherwise, each operation requires a far call. It's probably best to use the macros, because virtually every routine that takes a huge parameter will need to do some pointer calculations and the far call overhead adds up; the flip side is that the same ubiquity also means that using the macros can add quite a lot to your object code's size.

Whether you use OfsOf or PostInc/PreInc is largely a matter of taste, though it is often simpler and/or cheaper to step a pointer (using PostInc/PreInc) by the array element's size than it is to keep multiplying a stepped array index by the element size. In either case, you'll quickly find that there is one major drawback to using huge arrays "by hand" instead of letting the compiler do it all for you transparently: the routines in the HugeModl unit all return untyped pointers, and you'll end up having to use a lot of casts if you don't want to use Move for everything. For example, something like

  for Idx := 1 to ArrayEls do
    SmallArray[Idx] := Idx;

will become

  Ptr := HugeArrayPtr;
  for Idx := 1 to ArrayEls do
    LongPtr(PostInc(Ptr, SizeOf(Long)))^ := Idx;

Of course, untyped pointers are no problem when you do use the Move routine as it, too, is meant to extend the range of the System unit routine it shadows without breaking any existing code. Thus, the first two arguments are untyped var parameters pointing to the data's source and destination, while the third argument is the number of bytes to copy. (Naturally, unlike the System unit's Move routine, HugeModl's can move any number of bytes from 0 to $7FFFFFFF.)
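
For example (a hypothetical fragment; I'm assuming, as the text describes, that the shadowed routines accept an ordinary Pointer variable and a LongInt size):

  uses HugeModl;

  procedure CopyBigBlock;
  const
    Bytes = 200000;                     { comfortably more than one 64K tile }
  var
    Src, Dst: Pointer;
  begin
    GetMem(Src, Bytes);                 { HugeModl's GetMem: size may exceed $FFFF }
    GetMem(Dst, Bytes);
    { ... fill Src ... }
    Move(PChar(Src)^, PChar(Dst)^, Bytes);   { HugeModl's Move: huge counts OK }
    FreeMem(Dst, Bytes);
    FreeMem(Src, Bytes);
  end;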

You may find that reading the code for the huge Move routine will help you to write your own huge model code. It breaks a potentially huge block move, which might span several segments, into a succession of <= 64K moves, each entirely within a single segment. For each submove, it uses the larger of the source and destination offsets to decide how many bytes it can move without walking off the end of the source or destination segments. It then calls a word move routine to move that many bytes, increments the pointers, and decrements the byte count. Since the block move is by words, not bytes, it can easily handle a 64K byte move (when both the source and destination offsets are 0) as a single 32K word move.
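
Here, in plain Pascal, is a sketch of that loop (my own rendering of the logic just described, leaning on the OfsOf sketched earlier rather than the unit's word-moving assembler, and making no attempt to handle overlapping blocks):

  { HugeMove is my name for this sketch; in the unit it is simply Move. }
  procedure HugeMove(var Source, Dest; Count: LongInt);
  var
    S, D: PChar;
    Chunk: LongInt;
  begin
    S := @Source;
    D := @Dest;
    while Count > 0 do
    begin
      { room left before the nearer of the two segment ends }
      if Ofs(S^) > Ofs(D^)
        then Chunk := $10000 - Ofs(S^)
        else Chunk := $10000 - Ofs(D^);
      if Chunk > Count then Chunk := Count;
      { System.Move takes a Word count, so this sketch caps each piece at }
      { 32K; the real routine moves words and can do a full 64K at once.  }
      if Chunk > $8000 then Chunk := $8000;
      Move(S^, D^, Chunk);              { System.Move: fits in one segment }
      S := PChar(OfsOf(S, Chunk));
      D := PChar(OfsOf(D, Chunk));
      Dec(Count, Chunk)
    end
  end;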

Now, while the HugeModl.Move routine can handle structures that straddle segment boundaries, compiler-level pointer expressions like Ptr^ := Fn(Ptr^); cannot. This means that you should only use pointer expressions when you know that they will not cause a UAE by hitting the end of a segment. If the base structure's size is a power of two (words, longs, doubles, &c), you can generally use pointer expressions so long as the array starts an integral number of elements from the high end of the initial segment. That is, an array of words should start on a word boundary (the offset must be even) while an array of doubles should start on a qword boundary (the offset must be a multiple of eight). Since you have to process unaligned data byte-at-a-time or via a buffer you Move data to and from, it may be worth adding blank fields to "short" structures so that their size will be a power of two.
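
For instance, a six byte record can be padded to eight bytes so that, given an aligned start, no element can ever straddle a tile boundary (the names here are invented for the example):

  type
    TSample = record
      Time:  LongInt;    { 4 bytes }
      Level: Integer;    { 2 bytes }
      Pad:   Integer;    { 2 bytes of filler: SizeOf(TSample) is now 8, }
    end;                 { so an element at offset 8*N always fits      }
                         { within a single 64K tile                     }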

Since GetHuge always returns a pointer with an offset of 0, you don't have to worry about the alignment of "simple" (fixed-length, headerless) arrays. However, most variable length arrays will have a fixed length header and a variable length tail, which can leave the tail unaligned. Rather than using the object-oriented technique of Hax #?? [PC Techniques volume 2, #1], where we declare the header as an object and then declare the actual variable length object as a descendant of the header object that adds a large "template" array, we can retain control over our huge tails' alignment by simply putting a tail pointer in the header. Then, when we allocate the object, we can simply allocate as many extra bytes as we might need to insert between the header and the tail to maintain proper alignment. Thus, with an array of longint tail, we would just allocate three extra bytes and say something like TailPtr := PChar(HdrPtr) + ((Ofs(HdrPtr^) + SizeOf(Hdr) + 3) and $FFFC);.
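
Put together, the allocation might look like this (a sketch with invented names, using the GetHuge function as sketched earlier and the same $X+ PChar arithmetic as the expression above; subtracting Ofs(V^) keeps the arithmetic right even if the header doesn't sit at offset 0):

  type
    PLongTail = ^LongInt;        { points at the first LongInt of the tail  }
    PHugeVec  = ^THugeVec;
    THugeVec  = record           { fixed length header...                   }
      Count: LongInt;
      Tail:  PLongTail;          { ...carrying a pointer to the aligned tail }
    end;

  procedure MakeVector(var V: PHugeVec; Els: LongInt);
  begin
    { three bytes of slack let the tail be rounded up to a multiple of four }
    V := GetHuge(SizeOf(THugeVec) + 3 + Els * SizeOf(LongInt));
    V^.Count := Els;
    V^.Tail  := PLongTail(PChar(V) +
                (((Ofs(V^) + SizeOf(THugeVec) + 3) and $FFFC) - Ofs(V^)));
  end;

Once the tail grows past 64K, of course, indexing into it is still a job for OfsOf and friends.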

If you've gotten the impression that writing 'huge model' code under TPW is a lot more work than writing normal, '16-bit' code, you're both right and wrong. Yes, you will be making a lot of 'calls' to PostInc/PreInc, and you will be making a lot of casts of their results, but if you already write code that sweeps arrays by stepping pointers, you will probably find that using PostInc/PreInc makes for fewer source lines, which in turn tends to counter the legibility lost in all the calls and casts. Not to mention that huge model Pascal code is a lot easier to write and read than its 32-bit assembler equivalent!


This article originally appeared in PC Techniques

Copyright © 1992 by Jon Shemitz - jon@midnightbeach.com - html markup 8-25-94
