HLA Language Reference and User Manual

Modification History:

v1.39: Updated document to reflect the new VAL operator (for actual parameters) and Unicode support.

v1.38: Discusses the new VAR section alignment and offset assignment options. Discusses the new union constant syntax. Describes the new #for..#endfor compile-time loops.

v1.37: Updated the discussion of constant expressions to describe the 128-bit arithmetic capabilities of HLA v1.37. Added NULL keyword and a brief discussion of its use. Described the new type transfer compile-time functions (@byte, @uns8, @int8, etc.).

v1.36: Began the modification history for this document. Note that version numbers correspond to HLA version numbers.

Overview

HLA, the High Level Assembler, is a vast improvement over traditional assembly languages. With HLA, programmers can learn assembly language faster than ever before and they can write assembly code faster than ever before. John Levine, comp.compilers moderator, makes the case for HLA when describing the PL/360 machine specific language:

1999/07/11 19:36:51, the moderator wrote:

"There's no reason that assemblers have to have awful syntax. About 30 years ago I used Niklaus Wirth's PL360, which was basically a S/360 assembler with Algol syntax and a little syntactic sugar like while loops that turned into the obvious branches. It really was an assembler, e.g., you had to write out your expressions with explicit assignments of values to registers, but it was nice. Wirth used it to write Algol W, a small fast Algol subset, which was a predecessor to Pascal. ... -John"

PL/360, and variants that followed like PL/M, PL/M-86, and PL/68K, were true "mid-level languages" that let you work down at the machine level while using more modern control structures (i.e., those loosely based on the PL/I language). Although many refer to "C" as a "medium-level language", C truly is high level when compared with languages like PL/*. The PL/* languages were very popular with those who needed the power of assembly language in the early days of the microcomputer revolution. While it's stretching the point to say that PL/M is "really an assembler," the basic idea is sound. There really is no reason that assemblers have to have an awful syntax.

HLA bridges the gap between very low level languages and very high level languages. Unlike the PL/* languages, HLA really is an assembly language. You can do just about anything with HLA that you can do with a traditional assembler like MASM, TASM, NASM, or Gas. If you want to write low-level assembly code using x86 machine instructions, HLA does not get in your way; if you want to use compares and conditional branches rather than structured control statements, you can. On the other hand, if you prefer to use more readable high-level control structures, HLA allows this, as well. HLA lets you work at the level you are most comfortable with and at the level that is most appropriate for the task at hand.

Beyond supplying a "non-awful" syntax, HLA has one other important feature -- it's extensible. HLA provides special features that let you add new statements to the language. So if HLA is not "high level" (or "low level") enough for your tastes, you can extend it. This document will expend considerable effort describing exactly how to do this in a later section.

In addition to the HLA language itself, the HLA system provides one other very important component - the HLA Standard Library. This is a collection of hundreds of functions that you can use to write assembly language programs as quickly and easily as you would write C programs.

What is a "High Level Assembler"?

The name "High Level Assembler" and its abbreviation "HLA" are certainly not new. Nor is the concept of a high level assembler. David Salomon in his 1992 text "Assemblers and Loaders" (Ellis Horwood, ISBN 0-13-052564-2) uses these terms to describe various assembly languages dating back to 1966. Furthermore, both IBM and Motorola have assembler products with very similar names (e.g., IBM's HLAsm, though it's somewhat debatable whether HLAsm is truly a high level assembler).

Salomon offers the following definitions for a High Level Assembler (or HLA):

A high-level assembler language (HLA) is a programming language where each instruction is translated into a few machine instructions. The translator is somewhat more complex than an assembler, but much simpler than a compiler. Such a language should not have features like the if, for, and case control structures, complex arithmetic, logical expressions, and multi-dimensional arrays. It should consist of simple instructions, closely resembling traditional assembler instructions, and of a few simple data types.

Since Salomon describes a couple of high level assemblers that exceed this definition, he offers a second definition for high level assemblers that is a bit higher-level:

A high-level assembler language (HLA) is a language that combines most of the features of higher-level languages (easy to use control structures, variables, scope, data types, block structure) with one important feature of assembler languages namely, machine dependence.

Neither definition is particularly useful for describing HLA/86 and other HLAs like Terse, MASM and TASM. Of course the term "High Level Assembler" is very nebulous and offers a fair amount of latitude. Almost any macro assembler could pass as an HLA on the basis that a macro-instruction expands into a few machine instructions.

David Salomon describes several different high level assemblers in his text. The examples he describes are PL/360, NEAT/3, PL516, and BABBAGE.

PL/360 and PL516 are products that conform to the second definition above. They allow simple arithmetic expressions and assignment statements, the use of high level control structures (if, for, while, etc.), high level data declarations, and block structure (among other things). These languages expose the underlying machine's registers and allow the use of machine instructions using a "functional" syntax.

The NEAT/3 language is a much lower-level language; basically it is an assembly language for the NCR Century computers that provides COBOL-style data declarations. Most of its "instructions" translate one-for-one into Century machine instructions, though it does automatically insert code to convert data types from one format to another if the data types of an instruction's operands are incompatible.

The BABBAGE assembly language is an expression-based assembly language (very similar to Terse). It allows simplified high level control structures like if and while. The interesting thing about this assembler is that it was the only assembler for the GEC4000 family of computers.

In addition to the HLAs that Salomon describes, there have been several other high level assemblers created over the years. PL/M and PL/M-86 were designed by Intel for their 8080 and 8086 CPU families. These were obvious adaptations of the PL/360 style HLA for Intel's CPUs. PL/68 was also available for the Motorola 680x0 family. SL/65 was a similar adaptation of PL/360 for the 6502 family. At one point there was a product named "High Level Assembler" for the Atari ST system (68K based). Jim Neil has also created an expression-based high level assembler (similar in principle to Babbage) for Intel's x86 family. MASM and TASM (for the x86) also fall into the category of a high level assembler due to their inclusion of high level control structures and logical expressions.

So where does HLA/86 fit into these definitions? In truth, the definition of HLA/86 falls somewhere between these two definitions. So the following paragraphs will define the term "High Level Assembler" as it should apply to HLA/86 and similar high level assemblers.

The first definition above is overly restrictive. It implies that any language that exceeds these limits is a high level language, not a high level assembly or traditional assembly language. Obviously, this definition is too restrictive in the sense that by this definition many traditional assemblers would have to be considered as high level languages (even beyond a high level assembler). Furthermore, it elevates many traditional assemblers to the status of an HLA even though we wouldn't normally think of them as high level assemblers; i.e., most macro assemblers provide the ability to create instructions that translate into a few machine instructions. Macro facilities, however, are something we expect out of a modern assembly language; their presence doesn't make the language a "high level" assembly language in most people's mind. Furthermore, most modern assemblers provide a mechanism for declaring multi-dimensional arrays (even though you still have to use some sequence of instructions to index into said arrays).

The second definition David Salomon provides hits the other extreme. Arguably, languages like C could be called HLAs under this definition (yes, there are some machine dependent features in C, though probably not enough to satisfy David Salomon's original intent).

The definition of high level assemblers like Terse, MASM, TASM, and HLA/86 falls somewhere between these extremes. Therefore, this document will define a high level assembler as follows:

A "high level assembly language" (HLAL) is a language that provides a set of statements or instructions that practically map one-to-one to machine instructions of the underlying architecture. The HLAL exposes the underlying machine architecture including access to machine registers, flags, memory, I/O, and addressing modes. Any operation that is possible with a traditional assembler should be possible within the HLAL. In addition to providing access to the underlying architecture, the HLAL must provide some abstractions that are not normally found in traditional assemblers and that are typically found in traditional high level languages; this could include structured control statements (e.g., if, for, and while), high level data types and data structuring facilities, extensive compile-time language facilities, run-time expression evaluation, and standard library support. A "High Level Assembler" is a translator that converts a high level assembly language to machine code.

There is a very important difference between this definition and the ones that David Salomon provides. Specifically, a high-level assembly language must provide access to the underlying machine architecture. Within the HLAL you must be able to specify any (reasonable) machine instruction that is available on the CPU. The HLAL may provide other statements that do not directly map to machine instructions (e.g., an if statement), but it must, at least, provide a set of statements that practically map one-to-one with the machine instructions. The "practically" modifier appears here for two reasons. First of all, some assembly source statements may map to two or more different, but equivalent, machine instructions. A good example is the x86 "mov reg, reg" which can map to two different (though equivalent) opcodes depending on the setting of the direction bit in the opcode. Most assemblers will map the source statement to only one of these opcodes, hence there is not truly a one-to-one mapping (since there exist some opcodes that do not map back to some source instruction). Another allowable restriction is that the HLAL may not allow the use of special "protected mode instructions" if the language is intended only for user-mode programming (as is the case for HLA/86).
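The two equivalent encodings of a register-to-register move can be made concrete with a small sketch. The fragment below is written in Python purely for illustration (it is not HLA code, and the helper name encode_mov_r32_r32 is hypothetical). It assumes the standard IA-32 opcodes 89h and 8Bh for 32-bit moves, which differ only in the direction bit, and builds the ModR/M byte for each form.

```python
# Register numbers used in the x86 ModR/M byte (32-bit general-purpose registers).
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def encode_mov_r32_r32(dest, src, direction):
    """Encode a 32-bit register-to-register move, dest := src.

    direction = 0 selects opcode 0x89 (the reg field holds the source);
    direction = 1 selects opcode 0x8B (the reg field holds the destination).
    This helper is hypothetical and exists only for illustration.
    """
    opcode = 0x89 | (direction << 1)   # bit 1 of the opcode is the direction bit
    if direction == 0:
        reg, rm = REG32[src], REG32[dest]
    else:
        reg, rm = REG32[dest], REG32[src]
    modrm = 0xC0 | (reg << 3) | rm     # mod = 11b: both operands are registers
    return bytes([opcode, modrm])

# Both byte sequences copy EBX into EAX.
form_a = encode_mov_r32_r32("eax", "ebx", direction=0)
form_b = encode_mov_r32_r32("eax", "ebx", direction=1)
print(form_a.hex(), form_b.hex())
```

An assembler that always picks one of these forms can never produce the other from source text, which is why the mapping between source statements and opcodes is only "practically" one-to-one.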

In addition to supporting the underlying machine architecture (which almost any traditional assembler will do), the HLAL must also provide support for some features normally found in a high level language. The definition does not require that a HLAL support all the features listed above, nor is it restricted to just the features listed, but a HLAL must support some of the features traditionally found in a high level language. The number and type of features the HLAL supports determines how "high level" the assembly language is. Like HLLs, we can have "low-level" HLALs, "medium-level" HLALs, "high-level" HLALs, and even "very high-level" HLALs. NEAT/3, for example, would be a low-level HLAL since it provides higher-level data types, conversions, and not much else.

MASM and TASM are probably best considered medium-to-high-level HLALs since they provide high level data structuring facilities, structured control statements, high level procedure definitions and invocations, a limited block structure, powerful compile-time language (macro) facilities, standard library support (e.g., the UCR Standard Library and many other available library modules), and other high level language features. In actual use, the programmer is expected to normally use standard machine instructions and rise up to the high level statements only as necessary.

The Terse language is a good example of a medium level HLAL since it uses an expression syntax but otherwise maps statements fairly closely to the assembly counterparts. It does provide some higher-level data structuring capabilities, though this is inherited from the underlying assembler(s) on which Terse is based.

PL/360 and PL516 are definitely high-level HLALs because they fully support simplified arithmetic expressions, control structures, high-level data types, and other features. These languages provide access to the underlying architecture, but the emphasis is to use these languages as a high level language and drop down to the machine instructions only as necessary.

HLA/86 probably falls in the high-level-to-very-high-level range because it provides high level data types and data structuring abilities, high level and very high level control structures, extensive parameter passing facilities (more than most high level languages), a very extensive compile time language, a very extensive standard library, built-in parsing facilities for language extension, and many other features. As a general rule, HLA/86 has a larger feature set than the other HLALs described above, but there are a couple of design goals that limit the "high-levelness" of HLA/86: (1) with one exception, HLA never emits any code behind the programmer's back that modifies registers or flags (the one exception is object method invocation, and this is well documented), and (2) HLA doesn't support arithmetic expressions (it does support a limited form of logical/boolean expressions). One interesting aspect of HLA/86 is that it is extensible. Using features built into the language, you can extend HLA/86's syntax by adding new statements and other features. This feature gives you the ability to make HLA/86 as high level as you desire (though it may take some effort to achieve certain language features). The bottom line is this: in some ways, HLA/86 is lower level than languages like PL/360 and PL516; in other ways, it's higher level than these HLALs. However, as the definition requires, almost anything you can do with a traditional assembler is possible in HLA/86.

What is an "Assembler"?

Because high level assemblers are clearly different than traditional assemblers, one might question whether a high level assembly language is truly an assembly language and whether translators for high level assembly languages can be properly called assemblers. Unfortunately, there is a considerable range of opinions as to exactly what constitutes an "assembler" versus other translators. This document will not attempt to get involved in this debate. Instead, this section provides a set of definitions that are useful for describing assemblers at various levels of abstraction.

Pure Assembler:

A "pure assembler" is a program that processes an assembly language source file and translates the source code using a direct mapping from source code instructions to individual machine instructions (each source instruction is mapped to exactly one machine instruction). The assembler only provides machine-primitive data types like bytes, words, double words, etc. A pure assembler does not provide macro facilities. A pure assembler always produces machine code as output.

Traditional Assembler:

A "traditional assembler" is a pure assembler plus macro facilities. The assembler may provide some "built-in macros" and instruction synonyms, but in general, the built-in statements should still map to individual machine instructions (note that the programmer may extend this by writing macros). There is no support by the assembler for run-time arithmetic or boolean expressions. A traditional assembler may also provide some simple data typing facilities (such as the ability to rename primitive data types as something else, e.g., byte->char). A traditional assembler always emits machine code as output.

High Level Assembler:

Unlike Traditional and Pure Assemblers, High Level Assemblers (HLAs) do not have to produce machine code as output. If a high level assembler produces machine code directly, then we call the high level assembly translator program an assembler; however, HLAs can also produce an assembly language output file that requires further processing by some other assembler to produce actual machine code; we'll call such translators compilers for a high level assembly language. Note that HLA v1.x (the product, not the classification) is a compiler by this definition. The intent is that HLA v2.0 and later will provide both compiler and assembler versions.

HLA Design Goals

HLA was originally conceived as a tool to teach assembly language programming. In early 1996 I decided to do a Windows version of my electronic text "The Art of Assembly Language Programming" (AoA). After an attempt to develop a new version of the "UCR Standard Library for 80x86 Programmers" (a mainstay of AoA), I came to the conclusion that MASM just wasn't powerful enough to make learning assembly language really easy. I decided to develop an assembler with sufficient power, providing the tools for a good standard library as well as satisfying some other requirements. Therefore, HLA has two important goals: to provide a system that is powerful enough to develop code and macros that make learning assembly language easy, while simultaneously providing a system that is easy for beginners to learn.

The principal goal of HLA was to leverage students' existing programming knowledge. For example, a good Pascal programmer can get their first C/C++ program operational in a few minutes. All they've got to do is note the similarities between the two programming languages, make the appropriate syntactical changes, and they're up and running. Take that same Pascal programmer and expect them to learn LISP or Prolog the same way, and you'll not meet with the same success. LISP and Prolog are completely different; they use a different "programming paradigm," so the student has to "start over from scratch" when learning these languages. Although assembly language is an imperative language (like Pascal and C/C++), there is a considerable "paradigm shift" when moving from one of these high level languages to assembly. In HLA, I wanted to create a language with high level control structures and declarations that made it possible for someone familiar with an imperative language like Pascal or C/C++ to get their first HLA program running in a matter of minutes (or, at worst, a matter of hours). Of course, to achieve this goal, I needed to add high-level data declarations and high-level control constructs to the HLA language.

The astute reader will quickly point out that high level control structures are not assembly language and letting the students use these types of statements is not really teaching them assembly language. This is quite true; since the purpose of teaching an assembly language course is to teach the students "assembly language programming" it is quite clear that HLA would fail if it only provided these high level control structures (e.g., like the PL/M language does). Fortunately, this is not the case. HLA supports all standard assembly language instructions including CMP and Jcc instructions, so you can still write "pure" assembly language programs without using those high level language control structures. However, it does take time to learn the several hundred different machine instructions. Traditionally, it's taken my students (using only MASM) about five weeks before they could really write any meaningful programs in assembly language (you have to cover things like numeric representation, basic CPU architecture, addressing modes, data types, and introduce the instruction set before any real programs can be written).

HLA lets students write meaningful programs within about a week of its introduction (e.g., the first assignment I give in a typical quarter is to write an "addition table" program that computes the outer product [addition table] of the two vectors 0..15 and 0..15, printing the table formatted nicely). They achieve this by using statements they already know (like IF and WHILE) with the injection of just a few assembly language concepts (registers, and the MOV and ADD instructions) plus an introduction to the HLA Standard Library. Over the next several weeks, these students write more and more complex programs as they are introduced to new assembly language and HLA concepts (e.g., data representation, basic architecture, addressing modes, data types, and additional instructions). At about the sixth week, I begin "weaning" these students off the high level language statements and force them to use the low level machine instructions. It turns out that they learn how to simulate an IF statement at roughly the same point in the quarter as they did when they used only MASM, but the big difference is that they've written a lot more code up to that point proving out other concepts in machine organization and assembly language programming. In my limited experience with classroom testing, I've found that students spend less time on the class, cover more material, and retain the knowledge better (by the time of the final exam) than they did when I only used MASM.
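For readers unfamiliar with the "addition table" assignment mentioned above, here is a rough sketch of what such a program computes. It is written in Python rather than HLA, it is not the actual assignment solution, and the function names addition_table and format_table are hypothetical. The nested loops correspond to the nested WHILE loops a student would write, with a single ADD producing each entry.

```python
def addition_table(n=16):
    """Return the n-by-n table whose entry [i][j] is i + j."""
    return [[i + j for j in range(n)] for i in range(n)]

def format_table(table):
    """Render the table with a header row and right-aligned columns."""
    n = len(table)
    # Header row: a '+' in the corner, then the column values 0..n-1.
    lines = ["   +" + "".join(f"{j:3d}" for j in range(n))]
    # One line per row: the row value, a colon, then the sums.
    for i, row in enumerate(table):
        lines.append(f"{i:3d}:" + "".join(f"{value:3d}" for value in row))
    return "\n".join(lines)

table = addition_table()
print(format_table(table))
```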

The general goal of reducing the learning curve for students is achieved several ways.

(1) As noted above, HLA allows a gradual transition from high level languages into pure assembly language. My favorite analogy here is the Nicoderm CQ smoking cessation system ("gradual steps are better."). Like the Nicoderm system, HLA lets students learn assembly language in gradual steps rather than throwing them into the water and shouting "sink or swim!"

(2) In addition to letting the students employ high level language statements in their assembly language programs, HLA contains several other familiar concepts and syntactical items that ease the transition from high level language programming to assembly language. For example, HLA uses the familiar (to C/C++ programmers) "/*" and "*/" comment delimiters (as well as the "//" comment delimiter). Statements generally end with a semicolon (just as in high level languages). Machine instructions use a functional notation rather than "mnemonic-operand" notation. Constant, type, and variable declarations should look very familiar to Pascal programmers. HLA's standard library should look comfortable to anyone who has used the C/C++ standard library.

In addition to syntactical similarities, well-written HLA programs share a similar programming style with modern high level languages. So a student who has learned how to write readable Pascal, C/C++, or Java programs will be able to write readable HLA programs with almost no additional study. Contrast this with the style guide I've written for (MASM) assembly language programmers that is quite a bit different than high level languages and takes a while to master.

Another factor many people don't consider is the evaluation of a programming project. At UCR we are given about 1.5-2 hours per student per quarter of reader (student grader) time to grade projects. Experienced readers who can grade (or want to grade) assembly language projects are few and far between. Most readers get "stuck" with grading the assembly class rather than volunteer for the job. The fact that most student assembly language projects have a horrible programming style and are hard to read only exacerbates this situation. HLA helps solve this problem. Since good HLA programming style is very similar to good C/C++ style, UCR's readers have a much easier time reading the projects and evaluating their programming style. Also, since the students have (presumably) learned good programming style in the prerequisite course(s), they tend to write easier to read HLA programs than MASM programs. This lets me assign more projects without fear of exceeding my reader budget each quarter.

HLA's advantages are easily summed up by a complaint I had from a student once. She said "HLA drives me nuts. It's so similar to C++ that I often get confused and try out something that would work in C++ only to have the HLA compiler reject it." I agreed with this student that this was a bit of a problem, but I also mentioned "what about all the times you've tried something from C++ and it HAS worked?" She thought about it for a moment and walked away agreeing with my assessment of her complaint. Had this student been learning assembly the traditional way, she wouldn't have bothered to try anything. She would have had to spend extra time learning how to achieve what she wanted by reading an assembly text or she would have missed out on the opportunity to actually learn something new. HLA's similarity to C++ encouraged her to try something out on her own. The experiments weren't always successful, but in those cases where they were, she benefited greatly from this. This anecdote, more than any other, sums up what my goals with HLA were and describes the success I believe I have achieved with it.

How to Learn Assembly Programming Using HLA

Of course, a compiler without a language reference manual and tutorial is useless. This document will provide a reference to the HLA programming language. It is not, however, appropriate pedagogy for beginners (it's more suitable for those who already know assembly language programming and wish to learn HLA's syntax). A better text for beginners is "The Art of Assembly Language Programming/Win32 Edition." This provides a complete college level textbook that teaches assembly language programming from the ground up using HLA. You can find a copy of "AoA" on Webster at http://webster.cs.ucr.edu. Webster also contains the latest version of HLA as well as tons of HLA sample source code. That's the first place you should go for information on learning HLA.

Legal Notice

The HLA v1.xx implementation is a prototype intended to test language design and implementation features. I (Randall Hyde) have placed this code and language design in the public domain so others may benefit from this work. However, keep in mind that, as a prototype, HLA is not up to contemporary commercial standards for software quality. It is your responsibility to evaluate whether HLA is suitable for whatever purpose you intend its use.

At any given time there are several known and unknown defects in this software. Some may be corrected in later releases of HLA v1.x, some may never be corrected in the v1.x series. I (Randall Hyde) do not warrant or guarantee this software in any way. In particular, you cannot expect corrections of any given defect in the system. Obviously, I try to fix known problems (if possible), but I refuse to be held legally responsible for such defects in the software.

Note that defects will come in three general varieties: defects that cause the compiler to fail or generate bad code, defects in support code (e.g., the HLA Standard Library or other example code), and defects in the documentation accompanying this product. No guarantee applies to anything in HLA, especially in these three areas.

The purpose of developing a prototype implementation of the HLA language was to try out language design and implementation ideas. The prototype phase of HLA development is rapidly coming to an end and an "official" HLA language design will be forthcoming. HLA v2.0 will implement this new language. The only guarantee I make about compatibility between HLA v1.x and HLA v2.0 is that there will be some incompatibilities. The exact nature and magnitude of those incompatibilities is unknown at this point, but it is safe to assume that no HLA v1.x program will compile under HLA v2.0 without at least some minor source code changes. So please don't get the idea that any investment you make in HLA source code will be protected in v2.0 (note: after the release of v2.0 this is a relatively safe assumption to make, though there will still be no guarantees). The changes in the source language between HLA v1.25 and HLA v1.26 are but a small harbinger of the changes that will occur between v1.x and v2.0.

The HLA Standard Library may also undergo changes between v1.x and v2.0. So expect this to happen and plan accordingly if you intend to port your HLA code to v2.0 eventually.

Because HLA is constantly changing (typical of a prototype), it is very difficult to keep the documentation in phase with the language. You can expect this documentation (and all HLA documentation) to contain omissions (e.g., of new features that have yet to be documented), discussion of features removed from HLA, and incorrect descriptions of HLA features. Every attempt will be made to keep the documentation in phase with the software, but like so many free software projects, lack of time and motivation prevents perfection.

This software is not fit for use in mission-critical or life-support software systems. This software is principally intended for evaluation and educational (i.e., learning assembly language) purposes only. It has been successfully used to develop commercial applications and it has been successfully used in educational environments, but again, you are personally responsible for determining the fitness of this software and documentation for your particular application and you must take responsibility for that choice.

Installing HLA Under Windows

HLA is not a stand alone program. It is a compiler that translates HLA source code into a lower-level assembly language. A separate assembler, such as MASM, then completes the processing of this low-level intermediate code to produce an object code file. Finally, you must link the object code output from the assembler using a linker program. Typically you will link the object code produced by one or more HLA source files with the HLA Standard Library (hlalib.lib) and, possibly, several operating system specific library files (e.g., kernel32.lib under Win32). Most of this activity takes place transparently whenever you ask HLA to compile your HLA source file(s). However, for the whole process to run smoothly, you must have installed HLA and all the support files correctly. This section will discuss how to set up HLA on your system.

First, you will need an HLA distribution for your particular Operating System. Since HLA was originally developed for Win32, these installation instructions will cover installation on a Win32 OS. Please see Webster if you're attempting to install HLA on a different OS (assuming it is available for some OS other than Windows; it was not as this was being written). The latest version of HLA is always available on Webster at http://webster.cs.ucr.edu. You should go there and download the latest version if you do not already possess it.

As noted earlier, HLA is not a stand alone assembler. The HLA package contains the HLA compiler, the HLA Standard Library, and a set of include files for the HLA Standard Library. If you write an HLA program with just this code, HLA will produce an "ASM" file and then stop. To produce an executable file you will need Microsoft's MASM and LINK programs, along with some Win32 library files, to complete the process. The easiest way to get all the files you need is to download the "MASM32" package from http://www.pdq.com.au/home/hutch/masm.htm or any of the other places on the net where you can find the MASM32 package. Once you unzip this file, it's easy to install the MASM32 package using the install program it supplies. You must install MASM32 (or MASM/LINK/Win32 library files) before HLA will function properly.

Here are the steps I went through to install MASM32 on my system:

I downloaded masm32v6.zip from the URL above (later versions are probably okay too, although there is a slight chance that the installation will be different).
I double-clicked on the masm32v6.zip file (which runs WinZip on my system).
I chose to extract "install.exe". I told WinZip to extract this file to C:\.
I double-clicked on the "install.exe" icon and selected the "C:" drive in the window that popped up. Then I hit the install button and waited while MASM32 extracted all the pertinent files. This produced a directory called "MASM32". MASM32 is a powerful assembly language development subsystem in its own right; but it uses the traditional MASM syntax rather than the HLA syntax. So we'll use MASM32 mainly for the assembler, linker, and library files. MASM32 also includes a simple editor/IDE and several other tools that may be useful to an HLA programmer. Feel free to check this software out and see if it is useful to you. For now, note that the executable files you will ultimately need are ML.EXE, ML.ERR, LINK.EXE, and a couple of DLLs. You can find them in the MASM32\BIN subdirectory. Leave them there for the time being. The MASM32\LIB directory also contains many Win32 library files you will need. Again, leave them alone for the time being.
Next, if you haven't already done so, download the HLA executables file from Webster at http://webster.cs.ucr.edu. On Webster you can download several different ZIP files associated with HLA from the HLA download page. The "Executables" file is the only one you'll absolutely need; however, you'll probably want to grab the documentation and examples files as well. If you're curious, or you want some more example code, you can download the source listings to the HLA Standard Library. If you're really curious (or masochistic), you can download the HLA compiler source listings too (this is not for casual browsing!).
I downloaded the HLA1_25.zip file while writing this. Most likely, there is a much later version available as you're reading this. Be sure to get the latest version. I chose to download this file to my "C:\" root directory.
After downloading HLA1_25.zip to my C: drive, I double-clicked on the icon to run WinZip. I selected "Extract" and told WinZip to extract all the files to my C:\ directory. This created an "HLA" subdirectory in my root on C: with two subdirectories (include and hlalib) and two EXE files (HLA.EXE and HLAPARSE.EXE). The HLA program is a "shell" program that runs the HLA compiler (HLAPARSE.EXE), MASM (ML.EXE), the linker (LINK.EXE), and other programs. You can think of HLA.EXE as the "HLA Compiler".
Next, I created the following text file and named it "IHLA.BAT" (note that you may need to change the default drive letters if you want to install HLA on a drive other than "C:"):

path=c:\hla;c:\masm32\bin;%path%

set lib=c:\masm32\lib;c:\hla\hlalib;%lib%

set include=c:\hla\include;c:\masm32\include;%include%

set hlainc=c:\hla\include

set hlalib=c:\hla\hlalib\hlalib.lib

Be sure you've typed all the lines exactly as written or HLA will fail to run properly. You may use any reasonable TEXT editor (e.g., NOTEPAD.EXE) to create this file. Do not use a word processing program (since they generally don't save their data as a TEXT file). Be sure the file is named "IHLA.BAT" and not "IHLA.BAT.TXT" or some other variation.
This batch file tells the system where to find all the files you will need when running HLA. Advanced Win32 users should note that you can set all these environment variables up inside the Windows system control panel in the "Advanced->Environment Variables" area. This is far more convenient (ultimately) than using this batch file (for reasons you'll soon see). However, you can mess up your system if you don't know what you're doing when playing with the system control panel, so only advanced users who've done this stuff before should attempt this.
HLA is a Win32 Console Window program. To run HLA you must open up a console Window. Under Windows 2000, Microsoft has hidden this away in Start->Programs->Accessories->Command Prompt. You might find it in another location. You can also start the command prompt processor by selecting Start->Run and entering "cmd".
Once you've got the command prompt, ("C:>" or something similar), execute the IHLA.BAT file you've created by typing "IHLA" at the command line prompt. Hit the ENTER key to execute the command.
At this point, HLA should be properly installed and ready to run. Try typing "HLA /?" at the command line prompt and verify that you get the HLA help message. If not, go back and figure out what you've done wrong up to this point (it doesn't hurt to start over from the beginning if you're lost).
Thus far, you've verified that HLA.EXE is operational. Now try the following command: "ML /?" This should run the Microsoft Macro Assembler (MASM) and display the help screen. You can ignore the information that appears; you will probably never need to know this stuff.
Next, let's verify the correct operation of the linker. Type "link /?" and verify that the linker program runs. Again, you can ignore the help screen that appears. You won't need to know about this stuff.
Now it's time to try your hand at writing an honest to goodness HLA program and verify that the whole system is working. Here's the canonical "Hello World" program written in HLA. Enter it into a text editor and save it using the filename "HW.HLA":

program HelloWorld;

#include( "stdlib.hhf" )

begin HelloWorld;

stdout.put( "Hello, World of Assembly Language", nl );

end HelloWorld;

Make sure you're in the same directory containing the HW.HLA file and type the following command at the "C:>" prompt: "HLA -v HW". The "-v" option tells HLA to produce VERBOSE output during compilation. This is helpful for determining what went wrong if the system fails somewhere along the line. This command should produce the following output:

HLA (High Level Assembler)

Version Version 1.25 build 2933 (prototype)

Files:

1: hw.hla

Compiling "hw.hla" to "hw.asm"

Assembling hw.asm via "ml /c /coff /Cp hw.asm"

Microsoft (R) Macro Assembler Version 6.14.8444

Assembling: hw.asm

Linking via "link -subsystem:console /heap:0x1000000,0x1000000 /stack:0x1000000,0x1000000 /BASE:0x3000000 /machine:IX86 -entry:?HLAMain @hw.link -out:hw.exe kernel32.lib user32.lib c:\hla\hlalib\hlalib.lib hw.obj"

Microsoft (R) Incremental Linker Version 5.12.8078

/section:.text,ER

/section:readonly,R

/section:.edata,R

/section:.data,RW

/section:.bss,RW

If you get all of this output, you're in business. One thing to remember is that unless you set the environment variables permanently in the System control panel, you will have to run the IHLA.BAT file every time you open up a new command prompt window. Since this is a pain, here are some instructions I've taken from the Internet that describe how to set up the environment variables (DO THIS AT YOUR OWN RISK!):

1) Open System Properties (Winkey-Break is a convenient shortcut) and go to Advanced tab, then Environment Variables. Add "c:\hla" to the Path in SYSTEM VARIABLES, not in "User variables for <your win2k login name>". Click OK, but keep the Environment Variables window open, we're not done.

2) Look at the contents of ihla.bat (ABOVE):

3) In "User Variables for <your login name>", you must end up with each of these settings. For example, to create hlainc, you click the "New..." button, type "hlainc" as the name of the variable, and type "c:\hla\include" as the Variable value (all without quotes of course). If there is already a path set, and it already has some value, add this immediately to the end: ";c:\hla;%path%" and that will preserve your existing User and System paths as well as adding c:\hla.

For example, suppose you opened up your User Variables for <login name> and it already said "C:\Private Files\PantiePix;c:\winnt\system32;c:\winnt;c:\winnt\System32\Wbem;d:\lcc\bin;D:\PROGRA~1\ULTRAE~1;D:\4NT300;C:\msoffice\Office;c:/hla",

you would click on Edit and type "C:\Private Files\PantiePix;c:\hla;%path%"

(Same advice for preserving existing lib and include settings)

4) Once you reboot the computer, you should be all set for "Hello world of assembly language"! (without having to run the IHLA.BAT file.)
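
Whichever way you set the variables (IHLA.BAT or the control panel), a quick sanity check can save some head-scratching. The following Python sketch is not part of the HLA distribution; the function name missing_hla_settings, the choice of Python, and the default c:\hla install directory are all assumptions made for illustration. It only reports which of the settings from IHLA.BAT are absent from a given set of environment variables:

```python
# The variables that IHLA.BAT sets, in addition to the PATH entry.
REQUIRED_VARS = ("hlainc", "hlalib", "lib", "include")

def missing_hla_settings(env, hla_dir=r"c:\hla"):
    """Return a list naming the HLA settings that are absent from env.

    env is any mapping of variable names to values (os.environ, for
    instance). hla_dir is an assumption: it mirrors the C:\\HLA layout
    used in this section, so change it if you installed HLA elsewhere.
    """
    # Windows ignores case in variable names and paths, so compare
    # everything in lower case.
    lowered = {name.lower(): value for name, value in env.items()}
    problems = [name for name in REQUIRED_VARS if not lowered.get(name)]
    path_dirs = [d.strip().lower().rstrip("\\")
                 for d in lowered.get("path", "").split(";")]
    if hla_dir.lower().rstrip("\\") not in path_dirs:
        problems.append("path (no entry for " + hla_dir + ")")
    return problems
```

Calling missing_hla_settings(os.environ) (after "import os") from a freshly opened command prompt should return an empty list if the settings took effect. Note that this only checks that the variables exist; it does not check that the directories they name actually contain the files HLA needs.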

Installing HLA is a complex and slightly involved process. Unfortunately, this is necessary because I don't have the rights to distribute MASM, LINK, and other Microsoft files. Fortunately, HUTCH has collected all of these files together so they are easy to download. If you are concerned about possible legal issues with the download, you may legally download MASM and LINK from Microsoft's site. A link on Webster (at the URL above) describes how to do this. At the time this was being written, work was progressing on HLA to produce TASM compatible output and plans were in the works to produce NASM and Gas versions as well. However, you will still have to obtain the Microsoft library files from some source if you intend to produce a Win32 application. Versions of HLA may appear for other Operating Systems as well. Check out Webster to see if any progress has been made in this direction.

The two most common problems people have running HLA involve the location of the Win32 library files and the choice of linker. During the linking phase, HLA (well, link.exe actually) requires the kernel32.lib, user32.lib, and gdi32.lib library files. These must be present in the pathname(s) specified by the LIB environment variable. If, during the linker phase, HLA complains about missing object modules, make sure that the LIB path specifies the directory containing these files. If you're a MS VC++ user, installation of VC++ should have set up the LIB path for you. If not, then locate these files (they are part of the MASM32 distribution) and copy them to the HLA\HLALIB directory (note that the ihla.bat file includes c:\hla\hlalib as part of the LIB path).

Another common problem with running HLA is the use of the wrong link.exe program. Microsoft has distributed several different versions of link.exe; in particular, there are 16-bit linkers and 32-bit linkers. You must use a 32-bit segmented linker with HLA. If you get complaints about "stack size exceeded" or other errors during the linker phase, this is a good indication that you're using a 16-bit version of the linker. Obtain and use a 32-bit version and things will work. Don't forget that the 32-bit linker must appear in the execution path (specified by the PATH environment variable) before the 16-bit linker.
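
If you suspect the wrong linker is being picked up, it helps to see every copy of link.exe on the execution path, in the order the system searches them. Here is a minimal Python sketch of that search. It is an illustration only: the function name find_all_on_path and its is_file parameter (which exists so the lookup can be exercised without touching the disk) are assumptions of this example, not anything HLA provides.

```python
import os

def find_all_on_path(program, path_value, sep=os.pathsep,
                     is_file=os.path.isfile):
    """Return every directory in path_value that holds program.

    The directories come back in search order, so the first entry is
    the copy that actually runs when you type the program's name.
    """
    hits = []
    for directory in path_value.split(sep):
        # Empty entries (from ";;" or a trailing separator) are skipped.
        if directory and is_file(os.path.join(directory, program)):
            hits.append(directory)
    return hits
```

For example, find_all_on_path("link.exe", os.environ["PATH"]) lists the candidate directories. If a 16-bit linker's directory shows up ahead of the MASM32 one, reorder the PATH so the 32-bit linker comes first.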

Installing HLA Under Linux

HLA is not a stand alone program. It is a compiler that translates HLA source code into a lower-level assembly language. A separate assembler, such as Gas (as), then completes the processing of this low-level intermediate code to produce an object code file. Finally, you must link the object code output from the assembler using a linker program. Typically you will link the object code produced by one or more HLA source files with the HLA Standard Library (hlalib.a). Most of this activity takes place transparently whenever you ask HLA to compile your HLA source file(s). However, for the whole process to run smoothly, you must have installed HLA and all the support files correctly. This section will discuss how to set up HLA on your system.

First, you will need an HLA distribution for Linux. Please see Webster or the previous section if you're attempting to install HLA on a different OS such as Windows. The latest version of HLA is always available on Webster at http://webster.cs.ucr.edu. You should go there and download the latest version if you do not already possess it.

Here are the steps I went through to install HLA on my Linux system:

First, if you haven't already done so, download the HLA executables file from Webster at http://webster.cs.ucr.edu. On Webster you can download several different ZIP files associated with HLA from the HLA download page. The "Linux Executables" file is the only one you'll absolutely need; however, you'll probably want to grab the documentation and examples files as well. If you're curious, or you want some more example code, you can download the source listings to the HLA Standard Library. If you're really curious (or masochistic), you can download the HLA compiler source listings too (this is not for casual browsing!).
I downloaded the HLA1_32.tar.gz file while writing this. Most likely, there is a much later version available as you're reading this. Be sure to get the latest version. I chose to download this file to my "/usr/hla" directory; you can put the file wherever you like, though this documentation assumes that all HLA files wind up in the "/usr/hla/..." directory tree.
After downloading HLA1_32.tar.gz to my "/usr/hla" directory, I changed into the "/usr/hla" subdirectory (via cd) and executed the following shell command: "gzip -d HLA1_32.tar.gz". Once decompression was complete, I extracted the individual files using the command "tar xvf HLA1_32.tar". This extracted a couple of executable files ("hla" and "hlaparse") along with two subdirectories (include and hlalib). The hla program is a "shell" program that runs the HLA compiler (hlaparse), the Gas assembler (as), the linker (ld), and other programs. You can think of hla as the "HLA Compiler". It would be a really good idea, at this point, to set the permissions on "hla" and "hlaparse" so that everyone can read and execute them. You should also set read and execute permissions on the two subdirectories and read permissions on all the files within the directories (if this isn't the default state). Do a "man chmod" from the Linux command-line if you don't know how to change permissions.
Next (logged in as a plain user rather than root or the super-user), I edited the ".bashrc" file in my home directory ("/home/rhyde" in my particular case; this will probably be different for you). I found the line that defined the "PATH" variable; it originally looked like this on my system:
"PATH=$DBROOT/bin:$DBROOT/pgm:$PATH"
I edited this line to add the path to the HLA directory, producing the following:
"PATH=$DBROOT/bin:$DBROOT/pgm:/usr/hla:$PATH"
Without this modification, Linux will probably not find HLA when you attempt to execute it unless you type a full path (e.g., "/usr/hla/hla") when running the program. Since this is a pain, you'll definitely want to add "/usr/hla" to your path.
Next, I added the following four lines to ".bashrc" (note that Linux filenames beginning with a period don't normally show up in directory listings unless you supply the "-a" option to ls):
hlalib=/usr/hla/hlalib/hlalib.a
export hlalib
hlainc=/usr/hla/include
export hlainc
These four lines define (and export) environment variables that HLA needs during compilation. Without these environment variables, HLA will probably complain about not being able to find include files, or the linker (ld) will complain about strange undefined symbols when you attempt to compile your programs.

After saving the ".bashrc" file, you can tell Linux to make the changes to the system by using the command:

source .bashrc

Note: this discussion only applies to users who run the BASH shell. If you are using a different shell (like the C-Shell or the Korn Shell), then the directions for setting the path and environment variables differ slightly. Please see the documentation for your particular shell if you don't know how to do this.
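
If you unpacked HLA somewhere other than "/usr/hla", the same ".bashrc" additions apply with the directory changed. The Python sketch below just spells out that substitution. It is an illustration under stated assumptions: the function name bashrc_lines is made up for this example, and the PATH line it produces simply puts the HLA directory in front of the existing path rather than reproducing the particular PATH line from my system.

```python
def bashrc_lines(hla_dir="/usr/hla"):
    """Return .bashrc lines for an HLA tree rooted at hla_dir.

    The hlalib/hlainc lines follow the ones shown in this section; the
    PATH line is an assumption (HLA directory first, then the old path).
    """
    hla_dir = hla_dir.rstrip("/")
    return [
        "PATH=" + hla_dir + ":$PATH",
        "hlalib=" + hla_dir + "/hlalib/hlalib.a",
        "export hlalib",
        "hlainc=" + hla_dir + "/include",
        "export hlainc",
    ]
```

Printing the result of bashrc_lines("/opt/hla"), one entry per line, gives text you could paste into ".bashrc" for an installation under "/opt/hla" (a hypothetical location).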
At this point, HLA should be properly installed and ready to run. Try typing "hla -?" at the command line prompt and verify that you get the HLA help message. If not, go back and figure out what you've done wrong up to this point (it doesn't hurt to start over from the beginning if you're lost).
Now it's time to try your hand at writing an honest to goodness HLA program and verify that the whole system is working. Here's the canonical "Hello World" program written in HLA. Enter it into a text editor and save it using the filename "hw.hla":

program HelloWorld;

#include( "stdlib.hhf" )

begin HelloWorld;

stdout.put( "Hello, World of Assembly Language", nl );

end HelloWorld;

Make sure you're in the same directory containing the "hw.hla" file and type the following command at the prompt: "hla -v hw". The "-v" option tells HLA to produce VERBOSE output during compilation. This is helpful for determining what went wrong if the system fails somewhere along the line. This command should produce output similar to the following (this particular listing was captured while compiling a file named "t.hla" with the -test option active, so your output will show "hw" in place of "t"):

HLA (High Level Assembler) Parser

Version Version 1.32 build 4895 (prototype)

-t active

File: t.hla

Compiling "t.hla" to "t.asm"

HLA (High Level Assembler)

Version Version 1.32 build 4895 (prototype)

ELF output

Using GAS assembler

GAS output

-test active

Files:

1: t.hla

Compiling 't.hla' to 't.asm'

using command line [hlaparse -v -sg -test "t.hla"]

Assembling "t.asm" via [as -o t.o "t.asm"]

Linking via [ld -o "t" "t.o" "/usr/hla/hlalib/hlalib.a"]

Installing HLA is a complex and slightly involved process; though take heart, it's a lot simpler to install HLA under Linux than Windows! (See the previous section if you need proof.) Versions of HLA may appear for other Operating Systems (beyond Windows and Linux) as well. Check out Webster to see if any progress has been made in this direction. Note one unique thing about HLA: carefully written (console) applications will compile and run on all supported operating systems without change. This is unheard of for assembly language! So if you are using multiple operating systems supported by HLA, you'll probably want to download files for all supported OSes.

Using the HLA Command-Line Compiler

Once you've installed HLA and verified that it is operational, you can run the HLA compiler. The HLA compiler consists of two executables. The first is hla(.exe), a shell that processes command line arguments, compiles ".hla" files to ".asm" files, assembles the ".asm" files by calling an assembler, and links the resulting files together using a linker program. The second is hlaparse(.exe), which compiles a single ".hla" file to an assembly language file. Generally, you would only run hla(.exe). The hla(.exe) program automatically runs the hlaparse(.exe) and assembler/linker programs. The hla(.exe) command uses the following syntax:

hla optional_command_line_parameters Filename_list

The filename list consists of one or more unambiguous filenames having the extension ".hla", ".asm", or ".obj"/".o". HLA will first run the hlaparse(.exe) program on all files with the HLA extension (producing files with the same basename and an ASM extension). Then HLA runs the assembler on all files with the ".asm" extension (including the files produced by hlaparse). Finally, HLA runs the linker to combine all the object files together (including the ".obj"/".o" files the assembler produces). The ultimate result, assuming there were no errors along the way, is an executable file (with an EXE extension under Windows, with no extension under Linux).
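
To make the three phases concrete, here is a small Python sketch that sorts a command-line filename list the way the paragraph above describes. It is only a model of the described behavior, not HLA's actual code: the function name plan_build and the obj_ext parameter are invented for this example, matching extensions without regard to case is an assumption, and filenames with any other extension are simply left out.

```python
import os

def plan_build(filenames, obj_ext=".obj"):
    """Sort filenames into (to_compile, to_assemble, to_link) lists.

    obj_ext is the object-file extension the assembler produces:
    ".obj" under Windows, ".o" under Linux.
    """
    to_compile, to_assemble, to_link = [], [], []
    for name in filenames:
        base, ext = os.path.splitext(name)
        ext = ext.lower()  # assumption: extension case does not matter
        if ext == ".hla":
            # hlaparse turns base.hla into base.asm ...
            to_compile.append(name)
            name, ext = base + ".asm", ".asm"
        if ext == ".asm":
            # ... the assembler turns base.asm into an object file ...
            to_assemble.append(name)
            name, ext = base + obj_ext, obj_ext
        if ext in (".obj", ".o"):
            # ... and every object file is handed to the linker.
            to_link.append(name)
    return to_compile, to_assemble, to_link
```

For instance, plan_build(["hw.hla", "util.asm", "lib.obj"]) puts "hw.hla" in the compile list, "hw.asm" and "util.asm" in the assemble list, and "hw.obj", "util.obj", and "lib.obj" in the link list.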

HLA supports the following command line parameters (this output is from the Linux version, the Windows display is slightly different even though the command line options are the same):

options:

  -@         Do not generate linker response file.
  -axxxxx    Pass xxxxx as command line parameter to assembler.
  -c         Compile and assemble to .OBJ files only.
  -dxx       Define VAL symbol xx to have type BOOLEAN and value TRUE.
  -e name    Executable output filename.
  -gas       Assemble using Gas (default - Linux)
  -lxxxxx    Pass xxxxx as command line parameter to linker.
  -m         Create a map file during link
  -masm      Assemble using MASM (default - windows).
  -o:omf     Produce OMF files (default - tasm).
  -o:win32   Produce win32 COFF files.
             Note: default OBJ format depends on OS and assembler.
             Win32/TASM = OMF, Win32/MASM = COFF.
  -s         Compile to .ASM files only.
  -sm        Compile to MASM files only (default for -s).
  -st        Compile to TASM files only.
  -sg        Compile to Gas files only.
  -sym       Dump symbol table after compile.
  -tasm      Assemble using TASM.
  -test      Send diagnostic info to stdout rather than stderr (This
             option is intended for HLA test/debug purposes).
  -v         Verbose compile.
  -w         Compile as windows app (default is console app).
  -?         Display this help message.

Note that HLA ignores case when processing command line parameters (unlike typical Linux programs). Hence, "-s" is equivalent to "-S" (for example) when specifying a command line parameter.

Under Windows, HLA always produces a "linker response file" that it supplies to the Microsoft LINK.EXE program during the link phase. This linker response file contains necessary segment declarations and other vital linker information. HLA overwrites any existing ".LINK" file whenever you run the compiler. The "-@" option tells HLA not to create a new ".LINK" file, but to use the existing one. Use this option if you edit the ".LINK" file to change default parameters or add linker options and you want HLA to use the edited linker response file rather than create a new one. If you specify multiple ".HLA" filenames on the command line, HLA only generates a single ".LINK" file using the name of the first ".HLA" file it encounters. Linux's ld program does not require this linker response file, so the Linux version of HLA does not produce this file.

The -aXXXXX option lets you pass assembler-specific command line options to the assembler during the assembler phase. This option is ignored if you use one of the -s options.

The -c option tells HLA to run the hlaparse compiler and the assembler, producing ".obj"/".o" files. HLA will process all filenames on the command line that have ".hla" or ".asm" extension, but it will ignore any filenames with ".obj" extensions. If you compile an HLA unit without compiling an HLA program at the same time, you will need to use this option or the linker will complain about not finding the main program. You may specify the ".obj"/".o" file format using the COFF or OMF command line options (MASM only; TASM always produces OMF files and Gas always produces ELF files).

The -dXXXXX option tells HLA to define the symbol XXXXX as a boolean VAL constant and initialize it with the value TRUE. Generally you use such symbols to control the emission of code during assembly using statements like "#if( @defined( XXXXX )) ..."

By default, HLA creates an executable filename using the extension ".exe" (Windows) or without an extension (Linux) and the basename of the first filename on the command line. You can use the -e name option to specify a different executable file name.
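
The naming rule is simple enough to state as a few lines of Python. This is a sketch of the rule as described, not HLA's implementation: the function name executable_name and its parameters are assumptions of this example. It follows this manual's own example, in which "HLA -e x t.hla" produces "x.exe", and it does not try to model what happens if the name given to -e already carries an extension.

```python
import os

def executable_name(filenames, e_name=None, windows=True):
    """Return the executable filename for a command-line file list.

    e_name is the name given with the -e option, or None if -e was
    not used, in which case the first filename supplies the basename.
    """
    if e_name is not None:
        base = e_name
    else:
        base = os.path.splitext(filenames[0])[0]
    # ".exe" under Windows, no extension under Linux.
    return base + ".exe" if windows else base
```

So executable_name(["hw.hla", "util.hla"]) gives "hw.exe", while the same list with windows=False gives "hw".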

The -lXXXXX option passes the text XXXXX on to the linker as a command line option.

The -m option tells the Microsoft linker to produce a map file during the link phase. This is equivalent to the "-lmap" option. The Linux version of HLA ignores this option.

The -MASM option tells HLA to assemble the output using the MASM assembler. Note that this overrides any earlier -s option. If you use this option, you should not also specify the -sg or -st options. You should not use this option under Linux unless you've, somehow, got MASM running there (e.g., via DOSEMU).

The -o:omf option tells the underlying assembler (MASM, generally) to produce an Object Module Format (OMF) OBJ file. This option is generally applicable only to MASM since TASM always produces OMF files. This option is not legal when using the Gas assembler.

The -o:win32 option instructs the assembler to generate a COFF OBJ file. This option is the default for MASM and may not be available for other assemblers.

The -s option tells the HLA program to run only the hlaparse compiler; HLA will not run an assembler or linker. As a result, HLA ignores any ".asm" or ".obj" filenames you supply on the command line. This option is useful if you wish to view the output of an HLA compilation without producing any actual object code. Note that this option overrides any -MASM or -TASM options appearing earlier on the command line. Similarly, the -MASM or -TASM options override the -s option if they appear after -s on the command line.

The -st option tells HLA to produce TASM-compatible assembly and stop after compilation. If you also want to assemble the code using Borland's Turbo Assembler, you must specify the -tasm command line option after the -st option.

The -sm option tells HLA to produce MASM-compatible assembly and stop after compilation. This is (currently) equivalent to the -s option under Windows. You can force HLA to go on and assemble the output by specifying the -masm option after -sm on the command line.

The -sg option tells HLA to produce Gas-compatible assembly and stop after compilation. This is (currently) equivalent to the -s option under Linux. You can force HLA to go on and assemble the output by specifying the -gas option after -sg on the command line.

The -sym option dumps the symbol table after compiling each file with an HLA extension. This option is primarily intended for testing and debugging the HLA compiler; however, this information can be useful to the HLA programmer on occasion.

The -tasm option tells the compiler to run Borland's TASM32.EXE (V5.0) assembler after compiling the source file. The -st option should appear on the command line prior to this option.

The -test option is intended for hlaparse testing and debugging purposes only. It causes the compiler to send all error messages to the standard output device rather than the standard error device. This allows the test code to redirect all errors to a text file for comparison against other files.

The -v option (verbose) causes HLA to print additional information during compile to show the progress of the compilation. Due to a bug in MASM, if you do not specify the -v option the compilation isn't completely quiet. MASM will still output data to the standard error device even in quiet (non-verbose) mode.

The -w option informs HLA that you are compiling a standard Windows (GUI) application rather than a console application. By default, HLA assumes that you are compiling an executable that will run from the command window. If you want to write a full Windows application, you will need to supply this option to tell HLA not to link the code for console operation. Obviously, this option doesn't apply to Linux systems.

The -? option causes HLA to dump the list of command line options and immediately quit without further work.

Note that the command line options this document describes are for HLA v1.26 and later only. Earlier versions of HLA used a different command line set. See the documentation for the specific version you're using if you have questions.

Manually Assembling and Linking HLA Output Under Windows

Warning: The material in this section is somewhat advanced. If this is your first exposure to HLA, you will probably want to skip this material. This information is generally not needed when writing standard HLA applications; only advanced programmers in some very special circumstances will need this information. This complexity applies mainly to the Windows OS; under Linux, HLA works as you would expect it to, no special tricks are needed to link object modules produced by HLA.

The HLA compiler physically consists of two executable files: HLA.EXE and HLAPARSE.EXE. The HLAPARSE.EXE program is the actual HLA compiler. This file accepts a single HLA source file and compiles it to an .ASM file. The HLA.EXE program is the "user interface" to the compiler. This file processes the command line parameters and calls HLAPARSE.EXE, ML.EXE, and LINK.EXE, as appropriate, to process the user's source and object files. You can view the individual steps during compilation by specifying the "-v" (for verbose) command line option when running HLA, e.g.,

c:>hla -v t.hla

HLA (High Level Assembler)
Copyright 1999, by Randall Hyde, all rights reserved.
Version Version 1.21 build 2254 (prototype)

Files:
1: t.hla

Compiling "t.hla" to "t.asm"

Assembling t.asm via "ml /c /coff /Cp t.asm"

Assembling: t.asm

Linking via
"link -subsystem:console \
/heap:0x1000000,0x1000000 \
/stack:0x1000000,0x1000000 \
/BASE:0x3000000 \
/machine:IX86 \
/section:cseg,ER \
/section:dseg,RWS \
/section:bssseg,RWS \
/section:readonly,RS \
/section:strings,RS \
-entry:?HLAMain \
-out:t.exe \
kernel32.lib \
c:\hla\hlalib\hlalib.lib \
t.obj"

As you can see from this output, HLA fills in a lot of details for you during compilation.

By default, HLA compiles win32 console applications. This default was chosen because HLA is an instructional tool and most students will write console applications in assembly language using HLA. While this is a good default for compilation, many programmers will want to create GUI applications or otherwise change the default compilation configuration. This section discusses how to achieve that.

There are four HLA command-line options that let you interrupt the normal compilation process. They are "-s", "-c", "-w", and "-o".

The "-w" Option

The "-w" option tells HLA to invoke the linker using the command line option

-subsystem:windows

rather than the default

-subsystem:console

This provides a convenient mechanism for those who wish to create win32 GUI applications. Most likely, however, if you wish to create GUI applications, you will run the linker explicitly yourself (as this document will explain), so you'll probably not use the "-w" option very frequently. It's great for some short GUI demos, but larger GUI programs will probably not use this option. This option is only active if HLA compiles the program to an executable. If you compile the program to an OBJ or ASM file, HLA ignores this option.

The "-e" Option

The "-e" option uses the name following the "-e" as the executable filename that HLA produces. E.g., "HLA -e x t.hla" compiles "t" to "x.exe". As with the "-w" option, this option is mildly convenient for short projects (and console applications), but serious users will probably not use this HLA command line option since they will likely specify the executable filename as a linker command line option. The "-e" option is only active if HLA actually compiles the program to an executable. If you compile the program to an OBJ or ASM file, HLA ignores this option.

The "-o:omf" and "-o:win32" Options

The -o:omf option tells HLA to produce an Object Module Format OBJ file rather than a COFF (Common Object File Format) OBJ file. OMF files are required by some non-Microsoft languages. The -o:win32 option tells HLA to compile and assemble the files to the standard COFF file format (this is the default condition).

The "-s" Option

The "-s" (s=source) option tells HLA to compile the HLA source file to an ASM source file and then stop. The HLA.EXE program does not run MASM or the linker programs after compiling the HLA source file. This option is primarily used by those who wish to inspect the HLA compiler output (perhaps to verify correctness of the HLA compiler or just to see what HLA is doing). This document will not consider this option any further.

The "-sm" Option

As above, but this option explicitly tells HLA to create a MASM-compatible .ASM file.

The "-st" Option

This option is also an "assembly output only" option, but it instructs HLA to produce a file that TASM 5.0 can assemble. This option exists primarily for linking HLA code with code produced by Borland's Delphi.

Assembler Selection Options

By default, HLA uses the MASM assembler (ML.EXE) to process the assembly output it produces. You can use the "-masm" command line option to explicitly request the use of MASM. You may also use the "-tasm" directive to explicitly request the use of the TASM5 assembler. If you specify the use of a different assembler, you should also specify the appropriate "-sX" option (before the assembler choice) to obtain the correct source code output format. Note that "-masm" and "-tasm" override the "-sX" option insofar as producing source only. That is, if you specify an assembler, then HLA will run the assembler to produce an OBJ file unless you specify an "-sX" option after the assembler choice on the command line.
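
Because the order of these options matters, it can help to trace a command line by hand. The Python sketch below models the "later option wins" rule described above. Treat it as a reading aid built on assumptions rather than a description of HLA's internals: the function name resolve_options is invented, the plain "-s" option is assumed to leave a previously selected output syntax alone, and every option other than the ones listed is ignored.

```python
def resolve_options(options, default_assembler="masm"):
    """Apply -s/-sX and assembler options left to right.

    Returns (syntax, assembler, stop_after_compile): the dialect of the
    generated .ASM file, the assembler HLA would run, and whether HLA
    stops after producing the .ASM file.
    """
    syntax = default_assembler
    assembler = default_assembler
    stop_after_compile = False
    for opt in options:
        opt = opt.lower().lstrip("-")  # HLA ignores case in options
        if opt == "s":
            stop_after_compile = True
        elif opt in ("sm", "st", "sg"):
            syntax = {"sm": "masm", "st": "tasm", "sg": "gas"}[opt]
            stop_after_compile = True
        elif opt in ("masm", "tasm", "gas"):
            # Choosing an assembler cancels an earlier source-only request.
            assembler = opt
            stop_after_compile = False
    return syntax, assembler, stop_after_compile
```

With this model, ["-st", "-tasm"] yields TASM syntax assembled by TASM, whereas ["-tasm"] alone yields MASM syntax handed to TASM, which is exactly the mismatch the "-sX before the assembler choice" advice is meant to avoid.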

The "-c" Option

The "-c" option (compile and assemble only) tells HLA to compile the HLA source code to an ASM file and then run MASM on this ASM file to produce an OBJ file. Compilation stops at that point and it is the user's responsibility to run the linker to produce an executable file.

One common use of this option is to compile HLA units to OBJ files. Since HLA units do not contain a main program, you cannot compile an HLA unit directly to an executable. To compile an HLA unit separately (i.e., without compiling an HLA main program during the same HLA.EXE invocation) you must specify the "-c" option or the compilation will generate an error when it attempts to link the program.

A second reason for using the "-c" option is that you want to run the linker yourself and supply LINK.EXE command line options that differ from those that HLA automatically provides.

The "-axxxxxx" Option

This option passes "xxxxxx" as a command line parameter to the assembler. HLA allows multiple instances of this option on the command line (presumably, each one contains a different assembler option).

The "-@" Option Under Windows

By default when running under Windows, HLA produces a linker response file containing segment (section) information. The "-@" option turns this feature off. This is necessary, for example, when you've created your own linker response file and you don't want HLA to overwrite it. Note that if you haven't created a linker response file already, the compilation and link will fail unless you've also specified the "-c" option. HLA always assumes the presence of a linker response file when it runs the linker.

MAKE Files and the Linker Response File Under Windows

As you probably noticed earlier, HLA supplies a really long command line parameter list to the LINK command. If you compile the HLA source file to an OBJ file (using the "-c" option) and then attempt to manually run the linker, you're going to be spending a lot of time typing in the link command. If you expect to link your files together more than once, you're definitely going to want to automate the compilation and linking process. Fortunately, the MAKE tool is perfect for this job. You can use Microsoft's NMAKE.EXE program, Borland's MAKE.EXE program, or any other UNIX-compatible MAKE tool that runs in a win32 console window.

A typical "makefile" will take the following form:

t.exe: t.hla
	hla -c t.hla
	link @t.link -out:t.exe t.obj

The "t.link" file contains the standard set of LINK.EXE command line parameters. It takes the following form (this is typical of the file HLA produces by default when the "-@" option is absent, though, of course, it is a text file that you can create or modify):

-subsystem:console
/heap:0x1000000,0x1000000
/stack:0x1000000,0x1000000
/BASE:0x3000000
/machine:IX86
/section:.text,ER
/section:.edata,RS
/section:readonly,RS
/section:.data,RWS
/section:.bss,RWS
-entry:?HLAMain
kernel32.lib
user32.lib
c:\hla\hlalib\hlalib.lib

These commands appear in the "t.link" file rather than in the makefile on the LINK.EXE command line because there is too much text on the command line when all of these items are present. Hence, we must use a linker response file and include this data using the LINK.EXE "@t.link" command line parameter. The following sections describe each of the lines in the "t.link" file and how you might modify them.

Note that if you specify multiple HLA source files on the same command line, HLA only generates a single linker response file using the name of the first HLA source file you specify on the command line. E.g., if you run HLA with the following command line it only generates a single linker response file:

hla unitDemoMain.hla UnitDemoUnit.hla

For this example, HLA generates a "unitDemoMain.link" linker response file.

The "-subsystem:console" Option

This is the option that tells the linker you are creating a console application rather than a GUI application. This option must be present if you are compiling a program that directs output to the standard output device (e.g., stdout.XXXX) or reads input from the standard input device (e.g., stdin.XXXX). Applications you compile with this option will run in a win32 console window as a 32-bit executable. If you are creating a GUI (windows) application, you must change this line in the "t.link" file to "-subsystem:windows" to tell the linker this is a windows application.

The "/heap:0x1000000,0x1000000" Option

This option tells the linker to reserve and commit 16 megabytes of storage for the heap. HLA's memory allocation routines use the heap (by calling the Windows global allocation routines). In theory, if you exceed the maximum heap size, Windows will automatically allocate additional heap storage elsewhere in memory. However, if you expect that your application will consume more than 16 MBytes of dynamic storage, you should bump this value up to make more room.

The choice of 16 MBytes was fairly arbitrary. If your applications do not make much use of dynamic memory, you should consider reducing this number to one megabyte (0x100000, that's one less zero than 16 MBytes) or even less. By doing this your applications will use less memory and fewer system resources. To reduce or expand the heap size, simply replace the two values in the "/heap" option with the value you desire. Note that if you make the heap larger, you should adjust the base address of the program upward to compensate for the new size.

The "/stack:0x1000000,0x1000000" Option

This linker option sets aside 16 MBytes of storage for the 80x86 stack segment. Like the "/heap" option, you can change this value if you would like a larger or smaller stack. If your program does not contain any recursive functions or use a tremendous amount of local automatic (VAR) storage, you can probably reduce this to one megabyte or even less (a good minimum is probably 64K or 0x10000). If you make this value larger than 16 MBytes, you will probably want to change the program's base address using the "/base" option.

The "/base:0x3000000" Option

This option sets the base address of the HLA program.

HLA, along with the default linker options, lays out the program in memory as shown in the following diagram:

The "/base" option specifies the base address of all the segments (sections) other than the stack and heap segments in memory. Normally the stack is located at the low end of memory and the heap is placed immediately above that. If the stack and heap are both 16 MBytes long (0x1000000) and the OS reserves a small amount of storage for its own use, then the other sections should start somewhere above address 0x2?????? in memory (the actual address depends on the actual storage in use by the system starting at address zero). The default "/base:0x3000000" command tells the linker to place the remaining segments in memory starting at address 0x3000000 (48 MBytes into the memory space). This should be well above the space reserved (by default) for the stack and heap. Unless you have some special requirements (specifically, if you need a combined heap and stack space larger than 48 MBytes), you should not have to change the base address of the program. Note that if you specify a base address that would place the code in the heap or stack segments, the linker will relocate the stack and heap to higher addresses. Generally, you shouldn't do this because the current memory organization automatically traps heap and stack overflows. If you move these things around in memory the system may not trap these exceptions.

The "/machine:IX86" Option

This just tells the linker that you're linking in 80x86 machine code and quiets certain warnings that may otherwise appear. You shouldn't change this line in the "t.link" file.

The "/section:XXXXX" Options

There are five lines in the "t.link" file that contain "/section" options. These are

/section:.text,ER
/section:.edata,RS
/section:readonly,RS
/section:.data,RWS
/section:.bss,RWS

Under win32, a "section" is similar to a memory segment. Of course, win32 programs use the flat memory model so the application has only one physical 80x86 segment. Sections are the logical equivalent of a segment. In general, a win32 program can have as many sections as desired. HLA, however, only directly supports five sections: the code section (.text), the HLA constants section (.edata), the readonly section, the static .data section, and the uninitialized data section (.bss).

The .text section holds the 80x86 machine instructions the HLA compiler emits. The "ER" option on the "/section:.text, ER" line tells the linker that the CPU can execute instructions in this segment and it can read data found in this segment. Since the "W" option is not present, the code segment is read-only. Any attempt to write data to the code section will generate a memory access exception. Generally it is not a good idea to embed writable objects in the code section. If you absolutely need this feature (e.g., self-modifying code) then add the "W" option to this option list. The ability to read data in the code segment isn't strictly required. If you do not embed values in the code stream, you can remove the "R" option from this command. This may help catch some errant pointer accesses during your program's execution. Since fetching parameters from the code stream is not unheard of, HLA defaults to making the code section readable in order to allow the use of this parameter passing technique.

The .edata section holds string and other constants that HLA automatically emits (for example, if you specify an instruction like "mul( 10, EAX);" then HLA will create an object in the .edata section that holds the constant 10 for use by this multiply instruction). This section should always be read-only. The "RS" option specifies read access and shared access. It doesn't really require the "S" option since other processes won't share this data, but it probably doesn't hurt too much to have the "S" option present. You should not make this segment writable since this segment only contains literal constant data and errant writes to this section of memory may produce unusual results (to say the least).

The readonly section holds all the HLA variables you declare in the HLA READONLY declaration section. Like the .edata section, this memory segment is readable and shared but it is not writable. In order to preserve the semantics of the HLA READONLY section, you should not make this section writable by adding the "W" option. It is possible (and practical) to share data in a readonly section between two programs, hence the default setting of shared for this section. Sharing objects is beyond the scope of this documentation; a different article will have to describe how to accomplish that.

The .data section holds all initialized objects you declare in the HLA STATIC and DATA declaration sections. Since these sections are writable as well as readable, the default memory access mode for this section command is "RWS" (readable, writable, sharable). Note that this memory section may not contain executable instructions. If you attempt to jump to code in the .data section, the system will raise a memory access exception. If you intend to create self-modifying code in the .data section, you will need to modify this linker option to allow executable as well as read-write-shared.

The .bss section is where all HLA STORAGE variables go. In theory, this section holds uninitialized variables and shouldn't consume much disk space (even if there are large arrays in the STORAGE declaration sections). The memory access options are the same as .data's.

The "-entry:?HLAMain" Option

This option specifies the entry point of the main program. The HLA main program always uses the symbol "?HLAMain" so you should never modify this option unless you really know what you are doing. The HLA main program initializes the standard library and the exception handling system. Therefore, the program may crash if the program does not begin execution with the HLA entry point.

If you decide to create your own entry point, you should eventually jump to the "?HLAMain" label to prevent problems with the HLA run-time system. Modifying this option should only be done by those who are well-versed in the HLA compiler and run-time system.

About the only reason for taking control of the entry point yourself is if you are writing stand-alone assembly code that doesn't use the HLA exception handling system or HLA Standard Library routines. Further investigation of this feature is left to the reader.

The Library Files Options

The "kernel32.lib" and "user32.lib" files are win32 API interface files that provide the external symbols for most of the win32 APIs a typical (console) application will use. Almost every win32 application will need to specify these files (which are part of the Microsoft SDK). If you are writing GUI applications using HLA you will probably need to specify some other MSSDK library files as well. You will have to refer to the Microsoft win32 API documentation for exact details, but common files you'll need to link in include comctl32.lib, comdlg32.lib, and gdi32.lib in addition to kernel32.lib and user32.lib. There are several dozen different LIB files you may want to use. See the Microsoft documentation for more details.

The c:\hla\hlalib\hlalib.lib option specifies the path to the HLA Standard Library LIB file. You must link in this library module if you call or use any objects in the HLA Standard Library.

Any filename appearing within the linker response file is assumed to be a library file or OBJ file (depending on the suffix). By default, link.exe uses the "LIB" environment variable to determine the location of Win32 library files. Note that LINK.EXE does not honor the value of the "HLALIB" environment variable. You must provide the full path of the HLA Standard Library LIB file (hlalib.lib) in this list.

The LINKER Command Line

link @t.link -out:t.exe t.obj

As noted earlier, the "@t.link" command line option simply tells the linker to fetch the items from the "t.link" file and treat them as though they appear on the linker's command line. This lets you specify far more options than are normally possible on the linker command line. About the only thing worth noting here is that the filename after the at-sign ("@") does not have to be "t.link". You can use any filename you choose; typically it will be something like "projectName.link" where projectName denotes the name of the main source file in your current project. The examples in this document use the project name "t" because the HLA source file is "t.hla".

The "-out:t.exe" option specifies the output (executable) file name. When running the linker independently of HLA, you should specify this option in place of the HLA "-e" option. If this option is not present the linker will give the executable file the same prefix name as the first object code file on the command line. When using a makefile, it's a good idea to always supply this option so you can modify the linker command line and not worry about generating an executable with a strange name.

The last item on the link command line, "t.obj", specifies the name of the object module compiled previously by HLA and MASM. If you needed to link in several HLA units as well as the main program (or OBJ files created by other languages), you would include them on this linker command line as well.

HLA Language Elements

Starting with this section we begin discussing the HLA source language. HLA source files must contain only seven-bit ASCII characters. These are text files with each source line record containing a carriage return/line feed (Windows) or just a line feed (Linux) termination sequence (HLA is actually happy with either sequence, so text files are portable between OSes without change). White space consists of spaces, tabs, and newline sequences. Generally, HLA does not appreciate other control characters in the file and may generate an error if they appear in the source file.

Comments

HLA uses "//" to lead off single line comments. It uses "/*" to begin an indefinite length comment and it uses "*/" to end an indefinite length comment. C/C++, Java, and Delphi users will be quite comfortable with this notation.
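
As a brief illustration of the two comment forms, here is a minimal sketch of a complete HLA program. The program name commentDemo is arbitrary, and the example assumes the HLA Standard Library header "stdlib.hhf" is available on the include path.

```hla
program commentDemo;
#include( "stdlib.hhf" )

begin commentDemo;

    // A single line comment runs to the end of the current line.
    mov( 0, eax );      // It may also follow an instruction.

    /*
        An indefinite length comment starts with the slash-star
        sequence and may span several source lines before the
        closing star-slash sequence.
    */
    stdout.put( "eax = ", (type uns32 eax), nl );

end commentDemo;
```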

Special Symbols

The following characters are HLA lexical elements and have special meaning to HLA:

* / + - ( ) [ ] { } < > : ; , . = ? & | ^ ! @

Reserved Words

Here are the HLA reserved words. You may not use any of these reserved words as HLA identifiers. HLA reserved words are case insensitive. That is, "MOV" and "mov" (as well as any permutation with respect to case) both represent the HLA "mov" reserved word.

HLA Reserved Words

#asm	#closeread	#closewrite	#code
#const	#else	#elseif	#emit
#endasm	#endfor	#endif	#endmacro
#endtext	#endwhile	#error	#for
#if	#include	#includeonce	#keyword
#macro	#openread	#openwrite	#print
#readonly	#static	#storage	#system
#terminator	#text	#while	#write
@a	@abs	@abstract	@addofs1st
@ae	@align	@alignstack	@arity
@b	@basereg	@be	@bound
@byte	@c	@cdecl	@ceil
@char	@class	@cos	@cset
@curdir	@curlex	@curobject	@curoffset
@date	@defined	@delete	@dim
@display	@dword	@e	@elements
@elementsize	@enter	@enumsize	@eos
@eval	@exactlynchar	@exactlyncset	@exactlynichar
@exactlyntomchar	@external	@exactlyntomcset	@exactlyntomichar
@exceptions	@exp	@extract	@filename
@firstnchar	@firstncset	@firstnichar	@floor
@frame	@g	@ge	@global
@index	@insert	@int8	@int16
@int32	@int64	@int128	@into
@isalpha	@isalphanum	@isclass	@isconst
@isdigit	@IsExternal	@isfreg	@islower
@ismem	@isreg	@isreg16	@isreg32
@isreg8	@isspace	@istype	@isupper
@isxdigit	@l	@lastobject	@le
@leave	@length	@lex	@linenumber
@localoffset	@localsyms	@log	@log10
@lowercase	@lword	@matchid	@matchintconst
@matchistr	@matchnumericconst	@matchrealconst	@matchstr
@matchstrconst	@matchtoistr	@matchtostr	@max
@min	@na	@nae	@name
@nb	@nbe	@nc	@ne
@ng	@nge	@nl	@nle
@no	@noalignstack	@nodisplay	@noenter
@noframe	@noleave	@norlesschar	@norlesscset
@norlessichar	@normorechar	@normorecset	@normoreichar
@nostorage	@np	@ns	@ntomchar
@ntomcset	@ntomichar	@nz	@o
@odd	@offset	@onechar	@onecset
@oneichar	@oneormorechar	@oneormorecset	@oneormoreichar
@oneormorews	@optstrings	@p	@parmoffset
@parms	@pascal	@pclass	@pe
@peekchar	@peekcset	@peekichar	@peekws
@po	@pointer	@ptype	@qword
@random	@randomize	@read	@reg
@reg16	@reg32	@reg8	@real32
@real64	@real80	@rindex	@returns
@s	@section	@sin	@size
@sqrt	@staticname	@stdcall	@strbrk
@string	@strset	@strspan	@substr
@tan	@text	@time	@tokenize
@tostring	@trace	@trim	@type
@typename	@uns8	@uns16	@uns32
@uns64	@uns128	@uppercase	@uptochar
@uptocset	@uptoichar	@uptoistr	@uptostr
@use	@volatile	@word	@wsoreos
@wstheneos	@z	@zeroormorechar	@zeroormorecset
@zeroormoreichar	@zeroormorews	@zerooronechar	@zerooronecset
@zerooroneichar	aaa
aad	aam	aas	abstract
adc	add	ah	al
align	and	anyexception	arpl
ax	begin	bh	bl
boolean	bound	bp	break
breakif	bsf	bsr	bswap
bt	btc	btr	bts
bx	byte	call	cbw
cdq	ch	char	cl
class	clc	cld	cli
clts	cmc	cmova	cmovae
cmovb	cmovbe	cmovc	cmove
cmovg	cmovge	cmovl	cmovle
cmovna	cmovnae	cmovnb	cmovnbe
cmovnc	cmovne	cmovng	cmovnge
cmovnl	cmovnle	cmovno	cmovnp
cmovns	cmovnz	cmovo	cmovp
cmovpe	cmovpo	cmovs	cmovz
cmp	cmpsb	cmpsd	cmpsw
cmpxchg	cmpxchg8b	const	continue
continueif	cpuid	cr0	cr1
cr2	cr3	cr4	cr5
cr6	cr7	cseg	cset
cwd	cwde	cx	daa
das	dec	dh	di
div	dl	do	dr0
dr1	dr2	dr3	dr4
dr5	dr6	dr7	dseg
dup	dword	dx	dx:ax
eax	ebp	ebx	ecx
edi	edx	edx:eax	else
elseif	emms	end	endclass
endfor	endif	endreadonly	endrecord
endstatic	endstorage	endtry	endunion
endwhile	enter	enum	eseg
esi	esp	exception	exit
exitif	external	f2xm1	fabs
fadd	faddp	fbld	fbstp
fchs	fclex	fcmova	fcmovae
fcmovb	fcmovbe	fcmove	fcmovna
fcmovnae	fcmovnb	fcmovnbe	fcmovne
fcmovnu	fcmovu	fcom	fcomi
fcomip	fcomp	fcompp	fcos
fdecstp	fdiv	fdivp	fdivr
fdivrp	ffree	fiadd	ficom
ficomp	fidiv	fidivr	fild
fimul	fincstp	finit	fist
fistp	fisub	fisubr	fld
fld1	fldcw	fldenv	fldl2e
fldl2t	fldlg2	fldln2	fldpi
fldz	fmul	fmulp	fnop
for	foreach	forever	forward
fpatan	fprem	fprem1	fptan
frndint	frstor	fsave	fscale
fseg	fsin	fsincos	fsqrt
fst	fstcw	fstenv	fstp
fstsw	fsub	fsubp	fsubr
fsubrp	ftst	fucom	fucomi
fucomip	fucomp	fucompp	fwait
fxam	fxch	fxtract	fyl2x
fyl2xp1	gseg	hlt	idiv
if	imod	imul	in
inc	inherits	insb	insd
insw	int	int16	int32
int8	intmul	into	invd
invlpg	iret	iretd	iterator
ja	jae	jb	jbe
jc	jcxz	je	jecxz
jf	jg	jge	jl
jle	jmp	jna	jnae
jnb	jnbe	jnc	jne
jng	jnge	jnl	jnle
jno	jnp	jns	jnz
jo	jp	jpe	jpo
js	jt	jz	label
lahf	lar	lazy	lds
lea	leave	les	lfs
lgdt	lgs	lidt	lldt
lock.adc	lock.add	lock.and	lock.btc
lock.btr	lock.bts	lock.cmpxchg	lock.dec
lock.inc	lock.neg	lock.not	lock.or
lock.sbb	lock.sub	lock.xadd	lock.xchg
lock.xor	lodsb	lodsd	lodsw
loop	loope	loopne	loopnz
loopz	lsl	lss	ltreg
method	mm0	mm1	mm2
mm3	mm4	mm5	mm6
mm7	mod	mov	movd
movq	movsb	movsd	movsw
movsx	movzx	mul	name
namespace	neg	nop	not	null
or	out	outsb	outsd
outsw	override	overrides	packssdw
packsswb	packuswb	paddb	paddd
paddsb	paddsw	paddusb	paddusw
paddw	pand	pandn	pavgb
pavgw	pcmpeqb	pcmpeqd	pcmpeqw
pcmpgtb	pcmpgtd	pcmpgtw	pextrw
pinsrw	pmaddwd	pmaxsw	pmaxub
pminsw	pminub	pmovmskb	pmulhuw
pmulhw	pmullw	pointer	pop
popa	popad	popf	popfd
por	procedure	program	psadbw
pshufw	pslld	psllq	psllw
psrad	psraw	psrld	psrlq
psrlw	psubb	psubd	psubsb
psubsw	psubusb	psubusw	psubw
punpckhbw	punpckhdq	punpckhwd	punpcklbw
punpckldq	punpcklwd	push	pusha
pushad	pushd	pushf	pushfd
pushw	pxor	qword	raise
rcl	rcr	rdmsr	rdpmc
rdtsc	readonly	real32	real64
real80	record	rep.insb	rep.insd
rep.insw	rep.movsb	rep.movsd	rep.movsw
rep.outsb	rep.outsd	rep.outsw	rep.stosb
rep.stosd	rep.stosw	repe.cmpsb	repe.cmpsd
repe.cmpsw	repe.scasb	repe.scasd	repe.scasw
repeat	repne.cmpsb	repne.cmpsd	repne.cmpsw
repne.scasb	repne.scasd	repne.scasw	repnz.cmpsb
repnz.cmpsd	repnz.cmpsw	repnz.scasb	repnz.scasd
repnz.scasw	repz.cmpsb	repz.cmpsd	repz.cmpsw
repz.scasb	repz.scasd	repz.scasw	result
ret	returns	rol	ror
rsm	sahf	sal	sar
sbb	scasb	scasd	scasw
segment	seta	setae	setb
setbe	setc	sete	setg
setge	setl	setle	setna
setnae	setnb	setnbe	setnc
setne	setng	setnge	setnl
setnle	setno	setnp	setns
setnz	seto	setp	setpe
setpo	sets	setz	sgdt
shl	shld	shr	shrd
si	sidt	sldt	sp
sseg	st0	st1	st2
st3	st4	st5	st6
st7	static	stc	std
sti	storage	stosb	stosd
stosw	streg	string	sub
tbyte	test	text	then
this	thunk	to	try
type	ud2	union	unit
unprotected	uns16	uns32	uns8
until	val	valres	var
verr	verw	vmt	wait
wbinvd	while	word	wrmsr
xadd	xchg	xlat	xmm0
xmm1	xmm2	xmm3	xmm4
xmm5	xmm6	xmm7	xor

Note that "@debughla" is also a reserved compiler symbol. However, this is intended for internal (HLA) debugging purposes only. When the compiler encounters this symbol, it immediately stops the compiler with an assertion failure. Obviously, you should never put this statement in your source code unless you're debugging HLA and you want to stop the compiler immediately after the compilation of some statement.

External Symbols and Assembler Reserved Words

HLA produces an assembly language file during compilation and invokes an assembler such as MASM to complete the compilation process. HLA automatically translates normal identifiers you declare in your program to benign identifiers in the assembly language program. However, HLA does not translate EXTERNAL symbols, but preserves these names in the assembly language file it produces. Therefore, you must take care not to use external names that conflict with the underlying assembler's set of reserved words or that assembler will generate an error when it attempts to process HLA's output.

For a list of assembler reserved words, please see the documentation for the assembler you are using.

HLA Identifiers

HLA identifiers must begin with an alphabetic character or an underscore. After the first character, the identifier may contain alphanumeric and underscore symbols. There is no technical limit on identifier length in HLA, but you should avoid external symbols greater than about 32 characters in length since the assembler and linkers that process HLA identifiers may not be able to handle such symbols.

HLA identifiers are always case neutral. This means that identifiers are case sensitive insofar as you must always spell an identifier exactly the same (with respect to alphabetic case). However, you are not allowed to declare two identifiers whose only difference is alphabetic case.
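
The following minimal sketch illustrates case neutrality. The names caseDemo and Counter are made up for this example, and the two rejected lines are left as comments so that the program still compiles.

```hla
program caseDemo;
#include( "stdlib.hhf" )

static
    Counter: uns32 := 0;

    // counter: uns32;      // Rejected: differs from "Counter" only by case.

begin caseDemo;

    inc( Counter );         // Accepted: spelled exactly as declared.

    // inc( counter );      // Rejected: case does not match the declaration.

    stdout.put( "Counter = ", Counter, nl );

end caseDemo;
```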

Although technically legal in your program, do not use identifiers that begin and end with a single underscore. HLA reserves such identifiers for use by the compiler and the HLA standard library. If you declare such identifiers in your program, the possibility exists that you may interfere with HLA's or the HLA Standard Library's use of such a symbol.

By convention, HLA programmers use symbols beginning with two underscores to represent private fields in a class. So you should avoid such identifiers except when defining such private fields in your own classes.

External Identifiers

HLA lets you explicitly provide a string for external identifiers. External identifiers are not limited to the format for HLA identifiers. HLA allows any string constant to be used for an external identifier. It is your responsibility to use only those characters that are legal in the assembler that processes HLA's intermediate ASM file. Note that this feature lets you use symbols that are not legal in HLA but are legal in external code (e.g., Win32 APIs use the '@' character in identifiers and some non-HLA code may use HLA reserved words as identifiers). See the discussion of the @EXTERNAL option for more details.
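
As a hedged sketch of this feature, the program below gives a Win32 API routine an HLA-legal name while supplying its link-time name as a string. It assumes a Windows target, that kernel32.lib is on the linker command line (as in the default response file), and that the @EXTERNAL option accepts a string operand in the form shown; the program name externDemo is arbitrary. GetTickCount takes no parameters and returns its result in EAX.

```hla
program externDemo;
#include( "stdlib.hhf" )

// "_GetTickCount@0" contains the '@' character, so it is not a legal
// HLA identifier. The string gives the name the assembler and linker see;
// the HLA code refers to the routine as GetTickCount.
procedure GetTickCount; @external( "_GetTickCount@0" );

begin externDemo;

    call GetTickCount;      // Result comes back in EAX.
    stdout.put( "milliseconds since startup = ", (type uns32 eax), nl );

end externDemo;
```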

Data Types in HLA

Native (Primitive) Data Types in HLA

HLA provides the following basic primitive types:

boolean One byte; zero represents false, one represents true.

Enum One byte; user defined IDs whose value ranges from 0 to 255.

Uns8 Unsigned values in the range 0..255.

Uns16 Unsigned integer values in the range 0..65535.

Uns32 Unsigned integer values in the range 0..4,294,967,295.

Byte Generic eight-bit value.

Word Generic 16-bit value.

DWord Generic 32-bit value.

Int8 Signed integer values in the range -128..+127.

Int16 Signed integer values in the range -32768..+32767.

Int32 Signed integer values in the range -2,147,483,648..+2,147,483,647

Char Character values.

Real32 32-bit floating point values.

Real64 64-bit floating point values.

Real80 80-bit floating point values.

String Dynamic length string constants. (Run-time implementation: four-byte pointer.)

CSet A set of up to 128 different ASCII characters (16-byte bitmap).

Text Similar to string, but text constants expand in-place (like #define in C/C++).

Thunk A set of machine instructions to execute.

Often, it is convenient to discuss the types above in various groups. This document will often use the following terms:

Ordinal: boolean, enum, uns8, uns16, uns32, byte, word, dword, int8, int16, int32, char.

Unsigned: uns8, uns16, uns32, byte, word, dword.

Signed: int8, int16, int32, byte, word, dword.

Number: uns8, uns16, uns32, int8, int16, int32, byte, word, dword.

Numeric: uns8, uns16, uns32, int8, int16, int32, byte, word, dword, real32, real64, real80.
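
To make the list of primitive types a little more concrete, here is a minimal sketch that declares initialized STATIC variables of several of those types. All variable names (flag, big, delta, and so on) and the program name are illustrative only; the initial values were chosen to sit at or near the limits of each range given above.

```hla
program typesDemo;
#include( "stdlib.hhf" )

static
    flag:   boolean := true;
    small:  uns8    := 255;                 // Largest uns8 value.
    big:    uns32   := 4_294_967_295;       // Largest uns32 value.
    delta:  int16   := -32768;              // Smallest int16 value.
    letter: char    := 'A';
    ratio:  real64  := 3.14159;
    title:  string  := "HLA";
    vowels: cset    := { 'a', 'e', 'i', 'o', 'u' };

begin typesDemo;

    stdout.put( "flag   = ", flag, nl );
    stdout.put( "big    = ", big, nl );
    stdout.put( "delta  = ", delta, nl );
    stdout.put( "letter = ", letter, nl );
    stdout.put( "title  = ", title, nl );

end typesDemo;
```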

Composite Data Types

In addition to the primitive types above, HLA supports arrays, records (structures), unions, classes, and pointers of the above types (except for text objects).

Array Data Types

HLA allows you to create an array data type by specifying the number of array elements after a type name. Consider the following HLA type declaration that defines intArray to be an array of int32 objects:

type intArray : int32[ 16 ];

The "[ 16 ]" component tells HLA that this type has 16 four-byte integers. HLA arrays use a zero-based index, so the first element is always element zero. The index of the last element, in this example, is 15 (total of 16 elements with indices 0..15).

HLA also supports multidimensional arrays. You can specify multidimensional arrays by providing a list of indices inside the square brackets, e.g.,

type intArray4x4 : int32[ 4, 4 ];
type intArray2x2x4 : int32[ 2,2,4 ];

The mechanism for accessing array elements differs depending upon whether you are accessing compile-time array constants or run-time array variables. A complete discussion of this will appear in later sections.
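
Pending that fuller discussion, the following sketch shows one way a run-time array variable of type intArray might be used. It assumes that the index register must be scaled by the element size by hand (four bytes for int32) and that the @elements and @size compile-time functions report the element count and byte count of an array; the variable and program names are made up for the example.

```hla
program arrayDemo;
#include( "stdlib.hhf" )

type
    intArray:    int32[ 16 ];
    intArray4x4: int32[ 4, 4 ];

static
    a: intArray;
    m: intArray4x4;

begin arrayDemo;

    // Store into the last element (index 15). Each int32 occupies four
    // bytes, so the index in EBX is scaled by four.
    mov( 15, ebx );
    mov( 1234, a[ ebx*4 ] );
    mov( a[ ebx*4 ], eax );
    stdout.put( "a[15] = ", (type int32 eax), nl );

    stdout.put( "elements in a = ", @elements( a ), nl );
    stdout.put( "bytes in a    = ", @size( a ), nl );
    stdout.put( "bytes in m    = ", @size( m ), nl );

end arrayDemo;
```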

Union Data Types

HLA implements the discriminant union type using the UNION..ENDUNION reserved words. The following HLA type declaration demonstrates a union declaration:

type allInts: union
i8: int8;
i16: int16;
i32: int32;
endunion;

All fields in a union have the same starting address in memory. The size of a union object is the size of the largest field in the union. The fields of a union may have any type that is legal in a variable declaration section (see the discussion of the VAR section for more details).

Given a union object, say "i" of type "allInts", you access the fields of the union using the familiar dot-notation. The following 80x86 mov instructions demonstrate how to access each of the fields of the "i" variable:

mov( i.i8, al );
mov( i.i16, ax );
mov( i.i32, eax );
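
Putting the declaration and the accesses together, a minimal complete program might look like the sketch below. The program name unionDemo and the stored constant are illustrative; because all three fields start at the same address, the byte and word reads return the low-order portions of the 32-bit value just stored.

```hla
program unionDemo;
#include( "stdlib.hhf" )

type
    allInts: union
        i8:  int8;
        i16: int16;
        i32: int32;
    endunion;

static
    i: allInts;

begin unionDemo;

    mov( $1234_5678, i.i32 );   // Fill all four bytes of the union.

    mov( i.i8, al );            // Low-order byte of the value above.
    mov( i.i16, ax );           // Low-order word of the value above.
    mov( i.i32, eax );          // The full 32-bit value.

    // The union is as large as its largest field (i32).
    stdout.put( "size of allInts = ", @size( allInts ), nl );

end unionDemo;
```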

Unions also support a special field type known as an anonymous record (see the next section for a description of records). The syntax for an anonymous record in a union is the following:

type
    unionWrecord: union
        u1Field: byte;
        u2Field: word;
        u3Field: dword;
        record
            u4Field: byte[2];
            u5Field: word[3];
        endrecord;
        u6Field: byte;
    endunion;

Fields appearing within the anonymous record do not necessarily start at offset zero in the data structure. In the example above, u4Field starts at offset zero while u5Field immediately follows it two bytes later. The fields in the union outside the anonymous record all start at offset zero. If the size of the anonymous record is larger than any other field in the union, then the record's size determines the size of the union. This is true for the example above, so the union's size is eight bytes since the anonymous record consumes eight bytes (two bytes for u4Field plus six bytes for u5Field).
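
One way to check this layout is to let a program report it. The sketch below measures the distance of u5Field from the start of a unionWrecord variable with two lea instructions and prints the size the compiler computes for the type. The variable name u and the program name are hypothetical.

```hla
program anonRecDemo;
#include( "stdlib.hhf" )

type
    unionWrecord: union
        u1Field: byte;
        u2Field: word;
        u3Field: dword;
        record
            u4Field: byte[2];
            u5Field: word[3];
        endrecord;
        u6Field: byte;
    endunion;

static
    u: unionWrecord;

begin anonRecDemo;

    // Distance of u5Field from the start of the union.
    lea( eax, u.u5Field );
    lea( ebx, u );
    sub( ebx, eax );
    stdout.put( "offset of u5Field = ", (type uns32 eax), nl );

    // Total storage reserved for one unionWrecord object.
    stdout.put( "size of unionWrecord = ", @size( unionWrecord ), nl );

end anonRecDemo;
```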

Record Data Types

HLA's records allow programmers to create data types whose fields can be different types. The following HLA type declaration defines a simple record with four fields:

type Planet: record

x: int32;
y: int32;
z: int32;
density: real64;

endrecord;

Objects of type Planet will consume 20 bytes of storage at run-time.

The fields of a record may be of any legal HLA data type including other composite data types. Like unions, anything that is legal in a VAR section is a legal field of a record. Also like unions, you use the dot-notation to access fields of a record object.

In addition to the VAR types, you may also declare anonymous unions within a record. An anonymous union is a union declaration without a fieldname associated with the union, e.g.,

type DemoAU: record
x: real32;
union
u1:int32;
r1:real32;
endunion;
y:real32;
endrecord;

In this example, x, u1, r1, and y are all fields of DemoAU. To access the fields of a variable D of type DemoAU, you would use the following names: D.x, D.u1, D.r1, and D.y. Note that D.u1 and D.r1 share the same memory locations at run-time, while D.x and D.y have unique addresses associated with them.
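
The sharing of D.u1 and D.r1 can be observed with a short sketch such as the following. It writes a bit pattern through the int32 field and reads the same four bytes back through the real32 field; the constant $3F80_0000 is the IEEE single-precision encoding of 1.0. The program name is arbitrary.

```hla
program anonUnionDemo;
#include( "stdlib.hhf" )

type
    DemoAU: record
        x: real32;
        union
            u1: int32;
            r1: real32;
        endunion;
        y: real32;
    endrecord;

static
    D: DemoAU;

begin anonUnionDemo;

    // Write through the int32 view of the shared storage.
    mov( $3F80_0000, D.u1 );

    // Read the same four bytes back through the real32 view.
    mov( (type dword D.r1), eax );
    stdout.put( "bits read via D.r1 = ", eax, nl );

    stdout.put( "size of DemoAU = ", @size( DemoAU ), nl );

end anonUnionDemo;
```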

Record types may inherit fields from other record types. Consider the following two HLA type declarations:

type
Pt2D: record

x: int32;
y: int32;

endrecord;

Pt3D: record inherits( Pt2D )

z: int32;

endrecord;

In this example, Pt3D inherits all the fields from the Pt2D type. The "inherits" keyword tells HLA to copy all the fields from the specified record (Pt2D in this example) to the beginning of the current record declaration (Pt3D in this example). Therefore, the declaration of Pt3D above is equivalent to:

Pt3D: record

x: int32;
y: int32;
z: int32;

endrecord;
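
A minimal sketch of the inherited fields in use follows. The variable name pt and the program name are illustrative; the two @size calls simply report the storage the compiler assigns to each type.

```hla
program inheritDemo;
#include( "stdlib.hhf" )

type
    Pt2D: record
        x: int32;
        y: int32;
    endrecord;

    Pt3D: record inherits( Pt2D )
        z: int32;
    endrecord;

static
    pt: Pt3D;

begin inheritDemo;

    mov( 1, pt.x );     // x and y come from Pt2D.
    mov( 2, pt.y );
    mov( 3, pt.z );     // z is declared in Pt3D itself.

    stdout.put( "size of Pt2D = ", @size( Pt2D ), nl );
    stdout.put( "size of Pt3D = ", @size( Pt3D ), nl );

end inheritDemo;
```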

In some special situations you may want to override a field from a previous field declaration. For example, consider the following record declarations:

BaseRecord:

record

a: uns32;

b: uns32;

endrecord;

DerivedRecord:

record inherits( BaseRecord )

b: boolean; // New definition for b!

c: char;

endrecord;

Normally, HLA will report a "duplicate symbol" error when attempting to compile the declaration for "DerivedRecord" since the "b" field is already defined via the "inherits( BaseRecord )" option. However, in certain cases the programmer may wish to hide the original field in the derived record by reusing its name for a new field. That is, perhaps the programmer intends to actually create the following record:

DerivedRecord:

record

a: uns32; // Derived from BaseRecord

b: uns32; // Derived from BaseRecord, but inaccessible here.

b: boolean; // New definition for b!

c: char;

endrecord;

HLA allows a programmer to explicitly override the definition of a particular field by using the OVERRIDES keyword before the field they wish to override. So while the previous declarations for DerivedRecord produce errors, the following is acceptable to HLA:

BaseRecord:

record

a: uns32;

b: uns32;

endrecord;

DerivedRecord:

record inherits( BaseRecord )

overrides b: boolean; // New definition for b!

c: char;

endrecord;

Normally, HLA aligns each field on the next available byte offset in a record. If you wish to align fields within a record on some other boundary, you may use the ALIGN directive to achieve this. Consider the following record declaration as an example:

type

AlignedRecord:

record

b:boolean; // Offset 0

c:char; // Offset 1

align(4);

d:dword; // Offset 4

e:byte; // Offset 8

w:word; // Offset 9

f:byte; // Offset 11

endrecord;

Note that field "d" is aligned at a four-byte offset while "w" is not aligned. We can correct this problem by sticking another ALIGN directive in this record:

type

AlignedRecord2:

record

b:boolean; // Offset 0

c:char; // Offset 1

align(4);

d:dword; // Offset 4

e:byte; // Offset 8

align(2);

w:word; // Offset 10

f:byte; // Offset 12

endrecord;

Be aware of the fact that the ALIGN directive in a RECORD only aligns fields in memory if the record object itself is aligned on an appropriate boundary. For example, if an object of type AlignedRecord2 appears in memory at an odd address, then the "d" and "w" fields will also be misaligned (that is, they will appear at odd addresses in memory). Therefore, you must ensure appropriate alignment of any record variable whose fields you're assuming are aligned.

Note that the AlignedRecord2 type consumes 13 bytes. This means that if you create an array of AlignedRecord2 objects, every other element will be aligned on an odd address and three out of four elements will not be double-word aligned (so the "d" field will not be aligned on a four-byte boundary in memory). If you are expecting fields in a record to be aligned on a certain byte boundary, then the size of the record must be an integral multiple of that alignment factor if you have arrays of the record. This means that you must pad the record with extra bytes at the end to ensure proper alignment. For the AlignedRecord2 example, we need to pad the record with three bytes so that its size (16 bytes) is an integral multiple of four bytes. This is easily achieved by using an ALIGN directive as the last declaration in the record:

type

AlignedRecord2:

record

b:boolean; // Offset 0

c:char; // Offset 1

align(4);

d:dword; // Offset 4

e:byte; // Offset 8

align(2);

w:word; // Offset 10

f:byte; // Offset 12

align(4); // Ensures we're padded to a multiple of four bytes.

endrecord;

Note that you should only use values that are integral powers of two in the ALIGN directive.

If you want to ensure that all fields are appropriately aligned on some boundary within the record, but you don't want to have to manually insert ALIGN directives throughout the record, HLA provides a second alignment option to solve your problem. Consider the following syntax:

type

alignedRecord3 : record[4]

<< Set of fields >>

endrecord;

The "[4]" immediately following the RECORD reserved word tells HLA to start all fields in the record at offsets that are multiples of four, regardless of the object's size (and the size of the objects preceding the field). HLA allows any integer expression that produces a value in the range 1..4096 inside these square brackets. If you specify the value one (which is the default), then all fields are packed (aligned on a byte boundary). For values greater than one, HLA will align each field of the record on the specified boundary. For arrays, HLA will align the field on a boundary that is a multiple of the array element's size. The maximum boundary HLA will round any field to is a multiple of 4096 bytes.

Note that if you set the record alignment using this syntactical form, any ALIGN directive you supply in the record may not produce the desired results. When HLA sees an ALIGN directive in a record that is using field alignment, HLA will first align the current offset to the value specified by ALIGN and then align the next field's offset to the global record align value.

Nested record declarations may specify a different alignment value than the enclosing record, e.g.,

type

alignedRecord4 : record[4]

a:byte;

b:byte;

c:record[8]

d:byte;

e:byte;

endrecord;

f:byte;

g:byte;

endrecord;

In this example, HLA aligns fields a, b, f, and g on dword boundaries, and it aligns d and e (within c) on eight-byte boundaries. Note that the alignment of the fields in the nested record is true only within that nested record. That is, if c turns out to be aligned on some boundary other than an eight-byte boundary, then d and e will not actually be on eight-byte boundaries; they will, however, be on eight-byte boundaries relative to the start of c.

In addition to letting you specify a fixed alignment value, HLA also lets you specify a minimum and maximum alignment value for a record. The syntax for this is the following:

type

recordname : record[maximum : minimum]

<< fields >>

endrecord;

Whenever you specify a maximum and minimum value as above, HLA will align all fields on a boundary that is at least the minimum alignment value. However, if the object's size is greater than the minimum value but less than or equal to the maximum value, then HLA will align that particular field on a boundary that is a multiple of the object's size. If the object's size is greater than the maximum size, then HLA will align the object on a boundary that is a multiple of the maximum size. As an example, consider the following record:

type

r: record[ 4:1 ]

a:byte; // offset 0

b:word; // offset 2

c:byte; // offset 4

d:dword[2]; // offset 8

e:byte; // offset 16

f:byte; // offset 17

g:qword; // offset 20

endrecord;

Note that HLA aligns g on a dword boundary (not qword, which would be offset 24) since the maximum alignment size is four. Note that since the minimum size is one, HLA allows the f field to be aligned on an odd boundary (since it's a byte).

If an array, record, or union field appears within a record, then HLA uses the size of an array element or the largest field of the record or union to determine the alignment size. That is, HLA will align the field within the outermost record on a boundary that is compatible with the size of the largest element of the nested array, union, or record.

HLA's sophisticated record alignment facilities let you specify record field alignments that match those used by most major high level language compilers. This lets you easily access data types used in those HLLs without resorting to inserting lots of ALIGN directives inside the record.

Note that there is a big difference in semantics between the global record alignment option (above) and the similar syntax in the STATIC, READONLY, and STORAGE declaration sections (which is why their syntax is different). Consider the following:

static(4)

v1: byte;

v2: dword;

Unlike the record alignment option, this example only aligns the first field of the STATIC section, not all the variables in that section (i.e., v2 will not be aligned on a dword boundary in the example above). Keep this difference in mind when using this alignment option.

When declaring record variables in a VAR, STATIC, READONLY, STORAGE, or SEGMENT declaration section, HLA associates the offset zero with the first field of a record. Each additional field in the record is assigned an offset corresponding to the sum of the sizes of all the prior fields. So in the Pt3D record shown earlier, "x" would have the offset zero, "y" would have the offset four, and "z" would have the offset eight.

If you would like to specify a different starting offset, you can use the following syntax for a record declaration:

Pt3D: record := 4;

x: int32;
y: int32;
z: int32;

endrecord;

The constant expression specified after the assignment operator (":=") specifies the starting offset of the first field in the record. In this example x, y, and z will have the offsets 4, 8, and 12, respectively.

Warning: setting the starting offset in this manner does not add padding bytes to the record. This record is still a 12-byte object. If you declare variables using a record declared in this fashion, you may run into problems because the field offsets do not match the actual offsets in memory. This option is intended primarily for mapping records to pre-existing data structures in memory. Only really advanced assembly language programmers should use this option.

Pointer Types

HLA allows you to declare a pointer to some other type using syntax like the following:

pointer to base_type

The following example demonstrates how to create a pointer to a 32-bit integer within the type declaration section:

type pi32: pointer to int32;

HLA pointers are always 32-bit (near32) pointers.

HLA also allows you to define pointers to existing procedures using syntax like the following:

procedure someProc( parameter_list );

<< procedure options, followed by @external, @forward, or procedure body>>

type

p : pointer to procedure someProc;

The p procedure pointer "inherits" all the parameters and other procedure options associated with the original procedure. This is really just shorthand for the following:

procedure someProc( parameter_list );

<< procedure options, followed by @external, @forward, or procedure body>>

type

p : procedure ( Same_Parameters_as_someProc ); <<same options as someProc>>

The former version, however, is easier to maintain since you don't have to keep the parameter lists and procedure options in sync.

Note that HLA provides the reserved word null (or NULL, reserved words are case insensitive) to represent the nil pointer. HLA replaces NULL with the value zero. The NULL pointer is compatible with any pointer type (including strings, which are pointers).
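The following sketch shows one way a pointer type and the NULL constant might be used together. The names PointerDemo, i, pI, and pNone are hypothetical, and the comparisons against zero rely only on the statement above that HLA replaces NULL with the value zero.

```hla
program PointerDemo;

#include( "stdio.hhf" );

type
    pi32: pointer to int32;

static
    i:      int32 := 42;
    pI:     pi32 := &i;     // address of the static variable i
    pNone:  pi32;           // set to NULL at run-time below

begin PointerDemo;

    // NULL is compatible with any pointer type.
    mov( NULL, pNone );

    // Dereference pI only if it is not the nil pointer.
    mov( pI, ebx );
    if( ebx <> 0 ) then

        mov( [ebx], eax );
        stdout.put( "pI points at the value ", (type int32 eax), nl );

    endif;

    // pNone holds zero, so it must not be dereferenced.
    mov( pNone, ebx );
    if( ebx = 0 ) then

        stdout.put( "pNone holds NULL", nl );

    endif;

end PointerDemo;
```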

Thunks

A "thunk" is an eight-byte variable that contains a pointer to a piece of code to execute and an execution environment pointer (i.e., a pointer to an activation record). The code associated with a thunk is, essentially, a small procedure that (generally) uses the activation record of the surrounding code rather than creating its own activation record. HLA uses thunks to implement the iterator "yield" statement as well as pass by name and pass by lazy evaluation parameters. In addition to these uses of thunks, HLA allows you to declare your own thunk objects and use them for any purpose you desire. Declaring a thunk variable is easy; just use a declaration like the following in a VAR or STATIC section:

thunkVar: thunk;

This declaration reserves eight bytes of storage. The first dword holds the address of the code to execute, the second dword holds a pointer to the activation record to load into EBP when the thunk executes.

Of course, like almost any pointer variable, declaring a thunk variable is the easy part; the hard part is making sure the thunk variable is initialized before attempting to call the thunk. While you could manually load the address of some code and the frame pointer value into a thunk variable, HLA provides a better syntax for initializing thunks with small code fragments: the "thunk" statement. The "thunk" statement uses the following syntax:

thunk thunkVar := #{ sequence_of_statements }#;

Consider the following example:

program ThunkDemo;

#include( "stdio.hhf" );

procedure proc1;

var

i: int32;

p1Thunk: thunk;

procedure proc2( t:thunk );

var

i:int32;

begin proc2;

mov( 25, i );

t();

stdout.put( "Inside proc2, i=", i, nl );

end proc2;

begin proc1;

thunk p1Thunk := #{ mov( 0, i ); }#;

mov( 1, i );

proc2( p1Thunk );

stdout.put( "i=", i, nl );

end proc1;

begin ThunkDemo;

proc1();

end ThunkDemo;

In this example, proc1 has two local variables, i and p1Thunk. The THUNK statement initializes the p1Thunk variable with the address of some code that moves a zero into the i variable. The THUNK statement also initializes p1Thunk with a pointer to the current activation record (that is, a pointer to proc1's activation record). Then proc1 calls proc2, passing p1Thunk as a parameter.

The proc2 routine has its own local variable named i. Of course, this is a different variable than the i in proc1. Proc2 begins by setting its variable i to the value 25. Then proc2 invokes the thunk (passed to it as a parameter). This thunk sets the variable i to zero. However, since the thunk uses the activation record that was current when the THUNK statement executed, this statement sets proc1's i variable to zero rather than proc2's i variable. This program produces the following output:

Inside proc2, i=25

i=0

Although you probably won't use thunks that often, they are quite nice for deferred execution. This is especially useful in AI (Artificial Intelligence) programs.

Class Types

Classes and object-oriented programming are the subject of a later section of this document. See Class Data Types for more details.

Literal Constants

Literal constants are those language elements that we normally think of as non-symbolic constant objects. HLA supports a wide variety of literal constants. The following sections describe those constants.

Numeric Constants

HLA lets you specify several different types of numeric constants.

Decimal Constants

The first and last characters of a decimal integer constant must be decimal digits (0..9). Interior positions may contain decimal digits and underscores. The purpose of the underscore is to provide a better presentation for large decimal values (i.e., use the underscore in place of a comma in large values). Example: 1_234_265.

Note: Technically, HLA does not allow negative literal integer constants. However, you can use the unary "-" operator to negate a value, so you'd never notice this omission (e.g., -123 is legal, it consists of the unary negation operator followed by a positive decimal literal constant). Therefore, HLA always returns type unsXX for all literal decimal constants. Also note that HLA always uses a minimum size of uns32 for literal decimal constants. If you absolutely, positively, want a literal constant to be treated as some other type, use one of the compile-time type coercion functions to change the type (e.g., uns8(1), word(2), or int16(3)). Generally, the type that HLA uses for the object is irrelevant since HLA will automatically promote a value to a larger or smaller type as appropriate.

Here are the ranges for the various HLA unsigned data types:

uns8: 0..255

uns16: 0..65,535

uns32: 0..4,294,967,295

uns64: 0..18,446,744,073,709,551,615

uns128: 0..340,282,366,920,938,463,463,374,607,431,768,211,455

Hexadecimal Constants

Hexadecimal literal constants must begin with a dollar sign ("$") followed by a hexadecimal digit and must end with a hexadecimal digit (0..9, A..F, or a..f). Interior positions may contain hexadecimal digits or underscores. Hexadecimal constants are easiest to read if each group of four digits (starting from the least significant digit) is separated from the others by an underscore. E.g., $1A_2F34_5438.

If the constant fits into 32 bits or less, HLA always returns the dword type for a hexadecimal constant. For larger values, HLA will automatically use the qword or lword type, as appropriate. If you would like the hexadecimal value to have a different type, use one of the HLA compile-time type coercion functions to change the type (e.g., byte($12) or qword($54)).

Here are the same ranges expressed as hexadecimal constants:

uns8: 0..$FF

uns16: 0..$FFFF

uns32: 0..$FFFF_FFFF

uns64: 0..$FFFF_FFFF_FFFF_FFFF

uns128: 0..$FFFF_FFFF_FFFF_FFFF_FFFF_FFFF_FFFF_FFFF

Binary Constants

Binary literal constants begin with a percent sign ("%") followed by at least one binary digit (0/1) and they must end with a binary digit. Interior positions may contain binary digits or underscore characters. Binary constants are easiest to read if each group of four digits (starting from the least significant digit) is separated from the others by an underscore. E.g., %10_1111_1010.

Like hexadecimal constants, HLA always associates the type dword with a "raw" binary constant; it will use the qword or lword type if the value is greater than 32 bits or 64 bits (respectively). If you want HLA to use a different type, use one of the compile-time type coercion functions to achieve this.

Obviously, binary constants may have as many binary digits as there are bits in the underlying type. This document will not attempt to write out the maximum binary literal constant!

Numeric Set Constants

HLA provides a special numeric constant form that lets you specify a numeric value by the bit positions containing ones. This corresponds to a powerset of integer values in the range 0..31. These constants take the following form:

@{ comma_separated_list_of_digits }

The comma_separated_list_of_digits can be empty (signifying no set bits, i.e., the value zero), a single bit number, or a set of bit numbers separated by commas. Here are some examples:

@{}

@{8}

@{1,2,8,24}

The corresponding numeric constant is given the type dword and is assigned the value that has ones in all the specified bit positions. For example, "@{8}" is equal to 256 since this value has a single set bit in bit position eight. Note that "@{0}" equals one, not zero (because the value one has a single set bit in position zero).
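The integer constant forms described so far can be sketched together in one small program. The constant names below (decVal, hexVal, binVal, emptySet, bit8, mixedBits) are hypothetical; the program copies two of the numeric set constants into EAX and prints them as unsigned values so the bit-position rule can be checked.

```hla
program NumericConstDemo;

#include( "stdio.hhf" );

const
    decVal      := 1_234_265;       // decimal, underscores for readability
    hexVal      := $1A_2F34;        // hexadecimal, "$" prefix
    binVal      := %10_1111_1010;   // binary, "%" prefix
    emptySet    := @{};             // no bits set, the value zero
    bit8        := @{8};            // only bit 8 set
    mixedBits   := @{1,2,8,24};     // 2 + 4 + 256 + 16_777_216

begin NumericConstDemo;

    // Numeric set constants have type dword, so coerce EAX to
    // uns32 to print them in decimal form.
    mov( bit8, eax );
    stdout.put( "bit8 = ", (type uns32 eax), nl );

    mov( mixedBits, eax );
    stdout.put( "mixedBits = ", (type uns32 eax), nl );

end NumericConstDemo;
```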

Real (Floating Point) Constants

Floating point (real) literal constants always begin with a decimal digit (never just a decimal point). A string of one or more decimal digits may be optionally followed by a decimal point and zero or more decimal digits (the fractional part). After the optional fractional part, a floating point number may be followed by "e" or "E", an optional sign ("+" or "-"), and a string of one or more decimal digits (the exponent part). Underscores may appear between two adjacent digits in the floating point number; their presence is intended to substitute for commas found in real-world decimal numbers.

Examples:

1.2

2.345e-2

0.5

1.2e4

2.3e+5

1_234_567.0

Literal real constants are always 80 bits and have the default type real80. If you wish to specify real32 or real64 literal constants, then use the real32 or real64 compile-time coercion functions to convert the values, e.g., real32( 3.14159 ). During compile time, it's rare that you'd want to use one of the smaller types since they are less accurate at representing floating point values (although this might be precisely why you decide to use the smaller real type, so the accuracy matches the computations you're doing at run-time).

The range of real32 constants is approximately 10^±38 with 6-1/2 digits of precision; the range of real64 values is approximately 10^±308 with approximately 14-1/2 digits of precision; and the range of real80 constants is approximately 10^±4932 with about 18 digits of precision.

Boolean Constants

Boolean constants consist of the two predefined identifiers true and false. Note that your program may redefine these identifiers, but doing so is incredibly bad programming style. Since these are actual identifiers in the symbol table (and not reserved words), you must spell these identifiers in all lower case or HLA will complain (unlike reserved words that are case insensitive).

Internally, HLA represents true with one and false with zero. In fact, HLA's boolean operations only look at bit #0 of the boolean value (and always clear the other bits). HLA compile-time statements that expect a boolean expression do not use zero/not zero like C/C++ and a few other languages. Such expressions must have a boolean type.

Character Constants

Character literals generally consist of a single (graphic) character surrounded by apostrophes. To represent the apostrophe character, use four apostrophes, e.g., ''''.

Another way to specify a character constant is by typing the "#" symbol followed by a numeric literal constant (decimal, hexadecimal, or binary). Examples: #13, #$D, #%1101.

Unicode Character Constants

Unicode character constants are 16-bit values. HLA provides limited support for Unicode literal constants. HLA supports the UTF/7 code point (character set) which is just the standard seven-bit ASCII character set and nine high-order zero bits. To specify a 16-bit literal Unicode constant simply prefix a standard ASCII literal constant with a 'u' or 'U', e.g.,

u'A' - UTF/7 character constant for 'A'

Note that UTF/7 constants are simply the ASCII character codes zero extended to 16 bits.

HLA provides a second syntax for Unicode character constants that lets you enter values whose character codes are outside the range $20..$7E. You can specify a Unicode character constant by its numeric value using the 'u#nnnn' constant form. This form lets you specify a 16-bit value following the '#' in either binary, decimal, or hexadecimal form, e.g.,

u#1233

u#$60F

u#%100100101001

String Constants

String literal constants consist of a sequence of (graphic) characters surrounded by quotes. To embed a quote within a string, insert a pair of quotes into the string, e.g., "He said ""This"" to me."

If two string literal constants are adjacent in a source file (with nothing but whitespace between them), then HLA will concatenate the two strings and present them to the parser as a single string. Furthermore, if a character constant is adjacent to a string, HLA will concatenate the character and string to form a single string object. This is useful, for example, when you need to embed control characters into a string, e.g.,

"This is the first line" #$d #$a "This is the second line" #$d #$a

HLA treats the above as a single string with a Windows newline sequence (CR/LF) at the end of each of the two lines of text.
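A short sketch of both rules (doubled quotes and adjacent-literal concatenation) follows. The constant names twoLines and quoted and the program name are hypothetical; the sketch assumes that a string constant built this way can be passed to stdout.put like any other string literal.

```hla
program StringConstDemo;

#include( "stdio.hhf" );

const
    // Adjacent string and character literals form one string.
    twoLines := "This is the first line" #$d #$a
                "This is the second line" #$d #$a;

    // A pair of quotes inside a string stands for one quote.
    quoted := "He said ""This"" to me.";

begin StringConstDemo;

    stdout.put( twoLines );
    stdout.put( quoted, nl );

end StringConstDemo;
```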

Unicode String Constants

HLA lets you specify Unicode string literals by prefixing a standard string constant with a 'u' or a 'U'. Such string constants use the UTF/7 character set (that is, the standard ASCII character set) but reserve 16 bits for each character in the string. Note that the high order nine bits of each character in the string will contain zero.

As this was being written, there was no support for Unicode strings in the HLA Standard Library, though support for Unicode string functions should appear shortly (note that Windows programmers can call the Unicode string functions that are part of the Windows API).

Character Set Constants

A character set literal constant consists of several comma delimited character set expressions within a pair of braces. The character set expressions can either be individual character values or a pair of character values separated by an ellipsis (".."). If an individual character expression appears within the character set, then that character is a member of the set; if a pair of character expressions, separated by an ellipsis, appears within a character set literal, then all characters between the first such expression and the second expression (inclusive) are members of the set.

Examples:

{'a','b','c'} // a, b, and c.

{'a'..'c'} // a, b, and c.

{'A'..'Z','a'..'z'} // Alphabetic characters.

{' ',#$d,#$a,#$9} // Whitespace (space, return, linefeed, tab).

HLA character sets are currently limited to holding characters from the 128-character ASCII character set. In the future, HLA may support an xcset type (supporting 256 elements) or even wcset (wide character sets), but that support does not currently exist.

Structured Constants

Array Constants

Note: see Array Data Types for more details about HLA array types.

HLA lets you specify an array literal constant by enclosing a set of values within a pair of square brackets. Since array elements must be homogenous, all elements in an array literal constant must be the same type or conformable to the same type. Examples:

[ 1, 2, 3, 4, 9, 17 ]
[ 'a', 'A', 'b', 'B' ]
[ "hello", "world" ]

Note that each item in the list of values can actually be a constant expression, not just a simple literal value.

HLA array constants are always one dimensional. This, however, is not a limitation because if you attempt to use array constants in a constant expression, the only thing that HLA checks is the total number of elements. Therefore, an array constant with eight integers can be assigned to any of the following arrays:

const
a8: int32[8] := [1,2,3,4,5,6,7,8];
a2x4: int32[2,4] := [1,2,3,4,5,6,7,8];
a2x2x2: int32[2,2,2] := [1,2,3,4,5,6,7,8];

Although HLA doesn't support the notation of a multi-dimensional array constant, HLA does allow you to include an array constant as one of the elements in an array constant. If an array constant appears as a list item within some other array constant, then HLA expands the interior constant in place, lengthening the list of items in the enclosing list. E.g., the following three array constants are equivalent:

[ [1,2,3,4], [5,6,7,8] ]
[ [ [1,2], [3,4] ], [ [5,6], [7,8] ] ]
[1,2,3,4,5,6,7,8]

Although the three array constants are identical, as far as HLA is concerned, you might want to use these three different forms to suggest the shape of the array in an actual declaration, e.g.,

const
a8: int32[8] := [1,2,3,4,5,6,7,8];
a2x4: int32[2,4] := [ [1,2,3,4], [5,6,7,8] ];
a2x2x2: int32[2,2,2] := [[[1,2], [3,4] ], [[5,6], [7,8]]];

Also note that symbolic array constants, not just literal array constants, may appear in a literal array constant. For example, the following literal array constant creates a nine-element array holding the values one through nine at indexes zero through eight:

const Nine: int32[ 9 ] := [ a8, 9 ];

This assumes, of course, that "a8" was previously declared as above. Since HLA "flattens" all array constants, you could have substituted a2x4 or a2x2x2 for a8 in the example above and obtained identical results.

You may also create an array constant using the HLA DUP operator. This operator uses the following syntax:

expression DUP [expression_to_replicate]

Where expression is an integer expression and expression_to_replicate is some expression, possibly an array constant. HLA generates an array constant by repeating the values in the expression_to_replicate the number of times specified by the expression. (Note: this does not create an array with expression elements unless the expression_to_replicate contains only a single value; it creates an array whose element count is expression times the number of items in the expression_to_replicate.) Examples:

10 dup [1] -- equivalent to [1,1,1,1,1,1,1,1,1,1]

5 dup [1,2] -- equivalent to [1,2,1,2,1,2,1,2,1,2]
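As a sketch of how DUP constants might be used in practice, the following program initializes two static arrays and prints one of them. The names DupDemo, ones, and pairs are hypothetical. Note that both arrays are declared with ten elements, since 5 dup [1,2] yields five copies of a two-item list.

```hla
program DupDemo;

#include( "stdio.hhf" );

static
    ones:   int32[10] := 10 dup [1];    // ten elements, all 1
    pairs:  int32[10] := 5 dup [1,2];   // 1,2 repeated five times

begin DupDemo;

    // Walk the pairs array; each int32 element is four bytes wide.
    for( mov( 0, ebx ); ebx < 10; inc( ebx )) do

        stdout.put( pairs[ ebx*4 ], ' ' );

    endfor;
    stdout.newln();

end DupDemo;
```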

Please note that HLA does not allow class constants, so class objects may not appear in array constants. Also, HLA does not allow generic pointer constants, only certain types of pointer constants are legal. See the discussion of pointer constants for more details.

Record Constants

Note: see Record Data Types for details about HLA Records.

HLA supports record constants using a syntax very similar to array constants. You enclose a comma-separated list of values for each field in a pair of square brackets. To further differentiate array and record constants, the name of the record type and a colon must precede the opening square bracket, e.g.,

Planet:[ 1, 12, 34, 1.96 ]

HLA associates the items in the list with the fields as they appear in the original record declaration. In this example, the values 1, 12, 34, and 1.96 are associated with fields x, y, z, and density, respectively. Of course, the types of the individual constants must match (or be conformable to) the types of the individual fields.
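One likely use of such a constant is to initialize a static record variable. The sketch below repeats the Planet declaration from the Record Data Types section so that it stands alone; the variable name earth and the program name are hypothetical.

```hla
program RecordConstDemo;

#include( "stdio.hhf" );

type
    Planet: record
        x:          int32;
        y:          int32;
        z:          int32;
        density:    real64;
    endrecord;

static
    // Values are matched to fields in declaration order:
    // x=1, y=12, z=34, density=1.96.
    earth: Planet := Planet:[ 1, 12, 34, 1.96 ];

begin RecordConstDemo;

    stdout.put( "x=", earth.x, " y=", earth.y, " z=", earth.z, nl );

end RecordConstDemo;
```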

Note that you may not create a record constant for a particular record type if that record includes data types that cannot have compile-time constants associated with them. For example, if a field of a record is a class object, you cannot create a record constant for that type since you cannot create class constants.

Union Constants

Note: see Union Data Types for more details about HLA's UNION types.

Starting with HLA v1.38, HLA supports union constants. These union constants allow you to initialize static union data structures in memory as well as initialize union fields of other data structures (including anonymous union fields in records). This section describes the syntax you'll use to create union constants.

There are some important differences between HLA compile-time union constants and HLA run-time unions (as well as between HLA compile-time union constants and unions in other languages). Therefore, it's a good idea to begin the discussion of HLA's union constants with a description of these differences.

There are a couple of different reasons people use unions in a program. The original reason was to share a sequence of memory locations between various fields whose access is mutually exclusive. When using a union in this manner, one never reads the data from a field unless they've previously written data to that field and there are no intervening writes to other fields between that previous write and the current read. The HLA compile-time language fully (and only) supports this use of union objects.

A second reason people use unions (especially in high level languages) is to provide aliases to a given memory location; particularly, aliases whose data types are different. In this mode, a programmer might write a value to one field and then read that data using a different field (in order to access that data's bit representation as a different type). HLA does not support this type of access to union constants. The reason is quite simple: internally, HLA uses a special "variant" data type to represent all possible constant types. Whenever you create a union constant, HLA lets you provide a value for a single data field. From that point forward, HLA effectively treats the union constant as a scalar object whose type is the same as the field you've initialized; access to the other fields through the union constant is no longer possible. Therefore, you cannot use HLA compile-time constants to do type coercion; nor is there any need to since HLA provides a set of type coercion operators like @byte, @word, @dword, @int8, etc. As noted above, the main purpose for providing HLA union constants is to allow you to initialize static union variables; since you can only store one value into a memory location at a time, union constants only need to be able to represent a single field of the union at one time (of course, at run-time you may access any field of the static union object you've created; but at compile-time you may only access the single field associated with a union constant).

An HLA literal union constant takes the following form:

typename.fieldname:[ constant_expression ]

The typename component above must be the name of a previously declared HLA union data type (i.e., a union type you've created in the type section). The fieldname component must be the name of a field within that union type. The constant_expression component must be a constant value (expression) whose type is the same as, or is automatically coercible to, the type of the fieldname field. Here is a complete example:

type
    u:union
        b:byte;
        w:word;
        d:dword;
        q:qword;
    endunion;

static
    uVar :u := u.w:[$1234];

The declaration for uVar initializes the first two bytes of this object in memory with the value $1234. Note that uVar is actually eight bytes long; HLA automatically zeros any unused bytes when initializing a static memory object with a union constant.
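
The zero fill described above can be pictured with a short Python sketch. Python is used here only to model the memory image; it is not HLA code. The sketch assumes the little-endian byte order of the 80x86, and union_image is a hypothetical helper name, not something HLA provides:

```python
import struct

def union_image(field_format: str, value: int, union_size: int) -> bytes:
    """Pack one field little-endian, then zero-fill up to the union's size."""
    data = struct.pack("<" + field_format, value)
    return data.ljust(union_size, b"\x00")

# Models u.w:[$1234] for a union whose largest field is a qword (8 bytes).
image = union_image("H", 0x1234, 8)
print(image.hex())  # 3412000000000000
```

The first two bytes hold $1234 (low byte first) and the remaining six bytes are zero, matching the description of uVar.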

Note that you may place a literal union constant in records, arrays, and other composite data structures. The following is a simple example of a record constant that has a union as one of its fields:

type
    r :record
        b:byte;
        uf:u;
        d:dword;
    endrecord;

static
    sr :r := r:[0, u.d:[$1234_5678], 12345];

In this example, HLA initializes the sr variable with the byte value zero, followed by a dword containing $1234_5678 and a dword containing zero (to pad out the remainder of the union field), followed by a dword containing 12345.

You can also create records that have anonymous unions in them and then initialize a record object with a literal constant. Consider the following type declaration with an anonymous union:

type
    rau :record
        b:byte;
        union
            c:char;
            d:dword;
        endunion;
        w:word;
    endrecord;

Since anonymous unions within a record do not have a type associated with them, you cannot use the standard literal union constant syntax to initialize the anonymous union field (that syntax requires a type name). Instead, HLA offers you two choices when creating a literal record constant with an anonymous union field. The first alternative is to use the reserved word union in place of a typename when creating a literal union constant, e.g.,

static
    srau :rau := rau:[ 1, union.d:[$12345], $5678 ];

The second alternative is a shortcut notation. HLA allows you to simply specify a value that is compatible with the first field of the anonymous union and HLA will assign that value to the first field and ignore any other fields in the union, e.g.,

static
    srau2 :rau := rau:[ 1, 'c', $5678 ];

This is slightly dangerous since HLA relaxes type checking a bit here, but when creating tables of record constants, this is very convenient if you generally provide values for only a single field of the anonymous union; just make sure that the commonly used field appears first and you're in business.

Although HLA allows anonymous records within a union, there is no syntactically acceptable way to differentiate anonymous record fields from other fields in the union; therefore, HLA does not allow you to create union constants if the union type contains an anonymous record. The easy workaround is to create a named record field and specify the name of the record field when creating a union constant, e.g.,

type
    r :record
        c:char;
        d:dword;
    endrecord;

    u :union
        b:byte;
        x:r;
        w:word;
    endunion;

static
    y :u := u.x:[ r:[ 'a', 5]];

You may declare a union constant and then assign data to the specific fields as you would a record constant. The following example provides some samples of this:

type
    u_t :union
        b:byte;
        x:r;
        w:word;
    endunion;

val
    u :u_t;

?u.b := 0;
?u.w := $1234;

The two assignments above are roughly equivalent to the following:

?u := u_t.b:[0];

and

?u := u_t.w:[$1234];

However, to use the straight assignment (the former example) you must first declare the value u as a u_t union.

To access a value of a union constant, you use the familiar "dot notation" from records and other languages, e.g.,

?x := u.b;

?y := u.w & $FF00;

Note, however, that you may only access the last field of the union into which you've stored some value. If you store data into one field and attempt to read the data from some other field of the union, HLA will report an error. Remember, you don't use union constants as a sneaky way to coerce one value's type to another (use the coercion functions for that purpose).
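
The "last field stored" rule can be modeled in a few lines of Python. This is only a toy model of the behavior described above, not HLA's internal variant type; the UnionConst class and its store/load methods are hypothetical names chosen for the illustration:

```python
class UnionConst:
    """Toy model: a union constant remembers only the most recently stored field."""

    def __init__(self, fields):
        self._fields = set(fields)
        self._active = None   # name of the field last stored
        self._value = None

    def store(self, field, value):
        if field not in self._fields:
            raise KeyError(field)
        self._active, self._value = field, value

    def load(self, field):
        # Reading any field other than the last one stored is an error.
        if field != self._active:
            raise TypeError(f"field '{field}' is not the field last assigned")
        return self._value

u = UnionConst(["b", "x", "w"])
u.store("b", 0)          # models ?u.b := 0;
u.store("w", 0x1234)     # models ?u.w := $1234;
print(hex(u.load("w") & 0xFF00))  # 0x1200
```

After the second store, asking for field b raises an error in the model, just as HLA reports an error for the corresponding constant expression.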

Pointer Constants

Note: see Pointer Types for more details about HLA pointer types.

HLA allows a very limited form of a pointer constant. If you place an ampersand in front of a static object's name (i.e., the name of a static variable, readonly variable, uninitialized variable, segment variable, procedure, method, or iterator), HLA will compute the run-time offset of that variable. Pointer constants may not be used in arbitrary constant expressions. You may only use pointer constants in expressions used to initialize static or readonly variables or as constant expressions in 80x86 instructions. The following example demonstrates how pointer constants can be used:

program pointerConstDemo;

static
t:int32;
pt: pointer to int32 := &t;

begin pointerConstDemo;

mov( &t, eax );

end pointerConstDemo;

Also note that HLA allows the use of the reserved word NULL anywhere a pointer constant is legal. HLA substitutes the value zero for NULL.

Constant Expressions in HLA

HLA provides a rich expression evaluator to process assembly-time expressions. HLA supports the following operators (sorting by decreasing precedence):

! (unary not), - (unary negation)

*, div, mod, /, <<, >>

+, -

=, ==, <>, !=, <=, >=, <, >

&, |, ^, in

The following subsections describe each of these operators in detail.

Type Checking and Type Promotion

Many dyadic (two-operand) operators expect the types of their operands to be the same. Prior to actually performing such an operation, HLA evaluates the types of the operands and attempts to make them compatible. HLA uses a type algebra to determine if two (different) types are compatible; if they are not, HLA will report a type mismatch error during assembly. If the types are compatible, HLA will make them identical via type promotion. The type algebra describes how HLA promotes one type to another in order to make the two types compatible.

Usually, you can state a type algebra easily enough by providing "algebraic" type equations. For example, in high level languages one could use a statement like "r = r + i" to suggest that the type of the resulting sum is real when the left operand is real and the right operand is integer (around the "+" operator). Unfortunately, HLA supports so many different data types and operators that any attempt to describe the type algebra in this fashion will produce so many equations that it would be difficult for the reader to absorb it all. Therefore, this document will rely on an informal English description of the type algebra to explain how HLA operates.

First of all, if two operands have the same basic type, but are different sizes, HLA promotes the smaller object to the same size as the larger object. Basic types include the following sets: {uns8, uns16, uns32, uns64, uns128}, {int8, int16, int32, int64, int128}, {byte, word, dword, qword, lword}, and {real32, real64, real80}. So if any two operands appear from one of these sets, then both operands are promoted to the larger of the two types.

If an unsigned and a signed operand appear around an operator, HLA produces a signed result. If the unsigned operand is smaller than the signed operand, HLA assigns both operands the signed type prior to the operation. If the unsigned and signed operands are the same size (or the unsigned operand is larger), HLA will first check the H.O. bit of the unsigned operand. If it is set, then HLA promotes the unsigned operand to the next larger signed type (e.g., uns16 becomes int32 ). If the resulting signed type is larger than the other operand's type, it gets promoted as well. This scheme fails if you've got an uns128 value whose H.O. bit is set. In that case, HLA promotes both operands to int128 and will produce incorrect results (since the uns128 value just went negative when it's really positive). Therefore, you should attempt to limit unsigned values to 127 bits if you're going to be mixing signed and unsigned operations in the same expression.
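
The signed/unsigned rule is easier to follow as a small decision procedure. The Python sketch below models only the size of the signed result type; the function name is hypothetical, and the case where the unsigned operand's H.O. bit is clear (the unsigned operand simply takes the signed type of its own size) is an assumption, since the text above does not spell it out:

```python
def mixed_signed_bits(uns_bits: int, uns_value: int, int_bits: int) -> int:
    """Size in bits of the signed type used when mixing unsXX with intYY."""
    if uns_bits < int_bits:
        return int_bits                    # both operands take the signed type
    ho_set = (uns_value >> (uns_bits - 1)) & 1
    if ho_set and uns_bits < 128:
        uns_bits *= 2                      # e.g. uns16 becomes int32
    # An uns128 value with its H.O. bit set has nowhere to grow: int128 is
    # used anyway, which is the failure case the text warns about.
    return max(uns_bits, int_bits)

print(mixed_signed_bits(8, 200, 16))       # 16
print(mixed_signed_bits(16, 0x8000, 16))   # 32
print(mixed_signed_bits(128, 1 << 127, 8)) # 128
```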

Any mixture of hexadecimal types (byte, word, dword, qword, or lword) and an unsigned type produces an unsigned type; the size of the resulting unsigned type will be the larger of the two types. Likewise, any mixture of hexadecimal types and signed integer types will produce a signed integer whose size is the larger of the two types. This "strengthening" of the type (hexadecimal types are "weaker" than signed or unsigned types) may seem counter-intuitive to a die-hard assembly programmer; however, making the result type hexadecimal rather than signed/unsigned can create problems if the result has the H.O. bit set since information about whether the result is signed or unsigned would be lost at that point.
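
As a rough illustration of the size promotion and "strengthening" rules, the following Python sketch computes the result type for two integer operand types. It is a model of the rules as stated, not HLA's implementation; promote and the table names are hypothetical. Signed/unsigned mixes are deliberately left out because they depend on the operand's value:

```python
HEX_NAMES = {8: "byte", 16: "word", 32: "dword", 64: "qword", 128: "lword"}

# Map each type name to (class, size in bits).
TYPES = {}
for bits, hex_name in HEX_NAMES.items():
    TYPES[f"uns{bits}"] = ("unsigned", bits)
    TYPES[f"int{bits}"] = ("signed", bits)
    TYPES[hex_name] = ("hex", bits)

def type_name(cls: str, bits: int) -> str:
    if cls == "hex":
        return HEX_NAMES[bits]
    return ("uns" if cls == "unsigned" else "int") + str(bits)

def promote(a: str, b: str) -> str:
    """Result type when types a and b meet around a dyadic operator."""
    cls_a, bits_a = TYPES[a]
    cls_b, bits_b = TYPES[b]
    bits = max(bits_a, bits_b)             # always the larger of the two sizes
    if cls_a == cls_b:
        return type_name(cls_a, bits)
    if cls_a == "hex":                     # hex is "weaker": take the other class
        return type_name(cls_b, bits)
    if cls_b == "hex":
        return type_name(cls_a, bits)
    raise ValueError("signed/unsigned mixes depend on the operand's value")

print(promote("uns8", "uns32"))   # uns32
print(promote("byte", "uns16"))   # uns16
print(promote("dword", "int8"))   # int32
```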

Mixing unsigned values and a real32 value will produce a real32 result or an error. HLA produces an error if the unsigned value requires more than 24 bits to represent exactly (24 bits is the most an unsigned value may use and still be represented exactly within the real32 format). Note that in addition to promoting the unsigned type to real32, HLA will also convert the unsigned value to a real32 value. Promoting the type is not the same thing as converting the value: promoting uns8 to uns16 simply involves changing the type designation of the uns8 object, and HLA doesn't have to deal with the actual value at all since it keeps all values in an internal 128-bit format. The binary representations of unsigned and real32 values, however, are completely different, so HLA must do the value conversion as well. If you really want to convert a value that requires more than 24 bits of precision to a real32 object (with truncation), just convert the unsigned operand to real64 or real80 and then convert the larger operand to real32 using the real32(expr) compile-time function. Since unsigned values are, well, unsigned and real32 objects are signed, the conversion process always produces a non-negative value.
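
To see why the limit sits at 24 bits, consider this Python sketch. The function int_to_real is a hypothetical model of the check, reading "requires more than 24 bits" as a test on the value's bit length; the round trip through struct uses the standard IEEE single-precision format, which is assumed here to be what real32 denotes:

```python
import struct

def int_to_real(value: int, precision_bits: int = 24) -> float:
    """Convert an integer constant, refusing values that need too many bits."""
    if abs(value).bit_length() > precision_bits:
        raise OverflowError(f"{value} needs more than {precision_bits} bits")
    return float(value)

print(int_to_real(16_777_215))   # 16777215.0 (the largest 24-bit value)

# Forcing a 25-bit value into single precision silently changes it:
forced = struct.unpack("<f", struct.pack("<f", 16_777_217.0))[0]
print(forced)                    # 16777216.0
```

Passing precision_bits=52 or precision_bits=64 gives the analogous check for the real64 and real80 limits quoted in this section.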

Mixing signed and real32 values in an expression produces a real32 result. Like unsigned operands, signed operands are limited to 24 bits of precision or HLA will report an error. Technically, you should get one more bit of precision from signed operands (since the real32 format maintains its sign apart from the mantissa), but HLA still limits you to 24 bits during this conversion. If the signed integer value is negative, so will be the real32 result.

If you mix hexadecimal and real32 types, HLA treats the hexadecimal type as an unsigned value of the same size. See the discussion of unsigned and real32 values earlier for the details.

If you mix an unsigned, signed, or hexadecimal type with a real64 type, the result is an error (if HLA cannot exactly represent the value in real64 format) or a real64 result. The conversion is very similar to the real32 conversion discussed above except you get 52 bits of integer precision rather than only 24 bits.

If you mix an unsigned, signed, or hexadecimal type with a real80 type, the result is an error (if HLA cannot exactly represent the value in real80 format) or a real80 result. The conversion is very similar to the real32 conversion discussed above except you get 64 bits of integer precision rather than only 24 bits. Note that conversion of integer objects 64-bits or less will always proceed without error; 128-bit values are the only ones that will get you into trouble.

If you mix a pair of different sized real values in the same expression, HLA will promote (and convert) the smaller real value to the same size as the larger real value.

The only non-numeric promotions that take place in an expression are between characters and strings. If a character and a string both appear in an expression, HLA will promote the character to a string of length one.

HLA will report a type mismatch error if objects of any other types appear within an expression. Note that you may use the type-coercion compile-time functions to convert between types that HLA does not automatically support in an expression.

!expr

The expression must be either boolean or a number. For boolean values, not computes the standard logical not operation. Numerically, HLA inverts only the L.O. bit of boolean values and clears the remaining bits of the boolean value. Therefore, the result is always zero or one when NOTting a boolean value (even if the boolean object errantly contained other set bits prior to the "!" operation). Remember, the "!" operator only looks at the L.O. bit; if the value was originally non-zero but the L.O. bit was clear, then "!" produces true. This is not a zero/not-zero operation.
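
The difference between "invert the L.O. bit" and a zero/not-zero test shows up as soon as a boolean holds a stray value such as 2. A minimal Python model (bool_not is a hypothetical name used only for this illustration):

```python
def bool_not(value: int) -> int:
    """Model of '!' on a boolean: invert bit 0, clear every other bit."""
    return (value & 1) ^ 1

for v in (0, 1, 2, 3):
    print(v, bool_not(v))
# 0 1
# 1 0
# 2 1   <- non-zero input, but its L.O. bit is clear, so the result is true
# 3 0
```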

For numbers, not computes the bitwise not operation on the bits of the number, that is, it inverts all the bits. The exact semantics of this operation depend upon the original data type of the value you're inverting. Therefore, the result of applying the "!" operator to an integer number may not always be intuitive because HLA always maintains 128-bits of precision, regardless of the underlying data type. Therefore, a full explanation of this operator's semantics must be given on a type-by-type basis.

uns8 : Bits 8..127 of an uns8 object are always zero. Therefore, when you apply the "!" operator to an uns8 value, the result can no longer be an uns8 object since bits 8..127 will now contain ones. Zeroing out the H.O. bits is not wise, because you could be assigning the result of this expression to a larger data type and you may very well expect those bits to be set. Therefore, HLA converts the type of "!u8expr" to type byte (which does allow the H.O. bits to contain non-zero values). If you assign an object of type byte to a larger object (e.g., type word), HLA will copy over the H.O. bits from the byte object to the larger object. Example:

val
    u8 :uns8 := 1;
    b8 := !u8;        // produces $FFF..FFFE but registers as byte $FE.
    w16 :word := b8;  // produces $FF..FFFE but registers as word $FFFE.

Note: If you really want to chop the value off at eight bits, you can use the compile-time byte function to achieve this, e.g.,

val
    u8 :uns8 := 1;
    b8 := byte(!u8);  // produces $FE.
    w16 :word := b8;  // produces $00FE.

uns16 : The semantics are similar to uns8 except, of course, applying "!" to an uns16 value produces a word value rather than a byte value. Again, the "!" operator will set bits 16..127 to one in the final result. If you want to ensure that the final result contains no set bits beyond bit #15, use the compile-time word function to strip the value down to 16 bits (just like the byte function in the example above).

uns32 : The semantics are similar to uns8 except, of course, applying "!" to an uns32 value produces a dword value rather than a byte value. Again, the "!" operator will set bits 32..127 to one in the final result. If you want to ensure that the final result contains no set bits beyond bit #31 use the compile-time dword function to strip the value down to 32 bits (just like the byte function in the example above).

uns64 : The semantics are similar to uns8 except, of course, applying "!" to an uns64 value produces a qword value rather than a byte value. Again, the "!" operator will set bits 64..127 to one in the final result. If you want to ensure that the final result contains no set bits beyond bit #63 use the compile-time qword function to strip the value down to 64 bits (just like the byte function in the example above).

uns128 : Applying the "!" operator to an uns128 object simply inverts all the bits. There are no funny semantics here. The resulting expression type is set to lword.

int8 : Same semantics as byte (see explanation below).

int16 : Same semantics as word (see explanation below).

int32 : Same semantics as dword (see explanation below).

int64 : Same semantics as qword (see explanation below).

int128 : Applying the "!" operator to an int128 object simply inverts all the bits. There are no funny semantics here. The resulting expression type is set to lword.

byte : Bits 8..127 of a byte ( int8 ) value must all be zeros or all ones. The "!" operator enforces this. If any of the H.O. bits are non-zero, the "!" operator sets them all to zero in the result; if all of the H.O. bits are zero, the "!" operator sets the H.O. bits to ones in the result. Of course, this operator inverts bits 0..7 in the original value and returns this inverted result. Note that the type of the new value is always byte (even if the original subexpression was int8 ).

word : Bits 16..127 of a word ( int16 ) value must all be zeros or all ones. The "!" operator enforces this. If any of the H.O. bits are non-zero, the "!" operator sets them all to zero in the result; if all of the H.O. bits are zero, the "!" operator sets the H.O. bits to ones in the result. Of course, this operator inverts bits 0..15 in the original value and returns this inverted result. Note that the type of the new value is always word (even if the original subexpression was int16 ).

dword : Bits 32..127 of a dword (int32) value must all be zeros or all ones. The "!" operator enforces this. If any of the H.O. bits are non-zero, the "!" operator sets them all to zero in the result; if all of the H.O. bits are zero, the "!" operator sets the H.O. bits to ones in the result. Of course, this operator inverts bits 0..31 in the original value and returns this inverted result. Note that the type of the new value is always dword (even if the original subexpression was int32).

qword : Bits 64..127 of a qword (int64) value must all be zeros or all ones. The "!" operator enforces this. If any of the H.O. bits are non-zero, the "!" operator sets them all to zero in the result; if all of the H.O. bits are zero, the "!" operator sets the H.O. bits to ones in the result. Of course, this operator inverts bits 0..63 in the original value and returns this inverted result. Note that the type of the new value is always qword (even if the original subexpression was int64).

lword : Applying the "!" operator to an lword object simply inverts all the bits. There are no funny semantics here.

No other types are legal with the "!" operator. HLA will report a type conflict error if you attempt to apply this operator to some other type.
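
The type-by-type rules for "!" can be checked against a small Python model of 128-bit constants. This is a sketch of the semantics as described above, with hypothetical function names; it is not taken from the HLA sources:

```python
MASK128 = (1 << 128) - 1  # every integer constant is kept in 128 bits

def not_unsigned(value: int) -> int:
    """'!' on an unsXX value: invert all 128 bits (the result is typed as hex)."""
    return ~value & MASK128

def not_hex(value: int, bits: int) -> int:
    """'!' on a byte/word/dword/qword value that is `bits` wide."""
    low_mask = (1 << bits) - 1
    low = ~value & low_mask                  # invert the bits of the type itself
    high_all_zero = (value >> bits) == 0     # were the H.O. bits all zero?
    high = (MASK128 & ~low_mask) if high_all_zero else 0
    return high | low

b8 = not_unsigned(1)        # models b8 := !u8 with u8 = 1
print(hex(b8 & 0xFF))       # 0xfe    (what the byte "registers as")
print(hex(b8 & 0xFFFF))     # 0xfffe  (what a word receiving b8 sees)

truncated = b8 & 0xFF       # models byte(!u8)
print(hex(truncated))       # 0xfe, with every H.O. bit now zero
```

In this model, applying not_hex twice to a well-formed byte value returns the original value, which is the consistency the "all zeros or all ones" rule is there to preserve.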

If the operand is one of the integer types (signed, unsigned, hexadecimal), then HLA will set the type of the result to the smallest type within that class (signed, unsigned, or hexadecimal) that can hold the result (not including sign extension bits for negative numbers or zero extension bits for non-negative values).

- expr (unary negation operator)

The expression must either be a numeric value or a character set. For numeric values, "-" negates the value. For character sets, the "-" operator computes the complement of the character set (that is, it returns all the characters not found in the set).

Again, the exact semantics depend upon the type of the expression you're negating. The following paragraphs explain exactly what this operator does to its expression. For all integer values (unsXX, intXX, byte, word, dword, qword, and lword), the negation operator always does a full 128-bit negation of the supplied operand. The difference between these different data types is how HLA sets the resulting type of the expressions (as explained in the paragraphs below).

uns8 : If the original value was in the range 128..255, then the resulting type is int16 , otherwise the resulting type is int8 . Since uns8 values are always positive, the negated result is always negative, hence the result type is always a signed integer type.

uns16 : If the original value was in the range 32768..65535, then the resulting type is int32, otherwise the resulting type is int16. Since uns16 values are always positive, the negated result is always negative, hence the result type is always a signed integer type.

uns32 : If the original value was in the range $8000_0000..$FFFF_FFFF, then the resulting type is int64 , otherwise the resulting type is int32 . Since uns32 values are always positive, the negated result is always negative, hence the result type is always a signed integer type.

uns64 : If the original value was in the range $8000_0000_0000_0000..$FFFF_FFFF_FFFF_FFFF, then the resulting type is int128 , otherwise the resulting type is int64 . Since uns64 values are always positive, the negated result is always negative, hence the result type is always a signed integer type.

uns128 : The result type is always set to int128 . Note that there is no check for overflow. Effectively, HLA treats uns128 operands as though they were int128 operands with respect to negation. So really large positive ( uns128 ) values become smaller unsigned values after the negation. If you need to mix and match 128-bit values in an expression, you should attempt to limit your unsigned values to 127 bits.

byte, int8, word, int16, dword, int32, qword, int64, lword, int128 : Negates the expression (full 128 bits) and assigns the original expression type to the result.

real32 : Negates the real32 value and returns a real32 result.

real64 : Negates the real64 value and returns a real64 result.

cset : Computes the set complement (returns cset type). The set complement is all the items that were not previously in the set. Since HLA uses a bitmap representation for character sets, the complement of a character set is the same thing as inverting all the bits in the powerset.
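
The result-type rule for negating unsigned values, given in the unsXX entries above, reduces to a test of the operand's H.O. bit. A hedged Python sketch (negate_unsigned is a hypothetical name; the model follows the ranges exactly as listed):

```python
def negate_unsigned(value: int, bits: int):
    """Return (negated value, result type name) for an unsXX operand."""
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit the stated unsigned type")
    ho_set = value >= (1 << (bits - 1))      # e.g. 128..255 for uns8
    result_bits = min(bits * 2, 128) if ho_set else bits
    return -value, f"int{result_bits}"

print(negate_unsigned(100, 8))     # (-100, 'int8')
print(negate_unsigned(200, 8))     # (-200, 'int16')
print(negate_unsigned(32768, 16))  # (-32768, 'int32')
```

For uns128 the model always answers int128, which is why very large uns128 operands cannot be negated meaningfully.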

expr1 * expr2

For numeric operands, the "*" operator produces their product. For character set operands, the "*" operator produces the intersection of the two sets. The exact result depends upon the types of the two operands to the "*" operator. To begin with, HLA attempts to make the types of the two operands identical if they are not already identical. HLA achieves this via type promotion (see the discussion earlier).

If the operands are unsigned or hexadecimal operands, HLA will compute their unsigned product. If the operands are signed, HLA computes their signed product. If the operands are real, HLA computes their real product. If the operands are integer (signed or unsigned) and less than (or equal to) 64 bits, HLA computes their exact result. If the operands are greater than 64 bits and their product would require more than 128 bits, HLA quietly overflows without error. Note that HLA always performs a 128-bit multiplication, regardless of the operands' sizes; however, objects that require 64 bits or less of precision will always produce a product that is 128 bits or less. HLA automatically extends the size of the result to the next greater size if the product will not fit into an integer that is the same size as the operands. HLA will actually choose the smallest possible size for the product (e.g., if the result only requires 16 bits of precision, the resulting type will be uns16, int16 , or word ). The resulting type is always unsigned if the operands were unsigned, signed if the operands were signed, and hexadecimal if the operands were hexadecimal.
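
For unsigned operands, the behavior described above amounts to "multiply in 128 bits, then pick the smallest type that holds the product". The following Python sketch models that under those stated rules; the function names are hypothetical:

```python
MASK128 = (1 << 128) - 1

def smallest_unsigned(value: int) -> str:
    """Smallest unsXX type that holds a non-negative 128-bit value."""
    for bits in (8, 16, 32, 64, 128):
        if value < (1 << bits):
            return f"uns{bits}"
    raise ValueError("value exceeds 128 bits")

def mul_unsigned(a: int, b: int):
    product = (a * b) & MASK128    # anything beyond bit 127 is quietly lost
    return product, smallest_unsigned(product)

print(mul_unsigned(3, 4))      # (12, 'uns8')
print(mul_unsigned(200, 2))    # (400, 'uns16')
print(mul_unsigned(1 << 100, 1 << 100))  # (0, 'uns8'): silent overflow
```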

If the operands are real operands, HLA computes their product and always produces a real80 result. If you want to produce a smaller result via the '*' operator, use the real32 or real64 compile-time function to produce the smaller result, e.g., " real32( r32const * r32const2 ) ". Note that all real arithmetic inside HLA is always performed using the FPU, hence the results are always real80 . Other than trying to simulate the actual products a running program would produce, there is no real reason to coerce the product to a smaller value.

If the operands are character set operands, the '*' operator computes the intersection of the two sets. Since HLA uses a bitmap representation for character sets, this operator does a bitwise logical AND of the two 16-byte operands (this operation is roughly equivalent to applying the "&" operator to two lword operands).

expr1 div expr2

The two expressions must be integer (signed, unsigned, or hexadecimal) numbers. Supplying any other data type as an operand will produce an error. The div operator divides the first expression by the second and produces the truncated quotient result.

If the operands are unsigned, HLA will do a full 128/128 bit division and the resulting type will be unsigned (HLA sets the type to the smallest unsigned type that will completely hold the result). If the operands are signed, HLA will do a full 128/128 bit signed division and the resulting type will be the smallest intXX type that can hold the result. If the operands are hexadecimal values, HLA will do a full 128/128 bit unsigned division and set the resulting type to the smallest hex type that can hold the result.

Note that the div operator does not allow real operands. Use the "/" operator for real division.

HLA will set the type of the result to the smallest type within its class (signed, unsigned, or hexadecimal) that can hold the result (not including sign extension bits for negative numbers or zero extension bits for non-negative values).
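
If you model div in another language, watch out for the rounding direction. The Python sketch below assumes that "truncated quotient" means rounding toward zero (as the 80x86 idiv instruction does); that reading is an assumption, and hla_div is a hypothetical name. Python's own // operator rounds toward negative infinity instead:

```python
def hla_div(a: int, b: int) -> int:
    """Integer division that truncates toward zero (assumed div semantics)."""
    quotient = abs(a) // abs(b)
    return -quotient if (a < 0) != (b < 0) else quotient

print(hla_div(7, 2))            # 3
print(hla_div(-7, 2), -7 // 2)  # -3 -4
```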

expr1 mod expr2

The two expressions must be integer (signed, unsigned, or hexadecimal) numbers. The mod operator divides the first expression by the second and produces their remainder (this value is always positive).

If the operands are unsigned, HLA will do a full 128/128 bit division and return the remainder. The resulting type will be unsigned (HLA sets the type to the smallest unsigned type that will completely hold the result).

If the operands are signed, HLA will do a full 128/128 bit signed division and return the remainder. The resulting type will be the smallest intXX type that can hold the result.

If the operands are hexadecimal values, HLA will do a full 128/128 bit unsigned division, return the remainder, and set the resulting type to the smallest hex type that can hold the result.

Note that the mod operator does not allow real operands. You'll have to define real modulus and write the expression yourself if you need the remainder from a real division.

expr1 / expr2

The two expressions must be numeric. The '/' operator divides the first expression by the second and produces their (real80) quotient result.

If the operands are integers (unsigned, signed, or hexadecimal) or the operands are real32 or real64, HLA first converts them to real80 before doing the division operation. The expression result is always real80.

expr1 << expr2

The two expressions must be integer (signed, unsigned, or hexadecimal) numbers. The second operand must be a small (32-bit or less) non-negative value in the range 0..128. The << operator shifts the first expression to the left the number of bits specified by the second expression. Note that the result may require more bits to hold than the original type of the left operand. Any bits shifted out of bit position 127 are lost.

HLA will set the type of the result to the smallest type within the left operand's class (signed, unsigned, or hexadecimal) that can hold the result (not including sign extension bits for negative numbers or zero extension bits for non-negative values). Note that the '<<' operator can yield a smaller type (specifically, an eight bit type) if it shifts all the bits off the H.O. end of the number; generally, though, this operation produces larger result types than the left operand.

expr1 >> expr2

The two expressions must be integer (signed, unsigned, or hexadecimal) numbers. The second operand must be a small (32-bit or less) non-negative value in the range 0..128. The >> operator shifts the first expression to the right the number of bits specified by the second expression. Any bits shifted out of the L.O. bit are lost. Note that this shift is a logical shift right, not an arithmetic shift right (this is true even if the left operand is an INTxx value). Therefore, this operation always shifts a zero into bit position 127.

Right shifts may produce a smaller type than that of the left operand. HLA will always set the type of the result value to the minimum type size that has the same base class as the left operand.
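
Both shift operators can be modeled on a 128-bit value in Python. This sketch follows the description above (bits leaving either end are lost, and the right shift is logical even for signed operands); shl128 and shr128 are hypothetical names:

```python
MASK128 = (1 << 128) - 1

def _check_count(count: int) -> None:
    if not 0 <= count <= 128:
        raise ValueError("shift count must be in the range 0..128")

def shl128(value: int, count: int) -> int:
    _check_count(count)
    return ((value & MASK128) << count) & MASK128   # bits past 127 are lost

def shr128(value: int, count: int) -> int:
    _check_count(count)
    return (value & MASK128) >> count               # zeros enter at bit 127

print(hex(shl128(0x80, 1)))    # 0x100: the result needs a larger type
print(shl128(1 << 127, 1))     # 0: the only set bit fell off the H.O. end
print(shr128(-1, 127))         # 1: a logical shift, not an arithmetic one
```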

expr1 + expr2

If the two expressions are numeric, the "+" operator produces their sum.

If the two expressions are strings or characters, the "+" operator produces a new string by concatenating the right expression to the end of the left expression.

If the two operands are character sets, the "+" operator produces their union.

If the operands are integer values (signed, unsigned, or hexadecimal), then HLA adds them together. Any overflow out of bit #127 (unsigned or hexadecimal) or bit #126 (signed) is quietly lost. HLA sets the type of the result to the smallest type size that will hold the sum; the type class (signed, unsigned, hexadecimal) will be the same as the operands. Note that it is possible for the type size to grow or shrink depending on the values of the operands (e.g., adding a positive and negative number could reduce the type size, adding two positive or two negative numbers may expand the result type's size).

When adding two real values (or a real and an integer value), HLA always produces a real80 result.

Since HLA uses a bitmap to represent character sets, taking the union of two character sets is the same as doing a bitwise logical OR of all 16 bytes in the character set.

expr1 - expr2

If the two expressions are numeric, the "-" operator produces their difference.

If the two expressions are character sets, the "-" operator produces their set difference (that is, all the characters in expr1 that are not also in expr2).

If the operands are integer values (signed, unsigned, or hexadecimal), then HLA subtracts the right operand from the left operand. Any overflow out of bit #127 (unsigned or hexadecimal) or bit #126 (signed) is quietly lost. HLA sets the type of the result to the smallest type size that will hold their difference; the type class (signed, unsigned, hexadecimal) will be the same as the operands. Note that it is possible for the type size to grow or shrink depending on the values of the operands (e.g., subtracting two negative or non-negative numbers could reduce the type size, subtracting a negative value from a non-negative value may expand the result type's size).

When subtracting two real values (or a real and an integer value), HLA always produces a real80 result.

Since HLA uses a bitmap to represent character sets, taking the set difference of two character sets is the same as doing a bitwise logical AND of the left operand with the inverse of the right operand.
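
The bitmap view of character sets makes the "*", "+", and "-" operators easy to picture. In this Python sketch a set is a 128-bit integer with one bit per character code, which assumes a 16-byte bitmap covers codes 0..127; cset and members are hypothetical helpers:

```python
MASK128 = (1 << 128) - 1

def cset(chars: str) -> int:
    """Build a 128-bit bitmap with one bit set per character."""
    bits = 0
    for ch in chars:
        code = ord(ch)
        if code >= 128:
            raise ValueError("only codes 0..127 fit in a 16-byte bitmap")
        bits |= 1 << code
    return bits

def members(bits: int) -> str:
    return "".join(chr(i) for i in range(128) if (bits >> i) & 1)

a, b = cset("abc"), cset("cd")
print(members(a | b))             # abcd  (union, the "+" operator)
print(members(a & b))             # c     (intersection, the "*" operator)
print(members(a & ~b & MASK128))  # ab    (difference, the "-" operator)
```

The unary "-" (complement) is the same idea with a single operand: ~a & MASK128.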

Comparisons (=, ==, <>, !=, <, <=, >, and >=)

expr1 = expr2

expr1 == expr2

expr1 <> expr2

expr1 != expr2

expr1 < expr2

expr1 <= expr2

expr1 > expr2

expr1 >= expr2

(note: "!=" and "<>" operators are identical. "=" and "==" operators are identical.)

The two expressions must be compatible (described earlier). These operators compare the two operands and return true or false depending upon the result of the comparison.

You may use the "=" and "<>" operators to compare two pointer constants (e.g., "&abc" or "&ptrVar"). The other operators do not allow pointer constant operands.

All the above operators allow you to compare boolean values, enumerated values (types must match), integer (signed, unsigned, hexadecimal) values, character values, string values, real values, and character set values.

When comparing boolean values, note that false < true .

One character set is less than another if it is a proper subset of the other. A character set is less than or equal to another set if it is a subset of that second set. Likewise, one character set is greater than, or greater than or equal to, another set if it is a proper superset, or a superset, respectively.
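
With the same bitmap representation, the character set comparisons become subset tests. A Python sketch under the assumption of a 128-bit bitmap (all names here are hypothetical); note that two sets can be unordered, so neither "less than" nor "greater than" holds:

```python
def cset(chars: str) -> int:
    """Build a 128-bit bitmap with one bit set per character (codes 0..127)."""
    bits = 0
    for ch in chars:
        bits |= 1 << ord(ch)
    return bits

def cset_le(a: int, b: int) -> bool:
    """a <= b: every member of a is also in b (subset)."""
    return (a & b) == a

def cset_lt(a: int, b: int) -> bool:
    """a < b: a is a subset of b and differs from it (proper subset)."""
    return cset_le(a, b) and a != b

print(cset_lt(cset("ab"), cset("abc")))   # True
print(cset_lt(cset("abc"), cset("abc")))  # False
print(cset_le(cset("abc"), cset("abc")))  # True
print(cset_lt(cset("ab"), cset("cd")), cset_lt(cset("cd"), cset("ab")))  # False False
```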

As with any programming language, you should take care when comparing two real values (especially for equality or inequality) as minor precision drifts can cause the comparison to fail.

expr1 & expr2

(note: "&&" and "&" mean different things to HLA. See the section on high level language control structures for details on the "&&" operator.)

The operands must both be boolean or they must both be numbers. With boolean operands the and operator produces the logical and of the two operands (boolean result). With number operands, the and operator produces the bitwise logical AND of the operands.

expr1 in expr2

The first expression must be a character value. The second expression must be a character set. The in operator returns true if the character is a member of the specified character set; it returns false otherwise.

expr1 | expr2

(note: "||" and "|" mean different things to HLA. See the section on high level language control structures for details on the "||" operator.)

The operands must both be boolean or they must both be numbers. With boolean operands the or operator produces the logical or of the two operands (boolean result). With number operands, the or operator produces the bitwise or of the operands.

expr1 ^ expr2

The operands must both be boolean or they must both be numbers. With boolean operands the xor operator produces the logical exclusive-or of the two operands (boolean result). With number operands, the xor operator produces the bitwise exclusive-or of the operands.
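
The following sketch shows the "&", "|", "^", and in operators with both boolean and numeric operands. The constant names are hypothetical and chosen only for illustration:

const
    bothTrue  := true & false;                    // logical and: false
    eitherOne := true | false;                    // logical or: true
    differ    := true ^ false;                    // logical exclusive-or: true
    lowBits   := $FF & $0F;                       // bitwise and: $0F
    merged    := $F0 | $0F;                       // bitwise or: $FF
    toggled   := $FF ^ $0F;                       // bitwise exclusive-or: $F0
    isVowel   := 'e' in {'a','e','i','o','u'};    // true: 'e' is a member of the set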

( expr )

You may override the precedence of any operator(s) using parentheses in HLA constant expressions.

[ comma_separated_list_of_expressions ]

This produces an array expression. The type of the expression is an array type whose base element is the type of one of the expressions in the list. If there are two or more constant types in the array expression, HLA promotes the type of the array expression following the rules for mixed-mode arithmetic (see the rules earlier in this document).

record_type_name : [ comma_separated_list_of_field_expressions ]

This produces a record expression. The expressions appearing within the brackets must match the respective fields of the specified record type. See the discussion earlier in this chapter.
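
A short sketch of array and record expressions in a const section follows. The type name point and the constant names are assumptions made for this example:

type
    point: record
        x: int32;
        y: int32;
    endrecord;

const
    primes : uns32[4] := [ 2, 3, 5, 7 ];       // array expression
    origin : point    := point:[ 0, 0 ];       // record expression; one value per field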

identifier

An identifier is a legal component of a constant expression if the identifier's classification is CONST or VAL (that is, the identifier was declared in a constant or value section of the program). The expression evaluator substitutes the current declared value and type of the symbol within the expression. Constant expressions allow the following types:

Boolean, enumerated types, Uns8, Uns16, Uns32, Uns64, Uns128, Byte, Word, DWord, QWord, LWord, Int8, Int16, Int32, Int64, Int128, Char, Real32, Real64, Real80, String, and Cset.

You may also specify arrays whose element base type is one of the above types (or a record or union subject to the following restriction). Likewise, you can specify record or union constants if all of their respective fields are one of the above primitive types or a value array, record, or union constant.

HLA allows array, record, and union constants. If you specify the name of an array, for example, HLA works with all the values of that array. Likewise, HLA can copy all the values of a record or union with a single statement.

identifier1.identifier2 {...}

Selects a field from a record or union constant. Identifier1 must be a record or union object defined in a const or val section. Identifier2 (and any following dot-identifiers) must be a field of the record or union. HLA replaces this object with the value of the specified field.

Examples:

recval.fieldval

recval.subrecval.fieldval

Don't forget that with a union constant, you may only access the last field into which you've actually stored data (see the section on union constants for more details).

identifier [ index_list ]

Identifier must be an array constant defined in either a const or val section. Index_list is a list of constant expressions separated by commas. The index list selects a specified element of the "identifier" array. HLA reports an error if you supply more indices than the array has dimensions. HLA returns an array slice if you specify fewer indices than the array has dimensions (for example, if an array is declared as "a:uns8[4,4]" and you specify "a[2]" in a constant expression, HLA returns the third row of the array (a[2,0]..a[2,3]) as the value of this term).

Examples:

arrayval[0]

aval[1,4,0]
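
To tie field selection and indexing together, here is a minimal sketch that uses both in constant expressions. All names are hypothetical, and the example sticks to a one-dimensional array for simplicity:

type
    point: record
        x: int32;
        y: int32;
    endrecord;

const
    primes  : uns32[4] := [ 2, 3, 5, 7 ];
    corner  : point    := point:[ 10, 20 ];
    third   := primes[ 2 ];                // 5; the first element has index zero
    cornerY := corner.y;                   // 20
    total   := primes[ 0 ] + corner.x;     // 12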

Program Structure

An HLA program uses the following general syntax:

program identifier ;

declarations

begin identifier;

statements

end identifier;

The three identifiers above must all match. The declaration section (declarations) consists of label, type, const, val, var, static, uninitialized, readonly, segment, procedure, and macro definitions (all described later). Any number of these sections may appear and they may appear in any order; more than one of each section may appear in the declaration section.

Example:

program TestPgm;
type
integer: int16;
const
i0 : integer := 0;
var
i:integer;

begin TestPgm;

mov( i0, i );

end TestPgm;

If you wish to write a library module that contains only procedures and no main program, you would use an HLA unit. Units have a syntax that is nearly identical to programs; there just isn't a begin associated with the unit, e.g.,

unit TestPgm;

procedure LibraryRoutine;
begin LibraryRoutine;
<< etc. >>
end LibraryRoutine;

end TestPgm;

Procedure Declarations

Procedure declarations are nearly identical to program declarations with two major differences: procedures are declared using the "procedure" reserved word and procedures may have parameters. The general syntax is:

procedure identifier ( optional_parameter_list ); procedure_options

declarations

begin identifier;

statements

end identifier;

Note that you may declare procedures inside other procedures in a fashion analogous to most block-structured languages (e.g., Pascal).

The optional parameter list consists of a list of var-type declarations taking the form:

optional_access_keyword identifier1 : identifier2 optional_in_reg

optional_access_keyword, if present, must be val, var, valres, result, name, or lazy and defines the parameter passing mechanism (pass by value, pass by reference, pass by value/result [or value/returned], pass by result, pass by name, or pass by lazy evaluation, respectively). The default is pass by value (val) if an access keyword is not present. For pass by value parameters, HLA allocates the specified number of bytes according to the size of that object in the activation record. For pass by reference, pass by value/result, and pass by result, HLA allocates four bytes to hold a pointer to the object. For pass by name and pass by lazy evaluation, HLA allocates eight bytes to hold a pointer to the associated thunk and a pointer to the thunk's execution environment (see the sections on parameters and thunks for more details).
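
As a sketch of the two most common passing mechanisms, the hypothetical procedure below takes one pass by value parameter and one pass by reference parameter. The procedure and parameter names are invented, and the code assumes the caller does not need EAX and EBX preserved:

procedure storeValue( val amount:uns32; var dest:uns32 ); @nodisplay;
begin storeValue;

    mov( dest, ebx );       // dest holds the four-byte address of the actual parameter
    mov( amount, eax );     // amount is a copy of the caller's value
    mov( eax, [ebx] );      // store the value through the pointer

end storeValue;

A call such as "storeValue( 25, total );", where total is an uns32 variable, would pass the constant by value and the address of total by reference.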

The optional_in_reg clause, if present, corresponds to the phrase "in reg" where reg is one of the 80x86's general purpose 8-, 16-, or 32-bit registers. You must take care when passing parameters through the registers as the parameter names become aliases for registers and this can create confusion when reading the code later (especially if, within a procedure with a register parameter, you call another procedure that uses that same register as a parameter).
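
The following is a minimal sketch of a register parameter, assuming the "in reg" clause behaves as described above. The procedure name addTen is hypothetical. Because the example uses @noframe, it supplies its own ret() instruction:

procedure addTen( n:uns32 in eax ); @nodisplay; @noframe; @returns( "eax" );
begin addTen;

    add( 10, eax );     // the parameter n and EAX name the same register here
    ret();              // nothing was pushed for n, so there is nothing to pop

end addTen;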

HLA also allows a special parameter of the form:

var identifier : var

This creates an untyped reference parameter. You may specify any memory variable as the corresponding actual parameter and HLA will compute the address of that object and pass it on to the procedure without further type checking. Within the procedure, the parameter is given the DWORD type.
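
A short sketch of an untyped reference parameter follows. The procedure name clearDword is an assumption for this example; the code treats the parameter as a dword holding the address of the actual parameter and assumes EBX is free for use:

procedure clearDword( var target:var ); @nodisplay;
begin clearDword;

    mov( target, ebx );                // target is a dword containing the object's address
    mov( 0, (type dword [ebx]) );      // zero the first four bytes of that object

end clearDword;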

The procedure_options component above is a list of keywords that specify how HLA emits code for the procedure. There are several different procedure options available: @noframe, @frame, @nodisplay, @display, @noalignstack, @alignstack, @pascal, @stdcall, @cdecl, @align( int_const ), @use reg32, @leave, @noleave, @enter, @noenter, and @returns( "text" ).

Procedure Options
Option	Description
@noframe, @frame	By default, HLA emits code at the beginning of the procedure to construct a stack frame. The @noframe option disables this action (noframe is deprecated; you should always use @noframe). The @frame option tells HLA to emit the stack frame code for a particular procedure if stack frame generation is off by default. See the description of #frame and #noframe for details on controlling the default frame generation. HLA also uses these two special identifiers as compile-time variables to set the default frame generation for all procedures. Setting @frame to true (or @noframe to false) turns on frame generation by default; setting @frame to false (or @noframe to true) turns off frame generation.
@nodisplay, @display	By default, HLA emits code at the beginning of the procedure to construct a display within the frame. The @nodisplay option disables this action (nodisplay is deprecated; you should use @nodisplay). The @display option tells HLA to emit code to generate a display for a particular procedure if display generation is off by default. Note that HLA does not emit code to construct the display if @noframe is in effect, though it will assume that the programmer will construct this display themselves. HLA also uses these two special identifiers as compile-time variables to set the default display generation for all procedures. Setting @display to true (or @nodisplay to false) turns on display generation by default; setting @display to false (or @nodisplay to true) turns off display generation.
@noalignstack, @alignstack	By default (assuming frame generation is active), HLA will emit an instruction to align ESP on a four-byte boundary after allocating local variables. Win32, Linux, and other 32-bit OSes require the stack to be dword-aligned (hence this option). If you know the stack will be dword-aligned, you can eliminate this extra instruction by specifying the @noalignstack option. Conversely, you can force the generation of this instruction by specifying the @alignstack procedure option. HLA also uses these two special identifiers as compile-time variables to set the default stack alignment code generation for all procedures. Setting @alignstack to true (or @noalignstack to false) turns on stack alignment code generation by default; setting @alignstack to false (or @noalignstack to true) turns off stack alignment code generation.
@pascal, @cdecl, @stdcall	These options give you the ability to specify the parameter passing mechanism for the procedure. By default, HLA uses the @pascal calling sequence. This calling sequence pushes the parameters on the stack in a left-to-right order (i.e., in the order they appear in the parameter list). The @cdecl procedure option tells HLA to pass the parameters from right-to-left so that the first parameter appears at the lowest address in memory; it is then the caller's responsibility to remove the parameters from the stack. The @stdcall procedure option is a hybrid of the @pascal and @cdecl calling conventions. It pushes the parameters in the right-to-left order (just like @cdecl) but @stdcall procedures automatically remove their parameter data from the stack (just like @pascal). Win32 API calls use the @stdcall calling convention.
@align( int_constant )	The @align( int_const ) procedure option aligns the procedure on a 1, 2, 4, 8, or 16 byte boundary. Specify the boundary you desire as the parameter to this option. The default is @align(1), which is unaligned. HLA also uses this special identifier as a compile-time variable to set the default procedure alignment. Setting @align := 1 turns off procedure alignment, while supplying some other value (which must be a power of two) sets the default procedure alignment to the specified number of bytes.
@use reg32	When passing parameters, HLA can sometimes generate better code if it has a 32-bit general purpose register for use as a scratchpad register. By default, HLA never modifies the value of a register behind your back, so it will often generate less than optimal code when passing certain parameters on the stack. By using the @use procedure option, you can specify one of the following 32-bit registers for use by HLA: eax, ebx, ecx, edx, esi, or edi. By providing one of these registers, HLA may be able to generate significantly better code when passing certain parameters.
@returns( "text" )	This option specifies the compile-time return value whenever a function name appears as an instruction operand. For example, suppose you are writing a function that returns its result in EAX. You should probably specify a "returns" value of "EAX" so you can compose that procedure just like any other HLA machine instruction (see the example below and the section on machine instructions for more details).
@leave, @noleave	These two options control the code generation for the standard exit sequence. If you specify the @leave option, then HLA emits the x86 LEAVE instruction to clean up the activation record before the procedure returns. If you specify the @noleave option, then HLA emits the primitive instructions to achieve this, e.g., mov( ebp, esp ); pop( ebp ); The manual sequence is faster on some architectures; the LEAVE instruction is always shorter. Note that @noleave occurs by default if you've specified @noframe or #noframe. By default, HLA assumes @noleave, but you may change the default by using these special identifiers as compile-time variables that set the default LEAVE generation for all procedures. Setting @leave to true (or @noleave to false) turns on LEAVE generation by default; setting @leave to false (or @noleave to true) turns off the use of the LEAVE instruction.
@enter, @noenter	These two options control the code generation for a procedure's standard entry sequence. If you specify the @enter option, then HLA emits the x86 ENTER instruction to create the activation record. If you specify the @noenter option, then HLA emits the primitive instructions to achieve this. The manual sequence is always faster; using the ENTER instruction is usually shorter. Note that @noenter occurs by default if you've specified @noframe or #noframe. By default, HLA assumes @noenter, but you may change the default by using these special identifiers as compile-time variables that set the default ENTER generation for all procedures. Setting @enter to true (or @noenter to false) turns on ENTER generation by default; setting @enter to false (or @noenter to true) turns off the use of the ENTER instruction.
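
To make the calling convention options a little more concrete, here is a sketch of a @cdecl procedure and a call to it. The procedure name sum2 is hypothetical. Following the description of @cdecl above, the example assumes the caller must remove the two dword parameters (eight bytes) from the stack itself:

procedure sum2( a:int32; b:int32 ); @cdecl; @nodisplay; @returns( "eax" );
begin sum2;

    mov( a, eax );
    add( b, eax );

end sum2;

// In the calling code:

    sum2( 10, 20 );
    add( 8, esp );      // @cdecl: the caller removes the parameters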

The following example demonstrates how the @returns option works:

program returnsDemo;
#include( "stdio.hhf" );

procedure eax0; @returns( "eax" );
begin eax0;

mov( 0, eax );

end eax0;

begin returnsDemo;

mov( eax0(), ebx );
stdout.put( "ebx=", ebx, nl );

end returnsDemo;

To help those who insist on constructing the activation record themselves, HLA declares two local constants within each procedure: _vars_ and _parms_. The _vars_ symbol is an integer constant that specifies the number of bytes of local variables declared in the procedure. This constant is useful when allocating storage for your local variables. The _parms_ constant specifies the number of bytes of parameters. You would normally supply this constant as the parameter to a ret() instruction to automatically clean up the procedure's parameters when it returns.
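
The following sketch shows one way these two constants might be used when you build the activation record by hand. The procedure name manualFrame is invented for this example, and the code assumes the @noframe and @nodisplay options so that HLA emits no entry or exit code of its own:

procedure manualFrame( j:int32 ); @nodisplay; @noframe;
var
    i:int32;
begin manualFrame;

    push( ebp );            // save the pointer to the previous activation record
    mov( esp, ebp );        // point EBP at the current activation record
    sub( _vars_, esp );     // allocate storage for the local variables

    mov( j, eax );          // parameters and locals are now addressable
    mov( eax, i );

    mov( ebp, esp );        // deallocate the local variables
    pop( ebp );             // restore the previous activation record pointer
    ret( _parms_ );         // return and remove the parameter bytes

end manualFrame;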

If you do not specify @nodisplay, then HLA defines a run-time variable named _display_ that is an array of pointers to activation records. For more details on the _display_ variable, see the section on lexical scope.

You can also declare @external procedures (procedures defined in other HLA units or written in languages other than HLA) using the following syntaxes:

procedure externProc1 (optional parameters) ; @returns( "text" ); @external;

procedure externProc2 (optional parameters) ; @returns( "text" ); @external( "external_name" );

As with normal procedure declarations, the parameter list and @returns clause are optional.

The first form is generally used for HLA-written functions. HLA will use the procedure's name (externProc1 in this case) as external name.

The second form lets you refer to the procedure by one name (externProc2 in this case) within your HLA program and by a different name ("external_name" in this example) in the MASM generated code. This second form has two main uses: (1) if you choose an external procedure name that just happens to be a MASM reserved word, the program may compile correctly but fail to assemble. Changing the external name to something else solves this problem. (2) When calling procedures written in external languages you may need to specify characters that are not legal in HLA identifiers. For example, Win32 API calls often use names like "WriteFile@24" containing illegal (in HLA) identifier symbols. The string operand to the external option lets you specify any name you choose. Of course, it is your responsibility to see to it that you use identifiers that are compatible with the linker and MASM; HLA doesn't check these names.
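
As a sketch of both forms, consider the two declarations below. The procedure names and the external name "_show_text" are hypothetical and stand in for whatever symbols your other modules actually export:

// First form: the external name is the same as the HLA name.

procedure helper( count:uns32 ); @returns( "eax" ); @external;

// Second form: HLA code calls showText, the linker sees _show_text.

procedure showText( text:string ); @cdecl; @external( "_show_text" );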

By default, HLA does the following:

Creates a display for every procedure.
Emits code to construct the stack frame for each procedure.
Emits code to align ESP on a four-byte boundary upon procedure entry.
HLA assumes that it cannot modify any register values when passing (non-register) parameters.
The first instruction of the procedure is unaligned.

These options are the most general and "safest" for beginning assembly language programmers. However, the code HLA generates for this general case may not be as compact or as fast as is possible in a specific case. For example, few procedures will actually need a display data structure built upon procedure activation. Therefore, the code that HLA emits to build the display can reduce the efficiency of the program. Advanced programmers, of course, can use procedure options like "@nodisplay" to tell HLA to skip the generation of this code. However, if a program contains many procedures and none of them need a display, continually adding the "@nodisplay" option can get really old. Therefore, HLA allows you to treat these directives as "pseudo-compile-time-variables" to control the default code generation. E.g.,

? @display := true; // Turns on default display generation.

? @display := false; // Turns off default display generation.

? @nodisplay := true; // Turns off default display generation.

? @nodisplay := false; // Turns on default display generation.

? @frame := true; // Turns on default frame generation.

? @frame := false; // Turns off default frame generation.

? @noframe := true; // Turns off default frame generation.

? @noframe := false; // Turns on default frame generation.

? @alignstack := true; // Turns on default stk alignment code generation.

? @alignstack := false; // Turns off default stk alignment code generation.

? @noalignstack := true; // Turns off default stk alignment code generation.

? @noalignstack := false; // Turns on default stk alignment code generation.

? @enter := true; // Turns on default ENTER code generation.

? @enter := false; // Turns off default ENTER code generation.

? @noenter := true; // Turns off default ENTER code generation.

? @noenter := false; // Turns on default ENTER code generation.

? @leave := true; // Turns on default LEAVE code generation.

? @leave := false; // Turns off default LEAVE code generation.

? @noleave := true; // Turns off default LEAVE code generation.

? @noleave := false; // Turns on default LEAVE code generation.

?@align := 1; // Turns off procedure alignment (align on byte boundary).

?@align := int_expr; // Sets alignment, must be a power of two.

These directives may appear anywhere in the source file. They set the internal HLA default values and all procedure declarations following one of these assignments (up to the next, corresponding assignment) use the specified code generation option(s). Note that you can override these defaults by using the corresponding procedure options mentioned earlier.
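
For instance, the following sketch turns off display generation for a group of procedures and overrides the default for one of them. The procedure names are invented for this example:

?@nodisplay := true;        // procedures below get no display by default

procedure leafOne;
begin leafOne;
    nop();
end leafOne;

procedure outer; @display;  // this option overrides the default for outer only
begin outer;
    nop();
end outer;

?@nodisplay := false;       // procedures below get a display again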

Disabling HLA's Automatic Code Generation for Procedures

Before jumping in and describing how to use the high level HLA features for procedures, the best place to start is with a discussion of how to disable these features and write "plain old fashioned" assembly language code. This discussion is important because procedures are the one place where HLA automatically generates a lot of code for you and many assembly language programmers prefer to control their own destinies; they don't want the compiler to generate any excess code for them. So disabling HLA's automatic code generation capabilities is a good place to start.

By default, HLA automatically emits code at the beginning of each procedure to do five things: (1) Preserve the pointer to the previous activation record (EBP); (2) build a display in the current activation record; (3) allocate storage for local variables; (4) load EBP with the base address of the current activation record; (5) adjust the stack pointer (downwards) so that it points at a dword-aligned address.

When you return from a procedure, by default HLA will deallocate the local storage and return, removing any parameters from the stack.

To understand the code that HLA emits, consider the following simple procedure:

procedure p( j:int32 );

var

i:int32;

begin p;

end p;

Here is a dump of the symbol table that HLA creates for procedure p:

p <0,proc>:Procedure type (ID=?1_p)

--------------------------------

_vars_ <1,cons>:uns32, (4 bytes) =4

i <1,var >:int32, (4 bytes, ofs:-12)

_parms_ <1,cons>:uns32, (4 bytes) =4

_display_ <1,var >:dword, (8 bytes, ofs:-4)

j <1,valp>:int32, (4 bytes, ofs:8)

p <1,proc>:

------------------------------------

The important thing to note here is that local variable "i" is at offset -12 and HLA automatically created an eight-byte local variable named "_display_" which is at offset -4.

HLA emits the following code for the procedure above (the comments following the semicolons are annotations that HLA does not emit; this output is subject to changes in HLA code generation algorithms):

?1_p proc near32

push ebp ;Dynamic link (pointer to previous activation record)

pushd [ebp-04] ;Display for lex level 0

lea ebp,[esp+04] ;Get frame ptr (point EBP at current activation record)

pushd ebp ;Ptr to this proc's A.R. (part of display construction)

sub esp, 4 ;Local storage.

and esp, 0fffffffch ;dword-align stack

; Exit point for the procedure:

?x?1_p:

mov esp, ebp ;Deallocate local variables.

pop ebp ;Restore pointer to previous activation record.

ret 4 ;Return, popping parameters from the stack.

?1_p endp

Building the display data structure is not very common in standard assembly language programs. This is only necessary if you are using nested procedures and those nested procedures need to access non-local variables. Since this is a rare situation, many programmers will immediately want to tell HLA to stop emitting the code to generate the display. This is easily accomplished by adding the "@nodisplay" procedure option to the procedure declaration. Adding this option to procedure p produces the following:

procedure p( j:int32 ); @nodisplay;

var

i:int32;

begin p;

end p;

Compiling this procedure produces the following symbol table dump:

p <0,proc>:Procedure type (ID=?1_p)

--------------------------------

_vars_ <1,cons>:uns32, (4 bytes) =4

i <1,var >:int32, (4 bytes, ofs:-4)

_parms_ <1,cons>:uns32, (4 bytes) =4

j <1,valp>:int32, (4 bytes, ofs:8)

p <1,proc>:

------------------------------------

Note that the _display_ variable is gone and the local variable i is now at offset -4. Here is the code that HLA emits for this new version of the procedure:

?1_p proc near32

push ebp ;Save ptr to previous activation record.

mov ebp, esp ;Point EBP at current activation record.

sub esp,4 ;Local storage.

and esp, 0fffffffch ;Align stack on dword boundary.

; Exit point for the procedure:

?x?1_p:

mov esp, ebp ;Deallocate local variables.

pop ebp ;Restore pointer to previous activation record.

ret 4 ;Return, and remove parameters from stack.

?1_p endp

As you can see, this code is smaller and a bit less complex. Unlike the code that built the display, it is fairly common for an assembly language programmer to construct an activation record in a manner similar to this. Indeed, about the only instruction out of the ordinary above is the "AND" instruction that dword-aligns the stack (OS calls require the stack to be dword-aligned, and the system performance is much better if the stack is dword aligned).

This code is still relatively inefficient if you don't pass parameters on the stack and you don't use automatic (non-static, local) variables. Many assembly language programmers pass their few parameters in machine registers and also maintain local values in the registers. If this is the case, then the code above is pure overhead. You can inform HLA that you wish to take full responsibility for the entry and exit code by using the "@noframe" procedure option. Consider the following version of p:

procedure p( j:int32 ); @nodisplay; @noframe;

var

i:int32;

begin p;

end p;

(this produces the same symbol table dump as the previous example).

HLA emits the following code for this version of p:

?1_p proc near32

?1_p endp

Whoa! There's nothing there! But this is exactly what the advanced assembly language programmer wants. With both the @nodisplay and @noframe options, HLA does not emit any extra code for you. You would have to write this code yourself.

By the way, you can specify the @noframe option without specifying the @nodisplay option. HLA still generates no extra code, but it will assume that you are allocating storage for the display in the code you write. That is, there will be an eight-byte _display_ variable created and i will have an offset of -12 in the activation record. It will be your responsibility to deal with this. Although this situation is possible, it's doubtful this combination will be used much at all.

Note a major difference between the two versions of p when @noframe is not specified and @noframe is specified: if @noframe is not present, HLA automatically emits code to return from the procedure. This code executes if control falls through to the "end p;" statement at the end of the procedure. Therefore, if you specify the @noframe option, you must ensure that the last statement in the procedure is a RET() instruction or some other instruction that causes an unconditional transfer of control. If you do not do this, then control will fall through to the beginning of the next procedure in memory, probably with disastrous results.

The RET() instruction presents a special problem. It is dangerous to use this instruction to return from a procedure that does not have the @noframe option. Remember, HLA has emitted code that pushes a lot of data onto the stack. If you return from such a procedure without first removing this data from the stack, your program will probably crash. The correct way to return from a procedure without the @noframe option is to jump to the bottom of the procedure and run off the end of it. Rather than require you to explicitly put a label into your program and jump to this label, HLA provides the "exit procname;" instruction. HLA compiles the EXIT instruction into a JMP that transfers control to the clean-up code HLA emits at the bottom of the procedure. Consider the following modification of p and the resulting assembly code produced:

procedure p( j:int32 ); @nodisplay;

var

i:int32;

begin p;

exit p;

nop();

end p;

?2_p proc near32

push ebp

mov ebp, esp

sub esp, 4 ;Local storage.

and esp, 0fffffffch

jmp ?x?2_p ;p

nop

?x?2_p:

mov esp, ebp

pop ebp

ret 4

?2_p endp

As you can see, HLA automatically emits a label to the assembly output file ("?x?2_p" in this instance) at the bottom of the procedure where the clean-up code starts. HLA translates the "exit p;" instruction into a jmp to this label.

If you look back at the code emitted for the version of p with the @noframe option, you'll note that HLA did not emit a label at the bottom of the procedure. Therefore, HLA cannot generate a jump to this nonexistent label, so you cannot use the exit statement in a procedure with the @noframe option (HLA will generate an error if you attempt this).

Of course, HLA will not stop you from putting a RET() instruction into a procedure without the @noframe option (some people who know exactly what they are doing might actually want to do this). Keep in mind, if you decide to do this, that you must deallocate the local variables (that's what the "mov esp, ebp" instruction is doing), you need to restore EBP (via the "pop ebp" instruction above), and you need to deallocate any parameters pushed on the stack (the "ret 4" handles this in the example above). The following code demonstrates this:

procedure p( j:int32 ); @nodisplay;

var

i:int32;

begin p;

if( j = 0 ) then

// Deallocate locals.

mov( ebp, esp );

// Restore old EBP

pop( ebp );

// Return and pop parameters

ret( 4 );

endif;

nop();

end p;

?1_p proc near32

push ebp

mov ebp, esp

sub esp, 4 ;Local storage.

and esp, 0fffffffch

cmp dword ptr [ebp+8], 0

jne ?2_false

mov esp, ebp

pop ebp

ret 4

?2_false:

nop

?x?1_p:

mov esp, ebp

pop ebp

ret 4

?1_p endp

If "real" assembly language programmers would generally specify both the @noframe and @nodisplay options, why not make them the default case (and use "@frame" and "@display" options to specify the generation of the activation record and display)? Well, keep in mind that HLA was originally designed as a tool to teach assembly language programming to beginning students. Those students have considerable difficulty comprehending concepts like activation records and displays. Having HLA generate the stack frame code and display generation code automatically saves the instructor from having to teach (and explain) this code. Even if the student never uses a display, it doesn't make the program incorrect to go ahead and generate it. The only real cost is a little extra memory and a little extra execution time. This is not a problem for beginning students who haven't yet learned to write efficient code. Therefore, HLA was optimized for the beginner at the expense of the advanced programmer. It is also worthwhile to point out that the behavior of the EXIT statement depends upon displays if you attempt to exit from a nested procedure; yet another reason for HLA's default behavior. Of course, you can always override HLA's default behavior by using the #nodisplay and #noframe directives.

If you are absolutely certain that your stack pointer is aligned on a four-byte boundary upon entry into a procedure, you can tell HLA to skip emitting the AND( $FFFF_FFFC, ESP ); instruction by specifying the @noalignstack procedure option. Note that specifying @noframe also specifies @noalignstack.

Procedure Calls and Parameters in HLA

HLA's high level support consists of three main features: HLL-like declarations, the HLL statements (IF, WHILE, etc), and HLA's support for procedure calls and parameter passing. This section discusses the syntax for procedure declarations and how HLA generates code to automatically pass parameters to a procedure.

The syntax for HLA procedure declarations was touched on earlier; however, it's probably a good idea to review the syntax as well as describe some options that previous sections ignored. There are several procedure declaration forms; the following examples demonstrate them all:

// Standard procedure declaration:

procedure procname (opt_parms); proc_options

begin procname;

<< procedure body >>

end procname;

// External procedure declarations:

procedure extname (opt_parms); proc_options @external;

procedure extname (opt_parms); proc_options @external( "name");

// Forward procedure declarations:

procedure fwdname (opt_parms); proc_options @forward;

Opt_parms indicates that the parameter list is optional; the parentheses are not present if there are no parameters present.

Proc_options is any combination (zero or more) of the following procedure options (see the discussion earlier for these options):

@noframe;

@nodisplay;

@noalignstack;

@pascal;

@cdecl;

@stdcall;

@align( expression );

@returns( "string" );

The @external reserved word tells HLA that the specified procedure does not appear in the current compilation, but is present in a different source file that will be compiled separately. Note that the presence of an external declaration doesn't require that the procedure appear in a separate source file. If the actual procedure appears in the same compilation unit as the external declaration, HLA treats the external declaration as a forward declaration (see the next paragraph for details on forward declarations). External procedure declarations have been discussed earlier, see the appropriate section(s) for additional details.

The @forward declaration syntax is necessary because HLA requires all procedure symbols to be declared before they are used. In a few rare cases (where mutual recursion occurs between two or more procedures), it may be impossible to write your code such that every procedure is declared before the first call to the code. More commonly, sorting your procedures to ensure that all procedures are written before their first call may force an artificial organization on the source file, making it harder to read. The forward procedure declaration handles this situation for you. It lets you create a procedure prototype that describes how the procedure is to be called without actually specifying the procedure body. Later on in the source file, the full procedure declaration must appear.

Note: an external declaration also serves as a forward declaration. So if you have an external definition at the beginning of your program (perhaps it appears in an include file), you do not need to provide a forward declaration as well.
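As a minimal sketch of the mutual recursion case, consider two hypothetical procedures, isEven and isOdd, that call each other. The names, the use of AL for the result, and the program name are illustrative assumptions. The sketch also assumes that the full declaration repeats the parameter list and options given in the prototype.

```hla
program fwdDemo;
#include( "stdio.hhf" );

// Prototype only: the body of isOdd appears after isEven.
procedure isOdd( n:uns32 ); @nodisplay; @returns( "al" ); @forward;

// Returns true in AL if n is even.
procedure isEven( n:uns32 ); @nodisplay; @returns( "al" );
begin isEven;

    mov( n, eax );
    if( eax = 0 ) then

        mov( true, al );

    else

        dec( eax );
        isOdd( eax );       // Legal because isOdd was declared forward.

    endif;

end isEven;

// The full declaration promised by the forward prototype.
procedure isOdd( n:uns32 ); @nodisplay; @returns( "al" );
begin isOdd;

    mov( n, eax );
    if( eax = 0 ) then

        mov( false, al );

    else

        dec( eax );
        isEven( eax );

    endif;

end isOdd;

begin fwdDemo;

    isEven( 10 );
    stdout.put( "isEven(10)=", (type boolean al), nl );

end fwdDemo;
```

Without the forward prototype, the call to isOdd inside isEven would refer to a symbol that HLA has not yet seen.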

Calling HLA Procedures

There are two standard ways to call an HLA procedure: use the call instruction or simply specify the name of the procedure as an HLA statement. Both mechanisms have their plusses and minuses.

To call an HLA procedure using the call instruction is exceedingly easy. Simply use either of the following syntaxes:

call( procName );

call procName;

Either form compiles into an 80x86 call instruction that calls the specified procedure. The difference between the two is that the first form (with the parentheses) returns the procedure's "returns" value, so this form can appear as an operand to another instruction. The second form always returns the empty string, so it is not suitable as an operand of another instruction. Also note that the second form requires a statement or procedure label; you may not use memory addressing modes in this form. On the other hand, the second form is the only form that lets you "call" a statement label (as opposed to a procedure label), which is useful on occasion.

If you use the call statement to call a procedure, then you are responsible for passing any parameters to that procedure. In particular, if the parameters are passed on the stack, you are responsible for pushing those parameters (in the correct order) onto the stack before the call. This is a lot more work than letting HLA push the parameters for you, but in certain cases you can write more efficient code by pushing the parameters yourself.

The second way to call an HLA procedure is to simply specify the procedure name and a list of actual parameters (if needed) for the call. This method has the advantage of being easy and convenient at the expense of a possible slight loss in efficiency and flexibility. This calling method should also prove familiar to most HLL programmers. As an example, consider the following HLA program:

program parameterDemo;

#include( "stdio.hhf" );

procedure PrtAplusB( a:int32; b:int32 ); @nodisplay;

begin PrtAplusB;

mov( a, eax );

add( b, eax );

stdout.put( "a+b=", (type int32 eax ), nl );

end PrtAplusB;

static

v1:int32 := 25;

v2:int32 := 5;

begin parameterDemo;

PrtAplusB( 1, 2 );

PrtAplusB( -7, 12 );

PrtAplusB( v1, v2 );

mov( -77, eax );

mov( 55, ebx );

PrtAplusB( eax, ebx );

end parameterDemo;

This program produces the following output:

a+b=3

a+b=5

a+b=30

a+b=-22

As you can see, calling PrtAplusB in HLA is very similar to calling procedures (and passing parameters) in a high level language like C/C++ or Pascal. There are, however, some key differences between an HLA call and an HLL procedure call. The next section will cover those differences in greater detail. The important thing to note here is that if you choose to call a procedure using the HLL syntax (that is, the second method above), you will have to pass the parameters in the parameter list and let HLA push the parameters for you. If you want to take complete control over the parameter passing code, you should use the call instruction.
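For comparison, here is a sketch of the same kind of call made with the call instruction and explicit pushes. It assumes HLA's default calling convention, in which parameters are pushed in the order they are declared and the procedure removes them from the stack on return. The program name manualCall is hypothetical.

```hla
program manualCall;
#include( "stdio.hhf" );

procedure PrtAplusB( a:int32; b:int32 ); @nodisplay;
begin PrtAplusB;

    mov( a, eax );
    add( b, eax );
    stdout.put( "a+b=", (type int32 eax ), nl );

end PrtAplusB;

static
    v1:int32 := 25;
    v2:int32 := 5;

begin manualCall;

    // Hand-coded equivalent of PrtAplusB( v1, v2 ):
    // push "a" first, then "b", then call the procedure.

    push( v1 );
    push( v2 );
    call PrtAplusB;

    // Hand-coded equivalent of PrtAplusB( 1, 2 ) using immediate operands.

    pushd( 1 );
    pushd( 2 );
    call PrtAplusB;

end manualCall;
```

If the procedure used a different calling convention (for example, @cdecl), the push order and stack cleanup would differ, so treat this only as an illustration of the default case.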

Parameter Passing in HLA, Value Parameters

The previous section probably gave you the impression that passing parameters to a procedure in HLA is nearly identical to passing those same parameters to a procedure in a high level language. The truth is, the examples in the previous section were rigged. There are actually many restrictions on how you can pass parameters to an HLA procedure. This section discusses the parameter passing mechanism in detail.

The most important restriction on actual parameters in a call to an HLA procedure is that HLA only allows memory variables, registers, constants, and certain other special items as parameters. In particular, you cannot specify an arithmetic expression that requires computation at run-time (although a constant expression, computable at compile time is okay). The bottom line is this: if you need to pass the value of an expression to a procedure, you must compute that value prior to calling the procedure and pass the result of the computation; HLA will not automatically generate the code to compute that expression for you.
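As a brief sketch of this rule, assume the PrtAplusB procedure and the v1 and v2 variables from the parameterDemo program in the previous section. The statements below could appear in that program's main body; the choice of EAX as a temporary is arbitrary.

```hla
// Not allowed: v1+v2 would have to be computed at run time.
//
//    PrtAplusB( v1+v2, 10 );

// Compute the run-time expression first, then pass the result.

mov( v1, eax );
add( v2, eax );
PrtAplusB( eax, 10 );

// Constant expressions are fine; HLA evaluates them at compile time.

PrtAplusB( 3*4, 100-1 );
```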

The second point to mention here is that HLA is a strongly typed language when it comes to passing parameters. This means that, with only a few exceptions, the type of the actual parameter must exactly match the type of the formal parameter. If the actual parameter is an int8 object, the formal parameter had better not be an int32 object or HLA will generate an error. The only exceptions to this rule are the byte, word, and dword types. If a formal parameter is of type byte, the corresponding actual parameter may be any one-byte data object. If a formal parameter is a word object, the corresponding actual parameter can be any two-byte object. Likewise, if a formal parameter is a dword object, the actual parameter can be any four-byte data type. Conversely, if the actual parameter is a byte, word, or dword object, it can be passed without error to any one, two, or four-byte formal parameter (respectively). Programmers who are really lazy make all their parameters bytes, words, or dwords (at least, wherever possible). Programmers who care about the quality of their code use untyped parameters cautiously.
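The following sketch illustrates the type-matching rule with a hypothetical int8 variable named small. Because PrtAplusB expects int32 parameters, the program sign-extends the value into a 32-bit register and passes the register instead. The program and variable names are assumptions made for illustration.

```hla
program typeDemo;
#include( "stdio.hhf" );

procedure PrtAplusB( a:int32; b:int32 ); @nodisplay;
begin PrtAplusB;

    mov( a, eax );
    add( b, eax );
    stdout.put( "a+b=", (type int32 eax ), nl );

end PrtAplusB;

static
    small:int8 := -3;       // One-byte signed variable.

begin typeDemo;

    // PrtAplusB( small, 1 );   -- rejected: int8 actual, int32 formal.

    movsx( small, eax );    // Sign extend the int8 value to 32 bits.
    PrtAplusB( eax, 1 );    // A 32-bit register is acceptable here.

end typeDemo;
```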

If you want to use the high level calling sequence, but you don't like the inefficient code HLA sometimes produces when generating code to pass your parameters, you can always use the #{...}# sequence parameter to override HLA's code generation and substitute your own code for one or two parameters. Of course, it doesn't make any sense to pass all the parameters to a procedure using this trick; it would be far easier just to use the call instruction. Example:

PrtAplusB

(

#{

mov( i, eax ); // First parameter is "i+5"

add( 5, eax );

push( eax );

}#,

5

);

HLA will automatically copy an actual value parameter into local storage for the procedure, regardless of the size of the parameter. If your value parameter is a one million byte array, HLA will allocate storage for 1,000,000 bytes and then copy that array in on each call. C/C++ programmers may expect HLA to automatically pass arrays by reference (as C/C++ does) but this is not the case. If you want your parameters passed by reference, you must explicitly state this.
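To make the difference concrete, the following sketch declares the same array parameter both ways. The buffer type and the two procedure names are hypothetical, and a modest 4,096-byte array stands in for the much larger array mentioned above.

```hla
type
    buffer: byte[ 4096 ];   // Hypothetical 4,096-byte array type.

// Value parameter: all 4,096 bytes are copied onto the stack on every call.

procedure firstByVal( b:buffer ); @nodisplay; @returns( "al" );
begin firstByVal;

    mov( b[0], al );        // Reads the local copy of the array.

end firstByVal;

// Reference parameter: only the four-byte address of the array is passed.

procedure firstByRef( var b:buffer ); @nodisplay; @returns( "al" );
begin firstByRef;

    mov( b, eax );          // EAX = address of the caller's array.
    mov( [eax], al );       // Fetch the first byte through the pointer.

end firstByRef;
```

The only change in the declaration is the var keyword, but the code inside the procedure must then dereference the pointer itself.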

The code HLA generates to copy value parameters, while not particularly bad, certainly isn't optimal. If you need the fastest possible code when passing parameters by value on the stack, it would be better if you explicitly pushed the data yourself. Another alternative that sometimes helps is to use the "use reg32" procedure option to provide HLA with a hint about a 32-bit scratchpad register that it can use when building parameters on the stack.

Parameter Passing in HLA, Reference, Value/Result, and Result Parameters

The one good thing about pass by reference, pass by value/result, and pass by result parameters is that they are always four byte pointers, regardless of the size of the actual parameter. Therefore, HLA has an easier time generating code for these parameters than it does generating code for pass by value parameters.

HLA treats reference, value/result, and result parameters identically. The code within the procedure is responsible for differentiating these parameter types (value/result and result parameters generally require copying data between local storage and the actual parameter). The following discussion will simply refer to pass by reference parameters, but it applies equally well to pass by value/result and pass by result.

Like high level languages, HLA places a whopper of a restriction on pass by reference parameters: they can only be memory locations. Constants and registers are not allowed since you cannot compute their address. Do keep in mind, however, that any valid memory addressing mode is a valid candidate to be passed by reference; you do not have to limit yourself to static and local variables. For example, "[eax]" is a perfectly valid memory location, so you can pass this by reference (assuming you type-cast it, of course, to match the type of the formal parameter). The following example demonstrates a simple procedure with a pass by reference parameter:

program refDemo;

#include( "stdio.hhf" );

procedure refParm( var a:int32 );

begin refParm;

mov( a, eax );

mov( 12345, (type int32 [eax]));

end refParm;

static

i:int32:=5;

begin refDemo;

stdout.put( "(1) i=", i, nl );

mov( 25, i );

stdout.put( "(2) i=", i, nl );

refParm( i );

stdout.put( "(3) i=", i, nl );

end refDemo;

The output produced by this code is

(1) i=5

(2) i=25

(3) i=12345

As you can see, the parameter a in refParm exhibits pass by reference semantics, since the store through a in refParm changed the value of the actual parameter (i) in the main program.

Note that HLA passes the address of i to refParm; therefore, the a parameter contains the address of i. To access the value of i, the refParm procedure must dereference the pointer passed in a. The two instructions in the body of the refParm procedure accomplish this.

Take a look at the code that HLA generates for the call to refParm :

pushd offset32 ?198_i

call ?197_refParm

("?198_i" is the MASM compatible name that HLA generated for the static variable "i".)

As you can see, this program simply computed the address of i and pushed it onto the stack. Now consider the following modification to the main program:

program refDemo;

#include( "stdio.hhf" );

procedure refParm( var a:int32 );

begin refParm;

mov( a, eax );

mov( 12345, (type int32 [eax]));

end refParm;

static

i:int32:=5;

var

j:int32;

begin refDemo;

mov( 0, j );

refParm( j );

refParm( i );

lea( eax, j );

refParm( [eax] );

end refDemo;

This version emits the following code:

mov dword ptr [ebp-8] , 0 ;j

push eax

lea eax, dword ptr [ebp-8] ;j

xchg eax, [esp]

call ?197_refParm ;refParm

pushd offset32 ?198_i

call ?197_refParm ;refParm

lea eax, dword ptr [ebp-8] ;j

push eax

push eax

lea eax, dword ptr [eax+0] ;[eax]

mov [esp+4],eax

pop eax

call ?197_refParm ;refParm

As you can see, the code emitted for the last call is pretty ugly (we could easily get rid of three of the instructions in this code). This call would be a good candidate for using the call instruction directly. Also see "Hybrid Parameters" a little later in this document. Another option is to use the "use reg32" option to tell HLA it can use one of the 32-bit registers as a scratchpad. Consider the following:

procedure refParm( var a:int32 ); use esi;

lea( eax, j );

refParm( [eax] );

This sequence generates the following code (which is a little better than the previous example):

lea eax, dword ptr [ebp-8] ;j

lea eax, dword ptr [eax+0] ;[eax]

push eax

call ?197_refParm ;refParm

As a general rule, the type of an actual reference parameter must exactly match the type of the formal parameter. There are a couple of exceptions to this rule. First, if the formal parameter is dword, then HLA will allow you to pass any four-byte data type as an actual parameter by reference to this procedure. Second, you can pass an actual dword parameter by reference if the formal parameter is a four-byte data type.

There is a third exception to the "types must exactly match" rule. If the formal reference parameter is some data type, HLA will allow you to pass an actual parameter that is a pointer to this type. Note that HLA will actually pass the value of the pointer, rather than the address of the pointer, as the reference parameter. This turns out to be really convenient, particularly when calling Win32 API functions and other C/C++ code. Note, however, that this behavior isn't always intuitive, so be careful when passing pointer variables as reference parameters.

If you want to pass the value of a double word or pointer variable in place of the address of such a variable to a pass by reference, value/result, or result parameter, simply prefix the actual parameter with the VAL reserved word in the call to the procedure, e.g.,

refParm( val dwordVar );

This tells HLA to use the value of the variable rather than its address.
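The following sketch shows the VAL prefix in a complete program built on the refParm procedure from the earlier example. The dword variable p and the program name valDemo are hypothetical; p is loaded with the address of i before the call.

```hla
program valDemo;
#include( "stdio.hhf" );

procedure refParm( var a:int32 );
begin refParm;

    mov( a, eax );
    mov( 12345, (type int32 [eax]));

end refParm;

static
    i:int32 := 5;
    p:dword;                // Will hold the address of i.

begin valDemo;

    mov( &i, eax );         // Take the address of the static variable i.
    mov( eax, p );

    // Pass the contents of p (the address of i), not the address of p.

    refParm( val p );

    stdout.put( "i=", i, nl );

end valDemo;
```

Without the VAL prefix, HLA would pass the address of p, and refParm would overwrite the pointer itself rather than i.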

Untyped Reference Parameters

HLA provides a special formal parameter syntax that tells HLA that you want to pass an object by reference and you don't care what its type is. Consider the following HLA procedure:

procedure zeroObject( var object:byte; size:uns32 );

begin zeroObject;

<< code to write "size" zeros to "object" >>

end zeroObject;

The problem with this procedure is that you will have to coerce non-byte parameters to a byte before passing them to zeroObject . That is, unless you're passing a byte parameter, you've always got to call zeroObject thusly:

zeroObject( (type byte NotAByte), sizeToZero );

For some functions you call frequently with different types of data, this can get painful very quickly.

The HLA untyped reference parameter syntax solves this problem. Consider the following declaration of zeroObject :

procedure zeroObject( var object:var; size:uns32 );

begin zeroObject;

<< code to write "size" zeros to "object" >>

end zeroObject;

Notice the use of the reserved word "VAR" instead of a data type for the object parameter. This syntax tells HLA that you're passing an arbitrary variable by reference. Now you can call zeroObject and pass any (memory) object as the first parameter and HLA won't complain about the type, e.g.,

zeroObject( NotAByte, sizeToZero );

Note that you may only pass untyped objects by reference to a procedure.

Note that untyped reference parameters always take the address of the actual parameter to pass on to the procedure, even if the actual parameter is a pointer (normal pass by reference semantics in HLA will pass the value of a pointer, rather than the address of the pointer variable, if the base type of the pointer matches the type of the reference parameter). Sometimes you'll have the address of an object in a register or a pointer variable and you'll want to pass the value of that pointer object (i.e., the address of the ultimate object) rather than the address of the pointer variable. To do this, simply prefix the actual parameter with the VAL keyword, e.g.,

zeroObject( ptrVar, sizeToZero ); // Passes the address of ptrVar

zeroObject( val ptrVar, sizeToZero ); // Passes ptrVar's value.
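The manual leaves the body of zeroObject unspecified. One possible implementation, offered only as a sketch, uses the 80x86 string instructions; the register choices and the decision to preserve EAX, ECX, and EDI are assumptions. It also assumes the usual flat memory model in which ES and DS address the same data.

```hla
procedure zeroObject( var object:var; size:uns32 ); @nodisplay;
begin zeroObject;

    push( eax );
    push( ecx );
    push( edi );

    cld();                  // Make stosb move forward through memory.
    mov( object, edi );     // EDI = address passed for "object".
    mov( size, ecx );       // ECX = number of bytes to clear.
    xor( eax, eax );        // AL = 0, the fill value.
    rep.stosb();            // Store AL at [EDI], ECX times.

    pop( edi );
    pop( ecx );
    pop( eax );

end zeroObject;
```

Because object is an untyped reference parameter, the procedure sees only a four-byte address; it has no way to check that size matches the actual object, so the caller must supply the correct byte count.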

Parameter Passing in HLA, Name and Lazy Evaluation Parameters

HLA provides a modicum of support for pass by name and pass by lazy evaluation parameters. A pass by name parameter consists of a thunk that returns the address of the actual parameter. A pass by lazy evaluation parameter is a thunk that returns the value of the actual parameter. Whenever you specify the "name" or "lazy" keywords before a parameter, HLA reserves eight bytes to hold the corresponding thunk in the activation record. It is your responsibility to create a thunk whenever calling the procedure.

There is a minor difference between passing a thunk parameter by value and passing a lazy evaluation or name parameter to a procedure. Pass by name/lazy parameters require an immediate thunk constant; you cannot pass a thunk variable as a pass by name or lazy parameter.

To pass a thunk constant as a parameter to a pass by name or pass by lazy evaluation parameter, insert the thunk's code inside a "#{...}#" sequence in the parameter list and preface the whole thing with the THUNK reserved word. The following example demonstrates passing a thunk as a pass by name parameter:

program nameDemo;

#include( "stdio.hhf" );

procedure passByName( name ary:int32; var ip:int32 );

@nodisplay;

const i:text := "(type int32 [ebx])";

const a:text := "(type int32 [eax])";

begin passByName;

mov( ip, ebx );

mov( 0, i );

while( i < 10 ) do

ary(); // Get address of "ary[i]" into eax.

mov(i, ecx );

mov( ecx, a );

inc( i );

endwhile;

end passByName;

procedure thunkParm( t:thunk );

begin thunkParm;

t();

end thunkParm;

var

index:int32;

array:int32[10];

th:thunk;

begin nameDemo;

thunk th := #{ stdout.put( "Thunk Variable",nl ); }#;

thunkParm( th );

thunkParm( thunk #{ stdout.put( "Thunk Constant" nl ); }# );

// passByName( th, index ); -- would be illegal;

passByName

(

thunk

#{

push( ebx );

mov( index, ebx );

lea( eax, array[ebx*4] );

pop( ebx );

}#,

index

);

mov( 0, ebx );

while( ebx < 10 ) do

stdout.put

(

"array[",

(type int32 ebx),

"]=",

array[ebx*4],

nl

);

inc( ebx );

endwhile;

end nameDemo;

This program produces the following output:

Thunk Variable

Thunk Constant

array[0]=0

array[1]=1

array[2]=2

array[3]=3

array[4]=4

array[5]=5

array[6]=6

array[7]=7

array[8]=8

array[9]=9

Hybrid Parameter Passing in HLA

HLA's automatic code generation for parameters specified using the high level language syntax isn't always optimal. In fact, sometimes it is downright inefficient. This is because HLA makes very few assumptions about your program. For example, suppose you are passing a word parameter to a procedure by value. Since all parameters in HLA consume an even multiple of four bytes on the stack, HLA will zero extend the word and push it onto the stack. It does this using code like the following:

pushw 0

pushw Parameter

Clearly you can do better than this if you know something about the variable. For example, if you know that the two bytes following "Parameter" are in memory (as opposed to being in the next page of memory that isn't allocated, and access to such memory would cause a protection fault), you could get by with the single instruction:

push dword ptr Parameter

Unfortunately, HLA cannot make these kinds of assumptions about the data because doing so could create malfunctioning code.

One solution, of course, is to forego the HLA high level language syntax for procedure calls and manually push all the parameters yourself and call the procedure via the CALL instruction. However, this is a major pain that involves lots of extra typing and produces code that is difficult to read and understand. Therefore, HLA provides a hybrid parameter passing mechanism that lets you continue to use a high level language calling syntax yet still specify the exact instructions needed to pass certain parameters. This hybrid scheme works out well because HLA actually does a good job with most parameters (e.g., if they are an even multiple of four bytes, HLA generates efficient code to pass the parameters; it's only those parameters that have a weird size that HLA generates less than optimal code for).

If a parameter consists of the "#{" and "}#" brackets with some corresponding code inside the brackets, HLA will emit the code inside the brackets in place of any code it would normally generate for that parameter. So if you wanted to pass a 16-bit parameter efficiently to a procedure named "Proc" and you're sure there is no problem accessing the two bytes beyond this parameter, you could use code like the following:

Proc( #{ push( (type dword WordVar) ); }# );

Notice the similarity to pass by name/lazy evaluation parameters. However, no THUNK reserved word prefaces the code in this instance.

Whenever you pass a non-static variable as a parameter by reference, HLA generates the following code to pass the address of that variable to the procedure:

push eax

push eax

lea eax, Variable

mov [esp+4], eax

pop eax

It generates this particular code to ensure that it doesn't change any register values (after all, you could be passing some other parameter in the EAX register). If you have a free register available, you can generate slightly better code using a calling sequence like the following (assuming EBX is free):

HasRefParm

(

#{

lea( ebx, Variable );

push( ebx );

}#

);

Parameter Passing in HLA, Register Parameters

HLA provides a special syntax that lets you specify that certain parameters are to be passed in registers rather than on the stack. The following are some examples of procedure declarations that use this feature:

procedure a( u:uns32 in eax ); @forward;

procedure b( w:word in bx ); @forward;

procedure d( c:char in ch ); @forward;

Whenever you call one of these procedures, the code that HLA automatically emits for the call will load the actual parameter value into the specified register rather than pushing this value onto the stack. You may specify any general purpose 8-bit, 16-bit, or 32-bit register after the "IN" keyword following the parameter's type. Obviously, the parameter must fit in the specified register. You may only pass reference parameters in 32-bit registers; you cannot pass parameters that are not one, two, or four bytes long in a register.

You can get into trouble if you're not careful when using register parameters. Consider the following two procedure definitions:

procedure one( u:uns32 in eax; v:dword in ebx ); @forward;

procedure two( a:uns32 in eax );

begin two;

one( 25, a );

end two;

The call to "one" in procedure "two" looks like it passes the values 25 and whatever was passed in for "a" in procedure two. However, if you study the HLA output code, you will discover that the call to "one" passes 25 for both parameters. The reason for this is that HLA emits the code to load 25 into EAX in order to pass 25 in the "u" parameter. Unfortunately, this wipes out the value passed into "two" in the "a" variable, hence the problem. Be aware of this if you use register parameters often.
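One way around the conflict, shown here only as a sketch, is to move the incoming value out of EAX before making the call. The fragment reuses the declarations of "one" and "two" from above and assumes that HLA loads the register parameters in the order they are declared.

```hla
procedure one( u:uns32 in eax; v:dword in ebx ); @forward;

procedure two( a:uns32 in eax );
begin two;

    // Copy "a" into the register that carries "v" before HLA
    // overwrites EAX with the constant 25.

    mov( eax, ebx );
    one( 25, ebx );

end two;
```

This works only because EBX is not needed for anything else at the point of the call; in other situations you may have to save the value in a local variable or on the stack instead.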

Lexical Scope

HLA is a block-structured language that enforces the scope of local identifiers. HLA uses lexical scope to determine when and where an identifier is visible to the program. Identifiers declared within a procedure are always visible within that procedure and to any local procedures declared after the identifier. Local identifiers are never visible outside the procedure declaration. The scoping rules are similar to languages like Pascal, Ada, and Modula-2. As an example, consider the following code:

program scopeDemo;

#include( "stdio.hhf" );

var
i:int32;
j:int32;
k:int32;

procedure lex1;
var
i:int32;
j:int32;

procedure lex2;
var
i:int32;
begin lex2;

mov( i, eax ); //1

mov( ebx::j, eax ); //2

mov( ecx::k, eax ); //3

end lex2;

begin lex1;

mov( i, eax ); //4

mov( j, eax ); //5

mov( ecx::k, eax ); //6

end lex1;

procedure alsolex1;
var
i:int32;
m:int32;
begin alsolex1;

mov( i, eax ); //4

mov( m, eax ); //5

mov( ecx::k, eax ); //6

end alsolex1;

begin scopeDemo;

mov( i, eax ); //7

mov( j, eax ); //8

mov( k, eax ); //9

end scopeDemo;

(Note: the purpose of the ebx:: and ecx:: prefixes on certain variables will become clear in a moment. Also note that this code is not functional; it was written only as an illustration.)

In this example you will note that lex2 is nested within lex1, which is nested within the main program. The alsolex1 procedure is nested within the main program but inside no other procedure. To describe this arrangement, compiler writers use the term lex level to denote the depth of nesting. HLA defines the main program to be lex level zero. Each time you nest a procedure, you increase its lex level. So lex1 is at lex level one since it is directly nested inside the main program at lex level zero. The lex2 procedure is at lex level two because it is nested inside the lex1 procedure. Finally, alsolex1 is also at lex level one because it is nested inside the main program (which is lex level zero).

Within a given procedure (or the main program), all identifiers must be unique. That is, you cannot have two symbols named "i" in the same procedure. In different procedures, however, you may reuse the names. If all procedures were written at lex level one, then no procedure would be able to directly access the local variables in any other procedure (this is the case with the C/C++ language). In block structured languages, like HLA, it is possible to access certain non-local variables in other procedures if the current procedure (whose code is attempting to access said variable) is nested within the other procedure.

In the example above, lex2 accesses three variables: i, j, and k. The i variable is local to lex2, so there is nothing surprising here. The j variable is local to lex1 and global to lex2. Likewise, the k variable is global to both lex1 and lex2, yet lex2 can access it. Whenever a procedure is nested within another (either directly or indirectly), the nested procedure can access all variables in the global, nesting, procedures (including the main program) unless the procedure declares a local name with the same name as a global name (the local name always takes precedence in this case). The term "scope" refers to the visibility of these names.

Being able to use a name during compilation is one thing; accessing the memory location associated with that name at run-time is something else entirely. Most block structured high level languages (HLLs) emit lots of extra code to access these "intermediate" and global variables for you. Why the extra code? Well remember, local procedure variables are accessed on the stack by indexing off the EBP register (which points at a procedure's "activation record"). When a procedure like lex1 above calls a local procedure like lex2, the lex2 procedure promptly saves the value in EBP (that points at lex1's activation record) and it points EBP at the new activation record for lex2. Unfortunately, lex2 no longer has access to lex1's local variables since EBP no longer points at lex1's locals. This creates a bit of a problem.

"But wait!" you exclaim. "EBP is pointing at the pointer to lex1's activation record, why not just use double indirection to get the pointer to lex1's locals?" This is a good idea, but it fails if lex2 is recursive. There are two or three general solutions to this problem; HLA uses a display to access non-local values.

A display is nothing more than an array of pointers. Display[0] is a pointer to the most recent activation record at lex level zero, Display[1] is a pointer to the most recent activation record at lex level one, Display[2] is a pointer to the most recent activation record at lex level two, etc. (Note the use of the phrase "most recent"; this ensures that displays work properly even when recursion occurs.) With a display, to access a non-local variable, you just go to the memory location specified by Display[varlex] + varoffset, where "varlex" is the lex level of the symbol you wish to access and "varoffset" is the offset into the activation record where the variable's data can be found.

Sound complex? Actually, HLA simplifies this quite a bit. First, as long as you don't specify the @nodisplay procedure option, HLA automatically emits the code to build a display at the start of the procedure's code. HLA also defines a run-time variable, _display_, that points at this array of pointers. Accessing a non-local variable requires two instructions: one to fetch the address of the variable's activation record and one to access the variable. Correcting the previous program, the code would look something like this:

procedure lex2;
var
i:int32;
begin lex2;

mov( i, eax );

// access non-local variable j
// at lex level 1.

mov( _display_[-1*4], ebx );

mov( ebx::j, eax );

// access non-local variable k
// at lex level 0.

mov( _display_[0], ecx );

mov( ecx::k, eax );

end lex2;

There are two things to note about the display. First, the entries are stored at negative indices in the array (0, -1, -2, etc.) rather than at positive indices (this is due to HLA's implementation). Second, don't forget that this is a run-time array of dwords, so you must multiply each index by the array element size, which is four in this case.

Once you've loaded the address into a register, the reg::var syntax tells HLA to use the specified register rather than EBP as the pointer to the variable's activation record. The "mov(ecx::k,eax);" instruction, for example, compiles to "mov eax, [ecx+koffset]" where koffset represents the offset of k in the main program's activation record.

In general, few programs take advantage of nested procedures and access to non-local variables, so it is very common to find programmers putting "@nodisplay" after all their procedures. If you do this, HLA does not generate a display, and access to non-local variables declared in a var section is not possible. Static variables, however, are not allocated in the activation record, so you always have access to non-local static variables even if you don't generate the code for a display.
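The following sketch illustrates the last point. Both procedures specify @nodisplay, yet the nested procedure can still reach a variable of its enclosing procedure because that variable lives in a static section. The names staticScope, outer, inner, and calls are hypothetical.

```hla
program staticScope;
#include( "stdio.hhf" );

procedure outer; @nodisplay;
static
    calls:uns32 := 0;       // Static storage, not in the activation record.

    procedure inner; @nodisplay;
    begin inner;

        inc( calls );       // Direct access; no display entry is required.

    end inner;

begin outer;

    inner();
    inner();
    stdout.put( "calls=", calls, nl );

end outer;

begin staticScope;

    outer();

end staticScope;
```

Had calls been declared in a var section instead, inner would have needed the display (and the reg::var syntax) to reach it.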

Class Data Types

HLA supports object-oriented programming via the class data type. A class declaration takes the following form:

class

<< declarations >>

endclass;

Classes allow const, val, var, static, readonly, uninitialized, procedure, method, and macro declarations. In general, just about everything allowed in a program declaration section, except types, segments, and namespaces, is legal in a class declaration.

Unlike C++ and Object Pascal, where the class declarations are nearly identical to the record/struct declarations, HLA class declarations are noticeably different from HLA records because you supply const, var, static, etc., declaration sections within the class. As an example, consider the following HLA class declaration:

type SomeClass: class

var
i:int32;

const
pi:=3.14159;

method incrementI;

endclass;

Unlike records, you must put each declaration into an appropriate section. In particular, data fields must appear in a static, readonly, uninitialized, or var section.

Note that the body of a procedure or method does not appear in the class declaration. Only prototypes (forward declarations) appear within the class definition itself. The actual procedure or method is declared elsewhere in the code.
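As a sketch of what "declared elsewhere" might look like for the SomeClass example above, the method body below prefixes the method name with the class name. It assumes that ESI points at the object when the method runs and that the "this" identifier refers to that object; neither detail is covered in this section. A complete program would also need to declare the class's virtual method table and create an object, which are omitted here.

```hla
// Body of the incrementI method prototyped in SomeClass.

method SomeClass.incrementI; @nodisplay;
begin incrementI;

    inc( this.i );          // Increment the i field of the current object.

end incrementI;
```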

Classes, Objects, and Object-Oriented Programming in HLA

HLA provides support for object-oriented programming via classes, objects, and automatic method invocation. Indeed, supporting method calls requires HLA to violate an important design principle (that HLA generated code does not disturb values in any registers except ESP and EBP). Nevertheless, supporting object-oriented programming and automatic method calls was so important that an exception was made in this instance. But more on that in a moment.

It is worthwhile to review the syntax for a class declaration. First of all, a class declaration may only appear in a type section within an HLA program. You cannot define classes in the VAR, STATIC, STORAGE, or READONLY sections, and HLA does not allow you to create class constants. Within the TYPE section, a class declaration takes one of the following forms:

type

baseClass:

class

Declarations, including const,

val, var, and static sections, as

well as procedures, methods, and

macros.

endclass;

derivedClass:

class inherits( baseClass )

Declarations, including const,

val, var, and static sections, as

well as procedure and method prototypes, and

macros.

endclass;

Note that you may not include type sections or namespaces in a class. Allowing type sections in a class creates some special problems (having to do with the possibility of nested class definitions). Namespaces are illegal because they allow type sections internally (and there is no real need for namespaces within a class).

Note that you may only place procedure, iterator, and method prototypes in a class definition. Procedure and method prototypes look like a forward declaration without the forward reserved word; they use the following syntax:

procedure procName(optional_parameters); options

method methodName(optional_parameters); options

iterator iterName( optional_parameters ); optional_external

"procName", "iterName", and "methodName" are the names you wish to assign to these program units. Note that you do not preface these names with the name of the class and a period.

If the procedure, iterator, or method has any parameters, they immediately follow the procedure/iterator/method name, enclosed in parentheses. The parentheses must not be present if there are no parameters. A semicolon immediately follows the parameters, or the procedure/iterator/method name if there are no parameters.

Class procedure and method prototypes allow two options: a @RETURNS clause and/or an @EXTERNAL clause. The @pascal, @cdecl, @stdcall, @nodisplay, and @noframe options are not allowed in the prototype. See the section on procedures for more details on the @returns and @external clauses. Iterator prototypes allow only the @external option.

Unlike procedures and methods, if you define a macro within a class you must supply the body of the macro within the class definition.

Consider the following example of a class declaration:

type

baseClass:

class

var

i:int32;

procedure create; @returns( "esi" );

procedure geti; @returns( "eax" );

method seti( ival:int32 ); @external;

endclass;

By convention, all classes should have a class procedure named "create". This is the constructor for the class. The create procedure should return a pointer to the class object in the ESI register, hence the @returns( "esi" ); clause in this example.

This class includes two accessor functions, geti and seti, that provide access to the class variable "i". Note that HLA classes do not support the public, private, and protected visibility options found in HLLs like C++ and Delphi. HLA's design assumes that assembly language programmers are sufficiently disciplined that they will not access fields that should be private [18].

Of course, the class' procedures and methods must be defined at one point or another. Here are some reasonable examples of these class definitions (a full explanation will appear later):

procedure baseClass.create; @nodisplay; @noframe;

begin create;

push( eax );

if( esi = 0 ) then

malloc( @size( baseClass ));

mov( eax, esi );

endif;

mov( &baseClass._VMT_, this._pVMT_ );

pop( eax );

ret();

end create;

procedure baseClass.geti; @nodisplay; @noframe;

begin geti;

mov( this.i, eax );

ret();

end geti;

method baseClass.seti( ival:int32 ); @nodisplay;

begin seti;

push( eax );

mov( ival, eax );

mov( eax, this.i );

pop( eax );

end seti;

These procedure and method declarations look almost like regular procedure declarations with one important difference: the class name and a period precede the procedure or method name on the first line of the procedure/method declaration. Note, however, that only the procedure or method name appears after the BEGIN and END clauses.

Another important difference is the procedure options. Only the @nodisplay/@display, @noalignstack/@alignstack, and @noframe/@frame options are legal here (the opposite of the class procedure/method prototype definitions, which only allow @external and @returns). Note that class procedures, methods, and iterators do not support the @pascal, @cdecl, or @stdcall procedure options (they always use the Pascal calling convention).

Class procedures and methods must be defined at the same lex level and within the same scope as the class declaration. Usually class declarations are at lex level zero (i.e., inside the main program or within a unit), so the corresponding procedure and method declarations must appear at lex level zero as well. Of course, it is perfectly legal to declare a class type within some other procedure (at lex level one or higher). If you do this, the class procedure and method declarations must appear at the same level.

Inheritance

HLA classes support inheritance using the INHERITS reserved word. Consider the following class declaration that inherits the fields from the baseClass declaration in the previous section:

derivedClass:

class inherits( baseClass )

var

j:int32;

f:real64;

endclass;

This class inherits all the fields from baseClass and adds two new fields, j and f. This declaration is roughly equivalent to:

derivedClass:

class

var

i:int32;

procedure create; @returns( "esi" );

procedure geti; @returns( "eax" );

method seti( ival:int32 ); @external;

var

j:int32;

f:real64;

endclass;

It is "roughly" equivalent because there is no need to create the derivedClass.create and derivedClass.geti procedures or the derivedClass.seti method. This class inherits the procedures and methods written for baseClass along with the field definitions.
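If you are more comfortable reading C, the following sketch models what inheriting fields means for the layout of an object. It is only an illustration under stated assumptions: the struct names mirror the classes above, the member named "inherited" and the function derived_geti_demo are hypothetical, and the pVMT member merely stands in for the hidden pointer field HLA adds to each class. The point is that the base class fields come first in the derived object, so a routine written once for baseClass can operate on a derivedClass object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of the two HLA classes; names mirror the text. */
typedef struct {
    const void *pVMT;    /* stands in for the hidden _pVMT_ field */
    int32_t i;
} baseClass;

typedef struct {
    baseClass inherited; /* all baseClass fields come first, unchanged */
    int32_t j;
    double f;
} derivedClass;

/* Written once for baseClass; works on any object that starts with one. */
int32_t geti(const baseClass *self)
{
    return self->i;
}

/* Reads i from a derivedClass object through the inherited routine. */
int32_t derived_geti_demo(void)
{
    derivedClass d = { { NULL, 42 }, 7, 1.5 };
    assert(offsetof(derivedClass, inherited) == 0);
    return geti(&d.inherited);
}
```

Because the inherited part sits at offset zero, a pointer to the derived object is also a valid pointer to its base portion, which is why no second copy of geti is needed.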

Like records, it is possible to "override" the VAR fields of a base class in a derived class. To do this, you use the OVERRIDES keyword. Note that this keyword is valid only for VAR fields in a class; you may not override static objects with this keyword. Example:

derivedClass:

class inherits( baseClass )

procedure create; @returns( "esi" );

procedure geti; @returns( "eax" );

method seti( ival:int32 ); @external;

var

overrides i: dword; // New copy of i for this class.

j:int32;

f:real64;

endclass;

Occasionally, you may want to override a procedure in a base class. For example, it is very common to supply a new constructor in each derived class (since the constructor may need to initialize fields in the derived class that are not present in the base class). The override [19] keyword tells HLA that you intend to supply a new procedure or method declaration and you do not want to call the corresponding functions in the base class. Consider the following modifications to derivedClass that override the create procedure and seti method:

derivedClass:

class inherits( baseClass )

var

j:int32;

f:real64;

override procedure create;

override method seti;

endclass;

When you override a procedure or method, you are not allowed to specify any parameters or procedure options except the @external option. This is because the parameters and @returns strings must exactly match the declarations in the base class. So even though seti in this derived class doesn't have an explicit parameter declared, the "ival" parameter is still required in a call to seti.

Of course, once you override procedures and methods in a derived class, you must provide those program units in your code. Here is an example of a section of a program that provides overridden procedures and methods along with their declarations:

type

base: class

var

i:int32;

procedure create;

method geti;

method seti( ival:int32 );

endclass;

derived:class inherits( base )

var

j:int32;

override procedure create;

override method seti;

method getj;

method setj( jval:int32 );

endclass;

procedure base.create; @nodisplay; @noframe;

begin create;

push( eax );

if( esi = 0 ) then

malloc( @size( base ));

mov( eax, esi );

endif;

mov( &base._VMT_, this._pVMT_ );

mov( 0, this.i );

pop( eax );

ret();

end create;

method base.geti; @nodisplay; @noframe;

begin geti;

mov( this.i, eax );

ret();

end geti;

method base.seti( ival:int32 ); @nodisplay;

begin seti;

push( eax );

mov( ival, eax );

mov( eax, this.i );

pop( eax );

end seti;

procedure derived.create; @nodisplay; @noframe;

begin create;

push( eax );

if( esi = 0 ) then

malloc( @size( derived ));

mov( eax, esi );

endif;

// Do any initialization done by the base class:

call base.create;

// Do our own specific initialization.

mov( &derived._VMT_, this._pVMT_ );

mov( 1, this.j );

// Return

pop( eax );

ret();

end create;

method derived.seti( ival:int32 ); @nodisplay;

begin seti;

push( eax );

mov( ival, eax );

// call inherited code to do whatever it does:

(type base [esi]).seti( ival );

// Now handle the code that we do specially.

mov( eax, this.j );

// Okay, return to caller.

pop( eax );

end seti;

method derived.setj( jval:int32 ); @nodisplay;

begin setj;

push( jval );

pop( this.j );

end setj;

method derived.getj; @nodisplay; @noframe;

begin getj;

mov( this.j, eax );

ret();

end getj;

Abstract Methods

Sometimes you will want to create a base class as a template for other classes. You will never create instances (variables) of this base class, only instances of classes derived from this class. In object-oriented terminology, we call this an abstract class. Abstract classes may contain certain methods that will always be overridden in the derived classes. Hence, there is no need to actually supply the method for this base class. HLA, however, always checks to verify that you supply all methods associated with a class. Therefore, you normally have to supply some sort of method, even if it's just an empty method, to satisfy the compiler. In those instances where you really don't need such a method, this is an annoyance. HLA's abstract methods provide a solution to this problem.

You declare an abstract method in a class declaration as follows:

type

c: class

method absMethod( parameters: uns32 ); @abstract;

endclass;

The @ABSTRACT keyword must follow the @RETURNS option if the @RETURNS option is present.

The @ABSTRACT keyword tells HLA not to expect an actual method associated with this class. Instead, it is the responsibility of all classes derived from "c" to override this method. If you attempt to call an abstract method, HLA will raise an exception and abort program execution.
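One way to picture an abstract method is as a slot in the method table that no real code fills in. The C sketch below models that idea only; it is not HLA's implementation. All names are hypothetical, and where HLA raises an exception the stub here simply returns -1 so the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>

typedef struct object object;
typedef int (*method_t)(object *self);

struct object {
    const method_t *pVMT;   /* first field: pointer to the method table */
};

/* Stand-in for whatever occupies an abstract slot; HLA raises an
   exception at this point, this sketch just reports failure. */
int abstract_stub(object *self)
{
    (void)self;
    return -1;
}

/* The override supplied by a class derived from "c". */
int derived_absMethod(object *self)
{
    (void)self;
    return 1;
}

/* Class "c" leaves slot 0 abstract; the derived class overrides it. */
const method_t c_VMT[]       = { abstract_stub };
const method_t derived_VMT[] = { derived_absMethod };

int call_absMethod(object *obj)
{
    assert(obj != NULL && obj->pVMT != NULL);
    return obj->pVMT[0](obj);
}

/* Calls absMethod on an object of class "c" or of the derived class. */
int demo_abstract(int use_derived)
{
    object obj = { use_derived ? derived_VMT : c_VMT };
    return call_absMethod(&obj);
}
```

With this model, demo_abstract( 1 ) reaches the override, while demo_abstract( 0 ) lands in the stub, which is the situation the text warns about.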

Classes versus Objects

An object is an instance of a class. In plain English, this means that a class is only a data type while an object is a variable whose type is some class type. Therefore, actual objects may be declared in the var or static section of a program. Here are a couple of typical examples:

var

b: base;

static

d: derived;

Each of these declarations reserves storage for all the data in the specified class type.

For reasons that will shortly become clear, most programmers use pointers to objects rather than directly declared objects. Pointer declarations look like the following:

var

ptrToB: pointer to base;

static

ptrToD: pointer to derived;

Of course, if you declare a pointer to an object, you will need to allocate storage for the object (call the HLA Standard Library "malloc" routine) and initialize the pointer variable with the address of the allocated storage. As you will soon see, the class constructor typically handles this allocation for you.

Initializing the Virtual Method Table Pointer

Whether you allocate storage for an object statically (in the STATIC section), automatically (in the VAR section), or dynamically (via a call to malloc), it is important to realize that the object is not properly initialized and must be initialized before making any method calls. Failure to do so will, most likely, cause your program to crash when you attempt to call a method or access other data in the class.

The first four bytes of every object contain a pointer to that object's virtual method table. The virtual method table, or VMT, is an array of pointers to the code for each method in the class. To help you initialize this pointer, HLA automatically adds two fields to every class you create: _VMT_ which is a static dword entry (the significance of this being a static entry will become clear later) and _pVMT_ which is a VAR field of the class whose type is pointer to dword. _pVMT_ is where you must put a pointer to the virtual method table. The pointer value to store here is the address of the _VMT_ entry. This initialization can be done using the following statement:

mov( &ClassName._VMT_, ObjectName._pVMT_ );

ClassName represents the name of the class and ObjectName represents the name of the STATIC or VAR variable object. If you've allocated storage for an object pointer using malloc, you'd use code like the following:

mov( ObjectPtr, ebx );

mov( &ClassName._VMT_, (type ClassName [ebx])._pVMT_ );

In this example, ObjectPtr represents the name of the pointer variable. ClassName still represents the name of the class type.

Typically, the initialization of the pointer to the virtual method table takes place in the class' constructor procedure (it must be a procedure, not a method!). Consider the example from the previous section:

procedure base.create; @nodisplay; @noframe;

begin create;

push( eax );

if( esi = 0 ) then

malloc( @size( base ));

mov( eax, esi );

endif;

mov( &base._VMT_, this._pVMT_ );

mov( 0, this.i );

pop( eax );

ret();

end create;

As you can see here, this example uses the expression "this._pVMT_" rather than "(type base [esi])._pVMT_". That's because "this" is shorthand for using the ESI register as a pointer to an object of the current class type.

Creating the Virtual Method Table

For various technical reasons (related to efficiency), HLA does not automatically create the virtual method table for you; you must explicitly tell HLA to emit the table of pointers for the virtual method table. You can do this in either the STATIC or the READONLY declaration sections. The simple way is to use a statement like the following in either the STATIC or READONLY section:

VMT( classname );

If you need to be able to access the pointers in this table, there are two ways to do this. First, you can refer to the "classname._VMT_" dword variable in the class. Another way is to directly attach a label to the VMT you create using a declaration like the following:

vmtLabel: VMT( classname );

The "vmtLabel" label will be a static object of type dword.

Calling Methods and Class Procedures

Once the virtual method table of an object is properly initialized, you may call the methods and procedures of that object. The syntax is very similar to calling a standard HLA procedure except that you must prefix the procedure or method name with the object name and a period. For example, assume you have some objects with the following types ("base" is the type in the examples of the previous sections):

var

b: base;

pb: pointer to base;

With these variable declarations, and some code to initialize the pointers to the "base" virtual method table, the calls to the base procedures and methods might look like the following:

b.create();

b.geti();

b.seti( 5 );

pb.create();

pb.geti();

pb.seti( eax );

Note that HLA uses the same syntax for an object call regardless of whether the object is a pointer or a regular variable.

Whenever HLA encounters a call to an object's procedure or method, HLA emits some code that will load the address of the object into the ESI register. This is the one place HLA emits code that modifies the value in a general purpose register! You must remember this and not expect to be able to pass any values to an object's procedure or methods in the ESI register. Likewise, don't expect the value in ESI to be preserved across a call to an object's procedure or method. As you will see momentarily, HLA may also emit code that modifies the EDI register as well as the ESI register. So don't count on the value in EDI, either.

The value in ESI, upon entry into the procedure or method, is that object's "this" pointer. This pointer is necessary because the exact same object code for a procedure or method is shared by all object instances of a given class. Indeed, the "this" reserved word within a method or class procedure is really nothing more than shorthand for "(type ClassName [esi])".

Perhaps an obvious question is "What is the difference between a class procedure and a method?" The difference is the calling mechanism. Given an object b, a call to a class procedure emits a call instruction that directly calls the procedure in memory. In other words, class procedure calls are very similar to standard procedure calls with the exception that HLA emits code to load ESI with the address of the object [20]. Methods, on the other hand, are called indirectly through the virtual method table. Whenever you call a method, HLA actually emits three machine instructions: one instruction that loads the address of the object into ESI, one instruction that loads the address of the virtual method table (i.e., the first four bytes of the object) into EDI, and a third instruction that calls the method indirectly through the virtual method table. For example, given the following four calls:

b.create();

b.geti();

pb.create();

pb.geti();

HLA emits the following 80x86 assembly language code:

lea esi, [ebp-12] ;b

call ?8_create

lea esi, [ebp-12] ;b

mov edi, [esi]

call dword ptr [edi+0] ;geti

mov esi, dword ptr [ebp-16] ;pb

call ?8_create

mov esi, dword ptr [ebp-16] ;pb

mov edi, [esi]

call dword ptr [edi+0] ;geti

HLA class procedures roughly correspond to C++'s static member functions. HLA's methods roughly correspond to C++'s virtual member functions. Read the next few sections on the impact of these differences.
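The practical consequence of the two calling mechanisms is easiest to see when an object's VMT pointer refers to a derived class' table. The following C sketch is a model under assumptions, not HLA output: the function names, the second geti implementation, and the variables named esi and edi are hypothetical and only echo the three-instruction sequence described above.

```c
#include <assert.h>
#include <stdint.h>

typedef struct base base;
typedef int32_t (*method_t)(base *self);

struct base {
    const method_t *pVMT;   /* first field: pointer to the VMT */
    int32_t i;
};

/* Two implementations competing for the same method slot. */
int32_t base_geti(base *self)    { return self->i; }
int32_t derived_geti(base *self) { return self->i + 1000; }

const method_t base_VMT[]    = { base_geti };      /* slot 0 = geti */
const method_t derived_VMT[] = { derived_geti };

/* Class procedure style: the target is fixed when the call is compiled. */
int32_t call_as_procedure(base *obj)
{
    return base_geti(obj);               /* direct call */
}

/* Method style: load ESI, load EDI from [ESI], call through [EDI+0]. */
int32_t call_as_method(base *obj)
{
    base *esi = obj;                     /* address of the object */
    const method_t *edi = esi->pVMT;     /* address of the VMT */
    assert(edi != 0);                    /* VMT pointer must be initialized */
    return edi[0](esi);                  /* indirect call via the table */
}

/* Calls an object whose VMT pointer refers to derived_VMT both ways. */
int32_t demo_dispatch(int via_method)
{
    base obj = { derived_VMT, 5 };
    return via_method ? call_as_method(&obj) : call_as_procedure(&obj);
}
```

In this model the direct call always reaches base_geti, whatever table the object carries, while the indirect call follows the object's own table to derived_geti. The assert also shows why an uninitialized VMT pointer is fatal for method calls but not for class procedure calls.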

Non-object Calls of Class Procedures

In addition to the difference in the calling mechanism, there is another major difference between class procedures and methods: you can call a class procedure without an associated object. To do so, you would use the class name and a period, rather than an object name and a period, in front of the class procedure's name. E.g.,

base.create();

Since there is no object here (remember, base is a type name, not a variable name, and types do not have any storage allocated for them at run-time), HLA cannot load the address of the object into the ESI register before calling create. This situation can create some big problems in your code if you attempt to use the "this" pointer within a class procedure. Remember, an instruction like "mov( this.i, eax );" really expands to "mov( (type base [esi]).i, eax );" The question that should come to mind is "where is ESI pointing when one makes a non-object call to a class procedure?"

When HLA encounters a non-object call to a class procedure, HLA loads the value zero (NULL) into ESI immediately before the call. So ESI doesn't contain junk but it does contain the NULL pointer. If you attempt to dereference NULL (e.g., by accessing "this.i") you will probably bomb the program. Therefore, to be really safe, you must check the value of ESI inside your class procedures to verify that it does not contain zero.

The base.create constructor procedure demonstrates a great way to use class procedures to your advantage. Take another look at the code:

procedure base.create; @nodisplay; @noframe;

begin create;

push( eax );

if( esi = 0 ) then

malloc( @size( base ));

mov( eax, esi );

endif;

mov( &base._VMT_, this._pVMT_ );

mov( 0, this.i );

pop( eax );

ret();

end create;

This code follows the standard convention for HLA constructors with respect to the value in ESI. If ESI contains zero, this function will allocate storage for a brand new object, initialize that object, and return a pointer to the new object in ESI [21]. On the other hand, if ESI contains a non-null value, then this function does not allocate memory for a new object; it simply initializes the object at the address provided in ESI upon entry into the code.
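The same convention can be expressed in C, which may help if the register traffic obscures the logic. This is a sketch under assumptions: the parameter self plays the role of ESI on entry, the return value plays the role of ESI on exit, base_VMT_placeholder is a hypothetical stand-in for base._VMT_, and the check for a failed allocation is an addition that the HLA listing does not contain.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    const void *pVMT;
    int32_t i;
} base;

/* Hypothetical stand-in for the class' VMT. */
static const int base_VMT_placeholder = 0;

/* self models ESI on entry; the result models ESI on exit. */
base *base_create(base *self)
{
    if (self == NULL) {              /* non-object call: allocate storage */
        self = malloc(sizeof *self);
        if (self == NULL)
            return NULL;             /* allocation failed */
    }
    self->pVMT = &base_VMT_placeholder;
    self->i = 0;
    return self;
}

/* Returns 1 when an existing object is initialized in place. */
int create_reuses_storage(void)
{
    base local = { NULL, 99 };
    base *p = base_create(&local);
    return p == &local && local.i == 0 && local.pVMT != NULL;
}

/* Returns 1 when a NULL "this" yields a new, initialized object. */
int create_allocates(void)
{
    base *p = base_create(NULL);
    int ok = p != NULL && p->i == 0 && p->pVMT != NULL;
    free(p);
    return ok;
}
```

One routine therefore serves both b.create() (initialize in place) and base.create() (allocate, then initialize), which is the design choice the convention is meant to capture.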

Certainly you do not want to use this trick (automatically allocating storage if ESI contains NULL) in all class procedures, but it's still a very good idea to check the value of ESI upon entry into every class procedure that accesses any fields using ESI or the "this" reserved word. One way to do this is to use code like the following at the beginning of each class procedure in your program:

if( ESI = 0 ) then

raise( AttemptToDerefZero );

endif;

If this seems like too much typing, or if you are concerned about efficiency once you've debugged your program, you could write a macro like the following to solve your problem:

#macro ChkESI;

#if( CheckESI )

if( ESI = 0 ) then

raise( AttemptToDerefZero );

endif;

#endif

#endmacro

Now all you've got to do is stick an innocuous "ChkESI" macro invocation at the beginning of your class procedures (maybe on the same line as the "begin" clause to further hide it) and you're in business. By defining the boolean constant "CheckESI" to be true or false at the beginning of your code, you can control whether this "inefficient" code is generated into your programs.
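For comparison, the same switchable check can be sketched with the C preprocessor. Everything here is an assumption made for illustration: CHECK_ESI and CHK_THIS are hypothetical names, and because C has no exceptions the check returns an error value where the HLA macro raises AttemptToDerefZero.

```c
#include <stddef.h>

#ifndef CHECK_ESI
#define CHECK_ESI 1     /* set to 0 to compile the checks out */
#endif

#if CHECK_ESI
#define CHK_THIS(p, on_null) do { if ((p) == NULL) return (on_null); } while (0)
#else
#define CHK_THIS(p, on_null) do { } while (0)
#endif

typedef struct {
    int i;
} base;

/* Returns -1 instead of dereferencing NULL while CHECK_ESI is nonzero. */
int geti_checked(const base *self)
{
    CHK_THIS(self, -1);
    return self->i;
}

/* Normal use: the check passes and the field is returned. */
int geti_demo(void)
{
    base b = { 7 };
    return geti_checked(&b);
}
```

As with the HLA version, flipping one constant removes every check from the generated code without touching the procedures that use it.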

Static Class Fields

There exists only one copy, shared by all objects, of any static data objects in a class. Since there is only one copy of the data, you do not access variables in the class' static section using the object name or the "this" pointer. Instead, you preface the field name with the class name and a period.

For example, consider the following class declaration that demonstrates a very common use of static variables within a class:

program DemoOverride;

#include( "memory.hhf" );

#include( "stdio.hhf" );

type

CountedClass:

class

static

CreateCnt:int32 := 0;

procedure create;

procedure DisplayCnt;

endclass;

procedure CountedClass.create; @nodisplay; @noframe;

begin create;

push( eax );

if( esi = 0 ) then

malloc( @size( CountedClass ));

mov( eax, esi );

endif;

mov( &CountedClass._VMT_, this._pVMT_ );

inc( CountedClass.CreateCnt );

pop( eax );

ret();

end create;

procedure CountedClass.DisplayCnt; @nodisplay; @noframe;

begin DisplayCnt;

stdout.put( "Creation Count=", CountedClass.CreateCnt, nl );

ret();

end DisplayCnt;

var

b: CountedClass;

pb: pointer to CountedClass;

begin DemoOverride;

CountedClass.DisplayCnt();

b.create();

CountedClass.DisplayCnt();

CountedClass.create();

mov( esi, pb );

CountedClass.DisplayCnt();

end DemoOverride;

In this example, a static field (CreateCnt) is incremented by one for each object that is created and initialized. The DisplayCnt procedure prints the value of this static field. Note that DisplayCnt does not access any non-static fields of CountedClass. This is why it doesn't bother to check the value in ESI for zero.
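The "one copy, shared by all objects" rule can be modeled in C with a variable that lives outside the object structure. This is a sketch with hypothetical names (the id member and count_after_two_creates are inventions for the example); it only illustrates that the counter belongs to the class as a whole and takes no space in any object.

```c
/* Per-object data only; the counter is deliberately not a member. */
typedef struct {
    int id;
} CountedClass;

/* One copy for the whole class, like a field in the STATIC section. */
int CountedClass_CreateCnt = 0;

/* Initializes an object and bumps the shared counter. */
void CountedClass_create(CountedClass *self)
{
    ++CountedClass_CreateCnt;    /* named via the class, not via self */
    self->id = CountedClass_CreateCnt;
}

/* Creates two objects and returns the shared count. */
int count_after_two_creates(void)
{
    CountedClass a, b;
    CountedClass_CreateCnt = 0;
    CountedClass_create(&a);
    CountedClass_create(&b);
    return CountedClass_CreateCnt;
}
```

Both objects see the same counter, and the size of a CountedClass object is just the size of its per-object data.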

Program Unit Initializers and Finalizers

HLA does not automatically call an object's constructor like C++ does. Also, there is no code associated with a unit that automatically executes to initialize that unit as in (Turbo) Pascal or Delphi. Likewise, HLA does not automatically call an object's destructor. However, HLA does provide a mechanism by which you can automatically execute initialization and shut-down code without explicitly specifying the code to execute at the beginning and end of each procedure. This is handled via the HLA "_initialize_" and "_finalize_" strings. All programs, procedures, methods, and iterators have these two predeclared string constants (VALUE strings, actually) associated with them. Whenever you declare a program unit, HLA inserts these constants into the symbol table and initializes them with the empty string.

HLA expands the "_initialize_" string immediately before the first instruction it finds after the "BEGIN" clause for a program, procedure, iterator, or method. Likewise, it expands the "_finalize_" string immediately before the END clause in these program units. Since, by default, these string constants hold the empty string, they usually have no effect. However, if you change the values of these constants within a declaration section, HLA emits the corresponding code at the appropriate point. Consider the following example:

procedure HasInitializer;

?_initialize_ := "mov( 0, eax );";

begin HasInitializer;

stdout.put( "EAX = ", eax, nl );

end HasInitializer;

This procedure will print "EAX = 0000_0000" since the "_initialize_" string contains an instruction that moves zero into EAX.

Of course, the previous example is somewhat irrelevant since you could have more easily put the MOV instruction directly into the program. The real purpose of initialize and finalize strings in an HLA program is to allow macros and include files to slip in some initialization code. For example, consider the following macro:

#macro init_int32( initValue ):theVar;

forward( theVar );

theVar: int32;

?_initialize_ := _initialize_ +

"mov( " +

@string:initValue +

", " +

@string:theVar +

" );";

#endmacro

Now consider the following procedure:

procedure HasInitedVars;

var

i: init_int32( 0 );

j: init_int32( -1 );

k: init_int32( 1 );

begin HasInitedVars;

stdout.put( "i=", i, " j=", j, " k=", k, nl );

end HasInitedVars;

The first "init_int32" macro invocation above expands to (something like) the following code:

i: forward( _1002_ );

_1002_: int32;

?_initialize_ := _initialize_ +

"mov( " +

"0" +

", " +

"i" +

" );";

Note that the last statement is equivalent to:

?_initialize_ := _initialize_ + "mov( 0, i );"

Also note that the text object _1002_ expands to "i".

If you take a step back from this code and look at it from a high level perspective, you can see that what it does is initialize a VAR variable by emitting a MOV instruction that stores the macro parameter into the VAR object. This example makes use of the FORWARD declaration clause in order to make a copy of the variable's name for use in the MOV instruction. The following is a complete program that demonstrates this example (it prints "i=1", if you're wondering):

program InitDemo;

#include( "stdlib.hhf" )

#macro init_int32( initVal ):theVar;

forward( theVar );

theVar:int32;

?_initialize_ :=

_initialize_ +

"mov( " +

@string:initVal +

", " +

@string:theVar +

" );";

#endmacro

var

i:init_int32( 1 );

begin InitDemo;

stdout.put( "i=", i, nl );

end InitDemo;

Note how this example uses string concatenation to append an initialization string to the end of the existing string. Although "_initialize_" and "_finalize_" start out as the empty string, there may be more than one initialization string required by the program. For example, consider the following modification to the code above:

var

i:init_int32( 1 );

j:init_int32( 2 );

The two macro invocations above produce the initialization string "mov( 1, i );mov( 2, j );". Had the macro not used string concatenation to attach its string to the end of the existing "_initialize_" string, then only the last initialization statement would have been generated.
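The accumulate-by-concatenation pattern is not specific to HLA's compile-time language. The C sketch below models it at run time purely as an illustration; append_init, two_inits, and INIT_MAX are hypothetical names, and the 4,096 figure is used here only as a convenient buffer size for the example.

```c
#include <stdio.h>
#include <string.h>

#define INIT_MAX 4096   /* buffer size chosen for this sketch */

/* Appends "mov( value, name );" to buf.
   Returns 0 on success, -1 if the text would not fit. */
int append_init(char *buf, size_t cap, int value, const char *name)
{
    size_t used = strlen(buf);
    int n = snprintf(buf + used, cap - used, "mov( %d, %s );", value, name);
    if (n < 0 || (size_t)n >= cap - used) {
        buf[used] = '\0';       /* undo the partial append */
        return -1;
    }
    return 0;
}

/* Builds the string for the two declarations of i and j. */
const char *two_inits(void)
{
    static char buf[INIT_MAX];
    buf[0] = '\0';              /* the string starts out empty */
    append_init(buf, sizeof buf, 1, "i");
    append_init(buf, sizeof buf, 2, "j");
    return buf;
}
```

Each call appends to whatever is already in the buffer, just as each macro invocation appends to the current value of the initialization string. Replacing the append with a plain copy would leave only the statement for j.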

You can put any number of statements into an initialization string, although the compiler tools used to write HLA limit the length of the string to something less than 32,768 characters. In general, you should try to limit the length of the initialization string to something less than 4,096 characters (this includes all initialization strings concatenated together within a single procedure).

Two very useful purposes for the initialization string include automatic constructor invocation and unit initialization code invocation. Let's consider the unit situation first. Associated with some unit you might have some code that you need to execute to initialize the unit when the program first loads into memory, e.g.,

unit NeedsInit;

#include( "NeedsInit.hhf" )

static

i:uns32;

j:uns32;

procedure InitThisUnit;

begin InitThisUnit;

mov( 0, i );

mov( 1, j );

end InitThisUnit;

end NeedsInit;

Now suppose that the "NeedsInit.hhf" header file contains the following lines:

procedure InitThisUnit; @external;

?_initialize_ := _initialize_ + "InitThisUnit();";

When you include the header file in your main program (that uses this unit), the statement above will insert a call to the "InitThisUnit" procedure into the main program. Therefore, the main program will automatically call the "InitThisUnit" procedure without the user of this unit having to explicitly make this call.

You can use a similar approach to automatically invoke class constructors and destructors in a procedure. Consider the following program that demonstrates how this could work:

program InitDemo2;

#include( "stdlib.hhf" )

type

_MyClass:

class

procedure create;

var

i: int32;

endclass;

#macro MyClass:theObject;

forward( theObject );

theObject: _MyClass;

?_initialize_ := _initialize_ +

@string:theObject +

".create();"

#endmacro

procedure _MyClass.create;

begin create;

push( eax );

if( esi = 0 ) then

malloc( @size( _MyClass ) );

mov( eax, esi );

endif;

mov( &_MyClass._VMT_, this._pVMT_ );

mov( 12345, this.i );

pop( eax );

end create;

procedure UsesMyClass;

var

mc:MyClass;

begin UsesMyClass;

stdout.put( "mc.i=", mc.i, nl );

end UsesMyClass;

static

vmt( _MyClass );

begin InitDemo2;

UsesMyClass();

end InitDemo2;

The variable declaration "mc:MyClass;" inside the UsesMyClass procedure (effectively) expands to the following text:

mc: _MyClass;

?_initialize_ := _initialize_ + "mc.create();";

Therefore, when the UsesMyClass procedure executes, the first thing it does is call the constructor for the mc/_MyClass object. Notice that the author of the UsesMyClass procedure did not have to explicitly call this routine.

You can use the "_finalize_" string in a similar manner to automatically call any destructors associated with an object.

Note that if an exception occurs and you do not handle the exception within a procedure containing "_finalize_" code, the program will not execute the statements emitted by "_finalize_" (any more than the program will execute any other statements within a procedure that an exception interrupts). If you absolutely, positively, must ensure that the code calls a destructor before leaving a procedure (via an exception), then you might try the following code:

?_initialize_ :=

_initialize_ +

<<string to call constructor>> +

"try ";

?_finalize_ :=

_finalize_ +

"anyexception push(eax); " +

<<string to call destructor>> +

"pop(eax); raise( eax ); endtry; " +

<<string to call destructor>>;

This version slips a TRY..ENDTRY block around the whole procedure. If an exception occurs, the ANYEXCEPTION handler traps it and calls the associated destructor, then reraises the exception so the caller will handle it. If an exception does not occur, then the second call to the destructor above executes to clean up the object before control transfers back to the caller.

Declarations

Programs, units, procedures, methods, and iterators all have a declaration section. Classes and namespaces also have a declaration section, though it is somewhat limited. A declaration section can contain one or more of the following components:

A label section.
A type section.
A const section.
A val section.
A var section.
A static section.
A namespace.
A procedure.
A method.
An iterator.

The order of these sections is irrelevant as long as you ensure that all identifiers used in a program are defined before their first use. Furthermore, as noted above, you may have multiple sections within the same set of declarations. For example, the two const sections in the following procedure declaration are perfectly legal:

procedure TwoConsts;
const MaxVal := 5;
type Limits: int32[ MaxVal ];
const MinVal := 0;
begin TwoConsts;

//...

end TwoConsts;

C/C++ programmers who are used to specifying "typedef" or "const" before each declaration can do so in HLA:

type intArray: int32[4];
const pi := 3.14159;
var i:int32;
const MaxVal := 10;
const MinVal := 0;
etc.

Pascal/Delphi users can put as many declarations in each section as they choose. Neither style is preferable to the other.

Label Section

The label section allows you to forward-declare statement labels that appear in a module. This section takes the following form:

label

id1;

id2; @external;

id3; @external( "external_name" );

etc.

For the most part, HLA already handles forward references on labels, so you will rarely need a label section in your programs. The one time where this section is handy is when you want to refer to a statement label at an outer lex level from within a procedure. By predeclaring the label at the outer lex level, you can access that symbol within the procedure, e.g.,

program funnyStuff;

label

funny;

procedure weird;

begin weird;

jmp funny;

end weird;

begin funnyStuff;

call weird;

funny:

end funnyStuff;

Note that this jump leaves the "weird" procedure without executing a return, so it leaves a bunch of stuff on the stack (like the return address and other parts of the activation record). This is useful in some bizarre cases, but is not common in normal code.

Perhaps the primary use for the label declaration section is to declare labels that are external to the program. This is done by attaching the @external option to the label you are defining. Without the optional external name string, HLA will define an external label using the label name you specify (e.g., id2 in the example above); if the optional external string is present, HLA uses the specified external name when referencing that label. These external declarations are quite useful when one module needs to refer to statement labels appearing in a different module.

Note that if you define the label in the same procedure that you declare it as external, then HLA treats that as a public declaration of that statement label, e.g.,

procedure SomeProc;

label

here; @external;

begin SomeProc;

here:

end SomeProc;

In this example, the label "here" is a public symbol and is available globally throughout the source file and it is externally accessible by other modules. Using the label declaration section is the only way to make statement labels visible outside the procedure (or main program) in which you declare them.

Another use of the label declaration section is to make HLA statement labels visible to code you place in the #asm..#endasm section or within an #emit(...) statement. See the discussion of #asm or #emit for more details.

Type Section

You can declare user-defined data types in the type section. The type section appears in a declaration section and begins with the reserved word type. It continues until encountering another declaration reserved word (e.g., const, var, or val) or the reserved word begin. A typical type definition begins with an identifier followed by a colon and a type definition. The following paragraphs describe the legal forms of type definitions.

id1 : forward( id2 );

This isn't an actual type declaration at all. What it will do is create a text constant (id2) and initialize that constant with the string "id1". The purpose of this declaration form is to let you defer the declaration of a symbol within a macro. For example, suppose you want to create a data type "template" (like those in C++). A template is just a macro you use in place of a data type. Given HLA's declaration syntax, however, the identifier for the template type has already appeared on the current source line. The forward declaration lets you "undo" this declaration and move it later. For example, consider the "strStorage" macro:

#macro strStorage( NumChars ):
theIdentifier,
MaxLength,
CurLength;

forward( theIdentifier );
MaxLength: dword := NumChars;
CurLength: dword := 0;
theIdentifier: byte[ (NumChars+4) & $FFFF_FFFC ];

#endmacro

Now consider the following variable declaration in the STATIC section:

static
s: strStorage( 250 );

HLA expands this template/macro to (something like) the following:

static
s: forward( _1000_ );
_1001_ :dword := 250;
_1002_ :dword := 0;
_1000_ :byte[ 252 ];

Note that _1000_ is a text constant containing the string "s", so the last line above expands to

s: byte[ 252 ];

This example demonstrates how you can use the "forward" clause to defer the declaration of a symbol within the type section.

id1 : pointer to id2;

This declaration creates a new type (id1) which is a pointer to some other type (id2). Pointer objects always consume four bytes at run-time. Note that you may not use pointer types in constant expressions. If id2 is undefined earlier in the program, then the program must declare id2 before the end of the current procedure (that is, id2 must be defined before the current lex level is reduced).

Examples:

intPtr: pointer to int32;

PtrToPtr: pointer to CharPtr;

CharPtr: pointer to char;

id1 : enum { id_list };

This declaration creates a new type (id1) which is an enumerated data type. <id_list> is a list of names that represent the values of this data type.

Examples:

Colors:enum {red, green, blue};

Gender:enum {female, male};

State:enum {on, off};

id : procedure( optional_parameter_list );

Defines a pointer type that points at a procedure. The optional parameter list consists of a list of parameter declarations (described later) separated by semicolons. If there are no parameters, do not include the parentheses in the type declaration. Like other pointers, procedure pointers are always 32 bits long (four-byte near pointers for the flat memory model).

Examples:

ProcPtr: procedure; options

ProcI : procedure( i:int32 ); options

ProcIF: procedure( i:int32; f:real64 ); options

Procedure variables (pointers) allow the @pascal, @cdecl, @stdcall, and @returns options immediately after the semicolon following the optional parameters. The @returns option attaches a "returns" string for use with instruction composition to calls through this pointer variable. For more information about the @returns clause, see the section on procedures earlier in this documentation. The @pascal, @cdecl, and @stdcall options (which are mutually exclusive) select the parameter passing mechanism and calling convention for the procedure object.

Examples:

ProcPtr: procedure; @returns( "eax" );

ProcI : procedure( i:int32 ); @returns( "esi" );

ProcIF: procedure( i:int32; f:real64 ); @returns( "st0" );

id : pointer to procedure id2;

Defines a pointer type that points at a procedure. id2 must be a previously declared procedure. id inherits all the parameters and procedure options of procedure id2.

Examples:

procedure xyz( a:byte; b:word; c:dword ); @returns( "eax" );

type

p : pointer to procedure xyz;

The declaration for p is equivalent to:

type

p : procedure( a:byte; b:word; c:dword ); @returns( "eax" );

Note that the phrase "pointer to procedure xyz" does not imply that p must point at xyz; it only means that p points at a procedure whose prototype is identical to xyz's.

id1 : id2;

Defines a new type (id1) that has the same characteristics as the specified type (id2). This is a type isomorphism; that is, you can rename a type.

Examples:

integer : int32;

float : real64;

double : float;
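
As a minimal sketch of how such renamed types might be used, the following fragment declares two static variables with the new names. The variable names (count and weight) are hypothetical and appear only for illustration:

```hla
type
    integer : int32;
    float   : real64;

static
    count  : integer := 0;     // declared with the renamed type; same characteristics as int32
    weight : float   := 1.5;   // declared with the renamed type; same characteristics as real64
```

Since the new name has the same characteristics as the original type, either name should be usable wherever the other is legal.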

id1 : id2 [ dim_list ];

This declaration defines an array type. Id1 is an array whose base type is specified by id2 that has the number of elements and dimensions (arity) specified by the dimension list (dim_list). Dim_list is a comma-separated list of one or more integer constant expressions.

Examples:

InpBufType : char[ 128 ];

Matrix3D : real32[ 4, 4 ];

ScreenType: char[ 25, 80 ];

id1 : union

field_declarations

endunion;

This declaration creates a discriminate union type. The field declarations can be anything that is legal in the var declaration section (see the var section for details) including other composite types (records, unions, arrays, pointers, etc). HLA allows union constants, but only if all the fields are data types that may legally appear in a const declaration section (e.g., no pointer objects and no procedure objects). Unlike records, unions do not allow inheritance. All objects within a union begin at the same base address in memory.

Examples:

FourBytes:
union
a4: uns8[4];
b2: uns16[2];
c1: uns32;
endunion;

Str: union
s:string;
cp: [char];
endunion;

Note that a union type definition must have at least one field declaration or HLA will generate an error.
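
The following fragment is a small sketch of what "same base address" means in practice. It assumes the FourBytes type from the examples above; the variable name fb is hypothetical, and the instructions are assumed to sit in the body of some program unit:

```hla
static
    fb: FourBytes;

// ... in the body of the program unit:

    mov( $1234_5678, fb.c1 );   // store a value through the 32-bit field
    mov( fb.a4[0], al );        // the byte at the shared base address (low-order byte of c1)
    mov( fb.b2[0], ax );        // the word at the same base address (low-order word of c1)
```

Because the 80x86 stores the low-order byte of a value first, the a4[0] and b2[0] fields overlay the low-order portion of c1.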

id1 : record

field_declarations

endrecord;

id2 : record inherits ( optional_base_type )

field_declarations

endrecord;

This declaration creates a record type. The field declarations can be anything that is legal in the var declaration section (see the var section for details) including other composite types (records, unions, arrays, pointers, etc). HLA allows record constants, but only if all the fields are data types that may legally appear in a const declaration section (e.g., no pointer objects and no procedure objects). If the "inherits" reserved word and optional_base_type identifier is present, then the base type identifier must also be a record type and the current record definition "inherits" all the fields from the base type (that is, all of the base record's fields are automatically included in the current record's definition).

Examples:

student:
record
name: string;
ID: char[11];
year: int8;
endrecord;

GradStudent:
record inherits (student )
ThesisTitle: string;
TA: boolean;
RA: boolean;
endrecord;

course:
record
instructor: string;
StudentCnt: int16;
CourseName: string;
CourseID: string;
endrecord;
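
As a brief sketch of inheritance in use, the following fragment assumes the student and GradStudent types from the examples above. The variable name gs is hypothetical, and the instructions are assumed to appear in the body of a program unit:

```hla
static
    gs: GradStudent;

// ... in the body of the program unit:

    mov( 3, gs.year );      // "year" is a field inherited from student
    mov( true, gs.TA );     // "TA" is a field declared in GradStudent itself
```

The inherited fields are accessed exactly like the fields that GradStudent declares directly.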

Record type declarations may contain anonymous union fields. An anonymous union field is a union declaration without a preceding field name and colon. For example, consider the following record definition:

vType : enum { integer, real, str, character };

variant:
record
DataType: vType;

union
i : int32;
r : real64;
s : string;
c : char;
endunion;
endrecord;

Anonymous union fields add their field names to the list of names belonging to the outside record type. For example, if you have a variable "x" of type "variant" you could refer to the fields in the anonymous union as x.i, x.r, x.s, and x.c. Contrast this with the following record definition that would require you to use the field names x.u.i, x.u.r, x.u.s, and x.u.c, respectively:

vType2 : enum { integer, real, str, character };

variant2:
record
DataType: vType2;

u : union
i : int32;
r : real64;
s : string;
c : char;
endunion;
endrecord;

Note that a record definition must have at least one field present or HLA will generate an error.

Const Section

You may declare manifest constants in the CONST section of an HLA program. Manifest constants are named constant values. In particular, HLA can replace the name of a manifest constant by its actual value during the assembly process. The value of an HLA constant is bound at the moment the constant's declaration is encountered at assembly time. That is, a given constant can be given exactly one value (within the current scope) during assembly. It is illegal to attempt to change the value of a constant at some later point during assembly. Of course, at run-time the constant always has a static value.

Const objects can be one of the following types:

Boolean, enumerated types, Uns8, Uns16, Uns32, Byte, Word, DWord, Int8, Int16, Int32, Char, Real32, Real64, Real80, String, Cset, and Text.

Constants can also be arrays or records as long as all elements/fields of these composite objects are valid const objects.

The constant declaration section begins with the reserved word const and is followed by a sequence of constant definitions. The constant declaration section ends when HLA encounters a keyword such as const, type, var, val, etc. Actual constant definitions take the forms specified in the following subsections.

id1: forward( id2 );

Defers the definition of id1. See the description of forward in the TYPE section for more details.

id := expr;

Associates the value and type of expr with the name id. Future references to id within the current scope will use the value of the expression in place of the identifier. If expr evaluates to an array constant, id is stored as a single dimension array, even if you attempt a trick like declaring an array of array expressions. If expr consists of an array name, then id inherits the dimensions and type of the specified array name. The expression must be a constant expression whose value can be computed at the point of this particular constant declaration (i.e., no forward declared identifiers).

Examples:

u := 5;

i := -5;

i2 := u * i;

b := true;

c := 'a';

s := "string";

a := [1,2,3,4];

id1 : id2 := expr;

This declaration defines a constant, id1, of type id2, that is given the value of expr. The type of id2 and the expression must be compatible. If id2 is an array type, the expression must be an array constant with the same number of elements; the arity (number of dimensions) does not need to agree as long as the element count is the same.

Examples:

i8 : int8 := -5;

i16 : int16 := -6;

s : string := "Hello World!";

// Assume array4x4 is defined as "array4x4 : uns8[4,4]"

a : array4x4 := [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16 ];

id1 : id2 [ dimension_list ] := expr2;

This declaration creates a constant, id1, that is an array of type id2 with the size and arity specified by id2 (if id2 is an array type) and the dimension_list (a comma-separated list of array dimension sizes). The id1 constant is given the value of the array constant specified by expr2 (which must have the same base type and number of elements, though not necessarily the same shape, as id2[dimension_list]).

Examples:

i8a : int8[4] := [1,2,3,4];

a4x4: uns8[4,4] := [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16 ];

// Assume array2x2 is defined as "array2x2:uns8[2,2]"

a2222 : array2x2[2,2] := a4x4;

a2222a : array2x2[2,2] := [ a2222[1], a2222[0] ];

Val Section

HLA allows a second type of constant declaration: the value declaration. The major difference between const and val symbols is that you can only bind a value to a const symbol once within a given scope; you may, however, bind different values to a val identifier within the same scope. At run-time, both const and val objects have a constant value (at least, at any given statement in the program). At assembly-time, however, it is better to view const objects as constants and val objects as assembly-time variables. The val declaration section begins with the reserved word val and continues until encountering another declaration section, a program unit (procedure, macro, etc), or the begin reserved word. The following subsections describe the legal syntax of the statements that may appear within the val section.

id1: forward( id2 );

Defers the definition of id1. See the description of forward in the TYPE section for more details.

id := expr;

Associates the value of the specified constant expression with the identifier on the left hand side of the assignment operator. If id is already defined within the current scope, it must have been defined as a val object. In this case, the type and value of the expression on the right hand side of the assignment operator replaces the current value and type of id.
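
The difference between val and const rebinding can be sketched as follows. The identifiers count and limit are hypothetical, chosen only for this illustration:

```hla
val
    count := 0;              // creates count as a val object
    count := count + 1;      // legal: a val object may be given a new value in the same scope

const
    limit := 10;
    // limit := 11;          // would be illegal: a const may be bound only once per scope
```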

id1 : id2;

This declaration defines object id1 to be a value of type id2, but does not associate a value with it. HLA will actually assign a value that roughly corresponds to zero to id1 (e.g., integer/unsigned zero, 0.0, false, #0, the empty string, the empty character set, etc.) although you should not depend upon this initialization within the body of your code. Id2 must be a type identifier that is a legal constant (val) type. The primary purpose of this declaration is to give a particular symbol a type when future assignments may not completely specify the type. For example, a future assignment like "v := 5;" doesn't really specify whether v is unsigned, signed, or generic, 8, 16, or 32 bits, etc. By predeclaring v as "v:uns32;" you can eliminate this ambiguity (HLA would default to uns32 in this case, but it is always better programming style to explicitly state the type of a constant object).

id1 : id2 [ bounds_list ];

Declares an array named id1 whose base type is id2 with the specified number of dimensions and elements (bounds_list is a list of comma-separated constant expressions that specifies the size of the array). HLA allocates storage for id1 (assembly-time storage) and initializes each element to a value that approximates zero for the given type. The ultimate purpose for this declaration is to allow you to fix the element type in the declaration section and then assign appropriate values (that could be one of many different types, e.g., uns8, uns16, or uns32) to the individual elements later in the code. If id1 already exists, the array declaration replaces its current type and value(s). Otherwise this declaration creates a new constant (val) object.

Examples:

a: int32[2,2,4];

b: Some_User_Type[5];

c: char[128];

d: cset[2];

id1 : id2 := expr;

This declaration defines id1 to be of type id2 and is given the value of expr. Id2 must be a value type identifier (that is legal for constants) and expr must be type compatible with this type. If id1 is currently undefined in the current scope, HLA creates a new val object with the specified type and value. If id1 has already been defined in the current scope, HLA replaces its value and type with the type of id2 and the value of expr; the previous value of id1 would be lost in this case.

Examples: (assume array is defined in a type section as "array:uns8[2,2];")

i : int8 := -5;

u : uns8 := 0;

a : array := [1,2,3,4];

id1 : id2 [ bounds_list ] := expr;

Declares id1 to be an array of type id2 with the number of dimensions and elements specified by the bounds_list comma-delimited list of array bounds; this declaration also assigns the values of expr (which must be an array constant containing the same number of elements as id1) to id1. Id2 must be a valid constant (val) type. If id1 is already defined in the current scope, the new value of id1 replaces the old value.

Examples:

clrs : Colors[4] := [

red, green, green, yellow ]; //Assumes Colors is an enum type.

clrs2 : Colors[4] := clrs;

TwoByTwo : real32[2,2] := [1.0,4.0,2.5,3.0];

id1[ bounds_list ] := expr;

Id1 must be an array constant declared in a val section. This statement replaces the current value of the specified element of id1 with the value of the expr. The type of the expr must be assignment compatible with the type of the array element. If id1 has more dimensions than specified in bounds_list, then the expr must be an array constant with the same number of elements as the array slice selected from id1.

Examples:

clrs[0] := red;

clrs2[2] := blue;

TwoByTwo[0,0] := 0.0;

id1.fieldlist := expr;

Assigns the value of expr to the specified field of the record or union constant id1. Expr must be assignment compatible with the specified field. This val assignment replaces the current value of the specified field in id1. Id1 must have been previously declared as a record or union object.

Examples:

Pt.X := 0.0;

Pt.Y := 1.0;

Student.Name.Last := "Hyde";

Note: Of course, you can extrapolate the array and field access for recursively nested structures (i.e., arrays of records and fields that contain arrays). For example, given the type definitions:

type
Name : record
Last : string;
First: string;
MI : char;
endrecord;

Student : record
SName : Name;
ProjectScores : uns16[8];
ID : uns32;
endrecord;

And the following val declaration:

val
Course: Student[58];

Then the following are examples of legal statements in a val section (assuming the above are all still in scope):

Course[0].SName.Last := "Hyde";

Course[0].SName.First := "Randy";

Course[0].SName.MI := 'L';

Course[0].ProjectScores[0] := 100;

Course[0].ProjectScores[1] := 65;

Course[0].ID := 555_12_5687;

Special Syntax for val objects: because it is sometimes convenient to modify a value object outside the val section, HLA provides a special syntax that allows you to insert any legal val statement wherever white space is legal in the program. By preceding a val declaration with a question mark ("?"), you may embed a val statement anywhere in the program. This allows you to use macros and other HLA features to automatically generate unique code within these other sections by using HLA's string handling facilities and a value object to generate unique labels. Consider the following example:

val
lblCntr : uns16 := 0;

const

@text( "L" + string(lblCntr) ) : uns16 := lblCntr;
? lblCntr := lblCntr + 1;
@text( "L" + string(lblCntr) ) : uns16 := lblCntr;
? lblCntr := lblCntr + 1;
@text( "L" + string(lblCntr) ) : uns16 := lblCntr;
? lblCntr := lblCntr + 1;
@text( "L" + string(lblCntr) ) : uns16 := lblCntr;

The sequence above would generate the statements:

L0 : uns16 := lblCntr;

? lblCntr := lblCntr + 1;

L1 : uns16 := lblCntr;

? lblCntr := lblCntr + 1;

L2 : uns16 := lblCntr;

? lblCntr := lblCntr + 1;

L3 : uns16 := lblCntr;

(Note that "@text" expands its string parameter as text at the point "@text" appears in the program.)

As you can see in this example, it was useful to be able to embed val statements within the const declaration section. Of course, this example would have been a little more realistic had it used macros, but that would have somewhat obfuscated the use of the val objects in this example.
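
As a hedged sketch of how the repetition might be removed, the following version wraps the same two statements in a compile-time loop. It assumes that HLA's #while..#endwhile compile-time loop may appear inside a const section; consult the section on the compile-time language before relying on this:

```hla
val
    lblCntr : uns16 := 0;

const

    // Assumption: a compile-time loop is legal at this point in the const section.
    #while( lblCntr < 4 )

        @text( "L" + string( lblCntr ) ) : uns16 := lblCntr;
        ? lblCntr := lblCntr + 1;

    #endwhile
```

If the loop is accepted here, it declares the same four constants (L0 through L3) as the hand-written sequence. The one difference is that it performs a fourth increment, so lblCntr finishes at four rather than three.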

The "?" operator is actually HLA's compile-time assignment statement (see the section on the HLA compile-time language for more details on the compile-time language). In addition to the straight-forward assignment noted above (which is syntactically identical to an assignment in the VAL section), the HLA compile-time assignment statement offers two additional forms:

?Scalar += expression;

?Scalar -= expression;

These forms add or subtract the value of the expression on the right hand side to/from the scalar variable (non-array/non-record) on the left hand side of the "+=" or "-=" operator. C/C++ and Java programmers should be familiar with these operators. Note that HLA only allows scalar variables on the left hand side of the operator. This isn't a major limitation because 99% of the time you'll just be incrementing or decrementing compile-time variables (VAL objects) with these operators. For non-scalar compile-time variables, you can always use a statement of the form:

?CompositeObject := CompositeObject + expression;

?CompositeObject := CompositeObject - expression;
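
A short sketch of the compile-time assignment forms applied to a scalar val object follows. The name byteCount is hypothetical:

```hla
val
    byteCount : uns32 := 0;

?byteCount += 4;                 // byteCount is now 4
?byteCount -= 1;                 // byteCount is now 3
?byteCount := byteCount * 2;     // plain assignment form; byteCount is now 6
```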

Var Section

HLA supports two basic types of variables: static variables and automatic variables (automatic variables are also known as semi-dynamic variables). HLA assumes that automatic variables are allocated on the stack in the activation record of the current program unit (e.g., a procedure); it assumes that static variables are allocated in the static data area (e.g., the data segment). You declare automatic run-time variables in the var portion of an HLA declaration section.

Example:

var
i: int8;
u: uns16;
d: dword;
r: real64;

Unlike const and val objects, you cannot assign values to a var object during assembly. Therefore, the var declaration section is rather simple and straight-forward -- you can associate a data type with a name and that's about it.

HLA assumes that all var objects are allocated on the stack immediately below the frame pointer (EBP is usually the frame pointer value in a typical assembly language program). For each variable in a program unit, HLA subtracts the size of the object from the current variable offset and uses the result value as the offset for the variable. For example, HLA would associate the following offsets with each of the corresponding variables:

var

i : int8; // offset -1

j: int16; // offset -3 (-1 minus the size of an int16 [-2] produces -3).

k:int32; // offset -7 (-3 minus the size of an int32 [-4] produces -7).

a:uns8[9]; // offset -16 (-7 minus the size [9 bytes] is -16).

etc.

In addition to the syntax used above, HLA provides some additional forms of the var declaration that let you control the alignment and offsets of the variables. The generic syntax is (braces surround optional items):

var { [ maxAlignment { : minAlignment } ] } { := startingOffset ; }

Here are some examples that demonstrate all the possible forms:

var [ 4 ]

var [4:2]

var := -4;

var [4] := -4;

var [4:2] := -4;

The maxAlignment value specifies the largest boundary upon which HLA will align all the variables in this particular var section. For example, "var [4:2]" tells HLA to align all variables on no greater than a double word boundary within the activation record.

The minAlignment value specifies the smallest boundary upon which HLA will align all the variables in the particular var section. Note that you may not specify a minAlignment value without also specifying a maxAlignment value, though you may specify a maxAlignment value without a minAlignment value (in which case HLA uses the value you specify for both the minAlignment and maxAlignment values). The default value, if you do not specify an alignment value at all, is one for both minAlignment and maxAlignment.

When HLA processes the VAR section it maintains an internal "current offset" variable. At the beginning of a procedure, HLA initializes this value to zero. As you declare automatic variables in the var section, HLA drops the value of this current offset variable by the size of the object and then uses the new current offset value as the offset for that variable. For example, if a procedure has a single declaration, as follows:

var

b:byte;

Then HLA assigns the offset "-1" to b since b is one byte long (and zero minus one is "-1").

Whenever you specify alignment values, HLA aligns each object's offset within the activation record on a boundary equal to the size of the object, or to minAlignment if the object's size is less than minAlignment, or to maxAlignment if the object's size is greater than maxAlignment. For example, with the following declaration the alignment chosen will be two since the object's size (one) is less than the minAlignment value:

var [4:2]

b:byte;

Since HLA aligns b on an even offset, b's offset will be -2 rather than -1.

If the size of the object is greater than the maxAlignment value, then HLA will align the object on a boundary that is a multiple of the maxAlignment value. For example, the following declaration aligns q on a boundary that is an even multiple of four, but not necessarily an even multiple of eight (q is a quadword value):

var [4:2]

b:byte; // Offset = -2

q:qword; // Offset = -12

If the size of an object falls between the minAlignment and maxAlignment values, inclusive, then HLA will align the object on an offset that is an even multiple of the object's size. The following declaration aligns all objects on a boundary that is an even multiple of their size unless the object is larger than eight bytes:

var [8:1]

b:byte; // offset = -1

w:word; // offset = -4

d:dword; // offset = -8

b2:byte; // offset = -9

q:qword; // offset = -24
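
As a further worked case, the sketch below applies the same rules to a "var [4:1]" section. The offsets in the comments are hand-derived from the rules described above (subtract the object's size from the current offset, then move down to the next multiple of the chosen alignment); they have not been checked against the assembler, and the variable names are hypothetical:

```hla
var [4:1]

    b:byte;     // alignment 1: 0 - 1 = -1                        -> offset -1
    w:word;     // alignment 2: -1 - 2 = -3, rounded down to -4   -> offset -4
    q:qword;    // size 8 exceeds maxAlignment, so alignment 4:
                //   -4 - 8 = -12, already a multiple of four     -> offset -12
    c:char;     // alignment 1: -12 - 1 = -13                     -> offset -13
    d:dword;    // alignment 4: -13 - 4 = -17, rounded down to -20 -> offset -20
```

Note how the padding appears only where an object's natural position is not already a multiple of its alignment.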

One very important thing to note about these offsets: the fact that an offset is aligned on a particular boundary provides no guarantee that the object is aligned on that same boundary in memory. These offsets are based upon the value in EBP and if the value in EBP is not aligned on the largest boundary you specify in a var section, then that variable will not be aligned on the desired address. Generally (though this is certainly not guaranteed), the stack is aligned on a double-word boundary (and, therefore, EBP usually is as well). So you can probably count on alignments up to four, but anything after that will require special coding on your part (within the procedure that needs the alignment) to guarantee memory address alignment on a larger alignment boundary.

Note that HLA resets the minAlignment and maxAlignment values back to their defaults at the start of each var section. So if you do the following, only the variables appearing in the first var section obey the alignment:

var [4]

a:byte; // offset = -4

b:dword; // offset = -8

var

c:byte; // offset = -9

d:dword; // offset = -13

Also, remember that you can change the alignment of a single variable by using the align directive in the var section. The align directive only temporarily changes the alignment for a single variable declaration. Immediately after the variable, the minAlignment/maxAlignment values again control the alignment, e.g.,

var [4:2]

b:byte; // offset -2

w:word; // offset -4

d:dword; // offset -8

align(1);

b2:byte; // offset -9

d2:dword; // offset -16 -- Goes back to [4:2] alignment.

The other optional item you can attach to a var declaration is to assign a starting offset to the section. This is done by following the var reserved word or the alignment option with the assignment operator (":="), an integer constant expression, and a semicolon, e.g.,

var := -8;

<< declarations >>

var [4:2] := -8;

<< declarations >>

As noted earlier, HLA will normally use a starting offset of zero when it encounters the first var declaration section in a procedure. HLA subtracts the size of an object from the current offset prior to assigning the offset to the variable. The offset assignment option, above, lets you choose a different starting value. Also note that the first declaration after such a var clause will use the offset assigned; it will not first subtract its size, e.g.,

var := -8;

b:byte; // offset -8 (not -9!)

c:char; // offset -9

w:word; // offset -11

If a declaration section contains two var declaration sections, then the second will continue to use the current offset value at the point of its declaration unless it has an explicit offset value, e.g.,

var := -8;

b:byte; // offset -8 (not -9!)

var

c:char; // offset -9

var

w:word; // offset -11

var := -8;

AlsoB:byte; // offset -8 (alias to "b" above)

Probably the only sane reason for playing around with the starting offset in the var section is because you're not going to build a standard activation record and access your automatic variables by indexing off EBP. If you aren't constantly pushing and popping data throughout the execution of your procedure, you might be able to index all your locals off ESP and save having to preserve, set up, and restore EBP in your procedure.

Statements in the var section can take one of the following forms:

align( expr );

As noted above, this temporarily sets the alignment for the next variable you declare in the current var section (this alignment will not carry over into another var section later in the procedure).

id1: forward( id2 );

Defers the definition of id1. See the description of forward in the TYPE section for more details.

id1 : [ id2 ];

Declares id1 to be a pointer to an object of type id2. Since HLA generates code for the 32-bit flat model, pointers are always 32-bit offsets. Hence, HLA always reserves exactly four bytes for a pointer object (regardless of what type the variable is pointing at). If id2 is not defined at the point of id1's declaration, then id2 must be defined before the end of the current program unit (that is, id2 must be defined in the same scope as id1).

id : procedure; options

id : procedure ( parameter_list ); options

These declarations define a procedure pointer variable. Like other pointers, procedure pointers are four-byte objects. If HLA encounters id as a statement in the main body of a program unit, it will automatically emit an indirect call through this pointer variable. See the section on procedures for the syntax of a valid parameter list. The legal procedure options include @pascal, @cdecl, @stdcall, and @returns. Note that the @pascal, @cdecl, and @stdcall options are mutually exclusive. See the section on procedure declarations for a discussion of all these options.
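
The following fragment is a minimal sketch of a procedure pointer variable in use. The procedure names (target and caller) and the variable name p are hypothetical, and the sketch assumes that the address-of operator ("&") may be applied to a procedure name to obtain its address:

```hla
procedure target;
begin target;

    mov( 0, eax );          // placeholder body

end target;

procedure caller;
var
    p: procedure;           // four-byte procedure pointer in the activation record
begin caller;

    mov( &target, p );      // assumption: "&" yields the address of the procedure
    p;                      // p used as a statement: indirect call through the pointer

end caller;
```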

id : enum { enum_list };

Declares id to be an enumerated data type whose run-time values can be one of the identifiers appearing in the enum_list (a comma-separated list of identifiers given the consecutive values 0, 1, 2, etc.). By default, HLA reserves one byte of storage for enumerated data types.
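
A small sketch of an enumerated automatic variable follows. The procedure name paint and the variable name c are hypothetical, and the sketch assumes an enumeration constant may be used as the source operand of a mov instruction:

```hla
procedure paint;
var
    c: enum { red, green, blue };   // red = 0, green = 1, blue = 2; one byte of storage
begin paint;

    mov( green, c );                // stores the value 1 in c

end paint;
```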

id1 : id2 ;

Declares the variable id1 to be an object of type id2. Allocates enough space for id1 to hold a value of type id2. Id2 must be defined at the point of id1's declaration.

id1 : id2 [ expr ] ;

Id1 is an array whose elements are of type id2. There will be expr elements in this array (expr is a constant expression that HLA computes at assembly time). HLA allocates sufficient storage for the array in the activation record and associates the lowest address of this block of memory with the symbol id1 (i.e., the base address of the array).

id1 : record

field_definitions

endrecord;

This declaration declares an automatic variable that is a record type. See the description of records in the section on type declarations for more details. HLA computes an offset for id1 that will reserve sufficient space in the activation record for the specified record data.

id1 : union

field_definitions

endunion;

This declaration declares an automatic variable that is a union type. See the description of unions in the section on type declarations for more details. HLA computes an offset for id1 that will reserve sufficient space in the activation record for the largest object in the union.

Note that you cannot declare class variables directly in the var section. You must define a class type in the type section and then declare a variable of the specified type.

Static Section

The static section is syntactically similar to the var section except it begins with the reserved word "static" rather than "var". One difference is that the static section only allows a single alignment form:

static ( const_expr )

<< declarations >>

This declaration will align the next declaration on the boundary specified. The value of const_expr should be 1, 2, 4, 8, 10, or 16. Warning: this feature is deprecated. Use the align directive instead. This syntax will change in a future version of HLA (to match the alignment options for records and the var section), and code that uses it will break at that time.

You can also use the align directive within the static section to force the alignment of the next variable you declare. This directive uses the following syntax:

align( constant );

The constant value should be 1, 2, 4, 8, 10, or 16. This is the preferred way to align a single static variable declaration to a particular boundary in the static section.
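A minimal sketch of the align directive in use follows. The variable names are hypothetical, and the comment assumes the section itself starts on a boundary that is at least four-byte aligned.

```hla
static
    flag:   byte;           // occupies a single byte

    align( 4 );             // pad up to the next four-byte boundary
    count:  dword;          // starts on that four-byte boundary
```

Only the declaration immediately following the directive is affected; later declarations are again packed into successive locations.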

HLA assumes that all static objects are allocated in a global data area (e.g., the data segment). For each variable in a program unit, HLA allocates storage for the object in successive memory locations in the global segment. For example, HLA would associate the following offsets with each of the corresponding variables (assuming no other static objects at this point):

static

i : int8; // offset 0

j: int16; // offset 1 (The size of an int8 [1] produces an offset of one).

k:int32; // offset 3 (the size of the previous variables).

a:uns8[9]; // offset 7 (the size of the previous variables).

etc.

Unlike objects in the var section, static variables can be initialized during assembly. The syntax is similar to that used by the val section, e.g.,

static

i : int8 := -2; // Initializes i with $FE when program loads into memory.

j: int16 := 20; // Initializes j with 20.

k:int32 := 0; // Initializes k with zero.

a:uns8[9] := [0, 1, 2, 3, 4, 5,6 ,7 ,8 ];

// Initialize array with specified values.

Each of the values used to initialize static variables must be constants or constant expressions. Note that the initialization only occurs once, when the program is loaded into memory. Static initialization that occurs inside a procedure does not imply that initialization occurs on each call of the procedure.

To initialize procedure variables (i.e., procedure pointers) you would normally take the address of a procedure using the "&" (static address-of) operator. Here's the syntax for procedure variables in the STATIC section:

id : procedure; options optional_external

id : procedure ( parameter_list ); options optional_external

id : procedure := &procedure_name; options optional_external

id : procedure ( parameter_list ) := &procedure_name; options optional_external

The options are the same as for procedure declarations in the VAR section with the addition of certain "variable options" you'll read about in a later section (See Variable Options). The optional_external clause is either @EXTERNAL or @EXTERNAL ( "external_name" ).
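The initialized form can be sketched as follows. The names StaticProcPtr, showValue, and handler are invented for illustration, and the example assumes "stdlib.hhf" supplies stdout.put.

```hla
program StaticProcPtr;
#include( "stdlib.hhf" )

procedure showValue( value:uns32 );
begin showValue;

    stdout.put( "value = ", value, nl );

end showValue;

static
    // Initialized once, when the program is loaded into memory.
    handler: procedure( value:uns32 ) := &showValue;

begin StaticProcPtr;

    handler( 42 );          // indirect call through the static pointer

end StaticProcPtr;
```

The parameter list on the pointer declaration mirrors that of the procedure it points at, so HLA can build the same call sequence it would use for a direct call.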

Within the body of a procedure or program you may also embed static variable declarations using the static..endstatic directives. E.g.,

mov( 0, ax );

static

i:int32;

endstatic;

mov( ax, bx );

Note that HLA still inserts the variables into the data segment area. The variable "i" in the example above is not inserted into the machine code between the two MOV instructions. The object code for the two MOV instructions is adjacent in the emitted code. The principal reason for having the static..endstatic section is to allow macros to create static variables on the fly (unfortunately, there is no good way to generate automatic [var] variables within the middle of the code, so this only works for static objects).

Variables appearing in the static section are always initialized. If you do not specify an initial value, HLA automatically initializes the variable with zero.

In general, you can assume that variables you declare in the same static section (static, readonly, storage, or segment) are adjacent to one another in memory. HLA, MASM, and the linker will typically assign higher memory addresses to variables declared later in the same static section. You may not, however, make any assumptions about variables declared in different static sections, even if those static sections are adjacent to one another in the source code. I.e.,

static

i: int32;

j: int32;

static

k: int32;

You can assume that i and j are adjacent (and j immediately follows i in memory). You cannot assume anything about the placement of k with respect to i or j. The k variable could come before or after i and j, and there could be other objects between them. Note that the adjacency of objects in HLA v2.0 may not be the same as v1.x, so you should not count on the adjacency of variables in v1.x if you can help it.

You can also place "unlabelled" data values into the static data section. Unlabelled data objects take the following form:

typeID list_of_constants ;

TypeID must be a predeclared type identifier (e.g., a predefined type like dword or a type you've declared in the type section). The list_of_constants component must be a comma separated list of one or more constant items. Each constant in the list must be the type specified by typeID. Examples:

type
eType: enum {e, f, g};

static
eVar: eType:= e;
eType e, e, f, g, f, e;

pStr: byte := 11; // length byte: "Hello There" contains 11 characters
byte "Hello There";

Assuming enums are one byte objects (the default), these declarations create an array of seven eType objects and a "Pascal" string consisting of a length byte followed by the specified number of characters.

The example above shows that string literals may appear in a byte statement. This does not output an HLA string constant; instead, it simply outputs the sequence of characters in the string with no extra data (i.e., no length values and no zero terminating byte). If you need these, you can manually add them.

Initialized string constants store the pointer to the specified string in the static segment and the actual string data in a special (inaccessible to you) segment. Therefore, if you have a declaration like the following:

static
s:string := "hello";

The string variable s consists of a single dword pointer. This pointer, initialized to point at the string data, is created in the static segment in memory. The actual characters, along with the two length dwords and zero terminating byte associated with HLA strings, are stored in the "strings" memory segment. The upshot of this is that you cannot overwrite the character data of a string variable initialized in this fashion. If you absolutely, positively, must be able to overwrite literal string constants at run-time (a very poor practice), you can achieve this as follows:

static

s: string := &sss;

dword 5; // MaxStrLen value.

dword 5; // length value

sss: byte := 'h';

byte "ello", 0,0,0;

Note that some HLA library routines assume that the string data is an even multiple of four bytes long. Hence the extra zeros (padding) in this example. Also note that string literals appearing in a byte directive do not output HLA style strings. This example also demonstrates that you can assign a pointer constant ("&sss" above) to a string variable. This is legal because, after all, strings in HLA are really nothing more than pointers to the actual data.

Like the VAR section, you may use the forward clause to defer the definition of a symbol in the STATIC section, e.g.,

id1: forward( id2 );

Defers the definition of id1. See the description of forward in the TYPE section for more details.

The STATIC section supports a special syntax that lets you associate an address and type with a variable without actually reserving any storage for that object. That syntax is as follows:

id: type; @nostorage;

The address of the variable id is the same address as whatever declaration happens to follow in memory (generally the next declaration in the STATIC section). This is quite useful for creating aliases:

ValAsWord: word; @nostorage;
ValAsDword: dword;

In this example, ValAsWord and ValAsDword both refer to the same memory location because no storage is actually associated with the ValAsWord identifier.

Another use of the @nostorage option is to create an arbitrary table of values using the unlabelled data feature of the STATIC section, e.g.,

MyTable: dword; @nostorage;

dword 0, 1, 2, 3;

This example creates an array of data with four dwords.
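The sketch below shows one way such a table might be read at run time. The table contents and the program name TableDemo are illustrative choices, not values from this manual.

```hla
program TableDemo;

static
    MyTable: dword; @nostorage;     // labels the data that follows
             dword 10, 20, 30, 40;  // four unlabelled table entries

begin TableDemo;

    mov( 2, ebx );                  // index of the third entry
    mov( MyTable[ ebx*4 ], eax );   // loads the dword at MyTable+8 (30)

end TableDemo;
```

Since MyTable is declared as a single dword rather than as an array, the scaling by four in the index expression is written out explicitly.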

Segments

Although HLA does not support 80x86 segmentation, it does allow you to create your own named segments in the variable declaration section23. The primary purpose for segments is to allow you to create named segments in memory with special names for interface to high level languages and other code that expects a certain segment name or alignment type. The general syntax for a segment declaration is the following

segment segmentID ( alignment, "class" );

<< static declarations >>

segmentID is the name of the segment you wish to create. This must be either a unique identifier in the program or the name of an existing segment. Note that segment names are not lexically scoped. That is, segment names are global even if you define them inside a procedure. If you define multiple segment sections with the same name, HLA combines them all into the same memory segment.

The alignment parameter must be one of the following: byte, word, dword, para, or page. This option defines the alignment boundary in memory for the start of the segment. This value should be greater than or equal to the largest align value you specify within the segment (e.g., use PARA if you have an ALIGN(16) directive).

The class string specifies the combine class for this segment. This is usually the segmentID enclosed within quotes, but you can specify a common class name for several different segments and the linker will combine these segments together during the link phase. "data" is a good class string if you want your segments merged with the HLA static data in the STATIC section.

See the section on Segment Names a little later in this document for more details on the SEGMENT directive.

Following the segment statement, up to the next VAR, STATIC, etc., statement come the variable declarations for this particular segment. The segment section accepts the same declarations as the STATIC section.
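A small sketch of a user-defined segment, following the syntax given above, might look like this. The segment name SharedData and the variables are hypothetical; PARA alignment is chosen because the segment contains an ALIGN(16) directive.

```hla
segment SharedData( para, "data" );

    align( 16 );
    buffer: byte[ 16 ];     // starts on a 16-byte boundary
    total:  dword := 0;     // initializers work as in the STATIC section
```

Using "data" as the class string asks the linker to combine this segment with HLA's own static data, as described above.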

Readonly Section

The readonly section is another section where you may declare static variables. The syntax is very similar to the static declaration with the following three differences:

You use the "readonly" reserved word rather than "static" to begin the declarations.

All variables you declare in a readonly section must have an initializer.

Any attempt to write to the variable at run-time will produce a run-time error24.

Any variable you declare in a readonly section winds up in the READONLY segment in memory. Note that HLA also emits certain constant objects to the readonly memory segment. Hence, there is no guarantee that two adjacent declarations in a readonly section will consume adjacent memory locations at run-time. E.g., consider the following code:

readonly
s: string := "hello world";
i: int32 := 10;

The READONLY section lets you emit unlabelled data within the segment. Unlabelled data consists of a type name followed by a comma-separated list of constants of the specified type and a terminating semicolon. E.g., "int32 0, 1, 2, 3;" emits four dwords containing the values zero, one, two, and three at the current point in the readonly segment. See the discussion in "Static Section" for more details.

Like the static section, you can specify the alignment of the first declaration by specifying the alignment value within parentheses after the readonly keyword:

readonly(4)

AlignedOn4: uns32 := 32;

However, this feature is deprecated and you should not use it. Instead, you should use the align directive as in the static section.

Like the static section, you may use the @nostorage option to define a name without actually allocating storage.

HLA also provides a readonly..endreadonly block that may appear in the code segment. Variables you declare in such a section are moved to the readonly segment in memory. E.g.,

mov( 0, ax );

readonly

ro:int32 := 10;

endreadonly;

mov( ax, bx );

Storage Section

The storage section is yet another static variable declaration section. Unlike the static section, however, you cannot initialize variables in the storage section; it simply reserves storage for uninitialized variables. Note that variables declared in the storage section go into the "bssseg" segment in memory, so they are in a different segment than variables you declare in the static or readonly sections.

Example:

storage
i:uns32;
j:int8;

Like the static section, you can specify the alignment of the first declaration by specifying the alignment value within parentheses after the storage keyword:

storage(4)

AlignedOn4: uns32;

Again, like the static and readonly sections, this feature is deprecated and will go away soon. You should use the "align" directive instead.

Note that it is not legal to put unlabelled objects in the storage section. Unlabelled data objects may only appear in a declaration section that supports initialization (i.e., static or readonly). However, the @nostorage option is perfectly legal in the storage section.

Variable Options

The syntax for the declarations appearing in the previous sections is not totally complete. Variable declarations in the static, readonly, and storage sections also allow certain options following the declarations. This section discusses those options.

A typical declaration in one of the static sections (static, readonly, or storage) takes the following form:

varname : vartype; options

The previous sections discuss the varname and vartype components; they are not particularly interesting to us in this section. Of interest is the (optional) options component. This is a sequence of zero or more keywords that provide the HLA compiler with additional information about these symbols.

Actually, there are three types of options that may follow a variable in one of the static sections, depending on the type of the variable. These are variable options (proper), procedure options (for procedure variables), and the @external option. If multiple types of options appear after a variable declaration, they must appear in this order (variable, procedure, @external). However, within one of these sets of options, the order of the individual options is irrelevant (e.g., the order of the @nostorage and @volatile options within the variable options section doesn't matter). Here are the option types:

Variable Options:

@nostorage;
@volatile;

Procedure Options:

@pascal;
@cdecl;
@stdcall;
@returns( "string" );

External Option:

@external;
@external( "string" );

The procedure options may only appear after a procedure variable; these options are not legal following other types of variable objects.
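The ordering rule above can be sketched with a few declarations. All identifiers here, including the external name "_callback", are hypothetical.

```hla
static
    // Variable option first, then procedure option, then @external.
    callback: procedure( n:int32 ); @volatile; @cdecl; @external( "_callback" );

    // Within the variable options group, either order is acceptable.
    aliasA: dword; @nostorage; @volatile;
    aliasB: dword; @volatile; @nostorage;
    actual: dword;
```

Here aliasA and aliasB reserve no storage of their own, so both names refer to the four bytes that HLA reserves for actual.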

The @NOSTORAGE Option

The @nostorage option tells HLA to associate the current offset in the segment with the specified variable, but not to allocate any storage for the object. This option effectively creates an alias of the current variable with the next object you declare in one of the static sections. Consider the following example:

static

b: byte; @nostorage;

w: word; @nostorage;

d: dword;

Because the b and w variables both have the @nostorage option associated with them, HLA does not reserve any storage for these variables. The d variable does not have the @nostorage option, so HLA does reserve four bytes for this variable. The b and w variables, since they don't have storage associated with them, share the same address in memory with the d variable.

Note that it is not legal to supply an initializer to a variable that has the @nostorage option. I.e., the following is illegal:

IllegalDeclaration: byte := 5; @nostorage;

This should be obvious since an initializer supplies initial data for the variable's storage, yet the @nostorage option implies that no such storage exists.

The @nostorage option is legal in the readonly section. As noted above, however, you cannot supply an initial value for an object when specifying the @nostorage option. Normally, though, declarations in the readonly section require an initializer. HLA will allow a readonly variable declaration without an initializer if the @nostorage option appears. This lets you create aliases in the readonly section, e.g.,

readonly

alias: byte; @nostorage;

aliased: byte := 0;

Both alias and aliased refer to the same value in memory (zero in this case).

Note to long-time HLA users (and those reading code written by long-time HLA users): HLA v1.25 and earlier supported a fourth static variable declaration section, DATA. As of HLA v1.26 this static section no longer exists. In the DATA section, all variables had an implied "@nostorage" option associated with them. This section was removed after the @nostorage option was added to the language, since the DATA section is now superfluous. If you find a DATA section in some HLA code, simply change it to a static section and attach the @nostorage option to all variables appearing in that section.

The @VOLATILE Option

The @volatile option is the second variable option. Currently, HLA ignores (though allows) this variable option. The purpose of this option is to tell the compiler that a variable's value can change unexpectedly due to hardware access to this object or via modification by a different thread of execution. An optimizer would use this information to take special care when manipulating volatile objects. However, since HLA v1.x does not support an optimizer (that is slated for v2.x), HLA cannot currently make use of this information.

Although HLA currently ignores the @volatile option, you should use it if a variable is indeed volatile. First, this is a good way to document the fact that the variable's value can change unexpectedly. Second, when HLA v2.x finally begins to utilize this information, you won't have to go back and change your source code to accommodate the optimizer.

Example:

static

v: dword; @volatile;

Note: the @volatile option is legal in the var section as well as the static sections.

The @PASCAL, @CDECL, and @STDCALL Options

These three options are procedure options and are only legal following a procedure variable declaration. Remember that the @volatile or @nostorage options must appear before all procedure options; so if you use one of these three options along with one or more of the variable options, these options must follow all the variable options.

The @pascal, @cdecl, and @stdcall options are mutually exclusive25. They define the calling sequence HLA will use when calling the procedure variable you are declaring with these options. If none of these options appears, then HLA will assume the use of the pascal calling convention.

The @pascal calling convention pushes parameters in the order of their declaration (left to right in the parameter list) and it is the procedure's responsibility to remove the parameters from the stack upon return. The @cdecl calling convention pushes the parameters in the opposite order of their declaration (right to left in the parameter list) and it is the caller's responsibility to remove the parameters from the stack when the procedure returns. The @stdcall calling convention pushes the parameters in the reverse order, like @cdecl, but it is the procedure's responsibility to remove the parameters (like the @pascal convention).
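The three conventions can be summarized with a short sketch. The pointer names are hypothetical, and the comments simply restate the rules above for a call that passes the values 1 and 2 as two dword parameters (eight bytes of parameter data in total).

```hla
static
    pasProc: procedure( a:dword; b:dword ); @pascal;
    cProc:   procedure( a:dword; b:dword ); @cdecl;
    stdProc: procedure( a:dword; b:dword ); @stdcall;

// Parameter handling for a call of the form xxx( 1, 2 ):
//
//   pasProc( 1, 2 ):  pushes 1, then 2; the procedure removes the 8 bytes.
//   cProc( 1, 2 ):    pushes 2, then 1; the caller removes the 8 bytes.
//   stdProc( 1, 2 ):  pushes 2, then 1; the procedure removes the 8 bytes.
```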

For more details, see Procedure Declarations.

The @RETURNS Option

As for procedure declarations (see Procedure Declarations), the @returns option lets you specify a string that HLA substitutes for a procedure invocation when using instruction composition. For more details, see The 80x86 Instruction Set in HLA.

The @EXTERNAL Option

The @external option gives you the ability to reference static variables that you declare in other files. Like the @external clause for procedures, there are two different syntaxes for the external clause appearing after a variable declaration:

varName: varType; @external;

varName: varType; @external( "external_Name" );

The first form above uses the variable's name for both the internal and external names. The second form uses varName as the internal name that HLA uses and it associates this variable with external_Name in the external modules. The @external option is always the last option associated with a variable declaration. If other options (like @nostorage or @stdcall) also appear, they must appear before the @external clause. Don't forget that all external names in an HLA program must be compatible with the assembly code that HLA emits. For example, if you're emitting MASM code, you must not use any MASM reserved words for your external symbols.

You may only attach the external clause to static objects (those you declare in a static, readonly, or storage section). Automatic (var) variables can never be external. Note that, unlike external procedures, you may declare external variables at any lexical scope level. You can even declare (static) objects in a class to be external.

Of course, if you declare an object to be external, you are making a promise to HLA that you will define that variable in a different object module. If you do not, then the linker will complain about an "unresolved external" when it attempts to link your modules together.

If the actual variable definition for an external object appears in a source file after an external declaration, this tells HLA that the definition is a public variable that other modules may access (the default is local to the current source file). This is the only way to declare a variable to be public so that other modules can use it. Usually, you would put the external declaration in a header file that all modules (wanting to access the variable) include; you also include this header file in the source file containing the actual variable declaration. Note that HLA scoping rules still apply, so if you put the external declaration at one lex level and the variable definition at a different lex level, HLA will treat them as separate objects, e.g.,

static i:int32; @external;

procedure HideI;

static i:int32; // Not the same I as above!

begin HideI;

end HideI;

You cannot place an external declaration after a variable definition in the source file; HLA will complain about a duplicate defined symbol if you do. HLA will also complain if an external definition of a variable appears twice in a source file.
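The header-file arrangement described above might be sketched as follows. The file names counter.hhf and defs.hla, the unit name defs, and the variable counter are all hypothetical.

```hla
// counter.hhf: header included by every module that uses the variable.
static
    counter: uns32; @external;

// defs.hla: the one module that actually defines the variable.
unit defs;
#include( "counter.hhf" )

static
    counter: uns32 := 0;    // public, because the @external declaration
                            // precedes this definition at the same lex level

end defs;
```

Any other module that includes counter.hhf can then refer to counter directly, and the linker resolves the reference to the definition in defs.hla.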

Segment Names

By default, HLA uses the following segment names: _TEXT for code, _DATA for the STATIC section, READONLY for the READONLY section, _BSS for the STORAGE section, and CONST for internally generated constants (this segment is not normally accessible in your programs). Although you can insert data directly into these segments by using a segment declaration, you should avoid using these segment names in your own SEGMENT declaration sections (especially avoid READONLY and CONST). If you want to use one of these segments, use the appropriate HLA data declaration section.

When interfacing with other code (e.g., a high level language) you may need to change the default names that HLA uses for these segments. The #code, #static, #readonly, #storage, and #const directives let you do this. These directives let you change the name of the _TEXT, _DATA, READONLY, _BSS, and CONST segments, respectively. The syntax for these directives is the following:

#xxxx( "segmentName", "alignment", "class" )

where "#xxxx" represents one of these directives.

Note that all three parameters are string constants. The first parameter specifies the segment name and should be a legal (MASM) segment name. If you supply an illegal name here, HLA will not complain but MASM will report an error when it attempts to assemble the HLA output file. There are three special names that HLA recognizes for the segmentName string: ".code", ".data", and ".bss". These are the default names HLA uses for the _TEXT, _DATA, and _BSS segments, respectively. If you supply an empty string as the segment name parameter, HLA will use these default names (though this is unnecessary since HLA already uses these names by default).

If you specify ".code" (or the empty string) then HLA ignores the second and third parameters and uses the simplified segment directive ".code" to specify code segment output. This creates a code segment with the name "_TEXT" and several other assembler-dependent default values.

The ".data" segment name (or the empty string) tells HLA to use the simplified segment directive ".data" for STATIC sections in your programs. This creates a data segment with the name "_DATA" and other assembler-dependent default values.

The ".bss" segment name tells HLA to use the simplified segment directive ".bss" for the STORAGE segment variables. This creates a non-initialized data segment named "_BSS" that uses assembler-dependent default values for alignment and class.

Code you link with other programs (written in other languages) stands a much better chance of linking properly if you use the default segment names ".code", ".data", and ".bss" rather than making up your own names. Since MASM generally picks default alignment and class values for these segments that are compatible with high level (and other) languages, you'll probably have fewer problems if you stick with the defaults. You should only change the segment names if you've got some special interface requirements.

The second parameter is the segment alignment value. It must be BYTE, WORD, DWORD, PARA, or PAGE. HLA will report an error if you supply a string containing some other text. This alignment directive specifies the boundary on which a segment may begin. PARAgraph is a 16-byte boundary while PAGE specifies a 256-byte boundary. The HLA default is PARA. In stand alone HLA programs segments get aligned on 4096-byte boundaries, but high level languages and other code may have different ideas about the alignment. Note that MASM requires PARA alignment for the _TEXT, _DATA, _BSS, and CONST segments. You'll have to rename these segments if you want to use a different alignment value when using MASM to assemble HLA's output. Other assemblers may have their own restrictions on this field.

The third parameter is the segment group class. This is usually the same string as the first parameter, but you can group several segments together by using a common class name (e.g., "DATA" for data segments, "CODE" for code segments, etc.).

You may only use these directives once in a program and they must appear before the UNIT or PROGRAM statement in the source file. In general, you should only use these directives to change HLA's default segment names if you're writing code to interface with a high level language (or some other system) and that language requires that you use specific segment names for code, data, uninitialized data, etc.
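As a sketch only, renaming the segment used for STATIC data could look like this. The segment name "MYDATA" and the unit name are invented; the alignment stays at PARA and the class at "DATA", as discussed above.

```hla
#static( "MYDATA", "PARA", "DATA" )

unit interfaceCode;

static
    shared: dword := 0;     // emitted to MYDATA rather than _DATA

end interfaceCode;
```

The directive appears before the UNIT statement, as required, and may not be repeated later in the same source file.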

HLA ".link" Files (Windows Only)

During compilation, HLA creates a file with the same prefix as the source file name and a ".link" suffix. This file lists all the segment names in use by the compilation. You may use this file (or a set of these files) as input to Microsoft's linker to specify the segments and their attributes. The typical ".link" file that HLA produces, assuming you don't define any user segments, contains the following:

/section:.code,ER

/section:.rdata,R

/section:readonly,R

/section:.data,RW

/section:.bss,RW

Note that .code is "_TEXT", .bss is "_BSS", and .data is "_DATA". MASM allows you to use either of these names, but the linker only understands the dot-names. Note that the ".rdata" section is associated with HLA's internal CONST segment.

If you create any user segments, HLA will add an appropriate "/section" statement to this file listing that segment. The E, R, and W options on each line specify whether the segment may contain executable code, is readable, or is writable.

Normally, you'd supply this file as a command file to the Microsoft linker using the command line option "@file.link" where "file" is the name of the link output file that HLA produces. Usually, HLA automatically does this for you. You can see the full LINK.EXE command line by running HLA in verbose mode with the "-v" command line option. See the information about the HLA "-@" command line option if you want to control the emission of a ".link" file by HLA.

Namespaces

A namespace declaration takes the following form:

namespace identifier;

<< declarations >>

end identifier;

To access an identifier declared in a namespace, you would preface the identifier with the name of the namespace and a dot (similar to a record, class, or union reference).

Within a namespace, you normally may only access other identifiers defined previously in that same namespace. Since you may sometimes need to access other identifiers (especially namespace'd identifiers) outside the current namespace, a special lexeme has been added to the language to provide access to global objects: "@global:identifier". This form tells HLA to ignore any local symbols (in the current namespace) and only look outside the current namespace for the specified identifier.

If you declare a second namespace using the same namespace identifier as a previous namespace, then HLA will append those names to the end of the existing namespace. This only applies if the new namespace identifier is at the same lex level (the same scope) as the previous namespace. I.e., if you create a local namespace in a procedure using the same name as a global namespace, then the normal rules of scope apply and the new namespace is local to that procedure and overrides the global definition.
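The following sketch illustrates both the dot notation and the @global: prefix. The names NamespaceDemo, cfg, limit, and globalLimit are hypothetical.

```hla
program NamespaceDemo;

const
    limit := 100;                       // global symbol

namespace cfg;

    const
        limit       := 10;              // cfg.limit, local to the namespace
        globalLimit := @global:limit;   // skips cfg and finds the global limit

end cfg;

begin NamespaceDemo;

    mov( cfg.limit, eax );              // namespace member, via dot notation
    mov( limit, ebx );                  // the global constant

end NamespaceDemo;
```

Without the @global: prefix, a reference to limit inside cfg would resolve to the namespace's own constant once that constant has been declared.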

Macros

HLA has one of the most powerful macro expansion facilities of any programming language. HLA's macros are the key to extending the HLA language. The following subsections describe HLA's powerful macro processing facilities.

Standard Macros

HLA provides powerful macro capabilities. You can declare macros in the declaration section of a program using the following syntax:

#macro identifier ( optional_parameter_list ) ;

statements

#endmacro

Note that a semicolon does not follow the #endmacro clause. However, since HLA currently requires all macro definitions to appear in the declaration section, you can place a semicolon after the #endmacro if you prefer (since HLA allows lone semicolons in a declaration section).

Example:

#macro MyMacro;

?i = i + 1;

#endmacro

The optional parameter list must be a list of one or more identifiers separated by commas. Unlike procedure declarations, you do not associate a type with macro parameters. HLA automatically associates the type "text" with all macro parameters (except for one special case noted below). Example:

#macro MacroWParms( a, b, c );

?a = b + c;

#endmacro

Optionally, the last (or only) name in the identifier list may take the form "identifier[]". This syntax tells the macro that it may allow a variable number of parameters and HLA will create an array of string objects to hold all the parameters (HLA uses a string array rather than a text array because text arrays are illegal). Example:

#macro MacroWVarParms( a, b, c[] );

?a = b + text(c[0]) + text(c[1]);

#endmacro
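The following is a minimal sketch of how a macro might walk its variable parameter array at compile time. It assumes the compile-time function @elements returns the number of entries in the string array and uses the #while..#endwhile compile-time loop; the names demoVarParms, sumAll, result, terms, index, and total are hypothetical.

```hla
program demoVarParms;

// Hypothetical macro: sums every extra argument into the compile-time
// value whose name is passed as the first parameter.
#macro sumAll( result, terms[] ):index;

    ?result := 0;
    ?index := 0;
    #while( index < @elements( terms ))

        // terms[] holds strings, so @text is needed to expand each
        // entry as source text before adding it.
        ?result := result + @text( terms[ index ] );
        ?index := index + 1;

    #endwhile

#endmacro

begin demoVarParms;

    // One required parameter plus three variable parameters.
    sumAll( total, 1, 2, 3 )

    #print( "total = " + string( total ) )

end demoVarParms;
```

Because "index" appears in the local identifier list, each invocation of sumAll gets its own loop counter.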

If the macro does not allow any parameters, then you follow the identifier with a semicolon (i.e., no parentheses or parameter identifiers). See the first example in this section for a macro without any parameters.

Occasionally you may need to define some symbols that are local to a particular macro invocation (that is, each invocation of the macro generates a unique symbol for a given identifier). The local identifier list allows you to do this. To declare a list of local identifiers, simply follow the parameter list (after the closing parenthesis but before the semicolon) with a colon (":") and a comma-separated list of identifiers, e.g.,

#macro ThisMacro(parm1):id1,id2;

...

HLA automatically renames each symbol appearing in the local identifier list so that the new name is unique throughout the program. HLA creates unique symbols of the form " _XXXX_ " where XXXX is some hexadecimal numeric value. To guarantee that HLA can generate unique symbols, you should avoid defining symbols of this form in your own programs (in general, symbols that begin and end with an underscore are reserved for use by the compiler and the HLA standard library). Example:

#macro LocalSym : i,j;

j: cmp( ax, 0 );

jne i;

dec( ax );

jmp j;

i:

#endmacro

Without the local identifier list, multiple expansions of this macro within the same procedure would yield multiple statement label definitions for "i" and "j". With the local identifier list present, however, HLA substitutes symbols similar to "_0001_" and "_0002_" for i and j in the first invocation and symbols like "_0003_" and "_0004_" for i and j in the second invocation, etc. This avoids duplicate symbol errors as long as you do not use (poorly chosen) identifiers like "_0001_" and "_0004_" in your own code.
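To make the renaming concrete, here is a small sketch of a program that expands the same macro twice in one procedure. The macro name clearIfNegative and its local label skip are hypothetical; the sketch assumes stdlib.hhf for the output routine.

```hla
program demoLocalLabels;

#include( "stdlib.hhf" );

// Hypothetical macro: zeroes a 32-bit register if it holds a negative value.
// "skip" is local, so every expansion receives its own unique label.
#macro clearIfNegative( reg ):skip;

    cmp( reg, 0 );
    jge skip;
    mov( 0, reg );
    skip:

#endmacro

begin demoLocalLabels;

    mov( -5, eax );
    mov( 7, ebx );

    // Two expansions in the same procedure; without the local
    // identifier list, "skip" would be defined twice.
    clearIfNegative( eax )
    clearIfNegative( ebx )

    stdout.put( "eax=", (type int32 eax), " ebx=", (type int32 ebx), nl );

end demoLocalLabels;
```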

The statements section of the macro may contain any legal HLA statements (including definitions of other macros). However, the legality of such statements is controlled by where you expand the macro.

To invoke a macro, you simply supply its name and an appropriate set of parameters. Unless you specify a variable number of parameters (using the array syntax) then the number of actual parameters must exactly match the number of formal parameters. If you specify a variable number of parameters, then the number of actual parameters must be greater than or equal to the number of formal parameters (not counting the array parameter).

During macro expansion, HLA automatically substitutes the text associated with an actual parameter for the formal parameter in the macro's body. The array parameter, however, is a string array rather than a text array, so you will have to force the expansion yourself using the "@text" function:

#macro example( variableParms[] );

?@text(variableParms[0]) := @text(variableParms[1]);

#endmacro

Actual macro parameters consist of a string of characters up to, but not including, a separating comma or the closing parenthesis, e.g.,

example( v1, x+2*y )

"v1" is the text for parameter #1, "x+2*y" is the text for parameter #2. Note that HLA strips all whitespace and control characters before and after the actual parameter when expanding the code in-line. The example immediately above would expand to the following:

?v1 := x+2*y;

If (balanced) parentheses appear in some macro's actual parameter list, HLA does not count the closing parenthesis as the end of the macro parameter. That is, the following is perfectly legal:

example( v1, ((x+2)*y) )

This expands to:

?v1 := ((x+2)*y);

If you need to embed commas or unmatched parentheses in the text of an actual parameter, use the HLA literal quotes "#(" and ")#" to surround the text. Everything (except surrounding whitespace) inside the literal quotes will be included as part of the macro parameter's text. Example:

example( v1, #( array[0,1,i] )# )

The above expands to:

?v1 := array[0,1,i];

Without the literal quote operator, HLA would have expanded the code to

?v1 := array[0;

and then generated an error because (1) there were too many actual macro parameters (four instead of two) and (2) the expansion produces a syntax error.

Of course, HLA's macro parameter parser does not consider commas appearing inside string or character constants as parameter separators. The following is perfectly legal, as you would expect:

example( charVar, ',' )

As you may have noticed in these examples, a macro invocation does not require a terminating semicolon. Macro expansion occurs upon encountering the closing parenthesis of the macro invocation. HLA uses this syntax to allow a macro expansion anywhere in an HLA source file. Consider the following:

#macro funny( dest );

, dest );

#endmacro

mov( 0 funny( ax )

This code expands to "mov( 0, ax );" and produces a legal machine instruction. Of course, this is a truly horrible example of macro use (the style is really bad), but it demonstrates the power of HLA macros in your program. This "expand anywhere" philosophy is the primary reason macro invocations do not end with a semicolon.

Multi-part (Context Free) Macro Invocations:

HLA macros provide some very powerful facilities not found in other macro assemblers. One of the unique features that HLA macros provide is support for multi-part (or context-free) macro invocations. This feature is accessed via the #terminator and #keyword reserved words. Consider the following macro declaration:

program demoTerminator;

#include( "stdio.hhf" );

#macro InfLoop:TopOfLoop, LoopExit;

TopOfLoop:

#terminator endInfLoop;

jmp TopOfLoop;

LoopExit:

#endmacro;

static

i:int32;

begin demoTerminator;

mov( 0, i );

InfLoop

stdout.put( "i=", i, nl );

inc( i );

endInfLoop;

end demoTerminator;

The #terminator keyword, if it appears within a macro, defines a second macro that is available for one-time use after invoking the main macro. In the example above, the "endInfLoop" macro is available only after the invocation of the "InfLoop" macro. Once you invoke the endInfLoop macro, it is no longer available (though the macro calls can be nested; more on that later). During the invocation of the #terminator macro, all local symbols declared in the main macro (InfLoop above) are available (note that these symbols are not available outside the macro body; in particular, you could not refer to either "TopOfLoop" or "LoopExit" in the statements appearing between the InfLoop and endInfLoop invocations above). The code above, by the way, emits code similar to the following:

_0000_:

stdout.put( "i=", i, nl );

inc( i );

jmp _0000_;

_0001_:

The two labels and the JMP instruction come from the macro expansion. This program, therefore, generates an infinite loop that prints successive integer values.

These macros are called multi-part macros for the obvious reason: they come in multiple pieces (note, though, that HLA only allows a single #terminator macro). They are also referred to as context-free macros because of their syntactical nature. Earlier, this document claimed that you could refer to the #terminator macro only once after invoking the main macro. Technically, this should have said "you can invoke the terminator once for each outstanding invocation of the main macro." In other words, you can nest these macro calls, e.g.,

InfLoop

mov( 0, j );

InfLoop

inc( i );

inc( j );

stdout.put( "i=", i, " j=", j, nl );

endInfLoop;

endInfLoop;

The term Context-Free comes from automata theory; it describes this nestable feature of these macros.

As should be obvious from this InfLoop macro example, it would be really nice if one could define more than one macro within this context-free group. Furthermore, the need often arises to define limited-scope macros that can be invoked more than once (limited-scope means between the main macro call and the terminator macro invocation). The #keyword definition allows you to create such macros.

In the InfLoop example above, it would be really nice if you could exit the loop using a statement like " brkLoop " (note that "BREAK" is an HLA reserved word and cannot be used for this purpose). The # keyword section of a macro allows you to do exactly this. Consider the following macro definition:

#macro InfLoop:TopOfLoop, LoopExit;

TopOfLoop:

#keyword brkLoop;

jmp LoopExit;

#terminator endInfLoop;

jmp TopOfLoop;

LoopExit:

#endmacro;

As with the "#terminator" section, the "#keyword" section defines a macro that is active after the main macro invocation until the terminator macro invocation. However, #keyword macro invocations do not terminate the multi-part invocation. Furthermore, #keyword invocations may occur more than once. Consider the following code that might appear in the main program:

mov( 0, i );

InfLoop

mov( 0, j );

InfLoop

inc( j );

stdout.put( "i=", i, " j=", j, nl );

if( j >= 10 ) then

brkLoop;

endif;

endInfLoop;

inc( i );

if( i >= 10 ) then

brkLoop;

endif;

endInfLoop;

The "brkLoop" invocation inside the "if( j >= 10 )" statement breaks out of the innermost loop, as expected (another feature of the context-free behavior of HLA's macros). The "brkLoop" invocation associated with the "if( i >= 10 )" statement breaks out of the outermost loop. Of course, the HLA language provides the FOREVER..ENDFOR loop and the BREAK and BREAKIF statements, so there is no need for this InfLoop macro; nevertheless, this example is useful because it is easy to understand. If you are looking for a challenge, try creating a statement similar to the C/C++ switch/case statement. It is perfectly possible to implement such a statement with HLA's macro facilities; see the HLA Standard Library for an example of the SWITCH statement implemented as a macro.

The discussion above introduced the " #keyword " and " #terminator " macro sections in an informal way. There are a few details omitted from that discussion. First, the full syntax for HLA macro declarations is actually:

#macro identifier ( optional_parameter_list ) :optional_local_symbols;

statements

#keyword identifier ( optional_parameter_list ) :optional_local_symbols;

statements

note: additional #keyword declarations may appear here

#terminator identifier ( optional_parameter_list ) :optional_local_symbols;

statements

#endmacro

There are three things that should immediately stand out here: (1) You may define more than one #keyword within a macro. (2) #keywords and #terminators allow optional parameters. (3) #keywords and #terminators allow their own local symbols.

The scope of the parameters and local symbols isn't particularly intuitive (although it turns out that the scope rules are exactly what you would want). The parameters and local symbols declared in the main macro declaration are available to all statements in the macro (including the statements in the #keyword and #terminator sections). The InfLoop macro used this feature, since the JMP instructions in the brkLoop and endInfLoop sections referred to the local symbols declared in the main macro. The InfLoop macro did not declare any parameters, but had they been present, the brkLoop and endInfLoop sections could have used those parameters as well.

Parameters and local symbols declared in a #keyword or #terminator section are local to that particular section. In particular, parameters and/or local symbols declared in a #keyword section are not visible in other #keyword sections or in the #terminator section.
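The scope rules above can be sketched with a variation of the InfLoop macro whose #keyword section takes its own parameters. The keyword name brkLoopIfEq and its parameters left and right are hypothetical additions for illustration; the sketch assumes stdlib.hhf for output.

```hla
program demoKeywordParms;

#include( "stdlib.hhf" );

#macro InfLoop:TopOfLoop, LoopExit;

    TopOfLoop:

  // "left" and "right" are visible only inside this #keyword section.
  // LoopExit is visible here because the main macro declares it.
  #keyword brkLoopIfEq( left, right );

    cmp( left, right );
    je LoopExit;

  #terminator endInfLoop;

    jmp TopOfLoop;
    LoopExit:

#endmacro

static
    i:int32;

begin demoKeywordParms;

    mov( 0, i );
    InfLoop

        stdout.put( "i=", i, nl );
        inc( i );

        // Leaves the loop once i reaches 10.
        brkLoopIfEq( i, 10 )

    endInfLoop

end demoKeywordParms;
```

Attempting to use "left" or "right" in the #terminator section would be an error under the rules above, since those parameters belong to the #keyword section alone.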

One important issue is that local symbols in a multipart macro are visible in the main code between the start of the multipart macro and the terminating macro. That is, if you have some sequence like the following:

InfLoop

jmp LoopExit;

endInfLoop;

then HLA substitutes the appropriate internal symbol (e.g., "_xxxx_") for the LoopExit symbol. This is somewhat unintuitive and might be considered a flaw in HLA's design. Future versions of HLA may deal with this issue; in the meantime, however, some code takes advantage of this feature (to mask global symbols), so it's not easy to change without breaking a lot of code. Be forewarned before taking advantage of this "feature", however, that it will probably change in HLA v2.x.

Macro Invocations and Macro Parameters:

As mentioned earlier, HLA treats all non-array macro parameters as text constants that are assigned a string corresponding to the actual parameter(s) passed to the macro. I.e., consider the following:

#macro SetI( v );

?i := v;

#endmacro

SetI( 2 );

The above macro and invocation is roughly equivalent to the following:

const

v : text := "2";

?i := v;

When utilizing variable parameter lists in a macro, HLA treats the parameter object as a string array rather than a text array (because HLA v1.x does not currently support text arrays). For example, consider the following macro and invocation:

#macro SetI2( v[] );

?i := v[0];

#endmacro

SetI2( 2 );

Although this looks quite similar to the previous example, there is a subtle difference between the two. The former example will initialize the constant (value) i with the int32 value two. The second example will initialize i with the string value "2".

If you need to treat a macro array parameter as text rather than as a string object, use the HLA "@text" function that expands a string parameter as text. E.g., the SetI2 example could be rewritten as:

#macro SetI2( v[] );

?i := @text( v[0]);

#endmacro

SetI2( 2 );

In this example, the @text function tells HLA to expand the string value v[0] (which is "2") directly as text, so the "SetI2( 2 )" invocation expands as

?i := 2;

rather than as

?i := "2";

On occasion, you may need to do the converse of this operation. That is, you may want to treat a standard (non-array) macro parameter as a string object rather than as a text object. Unfortunately, text objects are expanded by the lexer in-line upon initial processing; the compiler never sees the text variable name (or parameter name, in this particular case). Therefore, writing an "@string" function in HLA wouldn't work because the lexer would simply expand the text object parameter before HLA got a chance to process it.

To work around this limitation, the lexer provides a special syntactical entity that converts a text object to the corresponding string. The syntax is "@string:identifier" where identifier is the name of the text constant (or macro parameter or macro local symbol) that you wish converted to a string. When HLA encounters this construct, it will substitute a string constant for the identifier. The following example demonstrates one possible use of this feature:

program demoString;

#macro seti3( v );

#print( "i is being set to " + @string:v )

?i := v;

#endmacro

begin demoString;

seti3( 4 )

#print( "i = " + string( i ) )

seti3( 2 )

#print( "i = " + string( i ) )

end demoString;

If an identifier is a TEXT constant (e.g., a macro parameter or a const/value identifier of type TEXT), special care must be taken to modify the string associated with that text object. A simple VAL expression like the following won't work:

?textVar:text := "SomeNewText";

The reason this doesn't work is subtle: if textVar is already a text object, HLA immediately replaces textVar with its corresponding string; this includes the occurrence of the identifier immediately after the "?" in the example above. So were you to execute the following two statements:

?textVar:text := "x";

?textVar:text := "1";

the second statement would not change textVar's value from " x " to "1". Instead, the second statement above would be converted to:

?x:text := "1";

and textVar's value would remain " x ". To overcome this problem, HLA provides a special syntactical entity that converts a text object to a string and then returns the text object ID. The syntax for this special form is "@tostring:identifier". The example above could be rewritten as:

?textVar:text := "x";

?@tostring:textVar:text := "1";

In this example, textVar would be a text object that expands to the string "1".

Processing Macro Parameters

As described earlier, HLA processes as parameters all text between a set of matching parentheses after the macro's name in a macro invocation. HLA macro parameters are delimited by the surrounding parentheses and commas. That is, the first parameter consists of all text beyond the left parenthesis up to the first comma (or up to the right parenthesis if there is only one parameter). The second parameter consists of all text just beyond the first comma up to the second comma (or right parenthesis if there are only two parameters). Etc. The last parameter consists of all text from the last comma to the closing right parenthesis.

Note that HLA will strip away any white space at the beginning and end of the parameter's text (though it does not remove any white space from the interior of the parameter's text).

If a single parameter must contain commas or parentheses, you must surround the parameter with the literal text macro quotes "#(" and ")#". HLA considers everything but leading and trailing space between these macro quote symbols as a single parameter. Note that this applies to macro invocations appearing within a parameter list. Consider the following (erroneous) code:

CallToAMacro( 5, "a", CallToAnotherMacro( 6,7 ), true );

Presumably, the "( 6,7 )" text is the parameter list for the "CallToAnotherMacro" invocation. When HLA encounters a macro invocation in a parameter list, it defers the expansion of the macro. That is, the third parameter of "CallToAMacro" should expand to "CallToAnotherMacro( 6,7 )", not the text that "CallToAnotherMacro" would expand to. Unfortunately, this example will not compile correctly because the macro processor treats the comma between the 6 and the 7 as the end of the third parameter to CallToAMacro (in other words, the third parameter is actually "CallToAnotherMacro( 6" and the fourth parameter is "7 )"). If you really need to pass a macro invocation as a parameter, use the "#(" and ")#" macro quotes to surround the interior invocation:

CallToAMacro( 5, "a", #( CallToAnotherMacro( 6,7 ) )#, true );

In this example, HLA passes all the text between the "#(" and ")#" markers as a single parameter (the third parameter) to the " CallToAMacro " macro.

This example demonstrates another feature of HLA's macro processing system - HLA uses deferred macro parameter expansion. That is, the text of a macro parameter is expanded when HLA encounters the formal parameter within the macro's body, not while HLA is processing the actual parameters in the macro invocation (which would be eager evaluation).

There are three exceptions to the rule of deferred parameter evaluation: (1) text constants are always expanded in an eager fashion (that is, the value of the text constant, not the text constant's name, is passed as the macro parameter). (2) The @text function, if it appears in a parameter list, expands the string parameter in an eager fashion. (3) The @eval function immediately evaluates its parameter; the discussion of @eval appears a little later.

In general, there is very little difference between eager and deferred evaluation of macro parameters. In some rare cases there is a semantic difference between the two. For example, consider the following two programs:

program demoDeferred;

#macro two( x, y ):z;

?z:text:="1";

x+y

#endmacro

const

z:string := "2";

begin demoDeferred;

?i := two( z, 2 );

#print( "i=" + string( i ))

end demoDeferred;

In the example above, the code passes the actual parameter " z " as the value for the formal parameter " x ". Therefore, whenever HLA expands " x " it gets the value " z " which is a local symbol inside the "two" macro that expands to the value " 1 ". Therefore, this code prints " 3 " ( " 1 " plus y's value which is " 2 ") during assembly. Now consider the following code:

program demoEager;

#macro two( x, y ):z;

?z:text:="1";

x+y

#endmacro

const

z:string := "2";

begin demoEager;

?i := two( @text( z ), 2 );

#print( "i=" + string( i ))

end demoEager;

The only differences between these two programs are their names and the fact that demoEager's invocation of "two" uses the @text function to eagerly expand z's text. As a result, the formal parameter "x" is given the value of z's expansion ("2") and HLA ignores the local value for "z" in macro "two". This code prints the value "4" during assembly. Note that changing "z" in the main program to a text constant (rather than a string constant) has the same effect:

program demoEager;

#macro two( x, y ):z;

?z:text:="1";

x+y

#endmacro

const

z:text := "2";

begin demoEager;

?i := two( z, 2 );

#print( "i=" + string( i ))

end demoEager;

This program also prints "4" during assembly.

One place where deferred vs. eager evaluation can get you into trouble is with some of the HLA built-in functions. Consider the following HLA macro:

#macro DemoProblem( Parm );

#print( string( Parm ) )

#endmacro

DemoProblem( @linenumber );

(The @linenumber function returns, as an uns32 constant, the current line number in the file).

When this program fragment compiles, HLA will use deferred evaluation and pass the text "@linenumber" as the parameter "Parm". Upon compilation of this fragment, the macro will expand to "#print( string( @linenumber ))" with the intent, apparently, being to print the line number of the statement containing the DemoProblem invocation. In reality, that is not what this code will do. Instead, it will print the line number, in the macro, of the "#print( string( Parm ) )" statement. By delaying the substitution of the current line number for the "@linenumber" function call until inside the macro, deferred evaluation produces the wrong result. What is really needed here is eager evaluation so that the @linenumber function expands to the line number string before being passed as a parameter to the DemoProblem macro. The @eval built-in function provides this capability. The following coding of the DemoProblem macro invocation will solve the problem:

DemoProblem( @eval( @linenumber ) );

Now the compiler will execute the @linenumber function and pass that number as the macro parameter text rather than the string "@linenumber". Therefore, the #print statement inside the macro will print the actual line number of the DemoProblem statement rather than the line number of the #print statement.

Keep these minor differences in mind if you run into trouble using macro parameters.

HLA High Level Language Statements

HLA provides several control structures that provide a high level language flavor to assembly language programming. The statements HLA provides are

try..unprotected..exception..anyexception..endtry, raise

if..then..elseif..else..endif

while..endwhile

repeat..until

for..endfor

foreach..endfor

forever..endfor

break, breakif

continue, continueif

begin..end, exit, exitif

These HLL statements provide two basic improvements to assembly language programs: (1) they make many algorithms much easier to read; (2) they eliminate the need to create tons of labels in a program (which also helps make the program easier to read).

Generally, these instructions are "macros" that emit one or two machine instructions. Therefore, these instructions are not always as flexible as their HLL counterparts. Nevertheless, they are suitable for about 85% of the uses people typically have for these instructions.

Do keep in mind that even though these statements compile to efficient machine code, writing assembly language using a HLL mindset produces intrinsically inefficient programs. If speed or size is your number one priority in a program, you should be sure you understand exactly which instructions each of these statements emits before using them in your code.

The JT and JF statements are actually "medium level language" statements. They are intended for use in macros when constructing other HLL control statements; they are not intended for use as standard statements in your program (not that they don't work, they're just not true HLL statements).

Note: The FOREACH..ENDFOR loop is mentioned above only for completeness. The full discussion of the FOREACH..ENDFOR statement appears a little later in the section on iterators.

Exception Handling in HLA

HLA uses the TRY..EXCEPTION..ENDTRY and RAISE statements to implement exception handling. The syntax for these statements is as follows:

try

<< HLA Statements to execute >>

<< unprotected // Optional unprotected section.

<< HLA Statements to execute >> >>

exception( const1 )

<< Statements to execute if exception const1 is raised >>

<< optional exception statements for other exceptions >>

<< anyexception // Optional anyexception section.

<< HLA Statements to execute >> >>

endtry;

raise( const2 );

Const1 and const2 must be unsigned integer constants. Usually, these are values defined in the excepts.hhf header file. Some examples of predefined values include the following:

ex.StringOverflow

ex.StringIndexError

ex.ValueOutOfRange

ex.IllegalChar

ex.ConversionError

ex.BadFileHandle

ex.FileOpenFailure

ex.FileCloseError

ex.FileWriteError

ex.FileReadError

ex.DiskFullError

ex.EndOfFile

ex.MemoryAllocationFailure

ex.AttemptToDerefNULL

ex.WidthTooBig

ex.TooManyCmdLnParms

ex.ArrayShapeViolation

ex.ArrayBounds

ex.InvalidDate

ex.InvalidDateFormat

ex.TimeOverflow

ex.AssertionFailed

ex.ExecutedAbstract

Windows Structured Exception Handler exception values:

ex.AccessViolation

ex.Breakpoint

ex.SingleStep

ex.PrivInstr

ex.IllegalInstr

ex.BoundInstr

ex.IntoInstr

ex.DivideError

ex.fDenormal

ex.fDivByZero

ex.fInexactResult

ex.fInvalidOperation

ex.fOverflow

ex.fStackCheck

ex.fUnderflow

ex.InvalidHandle

ex.StackOverflow

ex.ControlC

This list is constantly changing as the HLA Standard Library grows, so it is impossible to provide a complete list of standard exceptions at this time. Please see the excepts.hhf header file for a complete list of standard exceptions. As this was being written, the Linux-specific exceptions (signals) had not been added to the list. See the excepts.hhf file on your Linux system to see if these have been added.

The HLA Standard Library currently reserves exception numbers zero through 1023 for its own internal use. User-defined exceptions should use an integer value greater than or equal to 1024 and less than or equal to 65535 ($FFFF). Exception values $10000 and above are reserved for use by the Windows Structured Exception Handler and Linux signals.
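As a sketch of a user-defined exception in the permitted range, the program below declares a constant and uses it with both RAISE and an EXCEPTION clause. The names demoUserException and MyAppError, and the choice of 1024 as its value, are hypothetical; the sketch assumes stdlib.hhf.

```hla
program demoUserException;

#include( "stdlib.hhf" );

const
    // Hypothetical user-defined exception number; 1024 is the first
    // value outside the range reserved by the HLA Standard Library.
    MyAppError := 1024;

begin demoUserException;

    try

        stdout.put( "About to raise MyAppError" nl );
        raise( MyAppError );

        // Control should not reach this statement.
        stdout.put( "After the raise" nl );

      exception( MyAppError )

        stdout.put( "Handled MyAppError" nl );

    endtry;

end demoUserException;
```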

The TRY..ENDTRY statement contains two or more blocks of statements. The statements to protect immediately follow the TRY reserved word. During the execution of the protected statements, if the program encounters the first exception block, control immediately transfers to the first statement following the endtry reserved word. The program will skip all the statements in the exception blocks.

If an exception occurs during the execution of the protected block, control is immediately transferred to an exception handling block that begins with the exception reserved word and the constant that specifies the type of exception.

Example:

repeat

mov( false, GoodInput );

try

stdout.put( "Enter an integer value:" );

stdin.get( i );

mov( true, GoodInput );

exception( ex.ValueOutOfRange )

stdout.put( "Numeric overflow, please reenter ", nl );

exception( ex.ConversionError )

stdout.put( "Conversion error, please reenter", nl );

endtry;

until( GoodInput = true );

In this code, the program will repeatedly request the input of an integer value as long as the user enters a value that is out of range (+/- 2 billion) or as long as the user enters a value containing illegal characters.

TRY..ENDTRY statements can be nested. If an exception occurs within a nested TRY protected block, the EXCEPTION blocks in the innermost try block containing the offending statement get first shot at the exceptions. If none of the EXCEPTION blocks in the enclosing TRY..ENDTRY statement handle the specified exception, then the next innermost TRY..ENDTRY block gets a crack at the exception. This process continues until some exception block handles the exception or there are no more TRY..ENDTRY statements.
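The search order for nested handlers can be sketched as follows. The inner TRY..ENDTRY has no handler for the exception being raised, so the enclosing statement is expected to handle it. The program name demoNestedTry is hypothetical, and the sketch assumes stdlib.hhf supplies the ex.* constants and stdout.put.

```hla
program demoNestedTry;

#include( "stdlib.hhf" );

begin demoNestedTry;

    try

        try

            // Raised inside the inner protected block.
            raise( ex.ValueOutOfRange );

          exception( ex.ConversionError )

            // Inner statement only handles conversion errors,
            // so this handler does not match.
            stdout.put( "Inner handler: conversion error" nl );

        endtry;

      exception( ex.ValueOutOfRange )

        // The enclosing statement gets the next chance at the exception.
        stdout.put( "Outer handler: value out of range" nl );

    endtry;

end demoNestedTry;
```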

If an exception goes unhandled, the HLA run-time system will handle it by printing an appropriate error message and aborting the program. Generally, this consists of printing "Unhandled Exception" (or a similar message) and stopping the program. If you include the excepts.hhf header file in your main program, then HLA will automatically link in a somewhat better default exception handler that will print the number (and name, if known) of the exception before stopping the program.

Note that TRY..ENDTRY blocks are dynamically nested, not statically nested. That is, a program must actually execute the TRY in order to activate the exception handler. You should never jump into the middle of a protected block, skipping over the TRY. Doing so may produce unpredictable results.

You should not use the TRY..ENDTRY statement as a general control structure. For example, it will probably occur to someone that one could easily create a switch/case selection statement using TRY..ENDTRY as follows:

try

raise( SomeValue );

exception( case1_const)

exception( case2_const)

etc.

endtry;

While this might work in some situations, there are two problems with this code.

First, if an exception occurs while using the TRY..ENDTRY statement as a switch statement, the results may be unpredictable. Second, HLA's run-time system assumes that exceptions are rare events. Therefore, the code generated for the exception handlers doesn't have to be efficient. You will get much better results implementing a switch/case statement using a table lookup and indirect jump (see the Art of Assembly) rather than a TRY..ENDTRY block.

Warning: The TRY statement pushes data onto the stack upon initial entry and pops data off the stack upon leaving the TRY..ENDTRY block. Therefore, jumping into or out of a TRY..ENDTRY block is an absolute no-no. As explained so far, then, there are only two reasonable ways to exit a TRY statement, by falling off the end of the protected block or by an exception (handled by the TRY statement or a surrounding TRY statement).

The UNPROTECTED clause in the TRY..ENDTRY statement provides a safe way to exit a TRY..ENDTRY block without raising an exception or executing all the statements in the protected portion of the TRY..ENDTRY statement. An unprotected section is a sequence of statements, between the protected block and the first exception handler, that begins with the keyword UNPROTECTED. E.g.,

try

<< Protected HLA Statements >>

unprotected

<< Unprotected HLA Statements >>

exception( SomeExceptionID )

<< etc. >>

endtry;

Control flows from the protected block directly into the unprotected block as though the UNPROTECTED keyword were not present. However, between the two blocks HLA compiler-generated code removes the data pushed on the stack. Therefore, it is safe to transfer control to some spot outside the TRY..ENDTRY statement from within the unprotected section.

If an exception occurs in an unprotected section, the TRY..ENDTRY statement containing that section does not handle the exception. Instead, control transfers to the (dynamically) nesting TRY..ENDTRY statement (or to the HLA run-time system if there is no enclosing TRY..ENDTRY).

If you're wondering why the UNPROTECTED section is necessary (after all, why not simply put the statements in the UNPROTECTED section after the ENDTRY?), just keep in mind that both the protected sequence and the handled exceptions continue execution after the ENDTRY. There may be some operations you want to perform after exceptions are released, but only if the protected block finished successfully. The UNPROTECTED section provides this capability. Perhaps the most common use of the UNPROTECTED section is to break out of a loop that repeats a TRY..ENDTRY block until it executes without an exception occurring. The following code demonstrates this use:

forever

try

stdout.put( "Enter an integer: " );

stdin.geti8(); // May raise an exception.

unprotected

break;

exception( ex.ValueOutOfRange )

stdout.put( "Value was out of range, reenter" nl );

exception( ex.ConversionError )

stdout.put( "Value contained illegal chars" nl );

endtry;

endfor;

This simple example repeatedly asks the user to input an int8 integer until the input is a legal number within the int8 range (-128..+127).

Another clause in the TRY..ENDTRY statement is the ANYEXCEPTION clause. If this clause is present, it must be the last clause in the TRY..ENDTRY statement, e.g.,

try

<< protected statements >>

unprotected

<< Optional unprotected statements >>

exception( constant ) // Note: may be zero or more of these.

<< Optional exception handler statements >>

anyexception

<< Exception handler if none of the others execute >>

endtry;

Without the ANYEXCEPTION clause present, if the program raises an exception that is not specifically handled by one of the exception clauses, control transfers to the enclosing TRY..ENDTRY statement. The ANYEXCEPTION clause gives a TRY..ENDTRY statement the opportunity to handle any exception, even those that are not explicitly listed. Upon entry into the ANYEXCEPTION block, the EAX register contains the actual exception number.
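
The following is a small sketch of an ANYEXCEPTION handler that reports the exception number found in EAX. It assumes the Standard Library's stdlib.hhf header; the program name is hypothetical, and the coercion to uns32 is only there so that stdout.put prints the number in decimal:

```hla
program AnyExDemo;

#include( "stdlib.hhf" );

begin AnyExDemo;

    try

        stdout.put( "Enter an integer: " );
        stdin.geti8();          // May raise an exception.

      exception( ex.ValueOutOfRange )

        stdout.put( "Value was out of range" nl );

      anyexception

        // On entry to this block, EAX holds the exception number.
        stdout.put( "Unexpected exception, number = ", (type uns32 eax), nl );

    endtry;

end AnyExDemo;
```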

The HLA RAISE statement generates an exception. The single parameter is an 8, 16, or 32-bit ordinal constant. Control is (ultimately) transferred to the first (most deeply nested) TRY..ENDTRY statement that has a corresponding exception handler (including ANYEXCEPTION).

If the program executes the RAISE statement within the protected block of a TRY..ENDTRY statement, then the enclosing TRY..ENDTRY gets first shot at handling the exception. If the RAISE statement occurs in an UNPROTECTED block, or in an exception handler (including ANYEXCEPTION), then the next higher level (nesting) TRY..ENDTRY statement will handle the exception. This allows cascading exceptions; that is, exceptions that the system handles in two or more exception handlers. Consider the following example:

try

<< Protected statements >>

exception( someException )

<< Code to process this exception >>

// The following re-raises this exception, allowing

// an enclosing try..endtry statement to handle

// this exception as well as this handler.

raise( someException );

<< Additional, optional, exception handlers >>

endtry;

The IF..THEN..ELSEIF..ELSE..ENDIF Statement in HLA

HLA provides a limited IF..THEN..ELSEIF..ELSE..ENDIF statement that can help make your programs easier to read. For the most part, HLA's if statement provides a convenient substitute for a CMP and a conditional branch instruction pair (or a chain of such instructions when employing ELSEIF clauses).

The generic syntax for the HLA if statement is the following:

if( conditional_expression ) then

<< Statements to execute if expression is true >>

endif;

if( conditional_expression ) then

<< Statements to execute if expression is true >>

else

<< Statements to execute if expression is false >>

endif;

if( expr1 ) then

<< Statements to execute if expr1 is true >>

elseif( expr2 ) then

<< Statements to execute if expr1 is false

and expr2 is true >>

endif;

if( expr1 ) then

<< Statements to execute if expr1 is true >>

elseif( expr2 ) then

<< Statements to execute if expr1 is false

and expr2 is true >>

else

<< Statements to execute if both expr1 and

expr2 are false >>

endif;

Note: HLA's if statement allows multiple ELSEIF clauses. All ELSEIF clauses must appear between the IF clause and the ELSE clause (if present) or the ENDIF (if an ELSE clause is not present).

See the next section for a discussion of valid boolean expressions within the IF statement (this section appears first because the section on boolean expressions uses IF statements in its examples).
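
As an illustration of a chain with several ELSEIF clauses, here is a minimal sketch. It assumes the Standard Library's stdlib.hhf header; the program name, the variable score, and its initial value are hypothetical:

```hla
program IfDemo;

#include( "stdlib.hhf" );

static
    score: uns32 := 75;     // Hypothetical test value.

begin IfDemo;

    mov( score, eax );
    if( eax >= 90 ) then

        stdout.put( "Grade A" nl );

    elseif( eax >= 80 ) then

        stdout.put( "Grade B" nl );

    elseif( eax >= 70 ) then

        stdout.put( "Grade C" nl );

    else

        stdout.put( "Below C" nl );

    endif;

end IfDemo;
```

Each ELSEIF test is reached only if all the tests before it were false, so the order of the clauses matters.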

Boolean Expressions for High-Level Language Statements

The primary limitation of HLA's IF and other HLL statements has to do with the conditional expressions allowed in these statements. These expressions must take one of the following forms:

operand1 relop operand2

register in constant .. constant

register not in constant .. constant

memory in constant .. constant

memory not in constant .. constant

reg8 in CSet_Constant

reg8 in CSet_Variable

reg8 not in CSet_Constant

reg8 not in CSet_Variable

register

!register

memory

!memory

Flag

( boolean_expression )

!( boolean_expression )

boolean_expression && boolean_expression

boolean_expression || boolean_expression

For the first form, "operand1 relop operand2", relop is one of:

= or == (either one, both are equivalent)

<> or != (either one)

<

<=

>

>=

Operand1 and operand2 must be operands that would be legal for a "cmp( operand1, operand2 );" instruction.

For the IF statement, HLA emits a CMP instruction with the two operands specified and an appropriate conditional jump instruction that skips over the statements following the "THEN" reserved word if the condition is false. For example, consider the following code:

if( al = 'a' ) then

stdout.put( "Option 'a' was selected", nl );

endif;

Like the CMP instruction, the two operands cannot both be memory operands.

Unlike the conditional branch instructions, the six relational operators cannot differentiate between signed and unsigned comparisons (for example, HLA uses "<" for both signed and unsigned less than comparisons). Since HLA must emit different instructions for signed and unsigned comparisons, and the relational operators do not differentiate between the two, HLA must rely upon the types of the operands to determine which conditional jump instruction to emit.

By default, HLA emits unsigned conditional jump instructions (i.e., JA, JAE, JB, JBE, etc.). If either (or both) operands are signed values, HLA will emit signed conditional jump instructions (i.e., JG, JGE, JL, JLE, etc.) instead.

HLA considers the 80x86 registers to be unsigned. This can create some problems when using the HLA if statement. Consider the following code:

if( eax < 0 ) then

<< do something if eax is negative >>

endif;

Since neither operand is a signed value, HLA will emit the following code:

cmp( eax, 0 );

jnb SkipThenPart;

<< do something if eax is negative >>

SkipThenPart:

Unfortunately, the value in EAX is never below zero (since zero is the minimum unsigned value), so the body of this if statement never executes. Clearly, the programmer intended to use a signed comparison here. The solution is to ensure that at least one operand is signed. But what can you do when, as in this example, both operands are intrinsically unsigned?

The solution is to use coercion to tell HLA that one of the operands is a signed value. In general, it is always possible to coerce a register so that HLA treats it as a signed, rather than unsigned, value. The IF statement above could be rewritten (correctly) as

if( (type int32 eax) < 0 ) then

<< do something if eax is negative >>

endif;

HLA will emit the JNL instruction (rather than JNB) in this example. Note that if either operand is signed, HLA will emit a signed conditional jump instruction. Therefore, it is not necessary to coerce both unsigned operands in this example.
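
The sketch below puts the three cases side by side. It assumes the Standard Library's stdlib.hhf header; the program name and the variable temperature are hypothetical. Going by the rules above, the first and third tests should select a signed jump and the second an unsigned one:

```hla
program SignedCompare;

#include( "stdlib.hhf" );

static
    temperature: int32 := -5;   // Hypothetical signed variable.

begin SignedCompare;

    // An int32 memory operand makes this a signed comparison.
    if( temperature < 0 ) then

        stdout.put( "memory test: negative" nl );

    endif;

    mov( temperature, eax );

    // Both operands are unsigned here (register and constant),
    // so this body should never execute.
    if( eax < 0 ) then

        stdout.put( "unsigned register test: not expected to print" nl );

    endif;

    // Coercing the register requests a signed comparison.
    if( (type int32 eax) < 0 ) then

        stdout.put( "coerced register test: negative" nl );

    endif;

end SignedCompare;
```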

The second form of a conditional expression that the IF statement accepts is a register or memory operand followed by "in" and then two constants separated by the ".." operator, e.g.,

if( al in 0..10 ) then ...

This code checks to see if the first operand is in the range specified by the two constants. The constant to the left of the ".." must be less than the constant to the right for this expression to make any sense. The result is true if the operand is within the specified range. For this expression, HLA emits a pair of compare and conditional jump instructions to test the operand to see if it is in the specified range.

HLA also allows an exclusive range test specified by an expression of the form:

if( al not in 0..10 ) then ...

In this case, the expression is true if the value in AL is outside the range 0..10.

In addition to integer ranges, HLA also lets you use the IN operator with CSET constants and variables. The generic form is one of the following:

reg8 in CSetConst

reg8 not in CSetConst

reg8 in CSetVariable

reg8 not in CSetVariable

For example, a statement of the form "if( al in {'a'..'z'}) then ..." checks to see if the character in the AL register is a lower case alphabetic character. Similarly,

if( al not in {'a'..'z', 'A'..'Z'}) then...

checks to see if AL is not an alphabetic character.
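
A short sketch that classifies one input character with CSET tests follows. It assumes the Standard Library's stdlib.hhf header and that stdin.getc returns the character in AL; the program name is hypothetical:

```hla
program ClassifyChar;

#include( "stdlib.hhf" );

begin ClassifyChar;

    stdout.put( "Enter a character: " );
    stdin.getc();               // Assumed to return the character in AL.

    if( al in {'0'..'9'} ) then

        stdout.put( "digit" nl );

    elseif( al not in {'a'..'z', 'A'..'Z'} ) then

        stdout.put( "neither a letter nor a digit" nl );

    else

        stdout.put( "letter" nl );

    endif;

end ClassifyChar;
```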

The fifth form of a conditional expression that the IF statement accepts is a single register name (eight, sixteen, or thirty-two bits). The IF statement will test the specified register to see if it is zero (false) or non-zero (true) and branches accordingly. If you specify the not operator ("!") before the register, HLA reverses the sense of this test.

The sixth form of a conditional expression that the IF statement accepts is a single memory location. The type of the memory location must be boolean, byte, word, or dword. HLA will emit code that compares the specified memory location against zero (false) and generates an appropriate branch depending upon the value in the memory location. If you put the not operator ("!") before the variable, HLA reverses the sense of the test.
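
The register and memory forms can be sketched as follows (assuming the Standard Library's stdlib.hhf header; the program name and the boolean variable done are hypothetical):

```hla
program ZeroTests;

#include( "stdlib.hhf" );

static
    done: boolean := false;     // Hypothetical flag variable.

begin ZeroTests;

    mov( 0, ecx );
    if( !ecx ) then             // True when ECX contains zero.

        stdout.put( "ECX is zero" nl );

    endif;

    if( !done ) then            // True while done contains false (zero).

        stdout.put( "done is false" nl );

    endif;

    mov( true, done );
    if( done ) then             // True when done is non-zero.

        stdout.put( "done is now true" nl );

    endif;

end ZeroTests;
```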

The seventh form of a conditional expression that the IF statement accepts is a Flags register bit or other condition code combination handled by the 80x86 conditional jump instructions. The following reserved words are acceptable as IF statement expressions:

@c, @nc, @o, @no, @z, @nz, @s, @ns, @a, @na, @ae, @nae, @b, @nb, @be,

@nbe, @l, @nl, @g, @ng, @le, @nle, @ge, @nge, @e, @ne

These items emit an appropriate jump (of the opposite sense) around the THEN portion of the IF statement if the condition is false.
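
For instance, the carry flag can be tested directly after an addition. This is a minimal sketch assuming the Standard Library's stdlib.hhf header; the program name is hypothetical:

```hla
program CarryDemo;

#include( "stdlib.hhf" );

begin CarryDemo;

    mov( $FFFF_FFFF, eax );
    add( 1, eax );              // Unsigned overflow sets the carry flag.

    if( @c ) then

        stdout.put( "Unsigned overflow occurred" nl );

    endif;

end CarryDemo;
```

Because no compare is emitted for a flag expression, the instruction that sets the flag must be the last flag-modifying instruction before the IF.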

If you supply any legal boolean expression in parentheses, HLA simply uses the value of the inner expression as the value of the whole expression. This allows you to override the default precedence of the &&, ||, and ! operators.

The !( boolean_expression ) evaluates the expression and does just the opposite. That is, if the interior expression is false, then !( boolean_expression ) is true and vice versa. This is mainly useful with conjunction and disjunction since all of the other interesting terms already allow the not operator in front of them. Note that in general, the "!" operator must precede some parentheses. You cannot say "! AX < BX", for example.

Originally, HLA did not include support for the conjunction (&&) and disjunction (||) operators. This was explicitly left out of the design so that beginning students would be forced to rethink their logical operations in assembly language. Unfortunately, it was so inconvenient not to have these operators that they were eventually added. So a compromise was made: these operators were added to HLA but "The Art of Assembly Language Programming/Win32 Edition" doesn't bother to mention them until an advanced chapter on control structures.

The conjunction and disjunction operators are the operators && and ||. They expect two valid HLA boolean expressions around the operator, e.g.,

eax < 5 && ebx <> ecx

Since the above forms a valid boolean expression, it, too, may appear on either side of the && or || operator, e.g.,

eax < 5 && ebx <> ecx || !dl

HLA gives && higher precedence than ||. Both operators are left-associative so if multiple operators appear within the same expression, they are evaluated from left to right if the operators have the same precedence. Note that you can use parentheses to override HLA's default precedence.

One wrinkle with the addition of && and || is that you need to be careful when using the flags in a boolean expression. For example, "eax < ecx && @nz" hides the fact that HLA emits a compare instruction that affects the Z flag. Hence, the "@nz" adds nothing to this expression since EAX must not equal ECX if eax<ecx. So take care when using && and ||.

HLA uses short-circuit evaluation when evaluating expressions containing the conjunction and disjunction operators. For the && operator, this means that the resulting code will not compute the right-hand expression if the left-hand expression evaluates false. Similarly, the code will not compute the right-hand expression of the || operator if the left-hand expression evaluates true.
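
Short-circuit evaluation makes it possible to guard a memory access with a pointer test. The sketch below assumes the Standard Library's stdlib.hhf header; the program name and the variable value are hypothetical. On the first pass EBX is zero, so the right-hand operand (which would dereference EBX) should never be evaluated:

```hla
program ShortCircuit;

#include( "stdlib.hhf" );

static
    value: uns32 := 7;          // Hypothetical data to point at.

begin ShortCircuit;

    mov( 0, ebx );              // A null pointer for the first test.
    if( ebx <> 0 && (type uns32 [ebx]) = 7 ) then

        stdout.put( "first test: not expected to print" nl );

    endif;

    mov( &value, ebx );         // Now EBX points at value.
    if( ebx <> 0 && (type uns32 [ebx]) = 7 ) then

        stdout.put( "second test: pointer is valid and value is 7" nl );

    endif;

    // A rough discrete-instruction equivalent of the IF above (a sketch
    // only; the label is hypothetical and HLA's output may differ):
    //
    //      cmp( ebx, 0 );
    //      je skipBody;
    //      cmp( (type uns32 [ebx]), 7 );
    //      jne skipBody;
    //          << body >>
    //  skipBody:

end ShortCircuit;
```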

Note that the evaluation of complex boolean expressions involving the !(---), &&, and || operators does not change any register or memory values. HLA strictly uses flow control to implement these operations.

Note that the "&" and "|" operators are for compile-time-only expressions, while the "&&" and "||" operators are for run-time boolean expressions. These two groups of operators are not synonyms and you cannot use them interchangeably.

If you would prefer to use a less abstract scheme to evaluate boolean expressions, one that lets you see the low-level machine instructions, HLA provides a solution that allows you to write code to evaluate complex boolean expressions within the HLL statements using low-level instructions. Consider the following syntax:

if(#{

<<arbitrary HLA statements >>

}#) then

<< "True" section >>

else //or elseif...

<< "False" section >>

endif;

The "#{" and "}#" brackets tell HLA that an arbitrary set of HLA statements will appear between the braces. HLA will not emit any code for the IF expression. Instead, it is the programmer's responsibility to provide the appropriate test code within the "#{---}#" section. Within the sequence, HLA allows the use of the boolean constants "true" and "false" as targets of conditional jump instructions. Jumping to the "true" label transfers control to the true section (i.e., the code after the "THEN" reserved word). Jumping to the "false" label transfers control to the false section. Consider the following code that checks to see if the character in AL is in the range "a".."z":

if(#{

cmp( al, 'a' );

jb false;

cmp( al, 'z' );

ja false;

}#) then

<< code to execute if AL in {'a'..'z'} goes here >>

endif;

With the inclusion of the #{---}# operand, the IF statement becomes much more powerful, allowing you to test any condition possible in assembly language. Of course, the #{---}# expression is legal in the ELSEIF expression as well as the IF expression.
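
The following sketch uses both the "true" and the "false" targets in one hybrid expression. It assumes the Standard Library's stdlib.hhf header; the program name and the test value are hypothetical:

```hla
program HybridIf;

#include( "stdlib.hhf" );

begin HybridIf;

    mov( 'Y', al );             // Hypothetical test value.

    if(#{

        cmp( al, 'y' );
        je true;                // Lower case match: enter the THEN section.
        cmp( al, 'Y' );
        jne false;              // No match at all: skip the THEN section.

        // Falling out of the bottom also enters the THEN section.

    }#) then

        stdout.put( "AL holds 'y' or 'Y'" nl );

    endif;

end HybridIf;
```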

It would be a good idea for you to write some code using the HLA if statement and study the MASM code produced by HLA for these IF statements. By becoming familiar with the code that HLA generates for the IF statement, you will have a better idea about when it is appropriate to use the if statement versus standard assembly language statements.

The WHILE..ENDWHILE Statement in HLA

The while..endwhile statement allows the following syntax:

while( boolean_expression ) do

<< while loop body>>

endwhile;

while(#{ HLA_statements }#) do

<< while loop body>>

endwhile;

The WHILE statement allows the same boolean expressions as the HLA IF statement. Like the HLA IF statement, HLA allows you to use the boolean constants "true" and "false" as labels in the #{...}# form of the WHILE statement above. Jumping to the true label executes the body of the while loop; jumping to the false label exits the while loop.

For the "while( expr ) do" forms, HLA moves the test for loop termination to the bottom of the loop and emits a jump at the top of the loop to transfer control to the termination test. For the "while(#{stmts}#)" form, HLA compiles the termination test at the top of the emitted code for the loop. Therefore, the standard WHILE loop may be slightly more efficient (in the typical case) than the hybrid form.
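
Here is a minimal sketch of both forms counting from zero to four. It assumes the Standard Library's stdlib.hhf header and that the stdout routines preserve ECX; the program name is hypothetical:

```hla
program WhileDemo;

#include( "stdlib.hhf" );

begin WhileDemo;

    // Standard form: HLA generates the test from the expression.
    mov( 0, ecx );
    while( ecx < 5 ) do

        stdout.put( "standard: ecx = ", (type uns32 ecx), nl );
        inc( ecx );

    endwhile;

    // Hybrid form: the test is written out by hand.
    mov( 0, ecx );
    while(#{ cmp( ecx, 5 ); jnb false; }#) do

        stdout.put( "hybrid: ecx = ", (type uns32 ecx), nl );
        inc( ecx );

    endwhile;

end WhileDemo;
```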

The REPEAT..UNTIL Statement in HLA

HLA's REPEAT..UNTIL statement uses the following syntax:

repeat

<< statements to execute repeatedly >>

until( boolean_expression );

repeat

<< statements to execute repeatedly >>

until(#{ HLA_statements }#);

For those unfamiliar with REPEAT..UNTIL, the body of the loop always executes at least once, with the test for loop termination occurring at the bottom of the loop. The REPEAT..UNTIL loop (unlike C/C++'s do..while statement) terminates loop execution when the expression is true (that is, REPEAT..UNTIL repeats while the expression is false).

As you can see, the syntax for this is very similar to the WHILE loop. About the only major difference is that jumping to the "true" label in the #{---}# sequence exits the loop, while jumping to the "false" label in the #{---}# sequence transfers control back to the top of the loop.
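
A small countdown sketch (assuming the Standard Library's stdlib.hhf header; the program name is hypothetical) shows the loop ending when the UNTIL expression becomes true:

```hla
program RepeatDemo;

#include( "stdlib.hhf" );

begin RepeatDemo;

    mov( 3, ecx );
    repeat

        stdout.put( "Countdown: ", (type uns32 ecx), nl );
        dec( ecx );

    until( ecx = 0 );           // Loop ends once ECX reaches zero.

end RepeatDemo;
```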

The FOR..ENDFOR Statement in HLA

The HLA for..endfor statement is very similar to the C/C++ for loop. The FOR clause consists of three components:

for( initialize_statement; if_boolean_expression; increment_statement ) do

The initialize_statement component is a single machine instruction. This instruction typically initializes a loop control variable. HLA emits this statement before the loop body so that it executes only once, before the test for loop termination.

The if_boolean_expression component is a simple boolean expression (same syntax as for the IF statement). This expression determines whether the loop body executes. Note that the FOR statement tests for loop termination before executing the body of the loop.

The increment_statement component is a single machine instruction that HLA emits at the bottom of the loop, just before jumping back to the top of the loop. This instruction is typically used to modify the loop control variable.

The syntax for the HLA for statement is the following:

for( initStmt; BoolExpr; incStmt ) do

<< loop body >>

endfor;

Semantically, this statement is identical to the following while loop:

initStmt;

while( BoolExpr ) do

<< loop body >>

incStmt;

endwhile;

Note that HLA does not include a form of the FOR loop that lets you bury a sequence of statements inside the boolean expression. Use the WHILE loop if you want to do that. If this is inconvenient, you can always create your own version of the FOR loop using HLA's macro facilities.
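
As a concrete sketch of the FOR statement (assuming the Standard Library's stdlib.hhf header; the program name is hypothetical), the loop below uses single machine instructions for the initialization and increment components:

```hla
program ForDemo;

#include( "stdlib.hhf" );

begin ForDemo;

    // mov initializes the counter once; inc runs at the bottom of each pass.
    for( mov( 1, ecx ); ecx <= 5; inc( ecx ) ) do

        stdout.put( "Iteration ", (type uns32 ecx), nl );

    endfor;

end ForDemo;
```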

The FOREVER..ENDFOR Statement in HLA

The forever statement creates an infinite loop. Its syntax is

forever

<< Statements to execute repeatedly >>

endfor;

This HLA statement simply emits a single JMP instruction that unconditionally transfers control from the ENDFOR clause back up to the beginning of the loop.

In addition to creating infinite loops, the FOREVER..ENDFOR loop is very useful for creating loops that test for loop termination somewhere in the middle of the loop's body. For more details, see the BREAK and BREAKIF statements, next.

The BREAK and BREAKIF Statements in HLA

The BREAK and BREAKIF statements allow you to exit a loop at some point other than the normal test for loop termination. These two statements allow the following syntax:

break;

breakif( boolean_expression );

breakif(#{ stmts }#);

There are two very important things to note about these statements. First, unlike many HLA machine instructions, you do not follow the BREAK statement with a pair of empty parentheses. The 80x86 machine instructions behave like compile-time functions, so it made sense to require empty parentheses after those instructions. The HLA HLL statements do not behave like compile-time functions; the lack of parentheses after BREAK (and other HLL statements, e.g., ELSE) makes sense here if you think about it for a moment.

The second thing to note is that the BREAK and BREAKIF statements are legal only inside WHILE, FOREACH, FOREVER, and REPEAT loops. HLA does not recognize loops you've coded yourself using discrete assembly language instructions (of course, you can probably write a macro to provide a BREAK function for your own loops). Note that the FOREACH loop pushes data on the stack that the BREAK statement is unaware of. Therefore, if you break out of a FOREACH loop, garbage will be left on the stack. The HLA BREAK statement will issue a warning if this occurs. It is your responsibility to clean up the stack upon exiting a FOREACH loop if you break out of it.
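
The sketch below places the exit test in the middle of a FOREVER loop using BREAKIF. It assumes the Standard Library's stdlib.hhf header; the program name is hypothetical:

```hla
program BreakDemo;

#include( "stdlib.hhf" );

begin BreakDemo;

    mov( 0, ecx );
    forever

        inc( ecx );
        breakif( ecx > 5 );     // Exit test in the middle of the loop body.
        stdout.put( "Pass ", (type uns32 ecx), nl );

    endfor;

    stdout.put( "Loop finished" nl );

end BreakDemo;
```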

The CONTINUE and CONTINUEIF Statements in HLA

The continue and continueif statements allow you to restart a loop. These two statements allow the following syntax:

continue;

continueif( boolean_expression );

continueif(#{ stmts }#);

There are two very important things to note about these statements. First, unlike many HLA machine instructions, you do not follow the CONTINUE statement with a pair of empty parentheses. The 80x86 machine instructions behave like compile-time functions, so it made sense to require empty parentheses after those instructions. The HLA HLL statements do not behave like compile-time functions; the lack of parentheses after continue (and other HLL statements, e.g., else) makes sense here if you think about it for a moment.

The CONTINUE and CONTINUEIF statements are legal only inside WHILE, FOREACH, FOREVER, and REPEAT loops. HLA does not recognize loops you've coded yourself using discrete assembly language instructions (of course, you can probably write a macro to provide a CONTINUE function for your own loops).

For the WHILE and REPEAT statements, the CONTINUE and CONTINUEIF statements transfer control to the test for loop termination. For the FOREVER loop, the CONTINUE and CONTINUEIF statements transfer control to the first statement in the loop. For the FOREACH loop, CONTINUE and CONTINUEIF transfer control to the bottom of the loop (i.e., they force a return from the yield() call).
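
The following sketch uses CONTINUEIF to skip odd values inside a WHILE loop. It assumes the Standard Library's stdlib.hhf header; the program name is hypothetical. The counter is incremented at the top of the body so that skipping the rest of the body cannot skip the increment:

```hla
program ContinueDemo;

#include( "stdlib.hhf" );

begin ContinueDemo;

    mov( 0, ecx );
    while( ecx < 10 ) do

        inc( ecx );
        test( 1, ecx );         // Zero flag is clear when ECX is odd.
        continueif( @nz );      // Odd: go straight to the loop test.
        stdout.put( "Even value: ", (type uns32 ecx), nl );

    endwhile;

end ContinueDemo;
```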

The BEGIN..END, EXIT, and EXITIF Statements in HLA

The BEGIN..END statement block provides a structured goto statement for HLA. The BEGIN and END clauses surround a group of statements; the EXIT and EXITIF statements allow you to exit such a block of statements in much the same way that the BREAK and BREAKIF statements allow you to exit a loop. Unlike BREAK and BREAKIF, which can only exit the loop that immediately contains the BREAK or BREAKIF, the exit statements allow you to specify a BEGIN label so you can exit several nested contexts at once. The syntax for the BEGIN..END, EXIT, and EXITIF statements is as follows:

begin contextLabel ;

<< statements within the specified context >>

end contextLabel;

exit contextLabel;

exitif( boolean_expression ) contextLabel;

exitif(#{ stmts }#) contextLabel;

The BEGIN..END clauses do not generate any machine code (although END does emit a label to the assembly output file). The EXIT statement simply emits a JMP to the first instruction following the END clause. The EXITIF statement emits a compare and a conditional jump to the statement following the specified end.

If you break out of a FOREACH loop using the EXIT or EXITIF statements, there will be garbage left on the stack. It is your responsibility to be aware of this situation (i.e., HLA doesn't warn you about it) and clean up the stack, if necessary.

You can nest BEGIN..END blocks and EXIT out of any enclosing BEGIN..END block at any time. The BEGIN label provides this capability. Consider the following example:

program ContextDemo;

#include( "stdio.hhf" );

static

i:int32;

begin ContextDemo;

stdout.put( "Enter an integer:" );

stdin.get( i );

begin c1;

begin c2;

stdout.put( "Inside c2" nl );

exitif( i < 0 ) c1;

end c2;

stdout.put( "Inside c1" nl );

exitif( i = 0 ) c1;

stdout.put( "Still inside c1" nl );

end c1;

stdout.put( "Outside of c1" nl );

end ContextDemo;

The EXIT and EXITIF statements let you exit any BEGIN..END block, including those associated with a program unit such as a procedure, iterator, method, or even the main program. Consider the following (unusable) program:

program mainPgm;

procedure LexLevel1;

procedure LexLevel2;

begin LexLevel2;

exit LexLevel2; // Returns from this procedure.

exit LexLevel1; // Returns from this procedure and

// the LexLevel1 procedure

// (including cleaning up the stack).

exit mainPgm; // Terminates the main program.

end LexLevel2;

begin LexLevel1;

end LexLevel1;

begin mainPgm;

end mainPgm;

Note: You may only exit from procedures that have a display, and all nested procedures from the procedure you wish to exit from through to the EXIT statement itself must have displays. In the example above, both LexLevel1 and LexLevel2 must have displays if you wish to exit from the LexLevel1 procedure from inside LexLevel2. By default, HLA emits code to build the display unless you use the "@nodisplay" procedure option.

Note that to exit from the current procedure, you must not have specified the "@noframe" procedure option. This applies only to the current procedure. You may exit from nesting (lower lex level) procedures as long as the display has been built.

The JT and JF Medium Level Instructions in HLA

The JT (jump if true) and JF (jump if false) instructions are a cross between the 80x86 conditional jump instruction and the HLA IF statement. These two instructions use the following syntax:

JT ( booleanExpression ) targetLabel;

JF ( booleanExpression ) targetLabel;

The booleanExpression component can be any legal HLA boolean expression that you'd use in an IF, WHILE, REPEAT..UNTIL, or other HLA HLL statement. The HLA compiler emits code that transfers control to the specified target label if the expression evaluates true (for JT) or false (for JF).

These instructions are primarily intended for use in macros when creating your own HLL control statements. For a discussion of macros and creating your own control structures, see the HLA documentation on the compile-time language.
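
Outside of macros, the two instructions can be sketched as follows. This assumes the Standard Library's stdlib.hhf header; the program name and the labels are hypothetical:

```hla
program JtJfDemo;

#include( "stdlib.hhf" );

begin JtJfDemo;

    mov( 0, ecx );

  loopTop:
    stdout.put( "ecx = ", (type uns32 ecx), nl );
    inc( ecx );
    jt( ecx < 3 ) loopTop;      // Jump back while ECX is below 3.

    jf( ecx = 3 ) skipMessage;  // Jump only if ECX is NOT 3.
    stdout.put( "Loop ended with ecx = 3" nl );

  skipMessage:
    stdout.put( "Done" nl );

end JtJfDemo;
```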

Iterators and the HLA Foreach Loop

HLA provides a very powerful user-defined looping control structure, the FOREACH..ENDFOR loop. The FOREACH loop uses the following syntax:

foreach iteratorProc( parameters ) do

<< foreach loop body >>

endfor;

The iteratorProc( parameters ) component is a call to a special kind of procedure known as an iterator. Iterators have the special property that they return one of two states, success or failure. If an iterator returns success, it generally also returns a function result. If an iterator returns success, the foreach loop will execute the loop body and reenter the iterator (more on that later) at the top of the loop. If an iterator returns failure, then the loop terminates.

If you've never used true iterators before, you may be thinking "big deal, an iterator is simply a function that returns a boolean value." This, however, isn't entirely true. An iterator behaves like a value returning function when it succeeds, it behaves like a procedure when it fails. The success or failure state of the iterator call is not the return value. To understand the difference, consider the syntax for an iterator:

iterator iteratorName <<( optional_parameters )>>;

<< procedure options >>

<< local declarations >>

begin iteratorName;

<< iterator statements >>

end iteratorName;

Other than the use of the "ITERATOR" keyword rather than "PROCEDURE," this declaration looks just like a procedure or method declaration. However, there are some crucial differences. First of all, HLA emits different code for building iterator activation records than it does for procedures and methods. Furthermore, whenever you declare an iterator, HLA automatically creates a special thunk variable named "yield". Also, HLA will not let you call an iterator directly by specifying the iterator's name as an HLA statement (although you can still use the CALL instruction to call an iterator procedure, though you'd better have set the stack up properly before doing so).

If an iterator returns via an EXIT( iteratorname ) or RET() statement, or returns by "falling off the end of the function" (i.e., executing the "end" clause), then the iterator returns failure to the calling FOREACH loop (hence, the loop will terminate). To return success, and return a value to the body of the FOREACH loop, you must invoke the "yield" thunk. Yield doesn't actually return to the FOREACH loop. Instead, it calls the body of the FOREACH loop, and at the bottom of the FOREACH loop HLA emits a return instruction that transfers control back into the iterator (to the first statement following the yield). This may seem counter-intuitive, but it has some important ramifications. First of all, an iterator maintains its context until it fails. This means that local variables maintain their values across the yield calls. Likewise, when a FOREACH loop reenters an iterator, it picks up immediately after the yield; it does not pass new parameters and begin execution at the top of the iterator code.

Consider the following typical iterator code:

program iteratorDemo;

#include( "stdio.hhf" );

iterator range( start:int32; stop:int32 ); @nodisplay;

begin range;

forever

mov( start, eax );

breakif( eax > stop );

yield();

inc( start );

endfor;

end range;

static

i:int32;

begin iteratorDemo;

foreach range( 1, 10 ) do

stdout.put( "eax = ", eax, nl );

endfor;

end iteratorDemo;

This example demonstrates how to create a standard "for" loop like those found in Pascal or C++. The range iterator is passed two parameters, a starting value and an ending value. It returns a sequence of values between the starting and ending values (respectively) and fails once the return value would exceed the ending value. The FOREACH loop in this example prints the values one through ten to the display.

Warning: because the iterator's activation record is left on the stack while executing a FOREACH loop, you should take care when breaking out of a FOREACH loop using BREAK, BREAKIF, EXIT, EXITIF, or some sort of jump. Cavalierly jumping out of a loop in this fashion leaves the iterator's activation record on the stack. You will need to clean this up manually if you exit a FOREACH loop in this fashion. Since HLA cannot determine the myriad of ways one could jump out of a FOREACH loop body, it is up to you to make sure you don't do this (or that you handle the garbage on the stack in an appropriate way).

Keep in mind that the body of a FOREACH loop is actually a procedure your program calls when it encounters the yield statement. Therefore, any registers whose values you change will be changed when control returns to the code following the yield. If you need to preserve any registers across a yield, either push and pop them at the beginning of the FOREACH loop body or place the PUSH and POP instructions around the yield.
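
The second option, wrapping the yield in PUSH and POP, might look like the sketch below. It assumes the Standard Library's stdlib.hhf header; the program name and the countDown iterator are hypothetical. The loop body deliberately overwrites ECX to show why the iterator saves it:

```hla
program YieldRegs;

#include( "stdlib.hhf" );

// Hypothetical iterator: yields first, first-1, ..., 1 in EAX.
iterator countDown( first:uns32 ); @nodisplay;
begin countDown;

    mov( first, ecx );
    while( ecx > 0 ) do

        mov( ecx, eax );        // Value handed to the loop body.
        push( ecx );            // The loop body may change ECX...
        yield();
        pop( ecx );             // ...so restore it when the body returns.
        dec( ecx );

    endwhile;

end countDown;

begin YieldRegs;

    foreach countDown( 3 ) do

        stdout.put( "eax = ", (type uns32 eax), nl );
        mov( 0, ecx );          // Deliberately clobbers the iterator's counter.

    endfor;

end YieldRegs;
```

Without the PUSH and POP, the iterator would resume with ECX set to zero by the loop body, and the DEC that follows would wrap the counter around instead of counting down.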

HLA Compile-Time Language and Pragmas

This section describes one of HLA's more impressive features: the compile-time language. Combined with the macro preprocessor, the HLA compile-time language lets you customize the HLA language in an almost infinite variety of ways.

Compile-time programs are just that: programs that execute while HLA is compiling your source file. You embed compile-time language statements directly in your HLA source files and these short program fragments control how HLA compiles your assembly code.

This section doesn't fully explain the HLA compile-time language because you've already seen some major parts of it. For example, VAL constants in the HLA source file are equivalent to compile-time variables. The "?" statement is the compile-time assignment statement. Macros provide compile-time procedures and functions, and so on. This section, therefore, builds on the material that appears elsewhere in this document.

Built-in Functions:

HLA provides several built-in functions that take constant operands and produce constant results. It is important that you differentiate these compile-time functions from run-time functions. These functions do not emit any object code, and therefore do not exist while your program is running. They are only available while HLA is compiling your program. Note that many of these functions are trivial to implement in assembly language or have counterparts in the HLA standard library. Therefore, the fact that they are not available at run-time shouldn't prove to be much of a problem.

Constant Type Conversion Functions

boolean( const_expr )

The expression must be an ordinal or string expression. If const_expr is numeric, this function returns false for zero and true for everything else. If const_expr is a character, this function returns true for "T" and false for "F". It generates an error for any other character value. If const_expr is a string, the string must contain "true" or "false" else HLA generates an error.

int8( const_expr )

int16( const_expr )

int32( const_expr )

int64( const_expr )

int128( const_expr )

uns8( const_expr )

uns16( const_expr )

uns32( const_expr )

uns64( const_expr )

uns128( const_expr )

byte( const_expr )

word( const_expr )

dword( const_expr )

qword( const_expr )

lword( const_expr )

These functions convert their parameter to the specified integer type. For real operands, the result is truncated to form an integer value. For all other numeric operands, the result is range checked. For character operands, the ASCII code of the specified character is returned. For boolean objects, zero or one is returned. For string operands, the string must be a sequence of decimal digit characters, which are converted to the specified type. Note that the byte, word, and dword types are synonymous with uns8, uns16, and uns32 for the purposes of range checking.
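As a brief illustration of the conversions described above, the following sketch applies a few of these functions to constants of different types. The program name and the VAL identifiers are hypothetical, and the behavior noted in the comments restates the description above rather than observed compiler output.

```hla
program convDemo;

begin convDemo;

    // Each "?" statement creates (or updates) a compile-time VAL object.
    ?fromReal := int32( 3.75 );     // real operand: truncated, per the text above
    ?fromChar := uns8( 'A' );       // character operand: its ASCII code
    ?fromBool := uns8( true );      // boolean operand: zero or one
    ?fromStr  := uns32( "1234" );   // string of decimal digits

    // #print displays its operands while HLA compiles the file.
    #print( "fromReal=", fromReal )
    #print( "fromChar=", fromChar )
    #print( "fromBool=", fromBool )
    #print( "fromStr=", fromStr )

end convDemo;
```

None of these statements emits object code; the results exist only during compilation.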

real32( const_expr )

real64( const_expr )

real80( const_expr )

Similar to the integer functions above, except these functions produce the obvious real results. Only numeric and string parameters are legal.

char( const_expr )

Const_expr must be an ordinal or string value. For ordinal values, this function returns the character whose ASCII code is that ordinal value. For strings, this function returns the first character of the string.

string( const_expr )

This function produces a reasonable string representation of the parameter. Almost all data types are legal.

cset( const_expr )

The parameter must be a character, string, or cset. For character parameters, this function returns the singleton set containing only the specified character. For strings, each character in the string is unioned into the set and the function returns the result. If the parameter is a cset, this function makes a copy of that character set.

text( str_expr )

See the @text function.

Bitwise Type Transfer Functions

The type conversion functions of the previous section automatically convert their operands from the source type to the destination type. Sometimes you might want to change the type of some object without changing its underlying bits. For many "conversions" this is exactly what takes place. For example, when converting an uns8 object to an uns16 value using the uns16 function, HLA does not modify the bit pattern at all. For other conversions, however, HLA may completely change the underlying bit pattern. For example, when converting the real32 value 1.0 to a dword value, HLA completely changes the underlying bit pattern ($3F80_0000) so that the dword value is equal to one. On occasion, however, you might actually want to copy the bits straight across so that the resulting dword value is $3F80_0000. The HLA bit-transfer type conversion compile-time functions provide this facility.

The HLA bit-transfer type conversion functions are the following:

@int8( const_expr )

@int16( const_expr )

@int32( const_expr )

@int64( const_expr )

@int128( const_expr )

@uns8( const_expr )

@uns16( const_expr )

@uns32( const_expr )

@uns64( const_expr )

@uns128( const_expr )

@byte( const_expr )

@word( const_expr )

@dword( const_expr )

@qword( const_expr )

@lword( const_expr )

@real32( const_expr )

@real64( const_expr )

@real80( const_expr )

@char( const_expr )

@cset( const_expr )

The above functions extract 8, 16, 32, 64, or 128 bits from the constant expression for use as the value of the function. Note that supplying a string expression as an argument isn't particularly useful, since the functions above will simply return the address of the string data in memory while HLA is compiling the program. The @byte function provides an additional syntax with two parameters; see the General functions section for details.
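To make the difference between value conversion and bit transfer concrete, the following sketch applies both dword and @dword to the real32 value 1.0 used in the discussion above. The program and identifier names are hypothetical, and the sketch assumes that a VAL object declared as real32 keeps its 32-bit representation for the purposes of @dword; the values in the comments are those given in the text above.

```hla
program xferDemo;

val
    one : real32 := 1.0;

begin xferDemo;

    ?converted   := dword( one );    // value conversion: the text above gives 1
    ?transferred := @dword( one );   // bit transfer: the text above gives $3F80_0000

    #if( converted = transferred )
        #print( "conversion and bit transfer produced the same dword" )
    #else
        #print( "conversion and bit transfer produced different dwords" )
    #endif

end xferDemo;
```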

@string( const_expr )

HLA string objects are pointers (in both the language as well as within the compiler). Simply copying the bits to the internal string object would therefore create problems, since the bit pattern probably is not a valid pointer to string data during the compilation. With just a few exceptions, the @string function takes the bit data of its argument and translates it to a string (up to 16 characters long). Note that the actual string may be between zero and 16 characters long, since the HLA compiler (internally) uses zero-terminated strings to represent string constants; the first zero byte found in the argument ends the string.

If you supply a string expression as an argument to @string, HLA simply returns the value of the string argument as the value for the @string function. If you supply a text object as an argument to the @string function, HLA returns the text data as a string without first expanding the text value (similar to the @string:identifier token). If you supply a pointer constant as an argument to the @string function, HLA returns the string that HLA will substitute for the static object when it emits the assembly file.

General functions

@abs( numeric_expr )

Returns the absolute equivalent of the numeric value passed as a parameter.

@byte( integer_expr, which )

The which parameter is a value in the range 0..15. This function extracts the specified byte from the value of the integer_expr parameter. (This is an extension of the @byte type transfer function.)

@byte( real32_expr, which )

The which parameter is a value in the range 0..3. This function extracts the specified byte from the value of the real32_expr parameter.

@byte( real64_expr, which )

The which parameter is a value in the range 0..7. This function extracts the specified byte from the value of the real64_expr parameter.

@byte( real80_expr, which )

The which parameter is a value in the range 0..9. This function extracts the specified byte from the value of the real80_expr parameter.
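The two-parameter form of @byte lends itself to a compile-time loop that walks across a constant one byte at a time. The following is a minimal sketch; the names byteDemo, bits32, and i are hypothetical. The text above does not state which end of the value byte zero refers to, so the sketch simply prints each byte next to its index rather than assuming an order.

```hla
program byteDemo;

const
    bits32 := $1234_5678;

begin byteDemo;

    ?i := 0;
    #while( i < 4 )

        // Extract byte number i of the constant and display it.
        #print( "byte ", i, " = ", @byte( bits32, i ) )
        ?i := i + 1;

    #endwhile

end byteDemo;
```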

@ceil( real_expr )

This function returns the smallest integer value larger than or equal to the expression passed as a parameter. Note that although the result will be an integer, this function returns a real80 value.

@cos( real_expr )

The real parameter is an angle in radians. This function returns the cosine of that angle.

@date

This function returns a string of the form "YYYY/MM/DD" containing the current date.

@exp( real_expr )

This function returns a real80 value that is the result of the computation e** real_expr (i.e., e raised to the specified power).

@extract( cset_expr )

This function returns a character from the specified character set constant. Note that this function doesn't actually remove the character from the set; if you want to do that, you will need to explicitly remove the character yourself. The following code demonstrates how to do this:

program extractDemo;

val

c:cset := {'a'..'z'};

begin extractDemo;

#while( c <> {} )

?b := @extract( c );

#print( "b=" + b )

?c := c - {b};

#endwhile

end extractDemo;

@floor( real_expr )

This function returns the largest integer value less than or equal to the supplied expression. Note that the returned result is of type real80 even though it is an integer value.

@isalpha( char_expr )

This function returns true if the character expression is an upper or lower case alphabetic character.

@isalphanum( char_expr )

This function returns true if the parameter is an alphabetic or numeric character. It returns false otherwise.

@isdigit( char_expr )

This function returns true if the character expression is a decimal digit.

@islower( char_expr )

This function returns true if the character expression is a lower case alphabetic character.

@isspace( char_expr )

This function returns true if the character expression is a "whitespace" character. Typically, this would be spaces, tabs, newlines, returns, linefeeds, etc.

@isupper( char_expr )

This function returns true if the character expression is an upper case alphabetic character.

@isxdigit( char_expr )

This function returns true if the supplied character expression is a hexadecimal digit.
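The character classification functions are typically used as conditions in #if statements. The following sketch classifies a single character constant; the names classifyDemo and ch are hypothetical.

```hla
program classifyDemo;

begin classifyDemo;

    ?ch := 'g';

    // Test the most specific classes first; 'g' is alphabetic
    // but is neither a decimal nor a hexadecimal digit.
    #if( @isdigit( ch ) )
        #print( "decimal digit" )
    #elseif( @isxdigit( ch ) )
        #print( "hexadecimal digit" )
    #elseif( @isalpha( ch ) )
        #print( "alphabetic character" )
    #elseif( @isspace( ch ) )
        #print( "whitespace character" )
    #else
        #print( "some other character" )
    #endif

end classifyDemo;
```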

@log( real_expr )

This function returns the natural (base e) logarithm of the supplied parameter.

@log10( real_expr )

This function returns the base-10 logarithm of the supplied parameter.

@max( comma_separated_list_of_ordinal_or_real_values )

This function returns the largest value from the specified list.

@min( comma_separated_list_of_ordinal_or_real_values )

This function returns the least of the values in the specified list.

@odd( int_expr )

This function returns true if the integer expression is an odd number.
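As a small illustration of @max, @min, and @odd, the following sketch selects the extremes of a list of integer constants and then tests one of them. The program and identifier names are hypothetical.

```hla
program maxMinDemo;

begin maxMinDemo;

    ?largest  := @max( 12, 7, 31 );
    ?smallest := @min( 12, 7, 31 );

    #print( "largest=", largest, " smallest=", smallest )

    #if( @odd( largest ) )
        #print( "the largest value is odd" )
    #else
        #print( "the largest value is even" )
    #endif

end maxMinDemo;
```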

@random( int_expr )

This function returns a random uns32 value.

@randomize( int_expr )

This function uses the integer expression passed as a parameter as the new seed value for the random number generator.

@sin( real_expr )

This function returns the sine of the angle (in radians) passed as a parameter.

@sqrt( real_expr )

This function returns the square root of the parameter.

@tan( real_expr )

This function returns the tangent of the angle (in radians) passed as a parameter.

@time

This function returns a string of the form "HH:MM:SS xM" (x= A or P) denoting the time at the point this function was called (according to the system clock).
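Because @date and @time return ordinary string constants, they can be concatenated to form a build stamp. The following sketch only displays the stamp during compilation; the names stampDemo and buildStamp are hypothetical.

```hla
program stampDemo;

begin stampDemo;

    // Both functions return strings, so "+" concatenates them.
    ?buildStamp := @date + " " + @time;

    #print( "Compiled on ", buildStamp )

end stampDemo;
```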

String functions:

@delete( str_expr, int_start, int_len )

This function returns a string consisting of the str_expr passed as a parameter with (possibly) some characters removed. This function removes int_len characters from the string starting at index int_start (note that strings have a starting index of zero).

@index( str_expr1, int_start, str_expr2 )

This function searches for str_expr2 within str_expr1 starting at character position int_start within str_expr1. If the string is found, this function returns the index into str_expr1 of the first match (at or after int_start). This function returns -1 if there is no match.

@insert( str_expr1, int_start, str_expr2 )

This function inserts str_expr2 into str_expr1 just before the character at index int_start.

@length( str_expr )

This function returns the length of the specified string.

@lowercase( str_expr, int_start )

This function returns a string of characters from str_expr with all uppercase alphabetic characters converted to lower case. Only those characters from int_start on are copied into the result string.

@rindex( str_expr1, int_start, str_expr2 )

Similar to the @index function, but this function searches for the last occurrence of str_expr2 in str_expr1 rather than the first occurrence.

@strbrk( str_expr, int_start, cset_expr )

This function returns the index of the first character beyond int_start in str_expr that is a member of the cset_expr parameter. This function returns -1 if none of the characters are in the set.

@strset( char_expr, int_len )

This function returns a string consisting of int_len copies of char_expr.

@strspan( str_expr, int_start, cset_expr )

This function returns the index of the first character beyond position int_start in str_expr that is not a member of the cset_expr parameter. This function returns -1 if all of the characters are in the set.

@substr( str_expr, int_start, int_len )

This function returns the substring specified by the starting position and length in str_expr.
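The string functions described so far combine naturally: @index locates a substring, and @substr or @delete then extracts or removes it. The following is a minimal sketch of that pattern; the names strDemo, src, key, pos, found, and trimmed are hypothetical.

```hla
program strDemo;

begin strDemo;

    ?src := "Hello World";
    ?key := "World";

    // Zero-based index of the first match, or -1 if there is none.
    ?pos := @index( src, 0, key );

    #if( pos >= 0 )

        ?found   := @substr( src, pos, @length( key ) );  // copy of the match
        ?trimmed := @delete( src, pos, @length( key ) );  // src without the match

        #print( "found='", found, "' trimmed='", trimmed, "'" )

    #else

        #print( "'", key, "' does not appear in '", src, "'" )

    #endif

end strDemo;
```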

@tokenize( str_expr, int_start, cset_delims, cset_quotes )

This function returns an array of strings obtained by doing a lexical scan of the str_expr passed as a parameter (starting at character index int_start). Each element of this array consists of all characters between any sequence of delimiter characters (specified by the cset_delims parameter). The only exceptions are strings appearing between bracketing (quoting) symbols. The fourth parameter specifies the possible bracketing characters. If cset_quotes contains a quotation mark (") then all sequences of characters between a pair of quotes will be treated as a single string. Similarly, if cset_quotes contains an apostrophe, then all characters between a pair of apostrophes will be treated as a single string. If the cset_quotes parameter contains one of the pairs "(" / ")", "{" / "}", or "[" / "]" (both characters from a given pair must be present), then @tokenize will consider all characters between these bracketing symbols to be a single string.

You should use the @elements function to determine how many strings are present in the resulting array of strings (this will always be a one-dimensional array, although it is possible for it to have zero elements).
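The following sketch shows one way to walk the array that @tokenize returns, using @elements as the loop bound as recommended above. The names tokenDemo, tokens, and i are hypothetical, and the empty set passed as cset_quotes is an assumption made to disable bracketing for this example.

```hla
program tokenDemo;

begin tokenDemo;

    // Spaces and commas separate tokens; no quoting characters are used.
    ?tokens := @tokenize( "mov eax, ebx", 0, {' ', ','}, {} );

    ?i := 0;
    #while( i < @elements( tokens ) )

        #print( "token ", i, ": ", tokens[ i ] )
        ?i := i + 1;

    #endwhile

end tokenDemo;
```

Because the loop tests @elements before each iteration, it also behaves sensibly if the scan produces an array with zero elements.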

@trim( str_expr, int_start )

This function returns a string consisting of the characters in str_expr starting at position int_start with all leading and trailing whitespace removed.

@uppercase( str_expr, int_start )

This function returns a string consisting of the characters in str_expr starting at position int_start with all lower case alphabetic characters converted to upper case.

String/Pattern matching functions

The HLA string/pattern matching functions all attempt to match a string against a pattern. These functions all return a boolean result indicating success or failure (i.e., whether the string matches the pattern).

Most of these functions have two optional parameters: Remainder and Matched. If the function succeeds, it generally copies the matched string into the VAL string constant specified by the Matched parameter and copies all the characters in the InputStr parameter that follow the matched text into the Remainder parameter. You may specify the Remainder parameter without also specifying the Matched parameter, but if you need the Matched result, you must specify all the parameters. The Remainder and Matched parameters appear in italics in all of the following functions to denote that they are optional.

If the function fails, the values of the Remainder and Matched parameters are generally undefined.

@peekCset( InputStr, charSet, Remainder, Matched )

This function checks the first character of InputStr to see if it is a member of charSet. The function returns true/false depending on the result of the set membership test. If the function succeeds, it copies the value of the InputStr parameter to Remainder, creates a single-character string from the first character of InputStr, and stores this string into Matched.

@oneCset( InputStr, charSet, Remainder, Matched )

This function checks the first character of InputStr to see if it is a member of charSet. The function returns true/false depending on the result of the set membership test. If the function succeeds, it copies all characters but the first of the InputStr parameter to Remainder and copies the first character of InputStr into Matched.

@uptoCset( InputStr, charSet, Remainder, Matched )

This function matches all characters up to, but not including, a single character from the charSet character set parameter. If the InputStr parameter does not contain a character in the specified cset, this function fails. If it succeeds, it copies all the matched characters (not including the character in the cset) to the Matched parameter and it copies all remaining characters (including the character in the cset) to the Remainder parameter.

@zeroOrOneCset( InputStr, charSet, Remainder, Matched )

If the first character of InputStr is a member of charSet, this function succeeds and returns that character in the Matched parameter. It also returns the remaining characters in the string in the Remainder parameter.

This function always succeeds (since it can match zero characters). If the first character of InputStr is not in charSet, then this function returns InputStr in Remainder and returns the empty string in Matched.

@exactlynCset( InputStr, charSet, n, Remainder, Matched )

This function returns true if the first n characters of InputStr are in the cset specified by charSet. The n+1st character must not be in the character set specified by charSet. If this function succeeds (i.e., returns true), then it copies the first n characters to the Matched string and it copies all remaining characters into the Remainder string. If this function fails and returns false, Remainder and Matched are undefined.

@firstnCset( InputStr, charSet, n, Remainder, Matched )

This function is very similar to @exactlynCset except it doesn't require that the n+1st character not be a member of the charSet set. If the first n characters of InputStr are in charSet, this function succeeds (returning true) and copies those n characters into the Matched string; it also copies any following characters into the Remainder string.

@nOrLessCset( InputStr, charSet, n, Remainder, Matched )

This function always succeeds. It will match between zero and n characters in InputStr from the charSet set. The n+1st character may be in charSet; this function doesn't care and only matches up to the nth character. This function copies up to n matched characters to the Matched string (the empty string if it matches zero characters); the remaining characters in the string are copied to the Remainder parameter.

@nOrMoreCset( InputStr, charSet, n, Remainder, Matched )

This function succeeds if it matches at least n characters from InputStr against the charSet set. It returns false if there are fewer than n characters from charSet at the beginning of InputStr. If this function succeeds, it copies the characters it matches to the Matched string and all characters after that sequence to the Remainder string.

@ntomCset( InputStr, charSet, n, m, Remainder, Matched )

This function succeeds if InputStr begins with at least n characters from charSet. If additional characters in InputStr are in this set, @ntomCset will match up to m characters (n < m). It will not match any additional characters beyond the mth character, although those characters may be in the charSet set without affecting the success/failure of this routine. If this routine succeeds, it copies all the characters it matches to the Matched parameter and any remaining characters to the Remainder parameter.

@exactlyntomCset( InputStr, charSet, n, m, Remainder, Matched )

Similar to the @ntomCset function, except this function fails if more than m characters at the beginning of InputStr are in the specified character set.

@zeroOrMoreCset( InputStr, charSet, Remainder, Matched )

This function always succeeds. If the first character of InputStr is not in charSet, this function copies InputStr to Remainder, sets Matched to the empty string, and returns true. If some sequence of characters at the beginning of InputStr is in charSet, this function copies those characters to Matched and copies the following characters to Remainder.

@oneOrMoreCset( InputStr, charSet, Remainder, Matched )

This function succeeds if InputStr begins with at least one character from charSet. It will match all characters at the beginning of InputStr that are members of charSet. It copies the matched characters to the Matched string and any remaining characters to the Remainder string. It fails if the first character of InputStr is not a member of charSet.
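The character set matching functions are normally called from an #if statement so that Remainder and Matched are only examined when the match succeeds. The following sketch strips a run of leading digits from a string. All identifiers are hypothetical, and the sketch assumes that Remainder and Matched must name existing VAL string objects, which is why they are initialized to empty strings first.

```hla
program csetMatchDemo;

begin csetMatchDemo;

    ?inStr     := "2024 is the year";
    ?remainder := "";
    ?digits    := "";

    #if( @oneOrMoreCset( inStr, {'0'..'9'}, remainder, digits ) )

        // On success, digits holds the matched prefix and
        // remainder holds everything that follows it.
        #print( "digits='", digits, "' remainder='", remainder, "'" )

    #else

        // On failure, remainder and digits are undefined; don't use them.
        #print( "'", inStr, "' does not begin with a digit" )

    #endif

end csetMatchDemo;
```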

@peekChar( InputStr, Character, Remainder, Matched )

This function succeeds if the first character of InputStr matches Character. If it succeeds, it copies the character to the Matched string and copies the entire string (including the first character) to Remainder.

@oneChar( InputStr, Character, Remainder, Matched )

This function succeeds if the first character of InputStr is equal to Character. If it succeeds, it copies the matched character to Matched and any remaining characters to Remainder. If it fails, then Remainder and Matched are undefined.

@uptoChar( InputStr, Character, Remainder, Matched )

This function matches all characters up to, but not including, the specified character. It fails if the specified character is not in the InputStr string. If this function succeeds and returns true, it copies the matched characters to the Matched string and copies all remaining characters to the Remainder string (the Remainder string will begin with the value found in Character). If this function fails, it leaves Remainder and Matched undefined.

@zeroOrOneChar( InputStr, Character, Remainder, Matched )

This function always succeeds since it can match zero characters. If the first character of InputStr is not equal to Character, this function returns true, sets Remainder equal to InputStr, and sets Matched to the empty string. If the first character of InputStr is equal to Character, then this function returns that character in Matched and returns any remaining characters from InputStr in Remainder.

@zeroOrMoreChar( InputStr, Character, Remainder, Matched )

This function always succeeds since it can match zero characters. If the first character of InputStr is not equal to Character, this function returns true, sets Remainder equal to InputStr, and sets Matched to the empty string. If InputStr begins with a sequence of characters that are all equal to Character, then this function returns those characters in Matched and returns any remaining characters from InputStr in Remainder.

@oneOrMoreChar( InputStr, Character, Remainder, Matched )

This function succeeds if InputStr begins with at least one character equal to Character. It matches the entire sequence of characters at the beginning of InputStr that are equal to Character, returns those characters in Matched, and returns any remaining characters from InputStr in Remainder. It fails if the first character of InputStr is not equal to Character.

@exactlynChar( InputStr, Character, n, Remainder, Matched )

This function returns true if the first n characters of InputStr are equal to Character. The n+1st character cannot be equal to Character. If this function succeeds, it returns a string consisting of n copies of Character in Matched and returns any remaining characters in Remainder. Matched and Remainder are undefined if this function returns false.

@firstnChar( InputStr, Character, n, Remainder, Matched )

This function returns true if the first n characters of InputStr are equal to Character. The n+1st character may or may not be equal to Character. If this function succeeds, it returns a string consisting of n copies of Character in Matched and returns any remaining characters in Remainder.

@nOrLessChar( InputStr, Character, n, Remainder, Matched )

This function always returns true. It matches up to n copies of Character at the beginning of InputStr. More than n characters can be equal to Character and this routine will still succeed. However, this routine only matches the first n copies of Character in InputStr. It copies the matched characters to the Matched string and copies any remaining characters to the Remainder string.

@nOrMoreChar( InputStr, Character, n, Remainder, Matched )

The @nOrMoreChar function matches any string that begins with at least n copies of Character. If it succeeds, it copies the sequence of Character characters to the Matched string and copies any remaining characters (which must begin with something other than Character) to the Remainder string. This function fails and returns false if the string doesn't begin with at least n copies of Character. Note that Remainder and Matched are undefined if this function fails.

@ntomChar( InputStr, Character, n, m, Remainder, Matched )

This function returns true if the first n characters of InputStr are equal to Character. It will match up to m characters (m >= n). The m+1st character does not have to be different from Character, although this function will match, at most, m characters. If this function succeeds, it copies the matched characters to the Matched string and any following characters to the Remainder string. If this function fails and returns false, the values of Matched and Remainder are undefined.

@exactlyntomChar( InputStr, Character, n, m, Remainder, Matched )

This function succeeds and returns true if there are at least n copies of Character at the beginning of InputStr and no more than m copies of Character at the beginning of InputStr. If this function succeeds, it copies the matched characters at the beginning of InputStr to the Matched parameter and any following characters to the Remainder parameter. If this function fails, the values of Remainder and Matched are undefined upon return.

@peekiChar

@oneiChar

@uptoiChar

@zeroOrOneiChar

@zeroOrMoreiChar

@oneOrMoreiChar

@exactlyniChar

@firstniChar

@nOrLessiChar

@nOrMoreiChar

@ntomiChar

@exactlyntomiChar

These functions use the same syntax as the standard xxxxxChar functions. The difference is that these functions do a case-insensitive comparison of the Character parameter with the InputStr parameter.

@matchStr( InputStr, String, Remainder, Matched )

This function checks to see if the string specified by String appears at the beginning of InputStr. This function returns true if InputStr begins with String. If this function succeeds, it copies String to Matched and any following characters to Remainder.

@matchiStr( InputStr, String, Remainder, Matched )

Just like @matchStr except this function does a case-insensitive comparison.

@uptoStr( InputStr, String, Remainder, Matched )

The @uptoStr function matches all characters in InputStr up to, but not including, the string specified by String. If it succeeds, it copies all the matched characters (not including the string specified by String) into the Matched parameter and any following characters to Remainder. If this function returns false, the values of Remainder and Matched are undefined.

@uptoiStr( InputStr, String, Remainder, Matched )

Same as the @uptoStr function except that this function does a case-insensitive comparison.

@matchToStr( InputStr, String, Remainder, Matched )

Similar to @uptoStr except this function matches all characters up to and including the characters in the String parameter.

@matchToiStr( InputStr, String, Remainder, Matched )

Same as @matchToStr except this function does a case-insensitive comparison.

@matchID( InputStr, Remainder, Matched )

This is a special matching function that matches characters in InputStr that correspond to an HLA identifier. That is, InputStr must begin with an alphabetic character or an underscore, and @matchID will match all following alphanumeric or underscore characters. If this function succeeds by matching a prefix of InputStr that looks like an identifier, it copies the matched characters to Matched and all following characters to Remainder. This function returns false if the first character of InputStr is not an underscore or an alphabetic character. Note that the first character beyond a matched identifier can be anything other than an alphanumeric or underscore character and this function will still succeed.

@matchIntConst( InputStr, Remainder, Matched )

This function matches a string of one or more decimal digit characters (i.e., an unsigned integer constant). The Matched parameter, if present, must be an "int32" VAL object. If @matchIntConst succeeds, it will convert the string to an integer and copy this integer to the Matched parameter; it will also copy any characters following the integer string to the Remainder parameter.

@matchRealConst( InputStr, Remainder, Matched )

This function matches a sequence of characters at the beginning of InputStr that corresponds to a real constant (note that a simple sequence of digits, i.e., an integer, satisfies this). The number may have a leading plus or minus sign followed by at least one decimal digit, an optional fractional part, and an optional exponent part (see the definition of an HLA real literal constant for more details). If this function succeeds, it converts the string to a real80 value and stores this value into Matched (which must be a real80 VAL object). The characters after the matched string are copied into the Remainder parameter. If this function fails, the values of Matched and Remainder are undefined.

@matchNumericConst( InputStr, Remainder, Matched )

This is a combination of @matchRealConst and @matchIntConst. It checks the prefix of InputStr. If the prefix corresponds to an integer constant, this function behaves like @matchIntConst. If the prefix corresponds to a real constant, this function behaves like @matchRealConst. If the prefix matches neither, this function returns false.

@matchStrConst( InputStr, Remainder, Matched )

This function matches a sequence of characters that corresponds to an HLA literal string constant. Note that such constants generally contain quotes surrounding the string. If this function returns true, it copies the matched string, minus the quote delimiters, to the Matched parameter and it copies the following characters to the Remainder parameter. If this function fails, those two parameter values are undefined.

This function automatically handles several idiosyncrasies of HLA literal string constants. For example, if two adjacent quotes appear within a string, @matchStrConst copies only a single quote to the Matched parameter. If two quoted strings appear at the beginning of InputStr separated only by whitespace (a space or any control character other than NUL), then this function concatenates the two strings together. Likewise, any character objects (surrounded by apostrophes or taking the form #ddd, #$hh, or #%bbbbbbbb, where ddd is a decimal constant, hh is a hexadecimal constant, and bbbbbbbb is a binary constant) are automatically concatenated into the result string. See the definition of HLA literal constants for more details.

@zeroOrMoreWS( InputStr, Remainder )

This function always succeeds. It matches zero or more whitespace characters (white space is defined here as a space or any control character other than NUL [ASCII code zero]). This function copies any characters following the white space characters to the Remainder parameter (this could be the empty string).

@oneOrMoreWS( InputStr, Remainder )

This function matches one or more whitespace characters (white space is defined here as a space or any control character other than NUL [ASCII code zero]). If this function succeeds, it copies any characters following the white space characters to the Remainder parameter. If this function fails, the Remainder string's value is undefined.

@WSorEOS( InputStr, Remainder )

This function always succeeds. It matches zero or more whitespace characters (white space is defined here as a space or any control character) or the end of string token (a zero terminating byte). This function copies any characters following the white space characters to the Remainder parameter (this could be the empty string if it matches EOS or there is only white space at the end of the string).

@WSthenEOS( InputStr )

This function matches zero or more whitespace characters (white space is defined here as a space or any control character) immediately followed by the EOS token (a zero terminating byte). Technically, it allows a Remainder parameter, but such a parameter will always be set to the empty string if this function succeeds, so it's hardly useful to supply the parameter.

@peekWS( InputStr, Remainder )

This function returns true if the first character of InputStr is a white space character. If it succeeds and the Remainder parameter is present, this function copies InputStr to Remainder.

@EOS( InputStr )

This function returns true if InputStr is the empty string.
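Several of the matching functions above can be combined into a simple compile-time scanner. The following sketch skips white space and then tries to match either an identifier or an integer constant until the input is exhausted. All identifiers are hypothetical. Because Remainder is undefined when a match fails, the sketch writes each remainder to a separate VAL object (rest) and only copies it back to the input string after a successful match.

```hla
program scanDemo;

begin scanDemo;

    ?src      := "alpha 42 beta";
    ?rest     := "";
    ?ident    := "";
    ?number   : int32 := 0;     // @matchIntConst requires an int32 Matched object
    ?scanning := true;

    #while( scanning )

        // Skip any leading white space (this match always succeeds).
        #if( @zeroOrMoreWS( src, rest ) )
            ?src := rest;
        #endif

        #if( @EOS( src ) )

            ?scanning := false;             // nothing left to scan

        #elseif( @matchID( src, rest, ident ) )

            #print( "identifier: ", ident )
            ?src := rest;

        #elseif( @matchIntConst( src, rest, number ) )

            #print( "integer: ", number )
            ?src := rest;

        #else

            #print( "unrecognized text: ", src )
            ?scanning := false;

        #endif

    #endwhile

end scanDemo;
```

Each pass through the loop either consumes part of the input or stops the scan, so the loop terminates for any input string.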

Symbol and constant related functions and assembler control functions

@name( identifier )

This function returns a string of characters that corresponds to the name of the identifier (note: after text/macro expansion). This is useful inside macros when attempting to determine the name of a macro parameter variable (e.g., for error messages, etc). This function returns the empty string if the parameter is not an identifier.

@type( identifier_or_expression )

This function returns a unique integer value that specifies the type of the specified symbol. Unfortunately, this unique integer may be different across assemblies. Do not use this function when comparing types of objects in different source code modules.

@typename( identifier_or_expression )

This function returns the string name of the type of the identifier or constant expression. Examples include "int32", "boolean", and "real80".

@ptype( identifier_or_expression )

This function returns a small integer constant denoting the primitive type of the specified identifier or expression. Primitive types would include things like int32, boolean, and real80. See the "hla.hhf" header file for the latest set of constant definitions for pType. At the time this was written, the definitions were:

// pType constants.

hla.ptIllegal = 0

hla.ptBoolean = 1

hla.ptEnum = 2

hla.ptUns8 = 3

hla.ptUns16 = 4

hla.ptUns32 = 5

hla.ptByte = 6

hla.ptWord = 7

hla.ptDWord = 8

hla.ptInt8 = 9

hla.ptInt16 = 10

hla.ptInt32 = 11

hla.ptChar = 12

hla.ptReal32 = 13

hla.ptReal64 = 14

hla.ptReal80 = 15

hla.ptString = 16

hla.ptCset = 17

hla.ptArray = 18

hla.ptRecord = 19

hla.ptUnion = 20

hla.ptClass = 21

hla.ptProcptr = 22

hla.ptThunk = 23

hla.ptPointer = 24

hla.ptQWord = 25

hla.ptTByte = 26

hla.ptLabel = 27

hla.ptProc = 28

hla.ptMethod = 29

hla.ptClassProc = 30

hla.ptClassIter = 31

hla.ptProgram = 32

hla.ptMacro = 33

hla.ptText = 34

hla.ptNamespace = 35

hla.ptSegment = 36

hla.ptAnonRec = 37

hla.ptVariant = 38

hla.ptError = 39
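As a rough sketch of how the symbol functions above might be combined, the following compile-time code reports a symbol's name and type and then tests its primitive type. The program name typeInfo and the variable counter are made up for illustration, and the example assumes that "hla.hhf" is on the include path and supplies the hla.pt... constants listed above:

```hla
program typeInfo;
#include( "hla.hhf" );      // Assumed to define the hla.ptXXXX constants.

static
    counter: int32;         // Hypothetical variable to inspect.

begin typeInfo;

    // Report the symbol's name and its type name during compilation.
    #print( "name = ", @name( counter ), ", type = ", @typename( counter ) )

    // Compare the primitive type against the constant from hla.hhf.
    #if( @ptype( counter ) = hla.ptInt32 )

        #print( "counter is a 32-bit signed integer" )

    #endif

end typeInfo;
```

Comparing against the symbolic hla.ptInt32 name, rather than the literal value 11, keeps the test valid if the numbering changes in a later release.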

@class( identifier_or_expression )

This function returns a symbol's class type. The class type is constant, value, variable, static, etc.; it has little to do with the class abstract data type. See the "hla.hhf" header file for the current symbol class definitions. At the time this was written, the definitions were:

hla.cIllegal = 0

hla.cConstant = 1

hla.cValue = 2

hla.cType = 3

hla.cVar = 4

hla.cParm = 5

hla.cStatic = 6

hla.cLabel = 7

hla.cMacro = 8

hla.cKeyword = 9

hla.cTerminator = 10

hla.cProgram = 11

hla.cProc = 12

hla.cClassProc = 13

hla.cMethod = 14

hla.cNamespace = 15

hla.cNone = 16

@size( identifier_or_expression )

This function returns the size, in bytes, of the specified object.

@elementsize( identifier_or_expression )

This function returns the size, in bytes, of an element of the specified array. If the parameter is not an array identifier, this function generates an assembly-time error.

@offset( identifier )

For VAR, PARM, METHOD, and class ITERATOR objects only, this function returns the integer offset into the activation record (or object record) of the specified symbol.
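A minimal sketch of @size and @offset in use follows. The names offsetDemo, hasLocals, parm1, and local1 are hypothetical, and the exact offsets printed depend on the procedure's options, so none are shown here:

```hla
program offsetDemo;

procedure hasLocals( parm1:dword ); @nodisplay;
var
    local1: dword;

begin hasLocals;

    // Parameters normally sit at positive offsets from the base
    // register; locals normally sit at negative offsets.
    #print( "parm1 offset  = ", @offset( parm1 ) )
    #print( "local1 offset = ", @offset( local1 ) )
    #print( "local1 size   = ", @size( local1 ) )

end hasLocals;

begin offsetDemo;

    hasLocals( 0 );

end offsetDemo;
```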

@staticname( identifier )

For STATIC objects, procedures, methods, iterators, and external objects, this function returns a string specifying the "static" name of that object. This is the name that HLA emits to the assembly output file for certain objects.

@lex( identifier )

This function returns an integer constant specifying the static lexical nesting for the specified symbol. Variables declared in the main program have a lex level of zero. Variables declared in procedures (etc.) that are in the main program have a lex level of one. This function is useful as an index into the _display_ array when accessing non-local variables.

@IsExternal( identifier )

This function returns true if the specified identifier is an external symbol.

@arity( identifier_or_expression )

This function returns zero if the specified identifier is not an array. Otherwise it returns the number of dimensions of that array.

@dim( array_identifier_or_expression )

This function returns a single array of integers with one element for each dimension of the array passed as a parameter. Each element of the array returned by this function gives the number of elements in the specified dimension. For example, given the following code:

val threeD: int32[ 2, 4, 6];

tdDims:= @dim( threeD );

The tdDims constant would be an array with the three elements [2, 4, 6].

@elements( array_identifier_or_expression )

This function returns the total number of elements in the specified array. For multi-dimensional array constants, this function returns the number of all elements, not just a particular row or column.
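The array functions can be combined along these lines. This is only a sketch: the program name dimDemo is made up, and the array is declared in a STATIC section here simply to keep the example self-contained:

```hla
program dimDemo;

static
    threeD: int32[ 2, 4, 6 ];

begin dimDemo;

    // Capture the per-dimension bounds in a compile-time array.
    ?tdDims := @dim( threeD );

    #print( "dimensions:     ", @arity( threeD ) )
    #print( "first bound:    ", tdDims[0], ", last bound: ", tdDims[2] )

    // @elements counts every element: 2 * 4 * 6 of them.
    #print( "total elements: ", @elements( threeD ) )

    // @size should equal @elements times @elementsize.
    #print( "element size:   ", @elementsize( threeD ) )
    #print( "total bytes:    ", @size( threeD ) )

end dimDemo;
```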

@defined( identifier )

This function returns true if the specified identifier has been previously defined in the program and is currently in scope.

@pclass( identifier )

If the specified identifier is a parameter, this function returns a small integer indicating how the parameter was passed to the function. These constants are defined in the hla.hhf header file. At the time this document was written, these constants had the following values.

hla.illegal_pc := 0;

hla.valp_pc := 1;

hla.refp_pc := 2;

hla.vrp_pc := 3;

hla.result_pc := 4;

hla.name_pc := 5;

hla.lazy_pc := 6;

valp_pc means pass by value. refp_pc means pass by reference. vrp_pc means pass by value/result (value/returned). result_pc means pass by result. name_pc means pass by name. lazy_pc means pass by lazy evaluation.

@localsyms( record_union_procedure_method_or_iterator_identifier )

This function returns an array of strings listing the local names associated with the argument. If the argument is a record or union object, the elements of the string array contain the field names for the specified record or union. Note that the field names appear in their declaration order (that is, element zero contains the name of the first field, element one contains the name of the second field, etc.).

If the argument is a procedure, method, or iterator, the string array this function returns is a list of all the local identifiers in that program unit. Note that the local object names appear in the reverse order of their declarations (that is, element zero contains the name of the last local identifier in the program unit, element one contains the second-to-last identifier, etc.). Note that parameters are considered local identifiers and will appear in this array. Also note that HLA automatically predefines several symbols when you declare a program unit; those HLA-declared symbols also appear in the array of strings @localsyms creates.

Currently, @localsyms does not allow namespace, program, or class identifiers. This restriction may be lifted in the future if there is sufficient need.

@isconst( expr )

This function returns true if the specified parameter is a constant identifier or expression.

@isreg( expr )

This function returns true if the specified parameter is one of the 80x86 general purpose registers. It returns false otherwise.

@isreg8( expr )

This function returns true if the specified parameter is one of the 80x86 eight-bit general purpose registers. It returns false otherwise.

@isreg16( expr )

This function returns true if the specified parameter is one of the 80x86 16-bit general purpose registers. It returns false otherwise.

@isreg32( expr )

This function returns true if the specified parameter is one of the 80x86 32-bit general purpose registers. It returns false otherwise.

@isfreg( expr )

This function returns true if the specified parameter is one of the 80x86 FPU registers. It returns false otherwise.

@ismem( expr )

This function returns true if the specified expression is a memory address.

@isclass( expr )

This function returns true if the specified parameter is a class or a class object.

@istype( identifier )

This function returns true if the specified identifier is a type id.

@linenumber

This function returns the current line number in the source file.

@filename

This function returns the name of the current source file.

@curlex

This function returns the current static lex level (e.g., zero for the main program).

@curoffset

This function returns the current VAR offset within the activation record.

@curdir

This function returns +1 if processing parameters; it returns -1 otherwise. This corresponds to whether variable offsets are increasing or decreasing in an activation record during compilation. This function also returns +1 when processing fields in a record or class. This function returns zero when processing fields in a union.

@addofs1st

This function returns true when processing local variables; it returns false when processing parameters and record/class/union fields.

@lastobject

This function returns a string containing the name of the last macro object processed.

@curobject

This function returns a string containing the name of the last class object processed.

Pseudo-Variables

HLA provides several special identifiers that act as functions in expressions and as variables in VAL assignments. These "pseudo-variables" let you control the code emission during compilation. Typically, you would use these pseudo-variables in a statement like "?@bound:=true;" in order to set their values.

@parmoffset

This variable contains the starting offset for parameters. This is generally eight for most procedures since the parameters start at offset eight. You can change this value during assembly by assigning a value to this variable (e.g., ?@parmoffset := 10;). However, this activity is not recommended except by advanced programmers.

@localoffset

This variable returns the starting offset for local variables in an activation record. This is typically zero. You can change this value during assembly by assigning a value to this variable (e.g., ?@localoffset := -10;). However, this activity is not recommended except by advanced programmers.

@basereg

This variable returns a string containing either "ebp" or "esp". You assign either ebp or esp (the registers, not a string) to this variable. This sets the base register that HLA uses for automatic (VAR) variables. The default is ebp. Examples:

?SaveBase :string := @basereg;

?@basereg := esp;

<< code that uses esp to access locals and parameters>>

?@basereg := @text( SaveBase ); // Restore to original register.

Note the use of @text to convert the string to an actual register name. This must be done because HLA only allows the assignment of the actual ebp/esp registers to @basereg, not a string.

@enumsize

This assembly time variable specifies the size (in bytes) of enumerated objects. This has a default value of one.

@minparmsize

This assembly time variable has the initial value four. You should not change the value of this object when running under Win32, Linux, or other 32-bit OS.

@bound

This assembly time variable is a boolean value that indicates whether HLA compiles the BOUND instruction into actual machine code (or ignores the BOUND instruction).

@into

This assembly time variable is a boolean value that indicates whether HLA compiles the INTO instruction into actual machine code.

@exceptions

This assembly time variable controls whether HLA emits full exception handling code or an abbreviated set of routines. If this variable contains true, then HLA emits the full exception handling code. If false, HLA emits the minimal amount of code to pass exceptions on to Windows or Linux. Note that this variable only affects code generation in the main program; it does not affect the code generation in a UNIT. This variable must be set to true before the BEGIN clause associated with the main program if it is to have any effect. Note that including the EXCEPTS.HHF file automatically sets this to true, so you will have to explicitly set it to false if you include this file (or some other file that includes EXCEPTS.HHF, like STDLIB.HHF).

@optstrings

By default, HLA folds string constants to generate better code. This means that whenever you ask the compiler to emit code for a string constant like "Hello World" the compiler will first check to see if it has already emitted such a string. If so, the compiler uses the reference to the original string constant rather than emitting a second copy of the string; this shortens the size of your program if there are multiple occurrences of the same string in the program. Since string constants generally go into a read-only section of memory, the program cannot accidentally change this unique occurrence. However, if you elect to make the CONSTS segment writable, you might not want HLA to fold string constants in this manner. The @optstrings pseudo-variable lets you control this optimization. If @optstrings is true (the default condition), then HLA folds all duplicate string constants; if @optstrings is false, then HLA emits duplicate strings to the CONSTS section.

@trace

This boolean variable controls the emission of "trace" statements by the HLA compiler. This feature is offered in lieu of a decent debugger for tracing through HLA programs. When this variable is false (the default), HLA emits the code you specify. However, if you set this compile-time variable to true, HLA emits the following code before most statements in the program:

_traceLine_( filename, linenumber );

The filename parameter is a string that specifies the current filename HLA is processing. The linenumber parameter is an uns32 value that specifies the current line number in the file. You are responsible for supplying the "_traceLine_" procedure somewhere in your program. Here's a typical implementation:

procedure trace( filename:string; linenumber:uns32 ); @external( "_traceLine_" );

procedure trace( filename:string; linenumber:uns32 ); @nodisplay;

begin trace;

pushfd(); // This function must preserve all registers and flags!

stdout.put( filename, ": #", linenumber, nl );

popfd();

end trace;

As the comments above note, it is your responsibility to preserve all registers and flags in the _traceLine_ procedure. If you fail to do this, it will corrupt those values in the code that calls _traceLine_.

A common operation inside the _traceLine_ procedure is to display register values. Don't forget that EBP's and ESP's values are modified by this call. Furthermore, if you do any processing at all, the flag values will change. To obtain EBP's value prior to the call, fetch the dword at address [EBP+0]. To obtain ESP's value prior to the call, take the value of EBP inside _traceLine_ and add 16 to it (the saved EBP, the return address, and eight bytes of parameters sit on the stack between the two, and the stack grows toward lower addresses). Obviously if you build _traceLine_'s activation record yourself, these values can change. To display the flag values, access the copy of the FLAGs register you pushed on the stack (at offset [EBP-4] in the code above).

In addition to simply displaying values, you can write some very sophisticated debugging routines that let you set breakpoints, watch values, and so on. Someday the HLA Standard Library will include some trace support functions; until then, have fun doing whatever you want.

Text emission functions

@text( str_expr )

This function replaces itself with the text of the specified parameter. The result is then processed by HLA. E.g.,

@text( "mov( 0, eax );" );

The above is equivalent to the single move instruction.

@string:identifier

The identifier must be a constant of type text. HLA replaces this item with the string data assigned to the text object.

@tostring:identifier

Like @string:identifier, the identifier must be a constant of type text. Also like @string:identifier, HLA replaces this item with the string data assigned to the text object. However, this function also converts identifier from a text to a string object.

Miscellaneous Functions

@section

This function returns a 32-bit bitmap that identifies the current point in the source. Identification is as follows:

Bit 0: Currently processing the CONST section.

Bit 1: Currently processing the VAL section.

Bit 2: Currently processing the TYPE section.

Bit 3: Currently processing the VAR section.

Bit 4: Currently processing the STATIC section.

Bit 5: Currently processing the READONLY section.

Bit 6: Currently processing the STORAGE section.

Bit 12: Currently processing statements in the "main" program.

Bit 13: Currently processing statements in a procedure.

Bit 14: Currently processing statements in a method.

Bit 15: Currently processing statements in an iterator.

Bit 16: Currently processing statements in a #macro.

Bit 17: Currently processing statements in a #keyword macro.

Bit 18: Currently processing statements in a #terminator macro.

Bit 19: Currently processing statements in a thunk.

Bit 23: Currently processing statements in a Unit.

Bit 24: Currently processing statements in a Program.

Bit 25: Currently processing statements in a record.

Bit 26: Currently processing statements in a union.

Bit 27: Currently processing statements in a class.

Bit 28: Currently processing statements in a namespace.

This function is useful in macros to determine if a macro expansion is legal at a given point in a program.
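Testing individual bits of @section might look something like the following sketch. The program name sectionDemo and the variable dummy are hypothetical; the masks $10 and $1000 correspond to bits 4 and 12 in the list above:

```hla
program sectionDemo;

static
    dummy: dword;

    // Bit 4 ($10) should be set while HLA processes the STATIC section.
    #if( (@section & $10) <> 0 )

        #print( "Currently inside the STATIC section" )

    #endif

begin sectionDemo;

    // Bit 12 ($1000) should be set while HLA processes
    // statements in the main program.
    #if( (@section & $1000) <> 0 )

        #print( "Currently inside the main program's statements" )

    #endif

end sectionDemo;
```

A macro could wrap the same kind of test around a #error directive to reject expansions at points where they make no sense.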

#Text and #endtext Text Collection Directives

The #TEXT and #ENDTEXT directives surround a block of text in an HLA program from which HLA will create an array of string constants. The syntax for these directives is:

#text( identifier )

<< arbitrary lines of text >>

#endtext

The identifier must either be an undefined symbol or an object declared in the VAL section.

This directive converts each line of text between the #TEXT and #ENDTEXT directives into a string and then builds an array of strings from all this text. After building the array of strings, HLA assigns this array to the identifier symbol. This is a VAL constant array of strings. The #TEXT..#ENDTEXT directives may appear anywhere in the program where white space is allowed.

Although these directives provide an easy way to initialize a constant array of strings, the real purpose for these directives is to allow the inclusion of Domain Specific Embedded Language (DSEL) text within an HLA program. Presumably, a parser (written with macros and the HLA compile-time language) would process the statements between the #TEXT and #ENDTEXT directives.
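As a small illustration (a sketch only; the identifier menuLines and the program name textDemo are made up), the following collects three lines of text and then inspects the resulting array at compile time:

```hla
program textDemo;

#text( menuLines )
Open a file
Save the file
Quit
#endtext

begin textDemo;

    // menuLines is now a VAL array of strings, one per line of text.
    #print( "lines captured: ", @elements( menuLines ) )
    #print( "first line: ", menuLines[0] )

end textDemo;
```

A compile-time parser for an embedded language would index through the array in the same way, processing one string per source line.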

The #Include Directive

Like most languages, HLA provides a source inclusion directive that inserts some other file into the middle of a source file during compilation. HLA's #INCLUDE directive is very similar to the preprocessor directive of the same name in C/C++ and you primarily use them both for the same purpose: including library header files into your programs.

HLA's include directive has the following syntax:

#include( string_expression );

Note that any arbitrary compile-time string expression is legal. You are not limited to a literal string constant.

The #INCLUDE directive is legal anywhere whitespace is legal. The string specifies a filename that HLA will insert into the program during compilation at the point the #INCLUDE appears. If HLA cannot find the file specified by the string constant in the current directory (or in the directory specified if the string contains a pathname), then HLA tries to find the file in the location specified by the "hlainc" environment variable. If HLA still doesn't find the file, HLA will report an error.

Although you can use the #INCLUDE directive to insert any arbitrary text at an arbitrary point in your program, the vast majority of the time you will use #INCLUDE to include a library header file (either an HLA Standard Library header file or a library header file you've written) into your program. HLA requires that you compile all external files at lex level zero. Therefore, if you are including some declarations into your program, the #INCLUDE directive should be just inside the main program. Convention dictates that #INCLUDE directives that include library headers should appear immediately after the "program" or "unit" header in a file.

The #IncludeOnce Directive

When composing complex header files, particularly when constructing library header files, you may find it necessary to insert a #INCLUDE("file") directive into some other header files. Generally, this is not a problem; HLA certainly allows nested include files (up to 256 files deep). However, unless you are very careful about how you organize your files, it is very easy to create an "include loop" where one header file includes another and that other header file includes the first. Attempting to compile a program that includes either header file results in an infinite "include loop" during compilation. Clearly, this is not desirable.

The standard way to handle this situation is to surround all the statements in an include file with a #IF statement as follows:

#if( !@defined( headerfilename_hhf ))

?headerfilename_hhf := true;

<< Statements associated with this header file go here >>

#endif

The first time HLA includes this file the symbol "headerfilename_hhf" is not defined, so HLA processes the statements in the body of the #IF statement. The very first statement defines this "headerfilename_hhf" symbol (the value and type of this symbol are irrelevant for our purposes; only the fact that the symbol exists is important). Thereafter, if some other header file includes this file a second (or additional) time, the "headerfilename_hhf" symbol is defined, so HLA skips all the statements in the header file since the value of the boolean expression in the #IF statement is false. Therefore, HLA only processes the statements of this header file (at least those inside the #IF statement) the first time it encounters this particular header file.

A drawback to this scheme is that HLA must still open the header file and read each and every line from the file, even if it ignores all the lines in the file. For large header files (e.g., the "stdlib.hhf" header file) this can consume a significant amount of time during compilation. The #includeonce directive provides a solution for this problem.

You use the #INCLUDEONCE directive just like the #INCLUDE directive. The only difference between the two is that HLA keeps track of all files it has processed using the #INCLUDE or #INCLUDEONCE directives and will not process a header file a second time if you attempt to include it using the #INCLUDEONCE directive.

Whenever HLA processes the #INCLUDEONCE directive, it first compares its string operand with a list of strings appearing in previous #INCLUDE or #INCLUDEONCE directives. If it matches one of these previous strings, then HLA ignores the #INCLUDEONCE directive; if the include filename does not appear in HLA's internal list, then HLA adds this filename to the list and includes the file.

Note that HLA's #INCLUDEONCE directive only compares strings for equality. If you use two separate filenames for the same file, HLA will not detect this and it will include the file a second time. E.g., if the current directory is "C:\hlafiles" then the following sequence will include the file "whoops.hhf" twice:

#IncludeOnce( "whoops.hhf" )

#IncludeOnce( "c:\hlafiles\whoops.hhf" )

Also note that the #INCLUDE directive will include its file regardless of whether the program previously included that file with a #INCLUDEONCE directive, e.g., the following sequence also includes "whoops.hhf" twice:

#IncludeOnce( "whoops.hhf" )

#Include( "whoops.hhf" )

For these two reasons, it's still a good idea to protect all header files using the #IF technique mentioned earlier, even if you use the #IncludeOnce directive throughout.

The #asm..#endasm and #emit Directives

HLA is far from perfect. There are many missing instructions (some left out on purpose, some left out because of laziness, and some left out because of ignorance). For example, HLA currently doesn't support the SSE instruction set. Fret not, though, HLA v1.x provides two escape mechanisms that let you do anything legal in MASM, Gas, or the underlying assembler used to process HLA's output.

The first of these escape mechanisms is the #ASM..#ENDASM section. The syntax for an assembly block is as follows:

#asm

<< text that is >>

<< emitted directly >>

<< to the MASM out- >>

<< put file. >>

#endasm

All text appearing between the #ASM and #ENDASM directives is emitted directly to the .ASM output file produced by HLA. MASM, Gas, or whatever assembler you're using, will be responsible for assembling this code. Of course, you must supply source code that is compatible with your assembler in an assembly block or the assembly process will fail when the assembler encounters your code.

The second escape mechanism is the #EMIT directive. This directive takes the following form:

#emit( string_expression )

HLA evaluates the string expression and then emits that expression to the output .ASM file.

Within the #asm..#endasm block or within the string you supply as a parameter to #emit, the assembly language source code may only access external labels you declare in your HLA program. Furthermore, you must refer to that external object using its external name, not the HLA name. The external name is the same as the HLA name if you did not supply the optional string parameter to the external clause; it is the value of the optional string parameter if the external directive takes the form @external("extName").

Note that the .ASM file that HLA produces contains very few of the labels found in an HLA source file (generally, only external names appear unchanged in the HLA output file). You cannot expect to access HLA variables, procedures, and other non-external objects by simply using their name in an assembly block or in the #emit statement. Indeed, accessing non-external HLA names in the #asm..#endasm block is nearly impossible.

You cannot directly access non-external HLA names in the code emitted by the #emit directive, but by using HLA built-in string and symbol table functions you can build instructions that achieve the desired result. Consider the following simple code segment:

program DemoEmit;

#include( "stdio.hhf" );

var

i:int32;

begin DemoEmit;

mov( 5, i );

#emit( "add dword ptr [ebp+" + string( @offset(i)) +"], 6" );

stdout.put( "i=", i, nl );

end DemoEmit;

In this example, the "@offset" function was used to compute the offset of the local variable and build the appropriate addressing mode for the add instruction. Of course, using #emit in this way is silly (you should just use the ADD instruction here), but it demonstrates the basic idea.

Another solution is to use HLA's "#:identifier" operator that is active only within a #asm..#endasm block. HLA will substitute the internal value (or identifier) for the specified identifier wherever it appears within the #asm..#endasm block. Note: within a #asm..#endasm block, HLA only looks for the #:identifier sequence. It will perform this substitution even if this pattern appears within a comment or string constant. So be careful when you use this. Example:

var

i:int32;

static

j:int32;

#asm

mov eax, #:i ;substitutes [ebp±disp] for i.

mov #:j, eax ;substitutes internal name for j.

#endasm

Warning: #asm..#endasm and #emit are intended for use by advanced programmers only. They should not be used by casual or beginning assembly language programmers.

Note that it is perfectly reasonable to reference external HLA names within the #asm..#endasm block or in the string supplied to #emit. The following example demonstrates how to reference an HLA statement label from within a #asm..#endasm block:

procedure refsLabel;

label

x; @external;

y; @external( "z" );

begin refsLabel;

#asm

jne x

jmp z ;Note, refers to 'y' using external name!

#endasm

end refsLabel;

The example above shows how to reference statement labels within a #asm..#endasm block; keep in mind that you can reference any external symbol from the assembly output code including static variables and procedures. You may not access var, val, const, or macro names in this manner (though you may use the #:identifier form to access var, val, and const objects).

The #system Directive

The #SYSTEM directive requires a single string parameter. It executes this string as an operating system (shell/command interpreter) operation via the C "system" function call. This call is useful, for example, to run a program during compilation that dynamically creates a text file that an HLA program may include immediately after the #system invocation.

Example:

#system( "dir" )

Note that the "#system" directive is legal anywhere white space is allowable and doesn't require a semicolon at the end of the statement.

The #print and #error Directives

The #PRINT directive displays its parameter values during compilation. The basic syntax is the following:

#print( comma, separated, list, of, constant, expressions, ... )

The #PRINT statement is very useful for displaying messages during assembly (e.g., when debugging complex macros or compile-time programs). The items in the #PRINT list must evaluate to constant (CONST or VAL) values at compile time.

The #ERROR directive behaves like #PRINT insofar as it prints its parameter to the console device during compilation. However, this instruction also generates an HLA error message and does not allow the creation of an object file after compilation. This statement only allows a single string expression as a parameter. If you need to print multiple values of different types, use string concatenation and the @string function to achieve this. Example:

#error( "Error, unexpected value. Value = " + @string( theValue ))

Notice that neither the #print nor the #error statements end with a semicolon.
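The two directives are often paired with #if to validate constants during assembly. The following is a sketch under that assumption; the names printDemo, tableSize, and maxSize are hypothetical:

```hla
program printDemo;

const
    tableSize := 64;
    maxSize   := 256;

begin printDemo;

    // #print accepts a list of constant expressions of mixed types.
    #print( "tableSize = ", tableSize, ", maxSize = ", maxSize )

    // #error accepts only a single string, so build it by concatenation.
    #if( tableSize > maxSize )

        #error( "tableSize is too large: " + @string( tableSize ) )

    #endif

end printDemo;
```

With the values shown the #error directive is skipped; raising tableSize above maxSize should stop HLA from producing an object file.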

Compile-Time File Output (#openwrite, #write, #closewrite)

These compile-time statements let you do simple file output during compilation. The #openwrite statement opens a single file for output, #write writes data to that output file, and #closewrite closes the file when output is complete. These statements are useful for automatically generating INCLUDE files that the source file will include later on during the compilation. These statements are also useful for storing bulk data for later retrieval or generating a log during assembly.

The #openwrite statement uses the following syntax:

#openwrite( string_expression )

This call opens a single output file using the filename specified by the string expression. If the system cannot open the file, HLA emits a compilation error. Note that #openwrite only allows one output file to be active at a time. HLA will report an error if you execute #openwrite and there is already an output file open. If the file already exists, HLA deletes it prior to opening it (so be careful!). If the file does not already exist, HLA creates a new one with the specified name.

The #write statement uses the same syntax as the #print directive. Note, however, that #write doesn't automatically emit a newline after writing all its operands to the file; if you want a newline output you must explicitly supply it as the last parameter to #write.

The #closewrite statement closes the file opened via #openwrite. HLA automatically closes this file at the end of assembly if you leave it open. However, you must explicitly close this file before attempting to use the data (via include or #openread) in your program. Also, since HLA allows only one open output file at a time, you must use #closewrite to close the file before you can open another with #openwrite.
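Generating an include file and then including it in the same compilation might be sketched as follows. The filename "generated.hhf" and the constant generatedLimit are invented for this example, and it assumes that "stdlib.hhf" supplies the nl constant:

```hla
program genInclude;
#include( "stdlib.hhf" );       // Assumed to define nl.

// Write a one-line header file during compilation.
#openwrite( "generated.hhf" )
#write( "const generatedLimit := 100;", nl )
#closewrite                     // Must close before including the file.

// Pull the freshly written declarations into this program.
#include( "generated.hhf" );

begin genInclude;

    #print( "generatedLimit = ", generatedLimit )

end genInclude;
```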

Warning: Internally, the #write statement simply redirects the standard output stream to send output to the write file and then invokes #print, restoring the standard output file handle upon return. This creates a minor problem if there is a syntax error in the #write operand list -- the error message gets written to the output file! If you're having problems with the #write output, temporarily change it to #print to see if there's an error in the statement. This defect will probably get fixed in some future version (beyond HLA v1.32).

Compile-time File Input (#openread, @read, #closeread)

These compile-time statements and function let you do simple file input during compilation. The #openread statement opens a single file for input, @read is a compile-time function that reads a line of text from the file, and #closeread closes the file when input is complete. These statements are useful for reading files produced by #openwrite/#write/#closewrite or any other text file during compilation.

The #openread statement uses the following syntax:

#openread( filename )

The filename parameter must be a string expression or HLA reports an error. HLA attempts to open the specified file for reading; HLA prints an error message if it cannot open the file.

The @read function uses the following call syntax:

@read( val_object )

The val_object parameter must either be a symbol you've defined in a VAL section (or via "?") or it must be an undefined symbol (in which case @read defines it as a VAL object). @read is an HLA compile-time function (hence the "@" prefix rather than "#"; HLA uses "#" for compile-time statements). It returns either true or false: true if the read was successful, false if the read operation encountered the end of file. Note that if any other read error occurs, HLA will print an error message and return false as the function result. If the read operation is successful, then HLA stores the string it read (up to 4095 characters) into the VAL object specified by the parameter. Unlike #openread and #closeread, the @read function may not appear arbitrarily in your source file. It must appear within a constant expression since it returns a boolean result (and it is your responsibility to check for EOF).

The #closeread statement closes the input file. Since you may only have one open input file at a time, you must close an open input file with #closeread prior to opening a second file. Syntax:

#closeread

Example of using compile-time file I/O:

#openwrite( "hw.txt" )

#write( "Hello World", nl )

#closewrite

#openread( "hw.txt" )

?goodread := @read( s );

#closeread

#print( "data read from file = ", s )
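The example above reads only a single line. As a rough sketch (assuming that @read may be re-evaluated on each pass of a compile-time #while loop, and using the hypothetical VAL names moreData and textLine), the following reads and prints every line of the file until @read reports the end of file:

#openread( "hw.txt" )

?moreData := @read( textLine );

#while( moreData )

#print( textLine )

?moreData := @read( textLine );

#endwhile

#closeread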

The Conditional Compilation Statements (#if)

The conditional compilation statements in HLA use the following syntax:

#if( constant_boolean_expression )

<< Statements to compile if the >>

<< expression above is true. >>

#elseif( constant_boolean_expression )

<< Statements to compile if the >>

<< expression immediately above >>

<< is true and the first expres->>

<< sion above is false. >>

#else

<< Statements to compile if both >>

<< the expressions above are false. >>

#endif

The #ELSEIF and #ELSE clauses are optional. As you would expect, there may be more than one #ELSEIF clause in the same conditional if sequence.

Unlike some other assemblers and high level languages, HLA's conditional compilation directives are legal anywhere whitespace is legal. You could even embed them in the middle of an instruction! While directly embedding these directives in an instruction isn't recommended (because it would make your code very hard to read), it's nice to know that you can place these directives in a macro and then replace an instruction operand with a macro invocation.

An important thing to note about this directive is that the constant expression in the #IF and #ELSEIF clauses must be of type boolean or HLA will emit an error. Any legal constant expression that produces a boolean result is legal here. In particular, you are not limited to expressions like those allowed by the HLA HLL IF statement.

Keep in mind that conditional compilation directives are executed at compile-time, not at run-time. You would not use these directives to (attempt to) make decisions while your program is actually running.
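For instance, a build-time switch might select which message HLA prints during compilation. This is only a sketch; debugBuild is a hypothetical VAL object introduced for illustration:

?debugBuild := true;

#if( debugBuild )

#print( "Compiling with debugging support" )

#else

#print( "Compiling without debugging support" )

#endif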

The Compile-Time Loop Statements (#while and #for)

The HLA compile time language also provides a couple of looping structures -- the #WHILE loop and the #FOR loop.

The #while..#endwhile compile-time loop takes the following form:

#while( constant_boolean_expression )

<< Statements to execute as long >>

<< as the expression is true. >>

#endwhile

While processing the #while..#endwhile loop, HLA evaluates the constant boolean expression. If it is false, HLA immediately skips to the first statement beyond the #endwhile directive.

If the expression is true, then HLA proceeds to compile the body of the #while loop. Upon encountering the #endwhile directive, HLA jumps back up to the #while clause in the source code and repeats this process until the expression evaluates false.
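As a small illustration (the VAL object i is a name chosen for this sketch), the following loop should execute its body four times at compile time, with i taking the values 0 through 3:

?i := 0;

#while( i < 4 )

#print( "i = ", i )

?i := i + 1;

#endwhile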

Warning: since HLA allows you to create loops in your source code that execute during the compilation process, HLA also allows you to create infinite loops that will lock up the system during compilation. If HLA seems to have gone off into la-la land during compilation and you're using #while loops in your code, it might not be a bad idea to put some #print directives into your loop(s) to see if you've created an infinite loop.

Note: because of the limitations of HLA's implementation language (FLEX and BISON), it is not possible to begin a #while loop and have the matching #endwhile appear in a (different) macro or TEXT constant. When the HLA compiler encounters a #while statement it scans the source code looking for the matching #endwhile collecting up the statements that make up the body of the loop. During this scan it does not expand TEXT constants or macros. Hence, if you bury the #endwhile in a macro or TEXT constant HLA will not be able to find it. For performance and functional reasons, HLA cannot expand macro and TEXT variables during this scan. This is a limitation we will all have to live with until v2.0 of HLA (which will be rewritten in a different language).

The #for..#endfor loop can take one of the following forms:

#for( loop_control_var := Start_expr to end_expr )

<< Statements to execute as long as the loop control variable's >>

<< value is less than or equal to the ending expression. >>

#endfor

#for( loop_control_var := Start_expr downto end_expr )

<< Statements to execute as long as the loop control variable's >>

<< value is greater than or equal to the ending expression. >>

#endfor

The HLA compile-time #for..#endfor statement is very similar to the for loops found in languages like Pascal and BASIC. This is a definite loop that executes some number of times determined when HLA first encounters the #for directive (this can be zero or more times, but the number is computed only once when HLA encounters the #for). The loop control variable must be a VALUE object or an undefined identifier (in which case, HLA will create a new VALUE object with the specified name). The loop control variable must also be an eight, sixteen, or thirty-two bit integer value (uns8, uns16, uns32, int8, int16, or int32), and the starting and ending expressions must be values that an int32 VALUE object can hold.

The #for loop with the to clause initializes the loop control variable with the starting value and repeats the loop as long as the loop control variable's value is less than or equal to the ending expression's value. The #for..to..#endfor loop increments the loop control variable on each iteration of the loop.

The #for loop with the downto clause initializes the loop control variable with the starting value and repeats the loop as long as the loop control variable's value is greater than or equal to the ending expression's value. The #for..downto..#endfor loop decrements the loop control variable on each iteration of the loop.

Note that the #for..to/downto..#endfor loop only computes the value of the ending expression once, when HLA first encounters the #for statement. If the components of this expression would change as a result of the execution of the #for loop's body, this will not affect the number of loop iterations.
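A minimal sketch of both forms follows; i is a loop control variable chosen for illustration, and each loop should execute its body four times:

#for( i := 1 to 4 )

#print( "counting up: ", i )

#endfor

#for( i := 4 downto 1 )

#print( "counting down: ", i )

#endfor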

The #for..#endfor loop can also take the following form:

#for( loop_control_var in composite_expr )

<< Statements to execute for each element present in the expression >>

#endfor

The composite_expr in this syntactical form may be a string, a character set, an array, or a record constant.

This particular form of the #for loop repeats once for each item that is a member of the composite expression. For strings, the loop repeats once for each character in the string and the loop control variable is set to each successive character in the string. For character sets, the loop repeats for each character that is a member of the set; the loop control variable is assigned the value of each character found in the set (you should assume that the extraction of characters from the set is arbitrary, even though the current implementation extracts them in order of their ASCII codes). For arrays, this #for loop variant repeats for each element of the array and assigns each successive array element to the loop control variable. For record constants, the #for loop extracts each field and assigns the fields, in turn, to the loop control variable.

Examples:

#for( c in "Hello" )

#print( c ) // Prints the five characters 'H', 'e', ..., 'o'

#endfor

// The following prints a..z and 0..9 (not necessarily in that order):

#for( c in {'a'..'z', '0'..'9'} )

#print( c )

#endfor

// The following prints 1, 10, 100, 1000

#for( i in [1, 10, 100, 1000] )

#print( i )

#endfor

// The following prints all the fields of the record type r

// (presumably, r is a record type you've defined elsewhere):

#for( rv in r:[0, 'a', "Hello", 3.14159] )

#print( rv )

#endfor

Compile-Time Functions (macros)

Keep in mind that HLA macros are text expansion devices that may appear anywhere whitespace is allowed. Therefore, you can use them for so much more than 80x86 instruction synthesis. In particular, along with the "?" operator, you can create compile-time functions. For example, consider the following macro that converts the first character of a string to upper case and forces the remaining characters to lower case:

program macroFuncDemo;

#include( "stdio.hhf" );

#macro Capitalize( s );

@uppercase( @substr( s,0,1), 0 ) +

@lowercase( @substr( s, 1, 1000 ), 0)

#endmacro

static

Hello: string := Capitalize( "hELLO" );

World: string := Capitalize( "world" );

begin macroFuncDemo;

stdout.put( Hello, " ", World, nl );

end macroFuncDemo;

HLA Units and External Compilation

This section discusses how to create separately compilable modules in HLA and how you can link HLA code with code written in other languages.

External Declarations

HLA provides two features to support separate compilation: units and external objects. HLA uses a very general scheme, similar to that of C++, to communicate linkage information between object modules. This scheme lets HLA programmers link their HLA programs with code written in HLA, "pure" assembly (i.e., MASM code), and even other high level languages (HLLs). Conversely, the HLA programmer can also write modules to be linked with programs written in these other languages (as well as HLA).

Writing separate modules is quite similar to writing a single HLA program. The first thing to note is that an executable can have only one main program. When writing HLA programs, the "PROGRAM" reserved word tells HLA that you are writing a module that contains a main program. When writing other modules, you must use a "UNIT" rather than a "PROGRAM" so as not to generate an extra main procedure. If you wish to write a library module that contains only procedures and no main program, you would use an HLA unit. Units have a syntax that is nearly identical to programs; there just isn't a BEGIN associated with the unit, e.g.,

unit UnitName;

<< Declarations >>

end UnitName;

Since a unit does not contain a main program, it cannot compile into a stand-alone program; therefore, you should always compile units with the "-c" command line option to avoid running the linker on the unit code (which will always produce a link error)29.

HLA uses the "@EXTERNAL" keyword to communicate names between modules in a compilation group. If a symbol is defined to be external, HLA assumes that the symbol is declared in a separate module and leaves it up to the linker to resolve the symbol's address.

Only two types of symbols may be external: procedures and static variables30. Variables declared in the VAR section cannot be external because the linker cannot statically resolve their run-time address. Constants declared in the CONST or VAL section cannot be external; however, this is not a limitation because most programmers place public constants in header files and include them in the source files that require them.

Recall the syntax for a procedure declaration presented in the basic HLA documentation:

procedure identifier ( optional_parameter_list ); procedure_options

declarations

begin identifier;

statements

end identifier;

There are two additional forms to consider:

procedure identifier ( optional_parameter_list );

options

@external;

procedure identifier ( optional_parameter_list );

options

@external("extname");

These two forms tell the HLA compiler that it is okay to call the specified procedure, even though the procedure's code might not appear in the current source file. It is the responsibility of the linker to ensure that the specified external procedures actually appear within the object modules the linker is combining.

The first form above is generally used when the external procedure is an HLA procedure that appears in a different source module. HLA assumes that the external name is the same name as the procedure identifier31.

The second form above is generally used when calling code written in a language other than HLA32. This form lets you explicitly state (via the string constant "extname") the name of the external procedure. This is especially important when calling procedures whose names contain characters that are not "HLA-Friendly." For example, many Windows API calls have at signs ("@") in their names; to call such routines you would use the second form of the external declaration above supplying the Windows API compatible name as the parameter to the @external reserved word.

It is perfectly legal to declare an external procedure in the same source file as the procedure's actual code. However, the external declaration must appear before the actual declaration or HLA will generate an error. Whenever an external declaration appears in the same source file as the actual procedure code, HLA emits code to ensure that the procedure's name is public. Therefore, the external declaration must appear in the same file as the procedure's code if you wish the linker to be able to resolve the procedure's address at link time. This external declaration serves the same purpose as the "public" directive in other assemblers (e.g., MASM). Note that, unlike C/C++, procedure names are not automatically public. An external declaration must appear in the same file as the procedure code to make the symbol public.

Also note above that the only options an external procedure declaration supports are the @returns, @pascal, @cdecl, and @stdcall options. You cannot use the @align, @noalignstack, @noframe, or @nodisplay options in an external declaration. Conversely, if an @external (or @forward, for that matter) declaration appears in a source file, the corresponding procedure code may only contain the @align, @noalignstack, @noframe, and/or @nodisplay options. The @returns, @pascal, @cdecl, and @stdcall options are not legal in a procedure declaration if a corresponding @external (or @forward) declaration is present in the source code.
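As a sketch of this arrangement, the following hypothetical unit (the names Counter, Count, and BumpCount are invented for this example) declares a procedure external and then supplies the procedure's code in the same file, which makes the procedure's name public:

unit Counter;

static

Count: uns32 := 0;

procedure BumpCount; @external;

procedure BumpCount; @nodisplay;

begin BumpCount;

inc( Count ); // Count is private to this unit

mov( Count, eax ); // leave the new count in EAX for the caller

end BumpCount;

end Counter;

Another module that contains the same @external declaration can then call BumpCount, leaving the linker to resolve the procedure's address.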

Note: External procedures are only legal at lex level one. You cannot declare an external procedure that is embedded inside another procedure.

In addition to procedures, HLA also lets you declare external variables. You may reference such variables in different source modules. The declaration of an external variable is very similar to the declaration of an external procedure: you follow the variable's declaration with the external clause. If the optional string parameter is not present, HLA uses the variable's name as its external name. If you need to specify a specific name, to avoid conflicts with MASM or to contain characters illegal in an HLA identifier, then provide a string with the identifier you need.

Note that HLA does not allow the @EXTERNAL keyword after every static declaration. Instead, only the following variable declarations allow the @EXTERNAL keyword:

In particular, note that static variable declarations with initializers cannot be external. Also note that ENUM, RECORD, and UNION variables (those variables you directly create as ENUM, RECORD, or UNION) may not be external; this is not a serious limitation, however, since you can declare a named type in the "TYPE" section and use the third form above to create an external object of the desired type (this is also how you would declare @EXTERNAL class variables).

As in C/C++, you normally put all your external declarations in a header file and include that header file using the "#include" directive in each of the source files that reference the external symbols. This eases program maintenance because you need to change only a single definition in an include file rather than multiple definitions across different source files. See the HLA Standard Library code for some good examples of using HLA header files.

By convention, HLA header files that contain external declarations always have an ".HHF" suffix (HLA Header File). To help make your programs easy to read by others, you should always use this same suffix for your HLA header files.
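A minimal sketch of this convention follows. It assumes a hypothetical header named counter.hhf that holds a single declaration for a procedure (BumpCount, an invented name) whose code lives in some other module:

// counter.hhf (hypothetical header file):

procedure BumpCount; @external;

// Main program that includes the header and calls the procedure:

program UseCounter;

#include( "counter.hhf" );

begin UseCounter;

BumpCount();

end UseCounter;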

HLA Naming Conventions and Other Languages

If you wish to link together code written in a different language with code written in HLA, you must be aware of the differences in naming conventions between the two languages.

With respect to names, keep in mind that HLA is a case-neutral language. To the outside world, this means that HLA is case sensitive. Therefore, all public names that HLA and MASM export are case sensitive. If you are using a case insensitive language like Pascal or Delphi, you should check with your compiler vendor to determine how the language emits public names (usually, case insensitive languages convert all public symbols to all upper case or all lower case). Some languages, e.g., MASM, let you choose whether public symbols are case sensitive or case insensitive; for such languages, you should select case sensitivity as the default and spell your names the same (with respect to case) between the HLA code and the other language.

In some cases, it might not be possible to match an HLA identifier with a public or external identifier in another language. One possible reason for this problem is that HLA only allows alphanumeric characters and underscores in identifiers; some other languages (e.g., MASM) allow other characters in their names while other languages (e.g., C++) often "mangle" their names by adding additional characters that are normally illegal within identifiers (e.g., the at sign, "@").

The HLA @EXTERNAL directive provides an option that lets you use a standard HLA identifier within your program, but utilize a completely different identifier as the public symbol. The standard HLA identifier restrictions do not apply to the external name33. This variant of the external directive takes the following forms:

External procedure declaration:

procedure ProcName; @external( "ExtProcName" );

External variable declaration:

varName: SomeType; @external( "ExtVarName" );

Within the confines of the HLA program, you would use the HLA identifiers "ProcName" and "varName". To the outside world, however, you would use the names "ExtProcName" and "ExtVarName" to reference these objects.

Since the "@EXTERNAL" parameter is a string constant rather than an HLA identifier, you can use characters that would otherwise be illegal in an HLA identifier. For example, Microsoft's Visual C++ language and Windows often insert the "@" symbol into identifiers. Normally, this character is illegal in (user-defined) HLA symbols. You may, however, give an identifier a legal HLA name and then specify the VC++ compatible name within the string constant. For example, here is a typical procedure declaration found in the HLA standard library "fileio.hla" source file:

procedure WriteFile

(

overlapped: dword;

var bytesWritten: dword;

len: dword;

var buffer: byte;

Handle: dword

);

@external( "_WriteFile@20" );

(The "@20" suffix is a Win32 convention that indicates that there are 20 bytes of parameter data in this external function.)

As noted above, many languages "mangle" their external names for one reason or another. In addition to the "@20" suffix in the previous example, you will also note that VC++ added a leading underscore to the name (this procedure calls the Win32 API "WriteFile" function). Once again, this name mangling is a function of the particular compiler being used. Since Windows itself is written in VC++, Win32 API calls follow the VC++ standards for name mangling.

In addition to giving you the ability to conform external names as needed by external languages, the string parameter of the @EXTERNAL directive will let you change the name for more mundane reasons. For example, if you really don't like the external name (perhaps it is not descriptive of the operation), you can use the string parameter feature of the external directive to allow the use of a different, perhaps more descriptive, name in your HLA code.

Some languages, for example C++, provide function overloading. This means that a program can use the same name to reference two completely different procedures in the code. Within the object file, however, all names must be unique. Once again, the compiler's name mangling facilities come into play to generate unique names. How a particular name is mangled is extremely compiler sensitive (e.g., Borland's C++ mangles names differently than Microsoft's Visual C++, even when compiling the same exact C++ program). When deciding on the name with which to reference an external procedure, you may need to consult your compiler documentation or be willing to experiment around a bit.

HLA Calling Conventions and Other Languages

Of course, HLA is an assembly language, so it is possible via the PUSH and CALL instructions to mimic any calling sequence used by any language that allows the call of external assembly language code (which covers almost all languages). However, when using the HLA high level language features, in particular, HLA procedure declarations and calls, there are some details you must be aware of in order to successfully call code written in other languages or have those other languages call your code.

By default, HLA assumes that all parameters are pushed on the stack in a left-to-right order as the parameters appear in the formal parameter list. Some languages, like Pascal and Delphi, use this same calling mechanism. A few languages, most notably C/C++, push their parameters in the right-to-left order. If the language expects the parameters to be in the reverse order (right-to-left), a simple solution is to use the @cdecl or @stdcall procedure options to specify the calling convention.

Many languages, like HLA, Pascal, and Delphi, make it the procedure's responsibility to clear parameters from the stack when the procedure returns to the caller. Some languages, like C/C++, make it the caller's responsibility to clear parameters from the stack after the procedure returns to the caller. Procedures you declare with the @pascal and @stdcall procedure options automatically remove their parameter data from the stack when they return. Procedures you declare with the @cdecl option leave it up to the caller to remove the parameter data from the stack. Note that when using the HLA high-level procedure calling syntax, HLA automatically pushes the parameters on the stack in the correct order ("correct" as defined by the procedure's calling convention).

HLA procedures do not support a variable number of parameters in a parameter list. If you need this facility (e.g., to call a C/C++ function) then you will need to manually push the parameters on the stack yourself prior to calling the function. Procedures that have a variable number of parameters almost always use the @cdecl calling convention; since only the caller knows how much parameter data to remove from the stack, the procedure generally cannot remove the parameter data (as the @pascal and @stdcall conventions do).
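The manual approach might look like the following sketch, which calls the C library's printf function. Several things here are assumptions: that the C compiler exports the function under the name "_printf", that the C run-time library is linked in and properly initialized, and that the hypothetical string variable fmt is acceptable as a C string (HLA string data is zero terminated):

program CallPrintf;

procedure printf; @external( "_printf" ); // external name is an assumption

static

fmt: string := "Value = %d";

begin CallPrintf;

pushd( 42 ); // last C argument is pushed first (right-to-left order)

push( fmt ); // first C argument: address of the format text

call printf;

add( 8, esp ); // the caller removes the two dword parameters

end CallPrintf;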

Calling Procedures Written in a Different Language

When calling a subroutine written in a different language, your code must pass the parameters as the other language expects and clean up the parameters if the target language requires your code to do so upon return. Generally, calling code written in other languages is relatively easy. You've got to ensure that you're passing the parameters in the proper places (e.g., in registers or pushing them on the stack in an appropriate order). Generally, such a call only requires that you provide a suitable external procedure declaration (e.g., swapping the order of the parameters in the parameter list if the language passes parameters in a right-to-left order). Some languages may require additional data structures (e.g., static links) to be passed. It is your responsibility to determine if such data is necessary and pass it to the subroutine you are calling.

Calling HLA Procedures From Another Language

Calling HLA procedures from another language is somewhat more complex than the converse operation. You still have the problem of parameter ordering, though this is usually fixed by reversing the parameters in the parameter list (e.g., by using the @cdecl or @stdcall procedure options).

A bigger problem is the responsibility of cleaning up the parameters on the stack. By default, an HLA procedure automatically removes parameter data from the stack upon return. If the calling code thinks that it has the responsibility to do this cleanup, the parameter data will be removed twice, with disastrous results. Such code must use the @cdecl calling convention or you must use the @noframe option (and probably @nodisplay as well) to disable the automatic generation of procedure entry and exit code. Then you must manually write the code that sets up the activation record and returns from the procedure. Upon return, you must use the "RET()" instruction without a numeric parameter.

HLA external procedures must always be declared at lex level one. Since the condition of the stack is unknown upon entry into HLA code from some externally written code, your external HLA procedures should not depend upon the display to access non-local variables. HLA procedures that other languages call should always have the @nodisplay option associated with them. While it is okay to access non-local STATIC objects, you should never attempt to access non-local VAR objects from a procedure that code written in a different language will call.

HLA's @pascal, @stdcall, and @cdecl procedure options cover the calling conventions of most modern high level languages. However, other calling conventions do exist (for example, the METAWARE compilers give you an option of passing parameters in the left-to-right order and it is the caller's responsibility to clean up the stack afterwards). Some languages don't even pass their parameters on the stack. Some languages pass some or all of the parameters in registers. If you are linking your HLA code with a language that uses one of these non-standard calling conventions, it is your responsibility to write the explicit HLA code that passes these parameters and cleans up the parameter data upon return from the procedure.
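As a sketch of the @cdecl approach, the following hypothetical unit exposes a procedure that C code could call. The names CInterface and AddTwo and the external name "_AddTwo" are assumptions made for this example; the leading underscore and the use of EAX for the function result follow the habits of many 32-bit C compilers, so check your own compiler's documentation:

unit CInterface;

procedure AddTwo( a: int32; b: int32 ); @cdecl; @external( "_AddTwo" );

procedure AddTwo( a: int32; b: int32 ); @nodisplay;

begin AddTwo;

mov( a, eax );

add( b, eax ); // the sum is left in EAX; the C caller removes the parameters

end AddTwo;

end CInterface;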

Linking in Code Written in Other Languages

When linking in code written in a different language to an HLA main program, keep in mind that the foreign code may make calls to the standard library associated with the other language. You may need to link in that code as well. Also keep in mind that some compilers emit code that assumes that certain initialization has occurred when the program is loaded into memory. Unfortunately, if the main program is not written in this other language (i.e., main is written in HLA), this initialization might not have been done. This may very well cause the routine you're linking into an HLA program to fail.

Conversely, be very careful about calling HLA standard library routines in code you expect to link into programs written in other languages. The HLA standard library routines (and the exception handling code, in particular), rely upon initialization that the HLA main program performs. This could create a problem, for example, if you attempt to execute some procedure that raises an exception and the exception handling code has not been initialized.

The 80x86 Instruction Set in HLA

One of the most obvious differences between HLA and standard 80x86 assembly language is the syntax for the machine instructions. The two primary differences are the fact that HLA uses a functional notation for machine instructions and HLA arranges the operands in a (source, dest) format rather than the (dest, source) format used by Intel.

Another difference, related to the fact that HLA uses a functional notation, is that HLA allows you to compose instructions. That is, one instruction may appear as an operand to a second instruction, e.g.,

mov( mov( 0, eax ), ebx );

To decipher this instruction, all you need to do is to realize that at compile time each instruction returns a string that HLA substitutes in place of the composed instruction. Usually, the string an instruction returns is that instruction's destination operand. In the example above, the interior mov instruction's destination operand is EAX, so that mov instruction "returns" the string "EAX" which HLA substitutes for the interior mov instruction, producing "mov( eax, ebx );" as the outside instruction. HLA always processes interior instructions from left to right, interior first. Therefore, the above instruction is really equivalent to the MASM sequence:

mov eax, 0

mov ebx, eax

Consider a second example:

add( mov( i, eax ), mov( j, ebx ));

This instruction is equivalent to:

mov eax, i

mov ebx, j

add ebx, eax

Used sparingly, instruction composition is useful and can help improve the readability of your HLA programs in certain contexts. However, you should be careful when using it because it can quickly produce unreadable code. Even this second example (add(mov,mov)) would probably prove difficult for most programmers to read.

If you need to modify the RETURNS value of an instruction (in a macro, for example), you may use the "returns" statement in HLA. This statement takes the following form:

RETURNS( { statements }, "string Constant" )

This statement emits the code for the statement(s) between the curly braces and then returns the specified string constant as the "returns" value for this statement.
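For example, the following sketch wraps two instructions in a RETURNS statement so that the pair can serve as the source operand of a mov instruction. The program name and the variables i, j, and sum are hypothetical names chosen for illustration:

program ReturnsDemo;

static

i: int32 := 2;

j: int32 := 3;

sum: int32;

begin ReturnsDemo;

// Emits mov( i, eax ); add( j, eax ); and then substitutes "eax"

// as the source operand of the outer mov instruction:

mov( returns( { mov( i, eax ); add( j, eax ); }, "eax" ), sum );

end ReturnsDemo;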

The following paragraphs describe each of the HLA machine instructions. They also describe the string each instruction yields during compile time (this is called the "returns" string). Note that some instructions return the empty string as there is no return value one could reasonably associate with them. Such instructions cannot generally be used as operands within other instructions.

These descriptions do not describe the purpose for each instruction; see an assembly text like "The Art of Assembly Language Programming" for details on the operation of each instruction.

Zero Operand Instructions (Null Operand Instructions)

: Null Operand Instructions
Instruction	Description
aaa( )	ASCII adjust for addition. Returns "ax".
aad( )	ASCII adjust for division. Returns "ax".
aam( )	ASCII adjust for multiplication. Returns "ax".
aas( )	ASCII adjust for subtraction. Returns "ax".
cbw( )	Convert byte to word (sign extension). Returns "ax".
cdq( )	Convert double to quadword. Returns "eax". Note: in the future, this may return "edx:eax".
clc( )	Clear carry flag. Returns "".
cld( )	Clear direction flag. Returns "".
cli( )	Clear interrupt flag. Returns "".
clts()	Clear task switched flag in CR0 (OS use only).
cmc( )	Complement carry flag. Returns "".
cmpsb( )	Compares the byte at [esi] to the byte at [edi] and increments or decrements ESI & EDI by one. Returns "".
cmpsd( )	Compares the dword at [esi] to the dword at [edi] and increments or decrements ESI & EDI by four. Returns "".
cmpsw( )	Compares the word at [esi] to the word at [edi] and increments or decrements ESI & EDI by two. Returns "".
cpuid()	On entry, EAX contains zero, one, or two to determine how this instruction behaves. If EAX contains zero then this instruction returns vendor information in EAX, EBX, ECX, and EDX. If EAX contains one upon entry, EAX returns with version information and EDX contains feature information. If EAX contains two upon entry, EAX..EDX return with cache information. See the Intel documentation for more details concerning this instruction.
cwd( )	Convert word to doubleword. Returns "ax". Note: in the future, this may return "dx:ax".
cwde( )	Convert word to dword, extended. Returns "eax".
daa( )	Decimal adjust for addition. Returns "al".
das( )	Decimal adjust for subtraction. Returns "al".
hlt()	Halt instruction (OS and embedded use only).
insb( )	Inputs a byte from the port specified by DX and stores the byte at [EDI], then increments or decrements EDI by one. Returns "".
insd( )	Inputs a dword from the port specified by DX and stores the dword at [EDI], then increments or decrements EDI by four. Returns "".
insw( )	Inputs a word from the port specified by DX and stores the word at [EDI], then increments or decrements EDI by two. Returns "".
into( )	Interrupt on overflow. Returns "". Raises the ex.IntoInstr exception if the overflow flag is set when you execute this instruction.
invd()	Invalidate internal caches (OS use only).
iret( )	Interrupt return. Returns "".
iretd( )	Interrupt return popping 32-bit flags. Returns "".
lahf( )	Load AH from flags. Returns "ah".
leave( )	Remove activation record from stack. Returns "".
lodsb( )	Load AL from [ESI], then increment or decrement ESI by one. Returns "al".
lodsd( )	Load EAX from [ESI], then increment or decrement ESI by four. Returns "eax".
lodsw( )	Load AX from [ESI], then increment or decrement ESI by two. Returns "ax".
movsb( )	Moves a byte from the location specified by [ESI] to the location specified by [EDI], then increments or decrements ESI & EDI by one. Returns "".
movsd( )	Moves a dword from the location specified by [ESI] to the location specified by [EDI], then increments or decrements ESI & EDI by four. Returns "".
movsw( )	Moves a word from the location specified by [ESI] to the location specified by [EDI], then increments or decrements ESI & EDI by two. Returns "".
nop( )	No operation. Returns "".
outsb( )	Outputs the byte at address [ESI] to the port specified by DX, then increments or decrements ESI by one. Returns "".
outsd( )	Outputs the dword at address [ESI] to the port specified by DX, then increments or decrements ESI by four. Returns "".
outsw( )	Outputs the word at address [ESI] to the port specified by DX, then increments or decrements ESI by two. Returns "".
popad( )	Pop all general purpose 32-bit registers from stack. Returns "".
popa( )	Pop all general purpose 16-bit registers from stack. Returns "".
popf( )	Pop 16-bit flags register from stack. Returns "".
popfd( )	Pop 32-bit flags register from stack. Returns "".
pusha( )	Push all general purpose 16-bit registers onto the stack. Returns "".
pushad( )	Push all general purpose 32-bit registers onto the stack. Returns "".
pushf( )	Push 16-bit flags register onto the stack. Returns "".
pushfd( )	Push 32-bit flags register onto the stack. Returns "".
rdmsr()	Read from model specific register specified by ECX into EDX:EAX (OS use only).
rdpmc()	Read performance monitoring counter specified by ECX into EDX:EAX (OS use only).
rdtsc()	Reads the "time stamp" counter and returns the 64-bit result in edx:eax.
rep.insb( )	Transfers ECX bytes from the port specified by DX to the location specified by [EDI]. Increments or decrements EDI by one after each transfer. Returns "".
rep.insd( )	Transfers ECX dwords from the port specified by DX to the location specified by [EDI]. Increments or decrements EDI by four after each transfer. Returns "".
rep.insw( )	Transfers ECX words from the port specified by DX to the location specified by [EDI]. Increments or decrements EDI by two after each transfer. Returns "".
rep.movsb( )	Copies ECX bytes from the memory location specified by [ESI] to the location specified by [EDI]. Increments or decrements EDI & ESI by one after each transfer. Returns "".
rep.movsd( )	Copies ECX dwords from the memory location specified by [ESI] to the location specified by [EDI]. Increments or decrements EDI & ESI by four after each transfer. Returns "".
rep.movsw( )	Copies ECX words from the memory location specified by [ESI] to the location specified by [EDI]. Increments or decrements EDI & ESI by two after each transfer. Returns "".
rep.outsb( )	Transfers ECX bytes from the location specified by [ESI] to the port specified by DX. Increments or decrements ESI by one after each transfer. Returns "".
rep.outsd( )	Transfers ECX dwords from the location specified by [ESI] to the port specified by DX. Increments or decrements ESI by four after each transfer. Returns "".
rep.outsw( )	Transfers ECX words from the location specified by [ESI] to the port specified by DX. Increments or decrements ESI by two after each transfer. Returns "".
rep.stosb( )	Copies ECX bytes from AL to the location specified by [EDI]. Increments or decrements EDI by one after each transfer. Returns "".
rep.stosd( )	Copies ECX dwords from EAX to the location specified by [EDI]. Increments or decrements EDI by four after each transfer. Returns "".
rep.stosw( )	Copies ECX words from AX to the location specified by [EDI]. Increments or decrements EDI by two after each transfer. Returns "".
repe.cmpsb( )	Compares ECX bytes starting at location [ESI] to the set of bytes at location [EDI] as long as the bytes are equal. The comparison stops once two unequal bytes are found. After each successful compare, this instruction increments or decrements ESI and EDI by one (and decrements ECX). Returns "".
repe.cmpsd( )	Compares ECX dwords starting at location [ESI] to the set of dwords at location [EDI] as long as the dwords are equal. The comparison stops once two unequal dwords are found. After each successful compare, this instruction increments or decrements ESI and EDI by four (and decrements ECX). Returns "".
repe.cmpsw( )	Compares ECX words starting at location [ESI] to the set of words at location [EDI] as long as the words are equal. The comparison stops once two unequal words are found. After each successful compare, this instruction increments or decrements ESI and EDI by two (and decrements ECX). Returns "".
repe.scasb( )	Compares AL against ECX bytes starting at location [EDI] as long as the bytes are equal. The comparison stops once two unequal bytes are found. After each successful compare, this instruction increments or decrements EDI by one (and decrements ECX). Returns "".
repe.scasd( )	Compares EAX against ECX dwords starting at location [EDI] as long as the dwords are equal. The comparison stops once two unequal dwords are found. After each successful compare, this instruction increments or decrements EDI by four (and decrements ECX). Returns "".
repe.scasw( )	Compares AX against ECX words starting at location [EDI] as long as the words are equal. The comparison stops once two unequal words are found. After each successful compare, this instruction increments or decrements EDI by two (and decrements ECX). Returns "".
repne.cmpsb( )	Compares ECX bytes starting at location [ESI] to the set of bytes at location [EDI] as long as the bytes are not equal. The comparison stops once two equal bytes are found. After each successful compare, this instruction increments or decrements ESI and EDI by one (and decrements ECX). Returns "".
repne.cmpsd( )	Compares ECX dwords starting at location [ESI] to the set of dwords at location [EDI] as long as the dwords are not equal. The comparison stops once two equal dwords are found. After each successful compare, this instruction increments or decrements ESI and EDI by four (and decrements ECX). Returns "".
repne.cmpsw( )	Compares ECX words starting at location [ESI] to the set of words at location [EDI] as long as the words are not equal. The comparison stops once two equal words are found. After each successful compare, this instruction increments or decrements ESI and EDI by two (and decrements ECX). Returns "".
repne.scasb( )	Compares AL against ECX bytes starting at location [EDI] as long as the bytes are not equal. The comparison stops once two equal bytes are found. After each successful compare, this instruction increments or decrements EDI by one (and decrements ECX). Returns "".
repne.scasd( )	Compares EAX against ECX dwords starting at location [EDI] as long as the dwords are not equal. The comparison stops once two equal dwords are found. After each successful compare, this instruction increments or decrements EDI by four (and decrements ECX). Returns "".
repne.scasw( )	Compares AX against ECX words starting at location [EDI] as long as the words are not equal. The comparison stops once two equal words are found. After each successful compare, this instruction increments or decrements EDI by two (and decrements ECX). Returns "".
rsm()	Resume from system management mode (OS use only).
sahf( )	Store AH into the flags register. Returns "ah".
scasb( )	Compares the byte in al to the location specified by [EDI], then increments or decrements EDI by one. Returns "".
scasd( )	Compares the dword in eax to the location specified by [EDI], then increments or decrements EDI by four. Returns "".
scasw( )	Compares the word in ax to the location specified by [EDI], then increments or decrements EDI by two. Returns "".
stc( )	Set the carry flag. Returns "".
std( )	Set the direction flag. Returns "".
sti( )	Set the interrupt flag. Returns "".
stosb( )	Stores the byte in al to the location specified by [EDI], then increments or decrements EDI by one. Returns "".
stosd( )	Stores the dword in eax to the location specified by [EDI], then increments or decrements EDI by four. Returns "".
stosw( )	Stores the word in ax to the location specified by [EDI], then increments or decrements EDI by two. Returns "".
ud2()	Undefined opcode instruction. This instruction always raises an undefined opcode exception.
wbinvd()	Write back and invalidate cache (OS use only).
wait( )	Coprocessor wait instruction. Returns "".
xlat( )	Translate instruction. Returns "".

Note: if a null-operand instruction appears as a stand-alone instruction (i.e., it is not part of an instruction composition and, thus, does not appear as the operand of another instruction), you can drop the "( )" after the instruction as long as you terminate the instruction with a semicolon.
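
For example, the following sketch shows both spellings of a stand-alone null-operand instruction, followed by a composition in which the parentheses are required. It assumes ESI already points at readable data.

cld;                  // stand-alone: parentheses omitted
cld();                // stand-alone: equivalent form
mov( lodsb(), bl );   // composed: lodsb returns "al", so this is lodsb followed by mov( al, bl )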

General Arithmetic and Logical Instructions

These instructions include adc, add, and, mov, or, sbb, sub, test, and xor. They all take the same basic form (substitute the appropriate mnemonic for "adc" in the syntax examples below):

Generic Form:

adc( source, dest );
lock.adc( source, dest );

Specific forms allowed:

adc( Reg8, Reg8 )
adc( Reg16, Reg16 )
adc( Reg32, Reg32 )

adc( const, Reg8 )
adc( const, Reg16 )
adc( const, Reg32 )

adc( const, mem )

adc( Reg8, mem )
adc( Reg16, mem )
adc( Reg32, mem )

adc( mem, Reg8 )
adc( mem, Reg16 )
adc( mem, Reg32 )

adc( Reg8, AnonMem )
adc( Reg16, AnonMem )
adc( Reg32, AnonMem )

adc( AnonMem, Reg8 )
adc( AnonMem, Reg16 )
adc( AnonMem, Reg32 )

Note: for the form "adc( const, mem )", if the specified memory location does not have a size or type associated with it, you must explicitly specify the size of the memory operand, e.g., "adc(5,(type byte [eax]));"

These instructions all return their destination operand as the "returns" value.

See Chapter Six in "Art of Assembly" for a further discussion of these instructions.

If the "lock." prefix is present, the instruction asserts the bus lock signal during execution. The "lock." prefix is valid only on instructions that reference memory.
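
The sketch below exercises several of the forms listed above, using add, adc, and sub as representative mnemonics. The names total and counter are hypothetical dword variables declared elsewhere.

add( ebx, eax );                // Reg32, Reg32: eax := eax + ebx
add( 5, total );                // const, mem: total has a declared type, so no coercion is needed
adc( 5, (type byte [eax]) );    // const, anonymous memory: the size must be given explicitly
sub( total, ecx );              // mem, Reg32: ecx := ecx - total
lock.add( 1, counter );         // bus lock asserted; the destination is a memory operand

// Because these instructions return their destination operand,
// the following composition expands to mov( total, eax ); add( eax, ebx );

add( mov( total, eax ), ebx );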

The XCHG Instruction

The xchg instruction allows the following syntactical forms:

Generic Form:

xchg( source, dest );
lock.xchg( source, dest );

Specific Forms:

xchg( Reg8, Reg8 )
xchg( Reg8, mem )
xchg( Reg8, AnonMem )
xchg( mem, Reg8 )
xchg( AnonMem, Reg8 )

xchg( Reg16, Reg16 )
xchg( Reg16, mem )
xchg( Reg16, AnonMem )
xchg( mem, Reg16 )
xchg( AnonMem, Reg16 )

xchg( Reg32, Reg32 )
xchg( Reg32, mem )
xchg( Reg32, AnonMem )
xchg( mem, Reg32 )
xchg( AnonMem, Reg32 )

This instruction returns its destination operand as its "returns" value.

See Chapter Six in "Art of Assembly" for a further discussion of this instruction.

If the "lock." prefix is present, the instruction asserts the bus lock signal during execution. The "lock." prefix is valid only on instructions that reference memory.
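
A brief sketch of the register and memory forms follows; flag is a hypothetical byte variable.

xchg( eax, ebx );        // swap two 32-bit registers
xchg( al, flag );        // swap AL with a byte variable
xchg( [esi], cx );       // anonymous memory form: CX supplies the operand size
lock.xchg( flag, al );   // references memory, so the lock. prefix is permitted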

The CMP Instruction

The "cmp" instruction uses the following general forms:

Generic:

cmp( LeftOperand, RightOperand );

Specific Forms:

cmp( Reg8, Reg8 );
cmp( Reg8, mem );
cmp( Reg8, AnonMem );
cmp( mem, Reg8 );
cmp( AnonMem, Reg8 );
cmp( Reg8, const );

cmp( Reg16, Reg16 );
cmp( Reg16, mem );
cmp( Reg16, AnonMem );
cmp( mem, Reg16 );
cmp( AnonMem, Reg16 );
cmp( Reg16, const );

cmp( Reg32, Reg32 );
cmp( Reg32, mem );
cmp( Reg32, AnonMem );
cmp( mem, Reg32 );
cmp( AnonMem, Reg32 );
cmp( Reg32, const );

cmp( mem, const );

Note that the CMP instruction's operands are ordered "dest, source" rather than the usual "source, dest" format (that is, the operands are in the same order as MASM expects them). This allows an intuitive reading of the instruction mnemonic (CMP normally reads as "compare dest to source"). To avoid confusion, this section simply refers to the operands as the "left operand" and the "right operand". Left vs. right signifies the placement of the operands around a comparison operator like "<=" (e.g., "left <= right").

For the "cmp( mem, const )" form, the memory operand must have a type or size associated with it. When using anonymous memory locations you must always coerce the type of the memory location, e.g., "cmp( (type word [ebp-4]), 0 );".

The CMP instruction returns its left (first) operand as its "returns" value.
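
To illustrate the left/right reading, the following sketch compares EAX with a constant and branches on the result. The label NotLess is hypothetical.

cmp( eax, 10 );                  // left operand is eax, right operand is 10
jnl NotLess;                     // skip the next section unless eax < 10 (signed)

<< Code executed when eax < 10 >>

NotLess:
cmp( (type word [ebp-4]), 0 );   // anonymous memory needs an explicit size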

The Multiply Instructions

HLA supports several variations on the 80x86 MUL and IMUL instructions. The supported forms are:

Standard Syntax:

mul( reg8 )
mul( reg16 )
mul( reg32 )
mul( mem )

mul( reg8, al )
mul( reg16, ax )
mul( reg32, eax )

mul( mem, al )
mul( mem, ax )
mul( mem, eax )

mul( AnonMem, ax )
mul( AnonMem, dx:ax )
mul( AnonMem, edx:eax )

imul( reg8 )
imul( reg16 )
imul( reg32 )
imul( mem )

imul( reg8, al )
imul( reg16, ax )
imul( reg32, eax )

imul( mem, al )
imul( mem, ax )
imul( mem, eax )

imul( AnonMem, ax )
imul( AnonMem, dx:ax )
imul( AnonMem, edx:eax )

intmul( const, Reg16 )
intmul( const, Reg16, Reg16 )
intmul( const, mem, Reg16 )
intmul( const, AnonMem, Reg16 )

intmul( const, Reg32 )
intmul( const, Reg32, Reg32 )
intmul( const, mem, Reg32 )
intmul( const, AnonMem, Reg32 )

intmul( Reg16, Reg16 )
intmul( mem, Reg16 )
intmul( AnonMem, Reg16 )

intmul( Reg32, Reg32 )
intmul( mem, Reg32 )
intmul( AnonMem, Reg32 )

Extended Syntax:

mul( const, al )
mul( const, ax )
mul( const, eax )

imul( const, al )
imul( const, ax )
imul( const, eax )

The first, and probably most important, thing to note about HLA's multiply instructions is that HLA uses a different mnemonic for the extended-precision integer multiply versus the single-precision integer multiply (i.e., IMUL vs. INTMUL). Standard MASM syntax uses the same mnemonic for both instructions. There are two reasons for this change of syntax in HLA. First, there needed to be some way to differentiate the "imul( const, ax )" and the "intmul( const, ax )" instructions (likewise for the instructions involving EAX). Second, the behavior of the INTMUL instruction is substantially different from the IMUL instruction, so it makes sense to use different mnemonics for these instructions.

The extended syntax instructions create a static data variable, initialized with the specified constant, and then specify the address of this variable as the source operand of the MUL or IMUL instruction.

These instructions return their destination operand (AX, DX:AX, or EDX:EAX for the extended precision MUL and IMUL instructions) as their "returns" value.

See "The Art of Assembly Language Programming" for more details on these instructions.
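
The following sketch contrasts the extended-precision and single-precision forms. It relies only on the forms listed above; the register choices are arbitrary.

mov( 1000, eax );
mul( ebx, eax );          // unsigned: edx:eax := eax * ebx (same as mul( ebx ))
imul( 10, eax );          // extended syntax: signed edx:eax := eax * 10; HLA places the constant in memory
intmul( 10, ecx );        // single precision: ecx := ecx * 10
intmul( 10, ebx, ecx );   // single precision: ecx := ebx * 10
intmul( ebx, ecx );       // single precision: ecx := ecx * ebx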

The Divide Instructions

HLA supports several variations on the 80x86 DIV and IDIV instructions. The supported forms are:

Generic Forms:

div( source );
div( source, dest );

mod( source );
mod( source, dest );

idiv( source );
idiv( source, dest );

imod( source );
imod( source, dest );

Specific Forms:

div( reg8 )
div( reg16 )
div( reg32 )
div( mem )

div( reg8, ax )
div( reg16, dx:ax )
div( reg32, edx:eax )

div( mem, ax )
div( mem, dx:ax )
div( mem, edx:eax )

div( AnonMem, ax )
div( AnonMem, dx:ax )
div( AnonMem, edx:eax )

mod( reg8 )
mod( reg16 )
mod( reg32 )
mod( mem )

mod( reg8, ax )
mod( reg16, dx:ax )
mod( reg32, edx:eax )

mod( mem, ax )
mod( mem, dx:ax )
mod( mem, edx:eax )

mod( AnonMem, ax )
mod( AnonMem, dx:ax )
mod( AnonMem, edx:eax )

idiv( reg8 )
idiv( reg16 )
idiv( reg32 )
idiv( mem )

idiv( reg8, ax )
idiv( reg16, dx:ax )
idiv( reg32, edx:eax )

idiv( mem, ax )
idiv( mem, dx:ax )
idiv( mem, edx:eax )

idiv( AnonMem, ax )
idiv( AnonMem, dx:ax )
idiv( AnonMem, edx:eax )

imod( reg8 )
imod( reg16 )
imod( reg32 )
imod( mem )

imod( reg8, ax )
imod( reg16, dx:ax )
imod( reg32, edx:eax )

imod( mem, ax )
imod( mem, dx:ax )
imod( mem, edx:eax )

imod( AnonMem, ax )
imod( AnonMem, dx:ax )
imod( AnonMem, edx:eax )

Extended Syntax:

div( const, ax )
div( const, dx:ax )
div( const, edx:eax )

mod( const, ax )
mod( const, dx:ax )
mod( const, edx:eax )

idiv( const, ax )
idiv( const, dx:ax )
idiv( const, edx:eax )

imod( const, ax )
imod( const, dx:ax )
imod( const, edx:eax )

The destination operand is always implied by the 80x86 "div" and "idiv" instructions (AX, DX:AX, or EDX:EAX). HLA allows the specification of the destination operand in order to make your programs easier to read (although the use of the destination operand is optional).

The HLA divide instructions support an extended syntax that allows you to specify a constant as the divisor (source operand). HLA allocates storage in the static data segment and initializes the storage with the specified constant, and then divides the accumulator by this newly specified memory location.

The DIV and IDIV instructions return "AL", "AX", or "EAX" as their "returns" value (the quotient is left in the accumulator register). The MOD and IMOD instructions return "AH", "DX", or "EDX" as their "returns" value. Indeed, the "returns" value is the only difference between these instructions. The DIV and MOD instructions compile into the 80x86 DIV instruction; the IDIV and IMOD instructions compile into the 80x86 IDIV instruction.

See the "Art of Assembly" for a further discussion of these instructions.
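
As a sketch of how DIV and MOD differ only in their "returns" value, the code below divides a 32-bit value by ten twice, once for the quotient and once for the remainder. The names value, quotient, and remainder are hypothetical uns32 variables.

mov( value, eax );
xor( edx, edx );                        // zero extend EAX into EDX:EAX
mov( div( 10, edx:eax ), quotient );    // div returns "eax"

mov( value, eax );
xor( edx, edx );
mov( mod( 10, edx:eax ), remainder );   // mod returns "edx"

Since both forms compile into the same 80x86 DIV instruction, a single division leaves the quotient in EAX and the remainder in EDX; when you need both results, one div followed by two mov instructions avoids the second division.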

Single Operand Arithmetic and Logical Instructions

These instructions include dec, inc, neg, and not. They take the following general forms (substituting the specific mnemonic as appropriate):

Generic Form:

dec( dest );
lock.dec( dest );

Specific forms allowed:

dec( Reg8 );
dec( Reg16 );
dec( Reg32 );
dec( mem );

Note: if mem is an untyped or unsized memory location (i.e., an anonymous memory location), you must explicitly provide a size; e.g., "dec( (type word [edi]));"

These instructions all return their destination operand as the "returns" value.

See the "Art of Assembly" for a further discussion of these instructions.

If the "lock." prefix is present, the instruction asserts the bus lock signal during execution. The "lock." prefix is valid only on instructions that reference memory.
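
A short sketch of these forms follows; counter is a hypothetical dword variable.

inc( eax );                  // Reg32
neg( ecx );                  // two's complement negation of ECX
not( bl );                   // Reg8: invert all the bits in BL
dec( (type word [edi]) );    // anonymous memory: explicit size required
lock.inc( counter );         // memory operand, so the lock. prefix is permitted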

Shift and Rotate Instructions

These instructions include RCL, RCR, ROL, ROR, SAL, SAR, SHL, and SHR. These instructions support the following generic syntax, making the appropriate mnemonic substitution.

Generic Form:

shl( count, dest );

Specific Forms:

shl( const, Reg8 );
shl( const, Reg16 );
shl( const, Reg32 );

shl( const, mem );

shl( cl, Reg8 );
shl( cl, Reg16 );
shl( cl, Reg32 );

shl( cl, mem );

The "const" operand is an unsigned integer constant between zero and the maximum number of bits in the destination operand. The forms with a memory operand must have a type or size associated with the operand; when using anonymous memory locations, you must coerce the type, e.g.,

"shl( 2, (type dword [esi]));"

These instructions return their destination operand as their "returns" value.

See the "Art of Assembly" for a further discussion of these instructions.
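
The following sketch shows the constant and CL count forms; the register choices are arbitrary.

shl( 3, eax );     // constant count: eax := eax * 8
mov( 4, cl );
shr( cl, ebx );    // count in CL: ebx := ebx / 16 (unsigned)
sar( 1, ecx );     // arithmetic shift right preserves the sign bit
rol( 8, ax );      // rotate by 8 swaps the two bytes of AX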

The Double Precision Shift Instructions

These instructions use the following general form (you can substitute SHRD for SHLD below):

Generic Form:

shld( count, source, dest )

Specific Forms:

shld( const, Reg16, Reg16 )
shld( const, Reg16, mem )
shld( const, Reg16, AnonMem )

shld( cl, Reg16, Reg16 )
shld( cl, Reg16, mem )
shld( cl, Reg16, AnonMem )

shld( const, Reg32, Reg32 )
shld( const, Reg32, mem )
shld( const, Reg32, AnonMem )

shld( cl, Reg32, Reg32 )
shld( cl, Reg32, mem )
shld( cl, Reg32, AnonMem )

These instructions return their destination operand as the "returns" value.

See the "Art of Assembly" for a further discussion of these instructions.
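
For instance, a 64-bit value held in EDX:EAX can be shifted left by four bits with one SHLD and one SHL, as sketched below. The register assignment is an assumption made for illustration.

shld( 4, eax, edx );   // shift EDX left by 4, filling its low bits from the high bits of EAX
shl( 4, eax );         // then shift the low dword; SHLD leaves its source operand unchanged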

The Lea Instruction

This instruction uses the following syntax:

lea( Reg32, memory )
lea( Reg32, AnonMem )
lea( Reg32, ProcID )
lea( Reg32, LabelID )

Extended Syntax:

lea( Reg32, StringConstant )
lea( Reg32, const ConstExpr )

lea( memory, Reg32 )
lea( AnonMem, Reg32 )
lea( ProcID, Reg32 )
lea( LabelID, Reg32 )
lea( StringConstant, Reg32 )
lea( const ConstExpr, Reg32 )

The "lea" instruction loads the specified 32-bit register with the address of the specified memory operand, procedure, or statement label. Note in the extended syntax you can reverse the order of the operands. Since exactly one operand must be a register, there is no ambiguity between the two forms (this syntax was added to satisfy those who complained about the (reg,memory) syntax). Of course, good programming style suggests that you use only one form (either reg,memory or memory, reg) within your programs.

The extended syntax form lets you specify a constant rather than a memory address. There is no such thing as the address of a constant, but HLA will create a memory variable in the constants data segment, initialize that variable with the value of the specified constant, and then load the address of this variable into the specified register (or push it onto the stack).

There is a subtle difference between the following two instructions:

lea( eax, "String" );
lea( eax, const "String" );

The first instruction loads EAX with the address of the first character of the literal string constant. The second form loads the EAX register with the address of a string variable (which is a pointer containing the address of the first character of the string literal).

The LEA instructions return the 32-bit register as their "returns" value.

See Chapter Six in "Art of Assembly" for a further discussion of the LEA instruction.

Note: HLA does not support an LEA instruction that loads a 16-bit address into a 16-bit register. That form of the LEA instruction is not very useful in 32-bit programs running on 32-bit operating systems.
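
A short sketch of the standard and reversed operand orders follows; buffer is a hypothetical static variable.

lea( eax, [ebx+ecx*4+8] );   // eax := ebx + ecx*4 + 8; memory is not accessed
lea( esi, buffer );          // esi := address of buffer
lea( [ebx+ecx*4+8], eax );   // extended syntax: same as the first line with the operands reversed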

The Sign and Zero Extension Instructions

The HLA MOVSX and MOVZX instructions use the following syntax:

Generic Forms:

movsx( source, dest );
movzx( source, dest );

Specific Forms:

movsx( Reg8, Reg16 )
movsx( Reg8, Reg32 )
movsx( Reg16, Reg32 )
movsx( mem8, Reg16 )
movsx( mem8, Reg32 )
movsx( mem16, Reg32 )

movzx( Reg8, Reg16 )
movzx( Reg8, Reg32 )
movzx( Reg16, Reg32 )
movzx( mem8, Reg16 )
movzx( mem8, Reg32 )
movzx( mem16, Reg32 )

These instructions sign (MOVSX) or zero (MOVZX) extend their source operand into the destination operand. They return their destination operand as their "returns" value.

See the "Art of Assembly" for a further discussion of these instructions.
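
For example (delta is a hypothetical int8 variable):

movzx( al, eax );      // zero extend AL into EAX
movsx( bx, ecx );      // sign extend BX into ECX
movsx( delta, eax );   // mem8 source: sign extend the byte variable into EAX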

The Push and Pop Instructions

These instructions take the following general forms:

pop( reg16 );
pop( reg32 );
pop( mem );

push( Reg16 )
push( Reg32 )
push( memory )

pushw( Reg16 )
pushw( memory )
pushw( AnonMem )
pushw( Const )

pushd( Reg32 )
pushd( memory )
pushd( AnonMem )
pushd( Const )

These instructions push or pop their specified operand. They all return their operand as their "returns" value.
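
The sketch below uses these forms to preserve two registers around a code sequence and to move a constant through the stack.

push( eax );
push( ebx );

<< Code that modifies EAX and EBX >>

pop( ebx );     // pop in the reverse order of the pushes
pop( eax );

pushd( 10 );    // push a 32-bit constant
pop( ecx );     // ecx := 10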

Procedure Calls

HLA provides several different ways to call a procedure. Given a procedure named "MyProc", any of the following syntaxes are legal:

MyProc( parameter_list );
call( MyProc );
call MyProc;

If MyProc has a set of declared parameters, the number and types of actual parameters must match the number and types of the formal parameters. HLA will emit the code needed to push the parameter list on the stack. In the two call statements above, it is the programmer's responsibility to pass any needed parameters. For more details, see the section on procedure declarations.

In the examples above, MyProc can either be the name of an actual procedure or a procedure variable (that is, a pointer to a procedure declared as "myproc:procedure( parameters );" in the VAR or a static section). If you need to call a procedure using an anonymous memory variable (i.e., an addressing mode like [ebx]), an untyped dword value, or via a register, you must use the syntax of the second call above, e.g., "call( ebx );". Of course, any legal HLA/80x86 address mode would be legal here.

When declaring a standard procedure, the procedure declaration syntax allows you to specify a "returns" value for that procedure, e.g.,

procedure MyProc; returns( "eax" );

HLA substitutes the string that appears as the "returns" argument for the call when using the first syntax above. For example, supposing that MyProc is a function returning its result in EAX, you could use the following to call MyProc and save the return value in the "Result" variable:

mov( MyProc(), Result );

For more details, see the section on procedure declarations.
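
The sketch below contrasts the high-level call syntax with the call statement for a hypothetical procedure AddTwo; Result is a hypothetical dword variable. It assumes HLA's default parameter passing, in which actual parameters are pushed in the order they are declared and the procedure removes them on return; verify this against the section on procedure declarations before relying on it.

procedure AddTwo( a:dword; b:dword ); returns( "eax" );
begin AddTwo;

mov( a, eax );
add( b, eax );

end AddTwo;

// High-level syntax: HLA pushes the parameters, and "eax"
// replaces the call as the source operand of the mov.

mov( AddTwo( 1, 2 ), Result );

// Call statement: the programmer pushes the parameters.

pushd( 1 );
pushd( 2 );
call AddTwo;
mov( eax, Result );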

To call a class procedure, one would use one of the following syntaxes:

className.ProcName( parameters );
call( className.ProcName );
call className.ProcName;

objectName.ProcName( parameters );
call( objectName.ProcName );
call objectName.ProcName;

The difference between "className" and "objectName" is that "className" represents the actual name of the class data type whereas "objectName" represents the name of an instance of this class (i.e., a variable of type "className" declared in the VAR or a static section).

When calling a class procedure, HLA loads the ESI register with the address of the object before calling the specified procedure. Since there is no instance variable (object) associated with the className form, HLA loads ESI with zero (NULL). Inside the class procedure you can test the value of ESI to determine if the procedure was called via the class name or an object name. This is quite useful, for example when writing constructors, to determine whether the procedure needs to allocate storage for an object. Consider the following program that demonstrates the use of an object constructor (create):

program demo;

#include( "memory.hhf" );
#include( "stdio.hhf" );

type
    cc: class

        var
            i:int32;

        procedure create; returns( "esi" );

    endclass;

var
    ccVar: cc;
    ccPtr: pointer to cc;

static
    ccStat:cc;

procedure cc.create; @nodisplay;
begin create;

    push( eax );
    if( esi = 0 ) then

        stdout.put( "Allocating" nl );
        malloc( @size( cc ));
        mov( eax, esi );

    else

        stdout.put( "Already allocated" nl );

    endif;
    mov( &cc._VMT_, this._pVMT_ );
    mov( 0, this.i );
    pop( eax );

end create;

begin demo;

    // This first call to create allocates storage.

    mov( cc.create(), ccPtr );

    // In all the remaining calls, ESI is loaded with
    // the address of the object and no storage is
    // created.

    ccPtr.create();
    ccVar.create();
    ccStat.create();

end demo;

The call( ) statement allows any one of the following syntaxes:

call ProcID;
call( ProcID );
call( dwordvar );
call( anonmem ); // Addressing mode like [ebx].
call( Reg32 );

The second form above returns the string (if any) specified by ProcID's "returns" option. The remaining call instructions return the empty string as their "returns" value.

You may also call an iterator procedure via the CALL instruction. However, it is your responsibility to set up the parameters and other state information prior to the call (see the section on iterators for more details).
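
As a sketch of the indirect forms, the code below calls a hypothetical parameterless procedure MyProc through a register and through a dword variable. The name procPtr is a hypothetical dword variable.

mov( &MyProc, ebx );       // load the address of MyProc
call( ebx );               // Reg32 form

mov( &MyProc, procPtr );
call( procPtr );           // dword variable form

call( [ebx] );             // anonymous memory form: EBX must hold the address of a pointer to the procedure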

The Ret Instruction

The RET( ) statement allows two syntactical forms:

ret( );
ret( integer_constant_expression );

The first form emits a simple 80x86 RET instruction, the second form emits the 80x86 RET instruction with the specified numeric constant expression value (used to remove parameters from the stack).

Normally, you would use these instructions in a procedure that has the "@noframe" option. Unless you know exactly what you are doing, you should never use the "RET" instruction inside a standard HLA procedure without this option since doing so almost always produces disastrous results. If you do use this instruction within such a procedure, it is your responsibility to deallocate local variables and the display (if any), restore EBP, and remove any parameters from the stack.
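
The following sketch shows a hypothetical procedure, AddTwo, that uses the "@noframe" option and therefore builds and destroys its own activation record. It assumes two dword parameters (eight bytes in total) that the procedure must remove from the stack on return.

procedure AddTwo( a:dword; b:dword ); @nodisplay; @noframe;
begin AddTwo;

push( ebp );       // build the activation record manually
mov( esp, ebp );

mov( a, eax );     // parameters are addressed relative to EBP
add( b, eax );

pop( ebp );        // restore the caller's EBP
ret( 8 );          // return and remove the two dword parameters

end AddTwo;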

The Jmp Instructions

The HLA "jmp" instruction supports the following syntax:

jmp Label;
jmp ProcedureName;
jmp( dwordMemPtr );
jmp( anonMemPtr );
jmp( reg32 );

" Label " represents a statement label in the current procedure. (You are not allowed to jump to labels in other procedures in the current version of HLA. This restriction may be relaxed somewhat in future versions.) A statement label is a unique (within the current procedure) identifier with a colon after the identifier, e.g.,

InfiniteLoop:
<< Code inside the infinite loop>>
jmp InfiniteLoop;

Jumping to a procedure transfers control to the first instruction in the specified procedure. You are responsible for explicitly pushing any parameters and the return address for that procedure.

These instructions all return the empty string as their "returns" value.

The Conditional Jump Instructions

These instructions include JA, JAE, JB, JBE, JC, JE, JG, JGE, JL, JLE, JO, JP, JPE, JPO, JS, JZ, JNA, JNAE, JNB, JNBE, JNC, JNE, JNG, JNGE, JNL, JNLE, JNO, JNP, JNS, JNZ, JCXZ, JECXZ, LOOP, LOOPE, LOOPZ, LOOPNE, and LOOPNZ. They all take the following generic form (substituting the appropriate instruction for "JA").

ja LocalLabel;

" LocalLabel " must be a statement label defined in the current procedure.

These instructions all return the empty string as their "returns" value.

Note: due to the nature of the HLA compilation process, you should avoid the JCXZ, JECXZ, LOOP, LOOPE, LOOPZ, LOOPNE, and LOOPNZ instructions. Unlike the other conditional jump instructions, these instructions have a very limited range of roughly +/- 128 bytes. Unfortunately, HLA cannot detect an out-of-range branch (MASM handles this task), so HLA cannot warn you when a range error occurs. The MASM assembly will fail, but the resulting error will be hard to decipher. Fortunately, these instructions are easily, and usually more efficiently, implemented using other 80x86 instructions, so this should not prove to be a problem.
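
For example, a counted loop that might otherwise use LOOP can be written with DEC and JNZ, neither of which has the short-range restriction. The label Again is hypothetical.

mov( 10, ecx );
Again:
<< Loop body >>
dec( ecx );
jnz Again;      // repeat until ECX reaches zero

Note that, unlike LOOP, the DEC instruction modifies the flags.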

In a few special cases, the boolean constants "true" and "false" are legal labels. See the discussion of HLA's high level language features for more details.

The Conditional Set Instructions

These instructions include: SETA, SETAE, SETB, SETBE, SETC, SETE, SETG, SETGE, SETL, SETLE, SETO, SETP, SETPE, SETPO, SETS, SETZ, SETNA, SETNAE, SETNB, SETNBE, SETNC, SETNE, SETNG, SETNGE, SETNL, SETNLE, SETNO, SETNP, SETNS, and SETNZ. They take the following generic forms (substituting the appropriate mnemonic for seta):

seta( Reg8 )
seta( mem )
seta( AnonMem )

See the "Art of Assembly" for a further discussion of these instructions.
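
As a brief sketch, the following sequence converts the result of a signed comparison into a zero or one in ECX:

cmp( eax, ebx );
setl( cl );          // cl := 1 if eax < ebx (signed), 0 otherwise
movzx( cl, ecx );    // widen the boolean result to 32 bits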

The Conditional Move Instructions

These instructions include CMOVA, CMOVAE, CMOVB, CMOVBE, CMOVC, CMOVE, CMOVG, CMOVGE, CMOVL, CMOVLE, CMOVO, CMOVP, CMOVPE, CMOVPO, CMOVS, CMOVZ, CMOVNA, CMOVNAE, CMOVNB, CMOVNBE, CMOVNC, CMOVNE, CMOVNG, CMOVNGE, CMOVNL, CMOVNLE, CMOVNO, CMOVNP, CMOVNS, and CMOVNZ. They use the following general syntax:

CMOVcc( src, dest );

Allowable operands:

CMOVcc( reg16, reg16 );
CMOVcc( reg32, reg32 );
CMOVcc( mem16, reg16 );
CMOVcc( mem32, reg32 );

These instructions copy the source operand to the destination operand if the condition specified by cc is true. If the condition is false, these instructions behave like a no-operation.
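The following sketch uses a conditional move to compute the larger of two signed values without a branch. The program name and the values are hypothetical, and the code assumes a processor that supports the CMOVcc instructions.

```hla
program CmovDemo;
#include( "stdlib.hhf" )

begin CmovDemo;

    mov( 17, eax );
    mov( 42, ebx );

    cmp( eax, ebx );            // set the flags from EAX - EBX
    cmovl( ebx, eax );          // if EAX < EBX (signed), copy EBX into EAX

    stdout.put( "max = ", (type int32 eax), nl );

end CmovDemo;
```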

The Input and Output Instructions

The "in" and "out" instructions use the following syntax:

in( port, al )
in( port, ax )
in( port, eax )

in( dx, al )
in( dx, ax )
in( dx, eax )

out( al, port )
out( ax, port )
out( eax, port )

out( al, dx )
out( ax, dx )
out( eax, dx )

The "port" parameter must be an unsigned integer constant in the range 0..255. The IN instructions return the accumulator register (AL, AX, or EAX) as their "returns" value. The OUT instructions return the port number (or DX) as their "returns" value.

Note that these instructions may be privileged instructions when running under Win32 or Linux. Their use may generate a fault in certain instances or when accessing certain ports.

See the "Art of Assembly" for a further discussion of these instructions.

The Interrupt Instruction

This instruction uses the syntax "int( constant)" where the constant operand is an unsigned integer value in the range 0..255.

This instruction returns the empty string as its "returns" value.

See Chapter Six in "Art of Assembly" (DOS version) for a further discussion of this instruction. Note, however, that one generally does not use "int" under Win32 to make OS or BIOS calls. The "int $80" instruction is what you'd normally use to make very low-level Linux calls.

Bound Instruction

This instruction takes the following forms:

bound( Reg16, mem )
bound( Reg16, AnonMem )

bound( Reg32, mem )
bound( Reg32, AnonMem )

Extended Syntax Form:

bound( Reg16, ConstL, ConstH )
bound( Reg32, ConstL, ConstH )

These instructions return the register as their "returns" value.

The extended syntax forms emit the two constants to the static data segment and substitute the address of the first constant ( ConstL ) as their memory operand.

The BOUND instruction compares the register operand against the two constants (or the two consecutive memory locations at the specified address). If the register value is outside the range specified by the operand(s), then the 80x86 CPU raises an ex.BoundInstr exception. You can handle this exception using the TRY..ENDTRY HLL statement in HLA.
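A minimal sketch of the extended syntax combined with TRY..ENDTRY, assuming the HLA Standard Library is available. The program name, the index value, and the limits 1 and 10 are hypothetical.

```hla
program BoundDemo;
#include( "stdlib.hhf" )

begin BoundDemo;

    mov( 11, eax );             // hypothetical index, outside the range 1..10

    try

        bound( eax, 1, 10 );    // extended syntax: lower and upper limits as constants
        stdout.put( "index is in range", nl );

      exception( ex.BoundInstr )

        stdout.put( "index is out of range", nl );

    endtry;

end BoundDemo;
```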

Because the BOUND instruction tends to be slow, and of course it consumes memory, many programmers don't use it as often as they should for fear it will make their programs less efficient. HLA solves this problem through the use of the "@bound" compile-time pseudo-variable. If @bound contains true (the default value) then HLA will compile the BOUND instruction and it will behave normally. If @bound contains false, then HLA will not emit any code for the bound instruction (this is similar to "asserts" in C/C++). You can set the value of @bound in the VAL section or with the "?" operator, e.g.,

?@bound := false;

// Code that ignores BOUND instructions
.
.
.
?@bound := true;

// BOUND instructions are active again.

The Enter Instruction

The ENTER instruction uses the syntax: "enter( const, const );". The first constant operand is the number of bytes of local variables in a procedure; the second constant operand is the lex level of the procedure. As a general rule, you should not use this instruction (or the corresponding LEAVE instruction). HLA procedures automatically construct the display and activation record for you (more efficiently than when using ENTER).

See the "Art of Assembly" for a further discussion of this instruction and the LEAVE instruction.

CMPXCHG Instruction

This instruction uses the following syntax:

Generic Form:

cmpxchg( reg/mem, reg );

lock.cmpxchg( reg/mem, reg);

Specific Forms:

cmpxchg( Reg8, Reg8 )
cmpxchg( Reg8, Memory )
cmpxchg( Reg8, AnonMem )

cmpxchg( Reg16, Reg16 )
cmpxchg( Reg16, Memory )
cmpxchg( Reg16, AnonMem )

cmpxchg( Reg32, Reg32 )
cmpxchg( Reg32, Memory )
cmpxchg( Reg32, AnonMem )

This instruction returns the empty string as its "returns" value.

See the "Art of Assembly" for a further discussion of this instruction.

If the "lock." prefix is present, the instruction asserts the bus lock signal during execution. The "lock." prefix is valid only on instructions that reference memory.

CMPXCHG8B Instruction

This instruction uses the following syntax:

Generic Form:

cmpxchg8b( mem64 );

lock.cmpxchg8b( mem64);

This instruction compares EDX:EAX with the specified qword operand. If the values are equal, this instruction stores the value in ECX:EBX into the destination operand; otherwise it loads the memory operand into EDX:EAX.

This instruction returns the empty string as its "returns" value.

See the "Art of Assembly" for a further discussion of this instruction.

If the "lock." prefix is present, the instruction asserts the bus lock signal during execution. The "lock." prefix is valid only on instructions that reference memory.

The XADD Instruction

The XADD instruction uses the following syntax:

Generic Form:

xadd( source, dest );

lock.xadd( source, dest );

Specific Forms:

xadd( Reg8, Reg8 )
xadd( mem, Reg8 )
xadd( AnonMem, Reg8 )

xadd( Reg16, Reg16 )
xadd( mem, Reg16 )
xadd( AnonMem, Reg16 )

xadd( Reg32, Reg32 )
xadd( mem, Reg32 )
xadd( AnonMem, Reg32 )

This instruction returns its destination operand as its "returns" value.

See the "Art of Assembly" for a further discussion of this instruction.

If the "lock." prefix is present, the instruction asserts the bus lock signal during execution. The "lock." prefix is valid only on instructions that reference memory.

BSF and BSR Instructions

The bit scan instructions use the following syntax (substitute BSR for BSF as appropriate):

Generic Form:

bsr( source, dest );

Specific Forms Allowed:

bsf( Reg16, Reg16 );
bsf( mem, Reg16 );
bsf( AnonMem, Reg16 );

bsf( Reg32, Reg32 );
bsf( mem, Reg32 );
bsf( AnonMem, Reg32 );

These instructions return the destination register as their "returns" value.

See the "Art of Assembly" for a further discussion of these instructions.
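As a brief illustration, the sketch below scans a value that has bits 3 and 5 set. BSF reports the lowest set bit and BSR reports the highest set bit. The program name and the test value are hypothetical.

```hla
program BitScanDemo;
#include( "stdlib.hhf" )

begin BitScanDemo;

    mov( %0010_1000, eax );     // bits 3 and 5 are set

    bsf( eax, ecx );            // ECX = 3, the index of the lowest set bit
    bsr( eax, edx );            // EDX = 5, the index of the highest set bit

    stdout.put( "lowest = ", (type uns32 ecx), nl );
    stdout.put( "highest = ", (type uns32 edx), nl );

end BitScanDemo;
```

If the source operand is zero, these instructions set the zero flag and the contents of the destination register are undefined, so test for that case before relying on the result.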

The BSWAP Instruction

This instruction takes the form "bswap( reg32 )". It converts between little endian and big endian data formats in the specified 32-bit register.

It returns the 32-bit register as its "returns" value.

See the "Art of Assembly" for a further discussion of this instruction.
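A minimal sketch of the byte reversal that BSWAP performs. The program name and the value are hypothetical.

```hla
program BswapDemo;
#include( "stdlib.hhf" )

begin BswapDemo;

    mov( $1234_5678, eax );
    bswap( eax );               // EAX now holds $7856_3412

    stdout.put( "eax = $", eax, nl );

end BswapDemo;
```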

Bit Test Instructions

This group of instructions includes BT, BTC, BTR, and BTS. They allow the following generic forms:

Generic Form:

bt( BitNumber, Dest );

Specific Forms:

bt( const, Reg16 );

bt( const, Reg32 );

bt( const, mem );

bt( Reg16, Reg16 );

bt( Reg16, mem );

bt( Reg16, AnonMem );

bt( Reg32, Reg32 );

bt( Reg32, mem );

bt( Reg32, AnonMem );

bt( Reg16, CharacterSetVariable );

bt( Reg32, CharacterSetVariable );

Substitute the BTC, BTR, or BTS mnemonic for BT in the examples above for these other instructions. The BTC, BTR, and BTS instructions also allow a "lock." prefix, e.g., "lock.btc( reg32, mem );" If the "lock." prefix is present, the instruction asserts the bus lock signal during execution. The "lock." prefix is valid only on instructions that reference memory.

These instructions return the destination operand as their "returns" value.

Notice the two special forms that allow character set variables. HLA actually casts these 16-byte objects as word or dword memory variables, but they otherwise work just fine with cset objects.

Special forms available only with the BT instruction:

bt( reg16, CharacterSetConstant );

bt( reg32, CharacterSetConstant );

These two forms return the source register (BitNumber) as their "returns" value. Note that HLA will create a phantom variable that contains the character set constant and then supply the name of this variable, effectively making these two instructions equivalent to "bt( reg, CharacterSetVariable );".

See the "Art of Assembly" for a further discussion of these instructions.
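The sketch below uses the character set form of BT to test set membership. BT copies the selected bit into the carry flag, which the code then tests with the "@c" boolean expression. The program name, the set name "vowels", and the test character are hypothetical.

```hla
program BtDemo;
#include( "stdlib.hhf" )

static
    vowels: cset := { 'a', 'e', 'i', 'o', 'u' };

begin BtDemo;

    mov( 'e', al );             // character to test (hypothetical input)
    movzx( al, eax );           // the bit number is the character code

    bt( eax, vowels );          // carry flag = bit number EAX of the set

    if( @c ) then

        stdout.put( "vowel", nl );

    else

        stdout.put( "not a vowel", nl );

    endif;

end BtDemo;
```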

Floating Point Instructions

HLA supports the following FPU instructions. Note: all FPU instructions have a "returns" value of "st0" unless otherwise noted.

fld( FPreg );
fst( FPreg );

fld( FPmem ); // Returns operand.
fst( FPmem ); // 32 and 64-bits only! Returns operand.
fstp( FPmem ); // Returns operand.

fxch( FPreg );

fild( FPmem ); // Returns operand.
fist( FPmem ); // 32 and 64-bits only! Returns operand.
fistp( FPmem ); // Returns operand.

fbld( FPmem ); // Returns operand.
fbstp( FPmem ); // Returns operand.

fadd( );
fadd( FPreg, st0 );
fadd( st0, FPreg );
fadd( FPmem ); // Returns operand.
fadd( FPconst ); // Returns operand.

faddp( );
faddp( st0, FPreg );

fmul( );
fmul( FPreg, st0 );
fmul( st0, FPreg );
fmul( FPmem ); // Returns operand.
fmul( FPconst ); // Returns operand.

fmulp( );
fmulp( st0, FPreg );

fsub( );
fsub( FPreg, st0 );
fsub( st0, FPreg );
fsub( FPmem ); // Returns operand.
fsub( FPconst ); // Returns operand.

fsubp( );
fsubp( st0, FPreg );

fsubr( );
fsubr( FPreg, st0 );
fsubr( st0, FPreg );
fsubr( FPmem ); // Returns operand.
fsubr( FPconst ); // Returns operand.

fsubrp( );
fsubrp( st0, FPreg );

fdiv( );
fdiv( FPreg, st0 );
fdiv( st0, FPreg );
fdiv( FPmem ); // Returns operand.
fdiv( FPconst ); // Returns operand.

fdivp( );
fdivp( st0, FPreg );

fdivr( );
fdivr( FPreg, st0 );
fdivr( st0, FPreg );
fdivr( FPmem ); // Returns operand.
fdivr( FPconst ); // Returns operand.

fdivrp( );
fdivrp( st0, FPreg );

fiadd( mem16 ); // Returns operand.
fiadd( mem32 ); // Returns operand.
fiadd( const ); // Returns operand.

fimul( mem16 ); // Returns operand.
fimul( mem32 ); // Returns operand.
fimul( const ); // Returns operand.

fidiv( mem16 ); // Returns operand.
fidiv( mem32 ); // Returns operand.
fidiv( const ); // Returns operand.

fidivr( mem16 ); // Returns operand.
fidivr( mem32 ); // Returns operand.
fidivr( const ); // Returns operand.

fcom( );
fcom( FPreg );
fcom( FPmem ); // Returns operand.

fcomp( );
fcomp( FPreg );
fcomp( FPmem ); // Returns operand.

fucom( );
fucom( FPreg );

fucomp( );
fucomp( FPreg );

fcompp();
fucompp();

ficom( mem16 ); // Returns operand.
ficom( mem32 ); // Returns operand.
ficom( const ); // Returns operand.

ficomp( mem16 ); // Returns operand.
ficomp( mem32 ); // Returns operand.
ficomp( const ); // Returns operand.

fsqrt(); // The following all return "st0"
fscale();
fprem();
fprem1();
frndint();
fxtract();
fabs();
fchs();
ftst();
fxam();
fldz();
fld1();
fldpi();
fldl2t();
fldl2e();
fldlg2();
fldln2();
f2xm1();
fsin();
fcos();
fsincos();
fptan();
fpatan();
fyl2x();
fyl2xp1();

finit(); // Returns ""
fwait();
fclex();
fincstp();
fdecstp();
fnop();
ffree( FPreg );
fldcw( mem );
fstcw( mem );
fstsw( mem );

See the chapter on real arithmetic in "The Art of Assembly Language Programming" for details on these instructions. Note that HLA does not support the entire FPU instruction set. If you absolutely need the few remaining instructions, use the #ASM..#ENDASM or #EMIT directives to generate them.
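As an illustration of how several of these forms combine, the following sketch computes sqrt( x*x + y*y ) on the FPU stack. The program and variable names and the input values are hypothetical.

```hla
program FpuDemo;
#include( "stdlib.hhf" )

static
    x:   real64 := 3.0;
    y:   real64 := 4.0;
    hyp: real64;

begin FpuDemo;

    fld( x );                   // st0 = x
    fmul( x );                  // st0 = x*x
    fld( y );                   // st0 = y, st1 = x*x
    fmul( y );                  // st0 = y*y
    faddp();                    // st0 = x*x + y*y (pops one stack entry)
    fsqrt();                    // st0 = sqrt( x*x + y*y )
    fstp( hyp );                // store the result and pop the FPU stack

    stdout.put( "hyp = ", hyp:8:3, nl );

end FpuDemo;
```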

Additional Floating Point Instructions for Pentium Pro and Later Processors

The FCMOVcc instructions (cc= a, ae, b, be, na, nae, nb, nbe, e, ne, u, nu) use the following basic syntax:

FCMOVcc( stn, st0); // n=0..7

They move the specified floating point register to ST0 if the specified condition is true.

The FCOMI and FCOMIP instructions use the following syntax:

fcomi( st0, stn );
fcomip( st0, stn );

These instructions behave like their syntactically equivalent FCOM and FCOMP brethren except they store the status in the EFLAGS register directly rather than in the floating point status register.

MMX Instructions

HLA supports the following MMX instructions found on the Pentium and later processors (note that some instructions are only available on Pentium III and later processors; see the Intel reference manuals for details).

HLA uses the symbols mm0, mm1, ..., mm7 for the MMX register set.

The following MMX instructions all use the same syntax. The syntax is:

mmxInstr( mmxReg, mmxReg );

mmxInstr( mem64, mmxReg );

mmxInstrs:

paddb

paddw

paddd

paddsb

paddsw

paddusb

paddusw

psubb

psubw

psubd

psubsb

psubsw

psubusb

psubusw

pmulhuw

pmulhw

pmullw

pmaddwd

pavgb

pavgw

pcmpeqb

pcmpeqw

pcmpeqd

pcmpgtb

pcmpgtw

pcmpgtd

packsswb

packuswb

packssdw

punpcklbw

punpcklwd

punpckldq

punpckhbw

punpckhwd

punpckhdq

pand

pandn

por

pxor

pmaxsw

pmaxub

pminsw

pminub

psadbw

The following MMX instructions require a special syntax. The syntax is listed for each instruction.

pextrw( constant, mmxReg, Reg32 );

pinsrw( constant, Reg32, mmxReg );

pmovmskb( mmxReg, Reg32 );

pshufw( constant, mmxReg, mmxReg );

pshufw( constant, mem64, mmxReg );

movd( mem32, mmxReg );

movd( mmxReg, mem32 );

movq( mem64, mmxReg );

movq( mmxReg, mem64 );

emms();

The following MMX shift instructions also require a special syntax. They allow the following two forms:

mmxshift( immConst, mmxReg );

mmxshift( mmxReg, mmxReg );

psllw

pslld

psllq

psrlw

psrld

psrlq

psraw

psrad

Note that the psllw, psrlw, and psraw instructions only allow an immediate constant in the range 0..15; the pslld, psrld, and psrad instructions only allow constants in the range 0..31; and the psllq and psrlq instructions only allow immediate constants in the range 0..63.

Please see the appropriate Intel documentation or "The Art of Assembly Language" for a discussion of the behavior of these instructions.
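A small sketch of the MMX syntax described in this section, using a saturating unsigned byte addition. The program and variable names and the values are hypothetical, and the code assumes a processor with MMX support.

```hla
program MmxDemo;
#include( "stdlib.hhf" )

static
    lhs: dword := $F0F0_F0F0;
    rhs: dword := $2020_2020;
    sum: dword;

begin MmxDemo;

    movd( lhs, mm0 );           // low 32 bits of mm0 = lhs, upper bits cleared
    movd( rhs, mm1 );           // low 32 bits of mm1 = rhs, upper bits cleared

    paddusb( mm1, mm0 );        // mm0 = mm0 + mm1, byte by byte, saturating at $FF
    movd( mm0, sum );           // sum = $FFFF_FFFF because $F0 + $20 saturates

    emms();                     // make the FPU registers usable for floating point again

    stdout.put( "sum = $", sum, nl );

end MmxDemo;
```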

OS/Privileged Mode Instructions

Although HLA was originally intended for writing 32-bit flat model user mode applications, some HLA users may wish to write an operating system kernel or device drivers in HLA. Therefore, HLA provides support for various privileged instructions and instructions that manipulate segment registers on the 80x86 processor. This section describes those instructions. Normal application programs should not use these instructions (most will cause a "General Protection Fault" if you attempt to execute them).

For additional information on these instructions, please see the Intel documentation for the Pentia processors.

arpl( r16, r/m16 );

Adjusts the RPL field of a segment descriptor.

clts();

Clears the task switched flag in CR0.

hlt();

Halts the processor until an interrupt or reset comes along.

invd();

Invalidates the internal cache.

invlpg( mem );

Invalidates the TLB entry associated with the memory address specified as the source operand.

lar( r/m16, r16 );

lar( r/m32, r32 );

Load access rights from the segment descriptor specified by the first operand into the second operand.

lds( r32, m48 );

les( r32, m48 );

lfs( r32, m48 );

lgs( r32, m48 );

lss( r32, m48 );

Load a far (48-bit) segmented pointer into ds, es, fs, gs, or ss, and some other 32-bit register. Note that HLA does not support an fword data type. These instructions require a 48-bit memory operand, nonetheless. You may create your own 48-bit fword data type using a record declaration like the following:

type

fword: record

offset: dword;

selector: word;

endrecord;

lgdt( mem48 );

lidt( mem48 );

sgdt( mem48 );

sidt( mem48 );

Loads or stores the global descriptor table pointer (lgdt/sgdt) or interrupt descriptor table pointer (lidt/sidt) via the specified 48-bit memory operand. HLA does not support a 48-bit data type specifically for these instructions, but you can easily create one as follows:

type

descPtr: record

lowerLimit: word;

baseAdrs: dword;

endrecord;

lldt( r/m16 );

sldt( r/m16 );

These instructions copy the specified source operand to/from the local descriptor table.

lsl( r/m16, r16 );

lsl( r/m32, r32 );

Loads the segment limit from the segment descriptor specified by the first operand into the second operand.

ltreg( r/m16 );

streg( r/m16 );

Load and store the task register. Note that Intel uses the mnemonics "ltr" and "str" for these instructions. HLA changes these mnemonics to avoid conflicts with the commonly-used "str" namespace (the HLA strings module).

mov( r/m16, segreg );

mov( segreg, r/m16 );

Copies data between an 80x86 segment register and a 16-bit register or memory location. Note that HLA uses the following register names for the segment registers:

cseg The 80x86 CS register.

dseg The 80x86 DS register.

eseg The 80x86 ES register.

fseg The 80x86 FS register.

gseg The 80x86 GS register.

sseg The 80x86 SS register.

HLA uses these names rather than the Intel standard register names to avoid conflicts with the "cs" (cset) namespace identifier and other commonly used application identifiers. Note that CSEG may not be a destination register for the MOV instruction.

mov( r32, crx ); // note: x= 0, 2, 3, or 4.

mov( crx, r32 );

These instructions move data between one of the 32-bit registers and one of the x86's control registers. Note that HLA reserves names cr0..cr7 even though Intel doesn't currently define all eight control registers.

mov( r32, drx ); // note: x=0, 1, 2, 3, 6, 7

mov( drx, r32 );

These instructions move data between the general purpose 32-bit registers and the x86 debug registers. Note that HLA reserves names dr0..dr7 even though the assembler doesn't currently support the use of the dr4 and dr5 registers.

push( segreg );

pop( segreg );

These instructions push and pop the x86 segment registers (cseg, dseg, eseg, fseg, gseg, and sseg); see the earlier comment about HLA segment register names. Note, however, that you cannot pop the cseg register.

rdmsr();

rdpmc();

These instructions read model-specific registers or performance-monitoring registers on the x86. The ECX register specifies the register to read; these instructions copy the data to EDX:EAX.

rsm();

Resumes from system management mode.

verr( r/m16 );

verw( r/m16 );

Verifies whether the specified code segment is readable (verr) or writable (verw) from the current privilege level.

wbinvd();

Write-back and invalidate cache.

Other Instructions and features

Currently, HLA does not support 3DNow! or any other SIMD instructions found on later x86 processors. The intent is to add support in the near future.

Note that HLA does not support the LMSW and SMSW instructions (old, obsolete 286 instructions). Use MOV with CR0 instead.

In the meantime, if you need to use any of these instructions you can use the #ASM..#ENDASM and #EMIT directives to insert them into your programs. You can also use macros to implement any desired instructions or syntaxes you desire.

HLA does not currently support segment prefixes on addresses. However, it is a trivial matter to create macros with names like "csprefix" and "dsprefix" that emit the opcode prefix bytes for these segment overrides. By invoking such a macro prior to the instruction with the segment reference, you can access the data in the specified segment, e.g.,

fsprefix;

mov( [eax], eax ); // Fetches from fs:[eax].

HLA does not provide for segment overrides because HLA was intended for use in flat-model 32-bit OS environments. However, operating system kernels (even those of flat-model OSes) sometimes need to apply a segment override, hence this discussion.

Memory Addressing Modes in HLA

HLA supports all the 32-bit addressing modes of the Intel 80x86 instruction set. A memory address on the 80x86 may consist of one to three different components: a displacement (also called an offset), a base pointer, and a scaled index value. The following are the legal combinations of these components:

displacement
basePointer
displacement + basePointer
displacement + scaledIndex
basePointer + scaledIndex
displacement + basePointer + scaledIndex

The following addressing modes are legal, but are mainly useful only within an LEA instruction:

scaledIndex

scaledIndex + displacement

HLA's syntax for memory addressing modes takes the following forms:

staticVarName

staticVarName [ constant ]

staticVarName[ breg32 ]

staticVarName[ ireg32 ]

staticVarName[ ireg32*index ]

staticVarName[ breg32 + ireg32 ]

staticVarName[ breg32 + ireg32*index ]

staticVarName[ breg32 + constant ]

staticVarName[ ireg32 + constant ]

staticVarName[ ireg32*index + constant ]

staticVarName[ breg32 + ireg32 + constant ]

staticVarName[ breg32 + ireg32*index + constant ]

staticVarName[ breg32 - constant ]

staticVarName[ ireg32 - constant ]

staticVarName[ ireg32*index - constant ]

staticVarName[ breg32 + ireg32 - constant ]

staticVarName[ breg32 + ireg32*index - constant ]

localVarName

localVarName [ constant ]

localVarName[ ireg32 ]

localVarName[ ireg32*index ]

localVarName[ ireg32 + constant ]

localVarName[ ireg32*index + constant ]

localVarName[ ireg32 - constant ]

localVarName[ ireg32*index - constant ]

basereg:globalVarName

basereg:globalVarName [ constant ]

basereg:globalVarName[ ireg32 ]

basereg:globalVarName[ ireg32*index ]

basereg:globalVarName[ ireg32 + constant ]

basereg:globalVarName[ ireg32*index + constant ]

basereg:globalVarName[ ireg32 - constant ]

basereg:globalVarName[ ireg32*index - constant ]

[ breg32 ]

[ breg32 + ireg32 ]

[ breg32 + ireg32*index ]

[ breg32 + constant ]

[ breg32 + ireg32 + constant ]

[ breg32 + ireg32*index + constant ]

[ breg32 - constant ]

[ breg32 + ireg32 - constant ]

[ breg32 + ireg32*index - constant ]

The following are legal, but are only useful within the LEA instruction:

[ ireg32*index ]

[ ireg32*index + constant ]

"staticVarName" denotes any static variable currently in scope (local or global).

"localVarName" denotes a local, automatic variable declared in the VAR section of the current procedure.

"basereg" denotes any general purpose 32-bit register.

"globalVarName" denotes a non-local variable declared in the VAR section of some procedure other than the current procedure.

"breg32" denotes a base register and can be any general purpose 32-bit register.

"ireg32" denotes an index register and may also be any general purpose register, even the same register as the base register in the address expression.

"index" denotes one of the four constants "1", "2", "4", or "8". In those address expressions that have an index register without an index constant, "*1" is the default index.

Those memory addressing modes that do not have a variable name preceding them are known as "anonymous memory locations." Anonymous memory locations do not have a data type associated with them and in many instances you must use the type coercion operator in order to keep HLA happy.

Those memory addressing modes that do have a variable name attached to them inherit the base type of the variable. Read the next section for more details on data typing in HLA.

HLA allows another way to specify addition of the various addressing mode components in an address expression - by putting the components in separate brackets and concatenating them together. The following examples demonstrate the standard syntax and the alternate syntax:

Standard syntax      Alternate syntax

[ebx+2]              [ebx][2]

[ebx+ecx*4+8]        [ebx][ecx*4][8]

lbl[ebp-2]           lbl[ebp][-2]

[ ebx*8 + 5 ]        [ebx*8][5]

HLA allows this alternate syntax because you might want to construct these addressing modes inside a macro from the individual pieces, and it is much easier to concatenate two operands already surrounded by brackets than it is to pick the expressions apart and construct the standard addressing mode.
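The following sketch exercises a few of the addressing modes described in this section, including the alternate bracket syntax. The program name, the array name "table", and its contents are hypothetical.

```hla
program AddrDemo;
#include( "stdlib.hhf" )

static
    table: int32[4] := [ 10, 20, 30, 40 ];

begin AddrDemo;

    mov( 2, ebx );                  // element index
    mov( table[ ebx*4 ], eax );     // displacement + scaled index: EAX = 30

    lea( esi, table );              // ESI = address of the first element
    mov( [esi + ebx*4 + 4], ecx );  // base + scaled index + constant: ECX = 40
    mov( [esi][ebx*4][4], edx );    // same address in the alternate syntax: EDX = 40

    stdout.put
    (
        (type int32 eax), " ",
        (type int32 ecx), " ",
        (type int32 edx), nl
    );

end AddrDemo;
```

The two anonymous memory operands need no type coercion here because the 32-bit destination register determines the operand size.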

Type Coercion in HLA

While an assembly language can never really be a strongly typed language, HLA is much more strongly typed than most other assembly languages.

Strong typing in an assembly language can be very frustrating. Therefore, HLA makes certain concessions to prevent the type system from interfering with the typical assembly language programmer. Within an 80x86 machine instruction, the only checking that takes place is a verification that the sizes of the operands are compatible.

Despite HLA playing fast and loose with machine instructions, there are many times when you will need to coerce the type of some operand. HLA uses the following syntax to coerce the type of a memory location or register operand:

(type typeID memOrRegOperand)

There are two instances where type coercion is especially important: (1) when you need to assign a type other than byte, word, or dword to a register; (2) when you need to assign an anonymous memory location a type.
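As a sketch of the second case, the instructions below operate on anonymous memory locations. No operand supplies a size, so the coercion tells HLA how many bytes to access. The program and variable names and the initial value are hypothetical.

```hla
program CoerceDemo;
#include( "stdlib.hhf" )

static
    value: dword := $0102_0304;

begin CoerceDemo;

    lea( ebx, value );              // EBX = address of value

    mov( 0, (type byte [ebx]) );    // clear only the low-order byte
    inc( (type word [ebx+2]) );     // increment only the high-order word

    stdout.put( "value = $", value, nl );   // value now holds $0103_0300

end CoerceDemo;
```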

Type coercion is very useful in HLA when manipulating pointer objects, especially pointers to classes and records. Consider the following example:

type

myRec_t: record

i:int32;

c:char;

endrecord;

mrPtr_t: pointer to myRec_t;

static

mpr: mrPtr_t;

malloc( @size( myRec_t ) );

mov( eax, mpr );

mov( mpr, ebx );

mov( cl, (type myRec_t [ebx]).c );

mov( 0, (type myRec_t [ebx]).i );

As you can see here, whatever memory address appears inside the parentheses is treated like an object of the specified type. So you can treat that whole entity as though it were a variable of the specified type ( myRec_t in this example) and you can apply the dot operator or any other operation that would be legal on a variable of that type.

By default, the x86 general purpose registers have the types byte, word, or dword (depending, of course, on their size). Sometimes you might want to coerce these registers to a different type, especially when outputting the value of a register or comparing a register with a constant. Coercion of a register is perfectly legal as long as the coerced data type is the same size as the register, e.g.,

(type int32 eax)

A coercion like this last example is especially useful when using the register within an output statement (like stdout.put) or in a run-time boolean expression. Consider the following:

if( eax < 0 ) then

<< do something if EAX is negative>>

endif;

In this example, the expression is always false because EAX is a dword object (which is unsigned). Therefore, EAX can never be less than zero (even if EAX contains something that you want interpreted as a negative value). You can solve this problem by coercing EAX to an INT32 object:

if( (type int32 eax) < 0 ) then

<< do something if EAX is negative>>

endif;

This code example will work properly since HLA is smart enough to generate the appropriate signed comparison/conditional jump sequence when it realizes one or more of the operands are signed.

- (negation) operator in constant expressions 60

- operator (subtraction, set difference) in constant expressions 63

operator in constant expressions 63

Symbols

62, 63

! (not operator) in constant expressions 58

!( boolean_expression ) 151

!= operator in constant expressions 63

operator (in #asm..#endasm sections) 182

#(...)# in macro parameters 134

#(...)# macro quoting symbols 139

#{ ... }# sequence for manually passing parameters 85

#{...}# parameter quoting mechanism 79

#{...}# sequence (to create thunks) 83

#{...}#" code brackets in boolean expressions 152

#asm..#endasm directives 181

#closeread compile-time statement 184

#closewrite statement 183

#code directive 130

#const directive 130

#else clause 185

#elseif clause in #if statement 185

#emit directive 181

#endif clause 185

#ERROR directive 183

#for..#endfor statement 185

#if statement 184

#Include directive 179

#include directive 190

#IncludeOnce directive 180

#KEYWORD reserved word 134

#openread compile-time statement 184

#openwrite statement 183

#PRINT directive 183

#readonly segment 130

#static segment 130

#storage segment 130

#TERMINATOR keyword 134

#text..#endtext statement 179

#while..#endwhile statement 185

#write statement 183

& operator in constant expressions 64

&& operator in boolean expressions 151

* (multiplication) operator in constant expressions 61

+ (addition, set union, string concatenation) operator in constant expressions 63

.bss section 28

.bss segment name 130

.code segment name 130

.data section 28

.data segment name 130

.edata section 28

.link files 131

.text section 28

/ (division) operator in constant expressions 62

= operator in constant expressions 63

== operator in constant expressions 63

> operator in constant expressions 63

>= operator in constant expressions 63

>> operator (shift right) in constant expressions 62

-@ command line option (linker response file) 25

@a 151

@abs function 162

@addofs1st function 175

@ae 151

@align 72

@Align procedure option 67

@alignstack 72

@alignstack procedure option 67, 68

@bound pseudo-variable 213

@byte function 162

@c 151

@Cdecl procedure option 77, 121, 189

@cdecl procedure option 67, 69

@curobject function 176

@curoffset function 175

@date function 162

@defined function 174

@delete function 164

@dim function 173

@display 71

@display procedure option 68

@e 151

@elements function 174

@elementsize function 173

@enter 72

@enter procedure option 70

@enumsize function 176

@EOS function 171

@eval function 140

@exactlynChar function 168

@exactlynCset function 166

@exactlyniChar function 169

@exactlyntomChar function 169

@exactlyntomCset function 167

@exactlyntomiChar function 169

@exceptions function 176

@exp function 163

@External option (in variable declarations) 129

@External procedures 71

@EXTERNAL reserved word 77

@extract function 163

@firstnChar function 168

@firstnCset function 166

@firstniChar function 169

@floor function 163

@FORWARD declarations 77

@isalpha function 163

@isalphanum function 163

@isclass function 175

@isconst function 174

@isdigit function 163

@IsExternal function 173

@isfreg function 175

@islower function 163

@ismem function 175

@isreg function 174

@isreg16 function 175

@isreg32 function 175

@isreg8 function 175

@isspace function 163

@istype function 175

@isupper function 163

@isxdigit function 163

@l 151

@lastobject function 175

@le 151

@leave 72

@leave procedure option 70

@length function 165

@lex function 173

@linenumber function 141, 175

@localoffset function 176

@locals function 174

@log function 164

@log10 function 164

@lowercase function 165

@matchID function 170

@matchIntConst function 170

@matchiStr function 169

@matchNumericConst function 170

@matchRealConst function 170

@matchStr function 169

@matchStrConst function 170

@matchToiStr function 169

@matchToStr function 169

@max function 164

@min function 164

@minparmsize function 176

@noalignstack procedure option 67, 68

@nodisplay 71

@Nodisplay option 89

@nodisplay procedure option 68

@noenter 72

@noenter procedure option 70

@noframe 72

@noframe procedure option 68

@noleave 72

@noleave procedure option 70

@nOrLessChar function 168

@nOrLessCset function 167

@nOrLessiChar function 169

@nOrMoreChar function 168

@nOrMoreCset function 167

@nOrMoreiChar function 169

@NOSTORAGE 127

@nostorage option in static sections 124

@ns 151

@ntomChar function 168

@ntomCset function 167

@ntomiChar function 169

@oneChar function 167

@oneCset function 166

@oneiChar function 169

@oneOrMoreChar function 168

@oneOrMoreCset function 167

@oneOrMoreiChar function 169

@oneOrMoreWS function 171

@optstring function 177

@parmoffset function 176

@Pascal procedure option 77, 121, 189

@pascal procedure option 67, 69

@pclass function 174

@peekChar function 167

@peekCset function 166

@peekiChar function 169

@peekWS function 171

@ptype function 171

@random function 164

@randomize function 164

@read compile-time function 184

@Returns procedure option 67, 110, 121, 189

@returns procedure option 69

@rindex function 165

@s 151

@section function 178

@sin function 164

@size function 173

@sqrt function 164

@staticname function 173

@Stdcall procedure option 77, 121, 189

@stdcall procedure option 67, 69

@strbrk function 165

@string

operator 178

@string operator 138

@strset function 165

@strspan function 165

@tokenize function 165

@tostring

Shift left operator (<<) in constant expressions 62

Shift right operator (>>) in constant expressions 62

Sign and zero extension instructions 207

Signed data types 39

Signed vs. unsigned comparisons in boolean expressions 149

Special symbols and punctuation 30

Specifying an external symbol's name 189

Specifying the executable file name 23

Stack size 26

Static class fields 101

Static member functions 99

Static section 121

STATIC..ENDSTATIC directives 122

stc instruction 198

std instruction 198

Stdcall procedure option 110, 191

sti instruction 198

Storage section 126

stosb instruction 198

stosd instruction 198

stosw instruction 198

String concatenation operator (+) in constant expressions 63

String constants 51

String data type 39

Structured constants 52

sub instruction 198

Subtraction operator (-) in constant expressions 63

Symbol table display 21

T

This (reference to class object) 98

THUNK constants (pass by name/lazy parameters) 83

Thunk data type 39

Thunks 46

TRY..EXCEPTION..ENDTRY statement 142

Type coercion 227

Type sections 109

U

UCR Standard Library for 80x86 Programmers 11

ud2 instruction 198

Unicode Character Constants 50

Unicode String Constants 51

Union constants 53

Union data types 40

Units 188

Unlabelled data objects 123

UNPROTECTED clause in the TRY..ENDTRY statement 145

Uns16 data type 39

Uns32 data type 39

Uns8 data type 39

Unsigned data types 39

Untyped reference parameters 67, 82

user32.lib 29

User-defined compilation errors 183

User-defined exceptions 144

V

Val (pass by value) parameter option 67

Val sections 114

Valres (pass by value/result) parameter option 67

Value parameters 79

Value/Result parameters 80

Var (pass by reference) parameter option 67

VAR (untyped reference parameters) 67

Var sections 117

Var type (untyped reference parameters) 83

Variable parameter lists in macros 132

Virtual member functions 99

Virtual Method Table pointer 97

Virtual method tables 97

VMT 97

W

wait instruction 198

wbinvd instruction 198

Webster 13

WHILE..ENDWHILE statement 152

Windows (GUI) applications 21

Windows API external names 189

Windows Structured Exception Handler 144

Word data type 39

Word segment alignment 125

X

Y

Yield 46

Z

Zero extension instruction 207

1. This section will use the term "HLA/86" when specifically talking about the High Level Assembler product this documentation describes and will use "HLA" as a generic term. After this section, this documentation will use the term "HLA" to specifically describe the "HLA/86" product.

2. You must admit, though, HLA's documentation is better than that of most free software.

3. The ".exe" suffix appears only in the Windows version.

4. Windows object files use the ".obj" suffix while Linux object files have the ".o" suffix. Although Linux users who write assembly code with Gas typically use a ".s" or ".S" suffix, HLA still uses ".asm" since Gas happily accepts this.

5. Or other compatible assembler/linker combination.

6. The output above is obviously for HLA v1.21. Later versions of the compiler produce different output. In particular, v1.25 and later use linker response files rather than supplying all the information directly to the linker on the command line.

7. Strictly speaking, this is not true. Win32 programs do have multiple 80x86 segments. For example, the FS segment register points at the process information for a given Win32 process.

8. HLA actually translates the "-o xxxx" option into the corresponding "-out:xxxx.exe" option. If you use the "-r+" command line option, HLA will put this command in the linker response file for you; in that case, you don't have to specify it on the command line.

9. For C/C++ programmers: an HLA record is similar to a C struct. In language design terminology, a record is often referred to as a "cartesian product."

10. As this is being written, HLA doesn't fully support wchar or wstring types; ultimately the support will appear and you can add the sets {char, wchar} and {string, wstring} to the list.

11. In the future, HLA may also promote char objects to wchar and string objects to wstring. However, this is not functional as of this writing.

12. In theory, this should never happen since HLA maintains boolean values as zero or one.

13. This section only discusses procedure declarations. Other sections will describe iterators and methods.

14. Static variables are those you declare in the static, readonly, and storage sections. Non-static variables include parameters, VAR objects, and anonymous memory locations.

15. Strictly speaking, this isn't true. The nested procedure has access to all global variables that were declared before the procedure's declaration.

16. It is important that all nested procedures construct the display. You couldn't use the @nodisplay option in lex1 and expect lex2 to properly build the display. In general, unless you know exactly what you are doing, your procedures should all have the @nodisplay option, or none of them should have it.

17. Of course, you may create class variables (objects) by specifying the class type name in the var or static sections.

18. Actually, HLA was designed this way because far too often programmers make fields private and other programmers decide they really needed access to those fields, software engineering be damned. HLA relies upon the discipline of the programmers to stay out of trouble on this matter.

19. Note that the syntax is override, not overrides (which is used for overriding data fields). This is an unfortunate consequence of HLA's grammar.

20. When calling a class procedure, HLA never disturbs the value in the EDI register. EDI is only tweaked when you call methods.

21. Of course, it is the caller's responsibility to save this pointer away into an object pointer variable upon return from the class procedure.

22. Note, however, that HLA may automatically allocate storage for a display within the procedure. If you do not specify the @nodisplay procedure option, then the starting offset will be some negative number (depending on the lex level) to allow room for the display array. This is why the main program's current offset always starts at -4: HLA always allocates storage for a four-byte display entry for the main program (there is no way to specify @nodisplay for the main program).

23. Currently, this feature is available only under Windows as of HLA v1.32; plans are to add it to the Linux version at some point in the future. Please see the HLA change log to see if this feature has been added to the version you're using.

24. This feature depends upon operating system support.

25. Actually, HLA doesn't enforce this mutual exclusivity. However, if more than one of these options appears in a declaration, HLA only uses the last such declaration.

26. HLA's iterators are based on the similar control structure from the CLU language. CLU's iterators are considerably more powerful than the misnamed "iterators" found in the C/C++ language/library (which, technically, should be called "cursors" not iterators).

27. Mind you, this is not a very efficient implementation of a standard for loop.

28. Technically, yield is a variable of type thunk, not a statement. However, this discussion is somewhat clearer if we think of yield as a statement rather than a variable.

29. Actually, the HLA.EXE program allows you to specify several ".HLA" files on the command line. The command line option "-c" is only necessary if none of the files on the command line contain a main program.

30. For the purposes of this discussion, variables appearing in the READONLY and STORAGE sections are treated as static variables along with variables declared in the STATIC section.

31. Because HLA emits MASM source code as its output, you must take care not to use any MASM reserved words as HLA external procedure names. Otherwise, MASM will generate an error when it attempts to assemble HLA's output.

32. Or when the HLA procedure name is a MASM reserved word.

33. However, since HLA emits the identifier to the MASM assembly language output file, the external identifier must be MASM compatible.

34. It does not support the 16-bit addressing modes since these are not very useful under Win32 or Linux.

35. Probably the most common case is treating a register as a signed integer in one of HLA's high level language statements. See the section on HLA High Level Language statements for more details.

HLA Language Reference and User Manual

Overview

What is a "High Level Assembler"?

What is an "Assembler"

HLA Design Goals

How to Learn Assembly Programming Using HLA

Legal Notice

Installing HLA Under Windows

Installing HLA Under Linux

Using the HLA Command-Line Compiler

Manually Assembling and Linking HLA Output Under Windows

The -w Option

The "-e" Option

The "-o:omf" and "-o:win32" Options

The "-s" Option

The "-sm" Option

The "-st" Option

Assembler Selection Options

The "-c" Option

The "-axxxxxx" Option

The "-@" Option Under Windows

MAKE Files and the Linker Response File Under Windows

The "-subsystem:console" Option

The "/heap:0x1000000,0x1000000" Option

The "/stack:0x1000000,0x1000000" Option

The "/base:0x3000000" Option

The "/machine:IX86" Option

The "/section:XXXXX" Options

The "-entry:?HLAMain" Option

The Library Files Options

The LINKER Command Line

HLA Language Elements

Comments

Special Symbols

Reserved Words

: HLA Reserved Words

External Symbols and Assembler Reserved Words

HLA Identifiers

External Identifiers

Data Types in HLA

Native (Primitive) Data Types in HLA

Composite Data Types

Array Data Types

Union Data Types

Record Data Types

Pointer Types

Thunks

Class Types

Literal Constants

Numeric Constants

Decimal Constants

Hexadecimal Constants

Binary Constants

Numeric Set Constants

Real (Floating Point) Constants

Boolean Constants

Character Constants

Unicode Character Constants

String Constants

Unicode String Constants

Character Set Constants

Structured Constants

Array Constants

Record Constants

Union Constants

Pointer Constants

Constant Expressions in HLA

Type Checking and Type Promotion

!expr

- expr (unary negation operator)

expr1 * expr2

expr1 div expr2

expr1 mod expr2

expr1 / expr2

expr1 << expr2

expr1 >> expr2

expr1 + expr2

expr1 - expr2

Comparisons (=, ==, <>, !=, <, <=, >, and >=)

expr1 & expr2