1 Overview

2 What is a "High Level Assembler"?

3 What is an "Assembler"

4 HLA Design Goals

5 How to Learn Assembly Programming Using HLA

6 Legal Notice

7 Installing HLA Under Windows

8 Installing HLA Under Linux

9 Using the HLA Command-Line Compiler

10 Manually Assembling and Linking HLA Output Under Windows

10.1 The -w Option

10.2 The "-e" Option

10.3 The "-o:omf" and "-o:win32" Options

10.4 The "-s" Option

10.4.1 The "-sm" Option

10.4.2 The "-st" Option

10.5 Assembler Selection Options

10.6 The "-c" Option

10.7 The "-axxxxxx" Option

10.8 The "-@" Option Under Windows

10.9 MAKE Files and the Linker Response File Under Windows

10.9.1 The "-subsystem:console" Option

10.9.2 The "/heap:0x1000000,0x1000000" Option

10.9.3 The "/stack:0x1000000,0x1000000" Option

10.9.4 The "/base:0x3000000" Option

10.9.5 The "/machine:IX86" Option

10.9.6 The "/section:XXXXX" Options

10.9.7 The "-entry:?HLAMain" Option

10.9.8 The Library Files Options

10.9.9 The LINKER Command Line

11 HLA Language Elements

11.1 Comments

11.2 Special Symbols

11.3 Reserved Words

11.4 External Symbols and Assembler Reserved Words

11.5 HLA Identifiers

11.6 External Identifiers

11.7 Data Types in HLA

11.7.1 Native (Primitive) Data Types in HLA

11.7.2 Composite Data Types

11.7.2.1 Array Data Types

11.7.2.2 Union Data Types

11.7.2.3 Record Data Types

11.7.2.4 Pointer Types

11.7.2.5 Thunks

11.7.2.6 Class Types

11.8 Literal Constants

11.8.1 Numeric Constants

11.8.1.1 Decimal Constants

11.8.1.2 Hexadecimal Constants

11.8.1.3 Binary Constants

11.8.1.4 Numeric Set Constants

11.8.1.5 Real (Floating Point) Constants

11.8.2 Boolean Constants

11.8.3 Character Constants

11.8.4 Unicode Character Constants

11.8.5 String Constants

11.8.6 Unicode String Constants

11.8.7 Character Set Constants

11.8.8 Structured Constants

11.8.8.1 Array Constants

11.8.8.2 Record Constants

11.8.8.3 Union Constants

11.8.8.4 Pointer Constants

11.9 Constant Expressions in HLA

11.9.1 Type Checking and Type Promotion

11.9.2 !expr

11.9.3 - expr (unary negation operator)

11.9.4 expr1 * expr2

11.9.5 expr1 div expr2

11.9.6 expr1 mod expr2

11.9.7 expr1 / expr2

11.9.8 expr1 << expr2

11.9.9 expr1 >> expr2

11.9.10 expr1 + expr2

11.9.11 expr1 - expr2

11.9.12 Comparisons (=, ==, <>, !=, <, <=, >, and >=)

11.9.13 expr1 & expr2

11.9.14 expr1 in expr2

11.9.15 expr1 | expr2

11.9.16 expr1 ^ expr2

11.9.17 ( expr )

11.9.18 [ comma_separated_list_of_expressions ]

11.9.19 record_type_name : [ comma_separated_list_of_field_expressions ]

11.9.20 identifier

11.9.21 identifier1.identifier2 {...}

11.9.22 identifier [ index_list ]

11.10 Program Structure

11.10.1 Procedure Declarations

11.10.1.1 Disabling HLA's Automatic Code Generation for Procedures

11.10.2 Procedure Calls and Parameters in HLA

11.10.3 Calling HLA Procedures

11.10.4 Parameter Passing in HLA, Value Parameters

11.10.5 Parameter Passing in HLA, Reference, Value/Result, and Result Parameters

11.10.5.1 Untyped Reference Parameters

11.10.5.2 Parameter Passing in HLA, Name and Lazy Evaluation Parameters

11.10.5.3 Hybrid Parameter Passing in HLA

11.10.5.4 Parameter Passing in HLA, Register Parameters

11.11 Lexical Scope

11.12 Class Data Types

11.12.1 Classes, Objects, and Object-Oriented Programming in HLA

11.12.2 Inheritance

11.12.3 Abstract Methods

11.12.4 Classes versus Objects

11.12.5 Initializing the Virtual Method Table Pointer

11.12.6 Creating the Virtual Method Table

11.12.7 Calling Methods and Class Procedures

11.12.8 Non-object Calls of Class Procedures

11.12.9 Static Class Fields

11.13 Program Unit Initializers and Finalizers

11.14 Declarations

11.14.1 Label Section

11.14.2 Type Section

11.14.3 Const Section

11.14.4 Val Section

11.14.5 Var Section

11.14.6 Static Section

11.14.7 Segments

11.14.8 Readonly Section

11.14.9 Storage Section

11.14.10 Variable Options

11.14.10.1 The @NOSTORAGE Option

11.14.10.2 The @VOLATILE Option

11.14.10.3 The @PASCAL, @CDECL, and @STDCALL Options

11.14.10.4 The @RETURNS Option

11.14.10.5 The @EXTERNAL Option

11.14.11 Segment Names

11.14.12 HLA ".link" Files (Windows Only)

11.14.13 Namespaces

11.14.14 Macros

11.14.14.1 Standard Macros

11.14.14.2 Multi-part (Context Free) Macro Invocations:

11.14.14.3 Macro Invocations and Macro Parameters:

11.14.14.4 Processing Macro Parameters

11.15 HLA High Level Language Statements

11.15.1 Exception Handling in HLA

11.15.2 The IF..THEN..ELSEIF..ELSE..ENDIF Statement in HLA

11.15.3 Boolean Expressions for High-Level Language Statements

11.15.4 The WHILE..ENDWHILE Statement in HLA

11.15.5 The REPEAT..UNTIL Statement in HLA

11.15.6 The FOR..ENDFOR Statement in HLA

11.15.7 The FOREVER..ENDFOR Statement in HLA

11.15.8 The BREAK and BREAKIF Statements in HLA

11.15.9 The CONTINUE and CONTINUEIF Statements in HLA

11.15.10 The BEGIN..END, EXIT, and EXITIF Statements in HLA

11.15.11 The JT and JF Medium Level Instructions in HLA

11.15.12 Iterators and the HLA Foreach Loop

11.16 HLA Compile-Time Language and Pragmas

11.16.1 Built-in Functions:

11.16.1.1 Constant Type Conversion Functions

11.16.1.2 Bitwise Type Transfer Functions

11.16.1.3 General functions

11.16.2 String functions:

11.16.3 String/Pattern matching functions

11.16.4 Symbol and constant related functions and assembler control functions

11.16.5 Pseudo-Variables

11.16.6 Text emission functions

11.16.7 Miscellaneous Functions

11.16.8 #Text and #endtext Text Collection Directives

11.16.9 The #Include Directive

11.16.10 The #IncludeOnce Directive

11.16.11 The #asm..#endasm and #emit Directives

11.16.12 The #system Directive

11.16.13 The #print and #error Directives

11.16.14 Compile-Time File Output (#openwrite, #write, #closewrite)

11.16.15 Compile-time File Input (#openread, @read, #closeread)

11.16.16 The Conditional Compilation Statements (#if)

11.16.17 The Compile-Time Loop Statements (#while and #for)

11.16.18 Compile-Time Functions (macros)

11.17 HLA Units and External Compilation

11.17.1 External Declarations

11.17.2 HLA Naming Conventions and Other Languages

11.17.3 HLA Calling Conventions and Other Languages

11.17.4 Calling Procedures Written in a Different Language

11.17.5 Calling HLA Procedures From Another Language

11.17.6 Linking in Code Written in Other Languages

12 The 80x86 Instruction Set in HLA

12.1 Zero Operand Instructions (Null Operand Instructions)

12.2 General Arithmetic and Logical Instructions

12.3 The XCHG Instruction

12.4 The CMP Instruction

12.5 The Multiply Instructions

12.6 The Divide Instructions

12.7 Single Operand Arithmetic and Logical Instructions

12.8 Shift and Rotate Instructions

12.9 The Double Precision Shift Instructions

12.10 The Lea Instruction

12.11 The Sign and Zero Extension Instructions

12.12 The Push and Pop Instructions

12.13 Procedure Calls

12.14 The Ret Instruction

12.15 The Jmp Instructions

12.16 The Conditional Jump Instructions

12.17 The Conditional Set Instructions

12.18 The Conditional Move Instructions

12.19 The Input and Output Instructions

12.20 The Interrupt Instruction

12.21 Bound Instruction

12.22 The Enter Instruction

12.23 CMPXCHG Instruction

12.24 CMPXCHG8B Instruction

12.25 The XADD Instruction

12.26 BSF and BSR Instructions

12.27 The BSWAP Instruction

12.28 Bit Test Instructions

12.29 Floating Point Instructions

12.30 Additional Floating Point Instructions for Pentium Pro and Later Processors

12.31 MMX Instructions

12.32 OS/Privileged Mode Instructions

12.33 Other Instructions and features

13 Memory Addressing Modes in HLA

14 Type Coercion in HLA

HLA Language Reference and User Manual

Modification History:

v1.39: Updated document to reflect the new VAL operator (for actual parameters) and Unicode support.

v1.38: Discusses the new VAR section alignment and offset assignment options. Discusses the new union constant syntax. Describes the new #for..#endfor compile-time loops.

v1.37: Updated the discussion of constant expressions to describe the 128-bit arithmetic capabilities of HLA v1.37. Added NULL keyword and a brief discussion of its use. Described the new type transfer compile-time functions (@byte, @uns8, @int8, etc.).

v1.36: Began the modification history for this document. Note that version numbers correspond to HLA version numbers.

 

Overview

HLA, the High Level Assembler, is a vast improvement over traditional assembly languages. With HLA, programmers can learn assembly language, and write assembly code, faster than ever before. John Levine, the comp.compilers moderator, makes the case for a language like HLA when describing the PL/360 machine-specific language:

 

1999/07/11 19:36:51, the moderator wrote:

"There's no reason that assemblers have to have awful syntax. About 30 years ago I used Niklaus Wirth's PL360, which was basically a S/360 assembler with Algol syntax and a little syntactic sugar like while loops that turned into the obvious branches. It really was an assembler, e.g., you had to write out your expressions with explicit assignments of values to registers, but it was nice. Wirth used it to write Algol W, a small fast Algol subset, which was a predecessor to Pascal. ... -John"

 

PL/360, and variants that followed like PL/M, PL/M-86, and PL/68K, were true "mid-level languages" that let you work down at the machine level while using more modern control structures (i.e., those loosely based on the PL/I language). Although many refer to C as a "medium-level language," C is truly high level when compared with languages like PL/*. The PL/* languages were very popular with those who needed the power of assembly language in the early days of the microcomputer revolution. While it's stretching the point to say that PL/M is "really an assembler," the basic idea is sound: there really is no reason that assemblers have to have an awful syntax.

HLA bridges the gap between very low level languages and very high level languages. Unlike the PL/* languages, HLA really is an assembly language. You can do just about anything with HLA that you can do with a traditional assembler like MASM, TASM, NASM, or Gas. If you want to write low-level assembly code using x86 machine instructions, HLA does not get in your way; if you want to use compares and conditional branches rather than structured control statements, you can. On the other hand, if you prefer to use more readable high-level control structures, HLA allows this, as well. HLA lets you work at the level you are most comfortable with and at the level that is most appropriate for the task at hand.
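As a rough illustration of that equivalence, the following Python sketch lowers a simple structured IF into a compare followed by a conditional jump that skips the THEN body. It is not HLA's actual code generator: the function name `lower_if`, the label naming, the Intel-style output syntax, and the choice of signed jumps (jge, jl, and so on) are all assumptions made for the example.

```python
from __future__ import annotations

# Hypothetical sketch: lowering a structured IF into compare-and-branch form.
# NOT HLA's real code generator; output syntax and label names are assumptions.

# For each comparison, the jump taken when the condition is FALSE, so that it
# skips over the THEN body. Signed comparisons are assumed (jl/jg family).
SKIP_JUMP = {
    "=": "jne",
    "<>": "je",
    "<": "jge",
    "<=": "jg",
    ">": "jle",
    ">=": "jl",
}


def lower_if(left: str, op: str, right: str, then_body: list[str], label: str) -> list[str]:
    """Return assembly lines equivalent to: if( left op right ) then <then_body> endif"""
    lines = [f"cmp {left}, {right}", f"{SKIP_JUMP[op]} {label}"]
    lines.extend(then_body)     # THEN statements run (fall through) when the test is TRUE
    lines.append(f"{label}:")   # target of the skip jump
    return lines


if __name__ == "__main__":
    for line in lower_if("eax", "<", "10", ["inc eax"], "skip_then"):
        print(line)
```

Calling `lower_if("eax", "<", "10", ["inc eax"], "skip_then")` yields `cmp eax, 10`, `jge skip_then`, `inc eax`, and `skip_then:`, which is the kind of sequence a programmer writes by hand when choosing compares and branches over a structured statement.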

Beyond supplying a "non-awful" syntax, HLA has one other important feature: it's extensible. HLA provides special features that let you add new statements to the language. So if HLA is not "high level" (or "low level") enough for your tastes, you can extend it. A later section of this document describes in considerable detail exactly how to do this.

In addition to the HLA language itself, the HLA system provides one other very important component: the HLA Standard Library. This is a collection of hundreds of functions that you can use to write assembly language programs as quickly and easily as you would write C programs.

What is a "High Level Assembler"?

The name "High Level Assembler" and its abbreviation "HLA" are certainly not new[1]. Nor is the concept of a high level assembler. David Salomon, in his 1992 text "Assemblers and Loaders" (Ellis Horwood, ISBN 0-13-052564-2), uses these terms to describe various assembly languages dating back to 1966. Furthermore, both IBM and Motorola have assembler products with very similar names (e.g., IBM's HLAsm, though it's somewhat debatable whether HLAsm is truly a high level assembler).

Salomon offers the following definitions for a High Level Assembler (or HLA):

A high-level assembler language (HLA) is a programming language where each instruction is translated into a few machine instructions. The translator is somewhat more complex than an assembler, but much simpler than a compiler. Such a language should not have features like the if, for, and case control structures, complex arithmetic, logical expressions, and multi-dimensional arrays. It should consist of simple instructions, closely resembling traditional assembler instructions, and of a few simple data types.

Since Salomon describes a couple of high level assemblers that exceed this definition, he offers a second definition for high level assemblers that is a bit higher-level:

A high-level assembler language (HLA) is a language that combines most of the features of higher-level languages (easy to use control structures, variables, scope, data types, block structure) with one important feature of assembler languages, namely, machine dependence.

Neither definition is particularly useful for describing HLA/86 and other HLAs like Terse, MASM and TASM. Of course the term "High Level Assembler" is very nebulous and offers a fair amount of latitude. Almost any macro assembler could pass as an HLA on the basis that a macro-instruction expands into a few machine instructions.

David Salomon describes several different high level assemblers in his text. The examples he describes are PL/360, NEAT/3, PL516, and BABBAGE.

PL/360 and PL516 are products that conform to the second definition above. They allow simple arithmetic expressions and assignment statements, the use of high level control structures (if, for, while, etc.), high level data declarations, and block structure (among other things). These languages expose the underlying machine's registers and allow the use of machine instructions using a "functional" syntax.

The NEAT/3 language is a much lower-level language; basically, it is an assembly language for the NCR Century computers that provides COBOL-style data declarations. Most of its "instructions" translate one-for-one into Century machine instructions, though it does automatically insert code to convert data types from one format to another if the data types of an instruction's operands are incompatible.

The BABBAGE assembly language is an expression-based assembly language (very similar to Terse). It allows simplified high level control structures like if and while. The interesting thing about this assembler is that it was the only assembler for the GEC4000 family of computers.

In addition to the HLAs that Salomon describes, there have been several other high level assemblers created over the years. PL/M and PL/M-86 were designed by Intel for its 8080 and 8086 CPU families. These were obvious adaptations of the PL/360-style HLA for Intel's CPUs. PL/68 was also available for the Motorola 680x0 family. SL/65 was a similar adaptation of PL/360 for the 6502 family. At one point there was a product named "High Level Assembler" for the Atari ST system (68K based). Jim Neil has also created an expression-based high level assembler (similar in principle to BABBAGE) for Intel's x86 family. MASM and TASM (for the x86) also fall into the category of a high level assembler due to their inclusion of high level control structures and logical expressions.

So where does HLA/86 fit into these definitions? In truth, HLA/86 falls somewhere between them. The following paragraphs therefore define the term "High Level Assembler" as it should apply to HLA/86 and similar high level assemblers.

The first definition above is overly restrictive. It implies that any language that exceeds these limits is a high level language, not a high level assembly language or a traditional assembly language. By this definition, many traditional assemblers would have to be considered high level languages (even beyond a high level assembler). At the same time, it elevates many traditional assemblers to the status of an HLA even though we wouldn't normally think of them as high level assemblers; i.e., most macro assemblers provide the ability to create instructions that translate into a few machine instructions. Macro facilities, however, are something we expect out of a modern assembly language; their presence doesn't make the language a "high level" assembly language in most people's minds. Furthermore, most modern assemblers provide a mechanism for declaring multi-dimensional arrays (even though you still have to use some sequence of instructions to index into said arrays).

The second definition David Salomon provides hits the other extreme. Arguably, languages like C could be called HLAs under this definition (yes, there are some machine dependent features in C, though probably not enough to satisfy David Salomon's original intent).

High level assemblers like Terse, MASM, TASM, and HLA/86 fall somewhere between these extremes. Therefore, this document will define a high level assembler as follows:

A "high level assembly language" (HLAL) is a language that provides a set of statements or instructions that practically map one-to-one to machine instructions of the underlying architecture. The HLAL exposes the underlying machine architecture including access to machine registers, flags, memory, I/O, and addressing modes. Any operation that is possible with a traditional assembler should be possible within the HLAL. In addition to providing access to the underlying architecture, the HLAL must provide some abstractions that are not normally found in traditional assemblers and that are typically found in traditional high level languages; this could include structured control statements (e.g., if, for, and while), high level data types and data structuring facilities, extensive compile-time language facilities, run-time expression evaluation, and standard library support. A "High Level Assembler" is a translator that converts a high level assembly language to machine code.

 

There is a very important difference between this definition and the ones that David Salomon provides. Specifically, a high-level assembly language must provide access to the underlying machine architecture. Within the HLAL you must be able to specify any (reasonable) machine instruction that is available on the CPU. The HLAL may provide other statements that do not directly map to machine instructions (e.g., an if statement), but it must, at least, provide a set of statements that practically map one-to-one with the machine instructions. The "practically" modifier appears here for two reasons. First, some assembly source statements may map to two or more different, but equivalent, machine instructions. A good example is the x86 "mov reg, reg" instruction, which can map to two different (though equivalent) opcodes depending on the setting of the direction bit in the opcode. Most assemblers will map the source statement to only one of these opcodes, hence there is not truly a one-to-one mapping (some opcodes do not map back to any source instruction). Second, the HLAL may disallow special "protected mode instructions" if the language is intended only for user-mode programming (as is the case for HLA/86).
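To make the two-opcode point concrete, here is a small Python sketch that produces both encodings of a 32-bit register-to-register MOV and decodes each back to the same instruction. The opcode values (0x89 and 0x8B), the position of the direction bit, and the ModR/M layout follow the standard IA-32 encoding; the helper names are hypothetical, and the operands use Intel order (destination first), the reverse of HLA's mov( source, dest ) form.

```python
from __future__ import annotations

# Sketch: the two equivalent encodings of a 32-bit register-to-register MOV.
# 0x89 = MOV r/m32, r32 (direction bit d = 0: ModR/M "reg" field is the source)
# 0x8B = MOV r32, r/m32 (direction bit d = 1: ModR/M "reg" field is the destination)
# Helper names are hypothetical; operands are in Intel order (dest, src).

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def modrm(reg: int, rm: int) -> int:
    """Build a ModR/M byte with mod = 11b (register-direct operand)."""
    return 0b11_000_000 | (reg << 3) | rm


def encodings(dest: str, src: str) -> tuple[bytes, bytes]:
    """Return the (d = 0, d = 1) encodings of MOV dest, src."""
    d, s = REG32.index(dest), REG32.index(src)
    form_89 = bytes([0x89, modrm(reg=s, rm=d)])  # reg field holds the source
    form_8b = bytes([0x8B, modrm(reg=d, rm=s)])  # reg field holds the destination
    return form_89, form_8b


def decode(code: bytes) -> tuple[str, str]:
    """Decode either form back to (dest, src) using the opcode's direction bit."""
    opcode, m = code
    reg, rm = (m >> 3) & 0b111, m & 0b111
    direction = (opcode >> 1) & 1  # bit 1 of the opcode is the d bit
    if direction:
        return REG32[reg], REG32[rm]  # 0x8B: reg <- r/m
    return REG32[rm], REG32[reg]      # 0x89: r/m <- reg


if __name__ == "__main__":
    a, b = encodings("eax", "ebx")
    print(a.hex(" "), "|", b.hex(" "))  # two distinct byte sequences...
    print(decode(a), decode(b))         # ...that decode to the same instruction
```

For MOV EAX, EBX the two forms are the bytes 89 D8 and 8B C3, and the sketch decodes both to ('eax', 'ebx'); an assembler that always picks one form leaves the other opcode with no source statement that produces it.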

In addition to supporting the underlying machine architecture (which almost any traditional assembler will do), the HLAL must also provide support for some features normally found in a high level language. The definition does not require that an HLAL support all the features listed above, nor is it restricted to just the features listed, but an HLAL must support some of the features traditionally found in a high level language. The number and type of features the HLAL supports determine how "high level" the assembly language is. Like HLLs, we can have "low-level" HLALs, "medium-level" HLALs, "high-level" HLALs, and even "very high-level" HLALs. NEAT/3, for example, would be a low-level HLAL since it provides higher-level data types, conversions, and not much else.

MASM and TASM are probably best considered medium-to-high-level HLALs since they provide high level data structuring facilities, structured control statements, high level procedure definitions and invocations, a limited block structure, powerful compile-time language (macro) facilities, standard library support (e.g., the UCR Standard Library and many other available library modules), and other high level language features. In actual use, the programmer is expected to normally use standard machine instructions and rise up to the high level statements only as necessary.

The Terse language is a good example of a medium-level HLAL since it uses an expression syntax but otherwise maps statements fairly closely to their assembly counterparts. It does provide some higher-level data structuring capabilities, though these are inherited from the underlying assembler(s) on which Terse is based.

PL/360 and PL516 are definitely high-level HLALs because they fully support simplified arithmetic expressions, control structures, high-level data types, and other features. These languages provide access to the underlying architecture, but the emphasis is on using these languages as high level languages and dropping down to machine instructions only as necessary.

HLA/86 probably falls in the high-level-to-very-high-level range because it provides high level data types and data structuring abilities, high level and very high level control structures, extensive parameter passing facilities (more than most high level languages), a very extensive compile-time language, a very extensive standard library, built-in parsing facilities for language extension, and many other features. As a general rule, HLA/86 has a larger feature set than the other HLALs described above, but a couple of design goals limit the "high-levelness" of HLA/86: (1) with one exception, HLA never emits any code behind the programmer's back that modifies registers or flags (the one exception is object method invocation, and this is well documented), and (2) HLA doesn't support arithmetic expressions (it does support a limited form of logical/boolean expressions). One interesting aspect of HLA/86 is that it is extensible. Using features built into the language, you can extend HLA/86's syntax by adding new statements and other features. This gives you the ability to make HLA/86 as high level as you desire (though it may take some effort to achieve certain language features). The bottom line is this: in some ways, HLA/86 is lower level than languages like PL/360 and PL516; in other ways, it's higher level than these HLALs. However, as the definition requires, almost anything you can do with a traditional assembler is possible in HLA/86.

What is an "Assembler"

Because high level assemblers are clearly different from traditional assemblers, one might question whether a high level assembly language is truly an assembly language and whether translators for high level assembly languages can properly be called assemblers. Unfortunately, there is a considerable range of opinions as to exactly what constitutes an "assembler" versus other translators. This document will not attempt to get involved in this debate. Instead, this section provides a set of definitions that are useful for describing assemblers at various levels of abstraction.

Pure Assembler:

A "pure assembler" is a program that processes an assembly language source file and translates the source code using a direct mapping from source code instructions to individual machine instructions (each source instruction is mapped to exactly one machine instruction). The assembler only provides machine-primitive data types like bytes, words, double words, etc. A pure assembler does not provide macro facilities. A pure assembler always produces machine code as output.

Traditional Assembler:

A "traditional assembler" is a pure assembler plus macro facilities. The assembler may provide some "built-in macros" and instruction synonyms, but in general, the built-in statements should still map to individual machine instructions (note that the programmer may extend this by writing macros). There is no support by the assembler for run-time arithmetic or boolean expressions. A traditional assembler may also provide some simple data typing facilities (such as the ability to rename primitive data types as something else, e.g., byte->char). A traditional assembler always emits machine code as output.
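The difference between these two categories can be sketched as a toy model in Python. This is a hypothetical illustration, not a description of any real product: the pure assembler is a fixed table lookup (one source line, one machine instruction), and the traditional assembler simply runs a parameterless macro-expansion pass in front of it. The table holds a few genuine one-byte x86 opcodes; the `reset_flags` macro and all function names are invented for the example.

```python
from __future__ import annotations

# Toy model of the "pure" vs "traditional" assembler definitions (hypothetical).
# A few genuine single-byte x86 opcodes (zero-operand instructions).
OPCODES = {"nop": 0x90, "clc": 0xF8, "stc": 0xF9, "cld": 0xFC, "std": 0xFD, "ret": 0xC3}


def pure_assemble(source: list[str]) -> bytes:
    """Pure assembler: each source line maps to exactly one machine instruction."""
    return bytes(OPCODES[line.strip()] for line in source)


def expand_macros(source: list[str], macros: dict[str, list[str]]) -> list[str]:
    """Replace each macro invocation with its body (one level, no parameters)."""
    out: list[str] = []
    for line in source:
        out.extend(macros.get(line.strip(), [line]))
    return out


def traditional_assemble(source: list[str], macros: dict[str, list[str]]) -> bytes:
    """Traditional assembler: macro expansion first, then the pure one-to-one mapping."""
    return pure_assemble(expand_macros(source, macros))


if __name__ == "__main__":
    macros = {"reset_flags": ["clc", "cld"]}  # hypothetical programmer-written macro
    print(pure_assemble(["nop", "ret"]).hex(" "))
    print(traditional_assemble(["reset_flags", "ret"], macros).hex(" "))
```

Here `pure_assemble(["nop", "ret"])` yields the bytes 90 C3, while `traditional_assemble(["reset_flags", "ret"], ...)` expands the macro first and yields F8 FC C3. Each underlying instruction still maps to exactly one machine instruction; only the macro layer lets one programmer-written line stand for several.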

High Level Assembler:

Unlike Traditional and Pure Assemblers, High Level Assemblers (HLAs) do not have to produce machine code as output. If a high level assembler produces machine code directly, then we call the high level assembly translator program an assembler. However, an HLA can also produce an assembly language output file that requires further processing by some other assembler to produce actual machine code; we'll call such translators compilers for a high level assembly language. Note that HLA v1.x (the product, not the classification) is a compiler by this definition. The intent is that HLA v2.0 and later will provide both compiler and assembler versions.

HLA Design Goals

HLA was originally conceived as a tool to teach assembly language programming. In early 1996 I decided to do a Windows version of my electronic text "The Art of Assembly Language Programming" (AoA). After an attempt to develop a new version of the "UCR Standard Library for 80x86 Programmers" (a mainstay of AoA), I came to the conclusion that MASM just wasn't powerful enough to make learning assembly language really easy. I decided to develop an assembler with sufficient power to provide the tools for a good standard library, as well as to satisfy some other requirements. Therefore, HLA has two important goals: provide a system that is powerful enough to develop code and macros that make learning assembly language easy, while simultaneously providing a system that is easy for beginners to learn.

The principal goal of HLA was to leverage students' existing programming knowledge. For example, a good Pascal programmer can get their first C/C++ program operational in a few minutes. All they've got to do is note the similarities between the two programming languages, make the appropriate syntactical changes, and they're up and running. Take that same Pascal programmer and expect them to learn LISP or Prolog the same way, and you'll not meet with the same success. LISP and Prolog are completely different; they use a different "programming paradigm," so the student has to "start over from scratch" when learning these languages. Although assembly language is an imperative language (like Pascal and C/C++), there is a considerable "paradigm shift" when moving from one of these high level languages to assembly. In HLA, I wanted to create a language with high level control structures and declarations that made it possible for someone familiar with an imperative language like Pascal or C/C++ to get their first HLA program running in a matter of minutes (or, at worst, a matter of hours). Of course, to achieve this goal, I needed to add high-level data declarations and high-level control constructs to the HLA language.

The astute reader will quickly point out that high level control structures are not assembly language and letting the students use these types of statements is not really teaching them assembly language. This is quite true; since the purpose of an assembly language course is to teach the students "assembly language programming," it is quite clear that HLA would fail if it only provided these high level control structures (as the PL/M language does). Fortunately, this is not the case. HLA supports all standard assembly language instructions, including the CMP and Jcc instructions, so you can still write "pure" assembly language programs without using those high level language control structures. However, it does take time to learn the several hundred different machine instructions. Traditionally, it's taken my students (using only MASM) about five weeks before they could really write any meaningful programs in assembly language (you have to cover things like numeric representation, basic CPU architecture, addressing modes, data types, and introduce the instruction set before any real programs can be written).

HLA lets students write meaningful programs within about a week of its introduction (e.g., the first assignment I give in a typical quarter is to write an "addition table" program that computes the outer product [addition table] of the two vectors 0..15 and 0..15, printing the table formatted nicely). They achieve this by using statements they already know (like IF and WHILE) with the injection of just a few assembly language concepts (registers, and the MOV and ADD instructions) plus an introduction to the HLA Standard Library. Over the next several weeks, these students write more and more complex programs as they are introduced to new assembly language and HLA concepts (e.g., data representation, basic architecture, addressing modes, data types, and additional instructions). At about the sixth week, I begin "weaning" these students off the high level language statements and force them to use the low level machine instructions. It turns out that they learn how to simulate an IF statement at roughly the same point in the quarter as they did when they used only MASM, but the big difference is that they've written a lot more code up to that point, proving out other concepts in machine organization and assembly language programming. In my limited experience with classroom testing, I've found that students spend less time on the class, cover more material, and retain the knowledge better (by the time of the final exam) than they did when I only used MASM.
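For readers unfamiliar with the assignment, the following Python sketch shows what the "addition table" computes: entry (i, j) is simply i + j for i and j in 0..15. It is only a reference for the expected result, not an HLA solution, and the function names, the three-character column width, and the header layout are assumptions about what "formatted nicely" means.

```python
from __future__ import annotations

# Reference sketch of the "addition table" (outer product under addition).
# Not an HLA solution; names and formatting choices are assumptions.


def addition_table(n: int = 16) -> list[list[int]]:
    """Return the n x n table whose entry at (i, j) is i + j."""
    return [[i + j for j in range(n)] for i in range(n)]


def format_table(table: list[list[int]]) -> str:
    """Render the table with a column-header row, a rule, and row labels."""
    n = len(table)
    header = "   |" + "".join(f"{j:3}" for j in range(n))
    rule = "-" * len(header)
    rows = [f"{i:2} |" + "".join(f"{v:3}" for v in row) for i, row in enumerate(table)]
    return "\n".join([header, rule, *rows])


if __name__ == "__main__":
    print(format_table(addition_table()))
```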

 

The general goal of reducing the learning curve for students is achieved in several ways.

(1) As noted above, HLA allows a gradual transition from high level languages into pure assembly language. My favorite analogy here is the Nicoderm CQ smoking cessation system ("gradual steps are better."). Like the Nicoderm system, HLA lets students learn assembly language in gradual steps rather than throwing them into the water and shouting "sink or swim!"

(2) In addition to letting the students employ high level language statements in their assembly language programs, HLA contains several other familiar concepts and syntactical items that ease the transition from high level language programming to assembly language. For example, HLA uses the familiar (to C/C++ programmers) "/*" and "*/" comment delimiters (as well as the "//" comment delimiter). Statements generally end with a semicolon (just as in high level languages). Machine instructions use a functional notation rather than "mnemonic-operand" notation. Constant, type, and variable declarations should look very familiar to Pascal programmers. HLA's standard library should look comfortable to anyone who has used the C/C++ standard library.

In addition to syntactical similarities, well-written HLA programs share a similar programming style with modern high level languages. So a student who has learned how to write readable Pascal, C/C++, or Java programs will be able to write readable HLA programs with almost no additional study. Contrast this with the style guide I've written for (MASM) assembly language programmers, which is quite a bit different from high level languages and takes a while to master.

Another factor many people don't consider is the evaluation of a programming project. At UCR we are given about 1.5-2 hours per student per quarter of reader (student grader) time to grade projects. Experienced readers who can grade (or want to grade) assembly language projects are few and far between. Most readers get "stuck" with grading the assembly class rather than volunteering for the job. The fact that most student assembly language projects have a horrible programming style and are hard to read only exacerbates this situation. HLA helps solve this problem. Since good HLA programming style is very similar to good C/C++ style, UCR's readers have a much easier time reading the projects and evaluating their programming style. Also, since the students have (presumably) learned good programming style in the prerequisite course(s), they tend to write easier-to-read HLA programs than MASM programs. This lets me assign more projects without fear of exceeding my reader budget each quarter.

HLA's advantages are easily summed up by a complaint I once had from a student. She said "HLA drives me nuts. It's so similar to C++ that I often get confused and try out something that would work in C++, only to have the HLA compiler reject it." I agreed with this student that this was a bit of a problem, but I also mentioned "what about all the times you've tried something from C++ and it HAS worked?" She thought about it for a moment and walked away agreeing with my assessment of her complaint. Had this student been learning assembly the traditional way, she wouldn't have bothered to try anything. She would have had to spend extra time learning how to achieve what she wanted by reading an assembly text, or she would have missed out on the opportunity to actually learn something new. HLA's similarity to C++ encouraged her to try something out on her own. The experiments weren't always successful, but in those cases where they were, she benefited greatly from this. This anecdote, more than any other, sums up what my goals with HLA were and describes the success I believe I have achieved with it.

How to Learn Assembly Programming Using HLA

Of course, a compiler without a language reference manual and tutorial is useless. This document will provide a reference to the HLA programming language. It is not, however, appropriate pedagogy for beginners (it's more suitable for those who already know assembly language programming and wish to learn HLA's syntax). A better text for beginners is "The Art of Assembly Language Programming/Win32 Edition." This provides a complete college level textbook that teaches assembly language programming from the ground up using HLA. You can find a copy of "AoA" on Webster at http://webster.cs.ucr.edu. Webster also contains the latest version of HLA as well as tons of HLA sample source code. That's the first place you should go for information on learning HLA.

Legal Notice

The HLA v1.xx implementation is a prototype intended to test language design and implementation features. I (Randall Hyde) have placed this code and language design in the public domain so others may benefit from this work. However, keep in mind that, as a prototype, HLA is not up to contemporary commercial standards for software quality. It is your responsibility to evaluate whether HLA is suitable for whatever purpose you intend its use.

At any given time there are several known and unknown defects in this software. Some may be corrected in later releases of HLA v1.x, some may never be corrected in the v1.x series. I (Randall Hyde) do not warrant or guarantee this software in any way. In particular, you cannot expect corrections of any given defect in the system. Obviously, I try to fix known problems (if possible), but I refuse to be held legally responsible for such defects in the software.

Note that defects will come in three general varieties: defects that cause the compiler to fail or generate bad code, defects in support code (e.g., the HLA Standard Library or other example code), and defects in the documentation accompanying this product. No guarantee applies to anything in HLA, especially in these three areas.

The purpose of developing a prototype implementation of the HLA language was to try out language design and implementation ideas. The prototype phase of HLA development is rapidly coming to an end and an "official" HLA language design will be forthcoming. HLA v2.0 will implement this new language. The only guarantee I make about compatibility between HLA v1.x and HLA v2.0 is that there will be some incompatibilities. The exact nature and magnitude of those incompatibilities is unknown at this point, but it is safe to assume that no HLA v1.x program will compile under HLA v2.0 without at least some minor source code changes. So please don't get the idea that any investment you make in HLA source code will be protected in v2.0 (note: after the release of v2.0 this is a relatively safe assumption to make, though there will still be no guarantees). The changes in the source language between HLA v1.25 and HLA v1.26 are but a small harbinger of the changes that will occur between v1.x and v2.0.

The HLA Standard Library may also undergo changes between v1.x and v2.0. So expect this to happen and plan accordingly if you intend to port your HLA code to v2.0 eventually.

Because HLA is constantly changing (typical of a prototype), it is very difficult to keep the documentation in phase with the language. You can expect this documentation (and all HLA documentation) to contain omissions (e.g., of new features that have yet to be documented), discussion of features removed from HLA, and incorrect descriptions of HLA features. Every attempt will be made to keep the documentation in phase with the software, but like so many free software projects, lack of time and motivation prevents perfection.

This software is not fit for use in mission-critical or life-support software systems. This software is principally intended for evaluation and educational (i.e., learning assembly language) purposes only. It has been successfully used to develop commercial applications and it has been successfully used in educational environments, but again, you are personally responsible for determining the fitness of this software and documentation for your particular application and you must take responsibility for that choice.

Installing HLA Under Windows

HLA is not a stand alone program. It is a compiler that translates HLA source code into a lower-level assembly language. A separate assembler, such as MASM, then completes the processing of this low-level intermediate code to produce an object code file. Finally, you must link the object code output from the assembler using a linker program. Typically you will link the object code produced by one or more HLA source files with the HLA Standard Library (hlalib.lib) and, possibly, several operating system specific library files (e.g., kernel32.lib under Win32). Most of this activity takes place transparently whenever you ask HLA to compile your HLA source file(s). However, for the whole process to run smoothly, you must have installed HLA and all the support files correctly. This section will discuss how to set up HLA on your system.

First, you will need an HLA distribution for your particular Operating System. Since HLA was originally developed for Win32, these installation instructions will cover installation on a Win32 OS. Please see Webster if you're attempting to install HLA on a different OS (assuming it is available for some OS other than Windows; it was not as this was being written). The latest version of HLA is always available on Webster at http://webster.cs.ucr.edu. You should go there and download the latest version if you do not already possess it.

As noted earlier, HLA is not a stand alone assembler. The HLA package contains the HLA compiler, the HLA Standard Library, and a set of include files for the HLA Standard Library. If you write an HLA program with just this code, HLA will produce an "ASM" file and then stop. To produce an executable file you will need Microsoft's MASM and LINK programs, along with some Win32 library files, to complete the process. The easiest way to get all the files you need is to download the "MASM32" package from http://www.pdq.com.au/home/hutch/masm.htm or any of the other places on the net where you can find the MASM32 package. Once you unzip this file, it's easy to install the MASM32 package using the install program it supplies. You must install MASM32 (or MASM/LINK/Win32 library files) before HLA will function properly.

Here are the steps I went through to install MASM32 on my system:

 

path=c:\hla;c:\masm32\bin;%path%

set lib=c:\masm32\lib;c:\hla\hlalib;%lib%

set include=c:\hla\include;c:\masm32\include;%include%

set hlainc=c:\hla\include

set hlalib=c:\hla\hlalib\hlalib.lib

 

 

program HelloWorld;
#include( "stdlib.hhf" )

begin HelloWorld;

    stdout.put( "Hello, World of Assembly Language", nl );

end HelloWorld;

 

 

HLA (High Level Assembler)

Copyright 1999, by Randall Hyde, all rights reserved.

Version Version 1.25 build 2933 (prototype)

 

Files:

1: hw.hla

 

Compiling "hw.hla" to "hw.asm"

 

Assembling hw.asm via "ml /c /coff /Cp hw.asm"

 

Microsoft (R) Macro Assembler Version 6.14.8444

Copyright (C) Microsoft Corp 1981-1997. All rights reserved.

 

Assembling: hw.asm

Linking via "link -subsystem:console /heap:0x1000000,0x1000000 /stack:0x1000000,0x1000000 /BASE:0x3000000 /machine:IX86 -entry:?HLAMain @hw.link -out:hw.exe kernel32.lib user32.lib c:\hla\hlalib\hlalib.lib hw.obj"

Microsoft (R) Incremental Linker Version 5.12.8078

Copyright (C) Microsoft Corp 1992-1998. All rights reserved.

 

/section:.text,ER

/section:readonly,R

/section:.edata,R

/section:.data,RW

/section:.bss,RW

 

 

 

 

1) Open System Properties (Winkey-Break is a convenient shortcut) and go to the Advanced tab, then Environment Variables. Add "c:\hla" to the Path in SYSTEM VARIABLES, not in "User variables for <your win2k login name>". Click OK, but keep the Environment Variables window open; we're not done.

 

2) Look at the contents of ihla.bat (ABOVE):

 

3) In "User Variables for <your login name>", you must end up with each of these settings. For example, to create hlainc, you click the "New..." button, type "hlainc" as the name of the variable, and type "c:\hla\include" as the Variable value (all without quotes of course). If there is already a path set, and it already has some value, add this immediately to the end: ";c:\hla;%path%" and that will preserve your existing User and System paths as well as adding c:\hla.

For example, suppose you opened up your User Variables for <login name> and the Path already said "C:\Private Files\Tools;c:\winnt\system32;c:\winnt;c:\winnt\System32\Wbem;d:\lcc\bin;D:\PROGRA~1\ULTRAE~1;D:\4NT300;C:\msoffice\Office;c:/hla". You would click on Edit and type "C:\Private Files\Tools;c:\hla;%path%".

(Same advice for preserving existing lib and include settings)

4) Once you reboot the computer, you should be all set for "Hello world of assembly language"! (without having to run the IHLA.BAT file.)

Installing HLA is a somewhat involved process. Unfortunately, this is necessary because I don't have the rights to distribute MASM, LINK, and other Microsoft files. Fortunately, HUTCH has collected all of these files together so they are easy to download. If you are concerned about possible legal issues with the download, you may legally download MASM and LINK from Microsoft's site. A link on Webster (at the URL above) describes how to do this. At the time this was being written, work was progressing on HLA to produce TASM compatible output and plans were in the works to produce NASM and Gas versions as well. However, you will still have to obtain the Microsoft library files from some source if you intend to produce a Win32 application. Versions of HLA may appear for other Operating Systems as well. Check out Webster to see if any progress has been made in this direction.

The two most common problems people have running HLA involve the location of the Win32 library files and the choice of linker. During the linking phase, HLA (well, link.exe actually) requires the kernel32.lib, user32.lib, and gdi32.lib library files. These must be present in the pathname(s) specified by the LIB environment variable. If, during the linker phase, HLA complains about missing object modules, make sure that the LIB path specifies the directory containing these files. If you're an MS VC++ user, installation of VC++ should have set up the LIB path for you. If not, then locate these files (they are part of the MASM32 distribution) and copy them to the HLA\HLALIB directory (note that the ihla.bat file includes c:\hla\hlalib as part of the LIB path).

Another common problem with running HLA is the use of the wrong link.exe program. Microsoft has distributed several different versions of link.exe; in particular, there are 16-bit linkers and 32-bit linkers. You must use a 32-bit linker with HLA. If you get complaints about "stack size exceeded" or other errors during the linker phase, this is a good indication that you're using a 16-bit version of the linker. Obtain and use a 32-bit version and things will work. Don't forget that the 32-bit linker must appear in the execution path (specified by the PATH environment variable) before the 16-bit linker.
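To see why the order of directories matters, here is a small Python sketch that models the left-to-right PATH search in a simplified way: it only checks whether a file with the requested name exists, whereas the real Windows search also considers executability and PATHEXT. It is not part of HLA. It creates two throwaway directories with the hypothetical names linker16 and linker32, each holding an empty stand-in file named link.exe, and reports which copy the search finds first.

```python
import os
import tempfile


def first_on_path(program, directories):
    """Simplified PATH search: return the first directories[i]/program that
    exists, scanning the directories left to right, or None if none match."""
    for directory in directories:
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate):
            return candidate
    return None


def which_linker_wins(order):
    """Create two throwaway directories (hypothetical names "linker16" and
    "linker32"), each containing an empty stand-in link.exe, and report
    which directory's copy is found first for the given search order."""
    with tempfile.TemporaryDirectory() as root:
        dirs = {}
        for label in ("linker16", "linker32"):
            dirs[label] = os.path.join(root, label)
            os.mkdir(dirs[label])
            with open(os.path.join(dirs[label], "link.exe"), "w"):
                pass  # an empty file is enough for this simplified search
        found = first_on_path("link.exe", [dirs[name] for name in order])
        return os.path.basename(os.path.dirname(found))


if __name__ == "__main__":
    # Wrong order: the 16-bit linker shadows the 32-bit one.
    print(which_linker_wins(["linker16", "linker32"]))  # linker16
    # Right order: the 32-bit linker's directory comes first.
    print(which_linker_wins(["linker32", "linker16"]))  # linker32
```

Both directories contain a link.exe, so the only thing that decides which linker runs is the order of the entries in the search path.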

Installing HLA Under Linux

HLA is not a stand alone program. It is a compiler that translates HLA source code into a lower-level assembly language. A separate assembler, such as Gas (as), then completes the processing of this low-level intermediate code to produce an object code file. Finally, you must link the object code output from the assembler using a linker program. Typically you will link the object code produced by one or more HLA source files with the HLA Standard Library (hlalib.a). Most of this activity takes place transparently whenever you ask HLA to compile your HLA source file(s). However, for the whole process to run smoothly, you must have installed HLA and all the support files correctly. This section will discuss how to set up HLA on your system.

First, you will need an HLA distribution for Linux. Please see Webster or the previous section if you're attempting to install HLA on a different OS such as Windows. The latest version of HLA is always available on Webster at http://webster.cs.ucr.edu. You should go there and download the latest version if you do not already possess it.

As noted earlier, HLA is not a stand alone assembler. The HLA package contains the HLA compiler, the HLA Standard Library, and a set of include files for the HLA Standard Library. If you write an HLA program with just this code, HLA will produce an "ASM" file and then stop. To produce an executable file you will need GNU's as and ld programs (these come with any Linux distribution that supports compiling C/C++ programs). Note that the HLA output can only be assembled by Gas v2.10 or later (so you will need the 2.10 or later binutils distribution).

Here are the steps I went through to install HLA on my Linux system:

 

program HelloWorld;
#include( "stdlib.hhf" )

begin HelloWorld;

    stdout.put( "Hello, World of Assembly Language", nl );

end HelloWorld;

 

 

HLA (High Level Assembler) Parser

Copyright 2001, by Randall Hyde, all rights reserved.

Version Version 1.32 build 4895 (prototype)

-t active

File: t.hla

 

Compiling "t.hla" to "t.asm"

HLA (High Level Assembler)

Copyright 1999, by Randall Hyde, all rights reserved.

Version Version 1.32 build 4895 (prototype)

ELF output

Using GAS assembler

GAS output

-test active

 

Files:

1: t.hla

 

Compiling 't.hla' to 't.asm'

using command line [hlaparse -v -sg -test "t.hla"]

 

Assembling "t.asm" via [as -o t.o "t.asm"]

Linking via [ld -o "t" "t.o" "/usr/hla/hlalib/hlalib.a"]

 

Installing HLA is a somewhat involved process, but take heart: it's a lot simpler to install HLA under Linux than under Windows! (See the previous section if you need proof.) Versions of HLA may appear for other Operating Systems (beyond Windows and Linux) as well. Check out Webster to see if any progress has been made in this direction. Note one unusual feature of HLA: carefully written (console) applications will compile and run on all supported operating systems without change. This is unheard of for assembly language! So if you use several of the operating systems that HLA supports, you'll probably want to download the files for all of them.

Using the HLA Command-Line Compiler

Once you've installed HLA and verified that it is operational, you can run the HLA compiler. The HLA compiler consists of two executables. The first, hla(.exe), is a shell that processes command line arguments, compiles ".hla" files to ".asm" files, assembles the ".asm" files by calling an assembler, and links the resulting files together using a linker program. The second, hlaparse(.exe), compiles a single ".hla" file to an assembly language file. Generally, you only run hla(.exe); it automatically runs hlaparse(.exe) and the assembler and linker programs. The hla(.exe) command uses the following syntax:

 

hla optional_command_line_parameters Filename_list

 

The filename list consists of one or more unambiguous filenames having the extension ".hla", ".asm", or ".obj"/".o". HLA first runs the hlaparse(.exe) program on all files with the ".hla" extension (producing files with the same basename and an ".asm" extension). Then HLA runs the assembler on all files with the ".asm" extension (including the files produced by hlaparse). Finally, HLA runs the linker to combine all the object files together (including the ".obj"/".o" files the assembler produces). The ultimate result, assuming there were no errors along the way, is an executable file (with an ".exe" extension under Windows, and no extension under Linux).
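The three passes can be summarized as a small planning function. This is a hypothetical Python model written to illustrate the rules above, not HLA's source code: it executes nothing and only lists the steps those rules imply. The step labels "hlaparse", "assembler", and "linker" are placeholders rather than real command lines, and the function assumes the executable takes the basename of the first file on the command line (the default described for the -e option).

```python
from pathlib import PurePath


def plan_build(filenames, windows=True):
    """Hypothetical model of the hla driver's three passes.

    Returns a list of (step, inputs, output) tuples; nothing is executed.
    """
    obj_ext = ".obj" if windows else ".o"
    steps, asm_files, obj_files = [], [], []

    # Pass 1: run hlaparse on every .hla file, producing an .asm file
    # with the same basename. Other files are queued for later passes.
    for name in filenames:
        path = PurePath(name)
        ext = path.suffix.lower()
        if ext == ".hla":
            asm = str(path.with_suffix(".asm"))
            steps.append(("hlaparse", [name], asm))
            asm_files.append(asm)
        elif ext == ".asm":
            asm_files.append(name)
        elif ext in (".obj", ".o"):
            obj_files.append(name)
        else:
            raise ValueError(f"unsupported file type: {name}")

    # Pass 2: assemble every .asm file, including those hlaparse produced.
    for asm in asm_files:
        obj = str(PurePath(asm).with_suffix(obj_ext))
        steps.append(("assembler", [asm], obj))
        obj_files.append(obj)

    # Pass 3: link all object files; the executable takes the basename of
    # the first file (".exe" under Windows, no extension under Linux).
    exe = PurePath(filenames[0]).stem + (".exe" if windows else "")
    steps.append(("linker", list(obj_files), exe))
    return steps


if __name__ == "__main__":
    for step in plan_build(["t.hla", "util.asm", "lib.obj"]):
        print(step)
    # ('hlaparse', ['t.hla'], 't.asm')
    # ('assembler', ['t.asm'], 't.obj')
    # ('assembler', ['util.asm'], 'util.obj')
    # ('linker', ['lib.obj', 't.obj', 'util.obj'], 't.exe')
```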

 

HLA supports the following command line parameters (this output is from the Linux version; the Windows display is slightly different, even though the command line options are the same):

 

options:
   -@         Do not generate linker response file.
   -axxxxx    Pass xxxxx as command line parameter to assembler.
   -c         Compile and assemble to .OBJ files only.
   -dxx       Define VAL symbol xx to have type BOOLEAN and value TRUE.
   -e name    Executable output filename.
   -gas       Assemble using Gas (default - Linux)
   -lxxxxx    Pass xxxxx as command line parameter to linker.
   -m         Create a map file during link
   -masm      Assemble using MASM (default - windows).
   -o:omf     Produce OMF files (default - tasm).
   -o:win32   Produce win32 COFF files.
              Note: default OBJ format depends on OS and assembler.
              Win32/TASM = OMF, Win32/MASM = COFF.
   -s         Compile to .ASM files only.
   -sm        Compile to MASM files only (default for -s).
   -st        Compile to TASM files only.
   -sg        Compile to Gas files only.
   -sym       Dump symbol table after compile.
   -tasm      Assemble using TASM.
   -test      Send diagnostic info to stdout rather than stderr (This
              option is intended for HLA test/debug purposes).
   -v         Verbose compile.
   -w         Compile as windows app (default is console app).
   -?         Display this help message.

 

Note that HLA ignores case when processing command line parameters (unlike typical Linux programs). Hence, "-s" is equivalent to "-S" (for example) when specifying a command line parameter.

Under Windows, HLA always produces a "linker response file" that it supplies to the Microsoft LINK.EXE program during the link phase. This linker response file contains necessary segment declarations and other vital linker information. HLA overwrites any existing ".LINK" file whenever you run the compiler. The "-@" option tells HLA not to create a new ".LINK" file, but to use the existing one. Use this option if you edit the ".LINK" file to change default parameters or add linker options and you want HLA to use the edited linker response file rather than create a new one. If you specify multiple ".HLA" filenames on the command line, HLA only generates a single ".LINK" file using the name of the first ".HLA" file it encounters. Linux's ld program does not require this linker response file, so the Linux version of HLA does not produce this file.

The -aXXXXX option lets you pass assembler-specific command line options to the assembler during the assembler phase. This option is ignored if you use one of the -s options.

The -c option tells HLA to run the hlaparse compiler and the assembler, producing ".obj"/".o" files. HLA will process all filenames on the command line that have a ".hla" or ".asm" extension, but it will ignore any filenames with ".obj"/".o" extensions. If you compile an HLA unit without compiling an HLA program at the same time, you will need to use this option or the linker will complain about not finding the main program. You may specify the object file format using the -o:win32 (COFF) or -o:omf (OMF) command line options (MASM only; TASM always produces OMF files and Gas always produces ELF files).

The -dXXXXX option tells HLA to define the symbol XXXXX as a boolean VAL constant and initialize it with the value TRUE. Generally you use such symbols to control the emission of code during assembly using statements like "#if( @defined( XXXXX )) ..."

By default, HLA creates an executable filename using the extension ".exe" (Windows) or without an extension (Linux) and the basename of the first filename on the command line. You can use the -e name option to specify a different executable file name.

The -lXXXXX option passes the text XXXXX on to the linker as a command line option.
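The -a and -l options simply collect text to forward, and both may appear more than once on the command line. The following Python sketch models that collection step; it is a hypothetical illustration rather than HLA's actual parser. It matches the option prefixes case-insensitively, as HLA does, and assumes (this is not confirmed by this document) that the forwarded text keeps its original case and order. /Zi and /Fl are real ML.EXE options and /map is a real LINK.EXE option, used here only as sample payloads.

```python
def split_passthrough(args):
    """Hypothetical model of the -aXXXX / -lXXXX pass-through options.

    Returns (assembler_options, linker_options, remaining_arguments).
    The option prefix is matched case-insensitively; the text after the
    prefix is forwarded unchanged (an assumption).
    """
    assembler_opts, linker_opts, remaining = [], [], []
    for arg in args:
        prefix = arg[:2].lower()
        if prefix == "-a" and len(arg) > 2:
            assembler_opts.append(arg[2:])   # text after "-a" goes to the assembler
        elif prefix == "-l" and len(arg) > 2:
            linker_opts.append(arg[2:])      # text after "-l" goes to the linker
        else:
            remaining.append(arg)            # everything else is handled elsewhere
    return assembler_opts, linker_opts, remaining


if __name__ == "__main__":
    print(split_passthrough(["-a/Zi", "-A/Fl", "-l/map", "-c", "t.hla"]))
    # (['/Zi', '/Fl'], ['/map'], ['-c', 't.hla'])
```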

The -m option tells the Microsoft linker to produce a map file during the link phase. This is equivalent to the "-lmap" option. The Linux version of HLA ignores this option.

The -MASM option tells HLA to assemble the output using the MASM assembler. Note that this overrides any earlier -s option. If you use this option, you should not also specify the -sg or -st options. You should not use this option under Linux unless you've, somehow, got MASM running there (e.g., via DOSEMU).

The -o:omf option tells the underlying assembler (MASM, generally) to produce an Object Module Format (OMF) OBJ file. This option is generally applicable only to MASM since TASM always produces OMF files. This option is not legal when using the Gas assembler.

The -o:win32 option instructs the assembler to generate a COFF OBJ file. This option is the default for MASM and may not be available for other assemblers.

The -s option tells the HLA program to run only the hlaparse compiler; HLA will not run an assembler or linker. As a result, HLA ignores any ".asm" or ".obj" filenames you supply on the command line. This option is useful if you wish to view the output of an HLA compilation without producing any actual object code. Note that this option overrides any -MASM or -TASM options appearing earlier on the command line. Similarly, the -MASM or -TASM options override the -s option if they appear after -s on the command line.

The -st option tells HLA to produce TASM-compatible assembly and stop after compilation. If you also want to assemble the code using Borland's Turbo Assembler, you must specify the -tasm command line option after the -st option.

The -sm option tells HLA to produce MASM-compatible assembly and stop after compilation. This is (currently) equivalent to the -s option under Windows. You can force HLA to assemble the output by specifying the -masm option after -sm on the command line.

The -sg option tells HLA to produce Gas-compatible assembly and stop after compilation. This is (currently) equivalent to the -s option under Linux. You can force HLA to assemble the output by specifying the -gas option after -sg on the command line.

The -sym option dumps the symbol table after compiling each file with an HLA extension. This option is primarily intended for testing and debugging the HLA compiler; however, this information can be useful to the HLA programmer on occasion.

The -tasm option tells the compiler to run Borland's TASM32.EXE (V5.0) assembler after compiling the source file. The -st option should appear on the command line prior to this option.

The -test option is intended for hlaparse testing and debugging purposes only. It causes the compiler to send all error messages to the standard output device rather than the standard error device. This allows the test code to redirect all errors to a text file for comparison against other files.

The -v option (verbose) causes HLA to print additional information during compile to show the progress of the compilation. Due to a bug in MASM, if you do not specify the -v option the compilation isn't completely quiet. MASM will still output data to the standard error device even in quiet (non-verbose) mode.

The -w option informs HLA that you are compiling a standard Windows (GUI) application rather than a console application. By default, HLA assumes that you are compiling an executable that will run from the command window. If you want to write a full Windows application, you will need to supply this option to tell HLA not to link the code for console operation. Obviously, this option doesn't apply to Linux systems.

The -? option causes HLA to display the list of command line options and immediately quit without further work.

Note that the command line options this document describes are for HLA v1.26 and later only. Earlier versions of HLA used a different command line set. See the documentation for the specific version you're using if you have questions.

Manually Assembling and Linking HLA Output Under Windows

Warning: The material in this section is somewhat advanced. If this is your first exposure to HLA, you will probably want to skip this material. This information is generally not needed when writing standard HLA applications; only advanced programmers in some very special circumstances will need this information. This complexity applies mainly to the Windows OS; under Linux, HLA works as you would expect, and no special tricks are needed to link object modules produced by HLA.

The HLA compiler physically consists of two executable files: HLA.EXE and HLAPARSE.EXE. The HLAPARSE.EXE program is the actual HLA compiler. This file accepts a single HLA source file and compiles it to an .ASM file. The HLA.EXE program is the "user interface" to the compiler. This file processes the command line parameters and calls HLAPARSE.EXE, ML.EXE, and LINK.EXE, as appropriate, to process the user's source and object files. You can view the individual steps during compilation by specifying the "-v" (for verbose) command line option when running HLA, e.g.,

 

c:>hla -v t.hla

HLA (High Level Assembler)
Copyright 1999, by Randall Hyde, all rights reserved.
Version Version 1.21 build 2254 (prototype)

Files:
1: t.hla

Compiling "t.hla" to "t.asm"

Assembling t.asm via "ml /c /coff /Cp t.asm"

Microsoft (R) Macro Assembler Version 6.14.8444
Copyright (C) Microsoft Corp 1981-1997. All rights reserved.

Assembling: t.asm

 

Linking via
"link -subsystem:console \
/heap:0x1000000,0x1000000 \
/stack:0x1000000,0x1000000 \
/BASE:0x3000000 \
/machine:IX86 \
/section:cseg,ER \
/section:dseg,RWS \
/section:bssseg,RWS \
/section:readonly,RS \
/section:strings,RS \
-entry:?HLAMain \
-out:t.exe \
kernel32.lib \
c:\hla\hlalib\hlalib.lib \
t.obj"

Microsoft (R) Incremental Linker Version 6.00.8168
Copyright (C) Microsoft Corp 1992-1998. All rights reserved.

 

As you can see from this output, HLA fills in a lot of details for you during compilation.

By default, HLA compiles win32 console applications. This default was chosen because HLA is an instructional tool and most students will write console applications in assembly language using HLA. While this is a good default for compilation, many programmers will want to create GUI applications or otherwise change the default compilation configuration. This section discusses how to achieve that.

There are four HLA command-line options that let you interrupt the normal compilation process. They are "-s", "-c", "-w", and "-o".

The -w Option

The "-w" option tells HLA to invoke the linker using the command line option

-subsystem:windows

rather than the default

-subsystem:console

 

This provides a convenient mechanism for those who wish to create win32 GUI applications. Most likely, however, if you wish to create GUI applications, you will run the linker explicitly yourself (as this document will explain), so you'll probably not use the "-w" option very frequently. It's great for some short GUI demos, but larger GUI programs will probably not use this option. This option is only active if HLA compiles the program to an executable. If you compile the program to an OBJ or ASM file, HLA ignores this option.

 

The "-e" Option

The "-e" option uses the name following "-e" as the filename of the executable that HLA produces. For example, "HLA -e x t.hla" compiles "t" to "x.exe". As with the "-w" option, this option is mildly convenient for short projects (and console applications), but serious users will probably not use it, since they will likely specify the executable filename as a linker command line option. The "-e" option is only active if HLA actually compiles the program to an executable. If you compile the program to an OBJ or ASM file, HLA ignores this option.

 

The "-o:omf" and "-o:win32" Options

The -o:omf option tells HLA to produce an Object Module Format OBJ file rather than a COFF (Common Object File Format) OBJ file. OMF files are required by some non-Microsoft languages. The -o:win32 option tells HLA to compile and assemble the files to the standard COFF file format (this is the default condition).

 

The "-s" Option

The "-s" (s=source) option tells HLA to compile the HLA source file to an ASM source file and then stop. The HLA.EXE program does not run MASM or the linker programs after compiling the HLA source file. This option is primarily used by those who wish to inspect the HLA compiler output (perhaps to verify correctness of the HLA compiler or just to see what HLA is doing). This document will not consider this option any further.

 

The "-sm" Option

As above, but this option explicitly tells HLA to create a MASM-compatible .ASM file.

 

The "-st" Option

This option is also an "assembly output only" option, but it instructs HLA to produce a file that TASM 5.0 can assemble. This option exists primarily for linking HLA code with code produced by Borland's Delphi.

 

Assembler Selection Options

By default, HLA uses the MASM assembler (ML.EXE) to process the assembly output it produces. You can use the "-masm" command line option to explicitly request the use of MASM. You may also use the "-tasm" directive to explicitly request the use of the TASM5 assembler. If you specify the use of a different assembler, you should also specify the appropriate "-sX" option (before the assembler choice) to obtain the correct source code output format. Note that "-masm" and "-tasm" override the "-sX" option insofar as producing source only. That is, if you specify an assembler, then HLA will run the assembler to produce an OBJ file unless you specify an "-sX" option after the assembler choice on the command line.
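The override rule amounts to "whichever comes last wins": an -sX option appearing after an assembler choice makes HLA stop after compiling, and an assembler choice appearing after an -sX option makes HLA assemble. The Python sketch below is a hypothetical model of the rules described here and under the -s, -sm, -st, and -sg options, not HLA's actual argument parser. It ignores all other arguments and assumes that a plain -s selects the platform's default syntax (MASM under Windows, Gas under Linux).

```python
def resolve_assembly_options(args, windows=True):
    """Hypothetical model of how the -s* options and assembler options interact.

    Returns (syntax, assembler, stop_after_compile):
      syntax             - which assembler's syntax hlaparse emits
      assembler          - which assembler hla would run
      stop_after_compile - True if hla stops once the .asm file exists
    """
    default = "masm" if windows else "gas"
    syntax_flags = {"-s": default, "-sm": "masm", "-st": "tasm", "-sg": "gas"}
    assembler_flags = {"-masm": "masm", "-tasm": "tasm", "-gas": "gas"}

    syntax, assembler, stop_after_compile = default, default, False
    for arg in args:
        flag = arg.lower()  # HLA ignores case in command line options
        if flag in syntax_flags:
            syntax = syntax_flags[flag]
            stop_after_compile = True   # a later -sX overrides an earlier assembler choice
        elif flag in assembler_flags:
            assembler = assembler_flags[flag]
            stop_after_compile = False  # a later assembler choice overrides an earlier -sX
    return syntax, assembler, stop_after_compile


if __name__ == "__main__":
    print(resolve_assembly_options(["-st", "-tasm", "t.hla"]))  # ('tasm', 'tasm', False)
    print(resolve_assembly_options(["-TASM", "-ST", "t.hla"]))  # ('tasm', 'tasm', True)
```

Note that the model keeps syntax and assembler as separate settings, which is why the text recommends giving the matching -sX option before the assembler choice: -tasm alone would leave the output in the default syntax.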

The "-c" Option

The "-c" option (compile and assemble only) tells HLA to compile the HLA source code to an ASM file and then run MASM on this ASM file to produce an OBJ file. Compilation stops at that point and it is the user's responsibility to run the linker to produce an executable file.

One common use of this option is to compile HLA units to OBJ files. Since HLA units do not contain a main program, you cannot compile an HLA unit directly to an executable. To compile an HLA unit separately (i.e., without compiling an HLA main program during the same HLA.EXE invocation) you must specify the "-c" option or the compilation will generate an error when it attempts to link the program.

A second reason for using the "-c" option is to run the linker yourself and supply LINK.EXE command line options that differ from those that HLA automatically provides.

 

The "-axxxxxx" Option

 

This option passes "xxxxxx" to the assembler as a command line parameter. HLA allows multiple instances of this option on the command line (presumably, each one supplies a different assembler option).

The "-@" Option Under Windows

By default, when running under Windows, HLA produces a linker response file containing segment (section) information. The "-@" option turns this feature off. This is necessary, for example, when you've created your own linker response file and you don't want HLA to overwrite it. Note that if you haven't already created a linker response file, the compilation and link will fail unless you've also specified the "-c" option, because HLA always assumes the presence of a linker response file when it runs the linker.

MAKE Files and the Linker Response File Under Windows

As you probably noticed earlier, HLA supplies a long parameter list to the LINK command. If you compile the HLA source file to an OBJ file (using the "-c" option) and then attempt to run the linker manually, you're going to spend a lot of time typing in the link command. If you expect to link your files more than once, you'll definitely want to automate the compilation and linking process. Fortunately, the MAKE tool is perfect for this job. You can use Microsoft's NMAKE.EXE program, Borland's MAKE.EXE program, or any other UNIX-compatible MAKE tool that runs in a win32 console window.

 

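A typical "makefile" will take the following form (the command lines under the target must be indented; UNIX-style MAKE tools require a tab character):

t.exe: t.hla
	hla -c t.hla
	link @t.link -out:t.exe t.obj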

The "t.link" file contains the standard set of LINK.EXE command line parameters. It takes the following form (this is typical of the file HLA produces by default unless you suppress it with the "-@" option; of course, it is a text file that you can create or modify):

 

-subsystem:console
/heap:0x1000000,0x1000000
/stack:0x1000000,0x1000000
/BASE:0x3000000
/machine:IX86
/section:.text,ER
/section:.edata,RS
/section:readonly,RS
/section:.data,RWS
/section:.bss,RWS
-entry:?HLAMain
kernel32.lib
user32.lib
c:\hla\hlalib\hlalib.lib

 

These commands appear in the "t.link" file rather than in the makefile on the LINK.EXE command line because there is too much text on the command line when all of these items are present. Hence, we must use a linker response file and include this data using the LINK.EXE "@t.link" command line parameter. The following sections describe each of the lines in the "t.link" file and how you might modify them.

Note that if you specify multiple HLA source files on the same command line, HLA generates only a single linker response file, named after the first HLA source file on the command line. For example, the following command line generates just one linker response file:

hla unitDemoMain.hla UnitDemoUnit.hla

 

For this example, HLA generates a "unitDemoMain.link" linker response file.

The "-subsystem:console" Option

This is the option that tells the linker you are creating a console application rather than a GUI application. This option must be present if you are compiling a program that directs output to the standard output device (e.g., stdout.XXXX) or reads input from the standard input device (e.g., stdin.XXXX). Applications you compile with this option will run in a win32 console window as a 32-bit executable. If you are creating a GUI (windows) application, you must change this line in the "t.link" file to "-subsystem:windows" to tell the linker this is a windows application.

 

The "/heap:0x1000000,0x1000000" Option

This option tells the linker to reserve and commit 16 megabytes of storage for the heap. HLA's memory allocation routines use the heap (by calling the Windows global allocation routines). In theory, if you exceed the maximum heap size, Windows will automatically allocate additional heap storage elsewhere in memory. However, if you expect that your application will consume more than 16 MBytes of dynamic storage, you should bump this value up to make more room.

The choice of 16 MBytes was fairly arbitrary. If your applications do not make much use of dynamic memory, you should consider reducing this number to one megabyte (0x100000, which has one less zero than 16 MBytes) or even less. By doing this, your applications will use less memory and fewer system resources. To reduce or expand the heap size, simply replace the two values in the "/heap" option with the value you desire. Note that if you make the heap larger, you should adjust the base address of the program upward to compensate for the new size.
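Each megabyte is 0x100000 bytes, so 16 MBytes is 0x1000000. If you prefer to think in megabytes, a small helper such as the hypothetical size_option function below (written in C purely for illustration; it is not part of HLA or LINK.EXE) produces the corresponding "/heap" or "/stack" line for the "t.link" file, using the same value for both numbers as the default lines do:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a "name:reserve,commit" linker line for a size
   given in megabytes (1 MByte = 0x100000 bytes). Both numbers are equal,
   matching the default "/heap:0x1000000,0x1000000" line. */
static void size_option(char *buf, size_t len, const char *name,
                        unsigned long megabytes)
{
    unsigned long bytes = megabytes * 0x100000UL;
    snprintf(buf, len, "%s:0x%lx,0x%lx", name, bytes, bytes);
}
```

For example, size_option(buf, sizeof buf, "/heap", 1) produces "/heap:0x100000,0x100000", the one-megabyte heap suggested above.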

 

The "/stack:0x1000000,0x1000000" Option

This linker option sets aside 16 MBytes of storage for the 80x86 stack segment. Like the "/heap" option, you can change this value if you would like a larger or smaller stack. If your program does not contain any recursive functions or use a tremendous amount of local automatic (VAR) storage, you can probably reduce this to one megabyte or even less (a good minimum is probably 64K or 0x10000). If you make this value larger than 16 MBytes, you will probably want to change the program's base address using the "/base" option.

 

The "/base:0x3000000" Option

This option sets the base address of the HLA program.

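HLA, along with the default linker options, lays out the program in memory as follows: the stack occupies the low end of memory, the heap sits immediately above the stack, and the remaining sections begin at the base address.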

The "/base" option specifies the base address in memory of all the segments (sections) other than the stack and heap. Normally the stack is located at the low end of memory and the heap is placed immediately above it. If the stack and heap are each 16 MBytes long (0x1000000) and the OS reserves a small amount of storage for its own use, then the other sections should start somewhere above address 0x2?????? in memory (the exact address depends on how much storage the system uses starting at address zero). The default "/base:0x3000000" option tells the linker to place the remaining segments in memory starting at address 0x3000000 (48 MBytes into the address space), well above the space reserved by default for the stack and heap. Unless you have special requirements (specifically, if you need a combined heap and stack space larger than about 48 MBytes), you should not have to change the base address of the program. Note that if you specify a base address that would place the code in the heap or stack segments, the linker will relocate the stack and heap to higher addresses. Generally, you shouldn't do this, because the default memory organization automatically traps heap and stack overflows; if you move these regions around in memory, the system may not trap these exceptions.
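The arithmetic behind the default layout can be checked with a few lines of C. This is a sketch under assumptions: the function name is hypothetical, and since this document does not specify how much low memory the OS reserves, the caller supplies that amount (the 0x100000 used in the checks is an arbitrary example value):

```c
#include <assert.h>

/* Sketch of the layout arithmetic described above. The stack sits at the
   low end of memory (above whatever the OS reserves) and the heap sits
   immediately above the stack; the other sections start at "base".
   os_reserved is an assumed, caller-supplied value. */
static int base_clears_stack_and_heap(unsigned long stack, unsigned long heap,
                                      unsigned long os_reserved,
                                      unsigned long base)
{
    unsigned long heap_end = os_reserved + stack + heap;
    return base >= heap_end;   /* nonzero if no overlap */
}
```

With the defaults, 0x100000 + 0x1000000 + 0x1000000 = 0x2100000, which is below 0x3000000 (48 MBytes), so the check passes; enlarging the heap to 32 MBytes (0x2000000) pushes its end to 0x3100000, which is why a larger heap calls for a higher base address.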

The "/machine:IX86" Option

This just tells the linker that you're linking in 80x86 machine code and quiets certain warnings that may otherwise appear. You shouldn't change this line in the "t.link" file.

 

The "/section:XXXXX" Options

There are five lines in the "t.link" file that contain "/section" options. These are:

/section:.text,ER
/section:.edata,RS
/section:readonly,RS
/section:.data,RWS
/section:.bss,RWS

Under win32, a "section" is similar to a memory segment. Of course, win32 programs use the flat memory model, so the application has only one physical 80x86 segment. Sections are the logical equivalent of segments. In general, a win32 program can have as many sections as desired. HLA, however, directly supports only five sections: the code section (.text), the HLA constants section (.edata), the readonly section, the static data section (.data), and the uninitialized data section (.bss).

The .text section holds the 80x86 machine instructions the HLA compiler emits. The "ER" option on the "/section:.text,ER" line tells the linker that the CPU can execute instructions in this section and read data found in it. Since the "W" option is not present, the code section is read-only, and any attempt to write data to it will generate a memory access exception. Generally, it is not a good idea to embed writable objects in the code section. If you absolutely need this feature (e.g., for self-modifying code), add the "W" option to this option list. The ability to read data in the code section isn't strictly required either. If you do not embed values in the code stream, you can remove the "R" option from this command, which may help catch some errant pointer accesses during your program's execution. Since fetching parameters from the code stream is not unheard of, HLA makes the code section readable by default to allow the use of this parameter passing technique.

The .edata section holds string and other constants that HLA automatically emits (for example, if you specify an instruction like "mul( 10, EAX);" then HLA will create an object in the .edata section that holds the constant 10 for use by this multiply instruction). This section should always be read-only. The "RS" option specifies read access and shared access. It doesn't really require the "S" option since other processes won't share this data, but it probably doesn't hurt too much to have the "S" option present. You should not make this segment writable since this segment only contains literal constant data and errant writes to this section of memory may produce unusual results (to say the least).

The readonly section holds all the HLA variables you declare in the HLA READONLY declaration section. Like the .edata section, this memory section is readable and shared but not writable. To preserve the semantics of the HLA READONLY section, you should not make this section writable by adding the "W" option. It is possible (and practical) to share data in a readonly section between two programs, hence the default shared setting for this section. Sharing objects is beyond the scope of this documentation; a different article will have to describe how to accomplish that.

The .data section holds all initialized objects you declare in the HLA STATIC and DATA declaration sections. Since these sections are writable as well as readable, the default memory access mode for this section command is "RWS" (readable, writable, sharable). Note that this memory section may not contain executable instructions. If you attempt to jump to code in the .data section, the system will raise a memory access exception. If you intend to create self-modifying code in the .data section, you will need to modify this linker option to allow executable as well as read-write-shared.

The .bss section is where all HLA STORAGE variables go. In theory, this section holds uninitialized variables and shouldn't consume much disk space (even if there are large arrays in the STORAGE declaration sections). The memory access options are the same as .data's.
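The attribute letters used in the "/section" lines above (E, R, W, and S) behave like independent permission flags. The following hypothetical C decoder, which is not part of any Microsoft or HLA tool, makes that reading explicit:

```c
#include <assert.h>

/* Hypothetical decoder for the attribute letters in "/section:name,attrs":
   E = execute, R = read, W = write, S = shared. */
enum { SEC_EXECUTE = 1, SEC_READ = 2, SEC_WRITE = 4, SEC_SHARED = 8 };

static unsigned section_attrs(const char *letters)
{
    unsigned flags = 0;
    for (; *letters; letters++) {
        switch (*letters) {
        case 'E': flags |= SEC_EXECUTE; break;
        case 'R': flags |= SEC_READ;    break;
        case 'W': flags |= SEC_WRITE;   break;
        case 'S': flags |= SEC_SHARED;  break;
        default:  break;                /* ignore any other character */
        }
    }
    return flags;
}
```

For example, "ER" yields execute and read permission but not write permission, which is why a write to the .text section raises a memory access exception, while "RWS" (used for .data and .bss) grants read, write, and shared access but not execute permission.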

 

The "-entry:?HLAMain" Option

This option specifies the entry point of the main program. The HLA main program always uses the symbol "?HLAMain" so you should never modify this option unless you really know what you are doing. The HLA main program initializes the standard library and the exception handling system. Therefore, the program may crash if the program does not begin execution with the HLA entry point.

If you decide to create your own entry point, you should eventually jump to the "?HLAMain" label to prevent problems with the HLA run-time system. Modifying this option should only be done by those who are well-versed in the HLA compiler and run-time system.

About the only reason for taking control of the entry point yourself is if you are writing stand-alone assembly code that doesn't use the HLA exception handling system or HLA Standard Library routines. Further investigation of this feature is left to the reader.

The Library Files Options

The "kernel32.lib" and "user32.lib" files are win32 API interface files that provide the external symbols for most of the win32 APIs a typical (console) application will use. Almost every win32 application will need to specify these files (which are part of the Microsoft SDK). If you are writing GUI applications using HLA, you will probably need to specify some other MSSDK library files as well. You will have to refer to the Microsoft win32 API documentation for exact details, but common files you'll need to link in include comctl32.lib, comdlg32.lib, and gdi32.lib in addition to kernel32.lib and user32.lib. There are several dozen different LIB files you may want to use. See the Microsoft documentation for more details.

The c:\hla\hlalib\hlalib.lib option specifies the path to the HLA Standard Library LIB file. You must link in this library module if you call or use any objects in the HLA Standard Library.

Any filename appearing within the linker response file is assumed to be a library file or OBJ file (depending on the suffix). By default, link.exe uses the "LIB" environment variable to determine the location of Win32 library files. Note that LINK.EXE does not honor the value of the "HLALIB" environment variable. You must provide the full path of the HLA Standard Library LIB file (hlalib.lib) in this list.

The LINKER Command Line

link @t.link -out:t.exe t.obj

 

As noted earlier, the "@t.link" command line option simply tells the linker to fetch the items from the "t.link" file and treat them as though they appear on the linker's command line. This lets you specify far more options than are normally possible on the linker command line. About the only thing worth noting here is that the filename after the at-sign ("@") does not have to be "t.link". You can use any filename you choose; typically it will be something like "projectName.link" where projectName denotes the name of the main source file in your current project. The examples in this document use the project name "t" because the HLA source file is "t.hla".

The "-out:t.exe" option specifies the output (executable) file name. When running the linker independently of HLA, you should specify this option in place of the HLA "-o" option. If this option is not present, the linker gives the executable file the same base name as the first object code file on the command line. When using a makefile, it's a good idea to always supply this option so you can modify the linker command line without worrying about generating an executable with a strange name.

The last item on the link command line, "t.obj", specifies the name of the object module compiled previously by HLA and MASM. If you needed to link in several HLA units as well as the main program (or OBJ files created by other languages), you would include them on this linker command line as well.

HLA Language Elements

Starting with this section, we begin discussing the HLA source language. HLA source files must contain only seven-bit ASCII characters. These are text files with each source line terminated by a carriage return/line feed (Windows) or just a line feed (Linux) sequence (HLA accepts either sequence, so text files are portable between OSes without change). White space consists of spaces, tabs, and newline sequences. Generally, HLA does not appreciate other control characters in the file and may generate an error if they appear in the source file.

Comments

HLA uses "//" to lead off single line comments. It uses "/*" to begin an indefinite length comment and it uses "*/" to end an indefinite length comment. C/C++, Java, and Delphi users will be quite comfortable with this notation.

Special Symbols

The following characters and character sequences are HLA lexical elements and have special meaning to HLA:

* / + - ( ) [ ] { } < > : ; , . = ? & | ^ ! @

&& || <= >= <> != == := .. << >> ## #( )# #{ }#

Reserved Words

Here are the HLA reserved words. You may not use any of these reserved words as HLA identifiers. HLA reserved words are case insensitive. That is, "MOV" and "mov" (as well as any permutation with respect to case) both represent the HLA "mov" reserved word.

Table: HLA Reserved Words

#asm

#closeread

#closewrite

#code

#const

#else

#elseif

#emit

#endasm

#endfor

#endif

#endmacro

#endtext

#endwhile

#error

#for

#if

#include

#includeonce

#keyword

#macro

#openread

#openwrite

#print

#readonly

#static

#storage

#system

#terminator

#text

#while

#write

@a

@abs

@abstract

@addofs1st

@ae

@align

@alignstack

@arity

@b

@basereg

@be

@bound

@byte

@c

@cdecl

@ceil

@char

@class

@cos

@cset

@curdir

@curlex

@curobject

@curoffset

@date

@defined

@delete

@dim

@display

@dword

@e

@elements

@elementsize

@enter

@enumsize

@eos

@eval

@exactlynchar

@exactlyncset

@exactlynichar

@exactlyntomchar

@external

@exactlyntomcset

@exactlyntomichar

@exceptions

@exp

@extract

@filename

@firstnchar

@firstncset

@firstnichar

@floor

@frame

@g

@ge

@global

@index

@insert

@int8

@int16

@int32

@int64

@int128

@into

@isalpha

@isalphanum

@isclass

@isconst

@isdigit

@IsExternal

@isfreg

@islower

@ismem

@isreg

@isreg16

@isreg32

@isreg8

@isspace

@istype

@isupper

@isxdigit

@l

@lastobject

@le

@leave

@length

@lex

@linenumber

@localoffset

@localsyms

@log

@log10

@lowercase

@lword

@matchid

@matchintconst

@matchistr

@matchnumericconst

@matchrealconst

@matchstr

@matchstrconst

@matchtoistr

@matchtostr

@max

@min

@na

@nae

@name

@nb

@nbe

@nc

@ne

@ng

@nge

@nl

@nle

@no

@noalignstack

@nodisplay

@noenter

@noframe

@noleave

@norlesschar

@norlesscset

@norlessichar

@normorechar

@normorecset

@normoreichar

@nostorage

@np

@ns

@ntomchar

@ntomcset

@ntomichar

@nz

@o

@odd

@offset

@onechar

@onecset

@oneichar

@oneormorechar

@oneormorecset

@oneormoreichar

@oneormorews

@optstrings

@p

@parmoffset

@parms

@pascal

@pclass

@pe

@peekchar

@peekcset

@peekichar

@peekws

@po

@pointer

@ptype

@qword

@random

@randomize

@read

@reg

@reg16

@reg32

@reg8

@real32

@real64

@real80

@rindex

@returns

@s

@section

@sin

@size

@sqrt

@staticname

@stdcall

@strbrk

@string

@strset

@strspan

@substr

@tan

@text

@time

@tokenize

@tostring

@trace

@trim

@type

@typename

@uns8

@uns16

@uns32

@uns64

@uns128

@uppercase

@uptochar

@uptocset

@uptoichar

@uptoistr

@uptostr

@use

@volatile

@word

@wsoreos

@wstheneos

@z

@zeroormorechar

@zeroormorecset

@zeroormoreichar

@zeroormorews

@zerooronechar

@zerooronecset

@zerooroneichar

aaa

aad

aam

aas

abstract

adc

add

ah

al

align

and

anyexception

arpl

ax

begin

bh

bl

boolean

bound

bp

break

breakif

bsf

bsr

bswap

bt

btc

btr

bts

bx

byte

call

cbw

cdq

ch

char

cl

class

clc

cld

cli

clts

cmc

cmova

cmovae

cmovb

cmovbe

cmovc

cmove

cmovg

cmovge

cmovl

cmovle

cmovna

cmovnae

cmovnb

cmovnbe

cmovnc

cmovne

cmovng

cmovnge

cmovnl

cmovnle

cmovno

cmovnp

cmovns

cmovnz

cmovo

cmovp

cmovpe

cmovpo

cmovs

cmovz

cmp

cmpsb

cmpsd

cmpsw

cmpxchg

cmpxchg8b

const

continue

continueif

cpuid

cr0

cr1

cr2

cr3

cr4

cr5

cr6

cr7

cseg

cset

cwd

cwde

cx

daa

das

dec

dh

di

div

dl

do

dr0

dr1

dr2

dr3

dr4

dr5

dr6

dr7

dseg

dup

dword

dx

dx:ax

eax

ebp

ebx

ecx

edi

edx

edx:eax

else

elseif

emms

end

endclass

endfor

endif

endreadonly

endrecord

endstatic

endstorage

endtry

endunion

endwhile

enter

enum

eseg

esi

esp

exception

exit

exitif

external

f2xm1

fabs

fadd

faddp

fbld

fbstp

fchs

fclex

fcmova

fcmovae

fcmovb

fcmovbe

fcmove

fcmovna

fcmovnae

fcmovnb

fcmovnbe

fcmovne

fcmovnu

fcmovu

fcom

fcomi

fcomip

fcomp

fcompp

fcos

fdecstp

fdiv

fdivp

fdivr

fdivrp

ffree

fiadd

ficom

ficomp

fidiv

fidivr

fild

fimul

fincstp

finit

fist

fistp

fisub

fisubr

fld

fld1

fldcw

fldenv

fldl2e

fldl2t

fldlg2

fldln2

fldpi

fldz

fmul

fmulp

fnop

for

foreach

forever

forward

fpatan

fprem

fprem1

fptan

frndint

frstor

fsave

fscale

fseg

fsin

fsincos

fsqrt

fst

fstcw

fstenv

fstp

fstsw

fsub

fsubp

fsubr

fsubrp

ftst

fucom

fucomi

fucomip

fucomp

fucompp

fwait

fxam

fxch

fxtract

fyl2x

fyl2xp1

gseg

hlt

idiv

if

imod

imul

in

inc

inherits

insb

insd

insw

int

int16

int32

int8

intmul

into

invd

invlpg

iret

iretd

iterator

ja

jae

jb

jbe

jc

jcxz

je

jecxz

jf

jg

jge

jl

jle

jmp

jna

jnae

jnb

jnbe

jnc

jne

jng

jnge

jnl

jnle

jno

jnp

jns

jnz

jo

jp

jpe

jpo

js

jt

jz

label

lahf

lar

lazy

lds

lea

leave

les

lfs

lgdt

lgs

lidt

lldt

lock.adc

lock.add

lock.and

lock.btc

lock.btr

lock.bts

lock.cmpxchg

lock.dec

lock.inc

lock.neg

lock.not

lock.or

lock.sbb

lock.sub

lock.xadd

lock.xchg

lock.xor

lodsb

lodsd

lodsw

loop

loope

loopne

loopnz

loopz

lsl

lss

ltreg

method

mm0

mm1

mm2

mm3

mm4

mm5

mm6

mm7

mod

mov

movd

movq

movsb

movsd

movsw

movsx

movzx

mul

name

namespace

neg

nop

not

null

or

out

outsb

outsd

outsw

override

overrides

packssdw

packsswb

packuswb

paddb

paddd

paddsb

paddsw

paddusb

paddusw

paddw

pand

pandn

pavgb

pavgw

pcmpeqb

pcmpeqd

pcmpeqw

pcmpgtb

pcmpgtd

pcmpgtw

pextrw

pinsrw

pmaddwd

pmaxsw

pmaxub

pminsw

pminub

pmovmskb

pmulhuw

pmulhw

pmullw

pointer

pop

popa

popad

popf

popfd

por

procedure

program

psadbw

pshufw

pslld

psllq

psllw

psrad

psraw

psrld

psrlq

psrlw

psubb

psubd

psubsb

psubsw

psubusb

psubusw

psubw

punpckhbw

punpckhdq

punpckhwd

punpcklbw

punpckldq

punpcklwd

push

pusha

pushad

pushd

pushf

pushfd

pushw

pxor

qword

raise

rcl

rcr

rdmsr

rdpmc

rdtsc

readonly

real32

real64

real80

record

rep.insb

rep.insd

rep.insw

rep.movsb

rep.movsd

rep.movsw

rep.outsb

rep.outsd

rep.outsw

rep.stosb

rep.stosd

rep.stosw

repe.cmpsb

repe.cmpsd

repe.cmpsw

repe.scasb

repe.scasd

repe.scasw

repeat

repne.cmpsb

repne.cmpsd

repne.cmpsw

repne.scasb

repne.scasd

repne.scasw

repnz.cmpsb

repnz.cmpsd

repnz.cmpsw

repnz.scasb

repnz.scasd

repnz.scasw

repz.cmpsb

repz.cmpsd

repz.cmpsw

repz.scasb

repz.scasd

repz.scasw

result

ret

returns

rol

ror

rsm

sahf

sal

sar

sbb

scasb

scasd

scasw

segment

seta

setae

setb

setbe

setc

sete

setg

setge

setl

setle

setna

setnae

setnb

setnbe

setnc

setne

setng

setnge

setnl

setnle

setno

setnp

setns

setnz

seto

setp

setpe

setpo

sets

setz

sgdt

shl

shld

shr

shrd

si

sidt

sldt

sp

sseg

st0

st1

st2

st3

st4

st5

st6

st7

static

stc

std

sti

storage

stosb

stosd

stosw

streg

string

sub

tbyte

test

text

then

this

thunk

to

try

type

ud2

union

unit

unprotected

uns16

uns32

uns8

until

val

valres

var

verr

verw

vmt

wait

wbinvd

while

word

wrmsr

xadd

xchg

xlat

xmm0

xmm1

xmm2

xmm3

xmm4

xmm5

xmm6

xmm7

xor

Note that "@debughla" is also a reserved compiler symbol. However, this is intended for internal (HLA) debugging purposes only. When the compiler encounters this symbol, it immediately stops the compiler with an assertion failure. Obviously, you should never put this statement in your source code unless you're debugging HLA and you want to stop the compiler immediately after the compilation of some statement.

External Symbols and Assembler Reserved Words

HLA produces an assembly language file during compilation and invokes an assembler such as MASM to complete the compilation process. HLA automatically translates the normal identifiers you declare in your program to benign identifiers in the assembly language program. However, HLA does not translate EXTERNAL symbols; it preserves these names in the assembly language file it produces. Therefore, you must take care not to use external names that conflict with the underlying assembler's reserved words, or that assembler will generate an error when it processes HLA's output.

For a list of assembler reserved words, please see the documentation for the assembler you are using.

HLA Identifiers

HLA identifiers must begin with an alphabetic character or an underscore. After the first character, an identifier may contain alphanumeric and underscore characters. There is no technical limit on identifier length in HLA, but you should avoid external symbols longer than about 32 characters, since the assemblers and linkers that process HLA identifiers may not be able to handle such symbols.

HLA identifiers are always case neutral. This means that identifiers are case sensitive insofar as you must always spell an identifier exactly the same way (with respect to alphabetic case) every time you use it. However, you are not allowed to declare two identifiers whose only difference is alphabetic case.
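The identifier rules above translate directly into code. The C sketch below (the function names are hypothetical, and it assumes 7-bit ASCII source text, as HLA requires) checks whether a string is a legal HLA identifier and whether two declarations would collide under HLA's case-neutral rule:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Sketch of the HLA identifier syntax: a letter or underscore first,
   then any mix of letters, digits, and underscores. */
static int is_hla_identifier(const char *s)
{
    if (!(isalpha((unsigned char)*s) || *s == '_'))
        return 0;                       /* also rejects the empty string */
    for (s++; *s; s++)
        if (!(isalnum((unsigned char)*s) || *s == '_'))
            return 0;
    return 1;
}

/* Case neutrality: two declarations collide if they differ only in case. */
static int names_collide(const char *a, const char *b)
{
    for (; *a && *b; a++, b++)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
    return *a == *b;                    /* both strings must end together */
}
```

A use of an identifier must still match its declaration exactly (a plain strcmp comparison), while names_collide reports the pairs HLA refuses to declare together, such as count and Count.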

Although technically legal in your program, do not use identifiers that begin and end with a single underscore. HLA reserves such identifiers for use by the compiler and the HLA standard library. If you declare such identifiers in your program, the possibility exists that you may interfere with HLA's or the HLA Standard Library's use of such a symbol.

By convention, HLA programmers use symbols beginning with two underscores to represent private fields in a class. So you should avoid such identifiers except when defining such private fields in your own classes.

External Identifiers

HLA lets you explicitly provide a string for external identifiers. External identifiers are not limited to the format for HLA identifiers. HLA allows any string constant to be used for an external identifier. It is your responsibility to use only those characters that are legal in the assembler that processes HLA's intermediate ASM file. Note that this feature lets you use symbols that are not legal in HLA but are legal in external code (e.g., Win32 APIs use the '@' character in identifiers and some non-HLA code may use HLA reserved words as identifiers). See the discussion of the @EXTERNAL option for more details.

Data Types in HLA

Native (Primitive) Data Types in HLA

HLA provides the following basic primitive types:

 

boolean    One byte; zero represents false, one represents true.

enum       One byte; user-defined IDs whose values range from 0 to 255.

uns8       Unsigned integer values in the range 0..255.

uns16      Unsigned integer values in the range 0..65,535.

uns32      Unsigned integer values in the range 0..4,294,967,295.

byte       Generic eight-bit value.

word       Generic 16-bit value.

dword      Generic 32-bit value.

int8       Signed integer values in the range -128..+127.

int16      Signed integer values in the range -32,768..+32,767.

int32      Signed integer values in the range -2,147,483,648..+2,147,483,647.

char       Character values.

real32     32-bit floating point values.

real64     64-bit floating point values.

real80     80-bit floating point values.

string     Dynamic-length string constants. (Run-time implementation: four-byte pointer.)

cset       A set of up to 128 different ASCII characters (16-byte bitmap).

text       Similar to string, but text constants expand in place (like #define in C/C++).

thunk      A set of machine instructions to execute.
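Because a cset holds up to 128 characters, one bit per character fits exactly in 16 bytes. The C sketch below models such a bitmap; the CharSet type and function names are hypothetical, and the choice of bit c % 8 within byte c / 8 is an assumption made for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch of a 128-character set stored as a 16-byte bitmap.
   Assumed bit order: character c is bit (c % 8) of byte (c / 8). */
typedef struct { uint8_t bits[16]; } CharSet;

static void cset_clear(CharSet *s)
{
    memset(s->bits, 0, sizeof s->bits);
}

static void cset_add(CharSet *s, unsigned char c)
{
    if (c < 128)                                /* only 7-bit ASCII fits */
        s->bits[c / 8] |= (uint8_t)(1u << (c % 8));
}

static int cset_has(const CharSet *s, unsigned char c)
{
    return c < 128 && ((s->bits[c / 8] >> (c % 8)) & 1u);
}
```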

 

Often, it is convenient to discuss the types above in various groups. This document will often use the following terms:

 

Ordinal: boolean, enum, uns8, uns16, uns32, byte, word, dword, int8, int16, int32, char.

Unsigned: uns8, uns16, uns32, byte, word, dword.

Signed: int8, int16, int32, byte, word, dword.

Number: uns8, uns16, uns32, int8, int16, int32, byte, word, dword.

Numeric: uns8, uns16, uns32, int8, int16, int32, byte, word, dword, real32, real64, real80.

Composite Data Types

In addition to the primitive types above, HLA supports arrays, records (structures), unions, classes, and pointers of the above types (except for text objects).

Array Data Types

HLA allows you to create an array data type by specifying the number of array elements after a type name. Consider the following HLA type declaration that defines intArray to be an array of int32 objects:

 

type intArray : int32[ 16 ];

The "[ 16 ]" component tells HLA that this type has 16 four-byte integers. HLA arrays use a zero-based index, so the first element is always element zero. The index of the last element, in this example, is 15 (a total of 16 elements with indices 0..15).

HLA also supports multidimensional arrays. You can specify multidimensional arrays by providing a comma-separated list of dimensions inside the square brackets, e.g.,

 

type intArray4x4 : int32[ 4, 4 ];
type intArray2x2x4 : int32[ 2,2,4 ];
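In each of the declarations above, the element count is the product of the dimensions, so all three types hold 16 four-byte integers (64 bytes). The following C declarations have the same shapes (C writes each dimension in its own brackets rather than HLA's comma-separated list) and let the compiler confirm the arithmetic; they make no claim about how HLA orders elements in memory:

```c
#include <assert.h>
#include <stdint.h>

/* C declarations with the same shapes as the HLA types above. */
typedef int32_t intArray[16];            /* indices 0..15            */
typedef int32_t intArray4x4[4][4];       /* 4 * 4 = 16 elements      */
typedef int32_t intArray2x2x4[2][2][4];  /* 2 * 2 * 4 = 16 elements  */

/* Every element is four bytes, so each type occupies 16 * 4 = 64 bytes. */
_Static_assert(sizeof(intArray) == 64, "16 four-byte elements");
_Static_assert(sizeof(intArray4x4) == 64, "4x4 int32 array");
_Static_assert(sizeof(intArray2x2x4) == 64, "2x2x4 int32 array");
```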

The mechanism for accessing array elements differs depending upon whether you are accessing compile-time array constants or run-time array variables. A complete discussion of this will appear in later sections.

Union Data Types

HLA implements union types (untagged, as in C) using the UNION..ENDUNION reserved words. The following HLA type declaration demonstrates a union declaration:

 

type allInts: union
i8: int8;
i16: int16;
i32: int32;
endunion;

All fields in a union have the same starting address in memory. The size of a union object is the size of the largest field in the union. The fields of a union may have any type that is legal in a variable declaration section (see the discussion of the VAR section for more details).

Given a union object, say "i" of type "allInts", you access the fields of the union using the familiar dot-notation. The following 80x86 mov instructions demonstrate how to access each of the fields of the "i" variable:

 

mov( i.i8, al );
mov( i.i16, ax );
mov( i.i32, eax );
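HLA's union corresponds closely to a C union. The following C11 sketch mirrors allInts and uses compile-time assertions to confirm that all fields start at offset zero and that the union is as large as its largest field:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C counterpart of the allInts union. */
typedef union {
    int8_t  i8;
    int16_t i16;
    int32_t i32;
} allInts;

_Static_assert(sizeof(allInts) == 4, "size of the largest field (int32)");
_Static_assert(offsetof(allInts, i8) == 0 && offsetof(allInts, i16) == 0 &&
               offsetof(allInts, i32) == 0, "all fields share offset 0");
```

On a little-endian 80x86, storing 0x12345678 through i32 and then reading i8 yields 0x78, which is the byte the "mov( i.i8, al );" instruction above would load.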

Unions also support a special field type known as an anonymous record (see the next section for a description of records). The syntax for an anonymous record in a union is the following:

type
    unionWrecord: union
        u1Field: byte;
        u2Field: word;
        u3Field: dword;
        record
            u4Field: byte[2];
            u5Field: word[3];
        endrecord;
        u6Field: byte;
    endunion;

 

Fields appearing within the anonymous record do not necessarily start at offset zero in the data structure. In the example above, u4Field starts at offset zero, while u5Field immediately follows it, two bytes later. The fields in the union outside the anonymous record all start at offset zero. If the anonymous record is larger than any other field in the union, then the record's size determines the size of the union. This is true for the example above: the anonymous record consumes eight bytes (two bytes for u4Field plus six bytes for u5Field), so the union's size is eight bytes.
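C11 anonymous structures inside a union behave like the anonymous record described above, so the following sketch reproduces unionWrecord's layout in C and checks the offsets and the eight-byte size (uint8_t, uint16_t, and uint32_t stand in for HLA's byte, word, and dword):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C11 counterpart of unionWrecord. Fields of the anonymous struct are laid
   out one after another; every other union member starts at offset 0. */
typedef union {
    uint8_t  u1Field;
    uint16_t u2Field;
    uint32_t u3Field;
    struct {
        uint8_t  u4Field[2];   /* offset 0, 2 bytes */
        uint16_t u5Field[3];   /* offset 2, 6 bytes */
    };
    uint8_t  u6Field;
} unionWrecord;

_Static_assert(offsetof(unionWrecord, u5Field) == 2, "u5Field follows u4Field");
_Static_assert(sizeof(unionWrecord) == 8, "the anonymous record is largest");
```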

Record Data Types

HLA's records allow programmers to create data types whose fields can be different types. The following HLA type declaration defines a simple record with four fields:

 

type Planet: record
    x: int32;
    y: int32;
    z: int32;
    density: real64;
endrecord;

Objects of type Planet will consume 20 bytes of storage at run-time.
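The 20-byte figure follows from the field sizes: three four-byte int32 fields plus an eight-byte real64. A C compiler would normally be free to pad this structure, so the C11 sketch below uses #pragma pack (accepted by GCC, Clang, and MSVC) to request a packed layout like HLA's; it assumes double is the 64-bit IEEE format that real64 uses:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C counterpart of Planet, packed so each field starts at the next
   available byte offset (no compiler-inserted padding). */
#pragma pack(push, 1)
typedef struct {
    int32_t x;        /* offset 0                  */
    int32_t y;        /* offset 4                  */
    int32_t z;        /* offset 8                  */
    double  density;  /* offset 12, assumed real64 */
} Planet;
#pragma pack(pop)

_Static_assert(offsetof(Planet, density) == 12, "density follows z");
_Static_assert(sizeof(Planet) == 20, "3 * 4 + 8 bytes");
```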

The fields of a record may be of any legal HLA data type including other composite data types. Like unions, anything that is legal in a VAR section is a legal field of a record. Also like unions, you use the dot-notation to access fields of a record object.

In addition to the VAR types, you may also declare anonymous unions within a record. An anonymous union is a union declaration without a field name associated with the union, e.g.,

 

type DemoAU: record
x: real32;
union
u1:int32;
r1:real32;
endunion;
y:real32;
endrecord;

In this example, x, u1, r1, and y are all fields of DemoAU. To access the fields of a variable D of type DemoAU, you would use the following names: D.x, D.u1, D.r1, and D.y. Note that D.u1 and D.r1 share the same memory locations at run-time, while D.x and D.y have unique addresses associated with them.
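C11 supports anonymous unions inside structures, so DemoAU translates almost directly. This sketch assumes float is a 32-bit IEEE value (matching real32) and confirms that u1 and r1 overlap while x and y do not:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C11 counterpart of DemoAU: u1 and r1 share storage; x and y do not. */
typedef struct {
    float x;            /* offset 0                  */
    union {
        int32_t u1;     /* offset 4                  */
        float   r1;     /* offset 4, overlaps u1     */
    };
    float y;            /* offset 8                  */
} DemoAU;

_Static_assert(offsetof(DemoAU, u1) == offsetof(DemoAU, r1), "u1, r1 overlap");
_Static_assert(offsetof(DemoAU, y) == 8, "y follows the anonymous union");
```

Writing 1.0 through r1 and reading u1 returns 0x3F800000, the IEEE single-precision bit pattern of 1.0, which shows that the two fields share storage.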

 

Record types may inherit fields from other record types. Consider the following two HLA type declarations:

 

type
    Pt2D: record
        x: int32;
        y: int32;
    endrecord;

    Pt3D: record inherits( Pt2D )
        z: int32;
    endrecord;

In this example, Pt3D inherits all the fields from the Pt2D type. The "inherits" keyword tells HLA to copy all the fields from the specified record (Pt2D in this example) to the beginning of the current record declaration (Pt3D in this example). Therefore, the declaration of Pt3D above is equivalent to:

 

Pt3D: record

x: int32;
y: int32;
z: int32;

endrecord;

 

In some special situations you may want to override a field inherited from a base record. For example, consider the following record declarations:

BaseRecord:

record

a: uns32;

b: uns32;

endrecord;

 

DerivedRecord:

record inherits( BaseRecord )

b: boolean; // New definition for b!

c: char;

endrecord;

 

Normally, HLA will report a "duplicate" symbol error when attempting to compile the declaration for "DerivedRecord" since the "b" field is already defined via the "inherits( BaseRecord )" option. However, in certain cases the programmer may wish to make the original field inaccessible in the derived record by redeclaring its name with a different type. That is, perhaps the programmer intends to actually create the following record:

DerivedRecord:

record

a: uns32; // Derived from BaseRecord

b: uns32; // Derived from BaseRecord, but inaccessible here.

b: boolean; // New definition for b!

c: char;

endrecord;

 

HLA allows a programmer to explicitly override the definition of a particular field by placing the OVERRIDES keyword before the field they wish to override. So while the previous declarations for DerivedRecord produce errors, the following is acceptable to HLA:

BaseRecord:

record

a: uns32;

b: uns32;

endrecord;

 

DerivedRecord:

record inherits( BaseRecord )

overrides b: boolean; // New definition for b!

c: char;

endrecord;
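The Python model below shows one way to picture INHERITS and OVERRIDES. It reproduces the expanded DerivedRecord layout shown earlier:

- Inherited fields are copied to the front of the derived record.
- A repeated field name is an error unless it is marked as an override.
- An overridden field keeps its storage but can no longer be named.

This is a hypothetical sketch of the semantics described above, not HLA's implementation. It assumes that uns32 occupies four bytes and that boolean and char occupy one byte each.

```python
def derive(base, new_fields):
    """Model of 'record inherits( base )' followed by new field declarations.

    base:       list of (name, size) for the inherited record
    new_fields: list of (name, size, overrides) for the derived record
    Returns a list of (name, offset, size); a hidden field has name None.
    """
    fields = [[name, size] for name, size in base]  # inherited fields come first
    for name, size, overrides in new_fields:
        clashes = [f for f in fields if f[0] == name]
        if clashes and not overrides:
            raise ValueError(f"duplicate field: {name}")
        for f in clashes:
            f[0] = None  # old field keeps its storage but can no longer be named
        fields.append([name, size])
    result, offset = [], 0
    for name, size in fields:
        result.append((name, offset, size))
        offset += size
    return result


base_record = [("a", 4), ("b", 4)]                 # a, b: uns32
derived = derive(base_record, [("b", 1, True),     # overrides b: boolean
                               ("c", 1, False)])   # c: char
print(derived)  # [('a', 0, 4), (None, 4, 4), ('b', 8, 1), ('c', 9, 1)]
```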

 

 

Normally, HLA aligns each field on the next available byte offset in a record. If you wish to align fields within a record on some other boundary, you may use the ALIGN directive to achieve this. Consider the following record declaration as an example:

type

AlignedRecord:

record

b:boolean; // Offset 0

c:char; // Offset 1

align(4);

d:dword; // Offset 4

e:byte; // Offset 8

w:word; // Offset 9

f:byte; // Offset 11

endrecord;

 

Note that field "d" is aligned at a four-byte offset while "w" is not aligned. We can correct this problem by inserting another ALIGN directive into this record:

type

AlignedRecord2:

record

b:boolean; // Offset 0

c:char; // Offset 1

align(4);

d:dword; // Offset 4

e:byte; // Offset 8

align(2);

w:word; // Offset 10

f:byte; // Offset 12

endrecord;
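The offsets in the comments follow from a simple rule. ALIGN(n) rounds the current offset up to the next multiple of n, and each field then starts at the current offset. The Python sketch below models that rule and reproduces the offsets of both AlignedRecord and AlignedRecord2. The item format and helper names are made up for illustration.

```python
def align_up(offset, boundary):
    """Round offset up to the next multiple of boundary."""
    return (offset + boundary - 1) // boundary * boundary


def layout(items):
    """items: ("field", name, size) or ("align", n). Returns (offsets, size)."""
    offsets, offset = {}, 0
    for item in items:
        if item[0] == "align":
            offset = align_up(offset, item[1])  # ALIGN(n) only moves the offset
        else:
            _, name, size = item
            offsets[name] = offset
            offset += size
    return offsets, offset


aligned_record = [
    ("field", "b", 1), ("field", "c", 1), ("align", 4),
    ("field", "d", 4), ("field", "e", 1), ("field", "w", 2), ("field", "f", 1),
]
aligned_record2 = [
    ("field", "b", 1), ("field", "c", 1), ("align", 4),
    ("field", "d", 4), ("field", "e", 1), ("align", 2),
    ("field", "w", 2), ("field", "f", 1),
]
print(layout(aligned_record))   # ({'b': 0, 'c': 1, 'd': 4, 'e': 8, 'w': 9, 'f': 11}, 12)
print(layout(aligned_record2))  # ({'b': 0, 'c': 1, 'd': 4, 'e': 8, 'w': 10, 'f': 12}, 13)
```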

 

Be aware of the fact that the ALIGN directive in a RECORD only aligns fields in memory if the record object itself is aligned on an appropriate boundary. For example, if an object of type AlignedRecord2 appears in memory at an odd address, then the "d" and "w" fields will also be misaligned (that is, they will appear at odd addresses in memory). Therefore, you must ensure appropriate alignment of any record variable whose fields you're assuming are aligned.

Note that the AlignedRecord2 type consumes 13 bytes. This means that if you create an array of AlignedRecord2 objects, every other element will be aligned on an odd address and three out of four elements will not be double-word aligned (so the "d" field will not be aligned on a four-byte boundary in memory). If you are expecting fields in a record to be aligned on a certain byte boundary, then the size of the record must be a multiple of that alignment factor if you have arrays of the record. This means that you must pad the record with extra bytes at the end to ensure proper alignment. For the AlignedRecord2 example, we need to pad the record with three bytes so that the size is a multiple of four bytes. This is easily achieved by using an ALIGN directive as the last declaration in the record:

type

AlignedRecord2:

record

b:boolean; // Offset 0

c:char; // Offset 1

align(4);

d:dword; // Offset 4

e:byte; // Offset 8

align(2);

w:word; // Offset 10

f:byte; // Offset 12

align(4); // Ensures the record is padded to a multiple of four bytes.

endrecord;
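To check the array claim above, this Python sketch computes which array elements would have a misaligned field. It assumes the array itself starts on a suitably aligned address, modeled as address 0. The helper name is hypothetical. The sizes 13 and 16 are the unpadded and padded sizes of AlignedRecord2.

```python
def misaligned_elements(record_size, field_offset, boundary, count):
    """Indexes of array elements whose field is not on `boundary`.

    Assumes the array starts at a suitably aligned address (modeled as 0).
    """
    return [i for i in range(count)
            if (i * record_size + field_offset) % boundary != 0]


# "d" is at offset 4 and should be dword-aligned; "w" is at offset 10 (word).
print(misaligned_elements(13, 4, 4, 8))   # [1, 2, 3, 5, 6, 7]  (13-byte record)
print(misaligned_elements(13, 10, 2, 8))  # [1, 3, 5, 7]
print(misaligned_elements(16, 4, 4, 8))   # []  (padded to 16 bytes)
```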

 

Note that you should only use values that are integral powers of two in the ALIGN directive.

If you want to ensure that all fields are appropriately aligned on some boundary within the record, but you don't want to have to manually insert ALIGN directives throughout the record, HLA provides a second alignment option to solve your problem. Consider the following syntax:

type

alignedRecord3 : record[4]

<< Set of fields >>

endrecord;

 

The "[4]" immediately following the RECORD reserved word tells HLA to start all fields in the record at offsets that are multiples of four, regardless of the object's size (and the size of the objects preceding the field). HLA allows any integer expression that produces a value in the range 1..4096 inside these brackets. If you specify the value one (which is the default), then all fields are packed (aligned on a byte boundary). For values greater than one, HLA will align each field of the record on the specified boundary. For arrays, HLA will align the field on a boundary that is a multiple of the array element's size. The maximum boundary HLA will round any field to is a multiple of 4096 bytes.

Note that if you set the record alignment using this syntactical form, any ALIGN directive you supply in the record may not produce the desired results. When HLA sees an ALIGN directive in a record that is using field alignment, HLA will first align the current offset to the value specified by ALIGN and then align the next field's offset to the global record align value.

Nested record declarations may specify a different alignment value than the enclosing record, e.g.,

type

alignedRecord4 : record[4]

a:byte;

b:byte;

c:record[8]

d:byte;

e:byte;

endrecord;

f:byte;

g:byte;

endrecord;

 

In this example, HLA aligns fields a, b, f, and g on dword boundaries; it aligns d and e (within c) on eight-byte boundaries. Note that the alignment of the fields in the nested record is true only within that nested record. That is, if c turns out to be aligned on some boundary other than an eight-byte boundary, then d and e will not actually be on eight-byte boundaries; they will, however, be on eight-byte boundaries relative to the start of c.

In addition to letting you specify a fixed alignment value, HLA also lets you specify a minimum and maximum alignment value for a record. The syntax for this is the following:

type

recordname : record[maximum : minimum]

<< fields >>

endrecord;

 

Whenever you specify a maximum and minimum value as above, HLA will align all fields on a boundary that is at least the minimum alignment value. However, if the object's size is greater than the minimum value but less than or equal to the maximum value, then HLA will align that particular field on a boundary that is a multiple of the object's size. If the object's size is greater than the maximum size, then HLA will align the object on a boundary that is a multiple of the maximum size. As an example, consider the following record:

type

r: record[ 4:1 ]

a:byte; // offset 0

b:word; // offset 2

c:byte; // offset 4

d:dword[2]; // offset 8

e:byte; // offset 16

f:byte; // offset 17

g:qword; // offset 20

endrecord;

 

Note that HLA aligns g on a dword boundary (not qword, which would be offset 24) since the maximum alignment size is four. Note that since the minimum size is one, HLA allows the f field to be aligned on an odd boundary (since it's a byte).
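The minimum/maximum rule can be summarized this way: a field's boundary is its size (the element size, for arrays) clamped to the range minimum..maximum. The Python sketch below models that rule as described above and reproduces the offsets listed for record r. It is not HLA's implementation and it ignores nested records and unions. The helper names are hypothetical.

```python
def align_up(offset, boundary):
    """Round offset up to the next multiple of boundary."""
    return (offset + boundary - 1) // boundary * boundary


def minmax_layout(fields, maximum, minimum):
    """Model of 'record[ maximum : minimum ]' field placement.

    fields: list of (name, element_size, element_count); count > 1 for arrays.
    Returns {name: offset}.
    """
    offsets, offset = {}, 0
    for name, elem_size, count in fields:
        # At least `minimum`, the element size if in range, never above `maximum`.
        boundary = min(max(elem_size, minimum), maximum)
        offset = align_up(offset, boundary)
        offsets[name] = offset
        offset += elem_size * count
    return offsets


r = [("a", 1, 1), ("b", 2, 1), ("c", 1, 1), ("d", 4, 2),
     ("e", 1, 1), ("f", 1, 1), ("g", 8, 1)]
print(minmax_layout(r, maximum=4, minimum=1))
# {'a': 0, 'b': 2, 'c': 4, 'd': 8, 'e': 16, 'f': 17, 'g': 20}
```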

If an array, record, or union field appears within a record, then HLA uses the size of an array element or the largest field of the record or union to determine the alignment size. That is, HLA will align the field within the outermost record on a boundary that is compatible with the size of the largest element of the nested array, union, or record.

HLA's sophisticated record alignment facilities let you specify record field alignments that match those used by most major high level language compilers. This lets you easily access data types used in those HLLs without resorting to inserting lots of ALIGN directives inside the record.

Note that there is a big difference in semantics between the global record alignment option (above) and the similar syntax in the STATIC, READONLY, and STORAGE declaration sections (which is why their syntax is different). Consider the following:

static(4)

v1: byte;

v2: dword;

 

Unlike the record alignment option, this example only aligns the first variable of the STATIC section, not all the variables in that section (i.e., v2 will not be aligned on a dword boundary in the example above). Keep this difference in mind when using this alignment option.

When declaring record variables in a VAR, STATIC, READONLY, STORAGE, or SEGMENT declaration section, HLA associates the offset zero with the first field of a record. Each additional field in the record is assigned an offset corresponding to the sum of the sizes of all the prior fields. So in the Pt3D record declared earlier, "x" would have the offset zero, "y" would have the offset four, and "z" would have the offset eight.

If you would like to specify a different starting offset, you can use the following syntax for a record declaration:

 

Pt3D: record := 4;

x: int32;
y: int32;
z: int32;

endrecord;

The constant expression specified after the assignment operator (":=") specifies the starting offset of the first field in the record. In this example x, y, and z will have the offsets 4, 8, and 12, respectively.

Warning: setting the starting offset in this manner does not add padding bytes to the record. This record is still a 12-byte object. If you declare variables using a record declared in this fashion, you may run into problems because the field offsets do not match the actual offsets in memory. This option is intended primarily for mapping records to pre-existing data structures in memory. Only really advanced assembly language programmers should use this option.
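The following Python model illustrates the warning. The `:= 4` start value shifts every field offset, but the record's size is still just the sum of its fields. The helper name is hypothetical.

```python
def layout_from(fields, start=0):
    """Assign offsets beginning at `start` (the ':= n' value).

    The returned size ignores `start`: no padding bytes are added.
    """
    offsets, offset = {}, start
    for name, size in fields:
        offsets[name] = offset
        offset += size
    return offsets, offset - start


print(layout_from([("x", 4), ("y", 4), ("z", 4)], start=4))
# ({'x': 4, 'y': 8, 'z': 12}, 12)
```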

Pointer Types

HLA allows you to declare a pointer to some other type using syntax like the following:

 

pointer to base_type

 

The following example demonstrates how to create a pointer to a 32-bit integer within the type declaration section:

 

type pi32: pointer to int32;

 

HLA pointers are always 32-bit (near32) pointers.

HLA also allows you to define pointers to existing procedures using syntax like the following:

procedure someProc( parameter_list );

<< procedure options, followed by @external, @forward, or procedure body>>

.

.

.

type

p : pointer to procedure someProc;

 

The p procedure pointer "inherits" all the parameters and other procedure options associated with the original procedure. This is really just shorthand for the following:

procedure someProc( parameter_list );

<< procedure options, followed by @external, @forward, or procedure body>>

.

.

.

type

p : procedure ( Same_Parameters_as_someProc ); <<same options as someProc>>

 

The former version, however, is easier to maintain since you don't have to keep the parameter lists and procedure options in sync.

Note that HLA provides the reserved word null (or NULL; reserved words are case insensitive) to represent the nil pointer. HLA replaces NULL with the value zero. The NULL pointer is compatible with any pointer type (including strings, which are pointers).

Thunks

A "thunk" is an eight-byte variable that contains a pointer to a piece of code to execute and an execution environment pointer (i.e., a pointer to an activation record). The code associated with a thunk is, essentially, a small procedure that (generally) uses the activation record of the surrounding code rather than creating its own activation record. HLA uses thunks to implement the iterator "yield" statement as well as pass by name and pass by lazy evaluation parameters. In addition to these uses of thunks, HLA allows you to declare your own thunk objects and use them for any purpose you desire. Declaring a thunk variable is easy; just use a declaration like the following in a VAR or STATIC section:

 

thunkVar: thunk;

 

This declaration reserves eight bytes of storage. The first dword holds the address of the code to execute; the second dword holds a pointer to the activation record to load into EBP when the thunk executes.

Of course, like almost any pointer variable, declaring a thunk variable is the easy part; the hard part is making sure the thunk variable is initialized before attempting to call the thunk. While you could manually load the address of some code and the frame pointer value into a thunk variable, HLA provides a better syntax for initializing thunks with small code fragments: the "thunk" statement. The "thunk" statement uses the following syntax:

 

thunk thunkVar := #{ sequence_of_statements }#;

 

Consider the following example:

 

program ThunkDemo;

#include( "stdio.hhf" );

 

procedure proc1;

var

i: int32;

p1Thunk: thunk;

procedure proc2( t:thunk );

var

i:int32;

begin proc2;

mov( 25, i );

t();

stdout.put( "Inside proc2, i=", i, nl );

end proc2;

begin proc1;

 

thunk p1Thunk := #{ mov( 0, i ); }#;

mov( 1, i );

proc2( p1Thunk );

stdout.put( "i=", i, nl );

 

end proc1;

begin ThunkDemo;

 

proc1();

end ThunkDemo;

 

In this example, proc1 has two local variables, i and p1Thunk. The THUNK statement initializes the p1Thunk variable with the address of some code that moves a zero into the i variable. The THUNK statement also initializes p1Thunk with a pointer to the current activation record (that is, a pointer to proc1's activation record). Then proc1 calls proc2, passing p1Thunk as a parameter.

The proc2 routine has its own local variable named i. Of course, this is a different variable than the i in proc1. Proc2 begins by setting its variable i to the value 25. Then proc2 invokes the thunk (passed to it as a parameter). This thunk sets the variable i to zero. However, since the thunk uses the activation record that was current when the THUNK statement executed, this statement sets proc1's i variable to zero rather than proc2's i variable. This program produces the following output:

 

Inside proc2, i=25

i=0

 

Although you probably won't use thunks that often, they are quite nice for deferred execution. This is especially useful in AI (Artificial Intelligence) programs.
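To make the environment-pointer idea concrete, here is a Python model of ThunkDemo. Each activation record is a dictionary and a thunk is a (code, frame) pair. This is only an analogy for how HLA's THUNK statement captures the frame pointer; HLA does not use Python-style closures, and the class and function names are hypothetical.

```python
class Thunk:
    """Model of an HLA thunk: a code pointer paired with a saved frame pointer."""

    def __init__(self, code, frame):
        self.code = code    # first dword: address of the code to execute
        self.frame = frame  # second dword: activation record to load into EBP

    def __call__(self):
        self.code(self.frame)  # run the code against the *saved* frame


output = []


def proc2(t):
    frame = {}                 # proc2's activation record
    frame["i"] = 25            # mov( 25, i );
    t()                        # t();  changes proc1's i, not this frame's
    output.append(f"Inside proc2, i={frame['i']}")


def proc1():
    frame = {}                 # proc1's activation record

    def body(f):               # #{ mov( 0, i ); }#
        f["i"] = 0

    p1_thunk = Thunk(body, frame)  # thunk p1Thunk := #{ ... }#;
    frame["i"] = 1             # mov( 1, i );
    proc2(p1_thunk)            # proc2( p1Thunk );
    output.append(f"i={frame['i']}")


proc1()
print("\n".join(output))
# Inside proc2, i=25
# i=0
```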

Class Types

Classes and object-oriented programming are the subject of a later section of this document. See Class Data Types for more details.

Literal Constants

Literal constants are those language elements that we normally think of as non-symbolic constant objects. HLA supports a wide variety of literal constants. The following sections describe those constants.

Numeric Constants

HLA lets you specify several different types of numeric constants.

Decimal Constants

The first and last characters of a decimal integer constant must be decimal digits (0..9). Interior positions may contain decimal digits and underscores. The purpose of the underscore is to provide a better presentation for large decimal values (i.e., use the underscore in place of a comma in large values). Example: 1_234_265.

Note: Technically, HLA does not allow negative literal integer constants. However, you can use the unary "-" operator to negate a value, so you'd never notice this omission (e.g., -123 is legal; it consists of the unary negation operator followed by a positive decimal literal constant). Therefore, HLA always returns type unsXX for all literal decimal constants. Also note that HLA always uses a minimum size of uns32 for literal decimal constants. If you absolutely, positively, want a literal constant to be treated as some other type, use one of the compile-time type coercion functions to change the type (e.g., uns8(1), word(2), or int16(3)). Generally, the type that HLA uses for the object is irrelevant since HLA will automatically promote a value to a larger or smaller type as appropriate.

Here are the ranges of the various HLA unsigned data types:

uns8: 0..255

uns16: 0..65,535

uns32: 0..4,294,967,295

uns64: 0..18,446,744,073,709,551,615

uns128: 0..340,282,366,920,938,463,463,374,607,431,768,211,455
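The following Python sketch models the decimal-literal rules above:

- The first and last characters must be digits.
- Interior underscores are ignored.
- The type is at least uns32.

Choosing uns64 or uns128 for larger values is an assumption of this model, based on the ranges listed. The function name is hypothetical.

```python
import re

# First and last characters must be digits; interior may mix digits and "_".
DECIMAL = re.compile(r"[0-9](?:[0-9_]*[0-9])?")

# Smallest unsigned type that can hold the value (uns32 is the minimum).
# Promoting to uns64/uns128 for larger values is an assumption of this model.
UNSIGNED_TYPES = [("uns32", 2**32 - 1), ("uns64", 2**64 - 1), ("uns128", 2**128 - 1)]


def decimal_literal(text):
    """Return (value, type_name) for an HLA-style decimal literal."""
    if not DECIMAL.fullmatch(text):
        raise ValueError(f"not a decimal literal: {text!r}")  # e.g. "-123", "_12"
    value = int(text.replace("_", ""))
    for type_name, largest in UNSIGNED_TYPES:
        if value <= largest:
            return value, type_name
    raise ValueError(f"too large for uns128: {text}")


print(decimal_literal("1_234_265"))  # (1234265, 'uns32')
```

A negative value such as -123 is rejected here because, as noted above, the minus sign is a separate unary operator rather than part of the literal.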

Hexadecimal Constants

Hexadecimal literal constants must begin with a dollar sign ("$") followed by a hexadecimal digit and must end with a hexadecimal digit (0..9, A..F, or a..f). Interior positions may contain hexadecimal digits or underscores. Hexadecimal constants are easiest to read if each group of four digits (starting from the least significant digit) is separated from the others by an underscore. E.g., $1A_2F34_5438.

If the constant fits into 32 bits or less, HLA always returns the dword type for a hexadecimal constant. For larger values, HLA will automatically use the qword or lword type, as appropriate. If you would like the hexadecimal value to have a different type, use one of the HLA compile-time type coercion functions to change the type (e.g., byte($12) or qword($54)).

Here are the ranges of the various HLA unsigned data types in hexadecimal notation:

uns8: 0..$FF

uns16: 0..$FFFF

uns32: 0..$FFFF_FFFF

uns64: 0..$FFFF_FFFF_FFFF_FFFF

uns128: 0..$FFFF_FFFF_FFFF_FFFF_FFFF_FFFF_FFFF_FFFF

Binary Constants

Binary literal constants begin with a percent sign ("%") followed by at least one binary digit (0/1) and they must end with a binary digit. Interior positions may contain binary digits or underscore characters. Binary constants are easiest to read if each group of four digits (starting from the least significant digit) is separated from the others by an underscore. E.g., %10_1111_1010.

Like hexadecimal constants, HLA always associates the type dword with a "raw" binary constant; it will use the qword or lword type if the value requires more than 32 bits or more than 64 bits, respectively. If you want HLA to use a different type, use one of the compile-time type coercion functions to achieve this.

Obviously, binary constants may have as many binary digits as there are bits in the underlying type. This document will not attempt to write out the maximum binary literal constant!
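A similar Python model covers the hexadecimal and binary forms and the dword/qword/lword typing described above. It illustrates the stated rules and is not HLA's own scanner. The helper names are hypothetical.

```python
import re

# "$" or "%", then digits/underscores that start and end with a digit.
HEX = re.compile(r"\$[0-9A-Fa-f](?:[0-9A-Fa-f_]*[0-9A-Fa-f])?")
BIN = re.compile(r"%[01](?:[01_]*[01])?")


def radix_literal(text):
    """Return (value, type_name) for an HLA-style hexadecimal or binary literal."""
    if HEX.fullmatch(text):
        value = int(text[1:].replace("_", ""), 16)
    elif BIN.fullmatch(text):
        value = int(text[1:].replace("_", ""), 2)
    else:
        raise ValueError(f"not a hexadecimal or binary literal: {text!r}")
    for type_name, bits in (("dword", 32), ("qword", 64), ("lword", 128)):
        if value < 2**bits:
            return value, type_name
    raise ValueError(f"more than 128 bits: {text}")


print(radix_literal("%10_1111_1010"))     # (762, 'dword')
print(radix_literal("$1A_2F34_5438")[1])  # qword
```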

Numeric Set Constants

HLA provides a special numeric constant form that lets you specify a numeric value by the bit positions containing ones. This corresponds to a powerset of integer values in the range 0..31. These constants take the following form:

@{ comma_separated_list_of_digits }

 

The comma_separated_list_of_digits can be empty (signifying no set bits, i.e., the value zero), a single digit, or a set of digits separated by commas. Here are some examples:

@{}

@{8}

@{1,2,8,24}

 

The corresponding numeric constant is given the type dword and is assigned the value that has ones in all the specified bit positions. For example, "@{8}" is equal to 256 since this value has a single set bit in bit position eight. Note that "@{0}" equals one, not zero (because the value one has a single set bit in position zero).
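Computing a numeric set constant amounts to OR-ing together one bit per listed position. This Python sketch reproduces the values discussed above. The helper is hypothetical, and its range check follows the 0..31 limit stated earlier.

```python
def numeric_set(*bit_positions):
    """Value of an HLA @{ ... } constant: a one in each listed bit position."""
    value = 0
    for bit in bit_positions:
        if not 0 <= bit <= 31:
            raise ValueError(f"bit position out of range 0..31: {bit}")
        value |= 1 << bit
    return value


print(numeric_set())             # 0         @{}
print(numeric_set(8))            # 256       @{8}
print(numeric_set(0))            # 1         @{0}
print(numeric_set(1, 2, 8, 24))  # 16777478  @{1,2,8,24}
```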

Real (Floating Point) Constants

Floating point (real) literal constants always begin with a decimal digit (never just a decimal point). A string of one or more decimal digits may be optionally followed by a decimal point and zero or more decimal digits (the fractional part). After the optional fractional part, a floating point number may be followed by "e" or "E", a sign ("+" or "-"), and a string of one or more decimal digits (the exponent part). Underscores may appear between two adjacent digits in the floating point number; their presence is intended to substitute for commas found in real-world decimal numbers.

Examples:

1.2

2.345e-2

0.5

1.2e4

2.3e+5

1_234_567.0
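The grammar above can be expressed as a regular expression, as in the hypothetical Python sketch below. It makes two further assumptions:

- A digit string with neither a decimal point nor an exponent is a decimal integer literal, not a real constant.
- Conversion uses Python's float, which is a 64-bit double rather than HLA's 80-bit real80.

```python
import re

DIGITS = r"[0-9](?:_?[0-9])*"  # underscores only between two adjacent digits
# Digits, then either a point (optional fraction, optional exponent) or an exponent.
REAL = re.compile(
    rf"{DIGITS}(?:\.(?:{DIGITS})?(?:[eE][+-]?{DIGITS})?|[eE][+-]?{DIGITS})"
)


def real_literal(text):
    """Validate an HLA-style real literal and return it as a Python float."""
    if not REAL.fullmatch(text):
        raise ValueError(f"not a real literal: {text!r}")
    return float(text.replace("_", ""))  # 64-bit double, not real80


for text in ["1.2", "2.345e-2", "0.5", "1.2e4", "2.3e+5", "1_234_567.0"]:
    print(text, real_literal(text))
```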

 

Literal real constants are always 80 bits and have the default type real80 . If you wish to specify real32 or real64 literal constants, then use the real32 or real64 compile-time coercion functions to convert the values, e.g., real32( 3.14159 ) . During compile time, it's rare that you'd want to use one of the smaller types since they are less accurate at representing floating point values (although this might be precisely why you decide to use the smaller real type, so the accuracy matches the computations you're doing at run-time).

The range of real32 constants is approximately 10^±38 with 6-1/2 digits of precision; the range of real64 values is approximately 10^±308 with approximately 14-1/2 digits of precision, and the range of real80 constants is approximately 10^±4932 with about 18 digits of precision.

Boolean Constants

Boolean constants consist of the two predefined identifiers true and false. Note that your program may redefine these identifiers, but doing so is incredibly bad programming style. Since these are actual identifiers in the symbol table (and not reserved words), you must spell these identifiers in all lower case or HLA will complain (unlike reserved words that are case insensitive).

Internally, HLA represents true with one and false with zero. In fact, HLA's boolean operations only look at bit #0 of the boolean value (and always clear the other bits). HLA compile-time statements that expect a boolean expression do not use zero/not zero like C/C++ and a few other languages; such expressions must have a boolean type.

Character Constants

Character literals generally consist of a single (graphic) character surrounded by apostrophes. To represent the apostrophe character, use four apostrophes, e.g., ''''.

Another way to specify a character constant is by typing the "#" symbol followed by a numeric literal constant (decimal, hexadecimal, or binary). Examples: #13, #$D, #%1101.
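The following Python sketch models the character-constant forms described above:

- A single character in apostrophes.
- Four apostrophes for the apostrophe itself.
- "#" followed by a decimal, hexadecimal, or binary value.

It is illustrative only. The function name and the exact digit patterns are assumptions.

```python
import re

# "#" followed by a decimal, $hexadecimal, or %binary value (hypothetical pattern).
NUMERIC_CHAR = re.compile(
    r"#(?:\$([0-9A-Fa-f][0-9A-Fa-f_]*)|%([01][01_]*)|([0-9][0-9_]*))"
)


def char_literal(text):
    """Return the character code denoted by an HLA-style character constant."""
    if text == "''''":                      # four apostrophes: the apostrophe itself
        return ord("'")
    if len(text) == 3 and text[0] == text[2] == "'" and text[1] != "'":
        return ord(text[1])                 # 'c'
    m = NUMERIC_CHAR.fullmatch(text)        # #decimal, #$hex, or #%binary
    if m:
        hex_digits, bin_digits, dec_digits = m.groups()
        if hex_digits is not None:
            return int(hex_digits.replace("_", ""), 16)
        if bin_digits is not None:
            return int(bin_digits.replace("_", ""), 2)
        return int(dec_digits.replace("_", ""))
    raise ValueError(f"not a character constant: {text!r}")


print(char_literal("'A'"), char_literal("''''"))                         # 65 39
print(char_literal("#13"), char_literal("#$D"), char_literal("#%1101"))  # 13 13 13
```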

Unicode Character Constants

Unicode character constants are 16-bit values. HLA provides limited support for Unicode literal constants. HLA supports the UTF/7 code point (character set) which is just the standard seven-bit ASCII character set and nine high-order zero bits. To specify a 16-bit literal Unicode constant simply prefix a standard ASCII literal constant with a 'u' or 'U', e.g.,

u'A' - UTF/7 character constant for 'A'

Note that UTF/7 constants are simply the ASCII character codes zero extended to 16 bits.

HLA provides a second syntax for Unicode character constants that lets you enter values whose character codes are outside the range $20..$7E. You can specify a Unicode character constant by its numeric value using the 'u#nnnn' constant form. This form lets you specify a 16-bit value following the '#' in either binary, decimal, or hexadecimal form, e.g.,

u#1233

u#$60F

u#%100100101001

String Constants

String literal constants consist of a sequence of (graphic) characters surrounded by quotes. To embed a quote within a string, insert a pair of quotes into the string, e.g., "He said ""This"" to me."

If two string literal constants are adjacent in a source file (with nothing but whitespace between them), then HLA will concatenate the two strings and present them to the parser as a single string. Furthermore, if a character constant is adjacent to a string, HLA will concatenate the character and string to form a single string object. This is useful, for example, when you need to embed control characters into a string, e.g.,

 

"This is the first line" #$d #$a "This is the second line" #$d #$a

 

HLA treats the above as a single string with a Windows newline sequence (CR/LF) at the end of each of the two lines of text.
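Here is a Python model of the concatenation rule. Adjacent string constants and #-character constants separated only by whitespace are joined into one string, and a doubled quote inside a string stands for one quote. The scanner is a hypothetical illustration, not HLA's lexer.

```python
import re

# One token: a quoted string ("" inside stands for one quote) or a #-character
# constant in decimal, $hexadecimal, or %binary, optionally preceded by whitespace.
TOKEN = re.compile(
    r'\s*(?:"((?:[^"]|"")*)"'
    r'|#(?:\$([0-9A-Fa-f][0-9A-Fa-f_]*)|%([01][01_]*)|([0-9][0-9_]*)))'
)


def concat_literals(source):
    """Join adjacent string and #-character constants into a single string."""
    pieces, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        m = TOKEN.match(source, pos)
        if not m:
            raise ValueError(f"unexpected text: {source[pos:]!r}")
        text, hex_d, bin_d, dec_d = m.groups()
        if text is not None:
            pieces.append(text.replace('""', '"'))
        elif hex_d is not None:
            pieces.append(chr(int(hex_d.replace("_", ""), 16)))
        elif bin_d is not None:
            pieces.append(chr(int(bin_d.replace("_", ""), 2)))
        else:
            pieces.append(chr(int(dec_d.replace("_", ""))))
        pos = m.end()
    return "".join(pieces)


print(repr(concat_literals(
    '"This is the first line" #$d #$a "This is the second line" #$d #$a')))
print(concat_literals('"He said ""This"" to me."'))  # He said "This" to me.
```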

Unicode String Constants

HLA lets you specify Unicode string literals by prefixing a standard string constant with a 'u' or a 'U'. Such string constants use the UTF/7 character set (that is, the standard ASCII character set) but reserve 16 bits for each character in the string. Note that the high order nine bits of each character in the string will contain zero.

As this was being written, there was no support for Unicode strings in the HLA Standard Library, though support for Unicode string functions should appear shortly (note that Windows programmers can call the Unicode string functions that are part of the Windows API).

Character Set Constants

A character set literal constant consists of several comma delimited character set expressions within a pair of braces. The character set expressions can either be individual character values or a pair of character values separated by an ellipsis (".."). If an individual character expression appears within the character set, then that character is a member of the set; if a pair of character expressions, separated by an ellipsis, appears within a character set literal, then all characters from the first expression through the second expression (inclusive) are members of the set.

Examples:

{'a','b','c'} // a, b, and c.

{'a'..'c'} // a, b, and c.

{'A'..'Z','a'..'z'} //Alphabetic characters.

{' ',#$d,#$a,#$9} //Whitespace (space, return, linefeed, tab).

 

HLA character sets are currently limited to holding characters from the 128-character ASCII character set. In the future, HLA may support an xcset type (supporting 256 elements) or even wcset (wide character sets), but that support does not currently exist.
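Since a character set holds at most 128 members, it can be modeled as a 128-bit mask in which bit n stands for the character whose code is n. This Python sketch builds such a mask from single characters and inclusive ranges, matching the examples above. The helpers are hypothetical.

```python
def cset(*items):
    """Model a character set as a 128-bit mask (bit n set = code n is a member).

    Each item is a single character or an inclusive (first, last) range.
    """
    bits = 0
    for item in items:
        first, last = (item, item) if isinstance(item, str) else item
        for code in range(ord(first), ord(last) + 1):
            if code > 127:
                raise ValueError(f"outside 7-bit ASCII: {chr(code)!r}")
            bits |= 1 << code
    return bits


def members(bits):
    """List the members of a set in character-code order."""
    return "".join(chr(code) for code in range(128) if (bits >> code) & 1)


print(cset("a", "b", "c") == cset(("a", "c")))        # True: {'a','b','c'} == {'a'..'c'}
print(repr(members(cset(" ", "\r", "\n", "\t"))))     # '\t\n\r '
```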

Structured Constants

Array Constants

Note: see Array Data Types for more details about HLA array types.

HLA lets you specify an array literal constant by enclosing a set of values within a pair of square brackets. Since array elements must be homogeneous, all elements in an array literal constant must be the same type or conformable to the same type. Examples:

 

[ 1, 2, 3, 4, 9, 17 ]
[ 'a', 'A', 'b', 'B' ]
[ "hello", "world" ]

Note that each item in the list of values can actually be a constant expression, not a simple literal value.

 

HLA array constants are always one dimensional. This, however, is not a limitation because if you attempt to use array constants in a constant expression, the only thing that HLA checks is the total number of elements. Therefore, an array constant with eight integers can be assigned to any of the following arrays:

 

const
a8: int32[8] := [1,2,3,4,5,6,7,8];
a2x4: int32[2,4] := [1,2,3,4,5,6,7,8];
a2x2x2: int32[2,2,2] := [1,2,3,4,5,6,7,8];

Although HLA doesn't support the notation of a multi-dimensional array constant, HLA does allow you to include an array constant as one of the elements in an array constant. If an array constant appears as a list item within some other array constant, then HLA expands the interior constant in place, lengthening the list of items in the enclosing list. E.g., the following three array constants are equivalent:

 

[ [1,2,3,4], [5,6,7,8] ]
[ [ [1,2], [3,4] ], [ [5,6], [7,8] ] ]
[1,2,3,4,5,6,7,8]

Although HLA treats these three array constants as identical, you might want to use the different forms to suggest the shape of the array in an actual declaration, e.g.,

const
a8: int32[8] := [1,2,3,4,5,6,7,8];
a2x4: int32[2,4] := [ [1,2,3,4], [5,6,7,8] ];
a2x2x2: int32[2,2,2] := [[[1,2], [3,4] ], [[5,6], [7,8]]];

Also note that symbolic array constants, not just literal array constants, may appear in a literal array constant. For example, the following literal array constant creates a nine-element array holding the values one through nine at indexes zero through eight:

 

const Nine: int32[ 9 ] := [ a8, 9 ];

 

This assumes, of course, that "a8" was previously declared as above. Since HLA "flattens" all array constants, you could have substituted a2x4 or a2x2x2 for a8 in the example above and obtained identical results.
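The flattening behavior can be modeled in a few lines of Python. Nested lists are expanded in place, and a constant fits a declaration when the element counts match. The helper names are hypothetical, and the model ignores element types.

```python
import math


def flatten(items):
    """Expand nested array constants in place, the way HLA flattens them."""
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def fits(dims, constant):
    """A constant fits a declaration if the total element counts match."""
    return len(flatten(constant)) == math.prod(dims)


a8 = [1, 2, 3, 4, 5, 6, 7, 8]
a2x4 = [[1, 2, 3, 4], [5, 6, 7, 8]]
a2x2x2 = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
print(flatten(a2x4) == flatten(a2x2x2) == a8)     # True
print(fits([2, 2, 2], a8), fits([2, 4], a2x2x2))  # True True
nine = flatten([a8, 9])                           # [ a8, 9 ]
print(nine)                                       # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```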

You may also create an array constant using the HLA DUP operator. This operator uses the following syntax:

expression DUP [expression_to_replicate]

 

Where expression is an integer expression and expression_to_replicate is some expression, possibly an array constant. HLA generates an array constant by repeating the values in the expression_to_replicate the number of times specified by the expression. (Note: this does not create an array with expression elements unless the expression_to_replicate contains only a single value; it creates an array whose element count is expression times the number of items in the expression_to_replicate.) Examples:

 

10 dup [1] -- equivalent to [1,1,1,1,1,1,1,1,1,1]

5 dup [1,2] -- equivalent to [1,2,1,2,1,2,1,2,1,2]
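In Python terms, DUP behaves like list repetition, so the result length is the count times the number of items being replicated. This is an illustration only, and the helper name is hypothetical.

```python
def dup(count, items):
    """Model of HLA's 'count DUP [items]': the whole list is repeated count times."""
    return list(items) * count


print(dup(10, [1]))         # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(dup(5, [1, 2]))       # [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
print(len(dup(5, [1, 2])))  # 10 = 5 * 2 items, not 5
```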

 

Please note that HLA does not allow class constants, so class objects may not appear in array constants. Also, HLA does not allow generic pointer constants, only certain types of pointer constants are legal. See the discussion of pointer constants for more details.

Record Constants

Note: see Record Data Types for details about HLA Records.

HLA supports record constants using a syntax very similar to array constants. You enclose a comma-separated list of values for each field in a pair of square brackets. To further differentiate array and record constants, the name of the record type and a colon must precede the opening square bracket, e.g.,

 

Planet:[ 1, 12, 34, 1.96 ]

 

HLA associates the items in the list with the fields as they appear in the original record declaration. In this example, the values 1, 12, 34, and 1.96 are associated with fields x, y, z, and density, respectively. Of course, the types of the individual constants must match (or be conformable to) the types of the individual fields.

Note that you may not create a record constant for a particular record type if that record includes data types that cannot have compile-time constants associated with them. For example, if a field of a record is a class object, you cannot create a record constant for that type since you cannot create class constants.

Union Constants

Note: see Union Data Types for more details about HLA's UNION types.

Starting with HLA v1.38, HLA supports union constants. These union constants allow you to initialize static union data structures in memory as well as initialize union fields of other data structures (including anonymous union fields in records). This section describes the syntax you'll use to create union constants.

There are some important differences between HLA compile-time union constants and HLA run-time unions (as well as between HLA unions and unions in other languages). Therefore, it's a good idea to begin the discussion of HLA's union constants with a description of these differences.

There are a couple of different reasons people use unions in a program. The original reason was to share a sequence of memory locations between various fields whose access is mutually exclusive. When using a union in this manner, one never reads the data from a field unless one has previously written data to that field and there are no intervening writes to other fields between that previous write and the current read. The HLA compile-time language fully (and only) supports this use of union objects.

A second reason people use unions (especially in high level languages) is to provide aliases to a given memory location; particularly, aliases whose data types are different. In this mode, a programmer might write a value to one field and then read that data using a different field (in order to access that data's bit representation as a different type). HLA does not support this type of access to union constants. The reason is quite simple: internally, HLA uses a special "variant" data type to represent all possible constant types. Whenever you create a union constant, HLA lets you provide a value for a single data field. From that point forward, HLA effectively treats the union constant as a scalar object whose type is the same as the field you've initialized; access to the other fields through the union constant is no longer possible. Therefore, you cannot use HLA compile-time constants to do type coercion; nor is there any need to since HLA provides a set of type coercion operators like @byte, @word, @dword, @int8, etc. As noted above, the main purpose for providing HLA union constants is to allow you to initialize static union variables; since you can only store one value into a memory location at a time, union constants only need to be able to represent a single field of the union at one time (of course, at run-time you may access any field of the static union object you've created; but at compile-time you may only access the single field associated with a union constant).

An HLA literal union constant takes the following form:

typename.fieldname:[ constant_expression ]

 

The typename component above must be the name of a previously declared HLA union data type (i.e., a union type you've created in the type section). The fieldname component must be the name of a field within that union type. The constant_expression component must be a constant value (expression) whose type is the same as, or is automatically coercible to, the type of the fieldname field. Here is a complete example:

type
    u:union
        b:byte;
        w:word;
        d:dword;
        q:qword;
    endunion;

static
    uVar :u := u.w:[$1234];

The declaration for uVar initializes the first two bytes of this object in memory with the value $1234. Note that uVar is actually eight bytes long; HLA automatically zeros any unused bytes when initializing a static memory object with a union constant.
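To make the byte layout concrete, here is a small Python model (not HLA code) of the object HLA emits for uVar. It assumes little-endian x86 byte order and that the union occupies the size of its largest field; the U_FIELDS table and the union_constant helper are hypothetical names introduced only for this sketch.

```python
import struct

# Fields of the union type u from the example (little-endian x86 formats).
U_FIELDS = {"b": "<B", "w": "<H", "d": "<I", "q": "<Q"}
# A union is as large as its largest field: the qword, 8 bytes.
U_SIZE = max(struct.calcsize(fmt) for fmt in U_FIELDS.values())


def union_constant(field: str, value: int) -> bytes:
    """Bytes for a literal union constant such as u.w:[$1234]:
    the chosen field holds the value and the remaining bytes are zeroed."""
    data = struct.pack(U_FIELDS[field], value)
    return data + bytes(U_SIZE - len(data))


uVar = union_constant("w", 0x1234)
print(uVar.hex(" "))  # 34 12 00 00 00 00 00 00
```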

Note that you may place a literal union constant in records, arrays, and other composite data structures. The following is a simple example of a record constant that has a union as one of its fields:

type
    r :record
        b:byte;
        uf:u;
        d:dword;
    endrecord;

static
    sr :r := r:[0, u.d:[$1234_5678], 12345];

In this example, HLA initializes the sr variable with the byte value zero, followed by a dword containing $1234_5678 and a dword containing zero (to pad out the remainder of the union field), followed by a dword containing 12345.
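The same byte-level Python model can sketch the sr initializer. It assumes, as the description above implies, that the record fields are laid out back to back with no alignment padding; the helper name union_u_d is hypothetical.

```python
import struct


def union_u_d(value: int) -> bytes:
    # u.d:[value]: a dword in the first 4 bytes of the 8-byte union, zero padded
    return struct.pack("<I", value) + bytes(4)


# r:[0, u.d:[$1234_5678], 12345] with fields packed back to back (assumed)
sr = (struct.pack("<B", 0)          # b:byte
      + union_u_d(0x1234_5678)      # uf:u
      + struct.pack("<I", 12345))   # d:dword
print(len(sr), sr.hex(" "))
```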

You can also create records that have anonymous unions in them and then initialize a record object with a literal constant. Consider the following type declaration with an anonymous union:

type
    rau :record
        b:byte;
        union
            c:char;
            d:dword;
        endunion;
        w:word;
    endrecord;

Since anonymous unions within a record do not have a type associated with them, you cannot use the standard literal union constant syntax to initialize the anonymous union field (that syntax requires a type name). Instead, HLA offers you two choices when creating a literal record constant with an anonymous union field. The first alternative is to use the reserved word union in place of a typename when creating a literal union constant, e.g.,

static
    srau :rau := rau:[ 1, union.d:[$12345], $5678 ];

The second alternative is a shortcut notation. HLA allows you to simply specify a value that is compatible with the first field of the anonymous union and HLA will assign that value to the first field and ignore any other fields in the union, e.g.,

static
    srau2 :rau := rau:[ 1, 'c', $5678 ];

This is slightly dangerous since HLA relaxes type checking a bit here, but when creating tables of record constants, this is very convenient if you generally provide values for only a single field of the anonymous union; just make sure that the commonly used field appears first and you're in business.
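A Python sketch of the two rau initializers shows that both forms fill the same four-byte anonymous union slot; only the bytes stored there differ. The layout (no padding, little-endian, union sized to its dword field) is an assumption, and rau_constant is a hypothetical helper.

```python
import struct

UNION_SIZE = 4  # the anonymous union c:char / d:dword is as large as the dword


def rau_constant(b: int, union_bytes: bytes, w: int) -> bytes:
    """Bytes for rau:[b, <union value>, w], fields packed back to back (assumed)."""
    if len(union_bytes) > UNION_SIZE:
        raise ValueError("value does not fit in the anonymous union")
    padded = union_bytes + bytes(UNION_SIZE - len(union_bytes))
    return struct.pack("<B", b) + padded + struct.pack("<H", w)


srau = rau_constant(1, struct.pack("<I", 0x12345), 0x5678)  # union.d:[$12345]
srau2 = rau_constant(1, b"c", 0x5678)                         # shortcut: first field, c:char
print(srau.hex(" "))
print(srau2.hex(" "))
```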

Although HLA allows anonymous records within a union, there is no syntactically acceptable way to differentiate anonymous record fields from other fields in the union; therefore, HLA does not allow you to create union constants if the union type contains an anonymous record. The easy workaround is to create a named record field and specify the name of the record field when creating a union constant, e.g.,

type
    r :record
        c:char;
        d:dword;
    endrecord;

    u :union
        b:byte;
        x:r;
        w:word;
    endunion;

static
    y :u := u.x:[ r:[ 'a', 5]];

You may declare a union constant and then assign data to the specific fields as you would a record constant. The following example provides some samples of this:

type
    u_t :union
        b:byte;
        x:r;
        w:word;
    endunion;

val
    u :u_t;

.
.
.
?u.b := 0;
.
.
.
?u.w := $1234;

The two assignments above are roughly equivalent to the following:

?u := u_t.b:[0];

 

and

 

?u := u_t.w:[$1234];

 

However, to assign directly to individual fields (the former example), you must first declare the value u as a u_t union.

To read a field of a union constant, use the familiar "dot notation," just as you would with records, e.g.,

?x := u.b;
.
.
.
?y := u.w & $FF00;

Note, however, that you may only read the field of the union into which you most recently stored a value. If you store data into one field and then attempt to read the data from some other field of the union, HLA will report an error. Remember, you cannot use union constants as a sneaky way to coerce one value's type to another (use the coercion functions for that purpose).
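The rule can be modeled with a tiny Python class that remembers only the most recently written field. This is a hypothetical illustration of the behavior described above (the class and method names are invented for the sketch), not HLA's internal variant representation.

```python
class UnionConstant:
    """Hypothetical model of an HLA compile-time union value: only the field
    most recently assigned may be read back."""

    def __init__(self, fields):
        self._fields = set(fields)
        self._active = None
        self._value = None

    def store(self, field, value):
        if field not in self._fields:
            raise KeyError(f"no field named {field!r}")
        self._active, self._value = field, value

    def load(self, field):
        if field != self._active:
            raise TypeError(
                f"cannot read {field!r}; last field written was {self._active!r}")
        return self._value


u = UnionConstant(["b", "x", "w"])
u.store("b", 0)              # ?u.b := 0;
x = u.load("b")              # ?x := u.b;  (allowed: b was written last)
u.store("w", 0x1234)         # ?u.w := $1234;
y = u.load("w") & 0xFF00     # ?y := u.w & $FF00;
try:
    u.load("b")              # b is no longer the active field
except TypeError as err:
    print("error:", err)
```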

 

Pointer Constants

Note: see "Pointer Types" for more details about HLA pointer types.

HLA allows a very limited form of a pointer constant. If you place an ampersand in front of a static object's name (i.e., the name of a static variable, readonly variable, uninitialized variable, segment variable, procedure, method, or iterator), HLA will compute the run-time offset of that object. Pointer constants may not be used in arbitrary constant expressions. You may only use pointer constants in expressions used to initialize static or readonly variables or as constant expressions in 80x86 instructions. The following example demonstrates how pointer constants can be used:

 

program pointerConstDemo;

static
t:int32;
pt: pointer to int32 := &t;

begin pointerConstDemo;

mov( &t, eax );

end pointerConstDemo;

Also note that HLA allows the use of the reserved word NULL anywhere a pointer constant is legal. HLA substitutes the value zero for NULL.

Constant Expressions in HLA

HLA provides a rich expression evaluator to process assembly-time expressions. HLA supports the following operators (listed in order of decreasing precedence):

! (unary not), - (unary negation)
*, div, mod, /, <<, >>
+, -
=, ==, <>, !=, <=, >=, <, >
&, |, in

The following subsections describe each of these operators in detail.

Type Checking and Type Promotion

Many dyadic (two-operand) operators expect the types of their operands to be the same. Prior to actually performing such an operation, HLA evaluates the types of the operands and attempts to make them compatible. HLA uses a type algebra to determine if two (different) types are compatible; if they are not, HLA will report a type mismatch error during assembly. If the types are compatible, HLA will make them identical via type promotion. The type algebra describes how HLA promotes one type to another in order to make the two types compatible.

Usually, you can state a type algebra easily enough by providing "algebraic" type equations. For example, in high level languages one could use a statement like "r = r + i" to suggest that the type of the resulting sum is real when the left operand is real and the right operand is integer (around the "+" operator). Unfortunately, HLA supports so many different data types and operators that any attempt to describe the type algebra in this fashion will produce so many equations that it would be difficult for the reader to absorb it all. Therefore, this document will rely on an informal English description of the type algebra to explain how HLA operates.

First of all, if two operands have the same basic type but different sizes, HLA promotes the smaller object to the same size as the larger object. Basic types include the following sets: {uns8, uns16, uns32, uns64, uns128}, {int8, int16, int32, int64, int128}, {byte, word, dword, qword, lword}, and {real32, real64, real80}10. So if both operands come from the same set, HLA promotes both of them to the larger of the two types.

If an unsigned and a signed operand appear around an operator, HLA produces a signed result. If the unsigned operand is smaller than the signed operand, HLA assigns both operands the signed type prior to the operation. If the unsigned and signed operands are the same size (or the unsigned operand is larger), HLA first checks the H.O. bit of the unsigned operand. If it is set, HLA promotes the unsigned operand to the next larger signed type (e.g., uns16 becomes int32). If the resulting signed type is larger than the other operand's type, the other operand is promoted as well. This scheme fails if you have an uns128 value whose H.O. bit is set. In that case, HLA promotes both operands to int128 and produces incorrect results (since the uns128 value just went negative when it's really positive). Therefore, you should limit unsigned values to 127 bits if you're going to mix signed and unsigned operands in the same expression.
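The following Python function is a sketch of the signed/unsigned promotion rule just described. The name promote_mixed is hypothetical, and the case where the unsigned operand's H.O. bit is clear (the unsigned operand simply takes the signed type of the same size) is an assumption, since the text only spells out the set-bit case.

```python
def promote_mixed(uns_bits: int, uns_value: int, int_bits: int) -> str:
    """Result type when an unsXX operand (value uns_value) meets an intXX
    operand; sizes are 8, 16, 32, 64, or 128 bits."""
    if uns_bits < int_bits:
        # Smaller unsigned operand: both operands take the signed type.
        return f"int{int_bits}"
    # Same size, or the unsigned operand is larger: check its H.O. bit.
    if (uns_value >> (uns_bits - 1)) & 1:
        # Next larger signed type; int128 is the ceiling (the uns128 pitfall).
        signed_bits = min(uns_bits * 2, 128)
    else:
        # Assumption: H.O. bit clear, so the same-size signed type suffices.
        signed_bits = uns_bits
    # The other operand is promoted as well if the new type is larger.
    return f"int{max(signed_bits, int_bits)}"


print(promote_mixed(16, 0x8000, 16))   # int32
print(promote_mixed(16, 0x7FFF, 16))   # int16
```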

Any mixture of hexadecimal types (byte, word, dword, qword, or lword) and an unsigned type produces an unsigned type; the size of the resulting unsigned type will be the larger of the two types. Likewise, any mixture of hexadecimal types and signed integer types will produce a signed integer whose size is the larger of the two types. This "strengthening" of the type (hexadecimal types are "weaker" than signed or unsigned types) may seem counter-intuitive to a die-hard assembly programmer; however, making the result type hexadecimal rather than signed/unsigned can create problems if the result has the H.O. bit set since information about whether the result is signed or unsigned would be lost at that point.

Mixing unsigned values and a real32 value will produce a real32 result or an error. HLA produces an error if the unsigned value requires more than 24 bits to represent exactly (24 bits is the full precision of the real32 format's significand). Note that in addition to promoting the unsigned type to real32, HLA also converts the unsigned value to a real32 value. Promoting a type is not the same thing as converting a value: promoting uns8 to uns16 simply changes the type designation of the uns8 object, and HLA doesn't have to touch the actual value at all since it keeps all values in an internal 128-bit format; however, the binary representations of unsigned and real32 values are completely different, so HLA must convert the value as well. If you really want to convert a value that requires more than 24 bits of precision to a real32 object (with truncation), convert the unsigned operand to real64 or real80 and then convert the larger operand to real32 using the real32(expr) compile-time function. Since unsigned values are, well, unsigned and real32 objects are signed, the conversion process always produces a non-negative value.
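The 24-bit limit comes from the real32 significand. This Python sketch states the rule as described (the function names are hypothetical) and uses the standard struct module to round-trip a value through IEEE single precision, which shows the low bit being lost just past 2**24.

```python
import struct


def hla_accepts_uns_as_real32(n: int) -> bool:
    # The rule as stated: error if the unsigned value needs more than 24 bits.
    return n.bit_length() <= 24


def as_real32(n: int) -> float:
    # Round-trip through IEEE single precision (what a real32 actually stores).
    return struct.unpack("<f", struct.pack("<f", float(n)))[0]


print(as_real32(2**24 - 1))  # 16777215.0: 24 bits, stored exactly
print(as_real32(2**24 + 1))  # 16777216.0: 25 bits, the low bit is lost
```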

Mixing signed and real32 values in an expression produces a real32 result. Like unsigned operands, signed operands are limited to 24 bits of precision or HLA will report an error. Technically, you should get one more bit of precision from signed operands (since the real32 format maintains its sign apart from the mantissa), but HLA still limits you to 24 bits during this conversion. If the signed integer value is negative, so will be the real32 result.

If you mix hexadecimal and real32 types, HLA treats the hexadecimal type as an unsigned value of the same size. See the discussion of unsigned and real32 values earlier for the details.

If you mix an unsigned, signed, or hexadecimal type with a real64 type, the result is an error (if HLA cannot exactly represent the value in real64 format) or a real64 result. The conversion is very similar to the real32 conversion discussed above except you get 52 bits of integer precision rather than only 24 bits.

If you mix an unsigned, signed, or hexadecimal type with a real80 type, the result is an error (if HLA cannot exactly represent the value in real80 format) or a real80 result. The conversion is very similar to the real32 conversion discussed above except you get 64 bits of integer precision rather than only 24 bits. Note that conversion of integer objects of 64 bits or less always proceeds without error; 128-bit values are the only ones that will get you into trouble.

If you mix a pair of different sized real values in the same expression, HLA will promote (and convert) the smaller real value to the same size as the larger real value.

The only non-numeric promotions that take place in an expression are between characters and strings. If a character and a string both appear in an expression, HLA will promote the character to a string of length one11.

HLA will report a type mismatch error if objects of any other types appear within an expression. Note that you may use the type-coercion compile-time functions to convert between types that HLA does not automatically support in an expression.

!expr

The expression must be either boolean or a number. For boolean values, not computes the standard logical not operation. Numerically, HLA inverts only the L.O. bit of boolean values and clears the remaining bits of the boolean value. Therefore, the result is always zero or one when NOTting a boolean value (even if the boolean object errantly contained other set bits prior to the "!" operation). Remember, the "!" operator only looks at the L.O. bit; if the value was originally non-zero but the L.O. bit was clear12, then "!" produces true. This is not a zero/not-zero operation.

For numbers, not computes the bitwise not operation on the bits of the number; that is, it inverts all the bits. The exact semantics of this operation depend upon the original data type of the value you're inverting. Because HLA always maintains 128 bits of precision regardless of the underlying data type, the result of applying the "!" operator to an integer may not always be intuitive, so a full explanation of this operator's semantics must be given on a type-by-type basis.

uns8 : Bits 8..127 of an uns8 object are always zero. Therefore, when you apply the "!" operator to an uns8 value, the result can no longer be an uns8 object since bits 8..127 will now contain ones. Zeroing out the H.O. bits is not wise, because you could be assigning the result of this expression to a larger data type and you may very well expect those bits to be set. Therefore, HLA converts the type of "!u8expr" to type byte (which does allow the H.O. bits to contain non-zero values). If you assign an object of type byte to a larger object (e.g., type word), HLA copies the H.O. bits from the byte object to the larger object. Example:

val
    u8  :uns8 := 1;
    b8  := !u8;        // produces $FF..FFFE but registers as byte $FE.
    w16 :word := b8;   // produces $FF..FFFE but registers as word $FFFE.

Note: If you really want to chop the value off at eight bits, you can use the compile-time byte function to achieve this, e.g.,

val
    u8  :uns8 := 1;
    b8  := byte(!u8);  // produces $FE.
    w16 :word := b8;   // produces $00FE.

uns16 : The semantics are similar to uns8 except, of course, applying "!" to an uns16 value produces a word value rather than a byte value. Again, the "!" operator will set bits 16..127 to one in the final result. If you want to ensure that the final result contains no set bits beyond bit #15, use the compile-time word function to strip the value down to 16 bits (just like the byte function in the example above).

uns32 : The semantics are similar to uns8 except, of course, applying "!" to an uns32 value produces a dword value rather than a byte value. Again, the "!" operator will set bits 32..127 to one in the final result. If you want to ensure that the final result contains no set bits beyond bit #31, use the compile-time dword function to strip the value down to 32 bits (just like the byte function in the example above).

uns64 : The semantics are similar to uns8 except, of course, applying "!" to an uns64 value produces a qword value rather than a byte value. Again, the "!" operator will set bits 64..127 to one in the final result. If you want to ensure that the final result contains no set bits beyond bit #63, use the compile-time qword function to strip the value down to 64 bits (just like the byte function in the example above).

uns128 : Applying the "!" operator to an uns128 object simply inverts all the bits. There are no funny semantics here. The resulting expression type is lword.

int8 : Same semantics as byte (see explanation below).

int16 : Same semantics as word (see explanation below).

int32 : Same semantics as dword (see explanation below).

int64 : Same semantics as qword (see explanation below).

int128 : Applying the "!" operator to an int128 object simply inverts all the bits. There are no funny semantics here. The resulting expression type is lword.

byte : Bits 8..127 of a byte (int8) value must all be zeros or all ones. The "!" operator enforces this. If any of the H.O. bits are non-zero, the "!" operator sets them all to zero in the result; if all of the H.O. bits are zero, the "!" operator sets the H.O. bits to ones in the result. Of course, this operator inverts bits 0..7 in the original value and returns this inverted result. Note that the type of the new value is always byte (even if the original subexpression was int8).

word : Bits 16..127 of a word (int16) value must all be zeros or all ones. The "!" operator enforces this. If any of the H.O. bits are non-zero, the "!" operator sets them all to zero in the result; if all of the H.O. bits are zero, the "!" operator sets the H.O. bits to ones in the result. Of course, this operator inverts bits 0..15 in the original value and returns this inverted result. Note that the type of the new value is always word (even if the original subexpression was int16).

dword : Bits 32..127 of a dword (int32) value must all be zeros or all ones. The "!" operator enforces this. If any of the H.O. bits are non-zero, the "!" operator sets them all to zero in the result; if all of the H.O. bits are zero, the "!" operator sets the H.O. bits to ones in the result. Of course, this operator inverts bits 0..31 in the original value and returns this inverted result. Note that the type of the new value is always dword (even if the original subexpression was int32).

qword : Bits 64..127 of a qword (int64) value must all be zeros or all ones. The "!" operator enforces this. If any of the H.O. bits are non-zero, the "!" operator sets them all to zero in the result; if all of the H.O. bits are zero, the "!" operator sets the H.O. bits to ones in the result. Of course, this operator inverts bits 0..63 in the original value and returns this inverted result. Note that the type of the new value is always qword (even if the original subexpression was int64).

lword : Applying the "!" operator to an lword object simply inverts all the bits. There are no funny semantics here.

No other types are legal with the "!" operator. HLA will report a type conflict error if you attempt to apply this operator to some other type.

If the operand is one of the integer types (signed, unsigned, hexadecimal), then HLA will set the type of the result to the smallest type within that class (signed, unsigned, or hexadecimal) that can hold the result (not including sign extension bits for negative numbers or zero extension bits for non-negative values).
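The per-type rules above can be condensed into one Python function that treats every value as a 128-bit pattern. It is a model of the documented behavior, not HLA's implementation; hla_not and its tables are hypothetical names, and the sketch returns the type named in the per-type list without applying any further size adjustment.

```python
MASK128 = (1 << 128) - 1

# Operand width of each sized integer type; values are 128-bit patterns.
WIDTH = {"uns8": 8, "uns16": 16, "uns32": 32, "uns64": 64,
         "int8": 8, "int16": 16, "int32": 32, "int64": 64,
         "byte": 8, "word": 16, "dword": 32, "qword": 64}
HEX_TYPE = {8: "byte", 16: "word", 32: "dword", 64: "qword"}


def hla_not(value: int, typ: str):
    """Model of the '!' operator; value is the operand's 128-bit pattern.
    Returns (result_pattern, result_type)."""
    if typ == "boolean":
        return (value & 1) ^ 1, "boolean"      # only the L.O. bit is examined
    if typ in ("uns128", "int128", "lword"):
        return ~value & MASK128, "lword"       # plain 128-bit inversion
    n = WIDTH[typ]
    low_mask = (1 << n) - 1
    high_mask = MASK128 & ~low_mask
    low = ~value & low_mask                    # invert bits 0..n-1
    # H.O. bits: all zeros become all ones; anything else becomes zeros.
    # (unsXX values always have zero H.O. bits, so they take the first path.)
    high = high_mask if (value & high_mask) == 0 else 0
    return low | high, HEX_TYPE[n]


r, t = hla_not(1, "uns8")
print(t, hex(r & 0xFF))    # byte 0xfe  (what byte(!u8) keeps)
```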

- expr (unary negation operator)

The expression must either be a numeric value or a character set. For numeric values, "-" negates the value. For character sets, the "-" operator computes the complement of the character set (that is, it returns all the characters not found in the set).

Again, the exact semantics depend upon the type of the expression you're negating. The following paragraphs explain exactly what this operator does to its expression. For all integer values (unsXX, intXX, byte, word, dword, qword, and lword), the negation operator always does a full 128-bit negation of the supplied operand. These data types differ only in how HLA sets the type of the resulting expression (as explained in the paragraphs below).

uns8 : If the original value was in the range 128..255, then the resulting type is int16; otherwise the resulting type is int8. Since uns8 values are never negative, the negated result is never positive; hence the result type is always a signed integer type.

uns16 : If the original value was in the range 32768..65535, then the resulting type is int32; otherwise the resulting type is int16. Since uns16 values are never negative, the negated result is never positive; hence the result type is always a signed integer type.

uns32 : If the original value was in the range $8000_0000..$FFFF_FFFF, then the resulting type is int64; otherwise the resulting type is int32. Since uns32 values are never negative, the negated result is never positive; hence the result type is always a signed integer type.

uns64 : If the original value was in the range $8000_0000_0000_0000..$FFFF_FFFF_FFFF_FFFF, then the resulting type is int128; otherwise the resulting type is int64. Since uns64 values are never negative, the negated result is never positive; hence the result type is always a signed integer type.

uns128 : The result type is always set to int128 . Note that there is no check for overflow. Effectively, HLA treats uns128 operands as though they were int128 operands with respect to negation. So really large positive ( uns128 ) values become smaller unsigned values after the negation. If you need to mix and match 128-bit values in an expression, you should attempt to limit your unsigned values to 127 bits.

byte, int8, word, int16, dword, int32, qword, int64, lword, int128 : Negates the expression (full 128 bits) and assigns the original expression type to the result.

real32 : Negates the real32 value and returns a real32 result.

real64 : Negates the real64 value and returns a real64 result.

real80 : Negates the real80 value and returns a real80 result.

cset : Computes the set complement (returns cset type). The set complement is all the items that were not previously in the set. Since HLA uses a bitmap representation for character sets, the complement of a character set is the same thing as inverting all the bits in the bitmap.

If the operand is one of the integer types (signed, unsigned, hexadecimal), then HLA will set the type of the result to the smallest type within that class (signed, unsigned, or hexadecimal) that can hold the result (not including sign extension bits for negative numbers or zero extension bits for non-negative values).
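A Python sketch of the integer cases above: negation is a full 128-bit two's-complement operation, and the result type for unsXX operands follows the range table. The helper names are hypothetical, and the sketch covers only the unsXX result-type table, not the other types.

```python
MASK128 = (1 << 128) - 1


def hla_neg(value: int) -> int:
    # Integer negation is always a full 128-bit two's-complement negation.
    return -value & MASK128


def neg_uns_type(bits: int, value: int) -> str:
    """Result type of -expr for an unsXX operand, per the range table above."""
    if bits == 128:
        return "int128"                  # no overflow check for uns128
    # Values with the H.O. bit set need the next larger signed type.
    if value >= (1 << (bits - 1)):
        return f"int{bits * 2}"
    return f"int{bits}"


print(neg_uns_type(16, 40000))  # int32
```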

expr1 * expr2

For numeric operands, the "*" operator produces their product. For character set operands, the "*" operator produces the intersection of the two sets. The exact result depends upon the types of the two operands to the "*" operator. To begin with, HLA attempts to make the types of the two operands identical if they are not already identical. HLA achieves this via type promotion (see the discussion earlier).

If the operands are unsigned or hexadecimal operands, HLA will compute their unsigned product. If the operands are signed, HLA computes their signed product. If the operands are real, HLA computes their real product. If the operands are integer (signed or unsigned) and less than (or equal to) 64 bits, HLA computes their exact result. If the operands are greater than 64 bits and their product would require more than 128 bits, HLA quietly overflows without error. Note that HLA always performs a 128-bit multiplication, regardless of the operands' sizes; however, objects that require 64 bits or less of precision will always produce a product that is 128 bits or less. HLA automatically extends the size of the result to the next greater size if the product will not fit into an integer that is the same size as the operands. HLA will actually choose the smallest possible size for the product (e.g., if the result only requires 16 bits of precision, the resulting type will be uns16, int16 , or word ). The resulting type is always unsigned if the operands were unsigned, signed if the operands were signed, and hexadecimal if the operands were hexadecimal.
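This Python sketch models the integer case: the product is kept to 128 bits (higher bits are silently discarded), and the result type is the smallest unsXX or intXX type that holds it. The helper names are hypothetical, and hexadecimal operands (which would map to byte, word, dword, qword, or lword) are left out for brevity.

```python
MASK128 = (1 << 128) - 1
SIZES = (8, 16, 32, 64, 128)


def smallest_uns(v: int) -> str:
    # Smallest unsXX type that holds a non-negative value.
    for b in SIZES:
        if v < (1 << b):
            return f"uns{b}"
    raise OverflowError(v)


def smallest_int(v: int) -> str:
    # Smallest intXX type that holds a signed value.
    for b in SIZES:
        if -(1 << (b - 1)) <= v < (1 << (b - 1)):
            return f"int{b}"
    raise OverflowError(v)


def to_signed128(p: int) -> int:
    # Reinterpret a 128-bit pattern as a two's-complement value.
    return p - (1 << 128) if p >> 127 else p


def mul_uns(a: int, b: int):
    p = (a * b) & MASK128          # bits beyond 128 are quietly lost
    return p, smallest_uns(p)


def mul_int(a: int, b: int):
    p = to_signed128((a * b) & MASK128)
    return p, smallest_int(p)


print(mul_uns(200, 200))   # (40000, 'uns16')
print(mul_int(-100, 100))  # (-10000, 'int16')
```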

If the operands are real operands, HLA computes their product and always produces a real80 result. If you want to produce a smaller result via the '*' operator, use the real32 or real64 compile-time function to produce the smaller result, e.g., " real32( r32const * r32const2 ) ". Note that all real arithmetic inside HLA is always performed using the FPU, hence the results are always real80 . Other than trying to simulate the actual products a running program would produce, there is no real reason to coerce the product to a smaller value.

If the operands are character set operands, the '*' operator computes the intersection of the two sets. Since HLA uses a bitmap representation for character sets, this operator does a bitwise logical AND of the two 16-byte operands (this operation is roughly equivalent to applying the "&" operator to two lword operands).

If the operand is one of the integer types (signed, unsigned, hexadecimal), then HLA will set the type of the result to the smallest type within that class (signed, unsigned, or hexadecimal) that can hold the result (not including sign extension bits for negative numbers or zero extension bits for non-negative values).

expr1 div expr2

The two expressions must be integer (signed, unsigned, or hexadecimal) numbers. Supplying any other data type as an operand will produce an error. The div operator divides the first expression by the second and produces the truncated quotient result.

If the operands are unsigned, HLA will do a full 128/128 bit division and the resulting type will be unsigned (HLA sets the type to the smallest unsigned type that will completely hold the result). If the operands are signed, HLA will do a full 128/128 bit signed division and the resulting type will be the smallest intXX type that can hold the result. If the operands are hexadecimal values, HLA will do a full 128/128 bit unsigned division and set the resulting type to the smallest hex type that can hold the result.

Note that the div operator does not allow real operands. Use the "/" operator for real division.

HLA will set the type of the result to the smallest type within its class (signed, unsigned, or hexadecimal) that can hold the result (not including sign extension bits for negative numbers or zero extension bits for non-negative values).
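For signed operands, "truncated quotient" is read here as rounding toward zero (the behavior of the x86 IDIV instruction), which differs from Python's floor-division operator. The hla_div helper below is a hypothetical sketch of that interpretation, not a description of HLA's source code.

```python
def hla_div(a: int, b: int) -> int:
    """Truncating integer division: the quotient is rounded toward zero
    (assumed reading of 'truncated quotient' for signed operands)."""
    if b == 0:
        raise ZeroDivisionError("division by zero")
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


print(hla_div(-7, 2))  # -3  (truncated toward zero)
print(-7 // 2)         # -4  (Python floors instead)
```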

expr1 mod expr2

The two expressions must be integer (signed, unsigned, or hexadecimal) numbers. The mod operator divides the first expression by the second and produces their remainder (this value is never negative).

If the operands are unsigned, HLA will do a full 128/128 bit division and return the remainder. The resulting type will be unsigned (HLA sets the type to the smallest unsigned type that will completely hold the result).

If the operands are signed, HLA will do a full 128/128 bit signed division and return the remainder. The resulting type will be the smallest intXX type that can hold the result.

If the operands are hexadecimal values, HLA will do a full 128/128 bit unsigned division and set the resulting type to the smallest hex type that can hold the result.

Note that the mod operator does not allow real operands. You'll have to define real modulus and write the expression yourself if you need the remainder from a real division.

HLA will set the type of the result to the smallest type within its class (signed, unsigned, or hexadecimal) that can hold the result (not including sign extension bits for negative numbers or zero extension bits for non-negative values).

expr1 / expr2

The two expressions must be numeric. The '/' operator divides the first expression by the second and produces their (real80) quotient result.

If the operands are integers (unsigned, signed, or hexadecimal) or the operands are real32 or real64, HLA first converts them to real80 before doing the division operation. The expression result is always real80.

expr1 << expr2

The two expressions must be integer (signed, unsigned, or hexadecimal) numbers. The second operand must be a small (32-bit or less) non-negative value in the range 0..128. The << operator shifts the first expression to the left the number of bits specified by the second expression. Note that the result may require more bits to hold than the original type of the left operand. Any bits shifted out of bit position 127 are lost.

HLA will set the type of the result to the smallest type within the left operand's class (signed, unsigned, or hexadecimal) that can hold the result (not including sign extension bits for negative numbers or zero extension bits for non-negative values). Note that the '<<' operator can yield a smaller type (specifically, an eight-bit type) if it shifts all the bits off the H.O. end of the number; generally, though, this operation produces larger result types than the left operand.

expr1 >> expr2

The two expressions must be integer (signed, unsigned, or hexadecimal) numbers. The second operand must be a small (32-bit or less) non-negative value in the range 0..128. The >> operator shifts the first expression to the right the number of bits specified by the second expression. Any bits shifted out of the L.O. bit are lost. Note that this shift is a logical shift right, not an arithmetic shift right (this is true even if the left operand is an INTxx value). Therefore, this operation always shifts a zero into bit position 127.
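Both shifts can be sketched in Python by masking to 128 bits; a negative value is represented by its 128-bit two's-complement pattern, so a right shift brings in a zero exactly as described. The function names are hypothetical.

```python
MASK128 = (1 << 128) - 1


def _check_count(count: int) -> None:
    if not 0 <= count <= 128:
        raise ValueError("shift count must be in the range 0..128")


def hla_shl(value: int, count: int) -> int:
    _check_count(count)
    return (value << count) & MASK128   # bits shifted out of bit 127 are lost


def hla_shr(value: int, count: int) -> int:
    _check_count(count)
    # Logical shift on the 128-bit pattern, so zeros enter at bit 127.
    return (value & MASK128) >> count


minus8 = -8 & MASK128                        # 128-bit pattern of the value -8
print(hla_shr(minus8, 1) == (1 << 127) - 4)  # True (an arithmetic shift would give -4)
```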

Shift rights may produce a smaller type than the type of the left operand. HLA always sets the type of the result value to the minimum type size that has the same base class as the left operand.

expr1 + expr2

If the two expressions are numeric, the "+" operator produces their sum.

If the two expressions are strings or characters, the "+" operator produces a new string by concatenating the right expression to the end of the left expression.

If the two operands are character sets, the "+" operator produces their union.

If the operands are integer values (signed, unsigned, or hexadecimal), then HLA adds them together. Any overflow out of bit #127 (unsigned or hexadecimal) or bit #126 (signed) is quietly lost. HLA sets the type of the result to the smallest type size that will hold the sum; the type class (signed, unsigned, hexadecimal) will be the same as the operands. Note that it is possible for the type size to grow or shrink depending on the values of the operands (e.g., adding a positive and negative number could reduce the type size, adding two positive or two negative numbers may expand the result type's size).

When adding two real values (or a real and an integer value), HLA always produces a real80 result.

Since HLA uses a bitmap to represent character sets, taking the union of two character sets is the same as doing a bitwise logical OR of all 16 bytes in the character set.

expr1 - expr2

If the two expressions are numeric, the "-" operator produces their difference.

If the two expressions are character sets, the "-" operator produces their set difference (that is, all the characters in expr1 that are not also in expr2).

If the operands are integer values (signed, unsigned, or hexadecimal), then HLA subtracts the right operand from the left operand. Any overflow out of bit #127 (unsigned or hexadecimal) or bit #126 (signed) is quietly lost. HLA sets the type of the result to the smallest type size that will hold their difference; the type class (signed, unsigned, hexadecimal) will be the same as the operands. Note that it is possible for the type size to grow or shrink depending on the values of the operands (e.g., subtracting two negative or non-negative numbers could reduce the type size, subtracting a negative value from a non-negative value may expand the result type's size).

When subtracting two real values (or a real and an integer value), HLA always produces a real80 result.

Since HLA uses a bitmap to represent character sets, taking the set difference of two character sets is the same as doing a bitwise logical AND of the left operand with the inverse of the right operand.

Comparisons (=, ==, <>, !=, <, <=, >, and >=)

expr1 = expr2
expr1 == expr2
expr1 <> expr2
expr1 != expr2
expr1 < expr2
expr1 <= expr2
expr1 > expr2
expr1 >= expr2

(note: "!=" and "<>" operators are identical. "=" and "==" operators are identical.)

The two expressions must be compatible (described earlier). These operators compare the two operands and return true or false depending upon the result of the comparison.

You may use the "=" and "<>" operators to compare two pointer constants (e.g., "&abc" or "&ptrVar"). The other operators do not allow pointer constant operands.

All the above operators allow you to compare boolean values, enumerated values (types must match), integer (signed, unsigned, hexadecimal) values, character values, string values, real values, and character set values.

When comparing boolean values, note that false < true.

One character set is less than another if it is a proper subset of the other. A character set is less than or equal to another set if it is a subset of that second set. Likewise, one character set is greater than, or greater than or equal to, another set if it is a proper superset, or a superset, respectively.
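Under this ordering, two character sets can be incomparable: if neither is a subset of the other, both "<=" and ">=" are false. The Python sketch below models the comparisons as subset tests on bitmaps (bit n = chr(n)). It models the described semantics rather than HLA's own code, and the function names are hypothetical.

```python
# Hypothetical model: cset comparisons as subset tests on bitmaps.
def cset(chars):
    bits = 0
    for ch in chars:
        bits |= 1 << ord(ch)
    return bits


def cset_le(a, b):
    """a <= b: every member of a is also in b (subset)."""
    return (a & ~b) == 0


def cset_lt(a, b):
    """a < b: subset, and b has at least one extra member (proper subset)."""
    return cset_le(a, b) and a != b


def cset_ge(a, b):
    """a >= b: superset."""
    return cset_le(b, a)


def cset_gt(a, b):
    """a > b: proper superset."""
    return cset_lt(b, a)


ab, abc = cset("ab"), cset("abc")
print(cset_lt(ab, abc), cset_le(abc, abc), cset_lt(abc, abc))  # True True False
x, y = cset("x"), cset("y")
print(cset_le(x, y), cset_ge(x, y))  # False False: {'x'} and {'y'} are incomparable
```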

As with any programming language, you should take care when comparing two real values (especially for equality or inequality) as minor precision drifts can cause the comparison to fail.

expr1 & expr2

 

(note: "&&" and "&" mean different things to HLA. See the section on high level language control structures for details on the "&&" operator.)

The operands must both be boolean or they must both be numbers. With boolean operands the and operator produces the logical and of the two operands (boolean result). With number operands, the and operator produces the bitwise logical AND of the operands.

If the operands are integer types (signed, unsigned, or hexadecimal), HLA sets the type of the result to the smallest type within that class that can hold the result.

expr1 in expr2

The first expression must be a character value. The second expression must be a character set. The in operator returns true if the character is a member of the specified character set; it returns false otherwise.

expr1 | expr2

(note: "||" and "|" mean different things to HLA. See the section on high level language control structures for details on the "||" operator.)

The operands must both be boolean or they must both be numbers. With boolean operands the or operator produces the logical or of the two operands (boolean result). With number operands, the or operator produces the bitwise or of the operands.

If the operands are integer types (signed, unsigned, or hexadecimal), HLA sets the type of the result to the smallest type within that class that can hold the result.

expr1 ^ expr2

The operands must both be boolean or they must both be numbers. With boolean operands the xor operator produces the logical exclusive-or of the two operands (boolean result). With number operands, the xor operator produces the bitwise exclusive-or of the operands.

If the operands are integer types (signed, unsigned, or hexadecimal), HLA sets the type of the result to the smallest type within that class that can hold the result.
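The following Python sketch models "&", "|", and "^" on integer constants together with the smallest-type rule. The function and table names are hypothetical, and type classes are passed as strings. Python's `bool` stands in for HLA boolean operands: `operator.and_`, `or_`, and `xor` return `bool` when both arguments are `bool`, which mirrors the logical (boolean-result) case.

```python
# Hypothetical model of "&", "|", "^" plus the smallest-type rule (not HLA code).
import operator

SIZES = (8, 16, 32, 64, 128)
HEX_NAMES = {8: "byte", 16: "word", 32: "dword", 64: "qword", 128: "lword"}
OPS = {"&": operator.and_, "|": operator.or_, "^": operator.xor}


def smallest_type(value, type_class):
    """Smallest type in the class ('signed', 'unsigned', or 'hex') that holds value."""
    for bits in SIZES:
        if type_class == "signed":
            fits = -(1 << (bits - 1)) <= value < (1 << (bits - 1))
        else:
            fits = 0 <= value < (1 << bits)
        if fits:
            if type_class == "hex":
                return HEX_NAMES[bits]
            prefix = "int" if type_class == "signed" else "uns"
            return f"{prefix}{bits}"
    raise OverflowError(f"{value} does not fit in 128 bits")


def bitwise(op, a, b, type_class):
    """Apply op to two same-class integers; return (value, result type name)."""
    result = OPS[op](a, b)
    return result, smallest_type(result, type_class)


print(bitwise("&", 0xFFFF, 0x00FF, "hex"))  # (255, 'byte'): a word operand shrinks
print(bitwise("|", 0xF0, 0x0F00, "hex"))    # (4080, 'word')
print(bitwise("^", -1, 0x0F, "signed"))     # (-16, 'int8')
```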

( expr )

You may override the precedence of any operator(s) using parentheses in HLA constant expressions.

 

[ comma_separated_list_of_expressions ]

This produces an array expression. The type of the expression is an array type whose element type is the type of the expressions in the list. If the list contains expressions of two or more different constant types, HLA promotes the type of the array expression following the rules for mixed-mode arithmetic (see the rules earlier in this document).

 

record_type_name : [ comma_separated_list_of_field_expressions ]

This produces a record expression. The expressions appearing within the brackets must match the respective fields of the specified record type. See the discussion earlier in this chapter.

 

identifier

An identifier is a legal component of a constant expression if the identifier's classification is CONST or VAL (that is, the identifier was declared in a constant or value section of the program). The expression evaluator substitutes the current declared value and type of the symbol within the expression. Constant expressions allow the following types:

 

Boolean, enumerated types, Uns8, Uns16, Uns32, Uns64, Uns128, Byte, Word, DWord, QWord, LWord, Int8, Int16, Int32, Int64, Int128, Char, Real32, Real64, Real80, String, and Cset.

 

You may also specify arrays whose element base type is one of the above types (or a record or union subject to the following restriction). Likewise, you can specify record or union constants if all of their respective fields are one of the above primitive types or a value array, record, or union constant.

HLA allows array, record, and union constants. If you specify the name of an array, for example, HLA works with all the values of that array. Likewise, HLA can copy all the values of a record or union with a single statement.

identifier1.identifier2 {...}

Selects a field from a record or union constant. Identifier1 must be a record or union object defined in a const or val section. Identifier2 (and any following dot-identifiers) must be a field of the record or union. HLA replaces this object with the value of the specified field.

Examples:

recval.fieldval

recval.subrecval.fieldval

 

Don't forget that with a union constant, you may only access the last field into which you actually stored data (see the section on union constants for more details).

identifier [ index_list ]

Identifier must be an array constant defined in either a const or val section. Index_list is a list of constant expressions separated by commas. The index list selects a specified element of the "identifier" array. HLA reports an error if you supply more indices than the array has dimensions. HLA returns an array slice if you specify fewer indices than the array has dimensions (for example, if an array is declared as "a:uns8[4,4]" and you specify "a[2]" in a constant expression, HLA returns the third row of the array (a[2,0]..a[2,3]) as the value of this term).
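The slice rule is easiest to see as offset arithmetic. This Python sketch assumes row-major storage and models each array as a flat list. `select` is a hypothetical helper that returns the starting byte offset and the element count selected by a given index list.

```python
# Hypothetical model: which elements a[index_list] selects (row-major storage assumed).
from math import prod


def select(dims, indices, elem_size=1):
    """Return (byte offset, element count) for a[indices].

    Fewer indices than dimensions select a slice; more indices is an error.
    """
    if len(indices) > len(dims):
        raise IndexError("more indices than the array has dimensions")
    flat = 0
    for idx, dim in zip(indices, dims):
        if not 0 <= idx < dim:
            raise IndexError(f"index {idx} is outside 0..{dim - 1}")
        flat = flat * dim + idx
    count = prod(dims[len(indices):])  # 1 when every dimension is indexed
    return flat * count * elem_size, count


# a : uns8[4,4], stored as a flat list of 16 bytes holding 0..15
a = list(range(16))
start, count = select((4, 4), (2,))
print(a[start:start + count])  # [8, 9, 10, 11], i.e. a[2,0]..a[2,3]
```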

Examples:

arrayval[0]

aval[1,4,0]

Program Structure

 

An HLA program uses the following general syntax:

 

program identifier ;

declarations

begin identifier;

statements

end identifier;

 

The three identifiers above must all match. The declaration section (declarations) consists of label, type, const, val, var, static, uninitialized, readonly, segment, procedure, and macro definitions (all described later). Any number of these sections may appear and they may appear in any order; more than one of each section may appear in the declaration section.

Example:

 

program TestPgm;
type
    integer: int16;
const
    i0 : integer := 0;
var
    i:integer;

begin TestPgm;

    mov( i0, i );

end TestPgm;

If you wish to write a library module that contains only procedures and no main program, you would use an HLA unit. Units have nearly the same syntax as programs, except that a unit has no begin clause, e.g.,

 

unit TestPgm;

procedure LibraryRoutine;
begin LibraryRoutine;
    << etc. >>
end LibraryRoutine;

end TestPgm;

Procedure Declarations

Procedure declarations are nearly identical to program declarations with two major differences: procedures are declared using the "procedure" reserved word and procedures may have parameters. The general syntax is:

 

procedure identifier ( optional_parameter_list ); procedure_options

declarations

begin identifier;

statements

end identifier;

 

Note that you may declare procedures inside other procedures in a fashion analogous to most block-structured languages (e.g., Pascal).

The optional parameter list consists of a list of var-type declarations taking the form:

 

optional_access_keyword identifier1 : identifier2 optional_in_reg

 

optional_access_keyword, if present, must be val, var, valres, result, name, or lazy and defines the parameter passing mechanism (pass by value, pass by reference, pass by value/result [or value/returned], pass by result, pass by name, or pass by lazy evaluation, respectively). The default is pass by value (val) if an access keyword is not present. For pass by value parameters, HLA allocates the specified number of bytes according to the size of that object in the activation record. For pass by reference, pass by value/result, and pass by result, HLA allocates four bytes to hold a pointer to the object. For pass by name and pass by lazy evaluation, HLA allocates eight bytes to hold a pointer to the associated thunk and a pointer to the thunk's execution environment (see the sections on parameters and thunks for more details).
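The allocation rules above fit in a few lines of Python. This sketch follows only the rules stated in the paragraph and ignores any padding HLA might add for small value parameters. The procedure `q` and the helper names are hypothetical. The sum is the number of parameter bytes that a procedure which removes its own parameters would pop on return.

```python
# Hypothetical model of activation-record bytes per parameter access keyword.
POINTER = 4  # size of a 32-bit address


def param_bytes(access, object_size):
    """Bytes reserved in the activation record for one parameter."""
    if access == "val":
        return object_size                 # a copy of the value itself
    if access in ("var", "valres", "result"):
        return POINTER                     # address of the actual parameter
    if access in ("name", "lazy"):
        return 2 * POINTER                 # thunk address + environment pointer
    raise ValueError(f"unknown access keyword {access!r}")


def parms_total(params):
    """Sum over (access, object_size) pairs."""
    return sum(param_bytes(access, size) for access, size in params)


# Hypothetical: procedure q( a:int32; var b:int16; name c:int32 );
print(parms_total([("val", 4), ("var", 2), ("name", 4)]))  # 16
```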

The optional_in_reg clause, if present, corresponds to the phrase "in reg" where reg is one of the 80x86's general purpose 8-, 16-, or 32-bit registers. You must take care when passing parameters through the registers as the parameter names become aliases for registers and this can create confusion when reading the code later (especially if, within a procedure with a register parameter, you call another procedure that uses that same register as a parameter).

HLA also allows a special parameter of the form:

 

var identifier : var

 

This creates an untyped reference parameter. You may specify any memory variable as the corresponding actual parameter and HLA will compute the address of that object and pass it on to the procedure without further type checking. Within the procedure, the parameter is given the DWORD type.

The procedure_options component above is a list of keywords that specify how HLA emits code for the procedure. The available procedure options are @noframe, @frame, @nodisplay, @display, @noalignstack, @alignstack, @pascal, @stdcall, @cdecl, @align( int_const ), @use reg32, @leave, @noleave, @enter, @noenter, and @returns( "text" ).

 

Procedure Options

Option

Description

@noframe, @frame

By default, HLA emits code at the beginning of the procedure to construct a stack frame. The @noframe option disables this action (the bare noframe form is deprecated; always use @noframe). The @frame option tells HLA to emit stack frame code for a particular procedure when stack frame generation is off by default. See the description of #frame and #noframe for details on controlling the default frame generation. HLA also uses these two special identifiers as compile-time variables to set the default frame generation for all procedures. Setting @frame to true (or @noframe to false) turns on frame generation by default; setting @frame to false (or @noframe to true) turns off frame generation.

@nodisplay, @display

By default, HLA emits code at the beginning of the procedure to construct a display within the frame. The @nodisplay option disables this action (the bare nodisplay form is deprecated; use @nodisplay). The @display option tells HLA to emit code to generate a display for a particular procedure when display generation is off by default. Note that HLA does not emit code to construct the display if @noframe is in effect, though it still assumes that the programmer will construct the display. HLA also uses these two special identifiers as compile-time variables to set the default display generation for all procedures. Setting @display to true (or @nodisplay to false) turns on display generation by default; setting @display to false (or @nodisplay to true) turns off display generation.

@noalignstack, @alignstack

By default (assuming frame generation is active), HLA emits an instruction to align ESP on a four-byte boundary after allocating local variables. Win32, Linux, and other 32-bit OSes require the stack to be dword-aligned (hence this option). If you know the stack will be dword-aligned, you can eliminate this extra instruction by specifying the @noalignstack option. Conversely, you can force the generation of this instruction by specifying the @alignstack procedure option. HLA also uses these two special identifiers as compile-time variables to set the default stack alignment for all procedures. Setting @alignstack to true (or @noalignstack to false) turns on stack alignment code generation by default; setting @alignstack to false (or @noalignstack to true) turns it off.

@pascal, @cdecl, @stdcall

These options let you specify the parameter passing mechanism for the procedure. By default, HLA uses the @pascal calling sequence, which pushes the parameters on the stack in left-to-right order (i.e., in the order they appear in the parameter list) and has the procedure remove them from the stack when it returns. The @cdecl procedure option tells HLA to pass the parameters from right to left, so that the first parameter appears at the lowest address in memory; with @cdecl, the caller is responsible for removing the parameters from the stack. The @stdcall procedure option is a hybrid of the @pascal and @cdecl calling conventions: it pushes the parameters in right-to-left order (just like @cdecl), but @stdcall procedures automatically remove their parameter data from the stack (just like @pascal). Win32 API calls use the @stdcall calling convention.
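The effect of push order on parameter addresses can be sketched in Python. The model assumes that each parameter occupies one dword and that the standard entry code has run, so the saved EBP is at [ebp+0], the return address is at [ebp+4], and the parameter pushed last is at [ebp+8]. `param_offsets` and `cleanup` are hypothetical names.

```python
# Hypothetical model of @pascal / @cdecl / @stdcall parameter layout and cleanup.
def param_offsets(names, convention, slot=4):
    """EBP-relative offset of each one-dword parameter after the standard entry code."""
    if convention == "pascal":
        push_order = list(names)             # left to right
    elif convention in ("cdecl", "stdcall"):
        push_order = list(reversed(names))   # right to left
    else:
        raise ValueError(f"unknown convention {convention!r}")
    offsets = {}
    offset = 8  # [ebp+0] = saved EBP, [ebp+4] = return address
    for name in reversed(push_order):        # the last value pushed is nearest EBP
        offsets[name] = offset
        offset += slot
    return offsets


def cleanup(names, convention, slot=4):
    """Who removes the parameters, and the return instruction the callee uses."""
    if convention in ("pascal", "stdcall"):
        return "callee", f"ret {slot * len(names)}"
    if convention == "cdecl":
        return "caller", "ret"
    raise ValueError(f"unknown convention {convention!r}")


print(param_offsets(["a", "b", "c"], "pascal"))  # {'c': 8, 'b': 12, 'a': 16}
print(param_offsets(["a", "b", "c"], "cdecl"))   # {'a': 8, 'b': 12, 'c': 16}
print(cleanup(["a", "b", "c"], "stdcall"))       # ('callee', 'ret 12')
```

With @cdecl and @stdcall, the first parameter ends up at the lowest address ([ebp+8]). With @pascal, it ends up at the highest.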

@align( int_constant )

The @align( int_const ) procedure option aligns the procedure on a 1-, 2-, 4-, 8-, or 16-byte boundary; specify the boundary you desire as the parameter to this option. The default is @align(1), which is unaligned. HLA also uses this special identifier as a compile-time variable to set the default procedure alignment: setting @align := 1 turns off procedure alignment, while supplying some other value (which must be a power of two) sets the default procedure alignment to the specified number of bytes.
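Here is a minimal sketch of the padding arithmetic. It assumes the location counter at which the procedure would otherwise start is known. `align_procedure` is a hypothetical helper, and the choice of filler bytes is not modeled.

```python
# Hypothetical model: padding needed to start a procedure on an @align boundary.
ALLOWED = (1, 2, 4, 8, 16)


def align_procedure(location, boundary):
    """Return (padding bytes, aligned start address) for @align( boundary )."""
    if boundary not in ALLOWED:
        raise ValueError("@align expects 1, 2, 4, 8, or 16")
    padding = (-location) % boundary
    return padding, location + padding


print(align_procedure(0x1003, 16))  # (13, 4112): 0x1003 rounds up to 0x1010
print(align_procedure(0x1003, 1))   # (0, 4099): @align(1) never pads
```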

@use reg32

When passing parameters, HLA can sometimes generate better code if it has a 32-bit general-purpose register to use as a scratchpad. By default, HLA never modifies the value of a register behind your back, so it will often generate less than optimal code when passing certain parameters on the stack. By using the @use procedure option, you can specify one of the following 32-bit registers for use by HLA: eax, ebx, ecx, edx, esi, or edi. By providing one of these registers, HLA may be able to generate significantly better code when passing certain parameters.

@returns( "text" )

This option specifies the compile-time return value whenever a function name appears as an instruction operand. For example, suppose you are writing a function that returns its result in EAX. You should probably specify a "returns" value of "EAX" so you can compose that procedure just like any other HLA machine instruction (see the example below and the section on machine instructions for more details).

@leave, @noleave

These two options control the code generation for the standard exit sequence. If you specify the @leave option then HLA emits the x86 LEAVE instruction to clean up the activation record before the procedure returns. If you specify the @noleave option, then HLA emits the primitive instructions to achieve this, e.g.,

mov( ebp, esp );
pop( ebp );

 

The manual sequence is faster on some architectures; the LEAVE instruction is always shorter.

 

Note that @noleave occurs by default if you've specified @noframe or #noframe. By default, HLA assumes @noleave, but you may change the default by using these special identifiers as compile-time variables to set the default LEAVE generation for all procedures. Setting @leave to true (or @noleave to false) turns on LEAVE generation by default; setting @leave to false (or @noleave to true) turns off the use of the LEAVE instruction.

@enter, @noenter

These two options control the code generation for a procedure's standard entry sequence. If you specify the @enter option then HLA emits the x86 ENTER instruction to create the activation record. If you specify the @noenter option, then HLA emits the primitive instructions to achieve this.

 

The manual sequence is always faster; the ENTER instruction is usually shorter.

 

Note that @noenter occurs by default if you've specified @noframe or #noframe. By default, HLA assumes @noenter, but you may change the default by using these special identifiers as compile-time variables to set the default ENTER generation for all procedures. Setting @enter to true (or @noenter to false) turns on ENTER generation by default; setting @enter to false (or @noenter to true) turns off the use of the ENTER instruction.

The following example demonstrates how the @returns option works:

 

program returnsDemo;
#include( "stdio.hhf" );

procedure eax0; @returns( "eax" );
begin eax0;

    mov( 0, eax );

end eax0;

begin returnsDemo;

    mov( eax0(), ebx );
    stdout.put( "ebx=", ebx, nl );

end returnsDemo;

To help those who insist on constructing the activation record themselves, HLA declares two local constants within each procedure: _vars_ and _parms_. The _vars_ symbol is an integer constant that specifies the number of bytes of local variables declared in the procedure. This constant is useful when allocating storage for your local variables. The _parms_ constant specifies the number of bytes of parameters. You would normally supply this constant as the operand of a ret() instruction to automatically clean up the procedure's parameters when it returns.

If you do not specify @nodisplay, then HLA defines a run-time variable named _display_ that is an array of pointers to activation records. For more details on the _display_ variable, see the section on lexical scope.

You can also declare @external procedures (procedures defined in other HLA units or written in languages other than HLA) using the following syntaxes:

 

procedure externProc1 (optional parameters) ; @returns( "text" ); @external;

 

procedure externProc2 (optional parameters) ; @returns( "text" ); @external( "external_name" );

 

As with normal procedure declarations, the parameter list and @ returns clause are optional.

The first form is generally used for HLA-written functions. HLA uses the procedure's name (externProc1 in this case) as the external name.

The second form lets you refer to the procedure by one name (externProc2 in this case) within your HLA program and by a different name ("external_name" in this example) in the MASM-generated code. This second form has two main uses: (1) if you choose an external procedure name that happens to be a MASM reserved word, the program may compile correctly but fail to assemble; changing the external name to something else solves this problem. (2) When calling procedures written in other languages, you may need to specify characters that are not legal in HLA identifiers. For example, Win32 API calls often use names like "WriteFile@24" that contain symbols that are illegal in HLA identifiers. The string operand to the @external option lets you specify any name you choose. Of course, it is your responsibility to use identifiers that are compatible with the linker and MASM; HLA doesn't check these names.

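By default, HLA builds a stack frame (@frame), constructs a display (@display), emits code to dword-align the stack (@alignstack), uses the @pascal calling convention, does not align procedures (@align(1)), and uses primitive instructions rather than ENTER and LEAVE (@noenter, @noleave).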

These options are the most general and "safest" for beginning assembly language programmers. However, the code HLA generates for this general case may not be as compact or as fast as is possible in a specific case. For example, few procedures will actually need a display data structure built upon procedure activation. Therefore, the code that HLA emits to build the display can reduce the efficiency of the program. Advanced programmers, of course, can use procedure options like "@nodisplay" to tell HLA to skip the generation of this code. However, if a program contains many procedures and none of them need a display, continually adding the "@nodisplay" option can get really old. Therefore, HLA allows you to treat these directives as "pseudo-compile-time-variables" to control the default code generation. E.g.,

? @display := true; // Turns on default display generation.
? @display := false; // Turns off default display generation.
? @nodisplay := true; // Turns off default display generation.
? @nodisplay := false; // Turns on default display generation.

? @frame := true; // Turns on default frame generation.
? @frame := false; // Turns off default frame generation.
? @noframe := true; // Turns off default frame generation.
? @noframe := false; // Turns on default frame generation.

? @alignstack := true; // Turns on default stack alignment code generation.
? @alignstack := false; // Turns off default stack alignment code generation.
? @noalignstack := true; // Turns off default stack alignment code generation.
? @noalignstack := false; // Turns on default stack alignment code generation.

? @enter := true; // Turns on default ENTER code generation.
? @enter := false; // Turns off default ENTER code generation.
? @noenter := true; // Turns off default ENTER code generation.
? @noenter := false; // Turns on default ENTER code generation.

? @leave := true; // Turns on default LEAVE code generation.
? @leave := false; // Turns off default LEAVE code generation.
? @noleave := true; // Turns off default LEAVE code generation.
? @noleave := false; // Turns on default LEAVE code generation.

? @align := 1; // Turns off procedure alignment (align on byte boundary).
? @align := int_expr; // Sets alignment, must be a power of two.

 

These directives may appear anywhere in the source file. They set the internal HLA default values and all procedure declarations following one of these assignments (up to the next, corresponding assignment) use the specified code generation option(s). Note that you can override these defaults by using the corresponding procedure options mentioned earlier.

Disabling HLA's Automatic Code Generation for Procedures

Before jumping in and describing how to use the high level HLA features for procedures, the best place to start is with a discussion of how to disable these features and write "plain old fashioned" assembly language code. This discussion is important because procedures are the one place where HLA automatically generates a lot of code for you and many assembly language programmers prefer to control their own destinies; they don't want the compiler to generate any excess code for them. So disabling HLA's automatic code generation capabilities is a good place to start.

By default, HLA automatically emits code at the beginning of each procedure to do five things: (1) Preserve the pointer to the previous activation record (EBP); (2) build a display in the current activation record; (3) allocate storage for local variables; (4) load EBP with the base address of the current activation record; (5) adjust the stack pointer (downwards) so that it points at a dword-aligned address.

When you return from a procedure, by default HLA will deallocate the local storage and return, removing any parameters from the stack.

To understand the code that HLA emits, consider the following simple procedure:

 

procedure p( j:int32 );
var
    i:int32;
begin p;
end p;

 

Here is a dump of the symbol table that HLA creates for procedure p:

 

p <0,proc>:Procedure type (ID=?1_p)
--------------------------------
_vars_ <1,cons>:uns32, (4 bytes) =4
i <1,var >:int32, (4 bytes, ofs:-12)
_parms_ <1,cons>:uns32, (4 bytes) =4
_display_ <1,var >:dword, (8 bytes, ofs:-4)
j <1,valp>:int32, (4 bytes, ofs:8)
p <1,proc>:
------------------------------------

 

The important thing to note here is that local variable "i" is at offset -12 and HLA automatically created an eight-byte local variable named "_display_" at offset -4.

HLA emits the following code for the procedure above (annotations in italics are not emitted by HLA, this output is subject to changes in HLA code generation algorithms):

 

?1_p proc near32
    push  ebp             ;Dynamic link (pointer to previous activation record)
    pushd [ebp-04]        ;Display for lex level 0
    lea   ebp,[esp+04]    ;Get frame ptr (point EBP at current activation record)
    pushd ebp             ;Ptr to this proc's A.R. (part of display construction)
    sub   esp, 4          ;Local storage.
    and   esp, 0fffffffch ;dword-align stack

; Exit point for the procedure:

?x?1_p:
    mov   esp, ebp        ;Deallocate local variables.
    pop   ebp             ;Restore pointer to previous activation record.
    ret   4               ;Return, popping parameters from the stack.
?1_p endp

 

Building the display data structure is not very common in standard assembly language programs. It is only necessary if you are using nested procedures and those nested procedures need to access non-local variables. Since this is a rare situation, many programmers will immediately want to tell HLA to stop emitting the code to generate the display. This is easily accomplished by adding the "@nodisplay" procedure option to the procedure declaration. Adding this option to procedure p produces the following:

 

procedure p( j:int32 ); @nodisplay;
var
    i:int32;
begin p;
end p;

 

Compiling this procedure produces the following symbol table dump:

 

p <0,proc>:Procedure type (ID=?1_p)
--------------------------------
_vars_ <1,cons>:uns32, (4 bytes) =4
i <1,var >:int32, (4 bytes, ofs:-4)
_parms_ <1,cons>:uns32, (4 bytes) =4
j <1,valp>:int32, (4 bytes, ofs:8)
p <1,proc>:
------------------------------------

 

Note that the _display_ variable is gone and the local variable i is now at offset -4. Here is the code that HLA emits for this new version of the procedure:
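The two symbol table dumps differ only by the 8-byte display that sits between the saved EBP and the locals. This Python sketch reproduces both offsets (-12 with the display, -4 without). It takes the display size as an input rather than deriving it from the lexical level. `local_offsets` is hypothetical, and the declaration-order allocation it uses when there are several locals is an assumption.

```python
# Hypothetical model of EBP-relative local offsets, with or without a display.
def local_offsets(local_vars, display_bytes=0):
    """Return ({name: offset}, total local bytes) for (name, size) pairs.

    Locals are assumed to be allocated downward, in declaration order,
    immediately below the saved EBP and the display (if any).
    """
    offsets = {}
    offset = -display_bytes
    for name, size in local_vars:
        offset -= size
        offsets[name] = offset
    total = sum(size for _, size in local_vars)  # the value _vars_ reports
    return offsets, total


print(local_offsets([("i", 4)], display_bytes=8))  # ({'i': -12}, 4)
print(local_offsets([("i", 4)]))                   # ({'i': -4}, 4)
```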

 

?1_p proc near32
    push ebp              ;Save ptr to previous activation record.
    mov  ebp, esp         ;Point EBP at current activation record.
    sub  esp, 4           ;Local storage.
    and  esp, 0fffffffch  ;Align stack on dword boundary.

; Exit point for the procedure:

?x?1_p:
    mov  esp, ebp         ;Deallocate local variables.
    pop  ebp              ;Restore pointer to previous activation record.
    ret  4                ;Return, and remove parameters from stack.
?1_p endp

 

As you can see, this code is smaller and a bit less complex. Unlike the code that built the display, it is fairly common for an assembly language programmer to construct an activation record in a manner similar to this. Indeed, about the only instruction out of the ordinary above is the "AND" instruction that dword-aligns the stack (OS calls require the stack to be dword-aligned, and the system performance is much better if the stack is dword aligned).

This code is still relatively inefficient if you don't pass parameters on the stack and you don't use automatic (non-static, local) variables. Many assembly language programmers pass their few parameters in machine registers and also maintain local values in the registers. If this is the case, then the code above is pure overhead. You can inform HLA that you wish to take full responsibility for the entry and exit code by using the "@noframe" procedure option. Consider the following version of p:

 

procedure p( j:int32 ); @nodisplay; @noframe;
var
    i:int32;
begin p;
end p;

 

(this produces the same symbol table dump as the previous example).

 

HLA emits the following code for this version of p:

 

?1_p proc near32
?1_p endp

 

Whoa! There's nothing there! But this is exactly what the advanced assembly language programmer wants. With both the @nodisplay and @noframe options, HLA does not emit any extra code for you. You would have to write this code yourself.

By the way, you can specify the @noframe option without specifying the @nodisplay option. HLA still generates no extra code, but it will assume that you are allocating storage for the display in the code you write. That is, there will be an eight-byte _display_ variable created and i will have an offset of -12 in the activation record. It will be your responsibility to deal with this. Although this situation is possible, it's doubtful this combination will be used much at all.

Note a major difference between the two versions of p, with and without @noframe: if @noframe is not present, HLA automatically emits code to return from the procedure. This code executes if control falls through to the "end p;" statement at the end of the procedure. Therefore, if you specify the @noframe option, you must ensure that the last statement in the procedure is a RET() instruction or some other instruction that causes an unconditional transfer of control. If you do not do this, control will fall through to the beginning of the next procedure in memory, probably with disastrous results.

The RET() instruction presents a special problem. It is dangerous to use this instruction to return from a procedure that does not have the @noframe option. Remember, HLA has emitted code that pushes a lot of data onto the stack. If you return from such a procedure without first removing this data from the stack, your program will probably crash. The correct way to return from a procedure without the @noframe option is to jump to the bottom of the procedure and run off the end of it. Rather than require you to explicitly put a label into your program and jump to this label, HLA provides the "exit procname;" instruction. HLA compiles the EXIT instruction into a JMP that transfers control to the clean-up code HLA emits at the bottom of the procedure. Consider the following modification of p and the resulting assembly code produced:

 

procedure p( j:int32 ); @nodisplay;
var
    i:int32;
begin p;

    exit p;
    nop();

end p;

 

 

?2_p proc near32
    push ebp
    mov  ebp, esp
    sub  esp, 4           ;Local storage.
    and  esp, 0fffffffch
    jmp  ?x?2_p           ;p
    nop
?x?2_p:
    mov  esp, ebp
    pop  ebp
    ret  4
?2_p endp

 

As you can see, HLA automatically emits a label to the assembly output file ("?x?2_p" in this instance) at the bottom of the procedure where the clean-up code starts. HLA translates the "exit p;" instruction into a jmp to this label.

If you look back at the code emitted for the version of p with the @noframe option, you'll note that HLA did not emit a label at the bottom of the procedure. Therefore, HLA cannot generate a jump to this nonexistent label, so you cannot use the exit statement in a procedure with the @noframe option (HLA will generate an error if you attempt this).

Of course, HLA will not stop you from putting a RET() instruction into a procedure without the @noframe option (some people who know exactly what they are doing might actually want to do this). Keep in mind, if you decide to do this, that you must deallocate the local variables (that's what the "mov esp, ebp" instruction is doing), you need to restore EBP (via the "pop ebp" instruction above), and you need to deallocate any parameters pushed on the stack (the "ret 4" handles this in the example above). The following code demonstrates this:

 

procedure p( j:int32 ); @nodisplay;
var
    i:int32;
begin p;

    if( j = 0 ) then

        // Deallocate locals.
        mov( ebp, esp );

        // Restore old EBP
        pop( ebp );

        // Return and pop parameters
        ret( 4 );

    endif;
    nop();

end p;

 

 

?1_p proc near32
    push ebp
    mov  ebp, esp
    sub  esp, 4           ;Local storage.
    and  esp, 0fffffffch
    cmp  dword ptr [ebp+8], 0
    jne  ?2_false
    mov  esp, ebp
    pop  ebp
    ret  4
?2_false:
    nop
?x?1_p:
    mov  esp, ebp
    pop  ebp
    ret  4
?1_p endp

 

 

If "real" assembly language programmers would generally specify both the @noframe and @nodisplay options, why not make them the default case (and use the "@frame" and "@display" options to request the activation record and display)? Well, keep in mind that HLA was originally designed as a tool to teach assembly language programming to beginning students. Those students have considerable difficulty comprehending concepts like activation records and displays. Having HLA generate the stack frame code and display generation code automatically saves the instructor from having to teach (and explain) this code. Even if the student never uses a display, generating it doesn't make the program incorrect; the only real cost is a little extra memory and a little extra execution time. This is not a problem for beginning students who haven't yet learned to write efficient code. Therefore, HLA was optimized for the beginner at the expense of the advanced programmer. It is also worth pointing out that the behavior of the EXIT statement depends upon displays if you attempt to exit from a nested procedure, which is yet another reason for HLA's default behavior. Of course, you can always override HLA's default behavior by using the #nodisplay and #noframe directives.

If you are absolutely certain that your stack pointer is aligned on a four-byte boundary upon entry into a procedure, you can tell HLA to skip emitting the AND( $FFFF_FFFC, ESP ); instruction by specifying the @noalignstack procedure option. Note that specifying @noframe also implies @noalignstack.
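The AND instruction works because $FFFF_FFFC has its two low-order bits clear. ANDing ESP with it therefore rounds the address down to the nearest multiple of four. The following C sketch shows that arithmetic. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors "and esp, 0fffffffch": clearing the two low-order bits rounds
   the address down to a multiple of four. Rounding down is the safe
   direction for a stack that grows toward lower addresses, because it can
   only add unused bytes below the existing data, never overlap it. */
uint32_t align_stack_down(uint32_t esp) {
    return esp & 0xFFFFFFFCu;
}

/* True when the AND would be a no-op, i.e. the case in which
   @noalignstack is safe. */
int is_dword_aligned(uint32_t esp) {
    return (esp & 3u) == 0;
}
```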

Procedure Calls and Parameters in HLA

HLA's high level support consists of three main features: HLL-like declarations, the HLL statements (IF, WHILE, etc.), and HLA's support for procedure calls and parameter passing. This section discusses the syntax for procedure declarations and how HLA generates code to automatically pass parameters to a procedure.

The syntax for HLA procedure declarations was touched on earlier; however, it's probably a good idea to review the syntax as well as describe some options that previous sections ignored. There are several procedure declaration forms; the following examples demonstrate them all13:

 

// Standard procedure declaration:

procedure procname (opt_parms); proc_options
begin procname;

    << procedure body >>

end procname;

// External procedure declarations:

procedure extname (opt_parms); proc_options @external;
procedure extname (opt_parms); proc_options @external( "name" );

// Forward procedure declarations:

procedure fwdname (opt_parms); proc_options @forward;

 

Opt_parms indicates that the parameter list is optional; omit the parentheses entirely when the procedure has no parameters.

Proc_options is any combination (zero or more) of the following procedure options (see the discussion earlier for these options):

@noframe;

@nodisplay;

@noalignstack;

@pascal;

@cdecl;

@stdcall;

@align( expression );

@returns( "string" );

 

The @external reserved word tells HLA that the specified procedure does not appear in the current compilation, but is present in a different source file that will be compiled separately. Note that the presence of an external declaration doesn't require that the procedure appear in a separate source file. If the actual procedure appears in the same compilation unit as the external declaration, HLA treats the external declaration as a forward declaration (see the next paragraph for details on forward declarations). External procedure declarations have been discussed earlier; see the appropriate section(s) for additional details.

 

The @forward declaration syntax is necessary because HLA requires all procedure symbols to be declared before they are used. In a few rare cases (where mutual recursion occurs between two or more procedures), it may be impossible to write your code such that every procedure is declared before the first call to the code. More commonly, sorting your procedures to ensure that all procedures are written before their first call may force an artificial organization on the source file, making it harder to read. The forward procedure declaration handles this situation for you. It lets you create a procedure prototype that describes how the procedure is to be called without actually specifying the procedure body. Later on in the source file, the full procedure declaration must appear.
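Mutual recursion is the case where a forward declaration cannot be avoided. C has the same declare-before-use rule and solves it the same way, with a prototype. The sketch below is a C analogue, not HLA. The prototype for is_odd plays the role of an HLA @forward declaration.

```c
#include <assert.h>

/* Prototype: like an HLA @forward declaration, it tells the compiler how
   is_odd is called before its body appears. */
int is_odd(unsigned n);

/* is_even and is_odd call each other, so no ordering of the two bodies
   lets both be defined before their first use. */
int is_even(unsigned n) {
    return n == 0 ? 1 : is_odd(n - 1);
}

/* The full definition must appear later in the same source file. */
int is_odd(unsigned n) {
    return n == 0 ? 0 : is_even(n - 1);
}
```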

Note: an external declaration also serves as a forward declaration. So if you have an external declaration at the beginning of your program (perhaps it appears in an include file), you do not need to provide a forward declaration as well.

Calling HLA Procedures

There are two standard ways to call an HLA procedure: use the call instruction or simply specify the name of the procedure as an HLA statement. Both mechanisms have their pluses and minuses.

To call an HLA procedure using the call instruction is exceedingly easy. Simply use either of the following syntaxes:

 

call( procName );

call procName;

 

Either form compiles into an 80x86 call instruction that calls the specified procedure. The difference between the two is that the first form (with the parentheses) returns the procedure's "returns" value, so this form can appear as an operand to another instruction. The second form above always returns the empty string, so it is not suitable as an operand of another instruction. Also note that the second form requires a statement or procedure label; you may not use memory addressing modes in this form. On the other hand, the second form is the only form that lets you "call" a statement label (as opposed to a procedure label), which is useful on occasion.

If you use the call statement to call a procedure, then you are responsible for passing any parameters to that procedure. In particular, if the parameters are passed on the stack, you are responsible for pushing those parameters (in the correct order) onto the stack before the call. This is a lot more work than letting HLA push the parameters for you, but in certain cases you can write more efficient code by pushing the parameters yourself.

The second way to call an HLA procedure is to simply specify the procedure name and a list of actual parameters (if needed) for the call. This method has the advantage of being easy and convenient at the expense of a possible slight loss in efficiency and flexibility. This calling method should also prove familiar to most HLL programmers. As an example, consider the following HLA program:

 

program parameterDemo;

#include( "stdio.hhf" );

procedure PrtAplusB( a:int32; b:int32 ); @nodisplay;
begin PrtAplusB;

    mov( a, eax );
    add( b, eax );
    stdout.put( "a+b=", (type int32 eax ), nl );

end PrtAplusB;

static
    v1:int32 := 25;
    v2:int32 := 5;

begin parameterDemo;

    PrtAplusB( 1, 2 );
    PrtAplusB( -7, 12 );
    PrtAplusB( v1, v2 );

    mov( -77, eax );
    mov( 55, ebx );
    PrtAplusB( eax, ebx );

end parameterDemo;

 

This program produces the following output:

 

a+b=3
a+b=5
a+b=30
a+b=-22

 

As you can see, calling PrtAplusB in HLA is very similar to calling procedures (and passing parameters) in a high level language like C/C++ or Pascal. There are, however, some key differences between an HLA call and an HLL procedure call. The next section will cover those differences in greater detail. The important thing to note here is that if you choose to call a procedure using the HLL syntax (that is, the second method above), you will have to pass the parameters in the parameter list and let HLA push the parameters for you. If you want to take complete control over the parameter passing code, you should use the call instruction.

 

Parameter Passing in HLA, Value Parameters

The previous section probably gave you the impression that passing parameters to a procedure in HLA is nearly identical to passing those same parameters to a procedure in a high level language. The truth is, the examples in the previous section were rigged. There are actually many restrictions on how you can pass parameters to an HLA procedure. This section discusses the parameter passing mechanism in detail.

The most important restriction on actual parameters in a call to an HLA procedure is that HLA only allows memory variables, registers, constants, and certain other special items as parameters. In particular, you cannot specify an arithmetic expression that requires computation at run-time (although a constant expression, computable at compile time, is okay). The bottom line is this: if you need to pass the value of an expression to a procedure, you must compute that value prior to calling the procedure and pass the result of the computation; HLA will not automatically generate the code to compute that expression for you.

The second point to mention here is that HLA is a strongly typed language when it comes to passing parameters. This means that with only a few exceptions, the type of the actual parameter must exactly match the type of the formal parameter. If the actual parameter is an int8 object, the formal parameter had better not be an int32 object or HLA will generate an error. The only exceptions to this rule are the byte, word, and dword types. If a formal parameter is of type byte, the corresponding actual parameter may be any one-byte data object. If a formal parameter is a word object, the corresponding actual parameter can be any two-byte object. Likewise, if a formal parameter is a dword object, the actual parameter can be any four-byte data type. Conversely, if the actual parameter is a byte, word, or dword object, it can be passed without error to any one-, two-, or four-byte formal parameter (respectively). Programmers who are really lazy make all their parameters bytes, words, or dwords (at least, wherever possible). Programmers who care about the quality of their code use untyped parameters cautiously.

If you want to use the high level calling sequence, but you don't like the inefficient code HLA sometimes produces when generating code to pass your parameters, you can always use the #{...}# sequence parameter to override HLA's code generation and substitute your own code for one or two parameters. Of course, it doesn't make any sense to pass all the parameters to a procedure using this trick; it would be far easier just to use the call instruction. Example:

PrtAplusB
(
    #{
        mov( i, eax );  // First parameter is "i+5"
        add( 5, eax );
        push( eax );
    }#,
    5
);

 

HLA will automatically copy an actual value parameter into local storage for the procedure, regardless of the size of the parameter. If your value parameter is a one million byte array, HLA will allocate storage for 1,000,000 bytes and then copy that array in on each call. C/C++ programmers may expect HLA to automatically pass arrays by reference (as C/C++ does) but this is not the case. If you want your parameters passed by reference, you must explicitly state this.
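C programmers can see the same copying cost by wrapping an array in a struct. C passes structs by value, so the callee receives its own copy of every byte. HLA does this for all value parameters. The sketch below is a C analogue. The names struct big, set_first_by_value, and set_first_by_reference are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Wrapping the array in a struct makes C pass it by value: the callee
   receives a private copy of all sizeof(struct big) bytes. */
struct big { int32_t data[1000]; };

/* Pass by value: the copy is modified, the caller's data is not. */
int32_t set_first_by_value(struct big b) {
    b.data[0] = 99;
    return b.data[0];
}

/* Pass by reference: only a pointer is passed, and the callee writes
   through it into the caller's object. */
void set_first_by_reference(struct big *b) {
    b->data[0] = 99;
}
```

The by-value call pushes thousands of bytes for a single change that the caller never sees. The by-reference call passes one pointer.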

The code HLA generates to copy value parameters, while not particularly bad, certainly isn't optimal. If you need the fastest possible code when passing parameters by value on the stack, it would be better if you explicitly pushed the data yourself. Another alternative that sometimes helps is to use the "use reg32" procedure option to provide HLA with a hint about a 32-bit scratchpad register that it can use when building parameters on the stack.

Parameter Passing in HLA, Reference, Value/Result, and Result Parameters

The one good thing about pass by reference, pass by value/result, and pass by result parameters is that they are always four byte pointers, regardless of the size of the actual parameter. Therefore, HLA has an easier time generating code for these parameters than it does generating code for pass by value parameters.

HLA treats reference, value/result, and result parameters identically. The code within the procedure is responsible for differentiating these parameter types (value/result and result parameters generally require copying data between local storage and the actual parameter). The following discussion will simply refer to pass by reference parameters, but it applies equally well to pass by value/result and pass by result.

Like high level languages, HLA places a whopper of a restriction on pass by reference parameters: they can only be memory locations. Constants and registers are not allowed since you cannot compute their address. Do keep in mind, however, that any valid memory addressing mode is a valid candidate to be passed by reference; you do not have to limit yourself to static and local variables. For example, "[eax]" is a perfectly valid memory location, so you can pass this by reference (assuming you type-cast it, of course, to match the type of the formal parameter). The following example demonstrates a simple procedure with a pass by reference parameter:

 

program refDemo;

#include( "stdio.hhf" );

procedure refParm( var a:int32 );
begin refParm;

    mov( a, eax );
    mov( 12345, (type int32 [eax]));

end refParm;

static
    i:int32:=5;

begin refDemo;

    stdout.put( "(1) i=", i, nl );
    mov( 25, i );
    stdout.put( "(2) i=", i, nl );
    refParm( i );
    stdout.put( "(3) i=", i, nl );

end refDemo;

 

The output produced by this code is

 

(1) i=5
(2) i=25
(3) i=12345

 

As you can see, the parameter a in refParm exhibits pass by reference semantics, since the change to the value of a in refParm changed the value of the actual parameter (i) in the main program.

Note that HLA passes the address of i to refParm; therefore, the a parameter contains the address of i. When accessing the value of the a parameter, the refParm procedure must dereference the pointer passed in a. The two instructions in the body of the refParm procedure accomplish this.
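In C, the same pass by reference semantics are written with an explicit pointer. The sketch below is an analogue of refParm, not generated code. It uses a local variable named eax to mirror the two HLA instructions: first load the pointer, then store through it.

```c
#include <assert.h>
#include <stdint.h>

/* C analogue of refParm( var a:int32 ): the caller passes the address of
   its variable, and the callee stores through that address. */
void refParm(int32_t *a) {
    int32_t *eax = a;   /* mov( a, eax ): fetch the pointer passed in a   */
    *eax = 12345;       /* mov( 12345, (type int32 [eax]) ): dereference  */
}
```

One difference from HLA is that C requires the caller to write refParm(&i). HLA takes the address of i automatically because the formal parameter is declared with var.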

Take a look at the code that HLA generates for the call to refParm:

        pushd   offset32 ?198_i
        call    ?197_refParm

("?198_i" is the MASM-compatible name that HLA generated for the static variable "i".)

As you can see, this program simply computed the address of i and pushed it onto the stack. Now consider the following modification to the main program:

 

program refDemo;

#include( "stdio.hhf" );

procedure refParm( var a:int32 );
begin refParm;

    mov( a, eax );
    mov( 12345, (type int32 [eax]));

end refParm;

static
    i:int32:=5;

var
    j:int32;

begin refDemo;

    mov( 0, j );
    refParm( j );
    refParm( i );
    lea( eax, j );
    refParm( [eax] );

end refDemo;

 

 

This version emits the following code:

 

        mov     dword ptr [ebp-8], 0    ;j

        push    eax
        lea     eax, dword ptr [ebp-8]  ;j
        xchg    eax, [esp]
        call    ?197_refParm            ;refParm

        pushd   offset32 ?198_i
        call    ?197_refParm            ;refParm

        lea     eax, dword ptr [ebp-8]  ;j
        push    eax
        push    eax
        lea     eax, dword ptr [eax+0]  ;[eax]
        mov     [esp+4], eax
        pop     eax
        call    ?197_refParm            ;refParm

 

As you can see, the code emitted for the last call is pretty ugly (we could easily get rid of three of the instructions in this code). This call would be a good candidate for using the call instruction directly. Also see "Hybrid Parameters" a little later in this document. Another option is to use the "use reg32" option to tell HLA it can use one of the 32-bit registers as a scratchpad. Consider the following:

procedure refParm( var a:int32 ); use esi;
    .
    .
    .
lea( eax, j );
refParm( [eax] );

 

This sequence generates the following code (which is a little better than the previous example):

        lea     eax, dword ptr [ebp-8]  ;j
        lea     eax, dword ptr [eax+0]  ;[eax]
        push    eax
        call    ?197_refParm            ;refParm

 

As a general rule, the type of an actual reference parameter must exactly match the type of the formal parameter. There are a couple of exceptions to this rule. First, if the formal parameter is dword, then HLA will allow you to pass any four-byte data type as an actual parameter by reference to this procedure. Second, you can pass an actual dword parameter by reference if the formal parameter is a four-byte data type.

There is a third exception to the "the types must exactly match" rule. If the formal reference parameter has some data type, HLA will allow you to pass an actual parameter that is a pointer to this type. Note that HLA will actually pass the value of the pointer, rather than the address of the pointer, as the reference parameter. This turns out to be really convenient, particularly when calling Win32 API functions and other C/C++ code. Note, however, that this behavior isn't always intuitive, so be careful when passing pointer variables as reference parameters.

If you want to pass the value of a double word or pointer variable in place of the address of such a variable to a pass by reference, value/result, or result parameter, simply prefix the actual parameter with the VAL reserved word in the call to the procedure, e.g.,

refParm( val dwordVar );

 

This tells HLA to use the value of the variable rather than its address.

Untyped Reference Parameters

HLA provides a special formal parameter syntax that tells HLA that you want to pass an object by reference and you don't care what its type is. Consider the following HLA procedure:

 

procedure zeroObject( var object:byte; size:uns32 );
begin zeroObject;

    << code to write "size" zeros to "object" >>

end zeroObject;

 

The problem with this procedure is that you will have to coerce non-byte parameters to a byte before passing them to zeroObject. That is, unless you're passing a byte parameter, you've always got to call zeroObject thusly:

zeroObject( (type byte NotAByte), sizeToZero );

 

For some functions you call frequently with different types of data, this can get painful very quickly.

The HLA untyped reference parameter syntax solves this problem. Consider the following declaration of zeroObject :

 

procedure zeroObject( var object:var; size:uns32 );
begin zeroObject;

    << code to write "size" zeros to "object" >>

end zeroObject;

 

Notice the use of the reserved word "VAR" instead of a data type for the object parameter. This syntax tells HLA that you're passing an arbitrary variable by reference. Now you can call zeroObject and pass any (memory) object as the first parameter and HLA won't complain about the type, e.g.,

zeroObject( NotAByte, sizeToZero );

 

Note that you may only pass untyped objects by reference to a procedure.

Note that untyped reference parameters always take the address of the actual parameter to pass on to the procedure, even if the actual parameter is a pointer (normal pass by reference semantics in HLA will pass the value of a pointer, rather than the address of the pointer variable, if the base type of the pointer matches the type of the reference parameter). Sometimes you'll have the address of an object in a register or a pointer variable and you'll want to pass the value of that pointer object (i.e., the address of the ultimate object) rather than the address of the pointer variable. To do this, simply prefix the actual parameter with the VAL keyword, e.g.,

zeroObject( ptrVar );      // Passes the address of ptrVar.
zeroObject( val ptrVar );  // Passes ptrVar's value.
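The nearest C counterpart of an untyped reference parameter is a void * parameter. It accepts the address of any object without a cast. The sketch below is an analogue, not the HLA library. It implements zeroObject with memset, and its comments map the two HLA calls above onto C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Rough C counterpart of zeroObject( var object:var; size:uns32 ).
   A void * parameter accepts the address of any object type.
     HLA zeroObject( ptrVar );      ~  C zeroObject( &ptrVar, sizeof ptrVar )
     HLA zeroObject( val ptrVar );  ~  C zeroObject( ptrVar, n )            */
void zeroObject(void *object, size_t size) {
    memset(object, 0, size);   /* write "size" zero bytes starting at object */
}
```

In C the caller always spells out & explicitly. In HLA, the address is taken automatically unless VAL is specified.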

 

Parameter Passing in HLA, Name and Lazy Evaluation Parameters

HLA provides a modicum of support for pass by name and pass by lazy evaluation parameters. A pass by name parameter consists of a thunk that returns the address of the actual parameter. A pass by lazy evaluation parameter is a thunk that returns the value of the actual parameter. Whenever you specify the "name" or "lazy" keywords before a parameter, HLA reserves eight bytes to hold the corresponding thunk in the activation record. It is your responsibility to create a thunk whenever calling the procedure.

There is a minor difference between passing a thunk parameter by value and passing a lazy evaluation or name parameter to a procedure. Pass by name/lazy parameters require an immediate thunk constant; you cannot pass a thunk variable as a pass by name or lazy parameter.
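A lazy evaluation parameter only pays for computing the actual parameter when the procedure actually asks for it. The C sketch below models this idea. It represents a thunk as a code pointer plus a pointer to the caller's data. That representation is an assumption for illustration (on a 32-bit machine such a pair also occupies eight bytes), not a description of HLA's exact thunk layout. The names int_thunk, choose, and square_thunk are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical thunk: code to run plus the environment (caller data)
   it runs against. */
struct int_thunk {
    int32_t (*code)(void *env);
    void *env;
};

/* Lazy-evaluation style parameter: the value is produced only if the
   callee invokes the thunk, so an unused argument is never computed. */
int32_t choose(int use_value, struct int_thunk value) {
    return use_value ? value.code(value.env) : -1;
}

/* Caller-side environment and thunk body for the expression x * x. */
struct square_env { int32_t x; int evaluations; };

int32_t square_thunk(void *env) {
    struct square_env *e = env;
    e->evaluations++;          /* count how often the thunk actually runs */
    return e->x * e->x;
}
```

When choose ignores its argument, the evaluation counter stays at zero. That is the behavior a lazy evaluation parameter is meant to provide.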

To pass a thunk constant as a parameter to a pass by name or pass by lazy evaluation parameter, insert the thunk's code inside "#{...}#" sequence in the parameter list and preface the whole thing with the THUNK reserved word. The following example demonstrates passing a thunk as a pass by name parameter:

 

program nameDemo;
#include( "stdio.hhf" );

procedure passByName( name ary:int32; var ip:int32 ); @nodisplay;

    const i:text := "(type int32 [ebx])";
    const a:text := "(type int32 [eax])";

begin passByName;

    mov( ip, ebx );
    mov( 0, i );
    while( i < 10 ) do

        ary();          // Get address of "ary[i]" into eax.
        mov( i, ecx );
        mov( ecx, a );
        inc( i );

    endwhile;

end passByName;

procedure thunkParm( t:thunk );
begin thunkParm;

    t();

end thunkParm;

var
    index:int32;
    array:int32[10];
    th:thunk;

begin nameDemo;

    thunk th := #{ stdout.put( "Thunk Variable", nl ); }#;
    thunkParm( th );
    thunkParm( thunk #{ stdout.put( "Thunk Constant", nl ); }# );

    // passByName( th, index ); -- would be illegal;

    passByName
    (
        thunk
        #{
            push( ebx );
            mov( index, ebx );
            lea( eax, array[ebx*4] );
            pop( ebx );
        }#,
        index
    );

    mov( 0, ebx );
    while( ebx < 10 ) do

        stdout.put
        (
            "array[",
            (type int32 ebx),
            "]=",
            array[ebx*4],
            nl
        );
        inc( ebx );

    endwhile;

end nameDemo;

 

This program produces the following output:

 

Thunk Variable
Thunk Constant
array[0]=0
array[1]=1
array[2]=2
array[3]=3
array[4]=4
array[5]=5
array[6]=6
array[7]=7
array[8]=8
array[9]=9
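The key point of pass by name is that the thunk recomputes the address of array[index] every time it is called. Because index is also passed by reference as ip, each iteration of the loop sees a different element. The C sketch below mirrors nameDemo under the same assumption as before: a thunk is modeled as a code pointer plus a pointer to the caller's variables. The names addr_thunk, caller_env, and element_addr are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pass-by-name thunk: code that yields an address, plus the
   caller's variables it needs. */
struct addr_thunk {
    int32_t *(*code)(void *env);
    void *env;
};

/* The caller's variables, corresponding to index and array in nameDemo. */
struct caller_env {
    int32_t index;
    int32_t array[10];
};

/* Thunk body: recomputes &array[index] on every call, like the
   #{ ... lea( eax, array[ebx*4] ); ... }# thunk in the HLA program. */
int32_t *element_addr(void *env) {
    struct caller_env *e = env;
    return &e->array[e->index];
}

/* Counterpart of passByName( name ary:int32; var ip:int32 ). */
void passByName(struct addr_thunk ary, int32_t *ip) {
    for (*ip = 0; *ip < 10; ++*ip) {
        int32_t *slot = ary.code(ary.env);  /* ary(): address of "ary[i]"  */
        *slot = *ip;                        /* store i into that element   */
    }
}
```

Calling passByName with &env.index as ip reproduces the array[k]=k output above.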

 

Hybrid Parameter Passing in HLA

HLA's automatic code generation for parameters specified using the high level language syntax isn't always optimal. In fact, sometimes it is downright inefficient. This is because HLA makes very few assumptions about your program. For example, suppose you are passing a word parameter to a procedure by value. Since all parameters in HLA consume an even multiple of four bytes on the stack, HLA will zero extend the word and push it onto the stack. It does this using code like the following:

 

        pushw   0
        pushw   Parameter

 

Clearly you can do better than this if you know something about the variable. For example, if you know that the two bytes following "Parameter" are readable memory (rather than lying in an unallocated page, where any access would cause a protection fault), you could get by with the single instruction:

        push    dword ptr Parameter

The upper two bytes of the pushed dword then contain whatever happens to follow "Parameter" in memory. This is harmless as long as the procedure uses only the low-order word of that parameter slot.

 

Unfortunately, HLA cannot make these kinds of assumptions about the data because doing so could create malfunctioning code.
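The difference between the two sequences can be checked with a byte-level C model of a little-endian stack. The model is hypothetical, with a 16-byte stack and illustrative function names. The pair of PUSHW instructions produces a zero-extended dword. The single PUSH of a dword produces the same low word plus whatever two bytes followed the variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-addressed stack, little-endian like the 80x86. */
static uint8_t stack_bytes[16];
static size_t sp;                        /* byte offset; grows downward */

static uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void pushw(uint16_t w) {
    sp -= 2;
    stack_bytes[sp]     = (uint8_t)(w & 0xFF);   /* low byte at lower address */
    stack_bytes[sp + 1] = (uint8_t)(w >> 8);
}

/* "pushw 0 / pushw Parameter": the resulting dword slot holds the word
   zero-extended to 32 bits. */
uint32_t pass_word_two_pushes(uint16_t parameter) {
    sp = sizeof stack_bytes;             /* start with an empty stack */
    pushw(0);                            /* becomes the high word of the slot */
    pushw(parameter);                    /* becomes the low word of the slot  */
    return load_le32(&stack_bytes[sp]);
}

/* "push dword ptr Parameter": four bytes starting at Parameter, so the two
   bytes that follow the word end up in the high half of the slot. */
uint32_t pass_word_one_push(const uint8_t *parameter_bytes) {
    return load_le32(parameter_bytes);
}
```

In both cases the low-order word equals the parameter, which is all a word-sized formal parameter reads.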

One solution, of course, is to forgo the HLA high level language syntax for procedure calls and manually push all the parameters yourself and call the procedure via the CALL instruction. However, this is a major pain that involves lots of extra typing and produces code that is difficult to read and understand. Therefore, HLA provides a hybrid parameter passing mechanism that lets you continue to use a high level language calling syntax yet still specify the exact instructions needed to pass certain parameters. This hybrid scheme works out well because HLA actually does a good job with most parameters (e.g., if they are an even multiple of four bytes, HLA generates efficient code to pass the parameters; it's only those parameters that have a weird size that HLA generates less than optimal code for).

If a parameter consists of the "#{" and "}#" brackets with some corresponding code inside the brackets, HLA will emit the code inside the brackets in place of any code it would normally generate for that parameter. So if you wanted to pass a 16-bit parameter efficiently to a procedure named "Proc" and you're sure there is no problem accessing the two bytes beyond this parameter, you could use code like the following:

 

Proc( #{ push( (type dword WordVar) ); }# );

 

Notice the similarity to pass by name/lazy parameters. However, no THUNK reserved word prefaces the code in this instance.

Whenever you pass a non-static14 variable as a parameter by reference, HLA generates the following code to pass the address of that variable to the procedure:

 

        push    eax
        push    eax
        lea     eax, Variable
        mov     [esp+4], eax
        pop     eax

 

It generates this particular code to ensure that it doesn't change any register values (after all, you could be passing some other parameter in the EAX register). If you have a free register available, you can generate slightly better code using a calling sequence like the following (assuming EBX is free):

 

HasRefParm
(
    #{
        lea( ebx, Variable );
        push( ebx );
    }#
);
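HLA emits a five-instruction sequence for a non-static reference parameter: push eax, push eax, lea, mov [esp+4], pop eax. Tracing it step by step shows why it is safe. The first push reserves the parameter slot, and the second saves EAX so it can be restored. The C model below is a hypothetical register/stack simulation, not HLA output. It checks that the sequence leaves EAX unchanged, and that the two-instruction hybrid version is shorter but overwrites EBX.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU model: a few registers and eight dword stack slots,
   slot k living at byte address base + 4k. */
struct cpu {
    uint32_t eax, ebx, esp, base;
    uint32_t stack[8];
};

static uint32_t *slot(struct cpu *c, uint32_t addr) {
    assert(addr >= c->base && addr < c->base + sizeof c->stack);
    return &c->stack[(addr - c->base) / 4];
}

/* HLA's sequence: pushes var_addr while preserving EAX. */
void push_ref_preserving_eax(struct cpu *c, uint32_t var_addr) {
    c->esp -= 4; *slot(c, c->esp) = c->eax;   /* push eax  (reserve parameter slot) */
    c->esp -= 4; *slot(c, c->esp) = c->eax;   /* push eax  (save EAX)               */
    c->eax = var_addr;                        /* lea eax, Variable                  */
    *slot(c, c->esp + 4) = c->eax;            /* mov [esp+4], eax (fill the slot)   */
    c->eax = *slot(c, c->esp); c->esp += 4;   /* pop eax   (restore EAX)            */
}

/* The hybrid alternative: two instructions, but EBX is overwritten. */
void push_ref_using_ebx(struct cpu *c, uint32_t var_addr) {
    c->ebx = var_addr;                        /* lea ebx, Variable */
    c->esp -= 4; *slot(c, c->esp) = c->ebx;   /* push ebx          */
}
```

Both versions leave the same address on top of the stack. They differ only in which registers survive, which is why the hybrid form is appropriate only when EBX really is free.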

 

 

Parameter Passing in HLA, Register Parameters

HLA provides a special syntax that lets you specify that certain parameters are to be passed in registers rather than on the stack. The following are some examples of procedure declarations that use this feature:

 

procedure a( u:uns32 in eax ); @forward;
procedure b( w:word in bx ); @forward;
procedure d( c:char in ch ); @forward;

 

Whenever you call one of these procedures, the code that HLA automatically emits for the call will load the actual parameter value into the specified register rather than pushing this value onto the stack. You may specify any general purpose 8-bit, 16-bit, or 32-bit register after the "IN" keyword following the parameter's type. Obviously, the parameter must fit in the specified register. You may only pass reference parameters in 32-bit registers; you cannot pass parameters that are not one, two, or four bytes long in a register.

You can get into trouble if you're not careful when using register parameters; consider the following two procedure definitions:

procedure one( u:uns32 in eax; v:dword in ebx ); @forward;

procedure two( a:uns32 in eax );
begin two;

    one( 25, a );

end two;

 

The call to "one" in procedure "two" looks like it passes the values 25 and whatever was passed in for "a" in procedure two. However, if you study the HLA output code, you will discover that the call to "one" passes 25 for both parameters. The reason is that HLA emits the code to load 25 into EAX in order to pass 25 in the "u" parameter. Unfortunately, this wipes out the value passed into "two" in the "a" variable, hence the problem. Be aware of this if you use register parameters often.
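The problem is simply an ordering hazard, which a tiny C simulation makes visible. The two globals stand in for EAX and EBX. Per the text, the load of 25 into EAX happens before EAX is copied into EBX. The reordered version is a hypothetical hand-written fix, such as you might code yourself with the CALL instruction. It is not something HLA does automatically.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two registers involved. */
static uint32_t eax, ebx;

/* What "one( 25, a )" amounts to inside two(), where a lives in EAX. */
void call_one_naive(void) {
    eax = 25;        /* u:uns32 in eax <- 25                       */
    ebx = eax;       /* v:dword in ebx <- a, but a was EAX: now 25 */
}

/* Hand-ordered alternative: copy a out of EAX before overwriting it. */
void call_one_reordered(void) {
    ebx = eax;       /* v <- a while EAX still holds a */
    eax = 25;        /* u <- 25                        */
}
```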

Lexical Scope

HLA is a block-structured language that enforces the scope of local identifiers. HLA uses lexical scope to determine when and where an identifier is visible to the program. Identifiers declared within a procedure are always visible within that procedure and to any local procedures declared after the identifier. Local identifiers are never visible outside the procedure declaration. The scoping rules are similar to languages like Pascal, Ada, and Modula-2. As an example, consider the following code:

program scopeDemo;
#include( "stdio.hhf" );

var
    i:int32;
    j:int32;
    k:int32;

procedure lex1;
var
    i:int32;
    j:int32;

    procedure lex2;
    var
        i:int32;
    begin lex2;

        mov( i, eax );      //1
        mov( ebx::j, eax ); //2
        mov( ecx::k, eax ); //3

    end lex2;

begin lex1;

    mov( i, eax );          //4
    mov( j, eax );          //5
    mov( ecx::k, eax );     //6

end lex1;

procedure alsolex1;
var
    i:int32;
    m:int32;
begin alsolex1;

    mov( i, eax );          //4
    mov( m, eax );          //5
    mov( ecx::k, eax );     //6

end alsolex1;

begin scopeDemo;

    mov( i, eax );          //7
    mov( j, eax );          //8
    mov( k, eax );          //9

end scopeDemo;

 

(Note: the purpose of the ebx:: and ecx:: prefixes on certain variables will become clear in a moment. Also note that this code is not functional; it was written only as an illustration.)

In this example you will note that lex2 is nested within lex1, which is nested within the main program. The alsolex1 procedure is nested within the main program but inside no other procedure. To describe this arrangement, compiler writers use the term lex level to denote the depth of nesting. HLA defines the main program to be lex level zero. Each time you nest a procedure, you increase its lex level. So lex1 is at lex level one since it is directly nested inside the main program at lex level zero. The lex2 procedure is at lex level two because it is nested inside the lex1 procedure. Finally, alsolex1 is also at lex level one because it is nested inside the main program (which is lex level zero).

Within a given procedure (or the main program), all identifiers must be unique. That is, you cannot have two symbols named "i" in the same procedure. In different procedures, however, you may reuse the names. If all procedures were written at lex level one, then no procedure would be able to directly access the local variables in any other procedure (this is the case with the C/C++ language). In block structured languages, like HLA, it is possible to access certain non-local variables in other procedures if the current procedure (whose code is attempting to access said variable) is nested within the other procedure.

In the example above, lex2 accesses three variables: i, j, and k. The i variable is local to lex2, so there is nothing surprising here. The j variable is local to lex1 and global to lex2. Likewise, the k variable is global to both lex1 and lex2, and lex2 can access it as well. Whenever a procedure is nested within another (either directly or indirectly), the nested procedure can access all variables in the enclosing procedures (including the main program)15 unless the procedure declares a local name with the same name as a global name (the local name always takes precedence in this case). The term "scope" refers to the visibility of these names.

Being able to use a name during compilation is one thing; accessing the memory location associated with that name at run-time is something else entirely. Most block structured high level languages (HLLs) emit lots of extra code to access these "intermediate" and global variables for you. Why the extra code? Well, remember that local procedure variables are accessed on the stack by indexing off the EBP register (which points at a procedure's "activation record"). When a procedure like lex1 above calls a local procedure like lex2, the lex2 procedure promptly saves the value in EBP (which points at lex1's activation record) and points EBP at the new activation record for lex2. Unfortunately, lex2 no longer has access to lex1's local variables since EBP no longer points at lex1's locals. This creates a bit of a problem.

"But wait!" you exclaim. "EBP is pointing at the pointer to lex1's activation record, why not just use double indirection to get the pointer to lex1's locals?" This is a good idea, but it fails if lex2 is recursive: in a recursive call, the saved EBP points at the previous lex2 activation record rather than at lex1's. There are two or three general solutions to this problem; HLA uses a display to access non-local values.

A display is nothing more than an array of pointers. Display[0] is a pointer to the most recent activation record at lex level zero, Display[1] is a pointer to the most recent activation record at lex level one, Display[2] is a pointer to the most recent activation record at lex level two, etc. (note the use of the phrase most recent; this ensures that displays work properly even when recursion occurs). With a display, to access a non-local variable, you just go to the memory location specified by Display[varlex] + varoffset, where "varlex" is the lex level of the symbol you wish to access and "varoffset" is the offset into the activation record where the variable's data can be found.
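The "most recent activation record" rule is easiest to see in a small model. The C sketch below keeps a global array display[] with one pointer per lex level. Each procedure publishes its own frame on entry and restores the previous entry on exit. Frames are plain structs here, which is an assumption for illustration. HLA's actual mechanism, which builds the display as part of each procedure's entry code, differs in detail. The sketch shows that lex2 still finds lex1's j even when lex2 calls itself recursively.

```c
#include <assert.h>
#include <stdint.h>

enum { MAX_LEX = 3 };

/* Hypothetical activation record holding the variables from scopeDemo. */
struct frame { int32_t i, j, k; };

/* display[L] points at the most recent activation record at lex level L. */
static struct frame *display[MAX_LEX];

/* lex level 2: reads j (level 1) and k (level 0) through the display. */
int32_t lex2(int depth) {
    struct frame mine = { .i = 200 + depth };
    struct frame *saved = display[2];
    display[2] = &mine;                        /* enter: publish this frame */

    int32_t result = (depth > 0) ? lex2(depth - 1)            /* recursion */
                                 : display[1]->j + display[0]->k;

    display[2] = saved;                        /* exit: restore old entry */
    return result;
}

/* lex level 1. */
int32_t lex1(int32_t j, int recursion_depth) {
    struct frame mine = { .i = 100, .j = j };
    struct frame *saved = display[1];
    display[1] = &mine;
    int32_t r = lex2(recursion_depth);
    display[1] = saved;
    return r;
}

/* lex level 0: the main program. */
int32_t run_main(int32_t k, int32_t j, int depth) {
    struct frame mine = { .k = k };
    display[0] = &mine;
    int32_t r = lex1(j, depth);
    display[0] = 0;                            /* main's frame is going away */
    return r;
}
```

A non-local access costs one load from display[varlex] plus an offset. Recursion at lex level 2 never disturbs the level 1 entry.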

Sound complex? Actually, HLA simplifies this quite a bit. First, as long as you don't specify the @nodisplay procedure option, HLA automatically emits the code to build a display at the start of the procedure's code16. HLA also defines a run-time variable, _display_, that points at this array of pointers. Accessing a non-local variable requires two instructions: one to fetch the address of the variable's activation record and one to access the variable. Correcting the previous program, the code would look something like this:

 

procedure lex2;
var
    i:int32;
begin lex2;

    mov( i, eax );

    // access non-local variable j
    // at lex level 1.

    mov( _display_[-1*4], ebx );
    mov( ebx::j, eax );

    // access non-local variable k
    // at lex level 0.

    mov( _display_[0], ecx );
    mov( ecx::k, eax );

end lex2;

There are two things to note about the display: first, the entries are stored at negative indices in the array (0, -1, -2, etc.) rather than at positive indices (this is due to HLA's implementation). Second, don't forget that this is a run-time array of dwords, so you must multiply each index by the array element size, which is four in this case.

Once you've loaded the address into a register, the reg::var syntax tells HLA to use the specified register rather than EBP as the pointer to the variable's activation record. The "mov(ecx::k,eax);" instruction, for example, compiles to "mov eax, [ecx+koffset]", where koffset represents the offset of k in the main program's activation record.

In general, few programs take advantage of nested procedures and access to non-local variables, so it is very common to find programmers putting "@nodisplay" after all their procedures. Of course, if you do this, HLA does not generate a display, and access to non-local variables declared in a var section is not possible. However, static variables are not allocated in the activation record, so you always have access to non-local static variables even if you don't generate the code for a display.

Class Data Types

HLA supports object-oriented programming via the class data type. A class declaration takes the following form:

 

class

<< declarations >>

endclass;

Classes allow const, val, var, static, readonly, uninitialized, procedure, iterator, method, and macro declarations. In general, just about everything allowed in a program declaration section, except types, segments, and namespaces, is legal in a class declaration.

Unlike C++ and Object Pascal, where class declarations are nearly identical to record/struct declarations, HLA class declarations are noticeably different from HLA records because you supply const, var, static, etc., declaration sections within the class. As an example, consider the following HLA class declaration:

 

type SomeClass: class

var
i:int32;

const
pi:=3.14159;

method incrementI;

endclass;

Unlike records, you must put each declaration into an appropriate section. In particular, data fields must appear in a static, readonly, uninitialized, or var section.

Note that the body of a procedure or method does not appear in the class declaration. Only prototypes (forward declarations) appear within the class definition itself. The actual procedure or method is declared elsewhere in the code.

Classes, Objects, and Object-Oriented Programming in HLA

HLA supports object-oriented programming via classes, objects, and automatic method invocation. Supporting method calls, however, requires HLA to violate an important design principle (that HLA-generated code does not disturb the values in any registers except ESP and EBP). Nevertheless, supporting object-oriented programming and automatic method calls was so important that an exception was made in this instance. But more on that in a moment.

It is worthwhile to review the syntax for a class declaration. First of all, class declarations may only appear in a type section within an HLA program. You cannot define classes in the VAR, STATIC, STORAGE, or READONLY sections, and HLA does not allow you to create class constants[17]. Within the TYPE section, a class declaration takes one of the following forms:

 

type
    baseClass:
        class
            << Declarations, including const, val, var, and static
               sections, as well as procedure and method prototypes
               and macros. >>
        endclass;

    derivedClass:
        class inherits( baseClass )
            << Declarations, including const, val, var, and static
               sections, as well as procedure and method prototypes
               and macros. >>
        endclass;

 

Note that you may not include type sections or namespaces in a class. Allowing type sections in a class creates some special problems (having to do with the possibility of nested class definitions). Namespaces are illegal because they allow type sections internally, and there is no real need for namespaces within a class.

Note that you may only place procedure, iterator, and method prototypes (not their bodies) in a class definition. These prototypes look like forward declarations without the forward reserved word. They use the following syntax:

 

procedure procName(optional_parameters); options

method methodName(optional_parameters); options

iterator iterName( optional_parameters ); optional_external

 

procName, iterName, and methodName are the names you wish to assign to these program units. Note that you do not preface these names with the name of the class and a period.

If the procedure, iterator, or method has any parameters, they immediately follow the name, enclosed in parentheses. The parentheses must not be present if there are no parameters. A semicolon immediately follows the parameter list, or the name itself if there are no parameters.

Class procedure and method prototypes allow two options: a @returns clause and/or an @external clause. A method prototype may also specify @abstract (see "Abstract Methods"). The @pascal, @cdecl, @stdcall, @nodisplay, and @noframe options are not allowed in the prototype. See the section on procedures for more details on the @returns and @external clauses. Iterator prototypes allow only the @external option.

Unlike procedures and methods, if you define a macro within a class you must supply the body of the macro within the class definition.

Consider the following example of a class declaration:

 

type

baseClass:

class

var

i:int32;

 

procedure create; @returns( "esi" );

procedure geti; @returns( "eax" );

method seti( ival:int32 ); @external;

 

endclass;

 

By convention, all classes should have a class procedure named "create". This is the constructor for the class. The create procedure should return a pointer to the class object in the ESI register, hence the @returns( "esi" ); clause in this example.

This class includes two accessor functions, geti and seti, that provide access to the class variable i. Note that HLA classes do not support the public, private, and protected visibility options found in HLLs like C++ and Delphi. HLA's design assumes that assembly language programmers are sufficiently disciplined not to access fields that should be private[18].

Of course, the class's procedures and methods must be defined somewhere. Here are some reasonable examples of these definitions (a full explanation will appear later):

 

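procedure baseClass.create; @nodisplay; @noframe;
begin create;

    push( eax );
    if( esi = 0 ) then

        // No object supplied, so allocate one on the heap.
        malloc( @size( baseClass ));
        mov( eax, esi );

    endif;

    // Store the address of the class's VMT in the object.
    mov( &baseClass._VMT_, this._pVMT_ );
    pop( eax );
    ret();   // Explicit return; @noframe means HLA emits no epilogue.

end create;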

 

 

procedure baseClass.geti; @nodisplay; @noframe;

begin geti;

 

mov( this.i, eax );

ret();

end geti;

 

method baseClass.seti( ival:int32 ); @nodisplay;

begin seti;

 

push( eax );

mov( ival, eax );

mov( eax, this.i );

pop( eax );

end seti;

 

These procedure and method declarations look almost like regular procedure declarations with one important difference: the class name and a period precede the procedure or method name on the first line of the procedure/method declaration. Note, however, that only the procedure or method name appears after the BEGIN and END clauses.

Another important difference is the procedure options. Only the @nodisplay/@display, @noalignstack/@alignstack, and @noframe/@frame options are legal here. This is the converse of the class procedure and method prototypes, which allow only @external and @returns. Note that class procedures, methods, and iterators do not support the @pascal, @cdecl, or @stdcall procedure options (they always use the Pascal calling convention).

Class procedures and methods must be defined at the same lex level and within the same scope as the class declaration. Usually class declarations are at lex level zero (i.e., inside the main program or within a unit), so the corresponding procedure and method declarations must appear at lex level zero as well. Of course, it is perfectly legal to declare a class type within some other procedure (at lex level one or higher). If you do this, the class procedure and method declarations must appear at the same level.

 

Inheritance

HLA classes support inheritance using the INHERITS reserved word. Consider the following class declaration that inherits the fields from the baseClass declaration in the previous section:

 

derivedClass:

class inherits( baseClass )

var

j:int32;

f:real64;

endclass;

 

This class inherits all the fields from baseClass and adds two new fields, j and f . This declaration is roughly equivalent to:

 

derivedClass:
    class
        var
            i:int32;

        procedure create; @returns( "esi" );
        procedure geti; @returns( "eax" );
        method seti( ival:int32 ); @external;

        var
            j:int32;
            f:real64;

    endclass;

 

It is "roughly" equivalent because there is no need to create the derivedClass.create and derivedClass.geti procedures or the derivedClass.seti method. This class inherits the procedures and methods written for baseClass along with the field definitions.

As with records, it is possible to "override" the VAR fields of a base class in a derived class. To do this, you use the OVERRIDES keyword. Note that this keyword is valid only for VAR fields in a class; you may not override static objects with this keyword. Example:

 

derivedClass:
    class inherits( baseClass )

        var
            overrides i: dword; // New copy of i for this class.
            j:int32;
            f:real64;

    endclass;

 

 

Occasionally, you may want to override a procedure in a base class. For example, it is very common to supply a new constructor in each derived class, since the constructor may need to initialize fields in the derived class that are not present in the base class. The override[19] keyword tells HLA that you intend to supply a new procedure or method declaration and that you do not want to use the corresponding routine from the base class. Consider the following modifications to derivedClass that override the create procedure and seti method:

 

derivedClass:

class inherits( baseClass )

var

j:int32;

f:real64;

override procedure create;

override method seti;

endclass;

 

When you override a procedure or method, you are not allowed to specify any parameters or procedure options except the @external option. This is because the parameters and @returns strings must exactly match the declarations in the base class. So even though seti in this derived class doesn't have an explicit parameter declared, the ival parameter is still required in a call to seti.

Of course, once you override procedures and methods in a derived class, you must provide those program units in your code. Here is an example of a section of a program that provides overridden procedures and methods along with their declarations:

 

 

type

 

base: class

var

i:int32;

procedure create;

method geti;

method seti( ival:int32 );

endclass;

 

derived:class inherits( base )

var

j:int32;

override procedure create;

override method seti;

method getj;

method setj( jval:int32 );

endclass;

 

procedure base.create; @nodisplay; @noframe;

begin create;

 

push( eax );

if( esi = 0 ) then

malloc( @size( base ));

mov( eax, esi );

endif;

mov( &base._VMT_, this._pVMT_ );

mov( 0, this.i );

pop( eax );

ret();

end create;

 

method base.geti; @nodisplay; @noframe;

begin geti;

 

mov( this.i, eax );

ret();

end geti;

 

method base.seti( ival:int32 ); @nodisplay;

begin seti;

 

push( eax );

mov( ival, eax );

mov( eax, this.i );

pop( eax );

end seti;

 

procedure derived.create; @nodisplay; @noframe;

begin create;

 

push( eax );

if( esi = 0 ) then

malloc( @size( derived )); // derived objects are larger than base objects (extra field j).

mov( eax, esi );

endif;

// Do any initialization done by the base class:

call base.create;

// Do our own specific initialization.

mov( &derived._VMT_, this._pVMT_ );

mov( 1, this.j );

// Return

pop( eax );

ret();

end create;

 

method derived.seti( ival:int32 ); @nodisplay;

begin seti;

 

push( eax );

mov( ival, eax );

// call inherited code to do whatever it does:

(type base [esi]).seti( ival );

// Now handle the code that we do specially.

mov( eax, this.j );

// Okay, return to caller.

pop( eax );

end seti;

 

method derived.setj( jval:int32 ); @nodisplay;

begin setj;

 

push( jval );

pop( this.j );

end setj;

 

method derived.getj; @nodisplay; @noframe;

begin getj;

 

mov( this.j, eax );

ret();

end getj;

 

 

Abstract Methods

Sometimes you will want to create a base class as a template for other classes. You will never create instances (variables) of this base class, only instances of classes derived from this class. In object-oriented terminology, we call this an abstract class. Abstract classes may contain certain methods that will always be overridden in the derived classes. Hence, there is no need to actually supply the method for this base class. HLA, however, always checks to verify that you supply all methods associated with a class. Therefore, you normally have to supply some sort of method, even if it's just an empty method, to satisfy the compiler. In those instances where you really don't need such a method, this is an annoyance. HLA's abstract methods provide a solution to this problem.

 

You declare an abstract method in a class declaration as follows:

 

type

c: class

 

method absMethod( parameters: uns32 ); @abstract;

 

endclass;

 

The @ABSTRACT keyword must follow the @RETURNS option if the @RETURNS option is present.

The @ABSTRACT keyword tells HLA not to expect an actual method associated with this class. Instead, it is the responsibility of all classes derived from "c" to override this method. If you attempt to call an abstract method, HLA will raise an exception and abort program execution.

Classes versus Objects

An object is an instance of a class. In plain English, this means that a class is only a data type while an object is a variable whose type is some class type. Therefore, actual objects may be declared in the var or static section of a program. Here are a couple of typical examples:

 

var

b: base;

 

static

d: derived;

 

Each of these declarations reserves storage for all the data in the specified class type.

For reasons that will shortly become clear, most programmers use pointers to objects rather than directly declared objects. Pointer declarations look like the following:

 

var

ptrToB: pointer to base;

 

static

ptrToD: pointer to derived;

 

Of course, if you declare a pointer to an object, you will need to allocate storage for the object (call the HLA Standard Library malloc routine) and initialize the pointer variable with the address of the allocated storage. As you will soon see, the class constructor typically handles this allocation for you.

 

Initializing the Virtual Method Table Pointer

Whether you allocate storage for an object statically (in the STATIC section), automatically (in the VAR section), or dynamically (via a call to malloc ), it is important to realize that the object is not properly initialized and must be initialized before making any method calls. Failure to do so will, most likely, cause your program to crash when you attempt to call a method or access other data in the class.

The first four bytes of every object contain a pointer to that object's virtual method table. The virtual method table, or VMT, is an array of pointers to the code for each method in the class. To help you initialize this pointer, HLA automatically adds two fields to every class you create: _VMT_ which is a static dword entry (the significance of this being a static entry will become clear later) and _pVMT_ which is a VAR field of the class whose type is pointer to dword. _pVMT_ is where you must put a pointer to the virtual method table. The pointer value to store here is the address of the _VMT_ entry. This initialization can be done using the following statement:

 

mov( &ClassName._VMT_, ObjectName._pVMT_ );

 

ClassName represents the name of the class, and ObjectName represents the name of the STATIC or VAR object variable. If you've allocated storage for an object using malloc and stored its address in a pointer variable, you'd use code like the following:

 

mov( ObjectPtr, ebx );

mov( &ClassName._VMT_, (type ClassName [ebx])._pVMT_ );

 

In this example, ObjectPtr represents the name of the pointer variable. ClassName still represents the name of the class type.

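Typically, the initialization of the pointer to the virtual method table takes place in the class's constructor procedure. The constructor must be a procedure, not a method, because a method call goes through the very VMT pointer that the constructor has not yet initialized. Consider the example from the previous section: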

 

procedure base.create; @nodisplay; @noframe;

begin create;

 

push( eax );

if( esi = 0 ) then

malloc( @size( base ));

mov( eax, esi );

endif;

mov( &base._VMT_, this._pVMT_ );

mov( 0, this.i );

pop( eax );

ret();

end create;

 

As you can see here, this example uses "this._pVMT_" rather than "(type base [esi])._pVMT_". That's because "this" is shorthand for using the ESI register as a pointer to an object of the current class type.

 

Creating the Virtual Method Table

For various technical reasons (related to efficiency), HLA does not automatically create the virtual method table for you; you must explicitly tell HLA to emit the table of pointers for the virtual method table. You can do this in either the STATIC or the READONLY declaration sections. The simple way is to use a statement like the following in either the STATIC or READONLY section:

 

VMT( classname );

 

If you need to access the pointers in this table, there are two ways to do this. First, you can refer to the classname._VMT_ dword variable in the class. Second, you can attach a label directly to the VMT you create, using a declaration like the following:

 

vmtLabel: VMT( classname );

 

The vmtLabel label will be a static object of type dword.

 

Calling Methods and Class Procedures

Once the virtual method table of an object is properly initialized, you may call the methods and procedures of that object. The syntax is very similar to calling a standard HLA procedure except that you must prefix the procedure or method name with the object name and a period. For example, assume you have some objects with the following types ("base" is the type in the examples of the previous sections):

 

var

b: base;

pb: pointer to base;

 

With these variable declarations, and some code to initialize the pointers to the base virtual method table, the calls to the base procedures and methods might look like the following:

 

b.create();

b.geti();

b.seti( 5 );

 

pb.create();

pb.geti();

pb.seti( eax );

 

Note that HLA uses the same syntax for an object call regardless of whether the object is a pointer or a regular variable.

Whenever HLA encounters a call to an object's procedure or method, HLA emits some code that loads the address of the object into the ESI register. This is the one place where HLA emits code that modifies the value in a general-purpose register! You must remember this and not expect to be able to pass any values to an object's procedures or methods in the ESI register. Likewise, don't expect the value in ESI to be preserved across a call to an object's procedure or method. As you will see momentarily, HLA may also emit code that modifies the EDI register, so don't count on the value in EDI, either.

The value in ESI, upon entry into the procedure or method, is that object's "this" pointer. This pointer is necessary because the exact same object code for a procedure or method is shared by all object instances of a given class. Indeed, the "this" reserved word within a method or class procedure is really nothing more than shorthand for "(type ClassName [esi])".

Perhaps an obvious question is "What is the difference between a class procedure and a method?" The difference is the calling mechanism. Given an object b, a call to a class procedure emits a call instruction that directly calls the procedure in memory. In other words, class procedure calls are very similar to standard procedure calls, except that HLA emits code to load ESI with the address of the object[20]. Methods, on the other hand, are called indirectly through the virtual method table. Whenever you call a method, HLA actually emits three machine instructions. The first loads the address of the object into ESI. The second loads the address of the virtual method table (i.e., the first four bytes of the object) into EDI. The third calls the method indirectly through the virtual method table. For example, given the following four calls:

 

 

b.create();

b.geti();

pb.create();

pb.geti();

 

HLA emits the following 80x86 assembly language code:

 

lea esi, [ebp-12] ;b

call ?8_create

 

lea esi, [ebp-12] ;b

mov edi, [esi]

call dword ptr [edi+0] ;geti

 

mov esi, dword ptr [ebp-16] ;pb

call ?8_create

 

mov esi, dword ptr [ebp-16] ;pb

mov edi, [esi]

call dword ptr [edi+0] ;geti

 

HLA class procedures roughly correspond to C++'s static member functions. HLA's methods roughly correspond to C++'s virtual member functions. Read the next few sections on the impact of these differences.

 

Non-object Calls of Class Procedures

In addition to the difference in the calling mechanism, there is another major difference between class procedures and methods: you can call a class procedure without an associated object. To do so, you would use the class name and a period, rather than an object name and a period, in front of the class procedure's name. E.g.,

 

base.create();

 

Since there is no object here (remember, base is a type name, not a variable name, and types do not have any storage allocated for them at run-time), HLA cannot load the address of the object into the ESI register before calling create. This situation can create some big problems in your code if you attempt to use the "this" pointer within a class procedure. Remember, an instruction like "mov( this.i, eax );" really expands to "mov( (type base [esi]).i, eax );" The question that should come to mind is "where is ESI pointing when one makes a non-object call to a class procedure?"

When HLA encounters a non-object call to a class procedure, HLA loads the value zero (NULL) into ESI immediately before the call. So ESI doesn't contain junk, but it does contain the NULL pointer. If you attempt to dereference NULL (e.g., by accessing this.i), you will probably crash the program. Therefore, to be really safe, you must check the value of ESI inside your class procedures to verify that it does not contain zero.

The base.create constructor procedure demonstrates a great way to use class procedures to your advantage. Take another look at the code:

 

procedure base.create; @nodisplay; @noframe;

begin create;

 

push( eax );

if( esi = 0 ) then

malloc( @size( base ));

mov( eax, esi );

endif;

mov( &base._VMT_, this._pVMT_ );

mov( 0, this.i );

pop( eax );

ret();

end create;

 

This code follows the standard convention for HLA constructors with respect to the value in ESI. If ESI contains zero, this function allocates storage for a brand-new object, initializes that object, and returns a pointer to the new object in ESI[21]. On the other hand, if ESI contains a non-NULL value, this function does not allocate memory for a new object; it simply initializes the object at the address provided in ESI upon entry into the code.

Certainly you do not want to use this trick (automatically allocating storage if ESI contains NULL) in all class procedures. Still, it's a very good idea to check the value of ESI upon entry into every class procedure that accesses any fields using ESI or the "this" reserved word. One way to do this is to use code like the following at the beginning of each class procedure in your program:

 

if( ESI = 0 ) then

 

raise( AttemptToDerefZero );

 

endif;

 

If this seems like too much typing, or if you are concerned about efficiency once you've debugged your program, you could write a macro like the following to solve your problem:

 

#macro ChkESI;

#if( CheckESI )

if( ESI = 0 ) then

 

raise( AttemptToDerefZero );

 

endif;

#endif

#endmacro

 

Now all you've got to do is stick an innocuous ChkESI macro invocation at the beginning of your class procedures (maybe on the same line as the "begin" clause to further hide it), and you're in business. By defining the boolean constant CheckESI to be true or false at the beginning of your code, you can control whether this "inefficient" code is generated into your programs.

 

Static Class Fields

Only one copy of each static data object in a class exists, and all objects of the class share it. Since there is only one copy of the data, you do not access variables in the class's static section using the object name or the "this" pointer. Instead, you preface the field name with the class name and a period.

For example, consider the following class declaration that demonstrates a very common use of static variables within a class:

 

 

program DemoOverride;

#include( "memory.hhf" );
#include( "stdio.hhf" );

type
    CountedClass:
        class
            static
                CreateCnt:int32 := 0;

            procedure create;
            procedure DisplayCnt;
        endclass;

// Emit the virtual method table whose address create stores in each object.
static
    VMT( CountedClass );

procedure CountedClass.create; @nodisplay; @noframe;
begin create;

    push( eax );
    if( esi = 0 ) then

        malloc( @size( CountedClass ));
        mov( eax, esi );

    endif;
    mov( &CountedClass._VMT_, this._pVMT_ );

    // CreateCnt is static, so access it through the class name, not "this".
    inc( CountedClass.CreateCnt );
    pop( eax );
    ret();

end create;

procedure CountedClass.DisplayCnt; @nodisplay; @noframe;
begin DisplayCnt;

    stdout.put( "Creation Count=", CountedClass.CreateCnt, nl );
    ret();

end DisplayCnt;

var
    b: CountedClass;
    pb: pointer to CountedClass;

begin DemoOverride;

    CountedClass.DisplayCnt();
    b.create();
    CountedClass.DisplayCnt();

    CountedClass.create();
    mov( esi, pb );
    CountedClass.DisplayCnt();

end DemoOverride;

 

In this example, a static field (CreateCnt) is incremented by one for each object that is created and initialized. The DisplayCnt procedure prints the value of this static field. Note that DisplayCnt does not access any non-static fields of CountedClass. This is why it doesn't bother to check the value in ESI for zero.

Program Unit Initializers and Finalizers

HLA does not automatically call an object's constructor like C++ does. Also, there is no code associated with a unit that automatically executes to initialize that unit as in (Turbo) Pascal or Delphi. Likewise, HLA does not automatically call an object's destructor. However, HLA does provide a mechanism by which you can automatically execute initialization and shut-down code without explicitly specifying the code to execute at the beginning and end of each procedure. This is handled via the HLA " _initialize_ " and " _finalize_ " strings. All programs, procedures, methods, and iterators have these two predeclared string constants (VALUE strings, actually) associated with them. Whenever you declare a program unit, HLA inserts these constants into the s