The MPS language

Introduction

MPS is an abbreviation of the Swedish for My Programming Language. It dates back to about 1974, when I built a computer from individual integrated circuits such as shift registers, counters and adders. This was the time when you could buy the very first microprocessor, the Intel 4004. I discovered that it took a full page of code to add two 16-bit numbers on it, so I decided to build my own computer instead. For that computer I made a language that worked like assembler, but which could be written so that it looked like higher-level programming.

My computer, which was called Dator 108 (Computer 108), was an accumulator machine, in which the results of computations accumulated in an accumulator register.

Later I became acquainted with the principles of stack machines, so I developed a new language called MNPS. However, there was never any machine for it.

When I came across the Propeller computer, I used that language for it, though under the name of Myra, from the Swedish word for ant. Still, the Propeller isn't actually a stack machine, so adapting the stack language to it cost code size and computation time.

So I decided that MPS would make a comeback after 40 years, but now as a language for the Propeller computer. I have also thought of adapting MPS to the ARM architecture, e.g. a Raspberry Pi, but we will see how that goes.

Virtual architecture

The language is designed for a machine that has an accumulator register. If we write

  +op

it means accu + op ->accu, where accu is the accumulator register.

If we write x +u ->x what happens is this:
  1. The value of the variable x (in memory) is loaded into accu
  2. The sum of the value of accu and the variable u (in memory) is written into accu
  3. The value of accu is written back to the memory to the variable x
In fact, accu is also a cell in memory. Strictly speaking, all this is a waste of resources in the Propeller computer, where all memory cells are equal and data can be transferred from any cell to any other cell. But it is a way of standardizing memory use, which makes programs easier to read and write.

Besides accu, the virtual machine has a program counter, which in the Propeller is a specialized register. It also has an index register, index, which allows relative addressing into arrays and the like. There is also a register that holds the current memory address, but it is only temporary in character.

The code x +u ->x is typical. The code is written "chronologically", i.e. you have to compute the value of x before you can store it. Hence we use a '->' symbol, rather than a '=' or ':=' symbol. Instructions are separated by spaces (and also by linefeeds if you write code on more than one line). '+u' is one instruction, so it must not contain a space character. The placement of spaces is therefore crucial.
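To make the order of evaluation concrete, here is a minimal Python sketch of the virtual accumulator machine. It is only an illustration, not the MPS compiler: memory is modelled as a dictionary of named cells (with accu as one of them), and only three instruction forms from the text are understood: a name or number on its own (load into accu), +op (add to accu) and ->x (store accu into x). The names run and operand are hypothetical.

```python
def operand(op: str, mem: dict[str, int]) -> int:
    """An operand is either a decimal literal or the name of a memory cell."""
    return int(op) if op.lstrip("-").isdigit() else mem[op]


def run(code: str, mem: dict[str, int]) -> dict[str, int]:
    """Execute a tiny MPS subset on a copy of mem and return the new memory."""
    mem = dict(mem)
    mem.setdefault("accu", 0)        # accu is just another memory cell
    for instr in code.split():       # instructions are separated by whitespace
        if instr.startswith("->"):
            mem[instr[2:]] = mem["accu"]               # store: accu -> x
        elif instr.startswith("+"):
            mem["accu"] += operand(instr[1:], mem)     # binary: accu + op -> accu
        else:
            mem["accu"] = operand(instr, mem)          # load: op -> accu
    return mem


print(run("x +u ->x", {"x": 5, "u": 3}))   # {'x': 8, 'u': 3, 'accu': 8}
```

Writing x + u ->x instead would split +u into the two tokens '+' and 'u', and this sketch would stop with a KeyError, which mirrors why the placement of spaces matters.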

All this is about a portion of the code, called the executable part (what the program actually does). We will come back to this soon, but first something about the code as a whole.

General program structure

An MPS application is written in a single file, though that file may refer to other files, so-called object files. That file has the following parts:

A line starting with a '--'-sign is a comment. Anything that follows a '--'-sign on a line is also a comment.

Process code

As mentioned, processes start with the process keyword and end with a backslash. Between them there are three segments of code. The first is a description of the process. The process may call functions, which can appear in the second segment, but they can also be loaded from external files. For this purpose there is the third segment, the load segment.

Let's return, then, to the first segment: the description of the process proper.

It starts with the process keyword followed by the process name. This is the name that is referred to in the execution pattern. Then follow the variable declarations. These contain the variable name and optionally a '=' sign and an initial value. (These "local" variables, which reside in each cog's memory, can be initialized.) Initial values may be given with

The variables are followed by the begin keyword, followed by the executable code.

A process doesn't have a return statement. It may be an infinite loop, if you wish, with an initial part. You can end the process with an infinite empty loop, which you can write as A:here (this means "Jump Always here"). Alternatively you can use a stop instruction. The difference between stop and the empty loop is that the stop instruction saves power, but it also disables all outputs.

Another way to end a process is to change a semaphore, which may mean that the process is overwritten with new code. This may take a little time, though, so it is wise to put a stop or A:here after it. Otherwise execution may continue out into empty memory, and you never know what will happen.

Then we come to the second segment, which contains function code. We come to that in the next section. Here, then, is an example of the third segment, i.e. the load segment:

  load from (ram.mpo)
  *ram.init
  *ram.write

After load from comes the filename in parentheses. The rest is a bulleted list of the functions to load, each marked with an asterisk. Functions may be loaded from several object files, which requires one such construct per file.

We will return to object files in a while.

Function code

Functions which follow the process code are written almost like the process code.

The code starts with the function keyword followed by the function name, and optionally a so-called suffix, which is a hyphen (-) followed by a string. More about the suffix later. After that follow variable declarations as in the process code, the begin keyword, the executable code, and last, one and only one return instruction.

Object files

Object files are collections of functions to be used by processes, and sometimes by other object files. Each object file should contain a collection that deals with some sort of "thing" in the external world, like a sensor or an external memory. But it could also deal with more abstract things, like a communication interface standard or a mathematical number type. It's a good convention to mark all these functions with an object prefix followed by a dot. For example, a function that gets the reading of a compass could be called compass.get, and if some initialization of the compass is required, there could be a function compass.init.

Besides the functions, an object file can contain variables, called field variables. The variables are declared on one line each. The list is preceded by the keyword fields and terminated by the keyword endfields. The field variables represent the state of the object. A common type of field variable is the pin number to which a device is connected. These pin numbers are usually set in an init function.

A function declaration starts with the function keyword, and ends when the keyword function appears again. Last in the function declaration, there may be a load section, to load other functions called. There has to be such a load section, also when a function calls a function within its own object file. The load section has the same syntax as the load section in process code.

Object oriented language?

MPS is not fully an object-oriented language. But it has the most essential aspect of object orientation: a program should be built from components that correspond to things belonging to the problem to be solved.

In "real" object oriented languages, the objects are instances of classes. Instances of a class all have the same functions (called methods in OOL-jargon) but the instances have individual field variables. In MPS we can arrange this by letting the field variables be arrays. In a true OOL, we could write:

  Dog fido = new Dog();
  fido.give(food);

After this, the variable fido.hungry would be false.

In MPS we could do this:

  -- fido is dog number 3 (this is just a comment)
  ( 3 , food , )give

Now [3] hungry is false. (Here again, everything is written in "chronological order"; we will come back to this in a while.)

The OOL solution is more elegant, but the MPS solution works. It's just that you shouldn't forget to put an instance number first in your function calls. (If I am not wrong, the "MPS solution" is the one used for the Ada language, which wasn't originally an OOL)

One thing that is definitely missing in MPS is inheritance, i.e. that one kind of object can be a specialization of another, like a student being a special kind of human.

Variable types and scopes

We distinguish between the following types of variables:

Everything we write in MPS is converted to Propeller assembler (and some Spin). Propeller assembler has little notion of scope, though. If you ever declare a variable twice with the same name, you will get an assembler warning: "Symbol is already declared". If you declare it only once, it is accessible throughout the process. Hence, making a variable private to a function requires special effort.

Here are the principles:

Executable code

So now, at last, we come to the executable code, i.e. the instructions. The principle is simple. If you chop up the code after begin along spaces and line feeds, you get the individual instructions, which are executed in that order.

There are a number of categories of instructions.

Binary instructions

These are "mathematical" operations on data, and two data are involved (hence the name binary). One of them is accu, and the other one, which we call op, is referred to in the instruction. op can be

Then, quite in general, if ¤ is an operation, ¤op means: perform the operation between accu and op, and write the result into accu.

The operations are

Unary instructions

These instructions involve only one piece of data, namely the value of accu. So they are mappings from accu to accu. Some of them don't affect accu at all.

Function call and return

Labels

An "instruction" (i.e. a code fragment surrounded by spaces or linefeeds) which starts with a colon (:) is a label. It is not an instruction to be executed, but it marks a place in the sequence of instructions to jump to.

In the assembler code, there is an effort to place the labels at existing instructions, but this fails, if there is reason to put more than one label on one instruction. In that case, a nop-instruction is inserted to carry one of the labels.

Jumps

In Propeller assembler (and ARM, by the way) all instructions can be conditional. In Myra I used this, but in MPS I don't. The reason is that there is a security risk. Whether a condition is satisfied or not is determined by two status flags (Z and C in the Propeller), which are updated only when you ask for it. Now, you can have a chain of conditional instructions, and while you execute them, you don't change the status flags. But what if the chain contains a function call? Are you sure that the function doesn't contain similar chains of conditional instructions? If it does, it may change the status flags in there. This seems dangerous to me. I have thought of different kinds of stack mechanisms, where the status before the function call could be restored, but that seems complicated.

So my decision for MPS was to have only conditional jumps. Conversely, all jumps are conditional, but there is an always condition, A.

The condition is a relation, R, between accu and a reference value ref. If the relation is satisfied, the jump is done. The syntax for the jump is now:

  Rref:goal

If accu R ref holds (read as: accu is related by R to ref), the jump is made. These instructions are thus characterized by the presence of a colon, but the colon is not the first character. The relations are:

If you let true be equal to the 'true' value, i.e. with the LSB equal to 1, you can write =true:goal, but there is a simpler notation, just T:goal. Analogously, there is an instruction F:goal.

Finally, the always condition deviates from the standard syntax in that there is no reference value to compare to. The syntax is simply:

  A:goal

The ref value can be given in the same ways as the value of op, but global variables are not allowed, and the notation #var for the address of var is not supported.

Here's just an example:

  x >30:much

Execution goes to the label :much if x is greater than 30.
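The jump decision can be sketched in Python as a function that returns the label to jump to, or None when execution falls through. This is an illustration under assumptions: the full relation list is not reproduced above, so besides >, <= and =, which do appear in the text, the relations <, >= and # (read as "not equal") are assumed, and ref is limited to a numeric literal. The names jump_target, RELATIONS and JUMP are hypothetical.

```python
import operator
import re
from typing import Optional

# '>', '<=' and '=' appear in the text; '<', '>=' and '#' (not equal) are assumed.
RELATIONS = {
    ">": operator.gt, "<": operator.lt, ">=": operator.ge,
    "<=": operator.le, "=": operator.eq, "#": operator.ne,
}
# 'Rref:goal' with a numeric ref, e.g. '>30:much'. Two-character relations come first.
JUMP = re.compile(r"^(<=|>=|[<>=#])(-?\d+):(\w+)$")


def jump_target(instr: str, accu: int) -> Optional[str]:
    """Return the label that a conditional jump goes to, or None if it falls through."""
    if instr.startswith("A:"):                 # always: no reference value at all
        return instr[2:]
    if instr.startswith("T:"):                 # true: the LSB of accu is 1
        return instr[2:] if accu & 1 else None
    if instr.startswith("F:"):                 # false: the LSB of accu is 0
        return None if accu & 1 else instr[2:]
    m = JUMP.match(instr)
    if m is None:
        raise ValueError(f"not a jump instruction: {instr}")
    rel, ref, goal = m.groups()
    return goal if RELATIONS[rel](accu, int(ref)) else None


print(jump_target(">30:much", 42))   # much
```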

Preprocessing. Extended MPS

There are certain instructions, that are replaced by others by the compiler in a preprocessing step. This preprocessing allows you to write more readable code. We call the "instructions" processed in this way Extended MPS elements.

The preprocessing right now deals with conditional code (if-then-statements). It allows you to write something like this:

  if(x >y){ x +1 ->x } else { x -1 ->x }

with a hopefully obvious meaning. (Note that, as written, x steps away from y; to make x follow y, the condition would have to be x <y.) The extended MPS elements are if(, >y){, }, else, { and a second }. The principle is that all these elements are instructions in themselves, so they should be surrounded by spaces. A possibly unfortunate exception is the if( element, which doesn't need to be followed by a space (I thought that would be ugly).

To explain this, let's start with the simpler case without an else branch:

  if(x >y){ x +1 ->x }

First of all, the word "if" may be important to you: it signals that a condition will be evaluated. But the computer doesn't care, so the compiler removes if( completely. The conditional job is then done by skipping over the code within { and } if the condition is not satisfied. We do so by means of a skip label that we create for the purpose. So the above code is modified to:

  x <=y:s11 x +1 ->x :s11

(We use the possibility to put a label anywhere on a line in MPS.)

Before I show the complete translation scheme, I'll mention a variant of the conditional jump. A conditional jump looks like, for example, #y:target. Now, if you have introduced the if( element, it would be nice to have a balancing right parenthesis. So just put it there: #y):target. The compiler will remove the parenthesis for you.

Here, now, are the translations that the compiler does:

So now we can see how this example

  if(x >y){ x +1 ->x } else { x -1 ->x }

is translated:

  x <=y:S1 x +1 ->x A:E1 :S1 x -1 ->x :E1

I have tried to write the compiler in such a way that these conditional statements can be nested into each other, but I haven't tested it yet. (The trick is to put the created labels on a stack.)
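The translation scheme and the label stack can be sketched in Python as a small token-level preprocessor. This is an illustration under assumptions, not the actual compiler: labels are named S1, E1, S2, ... as in the if/else example, the negation table assumes that # means "not equal", and only the elements if(, R ref){, }, else and { are handled. The names preprocess, NEGATE and COND are hypothetical.

```python
import itertools
import re

# Negated relation for each condition. '#' is assumed to mean "not equal".
NEGATE = {">": "<=", "<=": ">", "<": ">=", ">=": "<", "=": "#", "#": "="}
# A condition element such as '>y){': relation, reference value, then '){'.
COND = re.compile(r"^(<=|>=|[<>=#])(.*)\)\{$")


def preprocess(source: str) -> str:
    """Translate if( ... ){ ... } [else { ... }] into plain MPS jumps and labels."""
    tokens = source.split()
    out: list[str] = []
    stack: list[tuple[str, int]] = []   # open blocks: ("then" | "else", label number)
    counter = itertools.count(1)
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("if("):
            tok = tok[3:]               # 'if(' is dropped; it may be glued to the next word
        if not tok:
            pass                        # a bare 'if(' produces nothing
        elif m := COND.match(tok):      # '>y){' -> '<=y:S1' (jump over the block)
            rel, ref = m.groups()
            n = next(counter)
            out.append(f"{NEGATE[rel]}{ref}:S{n}")
            stack.append(("then", n))
        elif tok == "}":
            kind, n = stack.pop()
            if kind == "then" and i + 1 < len(tokens) and tokens[i + 1] == "else":
                out += [f"A:E{n}", f":S{n}"]   # then-branch jumps past the else-branch
                stack.append(("else", n))
                i += 2                         # consume 'else' and the following '{'
            elif kind == "then":
                out.append(f":S{n}")
            else:
                out.append(f":E{n}")
        else:
            out.append(tok.replace("):", ":"))  # '#y):target' -> '#y:target'
        i += 1
    return " ".join(out)


print(preprocess("if(x >y){ x +1 ->x } else { x -1 ->x }"))
# x <=y:S1 x +1 ->x A:E1 :S1 x -1 ->x :E1
```

Because every block pushes its label number on a stack and every } pops one, nested conditions get distinct labels; in this sketch, if(x >0){ if(y <5){ y +1 ->y } } becomes x <=0:S1 y >=5:S2 y +1 ->y :S2 :S1.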

One may wonder how these extended MPS elements differ from standard MPS instructions. Well, standard MPS is context free in the sense that every instruction can be translated independently of its context. The only way the instructions communicate is through accu, index and the program counter (and possibly through the outer world via I/O instructions). For the extended MPS elements it is not so: the >0){ element has to know where the label to jump to is placed. This is handled by the compiler, which has to have an overview of the code.

The NEXT instruction

The NEXT instruction is in a sense a binary instruction, but neither operand is accu. Here's an example:

  NEXT(i):loop

i is a variable, and it is decremented by one. If i is then not equal to zero, execution goes to the label :loop. This instruction is directly based on the djnz instruction in Propeller assembler (decrement and jump if not zero), which is popular in many other assembler languages too. It is a simple way to make a loop with a defined number of turns. Note, though, that the variable i, if you want to use it in the loop, counts backwards, and it never reaches the value zero inside the loop.
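The djnz behavior can be modelled in a few lines of Python to show which values of i the loop body actually sees. This is a sketch, with next_loop_values as a hypothetical name; it also models the 32-bit wrap-around, which is why a starting value of 0 is rejected rather than simulated.

```python
def next_loop_values(n: int) -> list[int]:
    """Values of i seen by a loop body that ends with NEXT(i):loop, starting at i = n.

    Models djnz: decrement i (with 32-bit wrap-around) and jump back unless i became 0.
    """
    if n < 1:
        raise ValueError("n must be >= 1; i = 0 would wrap to $FFFFFFFF and loop 2**32 times")
    seen = []
    i = n
    while True:
        seen.append(i)                 # the loop body runs with this i
        i = (i - 1) & 0xFFFFFFFF       # NEXT(i): decrement ...
        if i == 0:                     # ... and fall through when zero is reached
            return seen


print(next_loop_values(3))   # [3, 2, 1]
```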

Index instructions and arrays

The following is an index instruction

  [op]

It loads the value of op into the system's index register, index. Normally, op is the name of a variable whose value is loaded into the index register, but op can also be something identifiable as a number. However, names of global variables are not allowed.

If the bracket is empty as [] then it is equivalent to [accu].

So index is loaded. More importantly, the instruction that immediately follows is modified, so that its address is augmented by index. In the normal situation, when that instruction refers to a variable, it doesn't access that variable, but the variable that lies index steps higher up in memory. A 'step' here is a step in the longword sense, i.e. we move up 32 bits, or equivalently 4 bytes, in memory.
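A minimal Python sketch of what the modified instruction does, assuming memory can be modelled as a list of longs and a symbol table mapping variable names to long addresses. The layout below (an array tab at long address 10) and the names indexed_load and byte_offset are hypothetical.

```python
LONG_BYTES = 4  # one index step = one 32-bit long = 4 bytes


def indexed_load(memory: list[int], symbols: dict[str, int], var: str, index: int) -> int:
    """What '[index] var' reads: the long that lies `index` steps above var."""
    return memory[symbols[var] + index]


def byte_offset(index: int) -> int:
    """The same displacement expressed in bytes."""
    return index * LONG_BYTES


# Hypothetical layout: an array 'tab' of four longs starting at long address 10.
memory = [0] * 16
memory[10:14] = [100, 200, 300, 400]
symbols = {"tab": 10}
print(indexed_load(memory, symbols, "tab", 2), byte_offset(2))   # 300 8
```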

So, in this way, we can move up in arrays of data. But where do the arrays come from?

Well, there are two ways of declaring arrays.

Writeback instructions

Writeback instructions deviate from the pattern, that all computations go through accu. These instructions are for saving space and time, because they allow you to do an operation in a single assembler instruction. Here's an example:

  +1,x<

1 and x are added, and the result is written back to x, which is marked by the "backwards arrow" '<'. You can do this for any of the binary operations. In particular you can do it for the load instruction:

  320,x<

will load the value 320 into x.
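The effect of a writeback instruction can be sketched in Python. Since the list of binary operations is not reproduced here, this sketch only models load, + and -; the function name writeback is hypothetical. The point it illustrates is that only the target variable changes while accu keeps its old value.

```python
import re

# 'op,x<': combine x with op and write the result back to x. Only load, '+'
# and '-' are modelled; the full list of binary operations is not shown here.
WRITEBACK = re.compile(r"^([+-]?)(\w+),(\w+)<$")


def writeback(instr: str, mem: dict[str, int]) -> dict[str, int]:
    """Apply one writeback instruction to a copy of mem; accu is left untouched."""
    m = WRITEBACK.match(instr)
    if m is None:
        raise ValueError(f"not a writeback instruction: {instr}")
    op, operand, target = m.groups()
    val = int(operand) if operand.isdigit() else mem[operand]
    mem = dict(mem)
    if op == "+":
        mem[target] = mem[target] + val
    elif op == "-":
        mem[target] = mem[target] - val
    else:
        mem[target] = val             # plain 'value,x<' is a load into x
    return mem


print(writeback("+1,x<", {"x": 4, "accu": 9}))   # {'x': 5, 'accu': 9}
```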

These instructions should be used when needed for speed and small program size, as they are not terribly readable. Note also that accu is never affected, so you can't expect the result of the computation to be available in accu.

There is one more type of one-assembler-instruction instruction, the n>x instruction, that we will encounter in the next section.

Function calling

If we want to call a function fun with two arguments x and y, and then want to deposit the result in z, we write this as:

  ( x , y , )fun ->z

In mathematics we would write z = f(x,y). Here, again, we have the "chronological order" of MPS. We have to collect the arguments to the function before we can call it, and we have to call the function before we can store or use the result.

The spaces in the MPS code are important, because both '(' and ',' are actually instructions. The principle is that we pass arguments to functions via a standardized set of variables called arg0, arg1, ..., arg5. The '(' resets a counter i to 0 in the compiler. ',' stores the current value of accu into the variable argi and increments i. Most of this happens only in the compiler: at run time, ',' is just a store instruction, and '(' is nothing at all. Finally, when all the necessary arguments have been brought into the argi variables, we can call the function with the ')fun' instruction, which is a subroutine call to fun.
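The compile-time bookkeeping for '(' and ',' can be sketched as a token translator in Python. The output notation is hypothetical (->argN for the store and call fun for the subroutine call); the real compiler emits Propeller assembler, and compile_call is an illustrative name.

```python
def compile_call(tokens: list[str]) -> list[str]:
    """Translate '(' / ',' / ')fun' as described: '(' resets the argument counter,
    ',' becomes a store of accu into arg_i, and ')fun' becomes a subroutine call."""
    out: list[str] = []
    i = 0
    for tok in tokens:
        if tok == "(":
            i = 0                        # compile time only: emits nothing
        elif tok == ",":
            if i > 5:
                raise ValueError("only arg0 ... arg5 are available")
            out.append(f"->arg{i}")      # at run time: a plain store
            i += 1
        elif tok.startswith(")"):
            out.append(f"call {tok[1:]}")
        else:
            out.append(tok)              # ordinary instructions pass through
    return out


print(compile_call("( x , y , )fun ->z".split()))
# ['x', '->arg0', 'y', '->arg1', 'call fun', '->z']
```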

This mechanism means that you can have expressions in the function call. The system just grabs the result in accu and transfers it as an argument. But you can't have function calls in these expressions. Then you would have two '('-instructions before the first ')'-instruction, and the system makes no attempt to find out what that would mean.

The expression can also be empty for the first argument in the function, if you know that accu contains the right value.

If you want to pass only one argument to a function, you can just as well use accu to pass that argument. Then the call is simply:

  ( x )fun

The '(' instruction doesn't do anything here, but it doesn't do any harm either. Some functions take no argument at all, and in that case I have used the convention to leave out the '(' and just write:

  )fun

Here ')' simply symbolizes the subroutine call.

A word of warning now. All functions use the same variables arg0 ... arg5. So if you call a function in a function, the new call will overwrite your arguments. So if you need them after that second call, you have to rescue them before the second call.

The variables argi can be used directly, but if you want to move their values over to other variables, there is a fast instruction for this:

  3>y

is equivalent to arg3 ->y, but it is realized as a single assembler instruction.

As in most languages, a function returns a single result. This is done through accu, which is loaded with the result just before the return instruction. If you wish to return more results, you can do so through the argi variables, and in that case the n>x instructions may be useful.

In a language like Java, the standard method for returning more than one result is to pack the results into an instance of a class and return a reference to that instance. (Java tries to make the concept of pointers invisible: one returns the 'name' of the instance, though technically this name may be a pointer.)

Another popular method of returning results is to let the caller indicate where the result should be deposited. If we, for example, want a result as a string, we can declare that string, say as txt. Then we send the address of txt to the function as #txt. Now the function knows where to put the result. Another question is how to do it. The next section is about that, and that section is about arrays again.

The symbol L0

The symbol L0 (Local zero) represents an array which fills the whole local memory of a cog. (If we load a process proc into a cog, the name of the first word in the cog must be proc, because after a cog has been loaded with instructions, the only thing the system can do is to jump to the first word. Hence the address of the array L0 is in this case the same as the address of proc.) Now, assuming that we know the address adr = #var, we can read the value of var with

  [adr] L0

or we can write to it with

  [adr] ->L0

This gives us a mechanism to deposit our result in the right place.
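The L0 mechanism can be sketched in Python by modelling a cog's local memory as a list of 512 longs, with L0 at address 0. The helpers read_L0, write_L0 and deposit, and the address 100 chosen for the caller's txt, are hypothetical; the sketch only shows how a callee that receives adr = #txt can fill successive longs starting at that address.

```python
COG_LONGS = 512   # a Propeller cog has 512 longs of local memory
L0 = 0            # L0 starts at cog address 0, where the process name sits


def read_L0(cog: list[int], adr: int) -> int:
    """'[adr] L0': read the long at cog address adr."""
    return cog[L0 + adr]


def write_L0(cog: list[int], adr: int, value: int) -> None:
    """'[adr] ->L0': write a 32-bit value to the long at cog address adr."""
    cog[L0 + adr] = value & 0xFFFFFFFF


def deposit(cog: list[int], adr: int, values: list[int]) -> None:
    """What a callee given adr = #txt might do: fill txt, txt+1, ... with results."""
    for k, v in enumerate(values):
        write_L0(cog, adr + k, v)


# Hypothetical: the caller's string txt starts at cog address 100.
cog = [0] * COG_LONGS
deposit(cog, 100, [ord("O"), ord("K")])
print(read_L0(cog, 100), read_L0(cog, 101))   # 79 75
```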

In-line assembler code

A line starting with a semicolon is copied directly into the assembler code as it is. Hence, it is supposed to be a line written in Propeller assembler. Such lines are used where something can't be expressed in MPS. Writing assembler lines like this may require extra effort. One case is suffixing: if you use a local variable in MPS, the suffix is added automatically, but in in-line assembler code you have to add the suffix manually.


The std.mpo object file

The object file std.mpo is a little like the Java package java.lang. It contains standard technical and mathematical concepts that can be used by any program. The functions in std.mpo are written in MPS, except for a few lines of in-line assembler. My aim has been that MPS should be efficient enough that I wouldn't need to go to assembler. The functions in std.mpo fall into a few categories, which we deal with in separate sections.

Pin manipulation functions

Propellers and similar processors communicate with the external world through 'pins' on the processor chip. The Propeller has 32 such pins, and they are exclusively digital. They can be set to be input pins, which measure the input voltage and interpret it as a 0 or a 1; in this mode they have high impedance. Or they can be output pins, with "totem-pole configuration", so that they can deliver either a high voltage (1), i.e. 3.3 volts, or zero voltage (0), in both cases with low impedance. As there are 32 pins, they correspond directly to the bits of a 32-bit number. For this, there are the following functions:

If you want to read all the inputs simultaneously, there is a variable ina which contains the whole bit pattern as a 32-bit word.

In the same way, there is a variable outa, where you can set all 32 output pins, by writing a 32 bit value to outa. In most cases, this is not very practical. You want to leave some bits unaffected, when you send out information to the pins. This is handled readily with the outpins function.

There is also a variable dira which controls which pins are output pins. setout, setin and setdir write to that variable.

ina, outa and dira are in reality physical registers, but they are mapped into the address space of each cog's local memory. There are other such registers, like the real-time counter, cnt.

As these registers are mapped into each cog's address space, you can set them in each cog, and you can set different values in different cogs. The rules then are:
  1. If some cog thinks that a pin should be an output pin, then it is an output pin.
  2. If some cog thinks that an output pin should be 1, then it is 1.
ina, cnt and others are passive input registers, and you can read their values from any cog.
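The two rules can be sketched in Python as a combination of the cogs' dira and outa words. This follows my understanding of the Propeller's documented behavior, in which a cog's outa only counts for pins that the same cog has made outputs. The outpins model is a hypothetical masked write, since the function's exact signature isn't given here, and effective_pins is an illustrative name.

```python
MASK32 = 0xFFFFFFFF


def effective_pins(cogs: list[tuple[int, int]]) -> tuple[int, int]:
    """Combine each cog's (dira, outa) into the chip-wide direction and output words."""
    direction = 0
    output = 0
    for dira, outa in cogs:
        direction |= dira            # rule 1: any cog can make a pin an output
        output |= dira & outa        # rule 2: any cog can drive one of its outputs high
    return direction & MASK32, output & MASK32


def outpins(outa: int, value: int, mask: int) -> int:
    """Hypothetical masked write: change only the outa bits selected by mask."""
    return ((outa & ~mask) | (value & mask)) & MASK32


print(bin(effective_pins([(0b0011, 0b0001), (0b0100, 0b0110)])[1]))   # 0b101
```

In the usage line, the second cog's outa bit 1 is ignored because that cog has not made pin 1 an output.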

Mathematical functions

These functions are:

Serial communication

Serial communication here follows a standard called RS232, which has officially been renamed EIA232. It is serial communication without a clock signal, so communication can take place on a single line plus ground. The line is high when nothing is going on. When the line goes low, this is the beginning of a start pulse. It is followed by a number of data pulses, usually 8, which come at fixed time intervals. Finally there are one or two stop pulses, which take the line high again. The fixed time interval is an important parameter in the protocol. It is usually given by its inverse, the number of pulses per second, the so-called baud rate, for which there is a long list of standard values. The second parameter is the computer pin on which data come in or go out.
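The timing and framing can be sketched in Python. The sketch assumes an 80 MHz system clock (the clock implied by the constant 80000000/9600 mentioned in the Myra part), data bits sent least significant bit first, as is usual for this kind of serial line, and one stop bit by default. The names bit_ticks and frame are hypothetical and are not the std.mpo functions.

```python
def bit_ticks(clock_hz: int, baud: int) -> int:
    """Clock ticks per pulse: the inverse of the baud rate, in system clock cycles."""
    return clock_hz // baud


def frame(word: int, bits: int = 8, stop_bits: int = 1) -> list[int]:
    """Line levels for one frame: start pulse (0), data LSB first, stop pulse(s) (1)."""
    data = [(word >> k) & 1 for k in range(bits)]
    return [0] + data + [1] * stop_bits


print(bit_ticks(80_000_000, 9600))   # 8333
print(frame(0x41))                   # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
```

The bits parameter also covers the 9-bit variant, where the ninth bit can mark a command.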

The standard format is, as mentioned, 8 bits (one byte), but std.mpo also has functions for 9 bits and 32 bits. The 9-bit functions are intended for cases where one wants to transmit both data and commands: the data words may contain 8 bits, and then the ninth bit is needed to distinguish between commands and data.

When using this protocol, it is important that the receive function is called before the data are expected. Otherwise, data are lost. If called early enough, the receive function will wait for the data. To make sure that the receive function is called early enough, it often has to reside in its own process. When the data are received, they are quickly deposited somewhere where other processes can read them, and then the process returns to the receive function, to be ready for new data.

Here are the RS232 functions:

The 8-bit functions are further supported by buffering functions. They may be used when the user of the data temporarily consumes them at a slower pace than they are delivered.

When the buffering functions are used, the receiving process deposits data in a buffer with a call )wbuff, with the received data in accu. Then some other process can get the data in the right order and without loss by making a call )rbuff. The buffer is of limited size, however, so on average the using process can't be slower than the receiving process. To use these functions, one has to declare the global variables

  iw ir buff[128]

The buffer size is 128 words, each holding one received byte, so the buffer holds 128 bytes. The size can be changed to another power of 2, but this requires a slight change of wbuff and rbuff.
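The idea behind wbuff and rbuff can be sketched in Python as a ring buffer with a write index iw, a read index ir and a power-of-two size, so that indices wrap with a bit mask instead of a division. This is a sketch under assumptions: the real rbuff may wait for data, while this one returns None when the buffer is empty, and there is no overflow check, so a reader that falls more than the buffer size behind loses data.

```python
from typing import Optional


class RingBuffer:
    """Sketch of the wbuff/rbuff idea with indices iw and ir and a masked buffer."""

    def __init__(self, size: int = 128):
        if size < 1 or size & (size - 1):
            raise ValueError("size must be a power of 2")
        self.buff = [0] * size
        self.mask = size - 1   # e.g. 127 for 128 slots
        self.iw = 0            # next slot to write
        self.ir = 0            # next slot to read

    def wbuff(self, value: int) -> None:
        self.buff[self.iw & self.mask] = value
        self.iw += 1           # no overflow check: a slow reader gets overwritten data

    def rbuff(self) -> Optional[int]:
        """Return the oldest unread value, or None if the buffer is empty."""
        if self.ir == self.iw:
            return None
        value = self.buff[self.ir & self.mask]
        self.ir += 1
        return value
```

Using a power of two is what makes the cheap mask possible, which is presumably why changing the size requires a slight change of wbuff and rbuff.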

Analog input

These functions work for a specific A/D converter, Microchip's MCP3208, which is connected to the Propeller chip on given pins. The analog data are brought to the computer in serial form according to a protocol called SPI, which is also used for ordering conversions. The converter has 8 channels, and data are converted to 12-bit numbers. Two functions handle this:

Number conversions

These two functions construct strings for printing the values of numbers. The numbers are always viewed as 32-bit integers, and they are converted to decimal or hexadecimal form. In the decimal version, negative numbers are preceded by a minus sign.
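The two conversions can be sketched in Python. The names dec_string and hex_string are hypothetical, and two details are assumptions, since the std.mpo functions aren't listed here: the hexadecimal form is written as eight uppercase digits, and the result is a Python string rather than an MPS string in memory.

```python
def to_signed32(word: int) -> int:
    """Interpret the low 32 bits of word as a two's-complement integer."""
    word &= 0xFFFFFFFF
    return word - (1 << 32) if word & 0x80000000 else word


def dec_string(word: int) -> str:
    """Decimal text for a 32-bit word; negative numbers get a leading '-'."""
    n = to_signed32(word)
    m = abs(n)
    digits = []
    while True:
        m, d = divmod(m, 10)          # peel off the least significant digit
        digits.append("0123456789"[d])
        if m == 0:
            break
    return ("-" if n < 0 else "") + "".join(reversed(digits))


def hex_string(word: int) -> str:
    """Eight hexadecimal digits for the raw 32-bit pattern (no sign handling)."""
    word &= 0xFFFFFFFF
    return "".join("0123456789ABCDEF"[(word >> s) & 0xF] for s in range(28, -4, -4))


print(dec_string(0xFFFFFFFF), hex_string(255))   # -1 000000FF
```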

===========================================================

The Myra language is a stack-based, higher-level language that can be compiled into Propeller (TM) assembler code. Propeller is a trademark of Parallax, the company that developed and manufactures the Propeller processors. As a language it is fairly universal, but some of its features make use of features of the Propeller assembler language. Above all, the Propeller is a parallel processing computer with 8 separate processors, called cogs, communicating via a common memory.

The language is named after the Swedish word 'myra', which is 'ant' in English, as it is a stack based language. (Ants build stacks in Swedish and heaps in English, but these words are used more or less interchangeably in computer science)

Compilation

A Myra program called 'example.myr' can be compiled using

java Myra example   (no need to write '.myr')
This generates a number of .spin files, one for each cog used in the system. The filenames are derived from a system name, given on a system-line in the Myra program. The files are then named
systemname1.spin
systemname2.spin
systemname3.spin
etc.
They are assembled and loaded from the Propeller environment, by opening the first of these files. The Propellent environment can also be used.

The generated code is slower than manually made assembler code, as there is some overhead for managing the stack, but the code is much faster than spin code.

Stack based programs

A stack is a heap, or stack, of values placed on top of each other. You can only enter data at the top of the stack, and only the value currently on top of the stack is visible. Normally, when you use a value, it is simultaneously lifted off the stack, making the value under it visible. The normal operations are

The functions can be normal operations like + or *, or they can be any function with a name like sin. This is the elegance with stack based programs: operations and functions look the same. This also means that there is a standardized way of supplying arguments to functions, and retrieve the result. Unlike most programming languages, a function is free to deliver more than one value as a result.

The language has four special stack operations:

Stack machines inherently work with postfix notation, also called reverse Polish notation (a variant of the notation introduced by Łukasiewicz). This means that all arithmetic expressions can be written without parentheses, and these expressions can be evaluated without parsing, which simplifies the compiler. The presence of a parenthesis in normal code implies that there is a need for a temporary variable. In a stack-based system you can avoid temporary variables and use the stack instead. You can develop it into an art to write stack-based programs with as few variables as possible, but beyond some limit this can be harmful to the readability of the code.
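Postfix evaluation needs nothing but a stack, which a short Python sketch can show. It is not the Myra compiler (which generates assembler rather than interpreting), and eval_rpn and OPS are hypothetical names; the operators are limited to +, -, * and >>, the ones used in the example program in the next section.

```python
# Binary operators; each pops two values and pushes one result.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    ">>": lambda a, b: a >> b,
}


def eval_rpn(code: str, variables: dict[str, int]) -> list[int]:
    """Evaluate postfix code on a stack and return the final stack (top last)."""
    stack: list[int] = []
    for tok in code.split():
        if tok in OPS:
            b = stack.pop()               # the top of the stack is the right operand
            a = stack.pop()
            stack.append(OPS[tok](a, b))
        elif tok.lstrip("-").isdigit():
            stack.append(int(tok))        # a number is pushed
        else:
            stack.append(variables[tok])  # a variable's value is pushed
    return stack


print(eval_rpn("3 4 + 2 *", {}))   # [14]
```

The infix expression (3 + 4) * 2 needs parentheses; the postfix form 3 4 + 2 * does not, because the intermediate 7 simply waits on the stack.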

Example program

As an example to look back to, when reading the following paragraph, here's a piece of code:

  system colors
  global x y
  exec mix an

  process mix
  red = 3
  green = 6
  blue = 5
  black = 7
  colormask = 7
  tred
  tgreen
  tblue
  trg
  ttot = 40960
  begin
  colormask setdir
  :loop
  y ttot * 12 >> ->tblue
  ttot tblue - ->trg
  x trg * 12 >> ->tred
  trg tred - ->tgreen
  red tred [show]
  green tgreen [show]
  blue tblue [show]
  black 0 [show]
  go(loop)

  function show
  tshow
  begin
  ->tshow colormask outpins
  tshow wait
  return
  \

  process an
  chx = 0
  chy = 4
  begin
  init_analog
  :loop
  chx analog 4 >> ->x
  chy analog 4 >> ->y
  go(loop)
  \

The program sends out different colors to an RGB LED. The LED is connected to pins 0, 1 and 2 on the Propeller, and it is connected so that a diode is on when the corresponding Propeller output is zero. Hence 7 means that all three diodes are off, so the color is "black". The color is controlled by two potentiometers, whose values are read in the second process. The light is controlled in the function show, which sets up a "pure" color and then lets it stay on for a specified time. The times are set in the process mix, which divides a total time of 40960 ticks among the colors, depending on the potentiometer settings.
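The arithmetic in the loop of process mix can be mirrored in Python to check how the total time is shared out. This is a sketch: mix_times is a hypothetical name, and x and y simply act as fractions of 4096 because of the 12 >> steps; what range the analog readings actually have is not stated here.

```python
TTOT = 40960  # total cycle time in ticks, as in the example program


def mix_times(x: int, y: int, ttot: int = TTOT) -> dict[str, int]:
    """Mirror process mix: blue gets y/4096 of the total, red gets x/4096 of the
    remainder, and green gets what is left."""
    tblue = (y * ttot) >> 12      # y ttot * 12 >> ->tblue
    trg = ttot - tblue            # ttot tblue - ->trg
    tred = (x * trg) >> 12        # x trg * 12 >> ->tred
    tgreen = trg - tred           # trg tred - ->tgreen
    return {"red": tred, "green": tgreen, "blue": tblue}


print(mix_times(2048, 2048))   # {'red': 10240, 'green': 10240, 'blue': 20480}
```

Because green gets exactly what is left, the three times always add up to the total, so the loop period stays constant, apart from the short black interval, which show is called with a wait time of 0.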

General program structure

A Myra program has the following separate parts:

Structure of a process

A process starts with a process keyword, followed by the process name. This name should also appear in the exec part of the program, as mentioned above. (If it does not, this process will never be executed, and it will not even be compiled.)

After this, variable declarations follow, each on one line, with a variable name optionally followed by an '=' sign and a value. These variables are of type long and are stored in the cog's local memory. There are also provisions for declaring arrays; we will come back to that later. Values are written with the same conventions as in Propeller assembler, i.e. a '$' sign means hexadecimal notation, '%' means binary notation, and "a" means a character value, i.e. the ASCII code for the character 'a' is stored. Symbols like '|<20', which means a 1 in bit position 20 (counting from 0), can also be used. Also, expressions involving constants, like 80000000/9600, can be used.
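The value notations can be illustrated with a small Python evaluator. It is a sketch covering only the forms listed above, with constant expressions limited to integer division; the assembler itself supports more operators. The name propeller_literal is hypothetical.

```python
def propeller_literal(text: str) -> int:
    """Value of a constant written in the Propeller-assembler notation described above."""
    if len(text) == 3 and text[0] == text[2] == '"':
        return ord(text[1])                         # "a" -> ASCII code of a
    if "/" in text:                                 # e.g. 80000000/9600, integer division
        num, den = text.rsplit("/", 1)
        return propeller_literal(num) // propeller_literal(den)
    if text.startswith("$"):
        return int(text[1:], 16)                    # hexadecimal
    if text.startswith("%"):
        return int(text[1:], 2)                     # binary
    if text.startswith("|<"):
        return 1 << int(text[2:])                   # a single 1 bit, counting from bit 0
    return int(text)                                # plain decimal


print(propeller_literal("80000000/9600"))   # 8333
```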

After the begin keyword follows the executable code. This code can call functions, and it can contain macros. These can be declared locally in the process itself, or externally in separate files. In the former case, the functions follow the executable code of the process, and their scope is limited to that process. In the latter case, the functions have to be referred to with a 'load from'-section.

A local function declaration starts with a function keyword, followed by the function name. The rest of the function declaration is similar to the process declaration, except that the function must finish with a return statement, while a process must not contain a return statement. (Processes are either infinite loops, or just terminate.)

A 'load from'-section starts with the load from keyword, followed by a filename within parentheses. The functions to be loaded are then listed, each on its own line starting with an asterisk (*). If functions are loaded from several files, each file requires its own 'load from'-section.

It is recommended that these external functions are named in object oriented style with a dot, like display.write. As Propeller assembler does not accept dots in symbol names, they are replaced with underscores in the assembler code.

A process declaration ends with a \ (backslash). The backslash of the last process ends the whole program. These backslash signs are crucial for correct compilation.

Empty lines are ignored. Lines starting with '--' are comments, and are ignored by the compiler. If a line contains '--', the rest of the line after that is ignored as a comment.

External function files

External functions are written in external function files, preferably with a '.myo' extension. These files don't contain any processes, and are thus not executable. They merely contain declarations of functions.

It is recommended that functions are collected into .myo files in such a way that each file represents some kind of object, like a sensor, a display, something simulated, etc.

Apart from functions (or methods, in object oriented terminology), such an object file can also contain variables that represent the state of the object. Such variables are called fields. They are declared at the beginning of the file, between a fields line and an endfields line. These variables can be assigned initial values, like local variables in functions. They are stored in cog memory and are reachable from all functions of the object, but they are not shared from one process to another. Fields should not be referenced from outside the object functions; instead they should be accessed through so-called 'put' and 'get' functions. The natural way for these functions to communicate with the outside world is through the stack.

The functions in a .myo file may in turn refer to other functions, in other external files or in their own file. If so, the function body should be followed by one or more 'load from'-sections, as in the main program.

It is recommended that an object oriented style dot notation is used for the functions of an object. An object representing a display could have functions called display.init, display.writetext etc. The dots in these names will be replaced with underscores in the assembler code.

Instructions

The instructions are found in the executable part of processes and functions. Instructions are separated with white spaces, or with line feeds if instructions are written on several lines. Except for what is mentioned about conditional statements later, the programmer is free to divide his code into lines as he wishes.

Instructions are interpreted by the compiler in the following way:

The "unpop" instruction (') deserves a comment. A write instruction, ->x, saves the value on the top of the stack in the variable x, but removes it from the stack. The combination ->x ' can then be seen as a modified write instruction that keeps the stored value on the stack for later use.
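A minimal Python model may make the unpop instruction clearer. It assumes, without the text saying so, that ' simply moves the stack pointer back up: a pop only lowers the pointer (as in the stack code described under "Adding assembler functions"), so the written value is still in memory and can be made reachable again. ArrayStack and write are hypothetical names.

```python
class ArrayStack:
    """An array plus a stack pointer; popping only moves the pointer."""

    def __init__(self, size: int = 16) -> None:
        self.cells = [0] * size
        self.sa = -1                  # index of the top item; -1 means empty

    def push(self, value: int) -> None:
        self.sa += 1
        self.cells[self.sa] = value

    def pop(self) -> int:
        value = self.cells[self.sa]
        self.sa -= 1                  # the value stays in cells, it is just unreachable
        return value

    def unpop(self) -> None:
        self.sa += 1                  # ' : make the last popped value the top again


def write(stack: ArrayStack, variables: dict, name: str, keep: bool = False) -> None:
    """->name, or ->name ' when keep is True."""
    variables[name] = stack.pop()
    if keep:
        stack.unpop()
```

For example, after `s.push(5)` and `write(s, v, "x", keep=True)`, x holds 5 and 5 is still on top of the stack.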

Macros

A macro differs from a function in that it is never called, i.e. there are no jumps to a macro. Instead, the macro code is substituted for the macro call directly in the code before compilation. This gives faster code, as the jumps to and back from the macro are avoided. But there is a penalty in memory usage if the same macro is used more than once: if the macro is used n times, the macro code will appear in the compiled program n times.

A macro is defined in a macro definition. Its first line holds the keyword macro, followed by the macro name. The second line holds the macro code. Hence the macro code cannot be longer than what fits on one line. Macros are supposed to be small.

The 'call' of a macro (which isn't really a call) looks like a function call, i.e. the macro name placed between square brackets. The compiler substitutes the macro code for the macro name and the two brackets.

Macro definitions can be placed in the main program file (the '.myr' file), or in object files ('.myo' files). There, they can be used either by the main program or by the object functions. However, the main program can use such macros only if the object file is referred to at all, through the loading of some object function. For macros, no 'load from'-operation is necessary; the macros will be found once the system has had reason to open the object file.

There is also a special macro file for universally useful macros, called macro.myo. The macros in this file are always available.

Name uniqueness is as important for macros as it is for functions. Hence it is recommended that macros in object files are named with the same dot notation as the functions.

A macro can use a macro, but currently this is limited to two levels, i.e. a macro can use a macro, but that macro cannot use a macro in turn.
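The textual substitution, including the two-level limit, can be sketched in Python. expand is a hypothetical helper, not the Myra preprocessor. It leaves names that are not macros (ordinary function calls like [show]) and constructs containing a colon (like [all:i]) untouched, and in this sketch a macro nested deeper than two levels is simply left unexpanded, which may not be what the real compiler does.

```python
import re

# [name] with no ':' inside, so constructs such as [all:i] are left alone
MACRO_CALL = re.compile(r"\[([^\[\]:]+)\]")


def expand(code: str, macros: dict, depth: int = 2) -> str:
    """Textually replace [name] by the macro code of name.

    depth=2 mimics the two-level limit: macros used inside a macro are
    expanded once more, but macros inside those are left as they are.
    """
    def substitute(match):
        name = match.group(1)
        if name not in macros:
            return match.group(0)          # an ordinary function call: keep it
        body = macros[name]
        return body if depth == 1 else expand(body, macros, depth - 1)

    return MACRO_CALL.sub(substitute, code)


if __name__ == "__main__":
    macros = {"double": "2 *", "quad": "[double] [double]"}
    print(expand("x [quad] ->y", macros))  # x 2 * 2 * ->y
```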

Technically macros don't add anything to the system functionality. There is no difference between using the macro concept, and substituting the macro code yourself. Macros are there to enhance the readability of the code. You substitute a piece of technical code with a name, which reflects what the code is good for.

Conditional statements

Instructions inscribed between curly brackets {...} are executed only if the condition preceding the left bracket is satisfied. The condition is based on the value on the top of the stack, at the time when the last ?-instruction was executed.

The available conditions are as follows:

If the condition is not satisfied, the instructions are executed as nop-statements, as this is the way Propeller assembler works.

A line can only contain one curly-bracket-pair, but that pair may contain as many instructions as one wishes. If there isn't space for all the instructions on one line one can continue on the next line with a new curly-bracket-pair, but the condition has to be repeated.

The principle for conditional statements here mimics what happens in assembler and in the computer itself, but it is also quite elegant. You can write something that works like if-then-else constructs without letting the compiler construct lots of jumps. But there is a very important pitfall: instructions inside curly brackets can change the status registers. The ?-instruction does that, but you will probably soon learn to avoid ?-instructions inside curly brackets. The real problem is with functions and assembler functions. Remember that arithmetic instructions are executed as assembler functions. Most of them don't change the status registers, but multiplication and division do, and so do many other assembler functions.

If the status registers are changed anywhere between a ?-instruction and a condition that is supposed to use it, things don't work the way they should. If it happens inside a curly bracket pair, the rest of the instructions in there will not be executed. If there is a curly bracket pair on the next line with the opposite condition (an 'else' branch), the code in there may be executed, even though it shouldn't be.

As a remedy, there is a version of ? called ?s, which stores the stack value that set the status registers in a fixed place. We call this the status variable. You can then restore the status registers at any time with the ! instruction. Place this instruction after any instruction that may have changed the status registers. Here's an example:
x ?s ={a b c * ! + ->z}
If x is zero, z is computed as a+bc. The '!' protects against the multiplication, which might have changed the status registers.
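The interplay of ?s, the curly brackets and ! can be simulated with a toy interpreter for a handful of tokens. This is a Python sketch under stated assumptions, not the compiler: it assumes that ?s consumes the tested value, that the multiplication routine happens to clear the zero flag, and that ! is executed even inside a block whose condition has just been destroyed (otherwise it could never repair it). Only the = (zero) condition is modelled, and run is a hypothetical name.

```python
def run(program: str, variables: dict) -> dict:
    """Interpret a tiny subset of Myra to show ?s, ={...} and !.

    Supported tokens: numbers and variable names (pushed), * and +,
    ?s, !, ->name, and a ={ ... } block that runs only while the zero flag is set.
    """
    stack = []
    zero_flag = False      # stands in for the Propeller Z flag
    status_var = 0         # the status variable written by ?s
    inside = False         # True between ={ and }
    for token in program.replace("={", " ={ ").replace("}", " } ").split():
        if token == "={":
            inside = True
        elif token == "}":
            inside = False
        elif token == "!":
            zero_flag = status_var == 0     # restore the flag (assumed unconditional)
        elif inside and not zero_flag:
            continue                        # condition false: executed as a nop
        elif token == "?s":
            status_var = stack.pop()        # assumption: ?s consumes the tested value
            zero_flag = status_var == 0
        elif token == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
            zero_flag = False               # assumption: the multiply routine clears Z
        elif token == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)             # addition leaves the flags alone
        elif token.startswith("->"):
            variables[token[2:]] = stack.pop()
        elif token in variables:
            stack.append(variables[token])
        else:
            stack.append(int(token))
    return variables
```

With x = 0, a = 1, b = 2, c = 3 the example stores z = 7; without the ! the multiplication clears the flag and z is never written.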

All this is OK, unless a function that you call also uses the ?s-instruction. That will destroy the status variable. As a remedy, there are two more instructions, called S and R. S saves the status variable, and R restores it. In fact, they push and pop the values on a small stack. It only has a height of two for now, but that should be sufficient.

Now the rule is: If a function uses a ?s instruction, it should start with an S instruction and end with an R instruction. Assembler functions give no problems; they don't use the ?s-instruction.
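A small Python model of S and R, assuming a save area that is two entries deep as stated above. What the real code does when a third S is issued is not described, so the sketch simply raises an error; StatusSave and helper are hypothetical names.

```python
class StatusSave:
    """The status variable plus a save area two entries deep (S pushes, R pops)."""

    DEPTH = 2

    def __init__(self) -> None:
        self.status_var = 0      # written by ?s
        self._saved = []

    def save(self) -> None:      # S
        if len(self._saved) == self.DEPTH:
            raise OverflowError("S nested more than two deep")  # sketch behaviour only
        self._saved.append(self.status_var)

    def restore(self) -> None:   # R
        self.status_var = self._saved.pop()


def helper(cog: StatusSave) -> None:
    """A function that uses ?s itself, so it starts with S and ends with R."""
    cog.save()               # S
    cog.status_var = 9       # its own ?s overwrites the caller's status variable
    cog.restore()            # R
```

A caller that set the status variable to 0 before calling helper still finds 0 afterwards, so its ! instructions keep working.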

The codes for ?s, !, S and R are quite small. In fact ?s is no bigger than the normal ?.

(It would of course be tempting to use a more consistent stack concept for all this. The problem is that there can be several !-instructions for each ?s-instruction, and it is difficult for the compiler to know which ?s- and !-instructions belong together).

Booleans

The mechanism with conditional statements, as described above, is both elegant and powerful. But it makes it difficult to combine several conditions with boolean operators. To help with this, the notion of boolean variables is introduced. Boolean variables are either true or false. These values are represented as the integer 1 (a 1 only in the least significant bit) and 0 (all bits zero). With this representation, the operators &, or and xor act on booleans as one would expect. There are two instructions that generate boolean values on the stack:

  1. Z, which replaces the stack top with true, if the current value on the stack was exactly zero
  2. G, which replaces the stack top with true if the current value on the stack was greater than zero
Note that Z also serves as a complement function, which replaces true with false, and false with true.
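A sketch in Python of Z and G, and of how they combine with & into a compound condition. is_zero, is_positive and in_open_range are hypothetical names; the Myra sequence in the docstring assumes that - subtracts the top of the stack from the item below it, as in ttot tblue - ->trg.

```python
def is_zero(value: int) -> int:
    """Z: 1 (true) if the value is exactly zero, otherwise 0 (false)."""
    return 1 if value == 0 else 0


def is_positive(value: int) -> int:
    """G: 1 (true) if the value is greater than zero, otherwise 0 (false)."""
    return 1 if value > 0 else 0


def in_open_range(x: int, low: int, high: int) -> int:
    """1 if low < x < high, built from G and &.

    Roughly, in Myra stack notation: x low - G high x - G &
    """
    return is_positive(x - low) & is_positive(high - x)
```

Because true is exactly 1 and false exactly 0, bitwise & and ^ give the logical results, and applying is_zero twice turns any non-zero value into true.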

Arrays

An array is defined simply by writing several values separated by commas after the equals sign:

M = 1,2,3,1,2,3,1,2,3
declares an array M with 9 elements. A string after the equals sign, like
message = "Hello world"
is interpreted as
message = "H","e","l","l","o"," ","w","o","r","l","d",0
"H" means the ASCII-code for the character H. The final 0 can be seen as the ASCII-code for the null-character. Hence we represent a so called null-terminated string.

Arrays can also be declared with a bracket-notation.
area[256]
reserves 256 long words, i.e. 1024 bytes for the array area.

Arrays can be addressed in two ways:
i M[]
will push the i:th element of the array M on the stack. The enumeration starts with zero. Hence
4 message[]
will push the ASCII-code for 'o' on the stack.

The other alternative is used to address an arbitrary array. It uses the function @.
address i @
loads the i:th element of the array starting at the address given by address onto the stack. If we want to load the first character in message, we do the following:
#message 0 @
This loads the "H" character on the stack.
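The two array forms can be modelled in Python with memory as a flat list. This is a sketch: string_array, place and at are hypothetical helpers, and addresses are counted in longs; whether Myra's @ scales the index to byte addresses is not stated in the text.

```python
def string_array(text: str) -> list:
    """message = "..." : one long per character plus a terminating 0."""
    return [ord(ch) for ch in text] + [0]


def place(memory: list, address: int, values: list) -> int:
    """Store an array at address and return that address (plays the role of #name)."""
    memory[address:address + len(values)] = values
    return address


def at(memory: list, address: int, i: int) -> int:
    """address i @ : element i of the array starting at address."""
    return memory[address + i]   # addresses counted in longs in this sketch


if __name__ == "__main__":
    memory = [0] * 64
    message = string_array("Hello world")
    start = place(memory, 10, message)   # #message
    print(chr(message[4]))               # 4 message[]  -> o
    print(chr(at(memory, start, 0)))     # #message 0 @ -> H
```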

Serial execution

The following code after the exec keyword:

exec proc1 proc2 sema/proc3a/proc3b proc4
makes the processes proc3a and proc3b alternate in one and the same cog. They are controlled by the variable sema, which should also be declared as a global variable. When sema has the value 0, as it always does initially, proc3a is executed in cog number 3. If sema switches to 1, the cog is instead loaded with proc3b. The variable sema can then switch back to 0, or to higher values, and then other processes are executed, if they are listed after further slashes ("/").

When sema switches values, the current process is interrupted abruptly, so it may be better to let each process stop itself in a controlled way. Thus it would be safer to let only the processes controlled by sema change sema.

After a change of process, the exiting process is completely wiped out, so the only way it can pass information on to later processes is by writing to global variables or output pins.

The motivation for this whole concept is that the limited size of the cog memories (512 long words) is perhaps the strongest limitation on what you can do with a Propeller. As long as you can divide your computations into independent chunks, this concept allows computations as big as you like, up to the size of the global memory (8k long words). Naturally, reloading a process into a cog takes some time. A natural use is when a process requires a lot of code for initialization. Then you let one process (init) do the initialization and another process (run) do the execution. You write exec ... s/init/run ... and when init has done its work, it sets s to 1.
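The reloading rule can be simulated in Python, with generator functions standing in for processes and a dictionary for global memory. This is a simplified sketch: the real system interrupts a process abruptly at any point, whereas here sema is only checked between steps, and run_cog, init_process and run_process are hypothetical names. The example follows the init/run pattern: init sets sema to 1 when its work is done, and only what it wrote to the shared dictionary survives the switch.

```python
from typing import Callable, Dict, Iterator, List

Process = Callable[[Dict[str, int]], Iterator[None]]


def run_cog(processes: List[Process], shared: Dict[str, int], steps: int) -> None:
    """Run the process selected by shared["sema"]; reload whenever sema changes."""
    loaded = shared["sema"]
    current = processes[loaded](shared)
    for _ in range(steps):
        if shared["sema"] != loaded:                 # sema switched: reload the cog
            loaded = shared["sema"]
            current = processes[loaded](shared)      # the old process is simply dropped
        try:
            next(current)                            # one step of the loaded process
        except StopIteration:
            pass                                     # a finished process leaves the cog idle


def init_process(shared: Dict[str, int]) -> Iterator[None]:
    shared["table"] = sum(range(10))   # stands in for long initialization code
    yield
    shared["sema"] = 1                 # hand over to the run process
    yield


def run_process(shared: Dict[str, int]) -> Iterator[None]:
    while True:
        shared["count"] = shared.get("count", 0) + shared["table"]
        yield
```

For example, `run_cog([init_process, run_process], {"sema": 0}, steps=5)` spends two steps in init and three in run.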

This is the case when the processes actually execute in series, but the concept allows you to let a process tree branch out, depending on the results of the computations.

A high level construct

Macros are treated by a preprocessor, which substitutes the macro name with the macro code. The same preprocessing can be used to handle high level constructs. I have made one, which mimics a standard for loop. It is made in stack processing style. The idea is, that if you load two numbers, k and n, on the stack, we can let that represent the interval between k and n, i.e. all the integers between k and n. Then we have a function [all:i] which produces all the integers between k and n. These values are produced consecutively in time in the variable i. These values are used in a number of statements enclosed in standard parentheses (()). This means that the code within () is repeated for each value of i. Here's an example:

1 ->nfact 1 n [all:i] (nfact i * ->nfact )
With this code, nfact is the factorial of n (written n!). Here's another example:
1 ->pn 1 n [all:i] (pn p * ->pn )
With this code, pn is the n:th power of p. Note that the variable i is not mentioned at all between ( and ).
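In a conventional language, [all:i] corresponds to an ordinary counting loop over k..n, with n included. The Python sketch below reproduces the two examples; all_values is a hypothetical name, and in this sketch an empty interval (k greater than n) runs the body zero times, which the text does not specify for Myra. On the Propeller the results would also wrap around at 32 bits (13! already does not fit).

```python
def all_values(k: int, n: int):
    """[all:i]: yield k, k+1, ..., n one after another (n included)."""
    i = k
    while i <= n:
        yield i
        i += 1


def factorial(n: int) -> int:
    nfact = 1                      # 1 ->nfact
    for i in all_values(1, n):     # 1 n [all:i]
        nfact = nfact * i          # (nfact i * ->nfact )
    return nfact


def power(p: int, n: int) -> int:
    pn = 1                         # 1 ->pn
    for _ in all_values(1, n):     # 1 n [all:i], i itself is never used
        pn = pn * p                # (pn p * ->pn )
    return pn
```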

As the system is now, the loop variable i has to be declared separately.

Assembler functions

A backbone of the system is the assembler resource file Assembler.spin. This file contains a number of useful functions that can be called directly from Myra. How to add functions to this file is described later. Here's what it contains now.

A number of functions handle the stack and implement simple arithmetic operations. Multiplication deserves special mention, as Propeller assembler doesn't have a multiplication instruction. The same is true for division, which is mentioned further down. Non-arithmetic operations are bitwise and (&), bitwise or (or), exclusive or (xor), right shift (>>) and left shift (<<). The right shift is arithmetic, i.e. it preserves the sign of the number.
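Without a hardware multiply, multiplication is typically done by shift-and-add. The Python sketch below shows that technique on 32-bit values, together with the sign-preserving right shift (the Propeller has a SAR instruction for this). It illustrates the general method only; the routine actually in Assembler.spin is not shown in this text and may differ.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value: int) -> int:
    """Interpret the low 32 bits of value as a two's complement long."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def multiply(a: int, b: int) -> int:
    """Shift-and-add multiplication, keeping the low 32 bits (like a long)."""
    a &= MASK32
    b &= MASK32
    product = 0
    while b:
        if b & 1:                          # this bit of b contributes a shifted copy of a
            product = (product + a) & MASK32
        a = (a << 1) & MASK32
        b >>= 1
    return to_signed(product)


def shift_right(value: int, n: int) -> int:
    """Arithmetic right shift: the sign bit is copied in from the left."""
    return to_signed(value) >> n           # Python's >> on ints is arithmetic
```

The loop runs at most 32 times, once per bit of the multiplier, which is why multiplication is comparatively slow.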

Adding assembler functions

The user can write his own Myra functions, as he likes. To speed up the programs, he can also add his own assembler functions to the file Assembler.spin.

These functions should have the following properties:

  1. They should be correct assembler code, of course. This is validated when the complete program is assembled with the Propeller environment. The whole Assembler.spin file is written so that it could be assembled on its own, but it is now too big for that; it contains more than 512 instructions. The assembler functions are there to be called, and hence they need a properly labeled return statement. If the function is called fun, then the return statement should be labeled fun_ret.
  2. Each assembler function must have a header. From the assembler language point of view, this is a comment with the following format:
    '> name aliasname
    The name should be the same as the label on the first line of the actual assembler function (the "name" of the assembler function). The aliasname can be the same as the name, or something shorter, down to a simple operator symbol like '+'.

    The system uses these headers to load the used assembler functions into the code. It does so by matching the call in the Myra code with the aliasname.
  3. The assembler functions should have a stack type interface with the rest of the system, i.e. arguments should be popped from the stack, and the results should be pushed on the stack.
  4. Functions should not have unnecessary side effects, i.e. their only effect should be the result they deliver. However, when assembler functions are used to send things to the output pins, or to control timing (by waiting, for instance), such side effects are of course unavoidable.
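The header convention lends itself to a simple scan. The Python sketch below collects the '> headers of an Assembler.spin text, finds which assembler functions a list of Myra tokens needs, and checks rule 1 (a name_ret label for each function). The helper names and the two-function sample are hypothetical, and the sketch does not claim to reproduce how the Myra compiler does it.

```python
import re

# '> name aliasname, possibly preceded by whitespace
HEADER = re.compile(r"^\s*'>\s*(\S+)\s+(\S+)")


def read_headers(source: str) -> dict:
    """Map each aliasname to its assembler function name."""
    table = {}
    for line in source.splitlines():
        match = HEADER.match(line)
        if match:
            name, alias = match.groups()
            table[alias] = name
    return table


def used_functions(tokens: list, table: dict) -> list:
    """Names of the assembler functions that a list of Myra tokens calls."""
    return sorted({table[token] for token in tokens if token in table})


def missing_ret_labels(source: str, table: dict) -> list:
    """Functions that lack the name_ret label required by rule 1."""
    labels = {
        line.split()[0]
        for line in source.splitlines()
        if line and not line[0].isspace() and not line.startswith("'")
    }
    return sorted(name for name in table.values() if name + "_ret" not in labels)


# Hypothetical fragment in the Assembler.spin layout (nop stands in for real bodies)
SAMPLE = """\
'> plus +
plus        nop
plus_ret    ret
'> shiftright >>
shiftright  nop
"""
```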
The stack handling is done with self-modifying code, i.e. using the movs (move source) and movd (move destination) instructions. Globally, there is an array whose first element is called stack, and there is a stack pointer called sa ("stack address"). The following code then loads the item on the top of the stack (which is at the address sa) into a local variable arg1:

        movs    instr1,sa
        nop
instr1  mov     arg1,stack
        sub     sa,#1
        ...
arg1    long    0

The instruction instr1 is modified so that data are fetched from the address sa. This overwrites the address 'stack' in the source field, so we could write anything there, but 'stack' is somewhat informative. The final instruction moves the stack pointer down, so that the loaded value is no longer reachable. (The value is "popped" from the stack.) The nop instruction is necessary because the Propeller uses pipelining. Without it, the instruction instr1 would be fetched before it had been modified.
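The effect of the self-modifying pop can be simulated in Python, with cog RAM as a list of 512 longs and instr1 reduced to its 9-bit destination and source fields. The addresses chosen for sa, instr1, arg1 and stack are hypothetical, and the pipeline is not modelled, so the nop appears only as a comment.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """A cog instruction reduced to the two 9-bit address fields that matter here."""
    op: str
    dest: int
    src: int


SA, INSTR1, ARG1, STACK = 10, 11, 12, 100   # hypothetical cog addresses


def make_cog() -> list:
    cog = [0] * 512                          # cog RAM: 512 longs
    cog[INSTR1] = Instr("mov", ARG1, STACK)  # instr1  mov arg1,stack
    cog[STACK], cog[STACK + 1] = 7, 42       # two values pushed earlier
    cog[SA] = STACK + 1                      # sa points at the top item (42)
    return cog


def pop_into_arg1(cog: list) -> None:
    cog[INSTR1].src = cog[SA] & 0x1FF        # movs instr1,sa : patch the source field
    # nop : on the chip, gives the pipeline time to fetch the patched instruction
    ins = cog[INSTR1]
    cog[ins.dest] = cog[ins.src]             # instr1 : mov arg1,<address now in sa>
    cog[SA] -= 1                             # sub sa,#1 : the value is now popped
```

Calling pop_into_arg1 twice on a fresh cog leaves 42 and then 7 in arg1, with sa moving down one step each time.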

You can look into the file Assembler.spin to find examples to learn from.

Variable and symbol scopes

The scope of the global variables is of course global, i.e. the variable names can be used throughout the system.

Processes cannot be "called" by each other; they can only be started by being listed on the line after the exec keyword.

The scope of the rest of the symbols (variables, labels, function names) is the containing process. For some tastes, this might seem a bit too wide. For instance, one function in a process could use the local variables of another. This is because the assembler language has little notion of scope. A check against this misuse could nevertheless be made at compile time, but this hasn't been implemented so far. A consequence is that variables in functions must have unique names; otherwise the Propeller assembler will complain ("symbol already defined"). It seems acceptable to let the variables defined in the process be accessible to the functions as well, but the programmer should avoid borrowing variables between functions.

Note that if the same function shall be used by more than one process, the function has to be repeated in each of the processes.

Suffixing

As a remedy to the scope problem (that the scope of variables and labels is too wide), there is a suffixing mechanism. If the function statement is followed by at least one space and then the tag "-?tag?", all local variables and all labels (jump targets) are suffixed with _?tag?. Hence if one writes -cos, the variable x will be renamed to x_cos, and the label loop will be renamed to loop_cos. These suffixed names will appear in the assembler code, but in the Myra code the names of course remain unsuffixed. If all functions are given unique tags, there is no problem with variable scopes, unless one deliberately creates names like x_cos somewhere.
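A rough Python sketch of the renaming, assuming the set of local variables and labels is already known. suffix_locals is hypothetical, and for readability it is applied to Myra text here, while the compiler applies the suffixes to the generated assembler names.

```python
import re

# identifiers may contain dots (display.write), so a dotted name counts as one word
WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")


def suffix_locals(code: str, local_names: set, tag: str) -> str:
    """Append _tag to every occurrence of a local variable or label name."""
    def rename(match):
        word = match.group(0)
        return f"{word}_{tag}" if word in local_names else word

    return WORD.sub(rename, code)


if __name__ == "__main__":
    print(suffix_locals("x 2 * ->x :loop go(loop) y", {"x", "loop"}, "cos"))
    # x_cos 2 * ->x_cos :loop_cos go(loop_cos) y
```

Names that are not in local_names, such as the process variable y here, are left alone, which matches the rule that only local variables and labels get the suffix.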

It may happen that the unsuffixed name of a local variable coincides with the name of a variable in the process or a field variable in an object file. In that case, the compiler interprets the variable as a local variable, i.e. it gets a suffix. (For those who study the compiler source code, Myra.java, this is the reason for the notion of a PVariable (a public variable), and for the function 'recognizeableAsPVariable(...)'.)

Recursive calls

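The Propeller doesn't have a subroutine call stack, and Myra makes no attempt to construct one. Hence recursive calls are not possible. A Propeller call stores its return address in a single fixed location belonging to the called function (the fun_ret instruction), so if an assembler or Myra function calls itself, the inner call overwrites the return address of the outer one, and execution gets lost.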
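Why recursion fails can be shown with a tiny Python model of the Propeller call mechanism, in which each function has exactly one slot for its return address (the long labelled fun_ret). The class name and the addresses are hypothetical.

```python
class Cog:
    """Return addresses live in one fixed slot per function (the fun_ret long)."""

    def __init__(self) -> None:
        self.ret_slot = {}

    def call(self, fun: str, return_address: int) -> None:
        self.ret_slot[fun] = return_address    # CALL #fun writes into fun_ret

    def ret(self, fun: str) -> int:
        return self.ret_slot[fun]              # RET jumps to whatever fun_ret holds


if __name__ == "__main__":
    cog = Cog()
    cog.call("fact", 10)     # the main code calls fact from address 10
    cog.call("fact", 55)     # fact calls itself from address 55: 10 is overwritten
    print(cog.ret("fact"), cog.ret("fact"))  # 55 55, the way back to 10 is lost
```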

Motivations for the language

The Propeller computers are programmed with the Propeller assembler language and the Spin language. Assembler is very fast, but not always easy to write and to read. Spin is an interpreted language, and thus relatively slow. This can be seen from the Spin instruction

waitcnt(581+cnt)
This instruction works. It will wait until 581 clock cycles have elapsed from now. But if we wrote 580 instead, more than 580 clock cycles would already have elapsed before the actual waiting started, so the waiting would go on until the 32-bit clock counter had wrapped around and completed a full cycle. And 581 clock cycles is quite a lot.
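The wrap-around can be quantified with a simplified Python model of the wait (Spin's waitcnt): the wait ends when the 32-bit counter equals the target, so a target that has already passed is only reached after a full lap of 2^32 cycles. The overhead of 581 cycles is taken from the text, and the 80 MHz clock used below is an assumption (a common Propeller setting).

```python
WRAP = 1 << 32          # the system counter cnt is 32 bits wide and wraps around
CLOCK_HZ = 80_000_000   # assumption: 80 MHz system clock


def cycles_waited(delay: int, overhead: int) -> int:
    """Cycles from reading cnt until waitcnt(delay + cnt) returns.

    overhead is the number of cycles that pass before the comparison starts;
    if the target has already passed, the counter must wrap all the way round.
    """
    remaining = (delay - overhead) % WRAP   # cycles still to go when waiting starts
    return overhead + remaining


if __name__ == "__main__":
    for delay in (581, 580):
        cycles = cycles_waited(delay, overhead=581)
        print(delay, cycles, f"{cycles / CLOCK_HZ:.1f} s")
```

With these numbers a delay of 580 makes the wait last 2^32 + 580 cycles, roughly 53.7 seconds.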

In that situation, one would like to mix spin and assembler code, and the ideal way would be to write fast assembler routines, and call them from spin.

But Spin code doesn't really call assembler code; it loads assembler code into cogs and lets it run there. Execution doesn't return from the assembler code unless the assembler code halts the whole cog. Also, the interface to the assembler code is quite thin: it is only a single long value, which would preferably contain an address through which the assembler code can communicate.

In Myra, everything is assembler (except a few coginit statements and some code for handling serial execution). This makes it possible to use pre-written assembler routines directly and uniformly. The stack-based architecture gives a standardized interface to these routines. Likewise, one can use pre-written Myra routines, and they are fast, as they are compiled into assembler code. Compared to standard languages like C and Forth, a specialized language makes it easy to use features of the Propeller, like reading and controlling time, reading and writing i/o pins, and the instruction-set feature that each instruction can be executed conditionally.

Myra also has a nice concept for making object oriented code. Except for object-code files, the whole system is kept together in a single file, that completely defines the system.

As with all stack-based languages, one has to get used to the stack architecture, and until then programs may seem "write-only", but once one is used to it, programs are actually quite elegant.