
Cicada bytecode

Occasionally, we might want to extend the Cicada language in a way that can’t be scripted, in which case we have to define our new command directly in Cicada’s native ‘bytecode’, using cicada.h as a dictionary to the command words. Scripts run much faster from bytecode than they ever could by reading their original text, but it should be emphasized that Cicada’s bytecode is different from (and much slower than) raw machine code. One significant difference is that Cicada’s bytecode has a recursive structure, composed of expressions and sub-expressions, just like Cicada script itself. In fact, there is pretty much a one-to-one correspondence between the symbols (operators) we write in Cicada and bytecode commands, except that the bytecode commands are in a different order.

For example, when we type the following command into the command prompt:


    > area = 3.14 * R^2
   

Cicada’s compiler produces bytecode output that looks roughly like:


    equate [ <area> , product_of ( 3.14, raise_to_power ( <R> , 2 ) ) ]
   

where we’ve bracketed the arguments of each bytecode operator. These are exactly the operands of each operator in the script, but the compiler has made two changes. 1) Each operator’s arguments follow the operator command itself: for example, the equate operator is followed by two immediate arguments, which correspond to the expressions to the left and right of the equate symbol in the script. 2) The operators are reordered by precedence (unless parentheses force otherwise), so that the lowest-precedence operator ends up outermost. The equate is done last, so it becomes the outermost function in the bytecode. The trick is to think of every operator as a function in bytecode, and write the function command first followed by its arguments.
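To make the prefix ordering concrete, here is a minimal C sketch of how an interpreter might walk such bytecode recursively: each call reads one operator word and then evaluates exactly as many sub-expressions as that operator takes. The opcode names and numbers (OP_CONST, OP_VAR, OP_MUL, OP_POW) are hypothetical and are not the values in cicada.h, and integer arithmetic stands in for Cicada's floating-point 3.14.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for illustration only; NOT the values in cicada.h. */
enum { OP_CONST = 1, OP_VAR, OP_MUL, OP_POW };

/* Evaluate one prefix-ordered expression starting at code[*pc] and advance
   *pc past every word it consumes. vars[] holds variable values by ID. */
long eval_prefix(const int *code, size_t *pc, const long *vars)
{
    int op = code[(*pc)++];               /* the operator word comes first */
    switch (op) {
    case OP_CONST:                        /* constant data is in the next word */
        return code[(*pc)++];
    case OP_VAR:                          /* variable ID is in the next word */
        return vars[code[(*pc)++]];
    case OP_MUL: {
        long a = eval_prefix(code, pc, vars);   /* left operand sub-expression */
        long b = eval_prefix(code, pc, vars);   /* right operand sub-expression */
        return a * b;
    }
    case OP_POW: {
        long base = eval_prefix(code, pc, vars);
        long power = eval_prefix(code, pc, vars);
        long result = 1;
        assert(power >= 0);               /* sketch handles whole powers only */
        while (power-- > 0)
            result *= base;
        return result;
    }
    default:
        assert(!"unknown opcode");
        return 0;
    }
}
```

For example, 3 * R^2 with R stored as variable 0 becomes the word sequence { OP_MUL, OP_CONST, 3, OP_POW, OP_VAR, 0, OP_CONST, 2 }, and evaluating it with R = 4 gives 48. Because the operator always comes first, the interpreter never has to look ahead or re-parse precedence at run time.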

There’s actually a way we can see bytecode from the command prompt: by using the slightly anachronistic disassemble() function (dating from before error messages).


    > bytecodeStr := compile("area = 3.14 * R^2")
   
   
    > disassemble(bytecodeStr)
   
    equ ( sm $area , mul ( 3.14 , pow ( sm $R , 2 ) ) )
   

Using this tool, we find that there’s a ‘search-for-member’ (sm) operator before each member identifier. The member identifier is simply an integer ID number: positive ID numbers for user-defined members (counting upwards from 1), and negative ID numbers (counting downwards from -1) for so-called hidden members which the compiler adds to the bytecode. The disassembly doesn’t show it, but there’s also a ‘constant-floating-point’ operator just before the 3.14 constant. The raw bytecode will look something like:


    > bytecd :: [] int
   
    > bytecd[*] =! compile("area = 3.14 * R^2")
   
    > bytecd
   
    { 8, 1, 10, 309, 29, 55, 1374389535, 1074339512, 31, 10, 310, 54, 2, 0 }
   

where each bytecode command is now just a number which we can look up in cicada.h. The two large numbers are the 8 bytes of the double-precision constant 3.14, split into two 4-byte integers. The actual output varies by machine and also by what has been run beforehand (which determines which member ID numbers are assigned). Every script ends with a null word, telling the interpreter to either fall back to the enclosing function or else exit the program.
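The two words for 3.14 can be reproduced in C by copying the raw bytes of the double into 32-bit integers. This is a sketch assuming a 64-bit IEEE 754 double and 32-bit bytecode words; the function names are hypothetical. On a little-endian machine, 3.14 comes out as 1374389535 followed by 1074339512, matching the listing above, while a big-endian machine stores the same two words in the opposite order (one reason the output varies by machine).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the raw bytes of a double into two 32-bit words,
   the way an inlined floating-point constant occupies two bytecode words. */
void double_to_words(double value, int32_t words[2])
{
    assert(sizeof(double) == 2 * sizeof(int32_t));   /* assumes 64-bit double */
    memcpy(words, &value, sizeof value);             /* byte-for-byte copy */
}

/* Hypothetical inverse: rebuild the double from its two words. */
double words_to_double(const int32_t words[2])
{
    double value;
    memcpy(&value, words, sizeof value);
    return value;
}
```

Using memcpy rather than a pointer cast keeps the byte copy well defined in C; the round trip through words_to_double returns exactly 3.14.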


Pathnames

Cicada pathnames consist of a sequence of steps starting from some variable. For example the path


    myVar.array[5].x
   

takes 3 steps: to array, to the fifth element, and finally to x. In bytecode the final step is the outermost operator, so the entire path looks like


    step_to_member( "x", step_to_index( step_to_member( "array", search_member "myVar" ), 5 ) )
   

(For speed reasons the ‘step-to-member’ operator takes the member-to-step-to as its first argument, which is backwards from the other step operators.) Notice that step-to-member continues a path, whereas search-member begins a path and so takes one fewer argument.
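The nesting rule (last step outermost, member ID before the path for step-to-member, path before the index for step-to-index) can be sketched as a small recursive encoder in C. The opcode names and numbers below are hypothetical, not cicada.h's, and the sketch assumes the index is a constant integer rather than an arbitrary expression.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode numbers for illustration; NOT the values in cicada.h. */
enum { OP_SEARCH_MEMBER = 1, OP_STEP_MEMBER, OP_STEP_INDEX, OP_CONST_INT };

typedef enum { STEP_MEMBER, STEP_INDEX } step_kind;
typedef struct { step_kind kind; int arg; } step;   /* arg: member ID or index */

/* Write prefix-ordered words for a path that starts by searching for root_id
   and then takes n_steps steps. The last step becomes the outermost operator.
   Returns the number of words written to out. */
size_t encode_path(int root_id, const step *steps, size_t n_steps, int *out)
{
    size_t n = 0;
    assert(out != NULL);
    if (n_steps == 0) {                     /* search-member begins the path */
        out[n++] = OP_SEARCH_MEMBER;
        out[n++] = root_id;
        return n;
    }
    const step *last = &steps[n_steps - 1];
    if (last->kind == STEP_MEMBER) {
        /* step-to-member: member ID first, then the path it continues */
        out[n++] = OP_STEP_MEMBER;
        out[n++] = last->arg;
        n += encode_path(root_id, steps, n_steps - 1, out + n);
    } else {
        /* step-to-index: the path first, then the index expression */
        out[n++] = OP_STEP_INDEX;
        n += encode_path(root_id, steps, n_steps - 1, out + n);
        out[n++] = OP_CONST_INT;
        out[n++] = last->arg;
    }
    return n;
}
```

With myVar, array and x given the (assumed) IDs 1, 2 and 3, the path myVar.array[5].x encodes to { OP_STEP_MEMBER, 3, OP_STEP_INDEX, OP_STEP_MEMBER, 2, OP_SEARCH_MEMBER, 1, OP_CONST_INT, 5 }, nine words in all.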


Inlined constants

Each of the five types of inlined constants (Booleans, characters, integers, floating-point numbers and strings) has a unique bytecode operator. The raw data of the constant follows in subsequent bytecode words (integers). The data for large constants, namely floating-point numbers and many strings, takes up several bytecode words.

String constants in bytecode use the ‘Pascal’ string convention rather than the C format: the constant-string operator is the first bytecode word, followed by the character length N of the string (also 1 bytecode word), followed by the raw string data (N/sizeof(int) words, rounded up). There is no terminating character.
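A minimal C sketch of this layout is below. The opcode value OP_CONST_STRING is hypothetical, and zero-filling the unused bytes of the last data word is an assumption of this sketch (the text above only says there is no terminating character).

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical opcode number; NOT the value in cicada.h. */
enum { OP_CONST_STRING = 1 };

/* Words taken by a string constant of len characters:
   operator word + length word + ceil(len / sizeof(int)) data words. */
size_t string_constant_words(size_t len)
{
    return 2 + (len + sizeof(int) - 1) / sizeof(int);
}

/* Write a Pascal-style string constant: operator, length, raw bytes.
   No terminator is stored; leftover bytes in the last word are zeroed
   (an assumption of this sketch). Returns the number of words written. */
size_t encode_string_constant(const char *s, int *out)
{
    size_t len = strlen(s);
    size_t total = string_constant_words(len);
    assert(len <= (size_t) INT_MAX);      /* length must fit in one int word */
    memset(out, 0, total * sizeof(int));
    out[0] = OP_CONST_STRING;
    out[1] = (int) len;
    memcpy(&out[2], s, len);              /* raw characters, no '\0' */
    return total;
}
```

With 4-byte ints, "hello" (5 characters) takes 2 + 2 = 4 words: the operator, the length 5, and two data words, the second of which holds only one real character.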


Flow control commands

The four flow-control commands in Cicada (if, for, while and do) are all higher-order commands that the compiler expands into expressions involving ‘goto’s. Cicada sports three ‘goto’s: an unconditional jump, plus jump-if-true and jump-if-false operators. Each goto sequence begins with its bytecode command word followed by a jump offset (1 word). The jump offset is the number of bytecode words to jump ahead, counted from the position of the jump-offset word itself; it is negative if we want to jump backwards. The jump must take us to the start of a command; otherwise transform() throws an error. In the case of the two conditional gotos, there is a final bytecode expression following the jump offset which is the condition on which to jump.
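The offset arithmetic can be illustrated with a tiny C interpreter, sketched under these assumptions: every opcode name and number is hypothetical (not cicada.h's), there are only a handful of statement and expression operators, and unlike transform() the sketch does not check that a jump lands on the start of a command.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode numbers for illustration only; NOT cicada.h's values. */
enum { OP_END = 0, OP_JUMP, OP_JUMP_IF_TRUE, OP_JUMP_IF_FALSE,
       OP_CONST, OP_VAR, OP_LESS_EQ, OP_INC, OP_ADD_TO };

/* Evaluate a prefix-ordered expression and advance *pc past it. */
static long eval_expr(const int *code, size_t *pc, const long *vars)
{
    int op = code[(*pc)++];
    switch (op) {
    case OP_CONST: return code[(*pc)++];
    case OP_VAR:   return vars[code[(*pc)++]];
    case OP_LESS_EQ: {
        long a = eval_expr(code, pc, vars);
        long b = eval_expr(code, pc, vars);
        return a <= b;
    }
    default:
        assert(!"not an expression opcode");
        return 0;
    }
}

/* Run statements until the null word. A jump offset counts words from the
   position of the offset word itself (negative means backwards). */
void run(const int *code, long *vars)
{
    size_t pc = 0;
    for (;;) {
        int op = code[pc++];
        switch (op) {
        case OP_END:                          /* null word ends the script */
            return;
        case OP_JUMP: {
            size_t at = pc;                   /* position of the offset word */
            pc = (size_t) ((long) at + code[at]);
            break;
        }
        case OP_JUMP_IF_TRUE:
        case OP_JUMP_IF_FALSE: {
            size_t at = pc++;                 /* offset word; condition follows */
            long cond = eval_expr(code, &pc, vars);
            if ((op == OP_JUMP_IF_TRUE) == (cond != 0))
                pc = (size_t) ((long) at + code[at]);
            break;                            /* else continue after condition */
        }
        case OP_INC:                          /* vars[id]++ */
            vars[code[pc++]]++;
            break;
        case OP_ADD_TO: {                     /* vars[dst] += vars[src] */
            int dst = code[pc++];
            int src = code[pc++];
            vars[dst] += vars[src];
            break;
        }
        default:
            assert(!"unknown opcode");
            return;
        }
    }
}
```

A loop equivalent to while (i <= 5) { sum += i; i++ } is then { OP_JUMP_IF_FALSE, 13, OP_LESS_EQ, OP_VAR, 0, OP_CONST, 5, OP_ADD_TO, 1, 0, OP_INC, 0, OP_JUMP, -13, OP_END }, with i in variable 0 and sum in variable 1. The exit jump's offset word sits at position 1 and reaches the null word at 14 (offset 13); the closing jump's offset word sits at 13 and returns to the test at 0 (offset -13). Starting from i = 1 and sum = 0, the run ends with sum = 15.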

The most complicated flow-control command is the for statement, which basically consists of a while along with an assignment (to initialize the counter) and a counter-increment command at the end of the loop. Notice that if we define a variable inside the for loop, as in for (j::int) in <1, 5>, then Cicada will plunk the whole expression j::int into both the initialization and the increment command, which can slow down short loops considerably.
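The cost of placing the declaration in both slots can be made visible with a C analogy. This is only a sketch: declare_j() is a hypothetical stand-in that merely counts how often the "j::int" work would be repeated, and it does not model what Cicada actually does when it re-evaluates a definition.

```c
#include <assert.h>

/* Hypothetical counter standing in for the cost of evaluating "j::int". */
static int declaration_evals;

/* Stand-in for the expression "j::int": hands back the counter's storage
   and records that the declaration work was done once more. */
static long *declare_j(long *storage)
{
    declaration_evals++;
    return storage;
}

/* for (j::int) in <first, last>, expanded into init + while + increment,
   with the declaration expression copied into both the init and increment. */
long sum_for_expanded(long first, long last)
{
    long storage, sum = 0;
    declaration_evals = 0;

    *declare_j(&storage) = first;             /* init: j::int, then assign */
    while (storage <= last) {                 /* while-style loop test */
        sum += storage;                       /* loop body */
        *declare_j(&storage) = storage + 1;   /* increment: j::int again */
    }
    /* once for the init, plus once per iteration for the increment */
    assert(declaration_evals == (first <= last ? last - first + 2 : 1));
    return sum;
}
```

For <1, 5> the stand-in runs 6 times for 5 iterations. In a short loop body that overhead is a large fraction of the total work, which is why declaring the counter outside the for loop can be worthwhile.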




Last update: May 8, 2024