ARMA 3 SQF Grammar

The entire grammar of SQF can be defined with only seven non-terminal rules:

Code
    Statement ( ';' Statement ) * [1] |
    empty
Statement
    empty |
    Assignment |
    BinaryExpression
Assignment
    Identifier '=' BinaryExpression |
    'private' Identifier '=' BinaryExpression
BinaryExpression
    BinaryExpression Operator [2] BinaryExpression |
    PrimaryExpression
PrimaryExpression [3]
    Number |
    UnaryExpression |
    NularExpression |
    Variable |
    String |
    '{' Code '}' |
    '(' BinaryExpression ')' |
    '[' BinaryExpression ',' ... [4] ']'
NularExpression
    Operator
UnaryExpression
    Operator PrimaryExpression

[1] Kleene star: the grammar term before the * occurs zero or more times.
[2] The ambiguity of this rule is resolved by the operator precedence table. All binary operators are left associative.
[3] The ambiguity between numbers, nular and unary operators, and variable names is resolved in this rule by ordered alternation. Nular then unary operators are checked first, and if none are found the identifier is assumed to be a variable reference.
[4] An ellipsis indicates a list delimited by the prior terminal.
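Footnotes 2 and 3 mark the two places where the grammar alone is ambiguous. The following minimal Python sketch shows how a parser might resolve them. It uses precedence climbing for BinaryExpression, so equal-precedence operators group to the left, and ordered alternation for PrimaryExpression. The sketch works on pre-split tokens and ignores strings, lists and code blocks. The operator tables are a small hypothetical subset, and their precedence numbers are an assumption loosely based on community documentation.

```python
# Hypothetical operator tables for this sketch only. Higher numbers bind
# tighter; the levels are an assumption, not an authoritative table.
BINARY = {"||": 1, "&&": 2, "==": 3, ">": 3, "<": 3,
          "then": 4, "do": 4, "else": 5,
          "+": 6, "-": 6, "*": 7, "/": 7, "mod": 7}
UNARY = {"if", "floor", "random", "str"}
NULAR = {"true", "false", "player"}


def parse(tokens):
    """Parse pre-split tokens into nested tuples (code blocks not handled)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def primary():
        if peek() == "(":
            take()
            node = binary(0)
            if take() != ")":
                raise SyntaxError("expected ')'")
            return node
        tok = take()
        # Ordered alternation: Number, then operators, then Variable. The
        # toy unary and nular tables are disjoint, so their order is moot.
        if tok.replace(".", "", 1).isdigit():
            return ("num", float(tok))
        if tok in UNARY:
            return ("unary", tok, primary())  # Operator PrimaryExpression
        if tok in NULAR:
            return ("nular", tok)
        return ("var", tok)

    def binary(min_prec):
        left = primary()
        while peek() in BINARY and BINARY[peek()] > min_prec:
            op = take()
            # The right operand may only contain tighter-binding operators,
            # so operators of equal precedence group to the left.
            right = binary(BINARY[op])
            left = ("binary", op, left, right)
        return left

    tree = binary(0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

With these tables, parse(["1", "-", "2", "-", "3"]) groups as (1 - 2) - 3. parse(["if", "2", "==", "1", "then", "c"]) groups as (if 2) == (1 then c), which is the misparse discussed under Control Structures.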

Why is it so simple? What's missing? Keywords, statements and control structures.

A keyword is recognised by the compiler during parsing and usually influences the shape of the parse tree. SQF has only one keyword, private, and it was added only very recently. Outside the context of an assignment, private is taken to be a reference to the private operator.
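The context-sensitive treatment of private can be sketched as a check on a statement's leading tokens. The following Python function is a hypothetical illustration working on pre-split tokens, not the real compiler's logic. It treats private as a keyword only in the form private NAME = ... and leaves it in the expression otherwise.

```python
def split_statement(tokens):
    """Classify one statement's tokens as an assignment or an expression.

    'private' counts as a keyword only in the form  private NAME = ...
    Anywhere else it is left in the expression, where a parser would treat
    it as a reference to the private operator.
    """
    is_private = len(tokens) >= 3 and tokens[0] == "private" and tokens[2] == "="
    rest = tokens[1:] if is_private else tokens
    if len(rest) >= 2 and rest[1] == "=":
        return ("assign", rest[0], is_private, rest[2:])
    return ("expr", tokens)
```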

A statement is a part of a computer language that is executed sequentially and cannot be a sub-part of an expression. There are only two types of statements in the SQF grammar: assignments and stand-alone expressions. The value resulting from executing a code block is the value of the last executed non-empty statement, which is nil in the case of an assignment.
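The rule for a block's result fits in a few lines of Python. In this sketch, statements are hypothetical tuples rather than parsed SQF, expressions are Python callables, and Python's None stands in for nil.

```python
NIL = None  # stands in for SQF nil in this sketch


def run_block(statements, env):
    """Run statements in order; return the value of the last non-empty one.

    Each statement is a hypothetical tuple: ("empty",),
    ("assign", name, expr_fn) or ("expr", expr_fn), where expr_fn takes env.
    """
    result = NIL
    for stmt in statements:
        kind = stmt[0]
        if kind == "empty":
            continue                    # empty statements leave the result alone
        if kind == "assign":
            env[stmt[1]] = stmt[2](env)
            result = NIL                # an assignment has no value
        else:
            result = stmt[1](env)
    return result
```

The trailing ("empty",) entry models the empty statement after a final semicolon. Because it is skipped, a block ending in a semicolon still returns its last expression.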

A control structure in a typical language is a set of statements that implement conditional or repetitive control flow. The SQF compiler itself does not have a special syntax for control structures or control structure statements—instead, control structures are implemented using operators and expressions.

Before turning to exactly how control structures are implemented, the terminals of the grammar are:

Variable
    Identifier
Operator
    Identifier [1] |
    Punctuation |
    Punctuation Punctuation [2]
Identifier
    ( Letter | Digit [3] | '_' ) +
Number
    ( '0x' | '$' ) HexadecimalDigit [4] + |
    FloatingPointNumber
String
    " ( not " | "" ) * " |
    ' ( not ' | '' ) * '

[1] The ambiguity between variables and operators is resolved either by grammar context (being on the left-hand side of an assignment, for example) or by reference to the operator symbol table.
[2] Punctuation operators, such as * or ==, are also present in the operator symbol table.
[3] Amusingly, identifiers can start with a digit on the left-hand side of an assignment, but in other contexts they are shadowed by numbers. The getVariable command can be used to retrieve the assigned values.
[4] Hexadecimal numbers are truncated to 32-bit signed values but are stored as floating-point numbers. Setting the sign bit results in a negative value.

The Letter, Digit, Punctuation, HexadecimalDigit and FloatingPointNumber classes seem to be identical to the standard character classes and literal syntax of the C language.
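Two of the terminals have value rules worth spelling out: hexadecimal numbers (footnote 4) and quoted strings with doubled-quote escapes. The following Python helpers are a hypothetical sketch of those rules. Keeping only the low 32 bits of literals longer than eight hex digits is an assumption about what "truncated" means.

```python
def hex_literal(text):
    """Value of a hex Number terminal ('0x...' or '$...'), per footnote 4.

    Keeping only the low 32 bits of longer literals is an assumption.
    """
    if text.startswith("0x"):
        digits = text[2:]
    elif text.startswith("$"):
        digits = text[1:]
    else:
        raise ValueError("not a hexadecimal literal")
    raw = int(digits, 16) & 0xFFFFFFFF   # truncate to 32 bits
    if raw & 0x80000000:                  # sign bit set: negative value
        raw -= 1 << 32
    return float(raw)                     # stored as floating point


def string_literal(text):
    """Decode a String terminal; a doubled quote stands for one quote."""
    quote = text[:1]
    if quote not in ('"', "'") or len(text) < 2 or text[-1] != quote:
        raise ValueError("not a quoted string")
    return text[1:-1].replace(quote * 2, quote)
```

For example, hex_literal("0xFFFFFFFF") gives -1.0 because the sign bit is set, and string_literal('"say ""hi"""') gives say "hi".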

Control Structures

Control structures such as if and while are implemented with unary or binary operators. Some of the control operators simply pack one or two arguments into a container. For example, the unary if operator does nothing more than put its right-hand Boolean argument into a container object called an If Type:

value = if true;
hint typeName value; // hints "IF"

The operators that actually do the work of conditionally executing code are the then and exitWith operators:

value = if true;
code = { hint "Hi!"; };
value then code; // hints "Hi!"
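The division of labour between if and then can be modelled in Python. In this sketch, callables stand in for code blocks. IfType, op_if and op_then are hypothetical names, and the type check is an illustration of why if 2 is invalid.

```python
class IfType:
    """Stand-in for SQF's If Type: just a box around a Boolean."""

    def __init__(self, condition):
        if not isinstance(condition, bool):
            raise TypeError("if expects a Boolean")  # cf. 'if 2'
        self.condition = condition


def op_if(condition):
    # The unary 'if' does no work beyond packing its argument.
    return IfType(condition)


def op_then(if_value, code):
    # The binary 'then' is what actually decides whether to run the block.
    if if_value.condition:
        return code()
    return None  # Python's None stands in for nil here
```

Splitting the work this way means if itself never touches a code block. Only then receives both the packed condition and the block.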

A block of code is a sequence of statements surrounded by { and } characters. It is executed only when called by call, or by control operators such as then and exitWith. Some control structures require an expression to be evaluated exactly once, before the control structure runs, and therefore do not require the condition or expression to be placed in a { and } code block:

if (condition) then { }
switch (expression) do { }
{ } forEach (expression)

Other times a control structure needs to execute code conditionally or multiple times, and the code must therefore be placed in a code block using { and }.

if true then { hint "!"; }
while { a < 10 } do { }
waitUntil { a >= 10 }
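The reason a while condition must be a code block, while an if condition need not be, can be shown in Python with callables standing in for { } blocks. This sketch assumes that while packs its condition block into a container, in the same way that if packs its Boolean. WhileType, op_while and op_do are hypothetical names.

```python
class WhileType:
    """Assumed container returned by the unary 'while' operator."""

    def __init__(self, condition_code):
        self.condition_code = condition_code  # a code block, not a value


def op_while(condition_code):
    return WhileType(condition_code)


def op_do(while_value, body_code):
    # The condition block is re-run before every iteration, which is why it
    # must be wrapped in { } rather than passed as an already-computed value.
    while while_value.condition_code():
        body_code()


state = {"a": 0, "checks": 0}


def condition():
    state["checks"] += 1
    return state["a"] < 10


def body():
    state["a"] += 1


op_do(op_while(condition), body)
```

After the loop, state["a"] is 10 and the condition has been evaluated 11 times: ten times returning true and once returning false. A condition that was evaluated only once, as with if, could not terminate the loop this way.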

The parentheses often included in control structures are optional; they aren't even technically part of the control structure. However, comparison operators have lower precedence than operators like then and else, so a condition containing a comparison parses incorrectly without parentheses:

if 2 == 1 then { }; // This fails to compile

In the above example the unary if operator has higher precedence than the == operator, and then also binds more tightly than ==, so the statement is grouped as (if 2) == (1 then { }). The if 2 part is invalid because if requires a Boolean argument.

Any PrimaryExpression, and any term built only from operators with higher precedence than else, can be used without parentheses. Most notably this includes constants, lists, and expressions made of unary operators and variables:

if _someBoolean then { };
{ } forEach [1, 2, 3];
if local vehicle leader group player then { hint "The boss is in my car."; };

As the last example shows, discretionary parentheses are often still a good idea for readability.

Entanglement of the Parser and Operator Tables

Nular, unary and binary operators take zero, one and two arguments respectively. Any parser for SQF needs access to the operator table to resolve ambiguities between variables and operators of different arity. Suppose, for example, that the symbols alice, bob and charlie could each be either operators or variables. The following code would be ambiguous without knowing which is which:

alice bob charlie

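There are two possible interpretations of the above. If alice and bob are unary operators and charlie is a variable, the code means alice (bob charlie). If alice and charlie are variables and bob is a binary operator, it means (alice) bob (charlie).

Both forms appear frequently in real code: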

random floor myGlobalVariable

And:

oneGlobalVariable mod anotherGlobalVariable

Perhaps one of the main reasons scripts cannot define their own operators, and must instead use call for subroutines, is that a script can't be compiled without advance knowledge of which symbols are operators and which are variables.
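The dependence on the operator tables can be made concrete with a hypothetical Python helper. The helper picks between the two readings of a three-symbol statement, and it treats any symbol missing from both tables as a variable. The same tokens produce different parse trees depending only on the tables passed in.

```python
def interpret(tokens, unary_ops, binary_ops):
    """Pick a reading of a three-symbol statement from the operator tables.

    Only the two readings discussed in the text are considered; symbols not
    in either table are assumed to be variables.
    """
    a, b, c = tokens
    if a in unary_ops and b in unary_ops:
        return f"{a} ({b} {c})"        # unary (unary variable)
    if a not in unary_ops and b in binary_ops:
        return f"({a}) {b} ({c})"      # variable binary variable
    raise SyntaxError("no valid reading with these tables")
```

For instance, interpret(["alice", "bob", "charlie"], {"alice", "bob"}, set()) gives alice (bob charlie). Moving bob into the binary table gives (alice) bob (charlie) instead.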

Switches

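With the above basic knowledge of how control structures are built from operators, it becomes easy to deduce how each control structure is "put together" and what each of the operators might actually do internally. The switch structure, however, is somewhat more mysterious than the others. It appears that the Switch Type created by switch carries a hidden flag that case sets, and that the Switch Type is stored in the local scope, apparently in a regular but undocumented local variable.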

The evidence for a hidden flag inside the Switch Type is this:

mySwitchInstance = switch 1;

mySwitchInstance do
{
    case 1; // Set the flag.

    mySwitchInstance do
    {
        case 2: { hint "Always printed"; }
    }
}
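The behaviour above is consistent with a flag that case sets and that the : operator only reads. The following Python sketch models that hypothesis. SwitchType, op_case and op_colon are hypothetical names, and the switch is passed explicitly here, whereas the real case operator is unary.

```python
class SwitchType:
    """Stand-in for SQF's Switch Type: the switch value plus a hidden flag."""

    def __init__(self, value):
        self.value = value
        self.flag = False


def op_case(switch, value):
    # Compare and set the flag; a later non-matching case does not clear it.
    # (The real 'case' is unary and finds the switch implicitly.)
    if value == switch.value:
        switch.flag = True
    return switch


def op_colon(switch, code):
    # ':' runs its block whenever the flag is set, by this case or an
    # earlier one. Leaving the switch afterwards is not modelled here.
    if switch.flag:
        return code()
    return None


s = SwitchType(1)
op_case(s, 1)                                           # case 1;  sets the flag
result = op_colon(op_case(s, 2), lambda: "Always printed")
```

Here result is "Always printed" even though 2 does not match 1, mirroring the SQF example.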

The evidence for the Switch Type being stored in a local scope is this:

function = 
{
    hint str case 1; // Hints "123".
};

switch 123 do 
{
    call function;
}

The evidence for the Switch Type actually living in a regular (but undocumented) local variable is harder to come by without being somewhat nefarious since, short of using modified versions of the ARMA binaries, local variables cannot currently be enumerated in SQF.
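The local-scope hypothesis can be modelled in Python with a chain of scopes. In this model, do stores the switch in a local variable, and case finds it by ordinary lookup, which explains why a function called from inside the switch body still sees it. The slot name "#switch" and all function names are invented for this sketch; the real variable, if it exists, is undocumented.

```python
class Scope:
    """One local scope; lookups walk outward through parent scopes."""

    def __init__(self, parent=None):
        self.vars = {}
        self.parent = parent

    def lookup(self, name):
        scope = self
        while scope is not None:
            if name in scope.vars:
                return scope.vars[name]
            scope = scope.parent
        raise NameError(name)


class SwitchType:
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return str(self.value)  # mirrors 'str case 1' hinting "123" above


SWITCH_SLOT = "#switch"  # hypothetical name for the undocumented local


def op_switch_do(scope, switch, body):
    # 'do' runs the body in a child scope holding the switch in a local.
    inner = Scope(scope)
    inner.vars[SWITCH_SLOT] = switch
    return body(inner)


def op_case(scope, value):
    # Unary 'case' finds the switch by ordinary scope lookup; matching on
    # 'value' is omitted in this sketch.
    return scope.lookup(SWITCH_SLOT)


def function(scope):
    # 'call' runs the function body in a fresh child of the caller's scope.
    return str(op_case(Scope(scope), 1))
```

Calling op_switch_do(Scope(), SwitchType(123), function) returns "123". Calling function outside any switch raises NameError, because no enclosing scope holds the slot.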