Wikis / Unreal Wiki / Legacy:BruteForce/EBNF

If you don't understand this first read about EBNF.

Non terminals

 PROGRAM               ::= DECLARATIONS STATEMENTS

It's not that difficult to allow declarations everywhere in the code, it's just that I prefer to have it at the top. This way it's easy to find the declaration when you are looking at the code.

 DECLARATIONS          ::= ((DECLARATION|FUNCTION) SEMICOLON)*

Well ok, this allows you to mix variable declarations and function declarations, but that's up to you. You might prefer to declare vars first and then function. This gives you the freedom.

Note that this rule may be empty. I use the semicolon as a divider of parts, and it's checked here and not in the declaration/function because it's part of the block, seperating the elements.

 DECLARATION           ::= VAR TYPE IDENTIFIER

C style variable declaration, but prefixed with a var keyword. This is to make it a bit easier for the parser.

 TYPE                  ::= INTEGER
                         | STRING
                         | FLOAT
                         | BOOLEAN

We've got four types, I've not thought about custom type declarations.

 IDENTIFIER            ::= __implicit__

Identifiers are returned by the tokenizer so I haven't added it to the language specification.

 FUNCTION              ::= FUNC TYPE IDENTIFIER LBRACK ARGUMENTS RBRACK DECLARATIONS BEGIN STATEMENTS END

A function declaration including body. You always have to return a type, this is used to declare the always present local variable result (just like in Pascal functions). Before the function body you can define local variables (and even functions). The function body is encapsued with a begin and end just like in Pascal. I like this way, ok it's more to write but it makes to code more readable, ok I just like it since I can easily replace it with { and } or whatever I want.

 ARGUMENTS             ::= (TYPE LVALUE (SEMICOLON TYPE LVALUE)*)?

Function arguments are optional, and to prevent you from adding a semicolon after the last argument I used this construction. It's doesn't matter for the resulting AST, just the parser.

 STATEMENTS            ::= (STATEMENT SEMICOLON)*

Zero or more statements, yes it's possible to define a program without any code to be executed, this might be usefull for function declarations to be reused or something.

 STATEMENT             ::= ASSIGNMENT
                         | WHILEDO
                         | IFTHENELSE

A statement can be everything, an assignment or a control block like while ... do or if ... then ... else. It's easy to add new kind of statements here.

 ASSIGNMENT            ::= IDENTIFIER BECOMES EXPR
                         | FUNCTIONCALL

Well this is the reason why BruteForce is no longer LL(1), because I have to look a head to see if it's an assignment or an function call. Both start with an identifier. I could work around it but this would require me to add a keyword to one of them. For example call function_name()

 WHILEDO               ::= WHILE EXPR DO CODEBLOCK
 CODEBLOCK             ::= STATEMENT
                         | (BEGIN STATEMENTS END)

The while and if control blocks accept code blocks this is either one statement or more, because more statements qould require to use the semicolon to divide them a code block will start with begin and end with end .

 IFTHENELSE            ::= IF EXPR THEN CODEBLOCK (ELSE CODEBLOCK)?

I've made the else part optional, empty else parts are just ugly and it can easily be avoided since it's a diffirence between a semicolon and else

 EXPR                  ::= BOOLEX

Here's an nice example of a pretty useless rule, but maybe when I want to extend expression with something that has a lower precendence than a boolean expression I would only need to update one rule instead of a lot.

The following rules define the operator precedance, the sooner the lower the priority. The definition below allows constructs like:

1+2+3+4+5 
1*2+3/5<=4+1
 BOOLEX                ::= ACCUM ((LT|LE|GT|GE|EQ|NE) ACCUM)*
 ACCUM                 ::= MULT ((PLUS|MINUS) MULT)*
 MULT                  ::= PREOP ((MULTIPLY|DIVIDE) PREOP)*
 PREOP                 ::= (MINUS|NOT)? OPERAND
 OPERAND               ::= IDENTIFIER
                         | FUNCTIONCALL
                         | INTVALUE
                         | FLOATVALUE
                         | STRINGVALUE
                         | BOOLVALUE
                         | LBRACK EXPR RBRACK

An operant is either a constant value, a variable identifier, a function call or again an expression between braches. These expressions between braces are very important, it allows you to overrule the operator precedance. You won't see these braces in the AST anymore because they are no longer needed in there, the operators are already set in the correct order.

 FUNCTIONCALL          ::= IDENTIFIER LBRACK (EXPR (COMMA EXPR)*)? RBRACK

The arguments of a function call are expressions, this allows you to use a constant or expression besides the normal identifiers.

 INTVALUE              ::= __implicit__
 FLOATVALUE            ::= __implicit__
 STRINGVALUE           ::= __implicit__
 BOOLVALUE             ::= TRUE|FALSE

Boolean values are returned as an identifier by the tokenizer, so we need to filter those out manualy.

Terminals

 BEGIN                 ::= 'begin'
 END                   ::= 'end'
 SEMICOLON             ::= ';'
 VAR                   ::= 'var'
 INTEGER               ::= 'int'
 STRING                ::= 'string'
 FLOAT                 ::= 'float'
 BOOLEAN               ::= 'bool'
 FUNC                  ::= 'function'
 BECOMES               ::= '='
 WHILE                 ::= 'while'
 DO                    ::= 'do'
 IF                    ::= 'if'
 THEN                  ::= 'then'
 ELSE                  ::= 'else'
 LT                    ::= '<'
 LE                    ::= '<='
 GT                    ::= '>'
 GE                    ::= '>='
 EQ                    ::= '=='
 NE                    ::= '!='
 PLUS                  ::= '+'
 MINUS                 ::= '-'
 MULTIPLY              ::= '*'
 DIVIDE                ::= '/'
 MOD                   ::= '%'
 NOT                   ::= '!'
 LBRACK                ::= '('
 RBRACK                ::= ')'
 TRUE                  ::= 'true'
 FALSE                 ::= 'false'
 COMMA                 ::= ','

Sample code

Here's a piece of example code that will print all easters dates from 2003 to 2012, unless you provide diffirent values on the input

var int y;
var int ey;
 
function string easter(int year)
  var int m;
  var int d;
  var int g;
  var int c;
  var int x;
  var int z;
  var int b;
  var int e;
begin
  g = year % 19 + 1;         
  c = (year / 100) + 1;
  x = (3*c / 4) - 12;
  z = ((8*c + 5) / 25) - 5;
  b = (5*year / 4) - x - 10;
  e = (11*g + 20 + z - x);
  e = e % 30;                     
  if e < 0 then e = e+30;
  if (((e == 25)+(g>11) == 2) + (e==24)) > 0 then e = e+1;
  d = 44-e;
  if d < 21 then d = d+30;
  d = d+7-((b+d) - ((b+d)/7)*7);
  if d>31 then m = 4 else m = 3;
  if d>31 then d = d-31;
  result = ""+year+"-"+m+"-"+d;
end;
 
y = 2003;
ey = 2012;
 
if (argc() == 1) then ey = Int(argv(0));
if (argc() == 2) then begin
  y = Int(argv(0));
  ey = Int(argv(1));
end;
print("Day of Easter for "+y+"-"+ey+".");
 
while (y <= ey) do begin
  print(easter(y));
  y = y+1;
end;

Page Information

2022-11-18T15:46:06.102437Z 2003-04-23T12:18:20Z El Muerte TDS * https://wiki.beyondunreal.com/Legacy:BruteForce/EBNF Attribution-NonCommercial-ShareAlike 3.0