Morel: The basic language
Last week I wrote about my goals for Morel, extending a simple functional language (Standard ML) with relational operations so that it can be used as a query language.
This week I’d like to go over the basics of the language. Much of this is the same as ML, and that’s a good thing. If you want to learn more about ML, there are plenty of good resources.
The shell
The easiest way to start Morel is in its shell. The following example starts Morel from macOS and evaluates a string literal.
$ ./morel
morel version 0.1.0 (java version "13", JRE null (build 13+33), JLine terminal, xterm-256color)
- "hello, world!";
val it = "hello, world!" : string
-
(To build Morel and start the shell for yourself, follow the instructions on GitHub. To exit the shell, type Ctrl+D.)
Primitive types and simple expressions
As a functional language, everything in Morel is an expression. The
basic types are bool, int, real, string, char, and unit.
Here are literals in each.
false;
(*[> val it = false : bool]*)
10;
(*[> val it = 10 : int]*)
~4.5;
(*[> val it = ~4.5 : real]*)
"morel";
(*[> val it = "morel" : string]*)
#"a";
(*[> val it = #"a" : char]*)
();
(*[> val it = () : unit]*)
As you’d expect, there are built-in operators for each data type. Here are a few examples:
true andalso false;
(*[> val it = false : bool]*)
true orelse false;
(*[> val it = true : bool]*)
not false;
(*[> val it = true : bool]*)
1 + 2;
(*[> val it = 3 : int]*)
~(5 - 2);
(*[> val it = ~3 : int]*)
10 mod 3;
(*[> val it = 1 : int]*)
"mo" ^ "rel";
(*[> val it = "morel" : string]*)
Variables
You can assign values to variables.
val x = 7;
(*[> val x = 7 : int]*)
val y = x mod 3;
(*[> val y = 1 : int;]*)
x + y;
(*[> val it = 8 : int]*)
(Morel, following Standard ML, actually calls them “value bindings” rather than “variables” because you cannot change their value. It’s not much of a hardship, because you can create a new variable with the same name, and it will obscure the old variable and its value.)
There is a special variable called it used by the shell. Each time
you evaluate an expression, the shell and assigns the value to a
variable called it, and prints the value and its type. You can use
it in the next expression.
"morel";
(*[> val it = "morel" : string]*)
String.size it;
(*[> val it = 5 : int]*)
it + 4;
(*[> val it = 9 : int]*)
A let expression binds one or more values and evaluates an expression.
let
val x = 3
val y = 2
in
x + y
end;
(*[> val it = 5 : int]*)
Lists, records and tuples
In addition to primitive types, there are list, record, and tuple types.
[1, 2, 3];
(*[> val it = [1,2,3] : int list]*)
{id = 10, name = "Scooby"};
(*[> val it = {id=10,name="Scooby"} : {id:int, name:string}]*)
(1, true, "yes");
(*[> val it = (1,true,"yes") : int * bool * string]*)
Tuples are actually just records with fields named “1”, “2”, and so on. The following example shows that the values are identical, and have the same type, whether you use tuple or record syntax.
(1, true, "yes");
(*[> val it = (1,true,"yes") : int * bool * string]*)
{1 = 1, 2 = true, 3 = "yes"};
(*[> val it = (1,true,"yes") : int * bool * string]*)
(1, true, "yes") = {1 = 1, 2 = true, 3 = "yes"};
(*[> val it = true : bool;]*)
The empty record and empty tuple are equal, and are the only value of
the type unit. Morel outputs it as ().
{};
(*[> val it = () : unit]*)
();
(*[> val it = () : unit]*)
{} = ();
(*[> val it = true : bool;]*)
Functions
Functions are expressions, too. fn makes a lambda expression.
After we have bound the lambda value to plusOne, we can use
plusOne as a function.
val plusOne = fn x => x + 1;
(*[> val plusOne = fn : int -> int]*)
plusOne 2;
(*[> val it = 3 : int]*)
Function declarations are common, so the fun keyword provides a
shorthand: “fun f arg = exp” is
short for “val f = fn arg => exp”.
fun plusOne x = x + 1;
(*[> val plusOne = fn : int -> int]*)
plusOne 2;
(*[> val it = 3 : int]*)
Functions can have multiple arguments, separated by spaces.
fun plus x y = x + y;
(*[> val plus = fn : int -> int -> int]*)
plus 3 4;
(*[> val it = 7 : int]*)
If we supply too few arguments, we get a closure that captures the argument value and can be applied later.
val plusTen = plus 10;
(*[> val plusTen = fn : int -> int]*)
plusTen 2;
(*[> val it = 12 : int]*)
Functions can be recursive. Here, the factorial function evaluates
by calling itself, using the mathematical identity that n! = n *
(n-1)!.
fun factorial n =
if n = 1 then
1
else
n * factorial (n - 1);
(*[> val factorial = fn : int -> int]*)
factorial 1;
(*[> val it = 1 : int]*)
factorial 5;
(*[> val it = 120 : int]*)
Higher-order functions and type inference
A higher-order function is a function that operates on other functions. Here are a couple of examples.
The map function applies a given function f to each element of a
list, returning a list, as follows:
fun map f [] = []
| map f (head :: tail) = (f head) :: (map f tail);
(*[> val map = fn : ('a -> 'b) -> 'a list -> 'b list]*)
fun double n = n * 2;
(*[> val double = fn : int -> int]*)
map double [1, 2, 3, 4];
(*[> val it = [2,4,6,8] : int list]*)
The type of the map function, above, is fn : ('a -> 'b) -> 'a list
-> 'b list. Morel’s type system (based, like that of ML, on the
Hindley-Milner type system)
has deduced that map has a polymorphic type, and 'a and 'b are
type variables. This means that if f has type 'a -> 'b, for any
types 'a and 'b, then map f will transform a list of ‘a' to a
list of 'b’.
For example, if f is the built-in function String.size of type
string -> int, then 'a is string and 'b is int, and map
String.size will convert a string list to an int list.
Notice that we did not declare any types; the type system deduced everything for us. Type inference is perhaps ML’s greatest feature. In Morel, it helps us achieve our goal of writing powerful queries concisely. We don’t need to specify types, and furthermore, we can include temporary values and functions in the query whenever we need them.
The filter function keeps only those elements of a list for which a
predicate p evaluates to true, as follows:
fun filter p [] = []
| filter p (head :: tail) =
if (p head) then
(head :: (filter p tail))
else
(filter p tail);
(*[> val filter = fn : ('a -> bool) -> 'a list -> 'a list]*)
fun even n = n mod 2 = 0;
(*[> val even = fn : int -> bool]*)
filter even [1, 2, 3, 4];
(*[> val it = [2,4] : int list]*)
You may notice that map and filter are very similar to the
SELECT and WHERE clauses of a SQL statement. This is no surprise:
relational algebra, which underlies SQL, is basically a collection of
higher-order functions applied to lists of records (relations).
Can we extend ML syntax to make it easier to write relational algebra expressions? You bet!
Relational expressions
from is a Morel extension that iterates over one or more lists,
applies relational operations, and returns a list.
It has a similar purpose to SQL’s SELECT. But unlike SELECT, its
inputs and outputs can be collections of any type (not just
records). Also, Morel makes no distinction between relations and
expressions; therefore Morel do not need operations like SQL’s
UNNEST to deal with nested collections, and we can operate on lists
in memory just like tables in a database.
Let’s start by defining emps and depts relations as lists of
records.
val emps =
[{id = 100, name = "Fred", deptno = 10},
{id = 101, name = "Velma", deptno = 20},
{id = 102, name = "Shaggy", deptno = 30},
{id = 103, name = "Scooby", deptno = 30}];
(*[> val emps =
> [{deptno=10,id=100,name="Fred"},{deptno=20,id=101,name="Velma"},
> {deptno=30,id=102,name="Shaggy"},{deptno=30,id=103,name="Scooby"}]
> : {deptno:int, id:int, name:string} list]*)
val depts =
[{deptno = 10, name = "Sales"},
{deptno = 20, name = "HR"},
{deptno = 30, name = "Engineering"},
{deptno = 40, name = "Support"}];
(*[> val depts =
> [{deptno=10,name="Sales"},{deptno=20,name="HR"},
> {deptno=30,name="Engineering"},{deptno=40,name="Support"}]
> : {deptno:int, name:string} list]*)
Now let’s run our first query:
from e in emps yield e;
(*[> val it =
> [{deptno=10,id=100,name="Fred"},{deptno=20,id=101,name="Velma"},
> {deptno=30,id=102,name="Shaggy"},{deptno=30,id=103,name="Scooby"}]
> : {deptno:int, id:int, name:string} list]*)
The equivalent in SQL would be
SELECT *
FROM emps AS e
In Morel there is no difference between a query, a table, and a
list-valued expression, so we could have instead written just emps.
emps;
(*[> val it =
> [{deptno=10,id=100,name="Fred"},{deptno=20,id=101,name="Velma"},
> {deptno=30,id=102,name="Shaggy"},{deptno=30,id=103,name="Scooby"}]
> : {deptno:int, id:int, name:string} list]*)
A where clause filters out rows, and a yield clause controls which
fields are returned.
from e in emps
where #deptno e = 30
yield {id = #id e};
(*[> val it = [{id=102},{id=103}] : {id:int} list]*)
SQL equivalent is as follows:
SELECT e.id
FROM emps AS e
WHERE e.deptno = 30
If you omit yield, you get the raw values of the loop variable e.
from e in emps
where #deptno e = 30;
(*[> val it =
> [{deptno=30,id=102,name="Shaggy"},
> {deptno=30,id=103,name="Scooby"}]
> : {deptno:int, id:int, name:string} list]*)
Shorthand
In ML, the usual way to access a field is via an accessor function
that starts with ‘#’. For example, #id e returns the id field of
record e. But Morel has an alternative syntax, e.id, which is more
familiar for SQL users.
Also, when you are constructing a record ML requires each field to be
named, e.g. id = #id e, but in Morel you can omit the name field if
it is the same as the current field or variable.
Thus the following 3 queries are equivalent:
from e in emps
yield {e = #id e};
(*[> val it = [{id=100},{id=101},{id=102},{id=103}] : {id:int} list]*)
from e in emps
yield {e = e.id};
(*[> val it = [{id=100},{id=101},{id=102},{id=103}] : {id:int} list]*)
from e in emps
yield {e.id};
(*[> val it = [{id=100},{id=101},{id=102},{id=103}] : {id:int} list]*)
I’ll use the abbreviated forms from now on.
Joins and sub-queries
The following query joins employees and departments relations on department number.
from e in emps,
d in depts
where e.deptno = d.deptno
yield {e.id, e.deptno, ename = e.name, dname = d.name};
(*[> val it =
> [{deptno=10,dname="Sales",ename="Fred",id=100},
> {deptno=20,dname="HR",ename="Velma",id=101},
> {deptno=30,dname="Engineering",ename="Shaggy",id=102},
> {deptno=30,dname="Engineering",ename="Scooby",id=103}]
> : {deptno:int, dname:string, ename:string, id:int} list]*)
The following query returns the names of employees in the Engineering department.
let
fun exists [] = false
| exists (head :: tail) = true
in
from e in emps
where exists (from d in depts
where d.deptno = e.deptno
andalso d.name = "Engineering")
yield e.name
end;
(*[> val it = ["Shaggy","Scooby"] : string list]*)
This query shows how much can be accomplished in Morel with just
functions, without extending the language. In SQL, the equivalent
query would have EXISTS and a correlated sub-query, but in Morel
exists is an ordinary function that we have defined in the query,
and a correlated sub-query is just an expression that happens to
reference return a list and reference variables in an enclosing scope.
Summary
To recap, Morel has:
- primitive types
bool,int,real,string,char,unit; - also
list, tuple, record, and function types; - lambda expressions and recursive functions;
- polymorphism and type-inference;
- relational
fromexpressions (a Morel extension to Standard ML).
This was just a quick introduction to Morel and its ancestor Standard
ML, and I had to skip over many topics. There wasn’t time to cover
algebraic types and pattern-matching, variations to from expressions
such as the group clause, and how Morel accesses external data and
optimizes programs. I hope to cover these topics in upcoming posts.
If you have comments, please reply on Twitter:
Last week, the grand vision; this week we're down to brass tacks. https://t.co/2MihRqUTjs
— Julian Hyde (@julianhyde) March 3, 2020
This article has been updated.