I’m currently working on an ML-family programming language in specialized for minimization of executable-size (e.g. for embedded systems).
On the function front it (vapour-ware-style) features a notable innovation: A very full set of function “tags” with a simple inference algorithm.
Function “tags” is stuff like “pure” and “noreturn.”
What I am working with is a Lattice (partial order with lower and upper bounds,) over these with the ordering relation being “… can be safely used as if it were a …” so for instance a pure
function can be safely used as if it were an impure one (or to put it another way, a pure
function is an impure function with a side-effect count of zero.)
The properties I am working with are these:
-
Number of calls allowed:
any
,maybe_once
,once_or_more
, andonce_only
, all of which are intervals of integers, [1,1]; [0,1]; [1,∞); and [0,∞) respectively. Ordering relation is set-inclusion (e.g. amaybe_once
can be safely used as aonce_only
function because {1} ⊆ {0, 1}).An example of a
once_only
function is a raw resource-deallocator; an example of amaybe_once
function is a library initializer (i.e. you can choose not to initialize the library, but initializing twice f***s it up), an exmaple of aonce_or_more
function is an idempotent/safe resource-deallocator -
Purity:
side_effects
,invisible
, andpure
. The in-between point ofinvisible
is a function which does not change the memory space immediately visible to the caller. The safe-use ordering should be obvious.An example of a
side_effects
function is one which changes a global variable, an example of aninvisible
function is one which prints to stdout (hard to tell, need third-party library calls to check), an example of apure
function is a simple implementation of greatest common divisor. -
Closure properties:
closure
/delegate
orraw
. Discerns whether a callable object has attached data like curried arguments, an object of which it is a method, or C-stylestatic
variables. Araw
function is essentially aclosure
function with zero attached data. -
Reference properties:
referencing
orself_contained
. Discerns whether a function has references to external data, such as global variables or variables local to the parent scope.An example of a
referencing
function is a block function from Ruby or Rust. Aself_contained
function is areferencing
function with zero external references. -
Lastly, return-behaviour:
no_return
,maybe_return
,always_return
. A classicno_return
function is C’sterminate()
, most pure functions arealways_return
, a function which exits on error is amaybe_return
.
Now, on the return behaviour I am uncertain of the “safe-use” ordering. Should maybe_return
be the “safest” element, which no_return
and always_return
can default to (and where no_return
and always_return
can never be used for one another) or shall always_return
be the “safest” (where no_return
is then the least safe, because you can unambiguously state that a no_return
returns a type, but it will return the bottom element of that type).
To me, the first option (maybe_return
as “safest”) is the more optimization-minded, while the second (always_return
as “safest”) seems to make maybe_return
superfluous (which it might be) and is more type-theory-ish.
These five lattices are then joined in a direct product:
- (num, pur, clo, ref, ret) < (num', pur', clo', ref', ret') iff num < num', pur < pur', clo < clo', ref < ref', ret < ret' .
The inference algorithm is:
- The greatest lower bound of the greatest lower bounds of all function calls in each execution path in the function.
In pseudo-python:
def infer(func): res = None for path in func.execution_paths(): x = None for func in path.get_functions_called(): if x is None: x = func else: x = greatest_lower_bound(x, func) if res is None: res = x else: res = greatest_lower_bound(res, x) return res
These tags are part of the function type and exist mainly for ensuring programming safety.
Thoughts? Especially on the order of return properties?