More about Functions

Evidently, functions play a ubiquitous role in a functionally oriented language, so it is a good idea to spend a bit more time with them. In a nutshell, the contents of this lesson may be summarized as: "with very few exceptions, functions just behave in the natural way one should expect them to behave."

In particular, this means:

  1. Functions just are values that may be given names, passed around, put into arrays, lists, and tuples, and evaluated.
  2. Once we have a function, it does not matter when and where we evaluate it: We can pass it around, and still it behaves as it did when and where we created it.
  3. A function need not be given a name, just as an integer, a tuple, or any other value need not have a name.

Now, all this may sound very obvious. Actually, it is not, as there are many programming languages which do not provide us with functions that have all these properties. Let us just look at the third point to give an example. In C, one could write:

Anonymous entities in C

#include <stdio.h>

char a_string_named_charly[]={'C','h','a','r','l','y',0};
/* Ah, joys of Hungarian Notation */

int function_number_five(void)
{
  return 5;
}

int main(void)
{
  printf("This is charly: %s\n",a_string_named_charly);
  printf("This is function five: %d\n",function_number_five());
  
  printf("Here, we use an un-named (anonymous) string: %s\n",
	 "I have no (variable) name");

  /* 
  printf("Here, we would like to use an anonymous function...",
	 ???);
  ...but something like that actually does not exist in C */

  return 0;
}

The major differences between OCaml functions and the mathematical concept of a function as a mapping are:

  1. We may want to think and even talk about the amount of effort involved in the evaluation of a function. (In mathematics, it is not a problem to e.g. define the determinant in a very "inefficient" way if it helps to prove theorems.)
  2. A function need not at all produce a value. It may just run forever and never return anything.
  3. Mathematical functions are mappings from a given domain to a given co-domain that "never change": f(5) will always be the same value, no matter where the evaluation of f at the point 5 occurs in a line of reasoning. In OCaml, there are functions for which this is not the case, such as Random.int: (int -> int).
  4. Furthermore, the evaluation of an OCaml function may "change the world around us". For example, evaluating Unix.unlink: (string -> unit) may delete a file.
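
Points 2 and 3 can be illustrated with a small sketch. The names loop_forever, make_counter and next_id are made up for this illustration, and the hidden counter is only a mild stand-in for the "changing the world" of point 4:

```ocaml
(* Point 2: this type-checks as unit -> int, but evaluating it would
   never return. It is only defined here, never called. *)
let rec loop_forever () : int = loop_forever ()

(* Point 3: a function whose result depends on hidden, changing state.
   Every call to make_counter creates a fresh private counter. *)
let make_counter () =
  let count = ref 0 in
  (fun () -> count := !count + 1; !count)

let next_id = make_counter ()

let first = next_id ()    (* 1 *)
let second = next_id ()   (* 2: same argument, different value *)
```

Unlike a mathematical mapping, next_id () does not denote one fixed value: what we get depends on how often it has been evaluated before.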

From the C example above, we see that there is a fundamental difference between functions and other values in C: functions cannot be constructed anonymously. Actually, this is just a symptom of a much deeper problem: the C programming language does not have any "functions"! What do I mean by that? Even though C terminology calls constructions like "function_number_five" above a "function", technically speaking it is not: it merely is a pointer to a block of memory where machine code instructions lie that go through a sequence of operations and can be called with arguments. From our point of view, this is a subroutine rather than a proper function.

It is very instructive to look at the history of the Perl programming language in order to understand what is going wrong here. Indeed, the Perl developers got this issue of anonymous values wrong over and over again, first for arrays, then for file handles, then for regular expressions, and at some point in time had to repair it. This is one reason why the Perl programming language is so uneven.

Initially, Perl did not have a concept of arrays as other programming languages do: instead of a container value to which one could refer as this array, Perl provided a unique concept of "plurality", that is, there are means to refer to that value (a single one), or those values (a collection). Now, this concept of plurality did not allow one to talk about e.g. using an array of arrays to represent a matrix: in no useful sense could those values be composed of elements each of which represented these other values. Putting a "group" of values into another group of values just gives a larger group in Perl. One could give collections of values a name, but one could not treat them as a separate new entity which could be passed around freely in a program.

So, what people did here was to introduce a concept of "symbolic references", which roughly means that they introduced a way to interpret an ordinary string as a variable name, so that one then could just pass around the string name of an array and use that whenever needed. This seemed to somewhat solve the problem, but surely, it was neither elegant nor overly bright: when one wanted to write a function that created a matrix, one had to invent internal-only unique name strings for every row, as this method really introduced a set of new variables whose values had to be the rows. This led to all sorts of problems: it was difficult to ensure that, on the one hand, making a new matrix did not overwrite a pre-existing one and, on the other hand, the entries of a matrix were freed properly once it was no longer in use.

Making an array of arrays in old perl (schematically)

sub make_matrix
  {
    ($nr_rows,$nr_cols,$base_name)=@_;
    
    @matrix=();
    for($r=0;$r<$nr_rows;$r++)
      {
	@row=map {"Row $r, Column $_"} (0..$nr_cols-1);
	$row_name="$base_name-$r";
	@$row_name=@row;  # store the row in the array named by the string
	push @matrix, $row_name;
      }
    return  @matrix;
  }

What's changed with modern Perl is that they introduced two new concepts: first, that of a proper array in the sense of "this collection of values", second, this new container data type comes with an "anonymous array constructor", that is, a way to create an array without having to give it a name.

What changed with Perl5

sub make_matrix_perl5
  {
    my($nr_rows,$nr_cols,$base_name)=@_;
    
    my @matrix;
    
    for(my $r=0;$r<$nr_rows;$r++)
      {
	my @row=map {"Row $r, Column $_"} (0..$nr_cols-1);
	$matrix[$r]=[ @row ];
	#           ^ anonymous array constructor!
      }
    return \@matrix;
  }

This example serves us as a double warning, all the more so as, in Perl in particular, the very same problem re-occurred in other guises (as mentioned), where the initial (or even present) approaches were about as bad as these "symbolic references". First, one must admit that it is surprisingly easy to be confused about the issue of why and how to properly support anonymous composite values in a programming language. Second, it usually is a bad idea to think too little about the underlying abstract concepts. In the end, it will just turn out that one cannot avoid them anyway, and ad-hoc approaches that do not get the fundamental idea right may bite back badly.

We already have seen how to create anonymous functions in OCaml in the last lesson. As a reminder:

Reminder: Anonymous functions in OCaml

# let some_list = [(fun x -> x+10);(fun x -> x+11);(fun x -> x+12)];;
val some_list : (int -> int) list = [<fun>; <fun>; <fun>]

# (List.hd some_list) 100;;
- : int = 110

Here, we created a list of three functions mapping integers to integers. As the type of such a function is int -> int, the type of that particular list has to be (int -> int) list. Note that the parentheses around (fun ...) are actually necessary here (but admittedly for somewhat unfortunate reasons).

This settles another point: how to put functions into lists (or arrays, tuples, ...), and how to retrieve and evaluate them. Whenever we want to evaluate a function, we just put the argument after the function. This is a very important general rule and we will have more to say about this very soon.

What does it mean then that the behaviour of a function, when evaluated, must not depend on where that evaluation takes place? Actually, this is something very simple and natural; if it sounds strange, it may just be that we are not familiar enough with functions yet. So, let us demonstrate the concept with arrays, where it is just the same. Suppose I give the following definitions:

Arrays and Scope

# let my_array_v1 =
    let a1 = [|1.0;2.0;3.0|] in
    let a2 = [|10.0;20.0;30.0|] in
   [|a1;a1;a2|]
;;

val my_array_v1 : float array array =
  [|[|1.; 2.; 3.|]; [|1.; 2.; 3.|]; [|10.; 20.; 30.|]|]

# let my_array_v2 =
    let a1 = [|1.0;2.0;3.0|] in
    let a2 = [|10.0;20.0;30.0|] in
   [|a1;Array.copy a1;a2|]
;;

val my_array_v2 : float array array =
  [|[|1.; 2.; 3.|]; [|1.; 2.; 3.|]; [|10.; 20.; 30.|]|]

If we go back to our list of operators from the first lesson, we see that there is a difference between "being the same" and "being equal", and we can express this difference in meaning in OCaml by using either the "==" or the "=" comparison operator, the former testing for same-ness. Now, in our situation, we get (look sharp):

Arrays and Scope, continued

# my_array_v1.(0) = my_array_v1.(1);;
- : bool = true

# my_array_v1.(0) = my_array_v1.(2);;
- : bool = false

# my_array_v1.(0) == my_array_v1.(1);;
- : bool = true

# my_array_v2.(0) == my_array_v2.(1);;
- : bool = false

# my_array_v2.(0) = my_array_v2.(1);;
- : bool = true

So, my_array_v1 and my_array_v2 may look very similar at first, but actually, they are structurally very different: in the first case, the first and second entry of this array are the very same entity, while in the second case, they are not. So, if I just replaced my_array_v2.(0).(1) with 500.0, I would get what I asked for, but if I did this with my_array_v1.(0).(1), then my_array_v1.(1).(1) would change accordingly!
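
To see this sharing in action, here is a small sketch that repeats the two definitions and then overwrites one entry in each. The names shared_row_entry and copied_row_entry are only introduced for this illustration:

```ocaml
let my_array_v1 =
  let a1 = [|1.0; 2.0; 3.0|] in
  let a2 = [|10.0; 20.0; 30.0|] in
  [|a1; a1; a2|]

let my_array_v2 =
  let a1 = [|1.0; 2.0; 3.0|] in
  let a2 = [|10.0; 20.0; 30.0|] in
  [|a1; Array.copy a1; a2|]

let () =
  my_array_v1.(0).(1) <- 500.0;   (* rows 0 and 1 are the very same array *)
  my_array_v2.(0).(1) <- 500.0    (* row 1 is an independent copy *)

let shared_row_entry = my_array_v1.(1).(1)   (* 500. : changed as well *)
let copied_row_entry = my_array_v2.(1).(1)   (* 2.   : untouched *)
```

Only in my_array_v1 does writing to row 0 show up in row 1, because there both entries refer to one and the same array.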

This may not be as unexpected if we manipulated the rows in the scope where a1, a2 are defined, but it might require some getting used to that in fact, these row entities survive beyond the scope where they are defined via being referenced in the final array-of-arrays value. After the definition of my_array_v1, the names a1 and a2 are gone forever - they were only valid locally. But through the value returned from that scope, we retain a handle on them.

It is just the same with functions. To demonstrate this, let us look at a function which is returned from a local scope that contained a variable which entered in the definition of that function. Here is an equivalent of the last example using functions instead of arrays:

Functions and Scope

# let my_fun =
    let a1 = [|1.0;2.0;3.0|] in
    let a2 = [|10.0;20.0;30.0|] in
   (fun n -> if n < 2 then a1 else a2)
;;

val my_fun : int -> float array = <fun>
# my_fun 0;;
- : float array = [|1.; 2.; 3.|]

# my_fun 1;;
- : float array = [|1.; 2.; 3.|]

# my_fun 0 == my_fun 1;;
- : bool = true

# my_fun 2;;
- : float array = [|10.; 20.; 30.|]

I suppose there should be little debate that the particular behaviour of scoping issues and anonymously created arrays is both desirable and useful. But then, what would be more natural than to demand that with anonymously created functions, where these issues arise in just the same form again, things work just in the same way?

In fact, we already have seen something like this in the last lesson: remember the "vivify_polynomial" function which briefly showed up in the Horner example and the exercises? Here as well, we define a function which retains some memory over values from the scope it was defined in. This is a very powerful technique that allows us to do all sorts of tricks. To give a very simple example, we can define functions such as the following that map a number to a "machine" adding that number to another number:

Functions and Scope: another simple example

# let add =
   (fun to_add ->
     (fun x -> x + to_add))
;;

val add : int -> (int -> int) = <fun>

# (* Actually, I put in the parentheses by hand.
     We will have to say more about this soon. *)

# let add_five = add 5;;
val add_five : int -> int = <fun>

# add_five 3;;
- : int = 8

# add_five 1000;;
- : int = 1005

Actually, it's the simplest thing in the world. Or maybe not. Just for the sake of looking over the fence, here is the equivalent in LISP, in a form which is understood by both the Emacs interpreter as well as by a proper Common LISP system, such as Bruno Haible's CLISP:

The same example in LISP (Common LISP and Emacs LISP)

(defun add (to_add)
  (lambda (x) (+ x to_add)))

(defvar add_five (add 5))
add_five

(funcall add_five 1000)
;; Emacs says: Symbol's value as a variable is void: to_add
;; CLISP says: 1005

;; Even worse:

(defvar to_add 27)

(funcall add_five 1000)
;; Emacs says: 1027
;; CLISP still says: 1005

Not infrequently, one encounters the situation that, given a function of multiple arguments, we want to fix some of them and treat it as a function of fewer arguments. For example, kinetic energy is a function of mass and velocity. Usually, we want to consider a given body with fixed mass. Then, it is just a function of velocity. This can be modeled very nicely and directly with these techniques:

Example: fixing parameters


# let kinetic_energy = (fun (m,v) -> 0.5*.m*.v*.v);;
(*
   Equivalently, we also could have defined:
   
   let kinetic_energy (m,v) = 0.5*.m*.v*.v;;
 *)

val kinetic_energy : float * float -> float = <fun>

# let kinetic_energy_of_ball =
    fun v -> kinetic_energy (10e-3,v);;

val kinetic_energy_of_ball : float -> float = <fun>

# kinetic_energy_of_ball 5.0;;
- : float = 0.125

Let us consider something more convoluted, but actually very useful: as we have seen, we can map values like the number five above to functions that "know" about that value. Of course, the initial values can also be functions, and there are some very interesting mathematical concepts which may be understood in terms of mappings from functions to functions which we can capture that way. Actually, what we can do is limited by the fact that all we can do in order to get information about what's "inside" a function is to call it. So, we will not be able to model, say, symbolic derivatives, in this way. We can, however, model taking a derivative, or gradient, numerically.

Let us try this for the one-dimensional gradient of a function which is given as a float -> float function f in OCaml. Now, how does one compute such a gradient? We will have to evaluate the function f, as that is all we can do to f anyway. It will not help us to know f only at one place if we want to know something about how strongly it varies. So, we have to evaluate it in at least two places. And two is already sufficient to get a numerical approximation for the gradient. We may want to do more in some situations, but for the sake of simplicity, let us be content with only two evaluations here.

What we furthermore have to know is the distance epsilon between the locations where we evaluate f. We would get the proper gradient by taking the limit where epsilon goes to zero, but we clearly cannot do this on the machine. So we will content ourselves with something much more primitive and just choose epsilon=10^(-8). (There are deeper reasons for choosing a value just about this large.) So, when we know epsilon and the function f, we can then make a function that maps a position x0 to the numerically determined derivative of f at x0 with step width epsilon. Let us have a try:

Defining the gradient - first try

# 
let grad_v1 (epsilon,f) =
  let epsilon_half = 0.5*. epsilon in
  let inv_epsilon = 1.0/.epsilon in
  (fun x0 -> 
    ((f (x0+.epsilon_half)) -. (f (x0-.epsilon_half)))
      *.inv_epsilon)
;;


val grad_v1 : float * (float -> float) -> (float -> float) = <fun>

# let test_grad_v1 = grad_v1 (1e-8,(fun x -> x*.x));;
val test_grad_v1 : float -> float = <fun>

# test_grad_v1 5.0;;
- : float = 10.00000082740371

Here, we made epsilon a parameter, which is slightly more useful than having it fixed as 10^(-8). One could always apply the parameter fixing technique shown above then. In particular, we may then introduce a function gradient_with_epsilon_fixed which takes epsilon as a parameter and returns a gradient-taking function that uses this epsilon:

Defining the gradient - a slight generalization

# 
let gradient_with_epsilon_fixed epsilon =
  (fun f -> grad_v1 (epsilon,f))
;;

val gradient_with_epsilon_fixed :
 float -> (float -> float) -> (float -> float) =
  <fun>

# let another_grad = gradient_with_epsilon_fixed 1e-5;;
val another_grad : (float -> float) -> float -> float = <fun>

# let test_another_grad = another_grad (fun x -> sin x +. x*.x);;
val test_another_grad : float -> float = <fun>

# test_another_grad 0.0;;
- : float = 0.999999999995833222

But if this is useful - a function that allows us to specify epsilon beforehand, why don't we just eliminate the intermediate pair, as well as the function taking that pair as an argument?

Defining the gradient - second try (simplification)

#
let grad_v2 epsilon =
  let epsilon_half = 0.5*. epsilon in
  let inv_epsilon = 1.0/.epsilon in
  (fun f ->
    (fun x0 -> 
      ((f (x0+.epsilon_half)) -. (f (x0-.epsilon_half)))
	*.inv_epsilon))
;;

val grad_v2 : float -> (float -> float) -> float -> float = <fun>

# let test_grad_v2 = grad_v2 1.0e-8 (fun x -> x**1.5);;
val test_grad_v2 : float -> float = <fun>

# test_grad_v2 1.0;;
- : float = 1.49999999088379354

Remember that we have seen earlier that "Whenever we want to evaluate a function, we just put the argument after the function"? This works here as well. grad_v2 1.0e-8 is a function, so if we just put the argument (fun x -> x**1.5) behind it, we evaluate the function for this argument. Extra parentheses are not necessary here. This is indeed the preferred notational style in OCaml for evaluating functions which themselves are the result of a function evaluation. Simple and convenient. And for us, it is not really any worse than passing a pair, as there is no reason at the conceptual level why one should treat epsilon and f as belonging together in a pair. But then, we may just as well write e.g.:

Evaluation by putting arguments after functions

# grad_v2 1.0e-8 (fun x -> x**1.5) 1.0;;
- : float = 1.49999999088379354

# grad_v2 1.0e-8 (fun x -> x**1.5) 4.0;;
- : float = 3.00000007058542906

Here, computing epsilon_half and inv_epsilon is not really a lot of computational effort, so it does not hurt if we do not pre-compute these values, but compute them whenever we actually do compute a gradient value. Let me just put this at the innermost level to demonstrate something else, as we then get a nice repetitive (fun ... (fun ... structure:

Repetitive fun after fun

let grad_v3 =
  (fun epsilon ->
    (fun f ->
      (fun x0 -> 
	let epsilon_half = 0.5*. epsilon in
	let inv_epsilon = 1.0/.epsilon in
	((f (x0+.epsilon_half)) -. (f (x0-.epsilon_half)))
	  *.inv_epsilon)))
;;

Remember that let something = fun argument -> body can be re-written as let something argument = body without a change in meaning? (Actually, there may be a difference if one looks very closely at memory requirements and execution speed. But even if there is, there should be no reason why it should be - an optimizer should be able to recognize this situation). We actually can even do this repeatedly. Look how nicely all this then simplifies:

Multiple arguments at the left hand side of a let

let grad_v4 epsilon =
  (fun f ->
    (fun x0 -> 
      let epsilon_half = 0.5*. epsilon in
      let inv_epsilon = 1.0/.epsilon in
      ((f (x0+.epsilon_half)) -. (f (x0-.epsilon_half)))
	*.inv_epsilon))
;;

let grad_v5 epsilon f =
  (fun x0 -> 
    let epsilon_half = 0.5*. epsilon in
    let inv_epsilon = 1.0/.epsilon in
    ((f (x0+.epsilon_half)) -. (f (x0-.epsilon_half)))
      *.inv_epsilon)
;;

let grad_v6 epsilon f x0 =
  let epsilon_half = 0.5*. epsilon in
  let inv_epsilon = 1.0/.epsilon in
  ((f (x0+.epsilon_half)) -. (f (x0-.epsilon_half)))
    *.inv_epsilon
;;

Of course, one could just as well throw out the inv_epsilon here - there no longer is any point in carrying this around. But that's a very minor issue. Now, let us look at the type of any of those functions: OCaml reports it as "float -> (float -> float) -> float -> float". Here, one should know that for functions returning functions, the convention for types is that "a -> b -> c" will always mean a -> (b -> c), that is, parentheses have to be inserted to the right. So, fully parenthesized, this type would read: "float -> ((float -> float) -> (float -> float))". Indeed, we map a float (namely, epsilon) to a function mapping float -> float functions to other such functions - which is just what the gradient does.

As we have seen, there seems to be kind of a duality between functions whose arguments are tuples and functions which produce other functions: one may regard a function such as addition either as a mapping from a pair of numbers to a number, or alternatively, as a mapping from numbers to mappings from numbers to numbers, where every number N is mapped to the increase-by-N function. Both ways to express addition contain the same amount of information, but the advantage of the purely functional point of view is that it can do without any notion of tuples and such. In fact, it is easy to define a notion of a tuple purely in terms of such functions producing functions.
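
The duality between the two styles can be made concrete with a small sketch. All names here (add_pair, add_curried, curry, uncurry, make_pair, first, second) are chosen for this illustration only and are not taken from the library:

```ocaml
(* Addition in both styles. *)
let add_pair (a, b) = a + b     (* int * int -> int *)
let add_curried a b = a + b     (* int -> int -> int *)

(* Converting between the two styles loses no information. *)
let curry f = (fun a -> (fun b -> f (a, b)))
let uncurry g = (fun (a, b) -> g a b)

let seven = curry add_pair 3 4
let also_seven = uncurry add_curried (3, 4)

(* A "pair" made of functions only: it remembers a and b from the scope
   it was created in and hands them to whatever selector it is given. *)
let make_pair a b = (fun selector -> selector a b)
let first p = p (fun a -> (fun _ -> a))
let second p = p (fun _ -> (fun b -> b))

let p = make_pair 10 20
let ten = first p
let twenty = second p
```

This sketch deliberately uses two integers as components: OCaml's type system places some restrictions on such function-built pairs when the components have different types, which we cannot discuss yet.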

For the example at hand, this means that we could just as well re-interpret this a bit more superficially as a function taking a float and a float -> float function f, as well as a position x0, and producing a float, which is the corresponding numerical epsilon-approximation to the gradient of f at x0.

Functions like these whose values are again functions are known as "higher-order functions", and the number of arguments one may feed into such a function generally is called the "arity" of that function. However, that terminology may be considered slightly misleading, as the major distinction is whether a language supports functions properly or not, and not what the maximal arity of a function is.
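
As one more small illustration of a function whose value is again a function, here is a sketch of function composition. The names compose, add_one and double are made up for this example:

```ocaml
(* compose takes two functions and returns a new one
   that first applies g and then applies f to the result. *)
let compose f g = (fun x -> f (g x))

let add_one x = x + 1
let double x = 2 * x

let double_then_add_one = compose add_one double
let result = double_then_add_one 5   (* 2*5 + 1 = 11 *)
```

Whether we read compose as a function of two arguments that returns a function, or as a function of three arguments, makes no difference to OCaml.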

This convenient way to deal with functions of multiple arguments as functions mapping functions to functions comes at a small price, however. First of all, there are no functions of zero arguments: whenever a "function" does not really depend on an argument, the best we can do is to make clear that we deal with an evaluation by using () as a pseudo-argument, which of course is not used in the computation. (How could it be anyway?) Second, we cannot really have "variable argument" functions. That is, while in Common LISP, there are functions that can be evaluated with an arbitrary number of arguments such as (gcd 60 80 100), something similar cannot exist in OCaml. Usually, neither really is a noticeable problem.
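
Both points can be sketched briefly. The names greeting, gcd and gcd_of_list are assumptions made for this example (gcd is defined here by hand and is not a library function), and passing a list is just one common substitute for variable-argument functions, shown with the library function List.fold_left:

```ocaml
(* A "function of zero arguments": () is the only value of type unit
   and serves as a pseudo-argument. *)
let greeting () = "hello"          (* unit -> string *)
let message = greeting ()

(* Euclid's algorithm for two non-negative integers. *)
let rec gcd a b = if b = 0 then a else gcd b (a mod b)

(* Instead of a variable number of arguments, take a list of numbers
   and combine them one by one, starting from 0. *)
let gcd_of_list numbers = List.fold_left gcd 0 numbers

let g = gcd_of_list [60; 80; 100]  (* 20 *)
```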

One should note that the vast majority of OCaml's library functions use precisely this style to pass multiple arguments. The order of arguments usually is that of most reasonable increasing specialization. For example, String.concat takes as arguments a string representing a "glue sequence" and a list of strings, and produces a new string which is all the strings from the list concatenated to one another, with the glue sequence placed between any two adjacent strings. One most likely would want to use this with a given glue sequence, such as "\n", or ":", or ", and ", etc. on a variety of string lists, so it seems appropriate to have the glue string as the first argument.
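
Because the glue string comes first, fixing it gives a reusable joining function. A minimal sketch, where join_lines and join_with_and are names invented for this example:

```ocaml
(* Fix the glue sequence once, then reuse the resulting function. *)
let join_lines = String.concat "\n"
let join_with_and = String.concat ", and "

let sentence = join_with_and ["apples"; "pears"; "plums"]
(* "apples, and pears, and plums" *)

let two_lines = join_lines ["first"; "second"]
(* "first\nsecond" *)
```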

A very interesting library function which I wanted to point out is "Array.init". This will map a length N and a function mapping an index to the corresponding array element to an array that consists of the values of that function for all indices from 0 up to and including N-1:

Array.init example

# Array.init 10 (fun x -> x*x);;
- : int array = [|0; 1; 4; 9; 16; 25; 36; 49; 64; 81|]

Unfortunately, we cannot yet fully understand the type of this function - just as that of quite a few other library functions. But with a little bit of intuition on what they should do, this should not be a problem: if we just use them, they will normally work as expected.

One more notational detail: instead of fun x -> fun y -> body, we may also for anonymous functions always write fun x y -> body, and likewise for higher orders of arguments.
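
A quick sketch of this equivalence, with names chosen only for this example:

```ocaml
(* Both definitions have type int -> int -> int and behave identically. *)
let add_nested = (fun x -> (fun y -> x + y))
let add_short = (fun x y -> x + y)

let a = add_nested 3 4      (* 7 *)
let b = add_short 3 4       (* 7 *)

(* The short form can still be given just one argument. *)
let add_three = add_short 3
let c = add_three 10        (* 13 *)
```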

Actually, all this really was cheating a bit. From this lesson, one may get the impression that OCaml behaves in a much more systematic way than it actually does. I want to warn my audience that there indeed are quite some dark corners where things do not work out as expected.

Some practical Notes

Interacting with OCaml

Now that we have become somewhat proficient with using OCaml from within an Emacs shell, it is perhaps appropriate to point out some other ways to interact with the OCaml interpreter that sometimes are more useful. The most important one is Emacs' caml-mode. As the audience may have noticed, whenever we load a .ml file into an Emacs buffer, this will activate Emacs' caml major mode. As in every Emacs major mode, Control-h m will give a short list of the most important mode-specific keystrokes. What's interesting is that we can just use Control-c Control-e whenever the cursor is somewhere on a lengthy OCaml expression to send this expression to an OCaml sub-process attached to Emacs and see what it evaluates to. Emacs will even be so nice as to first start that ocaml process for us if necessary. Another useful keystroke is Control-c Control-h, which will show the part of the OCaml documentation belonging to the function the cursor is on. Also useful is Control-c tab for auto-completion.

There is another Emacs mode for editing OCaml, the so-called tuareg-mode, usually available as a separate package. This claims to be more intelligent than bare caml-mode, but to some extent this will also mean that it will more easily show strange opinions on OCaml code.

The OCaml documentation

Within Emacs, there are multiple ways to access the OCaml documentation: either via Control-c Control-h in caml-mode, or via the info system, via navigating to the OCaml entry. (The Emacs info system can be opened with Control-h i.) Another useful utility to navigate the OCaml documentation is the ocamlbrowser program, as this can often also refer directly to the implementation of a given function.


Exercises

  1. Define a function that counts the number of occurrences of a given character (such as space, tab, newline, etc.) in a string.

  2. Define a "scalar product" function on float arrays.

    (Hint: this is structurally somewhat similar to the Horner's method example in the last lesson: Here, we walk through two arrays, remembering a partial sum in every step.)

  3. Define a function that uses Random.float to numerically determine the value of the integral of a float -> float function on a given interval. (Note: Random.float might give random numbers of too low quality for serious applications.)

  4. Define a function that maps a nonnegative integer number N to a floating-point N x N unit matrix (in the form of an array of arrays).

  5. Define a function that generalizes Array.init to matrices: given a number of rows and columns, and a function mapping a row and column number to the corresponding entry, it will produce a matrix.

  6. Define a function that multiplies matrices of the aforementioned form.

  7. Use Array.init and String.concat (and maybe some other library functions which you will find in the documentation of the Array, List, and String modules) to define a plotting function that behaves as follows:

    plot_graph example
    
    # Printf.printf "%s" (plot_graph 40 (fun x -> 5.0*.x*.x) (-2.0) (2.0));;
     -2.0000: ####################
     -1.8974: ##################
     -1.7949: ################
     -1.6923: ##############
     -1.5897: ############
     -1.4872: ###########
     -1.3846: #########
     -1.2821: ########
     -1.1795: ######
     -1.0769: #####
     -0.9744: ####
     -0.8718: ###
     -0.7692: ##
     -0.6667: ##
     -0.5641: #
     -0.4615: #
     -0.3590: 
     -0.2564: 
     -0.1538: 
     -0.0513: 
      0.0513: 
      0.1538: 
      0.2564: 
      0.3590: 
      0.4615: #
      0.5641: #
      0.6667: ##
      0.7692: ##
      0.8718: ###
      0.9744: ####
      1.0769: #####
      1.1795: ######
      1.2821: ########
      1.3846: #########
      1.4872: ###########
      1.5897: ############
      1.6923: ##############
      1.7949: ################
      1.8974: ##################
      2.0000: ####################
    - : unit = ()
    


Dr. Thomas Fischbacher
Last modified: Sun Dec 11 16:48:50 GMT 2005