Lesson 5: Modules and the C interface

Modules

With very few exceptions, our discussion of OCaml so far has only covered the "core language" and some of the standard libraries. By now we can write and even compile small OCaml scripts, but we cannot yet build our own libraries. It is time to change this. In this lesson, we will not always explain things in as much detail as in the previous ones, but rather take the pragmatic route and make sure we get going, even though this will occasionally mean dealing with exotic situations that we cannot fully understand using only the material presented here. On the other hand, some very basic and general things have to be covered in great detail, as it is essential to develop an understanding of the underlying machinery.

We first have to take a closer look at the compiler. One should keep in mind that OCaml actually is not one language but two, the language of the interactive toplevel and that of the compiler, which are syntactically almost the same (up to toplevel directives, that is). For many compiled languages, the process of turning an idea into working code involves files of many different types, some of them source files, some of them intermediate files. This is just as true for a C compiler (.c, .h, .o, .so, .s) as it is for, say, TeX (.tex, .dvi, .aux, .log), and it also holds for OCaml. For some systems, file extensions are just a convention, while others enforce certain names. OCaml is of the latter type: a file named something.ml, for example, will always be treated as an OCaml source file. What other file types does the compiler know about?

OCaml file types (native-code equivalents, where different, are given in parentheses)

.ml              OCaml source file
.mli             OCaml interface file
.cmi             Compiled interface
.cmo (.cmx)      Compiled object (a native .cmx comes with a companion .o file)
.cma (.cmxa)     Compiled object archive, i.e. library (a .cmxa comes with a companion .a file)
.c               C source code
.o               C object code
.so              C shared object library

It is nice that the OCaml compiler knows about C source files as well and will call the C compiler when necessary. One detail worth knowing is that the manpage of the ocamlc(1) compiler is incomplete. One may, for example, wish to specify explicitly which C compiler to use (say, gcc or Intel's icc, or a C compiler wrapper such as mpicc). This is possible: ocamlc -help as well as the OCaml online documentation tell us that we can use the -cc option for this, but the manpage does not mention it. Also note that the order of the files given to the compiler matters: when linking, a module must appear after all the modules it depends on.

What is this .mli interface file about? A .ml file provides a compilation unit. We may choose not to export all the definitions in that compilation unit to the outside world, or to make some type definitions opaque (that is, we only tell the outside world that a given type exists, but not how it is realized). This is what the .mli file is for. Furthermore, we may want to put extra documentation into that file, in the form of comments that adhere to certain conventions; ocamldoc will then automatically generate HTML and LaTeX documentation for our module. A .mli file is more or less just the list of value types and type definitions which the toplevel prints out when we load a .ml file. It can be auto-generated from a .ml file (for later editing) via ocamlc -i code.ml. Strictly speaking, we can often do without an interface file, in which case everything is exported, but as a matter of good practice, we should always provide one.
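To make the opaque-type idea concrete, here is a minimal single-file sketch. It assumes a hypothetical Counter module and uses an inline signature in place of a separate counter.mli (and the structure in place of counter.ml); the effect on what the outside world can see is the same.

```ocaml
(* Single-file stand-in for a hypothetical counter.ml / counter.mli pair:
   the signature plays the role of the .mli file, the structure that of
   the .ml file. *)
module Counter : sig
  type t                       (* opaque: the record behind it is hidden *)
  val create : unit -> t
  val incr : t -> unit
  val get : t -> int
  (* step is deliberately not listed, so it is not exported *)
end = struct
  type t = { mutable n : int }
  let step = 1
  let create () = { n = 0 }
  let incr c = c.n <- c.n + step
  let get c = c.n
end

let () =
  let c = Counter.create () in
  Counter.incr c;
  Counter.incr c;
  (* Counter.step or c.Counter.n would be rejected by the type checker. *)
  Printf.printf "count = %d\n" (Counter.get c)
```

Outside code can only create, increment and read a counter; it can neither see the record field nor touch the unexported helper.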

OCaml is rather picky when it comes to recompiling modules. It stores cryptographic fingerprints (MD5 digests) of interfaces in compiled interface (.cmi) files and a few other places. As a consequence, if a library or module B uses some independent module A whose interface changes (even if the change is only the addition of one more function), then B's idea of the interface of A no longer matches reality, and the compiler or linker will refuse to continue, complaining that the files "make inconsistent assumptions over interface A": B has to be recompiled because A was. Such behaviour comes as a surprise in particular to C programmers, and there have been long discussions about whether it makes sense and is a good thing or not. As one may guess, it does not exactly make life easy for module maintainers, and in a sense, OCaml tries to be "holier than CVS version control" here.

Due to very similar issues (especially with component dependencies), writing Makefiles for OCaml may feel quite unnatural to seasoned C developers. Indeed, many newcomers seem to experience major difficulties here. Hence, one is normally much better off using a pre-existing tool that deals with most of this complexity: OCamlMakefile. This is a Makefile that is to be included in our own Makefiles and that provides quite a lot of intelligence to do most of the really dirty work. In Debian, it is part of the ocaml-tools package. OCamlMakefile cooperates quite well with ocamlfind, the command-line tool of findlib, a package and dependency management system for OCaml. (Objective Caml provides a simple library location and loading framework right out of the box, as can be seen by giving a directive like #load "unix.cma" to the toplevel, but findlib is much more flexible.) The Debian package providing ocamlfind is called ocaml-findlib.

When installing ocamlfind, one may want to make a few adjustments, especially if multiple users in the Unix group ocaml are supposed to be able to install libraries system-wide. On the author's system, this looks as follows:

/etc/ocamlfind.conf

destdir="/usr/local/lib/ocaml"
path="/usr/local/lib/ocaml:/usr/lib/ocaml/3.08.3:/usr/lib/ocaml/3.08.3/METAS"

Further adjustments may have to be made to the file /usr/lib/ocaml/3.08.3/ld.conf, which lists the directories in which the stub libraries of bytecode programs are searched for. (This is unfortunate and should not be, as configuration files should always reside under /etc in Debian.)

/usr/lib/ocaml/3.08.3/ld.conf

/usr/local/lib/ocaml/stublibs
/usr/lib/ocaml/3.08.3/stublibs

The /usr/local/lib/ocaml hierarchy was then rebuilt so that the stublibs directory is a direct subdirectory of it. /usr/local/lib/ocaml is owned by user root and group ocaml, and has mode 2775; the setgid bit (the leading 2) makes new files and subdirectories inherit the group ocaml. Packages are installed as direct subdirectories.

The structure of a simple module

In the project we are working on right now at the University of Southampton, there is a catch-all module which collects small useful snippets that are not present in the OCaml standard library and do not justify creating an individual module of their own either. In this module, one can find a function for degree-to-radian conversion as well as a function that generates random numbers with a Gaussian distribution. The directory looks like this:
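For illustration, here is a sketch of what two such helpers might look like. The names deg_rad and gaussian_random and their implementations are assumptions (the lesson does not show the actual snippets code); the Gaussian generator uses the classic Box-Muller transform on top of the standard Random module.

```ocaml
(* Hypothetical catch-all helpers in the spirit of the "snippets" module;
   these are illustrations, not the module's actual code. *)
let pi = 4.0 *. atan 1.0

(* Degrees to radians. *)
let deg_rad deg = deg *. pi /. 180.0

(* Gaussian (normal) random number via the Box-Muller transform,
   with mean [mu] and standard deviation [sigma]. *)
let gaussian_random ?(mu = 0.0) ?(sigma = 1.0) () =
  (* u1 lies in (0, 1], so log u1 is finite *)
  let u1 = 1.0 -. Random.float 1.0 in
  let u2 = Random.float 1.0 in
  mu +. sigma *. sqrt (-2.0 *. log u1) *. cos (2.0 *. pi *. u2)

let () =
  Random.self_init ();
  Printf.printf "90 degrees = %f rad\n" (deg_rad 90.0);
  Printf.printf "a normal sample: %f\n" (gaussian_random ())
```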

In the snippets module's directory

tf@ouija:~/ocaml/snippets$ ls -la
total 84
drwxr-xr-x   4 tf tf  4096 Jan 11 22:12 .
drwxr-xr-x  23 tf tf  4096 Nov 21 15:08 ..
drwxr-xr-x   2 tf tf  4096 Nov 21 15:08 CVS
-rw-r--r--   1 tf tf   267 Nov 11 19:22 META
-rw-r--r--   1 tf tf   525 Nov 11 19:20 Makefile
drwxr-xr-x   3 tf tf  4096 Nov 21 15:08 examples
-rw-r--r--   1 tf tf 44431 Nov 22 16:41 snippets.ml
-rw-r--r--   1 tf tf 14678 Nov 22 16:43 snippets.mli

tf@ouija:~/ocaml/snippets$ # let us stay for a while
tf@ouija:~/ocaml/snippets$ # The important pieces are:
tf@ouija:~/ocaml/snippets$ cat META

# Specifications for the "snippets" library:
requires = "unix str"
description = "Snippets"
version = "0.1"
directory = "/usr/local/lib/ocaml/snippets"
browse_interfaces = " Snippets "
archive(byte) = "snippets.cma"
archive(native) = "snippets.cmxa"

tf@ouija:~/ocaml/snippets$ cat Makefile

OCAMLMAKEFILE = /usr/share/ocaml-tools/OCamlMakefile

# We put those in so that they are in place right from the start.
# Do not want to see any surprises later on.

#PACKS = extlib

PACKS = unix str

LIBS = 

OCAMLLIBPATH=

INCDIRS=

LIBDIRS=

EXTLIBDIRS=

# We turn on debugger support in all our modules for now.
OCAMLBCFLAGS = -g
OCAMLBLDFLAGS = -g
RESULT = snippets

SOURCES = snippets.ml snippets.mli

all: byte-code-library native-code-library

mrproper: clean
	rm -f *~ *.cmi *.cmo *.top *.so

.PHONY: mrproper

include $(OCAMLMAKEFILE)

Note that, as defined here, the all target builds both a bytecode and a native-code library. Other interesting targets we may want to include here are "doc" (auto-generate documentation) and "top" (build a custom toplevel), and maybe even "native-code" (build a fast, compiled, standalone executable). These are the most common ones; see the OCamlMakefile documentation for what else there is.

Note that we furthermore define a phony "mrproper" target for complete cleanup. This is nice and convenient. The name, by the way, was taken from the Linux kernel makefiles. Now, if this builds correctly, we can use the power of OCamlMakefile and findlib to install the library with a simple make libinstall and remove it again with make libuninstall. If some other package depended on snippets, we would add snippets to the PACKS= line in its Makefile, as well as to the requires line of its META file, which might then look like this: requires = "snippets qhull mt19937". The META file is what findlib reads to learn about a package.

This more or less tells us how to build and install simple OCaml modules. "Simple" in the sense that we do not use sophisticated foreign language interface techniques, but just basic plain OCaml.

If we want to use such an installed package from the toplevel, the magical incantations are:

Loading findlib packages from the toplevel - example

#use "topfind";;
#require "snippets";;

open Snippets;;

...where the open just imports all the symbols into our namespace, so that we can refer to them directly instead of having to use qualified names such as Snippets.deg_rad. Note that the compiler does not understand toplevel directives. So, whenever some piece of OCaml code (a small script, say) is to be fed both to the toplevel and to the compiler, it makes sense to add a small toplevel loader wrapper script which contains just the package loading directives, plus a final #use "mycode.ml";; directive which then loads the "interesting" code.

The C Interface

(Things you never ever wanted to know, but were forced to find out)

Even if we had the most elegant, most efficient, most effective language available, its value would be reduced greatly if it did not come with an interface to C. The very simple reason for this is that nowadays, a lot of important functionality is available in the form of C libraries - especially if speed matters for the task. Sooner or later, we want to tap that resource, and hence, every language must provide a C interface, just as every serious programmer should be somewhat proficient with C (even if he does not use it often).

Some very general remarks

Concerning foreign language interfaces in general, there are different levels of sophistication, and what can be achieved depends just as much on the capabilities of the language as on those of the programmers of both the interface code and the code to be interfaced. In fact, many things can go wrong, and should insurmountable problems arise, they are practically always a consequence of bad design. So, it pays to spend quite some time thinking about foreign language issues, no matter whether one takes on the role of library implementor, language designer, or interface code writer.

One very important point to keep in mind is that it is very easy to build large and complex systems by combining components which were never intended to interoperate well. As a rule of thumb, the amount of internal interface friction in a software project with N components grows roughly like N^a, with the exponent a being closer to 2 than to 1 (roughly, every component may rub against every other one). So, the most problematic moments in the development of an N-component application are those when at least one component is updated or changed. From this perspective, the philosophy underlying the Debian GNU/Linux system, namely to provide a stable platform as well as a large library of components whose behaviour is and stays frozen for long periods (up to important bugfixes), comes as a blessing, and such a concept of behavioural (version) stability and reliability presumably should find wider recognition as a vital quality factor.

Rule: Keep it simple - do not overdo it!

Another important general observation: designing the interface to a foreign language library is a task that often needs a lot of thought and sometimes a certain amount of experience. One of the major problems is that the fundamental philosophies underlying the two languages to be bridged differ. (If they did not, at least one of the languages would be completely unnecessary.) So, the big question is how to catch the spirit, the key ideas, underlying the piece of code to be interfaced and to map it in the best possible way to something that feels smooth from the perspective of the new language. There is no patent recipe for this, but there are a few common observations one should know about. First of all, it is often a good idea to keep a foreign language interface as direct and as low-level as possible. While it may be tempting to put more intelligence into the interface and employ the power of the new language to make things more convenient, this is a double-edged sword, as it may easily lead to a violation of the principle of least surprise. In particular, if the library in question is well known and frequently used in its natural environment, one must assume that many users of the interface will have expectations shaped by its original behaviour. In fact, the author of this lesson still remembers quite well the shock of finding out that there is a subtle difference between the behaviour of Perl's built-in fork() and Unix fork(2): in case of a fork failure, the latter returns -1 (and sets errno appropriately), while the former returns undef! That may certainly have been well meant, but such unexpected surprises may have disastrous consequences. Another aspect is that the simpler the interface, the less effort it takes to adjust it to new versions.

If, however, the target language gives strong safety guarantees (type safety, bounds checks, crash stability and such), one must assume that the user of the interfaced library will expect those safety belts to work with that particular library as well. The same holds for automatic dynamic resource management, such as garbage collection. So, one usually would like to provide at least these features in a manually written interface, but again, depending on the situation, there may be exceptions (e.g. if this turns out to be prohibitively complicated, if it is important to stay at the lowest level, if leaving them out greatly simplifies things, or just if the problem is too small to be worth the effort).
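As a sketch of what keeping such a safety belt can mean in practice, suppose a thin binding gives us an accessor that performs no bounds check, as C code typically would. The names raw_get and safe_get are hypothetical, and the unchecked accessor is simulated here with Array.unsafe_get rather than an actual C call; the point is the checking wrapper, which raises Invalid_argument just as Array.get does.

```ocaml
(* Hypothetical low-level accessor in the style of a thin C binding:
   it trusts its caller and performs no bounds check. Simulated here
   with Array.unsafe_get; a real binding would call into C. *)
let raw_get (buf : float array) (i : int) : float = Array.unsafe_get buf i

(* The wrapper restores the safety belt OCaml users expect:
   out-of-range indices raise Invalid_argument, like Array.get. *)
let safe_get buf i =
  if i < 0 || i >= Array.length buf then
    invalid_arg "safe_get: index out of bounds"
  else raw_get buf i

let () =
  let buf = [| 1.0; 2.0; 3.0 |] in
  Printf.printf "buf.(2) = %g\n" (safe_get buf 2);
  match safe_get buf 3 with
  | _ -> print_endline "unexpected"
  | exception Invalid_argument msg -> Printf.printf "caught: %s\n" msg
```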

Rule: Let the machine do it whenever possible

Writing interface code manually often is a tedious task with many repetitive steps. True, there are situations where one should use as much intelligence and wisdom as possible, but there also are situations where one has to interface dozens of functions in a uniform way. Then, it is often a good idea not to write all that code by hand, but to let a program generate that part of the interface code automatically.

Nowadays, there are tools such as the Simplified Wrapper and Interface Generator (SWIG) that can help a lot.
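In the same spirit, though far more modest than SWIG, a few lines of OCaml suffice to generate both the external declarations and the C stubs for a family of functions that all share the same shape. The sketch below assumes a hypothetical table of unary libm functions (float to float) and the c_ex_ naming convention used in this lesson's examples; it only prints the generated code.

```ocaml
(* A toy code generator for uniform bindings: every function in this
   (hypothetical) table maps a float to a float, so the OCaml external
   declaration and the C stub follow one and the same pattern. *)
let unary_float_functions = [ "sin"; "cos"; "exp"; "log" ]

(* The OCaml side: one external declaration per C function. *)
let external_decl name =
  Printf.sprintf "external %s: float -> float = \"c_ex_%s\"" name name

(* The C side: unbox the argument, call the libm function, box the result. *)
let c_stub name =
  String.concat "\n" [
    Printf.sprintf "CAMLprim value c_ex_%s(value ml_x)" name;
    "{";
    "  CAMLparam1(ml_x);";
    Printf.sprintf "  CAMLreturn(caml_copy_double(%s(Double_val(ml_x))));" name;
    "}";
  ]

let () =
  List.iter (fun n -> print_endline (external_decl n)) unary_float_functions;
  print_newline ();
  List.iter (fun n -> print_endline (c_stub n); print_newline ())
    unary_float_functions
```

Adding a further function of the same shape then means adding one string to the table, not writing two more pieces of code by hand.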

Rule: Know when not to do it

Just because something can be done in principle, it need not be a good idea. If something looks challenging, this may be for a variety of reasons, one of them being that the author whose work we decided to build on chose an excessively inelegant and clumsy approach, maybe because he did not fully understand the nature of the problem. (This is not necessarily a fault of the author. If we only ever did things we understand well, there would be virtually no progress at all! Hence, we desperately need brave souls who tackle problems which nobody understands properly, and many a breakthrough was achieved only after a lot of confusion.) Nevertheless, one should always keep the question in mind: does it really have to be that way? Can't we achieve the same or a better result with a more elegant approach, maybe using some other piece of code?

In particular, whenever you have the impression that you have to fight against the original author, as his intentions do not at all match yours and his code evolves in a different direction than the one you are interested in, it is better to look for an alternative.

Desirable features of a C interface

As mentioned above, there are different levels of sophistication in the art of calling foreign functions: from just executing a C function that prints a value to stderr, to writing code that allows one to turn C callback functions into callback functions in the higher-level language. Other exotic applications may include using the C compiler at run time to turn C code strings into dynamically loaded shared objects, or telling a C library to use the dynamic memory management of the higher-level language instead of malloc(3)/free(3) in order to put its dynamic state into serializable strings (cf. PCRE).

Quite in general, one may want a good foreign function interface to provide the following capabilities:

  1. It is possible to call compiled C code from the language.
  2. It is possible to runtime-link and call C shared object libraries.
  3. It is possible to register callbacks from C into the language.
  4. Calling into C and from there back into the language recursively "just works as it should". (There are some call stack issues here.)
  5. It is possible to start the high-level language run time system from within one's own C main() function.
  6. It is possible to put code written in the high-level language into libraries that are callable as C shared object libraries.
  7. ...and this works with any C compiler on your machine (including in particular compiler wrappers like MPI's mpicc).
  8. ...and it is reasonable and easy to have multiple independent C-callable .so libraries (that are to be used in conjunction) utilize code written in the extension language at the same time.
  9. One can start the high-level language's read-eval-print-loop/command-line-interface from within C.

Fortunately, OCaml provides us with a quite powerful C interface that allows us to do most of the things mentioned above, even if this sometimes seems to be just by accident. (For example, it is possible to turn OCaml code into a C-linkable .so shared library that behaves almost like any other C library, which is great, but at the time of writing (OCaml 3.08) this did not work for 64-bit x86 code.) One problem, however, is that at present, the documentation is somewhat scattered, and there are important things to know which are not documented well at all.

OCaml <=> C: first examples

After these general remarks, it is perhaps appropriate to look at something more practical and explain the details by means of a few typical examples.

Let us start with perhaps the simplest example: a C function exported to OCaml. We will wrap this up in a dedicated package, which we call "c_examples". We start out by creating a corresponding directory, into which we put the following files:

Directory Structure

tf@alpha:~/ocaml-tutorial/c_examples$ ls -la
total 28
drwxr-xr-x  2 tf tf 4096 2006-01-12 14:35 .
drwxr-xr-x  3 tf tf 4096 2006-01-12 14:24 ..
-rw-r--r--  1 tf tf  487 2006-01-12 14:34 c_examples_impl.c
-rw-r--r--  1 tf tf   86 2006-01-12 14:33 c_examples.ml
-rw-r--r--  1 tf tf   25 2006-01-12 14:30 c_examples.mli
-rw-r--r--  1 tf tf  435 2006-01-12 14:34 Makefile
-rw-r--r--  1 tf tf  252 2006-01-12 14:25 META

META

# Specifications for the "c_examples" library:
requires = ""
description = "C_examples"
version = "0.1"
directory = "/usr/local/lib/ocaml/c_examples"
browse_interfaces = " C_examples "
archive(byte) = "c_examples.cma"
archive(native) = "c_examples.cmxa"

Makefile

OCAMLMAKEFILE = /usr/share/ocaml-tools/OCamlMakefile

PACKS =

LIBS = 

OCAMLLIBPATH=

INCDIRS=

LIBDIRS=

EXTLIBDIRS=

# We turn on debugger support in all our modules for now.
OCAMLBCFLAGS = -g
OCAMLBLDFLAGS = -g
RESULT = c_examples

SOURCES = c_examples.mli c_examples.ml c_examples_impl.c

all: byte-code-library native-code-library top

mrproper: clean
	rm -f *~ *.cmi *.cmo *.top *.so

.PHONY: mrproper

include $(OCAMLMAKEFILE)

c_examples.mli


val square: int -> int


c_examples.ml


external square: int -> int = "c_ex_square";;


c_examples_impl.c

/* 
   C examples: implementation
 */

/* The "usual" OCaml includes */

#include <caml/alloc.h>
#include <caml/callback.h>
#include <caml/fail.h>
#include <caml/memory.h>
#include <caml/misc.h>
#include <caml/mlvalues.h>

#include <stdio.h>
/* For debugging - we want to have access to printf, stderr and such */

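CAMLprim value c_ex_square(value ml_nr)
{
  CAMLparam1(ml_nr);

  long nr, result;

  /* OCaml ints are tagged; Long_val strips the tag bit. */
  nr = Long_val(ml_nr);

  /* No overflow check: the result must fit into an OCaml int. */
  result = nr * nr;

  CAMLreturn(Val_long(result));
}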

Next, to demonstrate that this actually works, we do the following:

Building and using the C interface

tf@alpha:~/ocaml-tutorial/c_examples$ make
make[1]: Entering directory `/home/tf/ocaml-tutorial/c_examples'
(...)
make[1]: Leaving directory `/home/tf/ocaml-tutorial/c_examples'
tf@alpha:~/ocaml-tutorial/c_examples$ ./c_examples.top 
        Objective Caml version 3.08.3

# C_examples.square;;
- : int -> int = <fun>
# C_examples.square 10;;
- : int = 100
# 

tf@alpha:~/ocaml-tutorial/c_examples$ make libinstall
make[1]: Entering directory `/home/tf/ocaml-tutorial/c_examples'
(...)
Installation successful.
tf@alpha:~/ocaml-tutorial/c_examples$ ocaml
        Objective Caml version 3.08.3

# #use "topfind";;
- : unit = ()
Findlib has been successfully loaded. Additional directives:
  #require "package";;      to load a package
  #list;;                   to list the available packages
  #camlp4o;;                to load camlp4 (standard syntax)
  #camlp4r;;                to load camlp4 (revised syntax)
  #predicates "p,q,...";;   to set these predicates
  Topfind.reset();;         to force that packages will be reloaded
  #thread;;                 to enable threads

- : unit = ()
# #require "c_examples";;
/usr/local/lib/ocaml/c_examples: added to search path
/usr/local/lib/ocaml/c_examples/c_examples.cma: loaded
# open C_examples;;
# square 5;;
- : int = 25
# 

Now that we have seen that this actually works, let us look in some more detail at what is going on here. Clearly, the .mli file just specifies what to export; as we only provide one interfaced function anyway, this is very straightforward. In the .ml file, we declare our function as external, i.e. implemented by a piece of code adhering to C linking conventions, and we also give its linker name. The implementation of that function takes an OCaml value as its argument and has to return an OCaml value. Here, this value is supposed to encode an integer, and we need conversion macros to map OCaml values to C values and back. For int, this is pretty straightforward, but we have to keep in mind that an OCaml int is one bit narrower than a machine word (31 bits on 32-bit platforms, 63 bits on 64-bit platforms), as one bit is used to tell integers apart from pointers. Hence, not every C long fits into an OCaml int!
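The width of an OCaml int can be checked from OCaml itself. This sketch assumes OCaml 4.03 or later, where Sys.int_size is available; it also shows that int arithmetic wraps around silently rather than raising an error, which is worth remembering when C results are converted back with Val_long.

```ocaml
(* Sys.word_size: bits in a machine word; Sys.int_size: bits in an OCaml
   int (OCaml >= 4.03). The difference is the bit used for tagging. *)
let () =
  Printf.printf "word size: %d bits, int size: %d bits\n"
    Sys.word_size Sys.int_size;
  Printf.printf "max_int = %d\n" max_int;
  (* Arithmetic on OCaml ints silently wraps around *)
  Printf.printf "max_int + 1 = min_int: %b\n" (max_int + 1 = min_int)
```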

Note the use of the CAMLparamX() and CAMLreturn() macros to declare and handle entities of type value that represent OCaml values. These are necessary to live in harmony with the garbage collector: they register the values as roots, so that the collector knows about them and can update them if it moves the underlying data. There are more macros of this kind, the next most important ones being the CAMLlocalX() macros. More about this later.

We will now extend our example with further definitions that demonstrate a few basic techniques. First, let us see how to wrap functions of more than one argument, how to pass floating-point numbers, and how to add primitive debugging facilities to the C code: we add the following definitions and then rebuild:

Extending our example
c_examples.mli


val hypotenuse: float -> float -> float


c_examples.ml


external hypotenuse: float -> float -> float = "c_ex_hypotenuse";;


c_examples_impl.c


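#include <math.h>
/* For sqrt() - goes to the top, next to the other includes! */

CAMLprim value c_ex_hypotenuse(value ml_x1, value ml_x2)
{
  CAMLparam2(ml_x1,ml_x2);

  double x1,x2, result;

  x1=Double_val(ml_x1);
  x2=Double_val(ml_x2);
  
  result=sqrt(x1*x1+x2*x2);

  fprintf(stderr,"CALL: c_ex_hypotenuse(%f,%f) -> %f\n",x1,x2,result);
  fflush(stderr);
  /* OCaml floats are boxed: allocate a fresh float on the OCaml heap */
  CAMLreturn(caml_copy_double(result));
}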

Indeed, after rebuilding:

Using the hypotenuse example

tf@alpha:~/ocaml-tutorial/c_examples$ make
make[1]: Entering directory `/home/tf/ocaml-tutorial/c_examples'
(...)
make[1]: Leaving directory `/home/tf/ocaml-tutorial/c_examples'
tf@alpha:~/ocaml-tutorial/c_examples$ ./c_examples.top 
        Objective Caml version 3.08.3

# C_examples.hypotenuse;;
- : float -> float -> float = <fun>
# C_examples.hypotenuse 3.0 4.0;;
CALL: c_ex_hypotenuse(3.000000,4.000000) -> 5.000000
- : float = 5.
#

The process of wrapping up and unwrapping values of one language for another language is sometimes called marshalling, but nowadays this term more often refers to "serialization", that is, mapping a piece of data with potentially complex structure to a string in order to store it and retrieve it later on. (Incidentally, OCaml provides a Marshal library which is all about serialization.) As we see, the names of the functions and macros we use for the mapping are somewhat non-uniform, but so is their internal mechanics: Int_val is not much more than a very simple bit-shifting macro, while copy_double (caml_copy_double in current OCaml versions) has to heap-allocate space to hold a double-precision floating-point value.
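As a small illustration of the serialization sense of the word, here is a round trip through the standard Marshal module; the point record type is merely an example. Note that Marshal.from_string is not type-safe: the type annotation on the result is a promise the programmer has to keep.

```ocaml
(* A structured value is serialized into a string and read back. *)
type point = { x : float; y : float; label : string }

let () =
  let data = [ { x = 1.0; y = 2.0; label = "a" };
               { x = 3.0; y = 4.0; label = "b" } ] in
  let encoded = Marshal.to_string data [] in
  (* from_string is unchecked: we must know the type ourselves *)
  let back : point list = Marshal.from_string encoded 0 in
  Printf.printf "%d bytes, round trip ok: %b\n"
    (String.length encoded) (back = data)
```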

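One should know that functions can only be wrapped in this simple way if they take at most five arguments. (Usually, this is enough.) For functions with more arguments, one has to provide two C functions: one for the bytecode compiler, which receives a pointer to an array of arguments and their number, and one for the native-code compiler, which receives the arguments individually; the external declaration then names both.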

Wrapping functions from a library

Let us try something slightly more challenging next: we want to wrap some functions from a library other than libc or libm, and pass strings around. Let us use the low-level X11 library Xlib here. In particular, we want to be able to open and close a connection to an X display and to obtain the X server vendor identification string. We hence add the following to our example:

Extending our example: xlib
Makefile


LDFLAGS = -L/usr/X11R6/lib -lX11

OCAMLMKLIB = ocamlmklib -ldopt -L/usr/X11R6/lib -ldopt -lX11


c_examples.mli


type x_display

val x_open_display: string -> x_display

val x_display_is_valid: x_display -> bool
(* This is not an Xlib function! *)

val x_close_display: x_display -> unit

val x_server_vendor: x_display -> string


c_examples.ml

type x_display

external x_open_display: string -> x_display = "c_ex_x_open_display_v1";;

external x_display_is_valid: x_display -> bool = "c_ex_x_display_is_valid_v1";;

external x_close_display: x_display -> unit = "c_ex_x_close_display_v1";;

external x_server_vendor: x_display -> string = "c_ex_x_server_vendor_v1";;

c_examples_impl.c


/* Store raw C data into a field of a block the GC does not scan */
#define Store_c_field(block,offset,x) (Field(block,offset)=(value)(x))

#include <X11/Xlib.h>
/* Goes to the top! */

/* (...) */

CAMLprim value c_ex_x_open_display_v1(value ml_display_name)
{
  CAMLparam1(ml_display_name);
  CAMLlocal1(block);
  /* CAMLlocal declarations belong right after CAMLparam */

  Display *disp;

  disp=XOpenDisplay(String_val(ml_display_name));
  /* NULL on failure - x_display_is_valid checks for that */

  block=caml_alloc(1, Abstract_tag);
  /* Note that Abstract_tag >= No_scan_tag (in fact the two are
     equal), so the GC will not look inside this block
     - cf. sec. 18.2.2 of the OCaml manual
     
     Furthermore note that we just assume that a value cell
     is just as large as a void pointer. This holds on all
     platforms OCaml supports, but we may want to be more
     careful nevertheless.
  */

  Store_c_field(block,0,(value)disp);

  CAMLreturn(block);
}

CAMLprim value c_ex_x_display_is_valid_v1(value ml_display)
{
  CAMLparam1(ml_display);

  Display *disp;
  int is_valid;

  disp=(Display *)Field(ml_display,0);

  is_valid=(disp!=0);
  
  /* %x expects an unsigned int, not a pointer: convert explicitly */
  fprintf(stderr,"x_display_is_valid(disp=%08lx) -> %d\n",
          (unsigned long)disp, is_valid);
  fflush(stderr);
  
  CAMLreturn(Val_bool(is_valid));
}

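CAMLprim value c_ex_x_close_display_v1(value ml_display)
{
  CAMLparam1(ml_display);

  Display *disp;
  int is_valid;

  disp=(Display *)Field(ml_display,0);

  is_valid=(disp!=0);
  
  if(is_valid)
    {
      XCloseDisplay(disp);
      /* The block has Abstract_tag and holds raw C data, so we
         overwrite the field directly rather than via Store_field,
         which is meant for fields holding OCaml values. */
      Store_c_field(ml_display,0,(value)0);
    }

  CAMLreturn(Val_unit);
}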

CAMLprim value c_ex_x_server_vendor_v1(value ml_display)
{
  CAMLparam1(ml_display);

  Display *disp;
  int is_valid;
  const char *vendor_id="";

  disp=(Display *)Field(ml_display,0);

  is_valid=(disp!=0);
  
  if(is_valid)
    {
      vendor_id=ServerVendor(disp);
      /* This is a macro - there also is an
         XServerVendor function in libX11.
      */
    }

  /* Copy the C string into a freshly allocated OCaml string */
  CAMLreturn(caml_copy_string(vendor_id));
}

Testing the xlib functions

tf@alpha:~/ocaml-tutorial/c_examples$ make
make[1]: Entering directory `/home/tf/ocaml-tutorial/c_examples'
(...)
make[1]: Leaving directory `/home/tf/ocaml-tutorial/c_examples'
tf@alpha:~/ocaml-tutorial/c_examples$ ./c_examples.top 
        Objective Caml version 3.08.3

# open C_examples;;
# let disp = x_open_display ":0.0";;
val disp : C_examples.x_display = <abstr>
# x_display_is_valid disp;;
x_display_is_valid(disp=0808d1e0) -> 1
- : bool = true
# x_server_vendor disp;;
- : string = "The XFree86 Project, Inc"
# x_close_display disp;;
- : unit = ()
# x_display_is_valid disp;;
x_display_is_valid(disp=00000000) -> 0
- : bool = false

This is already quite nice, but it opens up new questions. If we lose the last reference to an x_display value, the block will be garbage collected, but the connection stays open. We might instead prefer that an accidentally forgotten active Xlib connection be closed automatically when it is garbage collected. Furthermore, we might even want Xlib functions called on an inactive or invalid display to raise an exception. All this can indeed be implemented, and will be our next major example. But before we consider this, let us make an excursion that explains some more of the background mechanics underlying the low-level implementation, and in particular the C interfaces, of many functional languages.
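One way to obtain both behaviours purely on the OCaml side is sketched below with the standard Gc.finalise function. This is a stand-in under stated assumptions, not necessarily the approach the lesson's next example takes: the connection record is hypothetical and only tracks whether it is still open, where a real binding would hold the Xlib display pointer.

```ocaml
(* Hypothetical stand-in for an external resource such as an X display
   connection: we only track whether it is still open. *)
type connection = { name : string; mutable is_open : bool }

exception Connection_closed of string

(* Idempotent, so it is harmless if both the user and the GC call it. *)
let close_connection c =
  if c.is_open then begin
    c.is_open <- false;
    Printf.printf "closing %s\n" c.name
  end

let open_connection name =
  let c = { name; is_open = true } in
  (* Ask the GC to call close_connection once c has become unreachable;
     Gc.finalise needs a heap-allocated value, which a record is. *)
  Gc.finalise close_connection c;
  c

(* Operations check validity and raise instead of using a dead handle. *)
let vendor c =
  if not c.is_open then raise (Connection_closed c.name);
  "vendor of " ^ c.name

let () =
  let c = open_connection ":0.0" in
  print_endline (vendor c);
  close_connection c;
  (match vendor c with
   | _ -> print_endline "unexpected"
   | exception Connection_closed n -> Printf.printf "%s is closed\n" n);
  (* A forgotten connection: the finaliser will close it at some point
     after it has become unreachable, e.g. during a full major GC. *)
  ignore (open_connection ":1.0");
  Gc.full_major ()
```

Finalisers run at a time of the garbage collector's choosing, so they are a safety net for forgotten handles rather than a replacement for closing resources explicitly.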

Some background on functional language implementations

If we look under the hood of all the fancy syntax and ignore code generator issues for now, the relevant questions at the lowest level are: how are the fundamental data types implemented and mapped to machine data types, and which conventions have to be respected? One important component in this game is the garbage collector, which from time to time scans the heap (= all the memory managed by the language where values can reside) and recycles pieces of data that have become unreachable and hence are mere ballast.

What type information has to be available at run time? At the very least, the system has to be able to find out whether a certain OCaml value, stored in a given region of memory, contains references to other OCaml values or not. The Garbage Collector has to know this so that it can scan all the memory that has been allocated in our running program for "live" objects and declare all other data as "dead", that is, unreachable. This evidently means that the memory representation of an OCaml array (or tuple, say), which may reference (i.e. contain pointers to) other OCaml values, must contain information about the length of the array (or tuple). One could imagine that from the perspective of the garbage collector, the world of hierarchically constructed types is much simpler, and that indeed, arrays and tuples might even have precisely the same representation in memory: Both represent vectors of OCaml values, and even if they behave very differently from the programmer's perspective, there is no reason why they should not be just the same internally: after all, the question what one can do with these data is resolved entirely at compile time.
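The claim about tuples and arrays can be checked with the unofficial, unsafe Obj module, which lets us peek at the run-time representation of values. The describe helper is hypothetical, and Obj should be used for exploration only, never in production code.

```ocaml
(* Peek at run-time representations via the unsafe Obj module.
   Obj.repr only reinterprets a value; it does not copy or convert it. *)
let describe name v =
  let r = Obj.repr v in
  if Obj.is_int r then Printf.printf "%-12s immediate integer\n" name
  else
    Printf.printf "%-12s block, tag %d, size %d words\n"
      name (Obj.tag r) (Obj.size r)

let () =
  (* A pair and a two-element int array: same tag, same size. *)
  describe "(1, 2)" (1, 2);
  describe "[|1; 2|]" [| 1; 2 |];
  (* In the default configuration, float arrays get a special tag. *)
  describe "[|1.; 2.|]" [| 1.0; 2.0 |];
  describe "\"ab\"" "ab"
```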

So, we may imagine an internal data representation scheme where all constant-time addressable vectors (tuples, arrays) appear as a region of memory that contains a single header word (or at most a few words) providing length information, followed by pointers to the contents. This actually would be quite similar to the way data are represented internally in the GCL (GNU Common Lisp) system (see object.h in the GCL sources, especially the definition of "union lispunion"), only that there the structure of the header is a little more complicated and retains enough information to derive the actual concrete type at run time, which it has to, as Lisp is dynamically typed. Non-compiler scripting languages like Perl or Python, which are dynamically typed as well, use similar approaches, but typically are way more verbose in their internal value data structures (see e.g. the definition of typedef struct _object (...) PyObject and the corresponding comments in the Python sources). In particular, they frequently include a reference count, as they rely mainly on reference counting rather than on a proper garbage collector (Python adds a cycle detector on top), which is a shame, given the existence of the very powerful Boehm-Demers-Weiser garbage collector library.

Suppose we stick with such a scheme, where every value is represented by a pointer to a piece of memory that holds all the data. Whenever we pass even the smallest piece of data, such as an ordinary machine integer, into a function, the system first has to allocate memory dynamically to obtain space for the number, adorn it with a header that says, basically, "there is only this single word of data, and it is not a pointer to further values", and then pass a pointer to that piece of memory. The recipient then has to look up the number through that pointer. This "boxing" and "unboxing" is time-consuming overhead, and since it is ubiquitous, it has to be done over and over again. It is therefore desirable to have a compiler that is intelligent enough to avoid unnecessary boxing (maybe via inlining) for purely internal functions that are not visible to the outside. However, when calling a function from an independent binary-code library, we presumably will have to go through this boxing and unboxing.
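To make this overhead concrete, here is a minimal C sketch of such a uniformly boxed representation. It is a hypothetical scheme for illustration only (OCaml does not represent integers this way), and the names boxed_int, box_int, unbox_int and boxed_add are made up. Every integer gets its own heap cell with a header word, and even a simple addition has to unbox both arguments and allocate a fresh box for its result.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical "everything is boxed" representation (NOT OCaml's):
   a header word describing the payload, followed by the payload. */
typedef struct {
  uintptr_t header;   /* here simply: 1 word of raw (non-pointer) data */
  intptr_t  payload;  /* the integer itself */
} boxed_int;

/* Boxing: one dynamic allocation for every integer we pass around. */
static boxed_int *box_int(intptr_t n)
{
  boxed_int *b = malloc(sizeof *b);
  if (b == NULL)
    abort();
  b->header = 1;
  b->payload = n;
  return b;
}

/* Unboxing: one extra memory access through the pointer. */
static intptr_t unbox_int(const boxed_int *b)
{
  assert(b->header == 1);
  return b->payload;
}

/* Even a trivial addition must unbox both arguments and box its result. */
static boxed_int *boxed_add(const boxed_int *a, const boxed_int *b)
{
  return box_int(unbox_int(a) + unbox_int(b));
}
```

Each boxed integer occupies two words; storing it in an array adds one more word per slot for the pointer to the box.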

Imagine creating something as simple as an array of one million integers. If OCaml used the scheme suggested above, we would need two data words (32 bits each on 32-bit machines) to represent every integer, namely the header and the number itself, plus one pointer per array slot. So we would need three words of memory to encode a single word of data! Clearly, this is a highly unsatisfactory situation. (Indeed, this is exactly what happens with GCL.) Can this be avoided? One might think so, as all the type information that allows us to distinguish a pointer to a value from raw data is available at compile time. But consider the following example:



let wrap_up z =
  let wrap_me1 x = ((x,x),x) in
  let wrap_me2 x = wrap_me1 (wrap_me1 x) in
  let wrap_me4 x = wrap_me2 (wrap_me2 x) in
  wrap_me4 z
;;

(* Example:

# wrap_up;;
- : 'a ->
    ((((((('a * 'a) * 'a) * (('a * 'a) * 'a)) * (('a * 'a) * 'a)) *
       (((('a * 'a) * 'a) * (('a * 'a) * 'a)) * (('a * 'a) * 'a))) *
      (((('a * 'a) * 'a) * (('a * 'a) * 'a)) * (('a * 'a) * 'a))) *
     (((((('a * 'a) * 'a) * (('a * 'a) * 'a)) * (('a * 'a) * 'a)) *
       (((('a * 'a) * 'a) * (('a * 'a) * 'a)) * (('a * 'a) * 'a))) *
      (((('a * 'a) * 'a) * (('a * 'a) * 'a)) * (('a * 'a) * 'a)))) *
    (((((('a * 'a) * 'a) * (('a * 'a) * 'a)) * (('a * 'a) * 'a)) *
      (((('a * 'a) * 'a) * (('a * 'a) * 'a)) * (('a * 'a) * 'a))) *
     (((('a * 'a) * 'a) * (('a * 'a) * 'a)) * (('a * 'a) * 'a)))
= <fun>
# wrap_up 3;;
- : (((((((int * int) * int) * ((int * int) * int)) * ((int * int) * int)) *
       ((((int * int) * int) * ((int * int) * int)) * ((int * int) * int))) *
      ((((int * int) * int) * ((int * int) * int)) * ((int * int) * int))) *
     ((((((int * int) * int) * ((int * int) * int)) * ((int * int) * int)) *
       ((((int * int) * int) * ((int * int) * int)) * ((int * int) * int))) *
      ((((int * int) * int) * ((int * int) * int)) * ((int * int) * int)))) *
    ((((((int * int) * int) * ((int * int) * int)) * ((int * int) * int)) *
      ((((int * int) * int) * ((int * int) * int)) * ((int * int) * int))) *
     ((((int * int) * int) * ((int * int) * int)) * ((int * int) * int)))
=
((((((((3, 3), 3), ((3, 3), 3)), ((3, 3), 3)),
    ((((3, 3), 3), ((3, 3), 3)), ((3, 3), 3))),
   ((((3, 3), 3), ((3, 3), 3)), ((3, 3), 3))),
  ((((((3, 3), 3), ((3, 3), 3)), ((3, 3), 3)),
    ((((3, 3), 3), ((3, 3), 3)), ((3, 3), 3))),
   ((((3, 3), 3), ((3, 3), 3)), ((3, 3), 3)))),
 ((((((3, 3), 3), ((3, 3), 3)), ((3, 3), 3)),
   ((((3, 3), 3), ((3, 3), 3)), ((3, 3), 3))),
  ((((3, 3), 3), ((3, 3), 3)), ((3, 3), 3))))
*)

(Incidentally, this is also a nice example showing that the size of the type of an expression can grow at least exponentially with the size of the expression.) What code would the compiler have to generate so that the garbage collector can know which entries of all the tuples in this example hold raw data, and which hold pointers to tuple values? If you think about it long enough, you will conclude that we actually require one bit of information for every tuple slot. We might consider collecting these bits in a bit vector placed right after the tuple header word. This may indeed be possible, but it would make the garbage collector somewhat clumsy. The approach usually taken instead makes use of the observation that pointers to values are word-aligned: on a 32-bit machine, they point to addresses divisible by four. That is, the two least significant bits of these pointers are unused and always zero. Suppose now we implement the following scheme: value references will not take the form of ordinary memory pointers to the address where the referenced value lies, but will instead be pointers to that address plus one. When we want to use such a reference as a pointer, we use CPU instructions with fixed-offset addressing that cancel this off-by-one. (This costs essentially nothing. CISC CPUs have such addressing modes in their instruction sets, and most RISC instruction sets also provide loads and stores with a register-plus-immediate displacement, so the correction is folded into the memory access itself.) We now declare that every tuple entry whose least significant bit is "one" is such a special pointer, and everything that has a zero as its least significant bit is an "immediate value", that is, the word itself carries all the data.

In particular, we may encode false and true as the binary values 0b00 and 0b10. The integer N we encode as N*2. Addition and subtraction still work as usual, since 2a + 2b = 2(a+b), but when we multiply or divide, we have to do one additional bit-shifting operation (which usually is little effort in comparison to the multiplication itself). This means that we will not be able to distinguish the memory representations of, say, 1 and true. This does not matter, however: it is of no relevance to the garbage collector, and all conflicts that might arise have been prevented by compile-time type checking. Likewise, we can encode single characters as immediate values. Functions such as Char.code can then simply be compiled away.
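The hypothetical scheme just described fits in a few lines of C. This is an illustration under the stated assumptions: references are address plus one, immediates have a zero low bit, integers are stored as 2N, and false/true are 0b00/0b10. It is not OCaml's actual convention, and all names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the hypothetical tagging scheme described above (NOT the
   convention OCaml uses): low bit 1 = reference (address + 1),
   low bit 0 = immediate value carrying its data in the word itself. */
typedef uintptr_t word;

#define IMM_FALSE ((word)0)   /* 0b00 */
#define IMM_TRUE  ((word)2)   /* 0b10, same bits as the integer 1 */

static word ref_of_addr(const void *p)
{
  assert(((uintptr_t)p & 3) == 0);   /* relies on word alignment */
  return (uintptr_t)p + 1;
}

static void *addr_of_ref(word w) { return (void *)(w - 1); }  /* undo the +1 */
static int is_ref(word w) { return (w & 1) != 0; }

/* Integer N is stored as 2N. Exact halving via division keeps this
   portable C; a compiler would emit an arithmetic shift instead. */
static word imm_of_int(intptr_t n) { return (word)(n * 2); }
static intptr_t int_of_imm(word w) { return (intptr_t)w / 2; }

/* 2a + 2b = 2(a+b): addition needs no correction at all. */
static word imm_add(word a, word b) { return a + b; }

/* (2a)(2b) = 4ab, so halve one operand first: one extra shift. */
static word imm_mul(word a, word b)
{
  return (word)(((intptr_t)a / 2) * (intptr_t)b);
}
```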

Such a pointer tagging scheme is what most functional language implementations use nowadays. There are, however, differences between the tagging schemes. CMUCL/SBCL, for example, align all memory cells to 8-byte boundaries and use the three least significant bits as type tags to distinguish cons cells, characters, structures, arrays, etc. (see object.tex in their CVS sources). OCaml chooses the opposite convention to the one sketched above: a least significant bit of 1 denotes an integer, and a 0 denotes a pointer, which is rather unusual. Other systems implement further slight variations on the general theme, such as using high bits instead of low bits. An interesting and very useful curiosity that works without any extra pointer tag bits is the Boehm-Demers-Weiser conservative garbage collector for C, which comes as a drop-in malloc() replacement. Indeed, it works so well that some quite reasonable functional languages (Bigloo Scheme, for example) decided not to implement their own GC but to rely on this library instead. How can this possibly work? Basically: if something looks like a pointer, we just assume it could be a pointer and scan the corresponding region.

Now, one might say that a CPU especially designed to support functional languages should provide extra type tag bits for every value. With modern 64-bit CPUs, there usually is little need for fast full 64-bit integers, so we may well afford to provide only 62-bit arithmetic, and pointers will not use the full 64-bit address range anyway due to MMU limitations. (For example, with 8 KB pages, the offset within a page takes 13 address bits, and a page-table page holds 1024 entries of 8 bytes, so each table level resolves 10 address bits. A three-level MMU then can only use 10*3+13 = 43 address bits. Seen that way, going to 48 instead of 64 bits might have been more reasonable.) What is slightly special about OCaml is that its implementors have deliberately chosen internal representations that do not allow one to re-derive enough type information to print a value in a meaningful way. Internally, there is no distinction between "false" and "0", say. This is somewhat unfortunate, as it means that there is no way to implement an ad-hoc polymorphic debug-printing function of type 'a -> string that prints an arbitrary OCaml value in a meaningful way, similar to Perl's Data::Dumper.
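For comparison, the following sketch re-creates the convention OCaml actually uses, where integer n is stored as 2n+1, in the style of the Val_long, Long_val and Is_long macros of caml/mlvalues.h. The MY_ macros are our own stand-ins, not the real header. They use division instead of an arithmetic shift only to stay within portable C. The last group of definitions shows why a Data::Dumper-like printer is impossible: false, 0, () and [] are all the same word.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative re-creation of OCaml's convention (compare Val_long,
   Long_val, Is_long, Is_block, Val_false, ... in caml/mlvalues.h).
   These MY_ macros are our own, not the real header. */
typedef intptr_t value_t;

/* n is stored as 2n+1: the low bit 1 marks an immediate integer. */
#define MY_VAL_INT(n)  ((value_t)(((uintptr_t)(n) << 1) + 1))
/* Exact division instead of the runtime's arithmetic shift. */
#define MY_INT_VAL(v)  (((v) - 1) / 2)
#define MY_IS_LONG(v)  (((v) & 1) != 0)
#define MY_IS_BLOCK(v) (((v) & 1) == 0)   /* aligned pointer */

/* Constant constructors share the integer encoding. */
#define MY_VAL_FALSE     MY_VAL_INT(0)
#define MY_VAL_TRUE      MY_VAL_INT(1)
#define MY_VAL_UNIT      MY_VAL_INT(0)
#define MY_VAL_EMPTYLIST MY_VAL_INT(0)

/* (2a+1) + (2b+1) - 1 = 2(a+b) + 1: one extra decrement per addition. */
static value_t tagged_add(value_t a, value_t b) { return a + b - 1; }
```

Because MY_VAL_FALSE, MY_VAL_INT(0), MY_VAL_UNIT and MY_VAL_EMPTYLIST expand to the same word, no run-time inspection can tell which of them a given value was meant to be.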

Even though we are perhaps not meant to know what is going on in it, it is nevertheless worthwhile to have a look at the header file file:///usr/lib/ocaml/3.08.3/caml/mlvalues.h (the exact location depends on the OCaml version and installation) to see how some of the low-level definitions work. Note that we are not supposed to rely on that particular realization, as it may change in the future!

What are the ML-specific macros such as CAMLprim, CAMLparam1, CAMLlocal1 and CAMLreturn for? Roughly speaking, CAMLprim has to do with exporting our functions properly for OCaml. The CAMLparam/CAMLlocal macros are required for garbage collection. We can imagine situations where some allocated piece of memory "almost becomes garbage", in the sense that all references to it are lost except some that are passed into C. Since a garbage collection may be triggered whenever our C code allocates on the OCaml heap (or calls back into OCaml), we must make sure that even if we are inside C code that holds the last references to a given value, the garbage collector knows that this value is still in use, and can update our variables if it moves the value. In the c_ex_x_open_display_v1 C function of our example, we first introduce a variable of type value named block, which is made visible to the GC. Then we allocate a block with a header tag (Abstract_tag) that tells the GC that this region of memory does not hold a string, an array or tuple, or any other kind of value OCaml may want to deal with in some special way. It just contains "raw data", and it is always up to our code to interpret this in the proper way. We generally are requested to use the Field and Store_field macros to retrieve and store data from the fields of a block if these fields contain ML values. For C data (like pointers) stored in a custom block, we must not use Store_field, as this would tell the GC to keep track of the value in that slot and regard it as an ML value that has to be scanned, with disastrous consequences. Hence, we introduce our own Store_c_field macro to make explicit that we want to store a value without making the GC worry about it. This macro is implemented in a somewhat hackish way and perhaps should rather be part of official OCaml, but at the time of this writing, it is not.
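The two decisions the GC makes here can be modelled in C. This sketch assumes the classic (OCaml 3.x/4.x) header layout, with the tag in bits 0 to 7, the colour in bits 8 and 9, and the size in words above that. Blocks whose tag is at least No_scan_tag (251, which is also the value of Abstract_tag) are never looked into. Inside scanned blocks, a field whose low bit is 0 is taken to be a pointer. The helper names are hypothetical. Note that an ordinary, aligned C pointer such as a Display * also has a zero low bit, so looks_like_block_pointer() accepts it: in a scanned slot, it would be indistinguishable from a reference into the OCaml heap.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the classic OCaml block header layout: bits 0-7 tag,
   bits 8-9 GC colour, remaining bits the size in words.
   Own definitions for illustration, not the real macros. */
typedef uintptr_t header_t;

enum {
  MY_NO_SCAN_TAG  = 251,  /* blocks with tag >= this are not scanned */
  MY_ABSTRACT_TAG = 251,
  MY_STRING_TAG   = 252,
  MY_CUSTOM_TAG   = 255
};

static header_t make_header(uintptr_t wosize, unsigned tag)
{
  return (wosize << 10) | (tag & 0xFFu);   /* colour bits left at 0 */
}

static unsigned tag_of(header_t hd)     { return (unsigned)(hd & 0xFFu); }
static uintptr_t wosize_of(header_t hd) { return hd >> 10; }

/* Does the GC look inside the fields of this block at all? */
static int gc_scans_fields(header_t hd) { return tag_of(hd) < MY_NO_SCAN_TAG; }

/* In a scanned block, a field with low bit 0 is followed as a pointer. */
static int looks_like_block_pointer(uintptr_t field) { return (field & 1) == 0; }
```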

Every entry in a block is large enough to hold one ML value, and in our example, we implicitly make the slightly dangerous assumption that a value is at least as large as a pointer. (This does, however, seem to hold on all platforms.) Note that if we were to construct a float array from within C on a 32-bit system, we would have to allocate a block (with tag Double_array_tag) with twice as many value slots as our floating-point array has entries, and we should use the Double_field and Store_double_field macros to access them. Here, our payload data is either a C Display pointer or a null pointer, denoting an invalid display. The other C implementations of functions operating on X displays extract that value and handle it appropriately. Note that the example code is sometimes more verbose than strictly necessary; this is just to clarify the general structure.
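As a small illustration of the float-array remark, the following C sketch computes how many value-sized slots n doubles occupy. It then stores and loads a double in such slots with memcpy, which avoids relying on 8-byte alignment that a merely word-aligned block on a 32-bit system does not guarantee. The helper names are hypothetical. In real stub code, one uses Double_field and Store_double_field instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* How many value-sized slots do n doubles need? On a 32-bit system
   (4-byte values) each double takes two slots; with 8-byte values, one. */
static size_t slots_for_doubles(size_t n, size_t value_size)
{
  size_t per_double = (sizeof(double) + value_size - 1) / value_size;
  return n * per_double;
}

/* Store/load entry i of a flat double array living in value-sized slots.
   memcpy avoids relying on 8-byte alignment, which a word-aligned block
   on a 32-bit system does not guarantee. */
static void store_double_slot(uintptr_t *slots, size_t i, double d)
{
  memcpy((char *)slots + i * sizeof(double), &d, sizeof d);
}

static double load_double_slot(const uintptr_t *slots, size_t i)
{
  double d;
  memcpy(&d, (const char *)slots + i * sizeof(double), sizeof d);
  return d;
}
```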

Finalization and Exceptions

Quite often, when we wrap C-controlled resources in such a way, we want the resource to be freed automatically should it become garbage. In general, it is a good idea not to rely on GC finalization as the sole mechanism for freeing resources, but to at least provide explicit means of deallocation as well. Depending on the resource, we may even want to treat it as an error, to be reported, if it ever ends up being GC-finalized. One way to implement finalization would be to use the Gc.finalise function on x_display values, registering x_close_display as the finaliser, and to wrap our raw x_open_display function accordingly on the OCaml side. We may also provide a finalizer written in C, as the next example shows. In addition, we will make sure that using x_server_vendor on an invalid X display raises a special exception defined by us, which provides both a human-readable problem description and an OCaml tag telling us what went wrong.
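The recommended discipline, with explicit release plus a finalizer as a safety net that reports its own use, can be sketched in plain C independently of OCaml and X11. In this sketch, handle, handle_close and handle_finalize are hypothetical names. handle_finalize stands for whatever the GC would invoke, and the resource is just a malloc()-ed buffer. The C finalizer in the next example follows the same pattern.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* A wrapped resource; NULL once it has been released. */
typedef struct {
  void *resource;
} handle;

static int finalizer_had_to_close = 0;   /* for demonstration only */

static void release_resource(void *r) { free(r); }

/* Explicit release: invalidates the handle, so repeated calls are harmless. */
static void handle_close(handle *h)
{
  if (h->resource != NULL) {
    release_resource(h->resource);
    h->resource = NULL;
  }
}

/* What a GC finalizer would call once the handle has become garbage:
   it only acts, and complains, if the user forgot to close. */
static void handle_finalize(handle *h)
{
  if (h->resource != NULL) {
    fprintf(stderr, "Warning: releasing resource on finalization!\n");
    finalizer_had_to_close++;
    handle_close(h);
  }
}
```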

Extending our example: finalization and exceptions

c_examples.mli


(* Add: *)

type x_entity =
  | X_C_Display
  | X_C_Window
  | X_C_Colormap
(* ...May want to add more... *)

exception X_Invalid of (string * x_entity)
(* Note: the round parens are strictly mandatory and vital here!
   The code will compile but crash if we omit them.
   (A Very Dark Corner(TM) of OCaml!)
 *)


c_examples.ml


(* Change and add: *)

external x_open_display: string -> x_display = "c_ex_x_open_display_v2";;

external x_display_is_valid: x_display -> bool = "c_ex_x_display_is_valid_v2";;

external x_close_display: x_display -> unit = "c_ex_x_close_display_v2";;

external x_server_vendor: x_display -> string = "c_ex_x_server_vendor_v2";;

type x_entity =
  | X_C_Display
  | X_C_Window
  | X_C_Colormap
(* ...May want to add more... *)
;;

exception X_Invalid of (string * x_entity);;

let _ = Callback.register_exception "x_invalid" (X_Invalid ("",X_C_Display));;


c_examples_impl.c


/* Note that this is declared static - this makes our code more tidy
   by saying:

   "This function is private to this very module!"
*/
static void finalize_x_display(value block)
{
  Display *disp;
  disp=(Display *)Field(block,1);

  if(disp)
    {
      fprintf(stderr,"Warning: closing X display on finalization!\n");
      fflush(stderr);
      /* Note: normally it is a sign of bad taste to make a library(!)
	 write to stdout/stderr if it was not permitted to do so.
	 Here, this is okay, as we do it for demonstration purposes
	 only.

	 In real code, we should at least introduce an OCaml-visible
	 flag variable that may be used to silence the library.
       */

      XCloseDisplay(disp);
      /* No need to invalidate the entry, as it is garbage anyway! */
    }
}

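CAMLprim value c_ex_x_open_display_v2(value ml_display_name)
{
  CAMLparam1(ml_display_name);
  CAMLlocal1(block);

  Display *disp;

  disp=XOpenDisplay(String_val(ml_display_name));
  /* disp is NULL if the display could not be opened;
     we store it anyway, as NULL denotes an invalid display. */

  block=alloc_final(2, &finalize_x_display,1,10);

  /* Slot 0 belongs to the finalizer, slot 1 holds raw C data:
     hence Store_c_field, not Store_field. */
  Store_c_field(block,1,(value)disp);

  CAMLreturn(block);
}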

CAMLprim value c_ex_x_display_is_valid_v2(value ml_display)
{
  CAMLparam1(ml_display);

  Display *disp;
  int is_valid;

  disp=(Display *)Field(ml_display,1);

  is_valid=(disp!=0);
  
  fprintf(stderr,"x_display_is_valid(disp=%p) -> %d\n",(void *)disp, is_valid);
  fflush(stderr);
  
  CAMLreturn(Val_bool(is_valid));
}

CAMLprim value c_ex_x_close_display_v2(value ml_display)
{
  CAMLparam1(ml_display);

  Display *disp;
  int is_valid;

  disp=(Display *)Field(ml_display,1);

  is_valid=(disp!=0);
  
  if(is_valid)
    {
      XCloseDisplay(disp);
      /* Slot 1 holds the Display pointer; slot 0 belongs to the finalizer! */
      Store_c_field(ml_display,1,(value)0);
    }

  CAMLreturn(Val_unit);
}

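CAMLprim value c_ex_x_server_vendor_v2(value ml_display)
{
  CAMLparam1(ml_display);
  CAMLlocal2(exn,msg);

  Display *disp;
  int is_valid;
  char *vendor_id="";

  disp=(Display *)Field(ml_display,1);

  is_valid=(disp!=0);
  
  if(is_valid)
    {
      vendor_id=XServerVendor(disp);
      /* Now, we use the libX11 function and not the macro
	 for obtaining the server vendor
       */
    }
  else
    {
      /* The argument of X_Invalid is the pair
	 ("Invalid X Display!", X_C_Display); X_C_Display is the first
	 constant constructor, i.e. Val_int(0). The string is allocated
	 before the Store_field, so that a GC triggered by the allocation
	 cannot move exn in the middle of the store. */
      exn=alloc_tuple(2);
      msg=copy_string("Invalid X Display!");

      Store_field(exn,0,msg);
      Store_field(exn,1,Val_int(0));
      
      raise_with_arg(*caml_named_value("x_invalid"),exn);
      /* Not reached. */
    }

  CAMLreturn(copy_string(vendor_id));
}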

Testing the new xlib functions


(* Installation and startup as before *)

# let disp=x_open_display ":0.0";;
val disp : C_examples.x_display = <abstr>
# x_server_vendor disp;;
- : string = "The XFree86 Project, Inc"
# x_close_display disp;;
- : unit = ()
# x_server_vendor disp;;
Exception: C_examples.X_Invalid ("Invalid X Display!", X_C_Display).
# Gc.full_major();;
- : unit = ()
# let disp2 = ref [x_open_display ":0.0"];;
val disp2 : C_examples.x_display list ref = {contents = [<abstr>]}
# disp2:=[];;
- : unit = ()
# Gc.full_major();;
Warning: closing X display on finalization!
- : unit = ()
# 1;;
- : int = 1
# for i=0 to 100 do Printf.fprintf stderr "%d - %!" i; ignore(x_open_display ":0.0") done;;
0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
8 - 9 - 10 - 11 - 12 - 13 - 14 - 15 - 16 - 17 - 18 - 19 - Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
20 - 21 - 22 - 23 - 24 - 25 - 26 - 27 - 28 - 29 - 30 - 31 - Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
32 - 33 - 34 - 35 - 36 - 37 - 38 - 39 - 40 - 41 - 42 - 43 - Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
44 - 45 - 46 - 47 - 48 - 49 - 50 - 51 - 52 - 53 - 54 - 55 - Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
56 - 57 - 58 - 59 - 60 - 61 - 62 - 63 - 64 - 65 - 66 - 67 - Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
68 - 69 - 70 - 71 - 72 - 73 - 74 - 75 - 76 - 77 - 78 - 79 - Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
80 - 81 - 82 - 83 - 84 - 85 - 86 - 87 - 88 - 89 - 90 - 91 - Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
Warning: closing X display on finalization!
92 - 93 - 94 - 95 - 96 - 97 - 98 - 99 - 100 - - : unit = ()

Let us briefly discuss what is new. First, we now use alloc_final() to allocate our custom-data blocks. We need to allocate one extra value entry and use the second slot (index 1) for our data, as a pointer to the finalization function goes into slot 0. Actually, this is not 100% accurate: alloc_final() is just a legacy compatibility function for the more general and more flexible alloc_custom() function. alloc_custom() also allows us to specify other custom functions that handle, say, serialization, hashing, and comparison, and slot 0 really holds a pointer to a structure containing all of these custom operations. None of this makes much sense for X display pointers, so we stick with the simpler approach. The other two parameters control how strongly allocating entities of this type speeds up the GC. Roughly, the first value is a measure of the amount of (external) resources held by one such entity (we just use 1 here), and the second one is a measure of how many of these we allow the system to allocate before the GC has to work harder to reclaim some that may have become garbage. The effect of these "ten allocations per GC" can be seen at the end of our transcript, although not with exact precision: there, the finalizer runs in batches of twelve.
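The role of the last two alloc_final() arguments can be illustrated with a deliberately simplified toy model in C. This is an assumption-laden sketch, not the runtime's code, and it assumes all allocations use the same max. Each allocation adds used to a counter, and whenever the counter reaches max (that is, the accumulated ratio used/max reaches 1), one request for extra major-GC work is recorded. The real runtime only speeds up its incremental major GC, and finalizers only run once the collector actually gets around to sweeping the dead blocks. That is consistent with the batches of twelve, rather than exactly ten, seen above.

```c
#include <assert.h>

/* Toy model (NOT the OCaml runtime) of the (used, max) arguments. */
typedef struct {
  long extra;        /* accumulated "used" units since the last request */
  int gc_requests;   /* how often extra major-GC work was requested */
} toy_gc;

/* Assumes every allocation passes the same max. */
static void toy_alloc_custom(toy_gc *gc, long used, long max)
{
  gc->extra += used;
  if (gc->extra >= max) {    /* accumulated ratio used/max reached 1 */
    gc->gc_requests++;
    gc->extra = 0;
  }
}
```

With used = 1 and max = 10, as in c_ex_x_open_display_v2, the toy model requests extra GC work after every tenth allocation. The animal example below uses max = 100 and would therefore request it ten times less often.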

Concerning exception handling, we first have to introduce an exception on the OCaml side and register it under a special name, so that we can locate it from within C by that name. Then we build the argument tuple. We could have used alloc with tag 0 (the tag used for tuples), but alloc_tuple is a more convenient shorthand. Note that the parentheses in the exception definition are mandatory. With them, X_Invalid carries a single argument, the pair, which is exactly what raise_with_arg builds. Without them, X_Invalid would be a constructor with two separate arguments, and the exception value constructed from C would have the wrong shape. For less sophisticated exception handling, we might prefer the much simpler raise_with_string instead.

C Callbacks

As we have seen, the functional way of decomposing an algorithmic problem into sub-tasks, each realized by a specialized little helper function, is very natural. Consequently, we find this style of programming in some C libraries, too. One typical application is the specification of C callback handlers. Suppose that we have a C library that provides an opaque C structure called, say, "animal", for which we can register a C callback function that is called whenever this animal has to make a sound. Typically, the C library implementor will have anticipated that the library user might need more flexibility than registering just a C function can provide. With our background in functional programming, we may see it this way: C "functions" are not functions, but just routines. A proper function is a piece of code specifying what to do, plus possibly some extra contextual information. To emphasize the role of the contextual data grouped together with the code, this is sometimes called a "closure". A parameter of a callback-setting function that provides such context, which is then passed to the registered callback whenever it is executed, is a "closure parameter". This sounds a bit convoluted, but we will see an example soon.

When we wrap up such a library for OCaml, we will usually want to match the spirit of the original callback approach as closely as possible. On the OCaml side, we will not require a closure parameter to the callback function, as OCaml already has proper functions. On the other hand, we have to use a callback wrapper on the C side that uses the closure parameter to pass around the OCaml function.
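The closure-parameter idea itself is independent of OCaml. The following plain-C sketch, with made-up names, shows a library that only accepts a routine plus a void * closure parameter, and a single trampoline routine that recovers a "proper function" (code plus context) from that parameter and runs it. The OCaml binding below follows the same approach, with an OCaml function taking the place of struct closure.

```c
#include <assert.h>
#include <stddef.h>

/* A C library API in the style of the animal library:
   a plain routine plus an opaque closure parameter. (Hypothetical.) */
typedef void routine_t(void *cdata);

typedef struct {
  routine_t *cb;
  void *cdata;
} event_source;

static void source_register(event_source *s, routine_t *cb, void *cdata)
{
  s->cb = cb;
  s->cdata = cdata;
}

static void source_fire(event_source *s)
{
  if (s->cb)
    s->cb(s->cdata);
}

/* A "proper function": code plus its context (a counter and a step). */
typedef struct {
  int (*code)(int *counter, int step);
  int counter;
  int step;
} closure;

static int bump(int *counter, int step) { return *counter += step; }

/* The one routine the library ever sees: it recovers the closure from
   the closure parameter and invokes its code on its own context. */
static void trampoline(void *cdata)
{
  closure *c = cdata;
  c->code(&c->counter, c->step);
}
```

Registering the same trampoline with different closure parameters yields different behaviour, which is exactly the flexibility a bare C routine lacks.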

The details are best studied by looking at an example. Note that one should pay very close attention here - making things smooth for the user of our code will require some tricky magic under the hood.

This is the C library we want to interface (and a C example):

The animal library
animal.h:


#ifndef ANIMAL_H
#define ANIMAL_H

typedef void animal_callback_make_sound(void *);

typedef struct
{
  char *name;
  animal_callback_make_sound *cb_make_sound;
  void *cdata_make_sound;
} animal;

extern animal *make_animal(char *);
extern void free_animal(animal *);

extern void register_make_sound(animal *,
				animal_callback_make_sound *,
				void *);


extern void make_sound(animal *);

extern void test_animal(void);


#endif
animal.c:


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* animal.h provides the animal_callback_make_sound and animal types. */
#include "animal.h"



animal *make_animal(char *name)
{
  animal *a=0;
  char *copy_name;
  int name_len;

  if(0==(a=(animal *)calloc(1,sizeof(animal))))
    {
      fprintf(stderr,"Fatal: malloc() failure!\n");
      exit(1);
      /* When such a grave situation occurs, we may even consider
	 aborting the program via raise(9), i.e. shoot ourselves
	 in the head and do not even attempt to do the atexit()
	 cleanup.
       */
    }

  name_len=strlen(name);
  if(0==(copy_name=malloc(name_len+1)))
    {
      fprintf(stderr,"Fatal: malloc() failure!\n");
      exit(1);
    }
  strcpy(copy_name,name);


  a->name=copy_name;
  return a;
}

void free_animal(animal *a)
{
  free(a->name);
  free(a);
}

void register_make_sound(animal *a,
			 animal_callback_make_sound *cb,
			 void *data)
{
  a->cb_make_sound=cb;
  a->cdata_make_sound=data;
}

void make_sound(animal *a)
{
  if(a->cb_make_sound)
    a->cb_make_sound(a->cdata_make_sound);
}

/* === Testing === */

static void callback_make_sound_demo1(void *data)
{
  char *str=(char *)data;
  printf("%c%c%c%s\n",str[0],str[0],str[0],str);
}

void test_animal(void)
{
  animal *a=make_animal("duck");
  register_make_sound(a,&callback_make_sound_demo1,(void *)"quack");

  make_sound(a);
  free_animal(a);
}
animal_example.c:


extern void test_animal(void);

int main(void)
{
  test_animal();
  return 0;
}

/* Test run:

$ gcc -o animal_example animal.c animal_example.c

$ ./animal_example
qqqquack

$ # Works fine!
*/

In order to wrap this, we make the following further modifications:

Extending our example: Lifting callback-setting from C to OCaml

Makefile

SOURCES = c_examples.mli c_examples.ml c_examples_impl.c animal.c

# Note: if we had this in a proper installed shared object library,
# we would give linker option flags as in the libX11 example instead.
c_examples.mli

(* Add: *)

type animal;;

val make_animal: string -> animal

val free_animal: animal -> unit

val register_make_sound: animal -> (unit -> unit) -> unit

val make_sound: animal -> unit
c_examples.ml

(* Add: *)

(* Callback example *)

type animal;;

external make_animal: string -> animal = "c_ex_make_animal";;

external free_animal: animal -> unit = "c_ex_free_animal";;

external register_make_sound: animal -> (unit -> unit) -> unit =
  "c_ex_register_make_sound";;

external make_sound: animal -> unit = "c_ex_make_sound";;

c_examples_impl.c


/* === Animal interface === */

#include "animal.h"

static void cleanup_animal_and_callback(value ml_animal_and_callback)
{
  animal *a;
  value *cb;
  a=(animal *)Field(ml_animal_and_callback,1);
  cb=(value *)Field(ml_animal_and_callback,2);
  
  if(a)
    {
      free_animal(a);
      Store_c_field(ml_animal_and_callback,1,(value)0);
      caml_remove_global_root(cb);
      free(cb);
      Store_c_field(ml_animal_and_callback,2,(value)0);
      /* Defensive programming. */
    }
}

static void finalize_raw_animal(value block)
{
  animal *a;
  a=(animal *)Field(block,1);

  if(a)
    {
      fprintf(stderr,"Warning: reclaiming animal on finalization!\n");
      fflush(stderr);
      cleanup_animal_and_callback(block);
    }
}

static void callback_wrapper_make_sound(void *cb)
{
  value ml_fun= *(value *)cb;

  if(ml_fun != Val_unit)
    {
      fprintf(stderr,"Calling ML callback 0x%08lx!\n",(unsigned long)ml_fun);
      callback(ml_fun,Val_unit);
    }
}


CAMLprim value c_ex_make_animal(value ml_name)
{
  CAMLparam1(ml_name);
  value *ml_callback_container;
  
  animal *a=make_animal(String_val(ml_name));

  /* The callback must be placed in non-moving C-visible
     malloc()-allocated space. It will have to contain a ML value -
     the callback - and we have to tell the GC about it, so that it can
     (1) start scanning from this value,
     (2) adjust the pointer if it moves around the value on the ML heap.
   */

  if(0==(ml_callback_container=(value *)malloc(sizeof(value))))
    {
      fprintf(stderr,"Fatal: malloc() failure - aborting!\n");
      exit(1);
    }

  *ml_callback_container=Val_unit;
  caml_register_global_root(ml_callback_container);
  

  CAMLlocal1(ml_animal_and_callback);
  
  ml_animal_and_callback=alloc_final(3, &finalize_raw_animal,1,100);

  /* Slot zero will contain the custom operations, slot one
     the animal, and slot two a C value which holds 
     the ML function implementing the callback, or Val_unit
     if none is set.
  */

  Store_c_field(ml_animal_and_callback,1,(value)a);
  Store_c_field(ml_animal_and_callback,2,(value)ml_callback_container);
  
  CAMLreturn(ml_animal_and_callback);
}


CAMLprim value c_ex_free_animal(value ml_animal_and_callback)
{
  CAMLparam1(ml_animal_and_callback);

  cleanup_animal_and_callback(ml_animal_and_callback);

  CAMLreturn(Val_unit);
}

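CAMLprim value c_ex_register_make_sound(value ml_animal, value ml_fun)
{
  CAMLparam2(ml_animal,ml_fun);
  animal *a=(animal *)Field(ml_animal,1);
  value *cb=(value *)Field(ml_animal,2);

  if(a) /* Ignore animals that have already been freed. */
    {
      /* *cb is a registered global root, so plain assignment is fine. */
      *cb=ml_fun;
      register_make_sound(a,&callback_wrapper_make_sound,cb);
    }
  CAMLreturn(Val_unit);
}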

CAMLprim value c_ex_make_sound(value ml_animal_and_callback)
{
  CAMLparam1(ml_animal_and_callback);
  animal *a=(animal *)Field(ml_animal_and_callback,1);

  if(a)
    {
      make_sound(a);
    }
  CAMLreturn(Val_unit);
}

Quite an example, indeed. Let us now check that this really works as advertised:

Running the Callback example


# let the_duck = make_animal "duck";;
val the_duck : C_examples.animal = <abstr>
# make_sound the_duck;;
- : unit = ()
# register_make_sound the_duck (let r = ref 0 in fun () -> begin r:= !r+1; Printf.printf "Quacking for the %d-th time!\n%!" !r end);;
- : unit = ()
# make_sound the_duck;;
Calling ML callback 0x0807a9f4!
Quacking for the 1-th time!
- : unit = ()
# make_sound the_duck;;
Calling ML callback 0x0807a9f4!
Quacking for the 2-th time!
- : unit = ()
# make_sound the_duck;;
Calling ML callback 0x0807a9f4!
Quacking for the 3-th time!
- : unit = ()
# make_sound the_duck;;
Calling ML callback 0x0807a9f4!
Quacking for the 4-th time!
- : unit = ()
# Gc.full_major();;
- : unit = ()
# make_sound the_duck;;
Calling ML callback 0xb7bbef8c!
Quacking for the 5-th time!
- : unit = ()
# Gc.full_major();;
- : unit = ()
# make_sound the_duck;;
Calling ML callback 0xb7bbef8c!
Quacking for the 6-th time!
- : unit = ()
# register_make_sound the_duck (fun () -> Printf.printf "Quack Quack!\n%!");;
- : unit = ()
# make_sound the_duck;;
Calling ML callback 0x0807f9f8!
Quack Quack!
- : unit = ()
# Gc.full_major();;
- : unit = ()
# make_sound the_duck;;
Calling ML callback 0xb7bbc310!
Quack Quack!
- : unit = ()
# free_animal the_duck;;
- : unit = ()
# make_sound the_duck;;
- : unit = ()
# 

The key idea is this: when we register our callback, we have to provide a C data pointer. What we actually want to pass is the ML function, but the GC may move it around in memory; indeed, in the transcript above, the address reported for the callback changes after Gc.full_major. So we have to pass a pointer to a C-allocated memory region holding the ML function instead. But then we have to make sure that the GC recognizes this C-allocated memory as a location holding an ML value, treats it as a root when scanning the heap, and updates it whenever the value is moved. Therefore, we have to register it with register_global_root, and unregister and free it once we get rid of the object for which we registered the callback. Readers should take their time to think this through.

Unfortunately, this means that we run into an ugly problem if the callback function we register is a closure that (directly or indirectly) references the very object for which we registered it. The reason is that the callback-holding object is responsible for removing the global GC root in its finalizer; but if this object is reachable through that global root, it will never be finalized. In other words, code like the following is asking for trouble:

A callback closure that keeps its own animal alive


open C_examples;;

let rec test n =
  if n=0
  then ()
  else
    let a = make_animal "frog" in
    let x = [a] in
    let ms () = 
      let len = List.length x in
      Printf.printf "Quaak (%d)\n%!" len
    in
    begin
      register_make_sound a ms;
      make_sound a;
      Gc.full_major();
      (* Now, forget about a and ms *)
      test (n-1)
    end
;;

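XXX Actually, if I run this as, e.g., (test 1000), I get a segfault, and strangely, the address reported for the callback function is always the same. The leak described above would not explain a segfault, so something is wrong with this discussion. The recurring address might simply be because each Gc.full_major empties the minor heap, so every iteration allocates its new closure at the same position. Have to investigate!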
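The reference cycle itself can be reproduced in pure OCaml, without any C code. In the following sketch, all names are hypothetical: the global hash table roots stands in for the C-side global roots, and a finaliser attached with Gc.finalise plays the role of the C finalizer, which is the only place where an entry is removed again. Once a callback captures its own animal, the table keeps the closure alive, the closure keeps the animal alive, and the finaliser never runs.

```ocaml
(* Hypothetical pure-OCaml model of the finalizer/global-root cycle.
   [roots] stands in for the C-side global roots: whatever is reachable
   from this table stays alive, whatever the rest of the program does. *)

type animal = { id : int; name : string }

let roots : (int, unit -> unit) Hashtbl.t = Hashtbl.create 16
let next_id = ref 0
let finalised = ref 0

let make_animal name =
  incr next_id;
  let a = { id = !next_id; name } in
  (* As with the C finalizer, this is the only place where the
     animal's root entry is removed again. *)
  Gc.finalise (fun a -> Hashtbl.remove roots a.id; incr finalised) a;
  a

let register_make_sound a f = Hashtbl.replace roots a.id f

(* Create [n] animals, register a callback for each, then drop them. *)
let create_and_forget ~self_referencing n =
  for _i = 1 to n do
    let a = make_animal "frog" in
    if self_referencing then
      (* The closure captures [a] itself; Sys.opaque_identity keeps an
         optimiser from capturing only the [name] field instead. *)
      register_make_sound a (fun () ->
          print_endline (Sys.opaque_identity a).name)
    else
      register_make_sound a (fun () -> print_endline "Quaak")
  done
```

After create_and_forget ~self_referencing:true 1000 followed by Gc.full_major (), the counter finalised is still 0 and all 1000 entries remain in roots: every animal leaks. With ~self_referencing:false, the table only holds closures that do not mention their animal, so the animals are unreachable and their finalisers may run (and empty the table) during later major collections.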

This brings us about as far as we want (or need) to go with our discussion of the C interface. Let us conclude this lesson with a final pearl: a module that lets us specify a function R^d -> R^k (from real d-dimensional space to real k-dimensional space) in the form of a string containing C code. This string is put into a C source file, compiled, dynamically loaded, and linked from within OCaml, so that we can c_register a string and in the end obtain a very fast float array -> float array OCaml function implementing this computation. Documentation of the (very small) ML interface is still lacking, especially concerning re-use of the output vector, and error checking should be improved (catching compiler errors, as well as making sure the wrapped function comes with array length bounds checks). Nevertheless, this is a self-contained example that shows many of the techniques discussed here, plus a few new ones, in particular: using the module system to define a weak hash table, using this table to keep track of the C-wrapped functions still in use, and dynamically loading C libraries.
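A weak hash table of this kind can be obtained from the standard library: the functor Weak.Make turns a hashed type into a set whose entries do not keep their elements alive. The following is only a sketch with hypothetical names (wrapped, WrappedSet, live, wrap, count_live), not the fastfields code itself; it shows how such a table can keep track of the wrapped functions that are still reachable.

```ocaml
(* Sketch with hypothetical names: a weak set of wrapped functions,
   built with the standard library's Weak.Make functor. *)

(* Stand-in for a C-wrapped function; here it only records its C source. *)
type wrapped = { source : string }

module WrappedSet = Weak.Make (struct
  type t = wrapped
  (* Physical equality: two entries are the same only if they are the
     same heap block, even if their sources coincide. *)
  let equal = ( == )
  (* Consistent with [equal]: physically equal values have equal sources. *)
  let hash w = Hashtbl.hash w.source
end)

(* The set holds its elements weakly: once the program drops the last
   reference to a wrapper, the GC may reclaim it and the entry
   silently disappears. *)
let live : WrappedSet.t = WrappedSet.create 16

let wrap source =
  let w = { source } in
  WrappedSet.add live w;
  w

(* Number of wrappers that have not been reclaimed yet. *)
let count_live () = WrappedSet.count live
```

Because the set is weak, count_live () may drop once wrappers have become unreachable and the GC has run; exactly when this happens depends on the collector.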

The complete c_examples module we discussed above (for completeness)

The fastfields module


How GCL boxes integers

One very simple way to demonstrate this is to use the shell command "ulimit -v 200000" to artificially limit the virtual memory size to 200000 KB, and then start GCL. A vector of 20 million entries requires about 80 MB of RAM here (4 bytes per entry). If we provide an initial value, all entries point to the same object. But look what happens when we start putting different numbers into different places:

GCL and memory management

$ ulimit -a
core file size        (blocks, -c) 0
data seg size         (kbytes, -d) unlimited
file size             (blocks, -f) unlimited
max locked memory     (kbytes, -l) unlimited
max memory size       (kbytes, -m) unlimited
open files                    (-n) 1024
pipe size          (512 bytes, -p) 8
stack size            (kbytes, -s) unlimited
cpu time             (seconds, -t) unlimited
max user processes            (-u) unlimited
virtual memory        (kbytes, -v) unlimited
$ ulimit -v 200000
$ gcl
GCL (GNU Common Lisp)  2.6.6 CLtL1    Jan 18 2005 00:13:38
Source License: LGPL(gcl,gmp), GPL(unexec,bfd)
Binary License:  GPL due to GPL'ed components: (READLINE BFD UNEXEC)
Modifications of this banner must retain notice of a compatible license
Dedicated to the memory of W. Schelter

Use (help) to get some basic information on how to use GCL.

>>(defparameter arr (make-array 20000000 :initial-element 7))

ARR

>>(aref arr 0)

7

>>(dotimes (j 20000000) (setf (aref arr j) j) (if (= 0 (mod j 1000000)) (print j)))

0 
1000000 
2000000 
3000000 
Unrecoverable error: Can't allocate.  Good-bye!.
Aborted
$ 
In comparison, MzScheme in the same situation

$ ulimit -v
200000
tf@ouija:~/talks/ocaml-tutorial$ mzscheme 
Welcome to MzScheme version 209, Copyright (c) 2004 PLT Scheme, Inc.
> (define a (make-vector 20000000 7))
> (vector-ref a 0)
7
> (let loop ((j 0)) (if (< j 20000000) (begin (vector-set! a j j) (loop (+ j 1)))))
> (vector-ref a 1999777)
1999777

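So this hints at GCL allocating integers on the heap (boxing them), while MzScheme does not. Neither do many other functional language implementations; OCaml, for example, represents an int as a tagged machine word that is stored directly in the array slot.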
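For comparison, here is a sketch of the same experiment in OCaml. Filling an int array with distinct integers allocates no boxes, whereas a float passed to a polymorphic function is a pointer to a heap block. The helper names fill and is_immediate are made up for this illustration; Obj.repr and Obj.is_int expose the runtime representation and are meant for inspection like this, not for everyday code.

```ocaml
(* OCaml counterpart of the GCL/MzScheme experiment above (sketch). *)

(* Fill an int array of length [n] with 0, 1, ..., n-1.  Each [j] is
   stored in place as an immediate value; no box is allocated per entry. *)
let fill n =
  let arr = Array.make n 7 in
  for j = 0 to n - 1 do
    arr.(j) <- j
  done;
  arr

(* [Obj.is_int] is true exactly for immediate (unboxed) values. *)
let is_immediate x = Obj.is_int (Obj.repr x)
```

In the toplevel, is_immediate 42 evaluates to true, whereas is_immediate 3.14 and is_immediate [1] evaluate to false. On a 64-bit system, fill 20_000_000 builds the 20-million-entry array of the experiment (8 bytes per slot) without boxing a single integer; on 32-bit systems, however, Sys.max_array_length is only 4194303, so an OCaml array that large cannot be created there at all.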


Dr. Thomas Fischbacher
Last modified: Sat May 13 18:25:36 BST 2006