Sosa: Home

Sane OCaml String API

This library is a set of APIs defined with module types, and a set of modules and functors implementing one or more of those interfaces.

The APIs define what a character and a string of characters should be.

This is the development branch of the library; the latest released version is 0.0.1.

See the INSTALL file for build instructions.

Module Types (APIs)

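The implementations described below refer to the following module types:

  • BASIC_CHARACTER, implemented by the character modules,
  • BASIC_STRING, implemented by the string modules and produced by the functors,
  • UNSAFELY_MUTABLE, implemented by Native_string,
  • MINIMALISTIC_MUTABLE_STRING, the input of the Of_mutable functor.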

Implementations

Native OCaml Characters

The Native_character module implements BASIC_CHARACTER with OCaml's char type.

Native OCaml Strings

The Native_string module implements BASIC_STRING and UNSAFELY_MUTABLE with OCaml's string type (and hence Native_character).

Lists Of Arbitrary Characters

List_of is a functor: BASIC_CHARACTER → BASIC_STRING, i.e., it creates a string data structure made of a list of characters.
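As a rough illustration of the "functor from characters to strings" idea, here is a minimal sketch. The signatures CHARACTER_SKETCH and STRING_SKETCH, and the modules List_of_sketch and Char_sketch, are hypothetical, pared-down stand-ins invented for this example; they are not Sosa's actual BASIC_CHARACTER, BASIC_STRING, or List_of.

```ocaml
(* Hypothetical, pared-down signatures: NOT Sosa's actual module types. *)
module type CHARACTER_SKETCH = sig
  type t
  val of_native_char : char -> t option
  val to_native_string : t -> string
end

module type STRING_SKETCH = sig
  type character
  type t
  val empty : t
  val of_character_list : character list -> t
  val length : t -> int
  val get : t -> index:int -> character option
  val concat : t list -> t
end

(* A functor from characters to strings, in the spirit of List_of:
   the string is represented as a plain list of characters. *)
module List_of_sketch (Chr : CHARACTER_SKETCH)
  : STRING_SKETCH with type character = Chr.t = struct
  type character = Chr.t
  type t = character list
  let empty = []
  let of_character_list l = l
  let length = List.length
  (* List.nth_opt raises on negative indices, so guard against them. *)
  let get t ~index = if index < 0 then None else List.nth_opt t index
  let concat = List.concat
end

(* A character module backed by OCaml's native char (assumed for the demo). *)
module Char_sketch : CHARACTER_SKETCH with type t = char = struct
  type t = char
  let of_native_char c = Some c
  let to_native_string c = String.make 1 c
end

module S = List_of_sketch (Char_sketch)

let hello = S.of_character_list ['h'; 'i']
```

Because the string type stays abstract behind the signature, code written against STRING_SKETCH would not need to change if the list were swapped for another representation.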

Build From Basic Mutable Data-structures

The functor Of_mutable uses an implementation of MINIMALISTIC_MUTABLE_STRING to build a BASIC_STRING.
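The general shape of such a construction can be sketched as follows. MUTABLE_SKETCH, Of_mutable_sketch, and Int_array are hypothetical names made up for this example, and the operations they expose are assumptions; the real MINIMALISTIC_MUTABLE_STRING and Of_mutable may look quite different.

```ocaml
(* Hypothetical minimal mutable interface: NOT Sosa's
   MINIMALISTIC_MUTABLE_STRING. *)
module type MUTABLE_SKETCH = sig
  type character
  type t
  val empty : t
  val make : int -> character -> t
  val length : t -> int
  val get : t -> int -> character
  val set : t -> int -> character -> unit
end

(* Build an immutable-looking string API on top of the mutable buffer:
   the abstract type t hides every mutating operation from clients. *)
module Of_mutable_sketch (M : MUTABLE_SKETCH) : sig
  type t
  val of_character_list : M.character list -> t
  val to_character_list : t -> M.character list
  val length : t -> int
  val get : t -> index:int -> M.character option
  val append : t -> t -> t
end = struct
  type t = M.t
  let of_character_list = function
    | [] -> M.empty
    | first :: _ as chars ->
      (* Allocate once, then fill in place before the value escapes. *)
      let buf = M.make (List.length chars) first in
      List.iteri (fun i c -> M.set buf i c) chars;
      buf
  let to_character_list t = List.init (M.length t) (M.get t)
  let length = M.length
  let get t ~index =
    if index < 0 || index >= M.length t then None else Some (M.get t index)
  (* Always copy into a fresh buffer, so neither argument is mutated. *)
  let append a b =
    of_character_list (to_character_list a @ to_character_list b)
end

(* Example instance: arrays of integers standing in for characters. *)
module Int_array
  : MUTABLE_SKETCH with type character = int and type t = int array = struct
  type character = int
  type t = int array
  let empty : int array = [||]
  let make = Array.make
  let length = Array.length
  let get = Array.get
  let set = Array.set
end

module A = Of_mutable_sketch (Int_array)

let ab = A.append (A.of_character_list [104; 105]) (A.of_character_list [33])
```

The design point this sketch tries to convey is that mutation is confined to the inside of the functor, so users only ever see values that behave as immutable strings.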

Integer UTF-8 Characters

The Int_utf8_character module implements BASIC_CHARACTER with OCaml integers (int) representing UTF-8 characters. We handle values of at most 31 bits, even though RFC 3629 restricts code points to end at U+10FFFF (cf. also Wikipedia). Note that the function is_whitespace considers only ASCII whitespace (useful when writing parsers, for example).
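To make the 31-bit limit concrete, the sketch below encodes an integer using the original UTF-8 scheme, which allows sequences of up to 6 bytes (RFC 3629 later capped it at 4 bytes and U+10FFFF). The functions utf8_size, to_utf8, and is_ascii_whitespace are hypothetical helpers written for this example, not Sosa functions, and the exact set of characters treated as whitespace is an assumption.

```ocaml
(* Hypothetical helpers for illustration; not part of Sosa's API. *)

(* Number of bytes needed in the pre-RFC 3629 UTF-8 scheme, which covers
   values of up to 31 bits. Returns None outside that range. *)
let utf8_size (code : int) : int option =
  if code < 0 then None
  else if code < 0x80 then Some 1
  else if code < 0x800 then Some 2
  else if code < 0x1_0000 then Some 3
  else if code < 0x20_0000 then Some 4
  else if code < 0x400_0000 then Some 5
  else if code lsr 31 = 0 then Some 6
  else None

(* Encode an integer as a UTF-8 byte sequence of 1 to 6 bytes. *)
let to_utf8 (code : int) : string option =
  match utf8_size code with
  | None -> None
  | Some 1 -> Some (String.make 1 (Char.chr code))
  | Some n ->
    let b = Bytes.create n in
    (* Leading byte: n high bits set, a zero bit, then the top payload bits. *)
    let prefix = (0xFF lsl (8 - n)) land 0xFF in
    Bytes.set b 0 (Char.chr (prefix lor (code lsr (6 * (n - 1)))));
    (* Continuation bytes: 10xxxxxx, six payload bits each. *)
    for i = 1 to n - 1 do
      let shift = 6 * (n - 1 - i) in
      Bytes.set b i (Char.chr (0x80 lor ((code lsr shift) land 0x3F)))
    done;
    Some (Bytes.to_string b)

(* ASCII-only whitespace test; the chosen set (space, tab, LF, CR) is an
   assumption. Non-ASCII spaces such as U+00A0 are deliberately rejected. *)
let is_ascii_whitespace (code : int) : bool =
  match code with
  | 0x20 | 0x09 | 0x0A | 0x0D -> true
  | _ -> false
```

For instance, to_utf8 0x20AC (the euro sign) yields the three bytes E2 82 AC, while any value at or above 0x4000000 that still fits in 31 bits needs the full 6 bytes.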

Examples, Tests, and Benchmarks

See the file sosa_test.ml for usage examples. The library is tested with:

  • native strings and characters,
  • lists of native characters (List_of(Native_character)),
  • lists of integers representing UTF-8 characters (List_of(Int_utf8_character)),
  • arrays of integers representing UTF-8 characters (Of_mutable(utf8-int array)),
  • bigarrays of 8-bit integers (Of_mutable(int8 Bigarray1.t)).

The tests are a self-compiling “Shell-then-OCaml-script” which depends on the Nonstd and OCaml Bigarray libraries:

./test/sosa_test.ml

and you may add the basic benchmarks to the process with:

./test/sosa_test.ml bench