Common Scientific Programming Library: Dimensions and Units Handling, Parsing, and Conversion

A basic problem in scientific programming is the handling of units and conversions. It is often solved from scratch, in a myriad of ways. The simplest approach, hard-coding the unit set and forcing all inputs and outputs into that set, is pervasive and limiting. A code often provides a limited set of units for input (and possibly output) that it is aware of and can convert to its internal representation. Because the problem is so basic, it is often regarded as trivial, and a proper solution is seldom attempted.
The mere fact that a problem is of basic importance and pervasive throughout scientific computing in no way means that it is simple or solved. The goal of this project is to implement a general solution to handling, parsing, and converting units for quantities of arbitrary dimensions. The solution will be implemented under a BSD license in a new, open-source C library for scientific computing: the common scientific programming library (CSPL).
The approach to the problem is fourfold: language formalization, language parsing, dimensions and units handling, and unit conversion.

Language formalization

The first step in solving this computing problem in general is to formalize the language of dimensions and units. Dimensions express the fundamental nature of a quantity. For example, the distance, x, between Montréal and Québec City is 251 km. The quantity x thus has dimensions of length (usually abbreviated L). The units of x are, in this case, kilometers, a recognized unit of length in the SI system. The quantity was measured to have a value of 251. It is thus composed of three parts: its value (251), its dimensions (L), and its units (km). Note that expressing the quantity x in a unit regarded as a unit of length, the km, does not mean that we must always use this unit, or even that we must use a unit usually regarded as a unit of length. We could express the distance from Montréal to Québec City in electron-volts (eV) through the transformation E = hc/x, where h is Planck's constant and c is the speed of light. We would thus obtain that the distance between Montréal and Québec City is x = 4.94 × 10^-12 eV. While surprising to many engineers, such transformations are routinely used, for example in high-energy physics.
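As a quick check of the number quoted above, the following minimal C sketch maps a length to an energy. It assumes the transformation is the photon-energy relation E = hc/x, which reproduces the quoted 4.94 × 10^-12 eV for 251 km. The names CSPL_HC_EV_M and cspl_length_m_to_ev are hypothetical and are not part of any existing API.

```c
#include <assert.h>

/* h*c expressed in eV*m, rounded to 10 significant digits.
   Hypothetical constant name, for illustration only. */
#define CSPL_HC_EV_M 1.239841984e-6

/* Map a length in metres to an energy in eV, assuming E = h*c / x.
   Hypothetical helper: the relation used is an assumption of this sketch. */
double cspl_length_m_to_ev(double x_m)
{
    assert(x_m > 0.0);            /* the relation is undefined for x <= 0 */
    return CSPL_HC_EV_M / x_m;
}
```

Calling cspl_length_m_to_ev(251.0e3) gives a value between 4.93e-12 and 4.95e-12. Note that the mapping is a convention of a particular field, not a property of the units themselves, which is why the project asks whether such transformations are unique.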
The goal of this first step is to formalize the language of the relationship between dimensions and units. We have to answer two questions here. First, what are all the ways in which we can relate units defined in different dimensions, e.g. eV and km as above, and are such transformations unique across all fields of science and engineering? The second question regards semantics. We must define an unambiguous syntax to record units. For example, the string “mg” has at least two meanings with regard to units. The first is milligram, a unit that typically expresses mass, and the second is meter-gram, a unit that naturally represents dimensions of ML, mass-length. A human researcher would use the context surrounding this string (e.g. the particular equation, the field of research, etc.) to disambiguate between these two cases. Our software will not be able to rely on context, so we must find a way to write down all units unambiguously. To solve the above example, we could require that all multipliers be enclosed in square brackets and that every “sub-unit” be separated from the next by a period. We would then have milligrams, “[m]g”, and meter-grams, “m.g”. Many other ambiguities in scientific units will need to be addressed.
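The bracket-and-period convention suggested above can be sketched as a small splitter in C. This is only an illustration of that one candidate syntax, not the final CSPL grammar. The type subunit_t, the function parse_units, and the limit TOKEN_MAX are hypothetical names, and the sketch does not handle exponents or division.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOKEN_MAX 8   /* hypothetical limit on prefix/symbol length */

/* One "sub-unit": an optional multiplier and a unit symbol. */
typedef struct {
    char prefix[TOKEN_MAX];  /* multiplier, "" if none, e.g. "m" in "[m]g" */
    char symbol[TOKEN_MAX];  /* unit symbol, e.g. "g" */
} subunit_t;

/* Copy n characters into a TOKEN_MAX buffer; -1 if they do not fit. */
static int copy_token(char *dst, const char *src, size_t n)
{
    if (n >= TOKEN_MAX) return -1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return 0;
}

/* Split s into sub-units separated by periods, with multipliers in
   square brackets. Returns the number of sub-units found (0 for an
   empty string), or -1 on a syntax error or if max_out is exceeded. */
int parse_units(const char *s, subunit_t *out, size_t max_out)
{
    assert(s != NULL && out != NULL);
    size_t count = 0;
    while (*s != '\0') {
        if (count == max_out) return -1;
        subunit_t *u = &out[count];
        u->prefix[0] = '\0';
        if (*s == '[') {                       /* bracketed multiplier */
            const char *close = strchr(s, ']');
            if (close == NULL || close == s + 1) return -1;
            if (copy_token(u->prefix, s + 1, (size_t)(close - s - 1)) != 0)
                return -1;
            s = close + 1;
        }
        size_t len = strcspn(s, ".[]");        /* symbol runs to next delimiter */
        if (len == 0) return -1;               /* missing unit symbol */
        if (copy_token(u->symbol, s, len) != 0) return -1;
        s += len;
        count++;
        if (*s == '.') {
            s++;
            if (*s == '\0') return -1;         /* trailing period */
        } else if (*s != '\0') {
            return -1;                         /* stray bracket */
        }
    }
    return (int)count;
}
```

With this sketch, "[m]g" yields one sub-unit (prefix "m", symbol "g") and "m.g" yields two sub-units with no prefix. The bare string "mg" yields a single symbol "mg", which a later lookup stage would have to reject or resolve explicitly.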

Language parsing

Given the previous formalization work, we will implement a parser for this language in C. This parser will be part of a new, open-source C library, CSPL.

Dimensions and units handling

In this same library, we will implement data structures to handle this information, that is, structures that other codes can link against to store the dimensions and units of a quantity.
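One possible shape for such structures is sketched below in C. It assumes, for illustration only, that a dimension can be stored as integer exponents over a small set of base dimensions (mass, length, time) and that a unit carries a scale factor to a reference unit. All names (dimension_t, unit_t, quantity_t, dimension_equal, dimension_multiply) are hypothetical. A real design would need more base dimensions and possibly rational exponents.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Base dimensions tracked by this sketch (a deliberately reduced set). */
enum { DIM_MASS, DIM_LENGTH, DIM_TIME, DIM_COUNT };

/* A dimension as exponents of the base dimensions, e.g. ML = {1, 1, 0}. */
typedef struct {
    int exp[DIM_COUNT];
} dimension_t;

/* A unit: its symbol, its dimension, and its scale to a reference unit. */
typedef struct {
    const char *symbol;   /* e.g. "km" */
    dimension_t dim;      /* e.g. length = {0, 1, 0} */
    double to_base;       /* e.g. 1.0e3 if the reference unit is the metre */
} unit_t;

/* A quantity: a value together with the unit it is expressed in. */
typedef struct {
    double value;         /* e.g. 251 */
    const unit_t *unit;   /* e.g. pointer to the "km" unit */
} quantity_t;

/* True when two dimensions have identical exponents. */
bool dimension_equal(const dimension_t *a, const dimension_t *b)
{
    assert(a != NULL && b != NULL);
    return memcmp(a->exp, b->exp, sizeof a->exp) == 0;
}

/* Dimension of a product: exponents add (mass times length gives ML). */
dimension_t dimension_multiply(const dimension_t *a, const dimension_t *b)
{
    assert(a != NULL && b != NULL);
    dimension_t r;
    for (int i = 0; i < DIM_COUNT; i++)
        r.exp[i] = a->exp[i] + b->exp[i];
    return r;
}
```

Storing exponents rather than names makes the dimension of a compound unit such as meter-gram a simple element-wise sum, and makes dimension comparison a cheap array comparison.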

Unit conversion

Finally, facilities will be provided to convert between units. This goal involves checking the dimensions of quantities, to make sure that conversion from one unit to another has meaning, and performing conversions in a way that minimizes the possibility of underflow or overflow.
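The two concerns above, dimension checking and overflow avoidance, can be illustrated with a minimal C sketch. It assumes a reduced, hypothetical unit_t that stores integer dimension exponents and a scale factor to a reference unit, and it only covers conversions within the same dimensions. The function name convert is also hypothetical. The ratio of the two scale factors is formed first, so the value is never multiplied by a large intermediate factor. This reduces, but does not eliminate, the risk of overflow or underflow.

```c
#include <assert.h>
#include <string.h>

enum { DIM_COUNT = 3 };   /* mass, length, time (reduced set, hypothetical) */

/* Reduced unit description used only for this sketch. */
typedef struct {
    int exp[DIM_COUNT];   /* dimension exponents */
    double to_base;       /* scale factor to the reference unit */
} unit_t;

/* Convert value from one unit to another.
   Returns 0 on success, -1 if the dimensions differ. */
int convert(double value, const unit_t *from, const unit_t *to, double *result)
{
    assert(from != NULL && to != NULL && result != NULL);

    /* Refuse conversions that have no meaning, e.g. km to g. */
    if (memcmp(from->exp, to->exp, sizeof from->exp) != 0)
        return -1;

    /* Combine the scale factors first: value * from->to_base could
       overflow even when the final result is representable. */
    double ratio = from->to_base / to->to_base;
    *result = value * ratio;
    return 0;
}
```

For example, converting 251 from a unit with scale 1.0e3 (km) to one with scale 1.0 (m) returns 0 and stores 251000, while converting from km to a mass unit returns -1.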

Requirements

The ideal candidate is interested in scientific programming and has a basic understanding of programming concepts. Ideally, (s)he will know C, will have used Linux, and will be familiar with Git or another version control system, though this is not required. Any tool can be learned.

Salary

Unpaid. Possibilities of publishing, aligning this project with a course project, or securing funding in the future.

Point of contact and direct supervisor

Dr. Charles Basenga Kiyanda
Concordia University, EV 4.233
Email for more info