Concordia University's DEC LAB: Research at the DEC LAB

Concordia DEC Lab about us . courses . students . . publications . events . people . contact us

Formal Artificial Life

Laws and Life (cont'd)

4 The Programming Language

The first part of this section describes a concrete proposal for reproduction. Versions of this have been implemented as prototypes. The second part discusses possible approaches to the design of other parts of the language.

4.1 Reproduction

Reproduction is not simply replicating. A reproducing agent must create an embryo with an appropriate genome and provide resources for the embryo to grow into an adult agent.

Reproduction has the following steps:

1. Two agents encounter one another (§3.6.5).

2. The agents agree to reproduce and decide who will be the father (F) and who will be the mother (M).

This is rare in nature (e.g., snails). Better to have a fixed sex for each agent or allow sex to evolve - but this could be difficult [Taras].

3. F passes its genome to M.

4. M creates a new genome G by combining the two genomes, using crossover, mutation, etc.

5. M uses G as a program to create a new agent, the child (C). This may take some time and may consume some of M's resources.

6. When C is sufficiently grown-up to survive by itself, M releases it into the environment.

Self-Replication This section is rather technical. It demonstrates theoretical possibilities rather than something that we might actually do.

This section is based on early work by von Neumann. We use strings and functions acting on strings, where we assume that the functions themselves can be encoded as strings. Upper case letters denote certain fixed strings and lower case letters represent string variables.

. Infix "+" denotes string concatenation.

. C is a function that copies strings: C(s) = s.

. B is a function that builds a function (that is, decodes a string) from its encoded form. It has the property that, for all strings s and t:

B(s + t) = B(s) + B(t)

. E is a function that encodes a function as a string. Thus both B and E are string-to-string functions and E = B-1.

. K is a control function (actually a combinator) defined by

K + f + g + x = f(x) + g(x)

where f and g are functions and x is a string.

We consider the effect of K with arguments B, C, and some string x. We have

K + B + C + x = B(x) + C(x). (1)

Let E(K) = k, E(B) = b, E(C) = c, and x = k + b + c. Then the right side of (1) simplifies:

B(x) + C(x) = B(k + b + c) + x = K + B + C + x.

Consequently, K + B + C + x reproduces itself when K is activated.

In order to make this work, we must define a set of operations, represented as strings, that enable K, B, and C to be coded with the effects defined above.

If we interpret x = k +b+c as the "genome" and assume that its position in the agent (that is, its position in the string K + B + C + x) is known, we can define an operation activate that applies a string to the genome. To make the agent reproduce itself, we execute activate K. In turn K executes activate B and activate C and concatenates the results.

The importance of a mechanism of this kind is that the agent's ability to reproduce itself is built into it rather than being provided by the system. This allows the effects of evolution, such as crossover and mutation, to affect the reproductive process itself.

As an example, we introduce the string "fbgsfgcfcgsfgc.bmbtn.cmotn.gkbc." which matches the pattern above. (The match is not exact, because the genome has a prefix "g" and a suffix ".".) The values of K, B, C, and x are:

K = B(k) = "fbgsfgcfcgsfgc.",

B = B(b) = "bmbtn.",

C = B(c) = "cmotn.",

x = "gkbc.".

This agent replicates itself when its atoms are interpreted as instructions for a processor. Figure 5 shows both the formal meaning and the intended interpretation of each atom. The third column shows the effect of the atom as either pseudocode or C++. Note that some atoms are used as both data and instructions. Figure 6 explains the reproduction of the agent "fbgsfgcfcgsfgc.bmbtn.cmotn.gkbc.".

The effect of f x is to look for an occurrence of ". x" in the agent. Each component of the agent starts with "." and a key letter: for, example, the build starts with "b", the copier starts with "c", and the genome starts with "g". The sequence "fb" looks for ".b" and starts a new process at that position of the agent.

In molecular biology, many reactions start because one molecule binds to another molecule (typically, the molecules are proteins). The search process performed by "f" is loosely analogous to binding.

Figure 5 gives the appearance that the model is close to conventional computing, but the following issues should be considered.

Figure 5: Semantics

Figure 6: Explanation of "fbgsfgcfcgsfgc.bmbtn.cmotn.gkbc."

. The only data structure used is a string. Many biological molecules have a string-like form: RNA and DNA are essentially strings of codons, proteins are strings of amino acids.

. It is more or less impossible to write a program without storing some state. In this model, state is stored in three registers, r0, r1, and r2, each of type char*. Thus each register represents a position on a string, just as a biological agent attaches itself to a string and moves along it. In particular, r0 plays approximately the role a the "instruction counter" - it usually points to the letter currently being interpreted. r1 and r2 are like general purpose registers.

. The call() and ret() functions also look suspiciously like conventional programming - perhaps they need better names. The effect of call(p,g) is to suspend the current thread and start a new thread at position p on a string. The new process is also given a pointer to the first letter of the genome g. Depending on p, this corresponds to either protein synthesis or genome replication.

. The find action searches the string for the delimiter ".", just as a cell activities look for start symbols on the genome.

Figure 7: Effect of mutations. Explanations are on the left. In the right column, replicating the upper string yields the lower string.

In some ways, the reproductive process is quite robust. Figure 7 shows how various changes in the agent or its genome affect replication. In other respects, the mechanism is quite fragile:

. The "f" instruction, which searches for ".", depends on a fixed sequence and number of components. It should be replaced by some kind of pattern-matching search, which would also be closer to biology. (There is a prototype for this.)

. The "build" instruction, "b", is coded essentially like this:

The mapping from gene to body component is part of the "laws" rather than part of "life". In other words, the agent has no control over the encoding of genes. This can be fixed by:

- putting more detail into the genes (e.g., "k" should be a string describing the controller)

- putting more structure into the "b" operator

- some combination of both of these

. In a revised version of the self-replicator, information has been moved from the instruction set to the genome. There are two obvious patterns in the organism above. The first is the "find and execute" pattern, which occurs twice in the controller: "fbgsfgc" invokes the builder, and "fcgsfgc" invokes the copier. The second is the "loop to dot" pattern which occurs in the builder as "mbtn" and in the copier as "motn".

In the new self-replicator, the build command, "b", defines two transformations:

in which x and y are "variables" that are replaced by particular atoms during execution. The "standard" agent is now:

kfbgsfgckfcgsfgc.bmbtn.cmotn.gqbqcdlbblco.

The controller is slightly modified, the genome is longer because it contains more information, and the builder and copier remain the same.

The substitution mechanism is reasonably plausible in the biological sense.

A slight elaboration of this scheme is to add new body parts and genes for them. The major change needed to the notation is to provide a better mechanism for locating parts of the program. As indicated above, however, this is desirable anyway.

A number of issues in self-replication remain to be analyzed:

. Building - what are the resources, time, storage/buffer areas, etc., needed for a particular component?

. Layout - define the size, shape, and connectivity of components.

. Program - each component of the agent must have a description in the genome.

4.2 Tentative Ideas

We must think of a better name than genol!

The name of the programming language (until we think of something better) is genol. Designing genol is an interesting challenge. Here are some possible features of genol:

. If it is a high-level language, the genome and the agent's program will be short; if it is a low-level language, they will be long. We should aim for a compromise, which gives genomes and programs of reasonable length (although it is not yet clear what a "reasonable length" might be).

. genol programs should be robust in the sense that a small number of random changes should change their behaviour but not destroy them completely.

. genol should be an intensional language: the effect of an instruction depends on its context, just as the effect of a protein depends on where it is used.

. Operands are vectors, scalars, or atoms.

. genol is object oriented. It will probably use delegation rather than inheritance, because delegation is more dynamic and more flexible.

. A program consists of both data and code. The data consists of a network of linked objects (abstractly, it is a directed graph). The links (probably represented as pointers) both hold the objects together and act as communication pathways: object A can send a message to object B if and only if there is a link from A to B.

. A program can "grow" by creating new objects and incorporating them into the network and it can "shrink" by shedding objects. During replication, the mother grows the child by creating new objects in this way. Growth is controlled by the genome.

Capabilities of genol programs should include:

Copying. Create a copy of a string using atoms belonging to the agent. Copying may be used by the agent during reproduction to copy the genome. Perhaps this operation should be broken down into simpler parts so that the agent has to execute a genecopying program during reproduction.

Detecting quantity. Tell the agent how many atoms of a particular kind exist at the current lattice point. An agent can use this information to decide whether to feed or to move on, whether any other agents might be in the neighbourhood, etc.

Detecting gradient. Tell the agent the concentration gradient of atoms of a particular kind in the neighbourhood (the result is a vector). An agent can use this information in order to decide in which direction to turn or move. Agents could discover the gradient for themselves by moving in each direction and measuring the quantity of a particular atom there; however, this would be very inefficient and there seems to be no advantage to this approach other than demonstrating that agents can learn something.

Digesting. Transmute atoms in the agent's stomach in order to obtain energy. Digestion should be partly voluntary and partly involuntary. For example, if the agent eats low-energy atoms, it cannot simply ignore them. They would be involved in digestive reactions and would thereby lower the agent's energy, perhaps fatally. Hopefully, agents would evolve to avoid eating such atoms.

Eating. Transfer atom(s) from the current lattice point to the agent's stomach.

Excreting. Transfer atom(s) from the agent's stomach to the current lattice point.

Moving. Move the agent to the next lattice point in the direction in which it is currently facing. The move is not permitted to occur if the next lattice point is occupied by an agent.

Figure 8: Instructions. See Figure 4 for the interpretations used for "s" and "v".

Figure 9: Program fragments

Turning. Rotate the agent through one right angle about a given axis. There are six directions and the agent is already facing in one of them. Consequently, four turns are allowed and two (through 0^o and 180^o) are not.

The quantities involved in an operation are not specified here. Possibilities include:

. An operation affects just one atom. This is probably too inefficient to be useful.

. An operation affects a fixed number of atoms ("fixed" in the sense that the amount is determined by a universal constant). This is feasible, but might be too rigid.

. There is an operation that sets the number of atoms affected. This gives the agent control over quantities and is probably the best solution.

An agent can multiply quantities by n simply by repeating an operation n times.

The precise codes have not yet been determined. (In fact, the codes given here are inconsistent with those of §4.1 below. The research was done at different times!) The list in Figure 8 describes some of the tasks that a program might be able to perform. Figure 8 suggests possible encodings of some of the instructions. Figure 9 shows some examples of code sequences based on Figure 8.

Page 1, Page 2, Page 3, Page 4, Page 5