Reference.txt
Table of Contents
- 1. The Algorithm Notation Scheme
- 2. Documentation
- 2.1. Introduction (files & repl)
- 2.2. Scheme Files
- 2.3. Errors
- 2.4. Primitive Data Types
- 2.5. Environments, System Environment and Name Spaces
- 2.6. Formatting Textual Output
- 2.7. Control
- 2.8. Data Flow
- 2.9. List Processing TODO
- 2.10. Data Associations
- 2.11. Vector Calculus
- 2.12. Sorting
- 2.13. Byte Structures
- 2.14. Data Structures
- 2.14.1. (struct . slot-names) => struct-layout
- 2.14.2. (structure-layout labels)
- 2.14.3. (allocate-struct struct-layout)
- 2.14.4. (is-a? object data-type)
- 2.14.5. (slot-ref struct slot-name)
- 2.14.6. (slot-set! struct slot-name value)
- 2.14.7. (struct-from-values struct . values)
- 2.14.8. (values-from-struct struct)
- 2.15. Double Linked Lists (D-Lists)
- 2.15.1. (make-dlist)
- 2.15.2. (dlist? object)
- 2.15.3. (dlist-start? dlist)
- 2.15.4. (dlist-end? dlist)
- 2.15.5. (dlist-head dlist)
- 2.15.6. (dlist-tail dlist)
- 2.15.7. (dlist-next dlist)
- 2.15.8. (dlist-prev dlist)
- 2.15.9. (dlist-add! dlist value)
- 2.15.10. (dlist-insert! dlist value)
- 2.15.11. (dlist-remove! dlist)
- 2.15.12. (values-from-dlist dlist)
- 2.15.13. (dlist-from-values values)
- 2.16. Memory Management
- 2.17. Scheme vs POSIX
- 2.18. Portable Operating System Interface
- 2.19. Implementation Specifics
- 3. Transformations and Macros
- 4. TODO Classification
- 4.1. Binary Distinctions
- 4.2. Unknown Distinctions
- 4.3. Type Code Tag System Tree
- 4.4. X Window System Classes
- 4.5. POSIX classes
- 4.6. Existing Meta Classes
- 4.7. Graph of Direct Accessors
- 4.8. Data Flow: Ports, Port Table, File Descriptors
- 4.9. Display Form and Serialization
- 4.10. Enclosed Storage Classes
- 4.11. Derivation
- 4.12. PS development progress
1 The Algorithm Notation Scheme
Please refer to Structure and Interpretation of Computer Programs for an introduction to the Algorithmic Language Scheme (ALS), which is the predecessor of the Algorithm Notation Scheme (ANS).
This scheme isn't the first, the last or the superior one. It's just what it is but it's already there. And it is adaptable to be what it should be.
Does the arrow mean "from this follows"? => Exactly that: a tempospatial close causality representation to be perceived: descriptive as output from the evaluation and prescriptive in the documentation. Have fun!
1.1 Sequence
(begin expressions)
Forms one expression out of a sequence of expressions. The value of a sequence is determined by the last expression evaluated as part of the sequence.
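For example, the following sequence first prints a line as a side effect and then evaluates to the value of its last expression (a minimal sketch using display and newline as described elsewhere in this reference):
(begin
  (display "side effect")
  (newline)
  (+ 1 2))
=> 3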
1.2 Function / Procedure / Scope
(lambda formals expressions)
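A small sketch: the lambda form evaluates to a closure which can be applied to arguments matching its formals, or bound to a symbol with define.
((lambda (x y) (+ x y)) 2 3) => 5
(define add-one (lambda (x) (+ x 1)))
(add-one 41) => 42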
1.3 Closure, Procedure, Macro, Primitive, Syntactic Forms
Macros are syntax transformers bound to a symbol which expands to the corresponding transformation.
Most built-in syntactic forms are macros and implemented as primary primitives.
Most primary primitives are only available in combination with the associated syntax transformers bound to the corresponding keywords (i.e. symbols where the data type keyword is just another XXX).
Secondary primitives are directly bound to their symbols in the system environment and count as procedures.
A closure is a procedure created by the special lambda form.
All of these are considered objects with an associated default text representation with display or serialization with write which are currently considered generational or traditional and do not necessarily reflect current implementation details or conventions. The reported type of some objects can in addition be changed by setting procedure properties.
1.4 Continuation
A continuation is a procedure that won't return.
A comment on different uses of that term:
The Algorithmic Language Scheme offered a different "implementation detail" without much concept but that isn't part of the Algorithm Notation Scheme anymore. The term continuation is thus bound to an abstract concept which implicates a special control flow within a program. Where one example would have asked for several return points considered continuations, which can be modelled but not expressed with common lambda notation, the old concept offered multiple returns at the same point, which is considered sort of not well defined by now. Maybe that's just another point of view as the situation with multiple return points sees multiple calls to each continuation.
In general the concept survived as part of the ALS but has in practice been replaced with the message passing style or paradigms: message handlers would today be a better term for the situation with multiple return points. Where message handlers usually return into the controlling loop of an application a continuation would have called the controlling procedure again.
The multiple return point situation:
Part A of a program calls GUI-control to receive events from a graphical input path. Part B of the program is the REPL and it calls read waiting for input. With multiple return points one could change the underlying implementations to enter the control loop to then return to where appropriate according to the port where input was available first.
While in theory this is tempting and in practice it would be daunting to change every loop like the one of the REPL to be compatible with another control loop one has to realize: there aren't any other applications than the REPL conforming to the ANS and the outcome of this implicit change to another control loop (or control flow with multiple return points) isn't satisfying if not prepared thoroughly. Maybe we just got that used to the message passing style where it would only have been one syntax form to retain the illusion of different parallel evaluations with little syntactical overhead but side effects and control became more important than perceived evaluation of functions. The return value of a message handler usually isn't what we care about …
To finish this comment we add a little description of the previous implementation of what had been denoted as continuation: a continuation captured the current call stack pretty much like a closure captures the current environment. Calling a continuation meant to replace the current call stack with the one saved in the continuation. That way some program state may have been reverted to that previous state when the continuation was created which led to one interpretation where a continuation should have restored all of that gone program state against the other interpretation which considered it to be multiple returns at the same point without considering the program state. Modern stack frames might differ significantly from logical stack frames and for a lot of procedures not only related to the input and output path the actual behavior would have been unspecified.
Multiple stacks (threads or parallels) might offer the possibility to realize multiple continuations with multiple return points after considerable organization and preparation of a setup with a controlling thread respectively parallel.
[Shouldn't that be better part of the road map. We could as well collect some "forgotten" methods of parallelism here: the eight byte word with the bit masks all applied in one operation. That perceived parallelism of the controller with multiple continuations even if evaluated completely in sequence and without parallels. Maybe someone was the storyteller here as it fell well where "yesterday" we sat there telling each other horror stories: sometimes these stories just fall well but sometimes or some don't like these well fallen things well, well. Spooky in the evening mood.]
1.5 Conventions
1.5.1 Predicate (?)
Predicates are procedures whose main purpose is to return a boolean, i.e. to answer with falsity or not falsity corresponding to the arguments. They usually have a question mark at the end of their symbol to distinguish them from other procedures with a similar name.
(number? 5) => #t
(number? "text") => #f
(number? number?) => #f
(procedure? number?) => #t
(list? procedure?) => #f
(list? (list 1 2)) => #t
Please note that predicates are considered procedures implementing logical functions. Variables holding booleans usually don't have a question mark at the end of their symbol (but that seems to have become a matter of style).
1.5.2 Bang (!)
The theoretical basis is functional programming without side effects. In that theory data structures cannot be modified and data structures with equal contents are to be considered the same.
(define a 4) (define b a)
Now b has the same value as a and they are considered equal.
(set! a 3)
The primitive set! allows us to modify variables directly. This contradicts some theory and by convention we put an exclamation mark at the end of its symbol and call it "set bang" to remind everyone that we just let go of our clean theory.
In practice there are side effects and we know about memory objects which might or might not represent the same thing twice. Often it is more convenient to not need to think about memory objects and to remain on a theoretical level but without losing the ability to change data objects. On the other hand it is sometimes a lot faster to modify data structures directly. Therefore a lot of procedures have two variants: a regular one which almost always returns the result of the processing without modifying the original datum and a destructive one which is allowed to modify the datum at hand.
 1  (define a (list 1 2 3))
 2  (reverse a) => (3 2 1)
 3  a => (1 2 3)
 4  (reverse! a) => (3 2 1)
 5  a => unspecified: (1)
 6  (set! a (list 3 2 1))
 7  (define b (append a a))
 8  b => (3 2 1 3 2 1)
 9  (reverse! b) => (1 2 3 1 2 3)
10  a => unspecified: (3 1 2 3)
Lines five and ten of the example tell you what is meant by destructive. While the bang variants reliably tell you that the original datum may be modified if necessary, there is no guarantee to receive a full copy of a datum when there is no bang. The last line shows a reversed a even if we didn't ask for it. That is because append is allowed to reuse its last argument as that last argument doesn't need to be modified to become part of the final list. There is some need to consult the documentation of the destructive variants to find out about their side effects or what you can rely on. Relying on the return values marked as unspecified is considered an error.
In practice append remains an exception where one has to pay attention to the de-tail. XXX that is not funny and perhaps concatenate shouldn't just be a synonym for append???
1.6 Conditional Clauses
1.6.1 when
(when condition expressions)
1.6.2 if
(if condition then), (if condition then else)
- then: one expression to be evaluated when the condition is true
- else: one (optional) expression to be evaluated when the condition is false
1.6.3 cond
(cond clauses)
- clauses: (test expressions), (else expressions)
Evaluate all tests until a test is true or the special keyword else (XXX keyword, symbol, current implementation) is found. Evaluate the corresponding expressions as a sequence and return the last return value.
While the conditional clauses should be distinct and order shouldn't matter, a top down sequence for the evaluation of the tests is guaranteed.
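A small hedged sketch combining if and cond (the classification is made up for illustration):
(define classify
  (lambda (n)
    (cond ((< n 0) 'negative)
          ((= n 0) 'zero)
          (else 'positive))))
(classify -3) => negative
(if (even? 4) 'even 'odd) => even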
1.7 Collections
There are two types of collection: vectors and pairs. As this is still a list interpreter lists can easily be constructed with pairs and they hold a one-way sequence of data terminated with an end of list token called the empty list ='()=. It's easy to modify existing list structure and to construct arbitrary data graphs with pairs. On the other hand vectors provide constant space for their collection which can be addressed in constant time by number where for a list an algorithm would need to traverse n elements to get the n-th element or the one after it depending on how you count.
Usually lists are more flexible and the Algorithm Notation Scheme implements expressions as lists. For advanced programming tasks or the creation of data graphs it is advised to first follow the theory but not to forget about the object oriented extensions available which simplify the description of data graph nodes and especially the modification of these descriptions which can become quite cryptic when realized with pairs and list processing alone.
BTW A pair isn't just a vector of size two as it doesn't hold any size argument. Every vector internally is a pair holding a size argument and the data segment.
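A small sketch contrasting the two collection types, assuming the usual car/cdr pair accessors together with the vector routines documented below:
(define l (list 10 20 30))   ; a one-way sequence built from pairs
(car (cdr l)) => 20          ; reached by traversal
(define v (vector 10 20 30)) ; a vector with three slots
(vector-ref v 1) => 20       ; addressed in constant time by index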
1.7.1 Circular Data Graphs
Circular Data Graphs: a vector is a generic collection and can hold any datum (specified by the ANS) and even itself. Circular data structures can in general not be serialized by using write and may lead to unexpected results if the circularity isn't handled correctly. [editor's note: this isn't too specific to vectors]
- Serialization
(define v (make-vector 2 0))
v => #(0 0)
(vector-set! v 0 v)
v => #(#0# 0)
#(#0# 0) => throw catched: read-error: scm_lreadr: Unknown # object: (#\0)
One way to realize serialization of arbitrary data graphs relies on unique tokens or identifiers. Often the virtual memory address can be sufficient [currently unsupported] but their "scope of uniqueness" is limited to one session / parallel [Have you ever heard of memory location recycling? How long is that address valid?]. The "spatial" scope may not necessarily be confined to one message but is hard to specify in general. UUIDs, IPv6 and other "inhuman" oddities resulted from approaches to the generalization of the identification by number approach. (Inhuman because these identifiers can in general not be considered human readable and thus impose practical problems and limitations on their use.)
- Memory References
Circularity is in general not a problem but often useful for a lot of applications relying on such data graphs. But see also the documentation of the memory management for further information.
1.8 Multiple Return Values
See also the documentation of values elsewhere. Multiple return values as introduced with values are considered plain lists. They are guaranteed to be lists in the ANS.
1.8.1 receive is part of VSI core
(receive values variables . expressions)
It's the worst example of them all but here it is:
(receive (values 1 2 3) (a b c) (values a c b)) => (1 3 2)
1.9 Repeat Algorithms
1.9.1 Named Let
The most common and most versatile looping construct is called named let:
(let name (variable-initializations) body)
where variable-initializations is a sequence of binding forms: (variable-name initialization-expression)
Each initialization-expression is evaluated (in order ???) but may depend neither on other variables of the variable-initializations nor on the side effects of other initialization expressions. (Use a let* form within or around the named let to implement sequential initialization).
The return value of the named let expression is the return value of the last expression evaluated within the named let form. Common practice is to not use the return value directly even if it is well specified.
(let loop ((a 1))
  (when (<= a 5)
    (display a)
    (newline)
    (loop (+ a 1))))
Restrictions: The object denoted by name (if any) cannot be considered a first class object but rather a syntactical keyword or literal within its scope.
A named let form may or may not be implemented as a regular closure. The value denoted by loop should not be used outside of its scope as it is considered a looping construct and not a general function call even if the similarities may be complete.
That means you shouldn't use the name where a closure is expected as in (filter name list). The behavior for these cases is unspecified and may change without notice in future revisions.
Rationale: the current implementation of named let creates inherent rings which cannot be released automatically by the memory management subsystem. For this an autocutting feature has been installed which voids the value denoted by name after its scope. Any use outside of the scope would probably evaluate to the final return value of the named let expression.
1.9.2 (repeat n closure)
Repeat closure n times.
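For example (a sketch, assuming display as used elsewhere in this reference):
(repeat 3 (lambda () (display "hop "))) ; prints: hop hop hop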
2 Documentation
This reference documentation is still work in progress. Please consider using additional resources describing Lisp or Scheme.
2.1 Introduction (files & repl)
If there is no file specified to be loaded the REPL will be started. The REPL is a traditional application allowing basic interaction with the interpreter.
The following parameters can be used to change its behavior for the current session:
- repl-echoing: #t / #f Your input will be echoed or not before printing the answer.
- repl-coloring: #t / #f The REPL prompt appears colored or not on terminals.
Usually it's a good idea to save your programs in files and to make the interpreter process them directly:
- example: =/opt/VSI/interpreter "My Program.scm"=
Please note: there is no further prompt as your program
completely substitutes the REPL which is considered a
default fallback application if there is nothing else to
do. You can start the REPL from within your program with
the following expression:
(begin (sys-load "VSI-core/repl.scm") (REPL))
2.1.1 Interpreter Invocation
/opt/VSI/interpreter
This should start the REPL as nothing else is specified to be done.
/opt/VSI/interpreter --debug
Enables debugging aids regarding the interpreter itself.
/opt/VSI/interpreter file-name
Load and evaluate the expressions contained in "file-name".
/opt/VSI/interpreter -- --my-strange-filename
Stop argument processing with two minus signs.
2.2 Scheme Files
- Magic Shell Header: Make a POSIX shell script out of a file containing Scheme expressions (see the sketch after this list). Don't forget to set the executable bit in the file permissions.
#!/opt/VSI/interpreter -- !#
- file name suffix: scm
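A complete hedged sketch of such a script; the file name and its contents are made up for illustration. After setting the executable bit it can be started directly from a POSIX shell:
#!/opt/VSI/interpreter -- !#
;; hello.scm - print a greeting and exit
(display "Hello from the Algorithm Notation Scheme")
(newline)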
2.2.1 (command-line), (program-arguments)
Both procedures return the same result.
2.3 Errors
Some error conditions can be caught and are signalled with a throw with some arbitrary symbol as the key. These throws can be handled with catch. The more traditional way to signal error conditions with special return values (like falsity and the empty list) is still there.
There is no way to distinguish a throw that is part of a special control flow from a throw caused by an error condition other than by checking for the following symbols. There is no concept of exception, nested exception or error object even if some parts consider throws to be equal to exceptions.
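A hedged sketch of handling such a throw. The exact calling convention of catch isn't specified here; the sketch assumes the common (catch key thunk handler) arrangement where the handler receives the key and the remaining throw arguments:
(catch 'misc-error
  (lambda () (throw 'misc-error "something went wrong"))
  (lambda (key . arguments)
    (display "handled: ")
    (display key)
    (newline)
    #f))
=> #f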
2.3.1 Builtin Error Keys
- read-error
- undefined
C name: scm_unbound_variable_key (interpreter-eval)
- syntax-error
C name: syntax_error_key (interpreter-eval)
- system-error
C name: scm_system_error_key (debug-error)
- numerical-overflow
C name: scm_num_overflow_key (debug-error)
- out-of-range
C name: scm_out_of_range_key (debug-error)
- wrong-number-of-args
C name: scm_args_number_key (debug-error)
- wrong-type-arg
C name: scm_arg_type_key (debug-error)
- memory-allocation-error
C name: scm_memory_alloc_key (debug-error)
- out-of-memory (to be replaced with memory-allocation-error)
C name: scm_out_of_memory_key (debug-error)
- misc-error
C name: scm_misc_error_key (debug-error)
- quit
C name: ??? is there a binding for this in C?
A throw with the key quit doesn't signal an error condition but a regular program exit.
- Networking Database Error Keys
Please note: there are some more error keys in the network database routines, i.e. name resolution services. Where the other (POSIX) routines use the key 'system-error' and the error message corresponding to errno, the network database routines map h_errno to special keys except for the value NETDB_INTERNAL.
2.3.2 Error Condition Resolution
- Undetected error conditions will lead to unspecified behavior.
- Detected error condition resolutions
- Leave the routine and return some special value signalling an error condition (falsity, NaN)
- Call an error handler with the information describing the
error.
- Throw some key and the error description
- (Currently unsupported: Call in a debugging routine)
- Abort the session with some message on stdio
2.4 Primitive Data Types
2.4.1 Pairs
2.4.2 Booleans
Everything that isn't falsity is true: the concept of falsity is denoted by #f and true is #t or any other value.
Even the special value denoting unspecified is considered to not be false, while often it might as well be some unspecified arbitrary return value. But as a matter of style and to allow for future changes one shouldn't rely on a return value being not false if it is defined to be unspecified, as it is for set!, define and similar procedures.
2.4.3 Numbers
Outer forms and number types:
- Exact Integer: 2 +3 -1
- Exact Fraction: 2/3 -7/6 +2/5
- Inexact Integer: 2.0 +3.0 -1.0
- Inexact Real: 0.667 -1.167 +0.4 exponential notation: 5.0e7 5.0e-7
- Inexact Complex 0.667+0.1i -1.167+5.0e-7i
Please refer to scm_read_number in interpreter-read.c for the exact definition of the notation of numbers.
Base selection
- #xFF hexadecimal
- #b01 binary
- #o70 octal
- #d10.1 decimal
- #I inexact
- #E exact
Note: seems like currently only decimal notations allow for (floating) points and the exponential notation (resulting in an inexact number).
Please refer to scm_read_number_and_radix in interpreter-read.c for the exact definition of the notation of numbers introduced by a hash.
Special numerical values:
- +inf.0
- -inf.0
- +nan.0
Anything else counts as a symbol: /+1a => throw catched: undefined: "Unbound variable: +1a"/
TODO document error reporting and exactness
"scmmin and scmmax return an inexact when either argument is inexact, as required by r5rs. On that basis, for exact/inexact combinations the exact is converted to inexact to compare and possibly return. This is unlike scmlessp above which takes some trouble to preserve all bits in its test, such trouble is not required for min and max."
Implicit coercion to inexact numbers (implicit exact->inexact) is part of most mathematical operations. This happens as soon as one of their arguments is inexact. An implicit conversion inexact->exact shouldn't happen: a function applied to at least one inexact argument should return an inexact value.
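A few examples of the implicit exact->inexact coercion described above:
(+ 1/2 1/2) => 1     ; all arguments exact, the result stays exact
(+ 1/2 0.5) => 1.0   ; one inexact argument makes the result inexact
(* 2 3.0) => 6.0
(/ 1 3) => 1/3       ; exact division yields an exact fraction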
- Selectors & Constructors
- Predicates
- any: takes any memory object as an argument
- num: only accepts memory objects for which number? returns true
- rea: takes any number except for complex numbers where complex number here denotes numbers with an imaginary part different from zero. [ REVIEW real? rational? complex? and perhaps /integer?/] [maybe these should just return the direction of the real part for complex numbers. Maybe someone just forgot that.]
- int: argument prerequisite: (and (number? x) (integer? x)) inexact integers are accepted
- Implementation Details
There are currently five data types to represent numbers:
- Immediate exact integers
- Bignums (exact integers which are bigger than any immediate integer)
- Fractions of exact integers (both immediate and bignum). As soon as fractions can be represented by an integer they will be represented by an integer.
- Reals: inexact numbers
- Complex: complex numbers with a non zero imaginary part. (As soon as the imaginary part becomes zero it is coerced into a Real.)
Please note: there are currently no predicates to check for the actual data type (tag) of a number but combinations of the existing predicates might provide better abstraction in the long run.
- Immediate: (and (number? o) (immediate? o))
- Bignum: (and (number? o) (not (immediate? o)) (exact? o))
- Fraction: (and (number? o) (exact? o) (not (integer? o)))
- Real: (and (number? o) (inexact? o) (zero? (imag-part o)))
- Complex: (and (number? o) (inexact? o) (not (zero? (imag-part o))))
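Following the note above, the combinations could be wrapped into named predicates. A hedged sketch (the names bignum? and fraction? are made up and not part of the system environment):
(define bignum?
  (lambda (o) (and (number? o) (not (immediate? o)) (exact? o))))
(define fraction?
  (lambda (o) (and (number? o) (exact? o) (not (integer? o)))))
(fraction? 2/3) => #t
(bignum? (* 1000000000000 1000000000000)) => #t ; presumably, once the result leaves the immediate range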
- Implementation Predicates
- Abstract Mathematical Set Predicates
(See also Documentation/Aftermath/maths.scm)
These predicates don't distinguish between exact and inexact numbers but try to judge the mathematical properties of their arguments independently of the implementation where possible.
- N: natural numbers with or without 0 and with or without the negative mappings.
- Q: N and rational numbers
- R: Q and irrational and transcendental numbers
- C: R and the imaginary unit i where i*i = -1
As this is based on set theory every number should satisfy complex? which might be misleading if one thinks of the data type implementations while here it should convey the mathematical idea.
- num
(complex? x)
Can x be considered an element of C?
- num
(real? x)
Can x be considered an element of R?
As real numbers are represented as inexact numbers they do qualify as rational numbers in the current implementation. Therefore real? equals rational? but this isn't reflected well:
(equal? real? rational?) => #f
FIXME - num
(rational? x)
Can x be considered an element of Q?
- num
(integer? x)
Can x be considered an element of N (including 0 and its mapping into the negative range)?
- num
(zero? x)
Be aware of the following oddity:
(zero? 0.0+1.0i) => #f
- int
(odd? n)
- int
(even? n)
- Comparison
- TODO Equality eq?, eqv?, equal? is a mess
- eq? should reflect memory object addresses; comparison
of immediates where a cell pointer counts as the immediate value
- eqv? should be the same as = but isn't? [where = is an rpsub]
- equal? should be eqv? but is another …
- (= . arguments)
- (< . arguments)
- (> . arguments)
- (<= . arguments)
- (>= . arguments)
- Domain Boundaries
- Set Selection
(max . arguments)
(min . arguments)
- Sum
(+ . arguments)
- Multiplication
(* . arguments)
- Subtraction
(- . arguments)
- Division
(/ . arguments)
- Square Root
(sqrt x)
- Exponentiation
(expt base exponent)
- Logarithms
(log x), (log10 x)
- Rationalize
(rationalize x epsilon)
- Modulo
(modulo x u)
- Quotient, Remainder, Truncation
- Greatest Common Divisor
(gcd a b)
- Least Common Multiple
(lcm a b)
- Trigonometric
(sin ro)
(cos ro)
(tan ro)
(asin x)
(acos x)
(atan x)
- Hyperbolic Variants
- TODO Constants
most-positive-fixnum, most-negative-fixnum
- machine registers / numbers
- arbitrary sized … called big numbers or bignums
- pi, e
- (inexact-pi), pi
- (inexact-e)
- (inexact-log10E): = (= (inexact-log10E) (/ 1 (log 10))) => #t = [XXX in practice it returns false: but we see the same sequence of digits for both expressions - this makes you think about rounding errors and else]
- infinity, zero (as in analysis with limes)
- Infinity: +inf.0, Minus Infinity: -inf.0
- Zero and zero plus zero: 0.0, zero minus zero: -0.0
Examples:
- (/ 1 0.0) => +inf.0
- (/ 1 +inf.0) => 0.0
- (/ -1 +inf.0) => -0.0
- (> +inf.0 n) => #t [when n is not +inf.0]
(/ +inf.0 +inf.0) => +nan.0 (should be or equal undefined???)
Note: -nan.0 becomes +nan.0 when reading in constants from source files.
- expt, integer-expt
- Zero by zero
There is some clash of rules for the special case n = 0: the system routines for floating point calculation usually follow the rule n^0 = 1 where the integer routines know that multiplying nothing no times as in 0^0 won't result in anything but nothing. If this behavior should ever become a problem we will declare this special case to be undefined (where now it isn't well specified) and just throw an error for 0^0.
But if you have special prerequisites for your own calculations expt can easily be redefined (see VSI-core/number.scm).
(expt 0 0) => 0
(expt 0 0.0) => 1.0
(expt 0.0 0) => 0.0
(expt 0.0 0.0) => 1.0
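A hedged sketch of such a redefinition, assuming throw accepts a key plus additional description arguments as suggested in the errors section:
(define %original-expt expt)
(set! expt
  (lambda (base exponent)
    (if (and (zero? base) (zero? exponent))
        (throw 'out-of-range "expt: 0^0 is not defined here")
        (%original-expt base exponent))))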
2.4.4 Vector
A vector has a fixed length. Each slot of a vector can be addressed by its index from 0 to length - 1.
Read and write syntax: #(1 a "characters in this example" 2) That is a hash followed by an opening parenthesis and literal constants describing the elements of the vector starting from the slot with the index 0 upwards until a closing parenthesis closes the vector. The length of the vector corresponds to the number of elements described.
- Equality
Two vectors are considered eq? when they denote the same memory object. Two vectors are considered equal? if they have the same length and all their elements are pair wise equal?.
- (vector . elements)
- (vector? object)
- (vector-length object)
- (list->vector list)
- (vector->list vector)
- (vector-ref vector index)
- (vector-set! vector index value)
- (make-vector length . value)
Returns a new vector of the specified length with all slots initialized to value if specified.
- (vector-fill! vector value)
- (vector-copy vector)
- Vector Processing
- (vector-project! v i elts)
Projects the elements of the vector elts at the index i onto v.
- (vector-append . vectors)
- (vector-select v start length)
Returns the length elements from v starting at the index start as a vector of length length (see the examples after this list).
- (vector-for-each closure . vectors)
- (vector-map closure . vectors)
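A few hedged examples for the vector processing routines above (vector-map is presumed to work elementwise across its argument vectors):
(vector-append #(1 2) #(3 4 5)) => #(1 2 3 4 5)
(vector-select #(1 2 3 4 5) 1 3) => #(2 3 4)
(vector-map + #(1 2 3) #(10 20 30)) => #(11 22 33)
(vector-for-each display #(1 2 3)) ; prints: 123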
2.4.5 Brass
- (equal? br1 br2)
Returns true if both brass objects are of the same length and consist of the same sequence of characters.
- (brass? object)
- (brass-length br)
- (brass . characters)
- (make-brass n character)
Returns a brass of the length n filled with character.
- (brass-copy br)
- (brass-subrass br i exclusive-end)
Returns a new brass consisting of the characters starting from the zero based index i up to but not including exclusive-end (see the examples after this list).
Thus the following expression returns a full copy of a brass object: (brass-subrass br 0 (brass-length br))
- (brass-append . br)
- (brass-ref br i)
Returns the character at the zero based index i from the brass.
- (brass-set! br i character)
- (list->brass list)
Creates a brass from the characters in list.
- (brass->list brass)
Creates a list of characters from the brass.
- (brass->byte-string brass)
TODO Say sth about the encoding here, please.
- (byte-string->brass byte-string)
TODO Write something about the encoding here, please.
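A few hedged examples for the brass routines above, using the usual #\ character notation:
(define br (brass #\h #\i))
(brass-length br) => 2
(brass-ref br 1) => #\i
(brass->list (brass-append br br)) => (#\h #\i #\h #\i)
(brass-length (brass-subrass br 0 (brass-length br))) => 2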
2.4.6 Byte Strings
See Data Flow: Byte Communications for input and output facilities which default to byte strings.
While byte strings are part of the ANS they are out of its scope. [XXX delete this paragraph] A byte has eight bits. A byte string without length or with a length of zero is to be considered a protocol violation.
Read and write syntax: #"12 34 0 0 0 0 0 1" that is a hash followed by an opening quotation mark [C0 quotation mark]. From here on numbers only describing a byte each with a value from 0 to 255 seperated with white space. TODO de-extend reader to support byte strings correctly. Then a closing quotation mark. [Possible extensions: support for hexidecimal, binary and C0 components].
Byte strings start at the low memory address and end at a higher memory address. This doesn't affect and isn't affected by the byte order or endianness in effect. That is they describe a range of the computer memory (probably virtualized) starting at the byte-string-address and ending right before the sum of the start address plus the byte-string-length of the byte string. Thus the maximum offset is equal to the length minus one as an empty byte string is considered a protocol violation like an empty vector of length zero. [XXX Currently only the reader fails for empty byte strings but it should accept them as it accepts empty strings and vectors. Even malloc plays that game …] "The exact contents of the byte strings created may and should vary with machine variations. Byte strings should never be a requirement for implementations of algorithms if not for special needs mostly related to interaction with current operating system environments, data type specifications in networking and IPC protocols and other third party standards and conventions."
Routines on byte strings are non destructive: those which modify their arguments directly have a bang or exclamation mark at the end of their name. Except for byte-string-set! all routines have a specified return value. Some routines are only defined for byte strings of certain lengths like one, two, four and eight.
- Byte String Handling
byte-string?
byte-string-length
Returns the length of the byte string in bytes.
byte-string-address
Returns the data segment address as a byte string.
(make-byte-string length)
Returns a new byte string of the specified length with unspecified contents.
byte-string-clear!
Clear the byte string by filling it with zeroes and return the argument.
byte-string-adjoin
Join all arguments into one new byte string with a length equal to the sum of the lengths of all arguments:
=(byte-string-adjoin #"0 1" #"1 0" #"0 0" #"1 1") => #"0 1 1 0 0 0 1 1"=
byte-string-copy
Return a copy of the argument.
- Equality
- equal?
Two byte strings are equal if they have the same size and all bytes are equal pair wise for each offset.
- eqv?
Currently undefined (or equal to equal?) but future revisions will probably introduce the concept of a "byte-mapping" which is like a byte string describing some arbitrary memory region which will be considered eqv? to another byte string or byte mapping if they denote the same memory region, i.e. the same address and the same length.
- eq?
The same memory object.
- Conversion Routines
- Numbers
byte-string->number, byte-string->number+sign
Convert a byte string to a number or to a signed number. Defined for byte strings with a length of 1, 2, 4 and 8 bytes.
(number->byte-string number length)
Convert a number to a byte string of a fixed length, i.e. 1, 2, 4 or 8 bytes.
number->byte-string-8, number->byte-string-4, number->byte-string-2, byte->byte-string
(number+sign->byte-string number length)
Selects one of the following routines:
number+sign->byte-string-8, number+sign->byte-string-4, number+sign->byte-string-2, byte+sign->byte-string
The same number conversion for bit fields to be interpreted as signed numerals.
Other number representations in use: [YET TO COME]
byte-string->real, byte-string->complex
byte-string->float
Please note: currently VSI-C only uses the double precision floating point format and a single precision floating point number representation will be cast to double precision.
real->byte-string, complex->byte-string, float->byte-string
The resulting lengths of the byte strings can only be predicted by using size-of-real etc.
Then there are other computing registers out there with different sizes (e.g. 32 byte) and switchable modes of operation. Often an operation can be instructed once to be performed on all those registers. Due to my lack of experience with the use of these registers I don't suggest any naming conventions. But a Scheme vector is a collection and not numerical.
- Booleans
If the numerical value of a byte string equals zero then that byte string is considered to represent falsity. Otherwise it is considered to be true.
byte-string->boolean
proposed:
(bit-and? bits mask)
- Lists
A single byte is represented by a number in the range from zero to 255. Yet to come: this corresponds to a character in Unicode C0 (restricted to the range of code points from 0 to 127) and such characters can be composed to a byte string without further encoding leading to an UTF-8 string.
byte-string->list
list->byte-string
- Machine Dependent Data Type Sizes
These procedures return just that and take no arguments.
size-of-pointer, size-of-short, size-of-int, size-of-long, size-of-long-long, size-of-double, size-of-float
- Selecting & Setting Bytes in Byte Strings
(byte-string-select byte-string offset length)
Returns a new byte string (copy not mapping) of the area from offset to offset plus length if these are within the byte string argument or throws an out of range error if the parameters don't fit.
(byte-string-set! byte-string offset byte-string) => unspecified
Sets the last argument at the specified offset in the first argument. If parameters don't fit an out of range error is thrown.
- Bit Operations
The following operations are defined for byte strings with a length of 1, 2, 4 or 8 bytes except for bit-not which takes a byte string of an arbitrary length.
- Bit Shifts
bit-shift-left, bit-shift-left!, bit-shift-right, bit-shift-right!
Usage:
(bit-shift... byte-string offset)
Shift bits left or right by offset bits and return the result. Right means lower the value of the byte string (if interpreted as an unsigned number) - left means raise the value no matter on what kind of machine the program is evaluated.
- Bitwise Logical
bit-and, bit-or, bit-xor
All arguments must have the same length. The returned byte string is guaranteed to be a new memory object even for identity operations.
- and sets the bit in the result only when both operand bits (in the same position) are set
- or sets the bit if one of the operand bits in the same position is set
- xor sets the bit if one of the operand bits is set in that position but not if both are set
bit-not, bit-not!
bit-not! takes one byte string of an arbitrary length, switches every bit and returns the modified argument. bit-not is just the same but doesn't modify the argument (and returns a new byte string).
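A few examples for the bitwise operations of this section, written with single byte operands in the byte string notation from above:
(bit-and #"12" #"10") => #"8"
(bit-or #"12" #"10") => #"14"
(bit-xor #"12" #"10") => #"6"
(bit-not #"0") => #"255"
(bit-shift-left #"1" 3) => #"8"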
- Byte Order and Bit Order
Big endian and little endian machines have reversed bit orders. For this one has to take care of byte and bit ordering depending on the application: if each byte has been submitted individually (e.g. over network) it's "enough" to reverse the sequence of bytes to restore the original value. But when the data has been transferred on a storage device, i.e. as a file, one might even need to reverse the whole bit order of one datum composed of bytes. (See also Machine Dependent Data Type Sizes)
Byte Order conversion between Big Endian and Little Endian
- one byte:
:... .... -> :... .... (identity function)
as a byte string on a big endian machine: =#"1" -> #"1"=
- two bytes:
:... .... .... .... -> .... .... :... .... (byte swap)
as a byte string on a big endian machine: =#"1 0" -> #"0 1"=
Bit Order conversion between Big Endian and Little Endian
- one byte:
:... .... -> .... ...: (bit reverse)
as a byte string on a big endian machine: =#"1" -> #"128"=
- two bytes:
:... .... .... .... -> .... .... .... ...: (bit reverse)
as a byte string on a big endian machine: =#"1 0" -> #"0 128"=
Detecting the endianness of the current system: you'll need to test for a specific number and a presumption of one of its representations as a byte string. The result of such a test during the initialization of the interpreter is available as a constant:
system-is-big-endian
is true if the most significant bit comes first on the hosting system.
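A hedged sketch of such a test; it presumes that number->byte-string produces the host byte order, which is the reading suggested by the conversion routines above:
(define detected-big-endian?
  (lambda ()
    ;; the two byte representation of 1 starts with the most significant
    ;; byte (0) on a big endian host: #"0 1" - and reads #"1 0" otherwise
    (equal? (number->byte-string 1 2) #"0 1")))
(detected-big-endian?) => #t [on a big endian host, #f otherwise]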
bit-reverse, bit-reverse!
Reverse the bit order of the given byte string and return the result. (bit-reverse #"1 0") => #"0 128"
The bit reverse routines are currently only defined for byte strings with a length of 1, 2, 4 or 8 bytes.
byte-reverse, byte-reverse!
Reverse the byte order of the given byte string and return the result. (byte-reverse! #"0 8 149 91") => #"91 149 8 0"
The byte reverse routines are currently only defined for byte strings with a length of 1, 2, 4 or 8 bytes.
- Network Byte Order
The network byte order is defined to have the most significant byte first which equals a byte wise communication of a big endian machine.
(network-byte-order byte-string), (network-byte-order! byte-string)
Convert the byte string from host to network byte order and return the result.
(from-network-byte-order byte-string), (from-network-byte-order! byte-string)
Convert the byte string from network byte order to host byte order.
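A small hedged example: composing number->byte-string with network-byte-order should yield the most significant byte first, independently of the host:
(network-byte-order (number->byte-string 2054 4)) => #"0 0 8 6"
;; 2054 = 8 * 256 + 6, so the network (big endian) form is 0 0 8 6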
2.5 Environments, System Environment and Name Spaces
2.5.1 System Environment
define, set!, %define-global, defined?
Implicit lookup for most symbols in read, eval and other procedures.
2.5.2 Name Spaces PRELIMINARY [commented; REVIEW, please]
(namespace? object)
Name Spaces relate to environments like elements do to a list. That is: an environment can be considered a sequential list of name spaces. In regular use the textual environment should be reflected by use of namespaces. The common exception to this is namespace-load which breaks the current textual coherency by inclusion of distinct textual units pretty much like an encyclopedia consists of several books (from A-C and C-M etc) referencing each other. [That describes load.]
A name space is a set or collection of bindings (with appropriate memory locations).
??? Should we go the whole way? Recognize namespaces as a fundamental primitive concept and explain or even define scope, object orientation and module system (i.e. source management) by name spaces. Should we allow to decompose an environment into a list of name spaces (i.e. name space descriptions)? We have that with the implicit begin and the implicit apply - maybe inner defines are better off when redefined to rely on implicit namespaces?
A name space is created by evaluation of expressions. [You couldn't even call that implicit lookup and explicit accessors like set! accessors on the level of car or vector-ref. Maybe that thing is really that fundamental? Isn't that on the fly extendibility another important property? Where any define will introduce a new binding if necessary. But this is restricted to the creation phase, i.e. evaluation. Best practice for other interpretations which can't do it on the fly is to provide the maximum number of locations (<- conditional definitions and fastlocs)]
[The Hyperspec definition of namespace really lacks this: the real thing.]
- Creation
- Conversion & Advanced Manipulation
While namespace objects themselves are considered immutable a lot of possible usage cases rely on the possibility to change symbols, filter bindings and else. All of this can be achieved by converting the name space to an association list and creating a new name space from an association list.
(namespace->alist namespace)
Convert the specified namespace to an alist with the symbols as the keys and the values.
(alist->namespace alist)
Create a new namespace object from the given association list. (The list structure itself is currently incorporated into the namespace object but one shouldn't rely on this fact or the other.)
- Selective Use of Bindings in a Name Space
(from-namespace namespace symbol)
Lookup the specified symbol in the namespace and return its value or return undefined if there is no binding for the specified symbol in the given name space. NB undefined is just the regular symbol undefined here.
(call namespace procedure-name . arguments)
[This will be renamed and available as namespace-call but we will introduce a generic which needs to dispatch on type as an object-call should be called call, too. Even set! knows how to set variables and ilocs.] Lookup the procedure bound to procedure-name in the given name space and apply it to the arguments.
(namespace-import namespace)
Import all bindings from the given namespace into the current namespace or environment. This is only done once currently. ???
- Compatibility & Circumvention of Scope
The current namespace can be circumvented with %define-global which is meant to create bindings directly in the system environment.
None of the core routines …
2.6 Formatting Textual Output
2.6.1 (format format-string . arguments)
TODO p means purge, please!!!
VSI> (format "Hello: ~6,3f~%" 1.2345678) "Hello: 1.235 "
- Any: A S
- Numerical: Exact Integers D X O B R
- Numerical: Inexact Numbers F E G $
- Numerical: Complex I
- Character: C
- Plural: P
(format "Word~P" 2) => "Words" (format "Part~@P" 2) => "Parties" (format "Part~@P" 1) => "Party" (format "Part~P" 2) => "Parts"
- Tilde: ~
(format "~~") => "~" (format "~10~") => "~~~~~~~~~~"
- Newline: % &
- %: like tilde
- &: like % but minus one if the current column is zero
- White Space Control: _ / |
with a numerical argument like ~
- _ space
- / tabulator
- | page break aka form feed
Ten tabulation controls in a row: (format "~10/") => "\x09\x09\x09\x09\x09\x09\x09\x09\x09\x09"
- Tabulate: T
- Continuation Lines: #\newline
Escapes the following newline to not be considered:
;; Regular newline
(format "
") => "
"
;; Escaped newline
(format "~
") => ""
- Modifiers and Parameters: @ : ' +-0-9 ,
- Q
Should be rationalize?
2.7 Control
2.7.1 Controlled Controls [preliminary draft]
All of the following will [DRAFT] call / install a handler with system-control.
controlled-read (the next REPL/PREWL version might use this)
controlled-read-bytes, (controlled-write-bytes)
controlled-sleep, controlled-usleep
In single threaded programs program events like timers will only be handled when system-control is called. [That's just a select. Maybe the current mix of conventions that resulted for read-bytes and write-bytes is better: a port argument means a controlled action, a file descriptor means an uncontrolled system call.]
The variable system-control can be redefined with custom
procedures. System dependencies might need to be handled.
The regular control can be restored again with
(set! system-control default-system-control)
2.7.2 Sleep
2.7.3 Timer
sigalarm
2.7.4 POSIX Signals
- System Specifics
(system-signal-channel signal-mask)
Return a new file descriptor listening on the signals specified with signal-mask. Returns -1 on error or throws for malformed arguments. Close the file descriptor to stop listening on these signals. The return value, the signal mask conventions and the protocol on the channel may be system dependent.
2.8 Data Flow
A channel currently denotes a byte level virtual communication device. A channel is currently either a port or a file descriptor.
2.8.1 Channel
2.8.2 Ports
Ports need to be closed as they will remain in the port table as long as they aren't closed.
2.8.3 Byte Communications
(read-byte . port)
Read one byte from port (which can be a numerical file descriptor, too). In contrast to read-bytes this routine tries again after an interrupted read. See Control: POSIX Signal Processing [yet to come]. Returns the numerical value of the byte read, the EOF token if the end of file is encountered or falsity on error. [Subject to change: use CLR3 error objects instead]
(read-bytes n . port)
Try to read n bytes from port or from the current input port if no port is specified. The optional argument port can be a numerical file descriptor or a port. For a numerical file descriptor the behavior corresponds to the underlying call to POSIX read(2). Returns a byte string which may be shorter than n containing the bytes read. If the end of file is signalled during the read the EOF token is returned. For any error it returns falsity.
Please see the corresponding operating system documentation for now.
(write-bytes byte-string . port)
The optional argument port can be a file descriptor or a port. If no port argument is given the routine defaults to the current output port. The return value is the length of the byte string for a non numerical port argument. For a numerical file descriptor the behavior corresponds to the underlying call to POSIX write(2). If port is a numerical file descriptor it returns the number of bytes written or falsity on any error.
(send-bytes channel byte-string)
Uncontrolled send until all bytes have been written. Throws on error or returns boolean true on success.
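A hedged sketch tying the routines above together: echo bytes from the current input port back to the current output port until the end of file. It assumes an eof-object? predicate for the EOF token, which isn't documented in this section:
(define echo-bytes
  (lambda ()
    (let ((chunk (read-bytes 64)))
      (cond ((not chunk) #f)          ; falsity signals an error
            ((eof-object? chunk) #t)  ; assumed EOF predicate
            (else (write-bytes chunk)
                  (echo-bytes))))))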
2.9 List Processing TODO
Lists are a derived data type. Somehow strange: their "type tag" the empty list or end of list is only found in the cdr of the last pair.
2.9.1 Predicates
(list? object)
Returns true if object looks like a list. Else falsity. An object is considered some sort of list if it's either the empty list or a pair with another pair or the empty list in the cdr of the cdr of object.
(proper-list? object)
Does time intensive checks to find out whether the argument object is a proper list or not and returns true if so. A proper list is either the empty list or a sequence of pairs following their cdr/s to eventually end with the empty list (null token) in the cdr of the last pair.
(circular-list? object)
Returns true if object happens to be a circular list. Else falsity.
(dotted-rest-list? object)
Returns true if object is a proper list except that it doesn't end with the empty list but with some other token or value in the cdr of the last pair. If list is a proper list it returns falsity and if the list is circular an error is thrown.
2.9.2 list-contains?
(list-contains? equality? object list)
The predicate list-contains? returns falsity if object is not part of the list according to the predicate equality? which needs to take at least two arguments. If object is part of the list it returns the remaining list with object as its first element.
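For example:
(list-contains? eqv? 3 (list 1 2 3 4)) => (3 4)
(list-contains? eqv? 9 (list 1 2 3 4)) => #f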
2.9.3 Selection
- (list-pair i list)
Return the pair of index i from the list. Returns an empty list if the list ends before the specified index could be found.
- improper lists: error on premature non pair cdr
- circular lists: endless loop
- (list-ref list i)
Return the element at index i from the list.
- (list-head n list) or (take n list)
Return the first n elements of the list or all of the list if its length is smaller than n.
- (list-tail n list) or (drop n list)
- (take! n list)
- (drop-right! n list) (drop-right n list)
Return a/the list w/o the last n elements if applicable. (The right thing would be to not use these procedures at all.)
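A few hedged examples for the selectors above (list-tail respectively drop is presumed to skip the first n elements):
(list-ref (list 10 20 30) 1) => 20
(list-head 2 (list 1 2 3 4)) => (1 2)
(list-tail 2 (list 1 2 3 4)) => (3 4)
(drop-right 1 (list 1 2 3)) => (1 2)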
2.9.4 delete-duplicates, delete-duplicates!
Please note: these variants tend to delete the first duplicate entry. If compatibility with older functions is needed or some pseudo order needs to be maintained sort might be a good start.
2.9.5 Filter and Delete
XXX filter pred list …
(delete! object list . equality?)
(delete-duplicates! list . equality?)
2.9.6 fold
Repeated application of a procedure on arguments and its return value starting with an initial value. You can use this to easily implement mathematical summation formulas and similar with list processing.
(fold closure initial-value list-of-arguments)
(fold my-plus 0 '(1 2 3 4 5))
=> (my-plus (my-plus (my-plus (my-plus (my-plus 0 1) 2) 3) 4) 5)
(define quantifier
  (lambda (list-of-booleans)
    (fold (lambda (a b) (and a b)) #t list-of-booleans)))
The arguments in the list are processed left to right.
2.9.7 new non-sense
interleave
2.9.8 compatibility non-sense
dotted-rest-list
aka cons*
aka list*
(See your favorite Common Lisp manual for list*)
2.10 Data Associations
2.10.1 alists (see elsewhere)
2.10.2 Dictionary Datatype (preliminary)
Use (sys-load "VSI-core/dictionary.scm") to use this data type.
There is an implementation of a dictionary data type allowing to store information called definitions uniquely associated to pieces of (probably) text called strings. This data type allows to effectively implement applications which would be considered dictionaries but it doesn't deliver any pre-fabricated lemmas nor any pre-fabricated way to load or store dictionaries. [Depending on the data contained in the definitions this can be achieved with about 6 lines of code. See the tokenizer-2 example] For linguistic tasks you'll probably need a tokenizer, too.
Thus you shouldn't call it dictionary. Let's call it: DINOSAURS! or Acyclic Directed Graphs.
While this implementation is designed to work with strings, extensions to work with symbols or adaptations to any other system with tokens should be easy. Keep the alphabet small and it'll be fast.
- Generator & Type Predicate
(make-dictionary)
(dictionary? object)
- insert-definition!
(insert-definition! dictionary string definition)
Inserts a new entry for string and definition. Any previous definition associated with string will instantly be gone.
Please consider using a unique recognizable data type syntax for the definitions just like you would for the values in association lists. For example: (insert-definition! D "hallo" (cons 'definition 5))
(See also lookup-definition)
- lookup-definition
(lookup-definition dictionary string)
Returns the definition of the entry of string.
If string cannot be found at all falsity is returned. If the string is contained in the dictionary (perhaps as a substring of an entry) the symbol undefined is returned. XXX subject to change (See also insert-definition!)
- lookup-prefix
(lookup-prefix dictionary prefix-string)
Returns the definitions of all entries which start with prefix as an association list of strings and definitions.
PROSPECTIVE While regular expressions usually turn things around and scan texts for patterns one might see the similarities and the advantages of extracting entries or definitions matching a string with wild cards following some pattern notation system.
- dictionary->alist
(dictionary->alist dictionary)
Returns an association list consisting of the strings and the definitions for each entry in the dictionary in an unspecified order.
- number-of-definitions
(number-of-definitions dictionary)
Returns the number of definitions contained in the dictionary.
- describe-dictionary
(describe-dictionary dictionary)
This returns an association list describing the dictionary's properties: currently number-of-definitions and number-of-nodes. Please refer to your favourite description of acyclic directed graphs to find out about the meaning of nodes which internally are also called entries with or without associated definition.
- scan-dictionary (preliminary)
(scan-dictionary dictionary predicate?)
This was meant to realize reverse lookups but should be suitable for some other tasks, too. See the tokenizer examples. [XXX this suits our needs where the definition contains string of the entry again but for regular use it would be better to pass that string of keys to the predicate. Additionally predicate? could even be used for its side effects as it will be called for each entry with a definition.]
- (Entries)
These routines are not considered part of the public interface as they are considered internal routines implementing most of the functionality. Nonetheless it's sometimes just useful to retrieve the node of the entry instead of the associated definition. See the source code and especially create-entry! as well as get-entry.
- Implementation
The dictionary data type uses an acyclic directed graph of entries to store definitions and strings. The string is used as a sequence of characters or keys. When extracting an entry for a given sequence of keys the implementation works like an old lock with a dial which opens after the correct sequence of numbers has been input: we start at the root of the dictionary which is an empty entry and compare the first key to each successor of the root until we have found the matching one. Then we repeat the same process with the successor as the current branch instead of the root. This way we eventually arrive at the requested node and return it as the entry respectively the associated definition. A lot of nodes or entries won't have an explicit definition associated but instead hold the symbol 'undefined' in the place of the definition. When inserting a definition for a given string we do just the same - but when no matching successor can be found a new one is created for the current key until all keys have been used that way and then we can associate the definition to the last entry created.
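To tie the routines above together, a small hedged usage sketch (the strings and the definition value are made up; the generator is assumed to be spelled make-dictionary):
(sys-load "VSI-core/dictionary.scm")
(define D (make-dictionary))
(insert-definition! D "hallo" (cons 'definition 5))
(lookup-definition D "hallo") => (definition . 5)
(lookup-definition D "hal") => undefined ; contained, but without a definition
(lookup-definition D "xyz") => #f        ; not contained at all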
2.10.3 Hashtables
Use (sys-load "lib/hashtable.scm") to use this data type.
Associate numerical integers with values.
Why is it called hashtable?
[ASCII sketch: instead of one long association list, the hashtable handle holds an index-vector; a hash function maps each key to a slot of the index-vector which holds a much shorter association list.]
Hashtables are especially useful for disliked data and indexed data where the index has too much spread but doesn't need to. We currently don't offer any pads, toppings or taco shells with the results.
- (make-hashtable size-of-index)
- (hashtable? obj)
- (hashtable-length hashtable)
Returns the number of alist elements in the hashtable.
- (hashtable-insert! hashtable key value)
The key needs to be a numerical integer value by default.
- (hashtable-lookup hashtable key)
The lookup returns falsity #f if no entry could be found for key.
- (hashtable->alist hashtable)
- (describe-hashtable hashtable)
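A small usage sketch (the index size of 64 is arbitrary):
(sys-load "lib/hashtable.scm")
(define H (make-hashtable 64))
(hashtable-insert! H 42 'answer)
(hashtable-lookup H 42) => answer
(hashtable-lookup H 7) => #f
(hashtable-length H) => 1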
2.11 Vector Calculus
vector-plus
, vector-minus
vector-dot
, vector-magnitude
vector-scale
vector-cross
symbolic-vector-calculus
(private prefix %vc-)
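These routines aren't described here yet; a hedged sketch assuming the componentwise and dot product semantics the names suggest (argument orders are guesses):
(vector-plus #(1 2 3) #(4 5 6)) => #(5 7 9) ; presumably componentwise
(vector-dot #(1 2 3) #(4 5 6)) => 32        ; presumably 1*4 + 2*5 + 3*6
(vector-scale 2 #(1 2 3)) => #(2 4 6)       ; argument order is a guess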
2.12 Sorting
2.12.1 sort
(sort collection comparison), sort!
Sort the collection according to comparison which takes two arguments and determines their order by returning falsity if the first argument isn't considered "lesser" than the second argument. The regular variant returns a full copy of the original collection which can be a vector or a list and there is a destructive variant: sort!.
(sort (list 5 2 3 1 3) <) => (1 2 3 3 5)
(sort! (vector 5 2 3 1 3) >) => #(5 3 3 2 1)
2.13 Byte Structures
system-type-size
bs-length
2.13.1 Byte Structure Description
bs-description
2.13.2 Accessors
define-bs-accessors
byte-structure-getter
, byte-structure-setter
2.13.3 Functional Interface
byte-structure->list
call-with-bytes
values->byte-structure
2.13.4 Bit Fields as Flag Sets
TODO
2.14 Data Structures
Structures are vectors with named slots. The term struct is often used in an ambiguous way, i.e. as well for the structure layout as for the instances of such a layout. Slot names are also called labels and should be symbols.
2.14.1 (struct . slot-names) => struct-layout
2.14.2 (structure-layout labels)
2.14.3 (allocate-struct struct-layout)
2.14.4 (is-a? object data-type)
Is object an instance of the specified data-type? [This predicate might be extended to support more data types than structure layouts only.]
2.14.5 (slot-ref struct slot-name)
2.14.6 (slot-set! struct slot-name value)
2.14.7 (struct-from-values struct . values)
Use with care: any expression using struct-from-values needs to be changed when the structure layout changes.
2.14.8 (values-from-struct struct)
Return the values from the structure in the order specified by the structure layout.
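A hedged sketch of the interface above. The layout name point-layout and the slot names x and y are made up for illustration, and it is assumed here that structure-layout accepts a list of label symbols:
(define point-layout (structure-layout '(x y)))   ; assumed: labels given as a list
(define p (allocate-struct point-layout))
(slot-set! p 'x 3)
(slot-set! p 'y 4)
(slot-ref p 'x) => 3
(is-a? p point-layout) => #t
(values-from-struct p)   ; the values 3 and 4 in layout order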
2.15 Double Linked Lists (D-Lists)
See also the implementation of D-Lists in VSI-core/structures.scm. D-Lists use a handle to keep state of the head, tail as well as the current element. The current element is called "current" in the structure layout and can be accessed and changed as such. The predefined accessors usually return the value and not the element.
XXX TODO the structure layout is called "dlist" - use "dlist-layout" or something similar
2.15.1 (make-dlist)
2.15.2 (dlist? object)
(is-a? object dlist)
2.15.3 (dlist-start? dlist)
Is the current element the first element in the list?
2.15.4 (dlist-end? dlist)
Is the current element the last element in the list?
2.15.5 (dlist-head dlist)
Returns the first value in the D-List.
2.15.6 (dlist-tail dlist)
Returns the last value in the D-List.
2.15.7 (dlist-next dlist)
2.15.8 (dlist-prev dlist)
2.15.9 (dlist-add! dlist value)
Add an element containing value after the current element.
2.15.10 (dlist-insert! dlist value)
Insert an element containing value before the current element.
2.15.11 (dlist-remove! dlist)
Remove the current element of the D-List. This returns true if an element was removed, or falsity if the list was empty, i.e. the current element denoted dnull.
2.15.12 (values-from-dlist dlist)
Return all values in the D-List as a regular list aka values.
2.15.13 (dlist-from-values values)
Create a D-List from the given values (which need to be passed as a single list argument).
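A hedged sketch of the accessors above, assuming that adding to a fresh D-List keeps insertion order and that the current element follows the last addition:
(define d (make-dlist))
(dlist-add! d 1)
(dlist-add! d 2)
(dlist-add! d 3)
(dlist-head d) => 1
(dlist-tail d) => 3
(values-from-dlist d) => (1 2 3)
(dlist-end? d) => #t    ; assumed: the current element is the last one added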
2.16 Memory Management
Work in Progress
2.16.1 General Properties
There is a new memory management system which opens new possibilities in the long run but also has some immediate limitations:
- Circular data structures need manual intervention to be released: any data graph forming a ring that should be released and reinserted into the reserve of available memory locations needs to be cut with the respective accessors, e.g. set! or set-cdr!, so that it no longer forms a ring.
- TODO CLRs throw / long jump properties, inherent rings, concurrency
- Very long lists may provoke segmentation faults: due to stack and memory limits in effect (see Memory Limits) excessive memory consumption may provoke an ungraceful program exit. Very long lists (with a length of 10 million and above, depending on system limits and installed memory hardware) may provoke excessive stack usage because the current routine to release and reinsert memory locations works recursively and needs additional stack space. If that routine hits an installed system limit on stack space, the process will be terminated by the operating system. One way to avoid this is to segment the data structure at hand and destroy it step by step.
One immediate advantage of the new system is scalability: the more overall memory consumption rises by use of a lot of data structures the better the new system competes with traditional approaches. Furthermore it doesn't interfere with the "continuity" of program evaluation (or process execution).
- Special Considerations
- Rings
Data graphs with rings in them, i.e. cyclic data graphs, cannot always be released automatically. Every ring can be cut by regular means in Scheme, e.g. set!, or the means to do so have to be provided (see the sketch at the end of this list). Some inherent rings of common language constructs are cut automatically. See also Counting Lambda Revision IV.
Doubly linked lists need a special cutting routine voiding the (cyclic) back links to avoid memory leaks before voiding the last binding, which provokes a release of the data graph. This usually only needs to be considered for systems under load. Simple scripts and REPL transcripts probably don't need to care about memory leaks.
- Throw & Catch
Every throw leaks memory. (Yes, that's all of SCM again if you want to change that completely).
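A minimal sketch of the manual intervention mentioned above: the ring is cut with set-cdr! before the last binding to it is dropped, so the cells can be reinserted into the reserve of available memory locations.
(define ring (list 'a 'b 'c))
(set-cdr! (cddr ring) ring)   ; close the ring: the last pair points back to the first
(set-cdr! (cddr ring) '())    ; cut the ring again before releasing it
(set! ring #f)                ; drop the last reference so the cells can be released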
2.16.2 Memory Limits
The total amount of the virtual address space of an interpreter process is currently limited to 1 GB. A future extension will allow changing that limit and other process limits.
2.16.3 Experimental: Memory Backup Segment
There is a new experimental memory backup system: in an out of memory situation a 5 MB backup memory segment is opened once to enable a graceful program exit. Inner or private system structures may remain damaged and one shouldn't do anything other than quit the program as soon as possible. Error notices, automatic saving of data and other final actions should rely as far as possible on data structures prepared before the exception.
There is currently no way to allocate another backup segment and a consecutive memory allocation failure will probably terminate the interpreter process in an ungraceful way, i.e. without any possibility to save data to permanent storage or to inform the user about the situation.
This system doesn't work for excessive stack space usage as provoked by non-tail-recursive function calls.
2.16.4 Memory Object Statistics (Subject to Change)
(global-object-count->alist)
(global-call-count->alist)
(global-object-delta)
(global-call-delta)
(trim-saturation-map)
2.17 Scheme vs POSIX
2.17.1 Terminological Collisions
- Data Flow Port vs TCP/IP port
- Textual Environment vs POSIX environment
2.17.2 Conceptual Differences
- Error "Signalation"
- Data Flow Port vs File Descriptor
2.18 Portable Operating System Interface
This documentation is INCOMPLETE WORK IN PROGRESS: please double check with the operating system manual as well as with the SCM source code.
- A POSIX environment refers to a collection of strings in the form of "variable=value".
- Some system calls described are restricted by standards or other security measurements.
Please note: the file system on POSIX-like operating systems is usually used as the name space of the operating system, which starts with the top directory entry called root. The term "device" usually refers to logical devices: one physical storage device may be partitioned into several logical devices, or several physical devices may be combined into one logical device, but sometimes a one-to-one mapping is still found.
mode: file access bits as a byte string
file: a port or a numerical file descriptor (or a string???)
path: string (will become brass)
2.18.1 System Time
- Run Time [deprecated]
This interface will be replaced with a new interface for getrusage which provides the needed microsecond resolution.
- (get-internal-real-time)
- (get-internal-run-time)
- (times)
Returns a vector with information about the system run time in system clock ticks:
- 0: real-time (delta since initialization of the interpreter) ??? These aren't clock ticks!
- 1: user-time (time spent by the interpreter on processors)
- 2: system-time (time spent by the system on behalf of the interpreter)
- 3: subprocess user-time: the user-time of subprocesses of the interpreter process
- 4: subprocess system-time: the system-time of subprocesses of the interpreter process
Please refer to your system manual / source code for the exact meaning of those values. To compute the corresponding value with the unit of seconds (and its magnitudes) you can use the constant system-clock-ticks-per-second (usually only 100 per second).
There is also another constant internal-time-units-per-seconds which is deprecated.
Please note: times for subprocesses are only reported after the collection of these with waitpid, e.g. by calling cancel-all-parallels in the parallels example or with the retrieval of the (final) return value in the subservice example.
- Calendar & Clock Time
Clock vectors consist of integers. Only the last slot (index 10) contains a string describing the time zone in effect for this vector.
- 0: seconds
- 1: minutes
- 2: hour
- 3: day of month
- 4: month (zero to eleven)
- 5: year (offset to 1900)
- 6: day of week (sunday = 0 to saturday = 6)
- 7: day of year (0 to 365)
- 8: isdst (daylight saving time; see the system manual)
- 9: offset to GMT (in seconds)
- 10: time zone
- (current-time)
Return the number of seconds since the defined epoch (1970).
- (gettimeofday)
Like current-time but returns a pair with additional milliseconds in its cdr.
- (localtime epoch-time . zone)
Returns a clock vector calculated from epoch-time in the time zone described by the string zone if specified or in the default time zone of the hosting system.
- (gmtime epoch-time)
Returns a clock vector in the time zone "GMT".
- (mktime clock-vector . zone)
Calculates the epoch time for the given clock-vector respecting the XXX redundant argument zone as the time zone.
- (strftime format-string epoch-time)
Convert the argument epoch-time (see current-time) to a string according to the given format in format-string.
- (strptime format-string formatted-time-string)
Convert the argument formatted-time-string to an epoch time according to the specified format-string if applicable.
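A hedged sketch of the calendar routines above; the format string follows the usual strftime conventions and the slot indices refer to the clock vector layout listed above:
(define now (current-time))          ; seconds since the epoch (1970)
(vector-ref (localtime now) 3)       ; day of month in the default time zone
(vector-ref (gmtime now) 2)          ; hour in the time zone "GMT"
(strftime "%Y-%m-%d %H:%M:%S" now)   ; formatted time string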
2.18.2 File System
- File Mode and Opening
- (chown file owner group)
Numerical owner and group.
- (chmod file mode)
- (umask . mode)
- (open-fdes path flags mode)
- (open path flags mode)
(sys-load "VSI-core/posix.constants.scm") (define f (open "tst2" (bit-or O_CREAT O_RDWR) (bit-or S_IWUSR S_IRUSR))) (display "test" f) (newline f) (close f) => #t (quit) => mt@nPong:~/Desktop/alfa$ ls -l tst2 -rw------- 1 mt mt 5 nov 2 11:52 tst2 mt@nPong:~/Desktop/alfa$ cat tst2 test
- (stat file)
Returns a vector with fifteen slots containing information about the argument file (see the sketch at the end of this section).
- 0: device number (of the containing storage device)
- 1: inode
- 2: mode bits (type & file access bits)
- 3: number of (hard) links
- 4: user identification number
- 5: group id
- 6: device number if file denotes a device
- 7: size in bytes
- 8: last access time
[Check this please: struct timespec vs scm_from_ulong - that's st_atim vs st_atime which is next to bit rot.]
- 9: last modification time [see above]
- 10: last creation time or last status change [see above]
- 11: blocksize for filesystem I/O (???)
- 12: number of blocks used for allocation (512-byte blocks according to the manual???)
- 13: some mock-up of going symbolic for Scheme here: the file type bits become one of '(regular directory symlink block-special char-special fifo socket unknown)
- 14: mode bits (file access bits)
- Modifying Directory Entries
- Reading Directories / Directory Streams
Directories used to be read out like any other file to retrieve the entries but at some point they started to make a big fuss about the distinction and now here they are: directory streams. (Where the term rewind only makes sense when you know about "braids" or tapes holding data. What was an endless loop then?)
- Navigating File Systems
- Control
- select (XXX this isn't considered part of the FS multiplexing API)
TODO
(select read write exceptions timeout-seconds timeout-microseconds)
XXX some test
VSI> (define p (current-input-port))
(define p (current-input-port))
=> #<unspecified>
VSI> (begin (repl-drain-input p) (select (list p) '() '() 0 0))
(begin (repl-drain-input p) (select (list p) (quote ()) (quote ()) 0 0))
=> (() () ())
VSI> (begin (repl-drain-input p) (select (list p) '() '() 100 0))
(begin (repl-drain-input p) (select (list p) (quote ()) (quote ()) 100 0))
=> j
((#<input: standard input /dev/pts/1>) () ())
VSI> j
=> throw catched: undefined: Unbound variable: (j)
VSI> (begin (repl-drain-input p) (select (list p) '() '() 5 0))
(begin (repl-drain-input p) (select (list p) (quote ()) (quote ()) 5 0))
=> (() () ())
VSI>
- fcntl [Experimental]
See the comments in the source code.
- (fsync file)
Writes all software buffers to the physical device if applicable. The return value is explicitly unspecified.
- Symbolic Links
- (symlink original-path path-of-link)
unspecified or throws XXX CLR3 error objects
- (readlink path)
Return the original file system entry the symbolic link at path points to.
- (lstat path)
Like stat but it doesn't follow symbolic links and instead returns information about the symbolic link if applicable.
- File Copy: (copy-file from-path to-path)
A simple copy routine overwriting whatever to-path was. This simply breaks on error and interrupted writes and doesn't provide any possibility for feedback during the operation.
- Paths
- File System Specials
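A hedged sketch of reading a few stat slots (referenced from the stat entry above); the file name tst2 reuses the open example and the slot indices follow the list given there:
(define info (stat "tst2"))
(vector-ref info 7)    ; size in bytes
(vector-ref info 13)   ; symbolic file type, e.g. regular
(vector-ref info 4)    ; numerical user identification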
2.18.3 Processes
- (system command-string)
- (primitive-exit . status)
There is also primitive-_exit [sic!] to be renamed.
- (primitive-fork)
- (execl filename arguments)
- (execlp filename arguments)
- (execle filename environment arguments)
- (kill process-id signal-id)
Sends the specified signal to the specified process.
- (waitpid pid options)
Returns a pair with a success/error value in its car and a status word in its cdr (see the sketch at the end of this section).
- (getpid)
- (getppid)
Identification number of the parent process.
- (nice number)
- (getpriority which who)
- (setpriority which who priority)
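A hedged sketch combining primitive-fork and waitpid (see the waitpid entry above), assuming the conventional fork protocol where the child sees 0 and the parent sees the child's process id; the options value 0 is assumed to request the default blocking wait:
(define pid (primitive-fork))
(if (= pid 0)
    (begin                           ; child branch
      (display "hello from the child") (newline)
      (primitive-exit 0))
    (let ((result (waitpid pid 0)))  ; parent waits for the child
      (car result)))                 ; success/error value, see above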
2.18.4 Signals
See also VSI-core/posix.sigset.scm.
2.18.5 Users & Groups
2.18.6 Environment & Host
2.18.7 (70s) Job Control & Session Support
2.18.8 Security Extensions from the 70s
2.18.9 Socket
Some note regarding the use of sockets in networks with VSI: The current implementation is only intended to be used in isolated networks (localhost, LAN). Interoperability with actual internet services is completely untested.
- Internet Addresses
The domain mismatch number instead of bits and bytes seems to stem from the original interface before the acceptance of the C domain as such.
- Generator: (inet-makeaddr network-address local-network-address)
- (inet-aton address-string)
(printable) address string to number (see the sketch at the end of this section)
- (inet-ntoa number)
number to (printable) address string
- (inet-netof address)
deprecation?
- (inet-lnaof address)
somewhat deprecated
- (inet-pton family address)
- internet address: printable to number
- scm byte order confusional
- (inet-ntop family address)
- internet (address) number to printable
- scm byte order confusional
- Generator: (socket family style protocol)
- (socketpair family style proto)
- (getsockopt socket level name)
- (setsockopt socket level name value)
- (getsockname socket)
- (getpeername socket)
- Connections
- Data Flow Extensions
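A hedged sketch of the address conversions above (referenced from the inet-aton entry); the integer shown assumes the conventional representation of 127.0.0.1 as a single number:
(inet-aton "127.0.0.1") => 2130706433
(inet-ntoa 2130706433) => "127.0.0.1"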
2.18.10 Networking
2.18.11 Terminal Application Support
control, terminal-full-reset, CSI,
- Input Echo
terminal-SRM-echo-off, terminal-SRM-echo-on
Example: XXX defunct?
(begin (SRM-echo-off (current-output-port)) (display (read)) )
The REPL restores the local echo setting as it is crucial for its operation.
- Cursor Control
position-cursor, save-cursor, restore-cursor, cursor-home, cursor-up, cursor-down, cursor-forward, cursor-backward
- Erase
erase-line, erase-display
- Character Attributes (Colors)
inverse, color, reset-colors
- Modifier Keys (Alt, Meta)
CSI-do-send-meta-escape, CSI-do-send-alt-escape, CSI-dont-send-meta-escape, CSI-dont-send-alt-escape
- Terminal Information
get-dimensions, get-width, get-height
- Beep
beep (n/a)
- Modes: Command Line (cooked) mode, Application (raw) Mode
tty-raw-mode, tty-cooked-mode
- VSI-C low level primitives
(see also Viper's Terminal Application Driver)
%terminal-get-win-size <limited-integer file-descriptor>
Return the (values width height pixel-width pixel-height) for the terminal referenced by the file descriptor if possible. Else return false.
%terminal-raw-mode, %terminal-cooked-mode
both return false on error … XXX TODO
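A hedged sketch of %terminal-get-win-size, assuming call-with-values is available, that file descriptor 0 refers to the controlling terminal, and that the call succeeds (on failure a single false value is returned, which this consumer doesn't handle):
(call-with-values
  (lambda () (%terminal-get-win-size 0))   ; 0: standard input (assumption)
  (lambda (width height pixel-width pixel-height)
    (display width) (display "x") (display height) (newline)))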
2.19 Implementation Specifics
2.19.1 (immediate? object)
3 Transformations and Macros
Macros and other transformations are implemented with syntax transformers. These work "orthogonally" to regular evaluation and are procedures which receive the expression to be transformed (usually decided by the first constituent of the expression) and the current environment of evaluation. The actual transformation is achieved by using list processing routines as the source expression is delivered as a regular list.
There are four types of macros:
- "Syntax"
- Macro
- Memoizing Macro
- Builtin Macro
Whenever an expression with a syntax object as the first constituent is evaluated the associated transformation procedure is called with that expression as a list and the environment.
The return value of the transformation procedure is then treated according to the type of the syntax object.
3.1 Please note
All types receive the original data structure of the source expression and memoization can be done manually with regular list processing routines as well. Most of the time it's a good idea to use fresh data structures even if these should then be memoized into the code sequence.
Transformation procedures can be tested on quoted expressions with
a null environment. (The actual data graph of the environment and
the corresponding routines for lookup and modification aren't well
documented yet.)
(my-transformer '(my-syntax (1 1 1) abc) '() )
Transformation procedures with errors can lead to misleading error messages. The best way to avoid this is to include syntax checks and good error reporting in the transformation procedure.
Not every valid transformation is accepted and sometimes unexpected behavior can result when ignoring what probably had been considered implementation details of the interpreter.
The changes to the actual code sequences can be reviewed with procedure-code. Its counterpart procedure-source [currently] tries to unmemoize the code to return an expression which is similar or equal to the original source expression. Any change by a transformation procedure in Scheme will be visible, though.
3.2 Type 0: Syntax
(procedure->syntax (lambda (expression environment) ...))
The evaluation takes the return value of the procedure as the value of the expression. The transforming procedure is called on every evaluation of the expression.
3.3 Type 1: Macro
(procedure->macro (lambda (expression environment) ...))
Evaluation takes the result as the expression and continues with the evaluation if the result isn't an immediate. If the result is an immediate then that becomes the (current) value of the expression. This type is called on every evaluation of the original expression.
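A minimal sketch of a type 1 macro; the name my-unless is made up for illustration and the transformer builds a new if expression with regular list processing:
(define my-unless
  (procedure->macro
    (lambda (expression environment)
      ;; (my-unless test body ...) -> (if test #f (begin body ...))
      (list 'if (cadr expression)
            #f
            (cons 'begin (cddr expression))))))

(my-unless (> 1 2) (display "smaller") (newline))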
3.4 Type 2: Memoizing Macro & Type 3: Builtin Macro
(procedure->memoizing-macro (lambda (expressions environment) ...))
Warning: there is some shortcoming in the interpreter regarding memoizing macros.
If the result isn't a pair it is "quoted" in a sequence statement with begin. Then the original expression is replaced with the return value by memoizing the result into the actual code sequence: the CAR of the original expression is set to be the CAR of the new expression and the same for the CDR. After that evaluation continues with the new expression. Due to the implicit memoization of the result the transformation procedures of these types are only evaluated once, during the first evaluation of a source expression.
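A hedged sketch of a type 2 macro that memoizes by itself, so a second call of the expander on the same source expression is harmless; the name my-double is made up for illustration:
(define my-double
  (procedure->memoizing-macro
    (lambda (expression environment)
      ;; rewrite (my-double x) in place into (* 2 x); because the original
      ;; cells are modified, a later expansion pass sees the rewritten code
      ;; and no longer dispatches on my-double
      (let ((new (list '* 2 (cadr expression))))
        (set-car! expression (car new))
        (set-cdr! expression (cdr new))
        expression))))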
3.4.1 NATURE Memoizing Macro Functionality Degradation & Resolutions
Sometimes syntax transformers of memoizing macros are called twice. This shouldn't happen and is considered a shortcoming of the current implementation. Severely intelligent people have failed to convey their ideas regarding syntactic transformations for decades because of bad documentation and this inherent flaw.
- Resolution #1
Macro expansion in mexpandbody should only affect macros of type 2 (memoizing macros). Such macros should either be prepared to be called twice or should memoize by themselves (setting the car and the cdr of the original expression which they receive as an argument). They can therefore be substituted with type 1 macros, which shouldn't be called twice, without any further differences.
- Resolution #2
While mexpandbody allows for interleaved (inner) definitions and other expressions in any body or forms it's called on, the standards of back then don't allow for that. There is no intention to change the current implementation of mexpandbody. Inner defines should be avoided or appear as requested in the standards. Only the next (metacircular) version could have introduced some sort of binder to handle inner defines with a primary primitive. See also VSI-core/namespaces.scm. (Original Intention)
The standards of back then told you to first write the inner defines and only after these come the expressions forming the body of the function.
- [ADDENDUM]
There are two constellations which might provoke similar effects, both caused by multiple expansion: one is a macro which uses a part of the original expression more than once in the new expression. Calls to memory object generators and side effects could thus be multiplied. The other constellation is macro expansion during macro expansion but without memoization or with old expressions kept in local state.
3.5 Afterthought 1: Macros by Template
(macro constituents template)
Please see the corresponding definitions in VSI core.
What about define-macro then? This non canonical form is gone but it should be easy to define it as a macro.
3.6 Afterthought 2: Pattern Languages
There is currently no support for any pattern languages in VSI. The old RnRS pattern language isn't supported for the following reasons:
- complexity of implementation
- hidden details (of list processing magic) in the ellipsis
Another approach might opt for a pattern language based on ever more detailed descriptions of the constituents until the list processing level is hit?
3.7 One Proven Transformation Protocol
In general more complex transformations need more thought and good error reporting can't be computed. The usual way to realize some transformation is the following:
- abstract notation of the transformation (see the file lambda.expressions for some examples)
- development of the transformer: it's better to test them as simple list processing routines in a Scheme session where expressions are delivered as quoted lists of symbols and values
- syntax object: choose the right syntax object type as described above and test the resulting syntax forms
4 TODO Classification
What we want to have is a classification which allows for future developments like first level user object classes, SCA data type implementations and backward compatibility with the current type code implementation.
We're opting for a hierarchical classification based on explicit distinctions composed of binary or boolean distinctions. (Hierarchies are artificial: they help reducing complexity by imposing order. It's not that they follow from nature. Whereas woven nets of protection often are grown complexities.)
A data type is defined by storage class and accessors. A data type with an explicit outer form is called a ???, a data type without might already be considered a remote object or non enclosed object (like a port).
4.1 Binary Distinctions
- SCM vs non-SCM (see also SCM2PTR and PTR2SCM)
- Tagged vs Untagged: cells and double cells are an untagged distinction (where that information is known for higher level tags in the car)
- Internal vs External: (private vs public) there are internal data types like byte arrays (non-SCM) or winders (SCM) whereas variable objects (SCM) are available publicly even if they aren't of general interest.
- Immediate vs non-immediate: this equals immediate vs cell or double cell
- SCM only vs SCM+nonSCM: derived vs primary? closures vs strings
- Collection vs non-collection: vector vs string
- Global vs non-Global: symbols and associated variables are unique and aren't part of the cell recycling. Subroutines, too. Closures don't have to be unique even if their data graphs are equal.
- enclosed vs remote: ports have to be closed, windows have to be destroyed where enclosed data graphs vanish when the last reference is cut. Closed ports may remain as stale objects where a pair of a formerly enclosed data graph shouldn't reappear as a free cell. (BTW as long as you don't use it …)
4.2 Unknown Distinctions
- Numbers are considered pseudo-immediates, strings should
be but aren't. What is the difference? There is no set-floating-point-data! for numbers: they are considered immutable objects (on the Scheme level) even if they're implemented as non-immediates. Strings as well as Lists can be treated as such objects, too. But mostly because of original resource restriction (guessing) they're mutable data objects. Byte strings are currently always mutable even if they appear in the source code as literal constants which they should be then to avoid unexpected results.
- Collections and Data Graph Terminators (well known but
too much traffic here in my brain due to the importance for reference counting. Please try again later.)
- Tokenizer examples: distinction between level C2 and C1
C1 is serializable, C2 isn't.
- Constants and Variables: everyone knows the difference
but this distinction doesn't exist in Scheme if not implicitly for literal constants which may even be copy on write.
4.3 Type Code Tag System Tree
(bit1
  (0 (cell-pointer)
     (immediate integers)
     (tc8 (special objects / flags)
          (characters)
          (immediate symbols)
          (ilocs)))
  (1 (closure)
     (tc7 (symbol)
          (vector)
          (string)
          (port (tc16 port constructor))
          (variable)
          (stringbuf)
          (byte-string)
          (number (tc16 (bignum) (real) (complex) (fraction)))
          (smob (free cell) (tc16 smob constructor))
          (subroutines))))
4.3.1 CEE tc7 bit printer
(for-each
  (lambda (p)
    (let ((label (car p))
          (n (cdr p)))
      (byte-bit-printer n)
      (display " ")
      (display label)
      (newline)))
  (lookup-prefix SCM "scm_tc7_"))
4.4 X Window System Classes
Some new data types here to come.
4.5 POSIX classes
There are several namespaces for flags and their values which need to be "includable" into the one C namespace.
4.5.1 Networking Classes
(posix-network-object (address-family (unix) (ipv4) (ipv6)) (net-db-entity (host) (net) (protocol) (service) (socket-address???)))
Then there is also the regular network layer stack. Some call even asks for some layer number.
The network database entities are currently implemented as vectors without any tag or identifying symbol. See also VSI-core/networking.scm. *There is no associated print form (display) if not the one of the storage class vector which isn't considered appropriate.* Serialization should work the same as it does for the underlying storage class vector. Equality remains on the level of the storage class which might be acceptable.
The address families are represented by their POSIX symbol and the associated value.
4.5.2 IO or FS: Input Output vs File System
(file (socket (stream) (datagram)) (fifo) (regular) (character) (block))
Is it useful to classify a datagram socket as a file? (Yes, with byte strings again yes).
4.6 Existing Meta Classes
Ports and Smobs have meta object constructors which register new types with a distinct tag in their respective tc16 ranges. Both feature default interfaces. Description of these type generation machines, please!
4.7 Graph of Direct Accessors
- The plus sign "+" means these accessors compensate for the new offsets into the cell because of the byte word "minus one".
- Perhaps it would be more convenient to drop the "SCM_" before
"TRIM_"???
What was SCM_PACK supposed to mean? It does nothing right now. Compiler warning suppression?
(SCM2PTR (SCM_TRIM..CELL_OBJECT TRIM..CELL_WORD.. [+ SCM_GC..CELL_OBJECT] SCM_TRIM_CELL_TYPE SCM_GC_CELL_TYPE [+ SCM_GC.._CELL_WORD..] <-unused? ([+ SCM..CELL_WORD..] {SCM..CELL_WORD_0 SCM..CELL_TYPE})) [+ SCM..CELL_OBJECT..] ([+ SCM_CELL_OBJECT_LOC] SCM_CARLOC SCM_CDRLOC) SCM..FREE_CELL_CDR ) *token-of-completeness*)
4.8 Data Flow: Ports, Port Table, File Descriptors
[What class is this then?]
A port as a memory object shouldn't exist. What can exist are port table entries. This will facilitate unification of file descriptors and ports as a number alone is not enough (or maybe yes) to denote that input output virtual device called "port". There is no need anymore to keep a revealed count as entries in the port table are considered revealed automatically. There is no need to close "lost" ports by the means of the memory manager as all input output devices (virtual, void and even the enclosed soft and string ports) are available as entries in the port table.
The port table may even be thread local or not, depending on future needs. The currently unserializable eof-object should either be considered an arbitrary token like the empty list or it shouldn't exist at all. But especially enclosed port types might even need the token.
If there currently is a one to one mapping of port handles and port table entries one could reinterpret some places to unify ports and file descriptors. (One to many, that is: one file descriptor, several port handles.) All generators of file descriptors would only need to be changed to create a simple file descriptor port entry in the port table.
4.8.1 Buffering, C stdio, IO layers
While it's somehow nice to see all those C stdio buffering routines duplicated again, we somehow have too many layers here:
- pure system calls
- (implicit) file descriptor table & state
- C buffered stdio
- SCM unbuffered io
- SCM ports, SCM file descriptors
- "Printing" subsystem
- SCM POSIX routines
- deprecated dataflow-ioext
- and read with its own ideas of when to block and how to break
That is layers (or strata) after an earthquake or some … Now that SCM has somehow become the "arc" of all algorithms from math to unusable byte-text stdio … The cost of dropping the explicit port buffering … as a pipe might block in "unexpected" circumstances on write.
4.8.2 Independent Port Interface
While a port shouldn't be considered a memory object there already is a predefined interface every port type (and accessor) should adhere to. The current implementation seems to contain inconsistencies in this regard where the port interface is sometimes circumvented. (The following is an incomplete list.)
4.8.4 Port Types
File port, string port, soft port, void port. All but the first might be implemented in Scheme by means of a soft port, as the current void port isn't useful as a general io device and isn't needed as a simple mapping or simulation of the POSIX "/dev/null" or "/dev/zero" depending on the mode.
4.8.5 Multiplexed data flow on file ports
The following subclasses of the file ports are defined and available by means of the operating system. Some of them default to blocking, others to non-blocking IO, and the term blocking isn't always only related to the behavior of POSIX read and write, which is reflected in read-bytes and write-bytes (if used with a file descriptor).
4.8.6 System Event Channels
This is yet another turn where there is nothing special about the file descriptor but SCA and VSI will make a big fuss about the difference, not only because of the implicit padding bytes.
4.9 Display Form and Serialization
With the yet to come controlling graphical input output path the distinction between the display form (display) and the serialization (write) might become more important.
First tests to determine serializability didn't prove to be interesting right now but it's more complicated than …
4.9.1 sharp symbols
[This head line stems from the development notes.]
We need a complete list of all sharp symbols like #<eof> and #<unspecified>. Most of them should become serializable, i.e. we should be able to read them.
There is some doubt regarding #<undefined> which is usually signalled as an error. The system dictionary is based on syntactical data graph positions where that undefined really might get a special meaning, but it shouldn't impose any problem as the content of a variable or a return value, should it?
The end of file object should be transferable with a port which relies on this being distinct. Signalling the end of file with a special object or token wasn't the best option. Perhaps we'll need some kind of serialization box for this one goody?
BTW Immediate symbols should be available in Scheme but haven't been tested yet: SCM_IM_BEGIN => #@begin (That's another class of broken serialization and conformance of outer form to token.)
These sharp symbols seem to have been abused in all ways: by outer form, meaning and missing theoretical foundation. Maybe we really should print a closure as a lambda form instead of using a magic volatile useless C0 combination like #<procedure>. Usually these are write only outer forms to convey whatever is just printed in these sharp symbols. The bad habit of including useless memory addresses doesn't help readability.
- #<undefined>
- #<eof>
- #<unspecified>
- #<unused> (unused?)
- #<procedure> scm_tcs_closures
- #<…macro> scm_tc16_macro
- #<primitive-generic> scm_tcs_subrs
- #<primitive-procedure> scm_tcs_subrs
- #<uninterned-symbol> deprecated, purge please
- #<mode file-name> Ports
- #<unknown port>
- #<free-cell> Internal data types
- #<variable>
- #<srcprops>
- #<frame>
- #<winder>
- #<pre-unwind-data>
- #<hash-table>
- #<smobs> Where the smob name replaces the text here.
- #<unknown-…> scm_ipruk: representation of unknown memory objects
- Interim Solution for sharp symbols
Extending the reader doesn't feel right for the first three sharp objects which are considered sharp symbols, as we would have to duplicate the tokenization stage provided by the system dictionary. Binding #<undefined> to a regular symbol is possible but the value remains inaccessible as undefined counts as an error throughout SCM land. PATCH As an interim solution we provide boxed variants of these special tokens as:
- sys-boxed:undefined
- sys-boxed:unspecified
- sys-boxed:eof
The undefined token is needed by the dictionary implementation. The end of file token is needed by subservice. Unspecified can be synthesized as well and is in use. While only undefined needs to be boxed the other ones are boxed as well for coherency. Please see also the individual headings regarding these symbols for long term solutions.
The other sharp objects represent regular memory objects and in concordance with the old Hyperspec view of things regarding the reader one should think about how to construct something from that representation or not. For example Free Cells aren't really useful and their occurrence is considered a severe error condition: but a free cell representation in an expression shouldn't fail as if it were a completely unknown representation.
The future specification of the sharp introduction sequence "#<" should announce them as implementation specific data types which are - if at all - only needed as far as the implementation of the interpreter is concerned. But somehow these shouldn't exist at all. Some are used as mere unique tokens like frame and could be implemented by using special boxed values of limited scope as these only occur in the winder list. Hash tables are deprecated anyway.
- #<undefined>
In other circumstances (error conditions) we already use the regular symbol undefined. There shouldn't be two of those and there shouldn't be any problem using the sharp symbol directly. This symbol is needed during construction of the system dictionary and thus cannot be tokenized like other regular symbols.
- #<unspecified>
This symbol acts like a joker to avoid really unspecified return values (which could be random memory content pretty much like in C). It was and is available in Scheme bound to the symbol unspecified. There shouldn't be any problem using this symbol directly.
- #<procedure> (Closure Representation)
While it's not difficult to print a closure as a lambda form, the difference between the lambda form and the closure is the (textual) environment of the original lambda form the closure stems from. A mere representation of the sources or the lambda form doesn't adequately represent a closure object.
- #<unused>
This was a second sharp symbol for undefined currently not in use.
- #<eof>
This sharp symbol was used to signal an end of file condition. It shouldn't exist as a token or memory object and it should be replaced with a predicate of a port (pretty much like in Java? The end of file condition is a state of the port - not some inherent datum of the data stream. Care is to be taken to retain atomicity of input and output operations which needs to be checked anyway. I think input output operations can't be secured against concurrent IO on the same file if not by special protocols like record locking). See also the atomicity of the new load procedure.
4.10 Enclosed Storage Classes
Relational&hierarchical databases (like the average file system) are examples of non enclosed storage classes.
4.10.1 Immediate
Half a pair. A machine byte word of the default size matching the size of a pointer with an included type tag.
4.10.2 Vector
The storage class vector might be used to denote even non-SCM vectors, i.e. every array addressable by index and pointer arithmetic.
4.10.3 Pair
4.11 Derivation
- Immediate
- Hosting System: Pair + Type Tag (aka Cell or Pair of the Hosting System)
- Hosted System: Heuristic Type Handling or even Top Down alone, i.e. the data type is determined by the syntactical position or one has to know about the data type of the objects of one's program.
4.11.1 Basic Data Graph Construction: Pairs
The most fundamental storage class for every list processor. Every pair w/o a valid type tag (of a non immediate class) in its car is considered a pair and not some other data type.
4.11.2 "Primitive Data Types"
Every non-immediate data type is represented as a pair (called cell) with at least a type tag in its car. Every immediate data type needs to have a unique type tag incorporated.
- Boolean ??? This headline keeps jumping!
Storage Class: Immediate
- Vector
Second fundamental data type addressable by index; fixed size block allocation with explicit storage for length and type. This constitutes an enclosed storage class which denotes the first described property.
- Byte String
- Storage Class: (sub machine byte word) Vector [just call it byte array?]
4.12 PS development progress
4.12.1 storage class: tagged arrays
That's the first step towards a unification as projected.