Mars

Type variable dummy-binding trick is flawed

Bug #488595 reported by Matt Giuca on 2009-11-26

6

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Mars	Fix Released	Medium	Matt Giuca	Mars 0.3

Bug Description

Functions (in the ir) currently store a single varset, which covers all type variables in the local environment of the function. We actually need two subtly distinct varsets for the function - a "private" and "public".

The private varset stores all type variables in the head and body of the function. It has all of the type variables in the head bound to dummy atoms to establish existential quantification in the body of the function.

The public varset should include only the type variables in the head, but not bind them to atoms (since publically, they should be seen as universally quantified).

Both are required because the bindings made in the private varset should not be seen externally:
1. Because the universal quantification binding trick should not be seen publically.
2. Because it would be very slow to search all of the vars, when only the top-level ones are required..

I'm not sure how it is currently stored, but clearly they are not bound to atoms when stored in the progtable.

Tags:

Related branches

lp:~mgiuca/mars/newtypes

Merged into lp:mars at revision 1030

Matt Giuca: Approve on 2010-05-10

Matt Giuca (mgiuca) on 2010-05-03

Changed in mars:
status:	Triaged → Incomplete

Revision history for this message

Matt Giuca (mgiuca) wrote on 2010-05-05:

#1

I've figured out how this is stored, and the reason why it currently works. (And why the above two "problems" don't actually show up.)

Before typechecking, the function's varset is just a bunch of unbound variables for the function's head (it is what we call in this bug the "public varset"). After typechecking, the function's varset is changed in two ways:
1. All head variables are bound to dummy terms (universal quantification trick),
2. A bunch of random internal type variables used for inference, as well as type variables imported from other globals, are added to the varset -- some bound, some not.

The resulting varset is stored as the function's varset after type checking -- this is what we call in this bug the "private varset".

The bug is that the private varset is exposed after typechecking. This isn't a problem *during* typechecking, because when a global's varset is imported, it is its *public* varset.

However, it's still bad because the varset is mangled horribly after typechecking, such that further use of the function's types will be broken (bound to dummy vars). (e.g., try running the typechecker again on its own output -- you get lots of "undefined type" errors.)

We can't *not* update the function's varset during typechecking, because expressions have types which refer to newly-created variables. I think the correct solution (instead of having separate public/private varsets, as originally proposed) is to run a "cleanup" on the final varset of each function, which would unbind any variables bound to dummy values. A problem with this is that some terms may have been derefed and include dummy values inside them (we actually don't want this at all).

There's still the (minor) problem of large varsets (which can be solved by derefing every type and then deleting any bound vars in the varset), but that's not really an issue because importing globals imports the original head varset. (The statement above "2. Because it would be very slow to search all of the vars, when only the top-level ones are required," isn't an issue.)

So I think more importantly, we need to address the issue of using dummy variables. Probably best to just write a custom unification, which checks when unifying that a variable does not appear in a certain set of "existentially-quantified" or "rigid" variables which aren't allowed to be bound.

I've figured out how this is stored, and the reason why it currently works. (And why the above two "problems" don't actually show up.)

Before typechecking, the function's varset is just a bunch of unbound variables for the function's head (it is what we call in this bug the "public varset"). After typechecking, the function's varset is changed in two ways:
1. All head variables are bound to dummy terms (universal quantification trick),
2. A bunch of random internal type variables used for inference, as well as type variables imported from other globals, are added to the varset -- some bound, some not.

The resulting varset is stored as the function's varset after type checking -- this is what we call in this bug the "private varset".

The bug is that the private varset is exposed after typechecking. This isn't a problem *during* typechecking, because when a global's varset is imported, it is its *public* varset.

However, it's still bad because the varset is mangled horribly after typechecking, such that further use of the function's types will be broken (bound to dummy vars). (e.g., try running the typechecker again on its own output -- you get lots of "undefined type" errors.)

We can't *not* update the function's varset during typechecking, because expressions have types which refer to newly-created variables. I think the correct solution (instead of having separate public/private varsets, as originally proposed) is to run a "cleanup" on the final varset of each function, which would unbind any variables bound to dummy values. A problem with this is that some terms may have been derefed and include dummy values inside them (we actually don't want this at all).

There's still the (minor) problem of large varsets (which can be solved by derefing every type and then deleting any bound vars in the varset), but that's not really an issue because importing globals imports the original head varset. (The statement above "2. Because it would be very slow to search all of the vars, when only the top-level ones are required," isn't an issue.)

So I think more importantly, we need to address the issue of using dummy variables. Probably best to just write a custom unification, which checks when unifying that a variable does not appear in a certain set of "existentially-quantified" or "rigid" variables which aren't allowed to be bound.

summary:

- Functions need to store both a public and private varset
+ Type variable dummy-binding trick is flawed

Revision history for this message

Matt Giuca (mgiuca) wrote on 2010-05-05:

#2

This is actually quite important, as it blocks bug #574108. (Type information in later passes is meaningless if it includes variables bound to dummy terms.)

Find an example where a variable's type is a dummy term.

Changed in mars:
importance:	Wishlist → Medium
status:	Incomplete → Triaged

Revision history for this message

Matt Giuca (mgiuca) wrote on 2010-05-05:

#3

> Write a custom unification, which checks when unifying that a variable does not appear
> in a certain set of "existentially-quantified" or "rigid" variables which aren't allowed to be
> bound.

Great! There is a library function which does exactly this, term.unify_term_dont_bind. It takes a set of variables which must not be bound, and fails if the unification would require that variable be bound.

Just use this instead of the dummy variable trick.

Also boost up the test cases in this regard (test that it fails to bind on both sides of an assignment, for example).

Revision history for this message

Matt Giuca (mgiuca) wrote on 2010-05-05:

#4

Fixed in newtypes branch, r1026.

Changed in mars:
status:	Triaged → Fix Committed

Revision history for this message

Matt Giuca (mgiuca) wrote on 2010-05-10:

#5

Merged to trunk, r1030.

Matt Giuca (mgiuca) on 2010-06-25

Changed in mars:
status:	Fix Committed → Fix Released

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.