Paul Jarc's software

SQL sets

Suppose we have a table of widgets, with various descriptive attributes:

widget
id	size	shape	color	alignment
1	Big	Round	Purple	Chaotic
2	Tiny	Star	Purple	Lawful
3	Tiny	Linear	Red	Neutral

And another table whose rows correspond to sets of widgets:

widget_set
id	description
1	Widgets of sizes other than Tiny
2	Chaotic-aligned or Star-shaped widgets
3	Big and Red widgets

Now we'd like to set things up so we can look up the widgets that belong to a given set (or look up the sets that contain a given widget). The most natural way would be to define a view:

create view widget_set_member as
  select 1    as widget_set_id
       , w.id as widget_id
    from widget w
    where w.size!='Tiny'
  union
  select 2    as widget_set_id
       , w.id as widget_id
    from widget w
    where w.alignment='Chaotic'
      or  w.shape='Star'
  union
  select 3    as widget_set_id
       , w.id as widget_id
    from widget w
    where w.size='Big'
      and w.color='Red'

One limitation of this method is that new sets of widgets can't be defined with only data-manipulation (insert/update/delete) statements; it requires redefining the view. We could avoid this problem by using a table instead of a view:

widget_set_member
widget_set_id	widget_id
1	1
2	1
2	2

This method has its own limitation, though: changes to the widget table won't be automatically reflected in widget_set_member. Luckily, there is a way to have the best of both worlds. We'll start with an attribute_type table whose rows represent the columns of descriptive attributes of widgets:

attribute_type
id	name
1	size
2	shape
3	color
4	alignment

Then we'll use that table to add an attribute_member view, which turns the structure of the widget table into data:

create view attribute_member as
  select w.id                                as widget_id
       , at.id                               as attribute_type_id
       , at.name                             as attribute_type_name
       , case at.name
           when 'size'      then w.size
           when 'shape'     then w.shape
           when 'color'     then w.color
           when 'alignment' then w.alignment
         end                                 as attribute_value
    from widget         w
       , attribute_type at

The criteria that define our sets will be boolean predicates, based on the descriptive attributes of widgets. The primitive conditions will test that a particular attribute has a particular value—say, that the size is Big. Then several of those primitive predicates can be combined into one compound predicate with and, or, and not. We'll represent those compound predicates in disjunctive normal form. attribute_member gives us the primitive predicates, so the next layer up will be a widget_subset table corresponding to conjunctions, a widget_subset_pred table showing which (possibly negated) primitive criteria make up a conjunction, and a widget_subset_member view showing which widgets satisfy the criteria for which subsets.

widget_subset
id
1
2
3
4

widget_subset_pred
widget_subset_id	attribute_type_id	attribute_value	negated
1	1 (size)	Tiny	1 (negated)
2	4 (alignment)	Chaotic	0 (not negated)
3	2 (shape)	Star	0 (not negated)
4	1 (size)	Big	0 (not negated)
4	3 (color)	Red	0 (not negated)

create view widget_subset_member as
  select wss.id as widget_subset_id
       , w.id   as widget_id
    from widget_subset wss
       , widget        w
    where (select count(*)
             from widget_subset_pred wssp
                , attribute_member   am
             where wss.id=wssp.widget_subset_id
               and wssp.negated=1
               and wssp.attribute_type_id=am.attribute_type_id
               and wssp.attribute_value=am.attribute_value
               and am.widget_id=w.id)=0
      and (select count(*)
             from widget_subset_pred wssp
                , attribute_member   am
             where wss.id=wssp.widget_subset_id
               and wssp.negated=0
               and wssp.attribute_type_id=am.attribute_type_id
               and wssp.attribute_value=am.attribute_value
               and am.widget_id=w.id)=
          (select count(*)
             from widget_subset_pred wssp
             where wss.id=wssp.widget_subset_id
               and wssp.negated=0)

Next, we'll add a widget_set_pred table showing which subsets (conjunctions) make up a set (disjunction) and a widget_set_member view showing which widgets satisfy the criteria for which sets.

widget_set_pred
widget_set_id	widget_subset_id
1	1
2	2
2	3
3	4

create view widget_set_member as
  select distinct wsp.widget_set_id as widget_set_id
                , wssm.widget_id    as widget_id
    from widget_set_pred      wsp
       , widget_subset_member wssm
    where wsp.widget_subset_id=wssm.widget_subset_id

That's it. With these tables and views, defining a new set can now be done by inserting rows for its criteria into the widget_subset, widget_subset_pred, widget_set, and widget_set_pred tables. Members of the set will then be immediately visible in the widget_set_member view. We don't need to create or modify any views, and we don't need to keep any separate tables in sync when the widget table is changed. However, as you might guess from looking at the widget_subset_member view, the query performance may be rather slow, depending on the amount of data there is to work through.

We can improve performance by creating another table to enumerate all possible attribute values, and another to associate widgets with the attributes that apply to them (replacing the attribute_member view):

attribute
id	attribute_type_id	value
1	1	Big
2	1	Tiny
3	2	Round
4	2	Star
5	2	Linear
6	3	Purple
7	3	Red
8	4	Chaotic
9	4	Lawful
10	4	Neutral
We can set a uniqueness constraint on the (`attribute_type_id`, `value`) combination.

attribute_member
attribute_type_id	attribute_id	widget_id
1	1	1
2	3	1
3	6	1
4	4	1
1	2	2
2	4	2
3	6	2
4	9	2
1	2	3
2	5	3
3	7	3
4	10	3
`attribute_type_id` is not semantically necessary, but we can create a uniqueness constraint on (`attribute_type_id`, `widget_id`) to ensure we don't accidentally give one widget two different colors, etc.

Then we would modify the widget_subset_pred table to have an attribute_id column instead of attribute_type_id and attribute_value, and redefine the widget_subset_member view:

create view widget_subset_member as
  select wss.id as widget_subset_id
       , w.id   as widget_id
    from widget_subset wss
       , widget        w
    where (select count(*)
             from widget_subset_pred wssp
                , attribute          a
                , attribute_member   am
             where wss.id=wssp.widget_subset_id
               and wssp.negated=1
               and wssp.attribute_id=am.attribute_id
               and am.widget_id=w.id)=0
      and (select count(*)
             from widget_subset_pred wssp
                , attribute_member   am
             where wss.id=wssp.widget_subset_id
               and wssp.negated=0
               and wssp.attribute_id=am.attribute_id
               and am.widget_id=w.id)=
          (select count(*)
             from widget_subset_pred wssp
             where wss.id=wssp.widget_subset_id
               and wssp.negated=0)

The disadvantage with these extra tables is that now we have to maintain the data in the attribute and attribute_member tables, which is more complicated than maintaining widget alone. On the other hand, the descriptive columns could be dropped from the widget table, so the data would still only need to be maintained in one place. The structure of the original widget table could be reproduced as a view.

Note that this table structure can't ensure that every widget has one attribute of each type. Some widgets might be missing a size, or an alignment, etc. This would be somewhat like having a null value in the corresponding column in the original widget table, but the semantics in the set membership views would be different.