# How to Write Multiset Conditions With Oracle VARRAY Types

July 5, 2020Oracle is one of the few databases that implements the SQL standard ORDBMS extensions, which essentially allow for nested collections. Other databases that have these features to some extent are CUBRID, Informix, PostgreSQL.

Oracle has two types of nested collections:

-- Nested tables CREATE TYPE t1 AS TABLE OF VARCHAR2(10); / -- Varrays CREATE TYPE t2 AS VARRAY(10) OF VARCHAR2(10); /

The main difference at first is that a nested table can be of arbitrary size, whereas a varray has a fixed maximum size. Other than that, they behave in similar ways.

When storing a nested collection in a table, there is another difference. Varrays can be inlined into the table just like any other data type, whereas nested tables have to be accompanied by an additional storage clause:

CREATE TABLE t ( id NUMBER(10), t1 t1, t2 t2 ) NESTED TABLE t1 STORE AS t1_nt;

This is a minor hassle in terms of DDL. The runtime implications are more significant.

## Multiset Conditions

The most important difference is the fact that all the useful multiset conditions are not available with varrays. For instance, consider running these statements:

INSERT INTO t VALUES (1, NULL, NULL); INSERT INTO t VALUES (2, t1(), t2()); INSERT INTO t VALUES ( 3, t1('abc', 'xyz', 'zzz'), t2('abc', 'xyz', 'zzz') ); INSERT INTO t VALUES ( 4, t1('dup', 'dup', 'dup'), t2('dup', 'dup', 'dup') ); SELECT * FROM t WHERE 'abc' MEMBER OF t1; SELECT * FROM t WHERE 'abc' MEMBER OF t2;

The result of these queries is:

ID T1 T2 ----------------------------------------------------- 3 T1('abc', 'xyz', 'zzz') T2('abc', 'xyz', 'zzz') ORA-00932: inconsistent datatypes: expected UDT got TEST.T2

Bummer. The documentation is a bit unclear about this. It reads (emphasis mine):

he return value is TRUE if expr is equal to a member of the specified

nested table or varray. The return value is NULL if expr is null or if the nested table is empty.

There is some explicit mention of varrays supporting these operations, but in most of the documentation, varrays are not mentioned. So, how can we write such operations with varrays? Here’s an list of translations of the nested table operator to the equivalent SQL expression for use with varrays.

These are the multiset conditions:

## IS A SET condition

In SQL, everything is a (partially ordered) multiset by default. Sometimes, however, we want to work with sets, i.e. a special type of multiset that has no duplicate values. We can easily check whether nested tables are sets (or whether they aren’t):

-- Nested table version SELECT * FROM t WHERE t1 IS A SET; -- Varray version SELECT * FROM t WHERE t2 IS NOT NULL AND (SELECT count(*) FROM TABLE(t2)) = (SELECT count(DISTINCT column_value) FROM TABLE(t2));

The `IS A SET`

operation yields `UNKNOWN`

if the nested table is `NULL`

, so we have to take that into account as well. If it isn’t `NULL`

, we can count the total values in the varray and compare that with the total distinct values in the varray.

The result is:

ID T1 T2 ----------------------------------------------------- 2 T1() T2() 3 T1('abc', 'xyz', 'zzz') T2('abc', 'xyz', 'zzz')

## IS EMPTY condition

This predicate needs no explanation. It can be written as such:

-- Nested table version SELECT * FROM t WHERE t1 IS EMPTY; -- Varray version SELECT * FROM t WHERE t2 IS NOT NULL AND NOT EXISTS ( SELECT * FROM TABLE (t2) );

The result being:

ID T1 T2 --------------------------------------- 2 T1() T2()

## MEMBER condition

This handy predicate can help check if a specific value is contained in a nested collection. It can be written as such:

-- Nested table version SELECT * FROM t WHERE 'abc' MEMBER OF t1; -- Varray version SELECT * FROM t WHERE t2 IS NOT NULL AND EXISTS ( SELECT 1 FROM TABLE(t2) WHERE column_value = 'abc' );

Yielding:

ID T1 T2 ----------------------------------------------------- 3 T1('abc', 'xyz', 'zzz') T2('abc', 'xyz', 'zzz')

## SUBMULTISET condition

Just like the previous MEMBER condition, this predicate can help check if specific values (more than one) are contained in a nested collection. This is a bit more tricky than the previous emulations. The MEMBER condition works the same way for sets and multisets, as we’re checking if *exactly one element* is contained in the (multi)set.

When working with multisets, duplicates are allowed, and in the case of the `SUBMULTISET`

operation, the following can be observed:

-- Equal multisets t1() SUBMULTISET OF t1(); t1('a', 'a') SUBMULTISET OF t1('a', 'a'); -- Subsets t1('a') SUBMULTISET OF t1('a', 'a'); -- But this is not true t1('a', 'a') SUBMULTISET OF t1('a');

When we omit the fact that nested collections can be multisets and pretend we’re working with sets only, then the emulation of the `SUBMULTISET`

operator is relatively easy:

-- Nested table version SELECT * FROM t WHERE t1('abc', 'xyz') SUBMULTISET OF t1; -- Varray version SELECT * FROM t WHERE t2 IS NOT NULL AND EXISTS ( SELECT 1 FROM TABLE(t2) WHERE column_value = 'abc' INTERSECT SELECT 1 FROM TABLE(t2) WHERE column_value = 'xyz' );

Yielding, once more:

ID T1 T2 ----------------------------------------------------- 3 T1('abc', 'xyz', 'zzz') T2('abc', 'xyz', 'zzz')

If we’re really working with multisets, things are a bit more tricky:

-- Nested table version SELECT * FROM t WHERE t1('dup', 'dup') SUBMULTISET OF t1; -- Varray version SELECT * FROM t WHERE t2 IS NOT NULL AND NOT EXISTS ( SELECT column_value, count(*) FROM TABLE (t2('dup', 'dup')) x GROUP BY column_value HAVING count(*) > ( SELECT count(*) FROM TABLE (t2) y WHERE y.column_value = x.column_value ) );

Yielding:

ID T1 T2 ----------------------------------------------------- 4 T1('dup', 'dup', 'dup') T2('dup', 'dup', 'dup')

How does it work? In the `NOT EXISTS`

correlated subquery, we’re counting the number of duplicate values in the potential `SUBMULTISET`

, effectively turning that `SUBMULTISET`

into a `SET`

using the `GROUP BY`

operation.

We’re then comparing that count value from the left operand with the corresponding count value from the right operand. If there is no value in the left operand whose number of occurrences is bigger than the number of occurrences of that value in the right operand, then the whole left operand is a `SUBMULTISET`

of the right operand.

Cool, eh? We’ll talk about performance another time ðŸ™‚

## MULTISET operators

Also very interesting, the multiset operators:

- MULTISET EXCEPT [ ALL | DISTINCT ]
- MULTISET INTERSECT [ ALL | DISTINCT ]
- MULTISET UNION [ ALL | DISTINCT ]

Notice how there are some differences to the ordinary set operators that can be used in `SELECT`

statements. In particular:

`EXCEPT`

is used as defined in the standard, not`MINUS`

`ALL`

is supported on all three operators, not just on`UNION`

`ALL`

is the default, not`DISTINCT`

How can we work with these operators? Consider these queries:

SELECT id, t1 MULTISET EXCEPT t1('aaa', 'abc', 'dup', 'dup') r FROM t; SELECT id, t1 MULTISET EXCEPT ALL t1('aaa', 'abc', 'dup', 'dup') r FROM t;

Both yielding:

ID R --------------------- 1 (null) 2 T1() 3 T1('xyz', 'zzz') 4 T1('dup')

With this operator, we’re removing each element of the right operand once from the left operand:

`'aaa'`

does not appear in the left operand, so nothing happens`'abc'`

appears on row with ID = 3 and we remove it`'dup'`

appears on row with ID = 4, 3 times, and we remove it twice, leaving one value

Conversely, when adding `DISTINCT`

, we’ll get:

SELECT t1 MULTISET EXCEPT DISTINCT t1('aaa', 'abc', 'dup') FROM t;

Yielding:

ID R --------------------- 1 (null) 2 T1() 3 T1('xyz', 'zzz') 4 T1('')

The only difference is on row with ID = 4, where all `'dup'`

values were removed, regardless how many there were on either side of the `MULTISET EXCEPT DISTINCT`

operator.

How to emulate this for varrays?

**DISTINCT version**

This is a bit easier, because we can now use `MINUS`

:

-- Nested table version SELECT t1 MULTISET EXCEPT DISTINCT t1('aaa', 'abc', 'dup', 'dup') FROM t; -- Varray version SELECT id, CASE WHEN t2 IS NULL THEN NULL ELSE CAST(MULTISET( SELECT column_value FROM TABLE (t2) MINUS SELECT column_value FROM TABLE (t2('aaa', 'abc', 'dup', 'dup')) ) AS t2) END r FROM t;

Luckily, we can still cast a structural `MULTISET`

type that we can obtain using the `MULTISET()`

operator to a varray type. This greatly simplifies the task.

**ALL version**

If we want the `MULTISET EXCEPT`

or `MULTISET EXCEPT ALL`

semantics, things are trickier. Here’s a solution that resorts to using window functions, in order to turn a `MULTISET`

back into a `SET`

:

-- Nested table version SELECT t1 MULTISET EXCEPT ALL t1('aaa', 'abc', 'dup', 'dup') FROM t; -- Varray version SELECT id, CASE WHEN t2 IS NULL THEN NULL ELSE CAST(MULTISET( SELECT column_value FROM ( SELECT column_value, row_number() OVER ( PARTITION BY column_value ORDER BY column_value) rn FROM TABLE (t2) MINUS SELECT column_value, row_number() OVER ( PARTITION BY column_value ORDER BY column_value) rn FROM TABLE (t2('aaa', 'abc', 'dup', 'dup')) ) ) AS t2) END r FROM t;

How does this work? Ideally, we’ll look at what this `ROW_NUMBER()`

evaluates to on each row. For this, we use `OUTER APPLY`

:

SELECT id, t2, column_value, rn FROM t OUTER APPLY ( SELECT column_value, row_number() OVER ( PARTITION BY column_value ORDER BY column_value) rn FROM TABLE (t2) );

The result is:

ID T2 COLUMN_VALUE RN ----------------------------------------------------- 1 (null) (null) (null) 2 T2() (null) (null) 3 T2('abc', 'xyz', 'zzz') abc 1 3 T2('abc', 'xyz', 'zzz') xyz 1 3 T2('abc', 'xyz', 'zzz') zzz 1 4 T2('dup', 'dup', 'dup') dup 1 4 T2('dup', 'dup', 'dup') dup 2 4 T2('dup', 'dup', 'dup') dup 3

As can be seen, each duplicate value gets assigned a unique row number due to the nature of how `ROW_NUMBER()`

works (this property can be very useful for solving the gaps-and-islands-problem. See trick #4).

Now that we turned our `(COLUMN_VALUE)`

multiset into a `(COLUMN_VALUE, RN)`

set (without duplicates), we can use `MINUS`

again.

## MULTISET INTERSECT and MULTISET UNION

`MULTISET INTERSECT`

works exactly the same way as `MULTISET EXCEPT`

, with the same window function based emulation in the `MULTISET INTERSECT ALL`

case. `MULTISET UNION`

is simpler, because Oracle knows `UNION ALL`

, so we do not need to resort to such trickery.

## Conclusion

Nested collections are a very powerful tool in Oracle SQL. Oracle knows two types of nested collections:

- Nested tables
- Varrays

Nested tables are trickier to maintain as you have to think of their storage more explicitly. Varrays can just be embedded into ordinary tables like any other column. But there’s a price to pay for using varrays. Oracle regrettably doesn’t support all of the above very useful multiset conditions and multiset operators.

Luckily, when you encounter a situation where you have varrays and cannot change that, you can still emulate each of the operators using more traditional SQL.