AppLib EXECUTE TRANSFORM

From LucidDB Wiki
Jump to: navigation, search

Contents

Declaration

-- executes a script as a UDX, with the resultInserter sent to the script
-- able to take on as many columns as the input_set.
create or replace function applib.execute_transform(
    engine_name varchar(255), 
    script varchar(65535),
    input_set cursor)
returns table(input_set.*)
language java
parameter style system defined java
modifies sql data
external name 'applib.applibJar:org.eigenbase.applib.script.ExecuteScriptUdr.executeUdx';

Example Invocation

select * from table(
  applib.execute_transform('js',
    '${FARRAGO_HOME}/test/sql/scripting/gen_sequence.js',
    cursor(select * from firing, (values(0)))
));

Purpose

This provides the ability to use a heavy-duty UDX with a script, taking an input cursor from some set of data and returning an output table with up to the same amount of columns the input set contained. This is accomplished by making the variables inputSet (type ResultSet) and resultInserter (type PreparedStatement) globally available to the script on execution.

Input

  • ENGINE_NAME: Script engine to use.
  • SCRIPT: Script to run.
  • INPUT_SET: Set of data made available to the script to act on.

Output

Outputs a table with the same column count as the input set, populated by the script if desired.

Example

Making the non-partitioned form of AppLib_GENERATE_SEQUENCE in Java is quite straightforward but still somewhat painful from the need to catch exceptions that probably won't happen. We can create a slightly less powerful form very straightforwardly in JavaScript:

// gen_sequence.js
var start = 1;
var step = 1;
var meta_data = inputSet.getMetaData(); // inputSet is global
var columns = meta_data.getColumnCount();
var names = [];
var counter = start - step;
while (inputSet.next()) {
  var i = 1;
  for (; i < columns; ++i) {
    // resultInserter is global
    resultInserter.setObject(i, inputSet.getObject(i));
    if (names.length < columns) {
      names.push(meta_data.getColumnName(i));
    }
  }

  counter += step;
  resultInserter.setLong(i, counter);
  resultInserter.executeUpdate();
}

and on SqlLine, we can test the existing generate sequence:

create schema hello;
set schema 'hello';
create table firing (step varchar(100), dirty_seq int, relative_seq int);
insert into firing values ('Ready', 0, 2), ('Aim', 0, 0), ('Fire', 0, 1),
('Drag', 1, 0), ('Bury', 2, 0), ('Kill', 0, 1);

-- Simple tests without partitioning
select * from table(
  applib.generate_sequence(
    cursor(select step, dirty_seq, relative_seq from firing),
    0, 1));

--
+--------+------------+---------------+----------+
|  STEP  | DIRTY_SEQ  | RELATIVE_SEQ  | SEQ_NUM  |
+--------+------------+---------------+----------+
| Ready  | 0          | 2             | 0        |
| Aim    | 0          | 0             | 1        |
| Fire   | 0          | 1             | 2        |
| Drag   | 1          | 0             | 3        |
| Bury   | 2          | 0             | 4        |
| Kill   | 0          | 1             | 5        |
+--------+------------+---------------+----------+

We can test the JS we just wrote too:

0: jdbc:luciddb:> select * from table(applib.execute_transform('js',
  '${FARRAGO_HOME}/test/sql/scripting/gen_sequence.js',
  cursor(select * from firing, (values(0)))));
+--------+------------+---------------+---------+
|  STEP  | DIRTY_SEQ  | RELATIVE_SEQ  | EXPR$0  |
+--------+------------+---------------+---------+
| Ready  | 0          | 2             | 1       |
| Aim    | 0          | 0             | 2       |
| Fire   | 0          | 1             | 3       |
| Drag   | 1          | 0             | 4       |
| Bury   | 2          | 0             | 5       |
| Kill   | 0          | 1             | 6       |
+--------+------------+---------------+---------+

There are two notes to make here. One, if you looked at the original Java source for generate_sequence, you'll notice that the variable i actually ends at the value of the total number of columns on the input set plus one. This is because the output table is defined as the number of columns in the input set, plus an extra column (as part of a different set). We can't define our scripted generate_sequence that way because then we'll be writing one-past-the-end of the result inserter, which will result in an obscure exception. We can fake more columns by selecting from an empty (values(0)) "table", and it's that column we wish to write the sequence to, so the difference in the script is that i stops at "i < columns" rather than "i <= columns".

Even with those caveats, writing scripts this way has many benefits. While people can debate whether they like writing in Java or not, the more measurable benefits are the interpreted nature of this approach. With the GenerateSequenceUdx class, you have to write, save, compile, re-jar, copy over to LucidDB, and restart LucidDB, all before you can actually test to see if it's doing what you want. With this approach, you just simply save changes to the script and rerun the command.

Product Documentation