New feature: global variables

Let’s start with a quick intro to Scriptella dataflow, which is based on the concept of rows and columns (columns can be treated as variables). When a query is executed, it emits rows and makes each row’s columns available to its nested elements. As a consequence, a variable change is visible only to the nested elements of the query. Here is an example that illustrates this:

<properties>
<!-- Setting an initial value for the variable -->
userCount=0
</properties>

<query connection-id="db">
    <!-- The query selects the number of records in the Users table and sets the variable userCount to the value of the COUNT(*) column.
         The change is visible ONLY to nested elements -->
    SELECT COUNT(*) as userCount from Users
    <script connection-id="log">
        Overridden value of userCount: $userCount
    </script>
</query>
<script connection-id="log">
     Out of scope/unmodified value of userCount: $userCount
</script>
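To make the scoping rule concrete outside of XML, here is a plain-Java analogy. It is a minimal sketch under our own assumptions: the ScopeDemo and Scope classes are hypothetical and do not mirror Scriptella’s internal classes, and the value 42 merely stands in for the real row count. Each nested element gets a child scope that can read its parent’s variables, while writes land in the child and shadow the outer value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of nested-element scoping (not Scriptella's actual API).
public class ScopeDemo {
    static final class Scope {
        private final Scope parent;
        private final Map<String, Object> vars = new HashMap<>();

        Scope(Scope parent) {
            this.parent = parent;
        }

        // Lookup walks up the parent chain until the name is found
        Object get(String name) {
            for (Scope s = this; s != null; s = s.parent) {
                if (s.vars.containsKey(name)) {
                    return s.vars.get(name);
                }
            }
            return null;
        }

        // Writes always go to the current scope, shadowing any outer value
        void set(String name, Object value) {
            vars.put(name, value);
        }
    }

    public static void main(String[] args) {
        Scope root = new Scope(null);
        root.set("userCount", 0);         // like <properties> userCount=0

        Scope queryRow = new Scope(root); // one row emitted by the query
        queryRow.set("userCount", 42);    // like COUNT(*) as userCount (42 is made up)

        // Nested element sees the overridden value
        System.out.println("Overridden value of userCount: " + queryRow.get("userCount"));
        // Element outside the query still sees the original value
        System.out.println("Out of scope/unmodified value of userCount: " + root.get("userCount"));
    }
}
```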

At times it is more convenient just to set a global variable, so that its value can be consumed elsewhere in the ETL file. In Scriptella 1.0 this was possible only with the help of workarounds:

  • Use System.setProperty and System.getProperty to share a variable between scripts.
  • Another approach is similar to the technique used with anonymous inner classes: modify an element of a single-element array declared as a final variable. The following example illustrates it:
    <!-- The query defines a scoped context by declaring 
           a globalVarArray array available to nested elements.
           Since globalVarArray is an array, changes to its elements are visible to all nested elements -->
    <query connection-id="jexl">
        //Array with only one element modifiable by nested scripts
        globalVarArray = [0];
        query.next();
        <query connection-id="db">
            SELECT COUNT(*) as userCount from Users
            <script connection-id="jexl">
                <!-- Store userCount in a global array --> 
                globalVarArray[0] = userCount;
            </script>
            <!-- And now print the value of the global variable we've just set -->
            <script connection-id="log">
                Inner script: globalVar=${globalVarArray[0]}
            </script>
        </query>
        <script connection-id="log">
           Outer script: globalVar=${globalVarArray[0]}
        </script>
    </query>

    <script connection-id="log">
        Out of scope: globalVar=${globalVarArray[0]}
    </script>

If you run the script, the following output is printed to the console (USER_COUNT stands for the actual number of rows in the Users table):

Inner script: globalVar=USER_COUNT
Outer script: globalVar=USER_COUNT
Out of scope: globalVar=0
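Both Scriptella 1.0 workarounds rely on ordinary Java mechanisms. The sketch below shows them outside Scriptella; WorkaroundsDemo and its methods are hypothetical names used only for illustration. The first method stores a value in a JVM-wide system property, the second modifies an element of a final single-element array from an anonymous inner class.

```java
// Hypothetical illustration of the two Scriptella 1.0 workarounds in plain Java.
public class WorkaroundsDemo {

    // Workaround 1: a JVM-wide system property acts as a global store.
    // System properties hold only Strings, so numbers must be converted.
    static int viaSystemProperty(int value) {
        System.setProperty("globalVar", String.valueOf(value));
        return Integer.parseInt(System.getProperty("globalVar"));
    }

    // Workaround 2: an anonymous inner class cannot reassign a captured local
    // variable, but it can modify an element of a final array it captures.
    static int viaSingleElementArray(final int value) {
        final int[] globalVarArray = {0};
        Runnable nestedScript = new Runnable() {
            @Override
            public void run() {
                globalVarArray[0] = value; // mutates the element, not the variable
            }
        };
        nestedScript.run();
        return globalVarArray[0];
    }

    public static void main(String[] args) {
        System.out.println("System property: globalVar=" + viaSystemProperty(42));
        System.out.println("Array element: globalVar=" + viaSingleElementArray(42));
    }
}
```

Note that system properties are shared by the whole JVM and accept only strings, which is one reason a dedicated mechanism is more convenient.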

As you can see, it is possible to declare global variables in Scriptella 1.0, but this requires an additional query element and the use of arrays. Scriptella 1.1 introduces support for etl.globals: a global map of variables available to all ETL elements. Using it, the example above can be rewritten as follows:

    <query connection-id="db">
        SELECT COUNT(*) as userCount from Users
        <script connection-id="jexl">
            etl.globals['globalVar'] = userCount;
        </script>  
        <script connection-id="log">
            Inner script: etl.globals.globalVar=${etl.globals['globalVar']}
        </script>
    </query>
    <script connection-id="log">
        Outer script: etl.globals.globalVar=${etl.globals['globalVar']}
        globalVar=$globalVar (normal variable globalVar is not defined)
    </script>

The code is now less verbose. Additionally, the “out of scope” script was removed, since global variables have no scope. The line globalVar=$globalVar (normal variable globalVar is not defined) was added to demonstrate that global variables do not affect normal variables. However, giving a global variable the same name as a normal variable is not recommended, as it may cause confusion.

This time the output is:

Inner script: etl.globals.globalVar=USER_COUNT
Outer script: etl.globals.globalVar=USER_COUNT
globalVar=$globalVar (normal variable globalVar is not defined)
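Conceptually, etl.globals behaves like one map shared by every element and kept separate from the per-row variables. The following Java sketch models that separation under our own assumptions: GlobalsDemo and runQuery are hypothetical names, the code does not reproduce how Scriptella actually implements etl.globals, and 42 is a made-up row count.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: one globals map shared by all elements, separate from
// the row variables that exist only while the query runs.
public class GlobalsDemo {
    static final Map<String, Object> globals = new HashMap<>();

    // Simulates the <query>: row variables live only inside this method
    static void runQuery(int userCount) {
        Map<String, Object> rowVars = new HashMap<>();
        rowVars.put("userCount", userCount);                // SELECT COUNT(*) as userCount
        globals.put("globalVar", rowVars.get("userCount")); // etl.globals['globalVar'] = userCount
        System.out.println("Inner script: etl.globals.globalVar=" + globals.get("globalVar"));
    }

    public static void main(String[] args) {
        runQuery(42);
        // Outside the query the global value is still visible...
        System.out.println("Outer script: etl.globals.globalVar=" + globals.get("globalVar"));
        // ...but no normal variable named globalVar exists in the outer scope
        Map<String, Object> outerVars = new HashMap<>();
        System.out.println("Normal variable globalVar defined: " + outerVars.containsKey("globalVar"));
    }
}
```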

Implementation note: at the time of writing, the etl.globals map is not shared with ETL files executed through the “scriptella” driver. Bug-12790 was logged to track this issue, which will be resolved before the 1.1 release.
Update 1:
Thanks to Anji for pointing out that the example with the array initialization globalVarArray = [0] will not work in Scriptella 1.0, because JEXL 1.1 does not support array instantiation. JavaScript can be used as an alternative, as explained in the FAQ entry, or you can use the Janino driver to achieve the same effect:

<etl>
	<connection id="janino" driver="janino"/>
	<connection id="log" driver="text"/>

	<!-- The query defines a scoped context by declaring
	     a globalVarArray array available to nested elements.
	     Since globalVarArray is an array, changes to its elements are visible to all nested elements -->
	<query connection-id="janino">
		// Array with only one element modifiable by nested scripts
		set("globalVarArray", new int[1]);
		next();
		<script connection-id="janino">
			<!-- Store a value (22 in this example) in the global array -->
			((int[])get("globalVarArray"))[0] = 22;
		</script>
		<!-- And now print the value of the global variable we've just set -->
		<script connection-id="log">
			Inner script: globalVar=${globalVarArray[0]}
		</script>
	</query>
</etl>
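The Janino example works because get returns a reference to the same int[] object that set stored, so writing through the cast modifies the shared array rather than a copy. A plain-Java sketch of that aliasing follows; JaninoCastDemo with its set and get methods is a hypothetical stand-in for the variables exposed to Janino scripts, not Scriptella’s real classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a variable context with Object-typed set/get.
public class JaninoCastDemo {
    private final Map<String, Object> vars = new HashMap<>();

    void set(String name, Object value) {
        vars.put(name, value);
    }

    Object get(String name) {
        return vars.get(name);
    }

    public static void main(String[] args) {
        JaninoCastDemo ctx = new JaninoCastDemo();
        ctx.set("globalVarArray", new int[1]);       // outer query creates the array
        ((int[]) ctx.get("globalVarArray"))[0] = 22; // nested script: cast, then write the element
        int[] shared = (int[]) ctx.get("globalVarArray");
        System.out.println("Inner script: globalVar=" + shared[0]);
    }
}
```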

About Fyodor Kupolov
Scriptella ETL founder.

2 Responses to New feature: global variables

  1. The first example refers to $userName when I think you mean $userCount. I believe $userName would be undefined because the query doesn’t return it.
