February 27, 2013

Article at IBM Systems

More Flexible Open Access Input Handlers

In our previous article, “Handling Input Handlers With RPG Open Access,” we demonstrated the basics of writing an Open Access (OA) input handler for CSV files. In this article, we’re taking that same handler to the next level, adding additional flexibility and implementing some of the kinds of error handling that we discussed earlier.

Here are the first few records of the CSV file we’re processing:

DATEUSA,ZONED5_0,ZONED5_2,PACKED7_2,CHAR80
05/11/11,123,250.00,12345.6,"First Record includes a comma , in the data"
12/31/10,2345,150.00,2345,"Jon wrote this test data"

This is basically the same file we used before, but it adds a column-heading row, and its field sequence no longer matches the sequence of fields in the file IFSDATA1.

Setting Processing Options

The first thing we wanted to do was allow the user program to set several processing options. As we’ve noted previously, OA provides for this by allowing an optional parameter to be specified to the HANDLER keyword. Previously, we used this to supply the actual IFS file name. Since only a single parameter is available, to pass additional information we need to pass a more complex data structure (DS) than we did before. Here’s the new version of the IFS /Copy member definitions:

D ifs_hdlr_info_t...
D                 ds                  qualified template
D  fieldDelim                    1a
D  stringDelim                   1a
D  headerRow                      n
D  failOnError                    n
D  path                       5000a   varying

D comma           c                   ','
D doubleQuote     c                   '"'

The fields fieldDelim and stringDelim let the user override the default values of a comma as the field separator and a double quote as the string delimiter. We've also included the default values as constants in the /Copy member. Support for alternative delimiters is already built into the CSVR4 service program we're using to parse the IFS file, so we don't need to do much more to support this functionality.

The next field is the headerRow indicator. When set on, the handler will assume that the first record in the IFS file consists of column headings that contain the field names. When it’s off (the default), the handler will function as it did before (i.e., it assumes the data in the CSV file is in the same sequence as the fields in the related physical file definition).

Indicator failOnError is the last of the new options. When set, any data format error detected by the handler will cause a file error to be reported by the handler. When it’s off (the default), the error will be reported and a default value supplied for the field. In the current implementation, only dates are validated in this manner but numerics and other fields could be similarly handled. More on this later.
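To put the options in context, here's a minimal sketch of how a user program might populate the DS and pass it via the HANDLER keyword. The handler program name IFSHDLR, the /Copy member name IFS_H, and the IFS path are placeholders for this illustration, not the actual names from our download:

FIFSDATA1  IF   E             Disk    Handler('IFSHDLR': csvInfo)
F                                     UsrOpn

 /Copy QRPGLESRC,IFS_H

D csvInfo         ds                  likeds(ifs_hdlr_info_t) inz

 /free
   // Set the processing options before the file is opened
   csvInfo.fieldDelim  = comma;         // defaults from the /Copy member
   csvInfo.stringDelim = doubleQuote;
   csvInfo.headerRow   = *On;           // first record holds column names
   csvInfo.failOnError = *Off;          // substitute defaults on bad data
   csvInfo.path = '/home/mydir/ifsdata1.csv';  // placeholder path

   Open IFSDATA1;
   Read IFSDATA1;

Because the file is declared USROPN, the options are safely in place before the handler's open logic runs.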

Processing Column Headings

The best time to process the column headings is during the opening of the file. As you may recall, since the handler must be able to handle multiple uses within a single program, the file handle was stored in the state information area. The same applies to the column heading information, so the size of the information area must increase to accommodate this. Here’s the revised layout:

(1)  D stateInfo       ds                  based(pstateInfo)
(2)  D  fileHandle                     *
     // Count of number of active entries in entries array plus
     //  array of field names and their sequence in the file
(3)  D  fieldCount                    5u 0
(4)  D  entries                             Dim(256) Ascend
     D   name                        10a   Varying Overlay(entries)
     D   sequence                     5u 0 Overlay(entries: *Next)

Since we now need multiple fields to store our state information, it must be represented by a data structure (1). We then base that DS on the pointer to the state information preserved for us by the OA runtime. At (2) you can see the original file handle pointer definition. This is followed (3) by a count of the number of active columns (i.e., the number of header fields retrieved).

The actual field names and their respective sequence in the file can be seen in the array entries, which currently supports up to 256 elements (4). If you’re unfamiliar with the technique used to define the array, you can find out all about it in our 2003 article, “D-spec Discoveries.” This particular technique is described on page 2.
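If you'd rather not dig out that article, here's a minimal standalone sketch (independent of the handler) of why the technique matters: because both subfields overlay the same dimensioned group field, SortA on one subarray reorders complete elements, so the paired values travel together.

D pair            ds                  qualified
D  item                               Dim(3) Ascend
D   name                        10a   Varying Overlay(item)
D   seq                          5u 0 Overlay(item: *Next)

 /free
   pair.name(1) = 'ZEBRA';  pair.seq(1) = 1;
   pair.name(2) = 'APPLE';  pair.seq(2) = 2;
   pair.name(3) = 'MANGO';  pair.seq(3) = 3;

   SortA pair.name;
   // Result: APPLE/2, MANGO/3, ZEBRA/1 - each seq value stays
   //  with its name because whole item elements are reordered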

Now that you know how this information will be stored, let’s look at the logic that extracts, stores and sorts the headings.

At (A), we begin the Do loop that will process all of the header fields. The function CSV_getfld loads each field in turn into the second parameter, which is our field names array. Since CSV_getfld returns an indicator that will be off when all fields have been read, it can conveniently be used to control the Do loop as we’ve done here.

Because we’ll be sorting the header names array later to speed up lookups, we also need to know the ordinal position of the field name in the record so we can match it up with its data later. This is why we store the field number in the sequence array at (B).

The code at (C) simply checks that we’re not attempting to process a file with more columns than we can handle. Currently the program is set up to handle a maximum of 256 fields but rather than hard-code this as the limit, we’ve instead used %Elem so the limit can simply be increased by changing the size of the array.

P LoadHeaderRow   b
D                 pi
D  handle                             Like(fileHandle) value

D fieldNumber     s              5u 0

 /free
   // Load the first record - returns true if record found
   If (CSV_loadrec(handle));
     fieldNumber = 1;

     // Now process field names header record, storing field
     //  names and positions
(A)  DoW CSV_getfld(handle:
                    name(fieldNumber):
                    %size(name));
(B)    sequence(fieldNumber) = fieldNumber;
       fieldNumber += 1;
       // Make sure that we don't have too many fields
       //  and error if we do
(C)    If fieldNumber > %Elem(name);
         dsply ('Too many fields - current limit is '
                + %Char(%Elem(name)));
         info.rpgStatus = errImpOpenClose;  // Open failed
         Return;
       EndIf;
     EndDo;

     // Store number of actual field headers found
(D)  fieldCount = fieldNumber - 1;

     // All field headers loaded - sort the header entries
     //  into sequence
(E)  SortA %SubArr(name: 1: fieldCount);

   Else;  // No header row found - issue message and fail open.
     dsply ('File is empty - no header row found');
     info.rpgStatus = errImpOpenClose;  // Open failed
   EndIf;

Once all the field headings have been processed, we store the count of active columns (D). Before we finish, we must sort the array into name sequence; we use the count with the %SubArr BIF to limit the sort to the active entries (E).


Reading the Data

In our previous example, we worked on the basis that the sequence of fields in the database record had been engineered to match the sequence of the data in the IFS file. Now that we can use the column headings to identify the fields, we must process the records a little differently.

We decided the best way to do this was to extract all the fields from the record and store them in the fieldContent array (F). We then process the individual field requests by identifying the field name in the column headings array (name) that we built during the open process (G) and using the associated stored number to extract the related content from the data array. You can see the process in action here:

    // Get the data for all fields from the CSV
(F) DoW CSV_getfld(handle:
                   fieldContent(fieldNumber):
                   %size(fieldContent));
      fieldNumber += 1;
    EndDo;

    // Process all of the requested input fields
    For i = 1 to namesValues.num;

      // Set up pointer to field buffer area
      pvalue = namesValues.field(i).value;

      // If file had a header row look up the field name
      If ifs_Info.headerRow;
(G)     fieldNumber = %Lookup(namesValues.field(i).externalName:
                              name:
                              1:
                              fieldCount);
      // If no header then just process fields in sequence by
      //  setting up the field number and data sequence to match
(H)   Else;
        fieldNumber = i;
        sequence(i) = i;
      EndIf;

The final part of this code (H) handles the situation where there’s no header row in the IFS file. In other words, it sets up the handler to effectively behave as our previous version did and match the fields from the CSV file in sequence to the database fields. It does this by simply setting up the field number and field data sequence values to the current loop index value.

Date Validation and Error Handling

In the previous version of the program, we included logic to ensure that any numeric field for which no value was found (e.g., two consecutive commas in the file) was set to zero. Because this version has been extended to match data to field names based on the header row, we needed to add error reporting in cases where no column had been found for a requested field. The logic that handles this is shown here:

      Else;  // No matching field name found
(I)     dsply ('No value found for field '
               + namesValues.field(i).externalName);
(J)     If ifs_Info.failOnError;
          info.rpgStatus = errInpMisMatch;  // Report error to RPG
          Return;  // Return to allow RPG to handle error
        Else;
(K)       namesValues.field(i).valueLenBytes = 0;
        EndIf;

We begin by displaying a message identifying the field in error (I). Subsequent processing depends on the error-reporting level requested by the user (J). If they have asked for the read to fail in the event of an error, we set an error status for RPG and return control to the RPG runtime. The error status we've used indicates an input record mismatch; admittedly, that doesn't quite match what RPG normally means by the term, but it seemed appropriate. If the user has indicated they don't want the read to fail on detecting an error, we simply set the length of the requested field to zero (K) and allow processing to continue.
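We haven't shown the definitions of the error constants. As a rough sketch, assuming values drawn from the standard RPG file status codes (the values in the actual download may differ), they could look like this:

 // Values here are assumptions based on standard RPG file statuses
D errImpOpenClose c                   1216
D errInpMisMatch  c                   1201
D errInvalDate    c                   1299

Here 1216 is the error-on-implicit-open/close status and 1201 is the record-mismatch-on-input status; we've pressed the generic 1299 ("other I/O error") into service for the invalid-date case, since RPG has no dedicated file status for that condition.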

For date fields, we decided to perform more elaborate validation. The extract provided shows both the validation process and the way errors are handled when detected.

We restricted the scope of our date validation to simplify the logic for explanatory purposes. To that end, we assumed all dates in the CSV file were in MDY format. We also assumed that the date fields in the database would be in *USA format (i.e., MM/DD/CCYY). Since OA notifies us of the required format for a date, we could easily add conditional logic to format the data appropriately. Identifying different date formats in the input file would require a different approach. Probably the best way would be to add an array of field names and their corresponding input formats to the ifs_info parameter, along with the IFS file name, etc. We leave the implementation of these features to the reader, should they be desired.
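As a rough illustration of that suggestion (the names and sizes here are invented for the example), the additions to the ifs_hdlr_info_t template might look something like this:

 // Hypothetical extension: field names and their input date formats
D  dateFmtCount                  5u 0
D  dateFmtEntry                        Dim(32)
D   dfFieldName                 10a   Varying Overlay(dateFmtEntry)
D   dfFormat                     8a   Varying Overlay(dateFmtEntry: *Next)

The handler could then look up each date field's name in dfFieldName and choose the matching format in a Select group when performing the Test and %Date operations (the format operand of Test has to be coded as a literal, so a variable format can't simply be substituted).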

If an error is detected, an error message is output. The subsequent processing once again depends on the setting of the ifs_Info.failOnError indicator. If it’s on, the RPG error status field info.rpgStatus is set and the operation will terminate with a failure. If it’s off, the system date (supplied by %Date) is used as a default value but no error is signaled.

We won’t go into the code in detail; hopefully the comments will explain the process.

      // Validate and convert format of date field if required
      If namesValues.field(i).dataType = QrnDatatype_Date;
        namesValues.field(i).valueLenBytes = 10;  // *USA date is 10 long
        // Copy data to temporary fixed-length field
        //  (some date operations don't like varying fields)
        tempCharDate = fieldContent(sequence(fieldNumber));
        Test(DE) *MDY tempCharDate;
        If Not %Error;  // Date is good so convert it and place in buffer
          %Subst(value: 1: namesValues.field(i).valueLenBytes)
            = %Char(%Date(tempCharDate: *MDY): *USA);
        Else;  // Date is in error - report and then either use today's
               //  date or tell RPG to terminate the program
          dsply ('Invalid *MDY date "' + tempCharDate + '" in field '
                 + namesValues.field(i).externalName);
          If ifs_Info.failOnError;
            info.rpgStatus = errInvalDate;  // Report invalid date to RPG
          Else;  // Otherwise set the date to the default
            %Subst(value: 1: namesValues.field(i).valueLenBytes)
              = %Char(%Date(): *USA);
          EndIf;
        EndIf;
      EndIf;

Wrapping It Up

The handler program is now a lot more flexible than our previous version and, as we've noted, has the potential to go further yet. In some places the logic is a little less "clean" than we'd like, a result of trying to keep the basic structure the same as the previous version to simplify comparisons. Apart from tidying the code's structure, the one major change we'd make is more sophisticated message handling. The DSPLY op-code, while making the logic more self-documenting, limits output to 52 characters, which prevented us from adding (for example) a record number to aid in diagnosis. In a production version, we'd change this to use message files.
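For example, a production version might replace DSPLY with the QMHSNDPM API, which removes the 52-character restriction. Only the API itself is real here; the message ID CSV0001 and message file CSVMSGF are hypothetical:

D SndPgmMsg       pr                  ExtPgm('QMHSNDPM')
D  msgId                         7a   const
D  msgFileLib                   20a   const
D  msgData                     256a   const options(*varsize)
D  msgDataLen                   10i 0 const
D  msgType                      10a   const
D  callStkEntry                 10a   const
D  callStkCount                 10i 0 const
D  msgKey                        4a
D  errorCode                    16a   options(*varsize)

D msgKey          s              4a
D msgText         s            256a   varying
D apiError        ds                  qualified
D  bytesProv                    10i 0 inz(0)
D  bytesAvail                   10i 0

 /free
   // Send a diagnostic message instead of using DSPLY; CSV0001 and
   //  CSVMSGF are hypothetical message ID and message file names
   msgText = 'No value found for field '
           + namesValues.field(i).externalName;
   SndPgmMsg('CSV0001': 'CSVMSGF   *LIBL': msgText: %Len(msgText):
             '*DIAG': '*': 0: msgKey: apiError);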

Other error-handling features you might choose to implement include validation of numeric fields, and checking all fields so you can report cases where the field content is longer than the database field length. Studying how CPYFRMIMPF handles such situations may give you some ideas. Of course, if frustrations with CPYFRMIMPF are what drove you to look at this OA handler as a possible solution, you may have a different perspective!
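As a starting point for the numeric validation, a MONITOR block around %Dec could trap conversion errors in much the same way our date logic does. This sketch follows the pattern of the handler extracts above, but the temporary field and its size are purely illustrative:

D tempNum         s             15p 5

 /free
   Monitor;
     // %Dec raises an exception if the character data isn't numeric
     tempNum = %Dec(fieldContent(sequence(fieldNumber)): 15: 5);
   On-Error;
     dsply ('Invalid numeric value in field '
            + namesValues.field(i).externalName);
     If ifs_Info.failOnError;
       info.rpgStatus = errInpMisMatch;  // or a dedicated status
       Return;
     Else;
       tempNum = 0;  // default bad numerics to zero
     EndIf;
   EndMon;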

As usual, the code for this handler and the associated test programs are available on our website.

This is the last in our latest series of articles on OA. If you’re interested in seeing other OA handlers developed, please let us know. We’d also love to hear from any of you who implement this solution and what changes you made to it.