Bug 17256 - bug in writeForeignSAS in the foreign library when string is NA
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: I/O
Version: R 3.3.*
Hardware: Other Other
Importance: P5 enhancement
Assignee: R-core
Reported: 2017-04-11 02:37 UTC by kyaw.sint
Modified: 2018-07-21 13:44 UTC

Attachments
patch to fix 17256 and do some sapply / vapply replacements (10.28 KB, patch)
2018-05-18 12:13 UTC, Michael Nelson

Description kyaw.sint 2017-04-11 02:37:30 UTC
There is a possible bug in foreign:::writeForeignSAS when a column of character type contains an NA value. In the generated SAS program file, the LENGTH entry for that column is written as "$ NA", where SAS expects a number, i.e., the maximum length of the strings in the column.


I think it has to do with this part of writeForeignSAS:
    if (any(strings)) {
        cat("LENGTH", file = codefile, append = TRUE)
        lengths <- sapply(df[, strings, drop = FALSE], FUN = function(x) max(nchar(x)))
        names(lengths) <- varnames[strings]
        for (v in varnames[strings]) cat("\n", v, "$", lengths[v], 
            file = codefile, append = TRUE)
        cat("\n;\n\n", file = codefile, append = TRUE)
    }


If any value is missing, the maximum length evaluates to NA, e.g.,

> max(nchar(c("apple","banana","carrot", NA)))
[1] NA
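
To spell out the step above, here is a minimal sketch in base R of where the NA comes from. It only uses nchar() and max(); the remark about keepNA = FALSE matching the pre-3.3.0 default is my understanding of the R NEWS entries, not something verified in this report.

```r
x <- c("apple", "banana", "carrot", NA)

# With the default keepNA = NA and type = "chars", a missing string
# has length NA rather than 2.
nchar(x)

# max() propagates that NA unless it is told to drop missing values.
max(nchar(x))
max(nchar(x), na.rm = TRUE)

# keepNA = FALSE counts a missing value as the two characters "NA"
# (assumed here to be the default behaviour before R 3.3.0).
nchar(x, keepNA = FALSE)
```

So the "$ NA" in the LENGTH statement is simply the NA returned by max() pasted into the SAS code by cat().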

Can this bug be fixed? I am using R version 3.3.3. Thanks.

Kyaw Sint (Joe)



Below is an example:

> temp <- data.frame(name=c("a", "b", "c", NA), gender=c("male", "female", "male", "female"), stringsAsFactors = FALSE)

> temp
  name gender
1    a   male
2    b female
3    c   male
4 <NA> female
> 
> write.foreign(temp, "temp_for_sas_import.txt", "temp_import.sas", package="SAS")


* Written by R;
*  write.foreign(temp, "temp_for_sas_import.txt", "temp_import.sas",  ;

DATA  rdata ;
LENGTH
 name $ NA
 gender $ 6
;

INFILE  "temp_for_sas_import.txt" 
     DSD 
     LRECL= 16 ;
INPUT
 name
 gender $ 
;
RUN;


# when the above is run in SAS

23628   name $ NA
               --
               22
ERROR 22-322: Expecting a numeric constant.
Comment 1 kyaw.sint 2017-04-23 23:29:45 UTC
This line in foreign:::writeForeignSAS could be changed to 

        lengths <- sapply(df[, strings, drop = FALSE], FUN = function(x) max(nchar(na.omit(x))))

to omit the NA values.
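
A quick check of this suggestion in base R, as a sketch only: it handles the reported case, but a column in which every value is missing leaves max() with nothing to measure. The all_missing vector is made up for illustration.

```r
x <- c("apple", "banana", "carrot", NA)

# The proposed change gives the expected width for the reported case.
max(nchar(na.omit(x)))

# Hypothetical all-NA column: na.omit() removes everything, so max()
# is called on an empty vector, warns, and returns -Inf.
all_missing <- c(NA_character_, NA_character_)
width_all_missing <- suppressWarnings(max(nchar(na.omit(all_missing))))
width_all_missing
```

A value of -Inf in the LENGTH statement would presumably be rejected by SAS just like NA, so a lower bound on the computed length may be needed as well.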
Comment 2 Michael Nelson 2018-05-18 12:13:10 UTC
Created attachment 2347 [details]
patch to fix 17256 and do some sapply / vapply replacements

The attached patch fixes the reported bug for string variables containing NA, and also the case where all strings have length 0 (both of which cause errors when the generated code is read in SAS).

The patch also replaces some calls to sapply with their vapply equivalents (and some any(is.na(...)) calls with anyNA(...)).
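
The patch itself is only available as an attachment, so the following is not its code. It is a rough sketch, under my own assumptions, of the kind of handling described: a hypothetical helper max_string_width() that ignores NA and never returns less than 1, applied with vapply() instead of sapply().

```r
# Hypothetical helper (not taken from the attached patch): widest
# non-missing string in x, with a floor of 1.
max_string_width <- function(x) {
  widths <- nchar(x[!is.na(x)])
  max(1L, widths)  # max(1L, integer(0)) is 1L, so all-NA columns are safe
}

df <- data.frame(
  name    = c("apple", "banana", "carrot", NA),
  blank   = c("", "", "", ""),
  missing = NA_character_,
  stringsAsFactors = FALSE
)

# vapply() checks that each result is a single integer.
lengths <- vapply(df, max_string_width, integer(1))
lengths

# With no string columns at all, sapply() returns a list while
# vapply() still returns an (empty) integer vector.
no_columns <- df[, FALSE, drop = FALSE]
from_sapply <- sapply(no_columns, max_string_width)
from_vapply <- vapply(no_columns, max_string_width, integer(1))

# anyNA(x) gives the same answer as any(is.na(x)).
same_answer <- anyNA(df$name) == any(is.na(df$name))
```

The floor of 1 is an assumption made for this sketch; the actual patch may choose a different minimum.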
Comment 3 Martin Maechler 2018-07-21 13:44:24 UTC
(In reply to Michael Nelson from comment #2)
> Created attachment 2347 [details]
> patch to fix 17256 and do some sapply / vapply replacements   .............

Thank you, Michael for the patch --> you have become a contributor to 'foreign'!

{ When applying it with `patch -p0 < patches-foreign.diff`, there were warnings for every file about the patch being reversed -- after accepting that assumption, it applied fine }.

I can confirm that it fixes the bug reported above by Kyaw Sint (Joe) and 
will add Joe's example as a regression test.
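
For illustration only, such a regression test could look roughly like the sketch below. This is not the test that was added to the package sources; the temporary file names and the regular expression are my own choices, and it assumes a version of foreign that contains the fix.

```r
library(foreign)

# Joe's example: a character column containing NA.
temp <- data.frame(name = c("a", "b", "c", NA),
                   gender = c("male", "female", "male", "female"),
                   stringsAsFactors = FALSE)

datafile <- tempfile(fileext = ".txt")
codefile <- tempfile(fileext = ".sas")
write.foreign(temp, datafile, codefile, package = "SAS")

code <- readLines(codefile)

# No line of the generated SAS code should end in "$ NA".
bad_lines <- grep("\\$ +NA *$", code, value = TRUE)
stopifnot(length(bad_lines) == 0L)

unlink(c(datafile, codefile))
```

With an unpatched version of foreign the stopifnot() call would be expected to fail, which is the point of the test.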

Then, I'll leave it in the sources for now, i.e.,
  https://svn.r-project.org/R-packages/trunk/foreign/

notably as a version of foreign had just been released to CRAN yesterday.

Thank you both, once again!