Bug 15936 - SPSS import generates extra empty columns in middle of dataset
Status: NEW
Alias: None
Product: R
Classification: Unclassified
Component: Wishlist
Version: R 3.1.1
Hardware: All Linux
Importance: P5 minor
Assignee: R-core
Reported: 2014-08-15 19:22 UTC by Paul Johnson
Modified: 2014-08-15 19:22 UTC (History)

Attachments
SPSS file with 1 record to demonstrate extra columns problem (303.17 KB, application/x-spss-sav)
2014-08-15 19:22 UTC, Paul Johnson

Description Paul Johnson 2014-08-15 19:22:32 UTC
Created attachment 1645
SPSS file with 1 record to demonstrate extra columns problem

I see there was Bug 15152 (https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15152), which says that long SPSS strings cause extra variables to appear in the R data frame. I fear the problem I see today with read.spss is the same one, so I'm asking for advice.

A Macintosh-using student showed me a dataset in SPSS, then read the same file with read.spss. He found that the R data frame has about 100 extra empty columns.

I see the same on Linux with R 3.1.1:

> library(foreign)
> dat <- read.spss("KK_SPSS.sav", to.data.frame = TRUE)

re-encoding from latin1
Warning messages:
1: In read.spss("KK_SPSS.sav") :
  KK_SPSS.sav: Unrecognized record type 7, subtype 14 encountered in system file
2: In read.spss("KK_SPSS.sav") :
  KK_SPSS.sav: Unrecognized record type 7, subtype 17 encountered in system file
3: In read.spss("KK_SPSS.sav") :
  KK_SPSS.sav: Unrecognized record type 7, subtype 18 encountered in system file

If you try that, you will see a variable called "NAME_1_TEXT" near the left side of the data frame, followed by empty columns "NAME_7", "NAME_8", "NAME_9", "NAME_A", "NAME_B", "NAME_C", "NAME_D".
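
To list the empty columns without scrolling through the data frame, something like the following might help. This is only a rough sketch: `find_empty_columns` is a hypothetical helper (it is not part of the foreign package), the toy data frame stands in for the real import, and it assumes that "empty" means every value is NA or a whitespace-only string.

```r
# Hypothetical helper, not part of the foreign package: returns the names of
# columns in which every value is NA or a blank/whitespace-only string.
find_empty_columns <- function(df) {
  is_empty <- vapply(df, function(col) {
    vals <- as.character(col)  # treats factor and character columns alike
    all(is.na(vals) | grepl("^[[:space:]]*$", vals))
  }, logical(1))
  names(df)[is_empty]
}

# Toy one-record data frame standing in for the imported file (assumed values)
toy <- data.frame(NAME_1_TEXT = "some text",
                  NAME_7 = "        ",
                  NAME_8 = NA_character_,
                  AGE = 42,
                  stringsAsFactors = FALSE)

empty <- find_empty_columns(toy)
print(empty)

# Possible stopgap: keep only the columns that are not empty
cleaned <- toy[, !(names(toy) %in% empty), drop = FALSE]
print(names(cleaned))
```

On the real data this would be called as `find_empty_columns(dat)`. Note that it would also flag genuine variables that happen to be blank in every record, so the result needs checking by hand before dropping anything.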

However, current pspp can open the file, and none of those extra columns appear there.

Can you advise on workarounds? Should the SPSS user do something differently? Or should R be more careful somehow?