Q_and_A_2

3472 days ago by takepwave

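Question

We received a report that the "Converting a data frame from R to Pandas" part of the page "sage/Converting data frames between R and Pandas (Sage)" does not work in Sage 6.3. To check this, I downloaded the following file.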
  • sage-6.3-x86_64-Darwin-OSX_10.7_x86_64-app.dmg

Installing pandas

Open a terminal. I installed Sage-6.3.app in /Applications, so I run the following command to start a Sage shell.
$ /Applications/Sage-6.3.app/Contents/Resources/sage/sage -sh
Starting subshell with Sage environment variables set.  Don't forget
to exit when you are done.  Beware:
 * Do not do anything with other copies of Sage on your system.
 * Do not use this for installing Sage packages using "sage -i" or for
   running "make" at Sage's root directory.  These should be done
   outside the Sage shell.

Bypassing shell configuration files...

Note: SAGE_ROOT=/Applications/Sage-6.3.app/Contents/Resources/sage

 (sage-sh) $ 
Inside this shell, install pandas with easy_install.
 (sage-sh) $ easy_install pandas
Searching for pandas
Reading https://pypi.python.org/simple/pandas/
Best match: pandas 0.14.1

(output omitted)
Installed /Applications/Sage-6.3.app/Contents/Resources/sage/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-macosx-10.7-x86_64.egg
Processing dependencies for pandas
Finished processing dependencies for pandas
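
Before moving on, it can help to confirm that the package is visible to the Python interpreter in use. The following is a minimal sketch, not part of the original worksheet; `installed_version` is a hypothetical helper name, and it only assumes that the module exposes a `__version__` attribute, which pandas does.

```python
def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = __import__(module_name)
    except ImportError:
        # The module is not on this interpreter's import path.
        return None
    # Fall back to "unknown" for modules that do not expose __version__.
    return getattr(module, "__version__", "unknown")


# Prints a version string if pandas is importable, otherwise None.
print(installed_version("pandas"))
```

If this prints None inside the notebook, the install likely went to a different Python than the one Sage runs.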

Reproducing the problem

I reproduced the problem in the notebook with the following steps.
# Install jsonlite
r("install.packages('jsonlite')")
       
NULL
# Install the R Graphics Cookbook data
r("install.packages('gcookbook')")
       
NULL
# Import numpy and pandas
import pandas as pd
import numpy as np
       
# Load jsonlite
r('library(jsonlite)')
       
[1] "jsonlite"  "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"   "base"     
# Load gcookbook
r('library(gcookbook)')
       
[1] "gcookbook" "jsonlite"  "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"  
[9] "base"     
# Fetch data from R in JSON format
# As an example, get the gcookbook sample data from R
test_json = r('toJSON(heightweight, pretty=FALSE)')
       
# Uncomment the line below to inspect the value.
# test_json
       
heightweight = pd.read_json(sageobj(test_json))
heightweight.head()
       
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "_sage_input_13.py", line 10, in <module>
    exec compile(u'open("___code___.py","w").write("# -*- coding: utf-8 -*-\\n" + _support_.preparse_worksheet_cell(base64.b64decode("aGVpZ2h0d2VpZ2h0ID0gcGQucmVhZF9qc29uKHNhZ2VvYmoodGVzdF9qc29uKSkKaGVpZ2h0d2VpZ2h0LmhlYWQoKQ=="),globals())+"\\n"); execfile(os.path.abspath("___code___.py"))
  File "", line 1, in <module>
    
  File "/private/var/folders/jx/7nsrq4lw8xj553006s7bvdb80000gn/T/tmpf7Jqyg/___code___.py", line 2, in <module>
    heightweight = pd.read_json(sageobj(test_json))
  File "/Applications/Sage-6.3.app/Contents/Resources/sage/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-macosx-10.7-x86_64.egg/pandas/io/json.py", line 198, in read_json
    date_unit).parse()
  File "/Applications/Sage-6.3.app/Contents/Resources/sage/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-macosx-10.7-x86_64.egg/pandas/io/json.py", line 266, in parse
    self._parse_no_numpy()
  File "/Applications/Sage-6.3.app/Contents/Resources/sage/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-macosx-10.7-x86_64.egg/pandas/io/json.py", line 483, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None)
TypeError: Expected String or Unicode
sageobj(test_json) 
       
{'_r_class': 'json', 'DATA': '[{"sex":"f","ageYear":11.92,"ageMonth":143,"heightIn":56.3,"weightLb":85},{"sex":"f","ageYear":12.92,"ageMonth":155,"heightIn":62.3,"weightLb":105},{"sex":"f","ageYear":12.75,"ageMonth":153,"heightIn":63.3,"weightLb":108},{"sex":"f","ageYear":13.42,"ageMonth":161,"heightIn":59,"weightLb":92},{"sex":"f","ageYear":15.92,"ageMonth":191,"heightIn":62.5,"weightLb":112.5},
(output omitted)
{"sex":"m","ageYear":12.58,"ageMonth":151,"heightIn":59.3,"weightLb":87}]'}

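Cause

pd.read_json expects a string, but the output above shows that sageobj(test_json) returns a dict, with the JSON data stored under the 'DATA' key. I therefore change the code to extract that value and pass it to pd.read_json.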
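
The dict-versus-string mismatch can be sketched outside Sage as follows. This is an illustration only, written for plain Python 3 rather than the Sage 6.3 notebook: `json_text_from_sageobj` is a hypothetical helper, the `converted` dict is a hand-made stand-in for `sageobj(test_json)` shortened to two records, and the standard `json` module stands in for `pd.read_json`.

```python
import json


def json_text_from_sageobj(obj):
    """Return the JSON text held by a converted R object.

    Hypothetical helper: accepts either a plain string or a dict shaped
    like {'_r_class': 'json', 'DATA': '...'}.
    """
    if isinstance(obj, dict) and "DATA" in obj:
        return obj["DATA"]
    if isinstance(obj, str):
        return obj
    raise TypeError("expected a JSON string or a dict with a 'DATA' key")


# Stand-in for sageobj(test_json), shortened to the first two records.
converted = {
    "_r_class": "json",
    "DATA": '[{"sex":"f","ageYear":11.92,"ageMonth":143,"heightIn":56.3,"weightLb":85},'
            '{"sex":"f","ageYear":12.92,"ageMonth":155,"heightIn":62.3,"weightLb":105}]',
}

# Parsing succeeds once the text under 'DATA' is extracted.
records = json.loads(json_text_from_sageobj(converted))
print(records[0]["heightIn"])
```

In the notebook, the same extracted string would be handed to pd.read_json instead of json.loads.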

sageobj(test_json)['DATA'] 
       
'[{"sex":"f","ageYear":11.92,"ageMonth":143,"heightIn":56.3,"weightLb":85},{"sex":"f","ageYear":12.92,"ageMonth":155,"heightIn":62.3,"weightLb":105},{"sex":"f","ageYear":12.75,"ageMonth":153,"heightIn":63.3,"weightLb":108},{"sex":"f","ageYear":13.42,"ageMonth":161,"heightIn":59,"weightLb":92},{"sex":"f","ageYear":15.92,"ageMonth":191,"heightIn":62.5,"weightLb":112.5},{"sex":"f","ageYear":14.25,"ageMonth":171,"heightIn":62.5,"weightLb":112},
(output omitted)
{"sex":"m","ageYear":13.92,"ageMonth":167,"heightIn":62,"weightLb":107.5},{"sex":"m","ageYear":12.58,"ageMonth":151,"heightIn":59.3,"weightLb":87}]'
heightweight = pd.read_json(sageobj(test_json)['DATA'])
heightweight.head()
       
   ageMonth  ageYear  heightIn sex  weightLb
0       143    11.92      56.3   f      85.0
1       155    12.92      62.3   f     105.0
2       153    12.75      63.3   f     108.0
3       161    13.42      59.0   f      92.0
4       191    15.92      62.5   f     112.5