How to read "\n\n" in the Python module pandas?
There is a data file that has \n\n at the end of every line.
http://pan.baidu.com/s/1o6jq5q6
System: Win7 + Python 3.3 + R 3.0.3
In R:
    sessionInfo()
    [1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936
    [2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
    [3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
    [4] LC_NUMERIC=C
    [5] LC_TIME=Chinese (Simplified)_People's Republic of China.936
In Python: chcp 936
I can read the file in R:
    read.table("test.pandas", sep=",", header=TRUE)
It is simple.
I can also read it in plain Python and get the same output:
    fr = open("g:\\test.pandas", "r", encoding="gbk").read()
    # Drop the empty lines produced by the doubled line terminators.
    data = [x for x in fr.splitlines() if x.strip() != ""]
    for id, char in enumerate(data):
        print(str(id) + "," + char)
When I read it with the Python module pandas:
    import pandas as pd
    pd.read_csv("test.pandas", sep=",", encoding="gbk")
I found two problems in the output:
1) How do I make the columns right-aligned? (I have asked about this problem in another post: "How to set alignment in pandas in Python with non-ANSI characters".)
2) There is a NaN line for every line of real data.
Can I improve my pandas code to get a better display in the console?
Your file, when read with open('test.pandas', 'rb'), seems to contain '\r\r\n' line terminators. Python 3.3 seems to convert these to '\n\n', while Python 2.7 converts them to '\r\n', when the file is read with open('test.pandas', 'r', encoding='gbk').
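To see the translation for yourself, here is a minimal sketch for Python 3. The sample row and the temporary file are made up for illustration; only the '\r\r\n' terminator comes from the file in the question.

```python
import os
import tempfile

# Hypothetical sample that mimics the '\r\r\n' terminators described above.
raw = "name,birthday\r\r\nhuangtianhui,1948/05/28\r\r\n".encode("gbk")

# Write the bytes to a temporary file so it can be reopened in both modes.
with tempfile.NamedTemporaryFile(delete=False, suffix=".pandas") as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Binary mode shows the terminators exactly as stored on disk.
    with open(path, "rb") as f:
        as_bytes = f.read()
    # Text mode applies universal-newline translation:
    # the lone '\r' and the '\r\n' each become '\n'.
    with open(path, "r", encoding="gbk") as f:
        as_text = f.read()
finally:
    os.remove(path)

print(repr(as_bytes))
print(repr(as_text))
```

The second repr should show two '\n' characters after each row, which is where the empty line after every data line comes from.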
pandas.read_csv does have a lineterminator parameter, but it only accepts single-character terminators.
What you can do is process the file a bit before passing it to pandas.read_csv(), and you can use StringIO, which wraps a string buffer in a file interface, so that you don't need to write out a temporary file first.
    import pandas as pd
    from io import StringIO

    with open('test.pandas', 'r', encoding='gbk') as in_file:
        # Collapse the doubled newlines into single ones.
        contents = in_file.read().replace('\n\n', '\n')

    df = pd.read_csv(StringIO(contents))
(I don't have the GBK charset for the output below.)
    >>> df[0:10]
      ??????? ??? ????????
    0 huangtianhui ?? 1948/05/28
    1 ?????? ? 1952/03/27
    2 ??? ? 1994/12/09
    3 luiching ? 1969/08/02
    4 ???? ?? 1982/03/01
    5 ???? ?? 1983/08/03
    6 yangjiabao ? 1988/08/25
    7 ?????????????? ?? 1979/07/10
    8 ?????? ? 1949/10/20
    9 ???»? ? 1951/10/21
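A possible variation, offered as a sketch and not as the only way to do it: opening the file with newline="" turns off Python 3's newline translation, so the '\r\r\n' terminator can be replaced directly. The helper name load_normalized, the column names and the temporary file are hypothetical. csv.reader stands in for the parser here so that the example runs without pandas installed.

```python
import csv
import io
import os
import tempfile


def load_normalized(path, encoding="gbk"):
    # Hypothetical helper: returns a file-like buffer with single '\n' terminators.
    # newline="" disables newline translation, so '\r\r\n' is read untouched.
    with open(path, "r", encoding=encoding, newline="") as in_file:
        contents = in_file.read()
    return io.StringIO(contents.replace("\r\r\n", "\n"))


# Made-up sample file using the '\r\r\n' terminators from the question.
sample = "name,birthday\r\r\nhuangtianhui,1948/05/28\r\r\nluiching,1969/08/02\r\r\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".pandas") as tmp:
    tmp.write(sample.encode("gbk"))
    path = tmp.name

try:
    buffer = load_normalized(path)
finally:
    os.remove(path)

# Parse the cleaned buffer; no empty rows should remain.
rows = list(csv.reader(buffer))
print(rows)
```

The buffer returned by load_normalized() is a file-like object, so it can be passed to pd.read_csv() in place of a file name.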
In Python 2.7, StringIO() is in the module StringIO instead of io.
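If the same script has to run on both versions, one common pattern is a fallback import. This is only a sketch, and the sample string is made up.

```python
try:
    from StringIO import StringIO  # Python 2.7: module StringIO
except ImportError:
    from io import StringIO  # Python 3: module io

# Wrap a small made-up string in a file-like buffer and read it back.
buffer = StringIO("a,b\n1,2\n")
lines = buffer.read().splitlines()
print(lines)
```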