How to set alignment in pandas in Python with non-ANSI characters
When I read the data (deaths in the M370 air crash) in R, the format is fine.
> read.csv("g:\\test.ansi",sep=",")
      乘客姓名 性别   出生日期
1 huangtianhui   男 1948/05/28
2       姜翠云   女 1952/03/27
3       李红晶   女 1994/12/09
4     luiching   女 1969/08/02
5       宋飞飞   男 1982/03/01
6       唐旭东   男 1983/08/03
7   yangjiabao   女 1988/08/25
When I read the data in Python, how can I make the records right-aligned?
>>> import pandas as pd
>>> pd.read_csv("g:\\test.ansi",sep=",")
           乘客姓名  性别        出生日期
0  huangtianhui   男  1948/05/28
1           姜翠云   女  1952/03/27
2           李红晶   女  1994/12/09
3      luiching   女  1969/08/02
4           宋飞飞   男  1982/03/01
5           唐旭东   男  1983/08/03
6    yangjiabao   女  1988/08/25
7      买买提江·阿布拉   男  1979/07/10
The data is here: http://pan.baidu.com/s/1sjhaul3
The reason is that when dealing with Chinese characters (each of which takes the space of 2 ANSI characters), pandas still pads with the same amount of white space as it would for ANSI characters. That means the number of white spaces is only half of what is needed for a df containing Chinese characters. What makes the situation worse is that pandas ignores the fact that each Chinese character takes twice the space:
print(pd.read_csv("test.ansi", sep=",", encoding='gb18030').loc[10:12])
10  边亮京  男  1987/06/06
11  边茂勤  女  1947/07/19
12   曹蕊  女  1982/02/19
# Notice how the last line is missing 1 leading white space compared to the preceding lines.
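To make the mismatch concrete, here is a minimal sketch that compares the character count of a name with the number of terminal columns it occupies. The helper name display_width is hypothetical, and it assumes a terminal that draws East Asian wide and fullwidth characters in two columns.

```python
import unicodedata


def display_width(text):
    """Approximate terminal columns: wide/fullwidth characters count as 2."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


# Padding computed from len() is too short for the Chinese names,
# because each of their characters occupies two columns on screen.
for name in ["边亮京", "曹蕊", "luiching"]:
    print(name, "chars:", len(name), "columns:", display_width(name))
```

Padding based on len() therefore under-allocates one column for every Chinese character in a cell.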
Ultimately, under the hood it comes down to the __unicode__ method of the DataFrame class, which allocates spaces according to _repr_fit_horizontal_. I am not sure what the best solution may be. Using 2 spaces instead of 1 anywhere a Chinese character is encountered? That is not a good idea when mixing rows with and without Chinese characters, such as in your DataFrame.
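One possible workaround, sketched here outside of pandas, is to pad each cell by its display width rather than by its character count, which also handles a mix of rows with and without Chinese characters. The helpers display_width and rjust_display are hypothetical names, not pandas APIs, and the sketch assumes wide characters occupy exactly two columns.

```python
import unicodedata


def display_width(text):
    """Approximate terminal columns: wide/fullwidth characters count as 2."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def rjust_display(text, width):
    """Right-justify text to `width` terminal columns (not characters)."""
    return " " * max(width - display_width(text), 0) + text


names = ["huangtianhui", "姜翠云", "曹蕊"]
target = max(display_width(n) for n in names)

# Every printed line now ends in the same terminal column.
for n in names:
    print(rjust_display(n, target))
```

This only aligns correctly in a monospaced font whose CJK glyphs are exactly twice as wide as the Latin ones.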
Maybe it is worthwhile to report it as a bug.
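For what it is worth, newer pandas releases expose a display option, display.unicode.east_asian_width, that makes the text output account for wide characters. The sketch below assumes such a pandas version is installed and uses a small hand-built frame in place of the original file.

```python
import pandas as pd

# A small stand-in for the CSV file, mixing Latin and Chinese names.
df = pd.DataFrame(
    {
        "乘客姓名": ["huangtianhui", "姜翠云", "曹蕊"],
        "性别": ["男", "女", "女"],
    }
)

# Enable width-aware padding only inside this block, leaving the
# global display settings untouched.
with pd.option_context("display.unicode.east_asian_width", True):
    aligned = df.to_string()

print(aligned)
```

The pandas documentation notes that this option makes rendering slower than the default, so it may be best to enable it only where needed.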
But if you use the IPython notebook, you are less affected by this issue, as DataFrames are nicely displayed as HTML.