Column mismatch writing a Python pandas DataFrame to CSV on Unix vs. Windows
I have created a Python pandas DataFrame object and am trying to write it out to CSV. The following command works on Windows, but when I run the exact same code on the exact same data on Unix, the column headers do not align with the columns after writing out the CSV. The df looks fine within Python when running from the command line (i.e., df["somecolumn"] shows me what I expect). Any ideas?
    df.to_csv(str(outfile), sep=",", header=True, index=False, na_rep="na", cols=firstcols)

Edit: here is a look at the inputs, which I would like to merge on the first column, "var":
    > more infile*
    ::::::::::::::
    infile1.tsv
    ::::::::::::::
    var             chrom  pos    ref   alt  p       ipu  irf         iuc    ign
    1:12892:tgg:t   1      12892  tgg   t    0.1383  .    intergenic  ncrna  none(dist=none)
    1:14397:ctgt:c  1      14397  ctgt  c    0.5863  .    ncrna       ncrna  wash7p
    1:17084:ggt:g   1      17084  ggt   g    0.2337  .    ncrna       ncrna  wash7p
    1:17421:atg:a   1      17421  atg   a    0.1089  .    ncrna       ncrna  wash7p
    ::::::::::::::
    infile2.tsv
    ::::::::::::::
    var            chrom  pos    ref  alt  p       ipu  irf    iuc    ign
    1:14567:g:gat  1      14567  g    gat  0.1299  .    ncrna  ncrna  wash7p
    1:14670:tg:t   1      14670  tg   t    0.1319  .    ncrna  ncrna  wash7p
    1:14745:ggc:g  1      14745  ggc  g    0.1462  .    ncrna  ncrna  wash7p
    1:14905:ga:g   1      14905  ga   g    0.1307  .    ncrna  ncrna  wash7p
    ::::::::::::::
    infile3.tsv
    ::::::::::::::
    var              chrom  pos       ref  alt  ac     af    pu          rfg  gi
    21:10862612:g:a  21     10862612  g    a    3      0.00  intergenic  none(dist=none),none(dist=none),tekt4p2(dist=894019),tpte(dist=44131),ak311573(dist=265170),tpte(dist=44131),ensg00000169861
    21:10862618:t:c  21     10862618  t    c    14183  0.65  intergenic  none(dist=none),none(dist=none),tekt4p2(dist=894025),tpte(dist=44125),ak311573(dist=265176),tpte(dist=44125),ensg00000169861
    21:10862623:t:c  21     10862623  t    c    1      0.00  intergenic  none(dist=none),none(dist=none),tekt4p2(dist=894030),tpte(dist=44120),ak311573(dist=265181),tpte(dist=44120),ensg00000169861:enst0000030209

And here is the Python script:
    import csv
    import pandas
    import glob
    from glob import iglob

    inpath = '*.tsv'
    outfile = "merged.out"

    # read every tab-separated input and stack the rows
    merged = pandas.concat([pandas.read_csv(f, sep='\t', parse_dates=False) for f in glob.iglob(inpath)], axis=0)
    dfcols = merged.columns.tolist()
    firstcols = ['var', 'chrom', 'pos', 'ref', 'alt']  # preserve order of first 5 columns
    for d in set(dfcols):
        if d not in firstcols:
            firstcols.append(d)

    merged.to_csv(str(outfile), sep="\t", header=True, index=False, na_rep="na", cols=firstcols)

Here is what some of the truncated output looks like, to illustrate how the headers do not match:
    > more merged.out
    var  chrom  pos  ref  alt  ac               af  iuc         pu     p       rfg    irf  ipu   ign  gi
    na   na     t    1    na   none(dist=none)  .   intergenic  ncrna  0.1383  12892  na   tgg   na   1:12892:tgg:t
    na   na     c    1    na   wash7p           .   ncrna       ncrna  0.5863  14397  na   ctgt  na   1:14397:ctgt:c

However, the columns look like they are annotated correctly within the Python environment. I'm stumped.
    >>> merged['var']
    0      1:12892:tgg:t
    1     1:14397:ctgt:c
    2      1:17084:ggt:g
    3      1:17421:atg:a
    0      1:14567:g:gat
    1       1:14670:tg:t
    2      1:14745:ggc:g
    3       1:14905:ga:g
    0    21:10862612:g:a
    1    21:10862618:t:c
    2    21:10862623:t:c
    3    21:10862624:g:t
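
One thing the output above seems to show: the header row follows the order of firstcols, while the data rows look like they are in alphabetical column order (ac, af, alt, chrom, ...). That would be consistent with the column list being applied to the header but not to the rows on the Unix installation, though that is only a guess from the symptom. A minimal sketch of a workaround, assuming the two machines run different pandas versions, is to reorder the DataFrame itself before writing, so that to_csv never has to reorder anything. The helper names ordered_columns and write_reordered are hypothetical, the two one-row frames are stand-ins built from values in the inputs above, and the sketch also drops set() so the trailing column order no longer depends on set iteration order.

```python
import io

import pandas as pd


def ordered_columns(df, first):
    """Return `first` followed by the remaining columns in their existing order."""
    rest = [c for c in df.columns if c not in first]
    return list(first) + rest


def write_reordered(df, first, out, sep="\t", na_rep="na"):
    """Reorder the frame itself, then write it, so header and rows share one order."""
    cols = ordered_columns(df, first)
    # Selecting with a list of labels returns the columns in exactly that order.
    df[cols].to_csv(out, sep=sep, header=True, index=False, na_rep=na_rep)
    return cols


# Stand-in frames (one row each, values taken from infile1.tsv and infile3.tsv).
a = pd.DataFrame({"var": ["1:12892:tgg:t"], "chrom": [1], "pos": [12892],
                  "ref": ["tgg"], "alt": ["t"], "p": [0.1383]})
b = pd.DataFrame({"var": ["21:10862618:t:c"], "chrom": [21], "pos": [10862618],
                  "ref": ["t"], "alt": ["c"], "ac": [14183]})
merged = pd.concat([a, b], axis=0)

buf = io.StringIO()  # in the real script this would be the output path
cols = write_reordered(merged, ["var", "chrom", "pos", "ref", "alt"], buf)
```

If this writes aligned output on both machines, the difference is probably in how each pandas version handles the cols argument. I believe later pandas releases renamed that keyword to columns, so comparing pandas.__version__ on the two systems would be a reasonable next check.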