Column mismatch writing a Python pandas DataFrame to CSV in Unix vs Windows


I have created a Python pandas DataFrame object and am trying to write it out to CSV. The following command works in Windows, but when I run the exact same code on the exact same data in Unix, the column headers do not align with the columns after writing out the CSV. The df looks fine within Python when running from the command line (i.e., df["somecolumn"] shows me what I expect). Any ideas?

df.to_csv(str(outfile), sep=",", header=True, index=False, na_rep="na", cols=firstcols)
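Since the same call behaves differently on the two machines, one thing worth comparing is the software environment on each. The snippet below is only a diagnostic sketch; the helper name environment_summary is hypothetical and not part of the original script.

```python
import platform
import sys

import pandas


def environment_summary():
    """Collect the platform, Python version and pandas version as strings."""
    return {
        "platform": platform.system(),
        "python": sys.version.split()[0],
        "pandas": pandas.__version__,
    }


# Run this on both the Windows and the Unix machine and compare the results.
print(environment_summary())
```

If the pandas versions differ, keyword arguments of to_csv may not be handled the same way on both machines.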

Edit: here is a look at the inputs I'd like to merge by the first column, "var":

> more infile*
::::::::::::::
infile1.tsv
::::::::::::::
var     chrom   pos     ref     alt     p       ipu     irf     iuc     ign
1:12892:tgg:t   1       12892   tgg     t       0.1383  .       intergenic      ncrna   none(dist=none)
1:14397:ctgt:c  1       14397   ctgt    c       0.5863  .       ncrna   ncrna   wash7p
1:17084:ggt:g   1       17084   ggt     g       0.2337  .       ncrna   ncrna   wash7p
1:17421:atg:a   1       17421   atg     a       0.1089  .       ncrna   ncrna   wash7p
::::::::::::::
infile2.tsv
::::::::::::::
var     chrom   pos     ref     alt     p       ipu     irf     iuc     ign
1:14567:g:gat   1       14567   g       gat     0.1299  .       ncrna   ncrna   wash7p
1:14670:tg:t    1       14670   tg      t       0.1319  .       ncrna   ncrna   wash7p
1:14745:ggc:g   1       14745   ggc     g       0.1462  .       ncrna   ncrna   wash7p
1:14905:ga:g    1       14905   ga      g       0.1307  .       ncrna   ncrna   wash7p
::::::::::::::
infile3.tsv
::::::::::::::
var     chrom   pos     ref     alt     ac      af      pu      rfg     gi
21:10862612:g:a 21      10862612 g      a       3       0.00            intergenic      none(dist=none),none(dist=none),tekt4p2(dist=894019),tpte(dist=44131),ak311573(dist=265170),tpte(dist=44131),ensg00000169861
21:10862618:t:c 21      10862618 t      c       14183   0.65            intergenic      none(dist=none),none(dist=none),tekt4p2(dist=894025),tpte(dist=44125),ak311573(dist=265176),tpte(dist=44125),ensg00000169861
21:10862623:t:c 21      10862623 t      c       1       0.00            intergenic      none(dist=none),none(dist=none),tekt4p2(dist=894030),tpte(dist=44120),ak311573(dist=265181),tpte(dist=44120),ensg00000169861:enst0000030209

And here is the Python script:

import csv
import pandas
import glob
from glob import iglob

inpath = '*.tsv'
outfile = "merged.out"

merged = pandas.concat([pandas.read_csv(f, sep='\t', parse_dates=False) for f in glob.iglob(inpath)], axis=0)

dfcols = merged.columns.tolist()
firstcols = ['var', 'chrom', 'pos', 'ref', 'alt']  # preserve order of first 5 columns

for d in set(dfcols):
    if d not in firstcols:
        firstcols.append(d)

merged.to_csv(str(outfile), sep="\t", header=True, index=False, na_rep="na", cols=firstcols)
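One detail of the script worth isolating: iterating over set(dfcols) gives no guaranteed order, so the trailing columns can come out in a different order from one run or platform to the next. This is a minimal sketch of a deterministic alternative, assuming the goal is "the five key columns first, then everything else in the frame's existing order". The helper name order_columns is hypothetical and the column list is a shortened stand-in.

```python
def order_columns(all_cols, first_cols):
    """Return first_cols (those present) followed by the remaining columns in their existing order."""
    leading = [c for c in first_cols if c in all_cols]
    trailing = [c for c in all_cols if c not in first_cols]
    return leading + trailing


# Shortened stand-in for merged.columns.tolist()
cols = order_columns(
    ['ac', 'af', 'alt', 'chrom', 'gi', 'p', 'pos', 'ref', 'var'],
    ['var', 'chrom', 'pos', 'ref', 'alt'],
)
print(cols)
```

Because no set is involved, the resulting list is the same on every run.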

Here is some of the truncated output, to illustrate how the headers do not match the columns:

> more merged.out
var     chrom   pos     ref     alt     ac      af      iuc     pu      p       rfg     irf     ipu     ign     gi
na      na      t       1       na      none(dist=none) .       intergenic      ncrna   0.1383  12892   na      tgg     na      1:12892:tgg:t
na      na      c       1       na      wash7p  .       ncrna   ncrna   0.5863  14397   na      ctgt    na      1:14397:ctgt:c

However, the columns look like they are annotated correctly within the Python environment. I'm stumped.

>>> merged['var']
0      1:12892:tgg:t
1     1:14397:ctgt:c
2      1:17084:ggt:g
3      1:17421:atg:a
0      1:14567:g:gat
1       1:14670:tg:t
2      1:14745:ggc:g
3       1:14905:ga:g
0    21:10862612:g:a
1    21:10862618:t:c
2    21:10862623:t:c
3    21:10862624:g:t
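In the merged.out excerpt, the data values appear to follow alphabetical column order (ac, af, alt, chrom, ...) while the header follows firstcols. One way to test whether the column-selection argument of to_csv is involved is to reorder the frame itself and write it without that argument. This is a sketch under the assumption that reordering the frame is acceptable; the two small frames are made-up stand-ins for the real *.tsv inputs, and the output goes to an in-memory buffer instead of merged.out.

```python
import io

import pandas as pd

# Small stand-in frames with partly different columns, like the real inputs.
a = pd.DataFrame({'var': ['1:12892:tgg:t'], 'chrom': [1], 'pos': [12892], 'p': [0.1383]})
b = pd.DataFrame({'var': ['21:10862618:t:c'], 'chrom': [21], 'pos': [10862618], 'ac': [14183]})

merged = pd.concat([a, b], axis=0)

firstcols = ['var', 'chrom', 'pos']
ordered = firstcols + [c for c in merged.columns if c not in firstcols]

# Reorder the frame itself so that header and data come from the same column order,
# then write without any column-selection argument.
buf = io.StringIO()
merged[ordered].to_csv(buf, sep="\t", header=True, index=False, na_rep="na")

# Read the text back and check that each header sits over the right values.
check = pd.read_csv(io.StringIO(buf.getvalue()), sep="\t", na_values=["na"])
print(check)
```

If the read-back frame matches on both machines, the mismatch is likely tied to how the installed pandas version handles the column argument, not to the data.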

