python - How to obtain random access of a gzip compressed file


According to this FAQ on zlib.net, it is possible to:

access data randomly in a compressed stream

I know about the module Bio.bgzf of Biopython 1.60, which:

supports reading and writing BGZF files (Blocked GNU Zip Format), a variant of GZIP with efficient random access, most commonly used as part of the BAM file format and in tabix. It uses Python’s zlib library internally, and provides a simple interface like Python’s gzip library.

But for my use case I don't want to use that format. Basically, I want something that emulates the code below:

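    import gzip

    # Uncompressed offset to jump to (assumed to be the start of a line)
    large_integer_new_line_start = 10**9

    with gzip.open('large_file.gz', 'rt') as f:
        # Works, but gzip has to decompress everything before the target
        f.seek(large_integer_new_line_start)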

But with the efficiency that the native zlib from zlib.net offers for random access to a compressed stream. How do I leverage that random access capability in Python?

I gave up on doing random access on the gzipped file using Python. Instead, I converted my gzipped file into a block gzipped file with a block compression/decompression utility on the command line:

zcat large_file.gz | bgzip > large_file.bgz 

Then I used Biopython and tell() to get the virtual_offset of line number 1 million of the bgzipped file. After that, I was able to rapidly seek to that virtual_offset:

    from Bio import bgzf

    file = 'large_file.bgz'

    # First pass: read one million lines, then record where we are
    handle = bgzf.BgzfReader(file)
    for i in range(10**6):
        handle.readline()
    virtual_offset = handle.tell()
    line1 = handle.readline()
    handle.close()

    # Second pass: jump straight to the recorded virtual offset
    handle = bgzf.BgzfReader(file)
    handle.seek(virtual_offset)
    line2 = handle.readline()
    handle.close()

    assert line1 == line2
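A note on what virtual_offset is: it is not a plain byte position. In BGZF, as far as I understand the format, a virtual offset packs the start of the compressed block (upper 48 bits) and the position inside that block's decompressed data (lower 16 bits) into one integer. The sketch below only illustrates that arithmetic. The two function names are chosen for this example; I believe Bio.bgzf ships similar helpers, but treat that as an assumption and check the Biopython documentation.

```python
def make_virtual_offset(block_start, within_block):
    """Pack a BGZF block start and an in-block offset into one integer."""
    if not 0 <= within_block < (1 << 16):
        raise ValueError("within_block must fit in 16 bits")
    if not 0 <= block_start < (1 << 48):
        raise ValueError("block_start must fit in 48 bits")
    return (block_start << 16) | within_block


def split_virtual_offset(virtual_offset):
    """Return (block_start, within_block) for a BGZF virtual offset."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF
```

This is why seeking is fast: the reader can jump to the compressed block in the file, decompress only that one block (at most 64 KiB of data), and skip to the right position inside it.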

I would also like to point to the SO answer by Mark Adler here, on examples/zran.c in the zlib distribution.
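For anyone who wants to stay with plain gzip in Python, the access-point idea can be approximated with the standard zlib module alone. This is a minimal sketch and not a port of zran.c: instead of saving a 32 KiB window at deflate block boundaries, it keeps in-memory copies of the decompressor state (via Decompress.copy()) at roughly regular intervals. The names build_index, read_at, CHUNK and span are made up for this example, and it assumes a single-member gzip file opened in binary mode.

```python
import bisect
import zlib

CHUNK = 16 * 1024  # compressed bytes fed to zlib per step


def build_index(raw, span=1 << 20):
    """Scan a gzip stream once and save a checkpoint about every `span`
    uncompressed bytes: (uncompressed_pos, compressed_pos, decompressor)."""
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)  # expect gzip header/trailer
    points = [(0, 0, d.copy())]
    out_pos = in_pos = 0
    raw.seek(0)
    while True:
        chunk = raw.read(CHUNK)
        if not chunk:
            break
        # Without max_length, all of `chunk` is consumed here
        out_pos += len(d.decompress(chunk))
        in_pos += len(chunk)
        if d.eof:
            break
        if out_pos - points[-1][0] >= span:
            points.append((out_pos, in_pos, d.copy()))
    return points


def read_at(raw, points, offset, size):
    """Return `size` uncompressed bytes starting at uncompressed `offset`."""
    keys = [p[0] for p in points]
    out_pos, in_pos, saved = points[bisect.bisect_right(keys, offset) - 1]
    d = saved.copy()  # keep the stored checkpoint reusable
    raw.seek(in_pos)
    skip = offset - out_pos
    buf = bytearray()
    while len(buf) < skip + size and not d.eof:
        chunk = raw.read(CHUNK)
        if not chunk:
            break
        buf += d.decompress(chunk)
    return bytes(buf[skip:skip + size])
```

Usage would be something like opening large_file.gz with open(..., 'rb'), calling build_index once, and then calling read_at(f, points, 10**9, 100) as often as needed. The first scan is still linear, and each checkpoint holds a full decompressor state (tens of kilobytes), so span trades memory for seek speed. Unlike a zran.c style index, these checkpoints cannot be written to disk, so the index has to be rebuilt for every process.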

