python - Data size in memory vs. on disk
How does the RAM required to store data in memory compare to the disk space required to store the same data in a file? Or is there no generalized correlation?
For example, say I have a billion floating-point values. Stored in binary form, that would be 4 billion bytes, or 3.7 GB, on disk (not including headers and such). If I read those values into a list in Python, how much RAM should I expect that to require?
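A rough back-of-the-envelope estimate is possible with `sys.getsizeof`. This sketch (the value `0.1` and the count `n` are arbitrary; the exact numbers are CPython-specific and assume a 64-bit build) shows why a Python list needs several times the raw disk footprint: each value becomes a full float object, plus a pointer slot in the list.

```python
import sys

# Rough, CPython-specific sketch (64-bit build assumed): RAM needed by a
# Python list of floats vs. the raw 4-byte single-precision values on disk.
n = 1_000_000
values = [0.1] * n  # n pointers to one float here; a real file read gives n distinct objects

per_float = sys.getsizeof(0.1)         # one float object (typically 24 bytes on 64-bit CPython)
list_overhead = sys.getsizeof(values)  # the list itself: header plus ~8 bytes per pointer

total = list_overhead + n * per_float  # upper-bound estimate for n distinct floats
print(per_float)       # bytes per float object
print(total / n)       # bytes per value in a list, vs. 4 bytes per value on disk
```

On a 64-bit CPython this works out to roughly 32 bytes per value, i.e. around 8x the 4-byte binary representation the question describes.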
Python object data size
If the data is stored in a Python object, there will be a little more memory attached to the actual data.
This can easily be tested.
It is interesting to note how, at first, the overhead of the Python object is significant for small data, but quickly becomes negligible.
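The shrinking overhead can be seen directly with `array.array`, whose fixed header is large relative to a handful of doubles but vanishes relative to a big payload. A small sketch (the element counts are arbitrary; exact byte counts are CPython-specific):

```python
import array
import sys

# The container's fixed overhead dominates for tiny data but becomes
# negligible as the payload grows (byte counts assume CPython).
for n in (1, 10, 100_000):
    a = array.array('d', [0.0] * n)
    raw = n * a.itemsize                # raw payload: 8 bytes per double
    overhead = sys.getsizeof(a) - raw   # header/bookkeeping bytes
    print(n, raw, overhead)
```

For n = 1 the overhead is several times the payload; for n = 100,000 it is a fraction of a percent, which is why the curves in the plot below converge toward the raw size.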
Here is the IPython code used to generate the plot:
%matplotlib inline
import random
import sys
import array
import matplotlib.pyplot as plt

max_doubles = 10000

raw_size = []
array_size = []
string_size = []
list_size = []
set_size = []
tuple_size = []
size_range = range(max_doubles)

# Measure each container type at every size
for n in size_range:
    double_array = array.array('d', [random.random() for _ in range(n)])
    double_string = double_array.tobytes()
    double_list = double_array.tolist()
    double_set = set(double_list)
    double_tuple = tuple(double_list)

    raw_size.append(double_array.buffer_info()[1] * double_array.itemsize)
    array_size.append(sys.getsizeof(double_array))
    string_size.append(sys.getsizeof(double_string))
    list_size.append(sys.getsizeof(double_list))
    set_size.append(sys.getsizeof(double_set))
    tuple_size.append(sys.getsizeof(double_tuple))

# Display
plt.figure(figsize=(10, 8))
plt.title('The size of data in various forms', fontsize=20)
plt.xlabel('Data size (doubles, 8 bytes each)', fontsize=15)
plt.ylabel('Memory size (bytes)', fontsize=15)
plt.loglog(
    size_range, raw_size,
    size_range, array_size,
    size_range, string_size,
    size_range, list_size,
    size_range, set_size,
    size_range, tuple_size
)
plt.legend(['raw (disk)', 'array', 'bytes', 'list', 'set', 'tuple'],
           fontsize=15, loc='best')