Python - Split value into bins based on time


I'm working in Python, modifying New York City subway turnstile data to turn it into a visualization of entrances/exits at each station.

So far I have a list of entrance/exit counts based on start (03-24-15) and end (03-27-15) dates:

    { 'endtime': '03-25-14T21:40:30',
      'entriesduringperiod': 158,
      'exitsduringperiod': 597,
      'starttime': '03-25-14T17:03:23' },
    { 'endtime': '03-26-14T01:00:00',
      'entriesduringperiod': 29,
      'exitsduringperiod': 235,
      'starttime': '03-25-14T21:00:00' },

The problem is that the time periods are different, not standardized, and overlap. I'd like to be able to go through them and create a list that normalizes these numbers into 1-hour increments.

I'm not familiar with Python time processing, and was wondering if anyone could provide information on how to get started taking the strings, converting them to date objects, and dividing the values based on time.

The final visualization will be rendered using d3.js, if that matters.

Getting the strings into datetime objects isn't bad:

    from datetime import datetime

    def get_datetime(instr):
        # Format: month-day-two-digit year, a literal 'T', then 24-hour time
        return datetime.strptime(instr, '%m-%d-%yT%H:%M:%S')

    # e.g. get_datetime('03-25-14T21:20:30') => datetime.datetime(2014, 3, 25, 21, 20, 30)

Binning / normalizing the data largely depends on how you want to handle overlapping durations... e.g. do you want to assume people arrived & exited in a linear fashion, so that if the timestamps span an hour and a half, 66% go to the full hour and 33% to the other partial hour?
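To make that linear assumption concrete, here's a minimal sketch of the split for a 90-minute period. The helper name `hourly_fractions` is made up for illustration, and it assumes the period's counts are spread evenly over its duration:

```python
from datetime import datetime, timedelta

def hourly_fractions(start, end):
    """Return (hour_start, fraction) pairs for the period from start to end."""
    total = (end - start).total_seconds()
    # Floor the start time to the top of its hour
    hour = start.replace(minute=0, second=0, microsecond=0)
    out = []
    while hour < end:
        next_hour = hour + timedelta(hours=1)
        # Part of the period that falls inside this hour bin
        overlap = min(next_hour, end) - max(hour, start)
        out.append((hour, overlap.total_seconds() / total))
        hour = next_hour
    return out

# A 90-minute period: one full hour plus a trailing half hour
fractions = hourly_fractions(datetime(2014, 3, 25, 17, 0),
                             datetime(2014, 3, 25, 18, 30))
for hour, frac in fractions:
    print(hour.strftime('%H:%M'), round(frac, 3))
# 17:00 0.667
# 18:00 0.333
```

Multiplying each fraction by the period's entry or exit count gives that hour's share.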

Edit: based on the OP's comment, here's totally functional code:

    from datetime import datetime, timedelta
    from collections import defaultdict

    def add_datum(dd, v):
        end_dt = get_datetime(v['endtime'])
        start_dt = get_datetime(v['starttime'])
        total_duration = end_dt - start_dt

        # Top of the hour containing the start time
        hour_start = datetime(year=start_dt.year,
                              month=start_dt.month,
                              day=start_dt.day,
                              hour=start_dt.hour)
        hour_end = hour_start + timedelta(hours=1)

        while hour_start < end_dt:
            # Overlap between this hour bin and the reported period
            dt = min([hour_end, end_dt]) - max([hour_start, start_dt])
            fraction = 1.0 * dt.total_seconds() / total_duration.total_seconds()
            dd[hour_start]['hour'] = hour_start
            dd[hour_start]['entries'] += v['entriesduringperiod'] * fraction
            dd[hour_start]['exits'] += v['exitsduringperiod'] * fraction

            hour_start = hour_end
            hour_end = hour_end + timedelta(hours=1)
        return dd

    dd = defaultdict(lambda: {'entries': 0, 'exits': 0})
    all_data = [{ 'endtime': '03-25-14T21:40:30',
                  'entriesduringperiod': 158,
                  'exitsduringperiod': 597,
                  'starttime': '03-25-14T17:03:23' },
                { 'endtime': '03-26-14T01:00:00',
                  'entriesduringperiod': 29,
                  'exitsduringperiod': 235,
                  'starttime': '03-25-14T21:00:00' }]

    for v in all_data:
        add_datum(dd, v)
    res = sorted(dd.values(), key=lambda i: i['hour'])

    print(res)
    # [{'entries': 32.28038732182594,
    #   'exits': 121.97083057677271,
    #   'hour': datetime.datetime(2014, 3, 25, 17, 0)},
    #  {'entries': 34.209418415829674,
    #   'exits': 129.25963793829314,
    #   'hour': datetime.datetime(2014, 3, 25, 18, 0)},
    #  {'entries': 34.209418415829674,
    #   'exits': 129.25963793829314,
    #   'hour': datetime.datetime(2014, 3, 25, 19, 0)},
    #  {'entries': 34.209418415829674,
    #   'exits': 129.25963793829314,
    #   'hour': datetime.datetime(2014, 3, 25, 20, 0)},
    #  {'entries': 30.34135743068503,
    #   'exits': 146.00025560834786,
    #   'hour': datetime.datetime(2014, 3, 25, 21, 0)},
    #  {'entries': 7.25,
    #   'exits': 58.75,
    #   'hour': datetime.datetime(2014, 3, 25, 22, 0)},
    #  {'entries': 7.25,
    #   'exits': 58.75,
    #   'hour': datetime.datetime(2014, 3, 25, 23, 0)},
    #  {'entries': 7.25,
    #   'exits': 58.75,
    #   'hour': datetime.datetime(2014, 3, 26, 0, 0)}]
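Since the result is headed for d3.js, one possible way to hand it over is as JSON. This is only a sketch: the `to_json` helper and the two-decimal rounding are my own choices, and the sample rows just copy two of the bins from the output above. `datetime` objects aren't JSON-serializable as-is, so each hour is written as an ISO 8601 string, a format JavaScript's `Date` constructor accepts.

```python
import json
from datetime import datetime

# Sample rows shaped like the `res` list above (values copied from its output)
res = [
    {'hour': datetime(2014, 3, 25, 22, 0), 'entries': 7.25, 'exits': 58.75},
    {'hour': datetime(2014, 3, 25, 23, 0), 'entries': 7.25, 'exits': 58.75},
]

def to_json(rows):
    """Serialize hourly bins, writing each hour as an ISO 8601 string."""
    return json.dumps([
        {'hour': row['hour'].isoformat(),
         'entries': round(row['entries'], 2),
         'exits': round(row['exits'], 2)}
        for row in rows
    ])

payload = to_json(res)
print(payload)
```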
