Python - Split value into bins based on time
I'm working in Python, modifying New York City subway turnstile data to turn it into a visualization of entrances/exits at each station.
So far I have a list of entrance/exit counts based on start (03-24-15) and end (03-27-15) dates:
    {
        'endtime': '03-25-14T21:40:30',
        'entriesduringperiod': 158,
        'exitsduringperiod': 597,
        'starttime': '03-25-14T17:03:23'
    },
    {
        'endtime': '03-26-14T01:00:00',
        'entriesduringperiod': 29,
        'exitsduringperiod': 235,
        'starttime': '03-25-14T21:00:00'
    },
The problem is that the time periods are not standardized, and they overlap. I'd like to be able to go through them and create a list that normalizes these numbers into 1-hour increments.
I'm not familiar with Python time processing, and I was wondering if anyone could provide information on how to get started with taking the strings, converting them to date objects, and dividing the values based on time.
The final visualization will be done using d3.js, if that matters.
Getting the strings into datetime objects isn't bad:
    from datetime import datetime

    def get_datetime(instr):
        # Parse a 'MM-DD-YYTHH:MM:SS' string into a naive datetime.
        # datetime.strptime does this directly, without a round trip
        # through time.mktime and a local timestamp.
        return datetime.strptime(instr, '%m-%d-%yT%H:%M:%S')

    # eg: get_datetime('03-25-14T21:20:30') => datetime.datetime(2014, 3, 25, 21, 20, 30)
Binning/normalizing the data largely depends on how you want to handle overlapping durations. E.g., do you want to assume that people arrived and exited in a linear fashion, so that if the timestamps span an hour and a half, 66% go to the full hour and 33% to the other partial hour?
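To make the linear assumption concrete, here is a minimal sketch of the proportional split for a single reading. The helper name `hourly_fractions` and the 17:00 to 18:30 interval are made up for illustration and do not come from the turnstile data.

```python
from datetime import datetime, timedelta

def hourly_fractions(start, end):
    """Map each clock hour to the share of [start, end) that falls inside it.

    Assumes traffic is spread evenly over the interval (an assumption,
    not something the turnstile data guarantees).
    """
    total = (end - start).total_seconds()
    if total <= 0:
        return {}  # empty or inverted interval: nothing to distribute
    hour = start.replace(minute=0, second=0, microsecond=0)
    fractions = {}
    while hour < end:
        next_hour = hour + timedelta(hours=1)
        # Overlap between this clock hour and the reading's interval
        overlap = min(next_hour, end) - max(hour, start)
        fractions[hour] = overlap.total_seconds() / total
        hour = next_hour
    return fractions

# Hypothetical 90-minute reading from 17:00 to 18:30
fracs = hourly_fractions(datetime(2014, 3, 25, 17, 0),
                         datetime(2014, 3, 25, 18, 30))
for hour, frac in fracs.items():
    print(hour, round(frac, 3))
# 2014-03-25 17:00:00 0.667
# 2014-03-25 18:00:00 0.333
```

Multiplying a reading's entry and exit counts by these fractions gives the per-hour contribution; the fractions always sum to 1, so no riders are lost or double counted within one reading.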
Edit: based on the OP's comment, here's some totally functional code:
    from collections import defaultdict
    from datetime import datetime, timedelta

    def add_datum(dd, v):
        # Uses get_datetime() from the snippet above.
        end_dt = get_datetime(v['endtime'])
        start_dt = get_datetime(v['starttime'])
        total_duration = end_dt - start_dt

        # Start of the clock hour that contains start_dt
        hour_start = datetime(year=start_dt.year,
                              month=start_dt.month,
                              day=start_dt.day,
                              hour=start_dt.hour)
        hour_end = hour_start + timedelta(hours=1)

        while hour_start < end_dt:
            # Portion of this reading that overlaps the current hour
            dt = min([hour_end, end_dt]) - max([hour_start, start_dt])
            fraction = 1.0 * dt.total_seconds() / total_duration.total_seconds()
            dd[hour_start]['hour'] = hour_start
            dd[hour_start]['entries'] += v['entriesduringperiod'] * fraction
            dd[hour_start]['exits'] += v['exitsduringperiod'] * fraction
            hour_start = hour_end
            hour_end = hour_end + timedelta(hours=1)
        return dd

    dd = defaultdict(lambda: {'entries': 0, 'exits': 0})
    all_data = [{
        'endtime': '03-25-14T21:40:30',
        'entriesduringperiod': 158,
        'exitsduringperiod': 597,
        'starttime': '03-25-14T17:03:23'
    }, {
        'endtime': '03-26-14T01:00:00',
        'entriesduringperiod': 29,
        'exitsduringperiod': 235,
        'starttime': '03-25-14T21:00:00'
    }]

    for v in all_data:
        add_datum(dd, v)

    res = sorted(dd.values(), key=lambda i: i['hour'])
    print(res)

    # [{'entries': 32.28038732182594,
    #   'exits': 121.97083057677271,
    #   'hour': datetime.datetime(2014, 3, 25, 17, 0)},
    #  {'entries': 34.209418415829674,
    #   'exits': 129.25963793829314,
    #   'hour': datetime.datetime(2014, 3, 25, 18, 0)},
    #  {'entries': 34.209418415829674,
    #   'exits': 129.25963793829314,
    #   'hour': datetime.datetime(2014, 3, 25, 19, 0)},
    #  {'entries': 34.209418415829674,
    #   'exits': 129.25963793829314,
    #   'hour': datetime.datetime(2014, 3, 25, 20, 0)},
    #  {'entries': 30.34135743068503,
    #   'exits': 146.00025560834786,
    #   'hour': datetime.datetime(2014, 3, 25, 21, 0)},
    #  {'entries': 7.25,
    #   'exits': 58.75,
    #   'hour': datetime.datetime(2014, 3, 25, 22, 0)},
    #  {'entries': 7.25,
    #   'exits': 58.75,
    #   'hour': datetime.datetime(2014, 3, 25, 23, 0)},
    #  {'entries': 7.25,
    #   'exits': 58.75,
    #   'hour': datetime.datetime(2014, 3, 26, 0, 0)}]
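Since the result is headed for d3.js, the hourly bins will probably need to be serialized at some point, and `datetime` objects are not JSON-serializable as they stand. The following is only a sketch of one way to do that: the helper name `to_json_records`, the rounding to two decimals, and the single sample row are assumptions for illustration.

```python
import json
from datetime import datetime

def to_json_records(rows):
    """Serialize hourly bins to a JSON array, with hours as ISO 8601 strings."""
    return json.dumps([
        {
            'hour': r['hour'].isoformat(),       # e.g. '2014-03-25T22:00:00'
            'entries': round(r['entries'], 2),   # rounding is an arbitrary choice
            'exits': round(r['exits'], 2),
        }
        for r in rows
    ])

# Hypothetical sample row in the same shape as the hourly bins
sample = [{'hour': datetime(2014, 3, 25, 22, 0), 'entries': 7.25, 'exits': 58.75}]
payload = to_json_records(sample)
print(payload)
# [{"hour": "2014-03-25T22:00:00", "entries": 7.25, "exits": 58.75}]
```

The timestamps here carry no timezone, so whatever parses them on the JavaScript side has to decide how to interpret them.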