multithreading - Designing concurrency in a Python program
I'm designing a large-scale project, and I think I see a way I could drastically improve its performance by taking advantage of multiple cores. However, I have zero experience with multiprocessing, and I'm a little concerned that my ideas might not be good ones.
Idea
The program is a video game that procedurally generates massive amounts of content. Since there's far too much to generate all at once, the program instead tries to generate what it needs as or just before it needs it, and expends a large amount of effort trying to predict what it will need in the near future and how near that future is. The entire program, therefore, is built around a task scheduler, which gets passed function objects with bits of metadata attached that determine what order they should be processed in, and calls them in that order.
Motivation
It seems like it ought to be easy to make these functions execute concurrently in their own processes. But looking at the documentation for the multiprocessing modules makes me reconsider: there doesn't seem to be any simple way to share large data structures between threads. I can't imagine this is intentional.
Questions
So I suppose the fundamental questions I need to know the answers to are thus:
Is there any practical way to allow multiple threads to access the same list/dict/etc... for both reading and writing at the same time? Can I launch multiple instances of a star generator, give them access to the dict that holds the stars, and have new objects seem to pop into existence in the dict from the perspective of the other threads (that is, I wouldn't have to explicitly grab each star from the process that made it; I'd just pull it out of the dict as if the main thread had put it there itself)?
If not, is there any practical way to allow multiple threads to read from the same data structure at the same time, and then feed their resultant data back to the main thread to be rolled into that same data structure safely?
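For instance, is something like the following a reasonable way to feed results back to the main thread? (This is only a rough sketch to show what I mean; generate_star_details and the fields it fills in are names I made up.)

    import multiprocessing

    def generate_star_details(star_id, result_queue):
        # Worker: generate the data for one star and hand it back to the main process.
        details = {"name": "Star-%d" % star_id, "planets": star_id % 5}
        result_queue.put((star_id, details))

    if __name__ == "__main__":
        stars = {1: None, 2: None, 3: None}      # the main thread's data structure
        result_queue = multiprocessing.Queue()
        workers = [multiprocessing.Process(target=generate_star_details,
                                           args=(star_id, result_queue))
                   for star_id in stars]
        for w in workers:
            w.start()
        # Only the main process touches the stars dict; workers just send results back.
        for _ in range(len(workers)):
            star_id, details = result_queue.get()
            stars[star_id] = details
        for w in workers:
            w.join()
        print(stars)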
Would this design work even if I ensured that no two concurrent functions tried to access the same data structure at the same time, either for reading or for writing?
Can data structures be inherently shared between processes at all, or do I explicitly have to send data from one process to another, as with processes communicating over a TCP stream? I know there are objects that abstract away that sort of thing, but I'm asking whether it can be done away with entirely; can I have one object that every thread is looking at in the same block of memory?
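From what I can tell, multiprocessing.Value and multiprocessing.Array do give real shared memory, but only for simple ctypes values rather than arbitrary Python objects. A toy counter like this (not my actual code) is the closest I've found to "the same block of memory":

    import multiprocessing

    def increment(counter):
        # counter lives in shared memory, so this change is visible to the parent process.
        with counter.get_lock():
            counter.value += 1

    if __name__ == "__main__":
        counter = multiprocessing.Value("i", 0)   # a shared C int, not a Python object
        procs = [multiprocessing.Process(target=increment, args=(counter,))
                 for _ in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(counter.value)   # 4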
How flexible are the objects that the modules provide to abstract away the communication between processes? Can I use them as drop-in replacements for the data structures used in my existing code and not notice any differences? If I do such a thing, would it cause an unmanageable amount of overhead?
Sorry for my naivete, but I don't have a formal computer science education (at least, not yet) and I've never worked with concurrent systems before. Is the idea I'm trying to implement here even remotely practical, or would any solution that allows me to transparently execute arbitrary functions concurrently cause so much overhead that I'd be better off doing it all in one thread?
Example
For maximum clarity, here's an example of how I imagine the system working:
The UI module has been instructed by the player to move the view over to a certain area of space. It informs the content management module of this, and asks it to make sure that all of the stars the player can click on are fully generated and ready to be clicked on.
The content management module checks and sees that a couple of the stars the UI says the player could potentially try to interact with have not, in fact, had the details they show upon a click generated yet. It produces a number of task objects containing the methods of those stars that, when called, will generate the necessary data. It also adds metadata to these task objects, assuming (possibly based on further information collected from the UI module) that it will be at least 0.1 seconds before the player tries to click anything, and that stars whose icons are closest to the cursor have the greatest chance of being clicked on and should therefore be requested for a time slightly sooner than the stars further from the cursor. It then adds these objects to the scheduler queue.
The scheduler sorts its queue by how soon each task needs to be done, pops the first task object off the queue, makes a new process from the function it contains, and then thinks no more about that process, instead popping another task off the queue and stuffing it into a process too, then the next one, then the next one... (a rough sketch of what I mean by this is included after the example).
Meanwhile, the new process executes, stores the data it generates on the star object it is a method of, and terminates when it reaches its return statement.
The UI registers that the player has now indeed clicked on a star, and looks for the data it needs to display on the star object whose representative sprite was clicked. If the data is there, it displays it; if it isn't, the UI displays a message asking the player to wait and keeps repeatedly trying to access the necessary attributes of the star object until it succeeds.
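To show what I mean by the scheduler step above, here's the kind of thing I'm imagining (Task, its fields, and run_pending are just names I made up, not real code):

    import heapq
    import multiprocessing

    class Task:
        # A callable plus metadata saying how soon its result is needed.
        def __init__(self, func, args, needed_in_seconds):
            self.func = func
            self.args = args
            self.needed_in_seconds = needed_in_seconds

        def __lt__(self, other):
            # Lets heapq order tasks by urgency.
            return self.needed_in_seconds < other.needed_in_seconds

    def run_pending(tasks):
        heapq.heapify(tasks)                 # most urgent task first
        started = []
        while tasks:
            task = heapq.heappop(tasks)
            p = multiprocessing.Process(target=task.func, args=task.args)
            p.start()                        # launch it and move on to the next task
            started.append(p)
        return started                       # so the caller can join() them later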
Even though your problem seems complicated, there is a very easy solution: you can hide away all the complicated stuff of sharing objects across processes by using a proxy.
The basic idea is that you create a manager that manages all the objects that should be shared across processes. The manager creates its own process, where it waits for other processes to instruct it to change the object. But enough said; it looks like this:
    import multiprocessing as m

    manager = m.Manager()
    starsdict = manager.dict()           # a shared dict, accessed through a proxy
    process = m.Process(target=yourfunction, args=(starsdict,))
    process.start()                      # start() runs yourfunction in the new process

The object stored in starsdict is not a real dict. Instead it sends all changes to, and requests for, it to the manager. This is called a "proxy"; it has almost the same API as the object it mimics. These proxies are pickleable, so you can pass them as arguments to functions in new processes (like shown above) or send them through queues.
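To make that concrete, here is a minimal, self-contained sketch of the pattern (generate_star and the fields it writes are made-up names, not from your code):

    import multiprocessing

    def generate_star(star_id, stars):
        # stars is a DictProxy; assigning into it goes through the manager process.
        stars[star_id] = {"name": "Star-%d" % star_id, "mass": star_id * 1.5}

    if __name__ == "__main__":
        manager = multiprocessing.Manager()
        stars = manager.dict()                       # shared via a proxy
        workers = [multiprocessing.Process(target=generate_star, args=(i, stars))
                   for i in range(5)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        # The main process sees everything the workers put into the dict.
        print(dict(stars))

One thing to watch out for: only operations on the proxy itself go through the manager. If you store a plain dict inside the proxy and then mutate that nested dict in place, the change won't be propagated; reassign the value to the proxy key instead.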
You can read more about it in the documentation.
I don't know how these proxies react if two processes are accessing them simultaneously. Since they're made for parallelism, I would guess they should be safe, even though I've heard they're not. It would be best if you tested this yourself or looked for it in the documentation.
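If you want to be defensive about simultaneous access, you can guard read-modify-write sequences with a lock created by the same manager. A minimal sketch of that pattern (the "visits" counter is just an illustration):

    import multiprocessing

    def add_visit(shared, lock):
        # A read-modify-write on a proxy is not atomic, so take the lock around it.
        with lock:
            shared["visits"] = shared.get("visits", 0) + 1

    if __name__ == "__main__":
        manager = multiprocessing.Manager()
        shared = manager.dict()
        lock = manager.Lock()                        # a lock that can be shared too
        procs = [multiprocessing.Process(target=add_visit, args=(shared, lock))
                 for _ in range(10)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(shared["visits"])   # reliably 10 with the lock held during each update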