The Environment¶
Quicklinks¶
- Environment: The environment to run a parameter exploration.
- run: Runs the experiments and explores the parameter space.
- resume: Resumes crashed trajectories.
- pipeline: You can make pypet supervise your whole experiment by defining a pipeline.
- trajectory: The trajectory of the Environment.
Environment¶
Module containing the environment to run experiments.
An Environment provides an interface to run experiments based on parameter exploration.
The environment contains, and might even create, a Trajectory container which can be filled with parameters and results (see pypet.parameter). Instances of this trajectory are distributed to the user's job function to perform a single run of an experiment.
An Environment is the handyman for scheduling: it can be used for multiprocessing and takes care of organizational issues like logging.
class pypet.environment.Environment(trajectory='trajectory', add_time=False, comment='', dynamic_imports=None, wildcard_functions=None, automatic_storing=True, log_config='DEFAULT', log_stdout=False, report_progress=(5, 'pypet', 20), multiproc=False, ncores=1, use_scoop=False, use_pool=False, freeze_input=False, timeout=None, cpu_cap=100.0, memory_cap=100.0, swap_cap=100.0, niceness=None, wrap_mode='LOCK', queue_maxsize=-1, port=None, gc_interval=None, clean_up_runs=True, immediate_postproc=False, resumable=False, resume_folder=None, delete_resume=True, storage_service=<class 'pypet.storageservice.HDF5StorageService'>, git_repository=None, git_message='', git_fail=False, sumatra_project=None, sumatra_reason='', sumatra_label=None, do_single_runs=True, graceful_exit=False, lazy_debug=False, **kwargs)[source]¶
The environment to run a parameter exploration.
The first thing you usually do is to create an environment object that takes care of running the experiment. You can provide the following arguments:
Parameters: - trajectory – String or trajectory instance. If a string is supplied, a novel trajectory is created with that name. Note that the comment and the dynamically imported classes (see below) are only considered if a novel trajectory is created. If you supply a trajectory instance, these fields can be ignored.
- add_time – If True the current time is added to the trajectory name if the trajectory is newly created.
- comment – Comment added to the trajectory if a novel trajectory is created.
- dynamic_imports –
Only considered if a new trajectory is created. If you've written custom parameters or results that need to be loaded dynamically during runtime, the module containing the class needs to be specified here as a list of classes or strings naming classes and their module paths.
For example: dynamic_imports = ['pypet.parameter.PickleParameter', MyCustomParameter]
If you only have a single class to import, you do not need the list brackets: dynamic_imports = 'pypet.parameter.PickleParameter'
- wildcard_functions –
Dictionary of wildcards like $ and corresponding functions that are called upon finding such a wildcard. For example, to replace the $ aka crun wildcard, you can pass the following:
wildcard_functions = {('$', 'crun'): myfunc}
Your wildcard function myfunc must return a unique run name as a function of a given integer run index. Moreover, your function must also return a unique dummy name for the run index -1.
Of course, you can define your own wildcards like wildcard_functions = {('$mycard', 'mycard'): myfunc}. These are not required to return a unique name for each run index, but can be used to group runs into buckets by returning the same name for several run indices. Yet, all wildcard functions need to return a dummy name for the index -1.
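A minimal sketch of such a wildcard function; the naming scheme below is purely illustrative, only the contract matters: a unique name per run index, plus a dummy name for index -1.

```python
# Hypothetical wildcard function: maps each run index to a unique name
# and maps the special index -1 to a dummy name.
def my_run_name(idx):
    if idx == -1:
        return 'run_ALL'           # dummy name required for index -1
    return 'run_%08d' % idx        # unique name per run index

# It would then be registered as, e.g.:
# env = Environment(..., wildcard_functions={('$', 'crun'): my_run_name})
```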
- automatic_storing – If True the trajectory will be stored at the end of the simulation and single runs will be stored after their completion. Be aware of data loss if you set this to False and do not manually store everything.
- log_config –
Can be a path to a logging .ini file specifying the logging configuration. For an example of such a file see Logging. Can also be a dictionary that is accepted by the built-in logging module. Set to None if you don't want pypet to configure logging.
If not specified, the default settings are used. Moreover, you can manually tweak the default settings without creating a new ini file. Instead of the log_config parameter, pass a log_folder, a list of logger_names, and corresponding log_levels to fine-grain the loggers to which the default settings apply. For example:
log_folder='logs', logger_names=('pypet', 'MyCustomLogger'), log_levels=(logging.ERROR, logging.INFO)
You can further disable multiprocess logging via setting log_multiproc=False.
- log_stdout –
Whether the output of stdout should be recorded into the log files. Disable if only logging statements should be recorded. Note if you work with an interactive console like IPython, it is a good idea to set log_stdout=False to avoid messing up the console output.
Can also be a tuple ('mylogger', 10), specifying a logger name as well as a log-level. The log-level defines with what level stdout is logged; it is not a filter.
- report_progress –
If progress of runs and an estimate of the remaining time should be shown. Can be True or False or a triple (10, 'pypet', logging.INFO), where the first number is the percentage and update step of the resulting progressbar, and the second entry is a corresponding logger name with which the progress should be logged. If you use 'print', the print statement is used instead. The third value specifies the logging level (level of the logging statement, not a filter) with which the progress should be logged.
Note that the progress is based on finished runs. If you use the QUEUE wrapping in case of multiprocessing and if storing takes long, the estimate of the remaining time might not be very accurate.
- multiproc –
Whether or not to use multiprocessing. Default is False. Besides the wrap_mode (see below) that deals with how storage to disk is carried out in case of multiprocessing, there are two ways to do multiprocessing: by using a fixed pool of processes (choose use_pool=True, the default option) or by spawning an individual process for every run and parameter combination (use_pool=False). The former will spawn not more than ncores processes, and all simulation runs are sent over to the pool one after the other. This requires all your data to be pickled.
If your data cannot be pickled (which could be the case for some BRIAN networks, for instance) choose use_pool=False (also make sure to set resumable=False). This will also spawn at most ncores processes at a time, but as soon as a process terminates a new one is spawned with the next parameter combination. Be aware that you will have as many logfiles in your log folder as processes were spawned. If your simulation returns results besides storing results directly into the trajectory, these returned results still need to be pickled.
- ncores – If multiproc is True, this specifies the number of processes that will be spawned to run your experiment. Note if you use QUEUE mode (see below) the queue process is not included in this number and will add another extra process for storing. If you have psutil installed, you can set ncores=0 to let psutil determine the number of CPUs available.
- use_scoop – If Python should be used in a SCOOP framework to distribute runs among a cluster or multiple servers. If so, you need to start your script via python -m scoop my_script.py. Currently, SCOOP only works with the 'LOCAL' wrap_mode (see below).
- use_pool – Whether to use a fixed pool of processes or whether to spawn a new process
for every run. Use the former if you perform many runs (50k and more)
which are in terms of memory and runtime inexpensive.
Be aware that everything you use must be picklable.
Use the latter for fewer runs (50k and less) and which are longer lasting
and more expensive runs (in terms of memory consumption).
In case your operating system allows forking, your data does not need to be
picklable.
If you choose use_pool=False you can also make use of the cap values, see below.
- freeze_input – Can be set to True if the run function as well as all additional arguments are immutable. This will prevent the trajectory from getting pickled again and again. Thus, the run function, the trajectory, as well as all arguments are passed to the pool or SCOOP workers at initialisation. Works also under run_map. In this case the iterable arguments are, of course, not frozen but passed for every run.
- timeout – Timeout parameter in seconds passed on to SCOOP and 'NETLOCK' wrapping. Leave None for no timeout. After timeout seconds SCOOP will assume that a single run failed and skip waiting for it. Moreover, if using 'NETLOCK' wrapping, after timeout seconds a lock is automatically released and again available for other waiting processes.
- cpu_cap –
If multiproc=True and use_pool=False you can specify a maximum cpu utilization between 0.0 (excluded) and 100.0 (included) as fraction of maximum capacity. If the current cpu usage is above the specified level (averaged across all cores), pypet will not spawn a new process and wait until activity falls below the threshold again. Note that in order to avoid dead-lock at least one process will always be running regardless of the current utilization. If the threshold is crossed a warning will be issued. The warning won’t be repeated as long as the threshold remains crossed.
For example cpu_cap=70.0, ncores=3, and currently on average 80 percent of your cpu are used. Moreover, let’s assume that at the moment only 2 processes are computing single runs simultaneously. Due to the usage of 80 percent of your cpu, pypet will wait until cpu usage drops below (or equal to) 70 percent again until it starts a third process to carry out another single run.
The parameters memory_cap and swap_cap are analogous. These three thresholds are combined to determine whether a new process can be spawned. Accordingly, if only one of these thresholds is crossed, no new processes will be spawned.
To disable the cap limits simply set all three values to 100.0.
You need the psutil package to use this cap feature. If not installed and you choose cap values different from 100.0 a ValueError is thrown.
- memory_cap – Cap value of RAM usage. If more RAM than the threshold is currently in use, no new processes are spawned. Can also be a tuple (limit, memory_per_process): the first value is the cap value (between 0.0 and 100.0), the second one is the estimated memory per process in megabytes (MB). If an estimate is given, a new process is not started if the threshold would be crossed including the estimate.
- swap_cap – Analogous to cpu_cap but the swap memory is considered.
- niceness – If you are running on a UNIX based system or you have psutil (under Windows) installed,
you can choose a niceness value to prioritize the child processes executing the
single runs in case you use multiprocessing.
Under Linux these usually range from 0 (highest priority)
to 19 (lowest priority). For Windows values check the psutil homepage.
Leave None if you don't care about niceness. Under Linux the niceness value is a minimum value; if the OS decides to nice your program (maybe you are running on a server), pypet does not try to decrease the niceness again.
- wrap_mode – If multiproc is True, specifies how storage to disk is handled via the storage service. There are a few options:
WRAP_MODE_QUEUE ('QUEUE'): Another process for storing the trajectory is spawned. The sub-processes running the individual single runs will add their results to a multiprocessing queue that is handled by an additional process. Note that this requires additional memory since the trajectory will be pickled and sent over the queue for storage!
WRAP_MODE_LOCK ('LOCK'): Each individual process takes care of storage by itself. Before carrying out the storage, a lock is placed to prevent the other processes from storing data. Accordingly, sometimes this leads to a lot of processes waiting until the lock is released. Allows loading of data during runs.
WRAP_MODE_PIPE ('PIPE'): Experimental mode based on a single pipe. Is faster than 'QUEUE' wrapping, but data corruption may occur; does not work under Windows (since it relies on forking).
WRAP_MODE_LOCAL ('LOCAL'): Data is not stored during the single runs but after they have completed. Storing is only performed in the main process. Note that removing data during a single run no longer has any effect on memory, because references are kept for all data that is supposed to be stored.
WRAP_MODE_NETLOCK ('NETLOCK'): Similar to 'LOCK' but locks can be shared across a network. Sharing is established by running a lock server that distributes locks to the individual processes. Can be used with SCOOP if all hosts have access to a shared home directory. Allows loading of data during runs.
WRAP_MODE_NETQUEUE ('NETQUEUE'): Similar to 'QUEUE' but data can be shared across a network. Sharing is established by running a queue server that distributes the data to the individual processes.
If you don't want wrapping at all use WRAP_MODE_NONE ('NONE').
- queue_maxsize – Maximum size of the storage queue in case of 'QUEUE' wrapping. 0 means infinite, -1 (default) means the educated guess of 2 * ncores.
- port – Port to be used by the lock server in case of 'NETLOCK' wrapping. Can be a single integer as well as a tuple (7777, 9999) to specify a range of ports from which to pick a random one. Leave None for using pyzmq's default range. In case automatic determination of the host's IP address fails, you can also pass the full address (including the protocol and the port) of the host in the network, like 'tcp://127.0.0.1:7777'.
- gc_interval –
Interval (in runs or storage operations) with which gc.collect() should be called in case of 'LOCAL', 'QUEUE', or 'PIPE' wrapping. Leave None for never.
In case of 'LOCAL' wrapping, 1 means after every run, 2 after every second run, and so on. In case of 'QUEUE' or 'PIPE' wrapping, 1 means after every store operation, 2 after every second store operation, and so on. Only calls gc.collect() in the main process (if 'LOCAL' wrapping) or in the queue/pipe process. If you need to garbage collect data within your single runs, you need to manually call gc.collect().
Usually, there is no need to set this parameter since the Python garbage collection works quite nicely and schedules collection automatically.
- clean_up_runs –
In case of single core processing, whether all results under groups named run_XXXXXXXX should be removed after the completion of the run. Note in case of multiprocessing this happens anyway since the single run container will be destroyed after finishing of the process.
Moreover, if set to True, after post-processing it is checked whether there is still data under run_XXXXXXXX, and this data is removed if the trajectory is expanded.
- immediate_postproc –
If you use post- and multiprocessing, you can immediately start analysing the data as soon as the trajectory runs out of tasks, i.e. is fully explored but the final runs are not completed. Thus, while executing the last batch of parameter space points, you can already analyse the finished runs. This is especially helpful if you perform some sort of adaptive search within the parameter space.
The difference to normal post-processing is that you do not have to wait until all single runs are finished, but your analysis already starts while there are still runs being executed. This can be a huge time saver especially if your simulation time differs a lot between individual runs. Accordingly, you don’t have to wait for a very long run to finish to start post-processing.
In case you use immediate post-processing, the storage service of your trajectory is still multiprocessing safe (except when using the wrap_mode 'LOCAL'). Accordingly, you could even use multiprocessing in your immediate post-processing phase if you dare, for instance by using a multiprocessing pool. Note that after the execution of the final run, your post-processing routine will be called again as usual.
IMPORTANT: If you use immediate post-processing, the results that are passed to your post-processing function are not sorted by their run indices but by finishing time!
- resumable –
Whether the environment should take special care to allow resuming or continuing crashed trajectories. Default is False.
You need to install dill to use this feature. dill will make snapshots of your simulation function as well as the passed arguments. BE AWARE that dill is still rather experimental!
Assume you run experiments that take a lot of time. If during your experiments there is a power failure, you can resume your trajectory after the last single run that was still successfully stored via your storage service.
The environment will create several .ecnt and .rcnt files in a folder that you specify (see below). Using this data you can resume crashed trajectories.
In order to resume trajectories use resume().
Be aware that your individual single runs must be completely independent of one another to allow resuming to work. Thus, they should NOT be based on shared data that is manipulated during runtime (like a multiprocessing manager list) in the positional and keyword arguments passed to the run function.
If you use post-processing, the expansion of trajectories and the resuming of trajectories are NOT supported properly. There is no guarantee that both work together.
- resume_folder – The folder where the resume files will be placed. Note that pypet will create a sub-folder with the name of the environment.
- delete_resume – If True, pypet will delete the resume files after a successful simulation.
- storage_service – Pass a given storage service or a class constructor (default HDF5StorageService) if you want the environment to create the service for you. The environment will pass the additional keyword arguments you pass directly to the constructor. If the trajectory already has a service attached, the one from the trajectory will be used.
- git_repository –
If your code base is under git version control you can specify here the path (relative or absolute) to the folder containing the .git directory as a string. Note in order to use this tool you need GitPython.
If you set this path the environment will trigger a commit of your code base adding all files that are currently under version control. Similar to calling git add -u and git commit -m ‘My Message’ on the command line. The user can specify the commit message, see below. Note that the message will be augmented by the name and the comment of the trajectory. A commit will only be triggered if there are changes detected within your working copy.
This will also add information about the revision to the trajectory, see below.
- git_message – Message passed onto git command. Only relevant if a new commit is triggered. If no changes are detected, the information about the previous commit and the previous commit message are added to the trajectory and this user passed message is discarded.
- git_fail – If True the program fails instead of triggering a commit if there are uncommitted changes found in the code base. In such a case a GitDiffError is raised.
- sumatra_project –
If your simulation is managed by sumatra, you can specify here the path to the sumatra root folder. Note that you have to initialise the sumatra project at least once before via smt init MyFancyProjectName.
pypet will automatically add ALL parameters to the sumatra record. If a parameter is explored, the WHOLE range is added instead of the default value.
pypet will add the label and reason (only if provided, see below) to your trajectory as config parameters.
- sumatra_reason –
You can add an additional reason string that is added to the sumatra record. Regardless of whether sumatra_reason is empty, the name of the trajectory, the comment, as well as a list of all explored parameters are added to the sumatra record.
Note that the augmented label is not stored into the trajectory as config parameter, but the original one (without the name of the trajectory, the comment, and the list of explored parameters) in case it is not the empty string.
- sumatra_label – The label or name of your sumatra record. Set to None if you want sumatra to choose a label in form of a timestamp for you.
- do_single_runs – Whether you intend to actually compute single runs with the trajectory. If you do not intend to do single runs, then set this to False and the environment won't add config information like the number of processors to the trajectory.
- graceful_exit – If True, hitting CTRL+C (i.e. sending SIGINT) will not terminate the program immediately. Instead, active single runs will be finished and stored before shutdown. Hitting CTRL+C twice will raise a KeyboardInterrupt as usual.
- lazy_debug – If lazy_debug=True and in case you debug your code (aka you use pydevd and the expression 'pydevd' in sys.modules is True), the environment will use the LazyStorageService instead of the HDF5 one. Accordingly, no files are created and your trajectory and results are not saved. This allows faster debugging and prevents pypet from blowing up your hard drive with trajectories that you probably do not want to use anyway since you just debug your code.
The Environment will automatically add some config settings to your trajectory. Thus, you can always look up how your trajectory was run. This encompasses most of the above named parameters as well as some information about the environment. This additional information includes a timestamp as well as a SHA-1 hash code that uniquely identifies your environment. If you use git integration, the SHA-1 hash code will be the one from your git commit. Otherwise the code will be calculated from the trajectory name, the current time, and your current pypet version.
The environment will be named environment_XXXXXXX_XXXX_XX_XX_XXhXXmXXs. The first seven X are the first seven characters of the SHA-1 hash code followed by a human readable timestamp.
All information about the environment can be found in your trajectory under config.environment.environment_XXXXXXX_XXXX_XX_XX_XXhXXmXXs. Your trajectory could potentially be run by several environments due to merging or extending an existing trajectory. Thus, you will be able to track how your trajectory was built over time.
Git information is added to your trajectory as follows:
git.commit_XXXXXXX_XXXX_XX_XX_XXh_XXm_XXs.hexsha – The SHA-1 hash of the commit. commit_XXXXXXX_XXXX_XX_XX_XXhXXmXXs is mapped to the first seven characters of the SHA-1 hash and the formatted date of the commit, e.g. commit_7ef7hd4_2015_10_21_16h29m00s.
git.commit_XXXXXXX_XXXX_XX_XX_XXh_XXm_XXs.name_rev – String describing the commit's hexsha based on the closest reference.
git.commit_XXXXXXX_XXXX_XX_XX_XXh_XXm_XXs.committed_date – Commit date as Unix epoch data.
git.commit_XXXXXXX_XXXX_XX_XX_XXh_XXm_XXs.message – The commit message.
Moreover, if you use the standard HDF5StorageService you can pass the following keyword arguments in **kwargs:
Parameters: - filename – The name of the hdf5 file. If none is specified, the default ./hdf5/the_name_of_your_trajectory.hdf5 is chosen. If filename contains only a path like filename='./myfolder/', it is changed to filename='./myfolder/the_name_of_your_trajectory.hdf5'.
- file_title – Title of the hdf5 file (only important if file is created new)
- overwrite_file – If the file already exists it will be overwritten. Otherwise, the trajectory will simply be added to the file and already existing trajectories are not deleted.
- encoding – Format to encode and decode unicode strings stored to disk. The default 'utf8' is highly recommended.
- complevel –
You can specify your compression level. 0 means no compression and 9 is the highest compression level. See PyTables Compression for a detailed description.
- complib – The library used for compression. Choose between zlib, blosc, and lzo. Note that ‘blosc’ and ‘lzo’ are usually faster than ‘zlib’ but it may be the case that you can no longer open your hdf5 files with third-party applications that do not rely on PyTables.
- shuffle – Whether or not to use the shuffle filters in the HDF5 library. This normally improves the compression ratio.
- fletcher32 – Whether or not to use the Fletcher32 filter in the HDF5 library. This is used to add a checksum on hdf5 data.
- pandas_format – How to store pandas data frames. Either in ‘fixed’ (‘f’) or ‘table’ (‘t’) format. Fixed format allows fast reading and writing but disables querying the hdf5 data and appending to the store (with other 3rd party software other than pypet).
- purge_duplicate_comments –
If you add a result via f_add_result() or a derived parameter via f_add_derived_parameter() and you set a comment, normally that comment would be attached to each and every instance. This can produce a lot of unnecessary overhead if the comment is the same for every instance over all runs. If purge_duplicate_comments=1, then only the comment of the first result or derived parameter instance created in a run is stored, or comments that differ from this first comment.
For instance, during a single run you call traj.f_add_result('my_result', 42, comment='Mostly harmless!') and the result will be renamed to results.run_00000000.my_result. After storage, in the node associated with this result in your hdf5 file, you will find the comment 'Mostly harmless!'. If you call traj.f_add_result('my_result', -43, comment='Mostly harmless!') in another run again, let's say run 00000001, the name will be mapped to results.run_00000001.my_result. But this time the comment will not be saved to disk since 'Mostly harmless!' is already part of the very first result with the name 'results.run_00000000.my_result'. Note that the comments will be compared and storage will only be discarded if the strings are exactly the same.
If you use multiprocessing, the storage service will take care that the comment for the result or derived parameter with the lowest run index will be considered regardless of the order of the finishing of your runs. Note that this only works properly if all comments are the same. Otherwise the comment in the overview table might not be the one with the lowest run index.
You need summary tables (see below) to be able to purge duplicate comments.
This feature only works for comments in leaf nodes (aka Results and Parameters). So try to avoid adding comments in group nodes within single runs.
- summary_tables –
Whether the summary tables should be created, i.e. the 'derived_parameters_runs_summary' and the 'results_runs_summary'.
The ‘XXXXXX_summary’ tables give a summary about all results or derived parameters. It is assumed that results and derived parameters with equal names in individual runs are similar and only the first result or derived parameter that was created is shown as an example.
The summary table can be used in combination with purge_duplicate_comments to only store a single comment for every result with the same name in each run, see above.
- small_overview_tables –
Whether the small overview tables should be created. Small tables give an overview about 'config', 'parameters', 'derived_parameters_trajectory', 'results_trajectory', and 'results_runs_summary'.
Note that these tables create some overhead. If you want very small hdf5 files set small_overview_tables to False.
- large_overview_tables – Whether to add large overview tables. This encompasses information about every derived
parameter, result, and the explored parameter in every single run.
If you want small hdf5 files set this to False (default).
- results_per_run –
Expected number of results you store per run. If you give a good/correct estimate, storage to the hdf5 file is much faster in case you store LARGE overview tables.
Default is 0, i.e. the number of results is not estimated!
- derived_parameters_per_run – Analogous to the above.
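To put the main arguments in context, here is a sketch of a typical setup. The job function is plain Python; the commented Environment construction assumes pypet is installed, and names such as 'multiplication', the file paths, and the parameter values are illustrative.

```python
# A job function: receives one trajectory instance per single run.
# It reads the (assumed) parameters x and y and stores a result.
def multiply(traj):
    z = traj.x * traj.y
    traj.f_add_result('z', z, comment='Product of x and y')
    return z

# With pypet installed, the surrounding experiment would look roughly like:
#
#   from pypet import Environment, cartesian_product
#
#   env = Environment(trajectory='multiplication',
#                     filename='./hdf5/multiplication.hdf5',
#                     multiproc=True, ncores=2, wrap_mode='QUEUE')
#   traj = env.trajectory
#   traj.f_add_parameter('x', 1)
#   traj.f_add_parameter('y', 1)
#   traj.f_explore(cartesian_product({'x': [1, 2], 'y': [6, 8]}))
#   results = env.run(multiply)  # list of (run index, return value) tuples
```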
Finally, you can also pass properties of the trajectory, like v_with_links=True (you can leave out the prefix v_, i.e. with_links works, too). Thus, you can change the settings of the trajectory immediately.
- add_postprocessing(postproc, *args, **kwargs)[source]¶ Adds a post-processing function.
The environment will call this function via postproc(traj, result_list, *args, **kwargs) after the completion of the single runs.
This function can load parts of the trajectory if needed and add additional results.
Moreover, the function can be used to trigger an expansion of the trajectory. This can be useful if the user has an optimization task.
Either the function calls f_expand directly on the trajectory or returns a dictionary. In the latter case f_expand is called by the environment.
Note that after expansion of the trajectory, the post-processing function is called again (and again for further expansions). Thus, this allows an iterative approach to parameter exploration.
Note that in case post-processing is called after all runs have been executed, the storage service of the trajectory is no longer multiprocessing safe. If you want to use multiprocessing in your post-processing you can still manually wrap the storage service with the MultiprocessWrapper.
In case you use immediate post-processing, the storage service of your trajectory is still multiprocessing safe (except when using the wrap_mode 'LOCAL'). Accordingly, you could even use multiprocessing in your immediate post-processing phase if you dare, for instance by using a multiprocessing pool.
You can easily check in your post-processing function if the storage service is multiprocessing safe via the multiproc_safe attribute, i.e. traj.v_storage_service.multiproc_safe.
Parameters: - postproc – The post-processing function
- args – Additional arguments passed to the post-processing function
- kwargs – Additional keyword arguments passed to the post-processing function
Returns:
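A sketch of a post-processing function under these conventions; the parameter name 'x' and the threshold are assumptions of this example. Returning a dictionary triggers an expansion of the trajectory, returning nothing ends the exploration.

```python
# Hypothetical post-processing function. It receives the trajectory and
# a list of (run index, result) tuples, and may return an expansion dict.
def keep_searching(traj, result_list, threshold=10.0):
    best = max(result for _idx, result in result_list)
    if best < threshold:
        # Not there yet: ask the environment to expand the trajectory
        # with one more value of the (assumed) parameter 'x'.
        return {'x': [best + 1.0]}
    return None  # nothing returned: no expansion, exploration stops

# Registered via: env.add_postprocessing(keep_searching, threshold=10.0)
```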
- current_idx¶ The current run index that is the next one to be executed.
Can be set manually to make the environment consider old non-completed ones.
- disable_logging(remove_all_handlers=True)[source]¶ Removes all logging handlers and stops logging to files and logging stdout.
Parameters: remove_all_handlers – If True all logging handlers are removed. If you want to keep the handlers, set this to False.
- hexsha¶ The SHA-1 identifier of the environment.
It is identical to the SHA-1 of the git commit. If version control is not used, the environment hash is computed from the trajectory name, the current timestamp, and your current pypet version.
- name¶ Name of the Environment.
- pipeline(pipeline)[source]¶ You can make pypet supervise your whole experiment by defining a pipeline.
pipeline is a function that defines the entire experiment: from pre-processing, including setting up the trajectory, over defining the actual simulation runs, to post-processing.
The pipeline function needs to return TWO tuples with a maximum of three entries each.
For example:
return (runfunc, args, kwargs), (postproc, postproc_args, postproc_kwargs)
Where runfunc is the actual simulation function that gets passed the trajectory container and potentially additional arguments args and keyword arguments kwargs. This will be run by your environment with all parameter combinations.
postproc is a post processing function that handles your computed results. The function must accept as arguments the trajectory container, a list of results (list of tuples (run idx, result) ) and potentially additional arguments postproc_args and keyword arguments postproc_kwargs.
As for f_add_postproc(), this function can potentially extend the trajectory.
If you don't want to apply post-processing, your pipeline function can also simply return the run function and the arguments:
return runfunc, args, kwargs
Or
return runfunc, args
Or
return runfunc
Note that return runfunc, kwargs does NOT work; if you don't want to pass args, do return runfunc, (), kwargs.
Analogously, combinations like return (runfunc, args), (postproc,) work as well.
Parameters: pipeline – The pipeline function, taking only a single argument traj and returning all functions necessary for your experiment. Returns: List of the individual results returned by runfunc. Returns a LIST OF TUPLES, where the first entry is the run idx and the second entry is the actual result. In case of multiprocessing these are not necessarily ordered according to their run index, but ordered according to their finishing time.
Does not contain results stored in the trajectory! In order to access these, simply interact with the trajectory object, potentially after calling f_update_skeleton() and loading all results at once with f_load() or loading manually with f_load_items(). Even if you use multiprocessing without a pool, the results returned by runfunc still need to be pickled.
Results computed from postproc are not returned. postproc should not return any results except dictionaries if the trajectory should be expanded.
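As a concrete illustration, here is a minimal sketch of such a pipeline function. All names below (my_simulation, my_postproc, my_pipeline, and the parameter x) are hypothetical examples, not part of pypet itself:

```python
# Hypothetical sketch of a pipeline function; names are illustrative only.
def my_simulation(traj, scale=1.0):
    # The actual run: read an explored parameter from the trajectory.
    return traj.x * scale

def my_postproc(traj, result_list, threshold=0.0):
    # result_list is a list of (run_idx, result) tuples.
    # Returning a dict of parameter expansions would extend the
    # trajectory; returning None means no expansion.
    n_large = sum(1 for _, res in result_list if res > threshold)
    print('%d results above threshold' % n_large)

def my_pipeline(traj):
    # Pre-processing: add parameters and define the exploration.
    traj.f_add_parameter('x', 1.0)
    traj.f_explore({'x': [1.0, 2.0, 3.0]})
    # Return (runfunc, args, kwargs), (postproc, postproc_args, postproc_kwargs):
    return ((my_simulation, (), {'scale': 2.0}),
            (my_postproc, (), {'threshold': 4.0}))
```

Calling env.pipeline(my_pipeline) would then run my_simulation for every parameter combination and hand the collected (run_idx, result) tuples to my_postproc.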
-
resume
(trajectory_name=None, resume_folder=None)[source]¶ Resumes crashed trajectories.
Parameters: - trajectory_name – Name of the trajectory to resume; if not specified, the name passed to the environment is used. Be aware that if add_time=True, the name you passed to the environment is altered and the current date is added.
- resume_folder – The folder where resume files can be found. Do not pass the name of the sub-folder with the trajectory name, but the name of the parent folder. If not specified, the resume folder passed to the environment is used.
Returns: List of the individual results returned by your run function.
Returns a LIST OF TUPLES, where first entry is the run idx and second entry is the actual result. In case of multiprocessing these are not necessarily ordered according to their run index, but ordered according to their finishing time.
Does not contain results stored in the trajectory! In order to access these, simply interact with the trajectory object, potentially after calling f_update_skeleton() and loading all results at once with f_load() or loading manually with f_load_items(). Even if you use multiprocessing without a pool, the results returned by runfunc still need to be pickled.
-
run
(runfunc, *args, **kwargs)[source]¶ Runs the experiments and explores the parameter space.
Parameters: - runfunc – The task or job to do
- args – Additional arguments (not the ones in the trajectory) passed to runfunc
- kwargs – Additional keyword arguments (not the ones in the trajectory) passed to runfunc
Returns: List of the individual results returned by runfunc.
Returns a LIST OF TUPLES, where first entry is the run idx and second entry is the actual result. They are always ordered according to the run index.
Does not contain results stored in the trajectory! In order to access these, simply interact with the trajectory object, potentially after calling f_update_skeleton() and loading all results at once with f_load() or loading manually with f_load_items(). If you use multiprocessing without a pool, the results returned by runfunc still need to be pickled.
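To make the shape of the return value concrete, here is a hedged sketch that mimics it with a toy trajectory stand-in instead of a real Environment; ToyTraj, runfunc, and the parameter x are hypothetical:

```python
class ToyTraj:
    # Toy stand-in for a pypet trajectory (illustration only).
    def __init__(self, x):
        self.x = x

def runfunc(traj, offset=0):
    # Hypothetical job: square the explored parameter and add an offset.
    return traj.x ** 2 + offset

# env.run(runfunc, offset=10) would conceptually yield a list of
# (run_idx, result) tuples, ordered by run index:
explored = [1.0, 2.0, 3.0]
results = [(idx, runfunc(ToyTraj(x), offset=10))
           for idx, x in enumerate(explored)]
# results -> [(0, 11.0), (1, 14.0), (2, 19.0)]
```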
-
run_map
(runfunc, *iter_args, **iter_kwargs)[source]¶ Calls runfunc with different args and kwargs each time.
Similar to run() but all
iter_args
anditer_kwargs
need to be iterables, iterators, or generators that return new arguments for each run.
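The pairing of iterables can be sketched without pypet: conceptually, each run receives the next element of every supplied iterable, similar to the built-in zip (runfunc and the argument names below are hypothetical):

```python
def runfunc(traj, a, b=0):
    # Hypothetical job combining per-run arguments; the trajectory
    # argument is unused in this toy example.
    return a + b

# env.run_map(runfunc, iter_a, b=iter_b) conceptually amounts to:
iter_a = [1, 2, 3]
iter_b = (10, 20, 30)
results = [(idx, runfunc(None, a, b=b))
           for idx, (a, b) in enumerate(zip(iter_a, iter_b))]
# results -> [(0, 11), (1, 22), (2, 33)]
```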
-
time
¶ Time of the creation of the environment, human readable.
-
timestamp
¶ Time of creation as a float timestamp
-
traj
¶ Equivalent to env.trajectory
-
trajectory
¶ The trajectory of the Environment
MultiprocContext¶
-
class
pypet.environment.
MultiprocContext
(trajectory, wrap_mode='LOCK', full_copy=None, manager=None, use_manager=True, lock=None, queue=None, queue_maxsize=0, port=None, timeout=None, gc_interval=None, log_config=None, log_stdout=False, graceful_exit=False)[source]¶ A lightweight environment that allows the usage of multiprocessing.
Can be used if you don’t want a full-blown
Environment
to enable multiprocessing or if you want to implement your own custom multiprocessing. This wrapper tool will take a trajectory container and take care that the storage service is multiprocessing safe. Supports the
'LOCK'
as well as the 'QUEUE'
mode. In case of the latter, an extra queue process is created if desired. This process will handle all storage requests and write data to the HDF5 file. Note that in case of
'QUEUE'
wrapping, data can only be stored, not loaded, because the queue will only be read in one direction. Parameters: - trajectory – The trajectory whose storage service should be wrapped
- wrap_mode –
There are four options:
WRAP_MODE_QUEUE
: (‘QUEUE’) If desired, another process for storing the trajectory is spawned. The sub-processes running the individual trajectories will add their results to a multiprocessing queue that is handled by an additional process. Note that this requires additional memory since data will be pickled and sent over the queue for storage!
WRAP_MODE_LOCK
: (‘LOCK’) Each individual process takes care of storage by itself. Before carrying out the storage, a lock is placed to prevent other processes from storing data. Accordingly, sometimes this leads to a lot of processes waiting until the lock is released. Yet, data does not need to be pickled before storage!
WRAP_MODE_PIPE
: (‘PIPE’) Experimental mode based on a single pipe. Faster than 'QUEUE' wrapping, but data corruption may occur and it does not work under Windows (since it relies on forking).
WRAP_MODE_LOCAL
: (‘LOCAL’) Data is not stored in spawned child processes, but needs to be returned manually in terms of reference dictionaries (the reference property of the ReferenceWrapper class). Storing is only performed in the main process. Note that removing data during a single run no longer has an effect on memory whatsoever, because references are kept for all data that is supposed to be stored.
- full_copy –
In case the trajectory gets pickled (sent over a queue or to a pool of processes), whether the full trajectory should be copied each time (i.e. all parameter points) or only a particular point. A particular point can be chosen beforehand with
f_set_crun()
.Leave
full_copy=None
if the setting from the passed trajectory should be used. Otherwise v_full_copy
of the trajectory is changed to your chosen value. - manager – You can pass an optional multiprocessing manager here,
if you already have instantiated one.
Leave
None
if you want the wrapper to create one. - use_manager –
Whether your lock and queue should be created via a manager or directly from the multiprocessing module.
For example:
multiprocessing.Lock()
or via a manager: multiprocessing.Manager().Lock()
(if you specified a manager, this manager will be used). The former is usually faster, whereas the latter is more flexible and can be used in an environment where fork is not available, for instance.
- lock – You can pass a multiprocessing lock here, if you already have instantiated one.
Leave
None
if you want the wrapper to create one in case of 'LOCK'
wrapping. - queue – You can pass a multiprocessing queue here, if you already instantiated one.
Leave
None
if you want the wrapper to create one in case of 'QUEUE' wrapping. - queue_maxsize – Maximum size of the queue if newly created. 0 means infinite.
- port – Port to be used by lock server in case of
'NETLOCK'
wrapping. Can be a single integer as well as a tuple(7777, 9999)
to specify a range of ports from which to pick a random one. Leave None for using pyzmq’s default range. In case automatic determining of the host’s ip address fails, you can also pass the full address (including the protocol and the port) of the host in the network like'tcp://127.0.0.1:7777'
. - timeout – Timeout for a NETLOCK wrapping in seconds. After
timeout
seconds a lock is automatically released and free for other processes. - gc_interval –
Interval (in runs or storage operations) with which
gc.collect()
should be called in case of the 'LOCAL', 'QUEUE', or 'PIPE' wrapping. Leave None for never. 1 means after every storing, 2 after every second storing, and so on. Only calls gc.collect() in the main process (if 'LOCAL' wrapping) or the queue/pipe process. If you need to garbage collect data within your single runs, you need to manually call gc.collect(). Usually, there is no need to set this parameter since the Python garbage collection works quite nicely and schedules collection automatically.
- log_config – Path to logging config file or dictionary to configure logging for the spawned queue process. Thus, only considered if the queue wrap mode is chosen.
- log_stdout – If stdout of the queue process should also be logged.
- graceful_exit – Hitting Ctrl+C won’t kill a server process unless hit twice.
For a usage example see Lightweight Multiprocessing.
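The gc_interval trigger rule described above can be sketched as a tiny helper. This only mirrors the stated n % gc_interval == 0 semantics and is an illustration, not pypet's internal code:

```python
import gc

def maybe_collect(n, gc_interval):
    # Collect on the n-th storing operation iff gc_interval is set
    # and n is a multiple of it.
    if gc_interval and n % gc_interval == 0:
        gc.collect()
        return True
    return False

# With gc_interval=2, collection fires after every second storing:
fired = [n for n in range(1, 7) if maybe_collect(n, 2)]
# fired -> [2, 4, 6]
```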
-
finalize
()[source]¶ Restores the original storage service.
If a queue process and a manager were used both are shut down.
Automatically called when used as a context manager.
-
start
()[source]¶ Starts the multiprocess wrapping.
Automatically called when used as a context manager.
-
store_references
(references)[source]¶ In case of reference wrapping, stores data.
Parameters: - references – References dictionary from a ReferenceWrapper.
- gc_collect – If
gc.collect
should be called. - n – Alternatively if
gc_interval
is set, a current index can be passed. Data is stored in casen % gc_interval == 0
.