Helpers

Helper modules.

These should be stand-alone modules that could reasonably be their own PyPI packages. This brings two benefits:

  1. The library is devoid of any business data, which makes it easier to understand.

  2. The code is decoupled, which makes it easy to reuse in different parts of the codebase. For example, the stack_exchange_graph_data.helpers.progress module is used in both stack_exchange_graph_data.helpers.curl.curl() and stack_exchange_graph_data.driver.load_xml_stream(). Because it wraps a stream, it can be used in any Python loop, and because it contains no business logic, no monkey patching is needed.

Cache

Simple file cache.

Exposes two forms of cache:

  1. A file that is downloaded from a website.

  2. A 7z archive cache - files that are extracted from a 7z archive.

class stack_exchange_graph_data.helpers.cache.Archive7zCache(cache_path: pathlib.Path, archive_cache: stack_exchange_graph_data.helpers.cache.CacheMethod)

Exposes a cache that extracts files from 7z archives.

ensure(use_cache: bool = True) → pathlib.Path

Ensure target file exists.

Extracts the 7z archive, showing the name and size of each file as it is extracted.

Parameters

use_cache – Set to False to force the archive to be extracted again.

Returns

Location of file.
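The chaining described above, where Archive7zCache obtains its archive from another CacheMethod endpoint and then extracts it, can be sketched as follows. This is a minimal sketch, not the library's implementation: it uses the standard library's zipfile module as a stand-in for 7z extraction, it assumes the wanted file sits at the top level of the archive under the same name as cache_path, and the names ArchiveCacheSketch and StaticCache are hypothetical.

```python
import zipfile
from pathlib import Path


class StaticCache:
    """Hypothetical upstream endpoint whose file is already on disk."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def ensure(self, use_cache: bool = True) -> Path:
        return self.path


class ArchiveCacheSketch:
    """Hypothetical archive cache; zipfile stands in for 7z extraction."""

    def __init__(self, cache_path: Path, archive_cache) -> None:
        self.cache_path = cache_path        # where the extracted file should end up
        self.archive_cache = archive_cache  # any endpoint with ensure() -> Path

    def ensure(self, use_cache: bool = True) -> Path:
        if use_cache and self.cache_path.exists():
            return self.cache_path
        # Ask the upstream endpoint for the archive; it may download it first.
        archive_path = self.archive_cache.ensure(use_cache)
        with zipfile.ZipFile(archive_path) as archive:
            for info in archive.infolist():
                # Show the name and size of each member being extracted.
                print(f"{info.filename} ({info.file_size} bytes)")
            # Assumption: the member is named like cache_path and sits at the top level.
            archive.extractall(self.cache_path.parent)
        return self.cache_path
```

Because the upstream endpoint only needs an ensure() method, a file cache and an archive cache compose without either knowing how the other obtains its data.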

class stack_exchange_graph_data.helpers.cache.Cache(cache_dir: pathlib.Path)

Interface to make cache instances.

archive_7z(cache_path: pathlib.Path, archive_cache: stack_exchange_graph_data.helpers.cache.CacheMethod) → stack_exchange_graph_data.helpers.cache.Archive7zCache

Get an archive cache endpoint.

Parameters
  • cache_path – Location of file relative to the cache directory.

  • archive_cache – A cache endpoint to get the 7z archive from.

Returns

An archive cache endpoint.

file(cache_path: str, url: str) → stack_exchange_graph_data.helpers.cache.FileCache

Get a file cache endpoint.

Parameters
  • cache_path – Location of file relative to the cache directory.

  • url – URL location of the file to download from if not cached.

Returns

A file cache endpoint.

class stack_exchange_graph_data.helpers.cache.CacheMethod(cache_path: pathlib.Path)

Base cache object.

_is_cached(use_cache: bool) → bool

Check if the target exists in the cache.

Parameters

use_cache – Set to False to force the data to be downloaded again.

Returns

True if we should use the cache.

ensure(use_cache: bool = True) → pathlib.Path

Ensure target file exists.

This should be overridden in child classes.

Parameters

use_cache – Set to False to force the data to be downloaded again.

Returns

Location of file.
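The split between _is_cached and ensure suggests a template-method design: the base class decides whether the cached copy may be used, and each child class supplies its own ensure. A minimal sketch under that assumption follows; CacheMethodSketch and TextCache are hypothetical names, and the real _is_cached may apply different checks.

```python
from pathlib import Path


class CacheMethodSketch:
    """Hypothetical base cache object mirroring CacheMethod."""

    def __init__(self, cache_path: Path) -> None:
        self.cache_path = cache_path

    def _is_cached(self, use_cache: bool) -> bool:
        # Use the cache only if allowed and the target is already on disk.
        return use_cache and self.cache_path.exists()

    def ensure(self, use_cache: bool = True) -> Path:
        # Child classes override this to create the target when it is missing.
        raise NotImplementedError


class TextCache(CacheMethodSketch):
    """Hypothetical child class that writes fixed text into the cache."""

    def __init__(self, cache_path: Path, text: str) -> None:
        super().__init__(cache_path)
        self.text = text
        self.writes = 0  # counts how often the target was (re)created

    def ensure(self, use_cache: bool = True) -> Path:
        if not self._is_cached(use_cache):
            self.cache_path.parent.mkdir(parents=True, exist_ok=True)
            self.cache_path.write_text(self.text)
            self.writes += 1
        return self.cache_path
```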

class stack_exchange_graph_data.helpers.cache.FileCache(cache_path: pathlib.Path, url: str)

Exposes a cache that allows downloading files.

ensure(use_cache: bool = True) → pathlib.Path

Ensure target file exists.

This downloads (curls) the file from the web into the cache, providing a progress bar whilst downloading.

Parameters

use_cache – Set to False to force the data to be downloaded again.

Returns

Location of file.
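A FileCache-style ensure can be sketched as below. To keep the example offline and self-contained, the download step is a hypothetical download callable supplied by the caller rather than the module's curl() helper. The sketch also writes to a temporary ".part" file and renames it afterwards, so an interrupted download never looks like a valid cache entry; that step is a design choice of this sketch, not something the original documents.

```python
from pathlib import Path
from typing import Callable


class FileCacheSketch:
    """Hypothetical file cache with a pluggable download function."""

    def __init__(self, cache_path: Path, url: str,
                 download: Callable[[str, Path], None]) -> None:
        self.cache_path = cache_path
        self.url = url
        self.download = download  # download(url, destination) -> None

    def ensure(self, use_cache: bool = True) -> Path:
        if use_cache and self.cache_path.exists():
            return self.cache_path
        self.cache_path.parent.mkdir(parents=True, exist_ok=True)
        partial = self.cache_path.with_name(self.cache_path.name + ".part")
        self.download(self.url, partial)
        # Only a finished download is moved into its final cached location.
        partial.replace(self.cache_path)
        return self.cache_path
```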

Coroutines

Coroutine helpers.

Much of this module is based on the assumption that Python doesn’t seamlessly handle the destruction of coroutines when multiplexing or broadcasting. It also eases interactions with coroutines that are closed prematurely.

class stack_exchange_graph_data.helpers.coroutines.CoroutineDelegator

Helper class for delegating to coroutines.

_increment_coroutine_refs() → None

Increment the number of sources for the coroutines.

run() → List[Iterator]

Send all data into the coroutine control flow.

Returns

If a coroutine is closed prematurely, the data that hasn’t yet entered the control flow is returned; otherwise an empty list is returned.

send_to(source: Union[Iterator, Iterable], target: Generator) → None

Add a source and target to send data to.

This does not send any data into the target; to do that, use the CoroutineDelegator.run() method.

Parameters
  • source – Input data; can be any iterable. Each item is passed unaltered to target.

  • target – This is the coroutine the data enters into to get into the coroutine control flow.
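The send_to/run split can be sketched as a small delegator that records (source, target) pairs and only pushes data when run() is called. DelegatorSketch is a hypothetical, simplified stand-in: it primes plain generators itself and does not implement the magic-coroutine reference counting described for coroutine(). When a target stops early, the rest of its source is returned as an iterator, in line with the documented return value of run().

```python
import inspect
from typing import Generator, Iterable, Iterator, List, Tuple


class DelegatorSketch:
    """Hypothetical, simplified stand-in for CoroutineDelegator."""

    def __init__(self) -> None:
        self._pairs: List[Tuple[Iterator, Generator]] = []

    def send_to(self, source: Iterable, target: Generator) -> None:
        # Only record the pair; nothing is sent until run() is called.
        self._pairs.append((iter(source), target))

    def run(self) -> List[Iterator]:
        leftovers: List[Iterator] = []
        for source, target in self._pairs:
            if inspect.getgeneratorstate(target) == inspect.GEN_CREATED:
                next(target)  # prime the entry coroutine once
            for item in source:
                try:
                    target.send(item)
                except StopIteration:
                    # Target closed early: hand back the data it never received.
                    leftovers.append(source)
                    break
        for _, target in self._pairs:
            target.close()  # a no-op for generators that have already finished
        return leftovers
```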

stack_exchange_graph_data.helpers.coroutines._is_magic_coroutine(target: Any) → bool

Check if target is a magic coroutine.

Parameters

target – An object to check against.

Returns

Whether the object is a magic coroutine.

stack_exchange_graph_data.helpers.coroutines.broadcast(*targets: Generator) → Generator

Broadcast items to targets.

stack_exchange_graph_data.helpers.coroutines.coroutine(function: Callable) → Callable

Wrap a coroutine generating function to make magic coroutines.

A magic coroutine is wrapped in a protective coroutine that eases the destruction of coroutine pipelines. This is because the coroutine is wrapped in a ‘bubble’ that:

  1. Primes the coroutine when the first element of data is passed to it.

  2. Sends information about the creation and destruction of other coroutines in the pipeline. This allows a coroutine to destroy itself when all providers have exited.

  3. Handles the coroutine being closed prematurely. In this case all target coroutines are notified that some data sources are no longer available, allowing them to deallocate themselves if needed.

  4. Handles situations where a target coroutine has been closed prematurely. In such a situation the current coroutine is closed and exits with a StopIteration error, as if it had been closed with .close().

These coroutine pipelines should be started via stack_exchange_graph_data.helpers.coroutines.CoroutineDelegator, because it correctly initializes the entry coroutine and handles premature closure of the coroutine.

Parameters

function – Standard coroutine generator function.

Returns

Function that generates magic coroutines.
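Part of the ‘bubble’ idea can be illustrated with a much smaller wrapper. lazy_coroutine is a hypothetical decorator, not the library’s coroutine(): it covers point 1 (priming on the first item) and a reduced form of point 4 (when the wrapped generator stops, the wrapper ends with StopIteration), and it forwards .close() to the wrapped generator. The source tracking and notifications of points 2 and 3 are omitted.

```python
import functools
from typing import Callable, Generator


def lazy_coroutine(function: Callable[..., Generator]) -> Callable[..., Generator]:
    """Hypothetical wrapper: prime on the first item, forward close()."""

    @functools.wraps(function)
    def wrapper(*args, **kwargs) -> Generator:
        inner = function(*args, **kwargs)

        def bubble() -> Generator:
            primed = False
            try:
                while True:
                    item = yield
                    try:
                        if not primed:
                            next(inner)  # prime lazily, on the first item
                            primed = True
                        inner.send(item)
                    except StopIteration:
                        return  # inner finished: end the bubble as well
            finally:
                inner.close()  # propagate closure to the wrapped generator

        return bubble()

    return wrapper
```

The returned bubble still needs a single next() call before data is sent into it; in the documented design that initialization is the job of CoroutineDelegator.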

stack_exchange_graph_data.helpers.coroutines.file_sink(*args: Any, **kwargs: Any) → Generator

Send all data to a file.

stack_exchange_graph_data.helpers.coroutines.primed_coroutine(function: Callable[[...], Generator]) → Callable

Primes a coroutine at creation.

Parameters

function – A coroutine function.

Returns

The coroutine function wrapped to prime the coroutine at creation.
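Priming at creation is a common decorator pattern: call the generator function, advance the generator to its first yield, and return it ready for send(). A minimal sketch follows; primed_coroutine_sketch is a hypothetical name, and file_sink_sketch is a hypothetical sink in the spirit of file_sink, assuming its arguments are forwarded to open().

```python
import functools
from typing import Any, Callable, Generator


def primed_coroutine_sketch(function: Callable[..., Generator]) -> Callable[..., Generator]:
    """Advance each new generator to its first yield so send() works at once."""

    @functools.wraps(function)
    def wrapper(*args: Any, **kwargs: Any) -> Generator:
        generator = function(*args, **kwargs)
        next(generator)
        return generator

    return wrapper


@primed_coroutine_sketch
def file_sink_sketch(*args: Any, **kwargs: Any) -> Generator:
    """Hypothetical sink: open(*args, **kwargs) and write every item sent in."""
    with open(*args, **kwargs) as file:
        while True:
            file.write((yield))  # closing the generator closes the file
```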

Curl

Copy URL.

stack_exchange_graph_data.helpers.curl.curl(path: pathlib.Path, *args: Any, **kwargs: Any) → None

Download file to system.

Provides a progress bar of the file being downloaded and some statistics around the file and download.

Parameters
  • path – Local path to save the file to.

  • args, kwargs – Passed to requests.get.
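A streamed download with a progress read-out can be sketched with the standard library alone. curl_sketch is hypothetical: it uses urllib.request.urlopen in place of the requests library, reads the response in fixed-size chunks, and writes a byte count (plus a percentage when the server reports a Content-Length) to a text stream. The chunk size and output format are assumptions.

```python
import sys
import urllib.request
from pathlib import Path
from typing import TextIO


def curl_sketch(path: Path, url: str, out: TextIO = sys.stderr,
                chunk_size: int = 64 * 1024) -> int:
    """Download url to path while reporting progress; return bytes written."""
    written = 0
    with urllib.request.urlopen(url) as response, open(path, "wb") as file:
        length = response.headers.get("Content-Length")
        total = int(length) if length else None
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            file.write(chunk)
            written += len(chunk)
            if total:
                out.write(f"\r{written}/{total} bytes ({100 * written // total}%)")
            else:
                out.write(f"\r{written} bytes")  # size unknown: show a count only
        out.write("\n")
    return written
```

Reading in chunks keeps memory use flat for large dumps and gives a natural point at which to update the progress display.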

Progress

Display progress of a stream.

class stack_exchange_graph_data.helpers.progress.BaseProgressStream(stream: Iterator[T], size: Optional[int], si: Callable[[int], Tuple[int, str]], progress: Callable[[T], int], width: int = 20, prefix: str = '', start: int = 0, message: Optional[str] = None)

Display the progress of a stream.

_get_progress(current: int) → str

Get the progress of the stream.

Parameters

current – Current progress as an absolute amount, not a percentage.

Returns

Progress bar and file size.

class stack_exchange_graph_data.helpers.progress.DataProgressStream(stream: Iterator[T], size: Optional[int], width: int = 20, prefix: str = '', message: Optional[str] = None)

Display progress of a data stream.

class stack_exchange_graph_data.helpers.progress.ItemProgressStream(stream: Iterator[T], size: Optional[int], width: int = 20, prefix: str = '', message: Optional[str] = None)

Display progress of an item stream.
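Because the progress classes wrap an iterator and pass its items through unchanged, they can be dropped into any loop. A minimal sketch of that idea follows; progress_stream is a hypothetical generator function rather than the documented classes, and the bar layout is an assumption. Its progress callable plays the role of BaseProgressStream’s progress parameter: counting one per item resembles an item stream, while passing len resembles a data stream.

```python
import sys
from typing import Callable, Iterable, Iterator, Optional, TextIO, TypeVar

T = TypeVar("T")


def progress_stream(stream: Iterable[T], size: Optional[int], width: int = 20,
                    prefix: str = "", out: TextIO = sys.stderr,
                    progress: Callable[[T], int] = lambda item: 1) -> Iterator[T]:
    """Yield items from stream unchanged while drawing a progress bar."""
    current = 0
    for item in stream:
        yield item
        current += progress(item)  # e.g. 1 per item, or len(chunk) for data
        if size:
            filled = min(width, width * current // size)
            bar = "#" * filled + "-" * (width - filled)
            out.write(f"\r{prefix}[{bar}] {current}/{size}")
        else:
            out.write(f"\r{prefix}{current}")  # unknown size: just count
    out.write("\n")
```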

SI

Simplify a number to a chosen base.

class stack_exchange_graph_data.helpers.si.Magnitude

Magnitude conversions.

byte(value: int) → Tuple[int, str]

Convert a number to a truncated base form.

Parameters

value – Value to adjust.

Returns

Truncated value and unit.

ibyte(value: int) → Tuple[int, str]

Convert a number to a truncated base form.

Parameters

value – Value to adjust.

Returns

Truncated value and unit.

number(value: int) → Tuple[int, str]

Convert a number to a truncated base form.

Parameters

value – Value to adjust.

Returns

Truncated value and unit.

stack_exchange_graph_data.helpers.si.display(values: Tuple[int, str], decimal_places: int = 2) → str

Display a truncated number to the requested number of decimal places.

Parameters
  • values – Value and unit to display.

  • decimal_places – Number of decimal places to display the value to.

Returns

Right-aligned display value.
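A display function in this spirit can be sketched with a format specification: format the value to the requested number of decimal places, right-align it, and append the unit. The name display_sketch and the column width (room for three integer digits, plus the decimal point and decimals when there are any) are assumptions of this sketch.

```python
from typing import Tuple


def display_sketch(values: Tuple[float, str], decimal_places: int = 2) -> str:
    """Right-align a (value, unit) pair to a fixed number of decimal places."""
    value, unit = values
    # Three integer digits, plus the point and decimals when there are any,
    # so that values below 1000 line up in a column.
    width = 3 + (decimal_places + 1 if decimal_places > 0 else 0)
    return f"{value:>{width}.{decimal_places}f} {unit}"
```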

stack_exchange_graph_data.helpers.si.si_magnitude(base: int, suffix: str, prefixes: str) → Callable[[int], Tuple[int, str]]

SI base converter builder.

Parameters
  • base – Base to truncate values to.

  • suffix – Suffix used to denote the type of information.

  • prefixes – Prefixes before the suffix to denote magnitude.

Returns

A function to change a value by the above parameters.
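A builder matching this signature can be sketched as a closure that keeps dividing by base while the value is at least base, moving one step along prefixes each time. si_magnitude_sketch is a hypothetical name. The prefix strings and the byte and number definitions below are assumptions about how Magnitude might use such a builder. This sketch returns the scaled value as a float so that decimal places remain meaningful when displayed.

```python
from typing import Callable, Tuple


def si_magnitude_sketch(base: int, suffix: str,
                        prefixes: str) -> Callable[[int], Tuple[float, str]]:
    """Build a converter that scales a value down by powers of base."""

    def convert(value: int) -> Tuple[float, str]:
        scaled = float(value)
        prefix = ""
        for candidate in prefixes:
            if abs(scaled) < base:
                break  # small enough: keep the current prefix
            scaled /= base
            prefix = candidate
        return scaled, prefix + suffix

    return convert


# Assumed definitions, in the spirit of Magnitude.byte and Magnitude.number.
byte = si_magnitude_sketch(1000, "B", "kMGTPEZY")
number = si_magnitude_sketch(1000, "", "kMGTPEZY")
```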

XRef

Expand partial xrefs.

stack_exchange_graph_data.helpers.xref.custom_parser(prefix: str) → Type[docutils.parsers.Parser]

Markdown parser with partial xref support.

Extends recommonmark.parser.CommonMarkParser to include the custom_parser.PendingXRefTransform transform.

Parameters

prefix – HTTP base to prepend to partial hyperlinks.

Returns

A custom parser to parse Markdown.
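The core of partial-link expansion, prepending prefix to hyperlinks that are not already absolute, can be sketched without docutils. expand_partial is a hypothetical helper, and its rule is an assumption: a reference counts as partial when it has no URL scheme or network location and is not a same-page anchor. The documented custom_parser applies its logic as a docutils transform rather than as a plain function.

```python
from urllib.parse import urlparse


def expand_partial(reference: str, prefix: str) -> str:
    """Prepend prefix to a partial reference; leave absolute links alone."""
    parsed = urlparse(reference)
    if parsed.scheme or parsed.netloc or reference.startswith("#"):
        return reference  # absolute URL, protocol-relative URL, or anchor
    return prefix + reference
```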