Resources¶
Resource (base class)¶
- class datagrowth.resources.base.Resource(*args, **kwargs)¶
This class defines the interface that all resources adhere to. You’ll rarely extend this class directly. The
HttpResource and ShellResource are examples of classes that extend this class.
- clean()¶
Hook for doing any extra model-wide validation after clean() has been called on every field by self.clean_fields. Any ValidationError raised by this method will not be associated with a particular field; it will have a special-case association with the field defined by NON_FIELD_ERRORS.
- close()¶
This convenience method handles both the clean and save step for saving models. To make use of the resource cache it’s necessary to clean before saving and close handles this directly.
- property content¶
This property typically gets overridden for different resource types. It should return the content_type and data from the resource.
- Returns:
content_type, data
- classmethod get_name()¶
Return the name of the resource. This is the model_name for almost all resources.
- Returns:
(str) lowercase model name
- classmethod get_queue_name()¶
Returns the queue name that background tasks should dispatch to. By default it returns the default Django Celery queue name.
- Returns:
(str) queue name
- handle_errors()¶
Override this method to handle resource-specific error cases. Usually you’d raise a particular
DGResourceException to indicate particular errors.
- next()¶
Creates a new Resource that is the follow-up of the current Resource, such as the Resource for the next page of a source that supports pagination. Returns None if no such follow-up exists (the default).
- retain(retainer)¶
Links any Django model to a resource through a
GenericRelation. Any resources retained this way will not get deleted from the cache. This is convenient for saving context that can help during debugging.
- Parameters:
retainer – (model) the model retaining the resource
- property success¶
This property typically gets overridden for different resource types. It should indicate the success of the data gathering.
- Returns:
(bool)
- variables(*args)¶
Maps the input arguments from a resource to a dictionary. This makes it easy to access the positional input variables by name. Override this method to create the mapping for your particular resource.
- Parameters:
args – (tuple) the positional arguments given as input to the resource
- Returns:
(dict) a dictionary with the input variables as values
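As a rough illustration of such a mapping, the sketch below is a standalone function rather than a real Resource subclass. The variable names `query` and `page` are hypothetical and only serve the example.

```python
def variables(*args):
    """Map positional input to named variables (names are hypothetical)."""
    names = ("query", "page")
    # zip stops at the shortest sequence, so missing arguments are simply absent
    return dict(zip(names, args))


mapping = variables("django", 2)
```

In a real resource the same idea would live in an overridden `variables` method, so other methods can look up input by name instead of by position.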
Http¶
- class datagrowth.resources.http.generic.HttpResource(*args, **kwargs)¶
You can extend this base class to declare a
Resource that gathers data from an HTTP(S) source, for instance websites and (REST) APIs.
This class is a wrapper around the requests library and provides:
easy follow-up of continuation URLs in responses
cached responses when retrieving data a second time
authentication through headers or GET parameters without storing credentials
slower request rates when servers give errors or warnings related to high load
Response headers, body and status get stored in the database, as well as an abstraction of the request. Any authentication data gets stripped before storage in the database. Override the handle_errors method to customize how errors in responses are detected.
- auth_headers()¶
Returns the dictionary that should be used as authentication headers for the request the resource will make. Override this method in your own class to add authentication. By default this method returns an empty dictionary meaning there are no authentication headers.
- Returns:
(dict) dictionary with headers to add to requests
- auth_parameters()¶
Returns the dictionary that should be used as authentication parameters for the request the resource will make. Override this method in your own class to add authentication. By default this method returns an empty dictionary meaning there are no authentication parameters.
- Returns:
(dict) dictionary with parameters to add to requests
- clean()¶
Hook for doing any extra model-wide validation after clean() has been called on every field by self.clean_fields. Any ValidationError raised by this method will not be associated with a particular field; it will have a special-case association with the field defined by NON_FIELD_ERRORS.
- property content¶
After a successful
get or post call this property reads the Content-Type header from the HTTP response. Depending on the MIME type it will return the content type and the parsed data.
For a Content-Type of application/json, data will be a Python structure
For a Content-Type of text/html or text/xml, data will be a BeautifulSoup instance
Any other Content-Type will result in None. You are encouraged to extend
HttpResource to handle your own data types.
- Returns:
content_type, data
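The dispatch on MIME type described for `content` can be sketched with a small parser table. This is a standalone illustration, not the library's implementation: `PARSERS` and `parse_body` are hypothetical names, the HTML/XML branch is left out to avoid the BeautifulSoup dependency, and the `text/csv` entry only shows how an extra data type might be added.

```python
import csv
import io
import json

# Hypothetical registry that maps a MIME type to a parser callable
PARSERS = {
    "application/json": json.loads,
    "text/csv": lambda body: list(csv.reader(io.StringIO(body))),
}


def parse_body(content_type, body):
    """Return (mime_type, data); data is None for unsupported MIME types."""
    # Drop parameters such as "; charset=utf-8" before the lookup
    mime_type = content_type.split(";")[0].strip().lower()
    parser = PARSERS.get(mime_type)
    data = parser(body) if parser is not None else None
    return mime_type, data
```

Keeping the parsers in a table means a new data type only needs one extra entry, which mirrors the idea of extending the class for your own data types.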
- create_next_request()¶
Creates and returns a dictionary that represents a continuation request. Often a source will indicate how to continue gathering more data. By overriding
next_parameters, developers can indicate how continuation requests can be made. Calling this method will build a new request using these parameters.
- Returns:
(dict) a dictionary representing a continuation request to be made
- data(**kwargs)¶
Returns the dictionary that will be used as the HTTP body for the request the resource will make. By default this is the dictionary from the
DATA attribute, updated with the kwargs passed to the send method.
- Parameters:
kwargs – keyword arguments from the input
- Returns:
- get(*args, **kwargs)¶
This method calls
send with “get” as the method. See the send method for more information.
- Parameters:
args – arguments that will get merged into the URI_TEMPLATE
kwargs – keyword arguments that will get sent as data
- Returns:
HttpResource
- handle_errors()¶
Raises exceptions upon error statuses. Override this method to raise exceptions for your own error states. By default it raises the
DGHttpError40X and DGHttpError50X exceptions for the corresponding error statuses.
- static hash_from_data(data)¶
Given a dictionary, this method recursively sorts and JSON dumps the keys and values of that dictionary. The end result is given to SHA-1 to create a hash that is unique for that data. This hash can be used for a database lookup to find earlier requests that sent the same data.
- Parameters:
data – (dict) a dictionary of the data to be hashed
- Returns:
the hash of the data
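The idea behind this hash can be sketched with the standard library. This is a simplified stand-in, not the library's code: it relies on `json.dumps(sort_keys=True)`, which sorts dictionary keys at every nesting level but leaves list order untouched.

```python
import hashlib
import json


def hash_from_data(data):
    """Hash a dictionary so that key order does not influence the result."""
    # sort_keys makes the JSON text identical for logically equal dictionaries
    payload = json.dumps(data, sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


first = hash_from_data({"a": 1, "b": {"d": 2, "c": 3}})
second = hash_from_data({"b": {"c": 3, "d": 2}, "a": 1})
```

Both calls produce the same 40-character hexadecimal digest, which is what makes the hash usable as a lookup key for earlier requests.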
- headers(*args, **kwargs)¶
Returns the dictionary that should be used as headers for the request the resource will make. By default this is the dictionary from the
HEADERS attribute.
- Parameters:
args – positional arguments from the input (ignored by default)
kwargs – keyword arguments from the input (ignored by default)
- Returns:
(dict) a dictionary representing HTTP headers
- next() → Self | None¶
Creates a new Resource that is the follow-up of the current Resource, such as the Resource for the next page of a source that supports pagination. Returns None if no such follow-up exists (the default).
- next_parameters()¶
Returns the dictionary that should be used as HTTP query parameters for the continuation request a resource can make. By default this is an empty dictionary. Override this method and return the correct parameters based on the
content of the resource.
- Returns:
(dict) a dictionary representing HTTP continuation query parameters
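How continuation parameters feed into a follow-up request can be sketched as two plain functions. Everything specific here is an assumption for illustration: the `next_page_token` and `page_token` names, and the shape of the request dictionary with a `params` key.

```python
def next_parameters(data):
    """Return query parameters for the next page, or {} when there is none."""
    token = data.get("next_page_token")  # hypothetical field in the response
    if not token:
        return {}
    return {"page_token": token}


def build_next_request(request, data):
    """Copy the request and merge in continuation parameters, or return None."""
    params = next_parameters(data)
    if not params:
        return None
    next_request = dict(request)
    # Build a fresh params dict so the original request stays untouched
    next_request["params"] = {**request.get("params", {}), **params}
    return next_request


request = {"url": "https://example.com/items", "params": {"q": "books"}}
follow_up = build_next_request(request, {"next_page_token": "abc"})
```

Returning an empty dictionary when there is nothing to continue with keeps the "no follow-up" case explicit, which matches the default behaviour described above.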
- parameters(**kwargs)¶
Returns the dictionary that should be used as HTTP query parameters for the request the resource will make. By default this is the dictionary from the
PARAMETERS attribute.
You may need to override this method. It will receive the return value of the variables method as kwargs.
- Parameters:
kwargs – variables returned by the variables method (ignored by default)
- Returns:
(dict) a dictionary representing HTTP query parameters
- static parse_content_type(content_type, default_encoding='utf-8')¶
Given an HTTP Content-Type header, this method will return the MIME type and the encoding. If no encoding is found, the default encoding gets returned.
- Parameters:
content_type – (str) the HTTP Content-Type header
default_encoding – (str) the default encoding when no encoding is found in the header
- Returns:
mime_type, encoding
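A minimal sketch of this kind of header parsing, using only string operations. It is an approximation for illustration and does not cover every edge case of the header grammar.

```python
def parse_content_type(content_type, default_encoding="utf-8"):
    """Split a Content-Type header into (mime_type, encoding)."""
    mime_type, *parameters = [part.strip() for part in content_type.split(";")]
    encoding = default_encoding
    for parameter in parameters:
        key, _, value = parameter.partition("=")
        # Only the charset parameter influences the encoding
        if key.strip().lower() == "charset" and value:
            encoding = value.strip().strip('"')
    return mime_type, encoding


with_charset = parse_content_type("text/html; charset=ISO-8859-1")
without_charset = parse_content_type("application/json")
```

The first call yields the charset from the header, the second falls back to the default encoding.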
- patch(*args, **kwargs)¶
This method calls
send with “patch” as the method. See the send method for more information.
- Parameters:
args – arguments that will get merged into the URI_TEMPLATE
kwargs – keyword arguments that will get sent as data
- Returns:
HttpResource
- post(*args, **kwargs)¶
This method calls
send with “post” as the method. See the send method for more information.
- Parameters:
args – arguments that will get merged into the URI_TEMPLATE
kwargs – keyword arguments that will get sent as data
- Returns:
HttpResource
- put(*args, **kwargs)¶
This method calls
send with “put” as the method. See the send method for more information.
- Parameters:
args – arguments that will get merged into the URI_TEMPLATE
kwargs – keyword arguments that will get sent as data
- Returns:
HttpResource
- request_with_auth()¶
Get the
request that this resource will make, with authentication headers and parameters added. Override auth_headers and/or auth_parameters to provide the headers and/or parameters.
- Returns:
(dict) a copy of the
request dictionary with authentication added
- request_without_auth()¶
Get the
request that this resource will make, with authentication headers and parameters from auth_headers and auth_parameters removed.
- Returns:
(dict) a copy of the
request dictionary with authentication removed
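The add-then-strip pattern behind these two methods can be sketched on a plain dictionary. The request shape with `headers` and `params` keys is an assumption made for this example, and the credential values are placeholders.

```python
from copy import deepcopy

# Placeholder credentials; in a real resource these would come from
# auth_headers() and auth_parameters()
AUTH_HEADERS = {"Authorization": "Bearer <token>"}
AUTH_PARAMETERS = {"api_key": "<key>"}


def request_with_auth(request):
    """Return a copy of the request with credentials added."""
    authed = deepcopy(request)
    authed.setdefault("headers", {}).update(AUTH_HEADERS)
    authed.setdefault("params", {}).update(AUTH_PARAMETERS)
    return authed


def request_without_auth(request):
    """Return a copy of the request with credentials removed."""
    stripped = deepcopy(request)
    for key in AUTH_HEADERS:
        stripped.get("headers", {}).pop(key, None)
    for key in AUTH_PARAMETERS:
        stripped.get("params", {}).pop(key, None)
    return stripped


request = {"url": "https://example.com", "headers": {"Accept": "application/json"}, "params": {"q": "x"}}
authed = request_with_auth(request)
```

Working on copies means the stored request never holds credentials, which is the property that lets requests be saved without leaking authentication data.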
- send(method, *args, **kwargs)¶
This method handles the gathering of data and updating the model based on the resource configuration. If the data has been retrieved before it will load the data from cache instead. Specify
cache_only in your config if you want to prevent any HTTP requests. The data might be missing in that case.
You must specify the method that the resource will use to get the data. Currently this can be the “get” and “post” HTTP verbs.
Any arguments will be passed to
URI_TEMPLATE to format it. Any keyword arguments will be passed as a data dict to the request. If a keyword is listed in the FILE_DATA_KEYS attribute on a HttpResource, then the value of that argument is expected to be a file path relative to DATAGROWTH_WEB_MEDIA_ROOT. The value of that keyword will be replaced with the file before making the request.
- Parameters:
method – “get” or “post” depending on which request you want your resource to execute
args – arguments that will get merged into the
URI_TEMPLATE
kwargs – keyword arguments that will get sent as data
- Returns:
HttpResource
- set_error(status, connection_error=False)¶
Sets the given status on the HttpResource. When dealing with connection_errors it sets valid defaults.
- Parameters:
status – (int) the error status from the response
connection_error – (bool) whether the error occurred during a connection error
- Returns:
- property success¶
Returns True if status is within HTTP success range
- Returns:
Boolean
- static uri_from_url(url)¶
Given a URL, this method will strip the protocol and sort the parameters. That way a database lookup for a URL will always return URLs that logically match that URL.
- Parameters:
url – the URL to normalize to URI
- Returns:
a normalized URI suitable for lookups
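A rough sketch of this normalization with `urllib.parse`. It is an illustration of the idea, not the library's implementation; details such as how blank values or fragments are treated are assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def uri_from_url(url):
    """Drop the protocol and sort query parameters to get a stable lookup key."""
    parts = urlsplit(url)
    # Sorting the (key, value) pairs makes parameter order irrelevant
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    uri = parts.netloc + parts.path
    return uri + "?" + query if query else uri


secure = uri_from_url("https://example.com/search?b=2&a=1")
plain = uri_from_url("http://example.com/search?a=1&b=2")
```

Both URLs normalize to the same string, so either one would find the same cached entry.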
- validate_request(request, validate_input=True)¶
Validates a dictionary that represents a request that the resource will make. Currently it checks the method, which should be “get” or “post”, and whether the current data (if any) is still valid or has expired. Apart from that it validates input, which should adhere to the JSON schema defined in the
GET_SCHEMA or POST_SCHEMA attributes.
- Parameters:
request – (dict) the request dictionary
validate_input – (bool) whether to validate input
- Returns:
- variables(*args)¶
Parses the input variables and returns a dictionary with a “url” key. This key contains a list of variables that will be used to format the
URI_TEMPLATE.
- Returns:
(dict) a dictionary where the input variables are available under names
- class datagrowth.resources.http.generic.URLResource(*args, **kwargs)¶
Sometimes you don’t want to build a URI through the
URI_TEMPLATE, because you already have a URL from which data should be retrieved immediately. For this use case the URLResource is very suitable. Just pass the URL as the first argument to either get or post and the request will be made.
Only full URLs with a protocol are accepted as an argument. Note that it is not possible to adjust the parameters through the
parameters method, because it is assumed that all parameters are part of the URL given to get or post.
- PARAMETERS = None¶
- class datagrowth.resources.http.files.HttpFileResource(*args, **kwargs)¶
Sometimes you want to download a file instead of storing the content in the database. For this use case the
HttpFileResource is very suitable. Just pass the URL as the first argument to get and the URL will be downloaded as a file, storing it in your MEDIA_ROOT.
The file path of the downloaded file will get stored in the
body field. This path will be relative to the MEDIA_ROOT. The path will include a downloads folder and a subfolder that is the app_name of the concrete class. Under that directory there are many possible subdirectories in the form of “x/yz/”, where x, y and z are hexadecimal characters. Creating these subdirectories is necessary to prevent huge download directories that would hamper performance.
Only full URLs with a protocol will get downloaded. Any URLs without a protocol will get stored as a failure with a 404 (Not Found) error code. Please note that with this class it is not possible to adjust the parameters through the
parameters method, because it is assumed that all parameters are part of the URL given to get.
- property content¶
Opens the file stored at the file path in
body and returns that file together with the content type.
- Returns:
content_type, file
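The “x/yz/” directory layout described for this class could be produced along the following lines. How the library actually picks the three hexadecimal characters is not stated here, so deriving them from a SHA-1 hash of the URL is purely an assumption, as is the `download_directory` helper name.

```python
import hashlib
import os


def download_directory(url, app_name):
    """Build a sharded directory path of the form downloads/<app_name>/x/yz."""
    # Assumption: the shard characters come from a hash of the URL
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return os.path.join("downloads", app_name, digest[0], digest[1:3])


directory = download_directory("https://example.com/report.pdf", "my_app")
```

With three hexadecimal characters there are 4096 possible leaf directories, which keeps any single directory from growing too large.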
- static get_file_name(original, now)¶
Override this method to change the file naming convention. By default it will take the filename from the URL and prefix it with a datetime string of the date and time at downloading.
- Parameters:
original – (str) the URL file name
now – (datetime) a datetime object to use as prefix input
- Returns:
- post(*args, **kwargs)¶
This method calls
send with “post” as the method. See the send method for more information.
- Parameters:
args – arguments that will get merged into the URI_TEMPLATE
kwargs – keyword arguments that will get sent as data
- Returns:
HttpResource
- transform(file)¶
By default the
content property will return the file wrapped in a Django File class. It may be convenient to wrap it in some other way. Override this method and return the file in a different format to change the content return value.
- Parameters:
file – (File) the file read from storage
- Returns:
(any) file in correct format
- class datagrowth.resources.http.files.HttpImageResource(*args, **kwargs)¶
This class acts like the HttpFileResource with the only difference that it will return content as Pillow images.
- transform(file)¶
By default the
content property will return the file wrapped in a Django File class. It may be convenient to wrap it in some other way. Override this method and return the file in a different format to change the content return value.
- Parameters:
file – (File) the file read from storage
- Returns:
(any) file in correct format
- datagrowth.resources.http.files.file_resource_delete_handler(sender, instance, **kwargs)¶
A Django signal handler that can be bound to a
post_delete signal to free disk space when file resources get deleted.
- Parameters:
sender – receives the class that is sending the signal
instance – the object under deletion
kwargs – ignored, for compatibility only
Shell¶
- class datagrowth.resources.shell.ShellResource(*args, **kwargs)¶
You can extend this base class to declare a
Resource that gathers data from any shell command.
This class is a wrapper around the subprocess module and provides:
cached responses when retrieving data a second time
The resource stores the stdin, stdout and stderr from commands in the database as well as an abstraction of the command.
- clean()¶
Hook for doing any extra model-wide validation after clean() has been called on every field by self.clean_fields. Any ValidationError raised by this method will not be associated with a particular field; it will have a special-case association with the field defined by NON_FIELD_ERRORS.
- clean_stderr(stderr)¶
This method decodes the stderr from the subprocess result to UTF-8. Override this method to do any further cleanup.
- Parameters:
stderr – (bytes) stderr from the command
- Returns:
(str) cleaned decoded output
- clean_stdout(stdout)¶
This method decodes the stdout from the subprocess result to UTF-8. Override this method to do any further cleanup.
- Parameters:
stdout – (bytes) stdout from the command
- Returns:
(str) cleaned decoded output
- property content¶
After a successful
run call this property passes stdout from the command through the transform method. It then returns the value of the CONTENT_TYPE attribute as content type and whatever transform returns as data.
- Returns:
content_type, data
- debug()¶
A method that prints to stdout the command that will get executed by the
ShellResource. This is mostly useful for debugging during development.
- environment(*args, **kwargs)¶
You can specify environment variables for the command based on the input to
run by overriding this method. The input from run is passed down to this method. Based on this input a dictionary containing the environment variables should get returned, or None if no environment should be set.
By default this method returns the
VARIABLES attribute without making changes to it.
- Parameters:
args – arguments from the
run command
kwargs – keyword arguments from the
run command
- Returns:
a dictionary with environment variables or None
- handle_errors()¶
Raises exceptions upon error statuses. Override this method to raise exceptions for your own error states. By default it raises the
DGShellError for any status other than 0.
- run(*args, **kwargs)¶
This method handles the gathering of data and updating the model based on the resource configuration. If the data has been retrieved before it will load the data from cache instead. Specify
cache_only in your config if you want to prevent any execution of commands. The data might be missing in that case.
Any arguments will be passed to
CMD_TEMPLATE to format it. Any keyword arguments will be parsed into command flags by using the FLAGS attribute. The parsed flags will be inserted into CMD_TEMPLATE wherever the CMD_FLAGS value is present.
- Parameters:
args – get passed on to the command
kwargs – get parsed into flags before being passed on to the command
- Returns:
self
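The way positional arguments and keyword flags end up in a command can be sketched as below. The exact format of `FLAGS` and `CMD_TEMPLATE` is not specified in this reference, so the shapes used here (a list template with a `CMD_FLAGS` placeholder, and a dictionary mapping keyword names to flag prefixes) are assumptions, as is the `build_command` helper.

```python
# Assumed shapes, for illustration only
CMD_TEMPLATE = ["grep", "CMD_FLAGS", "{}", "{}"]
FLAGS = {"context": "--context=", "max_count": "--max-count="}


def build_command(*args, **kwargs):
    """Format the template with args and insert flags built from kwargs."""
    arguments = iter(args)
    # Only keywords that are declared in FLAGS become command flags
    flags = [FLAGS[key] + str(value) for key, value in kwargs.items() if key in FLAGS]
    command = []
    for part in CMD_TEMPLATE:
        if part == "CMD_FLAGS":
            command.extend(flags)
        elif "{}" in part:
            command.append(part.format(next(arguments)))
        else:
            command.append(part)
    return command


command = build_command("needle", "haystack.txt", context=2)
```

Building the command as a list rather than a single string fits the list form that `subprocess.run` accepts and avoids shell quoting issues.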
- property success¶
Returns True if exit code is 0 and there is some stdout
- transform(stdout)¶
Override this method for particular commands. It takes the stdout from the command and transforms it into useful output for other components. One use case could be to clean out log lines from the output.
- Parameters:
stdout – the stdout from the command
- Returns:
transformed stdout
- static uri_from_cmd(cmd)¶
Given a command list, this method will sort that list but keep the first element in first position. That way a database lookup for a command will always return commands that logically match that command, regardless of flag or argument order. At the same time similar commands will appear beneath each other in an overview.
- Parameters:
cmd – the command list as passed to subprocess.run to normalize to URI
- Returns:
a normalized URI suitable for lookups
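A minimal sketch of this normalization. Joining the parts with a space to form the final string is an assumption made for the example.

```python
def uri_from_cmd(cmd):
    """Sort all parts of a command except the executable itself."""
    if not cmd:
        return ""
    head, *rest = cmd
    # The executable stays first so similar commands sort next to each other
    return " ".join([head] + sorted(rest))


first = uri_from_cmd(["grep", "-i", "needle", "file.txt"])
second = uri_from_cmd(["grep", "needle", "file.txt", "-i"])
```

Both calls give the same result, so the order in which flags and arguments were passed does not affect the lookup.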
- validate_command(command, validate_input=True)¶
Validates a dictionary that represents a command that the resource will run.
It currently checks whether the current data (if any) is still valid or has expired. Apart from that it validates input, which should adhere to the JSON schema defined in the
SCHEMA attribute.
- Parameters:
command – (dict) the command dictionary
validate_input – (bool) whether to validate input
- Returns:
- variables(*args)¶
Parses the input variables and returns a dictionary with an “input” key. This key contains a list of variables that will be used to format the
CMD_TEMPLATE.
- Returns:
(dict) a dictionary where the input variables are available under names