oaebu_workflows.irus_fulcrum_telescope.irus_fulcrum_telescope

Module Contents

Classes

IrusFulcrumRelease

Create an IrusFulcrumRelease instance.

IrusFulcrumTelescope

The Fulcrum Telescope

Functions

download_fulcrum_month_data(download_month, requestor_id)

Download Fulcrum data for the release month

transform_fulcrum_data(totals_data, country_data[, ...])

Transforms Fulcrum downloaded "totals" and "country" data.

Attributes

IRUS_FULCRUM_ENDPOINT_TEMPLATE

oaebu_workflows.irus_fulcrum_telescope.irus_fulcrum_telescope.IRUS_FULCRUM_ENDPOINT_TEMPLATE = 'https://irus.jisc.ac.uk/api/v3/irus/reports/irus_ir/?platform=235&requestor_id={requestor_id}&beg...'[source]

class oaebu_workflows.irus_fulcrum_telescope.irus_fulcrum_telescope.IrusFulcrumRelease(dag_id, run_id, data_interval_start, data_interval_end, partition_date)[source]

Bases: observatory.platform.workflows.workflow.PartitionRelease

Create an IrusFulcrumRelease instance.

Parameters:
  • dag_id (str) – The ID of the DAG

  • run_id (str) – The airflow run ID

  • data_interval_start (pendulum.DateTime) – The beginning of the data interval

  • data_interval_end (pendulum.DateTime) – The end of the data interval

  • partition_date (pendulum.DateTime) – The release/partition date

class oaebu_workflows.irus_fulcrum_telescope.irus_fulcrum_telescope.IrusFulcrumTelescope(dag_id, cloud_workspace, publishers, data_partner='irus_fulcrum', bq_dataset_description='IRUS dataset', bq_table_description=None, api_dataset_id='fulcrum', observatory_api_conn_id=AirflowConns.OBSERVATORY_API, irus_oapen_api_conn_id='irus_api', catchup=True, schedule='0 0 4 * *', start_date=pendulum.datetime(2022, 4, 1))[source]

Bases: observatory.platform.workflows.workflow.Workflow

The Fulcrum Telescope

Parameters:
  • dag_id (str) – The ID of the DAG

  • cloud_workspace (observatory.platform.observatory_config.CloudWorkspace) – The CloudWorkspace object for this DAG

  • publishers (List[str]) – The publishers pertaining to this DAG instance (as listed in Fulcrum)

  • data_partner (Union[str, oaebu_workflows.oaebu_partners.OaebuPartner]) – The name of the data partner

  • bq_dataset_description (str) – Description for the BigQuery dataset

  • bq_table_description (str) – Description for the BigQuery table

  • api_dataset_id (str) – The ID to store the dataset release in the API

  • observatory_api_conn_id (str) – Airflow connection ID for the Observatory API

  • irus_oapen_api_conn_id (str) – Airflow connection ID for OAPEN IRUS UK (COUNTER 5)

  • catchup (bool) – Whether to catch up the DAG or not

  • schedule (str) – The schedule interval of the DAG

  • start_date (pendulum.DateTime) – The start date of the DAG

make_release(**kwargs)[source]

Create an IrusFulcrumRelease instance.

Dates are best explained with an example. Say the DAG is scheduled to run on 2022-04-07:
  • data_interval_start will be 2022-03-01

  • data_interval_end will be 2022-04-01

  • partition_date will be 2022-03-31

Return type:

IrusFulcrumRelease
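
The date example given for make_release can be reproduced with a short sketch. This is an illustration only: it uses the standard library's datetime rather than pendulum, and the helper name release_dates is hypothetical, not part of this module. It assumes the interval covers the calendar month before the run date, as the example suggests.

```python
from datetime import date, timedelta
from typing import Tuple


def release_dates(run_date: date) -> Tuple[date, date, date]:
    """Return (interval_start, interval_end, partition_date) for a run date.

    Hypothetical helper: mirrors the documented example, not the actual code.
    """
    # Interval end: first day of the month in which the DAG runs.
    interval_end = run_date.replace(day=1)
    # Partition date: last day of the previous month.
    partition_date = interval_end - timedelta(days=1)
    # Interval start: first day of the previous month.
    interval_start = partition_date.replace(day=1)
    return interval_start, interval_end, partition_date


start, end, partition = release_dates(date(2022, 4, 7))
print(start, end, partition)  # 2022-03-01 2022-04-01 2022-03-31
```

Deriving the partition date from the interval end (one day earlier) handles year boundaries and months of different lengths without special cases.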

download(release, **kwargs)[source]

Task to download the Fulcrum data for a release

Parameters:

release (IrusFulcrumRelease) –

upload_downloaded(release, **kwargs)[source]

Upload the downloaded Fulcrum data to the Google Cloud download bucket

Parameters:

release (IrusFulcrumRelease) –

transform(release, **kwargs)[source]

Task to transform the Fulcrum data

Parameters:

release (IrusFulcrumRelease) –

upload_transformed(release, **kwargs)[source]

Upload the transformed Fulcrum data to the Google Cloud transform bucket

Parameters:

release (IrusFulcrumRelease) –

bq_load(release, **kwargs)[source]

Load the transformed data into BigQuery

Parameters:

release (IrusFulcrumRelease) –

Return type:

None

add_new_dataset_releases(release, **kwargs)[source]

Adds release information to the API.

Parameters:

release (IrusFulcrumRelease) –

Return type:

None

cleanup(release, **kwargs)[source]

Delete all files and folders associated with this release.

Parameters:

release (IrusFulcrumRelease) –

Return type:

None

oaebu_workflows.irus_fulcrum_telescope.irus_fulcrum_telescope.download_fulcrum_month_data(download_month, requestor_id, num_retries=3)[source]

Download Fulcrum data for the release month

Parameters:
  • download_month (pendulum.DateTime) – The month to download usage data from

  • requestor_id (str) – The requestor ID, used to access the IRUS platform

  • num_retries (int) – Number of attempts to make for the URL

Return type:

Tuple[List[dict], List[dict]]
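
How num_retries might bound the number of download attempts can be sketched as follows. This is a minimal illustration, not the module's implementation: download_with_retries, fetch and wait_seconds are hypothetical names, and the fake fetcher stands in for the real IRUS request. The only details taken from the documentation above are the retry count and the pair of record lists that is returned.

```python
import time
from typing import Callable, List, Tuple

Records = List[dict]


def download_with_retries(
    fetch: Callable[[], Tuple[Records, Records]],
    num_retries: int = 3,
    wait_seconds: float = 0.0,
) -> Tuple[Records, Records]:
    """Call `fetch` up to `num_retries` times and return its two record lists."""
    last_error = None
    for attempt in range(1, num_retries + 1):
        try:
            return fetch()
        except Exception as error:  # a real client would catch narrower errors
            last_error = error
            if attempt < num_retries:
                time.sleep(wait_seconds)
    raise RuntimeError(f"Download failed after {num_retries} attempts") from last_error


# Hypothetical fetcher that fails twice, then succeeds on the third attempt.
calls = {"count": 0}


def flaky_fetch() -> Tuple[Records, Records]:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return [{"id": "a"}], [{"id": "a", "country": "NZ"}]


totals, country = download_with_retries(flaky_fetch, num_retries=3)
print(calls["count"], len(totals), len(country))  # 3 1 1
```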

oaebu_workflows.irus_fulcrum_telescope.irus_fulcrum_telescope.transform_fulcrum_data(totals_data, country_data, publishers=None)[source]

Transforms Fulcrum downloaded “totals” and “country” data.

Parameters:
  • totals_data (List[dict]) – Fulcrum usage data aggregated over all countries

  • country_data (List[dict]) – Fulcrum usage data split by country

  • publishers (List[str]) – Fulcrum publishers to retain. If None, use all publishers

Return type:

List[dict]
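
The publishers argument ("If None, use all publishers") can be illustrated with a small filter. This is a sketch under stated assumptions: the "publisher" key and the sample rows are hypothetical, since the actual field names of the Fulcrum records are not documented here, and filter_by_publisher is not a function of this module.

```python
from typing import List, Optional


def filter_by_publisher(rows: List[dict], publishers: Optional[List[str]] = None) -> List[dict]:
    """Keep rows whose (hypothetical) "publisher" field is in `publishers`."""
    if publishers is None:
        # No filter requested: keep every row.
        return list(rows)
    wanted = set(publishers)
    return [row for row in rows if row.get("publisher") in wanted]


# Hypothetical sample rows; real Fulcrum records will have a different shape.
rows = [
    {"title": "Book A", "publisher": "Press One"},
    {"title": "Book B", "publisher": "Press Two"},
]

print(filter_by_publisher(rows, ["Press One"]))  # [{'title': 'Book A', 'publisher': 'Press One'}]
print(len(filter_by_publisher(rows)))  # 2
```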