oaebu_workflows.thoth_telescope.thoth_telescope
Module Contents
Classes
- ThothRelease: Construct a ThothRelease.
- ThothTelescope: Construct a ThothTelescope instance.
Functions
- thoth_download_onix: Hits the Thoth API and requests the ONIX feed for a particular publisher.
Attributes
- oaebu_workflows.thoth_telescope.thoth_telescope.THOTH_URL = '{host_name}/specifications/{format_specification}/publisher/{publisher_id}'
- oaebu_workflows.thoth_telescope.thoth_telescope.DEFAULT_HOST_NAME = 'https://export.thoth.pub'
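As an illustration of how the two attributes fit together, the sketch below fills the THOTH_URL template with str.format. The helper name build_thoth_url and the publisher ID are hypothetical and are not part of the module. The format specification is the example value given in the ThothTelescope docstring.

```python
# Values copied from the attributes documented above.
THOTH_URL = "{host_name}/specifications/{format_specification}/publisher/{publisher_id}"
DEFAULT_HOST_NAME = "https://export.thoth.pub"


def build_thoth_url(
    publisher_id: str,
    format_specification: str,
    host_name: str = DEFAULT_HOST_NAME,
) -> str:
    """Fill the THOTH_URL template (hypothetical helper, not part of the module)."""
    return THOTH_URL.format(
        host_name=host_name,
        format_specification=format_specification,
        publisher_id=publisher_id,
    )


# "my-publisher-id" is a placeholder, not a real Thoth publisher ID.
url = build_thoth_url("my-publisher-id", "onix_3.0::oapen")
print(url)
# prints https://export.thoth.pub/specifications/onix_3.0::oapen/publisher/my-publisher-id
```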
- class oaebu_workflows.thoth_telescope.thoth_telescope.ThothRelease(*, dag_id, run_id, snapshot_date)
Bases:
observatory.platform.workflows.workflow.SnapshotRelease
Construct a ThothRelease.
- Parameters:
dag_id (str) – The ID of the DAG
run_id (str) – The Airflow run ID
snapshot_date (pendulum.datetime.DateTime) – The date of the snapshot/release
- class oaebu_workflows.thoth_telescope.thoth_telescope.ThothTelescope(*, dag_id, cloud_workspace, publisher_id, format_specification, elevate_related_products=False, metadata_partner='thoth', bq_dataset_description='Thoth ONIX Feed', bq_table_description=None, api_dataset_id='onix', host_name='https://export.thoth.pub', observatory_api_conn_id=AirflowConns.OBSERVATORY_API, catchup=False, start_date=pendulum.datetime(2022, 12, 1), schedule='@weekly')
Bases:
observatory.platform.workflows.workflow.Workflow
Construct a ThothTelescope instance.
- Parameters:
dag_id (str) – The ID of the DAG
cloud_workspace (observatory.platform.observatory_config.CloudWorkspace) – The CloudWorkspace object for this DAG
publisher_id (str) – The Thoth ID for this publisher
format_specification (str) – The Thoth ONIX/metadata format specification, e.g. "onix_3.0::oapen"
elevate_related_products (bool) – Whether to pull out the related products to the product level
metadata_partner (Union[str, oaebu_workflows.oaebu_partners.OaebuPartner]) – The metadata partner name
bq_dataset_description (str) – Description for the BigQuery dataset
bq_table_description (Optional[str]) – Description for the BigQuery table
api_dataset_id (str) – The ID to store the dataset release in the API
host_name (str) – The Thoth host name
observatory_api_conn_id (str) – Airflow connection ID for the Observatory API
catchup (bool) – Whether to catch up the DAG or not
start_date (pendulum.datetime.DateTime) – The start date of the DAG
schedule (str) – The schedule interval of the DAG
- make_release(**kwargs)
Creates a new Thoth release instance.
- Parameters:
kwargs – The context passed from the PythonOperator. See https://airflow.apache.org/docs/stable/macros-ref.html for the keyword arguments that can be passed.
- Returns:
The Thoth release instance
- Return type:
ThothRelease
- download(release, **kwargs)
Task to download the ONIX release from Thoth.
- Parameters:
release (ThothRelease) – The Thoth release instance
- Return type:
None
- upload_downloaded(release, **kwargs)
Upload the downloaded Thoth ONIX XML to the Google Cloud bucket.
- Parameters:
release (ThothRelease) – The Thoth release instance
- Return type:
None
- transform(release, **kwargs)
Task to transform the Thoth ONIX data.
- Parameters:
release (ThothRelease) – The Thoth release instance
- Return type:
None
- upload_transformed(release, **kwargs)
Upload the transformed Thoth ONIX .jsonl to the Google Cloud bucket.
- Parameters:
release (ThothRelease) – The Thoth release instance
- Return type:
None
- bq_load(release, **kwargs)
Task to load the transformed ONIX .jsonl file to BigQuery.
- Parameters:
release (ThothRelease) – The Thoth release instance
- Return type:
None
- add_new_dataset_releases(release, **kwargs)
Adds release information to the API.
- Parameters:
release (ThothRelease) – The Thoth release instance
- Return type:
None
- cleanup(release, **kwargs)
Delete all files, folders and XComs associated with this release.
- Parameters:
release (ThothRelease) – The Thoth release instance
- Return type:
None
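The task methods above read as a linear pipeline. The sketch below illustrates that flow with stand-in callables. It assumes the tasks run in the order they are documented here, which this page does not confirm. The names TASK_ORDER and run_pipeline and the dictionary used as a release are hypothetical, and no Airflow or oaebu_workflows code is involved.

```python
from typing import Callable, Dict, List

# Task names taken from the ThothTelescope methods documented above.
# Assumption: execution order matches the documentation order.
TASK_ORDER: List[str] = [
    "download",
    "upload_downloaded",
    "transform",
    "upload_transformed",
    "bq_load",
    "add_new_dataset_releases",
    "cleanup",
]


def run_pipeline(tasks: Dict[str, Callable[[dict], None]], release: dict) -> List[str]:
    """Call each stand-in task in order and return the names that completed."""
    completed: List[str] = []
    for name in TASK_ORDER:
        tasks[name](release)
        completed.append(name)
    return completed


# Stand-in tasks that only record that they were called.
log: List[str] = []
stub_tasks = {name: (lambda release, n=name: log.append(n)) for name in TASK_ORDER}

# A plain dict stands in for the ThothRelease; the values are placeholders.
release = {"dag_id": "thoth_onix", "run_id": "manual__2022-12-01", "snapshot_date": "2022-12-01"}
completed = run_pipeline(stub_tasks, release)
```

Because every task receives the same release, an exception in one task stops the loop before any later task runs, which is also how a linear chain of Airflow tasks behaves by default.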
- oaebu_workflows.thoth_telescope.thoth_telescope.thoth_download_onix(publisher_id, download_path, format_spec, host_name=DEFAULT_HOST_NAME, num_retries=3)
Hits the Thoth API and requests the ONIX feed for a particular publisher. Creates a file called onix.xml at the specified location.
- Parameters:
publisher_id (str) – The ID of the publisher. Can be found using the Thoth GraphiQL API
download_path (str) – The path to download the ONIX file to
format_spec (str) – The ONIX format specification to use. Options can be found with the /formats endpoint of the API
host_name (str) – The Thoth host URL
num_retries (int) – The number of times to retry the download, given an unsuccessful return code
- Return type:
None
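A download with retries, as described for thoth_download_onix, could look roughly like the sketch below. It is not the module's implementation. It uses the standard library instead of whatever HTTP client the module uses, and it assumes that download_path is the full path of the output file and that num_retries counts retries after one initial attempt. The names download_onix_sketch and fetch_bytes and the fetch and wait_seconds parameters are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

THOTH_URL = "{host_name}/specifications/{format_specification}/publisher/{publisher_id}"
DEFAULT_HOST_NAME = "https://export.thoth.pub"


def fetch_bytes(url: str) -> bytes:
    """Default fetcher: GET the URL. urlopen raises HTTPError on 4xx/5xx responses."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read()


def download_onix_sketch(
    publisher_id: str,
    download_path: str,
    format_spec: str,
    host_name: str = DEFAULT_HOST_NAME,
    num_retries: int = 3,
    fetch: Callable[[str], bytes] = fetch_bytes,  # hypothetical hook, eases testing
    wait_seconds: float = 1.0,  # hypothetical pause between attempts
) -> None:
    """Download the ONIX feed to download_path, retrying on failure."""
    url = THOTH_URL.format(
        host_name=host_name,
        format_specification=format_spec,
        publisher_id=publisher_id,
    )
    last_error: Optional[Exception] = None
    # Assumption: one initial attempt plus num_retries retries.
    for attempt in range(num_retries + 1):
        try:
            content = fetch(url)
        except urllib.error.URLError as error:  # HTTPError is a subclass of URLError
            last_error = error
            if attempt < num_retries:
                time.sleep(wait_seconds)
            continue
        # Success: write the feed to disk and stop retrying.
        with open(download_path, "wb") as f:
            f.write(content)
        return
    raise RuntimeError(f"Failed to download ONIX feed from {url}") from last_error
```

Passing the fetcher in as a parameter keeps the retry logic testable without network access: a fake fetcher that raises urllib.error.URLError a few times and then returns bytes exercises the same code path.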