oaebu_workflows.oapen_metadata_telescope.oapen_metadata_telescope

Module Contents

Classes

OapenMetadataRelease

Construct an OapenMetadataRelease instance

OapenMetadataTelescope

Oapen Metadata Telescope

Functions

download_metadata(uri, download_path)

Downloads the OAPEN metadata XML file

Attributes

DOWNLOAD_RETRY_CHAIN

oaebu_workflows.oapen_metadata_telescope.oapen_metadata_telescope.DOWNLOAD_RETRY_CHAIN[source]
class oaebu_workflows.oapen_metadata_telescope.oapen_metadata_telescope.OapenMetadataRelease(dag_id, run_id, snapshot_date)[source]

Bases: observatory.platform.workflows.workflow.SnapshotRelease

Construct an OapenMetadataRelease instance

Parameters:
  • dag_id (str) – The ID of the DAG

  • run_id (str) – The Airflow run ID

  • snapshot_date (pendulum.DateTime) – The snapshot date of the release

property transform_files[source]
class oaebu_workflows.oapen_metadata_telescope.oapen_metadata_telescope.OapenMetadataTelescope(dag_id, cloud_workspace, metadata_uri, metadata_partner='oapen_metadata', elevate_related_products=False, bq_dataset_id='onix', bq_table_name='onix', bq_dataset_description='OAPEN Metadata converted to ONIX', bq_table_description=None, api_dataset_id='oapen', observatory_api_conn_id=AirflowConns.OBSERVATORY_API, catchup=False, start_date=pendulum.datetime(2018, 5, 14), schedule='0 12 * * Sun')[source]

Bases: observatory.platform.workflows.workflow.Workflow

Oapen Metadata Telescope

Construct an OapenMetadataTelescope instance.

Parameters:
  • dag_id (str) – The ID of the DAG

  • cloud_workspace (observatory.platform.observatory_config.CloudWorkspace) – The CloudWorkspace object for this DAG

  • metadata_uri (str) – The URI of the metadata XML file

  • metadata_partner (Union[str, oaebu_workflows.oaebu_partners.OaebuPartner]) – The metadata partner name

  • elevate_related_products (bool) – Whether to pull out the related products to the product level

  • bq_dataset_id (str) – The BigQuery dataset ID

  • bq_table_name (str) – The BigQuery table name

  • bq_dataset_description (str) – Description for the BigQuery dataset

  • bq_table_description (str) – Description for the BigQuery table

  • api_dataset_id (str) – The ID to store the dataset release in the API

  • observatory_api_conn_id (str) – Airflow connection ID for the Observatory API

  • catchup (bool) – Whether to catch up the DAG or not

  • start_date (pendulum.DateTime) – The start date of the DAG

  • schedule (str) – The schedule interval of the DAG
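The default schedule, '0 12 * * Sun', is a cron expression that matches 12:00 every Sunday. As a rough illustration only, the sketch below lists the first few wall-clock times matching that expression on or after the default start_date, using the standard library. The helper name sunday_noon_ticks is hypothetical and not part of this module, and the sketch does not model how Airflow turns cron ticks into DAG runs or how catchup affects them.

```python
from datetime import datetime, timedelta, timezone


def sunday_noon_ticks(start: datetime, count: int) -> list:
    """Return the first `count` Sunday 12:00 instants on or after `start`.

    Hypothetical helper for illustration; it mirrors only the cron
    expression '0 12 * * Sun', not Airflow's scheduling semantics.
    """
    # Move to 12:00 on the start day, or the following day if noon has passed.
    tick = start.replace(hour=12, minute=0, second=0, microsecond=0)
    if tick < start:
        tick += timedelta(days=1)
    # datetime.weekday() numbers Monday as 0 and Sunday as 6.
    while tick.weekday() != 6:
        tick += timedelta(days=1)
    return [tick + timedelta(weeks=i) for i in range(count)]


# Default start_date from the signature above, taken as UTC.
start_date = datetime(2018, 5, 14, tzinfo=timezone.utc)
ticks = sunday_noon_ticks(start_date, 3)
for t in ticks:
    print(t.isoformat())
```

Since 14 May 2018 is a Monday, the first matching tick in this sketch falls on the following Sunday, 20 May 2018.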

make_release(**kwargs)[source]

Make release instances. The release is passed as an argument to the function (TelescopeFunction) that is called in ‘task_callable’.

Parameters:

kwargs – the context passed from the PythonOperator.

Return type:

OapenMetadataRelease

Returns the OAPEN metadata release instance. See https://airflow.apache.org/docs/stable/macros-ref.html for the keyword arguments that can be passed.

download(release, **kwargs)[source]

Task to download the OapenMetadataRelease release.

Parameters:
  • kwargs – the context passed from the PythonOperator.

  • release (OapenMetadataRelease) – an OapenMetadataRelease instance.

Return type:

None

upload_downloaded(release, **kwargs)[source]

Task to upload the downloaded OAPEN metadata

Parameters:

release (OapenMetadataRelease) –

Return type:

None

transform(release, **kwargs)[source]

Transform the OAPEN metadata XML file into a valid ONIX file

Parameters:

release (OapenMetadataRelease) –

Return type:

None

upload_transformed(release, **kwargs)[source]

Task to upload the transformed OAPEN metadata

Parameters:

release (OapenMetadataRelease) –

Return type:

None

bq_load(release, **kwargs)[source]

Load the transformed ONIX file into BigQuery

Parameters:

release (OapenMetadataRelease) –

Return type:

None

add_new_dataset_releases(release, **kwargs)[source]

Adds release information to API.

Parameters:

release (OapenMetadataRelease) –

Return type:

None

cleanup(release, **kwargs)[source]

Delete all files, folders and XComs associated with this release.

Parameters:

release (OapenMetadataRelease) –

Return type:

None

oaebu_workflows.oapen_metadata_telescope.oapen_metadata_telescope.download_metadata(uri, download_path)[source]

Downloads the OAPEN metadata XML file.

OAPEN’s downloader can return an incomplete file if the metadata is only partially generated. In this scenario, we should wait until the metadata generator has finished; otherwise, an attempt to parse the data will result in an XML ParseError. Another scenario is that OAPEN returns only a header in the XML, which should also raise an error. OAPEN metadata generation can take over an hour.

Parameters:
  • uri (str) – the URL to query for the metadata

  • download_path (str) – filepath to store the downloaded file

Raises:
  • ConnectionError – raised if the response from the metadata server does not have code 200

  • AirflowException – raised if the response does not contain any Product fields

Return type:

None
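The three failure modes described for download_metadata (a non-200 response, a truncated file, and a header-only file) can be illustrated with a minimal sketch. This is not the module's implementation: check_metadata_response and MetadataIncompleteError are hypothetical names, the latter standing in for AirflowException so that the example has no Airflow dependency, and the sketch assumes that "Product fields" means XML elements whose local name is Product.

```python
import xml.etree.ElementTree as ET


class MetadataIncompleteError(Exception):
    """Hypothetical stand-in for AirflowException in this sketch."""


def check_metadata_response(status_code: int, body: str) -> int:
    """Validate a metadata response and return the number of Product elements.

    Hypothetical helper; it only illustrates the checks described above.
    """
    if status_code != 200:
        raise ConnectionError(f"Expected status code 200, got {status_code}")

    # A partially generated (truncated) file is not well-formed XML,
    # so parsing raises xml.etree.ElementTree.ParseError here.
    root = ET.fromstring(body)

    # Count elements named Product, ignoring any '{namespace}' prefix.
    n_products = sum(
        1 for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "Product"
    )
    if n_products == 0:
        # Header-only response: well-formed XML with no product records.
        raise MetadataIncompleteError("Response contains no Product elements")
    return n_products


complete = "<ONIXMessage><Header/><Product/><Product/></ONIXMessage>"
print(check_metadata_response(200, complete))
```

A caller that wants to wait for the metadata generator to finish could treat ParseError and the header-only error as retryable and try again after a delay.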