oaebu_workflows.oapen_metadata_telescope.oapen_metadata_telescope
Module Contents
Classes
OapenMetadataRelease | Construct an OapenMetadataRelease instance
OapenMetadataTelescope | Oapen Metadata Telescope
Functions
download_metadata | Downloads the OAPEN metadata XML file
Attributes
- class oaebu_workflows.oapen_metadata_telescope.oapen_metadata_telescope.OapenMetadataRelease(dag_id, run_id, snapshot_date)[source]
Bases:
observatory.platform.workflows.workflow.SnapshotRelease
Construct an OapenMetadataRelease instance
- Parameters:
dag_id (str) – The ID of the DAG
run_id (str) – The Airflow run ID
snapshot_date (pendulum.DateTime) – The date of the snapshot/release
- class oaebu_workflows.oapen_metadata_telescope.oapen_metadata_telescope.OapenMetadataTelescope(dag_id, cloud_workspace, metadata_uri, metadata_partner='oapen_metadata', elevate_related_products=False, bq_dataset_id='onix', bq_table_name='onix', bq_dataset_description='OAPEN Metadata converted to ONIX', bq_table_description=None, api_dataset_id='oapen', observatory_api_conn_id=AirflowConns.OBSERVATORY_API, catchup=False, start_date=pendulum.datetime(2018, 5, 14), schedule='0 12 * * Sun')[source]
Bases:
observatory.platform.workflows.workflow.Workflow
Oapen Metadata Telescope
Construct an OapenMetadataTelescope instance.
- Parameters:
dag_id (str) – The ID of the DAG
cloud_workspace (observatory.platform.observatory_config.CloudWorkspace) – The CloudWorkspace object for this DAG
metadata_uri (str) – The URI of the metadata XML file
metadata_partner (Union[str, oaebu_workflows.oaebu_partners.OaebuPartner]) – The metadata partner name
elevate_related_products (bool) – Whether to pull out the related products to the product level
bq_dataset_id (str) – The BigQuery dataset ID
bq_table_name (str) – The BigQuery table name
bq_dataset_description (str) – Description for the BigQuery dataset
bq_table_description (str) – Description for the BigQuery table
api_dataset_id (str) – The ID to store the dataset release in the API
observatory_api_conn_id (str) – Airflow connection ID for the Observatory API
catchup (bool) – Whether to catch up the DAG or not
start_date (pendulum.DateTime) – The start date of the DAG
schedule (str) – The schedule interval of the DAG
- make_release(**kwargs)[source]
Make release instances. The release is passed as an argument to the function (TelescopeFunction) that is called in ‘task_callable’.
- Parameters:
kwargs – the context passed from the PythonOperator. See https://airflow.apache.org/docs/stable/macros-ref.html for the keyword arguments that can be passed.
- Returns:
The Oapen metadata release instance
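As a rough illustration of how a release might be built from the task context, the sketch below uses a hypothetical `ReleaseSketch` dataclass in place of `OapenMetadataRelease` (it only mirrors the three documented constructor arguments) and a hypothetical `make_release_sketch` helper. Which context key supplies the snapshot date is an assumption; `data_interval_end` is used here purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ReleaseSketch:
    """Hypothetical stand-in for OapenMetadataRelease (same three arguments)."""

    dag_id: str
    run_id: str
    snapshot_date: datetime


def make_release_sketch(dag_id: str, **kwargs) -> ReleaseSketch:
    """Build a release from the keyword arguments of a task context."""
    # Assumption: the snapshot date comes from the context's data_interval_end.
    return ReleaseSketch(
        dag_id=dag_id,
        run_id=kwargs["run_id"],
        snapshot_date=kwargs["data_interval_end"],
    )


# A hand-written context standing in for what the PythonOperator would pass.
context = {
    "run_id": "scheduled__2024-01-07T12:00:00+00:00",
    "data_interval_end": datetime(2024, 1, 14, 12, tzinfo=timezone.utc),
}
release = make_release_sketch("oapen_metadata", **context)
```

The same release object is then handed to each task, which is why every task method documented here takes `release` as its first argument.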
- download(release, **kwargs)[source]
Task to download the OapenMetadataRelease release.
- Parameters:
kwargs – the context passed from the PythonOperator.
release (OapenMetadataRelease) – an OapenMetadataRelease instance.
- Return type:
None
- upload_downloaded(release, **kwargs)[source]
Task to upload the downloaded OAPEN metadata
- Parameters:
release (OapenMetadataRelease) –
- Return type:
None
- transform(release, **kwargs)[source]
Transform the OAPEN metadata XML file into a valid ONIX file
- Parameters:
release (OapenMetadataRelease) –
- Return type:
None
- upload_transformed(release, **kwargs)[source]
Task to upload the transformed OAPEN metadata
- Parameters:
release (OapenMetadataRelease) –
- Return type:
None
- bq_load(release, **kwargs)[source]
Load the transformed ONIX file into BigQuery
- Parameters:
release (OapenMetadataRelease) –
- Return type:
None
- add_new_dataset_releases(release, **kwargs)[source]
Adds release information to API.
- Parameters:
release (OapenMetadataRelease) –
- Return type:
None
- cleanup(release, **kwargs)[source]
Delete all files, folders and XComs associated with this release.
- Parameters:
release (OapenMetadataRelease) –
- Return type:
None
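The task methods above all receive the same release object. A minimal sketch of that pattern follows, assuming the tasks run in the order in which they are documented here; the actual ordering is defined by the DAG and is not confirmed by this page. `TASK_ORDER`, `run_in_order` and `make_stub` are hypothetical names, and the stubs only record that they were called.

```python
from typing import Callable, Dict, List

# Assumed order: the order in which the task methods are documented above.
TASK_ORDER: List[str] = [
    "download",
    "upload_downloaded",
    "transform",
    "upload_transformed",
    "bq_load",
    "add_new_dataset_releases",
    "cleanup",
]


def run_in_order(release: dict, tasks: Dict[str, Callable[[dict], None]]) -> List[str]:
    """Call each task with the same release object; return the names that ran."""
    executed = []
    for name in TASK_ORDER:
        tasks[name](release)
        executed.append(name)
    return executed


def make_stub(name: str) -> Callable[[dict], None]:
    """Create a stub task that only logs its own name on the release."""

    def stub(release: dict) -> None:
        release.setdefault("log", []).append(name)

    return stub


stub_tasks = {name: make_stub(name) for name in TASK_ORDER}
release = {"dag_id": "oapen_metadata"}
executed = run_in_order(release, stub_tasks)
```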
- oaebu_workflows.oapen_metadata_telescope.oapen_metadata_telescope.download_metadata(uri, download_path)[source]
Downloads the OAPEN metadata XML file.
OAPEN's downloader can return an incomplete file if the metadata is only partially generated. In this scenario, we should wait until the metadata generator has finished; otherwise, an attempt to parse the data will result in an XML ParseError. Another scenario is that OAPEN returns only a header in the XML, which should also raise an error. OAPEN metadata generation can take over an hour.
- Parameters:
uri (str) – the URL to query for the metadata
download_path (str) – filepath to store the downloaded file
- Raises:
ConnectionError – raised if the response from the metadata server does not have code 200
AirflowException – raised if the response does not contain any Product fields
- Return type:
None
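The failure modes described for download_metadata can be sketched with the standard library alone. This is not the module's implementation: `check_metadata_response` is a hypothetical helper, `NoProductsError` stands in for AirflowException so the sketch runs without Airflow, and counting `Product` elements is only one plausible way to detect a header-only response.

```python
import xml.etree.ElementTree as ET


class NoProductsError(Exception):
    """Stand-in for AirflowException so the sketch runs without Airflow."""


def check_metadata_response(status_code: int, body: str) -> int:
    """Validate a metadata response and return the number of Product elements."""
    if status_code != 200:
        raise ConnectionError(f"Expected status code 200, got {status_code}")
    # A partially generated (truncated) file fails here with ET.ParseError.
    root = ET.fromstring(body)
    # Count elements named "Product", ignoring any XML namespace prefix.
    n_products = sum(1 for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "Product")
    if n_products == 0:
        raise NoProductsError("Metadata contains a header but no Product fields")
    return n_products


# Hand-written sample payloads, not real OAPEN output.
COMPLETE = "<ONIXMessage><Header/><Product><RecordReference>1</RecordReference></Product></ONIXMessage>"
HEADER_ONLY = "<ONIXMessage><Header/></ONIXMessage>"
TRUNCATED = "<ONIXMessage><Header/><Product>"
```

Because generation can take over an hour, a caller would typically treat any of these errors as a signal to retry later rather than as a permanent failure.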