oaebu_workflows.google_analytics3_telescope.google_analytics3_telescope

Module Contents

Classes

GoogleAnalytics3Release

Construct a GoogleAnalytics3Release.

GoogleAnalytics3Telescope

Google Analytics Telescope.

Functions

initialize_analyticsreporting(...)

Initializes an Analytics Reporting API V4 service object.

list_all_books(service, view_id, pagepath_regex, ...)

List all available books by getting all pagepaths of a view id in a given period.

create_book_result_dicts(book_entries, ...)

Create a dictionary to store results for a single book. Pagepath, title and avg time on page are already given.

get_dimension_data(service, view_id, ...)

Get reports data from the Google Analytics Reporting service for a single dimension and multiple metrics.

add_to_book_result_dict(book_results, dimension, ...)

Add the 'unique_views', 'page_views' and 'sessions' results to the book results dict if these metrics are of interest for the current dimension.

get_reports(service, organisation_name, view_id, ...)

Get reports data from the Google Analytics Reporting API.

class oaebu_workflows.google_analytics3_telescope.google_analytics3_telescope.GoogleAnalytics3Release(dag_id, run_id, data_interval_start, data_interval_end, partition_date)[source]

Bases: observatory.platform.workflows.workflow.PartitionRelease

Construct a GoogleAnalytics3Release.

Parameters:
  • dag_id (str) – The ID of the DAG

  • run_id (str) – The Airflow run ID

  • data_interval_start (pendulum.DateTime) – The start date of the download period.

  • data_interval_end (pendulum.DateTime) – The end date of the download period, also used as the release date for the BigQuery table and file paths.

  • partition_date (pendulum.DateTime) –

class oaebu_workflows.google_analytics3_telescope.google_analytics3_telescope.GoogleAnalytics3Telescope(dag_id, organisation_name, cloud_workspace, view_id, pagepath_regex, data_partner='google_analytics3', bq_dataset_description='Data from Google sources', bq_table_description=None, api_dataset_id='google_analytics', oaebu_service_account_conn_id='oaebu_service_account', observatory_api_conn_id=AirflowConns.OBSERVATORY_API, catchup=True, start_date=pendulum.datetime(2018, 1, 1), schedule='@monthly')[source]

Bases: observatory.platform.workflows.workflow.Workflow

Google Analytics Telescope.

Construct a GoogleAnalytics3Telescope instance.

Parameters:
  • dag_id (str) – The ID of the DAG.

  • organisation_name (str) – The organisation name as per Google Analytics.

  • cloud_workspace (observatory.platform.observatory_config.CloudWorkspace) – The CloudWorkspace object for this DAG.

  • view_id (str) – The Google Analytics view ID.

  • pagepath_regex (str) – The pagepath regex.

  • data_partner (Union[str, oaebu_workflows.oaebu_partners.OaebuPartner]) – The name of the data partner.

  • bq_dataset_description (str) – Description for the BigQuery dataset.

  • bq_table_description (str) – Description for the BigQuery table.

  • api_dataset_id (str) – The ID to store the dataset release in the API.

  • oaebu_service_account_conn_id (str) – Airflow connection ID for the OAEBU service account.

  • observatory_api_conn_id (str) – Airflow connection ID for the Observatory API.

  • catchup (bool) – Whether to catch up the DAG or not.

  • start_date (pendulum.DateTime) – The start date of the DAG.

  • schedule (str) – The schedule interval of the DAG.

ANU_ORG_NAME = 'ANU Press'[source]
make_release(**kwargs)[source]

Make release instances. The release is passed as an argument to the function (TelescopeFunction) that is called in ‘task_callable’.

Parameters:

kwargs – the context passed from the PythonOperator. See https://airflow.apache.org/docs/stable/macros-ref.html for the keyword arguments that can be passed.

Returns:

A list of GoogleAnalytics3Release instances.

Return type:

List[GoogleAnalytics3Release]

check_dependencies(**kwargs)[source]

Check the dependencies of the DAG. Extends the parent method to additionally check for a view ID and a pagepath regex.

Parameters:

kwargs – the context passed from the Airflow Operator.

Returns:

True if dependencies are valid.

Return type:

bool

download_transform(releases, **kwargs)[source]

Task to download and transform the Google Analytics release for a given month.

Parameters:

releases (List[GoogleAnalytics3Release]) – a list with one Google Analytics release.

Return type:

None

upload_transformed(releases, **kwargs)[source]

Uploads the transformed file to Google Cloud Storage (GCS).

Parameters:

releases (List[GoogleAnalytics3Release]) –

Return type:

None

bq_load(releases, **kwargs)[source]

Loads the data into BigQuery.

Parameters:

releases (List[GoogleAnalytics3Release]) –

Return type:

None

add_new_dataset_releases(releases, **kwargs)[source]

Adds release information to API.

Parameters:

releases (List[GoogleAnalytics3Release]) –

Return type:

None

cleanup(releases, **kwargs)[source]

Delete all files, folders and XComs associated with this release.

Parameters:

releases (List[GoogleAnalytics3Release]) –

Return type:

None

oaebu_workflows.google_analytics3_telescope.google_analytics3_telescope.initialize_analyticsreporting(oaebu_service_account_conn_id)[source]

Initializes an Analytics Reporting API V4 service object.

Returns:

An authorized Analytics Reporting API V4 service object.

Parameters:

oaebu_service_account_conn_id (str) –

Return type:

googleapiclient.discovery.Resource

oaebu_workflows.google_analytics3_telescope.google_analytics3_telescope.list_all_books(service, view_id, pagepath_regex, data_interval_start, data_interval_end, organisation_name, metrics)[source]

List all available books by getting all pagepaths of a view ID in a given period. Note: the Google API does not return a result for any entry in which all supplied metrics are zero. Contrary to the documentation, however, it does return ‘some’ results if no metrics are supplied. Date ranges are inclusive.

Parameters:
  • service (googleapiclient.discovery.Resource) – The Google Analytics Reporting service object.

  • view_id (str) – The view id.

  • pagepath_regex (str) – The regex expression for the pagepath of a book.

  • data_interval_start (pendulum.DateTime) – Start date of the analytics period.

  • data_interval_end (pendulum.DateTime) – End date of the analytics period.

  • organisation_name (str) – The organisation name.

  • metrics (list) – The metrics to return with the book results.

Returns:

A list with dictionaries, one for each book entry (the dict contains the pagepath, title and average time on page), and a list of all pagepaths.

Return type:

Tuple[List[dict], list]
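
As a rough illustration of the kind of request such a listing could send, the sketch below builds an Analytics Reporting API V4 reports.batchGet body that selects pagepaths matching a regex over an inclusive date range. The helper name build_book_list_request, the choice of the ga:pagePath and ga:pageTitle dimensions and the page size are assumptions made for illustration; they are not taken from this module's source.

```python
from datetime import date
from typing import Optional


def build_book_list_request(
    view_id: str,
    pagepath_regex: str,
    start: date,
    end: date,
    metrics: list,
    page_token: Optional[str] = None,
    page_size: int = 10000,
) -> dict:
    """Build a reports.batchGet body (hypothetical helper, not part of the module)."""
    request = {
        "viewId": view_id,
        # The Reporting API treats both dates as inclusive and expects YYYY-MM-DD.
        "dateRanges": [{"startDate": start.isoformat(), "endDate": end.isoformat()}],
        "metrics": metrics,
        # Assumed dimensions: the pagepath and the page title of each book page.
        "dimensions": [{"name": "ga:pagePath"}, {"name": "ga:pageTitle"}],
        "dimensionFilterClauses": [
            {
                "filters": [
                    {
                        "dimensionName": "ga:pagePath",
                        "operator": "REGEXP",
                        "expressions": [pagepath_regex],
                    }
                ]
            }
        ],
        "pageSize": page_size,
    }
    # Only include a page token when continuing a previous, paginated response.
    if page_token is not None:
        request["pageToken"] = page_token
    return {"reportRequests": [request]}


body = build_book_list_request(
    view_id="12345",  # placeholder view ID
    pagepath_regex=r"^/books/.*",  # placeholder regex
    start=date(2022, 1, 1),
    end=date(2022, 1, 31),
    metrics=[{"expression": "ga:avgTimeOnPage"}],
)
```

With an authorized service object, a body like this would typically be sent with service.reports().batchGet(body=body).execute().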

oaebu_workflows.google_analytics3_telescope.google_analytics3_telescope.create_book_result_dicts(book_entries, data_interval_start, data_interval_end, organisation_name)[source]

Create a dictionary to store results for a single book. Pagepath, title and avg time on page are already given. The other metrics will be added to the dictionary later.

Parameters:
  • book_entries (List[dict]) – List with dictionaries of book entries.

  • data_interval_start (pendulum.DateTime) – Start date of analytics period.

  • data_interval_end (pendulum.DateTime) – End date of analytics period.

  • organisation_name (str) – The organisation name.

Returns:

Dict to store results

Return type:

Dict[dict]
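
A minimal sketch of what such a results dictionary might look like is given below. It assumes each book entry is a Reporting API row whose dimensions are the pagepath and title and whose last metric value is the average time on page. The function name and every key in the result ("url", "organisation", "average_time" and so on) are hypothetical and may differ from what the module actually produces.

```python
from datetime import date
from typing import Dict, List


def sketch_book_result_dicts(
    book_entries: List[dict], start: date, end: date, organisation_name: str
) -> Dict[str, dict]:
    """Create one result dict per book, keyed by pagepath (illustrative only)."""
    book_results = {}
    for entry in book_entries:
        # Assumed row shape: {"dimensions": [pagepath, title], "metrics": [{"values": [...]}]}
        pagepath, title = entry["dimensions"][:2]
        average_time = float(entry["metrics"][0]["values"][-1])
        book_results[pagepath] = {
            "url": pagepath,
            "title": title,
            "organisation": organisation_name,
            "start_date": start.isoformat(),
            "end_date": end.isoformat(),
            "average_time": average_time,
            # Empty containers for the metrics that are filled in later.
            "unique_views": {},
            "page_views": {},
            "sessions": {},
        }
    return book_results


entries = [{"dimensions": ["/books/a", "Book A"], "metrics": [{"values": ["12.5"]}]}]
results = sketch_book_result_dicts(entries, date(2022, 1, 1), date(2022, 1, 31), "Example Press")
```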

oaebu_workflows.google_analytics3_telescope.google_analytics3_telescope.get_dimension_data(service, view_id, data_interval_start, data_interval_end, metrics, dimension, pagepaths)[source]

Get reports data from the Google Analytics Reporting service for a single dimension and multiple metrics. The results are filtered by pagepaths of interest and ordered by pagepath as well.

Parameters:
  • service (googleapiclient.discovery.Resource) – The Google Analytics Reporting service.

  • view_id (str) – The view id.

  • data_interval_start (pendulum.DateTime) – The start date of the analytics period.

  • data_interval_end (pendulum.DateTime) – The end date of the analytics period.

  • metrics (list) – List of metric dictionaries.

  • dimension (dict) – The dimension.

  • pagepaths (list) – List with pagepaths to filter and sort on.

Returns:

List with reports data for dimension and metrics.

Return type:

list
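
The filtering, ordering and paging described above could be sketched as follows. The request uses the Reporting API V4 IN_LIST filter operator and an orderBys clause on ga:pagePath, and the loop follows nextPageToken until no further page is reported. Both helper names, and the idea of passing the API call in as a fetch_page callable, are assumptions for illustration rather than the module's actual implementation.

```python
from typing import Callable, List


def build_dimension_request(
    view_id: str, start_date: str, end_date: str, metrics: list, dimension: dict, pagepaths: list
) -> dict:
    """Build one report request for a single dimension (hypothetical helper)."""
    return {
        "viewId": view_id,
        "dateRanges": [{"startDate": start_date, "endDate": end_date}],
        "metrics": metrics,
        "dimensions": [{"name": "ga:pagePath"}, dimension],
        # Keep only rows whose pagepath is one of the pagepaths of interest.
        "dimensionFilterClauses": [
            {
                "filters": [
                    {"dimensionName": "ga:pagePath", "operator": "IN_LIST", "expressions": pagepaths}
                ]
            }
        ],
        # Order the rows by pagepath (ascending by default).
        "orderBys": [{"fieldName": "ga:pagePath"}],
    }


def fetch_all_rows(fetch_page: Callable[[dict], dict], request: dict) -> List[dict]:
    """Collect the rows of every page of a single-report response."""
    rows = []
    page_token = None
    while True:
        page_request = dict(request)  # shallow copy so the caller's request is untouched
        if page_token:
            page_request["pageToken"] = page_token
        response = fetch_page({"reportRequests": [page_request]})
        report = response["reports"][0]
        rows.extend(report.get("data", {}).get("rows", []))
        page_token = report.get("nextPageToken")
        if not page_token:
            return rows
```

With a real service object, fetch_page could be something like lambda body: service.reports().batchGet(body=body).execute().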

oaebu_workflows.google_analytics3_telescope.google_analytics3_telescope.add_to_book_result_dict(book_results, dimension, pagepath, unique_views, page_views, sessions)[source]

Add the ‘unique_views’, ‘page_views’ and ‘sessions’ results to the book results dict if these metrics are of interest for the current dimension.

Parameters:
  • book_results (dict) – A dictionary with all book results.

  • dimension (dict) – Current dimension for which ‘unique_views’ and ‘sessions’ data is given.

  • pagepath (str) – Pagepath of the book.

  • unique_views (dict) – Number of unique views for the pagepath and dimension.

  • page_views (dict) – Number of page views for the pagepath and dimension.

  • sessions (dict) – Number of sessions for the pagepath and dimension.

Returns:

None
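
One way to picture this conditional merge is the sketch below. The mapping of dimensions to metrics of interest, the function name add_counts and the layout of the results dictionary are all hypothetical; they only illustrate the idea of recording a metric for a dimension when it is relevant and skipping it otherwise.

```python
# Hypothetical mapping from a GA dimension name to the metrics recorded for it.
METRICS_OF_INTEREST = {
    "ga:country": ("unique_views", "page_views", "sessions"),
    "ga:fullReferrer": ("unique_views", "page_views"),
    "ga:source": ("sessions",),
}


def add_counts(
    book_results: dict, dimension: dict, pagepath: str, unique_views: dict, page_views: dict, sessions: dict
) -> None:
    """Merge per-dimension counts into book_results in place (illustrative only)."""
    values = {"unique_views": unique_views, "page_views": page_views, "sessions": sessions}
    # "ga:country" -> "country"; used as the key under each metric.
    key = dimension["name"].split(":", 1)[-1]
    # Dimensions that are not in the mapping contribute nothing.
    for metric in METRICS_OF_INTEREST.get(dimension["name"], ()):
        book_results[pagepath][metric][key] = values[metric]


book_results = {"/books/a": {"unique_views": {}, "page_views": {}, "sessions": {}}}
add_counts(book_results, {"name": "ga:source"}, "/books/a", {"google": 3}, {"google": 5}, {"google": 4})
```

In this example only the session counts are stored, because the assumed mapping lists sessions as the sole metric of interest for the ga:source dimension.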

oaebu_workflows.google_analytics3_telescope.google_analytics3_telescope.get_reports(service, organisation_name, view_id, pagepath_regex, data_interval_start, data_interval_end)[source]

Get reports data from the Google Analytics Reporting API.

Parameters:
  • service (googleapiclient.discovery.Resource) – The Google Analytics Reporting service.

  • organisation_name (str) – Name of the organisation.

  • view_id (str) – The view id.

  • pagepath_regex (str) – The regex expression for the pagepath of a book.

  • data_interval_start (pendulum.DateTime) – Start date of the analytics period.

  • data_interval_end (pendulum.DateTime) – End date of the analytics period.

Returns:

List with Google Analytics data for each book.

Return type:

list