oaebu_workflows.google_analytics3_telescope.google_analytics3_telescope
Module Contents
Classes
GoogleAnalytics3Release – Construct a GoogleAnalytics3Release.
GoogleAnalytics3Telescope – Google Analytics Telescope.
Functions
initialize_analyticsreporting – Initializes an Analytics Reporting API V4 service object.
list_all_books – List all available books by getting all pagepaths of a view id in a given period.
create_book_result_dicts – Create a dictionary to store results for a single book. Pagepath, title and avg time on page are already given.
get_dimension_data – Get reports data from the Google Analytics Reporting service for a single dimension and multiple metrics.
add_to_book_result_dict – Add the 'unique_views', 'page_views' and 'sessions' results to the book results dict if these metrics are of interest for the current dimension.
get_reports – Get reports data from the Google Analytics Reporting API.
- class oaebu_workflows.google_analytics3_telescope.google_analytics3_telescope.GoogleAnalytics3Release(dag_id, run_id, data_interval_start, data_interval_end, partition_date)[source]
Bases:
observatory.platform.workflows.workflow.PartitionRelease
Construct a GoogleAnalytics3Release.
- Parameters:
dag_id (str) – The ID of the DAG
run_id (str) – The Airflow run ID
data_interval_start (pendulum.DateTime) – The start date of the download period.
data_interval_end (pendulum.DateTime) – The end date of the download period, also used as the release date for the BigQuery table and file paths.
partition_date (pendulum.DateTime) –
- class oaebu_workflows.google_analytics3_telescope.google_analytics3_telescope.GoogleAnalytics3Telescope(dag_id, organisation_name, cloud_workspace, view_id, pagepath_regex, data_partner='google_analytics3', bq_dataset_description='Data from Google sources', bq_table_description=None, api_dataset_id='google_analytics', oaebu_service_account_conn_id='oaebu_service_account', observatory_api_conn_id=AirflowConns.OBSERVATORY_API, catchup=True, start_date=pendulum.datetime(2018, 1, 1), schedule='@monthly')[source]
Bases:
observatory.platform.workflows.workflow.Workflow
Google Analytics Telescope.
Construct a GoogleAnalytics3Telescope instance.
- Parameters:
dag_id (str) – The ID of the DAG
organisation_name (str) – The organisation name as per Google Analytics
cloud_workspace (observatory.platform.observatory_config.CloudWorkspace) – The CloudWorkspace object for this DAG
view_id (str) – The Google Analytics view ID
pagepath_regex (str) – The pagepath regex
data_partner (Union[str, oaebu_workflows.oaebu_partners.OaebuPartner]) – The name of the data partner
bq_dataset_description (str) – Description for the BigQuery dataset
bq_table_description (str) – Description for the BigQuery table
api_dataset_id (str) – The ID to store the dataset release in the API
oaebu_service_account_conn_id (str) – Airflow connection ID for the OAEBU service account
observatory_api_conn_id (str) – Airflow connection ID for the Observatory API
catchup (bool) – Whether to catch up the DAG or not
start_date (pendulum.DateTime) – The start date of the DAG
schedule (str) – The schedule interval of the DAG
- make_release(**kwargs)[source]
Make release instances. The release is passed as an argument to the function (TelescopeFunction) that is called in ‘task_callable’.
- Parameters:
kwargs – the context passed from the PythonOperator. See https://airflow.apache.org/docs/stable/macros-ref.html for the keyword arguments that can be passed.
- Returns:
A list of GoogleAnalytics3Release instances.
- Return type:
List[GoogleAnalytics3Release]
- check_dependencies(**kwargs)[source]
Check dependencies of the DAG. Extends the parent method to additionally check for a view id and pagepath regex.
- Parameters:
kwargs – the context passed from the Airflow Operator.
- Returns:
True if dependencies are valid.
- Return type:
bool
- download_transform(releases, **kwargs)[source]
Task to download and transform the Google Analytics release for a given month.
- Parameters:
releases (List[GoogleAnalytics3Release]) – a list with one Google Analytics release.
- Return type:
None
- upload_transformed(releases, **kwargs)[source]
Uploads the transformed file to GCS.
- Parameters:
releases (List[GoogleAnalytics3Release]) –
- Return type:
None
- bq_load(releases, **kwargs)[source]
Loads the data into BigQuery.
- Parameters:
releases (List[GoogleAnalytics3Release]) –
- Return type:
None
- add_new_dataset_releases(releases, **kwargs)[source]
Adds release information to API.
- Parameters:
releases (List[GoogleAnalytics3Release]) –
- Return type:
None
- cleanup(releases, **kwargs)[source]
Delete all files, folders and XComs associated with this release.
- Parameters:
releases (List[GoogleAnalytics3Release]) –
- Return type:
None
- oaebu_workflows.google_analytics3_telescope.google_analytics3_telescope.initialize_analyticsreporting(oaebu_service_account_conn_id)[source]
Initializes an Analytics Reporting API V4 service object.
- Parameters:
oaebu_service_account_conn_id (str) – Airflow connection ID for the OAEBU service account
- Returns:
An authorized Analytics Reporting API V4 service object.
- Return type:
googleapiclient.discovery.Resource
- oaebu_workflows.google_analytics3_telescope.google_analytics3_telescope.list_all_books(service, view_id, pagepath_regex, data_interval_start, data_interval_end, organisation_name, metrics)[source]
List all available books by getting all pagepaths of a view id in a given period. Note: Google API will not return a result for any entry in which all supplied metrics are zero. However, it will return ‘some’ results if you supply no metrics, contrary to the documentation. Date ranges are inclusive.
- Parameters:
service (googleapiclient.discovery.Resource) – The Google Analytics Reporting service object.
view_id (str) – The view id.
pagepath_regex (str) – The regular expression for the pagepath of a book.
data_interval_start (pendulum.DateTime) – Start date of analytics period
data_interval_end (pendulum.DateTime) – End date of analytics period
organisation_name (str) – The organisation name.
metrics (list) – The metrics to return with the book results
- Returns:
A list with dictionaries, one for each book entry (the dict contains the pagepath, title and average time on page) and a list of all pagepaths.
- Return type:
Tuple[List[dict], list]
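As a rough illustration of the kind of query described above, a Reporting API V4 `batchGet` request body that selects pagepaths by regex over an inclusive date range might look like the sketch below. The helper name `build_book_list_request`, the choice of the `ga:pagePath` and `ga:pageTitle` dimensions, the page size and the example values are assumptions made for illustration; they are not taken from this module's source.

```python
from datetime import date


def build_book_list_request(view_id, pagepath_regex, start, end, metrics, page_token=None):
    """Hypothetical helper: build a Reporting API V4 batchGet body for listing books."""
    request = {
        "viewId": view_id,
        # Both the start and end date are part of the reported period.
        "dateRanges": [{"startDate": start.isoformat(), "endDate": end.isoformat()}],
        "metrics": [{"expression": name} for name in metrics],
        "dimensions": [{"name": "ga:pagePath"}, {"name": "ga:pageTitle"}],
        # Keep only rows whose pagepath matches the book regex.
        "dimensionFilterClauses": [
            {
                "filters": [
                    {
                        "dimensionName": "ga:pagePath",
                        "operator": "REGEXP",
                        "expressions": [pagepath_regex],
                    }
                ]
            }
        ],
        "pageSize": 10000,
    }
    if page_token is not None:
        request["pageToken"] = page_token
    return {"reportRequests": [request]}


# Example values are made up.
body = build_book_list_request(
    "12345", r"^/books/.+", date(2022, 1, 1), date(2022, 1, 31), ["ga:avgTimeOnPage"]
)
```

Because rows whose supplied metrics are all zero are not returned, the choice of metrics in such a request also determines which pagepaths are listed.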
- oaebu_workflows.google_analytics3_telescope.google_analytics3_telescope.create_book_result_dicts(book_entries, data_interval_start, data_interval_end, organisation_name)[source]
Create a dictionary to store results for a single book. Pagepath, title and avg time on page are already given. The other metrics will be added to the dictionary later.
- Parameters:
book_entries (List[dict]) – List with dictionaries of book entries.
data_interval_start (pendulum.DateTime) – Start date of analytics period.
data_interval_end (pendulum.DateTime) – End date of analytics period.
organisation_name (str) – The organisation name.
- Returns:
Dict to store results
- Return type:
Dict[dict]
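One way to picture the returned structure is a dictionary keyed by pagepath, where each value holds the fields that are already known plus empty slots for the metrics added later. The sketch below is only an illustration: the function name `make_book_results` and every field name in it are hypothetical and may differ from what the module actually stores.

```python
from datetime import date


def make_book_results(book_entries, start, end, organisation_name):
    """Hypothetical sketch: one result dict per book, keyed by pagepath."""
    results = {}
    for entry in book_entries:
        results[entry["pagepath"]] = {
            # Values already known from the book listing.
            "url": entry["pagepath"],
            "title": entry["title"],
            "average_time": entry["average_time"],
            "start_date": start.isoformat(),
            "end_date": end.isoformat(),
            "organisation": organisation_name,
            # Placeholders that later per-dimension queries fill in.
            "unique_views": {},
            "page_views": {},
            "sessions": {},
        }
    return results


# Example values are made up.
entries = [{"pagepath": "/books/a", "title": "Book A", "average_time": 12.5}]
book_results = make_book_results(entries, date(2022, 1, 1), date(2022, 1, 31), "Example Press")
```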
- oaebu_workflows.google_analytics3_telescope.google_analytics3_telescope.get_dimension_data(service, view_id, data_interval_start, data_interval_end, metrics, dimension, pagepaths)[source]
Get reports data from the Google Analytics Reporting service for a single dimension and multiple metrics. The results are filtered by pagepaths of interest and ordered by pagepath as well.
- Parameters:
service (googleapiclient.discovery.Resource) – The Google Analytics Reporting service.
view_id (str) – The view id.
data_interval_start (pendulum.DateTime) – The start date of the analytics period.
data_interval_end (pendulum.DateTime) – The end date of the analytics period.
metrics (list) – List of metric dictionaries.
dimension (dict) – The dimension.
pagepaths (list) – List with pagepaths to filter and sort on.
- Returns:
List with reports data for dimension and metrics.
- Return type:
list
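The filtering, ordering and multi-page retrieval described for this function can be sketched as follows. This is an illustrative sketch, not the module's implementation: `build_dimension_request`, `fetch_all_rows` and `fake_fetch` are hypothetical names, and `fetch` stands in for whatever callable sends a request body to the Reporting API V4 and returns its response dictionary.

```python
def build_dimension_request(view_id, start_date, end_date, metrics, dimension, pagepaths):
    """Hypothetical helper: one extra dimension, filtered and sorted by pagepath."""
    return {
        "reportRequests": [
            {
                "viewId": view_id,
                "dateRanges": [{"startDate": start_date, "endDate": end_date}],
                "metrics": metrics,
                "dimensions": [{"name": "ga:pagePath"}, dimension],
                # Restrict the report to the pagepaths of interest.
                "dimensionFilterClauses": [
                    {
                        "filters": [
                            {
                                "dimensionName": "ga:pagePath",
                                "operator": "IN_LIST",
                                "expressions": pagepaths,
                            }
                        ]
                    }
                ],
                "orderBys": [{"fieldName": "ga:pagePath"}],
                "pageSize": 10000,
            }
        ]
    }


def fetch_all_rows(fetch, body):
    """Collect rows from every page; `fetch` maps a request body to a response dict."""
    rows = []
    while True:
        report = fetch(body)["reports"][0]
        rows.extend(report.get("data", {}).get("rows", []))
        token = report.get("nextPageToken")
        if token is None:
            return rows
        # Ask for the next page on the following iteration.
        body["reportRequests"][0]["pageToken"] = token


def fake_fetch(body):
    """Stand-in for the real API call: returns two pages of made-up rows."""
    if body["reportRequests"][0].get("pageToken") is None:
        page = {"data": {"rows": [{"dimensions": ["/books/a", "AU"]}]}, "nextPageToken": "1"}
    else:
        page = {"data": {"rows": [{"dimensions": ["/books/b", "NZ"]}]}}
    return {"reports": [page]}


request_body = build_dimension_request(
    "12345",
    "2022-01-01",
    "2022-01-31",
    [{"expression": "ga:sessions"}],
    {"name": "ga:country"},
    ["/books/a", "/books/b"],
)
all_rows = fetch_all_rows(fake_fetch, request_body)
```

With the real client, `fetch` would typically wrap `service.reports().batchGet(body=...).execute()`; whether the module pages through results in exactly this way is an assumption.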
- oaebu_workflows.google_analytics3_telescope.google_analytics3_telescope.add_to_book_result_dict(book_results, dimension, pagepath, unique_views, page_views, sessions)[source]
Add the ‘unique_views’, ‘page_views’ and ‘sessions’ results to the book results dict if these metrics are of interest for the current dimension.
- Parameters:
book_results (dict) – A dictionary with all book results.
dimension (dict) – Current dimension for which ‘unique_views’ and ‘sessions’ data is given.
pagepath (str) – Pagepath of the book.
unique_views (dict) – Number of unique views for the pagepath and dimension
page_views (dict) – Number of page views for the pagepath and dimension
sessions (dict) – Number of sessions for the pagepath and dimension
- Returns:
None
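The in-place update described above can be pictured with a small sketch. It assumes that a metric which is not of interest for the current dimension is passed as `None`, and that counts are stored per dimension under a hypothetical key such as `"country"`; the function name `add_counts` and the layout of the result dict are illustrative assumptions, not the module's actual behaviour.

```python
def add_counts(book_results, dimension_key, pagepath, unique_views, page_views, sessions):
    """Hypothetical sketch: store per-dimension counts on the book's result dict."""
    book = book_results[pagepath]
    named_counts = (
        ("unique_views", unique_views),
        ("page_views", page_views),
        ("sessions", sessions),
    )
    for metric, counts in named_counts:
        if counts is None:
            # Assumption: None marks a metric that is not of interest here.
            continue
        book.setdefault(metric, {})[dimension_key] = counts
    # Nothing is returned; the dict is modified in place.


# Example values are made up.
results = {"/books/a": {"title": "Book A"}}
add_counts(results, "country", "/books/a", {"AU": 3}, {"AU": 5}, None)
```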
- oaebu_workflows.google_analytics3_telescope.google_analytics3_telescope.get_reports(service, organisation_name, view_id, pagepath_regex, data_interval_start, data_interval_end)[source]
Get reports data from the Google Analytics Reporting API.
- Parameters:
service (googleapiclient.discovery.Resource) – The Google Analytics Reporting service.
organisation_name (str) – Name of the organisation.
view_id (str) – The view id.
pagepath_regex (str) – The regular expression for the pagepath of a book.
data_interval_start (pendulum.DateTime) – Start date of analytics period
data_interval_end (pendulum.DateTime) – End date of analytics period
- Returns:
List with Google Analytics data for each book
- Return type:
list