Reporting on Google Analytics and Adwords in Python, Part 2

My previous post gave you some working code to get you started extracting Google analytics data from the API.  While visits and pageviews are important metrics they don’t mean much without context.  In this post I’ll extend the code from the previous post to show the visits and pageviews for each referrer to a site.  That’ll put some meat on these bones!

First, I’ll focus on just the creation of the query URI.  In part 1, our query URL looked like:

query_uri = gdata.analytics.client.DataFeedQuery({
      'ids': PROFILE_ID,
      'start-date': sd,
      'end-date': ed,
      'dimensions': 'ga:date',
      'metrics': 'ga:visits',
	})

and had the output:

ga:date ga:visits
20111121        3214
20111122        2692
20111123        2360
20111124        1537
20111125        2227
20111126        2171
20111127        2220

A query asks the Google Analytics API for all metrics for each dimension specified. In this case the query asks for all “ga:visits” for each “ga:date” dimension which are recorded over the date range from “start-date” to “end-date” for the “ids” specified.

Here’s a new query to show visits and pageviews for all referrers each day inside the date range:

 query_uri = gdata.analytics.client.DataFeedQuery({
      'ids': PROFILE_ID,
      'start-date': sd,
      'end-date': ed,
      'dimensions': 'ga:date,ga:source',
      'metrics': 'ga:visits,ga:pageviews',
        })

And its output looks like:

ga:date ga:source       ga:visits       ga:pageviews
20111121        (direct)        1029    7806
20111121        ask     8       44
20111121        austin360.com   1       12
20111121        bing    133     1130
20111121        en.wikipedia.org        2       40
20111121        facebook.com    2       9
20111121        google  1550    10642
20111121        google.com      6       27
20111121        home.myhughesnet.com    1       2
20111121        yellowbook.com  3       15
20111121        yellowpages.com 1       5
20111121        yelp.com        1       9
... many entries eliminated to protect the innocent
20111122        (direct)        844     4728
20111122        aol     5       11
20111122        ask     4       32
20111122        yahoo   124     1005
20111122        yellowbook.com  2       10
20111122        yellowpages.com 3       31
... many entries eliminated to protect the innocent
20111123        (direct)        793     4256
20111123        aol     9       62
20111123        ask     3       75
20111123        bing    118     867
20111123        yelp.com        1       7
... many entries eliminated to protect the innocent
20111124        (direct)        475     3526
20111124        aol     4       31
20111124        ask     5       37
20111124        bing    60      389
... many entries eliminated to protect the innocent
20111125        (direct)        676     4041
20111125        aol     6       42
20111125        ask     7       44
20111125        bing    99      588
20111125        yahoo   130     1152
20111125        yellowbook.com  3       14
... many entries eliminated to protect the innocent
20111126        (direct)        664     3524
20111126        aol     5       35
20111126        ask     9       34
20111126        bing    76      599
20111126        sxsw.com        7       17
20111126        yahoo   138     1002
20111126        yellowbook.com  5       18
20111126        yellowpages.com 1       19
20111126        yelp.com        2       48
... many entries eliminated to protect the innocent
20111127        (direct)        636     3507
20111127        aol     7       127
20111127        ask     7       67
20111127        bing    84      548
20111127        en.wikipedia.org        1       7
20111127        facebook.com    2       12
20111127        google.com      10      72
20111127        m.yp.com        2       5
20111127        mail.aol.com    1       4
20111127        yahoo   127     1074
20111127        yellowbook.com  3       10
20111127        yellowpages.com 1       1
20111127        yelp.com        2       38

 

Drop the date dimension and you get all referrers in your date range with no date grouping by the “ga:date” dimension. The query:

query_uri = gdata.analytics.client.DataFeedQuery({
      'ids': PROFILE_ID,
      'start-date': sd,
      'end-date': ed,
      'dimensions': 'ga:source',
      'metrics': 'ga:visits,ga:pageviews',
        })

It’s output:

ga:source       ga:visits       ga:pageviews
(direct)        5117    31388
aol     50      386
ask     43      333
bing    677     4705
bookmarks.yahoo.com     2       6
business.com    1       7
google  7937    53270
mws.ask.com     1       10
my.msn.com      1       41
my.yahoo.com    1       3
sxsw.com        33      89
yahoo   875     6789
yellowbook.com  25      104
yellowpages.com 7       68
yelp.com        6       102
... many entries eliminated to protect the innocent

 

Both of those outputs are valuable. First, the referrers, their referral volume, by date is quite valuable. I’d expect to use the second version to get an overview of aggregate referreral volume without the date segmentation to get a feel for a site’s referral health. I’d use the first version to find out when the referrals were delivered.

Here’s the full code for the Python program that produced the second output:

#!/usr/bin/python

'''

Use the Google Data python module to query Google Analytics

You'll get the "PROFILE_ID" from your Google Analytics account.  From the default
listing you click on the account name and then take the "Edit" action on the website
profile you want to find the profile id for.  It is listed near the top right under
"Profile Settings"
'''

__author__ = 'Leo Edmiston-Cyr '

import gdata.analytics.client
import datetime

USERNAME = 'yourAnalyticsAccount@gmail.com'
PASSWORD = 'youToughPassw0rd'
PROFILE_ID = 'ga:1234567' # the GA profile ID to query
SOURCE_APP_NAME = 'GAGettah' # anything you want to call it
sd = datetime.date(2011,11,21)
ed = datetime.date(2011,11,27)
#ed = datetime.date.today()

def main ():

    my_client = gdata.analytics.client.AnalyticsClient(source=SOURCE_APP_NAME)
    my_client.client_login(
        USERNAME,
        PASSWORD,
        SOURCE_APP_NAME,
        service='analytics')

    query_uri = gdata.analytics.client.DataFeedQuery({
      'ids': PROFILE_ID,
      'start-date': sd,
      'end-date': ed,
      #'dimensions': 'ga:date,ga:source', # enable grouping by date
      'dimensions': 'ga:source',
      'metrics': 'ga:visits,ga:pageviews',
        })
    feed = my_client.GetDataFeed(query_uri)

    # find out if this is the first run through the results
    # to build a simple header for the dimensions and metrics
    firstRun = True
    heading = []

    # we'll run through the data feed reutrned from our query
    for entry in feed.entry:

      # build each row of data from the feed
      row = []

      # pull all dimensions out of this entry
      for dim in entry.dimension:
        if firstRun:
                heading.append(dim.name)
        row.append(dim.value)

      # pull all metrics out of this entry
      for met in entry.metric:
        if firstRun:
                heading.append(met.name)
        row.append(met.value)

      # print the dimension and metric names as the header
      if firstRun:
        print "t".join(heading) + "r"

      # print all rows from the feed as they are built
      print "t".join(row) + "r"

      # don't print the dimension and metric names as the header again
      firstRun = False


if __name__ == '__main__':

        main()

Want more dimensions and metrics? Visit the GA API dimensions and metrics reference page http://code.google.com/apis/analytics/docs/gdata/dimsmets/dimsmets.html

These articles were aimed at the technical web marketer who either writes a little code or has a programmer they want to egg on.