📷Extract Tableau Workbooks calculated fields and group definition 😎(by @ace108)

So you inherited took over some Tableau Workbooks and need to rework them into something more efficient and migrate to Tableau server. I am going to recreate teh Tableau Workbooks on Tableau server rather than just publishing as I found the calculated fields get published as a dimension. Other than calculated fields, I found there are group definition to take care of too and I have not look into sets yet. With multiple data sources and calculated fields and groups, I wanted get a complete list of these to work through.

🎶Meanwhile, some music for you to enjoy as you read. This piece is part of Vivaldi’s The Four Seasons, particularly the first of the 4 violin concertos, called Spring.
🎶让大家继续读的时候,欣赏一点音乐。这件作品是维瓦尔第的“四季”的一部分 - 4小提琴协奏曲中的第一首,名为“春天”。
Source | 来源: Source | 来源:
I did some research and found that Tableau Workbooks are basically saved as XML files. So I should be able to “look” into it. I found a page which gives a clue on handling the calculated fields but that doesn’t take care of the group definitions. So I studied the way it works for the calculated fields and rework a bit to cater for getting the information for calculated fields and group definitions. My program will produce two Excel files: one with the information for calculated fields and the other with the information for group definitions. It’s not perfect but here is the program which I wrote in Jupyter Notebook and downloaded as python file if anyone is interested.

#!/usr/bin/env python
# coding: utf-8 
# This program has two parts: 
# 1. Extract Tableau TWB Calculated Fields into an Excel file 
#    Reference : 
# 2. Extract Tableau TWB Group definition into an Excel file  
# Note: I decided to output to 2 worksheets because I wanted to look at them in different format 

# Librarys import and setup 
import easygui 
import xml.etree.ElementTree as ET 
import os 
import pandas as pd 
from IPython.display import Markdown, display, HTML 
pd.set_option('display.max_rows', None) 
pd.set_option('display.max_columns', None) 
pd.set_option('display.width', None) 
pd.set_option('display.max_colwidth', None) 

# To prompt user for twb file, uncomment the next line and comment out the line after. 
# file = easygui.fileopenbox(default="C:\projects\ace\twb\*.twb") 
# Update your TWB filename to read 
file = r"C:\Users\ace\Documents\My Tableau Repository\Workbooks\Tableau For Healthcare - Ambulatory Access.twb"   # <--- change filename if necessary 

#parse the twb file 
tree = ET.parse(file) 
root = tree.getroot() 

# create a dictionary of name and tableau generated name 
calc_dict = {} 
for item in root.findall('.//column[@caption]'): 
    if item.find(".//calculation") is None: 
        calc_dict[item.attrib['name']] = '[' + item.attrib['caption'] + ']' 

# list of calculation field name, tableau generated name, and calculation/formula 
calc_list = [] 

for item in root.findall('.//column[@caption]'): 
    if item.find(".//calculation") is None: 
        if item.find(".//calculation[@formula]") is not None: 
            calc_caption = '[' + item.attrib['caption'] + ']' 
            calc_name = item.attrib['name'] 
            calc_raw_formula = item.find(".//calculation").attrib['formula'] 
            calc_comment = '' 
            calc_formula = '' 
            for line in calc_raw_formula.split('\r\n'): 
                if line.startswith('//'): 
                    calc_comment = calc_comment + line + ' ' 
                    calc_formula = calc_formula + line + ' ' 
            for name, caption in calc_dict.items(): 
                calc_formula = calc_formula.replace(name, caption) 
            calc_row = (calc_caption, calc_name, calc_formula, calc_comment) 

# convert the list of calculation into a data frame 
data = calc_list 
data = pd.DataFrame(data, columns=['Name', 'Remote Name', 'Formula', 'Comment']) 
# remove duplicate rows from data frame 
data = data.drop_duplicates(subset=None, keep='first', inplace=False) 
# Note: This part does not handle the comments enter in the same line as the formula but works for me well enough and I'm not going to amend further. 

# get the name of the file 
# export to Excel 

base = os.path.basename(file) 
filename = os.path.splitext(base)[0] 

out_folder = 'C:\\projects\\ace\\output\\' 
out_file = out_folder + filename + '_' + 'calculations.xlsx' 
data.to_excel(out_file, 'Calculations') 
display(f'Opening output file containinng calculations {out_file}') 

# list of group name, tableau generated name, and group definitions 
group_list = [] 

for item in root.findall(".//column[@caption]"): 
    group_caption = item.attrib.get('caption')[0:] 
    group_name = item.attrib.get('name')[1:-1] 
    for group in item.findall(".//calculation[@column]"): 
        group_depend_field = group.attrib.get('column')[1:-1] 
        for bin in group.findall(".//bin[@default-name]"): 
            default_bin = bin.attrib.get('default-name') 
            bin_value = bin.attrib.get('value')[0:] 
            bin_items = [] 
            for bin_item in bin.findall(".//value"): 
                bin_item_value = bin_item.text 

            bin_items_dict = dict(enumerate(bin_items, 1))            
            group_row = (group_caption, group_name, group_depend_field, bin_value, default_bin, bin_items_dict) 

# convert the list of groups into a data frame 
group_data = group_list 
group_df = pd.DataFrame(group_data, columns=['Name', 'Remote Name', 'Dependent Field', 'Bin', 'Default', 'Items']) 

# export to Excel 
out_file = out_folder + filename + '_' + 'groups.xlsx' 
group_df.to_excel(out_file, 'Groups') 
display(f'Opening output file containinng groups {out_file}') 



Please see my other posts: @ace108

请看我其他帖: @ace108
I See👀. 
 Shoot📷. I Blog it👆😎


Future reading