Watson Knowledge Catalog
Discover and govern analytic assets to power AI
IBM Watson Knowledge Catalog powers intelligent, self-service discovery of data and models, activating them for artificial intelligence, machine learning and deep learning. With the Catalog, data professionals can access, curate, categorize and share machine learning models as well as structured and unstructured data for data science and AI.
It works tightly with Watson Studio, an integrated environment that gives data scientists, developers and business analysts all the tools they need to build, deploy, train and manage AI, ML and deep learning models at scale.
While I served as Design Lead, The Forrester Wave: Machine Learning Data Catalogs, Q2 2018 named Watson Knowledge Catalog a leader.
Data professionals spend 80% of their time trying to find and prepare the data they need. That leaves only 20% for their actual work, so they are forced to rush through the important pieces that produce results, like model building, visualizations, and reporting.
A large part of the problem is that the expansive data lakes at enterprises have turned into swamps. The more data is added, the harder it is to find, store, and govern terabytes on terabytes of it. This leaves data scientists, business analysts, and anyone else who uses data asking:
Where is our company's data? And once I've found it, what is it really?
Where did the data come from and how accurate is it? Can I trust it?
I'm blocked by governance policies. How can I get access to my company's data?
I've had the honor and pleasure of leading our design team based in Austin, TX since the beginning of 2018. For the first 3 months of 2018, our team worked in two-week sprint cycles to deliver four key new experiences for the Catalog, to be shipped for IBM's annual Think Conference. We hit our targets.
As Design Lead, I am responsible for leading the design vision and execution for the product. I do this by partnering with offering management and engineering leaders and executives to prioritize product requirements and implement user experiences that are valuable and consistent with other products of the platform.
McKenzie Carlile, UX Designer
Frances DiMare, Design Researcher
Amanda Hughes, Visual Designer
Joshua Kramer, Visual Designer
Tina L. Zeng, Design Lead
With assistance from: Vickie Culbertson, Design Lead; Danielle Demme, Visual Designer; Wenjing Li, UX Designer
With Design in Austin, we collaborated with our Offering Management team in the UK and Canada, and our Development teams in the US, Japan, and India. Yup, 6 different time zones.
Global Collaboration at Scale
To deliver Watson Knowledge Catalog, we worked with a global multidisciplinary team across 6 different time zones!
At the end of 2017, our design team had shipped Watson Data Platform and designed the key components of a data catalog, governance, and a unified UI. As both a reflection and next step, we were able to go completely blue sky on re-envisioning and re-designing what we had just built with the support of our Offering Manager, Jay Limburn.
Design Prompt: Reimagine what the ultimate Data Catalog experience could be.
As a design team, we ideated through 3 design thinking workshops:
Design Inspiration and Audit: “What if it was like…”
User Stories and Journeys: “Our favorite data scientist starts her day…”
Ideation and Prototyping: “Let’s show the team how this could be real…”
In January of 2018, we got the green light to develop the blue sky concepts that our team came up with, to be generally available and shipped for IBM's annual Think Conference.
Our users' main goal at the start of their workflow is to find safe, trusted, and accurate data for business analysis, predictive modeling, and active governance.
Based on user research with business analysts, data scientists, and Chief Data Officers, we identified the key users, artifacts, and the key tasks they needed to complete to get their job done.
We shipped 5 new user experiences for Watson Knowledge Catalog in 3 months:
Smart search and suggest of assets powered by Watson
Ingest and profile PDFs and unstructured data using Natural Language Processing
Masking sensitive data with policy driven transformation
Understand how all assets are connected together with a visual map of related content
Document and share tribal knowledge with ratings and reviews of assets
1. Let Watson power your search
First, we designed smart search and suggest by Watson so that users can easily find the data assets they are looking for and discover assets recommended by Watson that weren't on their radar originally. Sometimes you just don’t know what you’re missing, and Watson should help you with that.
Watson Knowledge Catalog uses Watson Machine Learning to derive a list of assets that users haven't accessed yet based on attributes common to the assets that they've viewed, created, and added to projects, such as tags, asset classification, attribute classifiers, data types, asset owners, and asset types.
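To make the idea concrete, here is a minimal Python sketch of suggesting assets by shared attributes. It is not how Watson Machine Learning does it; the catalog layout, the `suggest_assets` function, and the simple overlap score are all assumptions made for illustration. Each asset is described by a set of (attribute, value) pairs such as tags, asset types, and owners, and assets the user hasn't accessed are ranked by how many of those pairs they share with the user's history.

```python
from collections import Counter

# Hypothetical catalog: asset name -> set of (attribute, value) pairs.
catalog = {
    "sales_2017.csv": {("tag", "sales"), ("type", "csv"), ("owner", "ana")},
    "sales_2018.csv": {("tag", "sales"), ("type", "csv"), ("owner", "ana")},
    "churn_model": {("tag", "sales"), ("type", "model"), ("owner", "raj")},
    "hr_policies.pdf": {("tag", "hr"), ("type", "pdf"), ("owner", "lee")},
}


def suggest_assets(catalog, accessed_ids, top_n=3):
    """Rank assets the user has not accessed by attribute overlap with their history."""
    # Count how often each (attribute, value) pair appears in accessed assets.
    profile = Counter()
    for asset_id in accessed_ids:
        profile.update(catalog[asset_id])

    scores = {}
    for asset_id, attrs in catalog.items():
        if asset_id in accessed_ids:
            continue  # only suggest assets that are new to the user
        score = sum(profile[attr] for attr in attrs)
        if score > 0:
            scores[asset_id] = score

    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda a: (-scores[a], a))[:top_n]


suggestions = suggest_assets(catalog, {"sales_2017.csv"})
```

In this toy data, a user who has only opened `sales_2017.csv` is pointed first to `sales_2018.csv` (same tag, type, and owner) and then to `churn_model` (same tag), while the unrelated HR document is left out.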
Smart Search and Suggest
Sometimes you just don’t know what you’re missing, and Watson should help you with that.
2. Leverage natural language processing to profile unstructured data
From user research, we’ve heard that business analysts and data scientists rely heavily on each other to complete their journey from finding data to delivering an analysis for business decision making. This tribal knowledge comes with time working at the company and asking other colleagues about which data to use and where they can find it. We asked ourselves, what if we could aid in that process? The first of the social components that we introduced is rating and reviewing assets to capture this tribal knowledge and document it in the tool itself and not have it lost in Word files, emails, or Slack channels.
Profiling Unstructured Data using Natural Language Processing
Ingest unstructured data like PDFs and HTML files and the Catalog will convert it to a consumable form for you to clean, shape, and create models and reports with.
3. Mask sensitive data with the power of policy-driven transformation
Never worry about governance again, because what you see is what you can get. The Catalog masks sensitive data automatically, which gives users access to more data. Users can use the Rule Builder in the Policy Manager to enforce policies and rules by writing conditions and assigning an action.
For example: if an asset contains PII data, then deny access. Conditions use terms and operators to specify the relationship between data and users.
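The condition-plus-action pattern can be sketched in a few lines of Python. This is an illustration only, not the Catalog's Rule Builder or its API: the `Rule` class, the `evaluate` and `apply_policy` functions, the asset and user dictionaries, and the "mask" action that hides all but the last four characters are all assumptions. Rules are checked in order and the first matching rule decides the outcome.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    condition: Callable[[dict, dict], bool]  # (asset, user) -> does the rule apply?
    action: str  # "deny" or "mask" in this sketch


def evaluate(rules, asset, user):
    """Return the action of the first rule whose condition matches, else "allow"."""
    for rule in rules:
        if rule.condition(asset, user):
            return rule.action
    return "allow"


def mask_value(value, visible=4):
    """Replace all but the last `visible` characters with asterisks."""
    text = str(value)
    hidden = max(len(text) - visible, 0)
    return "*" * hidden + text[hidden:]


def apply_policy(rules, asset, user, rows):
    """Return (decision, rows), with sensitive columns masked when required."""
    decision = evaluate(rules, asset, user)
    if decision == "deny":
        return decision, []
    if decision == "mask":
        sensitive = asset["sensitive_columns"]
        rows = [
            {k: mask_value(v) if k in sensitive else v for k, v in row.items()}
            for row in rows
        ]
    return decision, rows


# Hypothetical rules: analysts see masked PII, stewards see everything,
# everyone else is denied access to assets that contain PII.
rules = [
    Rule("mask PII for analysts",
         lambda a, u: "PII" in a["classifications"] and u["role"] == "analyst",
         "mask"),
    Rule("deny PII to non-stewards",
         lambda a, u: "PII" in a["classifications"] and u["role"] != "steward",
         "deny"),
]

asset = {"classifications": {"PII"}, "sensitive_columns": {"ssn"}}
rows = [{"name": "Ada", "ssn": "123-45-6789"}]

analyst_view = apply_policy(rules, asset, {"role": "analyst"}, rows)
guest_view = apply_policy(rules, asset, {"role": "guest"}, rows)
steward_view = apply_policy(rules, asset, {"role": "steward"}, rows)
```

Ordering matters in this sketch: the masking rule sits before the deny rule so that analysts get a masked view rather than being blocked outright, which mirrors the idea that masking gives users access to more data.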
Up until now, governance tools enabled enterprises to document their business rules and policies regarding their data. With the impending GDPR enforcement date of May 25th, 2018, policy driven transformation in the Catalog can finally help enterprises actually enforce their policies and rules on the data rather than just documenting them.
This is innovation.
4. Understand how it all connects together: Visual Map of Related Content
Through speaking to Chief Data Officers, we learned that they view not only structured and unstructured data files as data assets, but also the enterprise's policies, rules, and business terms. Governance teams want to know which assets are governed by which governance asset: what rules and policies are governing this particular asset? Additionally, business analysts and data scientists want to know which other users have used this asset, whether they can safely access it, and, if not, what policies and rules are blocking their access. Users can click on the related content tab in the overview page of an asset to view details about the associations of the selected data asset.
Understand how assets are all connected together through a visual map of related assets that reveals related policies, projects, rules, terms, and users.
5. Document and share tribal knowledge: Ratings and Reviews
From user research, we’ve heard that business analysts and data scientists rely heavily on each other to complete their journey from finding data to delivering an analysis for business decision making. This tribal knowledge comes with time working at the company and asking other colleagues about which data to use and where they can find it. We asked ourselves, what if we could aid in that process? Can we introduce a social component to the Catalog inspired by the "shop for data" metaphor?
Ratings and Reviews
Leverage the expertise of your colleagues by reading reviews of assets in the Catalog and contributing your own review to help others.
Shipped for IBM Think Conference 2018
Our team shipped four new user experiences for Watson Knowledge Catalog for the IBM Think Conference in March 2018. I worked with offering management and engineering leads to prioritize the user experiences that were GA and demoed at Think.
Recognized Leader in 2018 Forrester Wave
The Forrester Wave: Machine Learning Data Catalogs, Q2 2018 names Watson Knowledge Catalog a leader.
Now that we've shipped five new features for Watson Knowledge Catalog, we are entering a vigorous user testing phase. Before we ship new user experiences, we're giving the entire product a design audit. Now that we can step back and evaluate the experience as a whole, where have we created experiences that aren't the most delightful? Where are the gaps that exist for our users when trying to accomplish their tasks? What UI components aren't pixel perfect?