Monte Carlo

Monte Carlo - Redesign data lineage experience

Role

Product design and research

Company

Monte Carlo

Timeline

Jan 2023 - present

Problem

Monte Carlo provides data observability through a combination of out-of-the-box and custom rules covering data quality dimensions such as accuracy, validity, timeliness, and consistency. We wanted to improve our troubleshooting and investigation offerings and reduce time to resolution for incidents.

Solution

Lineage is a core feature through which users not only navigate their data stack, but also troubleshoot issues along the pipeline. Our existing lineage had many technical and usability problems that users frequently complained about.

Lineage issues

Lineage is one of the most used features in our product. New users find it a great way to start exploring the product because it gives them a visual overview of their data architecture. Seasoned users rely on lineage to trace upstream causes and downstream impact. Our existing lineage frustrated users, resulting in poor retention and a high volume of support tickets.

Layout and canvas issues

The lineage canvas was placed inside a card, which made it awkward to navigate, especially on very busy graphs.

Handling massive graphs

Bulk-expanding the graph was very slow, and the expanded graph became messy and unruly. Bulk selection also worked by choosing how many levels of nodes to expand (depth). Users called out that this was cumbersome and not intuitive.

Poor filtering experience

For large graphs, being able to filter out certain types of nodes is critical. But filtering out irrelevant nodes did not hide them fully. This was an implementation detail that had become a technical restriction. As a result, the lineage still had poor readability.

Poor and buggy upstream tracing usability

Tracing incidents on the current and upstream nodes was done via a pop-over list. It was clunky, didn't scale well to long lists, and had been buggy for a while because its hyperlink was broken.

Buggy node details

We had a toggle that allowed users to control the density of information shown on a node. When they're just exploring relationships, a smaller node size is preferred. But when a user is drilling down into a specific section of the lineage, the detailed view is more relevant. Even this interaction was buggy: some information was missing and some of the data shown was incorrect.

Competitive lineage analysis

We explored and tested a handful of lineage libraries and visual structures for the nodes to narrow down the ones that best suited our requirements.

Let's walk through the final redesigned lineage.

Improved canvas that is seamless and snappy

The lineage canvas now occupies the full page. The new canvas also provides some standard patterns such as trackpad zooming and rearranging nodes.

Lazy loading to handle massive graphs

We simplified bulk-expansion of the lineage by providing lazy loading that expands the lineage as the user navigates around the canvas.
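
As a rough illustration of the idea, here is a minimal sketch of lazy expansion, assuming the lineage is served one node at a time. The names `LazyLineage`, `fetch_downstream`, and the sample table names are hypothetical and not taken from the product; the point is only that a node's neighbors are fetched the first time the user reaches it and reused afterwards.

```python
from typing import Callable, Dict, List, Set

# Hypothetical stand-in for a backend lookup: node id -> direct downstream node ids.
FULL_GRAPH: Dict[str, List[str]] = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["analytics.revenue", "analytics.churn"],
    "analytics.revenue": ["dashboard.finance"],
    "analytics.churn": [],
    "dashboard.finance": [],
}


def fetch_downstream(node_id: str) -> List[str]:
    """Pretend network call that returns one node's direct children."""
    return FULL_GRAPH.get(node_id, [])


class LazyLineage:
    """Loads a node's neighbors only when that node is expanded."""

    def __init__(self, root: str, fetch: Callable[[str], List[str]]) -> None:
        self.fetch = fetch
        self.loaded: Dict[str, List[str]] = {}  # cache of already fetched nodes
        self.visible: Set[str] = {root}
        self.fetch_count = 0  # number of real fetches, for illustration

    def expand(self, node_id: str) -> List[str]:
        # Fetch on first use only; later expansions reuse the cache.
        if node_id not in self.loaded:
            self.loaded[node_id] = list(self.fetch(node_id))
            self.fetch_count += 1
        self.visible.update(self.loaded[node_id])
        return self.loaded[node_id]


lineage = LazyLineage("raw.orders", fetch_downstream)
lineage.expand("raw.orders")
lineage.expand("staging.orders")
lineage.expand("staging.orders")  # served from the cache, no new fetch
```

In this sketch, nodes the user never navigates to (here `dashboard.finance`) are never fetched or drawn, which is what keeps very large graphs manageable.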

Better filtering

Users can now fully hide nodes by applying filters, making the lineage a lot more readable. Filtering is fast and responsive.
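
One possible way to hide nodes fully without breaking the graph is to drop the filtered nodes and reconnect their visible neighbors. The sketch below assumes that approach; it is not necessarily how the product implements filtering, and `filter_lineage` and the `tmp.` naming rule are hypothetical.

```python
from typing import Callable, Dict, List


def filter_lineage(
    edges: Dict[str, List[str]], is_hidden: Callable[[str], bool]
) -> Dict[str, List[str]]:
    """Return a graph without hidden nodes, bridging edges across them."""
    result: Dict[str, List[str]] = {}
    for node in edges:
        if is_hidden(node):
            continue  # hidden nodes are removed entirely
        targets = []
        seen = set()
        stack = list(edges[node])
        while stack:
            child = stack.pop()
            if child in seen:
                continue
            seen.add(child)
            if is_hidden(child):
                # Skip over the hidden node and keep walking downstream.
                stack.extend(edges.get(child, []))
            else:
                targets.append(child)
        result[node] = sorted(targets)
    return result


edges = {
    "raw.orders": ["tmp.orders_stage"],
    "tmp.orders_stage": ["analytics.revenue"],
    "analytics.revenue": [],
}

# Hypothetical filter: hide temporary tables.
filtered = filter_lineage(edges, lambda node: node.startswith("tmp."))
```

Here `raw.orders` ends up connected directly to `analytics.revenue`, so the filtered view stays readable while still showing that the two tables are related.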

Tracing upstream is simpler, bugs fixed

We improved the incident overlay on the node, letting users set a lookback period and view recent alerts in a drawer. This makes it easier for users to trace incidents along a pipeline.
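
To make the lookback idea concrete, here is a small sketch that collects alerts for a node and everything upstream of it within a chosen window. The `Alert` type, `recent_alerts`, and the sample data are assumptions made for illustration and do not describe the product's data model.

```python
from datetime import datetime, timedelta
from typing import Dict, List, NamedTuple, Set


class Alert(NamedTuple):
    node_id: str
    raised_at: datetime
    message: str


def upstream_closure(upstream: Dict[str, List[str]], node_id: str) -> Set[str]:
    """Return the node itself plus every node upstream of it."""
    seen = {node_id}
    stack = [node_id]
    while stack:
        current = stack.pop()
        for parent in upstream.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def recent_alerts(
    upstream: Dict[str, List[str]],
    alerts: List[Alert],
    node_id: str,
    now: datetime,
    lookback: timedelta,
) -> List[Alert]:
    """Alerts on the node or its upstream nodes within the lookback window, newest first."""
    scope = upstream_closure(upstream, node_id)
    cutoff = now - lookback
    hits = [a for a in alerts if a.node_id in scope and a.raised_at >= cutoff]
    return sorted(hits, key=lambda a: a.raised_at, reverse=True)


now = datetime(2023, 6, 1, 12, 0)
upstream = {
    "dashboard.finance": ["analytics.revenue"],
    "analytics.revenue": ["staging.orders"],
    "staging.orders": [],
}
alerts = [
    Alert("staging.orders", now - timedelta(hours=2), "freshness"),
    Alert("analytics.revenue", now - timedelta(days=3), "volume"),
    Alert("other.table", now - timedelta(hours=1), "schema change"),
]

last_day = recent_alerts(upstream, alerts, "dashboard.finance", now, timedelta(days=1))
last_week = recent_alerts(upstream, alerts, "dashboard.finance", now, timedelta(days=7))
```

Widening the lookback from one day to one week surfaces the older `analytics.revenue` alert, while the alert on the unrelated `other.table` never appears.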

Feature parity by allowing users to collapse nodes

Previously, users could only expand nodes, not collapse them. We added a feature that users had requested: the ability to collapse expanded nodes.
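
A simple way to reason about expand and collapse is to track which nodes are expanded and derive visibility from that set. The sketch below assumes this model; `visible_nodes` and the sample graph are hypothetical.

```python
from typing import Dict, List, Set


def visible_nodes(edges: Dict[str, List[str]], root: str, expanded: Set[str]) -> Set[str]:
    """Nodes reachable from the root by walking only through expanded nodes."""
    visible = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        if node not in expanded:
            continue  # a collapsed node is shown, but its children are not
        for child in edges.get(node, []):
            if child not in visible:
                visible.add(child)
                stack.append(child)
    return visible


edges = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

expanded = {"a", "b", "c"}
all_open = visible_nodes(edges, "a", expanded)

expanded.discard("b")  # collapse b; d is still reachable through c
after_b = visible_nodes(edges, "a", expanded)

expanded.discard("c")  # collapse c as well; d is no longer reachable
after_c = visible_nodes(edges, "a", expanded)
```

Recomputing visibility from the expanded set means a shared downstream node (`d` here) stays on the canvas until every expanded path to it has been collapsed.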

Impact

Qualitative feedback from users was positive, and support tickets for lineage went down by 60%.

We did not see a significant increase in the number of users, as lineage was already one of the most used features of our product, but we did observe an improvement in user retention. Previously we would see a drop-off after 10 weeks, whereas the new lineage holds at 40% retention even at week 15.

There was a small increase in conversion of users investigating incidents after using lineage.