Tooling means more than syntax: How we built a YAML-based language

Written by Tomáš Muchka | Jan 22, 2024

We at GoodData are pioneering the idea of treating analytics as code. This lets you apply the same principles that software engineers have practiced for a long time.

We decided to use YAML as the cornerstone of our analytics code. Interested in why? Check out my article “5 Reasons Why to Write Your Semantic Layer in YAML.”

The main question for this article is: Can YAML feel like a full-fledged programming language? We believe it can, with the VS Code extension we have developed. Let's explore the extension together.

1. Semantic syntax highlighting

You might ask what kind of syntax highlighting we could possibly add for a configuration format such as YAML. You have a point: there are plenty of YAML extensions already, and most IDEs support YAML out of the box. But our syntax highlighting goes well beyond YAML. We use semantic highlighting for our domain-specific language (DSL), which we are building on top of the YAML format.

Notice the identifiers (e.g., order_id) have a distinct color from the rest of the YAML.

2. Semantic code validation

Our extension automatically validates the YAML structure against our schemas, so VS Code can immediately notify the user when something is wrong.

But let’s not stop there. We are also validating referential integrity, so you don’t have to be afraid of using invalid or unsuitable variables or identifiers. VS Code calls this Diagnostics.

The example contains two code issues: an invalid YAML attribute and a reference to a metric that is not valid.
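A referential-integrity check of this kind can be sketched in plain TypeScript, independent of the VS Code API. The `IdReference` and `Diagnostic` shapes, the `checkReferences` function, and the `revenue` identifiers are hypothetical names chosen for illustration, not the extension's actual code. A real language server would additionally map each result to a VS Code diagnostic with a source range.

```typescript
// Hypothetical shapes for illustration; not taken from the GoodData extension.
interface IdReference {
  id: string;   // identifier used in the YAML file, e.g. a metric id
  line: number; // zero-based line where the reference appears
}

interface Diagnostic {
  line: number;
  message: string;
}

// Report every reference whose target is not among the defined identifiers.
function checkReferences(
  definedIds: ReadonlySet<string>,
  references: readonly IdReference[],
): Diagnostic[] {
  return references
    .filter((ref) => !definedIds.has(ref.id))
    .map((ref) => ({
      line: ref.line,
      message: `Unknown identifier "${ref.id}"`,
    }));
}

// Usage: one valid reference and one reference to an undefined metric.
const definedIds = new Set(["order_id", "revenue"]);
const diagnostics = checkReferences(definedIds, [
  { id: "revenue", line: 3 },
  { id: "revenue_total", line: 7 },
]);
console.log(diagnostics);
```

Only the second reference is reported, because `revenue_total` is not in the set of defined identifiers.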

3. Interoperability with other languages

Mixing multiple languages in a single file might not be good software engineering practice, but let’s admit it: in the world of analytics, it is common. Cube measures and Looker fields, for example, both embed SQL. We provide in-line support for SQL and for our own query language (MAQL), including syntax highlighting and validation.

The interoperability can be improved even further by moving the SQL or MAQL code to a separate file and then referencing it.

Notice the SQL statement seamlessly integrated into YAML. The statement even gets SQL syntax highlighting.

4. IntelliSense

I personally consider IntelliSense (or autocompletion) one of the best productivity boosters when coding. For our use case, we have added a highly sophisticated version of IntelliSense to YAML. We did not stop at YAML structures; we have also added support for user-created entities (e.g., metrics and visualizations). Is it starting to feel like you are writing in a full-fledged programming language instead of YAML?

A list of applicable metrics suggested by IntelliSense
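The core of completing user-created entities can be sketched as a prefix filter over the entities known in the workspace. This is a minimal illustration in plain TypeScript: the `AnalyticsEntity` shape, the `suggest` function, and the sample identifiers are assumptions, and a real extension would return the results through VS Code's completion API rather than as bare strings.

```typescript
// Hypothetical entity shape for illustration; not the extension's real model.
interface AnalyticsEntity {
  id: string;
  kind: "metric" | "visualization";
}

// Suggest identifiers of the requested kind that start with what the user typed.
function suggest(
  entities: readonly AnalyticsEntity[],
  kind: AnalyticsEntity["kind"],
  typed: string,
): string[] {
  const prefix = typed.toLowerCase();
  return entities
    .filter((e) => e.kind === kind && e.id.toLowerCase().startsWith(prefix))
    .map((e) => e.id)
    .sort();
}

// Usage: only metrics are applicable here, so the visualization is filtered out.
const workspaceEntities: AnalyticsEntity[] = [
  { id: "revenue", kind: "metric" },
  { id: "revenue_per_order", kind: "metric" },
  { id: "revenue_chart", kind: "visualization" },
  { id: "order_count", kind: "metric" },
];

const suggestions = suggest(workspaceEntities, "metric", "rev");
console.log(suggestions);
```

Filtering by kind is what makes the list "applicable": a field that expects a metric never offers a visualization, even when the prefix matches.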

5. Preview

Let’s admit it: analytics is highly visual, and users want to see and validate their code as often as possible. It reminds me of my time at university, when I was writing in LaTeX. Having a preview right next to the code is a nice feature that reduces context switching while keeping the effectiveness of an IDE. A preview is also invaluable when debugging the code.

For our preview, we used the Webview API.

To support the discoverability of the preview, we are using a VS Code feature called CodeLens. This feature displays the preview action in the code, one click away.

Visual preview of the dataset in the form of a table

6. Go to definition

I guess you all know this functionality from your IDEs. Just Ctrl+Click on the variable name to open its definition. To be honest, we haven’t implemented this in our extension yet. But we know it is possible and, based on our exploration, it should be pretty easy.

7. Code snippets

Adding a snippet is an easy way to efficiently create a large chunk of code. VS Code snippets can be further enhanced by:

Tabstops for user input.
Placeholders with default values.
Choice options for these placeholders.
Utilization of variables.

All of this can be packaged and delivered as part of the extension. We started with global (project-scoped) snippets but eventually replaced them with context-aware snippets. Imagine a mixture of snippets and IntelliSense.
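To make the four enhancements concrete, here is a sketch of a single snippet definition written as a TypeScript object in the shape of a VS Code snippet file entry (`prefix`, `body`, `description`). The tabstop, placeholder, choice, and variable syntax is VS Code's own; the YAML keys in the body (`type`, `id`, `title`, `aggregation`) are assumptions for illustration and not GoodData's actual schema.

```typescript
// Shape of one entry in a VS Code snippet file.
interface SnippetDefinition {
  prefix: string;
  body: string[];
  description: string;
}

// Hypothetical snippet; the YAML keys are illustrative, not a real schema.
const metricSnippet: SnippetDefinition = {
  prefix: "metric",
  description: "Scaffold a metric definition",
  body: [
    "type: metric",
    // Placeholder whose default is a variable: the file name without extension.
    "id: ${1:$TM_FILENAME_BASE}",
    // Placeholder with a literal default value.
    "title: ${2:My metric}",
    // Choice: the user picks one of the listed options.
    "aggregation: ${3|sum,avg,min,max|}",
    // Plain tabstop marking the final cursor position.
    "$0",
  ],
};

// A snippet file maps a display name to each definition.
const snippetFile = { "Metric definition": metricSnippet };
console.log(JSON.stringify(snippetFile, null, 2));
```

Pressing Tab moves through `$1`, `$2`, and `$3` in order and finishes at `$0`.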

8. Command palette

IDEs are all about writing code effectively, and that includes making as much as possible accessible from the keyboard. The most important key combination in VS Code is Ctrl+Shift+P, which brings up the Command Palette. From there, you have access to all of the functionality of VS Code, including keyboard shortcuts for the most common operations.

Therefore, adding analytics-as-code related commands was only natural and improved our developer experience. In fact, most of our commands mirror the code lifecycle - particularly cloning or deployment, which would otherwise require a CLI tool.

Tip: Be sure to clearly communicate which commands belong to your extension. We use the prefix “GoodData:” in all the command names.

Conclusion

Building a custom DSL is a challenging task that, without convenient tooling, can easily turn into a nightmare. Fortunately, IDEs (specifically VS Code, in our experience) now allow us to deliver an experience we previously associated only with languages natively supported by IDEs.

Most of the features presented here are part of the language server extension we have implemented. Interested in creating your own DSL? Then check out the Language Server Extension Guide.

Want to test what we are cooking at GoodData Labs? Sign up for GoodData Labs, a space where you can test and experience advanced analytics ideas and features that are currently in development.
