Lazy Doc: your documentation tool has arrived.

· 11min · pxp9
Idea

It all started with a conversation with my boss. I was talking to him about a game we play (Balatro, wonderful game), and suddenly he asked, "Are you using AI to program?" I told him that the current state of AI is not good enough to suggest production code, but it is good at other things such as explanations, syntax fixes, etc. One of the problems in my company's codebase is that we have a lot of undocumented code. That is not a big problem in itself, because we have a really clear API for how to interact with it, but each time someone new joins the project you need to explain the localized knowledge. We also talked about how AI might be good at writing docs, but we did not know because we had not tested it yet. That is where the Lazy Doc idea came from.

What is Lazy Doc?

Lazy Doc is a tool that uses AI to fill in missing Elixir docs. It detects undocumented functions, gives their source code to the AI, and generates documentation based on it.

You just need to install it, configure it and execute mix lazy_doc, and boom, you have brand new documentation from scratch.

Purpose

The purpose of Lazy Doc is much bigger: it aims to be a docs checker as well.

With mix lazy_doc.check you can check whether all the code under the configured patterns is documented. This check can easily be added to CI, so if you forget to document something, CI will not pass.

Documentation criteria

Lazy Doc will only document functions that meet the following criteria:

  • Public functions that are not already documented (already documented functions are ignored).
  • Public functions without the hidden documentation tag @doc false (functions with hidden docs are ignored).
  • Private functions are ignored (but they are included in the prompt if they are auxiliary functions; continue reading for more info).
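To make the criteria concrete, here is a small sketch of a module with one function of each kind. The module and function names are made up for illustration, and the comments only restate the rules listed above.

```elixir
defmodule CriteriaExample do
  # Public and undocumented: the kind of function Lazy Doc would document.
  def total(items), do: items |> Enum.map(&price/1) |> Enum.sum()

  # Public and already documented: ignored.
  @doc "Returns the number of items."
  def count(items), do: length(items)

  # Public but explicitly hidden with @doc false: ignored.
  @doc false
  def internal_hook(items), do: items

  # Private: ignored as a documentation target.
  defp price(%{price: price}), do: price
end
```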

So if lazy_doc.check does not pass because of a function, you can take one of the following actions:

  • Make the function private.
  • Make the doc hidden.
  • Document it yourself or just execute mix lazy_doc.

The main reason LazyDoc can be really good for your project is that it forces you to write documentation. Even if you don't want to, it offers a good starting template.

How is it built?

Okay, you want to build an automatic documenter, right? The first step is to detect what the hell you should document and what you should leave as it is.

So what alternatives do you have to detect undocumented functions?

You always need to read the file at least once, and from there you have multiple approaches.

Process line by line approach

This approach consists of checking each line and looking for search patterns, in our case @doc whatever on top of a function.

There are a bunch of problems with this approach:

  • How do you get the code of a single function and its clauses? You can try to read lines until you hit an end, but how do you know whether it is the end of a case or the end of the def?
  • How do you retain the format of the file?
  • How do you detect inner modules?
  • How do you detect whether a function is inside an inner module or not?

As you can see, this is a simple approach but it has a bunch of problems, and probably many more that I have not thought of because I did not even try to implement it this way.

So what is the alternative?

ASTs for the king approach

Okay, so we are going to detect functions through the AST of the module, but first let's explain what an AST is. AST stands for Abstract Syntax Tree: a data structure that contains the syntax of the code.

In Elixir, an AST node almost always follows the same three-element pattern (literals such as atoms, numbers and strings simply represent themselves).

{:defmodule = name, [...] = meta, [...] = children } = node

You get the AST of a file via the Code.string_to_quote() function.

For instance, this module has the corresponding AST shown after it:

defmodule Example do
  @doc "converts to string"
  def number(n) do
    Kernel.to_string(n)
  end
end

Note: you can use the ast.ninja site if you want to experiment with ASTs.

    {:defmodule, [line: 1],
     [
       {:__aliases__, [line: 1], [:Example]},
       [
         do: {:__block__, [],
          [
            {:@, [line: 2],
             [
               {:doc, [line: 2],
                ["converts to string"]}
             ]},
            {:def, [line: 3],
             [
               {:number, [line: 3],
                [{:n, [line: 3], nil}]},
               [
                 do: {{:., [line: 4],
                   [
                     {:__aliases__,
                      [line: 4], [:Kernel]},
                     :to_string
                   ]}, [line: 4],
                  [{:n, [line: 4], nil}]}
               ]
             ]}
          ]}
       ]
     ]}

You can just pattern match the node you want and get the code from that node. For example:

{:def = name, [...] = meta, [...] = children } = ast_fun

In ast_fun we have the sub-AST of that function. We just need to convert it back to code and send it to the AI for analysis, but only if it is not documented.

And you know that the number function belongs to the Example module because it is in that module's list of children nodes.

How do you convert the AST to a String again?

You can just use Macro.to_string(ast_fun). (There is a caveat here, continue reading.)
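As a rough sketch of those two steps together, the snippet below parses the Example module, collects every node shaped like {:def, meta, children} with Macro.prewalk/3, and turns the match back into source code. Using prewalk to find the nodes is my own choice for the illustration; the article does not say how LazyDoc walks the tree.

```elixir
source = """
defmodule Example do
  @doc "converts to string"
  def number(n) do
    Kernel.to_string(n)
  end
end
"""

ast = Code.string_to_quote!(source)

# Visit every node and keep the ones shaped like {:def, meta, children}.
{_ast, defs} =
  Macro.prewalk(ast, [], fn
    {:def, _meta, _children} = node, acc -> {node, [node | acc]}
    node, acc -> {node, acc}
  end)

# The module has a single function, so exactly one node was collected.
[ast_fun] = defs

# Convert the sub-AST back to source code.
code = Macro.to_string(ast_fun)
IO.puts(code)
```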

How do we know if a function is already documented or not?

Easy, you just need to call Code.fetch_docs(module), which takes the name of the module.
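Here is a minimal sketch of reading that result, using Enum from the standard library as the module (this assumes your Elixir installation ships its docs chunk, which the official builds do). The variable names are mine.

```elixir
# Code.fetch_docs/1 returns a :docs_v1 tuple; the last element is the list of entries.
{:docs_v1, _anno, :elixir, _format, _module_doc, _meta, docs} = Code.fetch_docs(Enum)

# Each entry is {{kind, name, arity}, anno, signature, doc, metadata}.
# doc is a map such as %{"en" => "..."} when documented, :none when there is
# no documentation and :hidden when the function is tagged with @doc false.
undocumented =
  for {{:function, name, arity}, _anno, _signature, :none, _meta} <- docs do
    {name, arity}
  end

# Look up a single function by its {kind, name, arity} key.
{_key, _anno, _signature, map_doc, _meta} = List.keyfind(docs, {:function, :map, 2}, 0)
```

Note that the entries are keyed by name and arity, so two functions with the same name and different arities show up as separate entries.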

Not so fast: what happens if you have the same function name with a different number of arguments (different arities, not clauses)? Code.fetch_docs will return documentation for each distinct function as the language sees it. But are two functions with the same name different from each other just because their arity differs? For Elixir the answer is YES, but does this make sense from the point of view of documentation? Think for a moment about the coding situations in which you would use the same name with different arities.

Let's take a look at this example:

# Look at this function: does it make sense on its own if you hide the other functions?
def fibs(n) do
  fibs(n, [1, 0])
end

def fibs(1, [a, b | rest]), do: [a + b, a, b | rest]

def fibs(n, [a, b | rest]) do
  fibs(n - 1, [a + b, a, b | rest])
end

Same-name functions exist probably just because one of them is auxiliary to the other. The auxiliary function does not make sense by itself, and the entry point function does not make sense without the auxiliary. We need a way to group the functions by name, even if they have different arities and clauses (functions with the same name and the same arity), so we can send them to the AI in a single prompt.
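One possible way to do that grouping is sketched below. GroupSketch and its helpers are hypothetical names, not LazyDoc's real implementation. It collects the def nodes, reads the function name from each head (unwrapping guards), and groups the nodes with Enum.group_by/3.

```elixir
source = """
defmodule Fibs do
  def fibs(n) do
    fibs(n, [1, 0])
  end

  def fibs(1, [a, b | rest]), do: [a + b, a, b | rest]

  def fibs(n, [a, b | rest]) do
    fibs(n - 1, [a + b, a, b | rest])
  end

  def other(x) when is_integer(x), do: x
end
"""

defmodule GroupSketch do
  # Hypothetical helper: returns a map of function name => list of def nodes.
  def group_defs(ast) do
    {_ast, defs} =
      Macro.prewalk(ast, [], fn
        {:def, _meta, [head | _]} = node, acc -> {node, [{name_of(head), node} | acc]}
        node, acc -> {node, acc}
      end)

    defs
    |> Enum.reverse()
    |> Enum.group_by(fn {name, _node} -> name end, fn {_name, node} -> node end)
  end

  # A guarded head is wrapped in a :when node, so unwrap it first.
  defp name_of({:when, _meta, [head | _guards]}), do: name_of(head)
  defp name_of({name, _meta, _args}) when is_atom(name), do: name
end

groups = source |> Code.string_to_quote!() |> GroupSketch.group_defs()

# All clauses and arities of fibs end up in one chunk of code for a single prompt.
fibs_code = Enum.map_join(groups[:fibs], "\n\n", &Macro.to_string/1)
```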

Fetch the prompt

Okay, you need a way to get the documentation using AI.

So what you should do is call an AI API, but which one?

There are a bunch of AI APIs and everyone has their favorite AI. Not to mention that if you want to use a company-internal model, it could be really hard. So why not give you the freedom to call your own AI?

You just need to implement the LazyDoc.Provider behaviour. This is the contract that LazyDoc will use to request the docs from the API and extract them from the response.
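To show the general shape of such a contract, here is a sketch with a made-up behaviour. DocProviderSketch, request_docs/2 and extract_docs/1 are assumptions for illustration only. The real LazyDoc.Provider callbacks may have different names and arities, so check the library docs before implementing one.

```elixir
defmodule DocProviderSketch do
  # Hypothetical contract, NOT the real LazyDoc.Provider callbacks.
  @callback request_docs(prompt :: String.t(), opts :: keyword()) ::
              {:ok, term()} | {:error, term()}
  @callback extract_docs(response :: term()) :: String.t()
end

defmodule EchoProvider do
  # Stand-in provider that makes no network call, so the sketch stays runnable.
  @behaviour DocProviderSketch

  @impl true
  def request_docs(prompt, _opts), do: {:ok, %{"text" => "Docs for: " <> prompt}}

  @impl true
  def extract_docs(%{"text" => text}), do: String.trim(text)
end

prompt = "Function to document:\n" <> "def number(n), do: Kernel.to_string(n)"
{:ok, response} = EchoProvider.request_docs(prompt, [])
docs = EchoProvider.extract_docs(response)
```

The point of the split is that one callback knows how to talk to the API and the other knows how to dig the docs out of that API's response format.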

Make a good prompt (AI is like a kid: explain what you want).

Making a good prompt is really important for fetching the docs. You should always include at least one example in your prompt and be precise about the format you want.

Example:

~s(You should describe the parameters based on the spec given and give a small description of the
 following function.\n\nPlease do it in the following format given as an example, important do not
 return the header of the function, do not return a explanation of the function, your output must
be only the docs in the following format:\n\nReturns the Transaction corresponding to
transaction_id\(Initial sentence indicating what returns the function\)\n\n## Parameters\n\n-
transaction_id - foreign key of the Transactions table.\n## Description\n Performs a search in the
database\n\nFunction to document:\n)

This is the default prompt the library will use; you should concatenate the function code to this prompt. Of course, it takes a bunch of attempts to get the prompt right, and you even need to consider being compliant with existing doc tools.

Insert the docs in the AST

To insert the doc in the AST you just need to create a @doc node and insert it just before the def node.

But what if this def node is among the children of an inner module? You first need to find the children of the corresponding module. JUST pattern match, it will work (a complex pattern match, but I promise it will work). Once you have the children, you need to locate the position of the def node in the children and then insert the @doc node one position before it.
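A minimal sketch of that insertion could look like this. InsertDocSketch is a hypothetical helper, not LazyDoc's real code: it only handles module bodies with more than one expression (a single expression is not wrapped in a :__block__ node), and it matches functions by name in every module it visits, inner ones included.

```elixir
defmodule InsertDocSketch do
  # Hypothetical helper: inserts a @doc node right before the first def
  # named fun_name in each module body it finds.
  def insert_doc(ast, fun_name, doc) do
    Macro.prewalk(ast, fn
      {:defmodule, meta, [name, [do: {:__block__, block_meta, children}]]} ->
        body = {:__block__, block_meta, add_doc(children, fun_name, doc)}
        {:defmodule, meta, [name, [do: body]]}

      node ->
        node
    end)
  end

  defp add_doc(children, fun_name, doc) do
    case Enum.find_index(children, &def_named?(&1, fun_name)) do
      nil -> children
      # Inserting at the def's index pushes the def one position down.
      index -> List.insert_at(children, index, {:@, [], [{:doc, [], [doc]}]})
    end
  end

  # Guarded heads are wrapped in a :when node.
  defp def_named?({:def, _, [{:when, _, [{name, _, _} | _]} | _]}, name), do: true
  defp def_named?({:def, _, [{name, _, _} | _]}, name), do: true
  defp def_named?(_node, _name), do: false
end

source = """
defmodule Example do
  def hello, do: :world

  def number(n) do
    Kernel.to_string(n)
  end
end
"""

new_source =
  source
  |> Code.string_to_quote!()
  |> InsertDocSketch.insert_doc(:number, "converts to string")
  |> Macro.to_string()
```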

The end of the trip?

And then you transform the AST to a String with Macro.to_string() and write it to a file, right?

WRONG, you just messed up all the formatting of the original file.

What? Where are my comments? It just deleted my comments during the entire process.

How do you fix it?

The master talk about AST manipulation.

At this point you are almost there, almost done, but you are kind of stuck because you want to preserve the code as it was. Remember this?

leave the code as it is

It is one of the premises we need, at least to use it in production.

Surprise surprise, you have seen it before: it is the AST NINJA talk. Actually, the name of the talk is The Elixir parser under microscope by Arjan Scherpenisse.

This talk is about how to preserve the code as it is after an AST manipulation. Basically, you need to do what the Elixir formatter does: add tokenizer metadata to the AST, store the comments outside the AST and merge them back in after all the AST manipulation.

Fortunately, this talk was given years ago and the Elixir standard library changed according to the talk's recommendations, so we just need to use Code.string_to_quote_with_comments with some options and then, after the AST manipulation, use

ast
|> Code.quote_to_algebra(comments: comments, ...)
|> Inspect.Algebra.format(max_line_width_configured)
|> IO.iodata_to_binary()

This will preserve the format of the file.
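For reference, here is a runnable sketch of both round trips, assuming Elixir 1.13 or later. The parser options are the ones the Elixir documentation recommends for formatter-style round trips, and the line width of 98 is just an example value, not necessarily what LazyDoc uses.

```elixir
source = """
defmodule Example do
  # this comment must survive
  def number(n) do
    Kernel.to_string(n)
  end
end
"""

# Plain round trip: the parser drops comments, so they are gone afterwards.
lossy = source |> Code.string_to_quote!() |> Macro.to_string()

# Formatter-style round trip: comments come back separately from the AST.
{ast, comments} =
  Code.string_to_quote_with_comments!(source,
    literal_encoder: &{:ok, {:__block__, &2, [&1]}},
    token_metadata: true,
    unescape: false
  )

# ... any AST manipulation would happen here ...

preserved =
  ast
  |> Code.quote_to_algebra(comments: comments, escape: false)
  |> Inspect.Algebra.format(98)
  |> IO.iodata_to_binary()
```

One thing to keep in mind: with literal_encoder set like this, literals such as atoms and strings arrive wrapped as {:__block__, meta, [value]}, so pattern matches written against the plain AST need to account for that.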

Feature requests

Now that I have told you how you can build LazyDoc, I can tell you which features were requested.

  • Users requested a way to avoid bloated source files. The issue they found was that generating so many docs makes the files grow dramatically in size. To address it, I added the external_doc option, which allows you to write the doc in a separate Markdown file and read the doc of that function at compile time.

Example (the generated Markdown file, followed by the function that reads it):

Returns a list of documentation for each function in the specified modules.

## Parameters

- modules - a list of module names to extract documentation from.
  
## Description
 Retrieves the documentation for functions in the given modules using Elixir's introspection capabilities.
  @doc File.read!("priv/lazy_doc/lazy_doc/docs_per_module.md")
  def docs_per_module(modules) do
    ...
  end 

This way, you can enjoy rich documentation without having big blocks of documentation before each function.

  • In this article we focus on functions because how they are handled was more interesting to analyze, but the library can also generate a @moduledoc based on the code of the module. Usually, though, a module needs its context in the application in order to write its @moduledoc, so the feature request is to build a module dependency graph and include the @moduledoc or the code of the dependency modules in the prompt. (TO BE DONE).

  • Another interesting feature is cleaning all the docs so that you can regenerate them from scratch. For this we created a third task, mix lazy_doc.clean, which deletes all the @doc annotations but keeps the @doc false annotations.

  • Make a default prompt that generates docs compliant with ex_doc, so that when we generate the docs webpage you can read the documentation as if it were ordinary hand-written docs.
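The cleaning rule from the list above (drop every @doc, keep @doc false) can be sketched as an AST filter. CleanSketch is a hypothetical helper, not the real mix lazy_doc.clean task, and for brevity it goes back to source with Macro.to_string, so it does not preserve comments or formatting.

```elixir
defmodule CleanSketch do
  # Hypothetical helper: removes @doc nodes from every block, except @doc false.
  def clean(ast) do
    Macro.prewalk(ast, fn
      {:__block__, meta, children} ->
        {:__block__, meta, Enum.reject(children, &removable_doc?/1)}

      node ->
        node
    end)
  end

  # @doc false must stay, so it is checked before the general @doc clause.
  defp removable_doc?({:@, _, [{:doc, _, [false]}]}), do: false
  defp removable_doc?({:@, _, [{:doc, _, [_doc]}]}), do: true
  defp removable_doc?(_node), do: false
end

source = """
defmodule Example do
  @doc "remove me"
  def number(n), do: Kernel.to_string(n)

  @doc false
  def hidden(n), do: n
end
"""

cleaned = source |> Code.string_to_quote!() |> CleanSketch.clean() |> Macro.to_string()
```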

Conclusion

Making new projects makes you learn a lot of stuff. Usually, to share your projects, you need to explain how to use them properly; that is the reason developers write docs. I do not know many people who like writing docs or who have the time to write them. Writing docs may not add much business value to the product you are building, but it adds huge value for the developers building the product or using it, and they will be less prone to making mistakes.

Writing LazyDoc allowed me to learn about deep aspects of the Elixir core and the Elixir AST, and about the weird stuff you can do. For example:


defmodule Hello do

  defmodule Hello do

  end

end

This is valid Elixir code, but it is not allowed in LazyDoc.

Writing LazyDoc, I realized that good software sometimes needs to be opinionated: take a strong opinion and follow it until the end. For example, the external_doc feature is really opinionated, because some people like to have the docs in the code and others just want to see the code without being distracted by anything else. Another strong opinion this library takes is what it does with functions of the same name: basically, we treat them as the same function to document, even though for Elixir they are not. At least every teammate I asked told me that two functions with the same name are always related, so it makes sense to document them together. Nested modules with the same name do not make sense in almost any situation, so LazyDoc does not allow them, because they have the same AST structure.

Thank you very much for your reading effort.

You can support the lazy_doc repo, and if you are curious how LazyDoc docs look, LazyDoc is documented by LazyDoc itself; you can check it in the lazy_doc hexdocs.