The perils of AI Engineering

A short while back, I had a brief message exchange over LinkedIn with a friend and good ex-colleague about how well documented LangChain is (spoiler: sadly, it is not), and how easy it is to figure out how to do things that are not given as examples in whatever documentation exists.

The message exchange brought to the fore what I have been finding out over the past year or so about how Generative AI has progressed on the tools and frameworks front.

In the short period since Generative AI came onto the scene, the industry, and specifically the open-source community, has responded to the challenge of enabling easy and efficient large-model deployment by developing new tools and abstraction frameworks (or repurposing existing ones).

This has given rise to the Transformers, LangChain and LlamaIndex frameworks, to name but a few (and probably the main ones).  It is fantastic that we are spoiled for choice when it comes to frameworks, APIs and levels of abstraction that help us focus on the ‘what’ more than the ‘how’, in the space of four years since large models came onto the scene.  What’s more, it is all free to use – at least for the time being.  Who does not like free things, right?

However, such rapid growth has not been without pain.  As alluded to above, there are some concerns that need to be addressed and resolved, to varying degrees, for each of these frameworks.

Based on our current experience, which seems also to be borne out by others, one of the main concerns is the quality of the documentation in general, but of LangChain and LlamaIndex in particular.  The documentation available is presented in a haphazard way and is difficult to navigate.  One ends up having to perform internet searches to find guidance and information for particular examples.

What is even more concerning is that the information is incomplete: more than one hyperlink on the LangChain and Chroma DB integration pages returned a ‘404 – Page Not Found’ result.  The same broken link also appeared in error messages produced when executing Python code, as well as on both the LangChain and Chroma DB websites covering the integration, which demonstrates the rather scant attention given to documentation and how closely documentation quality is interdependent across the current Generative AI ecosystem.

One has to resort to spending valuable time and effort going over code repositories to figure out how APIs actually work and are meant to be called, and generally figuring things out by experimenting with code.  If it were not for the piece-by-piece code execution and experimentation that Jupyter notebooks and Python afford, I doubt we would have seen the progress we are seeing today.

Alongside the concerns raised about the documentation above, the API documentation seems to consist largely of an auto-generated ‘dump’ from design or coding tools, with little, and sometimes indecipherable, additional information on what the options and parameters do.  There are even fewer examples of how they should be called.

Which brings me to the next point we have observed:  the examples provided in the frameworks we have come across, perhaps (unfairly?) picking on LangChain (it being the most widely used), are often very simplistic and do not build from one section to the next to give you a solid example of usage.

Alternatively, they jump right into the deep end, with very little step-by-step explanation of how things work.  A lot seems to be left to YouTube trainers and their code examples.  Even then, however, you tend to get the same patterns and depth of explanation, which in most cases is very shallow and does not provide end-to-end usage or explain the ‘why’.  Having said that, there are also some real quality videos (and code) out there, if you are lucky enough to come across them.

The final concern we have has to do with the fact that the available frameworks mostly seem to do the same thing, but with slightly different approaches.  There is not much separation or significant difference to guide adoption, save for the capabilities Transformers offers for relatively easy LLM fine-tuning.

It was rather amusing when, for example, in trying to use Rerankers in one of our projects, we realised that the ‘Document’ class of Rerankers was subtly different from the ‘Document’ class of LangChain, even though they both served an identical purpose.  Or, discovering that the template variable parameter name for prompt construction, used by a prompt-construction framework function, was hard-wired to a specific parameter name!
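To illustrate the kind of friction I mean, here is a minimal sketch of the adapter code one ends up writing.  The field names (`text` on the rerankers library’s Document, `page_content` on LangChain’s) reflect the versions we happened to be using, and the helper names are mine, so treat this as illustrative rather than definitive:

```python
# Illustrative sketch only: field names reflect the library versions we used
# (rerankers' Document holds its text in `text`, LangChain's in `page_content`)
# and may differ in other releases.
from langchain_core.documents import Document as LCDocument
from rerankers import Document as RerankerDocument


def to_reranker_docs(lc_docs):
    """Adapt LangChain Documents to the rerankers library's Document type."""
    return [
        RerankerDocument(text=d.page_content, metadata=d.metadata)
        for d in lc_docs
    ]


def to_langchain_docs(rr_docs):
    """Adapt rerankers Documents back into LangChain's Document type."""
    return [
        LCDocument(page_content=d.text, metadata=d.metadata or {})
        for d in rr_docs
    ]
```

Two classes with the same name, serving the same purpose, yet you still need a pair of glue functions just to pass retrieved documents through a reranking step and back into a LangChain pipeline.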

I suppose adoption comes down to preference, experience and familiarity, a bit like choosing a programming language for a particular task.

However, and I think this is a key point, given not only the above but also the fluidity of the current GenAI tools and frameworks scene, a company that tries to adopt GenAI, develop solutions, and deploy and operate them without support and without in-depth experience is risking a failed adoption.

Unless a company has either onboarded dedicated teams that have ‘seen the monster in the eye’, or has invested the time, effort and cost to train its teams proactively, the only route to success is to engage specialists in the area to advise, support and (where necessary and appropriate) lead the company’s GenAI adoption.

I would be interested to hear about your experiences, stories, thoughts and opinions on GenAI tooling, frameworks and adoption to date.  Do the above concerns resonate?  Have you spent time trying to figure out how things work by looking into API source code?
