LLM-based Tools for Data-Driven Applications – Decomposition of Architectural Components

Advice on breaking down LLM workflows into components for better control, optimization, and tool selection in enterprise applications.

Welcome back to the next installment in this three-part series, which outlines a pragmatic approach to selecting LLM-based tools for data-driven enterprise applications.

In this blog series, I break down the tooling ecosystem for LLMs, starting with open-source tools and public cloud-based managed services, to help organizations select the right tool or suite of tools for their use case and understand how to best approach decision-making. (We are two parts in, and I still don't mention a single tool until the end of this post.)

In the last installment, we looked at the starting point for evaluation, the most viable use cases, and the non-negotiable elements on which to build.

In this installment, we do two things:

  1. We look at how to decompose your use case into its necessary components from a workflow perspective.
  2. We discuss the architectural components of tools in the LLM ecosystem and the patterns that these tools need to embody in order to successfully meet the needs of your data-driven use case.

Along the way, we will begin to introduce some of the criteria we recommend you consider for evaluating and selecting the software and infrastructure tools once you have a firm handle on your use case and workflow.

Decomposing Your Workflow into Task Components

A Task Component is any node you would find in an LLM-based workflow: any discrete job you would ask an LLM-based tool or system to do. Task Components can be combined to accomplish meaningful goals, but not all tools and frameworks can connect each of these Task Components together.

Therefore, the selection of the tool or framework depends on the Task Components that need to be combined, the complexity of the workflow graph (diagram) to be constructed, and your willingness to take a service-based approach to LLM-based workflow development.

Some frameworks only support a limited set of Task Components. Other frameworks only support linear workflows. Still other frameworks outsource the construction of the workflow graph to the LLM itself or decompose the graph into agents that operate independently.

Below, I provide a few examples of workflows and corresponding task components. These examples are not exhaustive, but they are representative of the types of components that would typically be incorporated into an LLM-based system.

Example workflows and corresponding task components

We recommend constructing the workflow graph yourself, with the possibility of conditional nodes using LLM-based or directly coded “reasoning” to lead to other segments of the graph. Repeated segments of the graph can appear as loops.

If you have an LLM-based use case in mind, find a piece of paper, a LucidChart, or a Miro board, and start diagramming the workflow of your use case using the task components described below. Understanding which task components you will need and where each task component will integrate with other systems and other task components helps to determine which tools you should use to construct the workflow. This diagram will likely resemble a directed acyclic graph (DAG)*.

Note: A directed acyclic graph (DAG) is a conceptual model used to represent a sequence of operations or tasks where the vertices (nodes) represent the tasks, and the directed edges (arrows) indicate the dependencies [or information passed] between these tasks. The "acyclic" part of the name implies that the graph does not contain any cycles; this means there is no way to start at one task and traverse through the dependencies to arrive back at the original task. This property is crucial because it ensures that there are no infinite loops or deadlocks within the workflow. Source: ChatGPT
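To make the idea of owning the workflow graph concrete, here is a minimal sketch of how a DAG of task components with one conditional node might look in code. The node names, the state dictionary, and the `run_workflow` helper are all hypothetical illustrations, not part of any particular framework.

```python
# A minimal, illustrative sketch of a workflow DAG of task components.
# Node names and functions are hypothetical; a real workflow would substitute
# its own task components (retrieval, generation, validation, ...).

def classify_query(state):
    # Conditional "reasoning" node: route based on a property of the input.
    state["route"] = "needs_retrieval" if "?" in state["query"] else "direct_answer"
    return state

def retrieve(state):
    state["documents"] = ["<retrieved documents would go here>"]
    return state

def generate(state):
    state["answer"] = f"Answer based on: {state.get('documents', [])}"
    return state

# The DAG: each node names the node(s) it feeds into. A conditional node
# chooses its outgoing edge at runtime, but the graph itself has no cycles.
WORKFLOW = {
    "classify_query": {"needs_retrieval": "retrieve", "direct_answer": "generate"},
    "retrieve": "generate",
    "generate": None,  # terminal node
}

NODES = {"classify_query": classify_query, "retrieve": retrieve, "generate": generate}

def run_workflow(query: str) -> dict:
    state, current = {"query": query}, "classify_query"
    while current is not None:
        state = NODES[current](state)
        edge = WORKFLOW[current]
        current = edge[state["route"]] if isinstance(edge, dict) else edge
    return state

print(run_workflow("What were Q3 revenues?")["answer"])
```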

Task Components of an LLM-based workflow

Below, we list the basic building blocks of LLM-based workflows.

LLM Task Workflow Components

Please note that not all of these components need to be completed by an LLM. Although an LLM can be used to complete all of these components, any of them could be achieved by a human in the loop, another model in the loop, or an API in the loop. The beauty of identifying the task components is that you can understand the nature of the workflow to be completed before you selectively integrate the LLM to complete aspects of that workflow.
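As a small illustration of this point, a task component can be defined by its input/output contract rather than by who or what executes it. The sketch below is hypothetical: the `SummarizeTask` protocol and the three implementations are placeholders for whatever LLM, human queue, or API you would actually wire in.

```python
# Illustrative sketch: a task component is defined by its contract,
# not by whether an LLM, a human, or an API fills it.
from typing import Callable, Protocol

class SummarizeTask(Protocol):
    def __call__(self, document: str) -> str: ...

def llm_summarizer(document: str) -> str:
    # Would call your LLM of choice here.
    return "summary produced by an LLM"

def human_summarizer(document: str) -> str:
    # Route to a review queue and block until a person responds.
    return input(f"Please summarize:\n{document}\n> ")

def api_summarizer(document: str) -> str:
    # Call a non-LLM service (e.g., an extractive summarization endpoint).
    return "summary produced by a deterministic service"

def build_workflow(summarize: SummarizeTask) -> Callable[[str], str]:
    # The rest of the workflow doesn't care which implementation fills the slot.
    def run(document: str) -> str:
        return summarize(document)
    return run
```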

Identifying task components also encourages you to be aware when adding components to your system. These systems are so flexible and human-workflow-like that I often find myself adding components without even realizing it. Yet each new component adds another layer of complexity, another thing that must be evaluated, quality controlled, debugged, and optimized.

An Example RAG Workflow

As you can see, LLM-based systems can exist at varying levels of complexity.  A simple retrieval-augmented generation (RAG) system might only have a few components:

Example 1: A Simple RAG Workflow

The RAG example above uses the input as a query, injects the retrieval results into the generator, and receives the output. This architecture is used in many proofs of concept, and it can achieve many valuable goals.
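A minimal sketch of that simple RAG loop is below, assuming hypothetical `search` and `complete` helpers as stand-ins for your actual vector store and model client.

```python
# Minimal RAG sketch: input -> retrieval -> augmented prompt -> generation.
# `search` and `complete` are stand-ins for your vector store and LLM client.

def simple_rag(query: str, search, complete, k: int = 5) -> str:
    documents = search(query, k=k)                      # retrieval
    context = "\n\n".join(doc["text"] for doc in documents)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return complete(prompt)                             # generation
```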

A more complex RAG system might have more components:

Example 2: A more complicated RAG workflow

Please note that this diagram isn’t as complicated as it could be; it would grow further if, for example, you utilized a router to query specific types of documents or data depending on the nature of the query, as in so-called “Agentic RAG”. Nevertheless, this workflow diagram is directionally correct for companies that are using LLMs to compile large research reports that synthesize information from available data.

I find most of the online discussions of these workflows focus on the idea that “good systems always incorporate more (complex) components,” such as more refined chunking, more elaborate search systems, additional layers of reranking, etc. I respectfully disagree. You should use the absolute minimum number of components necessary to accomplish your goal. Even if you choose to utilize more components, the best approach is to break down the workflow into as many independent workflows as possible, each of which can be evaluated on its own merits.

By converting a monolithic LLM system into a series of microservices, each of which can be benchmarked and evaluated in isolation, the software system as a whole can be optimized by evaluating each individual component without having to diagnose problems with complicated end-to-end scenarios.

Breaking Down the Mega-RAG

For example, here is how one could break down the complicated RAG workflow into something more manageable:

Phase 1: How good is our system at returning the documents we need to return? Tweak the queries, query optimization, document chunking, and retrieval parameters until the initial ranked set of documents places the most relevant information for our queries, and our specific use case, toward the top of the results.

Phase 1: Document Retrieval + Evaluation
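A sketch of what a Phase 1 evaluation harness might look like, assuming a small hand-labeled benchmark of queries and the document IDs you expect to see returned. Recall@k is one reasonable metric here, not the only one; the `retriever` signature and benchmark format are assumptions for illustration.

```python
# Phase 1 sketch: measure whether the relevant documents appear near the top.
# `retriever(query, k)` and the benchmark format are illustrative assumptions.

benchmark = [
    {"query": "2023 revenue by region", "relevant_ids": {"doc_17", "doc_42"}},
    # ... more hand-labeled examples
]

def recall_at_k(retriever, benchmark, k: int = 10) -> float:
    hits = 0
    for example in benchmark:
        retrieved_ids = {doc["id"] for doc in retriever(example["query"], k=k)}
        if example["relevant_ids"] & retrieved_ids:
            hits += 1
    return hits / len(benchmark)
```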

Phase 2: How well is our reranking model working? Tweak the reranking model parameters (or just remove the reranking component entirely!) to ensure that the secondary output is reliably the most relevant document(s) to the input queries.

Phase 2: Document Reranking + Evaluation
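Phase 2 can be evaluated the same way with a rank-sensitive metric. The sketch below uses mean reciprocal rank (MRR) over the reranked list; the `reranker` interface and benchmark format are assumed, not prescribed.

```python
# Phase 2 sketch: does reranking push the most relevant document to the top?
# `reranker(query, documents)` returning an ordered list is an assumption.

def mean_reciprocal_rank(reranker, retriever, benchmark, k: int = 20) -> float:
    scores = []
    for example in benchmark:
        candidates = retriever(example["query"], k=k)
        reranked = reranker(example["query"], candidates)
        rank = next(
            (i + 1 for i, doc in enumerate(reranked) if doc["id"] in example["relevant_ids"]),
            None,
        )
        scores.append(1.0 / rank if rank else 0.0)
    return sum(scores) / len(scores)
```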

Phase 3: Is the initial model generating the output we expect from the augmentation and conditioning we apply to it? Optimize the augmentation and conditioning (prompt optimization, fine-tuning, preference optimization) until the output matches your expectations. Or return to phase 2 to improve the outputs from the retrieval pipeline.

Phase 3: First Draft + Evaluation
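For Phase 3, one approach is to score each first draft against explicit, per-example expectations. The sketch below assumes a simple rubric of required facts per query; you could equally use an LLM-as-judge or human review, and the `generate_draft` helper and benchmark entries are purely illustrative.

```python
# Phase 3 sketch: check the first draft against per-example expectations.
# `generate_draft` and the "required_facts" rubric are illustrative assumptions.

draft_benchmark = [
    {"query": "Summarize Q3 results", "required_facts": ["revenue grew", "EMEA"]},
    # ... more examples with the facts each draft must mention
]

def draft_coverage(generate_draft, benchmark) -> float:
    scores = []
    for example in benchmark:
        draft = generate_draft(example["query"]).lower()
        covered = sum(fact.lower() in draft for fact in example["required_facts"])
        scores.append(covered / len(example["required_facts"]))
    return sum(scores) / len(scores)
```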

Phase 4: Are our revisions to and validations of intermediate outputs improving the product? Iterate on the revision and evaluation process until the output meets your expectations. Establish better criteria for evaluating the intermediate draft to improve the back-prompting likely happening within the Model / Generation.

Phase 4: Intermediate Validation Loop + Second Draft Evaluation
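Phase 4 is essentially a bounded revision loop. A sketch under assumed `critique` and `revise` helpers (stand-ins for your evaluation criteria and back-prompting step) might look like this:

```python
# Phase 4 sketch: revise an intermediate draft until it passes explicit criteria
# or a retry budget is exhausted. `critique` and `revise` are assumed helpers.

def validation_loop(draft: str, critique, revise, max_rounds: int = 3) -> dict:
    history = []
    for round_number in range(max_rounds):
        issues = critique(draft)          # returns a list of unmet criteria
        history.append({"round": round_number, "issues": issues})
        if not issues:
            break
        draft = revise(draft, issues)     # back-prompt the model with the issues
    return {"draft": draft, "history": history}
```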

Phase 5: Are our revisions of the final combined output leading to improvement in the quality of the output? Are we improving against our final evaluation benchmarks? If the individual outputs from Phase 4 are good, then any remaining issues have to be with the combination and revision of the final output to be produced. Once again, this would lead us to diagnose an issue with our revision approach, which may have to do with the conditioning we apply to the model, the information we incorporate into the Augmentation or Prompt, or the criteria we utilize for intermediate evaluation while refining the final Output.

Phase 5: Additional Validation Loop + Final Output Evaluation
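Finally, Phase 5 compares the combined output against your end-to-end benchmarks, so that a change anywhere upstream can be attributed to movement in final quality. In the sketch below, `run_full_workflow` and `score_output` are placeholders for your own pipeline and whatever final criteria (rubric, judge model, human review) you adopt.

```python
# Phase 5 sketch: track final-output quality across system versions so upstream
# changes can be attributed to movement in the end-to-end benchmark.
# `run_full_workflow` and `score_output` are placeholders for your own pieces.

def evaluate_release(run_full_workflow, score_output, benchmark) -> dict:
    results = []
    for example in benchmark:
        output = run_full_workflow(example["query"])
        results.append(score_output(output, example))
    return {
        "mean_score": sum(results) / len(results),
        "worst_examples": sorted(zip(results, (e["query"] for e in benchmark)))[:5],
    }
```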

Benefits of Workflow Decomposition

Does this seem more complicated? From an infrastructure perspective, it likely is. Each of these phases needs to be hosted independently, and interfaces need to be created between them.

Yet many benefits come with decomposing your workflow into task components and phased workflows:

  1. Higher Resolution Evaluation, Debugging, and Optimization: Decomposing the full workflow into phased workflows allows you to understand your expectations from each phase and what “job” each phase is designed to complete. It also forces you to create evaluation criteria for each phase. Whenever there is a problem with the system as a whole, looking at intermediate evaluation metrics helps you diagnose where the problem likely originates.
  2. Safer Experimentation and Testing: With the workflow decomposed, you can experiment with altering components, or parameters of components, to understand the impact of those alterations on a particular phase’s output. If you only have a monolith to experiment with and a single set of final evaluation criteria, you can only experiment with one thing at a time and cannot understand the interdependencies between the phases.
  3. Better Navigation of Latency, Cost, and Performance Tradeoffs: Utilizing workflow phases helps with selecting the best model for the job at each phase (see the configuration sketch after this list). This makes it much easier to incorporate third-party APIs where data is not sensitive, questions are diverse and open-ended, and output quality is critical. You can attribute costs to each workflow phase and optimize that phase to reduce costs while holding quality constant. You can utilize self-hosted, smaller LLMs where the task is repetitive, inputs and outputs are consistent, and consistent response latency is critical to workflow success. You can also experiment with new, open-source models as they become available to discover whether they can be used in place of third-party APIs, or experiment with new fine-tuning or preference optimization approaches. None of this is possible if you specify a single model for the entire workflow.
  4. Easier Evaluation and Integration of External Tools: Ask not: Can I build my entire end-to-end workflow in this tool or framework? Ask: Can this tool or framework accomplish the one job I assign to it within this workflow phase? Ask not: Can I stand up the monolithic workflow in the fewest lines of code possible? Ask: Can this tool or framework easily plug into my architecture without requiring me to migrate every component of that architecture?
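To make the per-phase model selection in point 3 concrete, one lightweight pattern is a per-phase model configuration. The model names, hosting choices, and structure below are purely illustrative assumptions, not recommendations.

```python
# Illustrative per-phase model configuration. Model names, endpoints, and
# hosting choices are placeholders, not recommendations.

PHASE_MODELS = {
    "retrieval_query_rewrite": {"model": "small-self-hosted-llm", "hosting": "self-hosted"},
    "first_draft":             {"model": "frontier-api-model",    "hosting": "third-party API"},
    "intermediate_validation": {"model": "mid-size-open-model",   "hosting": "self-hosted"},
    "final_revision":          {"model": "frontier-api-model",    "hosting": "third-party API"},
}

def client_for(phase: str):
    """Return the LLM client for a phase, so swapping models is a config change."""
    config = PHASE_MODELS[phase]
    # Construct and return the appropriate client here (API or local inference).
    return config
```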

A microservice approach to LLM-based workflow development gives you the confidence to experiment with and integrate new tools.

Do you want to try a different vector database with joint keyword and vector querying? Great! Swap out your existing database to see how it impacts your preliminary evaluations.

Do you want to change your chunking strategy? No problem! Just alter your component and see how it impacts preliminary evaluations.
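One way to keep those swaps cheap is to hide each component behind a small interface, as in this sketch; the `Chunker` and `Retriever` protocols and the `build_index` helper are hypothetical names, not a specific library's API.

```python
# Sketch: keep retrieval and chunking behind small interfaces so implementations
# can be swapped and re-evaluated without touching the rest of the workflow.
from typing import Protocol

class Chunker(Protocol):
    def chunk(self, document: str) -> list[str]: ...

class Retriever(Protocol):
    def search(self, query: str, k: int) -> list[dict]: ...

def build_index(documents: list[str], chunker: Chunker, retriever_factory):
    chunks = [chunk for doc in documents for chunk in chunker.chunk(doc)]
    return retriever_factory(chunks)  # e.g., a vector DB, hybrid keyword+vector, etc.

# Swapping the vector database or chunking strategy is then a change to whatever
# wiring code constructs `chunker` and `retriever_factory`, followed by a
# re-run of the Phase 1 evaluation.
```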

Thinking this way assists both with improving your system as a whole and with selecting tools to assist with that improvement.

Conclusion: 10 Principles for Selecting LLM Tools and Frameworks

As a teaser for Part 3 of this series on how to evaluate tools and frameworks for working with LLMs, we believe you should choose tools and frameworks that align with the following principles embodied in the discussion above:

  1. They do at least one job (task component) very well
  2. They play nicely with an ecosystem of tools with which you may experiment to do a workflow of tasks
  3. They give you control of the workflow DAG
  4. You can use them in the context of a larger workflow DAG
  5. They make it simple to strategically insert a human into the loop
  6. They make benchmark dataset creation easy
  7. They make evaluation against benchmarks easy, either through integration with external evaluation tools or direct evaluation as part of the framework
  8. They provide interfaces for local task components and API-based task components (generation, storage, retrieval, etc)
  9. They make it easy to integrate model conditioning in all its forms into your workflow, including prompt-optimized, fine-tuned, and preference-optimized models
  10. They make it easy to review, evaluate, and take action based on intermediate outputs, not just the final output

With the principles above, we believe that you can find the LLM tools that set your workflow up for long-term success and maintainability.

A Few Notes on Agents

I like LangChain's categorization of LLM-based applications

Harrison Chase, founder of LangChain, recently proposed these levels of LLM automation in a blog post (shown in the image below).

Harrison Chase's Levels of Autonomy in LLM Applications

My view is that the further you move down this hierarchy, the more likely you are to fail. I suggest you architect your LLM-based system as close to the top of the hierarchy as possible, given your use case, and work your way down. Fully autonomous, LLM-based systems are still a ways off (or a large time + financial investment off), so I would avoid attempting this for now. Building a viable LLM-based state machine is a worthy goal, but not a trivial exercise.

I'm skeptical of role-playing LLM agents outside of research-based workflows

Several of the agentic frameworks, in particular CrewAI, focus on assigning roles to different “agents” so that messages can pass between them and teams of agents can accomplish a particular goal. So, in the case of the complex RAG framework demonstrated above, with multiple iterations of generation, revision, and evaluation, each component could be assigned to a separate “agent” with a separate role. In many ways, this framework is easier to understand because it maps to our experience of working in teams of people who have different roles to accomplish a given outcome.

Yet, this framework requires you to give more control to the LLM as an “agent” than I would recommend giving. Yes, there are methods of incorporating human feedback into the agent’s workflow, but one of the downsides of this paradigm is that the automation component is completely AI-driven rather than controlled using software you own and architect directly. This creates many more opportunities for errors and far fewer levers from which to optimize the system.

Ultimately, I think these systems can work well for research because research is an intermediate output along a larger workflow with a human in the loop. But these systems can consume lots of API tokens, and once you hit the ceiling of their capabilities, there is little recourse but to replatform onto a microservices-based system with more control.

DAGs make sense, but beware heavy abstractions

Several agentic frameworks, such as LangGraph, Burr, and the newly released ControlFlow, focus on constructing the DAG / state machine for the LLM-based workflow. I love that this organizational principle (heavily utilized in this blog post) is explicitly embodied in these frameworks, and that the DAG can be visualized as you construct the graph's nodes (tasks or states) and edges (dependencies) along the pathway to achieving the outcome.

And yet, embodying the full DAG within a single framework makes it more challenging to decompose that DAG into its component parts and achieve the benefits discussed above. It's not impossible, just more difficult and subject to the levers and enhancements exposed by the framework's developers.

The reason for this will be the subject of Part 3 of our discussion of LLM-based Tools for Data-Driven Applications: the lightness or heaviness of the abstractions provided by LLM Tools and Frameworks and how that relates to your ability to build maintainable and enhanceable applications on these frameworks confidently.

If you made it this far, I thank you for reading and look forward to your feedback!
