Setting up the Foundational Model Landscape

The current LLM/foundational model landscape is defined by a dedication to task-specific design that trades flexibility for performance. These tasks include:

  • Adaptation
    • Prompting- zero-shot, few-shot, kNN
    • Fine-tuning- PEFT, LoRA
  • Reasoning
    • Chain-of-thought
    • Agent loops
  • Augmentation
    • Retrieval augmentation, multi-model augmentation
    • Calling external tools
  • Retrieval modeling

Currently, it is not clear how to connect and optimize all the pieces of an LLM pipeline. Fine-tuning a whole pipeline can be very complicated, and prompts are extremely fragile and non-transferable. Just look at current research papers that rely on prompt engineering, and you will find massive free-form researcher-written prompts that are neither sustainable nor applicable to industry practice.

Given this myriad of tasks and the rapid pace of development in the industry and academia around language modeling, it is critically important to develop robust pipelines that can serve a variety of tasks rather than hyper-specialized systems for individual language modeling tasks.

This is where DSPy comes in: to program, not prompt, LLMs.

DSPy

DSPy was created by Stanford University researchers with the goal of providing better abstractions for different language modeling tasks. DSPy is all about programming, not prompting: it has been designed and engineered to control how your LLM modules interact with each other programmatically. The goal is to shift the LLM paradigm from hacking on prompts to making sound and reproducible architectural decisions.

DSPy’s framework is built on composable, declarative modules, structured in a manner familiar to Python users. This elevates “prompting techniques” such as chain-of-thought and self-reflection from manual string-manipulation tactics to versatile, modular procedures that adapt to diverse tasks. DSPy does this through three core concepts:

  1. Signatures- abstract prompts as the input/output behavior of a module
    1. question -> answer
    2. long_document -> summary
    3. context, question -> rationale, response
    4. passage -> main_entities
  2. Modules- generalize prompting techniques and compose multi-stage systems as parameterized layers (see the sketch after this list)
    1. dspy.ProgramOfThought
    2. dspy.ReAct
    3. dspy.ChainOfThought
    4. dspy.MultiChainComparison
  3. Teleprompters- optimize systems towards arbitrary metrics
    1. BootstrapFewShot
    2. BootstrapFinetune
    3. BootstrapFewShotWithRandomSearch
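
To make the first two concepts concrete, here is a minimal sketch of a signature string plugged into a module. The model name is an illustrative assumption, and the API names reflect DSPy at the time of writing:

import dspy

# Configure a language model (the specific model choice is an assumption for illustration).
dspy.settings.configure(lm=dspy.OpenAI(model="gpt-3.5-turbo"))

# "long_document -> summary" is a Signature; ChainOfThought is a Module that implements it.
summarize = dspy.ChainOfThought("long_document -> summary")
pred = summarize(long_document="DSPy is a framework for programming foundation models rather than prompting them.")
print(pred.rationale, pred.summary)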

Inspired by PyTorch

In PyTorch, a neural network is implemented by initializing the model's components and then using those components in a forward function that executes a forward pass through the model:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
	def __init__(self):
		super().__init__()
		# Convolution and pooling layers extract spatial features.
		self.conv1 = nn.Conv2d(3, 6, 5)
		self.pool = nn.MaxPool2d(2, 2)
		self.conv2 = nn.Conv2d(6, 16, 5)
		# Fully connected layers map the flattened features to 10 classes.
		self.fc1 = nn.Linear(16 * 5 * 5, 120)
		self.fc2 = nn.Linear(120, 84)
		self.fc3 = nn.Linear(84, 10)

	def forward(self, x):
		x = self.pool(F.relu(self.conv1(x)))
		x = self.pool(F.relu(self.conv2(x)))
		x = torch.flatten(x, 1)  # flatten all dimensions except the batch
		x = F.relu(self.fc1(x))
		x = F.relu(self.fc2(x))
		x = self.fc3(x)
		return x

In the CNN above, the convolutions carry inductive bias through their weight-sharing kernels. Similarly, in DSPy, the docstring that serves as the prompt introduces a degree of inductive bias by telling the LLM about our use case.

Current LLM frameworks typically assume labels only for the program's final output, not the intermediate steps. As shown below, just like in the PyTorch model above, we can specify intermediate steps of the model and even add validation steps via DSPy's Suggest assertions (covered later).

import dspy

class RAG(dspy.Module):
	def __init__(self, num_passages=3):
		super().__init__()

		# Retrieve the top-k passages, then generate an answer with chain-of-thought.
		self.retrieve = dspy.Retrieve(k=num_passages)
		self.generate_answer = dspy.ChainOfThought("question, context -> answer")

	def forward(self, question):
		context = self.retrieve(question).passages
		prediction = self.generate_answer(context=context, question=question)

		return dspy.Prediction(context=context, answer=prediction.answer)

In the RAG system above, we define our components in the class initializer and then define how they interact in the forward function. In this system, we pass the question to our retriever, get our context passages, and finally pass the context and question to our generator to get the answer.
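
A minimal usage sketch follows; the model name and retriever endpoint are illustrative placeholders, and the API names reflect DSPy at the time of writing:

import dspy

# Both the LM and the ColBERTv2 retrieval endpoint below are illustrative assumptions.
lm = dspy.OpenAI(model="gpt-3.5-turbo")
rm = dspy.ColBERTv2(url="http://localhost:8893/api/search")
dspy.settings.configure(lm=lm, rm=rm)

rag = RAG(num_passages=3)
pred = rag(question="What year was the Eiffel Tower completed?")
print(pred.context)
print(pred.answer)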

Why is DSPy superior to current LLM frameworks like LangChain?

1. Cleaner prompts and structured input/output

In DSPy, when tasks are assigned to LMs, the desired behavior is specified through Signatures. A Signature is a declarative representation of the input/output behavior of a DSPy module.

Rather than spending effort guiding an LLM through sub-tasks, Signatures let DSPy users describe the nature of the sub-task. The DSPy compiler then takes on the responsibility of devising an intricate prompt for a large LM, or fine-tuning a smaller LM, in line with the specified Signature, the provided data, and the existing pipeline.

class GenerateAnswer(dspy.Signature):
	"""Answer questions with short factoid answers."""

	context = dspy.InputField(desc="may contain relevant facts.")
	question = dspy.InputField()
	answer = dspy.OutputField(desc="often between 1 and 5 words")

The docstring supplies the task description that DSPy uses when constructing the prompt.
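
As a brief sketch of how such a class-based Signature is used (assuming a language model has already been configured via dspy.settings.configure), it can be passed to a module in place of the shorthand string form:

# GenerateAnswer replaces the string form "context, question -> answer".
generate_answer = dspy.ChainOfThought(GenerateAnswer)
pred = generate_answer(
	context=["David Gregory inherited Kinnairdy Castle in 1664."],
	question="What castle did David Gregory inherit?",
)
print(pred.answer)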

2. Control

DSPy offers Assertions, a feature for declaring computational constraints within DSPy programs. Assertions let programmers specify natural-language rules for valid outputs, guiding the behavior of language model calls during both compilation and inference. The approach combines Pythonic-style assertions with backtracking logic to enable autonomous self-correction and refinement of language model calls. By accounting for past outputs and passing forward relevant feedback and guidelines for self-correction, this feature gives DSPy significantly enhanced control over program behavior.

DSPy provides two key mechanisms for Assertions:

  • dspy.Assert: Mandates that the program satisfy the given assertion, raising an Exception otherwise. This is important for enforcing non-negotiable constraints within the program.
  • dspy.Suggest: Unlike Assert, Suggest is more flexible. It encourages the program to meet the assertion but allows execution to continue even if the assertion is not satisfied. This is particularly useful for guiding the program towards desired outcomes without halting execution over non-critical issues.

The multi-hop QA program below uses dspy.Suggest to keep each generated search query short and distinct from earlier queries:

class MultiHopQA(dspy.Module):
	def __init__(self):
		super().__init__()
		self.retrieve = dspy.Retrieve(k=3)
		self.gen_query = dspy.ChainOfThought("context, question -> query")
		self.gen_answer = dspy.ChainOfThought("context, question -> answer")

	def forward(self, question):
		context, queries = [], [question]
		for hop in range(2):
			query = self.gen_query(context=context, question=question).query

			# is_query_distinct is a user-defined helper that checks the new
			# query against the queries issued so far.
			dspy.Suggest(len(query) < 100, "Query should be less than 100 characters")
			dspy.Suggest(is_query_distinct(query, queries), f"Query should be distinct from {queries}")

			context += self.retrieve(query).passages
			queries.append(query)

		pred = self.gen_answer(context=context, question=question)

		return dspy.Prediction(context=context, answer=pred.answer)
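
For these Suggest statements to take effect at runtime, the program is typically wrapped with DSPy's assertion handler. A minimal sketch, assuming the assertion helpers shipped with DSPy at the time of writing:

from dspy.primitives.assertions import assert_transform_module, backtrack_handler

# Wrap the program so that failed Suggest checks trigger backtracking:
# the offending module is re-run with the feedback message added to its prompt.
qa = assert_transform_module(MultiHopQA(), backtrack_handler)
pred = qa(question="How many storeys are in the castle that David Gregory inherited?")
print(pred.answer)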

Teleprompter

Notably, DSPy introduces an automatic compiler that teaches LLMs how to execute the declarative stages of a program. The compiler internally traces the program's execution and then devises high-quality prompts suitable for larger LMs, or automatically finetunes smaller LMs, imparting knowledge of the task's intricacies.

Compiling hinges on three pivotal factors: a potentially small training set, a validation metric, and the selection of a teleprompter from DSPy's repertoire. These teleprompters, embedded within DSPy, are potent optimizers capable of formulating effective prompts tailored to any program's modules. (The "tele-" prefix in "teleprompter" implies "at a distance," referencing the automatic nature of prompting.)

Here’s a simple example:

trainset = [dspy.Example(question="Joseph and his friends watch two movies...", answer="4", ...)]

CoT = dspy.ChainOfThought("question -> answer")
teleprompter = dspy.BootstrapFewShotWithRandomSearch(metric=gsm8k_accuracy)
bootstrapped_program = teleprompter.compile(CoT, trainset=trainset)
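
The gsm8k_accuracy metric above is user-supplied. As a hedged sketch, a DSPy metric is simply a function that takes a dataset example, the program's prediction, and an optional trace, and returns a score; the compiled program is then called like any other module:

# One possible definition of the metric referenced above (an assumption, not DSPy's own).
def gsm8k_accuracy(example, pred, trace=None):
	return example.answer == pred.answer

# Using the compiled program.
pred = bootstrapped_program(question="Joseph and his friends watch two movies...")
print(pred.answer)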

Overall, DSPy offers a very concrete way for collaborative, modular, open research to compete with, and even outperform, "scale is all you need" LLMs. See more at the project's GitHub: https://github.com/stanfordnlp/dspy/tree/main.