Data Envelopment Analysis

Data Envelopment Analysis (DEA) is a linear programming-based technique used to measure the relative efficiency of decision-making units (DMUs) that use multiple inputs to produce multiple outputs. The basic idea is to compare each DMU against the “efficient frontier”, the boundary formed by the “best practice” DMUs in the data set. The efficient frontier represents the most efficient combinations of inputs and outputs among the DMUs in the analysis. DMUs on the frontier are rated efficient, and a DMU's distance below the frontier measures how inefficient it is.
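In the simplest case of one input and one output, and assuming constant returns to scale, the frontier is just the ray through the DMU with the highest output-to-input ratio. Each DMU's efficiency is then its own ratio divided by that best ratio. The sketch below illustrates this; the function name and the data are hypothetical.

```python
def single_ratio_efficiency(inputs, outputs):
    """Relative efficiency for one input and one output under constant returns to scale.

    Each DMU's productivity (output / input) is divided by the best productivity
    in the data set, so the best-practice DMU scores exactly 1.0.
    """
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have one entry per DMU")
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical data: staff hours (input) and units produced (output) for 4 DMUs
hours = [10.0, 20.0, 25.0, 40.0]
units = [50.0, 80.0, 125.0, 120.0]
print(single_ratio_efficiency(hours, units))  # prints [1.0, 0.8, 1.0, 0.6]
```

DMUs 0 and 2 share the best ratio (5 units per hour), so both lie on the frontier.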

The objective of DEA is to identify the best-performing DMUs relative to the others. Rather than fixing input and output weights in advance, DEA lets each DMU choose the non-negative weights that make its own ratio of weighted outputs to weighted inputs as large as possible, subject to the condition that no DMU in the data set achieves a ratio greater than 1 under those same weights. The exact constraints depend on the specific model being used (for example, whether returns to scale are assumed constant or variable), but they typically consist of linear inequalities that keep every DMU on or below the frontier, plus non-negativity constraints on the weights.
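The relative nature of DEA scores can be illustrated without a solver. For any fixed set of non-negative weights, each DMU's ratio of weighted outputs to weighted inputs can be divided by the largest such ratio, so that the best DMU scores exactly 1 and no DMU scores above it. DEA goes further by letting each DMU choose the weights most favorable to itself; this sketch only evaluates one given weight vector at a time. The function name, weights, and data are hypothetical.

```python
def scores_under_weights(X, Y, input_weights, output_weights):
    """Score every DMU under one common, fixed set of non-negative weights.

    X[j] and Y[j] are the input and output vectors of DMU j. The weighted
    output/input ratios are rescaled so that the best DMU scores 1.0 and
    no DMU scores above 1.0.
    """
    if any(w < 0 for w in list(input_weights) + list(output_weights)):
        raise ValueError("weights must be non-negative")
    ratios = []
    for x, y in zip(X, Y):
        weighted_in = sum(v * xi for v, xi in zip(input_weights, x))
        weighted_out = sum(u * yi for u, yi in zip(output_weights, y))
        ratios.append(weighted_out / weighted_in)
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical data: 3 DMUs, 2 inputs, 1 output
X = [[2.0, 4.0], [4.0, 2.0], [4.0, 4.0]]
Y = [[10.0], [10.0], [10.0]]

# Weighting input 1 heavily favors DMU 0; weighting input 2 heavily favors DMU 1
print(scores_under_weights(X, Y, [0.9, 0.1], [1.0]))
print(scores_under_weights(X, Y, [0.1, 0.9], [1.0]))
```

DMU 0 scores 1 under the first weighting and DMU 1 under the second, while DMU 2, which uses at least as much of every input as the other two for the same output, stays below 1 under both.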

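Let’s consider a simple example of a DEA model with two inputs and one output. Let there be n DMUs, and for DMU j (j = 1, …, n) let $x_{1j}$ and $x_{2j}$ be its two inputs and $y_j$ its output. To evaluate one particular DMU, labelled DMU o, the input-oriented CCR model (in its multiplier form) can be formulated as follows:

Objective: Maximize $\theta = u\, y_o$, subject to:

  • $v_1 x_{1o} + v_2 x_{2o} = 1$ (DMU o's weighted input is normalized to 1)
  • $u\, y_j - (v_1 x_{1j} + v_2 x_{2j}) \le 0$ for all $j = 1, \dots, n$ (no DMU's weighted output may exceed its weighted input)
  • $u, v_1, v_2 \ge 0$ (the weights must be non-negative)

In this formulation, $u$ is the output weight, $v_1$ and $v_2$ are the input weights, and $\theta$ is the efficiency score of DMU o. The model is the linear version of maximizing DMU o's efficiency ratio $u\, y_o / (v_1 x_{1o} + v_2 x_{2o})$: fixing the denominator to 1 with the first constraint turns the ratio into a linear objective. The second constraint says that, under the weights chosen for DMU o, every DMU's efficiency ratio is at most 1; applied to DMU o itself, it gives $0 \le \theta \le 1$. A score of $\theta = 1$ means DMU o lies on the efficient frontier, while $\theta < 1$ means some combination of its peers produces at least its output with proportionally less input. The model is solved once for each of the n DMUs, so each DMU gets its own weights.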
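For a model with two inputs and one output, a DMU's score can also be approximated by brute force instead of a linear-programming solver. Because the output weight only rescales every DMU's ratio, it is enough to try input-weight directions (a, 1 - a) for a between 0 and 1, rescale the ratios so the best DMU scores 1, and keep the highest score the DMU under evaluation reaches. This grid search is a swapped-in approximation of the LP, not the LP itself; the function name, grid size, and data are hypothetical, and all inputs are assumed to be positive.

```python
def ccr_efficiency_grid(X, Y, o, steps=1000):
    """Approximate the CCR score of DMU o (two inputs, one output) by grid search.

    X[j] = (x1, x2) and Y[j] = y for DMU j; all inputs are assumed positive.
    For each input-weight direction (a, 1 - a), every DMU's ratio
    y / (a * x1 + (1 - a) * x2) is divided by the best ratio, so no DMU
    exceeds 1; DMU o keeps the best score it reaches and the a that gives it.
    """
    best_score = 0.0
    best_a = 0.0
    for step in range(steps + 1):
        a = step / steps
        ratios = [y / (a * x[0] + (1 - a) * x[1]) for x, y in zip(X, Y)]
        score = ratios[o] / max(ratios)
        if score > best_score:
            best_score, best_a = score, a
    return best_score, best_a


# Hypothetical data: 3 DMUs with inputs (x1, x2) and a single output y
X = [(2.0, 4.0), (4.0, 2.0), (4.0, 4.0)]
Y = [10.0, 10.0, 10.0]
for o in range(len(X)):
    score, a = ccr_efficiency_grid(X, Y, o)
    print(f"DMU {o}: efficiency ~ {score:.3f} with input weights ({a:.3f}, {1 - a:.3f})")
```

With this data, DMUs 0 and 1 each reach a score of 1, while DMU 2 tops out at 0.75 with equal input weights. A finer grid gives a closer approximation, but with more inputs or outputs the number of grid points grows exponentially, which is one reason the LP formulation is preferred.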

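The model above extends directly to any number of inputs and outputs by giving every input and every output its own non-negative weight. DEA can also be adapted to measure different kinds of efficiency, such as technical efficiency, allocative efficiency, and cost efficiency. Additionally, there are several variants of DEA with different assumptions and formulations: the CCR model (Charnes, Cooper and Rhodes) assumes constant returns to scale; the BCC model (Banker, Charnes and Cooper) allows variable returns to scale by comparing each DMU only with convex combinations of its peers; and the SBM (slacks-based measure) model scores efficiency from the individual input and output slacks rather than from a proportional (radial) change.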
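The practical difference between the CCR and BCC models already shows up with one input and one output. Under constant returns to scale (CCR), a DMU is compared with the best output-to-input ratio scaled to its own size; under variable returns to scale (BCC), it is compared only with convex combinations of observed DMUs, so very small or very large DMUs are not penalized for their scale. The sketch below computes input-oriented scores for both in pure Python. For BCC it enumerates single DMUs and pairs of DMUs, which suffices in the one-input, one-output case because an optimal convex combination needs at most two DMUs with nonzero weight. The function names and data are hypothetical.

```python
def ccr_score(X, Y, o):
    """Input-oriented CCR (constant returns) score for one input and one output."""
    best_ratio = max(y / x for x, y in zip(X, Y))
    return (Y[o] / X[o]) / best_ratio


def bcc_score(X, Y, o):
    """Input-oriented BCC (variable returns) score for one input and one output.

    Finds the smallest input that any convex combination of DMUs needs to
    produce at least Y[o]. With one input and one output an optimal combination
    uses at most two DMUs, so checking single DMUs and pairs is sufficient.
    """
    target = Y[o]
    # Single DMUs that already produce at least the target output (DMU o always qualifies)
    best_input = min(x for x, y in zip(X, Y) if y >= target)
    n = len(X)
    for i in range(n):
        for k in range(n):
            if Y[i] < target < Y[k]:
                # Weight on DMU k so that the mix produces exactly the target output
                lam = (target - Y[i]) / (Y[k] - Y[i])
                best_input = min(best_input, (1 - lam) * X[i] + lam * X[k])
    return best_input / X[o]


# Hypothetical data: one input and one output for five DMUs of different sizes
X = [2.0, 4.0, 6.0, 10.0, 6.0]
Y = [2.0, 6.0, 8.0, 9.0, 5.0]
for o in range(len(X)):
    print(f"DMU {o}: CCR = {ccr_score(X, Y, o):.3f}, BCC = {bcc_score(X, Y, o):.3f}")
```

Here DMUs 0, 2 and 3 score 1 under BCC but below 1 under CCR, so their shortfall is due to scale alone; DMU 4 is inefficient under both models. The ratio of the CCR score to the BCC score is commonly called scale efficiency.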

Here's an example implementation of Data Envelopment Analysis (DEA), using the input-oriented CCR model, with Python's PuLP library:


import pulp

# Input data: one row per DMU, one column per input (5 DMUs, 3 inputs each)
X = [[4, 2, 1], [6, 3, 2], [8, 4, 3], [10, 5, 4], [12, 6, 5]]
# Output data: one output per DMU
Y = [50, 70, 90, 110, 130]

n_dmus = len(X)
n_inputs = len(X[0])


def ccr_efficiency(o):
    """Solve the input-oriented CCR multiplier model for DMU o."""
    problem = pulp.LpProblem(f"DEA_DMU_{o}", pulp.LpMaximize)

    # Decision variables: output weight u and input weights v_k, all non-negative
    u = pulp.LpVariable("u", lowBound=0)
    v = [pulp.LpVariable(f"v_{k}", lowBound=0) for k in range(n_inputs)]

    # Objective: maximize DMU o's weighted output (its efficiency score theta)
    problem += Y[o] * u

    # Normalization: DMU o's weighted input equals 1
    problem += pulp.lpSum(X[o][k] * v[k] for k in range(n_inputs)) == 1

    # No DMU may have a weighted output greater than its weighted input
    for j in range(n_dmus):
        problem += Y[j] * u - pulp.lpSum(X[j][k] * v[k] for k in range(n_inputs)) <= 0

    status = problem.solve(pulp.PULP_CBC_CMD(msg=False))
    weights = [pulp.value(v_k) for v_k in v]
    return pulp.LpStatus[status], pulp.value(problem.objective), pulp.value(u), weights


# Solve one LP per DMU and print its status, efficiency score and weights
for o in range(n_dmus):
    status, theta, u_value, v_values = ccr_efficiency(o)
    print(f"DMU {o}: status = {status}, efficiency = {theta:.4f}")
    print(f"  output weight u = {u_value:.4f}, input weights v = {[round(w, 4) for w in v_values]}")

In this example, we have 5 DMUs, each with 3 inputs and 1 output. The input data X is a list of lists in which each inner list holds the inputs of one DMU, and the output data Y holds one output per DMU. Because DEA rates each DMU against its peers, the code solves a separate linear program for each DMU o. The variables u and v_k are the output and input weights. The objective maximizes DMU o's weighted output, the equality constraint normalizes DMU o's weighted input to 1, and the inequality constraints ensure that no DMU, including DMU o itself, has a weighted output greater than its weighted input under those weights; lowBound=0 keeps all weights non-negative. The inputs and outputs do not need to be measured in the same units: rescaling any single input or output (say, from dollars to thousands of dollars) leaves the CCR scores unchanged, because the corresponding weight absorbs the change.

After solving, the code prints each DMU's efficiency score θ together with the weights that achieve it. DMUs with θ = 1 lie on the efficient frontier; for a DMU with θ < 1, some combination of its peers could, according to the model, produce at least its output using only θ times each of its inputs. With this data, DMU 0 has the highest output per unit of every input and is the only efficient DMU, scoring 1; the others score 14/15 ≈ 0.933, 0.900, 0.880 and 13/15 ≈ 0.867. Several weight vectors can be optimal for the same score (here input 1 is always twice input 2), so the printed weights may differ between solvers.
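In the CCR multiplier model (maximize DMU o's weighted output, subject to its weighted input being 1, no DMU's weighted output exceeding its weighted input, and non-negative weights), a candidate solution is easy to verify without a solver. The sketch below checks a candidate set of weights with exact fractions on the example data. The function name is hypothetical, and the weights for DMU 1, v = (1/6, 0, 0) and u = 1/75, were worked out by hand rather than taken from the PuLP output. They are feasible and give DMU 1 a score of 14/15; note that feasibility alone only shows the optimal score is at least this value.

```python
from fractions import Fraction


def check_ccr_weights(X, Y, o, u, v):
    """Check candidate CCR multiplier weights for DMU o and return its score.

    X[j] is the input vector and Y[j] the single output of DMU j. The weights
    must be non-negative, DMU o's weighted input must equal 1, and no DMU's
    weighted output may exceed its weighted input.
    """
    if u < 0 or any(w < 0 for w in v):
        raise ValueError("weights must be non-negative")

    def weighted_input(j):
        return sum(w * x for w, x in zip(v, X[j]))

    if weighted_input(o) != 1:
        raise ValueError("DMU o's weighted input must be normalized to 1")
    for j in range(len(X)):
        if u * Y[j] > weighted_input(j):
            raise ValueError(f"DMU {j} would have an efficiency ratio above 1")
    return u * Y[o]


# Same data as the PuLP example: 5 DMUs, 3 inputs, 1 output
X = [[4, 2, 1], [6, 3, 2], [8, 4, 3], [10, 5, 4], [12, 6, 5]]
Y = [50, 70, 90, 110, 130]

# Hand-derived weights for DMU 1: all input weight on input 1
print(check_ccr_weights(X, Y, 1, Fraction(1, 75), [Fraction(1, 6), 0, 0]))  # prints 14/15
```

Putting all the input weight on input 2 instead, v = (0, 1/3, 0), passes the same checks with the same score, which is why a solver may report either set of weights.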