Simplifying Database Queries with OpenAI & SQL Automation

Building a Conversational REST API with OpenAI and SQL Query Automation

Introduction

Accessing data through queries can be daunting for non-technical users unfamiliar with SQL. To address this, we’ve developed a REST API that leverages OpenAI’s GPT-4o model to convert natural language queries into SQL commands, execute them against a database, and return the results in JSON format. This article explains the implementation of the API, highlighting its components, features, and the steps to get started.


Objectives

The primary goals of this project are:

  • Enable users to interact with a database using natural language.
  • Provide accurate and safe SQL generation based on the database schema.
  • Return structured data in JSON format for easy integration.
  • Lay the groundwork for future generative UI components based on query results.

Technology Stack

  • Programming Language: .NET 9.0
  • API Framework: ASP.NET Core
  • Database Access: Dapper (lightweight ORM)
  • AI Integration: OpenAI GPT-4o
  • Configuration Management: appsettings.json
  • Logging: Microsoft.Extensions.Logging

Architecture Overview

The system is composed of the following key components:

  1. User Input: Users provide queries in natural language.
  2. OpenAI GPT-4o: Translates the input into SQL using schema and relationship context.
  3. Database Service: Executes the SQL query and retrieves the results.
  4. REST API: Provides endpoints for generating SQL, executing queries, and combining both functionalities.
  5. JSON Output: Returns results and generated SQL in a structured format.

The architectural diagram depicts the end-to-end flow of the system. Below are the concise explanations for each step:

Step 1: Client Interaction

  • What Happens: The client (UI or external application) sends a natural language query to the API Server.
  • Example: “Show all products with stock below 50.”
  • Representation in Image: An arrow from UI/Client to API Server labeled as Request.

Step 2: Schema Fetching

  • What Happens: The API Server fetches the schema and relationships from the database (MS SQL Server). This step provides the necessary context for SQL query generation.
  • Optimization: The schema and relationships can be cached to reduce database calls for repeated queries.
  • Representation in Image: An arrow from API Server to MS SQL Server labeled Fetch Schema, with a response arrow labeled Schema & Relationships.

Step 3: Query Generation

  • What Happens: The API Server sends the schema and natural language query to OpenAI. OpenAI processes the input and generates a syntactically correct SQL query.
  • Example: Converts “Show all products with stock below 50” to SELECT * FROM Products WHERE Stock < 50.
  • Representation in Image: An arrow from API Server to OpenAI labeled Generate SQL Query, with a response arrow labeled Generated SQL Query.

Step 4: Query Execution

  • What Happens: The API Server executes the SQL query on the database (MS SQL Server) and retrieves the query results.
  • Example: Executes SELECT * FROM Products WHERE Stock < 50 and retrieves relevant data.
  • Representation in Image: An arrow from API Server to MS SQL Server labeled Execute SQL Query, with a response arrow labeled Query Results.

Step 5: Response Delivery

  • What Happens: The API Server formats the query results and returns them, along with the generated SQL query, to the client in JSON format.
  • Example:{ "Query": "SELECT * FROM Products WHERE Stock < 50", "Results": [ { "ProductId": 1, "ProductName": "Widget A", "Stock": 20 }, { "ProductId": 2, "ProductName": "Widget B", "Stock": 10 } ] }
  • Representation in Image: An arrow from API Server to UI/Client labeled Response.

Key Components

1. Database Service

The DatabaseService retrieves schema details and relationships while also executing SQL queries.

Schema Extraction:

public string GetSchemaAndRelationships()
{
    var schemaQuery = """
    SELECT TABLE_SCHEMA AS SchemaName, TABLE_NAME AS TableName, COLUMN_NAME AS ColumnName, DATA_TYPE AS DataType
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA != 'sys'
    ORDER BY TABLE_SCHEMA, TABLE_NAME;
    """;
    var relationshipsQuery = """
    SELECT fk.name AS ForeignKeyName, tp.name AS ParentTable, tr.name AS ReferencedTable
    FROM sys.foreign_keys AS fk
    INNER JOIN sys.tables AS tp ON fk.parent_object_id = tp.object_id
    INNER JOIN sys.tables AS tr ON fk.referenced_object_id = tr.object_id;
    """;
    using var connection = new SqlConnection(_connectionString);
    connection.Open();
    var schemaDetails = connection.Query<dynamic>(schemaQuery);
    var relationships = connection.Query<dynamic>(relationshipsQuery);
    return FormatSchemaAndRelationships(schemaDetails, relationships);
}

Query Execution:

public async Task<IEnumerable<dynamic>> ExecuteQueryAsync(string sqlQuery)
{
    using var connection = new SqlConnection(_connectionString);
    await connection.OpenAsync();
    return await connection.QueryAsync(sqlQuery);
}

Explanation:

  • Retrieves schema and foreign key relationships for context.
  • Executes SQL queries using Dapper.

2. OpenAI Integration

The OpenAi Service uses GPT-4o to convert user prompts into SQL queries. By leveraging structured output, it ensures consistency and accuracy in the generated queries.

Code Highlights:

public async Task<string> GenerateSqlQueryAsync(string userPrompt, string schemaContext)
{
    List<ChatMessage> messages = new()
    {
        new SystemChatMessage("You are a SQL assistant. Generate SQL queries based on the given schema and relationships."),
        new SystemChatMessage($"Schema and Relationships:\n{schemaContext}"),
        new UserChatMessage(userPrompt)
    };
    ChatCompletionOptions options = new()
    {
        ResponseFormat = ChatResponseFormat.CreateJsonSchemaFormat(
            jsonSchemaFormatName: "sql_query_generation",
            jsonSchema: BinaryData.FromBytes("""
                {
                  "type": "object",
                  "properties": {
                    "query": {
                      "type": "string",
                      "description": "A fully-formed SQL query that satisfies the user request."
                    }
                  },
                  "required": ["query"],
                  "additionalProperties": false
                }
                """u8.ToArray()),
            jsonSchemaIsStrict: true
        )
    };
    ChatCompletion completion = _client.CompleteChat(messages, options);
    string responseJson = completion.Content[0].Text;
    using JsonDocument structuredJson = JsonDocument.Parse(responseJson);
    _logger.LogInformation("Generated SQL Query: {Query}", structuredJson.RootElement.GetProperty("query").GetString());
    return structuredJson.RootElement.GetProperty("query").GetString();
}

Explanation:

  • Prompt Engineering: The initial ChatMessages include schema context and an explicit directive to generate SQL queries, guiding GPT-4o to produce domain-specific outputs.
  • Structured Output with JSON Schema:
    • A JSON schema is defined to ensure the output format is predictable and conforms to specific requirements.
    • Example schema:{ "type": "object", "properties": { "query": { "type": "string", "description": "A fully-formed SQL query that satisfies the user request." } }, "required": ["query"], "additionalProperties": false }
    • This guarantees that the response will always contain a valid query field.
  • Parsing and Validation: The response is parsed using JsonDocument.Parse, ensuring the generated SQL query adheres to the specified schema.
  • Logging: The generated query is logged for debugging and monitoring purposes.

Benefits of Structured Output:

  • Consistency: Responses always follow the predefined format, reducing errors.
  • Error Reduction: Invalid or malformed outputs are caught early.
  • Ease of Integration: The structured data simplifies downstream processing.

3. REST API

The SqlController exposes endpoints for interacting with the system:

Endpoints:

  1. /api/sql/generate:
    • Generates SQL based on user input.
    • Returns the SQL query.
  2. /api/sql/generate-and-execute:
    • Combines SQL generation and execution.
    • Returns both the SQL query and results.

Example Implementation:

[HttpPost("generate-and-execute")]
public async Task<IActionResult> GenerateAndExecuteSql([FromBody] string userPrompt)
{
    string schemaContext = _databaseService.GetSchemaAndRelationships();
    string sqlQuery = await _openAiService.GenerateSqlQueryAsync(userPrompt, schemaContext);
    var results = await _databaseService.ExecuteQueryAsync(sqlQuery);
    return Ok(new { Query = sqlQuery, Results = results });
}


Configuration

appsettings.json

This file stores the OpenAI API key and database connection string:

{
  "ConnectionStrings": {
    "DefaultConnection": "your-connection-string"
  },
  "OpenAI": {
    "ApiKey": "your-api-key"
  }
}

Environment-Specific Overrides

Use appsettings.Development.json for local development configurations.


Example Usage

Generating SQL Query

Request:

POST {{OpenAISQLApi_HostAddress}}/api/sql/generate
Content-Type: application/json
"List all orders with order details, customer name placed in December 2023, including order IDs and total amounts."

Response:

{
  "query": "SELECT o.OrderID, o.OrderDate, o.TotalAmount, od.OrderDetailID, od.ProductName, od.Quantity, od.Price\nFROM Customer c\nJOIN [Order] o ON c.CustomerID = o.CustomerID\nJOIN OrderDetails od ON o.OrderID = od.OrderID\nWHERE c.CustomerName = 'Jane Smith' AND YEAR(o.OrderDate) = 2023;"
}

Generating and Executing SQL Query

Request:

POST {{OpenAISQLApi_HostAddress}}/api/sql/generate-and-execute
Content-Type: application/json
"List all orders with order details, customer name placed in December 2023, including order IDs and total amounts."

Response:

{
  "query": "SELECT o.OrderID, o.OrderDate, o.TotalAmount, od.OrderDetailID, od.ProductName, od.Quantity, od.Price\nFROM Customer c\nJOIN [Order] o ON c.CustomerID = o.CustomerID\nJOIN OrderDetails od ON o.OrderID = od.OrderID\nWHERE c.CustomerName = 'Jane Smith' AND YEAR(o.OrderDate) = 2023;",
  "results": [
    {
      "OrderID": 2,
      "OrderDate": "2023-12-02T00:00:00",
      "TotalAmount": 200,
      "OrderDetailID": 3,
      "ProductName": "Widget C",
      "Quantity": 1,
      "Price": 200
    }
  ]
}

Future Enhancements

Generative UI

  • Extend the API to return UI components like tables and forms.
  • Enable seamless integration into front-end frameworks.

Enhanced Security

  • Implement SQL injection prevention.
  • Add user authentication and authorization.

Conclusion

This REST API bridges the gap between non-technical users and databases, enabling seamless natural language interactions. By combining the power of OpenAI and SQL automation, it lays a solid foundation for advanced data accessibility and future UI-driven capabilities.

For a complete reference implementation, visit the GitHub Repository.


Get Involved!

  • Explore the Code: Test the examples provided in the GitHub repository and share your insights in the comments.
  • Follow Us: Stay updated on new developments by following the project on. GitHub.
  • Subscribe: Sign up for our newsletter to receive tips on leveraging Azure and OpenAI for modern development.
  • Join the Conversation: Have you encountered challenges with Generative AI APIs like OpenAI or Azure OpenAI? Share your experiences and solutions in the comments below!


Discover more from Nitin Singh

Subscribe to get the latest posts sent to your email.

Leave a Reply