Leveraging Pandas to Interact with SQL

Published: March 14, 2024

Most data work involves going back and forth between SQL databases and Python. You write a query to pull what you need, load it into a DataFrame, do your analysis, maybe write results back. Pandas has built-in support for this workflow, and once you set it up, you rarely need to leave Python to interact with your database.

This post covers the basics: what SQL and Pandas each do well, and how to connect them so you can query, transform, and write data without switching contexts.

SQL

SQL (Structured Query Language) is how you talk to relational databases — selecting rows, filtering, joining tables, aggregating. If your data lives in PostgreSQL, MySQL, SQLite, or similar, you’re writing SQL to get it out.
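To make those operations concrete, here is a minimal sketch that runs a filter, a join, and an aggregation against an in-memory SQLite database using only Python's standard library. The `customers` and `orders` tables and their rows are made up for illustration; they are not from any real dataset.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 35.0), (3, 2, 5.0), (4, 2, 50.0);
""")

# Filter (WHERE), join (JOIN ... ON), and aggregate (SUM + GROUP BY) in one query.
query = """
    SELECT c.name, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE o.amount >= 10
    GROUP BY c.name
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

print(rows)
```

The database does the filtering and joining, and Python only sees the small result set. That division of labor is the reason to keep this kind of work in SQL.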

Pandas

Pandas is a Python library for working with tabular data. Its core data structure, the DataFrame, gives you fast filtering, grouping, reshaping, and plotting — all the stuff that’s tedious in raw SQL or plain Python.
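As a rough illustration of filtering, grouping, and reshaping, the sketch below builds a small DataFrame by hand. The `sales` data and its column names are hypothetical, chosen only to keep the example short.

```python
import pandas as pd

# Hypothetical sales figures, invented for illustration.
sales = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100, 150, 80, 120],
})

# Filtering: keep rows where a boolean condition holds.
big = sales[sales["revenue"] >= 100]

# Grouping: total revenue per region.
by_region = sales.groupby("region")["revenue"].sum()

# Reshaping: one row per region, one column per quarter.
wide = sales.pivot(index="region", columns="quarter", values="revenue")

print(big)
print(by_region)
print(wide)
```

Each step is a single expression, which is what makes Pandas convenient for quick exploration once the data is already in memory.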

Pandas integrates with SQL databases through read_sql(), read_sql_query(), and to_sql(). Given a database connection (a SQLAlchemy engine, or a plain sqlite3 connection for SQLite), you can run a SQL query and get a DataFrame back in one line, or push a DataFrame into a database table just as easily. This means you can use SQL for what it’s good at (joins, filtering large tables on the server side) and Pandas for what it’s good at (reshaping, plotting, quick exploratory analysis), all without leaving your Python environment.
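Here is a minimal sketch of that round trip, assuming an in-memory SQLite database so it runs without any server. The `trips` and `trip_summary` table names and the sample rows are hypothetical. For PostgreSQL or MySQL you would typically pass a SQLAlchemy engine in place of the sqlite3 connection.

```python
import sqlite3

import pandas as pd

# Assumption: an in-memory SQLite database stands in for a real one.
conn = sqlite3.connect(":memory:")

# Hypothetical data, invented for illustration.
trips = pd.DataFrame({
    "city": ["Leeds", "Leeds", "York", "York"],
    "distance_km": [3.2, 7.5, 1.1, 4.4],
})

# Write the DataFrame to a table; index=False skips the row index column.
trips.to_sql("trips", conn, index=False, if_exists="replace")

# Let the database do the filtering, and bind the threshold as a parameter
# instead of formatting it into the SQL string.
long_trips = pd.read_sql_query(
    "SELECT city, distance_km FROM trips WHERE distance_km > ?",
    conn,
    params=(3.0,),
)

# Do the analysis step in Pandas, then write the result back.
summary = long_trips.groupby("city", as_index=False)["distance_km"].mean()
summary.to_sql("trip_summary", conn, index=False, if_exists="replace")

# read_sql() accepts a query string too; here it reads back what was written.
check = pd.read_sql("SELECT * FROM trip_summary ORDER BY city", conn)
conn.close()

print(check)
```

Note that `if_exists="replace"` drops and recreates the table on every run. That is handy in a sketch like this, but against a shared database `"append"` or the default `"fail"` is usually the safer choice.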

Ankit Singh