Tutorial: how to use the ChatGPT API to process large amounts of data
With instructions and code examples
TL;DR
Go to https://github.com/yang3kc/llm_for_css and check it out.
Why I created this tutorial
It's becoming increasingly common for researchers to use OpenAI's API to query its models and perform data analysis. Although it's very easy to talk with ChatGPT in the chat box, things become tricky when you need to process tens of thousands of text messages. For people with a programming background and experience with HTTP requests, this is not overly difficult, but I've seen many friends and colleagues struggle with it. Since I couldn't find any tutorials on the topic, I decided to write one myself.
What’s covered
The tutorial currently covers the following topics:
Handling API keys properly
Writing a simple Python script to query the API
Using JSON mode to obtain structured output and parse (validate) the output
Using async programming to accelerate the querying process
Using the batch API to process large amounts of data with reduced cost
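To give a flavor of the first topic, here is a minimal sketch of reading the API key from an environment variable instead of hard-coding it in a script (`OPENAI_API_KEY` is the variable the official `openai` package looks for by default; the helper name is mine):

```python
import os

def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, failing loudly if it's missing."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key
```

Keeping the key out of your source code means you can commit and share your scripts without leaking credentials.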
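A simple querying script can be as short as the sketch below, assuming the official `openai` Python package (v1+) is installed and `OPENAI_API_KEY` is set. The sentiment-classification task and the model name are placeholders for illustration:

```python
def build_messages(text: str) -> list[dict]:
    # Keep the fixed instructions in the system message and put each
    # data item in the user message.
    return [
        {"role": "system",
         "content": "Classify the sentiment of the message as positive, negative, or neutral."},
        {"role": "user", "content": text},
    ]

def classify(text: str, model: str = "gpt-4o-mini") -> str:
    # Imported lazily so the helper above works without the package installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(model=model, messages=build_messages(text))
    return response.choices[0].message.content
```

Looping `classify` over a list of texts is enough for small jobs; the later sections cover what to do when that loop becomes too slow or too expensive.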
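For structured output, the chat completions endpoint accepts `response_format={"type": "json_object"}` (JSON mode), which constrains the model to emit a valid JSON object; you still need to parse and validate it yourself. The `label` schema below is a hypothetical example:

```python
import json

def parse_label(raw: str) -> str:
    """Parse the model's JSON output and validate it against the expected schema."""
    allowed = {"positive", "negative", "neutral"}
    data = json.loads(raw)  # raises if the output is not valid JSON
    label = data.get("label")
    if label not in allowed:
        raise ValueError(f"Unexpected label: {label!r}")
    return label

def classify_json(text: str, model: str = "gpt-4o-mini") -> str:
    # Imported lazily so parse_label is usable without the package installed.
    from openai import OpenAI

    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        response_format={"type": "json_object"},  # JSON mode
        messages=[
            {"role": "system",
             "content": ("Classify the sentiment of the message. "
                         'Respond with JSON like {"label": "positive"}.')},
            {"role": "user", "content": text},
        ],
    )
    return parse_label(response.choices[0].message.content)
```

Validating every response up front makes failures visible immediately instead of corrupting your results file downstream.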
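The async idea is to fire many requests concurrently while a semaphore caps how many are in flight at once, so you stay within rate limits. In this sketch the dummy `query` stands in for a real call with the package's async client (e.g. `await client.chat.completions.create(...)` on an `AsyncOpenAI` instance):

```python
import asyncio

async def gather_with_limit(coros, limit: int = 10):
    """Run coroutines concurrently, but at most `limit` at a time."""
    sem = asyncio.Semaphore(limit)

    async def bounded(coro):
        async with sem:
            return await coro

    # gather preserves input order, so results line up with the inputs.
    return await asyncio.gather(*(bounded(c) for c in coros))

async def query(text: str) -> str:
    # Placeholder for a real async API call.
    await asyncio.sleep(0)
    return f"processed: {text}"

async def main(texts):
    return await gather_with_limit([query(t) for t in texts], limit=5)
```

Because the time per request is dominated by waiting on the network, overlapping the waits can speed things up by an order of magnitude compared with a sequential loop.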
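The batch API takes a `.jsonl` file where each line is one request; here is a sketch of preparing that file for the `/v1/chat/completions` endpoint (the model name and `custom_id` scheme are illustrative):

```python
import json

def batch_request_line(custom_id: str, text: str, model: str = "gpt-4o-mini") -> str:
    """Build one JSONL line; custom_id lets you match results back to inputs."""
    request = {
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": text}],
        },
    }
    return json.dumps(request)

def write_batch_file(texts, path: str = "batch_input.jsonl") -> None:
    with open(path, "w") as f:
        for i, text in enumerate(texts):
            f.write(batch_request_line(f"request-{i}", text) + "\n")
```

You then upload the file with `client.files.create(..., purpose="batch")` and submit it with `client.batches.create(...)`; results arrive asynchronously (within a completion window) at roughly half the cost of the standard endpoints.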
If you have worked with the API before, you can hopefully see the usefulness of this content right away. I provide instructions and example code for each topic.
I'm currently focusing on OpenAI, but I plan to cover other API providers and running large language models locally as well (in fact, I already have a post on running Llama 3 locally).
I might also add detailed instructions for various practical tasks later.
If you find this tutorial useful, please share it with your colleagues. Issues, pull requests, and stars are welcome!


