Turn an entire GitHub Repo into a single organized .txt file to use with Code Interpreter
Demo
Creating a React front end for a GitHub repo containing functioning back end code:
https://chat.openai.com/share/0670c1ec-a8a8-4568-ad09-bb9b152e1f0b
RepoToText
RepoToText is a web app that scrapes a GitHub repository and converts its files into a single organized .txt file. You enter the URL of a GitHub repository and, optionally, a documentation URL. The app retrieves the contents of the repository, including all files and directories, fetches the documentation from the provided URL, and combines everything into one organized text file with the documentation at the top. The .txt file is saved in the root project directory with user/repo/timestamp info. You can then upload this file to Code Interpreter and use the chatbot to interact with the entire GitHub repo. Add your GitHub API key to the .env file.
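As a rough illustration of the "user/repo/timestamp" naming, the sketch below derives an output file name from a repository URL. The function names and the exact pattern (user_repo_timestamp.txt) are assumptions made for this example; the app's real naming scheme may differ.

```python
from __future__ import annotations

from datetime import datetime
from urllib.parse import urlparse


def parse_repo_url(repo_url: str) -> tuple[str, str]:
    """Split a GitHub repository URL into (user, repo)."""
    parts = urlparse(repo_url).path.strip("/").split("/")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"not a repository URL: {repo_url!r}")
    repo = parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return parts[0], repo


def output_filename(repo_url: str, now: datetime | None = None) -> str:
    """Build a file name carrying user, repo, and timestamp info."""
    # Hypothetical pattern: <user>_<repo>_<YYYYMMDD-HHMMSS>.txt
    user, repo = parse_repo_url(repo_url)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    return f"{user}_{repo}_{stamp}.txt"
```

Passing a fixed datetime as the second argument makes the result reproducible, which is convenient when checking the pattern by hand.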
Prompt Example
This file is a .txt file that contains an entire GitHub repository with all of the files separated by delimiters ('''). The file paths are the titles after the delimiters ('''--- FILE AND FILEPATH HERE ---). Add your idea here (Example): Please create a React front end that will work with the back end.
Info
- creates a .txt with ('''---) separating each file from the repo
- each file from the repo has a header after ('''---) with the file path as the title
- the .txt file is saved in the root directory
- you can add a URL to a doc page and the doc page will be added to the top of the .txt file (great to use for tech that came out after Sep 2021)
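To make the layout above concrete, here is a minimal sketch that splits a generated file back into its individual files. It assumes each file starts with a header line that begins with the delimiter ('''---), followed by the file path and an optional trailing "---". The function name and the sample text are hypothetical; adjust the delimiter if your generated file looks different.

```python
from __future__ import annotations

DELIMITER = "'''---"  # assumed file separator; change it to match your file


def split_repo_text(text: str, delimiter: str = DELIMITER) -> dict[str, str]:
    """Map each file path to its content, given one header line per file."""
    files: dict[str, str] = {}
    current_path = None
    buffer: list[str] = []
    for line in text.splitlines():
        if line.startswith(delimiter):
            if current_path is not None:
                files[current_path] = "\n".join(buffer).strip("\n")
            # The header text after the delimiter is treated as the file path.
            header = line[len(delimiter):].strip()
            if header.endswith("---"):
                header = header[:-3].rstrip()
            current_path = header
            buffer = []
        elif current_path is not None:
            buffer.append(line)
        # Lines before the first header (such as doc page text) are skipped.
    if current_path is not None:
        files[current_path] = "\n".join(buffer).strip("\n")
    return files


sample = "doc page text\n'''--- app/main.py ---\nprint('hi')\n\n'''--- README.md ---\n# Title\n"
for path, content in split_repo_text(sample).items():
    print(path, len(content))
```

A repository file whose own lines start with the delimiter would be split incorrectly by this approach, so treat it as a convenience for inspection rather than a faithful round trip.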
Tech Used
- Frontend: React.js
- Backend: Python Flask
- GitHub API: PyGithub library
- Additional Python libraries: beautifulsoup4, requests, flask_cors, retry
Frontend
The frontend of the app is implemented using React.js. The main component is App.js, which handles user input and interacts with the backend API.
App.js
This file defines the main React component of the app. It uses React hooks to manage the state of input fields and the response received from the backend.
- useState hooks are used to define the state variables repoUrl, docUrl, and response, which hold the values of the repository URL, documentation URL, and the response from the backend API, respectively.
- The component defines event handlers (handleRepoChange, handleDocChange, handleSubmit, and handleCopyText) to update the state variables based on user interactions.
- When the user clicks the “Submit” button, the handleSubmit function is called. It sends a POST request to the backend API using the Axios library, passing the repoUrl and docUrl values in the request body. The response from the API is then stored in the response state variable.
- The component renders the input fields, buttons, and the output area using JSX.
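For readers who want to call the backend without the React UI, the request that handleSubmit sends can be approximated from Python. This is a sketch under assumptions: the host (localhost), the JSON encoding, and the key names repoUrl and docUrl are inferred from the description above, not confirmed against App.js, and build_scrape_request and scrape are hypothetical helper names.

```python
import json
import urllib.request

# Assumed address; the README states only that Flask listens on port 5000.
SCRAPE_URL = "http://localhost:5000/scrape"


def build_scrape_request(repo_url: str, doc_url: str = "") -> urllib.request.Request:
    """Build, without sending, a POST request like the one handleSubmit issues."""
    # Key names mirror the state variables; the real payload shape is an assumption.
    payload = json.dumps({"repoUrl": repo_url, "docUrl": doc_url}).encode("utf-8")
    return urllib.request.Request(
        SCRAPE_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape(repo_url: str, doc_url: str = "") -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_scrape_request(repo_url, doc_url)) as resp:
        return resp.read().decode("utf-8")
```

Keeping request construction separate from sending lets you inspect the payload first; scrape needs the backend server to be running.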
Backend
The backend of the application is implemented using Python and the Flask web framework. The main script is RepoToText.py, which defines the Flask application and handles the scraping and conversion logic.
RepoToText.py
This file contains the Flask application and the GithubRepoScraper class responsible for scraping the GitHub repository and generating the text file.
- The GithubRepoScraper class initializes with a GitHub API key and the repository URL. It provides methods to fetch all files from the repository, scrape documentation from a provided URL, write the files and documentation to a text file, and clean up the text file by removing unnecessary line breaks.
- The Flask application is created using the Flask class and enables Cross-Origin Resource Sharing (CORS) using the CORS extension. It defines a single route /scrape that accepts POST requests.
- When a POST request is received at the /scrape endpoint, the request data is extracted and the repository URL and documentation URL are retrieved.
- An instance of GithubRepoScraper is created with the repository URL and documentation URL.
- The run method of GithubRepoScraper is called, which fetches all files from the repository, writes them to a text file along with the documentation, and performs cleanup on the text file.
- The generated text file is read and returned as the response of the API.
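The write and cleanup steps described above can be sketched without any GitHub access. In this example an in-memory dict stands in for the files that the real scraper fetches through PyGithub. The function names, the ('''---) header layout, and the rule of collapsing three or more line breaks into two are assumptions for illustration, not the actual code in RepoToText.py.

```python
import re

DELIMITER = "'''---"  # assumed file separator


def build_repo_text(files: dict, documentation: str = "") -> str:
    """Join documentation (if any) and files into one delimited text."""
    sections = []
    if documentation:
        # Documentation goes first so it ends up at the top of the file.
        sections.append(documentation.strip())
    for path, content in files.items():
        sections.append(f"{DELIMITER} {path} ---\n{content.strip()}")
    return "\n\n".join(sections) + "\n"


def clean_up_text(text: str) -> str:
    """Collapse runs of three or more line breaks into a single blank line."""
    return re.sub(r"\n{3,}", "\n\n", text)


if __name__ == "__main__":
    repo_files = {"a.py": "x = 1\n\n\n\ny = 2", "b.md": "# B"}
    print(clean_up_text(build_repo_text(repo_files, documentation="Docs")))
```

Running cleanup on the combined text rather than per file keeps the spacing between sections uniform as well.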
Running the Application
To run the application, follow these steps:
- Install the required dependencies mentioned in the frontend and backend sections.
- Start the backend server by running the RepoToText.py script. The Flask application will listen on port 5000.
- Start the frontend development server by running the React application.
- Access the application in a web browser and enter the GitHub repository URL and documentation URL (if available).
- Choose All files or choose specific file types.
- Click the “Submit” button to initiate the scraping process. The converted text will be displayed in the output area, and it will also be saved to the project root directory.
- You can also click the “Copy Text” button to copy the generated text to the clipboard.
TODO
- FIX: Broken file types: .ipynb
- add in the ability to work with private repositories
- create a small desktop app via PyQt or an executable file
- add ability to store change history and update .txt to reflect working changes
- add checker function to make sure .txt is current repo version
- adjust UI for flow, including changing the textarea output width and adding file management and history UI
- explore prompt ideas including breaking the prompts into discrete steps that nudge the model along