AItrika

Enhance your knowledge in medical research.

AItrika (formerly PubGPT) is a tool that extracts relevant information from medical papers in an easy way:

  • Abstract
  • Full text (when available)
  • Genes
  • Diseases
  • Mutations
  • Associations between genes and diseases
  • MeSH terms
  • Other terms
  • Results
  • Bibliography

And so on!

🚀 Run the demo app

You can try AItrika with the Streamlit app by running:

streamlit run app.py

Or you can use it as a script by running:

python main.py

📦 Install

To install everything, you need uv.

First, install uv with the command:

pip install uv

After that, create a virtual environment with the command:

uv venv venv_name

Activate the virtual env:

source venv_name/bin/activate

And install dependencies:

uv pip install -r requirements.in

🔑 Set LLM API Keys

In order to set API keys, insert your keys into the env.example file and rename it to .env.
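
As a quick sanity check before running anything, the following minimal sketch parses a .env file with the standard library only and reports which keys are missing. It assumes simple KEY=VALUE lines; GROQ_API_KEY is the variable name read in the Usage section, and load_env_file and missing_keys are hypothetical helper names, not part of AItrika.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(required, env):
    """Return the required key names that are absent or empty in env."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    # Variables already exported in the shell take precedence over the file.
    env = {**load_env_file(".env"), **os.environ}
    print("Missing keys:", missing_keys(["GROQ_API_KEY"], env))
```

An empty list means every required key was found, either in the file or in the current environment.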

🔍 Usage

You can easily get information about a paper by passing a PubMed ID:

from aitrika.engine.aitrika import OnlineAItrika

# pubmed_id must already hold the PubMed ID of the paper to analyze
aitrika_engine = OnlineAItrika(pubmed_id=pubmed_id)
title = aitrika_engine.get_title()
print(title)

Or you can parse a local PDF:

from aitrika.engine.aitrika import LocalAItrika
# pdf_path must already hold the path to the PDF file
aitrika_engine = LocalAItrika(pdf_path=pdf_path)
title = aitrika_engine.get_title()
print(title)
Breast cancer genes: beyond BRCA1 and BRCA2.

You can get other information, such as the associations between genes and diseases:

associations = aitrika_engine.get_associations()
[
  {
    "gene": "BRIP1",
    "disease": "Breast Neoplasms"
  },
  {
    "gene": "PTEN",
    "disease": "Breast Neoplasms"
  },
  {
    "gene": "CHEK2",
    "disease": "Breast Neoplasms"
  },
  ...
]

Or you can get a nicely formatted DataFrame:

associations = aitrika_engine.associations(dataframe=True)
      gene                          disease
0    BRIP1                 Breast Neoplasms
1     PTEN                 Breast Neoplasms
2    CHEK2                 Breast Neoplasms
...
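
Because the associations come back as gene/disease records, they are easy to post-process. As a minimal sketch, assuming the list-of-dictionaries shape shown above, the following groups genes by disease using only the standard library; group_genes_by_disease is a hypothetical helper, not part of AItrika.

```python
from collections import defaultdict


def group_genes_by_disease(associations):
    """Map each disease to the sorted, de-duplicated genes linked to it."""
    grouped = defaultdict(set)
    for record in associations:
        grouped[record["disease"]].add(record["gene"])
    return {disease: sorted(genes) for disease, genes in grouped.items()}


# Sample records copied from the example output in this section.
associations = [
    {"gene": "BRIP1", "disease": "Breast Neoplasms"},
    {"gene": "PTEN", "disease": "Breast Neoplasms"},
    {"gene": "CHEK2", "disease": "Breast Neoplasms"},
]

print(group_genes_by_disease(associations))
# {'Breast Neoplasms': ['BRIP1', 'CHEK2', 'PTEN']}
```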

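With the power of retrieval-augmented generation (RAG), you can query your document:

import os

# generate_documents and GroqLLM are provided by AItrika and must be imported first;
# abstract must already hold the text of the paper's abstract.

# Prepare the documents
documents = generate_documents(content=abstract)

# Set the LLM
llm = GroqLLM(documents=documents, api_key=os.getenv("GROQ_API_KEY"))

# Query your document
query = "Is BRCA1 associated with breast cancer?"
print(llm.query(query=query))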
The provided text suggests that BRCA1 is associated with breast cancer, as it is listed among the high-penetrance genes identified in family linkage studies as responsible for inherited syndromes of breast cancer.

Or you can extract other information:

results = aitrika_engine.extract_results(llm=llm)
print(results)
** RESULTS **

- High-penetrance genes - BRCA1, BRCA2, PTEN, TP53 - responsible for inherited syndromes
- Moderate-penetrance genes - CHEK2, ATM, BRIP1, PALB2, RAD51C - associated with moderate BC risk
- Low-penetrance alleles - common alleles - associated with slightly increased or decreased risk of BC
- Current clinical practice - high-penetrance genes - widely used
- Future prospect - all familial breast cancer genes - to be included in genetic test
- Research need - clinical management - of moderate and low-risk variants

🚀 Run the API

To run the AItrika API, follow these steps:

  1. Ensure you have set up your environment and installed all dependencies as described in the Install section.

  2. Run the API server using the following command:

python api.py

The API will start running on http://0.0.0.0:8000. You can now make requests to the various endpoints:

  • /associations: Get associations from a PubMed article
  • /abstract: Get abstract of a PubMed article
  • /query: Query a PubMed article
  • /results: Get results from a PubMed article
  • /participants: Get number of participants from a PubMed article
  • /outcomes: Get outcomes from a PubMed article

You can use tools like curl, Postman, or any HTTP client to interact with the API. For example:

curl -X POST "http://localhost:8000/abstract" -H "Content-Type: application/json" -d '{"pubmed_id": 12345678}'
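
The same request can be sent from Python. The sketch below mirrors the curl call with the standard library only; it assumes the server is reachable at http://localhost:8000 and that the response body is JSON, which the text above does not state. build_request and post_json are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # address used in the curl example


def build_request(endpoint, payload):
    """Build a JSON POST request for an API endpoint such as /abstract."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}{endpoint}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(endpoint, payload, timeout=30):
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_request(endpoint, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, with the API server running (12345678 is the placeholder ID from the curl example):
# print(post_json("/abstract", {"pubmed_id": 12345678}))
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and body without a running server.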

The API documentation is automatically generated and saved to docs/api-reference/openapi.json. You can use this file with tools like Swagger UI for a more interactive API exploration experience.
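
If you only want a quick overview without a viewer, the generated file can also be read directly. This minimal sketch lists the method and path of each operation, relying on the standard OpenAPI layout in which operations sit under the top-level "paths" object; list_endpoints is a hypothetical helper, and the script does nothing if the file has not been generated yet.

```python
import json
from pathlib import Path

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_endpoints(spec):
    """Return (METHOD, path) pairs from an OpenAPI document, sorted by path."""
    endpoints = []
    for path, operations in spec.get("paths", {}).items():
        for method in operations:
            # Skip non-operation keys such as "parameters" or "summary".
            if method in HTTP_METHODS:
                endpoints.append((method.upper(), path))
    return sorted(endpoints, key=lambda pair: (pair[1], pair[0]))


if __name__ == "__main__":
    spec_path = Path("docs/api-reference/openapi.json")
    if spec_path.exists():
        spec = json.loads(spec_path.read_text(encoding="utf-8"))
        for method, path in list_endpoints(spec):
            print(method, path)
```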

Support the Project

If you find this project useful, please consider supporting it:

  • 🌟 Star the project on GitHub
  • πŸ› Report bugs or suggest new features
  • 🀝 Contribute with pull requests
  • β˜•οΈ Buy me a coffee or consider a sponsor.

Commercial / Business use

If you're using this project in a business or commercial context, please contact me.

I'm available for consulting, custom development, or commercial licensing.

Your support helps keep this project active and continuously improving. Thank you!

License

AItrika is licensed under the Apache 2.0 License. See the LICENSE file for more details.
