02 · Data Eng · ML

BooksData - Big Data books pipeline

End-to-end data pipeline on a book catalogue: ingestion, cleaning, statistical analysis and a rating-prediction model. Results are exposed through a REST API and an interactive dashboard.
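As a rough illustration of the stages named above, the cleaning, summary-statistics, and prediction steps could be sketched in plain Python as follows. This is a minimal sketch, not the project's code: the record fields, the sample rows, and the function names (`clean`, `summarize`, `baseline_rating`) are all assumptions, and the predictor is a deliberately naive mean-rating baseline standing in for whatever model the project actually trains. Storage in MongoDB and serving through FastAPI are left out.

```python
from statistics import mean

# Hypothetical raw records; the real catalogue schema is not given in the text.
RAW_BOOKS = [
    {"title": " Dune ", "pages": "412", "rating": "4.2"},
    {"title": "Emma", "pages": "474", "rating": "3.9"},
    {"title": "", "pages": "120", "rating": "4.0"},         # missing title
    {"title": "Ulysses", "pages": "n/a", "rating": "3.7"},  # unparsable page count
]


def clean(records):
    """Strip titles, cast numeric fields, and drop rows that fail either check."""
    cleaned = []
    for rec in records:
        title = rec["title"].strip()
        try:
            pages = int(rec["pages"])
            rating = float(rec["rating"])
        except ValueError:
            continue  # skip rows whose numeric fields cannot be parsed
        if title:
            cleaned.append({"title": title, "pages": pages, "rating": rating})
    return cleaned


def summarize(books):
    """Return basic descriptive statistics over the cleaned catalogue."""
    return {
        "count": len(books),
        "mean_rating": mean(b["rating"] for b in books),
        "mean_pages": mean(b["pages"] for b in books),
    }


def baseline_rating(books):
    """Naive baseline (an assumption): predict the mean rating for any book."""
    return mean(b["rating"] for b in books)


books = clean(RAW_BOOKS)
stats = summarize(books)
prediction = baseline_rating(books)
```

A baseline like this is mainly useful as a reference point: a trained model is only worth keeping if it beats the mean-rating prediction on held-out books.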

Ingestion · Cleaning · Analysis · FastAPI · Streamlit · ML
FastAPI · Streamlit · MongoDB · ML
FIG. 01 · BooksData pipeline architecture. From catalogue ingestion to the rating-prediction model, exposed through a REST API and a dashboard.
FIG. 02 · Interactive dashboard (Streamlit). Exploring catalogue statistics and predictions in a Streamlit interface.
FIG. 03 · WordCloud of book titles. Text analysis of the titles surfacing the dominant themes.
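The counting step behind a title word cloud can be approximated with the standard library alone. The sketch below is an assumption about how such an analysis might work, not the project's implementation: the stopword list, the sample titles, and the `title_word_counts` helper are all hypothetical, and rendering the cloud itself is omitted.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; a real analysis would use a fuller one).
STOPWORDS = {"the", "of", "a", "an", "and", "in", "to"}


def title_word_counts(titles):
    """Count lowercase words across titles, ignoring stopwords."""
    counts = Counter()
    for title in titles:
        for word in re.findall(r"[a-z']+", title.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts


# Hypothetical titles, used only to exercise the function.
sample_titles = ["The Art of War", "War and Peace", "The Art of Fiction"]
counts = title_word_counts(sample_titles)
```

The resulting frequencies are what a word-cloud renderer would scale the font sizes by, so the dominant themes are simply the highest-count words.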