Problem
Publishing metadata is often fragmented across retailers, publishers, rights catalogs, libraries, and news sources. Translation credits may be missing, inconsistent, or misleading. Older and post-Soviet editions may also have indirect translation chains that are not clearly disclosed, such as a work translated through an intermediary language.
Readers, translators, publishers, and bookstores need better visibility into what exists, what is coming, and how trustworthy the metadata is.
Product idea
World Publishing Houses is designed as a shared data layer with different audience modes. Reader mode focuses on what can be read in English now, what is coming soon, and what is not yet available. Professional mode supports rights, acquisition, translation, and market-intelligence workflows.
Reader value
Discover books by country, language, publisher, translator, availability, and reading status.

Professional value
Track rights signals, upcoming translations, publisher activity, market gaps, and cross-country opportunities.

Trust value
Show sources, attribution confidence, translation routes, and disputed or unverified metadata.
My role
I am building this as a side project and portfolio data product. My work includes product strategy, data modeling, QA review, UX testing, acceptance criteria, country-page logic, reader/professional mode definitions, and the trust model around translators and editions.
- Designed the Denmark and Iceland pilot markets as early country templates.
- Defined reader-facing flows such as "Read now in English", "Coming soon in English", and "Not yet in English".
- Created QA test prompts and blocker-focused sign-off checks for production routes.
- Modeled translation path transparency as a core trust signal.
- Connected coursework ideas from ML, text mining, classification, and evaluation to a real product.
Technical and data concepts
Data modeling
Works, editions, publishers, countries, languages, translators, illustrators, releases, rights signals, provenance, and verification states.
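A minimal sketch of how a few of these entities might relate, with per-source provenance and a verification state attached to each edition. All names here (`Work`, `Edition`, `Provenance`, `Verification`) are illustrative assumptions, not the real data model:

```python
from dataclasses import dataclass, field
from enum import Enum

class Verification(Enum):
    VERIFIED = "verified"
    UNVERIFIED = "unverified"
    DISPUTED = "disputed"

@dataclass
class Provenance:
    source: str  # e.g. a retailer feed or a rights catalog
    verification: Verification = Verification.UNVERIFIED

@dataclass
class Edition:
    isbn: str
    language: str
    publisher: str
    country: str
    provenance: list[Provenance] = field(default_factory=list)

@dataclass
class Work:
    title: str
    original_language: str
    editions: list[Edition] = field(default_factory=list)

    def available_in(self, lang: str) -> bool:
        """Reader-mode question: does any edition exist in this language?"""
        return any(e.language == lang for e in self.editions)

work = Work("Example Title", "da")
work.editions.append(
    Edition("9781234567897", "en", "Example Press", "GB",
            [Provenance("retailer feed")])
)
```

Keeping provenance as a list per edition, rather than a single flag, is what allows conflicting sources to coexist and be surfaced as "disputed" rather than silently overwritten.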
Quality system
Metadata validation, route testing, acceptance checks, low-data vs rich-data UX, duplicate detection, trust labels, and source traceability.
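One concrete example of the metadata-validation layer: an ISBN-13 carries its own checksum (alternating weights of 1 and 3), so malformed identifiers can be rejected before they pollute duplicate detection. A minimal check, independent of any particular pipeline:

```python
def isbn13_is_valid(isbn: str) -> bool:
    """Validate an ISBN-13 via its weighted checksum.

    Digits at even positions are weighted 1, odd positions 3;
    the total must be divisible by 10. Hyphens/spaces are ignored.
    """
    digits = [c for c in isbn if c.isdigit()]
    if len(digits) != 13:
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0

print(isbn13_is_valid("978-0-306-40615-7"))  # True
print(isbn13_is_valid("978-0-306-40615-8"))  # False (bad check digit)
```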
ML opportunities
Entity resolution, publisher classification, translation likelihood prediction, recommender systems, clustering, text mining, and anomaly detection.
Product UX
Separate reader and professional lenses so the same data supports different jobs without overwhelming first-time users.
Why this project matters
The project demonstrates how I think about data products: the goal is not only to collect data, but to make the data trustworthy, explainable, useful, and testable. It is a strong bridge between my QA background and my data science direction.
My working principle for this project is: AI can assist discovery, but data provenance is the source of truth.