DSpace Data Harvesters
Automatic ingestion of publications from global scholarly databases, directly into your DSpace repository
What is DSpace Data Harvesters?
For years, keeping a repository current has depended on manual submission by researchers or time-consuming data entry by administrative staff. DSpace Data Harvesters by PCG Academia change this fundamentally. Instead of waiting for publications to arrive, the harvesters connect to major global scholarly databases – Web of Science, Scopus, OpenAlex and Crossref – and pull in records automatically. Institutional publications are matched using two complementary methods: the institution’s ROR identifier, and the affiliation information declared in the source database. Because a record can match on either signal, publications that lack a ROR identifier but carry a recognisable affiliation are still picked up, and vice versa. Every harvested record enters a review workflow before publication, so institutions gain both automation and full editorial control.
Whether seeding a new repository from scratch or keeping an established one continuously up to date, Data Harvesters turn repository completeness from an ongoing effort into a natural outcome of how the system works.
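The two-track matching described above can be illustrated with a minimal Python sketch. It is not the product's implementation: the record shape, the function names, the ROR value and the institution name are all hypothetical, and the affiliation check is a simple substring comparison chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class HarvestedRecord:
    """Simplified publication record from an external source (hypothetical shape)."""
    title: str
    source: str                                        # e.g. "OpenAlex", "Scopus"
    ror_ids: set = field(default_factory=set)          # ROR IDs attached to the record
    affiliations: list = field(default_factory=list)   # raw affiliation strings


def normalize(text):
    """Lowercase and collapse whitespace so affiliation strings compare loosely."""
    return " ".join(text.lower().split())


def match_institution(record, ror_id, name_variants):
    """Return "ror", "affiliation", or None, depending on how the record matched."""
    # Track 1: exact match on the institution's ROR identifier.
    if ror_id in record.ror_ids:
        return "ror"
    # Track 2: fall back to the affiliation text declared in the source database.
    variants = [normalize(v) for v in name_variants]
    for raw in record.affiliations:
        cleaned = normalize(raw)
        if any(v in cleaned for v in variants):
            return "affiliation"
    return None


# Hypothetical institution and sample records, for illustration only.
ROR = "https://ror.org/0example00"
records = [
    HarvestedRecord("Paper A", "OpenAlex", ror_ids={ROR}),
    HarvestedRecord("Paper B", "Scopus",
                    affiliations=["Dept. of Physics,  Example University, City"]),
    HarvestedRecord("Paper C", "Crossref", affiliations=["Another Institute"]),
]
results = [match_institution(r, ROR, ["Example University"]) for r in records]
```

In this sketch, Paper A matches on the ROR identifier, Paper B matches on its affiliation string even though it carries no ROR identifier, and Paper C is left out. A real deployment would likely need more careful name matching than a substring test.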
Key benefits
Automatic retrieval of publications from Web of Science, Scopus, OpenAlex, and Crossref – no manual submission required
Precise institutional matching via ROR identifier or source-system affiliation data – two complementary methods that together minimise the chance of a relevant publication being overlooked
Controlled review workflow – every harvested record is verified, enriched, and approved by staff before going live
Bulk approval for large volumes – practical handling of high-volume ingestion without turning every item into a separate task
Continuous scheduled harvesting – the repository stays current with the institution’s actual research output automatically
Implementation delivered by PCG Academia, a DSpace Platinum Service Provider, with high service standards backed by years of experience in the academic environment
Key features
How does it fit into the university ecosystem?
Proven approach, real results
A solution that turns repository completeness from a manual effort into an automatic, ongoing outcome – removing the dependency on individual researchers remembering to submit.
Built on DSpace, the most widely adopted open-source repository platform in academic institutions worldwide, and maintained by PCG Academia, a LYRASIS-certified Platinum DSpace Service Provider
A clear, controlled data flow: external database → harvesting → review → live repository – with staff in full control at every stage
Ready for:
- new harvesting sources as they become available;
- changes in institutional identifier standards (ROR, ORCID, local IDs);
- growing repository volume without growing administrative burden
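The data flow outlined above (external database → harvesting → review → live repository) can be pictured as a small state machine with a bulk approval step. The following Python sketch is an assumption-laden illustration, not DSpace's workflow API: the status names, the "rejected" state and the function names are hypothetical.

```python
from enum import Enum


class Status(Enum):
    HARVESTED = "harvested"   # pulled in from an external database
    IN_REVIEW = "in_review"   # waiting for staff verification
    LIVE = "live"             # published in the repository
    REJECTED = "rejected"     # assumed outcome for records staff decline


# Allowed transitions: nothing reaches LIVE without passing through review.
ALLOWED = {
    Status.HARVESTED: {Status.IN_REVIEW},
    Status.IN_REVIEW: {Status.LIVE, Status.REJECTED},
    Status.LIVE: set(),
    Status.REJECTED: set(),
}


def advance(current, target):
    """Move a record to the target status, or raise if the step is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


def bulk_approve(queue):
    """Approve every record currently in review; leave all others untouched."""
    return {
        record_id: advance(status, Status.LIVE) if status is Status.IN_REVIEW else status
        for record_id, status in queue.items()
    }


# Hypothetical review queue keyed by record ID.
queue = {
    "rec-1": Status.IN_REVIEW,
    "rec-2": Status.IN_REVIEW,
    "rec-3": Status.HARVESTED,
}
approved = bulk_approve(queue)
```

Here the two reviewed records go live in one step, while the record that has not yet entered review stays where it is. Attempting to move a record straight from harvested to live raises an error, which mirrors the idea that staff stay in control at every stage.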
Next steps
Process demo: harvesting from Web of Science, Scopus, and OpenAlex – with a walkthrough of the review and approval workflow
Architecture consultation: matching the harvesting configuration to your institution’s identifier setup and repository environment
Discussion of the implementation scenario and expected outcomes for your repository’s completeness and staff workload
Contact us to schedule a DEMO
Contact the Business Department