Arquivo.pt

From Wikipedia, the free encyclopedia

Arquivo.pt, formerly known as the Portuguese Web Archive, is a web archive that preserves Web content dating back to 1996.[1] It is a service of the Fundação para a Ciência e a Tecnologia (FCT) and was founded at the Fundação para a Computação Científica Nacional on 8 November 2007.[2]

Arquivo.pt regularly collects the websites that make up the Portuguese Web, that is, websites under the .pt top-level domain, as well as websites of national interest. Preserved content becomes available to any user on the Arquivo.pt website one year after it is collected.

As of March 2025, Arquivo.pt stores over 21 billion webpages from 47 million websites, totaling 1.4 petabytes of data.[3][4]

History


The idea of archiving the Portuguese Web originated in 2001 with the tumba! project, developed by the XLDB research group at the Faculty of Sciences of the University of Lisbon and supported by FCCN (Fundação para a Computação Científica Nacional). The project collected about 57 million pieces of content, mainly textual.[5] Tomba started from this project.[6]

On 8 November 2007, the Portuguese Web Archive project was created at FCCN,[7] building on the resources and skills acquired in the previous project. Daniel Coelho Gomes led the project from 2007 to 2025.[8] At the beginning of 2008, the project team performed its first web crawl of .pt websites. The project had a two-year maturation period and has since been turned into a permanent service of FCT.[9][10][11]

Services


Search and access


Arquivo.pt provides a tool for searching archived web pages by URL. It lets users access versions of the same page captured on different dates. Full-text search is also supported.

On 24 March 2021, Arquivo.pt introduced an image search feature known as Dionisius. This tool allows users to search for images archived from the web, dating back to 1996. Users can find images that are no longer available on the live web and locate the original web pages where those images were published.[12][13][14]

Archived pages can also be accessed programmatically through APIs, which were introduced in 2012.[15][16]
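As a rough illustration of programmatic access, the following Python sketch only builds a full-text search request URL and does not send it. The endpoint address and the parameter names (`q`, `maxItems`, `from`, `to`) are assumptions made for this example and should be checked against the current Arquivo.pt API documentation; `build_search_url` is a hypothetical helper, not part of any official client.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration; verify against the Arquivo.pt API docs.
TEXT_SEARCH_ENDPOINT = "https://arquivo.pt/textsearch"


def build_search_url(query, max_items=10, date_from=None, date_to=None):
    """Build a full-text search URL for the assumed endpoint (hypothetical helper)."""
    # Parameter names below are assumptions, not confirmed by this article.
    params = {"q": query, "maxItems": max_items}
    if date_from:
        params["from"] = date_from
    if date_to:
        params["to"] = date_to
    # urlencode percent-encodes values and keeps the insertion order of params.
    return f"{TEXT_SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("fccn", max_items=5, date_from="1996", date_to="2000"))
```

The resulting URL could then be fetched with any HTTP client; the response format is not described here and would need to be taken from the official documentation.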

ArchivePageNow


In 2022, Arquivo.pt launched ArchivePageNow. This feature allows users to archive a web page on demand. The archived pages then remain available for search.[17]

Arquivo404


In 2022, Arquivo.pt developed Arquivo404, an algorithm that allows web pages returning a 404 error to include a hyperlink to the preserved version of the page at Arquivo.pt.[18]
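The general idea can be sketched in Python as follows. This is not the Arquivo404 code: `fallback_link` is a hypothetical function, the replay URL prefix is an assumption, and the sketch does not check whether a preserved version actually exists.

```python
from html import escape

# Assumed replay URL prefix; not taken from the Arquivo404 source code.
ARCHIVE_PREFIX = "https://arquivo.pt/wayback/"


def fallback_link(status_code, requested_url):
    """Return an HTML link to an archived copy for 404 responses, else None."""
    if status_code != 404:
        return None
    archived = ARCHIVE_PREFIX + requested_url
    # Escape the URL so it is safe inside an HTML attribute.
    href = escape(archived, quote=True)
    return f'<a href="{href}">View a preserved version at Arquivo.pt</a>'


if __name__ == "__main__":
    print(fallback_link(404, "http://example.pt/page"))
```

A site's "page not found" template could embed the returned link so that visitors are pointed to an archived copy rather than a dead end.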

Others


Arquivo.pt Awards


Since 2018, the Arquivo.pt Awards have been organized under the sponsorship of the President of Portugal and in partnership with the Público newspaper. They recognize the best investigative works that use the features of Arquivo.pt.[20][21]

Awards and recognitions

  • 2008 - Best Paper Award for its work on measuring the Portuguese Web at the Ibero-American IADIS WWW/Internet 2008 conference.[22]
  • 2022 - Honour roll for security in Portugal according to the Portuguese Observatory of Internet Technologies.[23]
  • 2022 - Best Digital Service award.[24]
  • 2023 - Top 3 government digital services in Portugal.[25]
  • 2024 - Finalist for The National Archives (UK) Award for Safeguarding the Digital Legacy (Digital Preservation Coalition Awards 2024).[26]
  • 2024 - Best Central Public Administration Digital Project[27]
  • 2024 - Digital Transformation 2024 award[28][29]

References

  1. ^ Gomes, Daniel (2022-11-14). "Web archives as research infrastructure for digital societies: the case study of Arquivo.pt". Archeion. 123: 46–85. doi:10.4467/26581264arc.22.012.16665. ISSN 2658-1264.
  2. ^ Pinto, Pedro (2023-11-26). "Arquivo.pt já tem 1 PetaByte de informação guardada..." Pplware (in European Portuguese). Retrieved 2025-08-25.
  3. ^ "Arquivo.pt em números". Arquivo.pt (in European Portuguese). Retrieved 2025-08-25.
  4. ^ "Arquivo.pt in numbers". arquivo.pt. Retrieved 2025-08-25.
  5. ^ Gomes, Daniel. Arquivo e medição da Web portuguesa (PDF).
  6. ^ "História Arquivo.pt". Arquivo.pt (in European Portuguese). Retrieved 2025-08-25.
  7. ^ Pinto, Pedro (2023-11-26). "Arquivo.pt já tem 1 PetaByte de informação guardada..." Pplware (in European Portuguese). Retrieved 2025-08-25.
  8. ^ "Antigos Membros". Arquivo.pt (in European Portuguese). Retrieved 2025-08-25.
  9. ^ "História Arquivo.pt". Arquivo.pt (in European Portuguese). Retrieved 2025-08-25.
  10. ^ "List of the collections preserved by Arquivo.pt". Google Docs. Retrieved 2025-08-25.
  11. ^ "Arquivo.pt". arquivo.pt. 18 March 2008. Retrieved 2025-08-25.
  12. ^ SAPO. "Arquivo.pt tem mais de mil milhões de imagens históricas da internet pesquisáveis online". SAPO Tek (in Portuguese). Retrieved 2021-05-05.
  13. ^ "Milhões de imagens sobre o passado! – sobre.arquivo.pt" (in European Portuguese). 2022-08-23. Retrieved 2025-08-25.
  14. ^ Mourão, André; Gomes, Daniel (9–13 October 2023). Searching images in a web archive. International Conference on Data Science and Advanced Analytics. IEEE. pp. 1–10. doi:10.1109/DSAA60987.2023.10302607. ISBN 979-8-3503-4503-2. Retrieved 2024-08-27.
  15. ^ "APIs". GitHub. Retrieved 2025-08-25.
  16. ^ "História Arquivo.pt". Arquivo.pt (in European Portuguese). Retrieved 2025-08-25.
  17. ^ "Arquive páginas no Arquivo.pt com o ArchivePageNow". Arquivo.pt (in European Portuguese). 2025-01-07. Retrieved 2025-08-25.
  18. ^ "Arquivo404 mostra páginas preservadas em vez de "páginas não encontradas"". FCCN - serviços digitais da FCT (in European Portuguese). 21 April 2022. Retrieved 2025-08-23.
  19. ^ "CitationSaver preserves citations to web resources". 2023-04-20. Retrieved 2025-08-25.
  20. ^ "Prémios Arquivo.pt" (in European Portuguese). Retrieved 2025-08-23.
  21. ^ "Prémio Arquivo.pt". FCT (in Portuguese). Retrieved 2025-08-23.
  22. ^ Gomes, Daniel; Miranda, João (December 2008). "Arquivo e Medição da Web Portuguesa". Ibero-Americana IADIS WWW/Internet 2008.
  23. ^ "ISOC Portugal lança o Observatório da Internet portuguesa". Retrieved 20 August 2024.
  24. ^ "Os Melhores & As Maiores do Portugal Tecnológico 2022: conheça os vencedores". November 30, 2022. Retrieved August 20, 2024.
  25. ^ Fevereiro, Sara. "Quem são os líderes da transformação digital do país?". Expresso. Retrieved 20 August 2024.
  26. ^ "The National Archives (UK). The National Archives (UK) Award for Safeguarding the Digital Legacy". Retrieved 22 August 2024.
  27. ^ "Arquivo.pt receives award for Best Central Public Administration Project". 25 October 2024. Retrieved 4 November 2024.
  28. ^ "Arquivo.pt won the Digital Transformation 2024 award". 3 December 2024. Retrieved 5 December 2024.
  29. ^ "4.ª Edição – Prémio Transformação Digital (2024) – APDSI" (in European Portuguese). 2024-12-17. Retrieved 2025-08-25.