"Az-buki" National Publishing House
Ministry of Education and Science
Natural Science and Advanced Technology Education

Challenges in Web Crawling for Data Collection

08-03-2024

Dr. Georgi Cholakov, Assist. Prof. 1), Dr. Emil Doychev, Assoc. Prof. 1),
Prof. Dr. Svetla Koeva 2)
1) Plovdiv University „Paisii Hilendarski“ Faculty of Mathematics and Informatics
2) Institute for Bulgarian Language „Prof. Lyubomir Andreychin“ - Bulgarian Academy of Sciences

https://doi.org/10.53656/math2024-1-1-cha

Abstract. The article presents the challenges of implementing a system for retrieving data from the Internet and visualising it, based on crawling language resources in the Hugging Face repository and extracting their associated data. The data in the system are updated at regular intervals so that the dynamics of language resource creation can be tracked over different time periods. The article presents: a) an analysis of the available data and their structure; b) the method chosen for crawling the pages and extracting the data. The experience gained in overcoming these specific challenges may help in solving similar problems involving the extraction of data from the Internet, a task that frequently arises in various projects (including school projects).
Keywords: web crawling; automatic data extraction; linguistic datasets
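As a rough illustration of the two steps named in the abstract (extracting links to dataset pages from crawled HTML, then aggregating creation dates per period to follow the dynamics of resource creation), the following Python sketch uses only the standard library. It is not the authors' implementation: the URL pattern `/datasets/<owner>/<name>`, the helper names `fetch`, `extract_dataset_links` and `counts_per_month`, the User-Agent string and the one-second delay are all hypothetical assumptions made for this example.

```python
import time
import urllib.request
from collections import Counter
from datetime import date
from html.parser import HTMLParser


def fetch(url: str, delay_s: float = 1.0) -> str:
    """Download one page. Hypothetical helper; the delay value is an assumption."""
    time.sleep(delay_s)  # crude politeness delay between consecutive requests
    req = urllib.request.Request(url, headers={"User-Agent": "crawler-sketch"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


class DatasetLinkParser(HTMLParser):
    """Collects hrefs matching the ASSUMED pattern '/datasets/<owner>/<name>'."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Exactly three slashes keeps '/datasets/owner/name' and drops sub-pages
        # such as '/datasets/owner/name/tree/main'; duplicates are skipped.
        if (href and href.startswith("/datasets/")
                and href.count("/") == 3 and href not in self.links):
            self.links.append(href)


def extract_dataset_links(html: str) -> list[str]:
    """Return unique dataset-page links in the order they appear in the HTML."""
    parser = DatasetLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


def counts_per_month(created: dict[str, date]) -> dict[str, int]:
    """Count resources per calendar month ('YYYY-MM'), sorted chronologically."""
    counter = Counter(d.strftime("%Y-%m") for d in created.values())
    return dict(sorted(counter.items()))
```

For example, `counts_per_month({"a": date(2024, 1, 5), "b": date(2024, 1, 20), "c": date(2024, 2, 1)})` returns `{"2024-01": 2, "2024-02": 1}`. Re-running such an aggregation after each scheduled update would give one snapshot per period; a production crawler would additionally need pagination, retries and adherence to the site's rate limits and terms of use.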

© 2012-2025 "Az-buki" National Publishing House