| Author | Do Duc Hanh |
| Call Number | AIT Thesis no. CS-95-05 |
| Subject(s) | Data compression (Computer science) |
| Note | A thesis submitted in partial fulfillment of the requirements for the degree of Master of Engineering |
| Publisher | Asian Institute of Technology |
| Abstract | Digital libraries require efficient methods for storing vast amounts of information in a way that supports fast search and retrieval. These goals conflict: decompression increases access time, and the index needed for searching enlarges the stored data. This study investigated efficient compression methods for large Vietnamese text documents in order to build databases for digital libraries. The characteristics of Vietnamese text were analyzed. A zero-order word-based model coupled with canonical Huffman coding was used to compress the documents. An in-place merging algorithm was then used to create inverted files. Finally, integer coding methods were used to reduce the space required by temporary and inverted files. With the proposed approach, documents can be decoded quickly and full-text queries are supported on the compressed documents. The size of the compressed database (including an index of every word) is about 40% of the original text size. |
| Year | 1995 |
| Type | Thesis |
| School | School of Engineering and Technology (SET) |
| Department | Department of Information and Communications Technologies (DICT) |
| Academic Program/FoS | Computer Science (CS) |
| Chairperson(s) | Huynh, Ngoc Phien; |
| Examination Committee(s) | Phan, Minh Dung;Batanov, Dentcho N.; |
| Scholarship Donor(s) | The Swedish International Development Authority Agency (SIDA); |
| Degree | Thesis (M.Eng.) - Asian Institute of Technology, 1995 |
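
The compression step named in the abstract (a zero-order word-based model with canonical Huffman coding) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the whitespace tokenization, the bit-string output, and the sample sentence are all assumptions made for illustration. A practical word-based coder would also model the separators between words and would decode from a per-length table rather than a dictionary lookup.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:  # a lone symbol still needs one bit
        return {next(iter(freqs)): 1}
    tie = count()  # unique tie-breaker so the heap never compares lists
    heap = [(f, next(tie), [sym]) for sym, f in freqs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    while len(heap) > 1:
        f1, _, group1 = heapq.heappop(heap)
        f2, _, group2 = heapq.heappop(heap)
        for sym in group1 + group2:  # each merge adds one bit to its members
            lengths[sym] += 1
        heapq.heappush(heap, (f1 + f2, next(tie), group1 + group2))
    return lengths


def canonical_codes(lengths):
    """Assign canonical codes: sort by (length, symbol), then count upward."""
    codes = {}
    code = 0
    prev_len = 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len  # pad with zeros when the length grows
        codes[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes


def encode(words, codes):
    """Concatenate the codeword of each word (as a string of '0'/'1')."""
    return "".join(codes[w] for w in words)


def decode(bits, codes):
    """Decode a bit string; valid because the code is prefix-free."""
    inverse = {c: w for w, c in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return out


# Hypothetical sample text; tokens are whitespace-separated (an assumption).
text = "tôi học tôi đọc tôi viết"
words = text.split()
lengths = huffman_code_lengths(Counter(words))  # zero-order word frequencies
codes = canonical_codes(lengths)
bits = encode(words, codes)
print(codes)
print(bits, len(bits))
```

Because canonical codes are fully determined by the code lengths, a decoder only needs the sorted word list and the number of codes of each length, which keeps the stored model small for a large vocabulary.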