An Efficient way of Record Linkage System and Deduplication using Indexing techniques, Classification and FEBRL Framework
Nishand. K1, Ramasami. S2, T. Rajendran3
1Nishand.K, II-ME CSE Department of Computer Science and Engineering, Angel College of Engineering and Technology, Tirupur-641 665.
2Ramasami. S, Assistant Professor Department of Computer Science and Engineering, Angel College of Engineering and Technology, Tirupur-641 665.
3Dr. T.Rajendran, Dean Department of Computer Science and Engineering, Angel College of Engineering and Technology, Tirupur-641 665.
Manuscript received on May 11, 2013. | Revised Manuscript received on May 15, 2013. | Manuscript published on May 25, 2013. | PP: 69-73 | Volume-1 Issue-7, May 2013. | Retrieval Number: G0291051713/2013©BEIESP
© The Authors. Published By: Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Record linkage is an important process in data integration; it is used for matching, merging and removing duplicate records across several databases that refer to the same entities. Deduplication is the process of removing duplicate records within a single database. In recent years, data cleaning and standardization have become important steps in data mining tasks. Because of the complexity of today's databases, finding matching records even within a single database is a critical task. Indexing techniques are used to implement record linkage and deduplication efficiently. In this paper, three indexing techniques, namely blocking indexing, sorting indexing and bigram indexing, are used, with a modification of the existing techniques that reduces the variance in the quality of the blocking results. In addition to the indexing techniques, six comparison techniques and two classifiers are used. Using indexing techniques together with comparison and classification techniques offers the potential for large speed-ups and better accuracy.
Keywords: Record linkage, indexing techniques, data matching, blocking, Febrl framework
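As an illustration of why indexing speeds up deduplication, the following is a minimal sketch of standard blocking, assuming records are plain dictionaries. The field names (`surname`, `postcode`), the blocking key definition and the sample records are hypothetical choices made for this example; they are not taken from the paper and this is not the Febrl API. Only records that share a blocking key are paired for detailed comparison, so the number of candidate pairs falls well below the all-pairs count.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    """Hypothetical blocking key: first three letters of the surname plus postcode."""
    return record["surname"][:3].lower() + "|" + record["postcode"]


def candidate_pairs(records):
    """Group record ids by blocking key and pair up ids only within each block."""
    blocks = defaultdict(list)
    for rec_id, rec in records.items():
        blocks[blocking_key(rec)].append(rec_id)

    pairs = set()
    for ids in blocks.values():
        # Records in different blocks are never compared.
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs


# Made-up sample data for illustration only.
records = {
    1: {"surname": "Smith", "postcode": "2600"},
    2: {"surname": "Smithe", "postcode": "2600"},
    3: {"surname": "Jones", "postcode": "2600"},
    4: {"surname": "Smyth", "postcode": "2600"},
}

pairs = candidate_pairs(records)
all_pairs = len(list(combinations(records, 2)))
print(f"{len(pairs)} candidate pair(s) instead of {all_pairs}: {sorted(pairs)}")
```

In this sketch record 4 ("Smyth") lands in a different block from records 1 and 2, so a possible duplicate is missed. This sensitivity of the results to the choice of blocking key is the kind of quality variance that motivates alternatives such as sorting-based and bigram indexing.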