Roger Johnson

Research Activities


Interval and Temporal Database Management

I have had a long involvement in temporal data management. The major result was the development, with my then PhD student Dr Nikos Lorentzos [3], of the first temporal interval data type to support temporal data [1], which we subsequently generalised to generic intervals [2]. This work later influenced the ISO Standards Committee on Database Languages in its consideration of potential temporal extensions, as noted in the book by Date, Darwen and Lorentzos, “Temporal Data and the Relational Model”.
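To give a flavour of what an interval data type enables, here is a minimal Python sketch, illustrative only, of coalescing overlapping or adjacent time intervals; the formal operators are defined in [1, 2].

# A minimal sketch of interval coalescing over half-open [start, end)
# integer time intervals. Illustrative only; the formal fold/unfold
# operators are defined in [1, 2].

def coalesce(intervals):
    """Merge overlapping or adjacent [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:      # overlaps or meets
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# e.g. an employee's contiguous contract periods collapse to one:
print(coalesce([(1, 5), (5, 9), (12, 15)]))  # [(1, 9), (12, 15)]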

 

I continued this work, exploring the use of nested relations with my research students Yung Jang and Georgia Garani. With Yung Jang I addressed issues in managing temporal data with a nested data model [4] and subsequently with an object model [5]. With Georgia Garani I published the first generalised nested join algorithm [6]; a fuller discussion is in her recently examined PhD thesis [8]. A toy sketch of the underlying idea follows.
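As a rough illustration only (not the generalised algorithm of [6], which handles join attributes at arbitrary nesting depths), here is a Python sketch of joining two nested relations on a shared top-level key, merging their nested sub-relations.

# A toy illustration of joining nested relations on a shared top-level
# key. The generalised algorithm in [6] also handles join attributes
# inside sub-relations; this sketch does not.

def nested_join(r, s, key):
    """Join two lists of dicts on `key`; nested lists are concatenated."""
    s_index = {}
    for t in s:
        s_index.setdefault(t[key], []).append(t)
    result = []
    for u in r:
        for v in s_index.get(u[key], []):
            joined = dict(u)
            for attr, val in v.items():
                if attr == key:
                    continue
                # concatenate sub-relations, copy atomic attributes
                if isinstance(val, list) and isinstance(joined.get(attr), list):
                    joined[attr] = joined[attr] + val
                else:
                    joined[attr] = val
            result.append(joined)
    return result

emp = [{"name": "Ann", "periods": [(1990, 1995)]}]
dept = [{"name": "Ann", "dept": "Sales", "periods": [(1995, 1999)]}]
print(nested_join(emp, dept, "name"))
# [{'name': 'Ann', 'periods': [(1990, 1995), (1995, 1999)], 'dept': 'Sales'}]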

 

Arising from a comparison of two potential temporal extensions, ATSQL and IXSQL, my research student James Green and I saw a need to evaluate which extension was the more “usable”, and why. Drawing on the HCI literature on the subject, we are currently conducting a series of experiments in which subjects study a temporal extension and then attempt a series of tests that involve writing queries in the relevant SQL extension.

 

During the design of the tests it became clear that results could be influenced by whether subjects took a “pencil and paper” test or were tested in an online simulation. A generalised package, ProSQL, was therefore developed to build simulations of SQL extensions [7].

 

1. Extending Relational Algebra to Manipulate Temporal Data (with N A Lorentzos)

Information Systems, Vol 13, No 3 (1988), pp 289-296.

2. An Extension of the Relational Model to Support Generic Intervals (with N A Lorentzos)

Proceedings of the International Conference on Extending Database Technology (EDBT'88), Venice, Italy, in Advances in Database Technology - EDBT'88, Ed. J W Schmidt, S Ceri and M Missikoff, Springer-Verlag, 1988, pp 528-542.

3. A Formal Extension of the Relational Model for the Representation and Manipulation of Generic Intervals

N A Lorentzos, PhD thesis, London University, 1988.

4. A Heterogeneous Temporal Nested Relational Data Model (with Y P Jang*)

Proceedings of the 7th IEEE-CS International Symposium on Computer and Information Sciences, Antalya, Turkey, IEEE-CS, 1992, pp 369-375.

5. Evolutions of Object States in Temporal Object-Oriented Databases (with Y P Jang*)

Proceedings of the 22nd ACM Computer Science Conference (CSC'94), Phoenix, Arizona, ACM, 1995, pp 304-311.

6. Joining Nested Relations and Sub-relations (with G Garani*)

Information Systems, Vol 25, No 4 (2000), pp 287-307.

7. ProSQL - A Prototyping Tool for SQL Language Extensions (with J Green*)

Proceedings of the 20th British National Conference on Databases (BNCOD 20), Coventry, UK, July 2003, Springer-Verlag, LNCS, pp 190-197.

8. A Temporal Database Model using Nested Relations

G Garani, PhD thesis, London University, 2003.

 

Computer History

Birkbeck’s School of Computer Science and Information Systems can trace its roots back to Prof (then Dr) A D Booth’s Computer Laboratory, founded to support J D Bernal’s studies of crystal structures. Booth began building the college’s first computer, using relay technology, in 1946. His first electronic computer, SEC, was completed around 1950. Booth published an algorithm for a parallel multiplier which still forms the basis of the multiplication circuits in a modern PC. He also pioneered rotating memories: owing to engineering problems his attempt in the late 1940s to build a workable disc failed, but he succeeded in building the world’s first drum store, and drum stores were widely used in the 1950s for both main memory and backing store. His best known machine, APE(X)C, was designed in 1949.

In 1951, BTM used the APE(X)C hardware as the basis of the design of their HEC1 computer, which by 1956 had evolved directly into the HEC4, renamed the ICT 1201. This machine was the best selling British computer at the end of the 1950s, with a total of about 100 machines installed. A paper, based in part on interviews with the developer of the HEC1 and a research student of Booth’s, is in preparation.
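For readers unfamiliar with Booth’s multiplier algorithm mentioned above, here is a minimal Python sketch of its radix-2 form; modern hardware multipliers typically use the radix-4 (“modified Booth”) descendant of the same recoding idea.

# A minimal sketch of Booth's multiplication algorithm (radix-2).
# Runs of 1s in the multiplier are recoded so that each run costs
# one addition and one subtraction rather than one addition per bit.

def booth_multiply(m, q, bits=8):
    """Multiply m by the `bits`-wide two's-complement value of q."""
    q &= (1 << bits) - 1             # view the multiplier as a bit string
    product, prev = 0, 0             # prev is the appended q(-1) bit
    for i in range(bits):
        cur = (q >> i) & 1
        if (cur, prev) == (0, 1):    # end of a run of 1s: add m * 2^i
            product += m << i
        elif (cur, prev) == (1, 0):  # start of a run of 1s: subtract m * 2^i
            product -= m << i
        prev = cur
    return product

print(booth_multiply(6, -7))   # -42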

 

The School taught possibly the first Masters degree course in computing when it began the MSc in Numerical Automation in 1957. At the same time the Computing Laboratory was re-designated the Department of Numerical Automation, making it one of the world’s first formally established academic computing departments, as distinct from “computing laboratories”.

 

I have created a (very large) PowerPoint version of the history.

 

I was Programme Chair of the BCS@50 conference marking the BCS’s 50th anniversary, held on 12-14 July 2007 and celebrating the British contribution to computing. My talk is available here.

 

Another strand of research is being undertaken by my research student Bruce Williams, an authority on calculators, who is studying the history of devices for multiplication, especially those related to the needs of commerce [1].

 

1. Ready Reckoners (with B O B Williams*)

IEEE Annals of the History of Computing, Vol 27, No 4 (2005), pp 64-80.

2. Tabular Calculators (with B O B Williams*)

Under revision for IEEE Annals of the History of Computing.

3. HEC 1200 Computer

In preparation, for submission to IEEE Annals of the History of Computing.

 

Software Re-use

Software re-use has been a major challenge for many years. I am co-supervising a research student who is exploring issues relating to the re-use of Java components. Having started with several general aspects of the problem, the latest papers address the problem of obtaining a metric for the extent of coupling within Java code. So far two papers have been published [1, 2], one has been accepted [5], and two journal papers are under review [3, 4].
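As a rough illustration of the notion of dynamic coupling (the papers target Java and define their own metrics; the class and function names below are purely illustrative), here is a Python sketch that counts method calls crossing class boundaries at run time.

import inspect
from collections import Counter

# A rough sketch, in the spirit of [4, 5], of measuring coupling
# dynamically: count method calls whose caller and callee belong to
# different classes. Illustrative only; not the papers' metric.

coupling = Counter()

def traced(method):
    def wrapper(self, *args, **kwargs):
        caller = inspect.currentframe().f_back.f_locals.get("self")
        if caller is not None and type(caller) is not type(self):
            coupling[(type(caller).__name__, type(self).__name__)] += 1
        return method(self, *args, **kwargs)
    return wrapper

class Engine:
    @traced
    def start(self):
        return "running"

class Car:
    def __init__(self):
        self.engine = Engine()
    @traced
    def drive(self):
        return self.engine.start()   # a cross-class call: Car -> Engine

Car().drive()
print(coupling)   # Counter({('Car', 'Engine'): 1})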

 

1. Trends in Java Code Changes: the Key to Identification of Refactorings? (with Y Hassoun*, S Counsell)

Proceedings of the 2nd ACM International Conference on the Principles and Practice of Programming in Java, Kilkenny, Ireland, June 2003, pp 45-48.

2. Reusability, Open Implementation and Java’s Dynamic Proxies (with Y Hassoun*, S Counsell, K Mannock and E Mendes)

Proceedings of the 2nd ACM International Conference on the Principles and Practice of Programming in Java, Kilkenny, Ireland, June 2003, pp 3-6.

3. Reusability and Open Implementations in Distributed Environments (with Y Hassoun*, S Counsell)

Software - Practice and Experience, Wiley, Vol 35, No 1, pp 75-99.

4. A Dynamic Runtime Coupling Metric for Meta-Level Architecture (with Y Hassoun*, S Counsell)

Proceedings of the IEEE 8th European Conference on Software Maintenance and Reengineering (CSMR 2004), Tampere, Finland, March 2004, pp 339-346.

5. Dynamic Coupling Metric: Proof of Concept (with Y Hassoun*, S Counsell)

IEE Proceedings, Vol 152, No 6 (2005), pp 273-279.

 

ADEPT - Parallel Relational Database Engine

Processing operations involving interval data types imposed considerable computational loads in executing the additional operators required. At the time the School had a Meiko Computing Surface with 32 transputers. With my colleague Nigel Martin and our research assistant Xiaoding Zhao, I designed a parallel implementation of an SQL database engine, which was successfully implemented and demonstrated proof of concept. The architecture mapped the execution tree derived from the SQL query onto the transputers. Each node in the tree could be allocated a number of transputers reflecting the expected volume of tuples to be processed and the complexity of the operation. The results then “flowed” on to one of the transputers allocated to the next node in the tree.
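A minimal sketch of the allocation idea, with illustrative operator names and costs (this is not the ADEPT code itself):

# An illustrative sketch of the allocation strategy: each node of the
# query execution tree receives a share of the available processors
# proportional to its estimated cost. Rounding means the shares are
# approximate; the real engine's policy was more elaborate.

def allocate(plan, processors):
    """plan: list of (operator, estimated_cost); returns operator -> count."""
    total = sum(cost for _, cost in plan)
    shares = {}
    for op, cost in plan:
        shares[op] = max(1, round(processors * cost / total))
    return shares

plan = [("scan R", 10), ("scan S", 10), ("join", 50), ("sort", 30)]
print(allocate(plan, 32))
# {'scan R': 3, 'scan S': 3, 'join': 16, 'sort': 10}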

 

This engine was, to the best of the implementors’ knowledge, the first working parallel database engine in the UK. Work on the engine ceased when vendors such as IBM and Oracle launched products in which linked copies of their relational DBMSs ran on multiple processors, although we continued to work on the underlying algorithms for several more years.

 

1. PPS - A Parallel Partition Sort Algorithm for Multiprocessor Database Systems (with X Zhao*, N Martin)

Proceedings of the 11th International Conference on Database and Expert Systems Applications (DEXA), IEEE Computer Society, 2000, pp 635-644.

2. DBJ - A Dynamic Balancing Hash Join Algorithm in Multiprocessor Database Systems (with X Zhao*, N Martin)

Information Systems, Vol 18, No 2 (April 1994), pp 89-100.

3. Dynamic Distribution Estimation for Hash Partitioning in Multiprocessor Database Systems (with X Zhao*, N Martin)

Proceedings of the High Performance Computing Conference 94, National Supercomputing Research Centre, Singapore, National University of Singapore, pp 144-150.

4. An Improved Dynamic Balancing Hash Join Algorithm (with X Zhao*, N Martin)

Proceedings of EDBT'94, Extending Data Base Technology, Cambridge, March 1994, Springer-Verlag, 1994, pp 301-308.

 

Intelligent Data Analysis – Chemistry & Bioinformatics

My involvement with intelligent data analysis started with the SCAMS (Structure Characterisation and Automation for Mass Spectrometry) project, a collaboration between our department, Kodak UK and VG Biotech (later bought by Finnigan). Kodak’s laboratories carry out chemical experiments to manufacture chemical compounds, and their scientists need to ascertain that a reaction has produced the expected result; this involves testing the product of the experiment with a mass spectrometer. SCAMS addressed the problem of assessing whether the chemical result of an experiment matched the result predicted by the chemist. This involved cleaning the mass spectrometry results, analysing the predicted results, and comparing the expected and actual results to determine whether they matched. The project was the subject of a number of papers, notably [1, 2, 3].
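A much simplified sketch of the matching step, with hypothetical masses and a tolerance value of my choosing (the actual SCAMS pipeline, including cleaning and meta-knowledge, is described in [1, 2, 3]):

# A simplified sketch of spectrum matching: score how many predicted
# peak masses appear in the measured (mass, intensity) list within a
# tolerance. Values below are hypothetical, for illustration only.

def match_spectrum(predicted, measured, tolerance=0.5):
    """Fraction of predicted peak masses found in the measured spectrum."""
    found = 0
    for mass in predicted:
        if any(abs(mass - m) <= tolerance for m, intensity in measured):
            found += 1
    return found / len(predicted)

predicted = [152.1, 304.2, 456.3]                 # hypothetical masses
measured  = [(152.0, 0.9), (304.5, 0.4), (500.0, 0.1)]
print(match_spectrum(predicted, measured))        # 0.666...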

 

Similar techniques were also applied to medical and bioinformatics data, as reported in [4, 5, 6]. The work reported in [4] led on to further work on protein structures, although most of it related to the integration of data from multiple sources into a re-implemented CATH database [7].

 

1. Knowledge-Based Data Generation (with H Dettmar*, X Liu et al)

Knowledge-Based Systems, Vol 11 (1998), pp 167-177.

2. Advanced Data Pre-processing in SCAMS (with X Liu, M Phalp*)

Proceedings of the 4th European Congress on Intelligent Techniques and Soft Computing, Aachen, Germany, September 1996, Vol 3, pp 1595-1599.

3. The Acquisition and Application of Meta-Knowledge in the SCAMS System (with H J Dettmar* et al)

Proceedings of the 7th Annual Conference (AICS'94), Dublin, pp 75-88.

4. Molecular simulations on proteins: A software protocol for analysis and comparison of simulation data in relation to experimental crystallographic data (with M Phalp*, M Knaggs and J Goodfellow)

Technical Report, Computer Science Department, Birkbeck College, 1997.

5. Domain Knowledge in Intelligent Data Analysis (with X Liu)

Proceedings of the 16th IFIP World Congress, Intelligent Information Processing Conference, Beijing, China, August 2000, pp 449-456.

6. Soft Computing for Intelligent Data Analysis (with X Liu et al)

Proceedings of the 18th International Conference of the North American Fuzzy Information Processing Society, IEEE Press, New York, 1999, pp 527-531.

7. PFDB: A Generic Protein Family Database integrating the CATH Domain Structure Database with Sequence Based Protein Family Resources (with A Shepherd*, P Kellam, N J Martin, C Orengo)

Bioinformatics, Vol 18, No 12 (2002), pp 1666-1672.

Email: R.Johnson@bcs.org.uk

