The Biostatistics Core is overseen by the Leadership and Administrative Core (LAC), operates in collaboration with the Genetics, Genomics and Molecular Core (GGMC), and serves as a resource for the Research Career Development Core (RCDC), Pilot Studies, and other supported research investigations.
The Biostatistics Core consists of two sub-units: the Data Administration Unit and the Data Analysis Unit.
The Data Administration Unit is led by a Senior Data Systems Manager, who reports to the Core leadership for the specification of activities but holds independent responsibility for the design, creation, maintenance and security of data entry systems, database structures, web applications and data-archiving procedures.
The Senior Data Systems Manager directly supervises the database programmer, Long He, PhD, and the SAS programmer. Dr. He implements procedures to develop and maintain the Older Americans Independence Center (OAIC) database. He also extends the OAIC website, which is key to completing the consortium of frailty-related databases we are developing, because web utilities will be the main portal for requesting and obtaining data. He has over four years of experience in the design and implementation of such applications.
The SAS programmer performs data cleaning, flattens raw files, constructs and documents analytic datasets, provides data for quality control analyses and assists researchers in utilizing data.
The Data Analysis Unit is staffed by two master's-level statisticians and a doctoral student in the JHSPH Department of Biostatistics. Wenliang Yao, MS, primarily supports the RCDC and the GGMC. Jin Xia, MS, assists with pilot, external and core-supported research, and produces educational manuals and seminars.
Both statisticians collaborate with OAIC investigators to carry out routine and advanced data analyses that address scientific aims, support grant submissions, presentations and LAC needs, and co-author manuscripts. The doctoral student in Biostatistics collaborates with the leadership of this Core and the GGMC on projects requiring intensive statistical programming (e.g., development projects).
Finally, an administrative assistant helps the Senior Data Systems Manager prepare codebooks and periodic reports summarizing the current status of the database. This work gives the assistant substantial knowledge of OAIC data, so this position also helps investigators become familiar with the data available for their use.
Data Management Resources
Data Acquisition and Repository System (DARS) is a web-based system for facilitating data storage, display and access by investigators.
DARS consists of two modules, a data acquisition module and a data repository module, that function independently but are linked at the back end through a SQL Server data warehouse. The data acquisition module allows investigators to submit proposals for analysis of available databases and lets reviewers comment on and rate proposals online.
Additional features include email notification (e.g., when comments are added to proposals), limits on proposal length, and simple scoring forms that streamline submission and review. Once an analysis proposal is approved, the investigator can access the data repository module and select the sections of the relevant database that are appropriate for the study.
The data repository module provides a simple way to browse archived datasets at the form, section, and question levels. At each level, the system clearly indicates whether the form, section, or question was included at each exam visit of the study.
Most importantly, at the question level, investigators can view summary statistics, such as the number of null and non-null values for each question by visit, and can see summary counts of the individual responses to each question and compare those counts from one visit to the next. DARS also includes an inflectional search mechanism: users enter any search term, and the system returns all questions that contain that term (or related forms of it) and all questions with responses that contain the term.
For example, using the word “drive” as a search term returns questions containing “drive,” “driven,” “driving,” “drives,” etc. Questions such as “When you go to the doctor or other medical care, how do you usually get there?” in which the response could be “drives self” are also returned.
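To make the search behavior concrete, here is a minimal sketch of how an inflectional search over questions and their response options could work. It is not the DARS implementation: the toy suffix-stripping stemmer, the `Question` record, and the `search` function are all hypothetical. It reduces each word to a crude stem, so "drive," "driven," "driving," and "drives" all map to the same stem, and it matches a question if either its wording or any of its response options shares the stem of the search term.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy stemmer (an assumption, not the DARS method).
# Longer suffixes are tried first; irregular forms such as "drove" are not handled.
SUFFIXES = ("ing", "en", "es", "ed", "s", "e")
MIN_STEM = 3  # never strip a word down to fewer than 3 letters


def stem(word: str) -> str:
    """Reduce a word to a crude stem by stripping one common suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def stems(text: str) -> set[str]:
    """Stem every alphabetic token in a piece of text."""
    return {stem(tok) for tok in re.findall(r"[A-Za-z]+", text)}


@dataclass
class Question:
    """Hypothetical question record: wording plus its response options."""
    text: str
    responses: list[str] = field(default_factory=list)


def search(term: str, questions: list[Question]) -> list[Question]:
    """Return questions whose wording or any response option shares a stem with term."""
    target = stem(term)
    return [
        q for q in questions
        if target in stems(q.text) or any(target in stems(r) for r in q.responses)
    ]


# Toy questions for illustration only (not actual WHAS items).
questions = [
    Question("Do you still drive a car?"),
    Question("How many miles have you driven in the past week?"),
    Question(
        "When you go to the doctor or other medical care, how do you usually get there?",
        responses=["drives self", "someone else drives", "public transportation"],
    ),
    Question("Do you have difficulty climbing stairs?"),
]

for hit in search("driving", questions):
    print(hit.text)
```

Searching for "driving" returns the first three questions; the third matches only through its "drives self" response option. A production system would more likely rely on a real stemmer or lemmatizer, or on the database's own full-text search, rather than a hand-written suffix list.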
DARS has been implemented for the Women’s Health and Aging Study data. The search feature has become the most popular way to browse these data because it lets researchers quickly find what they are looking for. DARS also allows project staff to track and manage data-access requests from multiple users.
The Women's Health and Aging Study Blood Database Reporting System (WHAS Blood Database Reporting System) is a web application that gives users real-time summaries of the database. The database currently contains over 100 blood measures collected yearly over three years in WHAS I and every 18-36 months over 11 years in WHAS II. Using a simple online query form, users can automatically generate summary reports, such as a study-level summary of the number of available blood draws and the number of non-missing values for each blood measure by study visit, and a subject-level summary of the number and timing of available blood draws, along with assay information (e.g., manufacturer, coefficients of variation (CVs)).
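The two report types described above reduce to simple aggregations over blood-draw records. The sketch below illustrates them under stated assumptions: the `BloodDraw` schema, the measure names ("IL6", "CRP"), and the toy records are hypothetical and are not WHAS data, and assay information (manufacturer, CVs) is not modeled. The study-level summary counts draws and non-missing values per measure at each visit; the subject-level summary lists the number and dates of one subject's draws.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class BloodDraw:
    """One blood draw for one subject at one study visit (hypothetical schema)."""
    subject_id: str
    visit: int
    draw_date: date
    measures: dict[str, Optional[float]]  # measure name -> assay value, None if missing


def study_level_summary(draws):
    """Per visit: number of draws and number of non-missing values for each measure."""
    n_draws = defaultdict(int)
    non_missing = defaultdict(lambda: defaultdict(int))
    for d in draws:
        n_draws[d.visit] += 1
        for name, value in d.measures.items():
            # Adding 0 for a missing value still records the measure with a zero count.
            non_missing[d.visit][name] += int(value is not None)
    return {
        visit: {"draws": n_draws[visit], "non_missing": dict(non_missing[visit])}
        for visit in sorted(n_draws)
    }


def subject_level_summary(draws, subject_id):
    """Number and timing (dates, in visit order) of one subject's available draws."""
    mine = sorted((d for d in draws if d.subject_id == subject_id), key=lambda d: d.visit)
    return {"draws": len(mine), "dates": [d.draw_date for d in mine]}


# Toy records for illustration only.
draws = [
    BloodDraw("S001", 1, date(1995, 3, 1), {"IL6": 2.1, "CRP": None}),
    BloodDraw("S002", 1, date(1995, 4, 2), {"IL6": 3.4, "CRP": 1.2}),
    BloodDraw("S001", 2, date(1996, 3, 5), {"IL6": None, "CRP": 0.9}),
]

print(study_level_summary(draws))
print(subject_level_summary(draws, "S001"))
```

For these toy records, visit 1 has 2 draws with 2 non-missing IL6 values and 1 non-missing CRP value, and subject S001 has 2 draws. Recording zero counts explicitly keeps every measure visible in the report even when a visit has no usable values for it.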
We are committed to strengthening RCDC support of promising junior investigators by building their basic statistical skills, their facility with modern tools for acquiring and transmitting data, their breadth of access to data, and their ability to apply the most effective statistical methods to their research.
One way to accomplish this is through the superb quantitative education and intellectual enrichment opportunities already available at our institution, prominently including courses offered by the Bloomberg School of Public Health (e.g., a two-term online introduction to biostatistics, 140.611-2).
However, we have also recognized a need to supplement existing statistical training with material that disseminates development project research, focuses on frailty, or tailors general statistical learning to aging. To this end, we have begun developing video teaching modules. We have completed a module on frailty measurement, and filming is nearly complete for modules on longitudinal data analysis and missing data. We design these modules to be both informational and application-based.