Research Design
SCIENCE, THEORIES AND HYPOTHESES:
Science is a step-by-step acquisition of knowledge. The goals of science are to describe natural events or phenomena, to understand and explain them, and to control them by understanding the causes of events and predicting their occurrences.
Speech and language science rests on many foundations, among which are theories and hypotheses. A theory is a comprehensive description and explanation of a total phenomenon. A hypothesis, on the other hand, is a more specific prediction stemming from a theory; hypotheses are therefore limited in scope compared to theories. For example, the behavioral theory of language learning explains the process of learning in all children around the world, whereas a hypothesis might address language learning in children with autism. To test their hypotheses, scientists gather data by systematic observation (empirical data, based upon events that result in some form of sensory contact) and, in many cases, by experimentation. Scientists observe events and record measured values of those events (e.g. the actual number of dysfluencies when stress is increased).
RESEARCH AND ITS TYPES:
Research can be defined as structured inquiry that utilizes acceptable scientific methodology to solve problems and create new knowledge that is generally acceptable. Research is what scientists do as they practice science. It is the process of asking and answering questions; it is science in action.
Research can be classified in different ways: from the application perspective, the objectives perspective and the mode of enquiry perspective. From the application perspective, research can be classified into pure and applied research. From the objectives perspective, it can be classified into descriptive, exploratory, correlational and explanatory research. From the mode of enquiry perspective, research can be classified into quantitative and qualitative research. The types most commonly used in the field of communication disorders are experimental and descriptive research.
EXPERIMENTAL RESEARCH:
The hallmark of experimental research is the investigation of cause-effect relationships, for example, studying the efficacy of a language intervention program on a child's academic achievement. This lends itself to a pre-test/post-test methodology in which the researcher determines academic achievement prior to the intervention and then again after the language intervention program has been implemented. However, in order to determine the actual impact of an intervention, a pre-test/post-test group must be compared with a control group. The goal of having these two groups is to demonstrate that the experimental participants improved while the control participants did not, thus showing the efficacy of treatment. Experimental studies must entail random assignment of units (e.g. people) to the levels, or categories, of the manipulated variable. In forming two or more groups, researchers use either randomization or matching. Using the first option, they randomly draw a sample, or a small number of the participants needed for the study, from a population. A population is a large, defined group (e.g. patients scheduled for laryngectomy surgery, people who stutter) identified for the purpose of a study. Randomly selected participants are then randomly assigned to different groups. These two kinds of randomization (random selection and random assignment) are expected to result in groups that are equal to begin with. Selection is random when each potential participant in the population has an equal chance of being selected for the study; assignment is random when each selected participant has an equal chance of being placed in any group. Together, these two levels of randomization reduce experimenter bias in selecting participants and help ensure that the sample is representative of the population.
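The two levels of randomization (random selection from a population, then random assignment to groups) can be sketched in a few lines of Python; the population and group sizes here are hypothetical, chosen only for illustration:

```python
import random

# Hypothetical population of 200 identified participants
# (a real population would be, e.g., people who stutter in a region).
population = [f"participant_{i}" for i in range(200)]

# Level 1, random selection: every member of the population
# has an equal chance of entering the 20-person sample.
sample = random.sample(population, 20)

# Level 2, random assignment: each sampled participant has an
# equal chance of landing in the experimental or control group.
random.shuffle(sample)
experimental_group = sample[:10]
control_group = sample[10:]
```

Because both steps use uniform randomness, neither the make-up of the sample nor the composition of the two groups reflects any choice by the experimenter.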
Quasi-experiments refer to investigations that have all the elements of an experiment except that the subjects are not randomly assigned to groups.
After assigning subjects to groups, experimenters manipulate independent variables to assess the effect of these variables upon dependent variables. Good experimental research also involves conditions that are carefully controlled to eliminate extraneous or confounding variables, to make sure that only the independent variable of interest is affecting the dependent variable. A confounding variable is an extraneous variable that is statistically related to (or correlated with) the independent variable: as the independent variable changes, the confounding variable changes along with it. Failing to take a confounding variable into account can lead to the false conclusion that the dependent variable is in a causal relationship with the independent variable.
DESCRIPTIVE RESEARCH
The main objective of descriptive research is to describe a certain phenomenon. However, it cannot establish cause-effect relationships, as there is no manipulation of variables. Descriptive research studies include comparative research, normative research, correlational research and ethnographic research.
The purpose of comparative research is to measure similarities and differences between groups of people with defined characteristics. Here, confounding variables are not controlled, e.g. patients with dementia might perform differently on receptive tasks than healthy subjects due to educational or socio-economic differences rather than the presence or absence of dementia. In addition, the variables are not termed independent and dependent variables; they are referred to as classification variables (for example, having or not having a history of dementia) and criterion variables (e.g. receptive scores).
Correlational research is another type of descriptive research. It measures the strength of relationships, or associations, between variables. However, it does not imply causation. A positive correlation means that as one variable increases, the other also increases. A negative correlation, on the other hand, means that as one variable increases, the other decreases. An example of a correlational descriptive study would be studying the correlation between autistic features and sensory integration dysfunction or communication skills in a group of children with autism.
Developmental (normative) research is descriptive research that measures changes in subjects over time as individuals get older. It can be longitudinal, cross-sectional or semi-longitudinal. Cross-sectional studies are simple in design and aim to find the prevalence of a phenomenon, problem, attitude or issue by taking a snapshot, or cross-section, of the population, giving an overall picture as it stands at the time of the study. In cross-sectional studies, participants from various age levels are selected and studied. An example would be taking samples of third, fourth and fifth graders and comparing their language to see how language develops with age, assuming that when the third graders grow up they will have the same language skills as the current fourth graders. Because it rests on this assumption, a cross-sectional study is less accurate than a longitudinal study, but it saves considerable time, effort and cost.
In longitudinal (cohort) studies, the same participants are studied over time. Some longitudinal studies last several months, while others can last decades. In longitudinal studies, variables are not manipulated and no causal relationships are detected. Though lengthy and expensive, longitudinal studies are more accurate than cross-sectional studies in describing naturally occurring phenomena, e.g. stages of pragmatic development in typically developing children.
A semi-longitudinal study is a compromise between cross-sectional and longitudinal studies. The total age span to be studied is divided into several overlapping age spans, and the subjects selected are those at the lower end of each age span; they are followed until they reach the upper end of their age span. For example, 3-year-olds might be followed until the age of 4, 4-year-olds until the age of 5, and 5-year-olds until the age of 6. The researcher can then make observations both between and within subjects as time passes.
Cohort (longitudinal) studies can be further subdivided into retrospective and prospective studies. Retrospective (ex post facto) means after-the-fact research, i.e. it examines information and specimens that have been collected in the past, e.g. an attempt to find out how many children admitted to a children’s hospital in the past five years had a swallowing disorder.
In contrast to retrospective research, prospective studies begin in the present and follow subjects into the future, e.g. one might design a study that follows the children who come to the outpatient clinic for post-cochlear implant rehabilitation.
Ethnographic research is a type of descriptive research that is relatively new in the field of communication disorders. It involves observation and description of naturally occurring phenomena and deals with qualitative data. Its disadvantages are that it is time consuming, often expensive, yields data that are difficult to quantify, and lacks the objectivity of experimental research. An example of ethnographic research would be studying how the production of the affricate /dʒ/ varies between children from Upper Egypt and those living in Cairo.
Survey research inspects the prevalence of a certain phenomenon by asking people, as opposed to direct observation. The tools most commonly used are questionnaires and interviews, which need to be carefully designed to avoid any possible bias.
RESEARCH PROCESS:
Research consists of three steps: posing a question, collecting data to answer the question, and presenting an answer to the question. These steps can be further divided into many sub-steps, among which are the following:
1. CHOOSING A TOPIC:
For a researcher to choose a topic, it is important to consider a broad area of inquiry and interest. This may be as broad as “language,” but it should be an area that is of interest to the researcher. However, a broad area is useful only at the beginning of a research plan.
Within a broader topic of inquiry, each researcher must begin narrowing the field into a few subtopics of greater specificity and detail. Oftentimes, students as well as professional researchers discover their topics in a variety of conventional and unconventional ways. Many researchers find that their personal interests and experiences help to narrow their topic. The researcher also has to consider whether it would be feasible to collect the data, and if so, whether the study would be ethical, valid and reliable to conduct. In the field of communication disorders, for example, a researcher might be interested in "language" but could focus more specifically on “language development in children.” Although this topic is still too broad for a research project, it is more focused and can be further specified into a coherent project.
2. FORMULATING A RESEARCH PROBLEM
A good research question has to address an important, relevant issue. It has to be logical, ethical, feasible to study and novel. Novelty means that there will be some new aspect of the study that has never been examined before. This does not mean that we should avoid replicating past research; in fact, not only is replication a good way to learn research methodology, it is how science is supposed to advance knowledge. However, when replicating a previous study, it is best to add or change one or two things to increase the novelty of the research.
A good research question needs to be “operationalizable”: Oftentimes, beginning researchers pose questions that cannot be operationalized, or assessed methodologically with research instruments. In general, the more abstract the idea, the harder it is to operationalize.
The research question has to also be of adequate cost-effectiveness value. It also needs to be within a reasonable scope: the more focused the research question, the more likely it will be a successful project. For example, a study that seeks to identify the prevalence of autism in a specific area is more likely to succeed than a comparable study that seeks to identify autism prevalence in the world population.
3. PLANNING A RESEARCH:
In designing a study, the researcher may find it helpful to consider the relationship between the research question (the question he wants to answer), the study design, and what the study is expected to answer, taking into consideration the anticipated errors of implementation. Good judgment by the investigator and advice from colleagues are needed for the many trade-offs involved and for determining the overall viability of the research. Estimating the sample size is also one of the most important early parts of planning a study.
4. CONDUCTING A RESEARCH
4-A) SELECTING A POPULATION:
Once the researcher has chosen a hypothesis to test in a study, the next step is to select the study population from the target population. A target population refers to all subjects of interest to whom the conclusions of the study will be applied. On the other hand, study population refers to the people actually available and accessible for study.
A researcher often cannot work with the entire population of interest, but instead must study a smaller sample of that population in order to draw conclusions about the larger group from which the sample is drawn [16]. An example of a population is the population of children who stutter in Egypt; an example of a sample is a group of third-grade Egyptian children who stutter, as opposed to those in the second grade; and an example of an element is a single child who stutters. In selecting the study population, researchers may need to specify entry criteria by using inclusion criteria, exclusion criteria and stratification.
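A minimal sketch of moving from accessible candidates to a study population and then to a random sample might look as follows (the records and criteria are hypothetical, chosen only for illustration):

```python
import random

# Hypothetical candidate records; fields are illustrative only.
candidates = [
    {"id": 1, "age": 8, "stutters": True,  "hearing_loss": False},
    {"id": 2, "age": 9, "stutters": True,  "hearing_loss": True},
    {"id": 3, "age": 8, "stutters": False, "hearing_loss": False},
    {"id": 4, "age": 9, "stutters": True,  "hearing_loss": False},
    {"id": 5, "age": 8, "stutters": True,  "hearing_loss": False},
]

# Inclusion criterion: children who stutter.
# Exclusion criterion: concomitant hearing loss.
study_population = [c for c in candidates
                    if c["stutters"] and not c["hearing_loss"]]

# Randomly sample from the accessible study population.
sample = random.sample(study_population, 2)
print([c["id"] for c in study_population])  # [1, 4, 5]
```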
4-B) COLLECTING DATA:
Data comprise observations on one or more variables; any quantity that varies is termed a variable, e.g. variables affecting a study on fundamental frequency levels can be age, gender, etc. Data are usually obtained from a sample of individuals that represents the population of interest.
The data collected from the study population can be quantitative or qualitative. Qualitative (categorical or nominal) data are verbal descriptions of attributes of events. In nominal data, a category is either present (e.g. hypernasality) or absent (normal nasality).
Numerical (quantitative) data are numerical descriptions of attributes of events, for example, a researcher's statement that, in a 5-minute spontaneous speech sample, the participants omitted word-final phonemes 75% of the time. Quantitative data are described as discrete when the variable can only take certain whole numerical values, e.g. the number of days on which stuttering occurred.
An ordinal scale is a numerical scale that can be arranged according to rank orders or levels. Ordinal scales use relative concepts such as greater than or less than. The intervals between the numbered categories are unknown.
An example of an ordinal scale of measurement is: 1 = strongly agree, 2 = agree, 3 = neutral, 4 = disagree and 5 = strongly disagree.
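Coding such responses numerically might look like the sketch below (the responses are invented). Because the intervals between ordinal categories are not assumed to be equal, the median, rather than the mean, is the appropriate summary:

```python
# Numerical codes for the ordinal scale above.
codes = {"strongly agree": 1, "agree": 2, "neutral": 3,
         "disagree": 4, "strongly disagree": 5}

# Hypothetical responses from five participants.
responses = ["agree", "neutral", "agree", "strongly agree", "disagree"]
coded = sorted(codes[r] for r in responses)

# Median of an odd-length ordered list: the middle element.
median = coded[len(coded) // 2]
print(coded, median)  # [1, 2, 2, 3, 4] 2
```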
An interval scale of measurement is a numerical scale that can be arranged according to rank orders or levels; the numbers on the scale must be assigned in such a way that the intervals between them are equal with regard to the attribute being scaled. The ratio scale has the same properties as the interval scale, but its numerical values must be related to an absolute zero point; zero indicates an absence of the property being measured. An example of a ratio scale is one that involves frequency counts in stuttering: it is possible to have zero instances of stuttering in a speech sample.
Sometimes arbitrary values such as scores are used when quantities cannot be measured directly. For example, the answers to a series of questions in a sensory integration questionnaire are summed to give an overall tactile/vestibular sensory dysfunction score.
4-C) DATA ENTRY:
While entering data, it is essential to avoid errors and missing data. For categorical data, numerical codes should be assigned to the categories before data entry. The researcher also has to thoroughly revise the data to detect any outliers. There are several reasons why subjects could be outliers: they may genuinely differ from other subjects, or they may have responded systematically without really thinking about what they were doing.
Cleaning data is a rather simple but necessary step. The researcher needs to check that all data lie within the expected range e.g. calculating mean scores for each item and then checking that the listed values lie within the expected range.
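A minimal range check of this kind can be sketched as follows (the scores and the expected range are hypothetical):

```python
# Hypothetical item scores that should lie between 0 and 10.
scores = [3, 7, 10, 42, 5, -1, 8]

expected_min, expected_max = 0, 10
out_of_range = [s for s in scores
                if not expected_min <= s <= expected_max]
clean = [s for s in scores
         if expected_min <= s <= expected_max]

print(out_of_range)             # [42, -1]
print(sum(clean) / len(clean))  # mean of the in-range values
```

Out-of-range values should be flagged and traced back to the source records rather than silently deleted.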
4-D) CARRYING OUT STATISTICAL ANALYSIS:
Statistics encompasses the methods of collecting, summarizing, analyzing and drawing conclusions from the data. The aim of any statistical study is to condense data in a meaningful way and extract useful information from them. It is true that a part of the theory of statistics concerns effective ways of summarizing and communicating masses of information which describe some situation. This part of the overall theory and set of methods is usually known as descriptive statistics.
DESCRIPTIVE STATISTICS:
Descriptive statistics simply describes the data pertaining to a population or a sample, specifically the center of the data (e.g. mean, median and mode), spread (variability of the data points), and shape of the plotted graph (e.g. symmetrical or not). There must be some evidence that the sample chosen is a representative of the target population as this will greatly affect the interpretation of the results obtained.
DESCRIBING THE AVERAGE VALUES OF DATA:
- The arithmetic mean: often called the mean, it is calculated by adding up all the values in a set and dividing this sum by the number of values in the set.
- The median: If data are arranged in order of magnitude, starting with the smallest value and ending with the largest value, then the median is the middle value of this ordered set.
- The mode: the value that occurs most frequently in a data set; if the data are continuous, they are usually grouped and the modal group calculated.
- The weighted mean (overall mean): When a weighted mean is used, certain values of the variable of interest are more important than the others.
- A trimmed mean is one in which the highest and lowest values are omitted, thus reducing the distorting effect of outliers. A 5% trimmed mean is one in which the top 5% and the lowest 5% of the data are removed.
- An approximate mean resembles the weighted mean but is used when data points are intervals
- A geometric mean summarizes changes over time as the average ratio or rate of change for example in tracking how fast the practice of speech and language pathology grew in Egypt in the last three years.
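The measures listed above can be computed directly with Python's standard statistics module (the data are invented; `geometric_mean` requires Python 3.8+):

```python
from statistics import mean, median, mode, geometric_mean

data = [2, 3, 3, 5, 7, 9, 40]  # hypothetical scores; 40 is an outlier

print(mean(data))    # arithmetic mean, pulled upward by the outlier
print(median(data))  # middle value of the ordered set: 5
print(mode(data))    # most frequent value: 3

# Trimmed mean: drop the single highest and lowest values first.
trimmed = sorted(data)[1:-1]
print(mean(trimmed))

# Weighted mean: some values count more than others,
# e.g. one score rated by 3 judges and another by 1 judge.
values, weights = [80, 90], [3, 1]
weighted = sum(v * w for v, w in zip(values, weights)) / sum(weights)
print(weighted)  # 82.5

# Geometric mean of yearly growth ratios (hypothetical figures).
print(round(geometric_mean([1.10, 1.20, 1.05]), 3))
```

Note how the trimmed mean (5.4) sits far below the untrimmed mean, showing the distorting effect of the single outlier.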
DESCRIBING THE SPREAD OF DATA / MEASURES OF DISPERSION:
If two summary measures of a continuous variable are reported, one giving an indication of the average and one describing the spread of the observations, then the data can be condensed in a meaningful way.
- Range: the difference between the highest and lowest scores.
- The standard deviation: the square root of the variance. We can think of the standard deviation as a sort of average of the deviations of the observations from the mean.
- Ranges derived from percentiles: the values of x that divide the ordered set into 10 equally sized groups (the 10th, 20th, etc. percentiles) are called deciles. The values of x that divide the ordered set into four equally sized groups (the 25th, 50th and 75th percentiles) are called quartiles. The 50th percentile is the median. By using percentiles, we can obtain a measure of spread that is not influenced by outliers, by excluding the extreme values in the data set and determining the range of the remaining observations.
- The variance: One way of measuring the spread of data is to determine the extent to which each observation deviates from the arithmetic mean. The larger the deviation, the greater the variability of the observation.
- If the distribution of data is relatively symmetrical, the three measures of central tendency will be approximately the same, as in a normal distribution or bell-shaped curve.
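These measures of spread can likewise be computed with the standard library (the data are hypothetical; population formulas are used for the variance and standard deviation):

```python
from statistics import pstdev, pvariance, quantiles

data = [4, 7, 8, 10, 12, 15, 18, 21]  # hypothetical fluency scores

print(max(data) - min(data))   # range: 17

# Population variance and its square root, the standard deviation.
print(pvariance(data))
print(round(pstdev(data), 2))

# Quartiles: the 25th, 50th and 75th percentiles.
q1, q2, q3 = quantiles(data, n=4)
print(q1, q2, q3)
print(q3 - q1)  # interquartile range, insensitive to extreme values
```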
INFERENTIAL STATISTICS:
Although descriptive statistics form an important basis for dealing with data, a major part of the theory of statistics is concerned with how one can go beyond a given set of data and make general statements about the large body of potential observations of which the data collected represent but a sample. This is the theory of inferential statistics.
4-E) WRITING A RESEARCH REPORT
The researcher then writes the research report with all its components, e.g. abstract, introduction, objectives, results and discussion. Research reports need to be comprehensive as well as readable. The report concludes by drawing conclusions, deriving possible implications and recommending future studies. While writing research reports, researchers need to refer adequately to recent references whenever possible.
EVALUATION OF A RESEARCH
Measures in speech-language pathology, whether they apply to research studies or clinical practice, need to be valid and reliable. Validity refers to whether or not a study is well designed and provides results that can appropriately be generalized to the population of interest. It is an indicator of how much meaning can be placed upon a set of test results. For example, a valid child language test should measure language skills, not auditory memory.
Any research should be evaluated based on two distinct features; internal validity and external validity. Internal validity supports the conclusion that the causal variable caused the effect variable in a specific study. Internal validity applies in studies that seek to establish a causal relationship between two variables, and it refers to the degree to which a study can make good inferences about this causal relationship. The essence of internal validity is whether or not a researcher can definitively state that the effects observed in the study were in fact due to the manipulation of the independent variable and not due to another factor.
“Third variables” that the researcher may not consider or may not be able to control can affect the outcome of a study and can therefore prevent internal validity, e.g. demonstrating the effect of a drug intake on voice and proving that the drug was the cause of voice change when, for example, non-gender-matched groups were used. In that case, the researcher cannot prove whether the differences between the two groups were secondary to the difference in male-female distribution across the two groups or due to a true effect of the independent variable (the drug) on the dependent variable (voice change). Good experimental techniques, in which the effect of an independent variable on a dependent variable is studied under highly controlled conditions (including the elimination of any confounding variables), usually allow for higher degrees of internal validity than, for example, single case designs. Unfortunately, many factors can reduce internal validity, including instrumentation, history, statistical regression, maturation, attrition, testing, subject selection biases and the interaction of factors.
External validity refers to generalizability, i.e. to what settings, populations, treatment variables and measurement variables the effect can be generalized; it is concerned with the extent to which the conclusions can be generalized to the broader population. A study is considered externally valid if the researcher’s conclusions can in fact be accurately generalized to the population at large (i.e. across time and space). External validity is usually split into two distinct types, population validity and ecological validity (whether the results can be applied in real-life situations), and both are essential elements in judging the strength of an experimental design.
External validity of a study can be threatened by several factors. These include the Hawthorne effect (the extent to which the participants' knowledge that they are taking part in research, or that they are being treated differently than usual, affects their behavior), subject selection, multiple treatment interference, and the reactive and interactive effects of pretesting. For example, people who regularly abused their voices through speaking loudly and using hard glottal attacks might fill out a questionnaire before treatment that assessed the frequency with which they used such abusive vocal habits. The participants, thus sensitized to how often they abused their voices, might begin to modify their vocal quality.
Face validity is a measure of how representative a research project is ‘at face value,’ and whether it appears to be a good project. On the other hand, construct validity defines how well a test or experiment measures up to its claims. A test designed to measure speech nasality must only measure that particular construct, not closely related constructs such as voice quality. Construct validity is the degree to which test scores are consistent with theoretical constructs or concepts. For instance, a test of language development in children should meet the theoretical expectation that as children grow older, their language skills improve.
Convergent validity tests whether constructs that are expected to be related are, in fact, related, whereas discriminant validity (also referred to as divergent validity) tests whether constructs that should have no relationship do, in fact, have no relationship.
Other types of validity that are important to be considered while designing a new measuring tool or instrument are content, concurrent and predictive validity. Content validity is the estimate of how much a measure represents every single element of a construct. It is a non-statistical type of validity that involves "the systematic examination" of the test content to determine whether it covers a representative sample of the behavior domain to be measured. A test has content validity built into it by careful selection of which items to include. Items are chosen so that they comply with the test specification which is drawn up through a thorough examination of the subject domain. By using a panel of experts to review the test specifications and the selections of items, the content validity of a test can be improved. The experts will be able to review the items and comment on whether the items cover a representative sample of the behavior domain.
Concurrent validity measures the test against a benchmark test; a high correlation indicates that the test has strong criterion validity. For example, a new receptive vocabulary test might be correlated with the well-established Peabody Picture Vocabulary Test-Revised to demonstrate the concurrent validity of the new test. A moderate, positive correlation is good for the new test. If the correlation is too high, however, there may be questions about the need for the new test.
Predictive validity is a measure of how well a test predicts abilities; it is also referred to as criterion validity. Broadly speaking, a criterion is any variable (e.g. language development) one wishes to explain and/or predict by resorting to information from other variables. Predictive validity is the accuracy with which a test predicts future performance on a related task. It involves testing a group of subjects for a certain construct and then comparing the results with those obtained at some point in the future. For example, a graduate student's score on comprehensive examinations might predict whether or not he or she will be a competent clinician. Thus, future performance is the criterion used to evaluate the predictive validity of a measure; in this case, the comprehensive examination.
RELIABILITY
Reliability refers to consistency with which the same event is measured repeatedly. Scores are reliable if they are consistent across repeated testing or measurement. The concept of reliability applies to any kind of measures, including standardized tests.
Most measures of reliability are expressed in terms of a correlation coefficient. The correlation coefficient is a number, or index, that indicates the relationship between two or more independent measures. It is usually expressed through the Pearson Product Moment r (often referred to as Pearson r). An r value of 0.00 indicates that there is no relationship between two measures. The highest possible positive value is 1.00; conversely, the lowest possible negative value of r is -1.00. The closer r is to 1.00, the greater the reliability of the test or measurement. There are several types of reliability of a measure or test.
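Pearson r can be computed from its definition in a few lines; the judges' ratings below are invented to illustrate an inter-judge reliability check:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x)
                      * sum((b - my) ** 2 for b in y))

# Two judges' fluency ratings of the same ten speech samples (hypothetical).
judge_1 = [3, 5, 4, 6, 8, 7, 5, 9, 6, 4]
judge_2 = [4, 5, 4, 7, 8, 6, 5, 9, 7, 4]
print(round(pearson_r(judge_1, judge_2), 2))  # 0.94
```

An r of 0.94 would indicate high agreement between the two judges.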
Inter-observer or inter-judge (inter-rater) reliability refers to the extent to which two or more observers agree in measuring an event. For example, if three judges independently rate the fluency of a subject, there is a high inter-judge reliability if there is good agreement between judges. Optimally, good agreement results in an inter-judge reliability coefficient of .90 or more.
Intra-observer or intra-judge reliability refers to the extent to which the same observer repeatedly measures the same event consistently. For example, if the same clinician rates a child's intelligibility over several occasions, those ratings should be consistent if there is good intra-observer reliability (assuming that the child's intelligibility has not changed).
Alternate form reliability, also known as parallel form reliability, is based on the consistency of measures when two parallel forms of the same test are administered to the same people. For example, the Test of Nonverbal Intelligence-Third Edition (TONI-3) includes Form A and Form B. If both forms are administered to an adult client and the scores are very similar, then the TONI-3 has alternate form reliability.
Split-half reliability is a measure of the internal consistency of a test. It is determined by showing that responses to items on the first half of a test correlate with responses given on the second half, or that responses to even-numbered items correlate with responses to odd-numbered items. Split-half reliability generally overestimates reliability because it does not measure the stability of scores over time.
EVIDENCE-BASED PRACTICE:
After examining the internal and external validity of research results, clinicians may choose techniques that are supported by well-designed and well-executed therapy efficacy studies. Clinicians have to use techniques and procedures developed by research methodologies to consolidate, improve, develop, refine and advance the clinical aspects of their practice in order to serve their patients better. Levels of evidence are often classified into three major classes. Class I evidence is based on a randomized group experimental design study; this is the best evidence supporting a procedure. Class II evidence is based on well-designed studies that compare the performance of groups that are not randomly selected or assigned to different groups. Class III evidence is based on expert opinion and case studies; this is the weakest of the levels of evidence.
An alternative way of classifying evidence for clinical procedures accepts all valid research designs and is based on research that is uncontrolled, controlled, and replicated by the same or different investigators. This hierarchy of evidence moves from the least desirable to the most desirable evidence:
- Level 1. Expert advocacy: There is no evidence supporting a treatment; the procedure is advocated by an expert.
- Level 2. Uncontrolled unreplicated evidence: a case study with no control group, where the research was done once.
- Level 3. Uncontrolled directly replicated evidence: the study did not involve a control group but was repeated by the same researcher in the same setting and has obtained the same or similar levels of improvement.
- Level 4. Uncontrolled systematically replicated evidence: the study did not involve a control group but was repeated by another researcher in another setting with different patients and has obtained the same or similar levels of improvement.
- Level 5. Controlled unreplicated evidence: this is the first level at which efficacy is substantiated for a treatment procedure.
- Level 6. Controlled directly replicated evidence: the study involves a control group and was repeated by the same researcher in the same setting, obtaining the same or similar levels of improvement. The technique is now known to produce the same effects, at least in the same setting.
- Level 7. Controlled systematically replicated evidence: this is the highest level of evidence. The study involves a control group and was repeated by another researcher in another setting and has obtained the same or similar levels of improvement. This shows that the studied technique will produce the same effect under varied conditions. A technique that reaches this level may be recommended for general practice.
A critical examination of research evidence is at the heart of evidence-based practice. Clinicians should choose techniques that are supported by well-designed and well-executed treatment efficacy studies.
CONCLUSION & RECOMMENDATIONS:
- Research should be considered an integral part of any clinical practice.
- Researchers have to keep the concepts of validity, reliability and evidence-based practice in mind at all times when designing a study.
- A good researcher will discuss the project design with an advisor or a group of colleagues to help ensure that validity is preserved at every stage of the process.
- A researcher must think very carefully about the population that will be included in the study and how to sample that population.
- Correctly formulating the research problem, adhering to research ethics while carrying out the work, and closely reviewing the selected literature prior to as well as during the study.
- Randomly selecting a representative sample of adequate size to stand for the targeted population.
- Identifying variables before starting research, choosing appropriate study designs, and appropriately dealing with missing data and confounding (extraneous) variables.