Evidence Based Practice (EBP)

Evidence-based practice (EBP) in health care is an old concept with a new name, and its principles are embedded in everyday life. EBP is the integration of the best current research evidence, clinical expertise, and the unique circumstances and values of the patient.

Evidence-based practice, a foundation of best practice, incorporates three elements into the decision-making process of patient care:

Best available research
Clinical expertise
The circumstances and values of the individual patient

These elements are intertwined with the patient examination, the clinical diagnosis, and the subsequent intervention plan.

Each disease or injury is associated with multiple prevention strategies, diagnostic approaches, and intervention strategies. EBP provides a framework to determine whether one strategy is better than another, whether both strategies are similar, or whether one is simply wrong.

Levels of Evidence Based Practice

The Centre for Evidence-Based Medicine has developed criteria to evaluate the quality of research. Termed “Levels of Evidence,” it describes a hierarchy of the different sources of data from which clinical decisions are made. Accurate and generalizable studies are more useful in making clinical decisions.

Those at the top of the hierarchy carry more weight than the ones ranked lower:

Levels of Evidence	Description
Meta-analysis	Technique that combines the results of similar high-quality research studies and draws a conclusion based on statistical results.
Systematic review	A literature review that critiques and synthesizes high-quality research relating to a specific, focused question.
Randomized clinical trials	A research technique in which subjects are randomly assigned to an experimental or control group. The experimental group receives the treatment. The control group does not. The results for each group are statistically compared to identify any differences.
Cohort studies	Two groups, one that receives the treatment and one that does not, are studied forward over time to determine the impact of the treatment.
Case-control studies	Similar to cohort studies, but groups are studied from a historical perspective (backwards in time). Differences between groups of patients with the specified condition (the case group) and without the specified condition (the control group) are identified.
Case series	A report on a series of patients with a particular condition; no control group is used.
Expert opinion	An opinion based on general principles, animal or human-based laboratory research, physiology, and clinical experience.

Fundamentals of Interpreting Research

Reliability Definition

Before the diagnostic accuracy of a test can be established, its reliability (how consistently the same results are obtained when the test is repeated) must be determined. A test cannot be diagnostically useful without acceptable reliability.

Clinically, there are two types of reliability: intrarater reliability and interrater reliability.

Intrarater (intraexaminer) reliability

Intraexaminer reliability describes the extent to which the same examiner obtains the same results on the same patient. If the same examiner performs the McManus test on the same patient multiple times, how consistently will the same result (positive or negative) be obtained?

Interrater (interexaminer) reliability

Interexaminer reliability describes the extent to which different examiners obtain the same results for the same patient. If different examiners perform the McManus test on the same patient multiple times, how consistently will they obtain the same findings?

Depending on the type of test, statisticians report reliability using the kappa coefficient (k) or intraclass correlation coefficient (ICC). Clinically, however, the most important aspect of these statistical measures is the interpretation of their relative usefulness.

Remember that the closer the reliability measure is to 1.0, the better it is, and the closer to 0.0, the less reliable it is:

Reliability coefficient (k or ICC)	Clinical usefulness
Less than 0.5	Poor
0.5–0.75	Moderate
Greater than 0.75	Good
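The table can be read as a simple lookup. The Python below is a minimal sketch; the function name is hypothetical, and treating the boundary values 0.5 and 0.75 as "Moderate" is an assumption about how the ranges in the table are meant to be read.

```python
def interpret_reliability(coefficient: float) -> str:
    """Map a kappa or ICC value onto the clinical usefulness bands in the table."""
    # Kappa and ICC cannot exceed 1.0; values below 0 (worse than chance) count as "Poor".
    if coefficient > 1.0:
        raise ValueError("reliability coefficients cannot exceed 1.0")
    if coefficient < 0.5:
        return "Poor"
    if coefficient <= 0.75:  # assumption: 0.75 itself is treated as "Moderate"
        return "Moderate"
    return "Good"


# McManus test reliability values reported in this section
print(interpret_reliability(0.76))  # Good
print(interpret_reliability(0.86))  # Good
```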

Measures of Reliability

When a statistician determines the percentage agreement of positive or negative findings for procedures that produce a “yes/no” or “true/false” result (nominal data), the kappa coefficient is used. This statistic corrects for the agreement that would be expected by chance alone.

When one outcome is far more common than the other (for example, almost everyone will have a negative result), agreement needs to be extremely high for an acceptable kappa coefficient, because chance alone already produces a high level of agreement. Reliability between two or more repeated interval measurements (e.g., temperature) or ratio measurements (e.g., range of motion [ROM]) is expressed using an intraclass correlation coefficient.
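To illustrate how the kappa coefficient corrects for chance, the Python sketch below implements Cohen's kappa, one common kappa statistic for two sets of yes/no ratings; the text does not say which kappa variant its sources used. The rating data are invented. Both scenarios have 96% raw agreement, but when almost every finding is negative, chance agreement is so high that kappa collapses.

```python
from collections.abc import Sequence


def cohen_kappa(ratings_a: Sequence[bool], ratings_b: Sequence[bool]) -> float:
    """Cohen's kappa for two sets of positive (True) / negative (False) findings.

    Both sequences hold findings for the same patients in the same order,
    e.g. one examiner on two occasions, or two different examiners.
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("need two non-empty rating sequences of equal length")
    n = len(ratings_a)
    # Observed agreement: proportion of patients given the same finding
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from how often each set of ratings is positive
    pos_a = sum(ratings_a) / n
    pos_b = sum(ratings_b) / n
    expected = pos_a * pos_b + (1 - pos_a) * (1 - pos_b)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (observed - expected) / (1 - expected)


# Invented data: both scenarios have 96 agreements out of 100 patients.
# Balanced: 48 agreed positives, 48 agreed negatives, 4 disagreements.
balanced_a = [True] * 48 + [False] * 48 + [True, True, False, False]
balanced_b = [True] * 48 + [False] * 48 + [False, False, True, True]
# Skewed: 96 agreed negatives, no agreed positives, 4 disagreements.
skewed_a = [False] * 96 + [True, True, False, False]
skewed_b = [False] * 96 + [False, False, True, True]

print(round(cohen_kappa(balanced_a, balanced_b), 2))  # 0.92
print(round(cohen_kappa(skewed_a, skewed_b), 2))      # -0.02
```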

The closer the reliability measures (k or ICC) are to 1.0, the more reliable are the findings; the closer to 0.0, the more unreliable they are. The diagnostic accuracy of a test is suspect in the presence of low reliability.
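Several forms of the intraclass correlation coefficient exist, and the text does not specify which one is used. As a sketch only, the Python below computes the one-way random-effects form, ICC(1,1), from repeated measurements; the function name and the knee-flexion ROM data are invented for illustration.

```python
from statistics import mean


def icc_1_1(measurements: list[list[float]]) -> float:
    """One-way random-effects ICC(1,1).

    Each inner list holds the repeated measurements for one patient
    (for example, the same ROM measured on two occasions).
    """
    n = len(measurements)
    k = len(measurements[0]) if measurements else 0
    if n < 2 or k < 2 or any(len(row) != k for row in measurements):
        raise ValueError("need >= 2 patients, each with the same number (>= 2) of measurements")
    row_means = [mean(row) for row in measurements]
    grand_mean = mean(row_means)
    # Between-patient mean square: how much patients genuinely differ
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-patient mean square: disagreement between repeated measurements
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(measurements, row_means) for x in row
    ) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all measurements are identical")
    return (ms_between - ms_within) / denominator


# Invented knee-flexion ROM data (degrees), each patient measured twice
rom = [[120.0, 122.0], [130.0, 129.0], [110.0, 112.0]]
print(round(icc_1_1(rom), 2))  # 0.98
```

Because the differences between repeated measurements are small relative to the differences between patients, the ICC is close to 1.0.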

Reliability is influenced by factors such as:

consistency in performing a skill,
how the results are interpreted,
the experience and training of the examiner, and
the equipment used.

Example

The first step in establishing the usefulness of the McManus test is determining its reliability. To establish intrarater reliability, we would ask the same clinician to perform the McManus test on multiple patients on at least two different occasions.

The sample would consist of some patients who were ACL-deficient and some who were not. The clinician would not know (or would be “blinded” to) the patient’s history or condition when performing the test. We would then compare the examiner’s results for consistency using the kappa coefficient. To what extent does the examiner reach the same finding on each patient each time?

To establish interrater reliability, we would have multiple clinicians perform the McManus test on the same group of patients. Again, blinding is important. We would then measure the extent of agreement in the examiners’ determination of positive and negative results. Fortunately, the McManus test performed well, yielding an interrater reliability of 0.76 and an intrarater reliability of 0.86.

Diagnostic Accuracy

How often do the results correctly identify whether or not the pathology is present? To assess diagnostic accuracy, we first identify a population of individuals on whom to perform the test.

In the previous example, our population would be individuals presenting to us with knee pain. We know that a certain number of these individuals have sustained an ACL injury (this proportion is the prevalence of the condition), and we are trying to determine how effectively our test distinguishes those who have an ACL tear from those who do not.

Prevalence, the extent to which a condition is present in a specific population, is an important consideration for diagnostic statistics. The prevalence of any condition can change based on the group being studied. For example, prevalence of ACL sprains for everyone in the United States would be much different than the prevalence in a population that reports to an orthopedic clinic because of acute knee pain. In your group of friends, there is a chance (probability) that one or more has an ACL-deficient knee.

In an orthopedic clinic, the chance of finding someone who has ACL pathology is much greater simply because of the types of patients seen at the facility. Prevalence information is helpful to establish the pretest probability that a condition exists in a given population.


Diagnostic Gold Standard

To determine accuracy, our test results are compared with a diagnostic gold standard (also known as the reference standard).

Gold standards have the highest diagnostic accuracy but are generally more expensive, less accessible, slower, or more invasive than the clinical test, and they may require additional personnel.

Arthroscopy is the gold standard for diagnosing ACL tears, so the results of the McManus test will be compared with those obtained during arthroscopy. The clinical results are compared with the gold standard using a table of two columns and two rows (a 2 × 2 contingency table).

One of four outcomes can occur:

True positive: The clinical test and the gold standard are both positive, and the condition is correctly identified. The McManus test and arthroscopy both indicate ACL pathology.
False positive: The clinical test incorrectly identifies a condition as present when, in fact, there is no pathology. The McManus test indicates ACL pathology, but the ACL is shown to be intact during arthroscopy.
True negative: The clinical test and the gold standard are both negative. The absence of the pathology is correctly identified. The McManus test indicates no ACL pathology, and the ACL is shown to be intact during arthroscopy.
False negative: The clinical test identifies a condition as not present when, in fact, it is present. The McManus test indicates no ACL pathology, but the arthroscopy shows the ACL as torn.
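The four outcomes above amount to a simple classification of each patient. The Python below is a minimal sketch that places each (clinical test, gold standard) pair into a cell of the 2 × 2 table and tallies the cells; the function name and the patient data are hypothetical.

```python
from collections import Counter


def classify(test_positive: bool, gold_standard_positive: bool) -> str:
    """Place one patient in a cell of the 2 x 2 contingency table."""
    if test_positive:
        return "true positive" if gold_standard_positive else "false positive"
    return "false negative" if gold_standard_positive else "true negative"


# Hypothetical (McManus result, arthroscopy finding) pairs; True = positive / torn ACL
patients = [(True, True), (True, False), (False, False), (False, True), (True, True)]
table = Counter(classify(test, gold) for test, gold in patients)
for outcome in ("true positive", "false positive", "true negative", "false negative"):
    print(outcome, table[outcome])
```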

Diagnostic Predictive Value

The accuracy of a test is determined by comparing the number of correctly classified patients (true positives + true negatives) with the total number of patients examined. A test that is 100% accurate correctly classifies every patient; however, this level of accuracy is highly unlikely. Relying on accuracy to determine a test’s usefulness can be deceptive because accuracy is affected by the prevalence of a condition.
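The prevalence problem can be shown with a few lines of arithmetic. In this Python sketch (function name and population are hypothetical), a useless "test" that calls every patient negative still scores 98% accuracy when only 2% of patients have the condition.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Correctly classified patients (TP + TN) divided by all patients examined."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("no patients examined")
    return (tp + tn) / total


# Invented population: 1,000 patients, only 20 of whom have the condition.
# A "test" that calls every patient negative misses all 20 cases,
# yet it is still 98% accurate because the condition is rare.
always_negative = accuracy(tp=0, fp=0, tn=980, fn=20)
print(always_negative)  # 0.98
```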

Research on the usefulness of diagnostic techniques may report the positive predictive value (PPV) and negative predictive value (NPV). The PPV depicts how often a positive finding is correct by comparing the number of true positives with the total number of positive results: true positives/(true positives + false positives). Conversely, the NPV identifies how often a negative finding is correct: true negatives/(true negatives + false negatives).
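The two formulas translate directly into code. In the Python sketch below, the function name is hypothetical; the true positive and false negative counts (17 and 6) match the McManus sensitivity figures reported in this section, while the true negative and false positive counts (41 and 9) are hypothetical values chosen to give a specificity of 0.82.

```python
def predictive_values(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (PPV, NPV) from the four cells of a 2 x 2 table."""
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("need at least one positive and one negative test result")
    ppv = tp / (tp + fp)  # how often a positive finding is correct
    npv = tn / (tn + fn)  # how often a negative finding is correct
    return ppv, npv


# TP = 17, FN = 6 follow the McManus data (17 of 23 tears detected);
# TN = 41, FP = 9 are hypothetical counts giving a specificity of 0.82.
ppv, npv = predictive_values(tp=17, fp=9, tn=41, fn=6)
print(f"PPV {ppv:.2f}, NPV {npv:.2f}")  # PPV 0.65, NPV 0.87
```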

Although useful, predictive values are less informative than likelihood ratios because they depend on prevalence: a low prevalence of a condition in a given population deflates the PPV and inflates the NPV. In other words, when most people in the population do not have the condition, even a small proportion of false positives among them can outnumber the true positives, while most negative results are correct simply because few people have the condition. Failing to consider prevalence rates when comparing predictive values from two different studies can lead to false conclusions. Because of the wide spectrum of prevalence rates and the resulting difficulty in making comparisons, predictive values are not reported in this text.
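The effect of prevalence can be made concrete by holding sensitivity and specificity fixed and varying only the prevalence. This Python sketch (hypothetical function name) uses the McManus values of 0.74 and 0.82 with three hypothetical prevalence rates, computing the expected share of the population in each cell of the 2 × 2 table.

```python
def predictive_values_at(sensitivity: float, specificity: float,
                         prevalence: float) -> tuple[float, float]:
    """Expected (PPV, NPV) for a given prevalence, holding sensitivity and specificity fixed."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    # Expected proportion of the population falling in each cell of the 2 x 2 table
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Same test (sensitivity 0.74, specificity 0.82) in three hypothetical populations
for prevalence in (0.50, 0.10, 0.01):
    ppv, npv = predictive_values_at(0.74, 0.82, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

As prevalence drops from 50% to 1%, the PPV falls from about 0.80 to about 0.04 while the NPV rises toward 1.0, even though the test itself has not changed.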

Sensitivity and Specificity

Sensitivity and specificity describe how often the technique identifies the true positive and true negative results.

Sensitivity Definition

Sensitivity describes the test’s ability to detect those patients who actually have the disorder relative to the gold standard. Also known as the true positive rate, sensitivity describes the proportion of positive results a technique identifies relative to the actual number of positives. Sensitivity is calculated as true positives/ (true positives + false negatives). Compared with arthroscopy, the McManus test correctly identified 17 out of 23 individuals who have ACL pathology, yielding a sensitivity of 0.74.
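The sensitivity calculation for the McManus example can be reproduced directly. A minimal Python sketch (the function name is hypothetical), where 23 − 17 = 6 patients with ACL pathology tested negative:

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """True positive rate: TP / (TP + FN)."""
    if true_positives + false_negatives == 0:
        raise ValueError("no patients with the condition")
    return true_positives / (true_positives + false_negatives)


# McManus test vs arthroscopy: 17 of 23 patients with ACL pathology tested positive
mcmanus_sensitivity = sensitivity(true_positives=17, false_negatives=23 - 17)
print(round(mcmanus_sensitivity, 2))  # 0.74
```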

Tests with high sensitivity accurately identify all or most patients with a given condition. The sensitivity value alone, however, can be misleading. While all true positives are likely to be identified, the number of false positives obtained along the way can also be high. To gain a better understanding of a test’s overall usefulness, specificity must also be considered.

Specificity Definition

Specificity, the true negative rate, describes the test’s ability to detect patients who do not have the disorder. The specificity of a diagnostic technique identifies the proportion of true negatives the technique detects compared with the actual number of negatives in a given population. Specificity is calculated as true negatives/(true negatives + false positives).

With a specificity of 0.82, the McManus test correctly identified those without ACL damage, yielding a negative result in 82% of these patients.
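The text reports the McManus specificity but not the underlying counts, so the Python sketch below uses hypothetical counts (41 true negatives and 9 false positives among 50 patients with an intact ACL) that happen to give the same value. The function name is also hypothetical.

```python
def specificity(true_negatives: int, false_positives: int) -> float:
    """True negative rate: TN / (TN + FP)."""
    if true_negatives + false_positives == 0:
        raise ValueError("no patients without the condition")
    return true_negatives / (true_negatives + false_positives)


# Hypothetical counts: 41 of 50 patients with an intact ACL test negative
print(specificity(true_negatives=41, false_positives=9))  # 0.82
```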

Sensitivity and specificity determine how well a test detects true positives and true negatives. Yet, taken individually, these measures may not be sufficiently useful. Unless both the sensitivity and specificity values are high, determining the procedure’s clinical usefulness is difficult (and often inconclusive). To avoid these pitfalls, sensitivity and specificity values are considered together and are expressed as likelihood ratios (LRs).

Likelihood Ratios

Likelihood ratios describe how much a positive or negative result on a particular test changes the probability that a condition is present, which makes them a direct measure of the test’s diagnostic usefulness. Likelihood ratios incorporate a test’s sensitivity and specificity and are not influenced by the prevalence of a condition.

Likelihood ratios explain the shift in the pretest probability that a patient has a condition after a test result is obtained. Pretest probabilities are population specific and derived from prevalence data from regional or national databases, practice databases, published research findings, or clinical experience. Often, pretest probabilities must be estimated based on clinical experience because specific data are not available.

An LR at or near 1 indicates little to no shift in the pretest probability that a condition is present after the test result, either positive or negative, is considered. An LR greater than 1 increases the probability that the condition exists, and an LR less than 1 decreases it.
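The text does not spell out how an LR shifts the pretest probability. A standard way is the odds form of Bayes' theorem: convert the probability to odds, multiply by the LR, and convert back. The Python sketch below assumes that method; the function name and the 30% pretest probability are hypothetical.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Shift a pretest probability by a likelihood ratio (odds form of Bayes' theorem)."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood ratios cannot be negative")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# Hypothetical pretest probability of 30% for an ACL tear in a knee clinic
for lr in (5.0, 1.0, 0.2):
    print(lr, round(posttest_probability(0.30, lr), 2))
```

With these assumed values, an LR of 5 raises the probability from 30% to about 68%, an LR of 1 leaves it at 30%, and an LR of 0.2 lowers it to about 8%.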

Consideration of LR results can lead to one of three clinical decision options:

The posttest probability is so high that there is acceptable certainty that the pathology is present.
The shift in posttest probability is inconclusive. A stronger test or tests, if available, are needed to rule in or rule out the pathology.
The posttest probability is so low that there is acceptable certainty that the pathology is not present. Other diagnoses must be considered.
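The three options can be sketched as a rule with two cut-offs. The text does not state what counts as "acceptable certainty", so in the Python sketch below the thresholds are caller-supplied assumptions, the 10% and 90% values in the usage line are hypothetical, and the function name is invented for illustration.

```python
def clinical_decision(posttest_probability: float,
                      rule_out_below: float,
                      rule_in_above: float) -> str:
    """Map a posttest probability onto the three decision options.

    Both thresholds are clinician-chosen assumptions; the text does not fix them.
    """
    if not 0.0 <= rule_out_below < rule_in_above <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= rule_out_below < rule_in_above <= 1")
    if posttest_probability >= rule_in_above:
        return "pathology present with acceptable certainty"
    if posttest_probability <= rule_out_below:
        return "pathology absent with acceptable certainty; consider other diagnoses"
    return "inconclusive; a stronger test is needed"


# Hypothetical thresholds of 10% and 90%
print(clinical_decision(0.68, rule_out_below=0.10, rule_in_above=0.90))
```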

Following a positive or negative test result, how sure are we that the patient has the condition in question? A positive LR describes the shift in the pretest probability that the condition is present based on a positive test result. A negative LR describes the change in the pretest probability that a condition exists based on a negative test result.

Positive Likelihood Ratio

The positive likelihood ratio (LR+) expresses the change in our confidence that a condition is present when the test is positive. The higher the LR+, the more a positive test enhances the probability that the pathology is present.

Positive Likelihood Ratio = Sensitivity / (1 – specificity)
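Applying the formula to the McManus values reported in this section (sensitivity 0.74, specificity 0.82) gives an LR+ of roughly 4.1. A minimal Python sketch, with a hypothetical function name; the guard reflects that the formula divides by zero when specificity is 1.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = sensitivity / (1 - specificity)."""
    if specificity >= 1.0:
        # A specificity of 1 would mean dividing by zero
        raise ValueError("LR+ cannot be calculated when specificity is 1")
    return sensitivity / (1 - specificity)


# McManus test values from this section
lr_positive = positive_likelihood_ratio(0.74, 0.82)
print(round(lr_positive, 1))  # 4.1
```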

Negative Likelihood Ratio

The negative likelihood ratio (LR-) expresses the change in our confidence that the pathology is present when the test is negative. How convincing is a negative test in diminishing the likelihood that the patient has the pathology?

The closer the LR- is to 1, the less significant is the change in pretest probability. The lower the LR- is, the lower is the probability that the condition exists.

Negative Likelihood Ratio = (1 – sensitivity) / specificity

Negative likelihood ratios cannot be calculated for tests having a specificity of 0.
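The same McManus values give an LR- of roughly 0.32. In this Python sketch (hypothetical function name), the guard implements the rule above that LR- cannot be calculated when specificity is 0.

```python
def negative_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR- = (1 - sensitivity) / specificity."""
    if specificity <= 0.0:
        # LR- cannot be calculated when specificity is 0 (division by zero)
        raise ValueError("LR- cannot be calculated when specificity is 0")
    return (1 - sensitivity) / specificity


# McManus test values from this section
lr_negative = negative_likelihood_ratio(0.74, 0.82)
print(round(lr_negative, 2))  # 0.32
```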

Tests with LR+ or LR- values that approach 1.0 have little clinical usefulness and can be omitted from the clinical diagnostic procedure.

Clinical Practice Guidelines

Intended to help healthcare providers and patients make informed decisions, clinical practice guidelines (CPGs) are recommendations that guide the care of patients with specific conditions. Starting with a clinical question, a CPG is based on a systematic review of published evidence and evaluation of those findings by experts.

The end product is a set of recommendations that involves both evidence and value judgments regarding the patient’s course of care for a given condition. The National Guideline Clearinghouse was an indexed repository for CPGs that provided a mechanism for locating guidelines meeting specific inclusion criteria; it ceased operation in 2018.
