In articles published prior to November 2015, each intervention was classified into one of four levels of evidence. When effect studies were available (interventions at levels 3 and 4), the interventions were also assigned documentation grades to further differentiate the degree of evidence. The criteria build on one another, so that the requirements for a given level of evidence always include the requirements of the levels below. The previous set of criteria is described in detail below.


Establishment of the criteria
The criteria for classification of interventions were formulated in 2008 by a scientific committee consisting of Professor Willy-Tore Mørch, RBUP North (leader); researcher Simon-Peter Neumer, RBUP East and South; and Professor Per Holth, Akershus University College.


Evidence level 1: Potentially effective interventions
When an intervention is potentially effective, the intervention’s objectives and target group are described. Furthermore, there is a clear description of methods, techniques and materials. Several research methods may help clarify which components an intervention comprises; for example, interviews, text analysis, descriptions, analyses of observations, qualitative studies and case studies.

This type of research may help practitioners and leaders gain an overview of the intervention and the elements it incorporates. From a research perspective, this type of descriptive documentation is essential for proceeding to research that aims to understand the rationale for the intervention and whether it has an effect.

Practitioners sometimes use interventions they are comfortable with but that are not yet described or not described in the way mentioned above. If an intervention can be described, it constitutes a great step forward in documenting the intervention while also creating a solid basis for future research. Interventions at this level are described as being potentially effective.


Evidence level 2: Interventions that are likely effective
Evidence level 2 goes one step further than potentially effective interventions (level 1) in that there is a reasonable and plausible rationale for believing that the intervention should have an effect.

A theory that makes an effect probable in relation to the objectives and target group must be described. This may be a generally known theory about the causes of the onset of a mental health problem; for example, the theory that learned helplessness may lead to depression. There may also be theoretical knowledge based on literature reviews or expert opinions that confirms or strengthens what is already implicit knowledge among professionals.

One example of this could be the expert opinion from the Norwegian Institute of Public Health on the effects of psychosocial treatment of children and youth with behavior problems, which led to the import and implementation of several psychosocial interventions for this group. The effectiveness of an intervention may also be made probable by simple n = 1 studies (which study one person at a time) with few participants. When there are only international studies of the intervention, or when determinations are based solely on international research, the intervention belongs to this level of evidence.

With this we wish to signal that we are not fully comfortable adopting the findings from international research, and that we wish to stimulate Norwegian research. A plausible theory helps practitioners determine how the intervention should affect a specific child or family. Level 2 is an essential platform for progressing within the model for evidence-based knowledge.

A plausible theoretical rationale indicates which types of effect may be expected in different target groups and which mechanisms may lead to change. If there is any evidence that the intervention has an effect, a plausible theoretical rationale can convince decision-makers to set aside resources to try out the method. Interventions at this level are described as being likely effective.


Evidence level 3: Functionally effective interventions
Functionally effective interventions meet all the criteria described in levels 1 and 2 (detailed description of the intervention and a plausible theoretical rationale for the intervention). Additionally, interventions at this level have been subjected to a systematic evaluation showing that the desired changes occur in the target group. This means that the goals of the intervention have been reached, the problems have been reduced and the target individuals are satisfied.

There are many ways to perform such evaluations. Examples include user satisfaction surveys, before-and-after measurements and other measures of goal attainment, such as a decrease in re-hospitalization. These evaluations demonstrate that a positive change has occurred after the intervention was initiated. Such evaluations may be used to improve the quality of the intervention in an organization.

This level allows for the possibility that the design may be improved by adding some form of benchmark for goal achievement.

A benchmark study may compare the average results of the selected intervention with the results from a randomized control group design (RCT) that showed a significantly better effect for a similar intervention for the same mental health problem. If the average result for the intervention is higher than that of the control group in the RCT study, it is an important signal that the intervention seems to have an effect.
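The comparison described above can be sketched in a few lines of Python. This is only a minimal illustration and not Ungsinn's own procedure: the function name `exceeds_benchmark`, the outcome scores and the control-group mean are all hypothetical, and the sketch assumes that higher scores mean better outcomes.

```python
from statistics import mean


def exceeds_benchmark(intervention_scores, rct_control_mean):
    """Return True if the intervention's average outcome is higher than
    the control-group average reported in the benchmark RCT.

    Assumption made for this sketch: higher scores mean better outcomes.
    """
    if not intervention_scores:
        raise ValueError("at least one outcome score is required")
    return mean(intervention_scores) > rct_control_mean


# Hypothetical post-treatment scores from the locally delivered intervention
local_scores = [62, 70, 68, 74, 66]

# Hypothetical control-group mean taken from a published RCT
print(exceeds_benchmark(local_scores, rct_control_mean=60.0))
```

A result of this kind is a signal only; the comparison by itself does not rule out other causes of the difference.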

A norm study involves comparing the result of the intervention with a norm; for example, that a certain percentage of the clients are satisfied (e.g., 95 %), that 90 % of the clients achieve the treatment objectives or that 80 % of the clients score in the normal range on a standardized instrument.
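A norm comparison of this kind amounts to checking a proportion against a threshold. The Python sketch below assumes that the norm is read as "at least" the stated percentage; the function name `meets_norm` and the client data are hypothetical.

```python
def meets_norm(outcomes, required_share):
    """Return True if the share of clients with a positive outcome
    reaches the required norm (e.g. 0.90 for 90 %).

    Assumption made for this sketch: the norm means "at least" that share.
    """
    if not outcomes:
        raise ValueError("at least one client outcome is required")
    share = sum(1 for achieved in outcomes if achieved) / len(outcomes)
    return share >= required_share


# Hypothetical data: True means the client achieved the treatment objectives
achieved_objectives = [True] * 19 + [False] * 1  # 19 of 20 clients

print(meets_norm(achieved_objectives, required_share=0.90))
```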

In Theory of Change studies, the effect of a particular treatment element can be investigated through correlation studies. Dose-response studies clarify the minimum number of treatment sessions (dose) necessary to achieve the desired effect. In a quasi-experimental design, the effect of an intervention is tested by comparing it with a placebo, comparison or waiting-list control group without necessarily randomizing the groups. At this level of evidence, we also find series of n = 1 studies or multiple baseline studies. A single-subject design (n = 1) is characterized by a subject who is thoroughly observed on important outcome measures before the intervention is initiated (baseline), with observations continuing after the intervention has begun.
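The structure of a single-subject (n = 1) data set, with a baseline phase followed by an intervention phase, can be illustrated as follows. The sketch is hypothetical throughout: the function name `phase_change` and the observations are invented, and a difference in phase means is only one of several ways such data could be summarized.

```python
from statistics import mean


def phase_change(baseline, intervention):
    """Difference between the mean of the intervention-phase observations
    and the mean of the baseline observations for one subject."""
    if not baseline or not intervention:
        raise ValueError("both phases need at least one observation")
    return mean(intervention) - mean(baseline)


# Hypothetical weekly counts of problem behavior for one child
baseline_obs = [9, 8, 10, 9]       # observed before the intervention starts
intervention_obs = [7, 5, 4, 4]    # observed after the intervention has begun

# A negative value means fewer problem behaviors during the intervention phase
print(phase_change(baseline_obs, intervention_obs))
```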

Through these designs, practitioners get systematic feedback on the effects of the intervention at both the individual and group levels, which makes systematic monitoring and modification of the intervention possible. However, these designs cannot establish that the effects are primarily caused by the intervention. Documenting a causal relationship requires that other possible causes of change are eliminated or controlled for. A design at evidence level 3 nonetheless gives the researcher a strong indication that the intervention influences the result, so that there is preliminary evidence of effectiveness. This applies particularly if a design at evidence level 3 has been repeated under many different conditions and replicated by different research groups.

Interventions at evidence level 3 should have a clear implementation strategy that considers the providing organization’s structure and resources.

Functionally effective interventions are classified in documentation grades of *, ** and ***, depending on the research design. For example, simple pre-post studies are classified as *. If the intervention is also evaluated by reference studies or “theory of change” studies, the classification ** is given, and for quasi-experimental control group designs with follow-up measurements after 6 months or series with n = 1 and multiple baseline design, the classification *** is given.


Evidence level 4: Interventions documented as effective
Interventions at evidence level 4 satisfy the requirements of evidence levels 1 (description), 2 (theoretical rationale) and 3 (demonstration that the intervention leads to desired results).

For an intervention to be evaluated as documented as effective, there must be a research design that makes it probable that the result is caused by the intervention. Three designs meet this requirement.

A randomized control group design (RCT) is characterized by a selection method for the comparison groups that ensures the groups are comparable on important variables. This ensures high internal validity. Interrupted time series analyses involve a series of data points that is interrupted by one or more interventions (so-called A-B, ABA or ABC designs, where A is the baseline and B, C, etc. represent different interventions). If systematic changes in the outcome measurements follow the interventions in a large series of n = 1 studies (at least 9 participants), this is strong evidence that the intervention causes the results, and as a measure of effectiveness the design can be compared to an RCT design.

Interrupted time series design is also a useful supplement when time series studies are carried out on groups of individuals; for example, school classes. In longitudinal cohort studies, one cohort (e.g., an age group of children) serves as the control condition for another cohort of the same age group at a later time point.

Studies under specially-prepared conditions (efficacy studies) are characterized by the fact that they are performed in a university or hospital context with specially-trained therapists and selection of clients with “pure diagnoses”, often recruited through announcements or actively selective recruitment by researchers.

Efficacy studies are usually carried out by program developers as the first test of whether the intervention is the cause of the results. In such cases the external validity (generalizability) can be low.

The effectiveness level at evidence level 4 increases considerably when the study is performed under natural conditions and by independent researchers, which thereby also increases the external validity. This means that the research takes place in an ordinary clinic or other place of practice (normal access to resources) with a client selection that is representative of the referred clients (with normal co-morbidity) and that the intervention is delivered by the clinic’s ordinary practitioners.

Practitioners benefit from evidence level 4 evaluations by gaining a better picture of which interventions are effective for which client groups. This type of knowledge may help to adapt the intervention to client characteristics. For the researcher, a level 4 evaluation is necessary to validate the intervention’s theoretical basis and thereby contribute to more general knowledge about the mechanisms behind the therapeutic effects.

Interventions documented as effective receive the classifications **** or *****. RCT (efficacy) studies and interrupted time series designs, and consecutive evaluation of the implementation process, receive the classification ****. If the study is additionally replicated by at least one independent researcher under natural conditions (effectiveness study), the intervention is classified as *****.

Preferred supplemental qualities upon evaluation
Interventions that are placed at evidence level 4 and documentation grade ***** meet all the criteria for levels 1 to 4. This means that they are well described (1), have a plausible theory (2), have demonstrated that the intervention leads to goal achievement (3) and have documented that the intervention is responsible for the results (4), including replication by at least one independent researcher under natural conditions.
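The cumulative logic of the four levels, where each level presupposes all levels below it, can be sketched in Python. This is an illustrative reading of the criteria and not an official algorithm: the function and parameter names are hypothetical, and the return value 0 for an intervention that is not yet described is an assumption made for the sketch.

```python
def evidence_level(described, plausible_theory, desired_results, causal_design):
    """Return the highest evidence level (0-4) whose criterion, and the
    criteria of all levels below it, are met.

    0 is used here (an assumption) for interventions without a description.
    """
    criteria = [described, plausible_theory, desired_results, causal_design]
    level = 0
    for met in criteria:
        if not met:
            break  # a missing lower criterion blocks all higher levels
        level += 1
    return level


# Described, theoretically grounded and evaluated, but no causal design: level 3
print(evidence_level(True, True, True, False))

# Effect studies exist, but the theoretical rationale is missing: level 1
print(evidence_level(True, False, True, True))
```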

In addition to an evaluation of this high standard, it is preferable that an intervention be subjected to further research that provides more knowledge about its validity. It is also preferable that knowledge be provided about the consequences of implementing the intervention in a given environment.

Thus, it is desirable but not necessary that the following questions be answered through research: What are the long-term effects of the intervention (more than one year)? What is the benefit of the intervention in relation to the costs (cost-benefit analyses)? Is there a lower incidence of the mental health problem that should be prevented after long-term use of the intervention (e.g., after 5-10 years)? Is the intervention maintained over time with high integrity and quality (program fidelity)? Are there particular elements of the intervention that are critically important in gaining optimal effect (element analysis)? Has the implementation of the intervention had an effect on the organization’s working methods, structure or use of resources (qualitative analyses)? Is evaluation of the intervention included in meta-analyses whereby the effects of the intervention may be combined and compared with results from corresponding studies?

Additional evaluations of this type strengthen the intervention’s validity and effectiveness and are explicitly described in the database as a part of the evaluation basis for quality classification.

Table 1. Levels of evidence

Levels of evidence | Type of evidence | Research methods
4. Documented effective interventions | As in 1, 2 and 3, but there is also certain evidence that the intervention is the cause of the changes | Randomized control group design (RCT efficacy and effectiveness). Interrupted time series design. Longitudinal cohort studies. Desirable supplemental studies.
3. Functionally effective interventions | As in 1 and 2, but have demonstrated that the intervention leads to the desired results | Quasi-experiments with control groups, theory of change, reference studies, norm studies, pre-post studies, series with n = 1 studies (multiple baseline).
2. Likely effective interventions | As in 1, but the intervention has a theoretical rationale | Reviews, literature reviews, expert opinion, simple n = 1 with few subjects. Based only on international research.
1. Potentially effective interventions | Explicit description of the intervention (goals, target groups, methods, materials) | Descriptive studies, observations, document analysis, interviews, qualitative studies, case descriptions.



Table 2. Classification of documentation grade

Documentation grade | Research design | Preferred supplemental qualities for documentation grade 5 (*****)
***** | RCT under natural conditions (effectiveness). The study is replicated by at least one independent researcher and 1-year follow up. | Long-term follow up >3 years, cost-benefit analyses, incidence calculation, program fidelity research, element analyses, qualitative analyses, organization analyses, meta-analyses.
**** | RCT efficacy study or interrupted time series analyses with 1-year follow up. Longitudinal cohort studies. |
*** | Quasi-experiment with control group and 6-month follow up. |
** | Benchmark studies, theory of change, norm studies. |
* | Simple pre-post studies, series with n = 1 (multiple baseline). |



Table 3. Evidence levels tied to classification of documentation grade

Level of evidence | Research design | Documentation grade
4. Documented effective interventions | RCT under natural conditions (effectiveness), replicated by at least one independent researcher and 1-year follow up. | *****
 | RCT laboratory study (efficacy) or interrupted time series design with 1-year follow up. Longitudinal cohort studies. | ****
3. Functionally effective interventions | Quasi-experiment with control groups. | ***
 | Theory of change studies, benchmark studies, norm studies. | **
 | Simple pre-post studies, series with n = 1 (multiple baseline). | *
2. Likely effective interventions | Reviews, literature reviews, expert opinions, simple n = 1 with few subjects. Based on international research. |
1. Potentially effective interventions | Descriptive studies, observations, document analyses, interviews, qualitative studies, case studies. |
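The mapping in Table 3 from research design to evidence level and documentation grade can be written as a small lookup. The Python sketch below is a simplified illustration: the design labels are hypothetical keys invented for the example, taking the strongest available design is an assumption, and the follow-up requirements (6 months, 1 year) are not checked. Where the running text and the tables differ (for example for series with n = 1 and multiple baseline), the sketch follows Table 3.

```python
# Hypothetical design labels; the grades follow Table 3.
GRADE_BY_DESIGN = {
    "rct_effectiveness_replicated": 5,
    "rct_efficacy": 4,
    "interrupted_time_series": 4,
    "longitudinal_cohort": 4,
    "quasi_experiment": 3,
    "theory_of_change": 2,
    "benchmark_study": 2,
    "norm_study": 2,
    "pre_post": 1,
    "n1_series_multiple_baseline": 1,
}


def classify(designs):
    """Return (evidence_level, stars) for the strongest listed design.

    Returns (None, "") when no effect-study design is present, because
    evidence levels 1 and 2 carry no documentation grade.
    """
    grades = [GRADE_BY_DESIGN[d] for d in designs if d in GRADE_BY_DESIGN]
    if not grades:
        return None, ""
    best = max(grades)
    level = 4 if best >= 4 else 3  # grades 4-5 belong to level 4, 1-3 to level 3
    return level, "*" * best


print(classify(["pre_post", "quasi_experiment"]))
print(classify(["rct_efficacy"]))
```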


Knowledge-based and research-based practice
Interventions at evidence levels 1 and 2 probably represent what is referred to as knowledge-based practice, while interventions at evidence level 3 may be called research-based practice.

Through Ungsinn, however, we wish to show that evidence may be provided in many different ways and with many different research methods. We therefore choose to describe interventions at evidence levels 1 and 2 as interventions that have some evidence, rather than reserving the term evidence-based interventions for evidence level 4 with a five-star documentation grade.

In our view, such a limitation derails the evidence debate and creates unnecessary conflict in the academic field. Through its structure, Ungsinn aims to make visible both the complexity of high-level research and the feasibility of performing valuable assessments with simpler methods and designs that elevate an intervention within the evidence levels.


Interventions with negative effect
Ungsinn also has the category “Negative effect” because, every once in a while, evaluations are published which claim, at a high evidentiary level, that certain interventions have a negative effect.

The demand from the practice field that journals publish null-effect studies and studies with negative effects makes it likely that such studies will be published at an increasing rate. It is a serious ethical problem that interventions with proven negative effects may be used in the practice field. Interventions in this category should have a research basis that corresponds to the quality classification *** at evidence level 3 (quasi-experimental control group design with 6-month follow up).


The text above is taken and translated from Martinussen, M., Reedtz, C., Eng, H., Neumer, S. P., Patras, J., & Mørch, W.T. (2016). Ungsinn – kriterier og prosedyrer for vurdering og klassifisering av tiltak. [Ungsinn – Criteria and procedures for evaluation and classification of interventions]. Tromsø: UiT The Arctic University of Norway