Judgmental Forecasting
Judgmental Forecasting
Course and Assignment
Student Name
Overview & Introduction
Use this section to introduce your paper and the resources you will use to describe Judgmental Forecasting. You may erase this notation, but delete only up to where the sentence starts, or you will eliminate the formatting.
What is Judgmental Forecasting?
The heading above would be used to divide your paper into sections based on content. This is the first level of heading, and it is centered and bolded with each word of four letters or more capitalized. The heading should be a short descriptor of the section. Most papers will have headings or subheadings to organize the paper into sections.
Where is Judgmental Forecasting Used Most Effectively?
The subheading above would be used if there are several sections within the topic labeled in a heading. The subheading is flush left and bolded, with each word of four letters or more capitalized.
Examples of how Judgmental Forecasting is Used Most Effectively
Xxx
Advantages and Disadvantages of Judgmental Forecasting
Xxx
Advantages
Xxx
Disadvantages
Xxx
The Delphi Method of Judgmental Forecasting
In this section, thoroughly explain the use of, advantages of, and an example of the Delphi method of forecasting. You may erase these instructions prior to inserting your narrative.
References (Example of a web-based Reference)
U.S. Department of Health and Human Services, National Institutes of Health, National Heart, Lung, and Blood Institute. (2003). Managing asthma: A guide for schools (NIH Publication No. 02-2650). Retrieved from http://www.nhlbi.nih.gov/health/prof/asthma/asth_sch
This is a correctly cited Web address.
A Comparison of Techniques
for Judgmental Forecasting by
Groups with Common Information
JANET A. SNIEZEK
University of Illinois at Urbana-Champaign
Forty-four groups made judgmental forecasts for five problems. All group members received
the same task-relevant information: historical data for each variable in the form of a graph, and
a numerical listing of 36 previous monthly values. Each person first produced an individual
forecast, and then was assigned to one of four Group Technique conditions: Statistical, Delphi,
Consensus, and Best Member. Results show: (a) low accuracy of group forecasts compared to
Actual Best Member forecasts in difficult tasks, (b) under-confidence in unbiased easy tasks and
overconfidence in biased difficult tasks, (c) some unequal weighting of individual forecasts to
form Consensus group forecasts, and (d) an inability of groups to identify their best members.
Judgmental forecasting is the most popular forecasting approach in organi-
zations (Fildes and Fitzgerald, 1983; Dalrymple, 1987), and is very fre-
quently performed by groups of persons (Armstrong, 1985). Although re-
search on group judgment has a long history (see Einhorn, Hogarth, &
Klempner, 1977; Sniezek & Henry, 1989), research on group forecasting is
limited (for recent reviews, see Armstrong, 1985; Ferrell, 1985; Lock, 1987).
The findings from judgmental forecasting studies with individuals may
not generalize to groups. It can be seen from the group judgment literature
that group judgment processes often differ substantially from individual
processes (see McGrath, 1984; Sniezek & Henry, 1989, in press). For
example, individual forecasting and planning is susceptible to a number of
information processing limitations which are potential causes of forecast
error (Hogarth & Makridakis, 1981). Unlike a single person, groups interact,
and can experience disagreement, communication error, or social pressures.
The effect of any group process might be either to exaggerate or to diminish
individual biases in information processing.
Group & Organization Studies, Vol. 15, No. 1, March 1990, 5-19.
© 1990 Sage Publications, Inc.
The nature and extent of differences between group and individual judg-
ments often depends on the technique used to obtain the group forecast (see
McGrath, 1984), though not all comparisons of group techniques show differ-
ences (Fischer, 1981). A group technique is defined here as a specified set of
procedures for the formation of a single judgment from a set of individuals.
For example, communication among the individuals may be unlimited,
restricted, or prohibited with various techniques. One goal of applied group
research is to determine how to choose a group technique to constrain the
group process in order to improve performance. To do this, we must first learn
how group techniques affect judgment under various conditions.
One important condition that has received little attention concerns the
extent of unique versus common information held by each forecaster. For
example, Brown, Foster, and Noreen (1985) have raised questions about the
extent to which high correlations among security analyst forecasts result from
the use of a common information set, or from communication among ana-
lysts. Laboratory studies of ad hoc groups have emphasized the role of
pooling information uniquely held by various group members in enhancing
group performance (see Kaplan & Miller, 1983; Hill, 1982). However,
multiple judges working together within the same organization may have
more important differences about how to interpret available information than
differences in information. The increased availability of databases is likely
to create conditions in which all group members have access to the same
information. This is particularly likely to be true about historical data for time
series estimation problems. Yet, as noted by Hastie (1986), little is known
about group process when all members have the same information.
The major purpose of this article is to report an empirical study of group
judgmental forecasting when all group members have access to the same
data. Four techniques for obtaining group forecasts are compared across five
forecasting problems that are expected to vary in difficulty. The four group
techniques – Statistical, Delphi, Consensus, and Best Member – have differ-
ent constraints on the process of forming a group forecast. These group
techniques are compared to each other, and to relevant baseline models based
on individual forecasts in terms of their effects on two dimensions of group
performance: (a) forecast accuracy and (b) confidence in the forecasts.
MODELS OF GROUP JUDGMENT
Understanding the process by which a set of individual judgments are
transformed into a single group judgment is of great theoretical and practical
importance. Various models have been used to describe how groups actually
form judgments (Sniezek & Henry, 1989), and to evaluate the quality of these
groups' judgments (Einhorn et al., 1977; Sniezek & Henry, 1989). These
models fall into one of two classes: equal and unequal weighting of individ-
ual judgments. Unequal weighting, discussed in more detail below, involves
differential weighting of the judgments so that not all group members
have the same impact on the group’s judgment. In contrast, each of k group
members has the same weight (1/k) on group output with equal weight-
ing; the group judgment is simply the average or mean of the individual
judgments.
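In symbols (notation added here for clarity, not from the original; Y_i denotes member i's individual judgment and w_i its weight), the two classes are:

\[
\hat{Y}_G = \sum_{i=1}^{k} w_i Y_i ,
\qquad
\text{equal weighting: } w_i = \frac{1}{k} \ \text{so } \hat{Y}_G = \bar{Y},
\qquad
\text{unequal weighting: } w_i \neq w_j \ \text{for some } i, j .
\]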
In practice, the mean of the judgments of k individuals who never interact
is often more accurate than most individual judgments, but primarily because
it reduces random error. This suggests that the use of multiple judges is
advantageous, but that it is not necessary to apply group techniques involving
group interaction to improve judgment accuracy over the level achieved by
most individuals. Indeed, practitioners might be advised to simply average
multiple individual judgments (von Winterfeldt & Edwards, 1986). Averag-
ing also has the advantage of being inexpensive and efficient, relative to
common group techniques. For these reasons, the mean model provides one
important baseline against which to evaluate the actual performance of
groups. But, although averaging reduces random error, it will always be of
limited usefulness whenever the individual judgments are systematically
biased (Einhorn et al., 1977). Individual judgments are biased if the mean
individual judgment is above or below the actual value of the criterion
variable being forecast. There are many reasons to suspect bias in individual
forecasts (Hogarth & Makridakis, 1981). In order to increase accuracy when
individual forecasts are biased, a group technique leading to unequal weight-
ing is required (Sniezek & Henry, 1989). The set of tasks selected for this
study was chosen to include a variety of time series with varying amounts of
bias in individual judgments.
But unequal weighting of multiple forecasts is not necessarily advanta-
geous (Ashton & Ashton, 1985). In practice, group members’ individual
contributions might be weighted in proportion to, or without regard for,
individual judgment accuracy. Thus, in interacting groups, unequal weight-
ing can either help or hurt performance compared to equal weighting. Re-
search has found the judgments of interacting groups to be more accurate
than average individual judgments (Sniezek & Henry, 1989, in press;
Sniezek, 1989). From these studies, it can be inferred that group interaction
cannot be adequately described by an equal weighting model.
A group process using unequal weights can take many forms. First,
consider the case in which one member’s individual judgment is used as the
group judgment. At one extreme, groups can maximize performance by
assigning a weight of 1 to the Actual Best judgment (i.e., the judgment that
is closest to the actual value) and weights of 0 to the remaining judgments.
The problem, of course, is that in forecasting problems the Actual Best
judgment is not likely to be identified until after the criterion is known. For
this reason, the group's total reliance on the judgment of one "chosen best"
member may not lead to the level of accuracy of the Actual Best baseline
model. However, it must be noted that the Actual Best baseline for evaluating
group judgment accuracy is particularly stringent in that the Actual Best
capitalizes in part on chance. Thus, in practice we would not generally expect
consistent forecasting performance at the level of the Actual Best. An
interesting exception is discussed by Sniezek & Henry (1989), who discov-
ered frequent occurrences of group judgments that were outside the range of
members’ individual judgments and more accurate than the Actual Best. A
second meaningful baseline model with all-or-none weighting concerns the
random selection of one member’s judgment to use as the group judgment.
The present research evaluates group judgment accuracy by comparing it to
the levels that would have been achieved had the groups used each of three
baseline models: Mean, Actual Best and Random Member.
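To make the baselines concrete, the following minimal Python sketch (not from the article; the function and variable names are assumptions) computes the three baseline "group" forecasts from one set of member forecasts, once the actual outcome is known:

import random

def baseline_forecasts(member_forecasts, actual, seed=0):
    """Compute the three baseline 'group' forecasts described above.

    Mean:          equal (1/k) weighting of the k member judgments.
    Actual Best:   the judgment closest to the actual outcome; knowable
                   only after the criterion value is observed.
    Random Member: one member's judgment selected at random.
    """
    mean = sum(member_forecasts) / len(member_forecasts)
    actual_best = min(member_forecasts, key=lambda f: abs(f - actual))
    random_member = random.Random(seed).choice(member_forecasts)
    return {"Mean": mean, "Actual Best": actual_best, "Random Member": random_member}

# Example: five members forecast a variable whose outcome turns out to be 100.
print(baseline_forecasts([92.0, 97.5, 101.0, 104.0, 110.0], actual=100.0))

Note that the Actual Best baseline is computable only in hindsight, which is why it serves as a stringent evaluation standard rather than a usable technique.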
GROUP TECHNIQUES
Four group techniques of considerable interest in group judgment research
and practice are compared in this study: Statistical, Consensus, Delphi, and
Best Member. These group techniques differ greatly in their constraints on
communication in the group judgment process, and, therefore, can be ex-
pected to have differential effects on group judgmental forecasting accuracy
and confidence, if communication aids in the interpretation, and not just
sharing, of data. Face-to-face interaction is permitted with all but the Delphi
and Statistical techniques. Some form of communication is permitted with
all but the Statistical technique.
The Consensus group technique requires only that the group use face-to-
face discussion to produce a single final judgment to which all members
agree. The process is otherwise discretionary. For more accurate group
prediction than can be obtained with averaging, the more accurate individuals
must have greater impact on the group output. If data interpretation – and not
just sharing – is important, the Consensus technique will lead to greater
forecast accuracy and confidence than the Statistical technique, as in past
judgment studies (Sniezek & Henry, 1989; Sniezek, 1989).
A group using the Best Member technique engages in face-to-face discus-
sion for the purpose of selecting one of the group members as "best," so that
this person’s judgment will be the final group judgment. Little empirical
research has been done on the ability of group members to assess their own
or each other's performance quality. Einhorn et al. (1977) show that the
effectiveness of the Best Member technique increases with bias in the
individual judgments and the likelihood of selecting the Actual Best member.
Sniezek (1989) found that ongoing groups with both shared and unique
information selected best members with less bias than the mean model. The
emphasis on selection of a group member instead of formation of a forecast
is expected to hurt forecast accuracy relative to the Consensus technique,
regardless of the relative importance of information pooling or interpretation.
The Delphi procedure (Dalkey & Helmer, 1963) presumably reduces what
Steiner (1972) termed "process loss," for example, the inappropriate influ-
ence of variables such as status or confidence over ability. Members do not
meet face-to-face, and opportunities for both data sharing and interpretation
are indirect at best. Communication among them is limited to feedback about
the others’ judgments (e.g., the median judgment). The process of making
judgments and receiving feedback is repeated until consensus is achieved (or
until the median judgment stabilizes). This final median is then the group
judgment. There is some evidence that the Delphi procedure leads to more
accurate predictive judgment than statistical averaging of group members'
judgments (e.g., Jolson & Rosnow, 1971; Sniezek, 1989). However, the
absence of any opportunity to resolve differences of interpretation is likely
to minimize the benefits to forecast accuracy. Thus, it is predicted that the
Delphi technique will produce forecasts more accurate than the Statistical
technique, but less accurate than the Consensus.
The fourth group technique relevant to this study, the Statistical group
technique, prohibits interaction and communication among group members.
The mean of the individual judgments is called the "group" judgment.
Because of the lack of interaction, this technique is expected to produce the
least accurate group forecasts.
One way of evaluating the quality of Delphi, Best Member, and Consensus
group techniques is to examine the relationship of individual judgment
accuracy to influence on the group judgment. Best Member and Consensus
judgments should be superior to individual and Statistical judgments when-
ever two conditions hold: (a) member judgment accuracy variance is high,
and (b) influence on group judgment is determined by accuracy level. A high
variance in group members’ judgments has been found to be positively
related to the extent of improvement in group over individual judgment
accuracy (see Sniezek & Henry, 1989). In these studies, the variance could
be partly attributable to unique information held by individual group mem-
bers. To the extent that group member forecast variance is reduced by
common information in the present study, differences among the group
techniques should be diminished.
If influence and accuracy are unrelated, then neither the Best Member nor
the Consensus technique will lead to more accurate judgment than the
Statistical technique. The relationship between individual input and influ-
ence on the group may well be mediated by individual confidence in that
input (Hastie, 1986; Sniezek & Henry, in press). An individual’s confidence
in his or her own judgments compared to his or her confidence in the group
average presumably explains his or her own "influence" in the Delphi
technique (Sniezek, 1989). Thus it is of interest to investigate the appropri-
ateness of confidence as well as influence.
JUDGMENTAL FORECASTS
Intuitive time series prediction tasks were used in this study. Such tasks
have been used by previous researchers (e.g., Carbone & Gorr, 1985;
Eggleton, 1982) and have several features worth noting. First, all group
members have the same information, i.e., the time series data. Second, with
time series data it is possible to make distinctions between judgmental
forecasting policies. If group forecasts are found to vary with group tech-
nique, the differences can be described in terms of relative dependence on recent
values, global trends, seasonality, etc.
In evaluating judgmental forecasts, two dimensions of performance are
of interest: forecast accuracy and confidence. Whereas accuracy reflects
the actual quality of the forecast, confidence reflects perceived quality.
The success of judgmental forecasting in an organization depends on both
accurate forecasts and appropriate confidence. While the importance of
accurate forecasts is obvious, the importance of appropriate confidence
deserves some discussion. As Sniezek and Henry (1989) point out, the way
in which a forecast is used will depend on how much confidence is placed
in it. If confidence is unrealistic (i.e., either too high or too low relative to
the accuracy of the forecast), then organizational decision-making will be
sub-optimal.
A large body of empirical research on individuals’ confidence assessments
has led to the general conclusion that people are overconfident, though
exceptions have been observed for easy tasks or some types of experts (see
Lichtenstein, Fischhoff, & Phillips, 1982). Since difficult tasks are more
likely to involve judgmental forecasts from multiple persons, the implica-
tions of overconfidence for group forecasting in organizations are potentially
serious. But, research on confidence in group judgment has been very limited.
Sniezek (1989) and Sniezek and Henry (1989) found that confidence in-
creased following group discussion. In the Sniezek and Henry (1989) study
it was possible to evaluate realism, showing that, although groups were more
confident about their judgments than were their individual members, they
were actually less overconfident due to the improvement in accuracy through
grouping.
Significant differences among group techniques in terms of the realism of
confidence in judgments would have a bearing on their value in organiza-
tional decision making. Although sample sizes were too small to draw
conclusions, data from the Sniezek (1989) study suggest such differences.
The present study will allow the evaluation of the realism of confidence
assessments across group techniques and across tasks of varying difficulty.
In summary, the major questions addressed in this study are: In judgmental
forecasting tasks in which group members have shared information, do the
group techniques lead to different levels of forecast accuracy and confi-
dence ? How do the group forecasts produced with each technique compare
to those of baseline models based on individual forecasts? If the differences
among group techniques are attributable to the varying opportunities to share
information, no differences among group techniques are expected in this
study. If, however, the group techniques permit different chances to interpret
data, the rank ordering is expected to be: Consensus, Best Member, Delphi,
and Statistical.
RESEARCH METHODOLOGY
First, all participants (n = 220 undergraduate students) independently
made individual forecasts of the future (three months hence) values of five
variables. They were then randomly assigned to groups of five. The forty-four
groups were then randomly assigned to one of four group technique condi-
tions: Statistical, Delphi, Consensus, and Best Member. Each group yielded
one point forecast for each variable. Each individual and group forecast was
accompanied by an associated 50% confidence interval. This subjective
confidence interval is considered to be a measure of doubt or uncertainty
about the forecast. Bonuses (free books and magazines) were offered for
accurate predictions.
The forecasting task was constructed so that the time series data were
realistic, yet group members’ information concerning the problems could be
controlled. To accomplish this, actual time series data were used from five
financial variables: consumer installment credit outstanding, total retail sales,
3-month certificate of deposit interest rate, exports, and federal reserve bank
reserves. The actual outcome for these variables was obtained three months
after the collection of forecasts.
Task materials contained a graph of time series data from the three
previous years for each of five variables. The attached form instructed the
individual participant to make a forecast for each variable. Group forecast
forms were similar, but contained procedural directions specific to each
experimental condition. On the final page, following an explanation of the
concept of a 50% confidence interval, were blanks for the lower and upper
limits of a 50% confidence interval. To avoid order effects, a forward and a
reverse order of the five variables were each used on half of the forms.
Delphi members were seated apart from one another so that verbal and
non-verbal communication with other group members was not possible. The
experimenter obtained the initial individual predictions, then supplied mem-
bers with the median individual prediction for the first variable. Again,
members made individual predictions. The cycle of feedback of median
individual prediction and making new predictions continued until either the
same median resulted for three trials in a row or three of the five members
gave identical predictions. One of the conditions was always met in fewer
than six trials. The final median served as the group forecast. After all five
variables had been forecast, members produced limits for the 50% confidence
intervals.
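The cycle lends itself to a short simulation. The Python sketch below (not part of the study; the halfway-to-the-median revision rule is purely an illustrative assumption, since real members revise however they wish) implements the feedback loop and the two stopping conditions just described:

from statistics import median

def delphi_median(initial, revise, max_rounds=6):
    """Sketch of the Delphi cycle described above.

    initial: members' first-round forecasts.
    revise:  assumed function (own_forecast, current_median) -> new forecast.
    Stops when the median is unchanged for three trials in a row or when
    three members give identical predictions; returns the final median,
    which serves as the group forecast.
    """
    forecasts = list(initial)
    medians = [median(forecasts)]
    for _ in range(max_rounds):
        forecasts = [revise(f, medians[-1]) for f in forecasts]
        medians.append(median(forecasts))
        same_median = len(medians) >= 3 and medians[-1] == medians[-2] == medians[-3]
        identical = any(forecasts.count(f) >= 3 for f in forecasts)
        if same_median or identical:
            break
    return medians[-1]

# Example with the assumed revision rule: move halfway toward the median.
print(delphi_median([90, 95, 100, 105, 120], lambda f, m: (f + m) / 2))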
Consensus groups were instructed to discuss each problem until all
members agreed to a single point forecast. Then, as a group, they defined the
confidence intervals for the five variables. Best Member groups were directed
to discuss each problem, then choose the one group member who was most
likely to have the most accurate forecasts. That member’s initial forecasts
became the group forecast. As a group, they set limits to the confidence
intervals. Members of Statistical groups produced 50% confidence intervals
around their own individual forecasts.
RESULTS AND DISCUSSION
To differentiate among the five tasks with respect to difficulty, the mean
percent individual forecast error was calculated across all 220 individuals for
each variable. The five subtasks (and their mean percent errors) are ordered
in terms of increasing difficulty: exports: 5.3%, total retail sales: 8%, con-
sumer installment credit outstanding: 12.7%, federal reserve bank reserves:
25.75%, and 3-month certificate of deposit interest rate: 37.16%.
To assess group forecast accuracy, two performance measures were used:

\[
\text{squared error} = (Y_G - Y)^2 , \qquad \text{bias} = (Y_G - Y) ,
\]

where Y_G is the group forecast and Y is the actual criterion outcome. Over
n forecasts, squared error is a measure of variation about the point of zero
error, while bias is a measure of mean error. Squared error is used under
assumption that the negative consequences increase exponentially as forecast
error increases. A measure of subjective uncertainty was obtained by com-
puting the difference between the upper and lower limits of the subjective
50% confidence intervals.
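Restated as a small Python sketch (names assumed, not from the article), the three dependent measures over a set of n forecasts are:

def performance_measures(group_forecasts, actuals, intervals):
    """Compute the three dependent measures described above.

    Squared error: mean of (Y_G - Y)^2 over the n forecasts.
    Bias:          mean of (Y_G - Y), i.e., the mean signed error.
    Uncertainty:   mean width of the subjective 50% confidence intervals.
    """
    n = len(actuals)
    errors = [yg - y for yg, y in zip(group_forecasts, actuals)]
    squared_error = sum(e ** 2 for e in errors) / n
    bias = sum(errors) / n
    uncertainty = sum(upper - lower for lower, upper in intervals) / n
    return squared_error, bias, uncertainty

# Example: two forecasts of outcomes that were both 100.
print(performance_measures([103.0, 96.0], [100.0, 100.0], [(95.0, 110.0), (90.0, 101.0)]))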
To assess overall effects of group technique, a MANOVA was applied to
the dependent variables squared error, bias, and subjective uncertainty for
each forecasting task. Significant effects were obtained only for the three
most difficult tasks: federal reserve bank reserves (Hotelling's F = 3.73,
df = 15/173, p < .001), 3-month certificate of deposit interest rate (Hotelling's
F = 3.00, df = 15/173, p < .001), and consumer installment credit outstand-
ing (Hotelling's F = 4.70, df = 15/173, p < .001), with effect sizes ω² = .19,
.20, and .28, respectively. Condition means for each variable showing signif-
icance in subsequent univariate tests are given in Table 1.
First, it must be noted that the exports and total retail sales tasks for which
group technique did not affect performance were the relatively easy forecast-
ing problems. Regardless of the procedure for obtaining group forecasts, they
were not inferior to the Actual Best values. The consumer installment credit
outstanding, federal reserve bank reserves, and 3-month certificate of deposit
interest rate subtasks were all more difficult, but could be differentiated
according to the bias parameter. Individual and group forecasts for both
consumer installment credit outstanding and 3-month certificate of deposit
interest rate showed significant (p < .001) non-zero bias, while individual
TABLE 1
Condition Means for Variables Showing Significance in Univariate Tests
[table entries not legibly reproduced]
and group forecasts for federal reserve bank reserves were unbiased. On the
basis of differences in difficulty and bias among the five tasks, group process
effects on performance can be expected to vary across the five subtasks.
The pattern of pairwise significant differences is similar for consumer
installment credit outstanding and 3-month certificate of deposit interest rate,
but is unique for federal reserve bank reserves. For both consumer installment
credit outstanding and 3-month certificate of deposit interest rate, all exper-
imental conditions resulted in forecasts less accurate than Actual Best fore-
casts, and no better than the statistical forecasts on both the squared error and
bias indices. In short, all these group techniques failed to produce forecasts
as good as those of the best group member under conditions of bias. Unlike
the Sniezek and Henry (1989, in press) studies, not one group judgment fell
outside the range of the individual members' forecasts, thereby exceeding the
accuracy level of the Actual Best. Unlike those studies, the judges in the
present study were focused on common data for each task.
The federal reserve bank reserves results reveal four subsets of homoge-
neous squared error means. Here, Statistical does not differ from Actual Best,
and both Statistical and Actual Best are superior to all other conditions. This
is not surprising given that averaging individual judgments reduces ran-
dom error and that the federal reserve bank reserves individual judgments
were generally unbiased. In contrast, Best Member’s forecasts were signifi-
cantly less accurate than all other forecasts, even Random Member’s. Clearly,
the groups did not choose a member with above average ability to predict
federal reserve bank reserves. They could not be expected to, given that
individual differences in judgment accuracy reflected only random error. But
they also did not adhere closely to an averaging rule by picking the member
closest to the group average. Delphi, Consensus, and Random Member all
yielded similarly inaccurate forecasts. The fact that groups are significantly
poorer than the Actual Best and Statistical forecasts implies that (a) Consen-
sus was not achieved by averaging individual federal reserve bank reserves
forecasts, and (b) judgments in the Delphi groups were not weighted equally
(i.e., the final median judgment was not the mean of individual judgments).
Thus, with unbiased judgments, all of the group techniques in this study led
to some non-averaging processes, and therefore, to greater error than statis-
tical averaging.
Also important, in addition to forecast accuracy, is the apparent quality of
forecasts at the time that they are made. Subjective Confidence intervals are
intended to measure confidence placed in the point estimate: as size in-
creases, confidence decreases. Significant Confidence differences between
conditions occur only for the federal reserve bank reserves task. The Con-
TABLE 2
% of Confidence Intervals Containing Criterion Outcomes
[table entries not legibly reproduced]
NOTE: Since "50%" confidence intervals were requested, the most appropriate table
entry is 50%.
sensus groups clearly constructed the widest intervals. The Confidence values
in Table 1 do not correspond well to the squared error values, supporting
Sniezek and Henry's (1989) finding that judgment accuracy and confidence
are not highly related. Indeed, Pearson correlations among Confidence,
Squared Error, and Bias (for both individual and groups) were weak at best,
and not necessarily in the right direction. Regardless of task difficulty, the
size of confidence intervals was not strongly related to forecast accuracy.
A related research question concerns the absolute quality of the confi-
dence intervals. Table 2 lists the percent of group and individual confidence
intervals containing the actual criterion outcome. Comparing the columns
reveals consistent differences which can be attributed to task difficulty and
bias.
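The column comparison amounts to a simple hit-rate computation; a minimal Python sketch (names assumed) is:

def hit_rate(intervals, actuals):
    """Percent of confidence intervals containing the criterion outcome.

    For well-calibrated 50% intervals this should be near 50%: higher
    values suggest underconfidence (intervals too wide), lower values
    suggest overconfidence (intervals too narrow).
    """
    hits = sum(lower <= y <= upper for (lower, upper), y in zip(intervals, actuals))
    return 100.0 * hits / len(actuals)

print(hit_rate([(95, 110), (90, 101), (99, 104)], [100, 100, 100]))  # 100.0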
When judgments are unbiased (as in total retail sales and federal reserve
bank reserves), confidence intervals generally range from appropriate (near
50%), in the moderately difficult federal reserve bank reserves task, to wide,
in the easy total retail sales task. In the biased tasks, exports, consumer
installment credit outstanding, and 3-month certificate of deposit interest
rate, the intervals range from appropriate, in the easy task (exports), to
narrow, in the more difficult tasks (consumer installment credit outstanding
and 3-month certificate of deposit interest rate). In summary, both groups and
individuals appear to be underconfident about their forecasts in the easy and
unbiased task, total retail sales, and overconfident in the biased and difficult
tasks, consumer installment credit outstanding and deposit interest rate. This
supports previous research on the relationship of task difficulty to confidence
(cf. Lichtenstein et al., 1982).
CONCLUSIONS
The results of this study indicate that, when task information is shared,
group techniques have little differential impact on the quality of group
judgmental forecasts. In easy tasks without much bias, any process results in
an accuracy level as high as that of the Actual Best group member. In fact,
there is little reason to use groups in these tasks, since individuals tend to
perform as well as groups. In contrast, none of the group techniques studied
was clearly preferable in the more difficult tasks, since they all yield judg-
ments inferior to the Actual Best member’s. These data suggest that the
differences among group techniques occur due to the pooling, and not the
interpretation, of data. If relevant information is shared, there is simply less
to be gained with the use of a group. The choice of group technique appears
to be less important to forecasting performance when all members have
access to the same information. Thus, there is no evidence to suggest the use
of one technique over another.
The more basic question of whether to use any group technique in practice
depends on whether one can identify task difficulty or bias before forecasting
takes place. The finding that confidence intervals did not show sensitivity to
task difficulty suggests that subjective judgments about task difficulty would
not be useful. The alternative is to rely on statistical analyses of past data to
determine task predictability or difficulty. In tasks that are difficult, the use
of multiple judges is advised. Further, the formation of independent individ-
ual forecasts prior to group interaction is a good practice to follow. If all
information is held in common by group members, this practice will allow
for "error checking," or will reveal inter-judge agreement. If there is some
relevant information uniquely held, heterogeneity in group members’ judg-
ments is a likely result. Heterogeneity can improve group judgment perfor-
mance (Sniezek and Henry, 1989) and increase group members’ commitment
to the consensus judgment (Sniezek and Henry, in press).
The study additionally reveals that individual and group confidence in
forecasts (as measured by 50% confidence intervals) is unrelated to forecast
accuracy, and does not vary appropriately with task difficulty. In addition,
groups were not able to determine the relative quality of members’ forecasts.
It must be cautioned that these results, from judgmental forecasting with
shared information, do not necessarily have any implication for tasks with
both unique and shared information. Further, the results of this study may not
generalize to groups with large status differences among their members, or
groups that have ongoing interactions. While the group techniques did not
differentially affect the manner in which groups use data, they may well affect
behaviors important to other group forecasting situations. Future research
should compare the aggregation and integration of information with various
group techniques.
REFERENCES
Armstrong, J. S. (1985). Long-range forecasting: From crystal ball to computer. (2nd Edition).
New York: John Wiley and Sons.
Ashton, A. H. & Ashton, R. H. (1985). Aggregating subjective forecasts: Some empirical results.
Management Science, 31(12), 1499-1508.
Brown, P., Foster, G., & Noreen, E. (1985). Security analyst multi-year earnings forecasts and
the capital market. Studies in Accounting Research, 21, entire volume.
Carbone, R. & Gorr, W. L. (1985). Accuracy of judgmental forecasting of time series. Decision
Sciences, 16, 153-160.
Dalrymple, D. J. (1987). Sales forecasting practices. International Journal of Forecasting, 3, 1-13.
Dalkey, N. C. & Helmer, O. (1963). An experimental application of the Delphi method to the use
of experts (RM-727-PR). Santa Monica, CA: RAND Corp.
Eggleton, I.R.C. (1982). Intuitive time-series extrapolation. Journal of Accounting Research,
20(1), 68-102.
Einhorn, H. J., Hogarth, R. M., & Klempner, E. (1977). Quality of group judgment. Psycholog-
ical Bulletin, 84, 158-172.
Ferrell, W. R. (1985). Combining individual judgments. In G. Wright (Ed.), Behavioral decision
making. New York: Plenum.
Fildes, R. & Fitzgerald, M. D. (1983). The use of information in balance of payments forecasting.
Economica, 50, 249-258.
Fischer, G. W. (1981). When oracles fail—a comparison of four procedures for aggregating
subjective probability forecasts. Organizational Behavior and Human Performance, 28,
96-110.
Hastie, R. (1986). Experimental evidence on group accuracy. In B. Grofman & G. Owens (Eds.),
Decision research, Vol. 2. Greenwich, CT: JAI.
Hill, G. W. (1982). Group versus individual performance: Are n + 1 heads better than one?
Psychological Bulletin, 91, 517-539.
Hogarth, R. M. & Makridakis, S. (1981). Forecasting and planning: An evaluation. Management
Science, 27(2), 115-138.
Jolson, M. & Rosnow, G. (1971). The Delphi process in marketing decision making. Journal of
Marketing Research, 8, 443-448.
Kaplan, M. F. & Miller, C. E. (1983). Group discussion and judgment. In P. B. Paulus (Ed.),
Basic group processes. New York: Springer-Verlag.
Lichtenstein, S., Fischhoff, B., & Phillips, L. D. (1982). Calibration of probabilities: The state
of the art to 1980. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under
uncertainty: Heuristics and biases. Cambridge: Cambridge University Press.
Lock, A. (1987). Integrating group judgments in subjective forecasting. In G. Wright & P. Ayton
(Eds.), Judgmental forecasting. New York: John Wiley and Sons.
McGrath, J. E. (1984). Groups: Interaction and performance. Englewood Cliffs, NJ: Prentice
Hall.
Sniezek, J. A. (1989). An examination of group process in judgmental forecasting. International
Journal of Forecasting, 5, 171-178.
Sniezek, J. A. & Henry, R. A. (1989). Accuracy and confidence in group judgment. Organiza-
tional Behavior and Human Decision Processes, 43(1), 1-28.
Sniezek, J. A. & Henry, R. A. (in press). Revision, weighting, and commitment in consensus
group judgment. Organizational Behavior and Human Decision Processes.
Steiner, I. D. (1972). Group process and productivity. New York: Academic Press.
Von Winterfeldt, D. & Edwards, W. (1986). Decision analysis and behavioral research. London:
Cambridge University Press.
Janet A. Sniezek is Assistant Professor of Psychology at the University of Illinois at
Urbana-Champaign. She received her doctorate in psychology from Purdue University.
Professor Sniezek's research program includes the development of models of judgment
and decision making at the individual and group levels, as well as the empirical study
of judgmental forecasting in organizations. She has published in various leading journals
in psychology, business, and organizational behavior.