Okay, so today we’re going to be talking about measures of central tendency. Central tendency describes the points around which the rest of the scores focus. The three measures that we use for central tendency are the mean, the median, and the mode. The mean is just the arithmetic average of all the scores in a set of scores. The median is considered the central score, or the point which divides a distribution into two equal parts, with 50% of the distribution on one side of the median and 50% on the other. The mode is considered the typical, or most frequently occurring, score in a distribution of scores. Each has its own assumptions, and it is important to know these assumptions in order to know when each one is appropriate to report during data analysis.

So the mode is the most common score, as I just said. It can be used with variables at all three levels of measurement, which makes it very versatile, but it is most often used with nominal-level variables. When we want to find the mode, we count the number of times that each score occurs, and the score that occurs most often is the mode. If the variable is presented in a frequency distribution, the mode is the largest category, the one with the most frequencies. And if the variable is presented in a line chart, then the mode is the highest peak, the one with the most.

Okay, so here we have a couple of examples of finding the mode. If we go through and count the first array, we see two 25s, three 26s, and two 27s, with one each of the rest, so 26 is our mode. For the next array we do the same thing: we have three 25s, three 26s, and three 27s, so that one actually has three modes. In the next one, score after score occurs exactly twice, and once we have more than three tied scores we consider the distribution not to have a mode, much as we would if no score occurred more than once. So we can be unimodal with one mode, bimodal with two modes, or multimodal with three; after that, the distribution is considered to have no mode.

When we look at the mode for grouped scores, the mode for grouped data is defined as the midpoint of the interval containing the most frequencies. Three is the maximum number of modes for grouped data as well; if there is a fourth, it is considered to have no mode. We use the midpoint of the interval as the mode. In this example we have to find the interval with the most frequencies, and that looks like the one with 15, which is the interval 40 to 44. The next step is to find the midpoint of that interval. To find the midpoint, you take the lower limit of the interval you are in; the lower limit here is 39.5, because that's the lowest value that would round up into the interval. Then you divide the interval size by two, so 5 divided by 2 equals 2.5. Now I add 39.5 and 2.5, and that gives me 42. When you have very small interval sizes it is of course easy to just count and see that it is 42, but that is the general way you would go about it. Over in the second table we have three intervals tied at 15 frequencies, so that one would be considered trimodal, and you would just go through the same process and find the midpoint for each of those intervals. A quick sketch of both versions follows.
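To make that concrete, here is a minimal Python sketch (my own illustration, not part of the course materials) of the counting rule and the grouped-data midpoint; the three-mode cutoff follows the rule above.

```python
from collections import Counter

def modes(scores, max_modes=3):
    """Return the mode(s): the most frequent score(s), or [] when no score
    repeats or more than three scores tie for the top frequency."""
    counts = Counter(scores)
    top = max(counts.values())
    if top == 1:                       # every score occurs once: no mode
        return []
    result = sorted(s for s, f in counts.items() if f == top)
    return result if len(result) <= max_modes else []

print(modes([22, 23, 25, 25, 26, 26, 26, 27, 27, 28]))   # [26], unimodal

# Grouped data: the mode is the midpoint of the interval with the most
# frequencies, e.g. interval 40-44 (f = 15): 39.5 + 5/2 = 42.0
lower_limit, interval_size = 39.5, 5
print(lower_limit + interval_size / 2)                    # 42.0
```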
So when we’re looking at this, we typically have to realize that the most popular score is not always the most central score; it can actually be very far away from the central tendency. Deviant scores, or outliers, are scores located in one extreme or the other, and they can affect the mode. An example of this could be on a quiz. Let’s say that you have five people who made a 90, five people who made an 80, and four people who made a 100. You would have two modes, an 80 and a 90, but with all those 100s sitting there, that really doesn’t give you very much useful information at all. So you’ve really got to consider why you are reporting your data when deciding which measure you want. Or take that same quiz, but with five zeros, where the zeros were just because somebody hadn’t taken it yet; it would be extremely misleading if that were the mode. So, some of the limitations of the mode: some distributions have no mode, some distributions have multiple modes, and the mode of an ordinal or interval-ratio level variable may not be central to the whole distribution. Again, that’s why we use it a lot more for nominal variables, even though it can be used at all three levels.

So the median is the exact center of the distribution of scores, essentially just the one in the middle. It can be used with variables measured at the ordinal or interval-ratio levels; it cannot be used for nominal-level variables. We can’t really have a “middle color”: it doesn’t make sense, at least not numerically, to say 50 percent of the scores fall on one side of green and 50 percent on the other, whereas for the mode we could say that green had the highest frequency. In order to find the median, you need to put all the scores into an array. Array the cases from low to high or from high to low (it really doesn’t matter, as long as you put them in order) and locate the middle case. If N is odd, the median is the score of the middle case. If N is even, the median is the average of the scores of the two middle cases.

Here we have a couple of arrays that we can use for demonstration. In the first, we are looking for the one in the middle; we have three cases, so 15 is our median. In the next, we have an even number, so we take the two in the middle, add them together, and divide by two, which gives us a median of 16. And here is where the median can also be helpful and a bit more resistant to outliers: even though we have a 100 instead of a 19, the median is still 16. In the next two arrays the medians are 15 and 55. And in the last one we do 15 plus 15 divided by 2, which is 15. So that is pretty much it; it’s really pretty easy to find, and the rule is sketched below.
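Here is the odd/even rule as a minimal Python sketch, using the arrays above (illustration only, not something the assignment requires).

```python
def median(scores):
    """Middle score if N is odd; average of the two middle scores if N is even."""
    s = sorted(scores)                 # array the cases from low to high
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

print(median([12, 15, 17]))            # 15   (odd N)
print(median([12, 15, 17, 19]))        # 16.0 (even N)
print(median([12, 15, 17, 100]))       # 16.0, resistant to the outlier
```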
It gets a little bit more difficult if we need to find the median for grouped data. The formula for the median for grouped data is the lower limit, plus the frequencies needed in the interval divided by the frequencies found in the interval, times the interval size. This is where having a cumulative frequency column comes in handy. The first thing I want to do is find out exactly where the center score is, so I take my 111 and divide it by two, which gives me 55.5.

Then I go up through the cumulative frequencies and find out how close I can get to 55.5 without going over, and it looks like I can get to 51. All right, so how many more do I need in order to hit my 55.5? 55.5 minus 51 gives me 4.5, so that is my frequencies needed in the next interval. If I go up to the next interval, it looks like I find nine frequencies, so that’s my frequencies found. And I want to use the lower limit of the interval from which I am borrowing those frequencies, which would be 139.5. So I just go through and fill it in: 139.5, plus 4.5 divided by 9, times 5. Essentially my median is 142; that is our best estimate. And that is how we figure out where the median is: find out what those numbers are and then do the math. Pretty simple.

One of the things about the median is that it assumes data that can be measured at the ordinal scale or higher; again, like we’ve already talked about, it can’t be used for nominal data. It is a pretty stable measure of central tendency in the sense that it divides the scores in half, so it is the center of the data. But we don’t have to look at just the 50 percent point. Sometimes it tells us more to look at centiles, deciles, or quartiles. The 50 percent point is the fifth decile, but it’s also the second quartile, and this gives us a little bit more information. If you have ever taken an exam and scored in the 75th percentile, that means you scored better than 75% of all the other people who took that particular exam, and if we want to look at 33 percent, that would be the 33rd centile. An easy way to remember the names is with our coins: a penny is 1 cent, like the 1% centile; a dime is 10 cents, like the 10% decile; and a quarter is 25 cents, like the 25% quartile. So if I wanted to find the 75th percentile, the third quartile, for the data set we just had, then instead of dividing by 2 (or, equivalently, multiplying by 0.5) I would take the 111 and multiply it by 0.75. The position of the 75th percentile for that data set is 83.25, and from there we just use the formula to plug in and find where exactly that particular score falls. The sketch below wraps all of those steps into one helper.
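The grouped-median steps, and any other centile just by changing the proportion, fit naturally into one small function. This is a sketch of my own, assuming the satisfaction-score table from the slides; `grouped_centile` is a hypothetical name, not textbook notation.

```python
def grouped_centile(intervals, p):
    """intervals: list of (stated lower limit, frequency), low to high.
    Returns LL + (frequencies needed / frequencies found) * interval size."""
    n = sum(f for _, f in intervals)
    target = n * p                     # e.g. 111 * 0.50 = 55.5 for the median
    width = intervals[1][0] - intervals[0][0]
    cum = 0
    for lower, f in intervals:
        if cum + f >= target:          # this interval holds the target case
            return (lower - 0.5) + (target - cum) / f * width
        cum += f

# Satisfaction-score table (stated lower limit, f), low to high:
table = [(115, 5), (120, 10), (125, 11), (130, 15), (135, 10), (140, 9),
         (145, 10), (150, 7), (155, 8), (160, 13), (165, 3), (170, 6), (175, 4)]
print(grouped_centile(table, 0.50))    # 142.0, the median
print(grouped_centile(table, 0.75))    # ~158.4, the 75th centile (Q3)
```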
So the mean is, very simply, the average score. It requires variables measured at the interval-ratio level, but it’s often used with ordinal-level variables. You really shouldn’t do that; it’s bad practice because it violates a couple of assumptions, but some people do, and when you get to the other classes we’ll talk more about how to handle that. For the most part it really just needs to be used with interval-ratio data: you essentially can’t have a 4.5 level of agreement, so it doesn’t make sense to use it with an ordinal-level variable. It cannot be used for nominal-level variables at all; you definitely can’t be 0.5 green, and for much the same reason you really shouldn’t use it at the ordinal level either, because you can’t be 0.5 agreeable. The mean, or arithmetic average, is by far the most commonly used measure of central tendency. The formula for the mean is X-bar, the symbol for the mean, equals the sum of the scores over N: the sigma is the sum symbol, X stands for the scores, so ΣX just means the sum of the scores, and N is the number of cases that you have.

Now, some of the characteristics of the mean. The mean balances out all of the scores in a distribution: all scores cancel out around the mean, so if we were to take each individual score, subtract the mean from it, and sum those differences together, it should always equal 0. We’ll talk more about that in the next chapter’s lecture video. The mean is also the point of minimized variation of the scores, the “least squares principle”; that has a lot more to do with levels of variance, and again we’ll talk more about it next lecture video. And the mean is affected by all scores, because all scores are used in its calculation. That can make things difficult: when you have extreme scores, it really throws off the mean, and the mean can actually be very misleading. A lot of times you would want to look at your outliers and maybe report both means, or report the median, or at least talk about it in some way so that you’re not misleading your audience when you have a skewed mean. So a strength of the mean is that it uses all the available information from the variable. The weaknesses are that it is affected by every score, so high scores, low scores, and outliers do affect it; and if there are some very high or very low scores, which is what we mean by a skewed distribution, the mean can be very misleading, like I just talked about.

Here is an example. All this fancy formula really means is that you add up all the numbers you have here. If we total them together, we have 695, and I have 21 scores, so I take the 695 and divide it by 21, and that gives me a mean of 33.1. Now, about it being sensitive to outliers: let’s say that instead of the 60 we had 600, which would be a very far extreme from where our data actually is. If we had a 600 instead of the 60, that would change our mean to 58.8, and 58.8 is nowhere near the center of this particular data set. A lot of times the median is the most appropriate measure when there are extreme outliers like that, because in that case the median will be more central to where the rest of the cases are.

The mean for grouped data is simply the sum of the frequencies times the midpoints, divided by N. So we find the midpoint for each of our intervals and multiply it by the number of frequencies: here we have 8 times 58, which gives us 464, and you keep going down through the table. Once you have all of those, you divide by N, and in this particular case N equals 115. If we add them all together they come to 4,615, and 4,615 divided by 115 gives us 40.1. So that’s really pretty simple. I mean, it’s a lot of work, but don’t let all these fancy equations throw you off. I think that’s where a lot of people get intimidated about stats, but if you break it down (you just take this and multiply it by that, then add all of those together and divide by this) it’s truly not that hard. You just have to pay attention to detail. All right? Both calculations are sketched below.
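Here is a quick sketch of the mean, the effect of swapping the 60 for a 600, and the grouped mean from the IPV table (Python, for illustration only).

```python
scores = [18, 19, 19, 20, 21, 21, 22, 25, 29, 32, 35,
          37, 37, 38, 41, 41, 41, 43, 47, 49, 60]
print(sum(scores) / len(scores))       # 695 / 21 = 33.1

scores[-1] = 600                       # one extreme outlier...
print(sum(scores) / len(scores))       # 1235 / 21 = 58.8, badly misleading

# Grouped mean: sum of (f)(midpoint) over N, from the IPV table
f  = [8, 9, 3, 10, 10, 8, 11, 19, 12, 7, 3, 8, 7]
mp = [58, 55, 52, 49, 46, 43, 40, 37, 34, 31, 28, 25, 22]
print(sum(fi * mi for fi, mi in zip(f, mp)) / sum(f))   # 4615 / 115 = 40.1
```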
And so: means, medians, and skew. When a distribution has a few very high or low scores, the mean will be pulled in the direction of the extreme scores. For a positive skew, the mean will be greater than the median, and for a negative skew, the mean will be less than the median. That is essentially how you can tell the difference between the two. Some people say to look at whether the tail is to the left or to the right, but that gets complicated as to which side means what; positive and negative is just a direction. The best way to tell which way it’s skewed is this: if it’s positive, the mean will be greater than the median, and if it’s negative, the mean will be less. When an interval-ratio level variable has a pronounced skew, the median may be the more trustworthy measure of central tendency. A tiny sketch of that rule follows.
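As a small illustration of that rule of thumb (a sketch, not a formal skewness statistic):

```python
def skew_direction(scores):
    """Compare mean and median: mean above median suggests positive skew."""
    mean = sum(scores) / len(scores)
    s = sorted(scores)
    n = len(s)
    med = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    if mean > med:
        return "positive skew: mean pulled above the median"
    if mean < med:
        return "negative skew: mean pulled below the median"
    return "roughly symmetric"

print(skew_direction([12, 15, 17, 100]))   # positive skew (mean 36, median 16)
```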
That pretty much sums up central tendency: making sure we know why we’re reporting each measure, and the assumptions behind them, so that we don’t report something that doesn’t make sense. Otherwise we end up with one of those studies that says the average household has 2.3 kids, and you can’t really have 0.3 of a kid. So make sure that you know all these assumptions so that you don’t end up saying something like that. And that pretty much covers central tendency.

For this video, we are going to be talking about measures of variability, or dispersion. The concept of dispersion refers to the variety, diversity, or amount of variation among scores: the greater the dispersion of a variable, the greater the range of scores and the greater the differences between scores. When we’re looking at measuring variability, we have to consider the different types of data that we have, and also why it is that we are trying to measure the variability. The ones we’re going to talk about in this lecture are the IQV, the range, the deviation score or average deviation, the variance, and the standard deviation. If we have nominal data, the one we have to use is Mueller and Schuessler’s index of qualitative variation (IQV). The range can be used with a number of different types of variables; it is the distance over which particular proportions of scores are spread. A deviation score is the distance of a score from the mean of its distribution. The variance is the sum of the squared deviation scores divided by N. And the standard deviation is the square root of the variance; the standard deviation is important because it’s the one we use for decision-making about whether something is significant or not, so it’s a very important part of understanding statistics.

Okay, so we’re going to talk about the IQV first. This is the formula: IQV equals observed heterogeneity divided by maximum heterogeneity, times 100. To find out how many products go into the observed and the maximum, you use the formula k times (k minus 1), divided by two. Here we have an example with nominal data in which we have different types of rapes: rape by a date, by a close friend, by a family acquaintance, by a stranger, and by a relative. In order to find our level of variation between these, we need to find the observed and the expected. If there were nothing going on, no outside influences on which types occur, we would expect all of the categories to be even, all at 200, and that gives us our maximum heterogeneity. In this particular case we have five categories, so we multiply 5 by that same number minus one, 5 times 4, and then divide by two, which gives us 10. So we will have 10 different products. For our observed, the first product would be 200 times 100; then we add 200 times 200, then 200 times 350, then 200 times 150. Then we go down to the next category, and we have 100 times 200, 100 times 350, and 100 times 150, and so on for the remaining pairs. Knowing ahead of time how many products you should have (“product” just means how many times you’re going to be multiplying two frequencies together) lets you go ahead and lay out the equation and know exactly what to expect. And since, for the maximum, the base is the same every time, you can just take 200 times 200 and multiply it by the 10 pairs, which gives us 400,000. So that side is pretty easy; the essential thing is to know that there are 10 products: one, two, three, four, five, six, seven, eight, nine, ten.
And so counting them out just helps keep everything straight. Then you add all of those products together, and if we do that we end up with 382,500, divided by 400,000, which gives us 0.9562. Then we multiply that by 100 in order to get it into a percentage, so 95.62. That tells us that there is a good amount of diversity across the categories: several of the frequencies sit at 200 even and the 150 is close, so there’s not a whole lot of difference between the observed and the expected. The whole calculation is sketched below.
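Here is the full IQV calculation as a Python sketch, using the five observed frequencies from the example (the `iqv` helper is my own illustration):

```python
from itertools import combinations

def iqv(freqs):
    """IQV = observed heterogeneity / maximum heterogeneity * 100, where
    heterogeneity is the sum of products over all k(k-1)/2 category pairs."""
    k, n = len(freqs), sum(freqs)
    observed = sum(a * b for a, b in combinations(freqs, 2))
    equal = n / k                      # every category even, here 1000/5 = 200
    maximum = equal * equal * (k * (k - 1) / 2)
    return observed / maximum * 100

# Five categories totaling 1,000 cases, as in the lecture's example:
print(iqv([200, 100, 200, 350, 150]))  # 382,500 / 400,000 * 100 = 95.625
```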
Okay, so the range. The range indicates the distance between the highest and lowest scores of a distribution, and it’s often written as R = high score minus low score. It is a quick and easy indication of variability, and it can be used with ordinal or interval-ratio data. The reason, again, that we can’t use the range for a nominal variable is that it just wouldn’t make sense to have a measurable distance between pigs and cows, or between apples and oranges. So it can only be used at the higher levels of measurement.

Here we have an example of an array that we’re going to find the range for. We locate our highest and lowest scores, which putting them in an array makes easy: our lowest is 20 and our highest is 49. So the range, the distance over which 100% of the scores in the distribution are spread, is 49 minus 20, which gives us 29. Some textbooks say you should use the lower and upper real limits instead; it depends on the textbook, so always pay attention to that. Back when I was where you guys are, I would actually take 49.5 and subtract 19.5, so my range would be 30. It really depends, so look at what your textbook and your teacher want, and follow the textbook’s guidance on that one.

Here we can also find our interquartile range. The interquartile range is a type of range measurement that considers only the middle 50 percent of the cases in a distribution. It avoids some of the problems of the range by focusing on just the middle 50 percent of the scores, but it is limited because it is still based on only two scores; it fails to yield any information from all of the other scores, since you’re cutting off 50 percent of them. If we go back to the array, I need to locate my first and third quartiles. The first quartile is our fifth case: one, two, three, four, five, so 28 is our first quartile. Then, counting five in from the top, that would be the start of the fourth quarter, and I want the end of the third, which is 44. So this gives me 44 minus 28, an interquartile range of 16, which is very different from the 29 that we got before. For this distribution there really wouldn’t be any need to do that, because we don’t have extreme outliers on either side; it would be kind of misleading to use that type of range here. Again, you’ve got to look at what your data looks like to determine what you should report.

It’s much the same whenever we want to find the range for grouped data: we take the lowest and highest values from whichever intervals we have, so the 115 and the 179. Essentially the range would be 179 minus 115, which is 64, or, if you’re doing it old school like I was raised to do, 179.5 minus 114.5, which equals 65. Again, follow the textbook and do it whichever way the textbook tells you to. Either way, this is a very unstable measure, because it is very sensitive to deviant scores, so it’s a poor choice if you have outliers.

We can also find the interquartile range for grouped data, kind of like we talked about in the last video; this is actually the same chart we were looking at there. For the halfway point we multiplied by 0.5, and here we just change it up a little bit. If we want the first and third quartiles, we take the 111 and multiply it by 0.25, which gives us 27.75, and we multiply the 111 by 0.75 to get the third quartile. Then, based on that information, we find the score at each of those positions by following the exact same steps as before. So if I want to find where the 27.75th case falls, I go as close as I can get in the cumulative frequencies without going over, which is 26, so I need to borrow from the next interval up. My lower limit there would be 129.5. I have 26, and 27.75 minus 26 is 1.75, so that is the number of frequencies I need; but I found 15 in that interval and only needed 1.75. That’s why I take my frequencies needed and divide by my frequencies found, times 5. Remember, when you’re doing these equations, you have to go by the order of operations: for this formula, that grouping is a multiplication, so we multiply (the frequencies needed divided by the frequencies found) by the interval size. To do that, we take the 1.75 that we need over the 15 that we found and multiply by five. Any time you multiply a fraction by a whole number, you can put a 1 under the whole number as the divider: (1.75/15) × (5/1) = 8.75/15, which is about 0.58. So our actual first quartile would be 129.5 plus 0.58, about 130.1. Then we go up and find the third quartile the same way, and we subtract the two. That’s pretty much how you find the range measures for grouped data.

The limitations of the range: the range is based on only two scores, so it is distorted by atypically high or low scores, very similar to the mean, and it gives no information about the variability between the high and low scores. You don’t really know how the cases in between fall, so it doesn’t give us a good look at the data in itself. Both of the ungrouped versions are sketched below.
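Here is a quick sketch of both measures; the 20-score array is hypothetical, reconstructed so the quartile cases land on the lecture’s 28 and 44.

```python
def data_range(scores):
    """R = high score - low score (some texts use the real limits instead)."""
    return max(scores) - min(scores)

def iqr(scores):
    """Q3 - Q1 by the lecture's counting method: N/4 cases in from each end."""
    s = sorted(scores)
    q = len(s) // 4
    return s[-q] - s[q - 1]            # end of 3rd quartile minus Q1 case

# Hypothetical 20-score array matching the lecture's numbers (Q1=28, Q3=44):
scores = [20, 24, 26, 27, 28, 29, 30, 31, 33, 34,
          35, 36, 38, 40, 42, 44, 45, 46, 48, 49]
print(data_range(scores))              # 49 - 20 = 29
print(iqr(scores))                     # 44 - 28 = 16
```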
So when we look at the average deviation: the average deviation is the variation of scores from the mean of the distribution, essentially, on average, how far off a score is.

This little symbol here just means the deviation score, and of course you’ve already seen the sum symbol. The side bars mean the absolute value of those, and N is the sample size. You take each score and subtract the mean from it to get the deviation score. Here our mean is 29. Let’s say this was a basketball player who normally scores 29 points per game. In this particular game they fell short and scored 23 points, which is six less than their average; here 30, which is one more than average; 31 would be two more; and so on and so forth. We did this over the course of five games: one, two, three, four, five. Then we take the absolute value, so we turn all of these into positives (all that means is we get rid of the negatives) and add them all together. If we add everything, we end up with 40, divided by five, and our average deviation is eight. Essentially, what that means is that if we were going to bet on this particular player, we would use their average plus or minus 8, and we would be right pretty often as to where they’re probably going to fall. As far as that’s concerned, this kind of goes along with betting and trying to figure out what’s a safe bet and what’s not: you never know for sure, but it gives you a decent idea of what to expect from a particular player on any given day.

This also gives you a way to check and make sure that you didn’t make any mistakes, like we were talking about last video, and that the mean was calculated correctly: the sum of all the deviation scores will always equal 0. Here we have 17 minus 14, which gives us 3; plus 2 is 5; plus 1 is 6; minus 6 is 0. So we didn’t make any mistakes here; that’s just a way of going back and double-checking yourself. The average deviation is sketched below.
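Here is a small sketch of the average deviation; the five game scores are hypothetical, reconstructed from the deviations the lecture lists (-6, +1, +2, +17, -14 around a mean of 29).

```python
def average_deviation(scores):
    """Mean of the absolute deviations from the mean: on average,
    how far a score sits from the center of the distribution."""
    n = len(scores)
    mean = sum(scores) / n
    devs = [x - mean for x in scores]
    assert abs(sum(devs)) < 1e-9       # deviation scores always sum to 0
    return sum(abs(d) for d in devs) / n

# Five games around a 29-point average; |deviations| sum to 40, so AD = 8
print(average_deviation([23, 30, 31, 46, 15]))   # 8.0
```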
Then the standard deviation calculation. To solve it, we have to subtract the mean from each score, then square the deviations, then sum the squared deviations, divide that sum by N, and finally find the square root of the result. The steps up until finding the square root are actually finding the variance first: s squared is the variance, and plain old s is your standard deviation. So the variance is the sum of the squared deviation scores divided by N, and the standard deviation is just the square root of that.

So here we have our numbers. Let’s say we were looking at days attended for 11 kids: this person attended 20 days, this person attended 30, and say 30 is perfect attendance. On average, how did the group do? We take the total number of days and divide it by the number of students we have. If you add all of those scores together, you’re going to get 275, and we have 11 kids, so 275 divided by 11 gives us 25. Then from each of the scores we subtract 25, and that gives us our deviation scores; if you add all of those up, we get 0, so that means we did it correctly.

The next thing we want to do is square the deviation scores. Here we have the sum of the squared deviation scores, which means I have to square each deviation first. Remember again that you have to follow the order of operations (“Please Excuse My Dear Aunt Sally”: the part in parentheses is done first, then the exponents, with the adding and dividing after), so we work through all of that before we divide. If we square all of these and add everything together, we get 110, and 110 divided by 11 gives us 10. So our variance is 10. Then, to get the standard deviation, we take the square root of that variance: the square root of 10, which is 3.16. And that is pretty much how you calculate the standard deviation; the whole calculation is sketched below.
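Here the whole calculation is wrapped up in one sketch; the attendance scores are hypothetical, chosen to match the lecture’s totals (sum 275, squared deviations summing to 110).

```python
def variance_and_sd(scores):
    """s^2 = sum of squared deviations / N; s = square root of s^2."""
    n = len(scores)
    mean = sum(scores) / n
    ssd = sum((x - mean) ** 2 for x in scores)   # sum of squared deviations
    variance = ssd / n
    return variance, variance ** 0.5

# Hypothetical attendance for 11 kids: sum 275, mean 25, squared devs 110
days = [20, 30, 28, 22, 29, 21, 26, 24, 27, 23, 25]
print(variance_and_sd(days))                     # (10.0, 3.162...)
```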
We could also do it for grouped data, but your book’s not going to, so I’m just going to go ahead and stop the lecture here.

For this video, we are going to be going over Module 2. Module 2 is all about measures of central tendency and dispersion. Here you have your lecture videos on Chapters 3 and 4, here’s some basic information about how to use SPSS, and here are the PowerPoints that I used for the video presentations. For the quiz, there are going to be 10 questions from Chapter 3 and 10 questions from Chapter 4, very similar to the way Module 1 was set up. And for the discussion board: essentially, if you read the chapters and watch the videos, you will have no problem answering any of the questions there; they’re pretty straightforward. For assignment one, in the videos and the readings you will learn how to calculate the mean, the median, the mode, the range and standard deviation, and the third quartile. Remember back in Module 1, where I talked in the lecture video about putting things that are in disarray into an array? That’s going to be very important in order to be able to find the median, the mode, and the range for each of these three, so don’t forget to do that; otherwise it’s going to be a lot harder for you to figure out what the range, the mode, and the median are. All of that should be pretty standard.

Then there’s the SPSS part, and that part may not be quite as easy, so I’m going to go over how to do that particular assignment. I am at home, so I have logged in via the vLab; to get here, all I did was Google our school’s vLab page and enter my username and password. Then, from the Software Center, you can go ahead and download SPSS. It will remember your profile, so once you download it, you don’t have to keep downloading it from the Software Center each time; you’ll just have to go search for it if it’s not already pinned for you.

All right, so here I have started on your first task. Your first task is to create an SPSS file for the data above and then answer the following questions based on the data set. We have respondents, and I went ahead and gave them their ID numbers, so here all I had to do was type each of these in. If you have a nominal variable, we still have to put it into numbers, and here’s the way you go about it: I type in the value 1, then Protestant, then Add, and that clicks it over into the area right here; then 2, Catholic, Add; and keep going until you get all the way up to 4. It’s kind of the same for males and females; again, that’s a nominal variable. You do want to change the number of partners a person has over to interval-ratio; SPSS treats interval and ratio the same, so it is a “scale” variable. And in your data set, the only variable that is ordinal will be legalization of marijuana, so that’s the only one you’re going to list as ordinal. To create a new variable, say age, I would just type in age. For age I don’t want decimals (we’re not doing point-whatever), so let’s go ahead and move that down, and age we are going to put as a scale variable. Then I go through and look at the list over here: our first respondent, a freshman, looks like they were 18, then the next one was 20, then 21, and then it looks like our fifth one started a little later in life at 25. So that’s just how you go about adding variables into your data set; a rough outside-of-SPSS equivalent is sketched below.
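If you wanted to mirror the same coding outside SPSS, say in Python’s pandas, it might look like the sketch below. The values are hypothetical (the lecture only reads a few of them aloud), and the column names are my own stand-ins.

```python
import pandas as pd

# Hypothetical first five respondents, coded as in the lecture:
# religion 1=Protestant, 2=Catholic, ... (nominal); gender 1=male, 2=female
# (nominal); partners and age entered as scale variables
df = pd.DataFrame({
    "id":       [1, 2, 3, 4, 5],
    "religion": [1, 2, 1, 3, 2],
    "gender":   [1, 2, 2, 1, 2],
    "partners": [1, 0, 2, 4, 10],
    "age":      [18, 20, 21, 19, 25],
})
print(df["religion"].value_counts())   # a frequency distribution, as in SPSS
```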
So one of the things is that you can’t use spaces or special characters in the variable names, so I had to get a little bit creative with how I labeled these things. If you want to put ID here, that serves fine; just try to make sure that you know exactly what each name means, so it’s easy enough to read, and easier for me to be able to grade. Keeping the names as close as possible to what’s on the assignment is probably the easiest thing for me to grade, so to make extra sure that I know you did it right, I would probably use those same labels.

The next step, after you get all of your information in (I did the first five respondents, just so I would have enough data to show you how to do each of these), is your first task: create a frequency distribution for the following variables: religion, gender, and race. To do that, we’ll go up to Analyze, then Descriptive Statistics, and Frequencies. I already have gender here, so I also want to do religion and race; I add religion and race and click OK, and that gives me my frequency distributions for religion, race, and gender. We don’t really need the mean at this point, so don’t worry about that; we’ll actually come back to it here in just a minute.

The next question is: what percentage of females are Catholic? To be able to look at that, we’ll go to Analyze, Descriptive Statistics, and then Crosstabs. I want the females who are Catholic, which means I need to look at gender by religion. You can go ahead and put the percentages in if you want, so let’s look at it with percentages first. This is very cluttered, but you can find it. We want to know what percent of females are Catholic, so we look at our females, and the percent within our gender is 33.3: essentially, a third of all the females are Catholic. We only have three females, and only one of them is Catholic, so one out of three is where the 33 percent comes from. Now say the question asked for those who are both female and Catholic; then you would have to look at the total line in order to figure that one out. So if I wanted the total who were both female and Catholic, I would go down to female and look at the total. It looks like we did not have any males who were Catholic, so that would give us 20% of the population being both female and Catholic: now we’re counting out of five, we have one out of five, and that’s where we get our 20 percent. So make sure that you’re looking at the right total.

For me, a lot of times it’s easier not to have all of this and to just look at the counts, so we’ll go back to the Crosstabs, and this time I’m going to take away the percentages and run it again. To me it’s a lot easier to just look at this and say that one out of three females, so a third, 33.3 percent, are Protestant or Catholic or whichever; any one of them is going to be 33 percent. And then if I wanted the males and females who were Protestant as a whole, that would be the two over five, so 40%. So it’s really up to you, however you think you can most easily read the chart; to me the simple one is better, but some people like to see the percentages. The key, either way, is which total you look at. The pandas sketch below mirrors the same crosstab logic.
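Continuing the hypothetical five-respondent data, here is the same crosstab logic in pandas (labels and breakdown are illustrative, not the actual assignment data).

```python
import pandas as pd

df = pd.DataFrame({"gender":   ["M", "F", "F", "M", "F"],
                   "religion": ["Prot", "Cath", "Prot", "Jewish", "Jewish"]})

# Counts, then "% within gender" (row percentages), as in the SPSS crosstab:
print(pd.crosstab(df["gender"], df["religion"]))
print(pd.crosstab(df["gender"], df["religion"], normalize="index") * 100)
# One of three females is Catholic -> 33.3% within gender; one of all five
# respondents is both female and Catholic -> 1/5 = 20% of the total
```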
That can make a huge difference, because which total you look at can be the difference between a right answer and a wrong one; to me, that’s why the simple totals are so much easier. Also, when you’re asked for a proportion of males to females, you might want to look at the simple chart as well, because the other one is all in percentages, and if you report percentages when it asks for a proportion, you’re going to get that question wrong. So leave it in proportions.

All right, the next few questions are about the central tendencies. The first one is: what is the best representation of central tendency for religion, given its level of measurement, and why? So we go up here to Analyze, Descriptive Statistics, Frequencies, then Statistics, and I want the mean, median, and mode; this time I just want to look at religion. And here, even though I have selected the mean, median, and mode, it doesn’t give me anything meaningful, because you can’t really be 1.5 Catholic, and if you were to put these categories in an order, the order wouldn’t really mean anything at all, so you’re not going to have a central one. The only one that makes sense is the mode, which is 2 here. It would be very important to talk about the fact that it just wouldn’t make sense to have a center category, because there is no order to them, no rhyme or reason, and it wouldn’t make sense to have an average either, because then you would end up with “on average, a person is zero-point-something of a certain religion,” and that doesn’t make any sense. So you would use the mode, and you would talk about that.

An example with different numbers, one where SPSS would show you values that still don’t make sense just from looking at the chart, would be gender. Here it does give us a mean, a median, and a mode. The mean: you can’t really be 1.6 male or female; you’re one or the other, and even if “other” were an option, it’s still a nominal variable, so you wouldn’t be 0.6 other either. The median is showing a two, which you would have to translate; remember what you saved as 1 and 2. There are five respondents, and we have two males, so the array would be 1, 1, 2, 2, 2, and the middle case is a 2. In this case that may work out, but really, your choices are male or female as to which one occurs the most: you should do the mode, clearly, and for the mode you’d have to know the conversion too. Just looking over here at the frequencies, the mode is female. So again, you’d talk about the fact that it’s nominal, and that it just doesn’t make sense to report the others, simply because of the way the coding works: if you reported that the mode was “two” when we’re talking about males versus females, people would have no clue what you’re talking about. In this case, though, two does translate to female, so they come out the same: females have the most, and female is also the median case. You could report both, because they are both the same; that’s up to you, as long as you answer why you reported both. But make sure that you do not report the mean, because that makes no sense whatsoever for a nominal variable. The rest of them are pretty much exactly like that, so next you would pick, say, number of partners, and why.
So let’s go up here to Analyze, Descriptive Statistics, Frequencies, and this time I want number of partners. There we go; maybe go ahead and put the statistics in there too, and it’s sometimes good to have your totals as well. Here our values run between 0 and 6, plus a 10, so you’ll have a wider variety to work with, and this is a scale variable, so you could use the mean or the median or the mode. For this one, though, it looks like it may be a little bit skewed, because of this 10 right here: that’s quite a bit higher than the rest of them, and it might be bringing the mean up just a little. The mean being higher than the median means that it is skewed in the positive direction. In this case you might want to report the median, which here is also the mode. Depending on what you end up with at the very end, just talk about outliers and things like that, and how they could affect each measure. Compare the three numbers and see which one you think would be most appropriate, because in theory you could report any of those for a scale variable; it’s not like the nominal case, where you’re more limited. Once you’ve done that, just write up a little narrative about what you found.

And for the prejudice score for the freshmen and then the prejudice score for the seniors, just talk a little bit about the central tendencies and how they changed: did they go up, or did they go down, things along those lines. We’re not going to talk about whether it’s significant or not; just talk about whether it does look like there was a change, and stick to the central tendencies. And that is pretty much all you really need to do. Make sure that you keep this data set that you created, because we’re going to come back to it for a lot of our activities. So that pretty much covers Module 2. Let me know if you have any questions. Thanks.
Measures of central tendency
Central tendency
Describes the points around which the rest of the scores focus
Three measures
Mean
Median
Mode
Mode is considered the typical, or most frequently occurring, score in a distribution of scores.
Median is considered the central score, or that point which divides a distribution into two equal parts, with 50% of the distribution on one side of the median and 50% on the other side.
Mean is the arithmetic average score of all scores in a set of scores.
Each has its own assumptions; it is important to know these assumptions in order to know when one is appropriate to report during data analysis.
Mode
The most common score
Can be used with variables at all three levels of measurement
Most often used with nominal level variables
Finding the Mode
Count the number of times each score occurred
The score that occurs most often is the mode
If the variable is presented in a frequency distribution, the mode is the largest category
If the variable is presented in a line chart, the mode is the highest peak
The mode
22, 23, 25, 25, 26, 26, 26, 27, 27, 28, 29, 30, 31, 32, 33, 35
22, 23, 25, 25, 25, 26, 26, 26, 27, 27, 27, 28, 29, 30, 31, 33
22, 22, 23, 23, 24, 24, 25, 25, 26, 26, 27, 28, 29, 30, 35, 35
Can have multiple modes, but no more than 3
Mode for grouped data
Sample of convicted murderers & sentence received

Years sentenced    f
55-59             11
50-54              7
45-49             10
40-44             15
35-39             10
30-34              9
25-29              5
20-24              3
N = 70
Sample of convicted murderers & sentence received

Years sentenced    f
50-54             15
45-49             10
40-44             15
35-39             10
30-34              7
25-29             15
20-24              5
N = 77
The mode for grouped data is defined as the midpoint of the interval containing the most frequencies. Three is the maximum number of modes for grouped data as well; if there is a fourth, there is no mode.
Use the midpoint of the interval as the mode
Example: the interval 40-44 has the most frequencies, so this interval’s midpoint is our mode (42 years). 5/2 = 2.5; 39.5 + 2.5 = 42
One mode= unimodal
Two modes= bimodal
Three modes= multimodal
Nominal level or higher
Most popular is not always the most central score. Can be very far away from the central tendency
Deviant scores or outliers– scores located in one extreme or another (small or large)
Limitations of Mode
Some distributions have no mode
Some distributions have multiple modes
The mode of an ordinal or interval-ratio level variable may not be central to the whole distribution
Median
Exact center of distribution of scores
The score of the middle case
Can be used with variables measured at the ordinal or interval-ratio levels
Cannot be used for nominal level variables
Finding the Median
Array the cases from low to high (or from high to low)
Locate the middle case
If N is odd: the median is the score of the middle case
If N is even: the median is the average of the scores of the two middle cases
The median
12, 15, 17
12, 15, 17, 19
12, 15, 17, 100
9, 12, 15, 17, 100
10, 35, 39, 43, 55, 220, 320, 480, 2,000,000
9, 12, 13, 15, 15, 15, 15, 15, 15, 17, 19, 20
Point that divides a distribution of scores into two equal parts.
In an array of an uneven number of scores, the central score becomes the median.
When there is an even number we can find a median, but this number is a theoretical point dividing a distribution
Medians: 15, 16, 16, 15, 55, 15
Median for grouped data
Mdn = LL + (fn/ff)(i)
Satisfaction Score

Interval     f    cf
175-179      4   111
170-174      6   107
165-169      3   101
160-164     13    98
155-159      8    85
150-154      7    77
145-149     10    70
140-144      9    60
135-139     10    51
130-134     15    41
125-129     11    26
120-124     10    15
115-119      5     5
N = 111
LL= lower limit of the interval containing the number of frequencies we need to divide the total number of scores into two equal parts.
fn= the frequencies we need in the interval
ff= the frequencies found in the interval
I = the interval size
N/2= 111/2= 55.5
We must find the point with 55.5 scores on one side and 55.5 scores on the other side.
51 is as close as we can get to 55.5 without going over
Mdn = 139.5 + (4.5/9)(5)
    = 139.5 + 22.5/9
    = 139.5 + 2.5
    = 142
Assumes data that can be measured at the ordinal scale or higher.
More stable measure of central tendency in the sense that it divides the scores in half.
Centiles, deciles, & quartiles
Centiles– divide distributions of scores into 1 % units
Deciles– divide distributions of scores into 10% units
Quartiles– divide distributions of scores into 25% units
50%= 5th decile and the 2nd quartile
The 75th centile is the point leaving 75% of all scores below it and 25% of scores above it.
The 33rd centile is the point leaving 33% of all scores below it and 67% of scores above it.
For the 75th centile of the 111 scores in the last chart, we take (.75)(111) = 83.25, and from there we use the formula from the last slide and plug in the amounts.
Mean
The average score
Requires variables measured at the interval-ratio level but is often used with ordinal-level variables
Cannot be used for nominal-level variables
The mean or arithmetic average, is by far the most commonly used measure of central tendency
Characteristics of the Mean
The mean “balances” out all of the scores in a distribution; all scores “cancel out” around the mean.
The mean is the point of minimized variation of the scores, “least squares principle”
The mean is affected by all scores; all scores are used in the calculation of the mean.
Strength – The mean uses all the available information from the variable
Weaknesses
The mean is affected by every score
If there are some very high or low scores (as with skewed distributions), the mean may be misleading
The Mean:
X̄ = ΣX / N
18, 19, 19, 20, 21, 21, 22, 25, 29, 32, 35, 37, 37, 38, 41, 41, 41, 43, 47, 49, 60
ΣX = the sum of the scores
N= the number of scores
695/21= 33.1
Replace the 60 with 600. What does the mean become?
(58.8)
Median is most appropriate when there are extreme scores or outliers.
Number of IPV incidents from women with PTSD

Intervals    f    MP   (f)(MP)
57-59        8    58    464
54-56        9    55    495
51-53        3    52    156
48-50       10    49    490
45-47       10    46    460
42-44        8    43    344
39-41       11    40    440
36-38       19    37    703
33-35       12    34    408
30-32        7    31    217
27-29        3    28     84
24-26        8    25    200
21-23        7    22    154
N = 115          Σ(f)(MP) = 4,615
The mean for grouped data
X̄ = Σ(f)(MP) / N
(Same IPV table as the previous slide.)
MP= interval midpoints
X̄ = 4,615 / 115 = 40.1
Means, Medians, and Skew
When a distribution has a few very high or low scores, the mean will be pulled in the direction of the extreme scores
For a positive skew, the mean will be greater than the median
For a negative skew, the mean will be less than the median
When an interval-ratio level variable has a pronounced skew, the median may be the more trustworthy measure of central tendency
IBM SPSS Statistics Base V27
IBM
Note
Before using this information and the product it supports, read the information in “Notices” on page
197.
Product Information
This edition applies to version 27, release 0, modification 0 of IBM® SPSS® Statistics and to all subsequent releases and
modifications until otherwise indicated in new editions.
© Copyright International Business Machines Corporation .
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with
IBM Corp.
Contents
Chapter 1. Core features…………………………………………………………………………….1
Power Analysis…………………………………………………………………………………………………………………………… 1
Means……………………………………………………………………………………………………………………………………2
Proportions…………………………………………………………………………………………………………………………. 12
Correlations………………………………………………………………………………………………………………………… 21
Regression………………………………………………………………………………………………………………………….. 28
Codebook………………………………………………………………………………………………………………………………… 31
Codebook Output Tab………………………………………………………………………………………………………….. 31
Codebook Statistics Tab………………………………………………………………………………………………………. 33
Frequencies………………………………………………………………………………………………………………………………33
Frequencies Statistics………………………………………………………………………………………………………….. 34
Frequencies Charts……………………………………………………………………………………………………………… 35
Frequencies Format…………………………………………………………………………………………………………….. 35
Descriptives………………………………………………………………………………………………………………………………36
Descriptives Options……………………………………………………………………………………………………………. 36
DESCRIPTIVES Command Additional Features………………………………………………………………………. 37
Explore……………………………………………………………………………………………………………………………………. 37
Explore Statistics………………………………………………………………………………………………………………….38
Explore Plots………………………………………………………………………………………………………………………..38
Explore Options…………………………………………………………………………………………………………………… 39
EXAMINE Command Additional Features………………………………………………………………………………. 39
Crosstabs………………………………………………………………………………………………………………………………… 40
Crosstabs layers………………………………………………………………………………………………………………….. 40
Crosstabs clustered bar charts……………………………………………………………………………………………… 41
Crosstabs displaying layer variables in table layers………………………………………………………………….41
Crosstabs statistics……………………………………………………………………………………………………………… 41
Crosstabs cell display………………………………………………………………………………………………………….. 43
Crosstabs table format………………………………………………………………………………………………………….44
Summarize………………………………………………………………………………………………………………………………. 44
Summarize Options……………………………………………………………………………………………………………… 44
Summarize Statistics…………………………………………………………………………………………………………….45
Means……………………………………………………………………………………………………………………………………… 46
Means Options……………………………………………………………………………………………………………………..47
OLAP Cubes………………………………………………………………………………………………………………………………48
OLAP Cubes Statistics………………………………………………………………………………………………………….. 49
OLAP Cubes Differences………………………………………………………………………………………………………. 50
OLAP Cubes Title………………………………………………………………………………………………………………….50
Proportions……………………………………………………………………………………………………………………………….50
Proportions introduction………………………………………………………………………………………………………. 50
One-Sample Proportions……………………………………………………………………………………………………….51
Paired-Samples Proportions………………………………………………………………………………………………….53
Independent-Samples Proportions……………………………………………………………………………………….. 56
T Tests……………………………………………………………………………………………………………………………………..58
T Tests……………………………………………………………………………………………………………………………….. 58
Independent-Samples T Test……………………………………………………………………………………………….. 59
Paired-Samples T Test…………………………………………………………………………………………………………. 60
One-Sample T Test……………………………………………………………………………………………………………….62
T TEST Command Additional Features…………………………………………………………………………………… 63
One-Way ANOVA……………………………………………………………………………………………………………………….63
One-Way ANOVA Contrasts………………………………………………………………………………………………….. 64
One-Way ANOVA Post Hoc Tests……………………………………………………………………………………………64
iii
One-Way ANOVA Options
ONEWAY Command Additional Features
GLM Univariate Analysis
GLM Model
GLM Contrasts
GLM Profile Plots
GLM Post Hoc Comparisons
GLM Save
GLM Estimated Marginal Means
GLM Options
UNIANOVA Command Additional Features
Bivariate Correlations
Bivariate Correlations Options
Bivariate Correlations Confidence Interval
CORRELATIONS and NONPAR CORR Command Additional Features
Partial Correlations
Partial Correlations Options
PARTIAL CORR Command Additional Features
Distances
Distances Dissimilarity Measures
Distances Similarity Measures
PROXIMITIES Command Additional Features
Linear models
To obtain a linear model
Objectives
Basics
Model Selection
Ensembles
Advanced
Model Options
Model Summary
Automatic Data Preparation
Predictor Importance
Predicted By Observed
Residuals
Outliers
Effects
Coefficients
Estimated Means
Model Building Summary
Linear Regression
Linear Regression Variable Selection Methods
Linear Regression Set Rule
Linear Regression Plots
Linear Regression: Saving New Variables
Linear Regression Statistics
Linear Regression Options
REGRESSION Command Additional Features
Ordinal Regression
Ordinal Regression Options
Ordinal Regression Output
Ordinal Regression Location Model
Ordinal Regression Scale Model
PLUM Command Additional Features
Curve Estimation
Curve Estimation Models
Curve Estimation Save
Partial Least Squares Regression
Model
Options
Nearest Neighbor Analysis
Neighbors
Features
Partitions
Save
Output
Options
Model View
Discriminant Analysis
Discriminant Analysis Define Range
Discriminant Analysis Select Cases
Discriminant Analysis Statistics
Discriminant Analysis Stepwise Method
Discriminant Analysis Classification
Discriminant Analysis Save
DISCRIMINANT Command Additional Features
Factor Analysis
Factor Analysis Select Cases
Factor Analysis Descriptives
Factor Analysis Extraction
Factor Analysis Rotation
Factor Analysis Scores
Factor Analysis Options
FACTOR Command Additional Features
Choosing a Procedure for Clustering
TwoStep Cluster Analysis
TwoStep Cluster Analysis Options
TwoStep Cluster Analysis Output
The Cluster Viewer
Hierarchical Cluster Analysis
Hierarchical Cluster Analysis Method
Hierarchical Cluster Analysis Statistics
Hierarchical Cluster Analysis Plots
Hierarchical Cluster Analysis Save New Variables
CLUSTER Command Syntax Additional Features
K-Means Cluster Analysis
K-Means Cluster Analysis Efficiency
K-Means Cluster Analysis Iterate
K-Means Cluster Analysis Save
K-Means Cluster Analysis Options
QUICK CLUSTER Command Additional Features
Nonparametric Tests
One-Sample Nonparametric Tests
Independent-Samples Nonparametric Tests
Related-Samples Nonparametric Tests
Model View
NPTESTS command additional features
Legacy Dialogs
Multiple Response Analysis
Multiple Response Analysis
Multiple Response Define Sets
Multiple Response Frequencies
Multiple Response Crosstabs
Reporting Results
Reporting Results
Report Summaries in Rows
Report Summaries in Columns
REPORT Command Additional Features
Reliability Analysis
Reliability Analysis: Statistics
RELIABILITY Command Additional Features
Weighted Kappa
Weighted Kappa: Criteria
Weighted Kappa: Print
Multidimensional Scaling
Multidimensional Scaling Shape of Data
Multidimensional Scaling Create Measure
Multidimensional Scaling Model
Multidimensional Scaling Options
ALSCAL Command Additional Features
Ratio Statistics
Ratio Statistics
ROC Analysis
ROC Analysis: Options
ROC Analysis: Display
ROC Analysis: Define Groups (string)
ROC Analysis: Define Groups (numeric)
ROC Curves
ROC Curve Options
Simulation
To design a simulation based on a model file
To design a simulation based on custom equations
To design a simulation without a predictive model
To run a simulation from a simulation plan
Simulation Builder
Run Simulation dialog
Working with chart output from Simulation
Geospatial Modeling
Selecting Maps
Data Sources
Geospatial Association Rules
Spatial Temporal Prediction
Finish
Notices
Trademarks
Index
Chapter 1. Core features
The following core features are included in IBM SPSS Statistics Base Edition.
Power Analysis
Power analysis plays a pivotal role in the planning, design, and conduct of a study. Power is usually
calculated before any sample data have been collected, except possibly from a small pilot study. A precise
estimate of the power can tell investigators how likely it is that a statistically significant difference will
be detected based on a finite sample size under a true alternative hypothesis. If the power is too low,
there is little chance of detecting a significant difference, and non-significant results are likely even if real
differences truly exist.
IBM SPSS Statistics provides the following Power Analysis procedures:
One Sample T-Test
In one-sample analysis, the observed data are collected as a single random sample. It is assumed
that the sample data independently and identically follow a normal distribution with a fixed mean and
variance, and draws statistical inference about the mean parameter.
Paired Sample T-Test
In paired-sample analysis, the observed data contain two paired and correlated samples, and each
case has two measurements. It is assumed that the data in each sample independently and
identically follow a normal distribution with a fixed mean and variance; the procedure draws
statistical inference about the difference of the two means.
Independent Sample T-Test
In independent-sample analysis, the observed data contain two independent samples. It is assumed
that the data in each sample independently and identically follow a normal distribution with a fixed
mean and variance; the procedure draws statistical inference about the difference of the two means.
One-way ANOVA
Analysis of variance (ANOVA) is a statistical method of estimating the means of several populations,
which are often assumed to be normally distributed. One-way ANOVA, a common type of ANOVA,
is an extension of the two-sample t-test.
Example. The power of a hypothesis test to detect a correct alternative hypothesis is the probability that
the test will reject the null hypothesis. Because the probability of a type II error is the probability of
accepting the null hypothesis when the alternative hypothesis is true, the power can be expressed as
(1 - probability of a type II error), which is the probability of rejecting the null hypothesis when the
alternative hypothesis is true.
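In symbols, with β denoting the probability of a type II error, this is the standard identity (a general statistical fact, not anything specific to SPSS):

\text{power} = \Pr(\text{reject } H_{0} \mid H_{1}\ \text{true}) = 1 - \beta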
Statistics and plots. One-sided test, two-sided test, significance level, Type I error rate, test
assumptions, population standard deviation, population mean under testing, hypothesized value, two-
dimensional power by sample size, two-dimensional power by effect size, three-dimensional power by
sample size, three-dimensional power by effect size, rotation degrees, group pairs, Pearson product-
moment correlation coefficient, mean difference, plot range of the effect size, pooled population standard
deviation, contrasts and pairwise differences, contrast coefficients, contrast test, BONFERRONI, SIDAK,
LSD, power by total sample size, two-dimensional power by pooled standard deviation, three-dimensional
power by total sample size, pooled standard deviation, Student’s t-distribution, non-central t-distribution.
Power Analysis data considerations
Data
In one-sample analysis, the observed data are collected as a single random sample.
In paired-sample analysis, the observed data contain two paired and correlated samples, and each
case has two measurements.
In independent-sample analysis, the observed data contain two independent samples.
Analysis of variance (ANOVA) is a statistical method of estimating the means of several populations
which are often assumed to be normally distributed.
Assumptions
In one-sample analysis, it is assumed that the sample data independently and identically follow a
normal distribution with a fixed mean and variance; statistical inference is drawn about the mean
parameter.
In paired-sample analysis, it is assumed that the data in each sample independently and identically
follow a normal distribution with a fixed mean and variance; statistical inference is drawn about the
difference of the two means.
In independent-sample analysis, it is assumed that the data in each sample independently and
identically follow a normal distribution with a fixed mean and variance; statistical inference is drawn
about the difference of the two means.
In one-way ANOVA, the populations whose means are estimated are often assumed to be normally
distributed.
Obtaining a Power Analysis
1. From the menus choose:
Analyze > Power Analysis > Compare Means > One-Sample T-Test, or Paired-Sample T-Test, or
Independent-Sample T-Test, or One-way ANOVA
2. Define the required test assumptions.
3. Click OK.
Means
The following statistics features are included in IBM SPSS Statistics Base Edition.
Power Analysis of One-Sample T Test
This feature requires IBM SPSS Statistics Base Edition.
Power analysis plays a pivotal role in the planning, design, and conduct of a study. Power is usually
calculated before any sample data have been collected, except possibly from a small pilot study. A precise
estimate of the power can tell investigators how likely it is that a statistically significant difference will
be detected based on a finite sample size under a true alternative hypothesis. If the power is too low,
there is little chance of detecting a significant difference, and non-significant results are likely even if real
differences truly exist.
In one-sample analysis, the observed data are collected as a single random sample. It is assumed that
the sample data independently and identically follow a normal distribution with a fixed mean and
variance; the procedure draws statistical inference about the mean parameter.
1. From the menus choose:
Analyze > Power Analysis > Means > One-Sample T Test
2. Select a test assumption setting (Estimate sample size or Estimate power).
3. When selecting Estimate power, enter an appropriate Sample size for power estimation value. The
value must be an integer greater than 1. When selecting Estimate sample size, enter an appropriate
Power for sample size estimation value. The value must be a single value between 0 and 1.
4. Enter a value that specifies the population mean under testing in the Population mean field. The value
must be a single numeric.
5. Optionally, enter a value that specifies the null hypothesis value to be tested in the Null value field.
The value must be a single numeric.
6. Enter a Population standard deviation value. The value must be a single numeric greater than 0.
7. Select whether the test is one or two-sided.
Nondirectional (two-sided) analysis
When selected, a two-sided test is used. This is the default setting.
Directional (one-sided) analysis
When selected, power is computed for a one-sided test.
8. Optionally, specify the significance level of the Type I error rate for the test in the Significance level
field. The value must be a single double value between 0 and 1. The default value is 0.05.
9. You can optionally click Plot to specify “Power Analysis of One-Sample T Test: Plot”
settings (chart output, two-dimensional plot settings, three-dimensional plot settings, and tooltips).
Note: Plot is available only when Estimate power is selected as the test assumption.
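For readers who want to sanity-check the dialog’s results, the following minimal sketch computes the same kind of two-sided power from the noncentral t distribution. It is an illustration in Python/SciPy, not the SPSS implementation, and all variable names and values are hypothetical.

# Minimal sketch (not SPSS code): power of a two-sided one-sample t test.
from scipy.stats import t, nct
import math

n = 25              # sample size for power estimation (integer > 1)
pop_mean = 1.2      # population mean under testing
null_value = 1.0    # hypothesized null value
pop_sd = 0.5        # population standard deviation (> 0)
alpha = 0.05        # significance level (Type I error rate)

df = n - 1
delta = (pop_mean - null_value) / pop_sd * math.sqrt(n)  # noncentrality parameter
t_crit = t.ppf(1 - alpha / 2, df)                        # two-sided critical value
power = 1 - nct.cdf(t_crit, df, delta) + nct.cdf(-t_crit, df, delta)
print(round(power, 4))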
Power Analysis of One-Sample T Test: Plot
You can control the plots that are output to illustrate the two and three-dimensional power by sample/
effect size charts. You can also control the display of tool tips and the vertical/horizontal rotation degrees
for three-dimensional charts.
Two-Dimensional Plot
Power estimation versus sample size
When enabled, this optional setting provides options for controlling the two-dimensional power by
sample size chart. The setting is disabled by default.
Range of sample size
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range of the sample size
is used.
Lower bound
Controls the lower bound for the two-dimensional power by sample size chart. The value
must be greater than 1, and cannot be greater than the Upper bound value.
Upper bound
Controls the upper bound for the two-dimensional power by sample size chart. The value
must be greater than the Lower bound value and cannot be greater than 5000.
Power estimation versus effect size
By default, this optional setting is disabled. When enabled, the chart displays in the output. When
no integer values are specified for the Lower bound or Upper bound fields, the default plot range
of the effect size is used.
Range of effect size
When selected, the lower and upper bound options are available.
Lower bound
Controls the lower bound for the two-dimensional power by effect size chart. The value
must be greater than, or equal to, -5.0 and cannot be greater than the Upper bound value.
Upper bound
Controls the upper bound for the two-dimensional power by effect size chart. The value
must be greater than the Lower bound value and cannot be greater than 5.0.
Three-Dimensional Plot
Power estimation versus
Provides options for controlling the three-dimensional power by sample size (x-axis) and effect
size (y-axis) chart, the vertical/horizontal rotation settings, and the user specified plot range of
sample/effect size. This setting is disabled by default.
Effect size on x-axis and sample size on y-axis
The optional setting controls the three-dimensional power by effect size (x-axis) and sample
size (y-axis) chart. By default, the chart is suppressed. When specified, the chart displays.
Effect size on y-axis and sample size on x-axis
The optional setting controls the three-dimensional power by effect size (y-axis) and sample
size (x-axis) chart. By default, the chart is suppressed. When specified, the chart displays.
Vertical rotation
The optional setting sets the vertical rotation degrees (clockwise from the left) for the three-
dimensional chart. You can use the mouse to rotate the chart vertically. The setting takes
effect when the three-dimensional plot is requested. The value must be a single integer value
less than or equal to 359. The default value is 10.
Horizontal rotation
The optional setting sets the horizontal rotation degrees (clockwise from the front) for the
three-dimensional chart. You can use the mouse to rotate the chart horizontally. The setting
takes effect when the three-dimensional plot is requested. The value must be a single integer
value less than or equal to 359. The default value is 325.
Range of sample size
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range of the sample size
is used.
Lower bound
Controls the lower bound for the three-dimensional power by sample size chart. The value
must be greater than 1, and cannot be greater than the Upper bound value.
Upper bound
Controls the upper bound for the three-dimensional power by sample size chart. The value
must be greater than the Lower bound value and cannot be greater than 5000.
Range of effect size
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range of the effect size
is used.
Lower bound
Controls the lower bound for the three-dimensional power by effect size chart. The value
must be greater than, or equal to, -5.0 and cannot be greater than the Upper bound value.
Upper bound
Controls the upper bound for the three-dimensional power by effect size chart. The value
must be greater than the Lower bound value and cannot be greater than 5.0.
Power Analysis of Paired-Samples T Test
This feature requires IBM SPSS Statistics Base Edition.
Power analysis plays a pivotal role in the planning, design, and conduct of a study. Power is usually
calculated before any sample data have been collected, except possibly from a small pilot study. A precise
estimate of the power can tell investigators how likely it is that a statistically significant difference will
be detected based on a finite sample size under a true alternative hypothesis. If the power is too low,
there is little chance of detecting a significant difference, and non-significant results are likely even if real
differences truly exist.
In paired-samples analysis, the observed data contain two paired and correlated samples, and each case
has two measurements. It is assumed that the data in each sample independently and identically follow a
normal distribution with a fixed mean and variance; the procedure draws statistical inference about the
difference of the two means.
1. From the menus choose:
Analyze > Power Analysis > Means > Paired-Samples T Test
2. Select a test assumption setting (Estimate sample size or Estimate power).
3. When selecting Estimate power, enter an appropriate Sample size for power estimation value. The
value must be an integer greater than 1. When selecting Estimate sample size, enter an appropriate
Power for sample size estimation value. The value must be a single value between 0 and 1.
4. When a single population mean is required, enter a Population mean difference value. When a single
value is specified, it denotes the population mean difference μd.
Note: The value cannot be 0 when Estimate sample size is selected.
5. When multiple population means are required for the specified group pairs, enter values for
Population mean for group 1 and Population mean for group 2. When multiple values are specified,
they denote the population means μ1 and μ2.
Note: The two values cannot be the same when Estimate sample size is selected.
6. When a single population mean is specified, enter the Population standard deviation for mean
difference value. When a single value is specified, it denotes the population standard deviation of the
group difference σd. The value must be a single numeric greater than 0.
7. When multiple population means are specified, enter the Population standard deviation for group 1
and Population standard deviation for group 2 values. When multiple values are specified, they
denote the population standard deviations of the two groups, σ1 and σ2. Each value must be a
single numeric greater than 0.
8. Optionally, enter a value that specifies the Pearson product-moment correlation coefficient ρ. The
value must be a single numeric value between -1 and 1. The value cannot be 0.
Note: When a single Population standard deviation for mean difference value is specified, this
setting is ignored. Otherwise, the values for Population standard deviation for group 1 and
Population standard deviation for group 2 are used to compute σd.
9. Select whether the test is one or two-sided.
Nondirectional (two-sided) analysis
When selected, a two-sided test is used. This is the default setting.
Directional (one-sided) analysis
When selected, power is computed for a one-sided test.
10. Optionally, specify the significance level of the Type I error rate for the test in the Significance level
field. The value must be a single double value between 0 and 1. The default value is 0.05.
11. You can optionally click Plot to specify “Power Analysis of Paired-Samples T Test: Plot”
settings (chart output, two-dimensional plot settings, three-dimensional plot settings, and tooltips).
Note: Plot is available only when Estimate power is selected as the test assumption.
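As a hedged cross-check of steps 4 through 8, the sketch below first combines the two group standard deviations and the correlation ρ into σd (the standard formula σd² = σ1² + σ2² − 2ρσ1σ2, which matches the note above), then treats the analysis as a one-sample t test on the pair differences. It is illustrative Python/SciPy, not the SPSS implementation; all values are hypothetical.

# Minimal sketch (not SPSS code): paired-samples power via the difference scores.
from scipy.stats import t, nct
import math

n_pairs = 30                 # number of pairs (integer > 1)
mu1, mu2 = 10.0, 11.0        # population means for group 1 and group 2
sd1, sd2 = 2.0, 2.5          # population standard deviations for the groups
rho = 0.6                    # Pearson correlation between matched pairs
alpha = 0.05

sd_diff = math.sqrt(sd1**2 + sd2**2 - 2 * rho * sd1 * sd2)   # sigma_d
delta = (mu2 - mu1) / sd_diff * math.sqrt(n_pairs)           # noncentrality
df = n_pairs - 1
t_crit = t.ppf(1 - alpha / 2, df)
power = 1 - nct.cdf(t_crit, df, delta) + nct.cdf(-t_crit, df, delta)
print(round(power, 4))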
Power Analysis of Paired-Samples T Test: Plot
You can control the plots that are output to illustrate the two and three-dimensional power by sample/
effect size charts. You can also control the display of tool tips and the vertical/horizontal rotation degrees
for three-dimensional charts.
Two-Dimensional Plot
Power estimation versus sample size
When enabled, this optional setting provides options for controlling the two-dimensional power by
sample size chart. The setting is disabled by default.
Range of sample size
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range of the sample size
is used.
Lower bound
Controls the lower bound for the two-dimensional power by sample size chart. The value
must be greater than 1, and cannot be greater than the Upper bound value.
Upper bound
Controls the upper bound for the two-dimensional power by sample size chart. The value
must be greater than the Lower bound value and cannot be greater than 5000.
Power estimation versus effect size
By default, this optional setting is disabled. When enabled, the chart displays in the output. When
no integer values are specified for the Lower bound or Upper bound fields, the default plot range
of the effect size is used.
Range of effect size
When selected, the lower and upper bound options are available.
Lower bound
Controls the lower bound for the two-dimensional power by effect size chart. The value
must be greater than, or equal to, -5.0 and cannot be greater than the Upper bound value.
Upper bound
Controls the upper bound for the two-dimensional power by effect size chart. The value
must be greater than the Lower bound value and cannot be greater than 5.0.
Three-Dimensional Plot
Power estimation versus
Provides options for controlling the three-dimensional power by sample size (x-axis) and effect
size (y-axis) chart, the vertical/horizontal rotation settings, and the user specified plot range of
sample/effect size. This setting is disabled by default.
Effect size on x-axis and sample size on y-axis
The optional setting controls the three-dimensional power by effect size (x-axis) and sample
size (y-axis) chart. By default, the chart is suppressed. When specified, the chart displays.
Effect size on y-axis and sample size on x-axis
The optional setting controls the three-dimensional power by effect size (y-axis) and sample
size (x-axis) chart. By default, the chart is suppressed. When specified, the chart displays.
Vertical rotation
The optional setting sets the vertical rotation degrees (clockwise from the left) for the three-
dimensional chart. You can use the mouse to rotate the chart vertically. The setting takes
effect when the three-dimensional plot is requested. The value must be a single integer value
less than or equal to 359. The default value is 10.
Horizontal rotation
The optional setting sets the horizontal rotation degrees (clockwise from the front) for the
three-dimensional chart. You can use the mouse to rotate the chart horizontally. The setting
takes effect when the three-dimensional plot is requested. The value must be a single integer
value less than or equal to 359. The default value is 325.
Range of sample size
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range of the sample size
is used.
Lower bound
Controls the lower bound for the three-dimensional power by sample size chart. The value
must be greater than 1, and cannot be greater than the Upper bound value.
Upper bound
Controls the upper bound for the three-dimensional power by sample size chart. The value
must be greater than the Lower bound value and cannot be greater than 5000.
Range of effect size
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range of the effect size
is used.
Lower bound
Controls the lower bound for the three-dimensional power by effect size chart. The value
must be greater than, or equal to, -5.0 and cannot be greater than the Upper bound value.
Upper bound
Controls the upper bound for the three-dimensional power by effect size chart. The value
must be greater than the Lower bound value and cannot be greater than 5.0.
Power Analysis of Independent-Samples T Test
This feature requires IBM SPSS Statistics Base Edition.
Power analysis plays a pivotal role in the planning, design, and conduct of a study. Power is usually
calculated before any sample data have been collected, except possibly from a small pilot study. A precise
estimate of the power can tell investigators how likely it is that a statistically significant difference will
be detected based on a finite sample size under a true alternative hypothesis. If the power is too low,
there is little chance of detecting a significant difference, and non-significant results are likely even if real
differences truly exist.
In independent-samples analysis, the observed data contain two independent samples. It is assumed
that the data in each sample independently and identically follow a normal distribution with a fixed mean
and variance; the procedure draws statistical inference about the difference of the two means.
1. From the menus choose:
Analyze > Power Analysis > Means > Independent-Samples T Test
2. Select a test assumption setting (Estimate sample size or Estimate power).
3. When Estimate sample size is selected, enter an appropriate Power for sample size estimation
value (the value must be a single value between 0 and 1) and a Group size ratio value for specifying
the ratio of the sample sizes (the value must be a single value between 0.01 and 100).
4. When Estimate power is selected, enter values to specify the sample size for the two groups for
comparison: Sample size for group 1 and Sample size for group 2. The values must be integers
greater than 1.
5. When a single population mean is required, enter a Population mean difference value. When a single
value is specified, it denotes the population mean difference μd.
Note: The value cannot be 0 when Estimate sample size is selected.
6. When multiple population means are required for the specified group pairs, enter values for
Population mean for group 1 and Population mean for group 2. When multiple values are specified,
they denote the population means μ1 and μ2.
Note: The two values cannot be the same when Estimate sample size is selected.
7. Specify whether the population standard deviations are Equal for two groups or Not equal for two
groups.
• When the population standard deviations are equal for two groups, enter a value for Pooled
standard deviation that denotes σ, under the assumption that the two group variances are equal
(σ1 = σ2 = σ).
• When the population standard deviations are not equal for two groups, enter values for Standard
deviation for group 1 and Standard deviation for group 2 that denote σ1 and σ2.
Note: When the values for Standard deviation for group 1 and Standard deviation for group 2 are
identical, they are treated as a single value.
8. Select whether the test is one or two-sided.
Nondirectional (two-sided) analysis
When selected, a two-sided test is used. This is the default setting.
Directional (one-sided) analysis
When selected, power is computed for a one-sided test.
9. Optionally, specify the significance level of the Type I error rate for the test in the Significance level
field. The value must be a single double value between 0 and 1. The default value is 0.05.
10. You can optionally click Plot to specify “Power Analysis of Independent-Samples T Test: Plot”
settings (chart output, two-dimensional plot settings, three-dimensional plot settings, and
tooltips).
Note: Plot is available only when Estimate power is selected as the test assumption.
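The sketch below illustrates the equal-variances case of steps 3 through 9 using the statsmodels power routines, forming Cohen’s d from the mean difference and the pooled standard deviation. It is illustrative Python, not the SPSS implementation; all names and values are hypothetical.

# Minimal sketch (not SPSS code): two-sided independent-samples t-test power,
# assuming equal (pooled) population standard deviations.
from statsmodels.stats.power import TTestIndPower

mu1, mu2 = 5.0, 6.0      # population means for group 1 and group 2
pooled_sd = 2.0          # pooled population standard deviation (sigma1 = sigma2)
n1, n2 = 40, 60          # sample sizes for group 1 and group 2
alpha = 0.05

effect_size = (mu2 - mu1) / pooled_sd            # Cohen's d
power = TTestIndPower().power(effect_size, nobs1=n1, alpha=alpha,
                              ratio=n2 / n1, alternative='two-sided')
print(round(power, 4))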
Power Analysis of Independent-Samples T Test: Plot
You can control the plots that are output to illustrate the two and three-dimensional power by sample
ratio, effect size, or mean difference charts. You can also control the display of tool tips and the vertical/
horizontal rotation degrees for three-dimensional charts.
Two-Dimensional Plot
Power estimation versus sample size ratio
When enabled, this optional setting provides options for controlling the two-dimensional power by
sample size ratio chart. The setting is disabled by default.
Range of sample size ratio
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range of the sample size
is used.
Lower bound
Controls the lower bound for the two-dimensional power by sample size ratio chart. The
value must be between 0.01 and 100 and cannot be greater than the Upper bound value.
Upper bound
Controls the upper bound for the two-dimensional power by sample size ratio chart. The
value must be between 0.01 and 100 and must be greater than the Lower bound value.
Power estimation versus effect size (or mean difference)
By default, this optional setting is disabled. When enabled, the chart displays in the output. When
no integer values are specified for the Lower bound or Upper bound fields, the default plot range
of the effect size (or mean difference) is used.
Range of effect size (or mean difference)
When selected, the lower and upper bound options are available.
Lower bound
Controls the lower bound for the two-dimensional power by effect size chart. The value
must be greater than, or equal to, -5.0 and cannot be greater than the Upper bound value.
Upper bound
Controls the upper bound for the two-dimensional power by effect size chart. The value
must be greater than the Lower bound value and cannot be greater than 5.0.
Three-Dimensional Plot
Power estimation versus
Provides options for controlling the three-dimensional power by sample size ratio (x-axis) and
effect size (y-axis) chart, the vertical/horizontal rotation settings, and the user specified plot range
of sample/effect size. This setting is disabled by default.
Effect size (or mean difference) on x-axis and sample size on y-axis
The optional setting controls the three-dimensional power by effect size or mean difference
(x-axis) and sample size ratio (y-axis) chart. By default, the chart is suppressed. When
specified, the chart displays.
Effect size (or mean difference) on y-axis and sample size on x-axis
The optional setting controls the three-dimensional power by effect size or mean difference
(y-axis) and sample size ratio (x-axis) chart. By default, the chart is suppressed. When
specified, the chart displays.
Vertical rotation
The optional setting sets the vertical rotation degrees (clockwise from the left) for the three-
dimensional chart. You can use the mouse to rotate the chart vertically. The setting takes
effect when the three-dimensional plot is requested. The value must be a single integer value
less than or equal to 359. The default value is 10.
Horizontal rotation
The optional setting sets the horizontal rotation degrees (clockwise from the front) for the
three-dimensional chart. You can use the mouse to rotate the chart horizontally. The setting
takes effect when the three-dimensional plot is requested. The value must be a single integer
value less than or equal to 359. The default value is 325.
Range of sample size ratio
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range of the sample size
is used.
Lower bound
Controls the lower bound for the three-dimensional power by sample size ratio chart. The
value must be between 0.01 and 100 and cannot be greater than the Upper bound value.
Upper bound
Controls the upper bound for the three-dimensional power by sample size ratio chart. The
value must be between 0.01 and 100 and must be greater than the Lower bound value.
Range of effect size (or mean difference)
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range of the effect size
is used.
Lower bound
Controls the lower bound for the three-dimensional power by effect size chart. The value
must be greater than, or equal to, -5.0 and cannot be greater than the Upper bound value.
Upper bound
Controls the upper bound for the three-dimensional power by effect size chart. The value
must be greater than the Lower bound value and cannot be greater than 5.0.
Power Analysis of One-Way ANOVA
This feature requires IBM SPSS Statistics Base Edition.
Power analysis plays a pivotal role in the planning, design, and conduct of a study. Power is usually
calculated before any sample data have been collected, except possibly from a small pilot study. A precise
estimate of the power can tell investigators how likely it is that a statistically significant difference will
be detected based on a finite sample size under a true alternative hypothesis. If the power is too low,
there is little chance of detecting a significant difference, and non-significant results are likely even if real
differences truly exist.
Analysis of variance (ANOVA) is a statistical method of estimating the means of several populations,
which are often assumed to be normally distributed. One-way ANOVA, a common type of ANOVA, is an
extension of the two-sample t-test. The procedure provides approaches for estimating the power for two
types of hypothesis for comparing the multiple group means: the overall test and the test with specified
contrasts. The overall test focuses on the null hypothesis that all group means are equal. The test with
specified contrasts breaks down the overall ANOVA hypothesis into smaller but more describable and
useful pieces of the means.
1. From the menus choose:
Analyze > Power Analysis > Means > One-way ANOVA
2. Select a test assumption setting (Estimate sample size or Estimate power).
3. When Estimate sample size is selected, enter an appropriate Power for sample size estimation value
(the value must be a single value between 0 and 1).
4. Enter a Pooled population standard deviation value. The value must be a single numeric greater than
0.
5. Specify the Group sizes and Group means values. At least two group size values must be specified
(each value must be greater than, or equal to, 2). At least two group mean values must also be specified
(the number of specified values must equal the number of group size values).
6. Optionally, specify Group weights values. Group weights assign the group size weights when
Estimate sample size is selected.
Note: The Group weights settings are ignored when Group sizes values are specified.
7. Optionally, specify the significance level of the Type I error rate for the test in the Significance level
field. The value must be a single double value between 0 and 1. The default value is 0.05.
8. You can optionally click Contrast to specify “Power Analysis of One-way ANOVA: Contrast”
settings (contrast test and pairwise differences), or click Plot to specify “Power Analysis of One-
way ANOVA: Plot” settings (chart output, two-dimensional plot settings, three-dimensional plot
settings, and tooltips).
Note: Plot is available only when Group sizes values are specified and Estimate power is selected.
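For the overall test, power depends on the group sizes, the group means, and the pooled population standard deviation through Cohen’s f. The sketch below is an illustrative Python computation under those inputs (hypothetical values), not the SPSS implementation.

# Minimal sketch (not SPSS code): power of the overall one-way ANOVA test.
from statsmodels.stats.power import FTestAnovaPower
import math

group_sizes = [20, 25, 30]          # at least two values, each >= 2
group_means = [10.0, 11.0, 12.5]    # one mean per group
pooled_sd = 3.0                     # pooled population standard deviation
alpha = 0.05

n_total = sum(group_sizes)
grand_mean = sum(n * m for n, m in zip(group_sizes, group_means)) / n_total
between_var = sum(n * (m - grand_mean) ** 2
                  for n, m in zip(group_sizes, group_means)) / n_total
effect_size = math.sqrt(between_var) / pooled_sd        # Cohen's f

power = FTestAnovaPower().power(effect_size, nobs=n_total,
                                alpha=alpha, k_groups=len(group_sizes))
print(round(power, 4))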
Power Analysis of One-way ANOVA: Contrast
You can specify the following contrast, coefficient, and pairwise differences settings for your Power
Analysis of One-way ANOVA:
Contrast Test
Test with linear contrasts
When enabled, the contrast and coefficient settings are available.
Test direction
Nondirectional (two-sided) analysis
When selected, a two-sided test is used. This is the default setting.
Directional (one-sided) analysis
When selected, power is computed for a one-sided test.
Coefficients
Use the table to specify the contrast coefficients and request the contrast test. The table
values are optional. The number of specified values must equal the number of values specified
for Group sizes and Group means. The sum of all specified values must equal 0; otherwise,
the last value is automatically adjusted.
Pairwise Differences
Estimate the power of testing for pairwise differences
Controls whether or not to estimate the power of testing for the pairwise differences. By default,
the optional setting is disabled, which suppresses output for the pairwise differences.
Adjust the significance level by
Determines the adjustment of multiple comparisons.
Bonferroni correction
Uses the Bonferroni correction in estimating the power of pairwise differences. This is the
default setting.
Sidak correction
Uses the Sidak correction in estimating the power of pairwise differences.
Least significant difference (LSD)
Uses the LSD correction in estimating the power of pairwise differences.
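A single linear contrast is conventionally tested with a t statistic on N − k error degrees of freedom, so its power can be sketched with the noncentral t distribution, as below. This is an illustrative Python computation with hypothetical values, not the SPSS implementation, and it shows the unadjusted (LSD-style) two-sided case; a Bonferroni or Sidak correction would adjust alpha before the critical value is computed.

# Minimal sketch (not SPSS code): power of one linear contrast in one-way ANOVA.
from scipy.stats import t, nct
import math

group_sizes = [20, 25, 30]
group_means = [10.0, 11.0, 12.5]
coeffs = [1, 0, -1]          # contrast coefficients; they must sum to 0
pooled_sd = 3.0
alpha = 0.05                 # unadjusted two-sided significance level

contrast = sum(c * m for c, m in zip(coeffs, group_means))
se_scale = math.sqrt(sum(c ** 2 / n for c, n in zip(coeffs, group_sizes)))
delta = contrast / (pooled_sd * se_scale)        # noncentrality parameter
df = sum(group_sizes) - len(group_sizes)         # N - k error degrees of freedom

t_crit = t.ppf(1 - alpha / 2, df)
power = 1 - nct.cdf(t_crit, df, delta) + nct.cdf(-t_crit, df, delta)
print(round(power, 4))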
Power Analysis of One-way ANOVA: Plot
You can control the plots that are output to illustrate the two and three-dimensional power by sample and
effect size charts. You can also control the display of tool tips and the vertical/horizontal rotation degrees
for three-dimensional charts.
Two-Dimensional Plot
Power estimation versus total sample size
When enabled, this optional setting provides options for controlling the two-dimensional power by
total sample size chart. The setting is disabled by default.
Range of total sample size
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range of the total
sample size is used.
Lower bound
Controls the lower bound for the two-dimensional power by total sample size chart. The
value must be greater than, or equal to:
• 2 x the number of integers specified for Group sizes
• 2 x the sum of the integers specified for Group sizes, divided by the smallest integer
value specified for Group sizes
The value cannot be greater than the Upper bound value.
Upper bound
Controls the upper bound for the two-dimensional power by total sample size chart. The
value must be less than, or equal to:
• 5000 divided by the largest integer value specified for Group sizes, multiplied by the sum
of the integers specified for Group sizes
The value must be greater than the Lower bound value and cannot be greater than
2147483647.
Power estimation versus pooled standard deviation
By default, this optional setting is disabled. The setting controls the two-dimensional power by
pooled standard deviation chart. When enabled, the chart displays in the output. When no integer
values are specified for the Lower bound or Upper bound fields, the default plot range of the
pooled standard deviation is used.
Note:
The plot is disabled when the specified Group means values are all the same.
Range of pooled standard deviation
When selected, the lower and upper bound options are available.
Lower bound
Controls the lower bound for the two-dimensional power by pooled standard deviation
chart. The value must be greater than 0 and cannot be greater than the Upper bound
value.
Upper bound
Controls the upper bound for the two-dimensional power by pooled standard deviation
chart. The value must be greater than the Lower bound value.
Three-Dimensional Plot
Power estimation versus
Provides options for controlling the three-dimensional power by total sample size (x-axis) and
effect size (y-axis) chart, the vertical/horizontal rotation settings, and the user specified plot range
of sample/effect size. This setting is disabled by default.
Note:
The plot is disabled when the specified Group means values are all the same.
Pooled standard deviation on x-axis and total sample size on y-axis
The optional setting controls the three-dimensional power by pooled standard deviation (x-axis)
and total sample size (y-axis) chart. By default, the chart is suppressed. When specified,
the chart displays.
Pooled standard deviation on y-axis and total sample size on x-axis
The optional setting controls the three-dimensional power by pooled standard deviation (y-axis)
and total sample size (x-axis) chart. By default, the chart is suppressed. When specified,
the chart displays.
Vertical rotation
The optional setting sets the vertical rotation degrees (clockwise from the left) for the three-
dimensional chart. You can use the mouse to rotate the chart vertically. The setting takes
effect when the three-dimensional plot is requested. The value must be a single integer value
less than or equal to 359. The default value is 10.
Horizontal rotation
The optional setting sets the horizontal rotation degrees (clockwise from the front) for the
three-dimensional chart. You can use the mouse to rotate the chart horizontally. The setting
takes effect when the three-dimensional plot is requested. The value must be a single integer
value less than or equal to 359. The default value is 325.
Range of total sample size
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range of the sample size
is used.
Lower bound
Controls the lower bound for the three-dimensional power by total sample size chart. The
value must be greater than 0 and cannot be greater than the Upper bound value.
Upper bound
Controls the upper bound for the three-dimensional power by total sample size chart. The
value must be greater than the Lower bound value.
Range of pooled standard deviation
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range of the pooled
standard deviation is used.
Lower bound
Controls the lower bound for the three-dimensional power by pooled standard deviation
chart. The value must be greater than 0 and cannot be greater than the Upper bound
value.
Upper bound
Controls the upper bound for the three-dimensional power by pooled standard deviation
chart. The value must be greater than the Lower bound value.
Proportions
The following statistics features are included in IBM SPSS Statistics Base Edition.
Power Analysis of Related-Sample Binomial Test
This feature requires IBM SPSS Statistics Base Edition.
Power analysis plays a pivotal role in the planning, design, and conduct of a study. Power is usually
calculated before any sample data have been collected, except possibly from a small pilot study. A precise
estimate of the power can tell investigators how likely it is that a statistically significant difference will
be detected based on a finite sample size under a true alternative hypothesis. If the power is too low,
there is little chance of detecting a significant difference, and non-significant results are likely even if real
differences truly exist.
The binomial distribution is based on a sequence of Bernoulli trials. It can be used to model experiments
that include a fixed number of total trials, which are assumed to be independent of each other. Each trial
leads to a dichotomous result, with the same probability for a successful outcome.
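For a fixed number of trials n with common success probability p, the number of successes X has the familiar probability mass function (a standard result, not specific to SPSS):

\Pr(X = k) = \binom{n}{k} \, p^{k} (1-p)^{n-k}, \qquad k = 0, 1, \ldots, n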
The related-sample binomial test estimates the power of McNemar’s test to compare two proportion
parameters based on matched-pair subjects sampled from two related binomial populations.
1. From the menus choose:
Analyze > Power Analysis > Proportions > Related-Sample Binomial Test
2. Select a test assumption setting (Estimate sample size or Estimate power).
3. When selecting Estimate power, enter the appropriate Total number of pairs value. The value must
be a positive integer greater than, or equal to, 2. When selecting Estimate sample size, enter an
appropriate Power for sample size estimation value. The value must be a single value between 0 and
1.
4. Select to specify testing values for either Proportions or Counts.
• When Proportions is selected, enter values in the Proportion 1 and Proportion 2 fields. The values
must be between 0 and 1.
• When Counts is selected, enter values in the Count 1 and Count 2 fields. The values must be
between 0 and the value specified for Total number of pairs.
Proportions Notes:
• Proportions is the only available option when a Power value is specified.
• When Test values are marginal is not selected: 0 < Proportion 1 + Proportion 2 ≤ 1
• When Test values are marginal is selected:
– Proportion 1 * Proportion 2 > 0
– Proportion 1 < 1
– Proportion 2 < 1
– The values for Proportion 1 and Proportion 2 cannot be the same.
Counts Notes:
• When Test values are marginal is not selected: 0 < Count 1 + Count 2 ≤ Total number of pairs
• When Test values are marginal is selected:
– Count 1 * Count 2 > 0
– Count 1 < Total number of pairs
– Count 2 < Total number of pairs
5. You can optionally select Test values are marginal to control whether or not the specified proportions
or counts values are marginal. When Test values are marginal is enabled, you must specify a
Correlation between matched pairs value. The value must be a single value between -1 and 1.
6. Select a method for estimating the power.
Normal approximation
Enables normal approximation. This is the default setting.
Binomial enumeration
Enables the binomial enumeration method. Optionally, use the Time limit field to specify the
maximum number of minutes allowed to estimate the sample size. When the time limit is reached,
the analysis is terminated and a warning message is displayed. When specified, the value must be
a single positive integer to denote the number of minutes. The default setting is 5 minutes.
7. Select whether the test is one or two-sided.
Nondirectional (two-sided) analysis
When selected, a two-sided test is used. This is the default setting.
Directional (one-sided) analysis
When selected, power is computed for a one-sided test.
8. Optionally, specify the significance level of the Type I error rate for the test in the Significance level
field. The value must be a single double value between 0 and 1. The default value is 0.05.
9. You can optionally click Plot to specify “Power Analysis of Related-Sample Binomial: Plot”
settings (chart output, two-dimensional plot settings, three-dimensional plot settings, and tooltips).
Note: Plot is available only when Estimate power is selected as the test assumption and Binomial
enumeration is not selected.
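Under the normal approximation method, with Proportion 1 and Proportion 2 read as the two discordant cell proportions (Test values are marginal not selected), the power of a two-sided McNemar test can be sketched as below. This is an illustrative Python computation of the standard large-sample approximation, not the SPSS implementation; all values are hypothetical.

# Minimal sketch (not SPSS code): normal-approximation power of McNemar's test.
from scipy.stats import norm
import math

n_pairs = 100            # total number of pairs (>= 2)
p1, p2 = 0.15, 0.05      # discordant proportions, 0 < p1 + p2 <= 1
alpha = 0.05

p_disc = p1 + p2                       # total discordant proportion
diff = abs(p1 - p2)
z_crit = norm.ppf(1 - alpha / 2)       # two-sided critical value
numerator = diff * math.sqrt(n_pairs) - z_crit * math.sqrt(p_disc)
denominator = math.sqrt(p_disc - diff ** 2)
power = norm.cdf(numerator / denominator)
print(round(power, 4))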
Power Analysis of Related-Sample Binomial: Plot
You can control the plots that are output to illustrate the two and three-dimensional power estimation
charts. You can also control the display of tool tips and the vertical/horizontal rotation degrees for three-
dimensional charts.
Two-Dimensional Plot
Provides options for controlling the two-dimensional power estimation charts. This setting is
disabled by default.
Power estimation versus total number of pairs
When enabled, this optional setting controls the two-dimensional power by total number of pairs
chart. The setting is disabled by default. When selected, this setting displays the chart.
Plot range of total number of pairs
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range is used.
Lower bound
Controls the lower bound for the two-dimensional power estimation versus total number
of pairs chart. The value must be greater than 1, and cannot be greater than the Upper
bound value.
Upper bound
Controls the upper bound for the two-dimensional power estimation versus total number
of pairs chart. The value must be greater than the Lower bound value and cannot be
greater than 2500.
Power estimation versus risk difference
When enabled, this optional setting controls the two-dimensional power by risk difference chart.
The setting is disabled by default. When selected, this setting displays the chart.
Power estimation versus risk ratio
When enabled, this optional setting controls the two-dimensional power by risk ratio chart. The
setting is disabled by default.
Plot range of risk ratio
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range is used.
Lower bound
Controls the lower bound for the two-dimensional power estimation versus risk ratio chart.
The value cannot be greater than the Upper bound value.
Upper bound
Controls the upper bound for the two-dimensional power estimation versus risk ratio
chart. The value must be greater than the Lower bound value and cannot be greater than
10.
Power estimation versus odds ratio
When enabled, this optional setting controls the two-dimensional power by odds ratio chart. The
setting is disabled by default. When selected, this setting displays the chart.
Plot range of odds ratio
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range is used.
Lower bound
Controls the lower bound for the two-dimensional power estimation versus odds ratio
chart. The value cannot be greater than the Upper bound value.
Upper bound
Controls the upper bound for the two-dimensional power estimation versus odds ratio
chart. The value must be greater than the Lower bound value and cannot be greater than
10.
Three-Dimensional Plot
Provides options for controlling the three-dimensional power estimation charts. This setting is
disabled by default.
Power estimation versus discordant proportions
When enabled, this optional setting controls the three-dimensional power by discordant
proportions chart. The setting is disabled by default. When selected, this setting displays the
chart.
Power estimation versus marginal proportions
When enabled, this optional setting controls the three-dimensional power by marginal proportions
chart. The setting is disabled by default. When selected, this setting displays the chart.
Note: This setting is available only when Test values are marginal is selected.
Vertical rotation
The optional setting sets the vertical rotation degrees (clockwise from the left) for the three-
dimensional chart. You can use the mouse to rotate the chart vertically. The setting takes effect
when the three-dimensional plot is requested. The value must be a single integer value less than
or equal to 359. The default value is 10.
Horizontal rotation
The optional setting sets the horizontal rotation degrees (clockwise from the front) for the three-
dimensional chart. You can use the mouse to rotate the chart horizontally. The setting takes effect
when the three-dimensional plot is requested. The value must be a single integer value less than
or equal to 359. The default value is 325.
Power Analysis of Independent-Sample Binomial Test
This feature requires IBM SPSS Statistics Base Edition.
Power analysis plays a pivotal role in the planning, design, and conduct of a study. Power is usually
calculated before any sample data have been collected, except possibly from a small pilot study. A precise
estimate of the power can tell investigators how likely it is that a statistically significant difference will
be detected based on a finite sample size under a true alternative hypothesis. If the power is too low,
there is little chance of detecting a significant difference, and non-significant results are likely even if real
differences truly exist.
The binomial distribution is based on a sequence of Bernoulli trials. It can be used to model experiments
that include a fixed number of total trials, which are assumed to be independent of each other. Each trial
leads to a dichotomous result, with the same probability for a "successful" outcome. The
independent-sample binomial test compares two independent proportion parameters.
1. From the menus choose:
Analyze > Power Analysis > Proportions > Independent-Samples Binomial Test
2. Select a test assumption setting (Estimate sample size or Estimate power).
3. When Estimate sample size is selected, enter an appropriate Power for sample size estimation
value (the value must be a single value between 0 and 1) and a Group size ratio value for specifying
the ratio of the sample sizes (the value must be a single value between 0.01 and 100).
4. When Estimate power is selected, enter values to specify the total number of trials for both group 1
and group 2. The values must be integers greater than 1.
5. Specify the proportion parameters for the two groups. Both values must be between 0 and 1.
Note: The two values cannot be the same when a Power value is specified.
6. Optionally, specify the significance level of the Type I error rate for the test in the Significance level
field. The value must be a single double value between 0 and 1. The default value is 0.05.
7. Select the desired test assumptions:
Chi-squared test
Estimates the power based on Pearson’s chi-squared test. This is the default setting.
Standard deviation is pooled
This optional setting controls whether the estimation of the standard deviation is pooled or
unpooled. The setting is enabled by default.
Apply continuity correction
This optional setting controls whether or not the continuity correction is used. The setting is
disabled by default.
T-test
Estimates the power based on Student’s t-test.
Standard deviation is pooled
This optional setting controls whether the estimation of the standard deviation is pooled or
unpooled. The setting is enabled by default.
Likelihood ratio test
Estimates the power based on the likelihood ratio test.
Fisher’s exact test
Estimates the power based on Fisher’s exact test.
Notes:
• In some cases, Fisher’s exact test may take an extended amount of time to complete.
• All plots are blocked when Fisher’s exact test is selected.
8. Select a method for estimating the power.
Normal approximation
Enables normal approximation. This is the default setting.
Binomial enumeration
Enables the binomial enumeration method. Optionally, use the Time limit field to specify the
maximum number of minutes allowed to estimate the sample size. When the time limit is
reached, the analysis is terminated and a warning message is displayed. When specified, the
value must be a single positive integer to denote the number of minutes. The default setting is 5
minutes.
9. Select whether the test is one- or two-sided.
Nondirectional (two-sided) analysis
When selected, a two-sided test is used. This is the default setting.
Directional (one-sided) analysis
When selected, power is computed for a one-sided test.
10. You can optionally click Plot to specify “Power Analysis of Independent-Samples Binomial Test:
Plot” on page 16 settings (chart output, two-dimensional plot settings, three-dimensional plot
settings, and tooltips).
Note: Plot is available only when Estimate power is selected as the test assumption and Binomial
enumeration is not selected.
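For readers who want to check the normal-approximation arithmetic outside of the dialog, the following Python sketch computes the power of a two-proportion z-test with a pooled null standard deviation, which corresponds broadly to the default chi-squared test with Standard deviation is pooled enabled. It is an illustration of the general approach, not IBM's implementation, and the function name and example values are hypothetical.

from scipy.stats import norm

def power_two_proportions(p1, p2, n1, n2, alpha=0.05, two_sided=True):
    """Approximate power of the independent-sample binomial test
    (two-proportion z-test) under the normal approximation."""
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)                  # pooled proportion
    se0 = (p_bar * (1 - p_bar) * (1 / n1 + 1 / n2)) ** 0.5   # SE under H0 (pooled)
    se1 = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5   # SE under H1
    z_crit = norm.ppf(1 - alpha / 2) if two_sided else norm.ppf(1 - alpha)
    # Tail probability under the alternative; the opposite tail is
    # negligible unless the two proportions are very close.
    return norm.sf((z_crit * se0 - abs(p1 - p2)) / se1)

print(round(power_two_proportions(0.6, 0.4, 100, 100), 3))   # roughly 0.81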
Power Analysis of Independent-Samples Binomial Test: Plot
You can control the plots that are output to illustrate the two- and three-dimensional power estimation
charts. You can also control the display of tool tips and the vertical/horizontal rotation degrees for three-
dimensional charts.
Two-Dimensional Plot
Provides options for controlling the two-dimensional power estimation charts. This setting is
disabled by default.
Power estimation versus group size ratio
When enabled, this optional setting controls the two-dimensional power by group size ratio chart.
The setting is disabled by default. When selected, this setting displays the chart.
Plot range of group size ratio
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range is used.
Lower bound
Controls the lower bound for the two-dimensional power estimation versus group size ratio chart. The value must be greater than 0.01, and cannot be greater than the Upper bound value.
Upper bound
Controls the upper bound for the two-dimensional power estimation versus group size ratio chart. The value must be greater than the Lower bound value and cannot be greater than 100.
Power estimation versus risk difference
When enabled, this optional setting controls the two-dimensional power by risk difference chart.
The setting is disabled by default. When selected, this setting displays the chart.
Power estimation versus risk ratio
When enabled, this optional setting control the two-dimensional power by risk ratio chart. The
setting is disabled by default.
Plot range of risk ratio
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range is used.
Lower bound
Controls the lower bound for the two-dimensional power estimation versus risk ratio chart.
The value cannot be greater than the Upper bound value.
Upper bound
Controls the upper bound for the two-dimensional power estimation versus risk ratio
chart. The value must be greater than the Lower bound value and cannot be greater than
10.
Power estimation versus odds ratio
When enabled, this optional setting controls the two-dimensional power by odds ratio chart. The
setting is disabled by default. When selected, this setting displays the chart.
Plot range of odds ratio
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range is used.
Lower bound
Controls the lower bound for the two-dimensional power estimation versus odds ratio
chart. The value cannot be greater than the Upper bound value.
Upper bound
Controls the upper bound for the two-dimensional power estimation versus odds ratio
chart. The value must be greater than the Lower bound value and cannot be greater than
10.
Three-Dimensional Plot
Provides options for controlling the three-dimensional power estimation charts. This setting is
disabled by default.
Power estimation versus proportions
When selected, this optional setting provides the following power by proportion options:
proportion of group 1 on x-axis and proportion of group 2 on y-axis
Controls the three-dimensional power by proportion of Group 1 (x-axis) and proportion of
Group 2 (y-axis) chart. The setting is disabled by default. When selected, this setting displays
the chart.
proportion of group 1 on y-axis and proportion of group 2 on x-axis
Controls the three-dimensional power by proportion of Group 2 (x-axis) and proportion of
Group 1 (y-axis) chart. The setting is disabled by default. When selected, this setting displays
the chart.
Power estimation versus group sizes
When selected, this optional setting provides the following power by group sizes options:
size of group 1 on x-axis and size of group 2 on y-axis
Controls the three-dimensional power by number of trials in Group 1 (x-axis) and number of
trials in Group 2 (y-axis) chart. The setting is disabled by default. When selected, this setting
displays the chart.
size of group 1 on y-axis and size of group 2 on x-axis
Controls the three-dimensional power by number of trials in Group 2 (x-axis) and number of
trials in Group 1 (y-axis) chart. The setting is disabled by default. When selected, this setting
displays the chart.
User specified plot range of size of group 1
When selected, the lower and upper bound options for the group 1 plot range are available.
When no integer values are specified for the Lower bound or Upper bound fields, the default
plot range is used.
Lower bound
Controls the lower bound for the three-dimensional power estimation versus group sizes chart. The value must be greater than or equal to 2 and cannot be greater than the Upper bound value.
Upper bound
Controls the upper bound for the three-dimensional power estimation versus group sizes chart. The value must be greater than the Lower bound value and cannot be greater than 2500.
User specified plot range of size of group 2
When selected, the lower and upper bound options for the group 2 plot range are available.
When no integer values are specified for the Lower bound or Upper bound fields, the default
plot range is used.
Lower bound
Controls the lower bound for the three-dimensional power estimation versus group sizes chart. The value must be greater than or equal to 2 and cannot be greater than the Upper bound value.
Upper bound
Controls the upper bound for the three-dimensional power estimation versus group sizes chart. The value must be greater than the Lower bound value and cannot be greater than 2500.
Vertical rotation
The optional setting sets the vertical rotation degrees (clockwise from the left) for the three-
dimensional chart. You can use the mouse to rotate the chart vertically. The setting takes effect
when the three-dimensional plot is requested. The value must be a single integer value less than
or equal to 359. The default value is 10.
Horizontal rotation
The optional setting sets the horizontal rotation degrees (clockwise from the front) for the three-
dimensional chart. You can use the mouse to rotate the chart horizontally. The setting takes effect
when the three-dimensional plot is requested. The value must be a single integer value less than
or equal to 359. The default value is 325.
Power Analysis of One-Sample Binomial Test
This feature requires IBM SPSS Statistics Base Edition.
Power analysis plays a pivotal role in the planning, design, and conduct of a study. Power is usually calculated before any sample data have been collected, except possibly from a small pilot study. A precise estimate of the power tells investigators how likely it is that a statistically significant difference will be detected for a finite sample size under a true alternative hypothesis. If the power is too low, there is little chance of detecting a significant difference, and non-significant results are likely even if real differences truly exist.
The binomial distribution is based on a sequence of Bernoulli trials. It can be used to model experiments that consist of a fixed number of trials that are assumed to be independent of each other. Each trial leads to a dichotomous result, with the same probability of a successful outcome.
The one-sample binomial test makes statistical inference about the proportion parameter by comparing it
with a hypothesized value. The methods for estimating the power for such a test are either the normal
approximation or the binomial enumeration.
1. From the menus choose:
Analyze > Power Analysis > Proportions > One-Sample Binomial Test
2. Select a test assumption setting (Estimate sample size or Estimate power).
3. When selecting Estimate power, enter the appropriate Total number of trials value. The value must
be an integer greater than or equal to 1. When selecting Estimate sample size, enter an appropriate
Power for sample size estimation value. The value must be a single value between 0 and 1.
4. Enter a value that specifies the alternative hypothesis value of the proportion parameter in the
Population proportion field. The value must be a single numeric value between 0 and 1.
Note: When a Power value is specified, the Population proportion value cannot be equal to the Null
value.
5. Optionally, enter a value that specifies the null hypothesis value of the proportion parameter to be
tested in the Null value field. The value must be a single numeric between 0 and 1. The default value
is 0.50.
6. Select a method for estimating the power.
Normal approximation
Enables normal approximation. This is the default setting.
Apply continuity correction
Controls whether or not the continuity correction is used for the normal approximation method.
Binomial enumeration
Enables the binomial enumeration method. Optionally, use the Time limit field to specify the
maximum number of minutes allowed to estimate the sample size. When the time limit is reached,
the analysis is terminated and a warning message is displayed. When specified, the value must be
a single positive integer to denote the number of minutes. The default setting is 5 minutes.
Note: The selected power estimation method has no effect when the Total number of trials value
exceeds 500.
7. Select whether the test is one- or two-sided.
Nondirectional (two-sided) analysis
When selected, a two-sided test is used. This is the default setting.
Directional (one-sided) analysis
When selected, power is computed for a one-sided test.
8. Optionally, specify the significance level of the Type I error rate for the test in the Significance level
field. The value must be a single double value between 0 and 1. The default value is 0.05.
9. You can optionally click Plot to specify “Power Analysis of One-Sample Binomial: Plot” on page 20
settings (chart output, two-dimensional plot settings, three-dimensional plot settings, and tooltips).
Note: Plot is available only when Estimate power is selected as the test assumption and Binomial
enumeration is not selected.
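The binomial enumeration method can be sketched directly: build a rejection region from the most extreme outcomes under the null distribution, then sum the alternative probabilities over that region. The following Python sketch illustrates the idea for a two-sided test; the likelihood-ordered construction of the region is an assumption made for illustration, not necessarily the rule that the procedure applies.

from scipy.stats import binom

def exact_power_one_sample(n, p0, p1, alpha=0.05):
    """Exact power of a two-sided one-sample binomial test, computed by
    enumerating a rejection region of total null probability <= alpha."""
    pmf0 = binom.pmf(range(n + 1), n, p0)
    # Grow the rejection region from the outcomes least likely under H0.
    order = sorted(range(n + 1), key=lambda k: pmf0[k])
    region, mass = [], 0.0
    for k in order:
        if mass + pmf0[k] > alpha:
            break
        region.append(k)
        mass += pmf0[k]
    # Power is the probability of the rejection region under H1.
    return binom.pmf(region, n, p1).sum()

print(round(exact_power_one_sample(50, 0.5, 0.7), 3))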
Power Analysis of One-Sample Binomial: Plot
You can control the plots that are output to illustrate the two- and three-dimensional power estimation charts. You
can also control the display of tool tips and the vertical/horizontal rotation degrees for three-dimensional
charts.
Two-Dimensional Plot
Provides options for controlling the two-dimensional power estimation charts. This setting is
disabled by default.
Power estimation versus null hypothesis value
When enabled, this optional setting controls the two-dimensional power by null value chart. The
setting is disabled by default. When selected, this setting displays the chart.
Power estimation versus alternative hypothesis value
When enabled, this optional setting controls the two-dimensional power by alternative value
chart. The setting is disabled by default. When selected, this setting displays the chart.
Power estimation versus the difference between hypothesized values
When enabled, this optional setting controls the two-dimensional power by difference between
hypothesized values chart. The setting is disabled by default.
Power estimation versus total number of trials
When enabled, this optional setting controls the two-dimensional power by total number of trials
chart. The setting is disabled by default. When selected, this setting displays the chart.
Plot range of total number of trials
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range is used.
Lower bound
Controls the lower bound for the two-dimensional power estimation versus total number
of trials chart. The value must be greater than 0, and cannot be greater than the Upper
bound value.
Upper bound
Controls the upper bound for the two-dimensional power estimation versus total number
of trials chart. The value must be greater than the Lower bound value and cannot be
greater than 5000.
Three-Dimensional Plot
Provides options for controlling the three-dimensional power estimation charts. This setting is
disabled by default.
Power estimation versus total number of trials
When selected, this setting enables the following options.
on x-axis and the difference between hypothesized values on y-axis
The optional setting controls the three-dimensional power by total number of trials (x-axis)
and difference between hypothesized values (y-axis) chart. By default, the chart is
suppressed. When specified, the chart displays.
on y-axis and the difference between hypothesized values on x-axis
The optional setting controls the three-dimensional power by total number of trials (y-axis)
and difference between hypothesized values (x-axis) chart. By default, the chart is
suppressed. When specified, the chart displays.
Plot range of total number of trials
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range is used.
Lower bound
Controls the lower bound for the three-dimensional power estimation versus total number
of trials chart. The value must be greater than 0, and cannot be greater than the Upper
bound value.
Upper bound
Controls the upper bound for the three-dimensional power estimation versus total number
of trials chart. The value must be greater than the Lower bound value and cannot be
greater than 5000.
Power estimation versus null hypothesis value
When selected, this setting enables the following options.
on x-axis and alternative hypothesis value on y-axis
The optional setting controls the three-dimensional power by null (x-axis) and alternative
value (y-axis) chart. By default, the chart is suppressed. When specified, the chart displays.
on y-axis and alternative hypothesis value on x-axis
The optional setting controls the three-dimensional power by null (y-axis) and alternative
value (x-axis) chart. By default, the chart is suppressed. When specified, the chart displays.
Vertical rotation
The optional setting sets the vertical rotation degrees (clockwise from the left) for the three-
dimensional chart. You can use the mouse to rotate the chart vertically. The setting takes
effect when the three-dimensional plot is requested. The value must be a single integer value
less than or equal to 359. The default value is 10.
Horizontal rotation
The optional setting sets the horizontal rotation degrees (clockwise from the front) for the
three-dimensional chart. You can use the mouse to rotate the chart horizontally. The setting
takes effect when the three-dimensional plot is requested. The value must be a single integer
value less than or equal to 359. The default value is 325.
Correlations
The following statistics features are included in IBM SPSS Statistics Base Edition.
Power Analysis of One-Sample Pearson Correlation Test
This feature requires IBM SPSS Statistics Base Edition.
Power analysis plays a pivotal role in the planning, design, and conduct of a study. Power is usually calculated before any sample data have been collected, except possibly from a small pilot study. A precise estimate of the power tells investigators how likely it is that a statistically significant difference will be detected for a finite sample size under a true alternative hypothesis. If the power is too low, there is little chance of detecting a significant difference, and non-significant results are likely even if real differences truly exist.
Pearson’s product-moment correlation coefficient measures the strength of linear association between
two scale random variables that are assumed to follow a bivariate normal distribution. By convention, it is
a dimensionless quantity obtained by standardizing the covariance between two continuous
variables, thereby ranging between -1 and 1.
The test uses Fisher’s asymptotic method to estimate the power for the one-sample Pearson correlation.
1. From the menus choose:
Analyze > Power Analysis > Correlations > Pearson Product-Moment
2. Select a test assumption setting (Estimate sample size or Estimate power).
3. When selecting Estimate power, enter the appropriate Sample size in pairs value. The value must be
a single integer greater than 3. When selecting Estimate sample size, enter a Power for sample size
estimation value. The value must be a single value between 0 and 1.
4. Enter a value that specifies the alternative hypothesis value of the correlation parameter in the
Pearson correlation parameter field. The value must be a single numeric between -1 and 1.
Note: When a Power value is specified, the Pearson correlation parameter value cannot be -1 or 1
and cannot be equal to the Null value.
5. Optionally, enter a value that specifies the null hypothesis value of the correlation parameter to be
tested in the Null value field. The value must be a single numeric between -1 and 1. The default value
is 0.
Note: When a Power value is specified, Null value cannot be -1 or 1.
6. Optionally, select Use bias-correction formula in the power estimation to specify whether the bias adjustment term is included or ignored. The setting is enabled by default, which includes the bias adjustment term in the power estimation. When the setting is not selected, the bias adjustment term is ignored.
7. Select whether the test is one- or two-sided.
Nondirectional (two-sided) analysis
When selected, a two-sided test is used. This is the default setting.
Directional (one-sided) analysis
When selected, power is computed for a one-sided test.
8. Optionally, specify the significance level of the Type I error rate for the test in the Significance level
field. The value must be a single double value between 0 and 1. The default value is 0.05.
9. You can optionally click Plot to specify “Power Analysis of One-Sample Pearson Correlation: Plot” on
page 22 settings (chart output, two-dimensional plot settings, three-dimensional plot settings, and
tooltips).
Note: Plot is available only when Estimate power is selected as the test assumption.
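Fisher's asymptotic method transforms the correlation with z = arctanh(r), which is approximately normal with standard error 1/sqrt(n - 3). The following Python sketch computes the two-sided power under that approximation; the bias-adjustment term shown (rho divided by 2(n - 1)) is a common form of the correction and is an assumption here, not a statement of the exact formula the procedure uses.

from math import atanh, sqrt
from scipy.stats import norm

def power_pearson(n, rho, rho0=0.0, alpha=0.05, bias_correction=True):
    """Approximate two-sided power of the one-sample Pearson correlation
    test, based on Fisher's z transformation."""
    z1, z0 = atanh(rho), atanh(rho0)
    if bias_correction:
        z1 += rho / (2 * (n - 1))        # asymptotic bias of the sample z
    delta = abs(z1 - z0) * sqrt(n - 3)   # standardized effect
    return norm.sf(norm.ppf(1 - alpha / 2) - delta)

print(round(power_pearson(50, 0.4), 3))   # roughly 0.83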
Power Analysis of One-Sample Pearson Correlation: Plot
You can control the plots that are output to illustrate the two- and three-dimensional power estimation charts. You
can also control the display of tool tips and the vertical/horizontal rotation degrees for three-dimensional
charts.
Two-Dimensional Plot
Provides options for controlling the two-dimensional power estimation charts. The settings are
disabled by default.
Power estimation versus null hypothesis value
When enabled, this optional setting controls the two-dimensional power by null value chart. The
setting is disabled by default. When selected, this setting displays the chart.
Power estimation versus alternative hypothesis value
When enabled, this optional setting controls the two-dimensional power by alternative value
chart. The setting is disabled by default. When selected, this setting displays the chart.
Power estimation versus the difference between hypothesized values
When enabled, this optional setting controls the two-dimensional power by difference between
hypothesized values chart. The setting is disabled by default.
Power estimation versus sample size (in pairs)
When enabled, this optional setting controls the two-dimensional power by sample size chart. The
setting is disabled by default. When selected, this setting displays the chart.
Plot range of sample size
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range is used.
Lower bound
Controls the lower bound for the two-dimensional power estimation by sample size chart.
The value must be greater than or equal to 4, and cannot be greater than the Upper bound
value.
Upper bound
Controls the upper bound for the two-dimensional power estimation by sample size chart.
The value must be greater than the Lower bound value and cannot be greater than 5000.
Three-Dimensional Plot
Provides options for controlling the three-dimensional power estimation charts. This setting is
disabled by default.
Power estimation versus sample size
When selected, this setting enables the following options.
on x-axis and the difference between hypothesized values on y-axis
The optional setting controls the three-dimensional power by sample size (x-axis) and
difference between hypothesized values (y-axis) chart. By default, the chart is suppressed.
When specified, the chart displays.
on y-axis and the difference between hypothesized values on x-axis
The optional setting controls the three-dimensional power by sample size (y-axis) and
difference between hypothesized values (x-axis) chart. By default, the chart is suppressed.
When specified, the chart displays.
Plot range of sample size (in pairs)
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range is used.
Lower bound
Controls the lower bound for the three-dimensional power estimation by sample size
chart. The value must be greater than or equal to 4, and cannot be greater than the Upper
bound value.
Upper bound
Controls the upper bound for the three-dimensional power estimation by sample size
chart. The value must be greater than the Lower bound value and cannot be greater than
5000.
Power estimation versus null hypothesis value
When selected, this setting enables the following options.
on x-axis and alternative hypothesis value on y-axis
The optional setting controls the three-dimensional power by null (x-axis) and alternative
value (y-axis) chart. By default, the chart is suppressed. When specified, the chart displays.
on y-axis and alternative hypothesis value on x-axis
The optional setting controls the three-dimensional power by null (y-axis) and alternative
value (x-axis) chart. By default, the chart is suppressed. When specified, the chart displays.
Vertical rotation
The optional setting sets the vertical rotation degrees (clockwise from the left) for the three-
dimensional chart. You can use the mouse to rotate the chart vertically. The setting takes
effect when the three-dimensional plot is requested. The value must be a single integer value
less than or equal to 359. The default value is 10.
Horizontal rotation
The optional setting sets the horizontal rotation degrees (clockwise from the front) for the
three-dimensional chart. You can use the mouse to rotate the chart horizontally. The setting
takes effect when the three-dimensional plot is requested. The value must be a single integer
value less than or equal to 359. The default value is 325.
Power Analysis of One-Sample Spearman Correlation Test
This feature requires IBM SPSS Statistics Base Edition.
Power analysis plays a pivotal role in the planning, design, and conduct of a study. Power is usually calculated before any sample data have been collected, except possibly from a small pilot study. A precise estimate of the power tells investigators how likely it is that a statistically significant difference will be detected for a finite sample size under a true alternative hypothesis. If the power is too low, there is little chance of detecting a significant difference, and non-significant results are likely even if real differences truly exist.
The Spearman rank-order correlation coefficient is a rank-based nonparametric statistic that measures the monotonic relationship between two variables that are often censored or not normally distributed. The Spearman rank-order correlation is equal to the Pearson correlation between the rank values of the two variables and therefore also ranges between -1 and 1. Estimating the power of the Spearman rank correlation test is an important topic in, for example, the analysis of hydrological time series data.
The test uses Fisher’s asymptotic method to estimate the power for the one-sample Spearman rank-
order correlation.
1. From the menus choose:
Analyze > Power Analysis > Correlations > Spearman Rank-Order
2. Select a test assumption setting (Estimate sample size or Estimate power).
3. When selecting Estimate power, enter the appropriate Sample size in pairs value. The value must be
a single integer greater than 3. When selecting Estimate sample size, enter a Power for sample size
estimation value. The value must be a single value between 0 and 1.
4. Enter a value that specifies the alternative hypothesis value of the correlation parameter in the
Spearman correlation parameter field. The value must be a single numeric between -1 and 1.
Note: When a Power value is specified, the Spearman correlation parameter value cannot be -1 or 1
and cannot be equal to the Null value.
5. Optionally, enter a value that specifies the null hypothesis value of the correlation parameter to be
tested in the Null value field. The value must be a single numeric between -1 and 1. The default value
is 0.
Note: When a Power value is specified, Null value cannot be -1 or 1.
6. Optionally, select an option that determines how the asymptotic variance is estimated for the power
analysis.
Bonett and Wright
Estimates the variance suggested by Bonett and Wright. This is the default setting.
Fieller, Hartley and Pearson
Estimates the variance suggested by Fieller, Hartley and Pearson.
Caruso and Cliff
Estimates the variance suggested by Caruso and Cliff.
7. Select whether the test is one- or two-sided.
Nondirectional (two-sided) analysis
When selected, a two-sided test is used. This is the default setting.
Directional (one-sided) analysis
When selected, power is computed for a one-sided test.
8. Optionally, specify the significance level of the Type I error rate for the test in the Significance level
field. The value must be a single double value between 0 and 1. The default value is 0.05.
9. You can optionally click Plot to specify “Power Analysis of One-Sample Spearman Correlation: Plot” on
page 24 settings (chart output, two-dimensional plot settings, three-dimensional plot settings, and
tooltips).
Note: Plot is available only when Estimate power is selected as the test assumption.
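The Spearman power calculation follows the same Fisher-z route as the Pearson test; the three options differ only in the asymptotic variance plugged into the standardized statistic. The variance forms in the Python sketch below are the ones commonly cited for these authors and should be treated as assumptions to verify against the algorithms documentation.

from math import atanh, sqrt
from scipy.stats import norm

# Asymptotic variance of Fisher's z of the Spearman coefficient
# (assumed forms; verify against the algorithms documentation).
VARIANCES = {
    "bonett_wright": lambda n, r: (1 + r ** 2 / 2) / (n - 3),
    "fieller_hartley_pearson": lambda n, r: 1.06 / (n - 3),
    "caruso_cliff": lambda n, r: 1 / (n - 2),
}

def power_spearman(n, rho, rho0=0.0, alpha=0.05, method="bonett_wright"):
    """Approximate two-sided power of the one-sample Spearman test."""
    se = sqrt(VARIANCES[method](n, rho))
    delta = abs(atanh(rho) - atanh(rho0)) / se
    return norm.sf(norm.ppf(1 - alpha / 2) - delta)

print(round(power_spearman(50, 0.4), 3))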
Power Analysis of One-Sample Spearman Correlation: Plot
You can control the plots that are output to illustrate the two- and three-dimensional power estimation charts. You
can also control the display of tool tips and the vertical/horizontal rotation degrees for three-dimensional
charts.
Two-Dimensional Plot
Provides options for controlling the two-dimensional power estimation charts. The settings are
disabled by default.
Power estimation versus null hypothesis value
When enabled, this optional setting controls the two-dimensional power by null value chart. The
setting is disabled by default. When selected, this setting displays the chart.
Power estimation versus alternative hypothesis value
When enabled, this optional setting controls the two-dimensional power by alternative value
chart. The setting is disabled by default. When selected, this setting displays the chart.
Power estimation versus the difference between hypothesized values
When enabled, this optional setting controls the two-dimensional power by difference between
hypothesized values chart. The setting is disabled by default.
Power estimation versus sample size (in pairs)
When enabled, this optional setting controls the two-dimensional power by sample size chart. The
setting is disabled by default. When selected, this setting displays the chart.
Plot range of sample size
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range is used.
Lower bound
Controls the lower bound for the two-dimensional power estimation by sample size chart.
The value must be greater than or equal to 4, and cannot be greater than the Upper bound
value.
Upper bound
Controls the upper bound for the two-dimensional power estimation by sample size chart.
The value must be greater than the Lower bound value and cannot be greater than 5000.
Three-Dimensional Plot
Provides options for controlling the three-dimensional power estimation charts. This setting is
disabled by default.
Power estimation versus sample size
When selected, this setting enables the following options.
on x-axis and the difference between hypothesized values on y-axis
The optional setting controls the three-dimensional power by sample size (x-axis) and
difference between hypothesized values (y-axis) chart. By default, the chart is suppressed.
When specified, the chart displays.
on y-axis and the difference between hypothesized values on x-axis
The optional setting controls the three-dimensional power by sample size (y-axis) and
difference between hypothesized values (x-axis) chart. By default, the chart is suppressed.
When specified, the chart displays.
Plot range of sample size (in pairs)
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range is used.
Lower bound
Controls the lower bound for the three-dimensional power estimation by sample size
chart. The value must be greater than or equal to 4, and cannot be greater than the Upper
bound value.
Upper bound
Controls the upper bound for the three-dimensional power estimation by sample size
chart. The value must be greater than the Lower bound value and cannot be greater than
5000.
Power estimation versus null hypothesis value
When selected, this setting enables the following options.
on x-axis and alternative hypothesis value on y-axis
The optional setting controls the three-dimensional power by null (x-axis) and alternative
value (y-axis) chart. By default, the chart is suppressed. When specified, the chart displays.
on y-axis and alternative hypothesis value on x-axis
The optional setting controls the three-dimensional power by null (y-axis) and alternative
value (x-axis) chart. By default, the chart is suppressed. When specified, the chart displays.
Vertical rotation
The optional setting sets the vertical rotation degrees (clockwise from the left) for the three-
dimensional chart. You can use the mouse to rotate the chart vertically. The setting takes
effect when the three-dimensional plot is requested. The value must be a single integer value
less than or equal to 359. The default value is 10.
Horizontal rotation
The optional setting sets the horizontal rotation degrees (clockwise from the front) for the
three-dimensional chart. You can use the mouse to rotate the chart horizontally. The setting
takes effect when the three-dimensional plot is requested. The value must be a single integer
value less than or equal to 359. The default value is 325.
Power Analysis of Partial Pearson Correlation Test
This feature requires IBM SPSS Statistics Base Edition.
Power analysis plays a pivotal role in the planning, design, and conduct of a study. Power is usually calculated before any sample data have been collected, except possibly from a small pilot study. A precise estimate of the power tells investigators how likely it is that a statistically significant difference will be detected for a finite sample size under a true alternative hypothesis. If the power is too low, there is little chance of detecting a significant difference, and non-significant results are likely even if real differences truly exist.
Partial correlation can be explained as the association between two random variables after eliminating
the effect of another or several other variables. It is a useful measurement in the presence of
confounding. Similar to the Pearson correlation coefficient, the partial correlation coefficient is also a
dimensionless quantity ranging between -1 and 1.
The test uses Fisher's asymptotic method to estimate the power for the partial Pearson correlation.
1. From the menus choose:
Analyze > Power Analysis > Correlations > Partial
2. Select a test assumption setting (Estimate sample size or Estimate power).
3. When selecting Estimate power, enter the appropriate Sample size in pairs value. The value must be
a single integer greater than 3. When selecting Estimate sample size, enter a Power for sample size
estimation value. The value must be a single value between 0 and 1.
4. Enter a value that specifies the number of variables assumed to be partialed out. The value must
be a single integer greater than or equal to 0.
5. Enter a value that specifies the alternative hypothesis value of the partial correlation parameter in the
Partial Pearson correlation parameter field. The value must be a single numeric between -1 and 1.
Note: When a Power value is specified, the Partial Pearson correlation parameter value cannot be -1
or 1 and cannot be equal to the Null value.
6. Optionally, enter a value that specifies the null hypothesis value of the partial correlation parameter to
be tested in the Null value field. The value must be a single numeric between -1 and 1. The default
value is 0.
Note: When a Power value is specified, Null value cannot be -1 or 1.
7. Select whether the test is one- or two-sided.
Nondirectional (two-sided) analysis
When selected, a two-sided test is used. This is the default setting.
Directional (one-sided) analysis
When selected, power is computed for a one-sided test.
8. Optionally, specify the significance level of the Type I error rate for the test in the Significance level
field. The value must be a single double value between 0 and 1. The default value is 0.05.
9. You can optionally click Plot to specify “Power Analysis of Partial Pearson Correlation: Plot” on page
27 settings (chart output, two-dimensional plot settings, three-dimensional plot settings, and
tooltips).
Note: Plot is available only when Estimate power is selected as the test assumption.
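Because the partial test reduces to the Pearson case with adjusted degrees of freedom, the Fisher-z standard error becomes 1/sqrt(n - 3 - k), where k is the number of variables partialed out. A minimal Python sketch under that assumption:

from math import atanh, sqrt
from scipy.stats import norm

def power_partial(n, rho, k, rho0=0.0, alpha=0.05):
    """Approximate two-sided power of the partial Pearson correlation
    test; k = 0 reduces to the ordinary one-sample Pearson case."""
    delta = abs(atanh(rho) - atanh(rho0)) * sqrt(n - 3 - k)
    return norm.sf(norm.ppf(1 - alpha / 2) - delta)

print(round(power_partial(60, 0.4, k=2), 3))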
Power Analysis of Partial Pearson Correlation: Plot
You can control the plots that are output to illustrate the two- and three-dimensional power estimation charts. You
can also control the display of tool tips and the vertical/horizontal rotation degrees for three-dimensional
charts.
Two-Dimensional Plot
Provides options for controlling the two-dimensional power estimation charts. The settings are
disabled by default.
Power estimation versus null hypothesis value
When enabled, this optional setting controls the two-dimensional power by null value chart. The
setting is disabled by default. When selected, this setting displays the chart.
Power estimation versus alternative hypothesis value
When enabled, this optional setting controls the two-dimensional power by alternative value
chart. The setting is disabled by default. When selected, this setting displays the chart.
Power estimation versus the number of variables partialed out
When enabled, this optional setting controls the two-dimensional power by number of partialed-
out variables chart. The setting is disabled by default. When selected, this setting displays the
chart.
Power estimation versus the difference between hypothesized values
When enabled, this optional setting controls the two-dimensional power by difference between
hypothesized values chart. The setting is disabled by default.
Power estimation versus sample size
When enabled, this optional setting controls the two-dimensional power by sample size chart. The
setting is disabled by default. When selected, this setting displays the chart.
Plot range of sample size
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range is used.
Lower bound
Controls the lower bound for the two-dimensional power estimation by sample size chart.
The value must be greater than or equal to 4, and cannot be greater than the Upper bound
value.
Upper bound
Controls the upper bound for the two-dimensional power estimation by sample size chart.
The value must be greater than the Lower bound value and cannot be greater than 5000.
Three-Dimensional Plot
Provides options for controlling the three-dimensional power estimation charts. This setting is
disabled by default.
Power estimation versus sample size
When selected, this setting enables the following options.
on x-axis and the difference between hypothesized values on y-axis
The optional setting controls the three-dimensional power by sample size (x-axis) and
difference between hypothesized values (y-axis) chart. By default, the chart is suppressed.
When specified, the chart displays.
on y-axis and the difference between hypothesized values on x-axis
The optional setting controls the three-dimensional power by sample size (y-axis) and
difference between hypothesized values (x-axis) chart. By default, the chart is suppressed.
When specified, the chart displays.
Plot range of sample size (in pairs)
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range is used.
Lower bound
Controls the lower bound for the three-dimensional power estimation by sample size
chart. The value must be greater than or equal to 4, and cannot be greater than the Upper
bound value.
Upper bound
Controls the upper bound for the three-dimensional power estimation by sample size
chart. The value must be greater than the Lower bound value and cannot be greater than
5000.
Power estimation versus null hypothesis value
When selected, this setting enables the following options.
on x-axis and alternative hypothesis value on y-axis
The optional setting controls the three-dimensional power by null (x-axis) and alternative
value (y-axis) chart. By default, the chart is suppressed. When specified, the chart displays.
on y-axis and alternative hypothesis value on x-axis
The optional setting controls the three-dimensional power by null (y-axis) and alternative
value (x-axis) chart. By default, the chart is suppressed. When specified, the chart displays.
Vertical rotation
The optional setting sets the vertical rotation degrees (clockwise from the left) for the three-
dimensional chart. You can use the mouse to rotate the chart vertically. The setting takes
effect when the three-dimensional plot is requested. The value must be a single integer value
less than or equal to 359. The default value is 10.
Horizontal rotation
The optional setting sets the horizontal rotation degrees (clockwise from the front) for the
three-dimensional chart. You can use the mouse to rotate the chart horizontally. The setting
takes effect when the three-dimensional plot is requested. The value must be a single integer
value less than or equal to 359. The default value is 325.
Regression
The following statistics features are included in IBM SPSS Statistics Base Edition.
Power Analysis of Univariate Linear Regression Test
This feature requires IBM SPSS Statistics Base Edition.
Power analysis plays a pivotal role in the planning, design, and conduct of a study. Power is usually calculated before any sample data have been collected, except possibly from a small pilot study. A precise estimate of the power tells investigators how likely it is that a statistically significant difference will be detected for a finite sample size under a true alternative hypothesis. If the power is too low, there is little chance of detecting a significant difference, and non-significant results are likely even if real differences truly exist.
Univariate linear regression is a basic and standard statistical approach in which researchers use the
values of several variables to explain or predict values of a scale outcome.
The Power Analysis of Univariate Linear Regression test estimates the power of the type III F-test in
univariate multiple linear regression models. With the effect size represented by multiple (partial)
correlations, approaches for both fixed and random predictors are provided. For fixed predictors, the
power estimation is based on the non-central F-distribution. For random predictors, it is assumed that the
target variable and the predictors jointly follow a multivariate normal distribution, in which case the power estimation is based on the distribution of the sample multiple correlation coefficient.
1. From the menus choose:
Analyze > Power Analysis > Regression > Univariate Linear
2. Select a test assumption setting (Estimate sample size or Estimate power).
3. When selecting Estimate power, enter an appropriate Sample size for power estimation value. The value must be a single integer greater than or equal to the total number of model predictors + 2 (when Include the intercept term in the model is enabled). Otherwise, the value must be a single integer greater than or equal to the total number of model predictors + 1.
When selecting Estimate sample size, enter an appropriate Power for sample size estimation value.
The value must be a single value between 0 and 1.
4. Specify the value of the multiple partial correlation coefficient in the Population multiple partial
correlation field. The value must be a single value between -1 and 1.
Note: When a Power value is specified, the Population multiple partial correlation value cannot be 0.
The following settings are enabled when Population multiple partial correlation is selected:
Total number of predictors in the model
Specify the number of either the total predictors, or the predictors in the full model (not including
the intercept, if applicable). The value must be a single integer greater than or equal to 1.
Number of test predictors
Specify the number of either the test predictors, or the predictors in the nested model (not
including the intercept, if applicable). The value must be greater than or equal to 1, but no larger
than the Total number of predictors in the model value.
5. Specify R-squared values for multiple correlation coefficients for both Full model and Nested model.
The values must be single values between 0 and 1.
Note: When a Power value is specified, the Full model value must be greater than the Nested model
value.
The following settings are enabled when R-squared values for is selected:
Total number of predictors – Full Model
Specify the number of either the total predictors, or the predictors in the full model (not including
the intercept, if applicable). The value must be a single integer greater than or equal to 1.
Total number of predictors – Nested Model
Specify the number of either the total predictors, or the predictors in the nested model (not
including the intercept, if applicable). The value must be greater than or equal to 1, but less than
the Total number of predictors – Full Model value.
6. Optionally, specify the significance level of the Type I error rate for the test in the Significance level
field. The value must be a single double value between 0 and 1. The default value is 0.05.
7. You can optionally select the Include the intercept term in the model setting. The setting is enabled
by default. When not selected, the intercept term is excluded from the power analysis.
8. You can optionally select whether model predictors are Fixed or Random. Fixed is the default setting.
9. You can optionally click Plot to specify “Power Analysis of Univariate Linear Regression: Plot” on page
29 settings (chart output, two-dimensional plot settings, and three-dimensional plot settings).
Note: Plot is available only when Estimate power is selected as the test assumption.
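For fixed predictors, the F-test power follows from a noncentral F distribution whose noncentrality is driven by the effect size f2 = (R2_full - R2_nested) / (1 - R2_full). The Python sketch below assumes the common convention that the noncentrality parameter equals f2 times the sample size and that the model includes an intercept; the exact convention the procedure uses may differ.

from scipy.stats import f as f_dist, ncf

def power_f_test(n, p_full, p_test, r2_full, r2_nested, alpha=0.05):
    """Approximate power of the type III F-test for p_test predictors in
    a full model with p_full predictors plus an intercept, assuming
    fixed predictors."""
    f2 = (r2_full - r2_nested) / (1 - r2_full)   # Cohen's f-squared
    dfn, dfd = p_test, n - p_full - 1            # hypothesis and error df
    ncp = f2 * n                                 # assumed convention
    f_crit = f_dist.ppf(1 - alpha, dfn, dfd)     # critical value under H0
    return ncf.sf(f_crit, dfn, dfd, ncp)         # tail area under H1

print(round(power_f_test(100, p_full=5, p_test=2,
                         r2_full=0.30, r2_nested=0.20), 3))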
Power Analysis of Univariate Linear Regression: Plot
You can control the plots that are output to illustrate the two- and three-dimensional power estimation charts. You
can also control the display of tool tips and the vertical/horizontal rotation degrees for three-dimensional
charts.
Two-Dimensional Plot
Provides options for controlling the two-dimensional power estimation charts. The settings are
disabled by default.
Power estimation versus the multiple partial correlation
When enabled, this optional setting controls the two-dimensional power by multiple partial
correlation coefficient chart. The setting is disabled by default. When selected, this setting
displays the chart.
Power estimation versus sample size
When enabled, this optional setting controls the two-dimensional power by sample size chart. The
setting is disabled by default. When selected, this setting displays the chart.
Plot range of sample size
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range is used.
Lower bound
Controls the lower bound for the two-dimensional power estimation by sample size chart.
The value must be greater than or equal to 4, and cannot be greater than the Upper bound
value.
Upper bound
Controls the upper bound for the two-dimensional power estimation by sample size chart.
The value must be greater than the Lower bound value and cannot be greater than 5000.
Three-Dimensional Plot
Provides options for controlling the three-dimensional power estimation charts, the vertical/horizontal rotation settings, and the user-specified plot range of sample size. This setting is disabled
by default.
Power estimation versus sample size
When enabled, this optional setting controls the three-dimensional power by sample size charts.
The setting is disabled by default. When selected, this setting displays the chart.
on x-axis and the multiple partial correlation on y-axis
This optional setting controls the three-dimensional power by sample size (x-axis) and multiple
partial correlation coefficient (y-axis) chart. The setting is disabled by default. When selected, this
setting displays the chart.
on y-axis and the multiple partial correlation on x-axis
This optional setting controls the three-dimensional power by sample size (y-axis) and multiple
partial correlation coefficient (x-axis) chart. The setting is disabled by default. When selected, this
setting displays the chart.
Plot range of sample size
When selected, the lower and upper bound options are available. When no integer values are
specified for the Lower bound or Upper bound fields, the default plot range is used.
Lower bound
Controls the lower bound for the three-dimensional power estimation by sample size chart. The value must be greater than or equal to 4, and cannot be greater than the Upper bound value.
Upper bound
Controls the upper bound for the three-dimensional power estimation by sample size chart. The value must be greater than the Lower bound value and cannot be greater than 5000.
Vertical rotation
The optional setting sets the vertical rotation degrees (clockwise from the left) for the three-
dimensional chart. You can use the mouse to rotate the chart vertically. The setting takes effect
when the three-dimensional plot is requested. The value must be a single integer value less than
or equal to 359. The default value is 10.
Horizontal rotation
The optional setting sets the horizontal rotation degrees (clockwise from the front) for the three-
dimensional chart. You can use the mouse to rotate the chart horizontally. The setting takes effect
when the three-dimensional plot is requested. The value must be a single integer value less than
or equal to 359. The default value is 325.
Codebook
Codebook reports the dictionary information — such as variable names, variable labels, value labels,
missing values — and summary statistics for all or specified variables and multiple response sets in the
active dataset. For nominal and ordinal variables and multiple response sets, summary statistics include
counts and percents. For scale variables, summary statistics include mean, standard deviation, and
quartiles.
Note: Codebook ignores split file status. This includes split-file groups created for multiple imputation of
missing values (available in the Missing Values add-on option).
To Obtain a Codebook
1. From the menus choose:
Analyze > Reports > Codebook
2. Click the Variables tab.
3. Select one or more variables and/or multiple response sets.
Optionally, you can:
• Control the variable information that is displayed.
• Control the statistics that are displayed (or exclude all summary statistics).
• Control the order in which variables and multiple response sets are displayed.
• Change the measurement level for any variable in the source list in order to change the summary
statistics displayed. See the topic “Codebook Statistics Tab” on page 33 for more information.
Changing Measurement Level
You can temporarily change the measurement level for variables. (You cannot change the measurement
level for multiple response sets. They are always treated as nominal.)
1. Right-click a variable in the source list.
2. Select a measurement level from the pop-up menu.
This changes the measurement level temporarily. In practical terms, this is only useful for numeric
variables. The measurement level for string variables is restricted to nominal or ordinal, which are both
treated the same by the Codebook procedure.
Codebook Output Tab
The Output tab controls the variable information included for each variable and multiple response set, the
order in which the variables and multiple response sets are displayed, and the contents of the optional
file information table.
Variable Information
This controls the dictionary information displayed for each variable.
Position. An integer that represents the position of the variable in file order. This is not available for
multiple response sets.
Label. The descriptive label associated with the variable or multiple response set.
Type. Fundamental data type. This is either Numeric, String, or Multiple Response Set.
Format. The display format for the variable, such as A4, F8.2, or DATE11. This is not available for multiple
response sets.
Measurement level. The possible values are Nominal, Ordinal, Scale, and Unknown. The value displayed
is the measurement level stored in the dictionary and is not affected by any temporary measurement level
override specified by changing the measurement level in the source variable list on the Variables tab. This
is not available for multiple response sets.
Note: The measurement level for numeric variables may be “unknown” prior to the first data pass when
the measurement level has not been explicitly set, such as data read from an external source or newly
created variables. See the topic for more information.
Role. Some dialogs support the ability to pre-select variables for analysis based on defined roles.
Value labels. Descriptive labels associated with specific data values.
• If Count or Percent is selected on the Statistics tab, defined value labels are included in the output even
if you don’t select Value labels here.
• For multiple dichotomy sets, “value labels” are either the variable labels for the elementary variables in
the set or the labels of counted values, depending on how the set is defined. See the topic for more
information.
Missing values. User-defined missing values. If Count or Percent is selected on the Statistics tab, defined missing values are included in the output even if you don't select Missing values here. This is not available
for multiple response sets.
Custom attributes. User-defined custom variable attributes. Output includes both the names and values
for any custom variable attributes associated with each variable. See the topic for more information. This
is not available for multiple response sets.
Reserved attributes. Reserved system variable attributes. You can display system attributes, but you
should not alter them. System attribute names start with a dollar sign ($). Non-display attributes, with
names that begin with either “@” or “$@”, are not included. Output includes both the names and values
for any system attributes associated with each variable. This is not available for multiple response sets.
File Information
The optional file information table can include any of the following file attributes:
File name. Name of the IBM SPSS Statistics data file. If the dataset has never been saved in IBM SPSS
Statistics format, then there is no data file name. (If there is no file name displayed in the title bar of the
Data Editor window, then the active dataset does not have a file name.)
Location. Directory (folder) location of the IBM SPSS Statistics data file. If the dataset has never been
saved in IBM SPSS Statistics format, then there is no location.
Number of cases. Number of cases in the active dataset. This is the total number of cases, including any
cases that may be excluded from summary statistics due to filter conditions.
Label. This is the file label (if any) defined by the FILE LABEL command.
Documents. Data file document text.
Weight status. If weighting is on, the name of the weight variable is displayed. See the topic for more
information.
Custom attributes. User-defined custom data file attributes. Data file attributes defined with the
DATAFILE ATTRIBUTE command.
Reserved attributes. Reserved system data file attributes. You can display system attributes, but you
should not alter them. System attribute names start with a dollar sign ($). Non-display attributes, with
names that begin with either “@” or “$@”, are not included. Output includes both the names and values
for any system data file attributes.
Variable Display Order
The following alternatives are available for controlling the order in which variables and multiple response
sets are displayed.
Alphabetical. Alphabetic order by variable name.
File. The order in which variables appear in the dataset (the order in which they are displayed in the Data
Editor). In ascending order, multiple response sets are displayed last, after all selected variables.
Measurement level. Sort by measurement level. This creates four sorting groups: nominal, ordinal, scale,
and unknown. Multiple response sets are treated as nominal.
Note: The measurement level for numeric variables may be “unknown” prior to the first data pass when
the measurement level has not been explicitly set, such as data read from an external source or newly
created variables.
Variable list. The order in which variables and multiple response sets appear in the selected variables list
on the Variables tab.
Custom attribute name. The list of sort order options also includes the names of any user-defined
custom variable attributes. In ascending order, variables that don’t have the attribute sort to the top,
followed by variables that have the attribute but no defined value for the attribute, followed by variables
with defined values for the attribute in alphabetic order of the values.
Maximum Number of Categories
If the output includes value labels, counts, or percents for each unique value, you can suppress this
information from the table if the number of values exceeds the specified value. By default, this
information is suppressed if the number of unique values for the variable exceeds 200.
Codebook Statistics Tab
The Statistics tab allows you to control the summary statistics that are included in the output, or suppress
the display of summary statistics entirely.
Counts and Percents
For nominal and ordinal variables, multiple response sets, and labeled values of scale variables, the
available statistics are:
Count. The count or number of cases having each value (or range of values) of a variable.
Percent. The percentage of cases having a particular value.
Central Tendency and Dispersion
For scale variables, the available statistics are:
Mean. A measure of central tendency. The arithmetic average, the sum divided by the number of cases.
Standard Deviation. A measure of dispersion around the mean. In a normal distribution, 68% of cases fall
within one standard deviation of the mean and 95% of cases fall within two standard deviations. For
example, if the mean age is 45, with a standard deviation of 10, 95% of the cases would be between 25
and 65 in a normal distribution.
Quartiles. Displays values corresponding to the 25th, 50th, and 75th percentiles.
Note: You can temporarily change the measurement level associated with a variable (and thereby change
the summary statistics displayed for that variable) in the source variable list on the Variables tab.
Frequencies
The Frequencies procedure provides statistics and graphical displays that are useful for describing many
types of variables. The Frequencies procedure is a good place to start looking at your data.
For a frequency report and bar chart, you can arrange the distinct values in ascending or descending
order, or you can order the categories by their frequencies. The frequencies report can be suppressed
when a variable has many distinct values. You can label charts with frequencies (the default) or
percentages.
Example
What is the distribution of a company’s customers by industry type? From the output, you might learn
that 37.5% of your customers are in government agencies, 24.9% are in corporations, 28.1% are in
academic institutions, and 9.4% are in the healthcare industry. For continuous, quantitative data,
such as sales revenue, you might learn that the average product sale is $3,576, with a standard
deviation of $1,078.
Statistics and plots
Frequency counts, percentages, cumulative percentages, mean, median, mode, sum, standard
deviation, variance, range, minimum and maximum values, standard error of the mean, skewness and
kurtosis (both with standard errors), quartiles, user-specified percentiles, bar charts, pie charts, and
histograms.
Data considerations
Data
Use numeric codes or strings to code categorical variables (nominal or ordinal level measurements).
Assumptions
The tabulations and percentages provide a useful description for data from any distribution, especially
for variables with ordered or unordered categories. Most of the optional summary statistics, such as
the mean and standard deviation, are based on normal theory and are appropriate for quantitative
variables with symmetric distributions. Robust statistics, such as the median, quartiles, and
percentiles, are appropriate for quantitative variables that may or may not meet the assumption of
normality.
Obtaining frequency tables
1. From the menus choose:
Analyze > Descriptive Statistics > Frequencies…
2. Select one or more categorical or quantitative variables.
3. Optionally, select Create APA style tables to create output tables that adhere to APA style guidelines.
4. Optionally, you can:
• Click Statistics for descriptive statistics for quantitative variables.
• Click Charts for bar charts, pie charts, and histograms.
• Click Format for the order in which results are displayed.
• Click Style to specify conditions for automatically changing properties of pivot tables based on
specific conditions.
• Click Bootstrap to derive robust estimates of standard errors and confidence intervals for estimates
such as the mean, median, proportion, odds ratio, correlation coefficient or regression coefficient. It
may also be used for constructing hypothesis tests.
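As a rough sketch, the equivalent command syntax for a simple frequency table with a bar chart might
look like the following (the variable name industry is hypothetical):

FREQUENCIES VARIABLES=industry
  /BARCHART FREQ
  /ORDER=ANALYSIS.

Clicking Paste instead of OK in the dialog generates the exact syntax for your selections.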
Frequencies Statistics
Percentile Values. Values of a quantitative variable that divide the ordered data into groups so that a
certain percentage is above and another percentage is below. Quartiles (the 25th, 50th, and 75th
percentiles) divide the observations into four groups of equal size. If you want an equal number of groups
other than four, select Cut points for n equal groups. You can also specify individual percentiles (for
example, the 95th percentile, the value below which 95% of the observations fall).
Central Tendency. Statistics that describe the location of the distribution include the mean, median,
mode, and sum of all the values.
• Mean. A measure of central tendency. The arithmetic average, the sum divided by the number of cases.
• Median. The value above and below which half of the cases fall, the 50th percentile. If there is an even
number of cases, the median is the average of the two middle cases when they are sorted in ascending
or descending order. The median is a measure of central tendency not sensitive to outlying values
(unlike the mean, which can be affected by a few extremely high or low values).
• Mode. The most frequently occurring value. If several values share the greatest frequency of
occurrence, each of them is a mode. The Frequencies procedure reports only the smallest of such
multiple modes.
• Sum. The sum or total of the values, across all cases with nonmissing values.
Dispersion. Statistics that measure the amount of variation or spread in the data include the standard
deviation, variance, range, minimum, maximum, and standard error of the mean.
• Std. deviation. A measure of dispersion around the mean. In a normal distribution, 68% of cases fall
within one standard deviation of the mean and 95% of cases fall within two standard deviations. For
example, if the mean age is 45, with a standard deviation of 10, 95% of the cases would be between 25
and 65 in a normal distribution.
• Variance. A measure of dispersion around the mean, equal to the sum of squared deviations from the
mean divided by one less than the number of cases. The variance is measured in units that are the
square of those of the variable itself.
• Range. The difference between the largest and smallest values of a numeric variable, the maximum
minus the minimum.
• Minimum. The smallest value of a numeric variable.
• Maximum. The largest value of a numeric variable.
• S. E. mean. A measure of how much the value of the mean may vary from sample to sample taken from
the same distribution. It can be used to roughly compare the observed mean to a hypothesized value
(that is, you can conclude the two values are different if the ratio of the difference to the standard error
is less than -2 or greater than +2).
Distribution. Skewness and kurtosis are statistics that describe the shape and symmetry of the
distribution. These statistics are displayed with their standard errors.
• Skewness. A measure of the asymmetry of a distribution. The normal distribution is symmetric and has a
skewness value of 0. A distribution with a significant positive skewness has a long right tail. A
distribution with a significant negative skewness has a long left tail. As a guideline, a skewness value
more than twice its standard error is taken to indicate a departure from symmetry.
• Kurtosis. A measure of the extent to which there are outliers. For a normal distribution, the value of the
kurtosis statistic is zero. Positive kurtosis indicates that the data exhibit more extreme outliers than a
normal distribution. Negative kurtosis indicates that the data exhibit less extreme outliers than a normal
distribution.
Values are group midpoints. If the values in your data are midpoints of groups (for example, ages of all
people in their thirties are coded as 35), select this option to estimate the median and percentiles for the
original, ungrouped data.
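For example, a sketch of syntax that suppresses the frequency table itself and requests quartiles, the
95th percentile, and several summary statistics (sales is a hypothetical variable name):

FREQUENCIES VARIABLES=sales
  /FORMAT=NOTABLE
  /NTILES=4
  /PERCENTILES=95
  /STATISTICS=MEAN MEDIAN MODE STDDEV SEMEAN SKEWNESS KURTOSIS.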
Frequencies Charts
Note: Charts are not produced in the output when Perform bootstrapping is enabled on the Bootstrap
dialog.
Chart Type
A pie chart displays the contribution of parts to a whole. Each slice of a pie chart corresponds to a
group that is defined by a single grouping variable. A bar chart displays the count for each distinct
value or category as a separate bar, allowing you to compare categories visually. A histogram also has
bars, but they are plotted along an equal interval scale. The height of each bar is the count of values of
a quantitative variable falling within the interval. A histogram shows the shape, center, and spread of
the distribution. A normal curve superimposed on a histogram helps you judge whether the data are
normally distributed.
Chart Values
For bar charts, the scale axis can be labeled by frequency counts or percentages.
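Sketches of the corresponding chart requests (age and region are hypothetical variable names), first a
histogram with a superimposed normal curve, then a bar chart labeled with percentages:

FREQUENCIES VARIABLES=age
  /HISTOGRAM NORMAL.

FREQUENCIES VARIABLES=region
  /BARCHART PERCENT.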
Frequencies Format
Order by. The frequency table can be arranged according to the actual values in the data or according to
the count (frequency of occurrence) of those values, and the table can be arranged in either ascending or
descending order. However, if you request a histogram or percentiles, Frequencies assumes that the
variable is quantitative and displays its values in ascending order.
Multiple Variables. If you produce statistics tables for multiple variables, you can either display all
variables in a single table (Compare variables) or display a separate statistics table for each variable
(Organize output by variables).
Suppress tables with many categories. This option prevents the display of tables with more than the
specified number of values.
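A sketch of the corresponding format options, sorting values by descending count, suppressing any table
with more than 50 values, and organizing output by variables (product and region are hypothetical
variable names):

FREQUENCIES VARIABLES=product region
  /FORMAT=DFREQ LIMIT(50)
  /ORDER=VARIABLE.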
Descriptives
The Descriptives procedure displays univariate summary statistics for several variables in a single table
and calculates standardized values (z scores). Variables can be ordered by the size of their means (in
ascending or descending order), alphabetically, or by the order in which you select the variables (the
default).
When z scores are saved, they are added to the data in the Data Editor and are available for charts, data
listings, and analyses. When variables are recorded in different units (for example, gross domestic
product per capita and percentage literate), a z-score transformation places variables on a common scale
for easier visual comparison.
Example. If each case in your data contains the daily sales totals for each member of the sales staff (for
example, one entry for Bob, one entry for Kim, and one entry for Brian) collected each day for several
months, the Descriptives procedure can compute the average daily sales for each staff member and can
order the results from highest average sales to lowest average sales.
Statistics. Sample size, mean, minimum, maximum, standard deviation, variance, range, sum, standard
error of the mean, and kurtosis and skewness with their standard errors.
Descriptives Data Considerations
Data. Use numeric variables after you have screened them graphically for recording errors, outliers, and
distributional anomalies. The Descriptives procedure is very efficient for large files (thousands of cases).
Assumptions. Most of the available statistics (including z scores) are based on normal theory and are
appropriate for quantitative variables (interval- or ratio-level measurements) with symmetric
distributions. Avoid variables with unordered categories or skewed distributions. The distribution of z
scores has the same shape as that of the original data; therefore, calculating z scores is not a remedy for
problem data.
To Obtain Descriptive Statistics
1. From the menus choose:
Analyze > Descriptive Statistics > Descriptives…
2. Select one or more variables.
Optionally, you can:
• Select Save standardized values as variables to save z scores as new variables.
• Click Options for optional statistics and display order.
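A minimal sketch of the equivalent DESCRIPTIVES syntax, saving z scores for the two variables from the
example above (the variable names are hypothetical):

DESCRIPTIVES VARIABLES=gdp_per_capita literacy
  /SAVE
  /STATISTICS=MEAN STDDEV MIN MAX.

The saved variables appear in the Data Editor with names that, by default, begin with Z.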
Descriptives Options
Mean and Sum. The mean, or arithmetic average, is displayed by default.
Dispersion. Statistics that measure the spread or variation in the data include the standard deviation,
variance, range, minimum, maximum, and standard error of the mean.
• Std. deviation. A measure of dispersion around the mean. In a normal distribution, 68% of cases fall
within one standard deviation of the mean and 95% of cases fall within two standard deviations. For
example, if the mean age is 45, with a standard deviation of 10, 95% of the cases would be between 25
and 65 in a normal distribution.
• Variance. A measure of dispersion around the mean, equal to the sum of squared deviations from the
mean divided by one less than the number of cases. The variance is measured in units that are the
square of those of the variable itself.
• Range. The difference between the largest and smallest values of a numeric variable, the maximum
minus the minimum.
• Minimum. The smallest value of a numeric variable.
• Maximum. The largest value of a numeric variable.
• S.E. mean. A measure of how much the value of the mean may vary from sample to sample taken from
the same distribution. It can be used to roughly compare the observed mean to a hypothesized value
(that is, you can conclude the two values are different if the ratio of the difference to the standard error
is less than -2 or greater than +2).
Distribution. Kurtosis and skewness are statistics that characterize the shape and symmetry of the
distribution. These statistics are displayed with their standard errors.
• Kurtosis. A measure of the extent to which there are outliers. For a normal distribution, the value of the
kurtosis statistic is zero. Positive kurtosis indicates that the data exhibit more extreme outliers than a
normal distribution. Negative kurtosis indicates that the data exhibit less extreme outliers than a normal
distribution.
• Skewness. A measure of the asymmetry of a distribution. The normal distribution is symmetric and has a
skewness value of 0. A distribution with a significant positive skewness has a long right tail. A
distribution with a significant negative skewness has a long left tail. As a guideline, a skewness value
more than twice its standard error is taken to indicate a departure from symmetry.
Display Order. By default, the variables are displayed in the order in which you selected them. Optionally,
you can display variables alphabetically, by ascending means, or by descending means.
DESCRIPTIVES Command Additional Features
The command syntax language also allows you to:
• Save standardized scores (z scores) for some but not all variables (with the VARIABLES subcommand).
• Specify names for new variables that contain standardized scores (with the VARIABLES subcommand).
• Exclude from the analysis cases with missing values for any variable (with the MISSING subcommand).
• Sort the variables in the display by the value of any statistic, not just the mean (with the SORT
subcommand).
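A sketch combining several of these subcommands; it assumes that a parenthesized name on VARIABLES
assigns the z-score variable name (salary, zsalary, and age are hypothetical; see the Command Syntax
Reference for the exact rules):

DESCRIPTIVES VARIABLES=salary (zsalary) age
  /SAVE
  /MISSING=LISTWISE
  /SORT=STDDEV (D).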
See the Command Syntax Reference for complete syntax information.
Explore
The Explore procedure produces summary statistics and graphical displays, either for all of your cases or
separately for groups of cases. There are many reasons for using the Explore procedure: data screening,
outlier identification, description, assumption checking, and characterizing differences among
subpopulations (groups of cases). Data screening may show that you have unusual values, extreme
values, gaps in the data, or other peculiarities. Exploring the data can help to determine whether the
statistical techniques that you are considering for data analysis are appropriate. The exploration may
indicate that you need to transform the data if the technique requires a normal distribution. Or you may
decide that you need nonparametric tests.
Example. Look at the distribution of maze-learning times for rats under four different reinforcement
schedules. For each of the four groups, you can see if the distribution of times is approximately normal
and whether the four variances are equal. You can also identify the cases with the five largest and five
smallest times. The boxplots and stem-and-leaf plots graphically summarize the distribution of learning
times for each of the groups.
Statistics and plots. Mean, median, 5% trimmed mean, standard error, variance, standard deviation,
minimum, maximum, range, interquartile range, skewness and kurtosis and their standard errors,
confidence interval for the mean (and specified confidence level), percentiles, Huber’s M-estimator,
Andrews’ wave estimator, Hampel’s redescending M-estimator, Tukey’s biweight estimator, the five
largest and five smallest values, the Kolmogorov-Smirnov statistic with a Lilliefors significance level for
testing normality, and the Shapiro-Wilk statistic. Boxplots, stem-and-leaf plots, histograms, normality
plots, and spread-versus-level plots with Levene tests and transformations.
Explore Data Considerations
Data. The Explore procedure can be used for quantitative variables (interval- or ratio-level
measurements). A factor variable (used to break the data into groups of cases) should have a reasonable
number of distinct values (categories). These values may be short string or numeric. The case label
variable, used to label outliers in boxplots, can be short string, long string (first 15 bytes), or numeric.
Assumptions. The distribution of your data does not have to be symmetric or normal.
To Explore Your Data
1. From the menus choose:
Analyze > Descriptive Statistics > Explore…
2. Select one or more dependent variables.
Optionally, you can:
• Select one or more factor variables, whose values will define groups of cases.
• Select an identification variable to label cases.
• Click Statistics for robust estimators, outliers, percentiles, and frequency tables.
• Click Plots for histograms, normal probability plots and tests, and spread-versus-level plots with
Levene’s statistics.
• Click Options for the treatment of missing values.
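A sketch of the EXAMINE syntax corresponding to the maze-learning example (time and schedule are
hypothetical variable names):

EXAMINE VARIABLES=time BY schedule
  /PLOT BOXPLOT STEMLEAF
  /COMPARE GROUPS
  /STATISTICS DESCRIPTIVES EXTREME
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.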
Explore Statistics
Descriptives. These measures of central tendency and dispersion are displayed by default. Measures of
central tendency indicate the location of the distribution; they include the mean, median, and 5%
trimmed mean. Measures of dispersion show the dissimilarity of the values; these include standard error,
variance, standard deviation, minimum, maximum, range, and interquartile range. The descriptive
statistics also include measures of the shape of the distribution; skewness and kurtosis are displayed
with their standard errors. The 95% level confidence interval for the mean is also displayed; you can
specify a different confidence level.
M-estimators. Robust alternatives to the sample mean and median for estimating the location. The
estimators calculated differ in the weights they apply to cases. Huber’s M-estimator, Andrews’ wave
estimator, Hampel’s redescending M-estimator, and Tukey’s biweight estimator are displayed.
Outliers. Displays the five largest and five smallest values with case labels.
Percentiles. Displays the values for the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles.
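A sketch requesting the robust estimators, extremes, and percentiles described above; the tuning
constants shown are the usual defaults that the dialog pastes (time is a hypothetical variable name):

EXAMINE VARIABLES=time
  /PLOT NONE
  /STATISTICS DESCRIPTIVES EXTREME
  /MESTIMATORS HUBER(1.339) ANDREW(1.34) HAMPEL(1.7,3.4,8.5) TUKEY(4.685)
  /PERCENTILES(5,10,25,50,75,90,95) HAVERAGE
  /MISSING LISTWISE
  /NOTOTAL.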
Explore Plots
Boxplots. These alternatives control the display of boxplots when you have more than one dependent
variable. Factor levels together generates a separate display for each dependent variable. Within a
display, boxplots are shown for each of the groups defined by a factor variable. Dependents together
generates a separate display for each group defined by a factor variable. Within a display, boxplots are
shown side by side for each dependent variable. This display is particularly useful when the different
variables represent a single characteristic measured at different times.
Descriptive. The Descriptive group allows you to choose stem-and-leaf plots and histograms.
Normality plots with tests. Displays normal probability and detrended normal probability plots. The
Kolmogorov-Smirnov statistic, with a Lilliefors significance level for testing normality, is displayed. If non-
integer weights are specified, the Shapiro-Wilk statistic is calculated when the weighted sample size lies
between 3 and 50. For no weights or integer weights, the statistic is calculated when the weighted
sample size lies between 3 and 5,000.
Spread vs. Level with Levene Test. Controls data transformation for spread-versus-level plots. For all
spread-versus-level plots, the slope of the regression line and Levene’s robust tests for homogeneity of
variance are displayed. If you select a transformation, Levene’s tests are based on the transformed data.
If no factor variable is selected, spread-versus-level plots are not produced. Power estimation produces
a plot of the natural logs of the interquartile ranges against the natural logs of the medians for all cells, as
well as an estimate of the power transformation for achieving equal variances in the cells. A spread-
versus-level plot helps to determine the power for a transformation to stabilize (make more equal)
variances across groups. Transformed allows you to select one of the power alternatives, perhaps
following the recommendation from power estimation, and produces plots of transformed data. The
interquartile range and median of the transformed data are plotted. Untransformed produces plots of the
raw data. This is equivalent to a transformation with a power of 1.
Explore Power Transformations
These are the power transformations for spread-versus-level plots. To transform data, you must select a
power for the transformation. You can choose one of the following alternatives:
• Natural log. Natural log transformation. This is the default.
• 1/square root. For each data value, the reciprocal of the square root is calculated.
• Reciprocal. The reciprocal of each data value is calculated.
• Square root. The square root of each data value is calculated.
• Square. Each data value is squared.
• Cube. Each data value is cubed.
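A sketch requesting spread-versus-level plots of untransformed data, which the text above equates to a
transformation with a power of 1; a factor variable is required (time and schedule are hypothetical
variable names):

EXAMINE VARIABLES=time BY schedule
  /PLOT SPREADLEVEL(1)
  /MISSING LISTWISE
  /NOTOTAL.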
Explore Options
Missing Values. Controls the treatment of missing values.
• Exclude cases listwise. Cases with missing values for any dependent or factor variable are excluded
from all analyses. This is the default.
• Exclude cases pairwise. Cases with no missing values for variables in a group (cell) are included in the
analysis of that group. The case may have missing values for variables used in other groups.
• Report values. Missing values for factor variables are treated as a separate category. All output is
produced for this additional category. Frequency tables include categories for missing values. Missing
values for a factor variable are included but labeled as missing.
EXAMINE Command Additional Features
The Explore procedure uses EXAMINE command syntax. The command syntax language also allows you
to:
• Request total output and plots in addition to output and plots for groups defined by the factor variables
(with the TOTAL subcommand).
• Specify a common scale for a group of boxplots (with the SCALE subcommand).
• Specify interactions of the factor variables (with the VARIABLES subcommand).
• Specify percentiles other than the defaults (with the PERCENTILES subcommand).
• Calculate percentiles according to any of five methods (with the PERCENTILES subcommand).
• Specify any power transformation for spread-versus-level plots (with the PLOT subcommand).
• Specify the number of extreme values to be displayed (with the STATISTICS subcommand).
• Specify parameters for the M-estimators, robust estimators of location (with the MESTIMATORS
subcommand).
See the Command Syntax Reference for complete syntax information.
Crosstabs
The Crosstabs procedure forms two-way and multiway tables and provides a variety of tests and
measures of association for two-way tables. The structure of the table and whether categories are
ordered determine what test or measure to use.
With the exception of partial gamma coefficients, Crosstabs’ statistics and measures of association are
computed separately for each two-way table. If you specify a row, a column, and a layer factor (control
variable), the Crosstabs procedure forms one panel of associated statistics and measures for each value
of the layer factor (or a combination of values for two or more control variables). For example, if gender is
a layer factor for a table of married (yes, no) against life (is life exciting, routine, or dull), the results for a
two-way table for the females are computed separately from those for the males and printed as panels
following one another.
Example. Are customers from small companies more likely to be profitable in sales of services (for
example, training and consulting) than those from larger companies? From a crosstabulation, you might
learn that the majority of small companies (fewer than 500 employees) yield high service profits, while
the majority of large companies (more than 2,500 employees) yield low service profits.
Statistics and measures of association. Pearson chi-square, likelihood-ratio chi-square, linear-by-linear
association test, Fisher’s exact test, Yates’ corrected chi-square, Pearson’s r, Spearman’s rho,
contingency coefficient, phi, Cramér’s V, symmetric and asymmetric lambdas, Goodman and Kruskal’s
tau, uncertainty coefficient, gamma, Somers’ d, Kendall’s tau-b, Kendall’s tau-c, eta coefficient, Cohen’s
kappa, relative risk estimate, odds ratio, McNemar test, Cochran’s and Mantel-Haenszel statistics, and
column proportions statistics.
Crosstabs Data Considerations
Data. To define the categories of each table variable, use values of a numeric or string (eight or fewer
bytes) variable. For example, for gender, you could code the data as 1 and 2 or as male and female.
Assumptions. Some statistics and measures assume ordered categories (ordinal data) or quantitative
values (interval or ratio data), as discussed in the section on statistics. Others are valid when the table
variables have unordered categories (nominal data). For the chi-square-based statistics (phi, Cramér’s V,
and contingency coefficient), the data should be a random sample from a multinomial distribution.
Note: Ordinal variables can be either numeric codes that represent categories (for example, 1 = low, 2 =
medium, 3 = high) or string values. However, the alphabetic order of string values is assumed to reflect
the true order of the categories. For example, for a string variable with the values of low, medium, high,
the order of the categories is interpreted as high, low, medium, which is not the correct order. In general,
it is more reliable to use numeric codes to represent ordinal data.
To Obtain Crosstabulations
1. From the menus choose:
Analyze > Descriptive Statistics > Crosstabs…
2. Select one or more row variables and one or more column variables.
Optionally, you can:
• Select one or more control variables.
• Click Statistics for tests and measures of association for two-way tables or subtables.
• Click Cells for observed and expected values, percentages, and residuals.
• Click Format for controlling the order of categories.
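A sketch of the equivalent CROSSTABS syntax for the marital status by life perception by gender example
above, requesting chi-square statistics with observed and expected counts (married, life, and gender are
hypothetical variable names):

CROSSTABS
  /TABLES=married BY life BY gender
  /STATISTICS=CHISQ
  /CELLS=COUNT EXPECTED.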
Crosstabs layers
If you select one or more layer variables, a separate crosstabulation is produced for each category of each
layer variable (control variable). For example, if you have one row variable, one column variable, and one
layer variable with two categories, you get a two-way table for each category of the layer variable. To
make another layer of control variables, click Next. Subtables are produced for each combination of
categories for each first-layer variable, each second-layer variable, and so on. If statistics and measures
of association are requested, they apply to two-way subtables only.
Crosstabs clustered bar charts
Display clustered bar charts. A clustered bar chart helps summarize your data for groups of cases. There
is one cluster of bars for each value of the variable you specified under Rows. The variable that defines
the bars within each cluster is the variable you specified under Columns. There is one set of differently
colored or patterned bars for each value of this variable. If you specify more than one variable under
Columns or Rows, a clustered bar chart is produced for each combination of two variables.
Crosstabs displaying layer variables in table layers
Display layer variables in table layers. You can choose to display the layer variables (control variables)
as table layers in the crosstabulation table. This allows you to create views that show the overall statistics
for row and column variables as well as to drill down on categories of the layer variables.
An example that uses the data file demo.sav (available in the Samples directory of the installation
directory) can be obtained as follows:
1. Select Income category in thousands (inccat) as the row variable, Owns PDA (ownpda) as the column
variable and Level of Education (ed) as the layer variable.
2. Select Display layer variables in table layers.
3. Select Column in the Cell Display subdialog.
4. Run the Crosstabs procedure, double-click the crosstabulation table and select College degree from
the Level of education drop-down list.
The selected view of the crosstabulation table shows the statistics for respondents who have a college
degree.
Crosstabs statistics
Chi-square. For tables with two rows and two columns, select Chi-square to calculate the Pearson chi-
square, the likelihood-ratio chi-square, Fisher’s exact test, and Yates’ corrected chi-square (continuity
correction). For 2 × 2 tables, Fisher’s exact test is computed when a table that does not result from
missing rows or columns in a larger table has a cell with an expected frequency of less than 5. Yates’
corrected chi-square is computed for all other 2 × 2 tables. For tables with any number of rows and
columns, select Chi-square to calculate the Pearson chi-square and the likelihood-ratio chi-square. When
both table variables are quantitative, Chi-square yields the linear-by-linear association test.
Correlations. For tables in which both rows and columns contain ordered values, Correlations yields
Spearman’s correlation coefficient, rho (numeric data only). Spearman’s rho is a measure of association
between rank orders. When both table variables (factors) are quantitative, Correlations yields the
Pearson correlation coefficient, r, a measure of linear association between the variables.
Nominal. For nominal data (no intrinsic order, such as Catholic, Protestant, and Jewish), you can select
Contingency coefficient, Phi (coefficient) and Cramér’s V, Lambda (symmetric and asymmetric lambdas
and Goodman and Kruskal’s tau), and Uncertainty coefficient.
• Contingency coefficient. A measure of association based on chi-square. The value ranges between 0 and
1, with 0 indicating no association between the row and column variables and values close to 1
indicating a high degree of association between the variables. The maximum value possible depends on
the number of rows and columns in a table.
• Phi and Cramér’s V. Phi is a chi-square-based measure of association that involves dividing the chi-
square statistic by the sample size and taking the square root of the result. Cramér’s V is a measure of
association based on chi-square.
• Lambda. A measure of association that reflects the proportional reduction in error when values of the
independent variable are used to predict values of the dependent variable. A value of 1 means that the
independent variable perfectly predicts the dependent variable. A value of 0 means that the
independent variable is no help in predicting the dependent variable.
• Uncertainty coefficient. A measure of association that indicates the proportional reduction in error when
values of one variable are used to predict values of the other variable. For example, a value of 0.83
indicates that knowledge of one variable reduces error in predicting values of the other variable by 83%.
The program calculates both symmetric and asymmetric versions of the uncertainty coefficient.
Ordinal. For tables in which both rows and columns contain ordered values, select Gamma (zero-order
for 2-way tables and conditional for 3-way to 10-way tables), Kendall’s tau-b, and Kendall’s tau-c. For
predicting column categories from row categories, select Somers’ d.
• Gamma. A symmetric measure of association between two ordinal variables that ranges between -1
and 1. Values close to an absolute value of 1 indicate a strong relationship between the two variables.
Values close to 0 indicate little or no relationship. For 2-way tables, zero-order gammas are displayed.
For 3-way to n-way tables, conditional gammas are displayed.
• Somers’ d. A measure of association between two ordinal variables that ranges from -1 to 1. Values
close to an absolute value of 1 indicate a strong relationship between the two variables, and values
close to 0 indicate little or no relationship between the variables. Somers’ d is an asymmetric extension
of gamma that differs only in the inclusion of the number of pairs not tied on the independent variable. A
symmetric version of this statistic is also calculated.
• Kendall’s tau-b. A nonparametric measure of correlation for ordinal or ranked variables that takes ties
into account. The sign of the coefficient indicates the direction of the relationship, and its absolute value
indicates the strength, with larger absolute values indicating stronger relationships. Possible values
range from -1 to 1, but a value of -1 or +1 can be obtained only from square tables.
• Kendall’s tau-c. A nonparametric measure of association for ordinal variables that ignores ties. The sign
of the coefficient indicates the direction of the relationship, and its absolute value indicates the
strength, with larger absolute values indicating stronger relationships. Possible values range from -1 to
1, but a value of -1 or +1 can be obtained only from square tables.
Nominal by Interval. When one variable is categorical and the other is quantitative, select Eta. The
categorical variable must be coded numerically.
• Eta. A measure of association that ranges from 0 to 1, with 0 indicating no association between the row
and column variables and values close to 1 indicating a high degree of association. Eta is appropriate for
a dependent variable measured on an interval scale (for example, income) and an independent variable
with a limited number of categories (for example, gender). Two eta values are computed: one treats the
row variable as the interval variable, and the other treats the column variable as the interval variable.
Kappa. Cohen’s kappa measures the agreement between the evaluations of two raters when both are
rating the same object. A value of 1 indicates perfect agreement. A value of 0 indicates that agreement is
no better than chance. Kappa is based on a square table in which row and column values represent the
same scale. Any cell that has observed values for one variable but not the other is assigned a count of 0.
Kappa is not computed if the data storage type (string or numeric) is not the same for the two variables.
For string variables, both must have the same defined length.
Risk. For 2 x 2 tables, a measure of the strength of the association between the presence of a factor and
the occurrence of an event. If the confidence interval for the statistic includes a value of 1, you cannot
assume that the factor is associated with the event. The odds ratio can be used as an estimate of relative
risk when the occurrence of the factor is rare.
McNemar. A nonparametric test for two related dichotomous variables. Tests for changes in responses
using the chi-square distribution. Useful for detecting changes in responses due to experimental
intervention in “before-and-after” designs. For larger square tables, the McNemar-Bowker test of
symmetry is reported.
Cochran’s and Mantel-Haenszel statistics. Cochran’s and Mantel-Haenszel statistics can be used to test
for independence between a dichotomous factor variable and a dichotomous response variable,
conditional upon covariate patterns defined by one or more layer (control) variables. Note that while other
statistics are computed layer by layer, the Cochran’s and Mantel-Haenszel statistics are computed once
for all layers.
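Each of the statistics in this dialog corresponds to a keyword on the /STATISTICS subcommand. A sketch
requesting most of them at once (rowvar and colvar are hypothetical variable names):

CROSSTABS
  /TABLES=rowvar BY colvar
  /STATISTICS=CHISQ CC PHI LAMBDA UC GAMMA D BTAU CTAU ETA CORR KAPPA RISK MCNEMAR CMH(1).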
Crosstabs cell display
To help you uncover patterns in the data that contribute to a significant chi-square test, the Crosstabs
procedure displays expected frequencies and three types of residuals (deviates) that measure the
difference between observed and expected frequencies. Each cell of the table can contain any
combination of counts, percentages, and residuals selected.
Counts. The number of cases actually observed and the number of cases expected if the row and column
variables are independent of each other. You can choose to hide counts that are less than a specified
integer. Hidden values will be displayed as <N.
Percentages. The percentages can add up across the rows or down the columns. The percentages of the
total number of cases represented in the table are also available.
Residuals. Raw unstandardized residuals give the difference between the observed and expected values.
Standardized and adjusted standardized residuals are also available.
Summarize
The Summarize procedure calculates subgroup statistics for variables within categories of one or more
grouping variables, and it can optionally list the individual cases in each subgroup.
To Obtain Case Summaries
1. From the menus choose:
Analyze > Reports > Case Summaries…
2. Select one or more variables.
Optionally, you can:
• Select one or more grouping variables to divide your data into subgroups.
• Click Options to change the output title, add a caption below the output, or exclude cases with missing
values.
• Click Statistics for optional statistics.
• Select Display cases to list the cases in each subgroup. By default, the system lists only the first 100
cases in your file. You can raise or lower the value for Limit cases to first n or deselect that item to list
all cases.
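A sketch of the SUMMARIZE syntax that corresponds to these selections, listing up to the first 100 cases
with counts, means, and medians (sales and region are hypothetical variable names):

SUMMARIZE
  /TABLES=sales BY region
  /FORMAT=VALIDLIST NOCASENUM TOTAL LIMIT(100)
  /TITLE='Case Summaries'
  /MISSING=VARIABLE
  /CELLS=COUNT MEAN MEDIAN.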
Summarize Options
Summarize allows you to change the title of your output or add a caption that will appear below the
output table. You can control line wrapping in titles and captions by typing \n wherever you want to insert
a line break in the text.
You can also choose to display or suppress subheadings for totals and to include or exclude cases with
missing values for any of the variables used in any of the analyses. Often it is desirable to denote missing
cases in output with a period or an asterisk. Enter a character, phrase, or code that you would like to have
appear when a value is missing; otherwise, no special treatment is applied to missing cases in the output.
Summarize Statistics
You can choose one or more of the following subgroup statistics for the variables within each category of
each grouping variable: sum, number of cases, mean, median, grouped median, standard error of the
mean, minimum, maximum, range, variable value of the first category of the grouping variable, variable
value of the last category of the grouping variable, standard deviation, variance, kurtosis, standard error
of kurtosis, skewness, standard error of skewness, percentage of total sum, percentage of total N,
percentage of total sum within grouping variables, percentage of total N within grouping variables,
geometric mean, and harmonic mean. The order in which the
statistics appear in the Cell Statistics list is the order in which they will be displayed in the output.
Summary statistics are also displayed for each variable across all categories.
First. Displays the first data value encountered in the data file.
Geometric Mean. The nth root of the product of the data values, where n represents the number of cases.
Grouped Median. Median that is calculated for data that is coded into groups. For example, with age data,
if each value in the 30s is coded 35, each value in the 40s is coded 45, and so on, the grouped median is
the median calculated from the coded data.
Harmonic Mean. Used to estimate an average group size when the sample sizes in the groups are not
equal. The harmonic mean is the total number of samples divided by the sum of the reciprocals of the
sample sizes.
Kurtosis. A measure of the extent to which there are outliers. For a normal distribution, the value of the
kurtosis statistic is zero. Positive kurtosis indicates that the data exhibit more extreme outliers than a
normal distribution. Negative kurtosis indicates that the data exhibit less extreme outliers than a normal
distribution.
Last. Displays the last data value encountered in the data file.
Maximum. The largest value of a numeric variable.
Mean. A measure of central tendency. The arithmetic average, the sum divided by the number of cases.
Median. The value above and below which half of the cases fall, the 50th percentile. If there is an even
number of cases, the median is the average of the two middle cases when they are sorted in ascending or
descending order. The median is a measure of central tendency not sensitive to outlying values (unlike
the mean, which can be affected by a few extremely high or low values).
Minimum. The smallest value of a numeric variable.
N. The number of cases (observations or records).
Percent of Total N. Percentage of the total number of cases in each category.
Percent of Total Sum. Percentage of the total sum in each category.
Range. The difference between the largest and smallest values of a numeric variable, the maximum minus
the minimum.
Skewness. A measure of the asymmetry of a distribution. The normal distribution is symmetric and has a
skewness value of 0. A distribution with a significant positive skewness has a long right tail. A distribution
with a significant negative skewness has a long left tail. As a guideline, a skewness value more than twice
its standard error is taken to indicate a departure from symmetry.
Standard Deviation. A measure of dispersion around the mean. In a normal distribution, 68% of cases fall
within one standard deviation of the mean and 95% of cases fall within two standard deviations. For
example, if the mean age is 45, with a standard deviation of 10, 95% of the cases would be between 25
and 65 in a normal distribution.
Standard Error of Kurtosis. The ratio of kurtosis to its standard error can be used as a test of normality
(that is, you can reject normality if the ratio is less than -2 or greater than +2). A large positive value for
kurtosis indicates that the tails of the distribution are longer than those of a normal distribution; a
negative value for kurtosis indicates shorter tails (becoming like those of a box-shaped uniform
distribution).
Standard Error of Mean. A measure of how much the value of the mean may vary from sample to sample
taken from the same distribution. It can be used to roughly compare the observed mean to a
hypothesized value (that is, you can conclude the two values are different if the ratio of the difference to
the standard error is less than -2 or greater than +2).
Standard Error of Skewness. The ratio of skewness to its standard error can be used as a test of normality
(that is, you can reject normality if the ratio is less than -2 or greater than +2). A large positive value for
skewness indicates a long right tail; an extreme negative value indicates a long left tail.
Sum. The sum or total of the values, across all cases with nonmissing values.
Variance. A measure of dispersion around the mean, equal to the sum of squared deviations from the
mean divided by one less than the number of cases. The variance is measured in units that are the square
of those of the variable itself.
Means
The Means procedure calculates subgroup means and related univariate statistics for dependent
variables within categories of one or more independent variables. Optionally, you can obtain a one-way
analysis of variance, eta, and tests for linearity.
Example. Measure the average amount of fat absorbed by three different types of cooking oil, and
perform a one-way analysis of variance to see whether the means differ.
Statistics. Sum, number of cases, mean, median, grouped median, standard error of the mean, minimum,
maximum, range, variable value of the first category of the grouping variable, variable value of the last
category of the grouping variable, standard deviation, variance, kurtosis, standard error of kurtosis,
skewness, standard error of skewness, percentage of total sum, percentage of total N, percentage of
total sum within grouping variables, percentage of total N within grouping variables, geometric mean,
and harmonic mean. Options include analysis of variance, eta, eta squared, and tests for linearity (R and
R-squared).
Means Data Considerations
Data. The dependent variables are quantitative, and the independent variables are categorical. The
values of categorical variables can be numeric or string.
Assumptions. Some of the optional subgroup statistics, such as the mean and standard deviation, are
based on normal theory and are appropriate for quantitative variables with symmetric distributions.
Robust statistics, such as the median, are appropriate for quantitative variables that may or may not meet
the assumption of normality. Analysis of variance is robust to departures from normality, but the data in
each cell should be symmetric. Analysis of variance also assumes that the groups come from populations
with equal variances. To test this assumption, use Levene’s homogeneity-of-variance test, available in the
One-Way ANOVA procedure.
To Obtain Subgroup Means
1. From the menus choose:
Analyze > Compare Means > Means…
2. Select one or more dependent variables.
3. Use one of the following methods to select categorical independent variables:
• Select one or more independent variables. Separate results are displayed for each independent
variable.
• Select one or more layers of independent variables. Each layer further subdivides the sample. If you
have one independent variable in Layer 1 and one independent variable in Layer 2, the results are
displayed in one crossed table, as opposed to separate tables for each independent variable.
4. Optionally, click Options for optional statistics, an analysis of variance table, eta, eta squared, R, and
R-squared.
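A sketch of the corresponding MEANS syntax for the cooking-oil example, including the analysis of
variance and linearity tests (fat and oil are hypothetical variable names):

MEANS TABLES=fat BY oil
  /CELLS=MEAN COUNT STDDEV
  /STATISTICS=ANOVA LINEARITY.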
Means Options
You can choose one or more of the following subgroup statistics for the variables within each category of
each grouping variable: sum, number of cases, mean, median, grouped median, standard error of the
mean, minimum, maximum, range, variable value of the first category of the grouping variable, variable
value of the last category of the grouping variable, standard deviation, variance, kurtosis, standard error
of kurtosis, skewness, standard error of skewness, percentage of total sum, percentage of total N,
percentage of total sum within grouping variables, percentage of total N within grouping variables,
geometric mean, and harmonic mean. You can change the order
in which the subgroup statistics appear. The order in which the statistics appear in the Cell Statistics list is
the order in which they are displayed in the output. Summary statistics are also displayed for each
variable across all categories.
First. Displays the first data value encountered in the data file.
Geometric Mean. The nth root of the product of the data values, where n represents the number of cases.
Grouped Median. Median that is calculated for data that is coded into groups. For example, with age data,
if each value in the 30s is coded 35, each value in the 40s is coded 45, and so on, the grouped median is
the median calculated from the coded data.
Harmonic Mean. Used to estimate an average group size when the sample sizes in the groups are not
equal. The harmonic mean is the total number of samples divided by the sum of the reciprocals of the
sample sizes.
Kurtosis. A measure of the extent to which there are outliers. For a normal distribution, the value of the
kurtosis statistic is zero. Positive kurtosis indicates that the data exhibit more extreme outliers than a
normal distribution. Negative kurtosis indicates that the data exhibit less extreme outliers than a normal
distribution.
Last. Displays the last data value encountered in the data file.
Maximum. The largest value of a numeric variable.
Mean. A measure of central tendency. The arithmetic average, the sum divided by the number of cases.
Median. The value above and below which half of the cases fall, the 50th percentile. If there is an even
number of cases, the median is the average of the two middle cases when they are sorted in ascending or
descending order. The median is a measure of central tendency not sensitive to outlying values (unlike
the mean, which can be affected by a few extremely high or low values).
Minimum. The smallest value of a numeric variable.
N. The number of cases (observations or records).
Percent of total N. Percentage of the total number of cases in each category.
Percent of total sum. Percentage of the total sum in each category.
Range. The difference between the largest and smallest values of a numeric variable, the maximum minus
the minimum.
Skewness. A measure of the asymmetry of a distribution. The normal distribution is symmetric and has a
skewness value of 0. A distribution with a significant positive skewness has a long right tail. A distribution
with a significant negative skewness has a long left tail. As a guideline, a skewness value more than twice
its standard error is taken to indicate a departure from symmetry.
Standard Deviation. A measure of dispersion around the mean. In a normal distribution, 68% of cases fall
within one standard deviation of the mean and 95% of cases fall within two standard deviations. For
example, if the mean age is 45, with a standard deviation of 10, 95% of the cases would be between 25
and 65 in a normal distribution.
Standard Error of Kurtosis. The ratio of kurtosis to its standard error can be used as a test of normality
(that is, you can reject normality if the ratio is less than -2 or greater than +2). A large positive value for
kurtosis indicates that the tails of the distribution are longer than those of a normal distribution; a
negative value for kurtosis indicates shorter tails (becoming like those of a box-shaped uniform
distribution).
Standard Error of Mean. A measure of how much the value of the mean may vary from sample to sample
taken from the same distribution. It can be used to roughly compare the observed mean to a
hypothesized value (that is, you can conclude the two values are different if the ratio of the difference to
the standard error is less than -2 or greater than +2).
Standard Error of Skewness. The ratio of skewness to its standard error can be used as a test of normality
(that is, you can reject normality if the ratio is less than -2 or greater than +2). A large positive value for
skewness indicates a long right tail; an extreme negative value indicates a long left tail.
Sum. The sum or total of the values, across all cases with nonmissing values.
Variance. A measure of dispersion around the mean, equal to the sum of squared deviations from the
mean divided by one less than the number of cases. The variance is measured in units that are the square
of those of the variable itself.
Statistics for First Layer
Anova table and eta. Displays a one-way analysis-of-variance table and calculates eta and eta-squared
(measures of association) for each independent variable in the first layer.
Test for linearity. Calculates the sum of squares, degrees of freedom, and mean square associated with
linear and nonlinear components, as well as the F ratio, R and R-squared. Linearity is not calculated if the
independent variable is a short string.
OLAP Cubes
The OLAP (Online Analytical Processing) Cubes procedure calculates totals, means, and other univariate
statistics for continuous summary variables within categories of one or more categorical grouping
variables. A separate layer in the table is created for each category of each grouping variable.
Example. Total and average sales for different regions and product lines within regions.
Statistics. Sum, number of cases, mean, median, grouped median, standard error of the mean, minimum,
maximum, range, variable value of the first category of the grouping variable, variable value of the last
category of the grouping variable, standard deviation, variance, kurtosis, standard error of kurtosis,
skewness, standard error of skewness, percentage of total cases, percentage of total sum, percentage of
total cases within grouping variables, percentage of total sum within grouping variables, geometric mean,
and harmonic mean.
OLAP Cubes Data Considerations
Data. The summary variables are quantitative (continuous variables measured on an interval or ratio
scale), and the grouping variables are categorical. The values of categorical variables can be numeric or
string.
Assumptions. Some of the optional subgroup statistics, such as the mean and standard deviation, are
based on normal theory and are appropriate for quantitative variables with symmetric distributions.
Robust statistics, such as the median and range, are appropriate for quantitative variables that may or
may not meet the assumption of normality.
To Obtain OLAP Cubes
1. From the menus choose:
Analyze > Reports > OLAP Cubes…
2. Select one or more continuous summary variables.
3. Select one or more categorical grouping variables.
Optionally:
• Select different summary statistics (click Statistics). You must select one or more grouping variables
before you can select summary statistics.
• Calculate differences between pairs of variables and pairs of groups that are defined by a grouping
variable (click Differences).
• Create custom table titles (click Title).
• Hide counts that are less than a specified integer. Hidden values will be displayed as <N.
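A sketch of the equivalent OLAP CUBES syntax for the sales example above (sales, region, and prodline
are hypothetical variable names):

OLAP CUBES sales BY region BY prodline
  /CELLS=SUM COUNT MEAN STDDEV
  /TITLE='Sales by Region and Product Line'.

One-Sample Proportions
The One-Sample Proportions procedure provides tests and confidence intervals for single binomial
proportions. The data are assumed to be from a simple random sample, and each hypothesis test or
confidence interval is a separate test or individual interval.
Obtaining One-Sample Proportions tests
1. From the menus choose:
Analyze > Compare Means > One-Sample Proportions…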
2. Select one or more quantitative test variables.
3. Optionally, you can:
• Select success criteria settings under the Define Success section:
Last Value
The last or highest value among the sorted distinct values in the data is used. This applies to
numeric or string variables. This is the default setting.
First Value
The first or lowest value among the sorted distinct values in the data is used. This applies to
numeric or string variables.
Value(s)
One or more parenthesized specific values. Multiple values must be separated by spaces. This
applies to numeric or string variables. String variable values should be enclosed in single quotes.
Midpoint
Values at or above the middle of the range of observed values in the data. This applies only to
numeric data.
Cut Point
Values at or above a specified value. This applies only to numeric data.
• Click Confidence Intervals… to specify which types of confidence intervals are displayed, or to
suppress all confidence intervals.
• Click Tests… to specify which types of test statistics are displayed, or to suppress all tests.
• Click Missing Values… to control the treatment of missing data.
• Click Bootstrap… for deriving robust estimates of standard errors and confidence intervals for
estimates such as the mean, median, proportion, odds ratio, correlation coefficient or regression
coefficient.
4. Click OK.
One-Sample Proportions: Confidence Intervals
The Confidence Intervals dialog provides options for specifying the coverage level and for selecting which
types of confidence intervals are displayed.
Coverage Level
Specifies the confidence interval percentage. A numeric value in the range (0,100) must be specified.
95 is the default setting.
Interval Type(s)
Provides options for specifying which types of confidence intervals are displayed. Available options
include:
• Agresti-Coull
• Anscombe
• Clopper-Pearson (Exact)
• Jeffreys
• Logit
• Wald
• Wald (Continuity Corrected)
• Wilson Score
• Wilson Score (Continuity Corrected)
Specifying Confidence Intervals for One-Sample Proportions
1. From the menus choose:
Analyze > Compare Means > One-Sample Proportions…
2. Click Confidence Intervals to specify which types of confidence intervals are displayed, or to suppress
all confidence intervals.
One-Sample Proportions: Tests
The Tests dialog provides options for specifying which types of test statistics are displayed.
All
All test statistics display in the output.
None
No test statistics display in the output.
Exact Binomial
Displays exact binomial probabilities.
Mid-p Adjusted Binomial
Displays mid-p adjusted binomial probabilities. This is a default setting.
Score
Displays the Score Z test statistic. This is a default setting.
Score (Continuity Corrected)
Displays the continuity-corrected score Z test statistic.
Wald
Displays the Wald Z test statistic.
Wald (Continuity Corrected)
Displays the continuity-corrected Wald Z test statistic.
Test Value
Specifies a test value between 0 and 1. The default value is 0.5.
Obtaining One-Sample Proportions tests
1. From the menus choose:
Analyze > Compare Means > One-Sample Proportions…
2. In the One-Sample Proportions dialog, click Tests.
3. Select one or more of the available tests.
One-Sample Proportions: Missing Values
The Missing Values dialog provides options for dealing with missing values.
Missing Data Scope
Exclude cases analysis by analysis
Indicates inclusion of all cases with sufficient data on the variables used in each particular
analysis. This is the default setting.
Exclude cases listwise
Indicates inclusion of all cases with sufficient data on all variables used across all analyses.
User Missing Values
Exclude treats user missing values as missing. Include ignores user missing value designations and
treats user missing values as valid.
Defining missing value settings for One-Sample Proportions
1. From the menus choose:
Analyze > Compare Means > One-Sample Proportions…
2. In the One-Sample Proportions dialog, click Missing Values.
3. Select the desired missing values settings.
Paired-Samples Proportions
The Paired-Samples Proportions procedure provides tests and confidence intervals for the difference in
two related or paired binomial proportions. The data are assumed to be from a simple random sample,
and each hypothesis test or confidence interval is a separate test or individual interval. Output includes
observed proportions, estimates of differences in population proportions, asymptotic standard errors of
population differences under null and alternative hypotheses, specified test statistics with two-sided
probabilities, and specified confidence intervals for differences in proportions.
Statistics
Agresti-Min, Bonett-Price, Newcombe, Wald, Wald (continuity corrected), Exact Binomial, Mid-p
Adjusted Binomial, McNemar, McNemar (continuity corrected).
Data Considerations
Data
• A variable list containing at least two variables is required.
• If a single list of variables is specified, each member of the list is paired with every other member of
the list.
• If two lists of variables are separated by WITH without the (PAIRED) keyword, each member of the
first list is paired with each member of the second list.
• If two lists of variables are separated by WITH and the second list is followed by (PAIRED),
members of the two lists in order are paired: the first member of the first list is paired with the first
member of the second list, the second members of each list are paired, etc. Unmatched variables
are ignored and a warning message is issued.
Obtaining Paired-Samples Proportions tests
1. From the menus choose:
Analyze > Compare Means > Paired-Samples Proportions…
2. Select one or more quantitative test variables.
3. Optionally, you can:
• Select success criteria settings under the Define Success section:
Last Value
The last or highest value among the sorted distinct values in the data is used. This applies to
numeric or string variables. This is the default setting.
First Value
The first or lowest value among the sorted distinct values in the data is used. This applies to
numeric or string variables.
Value(s)
One or more parenthesized specific values. Multiple values must be separated by spaces. This
applies to numeric or string variables. String variable values should be enclosed in single quotes.
Midpoint
Values at or above the middle of the range of observed values in the data. This applies only to
numeric data.
Cut Point
Values at or above a specified value. This applies only to numeric data.
• Click Confidence Intervals… to specify which types of confidence intervals are displayed, or to
suppress all confidence intervals.
• Click Tests… to specify which types of test statistics are displayed, or to suppress all tests.
• Click Missing Values… to control the treatment of missing data.
• Click Bootstrap… for deriving robust estimates of standard errors and confidence intervals for
estimates such as the mean, median, proportion, odds ratio, correlation coefficient or regression
coefficient.
4. Click OK.
Paired-Samples Proportions: Confidence Intervals
The Confidence Intervals dialog provides options for specifying the coverage level and for selecting which
types of confidence intervals are displayed.
Coverage Level
Specifies the confidence interval percentage. A numeric value in the range (0,100) must be specified.
95 is the default setting.
Interval Type(s)
Provides options for specifying which types of confidence intervals are displayed. Available options
include:
• Agresti-Min
• Bonett-Price
• Newcombe
• Wald
• Wald (Continuity Corrected)
Specifying Confidence Intervals for Paired-Samples Proportions
1. From the menus choose:
Analyze > Compare Means > Paired-Samples Proportions…
2. Click Confidence Intervals to specify which types of confidence intervals are displayed, or to suppress
all confidence intervals.
Paired-Samples Proportions: Tests
The Tests dialog provides options for specifying which types of test statistics are displayed.
All
All test statistics display in the output.
None
No test statistics display in the output.
Exact Binomial
Displays exact binomial probabilities.
Mid-p Adjusted Binomial
Displays mid-p adjusted binomial probabilities. This is a default setting.
McNemar
Displays the McNemar Z test statistic. This is a default setting.
McNemar (Continuity Corrected)
Displays the continuity-corrected McNemar Z test statistic.
Wald
Displays the Wald Z test statistic.
Wald (Continuity Corrected)
Displays the continuity-corrected Wald Z test statistic.
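For orientation, the McNemar statistic depends only on the discordant pairs. Writing n_{12} and n_{21} for the counts of pairs that are a success on the first variable only and on the second variable only, the standard uncorrected form is

z = (n_{12} - n_{21}) \big/ \sqrt{ n_{12} + n_{21} }

and the continuity-corrected version reduces |n_{12} - n_{21}| by 1. This is the textbook formulation; see the algorithms documentation for the exact form that is implemented.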
Obtaining Paired-Samples Proportions tests
1. From the menus choose:
Analyze > Compare Means > Paired-Samples Proportions…
2. In the Paired-Samples Proportions dialog, click Tests.
3. Select one or more of the available tests.
Paired-Samples Proportions: Missing Values
The Missing Values dialog provides options for dealing with missing values.
Missing Data Scope
Exclude cases analysis by analysis
Indicates inclusion of all cases with sufficient data on the variables used in each particular
analysis. This is the default setting.
Exclude cases listwise
Indicates inclusion of all cases with sufficient data on all variables used across all analyses.
User Missing Values
Exclude treats user missing values as missing. Include ignores user missing value designations and
treats user missing values as valid.
Defining missing value settings for Paired-Samples Proportions
1. From the menus choose:
Analyze > Compare Means > Paired-Samples Proportions…
2. In the Paired-Samples Proportions dialog, click Missing Values.
3. Select the desired missing values settings.
Independent-Samples Proportions
The Independent-Samples Proportions procedure provides tests and confidence intervals for the
difference in two independent binomial proportions. The data are assumed to be from a simple random
sample, and each hypothesis test or confidence interval is a separate test or individual interval. Output
includes observed proportions, estimates of differences in population proportions, asymptotic standard
errors of population differences under null and alternative hypotheses, specified test statistics with two-
sided probabilities, and specified confidence intervals for differences in proportions.
Statistics
Agresti-Caffo, Brown-Li-Jeffreys, Hauck-Anderson, Newcombe, Newcombe (continuity corrected),
Wald, Wald (continuity corrected), Wald H0, Wald H0 (continuity corrected).
Data Considerations
Data
• At least one dependent variable and a single variable to identify the two groups to be compared are
required.
• The grouping variable can be either numeric or string.
Obtaining Independent-Samples Proportions tests
1. From the menus choose:
Analyze > Compare Means > Independent-Samples Proportions…
2. Select one or more quantitative test variables.
3. Select a single Grouping Variable that identifies the two groups to be compared.
4. Optionally, specify settings for the selected Grouping Variable.
• When Value(s) is selected, you can specify two numeric or string values within parentheses for the
values to be compared. String values should be enclosed in single quotes. Cases with other values
are ignored.
• Midpoint applies only to numeric variables. Cases at or above the midpoint of the distribution of the
grouping variable are assigned to the second group; cases below the midpoint are assigned to the
first group.
• Cut Point applies only to numeric variables and allows specification, within parentheses, of a single
numeric value. Cases at or above the cut point on the grouping variable are assigned to the second
group; cases below the cut point are assigned to the first group.
5. Optionally, you can:
• Select success criteria settings under the Define Success section:
Last Value
The last or highest value among the sorted distinct values in the data is used. This applies to
numeric or string variables. This is the default setting.
First Value
The first or lowest value among the sorted distinct values in the data is used. This applies to
numeric or string variables.
Value(s)
One or more parenthesized specific values. Multiple values must be separated by spaces. This
applies to numeric or string variables. String variable values should be enclosed in single quotes.
Midpoint
Values at or above the middle of the range of observed values in the data. This applies only to
numeric data.
Cut Point
Values at or above a specified value. This applies only to numeric data.
• Click Confidence Intervals… to specify which types of confidence intervals are displayed, or to
suppress all confidence intervals.
• Click Tests… to specify which types of test statistics are displayed, or to suppress all tests.
• Click Missing Values… to control the treatment of missing data.
• Click Bootstrap… for deriving robust estimates of standard errors and confidence intervals for
estimates such as the mean, median, proportion, odds ratio, correlation coefficient or regression
coefficient.
6. Click OK.
Independent-Samples Proportions: Confidence Intervals
The Confidence Intervals dialog provides options for specifying the coverage level and for selecting which
types of confidence intervals are displayed.
Coverage Level
Specifies the confidence interval percentage. A numeric value in the range (0,100) must be specified.
95 is the default setting.
Interval Type(s)
Provides options for specifying which types of confidence intervals are displayed. Available options
include:
• Agresti-Caffo
• Brown-Li-Jeffreys
• Hauck-Anderson
• Newcombe
• Newcombe (Continuity Corrected)
• Wald
• Wald (Continuity Corrected)
Specifying Confidence Intervals for Independent-Samples Proportions
1. From the menus choose:
Analyze > Compare Means > Independent-Samples Proportions…
2. Click Confidence Intervals to specify which types of confidence intervals are displayed, or to suppress
all confidence intervals.
Independent-Samples Proportions: Tests
The Tests dialog provides options for specifying which types of test statistics are displayed.
All
All test statistics display in the output.
None
No test statistics display in the output.
Hauck-Anderson
Displays the Hauck-Anderson Z test statistic.
Wald
Displays the Wald Z test statistic.
Wald (Continuity Corrected)
Displays the continuity-corrected Wald Z test statistic.
Wald H0
Displays the Wald Z test statistic using variance estimates under H0.
Wald H0 (Continuity Corrected)
Displays the continuity-corrected Wald Z test statistic using variance estimates under H0.
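For orientation, the uncorrected Wald statistic for the difference in two independent proportions has the standard textbook form

z = (\hat{p}_1 - \hat{p}_2) \big/ \sqrt{ \hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2 }

and the Wald H0 variants instead estimate the variance under the null hypothesis, replacing \hat{p}_1 and \hat{p}_2 in the denominator with the pooled estimate \hat{p} = (x_1 + x_2)/(n_1 + n_2). This is generic notation, not a transcription of the SPSS algorithms.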
Obtaining Independent-Samples Proportions tests
1. From the menus choose:
Analyze > Compare Means > Independent-Samples Proportions…
2. In the Independent-Samples Proportions dialog, click Tests.
3. Select one or more of the available tests.
Independent-Samples Proportions: Missing Values
The Missing Values dialog provides options for dealing with missing values.
Missing Data Scope
Exclude cases analysis by analysis
Indicates inclusion of all cases with sufficient data on the variables used in each particular
analysis. This is the default setting.
Exclude cases listwise
Indicates inclusion of all cases with sufficient data on all variables used across all analyses.
User Missing Values
Exclude treats user missing values as missing. Include ignores user missing value designations and
treats user missing values as valid.
Defining missing value settings for Independent-Samples Proportions
1. From the menus choose:
Analyze > Compare Means > Independent-Samples Proportions…
2. In the Independent-Samples Proportions dialog, click Missing Values.
3. Select the desired missing values settings.
T Tests
Three types of t tests are available:
Independent-samples t test (two-sample t test). Compares the means of one variable for two groups of
cases. Descriptive statistics for each group and Levene’s test for equality of variances are provided, as
well as both equal- and unequal-variance t values and a 95% confidence interval for the difference in
means.
Paired-samples t test (dependent t test). Compares the means of two variables for a single group. This
test is also for matched pairs or case-control study designs. The output includes descriptive statistics for
the test variables, the correlation between the variables, descriptive statistics for the paired differences,
the t test, and a 95% confidence interval.
One-sample t test. Compares the mean of one variable with a known or hypothesized value. Descriptive
statistics for the test variables are displayed along with the t test. A 95% confidence interval for the
difference between the mean of the test variable and the hypothesized test value is part of the default
output.
Independent-Samples T Test
The Independent-Samples T Test procedure compares means for two groups of cases and automates the
t-test effect size computation. Ideally, for this test, the subjects should be randomly assigned to two
groups, so that any difference in response is due to the treatment (or lack of treatment) and not to other
factors. This is not the case if you compare average income for males and females. A person is not
randomly assigned to be a male or female. In such situations, you should ensure that differences in other
factors are not masking or enhancing a significant difference in means. Differences in average income
may be influenced by factors such as education (and not by sex alone).
Example
Patients with high blood pressure are randomly assigned to a placebo group and a treatment group.
The placebo subjects receive an inactive pill, and the treatment subjects receive a new drug that is
expected to lower blood pressure. After the subjects are treated for two months, the two-sample t
test is used to compare the average blood pressures for the placebo group and the treatment group.
Each patient is measured once and belongs to one group.
Statistics
For each variable: sample size, mean, standard deviation, standard error of the mean, and the
estimation of the effect size for the t-test. For the difference in means: mean, standard error, and
confidence interval (you can specify the confidence level). Tests: Levene’s test for equality of
variances and both pooled-variances and separate-variances t tests for equality of means.
Data Considerations
Data
The values of the quantitative variable of interest are in a single column in the data file. The procedure
uses a grouping variable with two values to separate the cases into two groups. The grouping variable
can be numeric (values such as 1 and 2 or 6.25 and 12.5) or short string (such as yes and no). As an
alternative, you can use a quantitative variable, such as age, to split the cases into two groups by
specifying a cutpoint (cutpoint 21 splits age into an under-21 group and a 21-and-over group).
Assumptions
For the equal-variance t test, the observations should be independent, random samples from normal
distributions with the same population variance. For the unequal-variance t test, the observations
should be independent, random samples from normal distributions. The two-sample t test is fairly
robust to departures from normality. When checking distributions graphically, look to see that they
are symmetric and have no outliers.
Obtaining an Independent-Samples T Test
1. From the menus choose:
Analyze > Compare Means > Independent-Samples T Test…
2. Select one or more quantitative test variables. A separate t test is computed for each variable.
3. Select a single grouping variable, and then click Define Groups to specify two codes for the groups
that you want to compare.
4. Optionally, you can:
• Select Estimate effect sizes to control the estimation of the t-test effect size.
• Click Options to control the treatment of missing data and the level of the confidence interval.
• Click Bootstrap for deriving robust estimates of standard errors and confidence intervals for
estimates such as the mean, median, proportion, odds ratio, correlation coefficient or regression
coefficient.
Independent-Samples T Test Define Groups
For numeric grouping variables, define the two groups for the t-test by specifying two values or a cutpoint:
• Use specified values. Enter a value for Group 1 and another value for Group 2. Cases with any other
values are excluded from the analysis. Numbers need not be integers (for example, 6.25 and 12.5 are
valid).
• Cutpoint. Enter a number that splits the values of the grouping variable into two sets. All cases with
values that are less than the cutpoint form one group, and cases with values that are greater than or
equal to the cutpoint form the other group.
For string grouping variables, enter a string for Group 1 and another string for Group 2, such as yes and no.
Cases with other strings are excluded from the analysis.
Independent-Samples T Test Options
Confidence Interval. By default, a 95% confidence interval for the difference in means is displayed. Enter
a value between 1 and 99 to request a different confidence level.
Missing Values. When you test several variables, and data are missing for one or more variables, you can
tell the procedure which cases to include (or exclude).
• Exclude cases analysis by analysis. Each t test uses all cases that have valid data for the tested
variables. Sample sizes may vary from test to test.
• Exclude cases listwise. Each t test uses only cases that have valid data for all variables that are used in
the requested t tests. The sample size is constant across tests.
Paired-Samples T Test
The Paired-Samples T Test procedure compares the means of two variables for a single group. The
procedure computes the differences between values of the two variables for each case and tests whether
the average differs from 0. The procedure also automates the t-test effect size computation.
Example
In a study on high blood pressure, all patients are measured at the beginning of the study, given a
treatment, and measured again. Thus, each subject has two measures, often called before and after
measures. An alternative design for which this test is used is a matched-pairs or case-control study, in
which each record in the data file contains the response for the patient and also for his or her matched
control subject. In a blood pressure study, patients and controls might be matched by age (a 75-year-
old patient with a 75-year-old control group member).
Statistics
For each variable: mean, sample size, standard deviation, and standard error of the mean. For each
pair of variables: correlation, average difference in means, t test, confidence interval for mean
difference (you can specify the confidence level), and the estimation of the effect size for the t-test.
Standard deviation and standard error of the mean difference.
Data considerations
Data
For each paired test, specify two quantitative variables (interval level of measurement or ratio level of
measurement). For a matched-pairs or case-control study, the response for each test subject and its
matched control subject must be in the same case in the data file.
Assumptions
Observations for each pair should be made under the same conditions. The mean differences should
be normally distributed. Variances of each variable can be equal or unequal.
Obtaining a Paired-Samples T Test
1. From the menus choose:
Analyze > Compare Means > Paired-Samples T Test…
2. Select one or more pairs of variables.
3. Optionally, select an Estimate effect sizes option. The setting controls how the standardizer is
computed in estimating Cohen’s d and Hedges’ correction for each variable pair (see the sketch after
this procedure).
Standard deviation of the difference
The denominator used in estimating the effect size. Cohen’s d uses the sample standard deviation
of the differences. Hedges’ correction uses the sample standard deviation of the differences
adjusted by a correction factor.
Corrected standard deviation of the difference
The denominator used in estimating the effect size. Cohen’s d uses the sample standard deviation
of the differences adjusted by the correlation between measures. Hedges’ correction uses the
sample standard deviation of the differences adjusted by the correlation between measures, plus
a correction factor.
Average of variances
The denominator used in estimating the effect size. Cohen’s d uses the square root of the average
variance of the measures. Hedges’ correction uses the square root of the average variance of the
measures, plus a correction factor.
4. Optionally, you can:
• Select Estimate effect sizes to control the estimation of the t-test effect size. When the setting is
selected, you can further control how the standardizer is computed in estimating Cohen’s d and
Hedges’ correction for each variable pair.
• Click Options to control the treatment of missing data and the level of the confidence interval.
• Click Bootstrap for deriving robust estimates of standard errors and confidence intervals for
estimates such as the mean, median, proportion, odds ratio, correlation coefficient or regression
coefficient.
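The three standardizer choices in step 3 all feed the same basic computation. In generic notation (a sketch of the standard definitions, not a transcription of the SPSS algorithms), Cohen’s d for a paired design is

d = \bar{D} / s

where \bar{D} is the mean of the paired differences and s is the selected standardizer. Hedges’ correction multiplies d by the small-sample factor J \approx 1 - 3/(4\,df - 1), with df = n - 1 for n pairs.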
Paired-Samples T Test Options
Confidence Interval. By default, a 95% confidence interval for the difference in means is displayed. Enter
a value between 1 and 99 to request a different confidence level.
Missing Values. When you test several variables, and data are missing for one or more variables, you can
tell the procedure which cases to include (or exclude):
• Exclude cases analysis by analysis. Each t test uses all cases that have valid data for the tested pair of
variables. Sample sizes may vary from test to test.
• Exclude cases listwise. Each t test uses only cases that have valid data for all pairs of tested variables.
The sample size is constant across tests.
T TEST Command Additional Features
The command syntax language also allows you to:
• Produce both one-sample and independent-samples t tests by running a single command.
• Test a variable against each variable on a list in a paired t test (with the PAIRS subcommand).
• Control the estimation of the t-test effect size (with the ES subcommand).
See the Command Syntax Reference for complete syntax information.
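As an illustration, the following commands sketch each of the three t tests in command syntax. The variable names (iq, score, group, age, before, after) are placeholders, and the subcommands shown are a minimal subset rather than the full syntax; see the Command Syntax Reference for the authoritative forms.

* One-sample t test of iq against the hypothesized value 100.
T-TEST TESTVAL=100
  /VARIABLES=iq.

* Independent-samples t test; groups defined by the values 1 and 2 of group.
* A cutpoint form such as GROUPS=age(21) is also accepted.
T-TEST GROUPS=group(1 2)
  /VARIABLES=score
  /CRITERIA=CI(.95).

* Paired-samples t test pairing before with after.
T-TEST PAIRS=before WITH after (PAIRED).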
One-Sample T Test
The One-Sample T Test procedure tests whether the mean of a single variable differs from a specified
constant and automates the t-test effect size computation.
Examples
A researcher might want to test whether the average IQ score for a group of students differs from
100. Or a cereal manufacturer can take a sample of boxes from the production line and check whether
the mean weight of the samples differs from 1.3 pounds at the 95% confidence level.
Statistics
For each test variable: mean, standard deviation, standard error of the mean, and the estimation of
the effect size for the t-test. The average difference between each data value and the hypothesized
test value, a t test that tests that this difference is 0, and a confidence interval for this difference (you
can specify the confidence level).
Data Considerations
Data
To test the values of a quantitative variable against a hypothesized test value, choose a quantitative
variable and enter a hypothesized test value.
Assumptions
This test assumes that the data are normally distributed; however, this test is fairly robust to
departures from normality.
Obtaining a One-Sample T Test
1. From the menus choose:
Analyze > Compare Means > One-Sample T Test…
2. Select one or more variables to be tested against the same hypothesized value.
3. Enter a numeric test value against which each sample mean is compared.
4. Optionally, you can:
• Select Estimate effect sizes to control the estimation of the t-test effect size.
• Click Options to control the treatment of missing data and the level of the confidence interval.
One-Sample T Test Options
Confidence Interval. By default, a 95% confidence interval for the difference between the mean and the
hypothesized test value is displayed. Enter a value between 1 and 99 to request a different confidence
level.
Missing Values. When you test several variables, and data are missing for one or more variables, you can
tell the procedure which cases to include (or exclude).
• Exclude cases analysis by analysis. Each t test uses all cases that have valid data for the tested
variable. Sample sizes may vary from test to test.
• Exclude cases listwise. Each t test uses only cases that have valid data for all variables that are used in
any of the requested t tests. The sample size is constant across tests.
T TEST Command Additional Features
The command syntax language also allows you to:
• Produce both one-sample and independent-samples t tests by running a single command.
• Test a variable against each variable on a list in a paired t test (with the PAIRS subcommand).
• Control the estimation of the t-test effect size (with the ES subcommand).
See the Command Syntax Reference for complete syntax information.
One-Way ANOVA
The One-Way ANOVA procedure produces a one-way analysis of variance for a quantitative dependent
variable by a single factor (independent) variable and estimates the effect size in one-way ANOVA.
Analysis of variance is used to test the hypothesis that several means are equal. This technique is an
extension of the two-sample t test.
In addition to determining that differences exist among the means, you may want to know which means
differ. There are two types of tests for comparing means: a priori contrasts and post hoc tests. Contrasts
are tests set up before running the experiment, and post hoc tests are run after the experiment has been
conducted. You can also test for trends across categories.
Example
Doughnuts absorb fat in various amounts when they are cooked. An experiment is set up involving
three types of fat: peanut oil, corn oil, and lard. Peanut oil and corn oil are unsaturated fats, and lard is
a saturated fat. Along with determining whether the amount of fat absorbed depends on the type of
fat used, you could set up an a priori contrast to determine whether the amount of fat absorption
differs for saturated and unsaturated fats.
Statistics
For each group: number of cases, mean, standard deviation, standard error of the mean, minimum,
maximum, 95% confidence interval for the mean, and the estimation of the effect size in a one-way
ANOVA. Levene tests for homogeneity of variance, analysis-of-variance table and robust tests of
the equality of means for each dependent variable, user-specified a priori contrasts, and post hoc
range tests and multiple comparisons: Bonferroni, Sidak, Tukey’s honestly significant difference,
Hochberg’s GT2, Gabriel, Dunnett, Ryan-Einot-Gabriel-Welsch F test (R-E-G-W F), Ryan-Einot-
Gabriel-Welsch range test (R-E-G-W Q), Tamhane’s T2, Dunnett’s T3, Games-Howell, Dunnett’s C,
Duncan’s multiple range test, Student-Newman-Keuls (S-N-K), Tukey’s b, Waller-Duncan, Scheffé,
and least-significant difference.
Data considerations
Data
Factor variable values should be integers, and the dependent variable should be quantitative (interval
level of measurement).
Assumptions
Each group is an independent random sample from a normal population. Analysis of variance is robust
to departures from normality, although the data should be symmetric. The groups should come from
populations with equal variances. To test this assumption, use Levene’s homogeneity-of-variance
test.
Obtaining a One-Way analysis of variance
1. From the menus choose:
Analyze > Compare Means > One-Way ANOVA…
2. Select one or more dependent variables.
3. Select a single independent factor variable.
Optionally, you can:
• Select Estimate effect size for overall tests to control the calculation of the effect size for the overall
test. When selected, the “ANOVA Effect Sizes” table displays in the output.
• Click Contrasts to partition the between-groups sums of squares into trend components or specify a
priori contrasts.
• Click Post Hoc to use post hoc range tests and pairwise multiple comparisons to determine which
means differ.
• Click Options to control the treatment of missing data and the level of the confidence interval.
• Click Bootstrap for deriving robust estimates of standard errors and confidence intervals for estimates
such as the mean, median, proportion, odds ratio, correlation coefficient or regression coefficient.
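The following ONEWAY command sketches the doughnut example above, assuming the illustrative variable names absorbed (amount of fat absorbed) and fat (coded 1 = peanut oil, 2 = corn oil, 3 = lard); the coding and the subcommand set are placeholders, not pasted output:

ONEWAY absorbed BY fat
  /CONTRAST=-1 -1 2
  /STATISTICS=DESCRIPTIVES HOMOGENEITY BROWNFORSYTHE WELCH
  /POSTHOC=TUKEY ALPHA(0.05)
  /MISSING=ANALYSIS.

The contrast coefficients follow the ascending order of the factor codes, so -1 -1 2 contrasts the saturated fat (lard) with the two unsaturated fats, and the set sums to 0 as the Contrasts section below recommends.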
One-Way ANOVA Contrasts
You can partition the between-groups sums of squares into trend components or specify a priori
contrasts.
Polynomial
Partitions the between-groups sums of squares into trend components. You can test for a trend of the
dependent variable across the ordered levels of the factor variable. For example, you could test for a
linear trend (increasing or decreasing) in salary across the ordered levels of highest degree earned.
• Degree. You can choose a 1st, 2nd, 3rd, 4th, or 5th degree polynomial.
Coefficients
User-specified a priori contrasts to be tested by the t statistic. Enter a coefficient for each group
(category) of the factor variable and click Add after each entry. Each new value is added to the bottom
of the coefficient list. To specify additional sets of contrasts, click Next. Use Next and Previous to
move between sets of contrasts.
Estimate effect size for contrasts
Controls the calculation of the effect size for the specified contrasts. When this setting is enabled, at
least one of the following options must be selected to calculate the effect sizes. The setting is
available when at least one contrast is specified and results in an ANOVA Effect Sizes table in the
output.
Use pooled standard deviation for all the groups as the standardizer
Uses the pooled standard deviation for all the groups as the standardizer in estimating the effect
size. This is the default setting and is available when Estimate effect size for contrasts is
selected.
Use pooled standard deviation for those groups involved in the contrast as the standardizer
Uses the pooled standard deviation for the groups involved in the contrast as the standardizer.
The setting is available when Estimate effect size for contrasts is selected.
The order of the coefficients is important because it corresponds to the ascending order of the category
values of the factor variable. The first coefficient on the list corresponds to the lowest group value of the
factor variable, and the last coefficient corresponds to the highest value. For example, if there are six
categories of the factor variable, the coefficients –1, 0, 0, 0, 0.5, and 0.5 contrast the first group with the
fifth and sixth groups. For most applications, the coefficients should sum to 0. Sets that do not sum to 0
can also be used, but a warning message is displayed.
One-Way ANOVA Post Hoc Tests
Once you have determined that differences exist among the means, post hoc range tests and pairwise
multiple comparisons can determine which means differ. Range tests identify homogeneous subsets of
means that are not different from each other. Pairwise multiple comparisons test the difference between
each pair of means and yield a matrix where asterisks indicate significantly different group means at an
alpha level of 0.05.
Equal Variances Assumed
Tukey’s honestly significant difference test, Hochberg’s GT2, Gabriel, and Scheffé are multiple
comparison tests and range tests. Other available range tests are Tukey’s b, S-N-K (Student-Newman-
Keuls), Duncan, R-E-G-W F (Ryan-Einot-Gabriel-Welsch F test), R-E-G-W Q (Ryan-Einot-Gabriel-Welsch
range test), and Waller-Duncan. Available multiple comparison tests are Bonferroni, Tukey’s honestly
significant difference test, Sidak, Gabriel, Hochberg, Dunnett, Scheffé, and LSD (least significant
difference).
• LSD. Uses t tests to perform all pairwise comparisons between group means. No adjustment is made to
the error rate for multiple comparisons.
• Bonferroni. Uses t tests to perform pairwise comparisons between group means, but controls overall
error rate by setting the error rate for each test to the experimentwise error rate divided by the total
number of tests. Hence, the observed significance level is adjusted for the fact that multiple
comparisons are being made.
• Sidak. Pairwise multiple comparison test based on a t statistic. Sidak adjusts the significance level for
multiple comparisons and provides tighter bounds than Bonferroni.
• Scheffe. Performs simultaneous joint pairwise comparisons for all possible pairwise combinations of
means. Uses the F sampling distribution. Can be used to examine all possible linear combinations of
group means, not just pairwise comparisons.
• R-E-G-W F. Ryan-Einot-Gabriel-Welsch multiple stepdown procedure based on an F test.
• R-E-G-W Q. Ryan-Einot-Gabriel-Welsch multiple stepdown procedure based on the Studentized range.
• S-N-K. Makes all pairwise comparisons between means using the Studentized range distribution. With
equal sample sizes, it also compares pairs of means within homogeneous subsets, using a stepwise
procedure. Means are ordered from highest to lowest, and extreme differences are tested first.
• Tukey. Uses the Studentized range statistic to make all of the pairwise comparisons between groups.
Sets the experimentwise error rate at the error rate for the collection of all pairwise comparisons.
• Tukey’s b. Uses the Studentized range distribution to make pairwise comparisons between groups. The
critical value is the average of the corresponding value for the Tukey’s honestly significant difference
test and the Student-Newman-Keuls.
• Duncan. Makes pairwise comparisons using a stepwise order of comparisons identical to the order used
by the Student-Newman-Keuls test, but sets a protection level for the error rate for the collection of
tests, rather than an error rate for individual tests. Uses the Studentized range statistic.
• Hochberg’s GT2. Multiple comparison and range test that uses the Studentized maximum modulus.
Similar to Tukey’s honestly significant difference test.
• Gabriel. Pairwise comparison test that uses the Studentized maximum modulus and is generally more
powerful than Hochberg’s GT2 when the cell sizes are unequal. Gabriel’s test may become liberal when
the cell sizes vary greatly.
• Waller-Duncan. Multiple comparison test based on a t statistic; uses a Bayesian approach.
• Dunnett. Pairwise multiple comparison t test that compares a set of treatments against a single control
mean. The last category is the default control category; alternatively, you can choose the first category.
2-sided tests whether the mean at any level of the factor (except the control category) differs from that
of the control category. < Control tests whether the mean at any level of the factor is smaller than that
of the control category. > Control tests whether the mean at any level of the factor is greater than that
of the control category.
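The difference between the Bonferroni and Sidak adjustments is easy to state in standard notation: for k comparisons at familywise level \alpha, Bonferroni tests each comparison at \alpha / k, while Sidak tests each at 1 - (1 - \alpha)^{1/k}. The Sidak level is always slightly larger than \alpha / k, which is why Sidak is described above as providing tighter bounds than Bonferroni.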
Equal Variances Not Assumed
Multiple comparison tests that do not assume equal variances are Tamhane’s T2, Dunnett’s T3, Games-
Howell, and Dunnett’s C.
• Tamhane’s T2. Conservative pairwise comparisons test based on a t test. This test is appropriate when
the variances are unequal.
• Dunnett’s T3. Pairwise comparison test based on the Studentized maximum modulus. This test is
appropriate when the variances are unequal.
• Games-Howell. Pairwise comparison test that is sometimes liberal. This test is appropriate when the
variances are unequal.
• Dunnett’s C. Pairwise comparison test based on the Studentized range. This test is appropriate when the
variances are unequal.
Note: You may find it easier to interpret the output from post hoc tests if you deselect Hide empty rows
and columns in the Table Properties dialog box (in an activated pivot table, choose Table Properties
from the Format menu).
Null Hypothesis test
Specifies how the significance level (alpha) is handled for the post hoc test.
Use the same significance level (alpha) as the settings in Options
When selected, uses the same setting that is specified in the Options dialog.
Specify the significance level (alpha) for the post hoc test
When selected, you can specify the significance level (alpha) in the Level field.
Obtaining Post Hoc Tests for One-Way ANOVA
1. From the menus choose:
Analyze > Compare Means > One-Way ANOVA…
2. In the One-Way ANOVA dialog, click Post Hoc.
3. Select the desired post hoc tests.
One-Way ANOVA Options
Statistics
Choose one or more of the following:
Descriptive
Calculates the number of cases, mean, standard deviation, standard error of the mean, minimum,
maximum, and 95% confidence intervals for each dependent variable for each group.
Fixed and random effects
Displays the standard deviation, standard error, and 95% confidence interval for the fixed-effects
model, and the standard error, 95% confidence interval, and estimate of between-components
variance for the random-effects model.
Homogeneity of variance test
Calculates the Levene statistic to test for the equality of group variances. This test is not
dependent on the assumption of normality.
Brown-Forsythe
Calculates the Brown-Forsythe statistic to test for the equality of group means. This statistic is
preferable to the F statistic when the assumption of equal variances does not hold.
Welch
Calculates the Welch statistic to test for the equality of group means. This statistic is preferable to
the F statistic when the assumption of equal variances does not hold.
Missing Values
Controls the treatment of missing values.
Exclude cases analysis by analysis
A case with a missing value for either the dependent or the factor variable for a given analysis is
not used in that analysis. Also, a case outside the range specified for the factor variable is not
used.
Exclude cases listwise
Cases with missing values for the factor variable or for any dependent variable included on the
dependent list in the main dialog box are excluded from all analyses. If you have not specified
multiple dependent variables, this has no effect.
Confidence Interval
By default, a 95% confidence level is used for the confidence intervals in the output. Enter a value
between 1 and 99 to request a different confidence level.
Means plot
Displays a chart that plots the subgroup means (the means for each group defined by values of the
factor variable).
Specifying Options for One-Way ANOVA
1. From the menus choose:
Analyze > Compare Means > One-Way ANOVA…
2. In the One-Way ANOVA dialog, click Options.
3. Select the desired statistics, missing value, and means plot settings.
ONEWAY Command Additional Features
The command syntax language also allows you to:
• Obtain fixed- and random-effects statistics. Standard deviation, standard error of the mean, and 95%
confidence intervals for the fixed-effects model. Standard error, 95% confidence intervals, and
estimate of between-components variance for random-effects model (using STATISTICS=EFFECTS).
• Specify alpha levels for the least significant difference, Bonferroni, Duncan, and Scheffé multiple
comparison tests (with the RANGES subcommand).
• Write a matrix of means, standard deviations, and frequencies, or read a matrix of means, frequencies,
pooled variances, and degrees of freedom for the pooled variances. These matrices can be used in
place of raw data to obtain a one-way analysis of variance (with the MATRIX subcommand).
See the Command Syntax Reference for complete syntax information.
GLM Univariate Analysis
The GLM Univariate procedure provides regression analysis and analysis of variance for one dependent
variable by one or more factors and/or variables. The factor variables divide the population into groups.
Using this General Linear Model procedure, you can test null hypotheses about the effects of other
variables on the means of various groupings of a single dependent variable. You can investigate
interactions between factors as well as the effects of individual factors, some of which may be random. In
addition, the effects of covariates and covariate interactions with factors can be included. For regression
analysis, the independent (predictor) variables are specified as covariates.
Both balanced and unbalanced models can be tested. A design is balanced if each cell in the model
contains the same number of cases. In addition to testing hypotheses, GLM Univariate produces
estimates of parameters.
Commonly used a priori contrasts are available to perform hypothesis testing. Additionally, after an
overall F test has shown significance, you can use post hoc tests to evaluate differences among specific
means. Estimated marginal means give estimates of predicted mean values for the cells in the model, and
profile plots (interaction plots) of these means allow you to easily visualize some of the relationships.
Residuals, predicted values, Cook’s distance, and leverage values can be saved as new variables in your
data file for checking assumptions.
WLS Weight allows you to specify a variable used to give observations different weights for a weighted
least-squares (WLS) analysis, perhaps to compensate for a different precision of measurement.
Example. Data are gathered for individual runners in the Chicago marathon for several years. The time in
which each runner finishes is the dependent variable. Other factors include weather (cold, pleasant, or
hot), number of months of training, number of previous marathons, and gender. Age is considered a
covariate. You might find that gender is a significant effect and that the interaction of gender with weather
is significant.
Methods. Type I, Type II, Type III, and Type IV sums of squares can be used to evaluate different
hypotheses. Type III is the default.
Statistics. Post hoc range tests and multiple comparisons: least significant difference, Bonferroni, Sidak,
Scheffé, Ryan-Einot-Gabriel-Welsch multiple F, Ryan-Einot-Gabriel-Welsch multiple range, Student-
Newman-Keuls, Tukey’s honestly significant difference, Tukey’s b, Duncan, Hochberg’s GT2, Gabriel,
Waller-Duncan t test, Dunnett (one-sided and two-sided), Tamhane’s T2, Dunnett’s T3, Games-Howell,
and Dunnett’s C. Descriptive statistics: observed means, standard deviations, and counts for all of the
dependent variables in all cells. Levene tests for homogeneity of variance.
Plots. Spread-versus-level, residual, and profile (interaction).
GLM Univariate Data Considerations
Data. The dependent variable is quantitative. Factors are categorical. They can have numeric values or
string values of up to eight characters. Covariates are quantitative variables that are related to the
dependent variable.
Assumptions. The data are a random sample from a normal population; in the population, all cell
variances are the same. Analysis of variance is robust to departures from normality, although the data
should be symmetric. To check assumptions, you can use homogeneity of variances tests and spread-
versus-level plots. You can also examine residuals and residual plots.
To Obtain GLM Univariate Tables
1. From the menus choose:
Analyze > General Linear Model > Univariate…
2. Select a dependent variable.
3. Select variables for Fixed Factor(s), Random Factor(s), and Covariate(s), as appropriate for your data.
4. Optionally, you can use WLS Weight to specify a weight variable for weighted least-squares analysis. If
the value of the weighting variable is zero, negative, or missing, the case is excluded from the analysis.
A variable already used in the model cannot be used as a weighting variable.
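The following UNIANOVA command sketches the marathon example described earlier, assuming the illustrative variable names time, weather, gender, and age; the subcommand set is a minimal sketch, not pasted output:

UNIANOVA time BY weather gender WITH age
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /PRINT=DESCRIPTIVE ETASQ HOMOGENEITY
  /EMMEANS=TABLES(weather*gender)
  /DESIGN=weather gender weather*gender age.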
GLM Model
Figure 1. Univariate Model dialog box
Specify Model. A full factorial model contains all factor main effects, all covariate main effects, and all
factor-by-factor interactions. It does not contain covariate interactions. Select Custom to specify only a
subset of interactions or to specify factor-by-covariate interactions. You must indicate all of the terms to
be included in the model.
Factors and Covariates. The factors and covariates are listed.
Model. The model depends on the nature of your data. After selecting Custom, you can select the main
effects and interactions that are of interest in your analysis.
Sum of squares. The method of calculating the sums of squares. For balanced or unbalanced models
with no missing cells, the Type III sum-of-squares method is most commonly used.
Include intercept in model. The intercept is usually included in the model. If you can assume that the
data pass through the origin, you can exclude the intercept.
Build Terms and Custom Terms
Build terms
Use this choice when you want to include non-nested terms of a certain type (such as main effects)
for all combinations of a selected set of factors and covariates.
Build custom terms
Use this choice when you want to include nested terms or when you want to explicitly build any term
variable by variable.
Sum of Squares
For the model, you can choose a type of sums of squares. Type III is the most commonly used and is the
default.
Type I. This method is also known as the hierarchical decomposition of the sum-of-squares method.
Each term is adjusted for only the term that precedes it in the model. Type I sums of squares are
commonly used for:
• A balanced ANOVA model in which any main effects are specified before any first-order interaction
effects, any first-order interaction effects are specified before any second-order interaction effects, and
so on.
• A polynomial regression model in which any lower-order terms are specified before any higher-order
terms.
• A purely nested model in which the first-specified effect is nested within the second-specified effect,
the second-specified effect is nested within the third, and so on. (This form of nesting can be specified
only by using syntax.)
Type II. This method calculates the sums of squares of an effect in the model adjusted for all other
“appropriate” effects. An appropriate effect is one that corresponds to all effects that do not contain the
effect being examined. The Type II sum-of-squares method is commonly used for:
• A balanced ANOVA model.
• Any model that has main factor effects only.
• Any regression model.
• A purely nested design. (This form of nesting can be specified by using syntax.)
Type III. The default. This method calculates the sums of squares of an effect in the design as the sums
of squares, adjusted for any other effects that do not contain the effect, and orthogonal to any effects (if
any) that contain the effect. The Type III sums of squares have one major advantage in that they are
invariant with respect to the cell frequencies as long as the general form of estimability remains constant.
Hence, this type of sums of squares is often considered useful for an unbalanced model with no missing
cells. In a factorial design with no missing cells, this method is equivalent to the Yates’ weighted-squares-
of-means technique. The Type III sum-of-squares method is commonly used for:
• Any models listed in Type I and Type II.
• Any balanced or unbalanced model with no empty cells.
Type IV. This method is designed for a situation in which there are missing cells. For any effect F in the
design, if F is not contained in any other effect, then Type IV = Type III = Type II. When F is contained in
other effects, Type IV distributes the contrasts being made among the parameters in F to all higher-level
effects equitably. The Type IV sum-of-squares method is commonly used for:
• Any models listed in Type I and Type II.
• Any balanced model or unbalanced model with empty cells.
GLM Contrasts
Contrasts are used to test for differences among the levels of a factor. You can specify a contrast for each
factor in the model (in a repeated measures model, for each between-subjects factor). Contrasts
represent linear combinations of the parameters.
GLM Univariate. Hypothesis testing is based on the null hypothesis LB = 0, where L is the contrast
coefficients matrix and B is the parameter vector. When a contrast is specified, an L matrix is created. The
columns of the L matrix corresponding to the factor match the contrast. The remaining columns are
adjusted so that the L matrix is estimable.
The output includes an F statistic for each set of contrasts. Also displayed for the contrast differences are
Bonferroni-type simultaneous confidence intervals based on Student’s t distribution.
Available Contrasts
Available contrasts are deviation, simple, difference, Helmert, repeated, and polynomial. For deviation
contrasts and simple contrasts, you can choose whether the reference category is the last or first
category.
Contrast Types
Deviation. Compares the mean of each level (except a reference category) to the mean of all of the levels
(grand mean). The levels of the factor can be in any order.
Simple. Compares the mean of each level to the mean of a specified level. This type of contrast is useful
when there is a control group. You can choose the first or last category as the reference.
Difference. Compares the mean of each level (except the first) to the mean of previous levels.
(Sometimes called reverse Helmert contrasts.)
Helmert. Compares the mean of each level of the factor (except the last) to the mean of subsequent
levels.
Repeated. Compares the mean of each level (except the last) to the mean of the subsequent level.
Polynomial. Compares the linear effect, quadratic effect, cubic effect, and so on. The first degree of
freedom contains the linear effect across all categories; the second degree of freedom, the quadratic
effect; and so on. These contrasts are often used to estimate polynomial trends.
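To make the LB = 0 formulation concrete, consider a factor with three levels and simple contrasts with the last level as the reference category. Each contrast row compares one non-reference level with level 3, so the portion of the L matrix that corresponds to the factor is (a standard illustration, not output copied from the software):

level 1 vs. level 3: ( 1 0 -1 )
level 2 vs. level 3: ( 0 1 -1 )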
GLM Profile Plots
Profile plots (interaction plots) are useful for comparing marginal means in your model. A profile plot is a
line plot in which each point indicates the estimated marginal mean of a dependent variable (adjusted for
any covariates) at one level of a factor. The levels of a second factor can be used to make separate lines.
Each level in a third factor can be used to create a separate plot. All fixed and random factors, if any, are
available for plots. For multivariate analyses, profile plots are created for each dependent variable. In a
repeated measures analysis, both between-subjects factors and within-subjects factors can be used in
profile plots. GLM Multivariate and GLM Repeated Measures are available only if you have the Advanced
Statistics option installed.
A profile plot of one factor shows whether the estimated marginal means are increasing or decreasing
across levels. For two or more factors, parallel lines indicate that there is no interaction between factors,
which means that you can investigate the levels of only one factor. Nonparallel lines indicate an
interaction.
Figure 2. Nonparallel plot (left) and parallel plot (right)
After a plot is specified by selecting factors for the horizontal axis and, optionally, factors for separate
lines and separate plots, the plot must be added to the Plots list.
Chart Type
The chart can be a line chart or a bar chart.
Error Bars
You can include error bars that represent the confidence interval or a number of standard errors. The
confidence interval is based on the significance level specified on the Options dialog.
Include reference line for grand mean
Includes a reference line that represents the overall grand mean.
Y axis starts at 0
For line charts with all positive or all negative values, forces the Y axis to start at 0. Bar charts always
start (or include) 0.
GLM Options
Optional statistics are available from this dialog box. Statistics are calculated using a fixed-effects model.
Display. Select Descriptive statistics to produce observed means, standard deviations, and counts for all
of the dependent variables in all cells. Estimates of effect size gives a partial eta-squared value for each
effect and each parameter estimate. The eta-squared statistic describes the proportion of total variability
attributable to a factor. Select Observed power to obtain the power of the test when the alternative
hypothesis is set based on the observed value. Select Parameter estimates to produce the parameter
estimates, standard errors, t tests, confidence intervals, and the observed power for each test. Select
Contrast coefficient matrix to obtain the L matrix.
Homogeneity tests produces Levene tests of the homogeneity of variance for each dependent variable
across all level combinations of the between-subjects factors, for between-subjects factors only. The
spread-versus-level and residual plots options are useful for checking assumptions about the data. This
item is disabled if there are no factors. Select Residual plot to produce an observed-by-predicted-by-
standardized residual plot for each dependent variable. These plots are useful for investigating the
assumption of equal variance. Select Lack of fit to check if the relationship between the dependent
variable and the independent variables can be adequately described by the model. General estimable
function(s) allows you to construct custom hypothesis tests based on the general estimable function(s).
Rows in any contrast coefficient matrix are linear combinations of the general estimable function(s).
Heteroskedasticity Tests are available for testing whether the variance of the errors (for each dependent
variable) depends on the values of the independent variables. For the Breusch-Pagan test, Modified
Breusch-Pagan test, and F test you can specify the model on which the test is based. By default, the
model consists of a constant term, a term that is linear in the predicted values, a term that is quadratic in
the predicted values, and an error term.
Parameter estimates with robust standard errors displays a table of parameter estimates, along with
robust or heteroskedasticity-consistent (HC) standard errors; and t statistics, significance values, and
confidence intervals that use the robust standard errors. Five different methods are available for the
robust covariance matrix estimation.
HC0
Based on the original asymptotic or large sample robust, empirical, or “sandwich” estimator of the
covariance matrix of the parameter estimates. The middle part of the sandwich contains squared OLS
(ordinary least squares) or squared WLS (weighted least squares) residuals.
HC1
A finite-sample modification of HC0, multiplying it by N/(N-p), where N is the sample size and p is the
number of non-redundant parameters in the model.
HC2
A modification of HC0 that involves dividing the squared residual by 1-h, where h is the leverage for
the case.
HC3
A modification of HC0 that approximates a jackknife estimator. Squared residuals are divided by the
square of 1-h.
HC4
A modification of HC0 that divides the squared residuals by 1-h to a power that varies according to h,
N, and p, with an upper limit of 4.
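In standard sandwich-estimator notation, these variants differ only in how the squared OLS residuals e_i^2 at the center of

HC0 = (X'X)^{-1} X' \, \mathrm{diag}(e_i^2) \, X \, (X'X)^{-1}

are rescaled: HC1 multiplies the whole matrix by N/(N-p), HC2 replaces e_i^2 with e_i^2/(1-h_i), HC3 uses e_i^2/(1-h_i)^2, and HC4 uses e_i^2/(1-h_i)^{\delta_i} with \delta_i = \min(4, N h_i / p). This is the textbook formulation, shown only to summarize the verbal descriptions above; see the algorithms documentation for the implemented form.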
Significance level. You might want to adjust the significance level used in post hoc tests and the
confidence level used for constructing confidence intervals. The specified value is also used to calculate
the observed power for the test. When you specify a significance level, the associated level of the
confidence intervals is displayed in the dialog box.
UNIANOVA Command Additional Features
The command syntax language also allows you to:
• Specify nested effects in the design (using the DESIGN subcommand).
• Specify tests of effects versus a linear combination of effects or a value (using the TEST subcommand).
• Specify multiple contrasts (using the CONTRAST subcommand).
• Include user-missing values (using the MISSING subcommand).
• Specify EPS criteria (using the CRITERIA subcommand).
• Construct a custom L matrix, M matrix, or K matrix (using the LMATRIX, MMATRIX, and KMATRIX
subcommands).
• For deviation or simple contrasts, specify an intermediate reference category (using the CONTRAST
subcommand).
• Specify metrics for polynomial contrasts (using the CONTRAST subcommand).
• Specify error terms for post hoc comparisons (using the POSTHOC subcommand).
• Compute estimated marginal means for any factor or factor interaction among the factors in the factor
list (using the EMMEANS subcommand).
• Specify names for temporary variables (using the SAVE subcommand).
• Construct a correlation matrix data file (using the OUTFILE subcommand).
• Construct a matrix data file that contains statistics from the between-subjects ANOVA table (using the
OUTFILE subcommand).
• Save the design matrix to a new data file (using the OUTFILE subcommand).
See the Command Syntax Reference for complete syntax information.
GLM Post Hoc Comparisons
Post hoc multiple comparison tests. Once you have determined that differences exist among the means,
post hoc range tests and pairwise multiple comparisons can determine which means differ. Comparisons
are made on unadjusted values. These tests are used for fixed between-subjects factors only. In GLM
Repeated Measures, these tests are not available if there are no between-subjects factors, and the post
hoc multiple comparison tests are performed for the average across the levels of the within-subjects
factors. For GLM Multivariate, the post hoc tests are performed for each dependent variable separately.
GLM Multivariate and GLM Repeated Measures are available only if you have the Advanced Statistics
option installed.
The Bonferroni and Tukey’s honestly significant difference tests are commonly used multiple comparison
tests. The Bonferroni test, based on Student’s t statistic, adjusts the observed significance level for the
fact that multiple comparisons are made. Sidak’s t test also adjusts the significance level and provides
tighter bounds than the Bonferroni test. Tukey’s honestly significant difference test uses the
Studentized range statistic to make all pairwise comparisons between groups and sets the
experimentwise error rate to the error rate for the collection of all pairwise comparisons. When testing a
large number of pairs of means, Tukey’s honestly significant difference test is more powerful than the
Bonferroni test. For a small number of pairs, Bonferroni is more powerful.
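In symbols, with k groups there are m = k(k−1)/2 pairwise comparisons. The following is a textbook sketch of the two adjustments for equal group sizes n (it is not quoted from the SPSS Statistics Algorithms reference):
\[ p_{\mathrm{adj}} = \min(1,\; m\,p) \quad \text{(Bonferroni)}, \qquad q = \frac{|\bar{y}_i - \bar{y}_j|}{\sqrt{MS_E / n}} > q_{\alpha;\,k,\,df_E} \quad \text{(Tukey HSD rejection rule)} \]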
Hochberg’s GT2 is similar to Tukey’s honestly significant difference test, but the Studentized maximum
modulus is used. Usually, Tukey’s test is more powerful. Gabriel’s pairwise comparisons test also uses
the Studentized maximum modulus and is generally more powerful than Hochberg’s GT2 when the cell
sizes are unequal. Gabriel’s test may become liberal when the cell sizes vary greatly.
Dunnett’s pairwise multiple comparison t test compares a set of treatments against a single control
mean. The last category is the default control category. Alternatively, you can choose the first category.
You can also choose a two-sided or one-sided test. To test that the mean at any level (except the control
category) of the factor is not equal to that of the control category, use a two-sided test. To test whether
the mean at any level of the factor is smaller than that of the control category, select < Control. Likewise,
to test whether the mean at any level of the factor is larger than that of the control category, select >
Control.
Ryan, Einot, Gabriel, and Welsch (R-E-G-W) developed two multiple step-down range tests. Multiple step-
down procedures first test whether all means are equal. If all means are not equal, subsets of means are
tested for equality. R-E-G-W F is based on an F test and R-E-G-W Q is based on the Studentized range.
These tests are more powerful than Duncan’s multiple range test and Student-Newman-Keuls (which are
also multiple step-down procedures), but they are not recommended for unequal cell sizes.
When the variances are unequal, use Tamhane’s T2 (conservative pairwise comparisons test based on a t
test), Dunnett’s T3 (pairwise comparison test based on the Studentized maximum modulus), Games-
Howell pairwise comparison test (sometimes liberal), or Dunnett’s C (pairwise comparison test based
on the Studentized range). Note that these tests are not valid and will not be produced if there are
multiple factors in the model.
Duncan’s multiple range test, Student-Newman-Keuls (S-N-K), and Tukey’s b are range tests that rank
group means and compute a range value. These tests are not used as frequently as the tests previously
discussed.
The Waller-Duncan t test uses a Bayesian approach. This range test uses the harmonic mean of the
sample size when the sample sizes are unequal.
The significance level of the Scheffé test is designed to allow all possible linear combinations of group
means to be tested, not just pairwise comparisons available in this feature. The result is that the Scheffé
test is often more conservative than other tests, which means that a larger difference between means is
required for significance.
The least significant difference (LSD) pairwise multiple comparison test is equivalent to multiple
individual t tests between all pairs of groups. The disadvantage of this test is that no attempt is made to
adjust the observed significance level for multiple comparisons.
Tests displayed. Pairwise comparisons are provided for LSD, Sidak, Bonferroni, Games-Howell,
Tamhane’s T2 and T3, Dunnett’s C, and Dunnett’s T3. Homogeneous subsets for range tests are provided
for S-N-K, Tukey’s b, Duncan, R-E-G-W F, R-E-G-W Q, and Waller. Tukey’s honestly significant difference
test, Hochberg’s GT2, Gabriel’s test, and Scheffé’s test are both multiple comparison tests and range
tests.
GLM Options
Optional statistics are available from this dialog box. Statistics are calculated using a fixed-effects model.
Display. Select Descriptive statistics to produce observed means, standard deviations, and counts for all
of the dependent variables in all cells. Estimates of effect size gives a partial eta-squared value for each
effect and each parameter estimate. The eta-squared statistic describes the proportion of total variability
attributable to a factor. Select Observed power to obtain the power of the test when the alternative
hypothesis is set based on the observed value. Select Parameter estimates to produce the parameter
estimates, standard errors, t tests, confidence intervals, and the observed power for each test. Select
Contrast coefficient matrix to obtain the L matrix.
Homogeneity tests produces Levene tests of the homogeneity of variance for each dependent variable
across all level combinations of the between-subjects factors, for between-subjects factors only. The
spread-versus-level and residual plots options are useful for checking assumptions about the data. This
item is disabled if there are no factors. Select Residual plot to produce an observed-by-predicted-by-
standardized residual plot for each dependent variable. These plots are useful for investigating the
assumption of equal variance. Select Lack of fit to check if the relationship between the dependent
variable and the independent variables can be adequately described by the model. General estimable
function(s) allows you to construct custom hypothesis tests based on the general estimable function(s).
Rows in any contrast coefficient matrix are linear combinations of the general estimable function(s).
Heteroskedasticity Tests are available for testing whether the variance of the errors (for each dependent
variable) depends on the values of the independent variables. For the Breusch-Pagan test, Modified
Breusch-Pagan test, and F test you can specify the model on which the test is based. By default, the
model consists of a constant term, a term that is linear in the predicted values, a term that is quadratic in
the predicted values, and an error term.
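In regression notation, that default auxiliary model regresses the squared residuals on the predicted values. A sketch of the stated default (the individual tests may scale the squared residuals differently):
\[ e_i^{2} = \alpha_0 + \alpha_1 \hat{y}_i + \alpha_2 \hat{y}_i^{2} + u_i \]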
Parameter estimates with robust standard errors displays a table of parameter estimates, along with
robust or heteroskedasticity-consistent (HC) standard errors; and t statistics, significance values, and
confidence intervals that use the robust standard errors. Five different methods are available for the
robust covariance matrix estimation.
HC0
Based on the original asymptotic or large sample robust, empirical, or “sandwich” estimator of the
covariance matrix of the parameter estimates. The middle part of the sandwich contains squared OLS (ordinary least squares) or squared WLS (weighted least squares) residuals.
HC1
A finite-sample modification of HC0, multiplying it by N/(N-p), where N is the sample size and p is the
number of non-redundant parameters in the model.
HC2
A modification of HC0 that involves dividing the squared residual by 1-h, where h is the leverage for
the case.
HC3
A modification of HC0 that approximates a jackknife estimator. Squared residuals are divided by the
square of 1-h.
HC4
A modification of HC0 that divides the squared residuals by 1-h to a power that varies according to h,
N, and p, with an upper limit of 4.
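In matrix form, the five estimators share one sandwich structure and differ only in how the squared residual e_i² is weighted. A sketch in standard notation, consistent with the descriptions above (h_i is the leverage for case i):
\[ \widehat{V}_{\mathrm{HC0}} = (X^{\top}X)^{-1} X^{\top} \operatorname{diag}(e_i^{2})\, X\, (X^{\top}X)^{-1} \]
\[ \mathrm{HC1}: \tfrac{N}{N-p}\,e_i^{2} \qquad \mathrm{HC2}: \tfrac{e_i^{2}}{1-h_i} \qquad \mathrm{HC3}: \tfrac{e_i^{2}}{(1-h_i)^{2}} \qquad \mathrm{HC4}: \tfrac{e_i^{2}}{(1-h_i)^{\delta_i}},\; \delta_i = \min\!\left(4,\, \tfrac{N h_i}{p}\right) \]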
Significance level. You might want to adjust the significance level used in post hoc tests and the
confidence level used for constructing confidence intervals. The specified value is also used to calculate
the observed power for the test. When you specify a significance level, the associated level of the
confidence intervals is displayed in the dialog box.
UNIANOVA Command Additional Features
The command syntax language also allows you to:
• Specify nested effects in the design (using the DESIGN subcommand).
• Specify tests of effects versus a linear combination of effects or a value (using the TEST subcommand).
• Specify multiple contrasts (using the CONTRAST subcommand).
• Include user-missing values (using the MISSING subcommand).
• Specify EPS criteria (using the CRITERIA subcommand).
• Construct a custom L matrix, M matrix, or K matrix (using the LMATRIX, MMATRIX, and KMATRIX
subcommands).
• For deviation or simple contrasts, specify an intermediate reference category (using the CONTRAST
subcommand).
• Specify metrics for polynomial contrasts (using the CONTRAST subcommand).
• Specify error terms for post hoc comparisons (using the POSTHOC subcommand).
• Compute estimated marginal means for any factor or factor interaction among the factors in the factor
list (using the EMMEANS subcommand).
• Specify names for temporary variables (using the SAVE subcommand).
• Construct a correlation matrix data file (using the OUTFILE subcommand).
• Construct a matrix data file that contains statistics from the between-subjects ANOVA table (using the
OUTFILE subcommand).
• Save the design matrix to a new data file (using the OUTFILE subcommand).
See the Command Syntax Reference for complete syntax information.
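For example, a minimal sketch that combines several of the subcommands listed above (the variables score, group, dose, and age are hypothetical):
UNIANOVA score BY group dose WITH age
  /CONTRAST(group)=SIMPLE(1)
  /POSTHOC=group(TUKEY)
  /EMMEANS=TABLES(group*dose)
  /PRINT=DESCRIPTIVE PARAMETER
  /DESIGN=age group dose group*dose.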
GLM Save
You can save values predicted by the model, residuals, and related measures as new variables in the Data
Editor. Many of these variables can be used for examining assumptions about the data. To save the values
for use in another IBM SPSS Statistics session, you must save the current data file.
Predicted Values. The values that the model predicts for each case.
• Unstandardized. The value the model predicts for the dependent variable.
• Weighted. Weighted unstandardized predicted values. Available only if a WLS variable was previously
selected.
• Standard error. An estimate of the standard deviation of the average value of the dependent variable for
cases that have the same values of the independent variables.
Diagnostics. Measures to identify cases with unusual combinations of values for the independent
variables and cases that may have a large impact on the model.
• Cook’s distance. A measure of how much the residuals of all cases would change if a particular case
were excluded from the calculation of the regression coefficients. A large Cook’s D indicates that
excluding a case from computation of the regression statistics changes the coefficients substantially.
• Leverage values. Uncentered leverage values. The relative influence of each observation on the model’s
fit.
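In standard notation (a sketch, not the procedure’s exact computational formulas), Cook’s distance for case i can be written from the leverage h_i and the internally studentized residual t_i, with p estimated parameters:
\[ D_i = \frac{t_i^{2}}{p} \cdot \frac{h_i}{1-h_i} \]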
Residuals. An unstandardized residual is the actual value of the dependent variable minus the value
predicted by the model. Standardized, Studentized, and deleted residuals are also available. If a WLS
variable was chosen, weighted unstandardized residuals are available.
• Unstandardized. The difference between an observed value and the value predicted by the model.
• Weighted. Weighted unstandardized residuals. Available only if a WLS variable was previously selected.
• Standardized. The residual divided by an estimate of its standard deviation. Standardized residuals,
which are also known as Pearson residuals, have a mean of 0 and a standard deviation of 1.
• Studentized. The residual divided by an estimate of its standard deviation that varies from case to case,
depending on the distance of each case’s values on the independent variables from the means of the
independent variables.
• Deleted. The residual for a case when that case is excluded from the calculation of the regression
coefficients. It is the difference between the value of the dependent variable and the adjusted predicted
value.
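A minimal UNIANOVA sketch that saves several of these measures as new variables (the variables score and group are hypothetical):
UNIANOVA score BY group
  /SAVE=PRED SEPRED RESID ZRESID SRESID DRESID COOK LEVER
  /DESIGN=group.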
Coefficient Statistics. Writes a variance-covariance matrix of the parameter estimates in the model to a
new dataset in the current session or an external IBM SPSS Statistics data file. Also, for each dependent
variable, there will be a row of parameter estimates, a row of standard errors of the parameter estimates,
a row of significance values for the t statistics corresponding to the parameter estimates, and a row of
residual degrees of freedom. For a multivariate model, there are similar rows for each dependent variable.
When Heteroskedasticity-consistent statistics is selected (only available for univariate models), the
variance-covariance matrix is calculated using a robust estimator, the row of standard errors displays the
robust standard errors, and the significance values reflect the robust errors. You can use this matrix file in
other procedures that read matrix files.
GLM Estimated Marginal Means
Select the factors and interactions for which you want estimates of the population marginal means in the
cells. These means are adjusted for the covariates, if any.
• Compare main effects. Provides uncorrected pairwise comparisons among estimated marginal means
for any main effect in the model, for both between- and within-subjects factors. This item is available
only if main effects are selected under the Display Means For list.
• Confidence interval adjustment. Select least significant difference (LSD), Bonferroni, or Sidak
adjustment to the confidence intervals and significance. This item is available only if Compare main
effects is selected.
Specifying Estimated Marginal Means
1. From the menus, choose one of the procedures available under Analyze > General Linear Model.
2. In the main dialog, click EM Means.
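In command syntax, the equivalent request might look like the following sketch (hypothetical variables; ADJ accepts LSD, BONFERRONI, or SIDAK):
UNIANOVA score BY group
  /EMMEANS=TABLES(group) COMPARE ADJ(BONFERRONI)
  /DESIGN=group.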
GLM Auxiliary Regression Model
The Auxiliary Regression Model dialog box specifies the model that is used to test for heteroskedasticity.
Use predicted values
Uses a model that consists of a constant term, a term that is linear in the predicted values, a term that
is quadratic in the predicted values, and an error term.
Use univariate model
Uses the model that is specified on the Model subdialog. An intercept term is included if the specified
model does not contain one.
Custom model
Uses the model that you explicitly specify.
Build terms
Use this choice when you want to include non-nested terms of a certain type (such as main
effects) for all combinations of a selected set of factors and covariates.
Build custom terms
Use this choice when you want to include nested terms or when you want to explicitly build any term variable by variable.
Bivariate Correlations
The Bivariate Correlations procedure computes Pearson’s correlation coefficient, Spearman’s rho, and
Kendall’s tau-b with their significance levels. Correlations measure how variables or rank orders are
related. Before calculating a correlation coefficient, screen your data for outliers (which can cause
misleading results) and evidence of a linear relationship. Pearson’s correlation coefficient is a measure of
linear association. Two variables can be perfectly related, but if the relationship is not linear, Pearson’s
correlation coefficient is not an appropriate statistic for measuring their association.
Confidence interval settings are available for Pearson and Spearman.
Example
Is the number of games won by a basketball team correlated with the average number of points
scored per game? A scatterplot indicates that there is a linear relationship. Analyzing data from the
1994–1995 NBA season yields a Pearson’s correlation coefficient of 0.581, significant at the 0.01
level. You might suspect that the more games won per season, the fewer points the opponents
scored. These variables are negatively correlated (–0.401), and the correlation is significant at the
0.05 level.
Statistics
For each variable: number of cases with nonmissing values, mean, and standard deviation. For each
pair of variables: Pearson’s correlation coefficient, Spearman’s rho, Kendall’s tau-b, cross-product of
deviations, and covariance.
Data considerations
Data
Use symmetric quantitative variables for Pearson’s correlation coefficient and quantitative variables
or variables with ordered categories for Spearman’s rho and Kendall’s tau-b.
Assumptions
Pearson’s correlation coefficient assumes that each pair of variables is bivariate normal.
Obtaining Bivariate Correlations
From the menus choose:
Analyze > Correlate > Bivariate…
1. Select two or more numeric variables.
The following options are also available:
Correlation Coefficients
For quantitative, normally distributed variables, choose the Pearson correlation coefficient. If your
data are not normally distributed or have ordered categories, choose Kendall’s tau-b or
Spearman, which measure the association between rank orders. Correlation coefficients range in
value from –1 (a perfect negative relationship) to +1 (a perfect positive relationship). A value of 0
indicates no linear relationship. When interpreting your results, be careful not to draw any cause-
and-effect conclusions due to a significant correlation.
Test of Significance
You can select two-tailed or one-tailed probabilities. If the direction of association is known in
advance, select One-tailed. Otherwise, select Two-tailed.
Flag significant correlations
Correlation coefficients significant at the 0.05 level are identified with a single asterisk, and those
significant at the 0.01 level are identified with two asterisks.
Show only the lower triangle
When selected, only the correlation matrix table’s lower triangle is presented in the output. When
not selected, the full correlation matrix table is presented in the output. The setting allows table
output to adhere to APA style guidelines.
Show diagonal
When selected, the correlation matrix table’s lower triangle, along with the diagonal values, is
presented in the output. The setting allows table output to adhere to APA style guidelines.
2. You can optionally select the following:
• Click Options… to specify Pearson correlation statistics and missing values settings.
• Click Style… to specify conditions for automatically changing properties of pivot tables based on
specific conditions.
• Click Bootstrap… for deriving robust estimates of standard errors and confidence intervals for
estimates such as the mean, median, proportion, odds ratio, correlation coefficient or regression
coefficient.
• Click Confidence Interval… to set the options for the estimation of the confidence intervals.
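As a sketch, the basketball example might be expressed in command syntax as follows (the variables wins, points, and opp_points are hypothetical):
CORRELATIONS
  /VARIABLES=wins points opp_points
  /PRINT=TWOTAIL NOSIG
  /STATISTICS=DESCRIPTIVES XPROD
  /MISSING=PAIRWISE.
NONPAR CORR
  /VARIABLES=wins points opp_points
  /PRINT=SPEARMAN TWOTAIL
  /MISSING=PAIRWISE.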
Bivariate Correlations Options
Statistics
For Pearson correlations, you can choose one or both of the following:
Means and standard deviations
Displayed for each variable. The number of cases with nonmissing values is also shown. Missing
values are handled on a variable-by-variable basis regardless of your missing values setting.
Cross-product deviations and covariances
Displayed for each pair of variables. The cross-product of deviations is equal to the sum of the
products of mean-corrected variables. This is the numerator of the Pearson correlation coefficient.
The covariance is an unstandardized measure of the relationship between two variables, equal to
the cross-product deviation divided by N–1.
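In symbols, for a pair of variables x and y with means x̄ and ȳ and standard deviations s_x and s_y:
\[ \mathrm{CP} = \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}), \qquad \operatorname{cov}(x,y) = \frac{\mathrm{CP}}{N-1}, \qquad r = \frac{\operatorname{cov}(x,y)}{s_x\, s_y} \]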
Missing Values
You can choose one of the following:
Exclude cases pairwise
Cases with missing values for one or both of a pair of variables for a correlation coefficient are
excluded from the analysis. Since each coefficient is based on all cases that have valid codes on
that particular pair of variables, the maximum information available is used in every calculation.
This can result in a set of coefficients based on a varying number of cases.
Exclude cases listwise
Cases with missing values for any variable are excluded from all correlations.
Bivariate Correlations Confidence Interval
The Confidence Interval dialog provides options for the estimation of the confidence intervals. The dialog
is available when Pearson, Kendall’s tau-b, or Spearman is selected on the Bivariate Correlations dialog.
Estimate confidence interval of bivariate correlation parameter
Controls the confidence interval estimation of bivariate correlation parameter. When selected,
confidence interval estimation occurs.
Confidence interval (%)
Specifies the confidence level for all confidence intervals produced. Specify a numeric value
between 0 and 100. 95 is the default value.
Pearson Correlation
The Apply the bias adjustment setting controls whether the bias adjustment is applied. By
default, the setting is not selected, which does not take the bias term into consideration. When
selected, the bias adjustment to the estimation of the confidence limits is applied. The setting is
available when Pearson is selected on the Bivariate Correlations dialog.
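Without the bias adjustment, the usual construction is the Fisher z interval; the following is a standard sketch only (the exact formulas, including the bias term, are documented in the SPSS Statistics Algorithms reference):
\[ z = \tfrac{1}{2}\ln\frac{1+r}{1-r}, \qquad z \pm \frac{z_{1-\alpha/2}}{\sqrt{N-3}}, \qquad r_{\text{limit}} = \tanh(z_{\text{limit}}) \]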
Spearman Correlation
The setting is available when Spearman is selected on the Bivariate Correlations dialog and
provides options for estimating the Spearman Correlation variance via the following methods:
• Fieller, Hartley and Pearson
• Bonett and Wright
• Caruso and Cliff
CORRELATIONS and NONPAR CORR Command Additional Features
The command syntax language also allows you to:
• Write a correlation matrix for Pearson correlations that can be used in place of raw data to obtain other
analyses such as factor analysis (with the MATRIX subcommand).
• Obtain correlations of each variable on a list with each variable on a second list (using the keyword
WITH on the VARIABLES subcommand).
See the Command Syntax Reference for complete syntax information.
Partial Correlations
The Partial Correlations procedure computes partial correlation coefficients that describe the linear
relationship between two variables while controlling for the effects of one or more additional variables.
Correlations are measures of linear association. Two variables can be perfectly related, but if the
relationship is not linear, a correlation coefficient is not an appropriate statistic for measuring their
association.
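For a single control variable z, the first-order partial correlation can be written in terms of the zero-order correlations (standard formula):
\[ r_{xy \cdot z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{(1-r_{xz}^{2})(1-r_{yz}^{2})}} \]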
Example
Is there a relationship between healthcare funding and disease rates? Although you might expect any
such relationship to be a negative one, a study reports a significant positive correlation: as healthcare
funding increases, disease rates appear to increase. Controlling for the rate of visits to healthcare
providers, however, virtually eliminates the observed positive correlation. Healthcare funding and
disease rates only appear to be positively related because more people have access to healthcare
when funding increases, which leads to more reported diseases by doctors and hospitals.
Statistics
For each variable: number of cases with nonmissing values, mean, and standard deviation. Partial and
zero-order correlation matrices, with degrees of freedom and significance levels.
Data considerations
Data
Use symmetric, quantitative variables.
Assumptions
The Partial Correlations procedure assumes that each pair of variables is bivariate normal.
Obtaining Partial Correlations
1. From the menus choose:
Analyze > Correlate > Partial…
2. Select two or more numeric variables for which partial correlations are to be computed.
3. Select one or more numeric control variables.
The following options are also available:
Test of Significance
You can select two-tailed or one-tailed probabilities. If the direction of association is known in
advance, select One-tailed. Otherwise, select Two-tailed.
Display actual significance level
By default, the probability and degrees of freedom are shown for each correlation coefficient. If you
deselect this item, coefficients significant at the 0.05 level are identified with a single asterisk,
coefficients significant at the 0.01 level are identified with a double asterisk, and degrees of freedom
are suppressed. This setting affects both partial and zero-order correlation matrices.
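As a sketch, the healthcare example above might be expressed in command syntax as follows (the variables funding, disease, and visits are hypothetical):
PARTIAL CORR
  /VARIABLES=funding disease BY visits
  /SIGNIFICANCE=TWOTAIL
  /STATISTICS=DESCRIPTIVES CORR
  /MISSING=LISTWISE.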
Partial Correlations Options
Statistics. You can choose one or both of the following:
• Means and standard deviations. Displayed for each variable. The number of cases with nonmissing
values is also shown.
• Zero-order correlations. A matrix of simple correlations between all variables, including control
variables, is displayed.
Missing Values. You can choose one of the following alternatives:
• Exclude cases listwise. Cases having missing values for any variable, including a control variable, are
excluded from all computations.
• Exclude cases pairwise. For computation of the zero-order correlations on which the partial
correlations are based, a case having missing values for one or both of a pair of variables is not used.
Pairwise deletion uses as much of the data as possible. However, the number of cases may differ across
coefficients. When pairwise deletion is in effect, the degrees of freedom for a particular partial
coefficient are based on the smallest number of cases used in the calculation of any of the zero-order
correlations.
PARTIAL CORR Command Additional Features
The command syntax language also allows you to:
• Read a zero-order correlation matrix or write a partial correlation matrix (with the MATRIX
subcommand).
• Obtain partial correlations between two lists of variables (using the keyword WITH on the VARIABLES
subcommand).
• Obtain multiple analyses (with multiple VARIABLES subcommands).
• Specify order values to request (for example, both first- and second-order partial correlations) when
you have two control variables (with the VARIABLES subcommand).
• Suppress redundant coefficients (with the FORMAT subcommand).
• Display a matrix of simple correlations when some coefficients cannot be computed (with the
STATISTICS subcommand).
See the Command Syntax Reference for complete syntax information.
Distances
This procedure calculates any of a wide variety of statistics measuring either similarities or dissimilarities
(distances), either between pairs of variables or between pairs of cases. These similarity or distance
measures can then be used with other procedures, such as factor analysis, cluster analysis, or
multidimensional scaling, to help analyze complex datasets.
Example. Is it possible to measure similarities between pairs of automobiles based on certain
characteristics, such as engine size, MPG, and horsepower? By computing similarities between autos, you
can gain a sense of which autos are similar to each other and which are different from each other. For a
more formal analysis, you might consider applying a hierarchical cluster analysis or multidimensional
scaling to the similarities to explore the underlying structure.
Statistics. Dissimilarity (distance) measures for interval data are Euclidean distance, squared Euclidean
distance, Chebychev, block, Minkowski, or customized; for count data, chi-square or phi-square; for
binary data, Euclidean distance, squared Euclidean distance, size difference, pattern difference, variance,
shape, or Lance and Williams. Similarity measures for interval data are Pearson correlation or cosine; for
binary data, Russell and Rao, simple matching, Jaccard, Dice, Rogers and Tanimoto, Sokal and Sneath 1,
Sokal and Sneath 2, Sokal and Sneath 3, Kulczynski 1, Kulczynski 2, Sokal and Sneath 4, Hamann,
Lambda, Anderberg’s D, Yule’s Y, Yule’s Q, Ochiai, Sokal and Sneath 5, phi 4-point correlation, or
dispersion.
To Obtain Distance Matrices
1. From the menus choose:
Analyze > Correlate > Distances…
2. Select at least one numeric variable to compute distances between cases, or select at least two
numeric variables to compute distances between variables.
3. Select an alternative in the Compute Distances group to calculate proximities either between cases or
between variables.
Distances Dissimilarity Measures
From the Measure group, select the alternative that corresponds to your type of data (interval, count, or
binary); then, from the drop-down list, select one of the measures that corresponds to that type of data.
Available measures, by data type, are:
• Interval data. Euclidean distance, squared Euclidean distance, Chebychev, block, Minkowski, or
customized.
• Count data. Chi-square measure or phi-square measure.
• Binary data. Euclidean distance, squared Euclidean distance, size difference, pattern difference,
variance, shape, or Lance and Williams. (Enter values for Present and Absent to specify which two
values are meaningful; Distances will ignore all other values.)
The Transform Values group allows you to standardize data values for either cases or variables before
computing proximities. These transformations are not applicable to binary data. Available standardization
methods are z scores, range –1 to 1, range 0 to 1, maximum magnitude of 1, mean of 1, or standard
deviation of 1.
The Transform Measures group allows you to transform the values generated by the distance measure.
They are applied after the distance measure has been computed. Available options are absolute values,
change sign, and rescale to 0–1 range.
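For instance, a minimal sketch that computes squared Euclidean distances between cases on z-standardized variables and writes the matrix to the active dataset (the variables engine, mpg, and horse are hypothetical; the Distances dialog pastes PROXIMITIES syntax):
PROXIMITIES engine mpg horse
  /VIEW=CASE
  /MEASURE=SEUCLID
  /STANDARDIZE=VARIABLE Z
  /MATRIX=OUT(*).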
Distances Similarity Measures
From the Measure group, select the alternative that corresponds to your type of data (interval or binary);
then, from the drop-down list, select one of the measures that corresponds to that type of data. Available
measures, by data type, are:
• Interval data. Pearson correlation or cosine.
• Binary data. Russell and Rao, simple matching, Jaccard, Dice, Rogers and Tanimoto, Sokal and Sneath
1, Sokal and Sneath 2, Sokal and Sneath 3, Kulczynski 1, Kulczynski 2, Sokal and Sneath 4, Hamann,
Lambda, Anderberg’s D, Yule’s Y, Yule’s Q, Ochiai, Sokal and Sneath 5, phi 4-point correlation, or
dispersion. (Enter values for Present and Absent to specify which two values are meaningful; Distances
will ignore all other values.)
The Transform Values group allows you to standardize data values for either cases or variables before
computing proximities. These transformations are not applicable to binary data. Available standardization
methods are z scores, range –1 to 1, range 0 to 1, maximum magnitude of 1, mean of 1, and standard
deviation of 1.
The Transform Measures group allows you to transform the values generated by the distance measure.
They are applied after the distance measure has been computed. Available options are absolute values,
change sign, and rescale to 0–1 range.
PROXIMITIES Command Additional Features
The Distances procedure uses PROXIMITIES command syntax. The command syntax language also
allows you to:
• Specify any integer as the power for the Minkowski distance measure.
• Specify any integers as the power and root for a customized distance measure.
See the Command Syntax Reference for complete syntax information.
Linear models
Linear models predict a continuous target based on linear relationships between the target and one or
more predictors.
Linear models are relatively simple and give an easily interpreted mathematical formula for scoring. The
properties of these models are well understood and can typically be built very quickly compared to other
model types (such as neural networks or decision trees) on the same dataset.
Example. An insurance company with limited resources to investigate homeowners’ insurance claims
wants to build a model for estimating claims costs. By deploying this model to service centers,
representatives can enter claim information while on the phone with a customer and immediately obtain
the “expected” cost of the claim based on past data.
Field requirements. There must be a Target and at least one Input. By default, fields with predefined
roles of Both or None are not used. The target must be continuous (scale). There are no measurement
level restrictions on predictors (inputs); categorical (nominal and ordinal) fields are used as factors in the
model and continuous fields are used as covariates.
Note: If a categorical field has more than 1000 categories, the procedure does not run and no model is
built.
To obtain a linear model
This feature requires the Statistics Base option.
From the menus choose:
Analyze > Regression > Automatic Linear Models…
1. Make sure there is at least one target and one input.
2. Click Build Options to specify optional build and model settings.
3. Click Model Options to save scores to the active dataset and export the model to an external file.
4. Click Run to run the procedure and create the Model objects.
Objectives
What is your main objective? Select the appropriate objective.
• Create a standard model. The method builds a single model to predict the target using the predictors.
Generally speaking, standard models are easier to interpret and can be faster to score than boosted,
bagged, or large dataset ensembles.
• Enhance model accuracy (boosting). The method builds an ensemble model using boosting, which
generates a sequence of models to obtain more accurate predictions. Ensembles can take longer to
build and to score than a standard model.
Boosting produces a succession of “component models”, each of which is built on the entire dataset.
Prior to building each successive component model, the records are weighted based on the previous
component model’s residuals. Cases with large residuals are given relatively higher analysis weights so
that the next component model will focus on predicting these records well. Together these component
models form an ensemble model. The ensemble model scores new records using a combining rule; the
available rules depend upon the measurement level of the target.
• Enhance model stability (bagging). The method builds an ensemble model using bagging (bootstrap
aggregating), which generates multiple models to obtain more reliable predictions. Ensembles can take
longer to build and to score than a standard model.
Bootstrap aggregation (bagging) produces replicates of the training dataset by sampling with
replacement from the original dataset. This creates bootstrap samples of equal size to the original
dataset. Then a “component model” is built on each replicate. Together these component models form
an ensemble model. The ensemble model scores new records using a combining rule; the available
rules depend upon the measurement level of the target.
• Create a model for very large datasets (requires IBM SPSS Statistics Server). The method builds an
ensemble model by splitting the dataset into separate data blocks. Choose this option if your dataset is
too large to build any of the models above, or for incremental model building. This option can take less
time to build, but can take longer to score than a standard model. This option requires IBM SPSS
Statistics Server connectivity.
See “Ensembles” below for settings related to boosting, bagging, and very large datasets.
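As an illustrative sketch only (the keyword spellings for the LINEAR command below are assumptions to be checked against the Command Syntax Reference; variable names are hypothetical):
LINEAR
  /FIELDS TARGET=claim_cost INPUTS=age rooms prior_claims
  /BUILD_OPTIONS OBJECTIVE=BAGGING
  /ENSEMBLES COMBINING_RULE_CONTINUOUS=MEAN COMPONENT_MODELS_N=10.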
Basics
Automatically prepare data. This option allows the procedure to internally transform the target and
predictors in order to maximize the predictive power of the model; any transformations are saved with the
model and applied to new data for scoring. The original versions of transformed fields are excluded from
the model. By default, the following automatic data preparation steps are performed.
• Date and Time handling. Each date predictor is transformed into a new continuous predictor
containing the elapsed time since a reference date (1970-01-01). Each time predictor is transformed
into a new continuous predictor containing the time elapsed since a reference time (00:00:00).
• Adjust measurement level. Continuous predictors with fewer than 5 distinct values are recast as ordinal predictors. Ordinal predictors with more than 10 distinct values are recast as continuous predictors.
• Outlier handling. Values of continuous predictors that lie beyond a cutoff value (3 standard deviations
from the mean) are set to the cutoff value.
• Missing value handling. Missing values of nominal predictors are replaced with the mode of the
training partition. Missing values of ordinal predictors are replaced with the median of the training
partition. Missing values of continuous predictors are replaced with the mean of the training partition.
• Supervised merging. This makes a more parsimonious model by reducing the number of fields to be
processed in association with the target. Similar categories are identified based upon the relationship
between the input and the target. Categories that are not significantly different (that is, having a p-value
greater than 0.1) are merged. If all categories are merged into one, the original and derived versions of
the field are excluded from the model because they have no value as a predictor.
Confidence level. This is the level of confidence used to compute interval estimates of the model
coefficients in the Coefficients view. Specify a value greater than 0 and less than 100. The default is 95.
Model Selection
Model selection method. Choose one of the model selection methods (details below) or Include all
predictors, which simply enters all available predictors as main effects model terms. By default, Forward
stepwise is used.
Forward Stepwise Selection. This starts with no effects in the model and adds and removes effects one
step at a time until no more can be added or removed according to the stepwise criteria.
• Criteria for entry/removal. This is the statistic used to determine whether an effect should be added to
or removed from the model. Information Criterion (AICC) is based on the likelihood of the training set
given the model, and is adjusted to penalize overly complex models. F Statistics is based on a
statistical test of the improvement in model error. Adjusted R-squared is based on the fit of the training
set, and is adjusted to penalize overly complex models. Overfit Prevention Criterion (ASE) is based on
the fit (average squared error, or ASE) of the overfit prevention set. The overfit prevention set is a
random subsample of approximately 30% of the original dataset that is not used to train the model.
If any criterion other than F Statistics is chosen, then at each step the effect that corresponds to the
greatest positive increase in the criterion is added to the model. Any effects in the model that
correspond to a decrease in the criterion are removed.
If F Statistics is chosen as the criterion, then at each step the effect that has the smallest p-value less
than the specified threshold, Include effects with p-values less than, is added to the model. The
default is 0.05. Any effects in the model with a p-value greater than the specified threshold, Remove
effects with p-values greater than, are removed. The default is 0.10.
• Customize maximum number of effects in the final model. By default, all available effects can be
entered into the model. Alternatively, if the stepwise algorithm ends a step with the specified maximum
number of effects, the algorithm stops with the current set of effects.
• Customize maximum number of steps. The stepwise algorithm stops after a certain number of steps.
By default, this is 3 times the number of available effects. Alternatively, specify a positive integer
maximum number of steps.
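For reference, standard forms of two of the criteria listed above, with N cases, p estimated parameters, and log-likelihood ℓ (sketches only; the procedure’s exact definitions are in the SPSS Statistics Algorithms reference):
\[ \mathrm{AICC} = -2\ell + \frac{2pN}{N-p-1}, \qquad R^{2}_{\text{adj}} = 1 - (1-R^{2})\,\frac{N-1}{N-p-1} \]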
Best Subsets Selection. This checks “all possible” models, or at least a larger subset of the possible
models than forward stepwise, to choose the best according to the best subsets criterion. Information
Criterion (AICC) is based on the likelihood of the training set given the model, and is adjusted to penalize
overly complex models. Adjusted R-squared is based on the fit of the training set, and is adjusted to
penalize overly complex models. Overfit Prevention Criterion (ASE) is based on the fit (average squared
error, or ASE) of the overfit prevention set. The overfit prevention set is a random subsample of
approximately 30% of the original dataset that is not used to train the model.
The model with the greatest value of the criterion is chosen as the best model.
Note: Best subsets selection is more computationally intensive than forward stepwise selection. When
best subsets is performed in conjunction with boosting, bagging, or very large datasets, it can take
considerably longer to build than a standard model built using forward stepwise selection.
Ensembles
These settings determine the behavior of ensembling that occurs when boosting, bagging, or very large
datasets are requested in Objectives. Options that do not apply to the selected objective are ignored.
Bagging and Very Large Datasets. When scoring an ensemble, this is the rule used to combine the
predicted values from the base models to compute the ensemble score value.
• Default combining rule for continuous targets. Ensemble predicted values for continuous targets can
be combined using the mean or median of the predicted values from the base models.
Note that when the objective is to enhance model accuracy, the combining rule selections are ignored.
Boosting always uses a weighted majority vote to score categorical targets and a weighted median to
score continuous targets.
Boosting and Bagging. Specify the number of base models to build when the objective is to enhance
model accuracy or stability; for bagging, this is the number of bootstrap samples. It should be a positive
integer.
Advanced
Replicate results. Setting a random seed allows you to replicate analyses. The random number generator
is used to choose which records are in the overfit prevention set. Specify an integer or click Generate,
which will create a pseudo-random integer between 1 and 2147483647, inclusive. The default is
54752075.
Model Options
Save predicted values to the dataset. The default variable name is PredictedValue.
Export model. This writes the model to an external .zip file. You can use this model file to apply the
model information to other data files for scoring purposes. Specify a unique, valid file name. If the file
specification refers to an existing file, then the file is overwritten.
Model Summary
The Model Summary view is a snapshot, at-a-glance summary of the model and its fit.
Table. The table identifies some high-level model settings, including:
• The name of the target specified on the Fields tab,
• Whether automatic data preparation was performed as specified on the Basics settings,
• The model selection method and selection criterion specified on the Model Selection settings. The value
of the selection criterion for the final model is also displayed, and is presented in smaller is better
format.
Chart. The chart displays the accuracy of the final model, which is presented in larger is better format.
The value is 100 × the adjusted R² for the final model.
Automatic Data Preparation
This view shows information about which fields were excluded and how transformed fields were derived
in the automatic data preparation (ADP) step. For each field that was transformed or excluded, the table
lists the field name, its role in the analysis, and the action taken by the ADP step. Fields are sorted by
ascending alphabetical order of field names. The possible actions taken for each field include:
• Derive duration: months computes the elapsed time in months from the values in a field containing
dates to the current system date.
• Derive duration: hours computes the elapsed time in hours from the values in a field containing times
to the current system time.
• Change measurement level from continuous to ordinal recasts continuous fields with fewer than 5 unique values as ordinal fields.
• Change measurement level from ordinal to continuous recasts ordinal fields with more than 10
unique values as continuous fields.
• Trim outliers sets values of continuous predictors that lie beyond a cutoff value (3 standard deviations
from the mean) to the cutoff value.
• Replace missing values replaces missing values of nominal fields with the mode, ordinal fields with the
median, and continuous fields with the mean.
• Merge categories to maximize association with target identifies “similar” predictor categories based
upon the relationship between the input and the target. Categories that are not significantly different
(that is, having a p-value greater than 0.05) are merged.
• Exclude constant predictor / after outlier handling / after merging of categories removes predictors
that have a single value, possibly after other ADP actions have been taken.
Predictor Importance
Typically, you will want to focus your modeling efforts on the predictor fields that matter most and
consider dropping or ignoring those that matter least. The predictor importance chart helps you do this by
indicating the relative importance of each predictor in estimating the model. Since the values are relative,
the sum of the values for all predictors on the display is 1.0. Predictor importance does not relate to
model accuracy. It just relates to the importance of each predictor in making a prediction, not whether or
not the prediction is accurate.
Predicted By Observed
This displays a binned scatterplot of the predicted values on the vertical axis by the observed values on
the horizontal axis. Ideally, the points should lie on a 45-degree line; this view can tell you whether any
records are predicted particularly badly by the model.
Residuals
This displays a diagnostic chart of model residuals.
Chart styles. There are different display styles, which are accessible from the Style dropdown list.
• Histogram. This is a binned histogram of the studentized residuals with an overlay of the normal
distribution. Linear models assume that the residuals have a normal distribution, so the histogram
should ideally closely approximate the smooth line.
• P-P Plot. This is a binned probability-probability plot comparing the studentized residuals to a normal
distribution. If the slope of the plotted points is less steep than the normal line, the residuals show
greater variability than a normal distribution; if the slope is steeper, the residuals show less variability
than a normal distribution. If the plotted points have an S-shaped curve, then the distribution of
residuals is skewed.
Outliers
This table lists records that exert undue influence upon the model, and displays the record ID (if specified
on the Fields tab), target value, and Cook’s distance. Cook’s distance is a measure of how much the
residuals of all records would change if a particular record were excluded from the calculation of the
model coefficients. A large Cook’s distance indicates that excluding a record changes the coefficients substantially; such a record should therefore be considered influential.
Influential records should be examined carefully to determine whether you can give them less weight in
estimating the model, or truncate the outlying values to some acceptable threshold, or remove the
influential records completely.
Effects
This view displays the size of each effect in the model.
Styles. There are different display styles, which are accessible from the Style dropdown list.
• Diagram. This is a chart in which effects are sorted from top to bottom by decreasing predictor
importance. Connecting lines in the diagram are weighted based on effect significance, with greater line
width corresponding to more significant effects (smaller p-values). Hovering over a connecting line
reveals a tooltip that shows the p-value and importance of the effect. This is the default.
• Table. This is an ANOVA table for the overall model and the individual model effects. The individual
effects are sorted from top to bottom by decreasing predictor importance. Note that by default, the
table is collapsed to only show the results for the overall model. To see the results for the individual
model effects, click the Corrected Model cell in the table.
Predictor importance. There is a Predictor Importance slider that controls which predictors are shown in
the view. This does not change the model, but simply allows you to focus on the most important
predictors. By default, the top 10 effects are displayed.
Significance. There is a Significance slider that further controls which effects are shown in the view,
beyond those shown based on predictor importance. Effects with significance values greater than the
slider value are hidden. This does not change the model, but simply allows you to focus on the most
important effects. By default the value is 1.00, so that no effects are filtered based on significance.
Coefficients
This view displays the value of each coefficient in the model. Note that factors (categorical predictors) are
indicator-coded within the model, so that effects containing factors will generally have multiple
associated coefficients; one for each category except the category corresponding to the redundant
(reference) parameter.
Styles. There are different display styles, which are accessible from the Style dropdown list.
• Diagram. This is a chart which displays the intercept first, and then sorts effects from top to bottom by
decreasing predictor importance. Within effects containing factors, coefficients are sorted by ascending
order of data values. Connecting lines in the diagram are colored based on the sign of the coefficient
(see the diagram key) and weighted based on coefficient significance, with greater line width
corresponding to more significant coefficients (smaller p-values). Hovering over a connecting line
reveals a tooltip that shows the value of the coefficient, its p-value, and the importance of the effect the
parameter is associated with. This is the default style.
• Table. This shows the values, significance tests, and confidence intervals for the individual model
coefficients. After the intercept, the effects are sorted from top to bottom by decreasing predictor
importance. Within effects containing factors, coefficients are sorted by ascending order of data values.
Note that by default the table is collapsed to only show the coefficient, significance, and importance of
each model parameter. To see the standard error, t statistic, and confidence interval, click the
Coefficient cell in the table. Hovering over the name of a model parameter in the table reveals a tooltip
that shows the name of the parameter, the effect the parameter is associated with, and (for categorical
predictors), the value labels associated with the model parameter. This can be particularly useful to see
the new categories created when automatic data preparation merges similar categories of a categorical
predictor.
Predictor importance. There is a Predictor Importance slider that controls which predictors are shown in
the view. This does not change the model, but simply allows you to focus on the most important
predictors. By default, the top 10 effects are displayed.
Significance. There is a Significance slider that further controls which coefficients are shown in the view,
beyond those shown based on predictor importance. Coefficients with significance values greater than
the slider value are hidden. This does not change the model, but simply allows you to focus on the most
important coefficients. By default the value is 1.00, so that no coefficients are filtered based on
significance.
Estimated Means
These are charts displayed for significant predictors. The chart displays the model-estimated value of the
target on the vertical axis for each value of the predictor on the horizontal axis, holding all other
predictors constant. It provides a useful visualization of the effects of each predictor’s coefficients on the
target.
Note: if no predictors are significant, no estimated means are produced.
Model Building Summary
When a model selection algorithm other than None is chosen on the Model Selection settings, this
provides some details of the model building process.
Forward stepwise. When forward stepwise is the selection algorithm, the table displays the last 10 steps
in the stepwise algorithm. For each step, the value of the selection criterion and the effects in the model
at that step are shown. This gives you a sense of how much each step contributes to the model. Each
column allows you to sort the rows so that you can more easily see which effects are in the model at a
given step.
Best subsets. When best subsets is the selection algorithm, the table displays the top 10 models. For
each model, the value of the selection criterion and the effects in the model are shown. This gives you a
sense of the stability of the top models; if they tend to have many similar effects with a few differences,
then you can be fairly confident in the “top” model; if they tend to have very different effects, then some
of the effects may be too similar and should be combined (or one removed). Each column allows you to
sort the rows so that you can more easily see which effects are in a given model.
Linear Regression
Linear Regression estimates the coefficients of the linear equation, involving one or more independent
variables, that best predict the value of the dependent variable. For example, you can try to predict a
salesperson’s total yearly sales (the dependent variable) from independent variables such as age,
education, and years of experience.
Example. Is the number of games won by a basketball team in a season related to the average number of
points the team scores per game? A scatterplot indicates that these variables are linearly related. The
number of games won and the average number of points scored by the opponent are also linearly related.
These variables have a negative relationship. As the number of games won increases, the average number
of points scored by the opponent decreases. With linear regression, you can model the relationship of
these variables. A good model can be used to predict how many games teams will win.
Statistics. For each variable: number of valid cases, mean, and standard deviation. For each model:
regression coefficients, correlation matrix, part and partial correlations, multiple R, R², adjusted R²,
change in R², standard error of the estimate, analysis-of-variance table, predicted values, and residuals.
Also, 95%-confidence intervals for each regression coefficient, variance-covariance matrix, variance
inflation factor, tolerance, Durbin-Watson test, distance measures (Mahalanobis, Cook, and leverage
values), DfBeta, DfFit, prediction intervals, and casewise diagnostic information. Plots: scatterplots,
partial plots, histograms, and normal probability plots.
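For intuition about the core computation, the following minimal Python sketch fits the basketball
example above by ordinary least squares. It is illustrative only (not SPSS's implementation), and the
data values are hypothetical:

    import numpy as np

    points = np.array([95.0, 102.0, 99.0, 88.0, 110.0, 104.0])  # avg points scored per game
    wins = np.array([40.0, 55.0, 48.0, 33.0, 60.0, 52.0])       # games won

    X = np.column_stack([np.ones_like(points), points])  # design matrix with constant term
    b, *_ = np.linalg.lstsq(X, wins, rcond=None)          # b[0] = intercept, b[1] = slope
    predicted = X @ b
    residuals = wins - predicted
    r_squared = 1 - (residuals @ residuals) / ((wins - wins.mean()) @ (wins - wins.mean()))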
Linear Regression Data Considerations
Data. The dependent and independent variables should be quantitative. Categorical variables, such as
religion, major field of study, or region of residence, need to be recoded to binary (dummy) variables or
other types of contrast variables.
Assumptions. For each value of the independent variable, the distribution of the dependent variable must
be normal. The variance of the distribution of the dependent variable should be constant for all values of
the independent variable. The relationship between the dependent variable and each independent
variable should be linear, and all observations should be independent.
To Obtain a Linear Regression Analysis
1. From the menus choose:
Analyze > Regression > Linear…
2. In the Linear Regression dialog box, select a numeric dependent variable.
3. Select one or more numeric independent variables.
Optionally, you can:
• Group independent variables into blocks and specify different entry methods for different subsets of
variables.
• Choose a selection variable to limit the analysis to a subset of cases having a particular value(s) for this
variable.
• Select a case identification variable for identifying points on plots.
• Select a numeric WLS Weight variable for a weighted least squares analysis.
WLS. Allows you to obtain a weighted least-squares model. Data points are weighted by the reciprocal of
their variances. This means that observations with large variances have less impact on the analysis than
observations associated with small variances. If the value of the weighting variable is zero, negative, or
missing, the case is excluded from the analysis.
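As a rough illustration of the weighting just described, weighted least squares solves the normal
equations with each case weighted by the reciprocal of its variance. This is a sketch under textbook
definitions, not the procedure's actual code, and the variance values are hypothetical:

    import numpy as np

    X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 6.0], [1.0, 8.0]])  # constant + one predictor
    y = np.array([3.1, 5.2, 6.8, 9.1])
    variances = np.array([0.5, 1.0, 4.0, 0.5])      # hypothetical per-case variances

    w = 1.0 / variances                             # WLS weight: reciprocal of variance
    W = np.diag(w)
    b = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)   # high-variance cases count less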
Linear Regression Variable Selection Methods
Method selection allows you to specify how independent variables are entered into the analysis. Using
different methods, you can construct a variety of regression models from the same set of variables.
• Enter (Regression). A procedure for variable selection in which all variables in a block are entered in a
single step.
• Stepwise. At each step, the independent variable not in the equation that has the smallest probability of
F is entered, if that probability is sufficiently small. Variables already in the regression equation are
removed if their probability of F becomes sufficiently large. The method terminates when no more
variables are eligible for inclusion or removal.
• Remove. A procedure for variable selection in which all variables in a block are removed in a single step.
• Backward Elimination. A variable selection procedure in which all variables are entered into the
equation and then sequentially removed. The variable with the smallest partial correlation with the
dependent variable is considered first for removal. If it meets the criterion for elimination, it is removed.
After the first variable is removed, the variable remaining in the equation with the smallest partial
correlation is considered next. The procedure stops when there are no variables in the equation that
satisfy the removal criteria.
• Forward Selection. A stepwise variable selection procedure in which variables are sequentially entered
into the model. The first variable considered for entry into the equation is the one with the largest
positive or negative correlation with the dependent variable. This variable is entered into the equation
only if it satisfies the criterion for entry. If the first variable is entered, the independent variable not in
the equation that has the largest partial correlation is considered next. The procedure stops when there
are no variables that meet the entry criterion.
The significance values in your output are based on fitting a single model. Therefore, the significance
values are generally invalid when a stepwise method (stepwise, forward, or backward) is used.
All variables must pass the tolerance criterion to be entered in the equation, regardless of the entry
method specified. The default tolerance level is 0.0001. Also, a variable is not entered if it would cause
the tolerance of another variable already in the model to drop below the tolerance criterion.
All independent variables selected are added to a single regression model. However, you can specify
different entry methods for different subsets of variables. For example, you can enter one block of
variables into the regression model using stepwise selection and a second block using forward selection.
To add a second block of variables to the regression model, click Next.
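The selection methods above are easiest to see in code. The sketch below shows the general shape of
forward selection, choosing at each step the candidate variable that most reduces the error sum of
squares. It is a simplification (SPSS actually uses F probabilities and tolerance checks, which are
omitted here), and the names are illustrative:

    import numpy as np

    def sse(X, y):
        """Error sum of squares from a least-squares fit of y on X."""
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ b
        return float(r @ r)

    def forward_select(candidates, y, max_terms=3):
        """candidates: dict mapping variable name -> 1-D data column."""
        X = np.ones((len(y), 1))                  # start with the constant only
        remaining, selected = dict(candidates), []
        while remaining and len(selected) < max_terms:
            best = min(remaining,
                       key=lambda k: sse(np.column_stack([X, remaining[k]]), y))
            X = np.column_stack([X, remaining.pop(best)])
            selected.append(best)                 # entry criterion check omitted
        return selected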
Linear Regression Set Rule
Cases defined by the selection rule are included in the analysis. For example, if you select a variable,
choose equals, and type 5 for the value, then only cases for which the selected variable has a value equal
to 5 are included in the analysis. A string value is also permitted.
Linear Regression Plots
Plots can aid in the validation of the assumptions of normality, linearity, and equality of variances. Plots
are also useful for detecting outliers, unusual observations, and influential cases. After saving them as
new variables, predicted values, residuals, and other diagnostic information are available in the Data
Editor for constructing plots with the independent variables. The following plots are available:
Scatterplots. You can plot any two of the following: the dependent variable, standardized predicted
values, standardized residuals, deleted residuals, adjusted predicted values, Studentized residuals, or
Studentized deleted residuals. Plot the standardized residuals against the standardized predicted values
to check for linearity and equality of variances.
Source variable list. Lists the dependent variable (DEPENDNT) and the following predicted and residual
variables: Standardized predicted values (*ZPRED), Standardized residuals (*ZRESID), Deleted residuals
(*DRESID), Adjusted predicted values (*ADJPRED), Studentized residuals (*SRESID), Studentized deleted
residuals (*SDRESID).
Produce all partial plots. Displays scatterplots of residuals of each independent variable and the
residuals of the dependent variable when both variables are regressed separately on the rest of the
independent variables. At least two independent variables must be in the equation for a partial plot to be
produced.
Standardized Residual Plots. You can obtain histograms of standardized residuals and normal
probability plots comparing the distribution of standardized residuals to a normal distribution.
If any plots are requested, summary statistics are displayed for standardized predicted values and
standardized residuals (*ZPRED and *ZRESID).
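As a sketch of the standardized quantities used in these plots (hypothetical data; the definitions
follow common conventions, which may differ from SPSS's exact computations in small details):

    import numpy as np

    y = np.array([3.0, 5.0, 7.0, 6.0, 9.0])
    pred = np.array([3.4, 4.8, 6.9, 6.2, 8.7])        # model-predicted values
    resid = y - pred

    zpred = (pred - pred.mean()) / pred.std(ddof=1)   # *ZPRED analogue
    zresid = resid / resid.std(ddof=1)                # *ZRESID analogue
    # Plot zresid against zpred: a patternless band suggests linearity and
    # constant variance; funnels or curves suggest assumption violations.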
Linear Regression: Saving New Variables
You can save predicted values, residuals, and other statistics useful for diagnostic information. Each
selection adds one or more new variables to your active data file.
Predicted Values. Values that the regression model predicts for each case.
• Unstandardized. The value the model predicts for the dependent variable.
• Standardized. A transformation of each predicted value into its standardized form. That is, the mean
predicted value is subtracted from the predicted value, and the difference is divided by the standard
deviation of the predicted values. Standardized predicted values have a mean of 0 and a standard
deviation of 1.
• Adjusted. The predicted value for a case when that case is excluded from the calculation of the
regression coefficients.
• S.E. of mean predictions. Standard errors of the predicted values. An estimate of the standard deviation
of the average value of the dependent variable for cases that have the same values of the independent
variables.
Distances. Measures to identify cases with unusual combinations of values for the independent variables
and cases that may have a large impact on the regression model.
• Mahalanobis. A measure of how much a case’s values on the independent variables differ from the
average of all cases. A large Mahalanobis distance identifies a case as having extreme values on one or
more of the independent variables.
• Cook’s. A measure of how much the residuals of all cases would change if a particular case were
excluded from the calculation of the regression coefficients. A large Cook’s D indicates that excluding a
case from computation of the regression statistics changes the coefficients substantially.
• Leverage values. Measures the influence of a point on the fit of the regression. The centered leverage
ranges from 0 (no influence on the fit) to (N-1)/N.
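These measures are all functions of the hat (projection) matrix. Here is a compact sketch under the
usual textbook definitions (which agree with SPSS up to details such as centering; the data are
hypothetical, with one deliberately extreme case):

    import numpy as np

    X = np.column_stack([np.ones(5), [2.0, 4.0, 6.0, 8.0, 30.0]])  # last case is extreme
    y = np.array([3.0, 5.0, 7.0, 9.0, 12.0])

    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    n, p = X.shape
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)   # leverage values (hat diagonal)
    mse = (e @ e) / (n - p)
    mahalanobis = (n - 1) * (h - 1.0 / n)           # Mahalanobis distance from centered leverage
    cooks_d = (e**2 / (p * mse)) * h / (1 - h)**2   # Cook's distance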
Prediction Intervals. The upper and lower bounds for both mean and individual prediction intervals.
• Mean. Lower and upper bounds (two variables) for the prediction interval of the mean predicted
response.
• Individual. Lower and upper bounds (two variables) for the prediction interval of the dependent variable
for a single case.
• Confidence Interval. Enter a value between 1 and 99.99 to specify the confidence level for the two
Prediction Intervals. Mean or Individual must be selected before entering this value. Typical confidence
interval values are 90, 95, and 99.
Residuals. The actual value of the dependent variable minus the value predicted by the regression
equation.
• Unstandardized. The difference between an observed value and the value predicted by the model.
• Standardized. The residual divided by an estimate of its standard deviation. Standardized residuals,
which are also known as Pearson residuals, have a mean of 0 and a standard deviation of 1.
• Studentized. The residual divided by an estimate of its standard deviation that varies from case to case,
depending on the distance of each case’s values on the independent variables from the means of the
independent variables.
• Deleted. The residual for a case when that case is excluded from the calculation of the regression
coefficients. It is the difference between the value of the dependent variable and the adjusted predicted
value.
• Studentized deleted. The deleted residual for a case divided by its standard error. The difference
between a Studentized deleted residual and its associated Studentized residual indicates how much
difference eliminating a case makes on its own prediction.
Influence Statistics. The change in the regression coefficients (DfBeta[s]) and predicted values (DfFit)
that results from the exclusion of a particular case. Standardized DfBetas and DfFit values are also
available along with the covariance ratio.
• DfBeta(s). The difference in beta value is the change in the regression coefficient that results from the
exclusion of a particular case. A value is computed for each term in the model, including the constant.
• Standardized DfBeta. Standardized difference in beta value. The change in the regression coefficient
that results from the exclusion of a particular case. You may want to examine cases with absolute
values greater than 2 divided by the square root of N, where N is the number of cases. A value is
computed for each term in the model, including the constant.
• DfFit. The difference in fit value is the change in the predicted value that results from the exclusion of a
particular case.
• Standardized DfFit. Standardized difference in fit value. The change in the predicted value that results
from the exclusion of a particular case. You may want to examine standardized values which in absolute
value exceed 2 times the square root of p/N, where p is the number of parameters in the model and N is
the number of cases.
• Covariance ratio. The ratio of the determinant of the covariance matrix with a particular case excluded
from the calculation of the regression coefficients to the determinant of the covariance matrix with all
cases included. If the ratio is close to 1, the case does not significantly alter the covariance matrix.
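Conceptually, each of these statistics compares the full-data fit with the fit that excludes one case.
A brute-force sketch makes that definition concrete (hypothetical data; SPSS computes these from
closed-form leave-one-out formulas rather than by refitting):

    import numpy as np

    def fit(X, y):
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        return b

    X = np.column_stack([np.ones(5), [2.0, 4.0, 6.0, 8.0, 30.0]])
    y = np.array([3.0, 5.0, 7.0, 9.0, 12.0])
    b_full = fit(X, y)

    dfbetas, dffits = [], []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        b_i = fit(X[keep], y[keep])
        dfbetas.append(b_full - b_i)           # change in each coefficient, case i excluded
        dffits.append(X[i] @ (b_full - b_i))   # change in case i's predicted value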
Coefficient Statistics. Saves regression coefficients to a dataset or a data file. Datasets are available for
subsequent use in the same session but are not saved as files unless explicitly saved prior to the end of
the session. Dataset names must conform to variable naming rules.
Export model information to XML file. Parameter estimates and (optionally) their covariances are
exported to the specified file in XML (PMML) format. You can use this model file to apply the model
information to other data files for scoring purposes.
Linear Regression Statistics
The following statistics are available:
Regression Coefficients. Estimates displays Regression coefficient B, standard error of B, standardized
coefficient beta, t value for B, and two-tailed significance level of t. Confidence intervals displays
confidence intervals with the specified level of confidence for each regression coefficient or a covariance
matrix. Covariance matrix displays a variance-covariance matrix of regression coefficients with
covariances off the diagonal and variances on the diagonal. A correlation matrix is also displayed.
Model fit. The variables entered and removed from the model are listed, and the following goodness-of-
fit statistics are displayed: multiple R, R² and adjusted R², standard error of the estimate, and an
analysis-of-variance table.
R squared change. The change in the R² statistic that is produced by adding or deleting an independent
variable. If the R² change associated with a variable is large, the variable is a good predictor of the
dependent variable.
Descriptives. Provides the number of valid cases, the mean, and the standard deviation for each variable
in the analysis. A correlation matrix with a one-tailed significance level and the number of cases for each
correlation are also displayed.
Partial Correlation. The correlation that remains between two variables after removing the correlation that
is due to their mutual association with the other variables. The correlation between the dependent
variable and an independent variable when the linear effects of the other independent variables in the
model have been removed from both.
Part Correlation. The correlation between the dependent variable and an independent variable when the
linear effects of the other independent variables in the model have been removed from the independent
variable. It is related to the change in R-squared when a variable is added to an equation. Sometimes
called the semipartial correlation.
Collinearity diagnostics. Collinearity (or multicollinearity) is the undesirable situation when one
independent variable is a linear function of other independent variables. Eigenvalues of the scaled and
uncentered cross-products matrix, condition indices, and variance-decomposition proportions are
displayed along with variance inflation factors (VIF) and tolerances for individual variables.
Residuals. Displays the Durbin-Watson test for serial correlation of the residuals and casewise diagnostic
information for the cases meeting the selection criterion (outliers above n standard deviations).
Linear Regression Options
The following options are available:
Stepping Method Criteria. These options apply when either the forward, backward, or stepwise variable
selection method has been specified. Variables can be entered or removed from the model depending on
either the significance (probability) of the F value or the F value itself.
• Use Probability of F. A variable is entered into the model if the significance level of its F value is less than
the Entry value and is removed if the significance level is greater than the Removal value. Entry must be
less than Removal, and both values must be positive. To enter more variables into the model, increase
the Entry value. To remove more variables from the model, lower the Removal value.
• Use F Value. A variable is entered into the model if its F value is greater than the Entry value and is
removed if the F value is less than the Removal value. Entry must be greater than Removal, and both
values must be positive. To enter more variables into the model, lower the Entry value. To remove more
variables from the model, increase the Removal value.
Include constant in equation. By default, the regression model includes a constant term. Deselecting
this option forces regression through the origin, which is rarely done. Some results of regression through
the origin are not comparable to results of regression that do include a constant. For example, R² cannot
be interpreted in the usual way.
Missing Values. You can choose one of the following:
• Exclude cases listwise. Only cases with valid values for all variables are included in the analyses.
• Exclude cases pairwise. Cases with complete data for the pair of variables being correlated are used to
compute the correlation coefficient on which the regression analysis is based. Degrees of freedom are
based on the minimum pairwise N.
• Replace with mean. All cases are used for computations, with the mean of the variable substituted for
missing observations.
REGRESSION Command Additional Features
The command syntax language also allows you to:
• Write a correlation matrix or read a matrix in place of raw data to obtain your regression analysis (with
the MATRIX subcommand).
• Specify tolerance levels (with the CRITERIA subcommand).
• Obtain multiple models for the same or different dependent variables (with the METHOD and
DEPENDENT subcommands).
• Obtain additional statistics (with the DESCRIPTIVES and STATISTICS subcommands).
See the Command Syntax Reference for complete syntax information.
Ordinal Regression
Ordinal Regression allows you to model the dependence of a polytomous ordinal response on a set of
predictors, which can be factors or covariates. The design of Ordinal Regression is based on the
methodology of McCullagh (1980, 1998), and the procedure is referred to as PLUM in the syntax.
Standard linear regression analysis involves minimizing the sum-of-squared differences between a
response (dependent) variable and a weighted combination of predictor (independent) variables. The
estimated coefficients reflect how changes in the predictors affect the response. The response is
assumed to be numerical, in the sense that changes in the level of the response are equivalent
throughout the range of the response. For example, the difference in height between a person who is 150
cm tall and a person who is 140 cm tall is 10 cm, which has the same meaning as the difference in height
between a person who is 210 cm tall and a person who is 200 cm tall. These relationships do not
necessarily hold for ordinal variables, in which the choice and number of response categories can be quite
arbitrary.
Example. Ordinal Regression could be used to study patient reaction to drug dosage. The possible
reactions may be classified as none, mild, moderate, or severe. The difference between a mild and
moderate reaction is difficult or impossible to quantify and is based on perception. Moreover, the
difference between a mild and moderate response may be greater or less than the difference between a
moderate and severe response.
Statistics and plots. Observed and expected frequencies and cumulative frequencies, Pearson residuals
for frequencies and cumulative frequencies, observed and expected probabilities, observed and expected
cumulative probabilities of each response category by covariate pattern, asymptotic correlation and
covariance matrices of parameter estimates, Pearson’s chi-square and likelihood-ratio chi-square,
goodness-of-fit statistics, iteration history, test of parallel lines assumption, parameter estimates,
standard errors, confidence intervals, and Cox and Snell’s, Nagelkerke’s, and McFadden’s R² statistics.
Ordinal Regression Data Considerations
Data. The dependent variable is assumed to be ordinal and can be numeric or string. The ordering is
determined by sorting the values of the dependent variable in ascending order. The lowest value defines
the first category. Factor variables are assumed to be categorical. Covariate variables must be numeric.
Note that using more than one continuous covariate can easily result in the creation of a very large cell
probabilities table.
Assumptions. Only one response variable is allowed, and it must be specified. Also, for each distinct
pattern of values across the independent variables, the responses are assumed to be independent
multinomial variables.
Related procedures. Nominal logistic regression uses similar models for nominal dependent variables.
Obtaining an Ordinal Regression
1. From the menus choose:
Analyze > Regression > Ordinal…
2. Select one dependent variable.
3. Click OK.
Ordinal Regression Options
The Options dialog box allows you to adjust parameters used in the iterative estimation algorithm, choose
a level of confidence for your parameter estimates, and select a link function.
Iterations. You can customize the iterative algorithm.
• Maximum iterations. Specify a non-negative integer. If 0 is specified, the procedure returns the initial
estimates.
• Maximum step-halving. Specify a positive integer.
• Log-likelihood convergence. The algorithm stops if the absolute or relative change in the log-likelihood
is less than this value. The criterion is not used if 0 is specified.
• Parameter convergence. The algorithm stops if the absolute or relative change in each of the
parameter estimates is less than this value. The criterion is not used if 0 is specified.
Confidence interval. Specify a value greater than or equal to 0 and less than 100.
Delta. The value added to zero cell frequencies. Specify a non-negative value less than 1.
Singularity tolerance. Used for checking for highly dependent predictors. Select a value from the list of
options.
Link function. The link function is a transformation of the cumulative probabilities that allows estimation
of the model. The following five link functions are available.
• Logit. f(x) = log(x/(1−x)). Typically used for evenly distributed categories.
• Complementary log-log. f(x) = log(−log(1−x)). Typically used when higher categories are more probable.
• Negative log-log. f(x) = −log(−log(x)). Typically used when lower categories are more probable.
• Probit. f(x) = Φ⁻¹(x), where Φ⁻¹ is the inverse standard normal cumulative distribution function.
Typically used when the latent variable is normally distributed.
• Cauchit (inverse Cauchy). f(x) = tan(π(x−0.5)). Typically used when the latent variable has many
extreme values.
Ordinal Regression Output
The Output dialog box allows you to produce tables for display in the Viewer and save variables to the
working file.
Display. Produces tables for:
• Print iteration history for every n step(s). The log-likelihood and parameter estimates are printed for
the print iteration frequency specified. The first and last iterations are always printed.
• Goodness of fit statistics. The Pearson and likelihood-ratio chi-square statistics. They are computed
based on the classification specified in the variable list.
• Summary statistics. Cox and Snell’s, Nagelkerke’s, and McFadden’s R² statistics.
• Parameter estimates. Parameter estimates, standard errors, and confidence intervals.
• Asymptotic correlation of parameter estimates. Matrix of parameter estimate correlations.
• Asymptotic covariance of parameter estimates. Matrix of parameter estimate covariances.
• Cell information. Observed and expected frequencies and cumulative frequencies, Pearson residuals
for frequencies and cumulative frequencies, observed and expected probabilities, and observed and
expected cumulative probabilities of each response category by covariate pattern. Note that for models
with many covariate patterns (for example, models with continuous covariates), this option can
generate a very large, unwieldy table.
• Test of parallel lines. Test of the hypothesis that the location parameters are equivalent across the
levels of the dependent variable. This is available only for the location-only model.
Saved Variables. Saves the following variables to the working file:
• Estimated response probabilities. Model-estimated probabilities of classifying a factor/covariate
pattern into the response categories. There are as many probabilities as the number of response
categories.
• Predicted category. The response category that has the maximum estimated probability for a factor/
covariate pattern.
• Predicted category probability. Estimated probability of classifying a factor/covariate pattern into the
predicted category. This probability is also the maximum of the estimated probabilities of the factor/
covariate pattern.
• Actual category probability. Estimated probability of classifying a factor/covariate pattern into the
actual category.
Print Log-Likelihood. Controls the display of the log-likelihood. Including multinomial constant gives
you the full value of the likelihood. To compare your results across products that do not include the
constant, you can choose to exclude it.
Ordinal Regression Location Model
The Location dialog box allows you to specify the location model for your analysis.
Specify model. A main-effects model contains the covariate and factor main effects but no interaction
effects. You can create a custom model to specify subsets of factor interactions or covariate interactions.
Factors/covariates. The factors and covariates are listed.
Location model. The model depends on the main effects and interaction effects that you select.
For the selected factors and covariates:
Interaction
Creates the highest-level interaction term of all selected variables. This is the default.
Main effects
Creates a main-effects term for each variable selected.
All 2-way
Creates all possible two-way interactions of the selected variables.
All 3-way
Creates all possible three-way interactions of the selected variables.
All 4-way
Creates all possible four-way interactions of the selected variables.
All 5-way
Creates all possible five-way interactions of the selected variables.
Build Terms and Custom Terms
Build terms
Use this choice when you want to include non-nested terms of a certain type (such as main effects)
for all combinations of a selected set of factors and covariates.
Build custom terms
Use this choice when you want to include nested terms or when you want to explicitly build any term
variable by variable.
Ordinal Regression Scale Model
The Scale dialog box allows you to specify the scale model for your analysis.
Factors/covariates. The factors and covariates are listed.
Scale model. The model depends on the main and interaction effects that you select.
For the selected factors and covariates:
Interaction
Creates the highest-level interaction term of all selected variables. This is the default.
Main effects
Creates a main-effects term for each variable selected.
All 2-way
Creates all possible two-way interactions of the selected variables.
All 3-way
Creates all possible three-way interactions of the selected variables.
All 4-way
Creates all possible four-way interactions of the selected variables.
All 5-way
Creates all possible five-way interactions of the selected variables.
Build Terms and Custom Terms
Build terms
Use this choice when you want to include non-nested terms of a certain type (such as main effects)
for all combinations of a selected set of factors and covariates.
Build custom terms
Use this choice when you want to include nested terms or when you want to explicitly build any term
variable by variable.
PLUM Command Additional Features
You can customize your Ordinal Regression if you paste your selections into a syntax window and edit the
resulting PLUM command syntax. The command syntax language also allows you to:
• Create customized hypothesis tests by specifying null hypotheses as linear combinations of
parameters.
See the Command Syntax Reference for complete syntax information.
Curve Estimation
The Curve Estimation procedure produces curve estimation regression statistics and related plots for 11
different curve estimation regression models. A separate model is produced for each dependent variable.
You can also save predicted values, residuals, and prediction intervals as new variables.
Example. An Internet service provider tracks the percentage of virus-infected e-mail traffic on its
networks over time. A scatterplot reveals that the relationship is nonlinear. You might fit a quadratic or
cubic model to the data and check the validity of assumptions and the goodness of fit of the model.
Statistics. For each model: regression coefficients, multiple R, R², adjusted R², standard error of the
estimate, analysis-of-variance table, predicted values, residuals, and prediction intervals. Models: linear,
logarithmic, inverse, quadratic, cubic, power, compound, S-curve, logistic, growth, and exponential.
Curve Estimation Data Considerations
Data. The dependent and independent variables should be quantitative. If you select Time from the
active dataset as the independent variable (instead of selecting a variable), the Curve Estimation
procedure generates a time variable where the length of time between cases is uniform. If Time is
selected, the dependent variable should be a time-series measure. Time-series analysis requires a data
file structure in which each case (row) represents a set of observations at a different time and the length
of time between cases is uniform.
Assumptions. Screen your data graphically to determine how the independent and dependent variables
are related (linearly, exponentially, etc.). The residuals of a good model should be randomly distributed
and normal. If a linear model is used, the following assumptions should be met: For each value of the
independent variable, the distribution of the dependent variable must be normal. The variance of the
distribution of the dependent variable should be constant for all values of the independent variable. The
relationship between the dependent variable and the independent variable should be linear, and all
observations should be independent.
To Obtain a Curve Estimation
1. From the menus choose:
Analyze > Regression > Curve Estimation…
2. Select one or more dependent variables. A separate model is produced for each dependent variable.
3. Select an independent variable (either select a variable in the active dataset or select Time).
4. Optionally:
• Select a variable for labeling cases in scatterplots. For each point in the scatterplot, you can use the
Point Selection tool to display the value of the Case Label variable.
• Click Save to save predicted values, residuals, and prediction intervals as new variables.
The following options are also available:
• Include constant in equation. Estimates a constant term in the regression equation. The constant is
included by default.
• Plot models. Plots the values of the dependent variable and each selected model against the
independent variable. A separate chart is produced for each dependent variable.
• Display ANOVA table. Displays a summary analysis-of-variance table for each selected model.
Curve Estimation Models
You can choose one or more curve estimation regression models. To determine which model to use, plot
your data. If your variables appear to be related linearly, use a simple linear regression model. When your
variables are not linearly related, try transforming your data. When a transformation does not help, you
may need a more complicated model. View a scatterplot of your data; if the plot resembles a
mathematical function you recognize, fit your data to that type of model. For example, if your data
resemble an exponential function, use an exponential model.
Linear. Model whose equation is Y = b0 + (b1 * t). The series values are modeled as a linear function of
time.
Logarithmic. Model whose equation is Y = b0 + (b1 * ln(t)).
Inverse. Model whose equation is Y = b0 + (b1 / t).
Quadratic. Model whose equation is Y = b0 + (b1 * t) + (b2 * t**2). The quadratic model can be used to
model a series that “takes off” or a series that dampens.
Cubic. Model that is defined by the equation Y = b0 + (b1 * t) + (b2 * t**2) + (b3 * t**3).
Power. Model whose equation is Y = b0 * (t**b1) or ln(Y) = ln(b0) + (b1 * ln(t)).
Compound. Model whose equation is Y = b0 * (b1**t) or ln(Y) = ln(b0) + (ln(b1) * t).
S-curve. Model whose equation is Y = e**(b0 + (b1/t)) or ln(Y) = b0 + (b1/t).
Logistic. Model whose equation is Y = 1 / (1/u + (b0 * (b1**t))) or ln(1/Y − 1/u) = ln(b0) + (ln(b1) * t), where u
is the upper boundary value. After selecting Logistic, specify the upper boundary value to use in the
regression equation. The value must be a positive number that is greater than the largest dependent
variable value.
Growth. Model whose equation is Y = e**(b0 + (b1 * t)) or ln(Y) = b0 + (b1 * t).
Exponential. Model whose equation is Y = b0 * (e**(b1 * t)) or ln(Y) = ln(b0) + (b1 * t).
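Several of these models are linear after a transformation, which is how the alternate ln(Y) forms
above arise. The following sketch fits the exponential model by regressing ln(Y) on t and then
back-transforming (the data values are hypothetical):

    import numpy as np

    t = np.arange(1.0, 9.0)
    y = np.array([2.1, 2.9, 4.2, 6.0, 8.5, 12.3, 17.4, 25.0])  # hypothetical series

    b1, ln_b0 = np.polyfit(t, np.log(y), 1)   # fits ln(Y) = ln(b0) + (b1 * t)
    b0 = np.exp(ln_b0)
    fitted = b0 * np.exp(b1 * t)              # back-transform to Y = b0 * (e**(b1 * t))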
Curve Estimation Save
Save Variables. For each selected model, you can save predicted values, residuals (observed value of the
dependent variable minus the model predicted value), and prediction intervals (upper and lower bounds).
The new variable names and descriptive labels are displayed in a table in the output window.
Predict Cases. In the active dataset, if you select Time instead of a variable as the independent variable,
you can specify a forecast period beyond the end of the time series. You can choose one of the following
alternatives:
• Predict from estimation period through last case. Predicts values for all cases in the file, based on the
cases in the estimation period. The estimation period, displayed at the bottom of the dialog box, is
defined with the Range subdialog box of the Select Cases option on the Data menu. If no estimation
period has been defined, all cases are used to predict values.
• Predict through. Predicts values through the specified date, time, or observation number, based on the
cases in the estimation period. This feature can be used to forecast values beyond the last case in the
time series. The currently defined date variables determine what text boxes are available for specifying
the end of the prediction period. If there are no defined date variables, you can specify the ending
observation (case) number.
Use the Define Dates option on the Data menu to create date variables.
Partial Least Squares Regression
The Partial Least Squares Regression procedure estimates partial least squares (PLS, also known as
“projection to latent structure”) regression models. PLS is a predictive technique that is an alternative to
ordinary least squares (OLS) regression, canonical correlation, or structural equation modeling, and it is
particularly useful when predictor variables are highly correlated or when the number of predictors
exceeds the number of cases.
PLS combines features of principal components analysis and multiple regression. It first extracts a set of
latent factors that explain as much of the covariance as possible between the independent and
dependent variables. Then a regression step predicts values of the dependent variables using the
decomposition of the independent variables.
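Outside of SPSS, the same idea can be sketched with scikit-learn's PLSRegression. This is an
independent implementation shown only to illustrate the latent-factor idea, and the data below are
synthetic, with deliberately near-collinear predictors:

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 6))
    X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=50)      # nearly collinear predictors
    y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=50)

    pls = PLSRegression(n_components=3).fit(X, y)       # extract 3 latent factors
    y_hat = pls.predict(X)                              # regression step on the factors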
Tables
Proportion of variance explained (by latent factor), latent factor weights, latent factor loadings,
independent variable importance in projection (VIP), and regression parameter estimates (by
dependent variable) are all produced by default.
Charts
Variable importance in projection (VIP), factor scores, factor weights for the first three latent factors,
and distance to the model are all produced from the Options tab.
Data considerations
Measurement level
The dependent and independent (predictor) variables can be scale, nominal, or ordinal. The
procedure assumes that the appropriate measurement level has been assigned to all variables,
although you can temporarily change the measurement level for a variable by right-clicking the
variable in the source variable list and selecting a measurement level from the pop-up menu.
Categorical (nominal or ordinal) variables are treated equivalently by the procedure.
Categorical variable coding
The procedure temporarily recodes categorical dependent variables using one-of-c coding for the
duration of the procedure. If there are c categories of a variable, then the variable is stored as c
vectors, with the first category denoted (1,0,…,0), the next category (0,1,0,…,0), …, and the final
category (0,0,…,0,1). Categorical dependent variables are represented using dummy coding; that is,
the indicator corresponding to the reference category is omitted (see the sketch after these data
considerations).
Frequency weights
Weight values are rounded to the nearest whole number before use. Cases with missing weights or
weights less than 0.5 are not used in the analyses.
Missing values
User- and system-missing values are treated as invalid.
Rescaling
All model variables are centered and standardized, including indicator variables representing
categorical variables.
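A sketch of the one-of-c and dummy codings described above (illustrative only; the category values
are hypothetical):

    categories = ["none", "mild", "moderate", "severe"]   # hypothetical categories

    def one_of_c(value):
        """One-of-c coding: one indicator per category."""
        return [1 if value == c else 0 for c in categories]

    def dummy(value, reference="none"):
        """Dummy coding: drop the indicator for the reference category."""
        return [1 if value == c else 0 for c in categories if c != reference]

    one_of_c("mild")   # -> [0, 1, 0, 0]
    dummy("mild")      # -> [1, 0, 0]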
Obtaining Partial Least Squares Regression
From the menus choose:
Analyze > Regression > Partial Least Squares…
1. Select at least one dependent variable.
2. Select at least one independent variable.
Optionally, you can:
• Specify a reference category for categorical (nominal or ordinal) dependent variables.
• Specify a variable to be used as a unique identifier for casewise output and saved datasets.
• Specify an upper limit on the number of latent factors to be extracted.
Prerequisites
The Partial Least Squares Regression procedure is a Python extension command and requires IBM SPSS
Statistics – Essentials for Python, which is installed by default with your IBM SPSS Statistics product. It
also requires the NumPy and SciPy Python libraries, which are freely available.
Note: For users working in distributed analysis mode (requires IBM SPSS Statistics Server), NumPy and
SciPy must be installed on the server. Contact your system administrator for assistance.
Windows and Mac Users
For Windows and Mac, NumPy and SciPy must be installed to a separate version of Python 3.8 from
the version that is installed with IBM SPSS Statistics. If you do not have a separate version of Python
3.8, you can download it from http://www.python.org. Then, install NumPy and SciPy for Python
version 3.8. The installers are available from http://www.scipy.org/Download.
To enable use of NumPy and SciPy, you must set your Python location to the version of Python 3.8
where you installed NumPy and SciPy. The Python location is set from the File Locations tab in the
Options dialog (Edit > Options).
Linux Users
We suggest that you download the source and build NumPy and SciPy yourself. The source is
available from http://www.scipy.org/Download. You can install NumPy and SciPy to the version of
Python 3.8 that is installed with IBM SPSS Statistics. It is in the Python directory under the location
where IBM SPSS Statistics is installed.
If you choose to install NumPy and SciPy to a version of Python 3.8 other than the version that is
installed with IBM SPSS Statistics, then you must set your Python location to point to that version. The
Python location is set from the File Locations tab in the Options dialog (Edit > Options).
Windows and Unix Server
NumPy and SciPy must be installed, on the server, to a separate version of Python 3.8 from the
version that is installed with IBM SPSS Statistics. If there is not a separate version of Python 3.8 on
the server, then it can be downloaded from http://www.python.org. NumPy and SciPy for Python 3.8
are available from http://www.scipy.org/Download. To enable use of NumPy and SciPy, the Python
location for the server must be set to the version of Python 3.8 where NumPy and SciPy are installed.
The Python location is set from the IBM SPSS Statistics Administration Console.
Model
Specify Model Effects. A main-effects model contains all factor and covariate main effects. Select
Custom to specify interactions. You must indicate all of the terms to be included in the model.
Factors and Covariates. The factors and covariates are listed.
Model. The model depends on the nature of your data. After selecting Custom, you can select the main
effects and interactions that are of interest in your analysis.
Build Terms
For the selected factors and covariates:
Interaction. Creates the highest-level interaction term of all selected variables. This is the default.
Main effects. Creates a main-effects term for each variable selected.
All 2-way. Creates all possible two-way interactions of the selected variables.
All 3-way. Creates all possible three-way interactions of the selected variables.
All 4-way. Creates all possible four-way interactions of the selected variables.
All 5-way. Creates all possible five-way interactions of the selected variables.
Options
The Options tab allows the user to save and plot model estimates for individual cases, latent factors, and
predictors.
For each type of data, specify the name of a dataset. The dataset names must be unique. If you specify
the name of an existing dataset, its contents are replaced; otherwise, a new dataset is created.
• Save estimates for individual cases. Saves the following casewise model estimates: predicted values,
residuals, distance to latent factor model, and latent factor scores. It also plots latent factor scores.
• Save estimates for latent factors. Saves latent factor loadings and latent factor weights. It also plots
latent factor weights.
• Save estimates for independent variables. Saves regression parameter estimates and variable
importance to projection (VIP). It also plots VIP by latent factor.
Nearest Neighbor Analysis
Nearest Neighbor Analysis is a method for classifying cases based on their similarity to other cases. In
machine learning, it was developed as a way to recognize patterns of data without requiring an exact
match to any stored patterns, or cases. Similar cases are near each other and dissimilar cases are distant
from each other. Thus, the distance between two cases is a measure of their dissimilarity.
Cases that are near each other are said to be “neighbors.” When a new case (holdout) is presented, its
distance from each of the cases in the model is computed. The classifications of the most similar cases –
the nearest neighbors – are tallied and the new case is placed into the category that contains the greatest
number of nearest neighbors.
You can specify the number of nearest neighbors to examine; this value is called k.
Nearest neighbor analysis can also be used to compute values for a continuous target. In this situation,
the average or median target value of the nearest neighbors is used to obtain the predicted value for the
new case.
Nearest Neighbor Analysis Data Considerations
Target and features. The target and features can be:
• Nominal. A variable can be treated as nominal when its values represent categories with no intrinsic
ranking (for example, the department of the company in which an employee works). Examples of
nominal variables include region, postal code, and religious affiliation.
• Ordinal. A variable can be treated as ordinal when its values represent categories with some intrinsic
ranking (for example, levels of service satisfaction from highly dissatisfied to highly satisfied). Examples
of ordinal variables include attitude scores representing degree of satisfaction or confidence and
preference rating scores.
• Scale. A variable can be treated as scale (continuous) when its values represent ordered categories with
a meaningful metric, so that distance comparisons between values are appropriate. Examples of scale
variables include age in years and income in thousands of dollars.
Nominal and Ordinal variables are treated equivalently by Nearest Neighbor Analysis. The procedure
assumes that the appropriate measurement level has been assigned to each variable; however, you can
temporarily change the measurement level for a variable by right-clicking the variable in the source
variable list and selecting a measurement level from the pop-up menu.
An icon next to each variable in the variable list identifies the measurement level and data type:
[Table 1. Measurement level icons: the table shows the icon for each combination of measurement level
(scale, ordinal, nominal) and data type (numeric, string, date, time); a scale measurement level is not
applicable (n/a) for string variables.]
Categorical variable coding. The procedure temporarily recodes categorical predictors and dependent
variables using one-of-c coding for the duration of the procedure. If there are c categories of a variable,
then the variable is stored as c vectors, with the first category denoted (1,0,…,0), the next category
(0,1,0,…,0), …, and the final category (0,0,…,0,1).
This coding scheme increases the dimensionality of the feature space. In particular, the total number of
dimensions is the number of scale predictors plus the number of categories across all categorical
predictors. As a result, this coding scheme can lead to slower training. If your nearest neighbors training
is proceeding very slowly, you might try reducing the number of categories in your categorical predictors
by combining similar categories or dropping cases that have extremely rare categories before running the
procedure.
All one-of-c coding is based on the training data, even if a holdout sample is defined (see “Partitions ” on
page 104). Thus, if the holdout sample contains cases with predictor categories that are not present in
the training data, then those cases are not scored. If the holdout sample contains cases with dependent
variable categories that are not present in the training data, then those cases are scored.
Rescaling. Scale features are normalized by default. All rescaling is performed based on the training data,
even if a holdout sample is defined (see “Partitions ” on page 104). If you specify a variable to define
partitions, it is important that the features have similar distributions across the training and holdout
samples. Use, for example, the Explore procedure to examine the distributions across partitions.
Frequency weights. Frequency weights are ignored by this procedure.
Replicating results. The procedure uses random number generation during random assignment of
partitions and cross-validation folds. If you want to replicate your results exactly, in addition to using the
same procedure settings, set a seed for the Mersenne Twister (see “Partitions ” on page 104), or use
variables to define partitions and cross-validation folds.
To obtain a nearest neighbor analysis
From the menus choose:
Analyze > Classify > Nearest Neighbor…
1. Specify one or more features, which can be thought of as independent variables or predictors if there
is a target.
Target (optional). If no target (dependent variable or response) is specified, then the procedure finds
the k nearest neighbors only – no classification or prediction is done.
Normalize scale features. Normalized features have the same range of values, which can improve the
performance of the estimation algorithm. Adjusted normalization, [2*(x−min)/(max−min)]−1, is used.
Adjusted normalized values fall between −1 and 1.
Focal case identifier (optional). This allows you to mark cases of particular interest. For example, a
researcher wants to determine whether the test scores from one school district – the focal case – are
comparable to those from similar school districts. He uses nearest neighbor analysis to find the school
districts that are most similar with respect to a given set of features. Then he compares the test scores
from the focal school district to those from the nearest neighbors.
Focal cases could also be used in clinical studies to select control cases that are similar to clinical
cases. Focal cases are displayed in the k nearest neighbors and distances table, feature space chart,
peers chart, and quadrant map. Information on focal cases is saved to the files specified on the Output
tab.
Cases with a positive value on the specified variable are treated as focal cases. It is invalid to specify a
variable with no positive values.
Case label (optional). Cases are labeled using these values in the feature space chart, peers chart, and
quadrant map.
Fields with unknown measurement level
The Measurement Level alert is displayed when the measurement level for one or more variables (fields)
in the dataset is unknown. Since measurement level affects the computation of results for this procedure,
all variables must have a defined measurement level.
Scan Data. Reads the data in the active dataset and assigns default measurement level to any fields with
a currently unknown measurement level. If the dataset is large, that may take some time.
Assign Manually. Opens a dialog that lists all fields with an unknown measurement level. You can use
this dialog to assign measurement level to those fields. You can also assign measurement level in Variable
View of the Data Editor.
Since measurement level is important for this procedure, you cannot access the dialog to run this
procedure until all fields have a defined measurement level.
Neighbors
Number of Nearest Neighbors (k). Specify the number of nearest neighbors. Note that using a greater
number of neighbors will not necessarily result in a more accurate model.
If a target is specified on the Variables tab, you can alternatively specify a range of values and allow the
procedure to choose the “best” number of neighbors within that range. The method for determining the
number of nearest neighbors depends upon whether feature selection is requested on the Features tab.
• If feature selection is in effect, then feature selection is performed for each value of k in the requested
range, and the k, and accompanying feature set, with the lowest error rate (or the lowest sum-of-
squares error if the target is scale) is selected.
• If feature selection is not in effect, then V-fold cross-validation is used to select the “best” number of
neighbors. See the Partition tab for control over assignment of folds.
Distance Computation. This is the distance metric used to measure the similarity of cases.
• Euclidean metric. The distance between two cases, x and y, is the square root of the sum, over all
dimensions, of the squared differences between the values for the cases.
• City block metric. The distance between two cases is the sum, over all dimensions, of the absolute
differences between the values for the cases. Also called Manhattan distance.
Optionally, if a target is specified on the Variables tab, you can choose to weight features by their
normalized importance when computing distances. Feature importance for a predictor is calculated by
the ratio of the error rate or sum-of-squares error of the model with the predictor removed from the
model to the error rate or sum-of-squares error for the full model. Normalized importance is calculated by
reweighting the feature importance values so that they sum to 1.
Predictions for Scale Target. If a scale target is specified on the Variables tab, this specifies whether the
predicted value is computed based upon the mean or the median value of the nearest neighbors.
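Putting the pieces of this tab together, here is a minimal sketch of k-nearest-neighbor prediction
with either metric. The data and the function name are illustrative, not part of SPSS:

    import numpy as np

    def knn_predict(X_train, y_train, x_new, k=3, metric="euclidean"):
        d = np.abs(X_train - x_new)                  # per-dimension differences
        if metric == "euclidean":
            dist = np.sqrt((d ** 2).sum(axis=1))
        else:                                        # city block (Manhattan)
            dist = d.sum(axis=1)
        nearest = np.argsort(dist)[:k]               # indices of k nearest neighbors
        return y_train[nearest].mean()               # or median, for a scale target

    X_train = np.array([[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]])
    y_train = np.array([10.0, 12.0, 30.0, 28.0])
    knn_predict(X_train, y_train, np.array([1.5, 1.5]), k=2)   # -> 11.0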
Features
The Features tab allows you to request and specify options for feature selection when a target is specified
on the Variables tab. By default, all features are considered for feature selection, but you can optionally
select a subset of features to force into the model.
Stopping Criterion. At each step, the feature whose addition to the model results in the smallest error
(computed as the error rate for a categorical target and sum of squares error for a scale target) is
considered for inclusion in the model set. Forward selection continues until the specified condition is met.
• Specified number of features. The algorithm adds a fixed number of features in addition to those
forced into the model. Specify a positive integer. Decreasing values of the number to select creates a
more parsimonious model, at the risk of missing important features. Increasing values of the number to
select will capture all the important features, at the risk of eventually adding features that actually
increase the model error.
• Minimum change in absolute error ratio. The algorithm stops when the change in the absolute error
ratio indicates that the model cannot be further improved by adding more features. Specify a positive
number. Decreasing values of the minimum change will tend to include more features, at the risk of
including features that don’t add much value to the model. Increasing the value of the minimum change
will tend to exclude more features, at the risk of losing features that are important to the model. The
“optimal” value of the minimum change will depend upon your data and application. See the Feature
Selection Error Log in the output to help you assess which features are most important. See the topic
“Feature selection error log ” on page 108 for more information.
Partitions
The Partitions tab allows you to divide the dataset into training and holdout sets and, when applicable,
assign cases to cross-validation folds.
Training and Holdout Partitions. This group specifies the method of partitioning the active dataset into
training and holdout samples. The training sample comprises the data records used to train the nearest
neighbor model; some percentage of cases in the dataset must be assigned to the training sample in
order to obtain a model. The holdout sample is an independent set of data records used to assess the
final model; the error for the holdout sample gives an “honest” estimate of the predictive ability of the
model because the holdout cases were not used to build the model.
• Randomly assign cases to partitions. Specify the percentage of cases to assign to the training sample.
The rest are assigned to the holdout sample.
• Use variable to assign cases. Specify a numeric variable that assigns each case in the active dataset to
the training or holdout sample. Cases with a positive value on the variable are assigned to the training
sample, cases with a value of 0 or a negative value, to the holdout sample. Cases with a system-missing
value are excluded from the analysis. Any user-missing values for the partition variable are always
treated as valid.
Cross-Validation Folds. V-fold cross-validation is used to determine the “best” number of neighbors. It is
not available in conjunction with feature selection for performance reasons.
Cross-validation divides the sample into a number of subsamples, or folds. Nearest neighbor models are
then generated, excluding the data from each subsample in turn. The first model is based on all of the
cases except those in the first sample fold, the second model is based on all of the cases except those in
the second sample fold, and so on. For each model, the error is estimated by applying the model to the
subsample excluded in generating it. The “best” number of nearest neighbors is the one which produces
the lowest error across folds.
• Randomly assign cases to folds. Specify the number of folds that should be used for cross-validation.
The procedure randomly assigns cases to folds, numbered from 1 to V, the number of folds.
• Use variable to assign cases. Specify a numeric variable that assigns each case in the active dataset to
a fold. The variable must be numeric and take values from 1 to V. If any values in this range are missing (on any split, if split files are in effect), this will cause an error.
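The fold loop itself can be sketched as below; train_and_error(train_idx, test_idx, k) is a hypothetical stand-in for fitting a k-nearest-neighbor model on the training folds and returning its error on the held-out fold:

import numpy as np

def best_k_by_cv(n_cases, k_values, v, train_and_error, seed=12345):
    # Pick the k with the lowest average error across the V folds.
    rng = np.random.default_rng(seed)             # fixed seed for replicability
    folds = rng.integers(1, v + 1, size=n_cases)  # random fold labels 1..V
    errors = {}
    for k in k_values:
        fold_errors = []
        for fold in range(1, v + 1):
            test = np.where(folds == fold)[0]     # cases in this fold
            train = np.where(folds != fold)[0]    # all other cases
            fold_errors.append(train_and_error(train, test, k))
        errors[k] = np.mean(fold_errors)
    return min(errors, key=errors.get)            # "best" number of neighbors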
Set seed for Mersenne Twister. Setting a seed allows you to replicate analyses. Using this control is
similar to setting the Mersenne Twister as the active generator and specifying a fixed starting point on the
Random Number Generators dialog, with the important difference that setting the seed in this dialog will
preserve the current state of the random number generator and restore that state after the analysis is
complete.
Save
Names of Saved Variables. Automatic name generation ensures that you keep all of your work. Custom
names allow you to discard/replace results from previous runs without first deleting the saved variables in
the Data Editor.
Variables to Save
• Predicted value or category. This saves the predicted value for a scale target or the predicted category
for a categorical target.
• Predicted probability. This saves the predicted probabilities for a categorical target. A separate
variable is saved for each of the first n categories, where n is specified in the Maximum categories to
save for categorical target control.
• Training/Holdout partition variables. If cases are randomly assigned to the training and holdout
samples on the Partitions tab, this saves the value of the partition (training or holdout) to which the case
was assigned.
• Cross-validation fold variable. If cases are randomly assigned to cross-validation folds on the
Partitions tab, this saves the value of the fold to which the case was assigned.
Output
Viewer Output
• Case processing summary. Displays the case processing summary table, which summarizes the
number of cases included and excluded in the analysis, in total and by training and holdout samples.
• Charts and tables. Displays model-related output, including tables and charts. Tables in the model
view include k nearest neighbors and distances for focal cases, classification of categorical response
variables, and an error summary. Graphical output in the model view includes a selection error log,
feature importance chart, feature space chart, peers chart, and quadrant map. See the topic “Model
View ” on page 105 for more information.
Files
• Export model to XML. You can use this model file to apply the model information to other data files for
scoring purposes. This option is not available if split files have been defined.
• Export distances between focal cases and k nearest neighbors. For each focal case, a separate
variable is created for each of the focal case’s k nearest neighbors (from the training sample) and the
corresponding k nearest distances.
Options
User-Missing Values. Categorical variables must have valid values for a case to be included in the
analysis. These controls allow you to decide whether user-missing values are treated as valid among
categorical variables.
System-missing values and missing values for scale variables are always treated as invalid.
Model View
When you select Charts and tables in the Output tab, the procedure creates a Nearest Neighbor Model
object in the Viewer. By activating (double-clicking) this object, you gain an interactive view of the model.
The model view has a 2-panel window:
• The first panel displays an overview of the model called the main view.
• The second panel displays one of two types of views:
An auxiliary model view shows more information about the model, but is not focused on the model
itself.
A linked view is a view that shows details about one feature of the model when the user drills down on
part of the main view.
By default, the first panel shows the feature space and the second panel shows the variable importance
chart. If the variable importance chart is not available (that is, when Weight features by importance was not selected on the Features tab), the first available view in the View dropdown is shown.
When a view has no available information, its item text in the View dropdown is disabled.
Feature Space
The feature space chart is an interactive graph of the feature space (or a subspace, if there are more than
3 features). Each axis represents a feature in the model, and the location of points in the chart shows the values of these features for cases in the training and holdout partitions.
Keys. In addition to the feature values, points in the plot convey other information.
• Shape indicates the partition to which a point belongs, either Training or Holdout.
• The color/shading of a point indicates the value of the target for that case, with distinct colors for the categories of a categorical target and shades indicating the range of values of a continuous
target. The indicated value for the training partition is the observed value; for the holdout partition, it is
the predicted value. If no target is specified, this key is not shown.
• Heavier outlines indicate a case is focal. Focal cases are shown linked to their k nearest neighbors.
Controls and Interactivity. A number of controls in the chart allow you to explore the Feature Space.
• You can choose which subset of features to show in the chart and change which features are
represented on the dimensions.
• “Focal cases” are simply points selected in the Feature Space chart. If you specified a focal case
variable, the points representing the focal cases will initially be selected. However, any point can
temporarily become a focal case if you select it. The “usual” controls for point selection apply; clicking
on a point selects that point and deselects all others; Control-clicking on a point adds it to the set of
selected points. Linked views, such as the Peers Chart, will automatically update based upon the cases
selected in the Feature Space.
• You can change the number of nearest neighbors (k) to display for focal cases.
• Hovering over a point in the chart displays a tooltip with the value of the case label, or case number if
case labels are not defined, and the observed and predicted target values.
• A “Reset” button allows you to return the Feature Space to its original state.
Adding and removing fields/variables
You can add new fields/variables to the feature space or remove the ones that are currently displayed.
Variables Palette
The Variables palette must be displayed before you can add and remove variables. To display the
Variables palette, the Model Viewer must be in Edit mode and a case must be selected in the feature
space.
1. To put the Model Viewer in Edit mode, from the menus choose:
View > Edit Mode
2. Once in Edit Mode, click any case in the feature space.
3. To display the Variables palette, from the menus choose:
View > Palettes > Variables
The Variables palette lists all of the variables in the feature space. The icon next to the variable name
indicates the variable’s measurement level.
4. To temporarily change a variable’s measurement level, right click the variable in the variables palette
and choose an option.
Variable Zones
Variables are added to “zones” in the feature space. To display the zones, start dragging a variable from
the Variables palette or select Show zones.
The feature space has zones for the x, y, and z axes.
Moving Variables into Zones
Here are some general rules and tips for moving variables into zones:
• To move a variable into a zone, click and drag the variable from the Variables palette and drop it into the
zone. If you choose Show zones, you can also right-click a zone and select a variable that you want to
add to the zone.
• If you drag a variable from the Variables palette to a zone already occupied by another variable, the old
variable is replaced with the new.
• If you drag a variable from one zone to a zone already occupied by another variable, the variables swap
positions.
• Clicking the X in a zone removes the variable from that zone.
• If there are multiple graphic elements in the visualization, each graphic element can have its own
associated variable zones. First select the graphic element.
Variable Importance
Typically, you will want to focus your modeling efforts on the variables that matter most and consider
dropping or ignoring those that matter least. The variable importance chart helps you do this by indicating
the relative importance of each variable in estimating the model. Since the values are relative, the sum of
the values for all variables on the display is 1.0. Variable importance does not relate to model accuracy. It
just relates to the importance of each variable in making a prediction, not whether or not the prediction is
accurate.
Peers
This chart displays the focal cases and their k nearest neighbors on each feature and on the target. It is
available if a focal case is selected in the Feature Space.
Linking behavior. The Peers chart is linked to the Feature Space in two ways.
• Cases selected (focal) in the Feature Space are displayed in the Peers chart, along with their k nearest
neighbors.
• The value of k selected in the Feature Space is used in the Peers chart.
Nearest Neighbor Distances
This table displays the k nearest neighbors and distances for focal cases only. It is available if a focal case
identifier is specified on the Variables tab, and only displays focal cases identified by this variable.
In each row of the table:
• The Focal Case column contains the value of the case labeling variable for the focal case; if case labels
are not defined, this column contains the case number of the focal case.
• The ith column under the Nearest Neighbors group contains the value of the case labeling variable for
the ith nearest neighbor of the focal case; if case labels are not defined, this column contains the case
number of the ith nearest neighbor of the focal case.
• The ith column under the Nearest Distances group contains the distance of the ith nearest neighbor to
the focal case.
Quadrant map
This chart displays the focal cases and their k nearest neighbors on a scatterplot (or dotplot, depending
upon the measurement level of the target) with the target on the y-axis and a scale feature on the x-axis,
paneled by features. It is available if there is a target and if a focal case is selected in the Feature Space.
• Reference lines are drawn for continuous variables, at the variable means in the training partition.
Feature selection error log
Points on the chart display the error (either the error rate or sum-of-squares error, depending upon the
measurement level of the target) on the y-axis for the model with the feature listed on the x-axis (plus all
features to the left on the x-axis). This chart is available if there is a target and feature selection is in
effect.
k selection error log
Points on the chart display the error (either the error rate or sum-of-squares error, depending upon the
measurement level of the target) on the y-axis for the model with the number of nearest neighbors (k) on
the x-axis. This chart is available if there is a target and k selection is in effect.
k and Feature Selection Error Log
These are feature selection charts (see “Feature selection error log ” on page 108), paneled by k. This
chart is available if there is a target and k and feature selection are both in effect.
Classification Table
This table displays the cross-classification of observed versus predicted values of the target, by partition.
It is available if there is a target and it is categorical.
• The (Missing) row in the Holdout partition contains holdout cases with missing values on the target.
These cases contribute to the Holdout Sample: Overall Percent values but not to the Percent Correct
values.
Error Summary
This table is available if there is a target variable. It displays the error associated with the model: the sum-of-squares error for a continuous target and the error rate (100% − overall percent correct) for a categorical target.
Discriminant Analysis
Discriminant analysis builds a predictive model for group membership. The model is composed of a
discriminant function (or, for more than two groups, a set of discriminant functions) based on linear
combinations of the predictor variables that provide the best discrimination between the groups. The
functions are generated from a sample of cases for which group membership is known; the functions can
then be applied to new cases that have measurements for the predictor variables but have unknown
group membership.
Note: The grouping variable can have more than two values. The codes for the grouping variable must be
integers, however, and you need to specify their minimum and maximum values. Cases with values
outside of these bounds are excluded from the analysis.
Example. On average, people in temperate zone countries consume more calories per day than people in
the tropics, and a greater proportion of the people in the temperate zones are city dwellers. A researcher
wants to combine this information into a function to determine how well an individual can discriminate
between the two groups of countries. The researcher thinks that population size and economic
information may also be important. Discriminant analysis allows you to estimate coefficients of the linear
discriminant function, which looks like the right side of a multiple linear regression equation. That is,
using coefficients a, b, c, and d, the function is:
D = a * climate + b * urban + c * population + d * gross domestic product per capita
If these variables are useful for discriminating between the two climate zones, the values of D will differ
for the temperate and tropic countries. If you use a stepwise variable selection method, you may find that
you do not need to include all four variables in the function.
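For instance, with hypothetical estimated coefficients, the function can be evaluated for each country as in the sketch below; the coefficient values and country profiles are invented purely to show the computation:

import numpy as np

# Hypothetical coefficients a, b, c, d for the function above.
coef = np.array([1.2, 0.03, -0.0001, 0.0005])

# Invented profiles: [climate, urban, population, gdp per capita]
temperate = np.array([1.0, 75.0, 60000.0, 25000.0])
tropical = np.array([0.0, 40.0, 90000.0, 3000.0])

D_temperate = coef @ temperate   # about 9.95
D_tropical = coef @ tropical     # about -6.3
# If the variables discriminate well, the D values differ between groups.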
Statistics. For each variable: means, standard deviations, univariate ANOVA. For each analysis: Box’s M,
within-groups correlation matrix, within-groups covariance matrix, separate-groups covariance matrix,
total covariance matrix. For each canonical discriminant function: eigenvalue, percentage of variance,
canonical correlation, Wilks’ lambda, chi-square. For each step: prior probabilities, Fisher’s function
coefficients, unstandardized function coefficients, Wilks’ lambda for each canonical function.
Discriminant Analysis Data Considerations
Data. The grouping variable must have a limited number of distinct categories, coded as integers.
Independent variables that are nominal must be recoded to dummy or contrast variables.
Assumptions. Cases should be independent. Predictor variables should have a multivariate normal
distribution, and within-group variance-covariance matrices should be equal across groups. Group
membership is assumed to be mutually exclusive (that is, no case belongs to more than one group) and
collectively exhaustive (that is, all cases are members of a group). The procedure is most effective when
group membership is a truly categorical variable; if group membership is based on values of a continuous
variable (for example, high IQ versus low IQ), consider using linear regression to take advantage of the
richer information that is offered by the continuous variable itself.
To Obtain a Discriminant Analysis
1. From the menus choose:
Analyze > Classify > Discriminant…
2. Select an integer-valued grouping variable and click Define Range to specify the categories of interest.
3. Select the independent, or predictor, variables. (If your grouping variable does not have integer values,
Automatic Recode on the Transform menu will create a variable that does.)
4. Select the method for entering the independent variables.
• Enter independents together. Simultaneously enters all independent variables that satisfy
tolerance criteria.
• Use stepwise method. Uses stepwise analysis to control variable entry and removal.
5. Optionally, select cases with a selection variable.
Discriminant Analysis Define Range
Specify the minimum and maximum value of the grouping variable for the analysis. Cases with values
outside of this range are not used in the discriminant analysis but are classified into one of the existing
groups based on the results of the analysis. The minimum and maximum values must be integers.
Discriminant Analysis Select Cases
To select cases for your analysis:
1. In the Discriminant Analysis dialog box, choose a selection variable.
2. Click Value to enter an integer as the selection value.
Only cases with the specified value for the selection variable are used to derive the discriminant
functions. Statistics and classification results are generated for both selected and unselected cases. This
process provides a mechanism for classifying new cases based on previously existing data or for
partitioning your data into training and testing subsets to perform validation on the model generated.
Discriminant Analysis Statistics
Descriptives. Available options are means (including standard deviations), univariate ANOVAs, and Box’s
M test.
• Means. Displays total and group means, as well as standard deviations for the independent variables.
• Univariate ANOVAs. Performs a one-way analysis-of-variance test for equality of group means for each
independent variable.
• Box’s M. A test for the equality of the group covariance matrices. For sufficiently large samples, a
nonsignificant p value means there is insufficient evidence that the matrices differ. The test is sensitive
to departures from multivariate normality.
Function Coefficients. Available options are Fisher’s classification coefficients and unstandardized
coefficients.
• Fisher’s. Displays Fisher’s classification function coefficients that can be used directly for classification.
A separate set of classification function coefficients is obtained for each group, and a case is assigned
to the group for which it has the largest discriminant score (classification function value).
• Unstandardized. Displays the unstandardized discriminant function coefficients.
Matrices. Available matrices of coefficients for independent variables are within-groups correlation
matrix, within-groups covariance matrix, separate-groups covariance matrix, and total covariance matrix.
• Within-groups correlation. Displays a pooled within-groups correlation matrix that is obtained by
averaging the separate covariance matrices for all groups before computing the correlations.
• Within-groups covariance. Displays a pooled within-groups covariance matrix, which may differ from the
total covariance matrix. The matrix is obtained by averaging the separate covariance matrices for all
groups.
• Separate-groups covariance. Displays separate covariance matrices for each group.
• Total covariance. Displays a covariance matrix from all cases as if they were from a single sample.
Discriminant Analysis Stepwise Method
Method. Select the statistic to be used for entering or removing new variables. Available alternatives are
Wilks’ lambda, unexplained variance, Mahalanobis distance, smallest F ratio, and Rao’s V. With Rao’s V,
you can specify the minimum increase in V for a variable to enter.
• Wilks’ lambda. A variable selection method for stepwise discriminant analysis that chooses variables for
entry into the equation on the basis of how much they lower Wilks’ lambda. At each step, the variable
that minimizes the overall Wilks’ lambda is entered.
• Unexplained variance. At each step, the variable that minimizes the sum of the unexplained variation
between groups is entered.
• Mahalanobis distance. A measure of how much a case’s values on the independent variables differ from the average of all cases. A large Mahalanobis distance identifies a case as having extreme values on one or more of the independent variables (see the sketch following this list).
• Smallest F ratio. A method of variable selection in stepwise analysis based on maximizing an F ratio
computed from the Mahalanobis distance between groups.
• Rao’s V. A measure of the differences between group means. Also called the Lawley-Hotelling trace. At
each step, the variable that maximizes the increase in Rao’s V is entered. After selecting this option,
enter the minimum value a variable must have to enter the analysis.
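As a concrete illustration of the Mahalanobis distance referenced above, the sketch below computes the squared distance (x − mean)' S^(-1) (x − mean) of a case from the centroid, where S is the covariance matrix; the toy data are invented:

import numpy as np

def mahalanobis_sq(x, mean, cov):
    # Squared Mahalanobis distance of case x from the mean vector.
    diff = x - mean
    return float(diff @ np.linalg.inv(cov) @ diff)

X = np.array([[2.0, 3.0], [3.0, 5.0], [4.0, 4.0], [5.0, 6.0]])  # toy data
mu = X.mean(axis=0)
S = np.cov(X, rowvar=False)      # sample covariance matrix
print(mahalanobis_sq(np.array([9.0, 9.0]), mu, S))  # large value = extreme case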
Criteria. Available alternatives are Use F value and Use probability of F. Enter values for entering and
removing variables.
• Use F value. A variable is entered into the model if its F value is greater than the Entry value and is
removed if the F value is less than the Removal value. Entry must be greater than Removal, and both
values must be positive. To enter more variables into the model, lower the Entry value. To remove more
variables from the model, increase the Removal value.
• Use probability of F. A variable is entered into the model if the significance level of its F value is less than
the Entry value and is removed if the significance level is greater than the Removal value. Entry must be
less than Removal, and both values must be positive. To enter more variables into the model, increase
the Entry value. To remove more variables from the model, lower the Removal value.
Display. Summary of steps displays statistics for all variables after each step; F for pairwise distances
displays a matrix of pairwise F ratios for each pair of groups.
Discriminant Analysis Classification
Prior Probabilities. This option determines whether the classification coefficients are adjusted for a priori
knowledge of group membership.
• All groups equal. Equal prior probabilities are assumed for all groups; this has no effect on the
coefficients.
• Compute from group sizes. The observed group sizes in your sample determine the prior probabilities
of group membership. For example, if 50% of the observations included in the analysis fall into the first
group, 25% in the second, and 25% in the third, the classification coefficients are adjusted to increase
the likelihood of membership in the first group relative to the other two.
Display. Available display options are casewise results, summary table, and leave-one-out classification.
• Casewise results. Codes for actual group, predicted group, posterior probabilities, and discriminant
scores are displayed for each case.
• Summary table. The number of cases correctly and incorrectly assigned to each of the groups based on
the discriminant analysis. Sometimes called the “Confusion Matrix.”
• Leave-one-out classification. Each case in the analysis is classified by the functions derived from all
cases other than that case. It is also known as the “U-method.”
Replace missing values with mean. Select this option to substitute the mean of an independent variable
for a missing value during the classification phase only.
Use Covariance Matrix. You can choose to classify cases using a within-groups covariance matrix or a
separate-groups covariance matrix.
• Within-groups. The pooled within-groups covariance matrix is used to classify cases.
• Separate-groups. Separate-groups covariance matrices are used for classification. Because
classification is based on the discriminant functions (not based on the original variables), this option is
not always equivalent to quadratic discrimination.
Plots. Available plot options are combined-groups, separate-groups, and territorial map.
• Combined-groups. Creates an all-groups scatterplot of the first two discriminant function values. If
there is only one function, a histogram is displayed instead.
• Separate-groups. Creates separate-group scatterplots of the first two discriminant function values. If
there is only one function, histograms are displayed instead.
• Territorial map. A plot of the boundaries used to classify cases into groups based on function values.
The numbers correspond to groups into which cases are classified. The mean for each group is
indicated by an asterisk within its boundaries. The map is not displayed if there is only one discriminant
function.
Discriminant Analysis Save
You can add new variables to your active data file. Available options are predicted group membership (a
single variable), discriminant scores (one variable for each discriminant function in the solution), and
probabilities of group membership given the discriminant scores (one variable for each group).
You can also export model information to the specified file in XML format. You can use this model file to
apply the model information to other data files for scoring purposes.
DISCRIMINANT Command Additional Features
The command syntax language also allows you to:
• Perform multiple discriminant analyses (with one command) and control the order in which variables
are entered (with the ANALYSIS subcommand).
• Specify prior probabilities for classification (with the PRIORS subcommand).
• Display rotated pattern and structure matrices (with the ROTATE subcommand).
• Limit the number of extracted discriminant functions (with the FUNCTIONS subcommand).
• Restrict classification to the cases that are selected (or unselected) for the analysis (with the SELECT
subcommand).
• Read and analyze a correlation matrix (with the MATRIX subcommand).
• Write a correlation matrix for later analysis (with the MATRIX subcommand).
See the Command Syntax Reference for complete syntax information.
Factor Analysis
Factor analysis attempts to identify underlying variables, or factors, that explain the pattern of
correlations within a set of observed variables. Factor analysis is often used in data reduction to identify a
small number of factors that explain most of the variance that is observed in a much larger number of
manifest variables. Factor analysis can also be used to generate hypotheses regarding causal
mechanisms or to screen variables for subsequent analysis (for example, to identify collinearity prior to
performing a linear regression analysis).
The factor analysis procedure offers a high degree of flexibility:
• Seven methods of factor extraction are available.
• Five methods of rotation are available, including direct oblimin and promax for nonorthogonal rotations.
• Three methods of computing factor scores are available, and scores can be saved as variables for
further analysis.
Example. What underlying attitudes lead people to respond to the questions on a political survey as they
do? Examining the correlations among the survey items reveals that there is significant overlap among
various subgroups of items: questions about taxes tend to correlate with each other, questions about
military issues correlate with each other, and so on. With factor analysis, you can investigate the number
of underlying factors and, in many cases, identify what the factors represent conceptually. Additionally,
you can compute factor scores for each respondent, which can then be used in subsequent analyses. For
example, you might build a logistic regression model to predict voting behavior based on factor scores.
Statistics. For each variable: number of valid cases, mean, and standard deviation. For each factor
analysis: correlation matrix of variables, including significance levels, determinant, and inverse;
reproduced correlation matrix, including anti-image; initial solution (communalities, eigenvalues, and
percentage of variance explained); Kaiser-Meyer-Olkin measure of sampling adequacy and Bartlett’s test
of sphericity; unrotated solution, including factor loadings, communalities, and eigenvalues; and rotated
solution, including rotated pattern matrix and transformation matrix. For oblique rotations: rotated
pattern and structure matrices; factor score coefficient matrix and factor covariance matrix. Plots: scree
plot of eigenvalues and loading plot of first two or three factors.
Factor Analysis Data Considerations
Data. The variables should be quantitative at the interval or ratio level. Categorical data (such as religion
or country of origin) are not suitable for factor analysis. Data for which Pearson correlation coefficients
can sensibly be calculated should be suitable for factor analysis.
Assumptions. The data should have a bivariate normal distribution for each pair of variables, and
observations should be independent. The factor analysis model specifies that variables are determined by
common factors (the factors estimated by the model) and unique factors (which do not overlap between
observed variables); the computed estimates are based on the assumption that all unique factors are
uncorrelated with each other and with the common factors.
To Obtain a Factor Analysis
1. From the menus choose:
Analyze > Dimension Reduction > Factor…
2. Select the variables for the factor analysis.
Factor Analysis Select Cases
To select cases for your analysis:
1. Choose a selection variable.
2. Click Value to enter an integer as the selection value.
Only cases with that value for the selection variable are used in the factor analysis.
Factor Analysis Descriptives
Statistics. Univariate descriptives includes the mean, standard deviation, and number of valid cases for
each variable. Initial solution displays initial communalities, eigenvalues, and the percentage of variance
explained.
Correlation Matrix. The available options are coefficients, significance levels, determinant, KMO and
Bartlett’s test of sphericity, inverse, reproduced, and anti-image.
• KMO and Bartlett’s Test of Sphericity. The Kaiser-Meyer-Olkin measure of sampling adequacy tests
whether the partial correlations among variables are small. Bartlett’s test of sphericity tests whether
the correlation matrix is an identity matrix, which would indicate that the factor model is inappropriate.
• Reproduced. The estimated correlation matrix from the factor solution. Residuals (difference between
estimated and observed correlations) are also displayed.
• Anti-image. The anti-image correlation matrix contains the negatives of the partial correlation
coefficients, and the anti-image covariance matrix contains the negatives of the partial covariances. In a
good factor model, most of the off-diagonal elements will be small. The measure of sampling adequacy
for a variable is displayed on the diagonal of the anti-image correlation matrix.
Factor Analysis Extraction
Method. Allows you to specify the method of factor extraction. Available methods are principal
components, unweighted least squares, generalized least squares, maximum likelihood, principal axis
factoring, alpha factoring, and image factoring.
• Principal Components Analysis. A factor extraction method used to form uncorrelated linear combinations of the observed variables. The first component has maximum variance. Successive components explain progressively smaller portions of the variance and are all uncorrelated with each other. Principal components analysis is used to obtain the initial factor solution. It can be used when a correlation matrix is singular. (A computational sketch appears at the end of this topic.)
• Unweighted Least-Squares Method. A factor extraction method that minimizes the sum of the squared
differences between the observed and reproduced correlation matrices (ignoring the diagonals).
• Generalized Least-Squares Method. A factor extraction method that minimizes the sum of the squared
differences between the observed and reproduced correlation matrices. Correlations are weighted by
the inverse of their uniqueness, so that variables with high uniqueness are given less weight than those
with low uniqueness.
• Maximum-Likelihood Method. A factor extraction method that produces parameter estimates that are
most likely to have produced the observed correlation matrix if the sample is from a multivariate normal
distribution. The correlations are weighted by the inverse of the uniqueness of the variables, and an
iterative algorithm is employed.
• Principal Axis Factoring. A method of extracting factors from the original correlation matrix, with
squared multiple correlation coefficients placed in the diagonal as initial estimates of the
communalities. These factor loadings are used to estimate new communalities that replace the old
communality estimates in the diagonal. Iterations continue until the changes in the communalities from
one iteration to the next satisfy the convergence criterion for extraction.
• Alpha Factoring. A factor extraction method that considers the variables in the analysis to be a sample
from the universe of potential variables. This method maximizes the alpha reliability of the factors.
• Image Factoring. A factor extraction method developed by Guttman and based on image theory. The
common part of the variable, called the partial image, is defined as its linear regression on remaining
variables, rather than a function of hypothetical factors.
Analyze. Allows you to specify either a correlation matrix or a covariance matrix.
• Correlation matrix. Useful if variables in your analysis are measured on different scales.
• Covariance matrix. Useful when you want to apply your factor analysis to multiple groups with different
variances for each variable.
Extract. You can either retain all factors whose eigenvalues exceed a specified value, or you can retain a
specific number of factors.
Display. Allows you to request the unrotated factor solution and a scree plot of the eigenvalues.
• Unrotated Factor Solution. Displays unrotated factor loadings (factor pattern matrix), communalities,
and eigenvalues for the factor solution.
• Scree plot. A plot of the variance that is associated with each factor. This plot is used to determine how
many factors should be kept. Typically the plot shows a distinct break between the steep slope of the
large factors and the gradual trailing of the rest (the scree).
Maximum Iterations for Convergence. Allows you to specify the maximum number of steps that the
algorithm can take to estimate the solution.
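As a computational sketch of principal-components extraction (the first method above): the eigenvalues of the correlation matrix give the variance explained by each component, and the unrotated loadings are the eigenvectors scaled by the square roots of the eigenvalues. The correlation matrix here is invented, and this is a bare-bones illustration of the method, not the procedure's implementation:

import numpy as np

R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.2],
              [0.3, 0.2, 1.0]])          # invented correlation matrix

eigvals, eigvecs = np.linalg.eigh(R)     # eigh handles symmetric matrices
order = np.argsort(eigvals)[::-1]        # components in descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

loadings = eigvecs * np.sqrt(eigvals)    # unrotated factor loadings
pct_variance = 100 * eigvals / eigvals.sum()
communalities = (loadings[:, :1] ** 2).sum(axis=1)  # retaining one factor
print(pct_variance, communalities)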
Factor Analysis Rotation
Method. Allows you to select the method of factor rotation. Available methods are varimax, direct
oblimin, quartimax, equamax, or promax.
• Varimax Method. An orthogonal rotation method that minimizes the number of variables that have high
loadings on each factor. This method simplifies the interpretation of the factors.
• Direct Oblimin Method. A method for oblique (nonorthogonal) rotation. When delta equals 0 (the
default), solutions are most oblique. As delta becomes more negative, the factors become less oblique.
To override the default delta of 0, enter a number less than or equal to 0.8.
• Quartimax Method. A rotation method that minimizes the number of factors needed to explain each
variable. This method simplifies the interpretation of the observed variables.
• Equamax Method. A rotation method that is a combination of the varimax method, which simplifies the
factors, and the quartimax method, which simplifies the variables. The number of variables that load
highly on a factor and the number of factors needed to explain a variable are minimized.
• Promax Rotation. An oblique rotation, which allows factors to be correlated. This rotation can be
calculated more quickly than a direct oblimin rotation, so it is useful for large datasets.
Display. Allows you to include output on the rotated solution, as well as loading plots for the first two or
three factors.
• Rotated Solution. A rotation method must be selected to obtain a rotated solution. For orthogonal
rotations, the rotated pattern matrix and factor transformation matrix are displayed. For oblique
rotations, the pattern, structure, and factor correlation matrices are displayed.
• Factor Loading Plot. Three-dimensional factor loading plot of the first three factors. For a two-factor
solution, a two-dimensional plot is shown. The plot is not displayed if only one factor is extracted. Plots
display rotated solutions if rotation is requested.
Maximum Iterations for Convergence. Allows you to specify the maximum number of steps that the
algorithm can take to perform the rotation.
Factor Analysis Scores
Save as variables. Creates one new variable for each factor in the final solution.
Method. The alternative methods for calculating factor scores are regression, Bartlett, and Anderson-
Rubin.
• Regression Method. A method for estimating factor score coefficients (sketched at the end of this topic). The scores that are produced have a mean of 0 and a variance equal to the squared multiple correlation between the estimated factor scores and the true factor values. The scores may be correlated even when factors are orthogonal.
• Bartlett Scores. A method of estimating factor score coefficients. The scores that are produced have a
mean of 0. The sum of squares of the unique factors over the range of variables is minimized.
• Anderson-Rubin Method. A method of estimating factor score coefficients; a modification of the Bartlett
method which ensures orthogonality of the estimated factors. The scores that are produced have a
mean of 0, have a standard deviation of 1, and are uncorrelated.
Display factor score coefficient matrix. Shows the coefficients by which variables are multiplied to
obtain factor scores. Also shows the correlations between factor scores.
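A minimal sketch of the regression method, under its standard formulation in which the score coefficient matrix is B = R^(-1) L (R the correlation matrix, L the factor loading matrix) and scores are computed from standardized data; the argument names are placeholders:

import numpy as np

def regression_factor_scores(Z, R, loadings):
    # Z: standardized data (cases x variables)
    # R: correlation matrix of the variables
    # loadings: factor loading matrix (variables x factors)
    B = np.linalg.solve(R, loadings)   # score coefficients B = R^(-1) * loadings
    return Z @ B                       # one factor score variable per factor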
Factor Analysis Options
Missing Values. Allows you to specify how missing values are handled. The available choices are to
exclude cases listwise, exclude cases pairwise, or replace with mean.
Coefficient Display Format. Allows you to control aspects of the output matrices. You can sort coefficients by size and suppress coefficients with absolute values that are less than the specified value.
FACTOR Command Additional Features
The command syntax language also allows you to:
• Specify convergence criteria for iteration during extraction and rotation.
• Specify individual rotated-factor plots.
• Specify how many factor scores to save.
• Specify diagonal values for the principal axis factoring method.
• Write correlation matrices or factor-loading matrices to disk for later analysis.
• Read and analyze correlation matrices or factor-loading matrices.
See the Command Syntax Reference for complete syntax information.
Choosing a Procedure for Clustering
Cluster analyses can be performed using the TwoStep, Hierarchical, or K-Means Cluster Analysis
procedure. Each procedure employs a different algorithm for creating clusters, and each has options not
available in the others.
TwoStep Cluster Analysis. For many applications, the TwoStep Cluster Analysis procedure will be the
method of choice. It provides the following unique features:
• Automatic selection of the best number of clusters, in addition to measures for choosing between
cluster models.
• Ability to create cluster models simultaneously based on categorical and continuous variables.
• Ability to save the cluster model to an external XML file and then read that file and update the cluster
model using newer data.
Additionally, the TwoStep Cluster Analysis procedure can analyze large data files.
Hierarchical Cluster Analysis. The Hierarchical Cluster Analysis procedure is limited to smaller data files
(hundreds of objects to be clustered) but has the following unique features:
• Ability to cluster cases or variables.
• Ability to compute a range of possible solutions and save cluster memberships for each of those
solutions.
• Several methods for cluster formation, variable transformation, and measuring the dissimilarity
between clusters.
As long as all the variables are of the same type, the Hierarchical Cluster Analysis procedure can analyze
interval (continuous), count, or binary variables.
K-Means Cluster Analysis. The K-Means Cluster Analysis procedure is limited to continuous data and
requires you to specify the number of clusters in advance, but it has the following unique features:
• Ability to save distances from cluster centers for each object.
• Ability to read initial cluster centers from and save final cluster centers to an external IBM SPSS
Statistics file.
Additionally, the K-Means Cluster Analysis procedure can analyze large data files.
TwoStep Cluster Analysis
The TwoStep Cluster Analysis procedure is an exploratory tool designed to reveal natural groupings (or
clusters) within a dataset that would otherwise not be apparent. The algorithm employed by this
procedure has several desirable features that differentiate it from traditional clustering techniques:
• Handling of categorical and continuous variables. Because variables are assumed to be independent, a joint multinomial-normal distribution can be placed on categorical and continuous variables.
• Automatic selection of number of clusters. By comparing the values of a model-choice criterion
across different clustering solutions, the procedure can automatically determine the optimal number of
clusters.
• Scalability. By constructing a cluster features (CF) tree that summarizes the records, the TwoStep
algorithm allows you to analyze large data files.
Example. Retail and consumer product companies regularly apply clustering techniques to data that
describe their customers’ buying habits, gender, age, income level, etc. These companies tailor their
marketing and product development strategies to each consumer group to increase sales and build brand
loyalty.
Distance Measure. This selection determines how the similarity between two clusters is computed.
• Log-likelihood. The likelihood measure places a probability distribution on the variables. Continuous
variables are assumed to be normally distributed, while categorical variables are assumed to be
multinomial. All variables are assumed to be independent.
• Euclidean. The Euclidean measure is the “straight line” distance between two clusters. It can be used
only when all of the variables are continuous.
Number of Clusters. This selection allows you to specify how the number of clusters is to be determined.
• Determine automatically. The procedure will automatically determine the “best” number of clusters,
using the criterion specified in the Clustering Criterion group. Optionally, enter a positive integer
specifying the maximum number of clusters that the procedure should consider.
• Specify fixed. Allows you to fix the number of clusters in the solution. Enter a positive integer.
Count of Continuous Variables. This group provides a summary of the continuous variable
standardization specifications made in the Options dialog box. See the topic “TwoStep Cluster Analysis
Options” on page 117 for more information.
Clustering Criterion. This selection determines how the automatic clustering algorithm determines the
number of clusters. Either the Bayesian Information Criterion (BIC) or the Akaike Information Criterion
(AIC) can be specified.
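In their standard textbook forms, BIC = -2*logL + m*log(n) and AIC = -2*logL + 2*m, where logL is the model log-likelihood, m the number of model parameters, and n the number of cases; lower values indicate better models. The helpers below use those standard definitions (the exact parameter count used internally by the procedure may differ):

import math

def bic(log_likelihood, n_params, n_cases):
    return -2.0 * log_likelihood + n_params * math.log(n_cases)

def aic(log_likelihood, n_params):
    return -2.0 * log_likelihood + 2.0 * n_params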
TwoStep Cluster Analysis Data Considerations
Data. This procedure works with both continuous and categorical variables. Cases represent objects to be
clustered, and the variables represent attributes upon which the clustering is based.
Case Order. Note that the cluster features tree and the final solution may depend on the order of cases.
To minimize order effects, randomly order the cases. You may want to obtain several different solutions
with cases sorted in different random orders to verify the stability of a given solution. In situations where
this is difficult due to extremely large file sizes, multiple runs with a sample of cases sorted in different
random orders might be substituted.
Assumptions. The likelihood distance measure assumes that variables in the cluster model are
independent. Further, each continuous variable is assumed to have a normal (Gaussian) distribution, and
each categorical variable is assumed to have a multinomial distribution. Empirical internal testing
indicates that the procedure is fairly robust to violations of both the assumption of independence and the
distributional assumptions, but you should try to be aware of how well these assumptions are met.
Use the Bivariate Correlations procedure to test the independence of two continuous variables. Use the
Crosstabs procedure to test the independence of two categorical variables. Use the Means procedure to
test the independence between a continuous variable and categorical variable. Use the Explore procedure
to test the normality of a continuous variable. Use the Chi-Square Test procedure to test whether a
categorical variable has a specified multinomial distribution.
To Obtain a TwoStep Cluster Analysis
1. From the menus choose:
Analyze > Classify > TwoStep Cluster…
2. Select one or more categorical or continuous variables.
Optionally, you can:
• Adjust the criteria by which clusters are constructed.
• Select settings for noise handling, memory allocation, variable standardization, and cluster model input.
• Request model viewer output.
• Save model results to the working file or to an external XML file.
TwoStep Cluster Analysis Options
Outlier Treatment. This group allows you to treat outliers specially during clustering if the cluster
features (CF) tree fills. The CF tree is full if it cannot accept any more cases in a leaf node and no leaf node
can be split.
• If you select noise handling and the CF tree fills, it will be regrown after placing cases in sparse leaves
into a “noise” leaf. A leaf is considered sparse if it contains fewer than the specified percentage of cases
of the maximum leaf size. After the tree is regrown, the outliers will be placed in the CF tree if possible.
If not, the outliers are discarded.
• If you do not select noise handling and the CF tree fills, it will be regrown using a larger distance change
threshold. After final clustering, values that cannot be assigned to a cluster are labeled outliers. The
outlier cluster is given an identification number of –1 and is not included in the count of the number of
clusters.
Memory Allocation. This group allows you to specify the maximum amount of memory in megabytes
(MB) that the cluster algorithm should use. If the procedure exceeds this maximum, it will use the disk to
store information that will not fit in memory. Specify a number greater than or equal to 4.
• Consult your system administrator for the largest value that you can specify on your system.
• The algorithm may fail to find the correct or specified number of clusters if this value is too low.
Variable standardization. The clustering algorithm works with standardized continuous variables. Any
continuous variables that are not standardized should be left as variables in the To be Standardized list.
To save some time and computational effort, you can select any continuous variables that you have
already standardized as variables in the Assumed Standardized list.
Advanced Options
CF Tree Tuning Criteria. The following clustering algorithm settings apply specifically to the cluster
features (CF) tree and should be changed with care:
• Initial Distance Change Threshold. This is the initial threshold used to grow the CF tree. If inserting a
given case into a leaf of the CF tree would yield tightness less than the threshold, the leaf is not split. If
the tightness exceeds the threshold, the leaf is split.
• Maximum Branches (per leaf node). The maximum number of child nodes that a leaf node can have.
• Maximum Tree Depth. The maximum number of levels that the CF tree can have.
• Maximum Number of Nodes Possible. This indicates the maximum number of CF tree nodes that could
potentially be generated by the procedure, based on the function (b^(d+1) − 1) / (b − 1), where b is the
maximum branches and d is the maximum tree depth. Be aware that an overly large CF tree can be a
drain on system resources and can adversely affect the performance of the procedure. At a minimum,
each node requires 16 bytes.
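For example, per the formula above, a tree allowing at most b = 8 branches per node and a depth of d = 3 can contain at most (8^4 − 1) / 7 = 585 nodes; a quick check:

def max_cf_tree_nodes(b, d):
    # Maximum number of CF tree nodes: (b**(d+1) - 1) / (b - 1)
    return (b ** (d + 1) - 1) // (b - 1)

print(max_cf_tree_nodes(8, 3))   # 585 nodes; at 16 bytes each, about 9 KB minimum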
Cluster Model Update. This group allows you to import and update a cluster model generated in a prior
analysis. The input file contains the CF tree in XML format. The model will then be updated with the data
in the active file. You must select the variable names in the main dialog box in the same order in which
they were specified in the prior analysis. The XML file remains unaltered, unless you specifically write the
new model information to the same filename. See the topic “TwoStep Cluster Analysis Output” on page
118 for more information.
If a cluster model update is specified, the options pertaining to generation of the CF tree that were
specified for the original model are used. More specifically, the distance measure, noise handling,
memory allocation, or CF tree tuning criteria settings for the saved model are used, and any settings for
these options in the dialog boxes are ignored.
Note: When performing a cluster model update, the procedure assumes that none of the selected cases in
the active dataset were used to create the original cluster model. The procedure also assumes that the
cases used in the model update come from the same population as the cases used to create the original
model; that is, the means and variances of continuous variables and levels of categorical variables are
assumed to be the same across both sets of cases. If your “new” and “old” sets of cases come from
heterogeneous populations, you should run the TwoStep Cluster Analysis procedure on the combined
sets of cases for the best results.
TwoStep Cluster Analysis Output
Output. This group provides options for displaying the clustering results.
• Pivot tables. Results are displayed in pivot tables.
• Charts and tables in Model Viewer. Results are displayed in the Model Viewer.
• Evaluation fields. This calculates cluster data for variables that were not used in cluster creation.
Evaluation fields can be displayed along with the input features in the model viewer by selecting them in
the Display subdialog. Fields with missing values are ignored.
Working Data File. This group allows you to save variables to the active dataset.
• Create cluster membership variable. This variable contains a cluster identification number for each
case. The name of this variable is tsc_n, where n is a positive integer indicating the ordinal of the active
dataset save operation completed by this procedure in a given session.
XML Files. The final cluster model and CF tree are two types of output files that can be exported in XML
format.
• Export final model. The final cluster model is exported to the specified file in XML (PMML) format. You
can use this model file to apply the model information to other data files for scoring purposes.
• Export CF tree. This option allows you to save the current state of the cluster tree and update it later
using newer data.
The Cluster Viewer
Cluster models are typically used to find groups (or clusters) of similar records based on the variables
examined, where the similarity between members of the same group is high and the similarity between
members of different groups is low. The results can be used to identify associations that would otherwise
not be apparent. For example, through cluster analysis of customer preferences, income level, and buying
habits, it may be possible to identify the types of customers who are more likely to respond to a particular
marketing campaign.
There are two approaches to interpreting the results in a cluster display:
• Examine clusters to determine characteristics unique to that cluster. Does one cluster contain all the
high-income borrowers? Does this cluster contain more records than the others?
• Examine fields across clusters to determine how values are distributed among clusters. Does one’s level
of education determine membership in a cluster? Does a high credit score distinguish between
membership in one cluster or another?
Using the main views and the various linked views in the Cluster Viewer, you can gain insight to help you
answer these questions.
To see information about the cluster model, activate (double-click) the Model Viewer object in the Viewer.
Cluster Viewer
The Cluster Viewer is made up of two panels, the main view on the left and the linked, or auxiliary, view on
the right. There are two main views:
• Model Summary (the default). See the topic “Model Summary View” on page 119 for more information.
• Clusters. See the topic “Clusters View” on page 119 for more information.
There are four linked/auxiliary views:
• Predictor Importance. See the topic “Cluster Predictor Importance View” on page 121 for more
information.
• Cluster Sizes (the default). See the topic “Cluster Sizes View” on page 121 for more information.
• Cell Distribution. See the topic “Cell Distribution View” on page 121 for more information.
• Cluster Comparison. See the topic “Cluster Comparison View” on page 121 for more information.
Model Summary View
The Model Summary view shows a snapshot, or summary, of the cluster model, including a Silhouette
measure of cluster cohesion and separation that is shaded to indicate poor, fair, or good results. This
snapshot enables you to quickly check if the quality is poor, in which case you may decide to return to the
modeling node to amend the cluster model settings to produce a better result.
The results of poor, fair, and good are based on the work of Kaufman and Rousseeuw (1990) regarding
interpretation of cluster structures. In the Model Summary view, a good result equates to data that
reflects Kaufman and Rousseeuw’s rating as either reasonable or strong evidence of cluster structure, fair
reflects their rating of weak evidence, and poor reflects their rating of no significant evidence.
The silhouette measure averages, over all records, (B−A) / max(A,B), where A is the record’s distance to
its cluster center and B is the record’s distance to the nearest cluster center that it doesn’t belong to. A
silhouette coefficient of 1 would mean that all cases are located directly on their cluster centers. A value
of −1 would mean all cases are located on the cluster centers of some other cluster. A value of 0 means,
on average, cases are equidistant between their own cluster center and the nearest other cluster.
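A minimal sketch of this center-based silhouette (as defined above, using distances to cluster centers rather than the classic all-pairs formulation); the array names are placeholders:

import numpy as np

def center_silhouette(X, labels, centers):
    # Average of (B - A) / max(A, B) over all records, where A is the distance
    # to the record's own cluster center and B is the distance to the nearest
    # center of a cluster it does not belong to.
    scores = []
    for x, own in zip(X, labels):
        dists = np.linalg.norm(centers - x, axis=1)  # distance to each center
        a = dists[own]
        b = np.min(np.delete(dists, own))            # nearest other center
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))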
The summary includes a table that contains the following information:
• Algorithm. The clustering algorithm used, for example, “TwoStep”.
• Input Features. The number of fields, also known as inputs or predictors.
• Clusters. The number of clusters in the solution.
Clusters View
The Clusters view contains a cluster-by-features grid that includes cluster names, sizes, and profiles for
each cluster.
The columns in the grid contain the following information:
• Cluster. The cluster numbers created by the algorithm.
• Label. Any labels applied to each cluster (this is blank by default). Double-click in the cell to enter a
label that describes the cluster contents; for example, “Luxury car buyers”.
• Description. Any description of the cluster contents (this is blank by default). Double-click in the cell to
enter a description of the cluster; for example, “55+ years of age, professionals, earning over
$100,000”.
• Size. The size of each cluster as a percentage of the overall cluster sample. Each size cell within the grid
displays a vertical bar that shows the size percentage within the cluster, a size percentage in numeric
format, and the cluster case counts.
• Features. The individual inputs or predictors, sorted by overall importance by default. If any cluster
columns have equal sizes, they are shown in ascending order of the cluster numbers.
Overall feature importance is indicated by the color of the cell background shading; the most important
feature is darkest; the least important feature is unshaded. A guide above the table indicates the
importance attached to each feature cell color.
When you hover your mouse over a cell, the full name/label of the feature and the importance value for
the cell are displayed. Further information may be displayed, depending on the view and feature type. In
the Cluster Centers view, this includes the cell statistic and the cell value; for example: “Mean: 4.32”. For
categorical features the cell shows the name of the most frequent (modal) category and its percentage.
Within the Clusters view, you can select various ways to display the cluster information:
• Transpose clusters and features. See the topic “Transpose Clusters and Features” on page 120 for
more information.
• Sort features. See the topic “Sort Features” on page 120 for more information.
• Sort clusters. See the topic “Sort Clusters” on page 120 for more information.
• Select cell contents. See the topic “Cell Contents” on page 120 for more information.
Transpose Clusters and Features
By default, clusters are displayed as columns and features are displayed as rows. To reverse this display,
click the Transpose Clusters and Features button to the left of the Sort Features By buttons. For
example, you may want to do this when you have many clusters displayed, to reduce the amount of
horizontal scrolling required to see the data.
Sort Features
The Sort Features By buttons enable you to select how feature cells are displayed:
• Overall Importance. This is the default sort order. Features are sorted in descending order of overall
importance, and sort order is the same across clusters. If any features have tied importance values, the
tied features are listed in ascending sort order of the feature names.
• Within-Cluster Importance. Features are sorted with respect to their importance for each cluster. If
any features have tied importance values, the tied features are listed in ascending sort order of the
feature names. When this option is chosen the sort order usually varies across clusters.
• Name. Features are sorted by name in alphabetical order.
• Data order. Features are sorted by their order in the dataset.
Sort Clusters
By default clusters are sorted in descending order of size. The Sort Clusters By buttons enable you to
sort them by name in alphabetical order, or, if you have created unique labels, in alphanumeric label order
instead.
Clusters that have the same label are sorted by cluster name. If clusters are sorted by label and you edit
the label of a cluster, the sort order is automatically updated.
Cell Contents
The Cells buttons enable you to change the display of the cell contents for features and evaluation fields.
• Cluster Centers. By default, cells display feature names/labels and the central tendency for each
cluster/feature combination. The mean is shown for continuous fields and the mode (most frequently
occurring category) with category percentage for categorical fields.
• Absolute Distributions. Shows feature names/labels and absolute distributions of the features within
each cluster. For categorical features, the display shows bar charts overlaid with categories ordered in
ascending order of the data values. For continuous features, the display shows a smooth density plot
that uses the same endpoints and intervals for each cluster.
The solid red colored display shows the cluster distribution, whilst the paler display represents the
overall data.
• Relative Distributions. Shows feature names/labels and relative distributions in the cells. In general
the displays are similar to those shown for absolute distributions, except that relative distributions are
displayed instead.
The solid red colored display shows the cluster distribution, while the paler display represents the
overall data.
• Basic View. Where there are a lot of clusters, it can be difficult to see all the detail without scrolling. To
reduce the amount of scrolling, select this view to change the display to a more compact version of the
table.
Cluster Predictor Importance View
The Predictor Importance view shows the relative importance of each field in estimating the model.
Cluster Sizes View
The Cluster Sizes view shows a pie chart that contains each cluster. The percentage size of each cluster is
shown on each slice; hover the mouse over each slice to display the count in that slice.
Below the chart, a table lists the following size information:
• The size of the smallest cluster (both a count and percentage of the whole).
• The size of the largest cluster (both a count and percentage of the whole).
• The ratio of size of the largest cluster to the smallest cluster.
Cell Distribution View
The Cell Distribution view shows an expanded, more detailed, plot of the distribution of the data for any
feature cell you select in the table in the Clusters main panel.
Cluster Comparison View
The Cluster Comparison view consists of a grid-style layout, with features in the rows and selected
clusters in the columns. This view helps you to better understand the factors that make up the clusters; it
also enables you to see differences between clusters not only as compared with the overall data, but with
each other.
To select clusters for display, click on the top of the cluster column in the Clusters main panel. Use either
Ctrl-click or Shift-click to select or deselect more than one cluster for comparison.
Note: You can select up to five clusters for display.
Clusters are shown in the order in which they were selected, while the order of fields is determined by the
Sort Features By option. When you select Within-Cluster Importance, fields are always sorted by
overall importance.
The background plots show the overall distribution of each feature:
• Categorical features are shown as dot plots, where the size of the dot indicates the most frequent/
modal category for each cluster (by feature).
• Continuous features are displayed as boxplots, which show overall medians and the interquartile
ranges.
Overlaid on these background views are boxplots for selected clusters:
• For continuous features, square point markers and horizontal lines indicate the median and
interquartile range for each cluster.
• Each cluster is represented by a different color, shown at the top of the view.
Navigating the Cluster Viewer
The Cluster Viewer is an interactive display. You can:
• Select a field or cluster to view more details.
• Compare clusters to select items of interest.
• Alter the display.
• Transpose axes.
Using the Toolbars
You control the information shown in both the left and right panels by using the toolbar options. You can
change the orientation of the display (top-down, left-to-right, or right-to-left) using the toolbar controls.
In addition, you can also reset the viewer to the default settings, and open a dialog box to specify the
contents of the Clusters view in the main panel.
The Sort Features By, Sort Clusters By, Cells, and Display options are only available when you select the
Clusters view in the main panel. See the topic “Clusters View” on page 119 for more information.
Table 2. Toolbar icons
The toolbar provides icons for Transpose Clusters and Features, Sort Features By, Sort Clusters By, and
Cells; see the corresponding topics above for details.
Control Cluster View Display
To control what is shown in the Clusters view on the main panel, click the Display button; the Display
dialog opens.
Features. Selected by default. To hide all input features, deselect the check box.
Evaluation Fields. Choose the evaluation fields (fields not used to create the cluster model, but sent to
the model viewer to evaluate the clusters) to display; none are shown by default. Note: The evaluation field
must be a string with more than one value. This check box is unavailable if no evaluation fields are
available.
Cluster Descriptions. Selected by default. To hide all cluster description cells, deselect the check box.
Cluster Sizes. Selected by default. To hide all cluster size cells, deselect the check box.
Maximum Number of Categories. Specify the maximum number of categories to display in charts of
categorical features; the default is 20.
Filtering Records
If you want to know more about the cases in a particular cluster or group of clusters, you can select a
subset of records for further analysis based on the selected clusters.
1. Select the clusters in the Clusters view of the Cluster Viewer. To select multiple clusters, use Ctrl-click.
2. From the menus choose:
Generate > Filter Records…
3. Enter a filter variable name. Records from the selected clusters will receive a value of 1 for this field.
All other records will receive a value of 0 and will be excluded from subsequent analyses until you
change the filter status.
4. Click OK.
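For users working outside the dialogs, the same 0/1 filter logic can be sketched in a few lines of Python with pandas; the column names (cluster, filter_) and the selected cluster values are hypothetical.

import pandas as pd

df = pd.DataFrame({"income": [20, 35, 80, 95], "cluster": [1, 2, 3, 3]})

selected_clusters = [3]            # clusters chosen in the Cluster Viewer
# Records from the selected clusters receive 1; all other records receive 0.
df["filter_"] = df["cluster"].isin(selected_clusters).astype(int)

subset = df[df["filter_"] == 1]    # subsequent analyses use only these rows
print(subset)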
Hierarchical Cluster Analysis
This procedure attempts to identify relatively homogeneous groups of cases (or variables) based on
selected characteristics, using an algorithm that starts with each case (or variable) in a separate cluster
and combines clusters until only one is left. You can analyze raw variables, or you can choose from a
variety of standardizing transformations. Distance or similarity measures are generated by the Proximities
procedure. Statistics are displayed at each stage to help you select the best solution.
Example. Are there identifiable groups of television shows that attract similar audiences within each
group? With hierarchical cluster analysis, you could cluster television shows (cases) into homogeneous
groups based on viewer characteristics. This can be used to identify segments for marketing. Or you can
cluster cities (cases) into homogeneous groups so that comparable cities can be selected to test various
marketing strategies.
Statistics. Agglomeration schedule, distance (or similarity) matrix, and cluster membership for a single
solution or a range of solutions. Plots: dendrograms and icicle plots.
Hierarchical Cluster Analysis Data Considerations
Data. The variables can be quantitative, binary, or count data. Scaling of variables is an important issue;
differences in scaling may affect your cluster solution(s). If your variables have large differences in scaling
(for example, one variable is measured in dollars and the other is measured in years), you should consider
standardizing them (this can be done automatically by the Hierarchical Cluster Analysis procedure).
Case order. If tied distances or similarities exist in the input data or occur among updated clusters during
joining, the resulting cluster solution may depend on the order of cases in the file. You may want to obtain
several different solutions with cases sorted in different random orders to verify the stability of a given
solution.
Assumptions. The distance or similarity measures used should be appropriate for the data analyzed (see
the Proximities procedure for more information on choices of distance and similarity measures). Also, you
should include all relevant variables in your analysis. Omission of influential variables can result in a
misleading solution. Because hierarchical cluster analysis is an exploratory method, results should be
treated as tentative until they are confirmed with an independent sample.
To Obtain a Hierarchical Cluster Analysis
1. From the menus choose:
Analyze > Classify > Hierarchical Cluster…
2. If you are clustering cases, select at least one numeric variable. If you are clustering variables, select
at least three numeric variables.
Optionally, you can select an identification variable to label cases.
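The sketch below shows an analogous agglomerative analysis in Python with SciPy, as a point of reference only; the simulated data, the average-linkage/Euclidean combination, and the three-cluster cut are the editor's assumptions, not defaults of the SPSS procedure.

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from scipy.stats import zscore

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))        # 20 cases, 3 numeric variables

Xz = zscore(X, axis=0)              # standardize so differently scaled
                                    # variables do not dominate distances

# Start with each case in its own cluster and merge until one remains,
# here with between-groups (average) linkage on Euclidean distances.
Z = linkage(Xz, method="average", metric="euclidean")

labels = fcluster(Z, t=3, criterion="maxclust")   # cut at a 3-cluster solution
tree = dendrogram(Z, no_plot=True)                # dendrogram structure
print(labels)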
Hierarchical Cluster Analysis Method
Cluster Method. Available alternatives are between-groups linkage, within-groups linkage, nearest
neighbor, furthest neighbor, centroid clustering, median clustering, and Ward’s method.
Measure. Allows you to specify the distance or similarity measure to be used in clustering. Select the type
of data and the appropriate distance or similarity measure:
• Interval. Available alternatives are Euclidean distance, squared Euclidean distance, cosine, Pearson
correlation, Chebychev, block, Minkowski, and customized.
• Counts. Available alternatives are chi-square measure and phi-square measure.
• Binary. Available alternatives are Euclidean distance, squared Euclidean distance, size difference,
pattern difference, variance, dispersion, shape, simple matching, phi 4-point correlation, lambda,
Anderberg’s D, dice, Hamann, Jaccard, Kulczynski 1, Kulczynski 2, Lance and Williams, Ochiai, Rogers
and Tanimoto, Russel and Rao, Sokal and Sneath 1, Sokal and Sneath 2, Sokal and Sneath 3, Sokal and
Sneath 4, Sokal and Sneath 5, Yule’s Y, and Yule’s Q.
Transform Values. Allows you to standardize data values for either cases or variables before computing
proximities (not available for binary data). Available standardization methods are z scores, range −1 to 1,
range 0 to 1, maximum magnitude of 1, mean of 1, and standard deviation of 1.
Transform Measures. Allows you to transform the values generated by the distance measure. They are
applied after the distance measure has been computed. Available alternatives are absolute values,
change sign, and rescale to 0–1 range.
Hierarchical Cluster Analysis Statistics
Agglomeration schedule. Displays the cases or clusters combined at each stage, the distances between
the cases or clusters being combined, and the last cluster level at which a case (or variable) joined the
cluster.
Proximity matrix. Gives the distances or similarities between items.
Cluster Membership. Displays the cluster to which each case is assigned at one or more stages in the
combination of clusters. Available options are single solution and range of solutions.
Hierarchical Cluster Analysis Plots
Dendrogram. Displays a dendrogram. Dendrograms can be used to assess the cohesiveness of the
clusters formed and can provide information about the appropriate number of clusters to keep.
Icicle. Displays an icicle plot, including all clusters or a specified range of clusters. Icicle plots display
information about how cases are combined into clusters at each iteration of the analysis. Orientation
allows you to select a vertical or horizontal plot.
Hierarchical Cluster Analysis Save New Variables
Cluster Membership. Allows you to save cluster memberships for a single solution or a range of
solutions. Saved variables can then be used in subsequent analyses to explore other differences between
groups.
CLUSTER Command Syntax Additional Features
The Hierarchical Cluster procedure uses CLUSTER command syntax. The command syntax language also
allows you to:
• Use several clustering methods in a single analysis.
• Read and analyze a proximity matrix.
• Write a proximity matrix to disk for later analysis.
• Specify any values for power and root in the customized (Power) distance measure.
• Specify names for saved variables.
See the Command Syntax Reference for complete syntax information.
K-Means Cluster Analysis
This procedure attempts to identify relatively homogeneous groups of cases based on selected
characteristics, using an algorithm that can handle large numbers of cases. However, the algorithm
requires you to specify the number of clusters. You can specify initial cluster centers if you know this
information. You can select one of two methods for classifying cases, either updating cluster centers
iteratively or classifying only. You can save cluster membership, distance information, and final cluster
centers. Optionally, you can specify a variable whose values are used to label casewise output. You can
also request analysis of variance F statistics. While these statistics are opportunistic (the procedure tries
to form groups that do differ), the relative size of the statistics provides information about each variable’s
contribution to the separation of the groups.
Example. What are some identifiable groups of television shows that attract similar audiences within
each group? With k-means cluster analysis, you could cluster television shows (cases) into k
homogeneous groups based on viewer characteristics. This process can be used to identify segments for
marketing. Or you can cluster cities (cases) into homogeneous groups so that comparable cities can be
selected to test various marketing strategies.
Statistics. Complete solution: initial cluster centers, ANOVA table. Each case: cluster information,
distance from cluster center.
K-Means Cluster Analysis Data Considerations
Data. Variables should be quantitative at the interval or ratio level. If your variables are binary or counts,
use the Hierarchical Cluster Analysis procedure.
Case and initial cluster center order. The default algorithm for choosing initial cluster centers is not
invariant to case ordering. The Use running means option in the Iterate dialog box makes the resulting
solution potentially dependent on case order, regardless of how initial cluster centers are chosen. If you
are using either of these methods, you may want to obtain several different solutions with cases sorted in
different random orders to verify the stability of a given solution. Specifying initial cluster centers and not
using the Use running means option will avoid issues related to case order. However, ordering of the
initial cluster centers may affect the solution if there are tied distances from cases to cluster centers. To
assess the stability of a given solution, you can compare results from analyses with different
permutations of the initial center values.
Assumptions. Distances are computed using simple Euclidean distance. If you want to use another
distance or similarity measure, use the Hierarchical Cluster Analysis procedure. Scaling of variables is an
important consideration. If your variables are measured on different scales (for example, one variable is
expressed in dollars and another variable is expressed in years), your results may be misleading. In such
cases, you should consider standardizing your variables before you perform the k-means cluster analysis
(this task can be done in the Descriptives procedure). The procedure assumes that you have selected the
appropriate number of clusters and that you have included all relevant variables. If you have chosen an
inappropriate number of clusters or omitted important variables, your results may be misleading.
To Obtain a K-Means Cluster Analysis
1. From the menus choose:
Analyze > Classify > K-Means Cluster…
2. Select the variables to be used in the cluster analysis.
3. Specify the number of clusters. (The number of clusters must be at least 2 and must not be greater
than the number of cases in the data file.)
4. Select either Iterate and classify or Classify only.
5. Optionally, select an identification variable to label cases.
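As an informal counterpart, the following scikit-learn sketch performs the same kind of analysis; the data, the choice of three clusters, and the prior standardization are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))                  # 200 cases, 4 variables

Xz = StandardScaler().fit_transform(X)         # standardize first (see above)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(Xz)

cluster_membership = km.labels_ + 1            # numbered from 1, as in SPSS
# Euclidean distance from each case to its classification center.
distance = np.linalg.norm(Xz - km.cluster_centers_[km.labels_], axis=1)
print(km.cluster_centers_)                     # final cluster centers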
K-Means Cluster Analysis Efficiency
The k-means cluster analysis command is efficient primarily because it does not compute the distances
between all pairs of cases, as do many clustering algorithms, including the algorithm that is used by the
hierarchical clustering command.
For maximum efficiency, take a sample of cases and select the Iterate and classify method to determine
cluster centers. Select Write final as. Then restore the entire data file and select Classify only as the
method and select Read initial from to classify the entire file using the centers that are estimated from
the sample. You can write to and read from a file or a dataset. Datasets are available for subsequent use
in the same session but are not saved as files unless explicitly saved prior to the end of the session.
Dataset names must conform to variable-naming rules.
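The sample-then-classify workflow can be mimicked outside SPSS as follows; this sketch uses scikit-learn in place of the Write final as and Read initial from options, and the 10% sampling fraction is an arbitrary assumption.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
X_full = rng.normal(size=(100_000, 4))

# Step 1: estimate cluster centers on a sample ("Iterate and classify").
sample = rng.choice(len(X_full), size=len(X_full) // 10, replace=False)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_full[sample])

# Step 2: assign every case in the full file to its nearest saved center
# without further iteration ("Classify only").
labels_full = km.predict(X_full)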
K-Means Cluster Analysis Iterate
Note: These options are available only if you select the Iterate and classify method from the K-Means
Cluster Analysis dialog box.
Maximum Iterations. Limits the number of iterations in the k-means algorithm. Iteration stops after this
many iterations even if the convergence criterion is not satisfied. This number must be between 1 and
999.
To reproduce the algorithm used by the Quick Cluster command prior to version 5.0, set Maximum
Iterations to 1.
Convergence Criterion. Determines when iteration ceases. It represents a proportion of the minimum
distance between initial cluster centers, so it must be greater than 0 but not greater than 1. If the
criterion equals 0.02, for example, iteration ceases when a complete iteration does not move any of the
cluster centers by a distance of more than 2% of the smallest distance between any initial cluster
centers.
Use running means. Allows you to request that cluster centers be updated after each case is assigned. If
you do not select this option, new cluster centers are calculated after all cases have been assigned.
K-Means Cluster Analysis Save
You can save information about the solution as new variables to be used in subsequent analyses:
Cluster membership. Creates a new variable indicating the final cluster membership of each case. Values
of the new variable range from 1 to the number of clusters.
Distance from cluster center. Creates a new variable indicating the Euclidean distance between each
case and its classification center.
K-Means Cluster Analysis Options
Statistics. You can select the following statistics: initial cluster centers, ANOVA table, and cluster
information for each case.
• Initial cluster centers. First estimate of the variable means for each of the clusters. By default, a number
of well-spaced cases equal to the number of clusters is selected from the data. Initial cluster centers
are used for a first round of classification and are then updated.
• ANOVA table. Displays an analysis-of-variance table which includes univariate F tests for each
clustering variable. The F tests are only descriptive and the resulting probabilities should not be
interpreted. The ANOVA table is not displayed if all cases are assigned to a single cluster.
• Cluster information for each case. Displays for each case the final cluster assignment and the Euclidean
distance between the case and the cluster center used to classify the case. Also displays Euclidean
distance between final cluster centers.
Missing Values. Available options are Exclude cases listwise or Exclude cases pairwise.
• Exclude cases listwise. Excludes cases with missing values for any clustering variable from the
analysis.
• Exclude cases pairwise. Assigns cases to clusters based on distances that are computed from all
variables with nonmissing values.
QUICK CLUSTER Command Additional Features
The K-Means Cluster procedure uses QUICK CLUSTER command syntax. The command syntax language
also allows you to:
• Accept the first k cases as initial cluster centers, thereby avoiding the data pass that is normally used to
estimate them.
• Specify initial cluster centers directly as a part of the command syntax.
• Specify names for saved variables.
See the Command Syntax Reference for complete syntax information.
Nonparametric Tests
Nonparametric tests make minimal assumptions about the underlying distribution of the data. The tests
that are available in these dialogs can be grouped into three broad categories based on how the data are
organized:
• A one-sample test analyzes one field.
• A test for related samples compares two or more fields for the same set of cases.
• An independent-samples test analyzes one field that is grouped by categories of another field.
One-Sample Nonparametric Tests
One-sample nonparametric tests identify differences in single fields using one or more nonparametric
tests. Nonparametric tests do not assume your data follow the normal distribution.
What is your objective? The objectives allow you to quickly specify different but commonly used test
settings.
• Automatically compare observed data to hypothesized. This objective applies the Binomial test to
categorical fields with only two categories, the Chi-Square test to all other categorical fields, and the
Kolmogorov-Smirnov test to continuous fields.
• Test sequence for randomness. This objective uses the Runs test to test the observed sequence of
data values for randomness.
• Custom analysis. When you want to manually amend the test settings on the Settings tab, select this
option. Note that this setting is automatically selected if you subsequently make changes to options on
the Settings tab that are incompatible with the currently selected objective.
Obtaining One-Sample Nonparametric Tests
1. From the menus choose:
Analyze > Nonparametric Tests > One Sample…
2. Click Run.
Optionally, you can:
• Specify an objective on the Objective tab.
• Specify field assignments on the Fields tab.
• Specify expert settings on the Settings tab.
Fields Tab
The Fields tab specifies which fields should be tested.
Use predefined roles. This option uses existing field information. All fields with a predefined role as
Input, Target, or Both will be used as test fields. At least one test field is required.
Use custom field assignments. This option allows you to override field roles. After selecting this option,
specify the fields below:
• Test Fields. Select one or more fields.
Settings Tab
The Settings tab comprises several different groups of settings that you can modify to fine-tune how the
algorithm processes your data. If you make any changes to the default settings that are incompatible with
the currently selected objective, the Objective tab is automatically updated to select the Customize
analysis option.
Choose Tests
These settings specify the tests to be performed on the fields specified on the Fields tab.
Automatically choose the tests based on the data. This setting applies the Binomial test to categorical
fields with only two valid (non-missing) categories, the Chi-Square test to all other categorical fields, and
the Kolmogorov-Smirnov test to continuous fields.
Customize tests. This setting allows you to choose specific tests to be performed.
• Compare observed binary probability to hypothesized (Binomial test). The Binomial test can be
applied to all fields. This produces a one-sample test that tests whether the observed distribution of a
flag field (a categorical field with only two categories) is the same as what is expected from a specified
binomial distribution. In addition, you can request confidence intervals. See “Binomial Test Options ” on
page 128 for details on the test settings.
• Compare observed probabilities to hypothesized (Chi-Square test). The Chi-Square test is applied to
nominal and ordinal fields. This produces a one-sample test that computes a chi-square statistic based
on the differences between the observed and expected frequencies of categories of a field. See “Chi-
Square Test Options ” on page 129 for details on the test settings.
• Test observed distribution against hypothesized (Kolmogorov-Smirnov test). The Kolmogorov-
Smirnov test is applied to continuous and ordinal fields. This produces a one-sample test of whether the
sample cumulative distribution function for a field is homogeneous with a uniform, normal, Poisson, or
exponential distribution. See “Kolmogorov-Smirnov Options ” on page 129 for details on the test
settings.
• Compare median to hypothesized (Wilcoxon signed-rank test). The Wilcoxon signed-rank test is
applied to continuous and ordinal fields. This produces a one-sample test of median value of a field.
Specify a number as the hypothesized median.
• Test sequence for randomness (Runs test). The Runs test is applied to all fields. This produces a one-
sample test of whether the sequence of values of a dichotomized field is random. See “Runs Test
Options ” on page 129 for details on the test settings.
Binomial Test Options
The binomial test is intended for flag fields (categorical fields with only two categories), but is applied to
all fields by using rules for defining “success”.
Hypothesized proportion. This specifies the expected proportion of records defined as “successes”, or p.
Specify a value greater than 0 and less than 1. The default is 0.5.
Confidence Interval. The following methods for computing confidence intervals for binary data are
available:
• Clopper-Pearson (exact). An exact interval based on the cumulative binomial distribution.
• Jeffreys. A Bayesian interval based on the posterior distribution of p using the Jeffreys prior.
• Likelihood ratio. An interval based on the likelihood function for p.
Define Success for Categorical Fields. This specifies how “success”, the data value(s) tested against the
hypothesized proportion, is defined for categorical fields.
• Use first category found in data performs the binomial test using the first value found in the sample to
define “success”. This option is only applicable to nominal or ordinal fields with only two values; all
other categorical fields specified on the Fields tab where this option is used will not be tested. This is
the default.
• Specify success values performs the binomial test using the specified list of values to define “success”.
Specify a list of string or numeric values. The values in the list do not need to be present in the sample.
Define Success for Continuous Fields. This specifies how “success”, the data value(s) tested against the
test value, is defined for continuous fields. Success is defined as values equal to or less than a cut point.
• Sample midpoint sets the cut point at the average of the minimum and maximum values.
• Custom cutpoint allows you to specify a value for the cut point.
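For comparison, the binomial test and a Clopper-Pearson interval can be computed with SciPy as below; the counts are invented for illustration, and SciPy's binomtest does not offer the Jeffreys or likelihood-ratio intervals described above.

from scipy.stats import binomtest

# 58 "successes" out of 100 records, hypothesized proportion p = 0.5.
result = binomtest(k=58, n=100, p=0.5)
print(result.pvalue)

# Clopper-Pearson (exact) confidence interval for the proportion.
print(result.proportion_ci(confidence_level=0.95, method="exact"))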
Chi-Square Test Options
All categories have equal probability. This produces equal frequencies among all categories in the
sample. This is the default.
Customize expected probability. This allows you to specify unequal frequencies for a specified list of
categories. Specify a list of string or numeric values. The values in the list do not need to be present in the
sample. In the Category column, specify category values. In the Relative Frequency column, specify a
value greater than 0 for each category. Custom frequencies are treated as ratios so that, for example,
specifying frequencies 1, 2, and 3 is equivalent to specifying frequencies 10, 20, and 30, and both specify
that 1/6 of the records are expected to fall into the first category, 1/3 into the second, and 1/2 into the
third. When custom expected probabilities are specified, the custom category values must include all the
field values in the data; otherwise the test is not performed for that field.
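The ratio arithmetic above can be verified with a short SciPy sketch; the observed counts are invented.

import numpy as np
from scipy.stats import chisquare

observed = np.array([20, 30, 70])        # counts per category, n = 120

ratios = np.array([1, 2, 3])             # relative frequencies 1, 2, 3
expected = observed.sum() * ratios / ratios.sum()   # 20, 40, 60 (1/6, 1/3, 1/2)

print(chisquare(f_obs=observed, f_exp=expected))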
Kolmogorov-Smirnov Options
This dialog specifies which distributions should be tested and the parameters of the hypothesized
distributions.
When certain parameters of the distribution have to be estimated from the sample, the Kolmogorov-
Smirnov test no longer applies. In these instances, the Lilliefors test statistic can be used to estimate the
p-value by Monte Carlo sampling; for example, for testing normality with mean and variance unknown.
The Lilliefors test applies to the three continuous distributions (Normal, Exponential, and Uniform); it
does not apply if the underlying distribution is discrete (Poisson). The test is only defined for one-sample
inference when the corresponding distribution parameters are not specified.
Normal
Use sample data uses the observed mean and standard deviation, and provides the option of selecting
either the existing asymptotic test results or the Lilliefors test based on Monte Carlo sampling.
Custom allows you to specify values.
Uniform
Use sample data uses the observed minimum and maximum and applies the Lilliefors test based on
Monte Carlo sampling. Custom allows you to specify minimum and maximum values.
Exponential
Sample mean uses the observed mean and applies the Lilliefors test based on Monte Carlo sampling.
Custom allows you to specify a mean value.
Poisson
Mean allows you to specify an observed mean value.
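The distinction between a fully specified distribution and one with estimated parameters can be illustrated in Python; the data are simulated, and statsmodels' lilliefors function stands in for the Monte Carlo-based Lilliefors option (statsmodels uses a table lookup or its own approximation rather than SPSS's sampling scheme).

import numpy as np
from scipy.stats import kstest
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(4)
x = rng.normal(loc=0, scale=1, size=200)

# Custom: parameters fully specified, so the classical test applies.
print(kstest(x, "norm", args=(0, 1)))    # statistic = most extreme difference

# Use sample data: mean and variance estimated, so use Lilliefors instead.
print(lilliefors(x, dist="norm"))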
Runs Test Options
The runs test is intended for flag fields (categorical fields with only two categories), but can be applied to
all fields by using rules for defining the groups.
Define Groups for Categorical Fields. The following options are available:
• There are only 2 categories in the sample performs the runs test using the values found in the sample
to define the groups. This option is only applicable to nominal or ordinal fields with only two values; all
other categorical fields specified on the Fields tab where this option is used will not be tested.
• Recode data into 2 categories performs the runs test using the specified list of values to define one of
the groups. All other values in the sample define the other group. The values in the list do not all need to
be present in the sample, but at least one record must be in each group.
Define Cut Point for Continuous Fields. This specifies how groups are defined for continuous fields. The
first group is defined as values equal to or less than a cut point.
• Sample median sets the cut point at the sample median.
• Sample mean sets the cut point at the sample mean.
• Custom allows you to specify a value for the cut point.
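A runs test with the sample median as the cut point can be sketched with statsmodels, on simulated data:

import numpy as np
from statsmodels.sandbox.stats.runs import runstest_1samp

rng = np.random.default_rng(5)
x = rng.normal(size=100)

# cutoff="median" forms the two groups as values <= median vs. > median.
z_stat, p_value = runstest_1samp(x, cutoff="median", correction=True)
print(z_stat, p_value)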
Test Options
Significance level
This specifies the significance level (alpha) for all tests. Specify a numeric value between 0 and 1.
0.05 is the default.
Confidence interval (%)
This specifies the confidence level for all confidence intervals produced. Specify a numeric value
between 0 and 100. 95 is the default.
Excluded Cases
This specifies how to determine the case basis for tests.
Exclude cases test by test
Records with missing values for a field that is used for a specific test are omitted from that test.
When several tests are specified in the analysis, each test is evaluated separately.
Exclude cases listwise
Records with missing values for any field that is named on the Fields tab are excluded from all
analyses.
Monte Carlo Sampling
As noted under "Kolmogorov-Smirnov Options", when certain parameters of the distribution have to be
estimated from the sample, the Lilliefors test statistic is used to estimate the p-value by Monte Carlo
sampling. The following settings control that sampling.
Set custom seed
When enabled, this setting provides the option of resetting the random seed value that is used for
Monte Carlo sampling. The value must be a single integer between 1 and 2,147,483,647. The
default value is 2,000,000.
Number of samples
Resets the number of Monte Carlo sampling replicates that are used by the Lilliefors test. The
value must be a single integer of at least 100. The default value is 10,000.
Simulation confidence level (%)
Resets the Kolmogorov-Smirnov test’s estimated confidence interval level. The value must be a
single value between 0 and 100. The default value is 99.
User-Missing Values
User-Missing Values for Categorical Fields. Categorical fields must have valid values for a record to be
included in the analysis. These controls allow you to decide whether user-missing values are treated as
valid among categorical fields. System-missing values and missing values for continuous fields are always
treated as invalid.
NPTESTS command additional features
The command syntax language also allows you to:
• Specify one-sample, independent-samples, and related-samples tests in a single run of the procedure.
See the Command Syntax Reference for complete syntax information.
Independent-Samples Nonparametric Tests
Independent-samples nonparametric tests identify differences between two or more groups using one or
more nonparametric tests. Nonparametric tests do not assume your data follow the normal distribution.
What is your objective? The objectives allow you to quickly specify different but commonly used test
settings.
• Automatically compare distributions across groups. This objective applies the Mann-Whitney U test
to data with 2 groups, or the Kruskal-Wallis 1-way ANOVA to data with k groups.
• Compare medians across groups. This objective uses the Median test to compare the observed
medians across groups.
• Custom analysis. When you want to manually amend the test settings on the Settings tab, select this
option. Note that this setting is automatically selected if you subsequently make changes to options on
the Settings tab that are incompatible with the currently selected objective.
To Obtain Independent-Samples Nonparametric Tests
1. From the menus choose:
Analyze > Nonparametric Tests > Independent Samples…
2. Click Run.
Optionally, you can:
• Specify an objective on the Objective tab.
• Specify field assignments on the Fields tab.
• Specify expert settings on the Settings tab.
Fields Tab
The Fields tab specifies which fields should be tested and the field used to define groups.
Use predefined roles. This option uses existing field information. All continuous and ordinal fields with a
predefined role as Target or Both will be used as test fields. If there is a single categorical field with a
predefined role as Input, it will be used as a grouping field. Otherwise no grouping field is used by default
and you must use custom field assignments. At least one test field and a grouping field are required.
Use custom field assignments. This option allows you to override field roles. After selecting this option,
specify the fields below:
• Test Fields. Select one or more continuous or ordinal fields.
• Groups. Select a categorical field.
Settings Tab
The Settings tab comprises several different groups of settings that you can modify to fine-tune how the
algorithm processes your data. If you make any changes to the default settings that are incompatible with
the currently selected objective, the Objective tab is automatically updated to select the Customize
analysis option.
Choose Tests
These settings specify the tests to be performed on the fields specified on the Fields tab.
Automatically choose the tests based on the data. This setting applies the Mann-Whitney U test to data
with 2 groups, or the Kruskal-Wallis 1-way ANOVA to data with k groups.
Customize tests. This setting allows you to choose specific tests to be performed.
• Compare Distributions across Groups. These produce independent-samples tests of whether the
samples are from the same population.
Mann-Whitney U (2 samples) uses the rank of each case to test whether the groups are drawn from the
same population. The first value in ascending order of the grouping field defines the first group and the
second defines the second group. If the grouping field has more than two values, this test is not
produced.
Kolmogorov-Smirnov (2 samples) is sensitive to any difference in median, dispersion, skewness, and
so forth, between the two distributions. If the grouping field has more than two values, this test is not
produced.
Test sequence for randomness (Wald-Wolfowitz for 2 samples) produces a runs test with group
membership as the criterion. If the grouping field has more than two values, this test is not produced.
Kruskal-Wallis 1-way ANOVA (k samples) is an extension of the Mann-Whitney U test and the
nonparametric analog of one-way analysis of variance. You can optionally request multiple comparisons
of the k samples, either all pairwise multiple comparisons or stepwise step-down comparisons.
Test for ordered alternatives (Jonckheere-Terpstra for k samples) is a more powerful alternative to
Kruskal-Wallis when the k samples have a natural ordering. For example, the k populations might
represent k increasing temperatures. The hypothesis that different temperatures produce the same
response distribution is tested against the alternative that as the temperature increases, the magnitude
of the response increases. Here, the alternative hypothesis is ordered; therefore, Jonckheere-Terpstra
is the most appropriate test to use. Smallest to largest specifies the alternative hypothesis that the
location parameter of the first group is less than or equal to the second, which is less than or equal to
the third, and so on. Largest to smallest specifies the alternative hypothesis that the location
parameter of the first group is greater than or equal to the second, which is greater than or equal to the
third, and so on. For both options, the alternative hypothesis also assumes that the locations are not all
equal. You can optionally request multiple comparisons of the k samples, either All pairwise multiple
comparisons or Stepwise step-down comparisons.
• Compare Ranges across Groups. This produces an independent-samples test of whether the samples
have the same range. Moses extreme reaction (2 samples) tests a control group versus a comparison
group. The first value in ascending order of the grouping field defines the control group and the second
defines the comparison group. If the grouping field has more than two values, this test is not produced.
• Compare Medians across Groups. This produces an independent-samples test of whether the
samples have the same median. Median test (k samples) can use either the pooled sample median
(calculated across all records in the dataset) or a custom value as the hypothesized median. You can
optionally request multiple comparisons of the k samples, either All pairwise multiple comparisons or
Stepwise step-down comparisons.
• Estimate Confidence Intervals across Groups. Hodges-Lehmann estimate (2 samples) produces an
independent samples estimate and confidence interval for the difference in the medians of two groups.
If the grouping field has more than two values, this test is not produced.
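The two automatic choices can be reproduced informally with SciPy; the three groups of invented data below stand in for a test field split by a grouping field.

import numpy as np
from scipy.stats import mannwhitneyu, kruskal

rng = np.random.default_rng(6)
group_a = rng.normal(0.0, 1, 40)
group_b = rng.normal(0.5, 1, 40)
group_c = rng.normal(1.0, 1, 40)

print(mannwhitneyu(group_a, group_b))        # 2 groups
print(kruskal(group_a, group_b, group_c))    # k groups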
Test Options
Significance level. This specifies the significance level (alpha) for all tests. Specify a numeric value
between 0 and 1. 0.05 is the default.
Confidence interval (%). This specifies the confidence level for all confidence intervals produced. Specify
a numeric value between 0 and 100. 95 is the default.
Excluded Cases. This specifies how to determine the case basis for tests. Exclude cases listwise means
that records with missing values for any field that is named on any subcommand are excluded from all
analyses. Exclude cases test by test means that records with missing values for a field that is used for a
specific test are omitted from that test. When several tests are specified in the analysis, each test is
evaluated separately.
User-Missing Values
User-Missing Values for Categorical Fields. Categorical fields must have valid values for a record to be
included in the analysis. These controls allow you to decide whether user-missing values are treated as
valid among categorical fields. System-missing values and missing values for continuous fields are always
treated as invalid.
NPTESTS command additional features
The command syntax language also allows you to:
• Specify one-sample, independent-samples, and related-samples tests in a single run of the procedure.
See the Command Syntax Reference for complete syntax information.
Related-Samples Nonparametric Tests
Identifies differences between two or more related fields using one or more nonparametric tests.
Nonparametric tests do not assume your data follow the normal distribution.
Data Considerations. Each record corresponds to a given subject for which two or more related
measurements are stored in separate fields in the dataset. For example, a study concerning the
effectiveness of a dieting plan can be analyzed using related-samples nonparametric tests if each
subject’s weight is measured at regular intervals and stored in fields like Pre-diet weight, Interim weight,
and Post-diet weight. These fields are “related”.
What is your objective? The objectives allow you to quickly specify different but commonly used test
settings.
• Automatically compare observed data to hypothesized data. This objective applies McNemar’s Test
to categorical data when 2 fields are specified, Cochran’s Q to categorical data when more than 2 fields
are specified, the Wilcoxon Matched-Pair Signed-Rank test to continuous data when 2 fields are
specified, and Friedman’s 2-Way ANOVA by Ranks to continuous data when more than 2 fields are
specified.
• Custom analysis. When you want to manually amend the test settings on the Settings tab, select this
option. Note that this setting is automatically selected if you subsequently make changes to options on
the Settings tab that are incompatible with the currently selected objective.
When fields of differing measurement level are specified, they are first separated by measurement level
and then the appropriate test is applied to each group. For example, if you choose Automatically
compare observed data to hypothesized data as your objective and specify 3 continuous fields and 2
nominal fields, then Friedman’s test is applied to the continuous fields and McNemar’s test is applied to
the nominal fields.
To Obtain Related-Samples Nonparametric Tests
1. From the menus choose:
Analyze > Nonparametric Tests > Related Samples…
2. Click Run.
Optionally, you can:
• Specify an objective on the Objective tab.
• Specify field assignments on the Fields tab.
• Specify expert settings on the Settings tab.
Fields Tab
The Fields tab specifies which fields should be tested.
Use predefined roles. This option uses existing field information. All fields with a predefined role as
Target or Both will be used as test fields. At least two test fields are required.
Use custom field assignments. This option allows you to override field roles. After selecting this option,
specify the fields below:
• Test Fields. Select two or more fields. Each field corresponds to a separate related sample.
Settings Tab
The Settings tab comprises several different groups of settings that you can modify to fine-tune how the
procedure processes your data. If you make any changes to the default settings that are incompatible
with the other objectives, the Objective tab is automatically updated to select the Customize analysis
option.
Choose Tests
These settings specify the tests to be performed on the fields specified on the Fields tab.
Automatically choose the tests based on the data. This setting applies McNemar’s Test to categorical
data when 2 fields are specified, Cochran’s Q to categorical data when more than 2 fields are specified,
the Wilcoxon Matched-Pair Signed-Rank test to continuous data when 2 fields are specified, and
Friedman’s 2-Way ANOVA by Ranks to continuous data when more than 2 fields are specified.
Customize tests. This setting allows you to choose specific tests to be performed.
• Test for Change in Binary Data. McNemar’s test (2 samples) can be applied to categorical fields. This
produces a related-samples test of whether combinations of values between two flag fields (categorical
fields with only two values) are equally likely. If there are more than two fields specified on the Fields
tab, this test is not performed. See “McNemar’s Test: Define Success ” on page 134 for details on the
test settings. Cochran’s Q (k samples) can be applied to categorical fields. This produces a related-
samples test of whether combinations of values between k flag fields (categorical fields with only two
values) are equally likely. You can optionally request multiple comparisons of the k samples, either all
pairwise multiple comparisons or stepwise step-down comparisons. See “Cochran’s Q: Define
Success ” on page 134 for details on the test settings.
• Test for Changes in Multinomial Data. Marginal homogeneity test (2 samples) produces a related
samples test of whether combinations of values between two paired ordinal fields are equally likely.
The marginal homogeneity test is typically used in repeated measures situations. This test is an
extension of the McNemar test from binary response to multinomial response. If there are more than
two fields specified on the Fields tab, this test is not performed.
• Compare Median Difference to Hypothesized. These tests each produce a related-samples test of
whether the median difference between two fields is different from 0. The test applies to continuous
and ordinal fields. If there are more than two fields specified on the Fields tab, these tests are not
performed.
• Estimate Confidence Interval. This produces a related samples estimate and confidence interval for
the median difference between two paired fields. The test applies to continuous and ordinal fields. If
there are more than two fields specified on the Fields tab, this test is not performed.
• Quantify Associations. Kendall’s coefficient of concordance (k samples) produces a measure of
agreement among judges or raters, where each record is one judge’s rating of several items (fields). You
can optionally request multiple comparisons of the k samples, either All pairwise multiple comparisons
or Stepwise step-down comparisons.
• Compare Distributions. Friedman’s 2-way ANOVA by ranks (k samples) produces a related samples
test of whether k related samples have been drawn from the same population. You can optionally
request multiple comparisons of the k samples, either All pairwise multiple comparisons or Stepwise
step-down comparisons.
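Several of the tests named above have SciPy or statsmodels counterparts, sketched below on invented data (three related continuous fields, a 2x2 paired table, and three binary fields); this is an informal cross-check, not the NPTESTS implementation.

import numpy as np
from scipy.stats import wilcoxon, friedmanchisquare
from statsmodels.stats.contingency_tables import mcnemar, cochrans_q

rng = np.random.default_rng(7)
pre, mid, post = (rng.normal(m, 1, 30) for m in (80, 78, 75))

print(wilcoxon(pre, post))                  # 2 related continuous fields
print(friedmanchisquare(pre, mid, post))    # k related continuous fields

# McNemar on the 2x2 table of two paired flag fields; the off-diagonal
# cells drive the test. Counts are invented.
table = np.array([[30, 12], [5, 53]])
print(mcnemar(table, exact=True))

# Cochran's Q on k binary fields (cases x fields matrix of 0/1 values).
binary = (rng.random((30, 3)) < 0.5).astype(int)
print(cochrans_q(binary))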
McNemar’s Test: Define Success
McNemar’s test is intended for flag fields (categorical fields with only two categories), but is applied to all
categorical fields by using rules for defining “success”.
Define Success for Categorical Fields. This specifies how “success” is defined for categorical fields.
• Use first category found in data performs the test using the first value found in the sample to define
“success”. This option is only applicable to nominal or ordinal fields with only two values; all other
categorical fields specified on the Fields tab where this option is used will not be tested. This is the
default.
• Specify success values performs the test using the specified list of values to define “success”. Specify
a list of string or numeric values. The values in the list do not need to be present in the sample.
Cochran’s Q: Define Success
Cochran’s Q test is intended for flag fields (categorical fields with only two categories), but is applied to all
categorical fields by using rules for defining “success”.
Define Success for Categorical Fields. This specifies how “success” is defined for categorical fields.
• Use first category found in data performs the test using the first value found in the sample to define
“success”. This option is only applicable to nominal or ordinal fields with only two values; all other
categorical fields specified on the Fields tab where this option is used will not be tested. This is the
default.
• Specify success values performs the test using the specified list of values to define “success”. Specify
a list of string or numeric values. The values in the list do not need to be present in the sample.
Test Options
Significance level. This specifies the significance level (alpha) for all tests. Specify a numeric value
between 0 and 1. 0.05 is the default.
Confidence interval (%). This specifies the confidence level for all confidence intervals produced. Specify
a numeric value between 0 and 100. 95 is the default.
Excluded Cases. This specifies how to determine the case basis for tests.
• Exclude cases listwise means that records with missing values for any field that is named on any
subcommand are excluded from all analyses.
• Exclude cases test by test means that records with missing values for a field that is used for a specific
test are omitted from that test. When several tests are specified in the analysis, each test is evaluated
separately.
User-Missing Values
User-Missing Values for Categorical Fields. Categorical fields must have valid values for a record to be
included in the analysis. These controls allow you to decide whether user-missing values are treated as
valid among categorical fields. System-missing values and missing values for continuous fields are always
treated as invalid.
NPTESTS command additional features
The command syntax language also allows you to:
• Specify one-sample, independent-samples, and related-samples tests in a single run of the procedure.
See the Command Syntax Reference for complete syntax information.
Model View
Model View
The procedure creates a Model Viewer object in the Viewer. By activating (double-clicking) this object,
you gain an interactive view of the model. The model view has a 2-panel window, the main view on the
left and the linked, or auxiliary, view on the right.
There are two main views:
• Hypothesis Summary. This is the default view. See the topic "Hypothesis Summary" on page 136 for
more information.
• Confidence Interval Summary. See the topic “Confidence Interval Summary ” on page 136 for more
information.
There are seven linked/auxiliary views:
• One Sample Test. This is the default view if one-sample tests were requested. See the topic “One
Sample Test ” on page 136 for more information.
• Related Samples Test. This is the default view if related samples tests and no one-sample tests were
requested. See the topic “Related Samples Test ” on page 137 for more information.
• Independent Samples Test. This is the default view if no related samples tests or one-sample tests
were requested. See the topic “Independent Samples Test ” on page 138 for more information.
• Categorical Field Information. See the topic “Categorical Field Information ” on page 139 for more
information.
• Continuous Field Information. See the topic “Continuous Field Information ” on page 139 for more
information.
• Pairwise Comparisons. See the topic “Pairwise Comparisons ” on page 139 for more information.
• Homogeneous Subsets. See the topic "Homogeneous Subsets" on page 139 for more information.
Hypothesis Summary
The Hypothesis Summary view is a snapshot, at-a-glance summary of the nonparametric tests. It emphasizes
null hypotheses and decisions, drawing attention to significant p-values.
• Each row corresponds to a separate test. Clicking on a row shows additional information about the test
in the linked view.
• Clicking on any column header sorts the rows by values in that column.
• The Reset button allows you to return the Model Viewer to its original state.
• The Field Filter dropdown list allows you to display only the tests that involve the selected field.
Confidence Interval Summary
The Confidence Interval Summary shows any confidence intervals produced by the nonparametric tests.
• Each row corresponds to a separate confidence interval.
• Clicking on any column header sorts the rows by values in that column.
One Sample Test
The One Sample Test view shows details related to any requested one-sample nonparametric tests. The
information shown depends upon the selected test.
• The Test dropdown allows you to select a given type of one-sample test.
• The Field(s) dropdown allows you to select a field that was tested using the selected test in the Test
dropdown.
Binomial Test
The Binomial Test shows a stacked bar chart and a test table.
• The stacked bar chart displays the observed and hypothesized frequencies for the “success” and
“failure” categories of the test field, with “failures” stacked on top of “successes”. Hovering over a bar
shows the category percentages in a tooltip. Visible differences in the bars indicate that the test field
may not have the hypothesized binomial distribution.
• The table shows details of the test.
Chi-Square Test
The Chi-Square Test view shows a clustered bar chart and a test table.
• The clustered bar chart displays the observed and hypothesized frequencies for each category of the
test field. Hovering over a bar shows the observed and hypothesized frequencies and their difference
(residual) in a tooltip. Visible differences in the observed versus hypothesized bars indicate that the test
field may not have the hypothesized distribution.
• The table shows details of the test.
Wilcoxon Signed Ranks
The Wilcoxon Signed Ranks Test view shows a histogram and a test table.
• The histogram includes vertical lines showing the observed and hypothesized medians.
• The table shows details of the test.
Runs Test
The Runs Test view shows a chart and a test table.
• The chart displays a normal distribution with the observed number of runs marked with a vertical line.
Note that when the exact test is performed, the test is not based on the normal distribution.
• The table shows details of the test.
Kolmogorov-Smirnov Test
The Kolmogorov-Smirnov Test view shows a histogram and a test table.
• The histogram includes an overlay of the probability density function for the hypothesized uniform,
normal, Poisson, or exponential distribution. Note that the test is based on cumulative distributions, and
the Most Extreme Differences reported in the table should be interpreted with respect to cumulative
distributions.
• The table shows details of the test.
Related Samples Test
The Related Samples Test view shows details related to any requested related samples nonparametric
tests. The information shown depends upon the selected test.
• The Test dropdown allows you to select a given type of related samples test.
• The Field(s) dropdown allows you to select the fields that were tested using the selected test in the
Test dropdown.
McNemar Test
The McNemar Test view shows a clustered bar chart and a test table.
• The clustered bar chart displays the observed and hypothesized frequencies for the off-diagonal cells of
the 2×2 table defined by the test fields.
• The table shows details of the test.
Sign Test
The Sign Test view shows a stacked histogram and a test table.
• The stacked histogram displays the differences between the fields, using the sign of the difference as
the stacking field.
• The table shows details of the test.
Wilcoxon Signed Ranks Test
The Wilcoxon Signed Ranks Test view shows a stacked histogram and a test table.
• The stacked histogram displays the differences between the fields, using the sign of the difference as
the stacking field.
• The table shows details of the test.
Marginal Homogeneity Test
The Marginal Homogeneity Test view shows a clustered bar chart and a test table.
• The clustered bar chart displays the observed frequencies for the off-diagonal cells of the table defined
by the test fields.
• The table shows details of the test.
Cochran’s Q Test
The Cochran’s Q Test view shows a stacked bar chart and a test table.
• The stacked bar chart displays the observed frequencies for the “success” and “failure” categories of
the test fields, with “failures” stacked on top of “successes”. Hovering over a bar shows the category
percentages in a tooltip.
• The table shows details of the test.
Friedman’s Two-Way Analysis of Variance by Ranks
The Friedman’s Two-Way Analysis of Variance by Ranks view shows paneled histograms and a test table.
• The histograms display the observed distribution of ranks, paneled by the test fields.
• The table shows details of the test.
Kendall’s Coefficient of Concordance
The Kendall’s Coefficient of Concordance view shows paneled histograms and a test table.
• The histograms display the observed distribution of ranks, paneled by the test fields.
• The table shows details of the test.
Independent Samples Test
The Independent Samples Test view shows details related to any requested independent samples
nonparametric tests. The information shown depends upon the selected test.
• The Test dropdown allows you to select a given type of independent samples test.
• The Field(s) dropdown allows you to select a test and grouping field combination that was tested using
the selected test in the Test dropdown.
Mann-Whitney Test
The Mann-Whitney Test view shows a population pyramid chart and a test table.
• The population pyramid chart displays back-to-back histograms by the categories of the grouping field,
noting the number of records in each group and the mean rank of the group.
• The table shows details of the test.
Kolmogorov-Smirnov Test
The Kolmogorov-Smirnov Test view shows a population pyramid chart and a test table.
• The population pyramid chart displays back-to-back histograms by the categories of the grouping field,
noting the number of records in each group. The observed cumulative distribution lines can be
displayed or hidden by clicking the Cumulative button.
• The table shows details of the test.
Wald-Wolfowitz Runs Test
The Wald-Wolfowitz Runs Test view shows a population pyramid chart and a test table.
• The population pyramid chart displays back-to-back histograms by the categories of the grouping field,
noting the number of records in each group.
• The table shows details of the test.
Kruskal-Wallis Test
The Kruskal-Wallis Test view shows boxplots and a test table.
• Separate boxplots are displayed for each category of the grouping field. Hovering over a box shows the
mean rank in a tooltip.
• The table shows details of the test.
Jonckheere-Terpstra Test
The Jonckheere-Terpstra Test view shows box plots and a test table.
• Separate box plots are displayed for each category of the grouping field.
• The table shows details of the test.
Moses Test of Extreme Reaction
The Moses Test of Extreme Reaction view shows boxplots and a test table.
• Separate boxplots are displayed for each category of the grouping field. The point labels can be
displayed or hidden by clicking the Record ID button.
• The table shows details of the test.
Median Test
The Median Test view shows box plots and a test table.
• Separate box plots are displayed for each category of the grouping field.
• The table shows details of the test.
Categorical Field Information
The Categorical Field Information view displays a bar chart for the categorical field selected on the
Field(s) dropdown. The list of available fields is restricted to the categorical fields used in the currently
selected test in the Hypothesis Summary view.
• Hovering over a bar gives the category percentages in a tooltip.
Continuous Field Information
The Continuous Field Information view displays a histogram for the continuous field selected on the
Field(s) dropdown. The list of available fields is restricted to the continuous fields used in the currently
selected test in the Hypothesis Summary view.
Pairwise Comparisons
The Pairwise Comparisons view shows a distance network chart and comparisons table produced by k-
sample nonparametric tests when pairwise multiple comparisons are requested.
• The distance network chart is a graphical representation of the comparisons table in which the
distances between nodes in the network correspond to differences between samples. Yellow lines
correspond to statistically significant differences; black lines correspond to non-significant differences.
Hovering over a line in the network displays a tooltip with the adjusted significance of the difference
between the nodes connected by the line.
• The comparison table shows the numerical results of all pairwise comparisons. Each row corresponds
to a separate pairwise comparison. Clicking on a column header sorts the rows by values in that column.
Homogeneous Subsets
The Homogeneous Subsets view shows a comparisons table produced by k-sample nonparametric tests
when stepwise stepdown multiple comparisons are requested.
• Each row in the Sample group corresponds to a separate related sample (represented in the data by
separate fields). Samples that are not statistically significantly different are grouped into same-colored
subsets; there is a separate column for each identified subset. When all samples are statistically
significantly different, there is a separate subset for each sample. When none of the samples are
statistically significantly different, there is a single subset.
• A test statistic, significance value, and adjusted significance value are computed for each subset
containing more than one sample.
NPTESTS command additional features
The command syntax language also allows you to:
• Specify one-sample, independent-samples, and related-samples tests in a single run of the procedure.
See the Command Syntax Reference for complete syntax information.
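For example, a single NPTESTS run can combine all three families of tests. The following is a minimal
sketch, not a definitive specification; the field names income, region, before, and after are hypothetical:

NPTESTS
  /ONESAMPLE TEST(income)
  /INDEPENDENT TEST(income) GROUP(region) MANN_WHITNEY
  /RELATED TEST(before after) WILCOXON.

This requests the default one-sample tests for income, a Mann-Whitney test of income across the
categories of region, and a Wilcoxon signed-rank test of the paired fields before and after.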
Legacy Dialogs
There are a number of “legacy” dialogs that also perform nonparametric tests. These dialogs support the
functionality provided by the Exact Tests option.
Chi-Square Test. Tabulates a variable into categories and computes a chi-square statistic based on the
differences between observed and expected frequencies.
Binomial Test. Compares the observed frequency in each category of a dichotomous variable with
expected frequencies from the binomial distribution.
Runs Test. Tests whether the order of occurrence of two values of a variable is random.
One-Sample Kolmogorov-Smirnov Test. Compares the observed cumulative distribution function for a
variable with a specified theoretical distribution, which may be normal, uniform, exponential, or Poisson.
Two-Independent-Samples Tests. Compares two groups of cases on one variable. The Mann-Whitney U
test, two-sample Kolmogorov-Smirnov test, Moses test of extreme reactions, and Wald-Wolfowitz runs
test are available.
Two-Related-Samples Tests. Compares the distributions of two variables. The Wilcoxon signed-rank
test, the sign test, and the McNemar test are available.
Tests for Several Independent Samples. Compares two or more groups of cases on one variable. The
Kruskal-Wallis test, the Median test, and the Jonckheere-Terpstra test are available.
Tests for Several Related Samples. Compares the distributions of two or more variables. Friedman’s
test, Kendall’s W, and Cochran’s Q are available.
Quartiles and the mean, standard deviation, minimum, maximum, and number of nonmissing cases are
available for all of the above tests.
Chi-Square Test
The Chi-Square Test procedure tabulates a variable into categories and computes a chi-square statistic.
This goodness-of-fit test compares the observed and expected frequencies in each category to test that
all categories contain the same proportion of values or test that each category contains a user-specified
proportion of values.
Examples. The chi-square test could be used to determine whether a bag of jelly beans contains equal
proportions of blue, brown, green, orange, red, and yellow candies. You could also test to see whether a
bag of jelly beans contains 5% blue, 30% brown, 10% green, 20% orange, 15% red, and 15% yellow
candies.
Statistics. Mean, standard deviation, minimum, maximum, and quartiles. The number and the percentage
of nonmissing and missing cases; the number of cases observed and expected for each category;
residuals; and the chi-square statistic.
Chi-Square Test Data Considerations
Data. Use ordered or unordered numeric categorical variables (ordinal or nominal levels of
measurement). To convert string variables to numeric variables, use the Automatic Recode procedure,
which is available on the Transform menu.
Assumptions. Nonparametric tests do not require assumptions about the shape of the underlying
distribution. The data are assumed to be a random sample. The expected frequencies for each category
should be at least 1. No more than 20% of the categories should have expected frequencies of less than
5.
To Obtain a Chi-Square Test
1. From the menus choose:
Analyze > Nonparametric Tests > Legacy Dialogs > Chi-Square…
2. Select one or more test variables. Each variable produces a separate test.
3. Optionally, click Options for descriptive statistics, quartiles, and control of the treatment of missing
data.
Chi-Square Test Expected Range and Expected Values
Expected Range. By default, each distinct value of the variable is defined as a category. To establish
categories within a specific range, select Use specified range and enter integer values for lower and
upper bounds. Categories are established for each integer value within the inclusive range, and cases with
values outside of the bounds are excluded. For example, if you specify a value of 1 for Lower and a value
of 4 for Upper, only the integer values of 1 through 4 are used for the chi-square test.
Expected Values. By default, all categories have equal expected values. Categories can have user-
specified expected proportions. Select Values, enter a value that is greater than 0 for each category of the
test variable, and then click Add. Each time you add a value, it appears at the bottom of the value list. The
order of the values is important; it corresponds to the ascending order of the category values of the test
variable. The first value of the list corresponds to the lowest group value of the test variable, and the last
value corresponds to the highest value. Elements of the value list are summed, and then each value is
divided by this sum to calculate the proportion of cases expected in the corresponding category. For
example, a value list of 3, 4, 5, 4 specifies expected proportions of 3/16, 4/16, 5/16, and 4/16.
Chi-Square Test Options
Statistics. You can choose one or both summary statistics.
• Descriptive. Displays the mean, standard deviation, minimum, maximum, and number of nonmissing
cases.
• Quartiles. Displays values corresponding to the 25th, 50th, and 75th percentiles.
Missing Values. Controls the treatment of missing values.
• Exclude cases test-by-test. When several tests are specified, each test is evaluated separately for
missing values.
• Exclude cases listwise. Cases with missing values for any variable are excluded from all analyses.
NPAR TESTS Command Additional Features (Chi-Square Test)
The command syntax language also allows you to:
• Specify different minimum and maximum values or expected frequencies for different variables (with
the CHISQUARE subcommand).
• Test the same variable against different expected frequencies or use different ranges (with the
EXPECTED subcommand).
See the Command Syntax Reference for complete syntax information.
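As a minimal sketch of these subcommands (the variable name flavor is hypothetical), the following tests
a four-category variable against the expected proportions 3/16, 4/16, 5/16, and 4/16 from the example
above:

NPAR TESTS
  /CHISQUARE=flavor (1,4)
  /EXPECTED=3 4 5 4.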
Binomial Test
The Binomial Test procedure compares the observed frequencies of the two categories of a dichotomous
variable to the frequencies that are expected under a binomial distribution with a specified probability
parameter. By default, the probability parameter for both groups is 0.5. To change the probabilities, you
can enter a test proportion for the first group. The probability for the second group will be 1 minus the
specified probability for the first group.
Example. When you toss a dime, the probability of a head equals 1/2. Based on this hypothesis, a dime is
tossed 40 times, and the outcomes are recorded (heads or tails). From the binomial test, you might find
that 3/4 of the tosses were heads and that the observed significance level is small (0.0027). These results
indicate that it is not likely that the probability of a head equals 1/2; the coin is probably biased.
Statistics. Mean, standard deviation, minimum, maximum, number of nonmissing cases, and quartiles.
Binomial Test Data Considerations
Data. The variables that are tested should be numeric and dichotomous. To convert string variables to
numeric variables, use the Automatic Recode procedure, which is available on the Transform menu. A
dichotomous variable is a variable that can take only two possible values: yes or no, true or false, 0 or 1,
and so on. The first value encountered in the dataset defines the first group, and the other value defines
the second group. If the variables are not dichotomous, you must specify a cut point. The cut point
assigns cases with values that are less than or equal to the cut point to the first group and assigns the rest
of the cases to the second group.
Assumptions. Nonparametric tests do not require assumptions about the shape of the underlying
distribution. The data are assumed to be a random sample.
To Obtain a Binomial Test
1. From the menus choose:
Analyze > Nonparametric Tests > Legacy Dialogs > Binomial…
2. Select one or more numeric test variables.
3. Optionally, click Options for descriptive statistics, quartiles, and control of the treatment of missing
data.
Binomial Test Options
Statistics. You can choose one or both summary statistics.
• Descriptive. Displays the mean, standard deviation, minimum, maximum, and number of nonmissing
cases.
• Quartiles. Displays values corresponding to the 25th, 50th, and 75th percentiles.
Missing Values. Controls the treatment of missing values.
• Exclude cases test-by-test. When several tests are specified, each test is evaluated separately for
missing values.
• Exclude cases listwise. Cases with missing values for any variable that is tested are excluded from all
analyses.
NPAR TESTS Command Additional Features (Binomial Test)
The command syntax language also allows you to:
• Select specific groups (and exclude other groups) when a variable has more than two categories (with
the BINOMIAL subcommand).
• Specify different cut points or probabilities for different variables (with the BINOMIAL subcommand).
• Test the same variable against different cut points or probabilities (with the EXPECTED subcommand).
See the Command Syntax Reference for complete syntax information.
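As a minimal sketch of this subcommand (the variable name toss is hypothetical), the following tests
whether a dichotomous variable of recorded coin tosses is consistent with a probability of 0.5 for the
first group:

NPAR TESTS
  /BINOMIAL(0.5)=toss.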
Runs Test
The Runs Test procedure tests whether the order of occurrence of two values of a variable is random. A
run is a sequence of like observations. A sample with too many or too few runs suggests that the sample
is not random.
Examples. Suppose that 20 people are polled to find out whether they would purchase a product. The
assumed randomness of the sample would be seriously questioned if all 20 people were of the same
gender. The runs test can be used to determine whether the sample was drawn at random.
Statistics. Mean, standard deviation, minimum, maximum, number of nonmissing cases, and quartiles.
Runs Test Data Considerations
Data. The variables must be numeric. To convert string variables to numeric variables, use the Automatic
Recode procedure, which is available on the Transform menu.
Assumptions. Nonparametric tests do not require assumptions about the shape of the underlying
distribution. Use samples from continuous probability distributions.
To Obtain a Runs Test
1. From the menus choose:
Analyze > Nonparametric Tests > Legacy Dialogs > Runs…
2. Select one or more numeric test variables.
3. Optionally, click Options for descriptive statistics, quartiles, and control of the treatment of missing
data.
Runs Test Cut Point
Cut Point. Specifies a cut point to dichotomize the variables that you have chosen. You can use the
observed mean, median, or mode, or you can use a specified value as a cut point. Cases with values that
are less than the cut point are assigned to one group, and cases with values that are greater than or equal
to the cut point are assigned to another group. One test is performed for each chosen cut point.
Runs Test Options
Statistics. You can choose one or both summary statistics.
• Descriptive. Displays the mean, standard deviation, minimum, maximum, and number of nonmissing
cases.
• Quartiles. Displays values corresponding to the 25th, 50th, and 75th percentiles.
Missing Values. Controls the treatment of missing values.
• Exclude cases test-by-test. When several tests are specified, each test is evaluated separately for
missing values.
• Exclude cases listwise. Cases with missing values for any variable are excluded from all analyses.
NPAR TESTS Command Additional Features (Runs Test)
The command syntax language also allows you to:
• Specify different cut points for different variables (with the RUNS subcommand).
• Test the same variable against different custom cut points (with the RUNS subcommand).
See the Command Syntax Reference for complete syntax information.
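As a minimal sketch of this subcommand (the variable name purchase is hypothetical), the following runs
one test that dichotomizes at the observed median and a second test that uses the custom cut point 1:

NPAR TESTS
  /RUNS(MEDIAN)=purchase
  /RUNS(1)=purchase.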
One-Sample Kolmogorov-Smirnov Test
The One-Sample Kolmogorov-Smirnov Test procedure compares the observed cumulative distribution
function for a variable with a specified theoretical distribution, which may be normal, uniform, Poisson, or
exponential. The Kolmogorov-Smirnov Z is computed from the largest difference (in absolute value)
between the observed and theoretical cumulative distribution functions. This goodness-of-fit test tests
whether the observations could reasonably have come from the specified distribution.
Starting with version 27.0, the Lilliefors test statistic can be used to estimate the p-value by Monte Carlo
sampling when testing against a normal distribution with estimated parameters (this functionality was
previously possible only through the Explore procedure).
Example
Many parametric tests require normally distributed variables. The one-sample Kolmogorov-Smirnov
test can be used to test that a variable (for example, income) is normally distributed.
Statistics
Mean, standard deviation, minimum, maximum, number of non-missing cases, quartiles, Lilliefors
test, and Monte Carlo simulation.
One-Sample Kolmogorov-Smirnov test data considerations
Data
Use quantitative variables (interval or ratio level of measurement).
Assumptions
The Kolmogorov-Smirnov test assumes that the parameters of the test distribution are specified in
advance. This procedure estimates the parameters from the sample. The sample mean and sample
standard deviation are the parameters for a normal distribution, the sample minimum and maximum
values define the range of the uniform distribution, the sample mean is the parameter for the Poisson
distribution, and the sample mean is the parameter for the exponential distribution. When the
parameters are estimated from the sample in this way, the power of the test to detect departures from
the hypothesized distribution may be seriously diminished.
When certain parameters of the distribution have to be estimated from the sample, the Kolmogorov-
Smirnov test no longer applies. In these instances, the Lilliefors test statistic can be used to estimate
the p-value by Monte Carlo sampling when testing normality with mean and variance unknown. The
Lilliefors test applies to the three continuous distributions (Normal, Exponential, and Uniform). Note
that the test does not apply if the underlying distribution is discrete (Poisson). The test is only defined
for one-sample inference when the corresponding distribution parameters are not specified.
Obtaining a One-Sample Kolmogorov-Smirnov test
1. From the menus choose:
Analyze > Nonparametric Tests > Legacy Dialogs > 1-Sample K-S…
2. Select one or more numeric test variables. Each variable produces a separate test.
3. Optionally, select a test distribution method:
Normal
When selected, you can specify whether distribution parameter(s) are estimated from sample data
(the default setting) or from custom settings. When Use sample data is selected, both the existing
asymptotic results and the Lilliefors significance correction based on Monte Carlo sampling are
used. When Custom is selected, provide values for both Mean and Std Dev.
Uniform
When selected, you can specify whether distribution parameter(s) are estimated from sample data
(the default setting) or from custom settings. When Use sample data is selected, the Lilliefors test
is used. When Custom is selected, provide values for both Min and Max.
Poisson
When selected, specify a Mean parameter value.
Exponential
When selected, you can specify whether distribution parameter(s) are estimated from the sample
mean (the default setting) or from custom settings. When Use sample data is selected, the
Lilliefors test is used. When Custom is selected, provide a Mean parameter value.
4. Optionally, click Simulation to specify Monte Carlo simulation parameters, click Exact to specify exact
test parameters, or click Options for descriptive statistics, quartiles, and control of the treatment of
missing data.
One-Sample Kolmogorov-Smirnov Test: Simulation
When certain parameters of the distribution have to be estimated from the sample, the Kolmogorov-
Smirnov test no longer applies. In these instances, the Lilliefors test statistic can be used to estimate the
p-value by Monte Carlo sampling when testing normality with mean and variance unknown. The
Lilliefors test applies to the three continuous distributions (Normal, Exponential, and Uniform). Note that
the test does not apply if the underlying distribution is discrete (Poisson). The test is only defined for one-
sample inference when the corresponding distribution parameters are not specified.
Monte Carlo Simulation Parameters
Confidence level
This optional setting resets the confidence interval level that is estimated by the Kolmogorov-
Smirnov test when using Monte Carlo simulation. The value must be between 0 and 100. The
default setting is 99.
Number of samples
This optional setting resets the number of replicates that the Lilliefors test uses for Monte Carlo
sampling. The value must be a single integer between 10000 and the largest permitted number of
samples. The default value is 10000.
Suppress the Monte Carlo results for the normal distribution
This optional setting suppresses the Monte Carlo results for the normal distribution. By default, the
setting is not selected (which means that both the existing asymptotic results and the Lilliefors test
results, which are based on Monte Carlo sampling, are presented).
One-Sample Kolmogorov-Smirnov Test: Options
Statistics
You can choose one or both summary statistics.
Descriptive
Displays the mean, standard deviation, minimum, maximum, and number of nonmissing cases.
Quartiles
Displays values corresponding to the 25th, 50th, and 75th percentiles.
Missing Values
Controls the treatment of missing values.
Exclude cases test-by-test
When several tests are specified, each test is evaluated separately for missing values.
Exclude cases listwise
Cases with missing values for any variable are excluded from all analyses.
NPAR TESTS Command Additional Features (One-Sample Kolmogorov-Smirnov Test)
The command syntax language also allows you to specify the parameters of the test distribution (with the
K-S subcommand).
See the Command Syntax Reference for complete syntax information.
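As a minimal sketch of this subcommand (the variable names income and score are hypothetical), the
following tests income against a normal distribution with parameters estimated from the sample, and
tests score against a uniform distribution over the specified range 0 to 100:

NPAR TESTS
  /K-S(NORMAL)=income
  /K-S(UNIFORM,0,100)=score.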
Two-Independent-Samples Tests
The Two-Independent-Samples Tests procedure compares two groups of cases on one variable.
Example. New dental braces have been developed that are intended to be more comfortable, to look
better, and to provide more rapid progress in realigning teeth. To find out whether the new braces have to
be worn as long as the old braces, 10 children are randomly chosen to wear the old braces, and another
10 children are chosen to wear the new braces. From the Mann-Whitney U test, you might find that, on
average, children with the new braces did not have to wear the braces as long as children with the old
braces.
Statistics. Mean, standard deviation, minimum, maximum, number of nonmissing cases, and quartiles.
Tests: Mann-Whitney U, Moses extreme reactions, Kolmogorov-Smirnov Z, Wald-Wolfowitz runs.
Two-Independent-Samples Tests Data Considerations
Data. Use numeric variables that can be ordered.
Assumptions. Use independent, random samples. The Mann-Whitney U test tests equality of two
distributions. In order to use it to test for differences in location between two distributions, one must
assume that the distributions have the same shape.
To Obtain Two-Independent-Samples Tests
1. From the menus choose:
Analyze > Nonparametric Tests > Legacy Dialogs > 2 Independent Samples…
2. Select one or more numeric variables.
3. Select a grouping variable and click Define Groups to split the file into two groups or samples.
Two-Independent-Samples Test Types
Test Type. Four tests are available to test whether two independent samples (groups) come from the
same population.
The Mann-Whitney U test is the most popular of the two-independent-samples tests. It is equivalent to
the Wilcoxon rank sum test and the Kruskal-Wallis test for two groups. The Mann-Whitney test assesses
whether two sampled populations are equivalent in location. The observations from both groups are combined and
ranked, with the average rank assigned in the case of ties. The number of ties should be small relative to
the total number of observations. If the populations are identical in location, the ranks should be
randomly mixed between the two samples. The test calculates the number of times that a score from
group 1 precedes a score from group 2 and the number of times that a score from group 2 precedes a
score from group 1. The Mann-Whitney U statistic is the smaller of these two numbers. The Wilcoxon rank
sum W statistic is also displayed. W is the sum of the ranks for the group with the smaller mean rank,
unless the groups have the same mean rank, in which case it is the rank sum from the group that is
named last in the Two-Independent-Samples Define Groups dialog box.
The Kolmogorov-Smirnov Z test and the Wald-Wolfowitz runs test are more general tests that detect
differences in both the locations and shapes of the distributions. The Kolmogorov-Smirnov test is based
on the maximum absolute difference between the observed cumulative distribution functions for both
samples. When this difference is significantly large, the two distributions are considered different. The
Wald-Wolfowitz runs test combines and ranks the observations from both groups. If the two samples are
from the same population, the two groups should be randomly scattered throughout the ranking.
The Moses extreme reactions test assumes that the experimental variable will affect some subjects in
one direction and other subjects in the opposite direction. The test tests for extreme responses compared
to a control group. This test focuses on the span of the control group and is a measure of how much
extreme values in the experimental group influence the span when combined with the control group. The
control group is defined by the group 1 value in the Two-Independent-Samples Define Groups dialog box.
Observations from both groups are combined and ranked. The span of the control group is computed as
the difference between the ranks of the largest and smallest values in the control group plus 1. Because
chance outliers can easily distort the range of the span, 5% of the control cases are trimmed
automatically from each end.
Two-Independent-Samples Tests Define Groups
To split the file into two groups or samples, enter an integer value for Group 1 and another value for
Group 2. Cases with other values are excluded from the analysis.
Two-Independent-Samples Tests Options
Statistics. You can choose one or both summary statistics.
• Descriptive. Displays the mean, standard deviation, minimum, maximum, and the number of
nonmissing cases.
• Quartiles. Displays values corresponding to the 25th, 50th, and 75th percentiles.
Missing Values. Controls the treatment of missing values.
• Exclude cases test-by-test. When several tests are specified, each test is evaluated separately for
missing values.
• Exclude cases listwise. Cases with missing values for any variable are excluded from all analyses.
NPAR TESTS Command Additional Features (Two-Independent-Samples Tests)
The command syntax language also allows you to specify the number of cases to be trimmed for the
Moses test (with the MOSES subcommand).
See the Command Syntax Reference for complete syntax information.
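As a minimal sketch of these subcommands (the variable names weartime and bracetype are
hypothetical, echoing the braces example above), the following requests a Mann-Whitney test and a
Moses test that trims one case from each end of the control group's span instead of the default 5%:

NPAR TESTS
  /M-W=weartime BY bracetype(1,2)
  /MOSES(1)=weartime BY bracetype(1,2).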
Two-Related-Samples Tests
The Two-Related-Samples Tests procedure compares the distributions of two variables.
Example. In general, do families receive the asking price when they sell their homes? By applying the
Wilcoxon signed-rank test to data for 10 homes, you might learn that seven families receive less than the
asking price, one family receives more than the asking price, and two families receive the asking price.
Statistics. Mean, standard deviation, minimum, maximum, number of nonmissing cases, and quartiles.
Tests: Wilcoxon signed-rank, sign, McNemar. If the Exact Tests option is installed (available only on
Windows operating systems), the marginal homogeneity test is also available.
Two-Related-Samples Tests Data Considerations
Data. Use numeric variables that can be ordered.
Assumptions. Although no particular distributions are assumed for the two variables, the population
distribution of the paired differences is assumed to be symmetric.
To Obtain Two-Related-Samples Tests
1. From the menus choose:
Analyze > Nonparametric Tests > Legacy Dialogs > 2 Related Samples…
2. Select one or more pairs of variables.
Two-Related-Samples Test Types
The tests in this section compare the distributions of two related variables. The appropriate test to use
depends on the type of data.
If your data are continuous, use the sign test or the Wilcoxon signed-rank test. The sign test computes
the differences between the two variables for all cases and classifies the differences as positive, negative,
or tied. If the two variables are similarly distributed, the number of positive and negative differences will
not differ significantly. The Wilcoxon signed-rank test considers information about both the sign of the
differences and the magnitude of the differences between pairs. Because the Wilcoxon signed-rank test
incorporates more information about the data, it is more powerful than the sign test.
If your data are binary, use the McNemar test. This test is typically used in a repeated measures
situation, in which each subject’s response is elicited twice, once before and once after a specified event
occurs. The McNemar test determines whether the initial response rate (before the event) equals the final
response rate (after the event). This test is useful for detecting changes in responses due to experimental
intervention in before-and-after designs.
If your data are categorical, use the marginal homogeneity test. This test is an extension of the McNemar
test from binary response to multinomial response. It tests for changes in response (using the chi-square
distribution) and is useful for detecting response changes due to experimental intervention in before-and-
after designs. The marginal homogeneity test is available only if you have installed Exact Tests.
Two-Related-Samples Tests Options
Statistics. You can choose one or both summary statistics.
• Descriptive. Displays the mean, standard deviation, minimum, maximum, and the number of
nonmissing cases.
• Quartiles. Displays values corresponding to the 25th, 50th, and 75th percentiles.
Missing Values. Controls the treatment of missing values.
• Exclude cases test-by-test. When several tests are specified, each test is evaluated separately for
missing values.
• Exclude cases listwise. Cases with missing values for any variable are excluded from all analyses.
NPAR TESTS Command Additional Features (Two Related Samples)
The command syntax language also allows you to test a variable with each variable on a list.
See the Command Syntax Reference for complete syntax information.
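As a minimal sketch of this feature (the variable names are hypothetical, echoing the asking-price
example above), the following pairs askprice with each variable on the list that follows WITH, producing
one Wilcoxon signed-rank test per pair:

NPAR TESTS
  /WILCOXON=askprice WITH saleprice offerprice.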
Tests for Several Independent Samples
The Tests for Several Independent Samples procedure compares two or more groups of cases on one
variable.
Example. Do three brands of 100-watt lightbulbs differ in the average time that the bulbs will burn? From
the Kruskal-Wallis one-way analysis of variance, you might learn that the three brands do differ in average
lifetime.
Statistics. Mean, standard deviation, minimum, maximum, number of nonmissing cases, and quartiles.
Tests: Kruskal-Wallis H, median.
Tests for Several Independent Samples Data Considerations
Data. Use numeric variables that can be ordered.
Assumptions. Use independent, random samples. The Kruskal-Wallis H test requires that the tested
samples be similar in shape.
To Obtain Tests for Several Independent Samples
1. From the menus choose:
Analyze > Nonparametric Tests > Legacy Dialogs > K Independent Samples…
2. Select one or more numeric variables.
3. Select a grouping variable and click Define Range to specify minimum and maximum integer values for
the grouping variable.
Tests for Several Independent Samples Test Types
Three tests are available to determine whether several independent samples come from the same
population: the Kruskal-Wallis H test, the median test, and the Jonckheere-Terpstra test.
The Kruskal-Wallis H test, an extension of the Mann-Whitney U test, is the nonparametric analog of one-
way analysis of variance and detects differences in distribution location. The median test, which is a
more general test (but not as powerful), detects distributional differences in location and shape. The
Kruskal-Wallis H test and the median test assume that there is no a priori ordering of the k populations
from which the samples are drawn.
When there is a natural a priori ordering (ascending or descending) of the k populations, the Jonckheere-
Terpstra test is more powerful. For example, the k populations might represent k increasing
temperatures. The hypothesis that different temperatures produce the same response distribution is
tested against the alternative that as the temperature increases, the magnitude of the response
increases. Here, the alternative hypothesis is ordered; therefore, Jonckheere-Terpstra is the most
appropriate test to use. The Jonckheere-Terpstra test is available only if you have installed the Exact
Tests add-on module.
Tests for Several Independent Samples Define Range
To define the range, enter integer values for Minimum and Maximum that correspond to the lowest and
highest categories of the grouping variable. Cases with values outside of the bounds are excluded. For
example, if you specify a minimum value of 1 and a maximum value of 3, only the integer values of 1
through 3 are used. The minimum value must be less than the maximum value, and both values must be
specified.
Tests for Several Independent Samples Options
Statistics. You can choose one or both summary statistics.
• Descriptive. Displays the mean, standard deviation, minimum, maximum, and the number of
nonmissing cases.
• Quartiles. Displays values corresponding to the 25th, 50th, and 75th percentiles.
Missing Values. Controls the treatment of missing values.
• Exclude cases test-by-test. When several tests are specified, each test is evaluated separately for
missing values.
• Exclude cases listwise. Cases with missing values for any variable are excluded from all analyses.
NPAR TESTS Command Additional Features (K Independent Samples)
The command syntax language also allows you to specify a value other than the observed median for the
median test (with the MEDIAN subcommand).
See the Command Syntax Reference for complete syntax information.
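As a minimal sketch of these subcommands (the variable names burntime and brand are hypothetical,
echoing the lightbulb example above), the following requests a Kruskal-Wallis test and a median test
that uses 1000 hours rather than the observed median:

NPAR TESTS
  /K-W=burntime BY brand(1,3)
  /MEDIAN(1000)=burntime BY brand(1,3).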
Tests for Several Related Samples
The Tests for Several Related Samples procedure compares the distributions of two or more variables.
Example. Does the public associate different amounts of prestige with a doctor, a lawyer, a police officer,
and a teacher? Ten people are asked to rank these four occupations in order of prestige. Friedman’s test
indicates that the public does associate different amounts of prestige with these four professions.
Statistics. Mean, standard deviation, minimum, maximum, number of nonmissing cases, and quartiles.
Tests: Friedman, Kendall’s W, and Cochran’s Q.
Tests for Several Related Samples Data Considerations
Data. Use numeric variables that can be ordered.
Assumptions. Nonparametric tests do not require assumptions about the shape of the underlying
distribution. Use dependent, random samples.
To Obtain Tests for Several Related Samples
1. From the menus choose:
Analyze > Nonparametric Tests > Legacy Dialogs > K Related Samples…
2. Select two or more numeric test variables.
Tests for Several Related Samples Test Types
Three tests are available to compare the distributions of several related variables.
The Friedman test is the nonparametric equivalent of a one-sample repeated measures design or a two-
way analysis of variance with one observation per cell. Friedman tests the null hypothesis that k related
variables come from the same population. For each case, the k variables are ranked from 1 to k. The test
statistic is based on these ranks.
Kendall’s W is a normalization of the Friedman statistic. Kendall’s W is interpretable as the coefficient of
concordance, which is a measure of agreement among raters. Each case is a judge or rater, and each
variable is an item or person being judged. For each variable, the sum of ranks is computed. Kendall’s W
ranges between 0 (no agreement) and 1 (complete agreement).
Cochran’s Q is identical to the Friedman test but is applicable when all responses are binary. This test is
an extension of the McNemar test to the k-sample situation. Cochran’s Q tests the hypothesis that several
related dichotomous variables have the same mean. The variables are measured on the same individual
or on matched individuals.
Tests for Several Related Samples Statistics
You can choose one or both summary statistics.
• Descriptive. Displays the mean, standard deviation, minimum, maximum, and the number of
nonmissing cases.
• Quartiles. Displays values corresponding to the 25th, 50th, and 75th percentiles.
NPAR TESTS Command Additional Features (K Related Samples)
See the Command Syntax Reference for complete syntax information.
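As a minimal sketch of this syntax (the variable names are hypothetical, echoing the prestige example
above), the following requests both Friedman's test and Kendall's W for four ranked occupations:

NPAR TESTS
  /FRIEDMAN=doctor lawyer police teacher
  /KENDALL=doctor lawyer police teacher.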
Multiple Response Analysis
Two procedures are available for analyzing multiple dichotomy and multiple category sets. The Multiple
Response Frequencies procedure displays frequency tables. The Multiple Response Crosstabs procedure
displays two- and three-dimensional crosstabulations. Before using either procedure, you must define
multiple response sets.
Example. This example illustrates the use of multiple response items in a market research survey. The
data are fictitious and should not be interpreted as real. An airline might survey passengers flying a
particular route to evaluate competing carriers. In this example, American Airlines wants to know about
its passengers’ use of other airlines on the Chicago-New York route and the relative importance of
schedule and service in selecting an airline. The flight attendant hands each passenger a brief
questionnaire upon boarding. The first question reads: Circle all airlines you have flown at least once in
the last six months on this route–American, United, TWA, USAir, Other. This is a multiple response
question, since the passenger can circle more than one response. However, this question cannot be
coded directly because a variable can have only one value for each case. You must use several variables
to map responses to each question. There are two ways to do this.
One is to define a variable corresponding to each of the choices (for example, American, United, TWA,
USAir, and Other). If the passenger circles United, the variable united is assigned a code of 1, otherwise
0. This is the multiple dichotomy method of mapping variables.
The other way to map responses is the multiple category method, in which you estimate the maximum
number of possible responses to the question and set up the same number of variables, with codes used
to specify the airline flown. By perusing a sample of questionnaires, you might discover that no
passenger has flown more than three different airlines on this route in the last six months. Further, you
find that due to the deregulation of airlines, 10 other airlines are named in the Other category. Using the
multiple category method, you would define three variables, each coded as 1 = american, 2 = united,
3 = twa, 4 = usair, 5 = delta, and so on. If a given passenger circles American and TWA, the first variable
has a code of 1, the second has a code of 3, and the third has a missing-value code. Another passenger
might have circled American and entered Delta. Thus, the first variable has a code of 1, the second has a
code of 5, and the third a missing-value code. If you use the multiple dichotomy method, on the other
hand, you end up with 14 separate variables. Although either method of mapping is feasible for this
survey, the method you choose depends on the distribution of responses.
Multiple Response Define Sets
The Define Multiple Response Sets procedure groups elementary variables into multiple dichotomy and
multiple category sets, for which you can obtain frequency tables and crosstabulations. You can define up
to 20 multiple response sets. Each set must have a unique name. To remove a set, highlight it on the list
of multiple response sets and click Remove. To change a set, highlight it on the list, modify any set
definition characteristics, and click Change.
You can code your elementary variables as dichotomies or categories. To use dichotomous variables,
select Dichotomies to create a multiple dichotomy set. Enter an integer value for Counted value. Each
variable having at least one occurrence of the counted value becomes a category of the multiple
dichotomy set. Select Categories to create a multiple category set having the same range of values as the
component variables. Enter integer values for the minimum and maximum values of the range for
categories of the multiple category set. The procedure totals each distinct integer value in the inclusive
range across all component variables. Empty categories are not tabulated.
Each multiple response set must be assigned a unique name of up to seven characters. The procedure
prefixes a dollar sign ($) to the name you assign. You cannot use the following reserved names: casenum,
sysmis, jdate, date, time, length, and width. The name of the multiple response set exists only for use in
multiple response procedures. You cannot refer to multiple response set names in other procedures.
Optionally, you can enter a descriptive variable label for the multiple response set. The label can be up to
40 characters long.
To Define Multiple Response Sets
1. From the menus choose:
Analyze > Multiple Response > Define Variable Sets…
2. Select two or more variables.
3. If your variables are coded as dichotomies, indicate which value you want to have counted. If your
variables are coded as categories, define the range of the categories.
4. Enter a unique name for each multiple response set.
5. Click Add to add the multiple response set to the list of defined sets.
Multiple Response Frequencies
The Multiple Response Frequencies procedure produces frequency tables for multiple response sets. You
must first define one or more multiple response sets (see “Multiple Response Define Sets”).
For multiple dichotomy sets, category names shown in the output come from variable labels defined for
elementary variables in the group. If the variable labels are not defined, variable names are used as
labels. For multiple category sets, category labels come from the value labels of the first variable in the
group. If categories missing for the first variable are present for other variables in the group, define a
value label for the missing categories.
Missing Values. Cases with missing values are excluded on a table-by-table basis. Alternatively, you can
choose one or both of the following:
• Exclude cases listwise within dichotomies. Excludes cases with missing values for any variable from
the tabulation of the multiple dichotomy set. This applies only to multiple response sets defined as
dichotomy sets. By default, a case is considered missing for a multiple dichotomy set if none of its
component variables contains the counted value. Cases with missing values for some (but not all)
variables are included in the tabulations of the group if at least one variable contains the counted
value.
• Exclude cases listwise within categories. Excludes cases with missing values for any variable from
tabulation of the multiple category set. This applies only to multiple response sets defined as category
sets. By default, a case is considered missing for a multiple category set only if none of its components
has valid values within the defined range.
Example. Each variable created from a survey question is an elementary variable. To analyze a multiple
response item, you must combine the variables into one of two types of multiple response sets: a multiple
dichotomy set or a multiple category set. For example, if an airline survey asked which of three airlines
(American, United, TWA) you have flown in the last six months and you used dichotomous variables and
defined a multiple dichotomy set, each of the three variables in the set would become a category of the
group variable. The counts and percentages for the three airlines are displayed in one frequency table. If
you discover that no respondent mentioned more than two airlines, you could create two variables, each
having three codes, one for each airline. If you define a multiple category set, the values are tabulated by
adding the same codes in the elementary variables together. The resulting set of values is the same as
those for each of the elementary variables. For example, 30 responses for United are the sum of the five
United responses for airline 1 and the 25 United responses for airline 2. The counts and percentages for
the three airlines are displayed in one frequency table.
Statistics. Frequency tables displaying counts, percentages of responses, percentages of cases, number
of valid cases, and number of missing cases.
Multiple Response Frequencies Data Considerations
Data. Use multiple response sets.
Assumptions. The counts and percentages provide a useful description for data from any distribution.
Related procedures. The Multiple Response Define Sets procedure allows you to define multiple
response sets.
To Obtain Multiple Response Frequencies
1. From the menus choose:
Analyze > Multiple Response > Frequencies…
2. Select one or more multiple response sets.
Multiple Response Crosstabs
The Multiple Response Crosstabs procedure crosstabulates defined multiple response sets, elementary
variables, or a combination. You can also obtain cell percentages based on cases or responses, modify
the handling of missing values, or get paired crosstabulations. You must first define one or more multiple
response sets (see “To Define Multiple Response Sets”).
For multiple dichotomy sets, category names shown in the output come from variable labels defined for
elementary variables in the group. If the variable labels are not defined, variable names are used as
labels. For multiple category sets, category labels come from the value labels of the first variable in the
group. If categories missing for the first variable are present for other variables in the group, define a
value label for the missing categories. The procedure displays category labels for columns on three lines,
with up to eight characters per line. To avoid splitting words, you can reverse row and column items or
redefine labels.
Example. Both multiple dichotomy and multiple category sets can be crosstabulated with other variables
in this procedure. An airline passenger survey asks passengers for the following information: Circle all of
the following airlines you have flown at least once in the last six months (American, United, TWA). Which
is more important in selecting a flight–schedule or service? Select only one. After entering the data as
dichotomies or multiple categories and combining them into a set, you can crosstabulate the airline
choices with the question involving service or schedule.
Statistics. Crosstabulation with cell, row, column, and total counts, and cell, row, column, and total
percentages. The cell percentages can be based on cases or responses.
Multiple Response Crosstabs Data Considerations
Data. Use multiple response sets or numeric categorical variables.
Assumptions. The counts and percentages provide a useful description of data from any distribution.
Related procedures. The Multiple Response Define Sets procedure allows you to define multiple
response sets.
To Obtain Multiple Response Crosstabs
1. From the menus choose:
Analyze > Multiple Response > Crosstabs…
2. Select one or more numeric variables or multiple response sets for each dimension of the
crosstabulation.
3. Define the range of each elementary variable.
Optionally, you can obtain a two-way crosstabulation for each category of a control variable or multiple
response set. Select one or more items for the Layer(s) list.
Multiple Response Crosstabs Define Ranges
Value ranges must be defined for any elementary variable in the crosstabulation. Enter the integer
minimum and maximum category values that you want to tabulate. Categories outside the range are
excluded from analysis. Values within the inclusive range are assumed to be integers (non-integers are
truncated).
Multiple Response Crosstabs Options
Cell Percentages. Cell counts are always displayed. You can choose to display row percentages, column
percentages, and two-way table (total) percentages.
Percentages Based on. You can base cell percentages on cases (or respondents). This is not available if
you select matching of variables across multiple category sets. You can also base cell percentages on
responses. For multiple dichotomy sets, the number of responses is equal to the number of counted
values across cases. For multiple category sets, the number of responses is the number of values in the
defined range.
Missing Values. You can choose one or both of the following:
• Exclude cases listwise within dichotomies. Excludes cases with missing values for any variable from
the tabulation of the multiple dichotomy set. This applies only to multiple response sets defined as
dichotomy sets. By default, a case is considered missing for a multiple dichotomy set if none of its
component variables contains the counted value. Cases with missing values for some, but not all,
variables are included in the tabulations of the group if at least one variable contains the counted value.
• Exclude cases listwise within categories. Excludes cases with missing values for any variable from
tabulation of the multiple category set. This applies only to multiple response sets defined as category
sets. By default, a case is considered missing for a multiple category set only if none of its components
has valid values within the defined range.
By default, when crosstabulating two multiple category sets, the procedure tabulates each variable in the
first group with each variable in the second group and sums the counts for each cell; therefore, some
responses can appear more than once in a table. You can choose the following option:
Match variables across response sets. Pairs the first variable in the first group with the first variable in
the second group, and so on. If you select this option, the procedure bases cell percentages on responses
rather than respondents. Pairing is not available for multiple dichotomy sets or elementary variables.
MULT RESPONSE Command Additional Features
The command syntax language also allows you to:
• Obtain crosstabulation tables with up to five dimensions (with the BY subcommand).
• Change output formatting options, including suppression of value labels (with the FORMAT
subcommand).
See the Command Syntax Reference for complete syntax information.
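As a minimal sketch of MULT RESPONSE syntax (the variable names are hypothetical, echoing the airline
survey example), the first command defines a multiple dichotomy set with counted value 1 and requests
a frequency table for it; the second crosstabulates the same set with an elementary variable:

MULT RESPONSE GROUPS=$airline 'Airlines flown' (american united twa (1))
  /FREQUENCIES=$airline.

MULT RESPONSE GROUPS=$airline 'Airlines flown' (american united twa (1))
  /VARIABLES=choice(1,2)
  /TABLES=$airline BY choice.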
Reporting Results
Case listings and descriptive statistics are basic tools for studying and presenting data. You can obtain
case listings with the Data Editor or the Summarize procedure, frequency counts and descriptive statistics
with the Frequencies procedure, and subpopulation statistics with the Means procedure. Each of these
uses a format designed to make information clear. If you want to display the information in a different
format, Report Summaries in Rows and Report Summaries in Columns give you the control you need over
data presentation.
Report Summaries in Rows
Report Summaries in Rows produces reports in which different summary statistics are laid out in rows.
Case listings are also available, with or without summary statistics.
Example. A company with a chain of retail stores keeps records of employee information, including
salary, job tenure, and the store and division in which each employee works. You could generate a report
that provides individual employee information (listing) broken down by store and division (break
variables), with summary statistics (for example, mean salary) for each store, division, and division within
each store.
Data Columns. Lists the report variables for which you want case listings or summary statistics and
controls the display format of data columns.
Break Columns. Lists optional break variables that divide the report into groups and controls the
summary statistics and display formats of break columns. For multiple break variables, there will be a
separate group for each category of each break variable within categories of the preceding break variable
in the list. Break variables should be discrete categorical variables that divide cases into a limited number
of meaningful categories. Individual values of each break variable appear, sorted, in a separate column to
the left of all data columns.
Report. Controls overall report characteristics, including overall summary statistics, display of missing
values, page numbering, and titles.
Display cases. Displays the actual values (or value labels) of the data-column variables for every case.
This produces a listing report, which can be much longer than a summary report.
Preview. Displays only the first page of the report. This option is useful for previewing the format of your
report without processing the whole report.
Data are already sorted. For reports with break variables, the data file must be sorted by break variable
values before generating the report. If your data file is already sorted by values of the break variables, you
can save processing time by selecting this option. This option is particularly useful after running a preview
report.
To Obtain a Summary Report: Summaries in Rows
1. From the menus choose:
Analyze > Reports > Report Summaries in Rows…
2. Select one or more variables for Data Columns. One column in the report is generated for each variable
selected.
3. For reports sorted and displayed by subgroups, select one or more variables for Break Columns.
4. For reports with summary statistics for subgroups defined by break variables, select the break variable
in the Break Column Variables list and click Summary in the Break Columns group to specify the
summary measure(s).
5. For reports with overall summary statistics, click Summary to specify the summary measure(s).
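Pasted syntax for a report like the retail-store example above might look like the following (a minimal
sketch; the variable names salary, tenure, store, and division are hypothetical):
SORT CASES BY store division.
REPORT FORMAT=AUTOMATIC LIST
  /VARIABLES=salary 'Salary' tenure 'Tenure'
  /BREAK=store 'Store'
  /SUMMARY=MEAN
  /BREAK=division 'Division'
  /SUMMARY=MEAN.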
Report Data Column/Break Format
The Format dialog boxes control column titles, column width, text alignment, and the display of data
values or value labels. Data Column Format controls the format of data columns on the right side of the
report page. Break Format controls the format of break columns on the left side.
Column Title. For the selected variable, controls the column title. Long titles are automatically wrapped
within the column. Use the Enter key to manually insert line breaks where you want titles to wrap.
Value Position within Column. For the selected variable, controls the alignment of data values or value
labels within the column. Alignment of values or labels does not affect alignment of column headings. You
can either indent the column contents by a specified number of characters or center the contents.
Column Content. For the selected variable, controls the display of either data values or defined value
labels. Data values are always displayed for any values that do not have defined value labels. (Not
available for data columns in column summary reports.)
Report Summary Lines for/Final Summary Lines
The two Summary Lines dialog boxes control the display of summary statistics for break groups and for
the entire report. Summary Lines controls subgroup statistics for each category defined by the break
variable(s). Final Summary Lines controls overall statistics, displayed at the end of the report.
Available summary statistics are sum, mean, minimum, maximum, number of cases, percentage of cases
above or below a specified value, percentage of cases within a specified range of values, standard
deviation, kurtosis, variance, and skewness.
Report Break Options
Break Options controls spacing and pagination of break category information.
Page Control. Controls spacing and pagination for categories of the selected break variable. You can
specify a number of blank lines between break categories or start each break category on a new page.
Blank Lines before Summaries. Controls the number of blank lines between break category labels or
data and summary statistics. This is particularly useful for combined reports that include both individual
case listings and summary statistics for break categories; in these reports, you can insert space between
the case listings and the summary statistics.
Report Options
Report Options controls the treatment and display of missing values and report page numbering.
Exclude cases with missing values listwise. Eliminates (from the report) any case with missing values
for any of the report variables.
Missing Values Appear as. Allows you to specify the symbol that represents missing values in the data
file. The symbol can be only one character and is used to represent both system-missing and user-missing
values.
Number Pages from. Allows you to specify a page number for the first page of the report.
Report Layout
Report Layout controls the width and length of each report page, placement of the report on the page, and
the insertion of blank lines and labels.
Page Layout. Controls the page margins expressed in lines (top and bottom) and characters (left and
right) and reports alignment within the margins.
Page Titles and Footers. Controls the number of lines that separate page titles and footers from the body
of the report.
Break Columns. Controls the display of break columns. If multiple break variables are specified, they can
be in separate columns or in the first column. Placing all break variables in the first column produces a
narrower report.
Column Titles. Controls the display of column titles, including title underlining, space between titles and
the body of the report, and vertical alignment of column titles.
Data Column Rows and Break Labels. Controls the placement of data column information (data values
and/or summary statistics) in relation to the break labels at the start of each break category. The first row
of data column information can start either on the same line as the break category label or on a specified
number of lines after the break category label. (Not available for column summary reports.)
Report Titles
Report Titles controls the content and placement of report titles and footers. You can specify up to 10
lines of page titles and up to 10 lines of page footers, with left-justified, centered, and right-justified
components on each line.
If you insert variables into titles or footers, the current value label or value of the variable is displayed in
the title or footer. In titles, the value label corresponding to the value of the variable at the beginning of
the page is displayed. In footers, the value label corresponding to the value of the variable at the end of
the page is displayed. If there is no value label, the actual value is displayed.
Special Variables. The special variables DATE and PAGE allow you to insert the current date or the page
number into any line of a report header or footer. If your data file contains variables named DATE or PAGE,
you cannot use these variables in report titles or footers.
Report Summaries in Columns
Report Summaries in Columns produces summary reports in which different summary statistics appear in
separate columns.
Example. A company with a chain of retail stores keeps records of employee information, including
salary, job tenure, and the division in which each employee works. You could generate a report that
provides summary salary statistics (for example, mean, minimum, and maximum) for each division.
Data Columns. Lists the report variables for which you want summary statistics and controls the display
format and summary statistics displayed for each variable.
Break Columns. Lists optional break variables that divide the report into groups and controls the display
formats of break columns. For multiple break variables, there will be a separate group for each category
of each break variable within categories of the preceding break variable in the list. Break variables should
be discrete categorical variables that divide cases into a limited number of meaningful categories.
Report. Controls overall report characteristics, including display of missing values, page numbering, and
titles.
Preview. Displays only the first page of the report. This option is useful for previewing the format of your
report without processing the whole report.
Data are already sorted. For reports with break variables, the data file must be sorted by break variable
values before generating the report. If your data file is already sorted by values of the break variables, you
can save processing time by selecting this option. This option is particularly useful after running a preview
report.
To Obtain a Summary Report: Summaries in Columns
1. From the menus choose:
Analyze > Reports > Report Summaries in Columns…
2. Select one or more variables for Data Columns. One column in the report is generated for each variable
selected.
3. To change the summary measure for a variable, select the variable in the Data Column Variables list
and click Summary.
4. To obtain more than one summary measure for a variable, select the variable in the source list and
move it into the Data Column Variables list multiple times, once for each summary measure you want.
5. To display a column containing the sum, mean, ratio, or other function of existing columns, click Insert
Total. This places a variable called total into the Data Columns list.
6. For reports sorted and displayed by subgroups, select one or more variables for Break Columns.
Data Columns Summary Function
Summary Lines controls the summary statistic displayed for the selected data column variable.
Available summary statistics are sum, mean, minimum, maximum, number of cases, percentage of cases
above or below a specified value, percentage of cases within a specified range of values, standard
deviation, variance, kurtosis, and skewness.
Data Columns Summary for Total Column
Summary Column controls the total summary statistics that summarize two or more data columns.
Available total summary statistics are sum of columns, mean of columns, minimum, maximum, difference
between values in two columns, quotient of values in one column divided by values in another column,
and product of columns values multiplied together.
Sum of columns. The total column is the sum of the columns in the Summary Column list.
Mean of columns. The total column is the average of the columns in the Summary Column list.
Minimum of columns. The total column is the minimum of the columns in the Summary Column list.
Maximum of columns. The total column is the maximum of the columns in the Summary Column list.
1st column – 2nd column. The total column is the difference of the columns in the Summary Column list.
The Summary Column list must contain exactly two columns.
1st column / 2nd column. The total column is the quotient of the columns in the Summary Column list.
The Summary Column list must contain exactly two columns.
% 1st column / 2nd column. The total column is the first column’s percentage of the second column in
the Summary Column list. The Summary Column list must contain exactly two columns.
Product of columns. The total column is the product of the columns in the Summary Column list.
Report Column Format
Data and break column formatting options for Report Summaries in Columns are the same as those
described for Report Summaries in Rows.
Report Summaries in Columns Break Options
Break Options controls subtotal display, spacing, and pagination for break categories.
Subtotal. Controls the display of subtotals for break categories.
Page Control. Controls spacing and pagination for categories of the selected break variable. You can
specify a number of blank lines between break categories or start each break category on a new page.
Blank Lines before Subtotal. Controls the number of blank lines between break category data and
subtotals.
Report Summaries in Columns Options
Options controls the display of grand totals, the display of missing values, and pagination in column
summary reports.
Grand Total. Displays and labels a grand total for each column; displayed at the bottom of the column.
Missing values. You can exclude missing values from the report or select a single character to indicate
missing values in the report.
Report Layout for Summaries in Columns
Report layout options for Report Summaries in Columns are the same as those described for Report
Summaries in Rows.
REPORT Command Additional Features
The command syntax language also allows you to:
• Display different summary functions in the columns of a single summary line.
• Insert summary lines into data columns for variables other than the data column variable or for various
combinations (composite functions) of summary functions.
• Use Median, Mode, Frequency, and Percent as summary functions.
• Control more precisely the display format of summary statistics.
• Insert blank lines at various points in reports.
• Insert blank lines after every nth case in listing reports.
Because of the complexity of the REPORT syntax, you may find it useful, when building a new report with
syntax, to approximate the report generated from the dialog boxes, copy and paste the corresponding
syntax, and refine that syntax to yield the exact report that you want.
See the Command Syntax Reference for complete syntax information.
Reliability Analysis
Reliability analysis allows you to study the properties of measurement scales and the items that compose
the scales. The Reliability Analysis procedure calculates a number of commonly used measures of scale
reliability and also provides information about the relationships between individual items in the scale.
Intra-class correlation coefficients can be used to compute inter-rater reliability estimates.
Reliability analysis also provides Fleiss’ Multiple Rater Kappa statistics, which assess interrater
agreement to determine the reliability among the various raters. Higher agreement provides more
confidence that the ratings reflect the true circumstance. The Fleiss’ Multiple Rater Kappa options are
available in the “Reliability Analysis: Statistics” on page 158 dialog.
Example
Does my questionnaire measure customer satisfaction in a useful way? Using reliability analysis, you
can determine the extent to which the items in your questionnaire are related to each other, you can
get an overall index of the repeatability or internal consistency of the scale as a whole, and you can
identify problem items that should be excluded from the scale.
Statistics
Descriptives for each variable and for the scale, summary statistics across items, inter-item
correlations and covariances, reliability estimates, ANOVA table, intraclass correlation coefficients,
Hotelling’s T², Tukey’s test of additivity, and Fleiss’ Multiple Rater Kappa.
Models
The following models of reliability are available:
Alpha (Cronbach)
This model is a measure of internal consistency based on the average inter-item correlation (a
standard closed form is sketched after this list).
Split-half
This model splits the scale into two parts and examines the correlation between the parts.
Guttman
This model computes Guttman’s lower bounds for true reliability.
Parallel
This model assumes that all items have equal variances and equal error variances across
replications.
Strict parallel
This model makes the assumptions of the Parallel model and also assumes equal means across
items.
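For the Alpha (Cronbach) model, a standard closed form (offered here as background; it is not spelled out
in this manual) is
alpha = (k / (k − 1)) × (1 − (sum of the k item variances) / (variance of the total score)),
where k is the number of items in the scale.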
Reliability Analysis data considerations
Data
Data can be dichotomous, ordinal, or interval, but the data should be coded numerically.
Assumptions
Observations should be independent, and errors should be uncorrelated between items. Each pair of
items should have a bivariate normal distribution. Scales should be additive, so that each item is
linearly related to the total score. The following assumptions apply for Fleiss’ Multiple Rater Kappa
statistics:
• At least two item variables must be selected to run any reliability statistic.
• When at least two ratings variables are selected, the Fleiss’ Multiple Rater Kappa syntax is pasted.
• There is no connection between raters.
• The number of raters is a constant.
• Each subject is rated by the same group containing only a single rater.
• No weights can be assigned to the various disagreements.
Related procedures
If you want to explore the dimensionality of your scale items (to see whether more than one construct
is needed to account for the pattern of item scores), use factor analysis or multidimensional scaling.
To identify homogeneous groups of variables, use hierarchical cluster analysis to cluster variables.
To obtain a Reliability Analysis
1. From the menus choose:
Analyze > Scale > Reliability Analysis…
2. Select two or more variables as potential components of an additive scale.
3. Choose a model from the Model drop-down list.
4. Optionally, click Statistics to select various statistics that describe your scale items or interrater
agreement.
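The pasted syntax for a basic coefficient alpha analysis might look like the following (a minimal sketch;
the item names q1 to q5 and the scale label are hypothetical):
RELIABILITY
  /VARIABLES=q1 q2 q3 q4 q5
  /SCALE('Satisfaction') ALL
  /MODEL=ALPHA
  /STATISTICS=DESCRIPTIVE SCALE
  /SUMMARY=TOTAL.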
Reliability Analysis: Statistics
You can select various statistics that describe your scale, items and the interrater agreement to
determine the reliability among the various raters. Statistics that are reported by default include the
number of cases, the number of items, and reliability estimates as follows:
Alpha models
Coefficient alpha; for dichotomous data, this is equivalent to the Kuder-Richardson 20 (KR20)
coefficient.
Split-half models
Correlation between forms, Guttman split-half reliability, Spearman-Brown reliability (equal and
unequal length), and coefficient alpha for each half.
Guttman models
Reliability coefficients lambda 1 through lambda 6.
Parallel and Strict parallel models
Test for goodness of fit of model; estimates of error variance, common variance, and true variance;
estimated common inter-item correlation; estimated reliability; and unbiased estimate of reliability.
Descriptives for
Produces descriptive statistics for scales or items across cases.
Item
Produces descriptive statistics for items across cases.
Scale
Produces descriptive statistics for scales.
Scale if item deleted
Displays summary statistics comparing each item to the scale that is composed of the other items.
Statistics include scale mean and variance if the item were to be deleted from the scale,
correlation between the item and the scale that is composed of other items, and Cronbach’s alpha
if the item were to be deleted from the scale.
Summaries
Provides descriptive statistics of item distributions across all items in the scale.
Means
Summary statistics for item means. The smallest, largest, and average item means, the range and
variance of item means, and the ratio of the largest to the smallest item means are displayed.
Variances
Summary statistics for item variances. The smallest, largest, and average item variances, the
range and variance of item variances, and the ratio of the largest to the smallest item variances
are displayed.
Correlations
Summary statistics for inter-item correlations. The smallest, largest, and average inter-item
correlations, the range and variance of inter-item correlations, and the ratio of the largest to the
smallest inter-item correlations are displayed.
Covariances
Summary statistics for inter-item covariances. The smallest, largest, and average inter-item
covariances, the range and variance of inter-item covariances, and the ratio of the largest to the
smallest inter-item covariances are displayed.
Inter-Item
Produces matrices of correlations or covariances between items.
ANOVA Table
Produces tests of equal means.
F test
Displays a repeated measures analysis-of-variance table.
Friedman chi-square
Displays Friedman’s chi-square and Kendall’s coefficient of concordance. This option is
appropriate for data that are in the form of ranks. The chi-square test replaces the usual F test in
the ANOVA table.
Cochran chi-square
Displays Cochran’s Q. This option is appropriate for data that are dichotomous. The Q statistic
replaces the usual F statistic in the ANOVA table.
Interrater Agreement: Fleiss’ Kappa
Assesses interrater agreement to determine the reliability among the various raters. Higher agreement
provides more confidence that the ratings reflect the true circumstance. The generalized
unweighted kappa statistic measures the agreement among any constant number of raters while
assuming:
• At least two item variables must be specified to run any reliability statistic.
• At least two ratings variables must be specified.
• The variables selected as items can also be selected as ratings.
• There is no connection between raters.
• The number of raters is a constant.
• Each subject is rated by the same group containing only a single rater.
• No weights can be assigned to the various disagreements.
Display agreement on individual categories
Specifies whether or not to output the agreement on individual categories. By default, the output
suppresses the estimates for individual categories. When enabled, multiple tables display in
the output.
Ignore string cases
Controls whether or not the string variables are case sensitive. By default, string rating values are
case sensitive.
String category labels are displayed in uppercase
Controls whether the category labels in the output tables are displayed in uppercase or
lowercase. The setting is enabled by default, which displays the string category labels in
uppercase.
Asymptotic significance level (%)
Specifies the significance level for the asymptotic confidence intervals. 95 is the default setting.
Missing
Exclude both user-missing and system-missing values
Controls the exclusion of user-missing and system-missing values. By default, user-missing and
system-missing values are excluded.
User-missing values are treated as valid
When enabled, treats user-missing and system-missing values as valid data. The setting is
disabled by default.
Hotelling’s T-square
Produces a multivariate test of the null hypothesis that all items on the scale have the same mean.
Tukey’s test of additivity
Produces a test of the assumption that there is no multiplicative interaction among the items.
Intraclass correlation coefficient
Produces measures of consistency or agreement of values within cases.
Model
Select the model for calculating the intraclass correlation coefficient. Available models are Two-
Way Mixed, Two-Way Random, and One-Way Random. Select Two-Way Mixed when people
effects are random and the item effects are fixed, select Two-Way Random when people effects
and the item effects are random, or select One-Way Random when people effects are random.
Type
Select the type of index. Available types are Consistency and Absolute Agreement.
Confidence interval (%)
Specify the level for the confidence interval. The default is 95%.
Test value
Specify the hypothesized value of the coefficient for the hypothesis test. This value is the value to
which the observed value is compared. The default value is 0.
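When intraclass correlation coefficients are requested with the settings described above, the pasted
subcommand might resemble the following (a sketch; the rater variable names are hypothetical, and the
exact keywords should be verified against the Command Syntax Reference):
RELIABILITY
  /VARIABLES=rater1 rater2 rater3
  /SCALE('Ratings') ALL
  /MODEL=ALPHA
  /ICC=MODEL(MIXED) TYPE(CONSISTENCY) CIN=95 TESTVAL=0.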
RELIABILITY Command Additional Features
The command syntax language also allows you to:
• Read and analyze a correlation matrix.
• Write a correlation matrix for later analysis.
• Specify splits other than equal halves for the split-half method.
See the Command Syntax Reference for complete syntax information.
Weighted Kappa
Cohen’s kappa is broadly used in cross-classification as a measure of agreement between observed
raters; the unweighted statistic is an appropriate index of agreement when ratings are nominal scales
with no order structure. The development of Cohen’s weighted kappa was motivated by the fact that some
assignments in a contingency table might be of greater gravity than others. The statistic relies on
predefined cell weights reflecting either agreement or disagreement.
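For reference, the linear and quadratic weighting scales selectable in this procedure are commonly
defined as follows (a standard formulation, offered here as background; consult the algorithms
documentation for the exact weights the procedure uses). For k ordered categories indexed i, j = 1, …, k,
the agreement weight for cell (i, j) is
w(i, j) = 1 − |i − j| / (k − 1)     for linear weighting, and
w(i, j) = 1 − (i − j)² / (k − 1)²   for quadratic weighting,
so exact agreement receives weight 1 and the most extreme disagreement receives weight 0.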
The Weighted Kappa procedure provides options for estimating Cohen’s weighted kappa, an important
generalization of the kappa statistic that measures the agreement between two ordinal rating variables
with identical categories.
Note: The Weighted Kappa procedure supersedes the functionality previously provided by the STATS
WEIGHTED KAPPA.spe extension.
Example
There are situations where the differences between raters should not be treated as equally important.
An example would be in the healthcare industry where multiple people collect research or clinical
data. In such cases the reliability of the data can come into question given the variability among those
collecting data.
Statistics
Cohen’s weighted kappa, linear scale, quadratic scale, asymptotic confidence interval.
Weighted Kappa data considerations
Data
A two-way table that is based on an active data set is required in order to estimate Cohen’s
weighted kappa statistic.
Rating variables must be of the same type (all string or all numeric).
The estimation of Cohen’s weighted kappa makes sense only when the categories of the two rating
variables, represented by the row and column in the table, are appropriately ordered (for a pair of
numeric variables, numerical order is applied; for a pair of string variables, alphabetical order is
applied).
Assumptions
When mixed variable pairs are selected, Cohen’s weighted kappa is not estimated.
The rating variables are assumed to share the same set of categories.
To obtain a Weighted Kappa analysis
1. From the menus choose:
Analyze > Scale > Weighted Kappa…
2. Select two or more string or numeric variables to specify as Pairwise raters.
Note: You must select either all string variables or all numeric variables.
3. Optionally, enable the Specify raters for rows and columns setting to control the display of pairwise
raters or rows/column raters.
• When enabled, pairwise raters are suppressed and row/column raters display. The user interface
updates to provide Row rater(s) and Column rater(s) fields (effectively replacing the Pairwise
raters field).
• When disabled, row/column raters are suppressed and pairwise raters display (the default setting).
When Specify raters for rows and columns is enabled, specify at least one variable for both Row
rater(s) and Column rater(s).
Note: If both Row rater(s) and Column rater(s) contain only one variable, the selected variables
cannot be the same for both.
4. Optionally, click Criteria to specify the weighting scale and missing values settings, or Print to specify
the display format and crosstabulation settings.
Weighted Kappa: Criteria
The Criteria dialog provides options for specifying the estimation of Cohen’s weighted kappa
statistics.
Weighting Scale
Provides options for specifying either a linear or quadratic weighting for agreement. The use of linear
weighting is the default setting.
Missing Values
Provides options for removing cases with missing values on a pairwise basis and treating user-missing
values as valid.
String rating variables are case sensitive
When selected, string variables are treated as case sensitive.
Asymptotic confidence interval (%)
This optional setting specifies the confidence level for the estimation of the asymptotic confidence
intervals. Must be a single double value between 0 and 100 (95 is the default setting).
Weighted Kappa: Print
The Print dialog provides options for controlling the crosstabulation tables.
Display and Format
Provides options for controlling the crosstabulation table display and format.
Rating categories are displayed in ascending order
When selected, rating categories in the crosstabulation tables display in ascending order. When
the setting is not selected, rating categories in the crosstabulation tables display in descending
order. The setting is enabled by default.
String category labels are displayed in uppercase
When selected, string category labels in the crosstabulation tables display in uppercase. When the
setting is not selected, they display in lowercase. The setting is enabled by default.
Crosstabulation
Provides options for specifying the rating variables that are used in crosstabulation. By default,
crosstabulation settings are not enabled, which suppresses the crosstabulation of any rating
variables.
Display the crosstabulation of rating variables
When selected, this setting enables the crosstabulation of all or user-specified rating
variables.
Include all rating variables
When selected, crosstabulation tables are printed for all defined rating variable pairs.
Include user-specified rating variables
When selected, use the Available variables, Row rater(s), and Column rater(s) fields to
select which rating variables are included in the crosstabulation tables.
Multidimensional Scaling
Multidimensional scaling attempts to find the structure in a set of distance measures between objects or
cases. This task is accomplished by assigning observations to specific locations in a conceptual space
(usually two- or three-dimensional) such that the distances between points in the space match the given
dissimilarities as closely as possible. In many cases, the dimensions of this conceptual space can be
interpreted and used to further understand your data.
If you have objectively measured variables, you can use multidimensional scaling as a data reduction
technique (the Multidimensional Scaling procedure will compute distances from multivariate data for you,
if necessary). Multidimensional scaling can also be applied to subjective ratings of dissimilarity between
objects or concepts. Additionally, the Multidimensional Scaling procedure can handle dissimilarity data
from multiple sources, as you might have with multiple raters or questionnaire respondents.
Example. How do people perceive relationships between different cars? If you have data from
respondents indicating similarity ratings between different makes and models of cars, multidimensional
scaling can be used to identify dimensions that describe consumers’ perceptions. You might find, for
example, that the price and size of a vehicle define a two-dimensional space, which accounts for the
similarities that are reported by your respondents.
Statistics. For each model: data matrix, optimally scaled data matrix, S-stress (Young’s), stress
(Kruskal’s), RSQ, stimulus coordinates, average stress and RSQ for each stimulus (RMDS models). For
individual difference (INDSCAL) models: subject weights and weirdness index for each subject. For each
matrix in replicated multidimensional scaling models: stress and RSQ for each stimulus. Plots: stimulus
coordinates (two- or three-dimensional), scatterplot of disparities versus distances.
Multidimensional Scaling Data Considerations
Data. If your data are dissimilarity data, all dissimilarities should be quantitative and should be measured
in the same metric. If your data are multivariate data, variables can be quantitative, binary, or count data.
Scaling of variables is an important issue: differences in scaling may affect your solution. If your variables
have large differences in scaling (for example, one variable is measured in dollars and the other variable is
measured in years), consider standardizing them (this process can be done automatically by the
Multidimensional Scaling procedure).
Assumptions. The Multidimensional Scaling procedure is relatively free of distributional assumptions. Be
sure to select the appropriate measurement level (ordinal, interval, or ratio) in the Multidimensional
Scaling Options dialog box so that the results are computed correctly.
Related procedures. If your goal is data reduction, an alternative method to consider is factor analysis,
particularly if your variables are quantitative. If you want to identify groups of similar cases, consider
supplementing your multidimensional scaling analysis with a hierarchical or k-means cluster analysis.
To Obtain a Multidimensional Scaling Analysis
1. From the menus choose:
Analyze > Scale > Multidimensional Scaling…
2. Select at least four numeric variables for analysis.
3. In the Distances group, select either Data are distances or Create distances from data.
4. If you select Create distances from data, you can also select a grouping variable for individual
matrices. The grouping variable can be numeric or string.
Optionally, you can also:
• Specify the shape of the distance matrix when data are distances.
• Specify the distance measure to use when creating distances from data.
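Pasted syntax for a single two-dimensional ordinal solution on a square symmetric distance matrix might
look like the following (a minimal sketch; the variable names brand1 to brand8 are hypothetical):
ALSCAL VARIABLES=brand1 TO brand8
  /SHAPE=SYMMETRIC
  /LEVEL=ORDINAL
  /CONDITION=MATRIX
  /MODEL=EUCLID
  /CRITERIA=DIMENS(2).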
Multidimensional Scaling Shape of Data
If your active dataset represents distances among a set of objects or represents distances between two
sets of objects, specify the shape of your data matrix in order to get the correct results.
Note: You cannot select Square symmetric if the Model dialog box specifies row conditionality.
Multidimensional Scaling Create Measure
Multidimensional scaling uses dissimilarity data to create a scaling solution. If your data are multivariate
data (values of measured variables), you must create dissimilarity data in order to compute a
multidimensional scaling solution. You can specify the details of creating dissimilarity measures from
your data.
Measure. Allows you to specify the dissimilarity measure for your analysis. Select one alternative from
the Measure group corresponding to your type of data, and then choose one of the measures from the
drop-down list corresponding to that type of measure. Available alternatives are:
• Interval. Euclidean distance, Squared Euclidean distance, Chebychev, Block, Minkowski, or
Customized.
• Counts. Chi-square measure or Phi-square measure.
• Binary. Euclidean distance, Squared Euclidean distance, Size difference, Pattern difference, Variance,
or Lance and Williams.
Create Distance Matrix. Allows you to choose the unit of analysis. Alternatives are Between variables or
Between cases.
Transform Values. In certain cases, such as when variables are measured on very different scales, you
may want to standardize values before computing proximities (not applicable to binary data). Choose a
standardization method from the Standardize drop-down list. If no standardization is required, choose
None.
Multidimensional Scaling Model
Correct estimation of a multidimensional scaling model depends on aspects of the data and the model
itself.
Level of Measurement. Allows you to specify the level of your data. Alternatives are Ordinal, Interval, or
Ratio. If your variables are ordinal, selecting Untie tied observations requests that the variables be
treated as continuous variables, so that ties (equal values for different cases) are resolved optimally.
Conditionality. Allows you to specify which comparisons are meaningful. Alternatives are Matrix, Row, or
Unconditional.
Dimensions. Allows you to specify the dimensionality of the scaling solution(s). One solution is calculated
for each number in the range. Specify integers between 1 and 6; a minimum of 1 is allowed only if you
select Euclidean distance as the scaling model. For a single solution, specify the same number for
minimum and maximum.
Scaling Model. Allows you to specify the assumptions by which the scaling is performed. Available
alternatives are Euclidean distance or Individual differences Euclidean distance (also known as
INDSCAL). For the Individual differences Euclidean distance model, you can select Allow negative
subject weights, if appropriate for your data.
Multidimensional Scaling Options
You can specify options for your multidimensional scaling analysis.
Display. Allows you to select various types of output. Available options are Group plots, Individual
subject plots, Data matrix, and Model and options summary.
Criteria. Allows you to determine when iteration should stop. To change the defaults, enter values for S-
stress convergence, Minimum s-stress value, and Maximum iterations.
Treat distances less than n as missing. Distances that are less than this value are excluded from the
analysis.
ALSCAL Command Additional Features
The command syntax language also allows you to:
• Use three additional model types, known as ASCAL, AINDS, and GEMSCAL in the literature about
multidimensional scaling.
• Carry out polynomial transformations on interval and ratio data.
• Analyze similarities (rather than distances) with ordinal data.
• Analyze nominal data.
• Save various coordinate and weight matrices into files and read them back in for analysis.
• Constrain multidimensional unfolding.
See the Command Syntax Reference for complete syntax information.
Ratio Statistics
The Ratio Statistics procedure provides a comprehensive list of summary statistics for describing the ratio
between two scale variables.
You can sort the output by values of a grouping variable in ascending or descending order. The ratio
statistics report can be suppressed in the output, and the results can be saved to an external file.
Example. Is there good uniformity in the ratio between the appraisal price and sale price of homes in
each of five counties? From the output, you might learn that the distribution of ratios varies considerably
from county to county.
Statistics. Median, mean, weighted mean, confidence intervals, coefficient of dispersion (COD), median-
centered coefficient of variation, mean-centered coefficient of variation, price-related differential (PRD),
standard deviation, average absolute deviation (AAD), range, minimum and maximum values, and the
concentration index computed for a user-specified range or percentage within the median ratio.
Ratio Statistics Data Considerations
Data. Use numeric codes or strings to code grouping variables (nominal or ordinal level measurements).
Assumptions. The variables that define the numerator and denominator of the ratio should be scale
variables that take positive values.
To Obtain Ratio Statistics
1. From the menus choose:
Analyze > Descriptive Statistics > Ratio…
2. Select a numerator variable.
3. Select a denominator variable.
Optionally:
• Select a grouping variable and specify the ordering of the groups in the results.
• Choose whether to display the results in the Viewer.
• Choose whether to save the results to an external file for later use, and specify the name of the file to
which the results are saved.
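Pasted syntax for the appraisal example above might look like the following (a minimal sketch; the
variable names appraisal, saleprice, and county are hypothetical):
RATIO STATISTICS appraisal WITH saleprice BY county
  /PRINT = MEDIAN MEAN COD PRD.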
Ratio Statistics
Central Tendency. Measures of central tendency are statistics that describe the distribution of ratios.
• Median. The value such that the number of ratios that are less than this value and the number of ratios
that are greater than this value are the same.
• Mean. The result of summing the ratios and dividing the result by the total number of ratios.
• Weighted Mean. The result of dividing the mean of the numerator by the mean of the denominator.
Weighted mean is also the mean of the ratios weighted by the denominator.
• Confidence Intervals. Displays confidence intervals for the mean, the median, and the weighted mean
(if requested). Specify a value that is greater than or equal to 0 and less than 100 as the confidence
level.
Dispersion. These statistics measure the amount of variation, or spread, in the observed values.
• AAD. The average absolute deviation is the result of summing the absolute deviations of the ratios
about the median and dividing the result by the total number of ratios.
• COD. The coefficient of dispersion is the result of expressing the average absolute deviation as a
percentage of the median.
• PRD. The price-related differential, also known as the index of regressivity, is the result of dividing the
mean by the weighted mean.
• Median Centered COV. The median-centered coefficient of variation is the result of expressing the root
mean squares of deviation from the median as a percentage of the median.
• Mean Centered COV. The mean-centered coefficient of variation is the result of expressing the
standard deviation as a percentage of the mean.
• Standard deviation. The standard deviation is the result of summing the squared deviations of the
ratios about the mean, dividing the result by the total number of ratios minus one, and taking the
positive square root.
• Range. The range is the result of subtracting the minimum ratio from the maximum ratio.
• Minimum. The minimum is the smallest ratio.
• Maximum. The maximum is the largest ratio.
Concentration Index. The coefficient of concentration measures the percentage of ratios that fall within
an interval. It can be computed in two different ways:
• Ratios Between. Here the interval is defined explicitly by specifying the low and high values of the
interval. Enter values for the low proportion and high proportion, and click Add to obtain an interval.
• Ratios Within. Here the interval is defined implicitly by specifying the percentage of the median. Enter a
value between 0 and 100, and click Add. The lower end of the interval is equal to (1 – 0.01 × value) ×
median, and the upper end is equal to (1 + 0.01 × value) × median.
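For example, entering a value of 10 when the median ratio is 1.20 (an illustrative figure) defines the
interval from (1 − 0.10) × 1.20 = 1.08 to (1 + 0.10) × 1.20 = 1.32; the concentration index is then the
percentage of ratios that fall between 1.08 and 1.32.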
ROC Analysis
Receiver operating characteristic (ROC) Analysis is a useful way to assess the accuracy of model
predictions by plotting sensitivity versus (1-specificity) of a classification test (as the threshold varies over
an entire range of diagnostic test results). The area under a given ROC curve, or AUC, is an important
statistic that represents the probability that the predictions will be in the correct order when a test
variable is observed for one subject randomly selected from the case group and another randomly
selected from the control group. ROC Analysis supports inference regarding a single AUC, provides
precision-recall (PR) curves, and offers options for comparing two ROC curves that are generated from
either independent groups or paired subjects.
PR curves plot precision versus recall, tend to be more informative when the observed data samples are
highly skewed, and provide an alternative to ROC curves for data with a large skew in the class
distribution.
Example
It is in a bank’s interest to correctly classify customers into those customers who will and will not
default on their loans, so special models are developed for making these decisions. ROC Analysis can
be used to evaluate and assess the accuracy of the model predictions.
Statistics
AUC, negative group, missing values, positive classification, cutoff value, strength of conviction, two-
sided asymptotic confidence interval, distribution, standard error, independent-group design, paired-
sample design, nonparametric assumption, bi-negative exponential distribution assumption,
midpoint, cut point, PR curve, stepwise interpolation, asymptotic significance (2-tail), Sensitivity and
(1-Specificity), Precision and Recall.
Methods
The areas under two ROC curves that are generated from either independent groups or paired
subjects are compared. Comparing two ROC curves can provide more information about the accuracy
that results from two comparative diagnostic approaches.
ROC Analysis data considerations
Data
PR curves plot Precision versus Recall, and tend to be more informative when the observed data
samples are highly skewed. A simple linear interpolation may mistakenly yield an overly optimistic
estimate of a PR curve.
Assumptions
The prediction will be in the correct order when a test variable is observed for one subject that is
randomly selected from the case group and the other is randomly selected from the control group.
Each defined group will contain at least one valid observation. Only a single grouping variable is used
for a single procedure.
Obtaining an ROC Analysis
1. From the menus choose:
Analyze > Classify > ROC Analysis
2. Select one or more test probability variables.
3. Select one state variable.
4. Identify the positive value for the state variable.
5. Optionally select the Paired-sample design option, or select a single grouping variable (you cannot
select both options).
• Use the Paired-sample design setting to request the paired-sample design for the test variable(s).
The paired-sample design compares two ROC curves in a paired-sample scenario when multiple test
values are measured on the same subjects that are associated with a state variable.
Note: When Paired-sample design is selected, the Grouping Variable and Distribution
Assumption (in the Options dialog) options are disabled.
• When a numeric grouping variable is selected, you can click Define Groups… to request the
independent group design for the test variable(s), and to specify two values, a midpoint, or a cut
point.
6. Optionally, click Options to define the classification, test direction, standard error parameters, and
missing values settings.
7. Optionally, click Display to define the plotting and print settings (which include ROC Curve, Precision-
Recall Curve and model quality settings).
8. Click OK.
ROC Analysis: Options
You can specify the following options for your ROC analysis:
Classification
Allows you to specify whether the cutoff value should be included or excluded when making a positive
classification. This setting currently has no effect on the output.
Test Direction
Provides options for specifying which direction of the test result variable indicates increasing strength
of conviction that the subject is test positive.
Parameters for Standard Error of Area
Allows you to specify the method of estimating the standard error of the area under the curve.
Available methods are nonparametric and bi-negative exponential. The default Nonparametric
setting provides estimates under the nonparametric assumption. The Bi-negative exponential
setting provides estimates under the bi-negative exponential distribution assumption.
The section also allows you to specify the confidence level for the two-sided asymptotic confidence
interval of the AUC. The available range is 0.0% to 100.0% (the default value is 95%).
Note: The setting only applies to the independent-group design and has no effect in the paired-
sample design.
Missing Values
Allows you to specify how missing values are handled. When the setting is not selected, both user-
missing values and system-missing values are excluded. When the setting is selected, user-missing
values are treated as valid and system-missing values are excluded. Cases with system-missing values, in
either the test variable or the state variable, are always excluded from the analysis.
ROC Analysis: Display
You can specify the following display settings for your ROC analysis:
Plot
Provides options for plotting the ROC and Precision-Recall curves.
ROC Curve
When selected, an ROC Curve chart displays in the output. Select With diagonal reference line to
draw a diagonal reference line on the ROC Curve chart.
Precision-Recall Curve
When selected, a Precision-Recall Curve chart displays in the output. Precision-Recall Curves tend
to be more informative when the observed data samples are highly skewed and provide an
alternative to ROC Curves for data with a large skew in the class distribution. The default
Interpolate along the true positives setting makes the stepwise interpolation along the true
positives. The Interpolate along the false positives setting makes the stepwise interpolation
along the false positives.
Overall model quality
The setting controls whether or not a bar chart is created to display the value of the lower bound
of the confidence interval of the estimated AUC. By default, the setting is not selected, which
suppresses the bar chart.
Print
Provides options for defining the output for the corresponding statistics.
Standard error and confidence interval
The setting controls which statistics display in the “Area Under the Curve” table. When the setting
is not selected, only the estimated AUC displays. When the setting is selected, additional statistics
display, including the standard error of the AUC, the asymptotic significance (2-tail), and the
Asymptotic Confidence Interval bounds under the null hypothesis.
Coordinate points of ROC Curve
The setting controls the coordinate points of the ROC Curve, along with the cutoff values. When
the setting is not selected, the output of coordinate points is suppressed. When the setting is
selected, the pairs of Sensitivity and (1-Specificity) values are given with the cutoff values for each
ROC Curve.
Coordinate points of the Precision-Recall Curve
The setting controls the coordinate points of the Precision-Recall Curve, along with the cutoff
values. When the setting is not selected, the output of coordinate points is suppressed. When the
setting is selected, the pairs of Precision and Recall values are given with the cutoff values for
each Precision-Recall Curve.
Classifier evaluation metrics
The setting controls the display of the Classifier Evaluation Metrics table in the output. The table
shows how well a classification model fits the data compared to a random assignment and
provides the following information:
• The user-specified test variables
• Group information
• Gini Index (the Gini index is 2*AUC – 1, where AUC is the area under the ROC curve)
• Max K-S and Cutoff values
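For example, under the Gini formula above, an AUC of 0.75 corresponds to a Gini index of
2 × 0.75 − 1 = 0.50.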
ROC Analysis: Define Groups (string)
For string grouping variables, enter a string for Group 1 and another value for Group 2, such as yes and no.
Cases with other strings are excluded from the analysis.
Note: The specified values must exist in the variable, otherwise an error message displays to indicate that
at least one of the groups is empty.
ROC Analysis: Define Groups (numeric)
For numeric grouping variables, define the two groups by specifying two values, a midpoint,
or a cut point.
Note: The specified values must exist in the variable, otherwise an error message displays to indicate that
at least one of the groups is empty.
• Use specified values. Enter a value for Group 1 and another value for Group 2. Cases with any other
values are excluded from the analysis. Numbers need not be integers (for example, 6.25 and 12.5 are
valid).
• Use midpoint value. When selected, the groups are separated into < and ≥ midpoint values.
• Use cut point.
– Cutpoint. Enter a number that splits the values of the grouping variable into two sets. All cases with
values that are less than the cutpoint form one group, and cases with values that are greater than or
equal to the cutpoint form the other group.
ROC Curves
This procedure is a useful way to evaluate the performance of classification schemes in which there is one
variable with two categories by which subjects are classified.
Example. It is in a bank's interest to correctly classify customers into those customers who will and will
not default on their loans, so special methods are developed for making these decisions. ROC curves can
be used to evaluate how well these methods perform.
Statistics. Area under the ROC curve with confidence interval and coordinate points of the ROC curve.
Plots: ROC curve.
Methods. The estimate of the area under the ROC curve can be computed either nonparametrically or
parametrically using a binegative exponential model.
ROC Curve Data Considerations
Data. Test variables are quantitative. Test variables are often composed of probabilities from
discriminant analysis or logistic regression or composed of scores on an arbitrary scale indicating a rater's
"strength of conviction" that a subject falls into one category or another category. The state variable can
be of any type and indicates the true category to which a subject belongs. The value of the state variable
indicates which category should be considered positive.
Assumptions. It is assumed that increasing numbers on the rater scale represent the increasing belief
that the subject belongs to one category, while decreasing numbers on the scale represent the increasing
belief that the subject belongs to the other category. The user must choose which direction is positive. It
is also assumed that the true category to which each subject belongs is known.
To Obtain an ROC Curve
1. From the menus choose:
Analyze > Classify > ROC Curve…
2. Select one or more test probability variables.
3. Select one state variable.
4. Identify the positive value for the state variable.
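Pasted syntax for a basic curve might look like the following (a minimal sketch; the test variable score,
the state variable default, and its positive value 1 are hypothetical):
ROC score BY default (1)
  /PLOT=CURVE(REFERENCE)
  /PRINT=SE COORDINATES
  /CRITERIA=TESTPOS(LARGE) DISTRIBUTION(FREE) CI(95)
  /MISSING=EXCLUDE.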
ROC Curve Options
You can specify the following options for your ROC analysis:
Classification
Allows you to specify whether the cutoff value should be included or excluded when making a positive
classification. This setting currently has no effect on the output.
Test Direction
Allows you to specify the direction of the scale in relation to the positive category.
Parameters for Standard Error of Area
Allows you to specify the method of estimating the standard error of the area under the curve.
Available methods are nonparametric and binegative exponential. Also allows you to set the level for
the confidence interval. The available range is 50.1% to 99.9%.
Missing Values
Allows you to specify how missing values are handled.
Simulation
Predictive models, such as linear regression, require a set of known inputs to predict an outcome or target
value. In many real world applications, however, values of inputs are uncertain. Simulation allows you to
account for uncertainty in the inputs to predictive models and evaluate the likelihood of various outcomes
of the model in the presence of that uncertainty. For example, you have a profit model that includes the
cost of materials as an input, but there is uncertainty in that cost due to market volatility. You can use
simulation to model that uncertainty and determine the effect it has on profit.
Simulation in IBM SPSS Statistics uses the Monte Carlo method. Uncertain inputs are modeled with
probability distributions (such as the triangular distribution), and simulated values for those inputs are
generated by drawing from those distributions. Inputs whose values are known are held fixed at the
known values. The predictive model is evaluated using a simulated value for each uncertain input and
fixed values for the known inputs to calculate the target (or targets) of the model. The process is repeated
many times (typically tens of thousands or hundreds of thousands of times), resulting in a distribution of
target values that can be used to answer questions of a probabilistic nature. In the context of IBM SPSS
Statistics, each repetition of the process generates a separate case (record) of data that consists of the
set of simulated values for the uncertain inputs, the values of the fixed inputs, and the predicted target (or
targets) of the model.
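The following ordinary syntax illustrates the underlying idea on a toy profit model, outside the Simulation
Builder (a minimal sketch under stated assumptions: a normal distribution stands in for the uncertain
input, and the model, seed, and parameter values are hypothetical):
* Generate 10,000 simulated cases of an uncertain materials cost.
SET SEED=20250101.
INPUT PROGRAM.
LOOP #i = 1 TO 10000.
* Draw the uncertain input from its assumed distribution.
COMPUTE cost = RV.NORMAL(15, 2).
* Evaluate the hypothetical profit model at the simulated input.
COMPUTE profit = 100 - 3 * cost.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.
* Summarize the resulting distribution of the target.
FREQUENCIES VARIABLES=profit
  /FORMAT=NOTABLE
  /STATISTICS=MEAN STDDEV MIN MAX.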
You can also simulate data in the absence of a predictive model by specifying probability distributions for
variables that are to be simulated. Each generated case of data consists of the set of simulated values for
the specified variables.
To run a simulation, you need to specify details such as the predictive model, the probability distributions
for the uncertain inputs, correlations between those inputs, and values for any fixed inputs. Once you’ve
specified all of the details for a simulation, you can run it and optionally save the specifications to a
simulation plan file. You can share the simulation plan with other users, who can then run the simulation
without needing to understand the details of how it was created.
Two interfaces are available for working with simulations. The Simulation Builder is an advanced interface
for users who are designing and running simulations. It provides the full set of capabilities for designing a
simulation, saving the specifications to a simulation plan file, specifying output and running the
simulation. You can build a simulation based on an IBM SPSS model file, or on a set of custom equations
that you define in the Simulation Builder. You can also load an existing simulation plan into the Simulation
Builder, modify any of the settings and run the simulation, optionally saving the updated plan. For users
who have a simulation plan and primarily want to run the simulation, a simpler interface is available. It
allows you to modify settings that enable you to run the simulation under different conditions, but does
not provide the full capabilities of the Simulation Builder for designing simulations.
To design a simulation based on a model file
1. From the menus choose:
Analyze > Simulation…
2. Click Select SPSS Model File and click Continue.
3. Open the model file.
The model file is an XML file that contains model PMML created from IBM SPSS Statistics or IBM SPSS
Modeler. See the topic “Model tab ” on page 173 for more information.
4. On the Simulation tab (in the Simulation Builder), specify probability distributions for simulated inputs
and values for fixed inputs. If the active dataset contains historical data for simulated inputs, click Fit
All to automatically determine the distribution that most closely fits the data for each such input as
well as determining correlations between them. For each simulated input that is not being fit to
historical data, you must explicitly specify a distribution by selecting a distribution type and entering
the required parameters.
5. Click Run to run the simulation. By default, the simulation plan, specifying the details of the
simulation, is saved to the location specified on the Save settings.
The following options are available:
• Modify the location for the saved simulation plan.
• Specify known correlations between simulated inputs.
• Automatically compute a contingency table of associations between categorical inputs and use those
associations when data are generated for those inputs.
• Specify sensitivity analysis to investigate the effect of varying the value of a fixed input or varying a
distribution parameter for a simulated input.
• Specify advanced options such as setting the maximum number of cases to generate or requesting tail
sampling.
• Customize output.
• Save the simulated data to a data file.
To design a simulation based on custom equations
1. From the menus choose:
Analyze > Simulation…
2. Click Type in the Equations and click Continue.
3. Click New Equation on the Model tab (in the Simulation Builder) to define each equation in your
predictive model.
4. Click the Simulation tab and specify probability distributions for simulated inputs and values for fixed
inputs. If the active dataset contains historical data for simulated inputs, click Fit All to automatically
determine the distribution that most closely fits the data for each such input as well as determining
correlations between them. For each simulated input that is not being fit to historical data, you must
explicitly specify a distribution by selecting a distribution type and entering the required parameters.
5. Click Run to run the simulation. By default, the simulation plan, specifying the details of the
simulation, is saved to the location specified on the Save settings.
The following options are available:
• Modify the location for the saved simulation plan.
• Specify known correlations between simulated inputs.
• Automatically compute a contingency table of associations between categorical inputs and use those
associations when data are generated for those inputs.
• Specify sensitivity analysis to investigate the effect of varying the value of a fixed input or varying a
distribution parameter for a simulated input.
• Specify advanced options such as setting the maximum number of cases to generate or requesting tail
sampling.
• Customize output.
• Save the simulated data to a data file.
To design a simulation without a predictive model
1. From the menus, choose:
Analyze > Simulation…
2. Click Create Simulated Data and click Continue.
3. On the Model tab (in the Simulation Builder), select the fields that you want to simulate. You can select
fields from the active dataset or you can define new fields by clicking New.
4. Click the Simulation tab and specify probability distributions for the fields that are to be simulated. If
the active dataset contains historical data for any of those fields, click Fit All to automatically
determine the distribution that most closely fits the data and to determine correlations between the
fields. For fields that are not fit to historical data, you must explicitly specify a distribution by selecting
a distribution type and entering the required parameters.
5. Click Run to run the simulation. By default, the simulated data are saved to the new dataset specified
on the Save settings. In addition, the simulation plan, which specifies the details of the simulation, is
saved to the location specified on the Save settings.
The following options are available:
• Modify the location for the simulated data or the saved simulation plan.
• Specify known correlations between simulated fields.
• Automatically compute a contingency table of associations between categorical fields and use those
associations when data are generated for those fields.
• Specify sensitivity analysis to investigate the effect of varying a distribution parameter for a simulated
field.
• Specify advanced options such as setting the number of cases to generate.
To run a simulation from a simulation plan
Two options are available for running a simulation from a simulation plan. You can use the Run Simulation
dialog, which is primarily designed for running from a simulation plan, or you can use the Simulation
Builder.
To use the Run Simulation dialog:
1. From the menus choose:
Analyze > Simulation…
2. Click Open an Existing Simulation Plan.
3. Make sure the Open in Simulation Builder check box is not checked and click Continue.
4. Open the simulation plan.
5. Click Run in the Run Simulation dialog.
To run the simulation from the Simulation Builder:
1. From the menus choose:
Analyze > Simulation…
2. Click Open an Existing Simulation Plan.
3. Select the Open in Simulation Builder check box and click Continue.
4. Open the simulation plan.
5. Modify any settings you want to modify on the Simulation tab.
6. Click Run to run the simulation.
Optionally, you can do the following:
• Set up or modify sensitivity analysis to investigate the effect of varying the value of a fixed input or
varying a distribution parameter for a simulated input.
• Refit distributions and correlations for simulated inputs to new data.
• Change the distribution for a simulated input.
• Customize output.
• Save the simulated data to a data file.
Simulation Builder
The Simulation Builder provides the full set of capabilities for designing and running simulations. It allows
you to perform the following general tasks:
• Design and run a simulation for an IBM SPSS model defined in a PMML model file.
• Design and run a simulation for a predictive model defined by a set of custom equations that you
specify.
• Design and run a simulation that generates data in the absence of a predictive model.
• Run a simulation based on an existing simulation plan, optionally modifying any plan settings.
Model tab
For simulations based on a predictive model, the Model tab specifies the source of the model. For
simulations that do not include a predictive model, the Model tab specifies the fields that are to be
simulated.
Select an SPSS model file. This option specifies that the predictive model is defined in an IBM SPSS
model file. An IBM SPSS model file is an XML file or a compressed file archive (.zip file) that contains
model PMML created from IBM SPSS Statistics or IBM SPSS Modeler. Predictive models are created by
procedures, such as Linear Regression and Decision Trees within IBM SPSS Statistics, and can be
exported to a model file. You can use a different model file by clicking Browse and navigating to the file
you want.
PMML models supported by Simulation
• Linear Regression
• Automatic Linear Model
• Generalized Linear Model
• Generalized Linear Mixed Model
• General Linear Model
• Binary Logistic Regression
• Multinomial Logistic Regression
• Ordinal Multinomial Regression
• Cox Regression
• Tree
• Boosted Tree (C5)
• Discriminant
• Two-Step Cluster
• K-Means Cluster
• Neural Net
• Ruleset (Decision List)
Note:
• PMML models that have multiple target fields (variables) or splits are not supported for use in
Simulation.
• Values of string inputs to binary logistic regression models are limited to 8 bytes in the model. If you are
fitting such string inputs to the active dataset, make sure that the values in the data do not exceed 8
bytes in length. Data values that exceed 8 bytes are excluded from the associated categorical
distribution for the input, and are displayed as unmatched in the Unmatched Categories output table.
Type in the equations for the model. This option specifies that the predictive model consists of one or
more custom equations to be created by you. Create equations by clicking New Equation. This opens the
Equation Editor. You can modify existing equations, copy them to use as templates for new equations,
reorder them and delete them.
• The Simulation Builder does not support systems of simultaneous equations or equations that are non-
linear in the target variable.
• Custom equations are evaluated in the order in which they are specified. If the equation for a given
target depends on another target, then the other target must be defined by a preceding equation.
For example, given the set of three equations below, the equation for profit depends on the values of
revenue and expenses, so the equations for revenue and expenses must precede the equation for profit.
revenue = price*volume
expenses = fixed + volume*(unit_cost_materials + unit_cost_labor)
profit = revenue - expenses
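A minimal sketch of evaluating such coupled equations in order, in Python with numpy; the input distributions and values below are assumptions chosen for illustration, not part of the product:

import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Simulated inputs (illustrative distributions)
price = rng.triangular(18.0, 20.0, 24.0, size=n)
volume = rng.poisson(lam=500, size=n).astype(float)
unit_cost_materials = rng.normal(6.0, 0.5, size=n)
unit_cost_labor = rng.normal(4.0, 0.3, size=n)
fixed = 2_000.0                               # fixed input

# Evaluated in the order specified: revenue and expenses precede profit
revenue = price * volume
expenses = fixed + volume * (unit_cost_materials + unit_cost_labor)
profit = revenue - expenses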
Create simulated data without a model. Select this option to simulate data without a predictive model.
Specify the fields that are to be simulated by selecting fields from the active dataset or by clicking New to
define new fields.
Equation Editor
The Equation Editor allows you to create or modify a custom equation for your predictive model.
• The expression for the equation can contain fields from the active dataset or new input fields that you
define in the Equation Editor.
• You can specify properties of the target such as its measurement level, value labels and whether output
is generated for the target.
• You can use targets from previously defined equations as inputs to the current equation, allowing you to
create coupled equations.
• You can attach a descriptive comment to the equation. Comments are displayed along with the
equation on the Model tab.
1. Enter the name of the target. Optionally, click Edit under the Target text box to open the Defined
Inputs dialog, allowing you to change the default properties of the target.
2. To build an expression, either paste components into the Numeric Expression field or type directly in
the Numeric Expression field.
• You can build your expression using fields from the active dataset or you can define new inputs by
clicking the New button. This opens the Defined Inputs dialog.
• You can paste functions by selecting a group from the Function group list and double-clicking the
function in the Functions list (or select the function and click the arrow adjacent to the Function group
list). Enter any parameters indicated by question marks. The function group labeled All provides a
listing of all available functions. A brief description of the currently selected function is displayed in a
reserved area in the dialog box.
• String constants must be enclosed in quotation marks.
• If values contain decimals, a period (.) must be used as the decimal indicator.
Note: Simulation does not support custom equations with string targets.
Defined Inputs
The Defined Inputs dialog allows you to define new inputs and set properties for targets.
• If an input to be used in an equation does not exist in the active dataset, you must define it before it can
be used in the equation.
• If you are simulating data without a predictive model, you must define all simulated inputs that do not
exist in the active dataset.
Name. Specify the name for a target or input.
Target. You can specify the measurement level of a target. The default measurement level is continuous.
You can also specify whether output will be created for this target. For example, for a set of coupled
equations you may only be interested in output from the target for the final equation, so you would
suppress output from the other targets.
Input to be simulated. This specifies that the values of the input will be simulated according to a
specified probability distribution (the probability distribution is specified on the Simulation tab). The
measurement level determines the default set of distributions that are considered when finding the
distribution that most closely fits the data for the input (by clicking Fit or Fit All on the Simulation tab).
For example, if the measurement level is continuous, then the normal distribution (appropriate for
continuous data) would be considered but the binomial distribution would not.
Note: Select a measurement level of String for string inputs. String inputs that are to be simulated are
restricted to the Categorical distribution.
Fixed value input. This specifies that the value of the input is known and will be fixed at the known value.
Fixed inputs can be numeric or string. Specify a value for the fixed input. String values should not be
enclosed in quotation marks.
Value labels. You can specify value labels for targets, simulated inputs and fixed inputs. Value labels are
used in output charts and tables.
Simulation tab
The Simulation tab specifies all properties of the simulation other than the predictive model. You can
perform the following general tasks on the Simulation tab:
• Specify probability distributions for simulated inputs and values for fixed inputs.
• Specify correlations between simulated inputs. For categorical inputs, you can specify that associations
that exist between those inputs in the active dataset are used when data are generated for those inputs.
• Specify advanced options such as tail sampling and criteria for fitting distributions to historical data.
• Customize output.
• Specify where to save the simulation plan and optionally save the simulated data.
Simulated Fields
To run a simulation, each input field must be specified as fixed or simulated. Simulated inputs are those
whose values are uncertain and will be generated by drawing from a specified probability distribution.
When historical data are available for the inputs to be simulated, the distributions that most closely fit the
data can be automatically determined, along with any correlations between those inputs. You can also
manually specify distributions or correlations if historical data are not available or you require specific
distributions or correlations.
Fixed inputs are those whose values are known and remain constant for each case generated in the
simulation. For example, you have a linear regression model for sales as a function of a number of inputs
including price, and you want to hold the price fixed at the current market price. You would then specify
price as a fixed input.
For simulations based on a predictive model, each predictor in the model is an input field for the
simulation. For simulations that do not include a predictive model, the fields that are specified on the
Model tab are the inputs for the simulation.
Automatically fitting distributions and calculating correlations for simulated inputs. If the active
dataset contains historical data for the inputs that you want to simulate, then you can automatically find
the distributions that most closely fit the data for those inputs as well as determine any correlations
between them. The steps are as follows:
1. Verify that each of the inputs that you want to simulate is matched up with the correct field in the
active dataset. Inputs are listed in the Input column and the Fit to column displays the matched field
in the active dataset. You can match an input to a different field in the active dataset by selecting a
different item from the Fit to dropdown list.
A value of -None- in the Fit to column indicates that the input could not be automatically matched to a
field in the active dataset. By default, inputs are matched to dataset fields on name, measurement
level and type (numeric or string). If the active dataset does not contain historical data for the input,
then manually specify the distribution for the input or specify the input as a fixed input, as described
below.
2. Click Fit All.
The closest fitting distribution and its associated parameters are displayed in the Distribution column
along with a plot of the distribution superimposed on a histogram (or bar chart) of the historical data.
Correlations between simulated inputs are displayed on the Correlations settings. You can examine the fit
results and customize automatic distribution fitting for a particular input by selecting the row for the input
and clicking Fit Details. See the topic “Fit Details” on page 178 for more information.
You can run automatic distribution fitting for a particular input by selecting the row for the input and
clicking Fit. Correlations for all simulated inputs that match fields in the active dataset are also
automatically calculated.
Note:
• Cases with missing values for any simulated input are excluded from distribution fitting, computation of
correlations, and computation of the optional contingency table (for inputs with a Categorical
distribution). You can optionally specify whether user-missing values of inputs with a Categorical
distribution are treated as valid. By default, they are treated as missing. For more information, see the
topic “Advanced Options” on page 179.
• For continuous and ordinal inputs, if an acceptable fit cannot be found for any of the tested
distributions, then the Empirical distribution is suggested as the closest fit. For continuous inputs, the
Empirical distribution is the cumulative distribution function of the historical data. For ordinal inputs,
the Empirical distribution is the categorical distribution of the historical data.
Manually specifying distributions. You can manually specify the probability distribution for any
simulated input by selecting the distribution from the Type dropdown list and entering the distribution
parameters in the Parameters grid. Once you have entered the parameters for a distribution, a sample
plot of the distribution, based on the specified parameters, will be displayed adjacent to the Parameters
grid. Following are some notes on particular distributions:
• Categorical. The categorical distribution describes an input field that has a fixed number of values,
referred to as categories. Each category has an associated probability such that the sum of the
probabilities over all categories equals one. To enter a category, click the left-hand column in the
Parameters grid and specify the category value. Enter the probability associated with the category in the
right-hand column.
Note: Categorical inputs from a PMML model have categories that are determined from the model and
cannot be modified.
• Negative Binomial – Failures. Describes the distribution of the number of failures in a sequence of
trials before a specified number of successes is observed. The parameter thresh is the specified
number of successes and the parameter prob is the probability of success in any given trial.
• Negative Binomial – Trials. Describes the distribution of the number of trials required before a
specified number of successes is observed. The parameter thresh is the specified number of
successes and the parameter prob is the probability of success in any given trial.
• Range. This distribution consists of a set of intervals with a probability assigned to each interval such
that the sum of the probabilities over all intervals equals 1. Values within a given interval are drawn
from a uniform distribution defined on that interval. Intervals are specified by entering a minimum
value, a maximum value and an associated probability.
For example, you believe that the cost of a raw material has a 40% chance of falling in the range of $10
– $15 per unit and a 60% chance of falling in the range of $15 – $20 per unit. You would model the cost
with a Range distribution consisting of the two intervals [10 – 15] and [15 – 20], setting the probability
associated with the first interval to 0.4 and the probability associated with the second interval to 0.6.
The intervals do not have to be contiguous, and they can even overlap. For example, you could
specify the intervals $10 – $15 and $20 – $25, or $10 – $15 and $13 – $16. (A sketch of drawing from a
Range distribution follows this list.)
• Weibull. The parameter c is an optional location parameter, which specifies where the origin of the
distribution is located.
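As a concrete illustration of the Range distribution, the following Python sketch picks an interval according to its probability and then draws uniformly within it, using the raw-material example above. The helper function is hypothetical, not an SPSS API:

import numpy as np

def sample_range(intervals, probs, size, rng):
    # intervals: list of (minimum, maximum) pairs; probs must sum to 1
    idx = rng.choice(len(intervals), size=size, p=probs)    # pick an interval
    lows = np.array([lo for lo, _ in intervals])[idx]
    highs = np.array([hi for _, hi in intervals])[idx]
    return rng.uniform(lows, highs)                         # uniform within it

rng = np.random.default_rng(7)
cost = sample_range([(10, 15), (15, 20)], [0.4, 0.6], size=100_000, rng=rng)
print(((cost >= 10) & (cost < 15)).mean())                  # close to 0.4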
Parameters for the following distributions have the same meaning as in the associated random variable
functions available in the Compute Variable dialog box: Bernoulli, beta, binomial, exponential, gamma,
lognormal, negative binomial (trials and failures), normal, Poisson and uniform.
Specifying fixed inputs. Specify a fixed input by selecting Fixed from the Type dropdown list in the
Distribution column and entering the fixed value. The value can be numeric or string depending on
whether the input is numeric or string. String values should not be enclosed in quotation marks.
Specifying bounds on simulated values. Most distributions support specifying upper and lower bounds
on the simulated values. You can specify a lower bound by entering a value into the Min text box and you
can specify an upper bound by entering a value into the Max text box.
Locking inputs. Locking an input, by selecting the check box in the column with the lock icon, excludes
the input from automatic distribution fitting. This is most useful when you manually specify a distribution
or a fixed value and want to ensure that it will not be affected by automatic distribution fitting. Locking is
also useful if you intend to share your simulation plan with users who will be running it in the Run
Simulation dialog and you want to prevent any changes to certain inputs. Accordingly, specifications for
locked inputs cannot be modified in the Run Simulation dialog.
Sensitivity Analysis. Sensitivity analysis allows you to investigate the effect of systematic changes in a
fixed input or in a distribution parameter for a simulated input by generating an independent set of
simulated cases—effectively, a separate simulation—for each specified value. To specify sensitivity
analysis, select a fixed or simulated input and click Sensitivity Analysis. Sensitivity analysis is limited to a
single fixed input or a single distribution parameter for a simulated input. See the topic “Sensitivity
Analysis” on page 179 for more information.
Fit status icons
Icons in the Fit to column indicate the fit status for each input field.
Table 3. Status icons (icon images not reproduced here; each entry describes one fit status)
• No distribution has been specified for the input and the input has not been specified as fixed. In order
to run the simulation, you must either specify a distribution for this input or define it to be fixed and
specify the fixed value.
• The input was previously fit to a field that does not exist in the active dataset. No action is necessary
unless you want to refit the distribution for the input to the active dataset.
• The closest fitting distribution has been replaced with an alternate distribution from the Fit Details
dialog.
• The input is set to the closest fitting distribution.
• The distribution has been manually specified or sensitivity analysis iterations have been specified for
this input.
Fit Details
The Fit Details dialog displays the results of automatic distribution fitting for a particular input.
Distributions are ordered by goodness of fit, with the closest fitting distribution listed first. You can
override the closest fitting distribution by selecting the radio button for the distribution you want in the
Use column. Selecting a radio button in the Use column also displays a plot of the distribution
superimposed on a histogram (or bar chart) of the historical data for that input.
Fit statistics. For continuous fields, the Anderson-Darling test is used by default to determine
goodness of fit. Alternatively, and for continuous fields only, you can specify the Kolmogorov-Smirnov
test by selecting that choice on the Advanced Options settings. For continuous inputs, results of both
tests are shown in the Fit Statistics column (A for Anderson-Darling and K for Kolmogorov-Smirnov),
with the chosen test used to order the distributions. For ordinal and nominal inputs, the chi-square test
is used. The p-values associated with the tests are also shown.
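As an illustration of ranking candidate distributions by goodness of fit, the following Python sketch uses scipy and the Kolmogorov-Smirnov test (the product defaults to Anderson-Darling; its internal fitting procedure is not shown here). The historical data are stand-in values:

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.gamma(shape=2.0, scale=3.0, size=500)   # stand-in historical data

results = []
for dist in (stats.norm, stats.gamma, stats.lognorm):
    params = dist.fit(data)                        # maximum-likelihood fit
    ks_stat, ks_p = stats.kstest(data, dist.name, args=params)
    results.append((dist.name, ks_stat, ks_p))

# A smaller K-S statistic indicates a closer fit; list the closest fit first
for name, stat_value, p in sorted(results, key=lambda r: r[1]):
    print(f"{name:8s} K-S = {stat_value:.4f}  p = {p:.3f}")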
Parameters. The distribution parameters associated with each fitted distribution are displayed in the
Parameters column. Parameters for the following distributions have the same meaning as in the
associated random variable functions available in the Compute Variable dialog box: Bernoulli, beta,
binomial, exponential, gamma, lognormal, negative binomial (trials and failures), normal, Poisson and
uniform. For the categorical distribution, the parameter names are the categories and the parameter
values are the associated probabilities.
Refitting with a customized distribution set. By default, the measurement level of the input is used to
determine the set of distributions considered for automatic distribution fitting. For example, continuous
distributions such as lognormal and gamma are considered when fitting a continuous input but discrete
distributions such as Poisson and binomial are not. You can choose a subset of the default distributions
by selecting the distributions in the Refit column. You can also override the default set of distributions by
selecting a different measurement level from the Treat as (Measure) dropdown list and selecting the
distributions in the Refit column. Click Run Refit to refit with the custom distribution set.
Note:
• Cases with missing values for any simulated input are excluded from distribution fitting, computation of
correlations, and computation of the optional contingency table (for inputs with a Categorical
distribution). You can optionally specify whether user-missing values of inputs with a Categorical
distribution are treated as valid. By default, they are treated as missing. For more information, see the
topic “Advanced Options” on page 179.
• For continuous and ordinal inputs, if an acceptable fit cannot be found for any of the tested
distributions, then the Empirical distribution is suggested as the closest fit. For continuous inputs, the
Empirical distribution is the cumulative distribution function of the historical data. For ordinal inputs,
the Empirical distribution is the categorical distribution of the historical data.
Sensitivity Analysis
Sensitivity analysis allows you to investigate the effect of varying a fixed input or a distribution parameter
for a simulated input over a specified set of values. An independent set of simulated cases (effectively, a
separate simulation) is generated for each specified value, allowing you to investigate the effect of
varying the input. Each set of simulated cases is referred to as an iteration. (A minimal code sketch of this
iteration scheme appears at the end of this topic.)
Iterate. This choice allows you to specify the set of values over which the input will be varied.
• If you are varying the value of a distribution parameter, then select the parameter from the drop-down
list. Enter the set of values in the Parameter value by iteration grid. Clicking Continue will add the
specified values to the Parameters grid for the associated input, with an index specifying the iteration
number of the value.
• For the Categorical and Range distributions, the probabilities of the categories or intervals respectively
can be varied but the values of the categories and the endpoints of the intervals cannot be varied. Select
a category or interval from the drop-down list and specify the set of probabilities in the Parameter value
by iteration grid. The probabilities for the other categories or intervals will be automatically adjusted
accordingly.
No iterations. Use this option to cancel iterations for an input. Clicking Continue will remove the
iterations.
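A minimal Python sketch of the iteration scheme, with an assumed model and parameter values: one independent set of simulated cases is generated for each value of the varied parameter.

import numpy as np

rng = np.random.default_rng(5)
n = 50_000
price, volume = 20.0, 1_000.0                # fixed inputs

# Vary the mean of a simulated unit-cost input across three iterations
for iteration, cost_mean in enumerate([9.0, 10.0, 11.0], start=1):
    cost = rng.normal(loc=cost_mean, scale=1.0, size=n)   # separate simulation
    profit = (price - cost) * volume
    print(f"iteration {iteration}: mean profit = {profit.mean():,.0f}")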
Correlations
Input fields to be simulated are often known to be correlated (for example, height and weight).
Correlations between inputs that will be simulated must be accounted for in order to ensure that the
simulated values preserve those correlations. (A copula-based sketch of correlated sampling appears at
the end of this topic.)
Recalculate correlations when fitting. This choice specifies that correlations between simulated inputs
are automatically calculated when fitting distributions to the active dataset through the Fit All or Fit
actions on the Simulated Fields settings.
Do not recalculate correlations when fitting. Select this option if you want to manually specify
correlations and prevent them from being overwritten when automatically fitting distributions to the
active dataset. Values that are entered in the Correlations grid must be between -1 and 1. A value of 0
specifies that there is no correlation between the associated pair of inputs.
Reset. This resets all correlations to 0.
Use fitted multiway contingency table for inputs with a categorical distribution. For inputs with a
categorical distribution, you can automatically compute a multiway contingency table from the active
dataset that describes the associations between those inputs. The contingency table is then used when
data are generated for those inputs. If you choose to save the simulation plan, the contingency table is
saved in the plan file and is used when you run the plan.
• Compute contingency table from the active dataset. If you are working with an existing simulation
plan that contains a contingency table, you can recompute the contingency table from the active
dataset. This action overrides the contingency table from the loaded plan file.
• Use contingency table from loaded simulation plan. By default, when you load a simulation plan that
contains a contingency table, the table from the plan is used. You can recompute the contingency table
from the active dataset by selecting Compute contingency table from the active dataset.
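The following Python sketch shows one standard way, a Gaussian copula, to generate two simulated inputs whose values preserve a specified correlation. It is an illustration only and not necessarily the algorithm the product uses; the marginal distributions are assumptions:

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 100_000
r = 0.7                              # desired correlation (e.g., height/weight)

# Draw correlated standard normals, map them to uniforms, then to each margin
z = rng.multivariate_normal([0.0, 0.0], [[1.0, r], [r, 1.0]], size=n)
u = stats.norm.cdf(z)
height = stats.norm.ppf(u[:, 0], loc=170, scale=10)   # illustrative marginal
weight = stats.gamma.ppf(u[:, 1], a=20, scale=4)      # illustrative marginal

print(np.corrcoef(height, weight)[0, 1])              # approximately 0.7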
Advanced Options
Maximum Number of Cases. This specifies the maximum number of cases of simulated data, and
associated target values, to generate. When sensitivity analysis is specified, this is the maximum number
of cases for each iteration.
Target for stopping criteria. If your predictive model contains more than one target, then you can select
the target to which stopping criteria are applied.
Stopping criteria. These choices specify criteria for stopping the simulation, potentially before the
maximum number of allowable cases has been generated.
• Continue until maximum is reached. This specifies that simulated cases will be generated until the
maximum number of cases is reached.
• Stop when the tails have been sampled. Use this option when you want to ensure that one of the tails
of a specified target distribution has been adequately sampled. Simulated cases will be generated until
the specified tail sampling is complete or the maximum number of cases is reached. If your predictive
model contains multiple targets, select the target to which this criterion will be applied from the
Target for stopping criteria dropdown list.
Type. You can define the boundary of the tail region by specifying a value of the target such as
10,000,000 or a percentile such as the 99th percentile. If you choose Value in the Type dropdown list,
then enter the value of the boundary in the Value text box and use the Side dropdown list to specify
whether this is the boundary of the Left tail region or the Right tail region. If you choose Percentile in the
Type dropdown list, then enter a value in the Percentile text box.
Frequency. Specify the number of values of the target that must lie in the tail region in order to ensure
that the tail has been adequately sampled. Generation of cases will stop when this number has been
reached.
• Stop when the confidence interval of the mean is within the specified threshold. Use this option
when you want to ensure that the mean of a given target is known with a specified degree of accuracy.
Simulated cases will be generated until the specified degree of accuracy has been achieved or the
maximum number of cases is reached. To use this option, you specify a confidence level and a
threshold. Simulated cases will be generated until the confidence interval associated with the specified
level is within the threshold. For example, you can use this option to specify that cases are generated
until the confidence interval of the mean at the 95% confidence level is within 5% of the mean value. If
your predictive model contains multiple targets, select the target to which this criterion will be applied
from the Target for stopping criteria dropdown list. (Both stopping rules are sketched in code after
these options.)
Threshold Type. You can specify the threshold as a numeric value or as a percent of the mean. If you
choose Value in the Threshold Type dropdown list, then enter the threshold in the Threshold as Value
text box. If you choose Percent in the Threshold Type dropdown list, then enter a value in the
Threshold as Percent text box.
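The following Python sketch checks both stopping rules after each batch of generated cases. The batch size, thresholds, and the target's distribution are assumptions for illustration; the product's actual implementation is not documented here:

import numpy as np

rng = np.random.default_rng(0)
max_cases, batch = 1_000_000, 10_000
boundary, tail_freq = 80.0, 100      # right-tail boundary and required count
z95, threshold_pct = 1.96, 1.0       # 95% CI within 1% of the mean

values = np.empty(0)
while values.size < max_cases:
    values = np.concatenate([values, rng.normal(50.0, 10.0, batch)])
    # Rule 1: stop when enough cases lie in the tail region
    tail_done = np.sum(values > boundary) >= tail_freq
    # Rule 2: stop when the CI half-width is within the threshold % of the mean
    m = values.mean()
    half_width = z95 * values.std(ddof=1) / np.sqrt(values.size)
    ci_done = half_width <= abs(m) * threshold_pct / 100.0
    if tail_done or ci_done:         # whichever criterion is met first
        break

print(values.size, "cases generated")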
Number of cases to sample. This specifies the number of cases to use when automatically fitting
distributions for simulated inputs to the active dataset. If your dataset is very large you might want to
consider limiting the number of cases used for distribution fitting. If you select Limit to N cases, the first
N cases will be used.
Goodness of fit criteria (Continuous). For continuous inputs, you can use the Anderson-Darling test or
the Kolmogorov-Smirnov test of goodness of fit to rank distributions when fitting distributions for
simulated inputs to the active dataset. The Anderson-Darling test is selected by default and is especially
recommended when you want to ensure the best possible fit in the tail regions.
Empirical Distribution. For continuous inputs, the Empirical distribution is the cumulative distribution
function of the historical data. You can specify the number of bins used for calculating the Empirical
distribution for continuous inputs. The default is 100 and the maximum is 1000.
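A minimal sketch of such an Empirical distribution in Python: bin the historical data (100 bins, matching the default), then sample by choosing a bin according to its cumulative probability and drawing uniformly within it. This is an illustration, not the product's implementation:

import numpy as np

rng = np.random.default_rng(11)
historical = rng.lognormal(mean=2.0, sigma=0.5, size=5_000)  # stand-in data

counts, edges = np.histogram(historical, bins=100)
cdf = np.cumsum(counts) / counts.sum()       # binned cumulative distribution

def sample_empirical(size):
    u = rng.uniform(size=size)
    idx = np.searchsorted(cdf, u)            # choose a bin by cumulative prob.
    return rng.uniform(edges[idx], edges[idx + 1])   # uniform within the bin

draws = sample_empirical(10_000)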
Replicate results. Setting a random seed allows you to replicate your simulation. Specify an integer or
click Generate, which will create a pseudo-random integer between 1 and 2147483647, inclusive. The
default is 629111597.
Note: For a particular random seed, results are replicated unless the number of threads is changed. On a
particular computer, the number of threads is constant unless you change it by running SET THREADS
command syntax. The number of threads might change if you run the simulation on a different computer
because an internal algorithm is used to determine the number of threads for each computer.
User-missing values for inputs with a Categorical distribution. These controls specify whether user-
missing values of inputs with a Categorical distribution are treated as valid. System-missing values and
user-missing values for all other types of inputs are always treated as invalid. All inputs must have valid
values for a case to be included in distribution fitting, computation of correlations, and computation of the
optional contingency table.
Density Functions
These settings allow you to customize output for probability density functions and cumulative distribution
functions for continuous targets, as well as bar charts of predicted values for categorical targets.
Probability Density Function (PDF). The probability density function displays the distribution of target
values. For continuous targets, it allows you to determine the probability that the target is within a given
region. For categorical targets (targets with a measurement level of nominal or ordinal), a bar chart is
generated that displays the percentage of cases that fall in each category of the target. Additional options
for categorical targets of PMML models are available with the Category values to report setting described
below.
For Two-Step cluster models and K-Means cluster models, a bar chart of cluster membership is
produced.
Cumulative Distribution Function (CDF). The cumulative distribution function displays the probability
that the value of the target is less than or equal to a specified value. It is only available for continuous
targets.
Slider positions. You can specify the initial positions of the moveable reference lines on PDF and CDF
charts. Values that are specified for the lower and upper lines refer to positions along the horizontal axis,
not percentiles. You can remove the lower line by selecting -Infinity or you can remove the upper line by
selecting Infinity. By default, the lines are positioned at the 5th and 95th percentiles. When multiple
distribution functions are displayed on a single chart (because of multiple targets or results from
sensitivity analysis iterations), the default refers to the distribution for the first iteration or first target.
Reference Lines (Continuous). You can request various vertical reference lines to be added to probability
density functions and cumulative distribution functions for continuous targets.
• Sigmas. You can add reference lines at plus and minus a specified number of standard deviations from
the mean of a target.
• Percentiles. You can add reference lines at one or two percentile values of the distribution of a target
by entering values into the Bottom and Top text boxes. For example, a value of 95 in the Top text box
represents the 95th percentile, which is the value below which 95% of the observations fall. Likewise, a
value of 5 in the Bottom text box represents the 5th percentile, which is the value below which 5% of
the observations fall.
• Custom reference lines. You can add reference lines at specified values of the target.
Note: When multiple distribution functions are displayed on a single chart (because of multiple targets or
results from sensitivity analysis iterations), reference lines are only applied to the distribution for the first
iteration or first target. You can add reference lines to the other distributions from the Chart Options
dialog, which is accessed from the PDF or CDF chart.
Overlay results from separate continuous targets. In the case of multiple continuous targets, this
specifies whether distribution functions for all such targets are displayed on a single chart, with one chart
for probability density functions and another for cumulative distribution functions. When this option is not
selected, results for each target will be displayed on a separate chart.
Category values to report. For PMML models with categorical targets, the result of the model is a set of
predicted probabilities, one for each category of the target. The category with the highest probability is
taken to be the predicted category and is used in generating the bar chart described for the Probability
Density Function setting above. Selecting Predicted category will generate
the bar chart. Selecting Predicted probabilities will generate histograms of the distribution of predicted
probabilities for each of the categories of the target.
Grouping for sensitivity analysis. Simulations that include sensitivity analysis generate an independent
set of predicted target values for each iteration defined by the analysis (one iteration for each value of the
input that is being varied). When iterations are present, the bar chart of the predicted category for a
categorical target is displayed as a clustered bar chart that includes the results for all iterations. You can
choose to group categories together or you can group iterations together.
Output
Tornado charts. Tornado charts are bar charts that display relationships between targets and simulated
inputs using a variety of metrics.
• Correlation of target with input. This option creates a tornado chart of the correlation coefficients
between a given target and each of its simulated inputs. This type of tornado chart does not support
targets with a nominal or ordinal measurement level or simulated inputs with a categorical distribution.
(A sketch of this metric follows these chart options.)
• Contribution to variance. This option creates a tornado chart that displays the contribution to the
variance of a target from each of its simulated inputs, allowing you to assess the degree to which each
input contributes to the overall uncertainty in the target. This type of tornado chart does not support
targets with ordinal or nominal measurement levels, or simulated inputs with any of the following
distributions: categorical, Bernoulli, binomial, Poisson, or negative binomial.
• Sensitivity of target to change. This option creates a tornado chart that displays the effect on the
target of modulating each simulated input by plus or minus a specified number of standard deviations of
the distribution associated with the input. This type of tornado chart does not support targets with
ordinal or nominal measurement levels, or simulated inputs with any of the following distributions:
categorical, Bernoulli, binomial, Poisson, or negative binomial.
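The following Python sketch illustrates the first of these metrics, the correlation of a target with each simulated input. The model and inputs are assumptions; the sorted coefficients correspond to the bars of the tornado chart, longest first:

import numpy as np

rng = np.random.default_rng(9)
n = 50_000
inputs = {
    "price": rng.triangular(18.0, 20.0, 24.0, size=n),
    "volume": rng.normal(500.0, 50.0, size=n),
    "unit_cost": rng.normal(6.0, 0.5, size=n),
}
target = (inputs["price"] - inputs["unit_cost"]) * inputs["volume"]

corrs = {name: np.corrcoef(x, target)[0, 1] for name, x in inputs.items()}
for name, r in sorted(corrs.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:10s} r = {r:+.3f}")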
Box plots of target distributions. Box plots are available for continuous targets. Select Overlay results
from separate targets if your predictive model has multiple continuous targets and you want to display
the box plots for all targets on a single chart.
Scatterplots of targets versus inputs. Scatterplots of targets versus simulated inputs are available for
both continuous and categorical targets and include scatters of the target with both continuous and
categorical inputs. Scatters involving a categorical target or a categorical input are displayed as a heat
map.
Create a table of percentile values. For continuous targets, you can obtain a table of specified
percentiles of the target distributions. Quartiles (the 25th, 50th, and 75th percentiles) divide the
observations into four groups of equal size. If you want a different number of equal-sized groups, select
Intervals and specify the number. Select Custom percentiles to specify individual percentiles (for
example, the 99th percentile).
Descriptive statistics of target distributions. This option creates tables of descriptive statistics for
continuous and categorical targets as well as for continuous inputs. For continuous targets the table
includes the mean, standard deviation, median, minimum and maximum, confidence interval of the mean
at the specified level, and the 5th and 95th percentiles of the target distribution. For categorical targets
the table includes the percentage of cases that fall in each category of the target. For categorical targets
of PMML models, the table also includes the mean probability of each category of the target. For
continuous inputs, the table includes the mean, standard deviation, minimum and maximum.
Correlations and contingency table for inputs. This option displays a table of correlation coefficients
between simulated inputs. When inputs with categorical distributions are generated from a contingency
table, the contingency table of the data that are generated for those inputs is also displayed.
Simulated inputs to include in the output. By default, all simulated inputs are included in the output.
You can exclude selected simulated inputs from output. This will exclude them from tornado charts,
scatterplots and tabular output.
Limit ranges for continuous targets. You can specify the range of valid values for one or more continuous
targets. Values outside of the specified range are excluded from all output and analyses associated with
the targets. To set a lower limit, select Lower in the Limit column and enter a value in the Minimum
column. To set an upper limit, select Upper in the Limit column and enter a value in the Maximum column.
To set both a lower and an upper limit, select Both in the Limit column and enter values in the Minimum
and Maximum columns.
Display Formats. You can set the format used when displaying values of targets and inputs (both fixed
inputs and simulated inputs).
Save
Save the plan for this simulation. You can save the current specifications for your simulation to a
simulation plan file. Simulation plan files have the extension .splan. You can re-open the plan in the
Simulation Builder, optionally make modifications and run the simulation. You can share the simulation
plan with other users, who can then run it in the Run Simulation dialog. Simulation plans include all
specifications except the following: settings for Density Functions; Output settings for charts and tables;
Advanced Options settings for Fitting, Empirical Distribution and Random Seed.
Save the simulated data as a new data file. You can save simulated inputs, fixed inputs and predicted
target values to an SPSS Statistics data file, a new dataset in the current session, or an Excel file. Each
case (or row) of the data file consists of the predicted values of the targets along with the simulated
inputs and fixed inputs that generate the target values. When sensitivity analysis is specified, each
iteration gives rise to a contiguous set of cases that are labeled with the iteration number.
Run Simulation dialog
The Run Simulation dialog is designed for users who have a simulation plan and primarily want to run the
simulation. It also provides the features you need to run the simulation under different conditions. It
allows you to perform the following general tasks:
• Set up or modify sensitivity analysis to investigate the effect of varying the value of a fixed input or
varying a distribution parameter for a simulated input.
• Refit probability distributions for uncertain inputs (and correlations between those inputs) to new data.
• Modify the distribution for a simulated input.
• Customize output.
• Run the simulation.
Simulation tab
The Simulation tab allows you to specify sensitivity analysis, refit probability distributions for simulated
inputs and correlations between simulated inputs to new data, and modify the probability distribution
associated with a simulated input.
The Simulated inputs grid contains an entry for each input field that is defined in the simulation plan. Each
entry displays the name of the input and the probability distribution type associated with the input, along
with a sample plot of the associated distribution curve. Each input also has an associated status icon (a
colored circle with a check mark) that is useful when you are refitting distributions to new data. In
addition, inputs may include a lock icon which indicates that the input is locked and cannot be modified or
refit to new data in the Run Simulation dialog. To modify a locked input you will need to open the
simulation plan in the Simulation Builder.
Each input is either simulated or fixed. Simulated inputs are those whose values are uncertain and will be
generated by drawing from a specified probability distribution. Fixed inputs are those whose values are
known and remain constant for each case generated in the simulation. To work with a particular input,
select the entry for the input in the Simulated inputs grid.
Specifying sensitivity analysis
Sensitivity analysis allows you to investigate the effect of systematic changes in a fixed input or in a
distribution parameter for a simulated input by generating an independent set of simulated cases—
effectively, a separate simulation—for each specified value. To specify sensitivity analysis, select a fixed
or simulated input and click Sensitivity Analysis. Sensitivity analysis is limited to a single fixed input or a
single distribution parameter for a simulated input. See the topic “Sensitivity Analysis” on page 179 for
more information.
Refitting distributions to new data
To automatically refit probability distributions for simulated inputs (and correlations between simulated
inputs) to data in the active dataset:
1. Verify that each of the model inputs is matched up with the correct field in the active dataset. Each
simulated input is fit to the field in the active dataset specified in the Field dropdown list associated
with that input. You can easily identify inputs that are unmatched by looking for inputs with a status
icon that includes a check mark with a question mark.
2. Modify any necessary field matching by selecting Fit to a field in the dataset and selecting the field
from the list.
3. Click Fit All.
For each input that was fit, the distribution that most closely fits the data is displayed along with a plot of
the distribution superimposed on a histogram (or bar chart) of the historical data. If an acceptable fit
cannot be found then the Empirical distribution is used. For inputs that are fit to the Empirical distribution,
you will only see a histogram of the historical data because the Empirical distribution is in fact
represented by that histogram.
Note: For a complete list of status icons, see the topic “Simulated Fields” on page 175.
Modifying probability distributions
You can modify the probability distribution for a simulated input and optionally change a simulated input
to a fixed input or vice versa.
1. Select the input and select Manually set the distribution.
2. Select the distribution type and specify the distribution parameters. To change a simulated input to a
fixed input, select Fixed in the Type dropdown list.
Once you have entered the parameters for a distribution, the sample plot of the distribution (displayed in
the entry for the input) will be updated to reflect your changes. For more information on manually
specifying probability distributions, see the topic “Simulated Fields” on page 175.
Include user-missing values of categorical inputs when fitting. This specifies whether user-missing
values of inputs with a Categorical distribution are treated as valid when you are refitting to data in the
active dataset. System-missing values and user-missing values for all other types of inputs are always
treated as invalid. All inputs must have valid values for a case to be included in distribution fitting and
computation of correlations.
Output tab
The Output tab allows you to customize the output generated by the simulation.
Density Functions. Density functions are the primary means of probing the set of outcomes from your
simulation.
• Probability Density Function. The probability density function displays the distribution of target values,
allowing you to determine the probability that the target is within a given region. For targets with a fixed
set of outcomes (such as “poor service”, “fair service”, “good service” and “excellent service”), a bar
chart is generated that displays the percentage of cases that fall in each category of the target.
• Cumulative Distribution Function. The cumulative distribution function displays the probability that
the value of the target is less than or equal to a specified value.
Tornado Charts. Tornado charts are bar charts that display relationships between targets and simulated
inputs using a variety of metrics.
• Correlation of target with input. This option creates a tornado chart of the correlation coefficients
between a given target and each of its simulated inputs.
• Contribution to variance. This option creates a tornado chart that displays the contribution to the
variance of a target from each of its simulated inputs, allowing you to assess the degree to which each
input contributes to the overall uncertainty in the target.
• Sensitivity of target to change. This option creates a tornado chart that displays the effect on the
target of modulating each simulated input by plus or minus one standard deviation of the distribution
associated with the input.
Scatterplots of targets versus inputs. This option generates scatterplots of targets versus simulated
inputs.
Box plots of target distributions. This option generates box plots of the target distributions.
Quartiles table. This option generates a table of the quartiles of the target distributions. The quartiles of
a distribution are the 25th, 50th, and 75th percentiles of the distribution, and divide the observations into
four groups of equal size.
Correlations and contingency table for inputs. This option displays a table of correlation coefficients
between simulated inputs. A contingency table of associations between inputs with a categorical
distribution is displayed when the simulation plan specifies generating categorical data from a
contingency table.
Overlay results from separate targets. If the predictive model you are simulating contains multiple
targets, you can specify whether results from separate targets are displayed on a single chart. This setting
applies to charts for probability density functions, cumulative distribution functions and box plots. For
example, if you select this option then the probability density functions for all targets will be displayed on
a single chart.
Save the plan for this simulation. You can save any modifications to your simulation to a simulation plan
file. Simulation plan files have the extension .splan. You can re-open the plan in the Run Simulation dialog
or in the Simulation Builder. Simulation plans include all specifications except output settings.
Save the simulated data as a new data file. You can save simulated inputs, fixed inputs and predicted
target values to an SPSS Statistics data file, a new dataset in the current session, or an Excel file. Each
case (or row) of the data file consists of the predicted values of the targets along with the simulated
inputs and fixed inputs that generate the target values. When sensitivity analysis is specified, each
iteration gives rise to a contiguous set of cases that are labeled with the iteration number.
If you require more customization of output than is available here, then consider running your simulation
from the Simulation Builder. See the topic “To run a simulation from a simulation plan” on page 172 for
more information.
Working with chart output from Simulation
A number of the charts generated from a simulation have interactive features that allow you to customize
the display. Interactive features are available by activating (double-clicking) the chart object in the Output
Viewer. All simulation charts are graphboard visualizations.
Probability density function charts for continuous targets. This chart has two sliding vertical reference
lines that divide the chart into separate regions. The table below the chart displays the probability that
the target is in each of the regions. If multiple density functions are displayed on the same chart, the table
has a separate row for the probabilities associated with each density function. Each of the reference lines
has a slider (inverted triangle) that allows you to easily move the line. A number of additional features are
available by clicking the Chart Options button on the chart. In particular, you can explicitly set the
positions of the sliders, add fixed reference lines and change the chart view from a continuous curve to a
histogram or vice versa. See the topic “Chart Options” on page 186 for more information.
Cumulative distribution function charts for continuous targets. This chart has the same two moveable
vertical reference lines and associated table described for the probability density function chart above. It
also provides access to the Chart Options dialog, which allows you to explicitly set the positions of the
sliders, add fixed reference lines and specify whether the cumulative distribution function is displayed as
an increasing function (the default) or a decreasing function. See the topic “Chart Options” on page 186
for more information.
Bar charts for categorical targets with sensitivity analysis iterations. For categorical targets with
sensitivity analysis iterations, results for the predicted target category are displayed as a clustered bar
chart that includes the results for all iterations. The chart includes a dropdown list that allows you to
cluster on category or on iteration. For Two-Step cluster models and K-Means cluster models, you can
choose to cluster on cluster number or iteration.
Box plots for multiple targets with sensitivity analysis iterations. For predictive models with multiple
continuous targets and sensitivity analysis iterations, choosing to display box plots for all targets on a
single chart produces a clustered box plot. The chart includes a dropdown list that allows you to cluster
on target or on iteration.
Chart Options
The Chart Options dialog allows you to customize the display of activated charts of probability density
functions and cumulative distribution functions generated from a simulation.
View. The View dropdown list only applies to the probability density function chart. It allows you to
toggle the chart view from a continuous curve to a histogram. This feature is not available when multiple
density functions are displayed on the same chart. In that case, the density functions can only be viewed
as continuous curves.
Order. The Order dropdown list only applies to the cumulative distribution function chart. It specifies
whether the cumulative distribution function is displayed as an ascending function (the default) or a
descending function. When displayed as a descending function, the value of the function at a given point
on the horizontal axis is the probability that the target lies to the right of that point.
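The two orders are complements: at any point x, the descending value equals 1 minus the ascending
value. A minimal Python sketch of that relationship, again assuming simulated target values in an array:

import numpy as np

def cdf_at(values, x, order="ascending"):
    """Empirical cumulative distribution function at point x.
    Ascending: P(target <= x). Descending: P(target > x) = 1 - P(target <= x)."""
    ascending = np.mean(np.asarray(values) <= x)
    return ascending if order == "ascending" else 1.0 - ascending

rng = np.random.default_rng(0)
target = rng.normal(100, 15, size=10_000)
print(cdf_at(target, 110))                       # probability the target is at or below 110
print(cdf_at(target, 110, order="descending"))   # probability the target lies to the right of 110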
Slider positions. You can explicitly set the positions of the sliding reference lines by entering values in
the Upper and Lower text boxes. You can remove the left-hand line by selecting -Infinity, effectively
setting the position to negative infinity, and you can remove the right-hand line by selecting Infinity,
effectively setting its position to infinity.
Reference lines. You can add various fixed vertical reference lines to probability density functions and
cumulative distribution functions. When multiple functions are displayed on a single chart (because of
multiple targets or results from sensitivity analysis iterations), you can specify the particular functions to
which the lines are applied.
• Sigmas. You can add reference lines at plus and minus a specified number of standard deviations from
the mean of a target.
• Percentiles. You can add reference lines at one or two percentile values of the distribution of a target
by entering values into the Bottom and Top text boxes. For example, a value of 95 in the Top text box
represents the 95th percentile, which is the value below which 95% of the observations fall. Likewise, a
value of 5 in the Bottom text box represents the 5th percentile, which is the value below which 5% of
the observations fall.
• Custom positions. You can add reference lines at specified values along the horizontal axis.
Label reference lines. This option controls whether labels are applied to the selected reference lines.
Reference lines are removed by clearing the associated choice in the Chart Options dialog and clicking
Continue.
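The sigma and percentile reference lines described above reduce to simple summary statistics of the
target's distribution. A hedged sketch of where such lines would fall, assuming the simulated values are
available in a NumPy array:

import numpy as np

def sigma_lines(values, k):
    """Reference lines at plus and minus k standard deviations from the mean."""
    values = np.asarray(values)
    m, s = values.mean(), values.std()
    return m - k * s, m + k * s

def percentile_lines(values, bottom, top):
    """Reference lines at the bottom and top percentile values, e.g. 5 and 95."""
    return np.percentile(values, [bottom, top])

rng = np.random.default_rng(0)
target = rng.normal(100, 15, size=10_000)
print(sigma_lines(target, k=2))            # roughly 70 and 130 for this distribution
print(percentile_lines(target, 5, 95))     # values below which 5% and 95% of draws fall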
Geospatial Modeling
Geospatial modeling techniques are designed to discover patterns in data that include a geospatial (map)
component. The Geospatial Modeling Wizard provides methods for analyzing geospatial data with and
without a time component.
Find associations based on event and geospatial data (Geospatial Association Rules)
Using geospatial association rules, you can find patterns in data based on both spatial and
non-spatial properties. For example, you might identify patterns in crime data by location and
demographic attributes. From these patterns, you can build rules that predict where certain types of
crimes are likely to occur.
Make predictions using time series and geospatial data (Spatio-Temporal Prediction)
Spatial temporal prediction uses data that contains locations, input fields for prediction
(predictors), one or more time fields, and a target field. Each location has numerous rows in the data
that represent the values of each predictor and the target at each time interval.
Using the Geospatial Modeling Wizard
1. From the menus choose:
Analyze > Spatial and Temporal Modeling > Spatial Modeling
2. Follow the steps in the wizard.
Examples
Detailed examples are available in the help system.
• Geospatial association rules: Help > Topics > Case Studies > Statistics Base > Spatial association
rules
• Spatial temporal prediction: Help > Topics > Case Studies > Statistics Base > Spatial temporal
prediction
Selecting Maps
Geospatial modeling can use one or more map data sources. Map data sources contain information that
defines geographic areas and other geographic features, such as roads or rivers. Many map sources also
contain demographic or other descriptive data and event data such as crime reports or unemployment
rates. You can use a previously defined map specification file or define map specifications here and save
those specifications for subsequent use.
Load a Map Specification
Loads a previously defined map specification (.mplan) file. Map data sources that you define here can
be saved in a map specification file. For spatial temporal prediction, if you select a map specification
file that identifies more than one map, you are prompted to select one map from the file.
Add Map File
Add an ESRI shape (.shp) file or .zip archive that contains an ESRI shape file.
• There must be a corresponding .dbf file in the same location as the .shp file, and that file must have
the same root name as the .shp file.
• If the file is a .zip archive, the .shp and .dbf files must have the same root name as the .zip archive.
• If there is no corresponding projection (.prj) file, you are prompted to select a projection system.
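These root-name rules are straightforward to verify outside the wizard. A minimal Python sketch, with a
hypothetical file name (the wizard performs its own validation; this only illustrates the rules above):

import os

def check_shape_file(shp_path):
    """Verify the companion-file rules described above for an ESRI .shp file."""
    root, ext = os.path.splitext(shp_path)
    if ext.lower() != ".shp":
        return "not a .shp file"
    if not os.path.exists(root + ".dbf"):
        return "missing companion .dbf with the same root name"
    if not os.path.exists(root + ".prj"):
        return "no .prj file: a projection system must be selected"
    return "ok"

print(check_shape_file("geodata.shp"))  # hypothetical file name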
Relationship
For geospatial association rules, this column defines how events relate to the features in the map.
This setting is not available for spatial temporal prediction.
Move up, Move down
The layer order of the map elements is determined by the order in which they appear in the list. The
first map in the list is the bottom layer.
Selecting a Map
For spatial temporal prediction, if you select a map specification file that identifies more than one map,
you are prompted to select one map from the file. Spatial temporal prediction does not support multiple
maps.
Geospatial Relationship
For geospatial association rules, the Geospatial Relationship dialog defines how events relate to the
features in the map.
• This setting applies only to geospatial association rules.
• This setting only affects data sources associated with maps that are specified as context data on the
step for selecting data sources.
Relationship
Close
The event occurs close to a specified point or area on the map.
Within
The event occurs within a specified area on the map.
Contains
The event area contains a map context object.
Intersects
Locations where lines or regions from different maps intersect each other.
Cross
For multiple maps, locations where lines (such as roads, rivers, or railroads) from different maps
cross each other.
North of, South of, East of, West of
The event occurs within an area north, south, east, or west of a specified point on the map.
Set Coordinate System
If there is no projection (.prj) file with the map or you define two fields from a data source as a set of
coordinates, you must set the coordinate system.
Default geographic (longitude and latitude)
The coordinate system is longitude and latitude.
Simple Cartesian (X and Y)
The coordinate system is simple X and Y coordinates.
Use a Well Known ID (WKID)
The coordinate system is identified by a well-known ID (WKID) for common projections.
Use a Coordinate System Name
The coordinate system is based on the named projection. The name is enclosed in parentheses.
Setting the Projection
If the projection system cannot be determined from the information provided with the map, you need to
specify the projection system. The most common cause of this condition is the absence of a projection
(.prj) file associated with the map or a projection file that cannot be used.
• A city, region or country (Mercator)
• A large country, several countries, or continents (Winkel Tripel)
• An area very close to the equator (Mercator)
• An area close to one of the poles (Stereographic)
The Mercator projection is a common projection used in many maps. This projection treats the globe as
a cylinder that is rolled out onto a flat surface. The Mercator projection distorts the size and shape of large
objects. This distortion increases as you move farther from the equator and closer to the poles. The
Winkel Tripel and Stereographic projections make adjustments for the fact that a map represents a
portion of a three-dimensional sphere displayed in two dimensions.
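The distortion can be quantified: on a spherical Mercator map, lengths are stretched by a factor of
1/cos(latitude), which is 1 at the equator and grows without bound toward the poles. A short Python
illustration of this standard projection math (not a feature the wizard exposes):

import math

def mercator_scale_factor(latitude_degrees):
    """Linear exaggeration of the spherical Mercator projection at a latitude."""
    return 1.0 / math.cos(math.radians(latitude_degrees))

for lat in (0, 30, 60, 80):
    print(f"latitude {lat:2d}: linear scale factor {mercator_scale_factor(lat):.2f}")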
Projection and Coordinate System
If you select more than one map and the maps have different projection and coordinate systems, you
must select the map with the projection system that you want to use. That projection system will be used
for all maps when they are combined together in the output.
Data Sources
A data source can be a dBase file that is provided with the shape file, an IBM SPSS Statistics data file, or
an open dataset in the current session.
Context Data. Context data identifies features on the map and can also contain fields that can be used
as inputs for the model. To use a context dBase (.dbf) file that is associated with a map shape (.shp) file,
the context dBase file must be in the same location as the shape file and must have the same root name.
For example, if the shape file is geodata.shp, the dBase file must be named geodata.dbf.
Event Data. Event data contains information on events that occur, such as crimes or accidents. This
option is available only for geospatial association rules.
Point Density. Time interval and coordinate data for kernel density estimates. This option is available
only for spatial temporal prediction.
Add. Opens a dialog for adding data sources. A data source can be a dBase file that is provided with the
shape file, an IBM SPSS Statistics data file, or an open dataset in the current session.
Associate. Opens a dialog for specifying the identifiers (coordinates or keys) used to associate data with
maps. Each data source must contain one or more identifiers that associate the data with the map. dBase
files that come with a shape file typically contain a field that is automatically used as the default identifier.
For other data sources, you must specify the fields that are used as identifiers.
Validate Key. Opens a dialog to validate key matching between the map and the data source.
Geospatial association rules
• At least one data source must be an event data source.
• All event data sources must use the same form of map association identifiers: coordinates or key
values.
• If the event data sources are associated with the maps with key values, all event sources must use the
same map feature type (for example, polygons, points, lines).
Spatial temporal prediction
• There must be a context data source.
• If there is only one data source (a data file with no associated map), it must include coordinate values.
• If you have two data sources, one data source must be context data, and the other data source must be
point density data.
• You cannot include more than two data sources.
Add a Data Source
A data source can be a dBase file that is provided with the shape file and context file, an IBM SPSS
Statistics data file, or an open dataset in the current session.
You can add the same data source multiple times if you want to use a different spatial association with
each one.
Data and Map Association
Each data source must contain one or more identifiers that associate the data with the map.
Coordinates
If the data source contains fields that represent Cartesian coordinates, select the fields that
represent the X and Y coordinates. For geospatial association rules, there can also be a Z coordinate.
Key values
Key values in fields in the data source correspond to selected map keys. For example, a map of
regions might have a name identifier (map key) labeling each region. That identifier corresponds with
a field in the data that also contains the names of the regions (data key). Fields are matched to map
keys based on the order they are displayed in the two lists.
Validate Keys
The Validate Keys dialog provides a summary of record matching between the map and the data source,
based on the selected identifier keys. If there are unmatched data key values, you can manually match
them to map key values.
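The matching that this dialog summarizes amounts to comparing the set of data keys with the set of
map keys. A hedged Python sketch with invented region names:

def summarize_key_matching(map_keys, data_keys):
    """Report matched and unmatched identifiers between a map and a data source."""
    map_set, data_set = set(map_keys), set(data_keys)
    return {
        "matched": sorted(map_set & data_set),
        "unmatched data keys": sorted(data_set - map_set),  # candidates for manual matching
        "map features without data": sorted(map_set - data_set),
    }

# Hypothetical keys: the map labels regions, and the data file spells one differently.
print(summarize_key_matching(
    map_keys=["North", "South", "East", "West"],
    data_keys=["North", "South", "East", "W."],
))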
Geospatial Association Rules
For geospatial association rules, after defining maps and data sources, the remaining steps in the wizard
are:
• If there are multiple event data sources, define how event data sources are merged.
• Select fields to use as conditions and predictions in the analysis.
Optionally, you can also:
• Select different output options.
• Save a scoring model file.
• Create new fields for predicted values and rules in the data sources used in the model.
• Customize settings for building association rules.
• Customize binning and aggregation settings.
Define Event Data Fields
For geospatial association rules, if there is more than one event data source, the event data sources are
merged.
• By default, only fields that are common to all event data sources are included.
• You can display a list of common fields, fields for a specific data source, or fields from all data sources
and select the fields that you want to include.
• For common fields, the Type and Measurement must be the same for all data sources. If there are
conflicts, you can specify the type and measurement level to use for each common field.
Select Fields
The list of available fields includes fields from the event data sources and fields from the context data
sources.
• You can control the list of displayed fields by selecting a data source from the Data Sources list.
• You must select at least two fields. At least one must be a condition, and at least one must be a
prediction. There are a number of ways to meet this requirement, including selecting two fields for the
Both (Condition and Prediction) list.
• Association rules predict values of the prediction fields that are based on values of the condition fields.
For example, in the rule “If x=1 and y=2, then z=3”, the values of x and y are conditions, and the value of
z is the prediction.
Output
Rules Tables
Each rules table displays the top rules and values for confidence, rule support, lift, condition support,
and deployability. Each table is sorted by values of the selected criterion. You can display all rules or
the top Number of rules, based on the selected criterion.
Sortable Word Cloud
A list of the top rules, based on the values of the selected criterion. The size of the text indicates the
relative importance of the rule. The interactive output object contains the top rules for confidence,
rule support, lift, condition support, and deployability. The selected criterion determines which list of
rules is displayed by default. You can select a different criterion interactively in the output. Max rules
to display determines the number of rules that are displayed in the output.
Map
Interactive bar chart and map of the top rules, based on the selected criterion. Each interactive output
object contains the top rules for confidence, rule support, lift, condition support, and deployability.
The selected criterion determines which list of rules is displayed by default. You can select a different
criterion interactively in the output. Max rules to display determines the number of rules that are
displayed in the output.
Model Information Tables
Field Transformations.
Describes the transformations that are applied to fields used in the analysis.
Record Summary.
Number and percentage of included and excluded records.
Rule Statistics.
Summary statistics for condition support, confidence, rule support, lift, and deployability. The
statistics include mean, minimum, maximum, and standard deviation.
Most Frequent Items.
Items that occur most frequently. An item is a field value that appears in a condition or a prediction
of a rule. For example, age < 18 or gender=female.
Most Frequent Fields.
Fields that occur most frequently in the rules.
Excluded Inputs.
Fields that are excluded from the analysis and the reason each field was excluded.
Criterion for Rules Tables, Word Cloud, and Maps
Confidence.
The percentage of correct rule predictions.
Rule Support.
The percentage of cases for which the rule is true. For example, if the rule is "If x=1 and y=2, then
z=3," rule support is the actual percentage of cases in the data for which x=1, y=2, and z=3.
Lift.
Lift is a measure of how much the rule improves prediction compared to random chance. It is the ratio
of correct predictions to the overall occurrence of the predicted value. A value greater than 1 means
the rule predicts better than chance. For example, if the predicted value occurs 20% of the time and
the confidence in the prediction is 80%, then the lift value is 4.
Condition Support.
The percentage of cases for which the rule condition exists. For example, if the rule is "If x=1 and y=2,
then z=3," condition support is the proportion of cases in the data for which x=1 and y=2.
Deployability.
The percentage of cases in which the conditions hold but the prediction is incorrect. Deployability
equals (1 - confidence) multiplied by condition support, which is equivalent to condition support
minus rule support.
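All five criteria can be computed from a handful of counts: the total number of records, the number in
which the rule's conditions hold, the number in which the conditions and the prediction both hold, and
the overall number in which the predicted value occurs. A minimal Python sketch of that arithmetic, with
invented counts chosen to reproduce the lift example above:

def rule_metrics(n_total, n_condition, n_condition_and_prediction, n_prediction):
    """Compute the five rule criteria from raw counts."""
    condition_support = n_condition / n_total                # P(conditions hold)
    rule_support = n_condition_and_prediction / n_total      # P(conditions and prediction hold)
    confidence = n_condition_and_prediction / n_condition    # P(prediction | conditions)
    prior = n_prediction / n_total                           # overall occurrence of predicted value
    lift = confidence / prior                                # improvement over chance
    # Deployability: conditions hold but the prediction is wrong.
    deployability = (1 - confidence) * condition_support
    assert abs(deployability - (condition_support - rule_support)) < 1e-12
    return confidence, rule_support, lift, condition_support, deployability

# A 20% prior occurrence and 80% confidence give a lift of 4, as in the text.
print(rule_metrics(n_total=1000, n_condition=100,
                   n_condition_and_prediction=80, n_prediction=200))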
Save
Save the map and context data as a map specification
Save the map specifications to an external file (.mplan). You can load this map specification file into
the wizard for subsequent analyses. You can also use the map specification file with the SPATIAL
ASSOCIATION RULES command.
Copy any map and data files into the specification
Data from map shape files, external data files, and datasets used in the map specification are
saved in the map specification file.
Scoring
Saves best rule values, confidence values for the rules, and numeric ID values for the rules as new
fields in the specified data source.
Data Source to Score
The data source or sources where the new fields are created. If the data source is not open in the
current session, it is opened in the current session. You must explicitly save the modified file to
save the new fields.
Target Values
Create new fields for the selected target (prediction) fields.
• Two new fields are created for each target field: predicted value and confidence value.
• For continuous (scale) target fields, the predicted value is a string that describes a value range.
A value of the form "(value1, value2]" means "greater than value1 and less than or equal to
value2."
Number of best rules
Create new fields for the number of best rules specified. Three new fields are created for each
rule: rule value, confidence value, and a numeric ID value for the rule.
Name Prefix
Prefix to use for the new field names.
Rule Building
Rule building parameters set the criteria for the generated association rules.
Items Per Rule
Number of field values that can be included in rule conditions and predictions. The total number of
items cannot exceed 10. For example, in the rule "If x=1 and y=2, then z=3", there are two condition
items and one prediction item.
Maximum predictions.
Maximum number of field values that can occur in the predictions for a rule.
Maximum conditions.
Maximum number of field values that can occur in the conditions for a rule.
Exclude Pair
Excludes the specified pairs of fields from being included in the same rule.
Rule Criteria
Confidence.
Minimum confidence a rule must have to be included in the output. Confidence is the percentage
of correct predictions.
Rule Support.
Minimum rule support a rule must have to be included in the output. The value represents the
percentage of cases for which the rule is true in the observed data. For example, if the rule is "If
x=1 and y=2, then z=3," rule support is the actual percentage of cases in the data for which x=1,
y=2, and z=3.
Condition Support.
Minimum condition support a rule must have to be included in the output. The value represents
the percentage of cases for which the condition exists. For example, if the rule is "If x=1 and y=2,
then z=3," condition support is the percentage of cases in the data for which x=1 and y=2.
Lift.
Minimum lift a rule must have to be included in the output. Lift is a measure of how much the rule
improves prediction over random chance. It is the ratio of correct predictions to the overall
occurrence of the predicted value. For example, if the predicted value occurs 20% of the time and
the confidence in the prediction is 80%, then the lift value is 4.
Treat as same
Identifies pairs of fields that should be treated as the same field.
Binning and Aggregation
• Aggregation is necessary when there are more records in the data than there are features in the map.
For example, you have data records for individual counties but you have a map of states.
• You can specify the aggregate summary measure method for continuous and ordinal fields. Nominal
fields are aggregated based on the modal value.
Continuous
For continuous (scale) fields, the summary measure can be mean, median, or sum.
Ordinal
For ordinal fields, the summary measure can be median, mode, highest, or lowest.
Number of bins
Sets the maximum number of bins for continuous (scale) fields. Continuous fields are always grouped
or "binned" into ranges of values. For example: less than or equal to 5, greater than 5 and less than or
equal to 10, or greater than 10. A sketch at the end of this section illustrates such range labels.
Aggregate the map
Apply aggregation to both data and maps.
Custom settings for specific fields
You can override the default summary measure and number of bins for specific fields.
• Click the icon to open the Field Chooser dialog and select a field to add to the list.
• In the Aggregation column, select a summary measure.
• For continuous fields, click the button in the Bins column to specify a custom number of bins for the
field in the Bins dialog.
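As noted under Number of bins, continuous fields are grouped into half-open ranges. A simple
equal-width binning sketch in Python that produces labels of that form (the procedure's actual binning
algorithm may differ; this is only an illustration):

import numpy as np

def bin_continuous(values, max_bins):
    """Group a continuous field into at most max_bins half-open ranges."""
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), max_bins + 1)
    labels = [f"({lo:g}, {hi:g}]" for lo, hi in zip(edges[:-1], edges[1:])]
    # np.digitize assigns each value to a range; clip keeps the maximum in the last bin.
    idx = np.clip(np.digitize(values, edges[1:-1], right=True), 0, max_bins - 1)
    return [labels[i] for i in idx]

print(bin_continuous([1, 4, 6, 9, 12], max_bins=3))  # three ranges spanning 1 to 12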
Spatial Temporal Prediction
For spatial temporal prediction, after defining maps and data sources, the remaining steps in the wizard
are:
• Specify the target field, time fields, and optional predictors.
• Define time intervals or cyclic periods for time fields.
Optionally, you can also:
• Select different output options.
• Customize model building parameters.
• Customize aggregation settings.
• Save predicted values to a dataset in the current session or to an IBM SPSS Statistics format data file.
Select Fields
The list of available fields includes fields from the selected data sources. You can control the list of
displayed fields by selecting a data source from the Data Sources list.
Target
A target field is required. The target is the field for which values are predicted.
• The target field must be a continuous (scale), numeric field.
• If there are two data sources, the target is kernel density estimates, and "Density" is displayed as
the target name. You cannot change this selection.
Predictors
One or more predictor fields can be specified. This setting is optional.
Time Fields
You must select one or more fields that represent time periods or select Cyclic Periods.
• If there are two data sources, you must select time fields from both data sources. Both time fields
must represent the same interval.
• For cyclic periods, you must specify the fields that define periodicity cycles in the Time Intervals
panel of the wizard.
Time Intervals
The options in this panel are based on the choice of Time Fields or Cyclic period in the step for selecting
fields.
Time fields
Selected Time Fields. If you select one or more time fields in the step for selecting fields, those fields are
displayed in this list.
Time Interval. Select the appropriate time interval from the list. Depending on the time interval, you can
also specify other settings, such as interval between observations (increment) and starting value. This
time interval is used for all selected time fields.
• The procedure assumes that all cases (records) represent equally spaced intervals.
• Based on the selected time interval, the procedure can detect missing observations or multiple
observations in the same time interval that need to be aggregated together. For example, if the time
interval is days and the date 2014-10-27 is followed by 2014-10-29, then there is a missing
observation for 2014-10-28. If the time interval is month, then multiple dates in the same month are
aggregated together.
• For some time intervals, additional settings can define breaks in the normal equally spaced intervals.
For example, if the time interval is days, but only weekdays are valid, you can specify that there are five
days in a week, and the week begins on Monday.
• If the selected time field is not a date format or time format field, the time interval is automatically set
to Periods and cannot be changed.
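The gap detection and same-interval aggregation described above can be sketched with pandas,
assuming daily data in a data frame whose column names are invented for the example:

import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2014-10-27", "2014-10-29", "2014-10-29"]),
    "value": [10.0, 12.0, 14.0],
})

# Detect missing observations for a daily interval: 2014-10-28 is absent.
full_range = pd.date_range(df["date"].min(), df["date"].max(), freq="D")
missing = full_range.difference(pd.DatetimeIndex(df["date"]))
print(list(missing))  # [Timestamp('2014-10-28 00:00:00')]

# Aggregate multiple observations in the same interval (here, the mean of the two
# 2014-10-29 rows), analogous to a monthly interval pooling same-month dates.
aggregated = df.groupby("date", as_index=False)["value"].mean()
print(aggregated)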
Cycle fields
If you select Cyclic period on the step for selecting fields, you must specify the fields that define the
cyclic periods. A cyclic period identifies repetitive cyclical variation, such as the number of months in a
year or the number of days in a week.
• You can specify up to three fields that define cyclic periods.
• The first cycle field represents the highest level of the cycle. For example, if there is cyclic variation by
year, quarter, and month, the field that represents year is the first cycle field.
• The cycle length for the first and second cycle fields is the periodicity at the subsequent level. For
example, if the cycle fields are year, quarter, and month, the first cycle length is 4 and the second cycle
length is 3.
• The starting value for the second and third cycle fields is the first value in each of those cyclic periods.
• Cycle length and starting values must be positive integers.
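The cycle-length convention can be made concrete. With cycle fields year, quarter, and month (month
numbered 1 to 3 within its quarter), each record maps to a running period index; the sketch below uses
the example's cycle lengths of 4 and 3 and an assumed starting year:

def period_index(year, quarter, month_in_quarter, start_year=2014):
    """Map (year, quarter, month-within-quarter) to a running month index,
    using cycle lengths of 4 (quarters per year) and 3 (months per quarter)."""
    return ((year - start_year) * 4 + (quarter - 1)) * 3 + (month_in_quarter - 1)

print(period_index(2014, 1, 1))  # 0: first month of the first quarter
print(period_index(2015, 2, 3))  # 17: 12 + 3 + 2 months after the start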
Aggregation
• If you select any Predictors in the step for selecting fields, you can select the aggregation summary
method for the predictors.
• Aggregation is necessary when there is more than one record in a defined time interval. For example, if
the time interval is month, then multiple dates in the same month are aggregated together.
• You can specify the aggregation summary measure method for continuous and ordinal fields. Nominal
fields are aggregated based on the modal value.
Continuous
For continuous (scale) fields, the summary measure can be mean, median, or sum.
Ordinal
For ordinal fields, the summary measure can be median, mode, highest, or lowest.
Custom settings for specific fields
You can override the default aggregation summary measure for specific predictors.
• Click the icon to open the Field Chooser dialog and select a field to add to the list.
• In the Aggregation column, select a summary measure.
Output
Maps
Target values.
Map of values for the selected target field.
Correlation
Map of correlations.
Clusters
Map that highlights clusters of locations that are similar to each other. Maps of clusters are
available only for empirical models.
Location similarity threshold.
The similarity that is required to create clusters. The value must be a number greater than
zero and less than 1.
Specify maximum number of clusters.
The maximum number of clusters to display.
Model Evaluation Tables
Model Specifications.
Summary of specifications that are used to run the analysis, including target, input, and location
fields.
Temporal Information Summary.
Identifies the time fields and time intervals that are used in the model.
Test of Effects in Mean Structure.
The output includes the test statistic value, degrees of freedom, and significance level for the model
and each effect.
Mean Structure of Model Coefficients.
The output includes the coefficient value, standard error, test statistic value, significance level,
and confidence intervals for each model term.
Autoregressive Coefficients.
The output includes the coefficient value, standard error, test statistic value, significance level,
and confidence intervals for each lag.
Tests of Spatial Covariance.
For variogram-based parametric models, displays the goodness of fit test results for spatial
covariance structure. The test results can determine whether to model the spatial covariance
structure parametrically or to use a nonparametric model.
Parametric Spatial Covariance.
For variogram-based parametric models, displays parameter estimates for parametric spatial
covariance.
Model Options
Model Settings
Automatically include an intercept
Include the intercept in the model.
Maximum autoregression lag
The maximum autoregression lag. The value must be an integer between 1 and 5.
Spatial Covariance
Specifies the estimation method for spatial covariance.
Parametric
The estimation method is parametric. The method can be Gaussian, Exponential or Power
Exponential. For Power Exponential, you can specify the Power value.
Nonparametric
The estimation method is nonparametric.
Save
Save the map and context data as a map specification
Save the map specifications to an external file (.mplan). You can load this map specification file into
the wizard for subsequent analysis. You can also use the map specification file with the SPATIAL
TEMPORAL PREDICTION command.
Copy any map and data files into the specification
Data from map shape files, external data files, and datasets that are used in the map specification
are saved in the map specification file.
Scoring
Saves predicted values, variance, and upper and lower confidence bounds for the target field in the
selected data file.
• You can save predicted values to an open dataset in the current session or an IBM SPSS Statistics
format data file.
• The data file cannot be a data source that is used in the model.
• The data file must contain all the time fields and predictors that are used in the model.
• The time values must be greater than the time values used in the model.
Advanced
Maximum cases with missing values (%)
The maximum percentage of cases with missing values.
Significance level
The significance level for determining whether a variogram-based parametric model is appropriate.
The value must be greater than 0 and less than 1. The default value is 0.05. The significance level is
used in the goodness of fit test for spatial covariance structure. The goodness of fit statistic is used to
determine whether to use a parametric or non-parametric model.
Uncertainty factor (%)
The uncertainty factor is a percentage value that represents the growth in uncertainty for future
forecasts. The upper and lower limits of forecast uncertainty increase by the specified percentage for
each step into the future.
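The widening of the forecast limits can be illustrated with a few lines of Python; whether the growth
compounds per step is an assumption made here for the illustration:

def widened_bounds(point_forecast, half_width, uncertainty_pct, steps):
    """Forecast limits whose half-width grows by uncertainty_pct per step ahead.
    Compounding per step is an assumption for illustration."""
    bounds = []
    for step in range(1, steps + 1):
        w = half_width * (1 + uncertainty_pct / 100) ** step
        bounds.append((point_forecast - w, point_forecast + w))
    return bounds

for lo, hi in widened_bounds(point_forecast=50.0, half_width=2.0,
                             uncertainty_pct=10, steps=3):
    print(f"[{lo:.2f}, {hi:.2f}]")  # intervals widen each step into the future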
Finish
In the last step of the Geospatial Modeling Wizard you can either run the model or paste the generated
command syntax to a syntax window. You can modify and save the generated syntax for subsequent use.
Notices
This information was developed for products and services offered in the US. This material might be
available from IBM in other languages. However, you may be required to own a copy of the product or
product version in that language in order to access it.
IBM may not offer the products, services, or features discussed in this document in other countries.
Consult your local IBM representative for information on the products and services currently available in
your area. Any reference to an IBM product, program, or service is not intended to state or imply that only
that IBM product, program, or service may be used. Any functionally equivalent product, program, or
service that does not infringe any IBM intellectual property right may be used instead. However, it is the
user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this
document. The furnishing of this document does not grant you any license to these patents. You can send
license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive, MD-NC119
Armonk, NY 10504-1785
US
For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual Property
Department in your country or send inquiries, in writing, to:
Intellectual Property Licensing
Legal and Intellectual Property Law
IBM Japan Ltd.
19-21, Nihonbashi-Hakozakicho, Chuo-ku
Tokyo 103-8510, Japan
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS"
WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE. Some jurisdictions do not allow disclaimer of express or implied warranties in
certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically
made to the information herein; these changes will be incorporated in new editions of the publication.
IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this
publication at any time without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in
any manner serve as an endorsement of those websites. The materials at those websites are not part of
the materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you provide in any way it believes appropriate without
incurring any obligation to you.
Licensees of this program who wish to have information about it for the purpose of enabling: (i) the
exchange of information between independently created programs and other programs (including this
one) and (ii) the mutual use of the information which has been exchanged, should contact:
IBM Director of Licensing
IBM Corporation
North Castle Drive, MD-NC119
Armonk, NY 10504-1785
US
Such information may be available, subject to appropriate terms and conditions, including in some cases,
payment of a fee.
The licensed program described in this document and all licensed material available for it are provided by
IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or any
equivalent agreement between us.
The performance data and client examples cited are presented for illustrative purposes only. Actual
performance results may vary depending on specific configurations and operating conditions.
Information concerning non-IBM products was obtained from the suppliers of those products, their
published announcements or other publicly available sources. IBM has not tested those products and
cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM
products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of
those products.
Statements regarding IBM's future direction or intent are subject to change or withdrawal without notice,
and represent goals and objectives only.
This information contains examples of data and reports used in daily business operations. To illustrate
them as completely as possible, the examples include the names of individuals, companies, brands, and
products. All of these names are fictitious and any similarity to actual people or business enterprises is
entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs
in any form without payment to IBM, for the purposes of developing, using, marketing or distributing
application programs conforming to the application programming interface for the operating platform for
which the sample programs are written. These examples have not been thoroughly tested under all
conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these
programs. The sample programs are provided "AS IS", without warranty of any kind. IBM shall not be
liable for any damages arising out of your use of the sample programs.
Each copy or any portion of these sample programs or any derivative work, must include a copyright
notice as follows:
© Copyright IBM Corp. 2020. Portions of this code are derived from IBM Corp. Sample Programs.
© Copyright IBM Corp. 1989 - 2020. All rights reserved.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at
"Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or
trademarks of Adobe Systems Incorporated in the United States, and/or other countries.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon,
Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or
its subsidiaries in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the
United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or
its affiliates.
Index
A
adjusted R2
in Linear Regression 92
adjusted R-square
in linear models 84
Agresti-Caffo
in Independent-Samples Proportions 57
Agresti-Coull
in One-Sample Proportions 52
Agresti-Min
in Paired-Samples Proportions 54
Akaike information criterion
in linear models 84
alpha coefficient
in Reliability Analysis 157, 158
alpha factoring 113
analysis of variance
in Curve Estimation 97
in Linear Regression 92
in Means 47
in One-Way ANOVA 63
Anderson-Rubin factor scores 114
Andrews' wave estimator
in Explore 38
ANOVA
in GLM Univariate 67
in linear models 87
in Means 47
in One-Way ANOVA 63
model 68
Anscombe
in One-Sample Proportions 52
Asymptotic significance level 158
automatic data preparation
in linear models 86
automatic distribution fitting
in simulation 175
auxiliary regression model
in GLM 77
average absolute deviation (AAD)
in Ratio Statistics 165
B
backward elimination
in Linear Regression 89
bagging
in linear models 83
bar charts
in Frequencies 35
Bartlett factor scores 114
Bartlett's test of sphericity
in Factor Analysis 113
best subsets
in linear models 84
beta coefficients
in Linear Regression 92
binomial test
One-Sample Nonparametric Tests 127, 128
Binomial Test
command additional features 142
dichotomies 141
missing values 142
options 142
statistics 142
Bivariate Correlations
command additional features 80
confidence interval 78
confidence intervals 79
correlation coefficients 78
missing values 79
options 79
significance level 78
statistics 79
block distance
in Distances 82
Bonett-Price
in Paired-Samples Proportions 54
Bonferroni
in GLM 72
in One-Way ANOVA 64
boosting
in linear models 83
Box's M test
in Discriminant Analysis 109
boxplots
comparing factor levels 38
comparing variables 38
in Explore 38
in simulation 182
Brown-Forsythe statistic
in One-Way ANOVA 66
Brown-Li-Jeffreys
in Independent-Samples Proportions 57
build terms 69, 96
C
case-control study
Paired-Samples T Test 60
casewise diagnostic information
in Linear Regression 92
categorical field information
nonparametric tests 139
charts
case labels 97
in ROC Analysis 166
in ROC Curve 169
Chebychev distance
in Distances 82
chi-square
expected range 140
expected values 140
Fisher's exact test 41
for independence 41
in Crosstabs 41
likelihood-ratio 41
linear-by-linear association 41
missing values 141
one-sample test 140
options 141
Pearson 41
statistics 141
Yates' correction for continuity 41
chi-square distance
in Distances 82
chi-square test
One-Sample Nonparametric Tests 127, 129
city-block distance
in Nearest Neighbor Analysis 103
classification
in ROC Analysis 166
in ROC Curve 169
classification table
in Nearest Neighbor Analysis 108
Clopper-Pearson (Exact)
in One-Sample Proportions 52
Clopper-Pearson intervals
One-Sample Nonparametric Tests 128
cluster analysis
efficiency 125
Hierarchical Cluster Analysis 123
K-Means Cluster Analysis 124
cluster frequencies
in TwoStep Cluster Analysis 118
cluster viewer
about cluster models 118
basic view 120
cell content display 120
cell distribution view 121
cluster centers view 119
cluster comparison view 121
cluster display sort 120
cluster predictor importance view 121
cluster sizes view 121
clusters view 119
comparison of clusters 121
distribution of cells 121
feature display sort 120
filtering records 122
flip clusters and features 120
model summary 119
overview 119
predictor importance 121
size of clusters 121
sort cell contents 120
sort clusters 120
sort features 120
summary view 119
transpose clusters and features 120
using 122
clustering
choosing a procedure 115
overall display 119
viewing clusters 119
Cochran's Q
in Tests for Several Related Samples 149
Cochran's Q test
Related-Samples Nonparametric Tests 133, 134
Cochran's statistic
in Crosstabs 41
Codebook
output 31
statistics 33
coefficient of dispersion (COD)
in Ratio Statistics 165
coefficient of variation (COV)
in Ratio Statistics 165
Cohen's kappa
in Crosstabs 41
Cohen's Weighted Kappa 161, 162
collinearity diagnostic information
in Linear Regression 92
column percentages
in Crosstabs 43
column proportions statistics
in Crosstabs 43
column summary reports 155
combining rules
in linear models 85
comparing groups
in OLAP Cubes 50
comparing variables
in OLAP Cubes 50
compound model
in Curve Estimation 98
concentration index
in Ratio Statistics 165
confidence interval
in Bivariate Correlations 78
confidence interval summary
nonparametric tests 136, 137
confidence intervals
in Bivariate Correlations 79
in Explore 38
in GLM 70
in Independent-Samples T Test 60
in Linear Regression 92
in One-Sample T Test 62
in One-Way ANOVA 66
in Paired-Samples T Test 61
in ROC Analysis 167, 168
in ROC Curve 170
saving in Linear Regression 91
contingency coefficient
in Crosstabs 41
contingency tables 40
continuous field information
nonparametric tests 139
contrasts
in GLM 70
in One-Way ANOVA 64
control variables
in Crosstabs 40
convergence
in Factor Analysis 113, 114
in K-Means Cluster Analysis 125
Cook's distance
in GLM 75
in Linear Regression 91
correlation matrix
in Discriminant Analysis 109
in Factor Analysis 112, 113
in Ordinal Regression 95
correlations
in Bivariate Correlations 78
in Crosstabs 41
in Partial Correlations 80
in simulation 179
zero-order 81
covariance matrix
in Discriminant Analysis 109, 110
in GLM 75
in Linear Regression 92
in Ordinal Regression 95
covariance ratio
in Linear Regression 91
Cox and Snell R2
in Ordinal Regression 95
Cramér's V
in Crosstabs 41
Cronbach's alpha
in Reliability Analysis 157, 158
Crosstabs
cell display 43
clustered bar charts 41
control variables 40
formats 44
layers 40
statistics 41
suppressing tables 40
crosstabulation
in Crosstabs 40
multiple response 151
cubic model
in Curve Estimation 98
cumulative distribution functions
in simulation 181
cumulative frequencies
in Ordinal Regression 95
Curve Estimation
analysis of variance 97
forecast 98
including constant 97
models 98
saving predicted values 98
saving prediction intervals 98
saving residuals 98
custom models
in GLM 68
D
d
in Crosstabs 41
Define Multiple Response Sets
categories 150
dichotomies 150
set labels 150
set names 150
deleted residuals
in GLM 75
in Linear Regression 91
dendrograms
in Hierarchical Cluster Analysis 124
dependent t test
in Paired-Samples T Test 60
descriptive statistics
in Descriptives 36
in Explore 38
in Frequencies 34
in Ratio Statistics 165
in Summarize 45
in TwoStep Cluster Analysis 118
Descriptives
command additional features 37
display order 36
saving z scores 36
statistics 36
designs for heteroskedasticity tests
in GLM 77
detrended normal plots
in Explore 38
deviation contrasts
in GLM 70
DfBeta
in Linear Regression 91
DfFit
in Linear Regression 91
dictionary
Codebook 31
difference contrasts
in GLM 70
differences between groups
in OLAP Cubes 50
differences between variables
in OLAP Cubes 50
direct oblimin rotation
in Factor Analysis 114
Discriminant Analysis
command additional features 111
covariance matrix 110
criteria 110
defining ranges 109
descriptive statistics 109
discriminant methods 110
display options 110
example 108
exporting model information 111
function coefficients 109
grouping variables 108
independent variables 108
Mahalanobis distance 110
matrices 109
missing values 110
plots 110
prior probabilities 110
Rao's V 110
saving classification variables 111
selecting cases 109
statistics 108, 109
stepwise methods 108
Wilks' lambda 110
distance measures
in Distances 82
in Hierarchical Cluster Analysis 123
in Nearest Neighbor Analysis 103
Distances
command additional features 83
computing distances between cases 81
computing distances between variables 81
dissimilarity measures 82
example 81
similarity measures 82
statistics 81
transforming measures 82
transforming values 82
distribution fitting
in simulation 175
division
dividing across report columns 156
Duncan's multiple range test
in GLM 72
in One-Way ANOVA 64
Dunnett's C
in GLM 72
in One-Way ANOVA 64
Dunnett's t test
in GLM 72
in One-Way ANOVA 64
Dunnett's T3
in GLM 72
in One-Way ANOVA 64
Durbin-Watson statistic
in Linear Regression 92
E
effect size
in Independent-Samples T Test 59
in Paired-Samples T-Test 60
eigenvalues
in Factor Analysis 113
in Linear Regression 92
ensembles
in linear models 85
equamax rotation
in Factor Analysis 114
error summary
in Nearest Neighbor Analysis 108
eta
in Crosstabs 41
in Means 47
eta-squared
in Means 47
Euclidean distance
in Distances 82
in Nearest Neighbor Analysis 103
Exact Binomial
in Paired-Samples Proportions 55
expected count
in Crosstabs 43
expected frequencies
in Ordinal Regression 95
Explore
command additional features 39
missing values 39
options 39
plots 38
power transformations 39
statistics 38
exponential model
in Curve Estimation 98
extreme values
in Explore 38
F
F statistic
in linear models 84
Factor Analysis
coefficient display format 115
command additional features 115
convergence 113, 114
descriptives 113
example 112
extraction methods 113
factor scores 114
loading plots 114
missing values 115
overview 112
rotation methods 114
selecting cases 112
statistics 112, 113
factor scores 114
feature selection
in Nearest Neighbor Analysis 108
feature space chart
in Nearest Neighbor Analysis 106
first
in Means 47
in OLAP Cubes 49
in Summarize 45
Fisher's exact test
in Crosstabs 41
Fisher's LSD
in GLM 72
Fleiss' Multiple Rater Kappa 157, 158
forecast
in Curve Estimation 98
formatting
columns in reports 154
forward selection
in Linear Regression 89
in Nearest Neighbor Analysis 103
forward stepwise
in linear models 84
Frequencies
charts 35
display order 35
formats 35
statistics 34
suppressing tables 35
frequency tables
in Explore 38
in Frequencies 33
Friedman test
in Tests for Several Related Samples 149
Related-Samples Nonparametric Tests 133
full factorial models
in GLM 68
G
Gabriel's pairwise comparisons test
in GLM 72
in One-Way ANOVA 64
Games and Howell's pairwise comparisons test
in GLM 72
in One-Way ANOVA 64
gamma
in Crosstabs 41
generalized least squares
in Factor Analysis 113
geometric mean
in Means 47
in OLAP Cubes 49
in Summarize 45
geospatial modeling 186–196
GLM
model 68
post hoc tests 72
profile plots 70
saving matrices 75
saving variables 75
sum of squares 68
GLM Univariate
contrasts 70
Goodman and Kruskal's gamma
in Crosstabs 41
Goodman and Kruskal's lambda
in Crosstabs 41
Goodman and Kruskal's tau
in Crosstabs 41
goodness of fit
in Ordinal Regression 95
grand totals
in column summary reports 157
group means 46, 48
grouped median
in Means 47
in OLAP Cubes 49
in Summarize 45
growth model
in Curve Estimation 98
Guttman model
in Reliability Analysis 157, 158
H
Hampel's redescending M-estimator
in Explore 38
harmonic mean
in Means 47
in OLAP Cubes 49
in Summarize 45
Hauck-Anderson
in Independent-Samples Proportions 57, 58
Helmert contrasts
in GLM 70
Hierarchical Cluster Analysis
agglomeration schedules 124
cluster membership 124
clustering cases 123
clustering methods 123
clustering variables 123
command additional features 124
dendrograms 124
distance matrices 124
distance measures 123
example 123
icicle plots 124
plot orientation 124
saving new variables 124
similarity measures 123
statistics 123, 124
transforming measures 123
transforming values 123
hierarchical decomposition 69
histograms
in Explore 38
in Frequencies 35
in Linear Regression 90
Hochberg's GT2
in GLM 72
in One-Way ANOVA 64
Hodges-Lehman estimates
Related-Samples Nonparametric Tests 133
holdout sample
in Nearest Neighbor Analysis 104
homogeneity-of-variance tests
in One-Way ANOVA 66
homogeneous subsets
nonparametric tests 139
Hotelling's T2
in Reliability Analysis 157, 158
Huber's M-estimator
in Explore 38
hypothesis summary
nonparametric tests 136
I
ICC. See intraclass correlation coefficient 158
icicle plots
in Hierarchical Cluster Analysis 124
image factoring 113
independent samples test
nonparametric tests 138
Independent-Samples Nonparametric Tests
Fields tab 131
Independent-Samples Proportions 56, 57
Independent-Samples T Test
confidence intervals 60
missing values 60
options 60
Independent-Samples T-Test
defining groups 60
grouping variables 60
string variables 60
information criteria
in linear models 84
initial threshold
in TwoStep Cluster Analysis 117
interaction terms 69, 96
Interrater Agreement 158
intraclass correlation coefficient (ICC)
in Reliability Analysis 158
inverse model
in Curve Estimation 98
iteration history
in Ordinal Regression 95
iterations
in Factor Analysis 113, 114
in K-Means Cluster Analysis 125
J
Jeffreys
in One-Sample Proportions 52
Jeffreys intervals
One-Sample Nonparametric Tests 128
K
k and feature selection
in Nearest Neighbor Analysis 108
k selection
in Nearest Neighbor Analysis 108
K-Means Cluster Analysis
cluster distances 126
cluster membership 126
command additional features 126
convergence criteria 125
efficiency 125
examples 124
iterations 125
methods 124
missing values 126
overview 124
saving cluster information 126
statistics 124, 126
kappa
in Crosstabs 41
Kendall's coefficient of concordance (W)
Related-Samples Nonparametric Tests 133
Kendall's tau-b
in Bivariate Correlations 78
in Crosstabs 41
Kendall's tau-c
in Crosstabs 41
Kendall's W
in Tests for Several Related Samples 149
Kolmogorov-Smirnov test
One-Sample Nonparametric Tests 127, 129
Kolmogorov-Smirnov Test
Lilliefors Test 129
Kolmogorov-Smirnov Z
in One-Sample Kolmogorov-Smirnov Test 143
in Two-Independent-Samples Tests 145
KR20
in Reliability Analysis 158
Kruskal-Wallis H
in Two-Independent-Samples Tests 147
Kruskal's tau
in Crosstabs 41
Kuder-Richardson 20 (KR20)
in Reliability Analysis 158
kurtosis
in Descriptives 36
in Explore 38
in Frequencies 34
in Means 47
in OLAP Cubes 49
in Report Summaries in Columns 156
in Report Summaries in Rows 154
in Summarize 45
L
lambda
in Crosstabs 41
Lance and Williams dissimilarity measure
in Distances 82
last
in Means 47
in OLAP Cubes 49
in Summarize 45
layers
in Crosstabs 40
least significant difference
in GLM 72
in One-Way ANOVA 64
Levene test
in Explore 38
in One-Way ANOVA 66
leverage values
in GLM 75
in Linear Regression 91
likelihood ratio intervals
One-Sample Nonparametric Tests 128
likelihood-ratio chi-square
in Crosstabs 41
in Ordinal Regression 95
Lilliefors test
in Explore 38
Lilliefors Test 129, 143, 144
linear model
in Curve Estimation 98
linear models
ANOVA table 87
automatic data preparation 84, 86
coefficients 88
combining rules 85
confidence level 84
ensembles 85
estimated means 88
information criterion 86
model building summary 88
model options 86
model selection 84
model summary 86
objectives 83
outliers 87
predicted by observed 87
predictor importance 86
R-square statistic 86
replicating results 86
residuals 87
Linear Regression
blocks 89
command additional features 93
exporting model information 91
missing values 93
plots 90
residuals 91
saving new variables 91
selection variable 90
statistics 92
variable selection methods 89, 93
weights 89
linear-by-linear association
in Crosstabs 41
link
in Ordinal Regression 94
listing cases 44
loading plots
in Factor Analysis 114
location model
in Ordinal Regression 95
logarithmic model
in Curve Estimation 98
logistic model
in Curve Estimation 98
Logit
in One-Sample Proportions 52
M
M-estimators
in Explore 38
Mahalanobis distance
in Discriminant Analysis 110
in Linear Regression 91
Manhattan distance
in Nearest Neighbor Analysis 103
Mann-Whitney U
in Two-Independent-Samples Tests 145
Mantel-Haenszel statistic
in Crosstabs 41
marginal homogeneity test
in Two-Related-Samples Tests 146
Related-Samples Nonparametric Tests 133
matched-pairs study
in Paired-Samples T Test 60
maximum
comparing report columns 156
in Descriptives 36
in Explore 38
in Frequencies 34
in Means 47
in OLAP Cubes 49
in Ratio Statistics 165
in Summarize 45
maximum branches
in TwoStep Cluster Analysis 117
maximum likelihood
in Factor Analysis 113
McFadden R2
in Ordinal Regression 95
McNemar
in Paired-Samples Proportions 55
McNemar (continuity corrected)
in Paired-Samples Proportions 55
McNemar test
in Crosstabs 41
in Two-Related-Samples Tests 146
Related-Samples Nonparametric Tests 133, 134
mean
in Descriptives 36
in Explore 38
in Frequencies 34
in Means 47
in OLAP Cubes 49
in One-Way ANOVA 66
in Ratio Statistics 165
in Report Summaries in Columns 156
in Report Summaries in Rows 154
in Summarize 45
of multiple report columns 156
subgroup 46, 48
Means
options 47
statistics 47
measures of central tendency
in Explore 38
in Frequencies 34
in Ratio Statistics 165
measures of dispersion
in Descriptives 36
in Explore 38
in Frequencies 34
in Ratio Statistics 165
measures of distribution
in Descriptives 36
in Frequencies 34
median
in Explore 38
in Frequencies 34
in Means 47
in OLAP Cubes 49
in Ratio Statistics 165
in Summarize 45
median test
in Two-Independent-Samples Tests 147
memory allocation
in TwoStep Cluster Analysis 117
Mid-p Adjusted Binomial
in Paired-Samples Proportions 55
minimum
comparing report columns 156
in Descriptives 36
in Explore 38
in Frequencies 34
in Means 47
in OLAP Cubes 49
in Ratio Statistics 165
in Summarize 45
Minkowski distance
in Distances 82
missing values
in Binomial Test 142
in Bivariate Correlations 79
in Chi-Square Test 141
in column summary reports 157
in Explore 39
in Factor Analysis 115
in Independent-Samples Proportions 58
in Independent-Samples T Test 60
in Linear Regression 93
in Multiple Response Crosstabs 152
in Multiple Response Frequencies 151
in Nearest Neighbor Analysis 105
in One-Sample Kolmogorov-Smirnov Test 144, 145
in One-Sample Proportions 53
in One-Sample T Test 62
in One-Way ANOVA 66
in Paired-Samples Proportions 55
in Paired-Samples T Test 61
in Partial Correlations 81
in Report Summaries in Rows 154
in ROC Analysis 167, 168
in ROC Curve 170
in Runs Test 143
in Tests for Several Independent Samples 148
in Two-Independent-Samples Tests 146
in Two-Related-Samples Tests 147
mode
in Frequencies 34
model view
in Nearest Neighbor Analysis 105
nonparametric tests 135
Monte Carlo simulation 170
Moses extreme reaction test
in Two-Independent-Samples Tests 145
Multidimensional Scaling
command additional features 165
conditionality 164
creating distance matrices 164
criteria 164
defining data shape 163
dimensions 164
display options 164
distance measures 164
example 163
levels of measurement 164
scaling models 164
statistics 163
transforming values 164
multiple comparisons
in One-Way ANOVA 64
multiple R
in Linear Regression 92
multiple regression
in Linear Regression 89
Multiple Response
command additional features 153
multiple response analysis
crosstabulation 151
frequency tables 151
Multiple Response Crosstabs 151
Multiple Response Frequencies 151
Multiple Response Crosstabs
cell percentages 152
defining value ranges 152
matching variables across response sets 152
missing values 152
percentages based on cases 152
percentages based on responses 152
Multiple Response Frequencies
missing values 151
multiple response sets
Codebook 31
multiplication
multiplying across report columns 156
N
Nagelkerke R2
in Ordinal Regression 95
Nearest Neighbor Analysis
feature selection 103
model view 105
neighbors 103
options 105
output 105
partitions 104
saving variables 105
nearest neighbor distances
in Nearest Neighbor Analysis 107
Newcombe
in Independent-Samples Proportions 57
in Paired-Samples Proportions 54
Newcombe (continuity corrected)
in Independent-Samples Proportions 57
Newman-Keuls
in GLM 72
noise handling
in TwoStep Cluster Analysis 117
nonparametric tests
chi-square 140
model view 135
One-Sample Kolmogorov-Smirnov Test 143
Runs Test 142
Tests for Several Independent Samples 147
Tests for Several Related Samples 149
Two-Independent-Samples Tests 145
Two-Related-Samples Tests 146
normal probability plots
in Explore 38
in Linear Regression 90
normality tests
in Explore 38
number of cases
in Means 47
in OLAP Cubes 49
in Summarize 45
O
observed count
in Crosstabs 43
observed frequencies
in Ordinal Regression 95
OLAP Cubes
statistics 49
titles 50
One-Sample Kolmogorov-Smirnov Test
command additional features 145
Lilliefors Test 143, 144
missing values 144, 145
options 144, 145
statistics 144, 145
test distribution 143
One-Sample Nonparametric Tests
binomial test 128
chi-square test 129
fields 127
Kolmogorov-Smirnov test 129
runs test 129
One-Sample Proportions 51, 52
One-Sample T Test
command additional features 61–63
confidence intervals 62
missing values 62
options 62
One-Samples Proportions 54
One-Way ANOVA
command additional features 67
contrasts 64
factor variables 63
missing values 66
multiple comparisons 64
options 66
polynomial contrasts 64
post hoc tests 64
statistics 66
Ordinal Regression
command additional features 97
link 94
location model 95
options 94
scale model 96
statistics 93
outliers
in Explore 38
in Linear Regression 90
in TwoStep Cluster Analysis 117
overfit prevention criterion
in linear models 84
P
page control
in column summary reports 156
in row summary reports 154
page numbering
in column summary reports 157
in row summary reports 154
Paired-Samples Proportions 53
Paired-Samples T Test
missing values 61
options 61
selecting paired variables 60
Paired-Samples t-Test 60
pairwise comparisons
nonparametric tests 139
parallel model
in Reliability Analysis 157, 158
parameter estimates
in Ordinal Regression 95
Partial Correlations
command additional features 81
in Linear Regression 92
missing values 81
options 81
statistics 81
zero-order correlations 81
Partial Least Squares Regression
export variables 101
model 100
partial plots
in Linear Regression 90
pattern difference measure
in Distances 82
pattern matrix
in Factor Analysis 112
Pearson chi-square
in Crosstabs 41
in Ordinal Regression 95
Pearson correlation
in Bivariate Correlations 78
in Crosstabs 41
Pearson residuals
in Ordinal Regression 95
peers
in Nearest Neighbor Analysis 107
percentages
in Crosstabs 43
percentiles
in Explore 38
in Frequencies 34
in simulation 182
phi
in Crosstabs 41
phi-square distance measure
in Distances 82
pie charts
in Frequencies 35
PLUM
in Ordinal Regression 93
polynomial contrasts
in GLM 70
in One-Way ANOVA 64
post hoc multiple comparisons 64
Power Analysis
statistics 1
power model
in Curve Estimation 98
predicted values
saving in Curve Estimation 98
saving in Linear Regression 91
prediction intervals
saving in Curve Estimation 98
saving in Linear Regression 91
predictor importance
linear models 86
price-related differential (PRD)
in Ratio Statistics 165
principal axis factoring 113
principal components analysis 112, 113
probability density functions
in simulation 181
profile plots
in GLM 70
Proximities
in Hierarchical Cluster Analysis 123
Q
quadrant map
in Nearest Neighbor Analysis 107
quadratic model
in Curve Estimation 98
quartiles
in Frequencies 34
quartimax rotation
in Factor Analysis 114
R
R 2
in Linear Regression 92
in Means 47
R 2 change 92
r correlation coefficient
in Bivariate Correlations 78
in Crosstabs 41
R statistic
in Linear Regression 92
in Means 47
R-E-G-W F
in GLM 72
in One-Way ANOVA 64
R-E-G-W Q
in GLM 72
in One-Way ANOVA 64
R-square
in linear models 86
range
in Descriptives 36
in Frequencies 34
in Means 47
in OLAP Cubes 49
in Ratio Statistics 165
in Summarize 45
rank correlation coefficient
in Bivariate Correlations 78
Rao's V
in Discriminant Analysis 110
Ratio Statistics
statistics 165
reference category
in GLM 70
regression
Linear Regression 89
multiple regression 89
plots 90
regression coefficients
in Linear Regression 92
related samples 146, 149
Related-Samples Nonparametric Tests
Cochran's Q test 134
fields 133
McNemar test 134
relative risk
in Crosstabs 41
Reliability Analysis
ANOVA table 158
command additional features 161
descriptives 158
example 157
Hotelling's T 2 158
inter-item correlations and covariances 158
intraclass correlation coefficient 158
Kuder-Richardson 20 158
statistics 157, 158
Tukey's test of additivity 158
repeated contrasts
in GLM 70
Report Summaries in Columns
column format 154
command additional features 157
grand total 157
missing values 157
page control 156
page layout 155
page numbering 157
subtotals 156
total columns 156
Report Summaries in Rows
break columns 153
break spacing 154
column format 154
command additional features 157
data columns 153
footers 155
missing values 154
page control 154
page layout 155
page numbering 154
sorting sequences 153
titles 155
variables in titles 155
reports
column summary reports 155
comparing columns 156
composite totals 156
dividing column values 156
multiplying column values 156
row summary reports 153
total columns 156
residuals
in Crosstabs 43
saving in Curve Estimation 98
saving in Linear Regression 91
rho
in Bivariate Correlations 78
in Crosstabs 41
risk
in Crosstabs 41
ROC Analysis
statistics and plots 167, 168
ROC Curve
statistics and plots 170
row percentages
in Crosstabs 43
runs test
One-Sample Nonparametric Tests 127, 129
Runs Test
command additional features 143
cut points 142, 143
missing values 143
options 143
statistics 143
Ryan-Einot-Gabriel-Welsch multiple F
in GLM 72
in One-Way ANOVA 64
Ryan-Einot-Gabriel-Welsch multiple range
in GLM 72
in One-Way ANOVA 64
S
S model
in Curve Estimation 98
S-stress
in Multidimensional Scaling 163
scale
in Multidimensional Scaling 163
in Reliability Analysis 157
in Weighted Kappa 161
scale model
in Ordinal Regression 96
scatterplot
in simulation 182
scatterplots
in Linear Regression 90
Scheffé test
in GLM 72
in One-Way ANOVA 64
Score 52
Score (Continuity Corrected) 52
selection variable
in Linear Regression 90
sensitivity analysis
in simulation 179
Shapiro-Wilk's test
in Explore 38
Sidak's t test
in GLM 72
in One-Way ANOVA 64
sign test
in Two-Related-Samples Tests 146
Related-Samples Nonparametric Tests 133
similarity measures
in Distances 82
in Hierarchical Cluster Analysis 123
simple contrasts
in GLM 70
simulation
box plots 182
chart options 186
correlations between inputs 179
creating a simulation plan 171, 172
creating new inputs 175
cumulative distribution function 181
customizing distribution fitting 178
display formats for targets and inputs 182
distribution fitting 175
distribution fitting results 178
equation editor 174
interactive charts 185
model specification 173
output 181, 182
percentiles of target distributions 182
probability density function 181
refitting distributions to new data 183
running a simulation plan 172, 183
save simulated data 183
save simulation plan 183
scatter plots 182
sensitivity analysis 179
Simulation Builder 173
stopping criteria 179
supported models 173
tail sampling 179
tornado charts 182
what-if analysis 179
Simulation Builder 173
size difference measure
in Distances 82
skewness
in Descriptives 36
in Explore 38
in Frequencies 34
in Means 47
in OLAP Cubes 49
in Report Summaries in Columns 156
in Report Summaries in Rows 154
in Summarize 45
Somers' d
in Crosstabs 41
spatial modeling 186
Spearman correlation coefficient
in Bivariate Correlations 78
in Crosstabs 41
Spearman-Brown reliability
in Reliability Analysis 158
split-half reliability
in Reliability Analysis 157, 158
spread-versus-level plots
in Explore 38
squared Euclidean distance
in Distances 82
standard deviation
in Descriptives 36
in Explore 38
in Frequencies 34
in Means 47
in OLAP Cubes 49
in Ratio Statistics 165
in Report Summaries in Columns 156
in Report Summaries in Rows 154
in Summarize 45
standard error
in Descriptives 36
in Explore 38
in Frequencies 34
in GLM 75
in ROC Analysis 167, 168
in ROC Curve 170
standard error of kurtosis
in Means 47
in OLAP Cubes 49
in Summarize 45
standard error of skewness
in Means 47
in OLAP Cubes 49
in Summarize 45
standard error of the mean
in Means 47
in OLAP Cubes 49
in Summarize 45
standardization
in TwoStep Cluster Analysis 117
standardized residuals
in GLM 75
in Linear Regression 91
standardized values
in Descriptives 36
stem-and-leaf plots
in Explore 38
stepwise selection
in Linear Regression 89
stress
in Multidimensional Scaling 163
strictly parallel model
in Reliability Analysis 157, 158
Student-Newman-Keuls
in GLM 72
in One-Way ANOVA 64
Student's t test 59
Studentized residuals
in Linear Regression 91
subgroup means 46, 48
subtotals
in column summary reports 156
sum
in Descriptives 36
in Frequencies 34
in Means 47
in OLAP Cubes 49
in Summarize 45
sum of squares
in GLM 68
Summarize
options 44
statistics 45
T
t test
in Independent-Samples T Test 59
in One-Sample T Test 62
in Paired-Samples T Test 60
Tamhane's T2
in GLM 72
in One-Way ANOVA 64
tau-b
in Crosstabs 41
tau-c
in Crosstabs 41
test of parallel lines
in Ordinal Regression 95
tests for independence
chi-square 41
tests for linearity
in Means 47
Tests for Several Independent Samples
command additional features 148
defining range 148
grouping variables 148
missing values 148
options 148
statistics 148
test types 148
Tests for Several Related Samples
command additional features 149
statistics 149
test types 149
time series analysis
forecast 98
predicting cases 98
titles
in OLAP Cubes 50
tolerance
in Linear Regression 92
tornado charts
in simulation 182
total column
in reports 156
total percentages
in Crosstabs 43
training sample
in Nearest Neighbor Analysis 104
transformation matrix
in Factor Analysis 112
tree depth
in TwoStep Cluster Analysis 117
trimmed mean
in Explore 38
Tukey's b test
in GLM 72
in One-Way ANOVA 64
Tukey's biweight estimator
in Explore 38
Tukey's honestly significant difference
in GLM 72
in One-Way ANOVA 64
Tukey's test of additivity
in Reliability Analysis 157, 158
Two-Independent-Samples Tests
command additional features 146
defining groups 146
grouping variables 146
missing values 146
options 146
statistics 146
test types 145
Two-Related-Samples Tests
command additional features 147
missing values 147
options 147
statistics 147
test types 147
two-sample t test
in Independent-Samples T Test 59
TwoStep Cluster Analysis
options 117
save to external file 118
save to working file 118
statistics 118
U
uncertainty coefficient
in Crosstabs 41
unstandardized residuals
in GLM 75
unweighted least squares
in Factor Analysis 113
V
V
in Crosstabs 41
variable importance
in Nearest Neighbor Analysis 107
variance
in Descriptives 36
in Explore 38
in Frequencies 34
in Means 47
in OLAP Cubes 49
in Report Summaries in Columns 156
in Report Summaries in Rows 154
in Summarize 45
variance inflation factor
in Linear Regression 92
varimax rotation
in Factor Analysis 114
visualization
clustering models 119
W
Wald
in Independent-Samples Proportions 57, 58
in One-Sample Proportions 52
in Paired-Samples Proportions 54, 55
Wald (continuity corrected)
in Independent-Samples Proportions 57
in One-Sample Proportions 52
in Paired-Samples Proportions 54
Wald (Continuity Corrected)
in Independent-Samples Proportions 58
in Paired-Samples Proportions 55
Wald H0
in Independent-Samples Proportions 58
Wald H0 (Continuity Corrected)
in Independent-Samples Proportions 58
Wald-Wolfowitz runs
in Two-Independent-Samples Tests 145
Waller-Duncan t test
in GLM 72
in One-Way ANOVA 64
Weighted Kappa
criteria 162
crosstabulation 162
example 161
print 162
statistics 161, 162
weighted least squares
in Linear Regression 89
weighted mean
in Ratio Statistics 165
weighted predicted values
in GLM 75
Welch statistic
in One-Way ANOVA 66
what-if analysis
in simulation 179
Wilcoxon signed-rank test
in Two-Related-Samples Tests 146
One-Sample Nonparametric Tests 127
Related-Samples Nonparametric Tests 133
Wilks' lambda
in Discriminant Analysis 110
Wilson Score
in One-Sample Proportions 52
Wilson Score (continuity corrected)
in One-Sample Proportions 52
Y
Yates' correction for continuity
in Crosstabs 41
Z
z scores
in Descriptives 36
saving as variables 36
zero-order correlations
in Partial Correlations 81
Contents
Chapter 1. Core features
Power Analysis
Means
Power Analysis of One-Sample T Test
Power Analysis of One-Sample T Test: Plot
Power Analysis of Paired-Samples T Test
Power Analysis of Paired-Samples T Test: Plot
Power Analysis of Independent-Samples T Test
Power Analysis of Independent-Samples T Test: Plot
Power Analysis of One-Way ANOVA
Power Analysis of One-way ANOVA: Contrast
Power Analysis of One-way ANOVA: Plot
Proportions
Power Analysis of Related-Sample Binomial Test
Power Analysis of Related-Sample Binomial: Plot
Power Analysis of Independent-Sample Binomial Test
Power Analysis of Independent-Samples Binomial Test: Plot
Power Analysis of One-Sample Binomial Test
Power Analysis of One-Sample Binomial: Plot
Correlations
Power Analysis of One-Sample Pearson Correlation Test
Power Analysis of One-Sample Pearson Correlation: Plot
Power Analysis of One-Sample Spearman Correlation Test
Power Analysis of One-Sample Spearman Correlation: Plot
Power Analysis of Partial Pearson Correlation Test
Power Analysis of Partial Pearson Correlation: Plot
Regression
Power Analysis of Univariate Linear Regression Test
Power Analysis of Univariate Linear Regression: Plot
Codebook
Codebook Output Tab
Codebook Statistics Tab
Frequencies
Frequencies Statistics
Frequencies Charts
Frequencies Format
Descriptives
Descriptives Options
DESCRIPTIVES Command Additional Features
Explore
Explore Statistics
Explore Plots
Explore Power Transformations
Explore Options
EXAMINE Command Additional Features
Crosstabs
Crosstabs layers
Crosstabs clustered bar charts
Crosstabs displaying layer variables in table layers
Crosstabs statistics
Crosstabs cell display
Crosstabs table format
Summarize
Summarize Options
Summarize Statistics
Means
Means Options
OLAP Cubes
OLAP Cubes Statistics
OLAP Cubes Differences
OLAP Cubes Title
Proportions
Proportions introduction
One-Sample Proportions
One-Sample Proportions: Confidence Intervals
One-Sample Proportions: Tests
One-Sample Proportions: Missing Values
Paired-Samples Proportions
Paired-Samples Proportions: Confidence Intervals
Paired-Samples Proportions: Tests
Paired-Samples Proportions: Missing Values
Independent-Samples Proportions
Independent-Samples Proportions: Confidence Intervals
Independent-Samples Proportions: Tests
Independent-Samples Proportions: Missing Values
T Tests
T Tests
Independent-Samples T Test
Independent-Samples T-Test Define Groups
Independent-Samples T Test Options
Paired-Samples T Test
Paired-Samples T Test Options
T TEST Command Additional Features
One-Sample T Test
One-Sample T Test Options
T TEST Command Additional Features
T TEST Command Additional Features
One-Way ANOVA
One-Way ANOVA Contrasts
One-Way ANOVA Post Hoc Tests
One-Way ANOVA Options
ONEWAY Command Additional Features
GLM Univariate Analysis
GLM Model
Build Terms and Custom Terms
Sum of Squares
GLM Contrasts
Contrast Types
GLM Profile Plots
GLM Options
UNIANOVA Command Additional Features
GLM Post Hoc Comparisons
GLM Options
UNIANOVA Command Additional Features
GLM Save
GLM Estimated Marginal Means
GLM Options
GLM Auxiliary Regression Model
UNIANOVA Command Additional Features
Bivariate Correlations
Bivariate Correlations Options
Bivariate Correlations Confidence Interval
CORRELATIONS and NONPAR CORR Command Additional Features
Partial Correlations
Partial Correlations Options
PARTIAL CORR Command Additional Features
Distances
Distances Dissimilarity Measures
Distances Similarity Measures
PROXIMITIES Command Additional Features
Linear models
To obtain a linear model
Objectives
Basics
Model Selection
Ensembles
Advanced
Model Options
Model Summary
Automatic Data Preparation
Predictor Importance
Predicted By Observed
Residuals
Outliers
Effects
Coefficients
Estimated Means
Model Building Summary
Linear Regression
Linear Regression Variable Selection Methods
Linear Regression Set Rule
Linear Regression Plots
Linear Regression: Saving New Variables
Linear Regression Statistics
Linear Regression Options
REGRESSION Command Additional Features
Ordinal Regression
Ordinal Regression Options
Ordinal Regression Output
Ordinal Regression Location Model
Build Terms and Custom Terms
Ordinal Regression Scale Model
Build Terms and Custom Terms
PLUM Command Additional Features
Curve Estimation
Curve Estimation Models
Curve Estimation Save
Partial Least Squares Regression
Model
Options
Nearest Neighbor Analysis
Neighbors
Features
Partitions
Save
Output
Options
Model View
Feature Space
Adding and removing fields/variables
Variable Importance
Peers
Nearest Neighbor Distances
Quadrant map
Feature selection error log
k selection error log
k and Feature Selection Error Log
Classification Table
Error Summary
Discriminant Analysis
Discriminant Analysis Define Range
Discriminant Analysis Select Cases
Discriminant Analysis Statistics
Discriminant Analysis Stepwise Method
Discriminant Analysis Classification
Discriminant Analysis Save
DISCRIMINANT Command Additional Features
Factor Analysis
Factor Analysis Select Cases
Factor Analysis Descriptives
Factor Analysis Extraction
Factor Analysis Rotation
Factor Analysis Scores
Factor Analysis Options
FACTOR Command Additional Features
Choosing a Procedure for Clustering
TwoStep Cluster Analysis
TwoStep Cluster Analysis Options
TwoStep Cluster Analysis Output
The Cluster Viewer
Cluster Viewer
Model Summary View
Clusters View
Transpose Clusters and Features
Sort Features
Sort Clusters
Cell Contents
Cluster Predictor Importance View
Cluster Sizes View
Cell Distribution View
Cluster Comparison View
Navigating the Cluster Viewer
Filtering Records
Hierarchical Cluster Analysis
Hierarchical Cluster Analysis Method
Hierarchical Cluster Analysis Statistics
Hierarchical Cluster Analysis Plots
Hierarchical Cluster Analysis Save New Variables
CLUSTER Command Syntax Additional Features
K-Means Cluster Analysis
K-Means Cluster Analysis Efficiency
K-Means Cluster Analysis Iterate
K-Means Cluster Analysis Save
K-Means Cluster Analysis Options
QUICK CLUSTER Command Additional Features
Nonparametric Tests
One-Sample Nonparametric Tests
Obtaining One-Sample Nonparametric Tests
Fields Tab
Settings Tab
Choose Tests
Binomial Test Options
Chi-Square Test Options
Kolmogorov-Smirnov Options
Runs Test Options
Test Options
User-Missing Values
NPTESTS command additional features
Independent-Samples Nonparametric Tests
To Obtain Independent-Samples Nonparametric Tests
Fields Tab
Settings Tab
Choose Tests
Test Options
User-Missing Values
NPTESTS command additional features
Related-Samples Nonparametric Tests
To Obtain Related-Samples Nonparametric Tests
Fields Tab
Settings Tab
Choose Tests
McNemar's Test: Define Success
Cochran's Q: Define Success
Test Options
User-Missing Values
NPTESTS command additional features
Model View
Model View
Hypothesis Summary
Confidence Interval Summary
One Sample Test
Related Samples Test
Independent Samples Test
Categorical Field Information
Continuous Field Information
Pairwise Comparisons
Homogeneous Subsets
NPTESTS command additional features
Legacy Dialogs
Chi-Square Test
Chi-Square Test Expected Range and Expected Values
Chi-Square Test Options
NPAR TESTS Command Additional Features (Chi-Square Test)
Binomial Test
Binomial Test Options
NPAR TESTS Command Additional Features (Binomial Test)
Runs Test
Runs Test Cut Point
Runs Test Options
NPAR TESTS Command Additional Features (Runs Test)
One-Sample Kolmogorov-Smirnov Test
One-Sample Kolmogorov-Smirnov Test: Simulation
One-Sample Kolmogorov-Smirnov Test: Options
NPAR TESTS Command Additional Features (One-Sample Kolmogorov-Smirnov Test)
Two-Independent-Samples Tests
Two-Independent-Samples Test Types
Two-Independent-Samples Tests Define Groups
Two-Independent-Samples Tests Options
NPAR TESTS Command Additional Features (Two-Independent-Samples Tests)
Two-Related-Samples Tests
Two-Related-Samples Test Types
Two-Related-Samples Tests Options
NPAR TESTS Command Additional Features (Two Related Samples)
Tests for Several Independent Samples
Tests for Several Independent Samples Test Types
Tests for Several Independent Samples Define Range
Tests for Several Independent Samples Options
NPAR TESTS Command Additional Features (K Independent Samples)
Tests for Several Related Samples
Tests for Several Related Samples Test Types
Tests for Several Related Samples Statistics
NPAR TESTS Command Additional Features (K Related Samples)
Multiple Response Analysis
Multiple Response Analysis
Multiple Response Define Sets
Multiple Response Frequencies
Multiple Response Crosstabs
Multiple Response Crosstabs Define Ranges
Multiple Response Crosstabs Options
MULT RESPONSE Command Additional Features
Reporting Results
Reporting Results
Report Summaries in Rows
To Obtain a Summary Report: Summaries in Rows
Report Data Column/Break Format
Report Summary Lines for/Final Summary Lines
Report Break Options
Report Options
Report Layout
Report Titles
Report Summaries in Columns
To Obtain a Summary Report: Summaries in Columns
Data Columns Summary Function
Data Columns Summary for Total Column
Report Column Format
Report Summaries in Columns Break Options
Report Summaries in Columns Options
Report Layout for Summaries in Columns
REPORT Command Additional Features
Reliability Analysis
Reliability Analysis: Statistics
RELIABILITY Command Additional Features
Weighted Kappa
Weighted Kappa: Criteria
Weighted Kappa: Print
Multidimensional Scaling
Multidimensional Scaling Shape of Data
Multidimensional Scaling Create Measure
Multidimensional Scaling Model
Multidimensional Scaling Options
ALSCAL Command Additional Features
Ratio Statistics
Ratio Statistics
ROC Analysis
ROC Analysis: Options
ROC Analysis: Display
ROC Analysis: Define Groups (string)
ROC Analysis: Define Groups (numeric)
ROC Curves
ROC Curve Options
Simulation
To design a simulation based on a model file
To design a simulation based on custom equations
To design a simulation without a predictive model
To run a simulation from a simulation plan
Simulation Builder
Model tab
Equation Editor
Defined Inputs
Simulation tab
Simulated Fields
Fit Details
Sensitivity Analysis
Correlations
Advanced Options
Density Functions
Output
Save
Run Simulation dialog
Simulation tab
Output tab
Working with chart output from Simulation
Chart Options
Geospatial Modeling
Selecting Maps
Selecting a Map
Geospatial Relationship
Set Coordinate System
Setting the Projection
Projection and Coordinate System
Data Sources
Add a Data Source
Data and Map Association
Validate Keys
Geospatial Association Rules
Define Event Data Fields
Select Fields
Output
Save
Rule Building
Binning and Aggregation
Spatial Temporal Prediction
Select Fields
Time Intervals
Aggregation
Output
Model Options
Save
Advanced
Finish
Notices
Trademarks
Index
SPSS Basics
Ø Tutorial 1: SPSS Windows
There are six different windows that can be opened when using SPSS. The following will give
a description of each of them.
The Data Editor
The Data Editor is a spreadsheet in which you define your variables and enter data. Each row
corresponds to a case while each column represents a variable. The title bar displays the
name of the open data file or “Untitled” if the file has not yet been saved. This window opens
automatically when SPSS is started.
The Output Navigator
The Output Navigator window displays the statistical results, tables, and charts from the
analysis you performed. An Output Navigator window opens automatically when you run a
procedure that generates output. In the Output Navigator windows, you can edit, move,
delete and copy your results in a Microsoft Explorer-like environment.
The Pivot Table Editor
Output displayed in pivot tables can be modified in many ways with the Pivot Table Editor.
You can edit text, swap data in rows and columns, add color, create multidimensional tables,
and selectively hide and show results.
The Chart Editor
You can modify and save high-resolution charts and plots by invoking the Chart Editor for a
certain chart (by double-clicking the chart) in an Output Navigator window. You can change
the colors, select different type fonts or sizes, switch the horizontal and vertical axes, rotate
3-D scatterplots, and change the chart type.
The Text Output Editor
Text output not displayed in pivot tables can be modified with the Text Output Editor. You can
edit the output and change font characteristics (type, style, color, size).
The Syntax Editor
You can paste your dialog box selections into a Syntax Editor window, where your selections
appear in the form of command syntax.
Ø Tutorial 2: Starting an SPSS Session
1. Logon to your Polaris account.
2. Select Programs from the Start menu.
3. Select Scientific from the Programs drop down menu.
4. Select SPSS 7.5 from the Scientific drop down menu.
Ø Tutorial 3: Getting Help on SPSS
q Locating Topics in the Help Menu
1. Select Topics from the Help Menu on the Data Editor.
2. Select the Contents tab. This will give a set of books to look under for the required
information.
q Searching for Information in the Help Menu
1. Select Topics from the Help menu.
2. Select the Index tab.
3. Type a word in the text box describing the information to search for. This will give a list of
headings on the desired information.
Ø Tutorial 4: Ending an SPSS Session
1. Select Exit SPSS from the File menu on the Data Editor.
Creating and Manipulating Data in SPSS
When creating or accessing data in SPSS, the Data Editor window is used.
Ø Tutorial 1: Creating a New Data Set
There are three steps that must be followed to create a new data set in SPSS. The following
tutorial will list the steps needed and will give an example of creating a new data set.
STEP 1: Defining Variables in a New Data Set
Variables are defined one at a time using the Define Variable dialog box. This box assigns
data definition information to variables. To access the Define Variable dialog box, double-
click on the top of a column where the word var appears or select Define Variable from the
Data menu.
Variable Name: This field describes the name of the variable being defined. To change the
name, place the cursor in this field and type the name. The variable name
must begin with a letter of the alphabet and cannot exceed 8 characters.
Spaces are not allowed within the variable name. Each variable name must
be unique.
Type: This field describes the type of variable that is being defined.
To change this field, click on the Type… button. This will open the Define Variable
Type: dialog box. Select the appropriate type of data. When done, click on the Continue
button.
Variable Label: There are two types of variable labels:
1. Variable Label: A name for the variable that can be up to 120 characters
long and can include spaces (which variable names cannot). If a variable
label is entered, the label will be printed on charts and reports instead of
the name, making them easier to understand.
2. Value Label: Provides a key for translating numeric data.
To change the variable label, click on the Labels… button. This will open the
Define Labels: dialog box. Enter the appropriate information into the fields.
When done, click on the Continue button.
Missing Values: This field indicates which subset of the data will not be included in the data
set. To change this field, click on the Missing Values… button. This will
open the Define Missing Values: dialog box. Enter the appropriate
information into the fields. When done, click on the Continue button.
Alignment: This field indicates column alignment and width. To change this field, click on the
Column Format… button. This will open the Define Column Format: dialog box.
Enter the appropriate information into the fields. When done, click on the
Continue button.
STEP 2: Entering Data in a New Data Set
Once all of the variables are defined, enter the data manually (assuming that the data is not
already in an external file). The data is typed into the spreadsheet one cell at a time. Each
cell represents an observation.
When information is typed into a cell, it appears in the edit area at the top of the window. The
information is entered into the cell when the active cell is changed. The mouse and the tab,
enter, and cursor keys can be used to enter data.
To indicate a cell that does not have a data value, a period is entered. A period represents
the system-missing value.
STEP 3: Saving a New Data Set
Work performed on a data set only lasts during the current session. To retain the current data
set, it must be saved to a file.
1. Select Save from the File menu. The Save Data As dialog box opens.
2. From the Save as Type drop-down list, select SPSS (*.sav).
3. From the Save in drop-down list, select the path where the file will be saved.
4. In the File name box, enter a name for the file. SPSS automatically adds the extension
.sav.
5. Click Save.
Problem
The following data regarding a person’s name, age and weight must be entered into a data
set using SPSS.
Name     Age   Weight
Mark     39    250
Allison  43    125
Tom      27    180
Cindy    24    130
Solution
1. Double click on the top of the first column in the Data Editor window. This will open the
Define Variable dialog box. Type Name in the Variable Name box.
2. Select Type… in the Change Settings area. This will open the Define Variable Type
dialog box. Left click on String.
3. Select Continue. This will close the Define Variable Type dialog box and will re-open the
Define Variable dialog box.
4. Click OK. This will define the first column as a string variable called Name.
5. Double click on the top of the second column. This will open the Define Variable dialog
box. Type Age in the Variable Name box.
6. Select Type… in the Change Settings area. This will open the Define Variable Type
dialog box. Left click on Numeric. In the Width box, set it to 3. In the Decimal Places box,
set it to 0.
7. Select Continue. This will close the Define Variable Type dialog box and will re-open the
Define Variable dialog box.
8. Click OK. This will define the second column as a numeric variable called Age.
9. Double click on the top of the third column. This will open the Define Variable dialog box.
Type Weight in the Variable Name box.
10. Select Type… in the Change Settings area. This will open the Define Variable Type
dialog box. Left click on Numeric. In the Width box, set it to 3. In the Decimal Places box,
set it to 0.
11. Select Continue. This will close the Define Variable Type dialog box and will re-open the
Define Variable dialog box.
12. Click OK. This will define the third column as a numeric variable called Weight.
13. Enter the above information into the cells of the spreadsheet. The Data Editor should look
like the following.
14. Select Save from the File menu.
15. Choose the path where the file will be saved.
16. Type temp in the File name box and click Save. SPSS will save this file as temp.sav in
the specified directory.
Ø Tutorial 2: Creating a New Data Set From Other File Formats
SPSS is designed to handle a wide variety of formats including:
• Spreadsheet files created with Lotus 1-2-3 and Excel
• Database files created with dBASE
• Tab-delimited and other types of ASCII text files
• SPSS data files created on other operating systems
• SYSTAT data files
The following tutorial will indicate how to read in a spreadsheet or text file into a data set in
SPSS. Examples will be given of each method.
q Reading Spreadsheet Files (Lotus 1-2-3 and Excel)
Problem
Read the following file, ~/SPSS/nba.xls, into a SPSS data set.
Solution
1. From the File menu, select Open. This will open the Open File dialog box.
2. Change the path name to your home directory and open the SPSS folder. This is
where the file to be opened should be.
3. Select Excel(*.xls) (or Lotus(*.w*) for Lotus files) from the Files of type box.
4. Select nba.xls.
5. Click Open. This will open the Opening File Options dialog box. Click on the Read
variable names dialog box. Click OK. This will close the Opening File Options dialog
box and will open nba.xls in the Data Editor. The Output Navigator will also be
opened.
NOTE:
If only a partial file is to be read into SPSS, the following steps are taken.
• For Lotus files, in the Range box, specify the beginning column letter and row
number, followed by two periods, followed by the ending column letter and row
number, e.g., A1..C12.
• For Excel files, in the Range box, specify the beginning column letter and row
number, followed by a colon, followed by the ending column letter and row
number, e.g., A1:C12.
Window Output
q Reading Text Files
Two ways to read a text file are by using freefield or fixed columns.
Freefield
This method is used if the variables are recorded in the same order for each case but not
necessarily in the same column locations.
Problem
Read the following file, ~/SPSS/citydata.txt, into an SPSS data set.
Solution
1. Select Read ASCII Data from the File Menu. From the Read ASCII Data drop down
menu, choose Freefield. This will open the Define Freefield Variables dialog box.
2. Specify the variable name and data type. The following gives a description of each of
these fields.
Name: Variable names must begin with a letter and cannot exceed eight characters.
Each variable name must be unique.
Data Type: Select a data type.
3. Click Add for each separate variable. This will enter the variable name and data type
onto the Defined Variables list.
4. Once all variables are defined, click Browse to specify the name of the file to be read.
This will open the Define Freefield Variables: Browse dialog box. Change the path
name to your home directory and open the SPSS folder. This is where the file to be
opened should be.
5. Select citydata.txt and click Open. The Define Freefield Variables dialog box will be
returned.
6. Click OK. This will close the Define Freefield Variables dialog box and will open
citydata.txt in the Data Editor.
Window Output
Fixed Columns
This method is used if each variable is recorded in the same column location for each
case in the data file.
Problem
Read the following file, ~/SPSS/nba.txt, into an SPSS data set.
Solution
1. Select Read ASCII Data from the File Menu. From the Read ASCII Data drop down
menu, choose Fixed Columns. This will open the Define Fixed Variables dialog box
which will be used to define each variable.
2. Specify the variable name, record, column locations, and data type. The following
gives a description of each of these fields.
Name: Variable names must begin with a letter and cannot exceed eight characters.
Each variable name must be unique.
Record: A case can have data on more than one line. The record number indicates
the line within the case where the variable is located.
Start Column/End Column: These specifications indicate the location of the variable
within the record. The value for the variable can appear
anywhere within the range of columns.
Data Type: Select a data type.
For this problem, the following is a list of the required information.
Name Record Column Locations Data Type
Player 1 1-3 Numeric as is
Height 1 4-7 Numeric as is
Weight 1 8-12 Numeric as is
3. When all information is added for a variable, click Add. This will enter the record
number, start and end columns, variable name, and data type onto the Defined
Variables list.
4. Once all variables are defined, click Browse to specify the name of the file to be read.
This will open the Define Fixed Variables: Browse dialog box. Change the path name
to your home directory and open the SPSS folder. This is where the file to be opened
should be.
5. Select nba.txt and click Open. The Define Fixed Variables dialog box will be returned.
6. Click OK. This will close the Define Fixed Variables dialog box and will open nba.txt
in the Data Editor.
Window Output
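For comparison, the same fixed-column file can be read outside SPSS. The following is a minimal sketch in Python (ours, not part of the tutorial), assuming pandas is installed and nba.txt uses the column positions listed above:

    # Column positions 1-3, 4-7, 8-12 from the table above, converted to
    # Python's 0-based, end-exclusive offsets. File name as in the tutorial.
    import pandas as pd

    colspecs = [(0, 3), (3, 7), (7, 12)]
    df = pd.read_fwf("nba.txt", colspecs=colspecs,
                     names=["player", "height", "weight"])
    print(df.head())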
Ø Tutorial 3: Opening an Existing SPSS Data Set
1. Select Open from the File menu. This will open the Open File dialog box.
2. From the Files of type drop-down list, select .sav.
3. From the Look in drop-down list, select the appropriate drive where the file is located.
4. In the File name box, type in the name of the file to be opened.
5. Click Open.
Ø Tutorial 4: Printing a Data Set
1. Highlight the data that will be printed. To print all of the data, ignore this step and
continue to step 2.
2. Select Print from the File menu. The Print dialog box opens. Change the options where
appropriate.
3. Click OK.
Generating Descriptive Statistics in SPSS
The following tutorials will demonstrate how to generate descriptive statistics in SPSS.
Ø Tutorial 1: Mean, Sum, Standard Deviation, Variance, Minimum Value,
Maximum Value, and Range
When generating these statistics, the Data Editor must be open with the appropriate data set
before continuing.
Problem
Using the data in the file nba.txt that is located in ~/SPSS/, determine the mean, sum,
standard deviation, variance, minimum value, maximum value, and range for height only.
Solution
1. From the Statistics menu, select Summarize. From the Summarize drop down menu,
select Descriptives. This will open the Descriptives dialog box.
2. In the variable list, select the variable height. Left click on the right arrow button between
the boxes to move this variable over to the Variable(s) box. To calculate statistics for
many variables simultaneously, add each of them to the Variable(s) box.
3. Click on the Options button. This will open the Descriptives: Options dialog box.
Click on mean, sum, standard deviation, variance, minimum value, maximum value, and
range.
Click on the Continue button when done.
4. Click OK. The Descriptives dialog box closes and SPSS activates the Output Navigator to
illustrate the statistics.
Window Output
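The same statistics can be reproduced outside SPSS; here is a minimal sketch in Python (ours), assuming pandas and the fixed-column layout used earlier:

    # Mean, sum, standard deviation, variance, min, max, and range for height.
    # Like SPSS Descriptives, pandas divides by N - 1 for std and variance.
    import pandas as pd

    df = pd.read_fwf("nba.txt", colspecs=[(0, 3), (3, 7), (7, 12)],
                     names=["player", "height", "weight"])
    h = df["height"]
    print(h.mean(), h.sum(), h.std(), h.var(),
          h.min(), h.max(), h.max() - h.min())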
Ø Tutorial 2: Correlation
Two or more variables may be included in a correlation matrix. When generating the
correlation matrix, the Data Editor must be open with the appropriate data set before
continuing.
Problem
Using the data in the file nba.txt that is located in ~/SPSS/, determine the correlation between
a player’s height and weight.
Solution
1. From the Statistics menu, select Correlate. From the Correlate drop down menu, select
Bivariate. This will open the Bivariate Correlations dialog box.
2. In the variable list, select height and weight. Left click on the right arrow button between
the boxes to move a variable over to the Variable(s) box.
3. Select the type of correlation coefficients that will be generated. In this case, use
Pearson.
4. Select the test of significance to be used. In this case, use two-tailed.
5. Check mark the Flag significant correlations box.
6. Click on the Options…button. This will open the Bivariate Correlations: Options dialog
box.
To display the mean and standard deviation for each variable, select Means and
standard deviations. In this case, this option is not used.
To display cross product deviations and covariances for each pair of variables, select
Cross-product deviations and covariances. In this case, this option will not be used.
When done, click the Continue button.
7. Click OK. The Bivariate Correlations dialog box closes and SPSS activates the Output
Navigator. The correlation coefficient for each pair of variables is displayed. The number
of cases appears at the bottom.
Window Output
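For reference, the same Pearson coefficient can be computed in Python; a sketch of ours, under the same file assumptions as above:

    # Pearson correlation between height and weight; .corr() is Pearson
    # by default.
    import pandas as pd

    df = pd.read_fwf("nba.txt", colspecs=[(0, 3), (3, 7), (7, 12)],
                     names=["player", "height", "weight"])
    print(df["height"].corr(df["weight"]))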
Generating Graphical Statistics in SPSS
The following tutorials introduce how to create scatter plots, histograms, stem and leaf plots, and
box plots using the SPSS Graphs menu located on the Data Editor menu bar.
Ø Tutorial 1: How to Generate Scatter Plots
Problem
Using the data in ~/SPSS/nba.txt, create an x-y plot of a player’s weight versus height.
Solution
1. From the Graphs menu, select Scatter… This will open the Scatterplot dialog box.
2. Select the Simple icon and click Define. This will open the Simple Scatterplot dialog box.
3. From the variable list, select weight. Left click on the right arrow button between the
variable list and the Y Axis box to move the variable, weight, to this box.
4. From the variable list, select height. Left click on the right arrow button between the
variable list and the X Axis box to move the variable, height, to this box.
5. Click on the Options… button. This will open the Options dialog box.
To display a report of missing values, select Display groups defined by missing values. In
this case, this option will not be used.
When done, click the Continue button.
6. To display titles, subtitles, or footnotes on the scatter plot, click on the Titles… button. This
will open the Titles dialog box.
In the Line 1 box, type “Scatter Plot Height vs. Weight”.
When done, click the Continue button.
7. Click OK. The Simple Scatterplot dialog box closes and SPSS activates the Output
Navigator.
Window Output
Ø Tutorial 2: How to Generate a Histogram
Problem
Using the data in ~/SPSS/statdata.txt, create a histogram of per capita income.
Solution
1. From the Graphs menu, select Histogram… This will open the Histogram dialog box.
2. From the variable list, select income. Left click on the right arrow button between the
variable list and the Variable box to move the variable, income, to this box.
3. Select Display normal curve box to show a normal curve on the histogram.
4. To display titles, subtitles, or footnotes on the histogram, click on the Titles… button. This
will open the Titles dialog box.
In the Line 1 box, type “Histogram of Per Capita Income”.
Click on the Continue button when done.
5. Click OK. The Histogram dialog box will close and SPSS activates the Output Navigator
to display the histogram.
Window Output
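An equivalent histogram with a superimposed normal curve can be drawn in Python; a minimal sketch of ours, assuming statdata.txt is whitespace-delimited and contains an income column:

    # Histogram of per capita income with a fitted normal curve overlaid.
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from scipy.stats import norm

    income = pd.read_csv("statdata.txt", sep=r"\s+")["income"]  # assumed column name
    plt.hist(income, bins=15, density=True)
    xs = np.linspace(income.min(), income.max(), 200)
    plt.plot(xs, norm.pdf(xs, income.mean(), income.std()))
    plt.title("Histogram of Per Capita Income")
    plt.show()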
Ø Tutorial 3: How to Generate a Stem and Leaf Plot
Problem
Using the data in ~/SPSS/statdata.txt, create a stem and leaf plot of per capita income.
Solution
1. From the Statistics menu, select Summarize. From the Summarize drop-down menu,
select Explore… This will open the Explore dialog box.
2. From the variable list, select income. Left click on the right arrow button between the
variable list and the Dependent List box to move the variable, income, to this box.
3. Click on the Statistics… button. This will open the Explore: Statistics dialog box.
To display descriptive statistics, select Descriptives.
To display robust maximum-likelihood estimators of location, select M-estimators.
To display cases with the five largest and smallest values, select Outliers.
To display percentiles, select Percentiles.
In this case, none of these options are used.
When done, click on the Continue button.
4. In the Display area, select Plots. This will display the specified plot only (i.e. no statistics
are given).
5. Click on the Plots… button. This opens the Explore: Plots dialog box.
Ensure that the Stem-and-leaf box is selected.
Click on the Continue button.
6. Click on the Options button. This will open the Explore: Options dialog box.
To exclude cases that have missing values for any of the variables used in any of the
analyses, select Exclude cases listwise. In this case, this option is used.
To exclude cases that have missing values for either or both of the pair of variables in a
specific correlation coefficient, select Exclude cases pairwise.
However, to treat missing values as a separate category, select Report values.
Click the Continue button when done.
7. Click OK. This will close the Explore dialog box and SPSS activates the Output Navigator
to display the stem and leaf plot.
Window Output
Ø Tutorial 4: How to Generate a Box Plot
Problem
Using the data in the file ~/SPLUS/statdata.dat, produce a boxplot of per capita income.
Solution
1. From the Graphs menu, select Boxplot… This will open the Boxplot dialog box.
2. Select the Simple button.
3. Select Summaries of separate variables in the Data in Chart Are area.
4. Click on the Define button. This will open the Define Simple Boxplot: Summaries of
Separate Variables dialog box.
5. From the variable list, select income. Left click on the right arrow button between the
variable list and the Boxes Represent box to move the variable, income, to this box.
6. Click on the Options… button. This will open the Options dialog box.
To display a report of missing values, select Display groups defined by missing values. In
this case, this option will not be used.
When done, click the Continue button.
7. Click OK. This will close the Define Simple Boxplot: Summaries of Separate Variables
dialog box and SPSS activates the Output Navigator to display the box plot.
Window Output
Statistical Models in SPSS
Ø Tutorial 1: Linear Regression
The Regression submenu on the Statistics menu of the Data Editor provides regression
techniques. The following tutorial will introduce how to perform linear regression using SPSS.
The output contains goodness of fit statistics and the coefficients for the variables.
Problem
Using the data in ~/SPSS/nba.txt, compute a least squares regression line to investigate if a
player’s height can predict his weight.
Solution
1. From the Statistics menu, select Regression. From the Regression drop down menu,
select Linear… This will open the Linear Regression dialog box.
2. From the variable list, select weight. Left click on the right arrow button between the
variable list and the Dependent box to move the variable, weight, to this box.
3. From the variable list, select height. Left click on the right arrow button between the
variable list and the Independent(s) box to move the variable, height, to this box.
4. Select the method by which the independent variables are entered into the analysis. From the
Method drop-down menu, there is a choice of enter, stepwise, remove, backward, and
forward. In this case, we will use the enter method.
5. To limit the analysis to a subset of cases having a particular value for a variable, enter
this variable into the Selection Variable box. In this case, this option is not used.
6. Determine the variable that will identify the points on plots. Select the variable and left
click on the right arrow between the variable list and the Case Labels box. In this case,
this option is not used.
7. To display statistics, click on the Statistics… button. This will open the Linear Regression:
Statistics dialog box.
Select the appropriate statistics to be displayed and click on the Continue button when
done. In this case, this option is not used.
8. To display specific plots, click on the Plots… button. This will open the Linear
Regression: Plots dialog box.
From the variable list, select the variable that will be displayed on the Y axis. Left click on
the right arrow button between the variable list and the Y box. Do this also for the X axis.
When done, click on the Next button. If more plots are needed, follow the same
procedure. In this case, this option is not used.
When done defining the plots, click on the Continue button.
9. To indicate which statistics should be displayed, click on the Save button. This will open
the Linear Regression: Save dialog box.
Select the appropriate statistics. To save the coefficient statistics, click on the box and
indicate the file to which you want them saved. In this case, this option is not used.
10. To indicate the stepping method criteria, click the Options… button. This will open the
Linear Regression: Options dialog box.
Select the method to be used. When the selection is made, click on the Continue button.
11. Click OK. This will close the Linear Regression dialog box. SPSS activates the Output
Navigator to display the results of the analysis.
Window Output
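The same least-squares line can be fit in Python; a minimal sketch of ours, under the same nba.txt assumptions as earlier:

    # Regress weight on height; report slope, intercept, and R-squared.
    import pandas as pd
    from scipy.stats import linregress

    df = pd.read_fwf("nba.txt", colspecs=[(0, 3), (3, 7), (7, 12)],
                     names=["player", "height", "weight"])
    fit = linregress(df["height"], df["weight"])
    print(fit.slope, fit.intercept, fit.rvalue ** 2)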
Ø Tutorial 2: Analysis of Variance
Problem
Using the data in ~/SPSS/teller1.txt, test if the mean number of customers served per hour by
each of the four tellers is the same.
Solution
1. From the Statistics menu, select Compare Means. From the Compare Means drop down
menu, select One-Way ANOVA… This will open the One-Way ANOVA dialog box.
2. From the variable list, select num_cus. Left click on the right arrow button between the
variable list and the Dependent List box to move the variable, num_cus, to this box.
3. From the variable list, select teller. Left click on the right arrow button between the
variable list and the Factor box to move the variable, teller, to this box.
4. Click on the Contrasts… button. This will open the One-Way ANOVA: Contrasts dialog
box.
To partition between-groups sum of squares into polynomial trend components, select
the Polynomial box and select the highest degree of the polynomial to be modelled. In
this case, this option will not be used.
To enter a numeric coefficient value for each level, click Add. However, the number of
coefficients must equal the number of groups or the analysis is not performed. Because
the levels in this problem are already numeric, this option does not need to be used.
5. Click on the Post Hoc… button. This will open the One-Way ANOVA: Post Hoc Multiple
Comparisons dialog box.
If equal variances are assumed between the different factor levels, select the type of
comparison method to be used.
If equal variances are not assumed between the different factor levels, select the type of
comparison method to be used.
To get a description on each of the methods listed, right click on the word. A description
window will appear.
Click the Continue button when done.
6. Click on the Options… button. This will open the One-Way ANOVA: Options dialog box.
To display descriptive statistics, select Descriptive in the Statistics area. In this case,
select this option.
To exclude cases that have missing values for the variable involved in that test, select
Exclude cases analysis by analysis. In this case, select this option.
However, to exclude cases that have missing values for any of the variables used in any
of the analyses, select Exclude cases listwise.
Click the Continue button when done.
7. Click OK. The One-Way ANOVA dialog box closes and SPSS activates the Output
Navigator. The means of the dependent variable for each category of the independent
variable can be found under “Descriptives”.
Window Output
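The same one-way ANOVA can be run in Python; a minimal sketch of ours, assuming teller1.txt is whitespace-delimited with num_cus and teller columns:

    # F test that mean customers served per hour is equal across tellers.
    import pandas as pd
    from scipy.stats import f_oneway

    df = pd.read_csv("teller1.txt", sep=r"\s+")
    groups = [g["num_cus"] for _, g in df.groupby("teller")]
    print(f_oneway(*groups))  # F statistic and p value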
Measures of Variability & Dispersion
The Concept of Dispersion
Dispersion refers to the variety, diversity, or amount of variation among scores
The greater the dispersion of a variable, the greater the range of scores and the greater the differences between scores
Introduction
Mueller’s & Schuessler’s Index of Qualitative Variation
Range
Variance
Standard deviation
Measures of variability or dispersion– looking at the central tendency is not enough to get a full understanding of the data.
Nominal data: Mueller’s and Schuessler’s index of qualitative variation.
Range– the distance over which particular proportions of scores are spread (like the interval range we already talked about).
Deviation Score– distances of scores from the means of their distribution.
Standard Deviation– the square root of the variance—important for decision making.
Index of qualitative variation
IQV = (sum of observed products / sum of expected products) × 100
Number of products = k(k − 1)/2, where k is the number of categories and each "product" is the product of the frequencies of one pair of categories.
Mueller's and Schuessler's index of qualitative variation– the observed heterogeneity for a particular attribute, expressed as a percentage of the expected (maximum) heterogeneity of that attribute.
The ×100 turns the proportion into a percentage.
Heterogeneity– amount of diversity
Sum of observed products = the observed amount of heterogeneity
Sum of products of the expected frequencies = the maximum (expected) heterogeneity
Distribution of 1,000 rape victims according to relationship with rapist

Relationship of rapist to victim   Observed rapes   Expected rapes
Date                               200              200
Close friend                       100              200
Family acquaintance                200              200
Stranger                           350              200
Relative                           150              200
Totals                             1000             1000

IQV = 95.6
An IQV of 100% occurs when all categories are the same size; 200 in each category would have meant an equal distribution, i.e., maximum heterogeneity.
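To make the arithmetic concrete, here is a minimal sketch in Python (ours, not part of the slides) that reproduces the IQV above:

    # IQV = (sum of observed products / sum of expected products) x 100,
    # where a product is the product of the frequencies of a pair of categories.
    from itertools import combinations

    observed = [200, 100, 200, 350, 150]        # frequencies from the table
    k = len(observed)                           # 5 categories
    expected = sum(observed) / k                # 200 per category

    obs_products = sum(a * b for a, b in combinations(observed, 2))
    exp_products = expected ** 2 * k * (k - 1) / 2

    print(round(obs_products / exp_products * 100, 1))   # 95.6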
Range (R)
Range indicates the distance between the highest and lowest scores in a distribution
Range (R) = High Score – Low Score
Quick and easy indication of variability
Can be used with ordinal or interval-ratio variables
Why can’t the range be used with variables measured at the nominal level?
The range
20, 23, 25, 27, 28, 30, 35, 35, 35, 36, 39, 40, 42, 43, 44, 45, 45, 45, 46, 49
Range– distance over which 100 percent of the scores in a distribution are spread.
Range = 49 − 20 = 29
Locate Q1 and Q3:
Q1 position: 0.25 × 20 = 5, so Q1 is the 5th score in the array, 28
Q3 position: 0.75 × 20 = 15, so Q3 is the 15th score in the array, 44
Interquartile Range: 44 − 28 = 16
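A quick check of the range and interquartile range in Python (a sketch of ours, using the slide's positional rule):

    scores = [20, 23, 25, 27, 28, 30, 35, 35, 35, 36,
              39, 40, 42, 43, 44, 45, 45, 45, 46, 49]

    r = max(scores) - min(scores)                 # 49 - 20 = 29
    q1 = scores[int(0.25 * len(scores)) - 1]      # 5th score = 28
    q3 = scores[int(0.75 * len(scores)) - 1]      # 15th score = 44
    print(r, q3 - q1)                             # 29 16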
Interquartile Range (Q)
A type of range measure
Considers only the middle 50% of the cases in a distribution
Avoids some of the problems of the range by focusing on just the middle 50% of scores
Limitation: Because the Interquartile Range is based on only two scores, it fails to yield any information from all of the other scores
Satisfaction Score

Interval   f    cf
175-179    4    111
170-174    6    107
165-169    3    101
160-164   13     98
155-159    8     85
150-154    7     77
145-149   10     70
140-144    9     60
135-139   10     51
130-134   15     41
125-129   11     26
120-124   10     15
115-119    5      5
N = 111
Mdn = L + (fn/ff)(i), where L is the lower real limit of the interval containing the median, fn the frequencies needed to reach the median, ff the frequencies found in that interval, and i the interval size.
111 × .5 = 55.5
A cumulative frequency of 51 is as close as we can get to 55.5 without passing it, so the median falls in the 140-144 interval.
Mdn = 139.5 + (4.5/9) × 5
= 139.5 + 22.5/9
= 139.5 + 2.5
= 142
Range: 179.5 − 114.5 = 65. The range is a very unstable measure because it is very sensitive to deviant scores– a poor choice if you have outliers.
Interquartile Range (Q) for grouped data: Q1 position = 111 × .25 = 27.75 and Q3 position = 111 × .75 = 83.25; locate each in the cf column and interpolate with the same formula used for the median (see the sketch below).
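A minimal sketch in Python (ours) of the grouped-data interpolation, reproducing Mdn = 142 for the table above; the same function gives the grouped quartiles:

    intervals = [                 # (lower real limit, frequency), low to high
        (114.5, 5), (119.5, 10), (124.5, 11), (129.5, 15), (134.5, 10),
        (139.5, 9), (144.5, 10), (149.5, 7), (154.5, 8), (159.5, 13),
        (164.5, 3), (169.5, 6), (174.5, 4),
    ]
    width = 5.0
    n = sum(f for _, f in intervals)              # 111

    def grouped_percentile(p):
        target = n * p                            # 55.5 for the median
        cf = 0                                    # cumulative frequency so far
        for lower, f in intervals:
            if cf + f >= target:
                return lower + (target - cf) / f * width
            cf += f

    print(grouped_percentile(0.50))               # 142.0
    print(grouped_percentile(0.75) - grouped_percentile(0.25))  # grouped IQR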
Range (R): Limitations
Range is based on only two scores:
Distorted by atypically high or low scores
No information about variation between high and low scores
The average deviation

AD = Σ|x − x̅| / N

x     x − x̅
23    −6
30     1
31     2
15   −14
46    17

(x̅ = 145/5 = 29)

The AD is the average distance of scores from the mean of their distribution.
x − x̅ = deviation score
Σ|x − x̅| = the sum of the absolute deviation scores
N = sample size.
Subtract the mean from each score to get x − x̅.
Taking the absolute value of each deviation turns the deviations into positive numbers; here AD = (6 + 1 + 2 + 14 + 17)/5 = 40/5 = 8.
A way to check yourself– if the mean was calculated correctly, the sum of the signed deviation scores will always equal 0.
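In Python, a sketch (ours) of the average deviation for the five scores above:

    scores = [23, 30, 31, 15, 46]
    mean = sum(scores) / len(scores)              # 29.0
    deviations = [x - mean for x in scores]       # sum to 0 if the mean is right
    ad = sum(abs(d) for d in deviations) / len(scores)
    print(sum(deviations), ad)                    # 0.0 8.0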
Standard Deviation: Calculations
To solve:
Subtract mean from each score
Square the deviations
Sum the squared deviations
Divide the sum of the squared deviations by N
Find the square root of the result
Ungrouped Data: Variance & Standard Deviation

Case   X    X − X̅   (X − X̅)²
1      20    −5       25
2      21    −4       16
3      22    −3        9
4      23    −2        4
5      24    −1        1
6      25     0        0
7      26     1        1
8      27     2        4
9      28     3        9
10     29     4       16
11     30     5       25
N = 11, X̅ = 25, Σ(X − X̅)² = 110

s² = Σ(X − X̅)² / N = 110/11 = 10
s = √10 ≈ 3.16

Variance– the sum of the squared deviation scores divided by N
Σ(X − X̅)² = the sum of the squared deviation scores
N = sample size.
The standard deviation is the square root of the variance.
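A sketch in Python (ours) following the five calculation steps above for the eleven scores in the table:

    scores = list(range(20, 31))                  # 20, 21, ..., 30
    n = len(scores)
    mean = sum(scores) / n                        # 25.0
    ss = sum((x - mean) ** 2 for x in scores)     # sum of squared deviations, 110.0
    variance = ss / n                             # 10.0
    sd = variance ** 0.5                          # square root, about 3.16
    print(variance, round(sd, 2))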
Grouped Data: Variance & Standard Deviation

s² = Σf(m − X̅)² / N, where m is the midpoint of each interval, f is the interval's frequency, and X̅ = Σfm / N; the standard deviation s is the square root of the variance.
Distribution of Scores

Interval   f
652-653    4
650-651    5
648-649    6
646-647    7
644-645    9
642-643   13
640-641   15
638-639   13
636-637   10
634-635    8
632-633    6
630-631    4
N = 100
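A sketch in Python (ours) of the grouped variance and standard deviation, using the interval midpoints from the table above:

    table = [(652.5, 4), (650.5, 5), (648.5, 6), (646.5, 7), (644.5, 9),
             (642.5, 13), (640.5, 15), (638.5, 13), (636.5, 10), (634.5, 8),
             (632.5, 6), (630.5, 4)]              # (midpoint m, frequency f)
    n = sum(f for _, f in table)                  # 100
    mean = sum(m * f for m, f in table) / n       # grouped mean, 640.98
    variance = sum(f * (m - mean) ** 2 for m, f in table) / n
    print(mean, variance, variance ** 0.5)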
IBM SPSS Statistics V27 Brief Guide
Note
Before using this information and the product it supports, read the information in "Notices" on page 81.
Product Information
This edition applies to version 27, release 0, modification 0 of IBM® SPSS® Statistics and to all subsequent releases and modifications until otherwise indicated in new editions.
© Copyright International Business Machines Corporation.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Chapter 1. Introduction…………………………………………………………………………….. 1
Sample Files………………………………………………………………………………………………………………………………. 1
Opening a Data File…………………………………………………………………………………………………………………….. 1
Running an Analysis …………………………………………………………………………………………………………………… 2
Creating Charts……………………………………………………………………………………………………………………………4
Chapter 2. Reading Data……………………………………………………………………………. 7
Basic Structure of IBM SPSS Statistics Data Files………………………………………………………………………….. 7
Reading IBM SPSS Statistics Data Files………………………………………………………………………………………… 7
Reading Excel Data…………………………………………………………………………………………………………………….. 8
Reading Data from a Database……………………………………………………………………………………………………11
Reading Data from a Text File……………………………………………………………………………………………………. 13
Chapter 3. Using the Data Editor……………………………………………………………….. 17
Entering Numeric Data……………………………………………………………………………………………………………….17
Entering String Data…………………………………………………………………………………………………………………..18
Defining Data…………………………………………………………………………………………………………………………….19
Adding Variable Labels………………………………………………………………………………………………………….19
Changing Variable Type and Format………………………………………………………………………………………. 20
Adding Value Labels…………………………………………………………………………………………………………….. 20
Handling Missing Data…………………………………………………………………………………………………………..21
Missing Values for a Numeric Variable…………………………………………………………………………………… 21
Missing Values for a String Variable………………………………………………………………………………………..22
Chapter 4. Examining Summary Statistics for Individual Variables……………………23
Level of Measurement………………………………………………………………………………………………………………. 23
Summary Measures for Categorical Data……………………………………………………………………………………..23
Charts for Categorical Data…………………………………………………………………………………………………… 24
Summary Measures for Scale Variables……………………………………………………………………………………….25
Histograms for Scale Variables……………………………………………………………………………………………… 26
Chapter 5. Creating and editing charts……………………………………………………….. 27
Chart creation basics………………………………………………………………………………………………………………… 27
Using the Chart Builder gallery……………………………………………………………………………………………… 27
Defining variables and statistics……………………………………………………………………………………………. 28
Adding text…………………………………………………………………………………………………………………………. 29
Creating the chart…………………………………………………………………………………………………………………30
Chapter 6. Working with Output…………………………………………………………………33
Using the Viewer………………………………………………………………………………………………………………………. 33
Using the Pivot Table Editor………………………………………………………………………………………………………. 34
Accessing Output Definitions…………………………………………………………………………………………………34
Pivoting Tables……………………………………………………………………………………………………………………. 35
Creating and Displaying Layers………………………………………………………………………………………………36
Editing Tables……………………………………………………………………………………………………………………… 36
Hiding Rows and Columns……………………………………………………………………………………………………. 37
Changing Data Display Formats…………………………………………………………………………………………….. 37
TableLooks……………………………………………………………………………………………………………………………….39
Using Predefined Formats……………………………………………………………………………………………………..39
Customizing TableLook Styles………………………………………………………………………………………………. 39
Changing the Default Table Formats……………………………………………………………………………………… 41
Customizing the Initial Display Settings………………………………………………………………………………….42
Displaying Variable and Value Labels…………………………………………………………………………………….. 43
Using Results in Other Applications……………………………………………………………………………………………. 44
Pasting Results as Word Tables……………………………………………………………………………………………..45
Pasting Results as Text………………………………………………………………………………………………………… 45
Exporting Results to Microsoft Word, PowerPoint, and Excel Files…………………………………………….46
Exporting Results to PDF……………………………………………………………………………………………………….51
Exporting Results to HTML…………………………………………………………………………………………………….53
Chapter 7. Working with Syntax…………………………………………………………………55
Pasting Syntax…………………………………………………………………………………………………………………………..55
Editing Syntax………………………………………………………………………………………………………………………….. 56
Opening and Running a Syntax File……………………………………………………………………………………………..57
Using Breakpoints……………………………………………………………………………………………………………………..57
Chapter 8. Modifying Data Values……………………………………………………………….59
Creating a Categorical Variable from a Scale Variable……………………………………………………………………59
Computing New Variables…………………………………………………………………………………………………………. 61
Using Functions in Expressions…………………………………………………………………………………………….. 62
Using Conditional Expressions……………………………………………………………………………………………….63
Working with Dates and Times……………………………………………………………………………………………………63
Calculating the Length of Time between Two Dates…………………………………………………………………64
Adding a Duration to a Date………………………………………………………………………………………………….. 65
Chapter 9. Sorting and Selecting Data………………………………………………………… 67
Sorting Data………………………………………………………………………………………………………………………………67
Split-File Processing…………………………………………………………………………………………………………………. 67
Sorting Cases for Split-File Processing……………………………………………………………………………………69
Turning Split-File Processing On and Off…………………………………………………………………………………69
Selecting Subsets of Cases…………………………………………………………………………………………………………69
Selecting Cases Based on Conditional Expressions………………………………………………………………….70
Selecting a Random Sample…………………………………………………………………………………………………. 71
Selecting a Time Range or Case Range………………………………………………………………………………….. 71
Treatment of Unselected Cases……………………………………………………………………………………………..72
Case Selection Status……………………………………………………………………………………………………………….. 72
Chapter 10. Sample Files…………………………………………………………………………. 73
Notices………………………………………………………………………………………………….81
Trademarks……………………………………………………………………………………………………………………………… 82
Chapter 1. Introduction
This guide will show you how to use many of the available features. It is designed to provide a step-by-
step, hands-on guide. All of the files shown in the examples are installed with the application so that you
can follow along, performing the same analyses and obtaining the same results shown here.
If you want detailed examples of various statistical analysis techniques, try the step-by-step Case
Studies, available from the Help menu.
Sample Files
Most of the examples that are presented here use the data file demo.sav. This data file is a fictitious
survey of several thousand people, containing basic demographic and consumer
information.
If you are using the Student version, your version of demo.sav is a representative sample of the original
data file, reduced to meet the 1,500-case limit. Results that you obtain using that data file will differ from
the results shown here.
The sample files installed with the product can be found in the Samples subdirectory of the installation
directory. There is a separate folder within the Samples subdirectory for each of the following languages:
English, French, German, Italian, Japanese, Korean, Polish, Russian, Simplified Chinese, Spanish, and
Traditional Chinese.
Not all sample files are available in all languages. If a sample file is not available in a language, that
language folder contains an English version of the sample file.
Opening a Data File
To open a data file:
1. From the menus choose:
File > Open > Data…
A dialog box for opening files is displayed.
By default, IBM SPSS Statistics data files (.sav extension) are displayed.
This example uses the file demo.sav.
Figure 1. demo.sav file in Data Editor
The data file is displayed in the Data Editor. In Data View, if you put the mouse cursor on a variable
name (the column headings), a more descriptive variable label is displayed (if a label has been defined
for that variable).
By default, the actual data values are displayed. To display labels:
2. From the menus choose:
View > Value Labels
Figure 2. Value Labels button
Alternatively, you can use the Value Labels button on the toolbar.
Figure 3. Value labels displayed in the Data Editor
Descriptive value labels are now displayed to make it easier to interpret the responses.
Running an Analysis
If you have any add-on options, the Analyze menu contains a list of reporting and statistical analysis
categories.
We will start by creating a simple frequency table (table of counts). This example requires the Statistics
Base option.
1. From the menus choose:
Analyze > Descriptive Statistics > Frequencies…
The Frequencies dialog box is displayed.
Figure 4. Frequencies dialog box
An icon next to each variable provides information about data type and level of measurement.
The icon shows the combination of measurement level (scale, ordinal, or nominal) and data type (numeric, string, date, or time); scale string variables do not occur (n/a).
If the variable label and/or name appears truncated in the list, the complete label/name is displayed
when the cursor is positioned over it. The variable name inccat is displayed in square brackets after
the descriptive variable label. Income category in thousands is the variable label. If there were no
variable label, only the variable name would appear in the list box.
You can resize dialog boxes just like windows, by clicking and dragging the outside borders or corners.
For example, if you make the dialog box wider, the variable lists will also be wider.
In the dialog box, you choose the variables that you want to analyze from the source list on the left and
drag and drop them into the Variable(s) list on the right. The OK button, which runs the analysis, is
disabled until at least one variable is placed in the Variable(s) list.
In many dialogs, you can obtain additional information by right-clicking any variable name in the list
and selecting Variable Information from the pop-up menu.
2. Click Gender [gender] in the source variable list and drag the variable into the target Variable(s) list.
3. Click Income category in thousands [inccat] in the source list and drag it to the target list.
Figure 5. Variables selected for analysis
4. Click OK to run the procedure.
Results are displayed in the Viewer window.
Figure 6. Frequency table of income categories
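The same table can be produced from a syntax window (see Chapter 7, “Working with Syntax”). The Frequencies dialog pastes approximately the following:

FREQUENCIES VARIABLES=gender inccat
  /ORDER=ANALYSIS.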
Creating Charts
Although some statistical procedures can create charts, you can also use the Graphs menu to create
charts.
For example, you can create a chart that shows the relationship between wireless telephone service and
PDA (personal digital assistant) ownership.
1. From the menus choose:
Graphs > Chart Builder…
Figure 7. Chart Builder dialog box with completed drop zones
2. Click the Gallery tab (if it is not selected).
3. Click Bar (if it is not selected).
4. Drag the Clustered Bar icon onto the canvas, which is the large area above the Gallery.
5. Scroll down the Variables list, right-click Wireless service [wireless], and then choose Nominal as its
measurement level.
6. Drag the Wireless service [wireless] variable to the x axis.
7. Right-click Owns PDA [ownpda] and choose Nominal as its measurement level.
8. Drag the Owns PDA [ownpda] variable to the cluster drop zone in the upper right corner of the canvas.
9. Click OK to create the chart.
Figure 8. Bar chart displayed in Viewer window
The bar chart is displayed in the Viewer. The chart shows that people with wireless phone service are far
more likely to have PDAs than people without wireless service.
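If you prefer syntax, a roughly equivalent clustered bar chart can be sketched with the legacy GRAPH command; the Chart Builder itself pastes longer GGRAPH syntax:

* Clustered bar of counts: wireless service on the x axis, clustered by PDA ownership.
GRAPH /BAR(GROUPED)=COUNT BY wireless BY ownpda.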
You can edit charts and tables by double-clicking them in the contents pane of the Viewer window, and
you can copy and paste your results into other applications. Those topics will be covered later.
Chapter 2. Reading Data
Data can be entered directly, or it can be imported from a number of different sources. The processes for
reading data stored in IBM SPSS Statistics data files; spreadsheet applications, such as Microsoft Excel;
database applications, such as Microsoft Access; and text files are all discussed in this chapter.
Basic Structure of IBM SPSS Statistics Data Files
Figure 9. Data Editor
IBM SPSS Statistics data files are organized by cases (rows) and variables (columns). In this data file,
cases represent individual respondents to a survey. Variables represent responses to each question
asked in the survey.
Reading IBM SPSS Statistics Data Files
IBM SPSS Statistics data files, which have a .sav file extension, contain your saved data.
1. From the menus choose:
File > Open > Data…
2. Browse to and open demo.sav. See the topic Chapter 10, “Sample Files,” on page 73 for more
information.
The data are now displayed in the Data Editor.
Figure 10. Opened data file
Reading Excel Data
Rather than typing all of your data directly into the Data Editor, you can read data from applications such
as Microsoft Excel. You can also read column headings as variable names.
1. From the menus choose:
File > Import Data > Excel
2. Go to the Samples\English folder and select demo.xlsx.
The Read Excel File dialog displays a preview of the data file. The contents of the first sheet in the file
are displayed. If the file has multiple sheets, you can select the sheet from the list.
You can see that some of the string values for Gender have leading spaces. Some of the values for
MaritalStatus are displayed as periods (.).
Figure 11. Read Excel File dialog
3. Make sure Read variable names from the first row of data is selected. If the column headings do not
conform to variable name rules, they are converted to valid variable names. The original column
headings are saved as variable labels.
4. Select Remove leading spaces from string values.
5. Deselect Percentage of values that determine data type.
The string value “no answer” is now displayed in the cells that were system-missing. When the percentage-of-values option is off and a column contains a mix of data types, the variable is read as a string data type. All values are preserved, but numeric values are treated as string values.
6. Select (check) Percentage of values that determine data type to treat MaritalStatus as a numeric
variable.
7. Click OK to read the Excel file.
The data now appear in the Data Editor, with the column headings used as variable names. Since variable
names can’t contain spaces, the spaces from the original column headings are removed. For example, the
column heading “Marital Status” is converted to the variable MaritalStatus. The original column heading is
retained as a variable label.
Figure 12. Imported Excel data
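The import can also be scripted. A minimal sketch with GET DATA follows; the path is illustrative, and the dialog's leading-space and data-type options are omitted here:

* Read demo.xlsx, taking variable names from the first row.
GET DATA
  /TYPE=XLSX
  /FILE='C:\Samples\English\demo.xlsx'
  /READNAMES=ON.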
Related information
“Sample Files” on page 73
Reading Data from a Database
Data from database sources are easily imported using the Database Wizard. Any database that uses
ODBC (Open Database Connectivity) drivers can be read directly after the drivers are installed. ODBC
drivers for many database formats are supplied on the installation CD. Additional drivers can be obtained
from third-party vendors. One of the most common database applications, Microsoft Access, is discussed
in this example.
Note: This example is specific to Microsoft Windows and requires an ODBC driver for Access. The
Microsoft Access ODBC driver only works with the 32-bit version of IBM SPSS Statistics. The steps are
similar on other platforms but may require a third-party ODBC driver for Access.
1. From the menus choose:
File > Import Data > Database > New Query…
Figure 13. Database Wizard Welcome dialog box
2. Select MS Access Database from the list of data sources and click Next.
Note: Depending on your installation, you may also see a list of OLEDB data sources on the left side of
the wizard (Windows operating systems only), but this example uses the list of ODBC data sources
displayed on the right side.
3. Click Browse to navigate to the Access database file that you want to open.
4. Open demo.mdb. See the topic Chapter 10, “Sample Files,” on page 73 for more information.
5. Click OK in the login dialog box.
In the next step, you can specify the tables and variables that you want to import.
Figure 14. Select Data step
6. Drag the entire demo table to the Retrieve Fields In This Order list.
7. Click Next.
In the next step, you can select which records (cases) to import.
If you do not want to import all cases, you can import a subset of cases (for example, males older
than 30), or you can import a random sample of cases from the data source. For large data sources,
you may want to limit the number of cases to a small, representative sample to reduce the
processing time.
8. Click Next to continue.
Field names are used to create variable names. If necessary, the names are converted to valid
variable names. The original field names are preserved as variable labels. You can also change the
variable names before importing the database.
Figure 15. Define Variables step
9. Click the Recode to Numeric cell in the Gender field. This option converts string variables to integer
variables and retains the original value as the value label for the new variable.
10. Click Next to continue.
The SQL statement created from your selections in the Database Wizard appears in the Results step.
This statement can be executed now or saved to a file for later use.
11. Click Finish to import the data.
All of the data in the Access database that you selected to import are now available in the Data Editor.
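The wizard ultimately issues a GET DATA command. A sketch of the equivalent syntax follows; the CONNECT string is machine-specific, so the one shown here is only an illustration:

* Pull every field of the demo table through the Access ODBC driver.
GET DATA
  /TYPE=ODBC
  /CONNECT='DSN=MS Access Database;DBQ=C:\Samples\demo.mdb'
  /SQL='SELECT * FROM demo'.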
Reading Data from a Text File
Text files represent another common source of data. Many spreadsheet programs and databases can
save their contents in one of many text file formats. Comma- or tab-delimited files refer to rows of data
that use commas or tabs to indicate each variable. In this example, the data is tab-delimited.
1. From the menus choose:
File > Import Data > Text Data
2. Go to the Samples\English folder and select demo.txt.
The Text Import Wizard guides you through the process of defining how the specified text file is
interpreted.
Figure 16. Text Import Wizard: Step 1 of 6
3. In Step 1, you can choose a predefined format or create a new format in the wizard. Select No.
4. Click Next to continue.
As stated earlier, this file uses tab-delimited formatting. Also, the variable names are defined on the
top line of this file.
5. In step 2 of the wizard, select Delimited to indicate that the file has a delimited formatting structure.
6. Select Yes to indicate that the file includes variable names at the top of the file.
7. Click Next to continue.
8. In step 3, enter 2 for the line number where the first case of data begins (because variable names are
on the first line).
9. Keep the default values for the remainder of this step, and click Next to continue.
The Data preview in Step 4 provides you with a quick way to ensure that the file is read correctly.
10. Select Tab and deselect the other options for delimiters. Space is selected by default because the
file contains spaces. For this file, spaces are part of the data values, not delimiters. You need to
deselect Space to read the file correctly.
11. Select Remove leading spaces for string values. Spaces at the start of string values affect how
string values are evaluated in expressions. In this file, some values for Gender have leading spaces
that are not part of the value. If you do not remove those spaces, a value of “ f” is treated as a different value than “f”.
Figure 17. Text Import Wizard: Step 4 of 6
12. Click Next to continue.
Because the variable names are modified to conform to naming rules, step 5 gives you the
opportunity to edit any undesirable names.
Data types can be defined here as well. For example, you can change Income to dollar currency
format.
To change a data type:
13. In the Data preview, select Income.
14. Select Dollar from the Data format drop-down list.
Figure 18. Change the data type
The variable MaritalStatus contains both string and numeric values. Less than five percent of the
values are strings. With the default setting of 95% for Percentage of values that determine the
Automatic data format, the variable is treated as numeric and the string values are set to system-
missing. If no data format meets the percentage value, the variable is treated as a string variable. If
you change the setting to 100, all values are preserved, but all numeric values are treated as strings.
15. Click Next to continue.
16. Leave the default selections in the last step, and click Finish to import the data.
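The wizard also pastes a GET DATA command. The sketch below is abbreviated: the variable list and formats are illustrative stand-ins, and the wizard generates the complete specification for you:

* Tab-delimited text with variable names on line 1, so data start at line 2.
GET DATA
  /TYPE=TXT
  /FILE='demo.txt'
  /DELIMITERS='\t'
  /FIRSTCASE=2
  /VARIABLES=age F3.0 income DOLLAR12.2.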
Chapter 3. Using the Data Editor
The Data Editor displays the contents of the active data file. The information in the Data Editor consists of
variables and cases.
• In Data View, columns represent variables, and rows represent cases (observations).
• In Variable View, each row is a variable, and each column is an attribute that is associated with that
variable.
Variables are used to represent the different types of data that you have compiled. A common analogy is
that of a survey. The response to each question on a survey is equivalent to a variable. Variables come in
many different types, including numbers, strings, currency, and dates.
Entering Numeric Data
Data can be entered into the Data Editor, which may be useful for small data files or for making minor
edits to larger data files.
1. Click the Variable View tab at the bottom of the Data Editor window.
You need to define the variables that will be used. In this case, only three variables are needed: age,
marital status, and income.
Figure 19. Variable names in Variable View
2. In the first row of the first column, type age.
3. In the second row, type marital.
4. In the third row, type income.
New variables are automatically given a Numeric data type.
If you don’t enter variable names, unique names are automatically created. However, these names
are not descriptive and are not recommended for large data files.
5. Click the Data View tab to continue entering the data.
The names that you entered in Variable View are now the headings for the first three columns in Data
View.
Begin entering data in the first row, starting at the first column.
Figure 20. Values entered in Data View
6. In the age column, type 55.
7. In the marital column, type 1.
8. In the income column, type 72000.
9. Move the cursor to the second row of the first column to add the next subject’s data.
10. In the age column, type 53.
11. In the marital column, type 0.
12. In the income column, type 153000.
Currently, the age and marital columns display decimal points, even though their values are intended
to be integers. To hide the decimal points in these variables:
13. Click the Variable View tab at the bottom of the Data Editor window.
14. In the Decimals column of the age row, type 0 to hide the decimal.
15. In the Decimals column of the marital row, type 0 to hide the decimal.
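The decimal-hiding steps have a one-line syntax equivalent (F8.0 denotes a numeric format with zero decimal places):

* Display age and marital as whole numbers.
FORMATS age marital (F8.0).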
Entering String Data
Non-numeric data, such as strings of text, can also be entered into the Data Editor.
1. Click the Variable View tab at the bottom of the Data Editor window.
2. In the first cell of the first empty row, type sex for the variable name.
3. Click the Type cell next to your entry.
4. Click the button on the right side of the Type cell to open the Variable Type dialog box.
5. Select String to specify the variable type.
6. Click OK to save your selection and return to the Data Editor.
Figure 21. Variable Type dialog box
Defining Data
In addition to defining data types, you can also define descriptive variable labels and value labels for
variable names and data values. These descriptive labels are used in statistical reports and charts.
Adding Variable Labels
Labels are meant to provide descriptions of variables. These descriptions are often longer versions of
variable names. Labels can be up to 255 bytes. These labels are used in your output to identify the
different
variables.
1. Click the Variable View tab at the bottom of the Data Editor window.
2. In the Label column of the age row, type Respondent’s Age.
3. In the Label column of the marital row, type Marital Status.
4. In the Label column of the income row, type Household Income.
5. In the Label column of the sex row, type Gender.
Figure 22. Variable labels entered in Variable View
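The same labels can be assigned in a single VARIABLE LABELS command:

VARIABLE LABELS
  age "Respondent's Age" /
  marital "Marital Status" /
  income "Household Income" /
  sex "Gender".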
Changing Variable Type and Format
The Type column displays the current data type for each variable. The most common data types are
numeric and string, but many other formats are supported. In the current data file, the income variable is
defined as a numeric type.
1. Click the Type cell for the income row, and then click the button on the right side of the cell to open the
Variable Type dialog box.
2. Select Dollar.
Figure 23. Variable Type dialog box
The formatting options for the currently selected data type are displayed.
3. For the format of the currency in this example, select $###,###,###.
4. Click OK to save your changes.
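In syntax, the equivalent change uses the FORMATS command; DOLLAR12 approximates the $###,###,### format chosen above:

* Display income as currency with a dollar sign and comma separators.
FORMATS income (DOLLAR12).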
Adding Value Labels
Value labels provide a method for mapping your variable values to a string label. In this example, there
are two acceptable values for the marital variable. A value of 0 means that the subject is single, and a
value of 1 means that he or she is married.
1. Click the Values cell for the marital row, and then click the button on the right side of the cell to open
the Value Labels dialog box.
The value is the actual numeric value.
The value label is the string label that is applied to the specified numeric value.
2. Type 0 in the Value field.
3. Type Single in the Label field.
4. Click Add to add this label to the list.
Figure 24. Value Labels dialog box
5. Type 1 in the Value field, and type Married in the Label field.
6. Click Add, and then click OK to save your changes and return to the Data Editor.
These labels can also be displayed in Data View, which can make your data more readable.
7. Click the Data View tab at the bottom of the Data Editor window.
8. From the menus choose:
View > Value Labels
The labels are now displayed in a list when you enter values in the Data Editor. This setup has the benefit
of suggesting a valid response and providing a more descriptive answer.
If the Value Labels menu item is already active (with a check mark next to it), choosing Value Labels
again will turn off the display of value labels.
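The equivalent syntax is a single VALUE LABELS command:

VALUE LABELS marital 0 'Single' 1 'Married'.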
Handling Missing Data
Missing or invalid data are generally too common to ignore. Survey respondents may refuse to answer
certain questions, may not know the answer, or may answer in an unexpected format. If you don’t filter or
identify these data, your analysis may not provide accurate results.
For numeric data, empty data fields or fields containing invalid entries are converted to system-missing,
which is identifiable by a single period.
The reason a value is missing may be important to your analysis. For example, you may find it useful to
distinguish between those respondents who refused to answer a question and those respondents who
didn’t answer a question because it was not applicable.
Missing Values for a Numeric Variable
1. Click the Variable View tab at the bottom of the Data Editor window.
2. Click the Missing cell in the age row, and then click the button on the right side of the cell to open the
Missing Values dialog box.
In this dialog box, you can specify up to three distinct missing values, or you can specify a range of
values plus one additional discrete value.
Figure 25. Missing Values dialog box
3. Select Discrete missing values.
4. Type 999 in the first text box and leave the other two text boxes empty.
5. Click OK to save your changes and return to the Data Editor.
Now that the missing data value has been added, a label can be applied to that value.
6. Click the Values cell in the age row, and then click the button on the right side of the cell to open the
Value Labels dialog box.
7. Type 999 in the Value field.
8. Type No Response in the Label field.
9. Click Add to add this label to your data file.
10. Click OK to save your changes and return to the Data Editor.
Missing Values for a String Variable
Missing values for string variables are handled similarly to the missing values for numeric variables.
However, unlike numeric variables, empty fields in string variables are not designated as system-missing.
Rather, they are interpreted as an empty string.
1. Click the Variable View tab at the bottom of the Data Editor window.
2. Click the Missing cell in the sex row, and then click the button on the right side of the cell to open the
Missing Values dialog box.
3. Select Discrete missing values.
4. Type NR in the first text box.
Missing values for string variables are case sensitive. So, a value of nr is not treated as a missing
value.
5. Click OK to save your changes and return to the Data Editor.
Now you can add a label for the missing value.
6. Click the Values cell in the sex row, and then click the button on the right side of the cell to open the
Value Labels dialog box.
7. Type NR in the Value field.
8. Type No Response in the Label field.
9. Click Add to add this label to your project.
10. Click OK to save your changes and return to the Data Editor.
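Both missing-value definitions, together with their labels, can be scripted. ADD VALUE LABELS is used here so that any existing labels are kept:

MISSING VALUES age (999).
ADD VALUE LABELS age 999 'No Response'.
MISSING VALUES sex ('NR').
ADD VALUE LABELS sex 'NR' 'No Response'.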
Chapter 4. Examining Summary Statistics for
Individual Variables
This section discusses simple summary measures and how the level of measurement of a variable
influences the types of statistics that should be used. We will use the data file demo.sav. See the topic
Chapter 10, “Sample Files,” on page 73 for more information.
Level of Measurement
Different summary measures are appropriate for different types of data, depending on the level of
measurement:
Categorical. Data with a limited number of distinct values or categories (for example, gender or marital
status). Also referred to as qualitative data. Categorical variables can be string (alphanumeric) data or
numeric variables that use numeric codes to represent categories (for example, 0 = Unmarried and 1 =
Married). There are two basic types of categorical data:
• Nominal. Categorical data where there is no inherent order to the categories. For example, a job
category of sales is not higher or lower than a job category of marketing or research.
• Ordinal. Categorical data where there is a meaningful order of categories, but there is not a measurable
distance between categories. For example, there is an order to the values high, medium, and low, but
the “distance” between the values cannot be calculated.
Scale. Data measured on an interval or ratio scale, where the data values indicate both the order of
values and the distance between values. For example, a salary of $72,195 is higher than a salary of
$52,398, and the distance between the two values is $19,797. Also referred to as quantitative or
continuous data.
Summary Measures for Categorical Data
For categorical data, the most typical summary measure is the number or percentage of cases in each
category. The mode is the category with the greatest number of cases. For ordinal data, the median (the
value at which half of the cases fall above and below) may also be a useful summary measure if there is a
large number of categories.
The Frequencies procedure produces frequency tables that display both the number and percentage of
cases for each observed value of a variable.
1. From the menus choose:
Analyze > Descriptive Statistics > Frequencies…
Note: This feature requires the Statistics Base option.
2. Select Owns PDA [ownpda] and Owns TV [owntv] and move them into the Variable(s) list.
Figure 26. Categorical variables selected for analysis
3. Click OK to run the procedure.
Figure 27. Frequency tables
The frequency tables are displayed in the Viewer window. The frequency tables reveal that only 20.4% of
the people own PDAs, but almost everybody owns a TV (99.0%). These might not be interesting
revelations, although it might be interesting to find out more about the small group of people who do not
own televisions.
Charts for Categorical Data
You can graphically display the information in a frequency table with a bar chart or pie chart.
1. Open the Frequencies dialog box again. (The two variables should still be selected.)
You can use the Dialog Recall button on the toolbar to quickly return to recently used procedures.
Figure 28. Dialog Recall button
2. Click Charts.
3. Select Bar charts and then click Continue.
4. Click OK in the main dialog box to run the procedure.
Figure 29. Bar chart
In addition to the frequency tables, the same information is now displayed in the form of bar charts,
making it easy to see that most people do not own PDAs but almost everyone owns a TV.
Summary Measures for Scale Variables
There are many summary measures available for scale variables, including:
• Measures of central tendency. The most common measures of central tendency are the mean
(arithmetic average) and median (value at which half the cases fall above and below).
• Measures of dispersion. Statistics that measure the amount of variation or spread in the data include
the standard deviation, minimum, and maximum.
1. Open the Frequencies dialog box again.
2. Click Reset to clear any previous settings.
3. Select Household income in thousands [income] and move it into the Variable(s) list.
4. Click Statistics.
5. Select Mean, Median, Std. deviation, Minimum, and Maximum.
6. Click Continue.
7. Deselect Display frequency tables in the main dialog box. (Frequency tables are usually not useful for
scale variables since there may be almost as many distinct values as there are cases in the data file.)
8. Click OK to run the procedure.
The Frequencies Statistics table is displayed in the Viewer window.
Figure 30. Frequencies Statistics table
In this example, there is a large difference between the mean and the median. The mean is almost
25,000 greater than the median, indicating that the values are not normally distributed. You can visually
check the distribution with a histogram.
Histograms for Scale Variables
1. Open the Frequencies dialog box again.
2. Click Charts.
3. Select Histograms and With normal curve.
4. Click Continue, and then click OK in the main dialog box to run the procedure.
Figure 31. Histogram
The majority of cases are clustered at the lower end of the scale, with most falling below 100,000. There
are, however, a few cases in the 500,000 range and beyond (too few to even be visible without modifying
the histogram). These high values for only a few cases have a significant effect on the mean but little or no
effect on the median, making the median a better indicator of central tendency in this example.
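The statistics and histogram requested in this chapter correspond to a single FREQUENCIES command; /FORMAT=NOTABLE suppresses the frequency table, as in step 7 above:

FREQUENCIES VARIABLES=income
  /FORMAT=NOTABLE
  /STATISTICS=MEAN MEDIAN STDDEV MINIMUM MAXIMUM
  /HISTOGRAM NORMAL.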
Chapter 5. Creating and editing charts
You can create and edit a wide variety of chart types. In this chapter, we will create and edit bar charts.
You can apply the principles to any chart type.
Chart creation basics
To demonstrate the basics of chart creation, we will create a bar chart of mean income for different levels
of job satisfaction. This example uses the data file demo.sav. See the topic Chapter 10, “Sample Files,” on
page 73 for more information.
1. From the menus choose:
Graphs > Chart Builder…
The Chart Builder dialog box is an interactive window that allows you to preview how a chart will look
while you build it.
Figure 32. Chart Builder dialog box
Using the Chart Builder gallery
1. Click the Gallery tab if it is not selected.
The Gallery includes many different predefined charts, which are organized by chart type. The Basic
Elements tab also provides basic elements (such as axes and graphic elements) for creating charts
from scratch, but it’s easier to use the Gallery.
2. Click Bar if it is not selected.
Icons representing the available bar charts in the Gallery appear in the dialog box. The pictures should
provide enough information to identify the specific chart type. If you need more information, you can
also display a ToolTip description of the chart by pausing your cursor over an icon.
3. Drag the icon for the simple bar chart onto the “canvas,” which is the large area above the Gallery. The
Chart Builder displays a preview of the chart on the canvas. Note that the data used to draw the chart
are not your actual data. They are example data.
Figure 33. Bar chart on Chart Builder canvas
Defining variables and statistics
Although there is a chart on the canvas, it is not complete because there are no variables or statistics to
control how tall the bars are and to specify which variable category corresponds to each bar. You can’t
have a chart without variables and statistics. You add variables by dragging them from the Variables list,
which is located to the left of the canvas.
A variable’s measurement level is important in the Chart Builder. You are going to use the Job satisfaction
variable on the x axis. However, the icon (which looks like a ruler) next to the variable indicates that its
measurement level is defined as scale. To create the correct chart, you must use a categorical
measurement level. Instead of going back and changing the measurement level in the Variable View, you
can change the measurement level temporarily in the Chart Builder.
1. Right-click Job satisfaction in the Variables list and choose Ordinal. Ordinal is an appropriate
measurement level because the categories in Job satisfaction can be ranked by level of satisfaction.
Note that the icon changes after you change the measurement level.
2. Now drag Job satisfaction from the Variables list to the x axis drop zone.
The y axis drop zone defaults to the Count statistic. If you want to use another statistic (such as
percentage or mean), you can easily change it. You will not use either of these statistics in this
example, but we will review the process in case you need to change this statistic at another time.
3. Click the Element Properties tab in the side bar of the Chart Builder. (If the side bar is not displayed,
click the button in the upper right corner of the Chart Builder to display the side bar.)
Figure 34. Element Properties
Element Properties allows you to change the properties of the various chart elements. These elements
include the graphic elements (such as the bars in the bar chart) and the axes on the chart. Select one
of the elements in the Edit Properties of list to change the properties associated with that element.
Also note the red X located to the right of the list. This button deletes a graphic element from the
canvas. Because Bar1 is selected, the properties shown apply to graphic elements, specifically the bar
graphic element.
The Statistic drop-down list shows the specific statistics that are available. The same statistics are
usually available for every chart type. Be aware that some statistics require that the y axis drop zone
contains a variable.
4. Drag Household income in thousands from the Variables list to the y axis drop zone. Because the
variable on the y axis is scalar and the x axis variable is categorical (ordinal is a type of categorical
measurement level), the y axis drop zone defaults to the Mean statistic. These are the variables and
statistics you want, so there is no need to change the element properties.
Adding text
You can also add titles and footnotes to the chart.
1. Click the Titles/Footnotes tab.
2. Select Title 1.
Figure 35. Title 1 displayed on canvas
The title appears on the canvas with the label T1.
3. In the Element Properties tab, select Title 1 in the Edit Properties of list.
4. In the Content text box, type Income by Job Satisfaction. This is the text that the title will
display.
Creating the chart
1. Click OK to create the bar chart.
Figure 36. Bar chart
The bar chart reveals that respondents who are more satisfied with their jobs tend to have higher
household incomes.
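A rough syntax equivalent uses the legacy GRAPH command (the Chart Builder itself pastes GGRAPH syntax); this sketch assumes the job-satisfaction variable in demo.sav is named jobsat:

* Simple bar of mean household income at each level of job satisfaction.
GRAPH /BAR(SIMPLE)=MEAN(income) BY jobsat
  /TITLE='Income by Job Satisfaction'.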
Chapter 6. Working with Output
The results from running a statistical procedure are displayed in the Viewer. The output produced can be
statistical tables, charts, graphs, or text, depending on the choices you make when you run the procedure.
This section uses the files viewertut.spv and demo.sav. See the topic Chapter 10, “Sample Files,” on page
73 for more information.
Using the Viewer
Figure 37. Viewer
The Viewer window is divided into two panes. The outline pane contains an outline of all of the
information stored in the Viewer. The contents pane contains statistical tables, charts, and text output.
Use the scroll bars to navigate through the window’s contents, both vertically and horizontally. For easier
navigation, click an item in the outline pane to display it in the contents pane.
1. Click and drag the right border of the outline pane to change its width.
An open book icon next to an item in the outline pane indicates that the item is currently visible in the Viewer, although it may not be in the visible portion of the contents pane.
2. To hide a table or chart, double-click its book icon in the outline pane.
The open book icon changes to a closed book icon, signifying that the information associated with it is
now hidden.
3. To redisplay the hidden output, double-click the closed book icon.
You can also hide all of the output from a particular statistical procedure or all of the output in the
Viewer.
4. Click the box with the minus sign (−) to the left of the procedure whose results you want to hide, or
click the box next to the topmost item in the outline pane to hide all of the output.
The outline collapses, visually indicating that these results are hidden.
You can also change the order in which the output is displayed.
5. In the outline pane, click the items that you want to move.
6. Drag the selected items to a new location in the outline.
Figure 38. Reordered output in the Viewer
You can also move output items by clicking and dragging them in the contents pane.
Using the Pivot Table Editor
The results from most statistical procedures are displayed in pivot tables.
Accessing Output Definitions
Many statistical terms are displayed in the output. Definitions of these terms can be accessed directly in
the Viewer.
1. Double-click the Owns PDA * Gender * Internet Crosstabulation table.
2. Right-click Expected Count and choose What’s This? from the pop-up menu.
The definition is displayed in a pop-up window.
Figure 39. Pop-up definition
Pivoting Tables
The default tables produced may not display information as neatly or as clearly as you would like. With
pivot tables, you can transpose rows and columns (“flip” the table), adjust the order of data in a table, and
modify the table in many other ways. For example, you can change a short, wide table into a long, thin one
by transposing rows and columns. Changing the layout of the table does not affect the results. Instead,
it’s a way to display your information in a different or more desirable manner.
1. If it’s not already activated, double-click the Owns PDA * Gender * Internet Crosstabulation table to
activate it.
2. If the Pivoting Trays window is not visible, from the menus choose:
Pivot > Pivoting Trays
Pivoting trays provide a way to move data between columns, rows, and layers.
Figure 40. Pivoting trays
3. Drag the Statistics element from the Row dimension to the Column dimension, below Gender. The
table is immediately reconfigured to reflect your changes.
The order of the elements in the pivoting tray reflects the order of the elements in the table.
4. Drag and drop the Owns PDA element before the Internet element in the row dimension to reverse the
order of these two rows.
Figure 41. Swap rows
Creating and Displaying Layers
Layers can be useful for large tables with nested categories of information. By creating layers, you
simplify the look of the table, making it easier to read.
1. Drag the Gender element from the Column dimension to the Layer dimension.
Figure 42. Gender pivot icon in the Layer dimension
To display a different layer, select a category from the drop-down list in the table.
Editing Tables
Unless you’ve taken the time to create a custom TableLook, pivot tables are created with standard
formatting. You can change the formatting of any text within a table. Formats that you can change include
font name, font size, font style (bold or italic), and color.
1. Double-click the Level of education table.
2. If the Formatting toolbar is not visible, from the menus choose:
View > Toolbar
3. Click the title text, Level of education.
4. From the drop-down list of font sizes on the toolbar, choose 12.
5. To change the color of the title text, click the text color tool and choose a new color.
Figure 43. Reformatted title text in the pivot table
You can also edit the contents of tables and labels. For example, you can change the title of this table.
6. Double-click the title.
7. Type Education Level for the new label.
Note: If you change the values in a table, totals and other statistics are not recalculated.
Hiding Rows and Columns
Some of the data displayed in a table may not be useful or it may unnecessarily complicate the table.
Fortunately, you can hide entire rows and columns without losing any data.
1. If it’s not already activated, double-click the Education Level table to activate it.
2. Click the Valid Percent column label to select it.
3. From the Edit menu or the right-click pop-up menu choose:
Select > Data and Label Cells
4. From the View menu choose Hide or from the right-click pop-up menu choose Hide Category.
The column is now hidden but not deleted.
Figure 44. Valid Percent column hidden in table
To redisplay the column:
5. From the menus choose:
View > Show All
Rows can be hidden and displayed in the same way as columns.
Changing Data Display Formats
You can easily change the display format of data in pivot tables.
1. If it’s not already activated, double-click the Education Level table to activate it.
2. Click the Percent column label to select it.
3. From the Edit menu or the right-click pop-up menu choose:
Select > Data Cells
4. From the Format menu or the right-click pop-up menu choose Cell Properties.
5. Click the Format Value tab.
6. Type 0 in the Decimals field to hide all decimal points in this column.
Figure 45. Cell Properties, Format Value tab
You can also change the data type and format in this dialog box.
7. Select the type that you want from the Category list, and then select the format for that type in the
Format list.
8. Click OK or Apply to apply your changes.
Figure 46. Decimals hidden in Percent column
The decimals are now hidden in the Percent column.
TableLooks
The format of your tables is a critical part of providing clear, concise, and meaningful results. If your table
is difficult to read, the information contained within that table may not be easily understood.
Using Predefined Formats
1. Double-click the Marital status table.
2. From the menus choose:
Format > TableLooks…
The TableLooks dialog box lists a variety of predefined styles. Select a style from the list to preview it
in the Sample window on the right.
Figure 47. TableLooks dialog box
You can use a style as is, or you can edit an existing style to better suit your needs.
3. To use an existing style, select one and click OK.
Customizing TableLook Styles
You can customize a format to fit your specific needs. Almost all aspects of a table can be customized,
from the background color to the border styles.
1. Double-click the Marital status table.
2. From the menus choose:
Format > TableLooks…
3. Select the style that is closest to the format you want and click Edit Look.
4. Click the Cell Formats tab to view the formatting options.
Figure 48. Table Properties dialog box
The formatting options include font name, font size, style, and color. Additional options include
alignment, text and background colors, and margin sizes.
The Sample window on the right provides a preview of how the formatting changes affect your table.
Each area of the table can have different formatting styles. For example, you probably wouldn’t want
the title to have the same style as the data. To select a table area to edit, you can either choose the
area by name in the Area drop-down list, or you can click the area that you want to change in the
Sample window.
5. Select Data from the Area drop-down list.
6. Select a new color from the Background drop-down palette.
7. Then select a new text color.
The Sample window shows the new style.
Figure 49. Changing table cell formats
8. Click OK to return to the TableLooks dialog box.
You can save your new style, which allows you to apply it to future tables easily.
9. Click Save As.
10. Navigate to the target directory and enter a name for your new style in the File Name text box.
11. Click Save.
12. Click OK to apply your changes and return to the Viewer.
The table now contains the custom formatting that you specified.
Figure 50. Custom TableLook
Changing the Default Table Formats
Although you can change the format of a table after it has been created, it may be more efficient to
change the default TableLook so that you do not have to change the format every time you create a table.
To change the default TableLook style for your pivot tables, from the menus choose:
Edit > Options…
1. Click the Pivot Tables tab in the Options dialog box.
Figure 51. Options dialog box
2. Select the TableLook style that you want to use for all new tables.
The Sample window on the right shows a preview of each TableLook.
3. Click OK to save your settings and close the dialog box.
All tables that you create after changing the default TableLook automatically conform to the new
formatting rules.
Customizing the Initial Display Settings
The initial display settings include the alignment of objects in the Viewer, whether objects are shown or
hidden by default, and the width of the Viewer window. To change these settings:
1. From the menus choose:
Edit > Options…
2. Click the Viewer tab.
Figure 52. Viewer options
The settings are applied on an object-by-object basis. For example, you can customize the way charts
are displayed without making any changes to the way tables are displayed. Simply select the object
that you want to customize, and make the changes.
3. Click the Title icon to display its settings.
4. Click Center to display all titles in the (horizontal) center of the Viewer.
You can also hide elements, such as the log and warning messages, that tend to clutter your output.
Double-clicking on an icon automatically changes that object’s display property.
5. Double-click the Warnings icon to hide warning messages in the output.
6. Click OK to save your changes and close the dialog box.
Displaying Variable and Value Labels
In most cases, displaying the labels for variables and values is more effective than displaying the variable
name or the actual data value. There may be cases, however, when you want to display both the names
and the labels.
1. From the menus choose:
Edit > Options…
2. Click the Output Labels tab.
Figure 53. Pivot Table Labeling settings
You can specify different settings for the outline and contents panes. For example, to show labels in
the outline and variable names and data values in the contents:
3. In the Pivot Table Labeling group, select Names from the Variables in Labels drop-down list to show
variable names instead of labels.
4. Then, select Values from the Variable Values in Labels drop-down list to show data values instead of
labels.
Subsequent tables produced in the session will reflect these changes.
Figure 54. Variable names and values displayed
Using Results in Other Applications
Your results can be used in many applications. For example, you may want to include a table or chart in a
presentation or report.
The following examples are specific to Microsoft Word, but they may work similarly in other word
processing applications.
Pasting Results as Word Tables
You can paste pivot tables into Word as native Word tables. All table attributes, such as font sizes and
colors, are retained. Because the table is pasted in the Word table format, you can edit it in Word just like
any other table.
1. Click a table in the Viewer to select it.
2. From the menus choose:
Edit > Copy
3. Open your word processing application.
4. From the word processor’s menus choose:
Edit > Paste Special…
5. Select Formatted Text (RTF) in the Paste Special dialog box.
6. Click OK to paste your results into the current document.
The table is now displayed in your document. You can apply custom formatting, edit the data, and resize
the table to fit your needs.
Pasting Results as Text
Pivot tables can be copied to other applications as plain text. Formatting styles are not retained in this
method, but you can edit the table data after you paste it into the target application.
1. Click a table in the Viewer to select it.
2. From the menus choose:
Edit > Copy
3. Open your word processing application.
4. From the word processor’s menus choose:
Edit > Paste Special…
5. Select Unformatted Text in the Paste Special dialog box.
6. Click OK to paste your results into the current document.
Figure 55. Pivot table displayed in Word
Each column of the table is separated by tabs. You can change the column widths by adjusting the tab
stops in your word processing application.
Exporting Results to Microsoft Word, PowerPoint, and Excel Files
You can export results to a Microsoft Word, PowerPoint, or Excel file. You can export selected items or all
items in the Viewer. This section uses the files msouttut.spv and demo.sav. See the topic Chapter 10,
“Sample Files,” on page 73 for more information.
Note: Export to PowerPoint is available only on Windows operating systems and is not available with the
Student Version.
In the Viewer’s outline pane, you can select specific items that you want to export or export all items or
all visible items.
Figure 56. Viewer
1. From the Viewer menus choose:
File > Export…
Instead of exporting all objects in the Viewer, you can choose to export only visible objects (open
books in the outline pane) or those that you selected in the outline pane. If you did not select any
items in the outline pane, you do not have the option to export selected objects.
Figure 57. Export Output dialog box
2. In the Objects to Export group, select All.
3. From the Type drop-down list select Word/RTF file (*.doc).
4. Click OK to generate the Word file.
When you open the resulting file in Word, you can see how the results are exported. Notes, which are not
visible objects, appear in Word because you chose to export all objects.
Pivot tables become Word tables, with all of the formatting of the original pivot table retained, including
fonts, colors, borders, and so on.
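Exports can also be scripted with OUTPUT EXPORT, which likewise offers /PDF, /HTML, and /XLS subcommands. A sketch with an illustrative output path:

OUTPUT EXPORT
  /CONTENTS EXPORT=ALL
  /DOC DOCUMENTFILE='C:\results\output.doc'.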
Figure 58. Pivot tables in Word
Charts are included in the Word document as graphic images.
Figure 59. Charts in Word
Text output is displayed in the same font used for the text object in the Viewer. For proper alignment, text
output should use a fixed-pitch (monospaced) font.
Figure 60. Text output in Word
If you export to a PowerPoint file, each exported item is placed on a separate slide. Pivot tables exported
to PowerPoint become Word tables, with all of the formatting of the original pivot table, including fonts,
colors, borders, and so on.
Figure 61. Pivot tables in PowerPoint
Charts selected for export to PowerPoint are embedded in the PowerPoint file.
Figure 62. Charts in PowerPoint
Note: Export to PowerPoint is available only on Windows operating systems and is not available with the
Student Version.
If you export to an Excel file, results are exported differently.
Figure 63. Output.xls in Excel
Pivot table rows, columns, and cells become Excel rows, columns, and cells.
Each line in the text output is a row in the Excel file, with the entire contents of the line contained in a
single cell.
Figure 64. Text output in Excel
Exporting Results to PDF
You can export all or selected items in the Viewer to a PDF (portable document format) file.
1. From the menus in the Viewer window that contains the result you want to export to PDF choose:
File > Export…
2. In the Export Output dialog box, from the Export Format File Type drop-down list choose Portable
Document Format.
Figure 65. Export Output dialog box
• The outline pane of the Viewer document is converted to bookmarks in the PDF file for easy navigation.
• Page size, orientation, margins, content and display of page headers and footers, and printed chart size
in PDF documents are controlled by page setup options (File menu, Page Setup in the Viewer window).
• The resolution (DPI) of the PDF document is the current resolution setting for the default or currently
selected printer (which can be changed using Page Setup). The maximum resolution is 1200 DPI. If the
printer setting is higher, the PDF document resolution will be 1200 DPI. Note: High-resolution
documents may yield poor results when printed on lower-resolution printers.
Figure 66. PDF file with bookmarks
Exporting Results to HTML
You can also export results to HTML (hypertext markup language). When saving as HTML, all non-graphic
output is exported into a single HTML file.
Figure 67. Output.htm in Web browser
When you export to HTML, charts can be exported as well, but not to a single file.
Figure 68. Chart in HTML
Each chart will be saved as a file in a format that you specify, and references to these graphics files will be
placed in the HTML. There is also an option to export all charts (or selected charts) to separate graphics
files.
Chapter 7. Working with Syntax
You can save and automate many common tasks by using the powerful command language. Most commands are accessible from the menus and dialog boxes; however, some commands and options are available only through the command language. The command language also allows you to save your jobs in a syntax file so that you can
repeat your analysis at a later date.
A command syntax file is simply a text file that contains IBM SPSS Statistics syntax commands. You can
open a syntax window and type commands directly, but it is often easier to let the dialog boxes do some
or all of the work for you.
The examples in this chapter use the data file demo.sav. See the topic Chapter 10, “Sample Files,” on
page 73 for more information.
Note: Command syntax is not available with the Student Version.
Pasting Syntax
The easiest way to create syntax is to use the Paste button located on most dialog boxes.
1. Open the data file demo.sav. See the topic Chapter 10, “Sample Files,” on page 73 for more
information.
2. From the menus choose:
Analyze > Descriptive Statistics > Frequencies…
3. Select Marital status [marital] and move it into the Variable(s) list.
4. Click Charts.
5. In the Charts dialog box, select Bar charts.
6. In the Chart Values group, select Percentages.
7. Click Continue.
8. Click Paste to copy the syntax created by the dialog box selections to the Syntax Editor.
Figure 69. Frequencies dialog box
Figure 70. Frequencies syntax
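The pasted syntax should look approximately like this:

FREQUENCIES VARIABLES=marital
  /BARCHART PERCENT
  /ORDER=ANALYSIS.

Note that the command ends with a period and that each subcommand starts with a forward slash.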
9. To run the syntax currently displayed, from the menus choose:
Run > Selection
Editing Syntax
In the syntax window, you can edit the syntax. For example, you could change the /BARCHART subcommand to display frequencies instead of percentages. (A subcommand is indicated by a slash.) If you know the keyword for displaying frequencies, you can enter it directly. If you don't know the keyword, you can obtain a list of the available keywords for the subcommand by positioning the cursor anywhere after the subcommand name and pressing Ctrl+Spacebar. This displays the auto-completion control for the subcommand.
1. Delete the keyword PERCENT from the BARCHART subcommand.
2. Press Ctrl+Spacebar.
3. Click the item labeled FREQ for frequencies. Clicking an item in the auto-completion control inserts it at the current cursor position.
By default, the auto-completion control prompts you with a list of available terms as you type. For example, suppose you want to include a pie chart along with the bar chart. The pie chart is specified with a separate subcommand.
4. Press Enter after the FREQ keyword and type a forward slash to indicate the start of a subcommand.
The Syntax Editor prompts you with the list of subcommands for the current command.
Figure 71. Auto-completion control displaying subcommands
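After these edits, the command might read as follows (a sketch; the FREQ keyword on the PIECHART subcommand requests frequencies there as well):

FREQUENCIES VARIABLES=marital
  /BARCHART FREQ
  /PIECHART FREQ
  /ORDER=ANALYSIS.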
To obtain more detailed help for the current command, press the F1 key. This takes you directly to the
command syntax reference information for the current command.
You may have noticed that the text displayed in the syntax window is colored. Color coding lets you quickly identify unrecognized terms, since only recognized terms are colored. For example, suppose you misspell the FORMAT subcommand as FRMAT. Subcommands are colored green by default, but the text FRMAT will appear uncolored because it is not recognized.
Opening and Running a Syntax File
1. To open a saved syntax file, from the menus choose:
File > Open > Syntax…
A standard dialog box for opening files is displayed.
2. Select a syntax file. If no syntax files are displayed, make sure Syntax (*.sps) is selected as the file
type you want to view.
3. Click Open.
4. Use the Run menu in the syntax window to run the commands.
If the commands apply to a specific data file, the data file must be opened before running the commands,
or you must include a command that opens the data file. You can paste this type of command from the
dialog boxes that open data files.
Using Breakpoints
Breakpoints allow you to stop execution of command syntax at specified points within the syntax and
continue execution when ready. This allows you to view output or data at an intermediate point in a
syntax job, or to run command syntax that displays information about the current state of the data, such
as FREQUENCIES. Breakpoints can only be set at the level of a command, not on specific lines within a
command.
To insert a breakpoint on a command:
1. Click anywhere in the region to the left of the text associated with the command.
The breakpoint is represented as a red circle in the region to the left of the command text and on the
same line as the command name regardless of where you clicked.
Figure 72. Execution stopped at a breakpoint
When you run command syntax containing breakpoints, execution stops prior to each command
containing a breakpoint.
The downward pointing arrow to the left of the command text shows the progress of the syntax run. It
spans the region from the first command run through the last command run and is particularly useful
when running command syntax containing breakpoints.
To resume execution following a breakpoint:
2. From the menus in the Syntax Editor choose:
Run > Continue
Chapter 8. Modifying Data Values
The data you start with may not always be organized in the most useful manner for your analysis or
reporting needs. For example, you may want to:
• Create a categorical variable from a scale variable.
• Combine several response categories into a single category.
• Create a new variable that is the computed difference between two existing variables.
• Calculate the length of time between two dates.
This chapter uses the data file demo.sav. See the topic Chapter 10, “Sample Files,” on page 73 for more
information.
Creating a Categorical Variable from a Scale Variable
Several categorical variables in the data file demo.sav are, in fact, derived from scale variables in that data
file. For example, the variable inccat is simply income grouped into four categories. This categorical
variable uses the integer values 1–4 to represent the following income categories (in thousands): less
than $25, $25–$49, $50–$74, and $75 or higher.
To create the categorical variable inccat:
1. From the menus in the Data Editor window choose:
Transform > Visual Binning…
In the initial Visual Binning dialog box, you select the scale and/or ordinal variables for which you
want to create new, binned variables. Binning means taking two or more contiguous values and
grouping them into the same category.
Since Visual Binning relies on actual values in the data file to help you make good binning choices, it
needs to read the data file first. Since this can take some time if your data file contains a large
number of cases, this initial dialog box also allows you to limit the number of cases to read (“scan”).
This is not necessary for our sample data file. Even though it contains more than 6,000 cases, it does
not take long to scan that number of cases.
2. Drag and drop Household income in thousands [income] from the Variables list into the Variables to
Bin list, and then click Continue.
Figure 73. Main Visual Binning dialog box
3. In the main Visual Binning dialog box, select Household income in thousands [income] in the Scanned
Variable List.
A histogram displays the distribution of the selected variable (which in this case is highly skewed).
4. Enter inccat2 for the new binned variable name and Income category [in thousands] for the
variable label.
5. Click Make Cutpoints.
6. Select Equal Width Intervals.
7. Enter 25 for the first cutpoint location, 3 for the number of cutpoints, and 25 for the width.
The number of binned categories is one greater than the number of cutpoints. So in this example, the
new binned variable will have four categories, with the first three categories each containing ranges
of 25 (thousand) and the last one containing all values above the highest cutpoint value of 75
(thousand).
8. Click Apply.
The values now displayed in the grid represent the defined cutpoints, which are the upper endpoints
of each category. Vertical lines in the histogram also indicate the locations of the cutpoints.
By default, these cutpoint values are included in the corresponding categories. For example, the first
value of 25 would include all values less than or equal to 25. But in this example, we want categories
that correspond to less than 25, 25–49, 50–74, and 75 or higher.
9. In the Upper Endpoints group, select Excluded (<).
10. Click Make Labels.
Figure 74. Automatically generated value labels
This automatically generates descriptive value labels for each category. Since the actual values
assigned to the new binned variable are simply sequential integers starting with 1, the value labels
can be very useful.
You can also manually enter or change cutpoints and labels in the grid, change cutpoint locations by
dragging and dropping the cutpoint lines in the histogram, and delete cutpoints by dragging cutpoint
lines off of the histogram.
11. Click OK to create the new, binned variable.
The new variable is displayed in the Data Editor. Since the variable is added to the end of the file, it is
displayed in the far right column in Data View and in the last row in Variable View.
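The same grouping can also be expressed directly in command syntax with RECODE. A rough sketch, shown with inclusive upper endpoints for simplicity (unlike the Excluded (<) choice above); because RECODE applies the first specification that matches, the overlapping ranges assign each case to its lowest matching category:

RECODE income (MISSING=SYSMIS) (LO THRU 25=1) (LO THRU 50=2)
  (LO THRU 75=3) (LO THRU HI=4) INTO inccat2.
VARIABLE LABELS inccat2 'Income category [in thousands]'.
EXECUTE.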
Computing New Variables
Using a wide variety of mathematical functions, you can compute new variables based on highly complex
equations. In this example, however, we will simply compute a new variable that is the difference
between the values of two existing variables.
The data file demo.sav contains a variable for the respondent’s current age and a variable for the number
of years at current job. It does not, however, contain a variable for the respondent’s age at the time he or
she started that job. We can create a new variable that is the computed difference between current age
and number of years at current job, which should be the approximate age at which the respondent started
that job.
1. From the menus in the Data Editor window choose:
Transform > Compute Variable…
2. For Target Variable, enter jobstart.
3. Select Age in years [age] in the source variable list and click the arrow button to copy it to the Numeric
Expression text box.
4. Click the minus (–) button on the calculator pad in the dialog box (or press the minus key on the
keyboard).
5. Select Years with current employer [employ] and click the arrow button to copy it to the expression.
Figure 75. Compute Variable dialog box
Note: Be careful to select the correct employment variable. There is also a recoded categorical version of the variable, which is not what you want. The numeric expression should be age-employ, not age-empcat.
6. Click OK to compute the new variable.
The new variable is displayed in the Data Editor. Since the variable is added to the end of the file, it is
displayed in the far right column in Data View and in the last row in Variable View.
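The equivalent command syntax, which the Compute Variable dialog box pastes in essentially this form, is:

COMPUTE jobstart=age - employ.
EXECUTE.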
Using Functions in Expressions
You can also use predefined functions in expressions. More than 70 built-in functions are available,
including:
• Arithmetic functions
• Statistical functions
• Distribution functions
• Logical functions
• Date and time aggregation and extraction functions
• Missing-value functions
• Cross-case functions
• String functions
Functions are organized into logically distinct groups, such as a group for arithmetic operations and
another for computing statistical metrics. For convenience, a number of commonly used system variables,
such as $TIME (current date and time), are also included in appropriate function groups.
Pasting a Function into an Expression
To paste a function into an expression:
1. Position the cursor in the expression at the point where you want the function to appear.
2. Select the appropriate group from the Function group list. The group labeled All provides a listing of all
available functions and system variables.
3. Double-click the function in the Functions and Special Variables list (or select the function and click
the arrow adjacent to the Function group list).
The function is inserted into the expression. If you highlight part of the expression and then insert the
function, the highlighted portion of the expression is used as the first argument in the function.
Editing a Function in an Expression
The function is not complete until you enter the arguments, represented by question marks in the pasted
function. The number of question marks indicates the minimum number of arguments required to
complete the function.
1. Highlight the question mark(s) in the pasted function.
2. Enter the arguments. If the arguments are variable names, you can paste them from the variable list.
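For example, a completed function call might look like the following sketch, where var1 through var3 are hypothetical variable names:

COMPUTE avgscore=MEAN(var1, var2, var3).
EXECUTE.

The MEAN function returns the arithmetic mean of its arguments, ignoring missing values.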
Using Conditional Expressions
You can use conditional expressions (also called logical expressions) to apply transformations to selected
subsets of cases. A conditional expression returns a value of true, false, or missing for each case. If the
result of a conditional expression is true, the transformation is applied to that case. If the result is false or
missing, the transformation is not applied to the case.
To specify a conditional expression:
1. Click If in the Compute Variable dialog box. This opens the If Cases dialog box.
2. Select Include if case satisfies condition.
3. Enter the conditional expression.
Most conditional expressions contain at least one relational operator, as in:
age>=21
or
income*3<100
In the first example, only cases with a value of 21 or greater for Age [age] are selected. In the second
example, Household income in thousands [income] multiplied by 3 must be less than 100 for a case to be
selected.
You can also link two or more conditional expressions using logical operators, as in:
age>=21 | ed>=4
or
income*3<100 & ed=5
In the first example, cases that meet either the Age [age] condition or the Level of education [ed] condition
are selected. In the second example, both the Household income in thousands [income] and Level of
education [ed] conditions must be met for a case to be selected.
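In command syntax, a conditional transformation is expressed with the IF command. A sketch combining the first condition above with the earlier jobstart computation:

IF (age >= 21) jobstart=age - employ.
EXECUTE.

Here jobstart is computed only for cases where the condition is true; for all other cases, a new variable is left system-missing.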
Working with Dates and Times
A number of tasks commonly performed with dates and times can be easily accomplished using the Date
and Time Wizard. Using this wizard, you can:
• Create a date/time variable from a string variable containing a date or time.
• Construct a date/time variable by merging variables containing different parts of the date or time.
• Add or subtract values from date/time variables, including adding or subtracting two date/time
variables.
• Extract a part of a date or time variable; for example, the day of month from a date/time variable which
has the form mm/dd/yyyy.
The examples in this section use the data file upgrade.sav. See the topic Chapter 10, “Sample Files,” on
page 73 for more information.
To use the Date and Time Wizard:
1. From the menus choose:
Transform > Date and Time Wizard…
Figure 76. Date and Time Wizard introduction screen
The introduction screen of the Date and Time Wizard presents you with a set of general tasks. Tasks that
do not apply to the current data are disabled. For example, the data file upgrade.sav doesn’t contain any
string variables, so the task to create a date variable from a string is disabled.
If you’re new to dates and times in IBM SPSS Statistics, you can select Learn how dates and times are
represented and click Next. This leads to a screen that provides a brief overview of date/time variables
and a link, through the Help button, to more detailed information.
Calculating the Length of Time between Two Dates
One of the most common tasks involving dates is calculating the length of time between two dates. As an
example, consider a software company interested in analyzing purchases of upgrade licenses by
determining the number of years since each customer last purchased an upgrade. The data file
upgrade.sav contains a variable for the date on which each customer last purchased an upgrade but not
the number of years since that purchase. A new variable that is the length of time in years between the
date of the last upgrade and the date of the next product release will provide a measure of this quantity.
To calculate the length of time between two dates:
1. Select Calculate with dates and times on the introduction screen of the Date and Time Wizard and
click Next.
2. Select Calculate the number of time units between two dates and click Next.
Figure 77. Calculating the length of time between two dates: Step 2
3. In step 2, select Date of next release for Date1.
4. Select Date of last upgrade for Date2.
5. Select Years for the Unit and Truncate to Integer for the Result Treatment. (These are the default
selections.)
6. Click Next.
7. In step 3, enter YearsLastUp for the name of the result variable. Result variables cannot have the same
name as an existing variable.
8. Enter Years since last upgrade as the label for the result variable. Variable labels for result variables
are optional.
9. Leave the default selection of Create the variable now, and click Finish to create the new variable.
The new variable, YearsLastUp, displayed in the Data Editor is the integer number of years between the
two dates. Fractional parts of a year have been truncated.
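The wizard generates command syntax for this calculation. Its core is a sketch like the following, where NextRel and LastUp stand in for the actual date variable names (hypothetical names here):

COMPUTE YearsLastUp=DATEDIFF(NextRel, LastUp, "years").
EXECUTE.

DATEDIFF returns the truncated integer difference between the two dates in the requested unit.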
Adding a Duration to a Date
You can add durations to, or subtract durations from, a date; for example, 10 days or 12 months. Continuing with the example of the software company from the previous section, consider determining the date on which each customer's initial tech support contract ends. The data file upgrade.sav contains a variable for the number of years of contracted support and a variable for the initial purchase date. You can then determine the end date of the initial support period by adding the years of support to the purchase date.
To add a duration to a date:
1. Select Calculate with dates and times on the introduction screen of the Date and Time Wizard and
click Next.
2. In step 1, select Add or subtract a duration from a date and click Next.
Figure 78. Adding a duration to a date: Step 2
3. Select Date of initial product license for Date.
4. In step 2, select Years of tech support for the Duration Variable.
Since Years of tech support is simply a numeric variable, you need to indicate the units to use when
adding this variable as a duration.
5. Select Years from the Units drop-down list.
6. Click Next.
7. In step 3, enter SupEndDate for the name of the result variable. Result variables cannot have the same
name as an existing variable.
8. Enter End date for support as the label for the result variable. Variable labels for result variables are
optional.
9. Click Finish to create the new variable.
The new variable is displayed in the Data Editor.
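Again, the wizard pastes command syntax for this task. A sketch, with PurDate and support standing in for the actual variable names (hypothetical here):

COMPUTE SupEndDate=DATESUM(PurDate, support, "years").
FORMATS SupEndDate (DATE11).
EXECUTE.

The FORMATS command assigns a date display format so that the result appears as a date rather than as a raw number.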
Chapter 9. Sorting and Selecting Data
Data files are not always organized in the ideal form for your specific needs. To prepare data for analysis,
you can select from a wide range of file transformations, including the ability to:
• Sort data. You can sort cases based on the value of one or more variables.
• Select subsets of cases. You can restrict your analysis to a subset of cases or perform simultaneous
analyses on different subsets.
The examples in this chapter use the data file demo.sav. See the topic Chapter 10, “Sample Files,” on
page 73 for more information.
Sorting Data
Sorting cases (sorting rows of the data file) is often useful and sometimes necessary for certain types of
analysis.
To reorder the sequence of cases in the data file based on the value of one or more sorting variables:
1. From the menus choose:
Data > Sort Cases…
The Sort Cases dialog box is displayed.
Figure 79. Sort Cases dialog box
2. Add the Age in years [age] and Household income in thousands [income] variables to the Sort by list.
If you select multiple sort variables, the order in which they appear on the Sort by list determines the
order in which cases are sorted. In this example, based on the entries in the Sort by list, cases will be
sorted by the value of Household income in thousands [income] within categories of Age in years [age]. For
string variables, uppercase letters precede their lowercase counterparts in sort order (for example, the
string value Yes comes before yes in the sort order).
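The equivalent command syntax is a single SORT CASES command, where (A) and (D) denote ascending and descending order:

SORT CASES BY age (A) income (A).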
Split-File Processing
To split your data file into separate groups for analysis:
1. From the menus choose:
Data > Split File…
The Split File dialog box is displayed.
Figure 80. Split File dialog box
2. Select Compare groups or Organize output by groups. (The examples following these steps show the
differences between these two options.)
3. Select Gender [gender] to split the file into separate groups based on this variable.
You can use numeric, short string, and long string variables as grouping variables. A separate analysis is
performed for each subgroup that is defined by the grouping variables. If you select multiple grouping
variables, the order in which they appear on the Groups Based on list determines the manner in which
cases are grouped.
If you select Compare groups, results from all split-file groups will be included in the same table(s), as
shown in the following table of summary statistics that is generated by the Frequencies procedure.
Figure 81. Split-file output with single pivot table
If you select Organize output by groups and run the Frequencies procedure, two pivot tables are
created: one table for females and one table for males.
Figure 82. Split-file output with pivot table for females
Figure 83. Split-file output with pivot table for males
Sorting Cases for Split-File Processing
The Split File procedure creates a new subgroup each time it encounters a different value for one of the
grouping variables. Therefore, it is important to sort cases based on the values of the grouping variables
before invoking split-file processing.
By default, Split File automatically sorts the data file based on the values of the grouping variables. If the
file is already sorted in the proper order, you can save processing time if you select File is already sorted.
Turning Split-File Processing On and Off
After you invoke split-file processing, it remains in effect for the rest of the session unless you turn it off.
• Analyze all cases. This option turns split-file processing off.
• Compare groups and Organize output by groups. This option turns split-file processing on.
If split-file processing is in effect, the message Split File on appears on the status bar at the bottom of
the application window.
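In command syntax, Compare groups corresponds to SPLIT FILE LAYERED BY, Organize output by groups to SPLIT FILE SEPARATE BY, and Analyze all cases to SPLIT FILE OFF. A sketch, including the sort that split-file processing relies on:

SORT CASES BY gender.
SPLIT FILE LAYERED BY gender.
FREQUENCIES VARIABLES=income.
SPLIT FILE OFF.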
Selecting Subsets of Cases
You can restrict your analysis to a specific subgroup based on criteria that include variables and complex
expressions. You can also select a random sample of cases. The criteria used to define a subgroup can
include:
• Variable values and ranges
• Date and time ranges
• Case (row) numbers
• Arithmetic expressions
• Logical expressions
• Functions
To select a subset of cases for analysis:
1. From the menus choose:
Data > Select Cases…
This opens the Select Cases dialog box.
Figure 84. Select Cases dialog box
Selecting Cases Based on Conditional Expressions
To select cases based on a conditional expression:
1. Select If condition is satisfied and click If in the Select Cases dialog box.
This opens the Select Cases If dialog box.
Figure 85. Select Cases If dialog box
The conditional expression can use existing variable names, constants, arithmetic operators, logical
operators, relational operators, and functions. You can type and edit the expression in the text box just
like text in an output window. You can also use the calculator pad, variable list, and function list to paste
elements into the expression. See the topic “Using Conditional Expressions” on page 63 for more
information.
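Pasting from this dialog box produces syntax along these lines (a sketch for the age>=21 condition, with filtering as the treatment of unselected cases):

USE ALL.
COMPUTE filter_$=(age >= 21).
FILTER BY filter_$.
EXECUTE.

The FILTER command excludes cases with a filter value of 0 from analysis without deleting them.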
Selecting a Random Sample
To obtain a random sample:
1. Select Random sample of cases in the Select Cases dialog box.
2. Click Sample.
This opens the Select Cases Random Sample dialog box.
Figure 86. Select Cases Random Sample dialog box
You can select one of the following alternatives for sample size:
• Approximately. A user-specified percentage. This option generates a random sample of approximately
the specified percentage of cases.
• Exactly. A user-specified number of cases. You must also specify the number of cases from which to
generate the sample. This second number should be less than or equal to the total number of cases in
the data file. If the number exceeds the total number of cases in the data file, the sample will contain
proportionally fewer cases than the requested number.
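In command syntax, random sampling is handled by the SAMPLE command, which accepts either a proportion or an exact count. Two alternative sketches (only one would be used at a time):

SAMPLE .10.
SAMPLE 50 FROM 500.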
Selecting a Time Range or Case Range
To select a range of cases based on dates, times, or observation (row) numbers:
1. Select Based on time or case range and click Range in the Select Cases dialog box.
This opens the Select Cases Range dialog box, in which you can select a range of observation (row)
numbers.
Figure 87. Select Cases Range dialog box
• First Case. Enter the starting date and/or time values for the range. If no date variables are defined,
enter the starting observation number (row number in the Data Editor, unless Split File is on). If you
do not specify a Last Case value, all cases from the starting date/time to the end of the time series
are selected.
• Last Case. Enter the ending date and/or time values for the range. If no date variables are defined,
enter the ending observation number (row number in the Data Editor, unless Split File is on). If you
do not specify a First Case value, all cases from the beginning of the time series up to the ending
date/time are selected.
For time series data with defined date variables, you can select a range of dates and/or times based on
the defined date variables. Each case represents observations at a different time, and the file is sorted
in chronological order.
Figure 88. Select Cases Range dialog box (time series)
To generate date variables for time series data:
2. From the menus choose:
Data > Define Dates…
Treatment of Unselected Cases
You can choose one of the following alternatives for the treatment of unselected cases:
• Filter out unselected cases. Unselected cases are not included in the analysis but remain in the
dataset. You can use the unselected cases later in the session if you turn filtering off. If you select a
random sample or if you select cases based on a conditional expression, this generates a variable
named filter_$ with a value of 1 for selected cases and a value of 0 for unselected cases.
• Copy selected cases to a new dataset. Selected cases are copied to a new dataset, leaving the original
dataset unaffected. Unselected cases are not included in the new dataset and are left in their original
state in the original dataset.
• Delete unselected cases. Unselected cases are deleted from the dataset. Deleted cases can be
recovered only by exiting from the file without saving any changes and then reopening the file. The
deletion of cases is permanent if you save the changes to the data file.
Note: If you delete unselected cases and save the file, the cases cannot be recovered.
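In command syntax, Delete unselected cases corresponds to SELECT IF, which drops non-matching cases from the active dataset:

SELECT IF (age >= 21).
EXECUTE.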
Case Selection Status
If you have selected a subset of cases but have not discarded unselected cases, unselected cases are
marked in the Data Editor with a diagonal line through the row number.
Figure 89. Case selection status
Chapter 10. Sample Files
The sample files installed with the product can be found in the Samples subdirectory of the installation
directory. There is a separate folder within the Samples subdirectory for each of the following languages:
English, French, German, Italian, Japanese, Korean, Polish, Russian, Simplified Chinese, Spanish, and
Traditional Chinese.
Not all sample files are available in all languages. If a sample file is not available in a language, that
language folder contains an English version of the sample file.
Descriptions
Following are brief descriptions of the sample files used in various examples throughout the
documentation.
• accidents.sav. This is a hypothetical data file that concerns an insurance company that is studying age
and gender risk factors for automobile accidents in a given region. Each case corresponds to a cross-
classification of age category and gender.
• adl.sav. This is a hypothetical data file that concerns efforts to determine the benefits of a proposed
type of therapy for stroke patients. Physicians randomly assigned female stroke patients to one of two
groups. The first received the standard physical therapy, and the second received an additional
emotional therapy. Three months following the treatments, each patient’s abilities to perform common
activities of daily life were scored as ordinal variables.
• advert.sav. This is a hypothetical data file that concerns a retailer’s efforts to examine the relationship
between money spent on advertising and the resulting sales. To this end, they have collected past sales
figures and the associated advertising costs.
• aflatoxin.sav. This is a hypothetical data file that concerns the testing of corn crops for aflatoxin, a
poison whose concentration varies widely between and within crop yields. A grain processor has
received 16 samples from each of 8 crop yields and measured the aflatoxin levels in parts per billion
(PPB).
• anorectic.sav. While working toward a standardized symptomatology of anorectic/bulimic behavior,
researchers 1 made a study of 55 adolescents with known eating disorders. Each patient was seen four
times over four years, for a total of 220 observations. At each observation, the patients were scored for
each of 16 symptoms. Symptom scores are missing for patient 71 at time 2, patient 76 at time 2, and
patient 47 at time 3, leaving 217 valid observations.
• bankloan.sav. This is a hypothetical data file that concerns a bank’s efforts to reduce the rate of loan
defaults. The file contains financial and demographic information on 850 past and prospective
customers. The first 700 cases are customers who were previously given loans. The last 150 cases are
prospective customers that the bank needs to classify as good or bad credit risks.
• bankloan_binning.sav. This is a hypothetical data file containing financial and demographic
information on 5,000 past customers.
• behavior.sav. In a classic example 2, 52 students were asked to rate the combinations of 15 situations
and 15 behaviors on a 10-point scale ranging from 0=”extremely appropriate” to 9=”extremely
inappropriate.” Averaged over individuals, the values are taken as dissimilarities.
• behavior_ini.sav. This data file contains an initial configuration for a two-dimensional solution for
behavior.sav.
1 Van der Ham, T., J. J. Meulman, D. C. Van Strien, and H. Van Engeland. 1997. Empirically based
subgrouping of eating disorders in adolescents: A longitudinal perspective. British Journal of Psychiatry,
170, 363-368.
2 Price, R. H., and D. L. Bouffard. 1974. Behavioral appropriateness and situational constraints as dimensions
of social behavior. Journal of Personality and Social Psychology, 30, 579-586.
• brakes.sav. This is a hypothetical data file that concerns quality control at a factory that produces disc
brakes for high-performance automobiles. The data file contains diameter measurements of 16 discs
from each of 8 production machines. The target diameter for the brakes is 322 millimeters.
• breakfast.sav. In a classic study 3, 21 Wharton School MBA students and their spouses were asked to
rank 15 breakfast items in order of preference with 1=”most preferred” to 15=”least preferred.” Their
preferences were recorded under six different scenarios, from “Overall preference” to “Snack, with
beverage only.”
• breakfast-overall.sav. This data file contains the breakfast item preferences for the first scenario,
“Overall preference,” only.
• broadband_1.sav. This is a hypothetical data file containing the number of subscribers, by region, to a
national broadband service. The data file contains monthly subscriber numbers for 85 regions over a
four-year period.
• broadband_2.sav. This data file is identical to broadband_1.sav but contains data for three additional
months.
• car_insurance_claims.sav. A dataset presented and analyzed elsewhere 4 concerns damage claims for
cars. The average claim amount can be modeled as having a gamma distribution, using an inverse link
function to relate the mean of the dependent variable to a linear combination of the policyholder age,
vehicle type, and vehicle age. The number of claims filed can be used as a scaling weight.
• car_sales.sav. This data file contains hypothetical sales estimates, list prices, and physical
specifications for various makes and models of vehicles. The list prices and physical specifications were
obtained alternately from edmunds.com and manufacturer sites.
• car_sales_uprepared.sav. This is a modified version of car_sales.sav that does not include any
transformed versions of the fields.
• carpet.sav. In a popular example 5, a company interested in marketing a new carpet cleaner wants to
examine the influence of five factors on consumer preference—package design, brand name, price, a
Good Housekeeping seal, and a money-back guarantee. There are three factor levels for package design,
each one differing in the location of the applicator brush; three brand names (K2R, Glory, and Bissell);
three price levels; and two levels (either no or yes) for each of the last two factors. Ten consumers rank
22 profiles defined by these factors. The variable Preference contains the rank of the average rankings
for each profile. Low rankings correspond to high preference. This variable reflects an overall measure
of preference for each profile.
• carpet_prefs.sav. This data file is based on the same example as described for carpet.sav, but it
contains the actual rankings collected from each of the 10 consumers. The consumers were asked to
rank the 22 product profiles from the most to the least preferred. The variables PREF1 through PREF22
contain the identifiers of the associated profiles, as defined in carpet_plan.sav.
• catalog.sav. This data file contains hypothetical monthly sales figures for three products sold by a
catalog company. Data for five possible predictor variables are also included.
• catalog_seasfac.sav. This data file is the same as catalog.sav except for the addition of a set of
seasonal factors calculated from the Seasonal Decomposition procedure along with the accompanying
date variables.
• cellular.sav. This is a hypothetical data file that concerns a cellular phone company’s efforts to reduce
churn. Churn propensity scores are applied to accounts, ranging from 0 to 100. Accounts scoring 50 or
above may be looking to change providers.
• ceramics.sav. This is a hypothetical data file that concerns a manufacturer’s efforts to determine
whether a new premium alloy has a greater heat resistance than a standard alloy. Each case represents
a separate test of one of the alloys; the heat at which the bearing failed is recorded.
• cereal.sav. This is a hypothetical data file that concerns a poll of 880 people about their breakfast
preferences, also noting their age, gender, marital status, and whether or not they have an active
3 Green, P. E., and V. Rao. 1972. Applied multidimensional scaling. Hinsdale,
Ill.: Dryden Press.
4 McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models, 2nd ed. London: Chapman & Hall.
5 Green, P. E., and Y. Wind. 1973. Multiattribute decisions in marketing: A measurement approach. Hinsdale,
Ill.: Dryden Press.
lifestyle (based on whether they exercise at least twice a week). Each case represents a separate
respondent.
• clothing_defects.sav. This is a hypothetical data file that concerns the quality control process at a
clothing factory. From each lot produced at the factory, the inspectors take a sample of clothes and
count the number of clothes that are unacceptable.
• coffee.sav. This data file pertains to perceived images of six iced-coffee brands 6. For each of 23 iced-
coffee image attributes, people selected all brands that were described by the attribute. The six brands
are denoted AA, BB, CC, DD, EE, and FF to preserve confidentiality.
• contacts.sav. This is a hypothetical data file that concerns the contact lists for a group of corporate
computer sales representatives. Each contact is categorized by the department of the company in
which they work and their company rank. Also recorded are the amount of the last sale made, the time
since the last sale, and the size of the contact’s company.
• creditpromo.sav. This is a hypothetical data file that concerns a department store’s efforts to evaluate
the effectiveness of a recent credit card promotion. To this end, 500 cardholders were randomly
selected. Half received an ad promoting a reduced interest rate on purchases made over the next three
months. Half received a standard seasonal ad.
• customer_dbase.sav. This is a hypothetical data file that concerns a company’s efforts to use the
information in its data warehouse to make special offers to customers who are most likely to reply. A
subset of the customer base was selected at random and given the special offers, and their responses
were recorded.
• customer_information.sav. A hypothetical data file containing customer mailing information, such as
name and address.
• customer_subset.sav. A subset of 80 cases from customer_dbase.sav.
• debate.sav. This is a hypothetical data file that concerns paired responses to a survey from attendees
of a political debate before and after the debate. Each case corresponds to a separate respondent.
• debate_aggregate.sav. This is a hypothetical data file that aggregates the responses in debate.sav.
Each case corresponds to a cross-classification of preference before and after the debate.
• demo.sav. This is a hypothetical data file that concerns a purchased customer database, for the
purpose of mailing monthly offers. Whether or not the customer responded to the offer is recorded,
along with various demographic information.
• demo_cs_1.sav. This is a hypothetical data file that concerns the first step of a company’s efforts to
compile a database of survey information. Each case corresponds to a different city, and the region,
province, district, and city identification are recorded.
• demo_cs_2.sav. This is a hypothetical data file that concerns the second step of a company’s efforts to
compile a database of survey information. Each case corresponds to a different household unit from
cities selected in the first step, and the region, province, district, city, subdivision, and unit identification
are recorded. The sampling information from the first two stages of the design is also included.
• demo_cs.sav. This is a hypothetical data file that contains survey information collected using a complex
sampling design. Each case corresponds to a different household unit, and various demographic and
sampling information is recorded.
• diabetes_costs.sav. This is a hypothetical data file that contains information that is maintained by an
insurance company on policy holders who have diabetes. Each case corresponds to a different policy
holder.
• dietstudy.sav. This hypothetical data file contains the results of a study of the “Stillman diet” 7. Each
case corresponds to a separate subject and records his or her pre- and post-diet weights in pounds and
triglyceride levels in mg/100 ml.
6 Kennedy, R., C. Riquier, and B. Sharp. 1996. Practical applications of correspondence analysis to
categorical data in market research. Journal of Targeting, Measurement, and Analysis for Marketing, 5,
56-70.
7 Rickman, R., N. Mitchell, J. Dingman, and J. E. Dalen. 1974. Changes in serum cholesterol during the
Stillman Diet. Journal of the American Medical Association, 228, 54-58.
• dmdata.sav. This is a hypothetical data file that contains demographic and purchasing information for a
direct marketing company. dmdata2.sav contains information for a subset of contacts that received a
test mailing, and dmdata3.sav contains information on the remaining contacts who did not receive the
test mailing.
• dvdplayer.sav. This is a hypothetical data file that concerns the development of a new DVD player.
Using a prototype, the marketing team has collected focus group data. Each case corresponds to a
separate surveyed user and records some demographic information about them and their responses to
questions about the prototype.
• german_credit.sav. This data file is taken from the “German credit” dataset in the Repository of
Machine Learning Databases 8 at the University of California, Irvine.
• grocery_1month.sav. This hypothetical data file is the grocery_coupons.sav data file with the weekly
purchases “rolled-up” so that each case corresponds to a separate customer. Some of the variables
that changed weekly disappear as a result, and the recorded amount spent is now the sum of the
amounts spent during the four weeks of the study.
• grocery_coupons.sav. This is a hypothetical data file that contains survey data collected by a grocery
store chain interested in the purchasing habits of their customers. Each customer is followed for four
weeks, and each case corresponds to a separate customer-week and records information about where
and how the customer shops, including how much was spent on groceries during that week.
• guttman.sav. Bell 9 presented a table to illustrate possible social groups. Guttman 10 used a portion of
this table, in which five variables describing such things as social interaction, feelings of belonging to a
group, physical proximity of members, and formality of the relationship were crossed with seven
theoretical social groups, including crowds (for example, people at a football game), audiences (for
example, people at a theater or classroom lecture), public (for example, newspaper or television
audiences), mobs (like a crowd but with much more intense interaction), primary groups (intimate),
secondary groups (voluntary), and the modern community (loose confederation resulting from close
physical proximity and a need for specialized services).
• health_funding.sav. This is a hypothetical data file that contains data on health care funding (amount
per 100 population), disease rates (rate per 10,000 population), and visits to health care providers (rate
per 10,000 population). Each case represents a different city.
• hivassay.sav. This is a hypothetical data file that concerns the efforts of a pharmaceutical lab to
develop a rapid assay for detecting HIV infection. The results of the assay are eight deepening shades of
red, with deeper shades indicating greater likelihood of infection. A laboratory trial was conducted on
2,000 blood samples, half of which were infected with HIV and half of which were clean.
• hourlywagedata.sav. This is a hypothetical data file that concerns the hourly wages of nurses from
office and hospital positions and with varying levels of experience.
• insurance_claims.sav. This is a hypothetical data file that concerns an insurance company that wants
to build a model for flagging suspicious, potentially fraudulent claims. Each case represents a separate
claim.
• insure.sav. This is a hypothetical data file that concerns an insurance company that is studying the risk
factors that indicate whether a client will have to make a claim on a 10-year term life insurance
contract. Each case in the data file represents a pair of contracts, one of which recorded a claim and the
other didn’t, matched on age and gender.
• judges.sav. This is a hypothetical data file that concerns the scores given by trained judges (plus one
enthusiast) to 300 gymnastics performances. Each row represents a separate performance; the judges
viewed the same performances.
• kinship_dat.sav. Rosenberg and Kim 11 set out to analyze 15 kinship terms (aunt, brother, cousin,
daughter, father, granddaughter, grandfather, grandmother, grandson, mother, nephew, niece, sister,
8 Blake, C. L., and C. J. Merz. 1998. “UCI Repository of machine learning databases.” Available at http://
www.ics.uci.edu/~mlearn/MLRepository.html.
9 Bell, E. H. 1961. Social foundations of human behavior: Introduction to the study of sociology. New York:
Harper & Row.
10 Guttman, L. 1968. A general nonmetric technique for finding the smallest coordinate space for
configurations of points. Psychometrika, 33, 469-506.
son, uncle). They asked four groups of college students (two female, two male) to sort these terms on
the basis of similarities. Two groups (one female, one male) were asked to sort twice, with the second
sorting based on a different criterion from the first sort. Thus, a total of six “sources” were obtained.
Each source corresponds to a 15 x 15 proximity matrix, whose cells are equal to the number of people
in a source minus the number of times the objects were partitioned together in that source.
• kinship_ini.sav. This data file contains an initial configuration for a three-dimensional solution for
kinship_dat.sav.
• kinship_var.sav. This data file contains independent variables gender, gener(ation), and degree (of
separation) that can be used to interpret the dimensions of a solution for kinship_dat.sav. Specifically,
they can be used to restrict the space of the solution to a linear combination of these variables.
• marketvalues.sav. This data file concerns home sales in a new housing development in Algonquin, Ill.,
during the years from 1999–2000. These sales are a matter of public record.
• nhis2000_subset.sav. The National Health Interview Survey (NHIS) is a large, population-based survey
of the U.S. civilian population. Interviews are carried out face-to-face in a nationally representative
sample of households. Demographic information and observations about health behaviors and status
are obtained for members of each household. This data file contains a subset of information from the
2000 survey. National Center for Health Statistics. National Health Interview Survey, 2000. Public-use
data file and documentation. ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NHIS/2000/.
Accessed 2003.
• ozone.sav. The data include 330 observations on six meteorological variables for predicting ozone
concentration from the remaining variables. Previous researchers 12, 13, among others, found
nonlinearities among these variables, which hinder standard regression approaches.
• pain_medication.sav. This hypothetical data file contains the results of a clinical trial for anti-
inflammatory medication for treating chronic arthritic pain. Of particular interest is the time it takes for
the drug to take effect and how it compares to an existing medication.
• patient_los.sav. This hypothetical data file contains the treatment records of patients who were
admitted to the hospital for suspected myocardial infarction (MI, or “heart attack”). Each case
corresponds to a separate patient and records many variables related to their hospital stay.
• patlos_sample.sav. This hypothetical data file contains the treatment records of a sample of patients
who received thrombolytics during treatment for myocardial infarction (MI, or “heart attack”). Each case
corresponds to a separate patient and records many variables related to their hospital stay.
• poll_cs.sav. This is a hypothetical data file that concerns pollsters’ efforts to determine the level of
public support for a bill before the legislature. The cases correspond to registered voters. Each case
records the county, township, and neighborhood in which the voter lives.
• poll_cs_sample.sav. This hypothetical data file contains a sample of the voters listed in poll_cs.sav. The
sample was taken according to the design specified in the poll.csplan plan file, and this data file records
the inclusion probabilities and sample weights. Note, however, that because the sampling plan makes
use of a probability-proportional-to-size (PPS) method, there is also a file containing the joint selection
probabilities (poll_jointprob.sav). The additional variables corresponding to voter demographics and
their opinion on the proposed bill were collected and added to the data file after the sample was taken.
• property_assess.sav. This is a hypothetical data file that concerns a county assessor’s efforts to keep
property value assessments up to date on limited resources. The cases correspond to properties sold in
the county in the past year. Each case in the data file records the township in which the property lies,
the assessor who last visited the property, the time since that assessment, the valuation made at that
time, and the sale value of the property.
• property_assess_cs.sav. This is a hypothetical data file that concerns a state assessor’s efforts to keep
property value assessments up to date on limited resources. The cases correspond to properties in the
11 Rosenberg, S., and M. P. Kim. 1975. The method of sorting as a data-gathering procedure in multivariate
research. Multivariate Behavioral Research, 10, 489-502.
12 Breiman, L., and J. H. Friedman. 1985. Estimating optimal transformations for multiple regression and
correlation. Journal of the American Statistical Association, 80, 580-598.
13 Hastie, T., and R. Tibshirani. 1990. Generalized additive models. London: Chapman and Hall.
state. Each case in the data file records the county, township, and neighborhood in which the property
lies, the time since the last assessment, and the valuation made at that time.
• property_assess_cs_sample.sav. This hypothetical data file contains a sample of the properties listed
in property_assess_cs.sav. The sample was taken according to the design specified in the
property_assess.csplan plan file, and this data file records the inclusion probabilities and sample
weights. The additional variable Current value was collected and added to the data file after the sample
was taken.
• recidivism.sav. This is a hypothetical data file that concerns a government law enforcement agency’s
efforts to understand recidivism rates in their area of jurisdiction. Each case corresponds to a previous
offender and records their demographic information, some details of their first crime, and then the time
until their second arrest, if it occurred within two years of the first arrest.
• recidivism_cs_sample.sav. This is a hypothetical data file that concerns a government law
enforcement agency’s efforts to understand recidivism rates in their area of jurisdiction. Each case
corresponds to a previous offender, released from their first arrest during the month of June, 2003, and
records their demographic information, some details of their first crime, and the date of their second
arrest, if it occurred by the end of June, 2006. Offenders were selected from sampled departments
according to the sampling plan specified in recidivism_cs.csplan; because it makes use of a probability-
proportional-to-size (PPS) method, there is also a file containing the joint selection probabilities
(recidivism_cs_jointprob.sav).
• rfm_transactions.sav. A hypothetical data file containing purchase transaction data, including date of
purchase, item(s) purchased, and monetary amount of each transaction.
• salesperformance.sav. This is a hypothetical data file that concerns the evaluation of two new sales
training courses. Sixty employees, divided into three groups, all receive standard training. In addition,
group 2 gets technical training; group 3, a hands-on tutorial. Each employee was tested at the end of
the training course and their score recorded. Each case in the data file represents a separate trainee
and records the group to which they were assigned and the score they received on the exam.
• satisf.sav. This is a hypothetical data file that concerns a satisfaction survey conducted by a retail
company at 4 store locations. 582 customers were surveyed in all, and each case represents the
responses from a single customer.
• screws.sav. This data file contains information on the characteristics of screws, bolts, nuts, and
tacks 14.
• shampoo_ph.sav. This is a hypothetical data file that concerns the quality control at a factory for hair
products. At regular time intervals, six separate output batches are measured and their pH recorded.
The target range is 4.5–5.5.
• ships.sav. A dataset presented and analyzed elsewhere 15 concerns damage to cargo ships caused
by waves. The incident counts can be modeled as occurring at a Poisson rate given the ship type,
construction period, and service period. The aggregate months of service for each cell of the table
formed by the cross-classification of factors provides values for the exposure to risk.
• site.sav. This is a hypothetical data file that concerns a company’s efforts to choose new sites for their
expanding business. They have hired two consultants to separately evaluate the sites, who, in addition
to an extended report, summarized each site as a “good,” “fair,” or “poor” prospect.
• smokers.sav. This data file is abstracted from the 1998 National Household Survey of Drug Abuse and
is a probability sample of American households. (http://dx.doi.org/10.3886/ICPSR02934) Thus, the first
step in an analysis of this data file should be to weight the data to reflect population trends.
• stocks.sav. This hypothetical data file contains stock prices and volume for one year.
• stroke_clean.sav. This hypothetical data file contains the state of a medical database after it has been
cleaned using procedures in the Data Preparation option.
• stroke_invalid.sav. This hypothetical data file contains the initial state of a medical database and
contains several data entry errors.
14 Hartigan, J. A. 1975. Clustering algorithms. New York: John Wiley and Sons.
15 McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models, 2nd ed. London: Chapman & Hall.
• stroke_survival. This hypothetical data file concerns survival times for patients exiting a rehabilitation program after an ischemic stroke. Such patients face a number of post-stroke risks: the occurrence of myocardial infarction, ischemic stroke, or hemorrhagic stroke is noted, and the time of the event is recorded. The sample is left-truncated because it includes only patients who survived through the end of the rehabilitation program administered post-stroke.
• stroke_valid.sav. This hypothetical data file contains the state of a medical database after the values
have been checked using the Validate Data procedure. It still contains potentially anomalous cases.
• survey_sample.sav. This data file contains survey data, including demographic data and various
attitude measures. It is based on a subset of variables from the 1998 NORC General Social Survey,
although some data values have been modified and additional fictitious variables have been added for
demonstration purposes.
• tcm_kpi.sav. This is a hypothetical data file that contains values of weekly key performance indicators
for a business. It also contains weekly data for a number of controllable metrics over the same time
period.
• tcm_kpi_upd.sav. This data file is identical to tcm_kpi.sav but contains data for four extra weeks.
• telco.sav. This is a hypothetical data file that concerns a telecommunications company’s efforts to
reduce churn in their customer base. Each case corresponds to a separate customer and records
various demographic and service usage information.
• telco_extra.sav. This data file is similar to the telco.sav data file, but the “tenure” and log-transformed
customer spending variables have been removed and replaced by standardized log-transformed
customer spending variables.
• telco_missing.sav. This data file is a subset of the telco.sav data file, but some of the demographic data
values have been replaced with missing values.
• testmarket.sav. This hypothetical data file concerns a fast food chain’s plans to add a new item to its
menu. There are three possible campaigns for promoting the new product, so the new item is
introduced at locations in several randomly selected markets. A different promotion is used at each
location, and the weekly sales of the new item are recorded for the first four weeks. Each case
corresponds to a separate location-week.
• testmarket_1month.sav. This hypothetical data file is the testmarket.sav data file with the weekly
sales “rolled-up” so that each case corresponds to a separate location. Some of the variables that
changed weekly disappear as a result, and the recorded sales are now the sum of the sales during the
four weeks of the study.
• tree_car.sav. This is a hypothetical data file containing demographic and vehicle purchase price data.
• tree_credit.sav. This is a hypothetical data file containing demographic and bank loan history data.
• tree_missing_data.sav. This is a hypothetical data file containing demographic and bank loan history
data with a large number of missing values.
• tree_score_car.sav. This is a hypothetical data file containing demographic and vehicle purchase price
data.
• tree_textdata.sav. A simple data file with only two variables intended primarily to show the default
state of variables prior to assignment of measurement level and value labels.
• tv-survey.sav. This is a hypothetical data file that concerns a survey conducted by a TV studio that is
considering whether to extend the run of a successful program. 906 respondents were asked whether
they would watch the program under various conditions. Each row represents a separate respondent;
each column is a separate condition.
• ulcer_recurrence.sav. This file contains partial information from a study designed to compare the
efficacy of two therapies for preventing the recurrence of ulcers. It provides a good example of interval-
censored data and has been presented and analyzed elsewhere 16.
• ulcer_recurrence_recoded.sav. This file reorganizes the information in ulcer_recurrence.sav to allow
you to model the event probability for each interval of the study rather than simply the end-of-study
event probability. It has been presented and analyzed elsewhere 17.
16 Collett, D. 2003. Modelling survival data in medical research, 2nd ed. Boca Raton: Chapman & Hall/CRC.
• verd1985.sav. This data file concerns a survey 18. The responses of 15 subjects to 8 variables were
recorded. The variables of interest are divided into three sets. Set 1 includes age and marital, set 2
includes pet and news, and set 3 includes music and live. Pet is scaled as multiple nominal and age is
scaled as ordinal; all of the other variables are scaled as single nominal.
• virus.sav. This is a hypothetical data file that concerns the efforts of an Internet service provider (ISP)
to determine the effects of a virus on its networks. The ISP tracked the (approximate) percentage of
infected e-mail traffic on its networks over time, from the moment of discovery until the threat was
contained.
• wheeze_steubenville.sav. This is a subset from a longitudinal study of the health effects of air
pollution on children 19. The data contain repeated binary measures of the wheezing status for children
from Steubenville, Ohio, at ages 7, 8, 9 and 10 years, along with a fixed recording of whether or not the
mother was a smoker during the first year of the study.
• workprog.sav. This is a hypothetical data file that concerns a government works program that tries to
place disadvantaged people into better jobs. A sample of potential program participants was followed,
some of whom were randomly selected for enrollment in the program, while others were not. Each case
represents a separate program participant.
• worldsales.sav. This hypothetical data file contains sales revenue by continent and product.
17 Collett, D. 2003. Modelling survival data in medical research, 2nd ed. Boca Raton: Chapman & Hall/CRC.
18 Verdegaal, R. 1985. Meer sets analyse voor kwalitatieve gegevens (in Dutch). Leiden: Department of Data
Theory, University of Leiden.
19 Ware, J. H., D. W. Dockery, A. Spiro III, F. E. Speizer, and B. G. Ferris Jr. 1984. Passive smoking, gas
cooking, and respiratory health of children living in six cities. American Review of Respiratory Diseases, 129,
366-374.
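The weighting step recommended for smokers.sav can be run from a syntax window. The following is a
minimal sketch, assuming the sampling weight variable is named analwt and the demographic variable is
named gender (both hypothetical names; check the actual variable names in the Data Editor):

    * Weight cases by the sampling weight variable (name assumed).
    WEIGHT BY analwt.
    * Subsequent procedures, such as this frequency table, report weighted counts.
    FREQUENCIES VARIABLES=gender.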
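The location-week roll-up described for testmarket_1month.sav is the kind of summary that the
AGGREGATE command produces. A sketch, assuming hypothetical variable names location (the location
identifier) and sales (weekly sales):

    * Collapse location-weeks into one case per location, summing weekly sales.
    AGGREGATE
      /OUTFILE=*
      /BREAK=location
      /total_sales=SUM(sales).

With OUTFILE=*, the aggregated cases replace the active dataset, so each case then corresponds to a
single location.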
Notices
This information was developed for products and services offered in the US. This material might be
available from IBM in other languages. However, you may be required to own a copy of the product or
product version in that language in order to access it.
IBM may not offer the products, services, or features discussed in this document in other countries.
Consult your local IBM representative for information on the products and services currently available in
your area. Any reference to an IBM product, program, or service is not intended to state or imply that only
that IBM product, program, or service may be used. Any functionally equivalent product, program, or
service that does not infringe any IBM intellectual property right may be used instead. However, it is the
user’s responsibility to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this
document. The furnishing of this document does not grant you any license to these patents. You can send
license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive, MD-NC119
Armonk, NY 10504-1785
US
For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual Property
Department in your country or send inquiries, in writing, to:
Intellectual Property Licensing
Legal and Intellectual Property Law
IBM Japan Ltd.
19-21, Nihonbashi-Hakozakicho, Chuo-ku
Tokyo 103-8510, Japan
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION “AS IS”
WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE. Some jurisdictions do not allow disclaimer of express or implied warranties in
certain transactions; therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically
made to the information herein; these changes will be incorporated in new editions of the publication.
IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this
publication at any time without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in
any manner serve as an endorsement of those websites. The materials at those websites are not part of
the materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you provide in any way it believes appropriate without
incurring any obligation to you.
Licensees of this program who wish to have information about it for the purpose of enabling: (i) the
exchange of information between independently created programs and other programs (including this
one) and (ii) the mutual use of the information which has been exchanged, should contact:
IBM Director of Licensing
IBM Corporation
North Castle Drive, MD-NC119
Armonk, NY 10504-1785
US
Such information may be available, subject to appropriate terms and conditions, including in some cases,
payment of a fee.
The licensed program described in this document and all licensed material available for it are provided by
IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or any
equivalent agreement between us.
The performance data and client examples cited are presented for illustrative purposes only. Actual
performance results may vary depending on specific configurations and operating conditions.
Information concerning non-IBM products was obtained from the suppliers of those products, their
published announcements or other publicly available sources. IBM has not tested those products and
cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM
products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of
those products.
Statements regarding IBM’s future direction or intent are subject to change or withdrawal without notice,
and represent goals and objectives only.
This information contains examples of data and reports used in daily business operations. To illustrate
them as completely as possible, the examples include the names of individuals, companies, brands, and
products. All of these names are fictitious and any similarity to actual people or business enterprises is
entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs
in any form without payment to IBM, for the purposes of developing, using, marketing or distributing
application programs conforming to the application programming interface for the operating platform for
which the sample programs are written. These examples have not been thoroughly tested under all
conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these
programs. The sample programs are provided “AS IS”, without warranty of any kind. IBM shall not be
liable for any damages arising out of your use of the sample programs.
Each copy or any portion of these sample programs or any derivative work, must include a copyright
notice as follows:
© Copyright IBM Corp. 2020. Portions of this code are derived from IBM Corp. Sample Programs.
© Copyright IBM Corp. 1989 – 2020. All rights reserved.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at
“Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or
trademarks of Adobe Systems Incorporated in the United States, and/or other countries.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon,
Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or
its subsidiaries in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the
United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or
its affiliates.
Index
A
Access (Microsoft) 11
B
bar charts 24
C
cases
selecting 69
sorting 67, 69
categorical data
summary measures 23
charts
bar 24, 27
creating charts 27
histograms 26
computing new variables 61
conditional expressions 63
continuous data 23
counts
tables of counts 23
create variable labels 19
D
Data Editor
entering non-numeric data 18
entering numeric data 17
data entry 17, 18
data types
for variables 20
database files
reading 11
Database Wizard 11
date and time variables 63
Date and Time Wizard 63
E
editing pivot tables 36
entering data
non-numeric 18
numeric 17
Excel (Microsoft)
exporting results to 46
Excel files
reading 8
exporting results
HTML 53
to Excel 46
to PowerPoint 46
to Word 46
F
frequency tables 23
functions in expressions 62
G
graphs
bar 27
creating graphs 27
H
hiding rows and columns in pivot tables 37
histograms 26
HTML
exporting results 53
I
interval data 23
L
layers
creating in pivot tables 36
level of measurement 23
M
measurement level 23
missing values
for non-numeric variables 22
for numeric variables 21
system-missing 21
moving
elements in pivot tables 35
items in the Viewer 33
N
nominal data 23
numeric data 17
O
ordinal data 23
P
pasting syntax
from a dialog box 55
pivot tables
accessing definitions 34
cell data types 37
pivot tables (continued)
cell formats 37
editing 36
formatting 36
hiding decimal points 37
hiding rows and columns 37
layers 36
pivoting trays 35
transposing rows and columns 35
PowerPoint (Microsoft)
exporting results to 46
Q
qualitative data 23
quantitative data 23
R
ratio data 23
recoding values 59
S
sample files
location 73
scale data 23
scale variables
summary measures 25
selecting cases 69
sorting cases 67
split-file processing 67
spreadsheet files
reading 8
reading variable names 8
string data
entering data 18
subsets of cases
based on dates and times 71
conditional expressions 70
deleting unselected cases 72
filtering unselected cases 72
if condition is satisfied 70
random sample 71
selecting 69
summary measures
categorical data 23
scale variables 25
syntax 55
syntax files
opening 57
Syntax Help tool 56
syntax windows
auto-completion 56
breakpoints 57
color coding 56
editing commands 56
pasting commands 55
running commands 55, 57
system-missing values 21
T
text data files
reading 13
Text Import Wizard 13
transposing (flipping) rows and columns in pivot tables 35
V
value labels
assigning 20
controlling display in Viewer 20
numeric variables 20
variable labels
creating 19
variables
data types 20
labels 19
Viewer
hiding and showing output 33
moving output 33
W
Word (Microsoft)
exporting results to 46