IB Biology Internal Assessment (23/24)

Hi all!

Below I will attach a PDF of my Biology IA (submitted for Biology HL). It scored 23/24 (which, according to the boundaries of the M20 session, was a 7). Unfortunately, I don’t know where I lost the one mark.

Quick disclaimer: my Biology IA was a database IA, so most of the tips I’ll share in this post relate specifically to writing a successful database IA. If you’re looking for tips for an experiment-based IA, I’d recommend checking out my post about my Chemistry IA, where I share some of my experiences with an experiment-based IA as well as a general structure I’d replicate when writing one.

Across my three sciences (Biology HL, Chemistry HL, Physics SL), I wrote two experiment-based IAs and one database IA. As such, I feel as though I have a pretty solid understanding of the pros and cons of each IA “type” (excluding, of course, a simulation-based IA). In short, I can express the essence of these two IA types in quite a rudimentary table:

Aspect | Experiment-based IA | Database IA
Time and Effort | Laborious | Minimal
Analysis and Evaluation | Straightforward | Intense
A table comparing experiment-based and database IAs

As per the above table, one of the downsides of an experiment-based IA is the amount of effort required to complete it. In experiment-based IAs, a lot of time and effort goes into planning your methodology, conducting preliminary trials, conducting the experiment itself, and so on. However, this hard work pays off: the analysis and evaluation of your data is pretty straightforward, since there’s so much you could talk about when it comes to the accuracy and precision of your experiment.

On the other hand, a database IA requires considerably less time and effort to plan. Once you find a good data source and set up your primary equations in a spreadsheet, Excel practically does the rest of the work for you. Personally, it took me about two days to find all my data and process it. However, the drawback of a database IA is that its analysis and evaluation (which together account for half of the points you can achieve for your IA) require a lot of critical thinking and a good understanding of statistics and data sampling. Because most people don’t have a good enough understanding of statistics and data sampling, they tend to score poorly in database IAs, or shy away from them completely. In this post, I hope to give you a solid understanding of how to successfully complete a database IA, and hopefully my own IA acts as a decent exemplar for all of you to use.

The IA which I wrote was a “correlation-based IA”, which essentially means it explored the correlation between two biology-related variables. I have not yet seen anyone write a database IA that wasn’t correlation-based, so in this post I’ll focus on the structure and content of a correlation-based database IA. To do this, I’ll propose a general structure to use when writing a correlation-based database IA, and expand on some of the technical information that you should include in each section.

1. Research Question: In this section, state your research question. If you’re writing a correlation-based database IA, make sure that your research question isn’t too simple, and add some unique ‘twist’ to your investigation. For example, instead of just determining the correlation between the Human Development Index (HDI) and mortality rates due to coronary heart disease (CHD), I decided to specifically compare this correlation in developing and developed countries. Other ‘twists’ you could add to your investigation are to look at your correlation in different age groups, or between men and women.

2. Introduction: In this section, explain why you chose to explore your particular research question. This is where I’d sneak in a bit about the connection between the research question and your interests or personal life (I was personally inspired to write my IA after I shadowed a cardiologist at a local hospital). You might also want to mention how answering your research question has important real-world applications. In my own IA, I merged the ‘Introduction’ section into the ‘Background Information’ section so that my IA didn’t exceed the 12-page limit, but if you’re not running out of space I’d recommend making two separate sections.

3. Background Information: In this section, you want to demonstrate all the biology knowledge that’s pertinent to your research question. This section is very important in a correlation-based database IA, given that it’s one of the only sections where you have an opportunity to discuss the biological background of your investigation. It also acts as a reminder that your IA is biology-focused, not maths-focused. Additionally, in this section you should discuss any other background information that’s relevant to your investigation. For example, if you’re exploring the correlation between HDI and CHD mortality (as I did), you’ll want to use the ‘Background Information’ section to explain not only the pathogenesis of CHD but also the significance of CHD as a socioeconomic indicator.

4. Hypothesis: This section is pretty self-explanatory; just state your hypothesis. This should ideally be accompanied by a scientific explanation to support your hypothesis. In my case, I referenced a study about the correlation between the HDI and healthcare quality in a country to justify why HDI and CHD mortality should be negatively correlated.

5. Approach to the Research Question: In this section you should demonstrate some of your personal engagement with the IA by explaining how you developed your methodology. For a correlation-based database IA, I suggest considering three main points in this section: 1) how you controlled confounding variables in your investigation, 2) how you minimised the effects of errors and variability in your data, and 3) how you standardised your variables. Below I elaborate on these three points, using what I hope is a useful analogy.

In its most basic form, a correlation-based database IA is the development of an algorithm to process raw data into a form which allows you to determine whether a correlation exists between two variables. You can think of this algorithm like a machine, where your raw data is the input and the processed data is the output. In the “Approach to the Research Question” section, you essentially outline the three main ‘steps’ of the machine. The diagram below is a helpful guide:

Diagram of database IA “machine”

As you can see, the first “step” in the database machine is to control the raw data you collect for confounding variables. A confounding variable is a variable that influences both your dependent and independent variables (e.g. a variable that influences both HDI and mortality rates due to CHD). For instance, lifestyle habits are an example of a variable which may affect both the HDI of a country and its mortality rate due to CHD. If confounding variables are not controlled for, they could lead to spurious correlations in your investigation. Confounding variables can also be variables other than your independent variable that influence your dependent variable, which you should also control (these are analogous to controlled variables in experiment-based IAs). Ultimately, to control confounding variables in your investigation you must develop inclusion criteria. The “Inclusion Criteria” section comes later in the IA, but you can already foreshadow it in this section.

The second “step” in the database machine is to take the data you’ve adjusted for confounding and adjust it further, this time for random variability. Random variability in data may arise for a variety of reasons, and these reasons are typically difficult to identify. However, random errors in your data may contribute to a spurious correlation, so random variability must be accounted for. For example, in my IA I looked at data relating to CHD mortality across different years in different countries. In any one year, there might have been some unknown factor which influenced CHD mortality in a given country, such as a sampling error or the introduction of a new procedure to treat CHD. As such, I decided to account for random variability by calculating each country’s mean mortality rate due to CHD across the years sampled.
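If you’d like to see the averaging step outside a spreadsheet, here’s a minimal Python sketch of it. The country names and yearly values are hypothetical placeholders (not real WHO figures), and storing each country’s data as a year-to-value dictionary is just one assumed way of organising your raw data; in Excel, the equivalent would be an AVERAGE over each country’s row.

```python
from statistics import mean

# Hypothetical raw data: yearly CHD mortality values for each country.
# Placeholder numbers for illustration only, not real WHO figures.
yearly_chd_mortality = {
    "Country A": {2000: 310.0, 2001: 295.0, 2002: 330.0, 2003: 301.0},
    "Country B": {2000: 120.0, 2001: 118.0, 2002: 125.0},
}


def mean_over_years(values_by_year):
    """Average one country's yearly values so that a single unusual year
    (e.g. a sampling error) has less influence on the final data point."""
    return mean(values_by_year.values())


country_means = {
    country: mean_over_years(values)
    for country, values in yearly_chd_mortality.items()
}
print(country_means)  # {'Country A': 309.0, 'Country B': 121.0}
```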

The last “step” in the database machine is to take the data you’ve adjusted (for confounding and random variability) and standardise it. Standardising data allows you to compare it fairly. For example, in my IA I looked at mortality rates due to CHD, and standardised the mortality rates I collected by expressing them per 100,000 people in a country’s population. This is important because the number of people who die from CHD in any given country depends on that country’s population size. There are, of course, many other ways to standardise data, but for most correlation-based database IAs I’ve seen (where mortality or survival rates are used), expressing your data per unit of population is a good way to go.
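Here’s a similarly minimal Python sketch of the per-100,000 standardisation, using made-up death counts and populations. It illustrates why raw counts can’t be compared directly: the same number of deaths corresponds to very different rates in a small and a large country. The function name per_100k is just an illustrative choice.

```python
def per_100k(deaths, population):
    """Express a death count as a rate per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to keep round numbers exact in floating point.
    return deaths * 100_000 / population


# Hypothetical example: the same number of deaths in two very different populations.
small = per_100k(deaths=3_000, population=2_000_000)   # 150.0 per 100,000
large = per_100k(deaths=3_000, population=60_000_000)  # 5.0 per 100,000
print(small, large)
```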

6. Data Sources: In this section of your IA, list all of the data sources you’ve used to carry out your investigation. You should also explain why your chosen data sources are reliable and credible. Generally, if your data sources are well-recognised data-collecting institutions (e.g. the WHO, the World Bank), you can argue that they are trustworthy and ergo reliable. For population statistics I’d use the World Bank database; mortality rates due to a variety of different diseases are provided by the WHO; and HDI data can be found on the United Nations Development Programme’s website.

7. Variables: In this section, state the final variables which you will explore in the investigation. This includes your independent variable (e.g. HDI) and your dependent variable (e.g. mortality rate due to CHD per 100,000 people). Additionally, state that other variables exist which you need to control (e.g. confounding variables), and that you will design inclusion criteria to control these variables.

8. Inclusion Criteria: In this section you will outline the inclusion criteria which you’ve designed for your investigation. In short, inclusion criteria are characteristics which your raw data must have in order to be used in the investigation. These criteria aim not only to adjust your data for confounding, but also to control other factors so that your results are more accurate and representative. As an example, the inclusion criteria for my own IA were as follows:

Inclusion Criteria

As you can see, my inclusion criteria consisted of four variables, presented in a table: location, population, HDI, and socioeconomic organisation. Given that my investigation looked at the distinction between developing and developed countries, I created separate inclusion criteria for each. For each inclusion criterion you design, you need to explain how it will enhance the accuracy or representativeness of your results. Below I outline my reasons for choosing each of my variables; in your own IA, you should likewise justify the inclusion criteria you design.

Location: I chose to limit my sample to European countries in order to limit the effects of confounding variables such as lifestyle and dietary habits. These European countries were those defined as European by the World Health Organization, as per its website. This inclusion criterion was the same for both developing and developed countries.

Population: If you are sampling data from individual countries, it is necessary to ensure that the population size of these countries is sufficiently large. The larger the population, the more precise and representative your results will be (and vice versa). Naturally, I’m not knowledgeable enough to decide which population size is large enough to have confidence in the precision of my data. As such, I referenced a scientific study by Zhu et al. which stated that a sample size of 2 million is enough to ensure precise results. This inclusion criterion excluded certain European countries, such as Liechtenstein and Monaco, from my investigation.

HDI: According to the United Nations Development Programme, “countries with an HDI score higher than 0.788 are considered to be developed, while countries with an HDI value lower than 0.788 are considered to be developing”. I used this parameter to determine which sampled countries are developing and which are developed.

Socioeconomic organisation: I chose to further limit the eligible countries in my investigation to two socioeconomic organisations in order to limit the effects of confounding variables such as economic and cultural status. The two socioeconomic organisations I chose were the CEIT (Countries with Economies in Transition) for developing countries and the OECD (Organisation for Economic Co-operation and Development) for developed countries.

As you can see, my inclusion criteria specified that conditions on variables such as population and HDI had to hold from 2000 onwards. This means that an eligible developing country had to have, for example, an HDI below 0.788 in every year since 2000. This is because I sampled data for my investigation from the year 2000 onwards (given that this was the range of raw data I was able to find). Depending on the time period from which you sample your raw data, this year would likely be different.
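To show how inclusion criteria like these act as a filter, here’s a Python sketch that encodes the four criteria described above. The Country fields, the organisation labels and the four example countries are all hypothetical; the 2 million population threshold and the 0.788 HDI cut-off are the values quoted in this section. It treats “since 2000” as “in every yearly value from 2000 onwards”, which is an assumption about how the criterion is applied.

```python
from dataclasses import dataclass

POPULATION_MIN = 2_000_000  # minimum population (the Zhu et al. figure cited above)
HDI_CUTOFF = 0.788          # UNDP developed/developing threshold quoted above


@dataclass
class Country:
    name: str
    in_who_europe: bool            # listed as European by the WHO
    populations_since_2000: list   # yearly populations from 2000 onwards
    hdi_since_2000: list           # yearly HDI values from 2000 onwards
    organisation: str              # hypothetical label: "OECD", "CEIT" or other


def classify(country):
    """Return 'developed', 'developing', or None if the country is excluded."""
    if not country.in_who_europe:
        return None
    if min(country.populations_since_2000) < POPULATION_MIN:
        return None
    if all(h > HDI_CUTOFF for h in country.hdi_since_2000) and country.organisation == "OECD":
        return "developed"
    if all(h < HDI_CUTOFF for h in country.hdi_since_2000) and country.organisation == "CEIT":
        return "developing"
    return None  # e.g. HDI crossed the cut-off after 2000, or wrong organisation


# Hypothetical countries for illustration only.
countries = [
    Country("Country A", True, [5.1e6, 5.2e6], [0.85, 0.87], "OECD"),
    Country("Country B", True, [9.0e6, 8.8e6], [0.70, 0.74], "CEIT"),
    Country("Country C", True, [3.5e4, 3.6e4], [0.90, 0.91], "OECD"),  # too small
    Country("Country D", True, [4.0e6, 4.1e6], [0.77, 0.80], "CEIT"),  # crosses cut-off
]
groups = {c.name: classify(c) for c in countries}
print(groups)
# {'Country A': 'developed', 'Country B': 'developing', 'Country C': None, 'Country D': None}
```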

9. Safety, Environmental and Ethical Considerations: In this section, briefly outline which safety, environmental, and ethical precautions are necessary when conducting the investigation. Given the nature of a database IA, there are no substantial safety or environmental considerations. However, you may want to note that it is necessary to use data ethically and in accordance with the guidelines set by your data sources (e.g. abiding by copyright laws).

10. Methodology and Trial Investigation: In this section you should conduct a trial investigation to gain insight into the feasibility of the correlation you’re investigating, thus justifying why you should proceed with the final investigation. I would also recommend using the trial investigation to explain the methodology you’ve designed for your IA. This will not only help you gain points in the ‘Analysis’ and ‘Communication’ criteria, but also save you space: since you’ve already explained your methodology, you only need to present the final results of your investigation later on.

In order to carry out a trial investigation, you should randomly sample your data so that your trial investigation is representative of the rest of your data. For my IA, I randomly sampled 5 developing and 5 developed countries and carried out the investigation with their data. The way in which you randomly sample your data will vary per IA. Then, explain your investigation’s methodology and all the different tables and calculations you’ve used. For every calculation you make when processing your data, make sure to include a sample calculation. After processing all of your data and presenting it in a graph, determine what correlation (if any) exists in your data and justify why you should go ahead with your final investigation. In my case, I used the R² values from my graphs to superficially assess how strong my correlations were, and thus whether I should continue with my final investigation.
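If you’re unsure how to pick your trial countries at random, one option is Python’s built-in random module, as in the sketch below. The lists of eligible countries are placeholders, and fixing a seed is an optional choice that makes the draw reproducible, so you can state exactly which countries ended up in your trial. The sample method draws without replacement, so no country is picked twice.

```python
import random


def trial_sample(countries, k, seed=None):
    """Pick k distinct countries at random for the trial investigation.

    Passing a seed makes the draw reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(countries, k)


# Hypothetical eligible countries (placeholders, not the author's list).
developing = [f"Developing {i}" for i in range(1, 13)]
developed = [f"Developed {i}" for i in range(1, 20)]

trial_developing = trial_sample(developing, 5, seed=1)
trial_developed = trial_sample(developed, 5, seed=1)
print(trial_developing, trial_developed)
```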

(For those of you who don’t know, the R² value of a trendline represents the proportion of the variance in the dependent variable that is predictable from the independent variable or, in layman’s terms, how tightly your data is clustered around the fitted trendline. The greater the R² value, the less scatter there is around the trendline, which may suggest a stronger correlation.)
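For a straight-line trendline, the R² that Excel reports can be reproduced with a short least-squares calculation, sketched below in Python. This only covers linear trendlines (other trendline types are fitted differently), and the example data are made up. For a linear fit like this, R² also equals the square of Pearson’s r, which is a handy cross-check once you reach the statistical testing section.

```python
def linear_r_squared(x, y):
    """R² of the least-squares straight line y = a + b*x (a linear trendline)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual scatter around the line vs. total scatter around the mean of y.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


print(linear_r_squared([1, 2, 3], [1, 3, 2]))  # 0.25
```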

11. Investigation and Results: Given that you’ve already explained your methodology in the previous section of your IA, all you need to do in this section is present the final processed data as well as any final graphs or tables you’ve created. Make sure to state in this section that you utilised the same methodology shown in the trial investigation to conduct the final one. Additionally, you may want to state that the raw data for the final investigation is “available upon request”, just to indicate to the person reading your IA that you actually processed the data yourself.

12. Statistical Testing: This section is, in my opinion, the one where most students miss out on marks for the ‘Evaluation’ criterion of the IA. In a correlation-based database IA, this section is where most students will conduct a statistical test to determine the strength of their correlation. Below I will provide a short description of how to conduct statistical testing for a correlation-based database IA:

Firstly, you need to determine which statistical test you will conduct. The two most frequently used statistical tests for correlation are the Pearson’s correlation and the Spearman’s correlation. The Pearson’s correlation tests for linear relationships, whereas the Spearman’s correlation tests for monotonic relationships. The difference between these two types of relationship is illustrated in the graphs below:

Linear vs Monotonic correlation

As you can see, a linear relationship is a “straight-line” relationship between two variables, whereas a monotonic relationship is one where the function either always increases or always decreases, not both. Note that all linear relationships are monotonic, but not all monotonic relationships are linear. In practice, it probably won’t be clear whether the processed data in your investigation represents a linear relationship or one that is only monotonic. However, in order to conduct a Pearson’s correlation your data needs to meet certain assumptions, one of which is that your data is normally distributed, since the test is sensitive to outliers and skewness in the data. As such, if you determine that your data is normally distributed, you should conduct a Pearson’s correlation; if it is not, you won’t be able to conduct a Pearson’s correlation and should instead conduct a Spearman’s correlation.

An easy way to check whether your processed data is approximately normally distributed, and thus whether you should conduct a Pearson’s correlation, is to conduct a skewness analysis. A skewness analysis is a quick calculation which tells you whether or not your data gives cause for concern about skewness. In a skewness analysis, you need to determine two values: the “skewness coefficient” and the “standard error” of skewness. Both of these can be calculated in Microsoft Excel.

The skewness coefficient expresses how skewed your data is, and is calculated separately for your independent and dependent variable data. Let’s say you want to calculate the skewness coefficient of your independent variable data. First, paste your data into a column of an Excel sheet. If your data spans from, say, cell E8 to cell E28, type the following formula into a cell to calculate the skewness coefficient of your data:

=SKEW(E8:E28)

Use the same equation to calculate the skewness coefficient of your dependent variable data.
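If you want to double-check Excel’s output, the formula behind SKEW (as documented by Microsoft) is n/((n-1)(n-2)) multiplied by the sum of the cubed standardised deviations, using the sample standard deviation. Below is a minimal Python sketch of that formula; the function name excel_skew and the example data are illustrative only.

```python
import statistics


def excel_skew(values):
    """Sample skewness using the same adjusted formula as Excel's SKEW():
    n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3), with s the sample SD."""
    n = len(values)
    if n < 3:
        raise ValueError("SKEW needs at least three data points")
    m = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    if s == 0:
        raise ValueError("all values are identical, so skewness is undefined")
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)


print(excel_skew([1, 2, 3, 10]))  # roughly 1.7636 (right-skewed)
```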

The standard error (more precisely, the standard error of skewness) is different from the skewness coefficient: its value depends only on how many data points each of your variables has. Because every data point in a correlation-based investigation belongs to a pair, your independent and dependent variables have the same number of data points, so the standard error is the same for both. In my investigation I had 31 pairs of data points, so each of my variables had 31 data points. To calculate the standard error of your own data, use the following formula in Excel, replacing each N with the number of data points you have (e.g. 31) or with a reference to a cell containing that number:

=SQRT(6*N*(N-1)/((N-2)*(N+1)*(N+3)))

Finally, to assess the skewness of your data, compare the absolute value of the skewness coefficient for each of your variables with twice the standard error. If the absolute value of the skewness coefficient is less than twice the standard error, there is no cause for concern about skewness and the Pearson’s correlation can be conducted. If it is greater than twice the standard error for either variable, there is cause for concern about skewness and you should conduct the Spearman’s correlation instead.
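The decision rule just described can be sketched in Python as follows. It uses the same standard error formula as the Excel version, and takes the skewness coefficient as an input (for example, the value returned by SKEW). The function names are hypothetical; run the check for both your independent and your dependent variable, and if either one shows cause for concern, use Spearman’s correlation.

```python
import math


def skewness_standard_error(n):
    """Standard error of skewness for n data points."""
    if n < 3:
        raise ValueError("need at least three data points")
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def skew_is_concerning(skew_coefficient, n):
    """Rule of thumb: concern if |skewness| exceeds twice its standard error."""
    return abs(skew_coefficient) > 2 * skewness_standard_error(n)


# With 31 data points the standard error is about 0.42, so the cut-off is about 0.84.
print(round(skewness_standard_error(31), 3))  # 0.421
print(skew_is_concerning(0.5, 31))            # False -> Pearson's correlation
print(skew_is_concerning(-1.2, 31))           # True  -> Spearman's correlation
```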

In short, the results of a skewness analysis can be presented in a table, as follows:

After the skewness analysis you need to conduct your chosen statistical test. I personally conducted the Pearson’s correlation, but I will demonstrate how to conduct both the Pearson’s and Spearman’s correlation below:

Pearson’s correlation: The Pearson’s correlation tests the strength of a linear correlation. Its result, the Pearson correlation coefficient (r), expresses both the strength and the direction of a linear correlation (ranging from -1 to 1). The Pearson’s correlation is conducted using the following formula, where r is the Pearson correlation coefficient, x is your independent variable data, y is your dependent variable data, and n is the number of data pairs in your investigation:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

As illustrated by the above equation, it is necessary to determine the sums of x, y, xy, x² and y². After doing so, plug your results into the equation (alongside the value of n), and the result will be your Pearson correlation coefficient.
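Here’s a minimal Python sketch that computes r from the sums of x, y, xy, x² and y² and the number of pairs n, following the computational formula for Pearson’s r. The function name pearson_r and the tiny example dataset are illustrative; in practice you can cross-check the result against Excel’s built-in CORREL function.

```python
import math


def pearson_r(x, y):
    """Pearson's r from the sums of x, y, xy, x^2 and y^2."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired data points")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


print(pearson_r([1, 2, 3], [1, 3, 2]))  # 0.5
```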

Spearman’s correlation: Conducting the Spearman’s correlation is slightly more complex than the Pearson’s correlation. Like the Pearson correlation coefficient, the Spearman’s correlation coefficient expresses the strength and direction of a correlation (ranging from -1 to 1), but for a monotonic rather than a linear relationship. Given that I didn’t conduct the Spearman’s correlation for my IA, I’m not very experienced with it, but I found a great link which describes very clearly how to calculate the Spearman’s correlation, which I will link here.
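One standard way of calculating the Spearman’s correlation, sketched below in Python, is to rank each variable (giving tied values the average of their ranks) and then compute Pearson’s r on the ranks. When there are no ties, this gives the same result as the familiar shortcut formula 1 - 6Σd²/(n(n² - 1)), where d is the difference between each pair of ranks. This is a general textbook approach rather than the method from the linked guide, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest) upwards; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired data points")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # 1.0 (monotonic, not linear)
```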

Lastly, after conducting your chosen statistical test, you need to check whether its result is “statistically significant”; that is to say, whether the correlation you’ve determined is unlikely to be due to chance alone. To determine statistical significance, you compare the result of your statistical test to a “critical value”, which depends on the degrees of freedom and the level of confidence assumed. I define the latter two terms below:

  • degrees of freedom: the number of values in the final calculation of a statistic that are free to vary. The degrees of freedom for an investigation is calculated as the number of data pairs minus 2 (e.g. for my investigation, which had 31 data pairs, there would be 29 degrees of freedom)
  • level of confidence: strictly speaking, this is the significance level, i.e. the risk you accept of concluding that a correlation exists when the apparent correlation is actually due to chance. Typically, a value of 0.05 is chosen (corresponding to 95% confidence), which denotes a 5% risk that the correlation investigated is due to chance.

You can determine the critical value for your investigation using either this document for the Pearson’s correlation or this document for the Spearman’s correlation. For instance, if you conducted a Pearson’s correlation and had 10 degrees of freedom at a level of confidence of 0.05, your critical value would be 0.576 (with reference to the appropriate document). Ultimately, if the absolute value of your correlation coefficient is greater than the critical value, the results of your statistical test are statistically significant; if not, they are not statistically significant.
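Critical-value tables for Pearson’s r are derived from Student’s t distribution: with df degrees of freedom, the critical r is t/√(t² + df), where t is the critical t value. The Python sketch below computes it from scratch with only the standard library, assuming a two-tailed test (which is what the 0.576 example corresponds to), by numerically integrating the t distribution and searching for the critical t. It’s intended as a cross-check on the table lookup rather than a replacement, and it only applies to Pearson’s r; for Spearman’s correlation, stick to the Spearman table. The function names are hypothetical.

```python
import math


def t_cdf(t, df, steps=2000):
    """Cumulative probability of Student's t distribution at t >= 0,
    using Simpson's rule to integrate the density from 0 to t."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_critical(df, alpha=0.05):
    """Two-tailed critical t value, found by bisection on the CDF.
    Assumes the answer lies below 100, which holds for alpha = 0.05."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def critical_r(df, alpha=0.05):
    """Critical value of Pearson's r for df = n - 2 degrees of freedom."""
    t = t_critical(df, alpha)
    return t / math.sqrt(t * t + df)


def is_significant(r, n, alpha=0.05):
    """True if |r| exceeds the critical value for n data pairs."""
    return abs(r) > critical_r(n - 2, alpha)


print(round(critical_r(10), 3))  # 0.576
```

With 29 degrees of freedom (31 data pairs), critical_r(29) comes out at about 0.355.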

I know this section was long, but it’s really important to get this part of the IA right in order to score highly. Remember, the statistical testing has three main parts: 1) conduct a skewness analysis to determine which statistical test to conduct, 2) conduct your chosen statistical test, and 3) determine whether the results of your statistical test are statistically significant.

13. Analysis and Conclusion: In this section, analyze your final, processed data and provide an answer to your research question (if possible). This section should summarize the data which you’ve collected and how it (hopefully) supports your initial hypothesis. When analyzing the data, take into account the results of your statistical testing as well as the R² values from your final graphs.

14. Evaluation of Errors and Improvements: This section is of paramount importance to the overall quality of your IA. The more detailed and thoughtful your evaluation of your investigation is, the better. To begin your evaluation, start by pointing out some of the strengths of your investigation. This could be the use of a trial investigation, or the thoroughness of your statistical testing. However, the bulk of the ‘Evaluation’ section should focus on identifying errors in your investigation and suggesting possible improvements. I mainly focused on how my methodology failed to take into account certain confounding variables, since I suggested that these confounding variables were what caused my final correlations to be less than perfect. As such, most of the major errors in my investigation were linked to the nature of my inclusion criteria. Additionally, you may wish to point out some methodological errors in your investigation, such as the way in which you standardised your data, or how you could enhance the precision of your results by reducing the effects of certain random errors.

15. Extensions: In this section, identify any possible extensions to your investigation. It’s important to differentiate between improvements in the previous section and extensions in this one. An improvement involves tweaking your current methodology to ensure a more accurate and precise investigation. An extension, on the other hand, is suggesting an entirely new part of the methodology that would explore another aspect of your investigation. The extension you identify should, however, still be aimed at exploring something in the domain of your research question.

16. Literature: This is the last section of your IA and should include all of the sources which you’ve used, referenced in whichever style you want (I chose Chicago-style citation). Make sure to also reference any images which you’ve included in your IA in this section as well.

I hope this information is useful, and good luck!

12 thoughts on “IB Biology Internal Assessment (23/24)”

  1. Thank you so much for this!!! You probably saved my ass, bc I had to write a second biology IA (the first was too shitty). This is such a good guide, thanks again!!

  2. You don’t know how thankful i am for this like you saved my ass but i still need more helppppppp! My ANOVA test isn’t working and my R values aren’t matching my data. Please help

  3. Hi! Thank you for the post, it was really helpful. I was just wondering what font, font size and margin size you used? Also, I’m still not sure if the bibliography counts in the page limit but, correct me if I’m wrong, it didn’t seem to count in your IA?

    1. Hi! I’m glad you found the post useful. I used Times New Roman size 11. I’m not really sure what margin size I used – I basically stretched the margins as far as I could because I had quite a lot of words to fit into the 12-page limit. To my knowledge, the bibliography does not count as part of the page limit. All the best!

    1. Hi! You usually need to download an Excel sheet or look through large tables to gather data from databases. For the WHO database, for instance, you can download different Excel sheets depending on what type of data you’re looking for. Hope that helps!

  4. Hello. I just wanted to thank you for your precious advice. It is coming in really handy since I am taking HL biology too.

  5. Hi! I was feeling so stressed because i didn’t know what to do with my ia and this helped a lot. Thank you so much!

  6. Damn, I find myself on this website a few days before Christmas as the only one from my school doing a DB IA, and I gotta admit I was super lost until I found your step-by-step guide. Thank you very much.
