Assumptions kill

This is a small reminder to myself that assumptions kill. This post is part of my own development observations.

The assumption occurred in a pagination function that I was developing. I was working on a crawler for a customer API. The idea was to crawl the API endpoint from start to finish. To achieve this, I had 2 API variables to work with: pageSize and pageNumber.

I chose the following approach:

  • Get a page, starting with the first page.
  • Work through all the items on the page.
  • Fetch next page.
  • Continue until there are no more pages.

The approach is simple enough, and the main goal is to execute a task on each item on the page. This led me to use a paginator to abstract away the fetching and iteration of items, so that I could focus only on the task.

My paginator has 2 methods:

  • has_next_item : This method returns true or false, depending on whether there are more items in the endpoint.
  • get_item : This method returns the next item.

The get_item function should only be called if has_next_item returns true; the other way around would be illogical :).
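As a rough illustration of how such a paginator is meant to be consumed, here is a minimal sketch. The Paginator interface, the ArrayPaginator class and the process_all function are hypothetical names made up for this example; ArrayPaginator is an in-memory stand-in, so the loop can run without the real API.

```php
<?php
// Hypothetical interface with the two methods described above.
interface Paginator {
  public function has_next_item(): bool;
  public function get_item();
}

// In-memory stand-in (an assumption for this sketch, not the real crawler).
final class ArrayPaginator implements Paginator {
  private array $items;
  private int $index = 0;

  public function __construct( array $items ) {
    $this->items = array_values( $items );
  }

  public function has_next_item(): bool {
    return $this->index < count( $this->items );
  }

  public function get_item() {
    // Return the current item, then move the index forward.
    return $this->items[ $this->index++ ];
  }
}

// Run a task on every item; get_item is only called after has_next_item said yes.
function process_all( Paginator $paginator, callable $task ): int {
  $processed = 0;
  while ( $paginator->has_next_item() ) {
    $task( $paginator->get_item() );
    $processed++;
  }
  return $processed;
}
```

The calling code never sees pages or indexes, which is the whole point of the abstraction: it only asks whether there is another item and then takes it.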

The Code

Let's look at the code.

has_next_item

public function has_next_item(): bool {
  $count = $this->agendas->count();

  // Initialize agendas (First run)
  // Count should be Zero
  if ( 0 === $count ) {
    $agendas = $this->client->getAgendasByCommittee(
      $this->committee,
      $this->page,
      $this->page_size,
      1
    );
    $this->agendas = collect( $agendas );
    return true;
  }

  // As long as the current page has unread items, there is a next item.
  // Index should be smaller than count (at most page_size, 20).
  if ( $count > 0 && $this->index < $count ) {
    return true;
  }

  // Get next page
  // After last item, index should be equal to page_size.
  if ( $this->index === $this->page_size ) {
    $this->page += 1;
    $agendas     = $this->client->getAgendasByCommittee(
      $this->committee,
      $this->page,
      $this->page_size,
      1
    );

    if ( 0 === count( $agendas ) ) {
      return false;
    }

    $this->index   = 0;
    $this->agendas = collect( $agendas );
    return true;
  }

  return false;
}

To understand the method above, it helps to look at the constructor of the paginator class.

The constructor

The constructor stores the API client object and the integer variables page_size, page and index, which are used to crawl the API. It also stores a committee variable and an agendas collection, which acts as the local work queue. The committee variable is a unique identifier that tells the API which committee we are working with.

public function __construct( FirstAgendaService $client, string $committee_uid ) {
  $this->client    = $client;
  $this->page_size = 20;
  $this->page      = 0;
  $this->index     = 0;
  $this->committee = $committee_uid;
  $this->agendas   = collect( [] );
}

get_item

The get_item function gets the agenda equal to the current index, increments the index and returns the agenda.

public function get_item() {
  $agenda = $this->agendas[ $this->index ];
  $this->index++;
  return $agenda;
}

The assumption

The assumption that broke my code was that each new page contains at least one item. This is true for the most part, but it fails when the total number of items is an exact multiple of the page size (20): the last page is then completely full, so the paginator requests one more page, and that page comes back empty. I therefore needed a work-around. It is added to the fetch clause of the has_next_item method, where I added a zero check.
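To make the edge case concrete, here is a small sketch that runs against an in-memory list instead of the real API. The fetch_page and crawl functions are hypothetical helpers written for this example, and the zero-based page numbering is an assumption taken from the constructor above.

```php
<?php
// Hypothetical stand-in for the API: returns one zero-based page of an in-memory list.
function fetch_page( array $all_items, int $page, int $page_size ): array {
  return array_slice( $all_items, $page * $page_size, $page_size );
}

// Crawl all pages and report how many items were seen and how many requests were made.
function crawl( array $all_items, int $page_size ): array {
  $page     = 0;
  $requests = 0;
  $seen     = 0;

  while ( true ) {
    $items = fetch_page( $all_items, $page, $page_size );
    $requests++;

    // The zero check: a completely full last page is followed by an empty one.
    if ( 0 === count( $items ) ) {
      break;
    }

    $seen += count( $items );

    // A short page has to be the last page, so no further request is needed.
    if ( count( $items ) < $page_size ) {
      break;
    }

    $page++;
  }

  return [ 'items' => $seen, 'requests' => $requests ];
}
```

With 45 items and a page size of 20, the third page holds 5 items and the crawl ends there. With exactly 40 items, the second page is full, so a third request is made and it returns nothing. Without the zero check, that empty page would be treated as if it had items.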