When Custom Import is a Better Choice than Migrate -- Custom Code Support and Examples


With the arrival of Drupal 8, several contributed modules that had been add-ons for Drupal 7 received renewed attention and were integrated into Drupal 8 core. These technologies no longer simply existed in contrib space; they were seen as essential components of the latest iteration of Drupal. Views and the Date module were integrated, as was Migrate.

And while Migrate is a well-developed framework, some situations that come up during development still require customization, and those situations are the focus of this post.

The Migrate framework is now in core, and it makes it possible for hundreds of sites built in Drupal 6 or 7 to be migrated to Drupal 8. It’s an amazing ecosystem of modules that is growing into one of Drupal’s major core components. Migrate is robust and, despite some critical issues, is ready for most migration needs. If you haven’t checked it out yet, please do, as it gives you a solid foundation for migration tasks in Drupal.

In today’s blog post, I’ll dive into the architecture of custom large-scale data imports, explain why you might choose one over Migrate, and show how to build a scalable, robust import solution from scratch.

Factors in choosing a custom import over the Migrate framework

  • Your import needs to be lightweight and recurring, and may run outside of cron or through CI;
  • Your import may combine data from several external sources but needs to save it only once;
  • You want flexibility and control over what is done with the content: unpublishing, deleting, or updating just one field;
  • You want it to be scalable and editable with a minimum of code rewrites or new plugins; and
  • You need precise control over which parts of the data are updated, and when.

Where to start

Gather your requirements by answering these questions:

  • Is it a periodic or one time import?
  • Will it be manual or completely automated?
  • Will it be a cron job or continuous integration?

And don’t forget to think about architecture: both the content type architecture and the PHP class hierarchy.

You’ll also want to determine your import sources: from which environment, and in what format, will you be importing data into Drupal?

What are the options?

You have a variety of options for import triggers, depending on your needs. The simplest is a manual import, triggered by a user through the Drupal UI.

If you need your import to run in the background, you can use a cron job to trigger and run it. However, keep in mind that some hosting providers limit how long a cron job may run. This can leave the import only partially complete after each cron run, so finishing the full import may take several runs. If you need timely updates, this may not be the best option.
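To illustrate why a time-limited cron run may complete only part of an import, here is a plain-PHP sketch (not Drupal’s cron API) that processes queued items until a time budget runs out and leaves the rest for the next run. The function name `processWithinBudget()` and the one-second budget are hypothetical. In Drupal itself, a QueueWorker plugin opts into cron processing with a `cron = {"time" = 60}` annotation, where `time` is the number of seconds cron may spend on that queue per run.

```php
<?php
/**
 * Hypothetical sketch: process queued items until a time budget is spent.
 * Items that do not fit into this run stay queued for the next cron run.
 */
function processWithinBudget(SplQueue $queue, callable $worker, float $budgetSeconds): int {
  $deadline = microtime(TRUE) + $budgetSeconds;
  $processed = 0;
  // Stop as soon as the queue is empty or the budget is used up.
  while (!$queue->isEmpty() && microtime(TRUE) < $deadline) {
    $worker($queue->dequeue());
    $processed++;
  }
  return $processed;
}

// Usage: a trivial worker easily handles five items within one second.
$queue = new SplQueue();
foreach (range(1, 5) as $id) {
  $queue->enqueue(['id' => $id]);
}
$saved = [];
$count = processWithinBudget($queue, function (array $item) use (&$saved) {
  $saved[] = $item['id'];
}, 1.0);
```

With a real worker that takes seconds per record, the same loop would stop part-way, which is exactly the partial-completion behaviour described above.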

To run a completely automated import without relying on cron, consider using Continuous Integration. There are many good options available; I recommend Jenkins, as it’s easy to install and use, and it has a very user-friendly UI.

Analyze the sources

Before you start any import, run a discovery phase: find out which sources the data will come from, where the source files are stored, and how you will access them.

The simplest case is when the source files live inside the Drupal installation: they are either uploaded manually to the public files folder via FTP or dropped in automatically by a third-party script.
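When files are dropped into a local folder, the import can discover them rather than hardcoding every file name. A plain-PHP sketch, assuming a hypothetical local directory and that only `.csv` files are import sources; `listSourceFiles()` is a hypothetical helper:

```php
<?php
/**
 * Hypothetical helper: list CSV source files in an import directory,
 * sorted by name so every run processes them in the same order.
 */
function listSourceFiles(string $dir): array {
  $files = glob(rtrim($dir, '/') . '/*.csv');
  // glob() may return FALSE on error (or on no match on some systems).
  if ($files === FALSE) {
    return [];
  }
  sort($files);
  // Return bare file names, the form in which the import refers to sources.
  return array_map('basename', $files);
}

// Usage with a throwaway directory.
$dir = sys_get_temp_dir() . '/import_demo_' . uniqid();
mkdir($dir);
foreach (['File2.csv', 'File1.csv', 'notes.txt'] as $name) {
  file_put_contents("$dir/$name", "id,title\n");
}
$sources = listSourceFiles($dir);
```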

A more complicated scenario arises when the source files live outside Drupal, such as an external XML or JSON feed. Sometimes a connection to an external database is also required.
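With external feeds, it helps to normalize every format into the same array of records before anything else touches the data. A plain-PHP sketch, assuming hypothetical feeds in which each record has `id` and `title` fields (JSON as an array of objects, XML as repeated `<item>` elements). It relies on the json and SimpleXML extensions, and `normalizeFeed()` is a hypothetical name:

```php
<?php
/**
 * Hypothetical normalizer: turns a JSON or XML feed body into a list of
 * ['id' => ..., 'title' => ...] records so later steps are format-agnostic.
 */
function normalizeFeed(string $body, string $format): array {
  $records = [];
  if ($format === 'json') {
    $decoded = json_decode($body, TRUE);
    if (!is_array($decoded)) {
      throw new InvalidArgumentException('Invalid JSON feed.');
    }
    foreach ($decoded as $row) {
      $records[] = ['id' => (string) $row['id'], 'title' => (string) $row['title']];
    }
  }
  elseif ($format === 'xml') {
    $xml = simplexml_load_string($body);
    if ($xml === FALSE) {
      throw new InvalidArgumentException('Invalid XML feed.');
    }
    // Iterate over every <item> child of the root element.
    foreach ($xml->item as $item) {
      $records[] = ['id' => (string) $item->id, 'title' => (string) $item->title];
    }
  }
  else {
    throw new InvalidArgumentException("Unsupported format: $format");
  }
  return $records;
}

$json = '[{"id": 1, "title": "First"}, {"id": 2, "title": "Second"}]';
$xml = '<items><item><id>1</id><title>First</title></item>'
  . '<item><id>2</id><title>Second</title></item></items>';
```

Casting every value to a string gives both formats identical output, so the rest of the import never needs to know where a record came from.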

While analyzing the sources, consider the format of the files or feeds, the credentials, and any drivers required to connect to the external sources.

Plan for the resources that parsing will need. Do you need external libraries to connect over SFTP or to parse complicated XML? Check that required PHP extensions such as cURL are installed on your server, and that you have all the necessary access rights and permissions. Establishing a good foundation and checking these factors before you even start coding will mean fewer problems and less debugging during development.
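A quick preflight check can turn missing extensions into a clear message instead of a fatal error halfway through an import. A plain-PHP sketch with a hypothetical `missingExtensions()` helper; the extension list is only an example. In a Drupal module, a check like this would naturally live in `hook_requirements()`.

```php
<?php
/**
 * Hypothetical preflight check: returns the required PHP extensions
 * that are not loaded, so the import can refuse to start early.
 */
function missingExtensions(array $required): array {
  $missing = [];
  foreach ($required as $extension) {
    if (!extension_loaded($extension)) {
      $missing[] = $extension;
    }
  }
  return $missing;
}

// Example requirements for a cURL + XML + JSON import (adjust to your sources).
$missing = missingExtensions(['curl', 'simplexml', 'json']);
```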

Architecture

After you have laid the foundation, you are ready to build on top of it. Everything starts with structure. For an import, that means the structure of your content types and fields, and the structure of your source files.

Site building should be done, and all content types well defined, before you start coding. Think not only about individual content types but also about their “ecosystem.” Relationships between content types and taxonomies are very important, as imported data may need to be split across multiple content types or have taxonomy terms attached. These relationships should be dictated by the desired functionality and taken into account when building the import.
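In practice, a relationship usually means resolving a source value (for example, a category name) to the ID of an existing term. A plain-PHP sketch of that lookup with an in-memory cache, so each name hits storage at most once per run; the `TermResolver` class and its `$createTerm` callback are hypothetical stand-ins for Drupal’s entity storage:

```php
<?php
/**
 * Hypothetical resolver: maps source category names to term IDs,
 * creating a term only the first time an unknown name is seen.
 */
class TermResolver {
  /** @var array Normalized term name => term ID. */
  private $cache = [];

  /** @var callable Creates a term from a name and returns its new ID. */
  private $createTerm;

  public function __construct(array $existing, callable $createTerm) {
    foreach ($existing as $name => $tid) {
      $this->cache[$this->normalize($name)] = $tid;
    }
    $this->createTerm = $createTerm;
  }

  public function resolve(string $name): int {
    $key = $this->normalize($name);
    if (!isset($this->cache[$key])) {
      $this->cache[$key] = ($this->createTerm)(trim($name));
    }
    return $this->cache[$key];
  }

  private function normalize(string $name): string {
    // Case- and whitespace-insensitive matching avoids duplicate terms.
    return strtolower(trim($name));
  }
}

$nextId = 10;
$created = [];
$resolver = new TermResolver(['News' => 1], function (string $name) use (&$nextId, &$created) {
  $created[] = $name;
  return $nextId++;
});
```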

When it comes to the source files’ architecture, keep in mind that the structure and encoding of the source files should be set in stone. The structure of the source files defines the mapping to Drupal fields and cannot vary without breaking the import.
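Because the mapping depends on a fixed layout, it pays to verify the header before mapping any row. A plain-PHP sketch; the column names and the Drupal field names (`field_external_id`, `field_price`) are hypothetical, and `mapRow()` is a hypothetical helper:

```php
<?php
/**
 * Hypothetical mapping from fixed CSV columns to Drupal field names.
 */
const FIELD_MAP = [
  'id' => 'field_external_id',
  'title' => 'title',
  'price' => 'field_price',
];

/**
 * Maps one CSV row to field values, rejecting it if the layout has drifted.
 */
function mapRow(array $header, array $row): array {
  if ($header !== array_keys(FIELD_MAP)) {
    throw new UnexpectedValueException('Unexpected source columns: ' . implode(',', $header));
  }
  if (count($row) !== count($header)) {
    throw new UnexpectedValueException('Row does not match the header.');
  }
  $mapped = [];
  foreach (array_combine($header, $row) as $column => $value) {
    $mapped[FIELD_MAP[$column]] = $value;
  }
  return $mapped;
}
```

Failing loudly on an unexpected header is usually safer than silently writing values into the wrong fields.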

Although many libraries can detect and convert the encoding of data, none of them is 100 percent reliable. It’s also important to remember that PHP works best with UTF-8.
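One way to follow that advice is to avoid guessing altogether: convert from the encoding agreed with the data provider (assumed here to be ISO-8859-1), then verify the result is valid UTF-8. A plain-PHP sketch; `toUtf8()` is a hypothetical helper and relies on the iconv extension:

```php
<?php
/**
 * Hypothetical helper: converts data from a known source encoding to
 * UTF-8 and rejects anything that is still not valid UTF-8.
 */
function toUtf8(string $data, string $sourceEncoding): string {
  if (strcasecmp($sourceEncoding, 'UTF-8') !== 0) {
    $data = iconv($sourceEncoding, 'UTF-8', $data);
    if ($data === FALSE) {
      throw new RuntimeException("Could not convert from $sourceEncoding.");
    }
  }
  // preg_match() with the /u modifier fails on invalid UTF-8 input.
  if (preg_match('//u', $data) !== 1) {
    throw new UnexpectedValueException('Source data is not valid UTF-8.');
  }
  return $data;
}
```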

Import in a nutshell

An import consists of three simple operations: get data, parse data, and save data. These operations are handled by three queues: the storage queue, which stores data dumps; the getter queue, which parses the data dumps and splits them into separate records; and the setter queue, which saves records into Drupal entities. This approach lets multiple records be processed in parallel, while each individual record always moves through the stages in sequence.
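The chain of operations can be sketched in plain PHP with `SplQueue` standing in for Drupal’s queues. This runs in a single process, so it shows the ordering of the stages rather than parallelism; `runImport()` and the `|`-separated sample data are hypothetical:

```php
<?php
/**
 * Plain-PHP sketch of get, parse and save as two chained queues.
 * Each stage drains its own queue and feeds the next one.
 */
function runImport(array $dumps, callable $parse, callable $save): void {
  $dumpQueue = new SplQueue();    // filled by "get": one raw dump per source
  $recordQueue = new SplQueue();  // filled by "parse": one item per record

  foreach ($dumps as $key => $raw) {
    $dumpQueue->enqueue(['key' => $key, 'info' => $raw]);
  }
  // Getter stage: every dump is split into records for the setter stage.
  while (!$dumpQueue->isEmpty()) {
    $dump = $dumpQueue->dequeue();
    foreach ($parse($dump['info']) as $record) {
      $recordQueue->enqueue(['key' => $dump['key'], 'record' => $record]);
    }
  }
  // Setter stage: records are saved one at a time, in queue order.
  while (!$recordQueue->isEmpty()) {
    $item = $recordQueue->dequeue();
    $save($item['key'], $item['record']);
  }
}

$saved = [];
runImport(
  ['data1' => "1|First\n2|Second"],
  function (string $raw): array {
    return array_map(function ($line) {
      return explode('|', $line);
    }, explode("\n", $raw));
  },
  function (string $key, array $record) use (&$saved) {
    $saved[] = "$key:{$record[0]}:{$record[1]}";
  }
);
```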

To begin an import, we need a trigger and a set of batch operations. Here we will use a manual trigger, starting with a configuration form. Create a custom module and, in its src/Form directory, create a form class:

<?php
/**
 * @file
 * Contains \Drupal\example_import\Form\ConfigImportForm.
 */

namespace Drupal\example_import\Form;

use Drupal\Core\Config\ConfigFactoryInterface;
use Drupal\Core\Form\ConfigFormBase;
use Drupal\Core\Form\FormStateInterface;
use Drupal\Core\Queue\QueueFactory;
use Drupal\Core\Queue\QueueWorkerManagerInterface;
use Symfony\Component\DependencyInjection\ContainerInterface;

/**
 * Import settings form with a button that triggers the import manually.
 */
class ConfigImportForm extends ConfigFormBase {

  /**
   * The queue factory.
   *
   * @var \Drupal\Core\Queue\QueueFactory
   */
  protected $queueFactory;

  /**
   * The queue worker plugin manager.
   *
   * @var \Drupal\Core\Queue\QueueWorkerManagerInterface
   */
  protected $queueManager;

  /**
   * Constructs a ConfigImportForm object.
   */
  public function __construct(ConfigFactoryInterface $config_factory, QueueFactory $queue, QueueWorkerManagerInterface $queue_manager) {
    parent::__construct($config_factory);
    $this->queueFactory = $queue;
    $this->queueManager = $queue_manager;
  }

  /**
   * {@inheritdoc}
   */
  public static function create(ContainerInterface $container) {
    return new static(
      $container->get('config.factory'),
      $container->get('queue'),
      $container->get('plugin.manager.queue_worker')
    );
  }

  /**
   * {@inheritdoc}
   */
  protected function getEditableConfigNames() {
    return ['example_import.settings'];
  }

  /**
   * {@inheritdoc}
   */
  public function getFormId() {
    return 'example_import_form';
  }

  /**
   * {@inheritdoc}
   */
  public function buildForm(array $form, FormStateInterface $form_state) {
    $config = $this->config('example_import.settings');
    $form['help'] = array(
      '#type' => 'markup',
      '#markup' => $this->t('If you want to trigger the import manually, press the "Trigger Import" button.'),
    );
    // Default to running overnight when nothing has been saved yet.
    $run_overnight = $config->get('run_overnight');
    if (!isset($run_overnight)) {
      $run_overnight = 1;
    }
    $form['run_overnight'] = array(
      '#type' => 'checkbox',
      '#title' => $this->t('Run import overnight?'),
      '#default_value' => $run_overnight,
    );
    $form['actions']['#type'] = 'actions';
    $form['actions']['run_import'] = array(
      '#type' => 'submit',
      '#value' => $this->t('Trigger Import'),
      '#button_type' => 'primary',
    );

    // The parent adds the standard "Save configuration" button.
    return parent::buildForm($form, $form_state);
  }

  /**
   * {@inheritdoc}
   */
  public function submitForm(array &$form, FormStateInterface $form_state) {
    // Save the configuration under the same name used in buildForm().
    $this->config('example_import.settings')
      ->set('run_overnight', $form_state->getValue('run_overnight'))
      ->save();
    // Check which button was pressed by its position in the form rather than
    // by its (translatable) label, so translations don't break the check.
    $trigger = $form_state->getTriggeringElement();
    if (!empty($trigger['#array_parents']) && end($trigger['#array_parents']) === 'run_import') {
      // Trigger the manual import.
      $this->ImportDataQueuePopulate();
    }
    parent::submitForm($form, $form_state);
  }
}

We can populate the queues with the following helper method of ConfigImportForm:

<?php
  /**
   * Populates the import queues and starts a batch to process them.
   */
  protected function ImportDataQueuePopulate() {
    // Get the getter queue and clear items left over from a previous run.
    $queue_manual = $this->queueFactory->get('import_get_manual', TRUE);
    $queue_manual->deleteQueue();
    // Get the setter queue and clear it as well.
    $queue_save_manual = $this->queueFactory->get('import_save_manual', TRUE);
    $queue_save_manual->deleteQueue();
    $operations = array();
    // Queue the raw contents of each source file.
    $operations[] = array('\Drupal\example_import\Form\ConfigImportForm::getBatchOperation', array('queueCreateItem', array($queue_manual, 'File1.csv', 'data1')));
    // ... add an operation like the one above for each of your source files.
    // Getter queue: parse the dumps into records for the setter queue.
    $operations[] = array('\Drupal\example_import\Form\ConfigImportForm::queueProcessItem', array($queue_manual, 'import_get_manual'));
    // Setter queue: save the records into Drupal.
    $operations[] = array('\Drupal\example_import\Form\ConfigImportForm::queueProcessItem', array($queue_save_manual, 'import_save_manual'));
    $batch = array(
      'title' => $this->t('Import'),
      'operations' => $operations,
      'init_message' => $this->t('Starting import...'),
      'finished' => NULL,
    );
    batch_set($batch);
  }

The helper methods used to build the batch operations and to create queue items also belong to ConfigImportForm:

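<?php
  /**
   * Batch callback that dispatches to one of the static helpers below.
   *
   * @param string $callback_name
   *   Name of the helper method to call.
   * @param array $arguments
   *   Arguments to pass to the helper method.
   * @param array|\ArrayAccess $context
   *   The batch context.
   */
  public static function getBatchOperation($callback_name, array $arguments, &$context) {
    switch ($callback_name) {
      case 'queueCreateItem':
        self::queueCreateItem($arguments[0], $arguments[1], $arguments[2]);
        break;

      case 'queueProcessItem':
        self::queueProcessItem($arguments[0], $arguments[1], $context);
        break;
    }
  }

  /**
   * Creates a queue item holding the raw contents of one source file.
   *
   * @param \Drupal\Core\Queue\QueueInterface $queue
   *   The queue to add the item to.
   * @param string $file_name
   *   Name of the source file.
   * @param string $key
   *   Key identifying this part of the import.
   */
  protected static function queueCreateItem(\Drupal\Core\Queue\QueueInterface $queue, $file_name, $key) {
    $data = self::getFileContents($file_name);
    if ($data === FALSE) {
      // Don't queue an empty dump; record the failure instead.
      \Drupal::logger('example_import')->error('Could not fetch @file.', array('@file' => $file_name));
      return;
    }
    $item = new \stdClass();
    $item->content = array('key' => $key, 'info' => $data);
    $queue->createItem($item);
  }

  /**
   * Batch callback that processes one item of a queue per call.
   *
   * A minimal sketch: the queue worker plugin with ID $queue_name does the
   * actual work and, for the getter queue, fills the setter queue.
   *
   * @param \Drupal\Core\Queue\QueueInterface $queue
   *   The queue to process.
   * @param string $queue_name
   *   ID of the QueueWorker plugin for this queue.
   * @param array|\ArrayAccess $context
   *   The batch context.
   */
  public static function queueProcessItem(\Drupal\Core\Queue\QueueInterface $queue, $queue_name, &$context) {
    $worker = \Drupal::service('plugin.manager.queue_worker')->createInstance($queue_name);
    $item = $queue->claimItem();
    if (!$item) {
      // Nothing left that can be claimed: this operation is done.
      $context['finished'] = 1;
      return;
    }
    try {
      $worker->processItem($item->data);
      $queue->deleteItem($item);
    }
    catch (\Drupal\Core\Queue\SuspendQueueException $e) {
      // The worker asked to stop: put the item back and end this operation.
      $queue->releaseItem($item);
      $context['finished'] = 1;
      return;
    }
    catch (\Exception $e) {
      // Log the failure and drop the item so the batch can continue.
      watchdog_exception('example_import', $e);
      $queue->deleteItem($item);
    }
    // Ask the Batch API to call this operation again until the queue is empty.
    $context['finished'] = $queue->numberOfItems() > 0 ? 0 : 1;
  }

  /**
   * Fetches the contents of a source file over HTTP with cURL.
   *
   * @param string $file
   *   File name, relative to public://import/.
   *
   * @return string|false
   *   The response body, or FALSE on a cURL error or a non-200 response.
   */
  protected static function getFileContents($file) {
    // Placeholders: keep real credentials out of code, e.g. in settings.php.
    $username = 'username';
    $password = 'password';
    $url = file_create_url('public://import/' . $file);
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_USERPWD, "$username:$password");
    curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
    // Return the data instead of printing it to the browser.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    $data = curl_exec($ch);
    if (curl_errno($ch) || curl_getinfo($ch, CURLINFO_HTTP_CODE) != 200) {
      $data = FALSE;
    }
    curl_close($ch);
    return $data;
  }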

Define the queue workers as plugins in the module’s src/Plugin/QueueWorker folder:

<?php
/**
 * @file
 * Contains \Drupal\example_import\Plugin\QueueWorker\ImportGetManual.
 */

namespace Drupal\example_import\Plugin\QueueWorker;

/**
 * Processes the CSV dumps queued by the manual import.
 *
 * All of the processing logic lives in ImportGetBase (not shown here),
 * which is in the same namespace and therefore needs no use statement.
 *
 * @QueueWorker(
 *   id = "import_get_manual",
 *   title = @Translation("Import: get CSV data"),
 * )
 */
class ImportGetManual extends ImportGetBase {}
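`ImportGetManual` can stay empty because its behaviour comes from `ImportGetBase`. One benefit of that hierarchy is that a manual and a cron variant can share a single `processItem()` and differ only in which queue they feed. A plain-PHP analogue that deliberately does not extend Drupal’s `QueueWorkerBase`, so it runs standalone; the `*Sketch` class names, the `import_save_cron` queue and the static array standing in for real queues are all hypothetical:

```php
<?php
/**
 * Plain-PHP analogue of ImportGetBase: shared processing logic, with
 * subclasses deciding only which "next" queue receives the results.
 */
abstract class ImportGetBaseSketch {
  /** @var array Queue name => list of queued items (stand-in for queues). */
  public static $queues = [];

  /** Returns the name of the setter queue this worker feeds. */
  abstract protected function nextQueue(): string;

  /** Splits one dump ($data->content['info']) into lines for the next queue. */
  public function processItem($data): void {
    foreach (explode("\n", trim($data->content['info'])) as $line) {
      self::$queues[$this->nextQueue()][] = ['key' => $data->content['key'], 'line' => $line];
    }
  }
}

class ImportGetManualSketch extends ImportGetBaseSketch {
  protected function nextQueue(): string {
    return 'import_save_manual';
  }
}

class ImportGetCronSketch extends ImportGetBaseSketch {
  protected function nextQueue(): string {
    return 'import_save_cron';
  }
}

$item = new stdClass();
$item->content = ['key' => 'data1', 'info' => "a\nb"];
(new ImportGetManualSketch())->processItem($item);
(new ImportGetCronSketch())->processItem($item);
```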

Then process the items. Each processing step should end by populating the next queue in the chain: the dump queue populates the getter queue, and the getter queue in turn populates the setter queue. The processing function will vary based on your source and your field structure.
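For the getter step, the item created by `queueCreateItem()` (whose `content` holds a `key` and the raw `info`) has to be split into one record per CSV row. A plain-PHP sketch, assuming the first row is a header; reading through a `php://temp` stream with `fgetcsv()` keeps quoted fields that contain commas or line breaks intact, which a naive `explode()` would not. `parseDump()` is a hypothetical name:

```php
<?php
/**
 * Hypothetical getter-step parser: turns one raw CSV dump into
 * header-keyed records, ready to be queued for the setter.
 */
function parseDump(stdClass $item): array {
  $stream = fopen('php://temp', 'r+');
  fwrite($stream, $item->content['info']);
  rewind($stream);

  $records = [];
  $header = fgetcsv($stream, 0, ',', '"', '\\');
  if ($header !== FALSE) {
    while (($row = fgetcsv($stream, 0, ',', '"', '\\')) !== FALSE) {
      // fgetcsv() returns a blank line as [NULL]; skip it.
      if ($row === [NULL]) {
        continue;
      }
      // Skip malformed rows (a real import would log them).
      if (count($row) !== count($header)) {
        continue;
      }
      $records[] = ['key' => $item->content['key'], 'fields' => array_combine($header, $row)];
    }
  }
  fclose($stream);
  return $records;
}

$item = new stdClass();
$item->content = ['key' => 'data1', 'info' => "id,title\n1,\"Hello, world\"\n\n2,\"Two\nlines\"\n"];
$records = parseDump($item);
```

In the module, each returned record would become one item in the setter queue.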

Finally, the setter queue saves the information into Drupal.
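Since one goal of a custom import is precise control over what gets updated, the setter can compare an incoming record with the values already stored and write only the fields that changed, skipping the save entirely when nothing did. A plain-PHP sketch with hypothetical field names; in Drupal, the “existing” values would come from the loaded entity, and the changes would be applied with the entity’s `set()` and `save()` methods:

```php
<?php
/**
 * Hypothetical setter helper: returns only the fields whose values differ
 * from what is already stored, so unchanged records cause no save at all.
 * Values are compared strictly, so normalize types first (CSV gives strings).
 */
function changedFields(array $existing, array $incoming): array {
  $changes = [];
  foreach ($incoming as $field => $value) {
    // A field missing from storage also counts as a change.
    if (!array_key_exists($field, $existing) || $existing[$field] !== $value) {
      $changes[$field] = $value;
    }
  }
  return $changes;
}

$existing = ['title' => 'Widget', 'field_price' => '9.99', 'status' => 1];
$incoming = ['title' => 'Widget', 'field_price' => '10.49', 'status' => 1];
$changes = changedFields($existing, $incoming);
```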

By following the discovery process and these three steps, you can build your own large-scale imports, customized to your needs. Happy importing!
