(The following is an excerpt from "Beginning Java Google App Engine," published by APress)
Designing highly scalable, data-intensive applications can be tricky. If you've ever used hardware or software load balancing, you know that your users can be interacting with any one of a dozen or so web and database servers. A user's request may not be serviced from the same server that handled his previous request. These servers could be spread out in different data centers or perhaps in different countries, requiring you to implement processes to keep your data safe, secure, and synchronized. The hardware and software required to scale your application can also be complex and expensive, and may even dictate that you outsource or hire dedicated resources.
With App Engine, Google takes care of everything for you. The App Engine datastore provides distribution, replication, and load-balancing services behind the scenes, freeing you up to focus on implementing your business logic. App Engine's datastore is powered mainly by two Google services: Bigtable and Google File System (GFS).
Bigtable is a highly distributed and scalable service for storing and managing structured data. It was designed to scale to an extremely large size with petabytes of data across thousands of clustered commodity servers. It is the same service that Google uses for over 60 of its own projects including web indexing, Google Finance, and Google Earth.
The datastore also uses GFS to store data and log files. GFS is a scalable, fault-tolerant file system designed for large, distributed, data-intensive applications such as Gmail and YouTube. Originally developed to store crawling data and search indexes, GFS is now widely used to store user-generated content for numerous Google products.
Bigtable stores data as entities with properties organized by application-defined kinds such as customers, sales orders, or products. Entities of the same kind are not required to have the same properties or the same value types for the same properties. Bigtable queries entities of the same kind and can use filters and sort orders on both keys and property values. It also pre-indexes all queries, which results in impressive performance even with very large data sets. The service also supports transactional updates on single or application-defined groups of entities.
The first thing you'll notice about Bigtable is that it is not a relational database. Bigtable utilizes a non-relational object model to store entities, allowing you to create simple, fast, and scalable applications. Google isn't alone in offering this type of architecture. Amazon's SimpleDB and many open-source datastores (for example, CouchDB and Hypertable) use this same approach, which requires no schema while providing auto-indexing of data and simple APIs for storage and access.
You can interact with Bigtable using either a standard API or a low-level API. With the standard API, either a Java Data Objects (JDO) or Java Persistence API (JPA) implementation, you can ensure that your applications are portable to other hosting providers and database technologies if you decide to jump ship. This makes a good argument for App Engine, as it prevents vendor lock-in. If you are certain that your application will always run on App Engine, you can utilize the low-level API, as it exposes the full capabilities of Bigtable. Both APIs achieve roughly the same results in terms of ability and performance, so it comes down to personal preference. Do you like working with low-level database functionality, or would you rather abstract this layer so that your experience is applicable across multiple datastore implementations?
Working with Entities
The fundamental unit of data in the datastore is an "entity," which consists of an immutable identifier and zero or more properties. Once again, entities are schemaless, and this allows for some interesting possibilities. Since entities are not required to have the same properties or types, your application must enforce adherence to your data model, whatever that may be at the time. A property can have one or more values, embedded classes, child objects, and even values of mixed types. Entities are very flexible and are not defined by a database schema as in a relational database. At any point during the application life cycle you can add or remove entity properties. Newly created entities will follow the new model, but entities saved earlier keep whatever properties they were stored with, so your application's logic must be able to handle both shapes.
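As a rough illustration of what "your application's logic must be able to handle these changes" can mean in practice, here is a minimal plain-Java sketch. It does not use any App Engine classes: an entity is simply modeled as a map of property names to values, and the names `SchemalessDemo` and `readString` are hypothetical, chosen only for this example.

```java
import java.util.HashMap;
import java.util.Map;

public class SchemalessDemo {

    // Reads a String property, falling back to a default when the property
    // is missing or holds a value of a different type.
    public static String readString(Map<String, Object> entity, String name, String fallback) {
        Object value = entity.get(name);
        return (value instanceof String) ? (String) value : fallback;
    }

    public static void main(String[] args) {
        // A customer saved before the application started storing a phone number.
        Map<String, Object> oldCustomer = new HashMap<String, Object>();
        oldCustomer.put("name", "Ada");

        // A customer of the same kind saved after the property was added.
        Map<String, Object> newCustomer = new HashMap<String, Object>();
        newCustomer.put("name", "Grace");
        newCustomer.put("phone", "555-0100");

        // The same reading code has to cope with both shapes.
        System.out.println(readString(oldCustomer, "phone", "unknown"));
        System.out.println(readString(newCustomer, "phone", "unknown"));
    }
}
```

The point of the sketch is only that defaults and type checks live in application code, because no database schema enforces them.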
App Engine uses the Java Persistence API (JPA) and Java Data Objects (JDO) interfaces for modeling and persisting entities. These APIs, rather than the low-level API, ensure application portability. For your application, you'll use JDO since the Eclipse plug-in generates your JDO configuration files. Of course, JPA is supported, but it requires some additional setup and configuration steps. If you are familiar with Hibernate or other object-relational mapping (ORM) solutions, JDO should be fairly easy to grok as these solutions share many features.
App Engine's JDO implementation is provided by the DataNucleus Access Platform, an open-source implementation of JDO 2.3. Again, the JDO specification is database-agnostic and defines high-level interfaces for annotating simple POJOs, persisting and querying objects, and utilizing transactions. Applications implementing JDO can query for entities by property values or they can fetch a specific entity from the datastore using its key. Queries can return zero or more entities and sort them by property values, if desired.
Classes and Fields
JDO uses annotations on POJOs to describe how these objects are persisted to the datastore and how to recreate them when they are, in turn, fetched from the datastore. The kind of entity is defined by the simple name of the class while each class member specified as persistent represents a property of the entity. The data class is required to have a field dedicated to storing the primary key of its corresponding entity.
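A data class along these lines might look like the following sketch. The `Customer` class and its fields are hypothetical, and the code assumes the JDO annotation library that ships with the App Engine SDK is on the classpath, so it will not compile as a standalone file without those jars.

```java
import javax.jdo.annotations.IdGeneratorStrategy;
import javax.jdo.annotations.IdentityType;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;

// Instances are stored as entities of kind "Customer" (the simple class name).
@PersistenceCapable(identityType = IdentityType.APPLICATION)
public class Customer {

    // Field dedicated to the primary key. With a Long and the IDENTITY
    // strategy, the numeric ID is assigned when the instance is saved.
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Long id;

    // Each persistent member becomes a property of the entity.
    @Persistent
    private String name;

    @Persistent
    private String email;

    public Customer(String name, String email) {
        this.name = name;
        this.email = email;
    }

    public Long getId() { return id; }

    public String getName() { return name; }

    public String getEmail() { return email; }
}
```

Until the object has been persisted, `getId()` would be expected to return `null`, since the ID is generated on save.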
Each entity has a key that is unique to Bigtable. Keys consist of the application ID, the entity ID, and the kind of entity. Some keys may also contain information pertaining to the entity group. Your application can generate keys for your entities, or you can allow Bigtable to automatically assign numeric IDs for you. In most cases it is easier to let Bigtable assign your keys so you don't have to write code to ensure that your keys are unique across all objects of the same kind plus entity group parent (if being used).
There are four types of primary key fields:
1. Long: An ID that is automatically generated by Bigtable when the instance is saved.
2. Uncoded String: An ID or "key name" that your application provides to the instance prior to being saved.
3. Key: A value that includes the key of any entity-group parent that is being used and an application-generated string ID or a system-generated numeric ID.
4. Key as Encoded String: Essentially, an encoded key to ensure portability and still allow your application to take advantage of Bigtable's entity groups.
If you want to implement your own key system, you simply use the createKey static method of the KeyFactory class. You pass the method the kind and either an application-assigned string or a system-assigned number, and the method returns the appropriate Key instance.
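The calls described above could be sketched as follows. The kind names and the `KeyExamples` helper class are hypothetical. The code assumes the App Engine SDK is on the classpath and that it runs inside an App Engine environment (deployed or on the development server), since a key includes the application ID.

```java
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;

public class KeyExamples {

    // Key built from a kind and an application-assigned string ("key name").
    static Key customerKeyByName(String email) {
        return KeyFactory.createKey("Customer", email);
    }

    // Key built from a kind and a numeric ID.
    static Key customerKeyById(long id) {
        return KeyFactory.createKey("Customer", id);
    }

    // Key for a child entity: passing the parent's key as the first argument
    // places the new key in the parent's entity group.
    static Key orderKey(Key customerKey, long orderId) {
        return KeyFactory.createKey(customerKey, "SalesOrder", orderId);
    }
}
```

The third method uses the `createKey` overload that takes a parent key, which is how a key carries the entity-group information mentioned earlier.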