This application uses Meta’s Llama 3 8B LLM to teach users about our planet’s geography, cultures, ecology, and history. Here’s how it’s used in each of the four play modes.
Ask Earth: Retrieves answers to users’ questions.
Explore: Gathers information about a selected location on the globe and generates short descriptions for landmarks.
Today in History: Identifies a significant historical event that occurred on the current month and day according to the user’s local time.
Daily Quiz: Creates trivia questions of varying difficulties, including incorrect options and the latitude/longitude coordinates for the correct answer’s location.
Core Files
The core files for this integration are located within the directory app/src/main/java/com.meta.pixelandtexel.geovoyage/services/llama and its subfolders.
This application supports two services for running the model and receiving results: Ollama and AWS Bedrock.
The primary way to use the querying service in this application is through the QueryLlamaService.submitQuery function. You can access both server types via the single wrapper function shown below. The server type used is determined by the value stored in the application’s SharedPreferences, which you can change via a toggle in the Settings menu. By default, it is set to AWS Bedrock.
fun submitQuery(
    query: String,
    creativity: Float = .6f, // temperature
    diversity: Float = .9f, // top_p
    handler: IQueryLlamaServiceHandler
) {
    if (queryTemplate.isNullOrEmpty()) {
        throw Exception("Llama query template not created")
    }

    // Inject the user's query into the base query template
    val fullQuery = String.format(queryTemplate!!, query)

    // Clamp the sampling parameters to the valid [0, 1] range
    val temperature = creativity.clamp01()
    val top_p = diversity.clamp01()

    // Route the query to whichever server type is saved in SharedPreferences
    val serverType = SettingsService.get(
        KEY_LLAMA_SERVER_TYPE, LlamaServerType.AWS_BEDROCK.value)

    when (serverType) {
        LlamaServerType.OLLAMA.value -> queryOllama(
            fullQuery,
            temperature,
            top_p,
            handler
        )

        LlamaServerType.AWS_BEDROCK.value -> queryAWSBedrock(
            fullQuery,
            temperature,
            top_p,
            handler
        )
    }
}
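The clamp01 extension used above is not part of the Kotlin standard library; a plausible one-line implementation keeps both sampling parameters in the valid range:

fun Float.clamp01(): Float = coerceIn(0f, 1f) // clamp to [0, 1], the range both model APIs expect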
Note: Both the queryOllama and queryAWSBedrock server functions use multithreading because they’re long-running operations that stream responses as the model generates them. Additionally, these implementations exclusively use the “generate” or “invoke” functionality, although both APIs also support the LLM “chat” feature, which allows you to send follow-up queries that incorporate the previous dialog into the response generation. Choose the querying type that best fits your specific use case.
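For reference, a chat-style Ollama request sends the prior dialog as a list of role-tagged messages to the /api/chat endpoint instead of a single prompt string. A minimal sketch of such a payload (the ChatMessage and OllamaChatRequest classes are hypothetical, not part of this project):

// Hypothetical payload classes for Ollama's /api/chat endpoint,
// which accepts the prior dialog as role-tagged messages
data class ChatMessage(val role: String, val content: String)

data class OllamaChatRequest(
    val model: String = "llama3:8b",
    val messages: List<ChatMessage>,
    val stream: Boolean = true
)

val chatRequest = OllamaChatRequest(
    messages = listOf(
        ChatMessage("user", "Where are the tallest mountains on Earth?"),
        ChatMessage("assistant", "The tallest mountains are in the Himalayas..."),
        ChatMessage("user", "How tall is the tallest one?") // follow-up query
    )
)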
Ollama
The Ollama model invocation uses a simple, unauthenticated HTTP request through the /api/generate endpoint, as detailed in the official Ollama documentation. You should configure the server URL in your secrets.properties file, but you can override it in the in-app Settings menu by selecting Ollama as your server type and entering the URL in the text field.
If you use this server type for Llama invocation in a production application, it is highly recommended that you add some form of authentication to your requests. Implementing server-side authentication was beyond the scope of this project, which serves as a proof of concept for integrating this service.
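For example, if your Ollama server sat behind a reverse proxy that checks a bearer token, you could attach the token to every request with an OkHttp interceptor. A minimal sketch (the AUTH_TOKEN constant and the proxy setup are assumptions, not part of this project):

import okhttp3.OkHttpClient

// Hypothetical bearer token issued by your own auth layer
private const val AUTH_TOKEN = "<your-token-here>"

val client = OkHttpClient.Builder()
    .addInterceptor { chain ->
        // Attach the token so the reverse proxy can authenticate the request
        val authed = chain.request().newBuilder()
            .header("Authorization", "Bearer $AUTH_TOKEN")
            .build()
        chain.proceed(authed)
    }
    .build()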
Ollama supports a number of parameters for configuring queries. This application uses only temperature and top_p, to match the comparatively limited set of parameters that the AWS Bedrock model invocation SDK supports. The configuration of these parameters is explained in the Model Parameters section.
The Kotlin representation of the Ollama request payload is located in the file app/src/main/java/com.meta.pixelandtexel.geovoyage/services/llama/models/OllamaRequest.kt, and is serialized into JSON by the gson dependency before being set as the request body.
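The exact definitions live in that file; a representative sketch of what the payload classes might look like, following Ollama’s /api/generate schema (the model name and defaults here are illustrative):

// Serialized by gson into the /api/generate request body
data class OllamaRequest(
    val prompt: String,
    val options: OllamaRequestParams,
    val model: String = "llama3:8b",
    val stream: Boolean = true
)

// Ollama nests sampling parameters under "options"
data class OllamaRequestParams(
    val temperature: Float,
    val top_p: Float
)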
// Serialize the request payload to JSON and POST it to the /api/generate endpoint
val jsonMediaType = "application/json; charset=utf-8".toMediaTypeOrNull()
val nativeRequest = OllamaRequest(query, OllamaRequestParams(temp, top_p))
val requestBody = gson.toJson(nativeRequest).toRequestBody(jsonMediaType)
val request = ollamaRequestBuilder.post(requestBody).build()
More information on the query construction can be found in the Templated Queries section below.
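Ollama streams its answer as newline-delimited JSON objects, each carrying a partial response string and a done flag, so the response body can be consumed line by line. A rough sketch of how that stream might be read with OkHttp and gson (the handler callbacks mirror IQueryLlamaServiceHandler; the OllamaResponse class name is hypothetical):

import java.io.IOException
import okhttp3.Call
import okhttp3.Callback
import okhttp3.Response

// Hypothetical shape of one streamed /api/generate chunk
data class OllamaResponse(val response: String, val done: Boolean)

client.newCall(request).enqueue(object : Callback {
    override fun onFailure(call: Call, e: IOException) {
        handler.onError(e.message ?: "Ollama request failed")
    }

    override fun onResponse(call: Call, response: Response) {
        val builder = StringBuilder()
        handler.onStreamStart()
        response.body?.source()?.let { source ->
            // Each streamed line is a complete JSON object holding a
            // partial response and a flag marking the final chunk
            while (!source.exhausted()) {
                val line = source.readUtf8Line() ?: break
                val chunk = gson.fromJson(line, OllamaResponse::class.java)
                builder.append(chunk.response)
                handler.onPartial(builder.toString())
                if (chunk.done) break
            }
        }
        handler.onFinished(builder.toString())
    }
})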
AWS Bedrock
The AWS Bedrock model invocation uses the AWS Kotlin SDK and requires access key and secret key authentication.
The AWS Kotlin SDK supports three parameters when invoking Meta’s Llama model: temperature, top_p, and max_gen_len. The configuration of these parameters is detailed in the Model Parameters section.
The Kotlin representation of the AWS Bedrock request payload is located in the file app/src/main/java/com.meta.pixelandtexel.geovoyage/services/llama/models/BedrockRequest.kt, and it is also serialized into JSON by the gson dependency. Constructing the AWS Bedrock request payload is more complex than using Ollama, as it requires Llama 3’s instruction format.
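For reference, a representative sketch of what BedrockRequest might contain, following Bedrock’s documented schema for Llama models (the max_gen_len default is illustrative):

// Serialized by gson into the Bedrock invoke-model request body
data class BedrockRequest(
    val prompt: String,
    val temperature: Float,
    val top_p: Float,
    val max_gen_len: Int = 512
)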
// Embed the prompt in Llama 3's instruction format.
// "{{query}}" is a placeholder token swapped for the actual query text.
val instruction = """
    <|begin_of_text|>
    <|start_header_id|>user<|end_header_id|>
    {{query}}
    <|eot_id|>
    <|start_header_id|>assistant<|end_header_id|>
""".trimIndent().replace("{{query}}", query)
val nativeRequest = BedrockRequest(instruction, temp, top_p)
val requestBody = gson.toJson(nativeRequest)

val request = InvokeModelWithResponseStreamRequest {
    modelId = "meta.llama3-8b-instruct-v1:0"
    contentType = "application/json"
    accept = "application/json"
    body = requestBody.encodeToByteArray()
}
More information on the query construction can be found in the Templated Queries section below.
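Handling the streamed result is a matter of collecting chunk events from the SDK’s flow-based response. A rough sketch, assuming a configured BedrockRuntimeClient named bedrockClient and a suspend context (the BedrockResponse class is hypothetical; each chunk’s JSON carries the newly generated text in a "generation" field):

import aws.sdk.kotlin.services.bedrockruntime.model.ResponseStream

// Hypothetical class mirroring the streamed Llama response JSON
data class BedrockResponse(val generation: String)

val builder = StringBuilder()
bedrockClient.invokeModelWithResponseStream(request) { response ->
    response.body?.collect { event ->
        when (event) {
            is ResponseStream.Chunk -> {
                // Each chunk's bytes decode to a small JSON payload
                event.value.bytes?.decodeToString()?.let { json ->
                    val chunk = gson.fromJson(json, BedrockResponse::class.java)
                    builder.append(chunk.generation)
                    handler.onPartial(builder.toString())
                }
            }
            else -> Unit
        }
    }
    handler.onFinished(builder.toString())
}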
Querying
Both Llama server types support a range of functionalities and options that let you configure the nature of the response when invoking models. The three most important techniques considered for this application are model parameters, templated queries, and response streaming.
Model parameters
Only two parameters are actively used in the model invocation within this application: temperature and top_p. Although both services support a parameter that caps the number of tokens in the generated response, it is left at a default value specific to each server type integration: 128 for Ollama and 512 for AWS Bedrock.
A detailed discussion of these parameters and their impact on query responses is beyond the scope of this document, but here is a brief overview:
Temperature: This parameter controls the creativity level of the model. A low value such as 0.1 yields minimal randomness and more predictable responses.
Top_p: This parameter determines the diversity level of the model. A high default value of 0.9 was selected to enhance the diversity of the responses.
Selecting the appropriate model invocation parameters depends on your specific use case. The values chosen for this application were determined after extensive testing to achieve a balance between educational value and engaging content.
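As an illustration (the values here are hypothetical, not the app’s tuned settings), a factual mode such as Today in History might lower creativity while keeping diversity high:

// Hypothetical tuning for a factual play mode: low temperature for
// predictable answers, high top_p to keep the wording varied
QueryLlamaService.submitQuery(
    query = "What is one notable event in history that occurred on August 30?",
    creativity = 0.3f, // temperature
    diversity = 0.9f,  // top_p
    handler = historyHandler // an IQueryLlamaServiceHandler implementation
)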
Templated queries
This app uses the “templated queries” technique, where variables or data are injected into pre-defined queries. Three query templates are defined below:
explore_screen_base_query: In the Explore play mode, geocoordinates formatted in common notation (E/W and N/S instead of +/- to denote hemisphere) are injected at token 1, and the place name returned from the Google Geocoding API (if found) is injected at token 2. For more information regarding the usage of the Google Geocoding API, see Geocoding API Overview.
For example, “What is one notable city or landmark near the coordinates 37.4°N, 139.76°E in Japan?”
today_in_history_base_query: In the Today in History play mode, the user’s local date is injected at token 1 in the format of MMMM d.
For example, “What is one notable event in history that occurred on August 30?”
base_query_template: Lastly, all queries are injected into the base query, which is templated in the QueryLlamaService.submitQuery function just before being sent to the designated model server. This query accomplishes two tasks: it keeps the result short (preventing overflow in the allocated panel area and sparing the user from scrolling), and it asks for the response in Markdown format. All Llama text responses are displayed inside the MarkdownText composable function available through the compose-markdown dependency, which is a slightly unconventional yet effective way to display a nicely formatted response from Llama in the application.
For example, a user question: “In a short response formatted with markdown, answer the following question: Where are the tallest mountains on Earth?”
For example, a Today in History query: “In a short response formatted with markdown, answer the following question: What is one notable event in history that occurred on August 30?”
The wording of templated queries significantly influences the content of the responses returned from model invocation; extensive testing and tweaking went into finding the best wording for this application’s purposes. If you choose this strategy, it is recommended that you dedicate development time to testing and refining your own templated queries.
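On Android, such templates are typically stored as format-string resources and filled in with String.format, which is how submitQuery applies base_query_template. A brief sketch of the Today in History flow under that assumption (the resource name mirrors the template name above):

import java.text.SimpleDateFormat
import java.util.Date
import java.util.Locale

// Format the user's local date as "MMMM d", e.g. "August 30"
val today = SimpleDateFormat("MMMM d", Locale.getDefault()).format(Date())

// Inject it at token 1 of the template, assumed to be a string resource
// such as: "What is one notable event in history that occurred on %1$s?"
val template = context.getString(R.string.today_in_history_base_query)
val query = String.format(template, today)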
Response streaming
This application’s Llama model invocation uses response streaming to enhance the user experience. Both Llama server types support non-streaming requests, but streaming minimizes the time spent waiting for the invocation to complete and provides progressive visual feedback, keeping the user engaged. In most cases where you display text responses to the user, we recommend adopting this approach.
Example usage
val fullQuery = String.format(templateQuery, data)

QueryLlamaService.submitQuery(
    query = fullQuery,
    creativity = 1f,
    diversity = .9f,
    handler = object : IQueryLlamaServiceHandler {
        override fun onStreamStart() {
            // (optional) hide loading message/graphic
        }

        override fun onPartial(partial: String) {
            // (optional) update result UI with partial response
        }

        override fun onFinished(answer: String) {
            // update result UI with full, final response
        }

        override fun onError(reason: String) {
            // handle querying error
        }
    }
)
Pre-generated data
In addition to the queries executed at runtime, Llama 3 was used to pre-generate educational data displayed to users in different play modes:
Daily quiz: The questions and answers for this mode were generated with the following instructions:
Generate 100 trivia questions related to Earth geography and cultures, ranked from easy to difficult, including the latitude and longitude coordinates of each location answer. Format the response in XML and provide two incorrect answers for each question.
Explore: Landmark descriptions for this mode were generated with the following instructions:
Provide short descriptions for each of the following landmarks: the Great Egyptian Pyramids, the Eiffel Tower, Chichén Itzá, the Sydney Opera House, Taj Mahal, the Christ the Redeemer statue, the Colosseum, Mount Vinson, and Victoria Falls. Format the responses in XML, including the name, description, latitude, and longitude of each landmark.
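Because the pre-generated data is XML, it can be bundled with the app and parsed at load time, for example with Android’s XmlPullParser. A rough sketch under that assumption (the element names landmark, name, and description are hypothetical, matching the instructions above rather than the actual generated files):

import org.xmlpull.v1.XmlPullParser

data class Landmark(val name: String, val description: String)

// Walks <landmark> entries and collects their <name>/<description> children
fun parseLandmarks(parser: XmlPullParser): List<Landmark> {
    val landmarks = mutableListOf<Landmark>()
    var name = ""
    var description = ""
    var current = ""
    while (parser.next() != XmlPullParser.END_DOCUMENT) {
        when (parser.eventType) {
            XmlPullParser.START_TAG -> current = parser.name
            XmlPullParser.TEXT -> when (current) {
                "name" -> name = parser.text
                "description" -> description = parser.text
            }
            XmlPullParser.END_TAG -> {
                if (parser.name == "landmark") {
                    landmarks.add(Landmark(name, description))
                }
                current = ""
            }
        }
    }
    return landmarks
}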