This application uses Meta’s Llama 3 8B LLM to teach users about Earth’s geography, cultures, ecology, and history. The model powers four play modes:
Ask Earth: Retrieves answers to users’ questions.
Explore: Gathers information about a selected location on the globe and generates short descriptions for landmarks.
Today in history: Identifies a significant historical event that occurred on the current month and day according to the user’s local time.
Daily quiz: Creates trivia questions of varying difficulties, including incorrect options and the latitude/longitude coordinates for the correct answer’s location.
Core files
The core files for this integration are located within the directory app/src/main/java/com.meta.pixelandtexel.geovoyage/services/llama and its subfolders.
This application supports two services for running the model: Ollama and AWS Bedrock.
Use the QueryLlamaService.submitQuery function to query the service. This wrapper function supports both server types. The application determines which server to use based on the value stored in SharedPreferences, which you can change via a toggle in the Settings menu. AWS Bedrock is the default.
fun submitQuery(
    query: String,
    creativity: Float = .6f, // temperature
    diversity: Float = .9f, // top_p
    handler: IQueryLlamaServiceHandler
) {
    if (queryTemplate.isNullOrEmpty()) {
        throw Exception("Llama query template not created")
    }

    val fullQuery = String.format(queryTemplate!!, query)
    val temperature = creativity.clamp01()
    val top_p = diversity.clamp01()

    val serverType = SettingsService.get(
        KEY_LLAMA_SERVER_TYPE, LlamaServerType.AWS_BEDROCK.value)

    when (serverType) {
        LlamaServerType.OLLAMA.value -> queryOllama(
            fullQuery,
            temperature,
            top_p,
            handler
        )

        LlamaServerType.AWS_BEDROCK.value -> queryAWSBedrock(
            fullQuery,
            temperature,
            top_p,
            handler
        )
    }
}
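The clamp01() calls above keep temperature and top_p within the model's accepted range. That extension isn't shown in this excerpt; a minimal sketch of what it might look like, using Kotlin's built-in coerceIn (the app's actual implementation may differ):

```kotlin
// Hypothetical helper: constrain a Float to [0, 1], the valid
// range for the temperature and top_p parameters.
fun Float.clamp01(): Float = coerceIn(0f, 1f)
```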
Note: Both queryOllama and queryAWSBedrock run their requests on background threads because they’re long-running operations that stream responses as the model generates them. Both implementations use only the “generate”/“invoke” functionality. While both APIs also support a “chat” feature for sending follow-up queries that incorporate previous dialogue, choose the query type that aligns with your specific use case.
Ollama
The Ollama model invocation uses a simple, unauthenticated HTTP request through the /api/generate endpoint, as detailed in the official Ollama documentation. Configure the server URL in your secrets.properties file. You can override it in the Settings menu by selecting Ollama as your server type and entering the URL in the text field.
For production applications, add authentication to your Ollama requests. This project is a proof of concept and doesn’t implement server-side authentication.
Ollama supports several parameters for configuring queries. This application uses only temperature and top_p to match the parameters supported by AWS Bedrock’s model invocation SDK. The Model parameters section explains how to configure these parameters.
The Kotlin representation of the Ollama request payload is located in app/src/main/java/com.meta.pixelandtexel.geovoyage/services/llama/models/OllamaRequest.kt. The gson dependency serializes it into JSON before setting it as the request body.
val jsonMediaType = "application/json; charset=utf-8".toMediaTypeOrNull()
val nativeRequest = OllamaRequest(query, OllamaRequestParams(temp, top_p))
val requestBody = gson.toJson(nativeRequest).toRequestBody(jsonMediaType)
val request = ollamaRequestBuilder.post(requestBody).build()
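With streaming enabled, /api/generate returns newline-delimited JSON, one object per generated fragment. A hedged sketch of parsing one streamed line with gson; the OllamaChunk data class here is illustrative, not the app's actual response model, with field names taken from the Ollama API docs:

```kotlin
import com.google.gson.Gson

// Illustrative model of one streamed /api/generate line.
data class OllamaChunk(val response: String?, val done: Boolean)

// Parse one NDJSON line, forward the text fragment to the caller,
// and report whether this was the final chunk.
fun handleStreamLine(line: String, gson: Gson, onPartial: (String) -> Unit): Boolean {
    val chunk = gson.fromJson(line, OllamaChunk::class.java)
    chunk.response?.takeIf { it.isNotEmpty() }?.let(onPartial)
    return chunk.done
}
```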
For more information on query construction, see the Templated queries section below.
AWS Bedrock
The AWS Bedrock model invocation uses the AWS Kotlin SDK and requires access key and secret key authentication.
The AWS Kotlin SDK supports three parameters when invoking Meta’s Llama model: temperature, top_p, and max_gen_len. The Model parameters section details how to configure these parameters.
The Kotlin representation of the AWS Bedrock request payload is located in app/src/main/java/com.meta.pixelandtexel.geovoyage/services/llama/models/BedrockRequest.kt. The gson dependency serializes it into JSON. Constructing the AWS Bedrock request payload is more complex than using Ollama because it requires Llama 3’s instruction format.
// Embed the prompt in Llama 3's instruction format.
val instruction = """
    <|begin_of_text|>
    <|start_header_id|>user<|end_header_id|>
    {{prompt}} <|eot_id|>
    <|start_header_id|>assistant<|end_header_id|>
    """.trimIndent().replace("{{prompt}}", query)
val nativeRequest = BedrockRequest(instruction, temp, top_p)
val requestBody = gson.toJson(nativeRequest)

val request = InvokeModelWithResponseStreamRequest {
    modelId = "meta.llama3-8b-instruct-v1:0"
    contentType = "application/json"
    accept = "application/json"
    body = requestBody.encodeToByteArray()
}
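For Llama models on Bedrock, each streamed event carries a JSON payload whose generation field holds the next text fragment. A brief sketch of decoding one chunk's bytes, again using an illustrative data class rather than the app's actual response model:

```kotlin
import com.google.gson.Gson

// Illustrative model of one Llama-on-Bedrock stream chunk.
data class LlamaChunk(val generation: String?, val stop_reason: String?)

// Decode a chunk's bytes and return the generated text fragment, if any.
fun decodeChunk(bytes: ByteArray, gson: Gson): String? =
    gson.fromJson(bytes.decodeToString(), LlamaChunk::class.java).generation
```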
For more information on query construction, see the Templated queries section below.
Querying
Both Llama server types support a range of functionalities and options for configuring response generation. This application uses three key techniques: model parameters, templated queries, and response streaming.
Model parameters
This application uses two parameters: temperature and top_p. Both services also accept a parameter capping the number of tokens in a generated response, but this application leaves it at its default value: 128 for Ollama and 512 for AWS Bedrock.
Here is a brief overview of the active parameters:
temperature: Controls the randomness (“creativity”) of the model’s output. Lower values (for example, 0.1) produce more predictable responses; this application defaults to 0.6.
top_p: Controls the diversity of the model’s output. The default value of 0.9 allows for varied responses.
Choose parameter values based on your use case. This application’s values were determined through extensive testing to balance educational value with engaging content.
Templated queries
This app uses templated queries, where variables or data are injected into pre-defined queries. Three query templates are defined:
explore_screen_base_query: In Explore play mode, geocoordinates formatted in common notation (E/W and N/S instead of +/- to denote hemisphere) are injected at token 1. The place name returned from the Google Geocoding API (if found) is injected at token 2. For more information about the Google Geocoding API, see the Geocoding API Overview.
Example: “What is one notable city or landmark near the coordinates 37.4°N, 139.76°E in Japan?”
today_in_history_base_query: In Today in History play mode, the user’s local date is injected at token 1 in the format MMM d (for example, “Aug 30”).
Example: “What is one notable event in history that occurred on Aug 30?”
base_query_template: All queries are injected into this base template in the QueryLlamaService.submitQuery function before being sent to the model server. The template accomplishes two tasks: it limits response length (so responses fit in the allocated panel area without scrolling), and it instructs the model to format the response as Markdown. All Llama text responses are displayed inside the MarkdownText composable function from the compose-markdown dependency, so the Markdown-formatted responses render cleanly in the application.
User question example: “In a short response formatted with markdown, answer the following question: Where are the tallest mountains on Earth?”
Today in History example: “In a short response formatted with markdown, answer the following question: What is one notable event in history that occurred on Aug 30?”
The wording of templated queries significantly influences model responses. Extensive testing was conducted to find optimal wording for this application. If you use this strategy, dedicate development time to test and refine your templated queries.
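Concretely, injection “at token 1” can be pictured as positional String.format tokens. The template strings below are illustrative stand-ins for the app’s actual string resources:

```kotlin
// Hypothetical templates resembling the app's resources; %1$s marks token 1.
val todayInHistoryTemplate =
    "What is one notable event in history that occurred on %1\$s?"
val baseQueryTemplate =
    "In a short response formatted with markdown, answer the following question: %1\$s"

// Inject the date, then wrap the result in the base template.
val innerQuery = String.format(todayInHistoryTemplate, "Aug 30")
val fullQuery = String.format(baseQueryTemplate, innerQuery)
```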
Response streaming
This application uses response streaming to enhance user experience. Both Llama server types support non-streaming requests. However, streaming minimizes waiting time and provides progressive visual feedback, keeping users engaged. For most text responses displayed to users, this approach is recommended.
Example usage
val fullQuery = String.format(templateQuery, data)

QueryLlamaService.submitQuery(
    query = fullQuery,
    creativity = 1f,
    diversity = .9f,
    handler = object : IQueryLlamaServiceHandler {
        override fun onStreamStart() {
            // (optional) hide loading message/graphic
        }

        override fun onPartial(partial: String) {
            // (optional) update result UI with partial response
        }

        override fun onFinished(answer: String) {
            // update result UI with full, final response
        }

        override fun onError(reason: String) {
            // handle querying error
        }
    }
)
Pre-generated data
In addition to runtime queries, Llama 3 was used to generate educational data displayed in different play modes:
Daily quiz: Questions and answers were generated with the following instructions:
Generate 100 trivia questions related to Earth geography and cultures, ranked from easy to difficult, including the latitude and longitude coordinates of each location answer. Format the response in XML and provide two incorrect answers for each question.
Explore: Landmark descriptions were generated with the following instructions:
Provide short descriptions for each of the following landmarks: the Great Egyptian Pyramids, the Eiffel Tower, Chichén Itzá, the Sydney Opera House, Taj Mahal, the Christ the Redeemer statue, the Colosseum, Mount Vinson, and Victoria Falls. Format the responses in XML, including the name, description, latitude, and longitude of each landmark.
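The prompt above asks for XML containing a name, description, latitude, and longitude per landmark. The returned markup might be shaped like this (illustrative only; the actual pre-generated files ship with the app):

```xml
<landmarks>
  <landmark>
    <name>Eiffel Tower</name>
    <description>A wrought-iron lattice tower on the Champ de Mars in Paris, France.</description>
    <latitude>48.8584</latitude>
    <longitude>2.2945</longitude>
  </landmark>
  <!-- ... -->
</landmarks>
```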