DTS311TC Final Year Project Application of Smart Car for Line Following and Large Model Integration

Published: 2025-11-08



Abstract

In modern retail environments, efficiency and automation are crucial to meet increasing customer demands and reduce operational costs. This proposal focuses on the development of a smart car system designed to operate in a mapped environment, starting from an initial point where the car utilizes the large AI model "iFLYTEK Spark" to communicate with humans. The smart car records the required items mentioned during the conversation and navigates using SLAM (Simultaneous Localization and Mapping) to reach the line-following section. The line-following mechanism is implemented based on the RKNN platform, allowing precise navigation through predefined paths. Finally, the smart car arrives at the supermarket and identifies the items mentioned in the dialogue to complete the task. This approach aims to automate the process of item retrieval and delivery in retail settings, improving efficiency and enhancing user experience. The integration of advanced AI technologies and automated navigation presents significant potential for revolutionizing traditional retail logistics, making operations more streamlined, and providing a seamless customer experience.

1. Introduction

1.1 Introduction and Background

In the modern retail industry, supermarkets face significant challenges such as rising labor costs, increasing customer demand for convenience, and the need for efficient operations. To address these issues, automation and intelligent systems are being introduced to streamline processes, enhance customer satisfaction, and reduce operational costs.

The concept of a smart car in retail environments aims to address these challenges by providing autonomous navigation, item identification, and customer interaction. Equipped with advanced sensors and AI technologies, the smart car surpasses traditional inventory and logistics robots by offering multifunctional capabilities tailored to the needs of a retail setting. By leveraging sensor fusion, computer vision, and deep learning, the smart car can effectively navigate the supermarket, assist customers, and manage inventory tasks with minimal human intervention.

The integration of large AI models, such as "iFLYTEK Spark," further enhances the smart car's capabilities, particularly in natural language understanding and dialogue management. This enables the smart car to engage in more meaningful interactions with customers, understand their needs, and accurately perform the requested tasks. The use of SLAM (Simultaneous Localization and Mapping) technology provides the smart car with the ability to navigate complex environments autonomously, while RKNN-based line following ensures that the vehicle can adhere to predefined paths effectively.

1.2 Scope and Objectives

The scope of this project is to develop a smart car system that can autonomously navigate in a supermarket environment, interact with humans to gather information on required items, and carry out the retrieval and delivery of those items. The key components of this system include SLAM-based navigation, RKNN-powered line-following, and the integration of a large AI model for advanced human-robot interaction (Thrun, Burgard, & Fox, 2005; Rockchip, 2019).

The specific objectives of this project are to develop a multifunctional smart car capable of autonomous navigation, human interaction, line following, and item recognition. The project aims to create a navigation system that enables the smart car to autonomously explore and operate within a mapped environment using SLAM technology. To facilitate effective human-robot interaction, the "iFLYTEK Spark" large AI model will be employed, allowing the smart car to understand verbal instructions and record required items. Additionally, an RKNN-based line-following mechanism will be implemented to ensure precise path adherence within the supermarket. The smart car will also be equipped with the ability to recognize items mentioned during interaction and autonomously retrieve them from supermarket shelves. Such AI-driven mechanisms align with advancements in deep learning applications for real-world scenarios (Goodfellow, Bengio, & Courville, 2016).

Ultimately, the project seeks to improve the efficiency of item retrieval and delivery in a retail environment, reducing reliance on human labor and enhancing the overall customer experience. The integration of these components will lead to a fully functional smart car system capable of transforming traditional retail logistics by providing automated solutions for item management, customer assistance, and operational efficiency.

2. Literature Review

The development of autonomous systems in retail environments is an area of active research and innovation. This literature review will explore key technologies and methodologies that have been previously studied and implemented to address similar challenges in retail automation.

2.1 Autonomous Navigation in Retail

Autonomous navigation has been a critical focus in retail automation, aiming to provide robots with the capability to navigate crowded and dynamic environments effectively. Research studies have demonstrated the use of SLAM (Simultaneous Localization and Mapping) technology for creating detailed maps of supermarket layouts, enabling autonomous vehicles to navigate efficiently. For example, Thrun et al. (2005) in their seminal work on probabilistic robotics, introduced techniques that form the foundation for modern SLAM methods, allowing for reliable mapping and localization in dynamic environments. Algorithms like GMapping and Hector SLAM have been widely used to address localization and mapping issues in unknown environments, making autonomous systems more efficient in their navigation capabilities.

The mathematical foundation of SLAM involves probabilistic estimation, often modeled using Kalman filters or particle filters. The core problem in SLAM is to estimate the state of the robot (e.g., position and orientation) while simultaneously building a map of the environment. Techniques such as Extended Kalman Filters (EKF) and Particle Filters (PF) are employed to handle uncertainties in measurements and robot motion. These methods allow for continuous map building and correction, which is crucial in dynamic supermarket environments where aisles may change frequently.

The Extended Kalman Filter (EKF) used in SLAM can be expressed mathematically as follows:

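State Prediction:

Predicts the robot's next state based on its current state and control input.

$$\hat{x}_{k|k-1} = f(\hat{x}_{k-1|k-1}, u_k) \tag{1}$$

$\hat{x}_{k|k-1}$: Predicted state at time step $k$, given the measurements up to step $k-1$.

$f(\cdot)$: Motion model, which maps the previous state and control input to the predicted state.

$u_k$: Control input, such as velocity or steering angle.

$w_k$: Process noise, accounting for uncertainty in motion. The true motion is modeled as $x_k = f(x_{k-1}, u_k) + w_k$; because $w_k$ is assumed to be zero-mean, it does not appear in the prediction itself and only enlarges the predicted covariance.

Kalman Gain:

Calculates the optimal weighting for the measurement update, balancing the uncertainty of the prediction against the measurement noise.

$$K_k = P_{k|k-1} H_k^{\top} \left( H_k P_{k|k-1} H_k^{\top} + R_k \right)^{-1} \tag{2}$$

$P_{k|k-1}$: Predicted covariance, representing the uncertainty in the state prediction.

$H_k$: Jacobian matrix of the measurement model.

$R_k$: Measurement noise covariance, accounting for sensor inaccuracies.

State Update:

Corrects the predicted state using the measurement to yield an updated estimate.

$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \left( z_k - h(\hat{x}_{k|k-1}) \right) \tag{3}$$

$z_k$: Actual measurement from the environment.

$h(\cdot)$: Measurement model, mapping the predicted state to the measurement space.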

These steps allow the robot to build a continuously updated map of the environment while estimating its position accurately. This is especially critical in retail settings where the layout may frequently change due to dynamic obstacles like customers and inventory adjustments.
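The prediction, gain, and update steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the project's implementation: the process noise covariance `Q`, the motion Jacobian `F_jac`, the covariance update, and the toy constant-velocity model are not given in the text and are introduced here only to make the example runnable.

```python
import numpy as np

def ekf_step(x, P, u, z, f, F_jac, h, H_jac, Q, R):
    """One EKF cycle: state prediction, Kalman gain, state update."""
    # State prediction: propagate the estimate through the motion model.
    x_pred = f(x, u)
    F = F_jac(x, u)                  # assumption: Jacobian of f, not named in the text
    P_pred = F @ P @ F.T + Q         # assumption: Q is the covariance of the process noise

    # Kalman gain: weigh prediction uncertainty against measurement noise.
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)

    # State update: correct the prediction with the measurement residual.
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Toy model (hypothetical): 2D position driven by a velocity command,
# with the position measured directly.
DT = 0.1
f = lambda x, u: x + DT * u
F_jac = lambda x, u: np.eye(2)
h = lambda x: x
H_jac = lambda x: np.eye(2)
Q = 0.01 * np.eye(2)
R = 0.04 * np.eye(2)

x0, P0 = np.zeros(2), np.eye(2)
x1, P1 = ekf_step(x0, P0, np.array([1.0, 0.0]), np.array([0.12, -0.01]),
                  f, F_jac, h, H_jac, Q, R)
```

After one cycle the covariance `P1` is smaller than `P0`, which reflects the measurement reducing the uncertainty of the estimate.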

2.2 Human-Robot Interaction (HRI) Using Large AI Models

Human-robot interaction in retail environments plays a crucial role in enhancing customer experience. The use of large AI models, like "iFLYTEK Spark" or GPT-based architectures, has proven effective in understanding natural language and engaging in meaningful dialogues. Vaswani et al. (2017) introduced the Transformer architecture, which has become a core component in modern language models, enabling them to understand and generate human-like responses. These models significantly enhance the communication capabilities of autonomous robots, allowing them to comprehend customer needs, provide recommendations, and record item requests seamlessly, thereby contributing to an interactive and personalized shopping experience.

The Transformer architecture relies on self-attention mechanisms that allow the model to weigh different parts of an input sequence, making it highly efficient at handling conversational context.

The self-attention mechanism is computed as follows:

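$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left( \frac{Q K^{\top}}{\sqrt{d_k}} \right) V \tag{4}$$

$Q, K, V$: Query, key, and value matrices derived from the input sequence.

$\frac{Q K^{\top}}{\sqrt{d_k}}$: Scaled dot product, capturing relationships between input elements; $d_k$ is the dimension of the key vectors.

$\mathrm{softmax}(\cdot)$: Normalizes attention scores to focus on key tokens.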
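As a small illustration of scaled dot-product attention as defined by Vaswani et al. (2017), the sketch below computes softmax(QK^T / sqrt(d_k)) V with NumPy. The token count, embedding size, and random projection matrices `W_q`, `W_k`, `W_v` are hypothetical values chosen only for the example; they are not taken from iFLYTEK Spark or any other deployed model.

```python
import numpy as np

def softmax(scores, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = scores - scores.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]                       # dimension of the key vectors
    scores = Q @ K.T / np.sqrt(d_k)         # scaled dot product
    weights = softmax(scores, axis=-1)      # each row sums to 1
    return weights @ V, weights

# Hypothetical input: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))

out, attn = scaled_dot_product_attention(tokens @ W_q, tokens @ W_k, tokens @ W_v)
```

Row `i` of `attn` shows how strongly token `i` attends to every other token, which is the mechanism that lets the model carry conversational context across a sentence.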

The Transformer also includes a feed-forward network at each position, which operates as follows:

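$$\mathrm{FFN}(x) = \mathrm{ReLU}(x W_1 + b_1) W_2 + b_2 \tag{5}$$

$x$: Input to the feed-forward network, i.e. the output of the attention sub-layer at a given position.

$W_1, W_2$: Trainable weight matrices.

$b_1, b_2$: Bias terms.

$\mathrm{ReLU}$: Non-linear activation function, $\mathrm{ReLU}(a) = \max(0, a)$, which introduces non-linearity.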
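The position-wise feed-forward network, a ReLU layer followed by a linear layer, can be checked by hand on a tiny example. The weights below are hypothetical and chosen so that the arithmetic is easy to follow.

```python
import numpy as np

def feed_forward(x, W1, b1, W2, b2):
    hidden = np.maximum(0.0, x @ W1 + b1)   # ReLU zeroes the negative pre-activations
    return hidden @ W2 + b2

# Hypothetical weights: one position with 2 features, 3 hidden units, 1 output.
x = np.array([[1.0, -2.0]])
W1 = np.array([[1.0, 0.0, -1.0],
               [0.0, 1.0, 1.0]])
b1 = np.zeros(3)
W2 = np.array([[1.0], [1.0], [1.0]])
b2 = np.array([0.5])

# x @ W1 = [1, -2, -3]; ReLU keeps only the first entry; the output is 1 + 0.5.
y = feed_forward(x, W1, b1, W2, b2)
```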

2.3 Line Following and Path Planning

Line following is a foundational aspect of robotic navigation, particularly in controlled environments such as warehouses and supermarkets. Traditional methods include using infrared sensors or cameras to detect lines, while recent advancements involve deep learning-based methods to improve robustness. The RKNN (Rockchip Neural Network) platform provides an efficient solution for edge AI, facilitating real-time line following with high accuracy. Integrating RKNN with computer vision methods, as highlighted by Goodfellow et al. (2016) in their work on deep learning, has enabled smarter and more reliable navigation systems that can adapt to varying lighting and surface conditions, making them suitable for real-world retail environments.

The line-following task can be modeled as an image classification or segmentation problem, where a convolutional neural network (CNN) processes camera images to detect the path. The RKNN platform allows the deployment of such models efficiently on edge devices, enabling real-time decision-making directly on the vehicle.

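Pooling Layer (e.g., Max Pooling):

$$y[i, j] = \max\big( x[i : i + k,\; j : j + k] \big) \tag{6}$$

$x[i : i + k,\; j : j + k]$: Region of the input feature map covered by the pooling window.

$k$: Size of the pooling window.

$y[i, j]$: Output value, representing the maximum value in the pooling region.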

Pooling layers improve computational efficiency and focus the model's attention on the most critical features of the path, enabling robust navigation in dynamic retail environments.
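A direct NumPy version of max pooling makes the window indexing concrete. The `stride` parameter is an addition for illustration: `stride=1` slides the window one cell at a time, so each output is the maximum of `x[i:i+k, j:j+k]`, while the default `stride=k` gives the non-overlapping windows commonly used in CNNs. The feature map values are made up.

```python
import numpy as np

def max_pool2d(x, k, stride=None):
    stride = k if stride is None else stride
    out_h = (x.shape[0] - k) // stride + 1
    out_w = (x.shape[1] - k) // stride + 1
    y = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            y[i, j] = x[r:r + k, c:c + k].max()   # maximum over the k x k window
    return y

# Hypothetical 4 x 4 feature map.
feature_map = np.array([[1, 3, 2, 0],
                        [4, 6, 5, 1],
                        [7, 2, 9, 8],
                        [0, 1, 3, 4]])

pooled = max_pool2d(feature_map, k=2)             # [[6, 5], [7, 9]]
dense = max_pool2d(feature_map, k=2, stride=1)    # 3 x 3 output
```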

3. Project Plans

The successful implementation of the smart car system for autonomous navigation and human-robot interaction in retail environments requires a well-structured project plan. With the hardware components already available, the focus will shift to the development, testing, and deployment phases. This includes integrating navigation, line-following, and large AI model-based communication functionalities into a cohesive system. Drawing from proven methodologies in autonomous systems (Zhang & Singh, 2017), foundational principles of deep learning (LeCun, Bengio, & Hinton, 2015), and advanced segmentation techniques (Shi & Malik, 2000), the project emphasizes robustness and efficiency to ensure reliable performance in real-world retail settings.

3.1 Smart Car System Design and Operational Framework

The proposed solution focuses on integrating autonomous navigation, human-robot interaction, and line-following capabilities to develop a smart car system that can navigate efficiently in a retail environment, assist customers, and retrieve items autonomously. The methodology involves the following steps:

3.1.1 SLAM-Based Navigation

To enable real-time mapping and localization in a dynamic supermarket environment, SLAM (Simultaneous Localization and Mapping) will be implemented. The smart car will use LiDAR, IMU, and RGB cameras to build a real-time map of the environment and accurately determine its position. Path planning algorithms will then be developed to navigate through aisles and shelves, allowing the robot to move efficiently while avoiding obstacles.

The SLAM workflow, as depicted in Figure 1: SLAM Workflow Diagram, outlines the sequential processes involved in real-time mapping and localization. The system begins with data collection from sensors such as LiDAR and IMU, which is then processed through data association to match observations with existing map features. Tracks for new features are initialized, while outdated or irrelevant tracks and features are removed to maintain accuracy. The system predicts the smart car's next state based on control inputs like velocity and direction and dynamically updates the map to reflect changes in the environment. This iterative process ensures the smart car can adapt to dynamic environments such as crowded supermarkets (Thrun, Burgard, & Fox, 2005; Engel, Schöps, & Cremers, 2014).

Figure 1: SLAM Workflow Diagram
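The data association, track initialization, and track removal stages of this workflow can be sketched with a simple nearest-neighbour rule. This is an illustrative simplification and not the project's method: the gate distance `GATE`, the miss limit `MAX_MISSES`, the averaging update, and the 2D point landmarks are all assumptions made for the example.

```python
import numpy as np

GATE = 0.5        # metres; hypothetical association gate
MAX_MISSES = 3    # hypothetical: drop a feature missed in more scans than this

def associate_and_update(landmarks, misses, observations):
    """Match observations to map features, start new tracks, prune stale ones."""
    landmarks = [np.array(lm, dtype=float) for lm in landmarks]
    misses = list(misses)
    matched = set()
    for obs in observations:
        dists = [np.linalg.norm(obs - lm) for lm in landmarks]
        best = int(np.argmin(dists)) if dists else -1
        if best >= 0 and dists[best] < GATE:
            # Data association: refine the matched feature (crude average).
            landmarks[best] = 0.5 * (landmarks[best] + obs)
            matched.add(best)
        else:
            # No feature within the gate: initialize a new track.
            landmarks.append(np.array(obs, dtype=float))
            misses.append(0)
            matched.add(len(landmarks) - 1)
    kept_landmarks, kept_misses = [], []
    for i, lm in enumerate(landmarks):
        count = 0 if i in matched else misses[i] + 1
        if count <= MAX_MISSES:          # remove outdated tracks
            kept_landmarks.append(lm)
            kept_misses.append(count)
    return kept_landmarks, kept_misses

landmarks = [np.array([1.0, 1.0]), np.array([3.0, 0.0])]
misses = [0, 3]
observations = [np.array([1.2, 1.0]), np.array([5.0, 5.0])]
new_landmarks, new_misses = associate_and_update(landmarks, misses, observations)
```

In this example the first observation refines the existing feature near (1, 1), the second starts a new track at (5, 5), and the feature at (3, 0) is removed because it has now been missed too many times.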

The SLAM system will continuously update the map as the robot moves, enabling the car to adapt to dynamic changes in the environment. This approach aligns with the techniques introduced in LSD-SLAM, a method for monocular visual SLAM that is highly adaptable to large-scale settings (Engel, Schöps, & Cremers, 2014). Furthermore, leveraging ORB-SLAM offers an accurate and versatile alternative for monocular SLAM applications, which is particularly valuable for robust, real-time localization (Mur-Artal, Montiel, & Tardos, 2015). The integration of object detection algorithms such as YOLOv3 also enhances the smart car's ability to detect and avoid obstacles with high precision and low latency (Redmon & Farhadi, 2018), while multi-view 3D object detection supports effective environmental perception across multiple perspectives (Chen et al., 2017). This comprehensive integration ensures that the smart car can navigate through a constantly changing supermarket environment in real time (Thrun, Burgard, & Fox, 2005).

Figure 2 shows the system architecture, including the interaction between the SLAM, AI interaction, and RKNN modules, as well as the sensors used for mapping and localization.

Figure 2: System Architecture Diagram

3.1.2 Human-Robot Interaction Using AI Models

The smart car will be integrated with the large AI model "iFLYTEK Spark" to enable natural language processing (NLP) for meaningful interaction with customers. This capability allows the car to understand spoken customer queries, such as requests for specific products or assistance with store navigation. The system also incorporates speech recognition and intent detection modules to accurately interpret customer needs, leveraging the self-attention mechanisms in Transformer models to handle complex conversational context (Vaswani et al., 2017). Additionally, pre-trained language models, such as BERT, have proven effective in enhancing the precision of intent detection, further refining the interaction quality in customer queries (Devlin et al., 2019).

Once the intent is detected, the system generates contextual responses, enabling the robot to engage with customers in a personalized manner. This interaction model is supported by techniques like the GPT-3 model, which enhances contextual response generation through advanced language modeling capabilities (Brown et al., 2020). This interaction will be key to enhancing the shopping experience by enabling the smart car to provide real-time assistance in a natural, human-like way.

Figure 3 illustrates the flow of interaction from customer speech input to the car’s response, showing the process of speech recognition, intent detection, and contextual response generation.

Figure 3: AI Interaction Flowchart
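The flow from recognized speech to intent detection, item recording, and response can be outlined as below. This sketch deliberately does not call iFLYTEK Spark or any speech recognition service: `detect_intent` is a keyword-based stand-in for the language model, and `KNOWN_ITEMS`, `DialogueState`, and the reply strings are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field

KNOWN_ITEMS = {"milk", "bread", "apples"}   # hypothetical product catalogue

@dataclass
class DialogueState:
    requested_items: list = field(default_factory=list)

def detect_intent(transcript):
    """Keyword stand-in for the large-model intent detector."""
    words = {w.strip(".,!?") for w in transcript.lower().split()}
    items = sorted(words & KNOWN_ITEMS)
    if items:
        return "request_item", items
    if "where" in words:
        return "ask_location", []
    return "unknown", []

def respond(transcript, state):
    intent, items = detect_intent(transcript)
    if intent == "request_item":
        for item in items:
            if item not in state.requested_items:
                state.requested_items.append(item)   # record the required items
        return "Noted: " + ", ".join(items) + "."
    if intent == "ask_location":
        return "I can guide you there."
    return "Sorry, could you repeat that?"

state = DialogueState()
reply = respond("Please get me milk and bread.", state)
```

In a real system the transcript would come from the speech recognition module and `detect_intent` would be replaced by a call to the language model; the recorded `requested_items` list is what the later navigation and item recognition stages would consume.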

3.1.3 Line Following Using RKNN

The RKNN (Rockchip Neural Network) platform will be used to implement a line-following mechanism for the smart car. A Convolutional Neural Network (CNN) will be trained to detect floor lines and guide the robot along predefined paths within the supermarket. This setup allows the robot to follow a specific route while avoiding obstacles and other disruptions, with CNNs proving effective for feature extraction in image-based tasks (Goodfellow, Bengio, & Courville, 2016). Additionally, the integration of MobileNet, an efficient neural network for mobile and embedded vision applications, supports edge AI processing on resource-limited devices, allowing real-time line detection and path correction (Howard et al., 2017).

As illustrated in Figure 4, the architecture demonstrates an RKNN-based framework for time-dependent state prediction. RKNN (Rockchip Neural Network) provides an efficient platform for deploying deep learning models on edge devices, enabling real-time processing. In this setup, the system uses the current state $y_{r,i}(t_j)$ and control parameter $\mu(t_j)$ as inputs, processes them through multiple neural network layers, and outputs the predicted state $y_{r,i}(t_{j+1})$ for the next time step.

This architecture leverages RKNN's efficient computation capabilities, which are particularly suitable for dynamic system modeling and real-time control applications. RKNN provides a robust platform for deploying deep learning models on edge devices, enabling high-performance inference even on resource-constrained hardware (Rockchip, 2019). This design ensures accurate and consistent state predictions, making it ideal for real-world dynamic environments.

Figure 4: MLP for Time-State Prediction
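The input/output structure described for Figure 4, where the current state and a control parameter go in and the next state comes out, can be written as a small multilayer perceptron forward pass. The sketch uses untrained random weights, a `tanh` activation, and arbitrary layer sizes, all of which are assumptions; it does not use the RKNN toolkit and only shows how repeated one-step predictions roll a trajectory forward.

```python
import numpy as np

def mlp_predict(y_t, mu_t, weights, biases):
    """Map the state y(t_j) and control mu(t_j) to the predicted state y(t_{j+1})."""
    a = np.concatenate([y_t, np.atleast_1d(mu_t)])
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.tanh(W @ a + b)                   # hidden layers (assumed activation)
    return weights[-1] @ a + biases[-1]          # linear output layer

# Hypothetical sizes: 3 state variables, 1 control parameter, two hidden layers.
rng = np.random.default_rng(0)
state_dim, hidden = 3, 16
sizes = [state_dim + 1, hidden, hidden, state_dim]
weights = [rng.normal(scale=0.1, size=(n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

# Roll the model forward five time steps under a constant control value.
y = np.array([0.0, 0.1, -0.2])
trajectory = [y]
for _ in range(5):
    y = mlp_predict(y, 0.5, weights, biases)
    trajectory.append(y)
```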

Using edge AI enables the decision-making process to be executed instantly, ensuring the car reacts swiftly to environmental changes. This capability is essential for navigating narrow aisles and following complex paths, which are often challenging for traditional line-following algorithms. Moreover, implementing transfer learning techniques on edge devices has shown promise in further improving model performance and adaptability to dynamic environments (Pan & Yang, 2010).

Figure 5 shows how the CNN model processes the camera feed to detect the path and guide the smart car along its designated route.

Figure 5: CNN Line Following Diagram
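One plausible way to turn the CNN's line detection into a steering command is to take the horizontal offset of the detected line from the image center and apply proportional control. The sketch below assumes the network outputs a binary segmentation mask; the function name, the `gain`, the region-of-interest fraction, and the sign convention are hypothetical choices, not the project's controller.

```python
import numpy as np

def steering_from_mask(mask, gain=1.0, roi_fraction=0.5):
    """Return a steering command in [-1, 1], or None if no line is visible.

    Positive values mean the line lies to the right of the image center.
    """
    h, w = mask.shape
    roi = mask[int(h * (1 - roi_fraction)):, :]   # rows nearest the car
    cols = np.nonzero(roi)[1]                     # column index of each line pixel
    if cols.size == 0:
        return None                               # line lost: caller decides what to do
    center = (w - 1) / 2.0
    offset = (cols.mean() - center) / center      # normalized lateral offset
    return float(np.clip(gain * offset, -1.0, 1.0))

# Hypothetical 8 x 9 mask with a vertical line in column 6 (right of center).
mask = np.zeros((8, 9), dtype=np.uint8)
mask[:, 6] = 1
command = steering_from_mask(mask)
```

Returning `None` when no line pixels are found keeps the lost-line case explicit, so the caller can slow down or fall back to SLAM-based navigation instead of steering on stale data.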

3.2 Experimental Design

To evaluate the performance of the proposed smart car system, a comprehensive experimental setup will be implemented, focusing on both individual modules—SLAM-based navigation, human-robot interaction, and line following—and the integrated system's overall functionality. As shown in Figure 6: Task Map, a 4.5 × 4.5 m test area will be constructed. The experiment begins at point A, where the smart car initiates a dialogue with the user to obtain task details using the human-robot interaction module. After understanding and recording the requested item, the smart car employs the SLAM module to navigate autonomously to the yellow area, representing the item’s location. During this stage, the SLAM system ensures accurate localization and obstacle avoidance. Once the car reaches the yellow area and simulates item collection, it returns to A, switching to line-following mode. Finally, the car follows a predefined line from A to C to complete the task. This experimental design ensures a thorough evaluation of all system components under realistic conditions, providing valuable insights into their performance and integration.

Figure 6: Task Map
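The task sequence on the map (dialogue at A, SLAM navigation to the yellow area, return to A, line following to C) can be expressed as a small state machine. The stage and event names below are hypothetical labels for the steps described above.

```python
from enum import Enum, auto

class Stage(Enum):
    DIALOGUE = auto()           # at point A, talking to the user
    NAVIGATE_TO_ITEM = auto()   # SLAM navigation to the yellow area
    RETURN_TO_A = auto()        # drive back to point A
    LINE_FOLLOW_TO_C = auto()   # follow the predefined line from A to C
    DONE = auto()

# Each (stage, event) pair names the only transition allowed from that stage.
TRANSITIONS = {
    (Stage.DIALOGUE, "item_recorded"): Stage.NAVIGATE_TO_ITEM,
    (Stage.NAVIGATE_TO_ITEM, "item_collected"): Stage.RETURN_TO_A,
    (Stage.RETURN_TO_A, "reached_A"): Stage.LINE_FOLLOW_TO_C,
    (Stage.LINE_FOLLOW_TO_C, "reached_C"): Stage.DONE,
}

def advance(stage, event):
    """Move to the next stage, ignoring events that do not apply to this stage."""
    return TRANSITIONS.get((stage, event), stage)

stage = Stage.DIALOGUE
for event in ["item_recorded", "item_collected", "reached_A", "reached_C"]:
    stage = advance(stage, event)
```

Keeping the transitions in one table makes the hand-over between modules explicit, which is where integration errors are most likely to surface during testing.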

The performance of the smart car system will be evaluated through a comprehensive series of tests focusing on individual modules—SLAM-based navigation, human-robot interaction, and line following—as well as the overall integrated system.

The SLAM system will be tested in a simulated supermarket environment featuring dynamically changing elements, such as moving obstacles and variable aisle layouts. Localization accuracy will be assessed by comparing the estimated positions from SLAM with ground truth data, following methods suggested by Thrun, Burgard, & Fox (2005). Additionally, map quality will be evaluated based on its completeness and accuracy in capturing the environment, while path-planning efficiency will be measured in terms of the time and distance required for the smart car to reach its destination (Engel, Schöps, & Cremers, 2014).
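Localization accuracy and path length could be computed from logged poses roughly as follows. The root-mean-square position error is one common choice of metric and is an assumption here, as are the sample coordinates; the estimated and ground-truth positions are assumed to be time-aligned and expressed in the same frame.

```python
import numpy as np

def localization_rmse(estimated, ground_truth):
    """Root-mean-square Euclidean error between matched position samples."""
    estimated = np.asarray(estimated, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    errors = np.linalg.norm(estimated - ground_truth, axis=1)
    return float(np.sqrt(np.mean(errors ** 2)))

def path_length(points):
    """Total distance travelled along a sequence of positions."""
    points = np.asarray(points, dtype=float)
    return float(np.linalg.norm(np.diff(points, axis=0), axis=1).sum())

# Hypothetical log: three ground-truth positions and the SLAM estimates (metres).
ground_truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
estimated = [(0.0, 0.1), (1.0, -0.1), (2.1, 0.0)]

rmse = localization_rmse(estimated, ground_truth)   # every sample is off by 0.1 m
distance = path_length(ground_truth)                # 2.0 m
```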

For human-robot interaction (HRI), user studies will be conducted where participants engage with the smart car by requesting items or asking questions. Metrics such as speech recognition accuracy and response time will quantify system performance, while post-interaction surveys will assess user satisfaction and provide insights into the system’s usability in a retail setting. This dual approach of quantitative metrics and qualitative feedback is critical for capturing the nuanced effectiveness of the AI model, particularly in its natural language processing (NLP) and contextual response capabilities, as highlighted by Vaswani et al. (2017).
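Speech recognition accuracy is often reported as word error rate (WER), the word-level edit distance between the reference transcript and the recognized text divided by the number of reference words. Using WER is an assumption here, since the text does not name a specific metric, and the sample sentences are invented.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    prev = list(range(len(hyp) + 1))          # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

# One deleted word ("me") and one substitution ("apples" -> "apple") in 5 words.
wer = word_error_rate("please bring me two apples", "please bring two apple")
```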

The line-following system will be evaluated in supermarket-like environments with varied lighting conditions and floor surfaces. The accuracy of the smart car in following designated paths will be observed, alongside its ability to quickly respond to path deviations. These tests will assess the robustness of the CNN model for line detection and path tracking, crucial for stable navigation. Robustness across different environmental conditions and the system's ability to maintain consistent performance are key metrics, as discussed in Goodfellow, Bengio, & Courville (2016).
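Path-following accuracy can be quantified as the cross-track error, the distance from each logged car position to the nearest point on the designated line. The sketch assumes the line is available as a 2D polyline and that positions are logged in the same frame; the coordinates are hypothetical.

```python
import numpy as np

def point_to_segment(p, a, b):
    """Shortest distance from point p to the segment from a to b."""
    p, a, b = (np.asarray(v, dtype=float) for v in (p, a, b))
    ab = b - a
    denom = ab @ ab
    t = 0.0 if denom == 0 else float(np.clip((p - a) @ ab / denom, 0.0, 1.0))
    return float(np.linalg.norm(p - (a + t * ab)))

def cross_track_errors(positions, line):
    """Distance from each position to the closest segment of the polyline."""
    segments = list(zip(line[:-1], line[1:]))
    return [min(point_to_segment(p, a, b) for a, b in segments) for p in positions]

# Hypothetical L-shaped line and three logged positions (metres).
line = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)]
positions = [(1.0, 0.1), (2.2, 1.0), (1.0, -0.05)]

errors = cross_track_errors(positions, line)
mean_error, max_error = float(np.mean(errors)), max(errors)
```

Repeating the same run under each lighting condition and floor surface and comparing the mean and maximum error would give a simple, comparable measure of robustness.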

Finally, for the overall system evaluation, the smart car will autonomously navigate to designated locations, interact with users, and retrieve items based on their requests. Task completion time will be recorded to evaluate efficiency, and item retrieval accuracy will measure the system’s effectiveness in locating and delivering requested items. Additionally, system stability will be monitored to ensure that no unexpected errors occur during operation. This end-to-end evaluation will confirm the system’s readiness for real-world deployment in dynamic retail environments, as recommended by Redmon & Farhadi (2018).

3.3 Progress Analysis and Gantt Chart

The progress of the smart car development project for deploying a large AI model and completing line-following tasks in a supermarket environment is tracked using a Gantt chart. This chart outlines the key phases, tasks, and their respective timelines, helping to visualize the schedule, dependencies, and progress for each task to ensure timely project completion.

3.3.1 Progress Analysis

The project is divided into five main phases: Preparation, Planning, Implementation, Evaluation & Improvement, and Summary. Each phase contains specific tasks and milestones, with progress assessed based on the completion of these milestones.

• Phase 1: Preparation (Weeks 5-7)

The initial preparation tasks have been successfully completed. This phase included setting up the project environment, exploring the deployment requirements for the large AI model, defining the problem statement, and conducting an initial literature review to establish a solid foundation for integrating large-scale AI and line-following technology.

• Phase 2: Planning (Weeks 8-12)

The planning phase is underway, focusing on designing the methodology for integrating SLAM-based navigation, line-following, and large AI model-driven interaction. Data collection for training the AI model and refining navigation and line-following algorithms has been completed. Proposal drafting is ongoing and expected to be finalized by the end of Week 9.

• Phase 3: Implementation (Weeks 10-17)

The implementation phase is set to begin in Week 10. Key tasks include developing the 2D auto-encoder and memory module for handling real-time SLAM data, training a CNN model for line-following, and integrating the large AI model (e.g., iFLYTEK Spark) for customer interaction. Initial testing of the line-following and SLAM systems will be carried out towards the end of this phase.

• Phase 4: Evaluation & Improvement (Weeks 18-25)

Model evaluation and iterative improvements will start in Week 18. This phase includes testing the AI interaction model in a simulated supermarket environment, fine-tuning the line-following and navigation algorithms, and comparing different configurations to optimize overall system performance.

• Phase 5: Summary (Weeks 26-28)

The final phase focuses on documentation, project presentation, and deployment preparation. A draft of the project report will be created, followed by the final presentation. The project will conclude with the submission of the dissertation and deployment plan for real-world testing.

3.3.2 Gantt Chart

The Gantt chart below provides a timeline for each phase and task, distributed across two semesters. This breakdown helps to track progress and manage project timelines effectively.

Figure 7: Project Timeline for Smart Car Development

4. Conclusion

This proposal presents a smart car system aimed at achieving autonomous navigation and human-robot interaction in retail environments. The system combines SLAM-based navigation for real-time mapping and localization, an AI-driven interaction model for processing and responding to customer requests, and a line-following mechanism to ensure precise movement along predefined routes. Through iterative testing and optimization, the smart car is designed to navigate dynamic spaces, assist customers effectively, and handle obstacles autonomously. This approach highlights the potential for AI-integrated robotics to enhance efficiency, reduce labor, and improve the customer experience in retail settings.

References

[1] Thrun, S., Burgard, W., & Fox, D. (2005). Probabilistic Robotics. MIT Press. (pp. 51-72)

[2] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. (pp. 191-250)

[3] Redmon, J., & Farhadi, A. (2018). YOLOv3: An Incremental Improvement. arXiv preprint.

[4] Engel, J., Schöps, T., & Cremers, D. (2014). LSD-SLAM: Large-Scale Direct Monocular SLAM. European Conference on Computer Vision (ECCV).

[5] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is All You Need. Advances in Neural Information Processing Systems (NeurIPS).

[6] Chen, X., Ma, H., Wan, J., Li, B., & Xia, T. (2017). Multi-View 3D Object Detection Network for Autonomous Driving. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Mur-Artal, R., Montiel, J. M. M., & Tardos, J. D. (2015). ORB-SLAM: A Versatile and Accurate Monocular SLAM System. IEEE Transactions on Robotics.

[8] Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL-HLT.

[9] Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Amodei, D. (2020). Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems (NeurIPS).

[10] Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., ... & Adam, H. (2017). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv preprint.

[11] Pan, S. J., & Yang, Q. (2010). A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359.

[12] Rockchip. (2019). RKNN Toolkit User Guide. Retrieved from https://opensource.rock-chips.com/wiki_RKNN_Toolkit.

[13] Zhang, J., & Singh, S. (2017). LOAM: Lidar Odometry and Mapping in Real-time. Robotics: Science and Systems (RSS).

[14] LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep Learning. Nature, 521(7553), 436–444.

[15] Shi, J., & Malik, J. (2000). Normalized Cuts and Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 22(8), 888–905.